wcwidth.api
The public API of clj-wcwidth.
code-point->string
(code-point->string code-point)
Returns the String representation of any Unicode code-point†, or nil when code-point is nil.
This is useful because Clojure/Java String literals only support escape sequences (i.e. "\uXXXX") for code points in the basic plane; a code point in a supplementary plane must be manually converted into its UTF-16 surrogate pair, and each UTF-16 code unit in the pair must then be escaped separately, which is tedious and error-prone.
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
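As a rough illustration of the problem described above, the sketch below compares a hand-escaped surrogate pair with a String built from an int code point. It uses plain JDK interop rather than clj-wcwidth itself; `Character/toChars` is only assumed to be a reasonable stand-in for what code-point->string does, and the var names `escaped` and `from-int` are illustrative.

```clojure
;; U+1F600 (GRINNING FACE) lies in a supplementary plane, so a String literal
;; has to spell out both UTF-16 code units of its surrogate pair by hand.
(def ^String escaped "\uD83D\uDE00")

;; Plain JDK equivalent (an assumption, not necessarily the library's
;; implementation): build the String from the int code point.
(def ^String from-int (String. (Character/toChars 0x1F600)))

(= escaped from-int)             ; both forms produce the same String
(count from-int)                 ; 2 UTF-16 code units ...
(.codePointCount from-int 0 2)   ; ... but only 1 code point
```

Passing the code point as an int avoids the surrogate arithmetic entirely, which is the convenience the function is described as providing.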
code-point-to-string
deprecated
(code-point-to-string code-point)
Deprecated. Use code-point->string instead.
code-points->string
(code-points->string code-points)
Returns a String made up of all of the given Unicode code-points†, or nil when code-points is nil.
†a sequence of chars or ints, but ints are usually the better choice, because of historical limitations with Java’s char type
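One way to picture this conversion is the JDK-only sketch below. The helper `cps->str` is hypothetical and is not part of clj-wcwidth; it only assumes that appending each code point to a `StringBuilder` is a fair approximation of the documented behaviour, including the nil case.

```clojure
(defn cps->str
  "Hypothetical helper (not the library's code): joins a sequence of
  chars or ints into a String, returning nil for nil input."
  [code-points]
  (when code-points
    (let [sb (StringBuilder.)]
      (doseq [cp code-points]
        ;; (int cp) accepts both chars and integer code points
        (.appendCodePoint sb (int cp)))
      (str sb))))

(cps->str [0x48 0x69 0x20 0x1F600]) ; "Hi " followed by U+1F600
(cps->str [\H \i])                  ; chars work too, within the basic plane
(cps->str nil)                      ; nil
```

The second call shows why ints are the more general input: a single char cannot hold a supplementary-plane code point such as U+1F600.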
code-points-to-string
deprecated
(code-points-to-string code-points)
Deprecated. Use code-points->string instead.
combining?
(combining? code-point)
Is code-point† a combining character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
display-width
(display-width s)
(display-width s & {:keys [ignore-ansi?], :or {ignore-ansi? false}})
Returns the number of columns needed to display s (a String), but deviates from POSIX wcswidth behaviour in these ways:
- non-printing characters are considered zero width (instead of causing the entire result to be -1)
- ANSI escape sequences are (by default, but configurable) also considered zero width
For most use cases, this function is more useful than wcswidth, despite not adhering to POSIX.
Returns 0 when s is nil.
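The first deviation listed above can be sketched with a deliberately simplified width function. Everything in this block is hypothetical: `toy-width` treats ISO control characters as non-printing (-1) and every other code point as one column, so it ignores wide and combining characters, and neither function handles ANSI escape sequences. It is meant only to contrast the two summing rules, not to reproduce the library.

```clojure
(defn code-points-of
  "Hypothetical helper: the code points of s as a seq of ints, or nil."
  [^String s]
  (when s (seq (.toArray (.codePoints s)))))

(defn toy-width
  "Hypothetical stand-in for a per-code-point width:
  -1 for ISO control characters, 1 for everything else."
  [cp]
  (if (Character/isISOControl (int cp)) -1 1))

(defn posix-style-width
  "POSIX-style rule: any non-printing code point makes the whole result -1."
  [s]
  (let [widths (map toy-width (code-points-of s))]
    (if (some neg? widths)
      -1
      (reduce + 0 widths))))

(defn lenient-width
  "Lenient rule: non-printing code points count as zero columns."
  [s]
  (reduce + 0 (map #(max 0 (toy-width %)) (code-points-of s))))

(posix-style-width "a\u0007b") ; => -1 (U+0007 BEL is a control character)
(lenient-width "a\u0007b")     ; => 2
```

Both toy functions return 0 for nil, matching the documented nil behaviour.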
grapheme-clusters
(grapheme-clusters s)
Returns the Unicode grapheme clusters (what we tend to think of as “characters”) in s as a sequence of Strings, or nil when s is nil.
Notes:
- Will use ICU4J’s BreakIterator class when available on the classpath, falling back on the JDK’s lower-quality BreakIterator class otherwise
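To give a feel for the JDK fallback mentioned in the note, here is a minimal sketch that splits a String with `java.text.BreakIterator`. The helper `jdk-grapheme-clusters` is hypothetical and is not the library's code; how clj-wcwidth actually drives either BreakIterator is not shown in this document.

```clojure
(import 'java.text.BreakIterator)

(defn jdk-grapheme-clusters
  "Hypothetical helper: splits s at the JDK's character boundaries,
  returning a vector of Strings, or nil for nil input."
  [^String s]
  (when s
    (let [it (BreakIterator/getCharacterInstance)]
      (.setText it s)
      ;; walk consecutive boundaries until the iterator reports DONE
      (loop [start (.first it)
             end   (.next it)
             acc   []]
        (if (= end BreakIterator/DONE)
          acc
          (recur end (.next it) (conj acc (subs s start end))))))))

;; "e" followed by U+0301 COMBINING ACUTE ACCENT forms a single cluster
(jdk-grapheme-clusters "e\u0301x") ; => ["e\u0301" "x"]
```

More complex sequences, such as some emoji combinations, are where the choice between the two BreakIterator implementations is likely to matter.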
grapheme-clusters-impl
Which implementation is in use for finding grapheme clusters? A keyword with one of these values:
- :icu4j
- :jdk
non-printing?
(non-printing? code-point)
Is code-point† a non-printing character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
null?
(null? code-point)
Is code-point† a null character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
re-ansi
A regular expression for matching ANSI escape sequences in a larger text. Taken directly from ECMA-48.
remove-ansi
(remove-ansi s)
Strips all ANSI escape sequences from s (a String). Returns nil if s is nil.
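The idea behind stripping escape sequences can be sketched with a much simpler pattern than re-ansi. The regular expression below is an assumption made for illustration: it only matches CSI sequences (ESC, `[`, parameter bytes, intermediate bytes, one final byte) and is not the expression the library uses. `simple-csi-re` and `strip-csi` are hypothetical names.

```clojure
(require '[clojure.string :as str])

(def simple-csi-re
  "Hypothetical, simplified pattern: ESC [ + parameter bytes (0x30-0x3F)
  + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E)."
  #"\u001B\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]")

(defn strip-csi
  "Hypothetical helper: removes CSI sequences from s; nil for nil input."
  [s]
  (when s (str/replace s simple-csi-re "")))

(strip-csi "\u001B[1;31mred\u001B[0m text") ; => "red text"
```

Escape sequences outside the CSI family are left untouched by this sketch.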
string->code-points
(string->code-points s)
Returns all of the Unicode code points in s (a String), as a sequence of ints, or nil when s is nil.
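A JDK-only sketch of the same idea is shown below, mainly to illustrate that code points and UTF-16 code units are counted differently. The helper `str->cps` is hypothetical and only assumes that `String.codePoints()` approximates the documented behaviour; it returns a vector rather than a lazy sequence.

```clojure
(defn str->cps
  "Hypothetical helper: the code points of s as a vector of ints,
  or nil for nil input."
  [^String s]
  (when s (vec (.toArray (.codePoints s)))))

;; "a" followed by U+1F600, written as its surrogate pair
(str->cps "a\uD83D\uDE00") ; => [97 128512]
(count "a\uD83D\uDE00")    ; => 3 (UTF-16 code units, not code points)
```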
string-to-code-points
deprecated
(string-to-code-points s)
Deprecated. Use string->code-points instead.
wcswidth
(wcswidth s)
Returns the number of columns needed to represent s (a String). If a non-printing code point occurs in s, -1 is returned (as defined in POSIX).
Returns 0 when s is nil.
wcwidth
(wcwidth code-point)
Returns the number of columns needed to represent the code-point†, based on these rules:
- Printable: 0, 1, or 2
- Null character, or nil: 0
- Non-printing: -1
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
wide?
(wide? code-point)
Is code-point† in the East Asian Wide (W), East Asian Full-width (F), or other wide character (e.g. emoji) category?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type