wcwidth.api
The public API of clj-wcwidth.
code-point->string
(code-point->string code-point)
Returns the String representation of any Unicode code-point†, or nil when code-point is nil.
This is useful in part because Clojure/Java String literals only support escape sequences (i.e. "\uXXXX") for code points in the basic plane; a code point in the supplementary planes must be manually converted into its UTF-16 surrogate pair, and each UTF-16 code unit in the pair must then be escaped separately (tedious and error prone).
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
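The manual surrogate-pair conversion described above can be sketched with plain Java interop. This is an illustration only, not clj-wcwidth's implementation; the helper name cp->string is hypothetical.

```clojure
;; Sketch only: plain Java interop, not the library's own code.
(defn cp->string
  "Hypothetical helper: converts one Unicode code point (an int) to a String."
  [cp]
  ;; Character/toChars returns one char for the basic plane and a
  ;; two-char surrogate pair for the supplementary planes.
  (String. (Character/toChars (int cp))))

;; U+1F600 lies outside the basic plane, so a literal needs both halves
;; of the surrogate pair, each escaped separately.
(def by-hand "\uD83D\uDE00")

(= by-hand (cp->string 0x1F600)) ; true
(count by-hand)                  ; 2 (UTF-16 code units, but one code point)
```

Working from the code point directly avoids calculating the two surrogate values by hand.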
code-point-to-string
deprecated
(code-point-to-string code-point)
Deprecated. Use code-point->string instead.
code-points->string
(code-points->string code-points)
Returns a String made up of all of the given Unicode code-points†, or nil when code-points is nil.
†a sequence of chars or ints, but ints are usually the better choice, because of historical limitations with Java’s char type
code-points-to-string
deprecated
(code-points-to-string code-points)
Deprecated. Use code-points->string instead.
combining?
(combining? code-point)
Is code-point† a combining character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
display-width
(display-width s)
(display-width s & {:keys [ignore-ansi?], :or {ignore-ansi? false}})
Returns the number of columns needed to display s (a String), but deviates from POSIX wcswidth behaviour in these ways:
- non-printing characters are considered zero width (instead of causing the entire result to be -1)
- ANSI escape sequences are (by default, but configurable) also considered zero width
For most use cases, this function is more useful than wcswidth, despite not adhering to POSIX.
Returns 0 when s is nil.
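The difference between the two behaviours can be sketched with a toy model. Everything below is hypothetical and covers ASCII only: toy-wcwidth is a stand-in for a real per-code-point width function, and none of this is the library's implementation.

```clojure
;; Toy model, ASCII only. All names here are hypothetical.
(defn toy-wcwidth
  "Width of one code point: 0 for null, -1 for control characters, else 1."
  [cp]
  (cond
    (zero? cp)                0
    (or (< cp 32) (= cp 127)) -1
    :else                     1))

(defn- code-points
  "Code points of s as a seq of ints, or nil when s is nil or empty."
  [^String s]
  (when s (seq (.toArray (.codePoints s)))))

(defn toy-wcswidth
  "POSIX-style: a single non-printing code point makes the whole result -1."
  [s]
  (let [widths (map toy-wcwidth (code-points s))]
    (if (some neg? widths) -1 (reduce + 0 widths))))

(defn toy-display-width
  "Lenient: non-printing code points count as zero columns."
  [s]
  (reduce + 0 (map #(max 0 (toy-wcwidth %)) (code-points s))))

(toy-wcswidth "a\tb")      ; -1, because the tab is non-printing
(toy-display-width "a\tb") ; 2
```

The lenient variant still gives a usable width for text that happens to contain a stray control character, which is the motivation given above for preferring display-width in most cases.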
grapheme-clusters
(grapheme-clusters s)
Returns the Unicode grapheme clusters (what we tend to think of as “characters”) in s as a sequence of Strings, or nil when s is nil.
Notes:
- Will use ICU4J’s BreakIterator class when available on the classpath, falling back on the JDK’s lower quality BreakIterator class otherwise
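For reference, splitting a String into clusters with the JDK's java.text.BreakIterator might look like the sketch below. The function name jdk-grapheme-clusters is hypothetical, and the library's actual fallback code may differ.

```clojure
(import 'java.text.BreakIterator)

(defn jdk-grapheme-clusters
  "Hypothetical sketch: splits s at the JDK's character (grapheme) boundaries."
  [^String s]
  (when s
    (let [it (doto (BreakIterator/getCharacterInstance) (.setText s))]
      ;; Walk consecutive boundaries, taking the substring between each pair.
      (loop [start (.first it)
             end   (.next it)
             acc   []]
        (if (= end BreakIterator/DONE)
          acc
          (recur end (.next it) (conj acc (subs s start end))))))))

;; "e" followed by U+0301 (combining acute accent) is one cluster, two chars.
(jdk-grapheme-clusters "ae\u0301") ; ["a" "e\u0301"]
```

Simple combining sequences like this one are grouped as expected by the JDK class; the quality difference mentioned in the note presumably concerns more complex sequences.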
grapheme-clusters-impl
Which implementation is in use for finding grapheme clusters? A keyword with one of these values:
:icu4j
:jdk
non-printing?
(non-printing? code-point)
Is code-point† a non-printing character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
null?
(null? code-point)
Is code-point† a null character?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
re-ansi
A regular expression for matching ANSI escape sequences in a larger text. Taken directly from ECMA-48.
remove-ansi
(remove-ansi s)
Strips all ANSI escape sequences from s (a String). Returns nil if s is nil.
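To show why stripping matters for width calculations, here is a minimal sketch using a deliberately simplified pattern. Both simple-csi-re and strip-csi are hypothetical names, and the pattern matches only basic CSI sequences such as colour codes, so it is much narrower than re-ansi.

```clojure
(require '[clojure.string :as str])

;; Simplified, hypothetical pattern: ESC, "[", numeric parameters, one final letter.
;; It is NOT the library's re-ansi and misses many valid sequences.
(def simple-csi-re #"\u001B\[[0-9;]*[A-Za-z]")

(defn strip-csi
  "Removes basic CSI sequences from s; returns nil when s is nil."
  [s]
  (when s (str/replace s simple-csi-re "")))

(def coloured "\u001B[31mred\u001B[0m")

(count coloured)     ; 12, counting the escape sequences
(strip-csi coloured) ; "red", the 3 characters a terminal actually shows
```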
string->code-points
(string->code-points s)
Returns all of the Unicode code points in s (a String), as a sequence of ints, or nil when s is nil.
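The distinction between code points and Java chars can be seen with plain interop. The helper code-points-of below is a hypothetical sketch, not the library's implementation, and its result for an empty String is not necessarily what string->code-points returns.

```clojure
(defn code-points-of
  "Hypothetical sketch: the code points of s as a seq of ints, nil when s is nil."
  [^String s]
  (when s
    ;; String.codePoints joins each surrogate pair into one int.
    ;; Note that seq also yields nil for an empty String.
    (seq (.toArray (.codePoints s)))))

(def s "a\uD83D\uDE00")  ; "a" followed by U+1F600

(count s)          ; 3 (UTF-16 code units)
(code-points-of s) ; (97 128512), i.e. 2 code points
```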
string-to-code-points
deprecated
(string-to-code-points s)
Deprecated. Use string->code-points instead.
wcswidth
(wcswidth s)
Returns the number of columns needed to represent s (a String). If a non-printing code point occurs in s, -1 is returned (as defined in POSIX).
Returns 0 when s is nil.
wcwidth
(wcwidth code-point)
Returns the number of columns needed to represent the code-point†, based on these rules:
- Printable: 0, 1, or 2
- Null character, or nil: 0
- Non-printing: -1
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type
wide?
(wide? code-point)
Is code-point† in the East Asian Wide (W), East Asian Full-width (F), or other wide character (e.g. emoji) category?
†a char or int, but int is usually the better choice, because of historical limitations with Java’s char type