wcwidth.api

The public API of clj-wcwidth.

code-point->string

(code-point->string code-point)

Returns the String representation of any Unicode code-point, or nil when code-point is nil.

This is useful because Clojure/Java String literals only support escape sequences (i.e. "\uXXXX") for code points in the Basic Multilingual Plane. A code point in a supplementary plane must be manually converted into its UTF-16 surrogate pair, and each of the two UTF-16 code units then escaped separately, which is tedious and error prone.

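code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type (a char is a single 16-bit UTF-16 code unit, so it cannot hold a supplementary code point).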
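To make the surrogate-pair point concrete, here is a minimal sketch that uses only JDK interop. The helper name cp->str is hypothetical and exists only for illustration; it is not necessarily how code-point->string is implemented.

```clojure
;; Hypothetical helper built on JDK interop; clj-wcwidth may do this differently.
(defn cp->str
  "Converts a Unicode code point (char or int) to a String, or nil for nil."
  [cp]
  (when cp
    (String. ^chars (Character/toChars (int cp)))))

;; U+1F600 lies in a supplementary plane, so a String literal needs two
;; \u escapes, one per UTF-16 code unit of the surrogate pair.
(def grinning-literal "\uD83D\uDE00")
(def grinning-converted (cp->str 0x1F600))

(= grinning-literal grinning-converted)          ; true
(count grinning-converted)                       ; 2 UTF-16 code units (Java chars)
(.codePointCount ^String grinning-converted 0 2) ; 1 code point
```

The last two lines show why an int is the safer argument type: the single code point U+1F600 occupies two Java chars.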

code-point-to-string

deprecated

(code-point-to-string code-point)

Deprecated. Use code-point->string instead.

code-points->string

(code-points->string code-points)

Returns a String made up of all of the given Unicode code-points, or nil when code-points is nil.

code-points is a sequence of chars or ints, but ints are usually the better choice, because of historical limitations with Java’s char type.
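The same idea extends to a sequence of code points. The sketch below uses only the JDK's StringBuilder; cps->str is a hypothetical name, and the library's own implementation may differ.

```clojure
;; Hypothetical helper; a JDK-only sketch, not clj-wcwidth's own code.
(defn cps->str
  "Builds a String from a sequence of code points (chars or ints), or nil for nil."
  [cps]
  (when cps
    (let [sb (StringBuilder.)]
      (doseq [cp cps]
        ;; appendCodePoint emits a surrogate pair when cp is supplementary
        (.appendCodePoint sb (int cp)))
      (str sb))))

(cps->str [\h 0x69 0x1F600]) ; "hi" followed by U+1F600
(cps->str [])                ; ""
(cps->str nil)               ; nil
```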

code-points-to-string

deprecated

(code-points-to-string code-points)

Deprecated. Use code-points->string instead.

combining?

(combining? code-point)

Is code-point a combining character?

code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type.
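For intuition about what a combining character is, the following sketch approximates the idea with the JDK's Unicode general categories. This is an assumption made for illustration: combining-mark? is a hypothetical name, and the table that combining? actually consults may classify some code points differently.

```clojure
;; JDK-only approximation using general categories Mn and Me; the
;; library's own combining-character data may differ.
(defn combining-mark?
  [cp]
  (let [t (Character/getType (int cp))]
    (or (= t Character/NON_SPACING_MARK)
        (= t Character/ENCLOSING_MARK))))

(combining-mark? 0x0301) ; true:  U+0301 COMBINING ACUTE ACCENT
(combining-mark? \e)     ; false: an ordinary base letter
```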

display-width

(display-width s)
(display-width s & {:keys [ignore-ansi?], :or {ignore-ansi? false}})

Returns the number of columns needed to display s (a String), but deviates from POSIX wcswidth behaviour in these ways:

  • non-printing characters are considered zero width (instead of causing the entire result to be -1)
  • ANSI escape sequences are (by default, but configurable) also considered zero width

For most use cases, this function is more useful than wcswidth, despite not adhering to POSIX.

Returns 0 when s is nil.
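The difference between the POSIX rule and the lenient rule described above can be sketched with a toy width function. Everything here is an assumption for illustration: toy-width stands in for a real width table (it knows nothing about wide or combining characters), ANSI handling is left out, and none of these names belong to the library.

```clojure
;; Toy per-code-point width: null -> 0, other C0 controls and DEL -> -1,
;; everything else -> 1. A real table also returns 0 and 2 for printables.
(defn toy-width [cp]
  (cond
    (zero? cp)                0
    (or (< cp 32) (= cp 127)) -1
    :else                     1))

(defn code-points
  "Code points of s as a vector of ints; empty for nil."
  [^String s]
  (if s
    (vec (.toArray ^java.util.stream.IntStream (.codePoints s)))
    []))

(defn posix-style-width
  "Sum of widths, or -1 if any code point is non-printing."
  [s]
  (let [ws (map toy-width (code-points s))]
    (if (some neg? ws) -1 (reduce + 0 ws))))

(defn lenient-width
  "Sum of widths, counting non-printing code points as zero columns."
  [s]
  (reduce + 0 (map #(max 0 (toy-width %)) (code-points s))))

(posix-style-width "ab\tc") ; -1, because of the tab
(lenient-width "ab\tc")     ; 3
(lenient-width nil)         ; 0
```

A single control character poisons the POSIX-style result, which is why the lenient rule tends to be more practical for laying out arbitrary text.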

grapheme-clusters

(grapheme-clusters s)

Returns the Unicode grapheme clusters (what we tend to think of as “characters”) in s as a sequence of Strings, or nil when s is nil.
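As an illustration of what a grapheme cluster is, the sketch below splits a String with the JDK's java.text.BreakIterator. The name jdk-clusters is hypothetical, and this is not claimed to be the library's implementation. Results for more complex clusters (for example emoji sequences) depend on the JDK version.

```clojure
(import 'java.text.BreakIterator)

;; Hypothetical helper: splits s at the JDK's "character" boundaries.
(defn jdk-clusters
  [^String s]
  (when s
    (let [^BreakIterator it (BreakIterator/getCharacterInstance)]
      (.setText it s)
      (loop [start (.first it)
             acc   []]
        (let [end (.next it)]
          (if (= end BreakIterator/DONE)
            acc
            (recur end (conj acc (subs s start end)))))))))

;; "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points
;; but a single user-perceived character.
(jdk-clusters "e\u0301a") ; ["e\u0301" "a"]
(jdk-clusters nil)        ; nil
```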

Notes:

grapheme-clusters-impl

Which implementation is in use for finding grapheme clusters? A keyword with one of these values:

  • :icu4j
  • :jdk

non-printing?

(non-printing? code-point)

Is code-point a non-printing character?

code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type.

null?

(null? code-point)

Is code-point a null character?

code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type.

re-ansi

A regular expression for matching ANSI escape sequences in a larger text. Taken directly from ECMA-48.

remove-ansi

(remove-ansi s)

Strips all ANSI escape sequences from s (a String). Returns nil if s is nil.
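To show the general technique, here is a sketch that strips escape sequences with a regular expression. The pattern is a simplified assumption that covers only CSI sequences (ESC, "[", parameter bytes, intermediate bytes, final byte); it is not the library's re-ansi, which may match more. The names csi-pattern and strip-csi are hypothetical.

```clojure
(require '[clojure.string :as string])

;; Simplified, hypothetical pattern: CSI sequences only.
;;   \u001B\[  ESC followed by "["
;;   [0-?]*    parameter bytes    (0x30-0x3F)
;;   [ -/]*    intermediate bytes (0x20-0x2F)
;;   [@-~]     final byte         (0x40-0x7E)
(def csi-pattern #"\u001B\[[0-?]*[ -/]*[@-~]")

(defn strip-csi
  "Removes CSI escape sequences from s; returns nil for nil."
  [s]
  (when s
    (string/replace s csi-pattern "")))

(strip-csi "\u001B[1;31mred\u001B[0m text") ; "red text"
(strip-csi nil)                             ; nil
```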

string->code-points

(string->code-points s)

Returns all of the Unicode code points in s (a String), as a sequence of ints, or nil when s is nil.
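The distinction between code points and Java chars can be sketched with the JDK's String.codePoints method. The helper name str->cps is hypothetical, and the library may produce its sequence differently.

```clojure
;; Hypothetical helper; a JDK-only sketch, not clj-wcwidth's own code.
(defn str->cps
  "Returns the code points of s as a vector of ints, or nil for nil."
  [^String s]
  (when s
    (vec (.toArray ^java.util.stream.IntStream (.codePoints s)))))

;; "a" followed by U+1F600: three Java chars, but only two code points.
(count "a\uD83D\uDE00")    ; 3
(str->cps "a\uD83D\uDE00") ; [97 128512]
(str->cps nil)             ; nil
```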

string-to-code-points

deprecated

(string-to-code-points s)

Deprecated. Use string->code-points instead.

wcswidth

(wcswidth s)

Returns the number of columns needed to represent s (a String). If a non-printing code point occurs in s, -1 is returned (as defined in POSIX).

Returns 0 when s is nil.

wcwidth

(wcwidth code-point)

Returns the number of columns needed to represent code-point, based on these rules:

  • Printable: 0, 1, or 2
  • Null character, or nil: 0
  • Non-printing: -1

code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type.

wide?

(wide? code-point)

Is code-point in the East Asian Wide (W), East Asian Full-width (F), or other wide character (e.g. emoji) category?

code-point is a char or int, but int is usually the better choice, because of historical limitations with Java’s char type.