wreck.api

The public API of wreck.

Notes:

  • Apart from passing through nil, this library does minimal argument checking, since the rules for regexes vary from platform to platform, and it is a first-class requirement that callers be allowed to construct platform-specific regexes if they wish.
  • As a result, all functions have the potential to throw platform-specific exceptions if the resulting regex is syntactically invalid.
  • On the JVM, these will typically be instances of the java.util.regex.PatternSyntaxException class.
  • On JavaScript, these will typically be a js/SyntaxError.
  • Platform-specific behaviour is particularly notable for short / empty regexes, such as #"{}" (an error on the JVM, fine but nonsensical on JS) and #"{1}" (ironically, fine but nonsensical on the JVM, but an error on JS). 🤡
  • Furthermore, JavaScript fundamentally doesn’t support lossless round-tripping of RegExp objects to Strings and back, something this library relies upon and does extensively. The library makes a best effort to correct JavaScript’s problematic implementation, but because it’s fundamentally lossy there are some cases that (on ClojureScript only) may change your regexes in unexpected (though probably not semantically significant) ways.
  • Regex flags are supported to the best ability of the library, but please carefully review the usage notes in README.md for various caveats when flags are used.
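The platform-specific exception behaviour described above can be seen with plain Clojure on the JVM. This is a minimal sketch that does not use wreck at all; try-compile is a hypothetical helper introduced only for illustration.

```clojure
(import '(java.util.regex PatternSyntaxException))

(defn try-compile
  "Hypothetical helper (not part of wreck): compiles the String s into a
  regex, or returns the syntax error's description if s is invalid."
  [s]
  (try
    (re-pattern s)
    (catch PatternSyntaxException e
      (.getDescription e))))

(try-compile "{}")     ; a String describing the error (JVM only)
(try-compile "[abc]+") ; a java.util.regex.Pattern
```

Callers that build regexes from untrusted input may want a similar guard around wreck calls, catching js/SyntaxError instead on ClojureScript.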

='

(=' _)
(=' re1 re2)
(=' re1 re2 & more)

Equality for regexes, defined by having equal string representations and flags (including flags that cannot be embedded).

Notes:

  • Functionally equivalent regexes (e.g. #"..." and #".{3}") are not considered ='.
  • Some regexes may not be =' initially due to differing flag sets, but after being run through embed-flags may become =', due to non-embeddable flags being silently dropped (see embed-flags for details).
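To see why a dedicated equality function is useful, note that on the JVM clojure.core/= compares regexes by identity. The sketch below is a JVM-only approximation of the idea (equal pattern text and equal flags); same-source-and-flags? is a hypothetical name, and this is not wreck's implementation.

```clojure
(import '(java.util.regex Pattern))

(defn same-source-and-flags?
  "Hypothetical stand-in for the idea behind =' (JVM only): equal pattern
  text and equal flag bitmasks."
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(= #"a+" #"a+")                         ; false: Pattern uses identity equality
(same-source-and-flags? #"a+" #"a+")    ; true
(same-source-and-flags? #"..." #".{3}") ; false: equivalent, but different text
(same-source-and-flags? #"a+" (Pattern/compile "a+" Pattern/CASE_INSENSITIVE)) ; false
```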

alt

(alt & res)

Returns a regex that will match any one of res, via alternation.

Notes:

  • Duplicate elements in res will only appear once in the result.
  • Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
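The precedence risk can be demonstrated with plain regexes, without wreck. In this sketch the Strings stand in for an ungrouped alternation and for a grouped one (as alt-grp is described as producing), each followed by a further regex.

```clojure
;; An ungrouped alternation followed by "s": the suffix binds to "dog" only.
(def ungrouped (re-pattern (str "cat|dog" "s")))     ; cat|dogs
;; The same alternation wrapped in a non-capturing group first.
(def grouped   (re-pattern (str "(?:cat|dog)" "s"))) ; (?:cat|dog)s

(re-matches ungrouped "cats") ; nil
(re-matches ungrouped "cat")  ; "cat"
(re-matches grouped "cats")   ; "cats"
(re-matches grouped "cat")    ; nil
```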

alt-cg

(alt-cg & res)

alt then cg.

alt-grp

(alt-grp & res)

alt then grp.

alt-ncg

(alt-ncg nm & res)

alt then ncg.

and'

(and' a b)
(and' a b s)

Returns an ‘and’ regex that will match a and b in any order, and with the separator regex (if provided) between them. This is implemented as ASB|BSA, which means that A and B must be distinct (must not match the same text).

Notes:

  • May optimise the expression (via de-duplication in alt).
  • Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
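The ASB|BSA shape can be sketched with plain Strings. and-sketch below is a hypothetical illustration only; it ignores flags, nil handling and de-duplication, all of which the real function may treat differently.

```clojure
(defn and-sketch
  "Hypothetical illustration of the ASB|BSA shape (not wreck's code).
  a, b and s are regex source Strings."
  [a b s]
  (re-pattern (str a s b "|" b s a)))

(def num-and-word (and-sketch "\\d+" "[a-z]+" "-")) ; \d+-[a-z]+|[a-z]+-\d+

(re-matches num-and-word "12-ab") ; "12-ab"
(re-matches num-and-word "ab-12") ; "ab-12"
(re-matches num-and-word "12-34") ; nil: both sides must be present
```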

and-cg

(and-cg a b)
(and-cg a b s)

and' then cg.

Notes:

  • Unlike most other -cg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

and-grp

(and-grp a b)
(and-grp a b s)

and' then grp.

Notes:

  • Unlike most other -grp fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

and-ncg

(and-ncg nm a b)
(and-ncg nm a b s)

and' then ncg.

Notes:

  • Unlike most other -ncg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

cg

(cg & res)

As for grp, but uses a capturing group.

embed-flags

(embed-flags re)

Embeds any flags found in re at the start of re in a non-capturing group (to ensure scoping), returning a new regex. Returns re if re contains no flags or is nil.

For example, on the JVM #"(?i)[abc]+" would become #"(?i:[abc]+)".

Similarly, on ClojureScript (doto (js/RegExp.) (.compile "[abc]+" "i")) would also become #"(?i:[abc]+)".

Notes:

  • flags-grp is almost always a better choice than this function! embed-flags is primarily intended for internal use by wreck, but may be useful in those rare cases where Clojure(Script) code receives a 3rd party regex, wishes to use it as part of composing a larger regex, doesn’t know if it contains flags or not, and doesn’t care that non-embeddable flags will be silently dropped.
  • ⚠️ On the JVM, ungrouped embedded flags in the middle of re will be moved to the beginning of the regex. This may alter the semantics of the regex - for example a(?i)b will become (?i:ab), which means that a will be matched case-insensitively by the result, which is not the same as the original (which matches lower-case a only). This is an unavoidable consequence of how the JVM regex engine reports flags. If you really need to use embedded flag(s) midway through a regex, use flags-grp to ensure proper scoping of the flag(s).
  • ⚠️ On the JVM, the programmatic flags LITERAL and CANON_EQ have no embeddable equivalent, and will be silently dropped by this function. Use has-non-embeddable-flags? if you need to check for the presence of these flags (e.g. in a 3rd party regex).
  • ⚠️ On JavaScript, only the flags ims can be embedded. All other flags will be silently dropped by this function. Use has-non-embeddable-flags? if you need to check for the presence of these flags (e.g. in a 3rd party regex).
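The semantic points in these notes can be checked on the JVM with plain regexes. The sketch below does not call wreck; the hand-written (?i:...) Strings stand in for the grouped form that embed-flags is described as producing.

```clojure
(import '(java.util.regex Pattern))

;; An ungrouped flag midway through a regex only affects what follows it.
(re-matches #"a(?i)b" "aB")  ; "aB"
(re-matches #"a(?i)b" "AB")  ; nil: the leading a is still case-sensitive

;; The hoisted, grouped form matches more than the original did.
(re-matches #"(?i:ab)" "AB") ; "AB"

;; A programmatic flag is lost if only the pattern text is reused...
(def ^Pattern p (Pattern/compile "[abc]+" Pattern/CASE_INSENSITIVE))
(re-matches (re-pattern (str (.pattern p) "x")) "ABx")           ; nil

;; ...whereas a flag group keeps it, scoped to the original regex only.
(re-matches (re-pattern (str "(?i:" (.pattern p) ")x")) "ABx")   ; "ABx"
(re-matches (re-pattern (str "(?i:" (.pattern p) ")x")) "ABX")   ; nil
```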

empty?'

(empty?' re)

Is re nil or (=' #"")?

Notes:

  • Takes flags (if any) into account.

esc

(esc s)

Escapes s (a String) for use in a regex, returning a String.

Notes:

  • Unlike most other fns in this namespace, this one does not support a regex as an input, nor return a regex as an output.
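Why escaping matters can be shown with the JVM's built-in Pattern/quote, used here purely as a stand-in. The exact escaped String that esc returns may well differ from what Pattern/quote produces; only the matching behaviour is being illustrated.

```clojure
(import '(java.util.regex Pattern))

;; Unescaped, the dot is a metacharacter and matches any character.
(re-matches (re-pattern "1.5") "1x5")                 ; "1x5"

;; Escaped (JVM stand-in for the idea behind esc), it matches literally.
(re-matches (re-pattern (Pattern/quote "1.5")) "1x5") ; nil
(re-matches (re-pattern (Pattern/quote "1.5")) "1.5") ; "1.5"
```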

exn

(exn n re)

Returns a regex where re will match exactly n times.

exn-cg

(exn-cg n & res)

cg then exn.

exn-grp

(exn-grp n & res)

grp then exn.
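Grouping before quantifying matters because, in ordinary regex syntax, a quantifier binds only to the atom immediately before it. The plain-regex sketch below shows the difference; whether exn on its own guards against this is not stated here, so no wreck calls are shown.

```clojure
;; {2} applies to b only.
(re-matches #"ab{2}" "abb")      ; "abb"
(re-matches #"ab{2}" "abab")     ; nil

;; Grouped first, {2} applies to the whole of ab.
(re-matches #"(?:ab){2}" "abab") ; "abab"
```

The same reasoning applies to the other quantifier families (n2m, nom, oom, opt and zom) and their grouping variants.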

exn-ncg

(exn-ncg nm n & res)

ncg then exn.

flags-grp

(flags-grp flgs & res)

As for grp, but prefixes the group with flgs (a String). Returns nil if flgs is nil or empty. Throws if flgs contains an invalid flag character, including those that (ClojureScript only) cannot be embedded.

Notes:

  • If you must use regex flags, it is STRONGLY RECOMMENDED that you use this function! Programmatically set flags and ungrouped embedded flags (e.g. (?i)) have no explicit scope and so cannot be reliably used to compose larger regexes. wreck makes a best effort to always convert such ‘unscoped’ flags into their embedded equivalents when composing larger regexes (via embed-flags), but using flag groups explicitly in the first place is easier to reason about and avoids potential footguns.
  • Removes any ungrouped embedded flags in res (e.g. (?i)ab), but unlike embed-flags does not check that they appear in flgs.
  • ⚠️ On the JVM, ungrouped embedded flags in the middle of any of the res will also be removed, which may alter the semantics of the regex.
  • ⚠️ On JavaScript, only the flags ims can be embedded (this is a limitation of the JavaScript regex engine). Other flags will result in a js/SyntaxError being thrown.
  • For the JVM, see the ‘special constructs’ section of the java.util.regex.Pattern JavaDoc for the set of valid flag characters.
  • For JavaScript, see the RegExp flags reference for the set of valid flag characters.

grp

(grp & res)

As for join, but encloses the joined res into a single non-capturing group.

has-non-embeddable-flags?

(has-non-embeddable-flags? re)

Does re have non-embeddable flags?

Notes:

  • On the JVM, the only non-embeddable flags are the programmatic flags LITERAL and CANON_EQ.
  • On JavaScript, this is every flag except i, m, and s.

join

(join & res)

Returns a regex that is all of the res joined together. Each element in res can be a regex, a String or something that can be turned into a String (including numbers, etc.). Returns nil when no res are provided, or they’re all nil.

Notes:

  • ⚠️ In ClojureScript be cautious about using numbers in these calls, since JavaScript’s number handling is a 🤡show. See the unit tests for examples.
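A rough JVM-only sketch of the joining behaviour described above follows. join-sketch is hypothetical and deliberately simplified: it ignores regex flags entirely, which the real join is described elsewhere in this document as handling via embed-flags.

```clojure
(import '(java.util.regex Pattern))

(defn join-sketch
  "Hypothetical, simplified illustration of join (not wreck's code):
  concatenates the text of each non-nil argument into one regex, and
  returns nil when there is nothing to join. Ignores flags."
  [& xs]
  (let [parts (remove nil? xs)]
    (when (seq parts)
      (re-pattern (apply str (map #(if (instance? Pattern %)
                                     (.pattern ^Pattern %)
                                     (str %))
                                  parts))))))

(str (join-sketch #"\d+" "-" 42)) ; the regex text \d+-42
(join-sketch nil nil)             ; nil
(join-sketch)                     ; nil
```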

n2m

(n2m n m re)

Returns a regex where re will match from n to m times.

n2m-cg

(n2m-cg n m & res)

cg then n2m.

n2m-grp

(n2m-grp n m & res)

grp then n2m.

n2m-ncg

(n2m-ncg nm n m & res)

ncg then n2m.

ncg

(ncg nm & res)

As for grp, but uses a named capturing group named nm. Returns nil if nm is nil or blank. Throws if nm is an invalid name for a named capturing group (alphanumeric only, must start with an alphabetical character, must be unique within the regex).
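For readers unfamiliar with named capturing groups, the sketch below shows the underlying JVM syntax and the naming rule using a hand-written regex rather than ncg itself. valid-group-name? is a hypothetical helper.

```clojure
(import '(java.util.regex PatternSyntaxException))

;; Reading named groups back out of a match (JVM interop).
(let [m (re-matcher #"(?<year>\d{4})-(?<month>\d{2})" "2024-05")]
  (when (.matches m)
    [(.group m "year") (.group m "month")])) ; ["2024" "05"]

(defn valid-group-name?
  "Hypothetical helper: does the JVM accept nm as a group name?"
  [nm]
  (try
    (re-pattern (str "(?<" nm ">x)"))
    true
    (catch PatternSyntaxException _ false)))

(valid-group-name? "year")    ; true
(valid-group-name? "my-name") ; false: hyphens are not alphanumeric
```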

nom

(nom n re)

Returns a regex where re will match n or more times.

nom-cg

(nom-cg n & res)

cg then nom.

nom-grp

(nom-grp n & res)

grp then nom.

nom-ncg

(nom-ncg nm n & res)

ncg then nom.

oom

(oom re)

Returns a regex where re will match one or more times.

oom-cg

(oom-cg & res)

cg then oom.

oom-grp

(oom-grp & res)

grp then oom.

oom-ncg

(oom-ncg nm & res)

ncg then oom.

opt

(opt re)

Returns a regex where re is optional.

opt-cg

(opt-cg & res)

cg then opt.

opt-grp

(opt-grp & res)

grp then opt.

opt-ncg

(opt-ncg nm & res)

ncg then opt.

or'

(or' a b)
(or' a b s)

Returns an ‘inclusive or’ regex that will match a or b, or both, in any order, and with the separator regex (if provided) between them. This is implemented as ASB|BSA|A|B, which means that A and B must be distinct (must not match the same text).

Notes:

  • May optimise the expression (via de-duplication in alt).
  • Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.

or-cg

(or-cg a b)
(or-cg a b s)

or' then cg.

Notes:

  • Unlike most other -cg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

or-grp

(or-grp a b)
(or-grp a b s)

or' then grp.

Notes:

  • Unlike most other -grp fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

or-ncg

(or-ncg nm a b)
(or-ncg nm a b s)

or' then ncg.

Notes:

  • Unlike most other -ncg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

qot

(qot re)

Quotes re (anything that can be accepted by join), returning a regex.

str'

(str' o)

Returns the String representation of o, with special handling for RegExp objects on ClojureScript in an attempt to correct JavaScript’s APPALLING default stringification.

xor'

(xor' a b)

Returns an ‘exclusive or’ regex that will match a or b, but not both. This is identical to alt called with 2 arguments, and is provided as a convenience for those who might be building up large logic-based regexes and would prefer to use more easily understood logical operator names throughout.

Notes:

  • May optimise the expression (via de-duplication in alt).
  • Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.

xor-cg

(xor-cg a b)

xor' then cg.

Notes:

  • Unlike most other -cg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

xor-grp

(xor-grp a b)

xor' then grp.

Notes:

  • Unlike most other -grp fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

xor-ncg

(xor-ncg nm a b)

xor' then ncg.

Notes:

  • Unlike most other -ncg fns, this one does not accept any number of res.
  • May optimise the expression (via de-duplication in alt).

zom

(zom re)

Returns a regex where re will match zero or more times.

zom-cg

(zom-cg & res)

cg then zom.

zom-grp

(zom-grp & res)

grp then zom.

zom-ncg

(zom-ncg nm & res)

ncg then zom.