wreck.api
The public API of wreck.
Notes:
- Apart from passing through nil, this library does minimal argument checking, since the rules for regexes vary from platform to platform, and it is a first-class requirement that callers be allowed to construct platform-specific regexes if they wish.
- As a result, all functions have the potential to throw platform-specific exceptions if the resulting regex is syntactically invalid. On the JVM, these will typically be instances of the java.util.regex.PatternSyntaxException class. On JavaScript, these will typically be a js/SyntaxError.
- Platform-specific behaviour is particularly notable for short / empty regexes, such as #"{}" (an error on the JVM, fine but nonsensical on JS) and #"{1}" (ironically fine but nonsensical on the JVM, but an error on JS). 🤡
- Furthermore, JavaScript fundamentally doesn't support lossless round-tripping of RegExp objects to Strings and back, something this library does extensively. The library makes a best effort to correct JavaScript's problematic implementation, but because it's fundamentally lossy there are some cases that (on ClojureScript only) may change your regexes in unexpected (though probably not semantically significant) ways.
- Regex flags are supported to the best ability of the library, but please carefully review the usage notes in README.md for various caveats when flags are used.
- None of these functions perform String escaping or quoting automatically. You can use esc or qot for this.
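The platform-specific exceptions mentioned in these notes can be observed without wreck at all. The following is a minimal JVM-only sketch using plain clojure.core; the helper name compiles? is hypothetical and not part of this library.

```clojure
(import 'java.util.regex.PatternSyntaxException)

(defn compiles?
  "Hypothetical helper (not part of wreck): true if the String s compiles
  as a regex on this JVM, false if the regex engine rejects it."
  [s]
  (try
    (re-pattern s)
    true
    (catch PatternSyntaxException _
      false)))

(compiles? "{}")   ; false on the JVM, per the note above
(compiles? "{1}")  ; true on the JVM, per the note above
```

Code that composes regexes from untrusted or third-party fragments may want a similar guard around the final composition step.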
='
(=' _)
(=' re1 re2)
(=' re1 re2 & more)
Equality for regexes, defined by having equal string representations and flags (including flags that cannot be embedded).
Notes:
- Functionally equivalent regexes (e.g. #"..." and #".{3}") are not considered ='.
- Some regexes may not be =' initially due to differing flag sets, but after being run through embed-flags may become =', due to non-embeddable flags being silently dropped (see embed-flags for details).
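To illustrate why a dedicated equality function is useful on the JVM, here is a rough sketch of the definition given above (equal string representations and equal flags). The function regex-eq? is a hypothetical stand-in written against java.util.regex.Pattern; it is not this library's implementation and ignores ClojureScript entirely.

```clojure
(import 'java.util.regex.Pattern)

(defn regex-eq?
  "Hypothetical sketch (not wreck's implementation): two JVM Patterns are
  treated as equal when their pattern Strings and flag bitmasks are equal."
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(= #"abc" #"abc")           ; false: Pattern only has identity equality
(regex-eq? #"abc" #"abc")   ; true: same text, same flags
(regex-eq? #"..." #".{3}")  ; false: equivalent behaviour, different text
(regex-eq? #"abc" (Pattern/compile "abc" Pattern/CASE_INSENSITIVE)) ; false: flags differ
```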
alt
(alt & res)
Returns a regex that will match any one of res, via alternation:
re|re|re|...
Returns an empty regex (#"") if no res are provided, or they're all empty?'.
Notes:
- Duplicate elements in res will only appear once in the result. This equality comparison occurs after each re is run through embed-flags.
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
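The precedence warning is easiest to see with a concrete case. This sketch uses only plain clojure.core string concatenation (no wreck calls) to show what happens when an ungrouped alternation is combined with a suffix, compared with the same alternation wrapped in a non-capturing group first.

```clojure
;; Naive concatenation: the suffix binds to the last alternative only.
(def ungrouped (re-pattern (str "cat|dog" "s")))      ; cat|dogs

;; Grouping the alternation first keeps the suffix outside both alternatives.
(def grouped (re-pattern (str "(?:cat|dog)" "s")))    ; (?:cat|dog)s

(re-matches ungrouped "cats")  ; nil: only "cat" or "dogs" can match
(re-matches ungrouped "cat")   ; "cat"
(re-matches grouped "cats")    ; "cats"
(re-matches grouped "dogs")    ; "dogs"
```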
and'
(and' a b)
(and' a b s)
Returns an "and" regex that will match a and b in any order, and with the separator regex s (if provided) between them:
asb|bsa
Returns an empty regex (#"") if re is empty?'.
Notes:
- a and b must be distinct (must not match the same text) or else the resulting regex will be logically inconsistent (will not be an "and").
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
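The asb|bsa shape can be sketched directly with strings. The function and-sketch below is hypothetical and purely illustrative: it takes regex source Strings rather than regexes, does no flag handling or de-duplication, and is not how this library builds its result.

```clojure
(defn and-sketch
  "Hypothetical illustration of the asb|bsa shape. a, b and s are regex
  source Strings; no flag handling or de-duplication is attempted."
  [a b s]
  (re-pattern (str a s b "|" b s a)))

(def foo-and-bar (and-sketch "foo" "bar" "\\s+"))  ; foo\s+bar|bar\s+foo

(re-matches foo-and-bar "foo bar")    ; "foo bar"
(re-matches foo-and-bar "bar   foo")  ; "bar   foo"
(re-matches foo-and-bar "foo foo")    ; nil: both a and b are required
(re-matches foo-and-bar "foo")        ; nil
```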
and-cg
(and-cg a b)
(and-cg a b s)
and-fgrp
(and-fgrp flgs a b)
(and-fgrp flgs a b s)
and-grp
(and-grp a b)
(and-grp a b s)
and-ncg
(and-ncg nm a b)
(and-ncg nm a b s)
cg
(cg & res)
As for grp, but emits a capturing group:
(res)
Returns an empty capturing group (#"()") if no res are provided, or they're all empty?'. It does this to ensure that capturing groups are preserved during composition, even if they're empty (since not doing so will break code that uses indexes to access matched group content).
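The point about group indexes can be seen with plain regex literals. In this small sketch (no wreck calls), keeping the empty capturing group leaves the last group at index 3, whereas dropping it would shift that group to index 2 and break any caller that reads groups by position.

```clojure
;; With the empty group preserved, "b" stays at index 3.
(def with-empty (re-matches #"(a)()(b)" "ab"))
with-empty           ; ["ab" "a" "" "b"]
(nth with-empty 3)   ; "b"

;; If the empty group were dropped, "b" would move to index 2.
(def without-empty (re-matches #"(a)(b)" "ab"))
without-empty        ; ["ab" "a" "b"]
(nth without-empty 2) ; "b"
```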
chcl
(chcl & res)
As for join, but encloses the joined res into a character class:
[res]
Returns an empty regex (#"") if no res are provided, or they're all empty?'.
Notes:
- ⚠️ On ClojureScript nested character classes don't work as one might expect, even though they will compile just fine. For example, this code matches as expected on ClojureJVM, but does not on ClojureScript (despite the regex compiling): (re-matches #"[[a-m][o-z]]+" "az").
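For reference, this is how the nested character class from the note behaves on the JVM, alongside a flat class describing the same set of characters. The sketch is JVM-only; the flat form is offered as a suggestion for code that must also run on ClojureScript, not as something this library does for you.

```clojure
;; JVM: nested classes form a union, so this behaves like [a-mo-z]+
(re-matches #"[[a-m][o-z]]+" "az")  ; "az"
(re-matches #"[[a-m][o-z]]+" "an")  ; nil: "n" is in neither range

;; A flat class expressing the same set, avoiding nesting altogether.
(re-matches #"[a-mo-z]+" "az")      ; "az"
(re-matches #"[a-mo-z]+" "an")      ; nil
```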
embed-flags
(embed-flags re)
Embeds any programmatic or ungrouped flags found in re. It does this by removing all flags from re then wrapping it in a flag group containing those flags that are embeddable (non-embeddable flags are silently dropped - use has-non-embeddable-flags? if you need to check for this). Returns re if re contains no flags.
For example on the JVM, both (Pattern/compile "[abc]+" Pattern/CASE_INSENSITIVE) and #"(?i)[abc]+" would become #"(?i:[abc]+)".
Similarly, on ClojureScript (doto (js/RegExp.) (.compile "[abc]+" "i")) would become #"(?i:[abc]+)".
Note:
- fgrp is almost always a better choice than this function! embed-flags is primarily intended for internal use by wreck, but may be useful in those rare cases where Clojure(Script) code receives a 3rd party regex, wishes to use it as part of composing a larger regex, doesn't know if it contains flags or not, and doesn't care that non-embeddable flags will be silently dropped.
- ⚠️ On the JVM, ungrouped embedded flags in the middle of re will be moved to the beginning of the regex. This may alter the semantics of the regex - for example a(?i)b will become (?i:ab), which means that a will be matched case-insensitively by the result, which is not the same as the original (which matches lower-case a only). This is an unavoidable consequence of how the JVM regex engine reports flags. If you really need to use embedded flag(s) midway through a regex, use fgrp to ensure proper scoping of the flag(s).
- ⚠️ On the JVM, the programmatic flags LITERAL and CANON_EQ have no embeddable equivalent, and will be silently dropped by this function.
- ⚠️ On JavaScript, only the flags i, m, and s can be embedded. All other flags will be silently dropped by this function.
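The following JVM-only sketch, written with plain clojure.core and java.util.regex.Pattern rather than wreck, shows the two problems this function addresses: programmatic flags vanish when a regex is turned back into a String for composition, and moving a mid-pattern flag to the front changes what the regex matches.

```clojure
(import 'java.util.regex.Pattern)

;; A regex whose case-insensitivity is set programmatically.
(def programmatic (Pattern/compile "[abc]+" Pattern/CASE_INSENSITIVE))

(re-matches programmatic "ABC")   ; "ABC"
(str programmatic)                ; "[abc]+" (the flag is not in the text)

;; Naive recomposition via str silently loses the flag...
(re-matches (re-pattern (str programmatic "x")) "ABx")       ; nil
;; ...whereas the embedded, scoped form carries it along.
(re-matches (re-pattern (str "(?i:[abc]+)" "x")) "ABx")      ; "ABx"

;; A mid-pattern flag only affects what follows it.
(re-matches #"a(?i)b" "Ab")   ; nil: "a" is still case-sensitive
(re-matches #"a(?i)b" "aB")   ; "aB"
;; After hoisting to the front, "a" becomes case-insensitive too.
(re-matches #"(?i:ab)" "Ab")  ; "Ab"
```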
esc
(esc s)
Escapes s (a String) for use in a regex, returning a String. Returns nil if s is nil.
Notes:
- Unlike most other fns in this namespace, this one does not support a regex as an input, nor return a regex as an output.
exn
(exn n re)
Returns a regex where re will match exactly n times:
re{n}
Returns an empty regex (#"") if re is empty?'.
fgrp
(fgrp flgs & res)
As for grp, but emits an embedded flag group with flgs (a String):
(?flgs:res)
Devolves to grp if flgs is blank. Throws if flgs contains an invalid flag character, including those that (ClojureScript only) cannot be embedded.
Notes:
- If you must use regex flags, it is STRONGLY RECOMMENDED that you use this function! Programmatically set flags and ungrouped embedded flags (e.g. (?i)) have no explicit scope and so cannot be reliably used to compose larger regexes. wreck makes a best effort to always convert such "unscoped" flags into their embedded (scoped) equivalents (using embed-flags) when composing larger regexes, but using fgrp avoids potential footguns.
- Removes any ungrouped embedded flags in re (e.g. (?i)ab), but does not add them to flgs if they aren't already there.
- ⚠️ On the JVM, ungrouped embedded flags in the middle of re (e.g. a(?i)b) will also be removed, which may alter the semantics of the regex.
- ⚠️ On JavaScript, only the flags i, m and s can be embedded (this is a limitation of the JavaScript regex engine). Other flags will result in a js/SyntaxError being thrown.
- For the JVM, see the "special constructs" section of the java.util.regex.Pattern JavaDoc for the set of valid flag characters.
- For JavaScript, see the RegExp flags reference for the set of valid flag characters (while keeping in mind most of them can't be embedded).
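The scoping argument can be demonstrated with plain regex strings on the JVM. In this sketch (no wreck calls), a flag group of the (?flgs:res) shape keeps case-insensitivity confined to its contents when a suffix is appended, while an ungrouped flag leaks onto everything composed after it.

```clojure
;; Scoped flag group: only "ab" is case-insensitive.
(def scoped (re-pattern (str "(?i:ab)" "c")))     ; (?i:ab)c

;; Ungrouped flag: applies to everything that follows, including the suffix.
(def unscoped (re-pattern (str "(?i)ab" "c")))    ; (?i)abc

(re-matches scoped "ABc")    ; "ABc"
(re-matches scoped "ABC")    ; nil: the suffix "c" is still case-sensitive
(re-matches unscoped "ABC")  ; "ABC": the flag leaked onto the suffix
```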
grp
(grp & res)
has-non-embeddable-flags?
(has-non-embeddable-flags? re)
Does re have non-embeddable flags?
Notes:
- On the JVM, the only non-embeddable flags are the programmatic flags LITERAL and CANON_EQ.
- On JavaScript, this is every flag except i, m, and s.
join
(join & res)
Returns a regex that is all of the res joined together. Each element in res can be a regex, a String or something that can be turned into a String (including numbers, etc.). Returns an empty regex (#"") if no res are provided, or they're all empty?'.
Notes:
- ⚠️ In ClojureScript be cautious about using numbers in these calls, since JavaScript's number handling is a 🤡show. See the unit tests for examples.
n2m
(n2m n m re)
Returns a regex where re will match from n to m times:
re{n,m}
Returns an empty regex (#"") if re is empty?'.
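As a reminder of how the {n,m} quantifier binds, here is a plain-regex sketch on the JVM. This page does not say whether re is grouped before the quantifier is applied, so the sketch groups explicitly and also shows the ungrouped form for contrast; neither line is a claim about this library's output.

```clojure
;; Quantifier applied to an explicitly grouped multi-character regex.
(def two-to-three (re-pattern (str "(?:ab)" "{2,3}")))  ; (?:ab){2,3}

(re-matches two-to-three "ab")        ; nil: only one repetition
(re-matches two-to-three "abab")      ; "abab"
(re-matches two-to-three "ababab")    ; "ababab"
(re-matches two-to-three "abababab")  ; nil: four repetitions

;; Without a group, the quantifier binds to the final character only.
(re-matches #"ab{2,3}" "abb")         ; "abb"
(re-matches #"ab{2,3}" "abab")        ; nil
```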
ncg
(ncg nm & res)
nom
(nom n re)
Returns a regex where re will match n or more times:
re{n,}
Returns an empty regex (#"") if re is empty?'.
oom
(oom re)
Returns a regex where re will match one or more times:
re+
Returns an empty regex (#"") if re is empty?'.
opt
(opt re)
or'
(or' a b)
(or' a b s)
Returns an "inclusive or" regex that will match a or b, or both, in any order, and with the separator regex s (if provided) between them:
asb|bsa|a|b
Returns an empty regex (#"") if re is empty?'.
Notes:
- a and b must be distinct (must not match the same text) or else the resulting regex will be logically inconsistent (will not be an "or").
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
or-cg
(or-cg a b)
(or-cg a b s)
or-fgrp
(or-fgrp flgs a b)
(or-fgrp flgs a b s)
or-grp
(or-grp a b)
(or-grp a b s)
or-ncg
(or-ncg nm a b)
(or-ncg nm a b s)
regex?
(regex? x)
Is x a regex?
Notes:
- ClojureScript already has a regexp? predicate in cljs.core, but ClojureJVM doesn't. See this ask.clojure.org post.
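On the JVM, a predicate of this kind can be written as a one-line type check. The function regex-sketch? below is a hypothetical JVM-only stand-in shown for illustration; it is not this library's cross-platform implementation.

```clojure
(defn regex-sketch?
  "Hypothetical JVM-only stand-in (not wreck's implementation): true if x
  is a compiled java.util.regex.Pattern."
  [x]
  (instance? java.util.regex.Pattern x))

(regex-sketch? #"abc")  ; true
(regex-sketch? "abc")   ; false: a String is not a regex
(regex-sketch? nil)     ; false
```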
str'
(str' x)
Returns the String representation of x, with special handling for RegExp objects on ClojureScript in an attempt to correct JavaScript's APPALLING default stringification.
Notes:
- Embeds flags (as per embed-flags).
xor'
(xor' a b)
Returns an "exclusive or" regex that will match a or b, but not both:
a|b
This is identical to alt called with 2 arguments, but is provided as a convenience for those who might be building up large logic based regexes and would prefer to use more easily understood logical operator names throughout.
Notes:
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should almost always be preferred.
xor-cg
(xor-cg a b)
xor-fgrp
(xor-fgrp flgs a b)
xor-grp
(xor-grp a b)
xor-ncg
(xor-ncg nm a b)
zom
(zom re)
Returns a regex where re will match zero or more times:
re*
Returns an empty regex (#"") if re is empty?'.