wreck.api
The public API of wreck.
Notes:
- Apart from passing through nil, this library does minimal argument checking, since the rules for regexes vary from platform to platform, and it is a first-class requirement that callers be allowed to construct platform-specific regexes if they wish.
- As a result, all functions have the potential to throw platform-specific exceptions if the resulting regex is syntactically invalid.
  - On the JVM, these will typically be instances of the java.util.regex.PatternSyntaxException class.
  - On JavaScript, these will typically be a js/SyntaxError.
  - Platform-specific behaviour is particularly notable for short / empty regexes, such as #"{}" (an error on the JVM, fine but nonsensical on JS) and #"{1}" (ironically, fine but nonsensical on the JVM, but an error on JS). 🤥
- Furthermore, JavaScript fundamentally doesn't support lossless round-tripping of RegExp objects via toString and back, which this library relies upon and performs extensively. The library makes a best effort to correct JavaScript's problematic implementation, but because it's fundamentally lossy there are some cases where (on ClojureScript only) your regexes may be changed in unexpected (though probably not semantically significant) ways.
- Regex flags are supported to the best of the library's ability, but please carefully review the usage notes in README.md for various caveats when flags are used.
='
(=' _)
(=' re1 re2)
(=' re1 re2 & more)
Equality for regexes, defined by having equal string representations and flags (including flags that cannot be embedded).
Notes:
- Functionally equivalent regexes (e.g. #"..." and #".{3}") are not considered ='.
- Some regexes may not be =' initially due to differing flag sets, but may become =' after being run through embed-flags, due to non-embeddable flags being silently dropped (see embed-flags for details).
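On the JVM, java.util.regex.Pattern does not override equals, so even two identical regex literals are not = in plain Clojure, which is why a dedicated equality function is useful. The sketch below is a hypothetical, JVM-only regex-eq helper (not wreck's implementation) that approximates the documented semantics: compare the pattern strings and the full flag bitmask, which includes non-embeddable flags such as LITERAL.

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical, JVM-only helper approximating =' (not wreck's code):
;; regexes are equal if their pattern strings and full flag bitmasks match.
(defn regex-eq
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

;; Plain = falls back to object identity for Patterns.
(= #"a+" #"a+")                                            ; false
(regex-eq #"a+" #"a+")                                     ; true
;; Functionally equivalent, but different string representations.
(regex-eq #"..." #".{3}")                                  ; false
;; Same string, but one has the (non-embeddable) LITERAL flag.
(regex-eq #"a.b" (Pattern/compile "a.b" Pattern/LITERAL))  ; false
```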
alt
(alt & res)
Returns a regex that will match any one of res, via alternation.
Notes:
- Duplicate elements in res will only appear once in the result.
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr: one of the grouping variants should almost always be preferred.
and'
(and' a b)
(and' a b s)
Returns an "and" regex that will match a and b in any order, with the separator regex s (if provided) between them. This is implemented as ASB|BSA, which means that A and B must be distinct (must not match the same text).
Notes:
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr: one of the grouping variants should almost always be preferred.
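The ASB|BSA construction described above can be sketched in a few lines of plain Clojure. and-sketch is hypothetical (not wreck's implementation); a, b and s may be regexes or Strings, and identical alternatives are collapsed with distinct, mirroring the de-duplication note.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch of the documented ASB|BSA construction.
(defn and-sketch
  ([a b] (and-sketch a b ""))
  ([a b s]
   (let [asb (str a s b)
         bsa (str b s a)]
     ;; distinct collapses ASB and BSA when they are identical
     (re-pattern (str/join "|" (distinct [asb bsa]))))))

(def re-ab (and-sketch #"a" #"b" #",\s*"))
(str re-ab)                 ; regex source: a,\s*b|b,\s*a
(re-matches re-ab "a, b")   ; "a, b"
(re-matches re-ab "b,a")    ; "b,a"
(re-matches re-ab "a")      ; nil
```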
and-cg
(and-cg a b)
(and-cg a b s)
and-grp
(and-grp a b)
(and-grp a b s)
and-ncg
(and-ncg nm a b)
(and-ncg nm a b s)
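The and-cg, and-grp and and-ncg variants are listed without descriptions. Judging by their names and by the descriptions of flags-grp and ncg, they appear to wrap the result in a capturing group, a non-capturing group, and a named capturing group (named nm) respectively; treat that as an assumption. The hypothetical helpers below (and-body, wrap-grp, wrap-cg, wrap-ncg are illustrative names, not wreck functions) sketch those three group forms in java.util.regex syntax and show why wrapping matters once an alternation is combined with other text.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers; only the group syntax is standard java.util.regex.
(defn and-body [a b s] (str/join "|" (distinct [(str a s b) (str b s a)])))
(defn wrap-grp [body]    (str "(?:" body ")"))         ; non-capturing group
(defn wrap-cg  [body]    (str "(" body ")"))           ; capturing group
(defn wrap-ncg [nm body] (str "(?<" nm ">" body ")"))  ; named capturing group

;; Ungrouped: "x" attaches only to the first alternative ("xab|ba").
(re-matches (re-pattern (str "x" (and-body "a" "b" ""))) "xba")            ; nil
;; Grouped: "x" attaches to the whole "and".
(re-matches (re-pattern (str "x" (wrap-grp (and-body "a" "b" "")))) "xba") ; "xba"
;; A capturing group also exposes the matched text.
(re-matches (re-pattern (str "x" (wrap-cg (and-body "a" "b" "")))) "xba")  ; ["xba" "ba"]
;; A named capturing group can be looked up by name.
(let [m (re-matcher (re-pattern (str "x" (wrap-ncg "pair" (and-body "a" "b" "")))) "xba")]
  (when (.matches m) (.group m "pair")))                                    ; "ba"
```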
embed-flags
(embed-flags re)
Embeds any flags found in re into a non-capturing flag group wrapping re (to ensure scoping), returning a new regex. Returns re if re contains no flags or is nil.
For example, on the JVM #"(?i)[abc]+" would become #"(?i:[abc]+)".
Similarly, on ClojureScript (doto (js/RegExp.) (.compile "[abc]+" "i")) would also become #"(?i:[abc]+)".
Notes:
- flags-grp is almost always a better choice than this function! embed-flags is primarily intended for internal use by wreck, but may be useful in those rare cases where Clojure(Script) code receives a 3rd party regex, wishes to use it as part of composing a larger regex, doesn't know whether it contains flags, and doesn't care that non-embeddable flags will be silently dropped.
- ⚠️ On the JVM, ungrouped embedded flags in the middle of re will be moved to the beginning of the regex. This may alter the semantics of the regex: for example a(?i)b will become (?i:ab), which means that a will be matched case-insensitively by the result, which is not the same as the original (which matches lower-case a only). This is an unavoidable consequence of how the JVM regex engine reports flags. If you really need to use embedded flag(s) midway through a regex, use flags-grp to ensure proper scoping of the flag(s).
- ⚠️ On the JVM, the programmatic flags LITERAL and CANON_EQ have no embeddable equivalent, and will be silently dropped by this function. Use has-non-embeddable-flags? if you need to check for the presence of these flags (e.g. in a 3rd party regex).
- ⚠️ On JavaScript, only the flags i, m, and s can be embedded. All other flags will be silently dropped by this function. Use has-non-embeddable-flags? if you need to check for the presence of these flags (e.g. in a 3rd party regex).
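The JVM side of this can be sketched by reading a Pattern's flag bitmask and rebuilding it as a (?flags:...) group. embed-flags-sketch and flag-chars are hypothetical, JVM-only names; unlike the documented function, this sketch handles only programmatically set flags and does not strip ungrouped embedded flags such as a leading (?i).

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical, JVM-only sketch (not wreck's code): programmatic flag bits
;; and their embedded flag characters. LITERAL and CANON_EQ have no embedded
;; form, so they are not listed and are therefore silently dropped.
(def flag-chars
  [[Pattern/UNIX_LINES              "d"]
   [Pattern/CASE_INSENSITIVE        "i"]
   [Pattern/COMMENTS                "x"]
   [Pattern/MULTILINE               "m"]
   [Pattern/DOTALL                  "s"]
   [Pattern/UNICODE_CASE            "u"]
   [Pattern/UNICODE_CHARACTER_CLASS "U"]])

(defn embed-flags-sketch
  "Wraps re in a (?flags:...) group built from its programmatic flags.
  Returns re unchanged if it is nil or has no embeddable flags."
  [^Pattern re]
  (when re
    (let [f     (.flags re)
          chars (apply str (for [[bit c] flag-chars
                                 :when (not (zero? (bit-and f bit)))]
                             c))]
      (if (empty? chars)
        re
        (Pattern/compile (str "(?" chars ":" (.pattern re) ")"))))))

(str (embed-flags-sketch (Pattern/compile "[abc]+" Pattern/CASE_INSENSITIVE)))
;; "(?i:[abc]+)"

;; LITERAL is dropped, so "." now matches any character.
(def lit (embed-flags-sketch
          (Pattern/compile "a.b" (bit-or Pattern/CASE_INSENSITIVE Pattern/LITERAL))))
(str lit)                ; "(?i:a.b)"
(re-matches lit "AxB")   ; "AxB"
```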
esc
(esc s)
Escapes s (a String) for use in a regex, returning a String.
Notes:
- Unlike most other fns in this namespace, this one does not support a regex as an input, nor return a regex as an output.
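A minimal JVM-only sketch of character-by-character escaping follows; esc-sketch is hypothetical and may differ from wreck's actual escaping rules. It backslash-escapes every character that is special outside a character class in java.util.regex syntax. The JDK's built-in Pattern/quote is an alternative that wraps the whole string in \Q...\E instead.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, JVM-only sketch: prefix each regex metacharacter with a
;; backslash. str/replace quotes the fn's return value, so "\\" is literal.
(defn esc-sketch
  [^String s]
  (when s
    (str/replace s #"[\\^$.|?*+()\[\]{}]" #(str "\\" %))))

(esc-sketch "1+1=2?")                                      ; the String 1\+1=2\?
(re-matches (re-pattern (esc-sketch "1+1=2?")) "1+1=2?")   ; "1+1=2?"
```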
flags-grp
(flags-grp flgs & res)
As for grp, but prefixes the group with flgs (a String). Returns nil if flgs is nil or empty. Throws if flgs contains an invalid flag character, including those that (ClojureScript only) cannot be embedded.
Notes:
- If you must use regex flags, it is STRONGLY RECOMMENDED that you use this function! Programmatically set flags and ungrouped embedded flags (e.g. (?i)) have no explicit scope and so cannot be reliably used to compose larger regexes. wreck makes a best effort to always convert such "unscoped" flags into their embedded equivalents when composing larger regexes (via embed-flags), but using flag groups explicitly in the first place is easier to reason about and avoids potential footguns.
- Removes any ungrouped embedded flags in the elements of res (e.g. (?i)ab), but unlike embed-flags does not check that they appear in flgs.
- ⚠️ On the JVM, ungrouped embedded flags in the middle of a regex will also be removed, which may alter the semantics of the regex.
- ⚠️ On JavaScript, only the flags i, m, and s can be embedded (this is a limitation of the JavaScript regex engine). Other flags will result in a js/SyntaxError being thrown.
- For the JVM, see the "special constructs" section of the java.util.regex.Pattern JavaDoc for the set of valid flag characters.
- For JavaScript, see the RegExp flags reference for the set of valid flag characters.
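The flag-group form itself, and why it scopes flags reliably, can be sketched on the JVM as follows. flags-grp-sketch is a hypothetical helper (not wreck's implementation); it does not remove ungrouped embedded flags from its inputs, and an invalid flag character simply makes re-pattern throw.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, JVM-only sketch of a flag group (?flgs:...).
;; Returns nil for nil/blank flgs; an invalid flag character makes
;; re-pattern throw java.util.regex.PatternSyntaxException.
(defn flags-grp-sketch
  [flgs & res]
  (when-not (str/blank? flgs)
    (re-pattern (str "(?" flgs ":" (apply str res) ")"))))

(str (flags-grp-sketch "i" #"a" "b"))          ; "(?i:ab)"
(re-matches (flags-grp-sketch "i" "ab") "AB")  ; "AB"
(flags-grp-sketch nil "ab")                    ; nil

;; Explicit scoping: only "b" is case-insensitive in a(?i:b).
(def scoped (re-pattern (str "a" (flags-grp-sketch "i" "b"))))
(re-matches scoped "aB")                       ; "aB"
(re-matches scoped "AB")                       ; nil
```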
has-non-embeddable-flags?
(has-non-embeddable-flags? re)
Does re have non-embeddable flags?
Notes:
- On the JVM, the only non-embeddable flags are the programmatic flags LITERAL and CANON_EQ.
- On JavaScript, this is every flag except i, m, and s.
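On the JVM, the check described above reduces to a bitmask test against Pattern.flags(). has-non-embeddable-flags-sketch? is a hypothetical, JVM-only helper that relies on the notes above (LITERAL and CANON_EQ being the only non-embeddable flags).

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical, JVM-only sketch: LITERAL and CANON_EQ are the only
;; programmatic flags without an embedded equivalent.
(def non-embeddable-mask (bit-or Pattern/LITERAL Pattern/CANON_EQ))

(defn has-non-embeddable-flags-sketch?
  [^Pattern re]
  (boolean (and re (not (zero? (bit-and (.flags re) non-embeddable-mask))))))

(has-non-embeddable-flags-sketch? (Pattern/compile "a.b" Pattern/LITERAL))          ; true
(has-non-embeddable-flags-sketch? (Pattern/compile "a.b" Pattern/CASE_INSENSITIVE)) ; false
(has-non-embeddable-flags-sketch? nil)                                              ; false
```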
join
(join & res)
Returns a regex that is all of the res joined together. Each element in res can be a regex, a String, or something that can be turned into a String (including numbers, etc.). Returns nil when no res are provided, or they're all nil.
Notes:
- ⚠️ In ClojureScript be cautious about using numbers in these calls, since JavaScript's number handling is a 🤡 show. See the unit tests for examples.
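A minimal sketch of the joining behaviour, including the nil handling described above, is shown below. join-sketch is hypothetical; unlike the documented function it ignores flags entirely, since plain str drops them on the JVM.

```clojure
;; Hypothetical sketch (not wreck's code): drop nils, stringify the rest
;; (regexes via their pattern strings, numbers via str) and concatenate.
(defn join-sketch
  [& res]
  (when-let [xs (seq (remove nil? res))]
    (re-pattern (apply str xs))))

(str (join-sketch #"\d+" "-" 42))              ; regex source: \d+-42
(re-matches (join-sketch #"\d+" "-" 42) "7-42") ; "7-42"
(join-sketch)                                   ; nil
(join-sketch nil nil)                           ; nil
```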
ncg
(ncg nm & res)
As for grp, but uses a named capturing group named nm. Returns nil if nm is nil or blank. Throws if nm is an invalid name for a named capturing group (alphanumeric only, must start with an alphabetical character, must be unique within the regex).
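On the JVM, a named capturing group is written (?<nm>...), and its match can be retrieved with Matcher.group(String). ncg-sketch is a hypothetical helper illustrating this, including the nil-for-blank behaviour; invalid names are left for the JVM regex engine to reject.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, JVM-only sketch of a named capturing group (?<nm>...).
(defn ncg-sketch
  [nm & res]
  (when-not (str/blank? nm)
    (re-pattern (str "(?<" nm ">" (apply str res) ")"))))

(def year-re (ncg-sketch "year" #"\d{4}"))
(let [m (re-matcher year-re "2024")]
  (when (.matches m)
    (.group m "year")))       ; "2024"
(ncg-sketch "" #"\d{4}")      ; nil
;; (ncg-sketch "1st" "x") throws PatternSyntaxException: names must start with a letter
```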
or'
(or' a b)
(or' a b s)
Returns an "inclusive or" regex that will match a or b, or both, in any order, with the separator regex s (if provided) between them. This is implemented as ASB|BSA|A|B, which means that A and B must be distinct (must not match the same text).
Notes:
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr: one of the grouping variants should almost always be preferred.
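Extending the "and" construction with the two single-element alternatives gives the documented ASB|BSA|A|B form. or-sketch is hypothetical (not wreck's implementation) and, like the and-sketch idea, collapses identical alternatives with distinct.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch of the documented ASB|BSA|A|B construction.
(defn or-sketch
  ([a b] (or-sketch a b ""))
  ([a b s]
   (let [a (str a) b (str b) s (str s)]
     (re-pattern (str/join "|" (distinct [(str a s b) (str b s a) a b]))))))

(def re-or (or-sketch "a" "b" ","))
(str re-or)                                               ; "a,b|b,a|a|b"
(map #(re-matches re-or %) ["a" "b" "a,b" "b,a" "a,a"])   ; ("a" "b" "a,b" "b,a" nil)
```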
or-cg
(or-cg a b)
(or-cg a b s)
or-grp
(or-grp a b)
(or-grp a b s)
or-ncg
(or-ncg nm a b)
(or-ncg nm a b s)
str'
(str' o)
Returns the String representation of o, with special handling for RegExp objects on ClojureScript in an attempt to correct JavaScript's APPALLING default stringification.
Notes:
- Embeds flags (as per embed-flags).
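Embedding flags matters even on the JVM, because Pattern.toString returns only the pattern source, so a plain str round trip silently loses programmatic flags. The snippet below demonstrates that loss with standard JDK calls only; it does not reproduce str''s ClojureScript-specific handling.

```clojure
(import 'java.util.regex.Pattern)

(def ci (Pattern/compile "abc" Pattern/CASE_INSENSITIVE))

(str ci)                                   ; "abc": the flag is not in the string
(re-matches ci "ABC")                      ; "ABC"
(re-matches (re-pattern (str ci)) "ABC")   ; nil: round-tripping via str lost the flag
;; Embedding the flag in a group preserves it through the round trip.
(re-matches (re-pattern (str "(?i:" (.pattern ci) ")")) "ABC")  ; "ABC"
```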
xor'
(xor' a b)
Returns an "exclusive or" regex that will match a or b, but not both. This is identical to alt called with 2 arguments, and is provided as a convenience for those who might be building up large logic-based regexes and would prefer to use more easily understood logical operator names throughout.
Notes:
- May optimise the expression (via de-duplication in alt).
- Does not wrap the result in a group, which, because alternation has the lowest precedence in regexes, runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr: one of the grouping variants should almost always be preferred.
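Since xor' is described as two-argument alternation, a sketch is just a|b with de-duplication. xor-sketch is hypothetical. Note that "not both" holds when the whole input is matched (re-matches); a substring search such as re-find will still find a or b inside a string containing both.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch: two-argument alternation with de-duplication.
(defn xor-sketch
  [a b]
  (re-pattern (str/join "|" (distinct (map str [a b])))))

(def re-xor (xor-sketch "cat" "dog"))
(map #(re-matches re-xor %) ["cat" "dog" "catdog"])   ; ("cat" "dog" nil)
(re-find re-xor "catdog")                             ; "cat" (substring search)
```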
xor-cg
(xor-cg a b)
xor-grp
(xor-grp a b)
xor-ncg
(xor-ncg nm a b)