As I was trying to improve the lexer for
postcss-calc, I learnt about two regular expression features: the word boundary anchor character and lookahead.
Word boundary anchor character:
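In a regular expression,
\b specifies that the expression has to match at a word boundary: the position between a word character (a letter, digit or underscore) and anything else, including the start or end of the input.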
A test for digits followed by a word boundary, such as \d+\b, is true for 123 deg, because the space is a word separator.
It is false for 123deg, because there
123 is not followed by a word separator.
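As a minimal sketch of the two cases above in JavaScript (the pattern \d+\b and both input strings are illustrative assumptions, not taken from postcss-calc):

```javascript
// Illustrative pattern: one or more digits that must end at a word boundary.
const digitsAtBoundary = /\d+\b/;

// The space after "123" is not a word character, so \b matches there.
const withSpace = digitsAtBoundary.test("123 deg");

// "3" and "d" are both word characters, so there is no boundary after "123".
const withoutSpace = digitsAtBoundary.test("123deg");

console.log(withSpace, withoutSpace); // true false
```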
\b in combination with digits and units can be treacherous, because the
. decimal point character is not a word character, so \b also matches right before it. Say you are expecting only whole numbers, but the input also contains decimal numbers with units. A pattern that ends in \b then matches only the 123 in
123.45deg, leaving the
.45deg string behind, which can give the illusion that the input matches expectations.
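The pitfall can be reproduced with a short JavaScript sketch. The whole-number pattern here is an assumed stand-in for whatever a lexer might use; only the input 123.45deg comes from the text above:

```javascript
// Assumed whole-number pattern: digits that end at a word boundary.
const wholeNumber = /\d+\b/;
const input = "123.45deg";

const match = wholeNumber.exec(input);

// \b matches between "3" and ".", so only the integer part is consumed.
const matched = match[0];

// Everything after the match is silently left behind.
const leftover = input.slice(match.index + matched.length);

console.log(matched);  // "123"
console.log(leftover); // ".45deg"
```

A lexer that only checks whether the pattern matched would accept this input as a whole number, even though the rest of the string was never consumed.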
Lookahead assertions in regular expressions
While I was looking for how to exclude the
. decimal point character from word boundaries, I came across lookahead assertions. A lookahead assertion makes a match depend on the text that follows it, without including that text in the match.
The syntax for lookahead assertion can be confusing, as it looks like the syntax for non-capturing groups.
For example, appending the negative lookahead assertion
(?!\.) to a pattern makes it match only if it is not followed by a decimal point. So a pattern such as \d+\b(?!\.)
does not match any part of
123.45deg.
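A hedged JavaScript sketch of the negative lookahead follows. The pattern \d+\b(?!\.) and the inputs are assumptions chosen for illustration:

```javascript
// Assumed pattern: digits at a word boundary that is NOT followed by ".".
const wholeNumberOnly = /\d+\b(?!\.)/;

// Every candidate match is either followed by "." or not at a word
// boundary, so nothing in this input matches.
const decimalWithUnit = wholeNumberOnly.test("123.45deg");

// A whole number followed by a space still matches as before.
const wholeWithSpace = wholeNumberOnly.exec("123 deg")[0];

// Caveat: the lookahead only guards the end of the number. With a space
// before the unit, the digits after the decimal point match on their own.
const fractionOnly = wholeNumberOnly.exec("123.45 deg")[0];

console.log(decimalWithUnit); // false
console.log(wholeWithSpace);  // "123"
console.log(fractionOnly);    // "45"
```

The last case shows that this sketch is not a complete guard against decimal input: it rejects a decimal point after the digits, but not one before them.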
In a similar fashion, the positive lookahead assertion
(?=\.) requires a decimal point after the pattern. In Java and in the 2018 edition of the ECMAScript standard, there are also lookbehind assertions:
(?<=\.) requires a decimal point before the pattern.
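To illustrate, here is a small JavaScript sketch of both assertions next to a non-capturing group, whose syntax looks similar. The patterns and the input are assumptions for illustration, and the lookbehind needs an engine that implements ECMAScript 2018:

```javascript
const input = "123.45deg";

// Positive lookahead: digits that are followed by ".".
// The "." is checked but not included in the match.
const integerPart = /\d+(?=\.)/.exec(input)[0];

// Lookbehind: digits that are preceded by ".".
const fractionPart = /(?<=\.)\d+/.exec(input)[0];

// Non-capturing group: looks similar, but the "." is consumed
// and becomes part of the match.
const withGroup = /\d+(?:\.)/.exec(input)[0];

console.log(integerPart);  // "123"
console.log(fractionPart); // "45"
console.log(withGroup);    // "123."
```

The difference between the first and last results is what makes the similar syntax easy to misread: assertions only test the surrounding text, while a group matches it.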