Word Boundaries and Lookahead Assertions

As I was trying to improve the lexer for postcss-calc, I learnt about two regular expression features: the word boundary anchor character and lookahead.

Word boundary anchor character: \b

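In a regular expression, \b is a zero-width anchor: it requires the match to sit at a word boundary, that is, between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string. For example, in JavaScript,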

/123\b/.test('123 456')

returns true because the space is a word separator. By contrast,

/123\b/.test('123456')

returns false, because in 123456 the digits 123 are followed by another digit rather than a word separator.

\b in combination with digits and units can be treacherous, because the . decimal point character counts as a word separator. Say you are expecting only whole numbers, but the input also contains decimal numbers with units. /[0-9]+\b/ matches 123 in 123.45deg, leaving the .45deg string behind, which can give the illusion that the input matches expectations.
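To make the pitfall concrete, here is a minimal sketch that runs the integer pattern against a decimal input. The variable names are mine, chosen only for illustration; this is not the postcss-calc lexer.

```javascript
// Hypothetical input: a decimal number with a unit.
const input = '123.45deg';

// \b sits between "3" (a word character) and "." (a non-word character),
// so the integer pattern happily stops right before the decimal point.
const match = /[0-9]+\b/.exec(input);

// The part the pattern consumed, and what is left behind.
const matched = match[0];
const rest = input.slice(match.index + matched.length);

console.log(matched); // 123
console.log(rest);    // .45deg
```

A lexer that only checks whether the pattern matched would accept 123 as a whole number here and then trip over the leftover .45deg.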

Lookahead assertions in regular expressions

While I was looking for how to exclude the decimal point from word boundaries, I came across lookahead assertions. A lookahead assertion lets a pattern match only if the text that follows it does (or does not) match another pattern, without making that text part of the match. The syntax can be confusing, because (?=...) and (?!...) look a lot like the syntax for non-capturing groups, (?:...).
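The difference between the two look-alike syntaxes is easiest to see side by side. This is a small sketch with a hypothetical input, 45deg: a non-capturing group consumes the text it matches, while a lookahead only checks it.

```javascript
// Hypothetical input: a whole number with a unit.
const input = '45deg';

// Non-capturing group (?:...): "deg" is consumed and is part of the match.
const grouped = /[0-9]+(?:deg)/.exec(input)[0];

// Positive lookahead (?=...): "deg" must follow, but it is not consumed.
const lookedAhead = /[0-9]+(?=deg)/.exec(input)[0];

console.log(grouped);     // 45deg
console.log(lookedAhead); // 45
```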

For example, appending the negative lookahead assertion (?!\.) to a pattern makes it match only if it is not followed by a decimal point. So

/[0-9]+\b(?!\.)/

does not match any part of 123.45deg.
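Here is a sketch of the negative lookahead in action, assuming the pattern in question is the earlier /[0-9]+\b/ with (?!\.) appended. The variable names are hypothetical. The last line shows why I would keep the \b: without it, the engine backtracks to a shorter run of digits that is not directly followed by a decimal point.

```javascript
// Assumed pattern: whole numbers that end at a word boundary
// and are not followed by a decimal point.
const integerOnly = /[0-9]+\b(?!\.)/;

// The decimal input is rejected entirely: "123" is followed by ".",
// and "45" runs straight into "deg" with no word boundary in between.
const rejectsDecimal = integerOnly.test('123.45deg');

// A plain whole number followed by a space still matches.
const acceptsInteger = integerOnly.test('123 456');

// Without \b, backtracking gives up the "3" and matches "12",
// because "12" is followed by "3" rather than ".".
const withoutBoundary = /[0-9]+(?!\.)/.exec('123.45deg')[0];

console.log(rejectsDecimal);  // false
console.log(acceptsInteger);  // true
console.log(withoutBoundary); // 12
```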

In a similar fashion, the positive lookahead assertion (?=\.) requires a decimal point after the pattern. In Java and in the 2018 edition of the ECMAScript standard, there are also lookbehind assertions: (?<=\.) requires a decimal point before the pattern.
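As a rough illustration, the two assertions can pick apart the digits on either side of the decimal point. This sketch assumes a JavaScript engine with ES2018 lookbehind support; the variable names are mine.

```javascript
// Hypothetical input: a decimal number with a unit.
const input = '123.45deg';

// Positive lookahead: digits that are immediately followed by a decimal point.
const integerPart = input.match(/[0-9]+(?=\.)/)[0];

// Positive lookbehind (ES2018): digits that are immediately preceded by one.
const fractionPart = input.match(/(?<=\.)[0-9]+/)[0];

console.log(integerPart);  // 123
console.log(fractionPart); // 45
```

In both cases the decimal point itself is only checked, not consumed, so it does not appear in either result.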