JavaScript Tip - RegEx Instances Hold State

[Image: an edited street sign saying “Do Not RegEx”]

You have a problem. The solution is RegEx. Now you have 2 problems.

Picture this somewhat familiar scenario – you run a new set of unit tests for a JavaScript function you have just written, and great, they all pass! It’s a pure function (or is it…) so you’re pretty confident your new function and tests will not cause any problems elsewhere in the codebase! You check it in and are immediately horrified as other tests start failing – or maybe even the one you just wrote!

Flaky Tests

Flaky tests, put simply, are tests that do not produce consistent results. They can pass or fail without any change to the code under test, sometimes unpredictably and sometimes only under certain conditions (such as test order or timing). This is common enough to have its own term, so you might not panic just yet at your newly failing build.

However, upon closer inspection of your code, you still do not see any obvious dependencies on state or system information like date/time – what could be causing it?

Regular Expressions in JavaScript

Then you spot it.

const globalRegex = new RegExp('foo*', 'g');

It’s that “g” flag on a RegExp object you thought could be innocently shared across your functions and tests. You get that familiar “Oh no” feeling as you rush to the docs, and lo and behold, your fears are realised:

From MDN

JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, test() can be used to iterate over multiple matches in a string of text (with capture groups).

Source: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test
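The sketch below shows the intended use that the MDN note describes. It runs one global instance over a single string and steps through every match while lastIndex advances. It uses the globalRegex pattern from above and an input string made up for illustration. It calls exec() rather than test(), because exec() relies on the same lastIndex mechanism and also reports where each match was found.

```javascript
// One global RegExp instance reused on the *same* string. This is the use case
// that the stateful lastIndex is designed for.
const globalRegex = new RegExp('foo*', 'g'); // "fo" followed by zero or more "o"
const text = "fo foo fooo";                   // illustrative input

const found = [];
let match;
// exec() starts searching at globalRegex.lastIndex. On success it moves
// lastIndex to just past the end of the match.
while ((match = globalRegex.exec(text)) !== null) {
  found.push(`${match[0]} at ${match.index}, lastIndex is now ${globalRegex.lastIndex}`);
}

console.log(found);
// When no further match exists, exec() returns null and resets lastIndex to 0.
console.log(globalRegex.lastIndex); // 0
```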

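The intent of this behaviour is to reuse the same regular expression instance on the same string so that you can step through multiple matches within it. However, nothing stops you from calling it repeatedly on arbitrary, different strings. Doing so does not reset lastIndex, so each new search silently starts from wherever the previous match ended. As a result, a call can miss a match that is clearly there, or find a later match than the one you expected. When a search fails, lastIndex is reset to 0, so the next call may behave differently again. That is exactly the kind of order-dependent behaviour that makes tests flaky.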
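The sketch below makes this concrete. It shares one instance of the globalRegex pattern from above through a helper called containsFoo. The helper name and the input strings are hypothetical and exist only for illustration. The comments trace lastIndex after each call.

```javascript
// A single global RegExp shared between callers, as in the scenario above.
const globalRegex = new RegExp('foo*', 'g');

// Hypothetical helper: "does this string contain a match?"
function containsFoo(str) {
  return globalRegex.test(str);
}

// 1. Same string, called repeatedly: the results alternate.
const sameString = [
  containsFoo("foo bar"), // true  (match at 0; lastIndex becomes 3)
  containsFoo("foo bar"), // false (search starts at 3 and fails; lastIndex reset to 0)
  containsFoo("foo bar"), // true  (starts from 0 again; lastIndex becomes 3)
];

// 2. Different strings: state from one call leaks into the next.
globalRegex.lastIndex = 0; // start this part of the demo from a clean state
const differentStrings = [
  containsFoo("some text then foo"), // true  (match at 15; lastIndex becomes 18)
  containsFoo("foo"),                // false (search starts at 18, past the end of "foo")
];

console.log(sameString);       // [ true, false, true ]
console.log(differentStrings); // [ true, false ]
```

In a test suite, the “previous call” is often a different test, which is why the failures seem to depend on which tests run and in what order.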

I’m not sure off the top of my head if other languages and frameworks have similar behaviour with RegEx (I try to minimise my contact with RegEx), but I wouldn’t be surprised if that is the case given the problem it solves.

Suggestions

Run from RegEx

My general recommendation would be to create a new instance of the RegEx object whenever needed and only consider reusing an instance for the particular use case of iterating over multiple matches in a string.
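The sketch below shows the “new instance whenever needed” approach next to one alternative that is not the recommendation above: keeping a shared instance and resetting lastIndex to 0 before every use. The helper names are hypothetical, and the pattern is the one from the earlier example.

```javascript
// Pattern source reused from the example above (illustrative).
const FOO_SOURCE = 'foo*';

// Recommended: build a new RegExp on every call, so no lastIndex
// survives from one call to the next.
function containsFooFresh(str) {
  return new RegExp(FOO_SOURCE, 'g').test(str);
}

// Alternative: keep a shared instance but reset lastIndex before
// each use. This works, but every caller has to remember to do it.
const sharedRegex = new RegExp(FOO_SOURCE, 'g');
function containsFooReset(str) {
  sharedRegex.lastIndex = 0;
  return sharedRegex.test(str);
}

const freshResults = [containsFooFresh("foo bar"), containsFooFresh("foo bar")];
const resetResults = [containsFooReset("some text then foo"), containsFooReset("foo")];

console.log(freshResults); // [ true, true ]
console.log(resetResults); // [ true, true ]
```

A regex literal written inside the function body, such as `/foo*/g`, would also give a fresh object on each call, because each evaluation of a regex literal creates a new RegExp. The cost of constructing a RegExp per call is usually small, though it may be worth measuring in very hot loops.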

Alternatively, really consider if you need that ‘global’ modifier – perhaps what you’re actually doing can be accomplished without it, and you just need to know if any match is found.
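Dropping the flag is enough to make sharing safe when you only need a yes/no answer. The sketch below shows this for a pattern with neither the global nor the sticky flag. In that case, test() always searches from the start of the string and does not carry lastIndex over between calls. The input strings are illustrative.

```javascript
// No 'g' (or 'y') flag: test() starts from index 0 on every call,
// so one instance can safely be shared.
const nonGlobalRegex = new RegExp('foo*');

const results = [
  nonGlobalRegex.test("some text then foo"), // true
  nonGlobalRegex.test("foo"),                // true
  nonGlobalRegex.test("foo"),                // true
];

console.log(results);                  // [ true, true, true ]
console.log(nonGlobalRegex.lastIndex); // 0
```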

And, although it isn’t always possible, maybe don’t use RegEx at all? I know that’s my preference!
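When you are only looking for literal text, the built-in string methods do the job without holding any state. Take the pattern 'foo*' from the example above: it matches “fo” followed by zero or more “o” characters. A test() with it therefore succeeds exactly when the string contains “fo”. The sketch below uses a hypothetical helper name and illustrative inputs.

```javascript
// Regex-free and stateless. A test() with /foo*/ succeeds exactly when the
// string contains "fo", because the trailing o* may match zero characters.
function containsFoo(str) {
  return str.includes("fo");
}

const results = [
  containsFoo("foo bar"),
  containsFoo("foo bar"), // same answer every time: no hidden state
  containsFoo("some text then foo"),
  containsFoo("bar baz"),
];
console.log(results); // [ true, true, true, false ]

// startsWith() and endsWith() cover other common regex-free checks.
console.log("foo bar".startsWith("foo"), "some text then foo".endsWith("foo")); // true true
```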