JavaScript Tip - RegEx Instances Hold State
You have a problem. The solution is RegEx. Now you have 2 problems.
Picture this somewhat familiar scenario – you run a new set of unit tests for a JavaScript function you have just written, and great, they all pass! It’s a pure function (or is it…) so you’re pretty confident your new function and tests will not cause any problems elsewhere in the codebase! You check it in and are immediately horrified as other tests start failing – or maybe even the one you just wrote!
Flaky Tests
Flaky tests, put simply, are tests whose results cannot be relied upon. The same test, against the same code, can pass on one run and fail on the next, or fail only under certain conditions such as a particular execution order. This is common enough to have its own term, so you need not panic just yet at your newly failing build.
However, upon closer inspection of your code, you still do not see any obvious dependencies on state or system information like date/time – what could be causing it?
Regular Expressions in JavaScript
Then you spot it.
const globalRegex = new RegExp('foo*', 'g');
That "g" in your RegEx object that you thought could be innocently shared across your functions and tests. You get that familiar, “Oh no” thought and sinking feeling as you rush to the docs. And lo and behold, your fears are realised:
JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, test() can be used to iterate over multiple matches in a string of text (with capture groups).
Source: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test
The intent of this behaviour is to let you call the same regular expression instance repeatedly on the same string and step through its matches one at a time. However, nothing stops you from calling it on a different string in between. The instance does not reset lastIndex when the input changes (it only goes back to 0 after a failed match), so the next search starts partway through the new string. That can mean no match where one is clearly expected, and results that depend on whatever was tested before.
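To make that concrete, here is a minimal sketch of the failure using the same globalRegex as earlier. The input strings are made up for illustration; the point is only that the second call inherits lastIndex from the first.

```javascript
// Shared instance with the global flag, as in the snippet earlier.
const globalRegex = new RegExp('foo*', 'g');

// First call: finds "foo" at index 6 and remembers where the match ended.
const first = globalRegex.test('table football'); // true
const indexAfterFirst = globalRegex.lastIndex;     // 9

// Second call on a *different* string: the search starts at index 9,
// which is past the end of 'foo', so it fails and lastIndex resets to 0.
const second = globalRegex.test('foo'); // false

// Third call: lastIndex is 0 again, so the very same input now matches.
const third = globalRegex.test('foo'); // true

console.log(first, indexAfterFirst, second, third);
```

The same input, 'foo', gives false and then true, which is exactly the kind of order-dependent result that shows up as a flaky test.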
I’m not sure off the top of my head if other languages and frameworks have similar behaviour with RegEx (I try to minimise my contact with RegEx), but I wouldn’t be surprised if that is the case given the problem it solves.
Suggestions
My general recommendation would be to create a new instance of the RegEx object whenever needed and only consider reusing an instance for the particular use case of iterating over multiple matches in a string.
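One way this could look in practice is sketched below. The helper names (makeFooRegex, containsFoo, findAllFoo) are hypothetical and only there for illustration; the idea is that every caller gets its own instance, and the only place an instance is reused is a loop over a single string.

```javascript
// Hypothetical factory: every call returns a fresh RegExp, so no
// lastIndex state can leak between callers or between tests.
function makeFooRegex() {
  return new RegExp('foo*', 'g');
}

// One-off check: a new instance each time, so the result depends only on the input.
function containsFoo(text) {
  return makeFooRegex().test(text);
}

// The legitimate reuse case: one instance, one string, stepping through
// the matches with exec() until it returns null.
function findAllFoo(text) {
  const regex = makeFooRegex();
  const matches = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    matches.push({ text: match[0], index: match.index });
  }
  return matches;
}

console.log(containsFoo('table football')); // true
console.log(containsFoo('foo'));            // true, regardless of earlier calls
console.log(findAllFoo('foo fo fooo'));
```

Creating a RegExp per call costs a little more than sharing one, so this is a trade of a small amount of work for predictability.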
Alternatively, really consider if you need that ‘global’ modifier – perhaps what you’re actually doing can be accomplished without it, and you just need to know if any match is found.
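As a rough illustration, here is the same pattern without the flag. The variable name and inputs are made up, but the behaviour is standard: without the global or sticky flag, test() always searches from the start and leaves lastIndex at 0.

```javascript
// Same pattern, no 'g' flag: test() does not advance lastIndex.
const plainRegex = /foo*/;

const a = plainRegex.test('table football'); // true
const b = plainRegex.test('foo');            // true
const c = plainRegex.test('foo');            // true again, no hidden state
const indexAfter = plainRegex.lastIndex;     // 0

console.log(a, b, c, indexAfter);
```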
And while sometimes not possible – maybe not using RegEx at all? I know that’s my preference!
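For the pattern used in this post that happens to be possible: 'foo*' matches an "f", an "o", and then any number of further "o"s, so asking whether it matches anywhere is the same as asking whether the text contains "fo". A small sketch, with a hypothetical helper name:

```javascript
// Hypothetical helper: a plain substring check, with no regex and no state.
function containsFo(text) {
  return text.includes('fo');
}

console.log(containsFo('table football')); // true
console.log(containsFo('foo'));            // true
console.log(containsFo('bar'));            // false
```

Most real patterns will not reduce this neatly, so treat it as an option to check for rather than a general replacement.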