Unnecessary Regular Expressions
18 September, 2012 § 13 Comments
Regular expressions are powerful. They are the jackhammer for all chiseling purposes. In some situations they work great, but often they are either overkill or computationally unreasonable.
Regular expressions introduce a state-machine that is hard to optimize for each case.
Here are a couple regular expressions that I’ve seen within code that was replaced with easier-to-read and faster code.
s.replace(/[^.]*/g, “”).length
This regular expression removes all non-period characters in a string. It then queries for the length of the new string. This effectively just gets the number of periods in the string.
A much simpler approach, and also faster, would be to walk through the string and just count each instance of the period character.
var i, l = s.length, count = 0; for (i = 0; i < l; i++) { if (s[i] == '.') count++; }
s.replace(/\s+$/, “”)
This regular expression removes all whitespace at the end of the string. A much simpler approach would be to use the trim() function. There also exists non-standard functions trimRight() and trimLeft() that can be used to only remove whitespace from a specified side of the string.
Testing it out
If you’d like to see the performance differences for yourself, you can run the character counting and string trimming benchmarks on JSPerf.
knowing a thing or two about JS performance may push you toward always using the native regex rather than writing a JS loop. i’m not saying it’s faster, only that it may well be an expectation by a lot of people..
in fact, i just tested it, and if my calculations are correct (it’s hard to do these things right), that first regex is ~20 times faster than that code you wrote, at least in Firefox 16 (beta).
my suggestion: don’t offer performance tips unless you thoroughly test them.
and btw, there is a faster way to write that code:
var count = 0;
var len = s.length;
for (var i=0; i<len; i++) if (s[i]==='.') count++;
c += count;
and it's about 30% faster than the regex one, but that's highly dependent on JS engine JIT optimization, and may very well be slower in some other browser, and especially on mobile (often no JIT), where it arguably matters the most.
so, i'm pretty sure your advice is not that good, as a regex is much more likely to perform close to native speeds, and it's much harder to shoot yourself in your foot (and make it ~20 times slower)..
I tested these out on my local machine and found the non-regex implementations to be faster (and an order of magnitude faster in the case of string trimming).
Here are my timings for 1,000,000 iterations:
character counting regex: 2173 ms
character counting loop: 1148 ms
string trimming regex: 1201 ms
native string trimRight: 93 ms
Thanks for the tweak in the counting algorithm. I updated the post with the changes that I had made locally when benchmarking this.
Were you using the Scratchpad to measure your timings? I have found the Scratchpad to produce unreliable timings based on its sandbox.
“for (let c of s)” is preferable to “for each (let c in s)”, i.e. semantically more in line with what you really want and theoretically faster.
I’ve added some more cases to the character counting test, and at least on my machine found a much faster solution using indexOf: http://jsperf.com/character-counting/2
Tested your testcase in Chrome, IE9, Nightly, FF14, FF 16:
Chrome 21 IE9 Nightly 18.0a1 FF14 FF15
Regex: 1.28M 2.25M 1.59M 1.64M 2.41M
Iterative: 1.57M 0.30M 2.60M 0.8M 0.8M
Iter no coer: 1.65M 0.26M 2.62M 0.86M 0.90M
Iter Prefix: 1.64M 0.31M 2.62M 0.82M 0.88M
So, whilst using regex is still a safe bet across browsers, for JS code inside Nightly, the IonMonkey compiler is a very clear winner!
Interesting is that this regex testcase in Chrome seems slower than in FF, whilst the V8 regex benchmark says that Chrome is faster at regex?
FF: RegExp: 1698,
Chrome: RegExp: 3705
(note, all tested in very unscientific conditions, but the same machine).
I added two more tests for the character counting in revision 2, and the indexOf version is quite a bit faster than the others (in Firefox 18) – in Chrome the split version is a very clear winner. http://jsperf.com/character-counting/2
Later someone added a faster for loop, but it’s still slower than the indexOf solution http://jsperf.com/character-counting/3
It might be worth looking into why Chrome is so much faster at split() than Firefox though…
Turns out bug 688219 covers that performance gap for string.split
Maybe the code you were reading predated Firefox 3.5 and so didn’t have the luxury of a .trim() method.
Also speaking of inefficient code, I had to disable JavaScript in order to be able to use this page…
I’m guessing it’s due to DetectZoom’s fallback code because I was using a browser that doesn’t support .matchMedia() (why does that code feel the need to detect the zoom anyway?)
This blog post reminds me of this hilarious SO post. I have a tshirt of it too that they sent out to some people: http://stackoverflow.com/a/1732454/3153
[…] a recent article by Jared Wein, Unnecessary Regular Expressions, the topic of counting the occurrence of a specific character in a string was brought up. While the […]