Unnecessary Regular Expressions

18 September, 2012 § 13 Comments

Regular expressions are powerful. They are the jackhammer for all chiseling purposes. In some situations they work great, but often they are either overkill or computationally unreasonable.

Regular expressions introduce a state-machine that is hard to optimize for each case.

Here are a couple regular expressions that I’ve seen within code that was replaced with easier-to-read and faster code.

s.replace(/[^.]*/g, “”).length

This regular expression removes all non-period characters in a string. It then queries for the length of the new string. This effectively just gets the number of periods in the string.

A much simpler approach, and also faster, would be to walk through the string and just count each instance of the period character.

var i, l = s.length, count = 0;
for (i = 0; i < l; i++) {
  if (s[i] == '.')
    count++;
}

s.replace(/\s+$/, “”)

This regular expression removes all whitespace at the end of the string. A much simpler approach would be to use the trim() function. There also exists non-standard functions trimRight() and trimLeft() that can be used to only remove whitespace from a specified side of the string.

Testing it out

If you’d like to see the performance differences for yourself, you can run the character counting and string trimming benchmarks on JSPerf.

Tagged: javascript, planet-mozilla, programming

§ 13 Responses to Unnecessary Regular Expressions

tom jones says:

18 September, 2012 at 4:49 pm

knowing a thing or two about JS performance may push you toward always using the native regex rather than writing a JS loop. i’m not saying it’s faster, only that it may well be an expectation by a lot of people..

Reply
- tom jones says:
  
  18 September, 2012 at 5:43 pm
  
  in fact, i just tested it, and if my calculations are correct (it’s hard to do these things right), that first regex is ~20 times faster than that code you wrote, at least in Firefox 16 (beta).
  
  my suggestion: don’t offer performance tips unless you thoroughly test them.
  
  and btw, there is a faster way to write that code:
  
  var count = 0;
  var len = s.length;
  for (var i=0; i<len; i++) if (s[i]==='.') count++;
  c += count;
  
  and it's about 30% faster than the regex one, but that's highly dependent on JS engine JIT optimization, and may very well be slower in some other browser, and especially on mobile (often no JIT), where it arguably matters the most.
  
  so, i'm pretty sure your advice is not that good, as a regex is much more likely to perform close to native speeds, and it's much harder to shoot yourself in your foot (and make it ~20 times slower)..
- msujaws says:
  
  18 September, 2012 at 5:56 pm
  
  I tested these out on my local machine and found the non-regex implementations to be faster (and an order of magnitude faster in the case of string trimming).
  
  Here are my timings for 1,000,000 iterations:
  
  character counting regex: 2173 ms
  character counting loop: 1148 ms
  string trimming regex: 1201 ms
  native string trimRight: 93 ms
- msujaws says:
  
  18 September, 2012 at 6:00 pm
  
  Thanks for the tweak in the counting algorithm. I updated the post with the changes that I had made locally when benchmarking this.
  
  Were you using the Scratchpad to measure your timings? I have found the Scratchpad to produce unreliable timings based on its sandbox.
Dao says:

18 September, 2012 at 5:29 pm

“for (let c of s)” is preferable to “for each (let c in s)”, i.e. semantically more in line with what you really want and theoretically faster.

Reply
Christian Sonne says:

19 September, 2012 at 12:47 am

I’ve added some more cases to the character counting test, and at least on my machine found a much faster solution using indexOf: http://jsperf.com/character-counting/2

Reply
Alfred Kayser says:

19 September, 2012 at 12:54 am

Tested your testcase in Chrome, IE9, Nightly, FF14, FF 16:
Chrome 21 IE9 Nightly 18.0a1 FF14 FF15
Regex: 1.28M 2.25M 1.59M 1.64M 2.41M
Iterative: 1.57M 0.30M 2.60M 0.8M 0.8M
Iter no coer: 1.65M 0.26M 2.62M 0.86M 0.90M
Iter Prefix: 1.64M 0.31M 2.62M 0.82M 0.88M

So, whilst using regex is still a safe bet across browsers, for JS code inside Nightly, the IonMonkey compiler is a very clear winner!

Interesting is that this regex testcase in Chrome seems slower than in FF, whilst the V8 regex benchmark says that Chrome is faster at regex?
FF: RegExp: 1698,
Chrome: RegExp: 3705

(note, all tested in very unscientific conditions, but the same machine).

Reply
Christian Sonne (@ChristianSonne) says:

19 September, 2012 at 1:58 am

I added two more tests for the character counting in revision 2, and the indexOf version is quite a bit faster than the others (in Firefox 18) – in Chrome the split version is a very clear winner. http://jsperf.com/character-counting/2

Later someone added a faster for loop, but it’s still slower than the indexOf solution http://jsperf.com/character-counting/3

It might be worth looking into why Chrome is so much faster at split() than Firefox though…

Reply
- Christian Sonne (@ChristianSonne) says:
  
  19 September, 2012 at 2:45 am
  
  Turns out bug 688219 covers that performance gap for string.split
Neil Rashbrook says:

19 September, 2012 at 5:02 am

Maybe the code you were reading predated Firefox 3.5 and so didn’t have the luxury of a .trim() method.

Also speaking of inefficient code, I had to disable JavaScript in order to be able to use this page…

Reply
- Neil Rashbrook says:
  
  19 September, 2012 at 5:16 am
  
  I’m guessing it’s due to DetectZoom’s fallback code because I was using a browser that doesn’t support .matchMedia() (why does that code feel the need to detect the zoom anyway?)
Brian R. Bondy says:

19 September, 2012 at 6:43 am

This blog post reminds me of this hilarious SO post. I have a tshirt of it too that they sent out to some people: http://stackoverflow.com/a/1732454/3153

Reply
Counting dots « Geeks By Nature says:

26 September, 2012 at 9:08 am

[…] a recent article by Jared Wein, Unnecessary Regular Expressions, the topic of counting the occurrence of a specific character in a string was brought up. While the […]

Reply