Unnecessary Regular Expressions

18 September, 2012 § 13 Comments

Regular expressions are powerful. They are the jackhammer for all chiseling purposes. In some situations they work great, but often they are either overkill or computationally unreasonable.

Regular expressions introduce a state-machine that is hard to optimize for each case.

Here are a couple regular expressions that I’ve seen within code that was replaced with easier-to-read and faster code.

s.replace(/[^.]*/g, “”).length

This regular expression removes all non-period characters in a string. It then queries for the length of the new string. This effectively just gets the number of periods in the string.

A much simpler approach, and also faster, would be to walk through the string and just count each instance of the period character.

var i, l = s.length, count = 0;
for (i = 0; i < l; i++) {
  if (s[i] == '.')

s.replace(/\s+$/, “”)

This regular expression removes all whitespace at the end of the string. A much simpler approach would be to use the trim() function. There also exists non-standard functions trimRight() and trimLeft() that can be used to only remove whitespace from a specified side of the string.

Testing it out

If you’d like to see the performance differences for yourself, you can run the character counting and string trimming benchmarks on JSPerf.

Tagged: , ,

§ 13 Responses to Unnecessary Regular Expressions

  • tom jones says:

    knowing a thing or two about JS performance may push you toward always using the native regex rather than writing a JS loop. i’m not saying it’s faster, only that it may well be an expectation by a lot of people..

    • tom jones says:

      in fact, i just tested it, and if my calculations are correct (it’s hard to do these things right), that first regex is ~20 times faster than that code you wrote, at least in Firefox 16 (beta).

      my suggestion: don’t offer performance tips unless you thoroughly test them.

      and btw, there is a faster way to write that code:

      var count = 0;
      var len = s.length;
      for (var i=0; i<len; i++) if (s[i]==='.') count++;
      c += count;

      and it's about 30% faster than the regex one, but that's highly dependent on JS engine JIT optimization, and may very well be slower in some other browser, and especially on mobile (often no JIT), where it arguably matters the most.

      so, i'm pretty sure your advice is not that good, as a regex is much more likely to perform close to native speeds, and it's much harder to shoot yourself in your foot (and make it ~20 times slower)..

    • msujaws says:

      I tested these out on my local machine and found the non-regex implementations to be faster (and an order of magnitude faster in the case of string trimming).

      Here are my timings for 1,000,000 iterations:

      character counting regex: 2173 ms
      character counting loop: 1148 ms
      string trimming regex: 1201 ms
      native string trimRight: 93 ms

    • msujaws says:

      Thanks for the tweak in the counting algorithm. I updated the post with the changes that I had made locally when benchmarking this.

      Were you using the Scratchpad to measure your timings? I have found the Scratchpad to produce unreliable timings based on its sandbox.

  • Dao says:

    “for (let c of s)” is preferable to “for each (let c in s)”, i.e. semantically more in line with what you really want and theoretically faster.

  • I’ve added some more cases to the character counting test, and at least on my machine found a much faster solution using indexOf: http://jsperf.com/character-counting/2

  • Alfred Kayser says:

    Tested your testcase in Chrome, IE9, Nightly, FF14, FF 16:
    Chrome 21 IE9 Nightly 18.0a1 FF14 FF15
    Regex: 1.28M 2.25M 1.59M 1.64M 2.41M
    Iterative: 1.57M 0.30M 2.60M 0.8M 0.8M
    Iter no coer: 1.65M 0.26M 2.62M 0.86M 0.90M
    Iter Prefix: 1.64M 0.31M 2.62M 0.82M 0.88M

    So, whilst using regex is still a safe bet across browsers, for JS code inside Nightly, the IonMonkey compiler is a very clear winner!

    Interesting is that this regex testcase in Chrome seems slower than in FF, whilst the V8 regex benchmark says that Chrome is faster at regex?
    FF: RegExp: 1698,
    Chrome: RegExp: 3705

    (note, all tested in very unscientific conditions, but the same machine).

  • I added two more tests for the character counting in revision 2, and the indexOf version is quite a bit faster than the others (in Firefox 18) – in Chrome the split version is a very clear winner. http://jsperf.com/character-counting/2

    Later someone added a faster for loop, but it’s still slower than the indexOf solution http://jsperf.com/character-counting/3

    It might be worth looking into why Chrome is so much faster at split() than Firefox though…

  • Maybe the code you were reading predated Firefox 3.5 and so didn’t have the luxury of a .trim() method.

    Also speaking of inefficient code, I had to disable JavaScript in order to be able to use this page…

    • I’m guessing it’s due to DetectZoom’s fallback code because I was using a browser that doesn’t support .matchMedia() (why does that code feel the need to detect the zoom anyway?)

  • This blog post reminds me of this hilarious SO post. I have a tshirt of it too that they sent out to some people: http://stackoverflow.com/a/1732454/3153

  • […] a recent article by Jared Wein, Unnecessary Regular Expressions, the topic of counting the occurrence of a specific character in a string was brought up. While the […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

What’s this?

You are currently reading Unnecessary Regular Expressions at JAWS.


%d bloggers like this: