Friday, January 28, 2011

Regex beats For loop for JavaScript search and replace

Here's an interesting test Rob and I did today at i.tv, trying to figure out the best way to replace HTML entities (&amp;amp;) with their equivalent characters (&). We were hoping to basically replace PHP's html_entity_decode function, but we only care about a limited subset of characters.

The basic approach is something like this:
str = str.replace(/&amp;/g, '&');

I found a Stack Overflow question about this that is a decent start:
http://stackoverflow.com/questions/784586/convert-special-characters-to-html-in-javascript

We needed to do many search-and-replaces, so we compared two methods: loop through a list of entities and replace each one individually, or build one big regular expression and make a single pass over the string. I wasn't sure which would be best, but the results show the regex clearly wins.
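
Roughly, the two approaches look like this. This is a minimal sketch rather than our exact benchmark: the entity table, the function names (`decodeWithLoop`, `decodeWithRegex`, `time`), the sample string, and the iteration count are all made up for illustration, and timings will vary a lot between browsers.

```javascript
// Hypothetical entity table, for illustration only.
var entities = {
  "&amp;": "&",
  "&copy;": "©",
  "&reg;": "®"
};

// Method 1: loop over the table and run one global replace per entity.
function decodeWithLoop(str) {
  for (var entity in entities) {
    if (entities.hasOwnProperty(entity)) {
      str = str.replace(new RegExp(entity, "g"), entities[entity]);
    }
  }
  return str;
}

// Method 2: build one alternation regex up front and replace in a single pass.
var combined = new RegExp(Object.keys(entities).join("|"), "g");
function decodeWithRegex(str) {
  return str.replace(combined, function (match) {
    return entities[match];
  });
}

// Crude timing helper; Date.now() is coarse, so run many iterations.
function time(fn, input, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn(input);
  }
  return Date.now() - start;
}

var sample = "Tom &amp; Jerry &copy; 2011 &reg;";
console.log("loop:  " + time(decodeWithLoop, sample, 100000) + "ms");
console.log("regex: " + time(decodeWithRegex, sample, 100000) + "ms");
```

Both functions return the same string for this sample; only the number of passes over the input differs.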

Code:

Results (Safari 5 on top, Firefox 3.6 to the right, and Chrome 8 on the bottom):



Hopefully the timing comparison demonstrates how much faster the regex is than a normal loop! Oh, and Firefox 3.6 hates this test, but Chrome really rocks it.

Final solution:
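var htmlEntityDecode = (function() {
  // Map each supported entity to the character it stands for.
  var specialChars = {
      "&amp;": "&",
      "&eacute;": "é",
      "&copy;": "©",
      "&ntilde;": "ñ",
      "&aelig;": "æ",
      "&AElig;": "Æ",
      "&iquest;": "¿",
      "&pound;": "£",
      "&cent;": "¢",
      "&reg;": "®"
    },
    // Join the entities into one alternation: "&amp;|&eacute;|..."
    inside = Object.keys(specialChars).join("|"),
    regex = new RegExp(inside, 'g');

  // The regex is built once; each call only does a single replace pass.
  return function(str) {
    return str.replace(regex, function(html) {
      return specialChars[html];
    });
  };
})();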
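
One caveat: joining the keys straight into a RegExp only works because none of these entity names contain regex metacharacters. If the table ever included keys with characters like `(` or `.`, they would need escaping first. Here is a hedged sketch of a more general builder; `escapeRegExp` and `makeDecoder` are hypothetical helper names, not part of what we shipped, and the `"(c)"` key is only there to show the escaping.

```javascript
// Escape regex metacharacters so a key is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a single-pass decoder from any lookup table (hypothetical helper).
function makeDecoder(table) {
  var pattern = Object.keys(table).map(escapeRegExp).join("|");
  var regex = new RegExp(pattern, "g");
  return function (str) {
    return str.replace(regex, function (match) {
      return table[match];
    });
  };
}

var decode = makeDecoder({ "&amp;": "&", "&copy;": "©", "(c)": "©" });
console.log(decode("Fish &amp; chips (c) 2011")); // Fish & chips © 2011
console.log(decode("&amp;copy;"));                // &copy; (decoded once, not twice)
```

The second call shows a side benefit of the single pass: replaced text is not rescanned, so an escaped entity like `&amp;copy;` comes out as `&copy;` rather than being decoded a second time, which can happen when looping through replacements one after another.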
  

Update: it looks like someone else also came up with a similar answer:
