Friday, January 28, 2011

Regex beats For loop for JavaScript search and replace

Here's an interesting test Rob and I did today while trying to figure out the best way to replace HTML entities (&amp;) with their equivalent characters (&). We were hoping to basically reproduce PHP's html_entity_decode function in JavaScript, but we're only worried about a limited subset of characters.

The basic approach is something like this:
str = str.replace(/&amp;/g, '&');

I found something on Stack Overflow about this that is kind of a start:

We needed to do many search-and-replace operations, so we compared two methods: loop through a list of characters and replace each one individually, or build one big regular expression and make a single pass over the string. I wasn't sure which would be best, but the results show the regex clearly wins.
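To make the two methods concrete, here is a rough sketch of what each one might look like. The entity list and the names decodeWithLoop and decodeWithRegex are made up for illustration; this is not the exact code we timed.

```javascript
// Illustrative subset of entities; not the list used in the original test.
var entities = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"'
};

// Method 1: loop over the list and run one global replace per entity.
function decodeWithLoop(str) {
  for (var entity in entities) {
    if (entities.hasOwnProperty(entity)) {
      str = str.replace(new RegExp(entity, 'g'), entities[entity]);
    }
  }
  return str;
}

// Method 2: build one alternation regex up front and make a single pass.
var combined = new RegExp(Object.keys(entities).join('|'), 'g');
function decodeWithRegex(str) {
  return str.replace(combined, function (match) {
    return entities[match];
  });
}

var sample = 'Tom &amp; Jerry &lt;b&gt;&quot;hi&quot;&lt;/b&gt;';
console.log(decodeWithLoop(sample));
console.log(decodeWithRegex(sample));
```

One thing to watch: the two are not strictly equivalent. The loop rescans its own output, so an input like &amp;lt; gets decoded twice and ends up as <, while the single-pass regex leaves it as &lt;.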


Results (Safari 5 on top, Firefox 3.6 to the right, and Chrome 8 on the bottom):

Hopefully the timing comparison demonstrates how much faster the regex is than a normal loop! Oh, and Firefox 3.6 hates this test, but Chrome really rocks it.
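If you want to try a comparison like this yourself, a very simple timing harness is enough to see the difference. This is only a sketch: timeIt and decodeAmp are hypothetical names, the input size and iteration count are arbitrary, and it is not the harness that produced the results above.

```javascript
// Hypothetical helper: run fn(input) `iterations` times and return elapsed milliseconds.
function timeIt(fn, input, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn(input);
  }
  return Date.now() - start;
}

// Stand-in function to time; swap in the loop-based and regex-based decoders.
function decodeAmp(str) {
  return str.replace(/&amp;/g, '&');
}

// Build a long input (1000 repetitions) so each call does a meaningful amount of work.
var input = new Array(1001).join('a &amp; b ');
var elapsed = timeIt(decodeAmp, input, 1000);
console.log('decodeAmp: ' + elapsed + ' ms');
```

Run each candidate several times and in each browser; a single run with millisecond resolution can be noisy.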

Final solution:
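var htmlEntityDecode = (function() {
  // Example entries only; the full entity list from the original post is not shown here.
  var specialChars = {
      '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"'
    },
    // Join every entity into one alternation, e.g. "&amp;|&lt;|&gt;|&quot;"
    inside = Object.keys(specialChars).join("|"),
    regex = new RegExp(inside, 'g');
  return function(str) {
    return str.replace(regex, function(html) {
      return specialChars[html];
    });
  };
})();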

Update: it looks like someone else also came up with a similar answer: