A useful JavaScript idiom for histograms and binning

September 12, 2010

I once hated JavaScript. I do so no longer. It is actually a quite decent language, a combination of Scheme and Self. But this heritage is buried, and not easily visible at first. But once you get the hang of it, working in JavaScript can be fun. You just have to throw some misconceptions overboard, and learn to work with the grain of the language (I wholeheartedly recommend Douglas Crockford’s book JavaScript: The good parts. It is quite thin, which says a lot).

I’d like to share a nice JavaScript idiom that at first looks ugly, but is in fact quite elegant.

Let’s say you have some data with integer values, and you want to create a histogram of the distribution of those values, resp. their frequencies. Here, we don’t care where the data comes from – an array, over the wire, generated on the fly. We just want to store how often each value appears.

Of course, you can create an array a[i] to hold the number of times each value i appeared; first initialize it with zeros, then increase a[i] by one everytime you encounter an i. However, what to do if you don’t know the maximum i? Or what if the i‘s are sparsely distributed?

JavaScript offers sparse arrays as part of the language. However, you can’t initialize them (that’s the whole point of sparse data structures). So, any time you want to increase a[i], you first have to test if a[i] is already initialized. Looks like:

        var a = [];

        while (...) {
                i = ...;
                if (a[i] === undefined) {
                        a[i] = 0;
                }
                a[i]++;
        }

Then you can access the frequencies afterwards by using

        for (i in a) {
                if (typeof i == 'number') {
                        dosomething(a[i]);
                }
        }

(Never mind that the for in statement is brain-damaged and likes to traverse along the prototype chain. This is one of the stupid parts of JavaScript.)†

However, the explicit test for undefined in the first fragment seems horribly inelegant to me. It feels like a roadblock. I like the following approach much better, which seems to me to be a typical JavaScript idiom (and quite different from how you’d do it in other languages):

        var a = [];

        while (...) {
                i = ...;
                a[i] = a[i]+1 || 1;
        }

This works because a previously undefined array element yields undefined on reading. Adding a number to undefined yields NaN; NaN is false-y, therefore the second term of the disjunction is evaluated, which then sets the array element to the correct first value of one. If the element already was defined, then the first term produces a number larger than zero, which is true-y, therefore the second term is never evaluated. In either case, a[i] is updated with a correct value.

So, you access an array element that doesn’t exist. No exception is raised, but a special value undefined is returned that conveys an out-of-bounds error. You don’t care, you don’t test for it – instead, you do an arithmetic operation on this value. The result is again a special value, which indicates that you tried to do an arithmetic operation with an illegal value. Again, you don’t really care; you happily use this illegal value in a boolean expression. And only here you substitute a real value for the case that underway went something wrong. But not enough; in the boolean expression (a simple or), you don’t use any boolean value – you use numbers or illegal values, never a true or false.

To a purist, this looks like blasphemy. To a JavaScript enthusiast, this looks natural. My point is: work with the grain of the language, not against it. It feels much more natural.

I can’t believe I’m writing this.

† It is easy to rail against JavaScript. But I challenge you to come up with a better language, when you have only ten days for designing and implementing it …