Comparing Nodes in a Tree

[September, 2012] By Andy Bartlett. Filed under: software —

I have no idea whether this algorithm is old or new, or even whether there are more efficient alternatives. But for a long time now, I’ve had a geeky fondness for an orphan data structure. I call it an orphan because I could never see any way of using it. It was just one of those pretty things. If you’ve ever had a geeky fondness for a data structure, then you’re pretty weird too.

I first met the idea when I was an undergraduate, studying Neurobiology. Chris Longuet-Higgins taught a small seminar class on the psychology of music. It was actually about some computer modeling work he did while at the Machine Intelligence unit in Edinburgh. The centerpiece of his office was a midi keyboard wired to a large two-dimensional display that lit up as he played. Remember the scene in Close Encounters?

Anyway, he proposed that any melody in western classical music is written in musical intervals which can all be uniquely described as products of the first three prime numbers. Music is our perception of the fundamental theorem of arithmetic. He described a three dimensional tonal space, where each discrete node on that space can be defined in the x,y,z co-ordinate system where x is the octave (2:1) , y is the fifth (3:2) and z is the major third (5:4). The interval is 1:n where n is x^a * y^b * z^{c –}and most important, the relationship between any two intervals makes sense within that space. The major scale is a discrete and obvious area, so is the minor scale. The aliens in Close Encounters were way behind Chris.

But the geeky idea that has stayed with me is that when you have such a coordinate space, only a discrete set of numbers are represented in the space, and they identify where you are in the space.

Imagine a one-dimensional space. The numbers 1, 2, 3 etc are also positions. Position number 2 is after 1 and before 3. So I can tag things with a number and know which comes before which. And now imagine any space where a single number identifies where you are and it is possible to derive an unambiguous location in that n-dimensional space from the number. Its probably first year stuff in Computer Science courses, but I read Biology and reinvent wheels sometimes.

So this week I was working on DOM nodes in an HTML document. I had two nodes and needed to know their relative position in the document. The DOM object has a compareDocumentPosition method, but surprise surprise it is not uniformly supported in all the browsers. And there are lots of code fragments across the internet to do the same thing. All horribly tacky.

I thought of Chris. Lets create a tree space. And there is an interesting shadow area around the top node that is more fun to think about than counting sheep, but lets say that the first child of a node is 2x and that each next sibling is (2x+1) of the previous one. Each number represents a node in the tree space. And by comparing these two numbers you can say which comes before the other in a depth first search order of the tree. So leaving the top node of the tree to one side for now, its first child has the value of 0. That child’s next sibling has the value 1. And the next, 3. And the next, 7 etc. The first child of node “3” is 6. Its next sibling is 13. And in this 2-dimensional world, 30 is greater than 96, 13 is greater than 17 and 121 is way bigger than 192. And there are is an infinite set of numbers between 2 and 3. The arithmetic is just different.

For any two nodes in the tree, you can get its “index” by racing back to the top of the tree, left, up, left, left, up etc. Feels just like working out heirs in a parentelic probate system.

here is some javascript code for calculating a node’s “number”:

function nodenumber(n)
{
        if (n == null)
                return 0;
        if (n.previousSibling != null)
                return (2 * nodenumber(n.previousSibling)) + 1;
        return 2 * nodenumber(n.parentNode);
}

and here’s code that decomposes a number into its path (why store information in a complex array structure when you can do it in an integer)

function nodearray( nn, na)
{
        if (nn == 0)
                return na;

        na.push(nn);

        if (nn & 0x1)
                return nodearray((nn-1)/2, na);

        return nodearray(nn/2, na);
}

Suppose you have two nodes – one with node-number N1 the other with node-number N2. And the function nodecompare(N1,N2) will act like strcmp and return -1 if N1 is before N2, 0 if they are the same and 1 if N1 is after N2:

function nodecompare(N1, N2)
{
        if (N1 == N2)
                return 0;

        return nodecompare1( nodearray( N1, new Array()), nodearray( N2, new Array()));
}

You decompose the numbers and then follow them back. I’ve written it in two stages for clarity. Everything is either a shift left (divide by 2) or a test for the first bit (odd number). All very natural and fast operations.

function nodecompare1(a1, a2)
{
        var i1 = a1.pop();
        var i2 = a2.pop();

        if (i1 == i2) {
                if (a1.length == 0)
                        return -1;
                if (a2.length == 0)
                        return 1;
                return nodecompare1( a1, a2);
        }
        if (i1 & 0x1)
                return 1;
        return -1;
}

Its useful to write it out using arrays, but all we’re really doing in the code above is breaking down the numbers and analyzing them bit by bit. The two numbers themselves can give you the answer without all of this effort.

function nodecmp( N1, N2)
{
        if (N1 == N2)
                return 0;

        if (N1 < N2) 
                N2 = (N1 & N2);
        else
                N1 = (N2 & N1);

        if (N1 > N2)
                return 1;

        return -1;
}

I was always taught to think of numbers as scalars. But it just ain’t so, Joe.

Comparing document position is a trivial application. And in a tree there is a clear notion of where the tree starts and in which direction it is going. In Chris’ tonal space, there is no such start and end point – in fact you are using the interval as a measure of relative distance between two places. But there are endless possibilities for my favorite orphan data structure – id cards where you have a node-number based on family relationships, single digit unique zipcodes, semantic ip addresses. And being able to tell similarity just by listening to a tonal representation of the relationship. Such fun. And if you’re thinking of racing to the PTO to patent my orphan, tough luck. The ideas were all there in Chris’ published papers 30 years ago, and they are obvious to this ordinarily creative PHOSITA.

Law and Software

Comparing Nodes in a Tree

Leave a Reply