Thursday, February 9, 2012

Balancing Graphs in O(n*log(n))

The Barnes-Hut algorithm is used to model N-body systems more efficiently, at the expense of precision. However, the improvement in performance is tremendous: from O( n² ) to O( n*log(n) ). For reference, this is what that looks like:

Since our force-based approach to computing graph layout is essentially nothing other than an N-body simulation, and we don't require great precision anyway, this algorithm seems perfect for us to use. The way it works is basically this:

  • The set of nodes is sorted into a quadtree, which is essentially a binary tree, but with an added dimension
  • Each of the quadtree sectors keeps track of its mass and center of mass
  • To compute the forces, the tree is traversed and for points that are "far" from a certain center of mass, only the force from the node to that center of mass is calculated, as opposed to each force between all nodes
This leads to a significant decrease in overall calculations that needs to be done, since distant nodes no longer need to interact directly. There is some overhead on creating the quadtree every frame, but the growth of this is also O(n*log(n)). If you want to learn more, here is a very good explanation of the algorithm complete with pseudocode (in German, but the code is understandable anyway). 

I haven't had time to implement this algorithm in our Qt application to graph Wikipedia, however I managed to code up the basic algorithm in Python. The improvement is tremendous, before I was never able to reasonably run a simulation with more than 80 or so nodes in Python, now even systems up to 350 nodes run at a reasonable framerate.
Illustration of the quadtree (unbalanced)
The same graph after balancing

The algorithm can also be parallelized very well, so the potential of this is tremendous, with good implementation and appropriate hardware we could realistically graph all of the German Wikipedia (~1.3 million articles), maybe more. 

The full Python code is here.

No comments: