Automatically sorting graph curves

Sum­mary

A script for auto­mat­i­cally sort­ing graph curves, e.g. for gnu­plot.

Prob­lem

When you have a bunch of curves, and you plot them in an arbi­trary order, you might get the following:

Typ­i­cally, you want to sort the graphs in what appears to be visu­ally descend­ing order, as follows:

Sorting the curves is usually done manually, by eyeballing them, which quickly becomes tedious. It is even trickier to place curves that don’t extend as far along the x-axis. (A curve might be short if its experimental run trains more slowly.)

Heuris­tic approach

An auto­matic heuris­tic sort­ing approach is as follows:

  • We main­tain a sorted list of curves, from high­est to low­est. The sorted list is ini­tial­ized to empty.
  • At each iteration, we find the curve that extends furthest along the x-axis and is not yet in the sorted list. We then choose where to insert it into the sorted list.
    • For this curve and all curves in the sorted list, we want an estimate of the curve value at the current curve’s furthest x-value. We compute this estimate using a moving average. (For this reason, all curves should have aligned, equidistant x-axis steps.)
    • We insert this curve into the sorted list at the position that minimizes the number of rank errors among the curve estimates at this x-value.

And that’s it!
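
The steps above can be sketched in a few lines of Python. This is only an illustration, not the actual sort-curves.py: the function names, the trailing-window form of the moving average, and the window size are my assumptions here. It also assumes that every curve is a non-empty list of y-values, that index i means the same x-value in every curve, and that a "rank error" is a pair of curves whose order in the list disagrees with the order of their estimates.

```python
def trailing_mean(ys, end, window):
    """Mean of up to `window` values of ys ending at index `end` (inclusive)."""
    start = max(0, end - window + 1)
    chunk = ys[start:end + 1]
    return sum(chunk) / len(chunk)


def sort_curves(curves, window=3):
    """Return curve names ordered from highest to lowest.

    `curves` maps a name to its list of y-values (hypothetical layout:
    aligned, equidistant x steps, so only the y-values are needed).
    """
    # Longest curves first: they reach furthest out on the x-axis.
    pending = sorted(curves, key=lambda name: len(curves[name]), reverse=True)
    ordered = []
    for name in pending:
        end = len(curves[name]) - 1  # index of this curve's furthest x-value
        mine = trailing_mean(curves[name], end, window)
        # Every curve already placed is at least as long, so `end` is valid.
        others = [trailing_mean(curves[o], end, window) for o in ordered]
        # Try every insertion slot and count rank errors: curves listed above
        # with a lower estimate, plus curves listed below with a higher one.
        best_pos, best_err = 0, None
        for pos in range(len(ordered) + 1):
            above = sum(v < mine for v in others[:pos])
            below = sum(v > mine for v in others[pos:])
            if best_err is None or above + below < best_err:
                best_pos, best_err = pos, above + below
        ordered.insert(best_pos, name)
    return ordered
```

With this sketch, a short curve whose values sit between those of two longer curves ends up between them in the list, because the comparison only uses the part of the x-axis where the short curve exists.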

Exam­ple output

Here is the sorted output of a larger, more difficult example, sorted using the above heuristic:

A few of the decisions aren’t good. For example, why is curve 15 placed above curve 6? But most of the decisions are reasonable. For example, curve 13 is placed at the bottom, because it is very low compared to the other curves for the short duration that curve 13 is present.

Code

I have writ­ten a script imple­ment­ing the heuris­tic above.

Here is the lat­est ver­sion of sort-curves.py.
You will also need movingaverage.py from my Python com­mon library.

USAGE:

./sort-curves.py *.dat

where every *.dat is in stan­dard (gnu­plot) two-column-per-line format:

xvalue yvalue
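
For reference, a file in this format can be read with a few lines of Python. This is a minimal sketch, not code taken from sort-curves.py: the name read_curve is hypothetical, and skipping blank lines and #-comments is an assumption based on how gnuplot data files are commonly written.

```python
def read_curve(path):
    """Read a whitespace-separated two-column file into (xs, ys) lists."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            # Assumption: anything after '#' is a comment, as in gnuplot.
            line = line.split("#", 1)[0].strip()
            if not line:
                continue  # skip blank or comment-only lines
            x, y = line.split()[:2]  # ignore any extra columns
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys
```

A line with fewer than two columns raises a ValueError in this sketch, which is probably what you want for a malformed data file.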

Over­all, I find this script a use­ful timesaver.
