Kragen Javier Sitaker | 18 Oct 2011 00:55

radix-sorting rational numbers with an efficient serialization of continued fractions

There are a family of interesting algorithms that sort lexicographically
on strings of bits or bytes rather than by comparing items pairwise with
some kind of customizable comparison function on individual items.

Optimal comparison sorts such as heapsort, quicksort, mergesort,
library sort, and sorting with a red-black tree are often described as
having O(N lg N) runtime, but in fact this is only true if we assume
individual comparisons take constant time.  In fact, though,
individual lexicographical comparisons can take up to O(N) time as
well, if N is the total size of the input in bits; consider comparing
“aaaaaaaaaaaaaaaab” to “aaaaaaaaaaaaaaaac”.  Consequently their
average runtimes are slightly worse, and their worst-case runtimes are
much worse.

Radix sorting algorithms, by contrast, can operate in O(N) time!  This
sounds like an extraordinary claim, since they still must do at least
O(lg M) bit comparisons per item if you have M unique items; but, the
fact is, the length of each of M unique items is also O(lg M), so the
total size N of the input is O(M lg M).  So they’re still O(M lg M) in
the number of items, but O(N) in the number of bits in the input.

(If you have nonunique input items, you can count the instances of
each unique item in linear time while you sort the unique items.)

There are similar advantages associated with radix trees (called
“tries”, invented by Fredkin), in particular Patricia (called
“crit-bit” by Bernstein).

The trouble with radix-sorting algorithms is that you have to encode
your data in a form where lexicographical comparison results in the
(Continue reading)


Gmane