18 Oct 2011 00:55
radix-sorting rational numbers with an efficient serialization of continued fractions
Kragen Javier Sitaker <kragen <at> canonical.org>
2011-10-17 22:55:37 GMT
2011-10-17 22:55:37 GMT
There are a family of interesting algorithms that sort lexicographically on strings of bits or bytes rather than by comparing items pairwise with some kind of customizable comparison function on individual items. Optimal comparison sorts such as heapsort, quicksort, mergesort, library sort, and sorting with a red-black tree are often described as having O(N lg N) runtime, but in fact this is only true if we assume individual comparisons take constant time. In fact, though, individual lexicographical comparisons can take up to O(N) time as well, if N is the total size of the input in bits; consider comparing “aaaaaaaaaaaaaaaab” to “aaaaaaaaaaaaaaaac”. Consequently their average runtimes are slightly worse, and their worst-case runtimes are much worse. Radix sorting algorithms, by contrast, can operate in O(N) time! This sounds like an extraordinary claim, since they still must do at least O(lg M) bit comparisons per item if you have M unique items; but, the fact is, the length of each of M unique items is also O(lg M), so the total size N of the input is O(M lg M). So they’re still O(M lg M) in the number of items, but O(N) in the number of bits in the input. (If you have nonunique input items, you can count the instances of each unique item in linear time while you sort the unique items.) There are similar advantages associated with radix trees (called “tries”, invented by Fredkin), in particular Patricia (called “crit-bit” by Bernstein). The trouble with radix-sorting algorithms is that you have to encode your data in a form where lexicographical comparison results in the(Continue reading)
RSS Feed