(June 2010 — May 2012)
Everybody with any computer science education has been taught to disdain the bubble sort. It’s an O(n²) algorithm with well-known O(n log n) rivals like Quicksort. That is to say, it does considerably more work than necessary, under some standard assumptions.
But some of those assumptions may be worth reconsidering. In days of yore, when the CS canon was being laid down by giants like Donald Knuth, computers were big, delicate, and terribly expensive. Anybody with access to a computer at all was lucky to have one of them. Its memory was similarly constrained: thus the concept of a “memory hierarchy,” with CPU registers at the top, core RAM in the middle, and slow-but-persistent things like punch cards, magnetic tape, or hermetically sealed rotating platters down at the bottom — and with the time cost of access to storage considered uniform at each level in the hierarchy. The dominant metaphor for the computer (kind of a lousy one, in retrospect) was the “artificial brain.” The electrical engineers knew better, of course, but the notion of an abstract machine, manipulating disembodied symbols in methodical sequence, had (and has) the appeal of consigning to superfluity the many devils in the many details. Mathematicians’ weakness for spherical cows is well known.
By contrast, nowadays computers are so tiny and cheap that it’s easy to put lots of them to work on a single task. Similarly, it should be easy to add huge amounts of fast memory to our computer architectures. But one of those devils just can’t be banished by any reasonable technology we have available. Its name is c, the speed of light. Ours is, apparently, a relativistic universe: to communicate across any distance, even microns, takes time. Processors and memory take up space, creating distance. Over billions of iterations, tiny timing differences can quickly accumulate into major discrepancies. In short, communication is the cost that counts now, and we can’t effectively coordinate our big parallel systems of fast little machines by having them all march to the same drum.
With all that in mind, let’s re-examine the bubble sort. Even its biggest haters must admit that it is profoundly simple: in essence, you just line up your unordered data elements, and keep swapping adjacent out-of-order pairs until they’re all sorted. It’s hard to imagine anything simpler that would work. I count that as a virtue.
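For reference, the whole sequential algorithm fits in a few lines. A minimal Python sketch (the pictures in this post come from a NetLogo model, so this is just an illustration of the idea):

```python
def bubble_sort(data):
    """Repeatedly swap adjacent out-of-order pairs until a full pass
    makes no swaps -- at which point everything is sorted."""
    data = list(data)  # work on a copy
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(data) - 1):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
    return data
```

Note that the only operation ever needed is a compare-and-swap between two adjacent elements, which is exactly what makes the parallel versions below possible.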
Now suppose that we have as many processors as data elements. (This is cheating, according to your parallel algorithms textbook, but let’s be bold.) They can’t all talk to each other directly, because the wires would take up too much room. Being physical objects, they must be put in some physical configuration: let’s just line them all up in a row, connecting nearest neighbors. Further, let’s imagine that these machines (“cells,” let’s call them) already come pre-programmed with the data we want to sort, one element per machine. (That last part might seem like cheating too, but input and output are a separate issue. For the moment, we can justify this assumption by supposing, reasonably enough, that the cheap little cells are connected to cheap little sensors which directly supply the data we care about, one sample at a time; and we only need access to one end of the row.)
Now, if we could only suppose that this array of machines was globally synchronized, we could have them do bubbly compare-and-swap exchanges in alternating “even” and “odd” pairs. We’d have caught up with computer scientist A. N. Habermann circa 1972: this is called “even-odd transposition sort” and it’s easy to see that it will always sort n data in n steps or fewer:
In the “space-time diagram” above, space is horizontal and time runs vertically downward. (Each row represents an instantaneous system state, and each column represents the history of a particular cell.) The data are grey pixels, initially in a random order, with the lightest placed on the far left and “stained” blue so we can see its trajectory across the array. In cellular automata (even-odd sort qualifies as such, given a broad enough definition) we’d say that the blue element is moving at the “speed of light”: one cell per time step — as fast as is possible.
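Simulated on a single machine, one synchronized step compares the “even”-indexed pairs and the next step the “odd” ones. A sketch in Python, again standing in for the array of cells:

```python
def odd_even_transposition_sort(data):
    """Synchronous even-odd transposition sort: n global steps
    always suffice to sort n elements."""
    data = list(data)
    n = len(data)
    for step in range(n):
        start = step % 2  # alternate between even- and odd-indexed pairs
        # in hardware, all these compare-and-swaps happen simultaneously
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

Each pass touches disjoint pairs, so nothing here depends on the order of the inner loop — that's why the n cells can all act at once.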
Now, the O(n²) work of bubble sort is spread evenly across our n cells, resulting in O(n) time complexity: better than any possible sequential sorting algorithm, no matter how clever. Many cells make fast work. But, we’d be relying on global synchronization, which perhaps we can’t really have in a truly scalable system. Let’s get rid of it and see what happens. To make the pictures easier to read, we’ll appeal to the “zero-one principle,” which states that an algorithm capable of sorting any sequence of zeros and ones can also sort any other sortable data. So instead of shades of grey, we’ll have black and white pixels.
On the left we have our unreasonably perfect even-odd sort, given a sequence of fifty white “ones” followed by fifty black “zeros.” This reverse-ordered sequence is, in some sense, as bad as things can get for our spatially constrained system, but it’s still sorted in 100 steps.
The other three diagrams show randomized transposition sorts of the same reverse-ordered sequence, finishing in 362, 349, and 338 steps respectively. Cells just pair up randomly at each step: those left without partners are colored red in the pictures. (This “monomer-dimer covering” of the linear array has been the subject of a book chapter by the celebrated math popularizer and applied mathematician par excellence Dr Steven Strogatz: in an infinite array, the expected proportion of red pixels would be exactly 1/e², about 13.5% — but I digress!)
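Here’s one way to sketch the randomized variant in Python. The pairing scheme below (greedily placing dimers left to right with probability 1/2) is just a simple stand-in — it produces a valid random monomer-dimer covering, though not necessarily with the same distribution as the one in the pictures. The global sortedness check is also a simulation convenience; the real cells have no such oracle.

```python
import random

def random_matching(n):
    """One random monomer-dimer covering of a row of n cells:
    non-overlapping adjacent pairs, placed greedily left to right.
    Returns the left index of each dimer; uncovered cells sit out
    this step (the 'red' monomers)."""
    pairs, i = [], 0
    while i < n - 1:
        if random.random() < 0.5:
            pairs.append(i)  # dimer covers cells i and i+1
            i += 2
        else:
            i += 1           # cell i left unpaired this step
    return pairs

def randomized_transposition_sort(data, rng_seed=None):
    """Compare-and-swap within randomly chosen adjacent pairs until
    sorted. Returns (sorted data, number of steps taken)."""
    if rng_seed is not None:
        random.seed(rng_seed)
    data = list(data)
    steps = 0
    # the sortedness test is for the simulation only -- a distributed
    # system would have no global view like this
    while any(data[i] > data[i + 1] for i in range(len(data) - 1)):
        for i in random_matching(len(data)):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        steps += 1
    return data, steps
```

Running it many times on the fifty-ones-then-fifty-zeros input and collecting the step counts is an easy way to reproduce the statistical picture discussed next.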
When working with a randomized process, it’s very helpful (not to say essential) to take a statistical view. The picture on the left shows ten separate runs of the same algorithm, all superimposed together. The picture on the right shows a hundred runs superimposed, which should give us some confidence (though not a proof!) that a little less than 4n steps should suffice to sort n data without any form of synchronization whatsoever.
But now we have a new mystery: why the vertical asymmetry? The diagram of the even-odd sort is perfectly symmetrical, but the randomized sort apparently takes only about half as long to strip the reverse-sorted zeros and ones from the ends of the array as it does to converge in the middle. The synchronized sorting process was already an irreversible one, since we throw away information with every potential swap. But by introducing another form of randomness throughout the process, we’ve made it even more irreversible. If you have some insight into this, please let me know!
As it turns out, it’s quite possible to dynamically synchronize a big distributed system in an entirely decentralized fashion. Some firefly species do this trick, and it’s how your heart beats. Even pendulum clocks hung on a wall will spontaneously fall into sync, as their inventor Christiaan Huygens noted (with some chagrin) back in 1665. Dr Strogatz has an enjoyable pop-science book about it. So the constant factor-of-four slowdown caused by obliterating all synchrony in the sorting process should really be taken as a theoretical worst case. Constant factors sometimes matter more than some algorithm textbooks would have you think... but anyway, the randomized parallel bubble sort is still O(n) in time complexity.
More interesting, to me at least, is the prospect of bubble-sorting composite data along multiple axes simultaneously. This is harder to make non-moving pictures of, but it’s a completely trivial extension of the approach already developed here. By accepting the spatially-embodied, physical nature of information, we get to take advantage of the structure of space itself in our algorithms, and perhaps even build entirely new kinds of computing machinery.
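To give a taste of the multi-axis idea on a grid (this is my own toy illustration, not the extension in the NetLogo model): there’s a classical lemma that if every row of a matrix is sorted, then sorting every column leaves the rows sorted — so a grid of cells doing row-wise and then column-wise bubble passes settles into a state where both axes are ordered at once. A minimal sketch, with Python’s `sorted` standing in for a row or column of bubbling cells:

```python
def sort_rows_then_columns(grid):
    """Sort each row, then each column, of a 2D list of comparables.
    By the row-column lemma, the column pass cannot un-sort the rows,
    so one pass of each leaves all rows AND all columns non-decreasing.
    Each sorted() call stands in for a line of cells doing bubble swaps."""
    grid = [sorted(row) for row in grid]          # row-wise pass
    for j in range(len(grid[0])):                 # column-wise pass
        col = sorted(grid[i][j] for i in range(len(grid)))
        for i, v in enumerate(col):
            grid[i][j] = v
    return grid
```

For example, `[[3, 1, 2], [9, 7, 8], [6, 4, 5]]` comes out as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` — every row and every column ordered.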
Here is the NetLogo model I used to make the pictures, and its extensions into multi-dimensional arrays. Enjoy!