When we have to sort huge sequences, we naturally represent the sequence as a list of arrays, and bucket sort is then a natural sorting strategy. Bucket sort consists of two steps: (1) distribute the numbers into p buckets, so that all numbers in the i-th bucket are smaller than those in the (i+1)-th bucket; (2) apply qsort to every bucket. A basic implementation of this method is in the file bucket_sort.c. Bucket sort belongs to the category of "sorting by distribution", a category that also includes "radix sort". It sorts sequences of size n in time O(n log(n)).
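The two steps above can be sketched in C as follows. This is a minimal illustration and not the contents of bucket_sort.c: the names bucket_sort, bucket_of and compare_ints are hypothetical, the numbers are assumed to be ints in a known range [lo, hi), the buckets are assumed to have equal width, and for brevity the buckets are stored as consecutive slices of one temporary array rather than as a list of separate arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Index of the bucket that receives v.
   Assumption: lo <= v < hi, and the p buckets have equal width. */
static int bucket_of(int v, int lo, int hi, int p)
{
    assert(lo <= v && v < hi);
    long long width = (long long)hi - lo;
    return (int)(((long long)v - lo) * p / width);
}

/* Sorts a[0..n-1] in place with p buckets.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int bucket_sort(int *a, int n, int p, int lo, int hi)
{
    if (n <= 0 || p <= 0 || hi <= lo) return -1;

    int *start = calloc((size_t)p + 1, sizeof *start); /* bucket offsets */
    int *fill = malloc((size_t)p * sizeof *fill);      /* write cursors  */
    int *tmp = malloc((size_t)n * sizeof *tmp);        /* bucket storage */
    if (!start || !fill || !tmp) {
        free(start); free(fill); free(tmp);
        return -1;
    }

    /* Step 1: count the size of every bucket, turn the counts into
       starting offsets, and copy each number into its bucket. */
    for (int i = 0; i < n; i++)
        start[bucket_of(a[i], lo, hi, p) + 1]++;
    for (int b = 0; b < p; b++)
        start[b + 1] += start[b];
    for (int b = 0; b < p; b++)
        fill[b] = start[b];
    for (int i = 0; i < n; i++)
        tmp[fill[bucket_of(a[i], lo, hi, p)]++] = a[i];

    /* Step 2: apply qsort to every bucket. */
    for (int b = 0; b < p; b++)
        qsort(tmp + start[b], (size_t)(start[b + 1] - start[b]),
              sizeof *tmp, compare_ints);

    /* The buckets are already in increasing order, so copying them
       back one after the other yields the sorted sequence. */
    for (int i = 0; i < n; i++)
        a[i] = tmp[i];

    free(start); free(fill); free(tmp);
    return 0;
}
```

For an array a of eight ints in [0, 100), the call bucket_sort(a, 8, 4, 0, 100) sorts a in place using four buckets of width 25. If the numbers are not spread evenly over [lo, hi), the buckets become unbalanced and a different rule for choosing bucket boundaries would be needed.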
The choice of p for the number of buckets is suggestive: it leads immediately to a parallel version of bucket sort. The manager distributes the sequence among the processors so that every processor (including the manager node) has one bucket to sort. Every processor applies qsort to its bucket. The manager then gathers the sorted buckets. We showed that, when the size n of the sequence dominates p, an even distribution of the numbers over the buckets leads to an optimal speedup. Also, for sufficiently large n, the computation time (i.e., the total cost of the comparisons) dominates the communication time.
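One way to see why an even distribution gives an optimal speedup is the following back-of-the-envelope calculation. It is a sketch under stated assumptions, not necessarily the analysis referred to above: sorting m numbers is assumed to cost c m log_2(m) for some constant c (a hypothetical cost model), every bucket is assumed to hold exactly n/p numbers, and the cost of distributing and gathering the numbers is ignored.

```latex
\begin{align*}
T_{\mathrm{seq}}(n,p) &= p \cdot c\,\frac{n}{p}\log_2\frac{n}{p}
                       = c\,n\log_2\frac{n}{p}, \\
T_{\mathrm{par}}(n,p) &= c\,\frac{n}{p}\log_2\frac{n}{p}, \\
S(n,p) &= \frac{T_{\mathrm{seq}}(n,p)}{T_{\mathrm{par}}(n,p)} = p .
\end{align*}
% Compared with one qsort of the whole sequence, at cost c n log_2(n):
\[
\frac{c\,n\log_2 n}{c\,\frac{n}{p}\log_2\frac{n}{p}}
  = p\,\frac{\log_2 n}{\log_2 n - \log_2 p}
  \;\longrightarrow\; p
  \quad\text{as } n \to \infty \text{ with } p \text{ fixed.}
\]
```

Under these assumptions the speedup over sequential bucket sort with the same p buckets equals p, and the ratio against a single qsort of the whole sequence approaches p once n dominates p.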
Encouraged by this promising performance analysis of the parallel bucket sort, we left its coding as an exercise.
Bibliography