Comparison based sorting needs Ω(lgn!)=Ω(n lgn)
comparisons because each comparison yields one bit of information
and there are n! permutations. If you are sorting integers then
you can do better, and not only asymptotically. Radix sort works
as follows:
deque<int> bucket[2][256];
void rsort(vector<int>& v) {
int i, j, k;
for (i = 0; i < v.size(); ++i)
bucket[0][v[i]&0x000000ff].push_back(v[i]);
for (i = 0; i < 256; ++i) {
while (!bucket[0][i].empty()) {
int &e = bucket[0][i].front();
bucket[1][(e&0x0000ff00)>>8].push_back(e);
bucket[0][i].pop_front();
}
}
for (i = 0; i < 256; ++i) {
while (!bucket[1][i].empty()) {
int &e = bucket[1][i].front();
bucket[0][(e&0x00ff0000)>>16].push_back(e);
bucket[1][i].pop_front();
}
}
for (i = 0; i < 256; ++i) {
while (!bucket[0][i].empty()) {
int &e = bucket[0][i].front();
bucket[1][(e&0xff000000)>>8].push_back(e);
bucket[0][i].pop_front();
}
}
k = 0;
for (i = 0; i < 256; ++i) {
while (!bucket[1][i].empty()) {
v[k++] = bucket[1][i].front();
bucket[1][i].pop_front();
}
}
}
This straightforward (and somewhat repetitive) code is faster than
std::sort if you sort more than one thousand integers.
Whenever someone says that radix sort works in linear time there
is someone that feels compelled to point out that saying “linear
time” is cheating. I really don’t care whether it “works in linear
time” or not, but I can tell you that it is about two to three times
faster than std::sort. We can still analyze the runtime and
get a better idea of just how fast it is, but we need to be a bit
more precise.
Let’s say we work on a computer with 2k bit words. Typical computers
nowadays have k=5 or k=6, and the code above assumes k=5. We
group the bits of a word in digits, each with 2w bits.
In the example above w=3. We have b=22w buckets
and d=2k−w digits. The number of operations of the code above
is proportional to n+(b+n)d=n+(22w+n)2k−w. We can’t
change how many numbers we must sort (n) or the machine on which
the program runs (k). But we can change w. What is the optimum
value? If we let the derivative with respect to w equal 0 we get
22w+wln2=2w+wn. That looks horrible. But it has one interesting
feature: It doesn’t contain k, which means that the optimum
number of bits per digit does not depend on the machine on which
we do the sorting. It can be a 8 bit machine, a 32 bit machine,
a 64 bit machine, or even a 8192 bit machine, we should still
have the same number of bits per digit for a certain size of the
list that we are sorting.
We now turn to approximations. It’s probably safe to assume 2w≫ w
and wn≫2w. We are left with 22wln2=wn, which is still
horrible. Since the left side varies much faster than the
right one we can replace the w on the right with a (yet unknown)
suitable constant w′. We get that 2w varies as O(lgn).
(Note that this is an upper bound.) If we plug this back
into the running time formula we finally get O(n 2k/lgn).
Here 2k is how many bits the machine has per word and lgn
is how many bits we need to write down the length of the list we
are sorting. So radix sort should be quite good when we have to sort
lots of numbers. And indeed, as I said, it clearly beats std:sort.
Now, if you have to sort really lots of numbers, then you
should probably look at funnelsort.