Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Python has a built in function sum, which is effectively equivalent to:
def sum2(iterable, start=0):
return start + reduce(operator.add, iterable)
for all types of parameters except strings. It works for numbers and lists, for example:
sum([1,2,3], 0) = sum2([1,2,3],0) = 6 #Note: 0 is the default value for start, but I include it for clarity
sum({888:1}, 0) = sum2({888:1},0) = 888
Why were strings specially left out?
sum( ['foo','bar'], '') # TypeError: sum() can't sum strings [use ''.join(seq) instead]
sum2(['foo','bar'], '') = 'foobar'
I seem to remember discussions in the Python list for the reason, so an explanation or a link to a thread explaining it would be fine.
Edit: I am aware that the standard way is to do "".join. My question is why the option of using sum for strings was banned, and no banning was there for, say, lists.
Edit 2: Although I believe this is not needed given all the good answers I got, the question is: Why does sum work on an iterable containing numbers or an iterable containing lists but not an iterable containing strings?
Python tries to discourage you from "summing" strings. You're supposed to join them:
"".join(list_of_strings)
It's a lot faster, and uses much less memory.
A quick benchmark:
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = reduce(operator.add, strings)'
100 loops, best of 3: 8.46 msec per loop
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = "".join(strings)'
1000 loops, best of 3: 296 usec per loop
Edit (to answer OP's edit): As to why strings were apparently "singled out", I believe it's simply a matter of optimizing for a common case, as well as of enforcing best practice: you can join strings much faster with ''.join, so explicitly forbidding strings on sum will point this out to newbies.
BTW, this restriction has been in place "forever", i.e., since the sum was added as a built-in function (rev. 32347)
You can in fact use sum(..) to concatenate strings, if you use the appropriate starting object! Of course, if you go this far you have already understood enough to use "".join(..) anyway..
>>> class ZeroObject(object):
... def __add__(self, other):
... return other
...
>>> sum(["hi", "there"], ZeroObject())
'hithere'
Here's the source: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?revision=81029&view=markup
In the builtin_sum function we have this bit of code:
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
}
So.. that's your answer.
It's explicitly checked in the code and rejected.
From the docs:
The preferred, fast way to concatenate a
sequence of strings is by calling
''.join(sequence).
By making sum refuse to operate on strings, Python has encouraged you to use the correct method.
Short answer: Efficiency.
Long answer: The sum function has to create an object for each partial sum.
Assume that the amount of time required to create an object is directly proportional to the size of its data. Let N denote the number of elements in the sequence to sum.
doubles are always the same size, which makes sum's running time O(1)×N = O(N).
int (formerly known as long) is arbitary-length. Let M denote the absolute value of the largest sequence element. Then sum's worst-case running time is lg(M) + lg(2M) + lg(3M) + ... + lg(NM) = N×lg(M) + lg(N!) = O(N log N).
For str (where M = the length of the longest string), the worst-case running time is M + 2M + 3M + ... + NM = M×(1 + 2 + ... + N) = O(N²).
Thus, summing strings would be much slower than summing numbers.
str.join does not allocate any intermediate objects. It preallocates a buffer large enough to hold the joined strings, and copies the string data. It runs in O(N) time, much faster than sum.
The Reason Why
#dan04 has an excellent explanation for the costs of using sum on large lists of strings.
The missing piece as to why str is not allowed for sum is that many, many people were trying to use sum for strings, and not many use sum for lists and tuples and other O(n**2) data structures. The trap is that sum works just fine for short lists of strings, but then gets put in production where the lists can be huge, and the performance slows to a crawl. This was such a common trap that the decision was made to ignore duck-typing in this instance, and not allow strings to be used with sum.
Edit: Moved the parts about immutability to history.
Basically, its a question of preallocation. When you use a statement such as
sum(["a", "b", "c", ..., ])
and expect it to work similar to a reduce statement, the code generated looks something like
v1 = "" + "a" # must allocate v1 and set its size to len("") + len("a")
v2 = v1 + "b" # must allocate v2 and set its size to len("a") + len("b")
...
res = v10000 + "$" # must allocate res and set its size to len(v9999) + len("$")
In each of these steps a new string is created, which for one might give some copying overhead as the strings are getting longer and longer. But that’s maybe not the point here. What’s more important, is that every new string on each line must be allocated to it’s specific size (which. I don’t know it it must allocate in every iteration of the reduce statement, there might be some obvious heuristics to use and Python might allocate a bit more here and there for reuse – but at several points the new string will be large enough that this won’t help anymore and Python must allocate again, which is rather expensive.
A dedicated method like join, however has the job to figure out the real size of the string before it starts and would therefore in theory only allocate once, at the beginning and then just fill that new string, which is much cheaper than the other solution.
I dont know why, but this works!
import operator
def sum_of_strings(list_of_strings):
return reduce(operator.add, list_of_strings)
Related
Is there a built-in python function to count bit flip in a binary string? The question I am trying to solve is given a binary string of arbitrary length, how can I return the number of bit flips of the string. For example, the bit flip number of '01101' is 3, and '1110011' is 2, etc.
The way I can come up with to solve this problem is to use a for loop and a counter. However, that seems too lengthy. Is there a way I can do that faster? Or is there a built-in function in python that allows me to do that directly? Thanks for the help!
There is a very fast way to do that without any explicit loops and only using Python builtins: you can convert the string to a binary number, then detect all the bit flips using a XOR-based integer tricks and then convert the integer back to a string to count the number of bit flips. Here is the code:
# Convert the binary string `s` to an integer: "01101" -> 0b01101
n = int(s, 2)
# Build a binary mask to skip the most significant bit of n: 0b01101 -> 0b01111
mask = (1 << (len(s)-1)) - 1
# Check if the ith bit of n is different from the (i+1)th bit of n using a bit-wise XOR:
# 0b01101 & 0b01111 -> 0b1101 (discard the first bit)
# 0b01101 >> 1 -> 0b0110
# 0b1101 ^ 0b0110 -> 0b1011
bitFlips = (n & mask) ^ (n >> 1)
# Convert the integer back to a string and count the bit flips: 0b1011 -> "0b1011" -> 3
flipCount = bin(bitFlips).count('1')
This trick is much faster than other methods since integer operations are very optimized compare to a loop-based interpreted codes or the ones working on iterables. Here are performance results for a string of size 1000 on my machine:
ljdyer's solution: 96 us x1.0
Karl's solution: 39 us x2.5
This solution: 4 us x24.0
If you are working with short bounded strings, then there are even faster ways to count the number of bits set in an integer.
Don't know about a built in function, but here's a one-liner:
bit_flip_count = len([x for x in range(1, len(x0)) if x0[x] != x0[x-1]])
Given a sequence of values, you can find the number of times that the value changes by grouping contiguous values and then counting the groups. There will be one more group than the number of changes (since the elements before the first change are also in a group). (Of course, for an empty sequence, this gives you a result of -1; you may want to handle this case separately.)
Grouping in Python is built-in, via the standard library itertools.groupby. This tool only considers contiguous groups, which is often a drawback (if you want to make a histogram, for example, you have to sort the data first) but in our case is exactly what we want. The overall interface of this tool is a bit complex, but in our case we can use it simply:
from itertools import groupby
def changes_in(sequence):
return len(list(groupby(sequence))) - 1
Given a minimum length N and a string S of 1's and 0's (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a sub-string of length n containing all '0's. For example, given n=2 and the string "01000100", the number of non-overlapping "00"s is 2.
This is what I have done:
def myfunc(S,N):
return S.count('0'*N)
My question: is there a faster way of performing this for very long strings? This is from an online coding practice site and my code passes all but one of the test cases, which fails due to not being able to finish within a time limit. Doing some research it seems I can only find that count() is the fastest method for this.
This might be faster:
>>> s = "01000100"
>>> def my_count( a, n ) :
... parts = a.split('1')
... return sum( len(p)//n for p in parts )
...
>>> my_count(s, 2)
2
>>>
Worst case scenario for count() is O(N^2), the function above is strictly linear O(N). Here's the discussion where O(N^2) number came from: What's the computational cost of count operation on strings Python?
Also, you may always do this manually, without using split(), just loop over the string, reset counter (once saved counter // n somewhere) on 1 and increase counter on 0. This would definitely beat any other approach because strictly O(N).
Finally, for relatively large values of n (n > 10 ?), there might be a sub-linear (or still linear, but with a smaller constant) algorithm, which starts with comparing a[n-1] to 0, and going back to beginning. Chances are, there going to be a 1 somewhere, so we don't have to analyse the beginning of the string if a[n-1] is 1 -- simply because there's no way to fit enough zeros in there. Assuming we have found 1 at position k, the next position to compare would be a[k+n-1], again going back to the beginning of the string.
This way we can effectively skip most of the string during the search.
lenik posted a very good response that worked well. I also found another method faster than count() that I will post here as well. It uses the findall() method from the regex library:
import re
def my_count(a, n):
return len(re.findall('0'*n, a))
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Python has a built in function sum, which is effectively equivalent to:
def sum2(iterable, start=0):
return start + reduce(operator.add, iterable)
for all types of parameters except strings. It works for numbers and lists, for example:
sum([1,2,3], 0) = sum2([1,2,3],0) = 6 #Note: 0 is the default value for start, but I include it for clarity
sum({888:1}, 0) = sum2({888:1},0) = 888
Why were strings specially left out?
sum( ['foo','bar'], '') # TypeError: sum() can't sum strings [use ''.join(seq) instead]
sum2(['foo','bar'], '') = 'foobar'
I seem to remember discussions in the Python list for the reason, so an explanation or a link to a thread explaining it would be fine.
Edit: I am aware that the standard way is to do "".join. My question is why the option of using sum for strings was banned, and no banning was there for, say, lists.
Edit 2: Although I believe this is not needed given all the good answers I got, the question is: Why does sum work on an iterable containing numbers or an iterable containing lists but not an iterable containing strings?
Python tries to discourage you from "summing" strings. You're supposed to join them:
"".join(list_of_strings)
It's a lot faster, and uses much less memory.
A quick benchmark:
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = reduce(operator.add, strings)'
100 loops, best of 3: 8.46 msec per loop
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = "".join(strings)'
1000 loops, best of 3: 296 usec per loop
Edit (to answer OP's edit): As to why strings were apparently "singled out", I believe it's simply a matter of optimizing for a common case, as well as of enforcing best practice: you can join strings much faster with ''.join, so explicitly forbidding strings on sum will point this out to newbies.
BTW, this restriction has been in place "forever", i.e., since the sum was added as a built-in function (rev. 32347)
You can in fact use sum(..) to concatenate strings, if you use the appropriate starting object! Of course, if you go this far you have already understood enough to use "".join(..) anyway..
>>> class ZeroObject(object):
... def __add__(self, other):
... return other
...
>>> sum(["hi", "there"], ZeroObject())
'hithere'
Here's the source: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?revision=81029&view=markup
In the builtin_sum function we have this bit of code:
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
}
So.. that's your answer.
It's explicitly checked in the code and rejected.
From the docs:
The preferred, fast way to concatenate a
sequence of strings is by calling
''.join(sequence).
By making sum refuse to operate on strings, Python has encouraged you to use the correct method.
Short answer: Efficiency.
Long answer: The sum function has to create an object for each partial sum.
Assume that the amount of time required to create an object is directly proportional to the size of its data. Let N denote the number of elements in the sequence to sum.
doubles are always the same size, which makes sum's running time O(1)×N = O(N).
int (formerly known as long) is arbitary-length. Let M denote the absolute value of the largest sequence element. Then sum's worst-case running time is lg(M) + lg(2M) + lg(3M) + ... + lg(NM) = N×lg(M) + lg(N!) = O(N log N).
For str (where M = the length of the longest string), the worst-case running time is M + 2M + 3M + ... + NM = M×(1 + 2 + ... + N) = O(N²).
Thus, summing strings would be much slower than summing numbers.
str.join does not allocate any intermediate objects. It preallocates a buffer large enough to hold the joined strings, and copies the string data. It runs in O(N) time, much faster than sum.
The Reason Why
#dan04 has an excellent explanation for the costs of using sum on large lists of strings.
The missing piece as to why str is not allowed for sum is that many, many people were trying to use sum for strings, and not many use sum for lists and tuples and other O(n**2) data structures. The trap is that sum works just fine for short lists of strings, but then gets put in production where the lists can be huge, and the performance slows to a crawl. This was such a common trap that the decision was made to ignore duck-typing in this instance, and not allow strings to be used with sum.
Edit: Moved the parts about immutability to history.
Basically, its a question of preallocation. When you use a statement such as
sum(["a", "b", "c", ..., ])
and expect it to work similar to a reduce statement, the code generated looks something like
v1 = "" + "a" # must allocate v1 and set its size to len("") + len("a")
v2 = v1 + "b" # must allocate v2 and set its size to len("a") + len("b")
...
res = v10000 + "$" # must allocate res and set its size to len(v9999) + len("$")
In each of these steps a new string is created, which for one might give some copying overhead as the strings are getting longer and longer. But that’s maybe not the point here. What’s more important, is that every new string on each line must be allocated to it’s specific size (which. I don’t know it it must allocate in every iteration of the reduce statement, there might be some obvious heuristics to use and Python might allocate a bit more here and there for reuse – but at several points the new string will be large enough that this won’t help anymore and Python must allocate again, which is rather expensive.
A dedicated method like join, however has the job to figure out the real size of the string before it starts and would therefore in theory only allocate once, at the beginning and then just fill that new string, which is much cheaper than the other solution.
I dont know why, but this works!
import operator
def sum_of_strings(list_of_strings):
return reduce(operator.add, list_of_strings)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Python has a built in function sum, which is effectively equivalent to:
def sum2(iterable, start=0):
return start + reduce(operator.add, iterable)
for all types of parameters except strings. It works for numbers and lists, for example:
sum([1,2,3], 0) = sum2([1,2,3],0) = 6 #Note: 0 is the default value for start, but I include it for clarity
sum({888:1}, 0) = sum2({888:1},0) = 888
Why were strings specially left out?
sum( ['foo','bar'], '') # TypeError: sum() can't sum strings [use ''.join(seq) instead]
sum2(['foo','bar'], '') = 'foobar'
I seem to remember discussions in the Python list for the reason, so an explanation or a link to a thread explaining it would be fine.
Edit: I am aware that the standard way is to do "".join. My question is why the option of using sum for strings was banned, and no banning was there for, say, lists.
Edit 2: Although I believe this is not needed given all the good answers I got, the question is: Why does sum work on an iterable containing numbers or an iterable containing lists but not an iterable containing strings?
Python tries to discourage you from "summing" strings. You're supposed to join them:
"".join(list_of_strings)
It's a lot faster, and uses much less memory.
A quick benchmark:
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = reduce(operator.add, strings)'
100 loops, best of 3: 8.46 msec per loop
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = "".join(strings)'
1000 loops, best of 3: 296 usec per loop
Edit (to answer OP's edit): As to why strings were apparently "singled out", I believe it's simply a matter of optimizing for a common case, as well as of enforcing best practice: you can join strings much faster with ''.join, so explicitly forbidding strings on sum will point this out to newbies.
BTW, this restriction has been in place "forever", i.e., since the sum was added as a built-in function (rev. 32347)
You can in fact use sum(..) to concatenate strings, if you use the appropriate starting object! Of course, if you go this far you have already understood enough to use "".join(..) anyway..
>>> class ZeroObject(object):
... def __add__(self, other):
... return other
...
>>> sum(["hi", "there"], ZeroObject())
'hithere'
Here's the source: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?revision=81029&view=markup
In the builtin_sum function we have this bit of code:
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
}
So.. that's your answer.
It's explicitly checked in the code and rejected.
From the docs:
The preferred, fast way to concatenate a
sequence of strings is by calling
''.join(sequence).
By making sum refuse to operate on strings, Python has encouraged you to use the correct method.
Short answer: Efficiency.
Long answer: The sum function has to create an object for each partial sum.
Assume that the amount of time required to create an object is directly proportional to the size of its data. Let N denote the number of elements in the sequence to sum.
doubles are always the same size, which makes sum's running time O(1)×N = O(N).
int (formerly known as long) is arbitary-length. Let M denote the absolute value of the largest sequence element. Then sum's worst-case running time is lg(M) + lg(2M) + lg(3M) + ... + lg(NM) = N×lg(M) + lg(N!) = O(N log N).
For str (where M = the length of the longest string), the worst-case running time is M + 2M + 3M + ... + NM = M×(1 + 2 + ... + N) = O(N²).
Thus, summing strings would be much slower than summing numbers.
str.join does not allocate any intermediate objects. It preallocates a buffer large enough to hold the joined strings, and copies the string data. It runs in O(N) time, much faster than sum.
The Reason Why
#dan04 has an excellent explanation for the costs of using sum on large lists of strings.
The missing piece as to why str is not allowed for sum is that many, many people were trying to use sum for strings, and not many use sum for lists and tuples and other O(n**2) data structures. The trap is that sum works just fine for short lists of strings, but then gets put in production where the lists can be huge, and the performance slows to a crawl. This was such a common trap that the decision was made to ignore duck-typing in this instance, and not allow strings to be used with sum.
Edit: Moved the parts about immutability to history.
Basically, its a question of preallocation. When you use a statement such as
sum(["a", "b", "c", ..., ])
and expect it to work similar to a reduce statement, the code generated looks something like
v1 = "" + "a" # must allocate v1 and set its size to len("") + len("a")
v2 = v1 + "b" # must allocate v2 and set its size to len("a") + len("b")
...
res = v10000 + "$" # must allocate res and set its size to len(v9999) + len("$")
In each of these steps a new string is created, which for one might give some copying overhead as the strings are getting longer and longer. But that’s maybe not the point here. What’s more important, is that every new string on each line must be allocated to it’s specific size (which. I don’t know it it must allocate in every iteration of the reduce statement, there might be some obvious heuristics to use and Python might allocate a bit more here and there for reuse – but at several points the new string will be large enough that this won’t help anymore and Python must allocate again, which is rather expensive.
A dedicated method like join, however has the job to figure out the real size of the string before it starts and would therefore in theory only allocate once, at the beginning and then just fill that new string, which is much cheaper than the other solution.
I dont know why, but this works!
import operator
def sum_of_strings(list_of_strings):
return reduce(operator.add, list_of_strings)
I've been immensely frustrated with many of the implementations of python radix sort out there on the web.
They consistently use a radix of 10 and get the digits of the numbers they iterate over by dividing by a power of 10 or taking the log10 of the number. This is incredibly inefficient, as log10 is not a particularly quick operation compared to bit shifting, which is nearly 100 times faster!
A much more efficient implementation uses a radix of 256 and sorts the number byte by byte. This allows for all of the 'byte getting' to be done using the ridiculously quick bit operators. Unfortunately, it seems that absolutely nobody out there has implemented a radix sort in python that uses bit operators instead of logarithms.
So, I took matters into my own hands and came up with this beast, which runs at about half the speed of sorted on small arrays and runs nearly as quickly on larger ones (e.g. len around 10,000,000):
import itertools
def radix_sort(unsorted):
"Fast implementation of radix sort for any size num."
maximum, minimum = max(unsorted), min(unsorted)
max_bits = maximum.bit_length()
highest_byte = max_bits // 8 if max_bits % 8 == 0 else (max_bits // 8) + 1
min_bits = minimum.bit_length()
lowest_byte = min_bits // 8 if min_bits % 8 == 0 else (min_bits // 8) + 1
sorted_list = unsorted
for offset in xrange(lowest_byte, highest_byte):
sorted_list = radix_sort_offset(sorted_list, offset)
return sorted_list
def radix_sort_offset(unsorted, offset):
"Helper function for radix sort, sorts each offset."
byte_check = (0xFF << offset*8)
buckets = [[] for _ in xrange(256)]
for num in unsorted:
byte_at_offset = (num & byte_check) >> offset*8
buckets[byte_at_offset].append(num)
return list(itertools.chain.from_iterable(buckets))
This version of radix sort works by finding which bytes it has to sort by (if you pass it only integers below 256, it'll sort just one byte, etc.) then sorting each byte from LSB up by dumping them into buckets in order then just chaining the buckets together. Repeat this for each byte that needs to be sorted and you have your nice sorted array in O(n) time.
However, it's not as fast as it could be, and I'd like to make it faster before I write about it as a better radix sort than all the other radix sorts out there.
Running cProfile on this tells me that a lot of time is being spent on the append method for lists, which makes me think that this block:
for num in unsorted:
byte_at_offset = (num & byte_check) >> offset*8
buckets[byte_at_offset].append(num)
in radix_sort_offset is eating a lot of time. This is also the block that, if you really look at it, does 90% of the work for the whole sort. This code looks like it could be numpy-ized, which I think would result in quite a performance boost. Unfortunately, I'm not very good with numpy's more complex features so haven't been able to figure that out. Help would be very appreciated.
I'm currently using itertools.chain.from_iterable to flatten the buckets, but if anyone has a faster suggestion I'm sure it would help as well.
Originally, I had a get_byte function that returned the nth byte of a number, but inlining the code gave me a huge speed boost so I did it.
Any other comments on the implementation or ways to squeeze out more performance are also appreciated. I want to hear anything and everything you've got.
You already realized that
for num in unsorted:
byte_at_offset = (num & byte_check) >> offset*8
buckets[byte_at_offset].append(num)
is where most of the time goes - good ;-)
There are two standard tricks for speeding that kind of thing, both having to do with moving invariants out of loops:
Compute "offset*8" outside the loop. Store it in a local variable. Save a multiplication per iteration.
Add bucketappender = [bucket.append for bucket in buckets] outside the loop. Saves a method lookup per iteration.
Combine them, and the loop looks like:
for num in unsorted:
bucketappender[(num & byte_check) >> ofs8](num)
Collapsing it to one statement also saves a pair of local vrbl store/fetch opcodes per iteration.
But, at a higher level, the standard way to speed radix sort is to use a larger radix. What's magical about 256? Nothing, apart from that it's convenient for bit-shifting. But so are 512, 1024, 2048 ... it's a classical time/space tradeoff.
PS: for very long numbers,
(num >> offset*8) & 0xff
will run faster. That's because your num & byte_check takes time proportional to log(num) - it generally has to create an integer about as big as num.
This is an old thread, but I came across this when looking to radix sort an array of positive integers. I was trying to see if I can do any better than the already wickedly fast timsort (hats off to you again, Tim Peters) which implements python's builtin sorted and sort! Either I don't understand certain aspects of the above code, or if I do, the code as presented above has some problems IMHO.
It only sorts bytes starting with the highest byte of the smallest item and ending with the highest byte of the biggest item. This may be okay in some cases of special data. But in general the approach fails to differentiate items which differ on account of the lower bits. For example:
arr=[65535,65534]
radix_sort(arr)
produces the wrong output:
[65535, 65534]
The range used to loop over the helper function is not correct. What I mean is that if lowest_byte and highest_byte are the same, execution of the helper function is altogether skipped. BTW I had to change xrange to range in 2 places.
With modifications to address the above 2 points, I got it to work. But it is taking 10-20 times the time of python's builtin sorted or sort! I know timsort is very efficient and takes advantage of already sorted runs in the data. But I was trying to see if I can use the prior knowledge that my data is all positive integers to some advantage in my sorting. Why is the radix sort doing so badly compared to timsort? The array sizes I was using are in the order of 80K items. Is it because the timsort implementation in addition to its algorithmic efficiency has also other efficiencies stemming from possible use of low level libraries? Or am I missing something entirely? The modified code I used is below:
import itertools
def radix_sort(unsorted):
"Fast implementation of radix sort for any size num."
maximum, minimum = max(unsorted), min(unsorted)
max_bits = maximum.bit_length()
highest_byte = max_bits // 8 if max_bits % 8 == 0 else (max_bits // 8) + 1
# min_bits = minimum.bit_length()
# lowest_byte = min_bits // 8 if min_bits % 8 == 0 else (min_bits // 8) + 1
sorted_list = unsorted
# xrange changed to range, lowest_byte deleted from the arguments
for offset in range(highest_byte):
sorted_list = radix_sort_offset(sorted_list, offset)
return sorted_list
def radix_sort_offset(unsorted, offset):
"Helper function for radix sort, sorts each offset."
byte_check = (0xFF << offset*8)
# xrange changed to range
buckets = [[] for _ in range(256)]
for num in unsorted:
byte_at_offset = (num & byte_check) >> offset*8
buckets[byte_at_offset].append(num)
return list(itertools.chain.from_iterable(buckets))
You could simply use one of the existing C or C++ implementations, such
as example, integer_sort from Boost.Sort or u4_sort from usort. It is surprisingly easy to call native C or C++ code from Python, see How to sort an array of integers faster than quicksort?
I totally get your frustration. Although it's been more than 2 years, numpy still does not have radix sort. I will let the NumPy developers know that they could simply grab one of the existing implementations; licensing should not be an issue.