Imagine you want to process all pairs of the numbers 0 to n-1. For example, for n = 4 these are the six pairs:
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
Three ways to create those pairs:
list(combinations(range(n), 2))
[(i, j) for i, j in combinations(range(n), 2)]
[(i, j) for i in range(n) for j in range(i+1, n)]
Benchmark results for n = 1000:
44.1 ms ± 0.2 ms f_combinations_pure
57.7 ms ± 0.3 ms f_combinations
66.6 ms ± 0.1 ms f_ranges
Note I'm not really interested in just storing the pairs (i, j). That's just a minimal example usage of i and j, so that we can compare different approaches without much overhead. In reality, you want to do something with i and j, for example [my_string[i:j] for ...] to get substrings (the question whose comments inspired this one). So the list(combinations(...)) one doesn't really count here, and I show it just to make that clear (although I still liked seeing how fast it is).
Question 1: Why is f_ranges slower than f_combinations? Its for i in runs only n times overall, so it's insignificant compared to the for j in, which runs n*(n-1)/2 times. And for j in range(...) only assigns one number, whereas for i, j in combinations(...) builds and assigns pairs of numbers, so the latter should be slower. Why is it faster?
Question 2: What's the fastest way you can come up with? For a fair comparison, it should be a list comprehension [(i, j) for ...] producing the same list of pairs.
(As I'm including an answer myself (which is encouraged), I'm including benchmark code there.)
About question 1: Why is range slower than combinations?
While for j in range(...) indeed has the advantage of assigning just one number, it has the disadvantage of creating them over and over again. In Python, numbers are objects, and their creation (and deletion) takes a little time.
combinations(...) on the other hand first creates and stores the number objects only once, and then reuses them over and over again in the pairs. You might think "Hold on, it can reuse the numbers, but it produces the pairs as tuple objects, so it also creates one object per iteration!". Well... it has an optimization. It actually reuses the same tuple object over and over again, filling it with different numbers. "What? No way! Tuples are immutable!" Well... ostensibly they're immutable, yes. But if the combinations iterator sees that there are no other references to its result tuple, then it "cheats" and modifies it anyway. At the C code level, it can do that. And if nothing else has a reference to it, then there's no harm. Note that for i, j in ... unpacks the tuple and doesn't keep a reference to it. If you instead use for pair in ..., then pair is a reference to it and the optimization isn't applied and indeed a new result tuple gets created every time. See the source code of combinations_next if you're interested.
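A small sketch of the tuple reuse described above. The identity check relies on CPython implementation details (reference counting, and `combinations_next` refilling its result tuple when nobody else holds it), so treat `True` there as what you would typically see on CPython, not as guaranteed behavior. The equality checks hold on any Python, and they show why the "cheat" is safe: a tuple you still hold is never modified.

```python
from itertools import combinations

it = combinations(range(4), 2)

# Keep a reference to the first pair: the iterator must not recycle it,
# so the tuple we hold stays (0, 1) even after advancing.
kept = next(it)
second = next(it)
print(kept, second)  # (0, 1) (0, 2)

# Drop our reference before advancing: the iterator is then the only owner
# of its result tuple, and CPython may refill that same tuple in place.
third = next(it)
third_id = id(third)
del third
fourth = next(it)
print(fourth, id(fourth) == third_id)  # typically "(1, 2) True" on CPython
```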
About question 2: What's the fastest way?
I found four faster ways:
44.1 ms ± 0.2 ms f_combinations_pure
51.7 ms ± 0.1 ms f_list
52.7 ms ± 0.2 ms f_tee
53.6 ms ± 0.1 ms f_copy_iterator
54.6 ms ± 0.1 ms f_tuples
57.7 ms ± 0.3 ms f_combinations
66.6 ms ± 0.1 ms f_ranges
All four faster ways avoid what made the range solution slow: Instead of creating (and deleting) Θ(n²) int objects, they reuse the same ones over and over again.
f_tuples puts them into a tuple and iterates over slices:
def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j)
            for i in nums
            for j in nums[i+1:]]
f_list puts them into a list and then before each j-loop, it removes the first number:
def f_list(n):
    js = list(range(n))
    return [(i, j)
            for i in range(n)
            if [js.pop(0)]
            for j in js]
f_copy_iterator puts them into a tuple, then uses an iterator for i and a copy of that iterator for j (which is an iterator starting one position after i):
from copy import copy
def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j)
            for i in nums
            for j in copy(nums)]
f_tee uses itertools.tee for a similar effect as copy. Its JS is the main iterator of j values, and before each j-loop, it discards the first value and then tees JS to get a second iterator of the remaining values:
from itertools import tee
def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]
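A quick sanity check, assuming you want to confirm that the slice-based, pop-based and copied-iterator variants build exactly the same list as `combinations` for a few small n. f_tee is left out only to keep the sketch short. The comments describe why each variant avoids creating new int objects.

```python
from copy import copy
from itertools import combinations

def f_combinations(n):
    return [(i, j) for i, j in combinations(range(n), 2)]

def f_tuples(n):
    nums = tuple(range(n))
    # Slicing a tuple copies references, so no new int objects are created.
    return [(i, j) for i in nums for j in nums[i+1:]]

def f_list(n):
    js = list(range(n))
    # The condition is a one-element list (always truthy); popping shrinks js
    # so that the inner loop starts right after i.
    return [(i, j) for i in range(n) if [js.pop(0)] for j in js]

def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    # A copy of the outer iterator resumes right after the current i.
    return [(i, j) for i in nums for j in copy(nums)]

for n in range(8):
    expected = f_combinations(n)
    for f in (f_tuples, f_list, f_copy_iterator):
        assert f(n) == expected, (f.__name__, n)
print("all variants agree")
```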
Bonus question: Is it worth it to optimize like those faster ways?
Meh, probably not. You'd probably best just use for i, j in combinations(...). The faster ways aren't much faster, and they're somewhat more complicated. Plus, in reality you'll actually do something with i and j (like getting substrings), so the already small relative speed advantage shrinks even further.
But I hope you at least found this interesting and perhaps learned something new that is useful some day.
Full benchmark code
def f_combinations_pure(n):
    return list(combinations(range(n), 2))
def f_combinations(n):
    return [(i, j) for i, j in combinations(range(n), 2)]
def f_ranges(n):
    return [(i, j) for i in range(n) for j in range(i+1, n)]
def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j) for i in nums for j in nums[i+1:]]
def f_list(n):
    js = list(range(n))
    return [(i, j) for i in range(n) if [js.pop(0)] for j in js]
def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j) for i in nums for j in copy(nums)]
def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]
fs = [
    f_combinations_pure,
    f_combinations,
    f_ranges,
    f_tuples,
    f_list,
    f_copy_iterator,
    f_tee
]
from timeit import default_timer as time
from itertools import combinations, tee
from statistics import mean, stdev
from random import shuffle
from copy import copy
# Correctness
expect = fs[0](1000)
for f in fs:
    result = f(1000)
    assert result == expect
# Prepare for timing
times = {f: [] for f in fs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):4.1f} ms ± {stdev(ts):3.1f} ms '
# Timing
for i in range(25):
    shuffle(fs)
    for f in fs:
        start = time()
        result = f(1000)
        end = time()
        times[f].append(end - start)
        del result
# Results
for f in sorted(fs, key=stats):
    print(stats(f), f.__name__)
Related
I found this one-line function on the Python wiki that creates the power set: a list of all the subsets that can be formed from the elements of a list passed as an argument.
f = lambda x: [[y for j, y in enumerate(set(x)) if (i >> j) & 1] for i in range(2**len(set(x)))]
Can someone please explain how this function works?
To construct the power set, iterating over range(2**len(set(x))) gives you all the binary combinations of the set. For a 5-element set, the values of i in binary are:
00000, 00001, 00010, ..., 11110, 11111
Now you just need to test whether bit j is set in i to decide whether to include the j-th element in the subset, e.g.:
>>> i = 0b10010
>>> [y for j, y in enumerate(range(5)) if (i >> j) & 1]
[1, 4]
Though I'm not sure how efficient it is given the call to set(x) for every iteration. There is a small hack that would avoid that:
f = lambda x: [[y for j, y in enumerate(s) if (i >> j) & 1] for s in [set(x)] for i in range(2**len(s))]
A couple of other forms using itertools:
import itertools as it
f1 = lambda x: [list(it.compress(s, i)) for s in [set(x)] for i in it.product((0,1), repeat=len(s))]
f2 = lambda x: list(it.chain.from_iterable(it.combinations(set(x), r) for r in range(len(set(x))+1)))
Note: if you remove the list() call, this last one returns an iterator instead of a list; depending on the use case, this could save some memory.
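A small check, assuming you only care about which subsets come out and not their order or container type: the three one-liners from above (f with the hoisted set(x) hack) should produce the same 2**len(set(x)) subsets. The normalized helper is a hypothetical name introduced just for the comparison.

```python
import itertools as it

f = lambda x: [[y for j, y in enumerate(s) if (i >> j) & 1]
               for s in [set(x)] for i in range(2**len(s))]
f1 = lambda x: [list(it.compress(s, i))
                for s in [set(x)] for i in it.product((0, 1), repeat=len(s))]
f2 = lambda x: list(it.chain.from_iterable(
    it.combinations(set(x), r) for r in range(len(set(x)) + 1)))

def normalized(subsets):
    # Order differs between the methods (and f2 yields tuples),
    # so compare them as a set of frozensets.
    return {frozenset(s) for s in subsets}

data = [3, 1, 2, 3]  # the duplicate collapses, so the set has 3 elements
assert normalized(f(data)) == normalized(f1(data)) == normalized(f2(data))
print(len(f(data)))  # 8, i.e. 2**3 subsets
```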
Looking at some timings of a list of 25 random numbers 0-50:
%%timeit binary: 1 loop, best of 3: 20.1 s per loop
%%timeit binary+hack: 1 loop, best of 3: 17.9 s per loop
%%timeit compress/product: 1 loop, best of 3: 5.27 s per loop
%%timeit chain/combinations: 1 loop, best of 3: 659 ms per loop
Let's rewrite it a bit and break it down step by step:
f = lambda x: [[y for j, y in enumerate(set(x)) if (i >> j) & 1] for i in range(2**len(set(x)))]
is equivalent to:
def f(x):
    n = len(set(x))
    sets = []
    for i in range(2**n):  # all combinations of members of the set in binary
        set_i = []
        for j, y in enumerate(set(x)):
            if (i >> j) & 1:  # check if bit nr j is set
                set_i.append(y)
        sets.append(set_i)
    return sets
For an input list like [1,2,3,4], the following happens:
n=4
range(2**n)=[0,1,2,3...15]
which, in binary is:
0,1,10,11,100...1110,1111
enumerate pairs each element y with its index j, giving (j, y) tuples, so in our case:
[(0,1),(1,2),(2,3),(3,4)]
The (i>>j) & 1 part might require some explanation.
(i>>j) shifts the number i by j places to the right, e.g. in decimal 4>>2 = 1, or in binary 100>>2 = 001. The & is the bitwise AND operator: for every bit position it checks whether the bit is 1 in both operands and returns the result as a number, acting like a filter: 10111 & 11001 = 10001.
In our case, (i>>j) & 1 checks whether the bit at place j of i is 1. If it is, the corresponding value y is added to the result list. This way the binary map of combinations is converted to a list of lists, which is returned.
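A short sketch of the bit test described above; subset_for is a hypothetical helper name that isolates the inner comprehension so one value of i can be inspected at a time.

```python
def subset_for(i, items):
    # Keep items[j] exactly when bit j of i is set.
    return [y for j, y in enumerate(items) if (i >> j) & 1]

items = [1, 2, 3, 4]
print(bin(4 >> 2))                # 0b1: shifting 100 right by 2 places
print(bin(0b10111 & 0b11001))     # 0b10001: only bits set in both survive
print(subset_for(0b1010, items))  # [2, 4]: bits 1 and 3 are set
print(len([subset_for(i, items) for i in range(2**len(items))]))  # 16
```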
How to best write a Python function (check_list) to efficiently test if an element (x) occurs at least n times in a list (l)?
My first thought was:
def check_list(l, x, n):
    return l.count(x) >= n
But this doesn't short-circuit once x has been found n times, and it is always O(len(l)).
A simple approach that does short-circuit would be:
def check_list(l, x, n):
    count = 0
    for item in l:
        if item == x:
            count += 1
            if count == n:
                return True
    return False
I also have a more compact short-circuiting solution with a generator:
def check_list(l, x, n):
    gen = (1 for item in l if item == x)
    return all(next(gen, 0) for i in range(n))
Are there other good solutions? What is the best efficient approach?
Instead of incurring extra overhead by setting up a range object and using all, which has to test the truthiness of each item, you could use itertools.islice to advance the generator n-1 steps, and then return the next item in the slice if it exists, or a default False if not:
from itertools import islice
def check_list(lst, x, n):
    gen = (True for i in lst if i == x)
    return next(islice(gen, n-1, None), False)
Note that like list.count, itertools.islice also runs at C speed. And this has the extra advantage of handling iterables that are not lists.
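A usage sketch of the islice version, assuming n >= 1 (islice rejects a negative start). Because it short-circuits and never calls len(), it also accepts a plain iterator, even an infinite one, provided at least n matches eventually appear.

```python
from itertools import cycle, islice

def check_list(lst, x, n):
    # Lazily yield True for each match; islice skips the first n-1 matches
    # and next() returns the n-th one, or False if there are fewer than n.
    gen = (True for i in lst if i == x)
    return next(islice(gen, n-1, None), False)

print(check_list([1, 3, 2, 3, 4, 3], 3, 3))  # True
print(check_list([1, 3, 2, 3, 4, 3], 3, 4))  # False
print(check_list(iter("banana"), "a", 3))    # True: any iterable works
# Stops after the 5th match, so an endless iterator is fine here:
print(check_list(cycle([1, 2, 3]), 2, 5))    # True
```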
Some timing:
In [1]: from itertools import islice
In [2]: from random import randrange
In [3]: lst = [randrange(1,10) for i in range(100000)]
In [5]: %%timeit # using list.index
....: check_list(lst, 5, 1000)
....:
1000 loops, best of 3: 736 µs per loop
In [7]: %%timeit # islice
....: check_list(lst, 5, 1000)
....:
1000 loops, best of 3: 662 µs per loop
In [9]: %%timeit # using list.index
....: check_list(lst, 5, 10000)
....:
100 loops, best of 3: 7.6 ms per loop
In [11]: %%timeit # islice
....: check_list(lst, 5, 10000)
....:
100 loops, best of 3: 6.7 ms per loop
You could use the second argument of index to find the subsequent indices of occurrences:
def check_list(l, x, n):
    i = 0
    try:
        for _ in range(n):
            i = l.index(x, i) + 1
        return True
    except ValueError:
        return False
print(check_list([1,3,2,3,4,0,8,3,7,3,1,1,0], 3, 4))
About index arguments
The official documentation does not mention the method's second and third arguments in the Python Tutorial, section 5, but you can find them in the more comprehensive Python Standard Library reference, section 4.6:
s.index(x[, i[, j]]) index of the first occurrence of x in s (at or after index i and before index j) (8)
(8) index raises ValueError when x is not found in s. When supported, the additional arguments to the index method allow efficient searching of subsections of the sequence. Passing the extra arguments is roughly equivalent to using s[i:j].index(x), only without copying any data and with the returned index being relative to the start of the sequence rather than the start of the slice.
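A small sketch of the optional start and end arguments on the example list from above. occurrences is a hypothetical helper name; it walks through successive matches the same way the check_list answer does, restarting each search just after the previous hit.

```python
l = [1, 3, 2, 3, 4, 0, 8, 3, 7, 3, 1, 1, 0]

print(l.index(3, 2))      # 3: first 3 at or after index 2
print(l[2:].index(3) + 2) # 3: slice form copies data, and the offset must be added back
print(l.index(3, 4, 8))   # 7: search restricted to indices 4..7 (end is exclusive)

def occurrences(seq, x):
    # Hypothetical helper: collect every index of x via seq.index(x, start).
    positions = []
    i = 0
    while True:
        try:
            i = seq.index(x, i)
        except ValueError:  # no further occurrence
            return positions
        positions.append(i)
        i += 1

print(occurrences(l, 3))  # [1, 3, 7, 9]
```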
Performance Comparison
In comparing this list.index method with the islice(gen) method, the most important factor is the distance between the occurrences to be found. Once that distance is 13 or more on average, list.index performs better. For lower distances, the fastest method also depends on the number of occurrences to find: the more occurrences there are to find, the larger the average distance up to which the islice(gen) method still outperforms list.index, although this gain fades out when the number of occurrences becomes really large.
The following graph draws the (approximate) border line, at which both methods perform equally well (the X-axis is logarithmic):
Ultimately short circuiting is the way to go if you expect a significant number of cases will lead to early termination. Let's explore the possibilities:
Take the case of the list.index method versus the list.count method (these were the two fastest according to my testing, although ymmv)
With list.index, if the list contains n or more occurrences of x, the method is called n times. Within each list.index call, execution is very fast, allowing much faster iteration than a custom generator. If the occurrences of x are far enough apart, the lower-level execution of index gives a large speedup. If the instances of x are close together (a shorter list or a more common x), much more of the time is spent executing the slower Python code that mediates the rest of the function (looping over n and incrementing i).
The benefit of list.count is that it does all of the heavy lifting outside of slow Python execution. It is also much easier to analyse, as it is simply O(len(l)) time. Because it spends almost none of its time in the Python interpreter, however, it is almost guaranteed to be faster for short lists.
Summary of selection criteria:
shorter lists favor list.count
lists of any length that don't have a high probability to short circuit favor list.count
lists that are long and likely to short circuit favor list.index
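Since these criteria depend on your data, a sketch for measuring both approaches on your own input may help. check_count and check_index are hypothetical names for the list.count and list.index approaches from this thread. The example inputs are arbitrary, and no particular timings are implied.

```python
import timeit

def check_count(l, x, n):
    # No short-circuit: always scans the whole list, but in C.
    return l.count(x) >= n

def check_index(l, x, n):
    # Short-circuits: at most n calls to list.index, each scanning in C.
    i = 0
    try:
        for _ in range(n):
            i = l.index(x, i) + 1
        return True
    except ValueError:
        return False

def compare(l, x, n, number=200):
    # Both must agree before their timings mean anything.
    assert check_count(l, x, n) == check_index(l, x, n)
    for f in (check_count, check_index):
        t = timeit.timeit(lambda: f(l, x, n), number=number)
        print(f"{f.__name__}: {t / number * 1e6:.1f} µs per call")

# Long list with a common x and small n: likely to short-circuit early.
compare(list(range(10)) * 10_000, 5, 3)
# Short list: a single C-level count pass has little to lose.
compare([1, 2, 3, 2, 1] * 4, 2, 3)
```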
I would recommend using Counter from the collections module.
%%time
from collections import Counter
import numpy as np
[k for k, v in Counter(np.random.randint(0, 10000, 10000000)).items() if v > 1100]
#Output:
Wall time: 2.83 s
[1848, 1996, 2461, 4481, 4522, 5844, 7362, 7892, 9671, 9705]
This shows another way of doing it.
Sort the list.
Find the index of the first occurrence of the item.
Increase the index by one less than the number of times the item must occur. (n - 1)
Find if the element at that index is the same as the item you want to find.
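def check_list(l, x, n):
    _l = sorted(l)
    try:
        index_1 = _l.index(x)  # first occurrence of x in the sorted copy
        return _l[index_1 + n - 1] == x  # is the element n-1 places later still x?
    except (ValueError, IndexError):  # x is missing, or the index runs past the end
        return False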
def check_list(l, k, n):
    c = 0
    for i in l:
        if i == k:
            c += 1
    if c >= n:
        print("true")
    else:
        print("false")
Another possibility might be:
def check_list(l, x, n):
    return sum([1 for i in l if i == x]) >= n
I am implementing a reverse(s) function in Python 2.7 and wrote code like this:
# iterative version 1
def reverse(s):
    r = ""
    for c in range(len(s)-1, -1, -1):
        r += s[c]
    return r
print(reverse("Be sure to drink your Ovaltine"))
But it seems to me that on each iteration it gets the length of the string again, even though the length has already been computed.
So I made another version that stores the length first:
# iterative version 2
def reverse(s):
    r = ""
    l = len(s)-1
    for c in range(l, -1, -1):
        r += s[c]
    return r
print(reverse("Be sure to drink your Ovaltine"))
This version remembers the length of the string and doesn't ask for it on every iteration. Is it faster than the first version for longer strings (say, of length 1024), or does it have no effect at all?
In [12]: %timeit reverse("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.53 µs per loop
In [13]: %timeit reverse1("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.55 µs per loop
reverse is your first method, reverse1 is the second.
As you can see from timing there is very little difference in the performance.
You can use IPython to time your code with the syntax above: just def your functions, then run %timeit followed by the function call with whatever arguments you want.
In the line
for c in range(len(s)-1, -1, -1):
len(s) is evaluated only once, and the result (minus one) passed as an argument to range. Therefore the two versions are almost identical - if anything, the latter may be (very) slightly slower, as it creates a new name to assign the result of the subtraction.
I've tried the basic Cython tutorial to see how significant the speedup is.
I've also made two different Python implementations which differ quite significantly in runtime. I've timed the individual operations in which they differ, and as far as I can see, those timings do not explain the overall runtime difference.
The code is calculating the first kmax primes:
def pyprimes1(kmax):
    p = []
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
            result.append(n)
        n = n + 1
    return result
from numpy import zeros
def pyprimes2(kmax):
    p = zeros(kmax)
    result = []
    if kmax > 1000:
        kmax = 1000
        p = zeros(kmax)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
As you can see, the only difference between the two implementations is in the usage of the p variable: in the first it is a Python list, in the other it is a numpy array. I used IPython's %timeit magic to test timings. Which do you think performed better? Here is what I got:
%timeit pyprimes1(1000)
10 loops, best of 3: 79.4 ms per loop
%timeit pyprimes2(1000)
1 loops, best of 3: 1.14 s per loop
That was strange and surprising, as I thought a pre-allocated numpy array, implemented in C, would be much faster.
I've also tested:
array assignment:
%timeit p[100]=5
10000000 loops, best of 3: 116 ns per loop
array selection:
%timeit p[100]
1000000 loops, best of 3: 252 ns per loop
which was about twice as slow as assignment; I didn't expect that either.
array initialization:
%timeit zeros(1000)
1000000 loops, best of 3: 1.65 µs per loop
list appending:
%timeit p.append(1)
10000000 loops, best of 3: 164 ns per loop
list selection:
%timeit p[100]
10000000 loops, best of 3: 56 ns per loop
So it seems list selection is about 5 times faster than array selection.
I can't see how these numbers add up to the more than 10x time difference. We do a selection in each iteration, but it is only 5 times faster.
I would appreciate an explanation of the timing differences between arrays and lists, and also of the overall time difference between the two implementations. Or am I using %timeit wrong by measuring time on a list whose length keeps increasing?
BTW, the Cython code did best, at 3.5 ms.
The 1000th prime number is 7919. So if, very roughly, the inner loop iterates kmax/2 times on average, your program performs approx. 7919 * (1000/2) ≈ 4·10⁶ selections from the array/list. If a single selection from a list in the first version takes 56 ns, even the selections alone wouldn't fit into 79 ms (0.056 µs * 4·10⁶ ≈ 0.22 s).
Probably these nanosecond times are not very accurate.
By the way, performance of append depends on size of the list. In some cases it can lead to reallocation, but in most the list has enough free space and it's lightning fast.
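If you'd rather measure than estimate, here is a sketch that counts how often p[i] is actually read. pyprimes1_counted is a hypothetical instrumented copy of pyprimes1 (with the kmax cap dropped); the loop logic matches the original `while i < k and n % p[i] != 0` condition, so the count can be compared with the rough estimate above.

```python
def pyprimes1_counted(kmax):
    p = []
    selections = 0  # how many times p[i] is read in the inner loop
    k = 0
    n = 2
    while k < kmax:
        i = 0
        # Same as "while i < k and n % p[i] != 0", with a counter added.
        while i < k:
            selections += 1
            if n % p[i] == 0:
                break
            i = i + 1
        if i == k:  # no divisor found among the known primes
            p.append(n)
            k = k + 1
        n = n + 1
    return p, selections

primes, selections = pyprimes1_counted(1000)
print(primes[-1])  # 7919
print(selections)
```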
Numpy's main use case is to perform operations on whole arrays and slices, not single elements. Those operations are implemented in C and therefore much faster than the equivalent Python code. For example,
c = a + b
will be much faster than
for i in xrange(len(a)):
    c[i] = a[i] + b[i]
even if the variables are numpy arrays in both cases.
However, single-element operations like the ones you are testing may well be slower than with Python lists. Python lists are plain C arrays of pointers to objects, which are quite simple to access.
On the other hand, accessing an element in a numpy array comes with lots of overhead to support multiple raw data formats and advanced indexing options, among other reasons.
If I make a list in Python and want to write a function that would return only odd numbers from a range 1 to x how would I do that?
For example, if I have the list [1, 2, 3, 4] from 1 to 4 (4 is my x), I want to return [1, 3].
If you want to start with an arbitrary list:
[item for item in yourlist if item % 2]
but if you're always starting with range, range(1, x, 2) is better!-)
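A small sketch with the question's own example values. One detail worth knowing: range's stop value is exclusive, so range(1, x, 2) leaves out x itself when x is odd; use x + 1 as the stop if x should be included.

```python
x = 4
yourlist = [1, 2, 3, 4]

# General: filter any list by value (item % 2 is truthy for odd ints).
print([item for item in yourlist if item % 2])  # [1, 3]
# Specific: when the input is just 1..x, step through the range directly.
print(list(range(1, x, 2)))                     # [1, 3]
# Stop is exclusive: add 1 to include an odd upper bound.
print(list(range(1, 5 + 1, 2)))                 # [1, 3, 5]
```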
For example:
$ python -mtimeit -s'x=99' 'filter(lambda(t): t % 2 == 1, range(1, x))'
10000 loops, best of 3: 38.5 usec per loop
$ python -mtimeit -s'x=99' 'range(1, x, 2)'
1000000 loops, best of 3: 1.38 usec per loop
so the right approach is about 28 times (!) faster than a somewhat-typical wrong one, in this case.
The "more general than you need if that's all you need" solution:
$ python -mtimeit -s'yourlist=range(1,99)' '[item for item in yourlist if item % 2]'
10000 loops, best of 3: 21.6 usec per loop
is only about twice as fast as the sample wrong one, but still over 15 times slower than the "just right" one!-)
What's wrong with:
def getodds(lst):
    return lst[1::2]
....???
(Assuming you want every other element from some arbitrary sequence ... all those which have odd indexes).
Alternatively if you want all items from a list of numbers where the value of that element is odd:
def oddonly(lst):
    return [x for x in lst if x % 2]
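The two readings of "odd" give different results as soon as positions and values don't line up. The sample list here is arbitrary and chosen so the outputs differ.

```python
def getodds(lst):
    # Elements at odd positions: 1, 3, 5, ...
    return lst[1::2]

def oddonly(lst):
    # Elements whose value is odd.
    return [x for x in lst if x % 2]

data = [10, 11, 12, 13, 15]
print(getodds(data))  # [11, 13]
print(oddonly(data))  # [11, 13, 15]
```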
[Update: 2017]
You could use "lazy evaluation" to yield these from generators:
def get_elements_at_odd_indices(sequence):
    for index, item in enumerate(sequence):
        if index % 2:
            yield item
For getting odd elements (rather than elements at each odd offset from the start of the sequence) you could use the even simpler:
def get_odd_elements(sequence):
    for item in sequence:
        if item % 2:
            yield item
This should work for any sequence or iterable object types. (Obviously the latter only works for those sequences or iterables which yield numbers ... or other types for which % 2 evaluates to a meaningfully "odd" result).
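A usage sketch showing the lazy evaluation: combined with itertools.islice, the generators consume only as much of an unbounded input such as itertools.count() as needed.

```python
from itertools import count, islice

def get_odd_elements(sequence):
    for item in sequence:
        if item % 2:
            yield item

def get_elements_at_odd_indices(sequence):
    for index, item in enumerate(sequence):
        if index % 2:
            yield item

# Only the first few items of the infinite count() are ever produced.
print(list(islice(get_odd_elements(count()), 5)))                # [1, 3, 5, 7, 9]
print(list(islice(get_elements_at_odd_indices(count(100)), 3)))  # [101, 103, 105]
```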
Also note that if we want to efficiently operate on Pandas series, dataframe columns, or the underlying NumPy arrays, we can get the elements at odd indexes using the [1::2] slice notation, and we can get the elements containing odd values using NumPy's "fancy indexing" (a boolean mask).
For example:
import numpy as np
arr = np.arange(1000)
odds = arr[arr % 2 != 0]
I show the "fancy index" as arr[arr%2!=0] because that will generalize better to filtering out every third, fourth or other nth element; but you can use much more elaborate expressions.
Note that the syntax arr[arr%2!=0] may look a bit odd. The magic lies in the way NumPy overrides various arithmetic and bitwise operators and augmented assignment operations. The point is that NumPy executes such operations as machine code that can be efficiently vectorized over NumPy arrays, using SIMD wherever the underlying CPU supports it. For example, on typical laptop and desktop systems today NumPy can evaluate many arithmetic operations into SSE operations.
To have a range of odd/even numbers up to and possibly including a number n, you can:
def odd_numbers(n):
    return range(1, n+1, 2)
def even_numbers(n):
    return range(0, n+1, 2)
If you want a generic algorithm that will take the items with odd indexes from a sequence, you can do the following:
import itertools
def odd_indexes(sequence):
    return itertools.islice(sequence, 1, None, 2)
def even_indexes(sequence):
    return itertools.islice(sequence, 0, None, 2)
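A usage sketch for these helpers. Because of n+1, odd_numbers includes n when n is odd. islice also works on iterables that don't support [1::2] slicing, such as a generator (which has no __getitem__).

```python
import itertools

def odd_numbers(n):
    return range(1, n+1, 2)

def odd_indexes(sequence):
    return itertools.islice(sequence, 1, None, 2)

print(list(odd_numbers(7)))        # [1, 3, 5, 7]
letters = (c for c in "abcdef")    # a generator cannot be sliced with [1::2]
print(list(odd_indexes(letters)))  # ['b', 'd', 'f']
```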