Consider the following toy example:
>>> def square(x): return x*x
...
>>> [square(x) for x in range(12) if square(x) > 50]
[64, 81, 100, 121]
I have to call square(x) twice in the list comprehension. The duplication is ugly, bug-prone (it's easy to change only one of the two calls when modifying the code), and inefficient.
Of course I can do this:
>>> squares = [square(x) for x in range(12)]
>>> [s for s in squares if s > 50]
[64, 81, 100, 121]
or this:
[s for s in [square(x) for x in range(12)] if s > 50]
These are both livable, but it feels as though there might be a way to do it all in a single statement without nesting the two list comprehensions, which I know I'll have to stare at for a while the next time I read the code, just to figure out what's going on. Is there a way?
I think a fair question to ask of me would be what I imagine such syntax could look like. Here are two ideas, but neither feels idiomatic in Python (nor do they work). They are inspired by anaphoric macros in Lisp.
[square(x) for x in range(12) if it > 50]
[it=square(x) for x in range(12) if it > 50]
You should use a generator expression:
[s for s in (square(x) for x in range(12)) if s > 50]
This avoids creating an intermediate unfiltered list of squares.
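Worth noting: on Python 3.8 and newer, an assignment expression (PEP 572, the "walrus" operator) gives essentially the anaphoric binding the question asks for, in a single comprehension. A minimal sketch:
def square(x):
    return x * x

# (s := square(x)) evaluates square(x) once, binds the result to s,
# and the comparison filters on that same value.
result = [s for x in range(12) if (s := square(x)) > 50]
print(result)  # [64, 81, 100, 121]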
Here is a comparison of the nested generator expression vs. "chained" list comps vs. calculating twice:
$ python -m timeit "[s for n in range(12) for s in [n * n] if s > 50]"
100000 loops, best of 3: 2.48 usec per loop
$ python -m timeit "[s for s in (x * x for x in range(12)) if s > 50]"
1000000 loops, best of 3: 1.89 usec per loop
$ python -m timeit "[n * n for n in range(12) if n * n > 50]"
1000000 loops, best of 3: 1.1 usec per loop
$ pypy -m timeit "[s for n in range(12) for s in [n * n] if s > 50]"
1000000 loops, best of 3: 0.211 usec per loop
$ pypy -m timeit "[s for s in (x * x for x in range(12)) if s > 50]"
1000000 loops, best of 3: 0.359 usec per loop
$ pypy -m timeit "[n * n for n in range(12) if n * n > 50]"
10000000 loops, best of 3: 0.0834 usec per loop
I used n * n instead of square(n) because it was convenient and it removes the function-call overhead from the benchmark.
TLDR: For simple cases it may be best to just duplicate the calculation.
Another alternative, using "chained" list comps rather than nested ones:
[s for n in range(12) for s in [square(n)] if s > 50]
Might be a weird read, though.
[square(s) for s in range(12) if s >= 8] # sqrt(50) = 7.071..., so keep s >= 8
Or even simpler (no branching, woo!)
[square(s) for s in range(8, 12)] # sqrt(50) = 7.071...
EDIT: I'm blind, duplicated Eevee's answer.
It is possible to abuse iteration over a 1-element list to "bind" intermediate variables:
[s for x in range(12) for s in [square(x)] if s > 50]
I'm hesitant to recommend this as a readable solution though.
Pro: Compared to the nested comprehension, I prefer the order here, with for x in range(12) on the outside: you can read it sequentially instead of zooming in and then back out...
Con: The for s in [...] is a non-idiomatic hack and might give readers pause. The nested comprehension, while arguably harder to decipher, at least uses language features in an "obvious" way.
Idea: Renaming the intermediate variable to something like tmp could, I think, make it clearer.
The bottom line is that I'm not happy with either.
Probably the most readable is naming the intermediate generator:
squares = (square(x) for x in range(12))
result = [s for s in squares if s > 50]
[Side note: naming results of generator expressions is a bit rare. But read David Beazley's lecture and it might grow on you.]
OTOH if you're going to write such constructs a lot, go for the for tmp in [expr(x)] pattern — it will become "locally idiomatic" within your code and once familiar, its compactness will pay off. My readability concern is more about one-off use...
Related
def multArray(A, k):
    A = [i * k for i in A]
    return A
# tests
tests = (([5,12,31,7,25],10),
         ([-5,12,-31,7,25],10),
         ([-5,12,-31,7,25],-1),
         ([-5,12,-31,7,25],0),
         ([],10),
         ([],-1))
# should print: [50,120,310,70,250],[-50,120,-310,70,250],[5,-12,31,-7,-25],[0,0,0,0,0],[],[]
for (A,k) in tests:
    multArray(A, k)
    print(A)
This is the solution I've seen on here in other questions, but I can't seem to get it to work. It needs to be done without map or numpy.
Your example code creates a new list with the updated values; however, the test prints the original list. To multiply the original list by a factor, you need to update the values in the original list, something like this:
def multArray(A, k):
    for i in range(len(A)):
        A[i] *= k
# tests
tests = (([5,12,31,7,25],10),
         ([-5,12,-31,7,25],10),
         ([-5,12,-31,7,25],-1),
         ([-5,12,-31,7,25],0),
         ([],10),
         ([],-1))
for (A,k) in tests:
    multArray(A, k)
    print(A)
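A middle ground worth knowing about is slice assignment: it builds the new list with a comprehension but stores it back into the original list object, so callers holding a reference to A still see the change. A sketch of multArray written that way:
def multArray(A, k):
    # A[:] = ... replaces the contents of the existing list object
    # in place; the caller's reference to A sees the new values.
    A[:] = [i * k for i in A]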
Creating new list vs. updating existing list
It must be noted that it is much quicker to create a new list in a case like this. If we look at the performance of creating a new list versus updating the existing list:
python -m timeit -s "a = [i for i in range(100)]" "b = [i * 7 for i in a]"
executes in:
50000 loops, best of 5: 8.54 usec per loop
whereas:
python -m timeit -s "a = [i for i in range(100)]" "for i in range(len(a)): a[i] *= 7"
executes in:
5000 loops, best of 5: 40.6 usec per loop
So it is roughly 5x faster to create a new list. There is an argument that updating the existing list is more memory efficient, but certainly not for these kinds of examples.
Relative performance of numpy array
It should also be noted that multiplying all the elements in a numpy array by a particular value is much faster than either method using lists.
python -m timeit -s "import numpy as np; a = np.array([i for i in range(100)])" "a = a * 7"
executes in:
500000 loops, best of 5: 839 nsec per loop
This is because the memory allocated for numpy arrays is contiguous, so operations benefit from a wide range of processor-level caching effects.
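For reference, a minimal sketch of the numpy version being timed above (assuming numpy is installed):
import numpy as np

a = np.arange(100)  # contiguous array of 0..99
b = a * 7           # vectorized multiply: one C-level loop, no Python-level iteration
a *= 7              # or multiply in place, reusing a's buffer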
Given a number of players n, I need to find H, the list of all tuples where each tuple is a combination of coalitions of the players that respects this rule: each player appears exactly once (i.e. in only one coalition). For example, (1,2,3) is the coalition of players 1, 2 and 3, and ((1,2,3),(4,5),(6,)) is a combination of coalitions (which are also tuples).
P.S. Each combination of coalitions is called a layout in the code.
At the beginning I wrote a snippet that computed all combinations of all coalitions and checked the rule for each combination. The problem is that for 5-6 players the number of combinations of coalitions was already so big that my computer went phut.
In order to avoid a big part of the computation (all possible combinations, the loop and the ifs), I wrote the following (which I tested and is equivalent to the previous snippet):
from itertools import combinations, combinations_with_replacement, product, permutations
players = range(1,n+1)
coalitions = [[coal for coal in list(combinations(players,length))] for length in players]
H = [tuple(coalitions[0]),(coalitions[-1][0],)]
combs = [comb for length in xrange(2,n) for comb in combinations_with_replacement(players,length) if sum(comb) == n]
perms = list(permutations(players))
layouts = set(frozenset(frozenset(perm[i:i+x]) for (i,x) in zip([0]+[sum(comb[:y]) for y in xrange(1,len(comb))],comb)) for comb in combs for perm in perms)
H.extend(tuple(tuple(tuple(coal) for coal in layout) for layout in layouts))
print H
EXPLANATION: say n = 3
First I create all possible coalitions:
coalitions = [[(1,),(2,),(3,)],[(1,2),(1,3),(2,3)],[(1,2,3)]]
Then I initialize H with the obvious combinations: each player in his own coalition and every player in the biggest coalition.
H = [((1,),(2,),(3,)),((1,2,3),)]
Then I compute all the possible forms of the layouts:
combs = [(1,2)] #(1,2) represents a layout in which there is
#one 1-player coalition and one 2-player coalition.
I compute the permutations (perms).
Finally, for each perm and for each comb I calculate the different possible layouts. I turn the result (layouts) into a set to remove duplicates, then add it to H.
H = [((1,),(2,),(3,)),((1,2,3),),((1,2),(3,)),((1,3),(2,)),((2,3),(1,))]
Here's the comparison:
python script.py
4: 0.000520944595337 seconds
5: 0.0038321018219 seconds
6: 0.0408189296722 seconds
7: 0.431486845016 seconds
8: 6.05224680901 seconds
9: 76.4520540237 seconds
pypy script.py
4: 0.00342392921448 seconds
5: 0.0668039321899 seconds
6: 0.311077833176 seconds
7: 1.13124799728 seconds
8: 11.5973010063 seconds
9: went phut
Why is pypy that much slower? What should I change?
First, I want to point out that you are studying the Bell numbers, which might ease the next part of your work, after you're done generating all the subsets. For example, it's easy to know how large each Bell set will be; OEIS has the sequence of Bell numbers already.
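For instance, a minimal sketch that computes the Bell numbers via the Bell triangle, handy for predicting how many partitions the bell(n) below will return:
def bell_numbers(count):
    # Each row of the Bell triangle starts with the last entry of the
    # previous row; each later entry adds the neighbour from the row above.
    row = [1]
    for _ in range(count):
        yield row[0]
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row

print(list(bell_numbers(10)))
# [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147]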
I hand-wrote the loops to generate the Bell sets; here is my code:
cache = {0: (), 1: ((set([1]),),)}

def bell(x):
    # Change these lines to alter memoization.
    if x in cache:
        return cache[x]
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    cache[x] = tuple(new)
    return new
I included some memoization here for practical purposes. However, by commenting out some code, and writing some other code, you can obtain the following un-memoized version, which I used for benchmarks:
def bell(x):
    if x == 0:
        return ()
    if x == 1:
        return ((set([1]),),)
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    return new
My numbers are based on a several-year-old Thinkpad that I do most of my work on. Most of the smaller cases are way too fast to measure reliably (not even a single millisecond per trial for the first few), so my benchmarks are testing bell(9) through bell(11).
Benchmarks for CPython 2.7.11, using the standard timeit module:
$ python -mtimeit -s 'from derp import bell' 'bell(9)'
10 loops, best of 3: 31.5 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 176 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 1.07 sec per loop
And on PyPy 4.0.1, also using timeit:
$ pypy -mtimeit -s 'from derp import bell' 'bell(9)'
100 loops, best of 3: 14.3 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 90.8 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 675 msec per loop
So, the conclusion I've come to is that itertools is not very fast when you try to use it outside of its intended idioms. Bell numbers are interesting combinatorially, but they do not naturally arise from any simple composition of itertools widgets that I can find.
In response to your original query of what to do to make it faster: Just open-code it. Hope this helps!
~ C.
Here's a Pypy issue on itertools.product.
https://bitbucket.org/pypy/pypy/issues/1677/itertoolsproduct-slower-than-nested-fors
Note that our goal is to ensure that itertools is not massively slower than
plain Python, but we don't really care about making it exactly as fast (or
faster) as plain Python. As long as it's not massively slower, it's fine. (At
least I don't agree with you about whether a) or b) is easier to read :-)
Without studying your code in detail, it looks like it makes heavy use of the itertools combinations, permutations and product functions. In regular CPython those are written in compiled C code, with the intention of making them fast. Pypy does not implement the C code, so it shouldn't be surprising that these functions are slower.
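As a concrete illustration, the PyPy issue linked above was about itertools.product being slower than plain nested loops; the rewrite it describes looks roughly like this (both build the same list, but at the time PyPy's JIT traced the plain nested fors more effectively):
from itertools import product

pairs_a = [(x, y) for x, y in product(range(100), range(100))]  # itertools version
pairs_b = [(x, y) for x in range(100) for y in range(100)]      # plain nested fors
assert pairs_a == pairs_b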
Let's say I have a list:
list=['plu;ean;price;quantity','plu1;ean1;price1;quantity1']
I want to iterate over the list + split the list by ";" and put an if clause, like this:
for item in list:
    split_item = item.split(";")
    if split_item[0] == "string_value" or split_item[1] == "string_value":
        do something.....
I was wondering if this is the fastest way possible. Let's say my initial list is a lot bigger (has a lot more items). I tried with list comprehensions:
item=[item.split(";") for item in list if item.split(";")[0] == "string_value" or item.split(";")[1] == "string_value"]
But this is actually giving me slower results. The first case is giving me an average of 90ms, while the second one is giving me an average of 130ms.
Am I doing the list comprehension wrong? Is there a faster solution?
I was wondering if this is the fastest way possible?
No, of course not. You can implement it a lot faster in hand-coded assembly than in Python. So what?
If the "do something..." is not trivial, and there are many matches, the cost to do something 100000 times is going to be a lot more expensive than the cost of looping 500000 times, so finding the fastest way to loop doesn't matter at all.
In fact, just calling split two or three times each loop instead of remembering and reusing the result is going to swamp the cost of iteration, and not passing a maxsplit argument when you only care about the first two results may as well.
So, you're trying to optimize the wrong thing. But what if, after you fix everything else, it turns out that the cost of iteration really does matter here?
Well, you can't use a comprehension directly to speed things up, because comprehensions are for expressions that return values, not statements to do things.
But, if you look at your code, you'll realize you're actually doing three things: splitting each string, then filtering out the ones that don't match, then doing the "do something". So, you can use a comprehension for the first two parts, and then you're only using a slow for loop for the much smaller list of values that passed the filter.
It looks like you tried this, but you made two mistakes.
First, you're better off with a generator expression than a list comprehension: you don't need a list here, just something to iterate over, so don't pay to build one.
Second, you don't want to split the string three times. You can probably find some convoluted way to get the split done once in a single comprehension, but why bother? Just write each step as its own step.
So:
split_items = (item.split(';') for item in items)
filtered_items = (item for item in split_items
                  if item[0] == "string_value" or item[1] == "string_value")
for item in filtered_items:
    do something...
Will this actually be faster? If you can get some real test data, and "do something..." code, that shows that the iteration is a bottleneck, you can test on that real data and code. Until then, there's nothing to test.
Split the whole string only when the first two items retrieved from str.split(';', 2) satisfy the conditions:
>>> strs = 'plu;ean;price;quantity'
>>> strs.split(';', 2)
['plu', 'ean', 'price;quantity']
Here, the third item ('price;quantity') is split further only if the first two items satisfy the condition:
>>> lis = ['plu;ean;price;quantity'*1000, 'plu1;ean1;price1;quantity1'*1000]*1000
Normal for-loop, single split of whole string for each item of the list.
>>> %%timeit
... for item in lis:
...     split_item = item.split(";")
...     if split_item[0] == "plu" or split_item[1] == "ean": pass
...
1 loops, best of 3: 952 ms per loop
List comprehension equivalent to the for-loop above:
>>> %timeit [x for x in (item.split(';') for item in lis) if x[0]== "plu" or x[1]=="ean"]
1 loops, best of 3: 961 ms per loop
Split on-demand:
>>> %timeit [[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== "plu" or y=="ean"]
1 loops, best of 3: 508 ms per loop
Of course, if the list and strings are small then such optimisation doesn't matter.
EDIT: It turns out that the Regex cache was being a bit unfair to the competition. My bad. Regex is only a small percentage faster.
If you're looking for speed, hcwhsa's answer should be good enough. If you need slightly more, look to re.
import re
from itertools import chain
lis = ['plu;ean;price;quantity'*1000, 'plu1;ean1;price1;quantity1'*100]*1000
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
[l.split(';') for l in lis if matcher(l)]
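If the pattern is hard to read, here is the same regex spelled out with re.VERBOSE (equivalent, just annotated):
matcher = re.compile(r"""
    ^(?:
        plu (?:;|$)            # first field is exactly "plu", or
      | [^;]* ; ean (?:;|$)    # any first field, second field exactly "ean"
    )
""", re.VERBOSE).match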
Timings, for mostly positive results (i.e. split is the major cause of slowness):
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity' for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
We see mine's a little faster.
10 loops, best of 3: 55 msec per loop
10 loops, best of 3: 49.5 msec per loop
For mostly negative results (most things are filtered):
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(1000)] + ['plu;ean;price;quantity' for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
The lead's a touch higher.
10 loops, best of 3: 40.9 msec per loop
10 loops, best of 3: 35.7 msec per loop
If the result will always be unique, use
next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')
or the faster Regex version
next(filter(matcher, lis)).split(';')
(use itertools.ifilter on Python 2).
Timings:
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity'] + ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
python -m timeit -s "$SETUP" "next(filter(matcher, lis)).split(';')"
Results:
10 loops, best of 3: 31.3 msec per loop
100 loops, best of 3: 15.2 msec per loop
10 loops, best of 3: 28.8 msec per loop
100 loops, best of 3: 14.1 msec per loop
So this gives a substantial boost to both methods.
I found a good alternative here.
You can use a combination of map and filter. Try this:
>>> import itertools
>>> splited_list = itertools.imap(lambda x: x.split(";"), your_list)
>>> result = filter(lambda x: x[0] == "plu" or x[1] == "string_value", splited_list)
The first call creates an iterator of split elements, and the second one filters it.
I ran a small benchmark in my IPython Notebook shell and got the following results:
1st test:
With small sizes, the one-line solution works better.
2nd test:
With a bigger list, the map/filter solution is slightly better.
3rd test:
With a big list and bigger elements, the map/filter solution is way better.
I guess the difference in performance keeps increasing as the size of the list grows, until it peaks at around 66% more time (in a 10000-element list trial).
The difference between the map/filter solution and the list comprehension solutions is the number of calls to .split(). One calls it three times for each item, the other just once, because list comprehensions are just a Pythonic way to do map/filter together. I used to use list comprehensions a lot and never quite knew what lambda was all about, until I discovered that map and list comprehensions are the same thing.
If you don't care about memory usage, you can use regular map instead of imap. It will create the list of splits all at once, using more memory to store it, but it's slightly faster.
Actually, if you don't care about memory usage, you can write the map/filter solution using two list comprehensions and get the exact same result; a sketch follows.
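Something like this (same hypothetical your_list as above):
splited_list = [x.split(";") for x in your_list]  # materialise all the splits at once
result = [x for x in splited_list if x[0] == "plu" or x[1] == "string_value"]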
Is it possible to return two lists from a list comprehension? Well, this obviously doesn't work, but something like:
rr, tt = [i*10, i*12 for i in xrange(4)]
So rr and tt both are lists with the results from i*10 and i*12 respectively.
Many thanks
>>> rr,tt = zip(*[(i*10, i*12) for i in xrange(4)])
>>> rr
(0, 10, 20, 30)
>>> tt
(0, 12, 24, 36)
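In case the trick is opaque: the comprehension builds a list of (i*10, i*12) pairs, and zip(*pairs) transposes it into two tuples. Wrap the result in list() if you need lists rather than tuples; a sketch (Python 3 shown):
pairs = [(i * 10, i * 12) for i in range(4)]  # [(0, 0), (10, 12), (20, 24), (30, 36)]
rr, tt = map(list, zip(*pairs))               # transpose, then convert each to a list
print(rr, tt)  # [0, 10, 20, 30] [0, 12, 24, 36]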
Creating two comprehension lists is better (at least for long lists). Be aware that the best-voted answer can be even slower than traditional for loops. List comprehensions are faster and clearer.
> python -m timeit -n 100 -s 'rr=[];tt = [];' 'for i in range(500000): rr.append(i*10);tt.append(i*12)'
10 loops, best of 3: 123 msec per loop
> python -m timeit -n 100 'rr,tt = zip(*[(i*10, i*12) for i in range(500000)])'
10 loops, best of 3: 170 msec per loop
> python -m timeit -n 100 'rr = [i*10 for i in range(500000)]; tt = [i*12 for i in range(500000)]'
10 loops, best of 3: 68.5 msec per loop
It would be nice to see list comprehensions supporting the creation of multiple lists at a time.
However, if you can take advantage of a traditional loop (to be precise, of intermediate calculations), then it is possible that you will be better off with a loop (or an iterator/generator using yield; see the sketch after the timings below). Here is an example:
$ python3 -m timeit -n 100 -s 'rr=[];tt=[];' "for i in (range(1000) for x in range(10000)): tmp = list(i); rr.append(min(tmp));tt.append(max(tmp))"
100 loops, best of 3: 314 msec per loop
$ python3 -m timeit -n 100 "rr=[min(list(i)) for i in (range(1000) for x in range(10000))];tt=[max(list(i)) for i in (range(1000) for x in range(10000))]"
100 loops, best of 3: 413 msec per loop
Of course, the comparison in these cases is unfair; in the example, the code and calculations are not equivalent, because in the traditional loop a temporary result is stored (see the tmp variable). So the list comprehension version is doing many more internal operations (it calculates the tmp list twice!), yet it is only about 25% slower.
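As for the generator-with-yield variant mentioned above, a sketch under the same min/max workload; it keeps the tmp intermediate while staying lazy:
def min_max(iterables):
    for it in iterables:
        tmp = list(it)            # compute the intermediate list once
        yield min(tmp), max(tmp)  # and reuse it for both results

rr, tt = zip(*min_max(range(1000) for _ in range(10000)))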
It is possible for a list comprehension to return multiple lists if the elements are lists.
So for example:
>>> x, y = [[] for x in range(2)]
>>> x
[]
>>> y
[]
>>>
The trick with the zip function does the job, but it's actually much simpler and more readable to just collect the results in lists with a loop.
I would like to ask what the best way is to do a simple iteration. Suppose I want to repeat a certain task 1000 times; which of the following is best, or is there a better way?
for i in range(1000):
    do something with no reference to i

i = 0
while i < 1000:
    do something with no reference to i
    i += 1
thanks very much
The first is considered idiomatic. In Python 2.x, use xrange instead of range.
The for loop is more concise and more readable. while loops are rarely used in Python (with the exception of while True).
A bit of idiomatic Python: if you're trying to do something a set number of times with a range (with no need to use the counter), it's good practice to name the counter _. Example:
for _ in range(1000):
    # do something 1000 times
In Python 2, use
for i in xrange(1000):
    pass
In Python 3, use
for i in range(1000):
    pass
Performance figures for Python 2.6:
$ python -s -m timeit '' 'i = 0
> while i < 1000:
> i += 1'
10000 loops, best of 3: 71.1 usec per loop
$ python -s -m timeit '' 'for i in range(1000): pass'
10000 loops, best of 3: 28.8 usec per loop
$ python -s -m timeit '' 'for i in xrange(1000): pass'
10000 loops, best of 3: 21.9 usec per loop
xrange is preferable to range in this case because it produces a lazy sequence rather than the whole list [0, 1, 2, ..., 998, 999], and it uses less memory, too. If you need the actual list to work with all at once, that's when you use range. Normally you want xrange: that's why in Python 3, xrange(...) becomes range(...) and range(...) becomes list(range(...)).
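A quick way to see the memory difference on Python 2 (a rough sketch; exact numbers vary by build):
import sys
print(sys.getsizeof(xrange(1000000)))  # small, constant-size object
print(sys.getsizeof(range(1000000)))   # a real list: megabytes of pointers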
The first, because the integer increment is handled in the interpreter's internal (C) layer rather than by interpreted code. Also, there's one less global variable.