$ python -m timeit -s'tes = "987kkv45kk321"*100' 'a = [list(i) for i in tes.split("kk")]'
10000 loops, best of 3: 79.4 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*100' 'b = list(map(list, tes.split("kk")))'
10000 loops, best of 3: 66.9 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*10' 'a = [list(i) for i in tes.split("kk")]'
100000 loops, best of 3: 8.34 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"*10' 'b = list(map(list, tes.split("kk")))'
100000 loops, best of 3: 7.38 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"' 'a = [list(i) for i in tes.split("kk")]'
1000000 loops, best of 3: 1.51 usec per loop
$ python -m timeit -s'tes = "987kkv45kk321"' 'b = list(map(list, tes.split("kk")))'
1000000 loops, best of 3: 1.63 usec per loop
I tried using timeit and wonder why creating a list of lists from string.split() with a list comprehension is faster for shorter strings but slower for longer ones.
The fixed setup costs for map are higher than the setup costs for the listcomp solution. But the per-item costs for map are lower. So for short inputs, map is paying more in fixed setup costs than it saves on the per item costs (because there are so few items). When the number of items increases, the fixed setup costs for map don't change, but the savings per item is being reaped for more items, so map slowly pulls ahead.
Things that map saves on:
Only looks up list once (the listcomp has to look it up in the builtin namespace every single loop, after checking the nested and global scopes first, because it can't guarantee list isn't overridden from loop to loop)
Executes no Python bytecode per item (because the mapping function is also C level), so the interpreter doesn't get involved at all, reducing the amount of hot C level code
map loses on the actual call to map (C built-in functions are fast to run, but comparatively slow to call, especially if they take variable length arguments), and the creation and cleanup of the map object (the listcomp closure is compiled up front). But as I noted above, neither of these is tied to the size of the inputs, so you make up for it rapidly if the mapping function is a C builtin.
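One way to see the per-item name lookup the listcomp pays is to disassemble both expressions. This is a hedged aside, not part of the original question, and the exact bytecode differs between 2.x and 3.x; the point is simply that the load of the name list happens inside the loop for the comprehension, but exactly once for map:

import dis

# In the listcomp, the lookup of the name `list` sits inside the FOR_ITER
# loop, i.e. it is repeated once per item; with map, `list` is resolved a
# single time before iteration starts.
dis.dis(compile('[list(i) for i in tes.split("kk")]', '<timeit-src>', 'eval'))
dis.dis(compile('list(map(list, tes.split("kk")))', '<timeit-src>', 'eval'))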
This kind of timing is basically useless.
The time frames you are getting are in microseconds - and you are creating tens of different one-character-long-element lists in each iteration. You get basically linear time, because the number of objects you create is proportional to your string length. There is hardly any surprise in this.
I have a (potentially quite big) dictionary and a list of 'possible' keys. I want to quickly find which of the keys have matching values in the dictionary. I've found lots of discussion of single dictionary values here and here, but no discussion of speed or multiple entries.
I've come up with four ways, and for the three that work best I compare their speed on different sample sizes below - are there better methods? If people can suggest sensible contenders I'll subject them to the analysis below as well.
Sample lists and dictionaries are created as follows:
import cProfile
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
Method 1: the 'in' keyword:
def way1(theList, theDict):
    resultsList = []
    for listItem in theList:
        if listItem in theDict:
            resultsList.append(theDict[listItem])
    return resultsList
cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')
32 function calls in 0.018 seconds
Method 2: error handling:
def way2(theList, theDict):
    resultsList = []
    for listItem in theList:
        try:
            resultsList.append(theDict[listItem])
        except:
            pass
    return resultsList
cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')
32 function calls in 0.087 seconds
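As an aside, since sensible contenders are invited: another common spelling is dict.get with a sentinel, which avoids both the double lookup of Method 1 and the exception machinery of Method 2. This is only a sketch (the name wayGet is mine and it isn't timed here):

def wayGet(theList, theDict):
    # Hypothetical alternative, not one of the numbered methods.
    resultsList = []
    sentinel = object()  # unique marker that can never be a real dict value
    for listItem in theList:
        value = theDict.get(listItem, sentinel)
        if value is not sentinel:
            resultsList.append(value)
    return resultsList

cProfile.run('wayGet(listOfRandomInts,dictionaryOfRandomInts)')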
Method 3: set intersection:
def way3(theList, theDict):
    return list(set(theList).intersection(set(theDict.keys())))
cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')
26 function calls in 0.046 seconds
Method 4: Naive use of dict.keys():
This is a cautionary tale - it was my first attempt and BY FAR the slowest!
def way4(theList, theDict):
    resultsList = []
    keys = theDict.keys()
    for listItem in theList:
        if listItem in keys:
            resultsList.append(theDict[listItem])
    return resultsList
cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')
12 function calls in 248.552 seconds
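As a hedged note that isn't spelled out above: the reason way4 is so catastrophically slow is that on Python 2, theDict.keys() returns a plain list, so listItem in keys is a linear scan rather than a hash lookup, making way4 roughly O(len(theList) * len(theDict)). A quick way to see it:

# Python 2: keys() materializes a list, and `in` on a list is a linear scan.
keys = dictionaryOfRandomInts.keys()
print type(keys)        # <type 'list'> on Python 2; a set-like view on Python 3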
EDIT: Bringing the suggestions given in the answers into the same framework that I've used for consistency. Many have noted that further performance gains can be achieved in Python 3.x, particularly for the list comprehension-based methods. Many thanks for all of the help!
Method 5: Better way of performing intersection (thanks jonrsharpe):
def way5(theList, theDict):
    return list(set(theList).intersection(theDict))
25 function calls in 0.037 seconds
Method 6: List comprehension (thanks jonrsharpe):
def way6(theList, theDict):
    return [item for item in theList if item in theDict]
24 function calls in 0.020 seconds
Method 7: Using the & operator (thanks jonrsharpe):
def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)
25 function calls in 0.026 seconds
For methods 1-3 and 5-7 I timed them as above with lists/dictionaries of length 1000, 10000, 100000, 1000000, 10000000 and 100000000 and show a log-log plot of time taken. Across all lengths the intersection and in-statement methods perform best. The gradients are all about 1 (maybe a bit higher), indicating O(n) or perhaps slightly super-linear scaling.
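For reference, a hedged sketch of how such a scaling sweep can be driven, using timeit rather than cProfile; the lengths and repeat counts here are my own choices, not the ones behind the plot:

import timeit
from random import randint

for length in (10**3, 10**4, 10**5):
    lst = [randint(0, length * length // 10 - 1) for _ in range(length)]
    dct = {randint(0, length * length // 10 - 1): "It's here" for _ in range(length)}
    # Method 6, the list comprehension, as a representative example.
    t = timeit.timeit(lambda: [item for item in lst if item in dct], number=10)
    print "%d items: %.4f s per call" % (length, t / 10)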
Of a couple of additional methods I've tried, the fastest was a simple list comprehension:
def way6(theList, theDict):
    return [item for item in theList if item in theDict]
This runs the same process as your fastest approach, way1, but more quickly. For comparison, the quickest set-based way was
def way5(theList, theDict):
    return list(set(theList).intersection(theDict))
timeit results:
>>> import timeit
>>> setup = """from __main__ import way1, way5, way6
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
"""
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.550477756582723
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
19.597916393388232
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.652289059326904
Having added abarnert's suggestion:
def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)
and re-run the timing I now get:
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.110055883138497
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.292466681101036
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.351759544463917
>>> timeit.timeit('way7(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.206370930653392
way1 and way6 have switched places, so I re-ran again:
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.648176054011941
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.847062579316628
So it looks like the set approach is slower than the list-based ones, but the difference between the explicit loop and the list comprehension is (surprisingly, to me at least) a bit variable. I'd say just pick one, and not worry about it unless it becomes a real bottleneck later.
First, I think you're on 2.7, so I'll do most of this with 2.7. But it's worth noting that if you're really interested in optimizing your code, the 3.x branch continues to get performance improvements, and the 2.x branch never will. And why are you using CPython instead of PyPy?
Anyway, some further micro-optimizations to try (in addition to the ones in jonrsharpe's answer):
Caching attribute and/or global lookups in local variables (it's called LOAD_FAST for a reason). For example:
def way1a(theList, theDict):
    resultsList = []
    rlappend = resultsList.append
    for listItem in theList:
        if listItem in theDict:
            rlappend(theDict[listItem])
    return resultsList
In [10]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.2 ms per loop
In [11]: %timeit way1a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.4 ms per loop
But for some operator special methods, like __contains__ and __getitem__, that may not be worth doing. Of course you won't know until you try:
def way1b(theList, theDict):
    resultsList = []
    rlappend = resultsList.append
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    for listItem in theList:
        if tdin(listItem):
            rlappend(tdgi(listItem))
    return resultsList
In [14]: %timeit way1b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.8 ms per loop
Meanwhile, Jon's way6 answer already optimizes out the resultsList.append entirely by using a listcomp, and we just saw that optimizing out the lookups he does have probably won't help. That's especially true in 3.x, where the comprehension is compiled into a function of its own, but even in 2.7 I wouldn't expect any benefit, for the same reasons as in the explicit loop. But let's try just to be sure:
def way6(theList, theDict):
    return [theDict[item] for item in theList if item in theDict]

def way6a(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return [tdgi(item) for item in theList if tdin(item)]
In [31]: %timeit way6(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 14.7 ms per loop
In [32]: %timeit way6a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.9 ms per loop
Surprisingly (at least to me), this time it actually helped. Not sure why.
But what I was really setting up for was this: another advantage of turning both the filter expression and the value expression into function calls is that we can use filter and map:
from itertools import ifilter

def way6b(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return map(tdgi, filter(tdin, theList))

def way6c(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return map(tdgi, ifilter(tdin, theList))
In [34]: %timeit way6b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 10.7 ms per loop
In [35]: %timeit way6c(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13 ms per loop
But that gain is largely 2.x-specific; 3.x has faster comprehensions, while its list(map(filter(…))) is slower than 2.x's map(filter(…)) or map(ifilter(…)).
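For completeness, the 3.x spelling needs an explicit list() because map and filter are lazy there. This is only a sketch (the name way6b_py3 is mine), and per the note above it probably won't beat the comprehension on 3.x anyway:

def way6b_py3(theList, theDict):
    # On 3.x, map and filter return iterators, so materialize the result.
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return list(map(tdgi, filter(tdin, theList)))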
You don't need to convert both sides of a set intersection to a set, just the left side; the right side can be any iterable, and a dict is already an iterable of its keys.
But, even better, a dict's key view (dict.keys() in 3.x, dict.viewkeys() in 2.7) is already a set-like object, and one backed by the dict's hash table, so you don't need to transform anything. (It doesn't have quite the same interface—it has no intersection method, but its & operator takes iterables, unlike set, whose intersection method takes iterables but whose & only takes sets. That's annoying, but we only care about performance here, right?)
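Before the benchmarked versions, a quick hedged illustration of that interface asymmetry (2.7 spelling; on 3.x it would be d.keys() instead of d.viewkeys()):

d = {1: 'a', 2: 'b'}
s = {1, 3}

print d.viewkeys() & [1, 3]      # works: a key view's & accepts any iterable
print s.intersection([1, 3])     # works: set.intersection accepts iterables
# print s & [1, 3]               # TypeError: a set's & requires another set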
def way3(theList, theDict):
    return list(set(theList).intersection(set(theDict.keys())))

def way3a(theList, theDict):
    return list(set(theList).intersection(theDict))

def way3b(theList, theDict):
    return list(theDict.viewkeys() & theList)
In [20]: %timeit way3(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 23.7 ms per loop
In [20]: %timeit way3a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 15.5 ms per loop
In [20]: %timeit way3b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 15.7 ms per loop
That last one didn't help (although using Python 3.4 instead of 2.7, it was 10% faster…), but the first one definitely did.
In real life, you may also want to compare the sizes of the two collections to decide which one gets setified, but here that information is static, so there's no point writing the code to test it.
Anyway, my fastest result was the map(filter(…)) on 2.7, by a pretty good margin. On 3.4 (which I didn't show here), Jon's listcomp was fastest (even fixed to return the values rather than the keys), and faster than any of the 2.7 methods. Also, 3.4's fastest set operation (using the key view as a set and the list as an iterable) was a lot closer to the iterative methods than in 2.7.
$ ipython2 # Apple CPython 2.7.6
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.8 ms per loop
$ python27x -m ipython # custom-built 2.7.9
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.7 ms per loop
$ ipython3 # python.org CPython 3.4.1
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.8 ms per loop
So, that's an 8% speedup just by using a later Python. (And the speedup was closer to 20% on the listcomp and dict-key-view versions.) And it's not because Apple's 2.7 is bad or anything, it's just that 3.x has continued to get optimizations over the past 5+ years, while 2.7 has not (and never will again).
And meanwhile:
$ ipython_pypy # PyPy 2.5.0 Python 2.7.8
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
1000000000 loops, best of 3: 1.97 ns per loop
That's a 7000000x speedup just by typing 5 extra characters. :)
I'm sure it's cheating here. Either the JIT implicitly memoized the result, or it noticed that I didn't even look at the result, pushed that up the chain, and realized it didn't need to do any of the steps, or something. But this actually happens in real life sometimes; I've had a huge mess of code that I spent 3 days debugging and trying to optimize before realizing that everything it did was unnecessary…
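One hedged way to make such numbers more trustworthy is to force the benchmark to actually consume the result, so the JIT can't discard the work. This is my own sketch, not something from the runs above:

# Accumulate something derived from the result so dead-code elimination
# can't remove the call entirely.
total = 0
for _ in range(100):
    total += len(way1(listOfRandomInts, dictionaryOfRandomInts))
print total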
At any rate, speedups on the order of 10x are pretty typical from PyPy even when it can't cheat. And it's a lot easier than tweaking attribute lookups or reversing the order of who gets turned into a set for 5%.
Jython is more unpredictable—sometimes almost as fast as PyPy, sometimes much slower than CPython. Unfortunately, timeit is broken in Jython 2.5.3, and I just broke my Jython 2.7 completely by upgrading from rc2 to rc3, so… no tests today. Similarly, IronPython is basically Jython redone on a different VM; it's usually faster, but again unpredictable. But my current version of Mono and my current version of IronPython aren't playing nice together, so no tests there either.
In Python, which is faster?
1
for word in listOfWords:
    doSomethingToWord(word)
2
for i in range(len(listOfWords)):
    doSomethingToWord(listOfWords[i])
Of course I'd use xrange in Python 2.x.
My assumption is that 1 is faster than 2. If so, why is it?
Use Python's timeit module to answer this kind of question:
duncan@ubuntu:~$ python -m timeit -s "listOfWords=['hello']*1000" "for word in listOfWords: len(word)"
10000 loops, best of 3: 37.2 usec per loop
duncan@ubuntu:~$ python -m timeit -s "listOfWords=['hello']*1000" "for i in range(len(listOfWords)): len(listOfWords[i])"
10000 loops, best of 3: 52.1 usec per loop
Instead of asking questions like this, you can always just try them yourself. It is not hard.
Super simple benchmarking will show you the difference.
from datetime import datetime

arr = [4 for _ in xrange(10**8)]

startTime = datetime.now()
for i in arr:
    i
print datetime.now() - startTime

startTime = datetime.now()
for i in xrange(len(arr)):
    arr[i]
print datetime.now() - startTime
On my machine it is:
0:00:04.822513
0:00:05.676396
Note that the list you are iterating over should be pretty big to see the difference. The second loop takes longer because each time it needs to do a lookup by index (arr[i]) and also to generate the values for xrange.
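If you actually need the index as well as the element, the usual middle ground (a hedged aside, not part of the original comparison) is enumerate, which keeps the direct iteration but hands you the index without a separate lookup:

for i, word in enumerate(listOfWords):
    doSomethingToWord(word)   # i is available here without indexing the list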
Please do not spend too much time on mostly useless micro-optimization; rather, see whether you can improve the computational complexity of your inner-loop functions.
Simply try timeit:
In [2]: def solve(listOfWords):
   ...:     for word in range(len(listOfWords)):
   ...:         pass
   ...:

In [3]: %timeit solve(xrange(10**5))
100 loops, best of 3: 4.34 ms per loop

In [4]: def solve(listOfWords):
   ...:     for word in listOfWords:
   ...:         pass
   ...:

In [5]: %timeit solve(xrange(10**5))
1000 loops, best of 3: 1.84 ms per loop
In addition to the speed advantage, 1 is "cleaner-looking", but it will also work for sequences that do not support len, namely generator expressions and the results of generator functions. To use solution 2, you would first have to convert the generator to a list in order to get its length (if you even could). But what if the generator is generating the list of all prime numbers, and doSomething is looking for the first value > 100?
for num in prime_number_generator():
    if num > 100: return num
There is no way to convert this to the second form, since this generator has no end.
Also, what if it is very expensive to create the elements of the list (as in fetching from a database, or remote web server)? If you are looking for a matching value out of a generated set of N values, with #1 you could exit as soon as you found a match, and avoid on average the generation of N/2 values. To use #2, you first have to generate all N values in order to get the length in order to make the range.
There is a reason Python 3 converted many builtins to return iterators instead of lists - they are more flexible.
What is Pythonic?
"for i in range(len(seq)):"? No.
Use "for x in seq:"
This is mostly an exercise in learning Python. I wrote this function to test if a number is prime:
import math

def p1(n):
    for d in xrange(2, int(math.sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True
Then I realized I can easily rewrite it using any():
def p2(n):
    return not any((n % d == 0) for d in xrange(2, int(math.sqrt(n)) + 1))
Performance-wise, I was expecting p2 to be faster than, or at the very least as fast as, p1 because any() is builtin, but for a large-ish prime, p2 is quite a bit slower:
$ python -m timeit -n 100000 -s "import test" "test.p1(999983)"
100000 loops, best of 3: 60.2 usec per loop
$ python -m timeit -n 100000 -s "import test" "test.p2(999983)"
100000 loops, best of 3: 88.1 usec per loop
Am I using any() incorrectly here? Is there a way to write this function using any() so that it's as fast as iterating myself?
Update: Numbers for an even larger prime
$ python -m timeit -n 1000 -s "import test" "test.p1(9999999999971)"
1000 loops, best of 3: 181 msec per loop
$ python -m timeit -n 1000 -s "import test" "test.p2(9999999999971)"
1000 loops, best of 3: 261 msec per loop
The performance difference is minimal, but the reason it exists is that any incurs building a generator expression, and an extra function call, compared to the for loop. Both have identical behaviors, though (short-circuit evaluation).
As the size of your input grows, the difference won't diminish (I was wrong) because you're using a generator expression, and iterating over it requires calling a method (.next()) on it and an extra stack frame. any does that under the hood, of course.
The for loop is iterating over an xrange object. any is iterating over a generator expression, which itself is iterating over an xrange object.
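If the goal is specifically to drop that extra generator layer while still letting a builtin drive the loop, one hedged option (my own sketch, using all with itertools.imap rather than any; the name p3 is mine) is:

import math
from itertools import imap

def p3(n):
    # n is prime iff every remainder n % d is nonzero; imap calls the C-level
    # int.__mod__ directly, so no Python-level generator frame is involved.
    return all(imap(n.__mod__, xrange(2, int(math.sqrt(n)) + 1)))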
Either way, use whichever produces the most readable/maintainable code. Choosing one over the other will have little, if any, performance impact on whatever program you're writing.
I have a generator that generates a finite sequence. To determine
the length of this sequence I tried these two approaches:
seq_len = sum([1 for _ in euler14_seq(sv)]) # list comp
and
seq_len = sum(1 for _ in euler14_seq(sv)) # generator expression
sv is a constant starting value for the sequence.
I had expected that list comprehension would be slower and the
generator expression faster, but it turns out the other way around.
I assume the first one will be much more memory intensive since it
creates a complete list in memory first - part of the reason I also thought it would be slower.
My question: Is this observation generalizable? And is this due to
having two generators involved in the second statement vs the first?
I've looked at these What's the shortest way to count the number of items in a generator/iterator?, Length of generator output, and
Is there any built-in way to get the length of an iterable in python? and saw some other approaches to measuring the length of a sequence, but I'm specifically curious about the comparison of list comp vs generator expression.
PS: This came up when I decided to solve Euler Project #14 based on a
question asked on SO yesterday.
(By the way, what's the general feeling regarding use of '_' in places where variable values are not needed?)
This was done with Python 2.7.2 (32-bit) under Windows 7 64-bit
On this computer, the generator expression becomes faster somewhere between 100,000 and 1,000,000 items:
$ python -m timeit "sum(1 for x in xrange(100000))"
10 loops, best of 3: 34.8 msec per loop
$ python -m timeit "sum([1 for x in xrange(100000)])"
10 loops, best of 3: 20.8 msec per loop
$ python -m timeit "sum(1 for x in xrange(1000000))"
10 loops, best of 3: 315 msec per loop
$ python -m timeit "sum([1 for x in xrange(1000000)])"
10 loops, best of 3: 469 msec per loop
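On the memory side of the question, the listcomp version really does build the throwaway list first, while the generator expression never materializes it. A hedged sketch (exact sizes depend on platform and Python version):

import sys

ones = [1 for x in xrange(10**6)]
print sys.getsizeof(ones)        # a few MB just for the list object itself
# sum(1 for x in xrange(10**6)) produces the same total with no such list.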
The following code block will give the length (note that it consumes the generator and builds the whole list in memory):
>>> gen1 = (x for x in range(10))
>>> len(list(gen1))
10