Python: deque vs list performance comparison

In the Python docs I can see that deque is a special collection highly optimized for popping/adding items from the left or right side. E.g. the documentation says:
Deques are a generalization of stacks and queues (the name is
pronounced “deck” and is short for “double-ended queue”). Deques
support thread-safe, memory efficient appends and pops from either
side of the deque with approximately the same O(1) performance in
either direction.
Though list objects support similar operations, they are optimized for
fast fixed-length operations and incur O(n) memory movement costs for
pop(0) and insert(0, v) operations which change both the size and
position of the underlying data representation.
I decided to make some comparisons using IPython. Could anyone explain to me what I did wrong here:
In [31]: %timeit range(1, 10000).pop(0)
10000 loops, best of 3: 114 us per loop
In [32]: %timeit deque(xrange(1, 10000)).pop()
10000 loops, best of 3: 181 us per loop
In [33]: %timeit deque(range(1, 10000)).pop()
1000 loops, best of 3: 243 us per loop

Could anyone explain to me what I did wrong here
Yes, your timing is dominated by the time to create the list or deque. The time to do the pop is insignificant in comparison.
Instead you should isolate the thing you're trying to test (the pop speed) from the setup time:
In [1]: from collections import deque
In [2]: s = list(range(1000))
In [3]: d = deque(s)
In [4]: s_append, s_pop = s.append, s.pop
In [5]: d_append, d_pop = d.append, d.pop
In [6]: %timeit s_pop(); s_append(None)
10000000 loops, best of 3: 115 ns per loop
In [7]: %timeit d_pop(); d_append(None)
10000000 loops, best of 3: 70.5 ns per loop
That said, the real differences between deques and lists in terms of performance are:
Deques have O(1) speed for appendleft() and popleft() while lists have O(n) performance for insert(0, value) and pop(0).
List append performance is hit and miss because it uses realloc() under the hood. As a result, it tends to have over-optimistic timings in simple code (because the realloc doesn't have to move data) and really slow timings in real code (because fragmentation forces realloc to move all the data). In contrast, deque append performance is consistent because it never reallocs and never moves data.
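As a rough, hedged illustration of that append behaviour (my own sketch, not part of the answer above), the timeit module can be used to time building a container one append at a time; absolute numbers vary with the interpreter and allocator state, which is exactly the point about realloc():

import timeit

# Hedged sketch: build a 100,000-element container one append at a time.
# list growth goes through realloc(), deque growth allocates fixed-size blocks.
list_t = timeit.timeit('s = []\nfor i in r: s.append(i)',
                       setup='r = range(100000)', number=50)
deque_t = timeit.timeit('d = deque()\nfor i in r: d.append(i)',
                        setup='from collections import deque; r = range(100000)',
                        number=50)
print('list  appends:', list_t)
print('deque appends:', deque_t)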

For what it is worth:
Python 3
deque.pop vs list.pop
> python3 -mtimeit -s 'import collections' -s 'items = range(10000000); base = [*items]' -s 'c = collections.deque(base)' 'c.pop()'
5000000 loops, best of 5: 46.5 nsec per loop
> python3 -mtimeit -s 'import collections' -s 'items = range(10000000); base = [*items]' 'base.pop()'
5000000 loops, best of 5: 55.1 nsec per loop
deque.appendleft vs list.insert
> python3 -mtimeit -s 'import collections' -s 'c = collections.deque()' 'c.appendleft(1)'
5000000 loops, best of 5: 52.1 nsec per loop
> python3 -mtimeit -s 'c = []' 'c.insert(0, 1)'
50000 loops, best of 5: 12.1 usec per loop
Python 2
> python -mtimeit -s 'import collections' -s 'c = collections.deque(xrange(1, 100000000))' 'c.pop()'
10000000 loops, best of 3: 0.11 usec per loop
> python -mtimeit -s 'c = range(1, 100000000)' 'c.pop()'
10000000 loops, best of 3: 0.174 usec per loop
> python -mtimeit -s 'import collections' -s 'c = collections.deque()' 'c.appendleft(1)'
10000000 loops, best of 3: 0.116 usec per loop
> python -mtimeit -s 'c = []' 'c.insert(0, 1)'
100000 loops, best of 3: 36.4 usec per loop
As you can see, where it really shines is in appendleft vs insert.

I would recommend referring to
https://wiki.python.org/moin/TimeComplexity
Python lists and deques have similar complexities for most operations (push, pop, etc.).

I found my way to this question and thought I'd offer up an example with a little context.
A classic use case for a deque is rotating/shifting elements in a collection, because (as others have mentioned) you get very good (O(1)) complexity for push/pop operations on both ends: these operations just move references around, whereas a list has to physically move its elements around in memory.
So here are 2 very similar-looking implementations of a rotate-left function:
def rotate_with_list(items, n):
    l = list(items)
    for _ in range(n):
        l.append(l.pop(0))
    return l

from collections import deque

def rotate_with_deque(items, n):
    d = deque(items)
    for _ in range(n):
        d.append(d.popleft())
    return d
Note: This is such a common use of a deque that the deque has a built-in rotate method, but I'm doing it manually here for the sake of visual comparison.
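For completeness, a small hedged example of that built-in method (not part of the original comparison): rotate takes a positive argument to rotate right and a negative one to rotate left, matching the manual rotate-left above.

from collections import deque

d = deque(range(10))
d.rotate(-3)   # rotate left by 3; d.rotate(3) would rotate right
print(d)       # deque([3, 4, 5, 6, 7, 8, 9, 0, 1, 2])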
Now let's %timeit.
In [1]: def rotate_with_list(items, n):
   ...:     l = list(items)
   ...:     for _ in range(n):
   ...:         l.append(l.pop(0))
   ...:     return l
   ...:
   ...: from collections import deque
   ...: def rotate_with_deque(items, n):
   ...:     d = deque(items)
   ...:     for _ in range(n):
   ...:         d.append(d.popleft())
   ...:     return d
   ...:
In [2]: items = range(100000)
In [3]: %timeit rotate_with_list(items, 800)
100 loops, best of 3: 17.8 ms per loop
In [4]: %timeit rotate_with_deque(items, 800)
The slowest run took 5.89 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 527 µs per loop
In [5]: %timeit rotate_with_list(items, 8000)
10 loops, best of 3: 174 ms per loop
In [6]: %timeit rotate_with_deque(items, 8000)
The slowest run took 8.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.1 ms per loop
In [7]: more_items = range(10000000)
In [8]: %timeit rotate_with_list(more_items, 800)
1 loop, best of 3: 4.59 s per loop
In [9]: %timeit rotate_with_deque(more_items, 800)
10 loops, best of 3: 109 ms per loop
Pretty interesting how both data structures expose an eerily similar interface but have drastically different performance :)

Out of curiosity, I tried inserting at the beginning of a list vs. appendleft() on a deque.
Clearly, deque is the winner.
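The numbers behind that comparison aren't reproduced here, but a minimal hedged sketch along these lines (using the timeit module; figures are machine-dependent) shows the gap:

import timeit

# Hedged sketch: prepend 10,000 items to a list vs. a deque.
# list.insert(0, v) shifts every existing element; deque.appendleft(v) does not.
list_t = timeit.timeit('s = []\nfor i in r: s.insert(0, i)',
                       setup='r = range(10000)', number=20)
deque_t = timeit.timeit('d = deque()\nfor i in r: d.appendleft(i)',
                        setup='from collections import deque; r = range(10000)',
                        number=20)
print('list.insert(0, v)  :', list_t)
print('deque.appendleft(v):', deque_t)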

Related

What is the most efficient way to find the position of the first np.nan value?

consider the array a
a = np.array([3, 3, np.nan, 3, 3, np.nan])
I could do
np.isnan(a).argmax()
But this requires finding all np.nan just to find the first.
Is there a more efficient way?
I've been trying to figure out if I can pass a parameter to np.argpartition such that np.nan gets sorted first as opposed to last.
EDIT regarding [dup].
There are several reasons this question is different.
That question and its answers addressed equality of values. This is in regard to isnan.
Those answers all suffer from the same issue my answer faces. Note: I provided a perfectly valid answer but highlighted its inefficiency. I'm looking to fix the inefficiency.
EDIT regarding second [dup].
Still addressing equality and question/answers are old and very possibly outdated.
It might also be worth looking into numba.jit; without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but after compiling the code, the ordinary search takes the lead, at least in my testing:
In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [70]: %paste
import numba

def naive(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

def short(a):
    return np.isnan(a).argmax()

@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()
## -- End pasted text --
In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop
In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop
In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop
In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
Edit: As pointed out by @hpaulj in their answer, numpy actually ships with an optimized short-circuited search whose performance is comparable with the JITted search above:
In [26]: %paste
def plain(a):
    return a.argmax()

@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --
In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop
In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop
In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop
In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
I'll nominate
a.argmax()
With @fuglede's test array:
In [1]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [2]: np.isnan(a).argmax()
Out[2]: 9999
In [3]: np.argmax(a)
Out[3]: 9999
In [4]: a.argmax()
Out[4]: 9999
In [5]: timeit a.argmax()
The slowest run took 29.94 ....
10000 loops, best of 3: 20.3 µs per loop
In [6]: timeit np.isnan(a).argmax()
The slowest run took 7.82 ...
1000 loops, best of 3: 462 µs per loop
I don't have numba installed, so I can't compare that. But my speedup relative to short is greater than @fuglede's 6x.
I'm testing in Py3, which accepts < comparisons with np.nan, while Py2 raises a runtime warning. But the code search suggests this isn't dependent on that comparison.
/numpy/core/src/multiarray/calculation.c PyArray_ArgMax plays with axes (moving the one of interest to the end), and delegates the action to arg_func = PyArray_DESCR(ap)->f->argmax, a function that depends on the dtype.
In numpy/core/src/multiarray/arraytypes.c.src it looks like BOOL_argmax short circuits, returning as soon as it encounters a True.
for (; i < n; i++) {
    if (ip[i]) {
        *max_ind = i;
        return 0;
    }
}
And @fname@_argmax also short-circuits on a maximal nan. np.nan is 'maximal' in argmin as well.
#if @isfloat@
    if (@isnan@(mp)) {
        /* nan encountered; it's maximal */
        return 0;
    }
#endif
Comments from experienced C coders are welcome, but it appears to me that, at least for np.nan, a plain argmax will be as fast as we can get.
Playing with the 9999 in generating a shows that the a.argmax time depends on that value, consistent with short-circuiting.
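To see that short-circuiting directly, here is a small hedged sketch (my own, not part of the original answer) that moves the first NaN around and times a.argmax(); the time should grow roughly with the position of the first NaN:

import numpy as np
import timeit

# Hedged sketch: a.argmax() should take longer the later the first NaN appears,
# consistent with the short-circuit in the C argmax loop described above.
for pos in (10, 10_000, 99_999):
    a = np.full(100_000, 3.0)
    a[pos] = np.nan
    t = timeit.timeit(lambda: a.argmax(), number=1000)
    print(f'first nan at {pos:>6}: {t:.4f} s for 1000 calls')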
Here is a pythonic approach using itertools.takewhile():
from itertools import takewhile
sum(1 for _ in takewhile(np.isfinite, a))
Benchmark with the generator-expression-within-next approach [1]:
In [118]: a = np.repeat(a, 10000)
In [120]: %timeit next(i for i, j in enumerate(a) if np.isnan(j))
100 loops, best of 3: 12.4 ms per loop
In [121]: %timeit sum(1 for _ in takewhile(np.isfinite, a))
100 loops, best of 3: 11.5 ms per loop
But still (by far) slower than numpy approach:
In [119]: %timeit np.isnan(a).argmax()
100000 loops, best of 3: 16.8 µs per loop
[1] The problem with that approach is the use of the enumerate function, which first wraps the numpy array in an enumerate object (an iterator-like object); calling the generator function and the iterator's next attribute then takes additional time.
When looking for the first match in various scenarios, we could iterate through, look for the first match, and exit on that first match rather than processing the entire array. So we would have an approach using Python's next function, like so -
next((i for i, val in enumerate(a) if np.isnan(val)))
Sample runs -
In [192]: a = np.array([3, 3, np.nan, 3, 3, np.nan])
In [193]: next((i for i, val in enumerate(a) if np.isnan(val)))
Out[193]: 2
In [194]: a[2] = 10
In [195]: next((i for i, val in enumerate(a) if np.isnan(val)))
Out[195]: 5
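One hedged caveat worth adding (not from the original answer): the bare next(...) raises StopIteration when the array contains no NaN at all, so in practice you may want to pass a default value.

import numpy as np

a = np.array([3.0, 3.0, 3.0])   # no NaN anywhere
idx = next((i for i, val in enumerate(a) if np.isnan(val)), -1)
print(idx)                      # -1 instead of a StopIteration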

Python, faster to use class property, or variable in a function?

In Python, theoretically, which method should be faster out of test1 and test2 (assuming the same value of x)? I have tried using %timeit but see very little difference.
import numpy as np

class Tester():
    def __init__(self):
        self.x = np.arange(100000)

    def test1(self):
        return np.sum(self.x * self.x)

    def test2(self, x):
        return np.sum(x*x)
In any implementation of Python, the time will be overwhelmingly dominated by the multiplication of two vectors with 100,000 elements each. Everything else is noise compared to that. Make the vector much smaller if you're really interested in measuring other overheads.
In CPython, test2() will most likely be a little faster. It has an "extra" argument, but arguments are unpacked "at C speed" so that doesn't matter much. Arguments are accessed the same way as local variables, via the LOAD_FAST opcode, which is a simple array[index] access.
In test1(), each instance of self.x causes the string "x" to be looked up in the dictionary self.__dict__. That's slower than an indexed array access. But compared to the time taken by the long-winded multiplication, it's basically nothing.
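A hedged way to see the difference described here is to disassemble the two methods with the dis module (my own sketch; the exact opcode names vary by Python version), where the attribute lookup in test1 versus the plain local/argument load in test2 is visible:

import dis
import numpy as np

class Tester:
    def __init__(self):
        self.x = np.arange(100000)
    def test1(self):
        return np.sum(self.x * self.x)
    def test2(self, x):
        return np.sum(x * x)

dis.dis(Tester.test1)   # self.x needs an attribute lookup (LOAD_ATTR) on top of the local load
dis.dis(Tester.test2)   # x is a plain LOAD_FAST of an argument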
I know this sort of misses the point of the question, but since you tagged the question with numpy and are looking at speed differences for a large array, I thought I would mention that a faster solution would be something else entirely.
What you're doing is a dot product, so use numpy.dot, which performs the multiplication and summation together via an external library (LAPACK?). (For convenience I'll use the syntax of test1, despite @Tim's answer, because no extra argument needs to be passed.)
def test3(self):
    return np.dot(self.x, self.x)
or possibly even faster (and certainly more general):
def test4(self):
    return np.einsum('i,i->', self.x, self.x)
Here are some tests:
In [363]: paste
class Tester():
    def __init__(self, n):
        self.x = np.arange(n)

    def test1(self):
        return np.sum(self.x * self.x)

    def test2(self, x):
        return np.sum(x*x)

    def test3(self):
        return np.dot(self.x, self.x)

    def test4(self):
        return np.einsum('i,i->', self.x, self.x)
## -- End pasted text --
In [364]: t = Tester(10000)
In [365]: np.allclose(t.test1(), [t.test2(t.x), t.test3(), t.test4()])
Out[365]: True
In [366]: timeit t.test1()
10000 loops, best of 3: 37.4 µs per loop
In [367]: timeit t.test2(t.x)
10000 loops, best of 3: 37.4 µs per loop
In [368]: timeit t.test3()
100000 loops, best of 3: 15.2 µs per loop
In [369]: timeit t.test4()
100000 loops, best of 3: 16.5 µs per loop
In [370]: t = Tester(10)
In [371]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop
In [372]: timeit t.test2(t.x)
100000 loops, best of 3: 16.5 µs per loop
In [373]: timeit t.test3()
100000 loops, best of 3: 3.14 µs per loop
In [374]: timeit t.test4()
100000 loops, best of 3: 6.26 µs per loop
And speaking of small, almost syntactic, speed differences, think of using a method rather than a standalone function:
def test1b(self):
    return (self.x*self.x).sum()
gives:
In [385]: t = Tester(10000)
In [386]: timeit t.test1()
10000 loops, best of 3: 40.6 µs per loop
In [387]: timeit t.test1b()
10000 loops, best of 3: 37.3 µs per loop
In [388]: t = Tester(3)
In [389]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop
In [390]: timeit t.test1b()
100000 loops, best of 3: 14.2 µs per loop

Efficient iteration over slice in Python

How efficient are iterations over slice operations in Python? And if a copy is inevitable with slices, is there an alternative?
I know that a slice operation over a list is O(k), where k is the size of the slice.
x[5 : 5+k] # O(k) copy operation
However, when iterating over a part of a list, I find that the cleanest (and most Pythonic?) way to do this (without having to resort to indices) is to do:
for elem in x[5 : 5+k]:
    print elem
However, my intuition is that this still results in an expensive copy of the sublist, rather than simply iterating over the existing list.
Use:
for elem in x[5 : 5+k]:
It's Pythonic! Don't change this until you've profiled your code and determined that this is a bottleneck -- though I doubt you will ever find this to be the main source of a bottleneck.
In terms of speed it will probably be your best choice:
In [30]: x = range(100)
In [31]: k = 90
In [32]: %timeit x[5:5+k]
1000000 loops, best of 3: 357 ns per loop
In [35]: %timeit list(IT.islice(x, 5, 5+k))
100000 loops, best of 3: 2.42 us per loop
In [36]: %timeit [x[i] for i in xrange(5, 5+k)]
100000 loops, best of 3: 5.71 us per loop
In terms of memory, it is not as bad as you might think. x[5: 5+k] is a shallow copy of part of x. So even if the objects in x are large, x[5: 5+k] creates a new list with k elements which reference the same objects as in x. So you only need extra memory for a list of k references to pre-existing objects. That probably is not going to be the source of any memory problems.
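A small hedged sketch of that shallow-copy point (my own example, with dicts standing in for "large" objects): the slice is a new list, but its elements are the very same objects.

x = [{'id': i} for i in range(10)]   # list of (potentially large) objects
s = x[5:8]

print(s is x)            # False: the slice is a new list object
print(s[0] is x[5])      # True: it references the same element objects
x[5]['id'] = 999
print(s[0])              # {'id': 999} -- the change is visible through the slice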
You can use itertools.islice to get a sliced iterator from the list:
Example:
>>> from itertools import islice
>>> lis = range(20)
>>> for x in islice(lis, 10, None, 1):
... print x
...
10
11
12
13
14
15
16
17
18
19
Update:
As noted by @user2357112, the performance of islice depends on the start point of the slice and the size of the iterable; a normal slice is going to be fast in almost all cases and should be preferred. Here are some more timing comparisons:
For huge lists, islice is slightly faster than or equal to a normal slice when the slice's start point is less than half the size of the list; for bigger indexes, the normal slice is the clear winner.
>>> def func(lis, n):
...     it = iter(lis)
...     for x in islice(it, n, None, 1): pass
...
>>> def func1(lis, n):
...     #it = iter(lis)
...     for x in islice(lis, n, None, 1): pass
...
>>> def func2(lis, n):
...     for x in lis[n:]: pass
...
>>> lis = range(10**6)
>>> n = 100
>>> %timeit func(lis, n)
10 loops, best of 3: 62.1 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.8 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 82.8 ms per loop
>>> n = 1000
>>> %timeit func(lis, n)
10 loops, best of 3: 64.4 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.3 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 85.8 ms per loop
>>> n = 10**4
>>> %timeit func(lis, n)
10 loops, best of 3: 61.4 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 61 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 80.8 ms per loop
>>> n = (10**6)/2
>>> %timeit func(lis, n)
10 loops, best of 3: 39.2 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 39.6 ms per loop
>>> %timeit func2(lis, n)
10 loops, best of 3: 41.5 ms per loop
>>> n = (10**6)-1000
>>> %timeit func(lis, n)
100 loops, best of 3: 18.9 ms per loop
>>> %timeit func1(lis, n)
100 loops, best of 3: 18.8 ms per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 50.9 us per loop #clear winner for large index
For small lists, a normal slice is faster than islice in almost all cases.
>>> lis = range(1000)
>>> n = 100
>>> %timeit func(lis, n)
10000 loops, best of 3: 60.7 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 59.6 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 59.9 us per loop
>>> n = 500
>>> %timeit func(lis, n)
10000 loops, best of 3: 38.4 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 33.9 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 26.6 us per loop
>>> n = 900
>>> %timeit func(lis, n)
10000 loops, best of 3: 20.1 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 17.2 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 11.3 us per loop
Conclusion:
Go for normal slices.
Just traverse the desired indexes; there's no need to create a new slice for this:
for i in xrange(5, 5+k):
    print x[i]
Granted: it looks unpythonic, but it's more efficient than creating a new slice in the sense that no extra memory is wasted. An alternative would be to use an iterator, as demonstrated in @AshwiniChaudhary's answer.
You're already doing an O(n) iteration over the slice. In most cases, this will be much more of a concern than the actual creation of the slice, which happens entirely in optimized C. Looping over a slice once you've made it takes over twice as long as making the slice, even if you don't do anything with it:
>>> timeit.timeit('l[50:100]', 'import collections; l=range(150)')
0.46978958638010226
>>> timeit.timeit('for x in slice: pass',
'import collections; l=range(150); slice=l[50:100]')
1.2332711270150867
You might try iterating over the indices with xrange, but accounting for the time needed to retrieve the list element, it's slower than slicing. Even if you skip that part, it still doesn't beat slicing:
>>> timeit.timeit('for i in xrange(50, 100): x = l[i]', 'l = range(150)')
4.3081963062022055
>>> timeit.timeit('for i in xrange(50, 100): pass', 'l = range(150)')
1.675838213385532
Do not use itertools.islice for this! It will loop through your list from the start rather than skipping to the values you want with __getitem__. Here's some timing data showing how its performance depends on where the slice starts:
>>> timeit.timeit('next(itertools.islice(l, 9, None))', 'import itertools; l = range(1000000)')
0.5628290558478852
>>> timeit.timeit('next(itertools.islice(l, 999, None))', 'import itertools; l = range(1000000)')
6.885294697594759
Here's islice losing to regular slicing:
>>> timeit.timeit('for i in itertools.islice(l, 900, None): pass', 'import itertools; l = range(1000)')
8.979957560911316
>>> timeit.timeit('for i in l[900:]: pass', 'import itertools; l = range(1000)')
3.0318417204211983
This is on Python 2.7.5, in case any later versions add list-specific optimizations.
I think a better way is to use a C-like iteration if 'k' is very large (since a huge 'k', like 10000000000000, could make you wait about 10 hours for an answer from a Pythonic for loop).
Here is what I am suggesting:
i = 5      ## which is the initial value
f = 5 + k  ## which will be the final index

while i < f:
    print(x[i])
    i += 1
I assume this could do it in just 5 hours (whereas the Pythonic equivalent for loop takes about 10 hours) for that large k I mentioned, because it only needs to go from 5 to 10000000000005 once!
Every use of 'range()' or 'xrange()', or even the slicing itself (as you mentioned above), makes the program do 20000000000000 iterations, which could lead to a longer execution time, I think. (As I understand it, using a generator returns an iterable object that needs to run the generator completely first, so it takes twice the time to do the job: once for the generator itself and once for the 'for' loop.)
Edited:
In Python 3, the generator method/object does not need to run first to make the iterable object for the for loop.
The accepted answer does not offer an efficient solution. It is in the order of n, where n is the slice starting point. If n is large, it is going to be a problem.
The follow-up conclusion ("Go for normal slices") is not ideal either, because it uses extra space in the order of k for copying.
Python provides a very elegant and efficient solution to the slicing problem, called a generator expression, that works as optimally as you can get: O(1) space and O(k) running time:
(l[i] for i in range(n,n+k))
It is similar to a list comprehension, except that it works lazily, and you can combine it with other iterator tools like the itertools module or filters. Note the round parentheses enclosing the expression.
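A small hedged usage example of that expression (my own, assuming l is a list and n, n+k are in range): it can be fed straight into for loops or aggregation functions without building an intermediate list.

l = list(range(100))
n, k = 5, 10

window = (l[i] for i in range(n, n + k))   # lazy view over l[5:15], O(1) extra space
print(list(window))                        # the ten values 5 through 14, materialised only here
print(sum(l[i] for i in range(n, n + k)))  # 95: feeds an aggregator without a copy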
For iterating through a subarray (for creating the subarray, see unutbu's answer), slicing is slightly faster than indexing even in the worst case (l[1:]).
10 items
========
Slicing: 2.570001e-06 s
Indexing: 3.269997e-06 s
100 items
=========
Slicing: 6.820001e-06 s
Indexing: 1.220000e-05 s
1000 items
==========
Slicing: 7.647000e-05 s
Indexing: 1.482100e-04 s
10000 items
===========
Slicing: 2.876200e-04 s
Indexing: 5.270000e-04 s
100000 items
============
Slicing: 3.763300e-03 s
Indexing: 7.731050e-03 s
1000000 items
=============
Slicing: 2.963523e-02 s
Indexing: 4.921381e-02 s
Benchmark code:
def f_slice(l):
    for v in l[1:]:
        _x = v

def f_index(l):
    for i in range(1, len(l)):
        _x = l[i]

from time import perf_counter

def func_time(func, l):
    start = perf_counter()
    func(l)
    return perf_counter() - start

def bench(num_item):
    l = list(range(num_item))
    times = 10
    t_index = t_slice = 0
    for _ in range(times):
        t_slice += func_time(f_slice, l)
        t_index += func_time(f_index, l)
    print(f"Slicing: {t_slice/times:e} s")
    print(f"Indexing: {t_index/times:e} s")

for i in range(1, 7):
    s = f"{10**i} items"
    print(s)
    print('='*len(s))
    bench(10**i)
    print()

String multiplication versus for loop

I was solving a Python question on CodingBat.com. I wrote the following code for a simple problem of printing a string n times:
def string_times(str, n):
    return n * str
The official solution is:
def string_times(str, n):
    result = ""
    for i in range(n):
        result = result + str
    return result

print string_times('hello',3)
The output is the same for both functions. I am curious how string multiplication (first function) performs against the for loop (second function) in terms of performance. I mean, which one is faster and most commonly used?
Also, please suggest a way for me to get the answer to this question myself (using time.clock() or something like that).
We can use the timeit module to test this:
python -m timeit "100*'string'"
1000000 loops, best of 3: 0.222 usec per loop
python -m timeit "''.join(['string' for _ in range(100)])"
100000 loops, best of 3: 6.9 usec per loop
python -m timeit "result = ''" "for i in range(100):" " result = result + 'string'"
100000 loops, best of 3: 13.1 usec per loop
You can see that multiplying is the far faster option. Take note that while the string concatenation version isn't that bad in CPython, that may not be true in other implementations of Python. You should always opt for string multiplication or str.join() for this reason - not only for speed, but for readability and conciseness.
I've timed the following three functions:
def string_times_1(s, n):
    return s * n

def string_times_2(s, n):
    result = ""
    for i in range(n):
        result = result + s
    return result

def string_times_3(s, n):
    return "".join(s for _ in range(n))
The results are as follows:
In [4]: %timeit string_times_1('hello', 10)
1000000 loops, best of 3: 262 ns per loop
In [5]: %timeit string_times_2('hello', 10)
1000000 loops, best of 3: 1.63 us per loop
In [6]: %timeit string_times_3('hello', 10)
100000 loops, best of 3: 3.87 us per loop
As you can see, s * n is not only the clearest and the most concise, it is also the fastest.
You can use the timeit stuff from either the command line or in code to see how fast some bit of python code is:
$ python -m timeit "\"something\" * 100"
1000000 loops, best of 3: 0.608 usec per loop
Do something similar for your other function and compare.

Two-dimensional vs. One-dimensional dictionary efficiency in Python

What is more efficient in terms of memory and speed between
d[(first,second)]
and
d[first][second],
where d is either a dictionary keyed by tuples or a dictionary of dictionaries?
Here is some very basic test data that indicates that for a very contrived example (storing 'a' a million times using numbers as keys), using two dictionaries is significantly faster.
$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)};a = [d[i][j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 316 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)};a = [d[i, j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 970 msec per loop
Of course, these tests do not necessarily mean anything depending on what you are trying to do. Determine what you'll be storing, and then test.
A little more data:
$ python -m timeit 'a = [(hash(i), hash(j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 304 msec per loop
$ python -m timeit 'a = [hash((i, j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 172 msec per loop
$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)}'
10 loops, best of 3: 101 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)}'
10 loops, best of 3: 645 msec per loop
Once again, this is clearly not indicative of real-world use, but it seems to me that the cost of building a dictionary with tuple keys like that is huge, and that's where the dictionary in a dictionary wins out. This surprises me; I was expecting completely different results. I'll have to try a few more things when I have time.
Somewhat surprisingly, the dictionary of dictionaries is faster than the tuple-keyed dictionary in both CPython 2.7 and PyPy 1.8.
I didn't check on space, but you can do that with ps.
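As a hedged alternative to eyeballing ps, something like sys.getsizeof can give a rough, shallow comparison of the two layouts (my own sketch; it counts only the containers and key tuples, not the shared values, and uses a smaller n to keep it quick):

import sys

n = 100   # 10,000 entries total; scale up at your own risk
nested = {i: {j: "a" for j in range(n)} for i in range(n)}
flat = {(i, j): "a" for i in range(n) for j in range(n)}

# getsizeof is shallow: for `nested` we also add the inner dicts,
# for `flat` we also add the tuple keys.
nested_bytes = sys.getsizeof(nested) + sum(sys.getsizeof(v) for v in nested.values())
flat_bytes = sys.getsizeof(flat) + sum(sys.getsizeof(k) for k in flat)

print('dict of dicts:', nested_bytes, 'bytes (outer + inner dicts)')
print('tuple-keyed  :', flat_bytes, 'bytes (dict + key tuples)')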
