I timed the set() and list() constructors, and set() was significantly slower than list(), even though I benchmarked them using values with no duplicates. I know sets use hash tables; is that the reason set() is slower?
I'm using Python 3.7.5 [MSC v.1916 64 bit (AMD64)] on Windows 10, as of this writing (8th March).
#No significant difference observed at small sizes.
timeit set(range(10))
517 ns ± 4.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
timeit list(range(10))
404 ns ± 4.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
As the size increases, set() becomes much slower than list().
# When size is 100
timeit set(range(100))
2.13 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
timeit list(range(100))
934 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# When size is ten thousand.
timeit set(range(10000))
325 µs ± 2.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit list(range(10000))
240 µs ± 2.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# When size is one million.
timeit set(range(1000000))
86.9 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit list(range(1000000))
37.7 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Both of them are O(n) asymptotically. When there are no duplicates, shouldn't set(...) be approximately equal to list(...)?
To my surprise, set comprehensions and list comprehensions didn't show the huge deviations that set() and list() did.
# When size is 100.
timeit {i for i in range(100)}
3.96 µs ± 858 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
timeit [i for i in range(100)]
3.01 µs ± 265 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# When size is ten thousand.
timeit {i for i in range(10000)}
434 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit [i for i in range(10000)]
395 µs ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# When size is one million.
timeit {i for i in range(1000000)}
95.1 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit [i for i in range(1000000)]
87.3 ms ± 760 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Why should they be the same? Yes, they are both O(n) but set() needs to hash each element and needs to account for elements not being unique. This translates to a higher fixed cost per element.
Big O says nothing about absolute times, only how the time taken will grow as the size of the input grows. Two O(n) algorithms, given the same inputs, can take vastly different amounts of time to complete. All you can say is that when the size of the input doubles, the amount of time taken will (roughly) double, for both functions.
If you want to understand Big O better, I highly recommend Ned Batchelder’s introduction to the subject.
When there are no duplicates, shouldn't set(...) be approximately equal to list(...)?
No, they are not equal, because list() doesn't hash. The absence of duplicates makes no difference: set() still has to hash every element to find out that there are none.
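You can make the hashing cost visible directly. A minimal sketch (the CountedHash wrapper is hypothetical, purely for illustration): it counts how often Python asks for an element's hash.
class CountedHash:
    calls = 0
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        CountedHash.calls += 1
        return hash(self.value)

items = [CountedHash(i) for i in range(1000)]

CountedHash.calls = 0
list(items)
print(CountedHash.calls)   # 0 -- list() never hashes anything

CountedHash.calls = 0
set(items)
print(CountedHash.calls)   # 1000 -- set() hashes every element once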
To my surprise, set comprehensions and list comprehensions didn't show the huge deviations that set() and list() did.
The additional loop executed by the Python interpreter adds overhead that dominates the time taken. The higher fixed cost of set() is then less prominent.
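One way to see that extra interpreter work is to disassemble both forms. This is only a sketch of where to look; the exact opcodes vary between CPython versions.
import dis

# The comprehension compiles to its own bytecode loop
# (FOR_ITER / SET_ADD on most CPython versions).
dis.dis('{i for i in range(100)}')

# The constructor form is essentially a single call into C; the iteration
# over range() happens inside the set implementation, not in bytecode.
dis.dis('set(range(100))')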
There are other differences that may make a difference:
Given a sequence with a known length, list() can pre-allocate enough memory to fit those elements. Sets can't pre-allocate as they can't know how many duplicates there will be. Pre-allocating avoids the (amortised) cost of having to grow the list dynamically.
List and set comprehensions add one element at a time, so list objects can't preallocate, increasing the fixed per-item cost slightly.
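The length information that lets list() pre-allocate is exposed as a length hint; a minimal illustration (the printed numbers are the hints, not timings):
import operator

r = range(1000)
print(operator.length_hint(r))    # 1000 -- list(r) can size its buffer up front

gen = (i for i in range(1000))
print(operator.length_hint(gen))  # 0 -- a generator gives no hint, so the
                                  # result has to grow incrementally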
Related
I have a Python loop that's expected to run billions of times, and as such it needs to be as tightly optimized as possible.
One of the operations is checking if a list of ~50 items contains a float or an integer.
I know about the any() built-in function, but is it the fastest way to do this kind of checking?
This kind of question about how fast or slow something is can be tested for yourself using the timeit module, though it can be hard to know which alternatives to test against. Below I have tested several options and included the timings. Overall, for a 50-element list, checking types is very unlikely to be the bottleneck in a complex program.
import random

#initialize a list of integers to create a random list from
ch = [1, 2, 3, 4, 5, 6, 7]
#Fill a list with random integers, 5000 items in length just for a bigger test
arr=[random.choice(ch) for _ in range(5000)]
#Add a single string to the end for a worst case iterating scenario
arr.extend('a')
#check the end of the list for funsies
arr[-5:]
[3, 6, 7, 4, 'a']
#Check for stringiness with the OP-mentioned any() function
%timeit any(type(i)==str for i in arr)
2.52 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Since isinstance() is the more pythonic way of assertive type-checking, let's see if it makes a difference
%timeit any(isinstance(i, str) for i in arr)
2.05 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Define a function to make time checking easier
def check_list(a):
    for i in a:
        #stop iteration if a string is found
        if isinstance(i, str):
            return True
        else:
            return False
#Try out our function
%timeit check_list(arr)
711 ns ± 85.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
#let's pretend booleans are numbers to math up a solution
%timeit sum(map(lambda x:isinstance(x, str), arr))>0
2.86 ms ± 280 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#conversion to a set takes some time but reduces the number of items we need to check, so let's try it
%timeit any(type(i)==str for i in set(arr))
99.4 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Let's try our custom function with a set
%timeit check_list(set(arr))
115 µs ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
def check_set(a):
    #let's convert the list to a set inside the function to see what happens
    for i in set(a):
        if isinstance(i, str):
            return True
        else:
            return False
%timeit check_set(arr)
94.7 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
We have a winner on this synthetic problem, but more importantly we can see how to go about testing several different options.
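Since the question was actually about floats and ints rather than strings, here is a minimal sketch of the same short-circuiting idea applied to that check (contains_number is a hypothetical helper, not from the question):
def contains_number(items):
    # Returns True as soon as any element is an int or a float.
    # Note: bool is a subclass of int, so exclude it explicitly if that matters.
    return any(isinstance(x, (int, float)) for x in items)

contains_number(['a', 'b', 3.5])   # True
contains_number(['a', 'b', 'c'])   # False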
I have the following issue: I have a matrix yj of size (m, 200) (m = 3683), and I have a dictionary that, for each key, returns a numpy array of row indices into yj (the size of the array differs from key to key, just in case anyone is wondering).
Now, I have to access this matrix lots of times (around 1M times) and my code is slowing down because of the indexing (I've profiled the code, and 65% of the time is spent on this step).
Here is what I've tried out:
First of all, use the indices for slicing:
>> %timeit yj[R_u_idx_train[1]]
10.5 µs ± 79.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The variable R_u_idx_train is the dictionary that has the row indices.
I thought that maybe boolean indexing might be faster:
>> %timeit yj[R_u_idx_train_mask[1]]
10.5 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
R_u_idx_train_mask is a dictionary that returns a boolean array of size m where the indices given by R_u_idx_train are set to True.
I also tried np.ix_
>> cols = np.arange(0,200)
>> %timeit ix_ = np.ix_(R_u_idx_train[1], cols); yj[ix_]
42.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I also tried np.take
>> %timeit np.take(yj, R_u_idx_train[1], axis=0)
2.35 ms ± 88.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And while this seems great, it is not, since it gives an array that is shape (R_u_idx_train[1].shape[0], R_u_idx_train[1].shape[0]) (it should be (R_u_idx_train[1].shape[0], 200)). I guess I'm not using the method correctly.
I also tried np.compress
>> %timeit np.compress(R_u_idx_train_mask[1], yj, axis=0)
14.1 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Finally I tried to index with a boolean matrix
>> %timeit yj[R_u_idx_train_mask2[1]]
244 µs ± 786 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So, is 10.5 µs ± 79.7 ns per loop the best I can do? I could try to use cython but that seems like a lot of work for just indexing...
Thanks a lot.
A very smart solution was given by V.Ayrat in the comments.
>> newdict = {k: yj[R_u_idx_train[k]] for k in R_u_idx_train.keys()}
>> %timeit newdict[1]
202 ns ± 6.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Anyway maybe it would still be cool to know if there is a way to speed it up using numpy!
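A self-contained sketch of that precomputation idea, with synthetic data standing in for the question's (the shapes and key count here are made up):
import numpy as np

m = 3683
yj = np.random.rand(m, 200)
R_u_idx_train = {k: np.random.choice(m, size=50, replace=False) for k in range(100)}

# Pay the fancy-indexing cost once per key, up front...
cached = {k: yj[idx] for k, idx in R_u_idx_train.items()}

# ...then every later access is just a dict lookup, with no indexing work at all.
rows = cached[1]   # shape (50, 200)

# Caveat: the cached blocks are copies, so later changes to yj won't show up in them.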
I'm currently working with generators and factorials in python.
As an example:
itertools.permutations(range(100))
Meaning, I receive a generator object that yields 100! values.
In reality this code looks a bit more complicated; I'm using a list of sublists instead of range(100), with the goal of finding a combination of those sublists that meets my conditions.
This is the code:
mylist = [[0, 0, 1], ..., [5, 7, 3]] # random numbers
x = True in (combination for combination in itertools.permutations(mylist)
             if compare(combination))
# compare() returns True for one or a few combinations in that generator
I realized this is very time-consuming. Is there a more efficient way to do this and, moreover, a way to estimate how much time it is going to take?
I've done a few %timeit using ipython:
%timeit (combination for combination in itertools.permutations(mylist) if compare(combination))
--> 697 ns
%timeit (combination for combination in itertools.permutations(range(100)) if compare(combination))
--> 572 ns
Note: I do understand that the generator is only created at this point; it's when the generator is "consumed" that the generator expression actually has to do its work.
I've seen a lot of tutorials explaining how generators work, but I've found nothing about the execution time.
Moreover, I don't need an exact value like I'd get by timing the execution with the time module inside my program; I just need a rough estimate before execution.
Edit:
I've also tested this with a smaller number of values, for a list containing 24 sublists, 10 sublists and 5 sublists. Doing this, I receive an instant output.
This means the program does work; it is just a matter of time.
To put it more clearly, my problem is: how much time is this going to take, and is there a less time-consuming way to do it?
A comparison of generators, generator expressions, lists and list comprehensions:
In [182]: range(5)
Out[182]: range(0, 5)
In [183]: list(range(5))
Out[183]: [0, 1, 2, 3, 4]
In [184]: (x for x in range(5))
Out[184]: <generator object <genexpr> at 0x7fc18cd88a98>
In [186]: [x for x in range(5)]
Out[186]: [0, 1, 2, 3, 4]
Some timings:
In [187]: timeit range(1000)
248 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [188]: timeit (x for x in range(1000))
802 ns ± 6.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [189]: timeit [x for x in range(1000)]
43.4 µs ± 27.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [190]: timeit list(range(1000))
23.6 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The time for setting up a generator is (practically) independent of the parameter. Populating a list scales roughly with the size.
In [193]: timeit range(100000)
252 ns ± 1.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [194]: timeit list(range(100000))
4.41 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
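The flip side, shown here with a minimal sketch using the timeit module (the numbers will vary): the O(n) cost comes back as soon as the generator is actually consumed.
import timeit

# Creating the generator expression is (practically) constant time...
print(timeit.timeit(lambda: (x for x in range(1_000_000)), number=1000))

# ...but consuming it pays the full O(n) cost, just like building the list.
print(timeit.timeit(lambda: list(x for x in range(1_000_000)), number=10))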
edit
Timings show that an 'in' test on a generator is somewhat faster than building a list, but it still scales with the length:
In [264]: timeit True in (True for x in itertools.permutations(range(15),2) if x==(14,4))
17.1 µs ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [265]: timeit list (True for x in itertools.permutations(range(15),2) if x==(14,4))
18.5 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [266]: timeit (14,4) in itertools.permutations(range(15),2)
8.85 µs ± 8.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [267]: timeit list(itertools.permutations(range(15),2))
11.3 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
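To get the rough estimate the question asks for before running the full thing, one option is to time a small prefix of the permutation stream with itertools.islice and extrapolate by the total count. This is only a sketch (estimate_seconds is a hypothetical helper) and it assumes the per-item cost, including compare(), stays roughly constant.
import itertools, math, time

def estimate_seconds(make_iter, total_count, sample=100_000):
    # Time a small prefix of the stream and extrapolate to the full count.
    it = make_iter()
    start = time.perf_counter()
    for _ in itertools.islice(it, sample):
        pass
    per_item = (time.perf_counter() - start) / sample
    return per_item * total_count

n = 12   # 12! is already ~479 million permutations
print(estimate_seconds(lambda: itertools.permutations(range(n)), math.factorial(n)))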
Consider the following code:
import numpy as np
import pandas as pd
a = pd.DataFrame({'case': np.arange(10000) % 100,
                  'x': np.random.rand(10000) > 0.5})
%timeit any(a.x)
%timeit a.x.max()
%timeit a.groupby('case').x.transform(any)
%timeit a.groupby('case').x.transform(max)
13.2 µs ± 179 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
195 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
25.9 ms ± 555 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.43 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
b = pd.DataFrame({'x': np.random.rand(100) > 0.5})
%timeit any(b.x)
%timeit b.x.max()
13.1 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
81.5 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
We see that "any" works faster than "max" on a boolean pandas.Series of size 100 and 10000, but when we try to groupby and transform data in groups of 100, suddenly "max" is a lot faster than "any". Why?
Because any evaluates lazily, which means that the any function will stop at the first True boolean element.
max, however, can't do that, because it has to inspect every element in the sequence to be sure it hasn't missed a greater one.
That's why max will always inspect all elements, while any inspects only the elements up to the first True.
The cases where max works faster probably involve type coercion: since all values in numpy are stored in their own types and formats, mathematical operations on them may be faster than Python's any.
As said in a comment, the Python any function has a short-circuit mechanism, whereas np.any does not; see here.
But True in a.x is even faster:
%timeit any(a.x)
53.6 µs ± 543 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit True in (a.x)
3.39 µs ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
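The short-circuiting is easy to observe directly. A small sketch (the counting wrapper is purely for illustration): count how many elements any() actually pulls from an iterator before it stops.
pulled = 0

def counting(values):
    global pulled
    for v in values:
        pulled += 1
        yield v

data = [False] * 10 + [True] + [False] * 10_000

print(any(counting(data)), pulled)   # True 11 -- it stops right after the first True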
This is only a question to satisfy my curiosity; I'm not actually planning on using lists as arguments for a numba function.
But I was wondering why passing a list to a numba function seems like an O(n) operation, while it's an O(1) operation in pure-Python functions.
Some simple example code:
import numba as nb

@nb.njit
def take_list(lst):
    return None

take_list([1, 2, 3]) # warmup
And the timings:
for size in [10, 100, 1000, 10000, 100000, 1000000]:
    lst = [0]*size
    print(len(lst))
    %timeit take_list(lst) # IPythons "magic" timeit
Results:
10
4.06 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
100
14 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1000
109 µs ± 434 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
10000
1.08 ms ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
100000
10.7 ms ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000000
112 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Manipulating a Python list takes Python API calls, which are forbidden in nopython mode. Numba actually copies the list contents into its own data structure, which takes time proportional to the size of the list.
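One common workaround, not from the original post but worth sketching: convert the list to a NumPy array once and pass the array instead. Numba takes contiguous arrays without copying them, so the per-call cost no longer grows with the length. A rough sketch, assuming numba and numpy are installed (take_array is a hypothetical counterpart to take_list):
import numpy as np
import numba as nb

@nb.njit
def take_array(arr):
    return None

lst = [0] * 1_000_000
arr = np.asarray(lst)   # one O(n) conversion, paid once

take_array(arr)         # warmup / compilation
# %timeit take_array(arr)  -> the per-call time should no longer scale with len(lst)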