In NumPy, are larger arrays created more quickly?

Is this a cache thing, as timeit suggests?
In [55]: timeit a = zeros((10000, 400))
100 loops, best of 3: 3.11 ms per loop
In [56]: timeit a = zeros((10000, 500))
The slowest run took 13.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.43 µs per loop
Tried to fool it, but it didn't work:
In [58]: timeit a = zeros((10000, 500+random.randint(100)))
The slowest run took 13.31 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.35 µs per loop

The reason is not caching but that numpy just creates a placeholder instead of the full array. This can be easily verified by monitoring your RAM usage when you do something like this:
a = np.zeros((20000, 20000), np.float64)
This doesn't allocate 20k * 20k * 8 bytes ≈ 3 GB on my computer (but that might be OS-dependent, because np.zeros uses the C function calloc). But be careful: most operations on this array (for example a += 5) will immediately allocate that memory! Make sure you use a size appropriate to your RAM, so that you notice the RAM increase without exhausting it.
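As a rough way to watch this happen (a sketch, not from the original answer; it uses the standard-library resource module, which is Unix-only, and ru_maxrss is reported in KiB on Linux but bytes on macOS):
import resource
import numpy as np

def max_rss_mib():
    # peak resident set size of this process, assuming Linux units (KiB)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print("before zeros:", max_rss_mib(), "MiB")
a = np.zeros((20000, 20000), np.float64)      # nominally ~3 GB; scale this to your RAM
print("after zeros: ", max_rss_mib(), "MiB")  # barely changes
a += 5                                        # touching the pages forces real allocation
print("after a += 5:", max_rss_mib(), "MiB")  # now roughly 3 GB larger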
In the end this just postpones the allocation of the array; as soon as you operate on it, the combined timing of allocation and operation should be as expected (linear in the number of elements). It seems you're using IPython, so you can use the cell magic %%timeit:
%%timeit
a = np.zeros((10000, 400))
a += 10
# => 10 loops, best of 3: 30.3 ms per loop
%%timeit
a = np.zeros((10000, 800))
a += 10
# => 10 loops, best of 3: 60.2 ms per loop

Related

Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership

This might be a noob question, or blindingly obvious to those who understand more computer science than I do. Perhaps that is why I could not find anything from Google or SO after some searching. Maybe I'm not using the right vocabulary.
The title says it all. If I know that x is in my_list most of the time, which of the following is faster?
if x in my_list:
func1(x)
else:
func2(x)
Or
if x not in my_list:
func2(x)
else:
func1(x)
Does the size of the list matter? E.g. ten elements vs 10,000 elements? For my particular case my_list consists of strings and integers, but does anyone have any idea if other considerations apply to more complicated types such as dicts?
Thank you.
Checking whether an element is in a list or not in a list performs the same underlying operation (x in my_list), so there should not be any difference.
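As a quick illustration (a sketch, not part of the original answer; bytecode details vary by CPython version), both forms compile to the same containment test:
import dis
dis.dis("x in my_list")      # CPython 3.9+: a single CONTAINS_OP 0
dis.dis("x not in my_list")  # same CONTAINS_OP, only the invert flag differs (CONTAINS_OP 1)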
Does the size of the list matter?
Checking if an element is in a list is an O(N) operation, which means that the size does matter, roughly proportionately.
If you need to do a lot of checking, you probably want to look into set: checking if an element is in a set is O(1) on average, which means the checking time does not change much as the size of the set increases.
There should be no noticeable performance difference. You are better off writing whichever one makes your code more readable. Either one is O(n), and the time will mostly depend on where the element is located in the list. Also, avoid optimizing prematurely: it doesn't matter for most use cases, and when it does, you are usually better off using other data structures.
If you want faster lookups, use dicts (or sets); they have average O(1) lookup complexity.
For details refer to https://wiki.python.org/moin/TimeComplexity .
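As a rough illustration of the difference (a sketch; absolute numbers will vary by machine), list membership scans the whole list in the worst case, while set membership is a hash lookup:
import timeit

big_list = list(range(1000000))
big_set = set(big_list)

# worst case for the list: the element is at the end (or missing entirely)
print(timeit.timeit("999999 in big_list", globals=globals(), number=100))  # O(n) scan
print(timeit.timeit("999999 in big_set", globals=globals(), number=100))   # O(1) hash lookup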
Python includes a module and function, timeit, that can tell you how long a snippet of code takes to execute. The snippet must be a single statement, which rules out directly timing a compound statement like an if, but we can wrap your statements in a function and time the function call.
Even easier than calling timeit.timeit() is using a Jupyter notebook and putting the %timeit magic at the beginning of a line.
The timings below show that, long list or short, succeeding or failing, the two ways you ask about (checking in my_list or not in my_list) give timings that are the same within the variability of measurement.
import random
# set a seed so results will be repeatable
random.seed(456789)
# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)]
def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result
%timeit way1(101) # failure with large list
The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop
%timeit way1(0) # success with large list
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop
my_list.index(0)
186
%timeit way2(101) # failure with large list
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop
%timeit way2(0) # success with large list
The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop
my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value
-37
# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop
# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop
One desired characteristic in software implementations is low coupling. Your implementation should not be defined by the way your Python interpreter tests for list membership, as that is a high level of coupling; the implementation could change so that it is no longer the faster way.
All we should care about in this case is that testing for membership in a list is linear in the size of the list. If faster membership testing is desired, you could use a set.

How is numpy multi_dot slower than numpy.dot?

I'm trying to optimize some code that performs lots of sequential matrix operations.
I figured numpy.linalg.multi_dot (docs here) would perform all the operations in C or BLAS and thus be way faster than doing something like arr1.dot(arr2).dot(arr3) and so on.
I was really surprised running this code on a notebook:
v1 = np.random.rand(2,2)
v2 = np.random.rand(2,2)
%%timeit
v1.dot(v2.dot(v1.dot(v2)))
The slowest run took 9.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.14 µs per loop
%%timeit
np.linalg.multi_dot([v1,v2,v1,v2])
The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.9 µs per loop
Only to find that the same operation is about 10x slower using multi_dot.
My questions are:
Am I missing something? Does it make any sense?
Is there another way to optimize sequential matrix operations ?
Should I expect the same behavior using cython ?
It's because your test matrices are too small and too regular; the overhead of figuring out the fastest evaluation order may outweigh the potential performance gain.
Using the example from the documentation:
import numpy as np
from numpy.linalg import multi_dot
# Prepare some data
A = np.random.rand(10000, 100)
B = np.random.rand(100, 1000)
C = np.random.rand(1000, 5)
D = np.random.rand(5, 333)
%timeit -n 10 multi_dot([A, B, C, D])
%timeit -n 10 np.dot(np.dot(np.dot(A, B), C), D)
%timeit -n 10 A.dot(B).dot(C).dot(D)
Result:
10 loops, best of 3: 12 ms per loop
10 loops, best of 3: 62.7 ms per loop
10 loops, best of 3: 59 ms per loop
multi_dot improves performance by choosing the multiplication order that requires the fewest scalar multiplications.
In the above case, the default left-to-right order ((AB)C)D is re-evaluated as A((BC)D), so the 10000x100 @ 100x1000 product is replaced by a 10000x100 @ 100x333 one, cutting down at least 2/3 of the scalar multiplications.
You can verify this by testing
%timeit -n 10 np.dot(A, np.dot(np.dot(B, C), D))
10 loops, best of 3: 19.2 ms per loop
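To see why the reordering helps, here is a quick count of the scalar multiplications each order needs for these shapes (a sketch, not part of the original answer):
def cost(m, k, n):
    # scalar multiplications for an (m x k) @ (k x n) product
    return m * k * n

# default left-to-right order ((AB)C)D
print(cost(10000, 100, 1000) + cost(10000, 1000, 5) + cost(10000, 5, 333))  # 1,066,650,000
# optimized order A((BC)D)
print(cost(100, 1000, 5) + cost(100, 5, 333) + cost(10000, 100, 333))       # 333,666,500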

numpy's random vs python's default random subsampling

I observed that Python's default random.sample is much faster than numpy's random.choice. Taking a small sample from an array of length 1 million, random.sample is more than 1000x faster than its NumPy counterpart.
In [1]: import numpy as np
In [2]: import random
In [3]: arr = [x for x in range(1000000)]
In [4]: nparr = np.array(arr)
In [5]: %timeit random.sample(arr, 5)
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.54 µs per loop
In [6]: %timeit np.random.choice(arr, 5)
10 loops, best of 3: 47.7 ms per loop
In [7]: %timeit np.random.choice(nparr, 5)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 7.79 µs per loop
Although numpy sampling from a numpy array was decently fast, it was still slower than the default random sampling.
Is the observation above correct, or am I missing the difference between what random.sample and np.random.choice compute?
What you're seeing in your first call of numpy.random.choice is simply the overhead of converting the list arr to a numpy array.
As for your second call, the slightly worse performance is probably due to the fact that numpy.random.choice offers the ability to sample non-uniformly, and can also sample with replacement as well as without.
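A small follow-up sketch (not from the original answer): if the list is converted to an ndarray once up front, repeated NumPy sampling avoids paying the conversion cost on every call (np.random.default_rng is the newer Generator API, available since NumPy 1.17):
import random
import timeit
import numpy as np

arr = list(range(1000000))
nparr = np.asarray(arr)          # pay the list-to-array conversion once
rng = np.random.default_rng()

print(timeit.timeit(lambda: random.sample(arr, 5), number=1000))
print(timeit.timeit(lambda: np.random.choice(arr, 5), number=10))   # converts the list on every call
print(timeit.timeit(lambda: rng.choice(nparr, 5), number=1000))     # no conversion needed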

Fastest way to replace values in a numpy array with a list

I want to read a list into a numpy array. This list is being replaced in every iteration of a loop and further operations are done on the array. These operations include element-wise subtraction from another numpy array for a distance measure, and checking a threshold condition in this distance using the numpy.all() function. Currently I am using np.array( list ) each time to convert the list to an array:
#!/usr/bin/python
import numpy as np
a = [1.33,2.555,3.444,5.666,4.555,6.777,8.888]
%timeit b = np.array(a)
100000 loops, best of 3: 4.83 us per loop
Is it possible to do anything better than this, if I know the size of the list and it is invariable? Even small improvements are welcome, as I run this a very large number of times.
I've tried %timeit(np.take(a,range(len(a)),out=b)) which takes much longer: 100000 loops, best of 3: 16.8 us per loop
As you "know the size of the list and it is invariable", you can set up an array first:
b = np.zeros((7,))
This then works faster:
%timeit b[:] = a
1000000 loops, best of 3: 1.41 µs per loop
vs
%timeit b = np.array(a)
1000000 loops, best of 3: 1.67 µs per loop
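Put together with the loop described in the question, the pattern looks roughly like this (a sketch; get_next_list, ref and tol are hypothetical placeholders, not names from the question):
import numpy as np

def get_next_list(i):
    # hypothetical stand-in for whatever produces the new 7-element list each iteration
    return [1.33, 2.555, 3.444, 5.666, 4.555, 6.777, 8.888 + 0.001 * i]

b = np.empty(7)                                                    # allocated once, size is known and fixed
ref = np.array([1.33, 2.555, 3.444, 5.666, 4.555, 6.777, 8.888])   # hypothetical reference array
tol = 0.5                                                          # hypothetical threshold

for i in range(1000):
    b[:] = get_next_list(i)           # copy into the existing buffer, no new array allocation
    if not np.all(np.abs(b - ref) < tol):
        break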

Fast indexed dot-product for numpy/scipy

I'm using numpy to do linear algebra. I want to do fast subset-indexed dot and other linear operations.
When dealing with big matrices, a slicing solution like A[:,subset].dot(x[subset]) may take longer than doing the multiplication on the full matrix.
A = np.random.randn(1000,10000)
x = np.random.randn(10000,1)
subset = np.sort(np.random.randint(0,10000,500))
Timings show that sub-indexing can be faster when columns are in one block.
%timeit A.dot(x)
100 loops, best of 3: 4.19 ms per loop
%timeit A[:,subset].dot(x[subset])
100 loops, best of 3: 7.36 ms per loop
%timeit A[:,:500].dot(x[:500])
1000 loops, best of 3: 1.75 ms per loop
Still, the acceleration is not what I would expect (20x faster!).
Does anyone know of a library/module that allows this kind of fast operation through numpy or scipy?
For now I'm using Cython to code a fast column-indexed dot product through the CBLAS library. But for more complex operations (pseudo-inverse, or subindexed least squares solving) I'm not sure I'll reach good acceleration.
Thanks!
Well, this is faster.
%timeit A.dot(x)
#4.67 ms
%%timeit
y = numpy.zeros_like(x)
y[subset]=x[subset]
d = A.dot(y)
#4.77ms
%timeit c = A[:,subset].dot(x[subset])
#7.21ms
And the two results agree: np.allclose(d, c) is True.
Notice that how fast this is depends on the input. With subset = array([1,2,3]) the timing of my solution stays pretty much the same, while the timing of the last (sliced) solution drops to about 46 µs.
Basically, this will be faster if the size of subset is not much smaller than the size of x.
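Wrapped up as a function with a correctness check, the trick from this answer looks like this (a sketch):
import numpy as np

def subset_dot(A, x, subset):
    # compute A[:, subset] @ x[subset] without slicing A:
    # zero out the entries of x outside the subset and multiply by the full matrix
    y = np.zeros_like(x)
    y[subset] = x[subset]
    return A.dot(y)

A = np.random.randn(1000, 10000)
x = np.random.randn(10000, 1)
subset = np.sort(np.random.randint(0, 10000, 500))

d = subset_dot(A, x, subset)
c = A[:, subset].dot(x[subset])
print(np.allclose(d, c))   # True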
