I'm quite new to NumPy (or SciPy) and, coming from Octave/Matlab, I find this a bit challenging.
I'm reading through the docs and writing some basic functions. I came across this section: Vectorizing functions (vectorize)
It defines this function:
def addsubtract(a, b):
    if a > b:
        return a - b
    else:
        return a + b
Then vectorizes it:
vec_addsubtract = np.vectorize(addsubtract)
But at the end, it says:
This particular function could have been written in vector form without the use of vectorize.
I wouldn't know any other way to write such function. So what is the vector form?
np.vectorize is a glorified python for loop, which means that it effectively strips away any optimizations that numpy offers.
To actually vectorize addsubtract, we can use the fact that numpy offers three things: a vectorized add function, a vectorized subtract function, and all sorts of boolean mask operations.
The simplest, but least efficient, way to write this is using np.where:
np.where(a > b, a - b, a + b)
This is inefficient because it computes both a - b and a + b for every element, and then selects one of the two results at each position.
A more efficient solution would only compute the values where the condition required it:
result = np.empty_like(a)
mask = a > b
np.subtract(a, b, where=mask, out=result)  # fill only the elements where a > b
np.add(a, b, where=~mask, out=result)      # fill the remaining elements
For very small arrays, the overhead of the complicated method makes it less worthwhile. But for large arrays, it's the fastest solution.
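A rough benchmark sketch of that claim (a minimal sketch of my own; the array size, number of repeats, and ordering of results are illustrative, not from the answer):

import numpy as np
import timeit

a = np.random.randint(0, 100, size=1_000_000)
b = np.random.randint(0, 100, size=1_000_000)

def masked(a, b):
    # compute each branch only where its condition holds
    result = np.empty_like(a)
    mask = a > b
    np.subtract(a, b, where=mask, out=result)
    np.add(a, b, where=~mask, out=result)
    return result

vec_addsubtract = np.vectorize(lambda x, y: x - y if x > y else x + y)

print(timeit.timeit(lambda: vec_addsubtract(a, b), number=10))          # slowest: python-level loop
print(timeit.timeit(lambda: np.where(a > b, a - b, a + b), number=10))  # computes both branches
print(timeit.timeit(lambda: masked(a, b), number=10))                   # typically fastest on large arrays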
Fun fact: the page in the tutorial you are referencing will not be available in future versions of the SciPy tutorial exactly because it is an intro to NumPy, as explained in PR #12432.
You can do this with np.where, which computes both results (a-b and a+b) and selects the values depending on a boolean array (a>b):
def addsubtract(a, b):
    return np.where(a > b, a - b, a + b)
It can be seen as a vectorized ternary operator: "Where a>b, take the value from a-b, else take the value from a+b".
Despite computing both possible results, it was significantly faster than the vectorized if/else function you wrote (at least on my machine).
numpy/pandas are famous for their underlying acceleration, i.e. vectorization.
Condition evaluation is a common expression that occurs in code everywhere.
However, when using the pandas dataframe apply function in the intuitive way, the condition evaluation seems very slow.
An example of my apply code looks like:
def condition_eval(df):
    x = df['x']
    a = df['a']
    b = df['b']
    if x <= a:
        d = round((x - a) / 0.01) - 1
        if d < -10:
            d = -10
    elif x >= b:
        d = round((x - b) / 0.01) + 1
        if d > 10:
            d = 10
    else:
        d = 0
    return d
df['eval_result'] = df.apply(condition_eval, axis=1)
The properties of this kind of problem are:
the result can be computed using only the row's own data, and always uses multiple columns.
each row has the same computation algorithm.
the algorithm may contain complex conditional branches.
What's the best practice in numpy/pandas to solve such kind of problems?
Some more thoughts.
In my opinion, one of the reasons why vectorization acceleration can be effective is that the underlying cpu has some kind of vector instructions (e.g. SIMD, intel avx), which rely on the fact that the computational instructions have deterministic behavior, i.e. no matter what the input data is, the result is acquired after a fixed number of cpu cycles. Thus, parallelizing such operations is easy.
However, branch execution in a cpu is much more complicated. First of all, different branches of the same condition evaluation have different execution paths, so they may take different numbers of cpu cycles. Modern cpus even leverage a lot of tricks like branch prediction, which create more uncertainty.
So I wonder if and how pandas tries to accelerate such vectorized condition evaluation operations, and whether there is a better practice for working on such computational workloads.
This should be equivalent:
import pandas as pd
import numpy as np

def get_eval_result(df):
    conditions = (
        df.x.le(df.a),
        df.x.ge(df.b),  # ge, not gt, to match the original's x >= b branch
    )
    choices = (
        np.where((d := df.x.sub(df.a).div(0.01).round().sub(1)).lt(-10), -10, d),
        np.where((d := df.x.sub(df.b).div(0.01).round().add(1)).gt(10), 10, d),
    )
    return np.select(conditions, choices, 0)

df = df.assign(eval_result=get_eval_result)
My answer basically calculates the results of every branch, and then uses numpy syntax to specify which of those results should be used. This could be optimized slightly, but since it uses purely vectorized functions, it should be far faster than using .apply.
np.select is best for this:
(df
 .assign(column_to_alter=lambda x: np.select([cond1, cond2, cond3],
                                             [option1, option2, option3],
                                             default='somevalue'))
)
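For concreteness, here is a minimal self-contained sketch of that pattern; the column name, conditions, and choices are invented for illustration:

import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [5, 55, 95]})
out = (df
       .assign(grade=lambda x: np.select([x.score.lt(50), x.score.lt(90)],
                                         ['low', 'medium'],
                                         default='high')))
print(out)  # the rows get 'low', 'medium', 'high' respectively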
I need to compute many inner products for vectors stored in numpy arrays.
Namely, in mathematical terms,
V_i = Σ_k A_{i,k} B_{i,k}
and
C_{i,j} = Σ_k A_{i,k} D_{i,j,k}
I can of course use a loop, but I am wondering whether I could make use of a higher-level operator like dot or matmul in a clever way that I haven't thought of. What is bugging me is that np.cross does accept arrays of vectors to operate on, but dot and inner don't do what I want.
Your mathematical formulas already give you the outline for the subscripts when using np.einsum. It is as simple as:
V = np.einsum('ik,ik->i', A, B)
which translates to V[i] = sum(A[i][k]*B[i][k]).
and
C = np.einsum('ik,ijk->ij', A, D)
i.e. C[i][j] = sum(A[i][k]*D[i][j][k]).
You can read more about the einsum operator on this other question.
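As a quick sanity check (a sketch of my own, with arbitrary small shapes), you can compare the einsum results against explicit loops:

import numpy as np

n, m, k = 4, 5, 3
A = np.random.rand(n, k)
B = np.random.rand(n, k)
D = np.random.rand(n, m, k)

V = np.einsum('ik,ik->i', A, B)
C = np.einsum('ik,ijk->ij', A, D)

# the same sums written as explicit python loops
V_loop = np.array([sum(A[i, kk] * B[i, kk] for kk in range(k)) for i in range(n)])
C_loop = np.array([[sum(A[i, kk] * D[i, j, kk] for kk in range(k))
                    for j in range(m)] for i in range(n)])

print(np.allclose(V, V_loop), np.allclose(C, C_loop))  # True True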
These can be done with element-wise multiplication and sum:
V_i = Σ_k A_{i,k} B_{i,k}
V = (A*B).sum(axis=-1)
and
C_{i,j} = Σ_k A_{i,k} D_{i,j,k}
C = (A[:,None,:] * D).sum(axis=-1)
einsum may be faster, but it's a good idea to understand this approach.
To use matmul we have to add dimensions to fit the
(batch, i, j) @ (batch, j, k) => (batch, i, k)
pattern. The sum-of-products dimension is j. Calculation-wise this is fast, but setting it up here may be a bit tedious.
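For what it's worth, here is a hedged sketch of that matmul route (the shapes are arbitrary; the added singleton dimensions line up the sum-of-products axis):

import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(4, 3)
D = np.random.rand(4, 5, 3)

# V_i = sum_k A[i,k] B[i,k]: (4,1,3) @ (4,3,1) -> (4,1,1), then flatten
V = (A[:, None, :] @ B[:, :, None]).ravel()

# C_ij = sum_k A[i,k] D[i,j,k]: (4,5,3) @ (4,3,1) -> (4,5,1), then drop the last axis
C = (D @ A[:, :, None]).squeeze(-1)

print(V.shape, C.shape)  # (4,) (4, 5)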
In numpy, if I want to compare two arrays, say for example to test whether all elements in A are less than the corresponding values in B, I use if (A < B).all():. But in practice this requires allocating and evaluating a complete array C = A < B and then calling C.all() on it. This is a bit of a waste. Is there any way to 'shortcut' the comparison, i.e. directly evaluate A < B element by element (without allocating and computing the temporary C), stopping and returning False when the first failing element comparison is found?
Plain Python 'and' and 'or' use shortcut evaluation, but numpy does not.
(A < B).all()
uses numpy building blocks: broadcasting, the element-by-element comparison with <, and the all reduction. The < works just like other binary operations: plus, times, and, or, gt, le, etc. And all is like other reduction methods (any, max, sum, mean) and can operate on the whole array or by rows or by columns.
It is possible to write a function that combines the all and < into one iteration, but it would be difficult to get the generality that I just described.
But if you must implement an iterative solution, with a shortcut action, and do it fast, I'd suggest developing the idea with nditer and then compiling it with cython.
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html is a good tutorial on using nditer, and it takes you through using it in cython. nditer takes care of broadcasting and iteration, letting you concentrate on the comparison and any shortcutting.
Here's a sketch of an iterator that could be cast into cython:
import numpy as np

a = np.arange(4)[:, None]
b = np.arange(2, 5)[None, :]
c = np.array(True)
it = np.nditer([a, b, c], flags=['reduce_ok'],
               op_flags=[['readonly'], ['readonly'], ['readwrite']])
for x, y, z in it:
    z[...] = x < y
    if not z:
        print('>', x, y)
        break
    else:
        print(x, y)
print(z)
with a sample run:
1420:~/mypy$ python stack34852272.py
(array(0), array(2))
(array(0), array(3))
(array(0), array(4))
(array(1), array(2))
(array(1), array(3))
(array(1), array(4))
('>', array(2), array(2))
False
Start with a default False, and a different break condition and you get a shortcutting any. Generalizing the test to handle <, <=, etc will be more work.
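For example, a minimal sketch of that any variant, reusing a and b from above (my own adaptation of the sketch, only checked on this toy case):

c = np.array(False)
it = np.nditer([a, b, c], flags=['reduce_ok'],
               op_flags=[['readonly'], ['readonly'], ['readwrite']])
for x, y, z in it:
    z[...] = x < y
    if z:          # break as soon as one comparison succeeds
        break
print(z)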
Get something like this working in Python, and then try it in Cython. If you have trouble with that step, come back with a new question. SO has a good base of Cython users.
How large are your arrays? I would imagine they are very large, e.g. A.shape == (1000000,) or larger, before performance becomes an issue. Would you consider using numpy views?
Instead of comparing (A < B).all() or (A < B).any() you can try defining a view, such as (A[:10] < B[:10]).all(). Here's a simple loop that might work:
k = 0
# bound check: an empty slice's .all() is True, which would otherwise loop forever
while k * 10 < len(A) and (A[k*10 : (k+1)*10] < B[k*10 : (k+1)*10]).all():
    k += 1
Instead of 10 you can use 100 or 10**3, whatever segment size you wish. Obviously if your segment size is 1, you are saying:
k = 0
while k < len(A) and A[k] < B[k]:
    k += 1
Sometimes, comparing the entire arrays at once can become memory intensive. If A and B have a length of 10000 and I need to compare each pair of elements, I am going to run out of space.
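Here is a hedged generalization of the chunked idea as a reusable helper (the function name, default chunk size, and use of np.less are my own choices for illustration):

import numpy as np

def chunked_all(a, b, op=np.less, chunk=4096):
    # check op(a, b) chunk by chunk so we can stop at the first failure
    for start in range(0, len(a), chunk):
        stop = start + chunk
        if not op(a[start:stop], b[start:stop]).all():
            return False
    return True

a = np.random.rand(10**6)
b = a + 1.0
print(chunked_all(a, b))  # True: every a[i] < b[i]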
Suppose you have n square matrices A1,...,An. Is there any way to multiply these matrices in a neat way? As far as I know, dot in numpy accepts only two arguments. One obvious way is to define a function that calls itself recursively to accumulate the result. Is there any better way to get it done?
This might be a relatively recent feature, but I like:
A.dot(B).dot(C)
or if you had a long chain you could do:
reduce(numpy.dot, [A1, A2, ..., An])
Update:
There is more info about reduce here. Here is an example that might help.
>>> from functools import reduce  # needed on Python 3
>>> A = [np.random.random((5, 5)) for i in range(4)]
>>> product1 = A[0].dot(A[1]).dot(A[2]).dot(A[3])
>>> product2 = reduce(np.dot, A)
>>> np.all(product1 == product2)
True
Update 2016:
As of Python 3.5, there is a new matrix multiplication operator, @:
R = A @ B @ C
Resurrecting an old question with an update:
As of November 13, 2014, there is a np.linalg.multi_dot function which does exactly what you want. It also has the benefit of optimizing the call order, though that isn't necessary in your case.
Note that this is available starting with numpy version 1.10.
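A minimal usage sketch (shapes chosen arbitrarily):

import numpy as np

A1, A2, A3 = (np.random.rand(10, 10) for _ in range(3))
P = np.linalg.multi_dot([A1, A2, A3])  # picks an efficient multiplication order
print(np.allclose(P, A1 @ A2 @ A3))    # True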
If you compute all the matrices a priori then you should use an optimization scheme for matrix chain multiplication. See this Wikipedia article.
Another way to achieve this would be using einsum, which implements the Einstein summation convention for NumPy.
To very briefly explain this convention with respect to this problem: When you write down your multiple matrix product as one big sum of products, you get something like:
P_{i,m} = Σ_j Σ_k Σ_l A1_{i,j} A2_{j,k} A3_{k,l} A4_{l,m}
where P is the result of your product and A1, A2, A3, and A4 are the input matrices. Note that you sum over exactly those indices that appear twice in the summand, namely j, k, and l. As a sum with this property often appears in physics, vector calculus, and probably some other fields, there is a NumPy tool for it, namely einsum.
In the above example, you can use it to calculate your matrix product as follows:
P = np.einsum("ij,jk,kl,lm", A1, A2, A3, A4)
Here, the first argument tells the function which indices to apply to the argument matrices and then all doubly appearing indices are summed over, yielding the desired result.
Note that the computational efficiency depends on several factors (so you are probably best off with just testing it):
Why is numpy's einsum slower than numpy's built-in functions?
Why is numpy's einsum faster than numpy's built in functions?
from functools import reduce  # needed on Python 3
import numpy as np

A_list = [np.random.randn(100, 100) for i in range(10)]

B = np.eye(A_list[0].shape[0])
for A in A_list:
    B = np.dot(B, A)

C = reduce(np.dot, A_list)

assert np.allclose(B, C)  # B == C on arrays would raise; compare element-wise instead
I'm trying to use Python and Numpy/Scipy to implement an image processing algorithm. The profiler tells me a lot of time is being spent in the following function (called often), which computes the sum of squared differences between two images:
def ssd(A, B):
    s = 0
    for i in range(3):
        s += sum(pow(A[:,:,i] - B[:,:,i], 2))
    return s
How can I speed this up? Thanks.
Just
s = numpy.sum((A[:,:,0:3]-B[:,:,0:3])**2)
(which I expect is likely just numpy.sum((A-B)**2) if the shape is always (M, N, 3))
You can also use the sum method: ((A-B)**2).sum()
Right?
Just to mention that one can also use np.dot:
def ssd(A, B):
    dif = A.ravel() - B.ravel()
    return np.dot(dif, dif)
This might be a bit faster and possibly more accurate than alternatives using np.sum and **2, but doesn't work if you want to compute ssd along a specified axis. In that case, there might be a magical subscript formula using np.einsum.
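For example, a hedged sketch of that einsum idea (the subscripts are my own guess at the "magical formula", assuming (H, W, C) arrays):

import numpy as np

A = np.random.rand(8, 8, 3)
B = np.random.rand(8, 8, 3)
d = A - B

ssd_total = np.einsum('ijk,ijk->', d, d)         # scalar SSD over all elements
ssd_per_channel = np.einsum('ijk,ijk->k', d, d)  # SSD along the last axis

print(np.isclose(ssd_total, ((A - B) ** 2).sum()))  # True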
I am confused why you are taking i in range(3). Is that supposed to be the whole array, or just part?
Overall, you can replace most of this with operations defined in numpy:
def ssd(A, B):
    squares = (A[:,:,:3] - B[:,:,:3]) ** 2
    return numpy.sum(squares)
This way you can do one operation instead of three, and numpy.sum may be able to optimize the addition better than the builtin sum.
Further to Ritsaert Hornstra's answer that got 2 negative marks (admittedly I didn't see it in its original form...)
This is actually true.
For a large number of iterations it can often take twice as long to use the '**' operator or the pow(x, y) function as to just manually multiply the pairs together. If necessary, use the math.fabs() method if it's throwing out NaNs (which it sometimes does, especially when using int16s etc.), and it still takes only approximately half the time of the two functions given.
Not that important to the original question I know, but definitely worth knowing.
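If you want to check this on your own setup, here is a micro-benchmark sketch (results vary widely by NumPy version and hardware, so treat any ratio as indicative only):

import numpy as np
import timeit

x = np.random.rand(1000, 1000)
print(timeit.timeit(lambda: x ** 2, number=100))
print(timeit.timeit(lambda: x * x, number=100))
print(timeit.timeit(lambda: np.power(x, 2), number=100))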
I do not know if the pow() function with power 2 will be fast. Try:
def ssd(A, B):
    s = 0
    for i in range(3):
        s += sum((A[:,:,i] - B[:,:,i]) * (A[:,:,i] - B[:,:,i]))
    return s
You can try this one:
dist_sq = np.sum((A[:, np.newaxis, :] - B[np.newaxis, :, :]) ** 2, axis=-1)
More details can be found here (the 'k-Nearest Neighbors' example):
https://jakevdp.github.io/PythonDataScienceHandbook/02.08-sorting.html
In Ruby you can achieve this in this way:

def diff_btw_sum_of_squares_and_square_of_sum(from = 1, to = 100)  # use default range 1..100
  ((from..to).inject(:+) ** 2) - (from..to).map { |num| num ** 2 }.inject(:+)
end

diff_btw_sum_of_squares_and_square_of_sum  # call the above method