Sum of squared differences - NumPy/Python - python

I have these 2 vectors A and B:
import numpy as np
A=np.array([1,2,3])
B=np.array([8,7])
and I want to add them up with this expression:
Result = sum((A-B)**2)
The expected result that I need is:
Result = np.array([X,Y])
Where:
X = (1-8)**2 + (2-8)**2 + (3-8)**2 = 110
Y = (1-7)**2 + (2-7)**2 + (3-7)**2 = 77
How can I do it? The 2 arrays are just an example; in my case I have very large arrays and cannot do it manually.

You can make A a 2d array and utilize numpy's broadcasting property to vectorize the calculation:
((A[:, None] - B) ** 2).sum(0)
# array([110, 77])
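To make the broadcasting step concrete, here is a minimal sketch of the intermediate shapes, using the arrays from the question. The names `diff` and `result` are only for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.array([8, 7])

# A[:, None] has shape (3, 1); subtracting B (shape (2,)) broadcasts to (3, 2),
# so diff[i, j] == A[i] - B[j].
diff = A[:, None] - B

# Summing the squares over axis 0 collapses the A dimension,
# leaving one value per element of B.
result = (diff ** 2).sum(axis=0)
print(diff.shape, result)
```

Note that the intermediate array has len(A) * len(B) elements, which is what dominates memory use for large inputs.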

Since you mentioned that you are working with large arrays, here is a performance-focused version using np.einsum, which combines the squaring and the sum-reduction into a single step -
def einsum_based(A,B):
    subs = A[:,None] - B
    return np.einsum('ij,ij->j',subs, subs)
Sample run -
In [16]: A = np.array([1,2,3])
...: B = np.array([8,7])
...:
In [17]: einsum_based(A,B)
Out[17]: array([110, 77])
Runtime test with large arrays scaling up the given sample 1000x -
In [8]: A = np.random.rand(3000)
In [9]: B = np.random.rand(2000)
In [10]: %timeit ((A[:, None] - B) ** 2).sum(0) # @Psidom's soln
10 loops, best of 3: 21 ms per loop
In [11]: %timeit einsum_based(A,B)
100 loops, best of 3: 12.3 ms per loop
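If the (len(A), len(B)) intermediate array is itself the bottleneck, expanding the square avoids it entirely: sum_i (a_i - b_j)^2 = sum_i a_i^2 - 2*b_j*sum_i a_i + n*b_j^2. The following is a sketch rather than something taken from the answers above; the function name `ssd_expanded` is made up here, and with floating-point data the expanded form can lose some precision through cancellation.

```python
import numpy as np

def ssd_expanded(A, B):
    """Sum over i of (A[i] - B[j])**2 for every j, without a 2D intermediate."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Expand (a - b)**2 = a**2 - 2*a*b + b**2 and sum over the elements of A.
    return (A ** 2).sum() - 2.0 * B * A.sum() + A.size * B ** 2

A = np.array([1, 2, 3])
B = np.array([8, 7])
print(ssd_expanded(A, B))

# Cross-check against the broadcasting version on random data.
rng = np.random.default_rng(0)
A_big = rng.random(300)
B_big = rng.random(200)
reference = ((A_big[:, None] - B_big) ** 2).sum(0)
```

Only a handful of 1D reductions are involved, so memory use stays proportional to len(A) + len(B).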

Related

How to vectorize fourier series partial sum in numpy

Given the Fourier series coefficients a[n] and b[n] (for cosines and sines respectively) of a function with period T, and an equally spaced interval t, the following code evaluates the partial sum at all points of t (a, b and t are all numpy arrays). Note that len(t) != len(a).
yn=ones(len(t))*a[0]
for n in range(1,len(a)):
    yn=yn+(a[n]*cos(2*pi*n*t/T)-b[n]*sin(2*pi*n*t/T))
My question is: Can this for loop be vectorized?
Here's one vectorized approach that uses broadcasting to create the 2D version of the cosine/sine input, 2*pi*n*t/T, and then uses matrix multiplication with np.dot for the sum-reduction -
r = np.arange(1,len(a))
S = 2*np.pi*r[:,None]*t/T
cS = np.cos(S)
sS = np.sin(S)
out = a[1:].dot(cS) - b[1:].dot(sS) + a[0]
Further performance boost
For a further boost, we can use the numexpr module to compute those trigonometric steps -
import numexpr as ne
cS = ne.evaluate('cos(S)')
sS = ne.evaluate('sin(S)')
Runtime test -
Approaches -
def original_app(t,a,b,T):
    yn=np.ones(len(t))*a[0]
    for n in range(1,len(a)):
        yn=yn+(a[n]*np.cos(2*np.pi*n*t/T)-b[n]*np.sin(2*np.pi*n*t/T))
    return yn

def vectorized_app(t,a,b,T):
    r = np.arange(1,len(a))
    S = (2*np.pi/T)*r[:,None]*t
    cS = np.cos(S)
    sS = np.sin(S)
    return a[1:].dot(cS) - b[1:].dot(sS) + a[0]

def vectorized_app_v2(t,a,b,T):
    r = np.arange(1,len(a))
    S = (2*np.pi/T)*r[:,None]*t
    cS = ne.evaluate('cos(S)')
    sS = ne.evaluate('sin(S)')
    return a[1:].dot(cS) - b[1:].dot(sS) + a[0]
Also included is the function PP from @Paul Panzer's post.
Timings -
In [22]: # Setup inputs
...: n = 10000
...: t = np.random.randint(0,9,(n))
...: a = np.random.randint(0,9,(n))
...: b = np.random.randint(0,9,(n))
...: T = 3.45
...:
In [23]: print np.allclose(original_app(t,a,b,T), vectorized_app(t,a,b,T))
...: print np.allclose(original_app(t,a,b,T), vectorized_app_v2(t,a,b,T))
...: print np.allclose(original_app(t,a,b,T), PP(t,a,b,T))
...:
True
True
True
In [25]: %timeit original_app(t,a,b,T)
...: %timeit vectorized_app(t,a,b,T)
...: %timeit vectorized_app_v2(t,a,b,T)
...: %timeit PP(t,a,b,T)
...:
1 loops, best of 3: 6.49 s per loop
1 loops, best of 3: 6.24 s per loop
1 loops, best of 3: 1.54 s per loop
1 loops, best of 3: 1.96 s per loop
Can't beat numexpr, but if it's not available we can save on the transcendentals (testing and benchmarking code heavily based on @Divakar's code in case you didn't notice ;-) ):
import numpy as np
from timeit import timeit
def PP(t,a,b,T):
    # np.complex was removed in NumPy 1.24; the builtin complex is equivalent
    CS = np.empty((len(t), len(a)-1), complex)
    CS[...] = np.exp(2j*np.pi*(t[:, None])/T)
    # cumulative product along the harmonic axis gives exp(2j*pi*n*t/T) for n = 1, 2, ...
    np.cumprod(CS, axis=-1, out=CS)
    return a[1:].dot(CS.T.real) - b[1:].dot(CS.T.imag) + a[0]

def original_app(t,a,b,T):
    yn=np.ones(len(t))*a[0]
    for n in range(1,len(a)):
        yn=yn+(a[n]*np.cos(2*np.pi*n*t/T)-b[n]*np.sin(2*np.pi*n*t/T))
    return yn

def vectorized_app(t,a,b,T):
    r = np.arange(1,len(a))
    S = 2*np.pi*r[:,None]*t/T
    cS = np.cos(S)
    sS = np.sin(S)
    return a[1:].dot(cS) - b[1:].dot(sS) + a[0]

n = 1000
t = 2000
t = np.random.randint(0,9,(t))
a = np.random.randint(0,9,(n))
b = np.random.randint(0,9,(n))
T = 3.45
print(np.allclose(original_app(t,a,b,T), vectorized_app(t,a,b,T)))
print(np.allclose(original_app(t,a,b,T), PP(t,a,b,T)))
print('{:18s} {:9.6f}'.format('orig', timeit(lambda: original_app(t,a,b,T), number=10)/10))
print('{:18s} {:9.6f}'.format('Divakar no numexpr', timeit(lambda: vectorized_app(t,a,b,T), number=10)/10))
print('{:18s} {:9.6f}'.format('PP', timeit(lambda: PP(t,a,b,T), number=10)/10))
Prints:
True
True
orig 0.166903
Divakar no numexpr 0.179617
PP 0.060817
Btw. if delta t divides T one can potentially save more, or even run the full fft and discard what's too much.
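The saving in PP comes from the identity exp(1j*n*theta) == exp(1j*theta)**n: one complex exponential per sample, followed by cheap multiplications, replaces one cosine and one sine per (sample, harmonic) pair. A minimal check of that identity, with hypothetical values for `theta` (standing in for 2*pi*t/T) and `N` chosen only for illustration:

```python
import numpy as np

theta = np.array([0.3, 1.1, 2.0])   # hypothetical phases, one per sample
N = 5                               # number of harmonics, for illustration only

# One exponential per sample, repeated along the harmonic axis.
base = np.exp(1j * theta)
tiled = np.tile(base[:, None], (1, N))

# Cumulative product: column k holds base**(k + 1) == exp(1j * (k + 1) * theta).
powers = np.cumprod(tiled, axis=1)

# Direct evaluation needs N transcendental calls per sample.
n = np.arange(1, N + 1)
direct = np.exp(1j * theta[:, None] * n)

print(np.allclose(powers, direct))
```

The real and imaginary parts of `powers` are the cosine and sine tables that the dot products then consume. Rounding error accumulates along the product, so for very many harmonics the result may drift slightly from the direct evaluation.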
This is not really another answer but a comment on @Paul Panzer's one, written as an answer because I needed to post some code. If there is a way to post properly formatted code in a comment, please advise.
Inspired by @Paul Panzer's cumprod idea, I came up with the following:
an = ones((len(a)-1,len(te)))*2j*pi*te/T
CS = exp(cumsum(an,axis=0))
out = (a[1:].dot(CS.real) - b[1:].dot(CS.imag)) + a[0]
Although it seems properly vectorized and produces correct results, its performance is miserable. It is not only much slower than the cumprod version, which is expected because it performs len(a)-1 times as many exponentiations, but also 50% slower than the original unvectorized version. What is the cause of this poor performance?

Parsing Complex Mathematical Functions in Python

Is there a way in Python to parse a mathematical expression that describes a 3D graph, with or without other math modules? I couldn't find a way for it to handle two inputs.
An example of a function I would want to parse is the Holder Table Function.
It has multiple inputs, trigonometric functions, and I would need to parse the abs() parts too.
I have two 2D numpy meshgrids as input for x1 and x2, and would prefer to pass them directly to the expression for it to evaluate.
Any help would be greatly appreciated, thanks.
It's not hard to write straight numpy code that evaluates this formula.
def holder(x1, x2):
    f1 = 1 - np.sqrt(x1**2 + x2**2)/np.pi
    f2 = np.exp(np.abs(f1))
    f3 = np.sin(x1)*np.cos(x2)*f2
    return -np.abs(f3)
Evaluated at a point:
In [109]: holder(10,10)
Out[109]: -15.140223856952055
Evaluated on a grid:
In [60]: I,J = np.mgrid[-bd:bd:.01, -bd:bd:.01]
In [61]: H = holder(I,J)
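The session above never defines `bd`; the (2000, 2000) shape reported further down, together with the step of .01, suggests bd = 10, and that value is assumed here. The sketch below repeats `holder` so that it runs on its own, uses np.linspace and np.meshgrid instead of np.mgrid, and takes a coarser step to keep it quick.

```python
import numpy as np

def holder(x1, x2):
    f1 = 1 - np.sqrt(x1**2 + x2**2) / np.pi
    f2 = np.exp(np.abs(f1))
    f3 = np.sin(x1) * np.cos(x2) * f2
    return -np.abs(f3)

bd = 10  # assumption: not given in the original session

# 401 points per axis gives a spacing of 0.05 on [-bd, bd].
axis = np.linspace(-bd, bd, 401)
I, J = np.meshgrid(axis, axis, indexing='ij')

# Every operation in holder is elementwise, so 2D grids pass straight through.
H = holder(I, J)

# Locate the smallest value on the grid.
i_min, j_min = np.unravel_index(np.argmin(H), H.shape)
print(H.shape, H.min(), I[i_min, j_min], J[i_min, j_min])
```

Because the function is built only from NumPy ufuncs, the same definition accepts scalars, 1D arrays and meshgrids without any changes.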
Quick-n-dirty sympy
Diving into sympy without reading much of the docs:
In [65]: from sympy.abc import x,y
In [69]: import sympy as sp
In [70]: x
Out[70]: x
In [71]: x**2
Out[71]: x**2
In [72]: (x**2 + y**2)/sp.pi
Out[72]: (x**2 + y**2)/pi
In [73]: 1-(x**2 + y**2)/sp.pi
Out[73]: -(x**2 + y**2)/pi + 1
In [75]: from sympy import Abs
...
In [113]: h = -Abs(sp.sin(x)*sp.cos(y)*sp.exp(Abs(1-sp.sqrt(x**2 + y**2)/sp.pi)))
In [114]: float(h.subs(x,10).subs(y,10))
Out[114]: -15.140223856952053
Or with the sympify that DSM suggested
In [117]: h1 = sp.sympify("-Abs(sin(x)*cos(y)*exp(Abs(1-sqrt(x**2 + y**2)/pi)))")
In [118]: h1
Out[118]: -exp(Abs(sqrt(x**2 + y**2)/pi - 1))*Abs(sin(x)*cos(y))
In [119]: float(h1.subs(x,10).subs(y,10))
Out[119]: -15.140223856952053
...
In [8]: h1.subs({x:10, y:10}).n()
Out[8]: -15.1402238569521
Now how do I use this with numpy arrays?
Evaluating the result of sympy lambdify on a numpy meshgrid
In [22]: f = sp.lambdify((x,y), h1, [{'ImmutableMatrix': np.array}, "numpy"])
In [23]: z = f(I,J)
In [24]: z.shape
Out[24]: (2000, 2000)
In [25]: np.allclose(z,H)
Out[25]: True
In [26]: timeit holder(I,J)
1 loop, best of 3: 898 ms per loop
In [27]: timeit f(I,J)
1 loop, best of 3: 899 ms per loop
Interesting - basically the same speed.
Earlier answer along this line: How to read a system of differential equations from a text file to solve the system with scipy.odeint?

Find all occurrences of a specified match of two numbers in numpy array

What I need is an array of all indexes where my data array, which is filled with zeros and ones, steps from zero to one. I need a very quick solution, because I have to work with millions of arrays, each hundreds of millions of elements long. It will be running in a computing centre. For instance:
data_array = np.array([1,1,0,1,1,1,0,0,0,1,1,1,0,1,1,0])
result = [3,9,13]
try this:
In [23]: np.where(np.diff(a)==1)[0] + 1
Out[23]: array([ 3, 9, 13], dtype=int64)
Timing for 100M element array:
In [46]: a = np.random.choice([0,1], 10**8)
In [47]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.46 s per loop
In [48]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.64 s per loop
Here's the procedure:
1. Compute the diff of the array
2. Find the index where the diff == 1
3. Add 1 to the results (b/c len(diff) = len(orig) - 1)
So try this:
index = numpy.nonzero((data_array[1:] - data_array[:-1]) == 1)[0] + 1
index
# [3, 9, 13]
Thanks a lot to all of you. The solution with nonzero is probably better for me, because I need to know the steps from 0->1 as well as 1->0, and then calculate the differences between them. So this is my solution. Any other advice is appreciated.
i_in = np.nonzero( (data_array[1:] - data_array[:-1]) == 1 )[0] +1
i_out = np.nonzero( (data_array[1:] - data_array[:-1]) == -1 )[0] +1
i_return_in_time = (i_in - i_out[:i_in.size] )
Since the array is filled with 0s and 1s, you can simply compare the one-shifted versions instead of subtracting them. The comparison directly gives a boolean array, which can be fed to np.flatnonzero to get the indices and the final output.
Thus, we would have an implementation like so -
np.flatnonzero(data_array[1:] > data_array[:-1])+1
Runtime test -
In [26]: a = np.random.choice([0,1], 10**8)
In [27]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.91 s per loop
In [28]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.91 s per loop
In [29]: %timeit np.flatnonzero(a[1:] > a[:-1])+1
1 loop, best of 3: 954 ms per loop
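The comparison approach extends naturally to steps in both directions, and it also sidesteps a pitfall of the subtraction approach: with an unsigned dtype such as uint8 (a natural choice for compact 0/1 data), 0 - 1 wraps around to 255 rather than -1. A minimal sketch on the sample array from the question; the names `rising`, `falling` and `wrapped` are illustrative only.

```python
import numpy as np

data_array = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0],
                      dtype=np.uint8)

prev, curr = data_array[:-1], data_array[1:]

# Index of the first element after each 0 -> 1 step.
rising = np.flatnonzero(curr > prev) + 1
# Index of the first element after each 1 -> 0 step.
falling = np.flatnonzero(curr < prev) + 1

# Subtraction on unsigned integers wraps around instead of going negative.
wrapped = curr - prev

print(rising, falling, wrapped.max())
```

Storing the data as uint8 or bool also cuts memory traffic compared with the default 64-bit integers, which may matter at hundreds of millions of elements.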

Special kind of row-by-row multiplication of 2 sparse matrices in Python

What I'm looking for: a way to implement in Python a special multiplication operation for matrices that happen to be in scipy sparse (csr) format. This is a special kind of multiplication, not matrix multiplication nor Kronecker multiplication nor Hadamard aka pointwise multiplication, and does not seem to have any built-in support in scipy.sparse.
The desired operation: Each row of the output should contain the results of every product of the elements of the corresponding rows in the two input matrices. So starting with two identically sized matrices, each with dimensions m by n, the result should have dimensions m by n^2.
It looks like this:
Python code:
import scipy.sparse
A = scipy.sparse.csr_matrix(np.array([[1,2],[3,4]]))
B = scipy.sparse.csr_matrix(np.array([[0,5],[6,7]]))
# C is the desired product of A and B. It should look like:
C = scipy.sparse.csr_matrix(np.array([[0,5,0,10],[18,21,24,28]]))
What would be a nice or efficient way to do this? I've tried looking here on stackoverflow as well as elsewhere, with no luck so far. So far it sounds like my best bet is to do a row by row operation in a for loop, but that sounds horrendous seeing as my input matrices have a few million rows and few thousand columns, mostly sparse.
In your example, C is the first and last row of kron
In [4]: A=np.array([[1,2],[3,4]])
In [5]: B=np.array([[0,5],[6,7]])
In [6]: np.kron(A,B)
Out[6]:
array([[ 0, 5, 0, 10],
[ 6, 7, 12, 14],
[ 0, 15, 0, 20],
[18, 21, 24, 28]])
In [7]: np.kron(A,B)[[0,3],:]
Out[7]:
array([[ 0, 5, 0, 10],
[18, 21, 24, 28]])
kron contains the same values as np.outer, but they are in a different order.
For large dense arrays, einsum might provide good speed:
np.einsum('ij,ik->ijk',A,B).reshape(A.shape[0],-1)
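As a quick check on the small dense example from the question, the einsum expression reproduces the desired C, and matches the rows picked out of np.kron:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])

# 'ij,ik->ijk' forms, for each row i, the outer product of A[i] and B[i];
# reshape then flattens each (n, n) outer product into one row of length n*n.
C = np.einsum('ij,ik->ijk', A, B).reshape(A.shape[0], -1)

# The same rows appear in the full Kronecker product at positions i*m + i.
m = A.shape[0]
C_from_kron = np.kron(A, B)[np.arange(m) * m + np.arange(m)]

print(C)
```

The intermediate array has shape (m, n, n), so this is only practical when m * n**2 dense values fit in memory.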
sparse.kron does the same thing as the np.kron:
As = sparse.csr_matrix(A); Bs ...
sparse.kron(As,Bs).tocsr()[[0,3],:].A
sparse.kron is written in Python, so you probably could modify it if it is doing unnecessary calculations.
An iterative solution appears to be:
sparse.vstack([sparse.kron(a,b) for a,b in zip(As,Bs)]).A
Being iterative I don't expect it to be faster than paring down the full kron. But short of digging into the logic of sparse.kron it is probably the best I can do.
vstack uses bmat, so the calculation is:
sparse.bmat([[sparse.kron(a,b)] for a,b in zip(As,Bs)])
But bmat is rather complex, so it won't be easy to simplify this further.
The np.einsum solution can't be easily extended to sparse - there isn't a sparse.einsum, and the intermediate product is 3d, which sparse does not handle.
sparse.kron uses coo format, which is no good for working with the rows. But working in the spirit of that function, I've worked out a function that iterates on the rows of csr format matrices. Like kron and bmat I'm constructing the data, row, col arrays, and constructing a coo_matrix from those. That in turn can be converted to other formats.
def test_iter(A, B):
    m,n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = np.empty((m,),dtype=object)
    col = np.empty((m,),dtype=object)
    row = np.empty((m,),dtype=object)
    for i,(a,b) in enumerate(zip(A, B)):
        data[i] = np.outer(a.data, b.data).flatten()
        #col1 = a.indices * np.arange(1,a.nnz+1) # wrong when a isn't dense
        col1 = a.indices * n2 # correction
        col[i] = (col1[:,None]+b.indices).flatten()
        row[i] = np.full((a.nnz*b.nnz,), i)
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data,(row,col)),shape=Cshape)
With these small 2x2 matrices, as well as larger ones (e.g. A1=sparse.rand(1000,2000).tocsr()), this is about 3x faster than the version using bmat. For large enough matrices it is better than dense einsum version (which can have memory errors).
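The key step in test_iter is the column mapping: the product of A[i, j] and B[i, k] lands in output column j*n2 + k. A minimal sketch for a single pair of rows, with made-up values, checked against the dense np.kron of the same rows:

```python
import numpy as np
from scipy import sparse

# Two hypothetical single-row matrices with some zeros in them.
a = sparse.csr_matrix(np.array([[2, 0, 0, 3]]))
b = sparse.csr_matrix(np.array([[0, 5, 7, 0]]))
n1, n2 = a.shape[1], b.shape[1]

# Each stored entry (j, k) of the outer product goes to column j*n2 + k.
cols = (a.indices[:, None] * n2 + b.indices).ravel()
vals = np.outer(a.data, b.data).ravel()
rows = np.zeros_like(cols)

out = sparse.coo_matrix((vals, (rows, cols)), shape=(1, n1 * n2))

# Dense reference for this one row.
expected = np.kron(a.toarray(), b.toarray())
print(cols, vals)
```

Only the stored nonzeros are touched, so the work per row scales with a.nnz * b.nnz rather than n1 * n2.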
A non-optimal way to do it is to kron separately for each row:
def my_mult(A, B):
    nrows = A.shape[0]
    prodrows = []
    for i in range(0, nrows):  # use xrange on Python 2
        Arow = A.getrow(i)
        Brow = B.getrow(i)
        prodrow = scipy.sparse.kron(Arow,Brow)
        prodrows.append(prodrow)
    return scipy.sparse.vstack(prodrows)
This is approx 3x worse in performance than @hpaulj's solution here, as can be seen by running the following code:
A=scipy.sparse.rand(20000,1000, density=0.05).tocsr()
B=scipy.sparse.rand(20000,1000, density=0.05).tocsr()
# Check memory
%memit C1 = test_iter(A,B)
%memit C2 = my_mult(A,B)
# Check time
%timeit C1 = test_iter(A,B)
%timeit C2 = my_mult(A,B)
# Last but not least, check correctness!
print((C1 - C2).nnz == 0)
Results:
hpaulj's method:
peak memory: 1993.93 MiB, increment: 1883.80 MiB
1 loops, best of 3: 6.42 s per loop
this method:
peak memory: 2456.75 MiB, increment: 1558.78 MiB
1 loops, best of 3: 18.9 s per loop
hpaulj's answer to another post of mine:
How do i create interacting sparse matrix?
def test_iter2(A, B):
    m,n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = []
    col = []
    row = []
    for i in range(A.shape[0]):
        slc1 = slice(A.indptr[i],A.indptr[i+1])
        data1 = A.data[slc1]; ind1 = A.indices[slc1]
        slc2 = slice(B.indptr[i],B.indptr[i+1])
        data2 = B.data[slc2]; ind2 = B.indices[slc2]
        data.append(np.outer(data1, data2).ravel())
        col.append(((ind1*n2)[:,None]+ind2).ravel())
        row.append(np.full(len(data1)*len(data2), i))
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data,(row,col)),shape=Cshape)
It got 6 times faster.
In [536]: S0=sparse.random(200,200, 0.01, format='csr')
In [537]: S1=sparse.random(200,200, 0.01, format='csr')
In [538]: timeit test_iter(S0,S1)
42.8 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [539]: timeit test_iter2(S0,S1)
6.94 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

push for-loop to numpy

Can this next for-loop be done any faster by pushing it down to numpy?
ri = numpy.zeros((R.shape[0],R.shape[2]))
for i in range(R.shape[0]):
    ri[i, :] = R[i, indices[i], :]
This relates to my previous question, "making numpy.nanargmin return nan if column is all nan", which was to speed up this bit:
bestepsilons = numpy.zeros((R.shape[0]))
for i in range(R.shape[0]):
    bestindex = numpy.nanargmin(R[i,:])
    if(numpy.isnan(bestindex)):
        bestepsilons[i]=numpy.nan
    else:
        bestepsilons[i]=epsilon[bestindex]
and got solved (by myself) as:
bestepsilons1 = numpy.zeros(R.shape[0])+numpy.nan
d0 = numpy.nanmin(R, axis=1) # places where the best index is not a nan
bestepsilons1[~numpy.isnan(d0)] = epsilon[numpy.nanargmin(R[~numpy.isnan(d0),:], axis=1)]
But now the more complicated case is:
bestepsilons = numpy.zeros((R.shape[0]))
for i in range(R.shape[0]):
    bestindex = numpy.nanargmin(R[i,indices[i],:])
    if(numpy.isnan(bestindex)):
        bestepsilons[i]=numpy.nan
    else:
        bestepsilons[i]=epsilon[bestindex]
And now this trick to show the places where the best index is not a nan does not work anymore with that axis argument.
It is possible to push it down to numpy but whether or not it is faster will depend on the sizes of your arrays. Hopefully, there is a more elegant solution, but this works:
ii = np.arange(R.shape[0]) * R.shape[1] + indices
ri = R.reshape(-1, R.shape[2])[ii]
Here are a couple timing tests:
def f1(R, indices):
    ri = numpy.zeros((R.shape[0],R.shape[2]))
    for i in range(R.shape[0]):
        ri[i, :] = R[i, indices[i], :]
    return ri

def f2(R, indices):
    ii = np.arange(R.shape[0]) * R.shape[1] + indices
    return R.reshape(-1, R.shape[2])[ii]
Smaller R:
In [25]: R = np.random.rand(30, 40, 50)
In [26]: indices = np.random.choice(range(R.shape[1]), R.shape[0], replace=True)
In [27]: %timeit(f1(R, indices))
10000 loops, best of 3: 61.4 us per loop
In [28]: %timeit(f2(R, indices))
10000 loops, best of 3: 21.9 us per loop
Larger R:
In [29]: R = np.random.rand(300, 400, 500)
In [30]: indices = np.random.choice(range(R.shape[1]), R.shape[0], replace=True)
In [31]: %timeit(f1(R, indices))
1000 loops, best of 3: 713 us per loop
In [32]: %timeit(f2(R, indices))
1000 loops, best of 3: 1.23 ms per loop
In [33]: np.all(f1(R, indices) == f2(R, indices))
Out[33]: True
Found that this is faster by about 10%:
d1 = numpy.arange(R.shape[0])[:,None]
d2 = indices[numpy.arange(R.shape[0])][:,None]
d3 = numpy.arange(R.shape[2])[None,:]
ri = R[d1,d2,d3]
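For this particular selection the third index array is not strictly needed: pairing np.arange(R.shape[0]) with indices on the first two axes and leaving the last axis implicit picks the same rows. A minimal sketch on small random data (shapes and seed are arbitrary), checked against the original loop; no timing claim is made for it here.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((4, 5, 6))
indices = rng.integers(0, R.shape[1], size=R.shape[0])

# Reference: the explicit loop from the question.
ri_loop = np.zeros((R.shape[0], R.shape[2]))
for i in range(R.shape[0]):
    ri_loop[i, :] = R[i, indices[i], :]

# Paired integer arrays on axes 0 and 1 select R[i, indices[i], :] for each i;
# the trailing axis is kept whole.
ri_fancy = R[np.arange(R.shape[0]), indices]

print(ri_fancy.shape, np.array_equal(ri_loop, ri_fancy))
```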
bestepsilons = numpy.zeros(R.shape[0])+numpy.nan
d0 = numpy.nanmin(ri, axis=1) # places where the best index is not a nan
bestepsilons[~numpy.isnan(d0)] = epsilon[numpy.nanargmin(ri[~numpy.isnan(d0),:], axis=1)]
But this is with R defined as:
R = (self.VVm[:,None,None]-VVs[None,:,:])**2 + (self.HHm[:,None,None]-HHs[None,:,:])**2
and I found that if I define R differently, it speeds up a great deal more:
ti = indices[numpy.arange(len(VVm))]
R1 = (VVm[:,None]-VVs[ti,:])**2+(HHm[:,None]-HHs[ti,:])**2
d0 = numpy.nanmin(R1, axis=1) # places where the best index is not a nan
bestepsilons2[~numpy.isnan(d0)] = epsilon[numpy.nanargmin(R1[~numpy.isnan(d0),:], axis=1)]
This way it does not need to make a 3D R, but makes it directly in 2D (gets a 4X speedup)
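The reason the 2D construction gives the same result is that selecting rows of VVs and HHs with indices before subtracting is equivalent to building the full 3D array and selecting afterwards. A minimal sketch with hypothetical shapes (VVm and HHm of length m, VVs and HHs of shape (k, e)), which are assumptions since the original post does not state them:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, e = 4, 5, 6                       # hypothetical sizes
VVm, HHm = rng.random(m), rng.random(m)
VVs, HHs = rng.random((k, e)), rng.random((k, e))
indices = rng.integers(0, k, size=m)

# 3D route: build all (m, k, e) distances, then keep one k-slice per row.
R = (VVm[:, None, None] - VVs[None, :, :])**2 + (HHm[:, None, None] - HHs[None, :, :])**2
ri = R[np.arange(m), indices]

# 2D route: pick the relevant row of VVs/HHs first, then compute (m, e) distances.
R1 = (VVm[:, None] - VVs[indices, :])**2 + (HHm[:, None] - HHs[indices, :])**2

print(ri.shape, np.allclose(ri, R1))
```

The 2D route computes m * e values instead of m * k * e, which is consistent with the speedup reported above growing with k.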
