I'm looking for the fastest way to calculate the squared difference between two vectors ((x1-x2)**2), but pairwise (all combinations or only the upper triangle).
x1 = [1,3,5,6,8]
x2 = [3,6,7,9,12]
Expected output:
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 4., 1., 4., 16., 49.],
[ 9., 0., 1., 9., 36.],
[ 25., 4., 1., 1., 16.]])
or
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 0., 0., 4., 16., 49.],
[ 0., 0., 0., 9., 36.],
[ 0., 0., 0., 0., 16.]])
or even (if faster):
array([ 4., 25., 36., 64., 121., 9., 16., 36., 81.,
4., 1., 4., 16., 49., 9., 1., 9., 36.,
25., 4., 1., 1., 16.])
Here's one that uses broadcasting and a mask to select the upper-triangular elements, then squares only those for better performance -
def pairwise_squared_diff(x1, x2):
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    diffs = x1[:,None] - x2
    mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
    return (diffs[mask])**2
Sample run -
In [85]: x1
Out[85]: array([1, 3, 5, 6, 8])
In [86]: x2
Out[86]: array([ 3, 6, 7, 9, 12])
In [87]: pairwise_squared_diff(x1, x2)
Out[87]:
array([ 4, 25, 36, 64, 121, 9, 16, 36, 81, 4, 16, 49, 9,
36, 16])
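As a sanity check, the flattened result can be compared against a brute-force double loop. This is only a sketch: `pairwise_squared_diff_reference` is a hypothetical helper introduced here for illustration, and it assumes both inputs have the same length.

```python
import numpy as np

def pairwise_squared_diff(x1, x2):
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    diffs = x1[:, None] - x2
    mask = np.arange(len(x1))[:, None] <= np.arange(len(x2))
    return diffs[mask] ** 2

def pairwise_squared_diff_reference(x1, x2):
    # Hypothetical brute-force reference: visit pairs (i, j) with i <= j
    # in row-major order, the same order boolean-mask indexing produces.
    return np.array([(x1[i] - x2[j]) ** 2
                     for i in range(len(x1))
                     for j in range(i, len(x2))])

x1 = [1, 3, 5, 6, 8]
x2 = [3, 6, 7, 9, 12]
fast = pairwise_squared_diff(x1, x2)
slow = pairwise_squared_diff_reference(x1, x2)
```

Both arrays hold the 15 upper-triangular values in the same order, so `np.array_equal(fast, slow)` should hold.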
Possible improvements
Improvement #1 :
We could also use np.tri to generate mask -
mask = ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
Improvement #2 :
If we are okay with a 2D output with the lower triangular ones set as 0s, then a simple elementwise multiplication with mask solves it too to get the final output -
(diffs*mask)**2
This would work well with the numexpr module for large data, improving memory efficiency and hence performance.
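A minimal sketch of the 2D variant in plain NumPy (no numexpr), using the sample vectors from the question. The name `out_2d` is illustrative only.

```python
import numpy as np

x1 = np.array([1, 3, 5, 6, 8])
x2 = np.array([3, 6, 7, 9, 12])

diffs = x1[:, None] - x2                              # full 5x5 difference matrix
mask = ~np.tri(len(x1), len(x2), k=-1, dtype=bool)    # True on and above the diagonal
out_2d = (diffs * mask) ** 2                          # lower triangle becomes 0
```

Multiplying by the boolean mask zeroes the lower triangle before squaring, so the result matches the second expected output in the question.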
Improvement #3 :
We could also compute the differences with numexpr and produce the masked output in the same evaluate call, giving us a new solution altogether -
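def pairwise_squared_diff_numexpr(x1, x2):
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    mask = ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
    # Pass every operand explicitly: a supplied local_dict replaces the function's locals,
    # so mask and x2 would otherwise be looked up in the globals.
    return ne.evaluate('mask*((x1D-x2)**2)',{'mask':mask,'x1D':x1[:,None],'x2':x2})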
Timings with improvements
Let's study these suggestions on performance for large arrays -
Setup :
In [136]: x1 = np.random.randint(0,9,(1000))
In [137]: x2 = np.random.randint(0,9,(1000))
With Improvement #1 :
In [138]: %timeit np.arange(len(x1))[:,None] <= np.arange(len(x2))
1000 loops, best of 3: 772 µs per loop
In [139]: %timeit ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
1000 loops, best of 3: 243 µs per loop
With Improvement #2 :
In [140]: import numexpr as ne
In [141]: diffs = x1[:,None] - x2
...: mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
In [142]: %timeit (diffs[mask])**2
1000 loops, best of 3: 1.46 ms per loop
In [143]: %timeit ne.evaluate('(diffs*mask)**2')
1000 loops, best of 3: 1.05 ms per loop
With Improvement #3 on complete solutions :
In [170]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.66 ms per loop
In [171]: %timeit pairwise_squared_diff_numexpr(x1, x2)
1000 loops, best of 3: 1.54 ms per loop
Loopy one
For completeness, here's a loopy one that leverages slicing and, owing to its memory efficiency, performs better than the pure broadcasting one -
def pairwise_squared_diff_loopy(x1,x2):
    n = len(x2)
    idx = np.concatenate(( [0], np.arange(n,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    L = n*(n+1)//2
    out = np.empty(L,dtype=np.result_type(x1,x2))
    for i,(s0,s1) in enumerate(zip(start,stop)):
        out[s0:s1] = x1[i] - x2[i:]
    return out**2
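To see how the loopy version lays out its output, it may help to look at the offsets for a small case. The sketch below recomputes `start` and `stop` for n = 5, assuming the same construction as in the function above.

```python
import numpy as np

n = 5
# Row i contributes n - i values, so the row lengths are 5, 4, 3, 2, 1.
idx = np.concatenate(([0], np.arange(n, 0, -1).cumsum()))
start, stop = idx[:-1], idx[1:]
# start -> [0, 5, 9, 12, 14], stop -> [5, 9, 12, 14, 15]
total = n * (n + 1) // 2
```

Each `out[s0:s1]` slice therefore holds one row of the upper triangle, and the last `stop` equals the total length `n*(n+1)//2`.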
Timings -
In [300]: x1 = np.random.randint(0,9,(1000))
...: x2 = np.random.randint(0,9,(1000))
In [301]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.44 ms per loop
In [302]: %timeit pairwise_squared_diff_loopy(x1, x2)
100 loops, best of 3: 2.73 ms per loop
You can use broadcasting:
x1 = np.asarray([1,3,5,6,8]).reshape(-1, 1)
x2 = np.asarray([3,6,7,9,12]).reshape(1, -1)
(x1 - x2)**2
Output:
array([[ 4, 25, 36, 64, 121],
[ 0, 9, 16, 36, 81],
[ 4, 1, 4, 16, 49],
[ 9, 0, 1, 9, 36],
[ 25, 4, 1, 1, 16]])
which is simple to code, but computes all values, so it may be optimized to compute only the upper triangle.
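One possible way to compute only the upper triangle, sketched here with `np.triu_indices` (a technique not used in the answer above, and assuming both vectors have the same length), is to index the two vectors directly instead of building the full matrix:

```python
import numpy as np

x1 = np.array([1, 3, 5, 6, 8])
x2 = np.array([3, 6, 7, 9, 12])

# Row and column indices of the upper triangle (diagonal included), row-major.
rows, cols = np.triu_indices(len(x1))
upper = (x1[rows] - x2[cols]) ** 2
```

This returns the flat upper-triangular values without ever computing the lower-triangular differences, at the cost of materialising the two index arrays.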
Related
I have two numpy arrays NS, EW to sum up. Each of them has missing values at different positions, like
NS = array([[ 1., 2., nan],
[ 4., 5., nan],
[ 6., nan, nan]])
EW = array([[ 1., 2., nan],
[ 4., nan, nan],
[ 6., nan, 9.]])
How can I perform a summation in the numpy way that treats nan as zero if only one array has nan at a location, and keeps nan if both arrays have nan at the same location?
The result I expect to see is
SUM = array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
When I try
SUM=np.add(NS,EW)
it gives me
SUM=array([[ 2., 4., nan],
[ 8., nan, nan],
[ 12., nan, nan]])
When I try
SUM = np.nansum(np.dstack((NS,EW)),2)
it gives me
SUM=array([[ 2., 4., 0.],
[ 8., 5., 0.],
[ 12., 0., 9.]])
Of course, I can realize my goal by doing element-level operation,
for i in range(np.size(NS,0)):
    for j in range(np.size(NS,1)):
        if np.isnan(NS[i,j]) and np.isnan(EW[i,j]):
            SUM[i,j] = np.nan
        elif np.isnan(NS[i,j]):
            SUM[i,j] = EW[i,j]
        elif np.isnan(EW[i,j]):
            SUM[i,j] = NS[i,j]
        else:
            SUM[i,j] = NS[i,j]+EW[i,j]
but it is very slow. So I'm looking for a more numpy solution to solve this problem.
Thanks for help in advance!
Approach #1 : One approach with np.where -
def sum_nan_arrays(a,b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    return np.where(ma&mb, np.nan, np.where(ma,0,a) + np.where(mb,0,b))
Sample run -
In [43]: NS
Out[43]:
array([[ 1., 2., nan],
[ 4., 5., nan],
[ 6., nan, nan]])
In [44]: EW
Out[44]:
array([[ 1., 2., nan],
[ 4., nan, nan],
[ 6., nan, 9.]])
In [45]: sum_nan_arrays(NS, EW)
Out[45]:
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
Approach #2 : Probably a faster one with a mix of boolean-indexing -
def sum_nan_arrays_v2(a,b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    m_keep_a = ~ma & mb
    m_keep_b = ma & ~mb
    out = a + b
    out[m_keep_a] = a[m_keep_a]
    out[m_keep_b] = b[m_keep_b]
    return out
Runtime test -
In [140]: # Setup input arrays with 4/9 ratio of NaNs (same as in the question)
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000000, replace=0)] = np.nan
...:
In [141]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[141]: 0.0
In [142]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 141 ms per loop
In [143]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 177 ms per loop
In [144]: # Setup input arrays with fewer NaNs
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000, replace=0)] = np.nan
...:
In [145]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[145]: 0.0
In [146]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 69.6 ms per loop
In [147]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 38 ms per loop
Actually your nansum approach almost worked, you just need to add in the nans again:
def add_ignore_nans(a, b):
    stacked = np.array([a, b])
    res = np.nansum(stacked, axis=0)
    res[np.all(np.isnan(stacked), axis=0)] = np.nan
    return res
>>> add_ignore_nans(a, b)
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
This will be slower than @Divakar's answer, but I wanted to mention that you were pretty close already! :-)
I think we can get a bit more concise, in the same vein as Divakar's second approach. With a = NS and b = EW:
na = numpy.isnan(a)
nb = numpy.isnan(b)
a[na] = 0
b[nb] = 0
a += b
na &= nb
a[na] = numpy.nan
The operations are done in place where possible to save memory, assuming this is feasible in your scenario (note that both a and b are modified). The final result is in a.
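If the original NS and EW must be preserved, one option is to run the same in-place steps on copies. This is only a sketch using the arrays from the question. The `.copy()` calls are an addition here, not part of the answer above.

```python
import numpy as np

nan = np.nan
NS = np.array([[1., 2., nan], [4., 5., nan], [6., nan, nan]])
EW = np.array([[1., 2., nan], [4., nan, nan], [6., nan, 9.]])

# Work on copies so the in-place steps leave NS and EW untouched.
a, b = NS.copy(), EW.copy()
na = np.isnan(a)
nb = np.isnan(b)
a[na] = 0          # treat NaN as zero in each operand
b[nb] = 0
a += b
na &= nb           # positions where both inputs were NaN
a[na] = np.nan     # restore NaN there
```

The copies cost the extra memory that the in-place version was trying to avoid, so this trade-off only makes sense when the inputs are needed afterwards.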
I have two numpy arrays a and b.
For each entry in the first column of a, I would like to know its position in the first column of b. If the entry does not exist in b, a NaN should be set for this row. The resulting column should be added at the right side of a.
Remark: the first columns of a and b are numerically ordered.
Example:
import numpy as np
a = np.array([
[1, 201, ],
[2, 202, ],
[3, 203, ],
[4, 204, ],
[5, 205, ],
[6, 206, ],
[7, 207, ],
[8, 208, ],
[14,214,],
[23,223,],
])
b = np.array([
[3, 303, ], # 0
[6, 306, ], # 1
[7, 307, ], # 2
[14,314,], # 3
[16,316,], # 4
])
So, I am looking for a fast (if possible: numpy or related packages based) solution so that I get a new version of a that looks like:
array([[ 1., 201., nan],
[ 2., 202., nan],
[ 3., 203., 0.],
[ 4., 204., nan],
[ 5., 205., nan],
[ 6., 206., 1.],
[ 7., 207., 2.],
[ 8., 208., nan],
[ 14., 214., 3.],
[ 23., 223., nan]])
Provided your arrays are sorted on [:, 0] as in your example, then you can use np.searchsorted() for a (very) fast solution (note: bold claim here; if there is a faster way I'd love to hear it):
ix = np.searchsorted(b[:, 0], a[:, 0]).clip(max=b.shape[0]-1)
out = np.hstack((a, np.where(b[ix, 0] == a[:, 0], ix, np.nan)[:, None]))
Out:
[[ 1. 201. nan]
[ 2. 202. nan]
[ 3. 203. 0.]
[ 4. 204. nan]
[ 5. 205. nan]
[ 6. 206. 1.]
[ 7. 207. 2.]
[ 8. 208. nan]
[ 14. 214. 3.]
[ 23. 223. nan]]
Explanation:
The first line does most of the heavy-lifting work:
>>> np.searchsorted(b[:, 0], a[:, 0])
array([0, 0, 0, 1, 1, 1, 2, 3, 3, 5])
We then clip it so that it never exceeds the last valid row index of b. At that point, if b[ix, 0] matches a[:, 0], then ix is correct, otherwise select NaN:
ix = np.searchsorted(b[:, 0], a[:, 0]).clip(max=b.shape[0]-1)
b[ix, 0]
# array([ 3, 3, 3, 6, 6, 6, 7, 14, 14, 16])
a[:, 0]
# array([ 1, 2, 3, 4, 5, 6, 7, 8, 14, 23])
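Putting the two steps together as a function may make the logic easier to follow. The sketch below wraps the same `searchsorted` + `clip` + `where` idea. The name `add_position_column` is made up for illustration, and it assumes b is non-empty and both first columns are sorted.

```python
import numpy as np

def add_position_column(a, b):
    """Append to `a` the row index in `b` of each a[:, 0], or NaN if absent."""
    keys_a, keys_b = a[:, 0], b[:, 0]
    # Insertion points; clip so keys larger than everything in b stay indexable.
    ix = np.searchsorted(keys_b, keys_a).clip(max=len(keys_b) - 1)
    # Keep the index only where the key at that position really matches.
    pos = np.where(keys_b[ix] == keys_a, ix, np.nan)
    return np.hstack((a, pos[:, None]))

a = np.array([[1, 201], [2, 202], [3, 203], [4, 204], [5, 205],
              [6, 206], [7, 207], [8, 208], [14, 214], [23, 223]])
b = np.array([[3, 303], [6, 306], [7, 307], [14, 314], [16, 316]])
out = add_position_column(a, b)
```

Without the clip, the key 23 would map to index 5, which is out of bounds for a five-row b.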
Addendum: timing info
Let:
v0(a, b) be the version proposed here (using searchsorted)
v1(a, b) be @anky's version based on isin and cumsum
v2(a, b) be @ShubhamSharma's version based on where(m.any(1),...)
Also, side note: observe that v1 is wrong when there are repeated values either in a or b (or both).
n = 100_000
m = 10_000
a = np.random.randint(0, 4, size=(n, 2)).cumsum(axis=0)
b = np.random.randint(0, 4, size=(m, 2)).cumsum(axis=0)
t0 = %timeit -o v0(a, b)
# 2.17 ms ± 855 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
t1 = %timeit -o v1(a, b)
# 4.34 ms ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
t2 = %timeit -o v2(a, b)
# 1.12 s ± 473 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> t1.best / t0.best
2.0
>>> t2.best / t0.best
519.1
Let's try numpy broadcasting to find position:
m = a[:, 0, None] == b[:, 0]
np.c_[a, np.where(m.any(1), m.argmax(1), np.nan)]
array([[ 1., 201., nan],
[ 2., 202., nan],
[ 3., 203., 0.],
[ 4., 204., nan],
[ 5., 205., nan],
[ 6., 206., 1.],
[ 7., 207., 2.],
[ 8., 208., nan],
[ 14., 214., 3.],
[ 23., 223., nan]])
Given an ndarray x and a one dimensional array containing the length of contiguous slices of a dimension of x, I want to compute a new array that contains the sum of all of the slices. For example, in two dimensions summing over dimension one:
>>> lens = np.array([1, 3, 2])
array([1, 3, 2])
>>> x = np.arange(4 * lens.sum()).reshape((4, lens.sum())).astype(float)
array([[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.],
[ 18., 19., 20., 21., 22., 23.]])
# I want to compute:
>>> result
array([[ 0., 6., 9.],
[ 6., 24., 21.],
[ 12., 42., 33.],
[ 18., 60., 45.]])
# 0 = 0
# 6 = 1 + 2 + 3
# ...
# 45 = 22 + 23
The two ways that come to mind are:
a) Use cumsum and fancy indexing:
def cumsum_method(x, lens):
    xc = x.cumsum(1)
    lc = lens.cumsum() - 1
    res = xc[:, lc]
    res[:, 1:] -= xc[:, lc[:-1]]
    return res
b) Use bincount and intelligently generate the appropriate bins:
def bincount_method(x, lens):
    bins = np.arange(lens.size).repeat(lens) + \
           np.arange(x.shape[0])[:, None] * lens.size
    return np.bincount(bins.flat, weights=x.flat).reshape((-1, lens.size))
Timing these two on large input had the cumsum method performing slightly better:
>>> lens = np.random.randint(1, 100, 100)
>>> x = np.random.random((100000, lens.sum()))
>>> %timeit cumsum_method(x, lens)
1 loops, best of 3: 3 s per loop
>>> %timeit bincount_method(x, lens)
1 loops, best of 3: 3.9 s per loop
Is there an obviously more efficient way that I'm missing? It seems like a native c call would be faster because it wouldn't require allocating the cumsum or the bins array. A numpy builtin function that does something close to this could likely be better than (a) or (b). I couldn't find anything through searching and looking through the documentation.
Note, this is similar to this question, but the summation intervals aren't regular.
You can use np.add.reduceat:
>>> np.add.reduceat(x, [0, 1, 4], axis=1)
array([[ 0., 6., 9.],
[ 6., 24., 21.],
[ 12., 42., 33.],
[ 18., 60., 45.]])
The list of indices [0, 1, 4] means: "sum the slices 0:1, 1:4 and 4:". You could generate these values from lens using np.hstack(([0], lens[:-1])).cumsum().
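A small sketch of the index derivation on the example from the question, checked against explicit per-slice sums. The names `starts`, `stops` and `explicit` are illustrative, and every entry of lens is assumed to be at least 1.

```python
import numpy as np

lens = np.array([1, 3, 2])
x = np.arange(4 * lens.sum()).reshape(4, lens.sum()).astype(float)

# Start offset of each slice: [0, 1, 4]
starts = np.hstack(([0], lens[:-1])).cumsum()
result = np.add.reduceat(x, starts, axis=1)

# Reference: sum each slice x[:, start:stop] explicitly.
stops = starts + lens
explicit = np.stack([x[:, s:e].sum(axis=1) for s, e in zip(starts, stops)], axis=1)
```

The assumption on lens matters: for a zero-length slice, `reduceat` returns the single element at that start index rather than 0, so the two results would differ.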
Even factoring in the calculation of the indices from lens, a reduceat method is likely to be significantly faster than alternative methods:
def reduceat_method(x, lens):
    i = np.hstack(([0], lens[:-1])).cumsum()
    return np.add.reduceat(x, i, axis=1)
lens = np.random.randint(1, 100, 100)
x = np.random.random((1000, lens.sum()))
%timeit reduceat_method(x, lens)
# 100 loops, best of 3: 4.89 ms per loop
%timeit cumsum_method(x, lens)
# 10 loops, best of 3: 35.8 ms per loop
%timeit bincount_method(x, lens)
# 10 loops, best of 3: 43.6 ms per loop
I have a 70x70 numpy ndarray, which is mainly diagonal. The only off-diagonal values are below the diagonal. I would like to make the matrix symmetric.
As a newcomer from Matlab world, I can't get it working without for loops. In MATLAB it was easy:
W = max(A,A')
where A' is matrix transposition and the max() function takes care to make the W matrix which will be symmetric.
Is there an elegant way to do so in Python as well?
EXAMPLE
The sample A matrix is:
1 0 0 0
0 2 0 0
1 0 2 0
0 1 0 3
The desired output matrix W is:
1 0 1 0
0 2 0 1
1 0 2 0
0 1 0 3
I found the following solution, which works for me:
import numpy as np
W = np.maximum( A, A.transpose() )
Use the NumPy tril and triu functions as follows. It essentially "mirrors" elements in the lower triangle into the upper triangle.
import numpy as np
A = np.array([[1, 0, 0, 0], [0, 2, 0, 0], [1, 0, 2, 0], [0, 1, 0, 3]])
W = np.tril(A) + np.triu(A.T, 1)
tril(m, k=0) gets the lower triangle of a matrix m (returns a copy of the matrix m with all elements above the kth diagonal zeroed). Similarly, triu(m, k=0) gets the upper triangle of a matrix m (all elements below the kth diagonal zeroed).
To prevent the diagonal being added twice, one must exclude the diagonal from one of the triangles, using either np.tril(A) + np.triu(A.T, 1) or np.tril(A, -1) + np.triu(A.T).
Also note that this behaves slightly differently to using maximum. All elements in the upper triangle are overwritten, regardless of whether they are the maximum or not. This means they can be any value (e.g. nan or inf).
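A small sketch of that difference, using a made-up 3x3 matrix with a negative entry below the diagonal (the matrix and variable names are illustrative assumptions):

```python
import numpy as np

A = np.array([[ 1, 0, 0],
              [ 0, 2, 0],
              [-1, 0, 3]])

# maximum keeps the larger of A[i, j] and A[j, i], so the -1 loses to the 0.
w_max = np.maximum(A, A.T)

# tril/triu copies the lower triangle into the upper one unconditionally.
w_mirror = np.tril(A) + np.triu(A.T, 1)
```

Here `w_max` drops the -1 entirely, while `w_mirror` places it at both (2, 0) and (0, 2). For non-negative data with a zero upper triangle, as in the question, the two approaches agree.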
For what it is worth, the numpy equivalent of the MATLAB expression you mentioned is more efficient than the method from the link @plonser added.
In [1]: import numpy as np
In [2]: A = np.zeros((4, 4))
In [3]: np.fill_diagonal(A, np.arange(4)+1)
In [4]: A[2:,:2] = np.eye(2)
# numpy equivalent to MATLAB:
In [5]: %timeit W = np.maximum( A, A.T)
100000 loops, best of 3: 2.95 µs per loop
# method from link
In [6]: %timeit W = A + A.T - np.diag(A.diagonal())
100000 loops, best of 3: 9.88 µs per loop
Timing for larger matrices can be done similarly:
In [1]: import numpy as np
In [2]: N = 100
In [3]: A = np.zeros((N, N))
In [4]: A[2:,:N-2] = np.eye(N-2)
In [5]: np.fill_diagonal(A, np.arange(N)+1)
In [6]: A
Out[6]:
array([[ 1., 0., 0., ..., 0., 0., 0.],
[ 0., 2., 0., ..., 0., 0., 0.],
[ 1., 0., 3., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 98., 0., 0.],
[ 0., 0., 0., ..., 0., 99., 0.],
[ 0., 0., 0., ..., 1., 0., 100.]])
# numpy equivalent to MATLAB:
In [6]: %timeit W = np.maximum( A, A.T)
10000 loops, best of 3: 28.6 µs per loop
# method from link
In [7]: %timeit W = A + A.T - np.diag(A.diagonal())
10000 loops, best of 3: 49.8 µs per loop
And with N = 1000
# numpy equivalent to MATLAB:
In [6]: %timeit W = np.maximum( A, A.T)
100 loops, best of 3: 5.65 ms per loop
# method from link
In [7]: %timeit W = A + A.T - np.diag(A.diagonal())
100 loops, best of 3: 11.7 ms per loop
I am a little confused by the broadcasting rules of numpy. Suppose you want to perform an axis-wise scalar product of a higher dimension array to reduce the array dimension by one (basically to perform a weighted summation along one axis):
from numpy import *
A = ones((3,3,2))
v = array([1,2])
B = zeros((3,3))
# V01: this works
B[0,0] = v.dot(A[0,0])
# V02: this works
B[:,:] = v[0]*A[:,:,0] + v[1]*A[:,:,1]
# V03: this doesn't
B[:,:] = v.dot(A[:,:])
Why does V03 not work?
Cheers
np.dot(a, b) operates over the last axis of a and the second-to-last axis of b. So for the particular case in your question, you could always go with:
>>> a.dot(v)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
If you want to keep the v.dot(a) order, you need to get the axis into position, which can easily be achieved with np.rollaxis:
>>> v.dot(np.rollaxis(a, 2, 1))
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
I don't like np.dot too much, unless it is for the obvious matrix or vector multiplication, because it is very strict about the output dtype when using the optional out parameter. Joe Kington has mentioned it already, but if you are going to be doing this type of thing, get used to np.einsum: once you get the hang of the syntax, it will cut the time you spend worrying about reshaping things to a minimum:
>>> a = np.ones((3, 3, 2))
>>> np.einsum('i, jki', v, a)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
Not that it is too relevant in this case, but it is also ridiculously fast:
In [4]: %timeit a.dot(v)
100000 loops, best of 3: 2.43 us per loop
In [5]: %timeit v.dot(np.rollaxis(a, 2, 1))
100000 loops, best of 3: 4.49 us per loop
In [7]: %timeit np.tensordot(v, a, axes=(0, 2))
100000 loops, best of 3: 14.9 us per loop
In [8]: %timeit np.einsum('i, jki', v, a)
100000 loops, best of 3: 2.91 us per loop
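To check that these formulations really compute the same weighted sum, the sketch below compares them on a non-constant array. It swaps in `np.moveaxis(a, 2, 1)` for `np.rollaxis(a, 2, 1)` (both give shape (3, 2, 3) here) and writes the einsum output indices explicitly. The variable names are illustrative.

```python
import numpy as np

a = np.arange(18.0).reshape(3, 3, 2)   # non-constant data makes mistakes visible
v = np.array([1, 2])

# Explicit weighted sum along the last axis (V02 in the question).
reference = v[0] * a[:, :, 0] + v[1] * a[:, :, 1]

via_dot = a.dot(v)                                  # contracts the last axis of a
via_moveaxis = v.dot(np.moveaxis(a, 2, 1))          # axis 2 moved to second-to-last
via_tensordot = np.tensordot(v, a, axes=(0, 2))
via_einsum = np.einsum('i,jki->jk', v, a)
```

All four results have shape (3, 3) and should match `reference`.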
You can also use tensordot, in this particular case.
import numpy as np
A = np.ones((3,3,2))
v = np.array([1,2])
print(np.tensordot(v, A, axes=(0, 2)))
This yields:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The axes=(0,2) indicates that tensordot should sum over the first axis in v and the third axis in A. (Also have a look at einsum, which is more flexible, but harder to understand if you're not used to the notation.)
If speed is a consideration, tensordot is considerably faster than using apply_along_axis for small arrays.
In [14]: A = np.ones((3,3,2))
In [15]: v = np.array([1,2])
In [16]: %timeit np.tensordot(v, A, axes=(0, 2))
10000 loops, best of 3: 21.6 us per loop
In [17]: %timeit np.apply_along_axis(v.dot, 2, A)
1000 loops, best of 3: 258 us per loop
(The difference is less apparent for large arrays due to a constant overhead, though tensordot is consistently faster.)
You could use numpy.apply_along_axis() for this:
In [35]: np.apply_along_axis(v.dot, 2, A)
Out[35]:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The reason I think V03 doesn't work is that it's no different to:
B[:,:] = v.dot(A)
i.e. it tries to contract v (length 2) with the second-to-last axis of A (length 3), so the shapes do not line up.
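The shape mismatch can be reproduced directly. In this sketch, `failed` is just an illustrative flag recording whether NumPy rejected the call:

```python
import numpy as np

A = np.ones((3, 3, 2))
v = np.array([1, 2])

try:
    v.dot(A)            # pairs v (length 2) with A's second-to-last axis (length 3)
    failed = False
except ValueError:
    failed = True

B = A.dot(v)            # pairs A's last axis (length 2) with v: shapes align
```

Swapping the operand order puts the length-2 axis where np.dot expects it, which is why `A.dot(v)` works.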