Numpy searchsorted along many dimensions? [duplicate] - python

Assume that I have two arrays A and B, where both A and B are m x n. My goal is now, for each row of A and B, to find where I should insert the elements of row i of A in the corresponding row of B. That is, I wish to apply np.digitize or np.searchsorted to each row of A and B.
My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?

We can add to each row an offset relative to the previous row, using the same offset for both arrays. The idea is to then use np.searchsorted on flattened versions of the input arrays, so that each row from b is restricted to finding sorted positions within the corresponding row of a. Additionally, to make it work for negative numbers too, we just need to offset by the minimum values as well.
So, we would have a vectorized implementation like so -
def searchsorted2d(a,b):
    m,n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])
Runtime test -
In [173]: def searchsorted2d_loopy(a,b):
...:     out = np.zeros(a.shape,dtype=int)
...:     for i in range(len(a)):
...:         out[i] = np.searchsorted(a[i],b[i])
...:     return out
...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop

The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating-point values, especially if they span multiple orders of magnitude (e.g. [[1.0, 2.0, 3.0, 1.0e+20], ...]). In some cases r may be so large that applying a+r and b+r wipes out the original values you're trying to run searchsorted on, and you end up just comparing r to r.
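As a minimal sketch of that failure mode (with made-up values): float64 only carries about 16 significant digits, so adding a huge offset and taking it back out destroys the small entries:
import numpy as np

a = np.array([1.0, 2.0, 3.0, 1.0e+20])
r = 2.0e+20  # an offset on the order of the data range
print((a + r) - r)  # roughly [0. 0. 0. 1.e+20] -- the small values are lost to rounding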
To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (as a structured dtype), and run searchsorted on these structured dtypes instead.
def searchsorted_2d(a, v, side='left', sorter=None):
    import numpy as np
    # Make sure a and v are numpy arrays.
    a = np.asarray(a)
    v = np.asarray(v)
    # Augment a with row id
    ai = np.empty(a.shape, dtype=[('row', int), ('value', a.dtype)])
    ai['row'] = np.arange(a.shape[0]).reshape(-1, 1)
    ai['value'] = a
    # Augment v with row id
    vi = np.empty(v.shape, dtype=[('row', int), ('value', v.dtype)])
    vi['row'] = np.arange(v.shape[0]).reshape(-1, 1)
    vi['value'] = v
    # Perform searchsorted on the augmented arrays.
    # The row information is embedded in the values, so only the equivalent rows
    # between a and v are considered.
    result = np.searchsorted(ai.flatten(), vi.flatten(), side=side, sorter=sorter)
    # Restore the original shape and decode the searchsorted indices so they apply to the original data.
    result = result.reshape(vi.shape) - vi['row']*a.shape[1]
    return result
Edit: The timing on this approach is abysmal!
In [21]: %timeit searchsorted_2d(a,b)
10 loops, best of 3: 92.5 ms per loop
You would be better off just using map over the array:
In [22]: %timeit np.array(list(map(np.searchsorted,a,b)))
100 loops, best of 3: 13.8 ms per loop
For integer data, @Divakar's approach is still the fastest:
In [23]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 7.26 ms per loop

Related

Most Pythonic way to multiply these two vectors?

I have two ndarrays with shapes:
A = (32,512,640)
B = (4,512)
I need to multiply A and B such that I get a new ndarray:
C = (4,32,512,640)
Another way to think of it is that each row of B is multiplied along axis=-2 of A, which results in a new (1, 32, 512, 640) cube. Each row of B can be looped over, forming (1, 32, 512, 640) cubes, which can then be used to build up C with np.concatenate or np.vstack, such as:
# Sample inputs, where the dimensions aren't necessarily known
a = np.arange(32*512*465, dtype='f4').reshape((32,512,465))
b = np.ones((4,512), dtype='f4')
# Using a loop
d = []
for row in b:
    d.append(np.expand_dims(row[None,:,None]*a, axis=0))
# Or using list comprehension
d = [np.expand_dims(row[None,:,None]*a,axis=0) for row in b]
# Stacking the final list
result = np.vstack(d)
But I am wondering if it's possible to use something like np.einsum or np.tensordot to get this vectorized all in one line. I'm still learning how to use those two methods, so I'm not sure if it's appropriate here.
Thanks!
We can leverage broadcasting after extending the dimensions of B with None/np.newaxis -
C = A * B[:,None,:,None]
With einsum, it would be -
C = np.einsum('ijk,lj->lijk',A,B)
There's no sum-reduction happening here, so einsum won't be any better than the explicit-broadcasting one. But since we are looking for a Pythonic solution, it could be used once we get past its string notation.
Let's get some timings to finish things off -
In [15]: m,n,r,p = 32,512,640,4
...: A = np.random.rand(m,n,r)
...: B = np.random.rand(p,n)
In [16]: %timeit A * B[:,None,:,None]
10 loops, best of 3: 80.9 ms per loop
In [17]: %timeit np.einsum('ijk,lj->lijk',A,B)
10 loops, best of 3: 109 ms per loop
# Original soln
In [18]: %%timeit
...: d = []
...: for row in B:
...:     d.append(np.expand_dims(row[None,:,None]*A, axis=0))
...:
...: result = np.vstack(d)
10 loops, best of 3: 130 ms per loop
Leverage multi-core
We could leverage the multi-core capability of numexpr, which is well suited to arithmetic operations on large data, and thus gain some performance boost here. Let's time it -
In [42]: import numexpr as ne
In [43]: B4D = B[:,None,:,None] # this is virtually free
In [44]: %timeit ne.evaluate('A*B4D')
10 loops, best of 3: 64.6 ms per loop
In one line: ne.evaluate('A*B4D', {'A':A, 'B4D':B[:,None,:,None]}).
Related post on how to control multi-core functionality.
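For reference, a minimal sketch of one way to control that, assuming numexpr is installed; set_num_threads caps subsequent evaluate() calls and returns the previous setting:
import numexpr as ne

# Cap numexpr at 4 threads for the following evaluate() calls;
# keep the returned previous setting so it can be restored later.
prev = ne.set_num_threads(4)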

Find nearest indices for one array against all values in another array - Python / NumPy

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.
My current approach with numpy:
import numpy as np
refArray = np.random.random(16);
myArray = np.random.random(1000);
def find_nearest(array, value):
    idx = (np.abs(array-value)).argmin()
    return idx;

for value in np.nditer(myArray):
    index = find_nearest(refArray, value);
    print(index);
Unfortunately, this takes ages for a large amount of values.
Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?
FYI: I don't necessarily need numpy in my script.
Important: the order of both myArray as well as refArray is important and should not be changed. If sorting is to be applied, the original index should be retained in some way.
Here's one vectorized approach with np.searchsorted based on this post -
def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
           (np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx]))
    return sidx_B[sorted_idx-mask]
Brief explanation (a small worked example follows this list):
1. Get the sorted indices for the left positions. We do this with np.searchsorted(arr1, arr2, side='left'), or just np.searchsorted(arr1, arr2). Since searchsorted expects a sorted array as its first input, we need some preparatory work there (argsort).
2. Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closest. We do this at the step that computes mask.
3. Based on whether the left positions or their immediate right ones are closer, choose the respective indices. This is done by subtracting the mask values (booleans used as 0/1 offsets) from the indices.
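As a small worked example with made-up numbers, tracing those steps through the closest_argmin defined above:
A = np.array([0.1, 0.52, 0.8])
B = np.array([0.5, 0.0, 1.0])
# sorted_B is [0.0, 0.5, 1.0] and the left positions for A are [1, 2, 2];
# comparing each left neighbour with its right one picks 0.0, 0.5 and 1.0,
# which map back through the argsort to indices into the original B.
print(closest_argmin(A, B))  # -> [1 0 2]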
Benchmarking
Original approach -
def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from the posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1
Timings and verification -
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
50x+ speedup for the posted sample and hopefully more for larger datasets!
An answer that is much shorter than @Divakar's, also using broadcasting and even slightly faster:
abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)

Getting both numpy.partition and numpy.argpartition outputs efficiently

I am using Python 3.6 and numpy. I have an n-dimensional array. I need to perform both partition and argpartition on the last dimension of the array. I could obviously call both functions, but that feels like wasting resources. Is there a way to get the results of both np.partition and np.argpartition at the same time? There should be a way to get the result of np.partition by applying to the array the indices I get from np.argpartition, but I don't see it at the moment!
Thank you!
Get those argpartition indices and then use advanced-indexing to get the partitioned array.
Thus, an implementation for a generic ndarray of any number of dimensions and along any generic axis, would be like so -
def partition_results(a, k, axis=-1):
    idx = np.argpartition(a, k, axis=axis)
    index_arr = list(np.ix_(*[range(i) for i in a.shape]))
    index_arr[axis] = idx
    # Index with a tuple; indexing with a list of index arrays is deprecated in newer NumPy
    return idx, a[tuple(index_arr)]
np.ix_ gives us the "spread-out" range arrays needed for this advanced-indexing task. These range arrays cover all dimensions of the argpartition indices array except the chosen axis, for which we substitute the argpartition indices themselves. This setup is what such an indexing operation requires.
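For intuition, a tiny illustration (made-up 2 x 3 shape) of what np.ix_ produces:
import numpy as np

rows, cols = np.ix_(range(2), range(3))
print(rows.shape, cols.shape)  # (2, 1) (1, 3)
# Broadcasting rows against cols enumerates every (row, col) pair, which is
# exactly what advanced indexing needs once one of these is swapped for the
# argpartition indices along the chosen axis.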
For comparison, the approach of using two separate calls to np.argpartition and np.partition would look like so -
def partition_results_exclusive_way(a, k):
    idx = np.argpartition(a, k, axis=-1)
    part_arr = np.partition(a, k, axis=-1)
    return idx, part_arr
We will use it for comparison on performance and value verification in the next section.
Sample run and runtime test -
In [496]: a = np.random.rand(20,20,20,20,20)
In [502]: A0, B0 = partition_results_exclusive_way(a, 10)
In [503]: A1, B1 = partition_results(a, 10)
In [504]: np.allclose(A0,A1)
Out[504]: True
In [505]: np.allclose(B0,B1)
Out[505]: True
In [506]: %timeit partition_results_exclusive_way(a, 10)
10 loops, best of 3: 92.6 ms per loop
In [507]: %timeit partition_results(a, 10)
10 loops, best of 3: 76 ms per loop
Dissecting a bit more on the performance numbers, let's time argpartition and partition separately -
In [509]: %timeit np.argpartition(a, 10, axis=-1)
10 loops, best of 3: 49.6 ms per loop
In [510]: %timeit np.partition(a, 10, axis=-1)
10 loops, best of 3: 43.6 ms per loop
So, the advanced-indexing operation cost us around half of what we had with np.partition. We are definitely saving there!

Average of numpy array ignoring specified value

I have a number of 1-dimensional numpy ndarrays containing the path length between a given node and all other nodes in a network for which I would like to calculate the average. The matter is complicated though by the fact that if no path exists between two nodes the algorithm returns a value of 2147483647 for that given connection. If I leave this value untreated it would obviously grossly inflate my average as a typical path length would be somewhere between 1 and 3 in my network.
One option for dealing with this would be to loop through all elements of all arrays, replace 2147483647 with NaN, and then use numpy.nanmean to find the average, though that is probably not the most efficient way to go about it. Is there a way of calculating the average with numpy that just ignores all values of 2147483647?
I should add that I could have up to several million arrays with several million values to average over, so any performance gain in how the average is found will make a real difference.
Why not use the usual numpy boolean filtering for this?
m = my_array[my_array != 2147483647].mean()
By the way, if you really want speed, the whole algorithm you describe sounds rather naive and could probably be improved a lot.
Oh, and I guess you are calculating the mean because you have rigorously checked that the underlying distribution is normal, so that it actually means something, right?
np.nanmean(np.where(my_array == 2147483647, np.nan, my_array))
Timings
a = np.random.randn(100000)
a[::10] = 2147483647
%timeit np.nanmean(np.where(a == 2147483647, np.nan, a))
1000 loops, best of 3: 639 µs per loop
%timeit a[a != 2147483647].mean()
1000 loops, best of 3: 259 µs per loop
import pandas as pd
%timeit pd.Series(a).ne(2147483647).mean()
1000 loops, best of 3: 493 µs per loop
One way would be to get the sum of all elements in one go and then remove the contribution from the invalid ones. Finally, to get the average value itself, we divide by the number of valid elements. So, we would have an implementation like so -
def mean_ignore_num(arr,num):
    # Get count of invalid ones
    invc = np.count_nonzero(arr==num)
    # Sum all numbers, remove the contribution from num, then divide by the valid count
    return (arr.sum() - invc*num)/float(arr.size-invc)
Verify results -
In [191]: arr = np.full(10,2147483647).astype(np.int32)
...: arr[1] = 5
...: arr[4] = 4
...:
In [192]: arr.max()
Out[192]: 2147483647
In [193]: arr.sum() # Extends beyond int32 max limit, so no overflow
Out[193]: 17179869185
In [194]: arr[arr != 2147483647].mean()
Out[194]: 4.5
In [195]: mean_ignore_num(arr,2147483647)
Out[195]: 4.5
Runtime test -
In [38]: arr = np.random.randint(0,9,(10000))
In [39]: arr[arr != 7].mean()
Out[39]: 3.6704609489462414
In [40]: mean_ignore_num(arr,7)
Out[40]: 3.6704609489462414
In [41]: %timeit arr[arr != 7].mean()
10000 loops, best of 3: 102 µs per loop
In [42]: %timeit mean_ignore_num(arr,7)
10000 loops, best of 3: 36.6 µs per loop

Numpy elementwise product of 3d array

I have two 3d arrays A and B with shape (N, 2, 2) that I would like to multiply along the N-axis, taking a matrix product of each pair of 2x2 matrices. With a loop implementation, it looks like
C[i] = dot(A[i], B[i])
Is there a way I could do this without using a loop? I've looked into tensordot, but haven't been able to get it to work. I think I might want something like tensordot(a, b, axes=([1,2], [2,1])) but that's giving me an NxN matrix.
It seems you are doing matrix-multiplications for each slice along the first axis. For the same, you can use np.einsum like so -
np.einsum('ijk,ikl->ijl',A,B)
We can also use np.matmul -
np.matmul(A,B)
On Python 3.x, this matmul operation simplifies with the @ operator -
A @ B
Benchmarking
Approaches -
def einsum_based(A,B):
    return np.einsum('ijk,ikl->ijl',A,B)

def matmul_based(A,B):
    return np.matmul(A,B)

def forloop(A,B):
    N = A.shape[0]
    C = np.zeros((N,2,2))
    for i in range(N):
        C[i] = np.dot(A[i], B[i])
    return C
Timings -
In [44]: N = 10000
...: A = np.random.rand(N,2,2)
...: B = np.random.rand(N,2,2)
In [45]: %timeit einsum_based(A,B)
...: %timeit matmul_based(A,B)
...: %timeit forloop(A,B)
100 loops, best of 3: 3.08 ms per loop
100 loops, best of 3: 3.04 ms per loop
100 loops, best of 3: 10.9 ms per loop
You just need to perform the operation on the first dimension of your tensors, which is labeled by 0:
c = tensordot(a, b, axes=(0,0))
This will work as you wish. Also you don't need a list of axes, because it's just along one dimension that you're performing the operation. With axes=([1,2],[2,1]) you're cross multiplying the 2nd and 3rd dimensions. If you write it in index notation (Einstein summation convention) this corresponds to c[i,j] = a[i,k,l]*b[j,l,k], thus you're contracting the indices you want to keep.
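A quick sanity check (with made-up shapes) of why those axes produce an N x N result:
import numpy as np

a = np.random.rand(4, 2, 2)
b = np.random.rand(4, 2, 2)
# Contracting both trailing axes of each array leaves only the two leading N-axes.
print(np.tensordot(a, b, axes=([1, 2], [2, 1])).shape)  # (4, 4)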
EDIT: Ok, the problem is that the tensor product of two 3d objects is a 6d object. Since contractions involve pairs of indices, there's no way you'll get a 3d object from a single tensordot operation. The trick is to split the calculation in two: first do the tensordot on the index that performs the matrix operation, and then take a tensor diagonal in order to reduce the resulting 4d object to 3d. In one command:
d = np.diagonal(np.tensordot(a, b, axes=(2, 1)), axis1=0, axis2=2)
In tensor notation d[j,k,i] = c[i,j,i,k] = a[i,j,l]*b[i,l,k] (np.diagonal places the diagonal axis last, so the result comes out with shape (2, 2, N)).
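A quick check, assuming the contraction axes (2, 1) read off from the index notation above, that this route matches the looped np.dot; the transpose accounts for np.diagonal moving the diagonal axis to the end:
import numpy as np

N = 5
a = np.random.rand(N, 2, 2)
b = np.random.rand(N, 2, 2)
d = np.diagonal(np.tensordot(a, b, axes=(2, 1)), axis1=0, axis2=2)  # shape (2, 2, N)
C = np.array([np.dot(a[i], b[i]) for i in range(N)])                # shape (N, 2, 2)
print(np.allclose(d.transpose(2, 0, 1), C))  # True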
