Cannot assign values to a 'double slice' using numpy - python

How would I do the following in numpy?
1. Select all rows of an array containing no more than 50% zero values.
2. Select the first n (let's say 2) of the rows satisfying 1.
3. Do something and place the modified rows at the same indices of a zero array with the same shape as in 1.
The following results in an array where no new values are assigned:
In [177]:
a = np.array([[0,0,3],[4,5,6],[7,0,0],[10,11,12],[13,14,15]])
b = np.zeros_like(a)
a
Out[177]:
array([[ 0,  0,  3],
       [ 4,  5,  6],
       [ 7,  0,  0],
       [10, 11, 12],
       [13, 14, 15]])
In [178]:
# select all rows containing no more than 50% zero values
percent = np.sum(a == 0, axis=-1) / float(a.shape[1])
percent = percent >= 0.5
slice = np.invert(percent).nonzero()[0]  # note: 'slice' shadows the Python builtin
In [183]:
# select first two rows satisfying 'slice'
a[slice][0:2]
Out[183]:
array([[ 4,  5,  6],
       [10, 11, 12]])
In [182]:
# do something and place modified rows on same index of zero array
b[slice][0:2] = a[slice][0:2] * 2
In [184]:
b
Out[184]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

The problem is that b[slice] creates a copy rather than a view (it triggers fancy indexing). The code b[slice][0:2] creates a view of this copy (not the original b!). Therefore...
b[slice][0:2] = a[slice][0:2] * 2
...is assigning the corresponding rows of a to a view of the copy of b.
Because it can lead to these situations, it's better not to chain indexing operations in this way. Instead, just compute the relevant row numbers for slice first and then do the assignment:
slice = np.invert(percent).nonzero()[0][:2] # first two rows
b[slice] = a[slice] * 2
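For completeness, here is a minimal end-to-end check of the fix (the session numbers are illustrative; the values follow from the a defined above):

In [185]: slice = np.invert(percent).nonzero()[0][:2]
In [186]: b[slice] = a[slice] * 2
In [187]: b
Out[187]:
array([[ 0,  0,  0],
       [ 8, 10, 12],
       [ 0,  0,  0],
       [20, 22, 24],
       [ 0,  0,  0]])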

Related

Splitting a numpy array at specific locations

Hey, so I basically have a problem like this:
I have a numpy array which contains a matrix of values, for example:
Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
Using another array I want to extract specific values and sum them together. This is a bit hard to explain, so I'll just show an example:
values = np.array([3,1,3,4,2])
So this means we want the first 3 values of the first row, the first value of the second row, the first 3 values of the 3rd row, the first 4 values of the 4th row and the first 2 values of the last row, so we only want this data:
final_data = np.array([
    [3, 0, 1],
    [0],
    [0, 3, 0],
    [0, 0, 0, 6],
    [5, 1]])
Then we want the sum of those values; in this case the sum is 19.
Is there an easy way to do this? Also, the data isn't always the same size, so I can't have any fixed variables.
An even better answer:
Data[np.arange(Data.shape[1])<values[:,None]].sum()
You can try:
sum([Data[i, :j].sum() for i, j in enumerate(values)])
You can accomplish this with advanced indexing. The advanced coordinates can be calculated separately before pulling them from the array.
Explicitly:
Data = np.array([
[3, 0, 1, 5],
[0, 0, 0, 7],
[0, 3, 0, 0],
[0, 0, 0, 6],
[5, 1, 0, 0]])
values = np.array([3,1,3,4,2])
X = [0,0,0,1,2,2,2,3,3,3,3,4,4]
Y = [0,1,2,0,0,1,2,0,1,2,3,0,1]
Data[X,Y]
Notice that X repeats each row index once for every element taken from that row, and Y gives the column to access at each of those positions. Both can be calculated from values directly:
X = np.concatenate([[n]*i for n,i in enumerate(values)])
Y = np.concatenate([np.arange(i) for i in values])
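As a quick cross-check (a sketch using the example data from the question), all three approaches agree on 19:

import numpy as np

Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3, 1, 3, 4, 2])

# Boolean mask: keep column j of row i only where j < values[i]
print(Data[np.arange(Data.shape[1]) < values[:, None]].sum())  # 19

# Python-level loop over rows
print(sum(Data[i, :j].sum() for i, j in enumerate(values)))    # 19

# Advanced indexing with explicit coordinates
X = np.concatenate([[n] * i for n, i in enumerate(values)])
Y = np.concatenate([np.arange(i) for i in values])
print(Data[X, Y].sum())                                        # 19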

compare large sets of arrays

I have a numpy array A of n 1x3 arrays where n is the total number of possible combinations of elements in the 1x3 arrays, where each element ranges from 0 to 50. That is,
A = [[0,0,0],[0,0,1]...[0,1,0]...[50,50,50]]
and
len(A) = 50*50*50 = 125000
I have a numpy array B of m 1x3 arrays where m = 10 million, and the arrays can have values belonging to the set described by A.
I want to count up how many of each combination is present in B, that is, how many times [0,0,0] appears in B, how many times [0,0,1] appears...how many times [50,50,50] appears. So far I have the following:
for i in range(len(A)):
    for j in range(len(B)):
        if np.array_equal(A[i], B[j]):
            y[i] += 1
where y keeps track of how many times the ith array occurs. So, y[0] is how many times [0,0,0] appeared in B, y[1] is how many times [0,0,1] appeared... y[124999] is how many times [50,50,50] appeared, etc.
The problem is this takes forever. It has to check 10 million entries, 125000 times. Is there a quicker and more efficient way to do this?
Here is a fast approach. It processes 10 million tuples out of range(50)^3 in a fraction of a second and is roughly 100 times faster than the next best solution (@Primusa's):
It uses the fact that there is a straightforward translation between such tuples and the numbers 0 through 50^3 - 1. (The mapping happens to be the same as the one between the rows of your A and the row numbers.) The functions np.ravel_multi_index and np.unravel_index implement this translation and its inverse.
Once B is translated into numbers, their frequencies can be determined very efficiently using np.bincount. Below I reshape the result to get a 50x50x50 histogram, but that is just a matter of taste and can be left out. (I have taken the liberty of using only the numbers 0 through 49, so len(A) is 125000):
>>> B = np.random.randint(0, 50, (10000000, 3))
>>> Br = np.ravel_multi_index(B.T, (50, 50, 50))
>>> result = np.bincount(Br, minlength=125000).reshape(50, 50, 50)
Let's look at a smaller example for demonstration:
>>> B = np.random.randint(0, 3, (10, 3))
>>> Br = np.ravel_multi_index(B.T, (3, 3, 3))
>>> result = np.bincount(Br, minlength=27).reshape(3, 3, 3)
>>>
>>> B
array([[1, 1, 2],
       [2, 1, 2],
       [2, 0, 0],
       [2, 1, 0],
       [2, 0, 2],
       [0, 0, 2],
       [0, 0, 2],
       [0, 2, 2],
       [2, 0, 0],
       [0, 2, 0]])
>>> result
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 0, 1]],

       [[0, 0, 0],
        [0, 0, 1],
        [0, 0, 0]],

       [[2, 0, 1],
        [1, 0, 1],
        [0, 0, 0]]])
To query for example how many times [2,1,0] is in B one would do
>>> result[2,1,0]
1
As mentioned above: To convert between indices into your A and the actual rows of A (which are the indices into my result), np.ravel_multi_index and np.unravel_index can be used. Or you can leave out the last reshape (i.e. use result = np.bincount(Br, minlength=125000)); then the counts are indexed exactly the same as A.
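For example, without the reshape a tuple query goes through np.ravel_multi_index itself (shown on the small 3x3x3 example above):

>>> counts = np.bincount(Br, minlength=27)               # flat counts, no reshape
>>> counts[np.ravel_multi_index((2, 1, 0), (3, 3, 3))]   # same as result[2,1,0]
1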
You can use a dict to speed this up so that you only go through the 10 million entries once.
The first thing you want to do is change all the sublists in A to hashable objects so you can use them as keys in a dict.
Converting all the sublists to tuples:
A = [tuple(i) for i in A]
Then create a dict() with every value in A as the key and the value being 0.
d = {i:0 for i in A}
Now for each subarray in your numpy array, you just want to convert it to a tuple and increment d[that array] by 1
for subarray in B:
    d[tuple(subarray)] += 1
d is now a dictionary where, for each key, the value is how many times that key occurred in B.
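Equivalently (a sketch, not part of the original answer), collections.Counter does the counting in one pass without pre-seeding the keys:

from collections import Counter

counts = Counter(map(tuple, B))  # maps tuple(row) -> number of occurrences in B
counts[(0, 0, 1)]                # returns 0 if the combination never appears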
You can find the unique rows and their counts in array B by calling np.unique over its first axis with return_counts=True. Then you can use broadcasting to find the indices of B's unique rows in A by calling the ndarray.all and ndarray.any methods on the proper axes. Then all you need is simple indexing:
In [82]: unique, counts = np.unique(B, axis=0, return_counts=True)
In [83]: indices = np.where((unique == A[:,None,:]).all(axis=2).any(axis=0))[0]
# Get items from A that exist in B
In [84]: unique[indices]
# Get the counts
In [85]: counts[indices]
Example:
In [86]: arr = np.array([[2 ,3, 4], [5, 6, 0], [2, 3, 4], [1, 0, 4], [3, 3, 3], [5, 6, 0], [2, 3, 4]])
In [87]: a = np.array([[2, 3, 4], [1, 9, 5], [3, 3, 3]])
In [88]: unique, counts = np.unique(arr, axis=0, return_counts=True)
In [89]: indices = np.where((unique == a[:,None,:]).all(axis=2).any(axis=0))[0]
In [90]: unique[indices]
Out[90]:
array([[2, 3, 4],
       [3, 3, 3]])
In [91]: counts[indices]
Out[91]: array([3, 1])
You can do this
y=[np.where(np.all(B==arr,axis=1))[0].shape[0] for arr in A]
arr iterates over A; np.all checks which rows of B match it; np.where returns the positions of those matches as an array; and shape then gives the length of that array, in other words the desired frequency.

vectorize upper level of a vectorized code - python - numpy [duplicate]

I have an array X:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
And I wish to find the index of the row of several values in this array:
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
For this example I would like a result like:
[0,3,4]
I have a code doing this, but I think it is overly complicated:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
result = []
for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
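To see why this works, here are the intermediate shapes for the example arrays (a quick walkthrough, not part of the original answer):

mask = (X == searched_values[:, None])  # shape (3, 5, 2): each searched row vs each row of X
rows = mask.all(-1)                     # shape (3, 5): True where an entire row matches
np.where(rows)[1]                       # -> array([0, 3, 4]), the matching row numbers in X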
Approach #2
A memory-efficient approach would be to convert each row to its linear index equivalent and then use np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),
                       np.ravel_multi_index(searched_values.T,dims)))[0]
Approach #3
Another memory-efficient approach uses np.searchsorted, with that same philosophy of converting to linear index equivalents, like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
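If that assumption might not hold, a guarded variant (a sketch; the -1 sentinel is my choice, not part of the original answer) validates each candidate before trusting it:

pos = np.searchsorted(X1D, searched_valuesID, sorter=sidx)
pos[pos == len(X1D)] = 0                # clamp positions that fell off the end
cand = sidx[pos]                        # candidate row numbers in X
valid = X1D[cand] == searched_valuesID  # True only where the row really matches
out = np.where(valid, cand, -1)         # -1 flags searched rows absent from X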
How does np.ravel_multi_index work?
This function gives us linear index equivalents. It accepts a 2D array of n-dimensional indices, set as columns, along with the shape of the n-dimensional grid onto which those indices are to be mapped, and computes the equivalent linear indices.
Let's use the inputs we have for the problem at hand. Take the case of input X and note its first row. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index assumes each column is one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row of X is 2 in this case, the n-dimensional grid to be mapped onto is 2D. With 3 elements per row in X, it would have been a 3D grid for mapping, and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This matches the row-major (C-order) arithmetic: on a grid with 7 columns, the linear index of (4, 2) is 4*7 + 2 = 30.
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
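The same numbers fall out of the row-major formula directly (a one-line check):

X[:, 0] * dims[1] + X[:, 1]   # -> array([30, 66, 61, 24, 41])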
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind considering each row of X as an indexing tuple of an n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
Now, as discussed in the previous section, we are considering each row as an indexing tuple. Within each such tuple, the first element represents the first axis of the n-dim grid, the second element the second axis, and so on until the last element of each row in X. In essence, each column represents one dimension or axis of the grid. If we are to map all elements of X onto the same n-dim grid, we need to consider the maximum stretch of each axis of such a proposed grid. Assuming we are dealing with non-negative numbers in X, that stretch is the maximum of each column in X + 1. That + 1 is because Python follows 0-based indexing. So, for example, X[1,0] == 9 maps to the 10th row of the proposed grid, and X[4,1] == 6 goes to its 7th column.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of shape at least (10,7) for our sample case. Greater lengths along the dimensions won't hurt and would give us unique linear indices too.
Concluding remarks: One important thing to note here is that if we have negative numbers in X, we need to add proper offsets along each column of X to make those indexing tuples non-negative before using np.ravel_multi_index.
Another alternative is to use asvoid (below) to view each row as a single
value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed which treat
    entire rows as one value.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        """ Care needs to be taken here since
        np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
        Adding 0. converts -0. to 0.
        """
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that using the 'missing' kwarg, you have full control over the behavior of missing items, and it works for nd-arrays (e.g. stacks of images) as well.
Update: using the same shapes as @Rik (X = [520000, 28, 28] and searched_values = [20000, 28, 28]), it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well, using numpy and hashlib. It can handle large matrices or images in seconds. I used it on a 520000 x (28 x 28) array and a 20000 x (28 x 28) array in 2 seconds on my CPU.
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
#hash using sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
##Use unique to get unique indices of the in1d results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
##Compute unique indices by indexing an array of indices
idx=np.array(range(len(xhash)))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
 [3 3]
 [4 2]]
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
S = np.array([[4, 2],
              [3, 3],
              [5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use the cdist function from scipy.spatial.distance, like this:
from scipy.spatial.distance import cdist

np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get the row numbers of X which have distance zero to a row in searched_values, meaning they are equal. This makes sense if you look at rows as coordinates.
I had a similar requirement and the following worked for me:
np.argwhere(np.isin(X, searched_values).all(axis=1))
Be aware that np.isin tests element membership rather than whole-row equality, so this can report false positives when each element of a row appears somewhere in searched_values without the row itself being searched for.
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
    equals = [np.equal(orig, p).all(1) for p in search]
    exists = np.max(equals, axis=1)
    indices = np.argmax(equals, axis=1)
    indices[~exists] = -1
    return indices
test:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6],
                            [0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]


2D array indexing

How can I do indexing using arrays as indices? I have the following 2D array of six index pairs-
array([[2, 0],
       [3, 0],
       [3, 1],
       [5, 0],
       [5, 1],
       [5, 2]])
I want to use these pairs as indices and put the value 10 at the corresponding positions of a new zero-filled matrix. The output should look like this-
array([[ 0,  0,  0],
       [ 0,  0,  0],
       [10,  0,  0],
       [10, 10,  0],
       [ 0,  0,  0],
       [10, 10, 10]])
So far I have tried this-
from numpy import *
a = array([[2,0],[3,0],[3,1],[5,0],[5,1],[5,2]])
b = zeros((6,3),dtype ='int32')
b[a] = 10
But this gives me the wrong output.
In [1]: import numpy as np
In [2]: a = np.array([[2,0],[3,0],[3,1],[5,0],[5,1],[5,2]])
In [3]: b = np.zeros((6,3), dtype='int32')
In [4]: b[a[:,0], a[:,1]] = 10
In [5]: b
Out[5]:
array([[ 0,  0,  0],
       [ 0,  0,  0],
       [10,  0,  0],
       [10, 10,  0],
       [ 0,  0,  0],
       [10, 10, 10]])
Why it works:
If you index b with two numpy arrays in an assignment,
b[x, y] = z
then think of NumPy as moving simultaneously over each element of x, each element of y and each element of z (let's call them xval, yval and zval), and assigning to b[xval, yval] the value zval. When z is a constant, "moving over z" just returns the same value each time.
That's what we want, with x being the first column of a and y being the second column of a. Thus, choose x = a[:, 0], and y = a[:, 1].
b[a[:,0], a[:,1]] = 10
Why b[a] = 10 does not work
When you write b[a], think of NumPy as creating a new array by moving over each element of a (let's call each one idx) and placing in the new array the value of b[idx] at the location of idx in a.
idx is a value in a. So it is an int32. b is of shape (6,3), so b[idx] is a row of b of shape (3,). For example, when idx is
In [37]: a[1,1]
Out[37]: 0
b[a[1,1]] is
In [38]: b[a[1,1]]
Out[38]: array([0, 0, 0])
So
In [33]: b[a].shape
Out[33]: (6, 2, 3)
So let's repeat: NumPy is creating a new array by moving over each element of a and placing in the new array the value of b[idx] at the location of idx in a. As idx moves over a, an array of shape (6,2) would be created. But since b[idx] is itself of shape (3,), at each location in the (6,2)-shaped array, a (3,)-shaped value is being placed. The result is an array of shape (6,2,3).
Now, when you make an assignment like
b[a] = 10
a temporary array of shape (6,2,3) with values b[a] is created, then the assignment is performed. Since 10 is a constant, this assignment places the value 10 at each location in the (6,2,3)-shaped array.
Then the values from the temporary array are reassigned back to b.
See the NumPy documentation on assigning values to indexed arrays. Thus the values in the (6,2,3)-shaped array are copied back to the (6,3)-shaped b array, overwriting each other along the way. But the main point is that you do not obtain the assignments you desire.
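As a footnote, a tidier spelling of the working assignment (a sketch, equivalent to the indexing above for a two-column index array): converting a to a tuple of columns lets NumPy pair up the coordinates for you.

b[:] = 0             # reset the demo array
b[tuple(a.T)] = 10   # same as b[a[:,0], a[:,1]] = 10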
