I have two numpy arrays:
A.shape = (nA,x,y)
and
B.shape = (nB,x,y).
I want to find all subarrays such that
A[i,:,:] == B[j,:,:].
I know I can write a double for loop and use
np.array_equal(A[i,:,:], B[j,:,:])
However, is there a more efficient method?
You only need an explicit loop over one of the arrays, and you can write it as a simple list comprehension. Note that for NumPy arrays, x in B is not a whole-subarray test (it is true as soon as any single element matches), so compare with np.array_equal instead:
subarrays = [x for x in A if any(np.array_equal(x, b) for b in B)]
If you only want the indices instead of storing the whole subarray, you can do:
indices = [i for i, x in enumerate(A) if any(np.array_equal(x, b) for b in B)]
Utilizing Steven Rouk's solution, here is a method to get the indices for the subarrays that are equal:
indicesForMatches = [(i,j) for i,subArrayOfA in enumerate(A) for j,subArrayOfB in enumerate(B) if np.array_equal(subArrayOfA,subArrayOfB)]
You can use NumPy broadcasting for a vectorized solution, like so -
mask = ((A[:,None,:,:] == B).all(2)).all(2)
A_idx,B_idx = np.where(mask)
You can use reshaping to avoid double .all() usages and get the mask, like so -
mask = (A.reshape(A.shape[0],1,-1) == B.reshape(B.shape[0],-1)).all(-1)
Sample run -
In [41]: # Setup input arrays and force some indices to be same between A and B
...: nA = 4
...: nB = 5
...: x = 3
...: y = 2
...:
...: A = np.random.randint(0,9,(nA,x,y))
...: B = np.random.randint(0,9,(nB,x,y))
...:
...: A[2,:,:] = B[1,:,:]
...: A[3,:,:] = B[4,:,:]
...:
In [42]: mask = ((A[:,None,:,:] == B).all(2)).all(2)
...: A_idx,B_idx = np.where(mask)
...:
In [43]: A_idx, B_idx
Out[43]: (array([2, 3]), array([1, 4]))
In [44]: mask = (A.reshape(A.shape[0],1,-1) == B.reshape(B.shape[0],-1)).all(-1)
...: A_idx,B_idx = np.where(mask)
...:
In [45]: A_idx, B_idx
Out[45]: (array([2, 3]), array([1, 4]))
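For anyone who wants to try the reshaping variant without random inputs, here is a minimal self-contained sketch. The helper name matching_pairs and the fixed test arrays are illustrative assumptions, not part of the answer above. Keep in mind that the intermediate comparison array holds nA*nB*x*y booleans, so memory may become the limiting factor for very large inputs.

```python
import numpy as np

def matching_pairs(A, B):
    """Return index arrays (i, j) such that A[i] equals B[j] elementwise.

    Hypothetical helper: flattens each subarray to a row and compares
    every row of A against every row of B via broadcasting.
    """
    A2 = A.reshape(A.shape[0], 1, -1)   # shape (nA, 1, x*y)
    B2 = B.reshape(1, B.shape[0], -1)   # shape (1, nB, x*y)
    mask = (A2 == B2).all(axis=-1)      # shape (nA, nB)
    return np.nonzero(mask)

# Deterministic inputs: only A[2]/B[1] and A[3]/B[4] are made equal.
A = np.arange(24).reshape(4, 3, 2)
B = np.full((5, 3, 2), -1)
B[1] = A[2]
B[4] = A[3]

a_idx, b_idx = matching_pairs(A, B)
print(a_idx, b_idx)   # [2 3] [1 4]
```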
I am trying to create permutations of size 4 from a group of real numbers. After that, I'd like to know the position of the first element in a permutation after I sort it. Here is what I have tried so far. What's the best way to do this?
import numpy as np
from itertools import chain, permutations
N_PLAYERS = 4
N_STATES = 60
np.random.seed(0)
state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
perms = permutations(state_space, N_PLAYERS)
perms_arr = np.fromiter(chain(*perms),dtype=np.float16)
def loc(row):
    return np.where(np.argsort(row) == 0)[0].tolist()[0]
locs = np.apply_along_axis(loc, 0, perms)
In [153]: N_PLAYERS = 4
...: N_STATES = 60
...: np.random.seed(0)
...: state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
...: perms = itertools.permutations(state_space, N_PLAYERS)
In [154]: alist = list(perms)
In [155]: len(alist)
Out[155]: 11703240
Simply making a list from the permutations produces a list of tuples, each of length N_PLAYERS.
Making an array from that with chain flattens it:
In [156]: perms = itertools.permutations(state_space, N_PLAYERS)
In [158]: perms_arr = np.fromiter(itertools.chain(*perms),dtype=np.float16)
In [159]: perms_arr.shape
Out[159]: (46812960,)
In [160]: alist[0]
Which could be reshaped to (11703240,4).
Using apply on that 1d array doesn't work (or make sense):
In [170]: perms_arr.shape
Out[170]: (46812960,)
In [171]: locs = np.apply_along_axis(loc, 0, perms_arr)
In [172]: locs.shape
Out[172]: ()
Reshape to 4 columns:
In [173]: locs = np.apply_along_axis(loc, 0, perms_arr.reshape(-1,4))
In [174]: locs.shape
Out[174]: (4,)
In [175]: locs
Out[175]: array([ 0, 195054, 578037, 769366])
This applies loc to each column, returning one value for each. But loc has a row variable. Is that supposed to be significant?
I could switch the axis; this takes much longer, and al
In [176]: locs = np.apply_along_axis(loc, 1, perms_arr.reshape(-1,4))
In [177]: locs.shape
Out[177]: (11703240,)
list comprehension
This iteration does the same thing as your apply_along_axis, and I expect it is faster (though I haven't timed it, since both are too slow to time comfortably).
In [188]: locs1 = np.array([loc(row) for row in perms_arr.reshape(-1,4)])
In [189]: np.allclose(locs, locs1)
Out[189]: True
whole array sort
But argsort takes an axis, so I can sort all rows at once (instead of iterating):
In [185]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=1)==0)
Out[185]:
(array([ 0, 1, 2, ..., 11703237, 11703238, 11703239]),
array([0, 0, 0, ..., 3, 3, 3]))
In [186]: np.allclose(_[1],locs)
Out[186]: True
Or going the other direction (compare with Out[175]):
In [187]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=0)==0)
Out[187]: (array([ 0, 195054, 578037, 769366]), array([0, 1, 2, 3]))
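The row-wise argsort idea can be tried on a much smaller problem. The sketch below is only an illustration under assumed sizes (6 states, permutations of length 3) so that it runs instantly. It also shows a second route that is not in the answer above: because the values within a permutation are distinct, the sorted position of the first element equals the number of elements in the row that are smaller than it.

```python
import numpy as np
from itertools import permutations

# Assumed small sizes, chosen only to keep the example fast.
state_space = np.linspace(0.0, 1.0, num=6)
perms_arr = np.array(list(permutations(state_space, 3)))   # shape (120, 3)

# Whole-array sort: argsort every row at once, then find where the
# original column 0 ended up in each row's sorted order.
order = np.argsort(perms_arr, axis=1)
locs = np.nonzero(order == 0)[1]

# Alternative (an assumption of this sketch, valid for distinct values):
# count how many entries in each row are smaller than the first entry.
locs_alt = (perms_arr < perms_arr[:, :1]).sum(axis=1)

print(perms_arr.shape)                 # (120, 3)
print(np.array_equal(locs, locs_alt))  # True
```

The counting variant avoids the sort entirely, which may matter at the original problem size, but I have not timed it there.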
Use the following code to illustrate my question.
import numpy as np
np.random.seed(200)
a = np.array([1,21,6,41,8]) # given an array with 5 elements
idx = np.random.choice(5, 3, replace=False) # randomly select 3 indexes between 0 and 4
idx.sort() # sort indexes
print(idx) # [0 3 4]
print(a[idx]) # get random selected subset using the indexes, [ 1 41 8]
How to get the remaining indexes [1,2]?
In [123]: np.random.seed(200)
...: a = np.array([1,21,6,41,8]) # given an array with 5 elements
...: idx = np.random.choice(5, 3, replace=False) # randomly select 3 indexes between 0 and 4
...: idx.sort() # sort indexes
In [124]: idx
Out[124]: array([0, 3, 4])
In [125]: a[idx]
Out[125]: array([ 1, 41, 8])
We could make a boolean mask, and find the True indices:
In [126]: mask = np.ones(a.shape, bool)
In [127]: mask[idx]=False
In [128]: mask
Out[128]: array([False, True, True, False, False])
In [129]: np.nonzero(mask)[0]
Out[129]: array([1, 2])
In [131]: np.arange(a.shape[0])[mask]
Out[131]: array([1, 2])
np.delete does this same sort of masking:
In [132]: np.delete(np.arange(a.shape[0]), idx)
Out[132]: array([1, 2])
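Put together as a script, the masking approach might look like the sketch below. The indices are hard-coded to the [0, 3, 4] from the session above rather than drawn at random, and the np.setdiff1d line is an extra alternative that the session does not use.

```python
import numpy as np

a = np.array([1, 21, 6, 41, 8])
idx = np.array([0, 3, 4])            # the selected indexes from the session

# Boolean mask: start with everything selected, then switch off idx.
mask = np.ones(a.shape[0], dtype=bool)
mask[idx] = False
remaining = np.flatnonzero(mask)

# Alternative: set difference between all positions and idx.
remaining_alt = np.setdiff1d(np.arange(a.shape[0]), idx)

print(remaining)       # [1 2]
print(a[remaining])    # [21  6]
```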
One way to do it:
inverted_idx = [x not in idx for x in range(0, len(a))]
print(a[inverted_idx])
Result:
[21 6]
That creates a boolean mask. If you prefer a list of integer indices, like the one you had:
inverted_idx = [x for x in range(0, len(a)) if x not in idx]
print(a[inverted_idx])
Very similar to https://math.stackexchange.com/q/3615927/419686, but different.
I have 2 arrays (A with shape (5,2,3) and B with shape (6,3,8)), and I want to perform some kind of multiplication that yields a new array with shape (5,6,2,8).
Python code:
import numpy as np
np.random.seed(1)
A = np.random.randint(0, 10, size=(5,2,3))
B = np.random.randint(0, 10, size=(6,3,8))
C = np.zeros((5,6,2,8))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i,j] = A[i].dot(B[j])
Is it possible to do the above operation without using a loop?
In [52]: np.random.seed(1)
...: A = np.random.randint(0, 10, size=(5,2,3))
...: B = np.random.randint(0, 10, size=(6,3,8))
...:
...: C = np.zeros((5,6,2,8))
...: for i in range(A.shape[0]):
...:     for j in range(B.shape[0]):
...:         C[i,j] = A[i].dot(B[j])
...:
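np.dot handles the outer dimensions for us, pairing every A[i] with every B[j] (an outer-product-style combination rather than broadcasting in the matmul sense):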
In [53]: D=np.dot(A,B)
In [54]: C.shape
Out[54]: (5, 6, 2, 8)
In [55]: D.shape
Out[55]: (5, 2, 6, 8)
The axes order is different, but we can easily change that:
In [56]: np.allclose(C, D.transpose(0,2,1,3))
Out[56]: True
In [57]: np.allclose(C, np.swapaxes(D,1,2))
Out[57]: True
From the np.dot docs:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
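To see what that index formula means in practice, the sketch below checks a single entry by hand and then the axis swap. The random generator, the seed, and the chosen index (4, 1, 5, 7) are arbitrary assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 10, size=(5, 2, 3))
B = rng.integers(0, 10, size=(6, 3, 8))

D = np.dot(A, B)          # shape (5, 2, 6, 8): axes are (i, j, k, m)

# One entry, computed directly from the documented formula.
i, j, k, m = 4, 1, 5, 7
entry = np.sum(A[i, j, :] * B[k, :, m])
print(D[i, j, k, m] == entry)   # True

# Moving axis k in front of axis j gives the (5, 6, 2, 8) layout,
# where C[i, k] is the matrix product A[i] @ B[k].
C = D.transpose(0, 2, 1, 3)
print(np.array_equal(C[i, k], A[i] @ B[k]))   # True
```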
Use np.einsum, which is very powerful:
C = np.einsum('aij, bjk -> abik', A, B)
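As a quick check, the einsum call can be compared against the explicit loop from the question. The sketch also includes a broadcasting matmul variant (the @ operator with inserted length-1 axes), which is an addition of this example and not something proposed in the answers above.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 10, size=(5, 2, 3))
B = rng.integers(0, 10, size=(6, 3, 8))

# Reference result from the double loop.
C_loop = np.zeros((5, 6, 2, 8), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C_loop[i, j] = A[i].dot(B[j])

# einsum: a and b are the two outer axes, j is the summed axis.
C_einsum = np.einsum('aij,bjk->abik', A, B)

# matmul broadcasts the leading axes: (5,1,2,3) @ (1,6,3,8) -> (5,6,2,8).
C_matmul = A[:, None] @ B[None]

print(np.array_equal(C_loop, C_einsum))   # True
print(np.array_equal(C_loop, C_matmul))   # True
```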
Let's say I have a simple array:
a = np.arange(3)
And an array of indices with the same length:
I = np.array([0, 0, 1])
I now want to group the values based on the indices.
How would I group the elements of the first array to produce the result below?
np.array([[0, 1], [2]], dtype=object)
Here is what I tried:
a = np.arange(3)
I = np.array([0, 0, 1])
out = np.empty(2, dtype=object)
out.fill([])
aslists = np.vectorize(lambda x: [x], otypes=['object'])
out[I] += aslists(a)
However, this approach does not concatenate the lists, but only maintains the last value for each index:
array([[1], [2]], dtype=object)
Or, for a 2-dimensional case:
a = np.random.rand(100)
I = (np.random.random(100) * 5 //1).astype(int)
J = (np.random.random(100) * 5 //1).astype(int)
out = np.empty((5, 5), dtype=object)
out.fill([])
How can I append the items from a to out based on the two index arrays?
1D Case
Assuming I is sorted, for a list of arrays as output -
idx = np.unique(I, return_index=True)[1]
out = np.split(a,idx)[1:]
Another with slicing to get idx for splitting a -
out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
To get an array of lists as output -
np.array([i.tolist() for i in out], dtype=object) # dtype=object is required for ragged lists in recent NumPy versions
Sample run -
In [84]: a = np.arange(3)
In [85]: I = np.array([0, 0, 1])
In [86]: out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
In [87]: out
Out[87]: [array([0, 1]), array([2])]
In [88]: np.array([i.tolist() for i in out], dtype=object)
Out[88]: array([[0, 1], [2]], dtype=object)
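If I is not already sorted, one possible extension is to sort both arrays by I first and then split as above. The function name group_by_index and the sample values below are hypothetical, and the stable sort is a choice made here so that elements keep their original order within each group.

```python
import numpy as np

def group_by_index(values, labels):
    """Split values into one array per distinct label.

    Hypothetical helper: groups come out in ascending label order.
    """
    order = np.argsort(labels, kind="stable")    # stable keeps in-group order
    sorted_labels = labels[order]
    # Positions where the label changes mark the group boundaries.
    boundaries = np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1
    return np.split(values[order], boundaries)

values = np.array([10, 20, 30, 40])
labels = np.array([1, 0, 1, 0])      # deliberately unsorted

groups = group_by_index(values, labels)
print([g.tolist() for g in groups])   # [[20, 40], [10, 30]]
```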
2D Case
For 2D case of filling into a 2D array with groupings made from indices in two arrays I and J that represent the rows and columns where the groups are to be assigned, we could do something like this -
ncols = 5
lidx = I*ncols+J
sidx = lidx.argsort() # Use kind='mergesort' to keep order
lidx_sorted = lidx[sidx]
unq_idx, split_idx = np.unique(lidx_sorted, return_index=True)
out.flat[unq_idx] = np.split(a[sidx], split_idx)[1:]
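A self-contained version of the 2D case might look like the sketch below. It makes a few assumptions that differ from the snippet above: each cell gets its own empty list (out.fill([]) would share one list object across all cells), the groups are written back one cell at a time in a plain loop so that NumPy never has to convert a ragged list of arrays, and the random generator and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
nrows, ncols = 5, 5
a = rng.random(100)
I = rng.integers(0, nrows, size=100)   # row index for each value
J = rng.integers(0, ncols, size=100)   # column index for each value

# Object array where every cell holds its own (distinct) empty list.
out = np.empty((nrows, ncols), dtype=object)
for r, c in np.ndindex(out.shape):
    out[r, c] = []

lidx = I * ncols + J                       # flattened cell index per value
sidx = lidx.argsort(kind="stable")         # stable sort keeps original order
lidx_sorted = lidx[sidx]
unq_idx, split_idx = np.unique(lidx_sorted, return_index=True)

# split_idx[0] is always 0, so skip it to avoid an empty leading group.
groups = np.split(a[sidx], split_idx[1:])
for flat_pos, group in zip(unq_idx, groups):
    r, c = divmod(int(flat_pos), ncols)
    out[r, c] = group.tolist()

print(sum(len(cell) for cell in out.ravel()))   # 100
```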
Given a matrix like the following:
A = np.array([[1,2,3],
[3,4,5],
[4,5,6]])
How can I pinpoint the index of an element of interest? For example, assume I would like to find the index of 2 in the first row of the np.array, like so:
A[0,:].index(2), but clearly this does not work because A[0,:] is not a list.
You can compare the array to the value 2, and then use where.
For example, to find the location of 2 in the first row of A:
In [179]: np.where(A[0, :] == 2)[0]
Out[179]: array([1])
In [180]: j = np.where(A[0, :] == 2)[0]
In [181]: A[0, j]
Out[181]: array([2])
where also works with higher-dimensional arrays. For example, to find 2 in the full array A:
In [182]: i, j = np.where(A == 2)
In [183]: A[i,j]
Out[183]: array([2])
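One thing to keep in mind is that where returns every match, and an empty array when there is none, so it is worth handling both cases. The sketch below uses the same A; the searched values 4 and 7 are picked only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 4, 5],
              [4, 5, 6]])

# Several matches: 4 occurs at (1, 1) and (2, 0).
rows, cols = np.where(A == 4)
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)                      # [(1, 1), (2, 0)]

# np.argwhere returns the same positions as one (n_matches, 2) array.
print(np.argwhere(A == 4).tolist())   # [[1, 1], [2, 0]]

# No match: the result is empty, so check the size before indexing into it.
hits = np.where(A[0, :] == 7)[0]
first = int(hits[0]) if hits.size else None
print(first)                      # None
```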