Rearranging 2D numpy array by 2D row and column arrays - python

I have tried to find a similar question but so far it seems only half my question can be answered.
I have a 2D numpy array, e.g.:
a= np.array([[6, 4, 5],
[4, 7, 8],
[2, 8, 9]])
I also have two further numpy arrays that give, for each position, the row and the column of the element that should end up there (or the position itself if nothing should change):
rows= np.array([[0, 0, 0],
[1, 0, 1],
[2, 2, 2]])
cols= np.array([[0, 1, 2],
[0, 0, 2],
[0, 1, 2]])
Now I would like to rearrange the array "a" based on these indices, so that the result is:
result= np.array([[6, 4, 5],
[4, 6, 8],
[2, 8, 9]])
Doing this for columns only, or for rows only, is easy (see this thread), e.g.:
np.array(list(map(lambda x, y: y[x], cols, a)))

This is a typical case of fancy/array indexing:
result = a[rows, cols]
Output:
array([[6, 4, 5],
[4, 6, 8],
[2, 8, 9]])
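To make the pairing explicit: with two index arrays of the same shape, each output element is result[i, j] = a[rows[i, j], cols[i, j]]. The following minimal sketch checks the indexed result against a plain double loop; the helper name gather_loop is made up for illustration.

```python
import numpy as np

a = np.array([[6, 4, 5],
              [4, 7, 8],
              [2, 8, 9]])
rows = np.array([[0, 0, 0],
                 [1, 0, 1],
                 [2, 2, 2]])
cols = np.array([[0, 1, 2],
                 [0, 0, 2],
                 [0, 1, 2]])


def gather_loop(a, rows, cols):
    """Hypothetical reference: out[i, j] = a[rows[i, j], cols[i, j]]."""
    out = np.empty(rows.shape, dtype=a.dtype)
    for i in range(rows.shape[0]):
        for j in range(rows.shape[1]):
            out[i, j] = a[rows[i, j], cols[i, j]]
    return out


# Advanced indexing pairs the two index arrays element by element.
result = a[rows, cols]
print(result)
print(np.array_equal(result, gather_loop(a, rows, cols)))
```

The loop is only there to show what the one-liner computes; the indexed form is the one to use in practice.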

Related

Given indexes, get values from numpy matrix

Let's say I have this numpy matrix:
>>> mat = np.matrix([[3,4,5,2,1], [1,2,7,6,5], [8,9,4,5,2]])
>>> mat
matrix([[3, 4, 5, 2, 1],
[1, 2, 7, 6, 5],
[8, 9, 4, 5, 2]])
Now let's say I have some indexes in this form:
>>> ind = np.matrix([[0,2,3], [0,4,2], [3,1,2]])
>>> ind
matrix([[0, 2, 3],
[0, 4, 2],
[3, 1, 2]])
What I would like to do is to get three values from each row of the matrix, specifically values at columns 0, 2, and 3 for the first row, values at columns 0, 4 and 2 for the second row, etc. This is the expected output:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
I've tried using np.take but it doesn't seem to work. Any suggestion?
This is take_along_axis.
>>> np.take_along_axis(mat, ind, axis=1)
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
This will do it: mat[np.arange(3).reshape(-1, 1), ind]
In [245]: mat[np.arange(3).reshape(-1, 1), ind]
Out[245]:
matrix([[3, 5, 2],
[1, 5, 7],
[5, 9, 4]])
(but take_along_axis in @user3483203's answer is simpler).
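As a side-by-side check, here is a small sketch of both approaches on the same data, assuming plain ndarrays rather than np.matrix. The variable names row_idx, via_take and via_index are only illustrative.

```python
import numpy as np

mat = np.array([[3, 4, 5, 2, 1],
                [1, 2, 7, 6, 5],
                [8, 9, 4, 5, 2]])
ind = np.array([[0, 2, 3],
                [0, 4, 2],
                [3, 1, 2]])

# Approach 1: take_along_axis picks ind[i, j] from row i.
via_take = np.take_along_axis(mat, ind, axis=1)

# Approach 2: a column vector of row numbers broadcasts against ind,
# so element (i, j) is mat[i, ind[i, j]].
row_idx = np.arange(mat.shape[0])[:, None]
via_index = mat[row_idx, ind]

print(via_take)
print(np.array_equal(via_take, via_index))
```

Using mat.shape[0] instead of a hard-coded 3 keeps the second form valid for any number of rows.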

efficient per column matrix indexing in numpy

I have two matrices of the same size, A and B. I want to use the columns of B to access the columns of A, on a per-column basis. For example,
A = np.array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
and
B = np.array([[0, 0, 2],
[1, 2, 1],
[2, 1, 0]])
I want something like:
A[B] = [[1, 4, 9],
[2, 6, 8],
[3, 5, 7]]
I.e., I've used the j'th column of B as indices to the j'th column of A.
Is there any efficient way of doing so?
Thanks!
You can use advanced indexing:
A[B, np.arange(A.shape[1])]
array([[1, 4, 9],
[2, 6, 8],
[3, 5, 7]])
Or with np.take_along_axis:
np.take_along_axis(A, B, axis=0)
array([[1, 4, 9],
[2, 6, 8],
[3, 5, 7]])
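For the square example both A.shape[0] and A.shape[1] are 3, so the choice does not show up in the output. For a non-square A the column index needs one entry per column, i.e. np.arange(A.shape[1]). A minimal sketch with a made-up 2x3 input:

```python
import numpy as np

# Hypothetical non-square example: 2 rows, 3 columns.
A = np.array([[1, 4, 7],
              [2, 5, 8]])
B = np.array([[0, 1, 1],
              [1, 0, 0]])

# One column number per column of A; it broadcasts against B,
# so element (i, j) is A[B[i, j], j].
col_idx = np.arange(A.shape[1])
by_index = A[B, col_idx]

# take_along_axis does the same without building col_idx by hand.
by_take = np.take_along_axis(A, B, axis=0)

print(by_index)
print(np.array_equal(by_index, by_take))
```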

How to split array by indices where the splitted sub-arrays include the split point

I have a 2D array containing values and a 1D array of indices at which I would like to split the 2D array, such that the resulting sub-arrays include the split point.
I know I can use the numpy.split function to split by indices, and I know I can use stride_tricks to create consecutive overlapping subset views.
But it seems stride_tricks only applies if we want to split an array into equal-sized sub-arrays.
Minimal example, I can do the following:
>>> import numpy as np
>>> array = np.random.randint(0,10, (10,2))
>>> indices = np.array([2,3,8])
>>> array
array([[8, 1],
[1, 0],
[2, 0],
[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7],
[6, 4]])
>>> split_array = np.split(array, indices, axis=0)
>>> split_array
[array([[8, 1],
[1, 0]]),
array([[2, 0]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4]]),
array([[6, 7],
[6, 4]])]
But I'm merely looking for an option within the split function where I could define include_split_point=True, which would give me a result as such:
[array([[8, 1],
[1, 0],
[2, 0]]),
array([[2, 0],
[8, 8]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7]]),
array([[6, 7],
[6, 4]])]
Create a new array with the index elements repeated
new_indices = np.zeros(array.shape[0], dtype = int)
new_indices[indices] = 1
new_indices += 1
new_array = np.repeat(array, new_indices, axis = 0)
Update indices to account for the changed array
indices = indices + np.arange(1, len(indices)+1)
Split using the indices as usual
np.split(new_array, indices, axis = 0)
output:
[array([[8, 1],
[1, 0],
[2, 0]]),
array([[2, 0],
[8, 8]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7]]),
array([[6, 7],
[6, 4]])]
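The three steps above can be wrapped into one helper. This is only a sketch: the function name split_including_split_points is made up, and it assumes the indices are sorted, unique, and valid positions along the chosen axis.

```python
import numpy as np


def split_including_split_points(array, indices, axis=0):
    """Split `array` at `indices`, keeping each split row in both neighbours.

    Assumes `indices` is sorted and contains no duplicates.
    """
    indices = np.asarray(indices)
    # Repeat every split row twice, all other rows once.
    repeats = np.ones(array.shape[axis], dtype=int)
    repeats[indices] = 2
    repeated = np.repeat(array, repeats, axis=axis)
    # Each earlier duplicate shifts later split positions by one.
    shifted = indices + np.arange(1, len(indices) + 1)
    return np.split(repeated, shifted, axis=axis)


array = np.arange(20).reshape(10, 2)
parts = split_including_split_points(array, [2, 3, 8])
for part in parts:
    print(part)
```

Note that np.repeat copies the data, so the pieces are views of the repeated copy rather than of the original array.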

Flattening an array of matrices to a single matrix (python)

I have a list of matrices:
arr = [array([[1, 2, 3], [7, 8, 9]]), array([[4, 5, 6], [0, 0, 1]])]
I want to flatten them in the following way:
[[1, 2, 3], [7, 8, 9], [4, 5, 6], [0, 0, 1]]
numpy.flatten flattens it into a single array of numbers.
I tried this: flattened_list = [y for x in arr for y in x]
It does the job, but all rows of the matrix are numpy arrays.
Is there any way to flatten numpy arrays up to a certain depth?
You should use reshape (after converting the list to an array, since a plain list has no reshape method):
out = np.array(arr).reshape((4, 3))
What you want is the vstack function from numpy. It takes a tuple of ndarrays and returns a new ndarray which is the result of stacking them vertically with the first ndarray being on top and so on.
For example:
>>> import numpy as np
>>> a = np.array([1, 2])
>>> b = np.array([3, 4])
>>> c = np.array([5, 6])
>>> np.vstack((a, b, c))
array([[1, 2],
[3, 4],
[5, 6]])
In your case you can call the tuple function on your list of ndarrays (passing the list directly also works):
>>> arr = [np.array([[1, 2, 3], [7, 8, 9]]), np.array([[4, 5, 6], [0, 0, 1]])]
>>> np.vstack(tuple(arr))
array([[1, 2, 3],
[7, 8, 9],
[4, 5, 6],
[0, 0, 1]])
If you want your answer as a Python list, call the tolist method on the result like so:
>>> np.vstack(arr).tolist()
[[1, 2, 3], [7, 8, 9], [4, 5, 6], [0, 0, 1]]
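For comparison, here is a short sketch of three ways to get the same (4, 3) result from the list in the question. The reshape variant assumes all matrices in the list have the same shape; the variable names are only illustrative.

```python
import numpy as np

arr = [np.array([[1, 2, 3], [7, 8, 9]]),
       np.array([[4, 5, 6], [0, 0, 1]])]

# Stack the matrices on top of each other.
stacked = np.vstack(arr)

# Equivalent for 2D inputs: join along the first axis.
concatenated = np.concatenate(arr, axis=0)

# Build a (2, 2, 3) array first, then merge the two leading axes.
reshaped = np.array(arr).reshape(-1, 3)

# Convert to nested Python lists if ndarrays are not wanted.
as_list = stacked.tolist()
print(as_list)
```

np.vstack and np.concatenate also work when the matrices have different numbers of rows, as long as the column counts match.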

Remove only rows which contain duplicates within that row of 3D numpy array

I have a 3D numpy array like this:
>>> a
array([[[0, 1, 2],
[0, 1, 2],
[6, 7, 8]],
[[6, 7, 8],
[0, 1, 2],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
I want to remove only those rows which contain duplicates within themselves. For instance the output should look like this:
>>> remove_row_duplicates(a)
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
This is the function that I am using:
delindices = np.empty(0, dtype=int)
for i in range(len(a)):
    _, indices = np.unique(np.around(a[i], decimals=10), axis=0, return_index=True)
    if len(indices) < len(a[i]):
        delindices = np.append(delindices, i)
a = np.delete(a, delindices, 0)
This works perfectly, but the problem is that my array now has a shape like (1000000, 7, 3). The for loop is pretty slow in Python and this takes a lot of time. Also, my original array contains floating-point numbers. Does anyone have a better solution, or can anyone help me vectorize this function?
Sort each 2D block along axis=1, compare each row with its successor to find matching rows, and finally check along the same axis whether any such match exists:
b = np.sort(a,axis=1)
out = a[~((b[:,1:] == b[:,:-1]).all(-1)).any(1)]
Sample run with explanation
Input array :
In [51]: a
Out[51]:
array([[[0, 1, 2],
[0, 1, 2],
[6, 7, 8]],
[[6, 7, 8],
[0, 1, 2],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Code steps :
# Sort along axis=1, i.e rows in each 2D block
In [52]: b = np.sort(a,axis=1)
In [53]: b
Out[53]:
array([[[0, 1, 2],
[0, 1, 2],
[6, 7, 8]],
[[0, 1, 2],
[6, 7, 8],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
In [54]: (b[:,1:] == b[:,:-1]).all(-1) # Look for successive matching rows
Out[54]:
array([[ True, False],
[False, True],
[False, False]])
# Look for matches along each row, which indicates presence
# of duplicate rows within each 2D block of the original 3D array
In [55]: ((b[:,1:] == b[:,:-1]).all(-1)).any(1)
Out[55]: array([ True, True, False])
# Invert those as we need to remove those cases
# Finally index with boolean indexing and get the output
In [57]: a[~((b[:,1:] == b[:,:-1]).all(-1)).any(1)]
Out[57]:
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
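One thing to keep in mind with the sort-based approach: np.sort(a, axis=1) orders each column independently, so rows that were never equal can line up after sorting. The sketch below is not the method from the answer above; it compares every pair of rows in a block directly via broadcasting instead. The function name follows the question and the decimals parameter mirrors the question's np.around call. The intermediate boolean array has shape (N, m, m, k), so whether it fits in memory for very large inputs is an assumption to check.

```python
import numpy as np


def remove_row_duplicates(a, decimals=10):
    """Drop every 2D block of `a` that contains two equal rows."""
    r = np.around(a, decimals=decimals)
    # eq[n, i, j] is True when rows i and j of block n are equal.
    eq = (r[:, :, None, :] == r[:, None, :, :]).all(-1)
    # Only look at pairs with i < j (skip the diagonal and mirrored pairs).
    i, j = np.triu_indices(a.shape[1], k=1)
    has_dup = eq[:, i, j].any(axis=1)
    return a[~has_dup]


a = np.array([[[0, 1, 2], [0, 1, 2], [6, 7, 8]],
              [[6, 7, 8], [0, 1, 2], [6, 7, 8]],
              [[0, 1, 2], [3, 4, 5], [6, 7, 8]]])
out = remove_row_duplicates(a)
print(out)

# Made-up block without duplicate rows that column-wise sorting misjudges.
c = np.array([[[0, 5], [1, 4], [0, 4]]])
b = np.sort(c, axis=1)
flagged_by_sort = ((b[:, 1:] == b[:, :-1]).all(-1)).any(1)
kept_by_pairwise = remove_row_duplicates(c)
print(flagged_by_sort, kept_by_pairwise.shape)
```

On the sample array from the question both approaches agree; they differ only on inputs like the second one.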
You can probably do this easily using broadcasting, but since you're dealing with arrays of more than two dimensions it won't be as optimized as you expect, and in some cases it can even be very slow. Instead you can use the following approach, inspired by Jaime's answer:
In [28]: u = np.unique(arr.view(np.dtype((np.void, arr.dtype.itemsize*arr.shape[1])))).view(arr.dtype).reshape(-1, arr.shape[1])
In [29]: inds = np.where((arr == u).all(2).sum(0) == u.shape[1])
In [30]: arr[inds]
Out[30]:
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
