select elements of different columns at different rows of numpy array - python

In [62]: a
Out[62]:
array([[1, 2],
       [3, 4]])
Is there an easy way to get [2,3], i.e. the second element of the first row and the first element of the second row? I have the list of the indices for each row, i.e. [1,0] in this case. I have tried a[:,[1,0]], but that returns columns 1 and 0 of every row rather than one element per row.

You need to specify both i and j for all the elements you want. For example:
import numpy as np
a = np.array([[1, 2],
              [3, 4]])
i = [0, 1]
j = [1, 0]
print(a[i, j])
# [2 3]
If you need one item from each row, you can build the row index as i = np.arange(a.shape[0]).
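A minimal sketch of the one-item-per-row case, generalised to any number of rows. The names cols and rows are illustrative only. np.take_along_axis is shown as an equivalent alternative; it requires the index array to have the same number of dimensions as a, hence the extra axis.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
cols = np.array([1, 0])          # column to take from each row (illustrative name)

# Pair row k with cols[k]: exactly one element per row
rows = np.arange(a.shape[0])
picked = a[rows, cols]
print(picked)                    # [2 3]

# Equivalent: take_along_axis needs cols reshaped to (n_rows, 1), then flattened back
picked_too = np.take_along_axis(a, cols[:, None], axis=1).ravel()
print(picked_too)                # [2 3]
```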

Related

check if values of a column are in values of another (numpy array)

I have a numpy array and I want to create a new one containing only the rows whose value in columnX is absent from every row of columnY.
I thought about a for loop to do it, but it doesn't work.
array = [()]
for row in data:
    if data[:, 0] == data[:, 6]:
        np.append(array, row)
I know you said numpy, but it is this easy in pandas (here data is assumed to have two columns, named 'x' and 'y'):
import pandas as pd
df = pd.DataFrame(data, columns=['x', 'y'])
df = df[~df.x.isin(df.y)]
output = df.values  # back to numpy
Guessing from Jordan's answer, for an input array like this:
import numpy as np
array = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [4, 3]])
array[~np.isin(array[:, 0], array[:, 1])]
you get:
array([[1, 2],
       [5, 6]])
I'm not sure I've understood the point of your question correctly.
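The same np.isin filter works directly on the columns from the question (0 and 6), with no loop and no np.append. This is a sketch on hypothetical 7-column data; col_x and col_y stand in for the question's columnX and columnY.

```python
import numpy as np

# Hypothetical 7-column data; columns 0 and 6 play the roles of columnX and columnY
data = np.array([
    [1, 0, 0, 0, 0, 0, 9],
    [2, 0, 0, 0, 0, 0, 1],
    [3, 0, 0, 0, 0, 0, 8],
    [4, 0, 0, 0, 0, 0, 2],
])

col_x, col_y = 0, 6
# True for rows whose column-X value never appears anywhere in column Y
keep = ~np.isin(data[:, col_x], data[:, col_y])
result = data[keep]
print(result)
```

Building the boolean mask once and indexing with it avoids the repeated copying that np.append does on every call.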

Split numpy 2D array based on separate label array

I have a 2D numpy array A. For example:
A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8],
              [9, 0]])
I have another label array B corresponding to rows of A. For example:
B = np.array([0, 1, 2, 0, 1])
I want to split A into 3 arrays based on their labels, so the result would be:
[[[1, 2],
  [7, 8]],
 [[3, 4],
  [9, 0]],
 [[5, 6]]]
Are there any numpy built in functions to achieve this?
Right now, my solution is rather ugly: it involves repeatedly calling numpy.where in a for-loop and slicing the index tuples to keep only the row indices.
Here's one way to do it:
hstack the two arrays together.
sort the rows by the last column (the label).
split the array at the first index of each unique label.
a = np.hstack((A, B[:, None]))
a = a[a[:, -1].argsort(kind='stable')]  # a stable sort keeps the original row order within each label
a = np.split(a[:, :-1], np.unique(a[:, -1], return_index=True)[1][1:])
OUTPUT:
[array([[1, 2],
        [7, 8]]),
 array([[3, 4],
        [9, 0]]),
 array([[5, 6]])]
If every label occurs the same number of times, the result can be a single 3-D array, and you only need to sort the data by label:
idx = B.argsort(kind='stable')
n = np.flatnonzero(np.diff(B[idx]))[0] + 1  # rows per label (needs at least two distinct labels)
result = A[idx].reshape(-1, n, A.shape[1])
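A self-contained sketch of the equal-size case. The question's B does not qualify (label 2 occurs only once), so the labels below are hypothetical, with each label used exactly twice. Here the group size is simply the count of one label, which is valid only under that equal-size assumption.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])
B = np.array([1, 0, 1, 0])   # hypothetical labels, each used exactly twice

idx = B.argsort(kind="stable")             # stable: keeps original row order within a label
size = np.count_nonzero(B == B[idx[0]])    # rows per label (all equal by assumption)
result = A[idx].reshape(-1, size, A.shape[1])
print(result)
```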
If the labels aren't equally distributed, the outer dimension has to be a list:
_, indices, counts = np.unique(B, return_counts=True, return_inverse=True)
result = np.split(A[indices.argsort(kind='stable')], counts.cumsum()[:-1])
Using the equivalent of np.where is not very efficient, but you can do it without a loop:
b, idx = np.unique(B, return_inverse=True)
mask = idx[:, None] == np.arange(b.size)
result = np.split(A[idx.argsort(kind='stable')], np.count_nonzero(mask, axis=0).cumsum()[:-1])
This computes the mask for all the labels at once, counts the matching rows in each category (np.count_nonzero(mask, axis=0)) and uses the cumulative sum of those counts as split points into the label-sorted A. The last value of the cumulative sum is dropped because np.split already ends the final chunk at the end of the array.
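To make the mask approach concrete, here is a sketch that runs it on the question's A and B and keeps every intermediate value in a named variable (counts and bounds are illustrative names), so each step can be inspected.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]])
B = np.array([0, 1, 2, 0, 1])

b, idx = np.unique(B, return_inverse=True)   # b: distinct labels, idx: position of each row's label in b
mask = idx[:, None] == np.arange(b.size)     # (rows, labels) boolean table, one column per label
counts = np.count_nonzero(mask, axis=0)      # rows per label: [2 2 1]
bounds = counts.cumsum()[:-1]                # split points: [2 4]; the final 5 is implicit
result = np.split(A[idx.argsort(kind="stable")], bounds)
print(result)
```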
You could also use Pandas for this because it's designed for labelled data and has a powerful groupby method.
import pandas as pd
index = pd.Index(B, name='label')
df = pd.DataFrame(A, index=index)
groups = {k: v.values for k, v in df.groupby('label')}
print(groups)
This produces a dictionary of arrays of the grouped values:
{0: array([[1, 2],
       [7, 8]]), 1: array([[3, 4],
       [9, 0]]), 2: array([[5, 6]])}
For a list of the arrays you can do this instead:
groups = [v.values for k, v in df.groupby('label')]
This is probably the simplest way:
groups = [A[B == label, :] for label in np.unique(B)]
print(groups)
Output:
[array([[1, 2],
       [7, 8]]), array([[3, 4],
       [9, 0]]), array([[5, 6]])]
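If you want the label-keyed dictionary that the pandas answer produces but without pandas, the same boolean-mask idea fits in a dict comprehension. This is a sketch; converting keys with int() is just a choice to get plain Python ints as keys.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]])
B = np.array([0, 1, 2, 0, 1])

# One boolean mask per distinct label; keys converted to plain Python ints
groups = {int(label): A[B == label] for label in np.unique(B)}
print(groups)
```

Each label scans the whole of B once, so this costs roughly rows × labels comparisons; that is usually fine when there are only a few labels.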

Given the indexes corresponding to each row, get the corresponding elements from a matrix

Given indexes for each row, how to return the corresponding elements in a 2-d matrix?
For instance, for the array np.array([[1,2,3,4],[4,5,6,7]]) I expect the output [[1,2],[4,5]] given indxs = np.array([[0,1],[0,1]]). Below is what I've tried:
import numpy as np
a = np.array([[1,2,3,4],[4,5,6,7]])
indxs = np.array([[0,1],[0,1]])  # means return the elements located at 0 and 1 for each row
# I tried this, but it returns an array with shape (2, 2, 4)
a[indxs]
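The reason you get your array twice (shape (2, 2, 4)) is that an integer array used as the only index selects along the first axis, so each entry of indxs picks a whole row. a[[0,1]] selects rows 0 and 1, which are indeed your entire array, and indxs contains that pair twice.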
In[]: a[[0,1]]
Out[]: array([[1, 2, 3, 4],
              [4, 5, 6, 7]])
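A quick sketch confirming where the (2, 2, 4) shape comes from: with a single integer-array index, the result shape is the index's shape followed by the remaining axes of a.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
indxs = np.array([[0, 1], [0, 1]])

# A single integer array index selects along axis 0 only, so each entry picks a whole row
rows = a[indxs]
print(rows.shape)    # (2, 2, 4): indxs.shape + (a.shape[1],)
```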
You can get the desired output using slices. That is the easiest way when every row uses the same contiguous range of columns.
a = np.array([[1,2,3,4],[4,5,6,7]])
a[:,0:2]
Out []: array([[1, 2],
               [4, 5]])
In case you are still interested in indexing, you could also get your output with:
In[]: [a[[0], [0, 1]].tolist(), a[[1], [0, 1]].tolist()]
Out[]: [[1, 2], [4, 5]]
The NumPy documentation gives a really nice overview of how indexing works.
In [120]: a = np.array([[1,2,3,4],[4,5,6,7]])
In [121]: indxs = np.array([[0,1],[0,1]])
You need to provide an index for the first dimension, one that broadcasts with indxs.
In [122]: a[np.arange(2)[:,None], indxs]
Out[122]:
array([[1, 2],
       [4, 5]])
indxs has shape (2, n), so the row index needs shape (2, 1); the two broadcast to a (2, n) result.
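A sketch of the same idea for any number of rows, with the row index built from a.shape[0] instead of a hard-coded 2. np.take_along_axis along axis 1 gives the same result without constructing the row index by hand.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
indxs = np.array([[0, 1], [0, 1]])

# Row index of shape (n_rows, 1) broadcasts against indxs of shape (n_rows, k)
rows = np.arange(a.shape[0])[:, None]
out = a[rows, indxs]

# Same result: take_along_axis pairs row i of indxs with row i of a
out2 = np.take_along_axis(a, indxs, axis=1)
print(out)
```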

Modify different columns in each row of a 2D NumPy array

I have the following problem:
Let's say I have an array defined like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I would like to do is make use of NumPy multiple indexing and set several elements to 0. To do that I'm creating a vector:
indices_to_remove = [1, 2, 0]
What I want it to mean is the following:
Remove element with index '1' from the first row
Remove element with index '2' from the second row
Remove element with index '0' from the third row
The result should be the array [[1,0,3],[4,5,0],[0,8,9]]
I've managed to get values of the elements I would like to modify by following code:
values = np.diagonal(np.take(A, indices_to_remove, axis=1))
However, that doesn't allow me to modify them. How could this be solved?
You could use integer array indexing to assign those zeros:
A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
Sample run:
In [445]: A
Out[445]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [446]: indices_to_remove
Out[446]: [1, 2, 0]
In [447]: A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
In [448]: A
Out[448]:
array([[1, 0, 3],
       [4, 5, 0],
       [0, 8, 9]])
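For comparison, a sketch that first reads the targeted values with the same row/column pairing (no np.take/np.diagonal detour) and then zeroes them with np.put_along_axis, which writes in place. put_along_axis expects an index array with the same number of dimensions as A, hence the [:, None].

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices_to_remove = np.array([1, 2, 0])

# Read the elements first: row k paired with column indices_to_remove[k]
values = A[np.arange(len(indices_to_remove)), indices_to_remove]

# Write zeros in place; the index array needs shape (n_rows, 1)
np.put_along_axis(A, indices_to_remove[:, None], 0, axis=1)
print(A)
```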

Numpy: sort rows of an array by rows in another array

I have a 2D array of "neighbors", and I want to re-order each row according to a corresponding row in another matrix (called "radii"). The code below works, but it uses a for loop over a numpy array, which I understand is not the idiomatic way to do it. What is the correct numpy / broadcasting solution to this re-ordering?
import numpy as np
neighbors = np.array([[8,7,6], [3,2,1]])
radii = np.array([[0.4, 0.2, 0.1], [0.3, 0.9, 0.1]])
order = radii.argsort(axis=1)
for i in range(2):
    neighbors[i] = neighbors[i, order[i]]
print(neighbors)
# Result:
[[6 7 8]
 [1 3 2]]
In NumPy you would write something like this:
>>> neighbors[np.arange(2)[:, None], order]
array([[6, 7, 8],
       [1, 3, 2]])
(More generally you'd write the first index as np.arange(order.shape[0])[:, None] instead.)
This works because np.arange(2)[:, None] looks like this:
array([[0],
       [1]])
and order looks like this:
array([[2, 1, 0],
       [2, 0, 1]])
For fancy indexing, NumPy broadcasts the index arrays for each axis against each other and pairs them element by element. The row index [0] is paired with the column indices [2, 1, 0], and the new row is built in the order they determine. Similarly, [1] is paired with [2, 0, 1] to build the second row.
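The same pairing is also available as np.take_along_axis, which takes the argsort output directly. A sketch on the question's arrays, starting from fresh copies rather than the loop-modified neighbors:

```python
import numpy as np

neighbors = np.array([[8, 7, 6], [3, 2, 1]])
radii = np.array([[0.4, 0.2, 0.1], [0.3, 0.9, 0.1]])

order = radii.argsort(axis=1)
# take_along_axis pairs row i of order with row i of neighbors
sorted_neighbors = np.take_along_axis(neighbors, order, axis=1)
print(sorted_neighbors)
```

This avoids writing the np.arange(order.shape[0])[:, None] row index explicitly and works for any number of rows.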
