How to get 2d array of indices from 1d array? - python

I'm looking for an efficient way to return indices for a 2d array based on values in a 1d array. I currently have a nested for loop set up that is painfully slow.
Here is some example data and what I want to get:
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
I would like to return the indices where data2d is equal to data1d. My desired output would be this 2d array:
locs = np.array([[0, 1], [0, 2], [2, 3], [0, 1], [6, 8]])
The only thing I've come up with is the nested for loop:
locs = np.full((np.shape(data2d)), np.nan)
for i in range(0, 5):
    for j in range(0, 2):
        loc_val = np.where(data1d == data2d[i, j])
        loc_val = loc_val[0]
        locs[i, j] = loc_val
This would be fine for a small set of data but I have 87,600 2d grids that are each 428x614 grid points.

Use np.searchsorted:
np.searchsorted(data1d, data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
[0, 2],
[2, 3],
[0, 1],
[6, 8]])
searchsorted performs a binary search for each value of the ravelled data2d within data1d, which must be sorted (as it is here). The result is then reshaped to data2d's shape.
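If data1d were not sorted, you could supply a sorter argument and map the positions back to the original order. A minimal sketch, assuming a hypothetical unsorted lookup array:
import numpy as np

lookup = np.array([3, 1, 2, 9, 7, 4, 5, 8, 6])  # hypothetical unsorted version of data1d
order = np.argsort(lookup)
# searchsorted returns positions within the sorted view; order maps them back to lookup's order
locs = order[np.searchsorted(lookup, data2d.ravel(), sorter=order)].reshape(data2d.shape)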
Another option is to build an index once and query it in effectively constant time per lookup. You can do this with pandas' Index API.
import pandas as pd
idx = pd.Index([1,2,3,4,5,6,7,8,9])
idx
# Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
idx.get_indexer(data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
[0, 2],
[2, 3],
[0, 1],
[6, 8]])
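Note that get_indexer returns -1 for values that are not present in the index, which makes misses easy to detect:
idx.get_indexer([2, 10])
# array([ 1, -1])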

This should also be fast, using a dictionary that maps each value of data1d to its index:
import numpy as np
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
idxdict = dict(zip(data1d,range(len(data1d))))
locs = data2d.copy()  # work on a copy so data2d is not overwritten in place
for i in range(len(locs)):
    for j in range(len(locs[i])):
        locs[i][j] = idxdict[locs[i][j]]
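If the explicit double loop is still too slow, the same dictionary lookup can be applied once over the ravelled array instead; a small sketch of that idea:
locs = np.array([idxdict[v] for v in data2d.ravel()]).reshape(data2d.shape)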

Related

Split numpy 2D array based on separate label array

I have a 2D numpy array A. For example:
A = np.array([[1, 2],
[3, 4],
[5, 6],
[7, 8],
[9, 0]])
I have another label array B corresponding to rows of A. For example:
B = np.array([0, 1, 2, 0, 1])
I want to split A into 3 arrays based on their labels, so the result would be:
[[[1, 2],
[7, 8]],
[[3, 4],
[9, 0]],
[[5, 6]]]
Are there any numpy built in functions to achieve this?
Right now, my solution is rather ugly and involves repeatedly calling numpy.where in a for loop and slicing the resulting index tuples to keep only the rows.
Here's one way to do it:
hstack both arrays together,
sort the combined array by the last column,
then split the array at the first occurrence of each unique value in the label column.
a = np.hstack((A,B[:,None]))
a = a[a[:, -1].argsort()]
a = np.split(a[:,:-1], np.unique(a[:, -1], return_index=True)[1][1:])
OUTPUT:
[array([[1, 2],
[7, 8]]),
array([[3, 4],
[9, 0]]),
array([[5, 6]])]
If the output can always be an array because the labels are equally distributed, you only need to sort the data by label:
idx = B.argsort()
n = np.flatnonzero(np.diff(B[idx]))[0] + 1  # size of each (equally sized) group
result = A[idx].reshape(A.shape[0] // n, n, A.shape[1])
If the labels aren't equally distributed, you'll have to make a list in the outer dimension:
_, indices, counts = np.unique(B, return_counts=True, return_inverse=True)
result = np.split(A[indices.argsort()], counts.cumsum()[:-1])
Using the equivalent of np.where is not very efficient, but you can do it without a loop:
b, idx = np.unique(B, return_inverse=True)
mask = idx[:, None] == np.arange(b.size)
result = np.split(A[idx.argsort()], np.count_nonzero(mask, axis=0).cumsum()[:-1])
You can compute the mask simultaneously for all the labels and apply it to the sorted A (A[idx.argsort()]) by counting the number of matching elements in each category (np.count_nonzero(mask, axis=0).cumsum()). The last index is dropped from the cumulative sum because np.split splits before each given index, so the total count would only produce an empty trailing array.
You could also use Pandas for this because it's designed for labelled data and has a powerful groupby method.
import pandas as pd
index = pd.Index(B, name='label')
df = pd.DataFrame(A, index=index)
groups = {k: v.values for k, v in df.groupby('label')}
print(groups)
This produces a dictionary of arrays of the grouped values:
{0: array([[1, 2],
[7, 8]]), 1: array([[3, 4],
[9, 0]]), 2: array([[5, 6]])}
For a list of the arrays you can do this instead:
groups = [v.values for k, v in df.groupby('label')]
This is probably the simplest way:
groups = [A[B == label, :] for label in np.unique(B)]
print(groups)
Output:
[array([[1, 2],
[7, 8]]), array([[3, 4],
[9, 0]]), array([[5, 6]])]

Append indices of element to each element

So basically I want to create a new array in which each element is replaced by a triple of its coordinates and its value (i.e. the row and column position are prepended to the original element):
[ [7,2,4],[1,5,3] ]
then becomes
[ [[0,0,7],[0,1,2],[0,2,4]],
  [[1,0,1],[1,1,5],[1,2,3]] ]
I've been looking for different ways to make this work with the axis system in NumPy, but I'm probably overlooking some more obvious way.
You can try np.meshgrid to create a grid and then np.stack to combine it with input array:
import numpy as np
a = np.asarray([[7,2,4],[1,5,3]])
result = np.stack(np.meshgrid(range(a.shape[1]), range(a.shape[0]))[::-1] + [a], axis=-1)
Output:
array([[[0, 0, 7],
[0, 1, 2],
[0, 2, 4]],
[[1, 0, 1],
[1, 1, 5],
[1, 2, 3]]])
Let me know if it helps.
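An equivalent and arguably more direct way to build the index grids is np.indices, which returns the row-index and column-index arrays in one call; a sketch of the same idea:
import numpy as np

a = np.asarray([[7, 2, 4], [1, 5, 3]])
rows, cols = np.indices(a.shape)          # row-index grid and column-index grid
result = np.stack((rows, cols, a), axis=-1)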
Without numpy you could use list comprehension:
old_list = [ [7,2,4],[1,5,3] ]
new_list = [ [[i, j, old_list[i][j]] for j in range(len(old_list[i]))] for i in range(len(old_list)) ]
I'd assume that numpy is faster but the sublists are not required to have equal length in this solution.
Another approach using enumerate
In [37]: val = [[7,2,4],[1,5,3]]

In [38]: merge = list()
    ...: for i, j in enumerate(val):
    ...:     merge.append([[i, m, n] for m, n in enumerate(j)])
    ...:

In [39]: merge
Out[39]: [[[0, 0, 7], [0, 1, 2], [0, 2, 4]], [[1, 0, 1], [1, 1, 5], [1, 2, 3]]]
Hope it's useful.
a = np.array([[7,2,4], [1,5,3]])
# np.argwhere returns the indices of non-zero entries in row-major order,
# so this relies on `a` containing no zeros
idx = np.argwhere(a)
idx = idx.reshape((*(a.shape), -1))
a = np.expand_dims(a, axis=-1)
a = np.concatenate((idx, a), axis=-1)

How to find a row-wise intersection of 2d numpy arrays?

I'm looking for an efficient way to get the row-wise intersection of two two-dimensional numpy ndarrays. There is exactly one intersection per row. For example:
[[1, 2], ∩ [[0, 1], -> [1,
[3, 4]] [0, 3]] 3]
Ideally, zeros should be ignored:
[[1, 2, 0], ∩ [[0, 1, 0], -> [1,
[3, 4, 0]] [0, 3, 0]] 3]
My solution:
import numpy as np
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[0, 1],
[0, 3]])
arr3 = np.empty(len(arr1))
for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])
print(arr3)
# [ 1. 3.]
I have about 1 million rows, so vectorized operations are strongly preferred. You are welcome to use other Python packages.
You can use np.apply_along_axis.
I wrote a solution that pads the result to the row length of arr1.
I haven't tested its efficiency.
import numpy as np
def intersect1d_padded(x):
    x, y = np.split(x, 2)
    padded_intersection = -1 * np.ones(x.shape, dtype=int)
    intersection = np.intersect1d(x, y)
    padded_intersection[:intersection.shape[0]] = intersection
    return padded_intersection

def rowwise_intersection(a, b):
    return np.apply_along_axis(intersect1d_padded,
                               1, np.concatenate((a, b), axis=1))
result = rowwise_intersection(arr1,arr2)
>>> array([[ 1, -1],
[ 3, -1]])
If you know there is only one element in the intersection, you can use
result = rowwise_intersection(arr1,arr2)[:,0]
>>> array([1, 3])
You can also modify intersect1d_padded to return a scalar with the intersection value.
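For example, a variation of the helper that returns just the single intersection value (assuming every row pair really does share exactly one element) might look like this:
def intersect1d_scalar(x):
    x, y = np.split(x, 2)
    return np.intersect1d(x, y)[0]  # assumes a non-empty intersection

result = np.apply_along_axis(intersect1d_scalar, 1, np.concatenate((arr1, arr2), axis=1))
# array([1, 3])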
I don't know of an elegant way to do it in numpy, but a simple list comprehension can do the trick:
[list(set.intersection(set(_x), set(_y)).difference({0})) for _x, _y in zip(arr1, arr2)]
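If memory allows, a fully vectorized alternative is to compare every element of arr1 against every element of arr2 within each row using broadcasting. A sketch, assuming exactly one non-zero common element per row:
# shape (rows, k1, k2): True where arr1[i, p] == arr2[i, q]
match = arr1[:, :, None] == arr2[:, None, :]
match &= arr1[:, :, None] != 0          # ignore zeros
result = arr1[match.any(axis=2)]        # one hit per row -> shape (rows,)
# array([1, 3])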

How do I calculate xi^j in a matrix in Numpy

I am trying to calculate a matrix from an input array.
I would like to be able to input
a = [0,1,2]
in Python and use NumPy to build the matrix whose entry at row i and column j is a[i]**j,
so for example
the input is:
a = [0,1,2]
and the output should be
[[1,0,0],
[1,1,1],
[1,2,4]]
and I have used the following code
xij = np.matrix([np.power(xi,j) for j in x for xi in x]).reshape(3,3)
[[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]]
I assume I'm using the wrong formula in NumPy;
could you please help me solve this?
Thanks in advance.
You need to use range(len(a)) for the exponents and the correct order of the for loops:
a = [0,1,2]
xij = np.matrix([np.power(xi,j) for xi in a for j in range(len(a))]).reshape(3,3)
# matrix([[1, 0, 0],
# [1, 1, 1],
# [1, 2, 4]])
With array broadcasting:
In [823]: np.array([0,1,2])**np.arange(3)[:,None]
Out[823]:
array([[1, 1, 1],
[0, 1, 2],
[0, 1, 4]])
In [825]: np.array([1,2,3])**np.arange(1,4)[:,None]
Out[825]:
array([[ 1, 2, 3],
[ 1, 4, 9],
[ 1, 8, 27]])
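Note that this puts the exponent on the row axis; to get the orientation asked for (a[i]**j at row i, column j), put the base on the column axis instead, or use np.vander:
np.array([0,1,2])[:, None] ** np.arange(3)
# array([[1, 0, 0],
#        [1, 1, 1],
#        [1, 2, 4]])

np.vander([0, 1, 2], increasing=True)   # same result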

How to expand a 2D NumPy array by copying the bottom row and right column?

I have a 2D NumPy array and I want to expand it along both dimensions by copying the bottom row and the right column.
For example, from 2x2:
[[0,1],
[2,3]]
to 4x4:
[[0,1,1,1],
[2,3,3,3],
[2,3,3,3],
[2,3,3,3]]
What's the best way to do it?
Thanks.
Here, the hstack and vstack functions can come in handy. For example,
In [16]: p = np.array([[0, 1], [2, 3]])
In [20]: vstack((p, p[-1], p[-1]))
Out[20]:
array([[0, 1],
[2, 3],
[2, 3],
[2, 3]])
Remembering that p.T is the transpose, you can now do something like the following:
In [16]: p = np.array([[0, 1], [2, 3]])
In [22]: p = vstack((p, p[-1], p[-1]))
In [25]: p = vstack((p.T, p.T[-1], p.T[-1])).T
In [26]: p
Out[26]:
array([[0, 1, 1, 1],
[2, 3, 3, 3],
[2, 3, 3, 3],
[2, 3, 3, 3]])
So those two lines of code should do it.
Make an empty array and copy whatever rows, columns you want into it.
def expand(a, new_shape):
    x, y = a.shape
    r = np.empty(new_shape, a.dtype)
    r[:x, :y] = a                 # original block
    r[x:, :y] = a[-1:, :]         # repeat the bottom row downwards
    r[:x, y:] = a[:, -1:]         # repeat the right column to the right
    r[x:, y:] = a[-1, -1]         # fill the corner with the bottom-right element
    return r
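Another option worth mentioning is np.pad with mode='edge', which repeats the last row and column for you; a short sketch for the 2x2 -> 4x4 example:
import numpy as np

p = np.array([[0, 1], [2, 3]])
# pad 2 extra rows at the bottom and 2 extra columns on the right, repeating edge values
np.pad(p, ((0, 2), (0, 2)), mode='edge')
# array([[0, 1, 1, 1],
#        [2, 3, 3, 3],
#        [2, 3, 3, 3],
#        [2, 3, 3, 3]])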
