I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4]
[1, 8]
[2, 3]
[4, 6]
[4, 2]
[5, 1]
[5, 2]
[5, 6]]
I would like the result to be:
arr = [[1, 8]
[2, 3]
[4, 6]
[5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
import numpy as np

def groupby_maxY(a):
    b = a[a[:, 0].argsort()]  # if the first col is already sorted, skip this
    grp_idx = np.flatnonzero(np.r_[True, (b[:-1, 0] != b[1:, 0])])
    grp_maxY = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.c_[b[grp_idx, 0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=True)[1].
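For reference, a minimal sketch of that variant (the function name groupby_maxY_unique is mine, not from the original answer):
import numpy as np

def groupby_maxY_unique(a):
    # Sort by the first column so equal x-values become adjacent.
    b = a[a[:, 0].argsort()]
    # On sorted data, the first-occurrence indices from np.unique are the group starts.
    grp_idx = np.unique(b[:, 0], return_index=True)[1]
    grp_maxY = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.c_[b[grp_idx, 0], grp_maxY]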
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: groupby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
I want to group the elements of A (shape=(1,10,2)) that share the same j into a single element of a new array A1. For example, [1, 3] and [2, 3] should be combined into one element because they have the same j (=3) and different i (=1 and =2, respectively). The desired output is shown below.
import numpy as np
A=np.array([[
[0, 1],
[0, 2],
[1, 3],
[2, 3],
[2, 4],
[3, 5],
[3, 6],
[4, 6],
[5, 7],
[6, 7]]])
The desired output is
A1=array([[
[0, 1],
[0, 2],
[[1, 3],[2, 3]],
[2, 4],
[3, 5],
[[3, 6],[4, 6]],
[[5, 7],[6, 7]]]])
A1.shape=(1,7,2)
I've done it using the following steps. The only problem is that you can't have the final result as an array because of varying sizes. If you convert the result to a numpy array it becomes an array of lists of shape (7,).
You can, however, still iterate through it with for loops if it's not a huge list.
If you are using it in neural networks, you might want to consider converting the result to a ragged tensor.
Get the list of second numbers:
second_numbers = A[:,:,1].reshape(-1)
Get unique values from that list:
uniq = set(second_numbers)
Create new list based on those unique values:
new_list = []
for i in uniq:
    new_list.append((A[:, second_numbers == i, :].reshape(-1, 2)).tolist())
Full code with result:
second_numbers = A[:,:,1].reshape(-1)
uniq = set(second_numbers)
new_list = []
for i in uniq:
    new_list.append((A[:, second_numbers == i, :].reshape(-1, 2)).tolist())
new_list
>>> [[[0, 1]],
[[0, 2]],
[[1, 3], [2, 3]],
[[2, 4]],
[[3, 5]],
[[3, 6], [4, 6]],
[[5, 7], [6, 7]]]
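If the ragged-tensor route mentioned above is of interest, a rough sketch assuming TensorFlow is installed (tf.ragged.constant builds a RaggedTensor from nested lists with rows of varying length):
import tensorflow as tf

# The nested list produced above; rows have different lengths.
new_list = [[[0, 1]], [[0, 2]], [[1, 3], [2, 3]], [[2, 4]],
            [[3, 5]], [[3, 6], [4, 6]], [[5, 7], [6, 7]]]

ragged = tf.ragged.constant(new_list)
print(ragged)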
Consider the following array:
x = np.array([[1, 1], [1, 1], [1, 2], [1, 2], [1, 2],
              [2, 3], [2, 3], [2, 3], [2, 4], [2, 4],
              [2, 5], [2, 5], [2, 5]])
x
Out[12]:
array([[1, 1],
[1, 1],
[1, 2],
[1, 2],
[1, 2],
[2, 3],
[2, 3],
[2, 3],
[2, 4],
[2, 4],
[2, 5],
[2, 5],
[2, 5]])
How would I get the number of unique column 2 values for each column 1 value?
For example: if it can be done using a function V, then V(x) = [2, 3].
I have implemented this using a for loop. However, it seems more complicated than necessary and takes too much time (when applied to my actual dataset which is much larger than this example).
I am interested in performance and am willing to sacrifice code clarity for speed (although they usually are directly correlated!).
Use numpy.unique twice:
import numpy as np
x = np.array([[1, 1], [1, 1], [1, 2], [1, 2], [1, 2],
              [2, 3], [2, 3], [2, 3], [2, 4], [2, 4],
              [2, 5], [2, 5], [2, 5]])
# drop duplicates
xx = np.unique(x, axis=0)
# count the first column
values, counts = np.unique(xx[:,0], return_counts=True)
print(values)
print(counts)
# [1 2]
# [2 3]
I have 0s and 1s stored in a 3-dimensional numpy array:
g = np.array([[[0, 1], [0, 1], [1, 0]], [[0, 0], [1, 0], [1, 1]]])
# array([
# [[0, 1], [0, 1], [1, 0]],
# [[0, 0], [1, 0], [1, 1]]])
and I'd like to replace these values with those in another array using a row-wise replacement strategy. For example, replacing the values of g with those of x:
x = np.array([[2, 3], [4, 5]])
array([[2, 3],
[4, 5]])
to obtain:
array([
[[2, 3], [2, 3], [3, 2]],
[[4, 4], [5, 4], [5, 5]]])
The idea here is to have the first row of g replaced by the first elements of x (0 becomes 2 and 1 becomes 3), and the same for the other row (the first dimension, the number of "rows", will always be the same for g and x).
I can't seem to use np.where because it raises ValueError: operands could not be broadcast together with shapes (2,3,2) (2,2) (2,2).
IIUC,
np.stack([x[i, g[i]] for i in range(x.shape[0])])
Output:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
Vectorized approach with np.take_along_axis to index into the last axis of x with g using axis=-1 -
In [20]: np.take_along_axis(x[:,None],g,axis=-1)
Out[20]:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
Or with manual integer-based indexing -
In [27]: x[np.arange(len(g))[:,None,None],g]
Out[27]:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
One solution is to simply use a comprehension directly here:
>>> np.array([[x[i][c] for c in r] for i, r in enumerate(g)])
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
From what I understand, g is an array of indices (each index being 0 or 1) and x is the array whose values you want to use.
Something like this should work (tested quickly):
import numpy as np
def swap_indexes(index_array, array):
    out_array = []
    for i, row in enumerate(index_array):
        out_array.append([array[i, indexes] for indexes in row])
    return np.array(out_array)
index_array = np.array([[[0, 1], [0, 1], [1, 0]], [[0, 0], [1, 0], [1, 1]]])
x = np.array([[2, 3], [4, 5]])
print(swap_indexes(index_array, x))
[EDIT: fixed typo that created duplicates]
I have a 3D array 'b', shown below, which I want to treat as an array of 2-D arrays. I want to remove the duplicate 2-D arrays and keep only the unique ones.
>>> a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
>>> b = numpy.array(a)
>>> b
array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]],
[[1, 2],
[1, 2]]])
In the above example, I want to return the following, because there is one duplicate which I want to remove.
unique = array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]]])
How should I do this with the numpy package? Thanks!
See previous answer: Remove duplicate rows of a numpy array
Convert to an array of tuples and then apply np.unique().
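A rough sketch of that idea (variable names are mine; each 2x2 block is flattened into a tuple before deduplicating):
import numpy as np

b = np.array([[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]])

# Flatten each 2x2 slice to a tuple of its values, find the first occurrence
# of each, and pull the corresponding slices back out of the original array.
rows = [tuple(s.ravel()) for s in b]
_, idx = np.unique(rows, return_index=True, axis=0)
print(b[np.sort(idx)])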
Converting to tuples and back again is probably going to be quite expensive; instead you can do a generalized view:
import numpy as np

def unique_by_first(a):
    tmp = a.reshape(a.shape[0], -1)
    b = np.ascontiguousarray(tmp).view(np.dtype((np.void, tmp.dtype.itemsize * tmp.shape[1])))
    _, idx = np.unique(b, return_index=True)
    return a[idx].reshape(-1, *a.shape[1:])
Usage:
print(unique_by_first(b))
[[[1 2]
[1 2]]
[[1 2]
[4 5]]]
Effectively, a generalization of previous answers.
You can convert each such 2D slice along the last two axes into a scalar by considering its values as indices on a multi-dimensional grid. The intention is to map each slice to a scalar based on its uniqueness. Then, using those scalars, we can use np.unique to keep only one instance of each.
Thus, an implementation would be -
idx = np.ravel_multi_index(a.reshape(a.shape[0],-1).T,a.max(0).ravel()+1)
out = a[np.sort(np.unique(idx, return_index=1)[1])]
Sample run -
In [43]: a
Out[43]:
array([[[8, 1],
[2, 8]],
[[3, 8],
[3, 4]],
[[2, 4],
[1, 0]],
[[3, 0],
[4, 8]],
[[2, 4],
[1, 0]],
[[8, 1],
[2, 8]]])
In [44]: idx = np.ravel_multi_index(a.reshape(a.shape[0],-1).T,a.max(0).ravel()+1)
In [45]: a[np.sort(np.unique(idx, return_index=1)[1])]
Out[45]:
array([[[8, 1],
[2, 8]],
[[3, 8],
[3, 4]],
[[2, 4],
[1, 0]],
[[3, 0],
[4, 8]]])
If you don't need the order of such slices to be maintained, skip the np.sort() at the last step.
Reshape, find the unique rows, then reshape again.
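A minimal sketch of that approach, assuming NumPy 1.13+ for np.unique with axis=0 (note that it returns the slices in sorted order):
import numpy as np

b = np.array([[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]])

# Flatten each 2x2 slice to a row, drop duplicate rows, then restore the shape.
flat = b.reshape(b.shape[0], -1)
print(np.unique(flat, axis=0).reshape(-1, *b.shape[1:]))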
Finding unique tuples by converting to a set.
import numpy as np
a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
b = np.array(a)
new_array = [tuple(row) for row in b.reshape(3,4)]
uniques = list(set(new_array))
output = np.array(uniques).reshape(len(uniques), 2, 2)
output
Out[131]:
array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]]])
This seems to have been asked many times, however the answers I found don't work now. To keep it simple, here I have a numpy matrix:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
Then I want to sort it by the second column, as below:
[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]
I tried a lot of examples like Python Matrix sorting via one column, but none of them worked.
I wonder whether that's because the answers were posted years ago and no longer work with the newest Python? My Python is 3.5.1.
Example of my failed trial:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
temp = data.view(np.ndarray)
np.lexsort((temp[:, 1], ))
print(temp)
print(data)
You are a moving target.
Sort each column independently:
In [151]: np.sort(data,axis=0)
Out[151]:
matrix([[1, 0],
[3, 2],
[5, 6],
[7, 7],
[9, 8]])
Sort on the values of the second column:
In [160]: ind=np.argsort(data[:,1],axis=0)
In [161]: ind
Out[161]:
matrix([[4],
[3],
[1],
[2],
[0]], dtype=int32)
In [162]: data[ind.ravel(),:] # ravel needed because of matrix
Out[162]:
matrix([[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]])
Another way to get a valid ind array:
In [163]: ind=np.argsort(data.A[:,1],axis=0)
In [164]: ind
Out[164]: array([4, 3, 1, 2, 0], dtype=int32)
In [165]: data[ind,:]
To use lexsort you need something like
In [175]: np.lexsort([data.A[:,0],data.A[:,1]])
Out[175]: array([4, 3, 1, 2, 0], dtype=int32)
Or your 'failed' case, which isn't a fail:
In [178]: np.lexsort((data.A[:,1],))
Out[178]: array([4, 3, 1, 2, 0], dtype=int32)
Here data[:,1] is the primary key and data[:,0] is the tie breaker (not applicable in your example). I'm just working from the docs.
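For completeness, a rough sketch of applying those lexsort indices back to the matrix (same data as above):
import numpy as np

data = np.matrix([[9, 8],
                  [7, 6],
                  [5, 7],
                  [3, 2],
                  [1, 0]])

# lexsort treats the last key as the primary key, so data[:, 1] drives the order.
order = np.lexsort((data.A[:, 0], data.A[:, 1]))
print(data[order, :])
# [[1 0]
#  [3 2]
#  [7 6]
#  [5 7]
#  [9 8]]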
The approach in your link works:
import numpy as np
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
print(data[np.argsort(data.A[:, 1])])
[[1 0]
[3 2]
[7 6]
[5 7]
[9 8]]
And now an example where the effect is easier to see:
data = np.matrix([[1, 9],
[2, 8],
[3, 7],
[4, 6],
[0, 5]])
print(data[np.argsort(data.A[:, 1])])
[[0 5]
[4 6]
[3 7]
[2 8]
[1 9]]