I have a NumPy array of XY coordinates like the one below:
2d_coords = [
    [1, 2],
    [1, 1],
    [2, 1],
    [3, 1],
    ...
]
Either [1, 1] or [1, 2] needs to go (I don't care which one); only one point per X coordinate is allowed.
How can I do that?
numpy.unique would be helpful. For example:
import numpy as np
l = np.asarray([
    [1, 2],
    [1, 1],
    [2, 1],
    [3, 1],
])

# Get the indices of the first occurrence of each unique x coordinate
_, unique_indices = np.unique(l[:, 0], return_index=True)
print(l[unique_indices])
The example output:
[[1 2]
 [2 1]
 [3 1]]
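Note that np.unique keeps the first occurrence of each x value. Since it doesn't matter which duplicate survives, here is a minimal sketch for keeping the last occurrence instead, by running unique on the reversed column:

_, idx = np.unique(l[::-1, 0], return_index=True)  # indices into the reversed array
print(l[len(l) - 1 - idx])                         # map back to original positions
# [[1 1]
#  [2 1]
#  [3 1]]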
You can use NumPy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt

coords = np.array([[1, 2], [1, 1], [2, 1], [3, 1]])
# Keep the first row for each unique x value, then transpose for plotting
_, idx = np.unique(coords[:, 0], return_index=True)
plot_coords = coords[idx].T
plt.plot(plot_coords[0], plot_coords[1])
plt.show()
What about pandas?

import pandas as pd

pd.DataFrame(coords).drop_duplicates(0).values

array([[1, 2],
       [2, 1],
       [3, 1]])
Without using any external library, you can use a conditional list comprehension:

d_coords = [[1, 2], [1, 1], [2, 1], [3, 1]]
# Keep a row only if its x value has not appeared in an earlier row
new_list = [d_coords[i] for i in range(len(d_coords))
            if d_coords[i][0] not in [k[0] for k in d_coords[:i]]]
# new_list: [[1, 2], [2, 1], [3, 1]]

NOTE: don't start variable names with numbers; 2d_coords is not a valid Python identifier.
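For long lists, the comprehension above rescans all earlier rows for every element. A seen-set variant gives the same result in linear time; a minimal sketch:

d_coords = [[1, 2], [1, 1], [2, 1], [3, 1]]
seen = set()
new_list = []
for x, y in d_coords:
    if x not in seen:  # keep only the first point per x value
        seen.add(x)
        new_list.append([x, y])
# new_list: [[1, 2], [2, 1], [3, 1]]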
Related
Let's say I have two NumPy arrays:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4])
With these, I want to create a 2-dimensional array containing every (x, y) pair, i.e. the Cartesian product shown in the output below.
Is there any method available to achieve this directly?
Your problem is computing the Cartesian product. In NumPy, you can write it using repeat and tile:
# Repeat each element of x once per element of y,
# and tile the whole of y once per element of x
out = np.c_[np.repeat(x, len(y)), np.tile(y, len(x))]
Python's built-in itertools module has a function designed for this: product:
from itertools import product
out = np.array(list(product(x,y)))
Output:
array([[1, 1],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 1],
       [2, 2],
       [2, 3],
       [2, 4],
       [3, 1],
       [3, 2],
       [3, 3],
       [3, 4]])
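The repeat/tile version above produces the same array. Another common NumPy idiom for the Cartesian product uses np.meshgrid; a minimal sketch:

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4])

# 'ij' indexing keeps x as the slow-varying (row) coordinate
xx, yy = np.meshgrid(x, y, indexing="ij")
out = np.column_stack([xx.ravel(), yy.ravel()])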
I have a problem when I put floats into a NumPy array.
Here is my code:
# Average the x and y coordinates of the points in cluster k
x = sum([item[0] for item in clusters[k]]) / len(clusters[k])
y = sum([item[1] for item in clusters[k]]) / len(clusters[k])
centers[k] = np.array([x, y])
And this is what I get when I print x, y, and centers:
x:
5.029068157893012
y:
4.9725319416514395
x:
1.0273866309343607
y:
0.9492915123013862
x:
8.01021923983273
y:
1.034128622860488
centers:
[[5 4]
 [1 0]
 [8 1]]
I have tried to use:
centers[k] = np.array([x, y], dtype=np.float64)
without any success...
Thank you in advance for your help!
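A likely cause, assuming centers was pre-allocated with an integer dtype, is that NumPy silently truncates floats assigned into an integer array; a minimal sketch of the symptom and the fix:

import numpy as np

# Hypothetical pre-allocation: an integer array truncates assigned floats
centers_int = np.zeros((3, 2), dtype=int)
centers_int[0] = np.array([5.029, 4.972])
print(centers_int[0])  # [5 4] -- the truncation seen above

# Allocating with a float dtype (the default) keeps the decimals
centers = np.zeros((3, 2))
centers[0] = np.array([5.029, 4.972])
print(centers[0])      # [5.029 4.972]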
If I understand you correctly, you need the following:
import numpy as np
cluster = np.array([[5, 4],
                    [1, 0],
                    [8, 1]])

# Sum each column and divide by the number of points
center = np.sum(cluster, axis=0) / len(cluster)
print(center)
# array([4.66666667, 1.66666667])
In the case of a multidimensional data set, you can try the following:
import numpy as np
clusters = np.array([[[5, 4], [1, 0], [8, 1]],
                     [[5, 4], [1, 0], [8, 1]],
                     [[5, 4], [1, 0], [8, 1]]])

# Sum the points within each cluster, then divide by the cluster size
_sum = np.sum(clusters, axis=1)
center = np.array([_sum[k] / len(clusters[k]) for k in range(len(clusters))])
center
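When every cluster has the same number of points, as in this example, the per-row loop collapses to a single call; a sketch:

center = clusters.mean(axis=1)  # average over each cluster's points
# array([[4.66666667, 1.66666667],
#        [4.66666667, 1.66666667],
#        [4.66666667, 1.66666667]])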
So basically I want to create a new array in which each element is replaced by its coordinates plus the original value (i.e. the row and column position are prepended to each element):
[ [7,2,4],[1,5,3] ]
then becomes
[ [[0,0,7], [0,1,2], [0,2,4]],
  [[1,0,1], [1,1,5], [1,2,3]] ]
I've been looking for different ways to make this work with NumPy's axis system, but I'm probably overlooking some more obvious way.
You can try np.meshgrid to create index grids and then np.stack to combine them with the input array:
import numpy as np
a = np.asarray([[7, 2, 4], [1, 5, 3]])
# meshgrid(range(cols), range(rows)) returns (column grid, row grid);
# reversing with [::-1] puts the row index first in each triple
result = np.stack(np.meshgrid(range(a.shape[1]), range(a.shape[0]))[::-1] + [a], axis=-1)
Output:
array([[[0, 0, 7],
        [0, 1, 2],
        [0, 2, 4]],

       [[1, 0, 1],
        [1, 1, 5],
        [1, 2, 3]]])
Let me know if it helps.
Without NumPy you could use a list comprehension:

old_list = [[7, 2, 4], [1, 5, 3]]
new_list = [[[i, j, old_list[i][j]] for j in range(len(old_list[i]))]
            for i in range(len(old_list))]

I'd assume that NumPy is faster, but the sublists are not required to have equal length in this solution.
Another approach using enumerate:

In [37]: val = [[7, 2, 4], [1, 5, 3]]

In [38]: merge = list()
    ...: for i, j in enumerate(val):
    ...:     merge.append([[i, m, n] for m, n in enumerate(j)])
    ...:

In [39]: merge
Out[39]: [[[0, 0, 7], [0, 1, 2], [0, 2, 4]], [[1, 0, 1], [1, 1, 5], [1, 2, 3]]]

Hope it is useful.
a = np.array([[7, 2, 4], [1, 5, 3]])
# Enumerate every (row, col) position; indexing a boolean array of ones
# avoids the pitfall that np.argwhere(a) would skip zero entries
idx = np.argwhere(np.ones(a.shape, dtype=bool))
idx = idx.reshape((*a.shape, -1))
a = np.expand_dims(a, axis=-1)
a = np.concatenate((idx, a), axis=-1)
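Equivalently, np.indices enumerates every position directly, with no assumption about the values; a minimal sketch:

import numpy as np

a = np.array([[7, 2, 4], [1, 5, 3]])
rows, cols = np.indices(a.shape)             # row and column index grids
result = np.stack([rows, cols, a], axis=-1)  # result[i, j] == [i, j, a[i, j]]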
I'm looking for an efficient way to get a row-wise intersection of two two-dimensional NumPy ndarrays. There is only one intersection per row. For example:
[[1, 2],   ∩   [[0, 1],   ->   [1,
 [3, 4]]        [0, 3]]         3]
In the best case zeros should be ignored:
[[1, 2, 0],   ∩   [[0, 1, 0],   ->   [1,
 [3, 4, 0]]        [0, 3, 0]]         3]
My solution:
import numpy as np
arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[0, 1],
                 [0, 3]])

arr3 = np.empty(len(arr1))
for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])
print(arr3)
# [ 1. 3.]
I have about 1 million rows, so vectorized operations are strongly preferred. You are welcome to use other Python packages.
You can use np.apply_along_axis. I wrote a solution that pads the result to the row size of arr1; I didn't test its efficiency.
import numpy as np

def intersect1d_padded(x):
    # Each input row holds a row of arr1 followed by a row of arr2
    x, y = np.split(x, 2)
    # Pad with -1 where the intersection is shorter than a full row
    padded_intersection = -1 * np.ones(x.shape, dtype=int)
    intersection = np.intersect1d(x, y)
    padded_intersection[:intersection.shape[0]] = intersection
    return padded_intersection

def rowwise_intersection(a, b):
    return np.apply_along_axis(intersect1d_padded, 1,
                               np.concatenate((a, b), axis=1))
result = rowwise_intersection(arr1,arr2)
>>> array([[ 1, -1],
           [ 3, -1]])
If you know you have only one element in the intersection, you can use
result = rowwise_intersection(arr1,arr2)[:,0]
>>> array([1, 3])
You can also modify intersect1d_padded to return a scalar with the intersection value.
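For instance, a sketch of that modification, assuming exactly one shared value per row as stated in the question:

def intersect1d_scalar(x):
    x, y = np.split(x, 2)
    return np.intersect1d(x, y)[0]  # assumes exactly one common element

result = np.apply_along_axis(intersect1d_scalar, 1,
                             np.concatenate((arr1, arr2), axis=1))
# array([1, 3])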
I don't know of an elegant way to do it in NumPy, but a simple list comprehension can do the trick (the difference with {0} ignores zeros, per the question):

[list(set(r1).intersection(r2).difference({0})) for r1, r2 in zip(arr1, arr2)]
I have three rather large NumPy arrays with varying numbers of rows, whose first columns are all integers. My hope is to filter these arrays such that the only rows left are those for whom the value in the first column is shared by all three. This would leave three arrays of the same size. The entries in the other columns are not necessarily shared across arrays.
So, with input:
A =
[[1, 1],
[2, 2],
[3, 3],]
B =
[[2, 1],
[3, 2],
[4, 3],
[5, 4]]
C =
[[2, 2],
[3, 1],
[5, 2]]
I hope to get back as output:
A =
[[2, 2],
[3, 3]]
B =
[[2, 1],
[3, 2]]
C =
[[2, 2],
[3, 1]]
My current approach is to:

1. Find the intersection of the three first columns using numpy.intersect1d().
2. Use numpy.in1d() on this intersection and the first columns of each array to find the row indices that are not shared in each array (converting boolean to index using a modified version of the method found here: Python: intersection indices numpy array).
3. Finally, use numpy.delete() with each of these indices and its respective array to remove rows with non-shared entries in the first column, as sketched below.
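A rough sketch of that three-step approach, assuming A, B, and C are already NumPy arrays:

import numpy as np

# Step 1: intersect the first columns of all three arrays
shared = np.intersect1d(np.intersect1d(A[:, 0], B[:, 0]), C[:, 0])

# Steps 2 and 3: find the rows whose key is not shared, then delete them
filtered = []
for arr in (A, B, C):
    drop = np.where(~np.in1d(arr[:, 0], shared))[0]
    filtered.append(np.delete(arr, drop, axis=0))
A_out, B_out, C_out = filtered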
I'm wondering, however, if there might be a faster or more elegantly Pythonic way to go about this, something suited to very large arrays.
Your indices in your example are sorted and unique. Assuming this is no coincidence (and this situation often arises, or can easily be enforced), the following works:
from functools import reduce
import numpy as np

A = np.array(
    [[1, 1],
     [2, 2],
     [3, 3]])
B = np.array(
    [[2, 1],
     [3, 2],
     [4, 3],
     [5, 4]])
C = np.array(
    [[2, 2],
     [3, 1],
     [5, 2]])

# Intersect the first columns; the third argument (assume_unique=True)
# is safe here because the keys are unique
I = reduce(
    lambda l, r: np.intersect1d(l, r, True),
    (i[:, 0] for i in (A, B, C)))

print(A[np.searchsorted(A[:, 0], I)])
print(B[np.searchsorted(B[:, 0], I)])
print(C[np.searchsorted(C[:, 0], I)])
and in case the first column is not in sorted order (but is still unique):
C = np.array(
    [[9, 2],
     [1, 6],
     [5, 1],
     [2, 5],
     [3, 2]])

def index_by_first_column_entry(M, keys):
    colkeys = M[:, 0]
    # Sort the keys indirectly, search in sorted order,
    # then map back to the original row positions
    sorter = np.argsort(colkeys)
    index = np.searchsorted(colkeys, keys, sorter=sorter)
    return M[sorter[index]]

print(index_by_first_column_entry(C, I))
and make sure to change the True to False in
I = reduce(
    lambda l, r: np.intersect1d(l, r, False),
    (i[:, 0] for i in (A, B, C)))
A generalization to duplicate values can be made using np.unique.
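For example, one hypothetical way to apply that generalization is to reduce each array to unique first-column keys before intersecting; a sketch:

def unique_by_first_column(M):
    # Keep the first row for each distinct key, preserving original order
    _, idx = np.unique(M[:, 0], return_index=True)
    return M[np.sort(idx)]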
One way to do this is to build an indicator array, or a hash table if you like, to indicate which integers are in all your input arrays. Then you can use boolean indexing based on this indicator array to get the subarrays. Something like this:
import numpy as np
# Setup
A = np.array(
    [[1, 1],
     [2, 2],
     [3, 3]])
B = np.array(
    [[2, 1],
     [3, 2],
     [4, 3],
     [5, 4]])
C = np.array(
    [[2, 2],
     [3, 1],
     [5, 2]])

def take_overlap(*input):
    n = len(input)
    maxIndex = max(array[:, 0].max() for array in input)
    indicator = np.zeros(maxIndex + 1, dtype=int)
    for array in input:
        # Fancy-indexed += increments each key at most once per array,
        # even if a key appears several times in that array
        indicator[array[:, 0]] += 1
    indicator = indicator == n

    result = []
    for array in input:
        # Look up each integer in the indicator array
        mask = indicator[array[:, 0]]
        # Use boolean indexing to get the sub array
        result.append(array[mask])
    return result

subA, subB, subC = take_overlap(A, B, C)
This should be quite fast, and the method does not assume the elements of the input arrays are unique or sorted. However, it could take a lot of memory, and might be a bit slower, if the indexing integers are sparse, e.g. [1, 10, 10000], but it should be close to optimal if the integers are more or less dense.
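For the example arrays above, the call reproduces the expected output:

print(subA)  # [[2 2]
             #  [3 3]]
print(subB)  # [[2 1]
             #  [3 2]]
print(subC)  # [[2 2]
             #  [3 1]]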
This works, but I'm not sure if it is faster than any of the other answers:
import numpy as np
A = np.array(
    [[1, 1],
     [2, 2],
     [3, 3]])
B = np.array(
    [[2, 1],
     [3, 2],
     [4, 3],
     [5, 4]])
C = np.array(
    [[2, 2],
     [3, 1],
     [5, 2]])

a = A[:, 0]
b = B[:, 0]
c = C[:, 0]

# Pairwise equality via broadcasting: (row, column) index pairs of matches
ab = np.where(a[:, np.newaxis] == b[np.newaxis, :])
bc = np.where(b[:, np.newaxis] == c[np.newaxis, :])

# Keep only the matches that chain through all three arrays
ab_in_bc = np.in1d(ab[1], bc[0])
bc_in_ab = np.in1d(bc[0], ab[1])

arows = ab[0][ab_in_bc]
brows = ab[1][ab_in_bc]
crows = bc[1][bc_in_ab]

anew = A[arows, :]
bnew = B[brows, :]
cnew = C[crows, :]
print(anew)
print(bnew)
print(cnew)
print(anew)
print(bnew)
print(cnew)
gives:
[[2 2]
 [3 3]]
[[2 1]
 [3 2]]
[[2 2]
 [3 1]]