Find indices of unique values of a 3-dim numpy array - python

I have an array with coordinates of N points. Another array contains the masses of these N points.
>>> import numpy as np
>>> N=10
>>> xyz=np.random.randint(0,2,(N,3))
>>> mass=np.random.rand(len(xyz))
>>> xyz
array([[1, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [0, 0, 0],
       [0, 1, 0],
       [1, 1, 0],
       [1, 0, 1],
       [0, 0, 1],
       [1, 0, 1],
       [0, 0, 1]])
>>> mass
array([ 0.38668401, 0.44385111, 0.47756182, 0.74896529, 0.20424403,
        0.21828435, 0.98937523, 0.08736635, 0.24790248, 0.67759276])
Now I want to obtain an array with unique values of xyz and a corresponding array of summed up masses. That means the following arrays:
>>> xyz_unique
array([[0, 1, 1],
       [1, 1, 0],
       [0, 0, 1],
       [1, 0, 1],
       [0, 0, 0],
       [0, 1, 0]])
>>> mass_unique
array([ 0.47756182, 0.66213546, 0.76495911, 1.62396172, 0.74896529,
        0.20424403])
My attempt was the following code with a double for-loop:
>>> xyz_unique=np.array(list(set(tuple(p) for p in xyz)))
>>> mass_unique=np.zeros(len(xyz_unique))
>>> for j in np.arange(len(xyz_unique)):
...     indices=np.array([],dtype=np.int64)
...     for i in np.arange(len(xyz)):
...         if np.all(xyz[i]==xyz_unique[j]):
...             indices=np.append(indices,i)
...     mass_unique[j]=np.sum(mass[indices])
The problem is that this takes too long; my actual N is 100000.
Is there a faster way, or how could I improve my code?
EDIT: My coordinates are actually floating-point numbers. To keep things simple, I used random integers here so that duplicates occur even at low N.

Case 1: Binary numbers in xyz
If the elements in the input array xyz are only 0s and 1s, you can convert each row into an equivalent decimal number and then label each row by which unique decimal value it matches. Based on those labels, you can use np.bincount to accumulate the summations, much like accumarray in MATLAB. Here's the implementation to achieve all that -
import numpy as np
# Input arrays xyz and mass
xyz = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1]])
mass = np.array([ 0.38668401, 0.44385111, 0.47756182, 0.74896529, 0.20424403,
                  0.21828435, 0.98937523, 0.08736635, 0.24790248, 0.67759276])
# Convert each row entry in xyz into equivalent decimal numbers
dec_num = np.dot(xyz,2**np.arange(xyz.shape[1])[:,None])
# Get indices of the first occurrences of the unique values and also label each row
_, unq_idx,row_labels = np.unique(dec_num, return_index=True, return_inverse=True)
# Find unique rows from xyz array
xyz_unique = xyz[unq_idx,:]
# Accumulate the summations from mass based on the row labels
mass_unique = np.bincount(row_labels, weights=mass)
Output -
In [148]: xyz_unique
Out[148]:
array([[0, 0, 0],
       [0, 1, 0],
       [1, 1, 0],
       [0, 0, 1],
       [1, 0, 1],
       [0, 1, 1]])
In [149]: mass_unique
Out[149]:
array([ 0.74896529, 0.20424403, 0.66213546, 0.76495911, 1.62396172,
        0.47756182])
Case 2: Generic
For a general case, you can use this -
import numpy as np
# Perform lex sort and get the sorted indices
sorted_idx = np.lexsort(xyz.T)
sorted_xyz = xyz[sorted_idx,:]
# Differentiation along rows for sorted array
df1 = np.diff(sorted_xyz,axis=0)
df2 = np.append([True],np.any(df1!=0,1),0)
# Get unique sorted labels
sorted_labels = df2.cumsum(0)-1
# Get labels
labels = np.zeros_like(sorted_idx)
labels[sorted_idx] = sorted_labels
# Get unique indices
unq_idx = sorted_idx[df2]
# Get unique xyz's and the mass counts using accumulation with bincount
xyz_unique = xyz[unq_idx,:]
mass_unique = np.bincount(labels, weights=mass)
Sample run -
In [238]: xyz
Out[238]:
array([[1, 2, 1],
       [1, 2, 1],
       [0, 1, 0],
       [1, 0, 1],
       [2, 1, 2],
       [2, 1, 1],
       [0, 1, 0],
       [1, 0, 0],
       [2, 1, 0],
       [2, 0, 1]])
In [239]: mass
Out[239]:
array([ 0.5126308 , 0.69075674, 0.02749734, 0.384824 , 0.65151772,
        0.77718427, 0.18839268, 0.78364902, 0.15962722, 0.09906355])
In [240]: xyz_unique
Out[240]:
array([[1, 0, 0],
       [0, 1, 0],
       [2, 1, 0],
       [1, 0, 1],
       [2, 0, 1],
       [2, 1, 1],
       [1, 2, 1],
       [2, 1, 2]])
In [241]: mass_unique
Out[241]:
array([ 0.78364902, 0.21589002, 0.15962722, 0.384824 , 0.09906355,
        0.77718427, 1.20338754, 0.65151772])
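As an aside, on NumPy 1.13 or newer np.unique accepts an axis argument, so the row labelling in both cases (including the float coordinates mentioned in the question's edit, as long as exact equality is acceptable) can be delegated to it. A minimal sketch, assuming such a NumPy version:
import numpy as np
xyz = np.random.randint(0, 2, (10, 3))
mass = np.random.rand(len(xyz))
# Unique rows plus a label per original row, in a single call
xyz_unique, row_labels = np.unique(xyz, axis=0, return_inverse=True)
# ravel() guards against versions that return the inverse with an extra dimension
mass_unique = np.bincount(row_labels.ravel(), weights=mass)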

Related

Numpy create block matrices from object type numpy arrays

I am trying to create a block matrix with numpy, where each entry is a 4x4 matrix.
As an example, let's fill all the entries with 4x4 zero matrices.
N = 9
sizeOfBlock = 4
A = np.zeros((N, N), dtype =object)
for i in np.arange(N):
    for j in np.arange(N):
        A[i,j] = np.zeros((sizeOfBlock,sizeOfBlock))
This will create the matrix with the correct shape.
Now I would like to convert it from a 9x9 matrix of objects to a 36x36 matrix with all the entries.
Any way to do this?
Best Regards
Your use of an object dtype array has few, if any, advantages over a list of lists.
block can make an array from nested lists of arrays:
In [294]: A=np.arange(4).reshape(2,2); Z=np.zeros((2,2),int)
In [295]: np.block([[A,Z],[Z,A]])
Out[295]:
array([[0, 1, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 2, 3]])
An object dtype array:
In [296]: O = np.empty((2,2),object)
In [298]: O[0,0]=O[1,1]=A
In [300]: O[0,1]=O[1,0]=Z
In [301]: O
Out[301]:
array([[array([[0, 1],
               [2, 3]]), array([[0, 0],
                                [0, 0]])],
       [array([[0, 0],
               [0, 0]]), array([[0, 1],
                                [2, 3]])]], dtype=object)
block doesn't do anything useful with that:
In [302]: np.block(O)
Out[302]:
array([[array([[0, 1],
               [2, 3]]), array([[0, 0],
                                [0, 0]])],
       [array([[0, 0],
               [0, 0]]), array([[0, 1],
                                [2, 3]])]], dtype=object)
However, if we turn it into a list of lists, as in [295]:
In [303]: np.block(O.tolist())
Out[303]:
array([[0, 1, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 2, 3]])
The current code of block is obscured by the dispatching layer, but when I looked at it in the past, it used recursive hstack and vstack:
In [307]: np.vstack((np.hstack(O[0]),np.hstack(O[1])))
Out[307]:
array([[0, 1, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 2, 3]])
concatenate and family can handle 1d object dtype arrays, treating them as a sequence, or list, of arrays. They can't work directly with 2d arrays.
Comments suggest working with a 4d array:
In [308]: arr = np.array([[A,Z],[Z,A]])
In [309]: arr
Out[309]:
array([[[[0, 1],
         [2, 3]],
        [[0, 0],
         [0, 0]]],
       [[[0, 0],
         [0, 0]],
        [[0, 1],
         [2, 3]]]])
A simple reshape doesn't preserve the block layout:
In [310]: arr.reshape(4,4)
Out[310]:
array([[0, 1, 2, 3],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 1, 2, 3]])
but an axis swap will:
In [311]: arr.transpose(0,2,1,3).reshape(4,4)
Out[311]:
array([[0, 1, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 2, 3]])
For some layouts, a Kronecker product does the job:
In [315]: np.kron(np.eye(2).astype(int),A)
Out[315]:
array([[0, 1, 0, 0],
       [2, 3, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 2, 3]])
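Applied to the question's sizes, the axis-swap idea might look like the following sketch, which assumes the blocks are kept in a 4-D float array rather than an object array:
import numpy as np
N, sizeOfBlock = 9, 4
# store the blocks in a (9, 9, 4, 4) array instead of a (9, 9) object array
blocks = np.zeros((N, N, sizeOfBlock, sizeOfBlock))
blocks[0, 0] = np.eye(sizeOfBlock)   # example block, just so something is non-zero
# swap the middle axes so each block stays contiguous, then collapse to 36x36
big = blocks.transpose(0, 2, 1, 3).reshape(N * sizeOfBlock, N * sizeOfBlock)
print(big.shape)   # (36, 36)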

Count indices to array to produce heatmap

I'd like to accumulate indices that point into an m-by-n array into another array of that same shape, to produce a heatmap. For example, these indices:
[
[0, 1, 2, 0, 1, 2]
[0, 1, 0, 0, 0, 2]
]
would produce the following array:
[
[2, 0, 0]
[1, 1, 0]
[1, 0, 1]
]
I've managed to successfully implement an algorithm, but I started wondering whether there is already a built-in NumPy solution for this kind of problem.
Here's my code:
a = np.array([[0, 1, 2, 0, 1, 2], [0, 1, 0, 0, 0, 2]])
def _gather_indices(indices: np.ndarray, shape: tuple):
    heat = np.zeros(shape)
    for i in range(indices.shape[-1]):
        heat[tuple(indices[:, i])] += 1
    return heat
Two methods could be suggested.
With np.add.at -
heat = np.zeros(shape,dtype=int)
np.add.at(heat,(a[0],a[1]),1)
Or, with tuple() for a more aesthetic version -
np.add.at(heat,tuple(a),1)
With bincount -
idx = np.ravel_multi_index(a,shape)
np.bincount(idx,minlength=np.prod(shape)).reshape(shape)
Additionally, we could compute shape using the max-limits of the indices in a -
shape = a.max(axis=1)+1
Sample run -
In [147]: a
Out[147]:
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 0, 0, 0, 2]])
In [148]: shape = (3,3)
In [149]: heat = np.zeros(shape,dtype=int)
...: np.add.at(heat,(a[0],a[1]),1)
In [151]: heat
Out[151]:
array([[2, 0, 0],
       [1, 1, 0],
       [1, 0, 1]])
In [173]: idx = np.ravel_multi_index(a,shape)
In [174]: np.bincount(idx,minlength=np.prod(shape)).reshape(shape)
Out[174]:
array([[2, 0, 0],
       [1, 1, 0],
       [1, 0, 1]])
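Pulling these pieces together, the bincount route could replace the loop in the question's function; a sketch using the question's signature, with the shape inference made optional:
import numpy as np
def _gather_indices(indices: np.ndarray, shape: tuple = None) -> np.ndarray:
    # infer the grid shape from the largest index along each axis if not given
    if shape is None:
        shape = tuple(indices.max(axis=1) + 1)
    flat = np.ravel_multi_index(indices, shape)
    return np.bincount(flat, minlength=np.prod(shape)).reshape(shape)
a = np.array([[0, 1, 2, 0, 1, 2], [0, 1, 0, 0, 0, 2]])
print(_gather_indices(a, (3, 3)))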

Deleting row in numpy array based on condition

I have a 2D numpy array of shape [6,3] and I want to remove the rows whose third element is 0.
array([[0, 2, 1],   # Input
       [0, 1, 1],
       [1, 1, 0],
       [1, 0, 2],
       [0, 2, 0],
       [2, 1, 2]])
array([[0, 2, 1],   # Output
       [0, 1, 1],
       [1, 0, 2],
       [2, 1, 2]])
My code is:
positives = gt_boxes[np.where(gt_boxes[range(gt_boxes.shape[0]),2] != 0)]
It works, but is there a simpler way to do this?
You can use boolean indexing.
In [413]: x[x[:, -1] != 0]
Out[413]:
array([[0, 2, 1],
       [0, 1, 1],
       [1, 0, 2],
       [2, 1, 2]])
x[:, -1] will retrieve the last column
x[:, -1] != 0 returns a boolean mask
Use the mask to index into the original array
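For completeness, np.delete can express the same row removal, though the boolean mask above is usually the cleaner choice; a sketch using the question's gt_boxes:
positives = np.delete(gt_boxes, np.where(gt_boxes[:, -1] == 0)[0], axis=0)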

How do I set cell values in `np.array()` based on condition?

I have a numpy array and a list of valid values in that array:
import numpy as np
arr = np.array([[1,2,0], [2,2,0], [4,1,0], [4,1,0], [3,2,0], ... ])
valid = [1,4]
Is there a nice pythonic way to set all array values that are not in the list of valid values to zero, in-place? After this operation, the array should look like this:
[[1,0,0], [0,0,0], [4,1,0], [4,1,0], [0,0,0], ... ]
The following creates a copy of the array in memory, which is bad for large arrays:
arr = np.vectorize(lambda x: x if x in valid else 0)(arr)
It bugs me that, for now, I loop over each array element and set it to zero if it is not in the valid list.
Edit: I found an answer suggesting there is no in-place function to achieve this. Also, stop changing my whitespace. It's easier to see the changes in arr with it.
You can use np.place for an in-situ update -
np.place(arr,~np.in1d(arr,valid),0)
Sample run -
In [66]: arr
Out[66]:
array([[1, 2, 0],
       [2, 2, 0],
       [4, 1, 0],
       [4, 1, 0],
       [3, 2, 0]])
In [67]: np.place(arr,~np.in1d(arr,valid),0)
In [68]: arr
Out[68]:
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])
Along the same lines, np.put could also be used -
np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
Sample run -
In [70]: arr
Out[70]:
array([[1, 2, 0],
       [2, 2, 0],
       [4, 1, 0],
       [4, 1, 0],
       [3, 2, 0]])
In [71]: np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
In [72]: arr
Out[72]:
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])
Indexing with booleans would work too:
>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> arr[~np.in1d(arr, valid).reshape(arr.shape)] = 0
>>> arr
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])
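A small variant, assuming NumPy 1.13 or newer: np.isin is the recommended replacement for np.in1d, and it keeps the shape of its first argument, so no reshape is needed:
>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> valid = [1, 4]
>>> arr[~np.isin(arr, valid)] = 0   # np.isin preserves arr's shape
>>> arr
array([[1, 0, 0],
       [0, 0, 0],
       [4, 1, 0],
       [4, 1, 0],
       [0, 0, 0]])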

Sort numpy.array rows by indices

I have a 2D numpy.array and a tuple of indices:
a = array([[0, 0], [0, 1], [1, 0], [1, 1]])
ix = (2, 0, 3, 1)
How can I sort the array's rows by these indices? Expected result:
array([[1, 0], [0, 0], [1, 1], [0, 1]])
I tried using numpy.take, but it works as I expect only with 1D arrays.
You can in fact use ndarray.take() for this. The trick is to supply the second argument (axis):
>>> a.take(ix, 0)
array([[1, 0],
       [0, 0],
       [1, 1],
       [0, 1]])
(Without axis, the array is flattened before elements are taken.)
Alternatively:
>>> a[ix, ...]
array([[1, 0],
       [0, 0],
       [1, 1],
       [0, 1]])
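For reference, the module-level np.take does the same thing when the axis is passed explicitly:
>>> np.take(a, ix, axis=0)
array([[1, 0],
       [0, 0],
       [1, 1],
       [0, 1]])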
