Remove duplicate index in 2D array - python

I have this 2D numpy array here:
arr = np.array([[1,2],
[2,2],
[3,2],
[4,2],
[5,3]])
I would like to delete all duplicates corresponding to the previous index at index 1 and get an output like so:
np.array([[1,2],
[5,3]])
However, when I try my code it errors.
Here is my code:
for x in range(0, len(arr)):
    if arr[x][1] == arr[x-1][1]:
        arr = np.delete(arr, x, 0)
>>> IndexError: index 3 is out of bounds for axis 0 with size 2

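Rather than deleting from the array, you can use np.unique with return_index=True to find the index of the first occurrence of each unique value in the second column, and use those indices to select the rows (this drops every repeated value in that column, not only consecutive repeats, and returns the rows in order of the unique values):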
import numpy as np
arr = np.array([[1,2],
[2,2],
[3,2],
[4,2],
[5,3]])
u, i = np.unique(arr[:,1], return_index=True)
arr[i]
# array([[1, 2],
# [5, 3]])
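If only consecutive repeats should be dropped, as the phrase "previous index" in the question suggests, a boolean mask that compares each row with the one before it is one option. This is a minimal sketch, not the answerer's method; the extra row [6, 2] is an assumption added here to show where the two approaches differ.

```python
import numpy as np

# Same data as the question, plus one extra row (an assumption for illustration)
arr = np.array([[1, 2], [2, 2], [3, 2], [4, 2], [5, 3], [6, 2]])
col = arr[:, 1]

# Keep a row when its second-column value differs from the previous row's value.
keep = np.ones(len(arr), dtype=bool)   # the first row is always kept
keep[1:] = col[1:] != col[:-1]
consecutive = arr[keep]

# np.unique keeps only the first occurrence of each value overall;
# sorting the returned indices restores the original row order.
first_idx = np.sort(np.unique(col, return_index=True)[1])
first_only = arr[first_idx]

print(consecutive.tolist())  # [[1, 2], [5, 3], [6, 2]]
print(first_only.tolist())   # [[1, 2], [5, 3]]
```

The original loop fails because the array shrinks while range(0, len(arr)) was computed from the initial length; building a mask and indexing once avoids that.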

Related

Get values of pandas series from a array of index locations

I have a 2-d array of index locations into a pandas Series. I would like to create a 2-d array of the Series values that correspond to those locations.
For example:
import pandas as pd
import numpy as np
A = pd.Series(data=[1,2,3,4,5])
idx = np.array([[0,2,3],[2,3,1]])
I would like to return:
B = np.array([[1,3,4],[3,4,2]])
I know I could do this as a loop:
B = np.zeros((2,3))
for i in [0,1]:
    B[i,:] = A[idx[i]]
However, in practice I need to do this repeatedly, so I would like to broadcast the index locations directly. Pandas is not necessary; I am happy to do it all in numpy if that is easier.
Something like this might work:
A[idx.flatten()].values.reshape(idx.shape)
A[idx] gives a "Cannot index with multidimensional key" error.
In [190]: A = pd.Series(data=[1,2,3,4,5])
     ...: idx = np.array([[0,2,3],[2,3,1]])
But the 1d array derived from the Series can be indexed this way:
In [191]: A.values
Out[191]: array([1, 2, 3, 4, 5])
In [192]: A.values[idx]
Out[192]:
array([[1, 3, 4],
[3, 4, 2]])
numpy has no problems returning an array with a dimension that matches idx.
Indexing the Series like this returns a Series - which by definition is 1d:
In [194]: A[idx.ravel()]
Out[194]:
0 1
2 3
3 4
2 3
3 4
1 2
dtype: int64
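The array-based lookup above is positional. When the Series has a non-default index, positions and labels are no longer the same thing, so it may help to see both side by side. This is a sketch only; the custom index 10..14 and the `labels` array are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Series with a non-default index (assumed here to make the distinction visible)
A = pd.Series([1, 2, 3, 4, 5], index=[10, 11, 12, 13, 14])

# Positional lookup: idx holds 0-based positions, as in the question.
idx = np.array([[0, 2, 3], [2, 3, 1]])
B_pos = A.to_numpy()[idx]

# Label lookup: labels holds index labels rather than positions.
# .loc needs a 1-d key, so flatten first and restore the shape afterwards.
labels = np.array([[10, 12, 13], [12, 13, 11]])
B_lab = A.loc[labels.ravel()].to_numpy().reshape(labels.shape)

print(B_pos.tolist())  # [[1, 3, 4], [3, 4, 2]]
print(B_lab.tolist())  # [[1, 3, 4], [3, 4, 2]]
```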

How to add element to empty 2d numpy array

I'm trying to insert elements into an empty 2d numpy array. However, I am not getting what I want.
I tried np.hstack, but it only gives me a flat (1d) array. Then I tried np.append, but it gives me an error.
Error:
ValueError: all the input arrays must have same number of dimensions
randomReleaseAngle1 = np.random.uniform(20.0, 77.0, size=(5, 1))
randomVelocity1 = np.random.uniform(40.0, 60.0, size=(5, 1))
randomArray = np.concatenate((randomReleaseAngle1, randomVelocity1), axis=1)
arr1 = np.empty((2,2), float)
arr = np.array([])
for i in randomArray:
    data = [[170, 68.2, i[0], i[1]]]
    df = pd.DataFrame(data, columns = ['height', 'release_angle', 'velocity', 'holding_angle'])
    test_y_predictions = model.predict(df)
    print(test_y_predictions)
    if (np.any(test_y_predictions == 1)):
        arr = np.hstack((arr, np.array([i[0], i[1]])))
        arr1 = np.append(arr1, np.array([i[0], i[1]]), axis=0)
print(arr)
print(arr1)
I wanted to get something like
[[1.5,2.2],
[3.3,4.3],
[7.1,7.3],
[3.3,4.3],
[3.3,4.3]]
However, I'm getting
[56.60290125 49.79106307 35.45102444 54.89380834 47.09359271 49.19881675
22.96523274 44.52753514 67.19027156 54.10421167]
The recommended list append approach:
In [39]: alist = []
In [40]: for i in range(3):
    ...:     alist.append([i, i+10])
    ...:
In [41]: alist
Out[41]: [[0, 10], [1, 11], [2, 12]]
In [42]: np.array(alist)
Out[42]:
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
If we start with an np.empty((2,2)) array:
In [47]: arr = np.empty((2,2),int)
In [48]: arr
Out[48]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952]])
In [49]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[49]:
array([[139934912589760, 139934912589784],
[139934871674928, 139934871674952],
[ 1, 10],
[ 2, 11]])
Note that empty does not mean the same thing as the list []. It's a real 2x2 array, with 'unspecified' values. And those values remain when we add other arrays to it.
I could start with an array with a 0 dimension:
In [51]: arr = np.empty((0,2),int)
In [52]: arr
Out[52]: array([], shape=(0, 2), dtype=int64)
In [53]: np.concatenate((arr, [[1,10]],[[2,11]]), axis=0)
Out[53]:
array([[ 1, 10],
[ 2, 11]])
That looks more like the list append approach. But why start with the (0,2) array in the first place?
np.concatenate takes a list of arrays (or lists that can be made into arrays). I used nested lists that make (1,2) arrays. With this I can join them on axis 0.
Each concatenate call makes a new array, so doing it iteratively is more expensive than the list append.
np.append just takes 2 arrays and does a concatenate, so it doesn't add much. hstack tweaks shapes and joins on the 2nd (horizontal) dimension (for 1d inputs it joins on the only axis). vstack is another variant. But they all end up using concatenate.
With the hstack method, you can just reshape after you get the final array:
arr = arr.reshape(-1, 2)
print(arr)
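The other method can be done in a similar way, provided arr1 starts out as np.array([]) (or np.empty((0, 2))) rather than np.empty((2,2)), so that no uninitialized values end up in the result:
arr1 = np.append(arr1, np.array([i[0], i[1]])) # in the loop
arr1 = arr1.reshape(-1, 2)
print(arr1)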
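Putting the list-append advice together with the question's loop, a sketch might look like the following. The `is_accepted` function is a hypothetical stand-in for the `model.predict(df) == 1` check, since the model is not shown in the question; the threshold it uses is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(20.0, 77.0, size=5)
velocities = rng.uniform(40.0, 60.0, size=5)

def is_accepted(angle, velocity):
    # Hypothetical placeholder for the question's model.predict(df) == 1 test
    return velocity > 45.0

# Collect accepted pairs in a plain Python list; appending to a list is cheap.
rows = []
for angle, velocity in zip(angles, velocities):
    if is_accepted(angle, velocity):
        rows.append([angle, velocity])

# Convert once at the end. reshape(-1, 2) keeps the result 2-d
# (shape (0, 2)) even if no pair was accepted.
result = np.array(rows, dtype=float).reshape(-1, 2)
print(result.shape)
```

Converting once at the end avoids allocating a new array on every iteration, which is what repeated np.append or np.hstack calls do.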

How to index and assign to a tensor in tensorflow?

I have a tensor as follows and a numpy 2D array
k = 1
mat = np.array([[1,2],[3,4],[5,6]])
for row in mat:
    values_zero, indices_zero = tf.nn.top_k(row, len(row) - k)
    row[indices_zero] = 0 #????
I want to assign the elements in that row to be zero at those indices. However I can't index a tensor and assign to it as well. I have tried using the tf.gather function but how can I do an assignment? I want to keep it as a tensor and then run it in a session at the end if that is possible.
I guess you are trying to mask the maximum in each row to zero? If so, I would do it like this. The idea is to create the tensor by construction rather than assignment.
import numpy as np
import tensorflow as tf
mat = np.array([[1, 2], [3, 4], [5, 6]])
# All tensorflow from here
tmat = tf.convert_to_tensor(mat)
# Get index of maximum
max_inds = tf.argmax(mat, axis=1)
# Create an array of column indices in each row
shape = tmat.get_shape()
inds = tf.range(0, shape[1], dtype=max_inds.dtype)[None, :]
# Create boolean mask of maximums
bmask = tf.equal(inds, max_inds[:, None])
# Convert boolean mask to ones and zeros
imask = tf.where(bmask, tf.zeros_like(tmat), tf.ones_like(tmat))
# Create new tensor that is masked with maximums set to zero
newmat = tmat * imask
with tf.Session() as sess:
    print(newmat.eval())
which outputs
[[1 0]
[3 0]
[5 0]]
One way to do this is by advanced indexing:
In [87]: k = 1
In [88]: mat = np.array([[1,2],[3,4],[5,6]])
# `sess` is tf.InteractiveSession()
In [89]: vals, idxs = sess.run(tf.nn.top_k(mat, k=1))
In [90]: idxs
Out[90]:
array([[1],
[1],
[1]], dtype=int32)
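# Pair each row with its own top-1 column, so rows whose maximum sits in a different column are handled too
In [91]: mat[np.arange(mat.shape[0]), idxs[:, 0]] = 0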
In [92]: mat
Out[92]:
array([[1, 0],
[3, 0],
[5, 0]])
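For comparison, the same per-row masking can be sketched in plain NumPy, without a TensorFlow session. This is an illustration only, not a replacement for keeping the data as a tensor; the matrix below is changed from the question's so that the row maxima fall in different columns.

```python
import numpy as np

# Example matrix (assumed for illustration): maxima are in columns 1, 0, 1
mat = np.array([[1, 2], [4, 3], [5, 6]])
k = 1

# Column indices of the k largest entries in each row.
# argpartition does not sort fully; order within the k entries is arbitrary.
top_idx = np.argpartition(mat, -k, axis=1)[:, -k:]

# Zero those entries row by row on a copy, leaving the input untouched.
masked = mat.copy()
np.put_along_axis(masked, top_idx, 0, axis=1)

print(masked.tolist())  # [[1, 0], [0, 3], [5, 0]]
```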

How to group values in matrix with items of unequal length

Let's say I have a simple array:
a = np.arange(3)
And an array of indices with the same length:
I = np.array([0, 0, 1])
I now want to group the values based on the indices.
How would I group the elements of the first array to produce the result below?
np.array([[0, 1], [2]], dtype=object)
Here is what I tried:
a = np.arange(3)
I = np.array([0, 0, 1])
out = np.empty(2, dtype=object)
out.fill([])
aslists = np.vectorize(lambda x: [x], otypes=['object'])
out[I] += aslists(a)
However, this approach does not concatenate the lists, but only maintains the last value for each index:
array([[1], [2]], dtype=object)
Or, for a 2-dimensional case:
a = np.random.rand(100)
I = (np.random.random(100) * 5 //1).astype(int)
J = (np.random.random(100) * 5 //1).astype(int)
out = np.empty((5, 5), dtype=object)
out.fill([])
How can I append the items from a to out based on the two index arrays?
1D Case
Assuming I is sorted, for a list of arrays as output -
idx = np.unique(I, return_index=True)[1]
out = np.split(a,idx)[1:]
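Another way, using slicing to get the split indices for a -
out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
To get an array of lists as output (dtype=object is needed on recent NumPy versions when the groups have different lengths) -
np.array([i.tolist() for i in out], dtype=object)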
Sample run -
In [84]: a = np.arange(3)
In [85]: I = np.array([0, 0, 1])
In [86]: out = np.split(a, np.flatnonzero(I[1:] != I[:-1])+1)
In [87]: out
Out[87]: [array([0, 1]), array([2])]
In [88]: np.array([i.tolist() for i in out], dtype=object)
Out[88]: array([[0, 1], [2]], dtype=object)
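The split-based approach relies on I being sorted. If it is not, one option is to sort with a stable argsort first, so that values keep their original relative order within each group. A minimal sketch, with example data assumed for illustration:

```python
import numpy as np

a = np.array([10, 11, 12, 13])
I = np.array([1, 0, 1, 0])   # unsorted group labels (example data)

# A stable sort groups equal labels together while preserving
# the original order of the values inside each group.
order = np.argsort(I, kind="stable")
I_sorted = I[order]

# Split wherever the sorted label changes.
boundaries = np.flatnonzero(I_sorted[1:] != I_sorted[:-1]) + 1
groups = [g.tolist() for g in np.split(a[order], boundaries)]

print(groups)  # [[11, 13], [10, 12]]
```

Note that labels which never occur in I produce no group at all, so the list position of a group equals its label only when every label from 0 upward is present.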
2D Case
For the 2D case, where the groups are placed into a 2D array at the rows and columns given by the index arrays I and J, we could do something like this -
ncols = 5
lidx = I*ncols+J
sidx = lidx.argsort() # Use kind='mergesort' to keep order
lidx_sorted = lidx[sidx]
unq_idx, split_idx = np.unique(lidx_sorted, return_index=True)
out.flat[unq_idx] = np.split(a[sidx], split_idx)[1:]
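A self-contained variant of the 2D case is sketched below. It assigns each group with an explicit loop over the occupied cells rather than a single assignment through out.flat, which sidesteps NumPy's handling of ragged sequences on assignment. The function name `group_by_cell` and the small inputs are assumptions made for illustration.

```python
import numpy as np

def group_by_cell(a, I, J, nrows, ncols):
    """Return an (nrows, ncols) object array whose cell (i, j) is the list
    of values a[k] for which I[k] == i and J[k] == j."""
    lidx = I * ncols + J                    # linear (flattened) cell index
    sidx = np.argsort(lidx, kind="stable")  # stable: keeps original order per cell
    unq, start = np.unique(lidx[sidx], return_index=True)
    groups = np.split(a[sidx], start[1:])   # one chunk per occupied cell

    out = np.empty(nrows * ncols, dtype=object)
    for n in range(out.size):
        out[n] = []                         # a distinct empty list per cell
    for cell, g in zip(unq, groups):
        out[cell] = g.tolist()
    return out.reshape(nrows, ncols)

a = np.array([10.0, 20.0, 30.0, 40.0])
I = np.array([0, 0, 1, 0])
J = np.array([1, 1, 0, 1])
out = group_by_cell(a, I, J, nrows=2, ncols=2)
print(out[0, 1])  # [10.0, 20.0, 40.0]
print(out[1, 0])  # [30.0]
```

Filling the cells one at a time also avoids the pitfall in the question's out.fill([]), which puts the same list object into every cell.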

Find Indices Of Columns Having Some Nonzero Element In A 2d array

I have a numpy array with dim (157,1944).
I want to get indices of columns that have a Nonzero element in any row.
example: [[0,0,3,4], [0,0,1,1]] ----> [2,3]
If you look at each row, there is a nonzero element in columns [2, 3]
So if I have
[[0,1,3,4], [0,0,1,1]]
I should get [1,2,3] because column index 0 has no Nonzero elements in any row.
Not sure if your question is completely defined. However, say we start with
import numpy as np
a = np.array([[0,0,3,4], [0,0,1,1]])
then
>>> np.nonzero(np.all(a != 0, axis=0))[0]
array([2, 3])
are the indices of the columns for which none of the rows are zero (every row has a nonzero element there), and
>>> np.nonzero(np.any(a != 0, axis=0))[0]
array([2, 3])
are the indices of the columns for which not all of the rows are zero (it happens to be the same for the example you gave).
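The two expressions give different results on the question's second example, which makes the any/all distinction easier to see. A short sketch using the array methods directly (equivalent to comparing with `!= 0` for numeric data):

```python
import numpy as np

a = np.array([[0, 1, 3, 4], [0, 0, 1, 1]])

# Columns with a nonzero element in at least one row
any_nonzero = np.flatnonzero(a.any(axis=0))

# Columns that are nonzero in every row
all_nonzero = np.flatnonzero(a.all(axis=0))

print(any_nonzero.tolist())  # [1, 2, 3]
print(all_nonzero.tolist())  # [2, 3]
```

The question's expected output [1, 2, 3] corresponds to the `any` version.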
