ndarray delete rows by indexes from condition in another array

I have two ndarrays, where the length of the first dimension of X is the same as the size of y:
X = np.asarray([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[3, 6, 1]])
y = np.asarray([1, 0, 2, 3])
and I have a list:
l = [0, 2, 7]
I want to delete every row of X whose corresponding value in y (the element at the same index) is in l.
So in that case, I will have:
X = np.asarray([[1, 2, 3],
[3, 6, 1]])
That is because the 2nd and 3rd elements of y (0 and 2) are in l. Therefore, the 2nd and 3rd rows should be deleted from X.
How can it be done?

A simple one-liner uses np.delete together with np.argwhere and np.isin:
X = np.delete(X, np.argwhere(np.isin(y, l)).flatten(), axis=0)
Output
array([[1, 2, 3],
[3, 6, 1]])
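As an alternative sketch (not part of the answer above), the same rows can be kept with a boolean mask instead of np.delete. The names keep and X_kept are only illustrative:

```python
import numpy as np

X = np.asarray([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9],
                [3, 6, 1]])
y = np.asarray([1, 0, 2, 3])
l = [0, 2, 7]

# True where the value in y is NOT in l, i.e. the rows we want to keep
keep = ~np.isin(y, l)

# Boolean indexing along the first axis selects the kept rows
X_kept = X[keep]
print(X_kept)
```

This avoids building an explicit index array, and the mask can be reused to filter y in the same way (y[keep]).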

Related

Different length sum on numpy array axis

Suppose I have a NumPy 3D tensor D of dimensions r x c x d, such as:
r = 2
c = 3
d = 3
D = np.array([[[1, 5, 3], [1, 2, 5], [1, 4, 3]], [[1, 1, 6], [3, 1, 7], [5, 1, 3]]])
array([[[1, 5, 3],
[1, 2, 5],
[1, 4, 3]],
[[1, 1, 6],
[3, 1, 7],
[5, 1, 3]]])
and a 2D integer matrix Q of dimensions r x c, such as:
Q = np.array([[1, 1, 2], [2, 1, 2]])
array([[1, 1, 2],
[2, 1, 2]])
where every element in Q is less than d.
I need to sum the first Q[r_i][c_i] + 1 elements along the third dimension of D for every 0 <= r_i < r and 0 <= c_i < c.
The expected result (Res) for the example above is a 2D matrix of shape r x c (2x3):
Res = np.array([[6, 3, 8], [8, 4, 9]])
array([[6, 3, 8],
[8, 4, 9]])
My current solution uses a list comprehension looping over r_i and c_i:
r = 2
c = 3
res = np.array([[np.sum(D[r_i, c_i, :Q[r_i, c_i]+1]) for c_i in range(c)] for r_i in range(r)])
Is there a more efficient or elegant way to solve this problem?

Let us try:
# this is equivalent to double loop on r_i, c_i
x,y = np.ogrid[:r, :c]
# we take the cumsum on the last axis,
# then extract the Q[r_i, c_i]'th sum at r_i, c_i
out = D.cumsum(axis=-1)[x,y, Q]
Output:
array([[6, 3, 8],
[8, 4, 9]])
Cross check
np.allclose(out, res)
# True
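A self-contained variant of the same cumsum idea, sketched here with np.take_along_axis instead of np.ogrid. The variable names cs, out, and res are only illustrative:

```python
import numpy as np

D = np.array([[[1, 5, 3], [1, 2, 5], [1, 4, 3]],
              [[1, 1, 6], [3, 1, 7], [5, 1, 3]]])
Q = np.array([[1, 1, 2], [2, 1, 2]])

# Running sums along the last axis: cs[i, j, k] == D[i, j, :k + 1].sum()
cs = D.cumsum(axis=-1)

# For each (i, j), pick the running sum at position Q[i, j] on the last axis.
# Q[..., None] adds a trailing axis so its shape lines up with cs.
out = np.take_along_axis(cs, Q[..., None], axis=-1)[..., 0]

# Reference result from the explicit double loop used in the question
r, c = Q.shape
res = np.array([[D[i, j, :Q[i, j] + 1].sum() for j in range(c)]
                for i in range(r)])
print(out)
```

Both versions compute the full cumulative sum first, so they trade some extra memory for avoiding the Python-level loop.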

How to loop back to beginning of the array for out of bounds index in numpy?

I have a 2D numpy array that I want to extract a submatrix from.
I get the submatrix by slicing the array as below.
Here I want a 3*3 submatrix around an item at the index of (2,3).
>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> a[1:4, 2:5]
array([[6, 7],
[0, 1],
[4, 5]])
But for indexes that are out of range, I want it to wrap around to the beginning of the array and continue from there. This is the result I want:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
I know that I can do things like taking the index modulo the width of the array, but I'm looking for a NumPy function that does that.
Also, for a one-dimensional array this will cause an index out of range error, which is not really useful...

This is one way using np.pad with wraparound mode.
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> pad_width = 1
>>> i, j = 2, 3
>>> startrow, endrow = i-1+pad_width, i+2+pad_width # for 3 x 3 submatrix
>>> startcol, endcol = j-1+pad_width, j+2+pad_width
>>> np.pad(a, (pad_width, pad_width), 'wrap')[startrow:endrow, startcol:endcol]
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
Depending on the shape of your patch (e.g. 5 x 5 instead of 3 x 3), you can increase pad_width and adjust the start and end row and column indices accordingly.

np.take has a mode parameter that can wrap around out-of-bounds indices. But it's a bit hacky to use np.take for multidimensional arrays, since the axis must be a scalar.
However, in your particular case you could do this:
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
np.take(a, np.r_[2:5], axis=1, mode='wrap')[1:4]
Output:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
EDIT
This function might be what you are looking for:
def select3x3(a, idx):
    x, y = idx
    return np.take(np.take(a, np.r_[x-1:x+2], axis=0, mode='wrap'), np.r_[y-1:y+2], axis=1, mode='wrap')
But in retrospect, I recommend using modulo and fancy indexing for this kind of operation (it's basically what mode='wrap' does internally anyway):
def select3x3(a, idx):
    x, y = idx
    return a[np.r_[x-1:x+2][:,None] % a.shape[0], np.r_[y-1:y+2][None,:] % a.shape[1]]
The above solution also works for a 2D array a of any shape.
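The modulo idea can be sketched for an arbitrary odd patch size. The helper wrapped_patch and its size parameter are hypothetical names introduced here for illustration, not something from the answers above:

```python
import numpy as np

def wrapped_patch(a, center, size=3):
    """Return a size x size patch of the 2D array `a` centred on `center`,
    wrapping around at the edges. Assumes `size` is odd."""
    half = size // 2
    i, j = center
    # Row and column indices of the patch, wrapped into the valid range
    rows = np.arange(i - half, i - half + size) % a.shape[0]
    cols = np.arange(j - half, j - half + size) % a.shape[1]
    # np.ix_ builds an open mesh so every (row, col) combination is selected
    return a[np.ix_(rows, cols)]

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 0, 1],
              [2, 3, 4, 5]])
patch = wrapped_patch(a, (2, 3))
print(patch)
```

Because the indices are reduced modulo the array shape, the same function also handles centres on the top or left edge, where the start index is negative.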

Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array

Suppose I have two NumPy arrays
x = [[5, 2, 8],
[4, 9, 1],
[7, 8, 9],
[1, 3, 5],
[1, 2, 3],
[1, 2, 4]]
y = [0, 0, 1, 1, 1, 2]
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[5, 2, 8],
[4, 9, 1]]
z_1 = [[7, 8, 9],
[1, 3, 5],
[1, 2, 3]]
z_2 = [[1, 2, 4]]
Assuming that y starts with zero and is sorted in ascending order, what is the most efficient way to do this?
Note: This question is the sorted version of this question:
Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

If y is grouped (doesn't have to be sorted), you can use diff to get the split points:
indices = np.flatnonzero(np.diff(y)) + 1
You can pass those directly to np.split:
z = np.split(x, indices, axis=0)
If you want to know the labels too:
labels = y[np.r_[0, indices]]
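Put together on the data from the question, the steps above might look like the following sketch. It assumes x and y are converted to NumPy arrays first, since indexing y with an index array does not work on a plain list:

```python
import numpy as np

x = np.array([[5, 2, 8],
              [4, 9, 1],
              [7, 8, 9],
              [1, 3, 5],
              [1, 2, 3],
              [1, 2, 4]])
y = np.array([0, 0, 1, 1, 1, 2])

# Positions where the label changes; +1 turns them into split points
indices = np.flatnonzero(np.diff(y)) + 1

# One sub-array per run of equal labels
z = np.split(x, indices, axis=0)

# The label of each run is the label at its first row
labels = y[np.r_[0, indices]]

for label, block in zip(labels, z):
    print(label, block.tolist())
```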

How to sort an nx3 numpy array by column(s) but it remembers the data in that row?

First off, I'm very new to Python, so any tips/help are really appreciated.
Essentially, I want an nx3 NumPy array to be sorted first by the second column and then by the third, while all of the data in each row stays together.
Like so:
import numpy as np
a = np.array([[20, 2, 4],
[7, 5, 6],
[25, 1, 5],
[2, 2, 3],
[3, 5, 8],
[4, 1, 3]])
......... (n times)
In this array the first column represents the value, the second its x coordinate, and the third its y coordinate. What is the best way to sort the array in ascending order first by the x coordinate and then by the y coordinate, while each value stays assigned to its x and y coordinates?
So after the sort, it looks like this:
a = ([[4, 1, 3],
[25, 1, 5],
[2, 2, 3],
[20, 2, 4],
[7, 5, 6],
[3, 5, 8]])
......... (n times)
As you can see, it first sorts by the x coordinate and then, among the rows that share the same x coordinate, sorts by the y coordinate. For example, it first finds all rows with an x coordinate of 1 and sorts those by y. The value, x, and y coordinates all remain on the same row.

The easiest way is to convert it into a pandas DataFrame, which is easier to manipulate.
import pandas as pd
df = pd.DataFrame({'a': [6, 2, 1], 'b': [4, 5, 6]})
print(df)
Out
   a  b
0  6  4
1  2  5
2  1  6
sorteddf = df.sort_values(by='a')
print(sorteddf)
Out
   a  b
2  1  6
1  2  5
0  6  4

Take a look at the 'order' parameter: https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
import numpy as np
dtype = [('x',int),('y',int)]
values = [(1,7),(3,4),(1,4),(1,3)]
a = np.array(values,dtype=dtype)
print(np.sort(a, order=['x','y']))

The easiest way is to sort by y first and then sort the result by x using a stable sort, so that rows with equal values of x keep their y ordering. The default argsort algorithm is not guaranteed to be stable, so the second sort should pass kind='stable'.
Full code:
import numpy as np
a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])
a = a[a[:,2].argsort()] # sort by column 2 (y)
a = a[a[:,1].argsort(kind='stable')] # stable sort by column 1 (x), keeping the y order within equal x
result
a
array([[ 4, 1, 3],
[25, 1, 5],
[ 2, 2, 3],
[20, 2, 4],
[ 7, 5, 6],
[ 3, 5, 8]])
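The two-pass sort can also be written as a single call to np.lexsort, shown here as an alternative sketch. Note that np.lexsort treats the last key in the tuple as the primary key, so x (column 1) goes last:

```python
import numpy as np

a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])

# Keys are given from least to most significant:
# y (column 2) breaks ties, x (column 1) is the primary key
order = np.lexsort((a[:, 2], a[:, 1]))

# Reordering whole rows keeps each value with its x and y coordinates
a_sorted = a[order]
print(a_sorted)
```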

Python: How to create a list of integers depending on a specific distribution

Is there a way in Python/NumPy/SciPy to dynamically create a list of integers in a specific range, which can vary, and in which the numbers are ordered according to a distribution such as normal (Gaussian), exponential, or linear? I imagine something like this for range 3:
[1,2,3]
[2,1,2]
[1,2,1]
[3,2,1]
for range 4:
[1,2,3,4]
[2,1,1,2]
[1,2,2,1]
[4,3,2,1]
for range 5:
[1,2,3,4,5]
[2,1,0,1,2]
[1,2,3,2,1]
[5,4,3,2,1]

We could use a bit of trickery using np.minimum to generate the symmetrical version in third row. The second row is just a complement of the third row subtracted from 3. The first and last rows are just ranges starting from 1 till n and flipped version of it respectively.
Thus, we would have one approach after row-stacking those rows to have a 2D array, like so -
def ranged_arr(n):
    r = np.arange(n)+1
    row3 = np.minimum(r,r[::-1])
    return np.c_[r, 3-row3, row3, r[::-1]].T
We could also use np.row_stack (an alias of np.vstack) inside the function to do the stacking -
np.row_stack((r, 3-row3, row3, r[::-1]))
Sample runs -
In [106]: ranged_arr(n=3)
Out[106]:
array([[1, 2, 3],
[2, 1, 2],
[1, 2, 1],
[3, 2, 1]])
In [107]: ranged_arr(n=4)
Out[107]:
array([[1, 2, 3, 4],
[2, 1, 1, 2],
[1, 2, 2, 1],
[4, 3, 2, 1]])
In [108]: ranged_arr(n=5)
Out[108]:
array([[1, 2, 3, 4, 5],
[2, 1, 0, 1, 2],
[1, 2, 3, 2, 1],
[5, 4, 3, 2, 1]])
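For reference, a self-contained sketch of the same function using np.vstack for the stacking. One caveat worth noting: the constant 3 in the second row is taken from the answer as-is, so that row starts to contain negative numbers once n reaches 7:

```python
import numpy as np

def ranged_arr(n):
    # 1..n ascending
    r = np.arange(1, n + 1)
    # Symmetric "tent" row: rises to the middle, then falls
    row3 = np.minimum(r, r[::-1])
    # Rows: ascending, complement of the tent (3 - tent), tent, descending
    return np.vstack((r, 3 - row3, row3, r[::-1]))

print(ranged_arr(5))
```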
