reduce ndarray dimension to max nonzero index - python

I have a large ndarray and I want to reduce its first dimension, replacing it with the highest nonzero index along that dimension. For example, if [:,0,0] is (0,1,6,0,0), I would like [0,0] of the new array to equal "2". I've run ndarray.nonzero, but I'm unsure where to go from there. Any help appreciated. A little code below to get started:
import numpy as np
myary = np.random.randint(0,4,(10,5,5))
newary = np.nonzero(myary)

"Mask, flip and argmax" solution
You can still use argmax for another purpose: getting the first Trues along the flipped subarrays.
Condensed version
def last_nonzero(arr):
    return arr.shape[-1] - np.argmax(np.flip(arr > 0, axis=-1), axis=-1) - 1
last_nonzero(np.array([[0, 1, 6, 0, 0], [1, 2, 0, 3, 0]]))
# > array([2, 3], dtype=int64)
Explanation
Just generating sparse data:
import numpy as np
myary = np.random.randint(0, 2, (2, 3, 4)) * np.random.randint(0, 20, (2, 3, 4))
# > array([[[18,  5,  3, 17],
#           [ 0,  0,  0,  0],
#           [ 0, 14,  0, 17]],
#          [[ 0,  0, 10,  8],
#           [ 0, 18, 13,  0],
#           [ 7,  3,  0,  0]]])
We can now mask for values > 0 to get a boolean array, then flip it along the last axis:
flipped_mask = np.flip(myary>0, axis=-1)
# > array([[[ True,  True,  True,  True],
#           [False, False, False, False],
#           [ True, False,  True, False]],
#          [[ True,  True, False, False],
#           [False,  True,  True, False],
#           [False, False,  True,  True]]])
Now we can take the argmax along the last axis. It will give us the index of the first True in each subarray of the mask.
max_mask = np.argmax(flipped_mask, axis=-1)
# > array([[0, 0, 0],
#          [0, 1, 2]], dtype=int64)
Finally, since we processed the flipped version, we must "flip back" the indices by subtracting them from the length of the subarrays (minus 1 because of zero-indexing):
last_nonzero = myary.shape[-1] - max_mask - 1
# > array([[3, 3, 3],
#          [3, 2, 1]], dtype=int64)
Observation
If a "subarray" contains only zeros, this solution returns the last index of the subarray.
last_nonzero(np.array([[0,1,2,3,0], [0, 0, 0, 0, 0]]))
# > array([3, 4], dtype=int64)
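If that behaviour is undesirable, a small variant can special-case the all-zero subarrays (a sketch, not part of the original answer; the sentinel -1 is an arbitrary choice here):

```python
import numpy as np

def last_nonzero_safe(arr):
    # Same flip-and-argmax trick as above, but return -1 (an arbitrary
    # sentinel) for subarrays that contain no nonzero value at all.
    mask = arr > 0
    idx = arr.shape[-1] - np.argmax(np.flip(mask, axis=-1), axis=-1) - 1
    return np.where(mask.any(axis=-1), idx, -1)

print(last_nonzero_safe(np.array([[0, 1, 2, 3, 0], [0, 0, 0, 0, 0]])))
# > [ 3 -1]
```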

Related

How to efficiently filter/create mask of numpy.array based on list of tuples

I am trying to create a mask of a numpy.array based on a list of tuples. Here is my solution, which produces the expected result:
import numpy as np
filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = np.array([], dtype=bool)
for f_val in filter_vals:
    if mask.size == 0:
        mask = (data == f_val).all(-1)
    else:
        mask = mask | (data == f_val).all(-1)
Output/mask:
array([[False,  True, False],
       [False,  True,  True],
       [ True, False, False]])
The problem is that with a bigger data array and an increasing number of tuples in filter_vals, it gets slower.
Is there any better solution? I tried np.isin(data, filter_vals), but it does not produce the result I need.
A classical approach using broadcasting would be:
*A, B = data.shape
(data.reshape((-1,B)) == np.array(filter_vals)[:,None]).all(-1).any(0).reshape(A)
This will however be memory expensive. So applicability really depends on your use case.
output:
array([[False,  True, False],
       [False,  True,  True],
       [ True, False, False]])
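The same broadcasting idea can also be written without the intermediate reshape, by comparing each innermost vector against all filter tuples at once (a sketch of an equivalent formulation, not from the original answer):

```python
import numpy as np

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])

# data[..., None, :] has shape (3, 3, 1, 3); comparing it with the
# (3, 3)-shaped filter array broadcasts to (3, 3, 3, 3).
# all(-1) tests full-tuple equality, any(-1) ORs over the filter tuples.
mask = (data[..., None, :] == np.array(filter_vals)).all(-1).any(-1)
```

It performs the same number of elementwise comparisons, so the memory caveat above still applies.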

is there a method for finding the indexes of a 2d-array based on a given array

Suppose we have two arrays like these two:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I intend to write code that finds the indices in an array such as A based on the values in array B.
for example, I want the final results to be something like this:
C = [[0 1]
     [0 4]
     [1 0]
     [1 2]]
Can anybody provide me with a solution or a hint?
Do you mean?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False,  True, False, False,  True],
       [ True, False,  True, False, False],
       [False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
       [0, 4],
       [1, 0],
       [1, 2]])
B has shape (1, 4), where the initial 1 isn't necessary; that's why I used B[0], though isin (via in1d) ravels it anyway.
The where result is often more useful:
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False,  True, False, False,  True],
       [ True, False,  True, False, False],
       [False, False, False, False, False]])
You can try it this way, using np.where():
index = []
for num in B:
    for nums in num:
        x, y = np.where(A == nums)
        index.append([x, y])
print(index)
# [[array([0]), array([1])], [array([0]), array([4])],
#  [array([1]), array([0])], [array([1]), array([2])]]
With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
       [0, 4],
       [1, 0],
       [1, 2]], dtype=int64)

Changing the array value above the first non-zero element in the column

I'm looking for a vectorized way to change the array values above the first non-zero element in each column.
for x in range(array.shape[1]):
    for y in range(array.shape[0]):
        if array[y, x] > 0:
            break
        else:
            array[y, x] = 255
As you wrote about an array (not a DataFrame), I assume that you have
a Numpy array and want to use Numpy methods.
To do your task, run the following code:
np.where(np.cumsum(np.not_equal(array, 0), axis=0), array, 255)
Example and explanation of steps:
The source array:
array([[0, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [1, 0, 0]])
np.not_equal(array, 0) computes a boolean array with True for
elements != 0:
array([[False,  True, False],
       [False, False,  True],
       [ True,  True, False],
       [ True, False, False]])
np.cumsum(..., axis=0) computes cumulative sum (True counted as 1)
along axis 0 (in columns):
array([[0, 1, 0],
       [0, 1, 1],
       [1, 2, 1],
       [2, 2, 1]], dtype=int32)
The above array is used as the condition in where: for elements where the corresponding value of the mask is True (actually, != 0), take the value from the corresponding element of array; otherwise take 255:
np.where(..., array, 255)
The result (for my array) is:
array([[255,   1, 255],
       [255,   0,   1],
       [  1,   1,   0],
       [  1,   0,   0]])
Use masking (note that this replaces every zero in the array, not only those above the first non-zero element in each column):
array[array == 0] = 255

Elementwise AND or OR operations in python for 2D array

Is there any method in python to compute elementwise OR or AND operations for 2D arrays across rows or columns?
For example, for the following array, elementwise OR operations across row would result in a vector [1, 0, 0, 0, 0, 0, 0, 0].
array([[1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
numpy has logical_or, logical_xor and logical_and, which have a reduce method:
>>> np.logical_or.reduce(a, axis=0)
array([ True, False, False, False, False, False, False, False], dtype=bool)
As you see in the example, they coerce to bool dtype, so if you require uint8 you have to cast back at the end.
Since bools are stored as single bytes, you can use cheap view casting for that.
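For instance, a quick sketch of that cast-back (view casting works here because numpy stores each bool in a single byte):

```python
import numpy as np

a = np.array([[1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)

# Reduce to a bool vector, then reinterpret its bytes as uint8
# without copying any data.
out = np.logical_or.reduce(a, axis=0).view(np.uint8)
print(out)
# > [1 0 0 0 0 0 0 0]
```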
With the axis keyword you can select along which axis to reduce; it is also possible to select multiple axes:
>>> np.logical_or.reduce(a, axis=1)
array([ True,  True,  True,  True], dtype=bool)
>>> np.logical_or.reduce(a, axis=(0, 1))
True
The keepdims keyword is useful for broadcasting, for example to find all "crosses" of rows and columns >= 2 in array b:
>>> b = np.random.randint(0,10, (4, 4))
>>> b
array([[0, 5, 3, 4],
       [4, 1, 5, 4],
       [4, 5, 5, 5],
       [2, 4, 6, 1]])
>>> rows = np.logical_and.reduce(b >= 2, axis=1, keepdims=True)
# keepdims=False (default) -> rows.shape==(4,) keepdims=True -> rows.shape==(4, 1)
>>> cols = np.logical_and.reduce(b >= 2, axis=0, keepdims=True)
# keepdims=False (default) -> cols.shape==(4,) keepdims=True -> cols.shape==(1, 4)
>>> rows & cols # shapes (4, 1) and (1, 4) are broadcast to (4, 4)
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]], dtype=bool)
Notice the slight abuse of the & operator, which stands for bitwise_and. Since the effect is the same on bools (in fact, trying to use and here would have thrown an exception), this is common practice.
As @ajcr points out, the popular np.any and np.all are shorthand for np.logical_or.reduce and np.logical_and.reduce.
Note, however, that there are subtle differences:
>>> np.logical_or.reduce(a)
array([ True, False, False, False, False, False, False, False], dtype=bool)
>>> np.any(a)
True
OR:
If you want to stick with uint8 and know for certain that all your entries will be 0 or 1, you can use bitwise_and, bitwise_or and bitwise_xor:
>>> np.bitwise_or.reduce(a, axis=0)
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

Select elements of numpy array via boolean mask array

I have a boolean mask array a of length n:
a = np.array([True, True, True, False, False])
I have a 2d array with n columns:
b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
I want a new array which contains only the "True" columns, for example:
c = ([[1,2,3], [1,2,3]])
c = a * b does not work because it also contains 0 for the False columns, which I don't want.
c = np.delete(b, a, 1) does not work.
Any suggestions?
You probably want something like this:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
>>> b[:,a]
array([[1, 2, 3],
       [1, 2, 3]])
Note that for this kind of indexing to work, the mask needs to be an ndarray, like you were using, not a plain list. In older NumPy versions, a list made it interpret the False and True as 0 and 1 and give you those columns (modern NumPy treats a boolean list as a mask too):
>>> b[:,[True, True, True, False, False]]
array([[2, 2, 2, 1, 1],
       [2, 2, 2, 1, 1]])
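np.compress offers the same column selection in functional form (a sketch; it takes the boolean condition, the array, and an axis):

```python
import numpy as np

a = np.array([True, True, True, False, False])
b = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])

# Keep only the columns where the mask is True.
c = np.compress(a, b, axis=1)
print(c)
# > [[1 2 3]
#    [1 2 3]]
```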
You can use the numpy.ma module's np.ma.masked_array function to do so:
>>> x = np.array([1, 2, 3, -1, 5])
>>> mx = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
>>> mx
masked_array(data=[1, 2, 3, --, 5], mask=[False, False, False, True, False], fill_value=999999)
Hope I'm not too late! Here's your array:
X = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])
Let's create an array of zeros of the same shape as X:
mask = np.zeros_like(X)
# array([[0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0]])
Then, specify the columns that you want to mask out or hide with a 1. In this case, we want the last 2 columns to be masked out.
mask[:, -2:] = 1
# array([[0, 0, 0, 1, 1],
#        [0, 0, 0, 1, 1]])
Create a masked array:
X_masked = np.ma.masked_array(X, mask)
# masked_array(data=[[1, 2, 3, --, --],
#                    [1, 2, 3, --, --]],
#              mask=[[False, False, False,  True,  True],
#                    [False, False, False,  True,  True]],
#              fill_value=999999)
We can then do whatever we want with X_masked, like taking the sum of each column (along axis=0):
np.sum(X_masked, axis=0)
# masked_array(data=[2, 4, 6, --, --],
#              mask=[False, False, False,  True,  True],
#              fill_value=1e+20)
The great thing about this is that X_masked is just a view of X, not a copy.
X_masked.base is X
# True
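To get back an ordinary ndarray from a masked result, a short sketch using filled() (the fill value 0 is an arbitrary choice here):

```python
import numpy as np

X = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])
mask = np.zeros_like(X)
mask[:, -2:] = 1                     # hide the last two columns
X_masked = np.ma.masked_array(X, mask)

# Reductions skip masked entries; filled() substitutes the given
# value for the masked slots and returns a plain ndarray.
col_sums = X_masked.sum(axis=0).filled(0)
print(col_sums)
# > [2 4 6 0 0]
```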
