Numpy array, Find columns with 1s - python

I am looking to find columns in a numpy array where at least one cell has a 1.
Input Array
[0,0,1,0,0,0,1,0,0,1]
[0,1,0,0,0,0,0,0,1,0]
[0,0,0,0,0,0,0,1,0,0]
[0,0,0,1,0,0,1,0,0,0]
Expected Output
[0,1,1,1,0,0,1,1,1,1]

Use numpy.any with axis=0 (to reduce along the first axis, i.e. collapse the rows so you get one value per column):
>>> np.any(a, axis=0)
array([False, True, True, True, False, False, True, True, True, True], dtype=bool)
Of course, you can convert the boolean array into integers easily:
>>> np.any(a, axis=0)*1
array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])

You can simply | (or) them all together:
>>> np.array([0,0,1,0,0,0,1,0,0,1]) | np.array([0,1,0,0,0,0,0,0,1,0])
array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1])
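If the rows are already stacked into a single 2-D array, you don't have to write the | chain out by hand; a reduce does the same thing column by column (a sketch, assuming the four rows above are stored in a):
>>> a = np.array([[0,0,1,0,0,0,1,0,0,1],
...               [0,1,0,0,0,0,0,0,1,0],
...               [0,0,0,0,0,0,0,1,0,0],
...               [0,0,0,1,0,0,1,0,0,0]])
>>> np.bitwise_or.reduce(a, axis=0)
array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])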

Related

Numpy: Mask of elements in array1 that are also elements of array2

I wonder if there is a numpy natural way of creating a binary mask over array1 for elements that are also in array2. Another way to say it, binary mask over array1 for intersection of array1 and 2.
This works:
def bin_mask(a, b):
    return sum(a == n for n in b)
a = np.array([1,2,3,4,5,6,7,8,9,20])
b = np.array([3,5,7])
In: bin_mask(a,b)
Out: array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
But I wonder if there is some numpy prebuilt I am missing.
Edit:
Correct answer from comments: np.isin(a, b). I also marked in1d as the correct answer. Both work.
The in1d method does the trick too:
>>> np.in1d(a,b)
array([False, False, True, False, True, False, True, False, False, False])
>>> np.in1d(a,b).astype(int)
array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
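For completeness, np.isin (the spelling mentioned in the edit above, and the one NumPy recommends for new code) produces the same mask; a quick sketch with the same a and b:
>>> np.isin(a, b)
array([False, False, True, False, True, False, True, False, False, False])
>>> np.isin(a, b).astype(int)
array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])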

Changing the array value above the first non-zero element in the column

I'm looking for a vectorized way to change the array values above the first non-zero element in each column.
for x in range(array.shape[1]):
    for y in range(array.shape[0]):
        if array[y, x] > 0:
            break
        else:
            array[y, x] = 255
As you wrote about an array (not a DataFrame), I assume that you have
a Numpy array and want to use Numpy methods.
To do your task, run the following code:
np.where(np.cumsum(np.not_equal(array, 0), axis=0), array, 255)
Example and explanation of steps:
1. The source array:
array([[0, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [1, 0, 0]])
2. np.not_equal(array, 0) computes a boolean array with True for elements != 0:
array([[False,  True, False],
       [False, False,  True],
       [ True,  True, False],
       [ True, False, False]])
3. np.cumsum(..., axis=0) computes the cumulative sum (True counted as 1) along axis 0 (down each column):
array([[0, 1, 0],
       [0, 1, 1],
       [1, 2, 1],
       [2, 2, 1]], dtype=int32)
4. The above array is the mask used in where. Where the mask is True (actually, != 0), take the value from the corresponding element of array; otherwise take 255:
np.where(..., array, 255)
The result (for my array) is:
array([[255,   1, 255],
       [255,   0,   1],
       [  1,   1,   0],
       [  1,   0,   0]])
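For reference, here is the whole thing as a small runnable sketch (same example array and 255 fill value as above):
import numpy as np

array = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0],
                  [1, 0, 0]])

# Above the first non-zero element of each column the cumulative count is still 0,
# so np.where writes 255 there and keeps the original values everywhere else.
result = np.where(np.cumsum(np.not_equal(array, 0), axis=0), array, 255)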
Use masking:
array[array == 0] = 255

Elementwise AND or OR operations in python for 2D array

Is there any method in python to compute elementwise OR or AND operations for 2D arrays across rows or columns?
For example, for the following array, elementwise OR operations across row would result in a vector [1, 0, 0, 0, 0, 0, 0, 0].
array([[1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
NumPy has logical_or, logical_xor and logical_and, which have a reduce method:
>>> np.logical_or.reduce(a, axis=0)
array([ True, False, False, False, False, False, False, False], dtype=bool)
As you can see in the example, they coerce to bool dtype, so if you require uint8 you have to cast back at the end.
Since bools are stored as bytes, you can use cheap view-casting for that.
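For instance, the boolean result from above can be reinterpreted as uint8 without copying (a sketch; .view() just relabels the same one-byte-per-element buffer):
>>> np.logical_or.reduce(a, axis=0).view(np.uint8)
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)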
With the axis keyword you can select along which axis to reduce; it is also possible to select multiple axes:
>>> np.logical_or.reduce(a, axis=1)
array([ True, True, True, True], dtype=bool)
>>> np.logical_or.reduce(a, axis=(0, 1))
True
The keepdims keyword is useful for broadcasting, for example to find all "crosses" of rows and columns >= 2 in array b:
>>> b = np.random.randint(0,10, (4, 4))
>>> b
array([[0, 5, 3, 4],
       [4, 1, 5, 4],
       [4, 5, 5, 5],
       [2, 4, 6, 1]])
>>> rows = np.logical_and.reduce(b >= 2, axis=1, keepdims=True)
# keepdims=False (default) -> rows.shape==(4,) keepdims=True -> rows.shape==(4, 1)
>>> cols = np.logical_and.reduce(b >= 2, axis=0, keepdims=True)
# keepdims=False (default) -> cols.shape==(4,) keepdims=True -> cols.shape==(1, 4)
>>> rows & cols # shapes (4, 1) and (1, 4) are broadcast to (4, 4)
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]], dtype=bool)
Notice the slight abuse of the & operator, which stands for bitwise_and. Since the effect is the same on bools (in fact, trying to use and in this place would have thrown an exception), this is common practice.
As @ajcr points out, the popular np.any and np.all are shorthand for np.logical_or.reduce and np.logical_and.reduce.
Note, however, that there are subtle differences:
>>> np.logical_or.reduce(a)
array([ True, False, False, False, False, False, False, False], dtype=bool)
>>> np.any(a)
True
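Passing the axis explicitly makes the two agree again (a quick check on the same array):
>>> np.any(a, axis=0)
array([ True, False, False, False, False, False, False, False], dtype=bool)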
OR:
If you want to stick with uint8 and know for certain that all your entries will be 0 and 1, you can use bitwise_and, bitwise_or and bitwise_xor:
>>> np.bitwise_or.reduce(a, axis=0)
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

Instantiate a matrix with x zeros and the rest ones

I would like to be able to quickly instantiate a matrix where the first few (variable number of) cells in a row are 0, and the rest are ones.
Imagine we want a 4x3 matrix (four rows, three columns).
I have instantiated the matrix first as all ones:
ones = np.ones([4,3])
Then imagine we have an array that announces how many leading zeros there are:
arr = np.array([2,1,3,0]) # first row has 2 zeroes, second row 1 zero, etc
Required result:
array([[0, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [1, 1, 1]])
Obviously this can be done in the opposite way as well, but I'd consider the approach where 1 is a default value, and zeros would be replaced.
What would be the best way to avoid some silly loop?
Here's one way. n is the number of columns in the result. The number of rows is determined by len(arr).
In [29]: n = 5
In [30]: arr = np.array([1, 2, 3, 0, 3])
In [31]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[31]:
array([[0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])
There are two parts to the explanation of how this works. First, how to create a row with m zeros and n-m ones? For that, we use np.arange to create a row with values [0, 1, ..., n-1]:
In [35]: n
Out[35]: 5
In [36]: np.arange(n)
Out[36]: array([0, 1, 2, 3, 4])
Next, compare that array to m:
In [37]: m = 2
In [38]: np.arange(n) >= m
Out[38]: array([False, False, True, True, True], dtype=bool)
That gives an array of boolean values; the first m values are False and the rest are True. By casting those values to integers, we get an array of 0s and 1s:
In [39]: (np.arange(n) >= m).astype(int)
Out[39]: array([0, 0, 1, 1, 1])
To perform this over an array of m values (your arr), we use broadcasting; this is the second key idea of the explanation.
Note what arr[:, np.newaxis] gives:
In [40]: arr
Out[40]: array([1, 2, 3, 0, 3])
In [41]: arr[:, np.newaxis]
Out[41]:
array([[1],
       [2],
       [3],
       [0],
       [3]])
That is, arr[:, np.newaxis] reshapes arr into a 2-d array with shape (5, 1). (arr.reshape(-1, 1) could have been used instead.) Now when we compare this to np.arange(n) (a 1-d array with length n), broadcasting kicks in:
In [42]: np.arange(n) >= arr[:, np.newaxis]
Out[42]:
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [ True,  True,  True,  True,  True],
       [False, False, False,  True,  True]], dtype=bool)
As @RogerFan points out in his comment, this is basically an outer product of the arguments, using the >= operation.
A final cast to type int gives the desired result:
In [43]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[43]:
array([[0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])
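As a side note on that outer-product view, the same boolean table can also be produced with an explicit ufunc outer call; this is a sketch equivalent to the broadcasted comparison, not something the original answer uses:
In [44]: np.less_equal.outer(arr, np.arange(n)).astype(int)
Out[44]:
array([[0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])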
Not as concise as I wanted (I was experimenting with mask_indices), but this will also do the job:
>>> n = 3
>>> zeros = [2, 1, 3, 0]
>>> numpy.array([[0] * zeros[i] + [1]*(n - zeros[i]) for i in range(len(zeros))])
array([[0, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [1, 1, 1]])
It works very simply: for each row it concatenates the one-element lists [0] and [1], each repeated the required number of times, building the array row by row.

How can you turn an index array into a mask array in Numpy?

Is it possible to convert an array of indices to an array of ones and zeros, given the range?
i.e. [2,3] -> [0, 0, 1, 1, 0], in range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499
Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can eliminate index_array, and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False,  True,  True,  True,  True,
        True, False, False, False, False, False], dtype=bool)
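Tying this back to the question's 200-299 example: with a boolean mask, the complement via ~ also behaves as intended, which is not the case for an integer 0/1 mask (where ~ is a bitwise not giving -1 and -2). A sketch with a hypothetical stand-in for data:
In [15]: data = np.arange(500)              # stand-in for the real data array
In [16]: mask_array = np.zeros(500, dtype=bool)
In [17]: mask_array[200:300] = True
In [18]: data[mask_array].shape, data[~mask_array].shape
Out[18]: ((100,), (400,))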
For a single dimension, try:
n = (15,)
index_array = [2, 5, 7]
mask_array = np.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones, then use ravel:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]]  # you may need to transpose your indices!
mask_array = np.zeros(n)
flat_index_array = np.ravel_multi_index(index_array, mask_array.shape)
np.ravel(mask_array)[flat_index_array] = 1
There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the approach Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.
As requested, here it is in an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like the one you asked for, but it will use booleans instead of 0s and 1s.
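If 0s and 1s are needed rather than booleans, the comprehension can be wrapped in an integer array (a sketch, same index_array and range as above):
>>> np.array([x in index_array for x in range(500)], dtype=int)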
