numpy where operation on 2D array - python

I have a numpy array 'A' of size 571x24 and I am trying to find the index of zeros in it so I do:
>>>A.shape
(571L, 24L)
import numpy as np
z1 = np.where(A==0)
z1 is a tuple with following size:
>>> len(z1)
2
>>> len(z1[0])
29
>>> len(z1[1])
29
I was hoping to create a z1 of same size as A. How do I achieve that?
Edit: I want to create array z1 of booleans for presence of zero in A such that:
>>>z1.shape
(571L, 24L)

You can just check this with the equality operator in python with numpy. Example:
>>> A = np.array([[0,2,2,1],[2,0,0,3]])
>>> A == 0
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
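np.where() called with only a condition does something else: it returns a tuple of index arrays, one per dimension, giving the positions where the condition is true (see the documentation). It is still possible to get the boolean array from np.where() by using its three-argument form, where the scalars True and False are broadcast against the condition: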
>>> np.where(A == 0, True, False)
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
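To make the difference between the two forms concrete, here is a small sketch using the same example array. It shows that the one-argument np.where(A == 0) returns one index array per dimension (which is why len(z1) was 2 in the question), and that those indices can be scattered back into a mask shaped like A. The variable names mask, rows, cols and rebuilt are illustrative only.

```python
import numpy as np

A = np.array([[0, 2, 2, 1], [2, 0, 0, 3]])

# Boolean mask with the same shape as A
mask = (A == 0)

# One-argument form: a tuple of (row indices, column indices) of the zeros
rows, cols = np.where(A == 0)

# The index arrays can be turned back into a mask of A's shape
rebuilt = np.zeros(A.shape, dtype=bool)
rebuilt[rows, cols] = True

print(mask.shape)                    # same shape as A
print(np.array_equal(mask, rebuilt))
```

If only the mask is needed, the plain comparison A == 0 is the shortest route; the index form is more useful when the positions themselves matter.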

Try this:
import numpy as np
myarray = np.array([[0,3,4,5],[9,4,0,4],[1,2,3,4]])
ix = np.in1d(myarray.ravel(), 0).reshape(myarray.shape)
Output of ix:
array([[ True, False, False, False],
[False, False, True, False],
[False, False, False, False]], dtype=bool)

Related

Numpy array all and any operation with combination in 3d array

I'm very new to numpy. I have an array like this and want to apply some operations on it.
It's easy with a 2D array and there are lots of examples, but I couldn't find any for a 3D array.
arr = np.array([[[ True, True, False, True],[False, False, True, False],[False, False, False, False], [False, False, False, False],[False, False, True, False]],[[False, False, False, False], [True, True, True, True], [False, False, False, False], [False, False, False, False], [True, True, True, True] ], [[False, False, False, False], [False, False, False, False], [True, True, True, True], [False, False, False, False], [False, False, False, False] ] ])
After applying np.all to the inner arrays, the result should be: [[False, False, False, False, False], [False, True, False, False, True], [False, False, True, False, False]]
Then after np.any, the result should be: [False, True, True]
What I mean is that I want the indices along the first axis where at least one of the 1D sub-arrays is all True. In my case it should return indices 1 and 2: at index 1 the sub-arrays at positions 1 and 4 are all True, and at index 2 the sub-array at position 2 is all True, while at index 0 there is no sub-array whose values are all True.
What I've done so far is
np.all(np.any(arr, axis=1), axis=1)
OR
np.any(np.all(arr, axis=1), axis=1)
But neither gives the result I want. Yes, I can solve it with a comprehension, but I'd rather avoid any kind of looping; that would be my last option.
I think the issue comes from confusion in the axis handling.
As a rule of thumb, you can consider that the innermost brackets contain data of the highest dimension.
In your example, if you type arr.shape in your interpreter it will return the tuple (3, 5, 4), which represents the size of arr in each dimension. If you do the same with the expected result of the first operation you mentioned in your question, you'll get (3, 5), so it looks like you want to (so to speak) project the data of the highest dimension (i.e. axis=2) onto the other axes. The same goes for the next operation, except that the highest dimension is now axis=1.
To conclude, the following line would do the job: np.any(np.all(arr, axis=2), axis=1).
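As a quick check of the axis reasoning above, here is a sketch that runs both steps on the array from the question and also extracts the matching indices. The names inner_all, outer_any and indices are illustrative, and np.flatnonzero is just one possible way to get the indices.

```python
import numpy as np

arr = np.array([
    [[True, True, False, True], [False, False, True, False],
     [False, False, False, False], [False, False, False, False],
     [False, False, True, False]],
    [[False, False, False, False], [True, True, True, True],
     [False, False, False, False], [False, False, False, False],
     [True, True, True, True]],
    [[False, False, False, False], [False, False, False, False],
     [True, True, True, True], [False, False, False, False],
     [False, False, False, False]],
])

# Collapse the last axis: is each 1D sub-array entirely True?  shape (3, 5)
inner_all = np.all(arr, axis=2)

# Collapse the remaining inner axis: does each block have such a sub-array?  shape (3,)
outer_any = np.any(inner_all, axis=1)

# Positions along the first axis where the answer is True
indices = np.flatnonzero(outer_any)
print(indices)  # [1 2]
```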

Detect if any value is above zero and change it

I have the following array:
[(True,False,True), (False,False,False), (False,False,True)]
If any element contains a True then they should all be true. So the above should become:
[(True,True,True), (False,False,False), (True,True,True)]
My below code attempts to do that but it simply converts all elements to True:
a = np.array([(True,False,True), (False,False,False), (False,True,False)], dtype='bool')
aint = a.astype('int')
print(aint)
aint[aint.sum() > 0] = (1,1,1)
print(aint.astype('bool'))
The output is:
[[1 0 1]
[0 0 0]
[0 1 0]]
[[ True True True]
[ True True True]
[ True True True]]
You could try np.any, which tests whether any array element along a given axis evaluates to True.
Here's a quick line of code that uses a list comprehension to get your intended result.
lst = [(True,False,True), (False,False,False), (False,False,True)]
result = [(np.any(x),) * len(x) for x in lst]
# result is [(True, True, True), (False, False, False), (True, True, True)]
I'm no numpy wizard but this should return what you want.
import numpy as np
def switch(arr):
    # arr is one row; if it contains any True, return a row of all True
    if np.any(arr):
        return np.ones(*arr.shape).astype(bool)
    return arr.astype(bool)
np.apply_along_axis(switch, 1, a)
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
ndarray.any along axis=1 combined with np.tile will get the job done:
np.tile(a.any(1)[:,None], a.shape[1])
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
Create an array of True's based on the original array's second dimension and assign it to all rows that have a True in it.
>>> a
array([[ True, False, True],
[False, False, False],
[False, True, False]])
>>> a[a.any(1)] = np.ones(a.shape[1], dtype=bool)
>>> a
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
>>>
Relies on Broadcasting.
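For comparison, here is a sketch that does not modify a in place. It also hints at why the code in the question changed every row: a sum without an axis argument is a single number for the whole array, whereas the test has to be made per row. The names row_has_true and result are illustrative.

```python
import numpy as np

a = np.array([(True, False, True),
              (False, False, False),
              (False, True, False)], dtype=bool)

# One flag per row, kept as a column of shape (3, 1) so it can broadcast
row_has_true = a.any(axis=1, keepdims=True)

# Stretch each flag across its row; copy() makes the result writable
result = np.broadcast_to(row_has_true, a.shape).copy()
print(result)
```

The copy is only needed if the result will be modified later, since np.broadcast_to returns a read-only view.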

Can someone please explain np.less_equal.outer(range(1,18),range(1,13))

I was debugging code written by someone who has left the organization and came across a line that uses the np.less_equal.outer and np.greater_equal.outer functions. I know that np.outer creates a Cartesian cross product of two 1-dimensional arrays and creates two arrays, and that np.less_equal compares the elements of two arrays and returns true or false. Can someone please explain how this combined form works?
Thanks!
less_equal and greater_equal are special types of numpy functions called ufuncs, which come with extra methods, including accumulate, at, and outer.
In this case ufunc.outer extends the function to work similarly to the outer product, but while the actual outer product would be multiply.outer, this instead applies the less-than-or-equal (or greater-than-or-equal) comparison to every pair.
So you get a 2D array of booleans with one row per element of the first array, telling you how that element compares with each of the elements in the second array.
np.less_equal.outer(range(1,18),range(1,13))
Out[]:
array([[ True, True, True, ..., True, True, True],
[False, True, True, ..., True, True, True],
[False, False, True, ..., True, True, True],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], dtype=bool)
EDIT: for this particular input, a more direct way of getting the same result would be:
np.triu(np.ones((17, 12), dtype = bool), 0)
That is, the upper triangle of a boolean array of shape (17, 12), since range(1,18) has 17 elements and range(1,13) has 12.
From the documentation, we have that for one-dimensional arrays A and B, the operation np.less_equal.outer(A, B) is equivalent to:
import numpy as np
m = len(A)
n = len(B)
r = np.empty((m, n), dtype=bool)  # one row per element of A, one column per element of B
for i in range(m):
    for j in range(n):
        r[i, j] = (A[i] <= B[j])
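The loop above can be checked against the ufunc method directly. The sketch below uses the ranges from the question and compares three routes: np.less_equal.outer, a broadcast comparison, and the upper triangle of an all-True array. The variable names are illustrative, and the triangle shortcut only applies because both inputs here count up from 1.

```python
import numpy as np

A = np.arange(1, 18)   # 17 values: 1..17
B = np.arange(1, 13)   # 12 values: 1..12

# Every A[i] compared with every B[j]
via_outer = np.less_equal.outer(A, B)

# Same comparison via broadcasting a column against a row
via_broadcast = A[:, None] <= B

# For these particular inputs, A[i] <= B[j] reduces to i <= j
upper = np.triu(np.ones((len(A), len(B)), dtype=bool))

print(via_outer.shape)                         # (17, 12)
print(np.array_equal(via_outer, via_broadcast))
print(np.array_equal(via_outer, upper))
```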
Put another way, the result is an m-by-n boolean array r in which r[i, j] is True exactly when A[i] <= B[j].
Here is an example of the element-wise comparisons:
np.less_equal([4, 2, 1], [2, 2, 2])
array([False, True, True], dtype=bool)
np.greater_equal([4, 2, 1], [2, 2, 2])
array([ True, True, False], dtype=bool)
And here is the plain outer function, which multiplies every pair of elements:
np.outer(range(1,3), range(1,4))
array([[1, 2, 3],
       [2, 4, 6]])
Hope that helps.

What is best way to find first half of True's in boolean numpy array?

Here is the problem:
Take a numpy boolean array:
a = np.array([False, False, True, True, True, False, True, False])
I am using it to index a pandas DataFrame, but I need to create 2 new arrays that each contain half of the True values of the original array. Note that the example arrays are much shorter than the actual data.
So like:
first_half = array([False, False, True, True, False, False, False, False])
second_half = array([False, False, False, False, True, False, True, False])
Does anyone have a good way to do this? Thanks.
First, find the indices of the True values and take the middle one as the cutoff:
inds = np.where(a)[0]
cutoff = inds[inds.shape[0]//2]
Then copy the values before the cutoff into one array and the values from the cutoff onward into the other:
b = np.zeros(a.shape,dtype=bool)
c = np.zeros(a.shape,dtype=bool)
c[cutoff:] = a[cutoff:]
b[:cutoff] = a[:cutoff]
Results:
b
Out[65]: array([False, False, True, True, False, False, False, False], dtype=bool)
c
Out[64]: array([False, False, False, False, True, False, True, False], dtype=bool)
There are numerous ways to handle the indexing.
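One of those other ways, sketched here as an alternative rather than as the method above, is to count the True values with a running sum and split on that count. The function name split_true_halves is hypothetical. With an odd number of True values, the extra one lands in the second half, which matches the cutoff approach above.

```python
import numpy as np

def split_true_halves(mask):
    """Split a boolean mask into two masks holding the first and second half of its True values."""
    mask = np.asarray(mask, dtype=bool)
    counts = np.cumsum(mask)                    # running count of True values so far
    half = counts[-1] // 2 if mask.size else 0  # how many True values go in the first half
    first = mask & (counts <= half)
    second = mask & (counts > half)
    return first, second

a = np.array([False, False, True, True, True, False, True, False])
first_half, second_half = split_true_halves(a)
print(first_half)
print(second_half)
```

Both outputs keep the shape of a, so they can be used for indexing the same way as the original mask.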

Combining 3 boolean masks in Python

I have 3 lists:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
When I type
a or b or c
I want to get back a list that's
[True, True, True]
but I'm getting back
[True, False, True]
Any ideas on why? And how can I combine these masks?
Your or operators are comparing the lists as entire objects, not their elements. Since a is not an empty list, it evaluates as true, and becomes the result of the or. b and c are not even evaluated.
To produce the logical OR of the three lists position-wise, you have to iterate over their contents and OR the values at each position. To convert a bunch of iterables into a list of their grouped elements, use zip(). To check if any element in an iterable is true (the OR of its entire contents), use any(). Do these two at once with a list comprehension:
mask = [any(tup) for tup in zip(a, b, c)]
How about this:
from numpy import asarray as ar
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
Try:
>>> ar(a) | ar(b) | ar(c) #note also the use `|` instead of `or`
array([ True, True, True], dtype=bool)
So no need for zip etc.
or returns the first operand if it evaluates as true, and a non-empty list evaluates as true; so, a or b or c will always return a if it's a non-empty list.
Probably you want
[any(t) for t in zip(a, b, c)]
(this works also for element-wise and if you replace any with all)
a is treated as true because it contains values; b and c are not evaluated.
>>> bool([])
False
>>> bool([True])
True
>>> bool([False])
True
>>> [False] or [True]
[False]
According to Boolean Operations:
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
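How about numpy.logical_or? It takes only two inputs (a third positional argument is treated as the output array), so combine the three lists with its reduce method:
import numpy
result = numpy.logical_or.reduce([a, b, c])
print(result)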
Try this:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
res = [a[i] or b[i] or c[i] for i in range(len(a))]
print(res)
Here is how to do it fast (large arrays) using numpy:
import numpy as np
a = [True, False, True,...]
b = [False, False, True,...]
c = [True, True, False,...]
res = np.column_stack((a, b, c)).any(axis=1)
print(res)
Note that a becomes the first column, b the second, and so on when using np.column_stack(). Then do a np.any() (logical OR) on that array along axis=1 which will compare the first elements of a,b, and c, and so on, and so on; resulting in a boolean vector that is the same length as the vectors you want to compare.
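Here is a runnable sketch of that idea using the three short lists from the question in place of the elided long ones. It also shows np.logical_or.reduce as an equivalent route; the names stacked, res and res_reduce are illustrative.

```python
import numpy as np

a = [True, False, True]
b = [False, False, True]
c = [True, True, False]

# Each list becomes one column, so row k holds (a[k], b[k], c[k])
stacked = np.column_stack((a, b, c))

# Logical OR across each row
res = stacked.any(axis=1)

# Same result by OR-ing the lists together element-wise
res_reduce = np.logical_or.reduce([a, b, c])

print(res)
print(res_reduce)
```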
This is a good place to use Python's argument unpacking with the asterisk (*), which lets you pass 3 or 300000 lists in the same way.
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
data_lists = [a,b,c]
Here you use * to expand that list as arguments, zip to rearrange your data into columns (as lists), and any to check if any cell in that column is True.
[any(l) for l in zip(*data_lists)]
[True, True, True]
If you're working with a lot of lists it's the same, e.g.
import numpy as np
data_lists = [[True if (i == j) else False for i in range(7)] for j in range(7)]
np.matrix(data_lists)
matrix([[ True, False, False, False, False, False, False],
[False, True, False, False, False, False, False],
[False, False, True, False, False, False, False],
[False, False, False, True, False, False, False],
[False, False, False, False, True, False, False],
[False, False, False, False, False, True, False],
[False, False, False, False, False, False, True]])
[any(l) for l in zip(*data_lists)]
[True, True, True, True, True, True, True]
