Python boolean arrays confusion in jupyter IDE - python

I am new to boolean arrays and find these statements confusing
import numpy as np
a = np.arange(5)
the output of array a is : array([0, 1, 2, 3, 4])
But when i write down this
b = a[True, True, False, False, False]
and print the array b using
print(b)
the output is :
[]
As far as I understand I want to transfer some elements from array a to array b, but why is b empty?
What is happening in this code?

Try nested brackets:
a = np.arange(5)
b = a[[True, True, False, False, False]]
print(b)
Output:
[0 1]

Related

How do I pass a list as changing condition in an array?

Let's say that I have an numpy array a = [1 2 3 4 5 6 7 8] and I want to change everything else but 1,2 and 3 to 0. With a list b = [1,2,3] a tried a[a not in b] = 0, but Python does not accept this. Currently I'm using a for loop like this:
c = a.unique()
for i in c:
if i not in b:
a[a == i] = 0
Which works very slowly (Around 900 different values in a 3D array around the size of 1000x1000x1000) and doesn't fell like the optimal solution for numpy. Is there a more optimal way doing it in numpy?
You can use numpy.isin() to create a boolean mask to use as an index:
np.isin(a, b)
# array([ True, True, True, False, False, False, False, False])
Use ~ to do the opposite:
~np.isin(a, b)
# array([False, False, False, True, True, True, True, True])
Using this to index the original array lets you assign zero to the specific elements:
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([1, 2, 3])
a[~np.isin(a, b)] = 0
print(a)
# [1 2 3 0 0 0 0 0]

Finding instances similar in two lists with the same shape

I am working with a timeseries data. Let's say I have two lists of equal shape and I need to find instances where both lists have numbers greater than zero at the same position.
To break it down
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
We can see that in four positions, both lists have numbers greater than 0. There are other instances , but I am sure if I can get a simple code, all it needs is adjusting the signs and I can also utilize in a larger scale.
I have tried
len([1 for i in A if i > 0 and 1 for i in B if i > 0 ])
But I think the answer it's giving me is a product of both instances instead.
Since you have a numpy tag:
A = np.array([1,0,2,0,4,6,0,5])
B = np.array([0,0,5,6,7,5,0,2])
mask = ((A>0)&(B>0))
# array([False, False, True, False, True, True, False, True])
mask.sum()
# 4
A[mask]
# array([2, 4, 6, 5])
B[mask]
# array([5, 7, 5, 2])
In pure python (can be generalized to any number of lists):
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
mask = [all(e>0 for e in x) for x in zip(A, B)]
# [False, False, True, False, True, True, False, True]
If you want to use vanilla python, this should be doing what you are looking for
l = 0
for i in range(len(A)):
if A[i] > 0 and B[i] > 0:
l = l + 1

Using np.where for nested lists

I am trying to use np.where() function with nested lists.
I would like to find an index with a given condition of the first layer of the nested list.
For example, if I have the following code
arr = [[1,1], [2,2],[3,3]]
a = np.where(arr == [2,2])
then ideally I would like code to return 'a' as 1.
Since [2,2] is in index 1 of the nested list.
However, I am just getting a empty array back as a result.
Of course, I can make it work easily by implementing external for loop such as
for n in range(len(arr)):
if arr[n] == [2,2]:
a = n
but I would like to implement this simply within the function np.where(write the entire code here).
Is there a way to do this?
Well you can write your own function to do so:
You'll need to
Find every line equal to what you looking for
Get indices of found rows (You can use where):
numpy compression
You can use compression operator to see if each line satisfies the condition. Such as:
np_arr = np.array(
[1, 2, 3, 4, 5]
)
print(np_arr < 3)
This will return a boolean where every element is True or False where the condition is satisfied:
[ True True False False False]
For a 2D array you'll get a 2D boolean array:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print(np_arr == to_find)
The result is:
[[False False]
[ True True]
[False False]
[ True True]]
Now we are looking for lines with all True values. So we can use all method of ndarray. And we will provide the axis we are looking to look to all. X, Y or Both. We want to look to x axis:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print((np_arr == to_find).all(axis=1))
The result is:
[False True False True]
Get indices of Trues
At the end you are looking for indices where the values are True:
np.where((np_arr == to_find).all(axis=1))
The result would be:
(array([1, 3]),)
The best solution is that mentioned by #Michael Szczesny, but using np.where you can do this too:
a = np.where(np.array(arr) == [2, 2])[0]
resulted_ind = np.where(np.bincount(a) == 2)[0] # --> [1]
numpy runs in Python, so you can use both the basic Python lists and numpy arrays (which are more like MATLAB matrices)
A list of lists:
In [43]: alist = [[1,1], [2,2],[3,3]]
A list has an index method, which tests against each element of the list (elements here are 2 element lists):
In [44]: alist.index([2,2])
Out[44]: 1
In [45]: alist.index([2,3])
Traceback (most recent call last):
Input In [45] in <cell line: 1>
alist.index([2,3])
ValueError: [2, 3] is not in list
alist==[2,2] returns False, because the list is not the same as the [2,2] list.
If we make an array from that list:
In [46]: arr = np.array(alist)
In [47]: arr
Out[47]:
array([[1, 1],
[2, 2],
[3, 3]])
we can do an == test - but it compares numeric elements.
In [48]: arr == np.array([2,2])
Out[48]:
array([[False, False],
[ True, True],
[False, False]])
Underlying this comparison is the concept of broadcasting, allow it to compare a (3,2) array with a (2,) (a 2d with a 1d). Here's its trivial, but it can be much more complicated.
To find rows where all values are True, use:
In [50]: (arr == np.array([2,2])).all(axis=1)
Out[50]: array([False, True, False])
and where finds the True in that array (the result is a tuple with 1 array):
In [51]: np.where(_)
Out[51]: (array([1]),)
In Octave the equivalent is:
>> arr = [[1,1];[2,2];[3,3]]
arr =
1 1
2 2
3 3
>> all(arr == [2,2],2)
ans =
0
1
0
>> find(all(arr == [2,2],2))
ans = 2

How to count repeated elements in a numpy 2d array?

I have many very large padded numpy 2d arrays, simplified to array A, shown below. Array Z is the basic pad array:
A = np.array(([1 , 2, 3], [2, 3, 4], [0, 0, 0], [0, 0, 0], [0, 0, 0]))
Z = np.array([0, 0, 0])
How to count the number of pads in array A in the simplest / fastest pythonic way?
This works (zCount=3), but seems verbose, loopy and unpythonic:
zCount = 0
for a in A:
if a.any() == Z.any():
zCount += 1
zCount
Also tried a one-line list comprehension, which doesn't work (dont know why not):
[zCount += 1 for a in A if a.any() == Z.any()]
zCount
Also tried a list count, but 'truth value of array with more than one element is ambiguous':
list(A).count(Z)
Have searched for a simple numpy expression without success. np.count_nonzero gives full elementwise boolean for [0]. Is there a one-word / one-line counting expression for [0, 0, 0]? (My actual arrays are approx. shape (100,30) and I have up to millions of these. I am trying to deal with them in batches, so any simple time savings generating a count would be helpful). thx
Try:
>>> np.equal(A, Z).all(axis=1).sum()
3
Step by step:
>>> np.equal(A, Z)
array([[False, False, False],
[False, False, False],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
>>> np.equal(A, Z).all(axis=1)
array([False, False, True, True, True])
>>> np.equal(A, Z).all(axis=1).sum()
3

Iterate over a 2d numpy array until it contains only some specific values

I have a 2d numpy array which contains float numbers in each cell.
I'd like to iterate over it and change value of each cell (if a specific condition is matched), until it contains only the values 1, -1, or NaN in each cell.
How can I achieve this?
In numpy you can use a conditional indexing. i.e.:
import numpy as np
x = np.arange(10)
c = x > 5
print c
will give
array([False, False, False, False, False, False, True, True, True,
True], dtype=bool)
and finally use the condition
x[c] = -1
print x
gives array([ 0, 1, 2, 3, 4, 5, -1, -1, -1, -1])

Categories