Here's my array:
import numpy as np
a = np.array([0, 5.0, 0, 5.0, 5.0])
Is it possible to use numpy.where in some way to add a value x to all those entries in a that are less than l?
So something like:
a = a[np.where(a < 5).add(2.5)]
Should return:
array([2.5, 5.0, 2.5, 5.0, 5.0])
a = np.array([0., 5., 0., 5., 5.])
a[np.where(a < 5)] += 2.5
in case you really want to use where, or just
a[a < 5] += 2.5
which I usually use for this kind of operation.
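For the array from the question, a quick check of the boolean-mask version (output shown with numpy's default float formatting):
>>> a = np.array([0., 5., 0., 5., 5.])
>>> a[a < 5] += 2.5
>>> a
array([2.5, 5. , 2.5, 5. , 5. ])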
You could use np.where to create the array of additions and then simply add to a -
a + np.where(a < l, 2.5, 0)
Sample run -
In [16]: a = np.array([1, 5, 4, 5, 5])
In [17]: l = 5
In [18]: a + np.where(a < l, 2.5, 0)
Out[18]: array([ 3.5, 5. , 6.5, 5. , 5. ])
Given that you probably need to change the dtype (from int to float), you need to create a new array. A simple way without explicit .astype or np.where calls is multiplication with a mask:
>>> b = a + (a < 5) * 2.5
>>> b
array([ 2.5, 5. , 2.5, 5. , 5. ])
with np.where this can be changed to a simple expression (using the else-condition, third argument, in where):
>>> a = np.where(a < 5, a + 2.5, a)
>>> a
array([ 2.5, 5. , 2.5, 5. , 5. ])
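To see the dtype point concretely, here is a small check starting from an integer array (the exact integer dtype shown depends on the platform):
>>> a = np.array([0, 5, 0, 5, 5])
>>> a.dtype
dtype('int64')
>>> (a + (a < 5) * 2.5).dtype
dtype('float64')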
a += np.where(a < 1, 2.5, 0)
where will return the second argument wherever the condition (first argument) is satisfied and the third argument otherwise.
You can use a boolean array (a mask) as an index. Comparison operations such as a < 1 return such an array. After the addition above, no entry of a is below 1 any more, so:
>>> a<1
array([False, False, False, False, False], dtype=bool)
you can use it as
>>> a[a<1] += 1
The a<1 part selects only the items in a that match the condition. You can then operate on this part only.
If you want to keep a trace of your selection, you can proceed in two steps.
>>> mask = a>1
>>> a[mask] += 1
Also, you can count the items matching the condition:
>>> print(np.sum(mask))
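Putting the pieces together on the array from the question (using the threshold of 5 from the original example rather than 1):
>>> a = np.array([0., 5., 0., 5., 5.])
>>> mask = a < 5
>>> a[mask] += 2.5
>>> a
array([2.5, 5. , 2.5, 5. , 5. ])
>>> print(np.sum(mask))
2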
I have an array of values and I want to set specific values to integers. Anything below 0.95 set to 0, anything above 1.6 set to 2. How can I set everything between 0.95 and 1.6 to 1?
n1_binary = np.where(n1_img_resize < 0.95, 0, n1_img_resize)
n1_binary = np.where(n1_binary > 1.6, 2, n1_binary)
Like this in one line using np.where:
n1_binary = np.where((n1_binary > 0.95) & (n1_binary <= 1.6), 1, n1_binary)
Check the example below:
In [652]: a = np.array([0.99, 1.23, 1.7, 9])
In [653]: a = np.where((a > 0.95) & (a <= 1.6), 1, a)
In [654]: a
Out[654]: array([1. , 1. , 1.7, 9. ])
Try this:
a = np.array([0.3, 5, 7, 2])
a[a < 0.95] = 0
a[a > 1.6] = 2
a[(a >= 0.95) & (a <= 1.6)] = 1
This is very clear and concise, and tells exactly what you are doing. a now is:
[0.0, 2.0, 2.0, 2.0]
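Another way to express all three bands in a single call is np.select, which takes a list of conditions, a matching list of values, and a default; a small sketch on the sample values used above (values exactly on a boundary fall into the middle band here):
>>> a = np.array([0.3, 0.99, 1.23, 1.7, 9])
>>> np.select([a < 0.95, a > 1.6], [0, 2], default=1)
array([0, 1, 1, 2, 2])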
I have a large Numpy ndarray, here is a sample of that:
myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray
array([[ 1.01, 9.4 , 0.0 , 6.9 , 5.7 ],
[ 1.9 , 2.6 , nan, 4.7 , -2.45],
[ nan, 0.2 , 0.3 , 4.2 , 15.1 ]])
As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-classify) the values in the array based on multiple if statements. I've read many answers and docs, but all of them deal with one or two simple conditions, which can easily be resolved using np.where, for example.
I have multiple conditions; for the sake of simplicity let's say I have four (the desired solution should be able to handle more). My conditions are:
if x > 6*y:
x=3
elif x < 4*z:
x=2
elif x == np.nan:
x=np.nan # maybe pass is better?
else:
x=0
where x is a value in the array, and y and z are variables that change from array to array. For example, array #1 will have y = 5, z = 2, array #2 will have y = 0.9, z = 0.5, etc. The condition for np.nan just means that if a value is NaN, do not alter it, keep it NaN.
Note that the conditions need to be applied together, because if I use several np.where calls one after the other, condition #2 will overwrite condition #1.
I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas here it would need three arguments: the array, y and z.
What would be the most efficient way to achieve my goal?
In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[
...: np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5
If I perform one of your tests on the whole array:
In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
It's the np.nan values that cause this warning.
So let's first identify those NaNs (and replace them):
In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]:
array([[False, False, False, False, False],
[False, False, True, False, False],
[ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0 # temp replace the nan with 0
myarray == np.nan does not work; it produces False everywhere.
arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.
In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4, 6.9, 5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01, 0. , 1.9 , 0. , -2.45, 0. , 0.2 , 0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
Since you want everything else to be 0, let's initialize an array of zeros:
In [34]: res = np.zeros_like(arr)
now apply the 3 masks:
In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]:
array([[ 2., 3., 2., 3., 3.],
[ 2., 0., nan, 0., 2.],
[nan, 2., 2., 0., 3.]])
I could have applied the masks to arr:
In [40]: arr[mask1] = 3 # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]:
array([[2. , 3. , 2. , 3. , 3. ],
[2. , 2.6, nan, 4.7, 2. ],
[nan, 2. , 2. , 4.2, 3. ]])
But then I still have to use some logic to combine the masks to identify the slots that are supposed to be 0 (the unmatched values 2.6, 4.7 and 4.2 above were left unchanged).
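To address the question's point about needing a function of the array plus y and z, the same masking steps can be wrapped up directly. A minimal sketch along the lines of the transcript above (the name reclassify is just illustrative; applying the > 6*y mask last gives it priority, mirroring the if/elif order):
import numpy as np

def reclassify(arr, y, z):
    mask0 = np.isnan(arr)        # remember where the NaNs are
    tmp = np.nan_to_num(arr)     # work on a NaN-free copy
    mask1 = tmp > 6*y
    mask2 = tmp < 4*z
    res = np.zeros_like(tmp)     # everything unmatched stays 0
    res[mask2] = 2
    res[mask1] = 3               # applied after mask2, so the > 6*y branch wins on overlap
    res[mask0] = np.nan          # restore the NaNs untouched
    return res
With y, z = 0.9, 0.5 this reproduces the res array shown above.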
Given a threshold alpha and a numpy array a, there are multiple possibilities for finding the first index i such that a[i] > alpha; see Numpy first occurrence of value greater than existing value:
numpy.searchsorted(a, alpha)+1
numpy.argmax(a > alpha)
In my case, alpha can be either a scalar or an array of arbitrary shape. I'd like to have a function get_lowest that works in both cases:
alpha = 1.12
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # 2
alpha = numpy.array([1.12, -0.5, 2.7])
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # [2, 0, 3]
Any hints?
You can use broadcasting:
In [9]: arr = np.array([ 0. , 1.1, 1.2, 3. ])
In [10]: alpha = np.array([ 1.12, -0.5 , 2.7 ])
In [11]: np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
Out[11]: array([2, 0, 3])
To collapse multidimensional arrays, you can use np.squeeze, but you might have to do something special if you want a Python float in your first case:
def get_lowest(arr, alpha):
    b = np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
    b = np.squeeze(b)
    if np.size(b) == 1:
        return float(b)
    return b
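A quick check against the two cases from the question (note that the scalar case comes back as a Python float because of the float(b) conversion above):
>>> arr = np.array([0.0, 1.1, 1.2, 3.0])
>>> get_lowest(arr, 1.12)
2.0
>>> get_lowest(arr, np.array([1.12, -0.5, 2.7]))
array([2, 0, 3])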
searchsorted actually does the trick:
np.searchsorted(a, alpha)
The axis argument to argmax helps out; this
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1)
does the trick. Indeed
import numpy as np
a = np.array([0.0, 1.1, 1.2, 3.0])
alpha = 1.12
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # 2
np.searchsorted(a, alpha) # 2
alpha = np.array([1.12, -0.5, 2.7])
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # [2 0 3]
np.searchsorted(a, alpha) # [2 0 3]
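One caveat worth hedging: searchsorted assumes a is already sorted, and with the default side='left' it returns the first index where a[i] >= alpha, so if alpha can exactly equal an element of a and you need a strict greater-than, pass side='right'. A sketch under that assumption:
def get_lowest(a, alpha):
    # a must be sorted in ascending order; alpha may be a scalar or an array
    return np.searchsorted(a, alpha, side='right')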
I can understand the following numpy behavior.
>>> a
array([[ 0. , 0. , 0. ],
[ 0. , 0.7, 0. ],
[ 0. , 0.3, 0.5],
[ 0.6, 0. , 0.8],
[ 0.7, 0. , 0. ]])
>>> argmax_overlaps = a.argmax(axis=1)
>>> argmax_overlaps
array([0, 1, 2, 2, 0])
>>> max_overlaps = a[np.arange(5),argmax_overlaps]
>>> max_overlaps
array([ 0. , 0.7, 0.5, 0.8, 0.7])
>>> gt_argmax_overlaps = a.argmax(axis=0)
>>> gt_argmax_overlaps
array([4, 1, 3])
>>> gt_max_overlaps = a[gt_argmax_overlaps,np.arange(a.shape[1])]
>>> gt_max_overlaps
array([ 0.7, 0.7, 0.8])
>>> gt_argmax_overlaps = np.where(a == gt_max_overlaps)
>>> gt_argmax_overlaps
(array([1, 3, 4]), array([1, 2, 0]))
I understood that 0.7, 0.7 and 0.8 are a[1,1], a[3,2] and a[4,0], so I got the tuple (array([1, 3, 4]), array([1, 2, 0])), whose two arrays hold the row and column indices of those three elements. I then tried other examples to see whether my understanding was correct.
>>> np.where(a == [0.3])
(array([2]), array([1]))
0.3 is in a[2,1] so the outcome looks as I expected. Then I tried
>>> np.where(a == [0.3, 0.5])
(array([], dtype=int64),)
I expected to see (array([2, 2]), array([1, 2])). Why do I see the empty output above instead?
>>> np.where(a == [0.7, 0.7, 0.8])
(array([1, 3, 4]), array([1, 2, 0]))
>>> np.where(a == [0.8,0.7,0.7])
(array([1]), array([1]))
I can't understand the second result either. Could someone please explain it to me? Thanks.
The first thing to realize is that np.where(a == [whatever]) is just showing you the indices where a == [whatever] is True. So you can get a hint by looking at the value of a == [whatever]. In your case that "works":
>>> a == [0.7, 0.7, 0.8]
array([[False, False, False],
[False, True, False],
[False, False, False],
[False, False, True],
[ True, False, False]], dtype=bool)
You aren't getting what you think you are. You think the comparison asks for the indices of each element separately, but it actually reports the positions where the values match at the same position within a row. Basically the comparison is saying: for each row, tell me whether the first element is 0.7, whether the second is 0.7, and whether the third is 0.8. np.where then returns the indices of those matching positions. In other words, the comparison is done against entire rows, not just individual values. For your last example:
>>> a == [0.8,0.7,0.7]
array([[False, False, False],
[False, True, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
You now get a different result. It's not asking for "the indices where a has value 0.8"; it's asking only for the indices where there is a 0.8 at the beginning of a row, and likewise a 0.7 in either of the later two positions.
This type of row-wise comparison can only be done if the value you compare against matches the shape of a single row of a (or can be broadcast against it). When you try it with a two-element list, the shapes cannot be broadcast, so the comparison collapses to a single False and np.where finds no True entries, which is why you get an empty result.
The upshot is that you can't use == on a list of values and expect it to just tell you where any of the values occurs. The equality will match by value and position (if the value you compare against has the same shape as a row of your array), or the comparison will fail to broadcast and give you nothing (if the shapes don't match). If you want to search for the values independently, you need to do something like what Khris suggested in a comment:
np.where((a==0.3)|(a==0.5))
That is, you need to make two (or more) separate comparisons against separate values, not a single comparison against a list of values.
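For completeness, on the array from the question that combined comparison picks up both values at their actual positions:
>>> np.where((a == 0.3) | (a == 0.5))
(array([2, 2]), array([1, 2]))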
I am taking the output from the AUTO numerical continuation package, and need to filter out results that have negative values of the variables, as they are non-physical. So if I have, for example:
>>> a = np.array([[0,1,2,3,4],[-1,-0.5,0,0.5,1],[-3,-4,-5,0.1,0.2]])
I would like to be left with:
>>> b
array([[ 3. , 4. ],
[ 0.5, 1. ],
[ 0.1, 0.2]])
But when I try numpy.where I get:
>>> b = a[:,(np.where(a[1]>=0) and np.where(a[2]>=0))]
>>> b
array([[[ 3. , 4. ]],
[[ 0.5, 1. ]],
[[ 0.1, 0.2]]])
>>> b.shape
(3, 1, 2)
That is, it adds another unwanted axis to the array. What am I doing wrong?
Assuming all you want to do is to remove columns that have one or more negative values, you could do this:
a = np.array([[0,1,2,3,4],[-1,-0.5,0,0.5,1],[-3,-4,-5,0.1,0.2]])
b = a[:,a.min(axis=0)>=0]
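A quick check on the array from the question: a.min(axis=0) >= 0 is True only for the last two columns, so indexing with it gives the desired b:
>>> b = a[:, a.min(axis=0) >= 0]
>>> b
array([[ 3. ,  4. ],
       [ 0.5,  1. ],
       [ 0.1,  0.2]])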
If what you want are fully positive columns, then @Yakym's answer is the way to go as it is probably faster. However, if that was just an example and you want to threshold only certain rows, you can do it by slightly modifying your example:
>>> a[:, (a[1] >= 0) & (a[2] >= 0)]
array([[ 3. , 4. ],
[ 0.5, 1. ],
[ 0.1, 0.2]])
Here (a[1] >= 0) and (a[2] >= 0) create boolean masks that are merged by the & (element-wise logical and) operator and used to index the array a.
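If the number of rows to check grows, the same idea generalizes with np.all instead of chaining more & terms; a sketch assuming you want every row from index 1 onward to be non-negative:
>>> a[:, (a[1:] >= 0).all(axis=0)]
array([[ 3. ,  4. ],
       [ 0.5,  1. ],
       [ 0.1,  0.2]])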