Re-assign values with multiple if statements in NumPy - Python

I have a large Numpy ndarray, here is a sample of that:
myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray
array([[ 1.01,  9.4 ,  0.0 ,  6.9 ,  5.7 ],
       [ 1.9 ,  2.6 ,   nan,  4.7 , -2.45],
       [  nan,  0.2 ,  0.3 ,  4.2 , 15.1 ]])
As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-class) the values in the array based on multiple if statements. I've read many answers and docs, but all of the ones I've seen deal with one or two simple conditions, which can easily be resolved using np.where, for example.
I have multiple conditions; for the sake of simplicity let's say I have four (the desired solution should be able to handle more). My conditions are:
if x > 6*y:
    x = 3
elif x < 4*z:
    x = 2
elif x == np.nan:
    x = np.nan  # maybe pass is better?
else:
    x = 0
where x is a value in the array, and y and z are variables that change between arrays. For example, array #1 will have y=5, z=2; array #2 will have y=0.9, z=0.5; etc. The condition for np.nan just means that if a value is nan, do not alter it; keep it nan.
Note that the conditions need to be applied simultaneously, because if I use several np.where calls one after the other, condition #2 will overwrite condition #1.
I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function may take only one argument (the array), whereas my function would need three arguments: the array, and the y and z values.
What would be the most efficient way to achieve my goal?

In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5
If I perform one of your tests on the whole array:
In [14]: mask1 = myarray > 6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
It's the np.nan values that cause this warning.
So let's first identify those nans (and replace them):
In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]:
array([[False, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0 # temp replace the nan with 0
myarray == np.nan does not work; it produces False everywhere.
arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.
In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4, 6.9, 5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01, 0. , 1.9 , 0. , -2.45, 0. , 0.2 , 0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
Since you want everything else to be 0, let's initialize an array of zeros:
In [34]: res = np.zeros_like(arr)
Now apply the 3 masks:
In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]:
array([[ 2.,  3.,  2.,  3.,  3.],
       [ 2.,  0., nan,  0.,  2.],
       [nan,  2.,  2.,  0.,  3.]])
I could have applied the masks to arr:
In [40]: arr[mask1] = 3 # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]:
array([[2. , 3. , 2. , 3. , 3. ],
       [2. , 2.6, nan, 4.7, 2. ],
       [nan, 2. , 2. , 4.2, 3. ]])
I still have to use some logic to combine the masks to identify the slots that are supposed to be 0.
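To address the function question directly: the mask logic can be wrapped in a function that takes the array plus y and z, and np.select can take over the combining step, since it checks its conditions in order (like if/elif) and fills a default everywhere else. A minimal sketch of that idea (reclassify is a made-up name, not from the question):

import numpy as np

def reclassify(arr, y, z):
    mask0 = np.isnan(arr)               # remember where the nans are
    tmp = np.where(mask0, 0, arr)       # neutralize nans to avoid the comparison warning
    return np.select(
        [mask0, tmp > 6*y, tmp < 4*z],  # conditions, checked in order (nan first)
        [np.nan, 3, 2],                 # corresponding replacement values
        default=0,                      # everything else
    )

reclassify(myarray, 0.9, 0.5)

Handling more conditions just means extending the two lists; the first matching condition wins, which reproduces the if/elif priority.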

Related

Where clause with numpy with single array and / or empty_like

I am trying to figure out how the np.where clause works. I create a simple df:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)
   A  B  C  D
0  5  8  9  5
1  0  0  1  7
2  6  9  2  4
Now when I implement:
print(np.where(df.values, 1, np.nan))
I receive:
[[ 1.  1.  1.  1.]
 [nan nan  1.  1.]
 [ 1.  1.  1.  1.]]
But when I create an empty_like array from df and put it into np.where, I receive this:
print(np.where(np.empty_like(df.values), 1, np.nan))
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
I could really use help understanding how np.where works on a single array.
np.empty_like()
Docs:-
numpy.empty_like(prototype, dtype=None, order='K', subok=True)
Return a new array with the same shape and type as a given array.
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821,           3],   # random
       [          0,           0, -1073741821]])
np.empty_like() creates an array of the same shape and type as the given array, but filled with whatever arbitrary (uninitialized) values happen to be in that memory. This array now goes into np.where().
numpy.where()
Docs:-
numpy.where(condition[, x, y])
Return elements that are chosen from x or y depending on condition.
Example:-
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
>>> np.where(a, 1, np.nan)
array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
In Python any number other than zero is considered True, whereas zero is considered False.
When np.where() gets an array as its condition, the array's elements are themselves evaluated as booleans: np.where sees True where the elements are non-zero and False where they are 0. So the "True" elements are replaced by 1 and the "False" elements by np.nan. In your empty_like example, the uninitialized values just happened to all be non-zero, which is why every element was replaced by 1.
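To make the implicit truth-test visible, the same call can be written with an explicit condition; a small sketch using the question's own df:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))

# passing df.values directly as the condition is equivalent to testing it against zero:
print(np.where(df.values != 0, 1, np.nan))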
Reference:-
numpy.where()
numpy.empty_like()

How to use arrays to access matrix elements?

I need to change all nans of a matrix to a different value. I can easily get the nan positions using argwhere, but then I am not sure how to access those positions programmatically. Here is my nonworking code:
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
for pos in nanPositions:
    myMatrix[pos] = maxVal
the problem is that myMatrix[pos] does not accept pos as an array.
The more-efficient way of generating your output has already been covered by sacul. However, you're incorrectly indexing your 2D matrix in the case where you want to use an array.
At least to me, it's a bit unintuitive, but you need to use:
myMatrix[[all_row_indices], [all_column_indices]]
The following will give you what you expect:
import numpy as np
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
print(myMatrix[nanPositions[:, 0], nanPositions[:, 1]])
You can see more about advanced indexing in the documentation.
In [54]: arr = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
...:
In [55]: arr
Out[55]:
array([[3.2, 2. , nan, 3. ],
       [3. , 1. , 2. , nan],
       [3. , 3. , 3. , 3. ]])
Location of the nan:
In [56]: np.where(np.isnan(arr))
Out[56]: (array([0, 1]), array([2, 3]))
In [57]: np.argwhere(np.isnan(arr))
Out[57]:
array([[0, 2],
       [1, 3]])
where produces a tuple of arrays; argwhere produces the same values, but as a 2d array.
In [58]: arr[Out[56]]
Out[58]: array([nan, nan])
In [59]: arr[Out[56]] = [100,200]
In [60]: arr
Out[60]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3. ,   1. ,   2. , 200. ],
       [  3. ,   3. ,   3. ,   3. ]])
The argwhere result can be used to index individual items:
In [72]: for ij in Out[57]:
    ...:     print(arr[tuple(ij)])
100.0
200.0
The tuple() is needed here because np.array([1,3]) is interpreted as indexing 2 elements along the first dimension.
Another way to get that indexing tuple is to use unpacking:
In [74]: [arr[i,j] for i,j in Out[57]]
Out[74]: [100.0, 200.0]
So while argwhere looks useful, it is trickier to use than plain where.
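One way to bridge the two: argwhere is documented as np.transpose(np.nonzero(a)), so transposing its result recovers the where-style tuple. A small sketch on a fresh copy of the array:

import numpy as np

arr = np.array([[3.2, 2, np.nan, 3], [3, 1, 2, np.nan], [3, 3, 3, 3]])
idx = np.argwhere(np.isnan(arr))   # 2d array of [row, col] pairs

# tuple(idx.T) is (array of rows, array of cols), usable for direct indexing:
print(arr[tuple(idx.T)])           # [nan nan]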
You could, as noted in the other answers, use boolean indexing (I've already modified arr so the isnan test no longer works):
In [75]: arr[arr>10]
Out[75]: array([100., 200.])
More on indexing with a list or array, and indexing with a tuple:
In [77]: arr[[0,0]] # two copies of row 0
Out[77]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
In [78]: arr[(0,0)] # one element
Out[78]: 3.2
In [79]: arr[np.array([0,0])] # same as list
Out[79]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
In [80]: arr[np.array([0,0]),:] # making the trailing : explicit
Out[80]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
You can do this instead (IIUC):
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))
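For a quick check on the sample matrix (np.nanmax(abs(...)) is 3.2 for this data):

import numpy as np

myMatrix = np.array([[3.2, 2, np.nan, 3], [3, 1, 2, np.nan], [3, 3, 3, 3]])
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))
print(myMatrix)   # both nans are replaced by 3.2, the largest absolute value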

Replacing fill values in 2D array with values from a 1D array

My goal is to fill a 2D array with values from a 1D array that exactly matches the pattern of values in the 2D array. For example:
array_a =
([[nan, nan, 0],
  [0, nan, 0],
  [nan, 0, 0],
  [0, 0, nan]])
array_b =
([0.324, 0.254, 0.204,
  0.469, 0.381, 0.292,
  0.550])
And I want to get this:
array_c =
([[nan, nan, 0.324],
  [0.254, nan, 0.204],
  [nan, 0.469, 0.381],
  [0.292, 0.550, nan]])
The number of values that need to be filled in array_a will exactly match the number of values in array_b. The main issue is that I want the nan values to stay in the appropriate positions throughout the array, and I'm not sure how best to do that.
boolean indexing does the job nicely:
Locate the nan:
In [229]: mask = np.isnan(array_a)
In [230]: mask
Out[230]:
array([[ True,  True, False],
       [False,  True, False],
       [ True, False, False],
       [False, False,  True]])
A boolean mask applied to the array produces a 1d array:
In [231]: array_a[~mask]
Out[231]: array([0., 0., 0., 0., 0., 0., 0.])
Use that same mask on the left-hand side of an assignment; boolean indexing fills the selected slots in row-major order, which matches the order of array_b:
In [232]: array_a[~mask]=array_b
In [233]: array_a[~mask]
Out[233]: array([0.324, 0.254, 0.204, 0.469, 0.381, 0.292, 0.55 ])
In [234]: array_a
Out[234]:
array([[  nan,   nan, 0.324],
       [0.254,   nan, 0.204],
       [  nan, 0.469, 0.381],
       [0.292, 0.55 ,   nan]])
You can also do:
np.place(array_a, array_a == 0, array_b)
array_a
array([[  nan,   nan, 0.324],
       [0.254,   nan, 0.204],
       [  nan, 0.469, 0.381],
       [0.292, 0.55 ,   nan]])
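A variant sketch, in case the fill values are not guaranteed to be exactly 0: key off the nan pattern itself rather than the == 0 test:

import numpy as np

nan = np.nan
array_a = np.array([[nan, nan, 0], [0, nan, 0], [nan, 0, 0], [0, 0, nan]])
array_b = np.array([0.324, 0.254, 0.204, 0.469, 0.381, 0.292, 0.550])

# fill every non-nan slot, in row-major order, from array_b:
np.place(array_a, ~np.isnan(array_a), array_b)
print(array_a)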
This should do the trick, although there might be a pre-written solution or a list comprehension to do the same.
import numpy as np
b_index = 0
array_c = np.zeros(np.array(array_a).shape)
for row_index, row in enumerate(array_a):
    for col_index, col in enumerate(row):
        if not np.isnan(col):
            array_c[row_index, col_index] = array_b[b_index]
            b_index += 1
        else:
            array_c[row_index, col_index] = np.nan
>>> print(array_c)
[[  nan   nan 0.324]
 [0.254   nan 0.204]
 [  nan 0.469 0.381]
 [0.292 0.55    nan]]

Numpy treats float('nan') and float differently - convert to None

I want to create a Numpy array from a normal array and convert nan values to None - but the success depends on whether the first value is a "normal" float or a float('nan').
Here is my code, starting with the initial array:
print(a)
array('d', [3.2345, nan, 2.0, 3.2, 1.0, 3.0])
print(b)
array('d', [nan, nan, 2.0, 3.2, 1.0, 3.0])
Now I would like to swap all nan values to Python None via a vectorized function:
def convert(x):
    if x != x:   # nan is the only float that is not equal to itself
        return None
    else:
        return x
convert_vec = numpy.vectorize(convert)
Simple, but leads to two different results:
numpy.asarray(convert_vec(a))
array([[ 3.2345, 2. , 1. ], [ nan, 3.2 , 3. ]])
numpy.asarray(convert_vec(b))
array([[None, 2.0, 1.0], [None, 3.2, 3.0]], dtype=object)
Why is this? Yes, I can see a small difference - the second one has object as dtype. But even using numpy.asarray(convert_vec(a), dtype=object), so that both have object as dtype, doesn't change the difference in results.
np.nan is a float value, None is not numeric.
In [464]: np.array([1,2,np.nan,3])
Out[464]: array([ 1., 2., nan, 3.])
In [465]: np.array([1,2,None,3])
Out[465]: array([1, 2, None, 3], dtype=object)
In [466]: np.array([1,2,None,3],dtype=float)
Out[466]: array([ 1., 2., nan, 3.])
If you try to create an array that contains None, the result will be a dtype=object array. If you insist on a float dtype, the None will be converted to nan.
In the vectorize case, if you don't specify the return dtype, it deduces it from the first element.
Your examples are a bit confusing (you need to edit them), but I think that
convert(np.nan) => None
convert(123) => 123
so
convert_vec([123,nan,...]) => [123, nan, ...],dtype=float
convert_vec([nan,123,...]) => [None, 123,...],dtype=object
trying to convert np.nan to None is a bad idea, except maybe for display purposes.
vectorize without explicit result dtype specification is a bad idea
this probably isn't a good use of vectorize.
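That said, if you do want the None result regardless of element order, np.vectorize accepts an otypes argument that pins the output dtype up front; a sketch, assuming an object array is really what's wanted:

import numpy as np

def convert(x):
    if x != x:   # nan is the only float not equal to itself
        return None
    return x

# otypes=[object] fixes the result dtype, so it no longer depends on the first element:
convert_vec = np.vectorize(convert, otypes=[object])

print(convert_vec([3.2345, np.nan, 2.0]))   # [3.2345 None 2.0]
print(convert_vec([np.nan, 3.2345, 2.0]))   # [None 3.2345 2.0]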
Here's an alternative way of converting the nan values:
In [467]: a=np.array([1,2,np.nan,34,np.nan],float)
In [468]: a
Out[468]: array([ 1., 2., nan, 34., nan])
In [471]: ind = a != a   # nan is never equal to itself
In [472]: ind
Out[472]: array([False, False, True, False, True], dtype=bool)
In [473]: a[ind]=0 # not trying None
In [474]: a
Out[474]: array([ 1., 2., 0., 34., 0.])
Or using masked arrays:
In [477]: am=np.ma.masked_invalid(a)
In [478]: am
Out[478]:
masked_array(data = [1.0 2.0 -- 34.0 --],
             mask = [False False True False True],
             fill_value = 1e+20)
In [479]: am.filled(0)
Out[479]: array([ 1., 2., 0., 34., 0.])
hpaulj has explained it well; here is an easy demonstration of how to do it:
a = [3.2345, numpy.nan, 2.0, 3.2, 1.0, 3.0]
print([i if i is not numpy.nan else None for i in a])
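Note that the identity test (is not numpy.nan) only catches the numpy.nan singleton; a nan produced elsewhere, e.g. float('nan'), is a different object. A more robust sketch uses math.isnan:

import math

a = [3.2345, float('nan'), 2.0, 3.2, 1.0, 3.0]
print([None if math.isnan(i) else i for i in a])   # [3.2345, None, 2.0, 3.2, 1.0, 3.0]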

Manipulating an array in python

I have a numpy array that is obtained by reading an image.
data=band.ReadAsArray(0,0,rows,cols)
Now the problem is that manipulating the data with loops took around 13 minutes. How can I reduce this time? Is there another solution?
Sample code:
for i in range(rows):
    for j in range(cols):
        if data[i][j] > 1 and data[i][j] < 30:
            data[i][j] = 255
        elif data[i][j] < 1:
            data[i][j] = 0
        else:
            data[i][j] = 1
It takes too long. Is there a shorter method?
With numpy you can use a mask to select all elements with a certain condition, as shown in the code example below:
import numpy as np
a = np.random.random((5,5))
a[a<0.5] = 0.0
print(a)
# [[ 0. 0.94925686 0.8946333 0.51562938 0.99873065]
# [ 0. 0. 0. 0. 0. ]
# [ 0.86719795 0. 0.8187514 0. 0.72529116]
# [ 0.6036299 0.9463493 0.78283466 0.6516331 0.84991734]
# [ 0.72939806 0.85408697 0. 0.59062025 0.6704499 ]]
If you wished to re-write your code then it could be something like the following (note the parentheses: & binds tighter than the comparison operators, so they are required):
data = band.ReadAsArray(0, 0, rows, cols)
mask_mid = (data > 1) & (data < 30)
mask_low = data < 1
data[mask_mid] = 255
data[mask_low] = 0
data[~mask_mid & ~mask_low] = 1   # the else branch; masks are computed before any assignment
Instead of looping, you can assign using a boolean array to select the values you're interested in changing. For example, if we have an array
>>> a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])
>>> a
array([[  0.1,   0.5,   1. ],
       [ 10. ,  20. ,  30. ],
       [ 40. ,  50. ,  60. ]])
We can apply your logic with something like
>>> anew = np.empty_like(a)
>>> anew.fill(1)
>>> anew[a < 1] = 0
>>> anew[(a > 1) & (a < 30)] = 255
>>> anew
array([[   0.,    0.,    1.],
       [ 255.,  255.,    1.],
       [   1.,    1.,    1.]])
This works because of how numpy indexing works:
>>> a < 1
array([[ True,  True, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
>>> anew[a < 1]
array([ 0., 0.])
Note: we don't really need anew; you can act on a itself, but then you have to be careful about the order in which you apply things, in case your conditions and target values overlap.
Note #2: your conditions mean that if there's an element of the array which is exactly 30, or anything greater, it will become 1, and not 255. That seems a little odd, but it's what your code does, so I reproduced it.
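If you prefer to keep the if/elif/else shape in one call, np.select maps over directly; a sketch on the same example (it checks conditions in order and fills a default):

import numpy as np

a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])

anew = np.select([(a > 1) & (a < 30), a < 1],   # conditions, in priority order
                 [255, 0],                      # matching values
                 default=1)                     # the else branch
print(anew)
# [[  0   0   1]
#  [255 255   1]
#  [  1   1   1]]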
