I want to create a NumPy array from a normal array and convert nan values to None, but whether it succeeds depends on whether the first value is a "normal" float or a float('nan').
Here is my code, starting with the initial array:
print(a)
array('d', [3.2345, nan, 2.0, 3.2, 1.0, 3.0])
print(b)
array('d', [nan, nan, 2.0, 3.2, 1.0, 3.0])
Now I would like to swap all nan values to Python None via a vectorized function:
def convert(x):
    if x != x:
        return None
    else:
        return x
convert_vec = numpy.vectorize(convert)
Simple, but leads to two different results:
numpy.asarray(convert_vec(a))
array([[ 3.2345, 2. , 1. ], [ nan, 3.2 , 3. ]])
numpy.asarray(convert_vec(b))
array([[None, 2.0, 1.0], [None, 3.2, 3.0]], dtype=object)
Why is this? Yes, I can see a small difference: the second one has object as dtype. Using numpy.asarray(convert_vec(a), dtype=object) gives both results the object dtype, but the results still differ.
np.nan is a float value, None is not numeric.
In [464]: np.array([1,2,np.nan,3])
Out[464]: array([ 1., 2., nan, 3.])
In [465]: np.array([1,2,None,3])
Out[465]: array([1, 2, None, 3], dtype=object)
In [466]: np.array([1,2,None,3],dtype=float)
Out[466]: array([ 1., 2., nan, 3.])
If you try to create an array that contains None, the result will be a dtype=object array. If you insist on a float dtype, the None will be converted to nan.
In the vectorize case, if you don't specify the return dtype, it deduces it from the first element.
Your examples are a bit confusing (you need to edit them), but I think that
convert(np.nan) => None
convert(123) => 123
so
convert_vec([123,nan,...]) => [123, nan, ...],dtype=float
convert_vec([nan,123,...]) => [None, 123,...],dtype=object
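A minimal sketch of one way to avoid the first-element dependence, assuming a reasonably recent NumPy: the otypes argument of numpy.vectorize fixes the output dtype up front. The names convert_obj, first_is_number and first_is_nan are only illustrative.

```python
import numpy as np

def convert(x):
    # nan is the only float value that is not equal to itself
    return None if x != x else x

# Pin the output dtype so vectorize does not deduce it from the first result
convert_obj = np.vectorize(convert, otypes=[object])

first_is_number = convert_obj(np.array([3.2345, np.nan, 2.0]))
first_is_nan = convert_obj(np.array([np.nan, np.nan, 2.0]))

print(first_is_number.dtype, first_is_number)
print(first_is_nan.dtype, first_is_nan)
```

With the dtype pinned to object, both inputs keep their None values regardless of element order.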
Trying to convert np.nan to None is a bad idea, except maybe for display purposes.
Using vectorize without an explicit result dtype is a bad idea.
This probably isn't a good use of vectorize.
Here's an alternative way of converting the nan values:
In [467]: a=np.array([1,2,np.nan,34,np.nan],float)
In [468]: a
Out[468]: array([ 1., 2., nan, 34., nan])
In [471]: ind=a!=a
In [472]: ind
Out[472]: array([False, False, True, False, True], dtype=bool)
In [473]: a[ind]=0 # not trying None
In [474]: a
Out[474]: array([ 1., 2., 0., 34., 0.])
Or using masked arrays:
In [477]: am=np.ma.masked_invalid(a)
In [478]: am
Out[478]:
masked_array(data = [1.0 2.0 -- 34.0 --],
mask = [False False True False True],
fill_value = 1e+20)
In [479]: am.filled(0)
Out[479]: array([ 1., 2., 0., 34., 0.])
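If None really is needed (for display or export, say), a boolean mask can produce it without vectorize. This is only a sketch; with_none is an illustrative name, and the result is an object array, so it loses the speed of float arithmetic.

```python
import numpy as np

a = np.array([1, 2, np.nan, 34, np.nan], dtype=float)

# Copy into an object array, then overwrite the nan slots with None
with_none = a.astype(object)
with_none[np.isnan(a)] = None

print(with_none)
```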
hpaulj has explained well, here is an easy demonstration on how to do it:
a = [3.2345, numpy.nan, 2.0, 3.2, 1.0, 3.0]
print([None if i != i else i for i in a])  # nan is the only value not equal to itself
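One caveat for list comprehensions like this: an identity test against numpy.nan only matches that exact object, so it misses nan values that come from elsewhere, such as an array('d'). A small sketch using math.isnan instead (the name cleaned is illustrative):

```python
import array
import math

import numpy as np

a = array.array('d', [3.2345, float('nan'), 2.0])

# Identity only matches the numpy.nan object itself, not other nan floats
print(float('nan') is np.nan)  # False

# math.isnan checks the value, so it works for any float nan
cleaned = [None if math.isnan(i) else i for i in a]
print(cleaned)  # [3.2345, None, 2.0]
```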
I have a large Numpy ndarray, here is a sample of that:
myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray
array([[ 1.01, 9.4 , 0.0 , 6.9 , 5.7 ],
[ 1.9 , 2.6 , nan, 4.7 , -2.45],
[ nan, 0.2 , 0.3 , 4.2 , 15.1 ]])
As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-class) the values in the array based on multiple if statements. I've read many answers and docs, but everything I've seen deals with only one or two conditions, which can easily be resolved using np.where, for example.
I have multiple conditions; for the sake of simplicity, let's say I have four (the desired solution should be able to handle more). My conditions are:
if x > 6*y:
    x=3
elif x < 4*z:
    x=2
elif x == np.nan:
    x=np.nan # maybe pass is better?
else:
    x=0
where x is a value in the array, and y and z are variables that will change among arrays. For example, array #1 will have y=5, z=2, array #2 will have y = 0.9, z= 0.5 etc. The condition for np.nan just means that if a value is nan, do not alter it, keep it nan.
Note that this needs to be executed at the same time, because if I use several np.where calls one after the other, then condition #2 will overwrite condition #1.
I tried to create a function and then apply it on the array but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas the function I need would take three arguments: the array, and the y and z values.
What would be the most efficient way to achieve my goal?
In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[
...: np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5
If I perform one of your tests on the whole array:
In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
It's the np.nan values that cause this warning.
So let's first identify those nan (and replace them):
In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]:
array([[False, False, False, False, False],
[False, False, True, False, False],
[ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0 # temp replace the nan with 0
myarray == np.nan does not work; it produces False everywhere.
arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
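A quick check of those two remarks, as a sketch (eq_mask, nan_mask and filled are illustrative names):

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])

eq_mask = myarray == np.nan      # nan never compares equal, so all False
nan_mask = np.isnan(myarray)     # True exactly where the value is nan
filled = np.nan_to_num(myarray)  # new array with nan replaced by 0.0

print(eq_mask.any(), nan_mask.sum())  # False 2
```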
Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.
In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4, 6.9, 5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01, 0. , 1.9 , 0. , -2.45, 0. , 0.2 , 0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
Since you want everything else to be 0, let's initialize an array of zeros:
In [34]: res = np.zeros_like(arr)
Now apply the 3 masks:
In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]:
array([[ 2., 3., 2., 3., 3.],
[ 2., 0., nan, 0., 2.],
[nan, 2., 2., 0., 3.]])
I could have applied the masks to arr:
In [40]: arr[mask1] = 3 # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]:
array([[2. , 3. , 2. , 3. , 3. ],
[2. , 2.6, nan, 4.7, 2. ],
[nan, 2. , 2. , 4.2, 3. ]])
I still have to use some logic to combine the masks to identify the slots that are supposed to be 0.
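As an alternative to combining the masks by hand, np.select evaluates a list of conditions with first-match-wins semantics, which mirrors an if/elif chain and also copes with overlapping conditions. This is a sketch rather than the approach used above; reclass is a hypothetical helper name, and the nan test is placed first so that nan values are kept.

```python
import numpy as np

def reclass(arr, y, z):
    """Hypothetical helper: apply the if/elif chain element-wise."""
    nan_mask = np.isnan(arr)
    # Comparisons against nan are False; silence the warning some NumPy versions emit
    with np.errstate(invalid='ignore'):
        conditions = [nan_mask, arr > 6 * y, arr < 4 * z]
    choices = [np.nan, 3.0, 2.0]
    # The first matching condition wins; everything else gets the default 0
    return np.select(conditions, choices, default=0.0)

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])

res = reclass(myarray, y=0.9, z=0.5)
print(res)
```

Because y and z are ordinary function arguments, the same helper can be reused for each array with its own thresholds.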
My goal is to fill a 2D array with values from a 1D array that exactly matches the pattern of values in the 2D array. For example:
array_a =
([[nan,nan,0],
[0,nan,0],
[nan,0,0],
[0,0,nan]])
array_b =
([0.324,0.254,0.204,
0.469,0.381,0.292,
0.550])
And I want to get this:
array_c =
([[nan,nan,0.324],
[0.254,nan,0.204],
[nan,0.469,0.381],
[0.292,0.550,nan]])
The number of values that need to be filled in array_a will exactly match the number of values in array_b. The main issue is that I want to keep the nan values in the appropriate positions throughout the array, and I'm not sure how best to do that.
Boolean indexing does the job nicely:
Locate the nan:
In [229]: mask = np.isnan(array_a)
In [230]: mask
Out[230]:
array([[ True, True, False],
[False, True, False],
[ True, False, False],
[False, False, True]])
boolean mask applied to the array produces a 1d array:
In [231]: array_a[~mask]
Out[231]: array([0., 0., 0., 0., 0., 0., 0.])
Use that same array in a set context:
In [232]: array_a[~mask]=array_b
In [233]: array_a[~mask]
Out[233]: array([0.324, 0.254, 0.204, 0.469, 0.381, 0.292, 0.55 ])
In [234]: array_a
Out[234]:
array([[ nan, nan, 0.324],
[0.254, nan, 0.204],
[ nan, 0.469, 0.381],
[0.292, 0.55 , nan]])
You can also do:
np.place(array_a, array_a == 0, array_b)
array_a
array([[ nan, nan, 0.324],
[0.254, nan, 0.204],
[ nan, 0.469, 0.381],
[0.292, 0.55 , nan]])
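Both approaches above modify array_a in place. If the original should be kept, a copy can be filled instead. A minimal sketch, building the inputs explicitly with np.array:

```python
import numpy as np

array_a = np.array([[np.nan, np.nan, 0],
                    [0, np.nan, 0],
                    [np.nan, 0, 0],
                    [0, 0, np.nan]])
array_b = np.array([0.324, 0.254, 0.204, 0.469, 0.381, 0.292, 0.550])

array_c = array_a.copy()
# Boolean-mask assignment fills the True slots in row-major order
array_c[~np.isnan(array_c)] = array_b

print(array_c)
```

Masking on ~np.isnan rather than on == 0 does not depend on the placeholders being exactly zero.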
This should do the trick, although there might be a pre-written solution or a list comprehension to do the same.
import numpy as np
b_index = 0
array_c = np.zeros(np.array(array_a).shape)
for row_index, row in enumerate(array_a):
    for col_index, col in enumerate(row):
        if not np.isnan(col):
            array_c[row_index, col_index] = array_b[b_index]
            b_index += 1
        else:
            array_c[row_index, col_index] = np.nan
>>> print(array_c)
[[ nan nan 0.324]
[0.254 nan 0.204]
[ nan 0.469 0.381]
[0.292 0.55 nan]]
For example, if I have the 2D array as follows.
[[1,2,3,NAN],
[4,5,NAN,NAN],
[6,NAN,NAN,NAN]
]
The desired result is
[[1,2,3],
[4,5],
[6]
]
How should I transform?
I find using
x = x[~numpy.isnan(x)] can only generate [1,2,3,4,5,6], which has been flattened into a one-dimensional array.
Thanks!
Just apply that isnan on a row by row basis
In [135]: [row[~np.isnan(row)] for row in arr]
Out[135]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
Boolean masking as in x[~numpy.isnan(x)] produces a flattened result because, in general, the result will be ragged like this, and can't be formed into a 2d array.
The source array must be float dtype - because np.nan is a float:
In [138]: arr = np.array([[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]])
In [139]: arr
Out[139]:
array([[ 1., 2., 3., nan],
[ 4., 5., nan, nan],
[ 6., nan, nan, nan]])
If object dtype, the numbers can be integer, but np.isnan(arr) won't work.
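A small sketch of that object-dtype failure and one possible workaround, converting to float before testing (obj_arr and isnan_failed are illustrative names):

```python
import numpy as np

obj_arr = np.array([[1, 2, np.nan], [4, np.nan, np.nan]], dtype=object)

try:
    np.isnan(obj_arr)
    isnan_failed = False
except TypeError:
    # The isnan ufunc has no implementation for object arrays
    isnan_failed = True

# Converting to float first makes np.isnan usable again
mask = np.isnan(obj_arr.astype(float))
print(isnan_failed)
print(mask)
```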
If the original is a list, rather than an array:
In [146]: alist = [[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]]
In [147]: alist
Out[147]: [[1, 2, 3, nan], [4, 5, nan, nan], [6, nan, nan, nan]]
In [148]: [[i for i in row if ~np.isnan(i)] for row in alist]
Out[148]: [[1, 2, 3], [4, 5], [6]]
The flat array could be turned into a list of arrays with split:
In [152]: np.split(arr[~np.isnan(arr)],(3,5))
Out[152]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
where the (3,5) split parameter could be determined by counting the non-nan values in each row, but that's more work and doesn't promise to be faster than the row iteration.
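For completeness, the split points can be derived from the per-row counts with a cumulative sum. A sketch, with valid, split_points and rows as illustrative names:

```python
import numpy as np

arr = np.array([[1, 2, 3, np.nan],
                [4, 5, np.nan, np.nan],
                [6, np.nan, np.nan, np.nan]])

valid = ~np.isnan(arr)
# Running total of valid entries per row; drop the last to get the split points
split_points = np.cumsum(valid.sum(axis=1))[:-1]
rows = np.split(arr[valid], split_points)

print(split_points)  # [3 5]
print(rows)
```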
aa = np.array([2.0, np.nan])  # np.NaN was an alias of np.nan; it was removed in NumPy 2.0
aa[aa>1.0] = np.nan
On running the code above, I get the following warning. I understand the reason for this warning, but how can I avoid it?
RuntimeWarning: invalid value encountered in greater
Store the indices of the valid (non-NaN) elements. First, use these indices to index into the array and perform the comparison, which gives a mask over the valid elements only. Then index the stored indices with that mask to recover the matching positions in the original array. Finally, use those positions to assign NaN to the corresponding elements of the input array.
Thus, an implementation/solution would be -
idx = np.flatnonzero(~np.isnan(aa))
aa[idx[aa[idx] > 1.0]] = np.nan
Sample run -
In [106]: aa # Input array with NaNs
Out[106]: array([ 0., 3., nan, 0., 9., 6., 6., nan, 18., 6.])
In [107]: idx = np.flatnonzero(~np.isnan(aa)) # Store valid indices
In [108]: idx
Out[108]: array([0, 1, 3, 4, 5, 6, 8, 9])
In [109]: aa[idx[aa[idx] > 1.0]] = np.nan # Do the assignment
In [110]: aa # Verify
Out[110]: array([ 0., nan, nan, 0., nan, nan, nan, nan, nan, nan])
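Two other ways to sidestep the warning, sketched here as alternatives rather than as part of the approach above: np.errstate suppresses it locally, and the ufunc where argument skips the comparison at the nan positions. The names bb, cc and mask are illustrative.

```python
import numpy as np

aa = np.array([0., 3., np.nan, 0., 9., 6., 6., np.nan, 18., 6.])

# Option 1: silence the invalid-value warning just for this comparison
bb = aa.copy()
with np.errstate(invalid='ignore'):
    bb[bb > 1.0] = np.nan

# Option 2: evaluate the comparison only where the value is not nan;
# skipped positions keep the False supplied through out
cc = aa.copy()
mask = np.greater(cc, 1.0, where=~np.isnan(cc),
                  out=np.zeros(cc.shape, dtype=bool))
cc[mask] = np.nan

print(bb)
print(cc)
```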
What is a pythonic way to access a shifted version, either right or left, of a numpy array? A clear example:
a = np.array([1.0, 2.0, 3.0, 4.0])
Is there a way to access:
a_shifted_1_left = np.array([2.0, 3.0, 4.0, 1.0])
from the numpy library?
You are looking for np.roll -
np.roll(a,-1) # shifted left
np.roll(a,1) # shifted right
Sample run -
In [28]: a
Out[28]: array([ 1., 2., 3., 4.])
In [29]: np.roll(a,-1) # shifted left
Out[29]: array([ 2., 3., 4., 1.])
In [30]: np.roll(a,1) # shifted right
Out[30]: array([ 4., 1., 2., 3.])
If you want more shifts, just go np.roll(a,-2) and np.roll(a,2) and so on.
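A short sketch of what np.roll does under the hood for the 1D case, plus the axis argument for 2D input (left_manual, grid and cols_shifted are illustrative names). Note that np.roll returns a new array and leaves the input unchanged.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

left = np.roll(a, -1)
# Same result built by hand: everything after the first element, then the first
left_manual = np.concatenate((a[1:], a[:1]))

# For 2D input, axis selects the dimension to shift along
grid = np.arange(6).reshape(2, 3)
cols_shifted = np.roll(grid, 1, axis=1)

print(left)          # [2. 3. 4. 1.]
print(cols_shifted)
```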