My goal is to fill a 2D array with values from a 1D array that exactly matches the pattern of values in the 2D array. For example:
array_a =
([[nan,nan,0],
[0,nan,0],
[nan,0,0],
[0,0,nan]])
array_b =
([0.324,0.254,0.204,
0.469,0.381,0.292,
0.550])
And I want to get this:
array_c =
([[nan,nan,0.324],
[0.254,nan,0.204],
[nan,0.469,0.381],
[0.292,0.550,nan]])
The number of values that need to be filled in array_a will exactly match the number of values in array_b. The main issue is that I want the nan values to stay in the appropriate positions throughout the array, and I'm not sure how best to do that.
Boolean indexing does the job nicely.
Locate the nan values:
In [229]: mask = np.isnan(array_a)
In [230]: mask
Out[230]:
array([[ True, True, False],
[False, True, False],
[ True, False, False],
[False, False, True]])
A boolean mask applied to the array produces a 1d array:
In [231]: array_a[~mask]
Out[231]: array([0., 0., 0., 0., 0., 0., 0.])
Use that same indexing in a set (assignment) context:
In [232]: array_a[~mask]=array_b
In [233]: array_a[~mask]
Out[233]: array([0.324, 0.254, 0.204, 0.469, 0.381, 0.292, 0.55 ])
In [234]: array_a
Out[234]:
array([[ nan, nan, 0.324],
[0.254, nan, 0.204],
[ nan, 0.469, 0.381],
[0.292, 0.55 , nan]])
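Note that this assignment modifies array_a in place. If the original should be preserved, a minimal variation (my addition, not from the answer above) fills a copy instead:

import numpy as np

array_c = array_a.copy()                # keep array_a intact
array_c[~np.isnan(array_a)] = array_b   # fill non-nan slots in row-major order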
You can also do:
np.place(array_a, array_a == 0, array_b)
array_a
array([[ nan, nan, 0.324],
[0.254, nan, 0.204],
[ nan, 0.469, 0.381],
[0.292, 0.55 , nan]])
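A caveat (my note): the array_a == 0 test works here only because the fill slots are exactly 0. Masking on the nan pattern instead is slightly more robust:

np.place(array_a, ~np.isnan(array_a), array_b)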
This should do the trick, although there might be a pre-written solution or a list comprehension to do the same.
import numpy as np

b_index = 0
array_c = np.zeros(np.array(array_a).shape)
for row_index, row in enumerate(array_a):
    for col_index, col in enumerate(row):
        if not np.isnan(col):
            array_c[row_index, col_index] = array_b[b_index]
            b_index += 1
        else:
            array_c[row_index, col_index] = np.nan
>>> print(array_c)
[[ nan nan 0.324]
[0.254 nan 0.204]
[ nan 0.469 0.381]
[0.292 0.55 nan]]
Related
Suppose I have two arrays, a=np.array([0,0,1,1,1,2]), b=np.array([1,2,4,2,6,5]). Elements in a give the row indices where the values of b should be assigned, and if there are multiple elements for the same row, the values should be assigned in order.
So the result is a 2D array c:
c = np.zeros((3, 4))
counts = {k: 0 for k in range(3)}
for i in range(a.shape[0]):
    c[a[i], counts[a[i]]] = b[i]
    counts[a[i]] += 1
print(c)
Is there a way to use some fancy indexing method in numpy to get such results faster (without a for loop) when these arrays are big?
I had to run your code to actually see what it produced. There are limits to what I can 'run' in my head.
In [230]: c
Out[230]:
array([[1., 2., 0., 0.],
[4., 2., 6., 0.],
[5., 0., 0., 0.]])
In [231]: counts
Out[231]: {0: 2, 1: 3, 2: 1}
Omitting this information may be delaying possible answers. 'vectorization' requires thinking in whole-array terms, which is easiest if I can visualize the result, and look for a pattern.
This looks like a padding problem.
In [260]: u, c = np.unique(a, return_counts=True)
In [261]: u
Out[261]: array([0, 1, 2])
In [262]: c
Out[262]: array([2, 3, 1]) # cf with counts
Working from previous padding questions (e.g. Load data with rows of different sizes into Numpy array), I can construct a mask:
In [263]: mask = np.arange(4)<c[:,None]
In [264]: mask
Out[264]:
array([[ True, True, False, False],
[ True, True, True, False],
[ True, False, False, False]])
and use that to assign the b values to c:
In [265]: c = np.zeros((3,4),int)
In [266]: c[mask] = b
In [267]: c
Out[267]:
array([[1, 2, 0, 0],
[4, 2, 6, 0],
[5, 0, 0, 0]])
Since a is already sorted, we might get the counts faster than with unique. Note that this approach will also have problems if a doesn't contain any values for some row(s); a bincount sketch that handles both points follows.
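A minimal sketch (my addition, not from the original answer), assuming the same a and b as above, that derives the counts with np.bincount; its minlength argument also covers rows that never appear in a:

import numpy as np

a = np.array([0, 0, 1, 1, 1, 2])
b = np.array([1, 2, 4, 2, 6, 5])

# bincount tallies occurrences of each row index; minlength guarantees a
# (possibly zero) count for every row, even rows that never appear in a.
counts = np.bincount(a, minlength=3)    # array([2, 3, 1])

mask = np.arange(4) < counts[:, None]   # same padding mask as above
c = np.zeros((3, 4), int)
c[mask] = b                             # works because a (and thus b) is sorted by row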
I have a large Numpy ndarray, here is a sample of that:
myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray
array([[ 1.01, 9.4 , 0.0 , 6.9 , 5.7 ],
[ 1.9 , 2.6 , nan, 4.7 , -2.45],
[ nan, 0.2 , 0.3 , 4.2 , 15.1 ]])
As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-classify) the values in the array based on multiple if statements. I've read many answers and docs, but all of those I've seen deal with just one or two conditions, which can easily be resolved using np.where, for example.
I have multiple conditions; for the sake of simplicity, let's say I have four (the desired solution should be able to handle more). My conditions are:
if x > 6*y:
    x = 3
elif x < 4*z:
    x = 2
elif x == np.nan:
    x = np.nan  # maybe pass is better?
else:
    x = 0
where x is a value in the array, and y and z are variables that will change between arrays. For example, array #1 will have y=5, z=2; array #2 will have y=0.9, z=0.5; etc. The condition for np.nan just means that if a value is nan, do not alter it; keep it nan.
Note that the conditions need to be applied together, because if I use several np.where calls one after the other, condition #2 will overwrite condition #1.
I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas if I opt to use a function here, it needs three arguments: the array, and the y and z values.
What would be the most efficient way to achieve my goal?
In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[
...: np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5
If I perform one of your tests on the whole array:
In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
It's the np.nan values that cause this warning.
So let's first identify those nan values (and replace them):
In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]:
array([[False, False, False, False, False],
[False, False, True, False, False],
[ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0 # temp replace the nan with 0
myarray == np.nan does not work; it produces False everywhere.
arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.
In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4, 6.9, 5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01, 0. , 1.9 , 0. , -2.45, 0. , 0.2 , 0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
Since you want everything else to be 0, let's initialize an array of zeros:
In [34]: res = np.zeros_like(arr)
Now apply the three masks:
In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]:
array([[ 2., 3., 2., 3., 3.],
[ 2., 0., nan, 0., 2.],
[nan, 2., 2., 0., 3.]])
I could have applied the masks to arr:
In [40]: arr[mask1] = 3 # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]:
array([[2. , 3. , 2. , 3. , 3. ],
[2. , 2.6, nan, 4.7, 2. ],
[nan, 2. , 2. , 4.2, 3. ]])
I still have to use some logic to combine the masks to identify the slots that are supposed to be 0.
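One way to express that combination (my addition, not from the original answer) is np.select, which evaluates an ordered list of conditions in a single call:

import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])
y, z = 0.9, 0.5

mask0 = np.isnan(myarray)
arr = np.nan_to_num(myarray)   # replace nan with 0 so the comparisons are safe

# Conditions are tried in order, so listing the nan mask first lets nan win;
# anything matching no condition falls through to the default 0.
res = np.select([mask0, arr > 6*y, arr < 4*z], [np.nan, 3, 2], default=0)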
I have a 3 dimensional numpy array with shape (x,y,R). For each (x,y) pair, I have a 1D numpy array of R values. I want to set the entire array to nan if any of the R values are nan or zero. I tried something like
# 3d np array is called: data
mask1 = (data==0).any(axis=2)
mask2 = (data==np.nan).any(axis=2)
data[np.logical_or(mask1, mask2)] = np.nan
But this doesn't seem to work. I think the problem is the way I am trying to subset the numpy array with the lower-dimensional boolean array, but I'm not quite sure how to solve this.
Some example data:
y = np.random.random(size=(2,2,3))
y[0,0,2] = np.nan
y[0,1,0] = np.nan
y[0,0,1] = np.nan
y[1,1,2] = 0.
so that:
y[0,0,:]
array([0.092718, nan, nan])
y[0,1,:]
array([ nan, 0.00243745, nan])
y[1,0,:]
array([0.5282173 , 0.7548559 , 0.08869139])
y[1,1,:]
array([0.19612415, 0.16969036, 0.0])
and the desired result:
y[0,0,:]
array([nan, nan, nan])
y[0,1,:]
array([nan, nan, nan])
y[1,0,:]
array([0.5282173 , 0.7548559 , 0.08869139])
y[1,1,:]
array([nan, nan, nan])
update
this seems to work, but perhaps there are more elegant approaches:
mask1 = (y==0).any(axis=2)
y[np.logical_or(np.sum(np.isnan(y), axis=2) > 0, mask1)] = np.nan
y
array([[[ nan, nan, nan],
[ nan, nan, nan]],
[[0.5282173 , 0.7548559 , 0.08869139],
[ nan, nan, nan]]])
nan has the peculiar property of comparing not equal to anything, including nan itself:
>>> y = np.random.random(size=(2,2,3))
>>> y[0,0,2] = np.nan
>>> y[0,1,0] = np.nan
>>> y[0,0,1] = np.nan
>>> y[0,1,2] = np.nan
>>>
>>> y
array([[[0.03161193, nan, nan],
[ nan, 0.55789282, nan]],
[[0.78047397, 0.06949872, 0.65225197],
[0.84801579, 0.11298244, 0.07627531]]])
>>>
>>> y == np.nan
array([[[False, False, False],
[False, False, False]],
[[False, False, False],
[False, False, False]]])
To check for nan you have to use np.isnan:
>>> np.isnan(y)
array([[[False, True, True],
[ True, False, True]],
[[False, False, False],
[False, False, False]]])
With this little modification your code will actually work:
>>> mask1 = (y==0).any(axis=2)
>>> mask2 = np.isnan(y).any(axis=2)
>>> y[np.logical_or(mask1, mask2)] = np.nan
>>>
>>> y
array([[[ nan, nan, nan],
[ nan, nan, nan]],
[[0.78047397, 0.06949872, 0.65225197],
[0.84801579, 0.11298244, 0.07627531]]])
As an addendum to @PaulPanzer's answer, I have attempted to get the same result with the minimum number of temp arrays. This answer is here for fun, and does not provide any benefits to outweigh the clarity and legibility of PaulPanzer's answer.
Instead of ndarray.any, you can check for zeros directly with ndarray.all and negate the resulting 2D array in place rather than a 3D one, avoiding a temp array. For the nan check, you can use the property that adding (or subtracting, multiplying, dividing, etc.) any number to nan results in nan, so ufunc.reduce with np.add collapses the array to a 2D matrix, saving another 3D boolean array. You can't use np.isnan with reduce directly, because it is a unary function and reduce requires a binary one.
# Check for zeros
mask = y.all(axis=2)                    # straight to 2D, no temp arrays
mask = np.logical_not(mask, out=mask)   # in-place negation, no temp arrays
# Check for nans
nans = np.add.reduce(y, axis=2)         # 2D temp array, not 3D
mask |= np.isnan(nans)                  # another temp array, also 2D
# Finally, apply the combined mask as in the answer above
y[mask] = np.nan
I chose to use np.add because it is not likely to run into problems that cause false nans to appear (unlike say np.divide). Any overflows will become +/-inf, which will not trigger the isnan check.
aa = np.array([2.0, np.NaN])
aa[aa>1.0] = np.NaN
On running the code above, I get the following warning. I understand the reason for it, but how can I avoid it?
RuntimeWarning: invalid value encountered in greater
Store the indices of the valid ones (non-NaNs). First off, we will use these indices to index into the array and perform the comparison to get a mask, and then index into those indices again with that mask to retrieve the indices in their original order. Using the original-ordered indices, we can then assign elements in the input array to NaN.
Thus, an implementation/solution would be -
idx = np.flatnonzero(~np.isnan(aa))
aa[idx[aa[idx] > 1.0]] = np.nan
Sample run -
In [106]: aa # Input array with NaNs
Out[106]: array([ 0., 3., nan, 0., 9., 6., 6., nan, 18., 6.])
In [107]: idx = np.flatnonzero(~np.isnan(aa)) # Store valid indices
In [108]: idx
Out[108]: array([0, 1, 3, 4, 5, 6, 8, 9])
In [109]: aa[idx[aa[idx] > 1.0]] = np.nan # Do the assignment
In [110]: aa # Verify
Out[110]: array([ 0., nan, nan, 0., nan, nan, nan, nan, nan, nan])
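Alternatively (my note, not part of the answer above), the warning can be suppressed locally with np.errstate, which silences floating-point warnings only inside the with block:

import numpy as np

aa = np.array([2.0, np.nan])

# Suppress the 'invalid value encountered in greater' warning locally.
with np.errstate(invalid='ignore'):
    aa[aa > 1.0] = np.nan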
I want to create a Numpy array from a normal array and convert nan values to None, but whether this succeeds depends on whether the first value is a "normal" float or a float('nan').
Here is my code, starting with the initial array:
print(a)
array('d', [3.2345, nan, 2.0, 3.2, 1.0, 3.0])
print(b)
array('d', [nan, nan, 2.0, 3.2, 1.0, 3.0])
Now I would like to swap all nan values to Python None via a vectorized function:
def convert(x):
    if x != x:
        return None
    else:
        return x

convert_vec = numpy.vectorize(convert)
Simple, but leads to two different results:
numpy.asarray(convert_vec(a))
array([[ 3.2345, 2. , 1. ], [ nan, 3.2 , 3. ]])
numpy.asarray(convert_vec(b))
array([[None, 2.0, 1.0], [None, 3.2, 3.0]], dtype=object)
Why is this? Yes, I can see a small difference: the second one has object as its dtype. Using numpy.asarray(convert_vec(a), dtype=object) fixes that, so both have object dtype, but it doesn't change the difference in the results.
np.nan is a float value, None is not numeric.
In [464]: np.array([1,2,np.nan,3])
Out[464]: array([ 1., 2., nan, 3.])
In [465]: np.array([1,2,None,3])
Out[465]: array([1, 2, None, 3], dtype=object)
In [466]: np.array([1,2,None,3],dtype=float)
Out[466]: array([ 1., 2., nan, 3.])
If you try to create an array that contains None, the result will be a dtype=object array. If you insist on a float dtype, the None will be converted to nan.
In the vectorize case, if you don't specify the return dtype, it deduces it from the first element.
Your examples are a bit confusing (you need to edit them), but I think that
convert(np.nan) => None
convert(123) => 123
so
convert_vec([123,nan,...]) => [123, nan, ...],dtype=float
convert_vec([nan,123,...]) => [None, 123,...],dtype=object
Trying to convert np.nan to None is a bad idea, except maybe for display purposes. vectorize without an explicit result dtype specification is also a bad idea, and this probably isn't a good use of vectorize anyway.
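For completeness, a small sketch (my addition) of the explicit-dtype fix: vectorize's otypes parameter pins the result dtype so it no longer depends on the first element:

import numpy as np

def convert(x):
    return None if x != x else x

# otypes fixes the output dtype instead of deducing it from the first result
convert_vec = np.vectorize(convert, otypes=[object])

print(convert_vec([3.2345, np.nan, 2.0]))   # [3.2345 None 2.0] (object dtype)
print(convert_vec([np.nan, 2.0, 3.2345]))   # [None 2.0 3.2345] (same dtype regardless of order)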
Here's an alternative way of converting the nan values:
In [467]: a=np.array([1,2,np.nan,34,np.nan],float)
In [468]: a
Out[468]: array([ 1., 2., nan, 34., nan])
In [471]: ind=a!=a
In [472]: ind
Out[472]: array([False, False, True, False, True], dtype=bool)
In [473]: a[ind]=0 # not trying None
In [474]: a
Out[474]: array([ 1., 2., 0., 34., 0.])
Or using masked arrays:
In [477]: am=np.ma.masked_invalid(a)
In [478]: am
Out[478]:
masked_array(data = [1.0 2.0 -- 34.0 --],
mask = [False False True False True],
fill_value = 1e+20)
In [479]: am.filled(0)
Out[479]: array([ 1., 2., 0., 34., 0.])
hpaulj has explained it well; here is an easy demonstration of how to do it:
a = [3.2345, numpy.nan, 2.0, 3.2, 1.0, 3.0]
print([i if i is not numpy.nan else None for i in a])
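One caveat (my note): the identity test i is not numpy.nan only works because the list holds the numpy.nan object itself; a nan produced elsewhere, e.g. float('nan'), would slip through. A more robust sketch uses math.isnan:

import math

a = [3.2345, float('nan'), 2.0, 3.2, 1.0, 3.0]

# math.isnan matches any float nan, not just the numpy.nan singleton
print([None if isinstance(i, float) and math.isnan(i) else i for i in a])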