how do I null certain values in numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met... b is a numpy array populated with 0 and 1 values, c is another fully populated numpy array. All arrays are 71x71x166
a = np.empty((71, 71, 166))
d = np.empty((71, 71, 166))
for indexes, value in np.ndenumerate(b):
    i, j, k = indexes
    a[i, j, k] = np.where(b[i, j, k] == 1, c[i, j, k], d[i, j, k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else, but without changing its shape
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array by applying a cutoff to the float array, basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between
What I want to end up with is an array with shape (71,71,166) which has the average values in a particular array direction (assuming vertical direction, if you think of a 3D array as a 3D cube) of a certain "bin"...
so I was trying to loop through the "bins" (b == 1, b == 2, etc.), sampling the float array where that condition is met but null elsewhere, so I can take the average and then recombine everything into one array at the end of the loop...
Not sure if I'm making myself understood. I'm using np.where with explicit indexing because I keep getting errors when I try to do it without, although it feels very inefficient.
Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.randint(0, 2, (4, 3))  # random_integers is deprecated
data[mask==0] = np.NaN
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b!=1] = np.NaN
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b != 1] = np.NaN
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b==0] = np.NaN
for bin in bins:
    a[b == bin] = np.mean(c[b == bin])
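As a minimal runnable sketch of that loop, here is the same idea on small hypothetical arrays (the shapes and bin labels are made up for illustration):

```python
import numpy as np

# Hypothetical small example: c holds float data, b holds integer bin labels
# (0 = outside any bin), mirroring the setup described in the question.
rng = np.random.default_rng(0)
c = rng.random((4, 4, 6))
b = rng.integers(0, 3, size=(4, 4, 6))  # bins 1 and 2, with 0 in between

a = np.empty(c.shape)
a[b == 0] = np.nan                # null out everything outside the bins
for bin_label in (1, 2):
    a[b == bin_label] = np.mean(c[b == bin_label])

# Every cell of bin 1 now holds bin 1's average, and likewise for bin 2;
# cells outside any bin are NaN, and the shape is unchanged.
```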
np.empty sometimes fills the array with 0s; the contents of an empty() array are undefined, so 0 is perfectly valid. For example, try this instead:
d = np.nan * np.empty((71, 71, 166))
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b == 0, c)
which seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).
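For a concrete sketch of the masked-array route (small made-up arrays, assuming you want to keep the cells where b == 1):

```python
import numpy as np

# b flags the cells to keep; c holds the data (illustrative values).
b = np.array([[0, 1], [1, 0]])
c = np.array([[1.0, 2.0], [3.0, 4.0]])

# masked_where masks entries where the condition is True, so mask
# where b == 0 to keep only the cells flagged with 1.
a = np.ma.masked_where(b == 0, c)

print(a.mean())          # averages only the unmasked (b == 1) cells
print(a.filled(np.nan))  # back to a plain array, NaN in the masked cells
```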
Related
I have three 2-D numpy arrays with shape as (3,7).
I want to take the (0,0) element from each of the array, pass these values in a function and store the returned value at the (0,0) index in a new 2-D array.
Then I want to take (0,1) element from each of the array, pass these values to the same function and store the returned value at the (0,1) index of the same new array.
I want to run this for all the columns and then move on to the next row and continue till the end of the array.
The catch here is that I don't want to use loops, just the numpy methods. Been struggling a lot on this lately. Any ideas would be of great help.
Thanks!
I am running a loop like this for now. It gives me back the result for each element in the 1st row only. Here a, b and c are the three 2-D arrays that I mentioned earlier.
def row_total(a, b, c):  # the original snippet had `def(a, b, c):`, with no name
    count = 0
    for i in range(0, 7):
        # the original mixed up its indexing (c[:1,:][i][0]); this is the
        # first-row computation it was aiming for
        count += -c[0, i] - ((a[0, i] - b[0, i]) / c[0, i])**2
    return count
Since all three arrays are the same shape and you're operating on each element in the same way, you can easily translate this to vectorised NumPy operations like so:
# res is a 2-D array of the same shape as a, b and c
res = -c - ((a - b) / c)**2
It looks like in your example code you're trying to sum each row, so you can do this after performing the operations:
count = np.sum(res, axis=1)
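A quick self-contained check, using illustrative (3, 7) arrays matching the shapes in the question, that the vectorised expression reproduces the element-wise formula:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((3, 7))
b = rng.random((3, 7))
c = rng.random((3, 7)) + 0.5   # keep c away from zero to avoid division blow-ups

res = -c - ((a - b) / c) ** 2
count = res.sum(axis=1)        # one row-sum per row, as in the original loop

# Element-by-element loop for comparison
expected = np.empty((3, 7))
for i in range(3):
    for j in range(7):
        expected[i, j] = -c[i, j] - ((a[i, j] - b[i, j]) / c[i, j]) ** 2

assert np.allclose(res, expected)
```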
I am trying to set the values of array elements to 0 in a numpy array +/- 50 from a certain index. I have a numpy array named proc_ranges and I am using numpy.put() to do this:
proc_ranges = numpy.put(proc_ranges,[closest_point_index-50:closest_point_index+50], 0)
I am getting a syntax error at the ":", but this seems like the correct way to do this according to the syntax outlined here
You could just do this instead
proc_ranges[closest_point_index-50:closest_point_index+50] = 0
Yeah, you just need to make sure that the values closest_point_index-50 and closest_point_index+50 do not go outside the array bounds (a negative start index will wrap around to the end of the array).
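A short sketch of that bounds issue, with a hypothetical array and index: Python slices already clamp a too-large upper end, but a negative lower bound wraps around, so clamp it to 0 explicitly.

```python
import numpy as np

proc_ranges = np.ones(200)
closest_point_index = 30   # hypothetical index near the start of the array

lo = max(closest_point_index - 50, 0)   # avoid a negative start wrapping around
hi = closest_point_index + 50           # slicing past the end is harmless
proc_ranges[lo:hi] = 0
```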
I have netcdf data that is masked. The data is in (time, latitude, longitude). I would like to make an array with the same size as the original data but with zeros where the data is masked and ones where it is not masked. So far I have tried to write this function:
def find_unmasked_values(data):
    empty = np.ones((len(data), len(data[0]), len(data[0, 0])))
    for k in range(0, len(data[0, 0]), 1):      # third coordinate
        for j in range(0, len(data[0]), 1):     # second coordinate
            for i in range(0, len(data), 1):    # first coordinate
                if ma.is_mask(data[i, j, k]) is True:
                    empty[i, j, k] = 0
    return empty
But this only returns an array of ones and no zeros, even though there are masked values in the data. If you also have suggestions on how to make the code more efficient I would be very happy.
Thanks,
Keep it simple! There is no need for all the manual loops, which will make your approach very slow for large data sets. A small example with some other data (where thl is a masked variable):
import netCDF4 as nc4
nc = nc4.Dataset('bomex_qlcore_0000000.nc')
var = nc['default']['thl'][:]
mask_1 = var.mask # masked=True, not masked=False
mask_2 = ~var.mask # masked=False, not masked=True
# What you need:
int_mask = mask_2.astype(int) # masked=0, not masked=1
p.s.: some other notes:
Instead of len(array), len(array[0]), et cetera, you can directly get the shape of your array with array.shape, which returns a tuple with the array dimensions.
If you want to create a new array with the same dimensions as another one, just use empty = np.ones_like(data) (or np.zeros_like(data) if you want an array of zeros).
ma.is_mask() already returns a bool; no need to compare it with True.
Don't confuse is with ==: Is there a difference between "==" and "is"?
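One extra subtlety worth knowing, shown in a small standalone sketch with made-up data: when nothing is masked, the .mask attribute can be the scalar False rather than a boolean array. ma.getmaskarray() always returns a full boolean array, so the 0/1 conversion works in both cases.

```python
import numpy as np
import numpy.ma as ma

data = ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
                       mask=[[True, False], [False, False]])

# getmaskarray returns a boolean array even when data.mask is the scalar False
int_mask = (~ma.getmaskarray(data)).astype(int)  # masked=0, not masked=1
print(int_mask)
# [[0 1]
#  [1 1]]
```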
I have a numpy array with some random numbers, how can I create a new array with the same size and fill it with a single value?
I have the following code:
A = np.array([[2, 2],
              [2, 2]])
B = np.copy(A)
B = B.fill(1)
I want to have a new array B with the same size as A but filled with 1s. However, it returns a None object. Same when using np.full.
You can use np.full_like:
B = np.full_like(A, 1)
This will create an array with the same properties as A and will fill it with 1.
In case you want to fill it with 1, there is also a convenience function: np.ones_like
B = np.ones_like(A)
Your example does not work because B.fill does not return anything; it works "in-place". So you fill your B but immediately overwrite the variable B with the None returned by fill. It would work if you use it like this:
A=np.array([[2,2], [2,2]])
B=np.copy(A)
B.fill(1)
I was able to optimise some operations in my program quite a bit using numpy. When I profile a run, I noticed that most of the time is spent in numpy.nan_to_num. I'd like to improve this even further.
The sort of calculations occurring are multiplication of two arrays for which one of the arrays could contain nan values. I want these to be treated as zeros, but I can't initialise the array with zeros, as nan has a meaning later on and can't be set to 0. Is there a way of doing multiplications (and additions) with nan being treated as zero?
From the nan_to_num docstring, I can see a new array is produced which may explain why it's taking so long.
Replace nan with zero and inf with finite numbers.
Returns an array or scalar replacing Not a Number (NaN) with zero,...
A function like nansum for arbitrary arithmetic operations would be great.
Here's some example data:
import numpy as np
a = np.random.rand(1000, 1000)
a[a < 0.1] = np.nan # set some random values to nan
b = np.ones_like(a)
One option is to use np.where to set the value of the result to 0 wherever one of your arrays is equal to NaN:
result = np.where(np.isnan(a), 0, a * b)
If you have to do several operations on an array that contains NaNs, you might consider using masked arrays, which provide a more general method for dealing with missing or invalid values:
masked_a = np.ma.masked_invalid(a)
result2 = masked_a * b
Here, result2 is another np.ma.masked_array whose .mask attribute is set according to where the NaN values were in a. To convert this back to a normal np.ndarray with the masked values replaced by 0s, you can use the .filled() method, passing in the fill value of your choice:
result_filled = result2.filled(0)
assert np.all(result_filled == result)
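Another option worth sketching, if allocating the intermediate a * b bothers you: NumPy ufuncs accept a where= mask together with an out= array, so the NaN cells are simply never computed or touched (the example data below mirrors the setup above).

```python
import numpy as np

a = np.random.rand(1000, 1000)
a[a < 0.1] = np.nan   # set some random values to nan
b = np.ones_like(a)

# Cells where the mask is False keep the initial value from `out` (here 0),
# so the NaN entries never enter the multiplication at all.
result = np.multiply(a, b, out=np.zeros_like(a), where=~np.isnan(a))
```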