Getting all rows where a complex condition holds in SciPy/NumPy - python

What is the simplest way to get all rows where a complex condition holds for an ndarray that represents a 2D matrix? For example, get all rows where all the values are above 5 or all the values are below 5.
Thanks.

You probably know that boolean arrays can be used for indexing, e.g.:
import numpy as np
x = np.arange(10)
x2 = x[x<5]
For a boolean array, np.all lets you apply it across a given axis:
y = np.arange(12).reshape(3,4)
b = y < 6
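b1 = np.all(b, axis=0)  # one flag per column, shape (4,)
b2 = np.all(b, axis=1)  # one flag per row, shape (3,)
y1 = y[:, b1]           # columns whose values are all < 6 (y[b1] would raise IndexError)
y2 = y[b2]              # rows whose values are all < 6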
This only returns the rows (or columns) that meet the criterion, so the shape is not preserved. (If you do need to preserve the shape, take a look at masked arrays.)
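As a rough sketch of the masked-array route mentioned above (the variable names `row_ok` and `y_masked` are made up for illustration), you can mask the rows that fail the test instead of dropping them, which keeps the original shape:

```python
import numpy as np

y = np.arange(12).reshape(3, 4)
row_ok = np.all(y < 6, axis=1)      # one flag per row, shape (3,)

# Build a full-size mask: True marks elements to hide.
# Every element of a failing row is masked, so the shape stays (3, 4).
mask = np.zeros(y.shape, dtype=bool)
mask[~row_ok, :] = True

y_masked = np.ma.array(y, mask=mask)

print(y_masked.shape)                  # (3, 4)
print(y_masked.compressed().tolist())  # [0, 1, 2, 3]
```

Only the first row passes here, so `compressed()` returns just its four values while `y_masked` itself keeps all three rows.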

This gives you the indices of the rows where all values are below 5 or all values are above 5:
import numpy
x = numpy.arange(100).reshape(20,5)
numpy.where((x > 5).all(axis=1) ^ (x < 5).all(axis=1))
A more concise expression exists, but it does not follow the same logic: it only checks that no value in the row equals 5, so it also selects rows that mix values below and above 5:
numpy.where(((x > 5) ^ (x < 5)).all(axis=1))
To fetch the data rather than the indices, index with the boolean result directly:
x[(x > 5).all(axis=1) ^ (x < 5).all(axis=1)]
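To see where the two expressions part ways, here is a small sketch on made-up data (the array and the names `uniform` and `no_five` are assumptions for illustration, not from the answer):

```python
import numpy as np

x = np.array([[0, 1, 2],    # all below 5
              [6, 7, 8],    # all above 5
              [4, 6, 7],    # mixed, but contains no 5
              [5, 6, 7]])   # contains a 5

# Rows that are entirely above 5 or entirely below 5.
uniform = (x > 5).all(axis=1) | (x < 5).all(axis=1)

# The concise form: (x > 5) ^ (x < 5) is just x != 5 element-wise.
no_five = ((x > 5) ^ (x < 5)).all(axis=1)

print(uniform.tolist())     # [True, True, False, False]
print(no_five.tolist())     # [True, True, True, False]
print(x[uniform].tolist())  # [[0, 1, 2], [6, 7, 8]]
```

The mixed row `[4, 6, 7]` is accepted by the concise form but rejected by the per-row test, so the two agree only when no row straddles 5.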

Related

Iterating a function over an array

Posing the title of this question differently.
I have a function that takes a three-dimensional array and masks certain elements within the array based on specific conditions. See below:
# function for array masking
def masc(arr, z):
    return np.ma.masked_where((arr[:,:,2] <= z+0.05) * (arr[:,:,2] >= z-0.05), arr[:,:,2])
arr is a 3D array and z is a single value.
I now want to iterate this for multiple Z values. Here is an example with 2 z values:
masked_array1_1 = masc(xyz,z1)
masked_array1_2 = masc(xyz,z2)
masked_1 = masked_array1_1.mask + masked_array1_2.mask
masked_array1 = np.ma.array(xyz[:,:,2],mask=masked_1)
masked_array1 gives me exactly what I'm looking for.
I've started to write a for loop to iterate this over a 1D array of Z values:
mask_1 = xyz[:,:,2]
for i in range(Z_all_dim):
    mask_1 += (masc(xyz,Z_all[i]).mask)
masked_array1 = np.ma.array(xyz[:,:,2], mask = mask_1)
Z_all is an array of 7 unique z values. This code does not work (the entire array ends up masked), but I feel like I'm very close. Does anyone see what I'm doing wrong?
Your issue is that before the loop you start with mask_1 = xyz[:,:,2], which is the float data rather than a boolean mask. Adding a boolean array to a float array casts the booleans to 1s and 0s, so unless a sum happens to be exactly zero, the final array is nonzero everywhere, which then causes every value to get masked. Instead, start from the first mask:
mask_1 = masc(xyz, Z_all[0]).mask
for z in Z_all[1:]:
    mask_1 |= masc(xyz, z).mask  # element-wise OR of boolean masks
Or avoid the loop entirely by broadcasting your operations:
# No need to go through `np.ma.masked_where` if
# you're only going to extract the boolean mask
mask = (xyz[...,2,None] <= Z_all + 0.05) & (xyz[...,2,None] >= Z_all - 0.05)
mask = np.any(mask, axis=-1)
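If you want to convince yourself that the loop and the broadcast version agree, a minimal sketch along these lines should do. The data, the seed, the `Z_all` values and the name `tol` are all stand-ins, since the real `xyz` isn't shown in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 10.0, size=(4, 5, 3))             # hypothetical stand-in data
Z_all = np.array([1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 9.5])    # assumed z values
tol = 0.05
xyz[0, 0, 2] = 2.5            # force at least one element to hit a band

z = xyz[..., 2]               # the z layer, shape (4, 5)

# Loop version: start from an all-False boolean mask and OR in each band.
mask_loop = np.zeros(z.shape, dtype=bool)
for z0 in Z_all:
    mask_loop |= (z <= z0 + tol) & (z >= z0 - tol)

# Broadcast version: z[..., None] has shape (4, 5, 1) and Z_all has shape (7,),
# so the comparison yields shape (4, 5, 7); any() collapses the last axis.
mask_bc = ((z[..., None] <= Z_all + tol) & (z[..., None] >= Z_all - tol)).any(axis=-1)

masked = np.ma.array(z, mask=mask_bc)
print(np.array_equal(mask_loop, mask_bc))   # True
```

Starting the loop from `np.zeros(..., dtype=bool)` sidesteps the original problem, because the accumulator is a boolean mask from the first iteration on.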

Numpy - IF element is less than or equal to, then pass

I am trying to sort through a list of values in NUMPY / UPROOT, and I am having trouble with the formatting, as I am new to UPROOT.
The values are in some other list, and I read them one at a time into a variable x.
If a value x is greater than or equal to 5, I want to add it to an array, which is initially empty. If the number is less than 5, I move on to the next number.
Specifically, I need help with how to write the "greater than or equal to" test:
array = []
if x is greater than or equal to 5:
    array.append(x)
else:
    return 0
Thanks everyone!
Using numpy you can do something like:
import numpy as np
# Initialize array
array = np.array([])
# Make some random values for x
x = np.floor(np.random.rand(10)*10)
for i in x:  # Loop through x
    if i >= 5:  # If value is greater than or equal to 5
        array = np.append(array, i)  # add to array
So, "greater than equal to" is just >=
You're using a Python list, which is different from a NumPy array. Either way, the following code should work:
import numpy as np
X = np.random.random(size=10) * 10  # array of x values in [0, 10)
If you want a NumPy array:
arr = X[X >= 5]
If you want a list:
arr = X[X >= 5].tolist()
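For comparison, here is a short sketch of the same filter done on a plain list and on an array. The input values are invented, and `kept_list` and `kept_array` are just illustrative names:

```python
import numpy as np

values = [3, 7, 5, 1, 9, 4]                  # hypothetical input values

# Plain Python: a list comprehension with the >= test.
kept_list = [v for v in values if v >= 5]

# NumPy: convert once, then select with a boolean mask.
arr = np.asarray(values)
kept_array = arr[arr >= 5]

print(kept_list)             # [7, 5, 9]
print(kept_array.tolist())   # [7, 5, 9]
```

Both keep the original order; the mask version avoids growing an array element by element, which `np.append` does by copying on every call.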

Using numpy where on multidimensional array but need to control indexing

I need to modify elements of a 3D array if they exceed some threshold value. The modification is based upon related elements of another array. More concretely:
A_ijk = A_ijk if A_ijk < threshold value
= (B_(i-1)jk + B_ijk) / 2, otherwise
numpy.where provides most of the functionality I need, but I don't know how to iterate over the first index without an explicit loop. The following code does what I want, but uses a loop. Is there a better way? Assume A and B have the same shape.
for i in range(A.shape[0]):
    A[i] = numpy.where(A[i] <= threshold, A[i], (B[i - 1] + B[i]) / 2)
To address the comments below: The first few rows of A are guaranteed to be below threshold. This keeps the i index from looping over to the last entry of A.
You can vectorize your operation by using boolean indexing to replace the elements of A that are above the threshold. A little care has to be taken, since the auxiliary array corresponding to (B[i-1] + B[i])/2 is one element shorter than A along the first dimension, so we have to explicitly skip the first row of A (knowing that its values are all below the threshold, as explained in the question):
import numpy as np
# some dummy data
A = np.random.rand(3,4,5)
B = np.random.rand(3,4,5)
threshold = 0.5
A[0,:] *= threshold # put the first dummy row below the threshold
mask = A[1:] > threshold # to be overwritten, shape (2,4,5)
replace = (B[:-1] + B[1:])/2 # to overwrite elements in A from, shape (2,4,5)
# replace corresponding elements where `mask` is True
A[1:][mask] = replace[mask]
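The same shifted-slice idea also works with a single `np.where` call. The sketch below checks it against the explicit loop on dummy data (seed and shapes are arbitrary; `expected` and `result` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 4, 5))
B = rng.random((3, 4, 5))
threshold = 0.5
A[0] *= threshold     # first slab is guaranteed to be below the threshold

# Reference: the explicit loop, starting at i = 1 so B[i - 1] never wraps around.
expected = A.copy()
for i in range(1, A.shape[0]):
    expected[i] = np.where(A[i] <= threshold, A[i], (B[i - 1] + B[i]) / 2)

# Vectorized: B[:-1] lines up with B[i - 1] and B[1:] with B[i] for i >= 1.
result = A.copy()
result[1:] = np.where(A[1:] <= threshold, A[1:], (B[:-1] + B[1:]) / 2)

print(np.array_equal(result, expected))   # True
```

Writing into copies keeps `A` untouched for the comparison; in real code you could assign to `A[1:]` directly.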
You can use the result of np.where to index into an ndarray directly:
a = np.random.rand(4,3,10)
b = np.zeros(a.shape)
idx = np.where(a < .1)
print(a)
a[idx] = b[idx]
print(a)
If a for loop is needed, however, just convert the indices to flat (raveled) indices and update:
a = np.random.rand(4,3,10)
b = np.zeros(a.shape)
idx = [np.ravel_multi_index(i, a.shape) for i in zip(*np.where(a < .1))]
print(a, idx)
for i in idx:
    a.ravel()[i] = b.ravel()[i]  # ravel() returns a view here because a is contiguous
print(a)

Multiple conditions np.extract

I have an array and want to extract all entries which are in a specific range:
x = np.array([1,2,3,4])
condition = x<=4 and x>1
x_sel = np.extract(condition,x)
But this does not work. I'm getting
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I'm doing the same without the and and checking for example only one condition
x = np.array([1,2,3,4])
condition = x<=4
x_sel = np.extract(condition,x)
everything works...
Of course I could just apply the procedure twice with one condition each, but isn't there a way to do this in one line?
Many thanks in advance
You can use either this:
import numpy as np
x = np.array([1,2,3,4])
condition = (x <= 4) & (x > 1)
x_sel = np.extract(condition,x)
print(x_sel)
# [2 3 4]
Or this without extract:
x_sel = x[(x > 1) & (x <= 4)]
This should work:
import numpy as np
x = np.array([1,2,3,4])
condition = (x<=4) & (x>1)
x_sel = np.extract(condition,x)
Mind the difference between `and` and `&`. See:
Difference between 'and' (boolean) vs. '&' (bitwise) in python. Why difference in behavior with lists vs numpy arrays?
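A quick sketch of what goes wrong with `and` and two ways around it (the names `cond_amp` and `cond_func` are just for illustration):

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# `and` needs a single True/False for its left operand, so Python calls
# bool() on the whole array, which NumPy refuses for more than one element.
try:
    condition = x <= 4 and x > 1
except ValueError as err:
    print("and fails:", err)

# `&` works element-wise. The parentheses matter, because `&` binds
# more tightly than the comparison operators.
cond_amp = (x <= 4) & (x > 1)

# np.logical_and gives the same result without the precedence trap.
cond_func = np.logical_and(x <= 4, x > 1)

print(np.extract(cond_amp, x).tolist())   # [2, 3, 4]
```

Without the parentheses, `x <= 4 & x > 1` is parsed as the chained comparison `x <= (4 & x) > 1`, which raises the same ValueError.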

Calculation between values in numpy array

New to Python and need some help. I have a NumPy array with a shape of (1, 8760), holding a number in each of the 8760 locations. I want to compute a new variable that is 0 wherever the value is between -2 and 2 and keeps the original value everywhere else. Here is what I tried (among many other things), but I probably don't understand the array concept fully.
for x in flow:
    if 2 > x < -2:
        lflow = 0
    else:
        lflow = flow
I get this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
From what I read, those functions give me True or False, but I want to compute new values rather than be told whether something matches. Please help.
Thanks
If your shape is (1, 8760), an array of 8760 elements is assigned to x in your iteration, because the loop iterates over the first axis, which contains a single element of size 8760.
Furthermore, I'd suggest using the "where" function instead of a loop:
import numpy
# create a random array with 100 values in the range [-5, 5)
a = numpy.random.random(100)*10 - 5
# return an array with all elements between -2 and 2 set to 0
print(numpy.where((a < -2) | (a > 2), a, 0))
NumPy uses boolean masks to select or assign values in arrays in bulk. For example, given the array
A = np.array([-3,-1,-2,0,1,5,2])
this mask marks all the values in A that lie between -2 and 2 (bounds included):
mask = (A >= -2) & (A <= 2)
Then use it to set those values to 0:
A[mask] = 0
This is much faster than a regular Python loop, since NumPy performs the operation in compiled code.
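Applied to an array shaped like the one in the question, a minimal sketch could look like this. The random data stands in for the real `flow`, and treating the bounds -2 and 2 as included is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
flow = rng.uniform(-5.0, 5.0, size=(1, 8760))   # stand-in for the real data

# Values between -2 and 2 become 0; everything else is copied unchanged.
# np.where builds a new array, so `flow` itself is not modified.
lflow = np.where((flow >= -2) & (flow <= 2), 0.0, flow)

print(lflow.shape)   # (1, 8760)
```

Note that the original test `2 > x < -2` is a chained comparison meaning `2 > x and x < -2`, which is why it fails on arrays; the two comparisons have to be combined with `&` or `|` instead.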
