Extracting specific columns in numpy array by condition - python

I have a homework assignment to extract a 2-dimensional numpy array out of another 2-dimensional np array by choosing specific columns by condition (not by range).
So I have an array A with shape (3, 50000). I am trying to get a new array with shape (3, x) for some x < 50000, containing the original columns of A whose third cell z satisfies -0.4 < z < 0.1.
For example if:
A = [[1,2,3],[2,0.5,0],[9,-2,-0.2],[0,0,0.5]]
I wish to have back:
B = [[2,0.5,0],[9,-2,-0.2]]
I have tried to make a rank-1 boolean array that holds True for the columns I want, and to somehow combine the two. The problem is that the output is a rank-1 array, which is not what I am looking for, and I got some ValueErrors.
bool_idx = (-0.4 < x_y_z[2] < 0.1)
This code raised an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I can do it with loops, but NumPy has so many beautiful functions that I am sure I am missing something here.

In Python, the expression -0.4 < x_y_z[2] < 0.1 is roughly equivalent to -0.4 < x_y_z[2] and x_y_z[2] < 0.1. The and operator decides the truth value of each part of the expression by converting it into a bool. Unlike Python lists and tuples, numpy arrays with more than one element do not support that conversion, hence the ValueError.
The correct way to specify the condition is with bitwise & (which is unambiguous and non-short-circuiting), rather than the implicit and (which short circuits and is ambiguous in this case):
condition = (x_y_z[2, :] > -0.4) & (x_y_z[2, :] < 0.1)
condition is a boolean mask that selects the columns you want. You can apply it along the second axis with a simple index:
selection = x_y_z[:, condition]
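To see it end to end, here is a minimal runnable sketch using the values from the question's example, arranged into the (3, N) layout where the third row holds z:
import numpy as np

# the example values laid out as (3, N): row 2 holds the z cells
x_y_z = np.array([[1.0, 2.0, 9.0, 0.0],
                  [2.0, 0.5, -2.0, 0.0],
                  [3.0, 0.0, -0.2, 0.5]])

condition = (x_y_z[2, :] > -0.4) & (x_y_z[2, :] < 0.1)
selection = x_y_z[:, condition]
print(selection)
# [[ 2.   9. ]
#  [ 0.5 -2. ]
#  [ 0.  -0.2]]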

Related

Number of values within a certain range in Python

I have an array A. I want to print the total number of values in the range [1e-11, 1e-7], but I am getting an error. I present the expected output below.
import numpy as np
A = np.array([4.21922009e+002, 4.02356746e+002, 3.96553289e-09,
              3.91811967e-010, 3.88467908e-08, 3.86636300e-010])
B = 1e-11 < A < 1e-07
print(B)
The error is
in <module>
B=1e-11<A<1e-07
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The expected output is
4
The numpy-way is to refactor the interval condition into two subconditions using the & operator:
a = np.array([4.21922009e+002, 4.02356746e+002, 3.96553289e-09,
              3.91811967e-010, 3.88467908e-08, 3.86636300e-010])
mask = (1e-11 < a) & (a < 1e-07)
# if you care about the values of the filtered array
print(a[mask].size)
# or just
print(np.count_nonzero(mask))
You can also sum the mask directly, since each True counts as 1:
B = sum((1e-11 < A) & (A < 1e-07))
print(B)
# Output
4
Your original code can't work with a numpy array: the chained comparison is resolved by Python itself (not numpy), and Python cannot reduce the intermediate comparison array to a single truth value.
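To make the failure and the fix concrete, here is a small runnable sketch with a hypothetical array (the values are illustrative, not from the question):
import numpy as np

A = np.array([1e-12, 1e-9, 1e-8, 1e-6])

# chained comparison expands to `and`, which needs one bool per operand
try:
    B = 1e-11 < A < 1e-07
except ValueError as e:
    print(e)  # truth value of an array ... is ambiguous

# element-wise & builds the mask, which can then be counted
mask = (1e-11 < A) & (A < 1e-07)
print(np.count_nonzero(mask))  # 2 (only 1e-9 and 1e-8 fall in range)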

How to get the average of only positive numbers in a numpy array?

We were given two text files that contain numbers and we need to load them using numpy.loadtxt and then find the mean of only the positive numbers within the text files.
This is what I've tried so far:
A = np.loadtxt('A.txt')
B = np.loadtxt('B.txt')
A_mean = np.mean(np.where(A>0))
B_mean = np.mean(np.where(B>0))
print(f'A={A_mean}')
print(f'B={B_mean}')
I know this is obviously wrong since it appears to be returning the average of the index numbers, not the values. How can I get the average of the actual values?
np.where(A > 0) returns the indices where A > 0 as you noticed. To get the elements of A, use the indices: np.mean(A[np.where(A > 0)]).
But wait, A > 0 is a boolean array that has True in every element that meets the condition and False elsewhere. Such an array is called a "mask", and can be used for indexing as well:
A[A > 0].mean()
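For illustration, a minimal sketch with a small hypothetical array standing in for the data loaded from A.txt; both routes give the same result:
import numpy as np

A = np.array([-3.0, 1.0, -2.0, 5.0, 6.0])  # stand-in for np.loadtxt('A.txt')

# index route: np.where returns positions, which then index A
print(np.mean(A[np.where(A > 0)]))  # 4.0

# mask route: boolean indexing selects the matching values directly
print(A[A > 0].mean())              # 4.0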

downsample big numpy.ndarray on if condition in python

I have a large numpy.ndarray and I need to downsample this array based on the value of one column. My solution works, but is very slow
data_table = data_table[[i for i in range(0, len(data_table)) if data_table[i][7] > 0.2 and data_table[i][7] < 0.75]]
does anybody know what the fastest way is to do this?
Use column slicing to select the relevant column, compare it against the thresholds in a vectorized manner to get a boolean mask of valid rows, and then index into the rows with that mask for the filtered output -
out = data_table[(data_table[:,7] > 0.2) & (data_table[:,7] < 0.75)]
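As a sanity check, a small sketch with a hypothetical table showing that the vectorized mask selects the same rows as the original list comprehension:
import numpy as np

data_table = np.random.rand(1000, 10)  # stand-in for the large array

mask = (data_table[:, 7] > 0.2) & (data_table[:, 7] < 0.75)
out = data_table[mask]

# same rows as the slow list-comprehension filter
slow = data_table[[i for i in range(len(data_table))
                   if 0.2 < data_table[i][7] < 0.75]]
assert np.array_equal(out, slow)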

setting null values in a numpy array

How do I null certain values in a numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met. b is a numpy array populated with 0 and 1 values, and c is another fully populated numpy array. All arrays are 71x71x166.
a = np.empty((71, 71, 166))
d = np.empty((71, 71, 166))
for indexes, value in np.ndenumerate(b):
    i, j, k = indexes
    a[i, j, k] = np.where(b[i, j, k] == 1, c[i, j, k], d[i, j, k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else, but without changing its shape.
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array based on a cutoff applied to the float array, basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between.
What I want to end up with is an array with shape (71,71,166) which holds the average values of a certain "bin" along a particular array direction (say the vertical direction, if you think of a 3D array as a 3D cube)...
So I was trying to loop through the "bins" b == 1, b == 2 etc., sampling the float array where that condition is met but null elsewhere so I can take the average, and then recombine into one array at the end of the loop.
Not sure if I'm making myself understood. I'm using np.where with explicit indexing because I keep getting errors when I try to do it without, although it feels very inefficient.
Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.randint(0, 2, (4, 3))  # random_integers was removed from modern numpy
data[mask == 0] = np.nan
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b != 1] = np.nan
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b != 1] = np.nan
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b == 0] = np.nan
for bin in bins:
    a[b == bin] = np.mean(c[b == bin])
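Here is that idea as a small runnable sketch, with 1-D hypothetical stand-ins for b and c so the result is easy to read:
import numpy as np

b = np.array([0, 1, 1, 2, 2, 2])               # bin labels
c = np.array([9.0, 2.0, 4.0, 1.0, 2.0, 3.0])   # float data
bins = [1, 2]

a = np.empty(c.shape)
a[b == 0] = np.nan
for bin in bins:
    a[b == bin] = np.mean(c[b == bin])
print(a)  # [nan  3.  3.  2.  2.  2.]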
np.empty sometimes fills the array with 0's; it's undefined what the contents of an empty() array are, so 0 is perfectly valid. To get NaN fill values instead, try this:
d = np.nan * np.empty((71, 71, 166))
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b, c)
which seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).
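A quick sketch of both suggestions on small hypothetical arrays (shapes reduced from (71, 71, 166) for readability):
import numpy as np

b = np.array([[1, 0, 1],
              [0, 1, 0]])            # 0/1 condition array
c = np.arange(6.0).reshape(2, 3)     # fully populated values

a = np.where(b, c, np.nan)           # c where b is nonzero, NaN elsewhere
print(a)
# [[ 0. nan  2.]
#  [nan  4. nan]]

m = np.ma.masked_where(b, c)         # masks c where b is nonzero
print(m.mean())                      # 3.0, the mean over the unmasked entries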

Setting values in a numpy arrays indexed by a slice and two boolean arrays

I have two numpy arrays:
a = np.arange(100*100).reshape(100,100)
b = np.random.rand(100, 100)
I also have a tuple of slices to extract a certain piece of the array:
slice_ = (slice(5, 10), slice(5, 10))
I then have a set of boolean indices to select a certain part of this slice:
indices = b[slice_] > 0.5
If I want to set these indices to a different value I can do it easily:
a[slice_][indices] = 42
However, if I create another set of boolean indices to select a specific part of the indexed array:
high_indices = a[slice_][indices] > 700
and then try and set the value of the array at these indices:
a[slice_][indices][high_indices] = 42 # Doesn't do anything!
I thought maybe I needed to AND the two index arrays together, but they are different shape: indices has a shape of (5, 5) and high_indices has a shape of (12,).
I think I've got myself into a terrible muddle here trying to do something relatively simple. How can I index using these two boolean arrays in such a way that I can set the values of the array?
Slicing a numpy array returns a view, but boolean indexing returns a copy of the array. So when you indexed the slice with a boolean index in a[slice_][indices][high_indices], you got back a copy, and the value 42 is assigned to that copy rather than to the array a. You can solve the problem by combining the boolean conditions into a single index:
a[slice_][(a[slice_] > 700) & (b[slice_] > 0.5)] = 42
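To see the view/copy distinction in action, a runnable sketch (using -1 as the sentinel, since 42 already occurs in a = np.arange(...)):
import numpy as np

a = np.arange(100 * 100).reshape(100, 100)
b = np.random.rand(100, 100)
slice_ = (slice(5, 10), slice(5, 10))

# chained boolean indexing assigns into a temporary copy: a is untouched
indices = b[slice_] > 0.5
high_indices = a[slice_][indices] > 700
a[slice_][indices][high_indices] = -1
print((a == -1).any())  # False

# one combined mask applied to the sliced view writes through to a
a[slice_][(a[slice_] > 700) & (b[slice_] > 0.5)] = -1
print((a == -1).any())  # True (for almost any random b)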
