Condition in numpy array - python

I just learned about numpy arrays today and I am quite confused about something. I don't get line 3, as I thought np.array() should have a list inside the parentheses. Can someone explain that line to me? For line 5, I know it is comparing arrays x and y, but can someone explain to me how it works? And what does x[y] mean? Thank you so much.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array(x<4)
print(y)
print(x[y])

What this code snippet does is masking the x array with a condition. x[y] is the masked array, which shows only the elements of x where y is True (in that case where x < 4).
y = np.array(x < 4) contains a redundant np.array call, because x < 4 is already a numpy array (of booleans). That being said, np.array() accepts many kinds of objects, such as lists, tuples and other arrays.
The whole thing should be simply:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print(x[x < 4])
# [1 2 3]
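To check that the comparison already produces an array, and that np.array accepts several kinds of input, here is a minimal sketch (the names mask, from_list, from_tuple and from_array are only illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# The comparison is applied element by element and already returns an ndarray
mask = x < 4
print(type(mask), mask.dtype)  # <class 'numpy.ndarray'> bool
print(mask)                    # [ True  True  True False False]

# np.array accepts lists, tuples and existing arrays alike
from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
from_array = np.array(from_list)  # copies the input array by default
print(np.array_equal(from_list, from_tuple))  # True
print(np.array_equal(from_list, from_array))  # True
```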

Line three produces a boolean array marking the places in x where the value is less than four. You don't need the "np.array" constructor out front, though: y = x<4 will return the same array.
Line five filters the x array, using the values in y that are 'True'. This is often referred to as a 'mask'.
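One way to picture what x[y] does: it keeps x[i] exactly where y[i] is True. The comprehension below spells that out element by element; it is only for intuition and is much slower than the NumPy version. The mask must have the same length as x, otherwise NumPy raises an IndexError.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = x < 4  # boolean mask, same length as x

# Boolean indexing: keep the elements of x where y is True
masked = x[y]

# The same selection written out by hand
by_hand = [xi for xi, keep in zip(x, y) if keep]

print(masked)                      # [1 2 3]
print(masked.tolist() == by_hand)  # True
```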

Related

How to filter two numpy arrays?

Edit: I fixed y so that x,y have the same length
I don't understand much about programming, but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x which have the same index as the remaining values in y. For example, I want to erase the first value of y (25) and accordingly the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays, so doing it "by hand" is basically impossible).
Solution: using @mozway's tip, now that x and y have the same length, the needed code is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
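If y should be filtered too, a minimal sketch is to build the mask once and reuse it for both arrays, so they stay aligned index by index (keep, y_filtered and dropped are illustrative names). If the intent is the opposite, dropping the values above 17, the mask can be inverted with ~.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20, 80, 45])

# One mask, applied to both arrays, keeps x and y aligned
keep = y > 17
x_filtered = x[keep]
y_filtered = y[keep]

print(x_filtered)  # [ 1  2  4  5  8  9 10]
print(y_filtered)  # [25 18 19 30 20 80 45]

# ~keep inverts the mask: the x values whose y is 17 or less
dropped = x[~keep]
print(dropped)     # [3 6 7]
```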
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
Numpy arrays can be indexed by an array (or other iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, x is longer than y, so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
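Related to the np.where attempt in the question: called with only a condition, np.where returns a tuple of index arrays (one per dimension) instead of choosing between two arrays. A small sketch of that form; note that integer indices don't require x and y to have the same length, as long as every index is in range for x:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20])

# Single-argument form: positions where the condition holds (first dimension)
idx = np.where(y > 17)[0]
print(idx)     # [0 1 3 4 7]

# Integer indices select the same elements as the truncated boolean mask
print(x[idx])  # [1 2 4 5 8]
print(np.array_equal(x[idx], x[:len(y)][y > 17]))  # True
```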
As someone with little experience in Numpy specifically, I wrote this answer before seeing @mozway's excellent answer for filtering. My answer works on more generic containers than Numpy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Let's take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
    print(x)
# prints:
# 1
# 2
# 3
Numpy arrays are slightly different: we can still access and iterate over their elements, and we can change an existing value in place (x[0] = 5 works), but their size is fixed once they're created. They have no .append method, and we can't delete a value in place either; functions such as np.append and np.delete return a new array instead of modifying the original.
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor accepts either a list written inline as [...], or a variable that holds a list (x_list = [...] followed by np.array(x_list)), which we're going to take advantage of later.
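A quick way to see how NumPy arrays differ from lists: assigning to an existing element works in place, but the size is fixed, so np.append builds a new array and leaves the original alone. A minimal sketch (arr and bigger are illustrative names):

```python
import numpy as np

arr = np.array([1, 2, 3])

# Changing an existing element works in place
arr[0] = 10
print(arr)  # [10  2  3]

# There is no arr.append; np.append returns a new, larger array
bigger = np.append(arr, 4)
print(arr)     # [10  2  3]  (unchanged)
print(bigger)  # [10  2  3  4]

print(hasattr(arr, "append"))  # False
```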
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
    print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))) gets us the indices to iterate over, and this line finally makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
Then, including the x for each matching y is just a matter of appending the x value at the same index to the x list:
for i in range(min(len(x), len(y))):
    if y[i] > 17:
        filtered_y_list.append(y[i])
        filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
You might want to do that if later code, such as certain numpy functions, expects arrays rather than lists.
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
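As a rough idea of the iterator approach mentioned above, here is a sketch using a generator and zip; filtered_pairs is a hypothetical helper, not part of NumPy. zip stops at the shorter input, which matches the min(len(x), len(y)) loop. Turning the result into arrays at the end still materializes the values.

```python
import numpy as np

def filtered_pairs(xs, ys, threshold):
    """Hypothetical helper: lazily yield (x, y) pairs whose y is above threshold."""
    # zip stops at the shorter input, like range(min(len(x), len(y)))
    for x_val, y_val in zip(xs, ys):
        if y_val > threshold:
            yield x_val, y_val

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([25, 18, 16, 19, 30, 5, 9, 20])

# Materialize once, then split the pairs into two arrays
pairs = list(filtered_pairs(x, y, 17))
filtered_x = np.array([p[0] for p in pairs])
filtered_y = np.array([p[1] for p in pairs])

print(filtered_x)  # [1 2 4 5 8]
print(filtered_y)  # [25 18 19 30 20]
```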

Python - Create Array from List?!

import numpy as np
means = [[2, 2], [8, 3], [3, 6]]
cov = [[1, 0], [0, 1]]
N = 20
X0 = np.random.multivariate_normal(means[0], cov, N)
X1 = np.random.multivariate_normal(means[1], cov, N)
X2 = np.random.multivariate_normal(means[2], cov, N)
X = np.concatenate((X0, X1, X2), axis = 0)
Y = X[np.random.choice(X.shape[0], 3, replace=False)]
A = [X[np.random.choice(X.shape[0], 3, replace=False)]]
B = A[-1]
print(Y), print(type(Y))
print(A), print(type(A))
print(B), print(type(B))
>>>
[[3.58758421 6.83484817]
[9.10469916 4.23009063]
[7.24996633 4.0524614 ]]
<class 'numpy.ndarray'>
[array([[3.22836848, 7.06719777],
[2.33102712, 0.96966102],
[2.06576315, 4.84061538]])]
<class 'list'>
[[3.22836848 7.06719777]
[2.33102712 0.96966102]
[2.06576315 4.84061538]]
<class 'numpy.ndarray'>
Can you help me understand:
What does X[np.random.choice(X.shape[0], 3, replace=False)] mean?
Is np.random.choice() supposed to return a new array?
Why do Y and A return different results?
Is B supposed to return the last element in the list?
Thank you!
The SciPy and NumPy docs referenced in the comments are a good place to look.
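Y is a numpy.ndarray object, and A is a list object. This is due to the [brackets] you have when you create A. The first and only element in A (the list) is an array built the same way as Y (the array), but it is not Y itself: it comes from a separate call to np.random.choice, which is why its values differ from Y's.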
B does return the last element in the list. The last element in the list is the array object.
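I would recommend reading the documentation on numpy.random.choice to find out exactly how the function works. In this instance, np.random.choice(X.shape[0], 3, replace=False) returns a new array of 3 distinct random row indices between 0 and X.shape[0] - 1, and X[...] then picks out those rows of X.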
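A small sketch of what the call returns, using a stand-in X of the same 60×2 shape (the values are made up; only the shape matches the question) and a fixed seed just to make the run repeatable. It also shows that indexing with an integer array gives a new array (a copy), so modifying the selected rows leaves X untouched.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the run is repeatable

# Stand-in for the question's X: 60 rows, 2 columns (values are illustrative)
X = np.arange(120, dtype=float).reshape(60, 2)

# With an integer first argument, choice samples from range(60);
# replace=False makes the 3 indices distinct
idx = np.random.choice(X.shape[0], 3, replace=False)
print(idx.shape)               # (3,)
print(len(set(idx.tolist())))  # 3

# Indexing with an integer array returns a copy, not a view
rows = X[idx]
rows[:] = -1.0
print((X >= 0).all())          # True: X itself is unchanged
```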
Y = X[np.random.choice(X.shape[0], 3, replace=False)]
This line can be thought of like this: choose 3 random rows from X, create a new numpy array containing those rows, and call it Y.
A = [X[np.random.choice(X.shape[0], 3, replace=False)]]
Then, define a regular Python list. This is a list with only one element. That one element is a numpy array of 3 random rows from X. The key concept is that A only has one element; however, that one element happens to be an array, which itself has 3 rows.
B = A[-1]
Finally, you are right that this returns the last element of A and calls it B. From above, we know that A only has one element, an array of 3 rows. Therefore, that array is the last element of the list A.
The major takeaway is that python allows you to have lists of lists, lists of numpy arrays, etc.
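A compact check of the list-versus-array point, using a small stand-in for X (6 rows instead of 60, values made up) and a fixed seed for repeatability. B is the very array object stored inside A, while Y comes from a separate draw.

```python
import numpy as np

np.random.seed(1)  # fixed seed only so the run is repeatable
X = np.arange(12, dtype=float).reshape(6, 2)  # stand-in for the question's X

Y = X[np.random.choice(X.shape[0], 3, replace=False)]
A = [X[np.random.choice(X.shape[0], 3, replace=False)]]  # a list holding one array
B = A[-1]

print(type(Y).__name__, Y.shape)  # ndarray (3, 2)
print(type(A).__name__, len(A))   # list 1
print(B is A[0])                  # True: B is the array stored in the list

# The two choice calls are independent draws, so Y and B usually hold different rows
```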

Splitting an array in Python

How do you split an array in Python based on the number of elements in the array? I'm doing kNN classification and I need to take into account the first k elements of the 2D array.
import numpy as np
x = np.array([1, 2, 4, 4, 6, 7])
print(x[range(0, 4)])
You can also split it up by taking the range of elements that you want to work with. You could store x[range(a, b)] (a and b being the start and end positions) in a variable and work with those particular elements of the array as well. As you can see, the output contains only the selected part of the array:
[1 2 4 4]
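For the "first k elements of a 2D array" case, a sketch with made-up data: basic slicing with data[:k] takes the first k rows and returns a view, while indexing with range(...) selects the same rows but builds a copy.

```python
import numpy as np

# Illustrative 2D data: 5 samples with 2 features each
data = np.array([[1, 2], [4, 4], [6, 7], [8, 9], [3, 5]])
k = 3

# Basic slicing: the first k rows, as a view into data
first_k = data[:k]
print(first_k)
# [[1 2]
#  [4 4]
#  [6 7]]

# Indexing with range(...) selects the same rows but builds a copy
by_range = data[range(0, k)]
print(np.array_equal(first_k, by_range))  # True
print(np.shares_memory(first_k, data))    # True
print(np.shares_memory(by_range, data))   # False
```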
NumPy also has a function for this, numpy.split:
import numpy as np
x = np.arange(9.0)
np.split(x, 3)
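A short sketch of how np.split behaves: an integer N asks for N equal parts (and raises ValueError if that's impossible), while a list of indices splits at those positions, which gives "the first k elements and the rest" directly. np.array_split allows unequal parts.

```python
import numpy as np

x = np.arange(9.0)

# An integer N splits into N equal parts
parts = np.split(x, 3)
print(parts)  # [array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]

# A list of indices splits at those positions: first k elements, then the rest
k = 4
head, tail = np.split(x, [k])
print(head)  # [0. 1. 2. 3.]
print(tail)  # [4. 5. 6. 7. 8.]

# np.split(x, 4) would raise ValueError (9 is not divisible by 4);
# np.array_split accepts it and makes the parts as equal as possible
print([len(p) for p in np.array_split(x, 4)])  # [3, 2, 2, 2]
```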

Python equivalent to Matlab [a,b] = sort(y)

I am pretty new to the Python language and want to know how to do the following:
(1) y = [some vector]
(2) z = [some other vector]
(3) [ynew,indx] = sort(y)
(4) znew = z(indx)
I can do lines 1, 2 and 4, but line 3 is giving me fits. Any suggestions? What I am looking for is not a user-written function but something intrinsic to the language itself.
Thanks
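Using NumPy for line 3, assuming y is a 2-D row vector of shape (1, n) (for a column vector use axis=0; for a plain 1-D array the axis argument can be omitted):
import numpy as np
indx = np.argsort(y, axis=1)
ynew = np.sort(y, axis=1)  # np.sort returns a sorted copy; y.sort() sorts in place and returns None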
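For plain 1-D arrays, a sketch of the whole MATLAB round trip with np.argsort (the example values are made up). kind="stable" keeps tied values in their original order; note that the indices are 0-based, unlike MATLAB's.

```python
import numpy as np

y = np.array([3.0, 1.0, 2.0])
z = np.array([30, 10, 20])

# MATLAB: [ynew, indx] = sort(y)
indx = np.argsort(y, kind="stable")  # positions that would sort y
ynew = y[indx]                       # same values as np.sort(y)

# MATLAB: znew = z(indx)
znew = z[indx]

print(indx)  # [1 2 0]
print(ynew)  # [1. 2. 3.]
print(znew)  # [10 20 30]
```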
I had the same problem in the following form, and the solution provided didn't work for me. I found a solution that works for me and thought I could share it here in case anyone has the same problem:
My goal was to sort x in ascending order and reorder y in the same manner.
import numpy as np
x = np.array([0, 44, 22, 33, 11]) # create an array to sort
y = np.array([0, 11, 22, 33, 44]) # another array to reorder the same way as x
x_sorted = x[x.argsort()] # x sorted in ascending number
y_sorted = y[x.argsort()] # y sorted
The difference is that you don't store the original index positions, but in my case that wasn't a problem, since I only needed to reorder multiple arrays to follow one. x.argsort() already gives the order in which the elements have to be moved, so I think it achieves the same result.
You could try to do something like the following:
import numpy as np
y = [1,3,2]
z = [3,2,1]
indx = [i[0] for i in sorted(enumerate(y), key=lambda x:x[1])]
print(indx)
#convert z to numpy array in order to use np.ix_ function
z = np.asarray(z)
znew = z[np.ix_(indx)]
print(znew)
Results:
#the indx is
[0, 2, 1]
#the znew is
[3 1 2]
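If y and z stay plain Python lists, the same reordering can be sketched without NumPy: sort the indices by the values of y, then read both lists through those indices. Python's sorted is stable, so tied values keep their original order.

```python
y = [1, 3, 2]
z = [3, 2, 1]

# Indices of y in ascending order of value (the MATLAB indx, but 0-based)
indx = sorted(range(len(y)), key=y.__getitem__)
ynew = [y[i] for i in indx]
znew = [z[i] for i in indx]

print(indx)  # [0, 2, 1]
print(ynew)  # [1, 2, 3]
print(znew)  # [3, 1, 2]
```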

numpy array equivalent for += operator

I often do the following:
import numpy as np
def my_generator_fun():
    yield x # some magically generated x
A = []
for x in my_generator_fun():
    A += [x]
A = np.array(A)
Is there a better solution to this which operates on a numpy array from the start and avoids the creation of a standard python list?
Note that the += operator lets you extend an empty, dimensionless list with arrays of arbitrary dimensions, whereas np.append and np.concatenate require arrays of matching dimensions.
Use np.fromiter:
def f(n):
    for j in range(n):
        yield j
>>> np.fromiter(f(5), dtype=np.intp)
array([0, 1, 2, 3, 4])
If you know beforehand the number of items the iterator is going to return, you can speed things up using the count keyword argument:
>>> np.fromiter(f(5), dtype=np.intp, count=5)
array([0, 1, 2, 3, 4])
To get the same array A, do:
A = np.arange(5)
Arrays are not in general meant to be dynamically sized, but you could use numpy.concatenate.
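When the number of items is known in advance, another option is to preallocate the array and fill it in place. The sketch below contrasts that with growing an array through np.concatenate, where each call copies everything built so far. The generator here is a hypothetical stand-in for the one in the question.

```python
import numpy as np

def my_generator_fun(n):
    # Hypothetical stand-in for the question's generator
    for j in range(n):
        yield j * j

n = 5

# Known size: preallocate once, then fill element by element
A = np.empty(n, dtype=np.intp)
for i, value in enumerate(my_generator_fun(n)):
    A[i] = value
print(A)  # [ 0  1  4  9 16]

# Growing with np.concatenate also works, but copies the whole array on every call
B = np.empty(0, dtype=np.intp)
for value in my_generator_fun(n):
    B = np.concatenate((B, [value]))
print(np.array_equal(A, B))  # True
```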
