Query regarding a Numpy exercise in Datacamp - python

Just started learning Python with Datacamp and I ran into a question on Numpy. When doing this problem (it's a standalone question, so it should be easy to understand without any context), I am confused by the instruction: "You can use a little trick here: use np_positions == 'GK' as an index for np_heights".
Nowhere in the code are np_heights and np_positions linked together, so how can this index work? At first I thought I had to concatenate the two vertically, but it turns out that isn't necessary.
Is it because there are only two Numpy arrays and they just happen to have the same number of elements, so Python decides to pair them up automatically? What if I have multiple Numpy arrays with the same number of elements and I use that index: will it be a problem?

The only thing they have in common is their length. Other than that, they are not linked together. The length comes into play when you use boolean indexing.
Consider the following array:
arr = np.array([1, 2, 3])
With boolean values, we can index into this array:
arr[[True, False, True]]
Out: array([1, 3])
This returned the values at positions 0 and 2 (the positions holding True values).
This boolean array may come from anywhere: from a comparison on the same array, or from a different array of the same length.
arr1 = np.array(['a', 'b', 'a', 'c'])
If I do arr1 == 'a' it will do an element-wise comparison and return
arr1 == 'a'
Out: array([ True, False, True, False], dtype=bool)
I can use this in the same array:
arr1[arr1=='a']
Out:
array(['a', 'a'],
      dtype='<U1')
Or in a different array:
arr2 = np.array([2, 5, 1, 7])
arr2[arr1=='a']
Out: array([2, 1])
Note that this is no different from arr2[[True, False, True, False]], so we are not actually using arr1 here. In your example, np_positions == 'GK' will return a boolean array too. Since it has the same length as np_heights, it will select only the positions where the boolean array has True values.
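To make this concrete, here is a small sketch with made-up positions and heights (the actual Datacamp data will differ; only the matching length matters):

```python
import numpy as np

# Made-up stand-ins for the exercise's arrays
np_positions = np.array(['GK', 'M', 'A', 'GK', 'D'])
np_heights = np.array([191, 184, 185, 180, 178])

# The comparison yields a boolean array of the same length as np_positions...
mask = np_positions == 'GK'
print(mask)              # [ True False False  True False]

# ...and indexing np_heights with it keeps only the True positions
print(np_heights[mask])  # [191 180]
```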

Related

Retrieving all numpy array indices where condition

Say I have a numpy array a of elements (which do not repeat), e.g. np.array([1,3,5,2,4]). I would like to retrieve the indices at which a contains the elements [4,2]. Desired output: np.array([3,4]), as these are the indices of the requested elements.
So far, I've tried
np.all(np.array([[1,2,3,4]]).transpose(), axis=1, where=lambda x: x in [1,2])
>>>
array([ True, True, True, True])
But this result does not make sense to me. The elements at indices 2 and 3 (values 3 and 4) should be False.
Perhaps I need to search for one element at a time, but I'd prefer if this operation could be vectorized/fast.
I'd say the function you're looking for is numpy.isin()
arr = np.array([[1,2,3,4]])
print(np.where(np.isin(arr, [1,2])))
Should give the output you're looking for
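Applied to the 1D array from the question (rather than the 2D array above), that combination looks like:

```python
import numpy as np

a = np.array([1, 3, 5, 2, 4])          # the array from the question
idx = np.where(np.isin(a, [4, 2]))[0]  # indices where a holds 4 or 2
print(idx)  # [3 4]
```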

Return True/False for entire array if any value meets mask requirement(s)

I have already tried looking at other similar posts; however, their solutions do not solve this specific issue. Using the answer from this post I get the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", because I define my array differently from theirs: their array has shape (n,), while mine has shape (n,m). The solution from this post does not work either, because it applies to lists. The only method I could think of was this:
When there is at least 1 True in array, then entire array is considered True:
filt = 4
tracktruth = list()
arraytruth = list()
arr1 = np.array([[1,2,4]])
for track in range(0, arr1.size):
    if filt == arr1[0, track]:
        tracktruth.append(True)
    else:
        tracktruth.append(False)
if any(tracktruth):
    arraytruth.append(True)
else:
    arraytruth.append(False)
When there is not a single True in array, then entire array is considered False:
filt = 5
tracktruth = list()
arraytruth = list()
arr1 = np.array([[1,2,4]])
for track in range(0, arr1.size):
    if filt == arr1[0, track]:
        tracktruth.append(True)
    else:
        tracktruth.append(False)
if any(tracktruth):
    arraytruth.append(True)
else:
    arraytruth.append(False)
The reason the second if-else statement is there is because I wish to apply this mask to multiple arrays and ultimately create a master list that describes which arrays are true and which are false in their entirety. However, with a for loop and two if-else statements, I think this would be very slow with larger arrays. What would be a faster way to do this?
This seems overly complicated; you can use boolean indexing to achieve the result without loops:
arr1=np.array([[1,2,4]])
filt=4
arr1==filt
array([[False, False, True]])
np.sum(arr1==filt).astype(bool)
True
With more than one row, you can use a row or column index in the np.sum, or use the axis parameter to sum over rows or columns.
As pointed out in the comments, you can use np.any() instead of the np.sum(...).astype(bool) and it runs in roughly 2/3 the time on the test dataset:
np.any(a==filt, axis=1)
array([ True])
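For the multi-row case mentioned above, the axis parameter gives one True/False per row; a small sketch with made-up data:

```python
import numpy as np

filt = 4
arr = np.array([[1, 2, 4],
                [5, 6, 7]])

# One boolean per row: does the row contain filt anywhere?
row_truth = np.any(arr == filt, axis=1)
print(row_truth)  # [ True False]
```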
You can do this with a list comprehension. I've done it here for one array, but it's easily extended to multiple arrays with a for loop:
filt = 4
arr1 = np.array([[1,2,4]])
print(any([part == filt for part in arr1[0]]))
You can get the arraytruth more generally, with a list comprehension, for arrays of size (n,m):
import numpy as np
filt = 4
a = np.array([[1, 2, 4]])
b = np.array([[1, 2, 3],
              [5, 6, 7]])
array_lists = [a, b]
arraytruth = [True if a[a==filt].size>0 else False for a in array_lists]
print(arraytruth)
This will give you:
[True, False]
[edit] Use numpy's hstack method to flatten several arrays of different lengths into one (pass a list of arrays rather than building a ragged array, which recent numpy rejects, and test membership with == rather than <):
filt = 4
arrs = [np.array([1, 2, 3, 4]), np.array([1, 2, 3])]
print(any(x == filt for x in np.hstack(arrs)))

How to generate a bool 2D arrays from two 1D arrays using numpy

I have two arrays a=[1,2,3,4] and b=[2,3]. I am wondering whether there is an efficient way to construct a boolean 2D array c (a 2D matrix, i.e. a 2*4 matrix) based on element comparisons, i.e. c[0,0] = True iff a[0] == b[0]. The basic way is to iterate through all the elements of a and b, but I think there may be a better way using numpy. I checked the numpy reference but could not find a routine that does exactly that.
Thanks
If I understood the question correctly, you can extend the dimensions of b with np.newaxis/None to form a 2D array and then perform equality check against a, which will bring in broadcasting for a vectorized solution, like so -
b[:,None] == a
Sample run -
In [5]: a
Out[5]: array([1, 2, 3, 4])
In [6]: b
Out[6]: array([2, 3])
In [7]: b[:,None] == a
Out[7]:
array([[False,  True, False, False],
       [False, False,  True, False]], dtype=bool)
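An equivalent spelling, if you prefer ufunc methods over explicit broadcasting, is np.equal.outer, which builds the same outer comparison:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 3])

# c[i, j] = (b[i] == a[j]); same result as b[:, None] == a
c = np.equal.outer(b, a)
print(c)
```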

Equality of copy.copy and copy.deepcopy in python copy module

I am creating a list of numpy arrays and then copying it to another array to keep an original copy. The copy was made with the deepcopy() function. When I compare the two arrays now, the comparison returns False, but everything is fine when I use the copy() function. I understand the difference between copy and deepcopy, but shouldn't the equivalence be the same?
That is:
grid1 = np.empty([3,3], dtype=object)
for i in range(3):
    for j in range(3):
        grid1[i][j] = [i, np.random.uniform(-3.5, 3.5, (3,3))]
grid_init = []
grid_init = copy.deepcopy(grid1)
grid1 == grid_init  # returns False
grid_init = []
grid_init = copy.copy(grid1)
grid1 == grid_init  # returns True
grid_init = []
grid_init = copy.deepcopy(grid1)
np.array_equal(grid1, grid_init)  # returns False
Shall all be not true?
This is what I'm getting when running the first example:
WARNING:py.warnings:/usr/local/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise the error in the future.
To see why the elementwise comparison fails, simply try to compare a single element:
grid_init=copy.deepcopy(grid1)
grid_init[0][0] == grid1[0][0]
>>> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This fails because the second element in the list is in itself a numpy array, and comparison of two numpy arrays does not return a bool (but an array).
Now, why does the example case behave differently?
This seems to be an interpreter optimization which avoids the actual comparison logic if the two objects are the same one. The two are the same object because the copying was shallow.
grid_init=copy.copy(grid1)
grid_init[0][0] is grid1[0][0]
> True
grid_init[0][0] == grid1[0][0]
> True
The root cause is that you're using a numpy array of dtype=object with lists in it. This is not a good idea, and can lead to all sorts of weirdness.
Instead, you should simply create 2 aligned arrays, one for the first element in your lists, and one for the second.
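A minimal sketch of that two-aligned-arrays suggestion (the names and random data here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# One plain int array for the first list element, one float array for the 3x3 matrices
grid_ids = np.arange(3).repeat(3).reshape(3, 3)        # grid_ids[i, j] == i
grid_vals = rng.uniform(-3.5, 3.5, size=(3, 3, 3, 3))  # a 3x3 matrix per cell

# Copies of plain arrays now compare element-wise without object-dtype surprises
print(np.array_equal(grid_ids, grid_ids.copy()))    # True
print(np.array_equal(grid_vals, grid_vals.copy()))  # True
```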
I must be running a different version of numpy/python, but I get slightly different errors and/or results. Still the same issue applies - mixing arrays and lists can produce complicated results.
Make the 2 copies
In [217]: x=copy.copy(grid1)
In [218]: y=copy.deepcopy(grid1)
Equality with the shallow copy gives an element-by-element comparison, a 3x3 boolean array:
In [219]: x==grid1
Out[219]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
The elements are 2 item lists:
In [220]: grid1[0,0]
Out[220]:
[0, array([[ 2.08833787, -0.24595155, -3.15694342],
[-3.05157909, 1.83814619, -0.78387624],
[ 1.70892355, -0.87361521, -0.83255383]])]
And in the shallow copy, the list ids are the same. The two arrays have different data buffers (x is not a view), but they both point to the same list objects (located elsewhere in memory).
In [221]: id(grid1[0,0])
Out[221]: 2958477004
In [222]: id(x[0,0])
Out[222]: 2958477004
With the same id the lists are equal (they also satisfy the is test).
In [234]: grid1[0,0]==x[0,0]
Out[234]: True
But == with the deepcopy produces a simple False. No element by element comparison here. I'm not sure why. Maybe this is an area in which numpy is undergoing development.
In [223]: y==grid1
Out[223]: False
Note that the deepcopy element ids are different:
In [229]: id(y[0,0])
Out[229]: 2957009900
When I try to apply == to an element of these arrays I get an error:
In [235]: grid1[0,0]==y[0,0]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is the error that comes up repeatedly in SO questions, usually because people try to use a boolean array (from a comparison) in a scalar Python context.
I can compare the arrays within the lists:
In [236]: grid1[0,0][1]==y[0,0][1]
Out[236]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
I can reproduce the ValueError with a simpler comparison - 2 lists, which contain an array. On the surface they look the same, but because the arrays have different ids, it fails.
In [239]: [0,np.arange(3)]==[0,np.arange(3)]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This pair of comparisons shows what is going on:
In [242]: [0,np.arange(3)][0]==[0,np.arange(3)][0]
Out[242]: True
In [243]: [0,np.arange(3)][1]==[0,np.arange(3)][1]
Out[243]: array([ True, True, True], dtype=bool)
Python compares the respective elements of the lists, and then tries to combine the results with a logical all(). But it can't perform all() on [True, array([ True, True, True])].
So in my version, y==grid1 returns False because the element by element comparisons return ValueErrors. It's either that or raise an error or warning. They clearly aren't equal.
In sum, with this array of lists of number and array, equality tests end up mixing array operations and list operations. The outcomes are logical, but complicated. You have to be keenly aware of how arrays are compared, and how lists are compared. They are not interchangeable.
A structured array
You could put this data in a structured array, with a dtype like
dt = np.dtype([('f0',int),('f1',float,(3,3))])
In [263]: dt = np.dtype([('f0',int),('f1',float,(3,3))])
In [264]: grid2=np.empty([3,3],dtype=dt)
In [265]: for i in range(3):
   .....:     for j in range(3):
   .....:         grid2[i][j] = (i, np.random.uniform(-3.5, 3.5, (3,3)))
   .....:
In [266]: grid2
Out[266]:
array([[ (0,
[[2.719807845330254, -0.6379512247418969, -0.02567206509563602],
[0.9585030371031278, -1.0042751112999135, -2.7805349057485946],
[-2.244526250770717, 0.5740647379258945, 0.29076071288760574]]),
....]])]],
dtype=[('f0', '<i4'), ('f1', '<f8', (3, 3))])
The first field's integers can be fetched with (giving a 3x3 array):
In [267]: grid2['f0']
Out[267]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
The second field contains 3x3 arrays, which when accessed by field name are a 4d array:
In [269]: grid2['f1'].shape
Out[269]: (3, 3, 3, 3)
A single element is a record (or tuple),
In [270]: grid2[2,1]
Out[270]: (2, [[1.6236266210555836, -2.7383730706629636, -0.46604477485902374], [-2.781740733659544, 0.7822732671353201, 3.0054266762730473], [3.3135671425199824, -2.7466097112667103, -0.15205961855874406]])
Now both kinds of copy produce the same thing:
In [271]: x=copy.copy(grid2)
In [272]: y=copy.deepcopy(grid2)
In [273]: x==grid2
Out[273]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
In [274]: y==grid2
Out[274]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
Since grid2 is pure ndarray (no intermediate lists) I suspect copy.copy and copy.deepcopy end up using grid2.copy(). In numpy we normally use the array copy method, and don't bother with the copy module.
p.s. it appears that with dtype=object, grid1.copy() is the same as copy.copy(grid1) - a new array, but the same object pointers (i.e. same data).

Difference between nonzero(a), where(a) and argwhere(a). When to use which?

In Numpy, nonzero(a), where(a) and argwhere(a), with a being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
On argwhere the documentation says:
np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
Why have a whole function that just transposes the output of nonzero? When would that be so useful that it deserves a separate function?
What about the difference between where(a) and nonzero(a)? Wouldn't they return the exact same result?
nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:
np.where(mask,a,b)
which can be roughly thought of as a numpy "ufunc" version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of a and b).
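A minimal illustration of that three-argument form, with made-up arrays:

```python
import numpy as np

mask = np.array([True, False, True])
a = np.array([10, 20, 30])
b = np.array([-1, -2, -3])

# Element i comes from a where mask[i] is True, else from b
print(np.where(mask, a, b))  # [10 -2 30]
```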
As far as having both nonzero and argwhere goes, they're conceptually different. nonzero is structured to return an object which can be used directly for indexing. This can be lighter-weight than creating an entire boolean mask when the nonzero elements are sparse:
mask = a != 0        # entire array of bools
idx = np.nonzero(a)  # just the indices of the nonzero elements
Now you can use idx to index other arrays, etc. However, in that tuple-of-arrays form it's not very convenient to read off the coordinates of each individual element. That's where argwhere comes in.
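A small illustration of the two return shapes on a made-up array:

```python
import numpy as np

a = np.array([[0, 4, 0],
              [3, 0, 5]])

# nonzero: a tuple of index arrays, ready for fancy indexing
print(np.nonzero(a))     # (array([0, 1, 1]), array([1, 0, 2]))
print(a[np.nonzero(a)])  # [4 3 5]

# argwhere: one (row, col) coordinate pair per nonzero element, as a 3x2 array
print(np.argwhere(a))
```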
I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In its simplest use case, where is indeed the same as nonzero.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))
where differs from nonzero in the case when you wish to pick elements from array a where some condition is True and from array b where that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[6, 4], [300, 400]])
Again, I can't explain why they added the nonzero functionality to where, but this at least explains how the two are different.
EDIT: Fixed the first example... my logic was incorrect previously
