How to do element-wise comparison between two NumPy arrays - python

I have two arrays. I would like to do an element-wise comparison between the two of them to find out which values are the same.
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 2], [1, 4]])
Is there a way for me to compare these two arrays to 1) find out which values are the same and 2) get the index of the same values?
Adding on to the previous question, is there a way for me to return 1 if the values are the same and 0 otherwise?
Thanks in advance!

import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 2], [1, 4]])
# 1) find out which values are the same
a == b
# array([[False,  True],
#        [False,  True]])
# 2) get the index of the same values (no need for "== True")
np.where(a == b)
# (array([0, 1]), array([1, 1]))  -> row indices, column indices
# 3) return 1 if the values are the same and 0 otherwise
(a == b).astype(int)
# array([[0, 1],
#        [0, 1]])
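The two separate index arrays returned by np.where can be awkward to read. A small sketch, using only the a and b above, that pairs them into (row, col) coordinates with np.argwhere and also pulls out the matching values themselves via boolean indexing:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 2], [1, 4]])

mask = a == b               # element-wise comparison, same shape as a and b
pairs = np.argwhere(mask)   # one (row, col) pair per matching position
values = a[mask]            # the values that are equal in both arrays
as_int = mask.astype(int)   # 1 where equal, 0 otherwise

print(pairs.tolist())   # [[0, 1], [1, 1]]
print(values.tolist())  # [2, 4]
```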

Related

Python "in" keyword-function does not work properly on numpy arrays

Why does this piece of code return True when the element [1, 1] is clearly not present in aux, and what should I change to make it return False?
aux = np.asarray([[0, 1], [1, 2], [1, 3]])
np.asarray([1, 1]) in aux
True
Comparing the two arrays with == broadcasts the 1-D array against every row of aux, so == checks whether the elements at corresponding positions are equal.
>>> np.array([1, 1]) == aux
array([[False, True],
[ True, False],
[ True, False]])
Since no row of the result is all True, no row of aux is equal to [1, 1]. We can check for this with:
np.any(np.all(np.array([1, 1]) == aux, axis=1))
You can think of the in operator looping through an iterable and comparing each item for equality with what's being matched. I think what happens here can be demonstrated with the comparison to the first vector in the matrix/list:
>>> np.array([1, 1]) == np.array([0,1])
array([False, True])
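and bool([False, True]) is True in Python, so the in operator immediately returns True. (Strictly speaking, NumPy implements in for arrays as (aux == value).any(), which gives the same True here because at least one element matches.)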
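A minimal sketch contrasting the two behaviours, assuming the aux array from the question: for an ndarray, x in arr acts like (arr == x).any(), which is True if any single element matches, while row membership needs all() along axis 1 before any(). The helper name row_in is hypothetical, introduced only for this illustration.

```python
import numpy as np

aux = np.asarray([[0, 1], [1, 2], [1, 3]])
target = np.asarray([1, 1])

# For NumPy arrays, `target in aux` behaves like (aux == target).any():
# it is True as soon as ANY single element matches after broadcasting.
naive = target in aux
elementwise_any = bool((aux == target).any())

# Hypothetical helper: a row counts only if ALL of its elements match.
def row_in(row, arr):
    return bool((arr == row).all(axis=1).any())

print(naive, elementwise_any)           # True True
print(row_in(target, aux))              # False
print(row_in(np.asarray([1, 2]), aux))  # True
```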

Extract numpy rows by given condition

I have a NumPy array as follows:
import numpy as np
data = np.array([[0,0,0,4],
[3,0,5,0],
[8,9,5,3]])
print (data)
I need to extract only the rows whose first three elements are not all zeros.
The expected result is:
result = np.array([[3,0,5,0],
[8,9,5,3]])
I tried:
res = [l for l in data if l[:3].sum() !=0]
print (res)
This gives the right rows, but I'm looking for a better, NumPy way of doing it.
sum is unreliable if your array can contain negative numbers (for example, [1, -1, 0] sums to 0), but any always works:
result = data[data[:, :3].any(1)]
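To make the negative-number caveat concrete, here is a small sketch that adds one hypothetical row to the question's data whose first three entries are non-zero but cancel out when summed:

```python
import numpy as np

# The question's data plus one hypothetical extra row, [1, -1, 0, 7],
# whose first three entries are non-zero but sum to 0.
data = np.array([[0, 0, 0, 4],
                 [3, 0, 5, 0],
                 [1, -1, 0, 7],
                 [8, 9, 5, 3]])

by_sum = data[data[:, :3].sum(axis=1) != 0]  # drops [1, -1, 0, 7] by mistake
by_any = data[data[:, :3].any(axis=1)]       # keeps every row with a non-zero in columns 0-2

print(by_sum.tolist())  # [[3, 0, 5, 0], [8, 9, 5, 3]]
print(by_any.tolist())  # [[3, 0, 5, 0], [1, -1, 0, 7], [8, 9, 5, 3]]
```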
You say
first three elements are not all zeros
so a solution is
import numpy as np
data = np.array([[0,0,0,4],
[3,0,5,0],
[8,9,5,3]])
data[~np.all(data[:, :3] == 0, axis=1), :]
I'll try to explain how I think about these kinds of problems through my answer.
First step: define a function that returns a boolean indicating whether a given row is a good row.
For that, I use np.any, which checks if any of the entries is "True" (for integers, true is non-zero).
import numpy as np
v1 = np.array([1, 1, 1, 0])
v2 = np.array([0, 0, 0, 1])
good_row = lambda v: np.any(v[:3])
good_row(v1)
Out[28]: True
good_row(v2)
Out[29]: False
Second step: I apply this to all rows and obtain a mask vector. To do so, one can use the axis keyword of np.any: axis=1 reduces along each row (one result per row), while axis=0 reduces along each column.
np.any(data[:, :3], axis=1)
Out[32]: array([False, True, True])
Final step: I combine this mask with boolean indexing to select the rows.
rows_inds = np.any(data[:, :3], axis=1)
data[rows_inds]
Out[37]:
array([[3, 0, 5, 0],
[8, 9, 5, 3]])
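The three steps can be folded into one small function. This is only a sketch: rows_with_nonzero_prefix and its parameter k (how many leading columns to check) are hypothetical names, not part of NumPy.

```python
import numpy as np

def rows_with_nonzero_prefix(data, k):
    """Hypothetical helper: keep rows whose first k entries are not all zero."""
    mask = np.any(data[:, :k], axis=1)  # one bool per row
    return data[mask]

data = np.array([[0, 0, 0, 4],
                 [3, 0, 5, 0],
                 [8, 9, 5, 3]])

print(rows_with_nonzero_prefix(data, 3).tolist())  # [[3, 0, 5, 0], [8, 9, 5, 3]]
print(rows_with_nonzero_prefix(data, 4).shape)     # (3, 4): every row has a non-zero
```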

Extract values that satisfy a condition from numpy array

Say I have the following arrays:
a = np.array([1,1,1,2,2,2])
b = np.array([4,6,1,8,2,1])
Is it possible to do the following:
a[np.where(b>3)[0]]
#array([1, 1, 2])
That is, select values from a at the indices where a condition on b is satisfied, but using only np.where or a similar NumPy function?
In other words, can np.where be used specifying only an array from which to get values when the condition is True? Or is there another numpy function to do this in one step?
Yes, there is a function: numpy.extract(condition, array) returns all values from array that satisfy the condition.
There is not much benefit in using this function over np.where or boolean indexing. All of these approaches create a temporary boolean array that stores the result of b>3. np.where creates an additional index array, while a[b>3] and np.extract use the boolean array directly.
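A short sketch comparing the one-step options on the question's arrays. np.compress is an additional NumPy function not mentioned in the answer; for 1-D input it returns the same values. The three-argument form of np.where is shown for contrast: it keeps the original shape and fills non-matching positions (the fill value -1 is an arbitrary choice).

```python
import numpy as np

a = np.array([1, 1, 1, 2, 2, 2])
b = np.array([4, 6, 1, 8, 2, 1])
cond = b > 3                          # temporary boolean array

via_mask = a[cond]                    # boolean indexing
via_extract = np.extract(cond, a)     # values of a where cond is True
via_compress = np.compress(cond, a)   # same result for 1-D input
via_where = a[np.where(cond)[0]]      # builds an extra index array first

# Three-argument np.where keeps the shape and fills the rest with -1.
filled = np.where(cond, a, -1)

print(via_mask.tolist())  # [1, 1, 2]
print(filled.tolist())    # [1, 1, -1, 2, -1, -1]
```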
Personally, I would use a[b>3] because it is the tersest form.
Just use boolean indexing.
>>> a = np.array([1,1,1,2,2,2])
>>> b = np.array([4,6,1,8,2,1])
>>>
>>> a[b > 3]
array([1, 1, 2])
b > 3 will give you array([True, True, False, True, False, False]) and with a[b > 3] you select all elements from a where the indexing array is True.
Let's use a list comprehension to solve this:
import numpy as np
a = np.array([1,1,1,2,2,2])
b = np.array([4,6,1,8,2,1])
indices = [i for i in range(len(b)) if b[i]>3] # Returns indexes of b where b > 3 - [0, 1, 3]
a[indices]
array([1, 1, 2])
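The list comprehension computes the same indices that NumPy can produce without a Python-level loop. A sketch using np.flatnonzero, which returns the indices of the non-zero (here, True) entries of the flattened input:

```python
import numpy as np

a = np.array([1, 1, 1, 2, 2, 2])
b = np.array([4, 6, 1, 8, 2, 1])

indices_loop = [i for i in range(len(b)) if b[i] > 3]  # Python-level loop
indices_np = np.flatnonzero(b > 3)                      # same indices, vectorized

print(indices_loop)            # [0, 1, 3]
print(indices_np.tolist())     # [0, 1, 3]
print(a[indices_np].tolist())  # [1, 1, 2]
```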

How to generate a bool 2D arrays from two 1D arrays using numpy

I have two arrays a=[1,2,3,4] and b=[2,3]. I am wondering whether there is an efficient way to construct a boolean 2D array c (a 2×4 matrix) based on element comparisons, i.e. c[0,0] = True iff a[0] == b[0]. The basic way is to iterate through all the elements of a and b, but I think there may be a better way using NumPy. I checked the NumPy reference, but could not find a routine that does exactly that.
thanks
If I understood the question correctly, you can extend the dimensions of b with np.newaxis/None to form a 2D array and then perform an equality check against a; broadcasting then gives a vectorized solution, like so:
b[:,None] == a
Sample run -
In [5]: a
Out[5]: array([1, 2, 3, 4])
In [6]: b
Out[6]: array([2, 3])
In [7]: b[:,None] == a
Out[7]:
array([[False, True, False, False],
[False, False, True, False]], dtype=bool)
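The same table can also be built with the outer method that binary ufuncs such as np.equal provide, which computes c[i, j] = (b[i] == a[j]) for every pair. A sketch checking that both forms agree:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 3])

c_broadcast = b[:, None] == a   # (2, 1) against (4,) broadcasts to (2, 4)
c_outer = np.equal.outer(b, a)  # c[i, j] = (b[i] == a[j])

print(c_outer.shape)                         # (2, 4)
print(np.array_equal(c_broadcast, c_outer))  # True
```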

Remove one value from a NumPy array

I am trying to remove all rows that only contain zeros from a NumPy array. For example, I want to remove [0,0] from
n = np.array([[1,2], [0,0], [5,6]])
and be left with:
np.array([[1,2], [5,6]])
To remove the second row from a NumPy array:
import numpy
n = numpy.array([[1,2],[0,0],[5,6]])
new_n = numpy.delete(n, 1, axis=0)
To remove rows containing only 0:
import numpy
n = numpy.array([[1,2],[0,0],[5,6]])
idxs = numpy.any(n != 0, axis=1) # boolean mask of rows with at least one non-zero value
n_non_zero = n[idxs, :] # selection of the wanted rows
If you want to delete any row that only contains zeros, the fastest way I can think of is:
n = numpy.array([[1,2], [0,0], [5,6]])
keep_row = n.any(axis=1) # Index of rows with at least one non-zero value
n_non_zero = n[keep_row] # Rows to keep, only
This runs much faster than Simon's answer, because n.any() stops checking the values of each row as soon as it encounters any non-zero value (in Simon's answer, all the elements of each row are compared to zero first, which results in unnecessary computations).
Here is a generalization of the answer, if you ever need to remove rows that have a specific value (instead of removing only rows that contain only zeros):
n = numpy.array([[1,2], [0,0], [5,6]])
to_be_removed = [0, 0] # Can be any row values: [5, 6], etc.
other_rows = (n != to_be_removed).any(axis=1) # Rows that have at least one element that differs
n_other_rows = n[other_rows] # New array with rows equal to to_be_removed removed.
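By De Morgan's law, "at least one element differs" is the same mask as "not every element matches", so either form can be used. A sketch confirming this, with one hypothetical extra row [0, 7] that shares a single element with to_be_removed and must be kept:

```python
import numpy as np

n = np.array([[1, 2], [0, 0], [5, 6], [0, 7]])  # [0, 7] is a hypothetical extra row
to_be_removed = [0, 0]

differs = (n != to_be_removed).any(axis=1)     # at least one element differs
not_equal = ~(n == to_be_removed).all(axis=1)  # not every element matches

print(np.array_equal(differs, not_equal))  # True
print(n[differs].tolist())                 # [[1, 2], [5, 6], [0, 7]]
```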
Note that this solution is not fully optimized: even if the first element of to_be_removed does not match, the remaining row elements from n are compared to those of to_be_removed (as in Simon's answer).
I'd be curious to know if there is a simple efficient NumPy solution to the more general problem of deleting rows with a specific value.
Using Cython loops might be a fast solution: for each row, element comparison could be stopped as soon as one element from the row differs from the corresponding element in to_be_removed.
You can use numpy.delete to remove specific rows or columns.
For example:
import numpy as np
n = [[1,2], [0,0], [5,6]]
np.delete(n, 1, axis=0)
The output will be:
array([[1, 2],
[5, 6]])
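Note that np.delete does not modify its input in place; it returns a new array. A minimal sketch showing that the original keeps its shape, and that a list of indices deletes several rows at once:

```python
import numpy as np

n = np.array([[1, 2], [0, 0], [5, 6]])

trimmed = np.delete(n, 1, axis=0)         # new array without row 1
both_ends = np.delete(n, [0, 2], axis=0)  # several rows at once

print(trimmed.tolist())    # [[1, 2], [5, 6]]
print(both_ends.tolist())  # [[0, 0]]
print(n.shape)             # (3, 2): the original is unchanged
```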
To delete rows according to their value, you can do this:
>>> n
array([[1, 2],
[0, 0],
[5, 6]])
>>> bl=n==[0,0]
>>> bl
array([[False, False],
[ True, True],
[False, False]], dtype=bool)
>>> bl=np.any(bl,axis=1)
>>> bl
array([False, True, False], dtype=bool)
>>> ind=np.nonzero(bl)[0]
>>> ind
array([1])
>>> np.delete(n,ind,axis=0)
array([[1, 2],
[5, 6]])
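Whether the row-wise reduction uses any or all matters as soon as a row shares only some of its elements with the target. A sketch with one hypothetical extra row [0, 5]: reducing with any would also delete it, while all deletes only exact [0, 0] rows.

```python
import numpy as np

n = np.array([[1, 2], [0, 0], [5, 6], [0, 5]])  # [0, 5] is a hypothetical extra row
matches = n == [0, 0]

rows_any = np.nonzero(matches.any(axis=1))[0]  # [1, 3]: [0, 5] shares one element with [0, 0]
rows_all = np.nonzero(matches.all(axis=1))[0]  # [1]: only the exact [0, 0] row

print(np.delete(n, rows_any, axis=0).tolist())  # [[1, 2], [5, 6]] (drops [0, 5] too)
print(np.delete(n, rows_all, axis=0).tolist())  # [[1, 2], [5, 6], [0, 5]]
```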
