What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to and the elements of this array to determine if the arrays are equal, or is there a simpler way to compare?
(A==B).all()
test if all values of array (A==B) are True.
Note: maybe you also want to test A and B shape, such as A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
this solution can have a strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it return True. For some reason, the comparison A==B returns an empty array, for which the all operator returns True.
Another risk is if A and B don't have the same shape and aren't broadcastable, then this approach will raise an error.
In conclusion, if you have a doubt about A and B shape or simply want to be safe: use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise don't expect that any equality check will beat another, as there is not much room to optimize comparing two elements. Just for the sake, i still did some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in xrange(numOfIterations):
A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
a = time.clock()
res = (A==B).all()
b = time.clock()
exec_time0.append( b - a )
a = time.clock()
res = np.array_equal(A,B)
b = time.clock()
exec_time1.append( b - a )
a = time.clock()
res = np.array_equiv(A,B)
b = time.clock()
exec_time2.append( b - a )
print 'Method: (A==B).all(), ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy methods seem to be faster than the combination of the == operator and the all() method and by comparing the numpy methods the fastest one seems to be the numpy.array_equal method.
Usually two arrays will have some small numeric errors,
You can use numpy.allclose(A,B), instead of (A==B).all(). This returns a bool True/False
Now use np.array_equal. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
You also have similar function such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
Just for the sake of completeness. I will add the
pandas approach for comparing two arrays:
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: In case you are looking of How to
compare Vectors, Arrays or Dataframes in R.
You just you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE
Related
What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to and the elements of this array to determine if the arrays are equal, or is there a simpler way to compare?
(A==B).all()
test if all values of array (A==B) are True.
Note: maybe you also want to test A and B shape, such as A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
this solution can have a strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it return True. For some reason, the comparison A==B returns an empty array, for which the all operator returns True.
Another risk is if A and B don't have the same shape and aren't broadcastable, then this approach will raise an error.
In conclusion, if you have a doubt about A and B shape or simply want to be safe: use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise don't expect that any equality check will beat another, as there is not much room to optimize comparing two elements. Just for the sake, i still did some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in xrange(numOfIterations):
A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
a = time.clock()
res = (A==B).all()
b = time.clock()
exec_time0.append( b - a )
a = time.clock()
res = np.array_equal(A,B)
b = time.clock()
exec_time1.append( b - a )
a = time.clock()
res = np.array_equiv(A,B)
b = time.clock()
exec_time2.append( b - a )
print 'Method: (A==B).all(), ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy methods seem to be faster than the combination of the == operator and the all() method and by comparing the numpy methods the fastest one seems to be the numpy.array_equal method.
Usually two arrays will have some small numeric errors,
You can use numpy.allclose(A,B), instead of (A==B).all(). This returns a bool True/False
Now use np.array_equal. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
You also have similar function such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
Just for the sake of completeness. I will add the
pandas approach for comparing two arrays:
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: In case you are looking of How to
compare Vectors, Arrays or Dataframes in R.
You just you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE
I want to check if two arrays share at least one common element. For two arrays of equal size I can do the following:
import numpy as np
A = np.array([0,1,2,3,4])
B = np.array([5,6,7,8,9])
print(np.isin(A,B).any())
False
In my task, however, I want to do this over a 2d array of variable size. Example:
A = np.array([[0,1,2,3,4],[3,4,5], [2,4,7], [12,14]])
B = np.array([5,6,7,8,9])
function(A,B)
should return:
[False, True, True, False]
How can this task be performed efficiently?
A = np.array([[0,1,2,3,4], [3,4,5], [2,4,7], [12,14]])
B = np.array([5,6,7,8,9])
result = [np.isin(x, B).any() for x in A]
This might be what you're looking for.
Solution without a loop:
import numpy as np
A = np.array([[0,1,2,3,4],[3,4,5], [2,4,7], [12,14]], dtype=object)
B = np.array([5,6,7,8,9])
result = np.intersect1d(np.hstack(A), B)
print(result)
Prints:
[5 7]
I have a list of numpy arrays, say,
a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
and I have a test array, say
b = np.random.rand(3, 3)
I want to check whether a contains b or not. However
b in a
throws the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What is the proper way for what I want?
You can just make one array of shape (3, 3, 3) out of a:
a = np.asarray(a)
And then compare it with b (we're comparing floats here, so we should use isclose())
np.all(np.isclose(a, b), axis=(1, 2))
For example:
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
a = np.asarray(a)
b = a[1, ...] # set b to some value we know will yield True
np.all(np.isclose(a, b), axis=(1, 2))
# array([False, True, False])
As highlighted by #jotasi the truth value is ambiguous due to element-wise comparison within the array.
There was a previous answer to this question here. Overall your task can be done in various ways:
list-to-array:
You can use the "in" operator by converting the list to a (3,3,3)-shaped array as follows:
>>> a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
>>> a= np.asarray(a)
>>> b= a[1].copy()
>>> b in a
True
np.all:
>>> any(np.all((b==a),axis=(1,2)))
True
list-comperhension:
This done by iterating over each array:
>>> any([(b == a_s).all() for a_s in a])
True
Below is a speed comparison of the three approaches above:
Speed Comparison
import numpy as np
import perfplot
perfplot.show(
setup=lambda n: np.asarray([np.random.rand(3*3).reshape(3,3) for i in range(n)]),
kernels=[
lambda a: a[-1] in a,
lambda a: any(np.all((a[-1]==a),axis=(1,2))),
lambda a: any([(a[-1] == a_s).all() for a_s in a])
],
labels=[
'in', 'np.all', 'list_comperhension'
],
n_range=[2**k for k in range(1,20)],
xlabel='Array size',
logx=True,
logy=True,
)
Ok so in doesn't work because it's effectively doing
def in_(obj, iterable):
for elem in iterable:
if obj == elem:
return True
return False
Now, the problem is that for two ndarrays a and b, a == b is an array (try it), not a boolean, so if a == b fails. The solution is do define a new function
def array_in(arr, list_of_arr):
for elem in list_of_arr:
if (arr == elem).all():
return True
return False
a = [np.arange(5)] * 3
b = np.ones(5)
array_in(b, a) # --> False
This error is because if a and b are numpy arrays then a == b doesn't return True or False, but array of boolean values after comparing a and b element-wise.
You can try something like this:
np.any([np.all(a_s == b) for a_s in a])
[np.all(a_s == b) for a_s in a] Here you are creating list of boolean values, iterating through elements of a and checking if all elements in b and particular element of a are the same.
With np.any you can check if any element in your array is True
As pointed out in this answer, the documentation states that:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
a[0]==b is an array, though, containing an element-wise comparison of a[0] and b. The overall truth value of this array is obviously ambiguous. Are they the same if all elements match, or if most match of if at least one matches? Therefore, numpy forces you to be explicit in what you mean. What you want to know, is to test whether all elements are the same. You can do that by using numpy's all method:
any((b is e) or (b == e).all() for e in a)
or put in a function:
def numpy_in(arrayToTest, listOfArrays):
return any((arrayToTest is e) or (arrayToTest == e).all()
for e in listOfArrays)
Use array_equal from numpy
import numpy as np
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
b = np.random.rand(3,3)
for i in a:
if np.array_equal(b,i):
print("yes")
I know in R there is a rbind function:
list = c(1,2,3)
blah = NULL
blah = rbind(blah,list)
How would I replicate this in python? I know you could possible write:
a = NULL
b= array([1,2,3])
for i in range(100):
a = np.vstack((a,b))
but I am not sure what to write in the a=NULL spot. I am essentially looping and adding rows to a table. What's the most efficient way to do this?
In numpy, things will be more efficient if you first pre-allocate space and then loop to fill that space than if you dynamically create successively larger arrays. If, for instance, the size is 500, you would:
a = np.empty((500, b.shape[0]))
Then, loop and enter values as needed:
for i in range(500):
a[i,:] = ...
Note that if you really just want to repeat b 500 times, you can just do:
In [1]: import numpy as np
In [2]: b = np.array([1,2,3])
In [3]: a = np.empty((500, b.shape[0]))
In [4]: a[:] = b
In [5]: a[0,:] == b
Out[5]: array([ True, True, True], dtype=bool)
To answer your question exactly
a = []
b= np.array([1,2,3])
for i in xrange(100):
a.append(b)
a = np.vstack( tuple(a) )
The tuple function casts an iterable (in this case a list) as a tuple object, and np.vstack takes a tuple of numpy arrays as argument.
For my unittest, I want to check if two arrays are identical. Reduced example:
a = np.array([1, 2, np.NaN])
b = np.array([1, 2, np.NaN])
if np.all(a==b):
print 'arrays are equal'
This does not work because nan != nan.
What is the best way to proceed?
For versions of numpy prior to 1.19, this is probably the best approach in situations that don't specifically involve unit tests:
>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True
However, modern versions provide the array_equal function with a new keyword argument, equal_nan, which fits the bill exactly.
This was first pointed out by flyingdutchman; see his answer below for details.
Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:
In : import numpy as np
In : def nan_equal(a,b):
...: try:
...: np.testing.assert_equal(a,b)
...: except AssertionError:
...: return False
...: return True
In : a=np.array([1, 2, np.NaN])
In : b=np.array([1, 2, np.NaN])
In : nan_equal(a,b)
Out: True
In : a=np.array([1, 2, np.NaN])
In : b=np.array([3, 2, np.NaN])
In : nan_equal(a,b)
Out: False
Edit
Since you are using this for unittesting, bare assert (instead of wrapping it to get True/False) might be more natural.
The easiest way is use numpy.allclose() method, which allow to specify the behaviour when having nan values. Then your example will look like the following:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
if np.allclose(a, b, equal_nan=True):
print('arrays are equal')
Then arrays are equal will be printed.
You can find here the related documentation
The numpy function array_equal fits the question's requirements perfectly with the equal_nan parameter added in 1.19.
The example would look as follows:
a = np.array([1, 2, np.NaN])
b = np.array([1, 2, np.NaN])
assert np.array_equal(a, b, equal_nan=True)
But be aware of the problem that this won't work if an element is of dtype object. Not sure if this is a bug or not.
You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:
For example:
a=np.array([1, 2, np.NaN])
b=np.array([1, 2, np.NaN])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b)) #True
Just to complete #Luis Albert Centeno’s answer, you may rather use:
np.allclose(a, b, rtol=0, atol=0, equal_nan=True)
rtol and atol control the tolerance of the equality test. In short, allclose() returns:
all(abs(a - b) <= atol + rtol * abs(b))
By default they are not set to 0, so the function could return True if your numbers are close but not exactly equal.
PS: "I want to check if two arrays are identical " >>
Actually, you are looking for equality rather than identity. They are not the same in Python and I think it’s better for everyone to understand the difference so as to share the same lexicon. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)
You’d test identity via keyword is:
a is b
When I used the above answer:
((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
It gave me some erros when evaluate list of strings.
This is more type generic:
def EQUAL(a,b):
return ((a == b) | ((a != a) & (b != b)))
As of v1.19, numpy's array_equal function supports an equal_nan argument:
assert np.array_equal(a, b, equal_nan=True)
For me this worked fine:
a = numpy.array(float('nan'), 1, 2)
b = numpy.array(2, float('nan'), 2)
numpy.equal(a, b, where =
numpy.logical_not(numpy.logical_or(
numpy.isnan(a),
numpy.isnan(b)
))
).all()
PS. Ignores comparison when there's a nan
If you do this for things like unit tests, so you don't care much about performance and "correct" behaviour with all types, you can use this to have something that works with all types of arrays, not just numeric:
a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)
Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some test. (But don't use this in production code, or with larger arrays!)