What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to AND the elements of this array together to determine whether the arrays are equal, or is there a simpler way to compare?
(A==B).all()
This tests whether all values of the array (A==B) are True.
Note: you may also want to check that A and B have the same shape, e.g. A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
This solution behaves strangely in one particular case: if either A or B is empty and the other one contains a single element, it returns True. The shapes (0,) and (1,) broadcast to (0,), so the comparison A==B returns an empty array, for which all() returns True.
Another risk: if A and B don't have the same shape and aren't broadcastable, this approach will raise an error.
In conclusion, if you have any doubt about the shapes of A and B, or simply want to be safe, use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if broadcastable shape, elements have close enough values
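To make the special cases above concrete, here is a minimal sketch. The array names (empty, single, row, stacked) are made up for the illustration and are not from the answers.

```python
import numpy as np

empty = np.array([])
single = np.array([1])

# Shapes (0,) and (1,) broadcast to (0,), so the comparison is an empty
# boolean array, and all() of an empty array is True.
print((empty == single).all())        # True
print(np.array_equal(empty, single))  # False: the shapes differ

row = np.array([1, 2])
stacked = np.array([[1, 2], [1, 2]])

print(np.array_equal(row, stacked))   # False: the shapes differ
print(np.array_equiv(row, stacked))   # True: row broadcasts against stacked
```

So if "equal" should also mean "same shape", np.array_equal is the one that checks it for you.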
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check whether two arrays have the same shape AND elements, you should use np.array_equal, as it is the method recommended in the documentation.
Performance-wise, don't expect any equality check to beat another, as there is not much room to optimize comparing two elements. For the sake of completeness, I still ran some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print(all(x[i] == y[i] for i in range(len(x))))
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in range(numOfIterations):
    A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
    B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
    # time each method with a high-resolution timer
    a = time.perf_counter()
    res = (A==B).all()
    b = time.perf_counter()
    exec_time0.append( b - a )
    a = time.perf_counter()
    res = np.array_equal(A,B)
    b = time.perf_counter()
    exec_time1.append( b - a )
    a = time.perf_counter()
    res = np.array_equiv(A,B)
    b = time.perf_counter()
    exec_time2.append( b - a )
# mean execution time per method, in seconds
print('Method: (A==B).all(), ', np.mean(exec_time0))
print('Method: np.array_equal(A,B),', np.mean(exec_time1))
print('Method: np.array_equiv(A,B),', np.mean(exec_time2))
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the NumPy functions seem to be marginally faster than combining the == operator with the all() method, and of the two NumPy functions, numpy.array_equal seems to be the fastest, though all three timings are within about 1% of each other.
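If you want to repeat this kind of measurement yourself, a sketch along the following lines may be less sensitive to noise than timing single calls. The array size, the repeat counts and the names candidates and timings are arbitrary choices for the illustration, not taken from the answers, and the numbers you get will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 255, size=(500, 500))
B = A.copy()  # equal arrays, so every element has to be compared

candidates = {
    "(A == B).all()": lambda: (A == B).all(),
    "np.array_equal(A, B)": lambda: np.array_equal(A, B),
    "np.array_equiv(A, B)": lambda: np.array_equiv(A, B),
}

timings = {}
for label, func in candidates.items():
    # best of 5 runs of 100 calls each, converted to seconds per call
    best = min(timeit.repeat(func, number=100, repeat=5))
    timings[label] = best / 100
    print(f"{label:<22} {timings[label] * 1e6:10.1f} us per call")
```

Taking the minimum over several repeats is the usual way to reduce interference from other processes.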
Usually two arrays will differ by some small numerical errors.
In that case you can use numpy.allclose(A,B) instead of (A==B).all(). It returns a bool (True/False).
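A small sketch of the difference, using the classic 0.1 + 0.2 rounding error as an assumed example. rtol and atol are the tolerance parameters of numpy.allclose (defaults rtol=1e-05, atol=1e-08).

```python
import numpy as np

A = np.array([0.1 + 0.2, 1.0])
B = np.array([0.3, 1.0])

print((A == B).all())     # False: 0.1 + 0.2 is not exactly 0.3
print(np.allclose(A, B))  # True: equal within the default tolerances

# The tolerances can be tightened until the rounding error shows up again.
print(np.allclose(A, B, rtol=0, atol=1e-20))  # False
```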
You can now use np.array_equal. From the documentation:
>>> np.array_equal([1, 2], [1, 2])
True
>>> np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
>>> np.array_equal([1, 2], [1, 2, 3])
False
>>> np.array_equal([1, 2], [1, 4])
False
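One more case that may be worth knowing about: NaN never compares equal to itself, so arrays containing NaN are reported as unequal by default. As far as I know, np.array_equal accepts an equal_nan keyword since NumPy 1.19. The sketch below assumes that version or newer.

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0, np.nan])

print(np.array_equal(a, b))                  # False: nan != nan
print(np.array_equal(a, b, equal_nan=True))  # True: NaNs in the same positions match
```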
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
There are also similar functions, such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
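A short sketch of how the assertions behave. The arrays x and y and the flag mismatch_reported are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 4])

# Passes silently when the arrays are equal.
np.testing.assert_array_equal(x, x.copy())

# Raises AssertionError, with a report of the mismatch, when they are not.
try:
    np.testing.assert_array_equal(x, y)
except AssertionError as err:
    mismatch_reported = True
    print(err)
else:
    mismatch_reported = False

# The "almost equal" variant tolerates small floating-point errors.
np.testing.assert_almost_equal(np.array([0.1 + 0.2]), np.array([0.3]))
```

This is mostly useful in test suites, where the error message tells you which elements differ.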
Just for the sake of completeness, I will add the pandas approach for comparing two arrays:
import numpy as np
import pandas as pd
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: in case you are looking for how to compare vectors, arrays or data frames in R, you can simply use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE
I have the following question. I have a list of ranges like this:
parameterRanges2 = [(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)]
And I have a numpy array like this :
arr = np.array([[2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
[4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
[2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
[6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
[8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
[11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
[1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]])
Every parameterRange tuple in the list corresponds to a subarray (row) of the numpy array. Is there a way to check whether every element of the corresponding range exists at least once? For example, the first row of the numpy array should contain each of the numbers 1,2,3,4,5 at least once, the second row likewise, the third row each of the numbers 1,2,3,4,5,6,7 at least once, and so on.
Exploiting the fact that the ranges are integer, we can give an O(nm) solution, n x m being the shape of arr. The algorithm works as follows:
discard all non-integer elements and all that are outside their range
use np.add.at to efficiently (O(mn)) generate bincounts for in-range numbers
count the above-threshold bins in each row and compare to the size of the range
import numpy as np
parameterRanges2 = np.array([(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)])
arr = np.array([[2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
[4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
[2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
[6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
[8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
[11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
[1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]])
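min_occ = 2  # required number of occurrences per value (use 1 for "at least once")
# width of each range (hi - lo), shape (n_ranges, 1)
dp = np.diff(parameterRanges2, axis=-1)
# one count bin per in-range value, plus a final bin that collects rejected entries
m = np.zeros((parameterRanges2.shape[0], np.max(dp) + 2), dtype=int)
# shift each row so that its range starts at 0
arr = arr - parameterRanges2[:, :1]
ia = arr.astype(int)
# keep integer-valued, in-range entries; send everything else to the last bin (-1)
idx = np.where((arr==ia) & (ia>=0) & (ia<=dp), ia, -1)
# unbuffered add, so repeated indices are all counted
np.add.at(m, (np.arange(parameterRanges2.shape[0])[:, None], idx), 1)
# a row passes if every value in its range occurs at least min_occ times
res = (m[:, :-1] >= min_occ).sum(axis=-1) == dp.ravel() + 1
print(res)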
Output:
[ True True False True False False True True]
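The np.add.at step is probably the least obvious part of the code above, so here it is in isolation on a toy input. The names rows, counts and buffered are only for this illustration.

```python
import numpy as np

rows = np.array([[0, 2, 2],
                 [1, 1, 1]])
row_index = np.arange(2)[:, None]  # shape (2, 1), broadcasts against rows

# Unbuffered add: every occurrence of an index is counted.
counts = np.zeros((2, 3), dtype=int)
np.add.at(counts, (row_index, rows), 1)
print(counts)    # [[1 0 2]
                 #  [0 3 0]]

# Plain fancy-index assignment is buffered: repeated indices count only once.
buffered = np.zeros((2, 3), dtype=int)
buffered[row_index, rows] += 1
print(buffered)  # [[1 0 1]
                 #  [0 1 0]]
```

That difference is why np.add.at is used for the per-row bincounts rather than m[...] += 1.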
There may be a more efficient way using only NumPy functions, but the code below works. I can't think of a simple NumPy way to do it, because we can't build a standard NumPy array of the ranges: the rows aren't all the same length.
import numpy as np
arr = np.array([
[2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
[4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
[2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
[6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
[8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
[11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
[1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]
])
parameterRanges2 = [(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)]
ranges = [np.arange(u, v+1, dtype='float64') for u, v in parameterRanges2]
print([np.all(np.isin(u,v)) for u, v in zip(ranges, arr)])
Output:
[True, True, True, True, False, False, True, True]
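Note that the two answers report different results for the third row. The first one uses min_occ = 2, and the value 7 occurs only once in that row. The second one only requires each value to occur at least once. A minimal sketch of a helper that covers both cases might look like this. The function name ranges_covered and the demo data are hypothetical, not taken from either answer.

```python
import numpy as np

def ranges_covered(arr, ranges, min_occ=1):
    """For each row, check that every integer in its (lo, hi) range
    occurs at least min_occ times."""
    results = []
    for row, (lo, hi) in zip(arr, ranges):
        values, counts = np.unique(row, return_counts=True)
        frequent = values[counts >= min_occ]  # values seen often enough
        required = np.arange(lo, hi + 1)      # every integer in the range
        results.append(bool(np.isin(required, frequent).all()))
    return results

demo = np.array([[1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
                 [0.0, 1.0, 2.0, 0.0, 1.0, 5.0]])
demo_ranges = [(1, 3), (0, 2)]

print(ranges_covered(demo, demo_ranges))             # [True, True]
print(ranges_covered(demo, demo_ranges, min_occ=2))  # [True, False]
```

This loops over the rows in Python, so it trades some speed for readability compared to the np.add.at version.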