Comparing numpy arrays containing NaN - python

For my unittest, I want to check if two arrays are identical. Reduced example:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
if np.all(a == b):
    print('arrays are equal')
This does not work because nan != nan.
What is the best way to proceed?

For versions of numpy prior to 1.19, this is probably the best approach in situations that don't specifically involve unit tests:
>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True
However, modern versions provide the array_equal function with a new keyword argument, equal_nan, which fits the bill exactly.
This was first pointed out by flyingdutchman; see his answer below for details.

Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:
In : import numpy as np
In : def nan_equal(a, b):
...:     try:
...:         np.testing.assert_equal(a, b)
...:     except AssertionError:
...:         return False
...:     return True
In : a = np.array([1, 2, np.nan])
In : b = np.array([1, 2, np.nan])
In : nan_equal(a, b)
Out: True
In : a = np.array([1, 2, np.nan])
In : b = np.array([3, 2, np.nan])
In : nan_equal(a, b)
Out: False
Edit
Since you are using this for unittesting, bare assert (instead of wrapping it to get True/False) might be more natural.
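As a rough sketch of that bare-assert style (the test function names are made up for illustration, written the way pytest would collect them), np.testing.assert_array_equal passes silently when NaNs sit in matching positions and raises AssertionError on a real mismatch:

```python
import numpy as np

def test_equal_with_nan():
    a = np.array([1.0, 2.0, np.nan])
    b = np.array([1.0, 2.0, np.nan])
    # Passes silently: NaNs in matching positions are treated as equal.
    np.testing.assert_array_equal(a, b)

def test_mismatch_is_reported():
    a = np.array([1.0, 2.0, np.nan])
    b = np.array([3.0, 2.0, np.nan])
    raised = False
    try:
        np.testing.assert_array_equal(a, b)
    except AssertionError:
        raised = True
    assert raised, "expected a mismatch to raise AssertionError"
```

In a real test you would normally call the assertion directly and let the test runner report the failure.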

The easiest way is to use the numpy.allclose() function, which lets you specify how NaN values are treated. Your example would then look like this:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
if np.allclose(a, b, equal_nan=True):
    print('arrays are equal')
This prints arrays are equal.
See the numpy.allclose documentation for details.

The numpy function array_equal fits the question's requirements perfectly with the equal_nan parameter added in 1.19.
The example would look as follows:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
assert np.array_equal(a, b, equal_nan=True)
Be aware, however, that this does not work when the arrays have dtype object. I am not sure whether this is a bug.

You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:
For example:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True

Just to complement @Luis Albert Centeno’s answer, you may prefer:
np.allclose(a, b, rtol=0, atol=0, equal_nan=True)
rtol and atol control the tolerance of the equality test. In short, allclose() returns:
all(abs(a - b) <= atol + rtol * abs(b))
By default they are not 0 (rtol=1e-05, atol=1e-08), so the function can return True when your numbers are close but not exactly equal.
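To make the effect of the tolerances concrete, here is a small sketch; the arrays are made-up sample data that differ by 1e-9 in one element:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])
b = np.array([1.0, 2.0 + 1e-9, np.nan])  # differs by 1e-9 in one element

# Default tolerances: the tiny difference is accepted.
loose = np.allclose(a, b, equal_nan=True)
# Zero tolerances: only exactly equal values (and matching NaNs) pass.
strict = np.allclose(a, b, rtol=0, atol=0, equal_nan=True)

print(loose)   # True
print(strict)  # False
```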
PS: "I want to check if two arrays are identical " >>
Actually, you are looking for equality rather than identity. They are not the same in Python and I think it’s better for everyone to understand the difference so as to share the same lexicon. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)
You’d test identity via keyword is:
a is b
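A minimal illustration of the difference, using throwaway arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])  # same contents, separate object
c = a                    # second name for the same object

same_values = np.array_equal(a, b)  # equality: compares contents
same_object_ab = a is b             # identity: compares the objects themselves
same_object_ac = a is c

print(same_values, same_object_ab, same_object_ac)  # True False True
```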

When I used the above answer:
((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
it raised errors when evaluating a list of strings.
This is more type-generic:
def EQUAL(a, b):
    return ((a == b) | ((a != a) & (b != b)))
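Here is a small sketch of why the x != x trick is more general. The helper is the same expression under a hypothetical name, and the sample arrays are invented; np.isnan rejects string arrays, while != works for both floats and strings:

```python
import numpy as np

def nan_safe_equal(a, b):  # hypothetical name for the helper above
    # x != x is True only where x is NaN, and needs no np.isnan call.
    return (a == b) | ((a != a) & (b != b))

floats_a = np.array([1.0, np.nan])
floats_b = np.array([1.0, np.nan])
words_a = np.array(['x', 'y'])
words_b = np.array(['x', 'z'])

float_result = nan_safe_equal(floats_a, floats_b)  # elementwise booleans
word_result = nan_safe_equal(words_a, words_b)

# np.isnan is not defined for string arrays, which is where the
# isnan-based expression fails.
try:
    np.isnan(words_a)
    isnan_supports_strings = True
except TypeError:
    isnan_supports_strings = False
```

Note that the helper returns an elementwise boolean array; call .all() on the result to get a single True/False.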

As of v1.19, numpy's array_equal function supports an equal_nan argument:
assert np.array_equal(a, b, equal_nan=True)

For me this worked fine:
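a = numpy.array([float('nan'), 1, 2])
b = numpy.array([2, float('nan'), 2])
valid = numpy.logical_not(numpy.logical_or(
    numpy.isnan(a),
    numpy.isnan(b)
))
# out is pre-filled with True so that skipped positions count as equal;
# without out, positions where the mask is False are left uninitialized
numpy.equal(a, b, where=valid, out=numpy.ones(a.shape, dtype=bool)).all()
PS: This skips the comparison wherever either array has a NaN.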

If you do this for things like unit tests, so you don't care much about performance and "correct" behaviour with all types, you can use this to have something that works with all types of arrays, not just numeric:
a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)
Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some test. (But don't use this in production code, or with larger arrays!)

Related

Get key of dictionary containing arrays - Python [duplicate]

What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to AND the elements of this array together to determine if the arrays are equal, or is there a simpler way to compare?
(A==B).all()
tests whether all values of the array (A==B) are True.
Note: you may also want to compare the shapes of A and B, e.g. A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
This solution behaves strangely in one particular case: if either A or B is empty and the other contains a single element, it returns True. The two shapes broadcast to an empty shape, so the comparison A==B returns an empty array, for which all() returns True.
Another risk: if A and B don't have the same shape and aren't broadcastable, this approach raises an error.
In conclusion, if you are in any doubt about the shapes of A and B, or simply want to be safe, use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if broadcastable shape, elements have close enough values
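A short sketch of how the three functions differ, including the empty-array special case mentioned above (the sample arrays are made up for illustration):

```python
import numpy as np

A = np.array([1, 2])
B = np.array([[1, 2], [1, 2]])    # A repeated on two rows

equal = np.array_equal(A, B)      # False: shapes (2,) and (2, 2) differ
equiv = np.array_equiv(A, B)      # True: A broadcasts to B and values match
close = np.allclose(A, A + 1e-9)  # True: within the default tolerances

empty = np.array([])
single = np.array([1.0])
naive = (empty == single).all()       # True: the comparison result is empty
safe = np.array_equal(empty, single)  # False: the shapes differ
```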
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise, don't expect any equality check to beat another, as there is not much room to optimize comparing two elements. For the sake of completeness, I still ran some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print(all([x[i] == y[i] for i in range(len(x))]))
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in range(numOfIterations):
    A = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))
    B = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    a = time.perf_counter()
    res = (A == B).all()
    b = time.perf_counter()
    exec_time0.append(b - a)
    a = time.perf_counter()
    res = np.array_equal(A, B)
    b = time.perf_counter()
    exec_time1.append(b - a)
    a = time.perf_counter()
    res = np.array_equiv(A, B)
    b = time.perf_counter()
    exec_time2.append(b - a)
print('Method: (A==B).all(), ', np.mean(exec_time0))
print('Method: np.array_equal(A,B),', np.mean(exec_time1))
print('Method: np.array_equiv(A,B),', np.mean(exec_time2))
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy functions seem to be marginally faster (by roughly 1%) than the combination of the == operator and the all() method, and of the numpy functions the fastest seems to be numpy.array_equal.
Two arrays will usually differ by small numerical errors, so you can use numpy.allclose(A,B) instead of (A==B).all(). It returns a single bool, True or False.
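For example, here is a quick sketch with the classic 0.1 + 0.2 rounding case (sample values chosen only for illustration):

```python
import numpy as np

A = np.array([0.1 + 0.2, 1.0])
B = np.array([0.3, 1.0])

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact = (A == B).all()
# allclose accepts the tiny rounding difference.
approx = np.allclose(A, B)

print(exact, approx)  # False True
```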
Now use np.array_equal. From the documentation:
>>> np.array_equal([1, 2], [1, 2])
True
>>> np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
>>> np.array_equal([1, 2], [1, 2, 3])
False
>>> np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
There are also similar functions, such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
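A minimal sketch of what these assertions do (the arrays are invented sample data): a mismatch raises AssertionError with a report of the differing elements, and numpy.testing.assert_allclose is a tolerance-based variant:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.5])

try:
    np.testing.assert_array_equal(x, y)
    message = None
except AssertionError as err:
    message = str(err)  # report describing the mismatched elements

# Tolerance-based counterpart: passes because the difference is tiny.
np.testing.assert_allclose(x, x + 1e-12, rtol=1e-7)
```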
Just for the sake of completeness, I will add the pandas approach for comparing two arrays:
import numpy as np
import pandas as pd
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: in case you are looking for how to compare vectors, arrays or data frames in R, you can just use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE

NumPy - assigning to views returned by function

Suppose I have an N-D NumPy array a and a function f(a) that returns an arbitrarily complex view v of a, plus an array b with the same shape as v.
What is the easiest way to assign b to v? Both can be multi-dimensional.
The simplest attempt, assigning to the function's return value as in the code below, fails with SyntaxError: can't assign to function call:
import numpy as np
a, b = np.arange(10), np.arange(2)
a[2:4] = b # Working
f = lambda a: a[2:4] # Returns any view of a
f(a) = b # Not working, syntax error
The task does not allow passing array b as an argument to f; the function itself should remain unmodified.
@hpaulj suggested the following solution, which works for any dimensionality (unlike the [:] solution below):
f(a)[...] = b
I also found a simple solution myself; it works correctly for any N-D case except 0-dimensional arrays (scalars):
f(a)[:] = b
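The difference between the two spellings only shows up for 0-dimensional arrays. A small sketch, with made-up variable names:

```python
import numpy as np

scalar = np.array(5)   # 0-dimensional array
scalar[...] = 7        # Ellipsis works for any number of dimensions

try:
    scalar[:] = 9      # a slice needs at least one axis to index
    slice_ok = True
except IndexError:
    slice_ok = False

grid = np.zeros((2, 3))
grid[:] = 1            # [:] is fine once there is at least one dimension
```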
Before trying to find a solution, make sure you understand the problem.
In [27]: a, b = np.arange(10), np.arange(2)
In [28]: f = lambda a: a[2:4]
In [29]: f(a)
Out[29]: array([2, 3])
In [30]: f(a) = b
File "<ipython-input-30-df88b52b4d3c>", line 1
f(a) = b
^
SyntaxError: can't assign to function call
This error is a fundamental Python one. A matter of syntax.
But look at what happens when we use indexing.
The slicing you do in f is:
In [31]: a[2:4]
Out[31]: array([2, 3])
In [32]: a.__getitem__(slice(2,4))
Out[32]: array([2, 3])
The desired assignment slicing is:
In [33]: a[2:4] = b
In [34]: a.__setitem__(slice(2,4),b)
In [35]: a
Out[35]: array([0, 1, 0, 1, 4, 5, 6, 7, 8, 9])
Note that setitem takes b as an argument. a.__setitem__(slice(2,4))=b would run into that same syntax error.
This use of setitem allows us to use advanced indexing (a list):
In [38]: a[[0,2]] = b
In [39]: a.__setitem__([0,2],b)
Whereas this does not work:
In [40]: a[[0,2]][...] = b
because it is actually a.__getitem__([0,2]).__setitem__(Ellipsis,b). The set modifies the copy produced by the get. This chaining only works when the first index produces a view.
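The view-versus-copy distinction can be checked directly with np.shares_memory. A short sketch with sample values chosen for illustration:

```python
import numpy as np

a = np.arange(10)
b = np.array([100, 200])

basic = a[2:4]     # basic slicing returns a view of a
fancy = a[[0, 2]]  # indexing with a list returns a copy

view_shares = np.shares_memory(a, basic)  # True
copy_shares = np.shares_memory(a, fancy)  # False

basic[...] = b     # writes through to a
fancy[...] = b     # only changes the temporary copy

print(a)  # [  0   1 100 200   4   5   6   7   8   9]
```

So f(a)[...] = b only modifies a when f returns a view.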
