How does Python 2.7 compare items inside a list

I came across this interesting example today:
class TestableEq(object):
    def __init__(self):
        self.eq_run = False

    def __eq__(self, other):
        self.eq_run = True
        if isinstance(other, TestableEq):
            other.eq_run = True
        return self is other
>>> eq = TestableEq()
>>> eq.eq_run
False
>>> eq == eq
True
>>> eq.eq_run
True
>>> eq = TestableEq()
>>> eq is eq
True
>>> eq.eq_run
False
>>> [eq] == [eq]
True
>>> eq.eq_run # Should be True, right?
False
>>> (eq,) == (eq,) # Maybe with tuples?
True
>>> eq.eq_run
False
>>> {'eq': eq} == {'eq': eq} # dicts?
True
>>> eq.eq_run
False
>>> import numpy as np # Surely NumPy works as expected
>>> np.array([eq]) == np.array([eq])
True
>>> eq.eq_run
False
So it seems that comparisons inside containers work differently in Python. I would expect the call to == to use each object's implementation of __eq__; otherwise, what's the point? Additionally:
class TestableEq2(object):
    def __init__(self):
        self.eq_run = False

    def __eq__(self, other):
        self.eq_run = True
        other.eq_run = True
        return False
>>> eq = TestableEq2()
>>> [eq] == [eq]
True
>>> eq.eq_run
False
>>> eq == eq
False
>>> eq.eq_run
True
Does this mean that Python uses is from within containers' implementations of __eq__ instead? Is there a way around this?
My use case is that I am building a data structure inheriting from some of the collections ABCs and I want to write tests to make sure my structure is behaving correctly. I figured it would be simple to inject a value that recorded when it was compared, but to my surprise the test failed when checking to ensure that comparison occurred.
EDIT: I should mention that this is on Python 2.7, but I see the same behavior on 3.3.

CPython's underlying implementation will skip the equality check (==) for items in a list if items are identical (is).
CPython uses this as an optimization assuming identity implies equality.
This is documented in PyObject_RichCompareBool, which is used to compare items:
Note: If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.
From the listobject.c implementation:
/* Search for the first index where items are different */
for (i = 0; i < Py_SIZE(vl) && i < Py_SIZE(wl); i++) {
    int k = PyObject_RichCompareBool(vl->ob_item[i],
                                     wl->ob_item[i], Py_EQ);
    // k is 1 if the objects are the same,
    // because of PyObject_RichCompareBool's behaviour
    if (k < 0)
        return NULL;
    if (!k)
        break;
}
As you can see, the loop keeps going only while PyObject_RichCompareBool returns 1; for identical items that 1 comes from the identity shortcut, so their __eq__ is never called.
And from object.c's implementation of PyObject_RichCompareBool:
/* Quick result when objects are the same.
   Guarantees that identity implies equality. */
if (v == w) {
    if (op == Py_EQ)
        return 1;
    else if (op == Py_NE)
        return 0;
}
// ... actually deep-compare objects
To override this you'll have to compare the items manually.
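A minimal sketch of comparing the items manually, aimed at the testing use case in the question. Recorder and items_equal_forced are hypothetical names introduced for illustration. Because the helper applies == to every pair itself, each item's __eq__ runs even when the two items are the same object:

```python
class Recorder(object):
    """Hypothetical test double that counts how often __eq__ is called."""

    def __init__(self):
        self.calls = 0

    def __eq__(self, other):
        self.calls += 1
        return self is other


def items_equal_forced(seq1, seq2):
    """Compare two sequences item by item, always invoking ==.

    The == operator itself has no identity shortcut, so __eq__ runs
    even for identical items (unlike list or tuple comparison).
    """
    if len(seq1) != len(seq2):
        return False
    for a, b in zip(seq1, seq2):
        if not (a == b):
            return False
    return True


r = Recorder()
print([r] == [r])                    # True, via the identity shortcut
print(r.calls)                       # 0: __eq__ was skipped
print(items_equal_forced([r], [r]))  # True
print(r.calls)                       # 1: __eq__ ran this time
```

The explicit length check matters because zip() stops at the end of the shorter sequence.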

Python's testing of equality for sequences goes as follows:
Lists identical?
    Yes: Equal
    No: Same length?
        No: Not equal
        Yes: Items identical? (checked at each position)
            Yes: Equal
            No: Items equal?
                Yes: Equal
                No: Not equal
You can see that the equality of the items at each position is tested only if the two sequences are the same length but the items at that position are not identical. If you want to force equality checks to be used, you need e.g. (the length check is needed because zip stops at the shorter sequence):
len(list1) == len(list2) and all(item1 == item2 for item1, item2 in zip(list1, list2))
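The decision tree above can be modelled in pure Python. This is only a sketch that mirrors the diagram, not CPython's actual C code, and sequence_eq is a hypothetical name:

```python
def sequence_eq(seq1, seq2):
    """Pure-Python model of the sequence-equality decision tree."""
    if seq1 is seq2:              # "Lists identical?"
        return True
    if len(seq1) != len(seq2):    # "Same length?"
        return False
    for a, b in zip(seq1, seq2):
        if a is b:                # "Items identical?": == is skipped
            continue
        if not (a == b):          # "Items equal?"
            return False
    return True


nan = float('nan')
print(sequence_eq([nan], [nan]))                    # True: identical items
print(sequence_eq([float('nan')], [float('nan')]))  # False: distinct NaN objects
print(sequence_eq([1, 2], [1, 2, 3]))               # False: lengths differ
```

For these inputs the model gives the same answers as the built-in list ==, including the NaN cases.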

If x is y there is no reason to call x == y, by contract of ==. Python is taking this shortcut.
You can verify (or disprove) this by creating an eq1 and an eq2 in the tests and then using [eq1] == [eq2].
Here is an example:
class TestableEq(object):
    def __init__(self):
        self.eq_run = False

    def __eq__(self, other):
        self.eq_run = True
        return True  # always assume equals for test

eq1 = TestableEq()
eq2 = TestableEq()
eq3 = TestableEq()
print([eq1] == [eq2])  # True
print(eq1.eq_run)      # True - implies eq1 == eq2 was called
print(eq2.eq_run)      # False - but NOT eq2 == eq1
print([eq3] == [eq3])  # True
print(eq3.eq_run)      # False - implies NO eq3 == eq3
When the items are identical (is), no == is involved.
The difference with the dictionaries can be explained similarly.

When comparing two lists, the CPython implementation short-circuits member comparisons using object identity (obj1 is obj2), because, according to a comment in the code:
/* Quick result when objects are the same.
   Guarantees that identity implies equality. */
If the two objects are not exactly the same object, then CPython does a rich compare, using __eq__ if implemented.

Related

Differentiate False and 0

Let's say I have a list with different values, like this:
[1,2,3,'b', None, False, True, 7.0]
I want to iterate over it and check that no element is in a list of forbidden values, for example [0, 0.0].
When I check if False in [0,0.0] I get True. I understand that Python casts False to 0 here, but how can I avoid that and make the check correct, so that False is not considered to be in [0,0.0]?

To tell the difference between False and 0 you may use is to compare them. False is a singleton value and always refers to the same object. To compare all the items in a list to make sure they are not False, try:
all(x is not False for x in a_list)
BTW, Python doesn't cast anything here: Booleans are a subclass of integers, and False is literally equal to 0, no conversion required.
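If the goal is the question's forbidden-values check rather than only spotting False, one possible sketch is to require the exact same type as well as equality. is_forbidden is a hypothetical helper name:

```python
def is_forbidden(value, forbidden):
    """True if value equals a forbidden item of exactly the same type.

    Requiring type(value) is type(item) stops False from matching 0 and
    True from matching 1, even though bool is a subclass of int.
    """
    return any(type(value) is type(item) and value == item for item in forbidden)


values = [1, 2, 3, 'b', None, False, True, 7.0]
forbidden = [0, 0.0]
print([v for v in values if is_forbidden(v, forbidden)])  # []
print(is_forbidden(0, forbidden))      # True
print(is_forbidden(False, forbidden))  # False
```

The trade-off of the exact-type test is that subclasses and look-alike types are not matched either, so NumPy scalars such as numpy.float64(0.0) would also pass the check.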

You would want to use is instead of == when comparing.
y = 0
print(y == False)  # True
print(y is False)  # False
x = False
print(x == False)  # True
print(x is False)  # True

Found a weird corner case on differentiating between 0 and False today. If the initial list contains the numpy version of False (numpy.bool_(False)), the is comparisons don't work, because numpy.bool_(False) is not False.
These arise all the time in comparisons that use numpy types. For example:
>>> type(numpy.array(50)<0)
<class 'numpy.bool_'>
The easiest way would be to compare using the numpy.bool_ type: (np.array(50)<0) is (np.False_). But doing that requires a numpy dependency. The solution I came up with was to do a string comparison (working as of numpy 1.18.1):
str(numpy.bool_(False)) == str(False)
So when dealing with a list, à la @kindall, it would be:
all(str(x) != str(False) for x in a_list)
Note that this test also has a problem with the string 'False'. To avoid that, you could exclude cases where the value equals its own string representation (this also dodges a NumPy string array). Here are some test outputs:
>>> foo = False
>>> str(foo) != foo and str(foo) == str(False)
True
>>> foo = numpy.bool_(False)
>>> str(foo) != foo and str(foo) == str(False)
True
>>> foo = 0
>>> str(foo) != foo and str(foo) == str(False)
False
>>> foo = 'False'
>>> str(foo) != foo and str(foo) == str(False)
False
>>> foo = numpy.array('False')
>>> str(foo) != foo and str(foo) == str(False)
array(False)
I am not really an expert programmer, so there may be some limitations I've still missed, or a big reason not to do this, but it allowed me to differentiate 0 and False without needing to resort to a numpy dependency.

Why is (numpy.nan, 1) == (numpy.nan, 1)?

While numpy.nan is not equal to numpy.nan, and (float('nan'), 1) is not equal to (float('nan'), 1),
(numpy.nan, 1) == (numpy.nan, 1) is True.
What could be the reason?
Does Python first check to see if the ids are identical?
If identity is checked first when comparing items of a tuple, then why isn't it checked when objects are compared directly?

When you do numpy.nan == numpy.nan, it is float's __eq__ that decides whether the condition is true (numpy.nan is an ordinary Python float). When you compare tuples, Python first checks whether the corresponding items are the same object, which they are here. You can let NumPy make the decision element by element by turning the tuples into NumPy arrays:
np.array((1, numpy.nan)) == np.array((1, numpy.nan))
>>array([ True, False], dtype=bool)
The reason is that float comparison follows IEEE 754 rules, which say specifically that nan != nan: mathematically speaking, nan is undetermined (it could be anything), so it makes sense that nan != nan. But when you use == on tuples, you call the tuple's __eq__, which first checks whether the items are the same Python object and only falls back to the items' own == when they are not. In the case of (float('nan'), 1) == (float('nan'), 1), it returns False because each call to float('nan') creates a new object, as you can check with float('nan') is float('nan').

Container objects are free to define what equality means for them, and for most that means one thing is really, really important:
for x in container:
    assert x in container
So containers typically do an id check before an __eq__ check.
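A short sketch of that invariant in action, using NaN because it is the standard example of a value that is not equal to itself. The list items is just example data:

```python
nan = float('nan')
items = [nan, 1, 'a']

print(nan == nan)        # False: NaN is unequal to itself
print(nan in items)      # True: membership checks identity before ==
print(items.count(nan))  # 1
print(items.index(nan))  # 0

# The invariant holds even for NaN:
for x in items:
    assert x in items

# A *different* NaN object is not found: identity fails and == is False.
print(float('nan') in items)  # False
```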

When comparing two objects in a tuple, Python first checks to see if they are the same object.
Note that numpy.nan is numpy.nan, but float('nan') is not float('nan').
In Objects/tupleobject.c, the comparison is carried out like this:
for (i = 0; i < vlen && i < wlen; i++) {
    int k = PyObject_RichCompareBool(vt->ob_item[i],
                                     wt->ob_item[i], Py_EQ);
    if (k < 0)
        return NULL;
    if (!k)
        break;
}
And in PyObject_RichCompareBool, you can see the check for identity:
if (v == w) {
    if (op == Py_EQ)
        return 1;
    else if (op == Py_NE)
        return 0;
}
You can verify this with the following example:
class A(object):
    def __eq__(self, other):
        print("Checking equality with __eq__")
        return True

a1 = A()
a2 = A()
If you try (a1, 1) == (a1, 1), nothing gets printed, while (a1, 1) == (a2, 1) uses __eq__ and prints the message.
Now try a1 == a1 and see if it surprises you ;P

Tuples do check first with identity and then with equality if identity doesn't match.
(float('nan'),) == (float('nan'),)
is False simply because a different object instance is created... if you do instead:
x = float('nan')
print((x,) == (x,))
you will get True, because although x == x is False, x is x is True.
numpy.nan is a single float object shared by every use of numpy.nan, and that's why it "doesn't work".
As a wild guess this "shortcut" of checking identity first is done for performance reasons.

How can I return false if more than one number while ignoring "0"'s?

This is a function in a larger program that solves a sudoku puzzle. At this point, I would like the function to return False if there is more than one occurrence of a number, unless the number is zero. What am I missing to achieve this?
l is a list of numbers:
l = [1, 0, 0, 2, 3, 0, 0, 8, 0]
def alldifferent1D(l):
    for i in range(len(l)):
        if l.count(l[i]) > 1 and l[i] != 0:  # does this do it?
            return False
    return True

Assuming the list has length 9, you can ignore the inefficiency of using count here (using a helper data structure such as Counter probably takes longer than running .count() a few times). You can express "all different" more naturally as:
def alldifferent1D(L):
    return all(L.count(x) <= 1 for x in L if x != 0)
This also saves calling count() for all the 0's

>>> from collections import Counter
>>> def all_different(xs):
...     return len(set(Counter(filter(None, xs)).values()) - set([1])) == 0
...
Tests:
>>> all_different([])
True
>>> all_different([0,0,0])
True
>>> all_different([0,0,1,2,3])
True
>>> all_different([1])
True
>>> all_different([1,2])
True
>>> all_different([0,2,0,1,2,3])
False
>>> all_different([2,2])
False
>>> all_different([1,2,3,2,2,3])
False

So we can break this down into two problems:
Getting rid of the zeros, since we don't care about them.
Checking if there are any duplicate numbers.
Stripping the zeros is easy enough:
filter(lambda a: a != 0, x)
And we can check for duplicates by comparing the length of the list with the length of a set built from it (a set holds only one of each element):
if len(x) == len(set(x)):
    return True
return False
Making these into functions we have:
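def remove_zeros(x):
    # list() is needed on Python 3, where filter() returns a lazy iterator
    return list(filter(lambda a: a != 0, x))

def duplicates(x):
    # Despite the name, this returns True when x has NO duplicates
    return len(x) == len(set(x))

def alldifferent1D(x):
    return duplicates(remove_zeros(x))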

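One way to avoid searching for every entry in every position is to use a list of flags:
def alldifferent1D(l):
    # Assumes every value is an integer between 0 and len(l)
    flags = (len(l) + 1) * [False]
    for cell in l:
        if cell > 0:
            if flags[cell]:
                return False
            flags[cell] = True
    return True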
The flags list has a True at index k if the value k has been seen before in the list.
I'm sure you could speed this up with list comprehension and an all() or any() test, but this worked well enough for me.
PS: The first intro didn't survive my edit, but this is from a Sudoku solver I wrote years ago. (Python 2.4 or 2.5 iirc)
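A variation on the flags idea, sketched under the assumption that cell values are hashable but not necessarily small integers: a set of already-seen values replaces the flags list, so no upper bound on the values is needed, and the loop still stops at the first duplicate.

```python
def alldifferent1D(cells):
    """Return False if any non-zero value occurs more than once."""
    seen = set()
    for cell in cells:
        if cell == 0:       # zeros mark empty cells and are ignored
            continue
        if cell in seen:
            return False    # stop at the first duplicate
        seen.add(cell)
    return True


print(alldifferent1D([1, 0, 0, 2, 3, 0, 0, 8, 0]))  # True
print(alldifferent1D([0, 2, 0, 1, 2, 3]))           # False
```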

Python: testing inter-dependence between strings

I have a piece of code which compares two strings.
If they are both empty (i.e. ""), the code should return True
If they are both 'populated', the code should return True
otherwise (i.e. one is empty), the code should return False
Currently I have:
def compare(first, second):
    if first:
        return bool(second)
    elif second:
        return bool(first)
    else:
        return True
I'm sure there is a more succinct way to do this with fewer clauses (or ideally no clauses)?

You want the inverse of "exclusive or":
>>> def compare(first, second):
...     return not bool(first) ^ bool(second)
...
>>> compare("", "")
True
>>> compare("foo", "")
False
>>> compare("", "bar")
False
>>> compare("foo", "bar")
True
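For two booleans, not (a ^ b) is the same as a == b, so a sketch of an equivalent version simply compares the truthiness of the two strings:

```python
def compare(first, second):
    """True if both strings are empty or both are non-empty."""
    # bool('') is False and bool('foo') is True, so this is "exclusive nor"
    return bool(first) == bool(second)


print(compare("", ""))     # True
print(compare("foo", ""))  # False
```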

return len(first) == len(second) == 0 or len(first) > 0 and len(second) > 0
Or
return first == second == '' or first != '' and second != ''
You shouldn't rely on boolean operators applied to strings.

Does Python have an "or equals" function like ||= in Ruby?

If not, what is the best way to do this?
Right now I'm doing (for a django project):
if 'thing_for_purpose' not in request.session:
    request.session['thing_for_purpose'] = 5
but it's pretty awkward. In Ruby it would be:
request.session['thing_for_purpose'] ||= 5
which is much nicer.

Jon-Eric's answer is good for dicts, but the title asks for a general equivalent to Ruby's ||= operator.
A common way to do something like ||= in Python is
x = x or new_value
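One difference worth knowing before using x = x or new_value as a stand-in for ||=: Python's or treats every falsy value (0, '', [], {}, None, False) as missing, whereas in Ruby only nil and false are falsy, so Ruby's ||= keeps an existing 0. The helper or_default below is a hypothetical name that just wraps the expression:

```python
def or_default(value, default):
    """Hypothetical helper equivalent to `x = x or default`."""
    return value or default


print(or_default(None, 5))  # 5: None is falsy
print(or_default(3, 5))     # 3: truthy values are kept
print(or_default(0, 5))     # 5: 0 is falsy too, so it gets replaced
print(or_default('', 5))    # 5
```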

Precise answer: No. Python does not have a single built-in operator op that can translate x = x or y into x op y.
But, it almost does. The bitwise or-equals operator (|=) will function as described above if both operands are being treated as booleans, with a caveat. (What's the caveat? Answer is below of course.)
First, the basic demonstration of functionality:
x = True
x
Out[141]: True
x |= True
x
Out[142]: True
x |= False
x
Out[143]: True
x &= False
x
Out[144]: False
x &= True
x
Out[145]: False
x |= False
x
Out[146]: False
x |= True
x
Out[147]: True
The caveat is due to Python not being strictly typed: even if the values are being treated as booleans in an expression, they will not be short-circuited if given to a bitwise operator. For example, suppose we had a boolean function which clears a list and returns True iff there were elements deleted:
def my_clear_list(lst):
    if not lst:
        return False
    else:
        del lst[:]
        return True
Now we can see the short-circuit behavior like so:
x = True
lst = [1, 2, 3]
x = x or my_clear_list(lst)
print(x, lst)
Output: True [1, 2, 3]
However, switching the or to a bitwise or (|) removes the short-circuit, so the function my_clear_list executes.
x = True
lst = [1, 2, 3]
x = x | my_clear_list(lst)
print(x, lst)
Output: True []
Above, x = x | my_clear_list(lst) is equivalent to x |= my_clear_list(lst).

dict has setdefault().
So if request.session is a dict:
request.session.setdefault('thing_for_purpose', 5)
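A sketch of how setdefault() behaves, with a plain dict standing in for request.session (an assumption for illustration). Unlike the x = x or value idiom, it only fills in missing keys and leaves existing values alone, even falsy ones:

```python
session = {}  # stands in for request.session in this sketch

# Missing key: the default is stored and returned.
print(session.setdefault('thing_for_purpose', 5))  # 5

# Existing key: the stored value is returned unchanged, even though it is falsy.
session['counter'] = 0
print(session.setdefault('counter', 5))  # 0
```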

Setting a default makes sense if you're doing it in a middleware or something, but if you need a default value in the context of one request:
request.session.get('thing_for_purpose', 5) # gets a default
Bonus: here's how to really do an ||= in Python.
def test_function(self, d=None):
    'a simple test function'
    d = d or {}
    # ... do things with d and return ...

In general, you can use dict[key] = dict.get(key, 0) + val.
