How does "in" check for membership? - python

I have multiple instances of a class. I consider two instances equal when a certain attribute matches.
All instances are in a list, list = [a, b, c]. I now create a new instance d of that class. When I evaluate d in list, it of course returns False.
My question is: how is membership checked when using in? Is it a normal equality comparison (which would mean I can implement __eq__ in my class to define when instances are equal)? If not, how can I make in match when a certain attribute of two instances is equal?

class Foo:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        if isinstance(other, Foo):
            return self.x == other.x
a = [1, 2, 3, Foo(4), Foo(5)]
Foo(5) in a
>>> True
Foo(3) in a
>>> False

From the docs:
For user-defined classes which define the __contains__() method, x in y is true if and only if y.__contains__(x) is true.
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).
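The three lookup paths quoted above can be sketched with three small classes. This is a minimal illustration, not code from the docs; the class names WithContains, WithIter and WithGetitem are made up for the example.

```python
class WithContains:
    """Membership is decided entirely by __contains__."""
    def __init__(self, items):
        self.items = items
    def __contains__(self, x):
        return x in self.items


class WithIter:
    """No __contains__: `in` iterates and compares each produced value."""
    def __init__(self, items):
        self.items = items
    def __iter__(self):
        return iter(self.items)


class WithGetitem:
    """Neither __contains__ nor __iter__: `in` tries indices 0, 1, 2, ...
    until __getitem__ raises IndexError."""
    def __init__(self, items):
        self.items = items
    def __getitem__(self, i):
        return self.items[i]  # raises IndexError past the end


classes = (WithContains, WithIter, WithGetitem)
present = [2 in cls([1, 2, 3]) for cls in classes]
missing = [9 in cls([1, 2, 3]) for cls in classes]
print(present)  # [True, True, True]
print(missing)  # [False, False, False]
```

All three containers answer the same way; only the mechanism Python uses to reach the answer differs.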

The behavior of in is based on the __contains__() method. Let us see with an example:
class X:
    def __contains__(self, m):
        print('Hello')  # returns None, so the membership test is False
Now when you use in on an instance of X, you can see 'Hello' printed:
>>> x = X()
>>> 1 in x
Hello
False
As per the __contains__() documentation:
For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.

Related

How does "contains" work? Does it use iter and getitem?

How does __contains__ work? For example, I have a class MyClass and an instance of this class called a. When I write if val in a:, I'm basically invoking __contains__. From my understanding, if __contains__ is not implemented in the class, then __iter__ is invoked, which iterates over the list returned by __getitem__ (which in my example is implemented in the class), and if val is equal to some element of the list, then the in test returns True. Is that right?
EDIT: __getitem__ in my code only returns the element of the list at a given position, so I don't know how that would work together with __iter__.
I think the __contains__ documentation is clear enough:
> Called to implement membership test operators. Should return true if
> item is in self, false otherwise. For mapping objects, this should
> consider the keys of the mapping rather than the values or the
> key-item pairs.
>
> For objects that don’t define __contains__(), the membership test
> first tries iteration via __iter__(), then the old sequence iteration
> protocol via __getitem__(), see this section in the language
> reference.
That's exactly right. More specifics here:
For user-defined classes which define the __contains__() method, x in
y returns True if y.__contains__(x) returns a true value, and False
otherwise.
For user-defined classes which do not define __contains__() but do
define __iter__(), x in y is True if some value z, for which the
expression x is z or x == z is true, is produced while iterating over
y. If an exception is raised during the iteration, it is as if in
raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__(), x in y is True if and only if there is a non-negative integer index i such that x is y[i] or x == y[i], and no lower integer
index raises the IndexError exception. (If any other exception is
raised, it is as if in raised that exception).
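The "x is z or x == z" wording in that quote matters for objects that are not equal to themselves. A minimal sketch, assuming a made-up iterable class Bag that has no __contains__ method:

```python
class Bag:
    """Hypothetical iterable container without a __contains__ method."""
    def __init__(self, *items):
        self.items = items
    def __iter__(self):
        return iter(self.items)


nan = float("nan")
bag = Bag(nan)

never_equal = nan == nan                 # NaN is not equal to itself
found_same_object = nan in bag           # the identity check `x is z` succeeds
found_other_nan = float("nan") in bag    # a different NaN object: `is` and `==` both fail

print(never_equal, found_same_object, found_other_nan)  # False True False
```

So the membership test finds the very same object even when == would say no, but a second, distinct NaN is not found.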

What does "return a in b" mean?

I want to understand how the return statement works. I am familiar with the return statement but not aware of the return in statement. Below is an example of a class method that uses it and I would like to know what it does.
def a(self, argv):
    some = self.fnc("Format Specifier")
    return argv in some
value in values means "True if value is in values otherwise False"
a simple example:
In [1]: "foo" in ("foo", "bar", "baz")
Out[1]: True
In [2]: "foo" in ("bar", "baz")
Out[2]: False
So in your case return argv in some means "return True if argv is in some otherwise return False"
It evaluates to a Boolean that tells whether argv is an element of some. some could be a list, tuple, dict, etc.
It may be clearer if you know what happens in the background. When you use x in y, that is a shortcut for y.__contains__(x) [1]. When you define a class, you can define your own __contains__ method that can actually return anything you want. Usually, it returns either True or False. Therefore, argv in some will be the result of some.__contains__(argv): either True or False. You then return that.
[1] If y does not have a __contains__ method, it is iterated over and each item in it is checked for equality with x.
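One detail worth checking for yourself: whatever a __contains__ method returns, the in operator converts the result to True or False. A minimal sketch, using a made-up class Weird whose __contains__ deliberately returns strings:

```python
class Weird:
    """Hypothetical class whose __contains__ returns non-boolean values."""
    def __contains__(self, item):
        return "yes" if item == 1 else ""


w = Weird()
direct = w.__contains__(1)   # the raw return value of the method
via_in = 1 in w              # truthy "yes" becomes True
via_in_missing = 2 in w      # falsy "" becomes False

print(repr(direct), via_in, via_in_missing)  # 'yes' True False
```

This is one reason return argv in some always hands back a plain bool, even if the container's method is sloppy about its return type.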
The return itself is of no importance here: you can interpret this as:
return (argv in some)
Now the in keyword means:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s. The collection membership test has traditionally been bound to sequences; an object is a member of a collection if the collection is a sequence and contains an element equal to that object. However, it makes sense for many other object types to support membership tests without being a sequence. In particular, dictionaries (for keys) and sets support membership testing.
Python uses a fallback mechanism where it will check whether some (in this case) supports one of the following methods:
First it checks whether some is a list or tuple:
For the list and tuple types, x in y is true if and only if there exists an index i such that either x is y[i] or x == y[i] is true.
Next it checks whether both argv and some are strings:
For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
Now every object that implements a __contains__ method supports such in and not in test as is further described in the documentation:
For user-defined classes which define the __contains__() method, x in y is true if and only if y.__contains__(x) is true.
So besides implemented usages for dictionaries, tuples and lists, you can define your own __contains__ method for arbitrary objects.
Another way to support this functionality is the following:
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
And finally:
Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).

How to define __hash__ when an object can be equal to different kind of objects?

In the Python documentation, we can read about the __hash__ function:
The only required property is that objects which compare equal have
the same hash value.
I have an object which can be equal to other objects of the same type, or to strings:
class MyClass:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        if isinstance(other, str):
            return self.x == other
        if isinstance(other, MyClass):
            return id(self) == id(other)
        return False
Having this __eq__ function, how could I define a valid __hash__ function?
Warning: here, MyClass() objects are mutable, and self.x could change!
You can't define a consistent hash. First, your class does not define __eq__ consistently; it is not guaranteed that if x == y and y == z, then x == z. Second, your object is mutable. In Python, mutable objects are not supposed to be hashable.
If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that a object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Example of broken ==:
x = MyClass('foo')
y = 'foo'
z = MyClass('foo')
x == y # True
y == z # True
x == z # False
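The "wrong hash bucket" problem from the quoted documentation can be reproduced with a small sketch. Point and EqOnly are hypothetical classes, not the asker's MyClass; Point hashes a mutable attribute on purpose, and the last part assumes Python 3.

```python
class Point:
    """Hypothetical class that (unwisely) hashes a mutable attribute."""
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x
    def __hash__(self):
        return hash(self.x)


p = Point(1)
s = {p}
before = p in s                              # True: hash matches the stored one
p.x = 2                                      # hash of p changes while it sits in the set
after = p in s                               # False: lookup uses the new hash
still_there = any(item is p for item in s)   # True: p was never removed


class EqOnly:
    """Defines __eq__ but not __hash__."""
    def __eq__(self, other):
        return isinstance(other, EqOnly)


try:
    hash(EqOnly())
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False   # Python 3 sets __hash__ to None for such classes

print(before, after, still_there, eq_only_hashable)  # True False True False
```

The object is still physically inside the set, yet the set can no longer find it, which is why mutable objects with value-based equality are normally left unhashable.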

Difference between the in keyword and __contains__ in Python

I was wondering if someone could explain the difference between the "in" keyword of Python and the __contains__ method.
I was working with a sample list and found this behavior. When are the two supposed to be used? Is there some efficiency to be gained by using one over the other?
>>> my_list = ["a", "b", "c"]
>>> my_list.__contains__("a")
True
>>> "a" in my_list
True
The __contains__() method of an object is called when you use the in operator.
For lists this is pre-defined, but you can also define your own class, add a __contains__ method and use in on the instances of that class.
You should use in rather than calling __contains__() directly.
From the docs:
For the list and tuple types, x in y is true if and only if there
exists an index i such that x == y[i] is true.
For the Unicode and string types, x in y is true if and only if x is a substring of y. An
equivalent test is y.find(x) != -1.
For user-defined classes which define the __contains__() method, x in
y is true if and only if y.__contains__(x) is true.
For user-defined classes which do not define __contains__() but do
define __iter__(), x in y is true if some value z with x == z is
produced while iterating over y. If an exception is raised during the
iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do
not raise IndexError exception.
Like most magic methods, the __contains__ method is not meant to be called directly. The reason __contains__ exists is precisely so that you can write obj in container instead of having to use method-call syntax. So you should use obj in container.
Doing "a" in my_list actually calls the __contains__ method of my_list if it is defined.
If __contains__ is not defined, iteration via __iter__ is tried first, and then the old sequence protocol via __getitem__.

Does a Python object which doesn't override comparison operators equal itself?

class A(object):
    def __init__(self, value):
        self.value = value
x = A(1)
y = A(2)
q = [x, y]
q.remove(y)
I want to remove from the list a specific object which was added before to it and to which I still have a reference. I do not want an equality test. I want an identity test. This code seems to work in both CPython and IronPython, but does the language guarantee this behavior or is it just a fluke?
The list.remove method documentation is this: same as del s[s.index(x)], which implies that an equality test is performed.
So will an object be equal to itself if you don't override __cmp__, __eq__ or __ne__?
Yes. In your example q.remove(y) would remove the first occurrence of an object which compares equal with y. However, the way the class A is defined, you shouldn't† ever have a variable compare equal with y - with the exception of any other names which are also bound to the same y instance.
The relevant section of the docs is here:
If no __cmp__(), __eq__() or __ne__() operation is defined, class
instances are compared by object identity ("address").
So comparison for A instances is by identity (implemented as memory address in CPython). No other object can have an identity equal to id(y) within y's lifetime, i.e. for as long as you hold a reference to y (which you must, if you're going to remove it from a list!)
† Technically, it is still possible to have objects at other memory locations which are comparing equal - mock.ANY is one such example. But these objects need to override their comparison operators to force the result.
In Python, by default an object is always equal to itself (the only exception I can think of is float("nan")). An object of a user-defined class will not be equal to any other object unless you define a comparison function.
See also http://docs.python.org/reference/expressions.html#notin
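If the class later gains an __eq__ method, list.remove stops being an identity test in practice, because it removes the first element that compares equal. A minimal sketch of removing strictly by identity follows; remove_by_identity is a hypothetical helper, not a built-in list method, and the __eq__ on A is added only for the demonstration.

```python
def remove_by_identity(lst, obj):
    """Remove the first element of lst that *is* obj (not merely equal to it)."""
    for i, item in enumerate(lst):
        if item is obj:
            del lst[i]
            return
    raise ValueError("object not in list")


class A:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):  # equality by value, added for the demonstration
        return isinstance(other, A) and self.value == other.value


x, y, twin = A(1), A(2), A(2)   # twin == y, but twin is not y

q = [x, twin, y]
remove_by_identity(q, y)
kept_twin = q[1] is twin        # True: y itself was removed

q2 = [x, twin, y]
q2.remove(y)                    # equality test: removes twin, the first equal element
kept_y = q2[1] is y             # True: y is still in the list

print(kept_twin, kept_y)        # True True
```

Without a custom __eq__, as in the question, both approaches remove the same element, since default equality is identity.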
The answer is yes and no.
Consider the following example
>>> class A(object):
...     def __init__(self, value):
...         self.value = value
>>> x = A(1)
>>> y = A(2)
>>> z = A(3)
>>> w = A(3)
>>> q = [x, y, z]
>>> id(y)  # y and the second element of the list are the same object
46167248
>>> id(q[1])  # same id as y
46167248
>>> q.remove(y)  # found by identity, so it is removed
>>> q
[<__main__.A object at 0x02C19AB0>, <__main__.A object at 0x02C19B50>]
>>> q.remove(w)  # fails: z and w hold the same value, but they are different objects
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    q.remove(w)
ValueError: list.remove(x): x not in list
It will remove an element from the list iff it is the same object. If it is a different object with the same value, it won't be removed.
