"is" operator not working as intended - python

Just have a look at this code:
import re
ti = "abcd"
tq = "abcdef"
check_abcd = re.compile('^abcd')
print id(check_abcd.search(ti))
print id(check_abcd.search(tq))
print check_abcd.search(ti)
print check_abcd.search(tq)
if check_abcd.search(ti) is check_abcd.search(tq):
    print "Matching"
else:
    print "not matching"
Output:
41696976
41696976
<_sre.SRE_Match object at 0x00000000027C3ED0>
<_sre.SRE_Match object at 0x00000000027C3ED0>
not matching
Definition of is:
`is` is identity testing; == is equality testing.
is returns True if two variables point to the same object.
1) Why does is not return True when both the id() values and the printed object references are the same?
2) When is is replaced by ==, the comparison still returns False. Is that the expected behaviour when comparing objects using ==?

You never assigned the return values, so after printing the id() value of the return value of check_abcd.search() calls, Python discards the return value object as there is nothing referencing it anymore. CPython object lifetimes are directly governed by the number of references to them; as soon as that reference count drops to 0 the object is removed from memory.
Discarded memory locations can be reused, so you are likely to see the same values crop up in id() calls. See the id() documentation:
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
At no point in your code did you actually have one object, you have two separate objects with non-overlapping lifetimes.
Assign return values if you want to make sure id() values are not reused:
>>> import re
>>> ti = "abcd"
>>> tq = "abcdef"
>>> check_abcd = re.compile('^abcd')
>>> ti_search = check_abcd.search(ti)
>>> tq_search = check_abcd.search(tq)
>>> id(ti_search), id(tq_search)
(4378421952, 4378422056)
>>> ti_search, tq_search
(<_sre.SRE_Match object at 0x104f96ac0>, <_sre.SRE_Match object at 0x104f96b28>)
>>> ti_search is tq_search
False
By assigning the return values of check_abcd.search() (the regular expression MatchObjects) an additional reference is created and Python cannot reuse the memory location.
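The second part of the question (why == also returns False) can be sketched as follows. This is a minimal Python 3 example, assuming that match objects do not define their own equality and therefore fall back to identity comparison; to compare what was matched, compare the matched text instead.

```python
import re

check_abcd = re.compile(r"^abcd")

# Keep both match objects alive by binding them to names.
ti_search = check_abcd.search("abcd")
tq_search = check_abcd.search("abcdef")

same_object = ti_search is tq_search                  # identity test
equal_objects = ti_search == tq_search                # falls back to identity here
same_text = ti_search.group() == tq_search.group()    # compares the matched strings

print(same_object, equal_objects, same_text)  # False False True
```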

When you run print id(check_abcd.search(ti)) on one line and don't store the return value of search anywhere, its reference count drops to zero and the object is destroyed. The call on the line below creates another object, which happens to land at the same memory address (which CPython uses as the object ID), but it is not the same object.
When you use the is operator, both objects have to exist at the same time for the comparison to occur, so their addresses will be different.
Just put the results of the calls to check_abcd.search in a variable before printing their ID (and use different variables) and you will be able to see what is actually going on.
Moreover, experimenting along these lines can be instructive if you want to learn how is and object IDs behave. But if you want to compare strings or return values, use the == operator, never is: successive function calls are not guaranteed to return the same object, even when they return equal values. The is comparison is recommended only when comparing with None (which is implemented as a singleton).
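As a small illustration of that advice (a Python 3 sketch; make_pair and find_even are made-up helper names, not part of the question):

```python
def make_pair():
    # Builds a brand-new list on every call.
    return [1, 2]

first = make_pair()
second = make_pair()

print(first == second)   # True: same contents
print(first is second)   # False: two separate list objects


def find_even(numbers):
    # Returns the first even number, or None when there is none.
    for n in numbers:
        if n % 2 == 0:
            return n
    return None

result = find_even([1, 3, 5])
print(result is None)    # True: identity check against the None singleton
```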

Martijn has given you the correct answer, but for further clarification here is an annotated version of what's happening in your code:
# this line returns a value from .search(),
# prints the id, then DISCARDS the value as you have not
# created a reference using a variable.
print id(check_abcd.search(ti))
# this line creates a new, distinct returned value from .search()
# COINCIDENTALLY reusing the memory address and id of the last one.
print id(check_abcd.search(tq))
# Same, a NEW value having (by coincidence) the same id
print check_abcd.search(ti)
# Same again
print check_abcd.search(tq)
# Here, Python is creating two distinct return values.
# The first object cannot be released and the id reused
# because both values must be held until the conditional statement has
# been completely evaluated. Therefore, while the FIRST value will
# probably reuse the same id, the second one will have a different id.
if check_abcd.search(ti) is check_abcd.search(tq):
    print "Matching"
else:
    print "not matching"

Related

Python: what is the difference of adding an object to a set by id() or directly?

Assume I have a custom class CustomObject and I do not define a custom __hash__ or __eq__ function for it. Will there be any difference between the following two operations in terms of outputs in any conditions?
a = CustomObject(1)
b = CustomObject(1)
setA = set()
# option 1
setA.add(a)
print((b in setA))
# option 2
setA.add(id(a))
print((id(b) in setA))
According to What is the default __hash__ in python?, the default __hash__ function is bound to the id of the object, so I assume there is no difference between the above two options?
If I define custom __hash__ functions for CustomObject like in add object into python's set collection and determine by object's attribute, the above two options will be different, right?
Saving the ID can result in a false positive if any of the objects become garbage and the ID is reassigned.
a = CustomObject(1)
setA = set()
setA.add(id(a))
del a
b = CustomObject(1)
print(id(b) in setA)
This would print True if b gets the same ID that a previously had.
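By contrast, adding the object itself keeps it alive for as long as it is in the set, so its ID cannot be handed to another object in the meantime. A minimal sketch, reusing the CustomObject name from the question:

```python
class CustomObject:
    def __init__(self, value):
        self.value = value

a = CustomObject(1)
objects = set()
objects.add(a)   # the set now holds its own reference to the object
del a            # only the name goes away; the object stays alive in the set

b = CustomObject(1)
in_set = b in objects                               # default __eq__ is identity
id_clash = any(id(b) == id(o) for o in objects)     # both objects are alive

print(in_set, id_clash)  # False False
```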
For the same reason that @Barmar mentioned, here is a phenomenon that is easier to reproduce: creating many temporary CustomObject instances and collecting their IDs can yield only a single address:
>>> class CustomObject:
...     def __init__(self, value):
...         self.value = value
...
>>> {id(CustomObject(1)) for _ in range(10)}
{1799037490496}
>>> {id(CustomObject(i)) for i in range(10)}
{1799034371856}
In addition, you can only get the address instead of the object you added when iterating over the set. There are methods in the ctypes library that can get the object through the address, but when the object is destroyed, it is not safe to get it through the address.

Python 3 REGEX .finditer: Usefulness of hex address in "callable_iterator object at 0x..."?

The following Python 3.5 prints add_date_iter == <callable_iterator object at 0xb78e218c>.
import re
date_added_attrs = re.compile(r'( +ADD_DATE=)("(\d+)")')
add_date_iter = date_added_attrs.finditer(test_string)
print("add_date_iter ==", add_date_iter)
So far, so good. BUT of what use is 0xb78e218c? It appears to be a hexadecimal memory or object address. Whatever it is, why / how if at all might a Python 3 program make use of it?
EDIT: My question is NOT about REGEX. The REGEX works fine. My question is, what's the purpose / benefit of the hexadecimal value returned by the .finditer operation?
what's the purpose / benefit of the hexadecimal value returned by the .finditer operation?
When you write print("add_date_iter ==", add_date_iter), you are simply converting the iterator object to a string and printing it. That is, <callable_iterator object at 0xb78e218c> is the return value of the object's internal __str__ method.*
The hexadecimal address tells you where in memory the iterator is stored. In CPython it is the same value you would get from hex(id(add_date_iter)). It is generally only useful when you're trying to figure out the internals of how Python manages memory during a certain process, or if you want to check whether two variables hold a reference to the same object. When comparing two objects that are both still alive, you can think of id(a) == id(b) as a long way of writing a is b.
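The link between the printed address and id() can be checked directly. This is a sketch that relies on CPython behaviour (other implementations may print something else); Widget is a made-up class used only for illustration.

```python
import re

class Widget:
    pass

w = Widget()
text = repr(w)   # e.g. "<__main__.Widget object at 0x...>"

# Pull the hexadecimal address out of the default repr.
address = re.search(r"at 0x([0-9A-Fa-f]+)>$", text).group(1)

# In CPython, the address in the default repr is the object's id().
matches_id = int(address, 16) == id(w)
print(matches_id)
```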
More detail...
For instance, if you had this code:
class A:
    def __init__(self):
        self.val = 0
a = A()
b = A()
print(id(a), id(b))
print(a is b)
b = a
print(id(a), id(b))
print(a is b)
You would get an output like this:
140665126149392 140665230088528
False
140665126149392 140665126149392
True
In the first case, even though the instance variables val have the same value, the objects themselves are different. After writing b = a, though, both variables now refer to the same object.
One place you can get tripped up with this is with integers: CPython caches small integers, so is may happen to return True for small values and False for larger ones. This is why you should always use == instead of is unless you really know what you're doing (or you're checking is None, since None is actually a singleton object):
a = 5
b = 5
a is b # True
a = 300
b = 300
a is b # False
One final point: since Python has first class functions (i.e. functions are treated as objects), everything discussed above works with functions, too. Just reference the function without parentheses, like id(print).
* Note: If you write add_date_iter without print in the interactive shell, it will call the __repr__ method instead
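The difference between the two methods can be seen with a small made-up class. A sketch: Tag and OnlyRepr are hypothetical names. Note that when a class defines only __repr__, str() falls back to it, which is why print and the interactive shell can show the same text for some objects.

```python
class Tag:
    def __repr__(self):
        return "Tag.__repr__"

    def __str__(self):
        return "Tag.__str__"


class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr.__repr__"


t = Tag()
shown_by_print = str(t)    # what print(t) writes
shown_by_shell = repr(t)   # what the interactive shell echoes for a bare `t`
fallback = str(OnlyRepr()) # no __str__ defined, so __repr__ is used

print(shown_by_print, shown_by_shell, fallback)
```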

Why isn't "is not" working with ConfigParser.sections() in a list comprehension? [duplicate]

In a comment on this question, I saw a statement that recommended using
result is not None
vs
result != None
What is the difference? And why might one be recommended over the other?
== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)
is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No method calls are made, so objects can't influence the is operation.
You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
First, let me go over a few terms. If you just want your question answered, scroll down to "Answering your question".
Definitions
Object identity: When you create an object, you can assign it to a variable. You can then also assign it to another variable. And another.
>>> button = Button()
>>> cancel = button
>>> close = button
>>> dismiss = button
>>> print(cancel is close)
True
In this case, cancel, close, and dismiss all refer to the same object in memory. You only created one Button object, and all three variables refer to this one object. We say that cancel, close, and dismiss all refer to identical objects; that is, they refer to one single object.
Object equality: When you compare two objects, you usually don't care whether they refer to the exact same object in memory. With object equality, you can define your own rules for how two objects compare. When you write if a == b:, you are essentially saying if a.__eq__(b):. This lets you define an __eq__ method on a so that you can use your own comparison logic.
Rationale for equality comparisons
Rationale: Two objects have the exact same data, but are not identical. (They are not the same object in memory.)
Example: Strings
>>> greeting = "It's a beautiful day in the neighbourhood."
>>> a = unicode(greeting)
>>> b = unicode(greeting)
>>> a is b
False
>>> a == b
True
Note: I use unicode strings here because Python is smart enough to reuse regular strings without creating new ones in memory.
Here, I have two unicode strings, a and b. They have the exact same content, but they are not the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the unicode object has implemented the __eq__ method.
class unicode(object):
    # ...
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for i, j in zip(self, other):
            if i != j:
                return False
        return True
Note: __eq__ on unicode is definitely implemented more efficiently than this.
Rationale: Two objects have different data, but are considered the same object if some key data is the same.
Example: Most types of model data
>>> import datetime
>>> a = Monitor()
>>> a.make = "Dell"
>>> a.model = "E770s"
>>> a.owner = "Bob Jones"
>>> a.warranty_expiration = datetime.date(2030, 12, 31)
>>> b = Monitor()
>>> b.make = "Dell"
>>> b.model = "E770s"
>>> b.owner = "Sam Johnson"
>>> b.warranty_expiration = datetime.date(2005, 8, 22)
>>> a is b
False
>>> a == b
True
Here, I have two Dell monitors, a and b. They have the same make and model, but they neither hold the same data nor are the same object in memory. Even so, when we compare them, we want them to compare equal. What's happening here is that the Monitor object implemented the __eq__ method.
class Monitor(object):
    # ...
    def __eq__(self, other):
        return self.make == other.make and self.model == other.model
Answering your question
When comparing to None, always use is not. None is a singleton in Python - there is only ever one instance of it in memory.
By comparing identity, this can be performed very quickly. Python checks whether the object you're referring to has the same memory address as the global None object - a very, very fast comparison of two numbers.
By comparing equality, Python has to look up whether your object has an __eq__ method. If it does not, it examines each superclass looking for an __eq__ method. If it finds one, Python calls it. This is especially bad if the __eq__ method is slow and doesn't immediately return when it notices that the other object is None.
Did you not implement __eq__? Then Python will probably find the __eq__ method on object and use that instead - which just checks for object identity anyway.
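That fallback to identity is easy to observe. A minimal Python 3 sketch with a made-up class that defines no __eq__ of its own:

```python
class Plain:
    pass

x = Plain()
y = Plain()

same = x == x        # True: compared with itself
different = x == y   # False: default equality is identity
unequal = x != y     # True

print(same, different, unequal)  # True False True
```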
When comparing most other things in Python, you will be using !=.
Consider the following:
class Bad(object):
    def __eq__(self, other):
        return True
c = Bad()
c is None # False, equivalent to id(c) == id(None)
c == None # True, equivalent to c.__eq__(None)
None is a singleton, and therefore identity comparison will always work, whereas an object can fake the equality comparison via .__eq__().
>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True
Some objects are singletons, and thus is with them is equivalent to ==. Most are not.

Unexpected Python behavior with dictionary and class

class test:
    def __init__(self):
        self.see=0
        self.dic={"1":self.see}
examine=test()
examine.see+=1
print examine.dic["1"]
print examine.see
This prints 0 and 1, and I don't understand why.
print id(examine.dic["1"])
print id(examine.see)
They also have different memory addresses.
However, if you use the same example but store a list instead of an integer in see, you get the expected output.
Any explanations?
This gives the expected output:
class test:
    def __init__(self):
        self.see=[0]
        self.dic={"1":self.see}
examine=test()
examine.see[0]+=1
print examine.dic["1"][0]
print examine.see[0]
Short answer:
Arrays/lists are mutable whereas integers/ints are not.
Lists are mutable (they can be changed in place): when you change a list, the same object gets updated (the id doesn't change, because a new object is not needed).
Integers are immutable. This means that to change the value of something, you have to create a new object, which will have a different id. Strings work the same way, and you would have had the same "problem" if you set self.see = 'a' and then did examine.see += ' b'
>>> a = 'a'
>>> id(a)
3075861968L
>>> z = a
>>> id(z)
3075861968L
>>> a += ' b'
>>> id(a)
3075385776L
>>> id(z)
3075861968L
>>> z
'a'
>>> a
'a b'
In Python, names point to values, and values are managed by Python. The id() function returns an identifier of the value, not of the name.
Any number of names can point to the same value. This means, you can have multiple names that are all linked to the same id.
When you first create your class object, the name see is pointing to an integer object, and that object's value is 0. Then, when you create the dictionary dic, the "1" key points to the same object that see was pointing to, which is 0.
Since 0 (an object of type integer) is immutable, it cannot be updated in place: whenever you "update" it, a new object is created and the name is rebound to it. This is why the return value of id() changes.
Python is smart enough to know that there are some other names pointing to the "old" value, and so it keeps that around in memory.
However, now you have two objects; and the dictionary is still pointing to the "old" one, and see is now pointing to the new one.
When you use a list, Python doesn't need to create a new object because it can modify a list without destroying it; because lists are mutable. Now when you create a list and point two names to it, both the names are pointing to the same object. When you update this object (by adding a value, or deleting a value or changing its value) the same object is updated - and so everything pointing to it will get the "updated" value.
examine.dic["1"] and examine.see do indeed have different locations, even if the former's initial value is copied from the latter.
With your case of using an array, you're not changing the value of examine.see: you're instead changing examine.see[0], which is changing the content of the array it points to (which is aliased to examine.dic["1"]).
When you do self.dic={"1":self.see}, the dict value is set to the value of self.see at that moment. When you later do examine.see += 1, you set examine.see to a new value. This has no effect on the dict because the dict was set to the value of self.see; it does not know to "keep watching" the name self.see to see if it is pointing to a different value.
If you set self.see to a list, and then do examine.see += [1], you are not setting examine.see to a new value, but are changing the existing value. This will be visible in the dict, because, again, the dict is set to the value, and that value can change.
The thing is that sometimes a += b sets a to a new value, and sometimes it changes the existing value. Which one happens depends on the type of a; you need to know what examine.see is to know what examine.see += something does.
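The two behaviours of += can be put side by side. A short Python 3 sketch, with alias names that are purely illustrative:

```python
number = 0
number_alias = number
number += 1          # int is immutable: `number` is rebound to a new object
print(number, number_alias)     # 1 0

items = [0]
items_alias = items
items += [1]         # list is mutable: the existing list is extended in place
print(items, items_alias)       # [0, 1] [0, 1]
print(items is items_alias)     # True
```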
Others have addressed the mutability/boxing question. What you seem to be asking for is late binding. This is possible, but a little counterintuitive and there's probably a better solution to your underlying problem… if we knew what it was.
class test:
    @property
    def dic(self):
        self._dic.update({'1': self.see})
        return self._dic
    def __init__(self):
        self.see = 0
        self._dic = {}
>>> ex=test()
>>> ex.see
0
>>> ex.see+=1
>>> ex.see
1
>>> ex.dic
{'1': 1}
>>> ex.see+=1
>>> ex.dic
{'1': 2}
In fact, in this contrived example it's even a little dangerous: because self._dic is returned directly, the consumer could modify the dict. But that's OK, because you don't need to do this in real life. If you want the value of self.see, just get the value of self.see.
In fact, it looks like this is what you want:
class test:
    _see = 0
    @property
    def see(self):
        self._see+=1
        return self._see
or, you know, just itertools.count() :P
This solution worked for me. Feel free to use it.
class integer:
    def __init__(self, integer):
        self.value=integer
    def plus(self):
        self.value=self.value+1
    def output(self):
        return self.value
The solution replaces the immutable type int with a mutable class instance, and the dictionary holds a reference to that instance.
Because of this, changes you make to the instance are visible through the dictionary as well. It acts somewhat like a pointer to a data structure.
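A possible way to use such a wrapper with the class from the question, written as a Python 3 sketch. MutableInt and Test are hypothetical names chosen for this example.

```python
class MutableInt:
    """Hypothetical mutable wrapper around an int."""

    def __init__(self, value):
        self.value = value

    def plus(self):
        self.value += 1


class Test:
    def __init__(self):
        self.see = MutableInt(0)
        self.dic = {"1": self.see}   # the dict stores a reference to the same wrapper


examine = Test()
examine.see.plus()   # mutates the wrapper in place; no name is rebound
print(examine.dic["1"].value, examine.see.value)  # 1 1
```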

Strange behaviour related to apparent caching of "{}"

Today I learned that Python caches the expression {}, and replaces it with a new empty dict when it's assigned to a variable:
print id({})
# 40357936
print id({})
# 40357936
x = {}
print id(x)
# 40357936
print id({})
# 40356432
I haven't looked at the source code, but I have an idea as to how this might be implemented. (Maybe when the reference count to the global {} is incremented, the global {} gets replaced.)
But consider this bit:
def f(x):
    x['a'] = 1
    print(id(x), x)
print(id(x))
# 34076544
f({})
# (34076544, {'a': 1})
print(id({}), {})
# (34076544, {})
print(id({}))
# 34076544
f modifies the global dict without causing it to be replaced, and it prints out the modified dict. But outside of f, despite the id being the same, the global dict is now empty!
What is happening??
It's not being cached -- if you don't assign the result of {} anywhere, its reference count is 0 and it's cleaned up right away. It just happened that the next one you allocated reused the memory from the old one. When you assign it to x you keep it alive, and then the next one has a different address.
In your function example, once f returns there are no remaining references to your dict, so it gets cleaned up too, and the same thing applies.
Python isn't doing any caching here. There are two possibilities when id() gives the same return value at different points in a program:
1) id() was called on the same object twice
2) The first object that id() was called on was garbage collected before the second object was created, and the second object was created in the same memory location as the original
In this case, it was the second one. This means that even though print id({}); print id({}) may print the same value twice, each call is on a distinct object.
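The object lifetimes involved can be observed with a weak reference. This is a sketch that relies on CPython's reference counting (other implementations may free objects later); Temp is a made-up class, used because plain dicts cannot be weakly referenced.

```python
import weakref

class Temp:
    pass

obj = Temp()
watcher = weakref.ref(obj)        # does not keep obj alive
alive_before = watcher() is obj   # True while a strong reference exists

del obj                           # last strong reference is gone
alive_after = watcher() is not None   # False in CPython: freed immediately

# Two objects that are alive at the same time never share an id().
kept_1, kept_2 = Temp(), Temp()
distinct_ids = id(kept_1) != id(kept_2)

print(alive_before, alive_after, distinct_ids)
```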
