Strange behaviour related to apparent caching of "{}" - python

Today I learned that Python caches the expression {}, and replaces it with a new empty dict when it's assigned to a variable:
print id({})
# 40357936
print id({})
# 40357936
x = {}
print id(x)
# 40357936
print id({})
# 40356432
I haven't looked at the source code, but I have an idea as to how this might be implemented. (Maybe when the reference count to the global {} is incremented, the global {} gets replaced.)
But consider this bit:
def f(x):
x['a'] = 1
print(id(x), x)
print(id(x))
# 34076544
f({})
# (34076544, {'a': 1})
print(id({}), {})
# (34076544, {})
print(id({}))
# 34076544
f modifies the global dict without causing it to be replaced, and it prints out the modified dict. But outside of f, despite the id being the same, the global dict is now empty!
What is happening??

It's not being cached -- if you don't assign the result of {} anywhere, its reference count is 0 and it's cleaned up right away. It just happened that the next one you allocated reused the memory from the old one. When you assign it to x you keep it alive, and then the next one has a different address.
In your function example, once f returns there are no remaining references to your dict, so it gets cleaned up too, and the same thing applies.

Python isn't doing any caching here. There are two possibilities when id() gives the same return value at different points in a program:
id() was called on the same object twice
The first object that id() was called on was garbage collected before the second object was created, and the second object was created in the same memory location as the original
In this case, it was the second one. This means that even though print id({}); print id({}) may print the same value twice, each call is on a distinct object.

Related

Why do these print()s print the same id? [duplicate]

Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))

Object id of object returned by python function [duplicate]

Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))

Python 3 set equality [duplicate]

Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))

"is" operator not working as intended

Just have a look at this code:
import re
ti = "abcd"
tq = "abcdef"
check_abcd = re.compile('^abcd')
print id(check_abcd.search(ti))
print id(check_abcd.search(tq))
print check_abcd.search(ti)
print check_abcd.search(tq)
if check_abcd.search(ti) is check_abcd.search(tq):
print "Matching"
else:
print "not matching"
Output:
41696976
41696976
<_sre.SRE_Match object at 0x00000000027C3ED0>
<_sre.SRE_Match object at 0x00000000027C3ED0>
not matching
Definition of is:
`is` is identity testing, == is equality testing.
is will return True if two variables point to the same object
1)Now why is the is not returning True when id as well as object reference is same.
2)When is is replaced by == it is still returning false.Is that the expected behaviour when comparing objects using ==.
You never assigned the return values, so after printing the id() value of the return value of check_abcd.search() calls, Python discards the return value object as there is nothing referencing it anymore. CPython object lifetimes are directly governed by the number of references to them; as soon as that reference count drops to 0 the object is removed from memory.
Discarded memory locations can be re-used, so you'll like to see the same values crop up in id() calls. See the id() documentation:
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
At no point in your code did you actually have one object, you have two separate objects with non-overlapping lifetimes.
Assign return values if you want to make sure id() values are not reused:
>>> import re
>>> ti = "abcd"
>>> tq = "abcdef"
>>> check_abcd = re.compile('^abcd')
>>> ti_search = check_abcd.search(ti)
>>> tq_search = check_abcd.search(tq)
>>> id(ti_search), id(tq_search)
(4378421952, 4378422056)
>>> ti_search, tq_search
(<_sre.SRE_Match object at 0x104f96ac0>, <_sre.SRE_Match object at 0x104f96b28>)
>>> ti_search is tq_search
False
By assigning the return values of check_abcd.search() (the regular expression MatchObjects) an additional reference is created and Python cannot reuse the memory location.
WHen you do:
print id(check_abcd.search(ti)) in a line and don't store the return value of search anywhere, its reference count goes to zero and it is destroyed. The call in the line bellow that creates another object, which happens to be in the same memory address (which is used by CPython as an object ID) - but it is not the same object.
When you use the is operator, the previous object still has to exist in order for the comparison to occur, and its address will be different.
Just put the results of the calls to check_abcd.search in a variable before printing their ID (and use different variables) and you will be able to see what is actually going on.
Moreover: continuing in these lines can be instructive if you want to learn about the behavior of "is" and object IDs - but if you want to compare strings, and return values, just use the == operator, never is: subsequent function calls, even if returning the same value are not supposed to return the same object - the is comparison is only recommended when comparing with "None" (which is implemented as a singleton)
Martjin has given you the correct answer, but for further clarification here is an annotated version of what's happening in your code:
# this line creates returns a value from .search(),
# prints the id, then DISCARDS the value as you have not
# created a reference using a variable.
print id(check_abcd.search(ti))
# this line creates a new, distinct returned value from .search()
# COINCIDENTALLY reusing the memory address and id of the last one.
print id(check_abcd.search(tq))
# Same, a NEW value having (by coincidence) the same id
print check_abcd.search(ti)
# Same again
print check_abcd.search(tq)
# Here, Python is creating two distinct return values.
# The first object cannot be released and the id reused
# because both values must be held until the conditional statement has
# been completely evaluated. Therefore, while the FIRST value will
# probably reuse the same id, the second one will have a different id.
if check_abcd.search(ti) is check_abcd.search(tq):
print "Matching"
else:
print "not matching"

Unexpected Python behavior with dictionary and class

class test:
def __init__(self):
self.see=0
self.dic={"1":self.see}
examine=test()
examine.see+=1
print examine.dic["1"]
print examine.see
this has as a result 0 and 1 and it makes no sense why.
print id(examine.dic["1"])
print id(examine.see)
they also have different memory addresses
However, if you use the same example but you have an array instead of variable in see. You get the expected output.
Any explanations?
This gives the expected output:
class test:
def __init__(self):
self.see=[0]
self.dic={"1":self.see}
examine=test()
examine.see[0]+=1
print examine.dic["1"][0]
print examine.see[0]
Short answer:
Arrays/lists are mutable whereas integers/ints are not.
lists are mutable (they can be changed in place), when you change a list the same object gets updated (the id doesn't change, because a new object is not needed).
Integers are immuable - this means to change the value of something, you have to create a new object, which will have a different id. Strings work the same way and you would have had the same "problem" if you set self.see = 'a', and then did examine.see += ' b'
>>> a = 'a'
>>> id(a)
3075861968L
>>> z = a
>>> id(z)
3075861968L
>>> a += ' b'
>>> id(a)
3075385776L
>>> id(z)
3075861968L
>>> z
'a'
>>> a
'a b'
In Python, names point to values; and values are managed by Python. The id() method returns a unique identifier of the value and not the name.
Any number of names can point to the same value. This means, you can have multiple names that are all linked to the same id.
When you first create your class object, the name see is pointing to the value of an integer object, and that object's value is 1. Then, when you create your class dic, the "1" key is now pointing to the same object that see was pointing to; which is 1.
Since 1 (an object of type integer) is immutable - whenever you update it, the original object is replaced and a new object is created - this is why the return value of id() changes.
Python is smart enough to know that there are some other names pointing to the "old" value, and so it keeps that around in memory.
However, now you have two objects; and the dictionary is still pointing to the "old" one, and see is now pointing to the new one.
When you use a list, Python doesn't need to create a new object because it can modify a list without destroying it; because lists are mutable. Now when you create a list and point two names to it, both the names are pointing to the same object. When you update this object (by adding a value, or deleting a value or changing its value) the same object is updated - and so everything pointing to it will get the "updated" value.
examine.dic["1"] and examine.see do indeed have different locations, even if the former's initial value is copied from the latter.
With your case of using an array, you're not changing the value of examine.see: you're instead changing examine.see[0], which is changing the content of the array it points to (which is aliased to examine.dic["1"]).
When you do self.dic={"1":self.see}, the dict value is set to the value of self.see at that moment. When you later do examine.see += 1, you set examine.see to a new value. This has no effect on the dict because the dict was set to the value of self.see; it does not know to "keep watching" the name self.see to see if is pointing to a different value.
If you set self.see to a list, and then do examine.see += [1], you are not setting examine.see to a new value, but are changing the existing value. This will be visible in the dict, because, again, the dict is set to the value, and that value can change.
The thing is that sometimes a += b sets a to a new value, and sometimes it changes the existing value. Which one happens depends on the type of a; you need to know what examine.see is to know what examine.see += something does.
Others have addressed the mutability/boxing question. What you seem to be asking for is late binding. This is possible, but a little counterintuitive and there's probably a better solution to your underlying problem… if we knew what it was.
class test:
#property
def dic(self):
self._dic.update({'1': self.see})
return self._dic
def __init__(self):
self.see = 0
self._dic = {}
>>> ex=test()
>>> ex.see
0
>>> ex.see+=1
>>> ex.see
1
>>> ex.dic
{'1': 1}
>>> ex.see+=1
>>> ex.dic
{'1': 2}
In fact, in this contrived example it's even a little dangerous because returning self._dic the consumer could modify the dict directly. But that's OK, because you don't need to do this in real life. If you want the value of self.see, just get the value of self.see.
In fact, it looks like this is what you want:
class test:
_see = 0
#property
def see(self):
self._see+=1
return self._see
or, you know, just itertools.count() :P
This solution worked for me. Feel free to use it.
class integer:
def __init__(self, integer):
self.value=integer
def plus(self):
self.value=self.value+1
def output(self):
return self.value
The solution replaces the mutable type int with a class whose address is used as reference.
Furthermore you can make changes to the class object and the changes apply to what the dictionary points. It is somewhat a pointer/datastructure.

Categories