This question already has answers here:
Why does id({}) == id({}) and id([]) == id([]) in CPython?
(2 answers)
Closed 7 years ago.
Here is some simple Python code. What is the difference between Case 1 and Case 2: why do I get False in the first case and True in the second? Why are the ids equal in Case 2? Also, does dir(object) call object.__dir__() internally? If so, shouldn't the objects returned by the two calls be the same?
class Hello:
    def __init__(self):
        self.a1 = "a1"
hello = Hello()
print(hello)
# Case 1
var1 = dir(hello)
var2 = hello.__dir__()
print(id(var1), id(var2), id(var1) == id(var2))
# Case 2
print(id(dir(hello)), id(hello.__dir__()), id(dir(hello)) == id(hello.__dir__()))
print(dir(hello) == hello.__dir__())
Output
<__main__.Hello object at 0x7f320828c320>
139852862206472 139852862013960 False
139852862014024 139852862014024 True
False
It's just a coincidence that you're ever getting True. (Well, not a coincidence, since the implementation of CPython makes it very likely… but it's not something the language requires.)
In case 1, you have two different lists in var1 and var2. They're both alive at the same time, so they can't have the same id.
In case 2, you again have two different lists, but this time you aren't storing them anywhere; as soon as you call id on one, you release it, which means it can get garbage collected* before you get the other one,** which means the second one can end up reusing the same id.***
Notice that the docs for id say:
This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
If you actually want to test whether two expressions refer to the same object, use is, don't compare their ids.
Your edited question also asks:
Also does dir(object) call object.__dir__() internally?
According to dir:
If the object has a method named __dir__(), this method will be called and must return the list of attributes.
And the data model section on __dir__ says:
Called when dir() is called on the object. A sequence must be returned. dir() converts the returned sequence to a list and sorts it.
Then you say:
If so the return object of two calls should be the same.
Well, it depends on what you mean by "the same". It should return an equal value (since nothing has changed), but it's not going to be the identical value, which is what you're trying to test for. (If it isn't obvious why dir gives you a new list each time, it should still be clear that it must do so from the fact that "dir() converts the returned sequence to a list and sorts it"…)
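To make the "equal but not identical" point concrete, here is a small sketch in Python 3 that reuses the Hello class from the question. The names first, second and matches_sorted are only illustrative. Both results are stored in variables so that their lifetimes overlap, which is what makes the identity check meaningful:

```python
class Hello:
    def __init__(self):
        self.a1 = "a1"

hello = Hello()

# Keep both results alive so their lifetimes overlap.
first = dir(hello)
second = dir(hello)

same_value = first == second    # equal contents
same_object = first is second   # but two separate list objects

# dir() sorts the names that __dir__() returns, so sorting the raw
# result by hand should give a list equal to dir()'s result.
matches_sorted = sorted(hello.__dir__()) == dir(hello)

print(same_value, same_object, matches_sorted)  # True False True
```

This also suggests why dir(hello) == hello.__dir__() printed False in the question: the raw __dir__() result is not sorted.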
* Because CPython uses reference counting as its primary garbage collection mechanism, "can be collected" generally means "will be collected immediately". This isn't true for most other Python implementations.
** If the order in which parts of your expression get evaluated isn't clear to you from reading the docs, you can try dis.dis('id(dir(hello)) == id(hello.__dir__())') to see the actual bytecodes in order.
*** In CPython, the id is just the address of the PyObject struct that represents the object; if one PyObject gets freed and another one of the same type gets allocated immediately after, it will usually get the same address.
Related
How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same Python interpreter instance, but you need to really try to make that become false).
That is exactly what the is operator does, which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
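As a minimal sketch of the function form of the identity test (the variable names are made up for illustration), operator.is_ and operator.is_not behave like the is and is not operators:

```python
import operator

a = [1, 2, 3]
alias = a          # a second name for the same list
clone = list(a)    # equal contents, different object

print(operator.is_(a, alias))     # True
print(operator.is_(a, clone))     # False
print(operator.is_not(a, clone))  # True
print(a == clone)                 # True: equal values all the same
```

Because both operands are held by names while the check runs, the lifetime problem described above cannot arise here.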
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
@property
def page(self):
    if check_for_new_version():
        self._page = get_new_version()
    return self._page
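For bound methods in particular, a short Python 3 sketch (the class C and its method are hypothetical) shows that identity and equality give different answers. The identity result reflects how CPython behaves today and is not something the language promises:

```python
class C:
    def method(self):
        return 42

c = C()

# Each attribute access builds a fresh bound-method object.
first = c.method
second = c.method
print(first is second)   # False on CPython: two distinct objects, both alive

# Bound methods compare equal when they wrap the same function
# and the same instance.
print(first == second)   # True

# The underlying function and instance are shared.
print(first.__func__ is second.__func__)  # True
print(first.__self__ is second.__self__)  # True
```

So if the goal is to check "is this the same method of the same object", == is the safer comparison than is or id().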
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,[1] and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful because you have to save a reference to the object itself anyway, to ensure that it stays alive. Neither is there any performance gain: the implementation of is is as simple as comparing pointers.
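If you do want a registry keyed by identity, one possible sketch is below. IdentityRegistry and its methods are hypothetical names, not an existing API. The point is only that the entry stores the object next to its payload, so the id cannot be recycled while the entry exists:

```python
class IdentityRegistry:
    """Maps objects to payloads by identity rather than by value."""

    def __init__(self):
        # id -> (object, payload); storing the object keeps it alive,
        # so its id cannot be reused while the entry exists.
        self._entries = {}

    def add(self, obj, payload):
        self._entries[id(obj)] = (obj, payload)

    def get(self, obj, default=None):
        entry = self._entries.get(id(obj))
        return default if entry is None else entry[1]


registry = IdentityRegistry()
x = [1, 2]
y = [1, 2]          # equal to x, but a different object
registry.add(x, "payload for x")

print(registry.get(x))   # payload for x
print(registry.get(y))   # None
```

Note that this works even for unhashable objects such as lists, at the cost of keeping every registered object alive until its entry is removed.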
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal because whether two variables point to the same object or not is only practical to know if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them, it doesn't matter if two variables point to two identical objects or to the same one.
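A quick way to see this caching, as a sketch only: the identity results in the comments are what current CPython does and may differ on other implementations or releases. The values are built at run time so that the compiler cannot merge equal constants:

```python
# Build the values at run time so the compiler cannot merge the constants.
small_a = int("7")
small_b = int("7")
big_a = int("10000000000")
big_b = int("10000000000")

print(small_a == small_b, big_a == big_b)  # True True: values always match
print(small_a is small_b)  # True on current CPython (cached small int)
print(big_a is big_b)      # False on current CPython (two separate objects)
```

Only the == results are something you can rely on.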
[1] Such an object is sometimes called an "unnamed expression".
This question already has answers here:
How can two Python objects have same id but 'is' operator returns False?
(2 answers)
Why is the id of a Python class not unique when called quickly?
(6 answers)
Unnamed Python objects have the same id
(2 answers)
Closed 4 years ago.
Note that this question might be (is?) specific to CPython.
Say you have some list, and check copies of the list for identity against each other:
>>> a=list(range(10))
>>> b,c=a[:],a[:]
>>> b is c
False
>>> id(b), id(c)
(3157888272304, 3157888272256)
No great shakes there. But if we do this in a more ephemeral way, things might seem a bit weird at first:
>>> a[:] is a[:]
False # <- two ephemeral copies not the same object (duh)
>>> id(a[:]),id(a[:])
(3157888272544, 3157888272544) # <- but two other ephemerals share the same id..? hmm....
...until we recognize what is probably going on here. I have not confirmed it by looking at the CPython implementation (I can barely read C, so it would be a waste of time, to be honest), but it at least seems obvious that even though two objects have the same id, CPython is smart enough to know that they aren't the same object.
Assuming this is correct, my question is: what criteria is CPython using to determine that the two ephemeral objects are not the same object, given that they have the same id (presumably for efficiency reasons, see below)? Is it perhaps looking at the time it was marked to be garbage collected? The time it was created? Or something else...?
My theory on why they have the same id is that, likely, CPython knows an ephemeral copy of the list was already made and is waiting to be garbage collected, and it just efficiently re-uses the same memory location. It would be great if an answer could clarify/confirm this as well.
Two immutable objects sharing the same address would, as far as you are concerned, be indistinguishable from each other.
The thing is that when you do a[:] is a[:], the two objects are not at the same address: for the identity operator is to compare them, both operands have to exist, so there is still a reference to the object on the left-hand side when the native code for is is actually run.
On the other hand, when you do id(a[:]),id(a[:]) the object inside the parentheses on the first call is left without any references as soon as the id function call is done, and is destroyed, freeing the memory block to be used by the second a[:].
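The destruction order can be observed directly. The following Python 3 sketch uses a hypothetical Tracked class whose __del__ method records when each object goes away. The order shown in the comment is what CPython's reference counting produces. Other implementations may destroy the temporaries later:

```python
class Tracked:
    def __init__(self, label, log):
        self.label = label
        self.log = log

    def __del__(self):
        # Record the moment the object is destroyed.
        self.log.append("destroyed " + self.label)


events = []

# Temporaries: nothing holds a reference once id() returns.
id(Tracked("first", events))
events.append("between the two id() calls")
id(Tracked("second", events))

# On CPython this typically shows "destroyed first" before the
# "between" marker, so the second object is free to reuse the address.
print(events)

# Named objects: both stay alive, so their ids must differ.
kept_a = Tracked("kept_a", events)
kept_b = Tracked("kept_b", events)
print(id(kept_a) != id(kept_b))  # True
```

There is no extra bookkeeping that lets CPython tell two objects with the same id apart. The two objects simply never exist at the same time.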
Normally if I assign a variable some value, and then check their ids, I expect them to be the same, because python is essentially just giving my object a "name". This can be seen in the below code:
>>> a = 3
>>> id(a)
19845928
>>> id(3)
19845928
The problem is when I perform the same with "name"
>>> __name__
'__main__'
>>> id(__name__)
19652416
>>> id('__main__')
19652448
How can their ids be different? Shouldn't they be the same, since __name__ should also be just a reference?
id() gives essentially the memory pointer to the data. Although strings are immutable, they are not guaranteed to be interned. This means that some strings with equal values have different pointers.
For integers (especially small ones), the pointers will be the same, so your 3 example works fine.
@KartikAnand: The way you're checking for 'same object' is valid, although the usual way is to use x is y. The problem is that they are not the same object, and not guaranteed to be. They simply have the same value. Note that when you write "__main__" you're creating a new object. Sometimes Python does a nice optimization and re-uses a previously-created string of the same value, but it doesn't have to.
Kartik's goal is to "verify that assignment is in a way reference and objects are not created on the fly". To do this, avoid creating new objects (no string literals).
>>> __name__
'__main__'
>>> x = __name__
>>> id(__name__)
3078339808L
>>> id(x)
3078339808L
>>> __name__ is x
True
Just because two strings have the same value, it does not follow that they are the same object. This is completely expected behaviour.
In Python, small integers are "pooled", so that all small integer values point to the same object. This is not necessarily true for strings.
At any rate, this is an implementation detail that should not be relied upon.
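A small sketch of the difference between equal strings and interned strings, in Python 3. The variable names are illustrative, and the result of the plain is check is a CPython observation rather than a guarantee. sys.intern() is the documented way to ask for the single canonical copy of a string value:

```python
import sys

literal = "__main__"
# Built at run time, so it is a separate string object with the same value.
built = "".join(["__ma", "in__"])

print(literal == built)   # True: same value
print(literal is built)   # usually False on CPython: different objects

# sys.intern() returns the one canonical copy for a given value.
print(sys.intern(literal) is sys.intern(built))  # True
```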
What you are running in to here is the fact that primitives are pseudo (or real) singletons in Python. Further, looking at strings clouds the issue because when strings are interned, value and id become synonymous as a side effect, so some strings with value matches will have id matches and others won't. Try looking at hand-built objects instead, as then you control when a new instance is created, and id vs value becomes more explicitly clear.
This question already has answers here:
Why variable = object doesn't work like variable = number
(10 answers)
Closed 4 years ago.
There is this code:
# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!
# assignment behaviour for class object
class Klasa:
    def __init__(self, num):
        self.num = num
a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!
Questions:
Why does the assignment operator work differently for fundamental types and class objects (for fundamental types it copies by value, for class objects it copies by reference)?
How to copy class objects only by value?
How to make references for fundamental types like in C++ int& b = a?
This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.
Let's take the first case. When you say a = b = 0, a new int object is created with value 0 and two references to it are created (one is a and another is b). These two variables point to the same object (the integer which we created). Now, we run a = 4. A new int object of value 4 is created and a is made to point to that. This means, that the number of references to 4 is one and the number of references to 0 has been reduced by one.
Compare this with a = 4 in C where the area of memory which a "points" to is written to. a = b = 4 in C means that 4 is written to two pieces of memory - one for a and another for b.
Now the second case. a = Klasa(2) creates an object of type Klasa, increments its reference count by one and makes a point to it. b = a simply takes what a points to, makes b point to the same thing and increments the reference count of that thing by one. It's the same as what would happen if you did a = b = Klasa(2). Printing a.num and b.num gives the same result, since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now, you change the object by assigning a value to one of its attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
Now, for the answers to your questions.
The assignment operator doesn't work differently for these two. All it does is add a reference to the RValue and makes the LValue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than simple assignments).
If you want copies of objects, use the copy module.
As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
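For the copy module, here is a rough sketch in Python 3 of the difference between plain assignment, copy.copy and copy.deepcopy. The tags attribute is a hypothetical addition to the Klasa class from the question, included only to show what happens with nested mutable state:

```python
import copy

class Klasa:
    def __init__(self, num):
        self.num = num
        self.tags = []   # hypothetical extra attribute, to show nesting

a = Klasa(2)
alias = a                    # no copy: a second name for the same object
shallow = copy.copy(a)       # new Klasa, but it shares the same tags list
deep = copy.deepcopy(a)      # new Klasa with its own tags list

a.num = 3
a.tags.append("changed")

print(alias.num, shallow.num, deep.num)     # 3 2 2
print(alias.tags, shallow.tags, deep.tags)  # ['changed'] ['changed'] []
```

A shallow copy is enough when the attributes are immutable. deepcopy matters once the object holds other mutable objects.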
Quoting from Data Model
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects. (In
a sense, and in conformance to Von Neumann’s model of a “stored
program computer,” code is also represented by objects.)
From Python's point of view, Fundamental data type is fundamentally different from C/C++. It is used to map C/C++ data types to Python. And so let's leave it from the discussion for the time being and consider the fact that all data are object and are manifestation of some class. Every object has an ID (somewhat like address), Value, and a Type.
All objects are copied by reference. For example:
>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>
The only way to have a new instance is by creating one.
>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False
This may sound complicated at first instance but to simplify a bit, Python made some types immutable. For example you can't change a string. You have to slice it and create a new string object.
Copying by reference often gives unexpected results. For example,
x=[[0]*8]*8 may look as if it creates a two-dimensional list of 0s. In fact it creates a list holding eight references to the same inner list [0]*8, so assigning to x[1][1] changes that element in every row at once.
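The aliasing is easy to check, and a list comprehension is one common way to get independent rows. A short sketch, with illustrative names shared and independent:

```python
# Eight references to one inner list.
shared = [[0] * 8] * 8
shared[1][1] = 5
print(shared[0][1], shared[7][1])  # 5 5: every "row" shows the change
print(shared[0] is shared[7])      # True: it is the same list object

# A list comprehension evaluates [0] * 8 once per row,
# so each row is its own list.
independent = [[0] * 8 for _ in range(8)]
independent[1][1] = 5
print(independent[0][1], independent[1][1])  # 0 5
print(independent[0] is independent[7])      # False
```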
The copy module provides a function called deepcopy that creates a fully independent copy of the object rather than a shallow one. This is useful when you want two distinct objects that you can manipulate separately, as you intended in your second example.
To extend your example
>>> import copy
>>> class Klasa:
...     def __init__(self, num):
...         self.num = num
...
>>> a = Klasa(2)
>>> b = copy.deepcopy(a)  # b is an independent copy of a
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 2 - different!
3 2
It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.
Integers, by the way, are immutable. You can't modify their value. All you can do is make a new integer and rebind your reference. (like you did in your first example)
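The two situations can be put side by side. This is a minimal Python 3 sketch with made-up names, contrasting rebinding a name with mutating a shared object:

```python
a = b = 0
print(a is b)      # True: both names refer to the same int object
a = 4              # rebinds a; the object 0 is untouched
print(a, b)        # 4 0

nums_a = nums_b = [0]
nums_a[0] = 4      # mutates the shared list in place
print(nums_a, nums_b)   # [4] [4]

nums_a = [9]       # rebinding again: nums_b keeps the old list
print(nums_a, nums_b)   # [9] [4]
```

Assignment to a bare name always rebinds. Only operations on the object itself (item or attribute assignment, method calls) can be seen through other names.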
Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.
Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.
In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.
The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in-place, and not the reference itself; that can never be "reseated" once the reference is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in-place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.
This would be similar to the java.lang.Object.hashCode() method.
I need to store objects I have no control over in a set, and make sure that only if two objects are actually the same object (not contain the same values) will the values be overwritten.
id(x)
will do the trick for you. But I'm curious, what's wrong about the set of objects (which does combine objects by value)?
For your particular problem I would probably keep the set of ids or of wrapper objects. A wrapper object will contain one reference and compare by x==y <==> x.ref is y.ref.
It's also worth noting that Python objects have a hash function as well. This function is necessary to put an object into a set or dictionary. It is allowed to collide for different objects, though good implementations of hash try to make that unlikely.
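One way the wrapper idea could look, as a sketch under assumptions: IdentityRef is a hypothetical name, and the class is written for Python 3. The wrapper holds a reference to the wrapped object, so the id used for hashing stays valid for as long as the wrapper exists:

```python
class IdentityRef:
    """Wraps an object so that sets and dicts compare it by identity."""

    def __init__(self, ref):
        self.ref = ref   # keeps the wrapped object alive

    def __eq__(self, other):
        if not isinstance(other, IdentityRef):
            return NotImplemented
        return self.ref is other.ref

    def __hash__(self):
        # Wrappers that compare equal wrap the same object,
        # so they get the same hash.
        return id(self.ref)


x = [1, 2]
y = [1, 2]   # equal to x by value, but a different object

seen = {IdentityRef(x), IdentityRef(y), IdentityRef(x)}
print(len(seen))                 # 2
print(IdentityRef(x) in seen)    # True
print(IdentityRef([1, 2]) in seen)  # False: a third, distinct list
```

This also works for unhashable objects such as lists, which could not be placed in a set directly.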
That's what "is" is for.
Instead of testing "if a == b", which tests for the same value,
test "if a is b", which will test for the same identifier.
As ilya n mentions, id(x) produces a unique identifier for an object.
But your question is confusing, since Java's hashCode method doesn't give a unique identifier. Java's hashCode works like most hash functions: it always returns the same value for the same object, two objects that are equal always get equal codes, and unequal hash codes imply unequal objects. In particular, two different and unequal objects can get the same value.
This is confusing because cryptographic hash functions are quite different from this, and more like (though not exactly) the "unique id" that you asked for.
The Python equivalent of Java's hashCode method is hash(x).
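A few quick checks of how hash(x) behaves, as a Python 3 sketch. The Plain class is hypothetical, and the collision in the last line is a CPython detail that is used here only to show that unequal objects may share a hash:

```python
# Equal objects must have equal hashes.
print(1 == 1.0, hash(1) == hash(1.0))   # True True

class Plain:
    pass

p, q = Plain(), Plain()
print(p == q)               # False: default equality is identity
print(hash(p) == hash(p))   # True: stable for the same object

# Unequal objects are allowed to collide; on CPython this pair does.
print(-1 == -2, hash(-1) == hash(-2))
```

So hash(x) is not a unique identifier. For identity, id(x) or the is operator is the right tool.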
You don't have to compare objects before placing them in a set. set() semantics already takes care of this.
class A(object):
    a = 10
    b = 20
    def __hash__(self):
        return hash((self.a, self.b))
a1 = A()
a2 = A()
a3 = A()
a4 = a1
s = set([a1,a2,a3,a4])
s
=> set([<__main__.A object at 0x222a8c>, <__main__.A object at 0x220684>, <__main__.A object at 0x22045c>])
Note: You really don't have to override hash to prove this behaviour :-)
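The same behaviour without overriding __hash__, as a minimal Python 3 sketch. A class that defines neither __eq__ nor __hash__ is compared and hashed by identity, so a set keeps one entry per distinct object:

```python
class A:
    a = 10
    b = 20

a1 = A()
a2 = A()
a3 = A()
a4 = a1   # another name for the same object as a1

s = {a1, a2, a3, a4}
print(len(s))             # 3: a1 and a4 are one object
print(a4 in s, A() in s)  # True False
```

If the class did define __eq__ by value, equal instances would collapse into a single set entry, which is the case the original question wants to avoid.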