I tried the following code and it gave me different outputs.
>>> foo1 = 4
>>> foo2 = 2+2
>>> id(foo1)
37740064L
>>> id(foo2)
37740064L
>>> foo1 = 4.3
>>> foo2 = 1.3+3.0
>>> id(foo1)
37801304L
>>> id(foo2)
37801232L
>>>
I am using Python 2.7.2. Why does the id function return different values for floats but the same value for integers?
That is because the result of id on numeric constants is implementation-defined.
In your case, Python 2.7.2, the issue is that the interpreter keeps a few frequently used integer constants as singletons (in CPython, -5 to 256). The rationale is that these numbers are used so often that it makes no sense to dynamically allocate them each time they are needed; they are simply reused.
But that constant-singleton optimization is not useful for float values: other than maybe 0.0, there are simply too many of them! So each time a new float value is needed, it is allocated, and it gets a different id.
For deeper insight, read the source! This file is from Python 3, but the idea is the same: look for the small_ints array.
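A quick check of the float behaviour, using the values from the question (the exact ids, and even whether objects get shared, are implementation-dependent):
>>> a = 4.3
>>> b = 1.3 + 3.0
>>> a == b            # the values compare equal...
True
>>> a is b            # ...but they are two distinct float objects
False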
id is never really predictable, not even for integers. With the very low integers 2 and 4, you just happen to hit the small integer cache. Try this:
>>> a = 12345
>>> b = 12345
>>> id(a)
33525888
>>> id(b)
33525852
>>>
For small integers and strings that are expected to be frequently used, Python applies an internal memory optimization. Since any variable in Python is a reference to an object in memory, Python puts such small values into memory only once. Then, whenever the same value is assigned to another variable, that variable is made to point to the object already kept in memory. This works for strings and integers because they are immutable: if the variable's value changes, it is effectively the reference held by the variable that changes; the object in memory with the original value is not itself affected.
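A small sketch of that rebinding behaviour (the small-int sharing is a CPython detail):
>>> a = 10
>>> b = a             # b references the same int object as a
>>> a is b
True
>>> a = a + 1         # rebinds a to a new object; the int 10 is untouched
>>> b
10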
That's why the variables foo1 and foo2 in the first case hold references to the same integer object in memory, with value 4, and therefore the ids are the same.
First of all, floating point numbers are not 'small': there is no limited set of frequently used values worth caching. Second, two calculations whose results display the same may actually have produced slightly different values in memory, e.g. 4.2999999999999998 and 4.3000000000000007 (just example numbers to explain); during calculation and display the meaningful part looks the same, i.e. 4.3, even though the underlying bit patterns differ. So reusing the same floating point number object in memory is problematic and isn't worth it after all.
That's why in the second case foo1 and foo2 reference different floating point objects in memory and therefore have different ids.
See more details on how floating point numbers are kept in memory:
http://floating-point-gui.de/
http://docs.python.org/2/tutorial/floatingpoint.html
Also, there's a big article on floating point numbers, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", in the Oracle docs.
Apart from the mentioned (and very true) reasons, did you verify that foo1 == foo2 in the first place? As you are dealing with floating point values, there can very easily be differences...
For floating-point numbers, 1.3 + 3.0 == 4.3 is not always TRUE (in Python or in other languages). And distinct objects with the same value can have different ids, too.
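The classic demonstration, plus the usual remedy. Note that math.isclose needs Python 3.5+; under the question's Python 2.7 you would compare against an explicit tolerance instead:
>>> 0.1 + 0.2 == 0.3
False
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True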
In that case the variables may appear to contain the same float data, but the reason lies in the mathematical rules of floating point: only a few digits after the point are displayed, while the remaining digits are hidden even though they are part of the value. So we can get two different addresses for variables that seem to contain the same data.
To compare the displayed values rather than the addresses, we can use the round() method:
y = 3.1254
x = 3.1254
print(round(x, 2))   # 3.13
print(round(y, 2))   # 3.13
print(id(x) == id(y))
Both round() calls display the same value because round(x, 2) keeps only two digits after the point. Whether id(x) == id(y) prints True or False is implementation-dependent: in a script, CPython may fold the two identical literals into a single shared constant, while in the interactive shell each line usually creates a separate object.
In C, a statement x=x+1 will change the content at the same memory location that is allocated for x. But in Python, since a variable can have different types, the x on the left and right side of = may be of different types, which means they may refer to different pieces of memory. If so, after x changes its reference from the old memory to the new memory, the old memory can be reclaimed by the garbage collection mechanism. If that is the case, the following code may trigger the garbage collection process many times and thus be very inefficient:
for i in range(1000000000):
    i = i + 1
Is my guess correct?
Update:
I need to correct the typo in the code to make the question clearer:
x = 0
for i in range(1000000000):
    x = x + 1
@SvenMarnach, do you mean the integers 0, 1, 2, ..., 999999999 (which the label x once referred to) all exist in memory if garbage collection is not activated?
id can be used to track the 'allocation' of memory to objects. It should be used with caution, but here I think it's illuminating. id is a bit like a C pointer, that is, somehow related to 'where' the object is located in memory.
In [18]: for i in range(0,1000,100):
...: print(i,id(i))
...: i = i+1
...: print(i,id(i))
...:
0 10914464
1 10914496
100 10917664
101 10917696
200 10920864
201 10920896
300 140186080959760
301 140185597404720
400 140186080959760
401 140185597404720
...
900 140186080959760
901 140185597404720
In [19]: id(1)
Out[19]: 10914496
Small integers (in CPython, -5 to 256) are cached; that is, the integer 1, once created, is 'reused'.
In [20]: id(202)
Out[20]: 10920928 # same id as in the loop
In [21]: id(302)
Out[21]: 140185451618128 # different id
In [22]: id(901)
Out[22]: 140185597404208
In [23]: id(i)
Out[23]: 140185597404720 # = 901, but different id
In this loop, the first few iterations create or reuse small integers. But it appears that when creating larger integers, memory is being 'reused'. It may not be full-blown garbage collection, but the interpreter is evidently optimized to avoid unnecessary memory use.
Generally, Python programmers don't focus on those details. Write clean, reliable Python code. In this example, modifying the iteration variable inside the loop is poor practice (even if it is just an example).
You are mostly correct, though I think a few clarifications may help.
First, the concept of variables in C and in Python is rather different. In C, a variable generally references a fixed location in memory, as you stated yourself. In Python, a variable is just a label that can be attached to any object. An object could have multiple such labels, or none at all, and labels can be freely moved between objects. An assignment in C copies a new value to a memory location, while an assignment in Python attaches a new label to an object.
Integers are also very different in the two languages. In C, an integer has a fixed size and stores an integer value in a format native to the hardware. In Python, integers have arbitrary precision. They are stored as an array of "digits" (usually 30-bit integers in CPython) together with a Python type header storing type information. Bigger integers occupy more memory than smaller integers.
Moreover, integer objects in Python are immutable – they can't be changed once created. This means every arithmetic operation creates a new integer object. So the loop in your code indeed creates a new integer object in each iteration.
However, this isn't the only overhead. It also creates a new integer object for i in each iteration, which is dropped at the end of the loop body. And the arithmetic operation is dynamic – Python needs to look up the type of x and its __add__() method in each iteration to figure out how to add objects of this type. And function call overhead in Python is rather high.
Garbage collection and memory allocation on the other hand are rather fast in CPython. Garbage collection for integers relies completely on reference counting (no reference cycles possible here), which is fast. And for allocation, CPython uses an arena allocator for small objects that can quickly reuse memory slots without calling the system allocator.
So in summary, yes, compared to the same code in C, this code will run awfully slow in Python. A modern C compiler would simply compute the result of this loop at compile time and load the result to a register, so it would finish basically immediately. If raw speed for integer arithmetic is what you want, don't write that code in Python.
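If you want to put a number on that overhead rather than guess, timeit is the standard tool. A rough sketch (absolute timings are entirely machine-dependent; a million iterations of the pure-Python increment typically costs a few hundredths of a second, while a C compiler would fold the whole loop away):
>>> import timeit
>>> timeit.timeit('x = x + 1', setup='x = 0', number=10**6)
0.045...   # seconds on one sample machine; your numbers will differ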
I understand the differences between mutable and immutable objects in Python. I have read many posts discussing the differences. However, I have not read anything regarding WHY integers are immutable objects.
Does there exist a reason for this? Or is the answer "that's just how it is"?
Edit: I am getting prompted to 'differentiate' this question from other questions as it seems to be a previously asked question. However, I believe what I'm asking is more of a philosophical Python question rather than a technical Python question.
It appears that 'primitive' objects in Python (e.g. strings, booleans, numbers) are immutable. I've also noticed that derived data types that are made up of primitives (e.g. dicts, lists, classes) are mutable.
Is that where the line is drawn whether or not an object is mutable? Primitive vs derived?
Making integers mutable would be very counter-intuitive to the way we are used to working with them.
Consider this code fragment:
a = 1 # assign 1 to a
b = a+2 # assign 3 to b, leave a at 1
After these assignments are executed we expect a to have the value 1 and b to have the value 3. The addition operation is creating a new integer value from the integer stored in a and an instance of the integer 2.
If the addition operation just took the integer at a and just mutated it then both a and b would have the value 3.
So we expect arithmetic operations to create new values for their results - not to mutate their input parameters.
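You can watch the new object being created with id(); a large integer is used here to stay clear of the small-integer cache (a CPython detail):
>>> a = 1000
>>> before = id(a)
>>> a += 1            # computes a new int object and rebinds a to it
>>> id(a) == before
False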
However, there are cases where mutating a data structure is more convenient and more efficient. Let's suppose for the moment that list.append(x) did not modify list but returned a new copy of list with x appended.
Then a function like this:
def foo():
    nums = []
    for x in range(0, 10):
        nums.append(x)
    return nums
would just return the empty list. (Remember - here nums.append(x) doesn't alter nums - it returns a new list with x appended. But this new list isn't saved anywhere.)
We would have to write the foo routine like this:
def foo():
    nums = []
    for x in range(0, 10):
        nums = nums.append(x)
    return nums
(This, in fact, is very similar to the situation with Python strings up until about 2.6 or perhaps 2.5.)
Moreover, every time we assign nums = nums.append(x) we would be copying a list that is increasing in size resulting in quadratic behavior.
For those reasons we make lists mutable objects.
A consequence to making lists mutable is that after these statements:
a = [1,2,3]
b = a
a.append(4)
the list b has changed to [1,2,3,4]. This is something that we live with even though it still trips us up now and then.
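When that aliasing is not what you want, make an explicit copy instead of a second name:
>>> a = [1, 2, 3]
>>> b = list(a)       # an independent copy; a[:] or copy.copy(a) also work
>>> a.append(4)
>>> b
[1, 2, 3]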
What are the design decisions to make numbers immutable in Python?
There are several reasons for immutability. Let's first see what those reasons are:
1- Memory
Saves memory. If it's well known that an object is immutable, it can be easily 'copied' by simply creating a new reference to the same object.
Performance. Python can allocate space for an immutable object at creation time, and the storage requirements are fixed and unchanging.
2- Fast execution.
It doesn't have to copy each part of the object, only a simple reference.
Easy to compare: testing identity by reference is faster than comparing values.
3- Security:
In multi-threaded apps, different threads can interact with the data contained inside immutable objects without worrying about data consistency.
The internal state of your program will be consistent even if you have exceptions.
As Effective Java puts it: "Classes should be immutable unless there's a very good reason to make them mutable.... If a class cannot be made immutable, limit its mutability as much as possible."
4- Ease to use
Code using immutable objects is easier to read, easier to maintain, and less likely to fail in odd and unpredictable ways.
Immutable objects are easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce.
5- Dictionary keys must be immutable, which means you can use strings, numbers or tuples as dictionary keys. This is something you want to be able to use.
The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.
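Python enforces this by refusing mutable built-in containers as keys outright:
>>> d = {}
>>> d[[1, 2]] = 'x'
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'
>>> d[(1, 2)] = 'x'   # an immutable tuple works fine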
Going back to the integers:
Security (3), ease of use (4), and the capacity to use numbers as keys in dictionaries (5) are reasons for the decision to make numbers immutable.
A number also has fixed memory requirements from creation time (1).
Everything in Python is an object; numbers (like strings) are "elemental" objects. No amount of activity will change the value 8 to anything else, and no amount of activity will change the string “eight” to anything else. This, too, is a design decision.
I'm working through http://www.mypythonquiz.com, and question #45 asks for the output of the following code:
confusion = {}
confusion[1] = 1
confusion['1'] = 2
confusion[1.0] = 4
sum = 0
for k in confusion:
    sum += confusion[k]
print sum
The output is 6, since the key 1.0 replaces 1. This feels a bit dangerous to me. Is this ever a useful language feature?
First of all: the behaviour is documented explicitly in the docs for the hash function:
hash(object)
Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).
Secondly, the key requirement on hashing is pointed out in the docs for object.__hash__:
object.__hash__(self)
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value;
This is not unique to python. Java has the same caveat: if you implement hashCode then, in order for things to work correctly, you must implement it in such a way that: x.equals(y) implies x.hashCode() == y.hashCode().
So, python decided that 1.0 == 1 holds, hence it's forced to provide an implementation for hash such that hash(1.0) == hash(1). The side effect is that 1.0 and 1 act exactly in the same way as dict keys, hence the behaviour.
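You can check both halves of that contract directly in the shell:
>>> 1.0 == 1
True
>>> hash(1.0) == hash(1)
True
>>> {1: 'a', 1.0: 'b'}   # one key slot; the later assignment wins
{1: 'b'}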
In other words the behaviour in itself doesn't have to be used or useful in any way. It is necessary. Without that behaviour there would be cases where you could accidentally overwrite a different key.
If we had 1.0 == 1 but hash(1.0) != hash(1) we could still have a collision. And if 1.0 and 1 collide, the dict will use equality to be sure whether they are the same key or not and kaboom the value gets overwritten even if you intended them to be different.
The only way to avoid this would be to have 1.0 != 1, so that the dict is able to distinguish between them even in case of collision. But it was deemed more important to have 1.0 == 1 than to avoid the behaviour you are seeing, since in practice you hardly ever mix floats and ints as dictionary keys anyway.
Since Python tries to hide the distinction between numbers by automatically converting them when needed (e.g. 1/2 -> 0.5), it makes sense that this behaviour is reflected in such circumstances too. It's more consistent with the rest of Python.
This behaviour would appear in any implementation where the matching of the keys is at least partially (as in a hash map) based on comparisons.
For example, if a dict was implemented using a red-black tree or another kind of balanced BST, when the key 1.0 is looked up the comparisons with other keys would return the same results as for 1, and so they would still act in the same way.
Hash maps require even more care because of the fact that it's the value of the hash that is used to find the entry of the key and comparisons are done only afterwards. So breaking the rule presented above means you'd introduce a bug that's quite hard to spot because at times the dict may seem to work as you'd expect it, and at other times, when the size changes, it would start to behave incorrectly.
Note that there would be a way to fix this: have a separate hash map/BST for each type inserted in the dictionary. In this way there couldn't be any collisions between objects of different type and how == compares wouldn't matter when the arguments have different types.
However, this would complicate the implementation and would probably be inefficient, since hash maps have to keep quite a few free locations in order to have O(1) access times. If they become too full, performance decreases. Having multiple hash maps means wasting more space, and you'd need to first choose which hash map to look at before even starting the actual lookup of the key.
If you used BSTs, you'd first have to look up the type and then perform a second lookup. So if you are going to use many types you'd end up with twice the work (and the lookup would take O(log n) instead of O(1)).
You should consider that the dict aims at storing data depending on the logical numeric value, not on how you represented it.
The difference between ints and floats is indeed just an implementation detail and not conceptual. Ideally the only number type would be an arbitrary-precision number with unbounded accuracy, even for values below unity... this is however hard to implement without running into trouble... but maybe that will one day be the only numeric type for Python.
So while having different types for technical reasons Python tries to hide these implementation details and int->float conversion is automatic.
It would be much more surprising if in a Python program if x == 1: ... wasn't going to be taken when x is a float with value 1.
Note that also in Python 3, the value of 1/2 is 0.5 (the division of two integers), and that the types long and non-unicode string were dropped in the same effort to hide implementation details.
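For example, in Python 3:
>>> 1 / 2             # true division of two ints yields a float
0.5
>>> 1 // 2            # floor division keeps the int
0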
In python:
1==1.0
True
This is because of implicit casting
However:
1 is 1.0
False
I can see why automatic casting between float and int is handy, and it is relatively safe to cast an int into a float. Yet there are other languages (e.g. Go) that stay away from implicit casting.
It is actually a language design decision and a matter of taste more than a difference in functionality.
Dictionaries are implemented with a hash table. To look up something in a hash table, you start at the position indicated by the hash value, then search different locations until you find a key value that's equal or an empty bucket.
If you have two key values that compare equal but have different hashes, you may get inconsistent results depending on whether the other key value was in the searched locations or not; such inconsistencies become more likely as the table gets full. This is something you want to avoid. It appears that the Python developers had this in mind, since the built-in hash function returns the same hash for equivalent numeric values, no matter whether those values are int or float. Note that this extends to other numeric types: False is equal to 0 and True is equal to 1. Even fractions.Fraction and decimal.Decimal uphold this property.
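This invariant holds across the whole numeric tower. For instance, in Python 3:
>>> from fractions import Fraction
>>> from decimal import Decimal
>>> hash(1) == hash(1.0) == hash(True) == hash(Fraction(1, 1)) == hash(Decimal(1))
True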
The requirement that if a == b then hash(a) == hash(b) is documented in the definition of object.__hash__():
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.
TL;DR: a dictionary would break if keys that compared equal did not hash to the same value.
Frankly, the opposite is dangerous! 1 == 1.0, so if 1 and 1.0 could point to different keys and you tried to access them based on a computed number, you'd likely run into trouble, because the ambiguity is hard to resolve.
Dynamic typing means that the value is more important than the technical type of something, since the type is malleable (which is a very useful feature), and so distinguishing ints and floats of the same value as distinct keys would be unnecessary semantics that only leads to confusion.
I agree with others that it makes sense to treat 1 and 1.0 as the same in this context. Even if Python did treat them differently, it would probably be a bad idea to try to use 1 and 1.0 as distinct keys for a dictionary. On the other hand -- I have trouble thinking of a natural use-case for using 1.0 as an alias for 1 in the context of keys. The problem is that either the key is literal or it is computed. If it is a literal key then why not just use 1 rather than 1.0? If it is a computed key -- round off error could muck things up:
>>> d = {}
>>> d[1] = 5
>>> d[1.0]
5
>>> x = sum(0.01 for i in range(100)) #conceptually this is 1.0
>>> d[x]
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
d[x]
KeyError: 1.0000000000000007
So I would say that, generally speaking, the answer to your question "is this ever a useful language feature?" is "No, probably not."
I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is because unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the values of the integers from -5 to 256 for efficiency (ensuring that calls to id() will always be the same for them), since these are commonly used and can be effectively cached; however, nothing about the language requires this to be the case, and other implementations may choose not to do so.
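You can probe the edge of that cache yourself. Note that the result outside the cached range is not guaranteed, and newer CPython versions fold constants within a single statement, so the exact output depends on version and context:
>>> a = 256
>>> b = 256
>>> a is b            # inside the cached range
True
>>> a = 257
>>> b = 257
>>> a is b            # outside it; typically False when typed as separate lines
False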
Whenever you write a float literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether such an object already exists, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing x = 5.0 being reused for the value of x += 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress however, this is an implementation detail; it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
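A sketch of that second scenario: an id is only unique among objects alive at the same time, so a freed address can be handed straight to a new object. Whether a given run shows the reuse depends on the allocator, so treat the final True as illustrative, not guaranteed:
>>> x = 5.0
>>> addr = id(x)
>>> del x             # the float's memory slot may now be freed...
>>> y = 8.0           # ...and immediately reused for a new float
>>> id(y) == addr     # often True in CPython; never rely on it
True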
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Note: this is true only for CPython and may not apply to other Python implementations.
Normally, if I assign a variable some value and then check their ids, I expect them to be the same, because Python is essentially just giving my object a "name". This can be seen in the code below:
>>> a = 3
>>> id(a)
19845928
>>> id(3)
19845928
The problem arises when I try the same with __name__:
>>> __name__
'__main__'
>>> id(__name__)
19652416
>>> id('__main__')
19652448
How can their ids be different? Shouldn't they be the same, since __name__ should also be just a reference?
id() gives essentially the memory pointer to the data. Although strings are immutable, they are not guaranteed to be interned. This means that some strings with equal values have different pointers.
For integers (especially small ones), the pointers will be the same, so your 3 example works fine.
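If you genuinely need two equal strings to share one object, you can intern them explicitly. In Python 3 that's sys.intern (in Python 2 it was the built-in intern):
>>> import sys
>>> a = sys.intern('hello world!')   # a string that would not be auto-interned
>>> b = sys.intern('hello world!')
>>> a is b
True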
@KartikAnand: The way you're checking for 'same object' is valid, although the usual way is to use x is y. The problem is that they are not the same object, and are not guaranteed to be. They simply have the same value. Note that when you write the literal "__main__" you're creating a new object. Sometimes Python does a nice optimization and reuses a previously-created string with the same value, but it doesn't have to.
Kartik's goal is to "verify that assignment is in a way reference and objects are not created on the fly". To do this, avoid creating new objects (no string literals).
>>> __name__
'__main__'
>>> x = __name__
>>> id(__name__)
3078339808L
>>> id(x)
3078339808L
>>> __name__ is x
True
Just because two strings have the same value, it does not follow that they are the same object. This is completely expected behaviour.
In Python, small integers are "pooled", so that all small integer values point to the same object. This is not necessarily true for strings.
At any rate, this is an implementation detail that should not be relied upon.
What you are running into here is the fact that primitives are pseudo (or real) singletons in Python. Further, looking at strings clouds the issue, because when strings are interned, value and id become synonymous as a side effect, so some strings with matching values will have matching ids and others won't. Try looking at hand-built objects instead, as then you control when a new instance is created, and the difference between id and value becomes explicitly clear.
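A minimal sketch of that suggestion; with a hand-built class it is obvious when a new instance is created and when a name merely aliases an existing one:
>>> class Box(object):
...     def __init__(self, value):
...         self.value = value
...
>>> a = Box(3)
>>> b = Box(3)
>>> a is b            # equal contents, but two distinct instances
False
>>> c = a             # a second name for the same instance
>>> c is a
True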