Integers vs. floats in Python: cannot understand the behavior

I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?

Fundamentally, the answer to your question is that calling id() on numbers will give you unpredictable results. The reason is that, unlike languages such as Java, where primitives literally are their value in memory, "primitives" in Python are still objects. No guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the integer objects from -5 to 256 for efficiency (ensuring that calls to id() on them always return the same value), since these are commonly used and can be effectively cached. However, nothing about the language requires this to be the case, and other implementations may choose not to do so.
Whenever you write a float literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an object already exists, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing 5.0 being reused for the result of x = x + 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress, however, that this is an implementation detail: it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
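To see the point concretely, here is a minimal sketch (assuming CPython). float() is used to force runtime construction, sidestepping any constant folding or deduplication the compiler applies to literals:

```python
# Two equal floats built at runtime are distinct objects in CPython.
x = float("5.0")
y = float("5.0")

print(x == y)   # True - equal in value
print(x is y)   # typically False in CPython - two separate objects, two ids
```

Equality is guaranteed by the language; identity (and hence the id() value) is not.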

The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range
you actually just get back a reference to the existing object. So it
should be possible to change the value of 1. I suspect the behaviour
of Python in this case is undefined. :-)
Note: this is true only for CPython and may not apply to other Python implementations.
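The small-int cache can be observed directly. A sketch, assuming CPython (int(str) forces runtime construction so the compiler can't fold or deduplicate constants):

```python
# CPython keeps one shared object per integer in [-5, 256].
a = int("100")
b = int("100")
print(a is b)    # True in CPython: 100 falls in the cached range

c = int("1000")
d = int("1000")
print(c == d)    # True - equal values
print(c is d)    # typically False in CPython - outside the cached range
```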

Related

How is Python statement x=x+1 implemented?

In C, a statement x=x+1 will change the content at the same memory that is allocated for x. But in Python, since a variable can have different types, the x on the left and right side of = may be of different types, which means they may refer to different pieces of memory. If so, after x changes its reference from the old memory to the new memory, the old memory can be reclaimed by the garbage collection mechanism. If that is the case, the following code may trigger the garbage collection process many times and is thus very inefficient:
for i in range(1000000000):
    i = i + 1
Is my guess correct?
Update:
I need to correct the typo in the code to make the question clearer:
x = 0
for i in range(1000000000):
    x = x + 1
@SvenMarnach, do you mean the integers 0, 1, 2, ..., 999999999 (which the label x once referred to) all exist in memory if garbage collection is not activated?
id can be used to track the 'allocation' of memory to objects. It should be used with caution, but here I think it's illuminating. id is a bit like a C pointer, that is, somehow related to where the object is located in memory.
In [18]: for i in range(0,1000,100):
...: print(i,id(i))
...: i = i+1
...: print(i,id(i))
...:
0 10914464
1 10914496
100 10917664
101 10917696
200 10920864
201 10920896
300 140186080959760
301 140185597404720
400 140186080959760
401 140185597404720
...
900 140186080959760
901 140185597404720
In [19]: id(1)
Out[19]: 10914496
Small integers (from -5 to 256) are cached: the integer 1, once created, is 'reused'.
In [20]: id(202)
Out[20]: 10920928 # same id as in the loop
In [21]: id(302)
Out[21]: 140185451618128 # different id
In [22]: id(901)
Out[22]: 140185597404208
In [23]: id(i)
Out[23]: 140185597404720 # = 901, but different id
In this loop, the first few iterations create or reuse small integers. But it appears that when creating larger integers, it is 'reusing' memory. It may not be full blown garbage collection, but the code is somehow optimized to avoid unnecessary memory use.
Generally, Python programmers don't focus on those details. Write clean, reliable Python code. In this example, modifying the iteration variable inside the loop is poor practice (even if it is just an example).
You are mostly correct, though I think a few clarifications may help.
First, the concept of variables in C and in Python is rather different. In C, a variable generally references a fixed location in memory, as you stated yourself. In Python, a variable is just a label that can be attached to any object. An object can have multiple such labels, or none at all, and labels can be freely moved between objects. An assignment in C copies a new value to a memory location, while an assignment in Python attaches a new label to an object.
Integers are also very different in the two languages. In C, an integer has a fixed size and stores an integer value in a format native to the hardware. In Python, integers have arbitrary precision. They are stored as an array of "digits" (usually 30-bit integers in CPython) together with a Python type header storing type information. Bigger integers occupy more memory than smaller integers.
Moreover, integer objects in Python are immutable – they can't be changed once created. This means every arithmetic operation creates a new integer object. So the loop in your code indeed creates a new integer object in each iteration.
However, this isn't the only overhead. It also creates a new integer object for i in each iteration, which is dropped at the end of the loop body. And the arithmetic operation is dynamic – Python needs to look up the type of x and its __add__() method in each iteration to figure out how to add objects of this type. And function call overhead in Python is rather high.
Garbage collection and memory allocation on the other hand are rather fast in CPython. Garbage collection for integers relies completely on reference counting (no reference cycles possible here), which is fast. And for allocation, CPython uses an arena allocator for small objects that can quickly reuse memory slots without calling the system allocator.
So in summary, yes, compared to the same code in C, this code will run awfully slow in Python. A modern C compiler would simply compute the result of this loop at compile time and load the result to a register, so it would finish basically immediately. If raw speed for integer arithmetic is what you want, don't write that code in Python.
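The per-iteration overhead described above can be made visible. A rough sketch of what `x = x + 1` means in Python: look up the type's `__add__` method, call it, and bind the name to the brand-new result object:

```python
# Roughly what the + operator does for ints (ignoring __radd__ fallback
# and other dispatch details).
x = 1000
y = type(x).__add__(x, 1)

print(y)        # 1001
print(y is x)   # False - a new int object; the operand is untouched
```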

Why are integers immutable in Python?

I understand the differences between mutable and immutable objects in Python. I have read many posts discussing the differences. However, I have not read anything regarding WHY integers are immutable objects.
Does there exist a reason for this? Or is the answer "that's just how it is"?
Edit: I am getting prompted to 'differentiate' this question from other questions as it seems to be a previously asked question. However, I believe what I'm asking is more of a philosophical Python question rather than a technical Python question.
It appears that 'primitive' objects in Python (i.e. strings, booleans, numbers, etc.) are immutable. I've also noticed that derived data types that are made up of primitives (i.e. dicts, lists, classes) are mutable.
Is that where the line is drawn whether or not an object is mutable? Primitive vs derived?
Making integers mutable would be very counter-intuitive to the way we are used to working with them.
Consider this code fragment:
a = 1 # assign 1 to a
b = a+2 # assign 3 to b, leave a at 1
After these assignments are executed we expect a to have the value 1 and b to have the value 3. The addition operation is creating a new integer value from the integer stored in a and an instance of the integer 2.
If the addition operation just took the integer at a and just mutated it then both a and b would have the value 3.
So we expect arithmetic operations to create new values for their results - not to mutate their input parameters.
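The contrast with a mutable object makes this concrete. A small sketch:

```python
# With a mutable object, an in-place operation is visible through every
# name bound to it - exactly what we do NOT want for numbers.
a = [1]
b = a          # b names the same list object
a += [2]       # in-place mutation via list.__iadd__
print(b)       # [1, 2] - b sees the change

x = 1
y = x
x += 2         # rebinds x to a new int; y still names the old object
print(y)       # 1 - unaffected
```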
However, there are cases where mutating a data structure is more convenient and more efficient. Let's suppose for the moment that list.append(x) did not modify list but returned a new copy of list with x appended.
Then a function like this:
def foo():
    nums = []
    for x in range(0, 10):
        nums.append(x)
    return nums
would just return the empty list. (Remember - here nums.append(x) doesn't alter nums - it returns a new list with x appended. But this new list isn't saved anywhere.)
We would have to write the foo routine like this:
def foo():
    nums = []
    for x in range(0, 10):
        nums = nums.append(x)
    return nums
(This, in fact, is very similar to the situation with Python strings up until about 2.6 or perhaps 2.5.)
Moreover, every time we assign nums = nums.append(x) we would be copying a list that is increasing in size resulting in quadratic behavior.
For those reasons we make lists mutable objects.
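The cost argument above can be sketched directly. Rebuilding the list on every step copies all existing elements each time, which is quadratic overall, while append() mutates in place at amortized O(1) per element:

```python
def build_by_copy(n):
    nums = []
    for x in range(n):
        nums = nums + [x]   # allocates a fresh list and copies every element
    return nums

def build_by_append(n):
    nums = []
    for x in range(n):
        nums.append(x)      # mutates the existing list in place
    return nums

print(build_by_copy(5) == build_by_append(5))   # True - same result, very different cost
```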
A consequence to making lists mutable is that after these statements:
a = [1,2,3]
b = a
a.append(4)
the list b has changed to [1,2,3,4]. This is something that we live with even though it still trips us up now and then.
What are the design decisions to make numbers immutable in Python?
There are several reasons for immutability. Let's first look at the reasons for immutability in general:
1- Memory
Saves memory. If it's well known that an object is immutable, it can be 'copied' cheaply by simply creating a new reference to the same object.
Performance. Python can allocate space for an immutable object at creation time, and the storage requirements are fixed and unchanging.
2- Fast execution.
It doesn't have to copy each part of the object, only a simple reference.
Easier to compare: comparing equality by reference is faster than comparing values.
3- Security:
In multi-threaded apps, different threads can interact with data contained inside immutable objects without worrying about data consistency.
The internal state of your program will be consistent even if you have exceptions.
Classes should be immutable unless there's a very good reason to make them mutable....If a class cannot be made immutable, limit its mutability as much as possible
4- Ease of use
It is easier to read, easier to maintain, and less likely to fail in odd and unpredictable ways.
Immutable objects are easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce.
5- Dictionary keys must be immutable, which means you can use strings, numbers, or tuples as dictionary keys. This is something that you want to use.
The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.
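The dictionary-key argument in practice: immutable keys hash fine, while a mutable key is rejected up front:

```python
d = {}
d[5] = "int key"
d["five"] = "str key"
d[(1, 2)] = "tuple key"     # a tuple of immutables is hashable

try:
    d[[1, 2]] = "list key"  # lists are mutable and unhashable
except TypeError as e:
    print(e)                # unhashable type: 'list'
```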
Going back to the integers:
Security (3), ease of use (4), and the capacity to use numbers as keys in dictionaries (5) are reasons for the decision to make numbers immutable.
Numbers also have fixed memory requirements from creation time (1).
Everything in Python is an object; numbers (like strings) are "elemental" objects. No amount of activity will change the value 8 to anything else, and no amount of activity will change the string "eight" to anything else. This, too, is a design decision.

Python: will id() always be nonzero?

I’m wondering if there is anything about python object IDs that will prevent them from ever equaling zero? I’m asking because I’m using zero as a stand-in for a special case in my code.
From the docs
CPython implementation detail: This is the address of the object in memory.
0 is an invalid memory location. So no object in C will ever have this memory location and no object in the CPython implementation will ever have an id of zero.
I'm not sure about other Python implementations, though.
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
There's nothing that says that it cannot be zero (zero is an integer). If you rely on it not being zero, then you're relying on a current implementation detail, which is not smart.
What you instead should do is to use for example None to indicate that it isn't an id of an object.
This isn't as strong an answer as I'd like, but doing help(id) on Python 2.7.5 gives:
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
Assuming you don't have an object that is pointing to NULL, you should be safe there.
If you want an object that is different than any other, you can create one:
special = object()
As long as you don't delete it, special will be unique over the run time of your program. This might achieve the same thing you intend with checking id() being zero.

Does repeated identical assignment lead to the same id() value?

This question is motivated by this question, which I misread and to which I provided a botched answer (since deleted).
I re-read http://docs.python.org/library/functions.html#id and just tried this in Python:
>>> a = 3
>>> id(a)
5392456
>>> a = 3
>>> id(a)
5392456
repeated a few times more ...
The fact that these numbers (ie addresses of the object in memory) are
the same is implementation dependent, and not guaranteed, is that
correct? They could be different, right? My understanding is that each time I do
this simple assignment, I am creating a new object and binding it to a variable
identifier, so I can't assume that they would be put in the same place
in memory.
Is this understanding correct? If so, are there any exceptions?
You can make that assumption, but at times, for immutable types like int, rather than creating a new object, your variable might just reference the existing immutable object if it already exists. When you do an assignment, you are creating a reference to an object. That object might already be around, or it might be created.

Why does Python allow comparison of a callable and a number?

I used python to write an assignment last week, here is a code snippet
def departTime():
    '''
    Calculate the time to depart a packet.
    '''
    if(random.random < 0.8):
        t = random.expovariate(1.0 / 2.5)
    else:
        t = random.expovariate(1.0 / 10.5)
    return t
Can you see the problem? I compare random.random with 0.8, which
should be random.random().
Of course this is because of my carelessness, but I don't get it. In my opinion, this kind of comparison should invoke at least a warning in any programming language.
So why does python just ignore it and return False?
This isn't always a mistake
Firstly, just to make things clear, this isn't always a mistake.
In this particular case, it's pretty clear the comparison is an error.
However, because of the dynamic nature of Python, consider the following (perfectly valid, if terrible) code:
import random
random.random = 9 # Very weird but legal assignment.
random.random < 10 # True
random.random > 10 # False
What actually happens when comparing objects?
As for your actual case, comparing a function object to a number, have a look at Python documentation: Python Documentation: Expressions. Check out section 5.9, titled "Comparisons", which states:
The operators <, >, ==, >=, <=, and != compare the values of two objects. The objects need not have the same type. If both are numbers, they are converted to a common type. Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily. You can control comparison behavior of objects of non-built-in types by defining a __cmp__ method or rich comparison methods like __gt__, described in section Special method names.
(This unusual definition of comparison was used to simplify the definition of operations like sorting and the in and not in operators. In the future, the comparison rules for objects of different types are likely to change.)
That should explain both what happens and the reasoning for it.
BTW, I'm not sure what happens in newer versions of Python.
Edit: If you're wondering, Debilski's answer gives info about Python 3.
This is ‘fixed’ in Python 3 http://docs.python.org/3.1/whatsnew/3.0.html#ordering-comparisons.
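Under Python 3 the comparison from the question is rejected outright instead of silently returning False:

```python
import random

try:
    random.random < 0.8     # comparing the function object, not a call
except TypeError as e:
    print(e)   # e.g. '<' not supported between instances of
               # 'builtin_function_or_method' and 'float'
```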
Because in Python that is a perfectly valid comparison. Python can't know if you really want to make that comparison or if you've just made a mistake. It's your job to supply Python with the right objects to compare.
Because of the dynamic nature of Python you can compare and sort almost everything with almost everything (this is a feature). You've compared a function to a float in this case.
An example:
list = ["b","a",0,1, random.random, random.random()]
print sorted(list)
This will give the following output:
[0, 0.89329568818188976, 1, <built-in method random of Random object at 0x8c6d66c>, 'a', 'b']
I think Python allows this because the random.random object could be overriding the > operator by including a __gt__ method, which might be accepting or even expecting a number. So Python thinks you know what you are doing... and does not report it.
If you try check for it, you can see that __gt__ exists for random.random...
>>> random.random.__gt__
<method-wrapper '__gt__' of builtin_function_or_method object at 0xb765c06c>
But, that might not be something you want to do.