How is the Python statement x=x+1 implemented?

In C, the statement x=x+1 changes the contents of the memory allocated for x. But in Python, since a variable can hold values of different types, the x on the left and the x on the right of = may refer to objects of different types, which means they may refer to different pieces of memory. If so, after x changes its reference from the old memory to the new memory, the old memory can be reclaimed by the garbage collection mechanism. If that is the case, the following code may trigger garbage collection many times and thus be very inefficient:
for i in range(1000000000):
    i = i + 1
Is my guess correct?
Update:
I need to correct the typo in the code to make the question clearer:
x = 0
for i in range(1000000000):
    x = x + 1
@SvenMarnach, do you mean that the integers 0, 1, 2, ..., 999999999 (which the label x once referred to) would all exist in memory if garbage collection were not activated?

id can be used to track the 'allocation' of memory to objects. It should be used with caution, but here I think it's illuminating. id is a bit like a C pointer - that is, somehow related to 'where' the object is located in memory.
In [18]: for i in range(0, 1000, 100):
    ...:     print(i, id(i))
    ...:     i = i + 1
    ...:     print(i, id(i))
    ...:
0 10914464
1 10914496
100 10917664
101 10917696
200 10920864
201 10920896
300 140186080959760
301 140185597404720
400 140186080959760
401 140185597404720
...
900 140186080959760
901 140185597404720
In [19]: id(1)
Out[19]: 10914496
Small integers (-5 to 256) are cached - that is, the integer 1, once created, is 'reused'.
In [20]: id(202)
Out[20]: 10920928 # same id as in the loop
In [21]: id(302)
Out[21]: 140185451618128 # different id
In [22]: id(901)
Out[22]: 140185597404208
In [23]: id(i)
Out[23]: 140185597404720 # = 901, but different id
In this loop, the first few iterations create or reuse small integers. But it appears that when creating larger integers, it is 'reusing' memory. It may not be full-blown garbage collection, but the code is somehow optimized to avoid unnecessary memory use.
Generally, Python programmers don't focus on these details. Write clean, reliable Python code. In this example, modifying the iteration variable inside the loop is poor practice (even if it is just an example).

You are mostly correct, though I think a few clarifications may help.
First, the concept of a variable in C and in Python is rather different. In C, a variable generally refers to a fixed location in memory, as you stated yourself. In Python, a variable is just a label that can be attached to any object. An object can have multiple such labels, or none at all, and labels can be freely moved between objects. An assignment in C copies a new value to a memory location, while an assignment in Python attaches a new label to an object.
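For illustration, here is a tiny sketch of that labelling behaviour (a list is used only because mutation makes the shared label visible):
x = [1, 2, 3]    # the label x is attached to a list object
y = x            # a second label attached to the same object
y.append(4)      # mutating the object is visible through both labels
print(x)         # [1, 2, 3, 4]
x = "hello"      # x is re-attached to a different object; the list keeps its label y
print(y)         # [1, 2, 3, 4]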
Integers are also very different in the two languages. In C, an integer has a fixed size and stores its value in a format native to the hardware. In Python, integers have arbitrary precision. They are stored as an array of "digits" (usually 30-bit chunks in CPython) together with a Python object header storing type information. Bigger integers occupy more memory than smaller integers.
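You can watch this growth with sys.getsizeof; the exact byte counts are CPython implementation details and vary by version and platform:
import sys

for value in (0, 1, 2**30, 2**60, 2**1000):
    # each additional 30-bit "digit" adds a few bytes to the object
    print(value.bit_length(), sys.getsizeof(value))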
Moreover, integer objects in Python are immutable – they can't be changed once created. This means every arithmetic operation creates a new integer object. So the loop in your code indeed creates a new integer object in each iteration.
However, this isn't the only overhead. It also creates a new integer object for i in each iteration, which is dropped at the end of the loop body. And the arithmetic operation is dynamic – Python needs to look up the type of x and its __add__() method in each iteration to figure out how to add objects of this type. And function call overhead in Python is rather high.
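You can make that dynamic dispatch visible with the dis module; the opcode names you see (e.g. BINARY_ADD vs. BINARY_OP) depend on your CPython version:
import dis

# CPython compiles x = x + 1 down to a generic "add" opcode; which addition
# actually runs is decided at run time from the type of x (its __add__).
dis.dis(compile("x = x + 1", "<example>", "exec"))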
Garbage collection and memory allocation on the other hand are rather fast in CPython. Garbage collection for integers relies completely on reference counting (no reference cycles possible here), which is fast. And for allocation, CPython uses an arena allocator for small objects that can quickly reuse memory slots without calling the system allocator.
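The reference-counting part is even observable from Python itself; sys.getrefcount reports one extra reference for its own argument, so only the relative change matters in this sketch:
import sys

x = int("123456789012345678901")  # a big int built at run time (not cached)
print(sys.getrefcount(x))          # includes the temporary reference held by the call
y = x                              # attach another label to the same object
print(sys.getrefcount(x))          # one higher; the object is freed once the count drops to zero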
So in summary, yes, compared to the same code in C, this code will run awfully slow in Python. A modern C compiler would simply compute the result of this loop at compile time and load the result to a register, so it would finish basically immediately. If raw speed for integer arithmetic is what you want, don't write that code in Python.

Related

Python's special treatment of certain integers. Why? [duplicate]

This question already has an answer here:
What's with the integer cache maintained by the interpreter?
I came across this phrase:
"Python keeps an array of ints between -5 and 256. When you create
an int in that range, you get a reference to a pre-existing object"
You can verify with this code:
def check(n, d):
    a = b = n
    a -= d
    b -= d
    return a is b
Now, check(500,10) returns False, but check(500,300) returns True. Why would the Python compiler do such a thing? Isn't it a perfect recipe for bugs?
CPython (the reference interpreter) does it because it saves a lot of memory (and a small amount of execution time) to have the most commonly used ints served from a cache. Incrementing a number uses a shared temporary, not a unique value in each place you do the increment. Iterating a bytes or bytearray object can go much faster by directly pulling the cached entries. It's not a language guarantee though, so never write code like this (which relies on it).
It's not a bug factory because:
1. Relying on object identity tests for ints is a terrible idea in the first place; you should always be using == to compare ints (see the short demo below), and
2. ints are immutable; it's impossible to modify the cached entries without writing intentionally evil ctypes or C extension modules. Normal Python code can't trigger bugs due to this cache.
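A quick demonstration of the difference (the is results are CPython-specific and not guaranteed by the language):
a = 1000
b = 1000
print(a == b)   # True: compares values, always the right way for ints
print(a is b)   # unreliable: compares object identity; the result depends on
                # whether the interpreter happened to reuse the same object
c = 100
d = 100
print(c is d)   # usually True in CPython, because small ints (-5..256) are cached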

How to get the size of a Numba dictionary?

I'm trying to get size of Numba typed dictionary in bytes:
from numba import njit
from numba.typed import Dict
from sys import getsizeof as gso
@njit
def get_dict(n):
    d = {0: 0}
    for i in range(1, n):
        d[i] = i
    return d

print(gso(get_dict(10)))
print(gso(get_dict(10000)))
In both cases the getsizeof function returns 64 bytes. Obviously, the size of the dictionary must depend on its length.
If I transform the typed dictionary into a native Python dictionary using dict(), it works and returns 376 and 295016:
print(gso(dict(get_dict(10))))
print(gso(dict(get_dict(10000))))
How can I measure it?
Currently (numba 0.46.0) what you intended to do is very likely not possible with a DictType.
sys.getsizeof on containers is tricky at best and very misleading in the worst case. The problem is that getsizeof needs to reduce possibly very complicated implementation details to one integer number.
The first problem is that calling sys.getsizeof on a container typically reports only the size of the container, not the contents of the container - or, in the case of an opaque container, it only returns the size of the wrapper. What you encountered is the latter - DictType is just a wrapper for an opaque numba struct that's defined (probably) in C. So the 64 you're seeing is actually correct: that's the size of the wrapper. You can probably access the wrapped type, but given that it's hidden behind private attributes it's not intended for non-numba code, so I won't go down that path - mostly because any answer relying on those kinds of implementation details may be outdated at any time.
However, sys.getsizeof requires a deep understanding of the implementation details to be interpreted correctly. So even for a plain dict it's not obvious what the number represents. It's certainly the calculated memory in bytes of the container (without the contents), but it could also be a key-sharing dictionary (in your case it's not), where the number would be accurate, yet since parts of a key-sharing dictionary are shared it's probably not the number you are looking for. As mentioned, it normally also doesn't account for the contents of the container, but that's an implementation detail as well; for example numpy.array includes the size of the contents, while list, set, etc. don't. That's because a numpy array doesn't have "real contents" - at least it doesn't have Python objects as content.
So even if the DictType wrapper did report the size of the underlying dictionary, you would still need to know whether the contents are stored inside the dictionary itself, as pointers to Python objects, or (even more convoluted) as objects defined in another language (like C or C++) in order to interpret the results correctly.
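To make the container-vs-contents point concrete, here is a small sketch with a plain list and a numpy array (the exact numbers are CPython/NumPy implementation details):
import sys
import numpy as np

lst = list(range(1000))
arr = np.arange(1000)

# For a list, getsizeof reports only the container (the array of pointers),
# not the int objects it refers to.
print(sys.getsizeof(lst))
print(sys.getsizeof(lst) + sum(sys.getsizeof(i) for i in lst))

# A numpy array stores raw numbers inline, so its reported size includes the data buffer.
print(sys.getsizeof(arr))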
So my advice would be to not use sys.getsizeof, except out of curiosity or academic interest - and then only if you're willing to dig through all the implementation details (which may or may not change at any time) in order to interpret the results correctly. If you're really interested in memory consumption, you are often better off using a tool that tracks the memory usage of your program as a whole. That still has a lot of pitfalls (memory reuse, unused memory allocations) and requires a significant amount of knowledge about how memory is used to interpret correctly (virtual memory, shared memory, and how memory is allocated), but it often yields a more realistic view of how much memory your program actually uses.
import gc
import numba as nb
import psutil

@nb.njit
def get_dict(n):
    d = {0: 0}
    for i in range(1, n):
        d[i] = i
    return d

get_dict(1)  # warm-up call so the JIT compilation doesn't skew the measurement
gc.collect()
print(psutil.Process().memory_info())
d = get_dict(100_000)
gc.collect()
print(psutil.Process().memory_info())
Which gives on my computer:
pmem(rss=120696832, vms=100913152, num_page_faults=34254, peak_wset=120700928,
wset=120696832, peak_paged_pool=724280, paged_pool=724280, peak_nonpaged_pool=1255376,
nonpaged_pool=110224, pagefile=100913152, peak_pagefile=100913152, private=100913152)
pmem(rss=126820352, vms=107073536, num_page_faults=36717, peak_wset=129449984,
wset=126820352, peak_paged_pool=724280, paged_pool=724280, peak_nonpaged_pool=1255376,
nonpaged_pool=110216, pagefile=107073536, peak_pagefile=109703168, private=107073536)
This shows that the program needed 6 123 520 bytes of memory more after the call than it had allocated before (using "rss", the resident set size).
And similar for a plain Python dictionary:
import gc
import psutil
gc.collect()
print(psutil.Process().memory_info())
d = {i: i for i in range(100_000)}
gc.collect()
print(psutil.Process().memory_info())
del d
gc.collect()
Which gave a difference of 8 552 448 bytes on my computer.
Note that these numbers represent the complete process, so treat them with caution. For example, for small values (get_dict(10)) they return 4096 on my Windows computer, because that's the page size of Windows: the program actually allocates more space than the dictionary needs due to OS restrictions.
However, even with these pitfalls and restrictions, this is still significantly more accurate if you're interested in the memory requirements of your program.
If you still (out of curiosity) want to know how much memory the DictType theoretically needs you should probably ask the numba developers to enhance numba so that they implement __sizeof__ for their Python wrappers so that the numbers are more representative. You could for example open an issue on their issue tracker or ask on their mailing list.
An alternative may be to use other third-party tools, for example Pympler, but I haven't used these myself, so I don't know if they work in this case.
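If you do want to try that route, a sketch along these lines could be a starting point. Note that the use of Pympler's asizeof here is an assumption on my part, and given the opaque wrapper it may well report little more than the wrapper itself:
# pip install pympler
from pympler import asizeof
from numba import njit

@njit
def get_dict(n):
    d = {0: 0}
    for i in range(1, n):
        d[i] = i
    return d

# asizeof tries to recursively sum the sizes of an object and everything it
# references; for an opaque native container it may still see only the wrapper.
print(asizeof.asizeof(get_dict(10_000)))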

How to dynamically allocate memory in Python

Is there any method in Python that I can use to get a block of memory from the heap and use a variable to reference it, just like the keyword "new" or the function malloc() in other languages:
Object *obj = (Object *) malloc(sizeof(Object));
Object *obj = new Object();
In this project, my program waits to receive data at uncertain intervals; when the data is valid it has a known length in bytes.
I used to do it like this:
void receive()  // callback
{
    if (getSize() <= sizeof(DataStruct))
    {
        DataStruct *pData = malloc(sizeof(DataStruct));
        if (recvData(pData) > 0)
            list_add(globalList, pData);
    }
}
void worker()
{
    init();
    while (!isFinish)
    {
        dataProcess(globalList);
    }
}
Now, I want to migrate this old project to Python, and I tried to do it like this:
def receive():
    data = dataRecv()
    globalList.append(data)
However, all the items in the list end up the same, equal to the most recently received item. It is obvious that all the list items point to the same memory address, and I want to get a new memory address each time the function is called.
The equivalent of "new" in Python is to just use a constructor, e.g.:
new_list = list() # or [] - expandable heterogeneous list
new_dict = dict() # expandable hash table
new_obj = CustomObject() # assuming CustomObject has been defined
Since you are porting from C, some things to note.
Everything is an object in Python, including integers, and most variables are just references, but the rules for scalar variables such as integers and strings differ from those for containers, e.g.:
a = 2 # a is a reference to 2
b = a # b is a reference to 'a'
b = 3 # b now points to 3, while 'a' continues to point to 2
However:
alist = ['eggs', 2, 'juice'] # alist is reference to a new list
blist = alist # blist is a reference; changing blist affects alist
blist.append('coffee') # alist and blist both point to
# ['eggs', 2, 'juice', 'coffee']
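That reference behaviour is very likely what bit your receive() callback: if dataRecv() returns the same mutable buffer each time and overwrites it in place, every element of globalList ends up referring to that single object. A minimal sketch of the fix, assuming dataRecv() returns a reusable bytes-like buffer:
globalList = []

def receive():
    data = dataRecv()               # may hand back the same, reused buffer each call
    globalList.append(bytes(data))  # snapshot the current contents into a new object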
You can pre-allocate sizes if you'd like, but it often doesn't buy you much benefit in Python. The following is valid:
new_list4k = [None]*4096 # initialize to list of 4096 None's
new_list4k = [0]*4096 # initialize to 4096 0's
big_list = []
big_list.extend(new_list4k) # resizes big_list to accommodate at least 4k items
If you want to ensure memory leaks do not occur, use local variables as often as possible, e.g. within a function, so that as things go out of scope you don't have to worry.
For efficient vectorized operations (and much lower memory footprint) use numpy arrays.
import numpy as np
my_array = np.zeros(8192) # create a fixed-size array of 8192 elements
my_array += 4 # fills everything with 4
My added two cents:
I'd probably start by asking what your primary goal is. There is the pythonic way of doing things while trying to optimize for speed of execution or minimum memory footprint, and then there is the effort of trying to port a program in as little time as possible. Sometimes they all intersect, but more often you will find the pythonic way quick to translate but with higher memory requirements. Getting higher performance out of Python will probably take focused experience.
Good luck!
You should read the Python tutorial.
You can create lists, dictionaries, objects and closures in Python. All these live in the (Python) heap, and Python has a naive garbage collector (reference counting + marking for circularity).
(the Python GC is naive because it does not use sophisticated GC techniques; hence it is slower than e.g. OCaml's or many JVM generational copying garbage collectors; read the GC handbook for more; however, the Python GC is much more friendly to external C code)
Keep in mind that interpreted languages usually don't flatten types the way compiled languages do. The memory layout is (probably) completely different from that of the raw data. Therefore, you cannot simply cast raw data to a class instance or vice versa. You have to read the raw data, interpret it, and fill your objects manually.
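For example, the standard struct module can do that interpretation step; the field layout below is invented for illustration and would have to match your real DataStruct:
import struct
from dataclasses import dataclass

@dataclass
class DataStruct:            # hypothetical Python counterpart of the C struct
    sensor_id: int
    value: float

def parse(raw: bytes) -> DataStruct:
    # "<if" = little-endian: a 4-byte int followed by a 4-byte float
    sensor_id, value = struct.unpack("<if", raw)
    return DataStruct(sensor_id, value)

print(parse(struct.pack("<if", 7, 1.5)))   # DataStruct(sensor_id=7, value=1.5)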

Integers vs Floats in Python: Cannot understand the behavior

I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is that, unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the values of the integers from -5 to 256 for efficiency (ensuring that calls to id() will always be the same), since these are commonly used and can be effectively cached; however, nothing about the language requires this to be the case, and other implementations may choose not to do so.
Whenever you write a float literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an object exists already, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing x = 5.0 being reused for the value of x += 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress however, this is an implementation detail; it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
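In short, compare numbers with == and treat identity as noise; for instance (whether a is b prints True or False below is not guaranteed either way):
a = 5.0
b = 5.0
print(a == b)        # True: value equality is what you should rely on
print(a is b)        # may be True or False depending on how the objects were created
print(id(a), id(b))  # ids can even be reused once an object has been freed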
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Note: this is true only for CPython and may not apply to other Python implementations.

Deletion of a list in Python with and without the ':' operator

I've been working with Python for quite a bit of time, and I'm confused about a few issues in the areas of garbage collection and memory management, as well as the real deal with deleting variables and freeing memory.
>>> pop = range(1000)
>>> p = pop[100:700]
>>> del pop[:]
>>> pop
[]
>>> p
[100, 101, 102, ..., 699]
In the above piece of code, this happens. But,
>>> pop = range(1000)
>>> k = pop
>>> del pop[:]
>>> pop
[]
>>> k
[]
Here in the 2nd case, it implies that k is just another reference pointing to the list pop.
First Part of the question :
But what's happening in the 1st code block? Is the memory containing the elements [100:700] not getting deleted, or is it duplicated when the list p is created?
Second Part of the question :
Also, I've tried including gc.enable() and gc.collect() calls wherever possible, but there's no change in the memory utilization in either case. This is kind of puzzling. Isn't it bad that Python is not returning free memory back to the OS? Correct me if I'm wrong in the little research I've done. Thanks in advance.
Slicing a sequence results in a new sequence, with a shallow copy of the appropriate elements.
Returning the memory to the OS might be bad, since the script may turn around and create new objects, at which point Python would have to request the memory from the OS again.
1st part:
In the 1st code block, you create a new list into which the relevant elements of the old one are copied before the old one is emptied.
In the 2nd code block, however, you just assign a reference to the same object to another variable. Then you empty the list, which, of course, is visible via both references.
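A compact way to see both cases side by side:
pop = list(range(10))
p = pop[2:5]       # slicing builds a new list holding copies of the references
k = pop            # plain assignment: k is just another name for the same list

print(p is pop, k is pop)   # False True

del pop[:]         # empty the original list in place
print(pop, k, p)   # [] [] [2, 3, 4] -- only the slice survives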
2nd part: Memory is returned when appropriate, but not always. Under the hood of Python, there is a memory allocator which has control over where the memory comes from. There are 2 ways: via the brk()/sbrk() mechanism (for smaller memory blocks) and via mmap() (larger blocks).
Here we have rather smaller blocks which get allocated directly at the end of the data segment:
datadatadata object1object1 object2object2
If we only free object1, we have a memory gap which can be reused for the next object, but cannot easily be freed and returned to the OS.
If we free both objects, the memory could be returned. But there is probably a threshold for keeping memory back for a while, because returning everything immediately is rarely the best strategy.
