Okay, I have this concept of a class that would let other classes import classes on an as-needed basis, rather than requiring everything to be imported up front. How would I go about implementing it? Or does the Python interpreter already do this in some way? Does it remove unused classes from memory, and if so, how?
I know C++/C are very memory-oriented, with pointers and all that, but is Python? And I'm not saying I have a problem with it; I, more or less, want to make a modification to it for my program's design. I want to write a large program that uses hundreds of classes and modules. But I'm afraid that if I do this I'll bog the application down, since I have no understanding of how Python handles memory management.
I know it is a vague question, but if somebody would link me to or point me in the right direction it would be greatly appreciated.
Python -- like C#, Java, Perl, Ruby, Lua and many other languages -- uses garbage collection rather than manual memory management. You just freely create objects and the language's memory manager periodically (or when you specifically direct it to) looks for any objects that are no longer referenced by your program.
So if you want to hold on to an object, just hold a reference to it. If you want the object to be freed (eventually) remove any references to it.
def foo(names):
    for name in names:
        print(name)

foo(["Eric", "Ernie", "Bert"])
foo(["Guthtrie", "Eddie", "Al"])
Each of these calls to foo creates a Python list object initialized with three values. For the duration of the foo call they are referenced by the variable names, but as soon as that function exits no variable is holding a reference to them and they are fair game for the garbage collector to delete.
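To make the collection visible, here is a minimal sketch (the NoisyList subclass is purely illustrative): in CPython the reference count hits zero the moment foo returns, so __del__ fires immediately; other implementations may delay it.

class NoisyList(list):
    def __del__(self):
        # runs when the last reference to the list disappears
        print("list collected")

def foo(names):
    for name in names:
        print(name)

foo(NoisyList(["Eric", "Ernie", "Bert"]))
# "list collected" prints right after the call returns (in CPython)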
x = 10
print(type(x))
# memory manager (MM): x points to an int object with value 10

y = x
if id(x) == id(y):
    print('x and y refer to the same object')
# MM: y points to the same 10 object

x = x + 1
if id(x) != id(y):
    print('x and y refer to different objects')
# MM: x now points to a new object, 11; the previously referenced object
# would be freed here if nothing else referenced it

z = 10
if id(y) == id(z):
    print('y and z refer to the same object')
else:
    print('y and z refer to different objects')
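As an aside, the y/z sharing above is a CPython implementation detail: small integers (roughly -5 through 256) are cached and shared, while larger values generally are not. A quick check:

a = 256
b = int("256")
print(a is b)    # True in CPython: small ints are cached and shared

c = 257
d = int("257")
print(c is d)    # typically False: 257 is outside the cache range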
Python memory management is divided into two parts:
Stack memory
Heap memory
Function calls and local variable references live in stack memory; the objects themselves (and instance-variable values) live in heap memory.
In stack memory, a stack frame is created whenever a function or method is called. These stack frames are destroyed automatically when the function/method returns.
Python also has a garbage-collection mechanism: once a function returns and its local references are gone, the garbage collector can clear the dead objects.
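A rough illustration of that split, using only the standard library: the list lives on the heap, while the local name data that references it lives in work()'s stack frame and disappears with it.

import sys

def work():
    data = [0] * 1_000_000  # the object is on the heap, referenced from the frame
    # 2 references: the local name `data` plus getrefcount's own argument
    print(sys.getrefcount(data))
    return len(data)

work()
# once work() returns, its frame (and the `data` reference) is gone, so
# CPython frees the million-element list immediately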
Read through the following article about Python memory management:
Python : Memory Management (updated to version 3)
Excerpt (examples can be found in the article):
Memory management in Python involves a private heap containing all
Python objects and data structures. The management of this private
heap is ensured internally by the Python memory manager. The Python
memory manager has different components which deal with various
dynamic storage management aspects, like sharing, segmentation,
preallocation or caching.
At the lowest level, a raw memory allocator ensures that there is
enough room in the private heap for storing all Python-related data by
interacting with the memory manager of the operating system. On top of
the raw memory allocator, several object-specific allocators operate
on the same heap and implement distinct memory management policies
adapted to the peculiarities of every object type. For example,
integer objects are managed differently within the heap than strings,
tuples or dictionaries because integers imply different storage
requirements and speed/space tradeoffs. The Python memory manager thus
delegates some of the work to the object-specific allocators, but
ensures that the latter operate within the bounds of the private heap.
It is important to understand that the management of the Python heap
is performed by the interpreter itself and that the user has no
control over it, even if she regularly manipulates object pointers to
memory blocks inside that heap. The allocation of heap space for
Python objects and other internal buffers is performed on demand by
the Python memory manager through the Python/C API functions listed in
this document.
My 5 cents:
Most importantly, Python frees memory per object, and in Python everything is an object: int, float, string, [], {} and () are all objects (classes themselves usually stay alive, since the module that defines them keeps a reference to them). That means that as soon as your program no longer references an object, it becomes a candidate for garbage collection.
Though Python uses reference counting and a GC to free memory (for objects that are no longer in use), this freed memory is not necessarily returned to the operating system (on Windows it is a somewhat different case). Freed memory chunks go back to the Python interpreter, not to the operating system, so ultimately your Python process keeps holding the same memory. However, Python will reuse that memory to allocate other objects.
A very good explanation of this is given at: http://deeplearning.net/software/theano/tutorial/python-memory-management.html
Yes, it's the same behaviour in Python 3 as well.
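You can watch this from the process itself. A rough, Linux-only sketch (the rss_mb helper is ad hoc, reading /proc, and exact numbers vary with the allocator and Python version):

def rss_mb():
    # current resident set size in MiB, read from the Linux proc filesystem
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) // 1024

print("start:", rss_mb())
big = [str(i) for i in range(3_000_000)]
print("after alloc:", rss_mb())
del big
print("after del:", rss_mb())    # usually does not drop all the way back
big = [str(i) for i in range(3_000_000)]
print("realloc:", rss_mb())      # roughly the same peak: freed space was reused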
Related
I use a module (that I cannot modify) which contains a method that I need to use. This method returns 10GB of data, but also allocates 8GB of memory that it does not release. I need to use this method at the start of a script that runs for a long time, and I want to make sure the 8GB of memory are released after I run the method. What are my options here?
To be clear, the 8GB do not get reused by the script - i.e. if I create a large numpy array after running the method, extra memory is allocated for that numpy array.
I have considered running the method in a separate process using the multiprocessing module (and returning the result), but I run into problems serializing the method's large result: 10GB cannot be pickled by the default pickler, and even if I force multiprocessing to use pickle protocol 4, pickling has a very large memory overhead. Is there anything else I could do without being able to modify the offending module?
Edit: here is an example
from dataloader import dataloader1
result = dataloader1.get("DATA1")
As I understand it, dataloader is a Python wrapper around some C++ code using pybind11. I do not know much more about its internal workings. The code above results in 18GB being used. If I then run
del result
10GB gets freed up correctly, but 8GB remains in use (even though seemingly no Python objects exist any more).
Edit2: If I create a smallish numpy array (e.g. 3GB), memory usage stays at 8GB. If I delete it and instead create a 6GB numpy array, memory usage goes to 14GB and comes back down to 8GB after I delete it. I still need the 8GB released to the OS.
Can you modify the function?
If the memory is held by some module, try reloading that module (importlib.reload), which should release the memory.
If the memory is not released by the gc, it is probably because an object is stored in the class that created it, so one option is to find (by profiling) which big attribute on the class instance it is and assign it to None, which may cause the gc to release the memory.
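One more option for the original question, if the result can be viewed as a fixed-shape numpy array whose shape and dtype you know up front: run the loader in a child process and hand the 10GB back through multiprocessing.shared_memory (Python 3.8+) instead of pickling it, so the 8GB of scratch memory dies with the child. This is only a sketch; dataloader1.get comes from the question, and everything else here is illustrative.

import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def _load(shm_name, shape, dtype):
    from dataloader import dataloader1        # the module from the question
    result = dataloader1.get("DATA1")
    shm = SharedMemory(name=shm_name)
    np.ndarray(shape, dtype=dtype, buffer=shm.buf)[:] = result  # copy out
    shm.close()
    # child exits here: its extra 8GB goes back to the OS with the process

def load_in_child(shape, dtype):
    size = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = SharedMemory(create=True, size=size)
    p = Process(target=_load, args=(shm.name, shape, dtype))
    p.start()
    p.join()
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, arr   # caller must keep shm alive, then close() and unlink()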
Python uses two different mechanisms to free memory.
Reference counting, which is employed primarily and deallocates memory as soon as it is no longer needed (e.g. an object goes out of scope).
Garbage collector, which is secondary and is used to collect objects with cyclic references (a -> b -> c -> a). It can be triggered manually with gc.collect(); otherwise Python itself decides when to free memory.
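A minimal sketch of both mechanisms (weakref is used here only to observe when the objects actually die; the timing shown is CPython's):

import gc
import weakref

class Node:
    pass

a = Node()
r = weakref.ref(a)
del a                      # refcount hits zero: freed immediately (mechanism 1)
print(r() is None)         # True

b, c = Node(), Node()
b.other, c.other = c, b    # reference cycle: b -> c -> b
r = weakref.ref(b)
del b, c                   # refcounts never reach zero
print(r() is None)         # False: the cycle keeps both objects alive
gc.collect()               # mechanism 2: the cycle detector reclaims them
print(r() is None)         # True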
However, I would strongly suggest profiling and changing the code so that it does not use as much memory. Perhaps look into streams, or use a database.
I am coming from C++, where I worked with heap memory: anything I created on the heap with the 'new' keyword I also had to delete. I am always confused about what I should do in Python about heap memory to stop memory leaks. Please recommend a text covering the details of Python memory allocation and deletion. Thanks.
You do not have to do anything: Python first of all uses reference counting. This means that for every object it holds a counter that is incremented when you reference that object through a new variable, and decremented when you let a variable point to something else. Once the counter hits zero, the object is deleted (or scheduled for deletion).
This is not enough, however, since two objects can reference each other, and thus even if no other variables refer to them, these objects keep each other alive. For that, Python has an (optional) garbage collector that does cycle detection. When such cycles are found, the objects are deleted. You can trigger such a collection by calling gc.collect().
In short: Python takes care of memory management itself. Of course it is your task to make sure objects can be released; for instance, it is wise not to refer to a large object for longer than necessary. You can do this, for instance, by using the del keyword:
foo = ... # some large object
# ...
# use foo for some tasks
del foo
# ...
# do some other tasks
By using del we removed the foo variable, and thus decremented the reference counter of the object to which foo referred. As a result, the object foo referred to can be scheduled for removal (earlier). Of course compilers/interpreters can do liveness analysis and perhaps figure out themselves that you no longer use foo, but better safe than sorry.
So in short: Python manages memory itself using reference counting and a garbage collector; the thing you have to worry about is not keeping objects "alive" when they are no longer necessary.
Python is a high-level language, and you need not worry about memory deallocation here: it is the responsibility of the Python runtime to manage memory allocation and deallocation.
I've tried the multiprocessing module's Manager and Array, but they can't meet my needs.
Is there a method just like shared memory in Linux C?
Not as such.
Sharing memory like this in the general case is very tricky. The CPython interpreter does not relocate objects, so they would have to be created in situ within the shared memory region. That means shared memory allocation, which is considerably more complex than just calling PyMem_Malloc(). In increasing order of difficulty, you would need cross-process locking, a per-process reference count, and some kind of inter-process cyclic garbage collection. That last one is really hard to do efficiently and safely. It's also necessary to ensure that shared objects only reference other shared objects, which is very difficult to do if you're not willing to relocate objects into the shared region. So Python doesn't provide a general purpose means of stuffing arbitrary full-blown Python objects into shared memory.
But you can share mmap objects between processes, and mmap supports the buffer protocol, so you can wrap it up in something higher-level like array/numpy.ndarray or anything else with buffer protocol support. Depending on your precise modality, you might have to write a small amount of C or Cython glue code to rapidly move data between the mmap and the array. This should not be necessary if you are working with NumPy. Note that high-level objects may require locking which mmap does not provide.
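A minimal sketch of that mmap-plus-ndarray approach (Unix-only, since it relies on os.fork, and with none of the locking the note above warns about):

import mmap
import os
import numpy as np

n = 1024
buf = mmap.mmap(-1, n * 8)                   # anonymous mapping, shared across fork
arr = np.frombuffer(buf, dtype=np.float64)   # zero-copy array view over the mmap

pid = os.fork()
if pid == 0:                                 # child: write through the shared view
    arr[:] = np.arange(n)
    os._exit(0)
os.waitpid(pid, 0)
print(arr[:5])                               # parent sees the child's writes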
I have a question about the virtual memory in Python.
When the process is consuming a relatively large amount of memory, it doesn't "release" the unused memory. For example, after creating a massive list of strings, say the list uses 30MB of memory, the entire process takes roughly 40MB. When the list is deleted, the process still consumes 40MB; but if another list with the same amount of data is created, the process will not take more memory, because it will reuse the virtual memory that is available but was not released to the OS.
My question is: what kind of data will reuse that non-released virtual memory? I mean, that 30MB was "taken" from the OS when I created a list of strings, and even when I delete it, the next list of strings will not take more memory from the OS as long as it fits in the 30MB. But if, instead of a list of strings, another type of data is created, like a QPixmap (from Qt, using PyQt), will it use that 30MB originally allocated by the list of strings?
Thank you in advance.
Edit: Well, this question sounds lazy. I know I could simply test this specific case, but I want to know the theory: I don't want the answer for this specific "list of strings and QPixmap" case, but in general.
At the C level (CPython's implementation), anything that is allocated on the heap with malloc() will consume memory and this memory will not be released to the OS when that memory is freed with free(). It will only be returned when the process dies. But when new blocks are allocated with malloc() they will use the freed-up memory.
(Unless the free memory is really badly fragmented and there is not enough contiguous free space in the freed-up zones to accommodate new allocations. But let's not worry about this pathological case.)
Every Python object is implemented by CPython as one or more blocks of memory allocated with malloc() so the answer to your question is: pretty much any piece of Python data can reuse the space that was freed by the deallocation of some other piece of Python data.
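A quick (CPython-specific, and not guaranteed) way to see that reuse, relying on the fact that id() in CPython is the object's memory address:

class Blob:
    pass

x = Blob()
addr = id(x)           # in CPython, the object's memory address
del x                  # refcount hits zero; the block goes back to the allocator
y = Blob()
print(id(y) == addr)   # often True: the freed block is handed straight back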
There are two parts to the problem of "freeing" memory: first, getting Python to garbage collect the objects, and second, getting unused memory returned to the OS at the C level.
If you are having problems with process size growing without bound, you are almost certainly not allowing objects to be garbage collected. 99.9% of the time (to 0 significant digits :) ), if you are trying to second-guess Python's C-level memory management, you are down a rabbit hole.
Remember that in Python your objects are not even candidates to be garbage collected until there are no more live objects with references to them. You can very easily squirrel away a reference to an object somewhere without realizing it.
There's a Python tool called Dowser that is very helpful at finding leaks of memory caused by keeping around references to objects. If you see your object count for a certain class growing without bounds over time.... there's your memory problem.
Good luck!
Somehow my Python program takes more and more memory as it runs (the VIRT and RES columns of the "top" command keep increasing).
However, I double-checked my code extremely carefully, and I am sure that there are no memory leaks (I didn't use any dictionaries or global variables; it's just a main method calling a sub-method a number of times).
I used heapy to profile my memory usage by
from guppy import hpy

heap = hpy()
# ...
print(heap.heap())
each time the main method calls the sub method. Surprisingly, it always gives the same output. But the memory usage just keeps growing.
I wonder whether I didn't use heapy right, or whether VIRT and RES in the "top" command do not really reflect the memory my code uses.
Or can anyone provide a better way to track down the memory usage in a Python script?
Thanks a lot!
Two possible cases:
Your function is pure Python, in which case possible causes include:
you are storing an increasing number of large objects;
you have cycles of objects with a __del__ method, which the gc won't touch.
I'd suggest using the gc module and the gc.garbage list and gc.get_objects() function (see http://docs.python.org/library/gc.html#module-gc) to get the list of existing objects; you can then introspect them, for instance by looking at each object's __class__ attribute to get information about its class. A small census sketch is given below.
Your function is at least partially written in C/C++, in which case the problem is potentially in that code. The advice above still applies, but won't show all leaks: you will see leaks caused by missing calls to Py_DECREF, but not low-level C/C++ allocations without a corresponding deallocation. For those you will need valgrind. See this question for more info on that topic.
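A small census sketch along the lines of the gc advice above: tally live objects by class between checkpoints, and a class whose count only ever grows is your suspect.

import gc
from collections import Counter

def object_census(limit=10):
    # count live objects tracked by the gc, grouped by class name
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    for name, n in counts.most_common(limit):
        print(f"{n:8d}  {name}")

object_census()   # call periodically, e.g. once per iteration of the main loop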