Setup:
I am running a python code where:
I open a file.
For every line in file, I create an object
Do some operations with the object
Note that once I am done with the operations part, I no longer need the object. Every new line is independent.
Relevant Code as per request:
I have commented all the parts of my code, leaving below the following code:
import gc
for l in range(num_lines):
    inp = f.readline()[:-1]
    collector = [int(i) for i in inp]
    M = BooleanFunction(collector)
    deg = M.algebraic_degree()
    del M
    gc.collect()
The problem:
The object once created, is consuming some amount of memory. After performing the operations, I am not able to free it. So while looping over the file, my memory keeps getting accumulated with new objects, and by around 793 lines into the file, my 16 GB of RAM is completely depleted.
What I have tried:
Using the garbage collector:
import gc
del Object
gc.collect()
However, the garbage collector will not free up the RAM (or) python is not giving up the memory to the system. Creating child-processes is an idea, but not what I am up for.
Questions:
Is there any way I can free up all the memory currently occupied by the program to the OS? That means removing all variables (loop vars, global vars, etc). Something similar to what happens when you press CTRL+C to terminate the program, it returns all the memory to the OS.
A way to specifically de-allocate an object (If I am not doing it right).
Previous questions do not answer what if gc.collect() fails to do so and how do I completely give up the memory allocated.
Objects in Python can be garbage-collected once their reference count drops to zero.
Looking at your code, every variable gets re-assigned in every iteration. So their reference count should be zero.
If that doesn't happen then I can see three main possibilities:
1. You are unwittingly keeping a reference to that object.
2. Garbage collection is disabled (gc.disable()) or frozen (gc.freeze() in Python 3.7).
3. The objects are made by a Python extension written in C that manages its own memory.
Note that (1) or (2) doesn't have to happen in your code. It can also happen in modules that you use.
In your case (2) should not be an issue since you force garbage collection.
For an example of (1), consider what would happen if BooleanFunction was memoized. Then a reference to each object (that you wouldn't see and can't delete) would be kept.
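The memoization case can be sketched as follows. This is only an illustration: BooleanFunctionStandIn and make_function are hypothetical names, and nothing here claims that the real BooleanFunction is memoized. It shows that del removes your name for the object while the cache still holds a reference.

```python
import functools
import weakref


class BooleanFunctionStandIn:
    """Hypothetical stand-in for the real class; not its actual implementation."""

    def __init__(self, bits):
        self.bits = bits


@functools.lru_cache(maxsize=None)
def make_function(bits):
    # The cache keeps a strong reference to every object it returns.
    return BooleanFunctionStandIn(bits)


obj = make_function((0, 1, 1, 0))
ref = weakref.ref(obj)  # lets us observe the object without keeping it alive

del obj
alive_while_cached = ref() is not None  # the hidden cache reference keeps it alive

make_function.cache_clear()
alive_after_clear = ref() is not None  # on CPython the object is gone now
```

If something like this is going on inside a library, clearing or bounding its cache (when it offers a way to do so) is what actually releases the objects.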
The only way to give all memory back to the OS is to terminate the program.
Edit 1:
Try running your program with the garbage collection debug flags enabled (gc.set_debug(gc.DEBUG_LEAK)). Call gc.get_count() at the end of every loop iteration, and maybe inspect the gc.garbage list as well.
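A minimal sketch of those diagnostics, assuming a toy Node class and make_cycle helper (both hypothetical) in place of the real workload. It uses gc.DEBUG_SAVEALL, which is one of the flags included in gc.DEBUG_LEAK; the full DEBUG_LEAK flag additionally prints details to stderr.

```python
import gc


class Node:
    """Toy class used only to build a reference cycle."""


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now keep each other alive


# DEBUG_SAVEALL stores everything the collector finds in gc.garbage
# instead of freeing it, so it can be inspected afterwards.
gc.set_debug(gc.DEBUG_SAVEALL)

counts = []
for _ in range(3):
    make_cycle()
    gc.collect()
    counts.append(gc.get_count())  # (gen0, gen1, gen2) allocation counters

garbage_types = {type(o).__name__ for o in gc.garbage}

# Restore normal behaviour and drop the saved objects.
gc.set_debug(0)
gc.garbage.clear()
```

Looking at which types show up in garbage_types tells you whether reference cycles involving your own classes are being created in the loop.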
For a better understanding of where the memory allocation happens and what exactly happens, you could run your script under the Python debugger. Step through the program line by line while monitoring the resident set size of the Python process with ps in another terminal.
Related
If I run a function in Python 3 (func()) is it possible that objects that are created inside func() but cannot be accessed after it has finished would cause it to increase its memory usage?
For instance, will running
def func():
    # Objects being created, that are not able to be used after function call has ended.
    pass
while True:
    func()
ever cause the program to run out of memory, no matter what is in func()?
If the program is continually using memory, what are some possible things that could be going on in func() to cause it to continue using memory after it has been called?
Edit:
I'm only asking about creating objects that can no longer be accessed after the function has ended, so they should be deleted.
Yes, it is possible for a Python function to still use memory after being
called.
Python uses garbage collection (GC) for memory management. Most GCs (I suppose
there could be some exceptions) make no guarantee if or when they will free
the memory of unreferenced objects. Say you have a function
consume_lots_of_memory() and call it as:
while True:
    consume_lots_of_memory()
There is no guarantee that all of the memory allocated in the first call
to consume_lots_of_memory() will be released before it is called a
second time. Ideally the GC would run after the call finished, but it
might run half way through the fifth call. So depending on when the GC
runs, you could end up consuming more memory than you would expect and
possibly even run out of memory.
Your function could be modifying global state, and using large amounts of
memory that never gets released. Say you have a module level cache, and a
function cache_lots_of_objects() called as:
module_cache = {}
while True:
    cache_lots_of_objects()
Every call to cache_lots_of_objects() only ever adds to the cache, and
the cache just keeps consuming more memory. Even if the GC promptly
releases the non-cached objects created in cache_lots_of_objects(), your
cache could eventually consume all of your memory.
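One possible way to keep such a cache from growing without bound is to cap its size. The sketch below uses functools.lru_cache with a maxsize; the function name expensive and the limit of 128 are arbitrary choices for illustration, not something taken from the question.

```python
import functools


@functools.lru_cache(maxsize=128)
def expensive(n):
    # Placeholder for work whose result is worth caching.
    return [n] * 10


for n in range(10_000):
    expensive(n)

# Only the 128 most recently used results are kept; older ones are evicted.
size = expensive.cache_info().currsize
```

With an unbounded dictionary the same loop would leave 10,000 entries in memory.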
You could be encountering an actual memory leak from Python itself (unlikely
but possible), or from a third-party library improperly using the C API, using
a leaky C library, or incorrectly interfacing with a C library.
One final note about memory usage. Just because Python has freed allocated
objects, it does not necessarily mean that the memory will be released from the process
and returned to the operating system. The reason has to do with how memory is
allocated to a process in chunks (pages). See abarnert's answer
to Releasing memory in Python
for a better explanation than I can offer.
Looking for a verified solution for this problem in python.
I have code like this:
import gc
verySensitveData = "secret, big secret"
# use verySensitveData in the code
# it is not needed any more
del verySensitveData  # now the variable is unusable in later code
collected = gc.collect()  # collected and removed
Now it should be gone from RAM when gc is called.
Does this force the OS to automatically erase the data at the address used by the verySensitveData variable when the GC runs?
Will it be gone for good, so that no RAM dump can retrieve the data that was in verySensitveData?
No. gc.collect() only causes Python to check for objects that are referenced but unreachable (e.g., where two objects refer to each other, but nothing else does). It does not wipe or overwrite the memory that freed objects occupied.
If making your program resistant to memory dumping is important, Python is not the right language to be writing it in. Python makes very few guarantees about how data will be stored in memory, and it is very likely that any string you process will be copied around in memory in the course of processing it, which may leave partial or complete copies of your string in memory. Python may reuse that memory or release it to the OS later, but it will not take any special measures to wipe it.
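For what it is worth, a commonly suggested partial mitigation is to hold the secret in a mutable bytearray and overwrite it in place when done. The sketch below assumes the secret never has to be converted to str or bytes; any such conversion, or any library that copies the buffer, leaves copies that this cannot reach, so it does not make a program dump-proof.

```python
# Keep the secret in a mutable buffer rather than an immutable str.
secret = bytearray(b"secret, big secret")
length = len(secret)

# ... use `secret` here without converting it to str or bytes ...

# Overwrite the buffer in place; this only clears this one buffer.
for i in range(length):
    secret[i] = 0
```

Strings are immutable in Python, which is why the same trick cannot be applied to the verySensitveData variable as written in the question.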
I made simple loop which for each iteration, appends a number to a list.
After the program has completed, will the memory used by the list be automatically freed?
if __name__ == '__main__':
    list = []  # assumed initialization, missing from the snippet as posted
    for i in range(100000):
        list.append(i)
Can anyone explain this to me, please?
Yes, all memory is freed when the program is terminated. There is just no way in a modern operating system for you to reserve memory and not have it freed when the process is terminated.
The garbage collector is for freeing memory before the program terminates. This way long-running programs won't reserve all the resources of the computer.
If you have a big data structure, and you want the garbage collector to take care of it (free the memory used by it), you should remove all references to it after you're done using it. In this case a simple del list would be sufficient.
Yes, the operating system takes care and frees the memory used by a process after it terminates. Python has nothing to do with it.
However, Python itself has automatic garbage collection, so it frees all of the memory that is no longer necessary while your Python program is running.
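Finally, you probably should just use:
if __name__ == '__main__':
    list = range(100000)
to achieve the same thing you have written (in Python 3, range() returns a lazy range object, so pass it to the list constructor if you need an actual list).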
Yes, the memory will be freed when the program terminates, and on most implementations it will also be freed when the reference count of that list reaches zero, i.e., when there are no variables in scope pointing to that value.
You can also manually control the GC using the gc module.
If you are just iterating over the list and the list is big enough to get you worried about memory consumption, you probably should check Python generators and generator expressions.
So instead of:
results = []
for i in range(100000):
    results.append(do_something_with(i))
for result in results:
    do_something_else_with(result)
You can write:
partial = (do_something_with(i) for i in range(100000))
for result in partial:
    do_something_else_with(result)
Or:
def do_something(iterable):
    for item in iterable:
        yield some_calculation(item)
def do_something_else(iterable):
    for item in iterable:
        yield some_other_calculation(item)
partial = do_something(range(100000))
for result in do_something_else(partial):
    print(result)
And so on... This way you don't have to allocate the whole list in memory.
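A rough way to see the difference is to compare the size of a list with the size of the equivalent generator object. The squaring expression below is just a placeholder for the real calculation, and note that sys.getsizeof reports only the container itself, not the integers it refers to.

```python
import sys

# The list holds references to all 100000 results at once.
squares_list = [i * i for i in range(100000)]

# The generator holds only its own state and produces one value at a time.
squares_gen = (i * i for i in range(100000))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

# Consuming the generator still yields the same values.
total = sum(squares_gen)
```

The generator's size stays the same no matter how many items it will produce, while the list grows with the number of elements.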
You needn't worry about freeing memory in Python, since this is an automatic feature of the language. Python uses reference counting to manage memory: when something is no longer being referenced, Python will automatically deallocate its memory. In other words, removing all references to the list should be enough to free the memory that was allocated to it.
That said, if your program is huge, you may use gc.collect() to make the garbage collector free some memory. However, this is generally unnecessary, as Python's garbage collector is designed to do its job well enough.
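The case where gc.collect() actually matters is a reference cycle, which reference counting alone cannot free. Here is a small sketch on CPython, with a hypothetical Node class; automatic collection is switched off temporarily only so that the example behaves deterministically.

```python
import gc
import weakref


class Node:
    """Toy class used to build a reference cycle."""


gc.disable()  # keep the automatic collector from running mid-example

a, b = Node(), Node()
a.other, b.other = b, a  # each object references the other
ref = weakref.ref(a)

del a, b
alive_before_collect = ref() is not None  # refcounts never reached zero

collected = gc.collect()  # number of unreachable objects found
alive_after_collect = ref() is not None

gc.enable()
```

Objects that are not part of a cycle never get this far: they are freed as soon as their last reference goes away.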
Moreover, although this is not recommended and rarely useful, you may also disable Python's automatic cyclic garbage collector using gc.disable(). Reference counting keeps working, so most objects are still freed as before; you then decide yourself when to call gc.collect() to clean up reference cycles.
All the memory will be freed after termination, but if you want to be efficient when creating a list this large during execution, use xrange instead (Python 2), which generates the numbers on the fly; in Python 3, range already behaves this way.
alist = [i for i in xrange(10000)]
Somehow my Python program takes more and more memory as it runs (the VIRT and RES columns of the "top" command keep increasing).
However, I double checked my code extremely carefully, and I am sure that there are no memory leaks (I didn't use any dictionary or global variables; it's just a main method calling a sub method a number of times).
I used heapy to profile my memory usage by
from guppy import hpy
heap = hpy()
.....
print(heap.heap())
each time the main method calls the sub method. Surprisingly, it always gives the same output. But the memory usage just keeps growing.
I wonder if I didn't use heapy right, or VIRT and RES in "top" command do not really reflect the memory my code uses?
Or can anyone provide a better way to track down the memory usage in a Python script?
Thanks a lot!
Two possible cases:
your function is pure Python, in which case possible causes include
you are storing an increasing number of large objects
you have cycles of objects with a __del__ method, which the gc won't touch (this applies to Python versions before 3.4)
I'd suggest using the gc module, in particular gc.garbage and the gc.get_objects() function (see http://docs.python.org/library/gc.html#module-gc), to get a list of existing objects. You can then introspect them, for instance by looking at the __class__ attribute of each object to get information about the object's class.
your function is at least partially written in C / C++, in which case the problem is potentially in that code. The advice above still applies, but you won't be able to see all leaks: you will see leaks caused by missing calls to Py_DECREF, but not low-level C/C++ allocations without a corresponding deallocation. For this you will need valgrind. See this question for more info on that topic.
Working in Python. I have a function that reads from a queue and creates a dictionary based on some of the XML tags in the record read from the queue, and returns this dictionary. I call this function in a loop forever. The dictionary gets reassigned each time. Does the memory previously used by the dictionary get freed at each reassignment, or does it get orphaned and eventually cause memory problems?
def readq():
    qtags = {}
    # Omitted code to read the queue record, get XML string, DOMify it
    qtags['result'] = "Success"
    qtags['call_offer_time'] = get_node_value_by_name(audio_dom, 'call_offer_time')
    # More omitted code to extract the rest of the tags
    return qtags
while signals.sigterm_caught == False:
    tags = readq()
    if tags['result'] == "Empty":
        time.sleep(SLEEP_TIME)
        continue
    # Do stuff with the tags
So when I reassign tags each time in that loop, will the memory used by the previous assignment get freed before being allocated by the new assignment?
The memory of an object will be freed if it can be proven (from the knowledge the language implementation has at runtime) that it cannot possibly be accessed any more and the garbage collector sees it fit to make a collection. That's the absolute minimum, and you shouldn't assume any more. And you usually shouldn't have to worry about anything more.
More practically speaking, it may be freed at some point in time between the last reference (where "reference" isn't limited to names in scope, but can be anything that makes the object reachable) being removed and memory running out. It doesn't have to be freed by the Python implementation running your code, it may as well leave the memory cleaning to the OS and forget about any finalizers and such. Note that there can be a noticeable delay between the last reference dying and memory usage actually dropping. But as mentioned before, most implementations go out of their way to avoid excessive memory usage if there is garbage to collect.
Even more practically, you'll probably be running this on CPython (the reference implementation), which has always used and most probably will always use reference counting (augmented with a real GC to handle cyclic references), so unless there's a cyclic reference (relatively rare and your code doesn't look like it has them, but they can occur e.g. in graph-like structures) it will be freed as soon as the last reference to it is deleted/overwritten. Of course, other implementations aren't that predictable - PyPy alone has half a dozen different garbage collectors, all but one falling under the above paragraph.
No, it will be freed AFTER the new object has been created.
In order for the reference count to go down on the old object, tags has to be pointed to the new object. This happens after readq returns, so at the very least both objects will exist from the beginning of qtags = {} to after tags = readq().
As @delnan stated, soon after tags has been pointed to the new object, the old one will be freed by the garbage collector as there is no longer a reference to it.
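This ordering can be observed on CPython with a weak reference. The sketch below is a simplified stand-in for the question's loop: Tags is a hypothetical dict subclass (plain dicts cannot be weakly referenced), and readq here only builds a small dictionary.

```python
import weakref


class Tags(dict):
    """dict subclass, needed because plain dicts do not support weak references."""


old_ref = None
alive_during_call = []


def readq():
    qtags = Tags(result="Success")
    # While the new dict is being built, `tags` still points at the previous one.
    if old_ref is not None:
        alive_during_call.append(old_ref() is not None)
    return qtags


tags = readq()
old_ref = weakref.ref(tags)  # watch the first dict without keeping it alive

tags = readq()  # rebinding drops the last reference to the first dict
freed_after_rebind = old_ref() is None
```

So both dictionaries briefly exist at the same time, and the old one goes away right after the assignment, not before it.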
Usually Python can keep up with anything you throw at it. The Garbage collector used in Python uses reference counting, so your memory usage should be about constant, you won't see any spikes in memory. Right when you remove a reference (assign the variable to something else), the garbage collector throws the memory back into the "heap" if you will. So don't worry about memory. I have run simulators doing tests for hours rewriting variables, but the memory usage stays about the same. It will be freed when you assign it a new dictionary.