I'm having a problem with releasing the memory of a dictionary in Python.
I ran the following check and watched the process's memory usage:
import gc

a = dict()
for i in xrange(1000000):
    a[i] = i
for i in xrange(1000000):
    del a[i]
gc.collect()
The memory usage after running those lines is much higher than before.
How can I release all of the memory?
Note that I don't want to delete the dict itself.
Thanks.
Simply removing all the elements from the dictionary is not going to release its memory. Python uses reference counting: only when the reference count of an object drops to 0 does it become eligible for garbage collection. So your best bet is to stop the name a from referring to the actual dictionary, like this:
a = None
If the dictionary has no other references, the dictionary formerly referred to by a will be garbage collected automatically.
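To make the effect visible, here is a minimal sketch using only the standard library; the exact count printed is CPython-specific:

import gc
import sys

a = {i: i for i in range(1000000)}
print(sys.getrefcount(a))  # 2: the name `a` plus getrefcount's own argument

a = None      # drop the only reference; CPython frees the dict immediately
gc.collect()  # only needed for reference cycles; harmless here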
I have a list which will get really big, so I will save the list to my HDD and continue with an empty list. My question is: when I do myList = [], will the old data be deleted, or will it remain somewhere in RAM? I fear that the pointer of myList will just point somewhere else and the old data will not be touched.
from scipy.io import savemat  # savemat as in the original snippet

myList = []
for i in range(bigNumber1):
    for k in range(bigNumber2):
        myList.append(bigData(i, k))
    savemat("data" + str(i), {"data": myList})
    myList = []
Good day.
In Python and many other programming languages, objects that are no longer referenced will be collected by the garbage collector, a feature that looks for such objects and clears them from memory. How this is done under the hood can be read about in more detail here:
https://stackify.com/python-garbage-collection/
Happy coding!
Python uses Garbage Collection for memory management (read more here).
The garbage collector attempts to reclaim memory which was allocated
by the program, but is no longer referenced—also called garbage.
So your data will automatically be deleted. However, if you want to be sure that the memory is freed at a particular point, you can call the GC directly with
import gc
gc.collect()
This is not recommended though.
Please observe this simple code:
import random
while True:
    L = list(str(random.random()))
Question: if I let this run, will python run out of memory?
Reason I am asking:
On the first iteration of this loop, a list is created and L is assigned to it. On the next iteration, another list is created, L is yanked from the previous list and assigned to the new one. The previous list has lost its reference. Is the previous list going to be garbage collected? If not at the end of each iteration, then eventually, I hope?
Having said that, just expand the scenario a bit further into multiprocessing:
import random
import multiprocessing

q = multiprocessing.Queue()
while True:
    l1 = list(str(random.random()))
    q.put(l1)  # pseudo-code in the original: multiprocessing.Queue.put(l1)
    # how is l1 handled here?
    # is l1 .copy()-ed to the queue or referenced by the queue?
    # is l1 destroyed in this process (this while loop) at the end of each iteration?
The primary means of garbage collection is reference counting in CPython (the reference implementation of the language). When there are no longer any references to an object, the memory it occupies is freed immediately and can be reused by other Python objects. (It may or may not ever be released back to the operating system.) There are a few exceptions of objects that are never freed: smallish integers, interned strings (including literals), the empty tuple, None.
So to answer your initial question, L is going to be reassigned to a new list on each iteration. At that point, the previous list has no references and its memory will be released immediately.
With regard to your second example, putting something into a multiprocessing queue is, of necessity, a copy operation. The object must be serialized ("pickled" in Python parlance) to be sent to the new process, which has its own memory space and can't see anything from the original process's memory. When, in your loop, you reassign l1 to the next list, the previous list has no references and, again, will be released.
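A rough way to see the copy semantics without spinning up a process is to pickle the list directly, which is essentially what the queue does before handing the data to the other process; a minimal sketch:

import pickle

l1 = [1, 2, 3]
payload = pickle.dumps(l1)    # the queue serializes ("pickles") the object like this
l1.append(4)                  # mutating the original afterwards...
print(pickle.loads(payload))  # [1, 2, 3] -- ...does not change the serialized copy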
At the end of your loop, the L or l1 variable still refers to a list: the one you created in the last iteration of the loop. If you want to release this object, just del L or del l1 respectively.
PS -- When objects contain references to themselves (either directly, or indirectly through a chain of other objects), this is referred to as a cyclic reference. These aren't collected automatically by reference counting, and Python has a separate garbage collector which runs periodically to clean them up.
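As an illustration (a minimal sketch; the exact count reported by gc.collect() is implementation-specific), a cycle keeps reference counts above zero until the cyclic collector runs:

import gc

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b
b.ref = a            # cycle: a -> b -> a
del a, b             # reference counts never reach zero because of the cycle
print(gc.collect())  # the cyclic collector finds and reclaims them; prints a nonzero count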
We can easily test this by adding a custom __del__ method to a class and watching what happens:
class WithDestructor(object):
    def __del__(self):
        print(f"Exploding {self}")

Q = None
for i in range(5):
    Q = WithDestructor()
    print(f"In loop {i}")
If cleanup only happened at the end of the loop, we'd get the loop output followed by the destructor output. Instead, I get them interleaved, so the object in Q is cleaned up immediately when Q is reassigned.
In loop 0
Exploding <__main__.WithDestructor object at 0x7f93141176d8>
In loop 1
Exploding <__main__.WithDestructor object at 0x7f93141172b0>
In loop 2
Exploding <__main__.WithDestructor object at 0x7f93141176d8>
In loop 3
Exploding <__main__.WithDestructor object at 0x7f93141172b0>
In loop 4
I have Python code whose memory consumption steadily grows over time. While there are several objects which can legitimately grow quite large, I'm trying to understand whether the memory footprint I'm observing is due to these objects, or whether I'm just littering memory with temporaries which don't get properly disposed of. Being a recent convert from a world of manual memory management, I guess I just don't exactly understand some very basic aspects of how the Python runtime deals with temporary objects.
Consider code with roughly this general structure (I'm omitting irrelevant details):
import copy
import numpy

def tweak_list(lst):
    new_lst = copy.deepcopy(lst)
    if numpy.random.rand() > 0.5:
        new_lst[0] += 1  # in real code, the operation is a little more sensible :-)
        return new_lst
    else:
        return lst

lst = [1, 2, 3]
cache = {}

# main loop
for step in xrange(some_large_number):
    lst = tweak_list(lst)  # <<-----(1)
    # do something with lst here, cut out for clarity
    cache[tuple(lst)] = 42  # <<-----(2)
    if step % chunk_size == 0:
        # dump the cache dict to a DB, free the memory (?)
        cache = {}  # <<-----(3)
Questions:
1. What is the lifetime of the new_lst created in tweak_list? Will it be destroyed on exit, or will it be garbage collected (and if so, at which point)? Will repeated calls to tweak_list generate a gazillion small lists lingering around for a long time?
2. Is a temporary created when converting a list to a tuple to be used as a dict key?
3. Will setting the dict to an empty one release the memory?
4. Or am I approaching the issue at hand from a completely wrong perspective?
new_lst is cleaned up when the function exits, if it is not returned. Its reference count drops to 0, and it can be garbage collected. On current CPython implementations, that happens immediately.
If it is returned, the value referenced by new_lst replaces lst; the list referred to by lst sees its reference count drop by 1, but the value originally referred to by new_lst is still being referred to by another variable.
The tuple() key is a value stored in the dict, so it is not a temporary. No extra objects are created other than that tuple.
Replacing the old cache dict with a new one reduces its reference count by one. If cache was the only reference to the dict, it'll be garbage collected. This then causes the reference count of all contained tuple keys to drop by one; if nothing else references those tuples, they'll be garbage collected too.
Note that when Python frees memory, that does not necessarily mean the operating system reclaims it immediately. Most operating systems will only reclaim the memory when it is needed for something else, instead presuming the program might need some or all of that memory again soon.
You might want to take a look at Heapy as a way of profiling memory usage. I think PySizer is also used in some instances for this, but I am not familiar with it. ObjGraph is also a strong tool to take a look at.
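On Python 3.4+, the standard library's tracemalloc module can also give a quick picture of allocation hot spots without third-party tools; a minimal sketch (the cache-building line is just a stand-in for your real loop):

import tracemalloc

tracemalloc.start()

cache = {tuple(range(i)): 42 for i in range(1000)}  # stand-in for the real workload

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # top allocation sites by size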
In many cases, you are sure you definitely won't use the list again, so you want the memory to be released right now.
a = [11,22,34,567,9999]
del a
I'm not sure if the above really releases the memory. You can use:
del a[:]
That actually removes all the elements in list a.
Is that the best way to release the memory?
def release_list(a):
    del a[:]
    del a
I have the same question about tuples and sets.
def release_list(a):
    del a[:]
    del a
Do not ever do this. Python automatically frees all objects that are not referenced any more, so a simple del a ensures that the list's memory will be released if the list isn't referenced anywhere else. If that's the case, then the individual list items will also be released (and any objects referenced only from them, and so on and so on), unless some of the individual items were also still referenced.
That means the only time when del a[:]; del a will release more than del a on its own is when the list is referenced somewhere else. This is precisely when you shouldn't be emptying out the list: someone else is still using it!!!
Basically, you shouldn't be thinking about managing pieces of memory. Instead, think about managing references to objects. In 99% of all Python code, Python cleans up everything you don't need pretty soon after the last time you needed it, and there's no problem. Every time a function finishes all the local variables in that function "die", and if they were pointing to objects that are not referenced anywhere else they'll be deleted, and that will cascade to everything contained within those objects.
The only time you need to think about it is when you have a large object (say a huge list), you do something with it, and then you begin a long-running (or memory intensive) sub-computation, where the large object isn't needed for the sub-computation. Because you have a reference to it, the large object won't be released until the sub-computation finishes and then you return. In that sort of case (and only that sort of case), you can explicitly del your reference to the large object before you begin the sub-computation, so that the large object can be freed earlier (if no-one else is using it; if a caller passed the object in to you and the caller does still need it after you return, you'll be very glad that it doesn't get released).
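Schematically, that advice looks like this (load_big_list, summarize, and long_running_step are hypothetical placeholders, not real APIs):

def process():
    big = load_big_list()              # hypothetical: builds a huge list
    summary = summarize(big)           # hypothetical: all we keep downstream
    del big                            # drop our reference before the long step,
                                       # so the list can be freed now (if unshared)
    return long_running_step(summary)  # hypothetical long-running sub-computation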
Python uses reference counting to manage its resources.
import sys

class foo:
    pass

b = foo()
a = [b, 1]
sys.getrefcount(b)  # gives 3
sys.getrefcount(a)  # gives 2
a = None  # delete the list
sys.getrefcount(b)  # gives 2
In the above example, b's reference count is incremented when you put it into a list, and as you can see, when you delete the list, b's reference count gets decremented too. So in your code
def release_list(a):
    del a[:]
    del a
was redundant.
In summary, all you need to do is assign None to the list variable or use the del keyword to remove the name from the namespace (i.e., to unbind the name from the actual object). For example,
a = None # or
del a
When the reference count of an object drops to zero, Python frees the memory for you. To make sure the object gets deleted, you have to make sure no other places reference it, either by name or through a container.
sys.getrefcount(b) # gives 2
If sys.getrefcount gives you 2, that means you are the only one holding a reference to the object, and when you do
b = None
it will get freed from the memory.
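To see this happen, here's a minimal sketch using a class with a destructor (the timing of __del__ is guaranteed like this only under CPython's reference counting):

class Tracked:
    def __del__(self):
        print("freed")

b = Tracked()
b = None  # the refcount hits zero; CPython calls __del__ immediately
# prints: freed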
As @monkut notes, you probably shouldn't worry too much about memory management in most situations. If you do have a giant list that you're sure you're done with now, and it won't go out of the current function's scope for a while, though:
del a simply removes your name a for that chunk of memory. If some other function or structure or whatever has a reference to it still, it won't be deleted; if this code has the only reference to that list under the name a and you're using CPython, the reference counter will immediately free that memory. Other implementations (PyPy, Jython, IronPython) might not kill it right away because they have different garbage collectors.
Because of this, the del a statement in your release_list function doesn't actually do anything, because the caller still has a reference!
del a[:] will, as you note, remove the elements from the list and thus probably most of its memory usage.
You can do the_set.clear() for similar behavior with sets.
All you can do with a tuple, because they're immutable, is del the_tuple and hope nobody else has a reference to it -- but you probably shouldn't have enormous tuples!
If you're worried about memory management and performance for data types, why not use something like a doubly linked queue?
First, its memory footprint is scattered throughout memory, so you won't have to allocate a large chunk of contiguous memory right off the bat.
Second, you will see faster enqueue and dequeue times, because unlike with a standard list, removing (say) a middle element doesn't require sliding the rest of the list over by an index, which takes time in large lists.
I should also note that if you are working with just integers, I would suggest looking into a binary heap, as it gives O(log n) insertions and removals compared to the mostly O(n) operations on lists.
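In Python, collections.deque is the standard-library version of such a structure; a minimal sketch:

from collections import deque

# A deque stores its elements in linked blocks rather than one contiguous
# array, and supports O(1) appends and pops at both ends.
q = deque()
for i in range(1000):
    q.append(i)   # enqueue, O(1)
while q:
    q.popleft()   # dequeue, O(1); list.pop(0) would be O(n)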
If you need to release a list's memory while keeping the list's name, you can simply write a = [].
I've been working with Python for quite a bit of time, and I'm confused about a few issues in the areas of garbage collection and memory management, as well as what really happens when variables are deleted and memory is freed.
>>> pop = range(1000)
>>> p = pop[100:700]
>>> del pop[:]
>>> pop
[]
>>> p
[100, 101, 102, ..., 699]
In the above piece of code, this happens. But,
>>> pop = range(1000)
>>> k = pop
>>> del pop[:]
>>> pop
[]
>>> k
[]
Here in the 2nd case, it implies that k is just pointing to the list pop.
First part of the question:
But what's happening in the 1st code block? Is the memory containing the [100:700] elements not getting deleted, or is it duplicated when the list p is created?
Second part of the question:
Also, I've tried including gc.enable and gc.collect statements in between wherever possible, but there's no change in the memory utilization in either case. This is kind of puzzling. Isn't it bad that Python is not returning free memory back to the OS? Correct me if I'm wrong in the little research I've done. Thanks in advance.
Slicing a sequence results in a new sequence, with a shallow copy of the appropriate elements.
Returning the memory to the OS might be bad, since the script may turn around and create new objects, at which point Python would have to request the memory from the OS again.
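A quick way to convince yourself of the copy (a minimal sketch):

pop = list(range(10))
p = pop[2:5]   # slicing shallow-copies the element references into a new list
del pop[:]     # empty the original in place
print(pop)     # []
print(p)       # [2, 3, 4] -- the slice is unaffected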
1st part:
In the 1st code block, you create a new list into which the elements of the old one are copied before you delete the old one's contents.
In the 2nd code block, however, you just assign a reference to the same object to another variable. Then you empty the list, which, of course, is visible via both references.
2nd part: Memory is returned when appropriate, but not always. Under the hood of Python, there is a memory allocator which has control over where the memory comes from. There are two mechanisms: the brk()/sbrk() mechanism (for smaller memory blocks) and mmap() (for larger blocks).
Here we have rather smaller blocks which get allocated directly at the end of the data segment:
datadatadata object1object1 object2object2
If we only free object1, we have a memory gap which can be reused for the next object, but which cannot easily be freed and returned to the OS.
If we free both objects, the memory could be returned. But there is probably a threshold for holding memory back for a while, because returning everything immediately is not always the best strategy.