In many cases, you are sure you won't use the list again, so you want the memory to be released right away.
a = [11,22,34,567,9999]
del a
I'm not sure if the above really releases the memory. You can use:
del a[:]
which actually removes all the elements from list a.
Is that the best way to release the memory?
def release_list(a):
    del a[:]
    del a
I have the same question about tuples and sets.
def release_list(a):
    del a[:]
    del a
Do not ever do this. Python automatically frees all objects that are not referenced any more, so a simple del a ensures that the list's memory will be released if the list isn't referenced anywhere else. If that's the case, then the individual list items will also be released (and any objects referenced only from them, and so on and so on), unless some of the individual items were also still referenced.
That means the only time when del a[:]; del a will release more than del a on its own is when the list is referenced somewhere else. This is precisely when you shouldn't be emptying out the list: someone else is still using it!!!
Basically, you shouldn't be thinking about managing pieces of memory. Instead, think about managing references to objects. In 99% of all Python code, Python cleans up everything you don't need pretty soon after the last time you needed it, and there's no problem. Every time a function finishes, all the local variables in that function "die", and if they were pointing to objects that are not referenced anywhere else, those objects will be deleted, and that will cascade to everything contained within them.
The only time you need to think about it is when you have a large object (say a huge list), you do something with it, and then you begin a long-running (or memory intensive) sub-computation, where the large object isn't needed for the sub-computation. Because you have a reference to it, the large object won't be released until the sub-computation finishes and then you return. In that sort of case (and only that sort of case), you can explicitly del your reference to the large object before you begin the sub-computation, so that the large object can be freed earlier (if no-one else is using it; if a caller passed the object in to you and the caller does still need it after you return, you'll be very glad that it doesn't get released).
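To make that concrete, here is a minimal sketch of the pattern; the names and the workload are made up for illustration:

def process():
    big_list = list(range(10**6))  # a large temporary, local to this function
    total = sum(big_list)          # the large object is only needed up to here
    del big_list                   # drop the reference before the long phase;
                                   # CPython can free the list right away
    result = 0
    for i in range(10**6):         # stand-in for a long-running, memory-hungry
        result += i % 7            # sub-computation that doesn't need big_list
    return total + result

print(process())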
Python uses reference counting to manage its resources.
import sys

class foo:
    pass

b = foo()
a = [b, 1]
sys.getrefcount(b)  # gives 3 (the name b, the list element, and getrefcount's own temporary argument)
sys.getrefcount(a)  # gives 2
a = None            # delete the list
sys.getrefcount(b)  # gives 2
In the above example, b's reference count is incremented when you put it into a list, and, as you can see, when you delete the list, the reference count of b gets decremented too. So in your code

def release_list(a):
    del a[:]
    del a

the del a[:] is redundant: deleting the list alone already decrements the reference count of every element.
In summary, all you need to do is assign None to the variable or use the del keyword to remove the name from the namespace (i.e., to unbind the name from the actual object). For example,
a = None # or
del a
When the reference count of an object drops to zero, Python frees the memory for you. To make sure the object gets deleted, you have to make sure no other place references the object, by name or by container.
sys.getrefcount(b) # gives 2
If sys.getrefcount gives you 2, that means you hold the only reference to the object, and when you do

b = None

it will be freed from memory.
As @monkut notes, you probably shouldn't worry too much about memory management in most situations. If you do have a giant list that you're sure you're done with and that won't go out of the current function's scope for a while, though:
del a simply removes your name a for that chunk of memory. If some other function or structure or whatever has a reference to it still, it won't be deleted; if this code has the only reference to that list under the name a and you're using CPython, the reference counter will immediately free that memory. Other implementations (PyPy, Jython, IronPython) might not kill it right away because they have different garbage collectors.
Because of this, the del a statement in your release_list function doesn't actually do anything, because the caller still has a reference!
del a[:] will, as you note, remove the elements from the list and thus probably most of its memory usage.
You can do the_set.clear() for similar behavior with sets.
All you can do with a tuple, because they're immutable, is del the_tuple and hope nobody else has a reference to it -- but you probably shouldn't have enormous tuples!
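Putting those pieces together, a quick sketch of the in-place options for each type (big_list, big_set, and big_tuple are just illustrative names):

big_list = list(range(10**6))
big_set = set(big_list)
big_tuple = tuple(big_list)

del big_list[:]   # empties the list in place (list.clear() does the same on 3.3+)
big_set.clear()   # same idea for sets
del big_tuple     # tuples are immutable: all you can drop is your own reference,
                  # and the memory goes away only if nobody else references it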
If you're worried about memory management and performance for your data types, why not use something like a doubly linked queue?

First, its memory footprint is scattered throughout memory, so you won't have to allocate a large chunk of contiguous memory right off the bat.

Second, you will see faster enqueue and dequeue times, because unlike a standard list, where removing, say, a middle element means sliding the rest of the list over by an index (which takes time in large lists), a linked structure needs no such shifting.

I should also note that if you are storing just integers, I would suggest looking into a binary heap, as it gives O(log n) operations compared to the mostly O(n) behavior of lists.
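If "doubly linked queue" here means Python's collections.deque (with heapq for the heap), a rough sketch of both follows; this is my reading of the suggestion, not code from the answer:

from collections import deque
import heapq

# deque: stored in linked blocks, so no single large contiguous allocation,
# and O(1) appends/pops at either end (vs O(n) for list.insert(0, ...))
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.pop()

# binary heap: O(log n) push and pop-smallest, for plain numbers
heap = [5, 1, 9]
heapq.heapify(heap)             # O(n) to build
heapq.heappush(heap, 3)         # O(log n)
smallest = heapq.heappop(heap)  # O(log n); gives 1
print(dq, heap, smallest)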
If you need to release a list's memory while keeping the name, you can simply write a = [].
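Note that a = [] rebinds the name rather than emptying the existing list, which matters if the list has other references; a small sketch of the difference:

a = [1, 2, 3]
b = a
a = []        # rebinds the name a; the old list survives as long as b references it
print(b)      # [1, 2, 3]

a = [1, 2, 3]
b = a
del a[:]      # empties the one shared list in place
print(b)      # []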
Python works with reference counting. That means that when there are no more references to a value, the memory for that value is recycled. Or, in other words: as long as there is at least one remaining reference, the object is not deleted and the memory is not released.

Let's consider the following example:
def myfn():
    result = work_with(BigObj())  # reference 1 to BigObj is held by the stack frame
                                  # (not yet counting any reference inside of the
                                  # work_with function). After work_with returns,
                                  # the stack frame and reference 1 are deleted and
                                  # the memory of BigObj is released.
    return result

def work_with(big_obj):  # here we have another reference to BigObj
    big_obj = None  # let's assume we need more memory and don't need big_obj
                    # anymore: the reference inside work_with is deleted. However,
                    # there is still the reference on the caller's stack frame, so
                    # the memory is not released until work_with returns.
    other_big_obj = BigObj()  # we need the memory for another BigObj -> we may
                              # run out of memory here
So my question is:
Why does CPython hold an additional reference to values which are passed to functions on the stack? Is there any special purpose behind this or is it just an "unlucky" implementation detail?
My first thought on this is:
To prevent the reference count from dropping to zero. However, we still have a live reference inside the called function, so this does not make any sense to me.
It is the way CPython passes parameters to a function. The frame holds a reference to its arguments to allow passing temporary objects, and the frame is destroyed only when the function returns; so all parameters get an additional reference for the duration of the call.
This is the reason why the doc for sys.getrefcount says:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
In fact, in the callee, the reference to the arguments is known to be a borrowed reference, meaning that the callee never has to decrement it. So when you set it to None it will not destroy the object.
A different implementation would be possible, where the callee would decrement the reference count of its arguments. The benefit would be that it would allow immediate destruction of temporaries. The drawback is that the callee would have to explicitly decrement the reference count of all of its parameters. At the C level, reference counting is already tedious, and I assume the Python implementers made that choice for simplicity.
By the way, it only matters when you pass a large temporary object to a function, which is not the most common use case.
TL;DR: IMHO there is no real rationale for preventing a function from immediately destroying a temporary; it is just a consequence of the general implementation of functions in CPython.
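The extra frame reference is easy to observe with sys.getrefcount; the counts below are typical for CPython but not guaranteed across versions:

import sys

def refcount_inside(obj):
    # one reference for the name x, one for this frame's argument obj,
    # and one temporary reference for getrefcount's own argument
    return sys.getrefcount(obj)

x = object()
print(sys.getrefcount(x))   # typically 2: the name x + getrefcount's argument
print(refcount_inside(x))   # typically 3: the call frame adds one more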
Please observe this simple code:
import random

while True:
    L = list(str(random.random()))
Question: if I let this run, will python run out of memory?
reason I am asking:
In the first iteration of this loop, a list is created and L is assigned to refer to it. In the next iteration, another list is created, and L is yanked from the previous list and assigned to the new one. The previous list has lost its reference. Is the previous list going to be garbage collected? If not at the end of each iteration, then eventually, I hope?
Having said that, just expand the scenario a bit further into multiprocessing:
import random

while True:
    l1 = list(str(random.random()))
    # pseudo: multiprocessing.Queue.put(l1)
    # how is l1 handled here?
    # is l1 .copy()-ed to the queue or referenced by the queue?
    # is l1 destroyed in this process (this while loop) at the end of the iteration?
The primary means of garbage collection is reference counting in CPython (the reference implementation of the language). When there are no longer any references to an object, the memory it occupies is freed immediately and can be reused by other Python objects. (It may or may not ever be released back to the operating system.) There are a few exceptional objects that are never freed: small integers, interned strings (including literals), the empty tuple, and None.
So to answer your initial question, L is going to be reassigned to a new list on each iteration. At that point, the previous list has no references and its memory will be released immediately.
With regard to your second example, putting something into a multiprocessing queue is, of necessity, a copy operation. The object must be serialized ("pickled" in Python parlance) to be sent to the new process, which has its own memory space and can't see anything from the original process's memory. When, in your loop, you reassign l1 to the next list, the previous list has no references and, again, will be released.
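A minimal sketch that makes the copy visible (a hypothetical example, not code from the question): the child mutates what it received, and the parent's list is untouched:

import multiprocessing as mp

def worker(q):
    received = q.get()     # an unpickled copy made when put() serialized it
    received.append(99)    # mutating the copy in the child process...
    print("child:", received)

if __name__ == "__main__":
    q = mp.Queue()
    l1 = [1, 2, 3]
    q.put(l1)              # l1 is pickled; a copy crosses the process boundary
    p = mp.Process(target=worker, args=(q,))
    p.start()
    p.join()
    print("parent:", l1)   # ...leaves the parent's list unchanged: [1, 2, 3]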
At the end of your loop, the L or l1 variable still refers to a list: the one you created in the last iteration of the loop. If you want to release this object, just del L or del l1 respectively.
PS -- When objects contain references to themselves (either directly, or indirectly through a chain of other objects), this is referred to as a cyclic reference. Cycles aren't collected by reference counting, and Python has a separate garbage collector that runs periodically to clean them up.
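A small sketch of such a cycle (on Python 3.4+ even objects with __del__ that sit in a cycle are collectable):

import gc

class Node:
    def __del__(self):
        print("cycle collected")

a = Node()
a.self_ref = a   # the object references itself: a reference cycle
del a            # the refcount never reaches zero, so nothing is freed yet
print("after del: still alive")
gc.collect()     # the cyclic garbage collector finds the cycle and frees it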
We can easily test this by adding a custom __del__ method to a class and watching what happens:
class WithDestructor(object):
    def __del__(self):
        print(f"Exploding {self}")

Q = None
for i in range(5):
    Q = WithDestructor()
    print(f"In loop {i}")
If cleanup only happened at the end of the loop, we'd get the loop output followed by the destructor output. Instead I get it interlaced, so the object in Q is getting immediately cleaned up when Q is reassigned.
In loop 0
Exploding <__main__.WithDestructor object at 0x7f93141176d8>
In loop 1
Exploding <__main__.WithDestructor object at 0x7f93141172b0>
In loop 2
Exploding <__main__.WithDestructor object at 0x7f93141176d8>
In loop 3
Exploding <__main__.WithDestructor object at 0x7f93141172b0>
In loop 4
I have a pointer to a dict() and I keep updating it, like this:
import time

def creating_new_dict():
    MyDict = dict()
    # ... rest of code - get data into MyDict
    return MyDict

def main():
    while True:
        MyDict = creating_new_dict()
        time.sleep(20)
I know that MyDict is a pointer to the dict, so my question is: what happens to the old data in MyDict?
Is it deleted or should I make sure it's gone?
Coming back around to answer your question since nobody else did:
I know that MyDict is a pointer to the dict
Ehhh. MyDict is a reference to your dict object, not a pointer. Slight digression follows:
You can't mutate immutable types within a function, because you're handling references, not pointers. See:
def adder(x):
    x = x + 1

a = 0
adder(a)
a
Out[4]: 0
I won't go too far down the rabbit hole, but just this: there is nothing I could have put into the body of adder that would have caused a to change. Because variables are references, not pointers.
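That holds for immutable objects like ints; with a mutable object, an in-place method call does affect what the caller sees, though rebinding still doesn't. A quick illustration:

def appender(lst):
    lst.append(1)    # mutates the shared list object in place

def rebinder(lst):
    lst = lst + [1]  # creates a new list and rebinds the local name only

a = []
appender(a)
print(a)  # [1]  -- the shared object was changed
rebinder(a)
print(a)  # [1]  -- the rebinding inside rebinder was invisible to the caller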
Okay, we're back.
so my question is: what happens to the old data in MyDict?
Well, it's still there, in some object somewhere, but you blew up the reference to it. The reference is no more, it has ceased to be. In fact, there are exactly zero extant references to the old dict, which will be significant in the next paragraph!
Is it deleted
It's not deleted per se, but you could basically think of it as deleted. It has no remaining references, which means it is eligible for garbage collection. Python's gc, when triggered, will come along and get rid of all the objects that have no remaining references, which includes all your orphaned dicts.
But like I said in my comment: gc will not be triggered -now-, but some time in the (possibly very near) future. How soon that happens depends on... stuff, including which flavor of Python you're running. CPython (the one you're probably running) will usually reclaim the object immediately, but this is not guaranteed and should not be relied upon; it is left up to the implementation. What you can count on: the gc will come along and clean up eventually (well, as long as you haven't disabled it).
or should I make sure it's gone?
Nooooo. Nope. Don't do the garbage collector's job. Only when you have a very specific performance reason should you do so.
I still don't believe you, show me an example.
Er, you said that, right? Okay:
class DeleteMe:
    def __del__(self):  # called when the object is destroyed, i.e. gc'd
        print(':(')

d = DeleteMe()
d = DeleteMe()  # the first one will be garbage collected, leading to a sad face
:(
And, like I said before, you shouldn't count on :( happening immediately, just that gc will happen eventually.
I have some Python code where the memory consumption steadily grows with time. While there are several objects that can legitimately grow quite large, I'm trying to understand whether the memory footprint I'm observing is due to these objects, or whether I'm just littering memory with temporaries that don't get properly disposed of. Being a recent convert from a world of manual memory management, I guess I just don't exactly understand some very basic aspects of how the Python runtime deals with temporary objects.

Consider a code with roughly this general structure (I'm omitting irrelevant details):
import copy
import numpy

def tweak_list(lst):
    new_lst = copy.deepcopy(lst)
    if numpy.random.rand() > 0.5:
        new_lst[0] += 1  # in real code, the operation is a little more sensible :-)
        return new_lst
    else:
        return lst

lst = [1, 2, 3]
cache = {}

# main loop
for step in xrange(some_large_number):
    lst = tweak_list(lst)  # <<-----(1)
    # do something with lst here, cut out for clarity
    cache[tuple(lst)] = 42  # <<-----(2)
    if step % chunk_size == 0:
        # dump the cache dict to a DB, free the memory (?)
        cache = {}  # <<-----(3)
Questions:
What is the lifetime of the new_lst created in tweak_list? Will it be destroyed on exit, or will it be garbage collected (and at what point)? Will repeated calls to tweak_list generate a gazillion small lists lingering around for a long time?
Is there a temporary creation when converting a list to a tuple to be used as a dict key?
Will setting a dict to an empty one release the memory?
Or, am I approaching the issue at hand from a completely wrong perspective?
new_lst is cleaned up when the function exits if it is not returned: its reference count drops to 0, and it can be garbage collected. On current CPython implementations that happens immediately.

If it is returned, the value referenced by new_lst replaces lst; the list referred to by lst sees its reference count drop by 1, but the value originally referred to by new_lst is still being referred to by another variable.

The tuple() key is a value stored in the dict, so it's not a temporary. No extra objects are created other than that tuple.

Replacing the old cache dict with a new one reduces its reference count by one. If cache was the only reference to the dict, it will be garbage collected. This in turn causes the reference count of all the contained tuple keys to drop by one; if nothing else references those, they will be garbage collected too.
Note that when Python frees memory, that does not necessarily mean the operating system reclaims it immediately. Most operating systems will only reclaim the memory when it is needed for something else, instead presuming the program might need some or all of that memory again soon.
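A tiny sketch showing the cascade when cache is rebound (using a hypothetical class with __del__ just to make the release visible):

class Tracked:
    def __del__(self):
        print("cached value collected")

cache = {("some", "key"): Tracked()}
cache = {}   # rebinding drops the only reference to the old dict; under CPython's
             # reference counting, the dict and its contents are freed immediately
print("done")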
You might want to take a look at Heapy as a way of profiling memory usage. I think PySizer is also used in some instances for this, but I am not familiar with it. ObjGraph is also a strong tool to take a look at.
I've been working with Python for quite a bit of time and I'm confused about a few issues in the areas of garbage collection and memory management, as well as the real deal with deleting variables and freeing memory.
>>> pop = range(1000)
>>> p = pop[100:700]
>>> del pop[:]
>>> pop
[]
>>> p
[100, 101, 102, ..., 699]
In the above piece of code, this happens. But,
>>> pop = range(1000)
>>> k = pop
>>> del pop[:]
>>> pop
[]
>>> k
[]
Here, in the second case, it implies that k is just pointing to the same list as pop.
First Part of the question :
But what's happening in the first code block? Is the memory holding the elements [100:700] not getting deleted, or is it duplicated when the list p is created?
Second Part of the question :
Also, I've tried including gc.enable and gc.collect statements in between wherever possible, but there's no change in the memory utilization in either case. This is kind of puzzling. Isn't it bad that Python is not returning free memory back to the OS? Correct me if I'm wrong in the little research I've done. Thanks in advance.
Slicing a sequence results in a new sequence, with a shallow copy of the appropriate elements.
Returning the memory to the OS might be bad, since the script may turn around and create new objects, at which point Python would have to request the memory from the OS again.
1st part:
In the first code block, you create a new list object into which the elements of the old one are copied before you empty the original.

In the second code block, however, you just assign a second reference to the same object. Then you empty the list, which, of course, is visible through both references.
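A compact demonstration of the difference (a smaller range, same behavior):

pop = list(range(10))
p = pop[2:5]       # slicing builds a new list with copies of the references
k = pop            # plain assignment just adds a second name for the same list

del pop[:]         # empty the original in place
print(pop)  # []
print(k)    # []         -- k saw the change: same object
print(p)    # [2, 3, 4]  -- p is unaffected: it owns its own element references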
2nd part: Memory is returned when appropriate, but not always. Under the hood, Python has a memory allocator that controls where memory comes from. There are two ways: via the brk()/sbrk() mechanism (for smaller memory blocks) and via mmap() (for larger blocks).
Here we have rather small blocks, which get allocated directly at the end of the data segment:
datadatadata object1object1 object2object2
If we only free object1, we have a memory gap that can be reused for the next object, but it cannot easily be freed and returned to the OS.
If we free both objects, the memory could be returned. But there is probably a threshold for keeping memory around for a while, because returning everything immediately is not the best thing to do.