I have created some Python code that creates an object in a loop, and in every iteration overwrites this object with a new one of the same type. This is done 10,000 times, and Python takes up about 7 MB of memory every second until all 3 GB of my RAM is used. Does anyone know of a way to remove the objects from memory?
I think this is a case of circular references (though the question isn't explicit about it).
One way to solve this problem is to invoke garbage collection manually. When you run the garbage collector manually, it will sweep up circularly referenced objects too.
import gc

for i in xrange(10000):
    j = myObj()
    processObj(j)
    # Even if the reference count is not zero (e.g. due to a cycle),
    # the object won't remain usable after this iteration.
    if not i % 100:
        gc.collect()
Don't run the garbage collector too often here, because it has its own overhead; for example, if you run it on every iteration, the interpreter will become extremely slow.
You haven't provided enough information; this depends on the specifics of the object you are creating and what else you're doing with it in the loop. If the object does not create circular references, it should be deallocated on the next iteration. For example, the code
for x in range(100000):
    obj = " " * 10000000
will not result in ever-increasing memory allocation.
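By contrast, here is a hedged sketch of the kind of code that does need the collector (Node is just an illustrative class, not from the question):
import gc

class Node:
    def __init__(self):
        self.other = None

for x in range(100000):
    a = Node()
    b = Node()
    a.other = b
    b.other = a   # each pair forms a reference cycle

# Reference counting alone never frees these pairs; they linger until the
# cyclic garbage collector runs (automatically, or via an explicit call).
gc.collect()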
This is an old issue that was corrected for some types in Python 2.5. What was happening was that Python was not very good at collecting things like empty lists/dictionaries/tuples/floats/ints. In Python 2.5 this was fixed... mostly. However, floats and ints are kept on internal free lists, so once one of those is created its memory stays around for as long as the interpreter is alive. I've been bitten by this worst when dealing with large amounts of floats, since they have a nasty habit of being unique. This was characterized for Python 2.4, with later updates noting that the fix was folded into Python 2.5.
The best way I've found around it is to upgrade to Python 2.5 or newer to take care of the lists/dictionaries/tuples issue. For numbers, the only solution is to not let large amounts of numbers get into Python. I've done it with my own wrapper around a C++ object, but I have the impression that numpy.array will give similar results.
As a postscript, I have no idea what has happened to this in Python 3, but I suspect numbers are still kept on free lists. So the memory leak is actually a feature of the language.
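If you go the numpy route mentioned above, the idea looks roughly like this (my sketch, with approximate numbers, not from the original answer):
import numpy as np

# A million Python float objects each carry per-object overhead and, on old
# interpreters, freed ones can linger on the float free list. Packed into a
# numpy array they live in a single C buffer of 8 bytes per element instead.
values = np.zeros(1_000_000, dtype=np.float64)
values[42] = 3.14
print(values.nbytes)   # 8000000 bytes for the data buffer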
If you're creating circular references, your objects won't be deallocated immediately, but have to wait for a GC cycle to run.
You could use the weakref module to address this problem, or explicitly del your objects after use.
I found that in my case (with Python 2.5.1), with circular references involving classes that have __del__() methods, not only was garbage collection not happening in a timely manner, the __del__() methods of my objects were never getting called, even when the script exited. So I used weakref to break the circular references and all was well.
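For reference, a minimal sketch of the pattern (the class and names are illustrative, not my original code): the child holds only a weak reference back to its parent, so no strong cycle is ever created and __del__ runs promptly.
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None        # will hold a weak reference
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = weakref.ref(self)   # weak ref: no child -> parent cycle

    def __del__(self):
        print("deleting %s" % self.name)

root = Node("root")
root.add_child(Node("leaf"))
del root    # __del__ runs promptly; no strong cycle keeps the objects alive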
Kudos to Miles who provided all the information in his comments for me to put this together.
Here's one thing you can do at the REPL to force a dereferencing of a variable:
>>> x = 5
>>> x
5
>>> del x
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
weakref can be used for code with circular object structures, as in the example explained above.
Related
I'm looking for an example that purposely makes a memory leak in Python.
It should be as short and simple as possible and ideally not use non-standard dependencies (that could simply do the memory leak in C code) or multi-threading/processing.
I've seen memory leaks achieved before but only when bad things were being done to libraries such as matplotlib. Also, there are many questions about how to find and fix memory leaks in Python, but they all seem to be big programs with lots of external dependencies.
The reason for asking this is to find out how good Python's GC really is. I know it detects reference cycles. However, can it be tricked? Is there some way to leak memory? It may be impossible to solve the most restrictive version of this problem; in that case, I'm very happy to see a rigorous argument for why. Ideally, the answer should refer to the actual implementation and not just state that "an ideal garbage collector would be ideal and disallow memory leaks".
For nitpicking purposes: An ideal solution to the problem would be a program like this:
# Use Python version at least v3.10
# May use imports.
# Bonus points for only standard library.
# If the problem is unsolvable otherwise (please argue that it is),
# then you may use e.g. Numpy, Scipy, Pandas. Minus points for Matplotlib.
def memleak():
    # do whatever you want but only within this function
    # No global variables!
    # Bonus points for no observable side-effects (besides memory use)
    ...

for _ in range(100):
    memleak()
The function must return and must be called multiple times. Goals, in order of bonus points (higher number = more bonus points):
1. The program keeps using more memory, until it crashes.
2. After calling the function multiple times (e.g. the 100 specified above), the program may continue doing other (normal) things, such that the memory leaked during the function is never freed.
3. Like 2, but the memory cannot be freed, even by calling gc manually and similar means.
One way to "trick" CPython's garbage collector into leaking memory is by invalidating an object's reference count. We can do this by creating an extraneous strong reference that never gets deleted.
To create a new strong reference, we need to invoke Py_IncRef (or Py_NewRef) from Python's C API. This can be done via ctypes.pythonapi:
import ctypes
import sys
# Create C API callable
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None
# Create an arbitrary object.
obj = object()
# Print the number of references to obj.
# This should be 2:
# - one for the global variable 'obj'
# - one for the argument inside of 'sys.getrefcount'
print(sys.getrefcount(obj))
# Create a new strong reference.
inc_ref(obj)
# Print the number of references to obj.
# This should be 3 now.
print(sys.getrefcount(obj))
outputs
2
3
Concretely, you can write your memleak function as
import ctypes

def memleak():
    # Create C API callable
    inc_ref = ctypes.pythonapi.Py_IncRef
    inc_ref.argtypes = [ctypes.py_object]
    inc_ref.restype = None

    # Allocate a large object
    obj = list(range(10_000_000))

    # Increment its ref count
    inc_ref(obj)
    # obj will have a dangling reference after this function exits

memleak()  # leaks memory
An object with a dangling strong reference will never be freed by reference counting, and won't be detected as an unreachable object by the optional garbage collector. Running gc manually via
gc.collect()
will have no effect.
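If you want to check the leak from inside the process (not part of the trick itself, just a way to observe it), the standard library's tracemalloc works:
import gc
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

memleak()
gc.collect()   # no effect: the list still has the extra strong reference

after, _ = tracemalloc.get_traced_memory()
print("bytes still held after gc.collect():", after - before)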
I am trying to write a Python module which checks the consistency of the MAC addresses stored in the HW memory. The scale could go up to 80K MAC addresses. But when I make multiple calls to get a list of MAC addresses through a Python method, the memory does not get freed up and eventually I run out of memory.
An example of what I am doing is:
import resource
import copy

def get_list():
    list1 = None
    list1 = []
    for j in range(1, 10):
        for i in range(0, 1000000):
            list1.append('abcdefg')
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    return list1

for i in range(0, 5):
    x = get_list()
On executing the script, I get:
45805
53805
61804
69804
77803
85803
93802
101801
109805
118075
126074
134074
142073
150073
158072
166072
174071
182075
190361
198361
206360
214360
222359
230359
238358
246358
254361
262365
270364
278364
286363
294363
302362
310362
318361
326365
334368
342368
350367
358367
366366
374366
382365
390365
398368
i.e. the memory usage reported keeps going up.
Is it that I am looking at the memory usage in the wrong way?
And if not, is there a way to avoid the memory usage going up between function calls in a loop? (In my case with MAC addresses, I do not fetch the same list of MAC addresses again; I get the list from a different section of the HW memory, i.e. all the calls to get MAC addresses are valid, but after each call the data obtained is useless and can be discarded.)
Python is a managed language. Memory is, generally speaking, the concern of the implementation rather than the average developer. The system is designed to reclaim memory that you are no longer using automatically.
If you are using CPython, an object will be destroyed when its reference count reaches zero, or when the cyclic garbage collector finds and collects it. If you want to reclaim the memory belonging to an object, you need to ensure that no references to it remain, or at least that it is not reachable from any stack frame's variables. That is to say, it should not be possible to refer to the data you want reclaimed, either directly or through some expression such as foo.bar[42], from any currently executing function.
If you are using another implementation, such as PyPy, the rules may vary. In particular, reference counting is not required by the Python language standard, so objects may not go away until the next garbage collection run (and then you may have to wait for the right generation to be collected).
For older versions of Python (prior to Python 3.4), you also need to worry about reference cycles which involve finalizers (__del__() methods). The old garbage collector cannot collect such cycles, so they will (basically) get leaked. Most built-in types do not have finalizers, are not capable of participating in reference cycles, or both, but this is a legitimate concern if you are creating your own classes.
For your use case, you should empty or replace the list when you no longer need its contents (with e.g. list1 = [] or del list1[:]), or return from the function which created it (assuming it's a local variable, rather than a global variable or some other such thing). If you find that you are still running out of memory after that, you should either switch to a lower-overhead language like C or invest in more memory. For more complicated cases, you can use the gc module to test and evaluate how the garbage collector is interacting with your program.
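Applied to the loop in the question, that could look like the sketch below (keep in mind that resource.getrusage(...).ru_maxrss reports a peak resident size, so that particular number will never go down even when memory is freed):
for i in range(0, 5):
    x = get_list()
    # ... check the MAC addresses held in x ...
    del x   # or x = None: the list becomes unreachable, and its
            # memory can be reused by the next call to get_list()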
Try this; it might not always free the memory, as it may still be in use.
See if it works:
gc.collect()
I'm reading in a collection of objects (tables like sqlite3 tables or dataframes) from an object-oriented database, most of which are small enough that the Python garbage collector can handle them without incident. However, when they get larger in size (less than 10 MB) the GC doesn't seem to be able to keep up.
The pseudocode looks like this:
walk = walkgenerator('/path')
objs = objgenerator(walk)

with db.transaction(bundle=True, maxSize=10000, maxParts=10):
    oldobj = None
    oldtable = None
    for obj in objs:
        currenttable = obj.table
        if oldtable and oldtable in currenttable:
            db.delete(oldobj.path)
        del oldtable
        oldtable = currenttable
        del oldobj
        oldobj = obj
        if not count % 100:
            gc.collect()
I'm looking for an elegant way to manage memory while allowing Python to handle it when possible.
Perhaps embarrassingly, I've tried using del to help clean up reference counts.
I've tried gc.collect() at varying modulo counts in my for loops:
100 (no difference),
1 (slows loop quite a lot, and I will still get a memory error of some type),
3 (loop is still slow but memory still blows up eventually)
Suggestions are appreciated!!!
Particularly, if you can give me tools to assist with introspection. I've used Windows Task Manager here, and it seems to more or less randomly spring a memory leak. I've limited the transaction size as much as I feel comfortable, and that seems to help a little bit.
There's not enough info here to say much, but what I do have to say wouldn't fit in a comment so I'll post it here ;-)
First, and most importantly, in CPython garbage collection is mostly based on reference counting. gc.collect() won't do anything for you (except burn time) unless trash objects are involved in reference cycles (an object A can be reached from itself by following a chain of pointers transitively reachable from A). You create no reference cycles in the code you showed, but perhaps the database layer does.
So, after you run gc.collect(), does memory use go down at all? If not, running it is pointless.
I expect it's most likely that the database layer is holding references to objects longer than necessary, but digging into that requires digging into exact details of how the database layer is implemented.
One way to get clues is to print the result of sys.getrefcount() applied to various large objects:
>>> import sys
>>> bigobj = [1] * 1000000
>>> sys.getrefcount(bigobj)
2
As the docs say, the result is generally 1 larger than you might hope, because the refcount of getrefcount()'s argument is temporarily incremented by 1 simply because it is being used (temporarily) as an argument.
So if you see a refcount greater than 2, del won't free the object.
Another way to get clues is to pass the object to gc.get_referrers(). That returns a list of objects that directly refer to the argument (provided that a referrer participates in Python's cyclic gc).
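For example (in a fresh session you would typically see the module's globals dict and the holder dict as referrers; the exact output varies):
import gc

bigobj = [1] * 1000000
holder = {"keep": bigobj}

for referrer in gc.get_referrers(bigobj):
    print(type(referrer))   # typically two dicts: the module namespace and 'holder'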
BTW, you need to be clearer about what you mean by "doesn't seem to work" and "blows up eventually". I can't guess. What exactly goes wrong? For example, is MemoryError raised? Something else? Tracebacks often yield a world of useful clues.
In many cases, you are sure you definitely won't use the list again, so you want the memory to be released right now.
a = [11,22,34,567,9999]
del a
I'm not sure if the above really releases the memory. You can use:
del a[:]
that actually removes all the elements in list a.
Is that the best way to release the memory?
def realse_list(a):
    del a[:]
    del a
I have the same question about tuples and sets.
def release_list(a):
    del a[:]
    del a
Do not ever do this. Python automatically frees all objects that are not referenced any more, so a simple del a ensures that the list's memory will be released if the list isn't referenced anywhere else. If that's the case, then the individual list items will also be released (and any objects referenced only from them, and so on and so on), unless some of the individual items were also still referenced.
That means the only time when del a[:]; del a will release more than del a on its own is when the list is referenced somewhere else. This is precisely when you shouldn't be emptying out the list: someone else is still using it!!!
Basically, you shouldn't be thinking about managing pieces of memory. Instead, think about managing references to objects. In 99% of all Python code, Python cleans up everything you don't need pretty soon after the last time you needed it, and there's no problem. Every time a function finishes all the local variables in that function "die", and if they were pointing to objects that are not referenced anywhere else they'll be deleted, and that will cascade to everything contained within those objects.
The only time you need to think about it is when you have a large object (say a huge list), you do something with it, and then you begin a long-running (or memory intensive) sub-computation, where the large object isn't needed for the sub-computation. Because you have a reference to it, the large object won't be released until the sub-computation finishes and then you return. In that sort of case (and only that sort of case), you can explicitly del your reference to the large object before you begin the sub-computation, so that the large object can be freed earlier (if no-one else is using it; if a caller passed the object in to you and the caller does still need it after you return, you'll be very glad that it doesn't get released).
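In code, that pattern looks roughly like this (summarize and expensive_subcomputation are hypothetical placeholders, not real functions):
def process(big_list):
    summary = summarize(big_list)        # the large object is needed here
    del big_list                         # drop our reference before the long part
    # If the caller has also dropped its reference, the list can be freed now
    # instead of surviving until process() returns.
    return expensive_subcomputation(summary)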
Python uses reference counting to manage its resources.
import sys

class foo:
    pass

b = foo()
a = [b, 1]

sys.getrefcount(b)  # gives 3
sys.getrefcount(a)  # gives 2

a = None  # delete the list

sys.getrefcount(b)  # gives 2
In the above example, b's reference count is incremented when you put it into a list, and as you can see, when you delete the list, the reference count of b gets decremented too. So in your code
def release_list(a):
    del a[:]
    del a
was redundant.
In summary, all you need to do is assign None to the list variable or use the del keyword to remove the name from the namespace (i.e., to unbind the name from the actual object). For example,
a = None # or
del a
When the reference count of an object goes to zero, Python will free the memory for you. To make sure the object gets deleted, you have to make sure no other place references the object, whether by name or from a container.
sys.getrefcount(b) # gives 2
If sys.getrefcount gives you 2, that means you are the only one holding a reference to the object, and when you do
b = None
it will get freed from the memory.
As @monkut notes, you probably shouldn't worry too much about memory management in most situations. If you do have a giant list that you're sure you're done with now, and it won't go out of the current function's scope for a while, though:
del a simply removes your name a for that chunk of memory. If some other function or structure or whatever has a reference to it still, it won't be deleted; if this code has the only reference to that list under the name a and you're using CPython, the reference counter will immediately free that memory. Other implementations (PyPy, Jython, IronPython) might not kill it right away because they have different garbage collectors.
Because of this, the del a statement in your realse_list function doesn't actually do anything, because the caller still has a reference!
del a[:] will, as you note, remove the elements from the list and thus probably most of its memory usage.
You can do the_set.clear() for similar behavior with sets.
All you can do with a tuple, because they're immutable, is del the_tuple and hope nobody else has a reference to it -- but you probably shouldn't have enormous tuples!
If you're worried about memory management and performance for your data types, why not use something like a doubly linked queue?
First, its memory footprint is scattered throughout memory, so you won't have to allocate a large chunk of contiguous memory right off the bat.
Second, you will see faster access times for enqueueing and dequeueing because, unlike with a standard list, when you remove, say, a middle element there is no need to slide the rest of the list over by an index, which takes time in large lists.
I should also note that if you are using just integers, I would suggest looking into a binary heap, as you will see O(log n) access times compared to mostly O(n) with lists. See the sketch below.
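If you want to try either of those, the standard library has both; a small sketch:
from collections import deque
import heapq

d = deque()
d.append(1)        # O(1) push on the right
d.appendleft(0)    # O(1) push on the left
d.popleft()        # O(1) pop from the left; list.pop(0) would be O(n)
d.pop()            # O(1) pop from the right

heap = []
heapq.heappush(heap, 5)            # O(log n) insert
smallest = heapq.heappop(heap)     # O(log n) removal of the minimum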
If you need to release a list's memory while keeping the list's name, you can simply write a = [].
I've been working with Python for quite a bit of time, and I'm confused regarding a few issues in the areas of garbage collection and memory management, as well as what really happens when variables are deleted and memory is freed.
>>> pop = range(1000)
>>> p = pop[100:700]
>>> del pop[:]
>>> pop
[]
>>> p
[100, 101, 102, ..., 699]
In the above piece of code, this happens. But,
>>> pop = range(1000)
>>> k = pop
>>> del pop[:]
>>> pop
[]
>>> k
[]
Here in the 2nd case, it implies that k is just pointing to the list pop.
First part of the question:
But what's happening in the 1st code block? Is the memory containing the elements pop[100:700] not getting deleted, or is it duplicated when the list p is created?
Second part of the question:
Also, I've tried including gc.enable and gc.collect statements in between wherever possible, but there's no change in the memory utilization in either case. This is kind of puzzling. Isn't it bad that Python is not returning free memory back to the OS? Correct me if I'm wrong in the little research I've done. Thanks in advance.
Slicing a sequence results in a new sequence, with a shallow copy of the appropriate elements.
Returning the memory to the OS might be bad, since the script may turn around and create new objects, at which point Python would have to request the memory from the OS again.
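To see the first point concretely, a quick REPL check (Python 2 here, where range() returns a real list, matching the question):
>>> pop = range(1000)
>>> p = pop[100:700]
>>> p is pop                # slicing made a brand new list object...
False
>>> p[0] is pop[100]        # ...which shallow-copies (shares) the elements
True
>>> del pop[:]
>>> p[:5]                   # p is unaffected by emptying pop
[100, 101, 102, 103, 104]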
1st part:
In the 1st code block, slicing creates a new list object into which the elements of the old one are copied, before you empty the original.
In the 2nd code block, however, you just assign a reference to the same object to another variable. Then you empty the list, which, of course, is visible via both references.
2nd part: Memory is returned when appropriate, but not always. Under the hood of Python, there is a memory allocator which has control over where the memory comes from. There are 2 ways: via the brk()/sbrk() mechanism (for smaller memory blocks) and via mmap() (larger blocks).
Here we have rather smaller blocks which get allocated directly at the end of the data segment:
datadatadata object1object1 object2object2
If we only free object1, we have a memory gap which can be reused for the next object, but which cannot easily be freed and returned to the OS.
If we free both objects, memory could be returned. But there probably is a threshold for keeping memory back for a while, because returning everything immediately is not the very best thing.