I have a pointer to a dict() and I keep updating it, like this:
import time

def creating_new_dict():
    MyDict = dict()
    # ... rest of code - get data into MyDict
    return MyDict

def main():
    while True:
        MyDict = creating_new_dict()
        time.sleep(20)
I know that MyDict is a pointer to the dict, so my question is: what happens to the old data in MyDict?
Is it deleted or should I make sure it's gone?
Coming back around to answer your question since nobody else did:
I know that MyDict is a pointer to the dict
Ehhh. MyDict is a reference to your dict object, not a pointer. Slight digression follows:
You can't mutate immutable types within a function, because you're handling references, not pointers. See:
def adder(x):
    x = x + 1  # rebinds the local name x; the caller's variable is untouched

a = 0
adder(a)
a
Out[4]: 0
I won't go too far down the rabbit hole, but just this: there is nothing I could have put into the body of adder that would have caused a to change. Because variables are references, not pointers.
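The flip side, as a minimal sketch: if you pass a mutable object and mutate it in place, rather than rebinding the name, the caller does see the change:

def appender(x):
    x.append(1)  # mutates the object that both names refer to

b = []
appender(b)
b
Out[5]: [1]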
Okay, we're back.
so My question is what happens to the old data in MyDict?
Well, it's still there, in some object somewhere, but you blew up the reference to it. The reference is no more, it has ceased to be. In fact, there are exactly zero extant references to the old dict, which will be significant in the next paragraph!
Is it deleted
It's not deleted per se, but you could basically think of it as deleted. It has no remaining references, which means it is eligible for garbage collection. Python's gc, when triggered, will come along and get rid of all the objects that have no remaining references, which includes all your orphaned dicts.
But like I said in my comment: gc will not necessarily be triggered now, but some time in the (possibly very near) future. How soon that happens depends on several things, including which flavor of Python you're running. CPython (the one you're probably running) will usually gc the object immediately, but this is not guaranteed and should not be relied upon; it is left up to the implementation. What you can count on: the gc will come along and clean up eventually (well, as long as you haven't disabled it).
or should I make sure it's gone?
Nooooo. Nope. Don't do the garbage collector's job. Only when you have a very specific performance reason should you do so.
I still don't believe you, show me an example.
Er, you said that, right? Okay:
class DeleteMe:
    def __del__(self):  # called when the object is destroyed, i.e. gc'd
        print(':(')

d = DeleteMe()
d = DeleteMe()  # the first one is garbage collected, leading to a sad face
:(
And, like I said before, you shouldn't count on :( happening immediately, just that gc will happen eventually.
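The "eventually" matters most for reference cycles, which reference counting alone can never free; CPython's separate cycle collector handles those. A minimal sketch:

import gc

class Node:
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a        # a reference cycle: the refcounts can never reach zero
del a, b             # no immediate free; the cycle keeps both objects alive
print(gc.collect())  # the cycle detector finds and frees them; prints the
                     # number of unreachable objects found (> 0 here)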
Related
Is there a way to check if a generator is in use anywhere globally? Such that an active generator will bail if no one is using it.
This is mostly academic, but I can think of numerous situations where it would be good to detect this. To make it concrete, here is an example:
def accord():
    _accord = None
    _inuse = lambda: someutilmodule.scopes_using(_accord) > 1
    def gen():
        uid = 0
        while _inuse():
            uid += 1
            yield uid
        else:
            print("I'm done, although you obviously forgot about me.")
    _accord = gen()
    return _accord

a = accord()
next(a)
next(a)
next(a)
a = None
"""
<<< 1
<<< 2
<<< 3
<<< I'm done, although you obviously forgot about me.
"""
The triple-quoted block is the text I would expect to see if someutilmodule.scopes_using reported the number of uses of the variable. By uses I mean how many copies or references exist.
Note that the generator has an effectively infinite loop, which is generally bad practice, but in cases like a unique-id generator that isn't widely or complexly used, it is often useful and won't create huge overhead. Obviously another way would simply be to expose a function or method that sets a flag the loop uses as its condition. But again, it's good to know various ways to do things. (A rough approximation of the hypothetical check is sketched below.)
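someutilmodule.scopes_using is hypothetical, but a crude stand-in is possible with sys.getrefcount. A sketch only; the counts are CPython implementation details:

import sys

def scopes_using(obj):
    # sys.getrefcount sees two bookkeeping references of its own:
    # the local parameter obj and the temporary argument reference,
    # so subtract them to approximate the "real" reference count
    return sys.getrefcount(obj) - 2

With that definition, scopes_using(_accord) would be 2 while a holds the generator (the name a plus the closure cell) and drops to 1 after a = None, though note the loop condition is only re-checked on the next next() call.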
In this case, when you do
a = accord()
A reference counter behind the scenes keeps track of the fact that a variable is referencing that generator object. This keeps it in memory because there's a chance it may be needed in the future.
Once you do this however:
a = None
The reference to the generator is lost, and the reference counter associated with it is decremented. Once it reaches 0 (which it would, because you only had one reference to it), the system knows that nothing can ever refer to that object again, which frees the data associated with that object up for garbage collection.
This is all handled behind the scenes. There's no need for you to intervene.
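If you want to observe the collection without intervening, weakref.finalize runs a callback when its referent is collected. A minimal sketch; the immediate timing is a CPython refcounting detail:

import weakref

def gen():
    yield 1

a = gen()
weakref.finalize(a, print, 'generator collected')
a = None  # refcount hits 0; on CPython the callback fires right here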
The best way to see what's going on, for better or worse, is to examine the relevant source code for CPython. Ultimately, _Py_DECREF is called when references are lost. You can see a little further down, after interpreting some convoluted logic, that once the reference count is 0, _Py_Dealloc(op); is called on PyObject *op. I can't for the life of me find the actual call to free, though, that I'm sure ultimately results from _Py_Dealloc. It seems to be somewhere in the Py_TRASHCAN_END macro, but good lord. That's one of the longest rabbit holes I've ever gone down where I have nothing to show for it.
This loop is used in barcode scanning software. It may run as many times as a barcode is scanned, which is hundreds of times in an hour.
# locpats is a list of regular expression patterns of possible depot locations
for pat in locpats:
    q = re.match(pat, scannedcode)
    if q:
        print(q)
        return True
q is a Match object. The print(q) tells me that every match object gets its own little piece of memory. They'll add up, and I have no idea to what total.
I don't need the Match object anymore once inside the if. Should I wipe it, like so?
q = re.match(pat, scannedcode)
if q:
    q = None
    return True
Or is there a cleaner way? Should I bother at all?
If I understand right (from this), garbage collection with gc.collect() won't happen until a process terminates, which in my case is at the end of the day when the user is done scanning. Until that time, these objects won't be regarded as garbage, even.
CPython uses reference counting (plus some cyclical reference detection, not applicable here) to handle gc of objects. Once an object reaches 0 extant references, it will be immediately gc'd.
In the case of your loop:
for pat in locpats:
    q = re.match(pat, scannedcode)
Each successive pat in locpats binds a new re.match object to q. This implies that the old re.match object has 0 remaining references, and will be immediately garbage collected. A similar situation applies when you return from your function.
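You can watch the handoff with sys.getrefcount. A sketch; the exact counts are CPython-specific, and getrefcount's own argument temporarily adds one:

import re
import sys

q = re.match(r'\d+', '123')
print(sys.getrefcount(q))    # 2: the name q plus getrefcount's argument
q = re.match(r'\d+', '456')  # the first Match object now has zero
                             # references; CPython frees it on the spot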
This is all an implementation detail of CPython; other flavors of Python will handle gc differently. In all cases, don't prematurely optimize. Unless you can pinpoint a specific reason to do so, leaving the gc alone is likely to be the most performant solution.
This is not a problem, since q is local, and therefore won't persist after you return.
If you want to make yourself feel better, you can try
if re.match(pat, scannedcode):
    return True
which will do what you're doing now without ever naming the match - but it won't change your memory footprint.
(I'm assuming that you don't care about the printed value at all, it's just diagnostic)
If your print statement is showing that each match is getting its own piece of memory then it looks like one of two things is happening:
1) As others have mentioned you are not using CPython as your interpreter and the interpreter you have chosen is doing something strange with garbage collection
2) There is code you haven't shown us here which is keeping a reference to the match object so that the GC code never frees it as the reference count to the match object never reaches zero
Is either of these the case?
I'm reading in a collection of objects (tables like sqlite3 tables or dataframes) from an object-oriented database, most of which are small enough for the Python garbage collector to handle without incident. However, when they get larger in size (less than 10 MB) the GC doesn't seem to be able to keep up.
Pseudocode looks like this:
import gc

walk = walkgenerator('/path')
objs = objgenerator(walk)
with db.transaction(bundle=True, maxSize=10000, maxParts=10):
    oldobj = None
    oldtable = None
    for count, obj in enumerate(objs):
        currenttable = obj.table
        if oldtable and oldtable in currenttable:
            db.delete(oldobj.path)
        del oldtable
        oldtable = currenttable
        del oldobj
        oldobj = obj
        if not count % 100:
            gc.collect()
I'm looking for an elegant way to manage memory while allowing Python to handle it when possible.
Perhaps embarrassingly, I've tried using del to help clean up reference counts.
I've tried gc.collect() at varying modulo counts in my for loops:
100 (no difference),
1 (slows loop quite a lot, and I will still get a memory error of some type),
3 (loop is still slow but memory still blows up eventually)
Suggestions are appreciated!!!
Particularly, if you can give me tools to assist with introspection. I've used Windows Task Manager here, and it seems to more or less randomly spring a memory leak. I've limited the transaction size as much as I feel comfortable, and that seems to help a little bit.
There's not enough info here to say much, but what I do have to say wouldn't fit in a comment so I'll post it here ;-)
First, and most importantly, in CPython garbage collection is mostly based on reference counting. gc.collect() won't do anything for you (except burn time) unless trash objects are involved in reference cycles (an object A can be reached from itself by following a chain of pointers transitively reachable from A). You create no reference cycles in the code you showed, but perhaps the database layer does.
So, after you run gc.collect(), does memory use go down at all? If not, running it is pointless.
I expect it's most likely that the database layer is holding references to objects longer than necessary, but digging into that requires digging into exact details of how the database layer is implemented.
One way to get clues is to print the result of sys.getrefcount() applied to various large objects:
>>> import sys
>>> bigobj = [1] * 1000000
>>> sys.getrefcount(bigobj)
2
As the docs say, the result is generally 1 larger than you might hope, because the refcount of getrefcount()'s argument is temporarily incremented by 1 simply because it is being used (temporarily) as an argument.
So if you see a refcount greater than 2, del won't free the object.
Another way to get clues is to pass the object to gc.get_referrers(). That returns a list of objects that directly refer to the argument (provided that a referrer participates in Python's cyclic gc).
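For example (a sketch; expect noise, since at module level the globals dict, and sometimes the current frame, show up among the referrers):

import gc

big = [1] * 1000000
holder = {'big': big}
referrers = gc.get_referrers(big)
print(len(referrers))  # includes holder and the module's globals dict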
BTW, you need to be clearer about what you mean by "doesn't seem to work" and "blows up eventually". Can't guess. What exactly goes wrong? For example, is MemoryError raised? Something else? Tracebacks often yield a world of useful clues.
I have Python code whose memory consumption steadily grows with time. While there are several objects which can legitimately grow quite large, I'm trying to understand whether the memory footprint I'm observing is due to these objects, or is it just me littering the memory with temporaries which don't get properly disposed of. Being a recent convert from a world of manual memory management, I guess I just don't exactly understand some very basic aspects of how the Python runtime deals with temporary objects.
Consider code with roughly this general structure (I'm omitting irrelevant details):
import copy
import numpy

def tweak_list(lst):
    new_lst = copy.deepcopy(lst)
    if numpy.random.rand() > 0.5:
        new_lst[0] += 1  # in real code, the operation is a little more sensible :-)
        return new_lst
    else:
        return lst

lst = [1, 2, 3]
cache = {}

# main loop
for step in xrange(some_large_number):
    lst = tweak_list(lst)  # <<-----(1)
    # do something with lst here, cut out for clarity
    cache[tuple(lst)] = 42  # <<-----(2)
    if step % chunk_size == 0:
        # dump the cache dict to a DB, free the memory (?)
        cache = {}  # <<-----(3)
Questions:
What is the lifetime of the new_lst created in tweak_list? Will it be destroyed on exit, or will it be garbage collected (and if so, at what point)? Will repeated calls to tweak_list generate a gazillion small lists lingering around for a long time?
Is a temporary created when converting a list to a tuple to be used as a dict key?
Will setting a dict to an empty one release the memory?
Or, am I approaching the issue at hand from a completely wrong perspective?
new_lst is cleaned up when the function exits if it is not returned. Its reference count drops to 0, and it can be garbage collected. On current CPython implementations that happens immediately.
If it is returned, the value referenced by new_lst replaces lst; the list referred to by lst sees its reference count drop by 1, but the value originally referred to by new_lst is still being referred to by another variable.
The tuple() key is a value stored in the dict, so that's not a temporary. No extra objects are created other than that tuple.
Replacing the old cache dict with a new one will reduce the reference count by one. If cache was the only reference to the dict, it'll be garbage collected. This then causes the reference count for all the contained tuple keys to drop by one. If nothing else refers to those tuples, they will be garbage collected too.
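A small demonstration of that cascade; the counts are CPython-specific, and the key is built at runtime so constant folding doesn't skew them:

import sys

cache = {tuple([1, 2, 3]): 42}  # build the key at runtime
key = next(iter(cache))         # hold one extra reference to the key
print(sys.getrefcount(key))     # 3: key, the dict's key slot, the argument
cache = {}                      # the old dict is freed...
print(sys.getrefcount(key))     # 2: ...dropping its reference to the tuple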
Note that when Python frees memory, that does not necessarily mean the operating system reclaims it immediately. Most operating systems will only reclaim the memory when it is needed for something else, instead presuming the program might need some or all of that memory again soon.
You might want to take a look at Heapy as a way of profiling memory usage. I think PySizer is also used in some instances for this, but I am not familiar with it. ObjGraph is also a strong tool to take a look at.
In many cases, you are sure you definitely won't use the list again, so you want the memory to be released right now.
a = [11,22,34,567,9999]
del a
I'm not sure if the above really releases the memory. You can use:
del a[:]
that actually removes all the elements in list a.
Is that the best way to release the memory?
def release_list(a):
    del a[:]
    del a
I have the same question about tuples and sets.
def release_list(a):
    del a[:]
    del a
Do not ever do this. Python automatically frees all objects that are not referenced any more, so a simple del a ensures that the list's memory will be released if the list isn't referenced anywhere else. If that's the case, then the individual list items will also be released (and any objects referenced only from them, and so on and so on), unless some of the individual items were also still referenced.
That means the only time when del a[:]; del a will release more than del a on its own is when the list is referenced somewhere else. This is precisely when you shouldn't be emptying out the list: someone else is still using it!!!
Basically, you shouldn't be thinking about managing pieces of memory. Instead, think about managing references to objects. In 99% of all Python code, Python cleans up everything you don't need pretty soon after the last time you needed it, and there's no problem. Every time a function finishes, all the local variables in that function "die", and if they were pointing to objects that are not referenced anywhere else, they'll be deleted, and that will cascade to everything contained within those objects.
The only time you need to think about it is when you have a large object (say a huge list), you do something with it, and then you begin a long-running (or memory intensive) sub-computation, where the large object isn't needed for the sub-computation. Because you have a reference to it, the large object won't be released until the sub-computation finishes and then you return. In that sort of case (and only that sort of case), you can explicitly del your reference to the large object before you begin the sub-computation, so that the large object can be freed earlier (if no-one else is using it; if a caller passed the object in to you and the caller does still need it after you return, you'll be very glad that it doesn't get released).
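A sketch of that pattern (load_big_list, summarize, and long_computation are hypothetical stand-ins, not real APIs):

def process():
    big = load_big_list()       # hypothetical: builds a huge list
    summary = summarize(big)    # hypothetical: the last use of big
    del big                     # drop our reference before the long phase so
                                # the list can be freed (if no caller holds it)
    return long_computation(summary)  # hypothetical long-running step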
Python uses reference counting to manage its resources.
import sys

class Foo:
    pass

b = Foo()
a = [b, 1]
sys.getrefcount(b)  # gives 3
sys.getrefcount(a)  # gives 2

a = None  # delete the list
sys.getrefcount(b)  # gives 2
In the above example, b's reference count is incremented when you put it into the list, and as you can see, when you delete the list, the reference count of b gets decremented too. So in your code:
def release_list(a):
    del a[:]
    del a
the whole function was redundant.
In summary, all you need to do is assign the list to None or use the del keyword to remove the name from the namespace (i.e., to unbind the name from the actual object). For example,
a = None # or
del a
When the reference count of an object goes to zero, python will free the memory for you. To make sure the object gets deleted, you have to make sure no other places reference the object by name, or by container.
sys.getrefcount(b) # gives 2
If sys.getrefcount gives you 2, that means you are the only one holding a reference to the object, and when you do
b = None
it will be freed from memory.
As @monkut notes, you probably shouldn't worry too much about memory management in most situations. If you do have a giant list that you're sure you're done with now, and it won't go out of the current function's scope for a while, though:
del a simply removes your name a for that chunk of memory. If some other function or structure or whatever has a reference to it still, it won't be deleted; if this code has the only reference to that list under the name a and you're using CPython, the reference counter will immediately free that memory. Other implementations (PyPy, Jython, IronPython) might not kill it right away because they have different garbage collectors.
Because of this, the del a statement in your release_list function doesn't actually do anything, because the caller still has a reference!
del a[:] will, as you note, remove the elements from the list and thus probably most of its memory usage.
You can do the_set.clear() for similar behavior with sets.
All you can do with a tuple, because they're immutable, is del the_tuple and hope nobody else has a reference to it -- but you probably shouldn't have enormous tuples!
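Putting those together, a minimal illustration:

a = list(range(10**6))
del a[:]   # empties the list in place; anyone else holding a reference
           # now sees an empty list
s = set(range(10**6))
s.clear()  # the set equivalent of del a[:]
t = tuple(range(10**6))
del t      # tuples can't be emptied in place; just drop your reference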
If you're worried about memory management and performance for your data types, why not use something like a doubly linked queue (collections.deque, for instance)?
First, its memory footprint is scattered throughout memory, so you won't have to allocate a large chunk of contiguous memory right off the bat.
Second, you will see faster enqueue and dequeue times, because unlike with a standard list, removing an element from the front doesn't require sliding the rest of the list over by one index, which takes time for large lists.
I should also note that if you are just storing comparable items like integers, I would suggest looking into a binary heap, as you will see O(log n) insertion and removal times compared to mostly O(n) with lists. A quick sketch of both follows.
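A minimal sketch using the standard library (collections.deque for the queue, heapq for the heap):

from collections import deque
import heapq

dq = deque([1, 2, 3])
dq.appendleft(0)   # O(1) at either end; no elements are shifted
dq.append(4)
dq.popleft()
dq.pop()

heap = [5, 1, 4, 2]
heapq.heapify(heap)              # O(n) to build
smallest = heapq.heappop(heap)   # O(log n) to remove the minimum
heapq.heappush(heap, 3)          # O(log n) to insert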
If you need to release a list's memory while keeping the name bound, you can simply write a = [].