Memory behavior of Python - python

I have a list which will get really big, so I will save the list to my HDD and continue with an empty list. My question is: when I do myList = [], will the old data be deleted, or will it remain somewhere in RAM? I fear that the name myList will just point somewhere else and the old data will not be touched.
from scipy.io import savemat  # assuming scipy's savemat here

myList = []
for i in range(bigNumber1):
    for k in range(bigNumber2):
        myList.append(bigData(i, k))
    savemat("data" + str(i), {"data": myList})
    myList = []

Good day.
In Python, as in many other programming languages, objects that are no longer referenced get collected by the garbage collector, a mechanism that looks for such objects and clears them from memory. How exactly this is done under the hood can be read about in more detail here:
https://stackify.com/python-garbage-collection/
Happy coding!

Python uses Garbage Collection for memory management (read more here).
The garbage collector attempts to reclaim memory which was allocated
by the program, but is no longer referenced—also called garbage.
So your data will automatically be deleted. However, if you want to be sure that the memory is freed at a particular point, you can call the GC directly with
import gc
gc.collect()
This is not recommended though.
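As a rough sketch of what that means in practice (the Payload class below is a made-up stand-in for your bigData items), rebinding the name is enough in CPython:

import gc
import weakref

class Payload:
    """Hypothetical stand-in for one big list element."""
    pass

my_list = [Payload() for _ in range(3)]
probe = weakref.ref(my_list[0])  # watch one element without keeping it alive

my_list = []   # rebind the name; the old list loses its last reference
gc.collect()   # usually redundant in CPython, shown only for emphasis

print(probe()) # None: the old element has been reclaimed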

Related

Automatically freeing memory of no more used variables in python

I'm new to Python.
Let's say I use large pandas data frames.
My code looks something like:
import pandas as pd

all_data = pd.read_csv(huge_file_name)
part_data = all_data[['ColumnName1', 'ColumnName2', 'ColumnName3']]
data_filtered = part_data.loc[part_data['ColumnName2'] == -1]
and so on.
Is there some way that Python can delete all_data, part_data and other variables that are no longer used?
I can write del var_name, but that makes the code very dirty.
I could also reuse the same name for all the variables, but that doesn't look good either.
Thank you all in advance!
The del keyword is the way to do it; I'm not sure there's much to be done about your concern for making the code "dirty." Python people like to say that explicit is better than implicit, and this would be an instance of that.
Otherwise declare the intermediate variables within a function scope and the space used by those variables will be freed (or rather marked for "garbage collection"; see below) when the function terminates.
So you could:
import gc
import pandas as pd

all_data = pd.read_csv(huge_file_name)
part_data = all_data[['ColumnName1', 'ColumnName2', 'ColumnName3']]
data_filtered = part_data.loc[part_data['ColumnName2'] == -1]
del all_data, part_data
# and if you're impatient for that memory to be freed, like RIGHT now
gc.collect()
Or you could:
import gc
import pandas as pd

def filter_data(infile):
    all_data = pd.read_csv(infile)
    part_data = all_data[['ColumnName1', 'ColumnName2', 'ColumnName3']]
    return part_data.loc[part_data['ColumnName2'] == -1]

data_filtered = filter_data(huge_file_name)
# force out-of-scope variables to be garbage collected RIGHT now
gc.collect()
The del keyword releases a variable from the local scope so it can be (eventually) garbage collected, but the memory freed when variables go out of scope may not be immediately returned to the operating system. The SO thread AMC helpfully pointed you to has details.
Garbage collection strategies are deep computer-science territory, but for CPython the short version is: most objects are freed immediately by reference counting, the moment the last reference disappears; the separate cyclic collector runs now and then (triggered by allocation counts, not memory pressure) to pick up reference cycles that reference counting alone cannot handle.
You were careful to point out that this is a large CSV file being read into a single (Pandas) data structure, but be mindful of the fact that out-of-scope variables are normally automatically garbage collected, and usually you do not need to micro-manage this process yourself.
Here is some background on garbage collection in Python that you may find illuminating, and here is a discussion of other times when del is useful (deleting slices out of a list, for example).
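For instance, a quick sketch of that last use of del (a generic example, nothing from the original question):

lst = list(range(10))
del lst[2:5]    # removes a slice in place; the remaining elements shift down
print(lst)      # [0, 1, 5, 6, 7, 8, 9]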

How to return used memory after function call in python

I am trying to write a Python module which checks the consistency of the MAC addresses stored in the HW memory. The scale could go up to 80K MAC addresses. But when I make multiple calls to get a list of MAC addresses through a Python method, the memory does not get freed up, and eventually I am running out of memory.
An example of what I am doing is:
import resource
import copy

def get_list():
    list1 = None
    list1 = []
    for j in range(1, 10):
        for i in range(0, 1000000):
            list1.append('abcdefg')
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    return list1

for i in range(0, 5):
    x = get_list()
On executing the script, I get:
45805
53805
61804
69804
77803
85803
93802
101801
109805
118075
126074
134074
142073
150073
158072
166072
174071
182075
190361
198361
206360
214360
222359
230359
238358
246358
254361
262365
270364
278364
286363
294363
302362
310362
318361
326365
334368
342368
350367
358367
366366
374366
382365
390365
398368
i.e. the memory usage reported keeps going up.
Am I looking at the memory usage in the wrong way?
And if not, is there a way to keep the memory usage from growing between function calls in a loop? (In my case with MAC addresses, I do not fetch the same list of MAC addresses again. I get the list from a different section of the HW memory, i.e. all the calls to get MAC addresses are valid, but after each call the data obtained is useless and can be discarded.)
Python is a managed language. Memory is, generally speaking, the concern of the implementation rather than the average developer. The system is designed to reclaim memory that you are no longer using automatically.
If you are using CPython, an object will be destroyed when its reference count reaches zero, or when the cyclic garbage collector finds and collects it. If you want to reclaim the memory belonging to an object, you need to ensure that no references to it remain, or at least that it is not reachable from any stack frame's variables. That is to say, it should not be possible to refer to the data you want reclaimed, either directly or through some expression such as foo.bar[42], from any currently executing function.
If you are using another implementation, such as PyPy, the rules may vary. In particular, reference counting is not required by the Python language standard, so objects may not go away until the next garbage collection run (and then you may have to wait for the right generation to be collected).
For older versions of Python (prior to Python 3.4), you also need to worry about reference cycles which involve finalizers (__del__() methods). The old garbage collector cannot collect such cycles, so they will (basically) get leaked. Most built-in types do not have finalizers, are not capable of participating in reference cycles, or both, but this is a legitimate concern if you are creating your own classes.
For your use case, you should empty or replace the list when you no longer need its contents (with e.g. list1 = [] or del list1[:]), or return from the function which created it (assuming it's a local variable, rather than a global variable or some other such thing). If you find that you are still running out of memory after that, you should either switch to a lower-overhead language like C or invest in more memory. For more complicated cases, you can use the gc module to test and evaluate how the garbage collector is interacting with your program.
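A minimal sketch of that advice applied to the loop from the question (get_list is the function defined above). Note also that resource.getrusage reports ru_maxrss, the peak resident set size, which by definition never goes down even after memory is freed, so the printed numbers alone cannot show memory being returned:

import gc

for i in range(0, 5):
    x = get_list()
    # ... use the data, e.g. check MAC address consistency ...
    del x          # drop the only reference; CPython frees the list immediately
    gc.collect()   # optional; mainly useful if reference cycles are involved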
Try this; it might not always free the memory, as it may still be in use.
See if it works:
import gc

gc.collect()

Python releasing memory of dictionary

I'm having a problem with releasing the memory of a dictionary in Python.
I ran the following check and watched the process's memory usage:
import gc

a = dict()
for i in xrange(1000000):
    a[i] = i
for i in xrange(1000000):
    del a[i]
gc.collect()
The memory usage after running those lines is much higher than before.
How can I release all of that memory?
Notice that I don't want to delete the dict itself.
Thanks.
Simply removing all the elements from the dictionary is not going to remove the dictionary from memory. Python uses the reference counting technique, so an object only becomes eligible for garbage collection when its reference count drops to 0. Your best bet would therefore be to remove the reference a from the actual dictionary, like this:
a = None
If the dictionary has no other references, the dictionary previously referred to by a will be garbage collected automatically.
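A small sketch of that reference-counting behaviour (sys.getrefcount is standard library; the exact counts assume CPython):

import sys

a = {i: i for i in range(1000)}
b = a                       # a second reference to the same dict
print(sys.getrefcount(a))   # at least 3: a, b, and getrefcount's own argument

a = None                    # the dict survives: b still refers to it
b = None                    # last reference gone; CPython frees the dict now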

memory consumption and lifetime of temporaries

I have Python code whose memory consumption steadily grows over time. While there are several objects which can legitimately grow quite large, I'm trying to understand whether the memory footprint I'm observing is due to these objects, or whether I'm just littering memory with temporaries which don't get properly disposed of. Being a recent convert from a world of manual memory management, I guess I just don't exactly understand some very basic aspects of how the Python runtime deals with temporary objects.
Consider code with roughly this general structure (I'm omitting irrelevant details):
import copy
import numpy

def tweak_list(lst):
    new_lst = copy.deepcopy(lst)
    if numpy.random.rand() > 0.5:
        new_lst[0] += 1  # in real code, the operation is a little more sensible :-)
        return new_lst
    else:
        return lst

lst = [1, 2, 3]
cache = {}
# main loop
for step in xrange(some_large_number):
    lst = tweak_list(lst)  # <<-----(1)
    # do something with lst here, cut out for clarity
    cache[tuple(lst)] = 42  # <<-----(2)
    if step % chunk_size == 0:
        # dump the cache dict to a DB, free the memory (?)
        cache = {}  # <<-----(3)
Questions:
What is the lifetime of the new_lst created in tweak_list? Will it be destroyed on exit, or will it be garbage collected (and at which point)? Will repeated calls to tweak_list generate a gazillion small lists lingering around for a long time?
Is a temporary created when converting a list to a tuple to be used as a dict key?
Will setting a dict to an empty one release the memory?
Or, am I approaching the issue at hand from a completely wrong perspective?
new_lst is cleaned up when the function exits, if it is not returned. Its reference count drops to 0, and it can be garbage collected. On current CPython implementations that happens immediately.
If it is returned, the value referenced by new_lst replaces lst; the list referred to by lst sees its reference count drop by 1, but the value originally referred to by new_lst is still being referred to by another variable.
The tuple() key is a value stored in the dict, so it's not a temporary. No extra objects are created other than that tuple.
Replacing the old cache dict with a new one reduces its reference count by one. If cache was the only reference to the dict, it'll be garbage collected. This then causes the reference count of all contained tuple keys to drop by one; if nothing else references them, they will be garbage collected too.
Note that when Python frees memory, that does not necessarily mean the operating system reclaims it immediately. Most operating systems will only reclaim the memory when it is needed for something else, presuming instead that the program might need some or all of that memory again soon.
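To make the tuple-key and cache points concrete, here is a tiny sketch (names match the question's code):

lst = [1, 2, 3]
cache = {}

key = tuple(lst)   # a new tuple object is created once, right here
cache[key] = 42    # the dict now keeps that tuple alive
del key            # the dict still holds a reference, so nothing is freed yet

cache = {}         # the old dict loses its last reference; it and its
                   # tuple keys become collectable (immediately, in CPython)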
You might want to take a look at Heapy as a way of profiling memory usage. I think PySizer is also used in some cases for this, but I am not familiar with it. ObjGraph is another strong tool worth a look.
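As an aside not mentioned in the answers above: on Python 3.4+ the standard library's tracemalloc module can serve a similar purpose without third-party tools. A minimal sketch:

import tracemalloc

tracemalloc.start()
# ... run a few iterations of the main loop here ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)  # the five allocation sites holding the most memory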

Deletion of a list in python with and without ':' operator

I've been working with Python for quite a bit of time, and I'm confused regarding a few issues in the areas of garbage collection and memory management, as well as the real deal with the deletion of variables and freeing memory.
>>> pop = range(1000)
>>> p = pop[100:700]
>>> del pop[:]
>>> pop
[]
>>> p
[100, 101, 102, ..., 699]
In the above piece of code, this happens. But,
>>> pop = range(1000)
>>> k = pop
>>> del pop[:]
>>> pop
[]
>>> k
[]
Here in the 2nd case, it implies that k is just another name for the list pop.
First part of the question:
But what's happening in the 1st code block? Is the memory containing the [100:700] elements not getting deleted, or is it duplicated when the list p is created?
Second part of the question:
Also, I've tried including gc.enable() and gc.collect() calls in between wherever possible, but there's no change in the memory utilization in either case. This is kind of puzzling. Isn't it bad that Python is not returning free memory back to the OS? Correct me if I'm wrong in the little research I've done. Thanks in advance.
Slicing a sequence results in a new sequence, with a shallow copy of the appropriate elements.
Returning the memory to the OS might be bad, since the script may turn around and create new objects, at which point Python would have to request the memory from the OS again.
1st part:
In the 1st code block, you create a new list into which the appropriate elements of the old one are copied before the old one is emptied.
In the 2nd code block, however, you just assign a reference to the same object to another variable. Then you empty the list, which, of course, is visible via both references.
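Both cases side by side, as a quick sketch:

pop = list(range(1000))
p = pop[100:700]     # slicing: a new list with shallow copies of those elements
k = pop              # assignment: just a second name for the same list

del pop[:]           # empties the list object in place
print(len(p))        # 600 -- p was an independent copy
print(len(k))        # 0   -- k and pop are the same object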
2nd part: Memory is returned when appropriate, but not always. Under the hood of Python, there is a memory allocator which has control over where the memory comes from. There are 2 ways: via the brk()/sbrk() mechanism (for smaller memory blocks) and via mmap() (larger blocks).
Here we have rather small blocks which get allocated directly at the end of the data segment:

    data | data | data | object1 | object2

If we only free object1, we have a memory gap which can be reused for the next object, but cannot easily be freed and returned to the OS.
If we free both objects, the memory could be returned. But there is probably a threshold for keeping memory back for a while, because returning everything immediately, only to request it again moments later, is not the best strategy.
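A rough way to watch this on Linux (this reads /proc/self/statm directly, so it is Linux-only, and the sizes are arbitrary):

def rss_pages():
    # the second field of /proc/self/statm is the resident set size, in pages
    with open('/proc/self/statm') as f:
        return int(f.read().split()[1])

small = [object() for _ in range(10 ** 6)]  # many small allocations
big = bytearray(200 * 1024 * 1024)          # one large block, typically mmap()-ed

print(rss_pages())
del big      # a large mmap()-ed block can go straight back to the OS
del small    # small-object arenas are often kept around for reuse
print(rss_pages())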
