Code block in python in order to free memory - python

Pretty simple question:
I have some code to show some graphs, and it prepares data for the graphs, and I don't want to waste memory (limited)... is there a way to have a "local scope" so when we get to the end, everything inside is freed?
I come from C++, where you can put code inside { ... } so that everything is freed at the end of the block and you don't have to care about anything.
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...
tmp()
but it is very ugly, and I certainly don't want to list every del x at the end

If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand to either explicitly delete references to objects with del, or to put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and do it well (thanks Unix!), you will already be segmenting your code into functions. In the one-off cases where you allocate a lot of memory early in your function and no longer need it midway through, you can del the reference to it.
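A minimal sketch of the function-scope approach described above, assuming CPython's reference counting. The names Payload, plot_graph and freed are hypothetical, and weakref.finalize is used only to observe the moment the local object is destroyed:

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large data structure."""

    def __init__(self, n):
        self.values = list(range(n))


freed = []  # filled in by the finalizer registered below


def plot_graph():
    data = Payload(100_000)  # exists only inside this call
    # Record the moment the object is destroyed, without keeping it alive.
    weakref.finalize(data, freed.append, "data")
    return len(data.values)  # return only the small result


points = plot_graph()
print(points, freed)
```

On CPython the finalizer should already have run by the time plot_graph() returns, because the only reference to the Payload object was the local name data. Other implementations may release it later.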
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting function defs or classes inside, but this is kinda hacky (or in the class case, which wouldn't require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it, you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect how many references a given object has.
If you're curious where the allocations are happening, you can also use the built-in tracemalloc module to trace said allocations.
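As a rough illustration of the tracemalloc suggestion above, the following sketch measures how much memory a function uses at its peak and how much is still allocated after it returns. The function build_table and its data are hypothetical; the exact byte counts depend on the interpreter version:

```python
import tracemalloc


def build_table():
    # Hypothetical workload: a large temporary list of strings.
    rows = [str(i) * 10 for i in range(50_000)]
    return len(rows)


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes
count = build_table()
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"still allocated: {after - before} bytes, peak: {peak - before} bytes")
```

The peak should be far larger than what remains allocated afterwards, which is consistent with the temporary list being released when the function returns.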

The mechanism that handles freeing memory in Python is called the "garbage collector", and it means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low-level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".

Coming from C++, you have already stumbled on one of the main differences (drawbacks) of Python: memory management. Python's garbage collector deletes objects once they fall out of scope. Freeing an object's memory, however, does not guarantee that the memory actually returns to the system; a rather large portion may stay reserved by the Python process even when unused. If you face a memory problem and want to give memory back to the system, the only safe method is to run the memory-intensive function in a separate process. Every Python process has its own interpreter, and any memory consumed by that process returns to the system when the process exits.
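One possible way to sketch the separate-process idea is shown below. It is an assumption of this example (not something the answer prescribes) that the work is launched with subprocess and sys.executable and that the small result comes back as JSON on stdout; the multiprocessing module is a common alternative. CHILD_CODE is a hypothetical job:

```python
import json
import subprocess
import sys

# Hypothetical memory-hungry job, written as source for a child interpreter.
CHILD_CODE = """
import json
data = list(range(1_000_000))   # large temporary structure
print(json.dumps({"total": sum(data)}))
"""

# Run the job in a fresh interpreter; its memory goes back to the
# operating system when that process exits.
completed = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
    check=True,
)
result = json.loads(completed.stdout)
print(result["total"])
```

Only the small result crosses the process boundary, so the parent never holds the large list.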

Related

Will making a new assignment for a variable in Python change the old address of the variable?

In Python, when you write x=10, it reserves a memory location and essentially stores 10, right? Then, if you write x=20 will 20 replace the value of 10 (like C/C++ does) or will it write 20 to a new memory location and consider the old 10 as garbage?
Thanks in advance ;)
You do not have to manually free memory that you use.
Perhaps this is useful also.
garbage collection
The process of freeing memory when it is not used anymore. Python performs garbage collection via reference counting and a cyclic garbage collector that is able to detect and break reference cycles.
Sample on allocation (ints are immutable)
something = 10
print(id(something))  # memory address of the object (CPython implementation detail)
something = 12
print(id(something))
Output (the exact values vary between runs):
140159603405344
140159603405408
You don't know. The Python Language Specification does not talk about things like "memory location" or "address".
It simply specifies the semantics of the code. Implementors are free to implement those semantics however they may wish.
For GraalPython, for example, I would guess that the compiler would completely optimize away the variable.
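Since the language does not define addresses, one way to see what assignment does is to look only at names and objects. This is a small sketch with hypothetical names x and old:

```python
x = 10
old = x      # a second name bound to the same object as x

x = 20       # rebinding: x now names a different object

# The object that x used to name was not overwritten;
# it is still reachable through the other name.
print(old, x)
```

Assignment never modifies the object previously bound to the name. Whether that object's memory is reused afterwards is up to the implementation.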

How can I understand if a memory address is used or not?

I am doing some experiments with the Python garbage collector, I would like to check if a memory address is used or not. In the following example, I have de-referenced the string (surely) at ls[2]. If I run the garbage collector, I can still see surely at the original address. I would like to be sure that the address is now writable. Is there a way to check it in Python?
from ctypes import string_at
from sys import getsizeof
import gc
ls = ['This','will be','surely','deleted']
idsurely = id(ls[2])
sizesurely = getsizeof(ls[2])
ls[2] = 'probably'
print(ls)
print(string_at(idsurely,sizesurely))
gc.collect()
# I check there is nothing in the garbage
print(gc.garbage)
print(string_at(idsurely,sizesurely))
I am interested in this mainly from a theoretical point of view, so I am not saying that it has practical usage. My goal is to show how memory works for a tutorial. I want to show that the data is still there and that the bytes at the address can now be overwritten. So far the output of the script is as expected. I just want to prove the last step.
Not possible.
There is no central registry of used or unused memory addresses in Python. There isn't even a central registry of all objects (the cyclic GC doesn't know about all of them), and even if you had a registry of all objects, that wouldn't be enough to determine what memory locations are in use. Additionally, you can't just read arbitrary memory addresses, or write to arbitrary deallocated addresses. That'll quickly lead to segfaults or worse.
Finally, I would strongly advise against using this kind of thing in a tutorial even if you did find something to make it work. When you put something in a tutorial, a large fraction of people reading the tutorial are going to think it's something they're supposed to learn. Programming newbies should not be misled into thinking that examining possibly-deallocated memory locations is something they should be doing.
Your experiments are way off base. id (solely as a CPython implementation detail) does get the memory address of the object in question, but we're talking about the Python object itself, not the data it contains. sys.getsizeof returns a number that roughly corresponds to how much memory the object occupies, but there is no guarantee that memory is contiguous.
By sheer coincidence, this almost works on str (though it will perform a buffer overread if the string in question has cached copies of its UTF-8 or wchar_t form, so you're risking crashing your program), but even then your test is flawed; CPython interns string literals that look like legal variable names, so if the string in question appears as a literal anywhere else in your program (including as the name of some class or function in some module you imported), it won't actually go away when you replace it. Similar implicit caches can occur if the literal string appears in any function, anywhere (it ends up being not only interned, but stored in the constants for that function).
Update: On testing, in an actual script, the reference count for 'surely' when you hold onto a copy of it is 3, which drops to 2 when you replace it with 'probably'. Turns out constants are being cached even at global scope. The only reason the interactive interpreter doesn't exhibit this behavior is that it effectively evals each line separately, so the constant cache is discarded when the eval completes.
And even if all that's not a problem, most (almost all) memory managers (CPython's specialized small object heap and the general heap it's built on) don't actually zero out memory when it's released, so if you do look at the same address shortly after it really was released, it'll probably have pretty similar data in it.
Lastly, your gc.collect() call won't change anything except by coincidence (of whatever happens during gc possibly allocating memory by side-effect). str is not a garbage collected type, as it cannot contain references to other Python objects, so it's impossible for it to be a link in a reference cycle, and the CPython garbage collector is solely concerned with collecting cyclic garbage; CPython is reference counted, so anything that's not part of a reference cycle is cleaned up automatically and immediately when the last reference disappears.
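The difference between reference counting and the cyclic collector described above can be observed with a sketch like the following, assuming CPython. Node and events are hypothetical names, and weakref.finalize is used only to record when each object is destroyed:

```python
import gc
import weakref


class Node:
    """Hypothetical container type; instances can reference each other."""

    def __init__(self):
        self.partner = None


events = []
gc.disable()  # keep the cyclic collector from running by itself during the demo

# Case 1: no cycle. The object dies as soon as its last reference goes away.
solo = Node()
weakref.finalize(solo, events.append, "solo freed")
del solo
after_del_solo = list(events)

# Case 2: a reference cycle. The counts never reach zero by themselves.
a, b = Node(), Node()
a.partner, b.partner = b, a
weakref.finalize(a, events.append, "cycle freed")
del a, b
after_del_cycle = list(events)

gc.collect()  # the cyclic collector finds and frees the unreachable pair
after_collect = list(events)
gc.enable()

print(after_del_solo, after_del_cycle, after_collect)
```

On CPython the first object is reported as freed right after del, while the pair in the cycle is reported only after gc.collect().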
The short answer this all leads up to is: There is no way to determine, within CPython, non-heuristically, if a particular memory address has been released to the free store and made available for reuse. CPython's memory management scheme is pure implementation detail, and exposing APIs at that level of detail would create compatibility concerns when people depended on them.
The closest you're going to get is using something like the tracemalloc module to perform basic snapshotting and compute differences in the snapshot. That's not going to give you a window into whether a specific address is still in use though AFAICT; at best it can tell you where an address that's definitely in use was allocated.
The other approach (specific to CPython) you can use is to just check the reference counts before replacing the object; if sys.getrefcount for a given name/attribute reports 2, then deling (or rebinding) that name/attribute will release it (assuming no threads that might create additional references between the test and the del/rebind). You expect 2, not 1, because calling sys.getrefcount creates a temporary reference to the object in question. If it reports a number greater than 2, deling/rebinding could still lead to the object being deleted eventually when the cyclic garbage collector runs, if the object was part of a reference cycle, but for a reference count of 2 (or 1 for something otherwise unnamed, e.g. sys.getrefcount(''.join(('f', '9'))) or the like), the behavior will be deterministic.
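A small CPython-only sketch of the reference-count check described above. It compares counts relative to a baseline rather than asserting absolute numbers, since the absolute value reported by sys.getrefcount can differ between interpreter versions. The names obj and holder are hypothetical:

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)   # includes the temporary reference made by the call

holder = [obj]                    # one more strong reference to the same object
with_holder = sys.getrefcount(obj)

holder.clear()                    # drop that extra reference again
after_clear = sys.getrefcount(obj)

print(with_holder - baseline, after_clear - baseline)
```

Each additional strong reference should raise the count by one, and dropping it should bring the count back to the baseline.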
From the documentation about gc:
... the collector supplements the reference counting already used in Python...
And from gc.is_tracked():
Returns True if the object is currently tracked by the garbage collector, False otherwise. As a general rule, instances of atomic types aren’t tracked and instances of non-atomic types (containers, user-defined objects…) are.
Strings are not tracked by the garbage collector:
In [1]: import gc
In [2]: test = 'surely'
In [3]: gc.is_tracked(test)
Out[3]: False
Looking at the gc documentation, there doesn't seem to be a method there for accessing the reference count from within the language (on CPython, sys.getrefcount in the sys module reports it).
Note that at least for me, using string_at doesn't work from the interactive interpreter. It does work in a script.

How can I delete some variables in NumPy/SciPy?

I'm transitioning from Matlab/Octave to NumPy/SciPy. When I used Matlab in interactive mode, I used clear or clear [some_variable] from time to time to remove that variable from memory. For example, before reading some new data to start a new set of experiments, I used to clear data in Matlab.
How could I do the same thing with NumPy/SciPy?
I did some research, and I found there is a command called del, but I heard that del actually doesn't clear memory, but the variable disappears from the namespace instead. Am I right?
That being said, what would be the best way to mimic "clear" of Matlab in NumPy/SciPy?
del obj (or del(obj)) will work, according to the SciPy mailing list
If you're working in IPython, then you can also use %xdel obj
...but I heard that "del" actually doesn't clear memory, but the
variable disappears from the namespace. Am I right?
Yes, that's correct. That's what garbage collection is for: Python will handle clearing the memory when it makes sense to. You don't need to worry about it, because from your end the variable no longer exists. Your code will behave the same whether or not garbage collection has occurred yet, so there's no need for an alternative to del.
If you are curious about the differences between Matlab's and Python's garbage collection / memory allocation, you can read this SO thread on it.
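To make the point about del concrete, here is a minimal sketch in which a plain list stands in for a large array (an assumption of this example, so that it runs without NumPy). The names data and view are hypothetical:

```python
data = list(range(5))   # stand-in for a large array
view = data             # a second reference to the same object

del data                # removes only the name "data"

try:
    data
except NameError:
    name_gone = True
else:
    name_gone = False

# The object itself survives as long as something still refers to it.
print(name_gone, view)
```

The memory can be reclaimed only once the last reference (here, view) is gone as well.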

Python: is the "old" memory free'd when a variable is assigned new content?

If a variable is assigned any new content, will the memory allocated for the "old content" be "properly" free'd? For example, in the following script, will the memory for variable "a" as an array of zeros be free'd after "a" is assigned some new stuff?
import numpy
a = numpy.zeros(1000)
a = a+1
I would imagine Python is smart enough to do everything cleanly, using the so-called 'garbage collection', which I have never really been able to read through. Any confirmation? I'd appreciate it.
Eventually, the old memory will be freed, though you cannot predict when this will happen. It is dependent on the Python implementation and many other factors.
That said, for the example you gave and the CPython implementation, the old array should be freed during the assignment, as soon as its reference count drops to zero.
(Note that NumPy arrays are a particularly complex example for discussing garbage-collector behaviour.)
You can find the answer by playing with the gc module (and probably fine-tuning it). It provides the ability to disable the collector, tune the collection frequency, and set debugging options. It also provides access to unreachable objects that the collector found but cannot free.
See http://docs.python.org/library/gc.html
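A short sketch of the gc module features mentioned above, using only functions from the standard gc module. The printed values depend on the interpreter and on what else has run, so none are asserted here as fixed numbers:

```python
import gc

print(gc.isenabled())        # whether automatic cyclic collection is on
print(gc.get_threshold())    # allocation thresholds that trigger collections

gc.disable()                 # switch the automatic cyclic collector off
was_disabled = not gc.isenabled()
gc.enable()                  # and back on

unreachable = gc.collect()   # force a full collection; returns objects found
print(was_disabled, unreachable, gc.garbage)
```

gc.garbage lists objects the collector found unreachable but could not free; in ordinary programs it is expected to stay empty.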

Are there any Python reference counting/garbage collection gotchas when dealing with C code?

Just for the sheer heck of it, I've decided to create a Scheme binding to libpython so you can embed Python in Scheme programs. I'm already able to call into Python's C API, but I haven't really thought about memory management.
The way mzscheme's FFI works is that I can call a function, and if that function returns a pointer to a PyObject, then I can have it automatically increment the reference count. Then, I can register a finalizer that will decrement the reference count when the Scheme object gets garbage collected. I've looked at the documentation for reference counting, and don't see any problems with this at first glance (although it may be sub-optimal in some cases). Are there any gotchas I'm missing?
Also, I'm having trouble making heads or tails of the cyclic garbage collector documentation. What things will I need to bear in mind here? In particular, how do I make Python aware that I have a reference to something so it doesn't collect it while I'm still using it?
Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The Extending and Embedding and Python/C API sections of the documentation are the ones that will explain how to use the C API.
Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: Depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus cannot forget to DECREF it or give it to something that will steal it) or are borrowing it (and must INCREF it to keep it and possibly to use it during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for a longer time than you are.
You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting and doesn't require direct access.
The biggest gotcha I know with ref counting and the C API is the __del__ thing. When you have a borrowed reference to something, you think you can get away without INCREF'ing because you don't give up the GIL while you use that reference. But, if you end up deleting an object (by, for example, removing it from a list), it's possible that you trigger a __del__ call, which might remove the reference you're borrowing from under your feet. Very tricky.
If you INCREF (and then DECREF, of course) all borrowed references as soon as you get them, there shouldn't be any problem.
