How to stop memory leaks when using `as_ptr()`? - python

Since it's my first time learning systems programming, I'm having a hard time wrapping my head around the rules. Now, I got confused about memory leaks. Let's consider an example. Say, Rust is throwing a pointer (to a string) which Python is gonna catch.
In Rust, (I'm just sending the pointer of the CString)
use std::ffi::CString;
use std::os::raw::c_char;
pub extern fn do_something() -> *const c_char {
    CString::new(some_string).unwrap().as_ptr()
}
In Python, (I'm dereferencing the pointer)
import ctypes
def call_rust():
    lib = ctypes.cdll.LoadLibrary(rustLib)
    lib.do_something.restype = ctypes.c_void_p
    c_pointer = lib.do_something()
    some_string = ctypes.c_char_p(c_pointer).value
Now, my question is about freeing the memory. I thought it should be freed in Python, but then ownership pops in. Because, as_ptr seems to take an immutable reference. So, I got confused about whether I should free the memory in Rust or Python (or both?). If it's gonna be Rust, then how should I go about freeing it when the control flow has landed back into Python?

Your Rust function do_something constructs a temporary CString, takes a pointer into it, and then drops the CString. The *const c_char is invalid from the instant you return it. If you're on nightly, you probably want CString#into_ptr instead of CString#as_ptr, as the former consumes the CString without deallocating the memory. On stable, you can mem::forget the CString. Then you can worry about who is supposed to free it.
Freeing from Python will be tricky or impossible, since Rust may use a different allocator. The best approach would be to expose a Rust function that takes a c_char pointer, constructs a CString for that pointer (rather than copying the data into a new allocation), and drops it. Unfortunately the middle part (creating the CString) seems impossible on stable for now: CString::from_ptr is unstable.
A workaround would be to pass (a pointer to) the entire CString to Python and provide an accessor function to get the char pointer from it. You simply need to box the CString and transmute the box to a raw pointer. Then you can have another function that transmutes the pointer back to a box and lets it drop.
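Seen from the Python side, the advice above boils down to "copy the string, then hand the pointer back to Rust". The following is only a minimal sketch of that flow, not the answerer's code. It assumes the Rust library exports a second function, here given the hypothetical name `free_string`, that takes the pointer back and drops the allocation (current stable Rust offers `CString::into_raw` and `CString::from_raw` for this round trip). So that the sketch runs without a compiled library, a ctypes buffer and a plain Python callable stand in for the Rust side.

```python
import ctypes

def take_string(ptr, free_func):
    """Copy the NUL-terminated string at `ptr`, then return `ptr` to its owner."""
    try:
        # string_at copies the bytes into a new Python object, so the
        # result stays valid after the native memory is released.
        return ctypes.string_at(ptr)
    finally:
        # Only the allocator that created the memory should free it.
        free_func(ptr)

# Stand-ins for the Rust side (assumptions for illustration only).
native_buffer = ctypes.create_string_buffer(b"hello from rust")
released = []          # records the pointers handed back for freeing

text = take_string(ctypes.addressof(native_buffer), released.append)
print(text)            # b'hello from rust'

# With a real library the wiring would look roughly like this:
#   lib.do_something.restype = ctypes.c_void_p
#   lib.free_string.argtypes = [ctypes.c_void_p]   # hypothetical export
#   text = take_string(lib.do_something(), lib.free_string)
```

Keeping `restype` as `ctypes.c_void_p` matters here: with `ctypes.c_char_p` as the return type, ctypes converts the result to `bytes` automatically and the original address, which the freeing function needs, is no longer available.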

Related

Code block in python in order to free memory

Pretty simple question:
I have some code to show some graphs, and it prepares data for the graphs, and I don't want to waste memory (limited)... is there a way to have a "local scope" so when we get to the end, everything inside is freed?
I come from C++ where you can define code inside { ... } so at the end everything is freed, and you don't have to care about anything
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...
tmp()
but is very ugly, and for sure I don't want to list all the del x at the end
If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand: either explicitly delete references to objects with del, or put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and do it well (thanks, Unix!), you will already have segmented your code into functions. In the one-off cases where you allocate a lot of memory early in a function and no longer need it midway through, you can del the reference to it.
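A small sketch can make the function-scope point concrete. The class `GraphData` and the function `prepare_and_measure` are made-up names for illustration; the weak reference is only there as a probe to observe whether the object is still alive.

```python
import gc
import weakref

class GraphData:
    """Stand-in for a large object prepared for plotting."""
    def __init__(self, size):
        self.values = list(range(size))

def prepare_and_measure(size):
    data = GraphData(size)          # local name: the only strong reference
    probe = weakref.ref(data)       # weak references do not keep it alive
    return len(data.values), probe

count, probe = prepare_and_measure(1000)
gc.collect()                        # not needed in CPython, harmless elsewhere
print(count, probe() is None)       # 1000 True
```

Once the function returns, nothing refers to the `GraphData` instance any more, so it is released without a single `del`.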
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting function defs or classes inside, but this is kinda hacky (or in the class case, which wouldn't require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect what still refers to a given object (gc.get_referrers).
If you're curious where the allocations are happening, you can also use the built-in tracemalloc module to trace them.
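As a rough illustration of the tracemalloc suggestion, the sketch below measures traced memory before an allocation, while a large list exists, and after the reference is deleted. The variable names are invented for the example and the exact byte counts depend on the interpreter build.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()    # (current, peak) in bytes

samples = [0] * 1_000_000           # roughly 8 MB of list slots on 64-bit CPython
with_list, _ = tracemalloc.get_traced_memory()

del samples                         # drop the only reference to the list
after_del, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

grew = with_list - baseline         # bytes gained by creating the list
shrank = with_list - after_del      # bytes released by the del
print(grew > 1_000_000, shrank > 1_000_000)
```

For locating the source lines responsible for allocations, `tracemalloc.take_snapshot()` and its `statistics("lineno")` method are the usual next step.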
The mechanism that handles freeing memory in Python is called the "garbage collector", which means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low-level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".
Coming from C++, you have already stumbled on one of the main differences (drawbacks) of Python: memory management. Python's garbage collector will delete objects once they fall out of scope. Freeing the memory of objects, however, doesn't guarantee that the memory is actually returned to the system; a rather large portion may be kept reserved by the Python program even if it is unused. If you face a memory problem and want to give memory back to the system, the only safe method is to run the memory-intensive function in a separate process. Every Python process has its own interpreter, and any memory consumed by that process returns to the system when the process exits.
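One way to sketch the separate-process idea is with the standard `subprocess` module, launching a second interpreter for the heavy part and passing back only the small result. The names `CHILD_CODE` and `run_in_child` are invented for this example; `multiprocessing` would be another reasonable choice when the work is an ordinary Python function.

```python
import subprocess
import sys

# Code executed by the child interpreter; the large list lives only there.
CHILD_CODE = """
data = list(range(1_000_000))
print(sum(data))
"""

def run_in_child(code):
    """Run `code` in a fresh interpreter and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()

result = run_in_child(CHILD_CODE)
print(result)    # 499999500000
```

When the child exits, the operating system reclaims all of its memory, regardless of what the interpreter's allocator had kept in reserve.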

Is Python memory-safe?

With Deno being the new Node.js rival and all, the memory-safe nature of Rust has been mentioned in a lot of news articles. One particular piece stated that Rust and Go are good for their memory-safe nature, as are Swift and Kotlin, although the latter two are not as widely used for systems programming.
Safe Rust is the true Rust programming language. If all you do is write Safe Rust, you will never have to worry about type-safety or memory-safety. You will never endure a dangling pointer, a use-after-free, or any other kind of Undefined Behavior.
This piqued my interest in understanding whether Python can be regarded as memory-safe and, either way, how safe or unsafe it is.
From the outset, the article on memory safety on Wikipedia does not even mention Python and the article on Python only mentions memory management it seems.
The closest I've come to finding an answer was this one by Daniel:
The wikipedia article associates type-safe to memory-safe, meaning, that the same memory area cannot be accessed as e.g. integer and string. In this way Python is type-safe. You cannot change the type of a object implicitly.
But even this only seems to imply a connection between two aspects (using an association from Wikipedia, which again is debatable) and no definitive answer on whether Python can be regarded as memory-safe.
Wikipedia lists the following examples of memory safety issues:
Access errors: invalid read/write of a pointer
Buffer overflow - out-of-bound writes can corrupt the content of adjacent objects, or internal data (like bookkeeping information for the heap) or return addresses.
Buffer over-read - out-of-bound reads can reveal sensitive data or help attackers bypass address space layout randomization.
Python at least tries to protect against these.
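What that protection looks like for ordinary Python objects can be sketched as follows. The helper `checked_access` is a made-up name; the point is only that an out-of-range index raises an exception instead of touching neighboring memory.

```python
def checked_access(items, index, value=None):
    """Read (or write, when `value` is given) and report what Python does."""
    try:
        if value is None:
            return ("ok", items[index])
        items[index] = value
        return ("ok", value)
    except IndexError as exc:
        return ("IndexError", str(exc))

data = [10, 20, 30]
neighbor = [99]        # in C, an overflow of `data` could clobber nearby memory

read_result = checked_access(data, 7)
write_result = checked_access(data, 7, value=1)
print(read_result)     # ('IndexError', 'list index out of range')
print(write_result)    # ('IndexError', 'list assignment index out of range')
print(data, neighbor)  # [10, 20, 30] [99]
```

Note that negative indexes are legal in Python and count from the end, so they are not reported as errors, and that extension modules or ctypes can still bypass these checks.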
Race condition - concurrent reads/writes to shared memory
That's actually not that hard to do in languages with mutable data structures. (Advocates of functional programming and immutable data structures often use this fact as an argument in their favor).
Invalid page fault - accessing a pointer outside the virtual memory space. A null pointer dereference will often cause an exception or program termination in most environments, but can cause corruption in operating system kernels or systems without memory protection, or when use of the null pointer involves a large or negative offset.
Use after free - dereferencing a dangling pointer storing the address of an object that has been deleted.
Uninitialized variables - a variable that has not been assigned a value is used. It may contain an undesired or, in some languages, a corrupt value.
Null pointer dereference - dereferencing an invalid pointer or a pointer to memory that has not been allocated
Wild pointers arise when a pointer is used prior to initialization to some known state. They show the same erratic behaviour as dangling pointers, though they are less likely to stay undetected.
There's no real way to prevent someone from trying to access a null pointer. In C# and Java, this results in an exception. In C++, this results in undefined behavior.
Memory leak - when memory usage is not tracked or is tracked incorrectly
Stack exhaustion - occurs when a program runs out of stack space, typically because of too deep recursion. A guard page typically halts the program, preventing memory corruption, but functions with large stack frames may bypass the page.
Memory leaks in languages like C#, Java, and Python have different meanings than they do in languages like C and C++ where you manage memory manually. In C or C++, you get a memory leak by failing to deallocate allocated memory. In a language with managed memory, you don't have to explicitly de-allocate memory, but it's still possible to do something quite similar by accidentally maintaining a reference to an object somewhere even after the object is no longer needed.
This is actually quite easy to do with things like event handlers in C# and long-lived collection classes; I've actually worked on projects where there were memory leaks in spite of the fact that we were using managed memory. In one sense, working with an environment that has managed memory can actually make these issues more dangerous because programmers can have a false sense of security. In my experience, even experienced engineers often fail to do memory profiling or write test cases to check for this (likely due to the environment giving them a false sense of security).
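The event-handler scenario translates to Python fairly directly. In this sketch, `Widget`, `handlers`, and `create_widget` are invented names; a long-lived list of callbacks keeps an object alive after the code that created it has finished with it.

```python
import gc
import weakref

class Widget:
    def on_event(self):
        pass

handlers = []                        # long-lived registry of callbacks

def create_widget():
    widget = Widget()
    handlers.append(widget.on_event) # the bound method holds a reference to widget
    return weakref.ref(widget)       # probe only; does not keep widget alive

probe = create_widget()
gc.collect()
leaked = probe() is not None         # still alive because of the registry

handlers.clear()                     # unregister the handler
gc.collect()
released = probe() is None           # now the widget can be reclaimed

print(leaked, released)              # True True
```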
Stack exhaustion is quite easy to do in Python too (e.g. with infinite recursion).
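In CPython, unbounded recursion is normally stopped by the interpreter's recursion limit before the native stack runs out, so it surfaces as an exception. A minimal sketch (the function name `descend` is arbitrary):

```python
import sys

def descend(depth):
    # No base case: every call makes another call.
    return descend(depth + 1)

try:
    descend(0)
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"

print(outcome)                      # RecursionError
print(sys.getrecursionlimit())      # the configured limit, 1000 by default
```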
Heap exhaustion - the program tries to allocate more memory than the amount available. In some languages, this condition must be checked for manually after each allocation.
Still quite possible - I'm rather embarrassed to admit that I've personally done that in C# (although not in Python yet).
Double free - repeated calls to free may prematurely free a new object at the same address. If the exact address has not been reused, other corruption may occur, especially in allocators that use free lists.
Invalid free - passing an invalid address to free can corrupt the heap.
Mismatched free - when multiple allocators are in use, attempting to free memory with a deallocation function of a different allocator.
Unwanted aliasing - when the same memory location is allocated and modified twice for unrelated purposes.
Unwanted aliasing is actually quite easy to do in Python. Here's an example in Java (full disclosure: I wrote the accepted answer); you could just as easily do something quite similar in Python. The others are managed by the Python interpreter itself.
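One common form of unwanted aliasing in Python is sketched below. It is not a port of the Java example mentioned above, just a familiar Python pitfall; the function names are made up for the illustration.

```python
def make_grid_shared(rows, cols):
    # Multiplying the outer list repeats a reference to ONE inner list.
    return [[0] * cols] * rows

def make_grid_independent(rows, cols):
    # The comprehension builds a fresh inner list for every row.
    return [[0] * cols for _ in range(rows)]

shared = make_grid_shared(2, 3)
shared[0][0] = 1                     # intended to change only the first row

independent = make_grid_independent(2, 3)
independent[0][0] = 1

print(shared)        # [[1, 0, 0], [1, 0, 0]]
print(independent)   # [[1, 0, 0], [0, 0, 0]]
```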
So, it would seem that memory-safety is relative. Depending on exactly what you consider a "memory-safety issue," it can actually be quite difficult to entirely prevent. High-level languages like Java, C#, and Python can prevent many of the worst of these errors, but there are other issues that are difficult or impossible to completely prevent.

Does python code using boost python in cpp make dynamic memory allocation?

In my code, Python objects are built in C++ code using Boost.Python.
It works well, but I don't fully understand how it grows the size of an object.
I also couldn't verify what the internal code does,
so I worry about memory leaks.
Because the real code is very large, a leak could bring the system down.
The code below is a simple example that applies the same algorithm.
Could anyone tell me whether this code leaks memory, and how it grows the object?
#include <boost/python.hpp>
using namespace boost::python;
dict get_name() {
    dict school;
    list class1;
    // student is defined elsewhere in the real code
    for (int i = 0; i < 10; i++) {
        class1.append(student[i]);
    }
    school["class1"] = class1;
    return school;
}
This code might or might not leak, depending on the held type of whatever student maps to. But otherwise, no, you should not have to worry about a leak, because the objects are wrapped by Python, which will delete them once they are no longer referenced.

C++ and Python 3 memory leak using PyArg_ParseTuple

I'm not a C++ developer so I don't really know what I'm doing. Unfortunately I have to debug the following code but I'm not making any progress.
static PyObject* native_deserialize(PyObject *self, PyObject *args){
    PyObject * pycontent;
    int len;
    PyObject * props = NULL;
    PyArg_ParseTuple(args, "|SiO", &pycontent, &len, &props);
    RecordParser reader("onet_ser_v0");
    TrackerListener* listener;
    listener = new TrackerListener(props);
#if PY_MAJOR_VERSION >= 3
    reader.parse((unsigned char*)PyBytes_AsString(pycontent), len, *listener);
#else
    reader.parse((unsigned char*)PyString_AsString(pycontent), len, *listener);
#endif
    return listener->obj;
}
Here is the python that calls that code:
clsname, data = pyorient_native.deserialize(content,
content.__sizeof__(), self.props)
This code creates a nasty memory leak. In fact, when I run this code, it kills my memory within 20 minutes.
I looked at the code but can't find the problem in the C++.
How can I prevent rogue C++ code from killing my Python code? Is there a way to flag C++ code from within python to be recycled regardless whether the C++ created a memory leak?
Is there a way I can force the memory to be garbage collected in C++. How can I find the exact leak in C++ by running python?
My biggest issue is understanding Py_XDECREF and Py_XINCREF and the rest of the reference counting macros. I'm reading the docs but obviously I'm missing some context because I can't figure out where and when to use these. I have loads of respect for C++ developers. Their jobs seem unnecessarily difficult :(
It turns out the solution was to Py_XDECREF the reference count for all the created objects. I still don't know exactly how, why, and where, as much of this still doesn't make sense to me.
I found this page that points out some of the pitfalls of these macros.
https://wingware.com/psupport/python-manual/2.3/ext/node22.html
There is the documentation but that wasn't very helpful.
https://docs.python.org/3/c-api/refcounting.html
Maybe someone can share something else that is easier to consume for us non C++ peoplez?
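For readers more at home in Python, the counter that Py_XINCREF and Py_XDECREF adjust can be observed with `sys.getrefcount`. The sketch below is only an analogy under that assumption: a list taking and dropping a reference plays the role that the macros play in C or C++ code. The names `payload` and `holders` are invented for the example.

```python
import sys

payload = object()
holders = []

base = sys.getrefcount(payload)
holders.append(payload)             # a new owner appears: comparable to Py_INCREF
held = sys.getrefcount(payload)
holders.clear()                     # the owner lets go: comparable to Py_DECREF
final = sys.getrefcount(payload)

print(held - base, final - base)    # 1 0
```

An object is freed only when its count drops to zero. Native code that creates objects or takes references without a matching decrement leaves the count permanently too high, which is the kind of leak described in the question.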

Const correctness of Python's C API

It seems that the Python C API is not consistent with the const correctness of character arrays. For example, PyImport_ImportFrozenModule accepts a char*, whereas PyImport_ImportModule accepts a const char*.
The implication of all this is that in my C++ application that I am writing with an embedded Python interpreter, I sometimes have to cast the string literal that I pass to a Python API call as just a char* (as opposed to const char*), and sometimes I don't. For example:
PyObject *os = PyImport_ImportModule("os"); // Works without the const_cast
PyObject *cwd = PyObject_CallMethod(os, const_cast<char*>("getcwd"), NULL); // Accepts char*, not const char*
If I don't do the const_cast<char*> (or (char*)) on the string literal, I get a compiler warning about casting string literals to char*.
Here are my questions:
Is there an advantage/reason to having some of the functions not take a const char* (and/or why would the Python API not be consistent in this)? My understanding is that if the function can take a string literal, it cannot change the char* so the const modifier would just be reinforcing this. I also believe that the const distinction is not as important for C (for which the API was written) than it is in C++ (correct me if I am wrong... my strength is python, not C/C++). Is the lack of "const correctness" of the Python API because it's simply not as important in C? (There is an old thread on the python mailing list from 2000 asking the same question, but it didn't seem to go anywhere and it is implied the reason might be due to some compilers not supporting const. Since many functions now have const char*, this doesn't seem to apply anymore)
Because my understanding of C++ is limited, I am unsure if I am going about casting string literals properly. The way I see it, I can do one of the following (I am currently doing the first):
// Method 1) Use const_cast<char*>
PyImport_ImportFrozenModule(const_cast<char*>("mymodule"));
// Method 2) Use (char*)
PyImport_ImportFrozenModule((char*) "mymodule");
// Method 3) Use char array
char mod[] = "mymodule";
PyImport_ImportFrozenModule(mod);
Which is the best method to use?
Update:
It looks like the Python3 branch is slowly trying to fix the const correctness issue. For example, the PyImport_ImportFrozenModule function I use as an example above now takes a const char* in Python 3.4, but there are still functions that take only a char*, such as PyLong_FromString.
Based on some mailing list conversations from python-dev, it looks like the initial API just simply wasn't created with const correctness in mind, probably just because Guido didn't think about it. Dating all the way back to 2002, someone asked if there was any desire to address that by adding const-correctness, complaining that it's a pain to always have to do this:
somefunc(const char* modulename, const char* key)
{
... PyImport_ImportModule(const_cast<char*>(modulename)) ...
Guido Van Rossum (the creator of Python) replied (emphasis mine):
I've never tried to enforce const-correctness before, but I've heard
enough horror stories about this. The problem is that it breaks 3rd
party extensions left and right, and fixing those isn't always easy.
In general, whenever you add a const somewhere, it ends up propagating
to some other API, which then also requires a const, which propagates
to yet another API needing a const, ad infinitum.
There was a bit more discussion, but without Guido's support the idea died.
Fast forward nine years, and the topic came up again. This time someone was simply wondering why some functions were const-correct, while others weren't. One of the Python core developers replied with this:
We have been adding const to many places over the years. I think the
specific case was just missed (i.e. nobody cared about adding const
there).
It seems that when it could be done without breaking backwards compatibility, const-correctness has been added to many places in the C API (and in the case of Python 3, in places where it would break backwards compatibility with Python 2), but there was never a real global effort to fix it everywhere. So the situation is better in Python 3, but the entire API is likely not const correct even now.
I don't think that the Python community has any preferred way to handle casting with calls that are not const-correct (there's no mention of it in the official C-API style guide), probably because there aren't a ton of people out there interfacing with the C-API from C++ code. I would say the preferred way of doing it from a pure C++ best-practices perspective would be the first choice, though. (I'm by no means a C++ expert, so take that with a grain of salt).
Is there an advantage/reason to having some of the functions not take a const char*?
No. Looks like an oversight in the library's design or, like you say, legacy issues. They could at least have made it consistent, though!
My understanding is that if the function can take a string literal, it cannot change the char* so the const modifier would just be reinforcing this.
Exactly. Their documentation should also specify that the function argument (or, rather, the argument's pointee) shall not be modified during the function call; alas it currently does not say this.
I also believe that the const distinction is not as important for C (for which the API was written) than it is in C++.
Well, not really, at least as far as I know.
The way I see it, I can do one of the following (I am currently doing the first)
(good)
Which is the best method to use?
Well the const_cast will at least make sure that you are only modifying the const-ness, so if you had to choose I'd go with that. But, really, I wouldn't be too bothered about this.
