I am looking to use Cython to reduce the memory requirements of a data structure that is stored in a Python dictionary millions of times. Right now, I have implemented this as a simple Cython class in a pyx file:
cdef class FooBar:
    cdef public int x
    cdef public int y

    def __init__(self, x, y):
        self.x = x
        self.y = y
I use this in another pyx file to populate a Python dictionary that has millions of keys:
python_dict[key] = FooBar(x, y)
This provides a substantial memory saving compared to using a namedtuple or other Python solution: my actual class stores 6 integers and thus requires 40 bytes of memory (6*4 bytes for the fields plus 16 bytes for the PyObject_HEAD). However, that 16-byte overhead is substantial, and my question is whether there is a better way to do this. My understanding is that I can't directly expose a C struct, since storing it in the dictionary would convert it to a Python dict, which requires far more memory. The python_dict dictionary is used extensively in my primarily-Python code. Is there a way to avoid the PyObject_HEAD overhead without having to move a lot of the implementation over to pure Cython/C?
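For concreteness, the actual class looks roughly like this (the field names here are made up); the per-instance cost is the six 4-byte ints plus the roughly 16-byte object header:

cdef class FooBar6:
    # 6 * 4 = 24 bytes of payload plus ~16 bytes of PyObject_HEAD per instance.
    cdef public int a, b, c, d, e, f

    def __init__(self, int a, int b, int c, int d, int e, int f):
        self.a, self.b, self.c = a, b, c
        self.d, self.e, self.f = d, e, f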
Say we have a class in Cython that wraps (via a pointer) a C++ class with unknown/variable size in memory:
// poly.h
#include <vector>

class Poly {
public:
    std::vector<int> v;
    // [...] Methods to initialize/add/multiply/... coefficients [...] e.g.,
    Poly(int len, int val) { for (int i = 0; i < len; i++) { this->v.push_back(val); } }
    void add(Poly& p) { for (size_t i = 0; i < this->v.size(); i++) { this->v[i] += p.v[i]; } }
    int size() const { return (int)this->v.size(); }
};
We can conveniently expose operations like add in PyPoly using operator overloads (e.g., __add__/__iadd__):
cdef extern from "poly.h":
    cdef cppclass Poly:
        Poly(int len, int val)
        void add(Poly& p)
        int size()
# pywrapper.pyx
cdef class PyPoly:
    cdef Poly* c_poly

    def __cinit__(self, int l, int val):
        self.c_poly = new Poly(l, val)

    def __dealloc__(self):
        del self.c_poly

    def __add__(PyPoly self, PyPoly other):
        cdef PyPoly new_poly = PyPoly(self.c_poly.size(), 0)
        new_poly.c_poly.add(self.c_poly[0])
        new_poly.c_poly.add(other.c_poly[0])
        return new_poly
How to create an efficient 1D numpy array with this cdef class?
The naive way I'm using so far involves an np.ndarray of dtype object, which benefits from the existing operator overloads:
import numpy as np

pypoly_arr = np.array([PyPoly(l=10, val=val) for val in range(10)])
pypoly_sum = np.sum(pypoly_arr)  # Works thanks to the implemented PyPoly.__add__
However, the above solution has to go through Python code to understand the data type and the proper way to deal with __add__, which becomes quite cumbersome for big array sizes.
Inspired by https://stackoverflow.com/a/45150611/9670056, I tried with an array wrapper of my own, but I'm not sure how to create a vector[PyPoly], or whether I should do that at all or instead just hold a vector of borrowed references, vector[Poly*], so that the call to np.sum could be handled (and parallelized) at C++ level.
Any help/suggestions will be highly appreciated! (Especially on how to rework the question/examples to make them as generic as possible and runnable.)
It is not possible to do that in Cython. Indeed, Numpy does not support native Cython classes as a data type. The reason is that the Numpy code is written in C and is already compiled by the time your Cython code is compiled. This means Numpy cannot use your native type directly. It has to go through an indirection, and that indirection is made possible through the object CPython type, which has the downside of being slow (mainly because of the actual indirection, but also a bit because of CPython interpreter overhead). Cython does not reimplement the Numpy primitives, as that would be a huge amount of work. Numpy only supports a restricted, predefined set of data types. It does support custom user-defined types, but such types are not as powerful as CPython classes (e.g., you cannot implement custom operators on items the way you did).
Just-in-time (JIT) compiler modules like Numba can theoretically support this because they reimplement Numpy and generate code at runtime. However, support for JIT classes in Numba is experimental and, AFAIK, arrays of JIT classes are not yet supported.
Note that you do not need to build an array in this case. A basic loop is faster and uses less memory. Something (untested) like:
cdef int val
cdef PyPoly pypoly_sum

pypoly_sum = PyPoly(l=10, val=0)
for val in range(1, 10):
    pypoly_sum += PyPoly(l=10, val=val)
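As a rough illustration of the vector[Poly*] idea from the question: a small container can own the C++ objects and accumulate entirely at C++ level, only wrapping the final result. This is an untested sketch; PolyVec and its methods are hypothetical names, it assumes the class lives in the same pywrapper.pyx as PyPoly (so it can reach c_poly), and that all polynomials have the same length:

from libcpp.vector cimport vector

cdef class PolyVec:
    cdef vector[Poly*] polys

    def __dealloc__(self):
        cdef size_t i
        cdef Poly* p
        for i in range(self.polys.size()):
            p = self.polys[i]
            del p

    def append(self, int l, int val):
        self.polys.push_back(new Poly(l, val))

    def sum(self):
        # Accumulate at C++ level; only the final result is wrapped in a PyPoly.
        cdef PyPoly total
        cdef size_t i
        if self.polys.size() == 0:
            raise ValueError("cannot sum an empty PolyVec")
        total = PyPoly(self.polys[0].size(), 0)
        for i in range(self.polys.size()):
            total.c_poly.add(self.polys[i][0])
        return total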
I'm developing a Python library for cryptography. I wanted to optimize my library by writing the main classes in C++ with GMP. I wrote my C++ classes and the extern methods to use the main arithmetic operations: addition, subtraction, etc. These methods return their results as char* to avoid cast problems. I built the DLL of my library and declared the methods in a Python wrapper with ctypes. I noticed that after each arithmetic operation with huge numbers the memory grew exponentially. I looked for problems in my C++ implementation, but found none there. Looking for a possible solution, I discovered that I had to implement a C++ method to free the memory of the strings created by the DLL. So I wrote this simple method:
extern "C" {
__declspec(dllexport) void free_memory(char * n)
{
free(n);
}
...
}
I implemented this code in the Python wrapper to free up the memory allocated by the DLL:
import os
import ctypes

DIR_PATH = os.path.dirname(os.path.realpath(__file__))
NUMERIC = ctypes.CDLL(DIR_PATH + "/numeric.dll")
...
NUMERIC.free_memory.argtypes = [ctypes.c_void_p]
NUMERIC.free_memory.restype = None

def void_cast(n):
    a = ctypes.cast(n, ctypes.c_char_p)
    res = ctypes.c_char_p(a.value)
    NUMERIC.free_memory(a)
    return res
So with res = ctypes.c_char_p(a.value) I create a new variable that no longer points to a. This way I correctly delete a using the DLL method, but I still have memory-leak problems. It is as if the Python garbage collector does not correctly free the memory of strings of type c_char_p. In the previous implementation I used only Python and the gmpy2 library, so all the numbers were converted to mpz or mpq. I tested the memory consumption using the memory_profiler package. I created 40 objects of type projective point, defined on an elliptic curve, and I calculated the products i*P, with i from 1 to 40. With gmpy2, about 70 MB in total were used. Instead, using ctypes with the classes in C++, memory consumption rose to 1.5 GB. It's obvious that there is something wrong, especially when only the base classes that deal with arithmetic operations change. How can I properly free the memory without having memory-leak problems?
Below is an example of an extern method for calculating an arithmetic operation, but I have already checked that the problem lies only in correctly freeing the memory via the free_memory function and reassigning the string so that Python's garbage collector will free the string when needed.
extern "C" {
__declspec(dllexport) const char* rat_add(const char * n, const char * m)
{
return (RationalNum(n) + RationalNum(m)).getValue();
}
}
Thanks in advance and have a nice day.
PS: Clearly in C++ I correctly implemented the destructor method to free up the space of the mpz_t and mpq_t objects created.
The problem is in this line:
res = ctypes.c_char_p(a.value)
This creates a copy of a.value and sets res to a c_char_p that points to the copy. However, Python does not do memory management for ctypes pointers, so the copy will be leaked!
The leak should be fixed if you replace the above line with:
res = bytes(memoryview(a.value))
This also creates a copy, but res will be a real Python object.
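Putting it together, the wrapper function from the question would then look roughly like this (a sketch based on the code above):

def void_cast(n):
    a = ctypes.cast(n, ctypes.c_char_p)
    # Copy the C string into a real Python bytes object first...
    res = bytes(memoryview(a.value))
    # ...then release the buffer that the DLL allocated.
    NUMERIC.free_memory(a)
    return res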
I'm currently writing a wrapper for a C++ project that uses std::complex<double>, available in Cython as libcpp.complex.complex[double].
However, there is no implicit conversion between this and the Python complex type, so I'm trying to find the best way to do this conversion.
The obvious way is to use
cdef libcpp.complex.complex[double] x = ...
X = complex(x.real(), x.imag())
And
cdef complex Y = ...
cdef libcpp.complex.complex[double] y = libcpp.complex.complex[double](Y.real, Y.imag)
And
cdef libcpp.complex.complex[double] z
cdef complex Z = ...
z.real(Z.real)
z.imag(Z.imag)
But is there a better way, preferably one that makes Cython do the conversion automatically?
For one, this occurs in some cpdef functions, and I would like to avoid using Python complex in calls from Cython for improved speed.
However, as far as I can tell, Cython can't do this conversion implicitly, so I can't avoid using the Python complex type for code that can be called from both Python and C++.
This functionality will be released in the next version of Cython.
https://github.com/cython/cython/commit/2c3c04b25620c437ce41d7851e7581eb1f0c313c
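For reference, once that version is available the coercion should behave like the other libcpp conversions (e.g. std::string <-> bytes), so a cpdef function can take std::complex directly; a rough sketch of the intended usage (the function is illustrative, not taken from the question):

# distutils: language = c++
from libcpp.complex cimport complex as cpp_complex

cpdef double magnitude_squared(cpp_complex[double] z):
    # Python callers can pass a plain Python complex; Cython coerces it to
    # std::complex<double> at the boundary, while Cython callers pay no
    # Python-object overhead.
    return z.real() * z.real() + z.imag() * z.imag()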
I'm writing a Python wrapper for a C class, and I'm allocating memory using PyMem_Malloc as explained here:
from cpython.mem cimport PyMem_Malloc

cdef class SomeMemory:
    cdef double* data

    def __cinit__(self, size_t number):
        # allocate some memory (uninitialised, may contain arbitrary data)
        self.data = <double*> PyMem_Malloc(number * sizeof(double))
        if not self.data:
            raise MemoryError()
Then I import and use the class in another script using:
from SomeMemory import SomeMemory

sm = SomeMemory(10)
I now would like to access the elements of sm.data, but I encounter two problems:
if I type sm.data and hit enter, the IPython kernel crashes;
if I try to loop over sm.data, like:

for p in sm.data:
    print(p)

I get an error that sm.data is not iterable.
How can I access sm.data? Do I need to cast the elements first?
(Heavily edited in light of the updated question)
So, my understanding is that data needs to be declared as public to access it from Python:

cdef class SomeMemory:
    cdef public double* data
    # also make sure you define __dealloc__ to free the data
However, that doesn't seem to be enforced here.
However, your real problem is that Python doesn't know what to do with an object of type double*. It's certainly never going to be iterable, because the information about when to stop is simply not stored (so iteration would just run off the end).
There are a range of better alternatives:
You store your data as a Python array (see http://docs.cython.org/src/tutorial/array.html for a guide). You have quick access to the raw C pointer from within Cython if you want, but Python also knows what to do with it.
Code follows
from cpython cimport array as c_array
from array import array

cdef class SomeMemory:
    cdef c_array.array data

    def __cinit__(self, size_t number):
        self.data = array('d', [0.0] * number)  # some initial value goes here
You store your data as a numpy array. This is possibly slightly more complicated, but has much the same advantages. I won't give an example, but it's easier to find.
You use Cython's typed memoryviews: http://docs.cython.org/src/userguide/memoryviews.html. The advantage of this is that it lets you control the memory management yourself (if that's absolutely important to you). However, I'm not 100% sure that you can access them cleanly in Python (too lazy to test!); a rough sketch of this option follows after this list.
You wrap the data in your own class implementing __getitem__ and __len__ so it can be used as an iterator. This is likely more trouble than it's worth.
I recommend the first option, unless you have good reason for one of the others.
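For the memoryview option, something along these lines should work: an untested sketch that keeps the PyMem_Malloc allocation from the question but exposes the buffer through a typed memoryview that Python can index and iterate (the as_view method name is made up):

from cpython.mem cimport PyMem_Malloc, PyMem_Free

cdef class SomeMemory:
    cdef double* data
    cdef size_t number

    def __cinit__(self, size_t number):
        self.number = number
        self.data = <double*> PyMem_Malloc(number * sizeof(double))
        if not self.data:
            raise MemoryError()

    def __dealloc__(self):
        PyMem_Free(self.data)

    def as_view(self):
        # Typed memoryview over the raw buffer; Python can index, slice and
        # iterate it, e.g. list(sm.as_view()) or sm.as_view()[3].
        cdef double[::1] view = <double[:self.number]> self.data
        return view

From Python, sm = SomeMemory(10) followed by list(sm.as_view()) would then iterate the (uninitialised) values without crashing.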
We have some code written in python that uses a few classes that are really just "structs" -- instances of these classes just have a bunch of fields in them and no methods. Example:
class ResProperties:
    def __init__(self):
        self.endDayUtilities = 0
        self.marginalUtilities = []
        self.held = 0
        self.idleResource = True
        self.experience = 0.0
        self.resSetAside = 0
        self.unitsGatheredToday = 0
Our main code uses a bunch of instances of this class.
To speed up the code, I thought I'd Cythonize this class:
cdef class ResProperties:
    cdef public float endDayUtilities
    cdef public list marginalUtilities
    cdef public int held
    cdef public int idleResource
    cdef public float experience
    cdef public int resSetAside
    cdef public int unitsGatheredToday

    def __init__(self):
        self.endDayUtilities = 0
        # etc: code just like above.
However, the result is that the code runs something like 25% slower now!
How do I figure out what is causing the code to run slower now?
Thanks.
You converted these classes to Cython but are still using them from Python code?
Conversion of data from C to Python and back is going to incur overhead. For example, your endDayUtilities member is a C-style float. When you access it from Python, a float() object must be constructed before your Python code can do anything with it. The same thing has to happen in reverse when you assign to that attribute from Python.
Off the top of my head, I'd estimate the performance overhead of these data conversions at... oh, about 25%. :-)
You're not going to see a performance boost until you move some of your code that uses that data to Cython. Basically, the more you can stay in C-land, the better you'll do. Going back and forth will kill you.
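For example, if the hot loop that reads those fields is itself compiled, the attribute accesses stay as plain C reads. A hedged sketch (the function is illustrative and assumes it lives in the same .pyx file as the cdef class):

def total_end_day_utility(list props):
    cdef double total = 0.0
    cdef ResProperties p
    # p is typed, so p.endDayUtilities is read as a C float with no
    # intermediate Python float object being created.
    for p in props:
        total += p.endDayUtilities
    return total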
As another, simpler approach, you might want to try Psyco or PyPy instead of Cython.