I'm trying to implement a simple piece of code in Cython in a Jupyter notebook (I use Python 2), using GMP arithmetic to handle very large integers. I'm not a GMP/Cython expert. My question is: how do I print the value a in the function fib()?
The following code returns {}.
As far as I can understand, it has to do with stdout. For instance, I tried gmp_printf and it didn't work.
%%cython --link-args=-lgmp
from libc.stdio cimport FILE
cdef extern from "gmp.h":
    ctypedef struct mpz_t:
        pass
    cdef void mpz_init(mpz_t)
    cdef void mpz_init_set_ui(mpz_t, unsigned int)
    cdef void mpz_add(mpz_t, mpz_t, mpz_t)
    cdef void mpz_sub(mpz_t, mpz_t, mpz_t)
    cdef void mpz_add_ui(mpz_t, const mpz_t, unsigned long int)
    cdef void mpz_set(mpz_t, mpz_t)
    cdef void mpz_clear(mpz_t)
    cdef unsigned long int mpz_get_ui(mpz_t)
    cdef void mpz_set_ui(mpz_t, unsigned long int)
    cdef int gmp_printf(const char*, ...)
    cdef size_t mpz_out_str(FILE*, int, const mpz_t)
def fib(unsigned long int n):
    cdef mpz_t a, b
    mpz_init(a)
    mpz_init(b)
    mpz_init_set_ui(a, 1)
    mpz_init_set_ui(b, 1)
    cdef int i
    for i in range(n):
        mpz_add(a, a, b)
        mpz_sub(b, a, b)
    return a
And the result
fib(10)
{}
If I use return mpz_get_ui(a) instead of return a,
the code works fine, but that is not what I really want: mpz_get_ui only gives an unsigned long, and I want the full (long) integer.
EDIT.
I compared the previous code with another Cython version that does not use mpz:
%%cython
def pyfib(unsigned long int n):
    a, b = 1, 1
    for i in range(n):
        a = a + b
        b = a - b
    return a
and finally the same code but using mpz from gmpy2
%%cython
import gmpy2
from gmpy2 import mpz
def pyfib_with_gmpy2(unsigned long int n):
    cdef int i
    a, b = mpz(1), mpz(1)
    for i in range(n):
        a = a + b
        b = a - b
    return a
Then
timeit fib(700000)
1 loops, best of 3: 3.19 s per loop
and
timeit pyfib(700000)
1 loops, best of 3: 11 s per loop
and
timeit pyfib_with_gmpy2(700000)
1 loops, best of 3: 3.28 s per loop
(Answer mostly summarises a bunch of comments)
The immediate issue you were having was that Python has no real way to handle C structs. To get around this, Cython tries to convert structs to dictionaries when they are passed to Python (if possible). In this particular case, mpz_t is treated as "opaque" by C (and thus Cython) so you aren't supposed to know about its members. Therefore Cython "helpfully" converts it to an empty dictionary (a correct representation of all the members it knows about).
For an immediate fix I suggested using the gmpy library, which is an existing Python/Cython wrapping of GMP. This is probably a better choice than repeating the effort to wrap it.
As a general solution to this sort of problem there are two obvious options.
You could create a cdef wrapper class. The documentation I have linked is for C++, but the idea applies to C as well (with new/del replaced by malloc/free). This is ultimately a Python class (so it can be returned from Cython to Python), but it contains a C struct, which you can manipulate directly in Cython. The approach is well documented and doesn't need repeating here.
You could convert the mpz_t back to a Python integer at the end of the function. I feel this makes most sense, since ultimately they represent the same thing. The code shown below is a rough outline and hasn't been tested (I don't have gmp installed):
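cdef extern from "gmp.h":
    # mpz_t as declared in the question; the real width of mp_limb_t comes from gmp.h
    ctypedef unsigned long mp_limb_t
    const mp_limb_t* mpz_limbs_read(const mpz_t x)  # requires GMP 6 or later
    size_t mpz_size(const mpz_t x)
    int mpz_sgn(const mpz_t x)
cdef mpz_to_py_int(mpz_t x):
    # the limbs hold |x|, least significant limb first
    cdef const mp_limb_t* x_data = mpz_limbs_read(x)
    # mpz_size() counts limbs, so multiply by the limb size to get bytes
    cdef size_t nbytes = mpz_size(x) * sizeof(mp_limb_t)
    # slicing a char* with an explicit length copies the bytes into a bytes object
    cdef bytes x_bytes = (<char*>x_data)[:nbytes]
    # least significant limb first means the buffer is little-endian as a whole,
    # assuming a little-endian machine (int.from_bytes needs Python 3.2+)
    out = int.from_bytes(x_bytes, 'little')
    # correct using sign (the limbs only store the magnitude)
    if mpz_sgn(x) < 0:
        return -out
    else:
        return out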
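The byte order passed to int.from_bytes has to match how GMP lays out the limbs: least significant limb first, each limb in the machine's native byte order. The pure-Python sketch below mimics that layout so the conversion can be checked without GMP installed. It assumes 64-bit limbs on a little-endian machine, and `int_to_limbs`/`limbs_to_int` are hypothetical helpers for illustration, not part of GMP or gmpy2.

```python
import struct

LIMB_BITS = 64  # assumption: 64-bit mp_limb_t, as on typical x86-64 builds


def int_to_limbs(n):
    """Split |n| into 64-bit limbs, least significant limb first (GMP's order)."""
    n = abs(n)
    limbs = []
    while n:
        limbs.append(n & ((1 << LIMB_BITS) - 1))
        n >>= LIMB_BITS
    return limbs


def limbs_to_int(limbs, sign):
    """Rebuild an int from raw limb memory, then apply the sign separately."""
    # '<' = little-endian, 'Q' = unsigned 64-bit: this reproduces the raw
    # memory of the limb array on a little-endian machine
    raw = struct.pack('<%dQ' % len(limbs), *limbs)
    # len(raw) == len(limbs) * 8, i.e. number of limbs times the limb size
    out = int.from_bytes(raw, 'little')
    return -out if sign < 0 else out
```

Reading the same bytes with 'big' instead gives a different number (a single limb holding 1 comes back as 1 << 56), which is why the byte order must follow the limb layout.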
Say we have a class in cython that wraps (via a pointer) a C++ class with unknown/variable size in memory:
// poly.h
#include <vector>
class Poly {
public:
    std::vector<int> v;
    // [...] Methods to initialize/add/multiply/... coefficients [...] e.g.,
    Poly(int len, int val) { for (int i = 0; i < len; i++) { v.push_back(val); } }
    int size() const { return v.size(); }
    void add(const Poly& p) { for (size_t i = 0; i < v.size(); i++) { v[i] += p.v[i]; } }
};
We can conveniently expose operations like add in PyPoly using operator overloads (e.g., __add__/__iadd__):
# pywrapper.pyx
# distutils: language = c++
cdef extern from "poly.h":
    cdef cppclass Poly:
        Poly(int len, int val)
        int size()
        void add(Poly& p)
cdef class PyPoly:
    cdef Poly* c_poly
    def __cinit__(self, int l, int val):
        self.c_poly = new Poly(l, val)
    def __dealloc__(self):
        del self.c_poly
    def __add__(PyPoly self, PyPoly other):
        cdef PyPoly new_poly = PyPoly(self.c_poly.size(), 0)
        # add() takes a Poly&, so dereference the pointers with [0]
        new_poly.c_poly.add(self.c_poly[0])
        new_poly.c_poly.add(other.c_poly[0])
        return new_poly
How to create an efficient 1D numpy array with this cdef class?
The naive way I'm using so far involves a np.ndarray of type object, which benefits from the existing operator overloads:
import numpy as np
pypoly_arr = np.array([PyPoly(10, val) for val in range(10)])
pypoly_sum = np.sum(pypoly_arr) # Works thanks to implemented PyPoly.__add__
However, the above solution has to go through python code to understand the data type and the proper way to deal with __add__, which becomes quite cumbersome for big array sizes.
Inspired by https://stackoverflow.com/a/45150611/9670056, I tried writing an array wrapper of my own, but I'm not sure how to create a vector[PyPoly], or whether I should instead just hold a vector of borrowed references, vector[Poly*], so that the call to np.sum could be handled (and parallelized) at the C++ level.
Any help/suggestions will be highly appreciated! (especially on reworking the question/examples to make them as generic and runnable as possible)
This is not possible in Cython. NumPy does not support native Cython classes as a data type. The reason is that NumPy is written in C and is already compiled when your Cython code is compiled, so NumPy cannot use your native type directly. It has to go through an indirection, and this indirection is made possible by the CPython object type, which has the downside of being slow (mainly because of the indirection itself, but also a bit because of CPython interpreter overheads). Cython does not reimplement NumPy primitives, as that would be a huge amount of work. NumPy only supports a restricted, predefined set of data types. It does support custom user types, but such types are not as powerful as CPython classes (e.g. you cannot reimplement custom operators on items as you did).
Just-in-time (JIT) compiler modules like Numba can theoretically support this, because they reimplement NumPy and generate code at runtime. However, the support of JIT classes in Numba is experimental and, AFAIK, arrays of JIT classes are not yet supported.
Note that you do not need to build an array in this case. A basic loop is faster and uses less memory. Something (untested) like:
cdef int val
cdef PyPoly pypoly_sum
pypoly_sum = PyPoly(10, 0)
for val in range(1, 10):
    pypoly_sum += PyPoly(10, val)
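As a rough pure-Python illustration of the loop-instead-of-array idea, the sketch below uses a hypothetical `Poly` stand-in (a list of coefficients with an `__add__`) rather than the Cython `PyPoly`. It only shows that accumulating into one running sum needs no intermediate array of objects.

```python
class Poly:
    """Hypothetical pure-Python stand-in for PyPoly: fixed-length coefficients."""

    def __init__(self, length, val):
        self.v = [val] * length

    def __add__(self, other):
        out = Poly(len(self.v), 0)
        out.v = [a + b for a, b in zip(self.v, other.v)]
        return out


def sum_polys(length, n):
    """Sum Poly(length, 0) + Poly(length, 1) + ... + Poly(length, n - 1)."""
    total = Poly(length, 0)
    for val in range(1, n):
        # `+=` falls back to __add__ because no __iadd__ is defined
        total += Poly(length, val)
    return total
```

With length 10 and n 10, every coefficient of the result is 0 + 1 + ... + 9 = 45. Only one temporary object is alive per iteration, instead of a whole object array.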
I'd like to use the C++ way of reading files in Cython.
I have a simple file reader that looks like this:
std::ifstream file(fileName);
while(file >> chromosome >> start >> end >> junk >> junk >> strand)
{ ... }
Can I do this in Cython?
Probably better options would be to use Python parsing functionality (for example pandas' or NumPy's) or, if that isn't flexible enough, to write the reader in pure C++ and then call it from Cython.
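For comparison, a minimal pure-Python reader for the six whitespace-separated columns of the question (chromosome, start, end, two junk fields, strand) might look like the sketch below. The function name `read_records` and the choice of int for start/end are assumptions. Unlike the C++ loop, which stops at the first line that fails to parse, this version skips blank or short lines.

```python
def read_records(f):
    """Yield (chromosome, start, end, strand) from whitespace-separated lines."""
    for line in f:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or incomplete lines
        chromosome, start, end, _junk1, _junk2, strand = fields[:6]
        yield chromosome, int(start), int(end), strand
```

Any iterable of lines works, so `with open("my_test_file.txt") as f: records = list(read_records(f))` would read a file.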
However, your approach is also possible in Cython, but to make it work one needs to jump through some hoops:
The whole iostream hierarchy isn't part of the provided libcpp wrappers, so one has to wrap it oneself (done quick and dirty, that is just a few lines).
Because the wrapper below only declares the ifstream(const char*) constructor and no default constructor, we cannot create it as an object with automatic lifetime in Cython and need to take care of memory management ourselves.
Another issue is wrapping user-defined conversions. It is not very well described in the documentation (see this SO-question), but only operator bool() is supported, so we need to use C++11 (before C++11 the stream's conversion is operator void*() const).
So here is a quick&dirty proof of concept:
%%cython -+ -c=-std=c++11
cdef extern from "<fstream>" namespace "std" nogil:
    cdef cppclass ifstream:
        # constructor
        ifstream (const char* filename)
        # needed operator>> overloads:
        ifstream& operator>> (int& val)
        # others are
        # ifstream& operator>> (unsigned int& val)
        # ifstream& operator>> (long& val)
        # ...
        bint operator bool() # is needed,
                             # so while(file) can be evaluated
def read_with_cpp(filename):
    cdef int a=0, b=0
    cdef ifstream* foo = new ifstream(filename)
    try:
        while (foo[0] >> a >> b):
            print(a, b)
    finally: # don't forget to call destructor!
        del foo
Actually, the return type of operator>>(...) is not std::ifstream& but std::basic_istream& (i.e. std::istream&); I'm just too lazy to wrap that as well.
And now:
>>> read_with_cpp(b"my_test_file.txt")
prints the content of the file to console.
However, as stated above, I would go for writing the parsing in pure C++ and consuming it from Cython (e.g. by passing a function pointer, so the C++ code can use Python functionality). Here is a possible implementation:
%%cython -+
cdef extern from *:
    """
    #include <fstream>
    void read_file(const char* file_name, void(*line_callback)(int, int)){
        std::ifstream file(file_name);
        int a,b;
        while(file>>a>>b){
            line_callback(a,b);
        }
    }
    """
    ctypedef void(*line_callback_type)(int, int)
    void read_file(const char* file_name, line_callback_type line_callback)
# use function pointer to get access to Python functionality in cpp-code:
cdef void line_callback(int a, int b):
    print(a, b)
# expose functionality to pure Python:
def read_with_cpp2(filename):
    read_file(filename, line_callback)
and now calling read_with_cpp2(b"my_test_file.txt") leads to the same result as above.
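Outside Cython, the same `void(*)(int, int)` callback type can be built from an ordinary Python function with the standard-library ctypes module. This sketch only illustrates what `line_callback_type` represents; it does not call the C++ `read_file` above, which would first have to be compiled into a shared library. The names `LineCallback`, `on_line` and `collected` are hypothetical.

```python
import ctypes

# C type void(*)(int, int), matching line_callback_type in the Cython code
LineCallback = ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_int)

collected = []


def on_line(a, b):
    # ctypes converts the C ints into Python ints before calling this function
    collected.append((a, b))


# Keep a reference to the wrapper: if it is garbage-collected while C code
# still holds the pointer, calling it would crash
c_on_line = LineCallback(on_line)
```

The Cython version avoids the ctypes conversion layer because `line_callback` is compiled directly as a C function.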
A C++ header (some.h) contains:
#define NAME_SIZE 42
struct s_X{
    char name[NAME_SIZE + 1];
} X;
I want to use the X structure in Python. How can I do that?
I write:
cdef extern from "some.h":
    cdef int NAME_SIZE # 42
    ctypedef struct X:
        char name[NAME_SIZE + 1]
And got an error: Not allowed in a constant expression
It often doesn't really matter what you tell Cython when declaring types: it uses the information to check that you aren't doing anything obviously wrong with type casting, and that's it. The cdef extern from "some.h" statement ensures that some.h is included in the C file Cython creates, and ultimately that determines what is compiled.
Therefore, in this particular case, you can just insert an arbitrary number and it will work fine:
cdef extern from "some.h":
    cdef int NAME_SIZE # 42
    ctypedef struct X:
        char name[2] # you can pick a number at random here
There are situations where it won't work, though, especially where Cython has to actually use the number in the C code it generates. For example:
def some_function():
    cdef char char_array[NAME_SIZE+1] # won't work! Cython needs to know NAME_SIZE to generate the C code...
    # other code follows
(I don't currently have a suggestion as to what to do in this case)
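NAME_SIZE is a preprocessor macro, so it doesn't actually exist in your compiled program; you'll probably have to hardcode it in the Python.
Despite how it looks in your C source code, you effectively hardcoded it in the C array declaration, too (the preprocessor replaces NAME_SIZE with 42 before compilation).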
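If the goal is simply to describe the struct from Python (rather than Cython), the standard-library ctypes module can mirror it once NAME_SIZE is hardcoded, as suggested above. In this sketch the value 42 is copied by hand from some.h and must be kept in sync manually; the class name `S_X` is hypothetical.

```python
import ctypes

NAME_SIZE = 42  # hardcoded copy of the #define in some.h


class S_X(ctypes.Structure):
    # mirrors: struct s_X { char name[NAME_SIZE + 1]; };
    _fields_ = [("name", ctypes.c_char * (NAME_SIZE + 1))]


x = S_X()
x.name = b"example"  # assigning bytes copies them into the char array
```

Reading `x.name` back returns the bytes up to the first NUL, so it behaves like the C string stored in the array.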
I'm working with a C library that repeatedly calls a user-supplied function pointer to get more data. I'd like to write a Cython wrapper in such a way that the Python implementation of that callback can return any reasonable data type like str, bytearray, memory-mapped files, and so on (specifically, anything that supports the buffer interface). What I have so far is:
from cpython.buffer cimport PyBUF_SIMPLE
from cpython.buffer cimport Py_buffer
from cpython.buffer cimport PyObject_CheckBuffer
from cpython.buffer cimport PyObject_GetBuffer
from cpython.buffer cimport PyBuffer_Release
from libc.stdint cimport uint16_t, uint32_t
from libc.string cimport memmove
cdef class _callback:
    cdef public object callback
    cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
        uint32_t wantlen, unsigned char * data,
        uint32_t * gotlen):
    cdef Py_buffer gotdata
    box = <_callback> userdata
    gotdata_object = box.callback(box.data, wantlen)
    if not PyObject_CheckBuffer(gotdata_object):
        # sulk
        return 1
    # acquire the buffer before the try, so that finally only releases
    # a buffer that was actually obtained
    PyObject_GetBuffer(gotdata_object, &gotdata, PyBUF_SIMPLE)
    try:
        if not (0 < gotdata.len <= wantlen):
            # sulk
            return 1
        memmove(data, gotdata.buf, gotdata.len)
        # assumed: gotlen reports how many bytes were written
        gotlen[0] = gotdata.len
        return 0
    finally:
        PyBuffer_Release(&gotdata)
The code I want to write would produce equivalent C code, but look like this:
from somewhere cimport something
from libc.string cimport memmove
cdef class _callback:
    cdef public object callback
    cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
        uint32_t wantlen, unsigned char * data,
        uint32_t * gotlen):
    cdef something gotdata
    box = <_callback> userdata
    gotdata = box.callback(box.data, wantlen)
    if not (0 < gotdata.len <= wantlen):
        # sulk
        return 1
    memmove(data, gotdata.buf, gotdata.len)
    return 0
The generated C code looks like what I think it should be doing; but this seems like digging around in the Python API unnecessarily. Does Cython provide a nicer syntax to achieve this effect?
If you want to support everything that implements every variation of the new-style or old-style buffer interface, then you have to use the C API.
But if you don't care about old-style buffers, you can almost always use a memoryview:
Cython memoryviews support nearly all objects exporting the interface of Python new style buffers. This is the buffer interface described in PEP 3118. NumPy arrays support this interface, as do Cython arrays. The “nearly all” is because the Python buffer interface allows the elements in the data array to themselves be pointers; Cython memoryviews do not yet support this.
This of course includes str (or, in 3.x, bytes), bytearray, etc. If you followed the link, you may notice that it points to the same page that you linked to explain what you want to support.
For 1D arrays of characters (like str), it's:
cdef const unsigned char[:] gotdata  # const (Cython 0.28+) so read-only exporters like bytes are accepted
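The same idea can be tried from plain Python, where memoryview accepts the same new-style buffer exporters (bytes, bytearray, mmap objects, arrays, ...). The sketch below assumes a fixed-size bytearray standing in for the C library's `data` pointer; `copy_into` is a hypothetical helper that mirrors the checks in GetDataCallback.

```python
def copy_into(dest, obj, wantlen):
    """Copy obj's bytes into dest; return (status, gotlen) like the C callback."""
    try:
        # memoryview() raises TypeError if obj doesn't export a buffer;
        # cast('B') flattens any C-contiguous buffer to unsigned bytes
        view = memoryview(obj).cast('B')
    except TypeError:
        return 1, 0
    n = view.nbytes
    if not (0 < n <= wantlen):
        return 1, 0
    dest[:n] = view  # copy, like memmove(data, gotdata.buf, gotdata.len)
    return 0, n
```

As in the Cython version, the caller never has to know which concrete type the callback returned, only that it exports a buffer.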
A few SciPy functions (like scipy.ndimage.interpolation.geometric_transform) can take pointers to C functions as arguments to avoid having to call a Python callable on each point of the input array.
In a nutshell:
Define a function called my_function somewhere in the C module
Return a PyCObject with the &my_function pointer and (optionally) a void* pointer to pass some global data around
The related API method is PyCObject_FromVoidPtrAndDesc, and you can read Extending ndimage in C to see it in action.
I am very interested in using Cython to keep my code more manageable, but I'm not sure how exactly I should create such an object. Any, well,... pointers?
Just do in Cython the same thing you would do in C, call PyCObject_FromVoidPtrAndDesc directly. Here is an example from your link ported to Cython:
###### example.pyx ######
from libc.stdlib cimport malloc, free
from cpython.cobject cimport PyCObject_FromVoidPtrAndDesc  # Python 2 only
cdef int _shift_function(int *output_coordinates, double* input_coordinates,
                         int output_rank, int input_rank, double *shift_data):
    cdef double shift = shift_data[0]
    cdef int ii
    for ii in range(input_rank):
        input_coordinates[ii] = output_coordinates[ii] - shift
    return 1
cdef void _shift_destructor(void* cobject, void *shift_data):
    free(shift_data)
def shift_function(double shift):
    """This is the function callable from python."""
    cdef double* shift_data = <double*>malloc(sizeof(shift))
    if shift_data == NULL:
        raise MemoryError()
    shift_data[0] = shift
    # PyCObject stores plain void* pointers, so cast the function pointer
    return PyCObject_FromVoidPtrAndDesc(<void*>&_shift_function,
                                        shift_data,
                                        &_shift_destructor)
Performance should be identical to pure C version.
Note that Cython requires the & operator to get a function's address. Also, Cython lacks the pointer dereference operator *; the indexing equivalent is used instead (*ptr -> ptr[0]).
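To check the arithmetic of `_shift_function` without compiling anything, here is a hypothetical pure-Python mirror of it. geometric_transform also accepts a Python mapping callable that receives the output coordinates as a tuple and returns the input coordinates; the answer is the output coordinate minus the shift. `make_shift_mapping` is an illustrative name only.

```python
def make_shift_mapping(shift):
    """Pure-Python equivalent of _shift_function, usable as a mapping callable."""
    def mapping(output_coordinates):
        # same arithmetic as the C callback: input = output - shift
        return tuple(c - shift for c in output_coordinates)
    return mapping
```

Passing `make_shift_mapping(0.5)` to scipy.ndimage.geometric_transform should give the same result, but it calls back into Python for every output point, which is exactly the overhead the C-level callback avoids.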
I think that is a bad idea. Cython was also created to avoid writing PyObjects by hand! Moreover, in this case, writing the code through Cython probably doesn't improve code maintainability...
Anyway, you can import the PyObject with
from cpython.ref cimport PyObject
in your Cython code.
UPDATE
from cpython cimport *
is safer.
Cheers,
Davide