C++ lib in Python: custom sorting method

I want to write a custom sorting method in C++ and import it in Python. I am not an expert in C++; here is my implementation of "sort_counting":
#include <iostream>
#include <time.h>
#include <climits>   // INT_MAX, INT_MIN
#include <cstring>   // memset
using namespace std;
const int MAX = 30;
class cSort
{
public:
    // Counting sort: sorts arr[0..len) in place.
    void sort( int* arr, int len )
    {
        int mi, mx, z = 0; findMinMax( arr, len, mi, mx );
        int nlen = ( mx - mi ) + 1; int* temp = new int[nlen];
        memset( temp, 0, nlen * sizeof( int ) );
        // Count how often each value occurs.
        for( int i = 0; i < len; i++ ) temp[arr[i] - mi]++;
        // Write the values back in ascending order.
        for( int i = mi; i <= mx; i++ )
        {
            while( temp[i - mi] )
            {
                arr[z++] = i;
                temp[i - mi]--;
            }
        }
        delete [] temp;
    }
private:
    void findMinMax( int* arr, int len, int& mi, int& mx )
    {
        mi = INT_MAX; mx = INT_MIN;   // INT_MIN so all-negative input works too
        for( int i = 0; i < len; i++ )
        {
            if( arr[i] > mx ) mx = arr[i];
            if( arr[i] < mi ) mi = arr[i];
        }
    }
};
int main( int* arr )
{
    cSort s;
    s.sort( arr, 100 );
    return *arr;
}
I then use it in Python:
from ctypes import cdll
lib = cdll.LoadLibrary('sort_counting.so')
result = lib.main([3,4,7,5,10,1])
Compilation succeeds.
How can I rewrite the C++ method so that it receives an array and returns a sorted array?

The error is quite clear: ctypes doesn't know how to convert a Python list into an int * to pass to your function. A Python integer is not a plain C int, and a list is not just an array.
There are limits to what ctypes can do. Converting a generic Python list to an array of ints is not something it does automatically.
This is explained in the ctypes documentation:
None, integers, bytes objects and (unicode) strings are the only
native Python objects that can directly be used as parameters in these
function calls. None is passed as a C NULL pointer, bytes objects and
strings are passed as pointer to the memory block that contains their
data (char * or wchar_t *). Python integers are passed as the
platforms default C int type, their value is masked to fit into the C
type.
If you want to pass an integer array, read the documentation section on arrays. Instead of a list, you have to create an array of ints using the ctypes data types and pass that in.
Note that the conversion must happen on the Python side; it doesn't matter what C++ code you write. The alternative is to skip ctypes and write an extension module against the Python C API, so that the C code does the conversion itself.
A simple example would be:
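from ctypes import cdll, c_int
lib = cdll.LoadLibrary('sort_counting.so')
data = [3,4,7,5,10,1]
arr_type = c_int * len(data)   # the C type int[6]
array = arr_type(*data)        # copy the list into a C array
# The array is sorted in place. Note that main() as posted always sorts
# 100 elements; the C++ side should take the real length as a parameter.
lib.main(array)
data_sorted = list(array)      # read the sorted values back from the array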
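As a minimal sketch of the conversion step on its own, the snippet below builds a C int array from a Python list and reads it back, without loading any shared library. The helper names to_c_array and from_c_array are hypothetical, and the in-place sort of a real C function is only simulated here by writing through the array.

```python
import ctypes

def to_c_array(values):
    """Copy a Python list of ints into a newly allocated C int array."""
    arr_type = ctypes.c_int * len(values)   # the type int[len(values)]
    return arr_type(*values)

def from_c_array(c_array):
    """Copy the contents of a ctypes array back into a Python list."""
    return list(c_array)

data = [3, 4, 7, 5, 10, 1]
c_data = to_c_array(data)

# A C function that sorts in place would mutate c_data directly.
# Here that effect is simulated by writing sorted values back item by item.
for i, v in enumerate(sorted(c_data)):
    c_data[i] = v

print(from_c_array(c_data))
```

With a real library you would typically also declare the signature, for example argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int] on the exported function, so that ctypes checks the arguments for you.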

Related

Writing my own ufunc; ufunc not supported for the input types

I'm trying to write a C function that converts a NumPy string array to a float array. How can I receive NumPy's array in C as const char *?
static void double_logitprod(char **args, npy_intp *dimensions,
                             npy_intp *steps, void *data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in1 = args[0], *in2 = args[1];
    char *out = args[2];
    npy_intp in1_step = steps[0];
    npy_intp in2_step = steps[1];
    npy_intp out_step = steps[2];
    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        char *tmp1 = *((char **) in1);
        double tmp2 = *((double *)in2);
        *((double *) out) = to_float(tmp1, tmp2);
        /*END main ufunc computation*/
        in1 += in1_step;
        in2 += in2_step;   /* advance the second input as well */
        out += out_step;
    }
}
/* This is a pointer to the above function. */
PyUFuncGenericFunction funcs[1] = {&double_logitprod};
/* These are the input and return dtypes of logit. */
static char types[3] = {NPY_STRING, NPY_DOUBLE,
                        NPY_DOUBLE};
How can I accept a NumPy string array in C? Using NPY_STRING or NPY_UNICODE gives an error.
The NumPy string array looks like this:
x = np.array(['1.0', '2.0', 'N/A'])

Same random numbers in C++ as computed by Python3 numpy.random.rand

I would like to duplicate in C++ the testing for some code that has already been implemented in Python3 which relies on numpy.random.rand and randn values and a specific seed (e.g., seed = 1).
I understand that Python's random implementation is based on a Mersenne twister. The C++ standard library also supplies this in std::mersenne_twister_engine.
The C++ version returns an unsigned int, whereas Python rand is a floating point value.
Is there a way to obtain the same values in C++ as are generated in Python, and to be sure that they are the same? And likewise for an array generated by randn?
You can do it this way for integer values (the Python snippet comes first, followed by the C++ equivalent):
import numpy as np
np.random.seed(12345)
print(np.random.randint(256**4, dtype='<u4', size=1)[0])
#include <iostream>
#include <random>
int main()
{
std::mt19937 e2(12345);
std::cout << e2() << std::endl;
}
The result of both snippets is 3992670690
By looking at the source code of rand, you can implement it in your C++ code this way (again Python first, then C++):
import numpy as np
np.random.seed(12345)
print(np.random.rand())
#include <iostream>
#include <iomanip>
#include <random>
int main()
{
std::mt19937 e2(12345);
int a = e2() >> 5;
int b = e2() >> 6;
double value = (a * 67108864.0 + b) / 9007199254740992.0;
std::cout << std::fixed << std::setprecision(16) << value << std::endl;
}
Both random values are 0.9296160928171479
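The arithmetic in the C++ snippet can be sketched on its own: the top 27 bits of one 32-bit output and the top 26 bits of the next are joined into a 53-bit integer, which is then scaled into [0, 1). The function name two_words_to_double is hypothetical; the constants are the ones used above.

```python
def two_words_to_double(w1, w2):
    """Combine two 32-bit Mersenne Twister outputs into a double in [0, 1)."""
    a = w1 >> 5            # keep the top 27 bits of the first word
    b = w2 >> 6            # keep the top 26 bits of the second word
    # 67108864 == 2**26 and 9007199254740992 == 2**53
    return (a * 67108864 + b) / 9007199254740992

# Smallest and largest possible inputs.
print(two_words_to_double(0, 0))
print(two_words_to_double(0xFFFFFFFF, 0xFFFFFFFF))
```

Because 27 + 26 = 53 bits is exactly the precision of a double's significand, every result is representable without rounding and the largest value stays strictly below 1.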
It would be convenient to use std::generate_canonical, but it converts the output of the Mersenne Twister to a double by a different method. The difference is likely because generate_canonical is optimized differently from the generator used in NumPy: judging by the source code, it avoids costly floating-point operations, especially multiplication and division. Its result also seems to be implementation-dependent, whereas NumPy produces the same result on all platforms.
double value = std::generate_canonical<double, std::numeric_limits<double>::digits>(e2);
This does not match: it produces 0.8901547132827379, which differs from the output of the Python code.
For completeness, and to avoid re-inventing the wheel, here is a C++ implementation of both numpy.random.rand and numpy.random.randn.
The header file:
#ifndef RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#define RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#include <cstdint>
#include <random>
#include <string>
#include "RandomNumGenerator.h"
//Uniform distribution - numpy.rand
class RandomNumGen_NumpyCompatible {
public:
RandomNumGen_NumpyCompatible();
RandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long); // NOTE!! Advances and discards twice as many values as passed in to keep tracking with Numpy order
std::uint_fast32_t operator()(); //Simply returns the next Mersenne value from the engine
double getDouble(); //Calculates the next uniformly random double as numpy.rand does
std::string getGeneratorType() const { return "RandomNumGen_NumpyCompatible"; }
private:
std::mt19937 m_mersenneEngine;
};
///////////////////
//Gaussian distribution - numpy.randn
class GaussianRandomNumGen_NumpyCompatible {
public:
GaussianRandomNumGen_NumpyCompatible();
GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long); // NOTE!! Advances and discards twice as many values as passed in to keep tracking with Numpy order
std::uint_fast32_t operator()(); //Simply returns the next Mersenne value from the engine
double getDouble(); //Calculates the next normally (Gaussian) distributed random double as numpy.randn does
std::string getGeneratorType() const { return "GaussianRandomNumGen_NumpyCompatible"; }
private:
bool m_haveNextVal;
double m_nextVal;
std::mt19937 m_mersenneEngine;
};
#endif
And the implementation:
#include <cmath> // sqrt, log
#include "RandomNumGen_NumpyCompatible.h"
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible()
{
}
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_mersenneEngine(seed)
{
}
void RandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
}
void RandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Advances and discards TWICE as many values to keep with Numpy order
m_mersenneEngine.discard(2*z);
}
std::uint_fast32_t RandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double RandomNumGen_NumpyCompatible::getDouble()
{
int a = m_mersenneEngine() >> 5;
int b = m_mersenneEngine() >> 6;
return (a * 67108864.0 + b) / 9007199254740992.0;
}
///////////////////
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible()
: m_haveNextVal(false)
{
}
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_haveNextVal(false), m_mersenneEngine(seed)
{
}
void GaussianRandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
m_haveNextVal = false; //Drop any value cached under the previous seed
}
void GaussianRandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Burn some CPU cycles here
for (unsigned long long i = 0; i < z; ++i)
getDouble();
}
std::uint_fast32_t GaussianRandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double GaussianRandomNumGen_NumpyCompatible::getDouble()
{
if (m_haveNextVal) {
m_haveNextVal = false;
return m_nextVal;
}
double f, x1, x2, r2;
do {
int a1 = m_mersenneEngine() >> 5;
int b1 = m_mersenneEngine() >> 6;
int a2 = m_mersenneEngine() >> 5;
int b2 = m_mersenneEngine() >> 6;
x1 = 2.0 * ((a1 * 67108864.0 + b1) / 9007199254740992.0) - 1.0;
x2 = 2.0 * ((a2 * 67108864.0 + b2) / 9007199254740992.0) - 1.0;
r2 = x1 * x1 + x2 * x2;
} while (r2 >= 1.0 || r2 == 0.0);
/* Box-Muller transform (polar form) */
f = sqrt(-2.0 * log(r2) / r2);
m_haveNextVal = true;
m_nextVal = f * x1;
return f * x2;
}
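The Gaussian generator above draws pairs of uniform values, rejects points outside the unit disc, and caches the second normal value for the next call. A minimal Python sketch of that control flow follows. The class name PolarGaussian is hypothetical, and it takes its uniform values from Python's random.Random, so it illustrates the method only and is not expected to reproduce NumPy's seeded sequence.

```python
import math
import random

class PolarGaussian:
    """Gaussian samples via the polar method, caching the second value."""

    def __init__(self, uniform):
        self._uniform = uniform      # callable returning a float in [0, 1)
        self._cached = None          # second value of the last generated pair

    def next(self):
        if self._cached is not None:
            value, self._cached = self._cached, None
            return value
        while True:
            x1 = 2.0 * self._uniform() - 1.0
            x2 = 2.0 * self._uniform() - 1.0
            r2 = x1 * x1 + x2 * x2
            if 0.0 < r2 < 1.0:       # reject points outside the unit disc
                break
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        self._cached = f * x1        # saved for the following call
        return f * x2

rng = random.Random(1)
gen = PolarGaussian(rng.random)
samples = [gen.next() for _ in range(4)]
print(samples)
```

Each accepted pair yields two samples, which is why every second call returns immediately from the cache.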
After doing a bit of testing, it does seem that the values are within a tolerance (see @fdermishin's comment below) when the C++ unsigned int is divided by the maximum value for an unsigned int, like this:
#include <limits>
...
std::mt19937 generator1(seed); // mt19937 is a standard mersenne_twister_engine
unsigned val1 = generator1();
std::cout << "Gen 1 random value: " << val1 << std::endl;
std::cout << "Normalized Gen 1: " << static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max() << std::endl;
However, Python's version seems to skip every other value.
Given the following two programs:
#!/usr/bin/env python3
import sys
import numpy as np
def main():
    np.random.seed(1)
    for i in range(0, 10):
        print(np.random.rand())
###########
# Call main and exit success
if __name__ == "__main__":
    main()
    sys.exit()
and
#include <cstdlib>
#include <iostream>
#include <random>
#include <limits>
int main()
{
unsigned seed = 1;
std::mt19937 generator1(seed); // mt19937 is a standard mersenne_twister_engine
for (unsigned i = 0; i < 10; ++i) {
unsigned val1 = generator1();
std::cout << "Normalized, #" << i << ": " << (static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max()) << std::endl;
}
return EXIT_SUCCESS;
}
the Python program prints:
0.417022004702574
0.7203244934421581
0.00011437481734488664
0.30233257263183977
0.14675589081711304
0.0923385947687978
0.1862602113776709
0.34556072704304774
0.39676747423066994
0.538816734003357
whereas the C++ program prints:
Normalized, #0: 0.417022
Normalized, #1: 0.997185
Normalized, #2: 0.720324
Normalized, #3: 0.932557
Normalized, #4: 0.000114381
Normalized, #5: 0.128124
Normalized, #6: 0.302333
Normalized, #7: 0.999041
Normalized, #8: 0.146756
Normalized, #9: 0.236089
I could easily skip every other value in the C++ version, which should give me numbers that match the Python version (within a tolerance). But why would Python's implementation seem to skip every other value, or where do these extra values in the C++ version come from?
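The values are consumed rather than skipped: as the rand code earlier in this thread shows, each double is built from two consecutive 32-bit outputs of the engine, so one call to rand advances the Mersenne Twister twice. The sketch below checks the same construction using Python's standard random module instead of NumPy. That random.random() combines two getrandbits(32) words in this way is an assumption about CPython's implementation.

```python
import random

rng = random.Random(1)
state = rng.getstate()

# Draw two raw 32-bit outputs from the Mersenne Twister.
w1 = rng.getrandbits(32)
w2 = rng.getrandbits(32)

# Rewind to the saved state and draw one double instead.
rng.setstate(state)
x = rng.random()

# Rebuild the double from the two words: 27 bits of w1, 26 bits of w2.
rebuilt = ((w1 >> 5) * 67108864 + (w2 >> 6)) / 9007199254740992
print(x == rebuilt)
```

Dividing a single 32-bit output by the maximum unsigned value only approximates every other double, which is why the odd-numbered C++ lines have no counterpart in the Python output.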

Swig and multidimensional arrays

I am using Swig to interface python with C code.
I want to call a C function that takes as its argument a struct containing an int** member:
typedef struct
{
    (...)
    int** my2Darray;
} myStruct;
void myCFunction( myStruct s );
I am struggling with multi dimensional arrays.
My code looks like this:
In the interface file, I am using carray like this:
%include carrays.i
%array_class( int, intArray );
%array_class( intArray, intArrayArray );
In Python, I have:
myStruct = myModule.myStruct()
var = myModule.intArrayArray(28)
for j in range(28):
    var1 = myModule.intArray(28)
    for i in range(28):
        var1[i] = (...) filling var1 (...)
    var[j] = var1
myStruct.my2Darray = var
myCFunction( myStruct )
I get an error on the line myStruct.my2Darray = var:
TypeError: in method 'maStruct_monTableau2D_set', argument 2 of type 'int **'
I have doubts about the line %array_class( intArray, intArrayArray ).
I tried using a typedef for int* to create my array like this:
%array_class( myTypeDef, intArrayArray );
But it didn't seem to work.
Do you know how to handle multidimensional arrays in Swig?
Thanks for your help.
Have you considered using numpy for this? I have used numpy with my SWIG-wrapped C++ project for 1D, 2D, and 3D arrays of double and std::complex elements with a lot of success.
You would need to get numpy.i and install numpy in your python environment.
Here is an example of how you would structure it:
.i file:
// Numpy Related Includes:
%{
#define SWIG_FILE_WITH_INIT
%}
// numpy arrays
%include "numpy.i"
%init %{
import_array(); // This is essential. We will get a crash in Python without it.
%}
// These names must exactly match the function declaration.
%apply (int* INPLACE_ARRAY2, int DIM1, int DIM2) \
{(int* npyArray2D, int npyLength1D, int npyLength2D)}
%include "yourheader.h"
%clear (int* npyArray2D, int npyLength1D, int npyLength2D);
.h file:
/// Get the data in a 2D Array.
void arrayFunction(int* npyArray2D, int npyLength1D, int npyLength2D);
.cpp file:
void arrayFunction(int* npyArray2D, int npyLength1D, int npyLength2D)
{
for(int i = 0; i < npyLength1D; ++i)
{
for(int j = 0; j < npyLength2D; ++j)
{
int nIndexJ = i * npyLength2D + j;
// operate on array
npyArray2D[nIndexJ];
}
}
}
.py file:
import numpy
def makeArray(rows, cols):
    # numpy.intc matches the C int expected by the typemap
    return numpy.zeros(shape=(rows, cols), dtype=numpy.intc)
arr2D = makeArray(28, 28)
myModule.arrayFunction(arr2D)
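The C++ loop above treats the 2D array as one contiguous row-major buffer, so element (i, j) sits at offset i * cols + j. A small sketch of that indexing rule, using a ctypes buffer in place of the NumPy array (flat_index is a hypothetical helper):

```python
import ctypes

rows, cols = 3, 4
flat = (ctypes.c_int * (rows * cols))()      # zero-initialised int[12]

def flat_index(i, j, n_cols):
    """Offset of element (i, j) in a row-major buffer with n_cols columns."""
    return i * n_cols + j

# Fill the buffer in the same order the C++ loop walks it.
for i in range(rows):
    for j in range(cols):
        flat[flat_index(i, j, cols)] = 10 * i + j

print(list(flat))
```

This is also why the array passed from Python should be C-contiguous and use an element type of the same size as C int.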
This is how I handled 2d arrays. The trick I used was to write some inline code to handle the creation and mutation of an array. Once that is done, I can use those functions to do my bidding.
Below is the sample code.
ddarray.i
%module ddarray
%inline %{
// Helper function to create a 2d array
int **int_array(int rows, int cols) {
    int i;
    int **arr = (int **)malloc(rows * sizeof(int *));
    for (i=0; i<rows; i++)
        arr[i] = (int *)malloc(cols * sizeof(int));
    return arr;
}
// Helper function to set one element
void setitem(int **array, int row, int col, int value) {
    array[row][col] = value;
}
%}
ddarray.c
int calculate(int **arr, int rows, int cols) {
int i, j, sum = 0, product;
for(i = 0; i < rows; i++) {
product = 1;
for(j = 0; j < cols; j++)
product *= arr[i][j];
sum += product;
}
return sum;
}
Sample Python script
import ddarray
a = ddarray.int_array(2, 3)
for i in range(2):
    for j in range(3):
        ddarray.setitem(a, i, j, i + 1)
print(ddarray.calculate(a, 2, 3))

Passing an array using Ctypes

So my python program is
from ctypes import *
import ctypes
number = [0,1,2]
testlib = cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_double
#create an array of size 3
testlib.init(3)
#Loop to fill the array
#use AccessArray to perform an action on the array
And the C part is
#include <stdio.h>
double init(int size){
double points[size];
return points[0];
}
double fillarray(double value, double location){
// i need to access
}
double AccessArray(double value, double location){
// i need to access the array that is filled in the previous function
}
So what I need to do is pass an array from the Python part to the C function, then somehow hand that array to another C function where I can access and process it.
I'm stuck because I can't figure out a way to share the array between functions in the C part.
Can someone show me how to do this?
You should try something like this (in your C code):
#include <stdio.h>
double points[1000]; // change 1000 to the maximum size you need
int sz = 0;
double init(int size){
    // verify that size <= the maximum size of the array
    for(int i=0;i<size;i++) {
        points[i] = 1; // change 1 to the initial value you need
    }
    sz = size;
    return points[0];
}
double fillarray(double value, double location){
    // first verify that 0 <= location < sz
    points[(int)location] = value;
    return value;
}
double AccessArray(double value, double location){
    // first verify that 0 <= location < sz
    return points[(int)location];
}
This is a very simple solution, but if you need to allocate an array of arbitrary size you should study the use of malloc.
Maybe something like this?
$ cat Makefile
go: a.out
./c-double
a.out: c.c
gcc -fpic -shared c.c -o a.out
$ cat c.c
#include <stdio.h>
#include <stdlib.h>
double *init(int size) {
    double *points;
    points = malloc(size * sizeof(double));
    return points;
}
void fill_array(double *points, int size) {
    int i;
    for (i=0; i < size; i++) {
        points[i] = (double) i;
    }
}
void access_array(double *points, int size) {
    // access the array that was filled by the previous function
    int i;
    for (i=0; i < size; i++) {
        printf("%d: %f\n", i, points[i]);
    }
}
$ cat c-double
#!/usr/local/cpython-3.3/bin/python
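import ctypes
testlib = ctypes.cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_void_p
# Declare the pointer arguments too; otherwise ctypes passes the address
# as a C int, which truncates it on 64-bit platforms.
testlib.fill_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.fill_array.restype = None
testlib.access_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.access_array.restype = None
#create an array of size 3
size = 3
double_array = testlib.init(size)
#Loop to fill the array
testlib.fill_array(double_array, size)
#use access_array to perform an action on the array
testlib.access_array(double_array, size)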
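An alternative that avoids malloc altogether, sketched under the assumption that the C functions take a double * and a length as in the answer above: allocate the buffer on the Python side with ctypes and hand C a pointer to it. No library is loaded here; fill_like_c is a hypothetical stand-in that writes through the pointer the way fill_array would.

```python
import ctypes

size = 3
points = (ctypes.c_double * size)()          # double points[3], owned by Python

# What a C function declared as f(double *points, int size) would receive.
ptr = ctypes.cast(points, ctypes.POINTER(ctypes.c_double))

def fill_like_c(p, n):
    """Stand-in for the C fill_array: write i into p[i] through the pointer."""
    for i in range(n):
        p[i] = float(i)

fill_like_c(ptr, size)
print(list(points))
```

With this layout the buffer lives as long as the Python object points is referenced, so nothing has to be freed on the C side.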

Returning numpy array from a C extension

For the sake of learning something new, I am currently trying to reimplement the numpy.mean() function in C. It should take a 3D array and return a 2D array with the mean of the elements along axis 0. I manage to calculate the mean of all values, but don't really know how I would return a new array to Python.
My code so far:
#include <Python.h>
#include <numpy/arrayobject.h>
// Actual magic here:
static PyObject*
myexts_std(PyObject *self, PyObject *args)
{
PyArrayObject *input=NULL;
int i, j, k, x, y, z, dims[2];
double out = 0.0;
if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &input))
return NULL;
x = input->dimensions[0];
y = input->dimensions[1];
z = input->dimensions[2];
for(k=0;k<z;k++){
for(j=0;j<y;j++){
for(i=0;i < x; i++){
out += *(double*)(input->data + i*input->strides[0]
+j*input->strides[1] + k*input->strides[2]);
}
}
}
out /= x*y*z;
return Py_BuildValue("f", out);
}
// Methods table - this defines the interface to python by mapping names to
// c-functions
static PyMethodDef myextsMethods[] = {
{"std", myexts_std, METH_VARARGS,
"Calculate the standard deviation pixelwise."},
{NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC initmyexts(void)
{
(void) Py_InitModule("myexts", myextsMethods);
import_array();
}
What I understand so far (and please correct me if I'm wrong) is that I need to create a new PyArrayObject, which will be my output (maybe with PyArray_FromDims?). Then I need an array of addresses to the memory of this array and fill it with data. How would I go about this?
EDIT:
After doing some more reading on pointers (here: http://pw1.netcom.com/~tjensen/ptr/pointers.htm), I achieved what I was aiming for. Now another question arises: where would I find the original implementation of numpy.mean()? I'd like to see why the Python operation is so much faster than my version. I assume it avoids the ugly looping.
Here is my solution:
static PyObject*
myexts_std(PyObject *self, PyObject *args)
{
PyArrayObject *input=NULL, *output=NULL; // will be pointer to actual numpy array ?
int i, j, k, x, y, z, dims[2]; // array dimensions ?
double *out = NULL;
if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &input))
return NULL;
x = input->dimensions[0];
y = dims[0] = input->dimensions[1];
z = dims[1] = input->dimensions[2];
output = (PyArrayObject *) PyArray_FromDims(2, dims, PyArray_DOUBLE);
for(k=0;k<z;k++){
for(j=0;j<y;j++){
out = (double *)(output->data + j*output->strides[0] + k*output->strides[1]);
*out = 0;
for(i=0;i < x; i++){
*out += *(double*)(input->data + i*input->strides[0] +j*input->strides[1] + k*input->strides[2]);
}
*out /= x;
}
}
return PyArray_Return(output);
}
The Numpy API has a function PyArray_Mean that accomplishes what you're trying to do without the "ugly looping" ;).
static PyObject *func1(PyObject *self, PyObject *args) {
PyArrayObject *X, *meanX;
int axis;
if (!PyArg_ParseTuple(args, "O!i", &PyArray_Type, &X, &axis))
return NULL;
meanX = (PyArrayObject *) PyArray_Mean(X, axis, NPY_DOUBLE, NULL);
return PyArray_Return(meanX);
}
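When testing an extension like this, a slow but obvious reference can help confirm the output. The sketch below computes the mean along the first axis of a nested list shaped [x][y][z], mirroring the loop structure of the C solution. The name mean_axis0 is hypothetical, and the result should correspond to what numpy.mean(a, axis=0) gives for the same data.

```python
def mean_axis0(volume):
    """Mean over the first axis of a nested list shaped [x][y][z]."""
    x = len(volume)
    y = len(volume[0])
    z = len(volume[0][0])
    result = [[0.0] * z for _ in range(y)]
    for j in range(y):
        for k in range(z):
            total = 0.0
            for i in range(x):
                total += volume[i][j][k]
            result[j][k] = total / x
    return result

volume = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
print(mean_axis0(volume))
```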
