creating numpy array in c extension segfaults

I'm just trying to start off by creating a numpy array before I even start to write my extension. Here is a super simple program:
#include <stdio.h>
#include <iostream>
#include "Python.h"
#include "numpy/npy_common.h"
#include "numpy/ndarrayobject.h"
#include "numpy/arrayobject.h"
int main(int argc, char * argv[])
{
    int n = 2;
    int nd = 1;
    npy_intp size = {1};
    PyObject* alpha = PyArray_SimpleNew(nd, &size, NPY_DOUBLE);
    return 0;
}
This program segfaults on the PyArray_SimpleNew call and I don't understand why. I'm trying to follow some previous questions (e.g. numpy array C api and C array to PyArray). What am I doing wrong?

A typical use of PyArray_SimpleNew looks like this:
int nd = 2;
npy_intp dims[] = {3,2};
PyObject *alpha = PyArray_SimpleNew(nd, dims, NPY_DOUBLE);
Note that the value of nd must not exceed the number of elements of array dims[].
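Also, the extension must call import_array() to set up the C API's function-pointer table before any PyArray_* call. In a standalone program like the one in the question, the interpreter must be initialized with Py_Initialize() before that. E.g. in Cython: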
import numpy as np
cimport numpy as np
np.import_array() # so numpy's C API won't segfault
cdef make_array():
    cdef np.npy_intp element_count = 100
    return np.PyArray_SimpleNew(1, &element_count, np.NPY_DOUBLE)

Related

PyImport_Import segmentation fault after reading in TSV with C++

I am using C++ as a wrapper around a Python module. First, I read in a TSV file, cast it as a numpy array, import my Python module, and then pass the numpy array to Python for further analysis. When I first wrote the program, I was testing everything using a randomly generated array, and it worked well. However, once I replaced the randomly generated array with the imported TSV array, I got a segmentation fault when I tried to import the Python module. Here is some of my code:
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#define PY_SSIZE_T_CLEAN
#include <python3.8/Python.h>
#include "./venv/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h"
#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <fstream>
#include <sstream>
int main(int argc, char* argv[]) {
    setenv("PYTHONPATH", ".", 0);
    Py_Initialize();
    import_array();
    static const int numberRows = 1000;
    static const int numberColumns = 500;
    npy_intp dims[2]{ numberRows, numberColumns };
    static const int numberDims = 2;
    double(*c_arr)[numberColumns]{ new double[numberRows][numberColumns] };
    // ***********************************************************
    // THIS PART OF THE CODE GENERATES A RANDOM ARRAY AND WORKS WITH THE REST OF THE CODE
    // // initialize random number generation
    // typedef std::mt19937 MyRNG;
    // std::random_device r;
    // MyRNG rng{r()};
    // std::lognormal_distribution<double> lognormalDistribution(1.6, 0.25);
    // //populate array
    // for (int i=0; i < numberRows; i++) {
    //     for (int j=0; j < numberColumns; j++) {
    //         c_arr[i][j] = lognormalDistribution(rng);
    //     }
    // }
    // ***********************************************************
    // ***********************************************************
    // THIS PART OF THE CODE INGESTS AN ARRAY FROM TSV AND CAUSES CODE TO FAIL AT PyImport_Import
    std::ifstream data("data.mat");
    std::string line;
    int row = 0;
    int column = 0;
    while (std::getline(data, line)) {
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, '\t')) {
            c_arr[row][column] = std::stod(cell);
            column++;
        }
        row++;
        column = 0;
        if (row > numberRows) {
            break;
        }
    }
    // ***********************************************************
    PyArrayObject *npArray = reinterpret_cast<PyArrayObject*>(
        PyArray_SimpleNewFromData(numberDims, dims, NPY_DOUBLE, reinterpret_cast<void*>(c_arr))
    );
    const char *moduleName = "cpp_test";
    PyObject *pname = PyUnicode_FromString(moduleName);
    // ***********************************************************
    // CODE FAILS HERE - SEGMENTATION FAULT
    PyObject *pyModule = PyImport_Import(pname);
    // .......
    // THERE IS MORE CODE BELOW NOT INCLUDED HERE
}
So I'm not sure why the code fails when I ingest data from a TSV file, but not when I use randomly generated data.
EDIT: (very stupid mistake incoming) I used the condition row > numberRows as the stopping condition in the while loop. That check only fires once row has already reached numberRows + 1, so if the file has more than numberRows lines, the loop writes one extra line into c_arr[numberRows], which is past the end of the array. Once I changed the condition to row == numberRows, everything worked. Who knew being specific about rows when building an array was so important? I'll leave this up as a testament to stupid programming mistakes, and maybe someone will learn a little something from it.
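The off-by-one in the EDIT can be traced without any C++ at all. The following is a minimal sketch, not the original program: rows_written is a hypothetical helper that mirrors the shape of the while loop and records which row index each iteration would write to, using a tiny 3-row array and a 10-line file as made-up sizes.

```python
def rows_written(total_lines, number_rows, should_stop):
    """Mirror the C++ loop: write a row, increment, then test the stop condition."""
    written = []
    row = 0
    for _ in range(total_lines):      # stands in for std::getline(data, line)
        written.append(row)           # stands in for c_arr[row][column] = ...
        row += 1
        if should_stop(row, number_rows):
            break
    return written


# Hypothetical sizes: an array with 3 rows and a file with 10 lines.
buggy = rows_written(10, 3, lambda row, n: row > n)
fixed = rows_written(10, 3, lambda row, n: row == n)

print(buggy)  # [0, 1, 2, 3]  -> index 3 is one past the end of a 3-row array
print(fixed)  # [0, 1, 2]
```

In C++ the extra write does not raise an error; it silently overwrites whatever sits after the array on the heap, which would explain why the crash only shows up later, at an unrelated call.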

Note that you don't have to use built-in arrays to store the values (such as doubles) in a 2D layout; you can use a dynamically sized container such as std::vector instead, as shown below. The advantage of std::vector is that you don't need to know the number of rows and columns in your input file (data.mat) beforehand, so you don't have to allocate memory for them up front. You can add the values dynamically.
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include<fstream>
int main() {
    std::string line;
    double word;
    std::ifstream inFile("data.mat");
    // create/use a std::vector instead of a built-in array
    std::vector<std::vector<double>> vec;
    if(inFile)
    {
        while(getline(inFile, line, '\n'))
        {
            // create a temporary vector that will contain all the columns
            std::vector<double> tempVec;
            std::istringstream ss(line);
            // read word by word (or double by double)
            while(ss >> word)
            {
                //std::cout<<"word:"<<word<<std::endl;
                // add the word to the temporary vector
                tempVec.push_back(word);
            }
            // now all the words from the current line have been added to the temporary vector
            vec.emplace_back(tempVec);
        }
    }
    else
    {
        std::cout<<"file cannot be opened"<<std::endl;
    }
    inFile.close();
    // let's check the elements of the 2D vector so that we can confirm it contains all the right elements (rows and columns)
    for(std::vector<double> &newvec: vec)
    {
        for(const double &elem: newvec)
        {
            std::cout<<elem<<" ";
        }
        std::cout<<std::endl;
    }
    return 0;
}
The output of the above program can be seen here. Since you didn't provide the data.mat file, I created an example data.mat file and used it in my program; it can be found at the above-mentioned link.
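One caveat if the parsed data is meant to go to NumPy afterwards: each row of a std::vector<std::vector<double>> is a separate allocation, while PyArray_SimpleNewFromData expects one contiguous buffer. The sketch below shows the same dynamic-read idea in Python with a single flat buffer plus a shape, which is roughly what the C++ side would need as well. read_matrix is a hypothetical helper and the sample text is made up; like ss >> word in the answer, it splits on any whitespace rather than on tabs only.

```python
import io
from array import array


def read_matrix(stream):
    """Read whitespace-separated doubles into one flat buffer plus a (rows, cols) shape."""
    flat = array("d")          # contiguous storage of C doubles
    n_rows = 0
    n_cols = None
    for line in stream:
        fields = line.split()
        if not fields:         # skip blank lines
            continue
        if n_cols is None:
            n_cols = len(fields)
        elif len(fields) != n_cols:
            raise ValueError(f"row {n_rows} has {len(fields)} columns, expected {n_cols}")
        flat.extend(float(field) for field in fields)
        n_rows += 1
    return flat, (n_rows, n_cols or 0)


# Made-up stand-in for data.mat.
sample = io.StringIO("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n")
flat, shape = read_matrix(sample)
print(shape, list(flat))  # (2, 3) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Rejecting ragged rows up front also removes the need for a hard-coded row limit, so the kind of bounds mistake described in the question cannot occur.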

Cython crashes Python kernel when running code that passes struct of pointers to c function

Getting memory allocation errors when running a compiled version of the following code. This is an application where a struct of pointers is defined and I would like to assign a value to the pointer and then pass this struct to c code. I have seen other examples and questions on this subject and I believe this is being done correctly, however still having issues.
The code will compile fine, however it crashes Python when running it. Debugging with Visual Studio, it is showing a memory access violation. I have researched this quite a bit but am unable to come up with a reason why this is happening. Was able to reproduce this on a different computer.
I believe it has something to do with how the struct test_M is being allocated and referenced on the stack. I've tried several different variations of defining the test_M.param.gain_val, the one shown does allow the code to compile fine and I can get the output to print on the screen. However, Python crashes immediately after this.
Unfortunately I cannot modify the C code, because this is the format of the code auto-generated by the Matlab/Simulink Embedded Coder.
Any help would be appreciated.
Using:
python = 3.6
cython = 0.26
numpy = 1.13.1
Visual Studio 2017 v15
ccodetest.c
#include <stdlib.h>
typedef struct P_T_ P_T;
typedef struct tag_T RT_MODEL_T;
struct P_T_ {
    double gain_val;
};
struct tag_T {
    P_T *param;
};
void compute(double array_in[4], RT_MODEL_T *const test_M)
{
    P_T *test_P = ((P_T *) test_M->param);
    int size;
    size = sizeof(array_in);
    int i;
    for (i=0; i<size; i++)
    {
        array_in[i] = array_in[i] * test_P->gain_val;
    }
}
cython_param_test.pyx
cimport cython
import numpy as np
cimport numpy as np
from cpython.mem cimport PyMem_Malloc, PyMem_Free
np.import_array()
cdef extern from "ccodetest.c":
    ctypedef tag_T RT_MODEL_T
    ctypedef P_T_ P_T
    cdef struct P_T_:
        double gain_val
    cdef struct tag_T:
        P_T *param
    void compute(double array_in[4], RT_MODEL_T *const test_M)
cdef double array_in[4]
def run(
        np.ndarray[np.double_t, ndim=1, mode='c'] x_in,
        np.ndarray[np.double_t, ndim=2, mode='c'] x_out,
        np.ndarray[np.double_t, ndim=1, mode='c'] gain):
    cdef RT_MODEL_T* test_M = <RT_MODEL_T*> PyMem_Malloc(sizeof(RT_MODEL_T))
    global array_in
    test_M.param.gain_val = <double>gain
    cdef int idx
    try:
        for idx in range(len(x_in)):
            array_in[idx] = x_in[idx]
        compute(array_in, test_M)
        for idx in range(len(x_in)):
            x_out[idx] = array_in[idx]
    finally:
        PyMem_Free(test_M)
    return None
setup.py
import numpy
# setup() is called below but was never imported in the original snippet;
# numpy.distutils.core is assumed here because the script uses numpy.distutils.
from numpy.distutils.core import setup
from Cython.Distutils import build_ext
def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('', parent_package, top_path)
    config.add_extension('cython_param_test',
                         sources=['cython_param_test.pyx'],
                         # libraries=['m'],
                         depends=['ccodetest.c'],
                         include_dirs=[numpy.get_include()])
    return config
if __name__ == '__main__':
    params = configuration(top_path='').todict()
    params['cmdclass'] = dict(build_ext=build_ext)
    setup(**params)
run_cython_param_test.py
import cython_param_test
import numpy as np
n_samples = 4
x_in = np.arange(n_samples, dtype='double') % 4
x_out = np.empty((n_samples, 1))
gain = np.ones(1, dtype='double') * 5
cython_param_test.run(x_in, x_out, gain)
print(x_out)

cdef RT_MODEL_T* test_M = <RT_MODEL_T*> PyMem_Malloc(sizeof(RT_MODEL_T))
You allocate space for an RT_MODEL_T. RT_MODEL_T has one member, a pointer to a P_T. Allocating space for the RT_MODEL_T only allocates space to store the pointer; it doesn't allocate a P_T for it to point to. Where param points is completely arbitrary at the moment, and it is most likely a memory address that you aren't allowed to write to.
test_M.param.gain_val = ...
You attempt to write to an element of the P_T pointed to by param. However, param does not point to an allocated P_T.
... = <double>gain
You attempt to cast a numpy array to a double. This does not make sense at all. You probably want to get the first element of the numpy array or you should pass gain as just a double rather than a numpy array of doubles?
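The three points above can be reproduced from plain Python with ctypes, which lays structs out the same way C does. This is only an illustration of the memory layout, not a replacement for the Cython code: the two classes mirror the struct definitions from ccodetest.c, and the list gain is a stand-in for the one-element numpy array.

```python
import ctypes


class P_T(ctypes.Structure):
    _fields_ = [("gain_val", ctypes.c_double)]


class RT_MODEL_T(ctypes.Structure):
    _fields_ = [("param", ctypes.POINTER(P_T))]


# Space for an RT_MODEL_T is space for one pointer, nothing more.
model_size = ctypes.sizeof(RT_MODEL_T)
pointer_size = ctypes.sizeof(ctypes.c_void_p)

# ctypes zero-fills new structs, so param is NULL here.
# PyMem_Malloc gives no such guarantee: param would hold arbitrary bits.
model = RT_MODEL_T()
param_is_null = not bool(model.param)

# Give param a real P_T to point at before writing through it.
params = P_T()
model.param = ctypes.pointer(params)

gain = [5.0]                              # stand-in for the 1-element numpy array
model.param.contents.gain_val = gain[0]   # take the element, not the container

print(model_size == pointer_size, param_is_null, params.gain_val)  # True True 5.0
```

Writing through model.param before the assignment to it would dereference a NULL pointer here; with uninitialized malloc'ed memory the target is unpredictable, which matches the access violation reported by the debugger.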
Since test_M and its contents don't need to live beyond the end of the function they're allocated in, I'd be tempted to allocate them on the stack instead, and that way you can completely avoid malloc and free:
cdef RT_MODEL_T test_M # not a pointer
cdef P_T p_t_instance
p_t_instance.gain_val = gain # or gain[0]?
test_M.param = &p_t_instance
# ...
compute(array_in, &test_M) # pass the address of `test_M`
Only do this if you are sure of the required lifetime of test_M and the P_T it holds a pointer to.
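Separately from the allocation problem, the loop bound in compute() may be worth a second look, assuming the C file really reads as posted. In C, a parameter declared as double array_in[4] is adjusted to double *, so sizeof(array_in) inside the function is the size of a pointer, not the element count and not the byte size of the array. The ctypes sketch below prints the relevant sizes on the current platform; FourDoubles is just an illustrative name.

```python
import ctypes

FourDoubles = ctypes.c_double * 4

# Size of an actual double[4] object, in bytes.
array_bytes = ctypes.sizeof(FourDoubles)

# Size of what the function parameter really is: a pointer to double.
pointer_bytes = ctypes.sizeof(ctypes.POINTER(ctypes.c_double))

# The usual way to get an element count from a real array.
element_count = array_bytes // ctypes.sizeof(ctypes.c_double)

print("bytes in double[4]:", array_bytes)
print("bytes in double*:  ", pointer_bytes)
print("elements:          ", element_count)
```

On a typical 64-bit build the pointer is 8 bytes, so the loop in compute() would run 8 times over a 4-element buffer. If the generated C code cannot be changed, making the Cython-side buffer at least that large is one possible workaround, though that is a guess about the generator's intent rather than a confirmed fix.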

Cython macro definition in structure

I'm using Cython to import a structure from C into Python, but the structure definition contains some macro definitions, including function-like macros. I just don't know how to declare the structure in Cython.
typedef struct _SparMat {
    int m, n;
    int *rvec;
    int *ridx;
    double *rval;
    int *cvec;
    int *cidx;
    double *cval;
    int nnz;
    int bufsz;
    int incsz;
    int flag;
#define MAT_ROWBASE_INDEX (0x00000001)
#define MAT_ROWBASE_VALUE (0x00000002)
#define MAT_COLBASE_INDEX (0x00000004)
#define MAT_COLBASE_VALUE (0x00000008)
#define CSR_INDEX(flag) ((flag) & MAT_ROWBASE_INDEX)
#define CSR_VALUE(flag) ((flag) & MAT_ROWBASE_VALUE)
#define CSC_INDEX(flag) ((flag) & MAT_COLBASE_INDEX)
#define CSC_VALUE(flag) ((flag) & MAT_COLBASE_VALUE)
} SparMat, * matptr;
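The #define lines are handled by the C preprocessor before the compiler sees the struct, so they add no members: the struct consists only of the fields m through flag. On the Cython side, the struct declaration would therefore list just those fields, and the macros could be declared separately in the same cdef extern block as integer constants and functions taking an int, which is the usual way to expose macros. What the macros compute is plain bit masking on flag. The sketch below mirrors them in Python to show the behaviour; the lower-case function names are illustrative stand-ins, not part of the C header.

```python
# Bit masks copied from the header.
MAT_ROWBASE_INDEX = 0x00000001
MAT_ROWBASE_VALUE = 0x00000002
MAT_COLBASE_INDEX = 0x00000004
MAT_COLBASE_VALUE = 0x00000008


def csr_index(flag):
    """Mirror of CSR_INDEX(flag): non-zero if the row-based index is present."""
    return flag & MAT_ROWBASE_INDEX


def csr_value(flag):
    """Mirror of CSR_VALUE(flag)."""
    return flag & MAT_ROWBASE_VALUE


def csc_index(flag):
    """Mirror of CSC_INDEX(flag)."""
    return flag & MAT_COLBASE_INDEX


def csc_value(flag):
    """Mirror of CSC_VALUE(flag)."""
    return flag & MAT_COLBASE_VALUE


# Example flag: row-based index and column-based values are set.
flag = MAT_ROWBASE_INDEX | MAT_COLBASE_VALUE
print(csr_index(flag), csr_value(flag), csc_index(flag), csc_value(flag))  # 1 0 0 8
```

Note that the macros return the masked bit itself (1, 2, 4 or 8), not a normalized 0/1, so callers should test the result for truthiness rather than compare it to 1.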

Serialize raw Image buffer (rgb pixels) in C and deserialize in Python

I want to serialize raw image data, i.e. a uint16 array, and send it over to Python using zmq. I considered using msgpack-c, but the only way I found was something like the approach given in "How do I unpack and extract data properly using msgpack-c?".
If I follow this approach, I have to pack each element of my C array separately, which will make it very slow.
Could someone please point me in the right direction?

You can send the uint16_t array from the C side as is, and use the ctypes module to access it in Python code.
Sending C code:
#include <stdint.h>
#include <stdio.h>
#include <zmq.h>
#define IMAGE_SIZE (256 * 256)
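unsigned checksum(uint16_t* data, int len) {
    unsigned s = 0;
    for (int i = 0; i < len; ++i) {
        s += data[i];
    }
    return s;
}
int main() {
    uint16_t image[IMAGE_SIZE];
    /* Fill with a test pattern; reading an uninitialized buffer is not meaningful. */
    for (int i = 0; i < IMAGE_SIZE; ++i) {
        image[i] = (uint16_t)i;
    }
    /* %u matches the unsigned return type of checksum() */
    printf("image checksum: %u\n", checksum(image, IMAGE_SIZE));
    void* context = zmq_ctx_new();
    void* push = zmq_socket(context, ZMQ_PUSH);
    zmq_connect(push, "tcp://127.0.0.1:5555");
    /* Send the raw bytes of the array in one message, no per-element packing */
    zmq_send(push, image, IMAGE_SIZE * sizeof(uint16_t), 0);
    zmq_close(push);
    zmq_ctx_destroy(context);
    return 0;
}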
Receiving Python code:
from ctypes import c_uint16
import zmq
IMAGE_SIZE = 256 * 256
Image = c_uint16 * IMAGE_SIZE # corresponds to uint16_t[IMAGE_SIZE]
context = zmq.Context(1)
pull = zmq.Socket(context, zmq.PULL)
pull.bind("tcp://127.0.0.1:5555")
message = pull.recv()
image = Image.from_buffer_copy(message)
# This should print the same number as the sending code
# Note that it is different from sum(message)
print(sum(image))
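As a possible alternative to ctypes on the receiving side, memoryview.cast can reinterpret the received bytes as unsigned 16-bit values without copying. The sketch below is self-contained and does not use zmq: message is a stand-in built with struct.pack, and the four pixel values are made up. It assumes sender and receiver share the same byte order, as the raw-bytes approach above does too. It also shows why sum(message) gives a different number: it adds individual bytes rather than 16-bit pixels.

```python
import struct

# Made-up pixel values standing in for the image; a real frame would come from pull.recv().
pixels = [1, 256, 65535, 42]
message = struct.pack("=4H", *pixels)   # native byte order, 2 bytes per value

# Reinterpret the bytes as native-endian uint16 without copying.
image = memoryview(message).cast("H")

print(list(image))    # [1, 256, 65535, 42]
print(sum(image))     # 65834: sum of the 16-bit pixels
print(sum(message))   # 554: sum of the individual bytes
```

If the two machines could differ in byte order, decoding with an explicit format such as struct.unpack("<4H", message) on an agreed little-endian wire format would be the safer choice.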

SWIG c++ vector access in python

This may be a noob question, but here it goes. I have wrapped a 3D vector into a Python module using SWIG. Everything compiles, and I can import the module and perform actions with it. However, I can't figure out how to access my vector in Python to store and change values in it. How do I store and change my vector values in Python? My code is below; it was written to test whether the STL algorithm header works with SWIG. It does seem to work, but I need to be able to put values into my vector from Python.
header.h
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED
#include <vector>
using namespace std;
struct myStruct{
    int vecd1, vecd2, vecd3;
    vector<vector<vector<double> > >vec3d;
    void vecSizer();
    void deleteDuplicates();
    double vecSize();
    void run();
};
#endif // HEADER_H_INCLUDED
main.cpp
#include "header.h"
#include <vector>
#include <algorithm>
void myStruct::vecSizer()
{
    vec3d.resize(vecd1);
    for(int i = 0; i < vec3d.size(); i++)
    {
        vec3d[i].resize(vecd2);
        for(int j = 0; j < vec3d[i].size(); j++)
        {
            vec3d[i][j].resize(vecd3);
        }
    }
}
void myStruct::deleteDuplicates()
{
    vector<vector<vector<double> > >::iterator it;
    sort(vec3d.begin(),vec3d.end());
    it = unique(vec3d.begin(),vec3d.end());
    vec3d.resize(distance(vec3d.begin(), it));
}
double myStruct::vecSize()
{
    return vec3d.size();
}
void myStruct::run()
{
    vecSizer();
    deleteDuplicates();
    vecSize();
}
From the terminal (Ubuntu):
>>> import test  # import the SWIG-generated module
>>> x = test.myStruct()  # create an instance of myStruct
>>> x.vecSize()  # run vecSize(); should be 0 since the vector dimensions are not initialized
0.0
>>> x.vec3d  # see if vec3d exists and is of the correct type
<Swig Object of type 'vector< vector< vector< double > > > *' at 0x7fe6a483c8d0>
Thanks in advance!

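It turns out that, as wrapped here, the vector is not exposed as a mutable Python object: x.vec3d comes back only as an opaque Swig Object pointer, so its values cannot be read or changed from Python directly. In short, you cannot modify the wrapped C++ vector from Python with this interface. SWIG's std_vector.i library, together with a %template declaration for each nesting level of the vector, is the documented way to get a list-like proxy class for a std::vector, so that may be worth trying; otherwise, adding getter and setter member functions to myStruct is a simple workaround.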
