Python/C API - The result is not displayed

I would like to integrate C modules into Python, so I chose the Python.h C API. Everything compiles without errors or warnings, so I cannot understand what the problem is.
C side:
#include <python3.5m/Python.h>
...
#define PyInt_AsLong(x) (PyLong_AsLong((x)))

typedef PyObject* Py;

static Py getSumma(Py self, Py args){
    Py nums;
    if (!PyArg_ParseTuple(args, "O", &nums)){
        return NULL;
    }
    size_t numsAmount = PyList_Size(args);
    int32_t summa = 0;
    for (size_t i = 0; i < numsAmount; i++){
        Py temp = PyList_GetItem(nums, i);
        int32_t num = PyInt_AsLong(temp);
        summa += num;
    }
    return Py_BuildValue("l", summa);
}

static PyMethodDef moduleMethods[] = {
    {"getSumma", (PyCFunction)getSumma, METH_VARARGS, NULL},
    {NULL, NULL, 0, NULL}
};

static PyModuleDef SummaLogic = {
    PyModuleDef_HEAD_INIT,
    "SummaLogic",
    "",
    -1,
    moduleMethods
};

PyMODINIT_FUNC PyInit_SummaLogic(void){
    return PyModule_Create(&SummaLogic);
}
setup.py:
from distutils.core import setup, Extension
SummaLogic = Extension("SummaLogic", sources=['SummaLogic.c'])
setup(ext_modules=[SummaLogic])
Python side:
from SummaLogic import getSumma

if __name__ == "__main__":
    a = [1, 2, 3]
    b = getSumma(a)
    print(b)
It seems right, but when I run it in the terminal, nothing happens; it just hangs without any activity. What could I be missing?

It boils down to PyList_Size and the fact that you don't check for errors there.
You probably wanted to call it on nums, not args. However, you called it on args, and a very interesting thing happened:
args is a tuple, not a list,
therefore PyList_Size failed and returned -1,
that -1 was then cast to an unsigned size_t, which results in a huge number, probably 2**64 - 1,
therefore your loop runs for a "very long time", because it takes quite a while to iterate over 2**64 - 1 items (apart from all the out-of-bounds memory accesses).
The quick fix would be to use:
Py_ssize_t listlength = PyList_Size(nums);  /* nums instead of args */
if (listlength == -1) {                     /* check for errors */
    return NULL;
}
size_t numsAmount = (size_t)listlength;     /* cast to size_t only after you checked for errors */
However, you should check what the error conditions are and test for them after every Python C API call, otherwise you'll run into a lot of undefined behaviour. I would also stick to the documented return types instead of int32_t (PyInt_AsLong returns long, so you might get weird casting errors there as well!), size_t, ..., and the typedef PyObject* Py; makes things really hard to follow for someone who regularly writes C extensions.
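Putting those points together, a corrected getSumma could look roughly like this (a minimal sketch only; PyLong_AsLong failures are detected via PyErr_Occurred(), everything else follows the question's code):
static PyObject *getSumma(PyObject *self, PyObject *args){
    PyObject *nums;
    if (!PyArg_ParseTuple(args, "O", &nums)){
        return NULL;
    }
    Py_ssize_t numsAmount = PyList_Size(nums);    /* nums, not args */
    if (numsAmount == -1){                        /* not a list (or other error) */
        return NULL;
    }
    long summa = 0;                               /* stick to the API's long */
    for (Py_ssize_t i = 0; i < numsAmount; i++){
        PyObject *temp = PyList_GetItem(nums, i); /* borrowed reference */
        long num = PyLong_AsLong(temp);
        if (num == -1 && PyErr_Occurred()){       /* item was not an integer */
            return NULL;
        }
        summa += num;
    }
    return Py_BuildValue("l", summa);
}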

Related

How to return numpy array by reference in pybind11

I am trying to return a numpy array from a C++ object using pybind11, where the array is created from memory owned by the C++ class. Right now, I have the class exposed via the buffer protocol, and I return a py::array like this:
auto raw_image_cls =
    py::class_<RawImage>(m, "RawImage", py::buffer_protocol());
....
    .def_buffer([](RawImage &img) -> py::buffer_info {
        size_t buff_sz = 0;
        return py::buffer_info(
            img.ImageData(buff_sz), img.BytesPerPixel(),
            GetFormatDescriptor(img.BytesPerPixel()), 2,
            {img.Height(), img.Width()},
            {img.Width() * img.BytesPerPixel(), img.BytesPerPixel()}
        );
    })
    .def_property_readonly(
        "img",
        [](RawImage &img) -> py::array {
            size_t buff_sz = 0;
            // py::capsule buffer_handle([]() {});
            py::capsule buffer_handle(img.ImageData(buff_sz),
                                      [](void *p) { free(p); });
            return py::array(
                py::buffer_info(
                    img.ImageData(buff_sz), img.BytesPerPixel(),
                    GetFormatDescriptor(img.BytesPerPixel()), 2,
                    {img.Height(), img.Width()},
                    {img.Width() * img.BytesPerPixel(), img.BytesPerPixel()}),
                buffer_handle);
        },
        py::return_value_policy::reference_internal)
    .....
When I use numpy and do something like:
a = RawImage(filename)
b = numpy.array(a, copy=False)
a = 0
b
Everything works as expected, and b will remain. If I do:
a = RawImage(filename)
b = a.img
a = 0
b
I get a segfault, which makes sense because a is destroyed. But how do I return a py::array and obtain the same behavior as with numpy.array(a, copy=False), which does not crash?
I have tried:
return py::array(py::buffer_info(
    img.ImageData(buff_sz), img.BytesPerPixel(),
    GetFormatDescriptor(img.BytesPerPixel()), 2,
    {img.Height(), img.Width()},
    {img.Width() * img.BytesPerPixel(), img.BytesPerPixel()}));
without the buffer handle, but that just makes a full copy, which really slows things down. Is there a way to tell py::array that we basically just want to return a reference, i.e. an object that points to my C++ memory (from img.ImageData(sz)), and get the same behavior as numpy.array(a, copy=False)?
This may not be exactly the answer you are looking for.
The "problem" with a numpy array, as well as with a std::vector<>, is ownership: in Python as well as in C++, the memory is freed when the owning object is deleted.
I solved the problem by allocating a std::vector<T>* and constructing a py::array_t<T> from it:
#include <pybind11/functional.h>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

namespace py = pybind11;

template <typename T>
py::array Vec2NpArray(std::vector<T> *data,
                      std::vector<size_t> shape) {
  // calculate the strides of the multidimensional data from the shape
  std::vector<size_t> stride(shape.size(), 0);
  size_t elm_stride = sizeof(T);
  auto shape_it = shape.rbegin();
  auto stride_it = stride.rbegin();
  for (; stride_it != stride.rend(); stride_it++, shape_it++) {
    *stride_it = elm_stride;
    elm_stride *= *shape_it;
  }
  // tell Python to delete the vector when the array is destroyed
  auto capsule = py::capsule(
      data, [](void *data) { delete reinterpret_cast<std::vector<T> *>(data); });
  return py::array_t<T>(shape, stride, data->data(), capsule);
}
If anyone has another solution, I would be very interested.

Numpy Python/C API - PyArray_SimpleNewFromData hangs

I'm figuring out the Python/C API for a more complex task. Initially, I wrote a simple example of adding two ndarrays of shape = (2,3) and type = float32.
I am able to pass two numpy arrays into C functions, read their dimensions and data, and perform custom addition on the data. But when I try to wrap the resulting data using PyArray_SimpleNewFromData, the code hangs (returns NULL?).
To replicate the issue, create three files: mymath.c, setup.py, test.py in a folder as follows and run test.py (it runs setup.py to compile and install the module and then runs a simple test).
I'm using Python on Windows, inside an Anaconda environment. I'm new to the Python/C API, so any help would be much appreciated.
// mymath.c
#include <Python.h>
#include <stdio.h>
#include "numpy/arrayobject.h"
#include "numpy/npy_math.h"
#include <math.h>
#include <omp.h>

/*
    C functions
*/
float* arr_add(float* d1, float* d2, int M, int N){
    float * result = (float *) malloc(sizeof(float)*M*N);
    for (int m=0; m<M; m++)
        for (int n=0; n<N; n++)
            result[m*N + n] = d1[m*N + n] + d2[m*N + n];
    return result;
}

/*
    Unwrap, apply and wrap pyObjects
*/
void capsule_cleanup(PyObject *capsule) {
    void *memory = PyCapsule_GetPointer(capsule, NULL);
    free(memory);
}

// add two 2d arrays (float32)
static PyObject *arr_add_fn(PyObject *self, PyObject *args)
{
    PyArrayObject *arr1, *arr2;
    if (!PyArg_ParseTuple(args, "OO", &arr1, &arr2))
        return NULL;

    // get data as flat list
    float *d1, *d2;
    d1 = (float *) arr1->data;
    d2 = (float *) arr2->data;
    int M, N;
    M = (int)arr1->dimensions[0];
    N = (int)arr1->dimensions[1];
    printf("Dimensions, %d, %d \n\n", M, N);

    PyObject *result, *capsule;
    npy_intp dim[2];
    dim[0] = M;
    dim[1] = N;

    float * d3 = arr_add(d1, d2, M, N);
    result = PyArray_SimpleNewFromData(2, dim, NPY_FLOAT, (void *)d3);
    if (result == NULL)
        return NULL;

    // -----------This is not executed. code hangs--------------------
    for (int m=0; m<M; m++)
        for (int n=0; n<N; n++)
            printf("%f \n", d3[m*N+n]);

    capsule = PyCapsule_New(d3, NULL, capsule_cleanup);
    PyArray_SetBaseObject((PyArrayObject *) result, capsule);
    return result;
}

/*
    Bundle functions into module
*/
static PyMethodDef MyMethods[] = {
    {"arr_add", arr_add_fn, METH_VARARGS, "Array Add two numbers"},
    {NULL, NULL, 0, NULL}
};

/*
    Create module
*/
static struct PyModuleDef mymathmodule = {
    PyModuleDef_HEAD_INIT,
    "mymath", "My doc of mymath", -1, MyMethods
};

PyMODINIT_FUNC PyInit_mymath(void){
    return PyModule_Create(&mymathmodule);
}
# setup.py
from distutils.core import setup, Extension
import numpy

module1 = Extension('mymath',
                    sources = ['mymath.c'],
                    # define_macros = [('NPY_NO_DEPRECATED_API', 'NPY_1_7_API_VERSION')],
                    include_dirs=[numpy.get_include()],
                    extra_compile_args = ['-fopenmp'],
                    extra_link_args = ['-lgomp'])

setup(name = 'mymath',
      version = '1.0',
      description = 'My math',
      ext_modules = [module1])
# test.py
import os
os.system("python .\setup.py install")
import numpy as np
import mymath
a = np.arange(6,dtype=np.float32).reshape(2,3)
b = np.arange(6,dtype=np.float32).reshape(2,3)
c = mymath.arr_add(a,b)
print(c)
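One detail worth checking: the NumPy C API documentation requires import_array() to be called in the module initialisation function before any PyArray_* function is used, and the mymath.c above never calls it; without it, PyArray_* calls go through an uninitialised function table and misbehave. A minimal sketch of the init function with that call added (this is an observation about the code shown, not part of the original post):
PyMODINIT_FUNC PyInit_mymath(void){
    import_array();                       /* required: initialises the NumPy C API */
    return PyModule_Create(&mymathmodule);
}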

How do I call global functions on Python objects?

I've seen this page: https://docs.python.org/3/c-api/object.html but there doesn't seem to be any way to call functions like long_lshift or long_or.
It's not essential for me to call these particular functions; I could also live with the more generic versions, although I'd prefer these. Anyway, is there any way to use them? What do I need to include? Below is some example code where I'd like to use them (simplified):
size_t parse_varint(parse_state* state) {
    int64_t value[2] = { 0, 0 };
    size_t parsed = parse_varint_impl(state, value);
    PyObject* low = PyLong_FromLong(value[0]);
    PyObject* high;
    if (value[1] > 0) {
        high = PyLong_FromLong(value[1]);
        PyObject* shift = PyLong_FromLong(64L);
        PyObject* high_shifted = long_lshift(high, shift);
        state->out = long_or(low, high_shifted);
    } else {
        state->out = low;
    }
    PyObject_Print(state->out, stdout, 0);
    return 0;
}
I couldn't find these functions in the documentation, but there are replacements exported in the Python.h header:
PyNumber_Lshift is the replacement for long_lshift in my code.
Similarly, PyNumber_Or is the replacement for long_or in my code.
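For reference, the parse_varint above could use them like this (a sketch only: parse_state and parse_varint_impl come from the question, error checks on the temporaries are omitted as in the original, and PyLong_FromLongLong is used so the 64-bit values survive on platforms with 32-bit long):
size_t parse_varint(parse_state* state) {
    int64_t value[2] = { 0, 0 };
    size_t parsed = parse_varint_impl(state, value);
    PyObject* low = PyLong_FromLongLong(value[0]);
    if (value[1] > 0) {
        PyObject* high = PyLong_FromLongLong(value[1]);
        PyObject* shift = PyLong_FromLong(64L);
        PyObject* high_shifted = PyNumber_Lshift(high, shift);  /* high << 64 */
        state->out = PyNumber_Or(low, high_shifted);            /* low | (high << 64) */
        Py_DECREF(high_shifted);
        Py_DECREF(shift);
        Py_DECREF(high);
        Py_DECREF(low);
    } else {
        state->out = low;   /* ownership of "low" is transferred to state->out */
    }
    PyObject_Print(state->out, stdout, 0);
    return 0;
}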

How to properly release Python C API GIL from main thread

I'm trying to embed Python in a C++ multithreaded program.
What I do is call two statistical functions through the Python C API to perform the two-sample Kolmogorov-Smirnov test and the two-sample Anderson-Darling test on some data that I collect. So I'm just embedding Python in my code; I'm not extending it or using my own Python functions.
I recently found out that in order to run a multithreaded program that uses the Python C API you need to handle the Global Interpreter Lock (GIL) properly: whenever you use a Python C API function you need to acquire the GIL, and then release it when you're done using the API functions.
The thing I still don't understand is how to properly release the GIL from the main thread in order to let the other threads execute the Python code.
I tried this (option 1):
int main(int argc, const char * argv[]) {
    int n = 4;
    std::thread threads[n];
    Py_Initialize();
    PyEval_InitThreads();
    PyEval_SaveThread();
    for (int i = 0; i < n; i++) {
        threads[i] = std::thread(exec, i);
    }
    for (int i = 0; i < n; i++) {
        threads[i].join();
    }
    Py_Finalize();
    return 0;
}
But it gives me a segmentation fault when calling Py_Finalize().
So I tried this (option 2):
int main(int argc, const char * argv[]) {
    int n = 4;
    std::thread threads[n];
    Py_Initialize();
    PyEval_InitThreads();
    PyThreadState * Py_UNBLOCK_THREADS
    for (int i = 0; i < n; i++) {
        threads[i] = std::thread(exec, i);
    }
    for (int i = 0; i < n; i++) {
        threads[i].join();
    }
    Py_BLOCK_THREADS
    Py_Finalize();
    return 0;
}
and this (option 3):
int main(int argc, const char * argv[]) {
    int n = 4;
    std::thread threads[n];
    Py_Initialize();
    PyEval_InitThreads();
    Py_BEGIN_ALLOW_THREADS
    for (int i = 0; i < n; i++) {
        threads[i] = std::thread(exec, i);
    }
    for (int i = 0; i < n; i++) {
        threads[i].join();
    }
    Py_END_ALLOW_THREADS
    Py_Finalize();
    return 0;
}
With both of the last two options the code runs, but it ends with this error:
Exception ignored in: <module 'threading' from '/usr/local/opt/python3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/usr/local/opt/python3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1289, in _shutdown
    assert tlock.locked()
AssertionError:
EDIT:
The code that is executed by the spawned threads is this:
double limited_rand(double lower_bound, double upper_bound) {
    return lower_bound + (rand() / (RAND_MAX / (upper_bound - lower_bound)));
}

double exec_1(std::vector<int> &left_sample, std::vector<int> &right_sample) {
    PyGILState_STATE gstate = PyGILState_Ensure(); // Acquiring GIL for thread-safe usage of the Python C API
    PyObject* scipy_stats_module = PyImport_ImportModule("scipy.stats"); // importing "scipy.stats" module
    import_array();
    npy_intp left_nparray_shape[] = {(npy_intp)left_sample.size()}; // Size of left nparray's first dimension
    PyObject* left_sample_nparray = PyArray_SimpleNewFromData(1, left_nparray_shape, NPY_INT, &left_sample[0]); // Creating numpy array with 1 dimension, elements are integers, and the data is taken from "left_sample" as an int* pointer
    npy_intp right_nparray_shape[] = {(npy_intp)right_sample.size()}; // Size of right nparray's first dimension
    PyObject* right_sample_nparray = PyArray_SimpleNewFromData(1, right_nparray_shape, NPY_INT, &right_sample[0]);
    PyObject* ks_2samp = PyObject_GetAttrString(scipy_stats_module, "ks_2samp");
    Py_DecRef(scipy_stats_module);
    PyObject* ks_2samp_return_val = PyObject_CallFunctionObjArgs(ks_2samp, left_sample_nparray, right_sample_nparray, NULL);
    Py_DecRef(ks_2samp);
    Py_DecRef(right_sample_nparray);
    Py_DecRef(left_sample_nparray);
    double p_value = PyFloat_AsDouble(PyTuple_GetItem(ks_2samp_return_val, 1));
    Py_DecRef(ks_2samp_return_val);
    PyGILState_Release(gstate); // Releasing GIL
    return p_value;
}

void initialize_c_2d_int_array(int*& c_array, unsigned long row_length_c_array, std::vector<int> &row1, std::vector<int> &row2) {
    for (unsigned int i = 0; i < row_length_c_array; i++) {
        c_array[i] = row1[i];
        c_array[row_length_c_array + i] = row2[i];
    }
}

double exec_2(std::vector<int> &left_sample, std::vector<int> &right_sample){
    PyGILState_STATE gstate = PyGILState_Ensure(); // Acquiring GIL for thread-safe usage of the Python C API
    PyObject* scipy_stats_module = PyImport_ImportModule("scipy.stats"); // importing "scipy.stats" module
    // import_array();
    unsigned long n_cols = std::min(left_sample.size(), right_sample.size());
    int* both_samples = (int*) (malloc(2 * n_cols * sizeof(int)));
    initialize_c_2d_int_array(both_samples, n_cols, left_sample, right_sample);
    npy_intp dim3[] = {2, (npy_intp) n_cols};
    PyObject* both_samples_nparray = PyArray_SimpleNewFromData(2, dim3, NPY_INT, both_samples);
    PyObject* anderson_ksamp = PyObject_GetAttrString(scipy_stats_module, "anderson_ksamp");
    Py_DecRef(scipy_stats_module);
    PyObject* anderson_2samp_return_val = PyObject_CallFunctionObjArgs(anderson_ksamp, both_samples_nparray, NULL);
    Py_DecRef(anderson_ksamp);
    Py_DecRef(both_samples_nparray);
    free(both_samples);
    double p_value = PyFloat_AsDouble(PyTuple_GetItem(anderson_2samp_return_val, 2));
    Py_DecRef(anderson_2samp_return_val);
    PyGILState_Release(gstate); // Releasing GIL
    return p_value;
}

void exec(int thread_id) {
    std::vector<int> left_sample;
    std::vector<int> right_sample;
    int n = 50;
    for (int j = 0; j < n; j++) {
        int size = 100;
        for (int i = 0; i < size; i++) {
            left_sample.push_back(limited_rand(0, 100));
            right_sample.push_back(limited_rand(0, 100));
        }
        exec_1(left_sample, right_sample);
        exec_2(left_sample, right_sample);
    }
}
The only functions where I use the Python C API are exec_1 and exec_2, while exec just calls them repeatedly on new random data. This is the simplest code I could think of that mimics the behavior of my real code. I've also left out all error checking around the Python API calls, for better readability.
If there's no other choice, I'll run my code like option 2 or option 3 and forget about the error, but I would really like to understand what's going on. Can you help me?
P.S. I'm running Python 3.6.1 under a macOS 10.12.5 system using Xcode 8.3.3. If you need more details let me know.
Option 1:
I think this is giving you a segmentation fault because you called PyEval_SaveThread() (which releases the GIL, returns the saved thread state, and sets the current thread state to NULL).
Py_Finalize will try to free all memory associated with the interpreter, and I guess this includes the main thread state. So you can either capture this state with:
PyEval_InitThreads();  // initialize and acquire the GIL
// release the GIL, store the thread state, set the current thread state to NULL
PyThreadState *mainThreadState = PyEval_SaveThread();

/* main code segment */

// re-acquire the GIL (re-initialize the current thread state)
PyEval_RestoreThread(mainThreadState);
Py_Finalize();
return 0;
Or you can immediately call PyEval_ReleaseLock() after calling PyEval_InitThreads(), since it looks like the main code segment does not use any embedded Python. I had a similar problem and that seemed to fix it.
NOTE: other threads will still need to acquire/release the GIL wherever necessary.
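Putting that together, the corrected main() could look roughly like this (a sketch under the answer's assumptions; exec is the worker from the question, which keeps acquiring the GIL via PyGILState_Ensure/PyGILState_Release):
#include <Python.h>
#include <thread>

void exec(int thread_id);  // defined as in the question

int main(int argc, const char *argv[]) {
    const int n = 4;
    std::thread threads[n];

    Py_Initialize();
    PyEval_InitThreads();                            // needed on Python < 3.7
    PyThreadState *mainState = PyEval_SaveThread();  // release the GIL, keep the main thread state

    for (int i = 0; i < n; i++)
        threads[i] = std::thread(exec, i);           // workers use PyGILState_Ensure/Release
    for (int i = 0; i < n; i++)
        threads[i].join();

    PyEval_RestoreThread(mainState);                 // re-acquire the GIL before finalizing
    Py_Finalize();
    return 0;
}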

Python C module memory leak

I have a C library adapted for Python; it is part of the rdpy project (https://github.com/citronneur/rdpy).
However, there is a memory leak that I can't get rid of from the Python side: deleting references to the objects (input = None, output = None), calling gc.collect(), etc. don't help. Leak analyzers such as pympler also don't show any objects occupying the memory. I also found that the same code is used in the rdesktop project (https://github.com/rdesktop/rdesktop/blob/master/bitmap.c), but it works correctly there, so maybe it's a matter of the integration with Python. What in this function could leak memory?
/* Specific rename for RDPY integration */
#define uint8 unsigned char
#define uint16 unsigned short
#define unimpl(str, code)
#define RD_BOOL int
#define False 0
#define True 1
/* end specific rename */
......................................

static PyObject*
bitmap_decompress_wrapper(PyObject* self, PyObject* args)
{
    Py_buffer output, input;
    int width = 0, height = 0, bpp = 0;

    if (!PyArg_ParseTuple(args, "s*iis*i", &output, &width, &height, &input, &bpp))
        return NULL;

    if (bitmap_decompress((uint8*)output.buf, width, height, (uint8*)input.buf, input.len, bpp) == False)
        return NULL;

    Py_RETURN_NONE;
}

static PyMethodDef rle_methods[] =
{
    {"bitmap_decompress", bitmap_decompress_wrapper, METH_VARARGS, "decompress bitmap from microsoft rle algorithm."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initrle(void)
{
    (void) Py_InitModule("rle", rle_methods);
}
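One thing worth noting about the wrapper: according to the CPython documentation, the s* format unit fills a Py_buffer that the caller owns and must release with PyBuffer_Release() when done, and the wrapper above never does that, so each call keeps a buffer (and the underlying Python object) alive. A sketch of a version that releases both buffers, keeping everything else as in the original:
static PyObject*
bitmap_decompress_wrapper(PyObject* self, PyObject* args)
{
    Py_buffer output, input;
    int width = 0, height = 0, bpp = 0;

    if (!PyArg_ParseTuple(args, "s*iis*i", &output, &width, &height, &input, &bpp))
        return NULL;

    RD_BOOL ok = bitmap_decompress((uint8*)output.buf, width, height,
                                   (uint8*)input.buf, input.len, bpp);

    /* "s*" buffers are owned by the caller and must be released explicitly */
    PyBuffer_Release(&output);
    PyBuffer_Release(&input);

    if (ok == False)
        return NULL;

    Py_RETURN_NONE;
}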
