Im trying to run some c code in python using inline from scipy.weave.
Lets say we have 2 double arrays and onbe double value, i wish to add each index of the first index to the corresponiding index of the next index, plus the value.
The C code:
double* first;
double* second;
double val;
int length;
int i;
for (i = 0; i < length; i++) {
second[i] = second[i] + first[i] + val;
}
Then i wish to use the "second" array in my python code again.
Given the following python code:
import numpy
from scipy import weave
first = zeros(10) #first double array
second = ones(10) #second python array
val = 1.0
code = """
the c code
"""
second = inline(code,[first, second, val, 10])
Now i am not shure if this is the correct way of sending in the arrays/getting it out, and how to use/get acces to them within the c code.
Related
I'm new to c++ and looking for a faster way to append pixel value to python list, since currrently on loop it takes around 0.1 second to process one frame of image with resolution of 854x480, do anyone have any idea?
I tried to avoid using third party module if possible.
Here is what I've got so far:
PyObject* byte_list = PyList_New(static_cast<Py_ssize_t>(0));
AVFrame *pFrameRGB = av_frame_alloc();
av_frame_copy_props(pFrameRGB, this->pFrame);
pFrameRGB->width = this->pFrame->width;
pFrameRGB->height = this->pFrame->height;
pFrameRGB->format = AV_PIX_FMT_RGB24;
av_frame_get_buffer(pFrameRGB, 0);
sws_scale(this->swsCtx, this->pFrame->data, this->pFrame->linesize, 0,
this->pCodecContext->height, pFrameRGB->data, pFrameRGB->linesize);
if (this->_debug) {
std::cout << "Frame linesize " << pFrameRGB->linesize[0] << "\n";
std::cout << "Frame width " << pFrameRGB->width << "\n";
std::cout << "Frame height " << pFrameRGB->height << "\n";
}
// This looping method seems slow
for(int y = 0; y < pFrameRGB->height; ++y) {
for(int x = 0; x < pFrameRGB->width; ++x) {
int p = x * 3 + y * pFrameRGB->linesize[0];
int r = pFrameRGB->data[0][p];
int g = pFrameRGB->data[0][p+1];
int b = pFrameRGB->data[0][p+2];
PyList_Append(byte_list, PyLong_FromLong(r));
PyList_Append(byte_list, PyLong_FromLong(g));
PyList_Append(byte_list, PyLong_FromLong(b));
}
}
av_frame_free(&pFrameRGB);
Thanks!
After looking around, I've decided to use Python Built-in Array Library that can use memcpy instead of PyList which require to input the data one by one.
From my test, this improve the speed from 2-10 times, depending on the data.
PyObject *vec_to_array(std::vector<uint8_t>& vec) {
static PyObject *single_array;
if (!single_array) {
PyObject *array_module = PyImport_ImportModule("array");
if (!array_module)
return NULL;
PyObject *array_type = PyObject_GetAttrString(array_module, "array");
Py_DECREF(array_module);
if (!array_type)
return NULL;
single_array = PyObject_CallFunction(array_type, "s[B]", "B", 0);
Py_DECREF(array_type);
if (!single_array)
return NULL;
}
// extra-fast way to create an empty array of count elements:
// array = single_element_array * count
PyObject *pysize = PyLong_FromSsize_t(vec.size());
if (!pysize)
return NULL;
PyObject *array = PyNumber_Multiply(single_array, pysize);
Py_DECREF(pysize);
if (!array)
return NULL;
// now, obtain the address of the array's buffer
PyObject *buffer_info = PyObject_CallMethod(array, "buffer_info", "");
if (!buffer_info) {
Py_DECREF(array);
return NULL;
}
PyObject *pyaddr = PyTuple_GetItem(buffer_info, 0);
void *addr = PyLong_AsVoidPtr(pyaddr);
// and, finally, copy the data.
if (vec.size())
memcpy(addr, &vec[0], vec.size() * sizeof(uint8_t));
return array;
}
after that I passed the vector into that function
std::vector<uint8_t> rgb_arr;
// Copy data from AV Frame
uint8_t* rgb_data[4]; int rgb_linesize[4];
av_image_alloc(rgb_data, rgb_linesize, this->pFrame->width, this->pFrame->height, AV_PIX_FMT_RGB24, 32);
sws_scale(this->swsCtx, this->pFrame->data, this->pFrame->linesize, 0, this->pFrame->height, rgb_data, rgb_linesize);
// Put the data into vector
int rgb_size = pFrame->height * rgb_linesize[0];
std::vector<uint8_t> rgb_vector(rgb_size);
memcpy(rgb_vector.data(), rgb_data[0], rgb_size);
// Transfer the data from vector to rgb_arr
for(int y = 0; y < pFrame->height; ++y) {
rgb_arr.insert(
rgb_arr.end(),
rgb_vector.begin() + y * rgb_linesize[0],
rgb_vector.begin() + y * rgb_linesize[0] + 3 * pFrame->width
);
}
PyObject* arr = vec_to_array(rgb_arr);
This then later can be accessed by python.
Use a container with a faster insertion time, such as std::vector or std::deque instead of std::list. These containers have a constant-time insertion time, whereas std::list has a linear-time insertion time.
Use a bulk insertion method, such as std::vector::insert() or std::deque::insert(), to insert multiple values at once instead of inserting them one at a time. This can reduce the overhead of inserting individual elements.
Use a memory-efficient data structure, such as std::bitset, to store the pixel values if each pixel only has a few possible values (e.g. 0 or 1). This can reduce the memory usage and improve the performance of inserting and accessing the values.
Use C++11's emplace_back() method, which avoids the overhead of constructing and copying elements by constructing the element in place in the container.
Preallocate the memory for the container to avoid the overhead of frequent memory reallocations as the container grows. You can use the reserve() method of std::vector or std::deque to preallocate the memory.
Consider using a faster algorithm or data structure for the image processing task itself. For example, you may be able to use optimized image processing libraries or parallelize the image processing using multi-threading or SIMD instructions.
Debian version: Buster (10) on WSL
numpy version: 1.20.1
python version: 3.7.3
gcc version: 8.3.0
I'm trying to pass a numpy array to a function written in C. The C function sees the array as being twice as long with every other entry being zero. I have no idea why this is happening and couldn't find any instances of someone running into a similar problem.
Here's the contents of function.c:
#include <stdio.h>
void function(int* array, int size) {
int i;
for (i = 0; i < size; ++i) {
printf("i = %d, array value = %d \n", i, array[i]);
}
}
int main() {
return 0;
}
I'm compiling the C code into an .so file using the following steps:
gcc -c -fPIC function.c -o function.o
gcc function.o -shared -o function.so
Here's the python file, main.py, that calls the C code with a simple array.
import numpy as np
import ctypes
c_int_array = np.ctypeslib.ndpointer(dtype=np.int64, ndim=1, flags='C_CONTIGUOUS')
lib = np.ctypeslib.load_library('function.so', '.')
lib.function.argtypes = [c_int_array, ctypes.c_int]
array = np.array([1,2,3,4])
lib.function(array, len(array))
When I run the main.py file, I get the following result.
i = 0, array value = 1
i = 1, array value = 0
i = 2, array value = 2
i = 3, array value = 0
Here's what happens when I multiply the size of the array by two. lib.function(array, 2*len(array))
i = 0, array value = 1
i = 1, array value = 0
i = 2, array value = 2
i = 3, array value = 0
i = 4, array value = 3
i = 5, array value = 0
i = 6, array value = 4
i = 7, array value = 0
There seems to be an issue that occurs when passing the numpy array from python to the C code. Ever other index being zero is very strange.
Any idea what's causing this and how this can be fixed? As usual, I'm probably doing something obvious and stupid but I can't seem to figure this out. I tried to make the code for this question as simple as reasonable.
As I initially hypothesised in my comment, the issue was caused by these lines:
# Python
c_int_array = np.ctypeslib.ndpointer(dtype=np.int64, ndim=1, flags='C_CONTIGUOUS')
// C
void function(int* array, int size)
Here, the Python pointer is a pointer to a 64-bit integer, but the C function accepts a pointer to a 32-bit integer, which is exactly twice as short as the 64-bit one.
There are two ways to solve this:
Pass a pointer to np.int32 from Python
Accept a pointer to long long or int64_t in C:
#include <stdint.h>
void function(int64_t* array, int size)
I want to use a 2D array created by a c function in python. I asked how to do this before today and one approach suggested by #Abhijit Pritam was to use structs. I implemented it and it does work.
c code:
typedef struct {
int arr[3][5];
} Array;
Array make_array_struct() {
Array my_array;
int count = 0;
for (int i = 0; i < 3; i++)
for (int j = 0; j < 5; j++)
my_array.arr[i][j] = ++count;
return my_array;
}
in python I have this:
cdef extern from "numpy_fun.h":
ctypedef struct Array:
int[3][5] arr
cdef Array make_array_struct()
def make_array():
cdef Array arr = make_array_struct()
return arr
my_arr = make_array()
my_arr['arr']
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
However it was suggested that this was not the best approach to the problem because it's possible to make python have control over the data. I'm trying to implement this but I haven't been able to that so far. This is what I have.
c code:
int **make_array_ptr() {
int **my_array = (int **)malloc(3 * sizeof(int *));
my_array[0] = calloc(3 * 5, sizeof(int));
for (int i = 1; i < 3; i++)
my_array[i] = my_array[0] + i * 5;
int count = 0;
for (int i = 0; i < 3; i++)
for (int j = 0; j < 5; j++)
my_array[i][j] = ++count;
return my_array;
}
python:
import numpy as np
cimport numpy as np
np.import_array()
ctypedef np.int32_t DTYPE_t
cdef extern from "numpy/arrayobject.h":
void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
cdef extern from "numpy_fun.h":
cdef int **make_array_ptr()
def make_array():
cdef int[::1] dims = np.array([3, 5], dtype=np.int32)
cdef DTYPE_t **data = <DTYPE_t **>make_array_ptr()
cdef np.ndarray[DTYPE_t, ndim=2] my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
return my_array
I was following Force NumPy ndarray to take ownership of its memory in Cython
which seems to be what I need to do. In my case is it's different because I need 2D array so I'll likely have to do things a bit differently because for example the function expects data to be a pointer to int and I gave it a pointer to pointer to int.
What do I have to do to use this approach?
My issues with the struct approach is:
It breaks as soon as you want anything but a fixed size of array, with no real way of fixing it.
It relies on Cython's implicit conversion from structs to dicts. Cython copies the data to a Python list, which isn't terribly efficient. This isn't an issue with the small arrays you have here, but it's silly for larger arrays.
I also don't really recommend 2D arrays as pointers-to-pointers. The way numpy (and most other sensible array libraries) implement 2D arrays is to store a 1D array and the shape of the 2D array, and just use the shape to work out what index to access. This tends to be more efficient (faster lookups, faster allocation) and also easier to use (less allocation/deallocation to keep track of).
To do this change the C code to:
int32_t *make_array_ptr() {
int32_t *my_array = calloc(3 * 5, sizeof(int32_t));
int count = 0;
for (int i = 0; i < 3; i++)
for (int j = 0; j < 5; j++)
my_array[j+i*5] = ++count;
return my_array;
}
I've deleted the first loop that you immediately overwrite. I've also changed the type of int32_t since you seem to rely on this in your Cython code later.
The Cython code is then very close to what you were using:
def make_array():
cdef np.intp_t dims[2]
dims[0]=3; dims[1] = 5
cdef np.int32_t *data = make_array_ptr()
cdef np.ndarray[np.int32_t, ndim=2] my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
return my_array
The main changes are that I've removed some casts and also just allocated dims as a static array (which seemed simpler than memoryviews)
I don't think it's particularly easy allow numpy to handle a pointer-to-pointer array. It might be possible by implementing the Python buffer interface but that that seems like a lot of work and may not be easy.
EDIT 3
I have some C++ code (externed as C) which I access from python.
I want to allocate a double** in python, pass it to the C/C++ code to copy the content of a class internal data, and then use it in python similarly to how I would use a list of lists.
Unfortunately I can not manage to specify to python the size of the most inner array, so it reads invalid memory when iterating over it and the program segfaults.
I can not change the structure of the internal data in C++, and I'd like to have python do the bound checking for me (like if I was using a c_double_Array_N_Array_M instead of an array of pointers).
test.cpp (compile with g++ -Wall -fPIC --shared -o test.so test.cpp )
#include <stdlib.h>
#include <string.h>
class Dummy
{
double** ptr;
int e;
int i;
};
extern "C" {
void * get_dummy(int N, int M) {
Dummy * d = new Dummy();
d->ptr = new double*[N];
d->e = N;
d->i = M;
for(int i=0; i<N; ++i)
{
d->ptr[i]=new double[M];
for(int j=0; j <M; ++j)
{
d->ptr[i][j] = i*N + j;
}
}
return d;
}
void copy(void * inst, double ** dest) {
Dummy * d = static_cast<Dummy*>(inst);
for(int i=0; i < d->e; ++i)
{
memcpy(dest[i], d->ptr[i], sizeof(double) * d->i);
}
}
void cleanup(void * inst) {
if (inst != NULL) {
Dummy * d = static_cast<Dummy*>(inst);
for(int i=0; i < d->e; ++i)
{
delete[] d->ptr[i];
}
delete[] d->ptr;
delete d;
}
}
}
Python (this segfaults. Put it in the same dir in which the test.so is)
import os
from contextlib import contextmanager
import ctypes as ct
DOUBLE_P = ct.POINTER(ct.c_double)
library_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'test.so')
lib = ct.cdll.LoadLibrary(library_path)
lib.get_dummy.restype = ct.c_void_p
N=15
M=10
#contextmanager
def work_with_dummy(N, M):
dummy = None
try:
dummy = lib.get_dummy(N, M)
yield dummy
finally:
lib.cleanup(dummy)
with work_with_dummy(N,M) as dummy:
internal = (ct.c_double * M)
# Dest is allocated in python, it will live out of the with context and will be deallocated by python
dest = (DOUBLE_P * N)()
for i in range(N):
dest[i] = internal()
lib.copy(dummy, dest)
#dummy is not available anymore here. All the C resources has been cleaned up
for i in dest:
for n in i:
print(n) #it segfaults reading more than the length of the array
What can I change in my python code so that I can treat the array as a list?
(I need only to read from it)
3 ways to pass a int** array from Python to C and back
So that Python knows the size of the array when iterating
The data
This solutions work for either 2d array or array of pointers to arrays with slight modifications, without the use of libraries like numpy.
I will use int as a type instead of double and we will copy source, which is defined as
N = 10;
M = 15;
int ** source = (int **) malloc(sizeof(int*) * N);
for(int i=0; i<N; ++i)
{
source[i] = (int *) malloc(sizeof(int) * M);
for(int j=0; j<M; ++j)
{
source[i][j] = i*N + j;
}
}
1) Assigning the array pointers
Python allocation
dest = ((ctypes.c_int * M) * N) ()
int_P = ctypes.POINTER(ctypes.c_int)
temp = (int_P * N) ()
for i in range(N):
temp[i] = dest[i]
lib.copy(temp)
del temp
# temp gets collected by GC, but the data was stored into the memory allocated by dest
# You can now access dest as if it was a list of lists
for row in dest:
for item in row:
print(item)
C copy function
void copy(int** dest)
{
for(int i=0; i<N; ++i)
{
memcpy(dest[i], source[i], sizeof(int) * M);
}
}
Explanation
We first allocate a 2D array. A 2D array[N][M] is allocated as a 1D array[N*M], with 2d_array[n][m] == 1d_array[n*M + m].
Since our code is expecting a int**, but our 2D array in allocated as a int *, we create a temporary array to provide the expected structure.
We allocate temp[N][M], and than we assign the address of the memory we allocated previously temp[n] = 2d_array[n] = &1d_array[n*M] (the second equal is there to show what is happening with the real memory we allocated).
If you change the copying code so that it copies more than M, let's say M+1, you will see that it will not segfault, but it will override the memory of the next row because they are contiguous (if you change the copying code, remember to add increase by 1 the size of dest allocated in python, otherwise it will segfault when you write after the last item of the last row)
2) Slicing the pointers
Python allocation
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = (ctypes.c_int * M)
dest = (int_P * N) ()
for i in range(N):
dest[i] = inner_array()
lib.copy(dest)
for row in dest:
# Python knows the length of dest, so everything works fine here
for item in row:
# Python doesn't know that row is an array, so it will continue to read memory without ever stopping (actually, a segfault will stop it)
print(item)
dest = [internal[:M] for internal in dest]
for row in dest:
for item in row:
# No more segfaulting, as now python know that internal is M item long
print(item)
C copy function
Same as for solution 1
Explanation
This time we are allocating an actual array of pointers of array, like source was allocated.
Since the outermost array ( dest ) is an array of pointers, python doesn't know the length of the array pointed to (it doesn't even know that is an array, it could be a pointer to a single int as well).
If you iterate over that pointer, python will not bound check and it will start reading all your memory, resulting in a segfault.
So, we slice the pointer taking the first M elements (which actually are all the elements in the array). Now python knows that it should only iterate over the first M elements, and it won't segfault any more.
I believe that python copies the content pointed to a new list using this method ( see sources )
2.1) Slicing the pointers, continued
Eryksun jumped in in the comments and proposed a solution which avoids the copying of all the elements in new lists.
Python allocation
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = (ctypes.c_int * M)
inner_array_P = ctypes.POINTER(inner_array)
dest = (int_P * N) ()
for i in range(N):
dest[i] = inner_array()
lib.copy(dest)
dest_arrays = [inner_array_p.from_buffer(x)[0] for x in dest]
for row in dest_arrays:
for item in row:
print(item)
C copying code
Same as for solution 1
3) Contiguous memory
This method is an option only if you can change the copying code on the C side. source will not need to be changed.
Python allocation
dest = ((ctypes.c_int * M) * N) ()
lib.copy(dest)
for row in dest:
for item in row:
print(item)
C copy function
void copy(int * dest) {
for(int i=0; i < N; ++i)
{
memcpy(&dest[i * M], source[i], sizeof(int) * M);
}
}
Explanation
This time, like in case 1) we are allocating a contiguous 2D array. But since we can change the C code, we don't need to create a different array and copy the pointers since we will be giving the expected type to C.
In the copy function, we pass the address of the first item of every row, and we copy M elements in that row, then we go to the next row.
The copy pattern is exactly as in case 1), but this time instead of writing the interface in python so that the C code receives the data how it expects it, we changed the C code to expect the data in that precise format.
If you keep this C code, you'll be able to use numpy arrays as well, as they are 2D row major arrays.
All of this answer is possible thanks the great (and concise) comments of #eryksun below the original question.
The following code is an algorithm to determine the amount of integer triangles, with their biggest side being smaller or equal to MAX, that have an integer median. The Python version works but is too slow for bigger N, while the C++ version is a lot faster but doesn't give the right result.
When MAX is 10, C++ and Python both return 3.
When MAX is 100, Python returns 835 and C++ returns 836.
When MAX is 200, Python returns 4088 and C++ returns 4102.
When MAX is 500, Python returns 32251 and C++ returns 32296.
When MAX is 1000, Python returns 149869 and C++ returns 150002.
Here's the C++ version:
#include <cstdio>
#include <math.h>
const int MAX = 1000;
int main()
{
long long int x = 0;
for (int b = MAX; b > 4; b--)
{
printf("%lld\n", b);
for (int a = b; a > 4; a -= 2){
for (int c = floor(b/2); c < floor(MAX/2); c+=1)
{
if (a+b > 2*c){
int d = 2*(pow(a,2)+pow(b,2)-2*pow(c,2));
if (sqrt(d)/2==floor(sqrt(d)/2))
x+=1;
}
}
}
}
printf("Done: ");
printf("%lld\n", x);
}
Here's the original Python version:
import math
def sumofSquares(n):
f = 0
for b in range(n,4,-1):
print(b)
for a in range(b,4,-2):
for C in range(math.ceil(b/2),n//2+1):
if a+b>2*C:
D = 2*(a**2+b**2-2*C**2)
if (math.sqrt(D)/2).is_integer():
f += 1
return f
a = int(input())
print(sumofSquares(a))
print('Done')
I'm not too familiar with C++ so I have no idea what could be happening that's causing this (maybe an overflow error?).
Of course, any optimizations for the algorithm are more than welcome!
The issue is that the range for your c (C in python) variables do not match. To make them equivalent to your existing C++ range, you can change your python loop to:
for C in range(int(math.floor(b/2)), int(math.floor(n/2))):
...
To make them equivalent to your existing python range, you can change your C++ loop to:
for (int c = ceil(b/2.0); c < MAX/2 + 1; c++) {
...
}
Depending on which loop is originally correct, this will make the results match.
It seams some troubles could be here:
(sqrt(d)==floor(sqrt(d)))