Calling parallel C++ code in Python using Pybind11

Calling parallel C++ code in Python using Pybind11 - python

I have a C++ code that runs in parallel with OpenMP, performing some long calculations. This part works great.
Now, I'm using Python to make a GUI around this code. So, I'd like to call my C++ code inside my python program. For that, I use Pybind11 (but I guess I could use something else if needed).
The problem is that when called from Python, my C++ code runs in serial with only one thread/CPU.
I tried (in two ways) to understand what is done in the documentation of pybind11 here but it does not seem to work at all.
My binding looks like that :
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "../cpp/include/myHeader.hpp"
namespace py = pybind11;
PYBIND11_MODULE(my_module, m) {
m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());
m.def("testFunction2", [](inputType input) -> outputType {
/* Release GIL before calling into (potentially long-running) C++ code */
py::gil_scoped_release release;
outputType output = testFunction(input);
py::gil_scoped_acquire acquire;
return output;
});
}
Problem: This still does not work and uses only one thread (I verify that with a print of omp_get_num_threads() in an omp parallel region).
Question: What am I doing wrong? What do I need to do to be able to use parallel C++ code inside Python?
Disclaimer: I must admit I don't really understand the GIL thing, particularly in my case where I do not use Python inside my C++ code, which is really "independent" in theory. I just want to be able to use it in another (Python) code.
Have a great day.
EDIT :
I have solved my problem thanks to the pptaszni's answer. Indeed, the GIL things are not needed at all, I misunderstood the documentation. pptaszni's code worked and in fact it was a problem with my CMake file.
Thank you.

It's not really a good answer (too long for a comment thought), because I did not reproduce your problem, but maybe you can isolate the issue in your code by trying this example that works for me:
C++ code:
#include "OpenMpExample.hpp"
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>
#include <omp.h>
constexpr int DATA_SIZE = 10000000;
std::vector<int> testFunction()
{
int nthreads = 0, tid = 0;
std::vector<std::vector<int> > data;
std::vector<int> results;
std::random_device rnd_device;
std::mt19937 mersenne_engine {rnd_device()};
std::uniform_int_distribution<int> dist {-10, 10};
auto gen = [&dist, &mersenne_engine](){ return dist(mersenne_engine); };
#pragma omp parallel private(tid)
{
tid = omp_get_thread_num();
if (tid == 0)
{
nthreads = omp_get_num_threads();
std::cout << "Num threads: " << nthreads << std::endl;
data.resize(nthreads);
results.resize(nthreads);
}
}
#pragma omp parallel private(tid) shared(data, gen)
{
tid = omp_get_thread_num();
data[tid].resize(DATA_SIZE);
std::generate(data[tid].begin(), data[tid].end(), gen);
}
#pragma omp parallel private(tid) shared(data, results)
{
tid = omp_get_thread_num();
results[tid] = std::accumulate(data[tid].begin(), data[tid].end(), 0);
}
for (auto r : results)
{
std::cout << r << ", ";
}
std::cout << std::endl;
return results;
}
I tried to keep the code short, but force the machine to actually do some computations at the same time. Each thread generates 10^7 random integers and then sums them up. Then the python binding does not even require gil_scoped_release:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "OpenMpExample.hpp"
namespace py = pybind11;
// both versions work for me
// PYBIND11_MODULE(mylib, m) {
// m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());
// }
PYBIND11_MODULE(mylib, m) {
m.def("testFunction", &testFunction);
}
Example output from python:
Python 3.6.8 (default, Jun 29 2020, 16:38:14)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mylib
>>> x = mylib.testFunction()
Num threads: 12
-10975, -22101, -11333, -28603, -471, -15505, -18141, 2887, -6813, -5328, -13975, -4321,
My environment: Ubuntu 18.04.3 LTS, gcc 8.4.0, openMP 201511, python 3.6.8;

Related

Freeze/Fail when using functional with OpenMP [Pybind11/OpenMP]

I have a problem with the functional feature of Pybind11 when I use it with a for-loop with OpenMP. I've done some research and my problem sounds pretty similar to the one in this Pull Request from 2 years ago, but although this PR is closed and the issue seems to be fixed I still have this issue. A code example I created will hopefully explain my problem better:
b.h
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <omp.h>
namespace py = pybind11;
class B {
public:
B(int n, const int& initial_value);
void map(const std::function<int(int)> &f);
private:
int n;
int* elements;
};
b.cpp
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include "b.h"
namespace py = pybind11;
B::B(int n, const int& v)
: n(n) {
elements = new int[n];
#pragma omp parallel for
for (int i = 0; i < n; i++) {
elements[i] = v;
}
}
void B::map(const std::function<int(int)> &f) {
#pragma omp parallel for
for (int i = 0; i < n; i++) {
elements[i] = f(elements[i]);
}
}
PYBIND11_MODULE(m, handle) {
handle.doc() = "Example Module";
py::class_<B>(handle, "B")
.def(py::init<int, int>())
.def("map", &B::map)
;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.4...3.18)
project(example)
find_package(OpenMP)
add_subdirectory(pybind11)
pybind11_add_module(m b.cpp)
if(OpenMP_CXX_FOUND)
target_link_libraries(m PUBLIC OpenMP::OpenMP_CXX)
else()
message( FATAL_ERROR "Your compiler does not support OpenMP" )
endif()
test.py
from build.m import *
def test(i):
return i * 20
b = B(2, 2)
b.map(test)
I basically have an array where I want to apply a Python function to every element using a for-loop. I know that it is an issue with functional and OpenMP specifically because in other parts of my project I am using OpenMP successfully and functional is also working if I am not using OpenMP.
Edit: It freezes at the map function and has to be terminated. I am using Ubuntu 21.10, Python 3.9, GCC 11.2.0, OpenMP 4.5, and the newest version of the pybind11 repo.

You're likely experiencing a deadlock between OpenMP's scheduler and Python's GIL (Global Interpreter Lock).
I suggest attaching gdb to your process and looking at where the threads are to verify that's really the problem.
IMHO mixing Python functions and OpenMP like that is asking for trouble. If you want multi-threading of Python functions you can use multiprocessing.pool.ThreadPool. But unless your functions release the GIL most of the time you won't benefit from multi-threading.

Embed / Include Python.h into C++ [Full Guide] (Python 3.9) (Windows) (Qt 5.15) [duplicate]

This question already has answers here:
how can i include python.h in QMake
(1 answer)
Embedding python 3.4 into C++ Qt Application?
(4 answers)
Closed 2 years ago.
When I was trying to embed a Python script into my Qt C++ program, I run into multiple problems when trying to include Python.h.
The following features, I would like to provide:
Include python.h
Execute Python Strings
Execute Python Scripts
Execute Python Scripts with Arguments
It should also work when Python is not installed on the deployed machine
Therefore I searched around the Internet to try to find a solution. And found a lot of Questions and Blogs, but non have them covered all my Problems and it still took me multiple hours and a lot of frustration.
That's why I have to write down a StackOverflow entry with my full solution so it might help and might accelerate all your work :)

(This answer and all its code examples work also in a non-Qt environment. Only 2. and 4. are Qt specific)
Download and install Python https://www.python.org/downloads/release
Alter the .pro file of your project and add the following lines (edit for your correct python path):
INCLUDEPATH = "C:\Users\Public\AppData\Local\Programs\Python\Python39\include"
LIBS += -L"C:\Users\Public\AppData\Local\Programs\Python\Python39\libs" -l"python39"
Example main.cpp code:
#include <QCoreApplication>
#pragma push_macro("slots")
#undef slots
#include <Python.h>
#pragma pop_macro("slots")
/*!
* \brief runPy can execut a Python string
* \param string (Python code)
*/
static void runPy(const char* string){
Py_Initialize();
PyRun_SimpleString(string);
Py_Finalize();
}
/*!
* \brief runPyScript executs a Python script
* \param file (the path of the script)
*/
static void runPyScript(const char* file){
FILE* fp;
Py_Initialize();
fp = _Py_fopen(file, "r");
PyRun_SimpleFile(fp, file);
Py_Finalize();
}
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
runPy("from time import time,ctime\n"
"print('Today is', ctime(time()))\n");
//uncomment the following line to run a script
//runPyScript("test/decode.py");
return a.exec();
}
Whenever you #include <Python.h> use the following code instead. (The Slots from Python will otherwise conflict with the Qt Slots
#pragma push_macro("slots")
#undef slots
#include <Python.h>
#pragma pop_macro("slots")
After compiling, add the python3.dll, python39.dll, as well as the DLLs and Lib Python folders to your compilation folder. You can find them in the root directory of your Python installation. This will allow you to run the embedded c++ code even when python is not installed.
With these steps, I was able to get python running in Qt with the 64 bit MinGW and MSVC compiler. Only the MSVC in debug mode got still a problem.
FURTHER:
If you want to pass arguments to the python script, you need the following function (It can be easy copy-pasted into your code):
/*!
* \brief runPyScriptArgs executs a Python script and passes arguments
* \param file (the path of the script)
* \param argc amount of arguments
* \param argv array of arguments with size of argc
*/
static void runPyScriptArgs(const char* file, int argc, char *argv[]){
FILE* fp;
wchar_t** wargv = new wchar_t*[argc];
for(int i = 0; i < argc; i++)
{
wargv[i] = Py_DecodeLocale(argv[i], nullptr);
if(wargv[i] == nullptr)
{
return;
}
}
Py_SetProgramName(wargv[0]);
Py_Initialize();
PySys_SetArgv(argc, wargv);
fp = _Py_fopen(file, "r");
PyRun_SimpleFile(fp, file);
Py_Finalize();
for(int i = 0; i < argc; i++)
{
PyMem_RawFree(wargv[i]);
wargv[i] = nullptr;
}
delete[] wargv;
wargv = nullptr;
}
To use this function, call it like this (For example in your main):
int py_argc = 2;
char* py_argv[py_argc];
py_argv[0] = "Progamm";
py_argv[1] = "Hello";
runPyScriptArgs("test/test.py", py_argc, py_argv);
Together with the test.py script in the test folder:
import sys
if len(sys.argv) != 2:
sys.exit("Not enough args")
ca_one = str(sys.argv[0])
ca_two = str(sys.argv[1])
print ("My command line args are " + ca_one + " and " + ca_two)
you get the following output:
My command line args are Progamm and Hello

What is the C++ equivalent of this python code?

li = list(map(int,input().split()))
I am pretty new to c++ . I essentialy want the easiest verison of this code which takes in the input I pass in via terminal and push back the output to a vector.
I've tried:
#include <iostream>
#include<vector>
#include<string>
using namespace std;
int main()
{
string input;
vector<int> numbers;
while(getline(cin,input,' ')){
numbers.push_back(stoi(input));
}
for(int i : numbers){
cout << i << endl;
}
return 0;
}
I am using g++ 9.2.0.
The same code works fine on an online ide. I am not sure if it's an issue with the g++ compiler or not.
Weird Stuff!

Your example runs fine for me: https://ideone.com/bFLjB1
You could clean this up a bit by using the default type deduction and whitespace-splitting behavior of cin's operator>>:
std::vector<int> numbers;
int temp = 0;
while (std::cin >> temp) {
numbers.push_back(temp);
}

You can construct a container directly from whitespace delimited input.
#include <iostream>
#include <vector>
#include <iterator>
int main() {
std::vector<int> numbers(std::istream_iterator<int>(std::cin), {});
for(int i : numbers){
std::cout << i << std::endl;
}
return 0;
}

How to debug ctypes without error message

I have a simple python script that uses a c/c++ library with ctypes. My c++ library also contains a main method, so I can compile it without the -shared flag and it can be excecuted and it runs without issues.
However, when I run the same code from a python script using ctypes, a part of the c++ program is excecuted (I can tell that from the cout calls). Then the entire application, including the python script, termiantes (I can tell that from the missing cout and print calls). There is no error message, no segfault, no python stacktrace.
My question is: How can I debug this? What are possible reasons for this to happen?
Here is part of the code, however, since there is no error message, I don't know which code is relevant.
import ctypes
interface = ctypes.CDLL("apprunner.so")
interface.start()
print "complete"
.
#include "../../app/ShaderApp.cpp"
#include <iostream>
#include "TestApp.cpp"
TestApp* app = 0;
extern "C" void start() {
app = new TestApp();
cout << "Running from library" << endl;
app->run();
}
int main( int argc, const char* argv[]) {
cout << "Running from excecutable" << endl;
start();
}

Typically you begin from a small mock-up library that just lets you test the function calls from python. When this is ready (all the debug prints are ok) you proceed further. In your example, comment out #include "testapp.cpp" and get the prints to cout working.

fseek with SEEK_END returns a "Invalid argument" error to manage large data(7GB) with python x86 C extension lib on windows7 x64

I've been trying to manage large binary data(7GB) within python x86 original extension library.
But fseek with SEEK_END doesn't work well.
I put _FILE_OFFSET_BITS 64 macro. I also tried fseeko64, but it raises an error.
With Less than 2GB files or using SEEK_CUR, SEEK_SET it's works fine.
I've been stuck for a couple of days. Could anybody give me ideas?
#define _GNU_SOURCE
#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64
#include <Python.h>
#include "structmember.h"
#include <stdio.h>
static PyObject *
MyClass_load(MyClass* self, PyObject *args)
{
const char* file_path;
if (!PyArg_ParseTuple(args, "s", &file_path))
return NULL;
self->fp = fopen(file_path ,"rb");
if (self->fp == NULL) {
PyErr_SetString(PyExc_IOError, "File does not exist.");
return NULL;
}
off_t offset = 0;
if(fseek(self->fp, offset, SEEK_END) != 0){
printf("%s\n", strerror(errno)); // show "Invalid argument"
PyErr_SetString(PyExc_IOError, "Seek failed.");
return NULL;
}
Py_INCREF(Py_None);
return Py_None;
}
Environment:
Windows 7 x64
Python 2.7 x86
MinGW GCC

Using '_fseeki64' and '_ftelli64' like VC works perfectly.
I know I'm still using gcc to compile c file to python c library file, but I don't know why I can use VC code in gcc.
Anyway, problem was solved!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Calling parallel C++ code in Python using Pybind11 - python

Related

Freeze/Fail when using functional with OpenMP [Pybind11/OpenMP]

Embed / Include Python.h into C++ [Full Guide] (Python 3.9) (Windows) (Qt 5.15) [duplicate]

What is the C++ equivalent of this python code?

How to debug ctypes without error message

fseek with SEEK_END returns a "Invalid argument" error to manage large data(7GB) with python x86 C extension lib on windows7 x64

Categories

Resources