So we have this massive Python project responsible for talking to a piece of hardware.
Currently we open a Python shell, import a bunch of the company's own classes plus standard Python ones, and run a series of commands to initialize the hardware. In the shell we then execute functions loaded from our Python API, passing in the references to the initialized hardware that we got from initialization.
We would like to be able to do the same thing via C++, and use it as a wrapper of sorts to send commands into a Python shell.
We can't just pass in scripts that initialize, get the hardware reference, run some functions, and then exit, because the initialization takes 5-10 seconds. We want to keep alive the Python instance that holds the variables used to communicate with the initialized hardware, so we can initialize once and then send function after function at the hardware at a much faster rate. I'd also like to be able to get the output back to C++.
Hopefully it makes sense what we are trying to do and why; if not, let me know.
You can extend Python with C++ easily. Or you can run two processes and use inter-process communication so that the C++ side can call the Python side's methods and functionality.
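As a minimal sketch of the two-process option, the Python side could initialize once and then answer one JSON command per line on its standard input, so a C++ parent only needs a pair of pipes. Everything named here (initialize_hardware, the ping and increment commands, the JSON line format) is a hypothetical stand-in, not part of the project described in the question.

```python
import json
import sys


def initialize_hardware():
    """Hypothetical stand-in for the slow (5-10 s) hardware initialization."""
    return {"name": "fake-device", "counter": 0}


def cmd_ping(hw):
    return "pong"


def cmd_increment(hw, amount=1):
    hw["counter"] += amount
    return hw["counter"]


# Hypothetical command table; the real API functions would be registered here.
COMMANDS = {"ping": cmd_ping, "increment": cmd_increment}


def handle_line(hw, line):
    """Run one JSON request such as {"cmd": "increment", "args": [2]}."""
    try:
        request = json.loads(line)
        func = COMMANDS[request["cmd"]]
        reply = {"ok": True, "result": func(hw, *request.get("args", []))}
    except Exception as exc:
        # Report the failure instead of killing the long-lived process.
        reply = {"ok": False, "error": repr(exc)}
    return json.dumps(reply)


def serve(infile=None, outfile=None):
    """Initialize once, then answer one request per input line until EOF."""
    infile = sys.stdin if infile is None else infile
    outfile = sys.stdout if outfile is None else outfile
    hw = initialize_hardware()
    for line in infile:
        if line.strip():
            outfile.write(handle_line(hw, line) + "\n")
            outfile.flush()  # the parent is waiting on the pipe
```

The C++ program would start this script once (calling serve() at the bottom of the file), keep the pipes open, write a request line for each command, and read one reply line back.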
Greetings Stackoverflow community!
I recently learned about the power of microservices and containers, and I decided to wrap some of my numerical simulation codes in C++ and make them available as an API. Here are some requirements/details of my applications:
My simulators are coded in C++ with a few dependencies that I link via dynamic or static libraries on Windows (e.g. Hypre, for the solution of linear systems). They also run in parallel with MPI/OpenMP (in the future I would like to implement CUDA support as well).
The input for a simulator is a simple configuration file with some keys (.json format) and a data file (ASCII, but could be binary as well) with millions of entries. These are fields with one value for each simulation cell, and my models can be as large as 500x500x500 (= 125,000,000 cells).
A typical call to the simulator in Windows is: mpiexec -n 4 mysimulator.exe "C:\path\to\config.json". Inside my configuration file I have another absolute path, pointing to the ASCII file with the cellwise values.
I would like to "containerize" this whole mess and create an API available through HTTP requests or any other protocol that would allow the code to be run from outside the container. While the simulation microservice is running on a remote machine, anyone should be able to send a configuration file and the big ASCII or binary file to the container, which would receive the request, perform the simulation and somehow send back the results (which can be files and/or numerical values).
After some research, I feel this could be achieved with the following approach.
Create a docker image with the C++ code. When the container is created using the image as a blueprint, we obtain a binary executable of the C++ simulator.
Implement a python interface that handles the incoming requests using flask or django. We listen to requests at a certain port and once we get a request, we call the binary executable using python's subprocess.
The simulator somehow needs to send a "simulation status" back since these simulations can take hours to finish.
I have a few questions:
Is python "subprocess" call to a binary executable with the C++ code the way to go? Or is it easier/more recommended to implement the treatment to the API calls inside the C++ code?
How do you typically send a big binary/ascii file through HTTP to the microservice running inside a docker container?
If I have a workstation with - let's say - 16 cores...and I want to allow each user to run at most 2 processors, I could have a max of 8 parallel instances. This way, would I need 8 containers running simultaneously in the computer?
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Thanks,
Rafael.
Is python "subprocess" call to a binary executable with the C++ code the way to go? Or is it easier/more recommended to implement the treatment to the API calls inside the C++ code?
If you don't have performance concerns, use whatever is faster to build and easier to scale given your skills. Use the language that you're comfortable with. If performance is essential, choose carefully up front or refactor later.
How do you typically send a big binary/ascii file through HTTP to the microservice running inside a docker container?
It depends on the scenario. You can send the data to an endpoint in a single request, or send it part by part. You may refer to this post for restful update.
If I have a workstation with - let's say - 16 cores...and I want to allow each user to run at most 2 processors, I could have a max of 8 parallel instances. This way, would I need 8 containers running simultaneously in the computer?
Keep your service simple. If one service instance uses only 1 or 2 cores, then run multiple instances. That is easier to scale than creating a complex multithreaded program.
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Events would be good enough. Use polling if the simulation status is important.
Note: This is more of an opinion-based post, but it covers general scenarios worth answering.
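To illustrate the subprocess-plus-polling idea from the answers above, here is a rough sketch of a job registry that a Flask or Django view could sit on top of. The JobRegistry class and its method names are hypothetical. In the real service the argument list would be the mpiexec command from the question, not the placeholder commands used in the usage lines.

```python
import subprocess
import sys
import uuid


class JobRegistry:
    """Tracks simulator processes so a client can poll their status."""

    def __init__(self):
        self._jobs = {}

    def submit(self, argv):
        # Popen returns immediately; the simulation keeps running in the
        # background while the HTTP request that started it completes.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = subprocess.Popen(argv)
        return job_id

    def status(self, job_id):
        # poll() returns None while the process is still running.
        code = self._jobs[job_id].poll()
        if code is None:
            return "running"
        return "finished" if code == 0 else f"failed ({code})"

    def wait(self, job_id, timeout=None):
        return self._jobs[job_id].wait(timeout=timeout)


registry = JobRegistry()
# Placeholder for: ["mpiexec", "-n", "4", "mysimulator.exe", config_path]
job = registry.submit([sys.executable, "-c", "import time; time.sleep(1)"])
print(registry.status(job))
registry.wait(job)
print(registry.status(job))
```

A client would receive the job id in the response to its upload request and then ask a status endpoint for it periodically.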
I have the following problem. Let's have this Python function:
def func():
    # run some code here which calls some native code
    pass
Inside func() I am calling some functions which in turn call some native C code.
If any crash happens, the whole Python process crashes altogether.
How is it possible to catch and recover from such errors?
One way that came to my mind is to run this function in a separate process, but not by simply starting another process, because the function uses a lot of memory and objects that would be very hard to split out. Is there something like C's fork() available in Python, to create an exact copy of the process with the same memory structures, etc.?
Or maybe other ideas?
Update:
It seems that there is no real way of catching C runtime errors in Python; they happen at a lower level and crash the whole Python virtual machine.
As solutions you currently have two options:
Use os.fork(), but it works only in Unix-like OS environments.
Use multiprocessing and a shared memory model to share big objects between processes. Usual serialization will just not work with objects that take up multiple gigabytes of memory (you will just run out of memory). However, there is a very good Python library called Ray (https://docs.ray.io/en/master/) that performs in-memory serialization of big objects using a shared memory model, and it's ideal for BigData/ML workloads - highly recommended.
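As long as you are running on an operating system that supports fork, that is how the multiprocessing module has traditionally created subprocesses. Newer Python versions no longer default to fork everywhere (macOS uses spawn since Python 3.8, and Linux uses forkserver since Python 3.14), so request the fork start method explicitly if you depend on it. You could use os.fork(), multiprocessing.Process, or multiprocessing.Pool to get what you want.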
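A minimal sketch of the fork-based isolation described above, assuming a Unix-like system: the risky call runs in a forked child, and the parent reports a crash instead of dying with it. The names run_isolated, native_crash and native_ok are hypothetical, and os.abort() only simulates a crash in native code.

```python
import multiprocessing as mp
import os


def _child_main(conn, fn, args):
    """Runs in the forked child: call fn and send its result back."""
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:  # ordinary Python errors still travel back
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(fn, *args):
    """Call fn(*args) in a forked copy of this process (Unix only)."""
    ctx = mp.get_context("fork")  # child starts with a copy of our memory
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child_main, args=(send_end, fn, args))
    proc.start()
    send_end.close()  # the parent keeps only the reading end
    try:
        outcome = recv_end.recv()
    except EOFError:
        outcome = None  # the child died before it could send anything
    proc.join()
    recv_end.close()
    if outcome is None:
        return ("crashed", proc.exitcode)
    return outcome


def native_crash():
    os.abort()  # stand-in for a crash inside native C code


def native_ok(x):
    return x * 2
```

Only the return value is pickled and sent through the pipe; the large input objects are inherited by the child through fork and are not serialized.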
I'm relatively inexperienced with C++, but I need to build a framework to shuffle some data around. Not necessarily relevant, but the general flow path of my data needs to go like this:
Data is generated in a python script
The python object is passed to a compiled C++ extension
The C++ extension makes some changes and passes the data (presumably a pointer?) to compiled C++/CUDA code (.exe)
C++/CUDA .exe does stuff
Data is handled back in the python script and sent to more python functions
Step 3 is where I'm having trouble. How would I go about calling the .exe containing the CUDA code in a way that it can access the data that is seen in the C++ Python extension? I assume I should be able to pass a pointer somehow, but I'm having trouble finding resources that explain how. I've seen references to creating shared memory, but I'm unclear on the details there as well.
There are many ways two executables can exchange data.
Some examples:
write/read data to/from a shared file (don't forget locking so they don't stumble on each other).
use TCP or UDP sockets between the processes to exchange data.
use shared memory.
if one application starts the other you can pass data via commandline arguments or in the environment.
use pipes between the processes.
use Unix domain sockets between the processes.
And there are more options but the above are probably the most common ones.
What you need to research is IPC (Inter-Process Communication).
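As one possible illustration of the shared-memory item in the list above, the sketch below maps a file into memory in two processes, so that the second process edits the data in place and the first sees the change without receiving a copy. It is written in Python for brevity; a second Python interpreter stands in for the compiled .exe, and double_in_child and CHILD_CODE are hypothetical names. Note that a raw pointer cannot be passed between processes, because each process has its own address space; only the name of the shared region (here a file path) is handed over.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Code for the "other executable": it maps the same file and doubles
# every byte in place (modulo 256).
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as buf:
        data = bytes(buf[:])
        buf[:] = bytes((b * 2) % 256 for b in data)
"""


def double_in_child(payload):
    """Share payload (non-empty bytes) with a child via a mapped file."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as buf:
            # Only the file path crosses the process boundary.
            subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
            return bytes(buf[:])  # read the child's in-place changes
    finally:
        os.unlink(path)


print(double_in_child(bytes([1, 2, 3])))
```

In C++ on Windows the corresponding building blocks would be a named file mapping, and on POSIX systems shm_open with mmap.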
I've been trying to create a C++ program that embeds multiple python threads. Due to the nature of the program the advantage of multitasking comes from asynchronous I/O; but due to some variables that need to be altered between context switching I need to control the scheduling. I thought that because of python's GIL lock this would be simple enough, but it's turning out not to be: python wants to use POSIX threads rather than software threads, I can't figure out from the documentation what happens if I store the result of PyEval_SaveThread() and don't call PyEval_RestoreThread() in the same function--so presumably I'm not supposed to be doing that, etc.
Is it possible to create a custom scheduler for embedded python threads, or was python basically designed so that it can't be done?
It turns out that using PyEval_SaveThread() and PyEval_RestoreThread() is unnecessary; I used coroutines (in this case from libPCL) to run the scripts and control the scheduling. However, this isn't really much of a solution, because Python will segfault if it encounters a syntax error while in a coroutine. Oddly enough, this happens even if only one Python script is running in one coroutine. But at the very least they don't seem to conflict with each other.
I have constantly running Python code on Linux. Every so often, outside data needs to be fed into this code so that it can alter a file.
How do I go about structuring Python code so it receives these arguments for further processing?
I found some material on outgoing arguments (Running external program using pipes and passing arguments in python), but I'm looking for incoming arguments.
I'm flexible about how the arguments get passed down.
You need some kind of Inter Process Communication.
For example, you can feed the program's standard input. The program can read it from sys.stdin, but this requires the program that started your process to pass its handle to any other process that wants to send data.
Another way is to create a socket of some kind. That's far more scalable, allows connecting to the program when it's running on another machine, and allows non-Python processes to easily communicate with your process.
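A rough sketch of the socket approach, using only the standard library: the long-running program starts a listener in a background thread, and any other process sends one line of whitespace-separated arguments. The names start_listener, send_args and the on_args callback are hypothetical, and the line-based text protocol is an assumption made for this example.

```python
import socket
import socketserver
import threading


class ArgHandler(socketserver.StreamRequestHandler):
    """Reads one argument list per line and passes it to the callback."""

    def handle(self):
        for raw in self.rfile:
            args = raw.decode("utf-8").split()
            if args:
                self.server.on_args(args)  # on_args is set in start_listener
                self.wfile.write(b"ok\n")


def start_listener(on_args, host="127.0.0.1", port=0):
    """Start the listener in a daemon thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), ArgHandler)
    server.daemon_threads = True
    server.on_args = on_args
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_args(address, *args):
    """Client side: send one line of arguments and wait for the reply."""
    with socket.create_connection(address) as conn:
        conn.sendall((" ".join(args) + "\n").encode("utf-8"))
        with conn.makefile("rb") as reply:
            return reply.readline().decode("utf-8").strip()


received = []
server = start_listener(received.append)
print(send_args(server.server_address, "set", "threshold", "5"))
print(received)
server.shutdown()
server.server_close()
```

In the real program the callback would be the function that alters the file, and the sender could be written in any language that can open a TCP connection.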