I'm relatively inexperienced with C++, but I need to build a framework to shuffle some data around. Not necessarily relevant, but the general flow path of my data needs to go like this:
Data is generated in a python script
The python object is passed to a compiled C++ extension
The C++ extension makes some changes and passes the data (presumably a pointer?) to compiled C++/CUDA code (.exe)
C++/CUDA .exe does stuff
Data is handed back to the python script and sent on to more python functions
Step 3. is where I'm having trouble. How would I go about calling the .exe containing the CUDA code in a way that it can access the data that is seen in the C++ python extension? I assume I should be able to pass a pointer somehow, but I'm having trouble finding resources that explain how. I've seen references to creating shared memory, but I'm unclear on the details there, as well.
There are many ways two executables can exchange data.
Some examples:
write/read data to/from a shared file (don't forget locking so they don't stumble on each other).
use TCP or UDP sockets between the processes to exchange data.
use shared memory.
if one application starts the other you can pass data via command-line arguments or in the environment.
use pipes between the processes.
use Unix domain sockets between the processes.
And there are more options but the above are probably the most common ones.
What you need to research is IPC (Inter-Process Communication).
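For example, a minimal sketch of the shared-memory option from the Python side (assuming Python 3.8+ for multiprocessing.shared_memory; cuda_worker.exe is a placeholder for your CUDA executable, which would map the same block by name, e.g. via shm_open on Linux):

import subprocess
from multiprocessing import shared_memory

# Create a named shared-memory block and copy the payload into it.
payload = b"\x00" * 1024                     # stand-in for the real data buffer
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload

# Launch the CUDA executable and pass the block name and size on the
# command line so it can map the same memory.
subprocess.run(["./cuda_worker.exe", shm.name, str(len(payload))], check=True)

# Read back whatever the worker wrote into the block, then clean up.
result = bytes(shm.buf[:len(payload)])
shm.close()
shm.unlink()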
Related
I'm looking to wrap a small C++ library for use in Python. I've briefly read up on the Python C-API, ctypes, Cython, SWIG, Boost.Python, and CLIF. Which framework (or other) should I use given my specific use-case described below?
Context: I have a multi-process program that does communication via shared memory. I've written Writer and Reader classes to interact with the shared memory and handle synchronization. I have multiple C++ programs working together via the shared memory and want to add a Python program to the party. Its behavior in interacting with the shared memory should be the same as the C++ programs.
Considerations:
(main) The data packets stored in the shared memory are large, so I want to avoid a copy when returning to Python. Specifically, the Reader::read() method returns a const T& reference so the client can directly read from it. I'd like to preserve this behavior in the Python class.
Would like to directly expose the Writer() and Reader() classes as Python classes without re-implementing logic.
I'm only exposing these two classes, not an entire library. I don't mind some upfront work if it means more maintainability.
All things being equal, I'd like to minimize external dependencies. That said, I don't mind dependencies if it leads to a better outcome.
TL;DR: How can I spawn a different python interpreter (from within python) and create a communication channel between the parent and child when stdin/stdout are unavailable?
I would like my python script to execute a modified python interpreter and, through some kind of IPC such as multiprocessing.Pipe, communicate with the script that interpreter runs.
Let's say I've got something similar to the following:
subprocess.Popen(args=["/my_modified_python_interpreter.exe",
                       "--my_additional_flag",
                       "my_python_script.py"])
This works fine and well, and executes my python script and all.
I would now like to set up some kind of interprocess communication with that modified python interpreter.
Ideally, I would like to share something similar to one of the returned values from multiprocessing.Pipe(), however I will need to share that object with the modified python process (and I suspect multiprocessing.Pipe won't handle that well even if I do that).
Although sending text and binary data will be sufficient (I don't need to share python objects or anything), I do need this to be functional on all major OSes (Windows, Linux, Mac).
Some more use-case/business explanation
More specifically, the modified interpreter is the IDAPython interpreter that is shipped with IDA to allow scripting within the IDA tool.
Unfortunately, since stdio is already heavily used for the existing user interface functionalities (provided by IDA), I cannot use stdin/stdout for the communication.
I'm searching for possibilities that are better than the ones I thought of:
Use two files on disk (one for the rx channel, one for the tx channel) and pass both paths as arguments.
Use a local socket and pass a path as an argument.
Use a memory-mapped file, with the tag name on Windows and some other sync method on other OSes.
After some tinkering with the multiprocessing.Pipe function and the multiprocessing.Connection objects it returns, I realized that serialization of Connection objects is far simpler than I originally thought.
A Connection object has three describing properties:
fileno - A handle. An arbitrary file descriptor on Unix and a socket on Windows.
readable - A boolean controlling whether the Connection object can be read.
writable - A boolean controlling whether the Connection object can be written.
All three properties are accessible as object attributes and are controllable through the Connection class constructor.
It appears that if:
The process calling Pipe spawns a child process and shares the connection.fileno() number.
The child process creates a Connection object using that file descriptor as the handle.
Both interpreters implement the Connection object roughly the same (and this is the risky part, I guess),
then it is possible to Connection.send and Connection.recv between those two processes, even though they do not share the same interpreter build and the multiprocessing module was not actually used to instantiate the child process.
EDIT:
Please note the Connection class is available as multiprocessing.connection.Connection in python3 and as _multiprocessing.Connection in python2 (which might suggest its usage is discouraged. YMMV)
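A minimal sketch of the scheme above, assuming Python 3 on a Unix-like OS (Windows handle inheritance is a different story, see the follow-up answer below); exactly how the extra argument reaches the script inside the modified interpreter may differ:

# In the parent script:
import subprocess
from multiprocessing import Pipe

parent_conn, child_conn = Pipe()
fd = child_conn.fileno()

proc = subprocess.Popen(
    args=["/my_modified_python_interpreter.exe",
          "--my_additional_flag",
          "my_python_script.py",
          str(fd)],            # tell the child which descriptor to wrap
    pass_fds=(fd,))            # keep the descriptor open across exec (Unix only)

parent_conn.send("hello from the parent")
print(parent_conn.recv())
proc.wait()

# In my_python_script.py, running under the modified interpreter:
import sys
from multiprocessing.connection import Connection   # python3 location

conn = Connection(int(sys.argv[-1]))   # wrap the inherited descriptor
print(conn.recv())
conn.send("hello from the child")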
Going with the other answer of mine turned out to be a mistake. Because of how handles are inherited in python2 on Windows I couldn't get the same solution to work on Windows machines. I ended up using the far superior Listener and Client interfaces also found in the multiprocessing module.
This question of mine discusses that mistake.
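For reference, a minimal sketch of the Listener/Client approach (the address and authkey are illustrative; both ends speak the same pickle-based protocol, so no descriptor inheritance is needed):

# Parent / server side:
from multiprocessing.connection import Listener

with Listener(("localhost", 6000), authkey=b"secret") as listener:
    with listener.accept() as conn:
        print(conn.recv())
        conn.send("ack")

# Child / client side (e.g. inside the script run by the modified interpreter):
from multiprocessing.connection import Client

with Client(("localhost", 6000), authkey=b"secret") as conn:
    conn.send("hello from the child")
    print(conn.recv())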
I have several scripts. Each of them does some computation and it is completely independent from the others. Once these computations are done, they will be saved to disk and a record updated.
The record is maintained by an instance of a class, which saves itself to disk. I would like to have a single record instance used in multiple scripts (for example, record_manager = RecordManager(file_on_disk) and then record_manager.update(...)); but I can't do this right now, because when updating the record there may be concurrent write accesses to the same file on disk, leading to data loss. So I have a separate record manager for every script, and then I merge the records manually later.
What is the easiest way to have a single instance used in all the scripts that solves the concurrent write access problem?
I am using macOS (High Sierra) and Linux (Ubuntu 16.04).
Thanks!
To build a custom solution to this you will probably need to write a short new queuing module. This queuing module alone will have write access to the file(s) and will be passed write actions from the existing modules in your code.
The queue logic itself should follow a pretty straightforward queue architecture.
There may also be existing Python libraries that handle this problem and would save you from writing your own queue class.
Finally, it is possible that this whole thing could be handled in some way by your OS, independent of Python.
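As an illustration of the OS-level route on macOS/Linux, here is a minimal sketch that serializes writers with an advisory file lock; the RecordManager interface and the one-JSON-line-per-update format are assumptions:

import fcntl
import json

class RecordManager:
    def __init__(self, path):
        self.path = path

    def update(self, entry):
        # Take an exclusive lock before touching the record file; any other
        # script calling update() at the same time blocks here until the
        # lock is released.
        with open(self.path, "a") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            try:
                f.write(json.dumps(entry) + "\n")
                f.flush()
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)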
So we have this massive Python project responsible for talking to a piece of hardware.
Currently we open up a python shell, import a bunch of the company's classes and python modules, and run a bunch of commands to initialize the hardware. In the shell we then execute a bunch of functions loaded from our Python API, passing in references to the initialized hardware we got from initialization.
We would like to be able to do the same thing via C++, and use it as a wrapper of sorts to send commands into a Python shell.
We can't just pass in scripts that initialize, get the hw reference, run some functions, and exit, because the initialize part takes 5-10 seconds. We want to keep alive the python instance that holds the vars for communicating with the initialized hardware, so we can initialize once and then send function after function at the hardware at a much faster rate. I'd also like to be able to get the output back to C++.
Hopefully that makes sense what we are trying to do and why, if not let me know.
You can extend Python with C++ easily. Or you can run two processes and use inter-process communication to reach the existing methods and functionality.
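As a sketch of the second option (two processes plus IPC): the Python side could be a long-running server that initializes the hardware once and then executes one command per line received over a TCP socket from the C++ wrapper. init_hardware and run_command are placeholders for your company's API:

import socketserver

def init_hardware():
    # placeholder for the real 5-10 second initialization sequence
    return object()

def run_command(hw, cmd):
    # placeholder: dispatch 'cmd' to the real Python API using 'hw'
    return "ran %r" % cmd

HW = None    # initialized once, kept alive between commands

class CommandHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                       # one command per line
            result = run_command(HW, line.decode().strip())
            self.wfile.write((str(result) + "\n").encode())

if __name__ == "__main__":
    HW = init_hardware()
    with socketserver.TCPServer(("localhost", 5555), CommandHandler) as server:
        server.serve_forever()

The C++ wrapper then only has to open a socket, write a command line, and read the reply back, while the Python process (and its initialized hardware references) stays alive between commands.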
Details:
I have the source code for both processes.
They communicate over sockets using TCP.
Message size varies from 10 bytes to 100 KB to 1 MB.
Both processes run on the same machine, so latency is ~0.
The Python process is the parent and the C process is the child.
Both processes communicate with each other, i.e. a duplex connection.
The C source code is huge, so it won't be easy to wrap it for Python (and I'm not too keen to do that anyway, since the C developers might need to learn Python).
Python process is a web app written in Django.
A common place to have the message declarations, so that when a new field is added to a message it is simple to propagate the change to both processes.
Questions:
A common file which contains the format of the messages. What should the type of this file be?
What should the type of the data structure be?
Is it a good idea to use a struct in a header file and have Python parse it?
Any better way?
You should go for XML-RPC. The Python APIs are described here:
http://tldp.org/HOWTO/XML-RPC-HOWTO/xmlrpc-howto-python.html
and the C APIs are described here:
http://xmlrpc-c.sourceforge.net/example-code.php
Debugging also becomes easier if XML-RPC or JSON-RPC is used.
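For example, a minimal sketch of the Python side using only the standard library (the method name, port, and payload shape are illustrative; the C child would make the same call through the xmlrpc-c client API):

from xmlrpc.server import SimpleXMLRPCServer

def process_message(payload):
    # 'payload' arrives already decoded from XML into plain Python types
    return {"status": "ok", "length": len(payload)}

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(process_message, "process_message")
server.serve_forever()

# A quick Python client call for testing (the C process would use xmlrpc-c):
#   import xmlrpc.client
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
#   print(proxy.process_message("some 10-byte message"))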