I need to set up inter-process communication between Python and Objective-C. Both processes will run on the same local machine. Which IPC mechanism is best, and is there any starting point or skeleton code I could develop from?
Which IPC mechanism is best is very difficult to assess. Pure IPC includes message passing, shared memory, and more. Much depends on the data you are trying to communicate (e.g. is the data length fixed or variable? Is it binary or plain text? Is there a lot of data or only a little?).
Here's a simple example using shared memory, between C and Python: http://jcoppens.com/univ/iua/str/prog/shm_python.tar.gz
You'll need the sysv_ipc module for Python.
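For illustration, the Python side of such a shared-memory exchange might look roughly like this. This is a sketch, not taken from that archive, and the key and segment size are arbitrary examples that the C side would have to match in its shmget call:

    import sysv_ipc

    KEY = 0x1234  # arbitrary example key; the C process must use the same one

    # Create (or attach to) a 1 KiB System V shared memory segment
    mem = sysv_ipc.SharedMemory(KEY, sysv_ipc.IPC_CREAT, size=1024)

    # Write a message for the other process, then read back what is there
    mem.write(b"hello from python\x00")
    data = mem.read(64)
    print(data.rstrip(b"\x00"))

    mem.detach()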
Message passing (msgget and friends) is slightly more flexible - in my opinion.
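If you go the message-passing route, the same sysv_ipc module also wraps System V message queues (msgget/msgsnd/msgrcv). A rough sketch, again with an arbitrary example key:

    import sysv_ipc

    KEY = 0x1235  # arbitrary example key, shared with the C/Objective-C side

    mq = sysv_ipc.MessageQueue(KEY, sysv_ipc.IPC_CREAT)

    # Send a message of type 1, then block until a reply arrives
    mq.send(b"ping", type=1)
    reply, msg_type = mq.receive()
    print(reply, msg_type)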
I'm working on a project that is written in C++ and Python. The communication between the 2 sides is made through TCP sockets. Both processes run on the same machine.
The problem is that it is too slow for the current needs. What is the fastest way to exchange information between C++ and Python?
I heard about ZeroMQ, but would it be noticeably faster than plain TCP sockets?
Edit: The OS is Linux, and the data to be transferred consists of multiple floats (let's say around 100 numbers) every 0.02 s, both ways. So 50 times per second, the Python code sends 100 floats to C++, and the C++ code responds with 100 floats.
In case performance is the only metric you care for, shared memory is going to be the fastest way to share data between two processes running on the same machine. You can use a semaphore in shared memory for synchronization.
TCP sockets will work as well, and are probably fast enough. But since you are using Linux, I would just use pipes: they are the simplest approach, and they will outperform TCP sockets. This should get you started: http://man7.org/linux/man-pages/man2/pipe.2.html
For more background information, I recommend Advanced Programming in the UNIX Environment.
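If the Python program is the one launching the C++ program, an ordinary pipe via subprocess is enough for the 100-floats-each-way exchange. A sketch under that assumption; the ./cpp_worker binary name is made up, and the C++ side is assumed to read and write the same packed layout on stdin/stdout:

    import struct
    import subprocess

    N = 100
    FMT = f"{N}d"               # 100 doubles, native byte order
    SIZE = struct.calcsize(FMT)

    # Hypothetical C++ worker that reads 100 doubles from stdin and
    # writes 100 doubles back to stdout, in a loop.
    proc = subprocess.Popen(["./cpp_worker"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)

    values = [float(i) for i in range(N)]
    proc.stdin.write(struct.pack(FMT, *values))
    proc.stdin.flush()

    reply = struct.unpack(FMT, proc.stdout.read(SIZE))
    print(reply[:5])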
If you're on the same machine, use named shared memory; it's a very fast option. In Python you have multiprocessing.shared_memory, and in C++ you can use POSIX shared memory, since you're on Linux.
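A minimal Python-side sketch, assuming the C++ process opens the same POSIX shared memory object with shm_open and mmap; the segment name floats_shm and the 100-double layout are arbitrary choices for the example:

    import struct
    from multiprocessing import shared_memory

    N = 100
    NAME = "floats_shm"  # shows up as /dev/shm/floats_shm on Linux

    # Create a segment big enough for 100 doubles
    shm = shared_memory.SharedMemory(name=NAME, create=True, size=N * 8)

    # Write the floats; the C++ side would shm_open("/floats_shm", ...) and mmap it
    shm.buf[:N * 8] = struct.pack(f"{N}d", *range(N))

    # ... later, once both sides are done with the segment ...
    shm.close()
    shm.unlink()

Note that you still need some synchronisation (a semaphore, for instance) so the reader knows when a fresh batch of floats is ready.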
Short answer: no, but ZeroMQ has other advantages. Let's get straight to it: if you are on Linux and want fast data transfer, you go shared memory. But it will not be as easy as with ZeroMQ.
That is because ZeroMQ is a message queue: it solves (and solves well) different problems. It can use IPC between C++ and Python, which can be noticeably faster than using TCP sockets (for the same usage), and it opens the door to network features in your future developments. It is reliable and quite easy to use, with the same API in Python and C++. It is often used with Protobuf to serialize and send data, even at high throughput.
The first problem with ZeroMQ's IPC transport is that it lacks Windows support, because Windows is not a POSIX-compliant system. But the biggest problem may lie elsewhere: ZeroMQ adds overhead because it wraps your data in its own message framing. You still enjoy the benefits that come with that, but it can hurt performance. The best way to check is, as always, to test it yourself, for example with IPC-BENCH, as I am not sure the benchmark linked earlier was using the IPC transport. The average gain of IPC over local TCP is not spectacular, though.
As I said above, I am quite sure shared memory will be the fastest, excluding one last possibility: writing your own C++ extension for Python. I would bet that is the fastest solution, but it will require a bit of engineering to multithread if needed, because C++ and Python would then run in the same process. And of course you would need to adapt the existing C++ code if it is already written.
And as usual, remember that optimization always happens in a context. If the data transfer is only a small fraction of the running time compared to the processing that follows, or if you can afford the extra 0.00001 s that shared memory would save you, it may be worth going straight to ZeroMQ, because it will be simpler, more scalable, and more reliable.
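For comparison, here is roughly what the Python end of a ZeroMQ request/reply pair over the ipc:// transport could look like; the socket path is an arbitrary example, and the C++ side would bind a matching REP socket (e.g. with cppzmq):

    import struct
    import zmq

    N = 100
    ENDPOINT = "ipc:///tmp/floats.ipc"  # arbitrary path; must match the C++ side

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(ENDPOINT)

    # Send 100 doubles and wait for 100 doubles back
    values = [0.0] * N
    sock.send(struct.pack(f"{N}d", *values))
    reply = struct.unpack(f"{N}d", sock.recv())
    print(reply[:5])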
I have two code bases, one in Python and one in C++. I want to share real-time data between them. I am trying to evaluate which option will work best for my specific use case:
many small data updates from the C++ program to the Python program
they both run on the same machine
reliability is important
low latency is nice to have
I can see a few options:
One process writes to a flat file, the other process reads it. This is not scalable, slow, and prone to I/O errors.
One process writes to a database, the other process reads it. This makes it more scalable, slightly less error prone, but still very slow.
Embed my Python program into the C++ one, or the other way round. I rejected that solution because both code bases are reasonably complex, and I preferred to keep them separate for maintainability reasons.
I use sockets in both programs and send messages directly. This seems a reasonable approach, but it does not leverage the fact that they are on the same machine (it will be optimized slightly by using localhost as the destination, but it still feels cumbersome).
Use shared memory. So far I think this is the most satisfying solution I have found, but has the drawback of being slightly more complex to implement.
Are there other solutions I should consider?
First of all, this question is highly opinion-based!
The cleanest way would be to run them in the same process and have them communicate directly. The only complexity is implementing a proper API and the C++ -> Python calls. Drawbacks are maintainability, as you noted, potentially lower robustness (both crash together, which is not a problem in most cases), and lower flexibility (are you sure you'll never need to run them on different machines?). Extensibility is best here, as it's very simple to add more communication or to change what exists. You might also reconsider the maintainability point: can your Python app be used without its C++ counterpart? If not, I wouldn't worry about maintainability so much.
Then shared memory is the next choice, with better maintainability but the same other drawbacks. Extensibility is a little worse, but still not bad. It can be complicated; I don't know Python's support for shared memory operations, but for C++ you can have a look at Boost.Interprocess. The main question I'd check first is synchronisation between the processes.
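On the Python side, one way to handle that synchronisation is a named POSIX semaphore, for example via the posix_ipc module. A sketch under that assumption; the semaphore name is an arbitrary example and must match whatever the C++/Boost.Interprocess side opens:

    import posix_ipc

    # Named semaphore shared with the C++ process
    sem = posix_ipc.Semaphore("/data_ready", posix_ipc.O_CREAT, initial_value=0)

    # Protocol sketch: the C++ writer fills the shared memory segment,
    # then posts the semaphore; the Python reader blocks until that happens.
    sem.acquire()
    # ... read the shared memory segment here ...
    sem.close()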
Then there is network communication. There are lots of choices here, from the simplest possible binary protocol implemented at socket level to the higher-level options mentioned in the comments. It depends on how complex your C++ <-> Python communication is and may become in the future. This approach can be more complicated to implement and can require third-party libraries, but once done it is extensible and flexible. The usual third-party libraries are based on code generation (Thrift, Protobuf), which doesn't simplify your build process.
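To illustrate the "simplest possible binary protocol at socket level" end of that spectrum, here is a sketch of a length-prefixed exchange over a Unix domain socket; the socket path and message contents are arbitrary examples, and the C++ server would implement the mirror image of the framing:

    import socket
    import struct

    PATH = "/tmp/app.sock"  # arbitrary example path, matched by the C++ server

    def send_msg(sock, payload: bytes) -> None:
        # 4-byte big-endian length prefix, then the raw payload
        sock.sendall(struct.pack("!I", len(payload)) + payload)

    def recv_msg(sock) -> bytes:
        (length,) = struct.unpack("!I", sock.recv(4))
        data = b""
        while len(data) < length:
            data += sock.recv(length - len(data))
        return data

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(PATH)
        send_msg(s, b"update:price=101.25")
        print(recv_msg(s))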
I wouldn't seriously consider file system or database for this case.
I am reading financial data from my broker in real time through a WebSocket API. The client is written in Python. I have another C++ program that reads that data, but the way I am communicating with the Python script is through a text file on disk.
My questions are...
1) Does constantly rewriting the text file, and opening, reading, and closing it every time, affect performance? If so, what's a better way to do it? Performance in my application is crucial.
2) Would using named pipes be a better option? Or is that pretty much the same as writing and reading to a text file?
Modern operating systems support many different IPC mechanisms: pipes, named pipes, sockets, memory-mapped files, and so on. The choice of one solution or another depends heavily on your application. But broadly speaking, all of them should be "better" than using a plain old file.
As IPC mechanisms are objects managed by the OS, they do not depend on the language used to write the various processes. Some have file semantics (pipes, named pipes); others require dedicated system primitives (mmap). But C++ and Python (and many other languages) support the required system calls. In fact, IPC is great for helping software written in different languages talk to each other.
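To make the named-pipe option concrete, the Python writer could look roughly like this; the FIFO path and the JSON payload are arbitrary examples, and the C++ reader would simply open the same path like a file and read line by line:

    import json
    import os

    FIFO = "/tmp/quotes.fifo"  # arbitrary example path, read by the C++ program

    # Create the FIFO once if it does not exist yet
    if not os.path.exists(FIFO):
        os.mkfifo(FIFO)

    # Opening for writing blocks until the reader opens the other end
    with open(FIFO, "w") as pipe:
        quote = {"symbol": "EURUSD", "bid": 1.0834, "ask": 1.0836}
        pipe.write(json.dumps(quote) + "\n")
        pipe.flush()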
Why does the multiprocessing package for Python pickle objects to pass them between processes, i.e. to return results from different processes to the main interpreter process? This may be an incredibly naive question, but why can't process A say to process B "object x is at point y in memory, it's yours now" without having to perform the operations necessary to represent the object as a string?
multiprocessing runs jobs in different processes. Processes have their own independent memory spaces, and in general cannot share data through memory.
To make processes communicate, you need some sort of channel. One possible channel would be a "shared memory segment", which pretty much is what it sounds like. But it's more common to use "serialization". I haven't studied this issue extensively but my guess is that the shared memory solution is too tightly coupled; serialization lets processes communicate without letting one process cause a fault in the other.
When data sets are really large, and speed is critical, shared memory segments may be the best way to go. The main example I can think of is video frame buffer image data (for example, passed from a user-mode driver to the kernel or vice versa).
http://en.wikipedia.org/wiki/Shared_memory
http://en.wikipedia.org/wiki/Serialization
Linux, and other *NIX operating systems, provide a built-in mechanism for sharing data via serialization: "domain sockets". These should be quite fast.
http://en.wikipedia.org/wiki/Unix_domain_socket
Since Python has pickle, which works well for serialization, multiprocessing uses that. pickle is a fast, binary format; in general it should be more efficient than a serialization format like XML or JSON. There are other binary serialization formats, such as Google Protocol Buffers.
One good thing about using serialization: it's about the same to share the work within one computer (to use additional cores) or to share the work between multiple computers (to use multiple computers in a cluster). The serialization work is identical, and network sockets work about like domain sockets.
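As a rough sketch of what that serialization amounts to in practice, here is a pickled object being handed to another Python process over a Unix domain socket; the socket path is an arbitrary example, and the receiving side would call pickle.loads() on the bytes it reads:

    import pickle
    import socket

    PATH = "/tmp/work.sock"  # arbitrary example path

    result = {"job_id": 7, "values": [1.5, 2.5, 3.5]}

    # Turn the object into bytes, much as multiprocessing does internally
    payload = pickle.dumps(result, protocol=pickle.HIGHEST_PROTOCOL)

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(PATH)
        s.sendall(payload)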
EDIT: @Mike McKerns said, in a comment below, that multiprocessing can sometimes use shared memory. A Google search turned up this great discussion of it: Python multiprocessing shared memory
I'm writing a small multithreaded client-side Python application that contains a small web server (it only serves pages to localhost) and a daemon. The web server loads and puts data into a persistent "datastore", and the daemon processes this data, modifies it, and adds some more. It should also take care of synchronization with the disk.
I'd like to avoid complicated external things like SQL or other databases as much as possible.
What are good and simple ways to design the datastore? Bonus points if your solution uses only standard python.
What you're seeking isn't too Python specific, because AFAIU you want to communicate between two different processes, which are only incidentally written in Python. If this indeed is your problem, you should look for a general solution, not a Python-specific one.
I think that a simple NoSQL key-value datastore such as Redis, for example, could be a very nice solution for your situation. Far from being "complicated", using a tool designed specifically for such a purpose will actually make your code simpler.
If you insist on a Python-only solution, then consider using the Python bindings for SQLite, which come pre-installed with Python. An SQLite DB can be used concurrently by two processes in a safe manner, as long as your data-access semantics are well defined (i.e. problems you have to solve anyway, the tool notwithstanding).
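A minimal sketch with the built-in sqlite3 module, assuming a shared file named shared.db (an arbitrary example); enabling WAL mode is one common way to let a reader and a writer coexist:

    import sqlite3

    # Writer process (e.g. the daemon)
    conn = sqlite3.connect("shared.db", timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    conn.execute("CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value TEXT)")
    with conn:
        conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?)", ("status", "ready"))

    # Reader process (e.g. the web server) opens the same file independently
    reader = sqlite3.connect("shared.db", timeout=5.0)
    row = reader.execute("SELECT value FROM items WHERE key = ?", ("status",)).fetchone()
    print(row)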