What is the best way to connect two daemons in Python?
I have daemon A and daemon B. I'd like to receive data generated by B in A's module (maybe bidirectionally). Both daemons support plugins, so I'd like to put the communication in plugins. What's the best cross-platform way to do that?
I know a few low-level mechanisms (shared memory (C/C++), Linux pipes, sockets (TCP/UDP), etc.) and a few high-level ones (queues (JMS, RabbitMQ), RPC).
Both daemons will run on the same host, but obviously a better approach is to abstract away from the connection type.
What are the typical solutions/libraries in Python? I'm looking for an elegant and lightweight solution. I don't need an external server, just two processes talking to each other.
What should I use in python to do that?
You can use sockets for process communication: http://docs.python.org/howto/sockets.html
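For two daemons on the same host, a minimal stdlib-socket sketch might look like this (the port number and payload are just placeholders):

    # daemon B (producer): listen on the loopback interface and push data
    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 9900))     # placeholder port
    srv.listen(1)
    conn, _ = srv.accept()
    conn.sendall(b"payload from B")
    conn.close()

    # daemon A (consumer): connect and read
    import socket

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(("127.0.0.1", 9900))
    print(cli.recv(4096))
    cli.close()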
Remote procedure calls also suit that: Python XML-RPC http://docs.python.org/library/xmlrpclib.html or Google Protobuf http://code.google.com/p/protobuf/
https://developers.google.com/protocol-buffers/docs/pythontutorial
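A minimal XML-RPC sketch, using the Python 3 module names (the links above are the Python 2 docs), could look like this:

    # daemon B: expose a function over XML-RPC
    from xmlrpc.server import SimpleXMLRPCServer

    def get_data():
        return {"reading": 42}    # placeholder payload

    server = SimpleXMLRPCServer(("127.0.0.1", 8000), allow_none=True)
    server.register_function(get_data)
    server.serve_forever()

    # daemon A: call it
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
    print(proxy.get_data())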
I am not sure how heavy your traffic is, but I would recommend the asyncore package. It's fairly simple to use and it is based on sockets.
I did a command pattern with this a long time ago. I can dig out the code if you are interested.
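For reference, a minimal asyncore echo-server sketch (note that asyncore was deprecated and removed in Python 3.12, so this matches the older stdlib API):

    import asyncore
    import socket

    class Handler(asyncore.dispatcher_with_send):
        def handle_read(self):
            data = self.recv(4096)
            if data:
                self.send(data)    # echo the data back

    class Server(asyncore.dispatcher):
        def __init__(self, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind(("127.0.0.1", port))
            self.listen(5)

        def handle_accepted(self, sock, addr):
            Handler(sock)

    Server(8001)       # placeholder port
    asyncore.loop()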
Related
I need to mutex several processes running Python on a Linux host.
The processes are not spawned in a way I control (to be clear, they are my code), so I cannot use multiprocessing.Lock, at least as I understand it. The resource being synchronized is a series of reads/writes to two separate internal services, which are old, stateful, not designed for concurrent/transactional access, and out of scope to modify.
A couple of approaches I'm familiar with but have rejected so far:
In native code, using shmget / pthread_mutex_lock (e.g., create a pthread mutex with a well-known string name, in shared memory provided by the OS). I'm hoping not to have to use/add a ctypes wrapper for this (or, ideally, not to have any low-level constructs visible at all in this high-level app).
Using one of the lock-file libraries such as fasteners would work, but requiring any particular filesystem access is awkward (the library/approach could use it robustly under the hood, but ideally my client code would be abstracted from that).
Is there a preferred way to accomplish this in Python (under Linux; bonus points for cross-platform)?
Options for synchronizing non-child processes:
Use a remote manager. I'm not super familiar with this approach, but the docs have at least a simple example.
Create a simple server with your own protocol (rather than a manager): something like a socket server on the loopback address for bouncing simple messages around.
Use the filesystem: https://pypi.org/project/filelock/
On POSIX-compliant systems, there's a rather straightforward wrapper for IPC constructs: posix-ipc. I also found a wrapper for Windows semaphores, but it's not quite as simple (though also not difficult, per se). In both cases your program would use a well-known string "name" to access/create the mutex, and in both cases care/error checking is needed to handle creation of the mutex properly (see things like the O_CREX flag). A minimal sketch of the posix-ipc route follows this list.
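Here is that sketch with the posix-ipc package; the semaphore name is illustrative, and every process runs the same code:

    import posix_ipc

    # O_CREAT attaches to the semaphore if it already exists and
    # creates it otherwise; initial_value=1 makes it act as a mutex.
    sem = posix_ipc.Semaphore("/my-app-mutex", posix_ipc.O_CREAT,
                              initial_value=1)
    sem.acquire()
    try:
        pass   # reads/writes against the two stateful services go here
    finally:
        sem.release()
    sem.close()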
Background:
I have a current implementation that receives data from about 120 different socket connections in Python. I handle each of these socket connections with a dedicated thread. Each thread parses the data and eventually stores it within a shared, locked dictionary. These sockets DO NOT have uniform data rates; some sockets get more data than others.
Question:
Is this the best way to handle incoming data in Python, or does Python have a better way of handling multiple sockets per thread?
Using an asynchronous approach will make you much happier. For an example of a well-done implementation of this in a well-known application, Tornado is perfect. You can easily use Tornado's ioloop for things other than web servers, too.
There are alternative libraries such as gevent; but I believe Tornado is a better place to look at first since it both provides the loop and a web server implemented on top of it as a great example of how to use the loop well.
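For illustration, a sketch of driving plain sockets from Tornado's IOLoop (Tornado 4+ API assumed; the port and the parsing step are placeholders):

    import socket
    import tornado.ioloop

    loop = tornado.ioloop.IOLoop.current()
    latest = {}   # no lock needed: every callback runs on the one IOLoop thread

    def on_readable(conn, fd, events):
        data = conn.recv(4096)
        if not data:                 # peer closed the connection
            loop.remove_handler(fd)
            conn.close()
            return
        latest[fd] = data            # parse/store per-connection data here

    def on_accept(fd, events):
        conn, addr = server.accept()
        conn.setblocking(False)
        loop.add_handler(conn.fileno(),
                         lambda f, e, c=conn: on_readable(c, f, e),
                         tornado.ioloop.IOLoop.READ)

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.setblocking(False)
    server.bind(("127.0.0.1", 9000))  # placeholder port
    server.listen(128)

    loop.add_handler(server.fileno(), on_accept, tornado.ioloop.IOLoop.READ)
    loop.start()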
If you're using threads, that's basically the way you'd go about it.
The alternative is to use one of the various asynchronous networking libraries out there, such as Twisted, Tornado, or gevent.
As mentioned in your earlier question about asynchronous UDP socket reading, asyncoro can be used to process many asynchronous sockets efficiently. Another benefit of asyncoro for your problem is that you don't need to worry about locking the shared dictionary: with asyncoro, at most one coroutine is executing at any time and there is no forced preemption.
I have to write a CPU-bound server in Python, to distribute workloads for many cores. I want to use Twisted as the server (requests coming in via TCP).
Are there any better options - using Ampoule, perhaps? I also saw an article using Twisted's pb for communication, in conjunction with Popen - or perhaps combining it with multiprocessing?
Ampoule is a good building block for a multiprocess CPU-bound server. It uses the simpler AMP protocol, rather than PB (the complexity of which is usually not needed just to move job data into another process and then retrieve the results). It handles process creation, lifetime management, restarting, etc.
You generally want to avoid using Popen or the multiprocessing standard library module if you're using Twisted. They can cooperate, but they both present blocking-oriented APIs which somewhat defeat the purpose of using Twisted in the first place. Twisted's native child process API, reactor.spawnProcess, is as capable and avoids blocking. Ampoule is based on this.
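For illustration, a minimal reactor.spawnProcess sketch; "worker.py" is a hypothetical script that reads a job from stdin and writes its result to stdout:

    import sys
    from twisted.internet import reactor, protocol

    class WorkerProtocol(protocol.ProcessProtocol):
        def connectionMade(self):
            self.transport.write(b"job data\n")   # hand the job to the child
            self.transport.closeStdin()

        def outReceived(self, data):
            print("result:", data.decode())       # collect the child's output

        def processEnded(self, reason):
            reactor.stop()

    reactor.spawnProcess(WorkerProtocol(), sys.executable,
                         [sys.executable, "worker.py"])
    reactor.run()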
Ampoule is not as widely used as multiprocessing, though. You may find it to have some quirks in your development or deployment environments. I don't think these will be obstacles you can't overcome, though. I developed a service which used Ampoule to distribute the work of parsing large quantities of HTML across multiple CPUs and it eventually worked fine. If you do come across any problems though I encourage you to report them upstream! Eventually I would like to be able to say that Ampoule is as robust as anything (or more so) instead of attaching a disclaimer about its use. :)
I have two Python programs and I want them to communicate.
Both of them are system services and neither is forked by a parent process.
Is there any way to do this without using sockets?
(e.g. by creating some Queue, serializing it, and deserializing it in the other process to communicate; or writing to a file the process ID to communicate with, and then creating some magic structure which takes a process ID and sends messages to that process...)
The solution should work on Linux and Windows.
Your best bet is ZeroMQ, which is designed for, and extremely fast at, IPC (it also supports TCP and multicast messaging). The Python bindings are really nice and easy to work with. There is a nice introduction to ZeroMQ with Python here: http://nichol.as/zeromq-an-introduction. If you are planning to expand this across multiple machines, AMQP (a message-queue protocol) would be good to look at; there are a lot of great libraries for working with AMQP in Python. I really like kombu and celery. You could also think about Twisted, which gives you a fairly insane number of options for communication, and a nice event loop to boot.
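A minimal pyzmq REQ/REP sketch between two unrelated processes; note that the ipc:// transport relies on Unix domain sockets, so on Windows you would swap in something like "tcp://127.0.0.1:5555":

    # process B (server)
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("ipc:///tmp/demo.sock")   # placeholder endpoint
    while True:
        msg = sock.recv()               # blocks until the client sends
        sock.send(b"ack: " + msg)

    # process A (client)
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("ipc:///tmp/demo.sock")
    sock.send(b"hello")
    print(sock.recv())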
On Linux you can use a named pipe: http://en.wikipedia.org/wiki/Named_pipe. Just beware: the writing program/thread will block until the reader opens the pipe.
I think Windows supports them to some degree.
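On Linux, a minimal os.mkfifo sketch (the path is illustrative; each side blocks in open() until the other end shows up):

    import os

    path = "/tmp/demo_fifo"             # placeholder path
    if not os.path.exists(path):
        os.mkfifo(path)

    # reader process:
    with open(path, "rb") as fifo:      # blocks until a writer opens the pipe
        print(fifo.read())

    # writer process:
    with open(path, "wb") as fifo:      # blocks until a reader opens the pipe
        fifo.write(b"hello")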
I wonder what the best way is to handle parallel SSH connections in Python.
I need to open several SSH connections, keep them in the background, and feed them commands in an interactive or timed-batch way.
Is it possible to do this with the paramiko library? It would be nice not to spawn a different SSH process for each connection.
Thanks.
It might be worth checking out what options are available in Twisted. For example, the Twisted.Conch page reports:
http://twistedmatrix.com/users/z3p/files/conch-talk.html
Unlike OpenSSH, the Conch server does not fork a process for each incoming connection. Instead, it uses the Twisted reactor to multiplex the connections.
Yes, you can do this with paramiko.
If you're connecting to one server, you can run multiple channels through a single connection. If you're connecting to multiple servers, you can start multiple connections in separate threads. No need to manage multiple processes, although you could substitute the multiprocessing module for the threading module and have the same effect.
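A sketch of the multiple-servers case, one paramiko connection per host driven from threads (hosts and credentials are placeholders):

    import threading
    import paramiko

    def run(host, cmd):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username="admin", password="secret")
        stdin, stdout, stderr = client.exec_command(cmd)
        print(host, stdout.read().decode())
        client.close()

    threads = [threading.Thread(target=run, args=(h, "uptime"))
               for h in ("host1", "host2", "host3")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()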
I haven't looked into Twisted Conch in a while, but it looks like it's getting updates again, which is nice. I couldn't give you a good feature comparison between the two, but I find paramiko easier to get going with. It takes a little more effort to get into Twisted, but it could be well worth it if you're doing other network programming.
You can simply use subprocess.Popen for that purpose, without any problems.
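For example, something like this shells out to the system ssh client, one subprocess per host (hosts are placeholders; this is the per-connection process the question hoped to avoid, but it is simple):

    import subprocess

    procs = [subprocess.Popen(["ssh", host, "uptime"],
                              stdout=subprocess.PIPE)
             for host in ("host1", "host2")]
    for p in procs:
        print(p.stdout.read().decode())
        p.wait()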
However, you might want to simply install cronjobs on the remote machines. :-)
Reading the paramiko API docs, it looks like it is possible to open one SSH connection and multiplex as many SSH tunnels on top of it as you wish. Common SSH clients (OpenSSH) often do things like this automatically behind the scenes if there is already a connection open.
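A sketch of that multiplexing in paramiko, opening several exec channels over one transport (host and credentials are placeholders):

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("host1", username="admin", password="secret")

    transport = client.get_transport()
    for cmd in ("uptime", "df -h", "who"):
        chan = transport.open_session()   # a fresh channel per command
        chan.exec_command(cmd)
        print(chan.recv(4096).decode())
        chan.close()
    client.close()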
I've tried clusterssh, and I don't like the multiwindow model. Too confusing in the common case when everything works.
I've tried pssh, and it has a few problems with quotation escaping and password prompting.
The best I've used is dsh:
Description: dancer's shell, or distributed shell.
Executes a specified command on a group of computers using remote shell methods such as rsh or ssh.
dsh can parallelise job submission using several algorithms, such as a fan-out method, opening as many connections as possible, or using a window of connections at a time.
It also supports an "interactive mode" for interactive maintenance of remote hosts.
This tool is handy for administration of PC clusters and multiple hosts.
It's very flexible in scheduling and topology: you can request something close to a calling tree if need be. But the default is a simple topology of one command node to many leaf nodes.
http://www.netfort.gr.jp/~dancer/software/dsh.html
This might not be relevant to your question, but there are tools like pssh, clusterssh, etc. that can spawn connections in parallel. You can couple Expect with pssh to control them too.