how to communicate two separate python processes? - python

I have two python programs and I want to communicate them.
Both of them are system services and none of them is forked by parent process.
Is there any way to do this without using sockets?
(eg by crating some Queue -> serialize it -> deserialize by other process and perform communication; or write on file process id to which perform communication, and then create magic structure which gets process id and send some messages to this process... )
The solution should work on Linux and Windows.

Your best bet is ZeroMQ, which is designed for, and extremely fast at IPC (also supports TCP/multicast messaging as well). The Python bindings are really nice, and easy to work with. There is a nice introduction to ZeroMQ with Python here: http://nichol.as/zeromq-an-introduction. If you were planning to expand this across multiple machines, AMQP (which is a message queue protocol) would be a good to look at, there are a lot of great libraries for working with AMQP for python. I really like kombu and celery. You could also think about twisted, which gives you a fairly insane number of options for communication, and a nice event loop to boot.

On Linux you can use a named pipe. http://en.wikipedia.org/wiki/Named_pipe Just beware, the writing program / thread will block until the reader opens the pipe.
I think windows supports them to some degree.

Related

Connect two daemons in python

What is the best way to connect two daemons in Python?
I have daemon A and B. I'd like to receive data generated by B in A's module (maybe bidirectional). Both daemons support plugins, so I'd like to shut communication in plugins. What's the best and cross-platform way to do that?
I know few mechanisms from low-level solutions - shared memory (C/C++), linux pipe, sockets (TCP/UDP), etc. and few high-level - queue (JMS, Rabbit), RPC.
Both daemons should run on the same host, but obviously better approach is to abstract from connection type.
What are typical solutions/libraries in python? I'm looking for an elegant and lightweight solution. I don't need external server, just two processes talking with each other.
What should I use in python to do that?
You can use sockets for process communication: http://docs.python.org/howto/sockets.html
Also remote procedure calls suits for that: Python XML RPC http://docs.python.org/library/xmlrpclib.html or Google Protobuf http://code.google.com/p/protobuf/
https://developers.google.com/protocol-buffers/docs/pythontutorial
I am not sure how heavy is your traffic, but I would recommend the asyncore package. Its fairly simple to use and it is based on sockets.
I did a command-pattern with this long time ago. I can dig out the code if you are interested.

How to implement a long-running, event-driven python program?

I have a series of maintenance tasks for a python WSGI application that are a bit too complex for a crontab (jobs need to be run at frequencies derived from the size of the job queue, manage a connection pool to a group of EC2 instances, etc).
How should I implement a long-running, event-driven python program? I've never needed this functionality before, so I'm not even sure what to google.
Most of the large, modern python sites are using Celery for this type of work. It is a distributed task queue that supports scheduling of tasks as well.
Though probably a bit heavyweight for a small site, it'll grow with you. I'm looking to implement it myself (sans Rabbit) shortly.
I recently found another choice for django users, django-tasks which is focused on fewer, longer, batch processing type jobs. There is also django-ztask using zeromq.
Addendum: Just came across gearman which has python bindings.

Jython(WLST)/Python Communication

I have a want to make a jython and python communcation link. I have a django app and python scripts I use for a front end and do system admin/automation tasks. I use jython for Weblogic 9/10. The thing I want to do is make it so that I can give the jython system a request to do. Such as task A with args a,b,c and then return back a message when it was done.
I want to do this because wlst or jython is slow to start and it becomes a pain to do when I need to do a deploy, or check the status of a server or servers(up to 100 right now). So which would be the easiest way to share information back to the main script or python class while keeping the jython/(wlst) system alive and can easily share / make requests?
The way I have been doing it is using the pickle object. By getting all the data, spitting it out to a file, then loading the file back into the python app/script.
Have you considdered Celery or some other standard Queue/Broker messaging system? django-celery is quite mature and well developed and specifically designed for this kind of task.
Django -> Celery --> Worker Process (always running)
^ |-> Worker Process
| `-> Worker Process -,
\______ Job Complete _____/
The basic idea is that you have always-running worker processes(on 1 or more servers) waiting for messages to come in. (these can be pickled objects or json or whatever you want). These processes idle waiting for Celery (and it's RabbitMQ backend) to dish out a message/job to them. Once the message/job is processed, notification comes back through the broker and calls a callback in django in which you update status.
Celery is a task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
For this kind of messaging I like to use a real language-agnostic message queueing system that can be used over and over again in future projects. Look at AMQP if you can handle having a message-queue broker in the middle managing all the queues. Or if you don't want a 3rd party broker then look at ZeroMQ.
In both cases you can send messages using a sub-pub queue that can handle several workers per queue if needed. The messages can either be simple text strings http://tnetstrings.org/ or they can be JSON objects or, you might even be able to send pickled Python objects along with code to execute if you are careful. Personally I like using JSON objects (a subset of JSON) and unpack them into Python dicts in order to use them.
I have used both AMQP and ZeroMQ in systems with around 20 communicating Python processes. It works well, and if you need to connect to something non-Python, you will find that there is already an AMQP module and a ZeroMQ library out there.
An interesting extension of your scenario is to have 3 kinds of worker processes written in Jython, CPython and IronPython. That way you can leverage 3rd party Java and .NET modules as well as binary CPython modules like lxml. Combine it with something like Redis so that the processes are completely decoupled and can run on multiple servers if necessary. The workers would put their results into Redis instead of gumming up the message queuing system with big messages interleaved with small ones. If necessary a worker can publish a message containing a Redis key so that another process can retrieve the value.
Pickling is fine, use cPickle for efficiency. However you should not write it to a file. Rather use some other IPC mechanisms, like sockets or pipes (i.e. see https://stackoverflow.com/search?q=python+named+pipes) that avoid the disk-overhead.

Twisted Threading + MapReduce on a single node/server?

I'm confused about Twisted threading.
I've heard and read more than a few articles, books, and sat through a few presentations on the subject of threading vs processes in Python. It just seems to me that unless one is doing lots of IO or wanting to utilize shared memory across jobs, then the right choice is to use multiprocessing.
However, from what I've seen so far, it seems like Twisted uses Threads (pThreads from the python threading module). And Twisted seems to perform really really well in processing lots of data.
I've got a fairly large number of processes that I'd like to distribute processing to using the MapReduce pattern in Python on a single node/server. They don't do any IO really, they just do a lot of processing.
Is the Twisted reactor the right tool for this job?
The short answer to your question: no, twisted threading is not the right solution for heavy processing.
If you have a lot of processing to do, twisted's threading will still be subject to the GIL (Global Interpreter Lock). Without going into a long in depth explanation, the GIL is what allows only one thread at a time to execute python code. What this means in effect is you will not be able to take advantage of multiple cores with a single multi-threaded twisted process. That said, some C modules (such as bits of SciPy) can release the GIL and run multi-threaded, though the python code associated is still effectively single-threaded.
What twisted's threading is mainly useful for is using it along with blocking I/O based modules. A prime example of this is database API's, because the db-api spec doesn't account for asynchronous use cases, and most database modules adhere to the spec. Thusly, to use PostgreSQL for example from a twisted app, one has to either block or use something like twisted.enterprise.adbapi which is a wrapper that uses twisted.internet.threads.deferToThread to allow a SQL query to execute while other stuff is going on. This can allow other python code to run because the socket module (among most others involving operating system I/O) will release the GIL while in a system call.
That said, you can use twisted to write a network application talking to many twisted (or non-twisted, if you'd like) workers. Each worker could then work on little bits of work, and you would not be restricted by the GIL, because each worker would be its own completely isolated process. The master process can then make use of many of twisted's asynchronous primitives. For example you could use a DeferredList to wait on a number of results coming from any number of workers, and then run a response handler when all of the Deferred's complete. (thus allowing you to do your map call) If you want to go down this route, I recommend looking at twisted.protocols.amp, which is their Asynchronous Message Protocol, and can be used very trivially to implement a network-based RPC or map-reduce.
The downside of running many disparate processes versus something like multiprocessing is that
you lose out on simple process management, and
the subprocesses can't share memory as if they would if they were forked on a unix system.
Though for modern systems, 2) is rarely a problem unless you are running hundreds of subprocesses. And problem 1) can be solved by using a process management system like supervisord
Edit For more on python and the GIL, you should watch Dave Beazley's talks on the subject ( website , video, slides )

how to process long-running requests in python workers?

I have a python (well, it's php now but we're rewriting) function that takes some parameters (A and B) and compute some results (finds best path from A to B in a graph, graph is read-only), in typical scenario one call takes 0.1s to 0.9s to complete. This function is accessed by users as a simple REST web-service (GET bestpath.php?from=A&to=B). Current implementation is quite stupid - it's a simple php script+apache+mod_php+APC, every requests needs to load all the data (over 12MB in php arrays), create all structures, compute a path and exit. I want to change it.
I want a setup with N independent workers (X per server with Y servers), each worker is a python app running in a loop (getting request -> processing -> sending reply -> getting req...), each worker can process one request at a time. I need something that will act as a frontend: get requests from users, manage queue of requests (with configurable timeout) and feed my workers with one request at a time.
how to approach this? can you propose some setup? nginx + fcgi or wsgi or something else? haproxy? as you can see i'am a newbie in python, reverse-proxy, etc. i just need a starting point about architecture (and data flow)
btw. workers are using read-only data so there is no need to maintain locking and communication between them
The typical way to handle this sort of arrangement using threads in Python is to use the standard library module Queue. An example of using the Queue module for managing workers can be found here: Queue Example
Looks like you need the "workers" to be separate processes (at least some of them, and therefore might as well make them all separate processes rather than bunches of threads divided into several processes). The multiprocessing module in Python 2.6 and later's standard library offers good facilities to spawn a pool of processes and communicate with them via FIFO "queues"; if for some reason you're stuck with Python 2.5 or even earlier there are versions of multiprocessing on the PyPi repository that you can download and use with those older versions of Python.
The "frontend" can and should be pretty easily made to run with WSGI (with either Apache or Nginx), and it can deal with all communications to/from worker processes via multiprocessing, without the need to use HTTP, proxying, etc, for that part of the system; only the frontend would be a web app per se, the workers just receive, process and respond to units of work as requested by the frontend. This seems the soundest, simplest architecture to me.
There are other distributed processing approaches available in third party packages for Python, but multiprocessing is quite decent and has the advantage of being part of the standard library, so, absent other peculiar restrictions or constraints, multiprocessing is what I'd suggest you go for.
There are many FastCGI modules with preforked mode and WSGI interface for python around, the most known is flup. My personal preference for such task is superfcgi with nginx. Both will launch several processes and will dispatch requests to them. 12Mb is not as much to load them separately in each process, but if you'd like to share data among workers you need threads, not processes. Note, that heavy math in python with single process and many threads won't use several CPU/cores efficiently due to GIL. Probably the best approach is to use several processes (as much as cores you have) each running several threads (default mode in superfcgi).
The most simple solution in this case is to use the webserver to do all the heavy lifting. Why should you handle threads and/or processes when the webserver will do all that for you?
The standard arrangement in deployments of Python is:
The webserver start a number of processes each running a complete python interpreter and loading all your data into memory.
HTTP request comes in and gets dispatched off to some process
Process does your calculation and returns the result directly to the webserver and user
When you need to change your code or the graph data, you restart the webserver and go back to step 1.
This is the architecture used Django and other popular web frameworks.
I think you can configure modwsgi/Apache so it will have several "hot" Python interpreters
in separate processes ready to go at all times and also reuse them for new accesses
(and spawn a new one if they are all busy).
In this case you could load all the preprocessed data as module globals and they would
only get loaded once per process and get reused for each new access. In fact I'm not sure this isn't the default configuration
for modwsgi/Apache.
The main problem here is that you might end up consuming
a lot of "core" memory (but that may not be a problem either).
I think you can also configure modwsgi for single process/multiple
thread -- but in that case you may only be using one CPU because
of the Python Global Interpreter Lock (the infamous GIL), I think.
Don't be afraid to ask at the modwsgi mailing list -- they are very
responsive and friendly.
You could use nginx load balancer to proxy to PythonPaste paster (which serves WSGI, for example Pylons), that launches each request as separate thread anyway.
Another option is a queue table in the database.
The worker processes run in a loop or off cron and poll the queue table for new jobs.

Categories