Handling Incoming Data from Multiple Sockets in Python - python

Background:
I have a current implementation that receives data from about 120 different socket connections in python. In my current implementation, I handle each of these separate socket connections with a dedicated thread for each. Each of these threads parse the data and eventually store it within a shared locked dictionary. These sockets DO NOT have uniform data rates, some sockets get more data than others.
Question:
Is this the best way to handle incoming data in python, or does python have a better way on handling multiple sockets per thread?

Using an asynchronous approach will make you much happier. For an example of a well-done implementation of this as a well-known application Tornado is perfect. You can easily use Tornado's ioloop for things other than web servers, too.
There are alternative libraries such as gevent; but I believe Tornado is a better place to look at first since it both provides the loop and a web server implemented on top of it as a great example of how to use the loop well.

If you're using threads, that's basically the way you'd go about it.
The alternative is to use one of the various asynchronous networking libraries out there, such as Twisted, Tornado, or GEvent.

As mentioned in Asynchronous UDP Socket Reading question from you, asyncoro can be used to process many asynchronous sockets efficiently. Another benefit with asyncoro in your problem is that you don't need to worry about locking shared dictionary, as with asyncoro at most one coroutine is executing at any time and there is no forced preemption.

Related

Threads vs Asynchronous Networking (Twisted) Python

I am writing an implementation of a NAT. My algorithm is as follows:
Packet comes in
Check against lookup table if external, add to lookup table if internal
Swap the source address and send the packet on its way
I have been reading about Twisted. I was curious if Twisted takes advantage of multicore CPUs? Assume the system has thousands of users and one packet comes right after the other. With twisted can the lookup table operations be taking place at the same time on each core. I hear with threads the GIL will not allow this anyway. Perhaps I could benifit from multiprocessing>
Nginix is asynchronous and happily serves thousands of users at the same time.
Using threads with twisted is discouraged. It has very good performance when used asynchronously, but the code you write for the request handlers must not block. So if your handler is a pretty big piece of code, break it up into smaller parts and utilize twisted's famous Deferreds to attach the other parts via callbacks. It certainly requires a somewhat different thinking than most programmers are used to, but it has benefits. If the code has blocking parts, like database operations, or accessing other resources via network to get some result, try finding asynchronous libraries for those tasks too, so you can use Deferreds in those cases also. If you can't use asynchronous libraries you may finally use the deferToThread function, which will run the function you want to call in a different thread and return a Deferred for it, and fire your callback when finished, but it's better to use that as a last resort, if nothing else can be done.
Here is the official tutorial for Deferreds:
http://twistedmatrix.com/documents/10.1.0/core/howto/deferredindepth.html
And another nice guide, which can help to get used to think in "async mode":
http://ezyang.com/twisted/defer2.html

Connect two daemons in python

What is the best way to connect two daemons in Python?
I have daemon A and B. I'd like to receive data generated by B in A's module (maybe bidirectional). Both daemons support plugins, so I'd like to shut communication in plugins. What's the best and cross-platform way to do that?
I know few mechanisms from low-level solutions - shared memory (C/C++), linux pipe, sockets (TCP/UDP), etc. and few high-level - queue (JMS, Rabbit), RPC.
Both daemons should run on the same host, but obviously better approach is to abstract from connection type.
What are typical solutions/libraries in python? I'm looking for an elegant and lightweight solution. I don't need external server, just two processes talking with each other.
What should I use in python to do that?
You can use sockets for process communication: http://docs.python.org/howto/sockets.html
Also remote procedure calls suits for that: Python XML RPC http://docs.python.org/library/xmlrpclib.html or Google Protobuf http://code.google.com/p/protobuf/
https://developers.google.com/protocol-buffers/docs/pythontutorial
I am not sure how heavy is your traffic, but I would recommend the asyncore package. Its fairly simple to use and it is based on sockets.
I did a command-pattern with this long time ago. I can dig out the code if you are interested.

Python - Waiting for input from lots of sockets

I'm working on a simple experiment in Python. I have a "master" process, in charge of all the others, and every single process has a connection via unix socket to the master process. I would like to be able for the master process to be able to monitor all of the sockets for a response - but there could theoretically be almost a hundred of them. How would threads impact the memory and performance of the application? What would be the best solution? Thanks a lot!
One hundred simultaneous threads might be pushing the reasonable limits of threading. If you find this is the cleanest way to organize your code, I'd say give it a try, but threading really doesn't scale very far.
What works better is to use a technique like select to wait for one of the sockets to be readable / writable / or has an error to report. This mechanism lets you go to sleep until something interesting happens, handle as many sockets have content to handle, and then go back to sleep again, all in a single thread of execution. Removing the multi-threading can often reduce chances for errors, and this style of programming should get you into the hundreds of connections no trouble. (If you want to go beyond about 100, I'd use the poll functionality instead of select -- constantly rebuilding the list of interesting file descriptors takes time that poll does not require.)
Something to consider is the Python Twisted Framework. They've gone to some length to provide a consistent way to hook callbacks onto events for this exact sort of programming. (If you're familiar with node.js, it's a bit like that, but Python.) I must admit a slight aversion to Twisted -- I never got very far in their documentation without being utterly baffled -- but a lot of people made it further in the docs than I did. You might find it a better fit than I have.
The easiest way to conduct comparative tests of threads versus processes for socket handling is to use the SocketServer in Python's standard library. You can easily switch approaches (while keeping everything else the same) by inheriting from either ThreadingMixIn or ForkingMixIn. Here is a simple example to get you started.
Another alternative is a select/poll approach using non-blocking sockets in a single process and a single thread.
If you're interested in software that is already fully developed and highly evolved, consider these high-performance Python based server packages:
The Twisted framework uses the async single process, single thread style.
The Tornado framework is similar (less evolved, less full featured, but easier to understand)
And Gunicorn which is a high-performance forking server.

Python server for hardware control (possibly with Twisted?)

I'm currently in the process of programming a server which can let clients interact with a piece of hardware. For the interested readers it's a device which monitors the wavelength of a set of lasers concurrently (and controls the lasers). The server should be able to broadcast the wavelengths (a list of floats) on a regular basis and let the clients change the settings of the device through dll calls.
My initial idea was to write a custom protocol to handle the communication, but after thinking about how to handle TCP fragmentation and data encoding I bumped into Twisted, and it looks like most of the work is already done if I use perspective broker to share the data and call server methods directly from the clients. This solution might be a bit overkill, but for me it appeared obvious, what do you think?
My main concern arrose when I thought about the clients. Basically I need two types of clients, one which just displays the wavelengths (this should be straight forward) and a second which can change the device settings and get feedback when it's changed. My idea was to create a single client capable of both, but thinking about combining it with our previous system got me thinking... The second client should be controlled from an already rather complex python framework which controls a lot of independant hardware with relatively strict timing requirements, and the settings of the wavelengthmeter should then be called within this sequential code. Now the thing is, how do I mix this with the Twisted client? As I understand Twisted is not threadsafe, so I can't simply spawn a new thread running the reactor and then inteact with it from my main thread, can I?
Any suggestions for writing this server/client framework through different means than Twisted are very welcome!
Thanks
You can start the reactor in a dedicated thread, and then issue calls to it with blockingCallFromThread from your existing "sequential" code.
Also, I'd recommend AMP for the protocol rather than PB, since AMP is more amenable to heterogeneous environments (see amp-protocol.net for independent protocol information), and it sounds like you have a substantial amount of other technology you might want to integrate with this system.
Have you tried zeromq?
It's a library that simplifies working with sockets. It can operate over TCP and implements several topologies, such as publisher/subscriber (for broadcasting data, such as your laser readings) and request/response (that you can use for you control scheme).
There are bindings for several languages and the site is full of examples. Also, it's amazingly fast.
Good stuff.

Mix Python Twisted with multiprocessing?

I need to write a proxy like program in Python, the work flow is very similar to a web proxy. The program sits in between the client and the server, incept requests sent by the client to the server, process the request, then send it to the original server. Of course the protocol used is a private protocol uses TCP.
To minimize the effort, I want to use Python Twisted to handle the request receiving (the part acts as a server) and resending (the part acts as a client).
To maximum the performance, I want to use python multiprocessing (threading has the GIL limit) to separate the program into three parts (processes). The first process runs Twisted to receive requests, put the request in a queue, and return success immediately to the original client. The second process take request from the queue, process the request further and put it to another queue. The 3rd process take request from the 2nd queue and send it to the original server.
I was a new comer to Python Twisted, I know it is event driven, I also heard it's better to not mix Twisted with threading or multiprocessing. So I don't know whether this way is appropriate or is there a more elegant way by just using Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
ampoule is the first thing I think when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function, it's very easy to use.
You can try something like Cooperative Multitasking technique as it's described there http://us.pycon.org/2010/conference/schedule/event/73/ . It's simillar to technique as Glyph menitioned and it's worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)

Categories