I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client to the server, processes each request, then sends it on to the original server. The protocol is a private one that runs over TCP.
To minimize the effort, I want to use Python Twisted to handle receiving requests (the part that acts as a server) and resending them (the part that acts as a client).
To maximize performance, I want to use Python multiprocessing (threading is limited by the GIL) to split the program into three processes. The first process runs Twisted to receive requests, puts each request in a queue, and returns success immediately to the original client. The second process takes requests from the queue, processes them further, and puts them on another queue. The third process takes requests from the second queue and sends them to the original server.
I am a newcomer to Python Twisted. I know it is event-driven, and I have also heard that it is better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using Twisted alone.
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
ampoule is the first thing I thought of when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function; it's very easy to use.
You can try something like the cooperative multitasking technique described here: http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned and is worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)
I've created a socket object for Telnet communication, and I'm using it to communicate with an API, sending and receiving data. I need to configure it in such a way that I can send and receive data at the same time. By that, I mean data should be sent as soon as the application tries to send it, and data should be processed immediately on receipt. Currently, I have a configuration which allows receipt to be instant, and sending to be second priority with a very short delay.
Currently the best way I have found to do this is to have a request queue, into which I push data to send, and a response queue, into which I put messages from the server. A thread polls the buffer every 0.1 seconds to check for new data; if there isn't any, it checks the request queue and processes anything there. That runs in a continuous loop. Other threads insert data into the request queue and read data from the response queue. Everything is just about linear enough that this works fine.
This is not "asynchronous", in a sense that I've had to make it as asynchronous as possible without actually achieving it. Is there a proper way to do this? Or is anything under the hood going to be doing exactly the same as I am?
Other things I have investigated as a solution to this problem:
A callback system, where I might call socket.on_receipt(handle_message, args) to call the method handle_message with args as a parameter, passing the received data into the method. The only way I could find to achieve this is by implementing what I already have, then registering a callback for it (in fact, this is very close to what I do already have).
Please note: I am approaching this as a learning exercise to understand better how asynchronous systems work, not to understand how to use a particular library, so please do not suggest an existing library unless it contains very clear code which is simple to understand and answers the question fully and concisely.
This seems like a pretty straightforward use case for asyncio. I wouldn't consider using asyncio as "using a particular library" since socket programming paired with asyncio's event loop is pretty low-level and the concept is very transparent if you have experience with other languages and just want to see how async programming works in Python.
You can use this async chat as an example: https://gist.github.com/gregvish/7665915
Essentially, you create a non-blocking socket; see the standard library reference on socket.setblocking(0):
https://docs.python.org/3/library/socket.html#socket.socket.setblocking
I'd also suggest this amazing session by David Beazley as a must-see for async Python programming. He explains the concurrency concepts in Python using sockets, exactly what you need: https://www.youtube.com/watch?v=MCs5OvhV9S4
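To make the idea concrete, here is a minimal sketch using asyncio streams. It is not taken from the linked gist; the local echo server stands in for the remote API, and the names echo_handler, send_all, receive_n and demo are made up for illustration. The point is that sending and receiving are two coroutines on one thread, and neither has to poll the other.

```python
import asyncio


async def echo_handler(reader, writer):
    # Stand-in for the remote API: echo every line back to the sender.
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(line)
        await writer.drain()
    writer.close()


async def send_all(writer, messages):
    # Each message is written as soon as this coroutine is scheduled.
    for msg in messages:
        writer.write(msg.encode() + b"\n")
        await writer.drain()


async def receive_n(reader, count, received):
    # Each line is handled as soon as it arrives; no polling interval.
    for _ in range(count):
        line = await reader.readline()
        received.append(line.decode().rstrip("\n"))


async def demo(messages):
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    received = []
    # Sender and receiver run concurrently in the same thread.
    await asyncio.gather(
        send_all(writer, messages),
        receive_n(reader, len(messages), received),
    )
    writer.close()
    await writer.wait_closed()
    server.close()
    return received


if __name__ == "__main__":
    print(asyncio.run(demo(["status", "version", "quit"])))
```

Under the hood the event loop still waits on the socket, but it does so by asking the operating system which sockets are ready rather than by sleeping and re-checking.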
So I have this problem I am trying to solve in a particular way, but I am not sure how hard it is to achieve.
I would like to use the asyncio/coroutines features of Python 3.4 to trigger many concurrent http requests using a blocking http library, like requests, or any Python api that does http requests like boto for aws.
I know about run_in_executor() method to run tasks in threads/processes, but I would like to avoid that.
I would like to do it in a single-thread, using those select features in Linux/Unix kernel.
Actually I was following David Beazley's presentation on this, and I was trying to use this code: https://github.com/dabeaz/concurrencylive/blob/master/aserver.py
but without the future/pool stuff, and use my blocking-api call instead of computing the Fibonacci number.
But it seems that the http requests are still running in sequence.
Any ideas if this is possible? And how?
Thanks
Not possible. All the calls that the requests library makes to the underlying socket are blocking (e.g. socket.recv) because the socket is in blocking mode. You could put the socket into non-blocking mode, but then socket.recv would raise an error whenever no data is ready yet. You basically need an event loop to tell you when a socket.recv can succeed, but blocking libraries aren't written with one in mind. This is the whole reason asyncio exists: it provides a default event loop that different libraries can share, making use of non-blocking file descriptors (e.g. sockets).
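A small standard-library sketch of that point, assuming a connected socket pair as a stand-in for a real client and server (recv_when_ready and demo are hypothetical names). A non-blocking read fails immediately when nothing has arrived, and it takes something like select to say when a read will succeed.

```python
import select
import socket


def recv_when_ready(sock, timeout=1.0):
    """Wait until sock is readable, then read once without blocking."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out; nothing arrived
    return sock.recv(4096)


def demo():
    a, b = socket.socketpair()  # connected pair standing in for client/server
    b.setblocking(False)
    try:
        b.recv(4096)  # no data yet, and a non-blocking recv cannot wait
        early = "got data"
    except BlockingIOError:
        early = "would block"
    a.sendall(b"hello")
    data = recv_when_ready(b)  # select reports when the read can succeed
    a.close()
    b.close()
    return early, data


if __name__ == "__main__":
    print(demo())
```

A blocking library never makes the select call, so there is no point at which an event loop could switch to another request.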
Use aiohttp, it's just as easy as requests and in the process you get to learn more about asyncio. asyncio and the new Python 3.5 async/await syntax are the Future of networking IO; yield to it (pun intended).
I would like to have a Python thread wait either for data coming from one socket (serial port, TCP/IP, etc.), or for data coming from another thread.
And I would like a portable Windows-and-Linux solution.
What I am looking for is similar to select.select() but I believe I cannot use select.select() on Windows for inter-thread communication.
Is this easily possible?
Are you certain that it is necessary to use threads? Are you using some foreign API that requires their use?
Anyway, using Twisted, you can easily and portably listen on any file-like object (including serial ports and TCP sockets). Additionally, provided that you do in fact need to use threads, Twisted provides several tools for doing so. The simplest method, given your description, would be to call reactor.callFromThread. If you want to get data back rather than simply call the function in the reactor thread, Twisted provides twisted.internet.threads.blockingCallFromThread, which will block until the function in the reactor thread returns (or, if it returns a deferred, until that deferred fires).
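If you would rather stay with the standard library, one alternative (not the Twisted approach described above) is to give the other thread a socket of its own to signal on, so that a single select() call covers both sources. This is only a sketch: Mailbox, wait_for_events and demo are made-up names, and it covers sockets only, since select() on Windows does not accept serial port handles.

```python
import queue
import select
import socket
import threading


class Mailbox:
    """Thread-safe queue whose 'has data' state is visible to select()."""

    def __init__(self):
        self._items = queue.Queue()
        self._wake_r, self._wake_w = socket.socketpair()

    def fileno(self):
        # select() accepts any object that has a fileno() method.
        return self._wake_r.fileno()

    def put(self, item):
        self._items.put(item)
        self._wake_w.send(b"\0")  # one wake-up byte per queued item

    def get(self):
        self._wake_r.recv(1)  # consume the matching wake-up byte
        return self._items.get_nowait()

    def close(self):
        self._wake_r.close()
        self._wake_w.close()


def wait_for_events(data_sock, mailbox, expected, timeout=5.0):
    events = []
    while len(events) < expected:
        readable, _, _ = select.select([data_sock, mailbox], [], [], timeout)
        if not readable:
            break  # timed out
        for source in readable:
            if source is mailbox:
                events.append(("thread", mailbox.get()))
            else:
                events.append(("socket", source.recv(4096)))
    return events


def demo():
    peer, data_sock = socket.socketpair()  # stand-in for a TCP connection
    mailbox = Mailbox()
    worker = threading.Thread(target=mailbox.put, args=("job done",))
    worker.start()
    peer.sendall(b"ping")
    events = wait_for_events(data_sock, mailbox, expected=2)
    worker.join()
    for s in (peer, data_sock, mailbox):
        s.close()
    return sorted(events)


if __name__ == "__main__":
    print(demo())
```

socket.socketpair() is available on Windows from Python 3.5 onwards, which is what makes this portable for the socket case.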
Background:
I have a current implementation that receives data from about 120 different socket connections in Python. In my current implementation, I handle each of these socket connections with its own dedicated thread. Each of these threads parses the data and eventually stores it in a shared, locked dictionary. These sockets DO NOT have uniform data rates; some sockets get more data than others.
Question:
Is this the best way to handle incoming data in Python, or does Python have a better way of handling multiple sockets per thread?
Using an asynchronous approach will make you much happier. Tornado is a good example of a well-known application that implements this well. You can easily use Tornado's ioloop for things other than web servers, too.
There are alternative libraries such as gevent; but I believe Tornado is a better place to look at first since it both provides the loop and a web server implemented on top of it as a great example of how to use the loop well.
If you're using threads, that's basically the way you'd go about it.
The alternative is to use one of the various asynchronous networking libraries out there, such as Twisted, Tornado, or GEvent.
As mentioned in your "Asynchronous UDP Socket Reading" question, asyncoro can be used to process many asynchronous sockets efficiently. Another benefit of asyncoro for your problem is that you don't need to worry about locking the shared dictionary: with asyncoro at most one coroutine is executing at any time and there is no forced preemption.
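The libraries named in these answers all build on the same idea: one thread asks the operating system which sockets have data and services only those. Here is a rough standard-library sketch of that with the selectors module; it is not asyncoro, Tornado or Twisted code, and collect, demo and the feed names are invented for the example. Because only one thread touches the dictionary, no lock is needed.

```python
import selectors
import socket


def collect(sockets, expected_messages, timeout=1.0):
    """Read from many sockets in one thread, keeping the latest chunk of each."""
    sel = selectors.DefaultSelector()
    for name, sock in sockets.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
    latest = {}  # plain dict: only this thread ever writes to it
    seen = 0
    while seen < expected_messages and sel.get_map():
        events = sel.select(timeout)
        if not events:
            break  # nothing arrived within the timeout
        for key, _ in events:
            chunk = key.fileobj.recv(4096)
            if chunk:
                latest[key.data] = chunk
                seen += 1
            else:
                sel.unregister(key.fileobj)  # peer closed the connection
    sel.close()
    return latest


def demo():
    # Three connected pairs stand in for the ~120 real connections.
    pairs = {f"feed{i}": socket.socketpair() for i in range(3)}
    for name, (sender, _) in pairs.items():
        sender.sendall(name.encode())
    receivers = {name: rx for name, (_, rx) in pairs.items()}
    result = collect(receivers, expected_messages=3)
    for sender, rx in pairs.values():
        sender.close()
        rx.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Sockets with uneven data rates are not a problem here: a quiet socket simply never shows up in the ready list, so it costs nothing while idle.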
I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call. When the data is received I call reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? Maybe a different Twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses fuse and can combine multiple local folders into the fuse mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now) but I'm open to any suggestions.
I could achieve similar results by using an NFS mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to be an NFS server with a number of NFS mounts equal to the number of computers in the network.
Conclusion:
I have decided to use rpyc as it gave me exactly what I was looking for. A server that keeps an instance of a class that I can manipulate as if it was local. If anyone is interested I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you self.transport.loseConnection() in your Protocol you won't be leaving open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.sendall(output)  # send() may transmit only part of the data
data = s.recv(size)
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
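A minimal sketch of that repeated recv() loop, assuming the server closes the connection when it has finished replying (recv_all and demo are hypothetical names; the socket pair stands in for the real connection). In Python 3 the "empty string" is an empty bytes object, b"".

```python
import socket


def recv_all(sock, bufsize=4096):
    """Read until the peer closes its side, then return everything received."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" means the peer has closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    sender, receiver = socket.socketpair()  # stand-in for a real connection
    sender.sendall(b"x" * 3000)
    sender.close()  # closing is what ends the loop in recv_all
    data = recv_all(receiver, bufsize=1024)  # small buffer forces several reads
    receiver.close()
    return data


if __name__ == "__main__":
    print(len(demo()))
```

If the server keeps the connection open between calls, this loop would block forever, and the protocol needs a length prefix or delimiter instead.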
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is that when I call reactor.run() it seems to go into a loop that just watches for network activity. How do I continue running the rest of my program while it uses the network? If I can get past that, then I can probably work through the synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop, which is an event reactor. It waits not only for network events but also for anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.