what's the tornado ioloop, and tornado's workflow? - python

I want to understand Tornado's internal workflow. I have seen this article, and it's great, but there are some things I just can't figure out.
Within ioloop.py there is a function like this:
def add_handler(self, fd, handler, events):
    """Registers the given handler to receive the given events for fd."""
    self._handlers[fd] = handler
    self._impl.register(fd, events | self.ERROR)
So what does this mean? Does every request trigger add_handler, or is it triggered just once at initialization?
Does every socket connection generate a file descriptor, or is one generated just once?
What is the relationship between the IOLoop and an IOStream?
How does the HTTPServer work with the IOLoop and IOStream?
Is there a workflow chart anywhere, so I can see it clearly?
Sorry for all these questions, I'm just confused.
Any link, suggestion, or tip helps. Many thanks :)

I'll see if I can answer your questions in order:
Here _impl is whichever socket-polling mechanism is available: epoll on Linux, select on Windows. So self._impl.register(fd, events | self.ERROR) passes the "wait for some event" request to the underlying operating system, specifically including error events as well.
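As a minimal stdlib sketch of what that registration amounts to, here is Python's selectors module, which wraps epoll/select much like Tornado's _impl does (the handler string and socket pair here are just stand-ins for illustration):

```python
import selectors
import socket

# selectors.DefaultSelector picks the best poller available:
# epoll on Linux, select on Windows -- much like Tornado's self._impl.
sel = selectors.DefaultSelector()

# A connected socket pair stands in for a client connection.
r, w = socket.socketpair()

# The moral equivalent of self._impl.register(fd, events | self.ERROR):
# ask the OS to report read events on this file descriptor, and remember
# which handler to run for it (Tornado keeps those in self._handlers).
sel.register(r, selectors.EVENT_READ, data="my handler")

w.send(b"hello")

# One poll step of an event loop: block until some fd is ready,
# then look up and dispatch the handler registered for it.
fired = [key.data for key, events in sel.select(timeout=1)]
print(fired)  # ['my handler']

sel.close()
r.close()
w.close()
```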
When run, the HTTPServer will register sockets to accept connections on, using IOLoop.add_handler(). When connections are accepted, they will generate more communication sockets, which will likely also add event handlers via an IOStream, which may also call add_handler(). So new handlers are added both at the beginning and as connections are received.
Yes, every new socket connection will have a unique file descriptor. The original socket the HTTPServer is listening on should keep its file descriptor though. File descriptors are provided by the operating system.
The IOLoop handles events to do with the sockets, for example whether they have data available to be read, whether they may be written to, and whether an error has occurred. By using operating system services such as epoll or select, it can do this very efficiently.
An IOStream handles streaming data over a single connection, and uses the IOLoop to do this asynchronously. For example an IOStream can read as much data as is available, then use IOLoop.add_handler() to wait until more data is available.
On listen(), the HTTPServer creates a socket which it uses to listen for connections using the IOLoop. When a connection is obtained, it uses socket.accept() to create a new socket which is then used for communicating with the client using a new HTTPConnection.
The HTTPConnection uses an IOStream to transfer data to or from the client. This IOStream uses the IOLoop to do this in an asynchronous and non-blocking way. Many IOStream and HTTPConnection objects can be active at once, all using the same IOLoop.
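Here is a toy, stdlib-only sketch of that interplay: a plain selectors loop stands in for the IOLoop, the accept handler for HTTPServer, and an uppercasing echo for HTTPConnection/IOStream (the handler names are invented, not Tornado's):

```python
import selectors
import socket

sel = selectors.DefaultSelector()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)

def handle_accept(sock):
    conn, _ = sock.accept()           # every accept() yields a new fd
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle_read)

def handle_read(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data.upper())    # trivial stand-in for HTTPConnection
    else:
        sel.unregister(conn)
        conn.close()

# Registered once, like HTTPServer.listen() -> IOLoop.add_handler().
sel.register(listener, selectors.EVENT_READ, handle_accept)

# Drive one client through the loop.
client = socket.create_connection(listener.getsockname())
client.settimeout(0.05)
client.sendall(b"hi")

reply = b""
while not reply:
    for key, events in sel.select(timeout=1):
        key.data(key.fileobj)         # dispatch to the stored handler
    try:
        reply = client.recv(4096)
    except socket.timeout:
        pass

print(reply)  # b'HI'
client.close()
sel.close()
listener.close()
```

Note that the listening socket is registered once, while each accepted connection brings its own new file descriptor and handler, mirroring the answers above.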
I hope this answers some of your questions. I don't know of a good structural chart, but the overall idea should be fairly similar for other webservers too, so there might be some good info around. That in-depth article you linked to did look pretty useful, so if you understand enough I'd recommend giving it another go :).

Related

Is Flask-SocketIO's emit function thread safe?

I have a Flask-SocketIO application. Can I safely call socketio.emit() from different threads? Is socketio.emit() atomic like the normal socket.send()?
The socketio.emit() function is thread safe, or I should say that it is intended to be thread-safe, as there is currently one open issue related to this. Note that 'thread' in this context means a supported threading model. Most people use Flask-SocketIO in conjunction with eventlet or gevent in production, so in those contexts thread means "green" thread.
The open issue is related to using a message queue, which is necessary when you have multiple servers. In that setup, the accesses to the queue are not thread-safe at this time. This is a bug that needs to be fixed, but as a workaround, you can create a different socketio object per thread.
On the second question, regarding whether socketio.emit() is atomic, the answer is no. This is not a simple socket write operation. The payload needs to be formatted in a certain way to comply with the Socket.IO protocol, and then, depending on the selected transport (long-polling or WebSocket), the write happens in a completely different way.
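To illustrate why a multi-step emit needs external serialization, here is a stdlib-only sketch; the FramedWriter class and its header/payload framing are invented for illustration and are not Socket.IO's actual protocol:

```python
import threading

class FramedWriter:
    """Invented stand-in for a transport whose writes are multi-step."""

    def __init__(self):
        self.buffer = []
        self.lock = threading.Lock()

    def emit(self, event, payload):
        # The whole header+payload sequence happens under one lock,
        # which is the guarantee a thread-safe emit() has to provide.
        with self.lock:
            self.buffer.append(f"header:{event}")
            self.buffer.append(f"payload:{payload}")

writer = FramedWriter()
threads = [
    threading.Thread(target=writer.emit, args=("msg", i)) for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, every header is immediately followed by its own payload;
# without it, two threads could interleave their frames.
pairs = list(zip(writer.buffer[0::2], writer.buffer[1::2]))
ok = all(h.startswith("header:") and p.startswith("payload:") for h, p in pairs)
print(ok)  # True
```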

processing HTTPS requests and ZeroMQ messages in parallel inside Twisted

I am trying to do two things in parallel:
intercept HTTPS POST request and make Twisted hold the connection open indefinitely (return NOT_DONE_YET)
listen to ZeroMQ queue (and close one of the pending connections as a result of a ZeroMQ message)
I know how to make each of these two things separately, but how can I do them in parallel? To be specific, how can I receive messages from ZeroMQ inside the Twisted framework, when I am already listening to TCP sockets?
You can get support for ZeroMQ in Twisted from txZMQ.
It sounds like you already know how to deal with the HTTP request.
To do these things in parallel, just create your ZmqSubConnection and your twisted.web.server.Site, with references to each other as necessary for your application. No special set-up is required.
Just make sure you only create and run one reactor. Other people with similar questions sometimes don't understand that reactor.run() means "run the event loop for the whole process", and not "run this one thing I set up".

asynchronous I/O multiplexing (socket and inter-thread)

I would like to have a Python thread wait either for data coming from one socket (serial port, TCP/IP, etc.), or for data coming from another thread.
And I would like a portable Windows-and-Linux solution.
What I am looking for is similar to select.select() but I believe I cannot use select.select() on Windows for inter-thread communication.
Is this easily possible?
Are you certain that it is necessary to use threads? Are you using some foreign API that requires their use?
Anyway, using Twisted, you can easily listen on any file-like portably (including serial ports and TCP sockets). Additionally, provided that you do in fact need to use threads, Twisted provides several tools for doing so. The simplest method, given your description, would be that you call reactor.callFromThread. If you want to get data back and not simply call the function in the reactor thread, Twisted provides twisted.internet.threads.blockingCallFromThread, which will block until the function in the reactor thread returns (or, if it returns a deferred, until that deferred fires).
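Independent of Twisted, one portable stdlib pattern for the "socket or another thread" wait is a socket.socketpair() wake-up channel: socket.socketpair() is available on Windows since Python 3.5, and select.select() can always wait on sockets, so both platforms are covered. (select on Windows still cannot wait on serial ports, though; a serial fd would need its own reader thread using this same trick.)

```python
import select
import socket
import threading

# One pair is the wake-up channel; the other stands in for a real
# TCP/serial connection.
wake_r, wake_w = socket.socketpair()
data_r, data_w = socket.socketpair()

def worker():
    wake_w.send(b"\x00")              # signal the waiting thread

t = threading.Thread(target=worker)
t.start()

# Wait on both the real socket and the inter-thread wake-up socket.
readable, _, _ = select.select([wake_r, data_r], [], [], 1.0)
t.join()

events = []
if wake_r in readable:
    wake_r.recv(1)                    # drain the wake-up byte
    events.append("thread")
if data_r in readable:
    events.append("socket")

print(events)  # ['thread']

for s in (wake_r, wake_w, data_r, data_w):
    s.close()
```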

Mix Python Twisted with multiprocessing?

I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client to the server, processes each request, then sends it on to the original server. The protocol used is a private protocol over TCP.
To minimize the effort, I want to use Python Twisted to handle the request receiving (the part acts as a server) and resending (the part acts as a client).
To maximize performance, I want to use Python multiprocessing (threading has the GIL limit) to separate the program into three parts (processes). The first process runs Twisted to receive requests, puts each request in a queue, and returns success immediately to the original client. The second process takes requests from the queue, processes them further, and puts them on another queue. The third process takes requests from the second queue and sends them to the original server.
I am a newcomer to Python Twisted. I know it is event driven, and I have also heard it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using just Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
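The difference can be sketched with the stdlib: treat a child process's stdout as just one more readable fd in the event loop, which is essentially what spawnProcess arranges for you. (This is a simplified, POSIX-oriented stand-in; on Windows, pipes are not selectable, which is part of why Twisted abstracts this away.)

```python
import selectors
import subprocess
import sys

# Spawn a child and watch its stdout as just another readable fd,
# the way a reactor would.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
)

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ)

chunks = []
while sel.get_map():                    # loop until the pipe is closed
    for key, events in sel.select(timeout=5):
        data = key.fileobj.read1(4096)  # fd is readable: won't block
        if data:
            chunks.append(data)
        else:                           # EOF: child closed its stdout
            sel.unregister(key.fileobj)

proc.wait()
output = b"".join(chunks)
print(output)  # b'hello from child\n'
```

With multiprocessing, by contrast, results arrive on a Queue that the event loop has no fd for, so you would have to bridge them back into the loop yourself.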
ampoule is the first thing I thought of when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function, it's very easy to use.
You can try something like the cooperative multitasking technique described at http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned and it's worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)

How can I do synchronous rpc calls

I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call: when the data is received I call reactor.stop(). But if I make any more calls, the reactor won't restart. Is there something else I should be using for this? Maybe a different Twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses fuse and can combine multiple local folders into the fuse mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now) but I'm open to any suggestions.
I could achieve similar results by using an NFS mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to be an NFS server, with a number of NFS mounts equal to the number of computers in the network.
Conclusion:
I have decided to use rpyc as it gave me exactly what I was looking for. A server that keeps an instance of a class that I can manipulate as if it was local. If anyone is interested I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you call self.transport.loseConnection() in your Protocol, you won't leave open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.sendall(output)  # sendall() keeps writing until everything is sent
data = s.recv(size)
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
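That repeated recv() loop might look like this, with a local socket pair standing in for the server connection:

```python
import socket

# A local socket pair stands in for the server connection.
server, client = socket.socketpair()
server.sendall(b"a response split " + b"across packets")
server.close()                # peer done: recv() will return b"" at EOF

chunks = []
while True:
    data = client.recv(4096)
    if not data:              # empty bytes == connection closed
        break
    chunks.append(data)
client.close()

response = b"".join(chunks)
print(response)  # b'a response split across packets'
```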
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is that when I call reactor.run() it seems to go into a loop that just watches for network activity. How do I continue running the rest of my program while it uses the network? If I can get past that, then I can probably work through the synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop, which is an event reactor. It will wait not only for network events, but for anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.
