I have been thinking about Tornado's non-blocking infrastructure and event-driven programming. I'm writing a simple web app that accesses the HTTP API of an external web service. I understand why I should call this API in a non-blocking way. But are there any disadvantages if I make only the first call non-blocking, so that the IOLoop can keep looping?
For example:
@tornado.web.asynchronous
def get(self):
    nonblocking_call1(self._callback)

def _callback(self, response):
    self.write(str(response))
    self.write(str(blocking_call2()))
    self.write(str(blocking_call3()))
    self.finish()
vs.
@tornado.web.asynchronous
def get(self):
    nonblocking_call1(self._callback1)

def _callback1(self, response):
    self.write(str(response))
    nonblocking_call2(self._callback2)

def _callback2(self, response):
    self.write(str(response))
    nonblocking_call3(self._callback3)

def _callback3(self, response):
    self.write(str(response))
    self.finish()
If you use blocking code inside Tornado, the same Tornado process cannot handle any other request while that blocking code is waiting. Each process will effectively serve only one user at a time, and even a blocking call that takes only around 100 ms will still be a HUGE performance killer.
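The effect described above can be reproduced without a web server. The following is a minimal sketch, assuming Python 3.7+ and using only the standard-library asyncio event loop (which Tornado 5 and later run on) rather than Tornado itself. The names blocking_handler, cooperative_handler and measure are hypothetical stand-ins for request handlers, not part of any library.

```python
import asyncio
import time


async def blocking_handler(delay):
    # time.sleep() never hands control back to the event loop,
    # so no other "request" can make progress while it runs.
    time.sleep(delay)


async def cooperative_handler(delay):
    # asyncio.sleep() suspends this coroutine; the loop can serve others.
    await asyncio.sleep(delay)


async def time_two_requests(handler, delay):
    """Start two simulated requests together and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(handler(delay), handler(delay))
    return time.perf_counter() - start


def measure(handler, delay=0.2):
    return asyncio.run(time_two_requests(handler, delay))
```

Calling measure(blocking_handler) should take roughly twice the delay, because the two requests run one after the other, while measure(cooperative_handler) should take roughly one delay, because both waits overlap.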
If writing this way is exhausting for you (it is for me), you can use tornado's gen module:
class GenAsyncHandler(RequestHandler):
    @asynchronous
    @gen.engine
    def get(self):
        http_client = AsyncHTTPClient()
        response = yield gen.Task(http_client.fetch, "http://example.com")
        do_something_with_response(response)
        self.render("template.html")
Related
The Tornado app below has two endpoints. One (/) is slow because it waits for an I/O operation, and the other (/hello) is fast.
I need to make requests to both endpoints simultaneously. I observed that the server handles the second request only after it finishes the first one. Even though the handler is asynchronous, why can't it handle both requests at the same time?
How can I make it handle them simultaneously?
Edit: I am using Windows 7 and the Eclipse IDE.
****************Module*****************
import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.do_something()
        self.write("FINISHED")
        self.finish()

    def do_something(self):
        inp = input("enter to continue")
        print(inp)


class HelloHandler(tornado.web.RequestHandler):
    def get(self):
        print("say hello")
        self.write("Hello bro")
        self.finish()


def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
        (r"/hello", HelloHandler)
    ])


if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
It is asynchronous only if you make it so. A Tornado server runs in a single thread. If that thread is blocked by a synchronous function call, nothing else can happen on that thread in the meantime. What @tornado.web.asynchronous enables is the use of generators:
@tornado.web.asynchronous
def get(self):
    yield from self.do_something()
    ^^^^^^^^^^
This yield/yield from (in current Python versions await) feature suspends the function and lets other code run on the same thread while the asynchronous call completes elsewhere (e.g. waiting for data from the database, waiting for a network request to return a response). I.e., if Python doesn't actively have to do something but is waiting for external processes to complete, it can yield processing power to other tasks. But since your function is very much running in the foreground and blocking the thread, nothing else will happen.
See http://www.tornadoweb.org/en/stable/guide/async.html and https://docs.python.org/3/library/asyncio.html.
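To make the suspension behaviour concrete, here is a small sketch using plain asyncio (Python 3.7+), not the code from the question. slow_request and fast_request are hypothetical coroutines standing in for the / and /hello handlers, and asyncio.sleep stands in for a real wait on a network or database call.

```python
import asyncio


async def slow_request(log):
    log.append("slow: started")
    # Stand-in for waiting on external I/O; await hands control back to the loop.
    await asyncio.sleep(0.1)
    log.append("slow: finished")


async def fast_request(log):
    log.append("fast: started")
    await asyncio.sleep(0)  # yields to the loop once, then continues
    log.append("fast: finished")


async def serve_both():
    log = []
    # Both coroutines share one thread; they interleave at each await.
    await asyncio.gather(slow_request(log), fast_request(log))
    return log


def run_demo():
    return asyncio.run(serve_both())
```

run_demo() returns the events in the order slow: started, fast: started, fast: finished, slow: finished. The fast request completes while the slow one is suspended, which is exactly what cannot happen when the slow handler blocks on input() or time.sleep().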
I want to write a simple async http server with Tornado.
It is not clear to me how to set the callback in order to free the server for additional requests while the current request is processed.
The code I wrote is:
import tornado.web
from tornado.ioloop import IOLoop
from tornado import gen
import time


class TestHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def post(self, *args, **kwargs):
        json_input = tornado.escape.json_decode(self.request.body)
        print('Now in POST. body: {}'.format(json_input))
        self.perform_long_task(*args, **json_input)

    @gen.coroutine
    def perform_long_task(self, **params):
        time.sleep(10)
        self.write(str(params))
        self.finish()


application = tornado.web.Application([
    (r"/test", TestHandler),
])
application.listen(9999)
IOLoop.instance().start()
To test, I tried to send a few POST requests in parallel:
curl -v http://localhost:9999/test -X POST -H "Content-Type:application/json" -d '{"key1": "val1", "key2": "val2"}' &
Currently the server is blocked while perform_long_task() runs.
I need help making the server non-blocking.
Never use time.sleep in Tornado code!
http://www.tornadoweb.org/en/latest/faq.html#why-isn-t-this-example-with-time-sleep-running-in-parallel
Do this in your code instead:
class TestHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def post(self, *args, **kwargs):
        json_input = tornado.escape.json_decode(self.request.body)
        print('Now in POST. body: {}'.format(json_input))
        # NOTE: yield here
        yield self.perform_long_task(*args, **json_input)

    @gen.coroutine
    def perform_long_task(self, **params):
        yield gen.sleep(10)
        self.write(str(params))
        # NOTE: no need for self.finish()
You don't need to call self.finish - when the "post" coroutine finishes, Tornado automatically finishes the request.
You must yield self.perform_long_task(), though, otherwise Tornado will end your request early, before you've called "self.write()".
Once you make these changes, two "curl" commands will show that you're doing concurrent processing in Tornado.
I'm still using time.sleep() because my code calls other code whose implementation I can't control.
The FAQ http://www.tornadoweb.org/en/latest/faq.html#why-isn-t-this-example-with-time-sleep-running-in-parallel describes three methods. The third one is what I needed.
The only change I needed in my code was to replace:
    yield self.perform_long_task(*args, **json_input)
which works only for code that is written to be async,
with:
    yield executor.submit(self.perform_long_task, *args, **json_input)
All replies and comments were helpful. Many thanks!
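For readers who want to see the thread-pool approach in isolation, the following is a rough sketch using only the standard library (asyncio and concurrent.futures, Python 3.7+), not the poster's actual handler. perform_long_task here is a hypothetical plain function standing in for third-party blocking code, and handle_request stands in for the request handler.

```python
import asyncio
import functools
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def perform_long_task(**params):
    # Hypothetical stand-in for blocking code that cannot be rewritten.
    time.sleep(0.2)
    return str(sorted(params.items()))


async def handle_request(**params):
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments,
    # so keyword arguments are bound with functools.partial.
    task = functools.partial(perform_long_task, **params)
    return await loop.run_in_executor(executor, task)


async def handle_two():
    start = time.perf_counter()
    results = await asyncio.gather(
        handle_request(key1="val1"),
        handle_request(key2="val2"),
    )
    return results, time.perf_counter() - start
```

Running asyncio.run(handle_two()) should finish in about the time of one blocking call, since each call sleeps on its own worker thread while the event loop stays free. The pool size caps how many blocking calls can run at once, so it is worth sizing it to the expected load.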
I need to run some multithreaded or multiprocess work in Scrapy (because I have a library that makes blocking calls) and, after it completes, put a Request back into the Scrapy engine.
I need something like this:
def blocking_call(self, html):
    # ....
    # do some work in blocking call
    return Request(url)

def parse(self, response):
    return self.blocking_call(response.body)
How can I do that? I think I should use the Twisted reactor and a Deferred object.
But a Scrapy parse callback must return only None, a Request, or a BaseItem object.
Based on the answer from @Jean-Paul Calderone, I did some investigation and testing, and here is what I found.
Internally, Scrapy uses the Twisted framework to manage synchronous and asynchronous request/response calls.
Scrapy issues requests (crawling) asynchronously, but responses (our custom parse callback functions) are processed synchronously. So if a callback contains a blocking call, it blocks the whole engine.
Fortunately, this can be changed. When Twisted processes the result of a Deferred callback, it handles the case (see the twisted.internet.defer.Deferred source) where the callback returns another Deferred. In that case Twisted waits for the inner Deferred asynchronously before continuing.
Basically, if we return a Deferred from our response callback, the callback changes from synchronous to asynchronous. For that we can use deferToThread (which internally calls deferToThreadPool(reactor, reactor.getThreadPool()..., as used in @Jean-Paul Calderone's code example).
The working code example is:
from twisted.internet.threads import deferToThread


class SpiderWithBlocking(...):
    ...

    def parse(self, response):
        # deferToThread takes the callable first and uses the reactor's thread pool
        return deferToThread(self.blocking_call, response.body)

    def blocking_call(self, html):
        # ....
        # do some work in blocking call
        return Request(url)
Additionally, only callbacks can return Deferred objects; start_requests cannot (this is Scrapy's logic).
If you want to return a Deferred that fires after your blocking operation has finished running in one of the reactor's thread pool threads, use deferToThreadPool:
from twisted.internet.threads import deferToThreadPool
from twisted.internet import reactor

...

def parse(self, response):
    return deferToThreadPool(
        reactor, reactor.getThreadPool(), self.blocking_call, response.body)
Here is my code:
class AsyncTestHandler(BaseHandler):
    def testTimeOut(self, callback):
        time.sleep(20)
        callback("ok")

    @tornado.web.asynchronous
    def post(self):
        user = self.get_current_user()
        self.testTimeOut(callback=self.respones)

    def respones(self, msg):
        self.finish(msg)
I have used "@tornado.web.asynchronous" with a callback, but the request is still not asynchronous. How can I do it?
Tornado uses only one process and one thread. All of its I/O operations are asynchronous, but that does not mean your code is processed concurrently. So if you call time.sleep(xx) anywhere in your code, your Tornado process is completely stopped for that time!
The correct way to sleep in Tornado is to call ioloop.add_timeout.
See tornado equivalent of delay.
See http://caisong.com/Tornado%20don't%20use%20time.sleep%20.html.
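The idea of scheduling a callback instead of sleeping can be sketched with the standard library alone. This is an illustration under assumptions, not Tornado code: it uses asyncio's loop.call_later (Python 3.7+), which plays a similar role to a timeout on Tornado's IOLoop, and the names respond and delayed_response are hypothetical.

```python
import asyncio
import time


def respond(future, message):
    # Called by the event loop when the timer fires.
    if not future.done():
        future.set_result(message)


async def delayed_response(delay, message="ok"):
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # Register a timer instead of sleeping; the loop stays free meanwhile.
    loop.call_later(delay, respond, future, message)
    return await future


async def two_delayed_requests(delay=0.2):
    start = time.perf_counter()
    replies = await asyncio.gather(
        delayed_response(delay),
        delayed_response(delay),
    )
    return replies, time.perf_counter() - start
```

Two requests that each wait on such a timer complete in about one delay in total, whereas two calls to time.sleep on the loop thread would take two.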
The problem is that time.sleep isn't asynchronous, so the main loop is blocked while it sleeps. To run synchronous code asynchronously, you can use a separate worker thread.
class HugeQueryHandler(BaseHandler):
    executor = tornado.concurrent.futures.ThreadPoolExecutor(5)

    @tornado.concurrent.run_on_executor
    def sleep_async(self):
        time.sleep(20)
        return

    @tornado.web.asynchronous
    @gen.engine
    def get(self):
        r = yield self.sleep_async()
        self.finish()
import tornado.web
import Queue

QUEUE = Queue.Queue()


class HandlerA(tornado.web.RequestHandler):
    def get(self):
        global QUEUE
        self.finish(QUEUE.get_nowait())


class HandlerB(tornado.web.RequestHandler):
    def get(self):
        global QUEUE
        QUEUE.put('Hello')
        self.finish('In queue.')
Problem: HandlerA blocks HandlerB for 10 seconds.
Browser A handled by HandlerA and waits...
Browser B handled by HandlerB and waits.... till timeout exceptions
Goal
Browser A handled by HandlerA and waits...
Browser B handled by HandlerB and returns
HandlerA returns after dequeuing
Is this an issue with Non-blocking, async, epoll or sockets?
Thanks!
UPDATE:
I updated this code to use a new thread to handle the Queue.get_nowait() request, which I fear is a HORRIBLE solution, considering I'm going to have thousands of requests at once and would therefore have thousands of threads at once. I'm considering moving to an epoll style in the near future.
class HandlerA(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # start_new_thread requires an argument tuple, even an empty one
        thread.start_new_thread(self.get_next, ())

    def get_next(self):
        global QUEUE
        self.finish(QUEUE.get_nowait())
Now this is not the best way to handle it... but at least it's a start.
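One way to get the behaviour listed under "Goal" without a thread per request is a queue whose get() suspends the caller instead of blocking the thread. The sketch below uses the standard-library asyncio.Queue (Python 3.7+) as an illustration; Tornado ships a similar tornado.queues.Queue, but this code does not use it. handler_a, handler_b and simulate are hypothetical names that mimic the two handlers and two browsers.

```python
import asyncio


async def handler_a(queue):
    # Suspends until an item arrives; the event loop keeps serving others.
    return await queue.get()


async def handler_b(queue):
    await queue.put("Hello")
    return "In queue."


async def simulate():
    queue = asyncio.Queue()
    # Browser A connects first and waits on the empty queue.
    waiter = asyncio.create_task(handler_a(queue))
    await asyncio.sleep(0.05)
    a_still_waiting = not waiter.done()
    # Browser B arrives later and returns immediately.
    reply_b = await handler_b(queue)
    # Handler A resumes once the item has been queued.
    reply_a = await waiter
    return a_still_waiting, reply_b, reply_a
```

asyncio.run(simulate()) returns (True, "In queue.", "Hello"): A was still waiting when B arrived, B returned right away, and A returned after dequeuing. Because waiting coroutines are cheap, thousands of pending requests do not need thousands of threads.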
SOLUTION
Found here Running blocking code in Tornado
This is Python, so time.sleep will always block the flow! To call an action after 10 seconds with Tornado, you need to use the tornado.ioloop.IOLoop.add_timeout method and pass the callback as a parameter. See the docs for more information.