Using Tornado and Twisted at the same time - python

I am in a weird situation where I have to use Twisted in a system built completely out of Tornado.
They can share the same IOLoop so I know they can work together. My question is can I safely use their co-routine decorators in the same function? For example:
import tornado.platform.twisted
tornado.platform.twisted.install()
...
@gen.engine
@defer.inlineCallbacks
def get(self):
    ...
    a = yield gen.Task(getA)          # tornado
    b = yield proxy.callRemote(getB)  # twisted
    ...
    defer.returnValue(a + b)          # twisted
They do work on the same IOLoop so I am thinking this should be fine. Would there be any unforeseen consequences? Thanks in advance.

Looks like what you want is Cyclone, a web server framework for Python that implements the Tornado API as a Twisted protocol.

No, this wouldn't work. In your case inlineCallbacks is wrapped directly around your generator and gen.engine is wrapped outside it. The problem is that inlineCallbacks does not know anything about gen.Task, so it will send the gen.Task object straight back into your generator (it has no way of passing it along to gen.engine).
To elaborate: if you yield obj inside an inlineCallbacks-wrapped generator, two things can happen:
obj is a Deferred in which case control is returned to the reactor until that Deferred fires.
obj is something else, in which case it is immediately sent back into your generator.
In your case, the result would be:
a = yield gen.Task(getA) # continues right through
# a is of type gen.Task here
b = yield proxy.callRemote(getB) # waits for result of proxy.callRemote
See here for how inlineCallbacks is implemented.
What is the right way to do this? Use either inlineCallbacks or gen.engine (but not both), and wrap the "alien" gen.Task (or Deferred) into the decorator's "native" form. I am not familiar with Tornado, but maybe this question helps.
Alternatively, write your own decorator like inlineCallbacks that handles gen.Task as well.
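For example, here is a minimal sketch of the "wrap the alien object into the native form" idea: converting a Twisted Deferred into a Tornado Future so a gen.coroutine function can yield it. Note that deferred_to_future is a hypothetical helper, not a Tornado API, and newer Tornado versions may already handle Deferreds for you once tornado.platform.twisted is installed:

from tornado.concurrent import Future

def deferred_to_future(deferred):
    """Wrap a Twisted Deferred in a Tornado Future (hypothetical helper)."""
    future = Future()

    def on_success(result):
        future.set_result(result)
        return result

    def on_error(failure):
        future.set_exception(failure.value)

    deferred.addCallbacks(on_success, on_error)
    return future

Inside a gen.coroutine-decorated handler you could then write b = yield deferred_to_future(proxy.callRemote(getB)) and drop the defer.inlineCallbacks/defer.returnValue pieces entirely.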

Related

is it possible to list all blocked tornado coroutines

I have a "gateway" app written in tornado using #tornado.gen.coroutine to transfer information from one handler to another. I'm trying to do some debugging/status testing. What I'd like to be able to do is enumerate all of the currently blocked/waiting coroutines that are live at a given moment. Is this information accessible somewhere in tornado?
You may be thinking of the IOLoop's _handlers dict. Try adding something like this in a periodic callback:
from tornado import ioloop

def print_current_handlers():
    io_loop = ioloop.IOLoop.current()
    print io_loop._handlers
Update: I've checked the source code and now think that there is no simple way to trace currently running gen.coroutines; A. Jesse Jiryu Davis is right!
But you can trace all "async" calls (yields) from coroutines - each yield from a generator goes through IOLoop.add_callback (http://www.tornadoweb.org/en/stable/ioloop.html#callbacks-and-timeouts).
So, by examining io_loop._callbacks you can see which yields are in the ioloop right now.
There is a lot of interesting stuff in https://github.com/tornadoweb/tornado/blob/master/tornado/gen.py :)
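For example, a small sketch of wiring that inspection into a PeriodicCallback. Keep in mind that _handlers and _callbacks are private IOLoop attributes, so this may break between Tornado versions:

from tornado import ioloop

def dump_ioloop_state():
    io_loop = ioloop.IOLoop.current()
    print io_loop._handlers   # file descriptors the loop is watching
    print io_loop._callbacks  # callbacks queued for the next iteration

# dump the loop's internals once a second
ioloop.PeriodicCallback(dump_ioloop_state, 1000).start()
ioloop.IOLoop.current().start()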
No there isn't, but you could perhaps create your own decorator that wraps gen.coroutine, then updates a data structure when the coroutine begins.
import weakref
import functools
from tornado import gen
from tornado.ioloop import IOLoop

all_coroutines = weakref.WeakKeyDictionary()

def tracked_coroutine(fn):
    coro = gen.coroutine(fn)

    @functools.wraps(coro)
    def start(*args, **kwargs):
        future = coro(*args, **kwargs)
        all_coroutines[future] = str(fn)
        return future

    return start

@tracked_coroutine
def five_second_coroutine():
    yield gen.sleep(5)

@tracked_coroutine
def ten_second_coroutine():
    yield gen.sleep(10)

@gen.coroutine
def tracker():
    while True:
        running = list(all_coroutines.values())
        print(running)
        yield gen.sleep(1)

loop = IOLoop.current()
loop.spawn_callback(tracker)
loop.spawn_callback(five_second_coroutine)
loop.spawn_callback(ten_second_coroutine)
loop.start()
If you run this script for a few seconds you'll see two active coroutines, then one, then none.
Note the warning in the docs about the dictionary changing size; you should catch RuntimeError in tracker to deal with that problem.
This is a bit complex; you might get all you need much more simply by turning on Tornado's logging and using set_blocking_log_threshold.
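A minimal sketch of that simpler route, assuming a pre-5.0 Tornado where the signal-based set_blocking_log_threshold is available (the 0.5-second threshold is an arbitrary example value):

import logging
from tornado.ioloop import IOLoop

logging.basicConfig(level=logging.DEBUG)

io_loop = IOLoop.current()
# log a warning with a stack trace whenever a callback blocks the loop
# for longer than half a second
io_loop.set_blocking_log_threshold(0.5)
io_loop.start()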

Using gen.coroutine’s callback argument in Tornado

Looking for a simple example demonstrating use of tornado.gen.coroutine’s callback argument. The docs say:
Functions with [the gen.coroutine] decorator return a Future. Additionally, they may be called with a callback keyword argument, which will be invoked with the future’s result when it resolves.
Adapting an example from the docs’ User’s guide, I would think I could do:
from tornado import gen

@gen.coroutine
def divide(x, y):
    return x / y

@gen.coroutine
def good_call():
    yield divide(1, 2)

good_call(callback=print)
I’d expect this to print 0.5, but there’s no output.
I’ve found copious examples demonstrating the deprecated gen.engine decorator, but there doesn’t seem to be as much out there on gen.coroutine. Running on Python 3.5.1 and Tornado 4.3.
You still need to start the IOLoop. If you add tornado.ioloop.IOLoop.current().start() at the end of your script you'll see the output printed (and then the IOLoop runs forever; if you want it to stop, you'll need to do so from your callback after printing).
Note that in general it is possible (and encouraged) to write Tornado applications using only coroutines and yield, without passing any callbacks directly.
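Putting that together, here is a minimal sketch assuming Tornado 4.x, where gen.coroutine still accepts the callback keyword (note that good_call has to return a value, otherwise the callback receives None):

from tornado import gen
from tornado.ioloop import IOLoop

@gen.coroutine
def divide(x, y):
    return x / y

@gen.coroutine
def good_call():
    result = yield divide(1, 2)
    return result  # on Python 3.3+ a plain return works inside the generator

def handle_result(result):
    print(result)            # 0.5
    IOLoop.current().stop()  # stop the loop once the callback has run

good_call(callback=handle_result)
IOLoop.current().start()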

Using context in twisted

I'm trying to use twisted.python.context, but the context disappears after the first deferToThread.
from twisted.internet import reactor, defer, threads
from twisted.python import context

def _print_context(msg):
    cont = context.get('cont')
    print "{msg}: {context}".format(msg=msg, context=cont)

def sub_call():
    _print_context("In sub_call")

@defer.inlineCallbacks
def with_context():
    _print_context("before thread")
    yield threads.deferToThread(sub_call)
    _print_context("after thread")
    reactor.stop()

def test():
    cont = {'cont': "TestContext"}
    context.call(cont, with_context)

reactor.callLater(0, test)
reactor.run()
I have the context before deferToThread and in sub_call, but no context after deferToThread.
Is there any way to keep the context after deferToThread?
context.call sets the context for the duration of the call of the object passed to it - in this case with_context.
with_context is an inlineCallbacks-wrapped generator function. The first call to it creates a new generator and iterates it as far as the first yield statement. Then its execution is suspended and, as far as the caller is concerned, the call returns. At this point the context stack is popped and the context you supplied is discarded.
Later, the implementation of inlineCallbacks ensures the generator will be iterated further so that the code after the yield statement executes. However, the context has already been discarded.
There's no easy way to fix this. twisted.python.context does not try to address the problem of asynchronous context management. Moreover, twisted.python.context is fairly terrible and few if any programs should actually be written to use it.
I recommend taking a step back and re-evaluating your choice here. You'd probably be better served by creating a class and using instance attributes on an instance of it to carry state around between method calls.
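For illustration, a minimal sketch of that instance-attribute approach (Transfer and label are made-up names, not part of Twisted):

from twisted.internet import defer, reactor, threads

class Transfer(object):
    def __init__(self, label):
        self.label = label  # state that previously lived in the context

    def _print(self, msg):
        print "{msg}: {label}".format(msg=msg, label=self.label)

    @defer.inlineCallbacks
    def run(self):
        self._print("before thread")
        yield threads.deferToThread(self._print, "in thread")
        self._print("after thread")  # still available: it lives on self
        reactor.stop()

reactor.callLater(0, Transfer("TestContext").run)
reactor.run()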

Parallelise web tasks with asyncio in Python

I'm trying to wrap my head around asyncio and aiohttp and for the first time in years programming makes me feel utterly stupid and incapable. Which is kind of beautiful, in a weirdo Zen way. But alas, there's work to get done.
I've got an existing class that can do numerous wondrous things on the web, like signing up to a web site, getting data, the works. And now I need like, 100 or 1000 of these little worker bees to sign up. Code looks roughly like this:
class Worker(object):
    def signup(self, ...):
        ...
        data = self.make_request(url, data)
        self.user_id = data.get("user_id")
        return self

    def make_request(self, url, data):
        response = requests.post(url, data=data)
        return response.json()

workers = [Worker().signup() for n in range(100)]
As you can see, we're using the requests module to make a POST request. However this is blocking, so we'll have to wait for worker N to finish signing up before we start signing up worker N+1. Fortunately, the original author of the Worker class (that sounds charmingly Marxist) in her infinite wisdom wrapped every HTTP call in the self.make_request method, so making the whole Worker non blocking should just be a matter of swapping out the requests library for a non-blocking one aaaaand bob's your uncle, right? This is how far I got:
class AsyncWorker(Worker):
    @asyncio.coroutine
    def make_request(self, url, data):
        response = yield from aiohttp.request('post', url, data=data)
        return (yield from response.json())

coroutines = [AsyncWorker().signup() for n in range(100)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(coroutines))
loop.close()
But this will raise an AttributeError: 'generator' object has no attribute 'get' in the signup method where I do self.user_id = data.get("user_id"). And beyond that, I still don't have the workers in a neat dictionary. I'm aware that I'm most likely completely misunderstanding how asyncio works - but I already spent a day reading through various docs, mind-shattering tutorials by David Beazley, and masses of toy examples that are simple enough for me to understand but too simple to apply to this situation. How should I structure my worker and my async loop to sign up 100 workers in parallel and eventually get a list of all workers after they have signed up?
Once you use yield (or yield from) in a function, that function becomes a coroutine. This means that you can't get a result by just calling it: you will get a generator object instead. You must at least do this:
@asyncio.coroutine
def some_coroutine(*args):
    # ...
    # ...
    result = yield from tasty.asyncio.function()
    return result

def coroutine_user():
    # data = some_coroutine() would give you a generator object instead of a result
    data = yield from some_coroutine()
    return data  # data here is a plain result: you can call your .get or whatever
Guess what happens when you call coroutine_user():
>>> coroutine_user()
<generator object coroutine_user at 0x7fe13b8a47e0>
The lack of an asyncio.coroutine decorator doesn't help at all: coroutines are contagious! To get a result inside a function, you must use yield from, which turns that function into another coroutine!
Though things aren't always that bad (usually you can manually iterate a generator object without relying on yield from), asyncio will specifically stop you from doing it: it breaks some internals (you can do it only from a Future or an asyncio.coroutine). So just use concurrent.futures or something similar unless you're going to turn all your code into coroutines. As an alternative, isolate all users of aiohttp.request from the ordinary methods and work with both coroutine-based async workers and synchronous plain old code. Diving into asyncio and actually refactoring all your code is an option too, obviously: you basically need to put yield from before every call to any method infected with asyncio.
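If you do go the full-asyncio route, here is a minimal sketch of what the question's worker might look like. The signup URL and payload are made-up placeholders, and the old @asyncio.coroutine / yield from style is used to match the question's Python 3.4-era code:

import asyncio
import aiohttp

class AsyncWorker(object):
    @asyncio.coroutine
    def make_request(self, url, data):
        response = yield from aiohttp.request('post', url, data=data)
        return (yield from response.json())

    @asyncio.coroutine
    def signup(self):
        # signup must itself become a coroutine, because it awaits make_request
        data = yield from self.make_request("http://example.com/signup", {"name": "bee"})
        self.user_id = data.get("user_id")
        return self

loop = asyncio.get_event_loop()
tasks = [AsyncWorker().signup() for n in range(100)]
workers = loop.run_until_complete(asyncio.gather(*tasks))  # list of signed-up workers
loop.close()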

Integrating with deluged api - twisted deferred

I have a very simple script that monitors a file transfer progress, comparing its actual size with the target then calculating its hash, comparing with the desired hash and firing up a few extra things when everything seems alright.
I've replaced the tool used for the file transfers (wget) with deluged, which has a neat api to integrate with.
Instead of comparing the file progress and the hashes, I now only need to know when deluged has finished downloading the files. To achieve that, I was able to modify this script to my needs, but I'm stuck trying to wrap my head around the twisted framework, which deluged makes use of.
To try getting over it, I grabbed a sample script from the twisted deferred documentation, wrapped a class around it and attempted to use the same concept I'm using in the script I mentioned.
Now, I don't know exactly what to do with the reactor object, since it's basically a blocking loop that can't be restarted.
This is my sample code I'm working with:
from twisted.internet import reactor, defer
import time

class DummyDataGetter:
    done = False
    result = 0

    def getDummyData(self, x):
        d = defer.Deferred()
        # simulate a delayed result by asking the reactor to fire the
        # Deferred in 2 seconds time with the result x * 3
        reactor.callLater(2, d.callback, x * 3)
        return d

    def assignResult(self, d):
        """
        Data handling function to be added as a callback: handles the
        data by printing the result
        """
        self.result = d
        self.done = True
        reactor.stop()

    def run(self):
        d = self.getDummyData(3)
        d.addCallback(self.assignResult)
        reactor.run()

getter = DummyDataGetter()
getter.run()
while not getter.done:
    time.sleep(0.5)
print getter.result

# then somewhere else I want to get dummy data again
getter = DummyDataGetter()
getter.run()  # this throws an exception of type error.ReactorNotRestartable
while not getter.done:
    time.sleep(0.5)
print getter.result
My questions are:
Should the reactor be run in another thread to prevent it from blocking the code?
If so, how would I add more callbacks to this reactor living in a separate thread? Simply by doing something similar to reactor.callLater(2, d.callback, x * 3) from my main thread?
If not, what is the technique for overcoming the problem of not being able to start and stop the reactor more than once in the same process?
OK, the easiest approach I found is to simply have a separate script that is called using subprocess.Popen, dumps the statuses of the torrents and anything else needed to stdout (serialized using JSON), and pipes that into the calling script.
Way less traumatic than learning twisted, but of course far from optimal.
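For illustration, a rough sketch of that approach, assuming a made-up helper script dump_torrent_status.py that prints the torrent statuses to stdout as JSON:

import json
import subprocess

# run the hypothetical helper script and capture whatever it prints to stdout
proc = subprocess.Popen(
    ["python", "dump_torrent_status.py"],
    stdout=subprocess.PIPE,
)
out, _ = proc.communicate()

statuses = json.loads(out)  # e.g. a list/dict of torrent statuses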
