Using context in Twisted (Python)

I am trying to use twisted.python.context, but the context disappears after the first deferToThread:
from twisted.internet import reactor, defer, threads
from twisted.python import context

def _print_context(msg):
    cont = context.get('cont')
    print("{msg}: {context}".format(msg=msg, context=cont))

def sub_call():
    _print_context("In sub_call")

@defer.inlineCallbacks
def with_context():
    _print_context("before thread")
    yield threads.deferToThread(sub_call)
    _print_context("after thread")
    reactor.stop()

def test():
    cont = {'cont': "TestContext"}
    context.call(cont, with_context)

reactor.callLater(0, test)
reactor.run()
The context is available before deferToThread and inside sub_call, but not after deferToThread.
Is there any way to keep the context after deferToThread?

context.call sets the context for the duration of the call to the object passed to it - in this case, with_context.
with_context is an inlineCallbacks-wrapped generator function. The first call to it creates a new generator and iterates it as far as the first yield statement. Then its execution is suspended and, as far as the caller is concerned, the call returns. At this point the context stack is popped and the context you supplied is discarded.
Later, the implementation of inlineCallbacks ensures the generator will be iterated further so that the code after the yield statement executes. However, the context has already been discarded.
There's no easy way to fix this. twisted.python.context does not try to address the problem of asynchronous context management. Moreover, twisted.python.context is fairly terrible and few if any programs should actually be written to use it.
I recommend taking a step back and re-evaluating your choice here. You'd probably be better served by creating a class and using instance attributes to carry state between method calls.
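For instance, a minimal sketch of that approach (hypothetical names, not from the original question):

from twisted.internet import defer, threads

class ContextualTask(object):
    """Carries state on the instance instead of in twisted.python.context."""
    def __init__(self, cont):
        self.cont = cont  # lives on self, so it survives the thread hop

    def sub_call(self):
        print("In sub_call: {}".format(self.cont))

    @defer.inlineCallbacks
    def run(self):
        print("before thread: {}".format(self.cont))
        yield threads.deferToThread(self.sub_call)
        print("after thread: {}".format(self.cont))  # still available here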

Related

Python Asyncio - Eventloop in a contextmanager

Since I don't like the approach of using loop.run() for various reasons, I wanted to code a contextual loop, since the docs state on several occasions that if you don't go with the canonical .run() you have to prevent memory leaks yourself (i.e.). After a bit of research it seems the Python devs' answer to this feature request is "We don't need it!". Yet context managers seem perfectly fine in general when using the lower-level API of asyncio; see PEP 343 - The "with" Statement, example 10:
This can be used to deterministically close anything with a close method, be it file, generator, or something else. It can even be used when the object isn't guaranteed to require closing (e.g., a function that accepts an arbitrary iterable)
So can we do it anyway?
Related links:
https://bugs.python.org/issue24795
https://bugs.python.org/issue32875
https://groups.google.com/g/python-tulip/c/8bRLexUzeU4
https://bugs.python.org/issue19860#msg205062
https://github.com/python/asyncio/issues/261
Yes, we can have a context manager for our event loop, even though there seems to be no good way to do it via subclassing, due to the C implementations (i.e.). Basically, the idea sketched out below is the following:
TL;DR
Create an object with __enter__ and __exit__ so that the with syntax works.
Instead of returning the wrapper object itself, return the loop served by asyncio.
Wrap asyncio's loop.close() so that the loop gets stopped and our __exit__ method is invoked before it.
Close all connections that can lead to memory leaks, then close the loop.
Side-note
The implementation uses a wrapper object that returns a new loop into an anonymous block statement. Be aware that loop.stop() will finalize the loop, and no further actions should be scheduled after it. Overall, the code below is just a little helper and more of a styling choice in my opinion, especially given that it is no real subclass. But I think if someone wants to use the lower-level API without having to mind finalizing everything, here is a possibility.
import asyncio

class OpenLoop:
    def close(self, *args, **kwargs):
        self._loop.stop()
    def _close_wrapper(self):
        self._close = self._loop.close
        self._loop.close = self.close
    def __enter__(self):
        self._loop = asyncio.new_event_loop()
        self._close_wrapper()
        return self._loop
    def __exit__(self, *exc_info):
        # The loop is stopped but not yet closed, so it can still run coroutines.
        # (run_until_complete, not asyncio.run: asyncio.run would spin up a second
        # loop, while these shutdown coroutines belong to self._loop.)
        self._loop.run_until_complete(self._loop.shutdown_asyncgens())
        self._loop.run_until_complete(self._loop.shutdown_default_executor())  # Python 3.9+
        # close other services
        self._close()

if __name__ == '__main__':
    with OpenLoop() as loop:
        loop.call_later(1, loop.close)
        loop.run_forever()
    assert loop.is_closed()

Python asyncio: Queue concurrent to normal code

Edit: I am closing this question.
As it turns out, my goal of having parallel HTTP posts is pointless. After implementing it successfully with aiohttp, I ran into deadlocks elsewhere in the pipeline.
I will reformulate this and post a single question in a few days.
Problem
I want to have a class that, during some other computation, holds generated data and can write it to a DB via HTTP (details below) when convenient. It's gotta be a class as it is also used to load/represent/manipulate data.
I have written a naive, nonconcurrent implementation that works:
The class is initialized and then used in a "main loop". In this loop, data is added to a naive "queue" (a list of HTTP requests); at certain intervals, the class calls a function to write those requests and clear the "queue".
As you would expect, this is IO-bound: whenever I need to write the "queue", the main loop halts. Furthermore, since the main computation runs on a GPU, the loop is not really CPU-bound either.
Essentially, I want to have a queue, and, say, ten workers running in the background and pushing items to the http connector, waiting for the push to finish and then taking on the next (or just waiting for the next write call, not a big deal). In the meantime, my main loop runs and adds to the queue.
Program example
My naive program looks something like this:
class data_storage(...):
    def add(...):
        ...

    def write_queue(self):
        if len(self.queue) > 0:
            res = self.connector.run(self.queue)
            self.queue = []

def main_loop(storage):
    # do many things
    for batch in dataset:  # simplified example
        # Do stuff
        for some_other_loop:
            (...)
            storage.add(results)
        # For example, call each iteration
        storage.write_queue()

if __name__ == "__main__":
    storage = data_storage()
    main_loop(storage)
    ...
In detail: the connector class is from the package 'neo4j-connector' and posts to my Neo4j database. It essentially does JSON formatting and uses Python's 'requests' library.
This works, even without a real queue, since nothing is concurrent.
Now I have to make it work concurrently.
From my research, I have seen that ideally I would want a "producer-consumer" pattern, where both are initialized via asyncio. I have only seen this implemented with functions, not classes, so I don't know how to approach it. With functions, my main loop would be a producer coroutine and my write function would become the consumer. Both are initiated as tasks on the queue and then gathered, where I'd initialize only one producer but many consumers.
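For reference, a minimal sketch of that producer-consumer shape with asyncio.Queue (hypothetical names; connector.run is assumed to be blocking, as in the question, so it is pushed onto a thread):

import asyncio

async def producer(queue, dataset):
    for batch in dataset:          # stands in for the main loop
        await queue.put(batch)
    await queue.put(None)          # sentinel: tell the consumers to stop

async def consumer(queue, connector):
    loop = asyncio.get_running_loop()
    while True:
        items = await queue.get()
        if items is None:
            await queue.put(None)  # pass the sentinel on to the other consumers
            break
        # connector.run blocks (it uses requests), so run it in a thread
        await loop.run_in_executor(None, connector.run, items)

async def main(connector, dataset):
    queue = asyncio.Queue(maxsize=100)
    consumers = [asyncio.create_task(consumer(queue, connector))
                 for _ in range(10)]
    await producer(queue, dataset)
    await asyncio.gather(*consumers)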
My issue is that the main loop includes parts that are already parallel (e.g. PyTorch). Asyncio is not thread safe, so I don't think I can just wrap everything in async decorators and make a co-routine. This is also precisely why I want the DB logic in a separate class.
I also don't actually want or need the main loop to run "concurrently" on the same thread with the workers. But it's fine if that's the outcome as the workers don't do much on the CPU. But technically speaking, I want multi-threading? I have no idea.
My only other option would be to write into the queue until it is "full", halt the loop and then use multiple threads to dump it to the DB. Still, this would be much slower than doing it while the main loop is running. My gain would be minimal, just concurrency while working through the queue. I'd settle for it if need be.
However, from a Stack Overflow post, I came up with this small change:
class data_storage(...):
    def add(...):
        ...

    def background(f):
        def wrapped(*args, **kwargs):
            return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
        return wrapped

    @background
    def write_queue(self):
        if len(self.queue) > 0:
            res = self.connector.run(self.queue)
            self.queue = []
Shockingly, this sort of "works" and is blazingly fast. Of course, since it's not a real queue, things get overwritten. Furthermore, it overwhelms or deadlocks the HTTP API and generally produces a load of errors.
But since this works in principle, I wonder if I could do the following:
class data_storage(...):
    def add(...):
        ...

    def background(f):
        def wrapped(*args, **kwargs):
            return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
        return wrapped

    @background
    def post(self, items):
        if len(items) > 0:
            self.nr_workers.increase()
            res = self.connector.run(items)
            self.nr_workers.decrease()

    def write_queue(self):
        if self.nr_workers < 10:
            items = self.queue.get(200)  # Extract and delete from queue, non-concurrent
            self.post(items)  # Add "worker"
for some hypothetical queue and nr_workers objects. Then, at the end of the main loop, have a function that blocks progress until the number of workers is zero and then clears the rest of the queue non-concurrently.
This seems like a monumentally bad idea, but I don't know how else to implement it. If this is terrible, I'd like to know before I put more work into it. Do you think it would work?
Otherwise, could you give me any pointers as how to approach this situation correctly?
Some key words, tools or things to research would of course be enough.
Thank you!

Is it possible to list all blocked Tornado coroutines?

I have a "gateway" app written in Tornado, using @tornado.gen.coroutine to transfer information from one handler to another. I'm trying to do some debugging/status testing. What I'd like to be able to do is enumerate all of the currently blocked/waiting coroutines that are live at a given moment. Is this information accessible somewhere in Tornado?
You may be thinking of the IOLoop's _handlers dict. Try adding this in a periodic callback:
def print_current_handlers():
    io_loop = ioloop.IOLoop.current()
    print(io_loop._handlers)
Update: I've checked the source code and now think there is no simple way to trace currently running gen.coroutines; A. Jesse Jiryu Davis is right!
But you can trace all "async" calls (yields) from coroutines - each yield from a generator goes through IOLoop.add_callback (http://www.tornadoweb.org/en/stable/ioloop.html#callbacks-and-timeouts).
So, by examining io_loop._callbacks you can see which yields are in the IOLoop right now.
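For example, a small sketch along the same lines (_callbacks, like _handlers, is a private attribute, so this is for debugging only):

from tornado import ioloop

def print_current_callbacks():
    io_loop = ioloop.IOLoop.current()
    print(io_loop._callbacks)  # callbacks queued by coroutine yields

ioloop.PeriodicCallback(print_current_callbacks, 1000).start()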
There's a lot of interesting stuff in here :) https://github.com/tornadoweb/tornado/blob/master/tornado/gen.py
No, there isn't, but you could perhaps create your own decorator that wraps gen.coroutine and updates a data structure when the coroutine begins.
import weakref
import functools

from tornado import gen
from tornado.ioloop import IOLoop

all_coroutines = weakref.WeakKeyDictionary()

def tracked_coroutine(fn):
    coro = gen.coroutine(fn)

    @functools.wraps(coro)
    def start(*args, **kwargs):
        future = coro(*args, **kwargs)
        all_coroutines[future] = str(fn)
        return future
    return start

@tracked_coroutine
def five_second_coroutine():
    yield gen.sleep(5)

@tracked_coroutine
def ten_second_coroutine():
    yield gen.sleep(10)

@gen.coroutine
def tracker():
    while True:
        running = list(all_coroutines.values())
        print(running)
        yield gen.sleep(1)

loop = IOLoop.current()
loop.spawn_callback(tracker)
loop.spawn_callback(five_second_coroutine)
loop.spawn_callback(ten_second_coroutine)
loop.start()
If you run this script for a few seconds you'll see two active coroutines, then one, then none.
Note the warning in the docs about the dictionary changing size during iteration; you should catch RuntimeError in tracker to deal with that problem.
This is a bit complex; you might get all you need much more simply by turning on Tornado's logging and using set_blocking_log_threshold.
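For instance (this relies on SIGALRM under the hood, so it only works on the main thread on Unix):

import logging
from tornado.ioloop import IOLoop

logging.basicConfig()  # make sure Tornado's warnings are actually emitted

# Log a stack trace whenever a callback blocks the loop for more than 0.5 s.
IOLoop.current().set_blocking_log_threshold(0.5)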

Parallelise web tasks with asyncio in Python

I'm trying to wrap my head around asyncio and aiohttp and for the first time in years programming makes me feel utterly stupid and incapable. Which is kind of beautiful, in a weirdo Zen way. But alas, there's work to get done.
I've got an existing class that can do numerous wondrous things on the web, like signing up to a web site, getting data, the works. And now I need like, 100 or 1000 of these little worker bees to sign up. Code looks roughly like this:
class Worker(object):
    def signup(self, ...):
        ...
        data = self.make_request(url, data)
        self.user_id = data.get("user_id")
        return self

    def make_request(self, url, data):
        response = requests.post(url, data=data)
        return response.json()

workers = [Worker().signup() for n in range(100)]
As you can see, we're using the requests module to make a POST request. However, this is blocking, so we'll have to wait for worker N to finish signing up before we start signing up worker N+1. Fortunately, the original author of the Worker class (that sounds charmingly Marxist) in her infinite wisdom wrapped every HTTP call in the self.make_request method, so making the whole Worker non-blocking should just be a matter of swapping out the requests library for a non-blocking one aaaaand Bob's your uncle, right? This is how far I got:
class AsyncWorker(Worker):
    @asyncio.coroutine
    def make_request(self, url, data):
        response = yield from aiohttp.request('post', url, data=data)
        return (yield from response.json())

coroutines = [AsyncWorker().signup() for n in range(100)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(coroutines))
loop.close()
But this raises AttributeError: 'generator' object has no attribute 'get' in the signup method where I do self.user_id = data.get("user_id"). And beyond that, I still don't have the workers in a neat dictionary. I'm aware that I'm most likely completely misunderstanding how asyncio works - but I already spent a day reading through various docs, mind-shattering tutorials by David Beazley, and masses of toy examples that are simple enough for me to understand but too simple to apply to this situation. How should I structure my worker and my async loop to sign up 100 workers in parallel and eventually get a list of all workers after they have signed up?
Once you use yield (or yield from) in a function, that function becomes a coroutine. That means you can't get a result by just calling it: you will get a generator object. You must at least do this:
@asyncio.coroutine
def some_coroutine(*args):
    # ...
    # ...
    yield from tasty.asyncio.function()
    return result

def coroutine_user():
    # data = some_coroutine() will give you a generator object instead of a result
    data = yield from some_coroutine()
    return data  # data here is a plain result: you can call your .get or whatever
Guess what happens when you call coroutine_user():
>>> coroutine_user()
<generator object coroutine_user at 0x7fe13b8a47e0>
The lack of the asyncio.coroutine decorator doesn't help at all: coroutines are contagious! To get a result in a function, you must use yield from - and that turns your function into another coroutine!
Though things aren't always that bad (usually you can manually iterate a generator object without relying on yield from), asyncio will specifically stop you from doing it: it breaks some internals (you can do it only from a Future or an asyncio.coroutine). So just use concurrent.futures or something similar unless you're going to turn all of your code into coroutines. As an alternative, isolate all users of aiohttp.request from the ordinary methods and work with both coroutine-based async workers and synchronous plain old code. Diving into asyncio and actually refactoring all your code is an option too, obviously: you basically need to put yield from before every call to any method infected with asyncio.
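To illustrate that last option, here is a rough sketch with signup turned into a coroutine as well, in the same era-appropriate @asyncio.coroutine/yield from style as the question (the URL is hypothetical):

import asyncio
import aiohttp

class AsyncWorker(object):
    @asyncio.coroutine
    def signup(self):
        # yield from is required here too: signup is now a coroutine itself
        data = yield from self.make_request("http://example.com/signup", {})
        self.user_id = data.get("user_id")
        return self

    @asyncio.coroutine
    def make_request(self, url, data):
        response = yield from aiohttp.request('post', url, data=data)
        return (yield from response.json())

loop = asyncio.get_event_loop()
# gather preserves order, so this returns the list of signed-up workers
workers = loop.run_until_complete(
    asyncio.gather(*[AsyncWorker().signup() for _ in range(100)]))
loop.close()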

Using Tornado and Twisted at the same time

I am in a weird situation where I have to use Twisted in a system built completely out of Tornado.
They can share the same IOLoop, so I know they can work together. My question is: can I safely use their coroutine decorators on the same function? For example:
import tornado.platform.twisted
tornado.platform.twisted.install()

...

@gen.engine
@defer.inlineCallbacks
def get(self):
    ...
    a = yield gen.Task(getA)          # tornado
    b = yield proxy.callRemote(getB)  # twisted
    ...
    defer.returnValue(a + b)          # twisted
They do work on the same IOLoop, so I am thinking this should be fine. Would there be any unforeseen consequences? Thanks in advance.
Looks like what you want is Cyclone, a web server framework for Python that implements the Tornado API as a Twisted protocol.
No, this wouldn't work. In your case inlineCallbacks is wrapped directly around your generator and gen.engine is wrapped outside it. The problem is that inlineCallbacks does not know anything about gen.Task, so it will yield it immediately (it has no way of passing it along to gen.engine).
To elaborate: if you yield obj inside an inlineCallbacks-wrapped generator, two things can happen:
obj is a Deferred in which case control is returned to the reactor until that Deferred fires.
obj is something else, in which case it is immediately sent back into your generator.
In your case, the result would be:
a = yield gen.Task(getA) # continues right through
# a is of type gen.Task here
b = yield proxy.callRemote(getB) # waits for result of proxy.callRemote
See here for how inlineCallbacks is implemented.
What is the right way to do this? Use either inlineCallbacks or gen.engine (but not both), and wrap the alien gen.Task (or Deferred) into the "native" form. I am not familiar with Tornado, but maybe this question helps.
Alternatively, write your own decorator like inlineCallbacks that handles gen.Task as well.
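For example, a rough sketch of the "wrap into native form" idea - converting a Twisted Deferred into a Tornado Future that gen.engine/gen.coroutine can yield (this helper is an illustration, not part of either library):

from tornado.concurrent import Future

def deferred_to_future(deferred):
    """Wrap a Twisted Deferred in a Tornado Future so Tornado coroutines can yield it."""
    future = Future()
    deferred.addCallbacks(
        future.set_result,
        lambda failure: future.set_exception(failure.value))
    return future

# Usage inside a Tornado coroutine (gen.Task and proxy as in the question):
#   a = yield gen.Task(getA)                               # tornado
#   b = yield deferred_to_future(proxy.callRemote(getB))   # twisted, wrapped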
