I have a view that takes a lot of memory and is asynchronous. Can I limit the number of connections working inside the handler function at the same time (like a critical section with at most N workers inside)?
Is this possible in Tornado?
Something like:
@tornado.web.asynchronous
def get(self):
    with critical_section(count=5):
        # some code
Thanks
Toro provides synchronization primitives similar to those found in the threading module for Tornado coroutines. You could use its BoundedSemaphore to gate entry to the handler body:
import toro
from tornado import gen

# global semaphore
sem = toro.BoundedSemaphore(5)

@gen.coroutine
def get(self):
    with (yield sem.acquire()):
        # do work
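For readers on asyncio rather than Toro, the same gating idea can be sketched with the standard library's asyncio.Semaphore. This is a stand-in technique, not Toro's API, and the names LIMIT, handler, run_handlers and the stats dictionary are made up for illustration:

```python
import asyncio

LIMIT = 5  # hypothetical cap on concurrent workers


async def handler(sem, stats):
    # At most LIMIT coroutines can be inside this block at once.
    async with sem:
        stats["inside"] += 1
        stats["peak"] = max(stats["peak"], stats["inside"])
        await asyncio.sleep(0.01)  # stand-in for the memory-hungry work
        stats["inside"] -= 1


async def run_handlers(n):
    # Create the semaphore inside the running loop.
    sem = asyncio.Semaphore(LIMIT)
    stats = {"inside": 0, "peak": 0}
    await asyncio.gather(*(handler(sem, stats) for _ in range(n)))
    return stats["peak"]


# Start 20 handlers and record the highest number that ran at once.
peak = asyncio.run(run_handlers(20))
```

Requests beyond the limit simply wait at the `async with` line without blocking the event loop.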
Short answer:
As far as I understand Tornado and other frameworks that use Future/Deferred/generator based concurrency, this is not possible. However, it should definitely be possible using higher-order functions, i.e. a critical_section() helper function that takes the body of the with-block as a parameter.
Long answer:
To the best of my knowledge, Tornado's concurrency works very much like that of Twisted, which means non-blocking calls are limited to using Futures and yield (based on Twisted's @inlineCallbacks or whatever the equivalent is in Tornado).
In order to implement a critical_section context manager, it would have to cooperate with the reactor internally; this can only happen using callbacks or yield. However, neither is composable with context managers.
I'd actually already thrown up some code until I remembered this. This is the code I came up with:
import sys
from contextlib import contextmanager
from collections import defaultdict
from tornado.concurrent import Future

_critical_sections = defaultdict(lambda: (0, []))

@contextmanager
def critical_section(count):
    # get the code location of the critical section
    frame = sys._getframe()
    orig_caller = frame.f_back.f_back
    lineno = orig_caller.f_lineno
    filename = orig_caller.f_code.co_filename
    loc = (filename, lineno)
    count, waiters = _critical_sections[loc]
    if count > 5:
        future = Future()
        _critical_sections[loc] = (count + 1, waiters + [future])
        # XXX: not possible; either have to set a callback or use yield, but
        # then this context manager itself would not work as you'd expect:
        future.wait()  # <---- not possible in Tornado nor Twisted; only in Gevent/Eventlet
        fn(*args, **kwargs)
    else:
        _critical_sections[loc] = (count + 1, waiters)
        try:
            yield
        finally:
            count, waiters = _critical_sections[loc]
            _, w_future = waiters[0]
            _critical_sections[loc] = (count, waiters[1:])
            w_future.set_result(None)
(I have not tested it in any way, and it's not runnable on Tornado anyway.)
Now, if you're happy with the proposed approach, here's something to get you started (or maybe it even works out of the box):
...
def _critical_section(count, fn, *args, **kwargs):
    ...
    if count > 5:
        future = Future()
        future.add_done_callback(lambda _: fn(*args, **kwargs))
        _critical_sections[loc] = (count + 1, waiters + [future])
        # XXX: not possible; either have to set a callback or use yield, but
        # then this context manager itself would not work as you'd expect:
        return future
    else:
        _critical_sections[loc] = (count + 1, waiters)
        try:
            return fn(*args, **kwargs)
        finally:
            ...  # same
Then you could just turn it into a decorator:
from functools import wraps

def critical_section(count):
    def decorate(fn):
        @wraps(fn)
        def ret(*args, **kwargs):
            return _critical_section(count, fn, *args, **kwargs)
        return ret
    return decorate
Usage:
@tornado.web.asynchronous
def get(self):
    @critical_section(count=5)
    def some_code():
        pass  # do stuff
Also, the code is using sys._getframe(), which has (at least) 2 implications:
it will make the code slower when run on PyPy (until/unless PyPy has become able to JIT-compile functions that use sys._getframe), but most of the time, it's an acceptable tradeoff when it comes to web code
I don't think the code will work if you compile it to .pyc and remove the .py—it shouldn't then be able to determine the filename and line number of the calling code, so it will not (probably) be possible to uniquely distinguish the location of the critical section, in which case you'd have to use lock objects.
NOTE: The context manager version would be perfectly feasible on Gevent (http://gevent.org) or Eventlet.
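On asyncio, the decorator idea can be sketched without sys._getframe() by giving each decorated coroutine function its own semaphore. This is only an illustration under that assumption; limit_concurrency, some_code and the stats dictionary are hypothetical names, and asyncio.Semaphore is swapped in for the hand-rolled waiter list above:

```python
import asyncio
from functools import wraps


def limit_concurrency(count):
    """Hypothetical decorator: allow at most `count` concurrent calls."""
    def decorate(fn):
        sem = None  # created lazily, inside the running event loop

        @wraps(fn)
        async def wrapper(*args, **kwargs):
            nonlocal sem
            if sem is None:
                sem = asyncio.Semaphore(count)
            async with sem:
                return await fn(*args, **kwargs)
        return wrapper
    return decorate


stats = {"active": 0, "peak": 0}


@limit_concurrency(2)
async def some_code(i):
    # Track how many calls are inside the section at once.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for real work
    stats["active"] -= 1
    return i * 2


async def main():
    return await asyncio.gather(*(some_code(i) for i in range(6)))


results = asyncio.run(main())
```

Because the semaphore belongs to the decorated function, the critical section is identified by the function itself rather than by a file name and line number.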
Related
When implementing classes that have uses in both synchronous and asynchronous applications, I find myself maintaining virtually identical code for both use cases.
Just as an example, consider:
from time import sleep
import asyncio

class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    async def a_ticker(self, to):
        for i in range(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        for i in range(to):
            yield i
            sleep(self.delay)

def func(ue):
    for value in ue.ticker(5):
        print(value)

async def a_func(ue):
    async for value in ue.a_ticker(5):
        print(value)

def main():
    ue = UselessExample(1)
    func(ue)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(a_func(ue))

if __name__ == '__main__':
    main()
In this example, it's not too bad, the ticker methods of UselessExample are easy to maintain in tandem, but you can imagine that exception handling and more complicated functionality can quickly grow a method and make it more of an issue, even though both methods can remain virtually identical (only replacing certain elements with their asynchronous counterparts).
Assuming there's no substantial difference that makes it worth having both fully implemented, what is the best (and most Pythonic) way of maintaining a class like this and avoiding needless duplication?
There is no one-size-fits-all road to making an asyncio coroutine-based codebase useable from traditional synchronous codebases. You have to make choices per codepath.
Pick and choose from a series of tools:
Synchronous versions using asyncio.run()
Provide synchronous wrappers around coroutines, which block until the coroutine completes.
Even an async generator function such as ticker() can be handled this way, in a loop:
class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    async def a_ticker(self, to):
        for i in range(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        agen = self.a_ticker(to)
        try:
            while True:
                yield asyncio.run(agen.__anext__())
        except StopAsyncIteration:
            return
These synchronous wrappers can be generated with helper functions:
from functools import wraps

def sync_agen_method(agen_method):
    @wraps(agen_method)
    def wrapper(self, *args, **kwargs):
        agen = agen_method(self, *args, **kwargs)
        try:
            while True:
                yield asyncio.run(agen.__anext__())
        except StopAsyncIteration:
            return
    if wrapper.__name__[:2] == 'a_':
        wrapper.__name__ = wrapper.__name__[2:]
    return wrapper
then just use ticker = sync_agen_method(a_ticker) in the class definition.
Straight-up coroutine methods (not generator coroutines) could be wrapped with:
def sync_method(async_method):
    @wraps(async_method)
    def wrapper(self, *args, **kwargs):
        return asyncio.run(async_method(self, *args, **kwargs))
    if wrapper.__name__[:2] == 'a_':
        wrapper.__name__ = wrapper.__name__[2:]
    return wrapper
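As a quick illustration of how such a wrapper might be used, here is a self-contained sketch. The Adder class and its a_add method are hypothetical, and sync_method is repeated so the snippet runs on its own:

```python
import asyncio
from functools import wraps


def sync_method(async_method):
    # Blocking wrapper: runs the coroutine to completion on a fresh event loop.
    @wraps(async_method)
    def wrapper(self, *args, **kwargs):
        return asyncio.run(async_method(self, *args, **kwargs))
    # Drop the "a_" prefix so the sync twin gets the plain name.
    if wrapper.__name__[:2] == 'a_':
        wrapper.__name__ = wrapper.__name__[2:]
    return wrapper


class Adder:  # hypothetical example class
    async def a_add(self, x, y):
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        return x + y

    # Synchronous twin generated from the coroutine method.
    add = sync_method(a_add)


total = Adder().add(1, 2)
```

Note that asyncio.run() raises a RuntimeError when called from a thread that already has a running event loop, so a wrapper like this is only suitable for purely synchronous callers.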
Factor out common components
Refactor out the synchronous parts, into generators, context managers, utility functions, etc.
For your specific example, pulling out the for loop into a separate generator would minimise the duplicated code to the way the two versions sleep:
class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    def _ticker_gen(self, to):
        yield from range(to)

    async def a_ticker(self, to):
        for i in self._ticker_gen(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        for i in self._ticker_gen(to):
            yield i
            sleep(self.delay)
While this doesn't make much of any difference here it can work in other contexts.
Abstract Syntax Tree transformation
Use AST rewriting and a map to transform coroutines into synchronous code. This can be quite fragile if you are not careful on how you recognise utility functions such as asyncio.sleep() vs time.sleep():
import inspect
import ast
import copy
import textwrap
import time

asynciomap = {
    # asyncio function to (additional globals, replacement source) tuples
    "sleep": ({"time": time}, "time.sleep")
}

class AsyncToSync(ast.NodeTransformer):
    def __init__(self):
        self.globals = {}

    def visit_AsyncFunctionDef(self, node):
        return ast.copy_location(
            ast.FunctionDef(
                node.name,
                self.visit(node.args),
                [self.visit(stmt) for stmt in node.body],
                [self.visit(stmt) for stmt in node.decorator_list],
                node.returns and self.visit(node.returns),
            ),
            node,
        )

    def visit_Await(self, node):
        return self.visit(node.value)

    def visit_Attribute(self, node):
        if (
            isinstance(node.value, ast.Name)
            and isinstance(node.value.ctx, ast.Load)
            and node.value.id == "asyncio"
            and node.attr in asynciomap
        ):
            g, replacement = asynciomap[node.attr]
            self.globals.update(g)
            return ast.copy_location(
                ast.parse(replacement, mode="eval").body,
                node
            )
        return node

def transform_sync(f):
    filename = inspect.getfile(f)
    lines, lineno = inspect.getsourcelines(f)
    ast_tree = ast.parse(textwrap.dedent(''.join(lines)), filename)
    ast.increment_lineno(ast_tree, lineno - 1)

    transformer = AsyncToSync()
    transformer.visit(ast_tree)
    transformed_globals = {**f.__globals__, **transformer.globals}
    exec(compile(ast_tree, filename, 'exec'), transformed_globals)
    return transformed_globals[f.__name__]
While the above is probably far from complete enough to fit all needs, and transforming AST trees can be daunting, the above would let you maintain just the async version and map that version to synchronous versions directly:
>>> import example
>>> del example.UselessExample.ticker
>>> example.main()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../example.py", line 32, in main
    func(ue)
  File "/.../example.py", line 21, in func
    for value in ue.ticker(5):
AttributeError: 'UselessExample' object has no attribute 'ticker'
>>> example.UselessExample.ticker = transform_sync(example.UselessExample.a_ticker)
>>> example.main()
0
1
2
3
4
0
1
2
3
4
async/await is infectious by design.
Accept that your code will have different users — synchronous and asynchronous, and that these users will have different requirements, that over time the implementations will diverge.
Publish separate libraries
For example, compare aiohttp vs. aiohttp-requests vs. requests.
Likewise, compare asyncpg vs. psycopg2.
How to get there
Opt1. (easy) clone implementation, allow them to diverge.
Opt2. (sensible) partial refactor, let e.g. async library depend on and import sync library.
Opt3. (radical) create a "pure" library that can be used both in sync and async program. For example, see https://github.com/python-hyper/hyper-h2 .
On the upside, testing is easier and more thorough. Consider how hard (or impossible) it is to force the test framework to evaluate all possible concurrent execution orders in an async program. A pure library doesn't need that :)
On the down-side this style of programming requires different thinking, is not always straightforward, and may be suboptimal. For example, instead of await socket.read(2**20) you'd write for event in fsm.push(data): ... and rely on your library user to provide you with data in good-sized chunks.
For context, see the backpressure argument in https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/
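To make the "pure" library option concrete, here is a minimal sketch of a protocol core that does no I/O itself: bytes are pushed in and events (here, complete lines) come out, so the same class can be driven from synchronous and asynchronous code. LineSplitter and both driver functions are hypothetical names invented for this illustration and are not taken from hyper-h2:

```python
import asyncio


class LineSplitter:
    """Hypothetical sans-I/O core: bytes in, complete lines out."""

    def __init__(self):
        self._buffer = b""

    def push(self, data):
        # Generator: it must be iterated for the data to be consumed.
        self._buffer += data
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            yield line


def read_lines_sync(chunks):
    # Synchronous driver: the caller supplies the data chunks.
    fsm = LineSplitter()
    return [line for chunk in chunks for line in fsm.push(chunk)]


async def read_lines_async(chunks):
    # Asynchronous driver around the very same core.
    fsm = LineSplitter()
    lines = []
    for chunk in chunks:
        await asyncio.sleep(0)  # stand-in for awaiting a socket read
        lines.extend(fsm.push(chunk))
    return lines


chunks = [b"hel", b"lo\nwor", b"ld\n", b"partial"]
sync_lines = read_lines_sync(chunks)
async_lines = asyncio.run(read_lines_async(chunks))
```

Only the thin drivers differ between the two worlds; the parsing logic lives in one place and can be unit-tested without an event loop.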
I would like to combine pytest and trio (or curio, if that is any easier), i.e. write my test cases as coroutine functions. This is relatively easy to achieve by declaring a custom test runner in conftest.py:
@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    '''If item is a coroutine function, run it under trio'''
    if not inspect.iscoroutinefunction(pyfuncitem.obj):
        return
    kernel = trio.Kernel()
    funcargs = pyfuncitem.funcargs
    testargs = {arg: funcargs[arg]
                for arg in pyfuncitem._fixtureinfo.argnames}
    try:
        kernel.run(functools.partial(pyfuncitem.obj, **testargs))
    finally:
        kernel.run(shutdown=True)
    return True
This allows me to write test cases like this:
async def test_something():
    server = MockServer()
    server_task = await trio.run(server.serve)
    try:
        # test the server
    finally:
        server.please_terminate()
        try:
            with trio.fail_after(30):
                server_task.join()
        except TooSlowError:
            server_task.cancel()
But this is a lot of boilerplate. In non-async code, I would factor this out into a fixture:
@pytest.yield_fixture()
def mock_server():
    server = MockServer()
    thread = threading.Thread(target=server.serve)
    thread.start()
    try:
        yield server
    finally:
        server.please_terminate()
        thread.join()
        server.server_close()

def test_something(mock_server):
    # do the test..
Is there a way to do the same in trio, i.e. implement async fixtures? Ideally, I would just write:
async def test_something(mock_server):
    # do the test..
Edit: the answer below is mostly irrelevant now – instead use pytest-trio and follow the instructions in its manual.
Your example pytest_pyfunc_call code doesn't work because it's a mix of trio and curio :-). For trio, there's a decorator trio.testing.trio_test that can be used to mark individual tests (like if you were using classic unittest or something), so the simplest way to write a pytest plugin function is to just apply this to each async test:
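import inspect
import pytest
from trio.testing import trio_test

@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    # pyfuncitem.obj is the test function itself, so check for a coroutine function
    if inspect.iscoroutinefunction(pyfuncitem.obj):
        # Apply the @trio_test decorator
        pyfuncitem.obj = trio_test(pyfuncitem.obj)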
In case you're curious, this is basically equivalent to:
import trio
from functools import wraps, partial

@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    if inspect.iscoroutinefunction(pyfuncitem.obj):
        fn = pyfuncitem.obj
        @wraps(fn)
        def wrapper(**kwargs):
            trio.run(partial(fn, **kwargs))
        pyfuncitem.obj = wrapper
Anyway, that doesn't solve your problem with fixtures – for that you need something much more involved.
I have a problem related to Tornado coroutines.
There is a Python model A, which has the ability to execute some function. The function can be set from outside of the model. I can't change the model itself, but I can pass in any function I want. I'm trying to teach it to work with Tornado's ioloop through my function, but I couldn't.
Here is the snippet:
import functools
import pprint
from tornado import gen
from tornado import ioloop

class A:
    f = None
    def execute(self):
        return self.f()
    pass

@gen.coroutine
def genlist():
    raise gen.Return(range(1, 10))

@gen.coroutine
def some_work():
    a = A()
    a.f = functools.partial(
        ioloop.IOLoop.instance().run_sync,
        lambda: genlist())
    print "a.f set"
    raise gen.Return(a)

@gen.coroutine
def main():
    a = yield some_work()
    retval = a.execute()
    raise gen.Return(retval)

if __name__ == "__main__":
    pprint.pprint(ioloop.IOLoop.current().run_sync(main))
So the thing is that I set the function in one part of code, but execute it in the other part with the method of the model.
Now, Tornado 4.2.1 gives me "IOLoop is already running", but in Tornado 3.1.1 it works (though I don't know exactly how).
I know the following:
I can create a new ioloop, but I would like to use the existing ioloop.
I can wrap genlist with some function which knows that genlist's result is a Future, but I don't know how to block execution inside a synchronous function until the future's result is set.
Also, I can't use the result of a.execute() as a future object because a.execute() could be called from other parts of the code, i.e. it should return a list instance.
So, my question is: is there any way to execute the asynchronous genlist from the synchronous model's method using the current IOLoop?
You cannot restart the outer IOLoop here. You have three options:
Use asynchronous interfaces everywhere: change a.execute() and everything up to the top of the stack into coroutines. This is the usual pattern for Tornado-based applications; trying to straddle the synchronous and asynchronous worlds is difficult and it's better to stay on one side or the other.
Use run_sync() on a temporary IOLoop. This is what Tornado's synchronous tornado.httpclient.HTTPClient does, which makes it safe to call from within another IOLoop. However, if you do it this way the outer IOLoop remains blocked, so you have gained nothing by making genlist asynchronous.
Run a.execute on a separate thread and call back to the main IOLoop's thread for the inner function. If a.execute cannot be made asynchronous, this is the only way to avoid blocking the IOLoop while it is running.
import concurrent.futures
import tornado.concurrent

executor = concurrent.futures.ThreadPoolExecutor(8)

@gen.coroutine
def some_work():
    a = A()
    def adapter():
        # Convert the thread-unsafe tornado.concurrent.Future
        # to a thread-safe concurrent.futures.Future.
        # Note that everything including chain_future must happen
        # on the IOLoop thread.
        future = concurrent.futures.Future()
        ioloop.IOLoop.instance().add_callback(
            lambda: tornado.concurrent.chain_future(
                genlist(), future))
        return future.result()
    a.f = adapter
    print "a.f set"
    raise gen.Return(a)

@gen.coroutine
def main():
    a = yield some_work()
    retval = yield executor.submit(a.execute)
    raise gen.Return(retval)
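The same "worker thread calls back into the loop" pattern can be sketched with plain asyncio, where asyncio.run_coroutine_threadsafe() plays the role of the add_callback/chain_future pair. This is an asyncio stand-in rather than Tornado code; A and genlist mirror the question, and the rest is illustrative:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def genlist():
    await asyncio.sleep(0)  # stand-in for real asynchronous work
    return list(range(1, 10))


class A:
    """The synchronous model from the question, left unchanged."""
    f = None

    def execute(self):
        return self.f()


async def main():
    loop = asyncio.get_running_loop()
    a = A()

    def adapter():
        # Runs on the worker thread: schedule the coroutine on the loop's
        # thread and block this thread (not the loop) until it finishes.
        future = asyncio.run_coroutine_threadsafe(genlist(), loop)
        return future.result()

    a.f = adapter
    with ThreadPoolExecutor(max_workers=1) as executor:
        # The synchronous a.execute() runs off the loop thread,
        # so the event loop stays free to run genlist().
        return await loop.run_in_executor(executor, a.execute)


result = asyncio.run(main())
```

Calling adapter() directly on the event loop thread would deadlock, because future.result() would block the very loop that has to run genlist().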
Say, your function looks something like this:
@gen.coroutine
def foo():
    # does slow things
or
@concurrent.run_on_executor
def bar(i=1):
    # does slow things
You can run foo() like so:
from tornado.ioloop import IOLoop
loop = IOLoop.current()
loop.run_sync(foo)
You can run bar(..), or any coroutine that takes args, like so:
from functools import partial
from tornado.ioloop import IOLoop
loop = IOLoop.current()
f = partial(bar, i=100)
loop.run_sync(f)
I read the official tutorial on test-driven development, but it hasn't been very helpful in my case. I've written a small library that makes extensive use of twisted.web.client.Agent and its subclasses (BrowserLikeRedirectAgent, for instance), but I've been struggling in adapting the tutorial's code to my own test cases.
I had a look at twisted.web.test.test_web, but I don't understand how to make all the pieces fit together. For instance, I still have no idea how to get a Protocol object from an Agent, as per the official tutorial
Can anybody show me how to write a simple test for some code that relies on Agent to GET and POST data? Any additional details or advice is most welcome...
Many thanks!
How about making life simpler (i.e. code more readable) by using @inlineCallbacks?
In fact, I'd even go as far as to suggest staying away from using Deferreds directly, unless absolutely necessary for performance or in a specific use case, and instead always sticking to @inlineCallbacks; this way you'll keep your code looking like normal code, while benefitting from non-blocking behavior:
from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.internet.defer import inlineCallbacks
from twisted.trial import unittest
from twisted.web.http_headers import Headers
from twisted.internet.error import DNSLookupError

class SomeTestCase(unittest.TestCase):
    @inlineCallbacks
    def test_smth(self):
        ag = Agent(reactor)
        response = yield ag.request('GET', 'http://example.com/', Headers({'User-Agent': ['Twisted Web Client Example']}), None)
        self.assertEquals(response.code, 200)

    @inlineCallbacks
    def test_exception(self):
        ag = Agent(reactor)
        try:
            yield ag.request('GET', 'http://exampleeee.com/', Headers({'User-Agent': ['Twisted Web Client Example']}), None)
        except DNSLookupError:
            pass
        else:
            self.fail()
Trial should take care of the rest, i.e. waiting on the Deferreds returned from the test functions (@inlineCallbacks-wrapped callables also "magically" return a Deferred; I strongly suggest reading more on @inlineCallbacks if you're not familiar with it yet).
P.S. there's also a Twisted "plugin" for nosetests that enables you to return Deferreds from your test functions and have nose wait until they are fired before exiting: http://nose.readthedocs.org/en/latest/api/twistedtools.html
This is similar to what mike said, but attempts to test response handling. There are other ways of doing this, but I like this way. Also I agree that testing things that wrap Agent isn't too helpful and testing your protocol/keeping logic in your protocol is probably better anyway but sometimes you just want to add some green ticks.
class MockResponse(object):
    def __init__(self, response_string):
        self.response_string = response_string

    def deliverBody(self, protocol):
        protocol.dataReceived(self.response_string)
        protocol.connectionLost(None)

class MockAgentDeliverStuff(Agent):
    def request(self, method, uri, headers=None, bodyProducer=None):
        d = Deferred()
        reactor.callLater(0, d.callback, MockResponse(response_body))
        return d

class MyWrapperTestCase(unittest.TestCase):
    def setUp(self):
        agent = MockAgentDeliverStuff(reactor)
        self.wrapper_object = MyWrapper(agent)

    @inlineCallbacks
    def test_something(self):
        response_object = yield self.wrapper_object("example.com")
        self.assertEqual(response_object, expected_object)
How about this? Run trial on the following. Basically you're just mocking away Agent and pretending it does as advertised, and using FakeAgent to (in this case) fail all requests. If you actually want to inject data into the transport, that would take "more doing" I guess. But are you really testing your code, then? Or Agent's?
from twisted.web import client
from twisted.internet import reactor, defer
from twisted.trial import unittest

class BidnessLogik(object):
    def __init__(self, agent):
        self.agent = agent
        self.money = None

    def make_moneee_quik(self):
        d = self.agent.request('GET', 'http://no.traffic.plz')
        d.addCallback(self.made_the_money).addErrback(self.no_dice)
        return d

    def made_the_money(self, *args):
        ##print "Moneeyyyy!"
        self.money = True
        return 'money'

    def no_dice(self, fail):
        ##print "Better luck next time!!"
        self.money = False
        return 'no dice'

class FailingAgent(client.Agent):
    expected_uri = 'http://no.traffic.plz'
    expected_method = 'GET'
    reasons = ['No Reason']
    test = None

    def request(self, method, uri, **kw):
        if self.test:
            self.test.assertEqual(self.expected_uri, uri)
            self.test.assertEqual(self.expected_method, method)
            self.test.assertEqual([], list(kw.keys()))
        return defer.fail(client.ResponseFailed(reasons=self.reasons,
                                                response=None))

class TestRequest(unittest.TestCase):
    def setUp(self):
        self.agent = FailingAgent(reactor)
        self.agent.test = self

    @defer.inlineCallbacks
    def test_foo(self):
        bid = BidnessLogik(self.agent)
        resp = yield bid.make_moneee_quik()
        self.assertEqual(resp, 'no dice')
        self.assertEqual(False, bid.money)
I've been using testbed, webtest, and nose to test my Python GAE app, and it is a great setup. I'm now implementing something similar to Nick's great example of using the deferred library, but I can't figure out a good way to test the parts of the code triggered by DeadlineExceededError.
Since this is in the context of a taskqueue, it would be painful to construct a test that took more than 10 minutes to run. Is there a way to temporarily set the taskqueue time limit to a few seconds for the purpose of testing? Or perhaps some other way to elegantly test the execution of code in the except DeadlineExceededError block?
Abstract the "GAE context" for your code: in production, provide the real GAE implementation; for testing, provide a mock one that raises DeadlineExceededError. The test should not depend on any timeout and should be fast.
Sample abstraction (just glue):
class AbstractGAETaskContext(object):
    def task_expired(self): pass  # this will throw an exception in the mock implementation
    # here you define any method that you call into GAE, to be mocked
    def deferred(...): pass
If you don't like the abstraction, you can monkey-patch for testing only; you still need to define the task_expired function as your hook for testing.
task_expired should be called during your task implementation function.
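A minimal sketch of this idea, runnable outside GAE, might look like the following. Everything here is an assumption made for illustration: DeadlineExceededError is a local stand-in for the real GAE exception, and FakeTaskContext and process are hypothetical names:

```python
class DeadlineExceededError(Exception):
    """Local stand-in for the GAE exception so the sketch runs anywhere."""


class FakeTaskContext:
    """Hypothetical test double: raises after a fixed number of checks."""

    def __init__(self, checks_before_deadline):
        self.remaining = checks_before_deadline

    def task_expired(self):
        if self.remaining == 0:
            raise DeadlineExceededError()
        self.remaining -= 1


def process(items, context):
    """Process items until finished or the deadline hits.

    Returns (processed_items, index_to_resume_from_or_None).
    """
    done = []
    try:
        for index, item in enumerate(items):
            context.task_expired()  # hook called once per unit of work
            done.append(item.upper())
    except DeadlineExceededError:
        return done, index
    return done, None


# The fake context lets two items through, then simulates the deadline.
done, resume_at = process(["a", "b", "c", "d"], FakeTaskContext(2))
```

The test runs instantly because the "deadline" is triggered by a counter rather than by the clock.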
*UPDATED* This is the third solution:
First, I want to mention that Nick's sample implementation is not so great: the Mapper class has too many responsibilities (deferring, querying data, updating in batches), and this makes the tests hard to write, since a lot of mocks need to be defined. So I extracted the deferring responsibility into a separate class. You only want to test the deferring mechanism; what actually happens (the update, the query, etc.) should be handled in other tests.
Here is the deferring class; it also no longer depends on GAE:
class DeferredCall(object):
    def __init__(self, deferred):
        self.deferred = deferred

    def run(self, long_execution_call, context, *args, **kwargs):
        ''' long_execution_call should return a tuple (next_context, timed_out) that tells us the context where the operation was abandoned and whether it stopped because of a timeout '''
        next_context, timeouted = long_execution_call(context, *args, **kwargs)
        if timeouted:
            # pass long_execution_call along so the re-run knows what to call
            self.deferred(self.run, long_execution_call, next_context, *args, **kwargs)
Here is the test module:
import unittest

class Test(unittest.TestCase):
    def test_defer(self):
        calls = []
        def mock_deferrer(callback, *args, **kwargs):
            calls.append((callback, args, kwargs))
        def interrupted(context):
            return "new_context", True
        d = DeferredCall(mock_deferrer)
        d.run(interrupted, "init_context")
        self.assertEqual(1, len(calls), 'a deferred call should be made')

    def test_no_defer(self):
        calls = []
        def mock_deferrer(callback, *args, **kwargs):
            calls.append((callback, args, kwargs))
        def completed(context):
            return None, False
        d = DeferredCall(mock_deferrer)
        d.run(completed, "init_context")
        self.assertEqual(0, len(calls), 'no deferred call should be made')
Here is how Nick's Mapper implementation will look:
class Mapper:
    ...
    def _continue(self, start_key, batch_size):
        ...  # here is same code, nothing was changed
        except DeadlineExceededError:
            # Write any unfinished updates to the datastore.
            self._batch_write()
            # Queue a new task to pick up where we left off.
            ##deferred.defer(self._continue, start_key, batch_size)
            return start_key, True  ## make it compatible with DeferredCall
        self.finish()
        return None, False  ## make it compatible with DeferredCall
    runner = _continue
Here is the code where you register the long-running task; it depends only on the GAE deferred library.
import DeferredCall
import PersonMapper # this inherits the Mapper
from google.appengine.ext import deferred
mapper = PersonMapper()
DeferredCall(deferred).run(mapper.run)