Trouble with async context and structuring my code - python

I am using a library that requires an async context (aioboto3).
My issue is that I can't call methods on my custom S3StreamingFile instance from outside the async with block. If I do so, Python raises an exception telling me that HttpClient is None.
I want to access the class methods of S3StreamingFile from an outer function, for example in an API route. I don't want to return anything more (from file_2.py) than the S3StreamingFile class instance to the caller (file_3.py). The aioboto3-related code can't be moved to file_3.py; file_1.py and file_2.py need to contain the aioboto3-related logic.
How can I solve this?
Example of non-working code:
# file_1.py
class S3StreamingFile():
    def __init__(self, s3_object):
        self.s3_object = s3_object

    async def size(self):
        return await self.s3_object.content_length  # raises exception, HttpClient is None
    ...

# file_2.py
async def get_file():
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        s3_file = S3StreamingFile(s3_object)
        return s3_file

# file_3.py
async def main():
    s3_file = await get_file()
    size = await s3_file.size()  # raises exception, HttpClient is None
Example of working code:
# file_1.py
class S3StreamingFile():
    def __init__(self, s3_object):
        self.s3_object = s3_object

    async def size(self):
        return await self.s3_object.content_length
    ...

# file_2.py
async def get_file():
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        s3_file = S3StreamingFile(s3_object)
        size = await s3_file.size()  # works OK here, HttpClient is available
        return s3_file

# file_3.py
async def main():
    s3_file = await get_file()

I want to access the class methods from an outer function... how do I solve this?
Don't. This library is using async context managers to handle resource acquisition/release. The whole point about the context manager is that things like s3_file.size() only make sense when you have acquired the relevant resource (here the s3 file instance).
But how do you use this data in the rest of your program? In general---since you haven't said what the rest of your program is or why you want this data---there are two approaches:
acquire the resource somewhere else, and then make it available in much larger scopes, or
make your other functions resource-aware.
In the first case, you'd acquire the resource before all the logic runs, and then hold on to it. (This might look like RAII.) This might well make sense in smaller scripts, or when a resource is designed to be held by only one process at a time. It's a poor fit for code which will spend most of its time doing nothing, or has to coexist with other users of the resource. (An extension of this is writing your own code as a context manager, and effectively moving the problem up the calling stack. If each code path only handles one resource, this might well be the way to go.)
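For the question's code, that extension up the calling stack might look like the following sketch (get_s3_file is an invented name, and s3.resource(...) / resource.Object(...) stand in for the real aioboto3 setup, exactly as in the question):
from contextlib import asynccontextmanager

@asynccontextmanager
async def get_s3_file():
    # acquire the resource and hold it for as long as the caller needs it
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        yield S3StreamingFile(s3_object)

# file_3.py now takes part in the resource's lifetime:
async def main():
    async with get_s3_file() as s3_file:
        size = await s3_file.size()  # HttpClient is still available here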
In the second, you'd write your higher-level functions to be aware that they're accessing a resource. You might do this by passing the resource itself around:
def get_file(resource: AcquiredResource) -> FileResource:
    ...

def get_size(thing: AcquirableResource) -> int:
    with thing as resource:
        s3_file = get_file(resource)
        return s3_file.size
(using made-up generic types here to illustrate the point).
Or you might want a static copy of (some) attrs of a particular resource, like a file here, and a step where you build that copy. Personally I would likely store those in a dict or local object to make it clear that I wasn't handling the resource itself.
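Against the question's code, that snapshot step might look like this sketch (get_file_info is an invented name; the placeholders mirror the question's own code):
async def get_file_info():
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        s3_file = S3StreamingFile(s3_object)
        # copy plain data out while the resource is still acquired
        return {"size": await s3_file.size()}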
The basic idea here is that the with block guards access to a potentially difficult-to-acquire resource. That safety is built into the library, but it comes at the cost of having to think about the acquisition and structure it into the flow of your code.

How to test operations in a context manager using pytest

I have a database handler that utilizes SQLAlchemy ORM to communicate with a database. As part of SQLAlchemy's recommended practices, I interact with the session by using it as a context manager. How can I test what a function, called inside the context manager and using that context manager, has done?
EDIT: I realized the file structure mattered due to the complexity it introduced. I re-structured the code below to more closely mirror what the end file structure will be like, and what a common production repo in my environment would look like, with code being defined in one file and tests in a completely separate file.
For example:
Code File (delete_things_from_table.py):
from db_handler import delete, SomeTable
def delete_stuff(handler):
stmt = delete(SomeTable)
with handler.Session.begin() as session:
session.execute(stmt)
session.commit()
Test File:
import pytest
import delete_things_from_table as dlt
from db_handler import Handler

def test_delete_stuff():
    handler = Handler()
    dlt.delete_stuff(handler)
    # Test that session.execute was called
    # Test the value of 'stmt'
    # Test that session.commit was called
I am not looking for a solution specific to SQLAlchemy; I am only utilizing this to highlight what I want to test within a context manager, and any strategies for testing context managers are welcome.
After sleeping on it, I came up with a solution. I'd love additional/less complex solutions if there are any available, but this works:
import pytest
import delete_things_from_table as dlt
from db_handler import Handler

class MockSession:
    def __init__(self):
        self.execute_params = []
        self.commit_called = False

    def execute(self, *args, **kwargs):
        self.execute_params.append(["call", args, kwargs])
        return self

    def commit(self):
        self.commit_called = True
        return self

    def begin(self):
        return self

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        pass

def test_delete_stuff(monkeypatch):
    handler = Handler()
    # Parens on 'MockSession' below are important: pass an instance, not the class
    monkeypatch.setattr(handler, "Session", MockSession())
    dlt.delete_stuff(handler)
    # Test that session.execute was called
    assert len(handler.Session.execute_params)
    # Test the value of 'stmt'
    assert str(handler.Session.execute_params[0][1][0]) == "DELETE FROM some_table"
    # Test that session.commit was called
    assert handler.Session.commit_called
Some key things to note:
I created a static mock instead of a MagicMock as it's easier to control the methods/data flow with a custom mock class
Since the SQLAlchemy session context manager requires a begin() to start the context, my mock class needed a begin() method. Returning self from begin() allows us to test the values later.
Context managers rely on the magic methods __enter__ and __exit__ with the argument signatures you see above.
The mocked class contains mocked methods which record state in instance variables, allowing us to make assertions later.
This relies on monkeypatch (there are other ways, I'm sure), but what's important to note is that when you pass your mock class, you want to patch in an instance of the class and not the class itself. The parentheses make a world of difference.
I don't think it's an elegant solution, but it's working. I'll happily take any suggestions for improvement.
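One possible simplification, sketched here untested: a MagicMock already implements the context-manager protocol, so it can stand in for the whole Session attribute as long as begin() is wired to yield the mock itself:
import delete_things_from_table as dlt
from unittest.mock import MagicMock
from db_handler import Handler

def test_delete_stuff_magicmock(monkeypatch):
    handler = Handler()
    session = MagicMock()
    # make 'with handler.Session.begin() as session:' yield our mock
    session.begin.return_value.__enter__.return_value = session
    monkeypatch.setattr(handler, "Session", session)
    dlt.delete_stuff(handler)
    assert str(session.execute.call_args[0][0]) == "DELETE FROM some_table"
    session.commit.assert_called_once()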

Testing two methods in an Azure function in python

I am writing tests for my Azure function, and for some reason I can't mock a function call. I should also mention this is the first time I'm writing a Python test case, so be nice :)
def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    try:
        req_body = req.get_json()
    except ValueError as error:
        logging.info(error)
    download_excel(req_body)
    return func.HttpResponse(
        "This HTTP triggered function executed successfully.",
        status_code=200
    )
So that's the initial function. It calls download_excel and passes the request body. The next function receives the request body and writes that Excel file to blob storage.
def download_excel(request_body: Any):
    excel_file = request_body["items_excel"]
    # initiate the blob storage client
    blob_service_client = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    container = blob_service_client.get_container_client(CONTAINER_NAME)
    blob_path = "excel-path/items.xlsx"
    blob_client = container.get_blob_client(blob_path)
    blob_client.upload_blob_from_url(excel_file)
Those are the two functions: receive a file, save it to blob storage. But I can't mock the download_excel call in the main function. I've tried using mock and patch, and went through all sorts of links, but I just can't find a way to achieve this. Any help would be appreciated. Here is what I currently have in the test file.
class TestFunction(unittest.TestCase):
    # @patch('download_excel')
    def get_excel_files_main(self):
        """Test main function"""
        req = Mock()
        resp = main(req)
        # download_excel = MagicMock()
        self.assertEqual(resp.status_code, 200)
Commenting out the function call in the function and in the test makes the test pass, but I need to know how to mock the download_excel call. I'm still going to write a test case for the download_excel function, but will cross that bridge when I get to it.
Figured it out. I'm pretty silly. The main issue was that, in an Azure function, I figured since there was no class I could ignore every example in the docs that had to do with classes.
The trick is to use the function name as if it were a class. So say you have a function named http_trigger, and an __init__.py file within that function folder. Within that init file you have your main method, and a second method that's called from the main method. You can use MagicMock:
from unittest.mock import MagicMock

import function_name

def test_main_function(self):
    """Testing main function"""
    function_name.second_method_being_called = MagicMock()
That's it. That's how you mock it! *facepalm*
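Put together, a fuller version of that test might look like the sketch below (function_name and download_excel are the names from the question; adjust them to your own function folder):
import unittest
from unittest.mock import MagicMock, Mock

import function_name  # the Azure function folder, imported as a module

class TestFunction(unittest.TestCase):
    def test_main_function(self):
        """Testing main function"""
        function_name.download_excel = MagicMock()  # swap in the mock before calling main
        req = Mock()
        resp = function_name.main(req)
        self.assertEqual(resp.status_code, 200)
        function_name.download_excel.assert_called_once()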

How to make an unified interface for synchronous and asynchronous code in Python? [duplicate]

When implementing classes that have uses in both synchronous and asynchronous applications, I find myself maintaining virtually identical code for both use cases.
Just as an example, consider:
from time import sleep
import asyncio

class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    async def a_ticker(self, to):
        for i in range(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        for i in range(to):
            yield i
            sleep(self.delay)

def func(ue):
    for value in ue.ticker(5):
        print(value)

async def a_func(ue):
    async for value in ue.a_ticker(5):
        print(value)

def main():
    ue = UselessExample(1)
    func(ue)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(a_func(ue))

if __name__ == '__main__':
    main()
In this example it's not too bad: the ticker methods of UselessExample are easy to maintain in tandem. But you can imagine that exception handling and more complicated functionality can quickly grow a method and make it more of an issue, even though both methods can remain virtually identical (only replacing certain elements with their asynchronous counterparts).
Assuming there's no substantial difference that makes it worth having both fully implemented, what is the best (and most Pythonic) way of maintaining a class like this and avoiding needless duplication?
There is no one-size-fits-all road to making an asyncio coroutine-based codebase useable from traditional synchronous codebases. You have to make choices per codepath.
Pick and choose from a series of tools:
Synchronous versions using asyncio.run()
Provide synchronous wrappers around coroutines, which block until the coroutine completes.
Even an async generator function such as ticker() can be handled this way, in a loop:
class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    async def a_ticker(self, to):
        for i in range(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        agen = self.a_ticker(to)
        try:
            while True:
                yield asyncio.run(agen.__anext__())
        except StopAsyncIteration:
            return
These synchronous wrappers can be generated with helper functions:
from functools import wraps

def sync_agen_method(agen_method):
    @wraps(agen_method)
    def wrapper(self, *args, **kwargs):
        agen = agen_method(self, *args, **kwargs)
        try:
            while True:
                yield asyncio.run(agen.__anext__())
        except StopAsyncIteration:
            return
    if wrapper.__name__[:2] == 'a_':
        wrapper.__name__ = wrapper.__name__[2:]
    return wrapper
then just use ticker = sync_agen_method(a_ticker) in the class definition.
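So the class body might end up as (sketch, reusing the a_ticker definition from above):
class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    async def a_ticker(self, to):
        for i in range(to):
            yield i
            await asyncio.sleep(self.delay)

    # the synchronous variant is generated, not hand-maintained
    ticker = sync_agen_method(a_ticker)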
Straight-up coroutine methods (not generator coroutines) could be wrapped with:
def sync_method(async_method):
    @wraps(async_method)
    def wrapper(self, *args, **kwargs):
        return asyncio.run(async_method(self, *args, **kwargs))
    if wrapper.__name__[:2] == 'a_':
        wrapper.__name__ = wrapper.__name__[2:]
    return wrapper
Factor out common components
Refactor out the synchronous parts, into generators, context managers, utility functions, etc.
For your specific example, pulling out the for loop into a separate generator would minimise the duplicated code to the way the two versions sleep:
class UselessExample:
    def __init__(self, delay):
        self.delay = delay

    def _ticker_gen(self, to):
        yield from range(to)

    async def a_ticker(self, to):
        for i in self._ticker_gen(to):
            yield i
            await asyncio.sleep(self.delay)

    def ticker(self, to):
        for i in self._ticker_gen(to):
            yield i
            sleep(self.delay)
While this doesn't make much of a difference here, it can work in other contexts.
Abstract Syntax Tree transformation
Use AST rewriting and a map to transform coroutines into synchronous code. This can be quite fragile if you are not careful about how you recognise utility functions such as asyncio.sleep() vs time.sleep():
import inspect
import ast
import copy
import textwrap
import time

asynciomap = {
    # asyncio function to (additional globals, replacement source) tuples
    "sleep": ({"time": time}, "time.sleep")
}

class AsyncToSync(ast.NodeTransformer):
    def __init__(self):
        self.globals = {}

    def visit_AsyncFunctionDef(self, node):
        return ast.copy_location(
            ast.FunctionDef(
                node.name,
                self.visit(node.args),
                [self.visit(stmt) for stmt in node.body],
                [self.visit(stmt) for stmt in node.decorator_list],
                node.returns and self.visit(node.returns),
            ),
            node,
        )

    def visit_Await(self, node):
        return self.visit(node.value)

    def visit_Attribute(self, node):
        if (
            isinstance(node.value, ast.Name)
            and isinstance(node.value.ctx, ast.Load)
            and node.value.id == "asyncio"
            and node.attr in asynciomap
        ):
            g, replacement = asynciomap[node.attr]
            self.globals.update(g)
            return ast.copy_location(
                ast.parse(replacement, mode="eval").body,
                node
            )
        return node

def transform_sync(f):
    filename = inspect.getfile(f)
    lines, lineno = inspect.getsourcelines(f)
    ast_tree = ast.parse(textwrap.dedent(''.join(lines)), filename)
    ast.increment_lineno(ast_tree, lineno - 1)
    transformer = AsyncToSync()
    transformer.visit(ast_tree)
    transformed_globals = {**f.__globals__, **transformer.globals}
    exec(compile(ast_tree, filename, 'exec'), transformed_globals)
    return transformed_globals[f.__name__]
While the above is probably far from complete enough to fit all needs, and transforming ASTs can be daunting, it would let you maintain just the async version and map it to a synchronous version directly:
>>> import example
>>> del example.UselessExample.ticker
>>> example.main()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../example.py", line 32, in main
func(ue)
File "/.../example.py", line 21, in func
for value in ue.ticker(5):
AttributeError: 'UselessExample' object has no attribute 'ticker'
>>> example.UselessExample.ticker = transform_sync(example.UselessExample.a_ticker)
>>> example.main()
0
1
2
3
4
0
1
2
3
4
async/await is infectious by design.
Accept that your code will have different users (synchronous and asynchronous), that these users will have different requirements, and that over time the implementations will diverge.
Publish separate libraries
For example, compare aiohttp vs. aiohttp-requests vs. requests.
Likewise, compare asyncpg vs. psycopg2.
How to get there
Opt 1 (easy): clone the implementation and allow the copies to diverge.
Opt 2 (sensible): partial refactor; let, e.g., the async library depend on and import the sync library.
Opt 3 (radical): create a "pure" library that can be used in both sync and async programs. For example, see https://github.com/python-hyper/hyper-h2 .
On the upside, testing is easier and more thorough. Consider how hard (or impossible) it is to force the test framework to evaluate all possible concurrent execution orders in an async program. A pure library doesn't need that :)
On the downside, this style of programming requires different thinking, is not always straightforward, and may be suboptimal. For example, instead of await socket.read(2**20) you'd write for event in fsm.push(data): ... (see the sketch below) and rely on your library's user to provide you with data in good-sized chunks.
For context, see the backpressure argument in https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/
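To make the fsm.push(data) idea concrete, here is a minimal sans-io sketch (all names invented): the protocol object consumes bytes and yields parsed events, while the caller owns all I/O, whether sync or async:
class LineProtocol:
    def __init__(self):
        self._buf = b""

    def push(self, data):
        """Feed raw bytes in; yield complete lines (the 'events') out."""
        self._buf += data
        while b"\n" in self._buf:
            line, self._buf = self._buf.split(b"\n", 1)
            yield line

# the same core then works under both styles:
#   for event in proto.push(sock.recv(4096)): ...               # sync
#   for event in proto.push(await stream.receive_some(4096)): ...  # async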

Combining py.test and trio/curio

I would like to combine pytest and trio (or curio, if that is any easier), i.e. write my test cases as coroutine functions. This is relatively easy to achieve by declaring a custom test runner in conftest.py:
@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    '''If item is a coroutine function, run it under trio'''
    if not inspect.iscoroutinefunction(pyfuncitem.obj):
        return
    kernel = trio.Kernel()
    funcargs = pyfuncitem.funcargs
    testargs = {arg: funcargs[arg]
                for arg in pyfuncitem._fixtureinfo.argnames}
    try:
        kernel.run(functools.partial(pyfuncitem.obj, **testargs))
    finally:
        kernel.run(shutdown=True)
    return True
This allows me to write test cases like this:
async def test_something():
    server = MockServer()
    server_task = await trio.run(server.serve)
    try:
        ...  # test the server
    finally:
        server.please_terminate()
        try:
            with trio.fail_after(30):
                server_task.join()
        except TooSlowError:
            server_task.cancel()
But this is a lot of boilerplate. In non-async code, I would factor this out into a fixture:
@pytest.yield_fixture()
def mock_server():
    server = MockServer()
    thread = threading.Thread(target=server.serve)
    thread.start()
    try:
        yield server
    finally:
        server.please_terminate()
        thread.join()
        server.server_close()

def test_something(mock_server):
    ...  # do the test..
Is there a way to do the same in trio, i.e. implement async fixtures? Ideally, I would just write:
async def test_something(mock_server):
    ...  # do the test..
Edit: the answer below is mostly irrelevant now – instead use pytest-trio and follow the instructions in its manual.
Your example pytest_pyfunc_call code doesn't work because it's a mix of trio and curio :-). For trio, there's a decorator trio.testing.trio_test that can be used to mark individual tests (like if you were using classic unittest or something), so the simplest way to write a pytest plugin function is to just apply this to each async test:
from trio.testing import trio_test

@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    if inspect.iscoroutinefunction(pyfuncitem.obj):
        # Apply the @trio_test decorator
        pyfuncitem.obj = trio_test(pyfuncitem.obj)
In case you're curious, this is basically equivalent to:
import trio
from functools import wraps, partial

@pytest.mark.tryfirst
def pytest_pyfunc_call(pyfuncitem):
    if inspect.iscoroutinefunction(pyfuncitem.obj):
        fn = pyfuncitem.obj
        @wraps(fn)
        def wrapper(**kwargs):
            trio.run(partial(fn, **kwargs))
        pyfuncitem.obj = wrapper
Anyway, that doesn't solve your problem with fixtures – for that you need something much more involved.
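For completeness, with pytest-trio (as the edit above recommends) async fixtures are supported directly. A minimal sketch, assuming trio_mode = true is set in pytest.ini and that MockServer.serve cooperates with nursery.start() by accepting a task_status argument:
import pytest

@pytest.fixture
async def mock_server(nursery):  # 'nursery' is provided by pytest-trio
    server = MockServer()
    await nursery.start(server.serve)  # runs the server in the background
    yield server
    server.please_terminate()

async def test_something(mock_server):
    ...  # the server is running for the duration of the test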

App Engine (Python) Datastore Precall API Hooks

Background
So let's say I'm making an app for GAE, and I want to use API Hooks.
BIG EDIT: In the original version of this question, I described my use case, but some folks correctly pointed out that it was not really suited for API Hooks. Granted! Consider me helped. But now my issue is academic: I still don't know how to use hooks in practice, and I'd like to. I've rewritten my question to make it much more generic.
Code
So I make a model like this:
class Model(db.Model):
    user = db.UserProperty(required=True)

    def pre_put(self):
        # Sets a value, raises an exception, whatever. Use your imagination
        pass
And then I create a db_hooks.py:
from google.appengine.api import apiproxy_stub_map

def patch_appengine():
    def hook(service, call, request, response):
        assert service == 'datastore_v3'
        if call == 'Put':
            for entity in request.entity_list():
                entity.pre_put()

    apiproxy_stub_map.apiproxy.GetPreCallHooks().Append('preput',
                                                        hook,
                                                        'datastore_v3')
Being TDD-addled, I'm making all this using GAEUnit, so in gaeunit.py, just above the main method, I add:
import db_hooks
db_hooks.patch_appengine()
And then I write a test that instantiates and puts a Model.
Question
While patch_appengine() is definitely being called, the hook never is. What am I missing? How do I make the pre_put function actually get called?
Hooks are a little low level for the task at hand. What you probably want is a custom property class. DerivedProperty, from aetycoon, is just the ticket.
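Roughly how a derived property reads (a sketch based on aetycoon's published examples, not a guaranteed-current API):
import aetycoon
from google.appengine.ext import db

class Profile(db.Model):
    user = db.UserProperty(required=True)
    # recomputed automatically from the entity whenever it is put
    nickname = aetycoon.DerivedProperty(lambda self: self.user.nickname())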
Bear in mind, however, that the 'nickname' field of the user object is probably not what you want - per the docs, it's simply the user part of the email field if they're using a gmail account, otherwise it's their full email address. You probably want to let users set their own nicknames, instead.
The issue here is that within the context of the hook() function an entity is not an instance of db.Model as you are expecting.
In this context entity is the protocol buffer class confusingly referred to as entity (entity_pb). Think of it like a JSON representation of your real entity: all the data is there, and you could build a new instance from it, but there is no reference to your memory-resident instance that is waiting for its callback.
Monkey patching all of the various put/delete methods is the best way to set up Model-level callbacks as far as I know†
Since there doesn't seem to be that many resources on how to do this safely with the newer async calls, here's a BaseModel that implements before_put, after_put, before_delete & after_delete hooks:
import logging

from google.appengine.ext import db

class HookedModel(db.Model):
    def before_put(self):
        logging.error("before put")

    def after_put(self):
        logging.error("after put")

    def before_delete(self):
        logging.error("before delete")

    def after_delete(self):
        logging.error("after delete")

    def put(self):
        return self.put_async().get_result()

    def delete(self):
        return self.delete_async().get_result()

    def put_async(self):
        return db.put_async(self)

    def delete_async(self):
        return db.delete_async(self)
Inherit your model classes from HookedModel and override the before_xxx / after_xxx methods as required, as in the sketch below.
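A hypothetical subclass, to show the intended usage:
class Article(HookedModel):
    title = db.StringProperty()

    def before_put(self):
        # e.g. normalise data just before it is written
        self.title = (self.title or "").strip()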
Place the following code somewhere it will get loaded globally in your application (like main.py if you use a pretty standard-looking layout). This is the part that calls our hooks:
def normalize_entities(entities):
    if not isinstance(entities, (list, tuple)):
        entities = (entities,)
    return [e for e in entities if hasattr(e, 'before_put')]

# monkeypatch put_async to call entity.before_put
db_put_async = db.put_async

def db_put_async_hooked(entities, **kwargs):
    ents = normalize_entities(entities)
    for entity in ents:
        entity.before_put()
    a = db_put_async(entities, **kwargs)
    get_result = a.get_result

    def get_result_with_callback():
        for entity in ents:
            entity.after_put()
        return get_result()

    a.get_result = get_result_with_callback
    return a

db.put_async = db_put_async_hooked

# monkeypatch delete_async to call entity.before_delete
db_delete_async = db.delete_async

def db_delete_async_hooked(entities, **kwargs):
    ents = normalize_entities(entities)
    for entity in ents:
        entity.before_delete()
    a = db_delete_async(entities, **kwargs)
    get_result = a.get_result

    def get_result_with_callback():
        for entity in ents:
            entity.after_delete()
        return get_result()

    a.get_result = get_result_with_callback
    return a

db.delete_async = db_delete_async_hooked
You can save or destroy your instances via model.put() or any of the db.put(), db.put_async(), etc. methods and get the desired effect.
†would love to know if there is an even better solution!?
I don't think that Hooks are really going to solve this problem. The Hooks will only run in the context of your App Engine application, but the user can change their nickname outside of your application using Google Account settings. If they do that, it won't trigger any logic implemented in your hooks.
I think that the real solution to your problem is for your application to manage its own nickname that is independent of the one exposed by the Users entity.
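In model terms that could be as simple as the following sketch (names invented):
class AppUser(db.Model):
    user = db.UserProperty(required=True)
    # owned by the application, independent of Google Account settings
    nickname = db.StringProperty()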
