Firstly, I looked at this, this and this and whilst the first has some useful information, it's not relevant here because I'm trying to iterate over values.
Here's an example of something I want to be able to do:
class BlockingIter:
    def __iter__(self):
        while True:
            yield input()

async def coroutine():
    my_iter = BlockingIter()
    # Magic thing here
    async for i in my_iter:
        await do_stuff_with(i)
How would I go about this?
(Note, BlockingIter is in reality a library I'm using (chatexchange) so there might be a few other complications.)
As @vaultah says, and as explained in the docs, awaiting the executor (await loop.run_in_executor(None, next, iter_messages)) is probably what you want.
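For example, here is a minimal sketch that wraps a blocking iterator in an async generator by pushing each next() call onto the default executor (BlockingIter and do_stuff_with are the names from the question; iterate_blocking and the sentinel are my own):

import asyncio

async def iterate_blocking(blocking_iterable):
    loop = asyncio.get_running_loop()
    iter_messages = iter(blocking_iterable)
    sentinel = object()
    while True:
        # run the blocking next() call in the default thread pool executor
        item = await loop.run_in_executor(None, next, iter_messages, sentinel)
        if item is sentinel:  # iterator exhausted
            break
        yield item

async def coroutine():
    async for i in iterate_blocking(BlockingIter()):
        await do_stuff_with(i)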
I'm trying to run some IO blocking code, so I'm using to_thread to send the function to another thread. I tried several things, but in all cases, I seem to have to await the to_thread which just returns another coroutine (?!) that I then have to await again. For some reason this isn't quite clicking.
import asyncio

async def search(keyword_list):
    coroutines = set()
    for index, kw in enumerate(keyword_list):
        coroutines.add(asyncio.to_thread(do_lookup, kw, index))

    for result in asyncio.as_completed(coroutines):
        outcome, index = await result
        # Do some magic with the outcome and index
        # BUT, this doesn't work because `await result` apparently
        # just returns ANOTHER coroutine!

async def do_lookup(keyword, index):
    # Do long, blocking stuff here
    print(f'running...{keyword} {index}')
    return keyword, index

if __name__ == '__main__':
    asyncio.run(search([1, 2, 3, 4]))
As I was copy/pasting and adapting my code to make a generic example, I discovered the problem here.
do_lookup is supposed to be a synchronous function (that's the whole point of using to_thread), so by writing async def do_lookup I was instead defining it as an asynchronous function, thereby causing the "double" await issue.
Simply redefining do_lookup without the async keyword did the trick!
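For reference, a sketch of the corrected function, which is just the same body without the async keyword:

def do_lookup(keyword, index):
    # Plain synchronous function: asyncio.to_thread runs it in a worker thread,
    # and `await result` then yields the (keyword, index) tuple directly.
    print(f'running...{keyword} {index}')
    return keyword, index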
Sorry for the bad English, but I'll try my best.
Consider the following code:
import asyncio

async def main():
    async def printA():
        await asyncio.sleep(1)
        print('a')

    # create a stream
    stream = ...

    async for message in stream:
        pass

asyncio.run(main())
Yes, printA is not yet used.
Now I want to invoke printA when I see some types of messages from the stream.
If I could accept that the stream waits until printA is done before continuing, I could write something like this:
async for message in stream:
    if message == 'printA':
        await printA()
But I can't, so I must write at least:
async def main():
    async def printA():
        await asyncio.sleep(1)
        print('a')

    # create a stream
    stream = ...

    taskSet = set()
    async for message in stream:
        if message == 'printA':
            taskSet.add(asyncio.create_task(printA()))
    await asyncio.gather(*taskSet)
But if the stream is long enough, taskSet would become really big, even though many of the printA tasks are in fact already done.
So I would want them to be removed as soon as they are done.
I don't know how to write this from now on.
Can I remove the task from within printA itself? The execution of printA() can't start before create_task is invoked, but is it guaranteed to start only after create_task returns? The documentation does not seem to guarantee that, although I have found claims that the current implementation behaves this way.
I can't simply discard the task reference, right? As the doc of create_task says:
Important: Save a reference to the result of this function, to avoid a task disappearing mid execution.
You can find the answer directly in the bug report concerning the same problem of "fire and forget" tasks that led to the documentation update "Important: Save a reference ..."
https://bugs.python.org/issue44665
I'll copy the recipe for an automatic task removal:
running_tasks = set()
# [...]
task = asyncio.create_task(some_background_function())
running_tasks.add(task)
task.add_done_callback(lambda t: running_tasks.remove(t))
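Applied to the stream loop from the question, it might look like this (a sketch; I use discard rather than remove so the callback is safe even if the task has already been dropped from the set):

async def main():
    async def printA():
        await asyncio.sleep(1)
        print('a')

    stream = ...
    running_tasks = set()

    async for message in stream:
        if message == 'printA':
            task = asyncio.create_task(printA())
            running_tasks.add(task)
            # the set only ever holds tasks that are still running
            task.add_done_callback(running_tasks.discard)

    # wait for whatever is still in flight when the stream ends
    await asyncio.gather(*running_tasks)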
I'm trying to speed up some code that calls an api_caller(), which is a generator that you can iterate over to get results.
My synchronous code looks something like this:
def process_comment_tree(p):
    # time consuming breadth first search that makes another api call...
    return

def process_post(p):
    process_comment_tree(p)

def process_posts(kw):
    for p in api_caller(query=kw):  # possibly 1000s of results
        process_post(p)

def process_kws(kws):
    for kw in kws:
        process_posts(kw)

process_kws(kws=['python', 'threads', 'music'])
When I run this code on a long list of kws, it takes around 18 minutes to complete.
When I use threads:
with concurrent.futures.ThreadPoolExecutor(max_workers=len(KWS)) as pool:
    for result in pool.map(process_posts, ['python', 'threads', 'music']):
        print(f'result: {result}')
the code completes in around 3 minutes.
Now, I'm trying to use Trio for the first time, but I'm having trouble.
async def process_comment_tree(p):
    # same as before...
    return

async def process_post(p):
    await process_comment_tree(p)

async def process_posts(kw):
    async with trio.open_nursery() as nursery:
        for p in r.api.search_submissions(query=kw):
            nursery.start_soon(process_post, p)

async def process_kws(kws):
    async with trio.open_nursery() as nursery:
        for kw in kws:
            nursery.start_soon(process_posts, kw)

trio.run(process_kws, ['python', 'threads', 'music'])
This still takes around 18 minutes to execute. Am I doing something wrong here, or is something like trio/async not appropriate for my problem setup?
Trio, and async libraries in general, work by switching to a different task while waiting for something external, like an API call. In your code example, it looks like you start a bunch of tasks, but none of them ever actually waits on anything external. I would recommend reading this part of the tutorial; it gives an idea of what that means: https://trio.readthedocs.io/en/stable/tutorial.html#task-switching-illustrated
Basically, your code has to call a function that will pass control back to the run loop so that it can switch to a different task.
If your api_caller generator makes calls to an external API, that's likely something you can replace with async calls. You'll need to use an async HTTP library, like HTTPX or hip.
On the other hand, if there's nothing in your code that has to wait for something external, then async won't help your code go faster.
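For illustration, here is a sketch of what the async version might look like with HTTPX under Trio (the endpoints, parameters, and response shapes are made up here, since the real API behind api_caller isn't shown):

import trio
import httpx

async def process_post(client, post):
    # the comment-tree request is now awaited, so Trio can run other
    # tasks while this request is in flight (hypothetical endpoint)
    resp = await client.get('https://api.example.com/comments', params={'post': post['id']})
    # ... breadth first search over resp.json() ...

async def process_posts(client, kw):
    async with trio.open_nursery() as nursery:
        # hypothetical search endpoint standing in for api_caller(query=kw)
        resp = await client.get('https://api.example.com/search', params={'q': kw})
        for post in resp.json():
            nursery.start_soon(process_post, client, post)

async def process_kws(kws):
    async with httpx.AsyncClient() as client:
        async with trio.open_nursery() as nursery:
            for kw in kws:
                nursery.start_soon(process_posts, client, kw)

trio.run(process_kws, ['python', 'threads', 'music'])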
This is gonna be a bad explanation but I don't know how else to word this so bear with me please.
I have one function:
async def request():
    # this can only be called n times at once
But as it says it can only be called n times at once. Is it possible to have some sort of pool with a limited number of objects so I can do this:
async def request():
    await with poolOfOneHundred.acquire():
        # do something
and then Python would let 100 of these run at once; when the 101st call arrives, it would wait at the await with statement until another request() call finished and a slot in the pool became free.
Is this a thing? If not, how could I implement something like this?
Does this make any sense?
You're looking for asyncio.Semaphore.
Here's an example of how to use it.
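A minimal sketch of the idea, capping request() at 100 concurrent callers (the limit and the body are placeholders taken from the question):

import asyncio

async def request(pool):
    # at most 100 callers get past this line at once; the 101st waits here
    async with pool:
        ...  # do something

async def main():
    pool_of_one_hundred = asyncio.Semaphore(100)  # create inside the running loop
    await asyncio.gather(*(request(pool_of_one_hundred) for _ in range(500)))

asyncio.run(main())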
I'd just like to understand the async/await syntax, so I am looking for a 'hello world' app that doesn't use asyncio at all.
So how do I create the simplest possible event loop using only the Python language itself? The simplest code (taken from Start async function without importing the asyncio package; the code there goes well beyond hello world, which is why I'm asking) looks like this:
async def cr():
    while True:
        print(1)

cr().send(None)
It prints 1 infinitely, not so good.
So the first question is: how do I yield from the coroutine back to the main flow? The yield keyword turns the coroutine into an async generator, which is not what we want.
I would also appreciate a simple application, i.e. a coroutine which prints 1, then yields to the event loop, then prints 2 and exits with return 3, plus a simple event loop which pushes the coroutine until it returns and consumes the result.
How about this?
import types

@types.coroutine
def simple_coroutine():
    print(1)
    yield
    print(2)
    return 3

future = simple_coroutine()
while True:
    try:
        future.send(None)
    except StopIteration as returned:
        print('It has returned', returned.value)
        break
I think your biggest problem is that you're mixing concepts. An async function is not the same thing as a coroutine. It is more appropriate to think of it as a way of combining coroutines, just as ordinary def functions are a way of combining statements into functions. Yes, Python is a highly reflective language, so def is also a statement, and what you get from calling an async function is also a coroutine, but you need something at the bottom, something to start with. (At the bottom, yielding is just yielding; at every intermediate level, it is awaiting something else.) That bottom layer is what the types.coroutine decorator in the standard library gives you.
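To make that concrete, here is a small sketch (the names switch and printer are mine) where an async def function awaits a types.coroutine "bottom layer", and the same hand-rolled loop from above drives it:

import types

@types.coroutine
def switch():
    # the bare yield at the bottom: suspends and hands control back to the driver
    yield

async def printer():
    print(1)
    await switch()  # an async function can only await; the yield lives in switch()
    print(2)
    return 3

coro = printer()
while True:
    try:
        coro.send(None)
    except StopIteration as returned:
        print('It has returned', returned.value)
        break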
If you have any more questions, feel free to ask.