In trying to aggregate the results from an asynchronous generator, like so:
async def result_tuple():
    async def result_generator():
        # some await things happening in here
        yield 1
        yield 2
    return tuple(num async for num in result_generator())
I get a
TypeError: 'async_generator' object is not iterable
when executing the async for line.
But PEP 530 seems to suggest that it should be valid:
Asynchronous Comprehensions
We propose to allow using async for inside list, set and dict comprehensions. Pending PEP 525 approval, we can also allow creation of asynchronous generator expressions.
Examples:
set comprehension: {i async for i in agen()};
list comprehension: [i async for i in agen()];
dict comprehension: {i: i ** 2 async for i in agen()};
generator expression: (i ** 2 async for i in agen()).
What's going on, and how can I aggregate an asynchronous generator into a single tuple?
In the PEP excerpt, the comprehensions are listed side-by-side in the same bullet list, but the generator expression is very different from the others.
There is no such thing as a "tuple comprehension". The argument to tuple() makes an asynchronous generator:
tuple(num async for num in result_generator())
The expression is effectively equivalent to tuple(result_generator()): tuple() then tries to iterate over the async generator synchronously and raises the TypeError.
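A minimal sketch of what goes wrong: tuple() calls iter() on its argument, but an async generator only implements __aiter__, not __iter__ (the names below mirror the question's code):

async def result_generator():
    yield 1
    yield 2

gen = result_generator()  # creating the object needs no event loop
try:
    iter(gen)             # this is what tuple(gen) attempts internally
except TypeError as e:
    print(e)              # 'async_generator' object is not iterable

print(hasattr(gen, '__aiter__'), hasattr(gen, '__iter__'))  # True False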
The other comprehensions will work, though, as the question expected. So it's possible to generate a tuple by first aggregating to a list, like so:
async def result_tuple():
    async def result_generator():
        # some await things happening in here
        yield 1
        yield 2
    return tuple([num async for num in result_generator()])
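For reference, a complete runnable version of the fix; the asyncio.run() driver is added here just for demonstration:

import asyncio

async def result_tuple():
    async def result_generator():
        yield 1
        yield 2
    # the async list comprehension drives the generator; tuple() then copies the list
    return tuple([num async for num in result_generator()])

print(asyncio.run(result_tuple()))  # (1, 2)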
Related
I'm new to python. I'm using the latest version.
I have a for loop that takes a long time to execute, and I would like to run it in parallel to improve performance.
After some research, I gathered that asyncio and async for are my best option, but I haven't yet understood how to transform my for loop using this technique.
Here is my code:
def filter(my_list):
    res = []
    for _ in my_list:
        if check(_):  # this takes a while to execute
            res.append(_)
        else:
            print(f'{_} removed')
    return res
How can I optimize the execution time of this program?
The rest of the program should remain the same, meaning that calling filter should not change, and should return a filtered list.
Thanks
Async
Unless you modify check() to be an async function that uses async libraries/modules and is primarily IO-bound, you will not gain any performance from async. Example of a valid async function:
import asyncio

async def check(item):
    await asyncio.sleep(1)
    return item > 5
If you did have an async check function, you could do something like this.
Series version that takes ~10s (this must run inside a coroutine, since it uses await):
my_list = list(range(10))
res = [item for item in my_list if await check(item)]
vs. the parallel version that takes ~1s:
import asyncio
my_list = list(range(10))
check_tasks = [check(_) for _ in my_list]
checked = await asyncio.gather(*check_tasks)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
Note how, while creating the list of check_tasks, we don't use await. This is because asyncio.gather takes the coroutine objects themselves as arguments.
Also if you use time.sleep(1) instead of asyncio.sleep(1), both series and parallel will have the same 10s runtime.
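Putting the parallel pieces together into one runnable script (filter_async is an illustrative name for the wrapper, not part of the question's code):

import asyncio

async def check(item):
    await asyncio.sleep(1)  # stand-in for real async I/O
    return item > 5

async def filter_async(my_list):
    # all checks are launched concurrently; gather preserves input order
    checked = await asyncio.gather(*(check(_) for _ in my_list))
    return [item for keep, item in zip(checked, my_list) if keep]

print(asyncio.run(filter_async(list(range(10)))))  # [6, 7, 8, 9], after ~1s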
If you want to limit the maximum number of coroutines executing at any one time, you can use an asyncio.Semaphore and modify check().
Example - if we want 2 in parallel at a given time -
sem = asyncio.Semaphore(2)

async def check(item):
    async with sem:
        await asyncio.sleep(1)
        return item > 5
which takes 5s
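Here is a runnable sketch of that semaphore variant; as a deviation from the snippet above, the semaphore is created inside the running coroutine and passed in explicitly, which avoids binding it to the wrong event loop on older Python versions:

import asyncio

async def check(item, sem):
    async with sem:              # at most 2 checks execute at any moment
        await asyncio.sleep(1)
        return item > 5

async def main():
    sem = asyncio.Semaphore(2)
    my_list = list(range(10))
    checked = await asyncio.gather(*(check(_, sem) for _ in my_list))
    print([item for keep, item in zip(checked, my_list) if keep])

asyncio.run(main())  # ~5s: ten 1-second tasks, two at a time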
Multiprocessing version
check is defined as
import time

def check(item):
    time.sleep(1)
    return item > 5
Our initial code that runs in series will be:
my_list = list(range(10))
checked = map(check, my_list)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
and the parallel version will be
from multiprocessing import Pool

my_list = list(range(10))
with Pool(5) as p:
    checked = p.map(check, my_list)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
Pool(5) will start 5 processes here. Keep in mind that starting a process is expensive.
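For completeness, a runnable version of the multiprocessing variant. The if __name__ == '__main__' guard is required on platforms whose default start method spawns fresh interpreter processes (e.g. Windows and macOS):

import time
from multiprocessing import Pool

def check(item):
    time.sleep(1)  # stand-in for slow, blocking work
    return item > 5

if __name__ == '__main__':
    my_list = list(range(10))
    with Pool(5) as p:
        checked = p.map(check, my_list)  # blocks until all workers finish
    print([item for keep, item in zip(checked, my_list) if keep])  # [6, 7, 8, 9]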
I'm trying to do something like this:
mylist.sort(key=lambda x: await somefunction(x))
But I get this error:
SyntaxError: 'await' outside async function
Which makes sense because the lambda is not async.
I tried to use async lambda x: ... but that throws a SyntaxError: invalid syntax.
PEP 492 states:
Syntax for asynchronous lambda functions could be provided, but this construct is outside of the scope of this PEP.
But I could not find out if that syntax was implemented in CPython.
Is there a way to declare an async lambda, or to use an async function for sorting a list?
You can't. There is no async lambda, and even if there were, you couldn't pass it as a key function to list.sort(), since a key function is called synchronously and is not awaited. An easy work-around is to annotate your list yourself:
mylist_annotated = [(await some_function(x), x) for x in mylist]
mylist_annotated.sort()
mylist = [x for key, x in mylist_annotated]
Note that await expressions in list comprehensions are only supported in Python 3.6+. If you're using 3.5, you can do the following:
mylist_annotated = []
for x in mylist:
    mylist_annotated.append((await some_function(x), x))
mylist_annotated.sort()
mylist = [x for key, x in mylist_annotated]
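A complete runnable sketch of this decorate-sort-undecorate workaround; some_function here is a stand-in that negates its input so the effect on the ordering is visible:

import asyncio

async def some_function(x):
    await asyncio.sleep(0)  # stand-in for real async work
    return -x               # negated key, so the sort comes out descending

async def main():
    mylist = [1, 3, 2]
    mylist_annotated = [(await some_function(x), x) for x in mylist]
    mylist_annotated.sort()
    mylist = [x for key, x in mylist_annotated]
    print(mylist)  # [3, 2, 1]

asyncio.run(main())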
An "async lambda" can be emulated by combining a lambda with an async generator:1
key=lambda x: (await somefunction(x) for _ in '_').__anext__()
It is possible to move the ( ).__anext__() to a helper, which likely makes the pattern clearer as well:
def head(async_iterator): return async_iterator.__anext__()
key=lambda x: head(await somefunction(x) for _ in '_')
Note that the sort method/function in the standard library are not async. One needs an async version, such as asyncstdlib.sorted (disclaimer: I maintain this library):
import asyncstdlib as a
mylist = await a.sorted(mylist, key=lambda x: head(await somefunction(x) for _ in '_'))
Understanding the lambda ...: (...).__anext__() pattern
An "async lambda" would be an anonymous asynchronous function, or in other words an anonymous function evaluating to an awaitable. This is in parallel to how async def defines a named function evaluating to an awaitable.
The task can be split into two parts: An anonymous function expression and a nested awaitable expression.
An anonymous function expression is exactly what a lambda ...: ... is.
An awaitable expression is only allowed inside a coroutine function; however:
An (asynchronous) generator expression implicitly creates a (coroutine) function. As an async generator only needs async to run, it can be defined in a sync function (since Python 3.7).
An asynchronous iterable can be used as an awaitable via its __anext__ method.
These three parts are directly used in the "async lambda" pattern:
#   | regular lambda for the callable and scope
#   |         | async generator expression for an async scope
#   v         v                                   v first item as an awaitable
key=lambda x: (await somefunction(x) for _ in '_').__anext__()
The for _ in '_' in the async generator is only to have exactly one iteration. Any variant with at least one iteration will do.
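A self-contained demonstration of the pattern, with somefunction as a trivial stand-in:

import asyncio

async def somefunction(x):
    return -x

def head(async_iterator):
    return async_iterator.__anext__()

# the "async lambda": calling it returns an awaitable, like an async def would
key = lambda x: head(await somefunction(x) for _ in '_')

async def main():
    print(await key(3))  # -3

asyncio.run(main())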
1 Be mindful of whether an "async lambda" is actually needed in the first place, since async functions are first-class just like regular functions. Just as lambda x: foo(x) is redundant and should just be foo, lambda x: (await bar(x) …) is redundant and should just be bar. The function body should do more than just call-and-await, such as 3 + await bar(x) or await bar(x) or await qux(x).
await cannot be included in a lambda function.
The solutions here can be shortened to:
from asyncio import run

my_list = [. . .]

async def some_function(x):
    . . .

my_list.sort(key=lambda x: await some_function(x))  # raises a SyntaxError
my_list.sort(key=lambda x: run(some_function(x)))   # works
Note that run() starts a new event loop for each key call, so this only works when no event loop is already running, and it adds overhead per element.
If you have already defined a separate async function, you can simplify MisterMiyagi's answer a bit more:
mylist = await a.sorted(
    mylist,
    key=somefunction)
If you want to change the key after awaiting it, you can use asyncstdlib.apply:
mylist = await a.sorted(
    mylist,
    key=lambda x: a.apply(lambda after: 1 / after, some_function(x)))
Here is a complete example program:
import asyncio
import asyncstdlib as a

async def some_function(x):
    return x

async def testme():
    mylist = [2, 1, 3]
    mylist = await a.sorted(
        mylist,
        key=lambda x: a.apply(lambda after: 1 / after, some_function(x)))
    print(f'mylist is: {mylist}')

if __name__ == "__main__":
    asyncio.run(testme())
The answer from Sven Marnach has an edge case.
If you try to sort a list that has two items which produce the same sort key but are themselves different and not directly comparable, it will crash.
mylist = [{'score':50,'name':'bob'},{'score':50,'name':'linda'}]
mylist_annotated = [(x['score'], x) for x in mylist]
mylist_annotated.sort()
print( [x for key, x in mylist_annotated] )
Will give:
TypeError: '<' not supported between instances of 'dict' and 'dict'
Fortunately I had an easy solution: my data had a unique key that was sortable, so I could put that as the second key:
mylist = [{'score':50,'name':'bob','unique_id':1},{'score':50,'name':'linda','unique_id':2}]
mylist_annotated = [(x['score'], x['unique_id'], x) for x in mylist]
mylist_annotated.sort()
print( [x for key, unique, x in mylist_annotated] )
I guess if your data doesn't have a naturally unique value, you can insert one before sorting? A uuid, maybe?
EDIT: As suggested in a comment (thanks!), you can also use operator.itemgetter:
import operator
mylist = [{'score':50,'name':'bob'},{'score':50,'name':'linda'}]
mylist_annotated = [(x['score'], x) for x in mylist]
mylist_annotated.sort(key=operator.itemgetter(0))
print( [x for key, x in mylist_annotated] )
Why doesn't asyncio.gather work with a generator expression?
import asyncio

async def func():
    await asyncio.sleep(2)

# Works
async def call3():
    x = (func() for x in range(3))
    await asyncio.gather(*x)

# Doesn't work
async def call3():
    await asyncio.gather(func() for x in range(3))

# Works
async def call3():
    await asyncio.gather(*[func() for x in range(3)])

asyncio.run(call3())
The second variant gives:
[...]
File "test.py", line 13, in <genexpr>
await asyncio.gather(func() for x in range(3))
RuntimeError: Task got bad yield: <coroutine object func at 0x10421dc20>
Is this expected behavior?
await asyncio.gather(func() for x in range(3))
This doesn't work because it passes the generator object itself as a single argument to gather. gather doesn't expect an iterable; it expects coroutines as individual arguments, which means you need to unpack the generator.
Unpack the generator:
await asyncio.gather(*(func() for i in range(10))) # star expands generator
We must expand it because asyncio.gather expects its awaitables as individual arguments (i.e. asyncio.gather(coroutine0, coroutine1, coroutine2, coroutine3)), not as a single iterable.
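A runnable sketch of the fixed call; a return value is added to func so that it is visible that gather also collects the results in order:

import asyncio

async def func():
    await asyncio.sleep(0.1)
    return 42

async def call3():
    # the * spreads the generator into three separate coroutine arguments
    results = await asyncio.gather(*(func() for _ in range(3)))
    print(results)  # [42, 42, 42]

asyncio.run(call3())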
Python uses * and ** for both 'packing' and 'unpacking', depending on where they appear.
def foo(*args, **kwargs): ...
In a function definition like this, all non-keyword arguments get packed into a tuple args and all keyword arguments into a new dict kwargs. A single positional argument still gets packed into a one-element tuple (*), and a single keyword argument into a one-entry dict (**).
Unpacking on the left-hand side of an assignment is kind of a hybrid:
first, *i_take_the_rest, last = range(10)
# first=0, i_take_the_rest=[1, 2, 3, 4, 5, 6, 7, 8], last=9
*a, b = range(1)
# a=[], b=0
But here it unpacks:
combined_iterables = [*range(10),*range(3)]
merged_dict = {**first_dict,**second_dict}
So basically, if it's on the left side of the equals sign or used in a function/method definition like *foo, it's packing values into a list or tuple (respectively). In function calls and in list/dict literals, however, it has the unpacking behavior.
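A small sketch contrasting the two behaviors:

def show(*args, **kwargs):
    print(args, kwargs)        # packing: args is a tuple, kwargs a dict

show(1, 2, a=3)                # (1, 2) {'a': 3}

nums = [*range(3), *range(2)]  # unpacking into a list literal
print(nums)                    # [0, 1, 2, 0, 1]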
In an interview, the interviewer asked me for some examples of generators being used in Python. I know a generator is like a function which yields values instead of returning.
So can anyone tell me: is a for/while loop an example of a generator?
Short answer: No, but there are other forms of generators.
A for/while loop is a loop structure: it does not emit values and thus is not a generator.
Nevertheless, there are other ways to construct generators.
Your example with yield, for instance, is a generator:
def some_generator(xs):
    for x in xs:
        if x:
            yield x
But there are also generator expressions, like:
(x for x in xs if x)
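Such an expression produces its values lazily, one at a time:

xs = [0, 1, 2, 0, 3]
gen = (x for x in xs if x)  # nothing is evaluated yet
print(next(gen))            # 1 -- computed on demand
print(list(gen))            # [2, 3] -- the remaining items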
Furthermore, in python-3.x the map(..) and filter(..) constructs return lazy iterators as well (strictly speaking not generators, but they behave much the same), and range(..) produces a lazy sequence.
And of course you can make an iterator yourself by implementing the iterator protocol:
class some_generator(object):
    def __init__(self, xs):
        self.xs = xs
        self.num = 0

    def __iter__(self):
        return self

    def __next__(self):
        # skip falsy items
        while self.num < len(self.xs) and not self.xs[self.num]:
            self.num += 1
        if self.num < len(self.xs):
            res = self.xs[self.num]
            self.num += 1
            return res
        else:
            raise StopIteration()
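Used like any other iterable:

gen = some_generator([0, 1, 2, 0, 3])
print(list(gen))  # [1, 2, 3] -- falsy items are skipped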
Neither while nor for are themselves generators or iterators. They are control constructs that perform iteration. Certainly, you can use for or while to iterate over the items yielded by a generator, and you can use for or while to perform iteration inside the code of a generator. But neither of those facts make for or while generators.
The first line in the Python wiki for generators:
Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
So in the context of your interview, I believe they were looking for you to talk about creating an iterator.
The wiki for the for loop says:
In Python this is controlled instead by generating the appropriate sequence.
So you could get pedantic, but generally, no, a for loop isn't a generator.
for and while are loop structures, and you can use them to iterate over generators. You can take certain elements of a generator by converting it to a list.
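If materializing the whole generator as a list is too expensive, itertools.islice can take just a prefix:

from itertools import islice

gen = (x * x for x in range(10**9))  # far too large to turn into a list
print(list(islice(gen, 5)))          # [0, 1, 4, 9, 16] -- just the first five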