I am trying to create an asyncio task, perform a db query and a for-loop process inside it, and then get the result back from the task. However, in the code sample below, it seems like my result is not being put into total_result.result() but rather just into total_result.
Not sure if there is any misunderstanding that I'm having regarding my implementation of asyncio below?
from asyncio import get_event_loop, create_task

from sqlalchemy import create_engine  # assuming SQLAlchemy, given create_engine


class DatabaseHandler:
    def __init__(self):
        self.loop = get_event_loop()
        self.engine = create_engine("postgres stuffs here")
        self.conn = self.engine.connect()

    async def _fetch_sql_data(self, query):
        return self.conn.execute(query)

    async def get_all(self, item):
        total_result = []
        if item == "all":
            data = create_task(self._fetch_sql_data("select col1 from table1;"))
        else:
            data = create_task(self._fetch_sql_data(f"select col1 from table1 where quote = '{item}';"))
        await data
        for i in data.result().fetchall():
            total_result.append(i[0])
        return total_result

    async def update(self):
        total_result = create_task(self.get_all("all"))
        print(await total_result)  # prints out the result immediately and not the task object.
        # this means that `total_result.result()` produces an error


loop = get_event_loop()
a = DatabaseHandler()
loop.run_until_complete(a.update())
I have a feeling that it is because of total_result being a list object. But not sure how to resolve this.
task.result() returns the result of your task (the return value of the wrapped coroutine) and not another Task. This means that this:
task = asyncio.create_task(coro())
await task
result = task.result()
is actually equivalent to
result = await coro()
Using tasks is especially useful if you want to execute multiple coroutines concurrently; a sketch of that follows after the simplified method below. But as you are not doing that here, your code is a bit overcomplicated. You can just do:
async def get_all(self, item):
    total_result = []
    if item == "all":
        result = await self._fetch_sql_data("select col1 from table1;")
    else:
        result = await self._fetch_sql_data(f"select col1 from table1 where quote = '{item}';")
    for i in result.fetchall():
        total_result.append(i[0])
    return total_result  # holds the results of your db query, just as if called from sync code
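If you later do need to run several queries concurrently, that is where tasks and asyncio.gather pay off. A minimal sketch, assuming get_all works as above and that _fetch_sql_data is genuinely non-blocking (with a synchronous driver the queries would still run one after another); the "some_quote" value is purely illustrative:

import asyncio

async def update(self):
    # both get_all() calls run concurrently; gather returns results in argument order
    all_rows, quoted_rows = await asyncio.gather(
        self.get_all("all"),
        self.get_all("some_quote"),  # "some_quote" is just an illustrative item
    )
    print(all_rows, quoted_rows)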
Related
This is my first attempt at asynchronous programming in Python, but I am running into a problem where my results stop after the first task is finished, as opposed to returning all of the results after every task has finished executing.
In api.py, I have a search_async function that ultimately makes the request using the aiohttp.ClientSession object being passed around. Then there is the search_value_async function, a wrapper that's being called in app.py:
# api.py
from urllib.parse import urlencode

async def search_async(self, session, query, offset=0):
    endpoint = 'https://example.com'
    query_string = urlencode({'query': query, 'offset': offset})
    lookup_url = f'{endpoint}?{query_string}'
    async with session.get(lookup_url, headers=self.get_resource_headers()) as response:
        if response.status not in range(200, 299):
            return {
                'Status': response.status
            }
        return await response.json()

async def search_value_async(self, session, query, offset=0):
    return await self.search_async(session, query, offset)
# app.py
async def get_recommendations(queries):
    async with aiohttp.ClientSession() as session:
        data = await get_all_queries(session, queries)
    return data

async def get_all_queries(session, queries):
    tasks = []
    for query in queries:
        for offset in range(0, 1000, 50):
            tasks.append(asyncio.create_task(api.search_value_async(session, query, offset)))
    results = await asyncio.gather(*tasks)
    return results
def main():
    # queries = ...
    data = []
    results = asyncio.run(get_recommendations(queries))
    data.extend(results)
    recommendations = normalize_data(data)
    return recommendations
So far, I've confirmed that the correct number of coroutines is being created, and I was able to diagnose that the number of results I get back when running asynchronously is equivalent to only the first task being run.
I'm new to this, so my understanding could be wrong, but if all my tasks are being created, I would expect await asyncio.gather(*tasks) to give me the results from all of my completed tasks, not just the first one.
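To illustrate what I expect from gather, a minimal self-contained sketch (all names here are illustrative, not from my code):

import asyncio

async def double(n):
    await asyncio.sleep(0.1)  # simulate I/O
    return n * 2

async def demo():
    tasks = [asyncio.create_task(double(i)) for i in range(5)]
    print(await asyncio.gather(*tasks))  # [0, 2, 4, 6, 8] -- one result per task

asyncio.run(demo())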
I have a loop that returns a dataframe when it finishes. This dataframe is then processed elsewhere within the app. Broadly speaking, in sequential code, it looks like this:
import pandas as pd

def print_int_seq():
    col_1 = []
    col_2 = []
    for i in range(10):
        col_1.append(i)
        col_2.append(2 * i)
        print(i)
    return pd.DataFrame({'col_1': col_1, 'col_2': col_2})

def run_seq():
    df = print_int_seq()
    print(df)

if __name__ == "__main__":
    run_seq()
I now want to have another function run asynchronously alongside the loop that returns the dataframe. I do not know how to do this, i.e. how to return a value from an async/awaited function. If I didn't need to return anything, the program (with the two async functions) would probably look like this:
import pandas as pd
from datetime import datetime
import asyncio

async def print_time(con):
    while True:
        print(datetime.now().time())
        await asyncio.sleep(1)

async def print_int():
    # I would like this to return the full 10x2 dataframe
    col_1 = []
    col_2 = []
    for i in range(10):
        col_1.append(i)
        col_2.append(2 * i)
        print(i)
        await asyncio.sleep(1)

async def main():
    # how can I catch and process the 10x2 dataframe returned by print_int()?
    await asyncio.gather(
        print_time(con),
        print_int(),
    )

if __name__ == "__main__":
    asyncio.run(main())
How can I edit the script above so that when the loop is exhausted, the dataframe is caught and handled in another function, please? Does it matter that the loop in the other async function never ends?
First and most important: async functions mimic the behavior of regular functions a lot. If you want them to return a value, just add a return statement with whatever value you want to return:
async def print_int():
    # I would like this to return the full 10x2 dataframe
    col_1 = []
    col_2 = []
    for i in range(10):
        ...
    return pd.DataFrame(...)
Second: asyncio.gather simply returns a sequence with all return values of the executed tasks, and for that, it must wait until all "gathered" tasks return. If the other task were finite, and finished at more or less the same time, you'd do:
async def main():
    result1, result2 = await asyncio.gather(
        print_time(con),
        print_int(),
    )
As you plan to have a concurrent routine that won't end at all, asyncio.gather is not the best tool: just create a task for the never-ending coroutine and await the one whose result you want:
async def main():
    # this will get the other co-routine running in the background:
    asyncio.create_task(print_time(con))
    result = await print_int()
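One caveat worth adding (an assumption on my part, not in the original answer): asyncio holds only a weak reference to tasks created this way, so it is safer to keep the task in a variable; you can also cancel the never-ending loop once the finite coroutine is done. A sketch building on the snippets above:

async def main():
    # keep a reference so the background task is not garbage-collected mid-run
    clock_task = asyncio.create_task(print_time(con))
    df = await print_int()  # runs to completion while the clock keeps printing
    clock_task.cancel()     # stop the infinite loop once the dataframe is ready
    print(df)               # hand the dataframe to whatever processing you need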
I am trying to connect to and receive messages from multiple websockets concurrently.
For this purpose I made it with asyncio, and it prints messages correctly. But the problem is that I can only print the values, not return them.
The simplified example of the pseudo code I am struggling with is below:
import asyncio
import websockets
import json

symbols_id = [1, 2]

## LOOP RUNNING EXAMPLE OF ASYNCIO
async def get_connect(symbols_id):
    tasks = []
    for _id in symbols_id:
        print('connection to', _id)
        if _id == 1:
            a = 0
        elif _id == 2:
            a = 200
        tasks.append(asyncio.create_task(_loop(a)))
    return tasks

async def _loop(a):
    while True:
        print(a)
        a += 1
        await asyncio.sleep(2.5)

async def ping_func():
    while True:
        print('------ ping')
        await asyncio.sleep(5)

async def main():
    tasks = await get_connect(symbols_id)
    asyncio.create_task(ping_func())
    await asyncio.gather(*tasks)

asyncio.run(main())
As you can see from the code above, I used print(a) to print a in each loop.
I tried return a instead of print(a), but it was not helpful.
Thanks.
yield a? return a will exit the function and the loop; yield is usually what you want in asyncio for looped tasks.
Finally I found the way: using yield and async for to read the data in each loop.
It works correctly after changing the code to the following:
import asyncio
import websockets
import json

symbols_id = [1, 2]

global a
a = 0

## LOOP RUNNING EXAMPLE OF ASYNCIO
async def get_connect(symbols_id):
    tasks = []
    for _id in symbols_id:
        print('connection to', _id)
        if _id == 1:
            a = 0
        elif _id == 2:
            a = 200
        tasks.append(asyncio.create_task(_loop(a)))
    return tasks

async def _loop(param):
    global a
    a = param
    while True:
        print(a)
        a += 1
        await asyncio.sleep(2.5)

async def ping_func():
    while True:
        print('------ ping')
        await asyncio.sleep(5)

async def get_result():
    global a
    while True:
        yield a
        await asyncio.sleep(1)

async def main():
    tasks = await get_connect(symbols_id)
    asyncio.create_task(ping_func())
    async for x in get_result():
        print(x)
    await asyncio.gather(*tasks)

asyncio.run(main())
I was confused about how to use the data generated in one code snippet inside another snippet. What I found is:
1- The generated data can be made accessible with global variables.
2- By defining a class and a property, it can be made accessible from every part of the code (see the sketch below).
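For point 2, a minimal sketch of the class-based approach (the SharedState name and its methods are illustrative, not from the original code): the loops write to an instance attribute instead of a global, and any other coroutine can read it through the instance.

import asyncio

class SharedState:
    def __init__(self):
        self.latest = None  # written by the producer loop, read elsewhere

    async def producer(self, start):
        value = start
        while True:
            self.latest = value
            value += 1
            await asyncio.sleep(2.5)

    async def reader(self):
        while True:
            print('latest value:', self.latest)
            await asyncio.sleep(1)

async def main():
    state = SharedState()
    asyncio.create_task(state.producer(0))
    await state.reader()

asyncio.run(main())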
I have a function which adds items to a list and returns the list. The items are returned from an async function. Right now it creates the items and adds them to the list one by one.
I want to create the items in parallel, add them to the list, and after that return the value of the function. How can I solve this?
Thank you in advance!
async def __create_sockets(self):
    rd_data = []
    for s in self.symbols.index:
        try:
            print(f'Collecting data of {s}')
            socket = DepthCacheManager(self.client, s, refresh_interval=None)
            rd_data.append(await socket.__aenter__())
        except:
            continue
    return rd_data
An easy solution to your problem is to gather the results asynchronously and compile the list of results at the same time.
This is provided by the asyncio.gather() call as explained in the asyncio documentation. Have a look at the excellent example given there.
In your case it might roughly look like this (obviously I cannot test it):
async def create_socket(self, s):
    print(f'Collecting data of {s}')
    socket = DepthCacheManager(self.client, s, refresh_interval=None)
    return await socket.__aenter__()
async def __create_sockets(self):
    rd_data = await asyncio.gather(
        *[self.create_socket(s) for s in self.symbols.index]
    )
    return rd_data
There is a problem here with missing exception handling. You may return None in case of an exception and then clean up the list later like this:
async def create_socket(self, s):
    try:
        print(f'Collecting data of {s}')
        socket = DepthCacheManager(self.client, s, refresh_interval=None)
        return await socket.__aenter__()  # await is important here
    except Exception:
        return None

async def __create_sockets(self):
    rd_data = await asyncio.gather(
        *[self.create_socket(s) for s in self.symbols.index]
    )
    return [i for i in rd_data if i is not None]
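As an alternative (my suggestion, not part of the original answer), asyncio.gather can collect the exceptions itself via return_exceptions=True, which avoids the None sentinel:

async def __create_sockets(self):
    results = await asyncio.gather(
        *[self.create_socket(s) for s in self.symbols.index],
        return_exceptions=True,  # exceptions are returned in place of results
    )
    # keep only the sockets that opened successfully
    return [r for r in results if not isinstance(r, BaseException)]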
I have the following method in my Tornado handler:
async def get(self):
    url = 'url here'
    try:
        async for batch in downloader.fetch(url):
            self.write(batch)
            await self.flush()
    except Exception as e:
        logger.warning(e)
This is the code for downloader.fetch():
async def fetch(url, **kwargs):
    timeout = kwargs.get('timeout', aiohttp.ClientTimeout(total=12))
    response_validator = kwargs.get('response_validator', json_response_validator)
    extractor = kwargs.get('extractor', json_extractor)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as resp:
                response_validator(resp)
                async for batch in extractor(resp):
                    yield batch
    except aiohttp.client_exceptions.ClientConnectorError:
        logger.warning("bad request")
        raise
    except asyncio.TimeoutError:
        logger.warning("server timeout")
        raise
I would like to yield the "batch" object from multiple downloaders in parallel.
I want the first available batch from whichever downloader produces it first, and so on until all downloaders have finished. Something like this (this is not working code):
async for batch in [downloader.fetch(url1), downloader.fetch(url2)]:
....
Is this possible? How can I modify what I am doing in order to be able to yield from multiple coroutines in parallel?
How can I modify what I am doing in order to be able to yield from multiple coroutines in parallel?
You need a function that merges two async sequences into one, iterating over both in parallel and yielding elements from one or the other as they become available. While such a function is not included in the current standard library, you can find one in the aiostream package.
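With aiostream, the handler might look roughly like this (an untested sketch using aiostream's stream.merge; url1 and url2 as in your example):

from aiostream import stream

async def get(self):
    merged = stream.merge(downloader.fetch(url1), downloader.fetch(url2))
    async with merged.stream() as batches:
        async for batch in batches:
            self.write(batch)
            await self.flush()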
You can also write your own merge function, as shown in this answer:
async def merge(*iterables):
    iter_next = {it.__aiter__(): None for it in iterables}
    while iter_next:
        for it, it_next in iter_next.items():
            if it_next is None:
                fut = asyncio.ensure_future(it.__anext__())
                fut._orig_iter = it
                iter_next[it] = fut
        done, _ = await asyncio.wait(iter_next.values(),
                                     return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            iter_next[fut._orig_iter] = None
            try:
                ret = fut.result()
            except StopAsyncIteration:
                del iter_next[fut._orig_iter]
                continue
            yield ret
Using that function, the loop would look like this:
async for batch in merge(downloader.fetch(url1), downloader.fetch(url2)):
....
Edit:
As mentioned in the comments, the method below does not execute the given routines in parallel.
Check out the aitertools library.
import asyncio
import aitertools

async def f1():
    await asyncio.sleep(5)
    yield 1

async def f2():
    await asyncio.sleep(6)
    yield 2

async def iter_funcs():
    async for x in aitertools.chain(f2(), f1()):
        print(x)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(iter_funcs())
It seems that the functions being iterated must be coroutines.