First experience with asyncio - python

I am learning python-asyncio, and I'm trying to write a simple model.
I have a function handling tasks. While handling, it goes to another remote service for data and then prints a message.
My code:
import asyncio
import time
import threading as th

dd = 0

@asyncio.coroutine
def slow_operation():
    global dd
    dd += 1
    print('Future is started!', dd)
    yield from asyncio.sleep(10 - dd)  # request to another server
    print('Future is done!', dd)

def add():
    while True:
        time.sleep(1)
        asyncio.ensure_future(slow_operation(), loop=loop)

loop = asyncio.get_event_loop()
future = asyncio.Future()
asyncio.ensure_future(slow_operation(), loop=loop)
th.Thread(target=add).start()
loop.run_forever()
But this code doesn't switch the context while sleeping in:
yield from asyncio.sleep(10 - dd)
How can I fix that?

asyncio.ensure_future is not thread-safe, which is why the slow_operation tasks are not started when they should be. Use loop.call_soon_threadsafe:
callback = lambda: asyncio.ensure_future(slow_operation(), loop=loop)
loop.call_soon_threadsafe(callback)
Or asyncio.run_coroutine_threadsafe if you're running Python 3.5.1 or newer:
asyncio.run_coroutine_threadsafe(slow_operation(), loop)
However, you should probably keep the use of threads to a minimum. Unless you use a library that runs tasks in its own thread, all the code should run inside the event loop (or inside an executor; see loop.run_in_executor).
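For reference, a sketch of the question's model with no extra thread at all, where the periodic scheduling is itself a coroutine on the loop (reusing slow_operation from the question):
import asyncio

@asyncio.coroutine
def add():
    # runs inside the event loop, so ensure_future is safe to call directly
    while True:
        yield from asyncio.sleep(1)
        asyncio.ensure_future(slow_operation(), loop=loop)

loop = asyncio.get_event_loop()
asyncio.ensure_future(slow_operation(), loop=loop)
asyncio.ensure_future(add(), loop=loop)
loop.run_forever()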

Related

Asyncio how to run 10 concurrent functions forever

Question on asyncio. I have this working, just not sure if it's the correct way or if there is an easier way.
The short version of what I am trying to do is to continuously execute run() 10x concurrently.
To do this I had to create a function work_it() with a while True loop.
The run() function takes about 5 minutes to complete: database calls, processing, aiohttp requests, etc.
Is this the best way to do this, or is there another way to have asyncio continuously run a function over and over again with 10 concurrent workers?
Also, is asyncio.gather the correct function to use? Am I better off using an executor?
Thanks in advance.
Erik
import asyncio
import random
import time

db = Database()
conn = db.connect()

async def run(worker_id=None):
    """
    Using shared database connection.
    Create an object, query the database, process the data, and do an HTTP POST with aiohttp.
    Returns: True/False based on the HTTP POST.
    """
    # my_object = Object_Model(db)
    # await do_sql_queries
    # await process_data
    # Lots of processing
    # result = await aiohttp_requests
    nap_time = random.randint(1, 5)
    print(f'Worker-{worker_id} sleeping for {nap_time}')
    await asyncio.sleep(nap_time)
    return True

async def work_it(worker_id=None):
    """
    This worker should run forever.
    """
    while True:
        start = time.monotonic()
        result = await run(worker_id)
        duration = time.monotonic() - start
        print(f'Worker-{worker_id} ran for {duration:.6f} seconds')

async def main():
    """
    Start 10 "workers".
    """
    workers = 10
    tasks = []
    for worker_id in range(1, workers + 1):
        print(f'Building Task {worker_id}')
        tasks.append(work_it(worker_id))
    print('Await gather')
    await asyncio.gather(*tasks)

asyncio.run(main())

Async HTTP API call for each line of file - Python

I am working on a big data problem and am stuck with some concurrency and async IO issues. The problem is as follows:
1) I have multiple huge files (~4 GB each, up to 15 of them) which I am processing using ProcessPoolExecutor from the concurrent.futures module this way:
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

def process(source):
    files = os.listdir(source)
    with ProcessPoolExecutor() as executor:
        future_to_url = {executor.submit(process_individual_file, source, input_file): input_file
                         for input_file in files}
        for future in as_completed(future_to_url):
            data = future.result()
2) Now, in each file, I want to go line by line, process each line to create a particular JSON, group 2K such JSONs together, and hit an API with that request to get a response. Here is the code:
def process_individual_file(source, input_file):
    limit = 2000
    json_array = []
    with open(source + input_file) as sf:
        for line in sf:
            json_array.append(form_json(line))
            limit -= 1
            if limit == 0:
                response = requests.post(API_URL, json=json_array)
                # check response status here
                json_array = []
                limit = 2000
3) Now the problem: the number of lines in each file is really large, and that API call blocks and is a bit slow to respond, so the program takes a huge amount of time to complete.
4) What I want to achieve is to make that API call asynchronous, so that I can keep processing the next batch of 2000 while the API call is happening.
5) Things I have tried so far: I was trying to implement this using asyncio, but there we need to collect the set of future tasks and wait for completion using the event loop. Something like this:
async def process_individual_file(source, input_file):
    tasks = []
    limit = 2000
    json_array = []
    with open(source + input_file) as sf:
        for line in sf:
            json_array.append(form_json(line))
            limit -= 1
            if limit == 0:
                tasks.append(asyncio.ensure_future(call_api(json_array)))
                json_array = []
                limit = 2000
    await asyncio.wait(tasks)

ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(process_individual_file(source, input_file))
ioloop.close()
6) I really don't understand this, because it is indirectly the same as before: it waits to collect all tasks before launching them. Can someone help me with the correct architecture for this problem? How can I call the API asynchronously, without collecting all tasks, and with the ability to process the next batch in parallel?
I really don't understand this, because it is indirectly the same as before: it waits to collect all tasks before launching them.
No, you're wrong here. When you create an asyncio.Task with asyncio.ensure_future, it starts executing the call_api coroutine immediately. This is how tasks in asyncio work:
import asyncio

async def test(i):
    print(f'{i} started')
    await asyncio.sleep(i)

async def main():
    tasks = [
        asyncio.ensure_future(test(i))
        for i
        in range(3)
    ]
    await asyncio.sleep(0)
    print('At this moment tasks are already started')
    await asyncio.wait(tasks)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
0 started
1 started
2 started
At this moment tasks are already started
The problem with your approach is that process_individual_file is not actually asynchronous: it does a large amount of CPU-bound work without returning control to your asyncio event loop. That's the problem: the function blocks the event loop, making it impossible for the tasks to be executed.
A very simple but effective solution, I think, is to return control to the event loop manually with asyncio.sleep(0) at regular points while process_individual_file executes, for example on reading each line:
async def process_individual_file(source, input_file):
    tasks = []
    limit = 2000
    json_array = []
    with open(source + input_file) as sf:
        for line in sf:
            await asyncio.sleep(0)  # return control to the event loop so it can run tasks
            json_array.append(form_json(line))
            limit -= 1
            if limit == 0:
                tasks.append(asyncio.ensure_future(call_api(json_array)))
                json_array = []
                limit = 2000
    await asyncio.wait(tasks)
Update:
there will be more than a million requests to be done and hence I feel uncomfortable storing future objects for all of them in a list
That makes sense. Nothing good will happen if you run a million network requests in parallel. The usual way to set a limit in this case is to use a synchronization primitive like asyncio.Semaphore.
I advise you to make a generator that yields json_array batches from the file, and to acquire the semaphore before adding a new task and release it when a task is done. You will get clean code, protected from having too many tasks running in parallel.
This will look like something like this:
def get_json_array(input_file):
    json_array = []
    limit = 2000
    with open(input_file) as sf:
        for line in sf:
            json_array.append(form_json(line))
            limit -= 1
            if limit == 0:
                yield json_array  # the generator splits the file-reading logic from adding tasks
                json_array = []
                limit = 2000
    if json_array:
        yield json_array  # don't lose the trailing partial batch

sem = asyncio.Semaphore(50)  # don't allow more than 50 parallel requests

async def process_individual_file(input_file):
    for json_array in get_json_array(input_file):
        await sem.acquire()  # file reading won't resume until there's room for newer tasks
        task = asyncio.ensure_future(call_api(json_array))
        task.add_done_callback(lambda t: sem.release())  # when a task is done, free a slot for the next one
        task.add_done_callback(lambda t: print(t.result()))  # print the result when a call_api finishes
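One detail worth noting: as written, process_individual_file can return while the last requests are still in flight. If the caller needs them finished, a sketch of the same function that also keeps and awaits its tasks:
async def process_individual_file(input_file):
    tasks = []
    for json_array in get_json_array(input_file):
        await sem.acquire()  # pause file reading while 50 requests are in flight
        task = asyncio.ensure_future(call_api(json_array))
        task.add_done_callback(lambda t: sem.release())
        tasks.append(task)
    if tasks:
        await asyncio.wait(tasks)  # make sure the in-flight requests finish before returning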

RuntimeError: There is no current event loop in thread in async + apscheduler

I have an async function and need to run it with apscheduler every N minutes.
Here is the Python code:
import asyncio
import os

from aiohttp import ClientSession
from apscheduler.schedulers.asyncio import AsyncIOScheduler

URL_LIST = ['<url1>',
            '<url2>',
            '<url2>',
            ]

def demo_async(urls):
    """Fetch list of web pages asynchronously."""
    loop = asyncio.get_event_loop()                  # event loop
    future = asyncio.ensure_future(fetch_all(urls))  # tasks to do
    loop.run_until_complete(future)                  # loop until done

async def fetch_all(urls):
    tasks = []  # list of tasks, one per url
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)  # create list of tasks
        _ = await asyncio.gather(*tasks)  # gather task responses

async def fetch(url, session):
    """Fetch a url, using specified ClientSession."""
    async with session.get(url) as response:
        resp = await response.read()
        print(resp)

if __name__ == '__main__':
    scheduler = AsyncIOScheduler()
    scheduler.add_job(demo_async, args=[URL_LIST], trigger='interval', seconds=15)
    scheduler.start()
    print('Press Ctrl+{0} to exit'.format('Break' if os.name == 'nt' else 'C'))

    # Execution will block here until Ctrl+C (Ctrl+Break on Windows) is pressed.
    try:
        asyncio.get_event_loop().run_forever()
    except (KeyboardInterrupt, SystemExit):
        pass
But when I tried to run it, I got the following error:
Job "demo_async (trigger: interval[0:00:15], next run at: 2017-10-12 18:21:12 +04)" raised an exception.....
..........\lib\asyncio\events.py", line 584, in get_event_loop
% threading.current_thread().name)
RuntimeError: There is no current event loop in thread '<concurrent.futures.thread.ThreadPoolExecutor object at 0x0356B150>_0'.
Could you please help me with this?
Python 3.6, APScheduler 3.3.1.
In your def demo_async(urls), try to replace:
loop = asyncio.get_event_loop()
with:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
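If it helps, a minimal sketch of the fixed demo_async, with the rest of the question's code unchanged:
def demo_async(urls):
    """Fetch list of web pages asynchronously."""
    loop = asyncio.new_event_loop()  # create a fresh loop for this worker thread
    asyncio.set_event_loop(loop)     # register it as the current loop of this thread
    loop.run_until_complete(fetch_all(urls))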
The important thing that hasn't been mentioned is why the error occurs. For me personally, knowing why the error occurs is as important as solving the actual problem.
Let's take a look at the implementation of get_event_loop in BaseDefaultEventLoopPolicy:
class BaseDefaultEventLoopPolicy(AbstractEventLoopPolicy):
    ...
    def get_event_loop(self):
        """Get the event loop.

        This may be None or an instance of EventLoop.
        """
        if (self._local._loop is None and
                not self._local._set_called and
                isinstance(threading.current_thread(), threading._MainThread)):
            self.set_event_loop(self.new_event_loop())
        if self._local._loop is None:
            raise RuntimeError('There is no current event loop in thread %r.'
                               % threading.current_thread().name)
        return self._local._loop
You can see that self.set_event_loop(self.new_event_loop()) is only executed if all of the conditions below are met:
self._local._loop is None - _local._loop is not set
not self._local._set_called - set_event_loop hasn't been called yet
isinstance(threading.current_thread(), threading._MainThread) - the current thread is the main one (this is not True in your case)
Therefore the exception is raised, because no loop is set in the current thread:
if self._local._loop is None:
    raise RuntimeError('There is no current event loop in thread %r.'
                       % threading.current_thread().name)
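You can reproduce the condition directly with a bare thread, outside APScheduler; a minimal sketch (Python 3.6 behavior, matching the question):
import asyncio
import threading

def worker():
    # raises RuntimeError: this thread is not the main one and no loop was set for it
    asyncio.get_event_loop()

threading.Thread(target=worker).start()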
Just pass fetch_all to scheduler.add_job() directly. The asyncio scheduler supports coroutine functions as job targets.
If the target callable is not a coroutine function, it will be run in a worker thread (for historical reasons), hence the exception.
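A minimal sketch of that change, reusing fetch_all from the question (everything else in its __main__ block stays the same):
scheduler = AsyncIOScheduler()
scheduler.add_job(fetch_all, args=[URL_LIST], trigger='interval', seconds=15)
scheduler.start()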
I had a similar issue where I wanted my asyncio module to be callable from a non-asyncio script (which was running under gevent... don't ask...). The code below resolved my issue because it tries to get the current event loop, but will create one if there isn't one in the current thread. Tested in Python 3.9.11.
try:
    loop = asyncio.get_event_loop()
except RuntimeError as e:
    if str(e).startswith('There is no current event loop in thread'):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        raise
Use asyncio.run() instead of directly using the event loop.
It creates a new loop and closes it when finished.
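Applied to the question's demo_async, that would be, for example (assuming Python 3.7+):
def demo_async(urls):
    """Fetch list of web pages asynchronously."""
    asyncio.run(fetch_all(urls))  # creates a fresh loop, runs the coroutine, then closes the loop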
This is what run() looks like:
if events._get_running_loop() is not None:
    raise RuntimeError(
        "asyncio.run() cannot be called from a running event loop")
if not coroutines.iscoroutine(main):
    raise ValueError("a coroutine was expected, got {!r}".format(main))

loop = events.new_event_loop()
try:
    events.set_event_loop(loop)
    loop.set_debug(debug)
    return loop.run_until_complete(main)
finally:
    try:
        _cancel_all_tasks(loop)
        loop.run_until_complete(loop.shutdown_asyncgens())
    finally:
        events.set_event_loop(None)
        loop.close()
Since this question continues to appear on the first page, I will write up my problem and my answer here.
I had a RuntimeError: There is no current event loop in thread 'Thread-X' when using flask-socketio and Bleak.
Edit: well, I refactored my file and made a class.
I initialized the loop in the constructor, and now everything is working fine:
class BLE:
    def __init__(self):
        self.loop = asyncio.get_event_loop()

    # function example, improvement of
    # https://github.com/hbldh/bleak/blob/master/examples/discover.py :
    def list_bluetooth_low_energy(self) -> list:
        async def run() -> list:
            BLElist = []
            devices = await bleak.discover()
            for d in devices:
                BLElist.append(d.name)
            return 'success', BLElist

        return self.loop.run_until_complete(run())
Usage:
ble = path.to.lib.BLE()
list = ble.list_bluetooth_low_energy()
Original answer:
The solution was stupid. I did not pay attention to what I did, but I moved some imports out of a function, like this:
import asyncio, platform
from bleak import discover

def listBLE() -> dict:
    async def run() -> dict:
        # my code that kept throwing exceptions.

    loop = asyncio.get_event_loop()
    ble_list = loop.run_until_complete(run())
    return ble_list
So I thought that I needed to change something in my code, and I created a new event loop using this piece of code just before the line with get_event_loop():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
At that moment I was pretty happy, since I had a loop running.
But it was not responding. And my code relied on a timeout to return some values, so it was pretty bad for my app.
It took me nearly two hours to figure out that the problem was the import, and here is my (working) code:
def list() -> dict:
    import asyncio, platform
    from bleak import discover

    async def run() -> dict:
        # my code running perfectly

    loop = asyncio.get_event_loop()
    ble_list = loop.run_until_complete(run())
    return ble_list
Reading the given answers, I only managed to fix my websocket thread by using the hint (try replacing) in https://stackoverflow.com/a/46750562/598513 on this page.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
The documentation of BaseDefaultEventLoopPolicy explains
Default policy implementation for accessing the event loop.
In this policy, each thread has its own event loop. However, we
only automatically create an event loop by default for the main
thread; other threads by default have no event loop.
So when using a thread, one has to create the loop.
And I had to reorder my code, so my final code is:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# !!! Place code after setting the loop !!!
server = Server()
start_server = websockets.serve(server.ws_handler, 'localhost', port)
In my case the line was like this:
asyncio.get_event_loop().run_until_complete(test())
I replaced the line above with this one, which solved my problem:
asyncio.run(test())

Making a Python slack bot asynchronous

I've been trying to make a bot in Slack that remains responsive even if it hasn't finished processing earlier commands, so it could go and do something that takes some time without locking up. It should return whatever is finished first.
I think I'm getting part of the way there: it now doesn't ignore stuff that's typed in before an earlier command is finished running. But it still doesn't allow threads to "overtake" each other - a command called first will return first, even if it takes much longer to complete.
import asyncio
from slackclient import SlackClient
import time, datetime as dt

token = "my token"
sc = SlackClient(token)

@asyncio.coroutine
def sayHello(waitPeriod=5):
    yield from asyncio.sleep(waitPeriod)
    msg = 'Hello! I waited {} seconds.'.format(waitPeriod)
    return msg

@asyncio.coroutine
def listen():
    yield from asyncio.sleep(1)
    x = sc.rtm_connect()
    info = sc.rtm_read()
    if len(info) == 1:
        if r'/hello' in info[0]['text']:
            print(info)
            try:
                waitPeriod = int(info[0]['text'][6:])
            except:
                print('Can not read a time period. Using 5 seconds.')
                waitPeriod = 5
            msg = yield from sayHello(waitPeriod=waitPeriod)
            print(msg)
            chan = info[0]['channel']
            sc.rtm_send_message(chan, msg)
    asyncio.async(listen())

def main():
    print('here we go')
    loop = asyncio.get_event_loop()
    asyncio.async(listen())
    loop.run_forever()

if __name__ == '__main__':
    main()
When I type /hello 12 and /hello 2 into the Slack chat window, the bot does respond to both commands now. However, it doesn't process the /hello 2 command until it has finished the /hello 12 command. My understanding of asyncio is a work in progress, so it's quite possible I'm making a very basic error. I was told in a previous question that things like sc.rtm_read() are blocking functions. Is that the root of my problem?
Thanks a lot,
Alex
What is happening is that your listen() coroutine is blocking at the yield from sayHello() statement. Only once sayHello() completes will listen() be able to continue on its merry way. The crux is that the yield from statement (or await, from Python 3.5 onward) is blocking. It chains the two coroutines together, and the 'parent' coroutine can't complete until the linked 'child' coroutine completes. (However, 'neighbouring' coroutines that aren't part of the same linked chain are free to proceed in the meantime.)
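A minimal illustration of the difference, with hypothetical coroutine names:
import asyncio

@asyncio.coroutine
def child():
    yield from asyncio.sleep(5)

@asyncio.coroutine
def parent_chained():
    yield from child()      # suspended here until child completes

@asyncio.coroutine
def parent_detached():
    asyncio.async(child())  # child becomes its own Task; parent carries on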
The simple way to release sayHello() without holding up listen() in this case is to keep listen() as a dedicated listening coroutine and offload all subsequent actions into their own Tasks instead, so that listen() is not hindered from responding to subsequent incoming messages. Something along these lines:
@asyncio.coroutine
def sayHello(waitPeriod, sc, chan):
    yield from asyncio.sleep(waitPeriod)
    msg = 'Hello! I waited {} seconds.'.format(waitPeriod)
    print(msg)
    sc.rtm_send_message(chan, msg)

@asyncio.coroutine
def listen():
    # connect once only if possible:
    x = sc.rtm_connect()
    # use a while True block instead of repeatedly scheduling a new Task at the end
    while True:
        yield from asyncio.sleep(0)  # use 0 unless you need to wait a full second?
        #x = sc.rtm_connect() # probably not necessary to reconnect each loop?
        info = sc.rtm_read()
        if len(info) == 1:
            if r'/hello' in info[0]['text']:
                print(info)
                try:
                    waitPeriod = int(info[0]['text'][6:])
                except:
                    print('Can not read a time period. Using 5 seconds.')
                    waitPeriod = 5
                chan = info[0]['channel']
                asyncio.async(sayHello(waitPeriod, sc, chan))
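On the question about sc.rtm_read() being blocking: yes, any synchronous call inside listen() stalls every coroutine on the loop while it runs. With a short poll it is usually tolerable, but if it becomes a problem, one option is to push the read into the default thread pool; a sketch (not tested against slackclient), where loop is the running event loop:
# inside listen(), instead of info = sc.rtm_read():
loop = asyncio.get_event_loop()
info = yield from loop.run_in_executor(None, sc.rtm_read)  # read in a worker thread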

asyncio - How can coroutines be used in signal handlers?

I am developing an application that uses asyncio from Python 3.4 for networking. When this application shuts down cleanly, a node needs to "disconnect" from the hub. This disconnect is an active process that requires a network connection, so the loop needs to wait for it to complete before shutting down.
My issue is that using a coroutine as a signal handler will result in the application not shutting down. Please consider the following example:
import asyncio
import functools
import os
import signal

@asyncio.coroutine
def ask_exit(signame):
    print("got signal %s: exit" % signame)
    yield from asyncio.sleep(10.0)
    loop.stop()

loop = asyncio.get_event_loop()
for signame in ('SIGINT', 'SIGTERM'):
    loop.add_signal_handler(getattr(signal, signame),
                            functools.partial(ask_exit, signame))

print("Event loop running forever, press CTRL+c to interrupt.")
print("pid %s: send SIGINT or SIGTERM to exit." % os.getpid())
loop.run_forever()
If you run this example and then press Ctrl+C, nothing happens.
The question is: how do I make this behavior work with signals and coroutines?
Syntax for Python >= 3.5:
loop = asyncio.get_event_loop()
for signame in ('SIGINT', 'SIGTERM'):
    loop.add_signal_handler(getattr(signal, signame),
                            lambda: asyncio.ensure_future(ask_exit(signame)))
Syntax for Python >= 3.7:
loop = asyncio.get_event_loop()
for signame in ('SIGINT', 'SIGTERM'):
    loop.add_signal_handler(getattr(signal, signame),
                            lambda signame=signame: asyncio.create_task(ask_exit(signame)))
Note
This is basically the same as @svs's answer, with two differences:
Usage of the more recent Python 3.7+ method asyncio.create_task, which is "more readable" than asyncio.ensure_future.
Binding signame immediately to the lambda function avoids the problem of late binding leading to the expected-unexpected™ behavior referred to in the comment by @R2RT. This was shamelessly copied from Lynn Root's blog post Graceful Shutdowns with asyncio (read the whole series to learn more about asyncio's beautiful goriness).
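For the curious, a quick demonstration of the late-binding behavior that the signame=signame default avoids:
funcs = [lambda: signame for signame in ('SIGINT', 'SIGTERM')]
print([f() for f in funcs])  # ['SIGTERM', 'SIGTERM'] - both lambdas see the last value
funcs = [lambda signame=signame: signame for signame in ('SIGINT', 'SIGTERM')]
print([f() for f in funcs])  # ['SIGINT', 'SIGTERM'] - value captured at definition time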
loop = asyncio.get_event_loop()
for signame in ('SIGINT', 'SIGTERM'):
    loop.add_signal_handler(getattr(signal, signame),
                            asyncio.async, ask_exit(signame))
That way the signal causes your ask_exit to get scheduled in a task.
Python 3.8
1st attempt: used async def handler_shutdown, and wrapped it in loop.create_task() when passing it to add_signal_handler()
2nd attempt: don't use async def for handler_shutdown()
3rd attempt: wrap handler_shutdown and its parameters in functools.partial()
e.g.
import asyncio
import functools
import signal

def handler_shutdown(signal, loop, tasks, http_runner):
    ...

def main():
    loop = asyncio.get_event_loop()
    for signame in ('SIGINT', 'SIGTERM', 'SIGQUIT'):
        print(f"add signal handler {signame} ...")
        loop.add_signal_handler(
            getattr(signal, signame),
            functools.partial(handler_shutdown,
                              signal=signame, loop=loop, tasks=tasks,
                              http_runner=http_runner
                              )
        )
The main issue I had was the error:
raise TypeError("coroutines cannot be used "
I first tried to solve it by wrapping the coroutine in loop.create_task(), and then solved it by removing async from the signal handler function.
For named parameters to the handler, also use functools.partial.
