I have been attempting to generate a ping scan that uses a limited number of processes. I tried as_completed without success and switched to asyncio.wait with asyncio.FIRST_COMPLETED.
The following complete script works if the offending line is commented out. I'd like to collect the tasks into a set in order to get rid of pending = list(pending), but pending_set.union(task) throws RuntimeError: await wasn't used with future.
"""Test simultaneous pings, limiting processes."""
import asyncio
from time import asctime
pinglist = [
'127.0.0.1', '192.168.1.10', '192.168.1.20', '192.168.1.254',
'192.168.177.20', '192.168.177.100', '172.17.1.1'
]
async def ping(ip):
"""Run external ping."""
p = await asyncio.create_subprocess_exec(
'ping', '-n', '-c', '1', ip,
stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.DEVNULL
)
return await p.wait()
async def run():
    """Run the test, uses some processes and will take a while."""
    iplist = pinglist[:]
    pending = []
    pending_set = set()
    tasks = {}
    while len(pending) or len(iplist):
        while len(pending) < 3 and len(iplist):
            ip = iplist.pop()
            print(f"{asctime()} adding {ip}")
            task = asyncio.create_task(ping(ip))
            tasks[task] = ip
            pending.append(task)
            pending_set.union(task)  # comment this line and no error
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        pending = list(pending)
        for taskdone in done:
            print(' '.join([
                asctime(),
                ('BAD' if taskdone.result() else 'good'),
                tasks[taskdone]
            ]))

if __name__ == '__main__':
    asyncio.run(run())
There are two problems with pending_set.union(task):
union doesn't update the set in place; it returns a new set consisting of the original one and the elements of the argument.
It accepts an iterable collection (such as another set), not a single element, so union attempts to iterate over task, which doesn't make sense. To make things more confusing, task objects are technically iterable in order to be usable in yield from expressions, but they detect iteration attempts in non-async contexts and report the error you've observed.
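For illustration, a minimal standalone sketch (untested) of that failure mode: a set operation iterating a still-pending task triggers exactly the error from the question.
import asyncio

async def demo():
    task = asyncio.create_task(asyncio.sleep(0))   # not done yet
    try:
        set().union(task)                          # tries to iterate the task outside of await
    except RuntimeError as exc:
        print(exc)                                 # "await wasn't used with future"
    await task                                     # let the task finish cleanly

asyncio.run(demo())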
To fix both issues, you should use the add method instead, which operates by side effect and accepts a single element to add to the set:
pending_set.add(task)
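With that change the list/set juggling can go away entirely, since asyncio.wait accepts any iterable of tasks and returns its done and pending groups as sets. A minimal sketch (untested) of the question's loop with pending kept as a set throughout:
async def run():
    """Same limiting logic, but `pending` stays a set the whole time."""
    iplist = pinglist[:]
    pending = set()
    tasks = {}
    while pending or iplist:
        while len(pending) < 3 and iplist:
            ip = iplist.pop()
            print(f"{asctime()} adding {ip}")
            task = asyncio.create_task(ping(ip))
            tasks[task] = ip
            pending.add(task)
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for taskdone in done:
            print(asctime(), 'BAD' if taskdone.result() else 'good', tasks[taskdone])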
Note that a more idiomatic way to limit concurrency in asyncio is using a Semaphore. For example (untested):
async def run():
    limit = asyncio.Semaphore(3)

    async def wait_and_ping(ip):
        async with limit:
            print(f"{asctime()} adding {ip}")
            result = await ping(ip)
            print(asctime(), ip, ('BAD' if result else 'good'))

    await asyncio.gather(*[wait_and_ping(ip) for ip in pinglist])
Use await asyncio.gather(*pending_set)
asyncio.gather() accepts any number of awaitables and also returns one
* unpacks the set
>>> "{} {} {}".format(*set((1,2,3)))
'1 2 3'
Example from the docs
await asyncio.gather(
    factorial("A", 2),
    factorial("B", 3),
    factorial("C", 4),
)
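Applied to the ping example from the question, a sketch (untested; run_all is just an illustrative name) of awaiting a whole collection of tasks at once. Note this drops the three-at-a-time limit, since gather simply waits for everything:
async def run_all():
    # Map each task to its address so the result can be reported per IP.
    tasks = {asyncio.create_task(ping(ip)): ip for ip in pinglist}
    results = await asyncio.gather(*tasks)   # * unpacks the dict's keys (the tasks)
    for ip, returncode in zip(tasks.values(), results):
        print(ip, 'BAD' if returncode else 'good')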
In my original application I solved this without queuing the ping targets, which simplified things. This answer adds a gradually received list of targets plus the useful pointers from @user4815162342, completing the answer to the original question.
import asyncio
import time

pinglist = ['127.0.0.1', '192.168.1.10', '192.168.1.20', '192.168.1.254',
            '192.168.177.20', '192.168.177.100', '172.17.1.1']

async def worker(queue):
    limit = asyncio.Semaphore(4)  # restrict the rate of work

    async def ping(ip):
        """Run external ping."""
        async with limit:
            print(f"{time.time():.2f} starting {ip}")
            p = await asyncio.create_subprocess_exec(
                'ping', '-n', '-c', '1', ip,  # one ping per address, numeric output (same flags as the question's script)
                stdout=asyncio.subprocess.DEVNULL,
                stderr=asyncio.subprocess.DEVNULL
            )
            return (ip, await p.wait())

    async def get_assign():
        return await queue.get()

    assign = {asyncio.create_task(get_assign())}
    pending = set()
Maintaining two distinct pending sets proved to be key. One set holds a single task that receives assigned addresses; it completes and needs to be restarted each time. The other set holds the ping tasks, which run once and are then complete.
    while len(assign) + len(pending) > 0:  # stop condition
        done, pending = await asyncio.wait(
            set().union(assign, pending),
            return_when=asyncio.FIRST_COMPLETED
        )
        for job in done:
            if job in assign:
                if job.result() is None:
                    assign = set()  # for stop condition
                else:
                    pending.add(asyncio.create_task(ping(job.result())))
                    assign = {asyncio.create_task(get_assign())}
            else:
                print(
                    f"{time.time():.2f} result {job.result()[0]}"
                    f" {['good', 'BAD'][job.result()[1]]}"
                )
The remainder is pretty straightforward.
async def assign(queue):
    """Assign tasks as if these are arriving gradually."""
    print(f"{time.time():.2f} start assigning")
    for task in pinglist:
        await queue.put(task)
        await asyncio.sleep(0.1)
    await queue.put(None)  # to stop nicely

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(worker(queue), assign(queue))

if __name__ == '__main__':
    asyncio.run(main())
The output of this is (on my network with 172 failing to respond):
1631611141.70 start assigning
1631611141.70 starting 127.0.0.1
1631611141.71 result 127.0.0.1 good
1631611141.80 starting 192.168.1.10
1631611141.81 result 192.168.1.10 good
1631611141.91 starting 192.168.1.20
1631611142.02 starting 192.168.1.254
1631611142.03 result 192.168.1.254 good
1631611142.13 starting 192.168.177.20
1631611142.23 starting 192.168.177.100
1631611142.24 result 192.168.177.100 good
1631611142.34 starting 172.17.1.1
1631611144.47 result 192.168.1.20 good
1631611145.11 result 192.168.177.20 good
1631611145.97 result 172.17.1.1 BAD
Can anyone help me out? I'm trying to get the program to pause if the condition is met, but as of now it's not sleeping at all, and I can't wrap my head around why. I'm completely new to asyncio.
time.sleep() doesn't really work either, so I would prefer to use asyncio. Thanks a lot!
from python_graphql_client import GraphqlClient
import asyncio
import os
import requests

loop = asyncio.get_event_loop()

def print_handle(data):
    print(data["data"]["liveMeasurement"]["timestamp"]+" "+str(data["data"]["liveMeasurement"]["power"]))
    tall = (data["data"]["liveMeasurement"]["power"])
    if tall >= 1000:
        print("OK")
        # schedule async task from sync code
        asyncio.create_task(send_push_notification(data))
        print("msg sent")
        asyncio.create_task(sleep())

client = GraphqlClient(endpoint="wss://api.tibber.com/v1-beta/gql/subscriptions")

query = """
subscription{
  liveMeasurement(homeId:"fd73a8a6ca"){
    timestamp
    power
  }
}
"""

query2 = """
mutation{
  sendPushNotification(input: {
    title: "Advarsel! Høyt forbruk",
    message: "Du bruker 8kw eller mer",
    screenToOpen: CONSUMPTION
  }){
    successful
    pushedToNumberOfDevices
  }
}
"""

async def sleep():
    await asyncio.sleep(10)

async def send_push_notification(data):
    # maybe update your query with the received data here
    await client.execute_async(query=query2, headers={'Authorization': "2bTCaFx74"})

async def main():
    await client.subscribe(query=query, headers={'Authorization': "2bTCaFxDiYdHlxBSt074"}, handle=print_handle)

asyncio.run(main())
If I understand correctly, you want to observe broadcasts of some data, and react to those broadcasts, keeping the right to pause those reactions. Something like:
async def monitor(read_broadcast):
    while True:
        data = await read_broadcast()
        print(data["data"]["liveMeasurement"]["timestamp"]+" "+str(data["data"]["liveMeasurement"]["power"]))
        tall = (data["data"]["liveMeasurement"]["power"])
        if tall >= 1000:
            print("OK")
            await send_push_notification(data)
            print("msg sent")
            # sleep for a while before sending another one
            await asyncio.sleep(10)
To implement read_broadcast, we can use a "future":
# client, query, query2, send_push_notification defined as before

async def main():
    broadcast_fut = None

    def update_broadcast_fut(_fut=None):
        nonlocal broadcast_fut
        broadcast_fut = asyncio.get_event_loop().create_future()
        broadcast_fut.add_done_callback(update_broadcast_fut)

    update_broadcast_fut()

    def read_broadcast():
        return broadcast_fut

    asyncio.create_task(monitor(read_broadcast))

    await client.subscribe(
        query=query, headers={'Authorization': "2bTCaFxDiYdHlxBSt074"},
        handle=lambda data: broadcast_fut.set_result(data),
    )

asyncio.run(main())
Note that I haven't tested the above code, so there could be typos.
I think the easiest way to reduce the number of messages being sent, is to define a minimum interval in which no notification is sent while the value is still over the threshold.
import time

last_notification_timestamp = 0
NOTIFICATION_INTERVAL = 5 * 60  # 5 min

def print_handle(data):
    global last_notification_timestamp
    print(
        data["data"]["liveMeasurement"]["timestamp"]
        + " "
        + str(data["data"]["liveMeasurement"]["power"])
    )
    tall = data["data"]["liveMeasurement"]["power"]
    current_time = time.time()
    if (
        tall >= 1000
        and current_time - NOTIFICATION_INTERVAL > last_notification_timestamp
    ):
        print("OK")
        # schedule async task from sync code
        asyncio.create_task(send_push_notification(data))
        last_notification_timestamp = current_time
        print("msg sent")
The timestamp of the last message sent needs to be stored somewhere, so we define a variable in the global scope to hold it and use the global keyword within print_handle() to be able to write to it from within the function. In the function we then check whether the value is above the threshold and whether enough time has passed since the last message. This way you still keep your subscription alive while limiting the number of notifications you receive. This is simple enough, but you will probably soon want to extend the range of what you do with the received data. Just keep in mind that print_handle() is a sync callback and should be as short as possible.
My API collects users' texts over a 900 ms window and sends them to the model, which calculates their length (just for a simple demo). I already have this working, but the approach is ugly. I open a background scheduling thread. The API receives a query in the main thread and puts it into a queue shared between the main thread and the new thread. The new thread periodically takes all texts from the queue and sends them to the model. After the model has processed them, the results are stored in a shared dict. In the main thread, the get_response method uses a while loop to poll for the result in the shared dict. My question is how I can get rid of the while loop in get_response; I would like a more elegant method. Thanks!
This is the server code; I need to remove the while/sleep polling in get_response because it's ugly:
import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body, APIRouter
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime

logger = logging.getLogger(__name__)
app = APIRouter()

def feed_data_into_model(queue, shared_dict, lock):
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:", result)
        for index, task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id, value, action="put", lock=lock, shared_dict=shared_dict)

class TestThreading(object):
    def __init__(self, interval, queue, shared_dict, lock):
        self.interval = interval
        thread = threading.Thread(target=self.run, args=(queue, shared_dict, lock))
        thread.daemon = True
        thread.start()

    def run(self, queue, shared_dict, lock):
        while True:
            # More statements come here
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue, shared_dict, lock)
            time.sleep(self.interval)

if __name__ != "__main__":
    # uvicorn will import and reload this file, so __name__ is not __main__; init the variables here,
    # otherwise we would end up with 2 background threads (one of them idle), which is hard to debug
    global queue, shared_dict, lock
    queue = Queue(maxsize=64)
    shared_dict = {}  # model results saved here!
    lock = threading.Lock()
    tr = TestThreading(0.9, queue, shared_dict, lock)

def handle_dict(key, value=None, action="put", lock=None, shared_dict=None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if an exception is raised in the try block
        lock.release()
    return value

def model_work(x: Union[str, List[str]]):
    time.sleep(3)
    if isinstance(x, str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist", lock=lock, shared_dict=shared_dict) is False
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action="get", lock=lock, shared_dict=shared_dict)
    handle_dict(task_id, None, action="delete", lock=lock, shared_dict=shared_dict)
    return value

@app.get("/{text}")
async def demo(text: str):
    global queue, shared_dict, lock
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item = [task_id, text, state, ""]
    queue.put(item)
    # TODO: await the result from the answer dict; needs to change since busy-waiting for the answer is ugly
    value = await get_response(task_id, lock, shared_dict)
    return 1

if __name__ == "__main__":
    # what I want to do:
    # a single process runs every 900ms; if the queue is not empty, pop the items out to the model
    # the model saves results in a thread-safe dict, keyed by task id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
client code:
for n in {1..5}; do curl http://localhost:5555/a & done
The usual way to run a blocking task in asyncio code is to use the loop's built-in run_in_executor to handle it for you. You can either set up an executor yourself, or let asyncio use the default one:
import asyncio
from time import sleep

def proc(t):
    print("in thread")
    sleep(t)
    return f"Slept for {t} seconds"

async def submit_task(t):
    print("submitting:", t)
    res = await loop.run_in_executor(None, proc, t)
    print("got:", res)

async def other_task():
    for _ in range(4):
        print("poll!")
        await asyncio.sleep(1)

loop = asyncio.new_event_loop()
loop.create_task(other_task())
loop.run_until_complete(submit_task(3))
Note that if loop is not defined globally, you can get it inside the function with asyncio.get_event_loop(). I've deliberately used a simple example without fastapi/uvicorn to illustrate the point, but the idea is the same: fastapi (etc) just run in the event loop, which is why you define coroutines for the endpoints.
The advantage of this is that we can simply await the response directly, without messing about with awaiting an event and then using some other means (shared dict with mutex, pipe, queue, whatever) to get the result out, which keeps the code clean and readable, and is likely also a good deal quicker. If, for some reason, we want to make sure it runs in processes and not threads we can make our own executor:
from concurrent.futures import ProcessPoolExecutor
e = ProcessPoolExecutor()
...
res = await loop.run_in_executor(e, proc, t)
See the docs for more information.
Another option would be using a multiprocessing.Pool to run the task, and then apply_async. But you can't await multiprocessing futures directly. There is a library aiomultiprocessing to make the two play together but I have no experience with it and cannot see a reason to prefer it over the builtin executor for this case (running a single background task per invocation of the coro).
Lastly do note that the main reason to avoid a polling while loop is not that it's ugly (although it is), but that it's not nearly as performant as almost any other solution.
I think I already found the answer: use asyncio.Event to communicate across threads, with its set, clear and wait methods, together with asyncio.get_event_loop().
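A minimal sketch of that idea (untested; it reuses the queue, lock and shared_dict from the question, and pending_events and notify_done are hypothetical names introduced here). asyncio.Event is not thread-safe, so the background thread schedules event.set() on the loop with call_soon_threadsafe:
# task_id -> (asyncio.Event, loop); registered before enqueueing so the
# background thread can always find the event to signal.
pending_events = {}

async def demo(text: str):
    task_id = str(uuid.uuid4())
    event = asyncio.Event()
    pending_events[task_id] = (event, asyncio.get_event_loop())
    queue.put([task_id, text, "pending", ""])
    await event.wait()                      # replaces the while/sleep polling
    with lock:
        value = shared_dict.pop(task_id)
    del pending_events[task_id]
    return value

def notify_done(task_id):
    """Called from the background thread right after the result is stored."""
    event, loop = pending_events[task_id]
    loop.call_soon_threadsafe(event.set)    # hop back onto the event loop thread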
I have a script with multiple async functions, and I am running them in a loop. Everything runs okay, except for one task that I need to run twice with different input parameters.
def run(self):
    checks_to_run = self.returnChecksBasedOnInputs()
    loop = asyncio.new_event_loop().run_until_complete(self.run_all_checks_async(checks_to_run))
    asyncio.set_event_loop(loop)
    return self.output
async def run_all_checks_async(self, checks_to_run):
    async with aiohttp.ClientSession() as session:
        check_results = []
        for single_check in checks_to_run:
            if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
                total_number_of_azs = len(Constants.cvim_azs) + 1
                self.log.info(total_number_of_azs)
                for x in range(1, total_number_of_azs):
                    task = asyncio.ensure_future(getattr(self, single_check)(session, x))
            else:
                task = asyncio.ensure_future(getattr(self, single_check)(session))
            check_results.append(task)
        await asyncio.gather(*check_results, return_exceptions=False)
class apiCaller:
    def __init__(self):
        pass

    async def callAndReturnJson(self, method, url, headers, session, payload, log):
        sslcontext = ssl.create_default_context(purpose=ssl.Purpose.CLIENT_AUTH)
        try:
            async with session.request(method, url, data=payload, headers=headers, ssl=sslcontext) as response:
                response = await response.json()
                print(str(response))
                return response
        except Exception as e:
            print("here exception")
            raise Exception(str(e))
The problem is in this function. I am running it twice, but I noticed that when the second instance of the task reaches the return statement, the first task also shuts down immediately. How can I avoid that and wait until the other task finishes as well?
async def cvim_check_storage(self, session, aznumber):
    response = await apiCaller().callAndReturnJson("POST", f"https://{single_cvim_az}/v1/diskmgmt/check_disks", getattr(Constants, cvim_az_headers), session=session, payload=payload, log=self.log)
    self.log.info(str(response))
    self.log.info(str(response.keys()))
    if "diskmgmt_request" not in response.keys():
        self.output.cvim_checks.cvim_raid_checks.results[az_plus_number].overall_status = "FAILED"
        self.output.cvim_checks.cvim_raid_checks.results[az_plus_number].details = str(response)
        return
    # ...rest of the code if the above if statement is false
The problem is how you track your tasks. You are using task to add new tasks to check_results, but in one of your branches you assign task inside a for loop. You don't add task to check_results until after that loop completes, so only the last task gets added, and gather won't wait for any of the other tasks created in the loop.
The solution is to add task during each iteration of the inner for loop. There are a few different ways to spell that.
One option is to just call check_results.append anywhere you currently assign to task.
if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
    total_number_of_azs = len(Constants.cvim_azs) + 1
    self.log.info(total_number_of_azs)
    for x in range(1, total_number_of_azs):
        check_results.append(
            asyncio.ensure_future(getattr(self, single_check)(session, x))
        )
else:
    check_results.append(
        asyncio.ensure_future(getattr(self, single_check)(session))
    )
I'd take it one step further and use a list comprehension when creating multiple tasks, though.
if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
    total_number_of_azs = len(Constants.cvim_azs) + 1
    self.log.info(total_number_of_azs)
    check_results.extend(
        [
            asyncio.ensure_future(getattr(self, single_check)(session, x))
            for x in range(1, total_number_of_azs)
        ]
    )
else:
    task = asyncio.ensure_future(getattr(self, single_check)(session))
    check_results.append(task)
I have a complex function Vehicle.set_data, which has many nested functions, API calls, DB calls, etc. For the sake of this example, I will simplify it.
I am trying to use Async IO to run Vehicle.set_data on multiple vehicles at once. Here is my Vehicle model:
class Vehicle:
    def __init__(self, token):
        self.token = token

    # Works async
    async def set_data(self):
        await asyncio.sleep(random.random() * 10)

    # Does not work async
    # def set_data(self):
    #     time.sleep(random.random() * 10)
And here is my Async IO routine:
async def set_vehicle_data(vehicle):
    # sleep for T seconds on average
    await vehicle.set_data()

def get_random_string():
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(5))

async def producer(queue):
    count = 0
    while True:
        count += 1
        # produce a token and send it to a consumer
        token = get_random_string()
        vehicle = Vehicle(token)
        print(f'produced {vehicle.token}')
        await queue.put(vehicle)
        if count > 3:
            break

async def consumer(queue):
    while True:
        vehicle = await queue.get()
        # process the token received from a producer
        print(f'Starting consumption for vehicle {vehicle.token}')
        await set_vehicle_data(vehicle)
        queue.task_done()
        print(f'Ending consumption for vehicle {vehicle.token}')

async def main():
    queue = asyncio.Queue()
    # #todo now, do I need multiple producers
    producers = [asyncio.create_task(producer(queue))
                 for _ in range(3)]
    consumers = [asyncio.create_task(consumer(queue))
                 for _ in range(3)]
    # with both producers and consumers running, wait for
    # the producers to finish
    await asyncio.gather(*producers)
    print('---- done producing')
    # wait for the remaining tasks to be processed
    await queue.join()
    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

asyncio.run(main())
In the example above, this commented section of code does not allow multiple vehicles to process at once:
# Does not work async
# def set_data(self):
# time.sleep(random.random() * 10)
Because this is such a complex query in our actual codebase, it would be a tremendous refactor to go flag every single nested function with async and await. Is there any way I can make this function work async without marking up my whole codebase with async?
You can run the function in a separate thread with asyncio.to_thread
await asyncio.to_thread(self.set_data)
If you're using Python < 3.9, use loop.run_in_executor instead:
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self.set_data)
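Either way, applied to the Vehicle model from the question, a minimal sketch (untested): set_data stays a plain blocking method, and only the call site changes (shown here with asyncio.to_thread; swap in run_in_executor on older Pythons).
import asyncio
import random
import time

class Vehicle:
    def __init__(self, token):
        self.token = token

    def set_data(self):
        # Blocking code stays untouched; no async/await markup needed inside.
        time.sleep(random.random() * 10)

async def set_vehicle_data(vehicle):
    # Off-load the blocking call to a worker thread so other vehicles keep going.
    await asyncio.to_thread(vehicle.set_data)
The producer/consumer code from the question stays unchanged.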
I am working on a sample program that reads from a data source (csv or rdbms) in chunks, makes some transformations and sends the data via socket to a server.
But because the csv is very large, for testing purposes I want to break out of the reading after a few chunks.
Unfortunately something goes wrong and I do not know what or how to fix it. Probably I have to do some cancellation, but I'm not sure where and how. I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>
The sample code is:
import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x) : [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)

async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k:v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Many async resources, such as generators, need to be cleaned up with the help of an event loop. When an async for loop stops iterating an async generator via break, the generator is cleaned up by the garbage collector only. This means the task is pending (waits for the event loop) but gets destroyed (by the garbage collector).
The most straightforward fix is to aclose the generator explicitly:
async def main():
    i = 0
    aiter = readChunks()          # name iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()      # ... clean it up when done
These patterns can be simplified using the asyncstdlib library (disclaimer: I maintain it). asyncstdlib.islice lets you take a fixed number of items before cleanly closing the generator:
import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
If the break condition is dynamic, scoping the iterator guarantees cleanup in any case:
import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break
This works...
import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x) : [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)

async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k:v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':
    asyncio.run(main())
... At least it runs and finishes.
The issue with stopping an async generator by breaking out of an async for loop is described in bugs.python.org/issue38013, and it looks like it was fixed in Python 3.7.5.
However, using
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()
I get a debug error but no Exception in Python 3.8.
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>
Using the higher-level API asyncio.run(main()) with debugging ON, I do not get the debug message. If you are going to upgrade to Python 3.7.5+, you should probably still use asyncio.run().
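For what it's worth, asyncio.run can also enable debug mode directly, so the same check works with the higher-level API:
asyncio.run(main(), debug=True)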
The problem is simple. You exit the loop early, but the async generator is not exhausted yet (it is still pending):
...
if i > 5:
    break
...
Your readChunks generator is running asynchronously in your loop, and you break out of it before it has completed.
That's why asyncio reports Task was destroyed but it is pending!
In short, the async task was still doing its work in the background, but you killed it by breaking out of the loop (and stopping the program).
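A minimal sketch of one possible fix (untested, Python 3.10+): close the generator deterministically with contextlib.aclosing, so breaking out of the loop no longer leaves a pending task for the garbage collector.
from contextlib import aclosing  # Python 3.10+

async def main():
    i = 0
    async with aclosing(readChunks()) as chunks:
        async for chunk in chunks:
            for k, v in chunk.items():
                await send({k: v})
            i += 1
            if i > 5:
                break            # aclosing() closes the generator, not the GC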