asyncio semaphore and wait task ordering patterns - python

Consider the following code for managing concurrency with identical async tasks:
import asyncio

async def performTask(id):
    await asyncio.sleep(1)
    print(id)

async def runBatchItem(semaphore, task):
    await semaphore.acquire()
    await task
    semaphore.release()

async def main():
    # all tasks
    tasks = [performTask(i) for i in range(20)]

    # concurrency handler
    MAX_CONCURRENT = 3
    semaphore = asyncio.Semaphore(value=MAX_CONCURRENT)
    stasks = [runBatchItem(semaphore, task) for task in tasks]
    await asyncio.wait(stasks)

asyncio.run(main())
No matter how often I run it, I always end up with the following sequence of outputs:
3 19 4 5 6 7 8 17 9 10 11 12 13 0 14 1 15 2 16 18
Question 1. What is the logic behind this ordering of my tasks?
Question 2. What if I want the tasks to be processed in approximate insertion order, i.e., like working through a queue with limited concurrency?
Thanks in advance!

As Andrew Svetlov (asyncio developer) answered here:
The order is nondeterministic by the .wait() specification.
If you start your script on another machine you will get a different result. If you want to impose an order on task execution, you can simply await the tasks in a loop, or use an asyncio synchronization primitive such as Event or Condition within the coroutines.
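For question 2, a common pattern is a fixed pool of worker coroutines pulling from an asyncio.Queue: items are dequeued in insertion order while concurrency stays capped. A minimal sketch under that assumption (worker and MAX_CONCURRENT are illustrative names, not from the original post):

import asyncio

async def performTask(id):
    await asyncio.sleep(1)
    print(id)

async def worker(queue):
    while True:
        coro = await queue.get()
        try:
            await coro
        finally:
            queue.task_done()

async def main():
    MAX_CONCURRENT = 3
    queue = asyncio.Queue()
    for i in range(20):
        queue.put_nowait(performTask(i))
    workers = [asyncio.create_task(worker(queue)) for _ in range(MAX_CONCURRENT)]
    await queue.join()  # wait until every queued coroutine has finished
    for w in workers:
        w.cancel()  # the workers loop forever, so cancel them once the queue is drained

asyncio.run(main())

With this, output arrives roughly as 0, 1, 2, ... 19, with three tasks in flight at a time.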

Related

Is it possible to make a nested loop run asynchronously in python?

I was trying to run a cosine-similarity check between pairs of strings in my list of strings, to keep only unique strings and remove sentences that are similar. I take one string and compare it with every other string in the list. The method I implemented is O(n^2) and will take a month, minimum, to finish for all my strings. I was wondering if I could run the nested-loop tasks in parallel using asyncio to reduce the time.
So I tried something very similar to this, but it doesn't work asynchronously. Kindly guide me a little. Thank you.
import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def main():
    for i in range(0, 2):
        for j in range(0, 2):
            await dumb_add(i, j)
    print('main done')

asyncio.create_task(main())
Results:
adding 0 + 0
0 + 0 = 0
adding 0 + 1
0 + 1 = 1
adding 1 + 0
1 + 0 = 1
adding 1 + 1
1 + 1 = 2
main done
It is not running in parallel because the await keyword causes the coroutine to wait for each dumb_add call to finish before moving on to the next one. Therefore, the calls run sequentially rather than concurrently.
If you want to run your dumb_add calls in parallel, you should use asyncio.gather(). That way, you can build a list of coroutines and execute them concurrently.
Something like this:
import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def main():
    tasks = []
    for i in range(0, 2):
        for j in range(0, 2):
            tasks.append(dumb_add(i, j))
    await asyncio.gather(*tasks)
    print('main done')

asyncio.run(main())
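As a side note (not part of the original answer): on Python 3.11+, the same fan-out can be written with asyncio.TaskGroup, which additionally cancels the remaining tasks if one of them raises. A minimal sketch:

import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def main():
    # the TaskGroup awaits all child tasks at the end of the async with block
    async with asyncio.TaskGroup() as tg:
        for i in range(2):
            for j in range(2):
                tg.create_task(dumb_add(i, j))
    print('main done')

asyncio.run(main())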

run functions in parallel Python

I have a stream, and I have a function I want to run when I receive a message on this stream:
async def some_func():
    asyncio.sleep(5)
    print("hello world")

client = create_client('wax.dfuse.eosnation.io:9000')
stream = client.Execute(Request(query = OPERATION_EOS))

for rawRequest in stream:
    async.gather(some_func())
If there are 2 or more messages at the same time, I want 2 or more functions running in parallel.
Currently this script does not run the function at all.
I just need a way to run a function independently of the main function.
Code example:
import asyncio
import time

chain = ""
sum = 0

async def myproc(callid):
    global chain
    global sum
    print(f"myProc {callid} started ...")
    t1 = time.perf_counter()
    time.sleep(2.5)  # note: time.sleep() blocks the whole event loop
    chain = chain + "->" + str(callid)
    sum = sum + 1
    await asyncio.sleep(5)
    print("hello world")
    t = time.perf_counter() - t1
    print(f" myProc {callid} finished in {t:0.5f} seconds. sum = {sum} chain {chain}")

async def main():
    #client = create_client('wax.dfuse.eosnation.io:9000')
    #stream = client.Execute(Request(query = OPERATION_EOS))
    stream = range(10)  # simulation of the stream: each element is a task
    coros = [myproc(rawRequest) for rawRequest in stream]
    await asyncio.gather(*coros)

if __name__ == "__main__":
    start_sec = time.perf_counter()
    await main()  # for notebooks; in the plain Python interpreter use asyncio.run(main())
    elapsed_secs = time.perf_counter() - start_sec
    print(f"Job finished in {elapsed_secs:0.5f} seconds.")
Output:
myProc 0 started ...
myProc 1 started ...
myProc 2 started ...
myProc 3 started ...
myProc 4 started ...
myProc 5 started ...
myProc 6 started ...
myProc 7 started ...
myProc 8 started ...
myProc 9 started ...
hello world
myProc 0 finished in 25.02580 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 1 finished in 22.52303 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 2 finished in 20.02011 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 3 finished in 17.51737 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 4 finished in 15.01457 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 5 finished in 12.51187 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 6 finished in 10.00907 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 7 finished in 7.50854 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 8 finished in 7.50605 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
hello world
myProc 9 finished in 7.50515 seconds. sum = 10 chain ->0->1->2->3->4->5->6->7->8->9
Job finished in 30.02882 seconds.
For a detailed explanation, see the very good write-up Async Processing in Python – Make Data Pipelines Scream, on asynchronous execution of individual functions inside a program instead of parallelizing the processing. Rather than parallel execution with threading, which is not efficient here, asyncio.gather(*coros) runs everything concurrently without defining threads or growing infrastructure. Also consider the asyncio.run() function instead of lower-level calls that manually create and close an event loop: coroutines must be driven by an event loop, so from the Python interpreter execute them with asyncio.run(main()) instead of a plain call to main().
Note: I used a notebook to run this, and a notebook already has a running event loop, which is why the code above uses await main(); from the plain Python interpreter, use asyncio.run(main()) instead. We use async processing here to mimic parallel processing rather than doing true parallel processing, which is generally harder to accomplish and not suited to your streaming job.
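For reference, a minimal sketch of the two entry points discussed above; asyncio.run() wraps roughly the lower-level boilerplate shown in the comments, plus extra cleanup of pending tasks and async generators:

import asyncio

async def main():
    await asyncio.sleep(1)

# High-level entry point (preferred):
asyncio.run(main())

# Roughly what it wraps:
# loop = asyncio.new_event_loop()
# try:
#     loop.run_until_complete(main())
# finally:
#     loop.close()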
Somehow I got it working.
My code:
async def some_func(rawResult):
    # There is some code here
    ...

async def stream_eosio(loop):
    for rawResult in stream:
        asyncio.run_coroutine_threadsafe(some_func(rawResult), loop)

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    Thread(target=asyncio.run, args=(stream_eosio(loop),)).start()
    loop.run_forever()
Cons:
You can't stop this script with Ctrl + Z or Ctrl + C, because of the Thread.
Pros: It's kinda easy.
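A possible mitigation for the Ctrl + C con, assuming it is acceptable for the stream thread to be killed abruptly on exit: make it a daemon thread so it no longer blocks interpreter shutdown. A sketch reusing stream_eosio from the code above:

from threading import Thread
import asyncio

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    # daemon=True: the thread no longer blocks interpreter shutdown, so a
    # KeyboardInterrupt in the main thread can stop run_forever() and exit
    Thread(target=asyncio.run, args=(stream_eosio(loop),), daemon=True).start()
    try:
        loop.run_forever()
    finally:
        loop.close()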
With threading you could do:
import threading

thread1 = threading.Thread(target=fn)
thread2 = threading.Thread(target=fn)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

Python Trio set up a decimal number of workers

I'm working with trio to run asynchronous concurrent tasks that will do some web scraping on different websites. I'd like to be able to choose how many concurrent workers to divide the tasks among. To do so I've written this code:
async def run_task():
    s = trio.Session(connections=5)
    Total_to_check = to_check() / int(module().workers)
    line = 0
    if int(Total_to_check) < 1:
        Total_to_check = 1
        module().workers = int(to_check())
    for i in range(int(Total_to_check)):
        try:
            async with trio.open_nursery() as nursery:
                for x in range(int(module().workers)):
                    nursery.start_soon(python_worker, self, s, x, line)
                    line += 1
        except BlockingIOError as e:
            print("[Fatal Error]", str(e))
            continue
In this example, to_check() equals how many URLs there are to fetch data from, and module().workers equals how many concurrent workers I'd like to use.
So if, let's say, I had 30 URLs and I input that I want 10 concurrent tasks, it'll fetch data from 10 URLs concurrently and repeat the procedure 3 times.
Now this is all well and good up until Total_to_check (which is equal to the number of URLs divided by the number of workers) is in the decimals.
If I have, let's say, 15 URLs and ask for 10 workers, then this code will only check 10 URLs. Same if I've got 20 URLs but ask for 15 workers.
I could do something like math.ceil(Total_to_check), but then it'll start trying to check URLs that don't exist.
How could I make this work properly, so that if I have 10 concurrent tasks and 15 URLs, it'll check the first 10 concurrently and then the last 5 concurrently, without skipping URLs (or trying to check too many)?
Thanks!
Well, here comes the CapacityLimiter, which you would use like this:
async def python_worker(self, session, workers, line, limit):
    async with limit:
        ...
Then you can simplify your run_task:
async def run_task():
    limit = trio.CapacityLimiter(10)
    s = trio.Session(connections=5)
    line = 0
    async with trio.open_nursery() as nursery:
        for x in range(int(to_check())):
            nursery.start_soon(python_worker, self, s, x, line, limit)
            line += 1
I believe the BlockingIOError handling would have to move inside python_worker too, because nursery.start_soon() won't block; it's the __aexit__ of the nursery that automagically waits at the end of the async with trio.open_nursery() as nursery block.
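For reference, a self-contained sketch of the same pattern using only trio primitives (trio.sleep stands in for the real scraping; the 15 URLs and the limit of 10 are illustrative):

import trio

async def fetch(url, limit):
    async with limit:  # at most 10 fetches run this body at any moment
        print("fetching", url)
        await trio.sleep(1)  # stands in for the real request
        print("done", url)

async def main():
    limit = trio.CapacityLimiter(10)
    urls = [f"https://example.com/{i}" for i in range(15)]
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch, url, limit)
    # the nursery exits only after all 15 fetches have completed

trio.run(main)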

Python, How to make an asynchronous data generator?

I have a program that loads data and processes it. Both loading and processing take time, and I'd like to do them in parallel.
Here is the synchronous version of my program (where the "loading" and "processing" are done in sequence, and are trivial operations here for the sake of the example):
import time

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def main():
    start = time.time()
    for data in data_loader():
        time.sleep(1)  # Simulated processing time
        processed_data = -data*2
        print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
When I run this, I get output:
At t=2.01, processed data 0 into 0
At t=4.01, processed data 1 into -2
At t=6.02, processed data 2 into -4
At t=8.02, processed data 3 into -6
The loop runs every 2s, with 1s for loading and 1s for processing.
Now, I'd like to make an asynchronous version, where the loading and processing are done concurrently (so the loader gets the next data ready while the processor is processing it). It should then take 2s for the first statement to be printed, and 1s for each statement after that. Expected output would be similar to:
At t=2.01, processed data 0 into 0
At t=3.01, processed data 1 into -2
At t=4.02, processed data 2 into -4
At t=5.02, processed data 3 into -6
Ideally, only the contents of the main function would have to change (as the data_loader code should not care that it may be used in an asynchronous way).
The multiprocessing module's utilities may be what you want.
import time
import multiprocessing

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def process_item(item):
    time.sleep(1)  # Simulated processing time
    return (item, -item*2)  # Return the original too.

def main():
    start = time.time()
    with multiprocessing.Pool() as p:
        data_iterator = data_loader()
        for (data, processed_data) in p.imap(process_item, data_iterator):
            print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
This outputs
At t=2.03, processed data 0 into 0
At t=3.03, processed data 1 into -2
At t=4.04, processed data 2 into -4
At t=5.04, processed data 3 into -6
Depending on your requirements, you may find .imap_unordered() to be faster, and it's also worth knowing that there's a thread-based version of Pool available as multiprocessing.dummy.Pool – this may be useful to avoid IPC overhead if your data is large, and your processing is not done in Python (so you can avoid the GIL).
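For completeness, a sketch of that thread-based variant, reusing data_loader and process_item from the example above; the swap is drop-in because both pools share the same interface:

import time
from multiprocessing.dummy import Pool  # thread-based Pool, same interface

def main():
    start = time.time()
    with Pool() as p:  # threads: no pickling/IPC overhead, but subject to the GIL
        for (data, processed_data) in p.imap(process_item, data_loader()):
            print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')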
The key to your problem is the actual processing of the data. I don't know what you're doing with the data in your real program, but it must be an asynchronous operation to benefit from asynchronous programming. If you're doing active, blocking, CPU-bound processing, you might be better off offloading it to a separate process instead, to be able to use multiple CPU cores and do things concurrently. If the actual processing of the data is in fact just the consumption of some asynchronous service, then it can be wrapped in a single asynchronous concurrent thread very effectively.
In your example, you're using time.sleep() to simulate the processing. Since that example operation can be done asynchronously (by using asyncio.sleep() instead), the conversion is simple:
import itertools
import asyncio

async def data_loader():
    for i in itertools.count(0):
        await asyncio.sleep(1)  # Simulated loading time
        yield i

async def process(data):
    await asyncio.sleep(1)  # Simulated processing time
    processed_data = -data*2
    print(f'At t={loop.time()-start:.3g}, processed data {data} into {processed_data}')

async def main():
    tasks = []
    async for data in data_loader():
        tasks.append(loop.create_task(process(data)))
    await asyncio.wait(tasks)  # wait for all remaining tasks

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    start = loop.time()
    loop.run_until_complete(main())
    loop.close()
The results, as you expect:
At t=2, processed data 0 into 0
At t=3, processed data 1 into -2
At t=4, processed data 2 into -4
...
Remember that this only works because time.sleep() has an asynchronous alternative in the form of asyncio.sleep(). Check the operation you're using to see if it can be written in asynchronous form.
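If it can't, for instance because it is a blocking library call with no async equivalent, one option is to hand it to an executor from inside the coroutine. A minimal sketch using the standard run_in_executor; blocking_process is an illustrative stand-in for your real blocking function:

import asyncio
import time

def blocking_process(data):
    time.sleep(1)  # blocking work with no async equivalent
    return -data * 2

async def process(data):
    loop = asyncio.get_running_loop()
    # None selects the default ThreadPoolExecutor; pass a ProcessPoolExecutor
    # instead for CPU-bound work that needs to bypass the GIL
    result = await loop.run_in_executor(None, blocking_process, data)
    print(f'processed data {data} into {result}')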
Here is a solution that allows you to wrap the data loader with an iter_asynchronously function, which solves the problem for now. (Note, however, that if the data loader is faster than the processing loop, the queue will grow indefinitely. This could be solved by adding a wait in _async_queue_manager when the queue gets too big, but sadly Queue.qsize() is not supported on Mac!)
import time
from multiprocessing import Queue, Process

class PoisonPill:
    pass

def _async_queue_manager(gen_func, queue: Queue):
    for item in gen_func():
        queue.put(item)
    queue.put(PoisonPill)

def iter_asynchronously(gen_func):
    """ Given a generator function, make it asynchronous. """
    q = Queue()
    p = Process(target=_async_queue_manager, args=(gen_func, q))
    p.start()
    while True:
        item = q.get()
        if item is PoisonPill:
            break
        else:
            yield item

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def main():
    start = time.time()
    for data in iter_asynchronously(data_loader):
        time.sleep(1)  # Simulated processing time
        processed_data = -data*2
        print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
The output is now as desired:
At t=2.03, processed data 0 into 0
At t=3.03, processed data 1 into -2
At t=4.04, processed data 2 into -4
At t=5.04, processed data 3 into -6
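As an aside on the unbounded-queue caveat above: multiprocessing.Queue accepts a maxsize argument, and put() then blocks whenever the queue is full, which throttles a fast loader without needing qsize(). A one-line sketch of the change (the value 16 is arbitrary):

q = Queue(maxsize=16)  # the loader process blocks on put() once 16 items are waiting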

How do I use threads on a generator while keeping the order?

I have a simple piece of code that runs a GET request for each item in the generator, which I'm trying to speed up:
def stream(self, records):
    # type(records) = <type 'generator'>
    for record in records:
        # record = OrderedDict([('_time', '1518287568'), ('data', '5552267792')])
        output = rest_api_lookup(record[self.input_field])
        record.update(output)
        yield record
Right now this runs on a single thread and takes forever since each REST call waits until the previous REST call finishes.
I have used multithreading in Python from a list before using this great answer (https://stackoverflow.com/a/28463266/1150923), but I'm not sure how to re-use the same strategy on a generator instead of a list.
I had some advice from a fellow developer, who recommended that I break the generator out into 100-element lists and then close the pool, but I don't know how to create these lists from the generator.
I also need to keep the original order, since I need to yield the records in the right order.
I assume you don't want to turn your generator records into a list first. One way to speed up your processing is to pass the records into a ThreadPoolExecutor chunk-wise. The executor will process your rest_api_lookup concurrently for all items of the chunk. Then you just need to "unchunk" your results. Here's some running sample code (which does not use classes, sorry, but I hope it shows the principle):
from concurrent.futures import ThreadPoolExecutor
from time import sleep

pool = ThreadPoolExecutor(8)  # 8 threads, adjust to taste and # of cores

def records():
    # simulates records generator
    for i in range(100):
        yield {'a': i}

def rest_api_lookup(a):
    # simulates REST call :)
    sleep(0.1)
    return {'b': -a}

def stream(records):
    def update_fun(record):
        output = rest_api_lookup(record['a'])
        record.update(output)
        return record
    chunk = []
    for record in records:
        # submit update_fun(record) into pool, keep resulting Future
        chunk.append(pool.submit(update_fun, record))
        if len(chunk) == 8:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def unchunk(chunk_gen):
    """Flattens a generator of Future chunks into a generator of Future results."""
    for chunk in chunk_gen:
        for f in chunk:
            yield f.result()  # get result from Future

# Now iterate over all results in same order as generated by records()
for result in unchunk(stream(records())):
    print(result)
HTH!
Update: I added a sleep to the simulated REST call, to make it more realistic. This chunked version finishes on my machine in 1.5 seconds. The sequential version takes 10 seconds (as is to be expected, 100 * 0.1s = 10s).
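As for the "100-element lists" idea from the question: itertools.islice can chunk any generator lazily, without materializing it first. A small sketch (chunks is a hypothetical helper, not part of the answer above):

from itertools import islice

def chunks(gen, size=100):
    """Yield lists of up to `size` items from any iterator."""
    it = iter(gen)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# e.g. one Executor.map() per 100-record batch keeps results in order:
# for batch in chunks(records, 100):
#     yield from pool.map(update_fun, batch)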
Here's an example of how you can do it with concurrent.futures:
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor

class YourClass(object):

    def stream(self, records):
        for record in records:
            output = rest_api_lookup(record[self.input_field])
            record.update(output)
            # process your list and yield back result.
            yield {"result_key": "whatever the result is"}

    def run_parallel(self):
        """ Use this method to do the parallel processing """
        # The important part - concurrent futures
        # - set number of workers as the number of jobs to process - suggest 4, but may differ
        #   this will depend on how many threads you want to run in parallel
        with ThreadPoolExecutor(4) as executor:
            # Use list jobs for concurrent futures
            # Use list parallel_results for results
            jobs = []
            parallel_results = []
            # Pass some keyword arguments if needed - per job
            record1 = {}  # your values for record1 - if need more - create
            record2 = {}  # your values for record2 - if need more - create
            record3 = {}  # your values for record3 - if need more - create
            record4 = {}  # your values for record4 - if need more - create
            list_of_records = [[record1, record2], [record3, record4],]
            for records in list_of_records:
                # Here we iterate 'number of records' times, could be different
                # We're adding stream, could be different function per call
                jobs.append(executor.submit(self.stream, records))
            # Once parallel processing is complete, iterate over results
            # append results to final processing without any networking
            for job in futures.as_completed(jobs):
                # Read result from future
                result = job.result()
                # Append to the list of results
                parallel_results.append(result)
            # Use sorted to sort by key to preserve order
            parallel_results = sorted(parallel_results, key=lambda k: k['result_key'])
            # Iterate over results streamed and do whatever is needed
            for result in parallel_results:
                print("Do something with me {}".format(result))
The answer by dnswlt works well but can still be improved: if the requests to the REST API (or whatever else is done with each record) take a variable amount of time, some threads may sit idle while the slowest request of each batch is still running.
The following solution takes a generator and a function as input and applies the function to each element produced by the generator, while maintaining a given number of running threads (each of which applies the function to one element). At the same time, it still returns the results in the order of the input.
from concurrent.futures import ThreadPoolExecutor
import os
import random
import time

def map_async(iterable, func, max_workers=os.cpu_count()):
    # Generator that applies func to the input using max_workers concurrent jobs
    def async_iterator():
        iterator = iter(iterable)
        pending_results = []
        has_input = True
        thread_pool = ThreadPoolExecutor(max_workers)
        while True:
            # Submit jobs for remaining input until max_worker jobs are running
            while has_input and \
                    len([e for e in pending_results if e.running()]) \
                    < max_workers:
                try:
                    e = next(iterator)
                    print('Submitting task...')
                    pending_results.append(thread_pool.submit(func, e))
                except StopIteration:
                    print('Submitted all task.')
                    has_input = False
            # If there are no pending results, the generator is done
            if not pending_results:
                return
            # If the oldest job is done, return its value
            if pending_results[0].done():
                yield pending_results.pop(0).result()
            # Otherwise, yield the CPU, then continue starting new jobs
            else:
                time.sleep(.01)
    return async_iterator()

def example_generator():
    for i in range(20):
        print('Creating task', i)
        yield i

def do_work(i):
    print('Starting to work on', i)
    time.sleep(random.uniform(0, 3))
    print('Done with', i)
    return i

random.seed(42)
for i in map_async(example_generator(), do_work):
    print('Got result:', i)
The commented output of a possible execution (on a machine with 8 logical CPUs):
Creating task 0
Submitting task...
Starting to work on 0
Creating task 1
Submitting task...
Starting to work on 1
Creating task 2
Submitting task...
Starting to work on 2
Creating task 3
Submitting task...
Starting to work on 3
Creating task 4
Submitting task...
Starting to work on 4
Creating task 5
Submitting task...
Starting to work on 5
Creating task 6
Submitting task...
Starting to work on 6
Creating task 7
Submitting task...
Starting to work on 7 # This point is reached quickly: 8 jobs are started before any of them finishes
Done with 1 # Job 1 is done, but since job 0 is not, the result is not returned yet
Creating task 8 # Job 1 finished, so a new job can be started
Submitting task...
Creating task 9
Starting to work on 8
Submitting task...
Done with 7
Starting to work on 9
Done with 9
Creating task 10
Submitting task...
Creating task 11
Starting to work on 10
Submitting task...
Done with 3
Starting to work on 11
Done with 2
Creating task 12
Submitting task...
Creating task 13
Starting to work on 12
Submitting task...
Done with 12
Starting to work on 13
Done with 10
Creating task 14
Submitting task...
Creating task 15
Starting to work on 14
Submitting task...
Done with 8
Starting to work on 15
Done with 13 # Several other jobs are started and completed
Creating task 16
Submitting task...
Creating task 17
Starting to work on 16
Submitting task...
Done with 0 # Finally, job 0 is completed
Starting to work on 17
Got result: 0
Got result: 1
Got result: 2
Got result: 3 # The result of all completed jobs are returned in input order until the job of the next one is still running
Done with 5
Creating task 18
Submitting task...
Creating task 19
Starting to work on 18
Submitting task...
Done with 16
Starting to work on 19
Done with 11
Submitted all task.
Done with 19
Done with 4
Got result: 4
Got result: 5
Done with 6
Got result: 6 # Job 6 must have been a very long job; now that it's done, its result and the result of many subsequent jobs can be returned
Got result: 7
Got result: 8
Got result: 9
Got result: 10
Got result: 11
Got result: 12
Got result: 13
Done with 14
Got result: 14
Done with 15
Got result: 15
Got result: 16
Done with 17
Got result: 17
Done with 18
Got result: 18
Got result: 19
The above run took about 4.7s while the sequential execution (setting max_workers=1) took about 23.6s. Without the optimization that avoids waiting for the slowest execution per batch, the execution takes about 5.3s. Depending on the variation of the individual job times and max_workers, the effect of the optimization may be even larger.
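One possible refinement, assuming the rest of map_async stays unchanged: the 10 ms polling sleep can be replaced by concurrent.futures.wait, which blocks until some pending future completes instead of busy-waiting. A fragment of the idea:

from concurrent.futures import wait, FIRST_COMPLETED

# in async_iterator(), replace the polling branch:
#     else:
#         time.sleep(.01)
# with a blocking wait that wakes as soon as any pending future finishes,
# after which the loop re-checks whether the oldest one is done
# (results must still come out in input order):
#     else:
#         wait(pending_results, return_when=FIRST_COMPLETED)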
