Python Trio: set up a decimal number of workers

I'm working with trio to run asynchronous concurrent tasks that will do some web scraping on different websites. I'd like to be able to choose how many concurrent workers to divide the tasks among. To do so I've written this code:
async def run_task():
    s = trio.Session(connections=5)

    Total_to_check = to_check() / int(module().workers)
    line = 0

    if int(Total_to_check) < 1:
        Total_to_check = 1
        module().workers = int(to_check())

    for i in range(int(Total_to_check)):
        try:
            async with trio.open_nursery() as nursery:
                for x in range(int(module().workers)):
                    nursery.start_soon(python_worker, self, s, x, line)
                    line += 1
        except BlockingIOError as e:
            print("[Fatal Error]", str(e))
            continue
In this example to_check() is equal to how many urls are given to fetch data from, and module().workers is equal to how many concurrent workers I'd like to use.
So let's say I had 30 urls and I input that I want 10 concurrent tasks; it'll fetch data from 10 urls concurrently and repeat the procedure 3 times.
Now this is all well and good, up until Total_to_check (which is equal to the number of urls divided by the number of workers) is in the decimals.
If I have, let's say, 15 urls and I ask for 10 workers, then this code will only check 10 urls. Same if I've got 20 urls but ask for 15 workers.
I could do something like math.ceil(Total_to_check) but then it'll start trying to check urls that don't exist.
How could I make this work properly, so that if I have 10 concurrent tasks and 15 urls, it'll check the first 10 concurrently and then the last 5 concurrently, without skipping urls (or trying to check too many)?
Thanks!

Well, here comes the CapacityLimiter that you would use like this:
async def python_worker(self, session, workers, line, limit):
    async with limit:
        ...
Then you can simplify your run_task:
async def run_task():
    limit = trio.CapacityLimiter(10)
    s = trio.Session(connections=5)
    line = 0

    async with trio.open_nursery() as nursery:
        for x in range(int(to_check())):
            nursery.start_soon(python_worker, self, s, x, line, limit)
            line += 1
I believe the BlockingIOError handling would have to move inside python_worker too, because nursery.start_soon() won't block; it's the __aexit__ of the nursery that automagically waits at the end of the async with trio.open_nursery() as nursery block.
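For illustration, here is a minimal, self-contained sketch of that arrangement (fetch_url is a placeholder for the real scraping call, and the session/self arguments from the question are left out for brevity):

import trio

async def fetch_url(url):
    # Placeholder for the real scraping call.
    await trio.sleep(0.1)
    print("fetched", url)

async def python_worker(url, limit):
    # Each worker acquires the limiter itself, so only `limit.total_tokens`
    # of them do real work at any one time.
    async with limit:
        try:
            await fetch_url(url)
        except BlockingIOError as e:
            # The error handling now lives inside the worker.
            print("[Fatal Error]", str(e))

async def run_task(urls):
    limit = trio.CapacityLimiter(10)  # at most 10 concurrent workers
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(python_worker, url, limit)

trio.run(run_task, ["https://example.com/%d" % i for i in range(15)])

With 15 urls and a limit of 10, the first 10 run immediately and the remaining 5 start as slots free up, which is exactly the behaviour the question asks for.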

Related

Is it possible to make a nested loop run asynchronously in python?

I was trying to run cosine-similarity code that checks whether two strings in my list of strings are similar, so that I can keep only unique strings and remove sentences that are similar. I took one string and compared it with every other string in the list. The method I implemented is O(n^2) and will take a month minimum to finish for all my strings. I was thinking I could run the nested-loop tasks in parallel to reduce the time using asyncio.
So I tried something very similar to this, but it doesn't run asynchronously. Kindly guide me a little bit. Thank you.
async def dumb_add(i,j):
    print("adding",i,"+",j)
    await asyncio.sleep(random.randint(0,3))
    print(i,"+",j,"=",(i+j))

async def main():
    for i in range(0,2):
        for j in range(0,2):
            await dumb_add(i,j)
    print('main done')

asyncio.create_task(main())
Results:
adding 0 + 0
0 + 0 = 0
adding 0 + 1
0 + 1 = 1
adding 1 + 0
1 + 0 = 1
adding 1 + 1
1 + 1 = 2
main done
It is not running concurrently because the await keyword causes the coroutine to wait for each dumb_add call to finish before moving on to the next one. Therefore, the calls run sequentially rather than concurrently.
If you want your dumb_add calls to run concurrently, you should use asyncio.gather(). That way, you can build a list of coroutines that are all executed concurrently.
Something like this:
Something like this:
import asyncio
import random

async def dumb_add(i,j):
    print("adding",i,"+",j)
    await asyncio.sleep(random.randint(0,3))
    print(i,"+",j,"=",(i+j))

async def main():
    tasks = []
    for i in range(0,2):
        for j in range(0,2):
            tasks.append(dumb_add(i,j))
    await asyncio.gather(*tasks)
    print('main done')

asyncio.run(main())
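If the real list of coroutines is huge (as it will be for an O(n^2) pairwise comparison), you may also want to cap how many run at once. A rough sketch of one common way to do that, wrapping each call in an asyncio.Semaphore (an illustration, not part of the answer above):

import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def bounded(sem, i, j):
    # Only as many of these run concurrently as the semaphore allows.
    async with sem:
        await dumb_add(i, j)

async def main():
    sem = asyncio.Semaphore(10)  # at most 10 concurrent dumb_add calls
    tasks = [bounded(sem, i, j) for i in range(0, 2) for j in range(0, 2)]
    await asyncio.gather(*tasks)
    print('main done')

asyncio.run(main())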

asyncio semaphore and wait task ordering patterns

Consider the following code for managing concurrency with identical async tasks
import asyncio

async def performTask(id):
    await asyncio.sleep(1)
    print(id)

async def runBatchItem(semaphore, task):
    await semaphore.acquire()
    await task
    semaphore.release()

async def main():
    # all tasks
    tasks = [performTask(i) for i in range(20)]

    # concurrency handler
    MAX_CONCURRENT = 3
    semaphore = asyncio.Semaphore(value=MAX_CONCURRENT)
    stasks = [runBatchItem(semaphore, task) for task in tasks]

    await asyncio.wait(stasks)

asyncio.run(main())
No matter how often I run it, I always end up with the following sequence of outputs
3 19 4 5 6 7 8 17 9 10 11 12 13 0 14 1 15 2 16 18
Question 1. What is the logic to this ordering of my tasks?
Question 2. What if I want the tasks to be processed in approximate insert order? I.e, like working through a queue with limited concurrency.
Thanks in advance!
As Andrew Svetlov (an asyncio developer) answered here:
The order is nondeterministic by the .wait() specification.
If you start your script on another machine you will get a different result. If you want to impose an order on task execution, you can simply await the tasks in a loop, or use an asyncio synchronization primitive such as Event or Condition within the coroutines.
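For question 2, one sketch of the "queue with limited concurrency" idea (my illustration, not Andrew Svetlov's code): a fixed number of workers pull items from an asyncio.Queue, so tasks are started in roughly the order they were inserted:

import asyncio

async def performTask(id):
    await asyncio.sleep(1)
    print(id)

async def worker(queue):
    while True:
        task_id = await queue.get()
        await performTask(task_id)
        queue.task_done()

async def main():
    MAX_CONCURRENT = 3
    queue = asyncio.Queue()
    for i in range(20):
        queue.put_nowait(i)  # FIFO: items are picked up in insert order

    workers = [asyncio.create_task(worker(queue))
               for _ in range(MAX_CONCURRENT)]
    await queue.join()       # wait until every item has been processed
    for w in workers:
        w.cancel()

asyncio.run(main())

Completion order can still vary slightly when task durations differ, but tasks are started in insert order, three at a time.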

Python, How to make an asynchronous data generator?

I have a program that loads data and processes it. Both loading and processing take time, and I'd like to do them in parallel.
Here is the synchronous version of my program (where the "loading" and "processing" are done in sequence, and are trivial operations here for the sake of the example):
import time

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def main():
    start = time.time()
    for data in data_loader():
        time.sleep(1)  # Simulated processing time
        processed_data = -data*2
        print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
When I run this, I get output:
At t=2.01, processed data 0 into 0
At t=4.01, processed data 1 into -2
At t=6.02, processed data 2 into -4
At t=8.02, processed data 3 into -6
The loop runs every 2s, with 1s for loading and 1s for processing.
Now, I'd like to make an asynchronous version, where the loading and processing are done concurrently (so the loader gets the next data ready while the processor is processing it). It should then take 2s for the first statement to be printed, and 1s for each statement after that. Expected output would be similar to:
At t=2.01, processed data 0 into 0
At t=3.01, processed data 1 into -2
At t=4.02, processed data 2 into -4
At t=5.02, processed data 3 into -6
Ideally, only contents of the main function would have to change (as the data_loader code should not care that it may be used in an asynchronous way).
The multiprocessing module's utilities may be what you want.
import time
import multiprocessing

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def process_item(item):
    time.sleep(1)  # Simulated processing time
    return (item, -item*2)  # Return the original too.

def main():
    start = time.time()
    with multiprocessing.Pool() as p:
        data_iterator = data_loader()
        for (data, processed_data) in p.imap(process_item, data_iterator):
            print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
This outputs
At t=2.03, processed data 0 into 0
At t=3.03, processed data 1 into -2
At t=4.04, processed data 2 into -4
At t=5.04, processed data 3 into -6
Depending on your requirements, you may find .imap_unordered() to be faster, and it's also worth knowing that there's a thread-based version of Pool available as multiprocessing.dummy.Pool – this may be useful to avoid IPC overhead if your data is large, and your processing is not done in Python (so you can avoid the GIL).
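If you want to try the thread-based pool, the change is essentially just the import; a sketch on the same toy example:

import time
from multiprocessing.dummy import Pool  # thread-based Pool, same interface

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def process_item(item):
    time.sleep(1)  # Simulated processing time
    return (item, -item*2)

def main():
    start = time.time()
    with Pool() as p:  # threads instead of processes: no pickling/IPC overhead
        for (data, processed_data) in p.imap(process_item, data_loader()):
            print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()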
The key to your problem is in the actual processing of the data. I don't know what you're doing with the data in your real program, but it must be an asynchronous operation to use asynchronous programming. If you're doing active, blocking CPU-bound processing, you might be better off offloading it to a separate process instead, to be able to use multiple CPU cores and do things concurrently. If the actual processing of the data is in fact just the consumption of some asynchronous service, then it can be handled very effectively with coroutines in a single thread.
In your example, you're using time.sleep() to simulate the processing. Since that example operation can be done asynchronously (by using asyncio.sleep() instead) then the conversion is simple:
import itertools
import asyncio

async def data_loader():
    for i in itertools.count(0):
        await asyncio.sleep(1)  # Simulated loading time
        yield i

async def process(data):
    await asyncio.sleep(1)  # Simulated processing time
    processed_data = -data*2
    print(f'At t={loop.time()-start:.3g}, processed data {data} into {processed_data}')

async def main():
    tasks = []
    async for data in data_loader():
        tasks.append(loop.create_task(process(data)))
    await asyncio.wait(tasks)  # wait for all remaining tasks

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    start = loop.time()
    loop.run_until_complete(main())
    loop.close()
The results, as you expect:
At t=2, processed data 0 into 0
At t=3, processed data 1 into -2
At t=4, processed data 2 into -4
...
Remember that it only works because time.sleep() has an asynchronous alternative in the form of asyncio.sleep(). Check the operation you're using, to see if it can be written in asynchronous form.
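If the real processing has no asynchronous form (a blocking library call, for instance), a common fallback is to push it onto a thread with run_in_executor so the event loop stays free; a rough sketch, assuming the blocking call is suitable for a thread:

import asyncio
import time

def blocking_process(data):
    time.sleep(1)  # stands in for a blocking call with no async alternative
    return -data * 2

async def process(data):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default ThreadPoolExecutor so the event
    # loop (and the data loader) can keep making progress meanwhile.
    processed = await loop.run_in_executor(None, blocking_process, data)
    print(f'processed data {data} into {processed}')

async def main():
    await asyncio.gather(*(process(i) for i in range(4)))

asyncio.run(main())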
Here is a solution that allows you to wrap the dataloader with an iter_asynchronously function. It solves the problem for now. (Note however that there is still the problem that if the dataloader is faster than the processing loop, the queue will grow indefinitely. This could easily be solved by adding a wait in _async_queue_manager if the queue gets too big (but sadly Queue.qsize() is not supported on Mac!))
import time
from multiprocessing import Queue, Process

class PoisonPill:
    pass

def _async_queue_manager(gen_func, queue: Queue):
    for item in gen_func():
        queue.put(item)
    queue.put(PoisonPill)

def iter_asynchronously(gen_func):
    """ Given a generator function, make it asynchronous. """
    q = Queue()
    p = Process(target=_async_queue_manager, args=(gen_func, q))
    p.start()
    while True:
        item = q.get()
        if item is PoisonPill:
            break
        else:
            yield item

def data_loader():
    for i in range(4):
        time.sleep(1)  # Simulated loading time
        yield i

def main():
    start = time.time()
    for data in iter_asynchronously(data_loader):
        time.sleep(1)  # Simulated processing time
        processed_data = -data*2
        print(f'At t={time.time()-start:.3g}, processed data {data} into {processed_data}')

if __name__ == '__main__':
    main()
The output is now as desired:
At t=2.03, processed data 0 into 0
At t=3.03, processed data 1 into -2
At t=4.04, processed data 2 into -4
At t=5.04, processed data 3 into -6
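As for the unbounded-queue caveat mentioned above: giving the multiprocessing.Queue a maxsize makes put() block when the queue is full, which throttles the loader without ever calling qsize(). A sketch of that variation (max_prefetch is an illustrative parameter, not from the original code):

from multiprocessing import Queue, Process

class PoisonPill:
    pass

def _async_queue_manager(gen_func, queue: Queue):
    for item in gen_func():
        queue.put(item)  # blocks while the queue is full, throttling the loader
    queue.put(PoisonPill)

def iter_asynchronously(gen_func, max_prefetch=2):
    """ Run gen_func in another process, prefetching at most max_prefetch items. """
    q = Queue(maxsize=max_prefetch)
    p = Process(target=_async_queue_manager, args=(gen_func, q))
    p.start()
    while True:
        item = q.get()
        if item is PoisonPill:
            break
        yield item
    p.join()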

How to throttle script that creates celery tasks faster than they're consumed?

I have a script that generates millions of Celery tasks, one per row in the DB. Is there a way to throttle it so that it doesn't completely flood Celery?
Ideally I want to keep Celery busy, but I don't want the length of the Celery queue to exceed a few dozen tasks since that's just a waste of memory (especially since without some kind of throttle the script will add millions of tasks to the queue almost instantly).
I've spent some time on this problem over the past several days and came up with what I'm calling a CeleryThrottle object. Basically, you tell it how many items you want in a queue and it does its best to keep the queue between that size and 2× that size.
So here's the code (assumes Redis broker, but easily changed):
# coding=utf-8
from collections import deque
import time

import redis
from django.conf import settings
from django.utils.timezone import now


def get_queue_length(queue_name='celery'):
    """Get the number of tasks in a celery queue.

    :param queue_name: The name of the queue you want to inspect.
    :return: the number of items in the queue.
    """
    r = redis.StrictRedis(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        db=settings.REDIS_DATABASES['CELERY'],
    )
    return r.llen(queue_name)


class CeleryThrottle(object):
    """A class for throttling celery."""

    def __init__(self, min_items=100, queue_name='celery'):
        """Create a throttle to prevent celery runaways.

        :param min_items: The minimum number of items that should be enqueued.
        A maximum of 2× this number may be created. This minimum value is not
        guaranteed and so a number slightly higher than your max concurrency
        should be used. Note that this number includes all tasks unless you use
        a specific queue for your processing.
        """
        self.min = min_items
        self.max = self.min * 2

        # Variables used to track the queue and wait-rate
        self.last_processed_count = 0
        self.count_to_do = self.max
        self.last_measurement = None
        self.first_run = True

        # Use a fixed-length queue to hold last N rates
        self.rates = deque(maxlen=15)
        self.avg_rate = self._calculate_avg()

        # For inspections
        self.queue_name = queue_name

    def _calculate_avg(self):
        return float(sum(self.rates)) / (len(self.rates) or 1)

    def _add_latest_rate(self):
        """Calculate the rate that the queue is processing items."""
        right_now = now()
        elapsed_seconds = (right_now - self.last_measurement).total_seconds()
        self.rates.append(self.last_processed_count / elapsed_seconds)
        self.last_measurement = right_now
        self.last_processed_count = 0
        self.avg_rate = self._calculate_avg()

    def maybe_wait(self):
        """Stall the calling function or let it proceed, depending on the queue.

        The idea here is to check the length of the queue as infrequently as
        possible while keeping the number of items in the queue as closely
        between self.min and self.max as possible.

        We do this by immediately enqueueing self.max items. After that, we
        monitor the queue to determine how quickly it is processing items. Using
        that rate we wait an appropriate amount of time or immediately press on.
        """
        self.last_processed_count += 1
        if self.count_to_do > 0:
            # Do not wait. Allow process to continue.
            if self.first_run:
                self.first_run = False
                self.last_measurement = now()
            self.count_to_do -= 1
            return

        self._add_latest_rate()
        task_count = get_queue_length(self.queue_name)
        if task_count > self.min:
            # Estimate how long the surplus will take to complete and wait that
            # long + 5% to ensure we're below self.min on next iteration.
            surplus_task_count = task_count - self.min
            wait_time = (surplus_task_count / self.avg_rate) * 1.05
            time.sleep(wait_time)

            # Assume we're below self.min due to waiting; max out the queue.
            if task_count < self.max:
                self.count_to_do = self.max - self.min
            return

        elif task_count <= self.min:
            # Add more items.
            self.count_to_do = self.max - task_count
            return
Usage looks like:
throttle = CeleryThrottle()
for item in really_big_list_of_items:
    throttle.maybe_wait()
    my_task.delay(item)
Pretty simple and hopefully pretty flexible. With that in place, the code will monitor your queue and add waits to your loop if the queue is getting too long. This is in our github repo in case there are updates.
As it does this, it will track the rolling average speed of the tasks, and will attempt not to check the queue length more frequently than needed. For example, if tasks take two minutes each to run, after putting 100 items in the queue, it can wait quite a while before having to check the length of the queue again. A simpler version of this script could check the queue length every time through the loop, but that would add unnecessary delay. This version tries to be smart about it at the cost of being sometimes wrong (in which case the queue goes below min_items).

How do I use threads on a generator while keeping the order?

I have a simple code that runs a GET request per each item in the generator that I'm trying to speed up:
def stream(self, records):
    # type(records) = <type 'generator'>
    for record in records:
        # record = OrderedDict([('_time', '1518287568'), ('data', '5552267792')])
        output = rest_api_lookup(record[self.input_field])
        record.update(output)
        yield record
Right now this runs on a single thread and takes forever since each REST call waits until the previous REST call finishes.
I have used multithreading in Python from a list before using this great answer (https://stackoverflow.com/a/28463266/1150923), but I'm not sure how to re-use the same strategy on a generator instead of a list.
I had some advice from a fellow developer who recommended that I break the generator into 100-element lists and then close the pool, but I don't know how to create these lists from the generator.
I also need to keep the original order since I need to yield record in the right order.
I assume you don't want to turn your generator records into a list first. One way to speed up your processing is to pass the records into a ThreadPoolExecutor chunk-wise. The executor will process your rest_api_lookup concurrently for all items of the chunk. Then you just need to "unchunk" your results. Here's some running sample code (which does not use classes, sorry, but I hope it shows the principle):
from concurrent.futures import ThreadPoolExecutor
from time import sleep

pool = ThreadPoolExecutor(8)  # 8 threads, adjust to taste and # of cores

def records():
    # simulates records generator
    for i in range(100):
        yield {'a': i}

def rest_api_lookup(a):
    # simulates REST call :)
    sleep(0.1)
    return {'b': -a}

def stream(records):
    def update_fun(record):
        output = rest_api_lookup(record['a'])
        record.update(output)
        return record

    chunk = []
    for record in records:
        # submit update_fun(record) into pool, keep resulting Future
        chunk.append(pool.submit(update_fun, record))
        if len(chunk) == 8:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def unchunk(chunk_gen):
    """Flattens a generator of Future chunks into a generator of Future results."""
    for chunk in chunk_gen:
        for f in chunk:
            yield f.result()  # get result from Future

# Now iterate over all results in same order as generated by records()
for result in unchunk(stream(records())):
    print(result)
HTH!
Update: I added a sleep to the simulated REST call, to make it more realistic. This chunked version finishes on my machine in 1.5 seconds. The sequential version takes 10 seconds (as is to be expected, 100 * 0.1s = 10s).
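For comparison, ThreadPoolExecutor.map also yields results in input order, which removes the need for the chunk/unchunk step; note, though, that map submits every item up front, so this sketch is only sensible when the records generator is of manageable size:

from concurrent.futures import ThreadPoolExecutor
from time import sleep

def records():
    for i in range(100):
        yield {'a': i}

def rest_api_lookup(a):
    sleep(0.1)  # simulates REST call
    return {'b': -a}

def stream(records):
    def update_fun(record):
        record.update(rest_api_lookup(record['a']))
        return record

    with ThreadPoolExecutor(8) as pool:
        # map() returns results in the order of its input, even though the
        # underlying calls run concurrently on the pool's threads.
        yield from pool.map(update_fun, records)

for result in stream(records()):
    print(result)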
Here's an example how you can do it with concurrent.futures:
from concurrent import futures
from concurrent.futures import ThreadPoolExecutor

class YourClass(object):

    def stream(self, records):
        for record in records:
            output = rest_api_lookup(record[self.input_field])
            record.update(output)
            # process your list and yield back result.
            yield {"result_key": "whatever the result is"}

    def run_parallel(self):
        """ Use this method to do the parallel processing """

        # The important part - concurrent futures
        # - set number of workers as the number of jobs to process - suggest 4, but may differ
        #   this will depend on how many threads you want to run in parallel
        with ThreadPoolExecutor(4) as executor:
            # Use list jobs for concurrent futures
            # Use list scraped_results for results
            jobs = []
            parallel_results = []

            # Pass some keyword arguments if needed - per job
            record1 = {}  # your values for record1 - if need more - create
            record2 = {}  # your values for record2 - if need more - create
            record3 = {}  # your values for record3 - if need more - create
            record4 = {}  # your values for record4 - if need more - create

            list_of_records = [[record1, record2], [record3, record4],]

            for records in list_of_records:
                # Here we iterate 'number of records' times, could be different
                # We're adding stream, could be different function per call
                jobs.append(executor.submit(self.stream, records))

            # Once parallel processing is complete, iterate over results
            # append results to final processing without any networking
            for job in futures.as_completed(jobs):
                # Read result from future
                result = job.result()
                # Append to the list of results
                parallel_results.append(result)

            # Use sorted to sort by key to preserve order
            parallel_results = sorted(parallel_results, key=lambda k: k['result_key'])

            # Iterate over results streamed and do whatever is needed
            for result in parallel_results:
                print("Do something with me {}".format(result))
The answer by dnswlt works well but can still be improved. If the request to the REST API (or whatever else should be done with each record) takes a variable amount of time, some workers may sit idle while the slowest request of each batch is still running.
The following solution takes a generator and a function as an input and applies the function to each element produced by the generator while maintaining a given number of running threads (each of which applies the function to one element). At the same time, it still returns the results in the order of the input.
from concurrent.futures import ThreadPoolExecutor
import os
import random
import time

def map_async(iterable, func, max_workers=os.cpu_count()):
    # Generator that applies func to the input using max_workers concurrent jobs
    def async_iterator():
        iterator = iter(iterable)
        pending_results = []
        has_input = True
        thread_pool = ThreadPoolExecutor(max_workers)
        while True:
            # Submit jobs for remaining input until max_worker jobs are running
            while has_input and \
                    len([e for e in pending_results if e.running()]) \
                    < max_workers:
                try:
                    e = next(iterator)
                    print('Submitting task...')
                    pending_results.append(thread_pool.submit(func, e))
                except StopIteration:
                    print('Submitted all task.')
                    has_input = False

            # If there are no pending results, the generator is done
            if not pending_results:
                return

            # If the oldest job is done, return its value
            if pending_results[0].done():
                yield pending_results.pop(0).result()
            # Otherwise, yield the CPU, then continue starting new jobs
            else:
                time.sleep(.01)

    return async_iterator()

def example_generator():
    for i in range(20):
        print('Creating task', i)
        yield i

def do_work(i):
    print('Starting to work on', i)
    time.sleep(random.uniform(0, 3))
    print('Done with', i)
    return i

random.seed(42)

for i in map_async(example_generator(), do_work):
    print('Got result:', i)
The commented output of a possible execution (on a machine with 8 logical CPUs):
Creating task 0
Submitting task...
Starting to work on 0
Creating task 1
Submitting task...
Starting to work on 1
Creating task 2
Submitting task...
Starting to work on 2
Creating task 3
Submitting task...
Starting to work on 3
Creating task 4
Submitting task...
Starting to work on 4
Creating task 5
Submitting task...
Starting to work on 5
Creating task 6
Submitting task...
Starting to work on 6
Creating task 7
Submitting task...
Starting to work on 7 # This point is reached quickly: 8 jobs are started before any of them finishes
Done with 1 # Job 1 is done, but since job 0 is not, the result is not returned yet
Creating task 8 # Job 1 finished, so a new job can be started
Submitting task...
Creating task 9
Starting to work on 8
Submitting task...
Done with 7
Starting to work on 9
Done with 9
Creating task 10
Submitting task...
Creating task 11
Starting to work on 10
Submitting task...
Done with 3
Starting to work on 11
Done with 2
Creating task 12
Submitting task...
Creating task 13
Starting to work on 12
Submitting task...
Done with 12
Starting to work on 13
Done with 10
Creating task 14
Submitting task...
Creating task 15
Starting to work on 14
Submitting task...
Done with 8
Starting to work on 15
Done with 13 # Several other jobs are started and completed
Creating task 16
Submitting task...
Creating task 17
Starting to work on 16
Submitting task...
Done with 0 # Finally, job 0 is completed
Starting to work on 17
Got result: 0
Got result: 1
Got result: 2
Got result: 3 # The result of all completed jobs are returned in input order until the job of the next one is still running
Done with 5
Creating task 18
Submitting task...
Creating task 19
Starting to work on 18
Submitting task...
Done with 16
Starting to work on 19
Done with 11
Submitted all task.
Done with 19
Done with 4
Got result: 4
Got result: 5
Done with 6
Got result: 6 # Job 6 must have been a very long job; now that it's done, its result and the result of many subsequent jobs can be returned
Got result: 7
Got result: 8
Got result: 9
Got result: 10
Got result: 11
Got result: 12
Got result: 13
Done with 14
Got result: 14
Done with 15
Got result: 15
Got result: 16
Done with 17
Got result: 17
Done with 18
Got result: 18
Got result: 19
The above run took about 4.7s while the sequential execution (setting max_workers=1) took about 23.6s. Without the optimization that avoids waiting for the slowest execution per batch, the execution takes about 5.3s. Depending on the variation of the individual job times and max_workers, the effect of the optimization may be even larger.
