How to make a Python generator execution asynchronous?

How to make a Python generator execution asynchronous? - python

I have a piece of of code that looks like this:
def generator():
while True:
result = very_long_computation()
yield result
def caller():
g = generator()
for i in range(n):
element = next(g)
another_very_long_computation()
Basically, I'd like to overlap the execution of very_long_computation() and another_very_long_computation() as much as possible.
Is there simple way to make the generator asynchronous? I'd like the generator to start computing the next iteration of the while loop right after result has been yielded, so that (ideally) the next result is ready to be yielded before the succesive next() call in caller().

There is no simple way, especially since you've got very_long_computation and another_very_long_computation instead of very_slow_io. Even if you moved generator into its own thread, you'd be limited by CPython's global interpreter lock, preventing any performance benefit.
You could move the work into a worker process, but the multiprocessing module isn't the drop-in replacement for threading it likes to pretend to be. It's full of weird copy semantics, unintuitive restrictions, and platform-dependent behavior, as well as just having a lot of communication overhead.
If you've got I/O along with your computation, it's fairly simple to shove the generator's work into its own thread to at least get some work done during the I/O:
from queue import Queue
import threading
def worker(queue, n):
gen = generator()
for i in range(n):
queue.put(next(gen))
def caller():
queue = Queue()
worker_thread = threading.Thread(worker, args=(queue, n))
worker_thread.start()
for i in range(n):
element = queue.get()
another_very_long_computation()

Related

Fill a Queue with Objects from several data loaders using multiprocessing

I work on a machine learning input pipeline. I wrote a data loader that reads in data from a large .hdf file and returns slices, which takes roughly 2 seconds per slice. Therefore I would like to use a queue, that takes in objects from several data loaders and can return single objects from the queue via a next function (like a generator). Furthermore the processes that fill the queue should run somehow in the background, refilling the queue when it is not full. I do not get it to work properly. It worked with a single dataloader, giving me 4 times the same slices..
import multiprocessing as mp
class Queue_Generator():
def __init__(self, data_loader_list):
self.pool = mp.Pool(4)
self.data_loader_list = data_loader_list
self.queue = mp.Queue(maxsize=16)
self.pool.map(self.fill_queue, self.data_loader_list)
def fill_queue(self,gen):
self.queue.put(next(gen))
def __next__(self):
yield self.queue.get()
What I get from this:
NotImplementedError: pool objects cannot be passed between processes or pickled
Thanks in advance

Your specific error means that you cannot have a pool as part of your class when you are passing class methods to a pool. What I would suggest could be the following:
import multiprocessing as mp
from queue import Empty
class QueueGenerator(object):
def __init__(self, data_loader_list):
self.data_loader_list = data_loader_list
self.queue = mp.Queue(maxsize=16)
def __iter__(self):
processes = list()
for _ in range(4):
pr = mp.Process(target=fill_queue, args=(self.queue, self.data_loader_list))
pr.start()
processes.append(pr)
return self
def __next__(self):
try:
return self.queue.get(timeout=1) # this should have a value, otherwise your loop will never stop. make it something that ensures your processes have enough time to update the queue but not too long that your program freezes for an extended period of time after all information is processed
except Empty:
raise StopIteration
# have fill queue as a separate function
def fill_queue(queue, gen):
while True:
try:
value = next(gen)
queue.put(value)
except StopIteration: # assumes the given data_loader_list is an iterator
break
print('stopping')
gen = iter(range(70))
qg = QueueGenerator(gen)
for val in qg:
print(val)
# test if it works several times:
for val in qg:
print(val)
The next issue for you to solve I think is to have the data_loader_list be something that provides new information in every separate process. But since you have not given any information about that I can't help you with that. The above does however provide you a way to have the processes fill your queue which is then passed out as an iterator.

Not quite sure why you are yielding in __next__, that doesn't look quite right to me. __next__ should return a value, not a generator object.
Here is a simple way that you can return the results of parallel functions as a generator. It may or may not meet your specific requirements but can be tweaked to suit. It will keep on processing data_loader_list until it is exhausted. This may use a lot of memory compared to keeping, for example, 4 items in a Queue at all times.
import multiprocessing as mp
def read_lines(data_loader):
from time import sleep
sleep(2)
return f'did something with {data_loader}'
def make_gen(data_loader_list):
with mp.Pool(4) as pool:
for result in pool.imap(read_lines, data_loader_list):
yield result
if __name__ == '__main__':
data_loader_list = [i for i in range(15)]
result_generator = make_gen(data_loader_list)
print(type(result_generator))
for i in result_generator:
print(i)
Using imap means that the results can be processed as they are produced. map and map_async would block in the for loop until all results were ready. See this question for more.

Python run two loops at the same time where one is rate limited and depends on data from the other

I have a problem in python where I want to run two loops at the same time. I feel like I need to do this because the second loop needs to be rate limited, but the first loop really shouldn't be rate limited. Also, the second loop takes an input from the first.
I'm looking fro something that works something like this:
for line in file:
do some stuff
list = []
list.append("an_item")
Rate limited:
for x in list:
do some stuff simultaneously

There are two basic approaches with different tradeoffs: synchronously switching between tasks, and running in threads or subprocesses. First, some common setup:
from queue import Queue # or Queue, if python 2
work = Queue()
def fast_task():
""" Do the fast thing """
if done:
return None
else:
return result
def slow_task(arg):
""" Do the slow thing """
RATE_LIMIT = 30 # seconds
Now, the synchronous approach. It has the advantage of being much simpler, and easier to debug, at the cost of being a bit slower. How much slower depends on the details of your tasks. How it works is, we run a tight loop that calls the fast job every time, and the slow job only if enough time has passed. If the fast job is no longer producing work and the queue is empty, we quit.
import time
last_call = 0
while True:
next_job = fast_task()
if next_job:
work.put(next_job)
elif work.empty():
# nothing left to do
break
else:
# fast task has done all its work - short sleep to slow the spin
time.sleep(.1)
now = time.time()
if now - last_call > RATE_LIMIT:
last_call = now
slow_task(work.get())
If you feel like this doesn't work fast enough, you can try the multiprocessing approach. You can use the same structure for working with threads or processes, depending on whether you import from multiprocessing.dummy or multiprocessing itself. We use a multiprocessing.Queue for communication instead of queue.Queue.
def do_the_fast_loop(work_queue):
while True:
next_job = fast_task()
if next_job:
work_queue.put(next_job)
else:
work_queue.put(None) # sentinel - tells slow process to quit
break
def do_the_slow_loop(work_queue):
next_call = time.time()
while True:
job = work_queue.get()
if job is None: # sentinel seen - no more work to do
break
time.sleep(max(0, next_call - time.time()))
next_call = time.time() + RATE_LIMIT
slow_task(job)
if __name__ == '__main__':
# from multiprocessing.dummy import Queue, Process # for threads
from multiprocessing import Queue, Process # for processes
work = Queue()
fast = Process(target=fast_task, args=(work,))
slow = Process(target=slow_task, args=(work,))
fast.start()
slow.start()
fast.join()
slow.join()
As you can see, there's quite a lot more machinery for you to implement, but it will be somewhat faster. Again, how much faster depends a lot on your tasks. I'd try all three approaches - synchronous, threaded, and multiprocess - and see which you like best.

You need to do 2 things:
Put the function require data from the other on its own process
Implement a way to communicate between the two processes (e.g. Queue)
All of this must be done thanks to the GIL.

multithreading check membership in Queue and stop the threads

I want to iterate over a list using 2 thread. One from leading and other from trailing, and put the elements in a Queue on each iteration. But before putting the value in Queue I need to check for existence of the value within Queue (its when that one of the threads has putted that value in Queue), So when this happens I need to stop the thread and return list of traversed values for each thread.
This is what I have tried so far :
from Queue import Queue
from threading import Thread, Event
class ThreadWithReturnValue(Thread):
def __init__(self, group=None, target=None, name=None,
args=(), kwargs={}, Verbose=None):
Thread.__init__(self, group, target, name, args, kwargs, Verbose)
self._return = None
def run(self):
if self._Thread__target is not None:
self._return = self._Thread__target(*self._Thread__args,
**self._Thread__kwargs)
def join(self):
Thread.join(self)
return self._return
main_path = Queue()
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
def a(main_path,g,l=[]):
for i in g:
l.append(i)
print 'a'
if is_in_queue(i,main_path):
return l
main_path.put(i)
def b(main_path,g,l=[]):
for i in g:
l.append(i)
print 'b'
if is_in_queue(i,main_path):
return l
main_path.put(i)
g=['a','b','c','d','e','f','g','h','i','j','k','l']
t1 = ThreadWithReturnValue(target=a, args=(main_path,g))
t2 = ThreadWithReturnValue(target=b, args=(main_path,g[::-1]))
t2.start()
t1.start()
# Wait for all produced items to be consumed
print main_path.join()
I used ThreadWithReturnValue that will create a custom thread that returns the value.
And for membership checking I used the following function :
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
Now if I first start the t1 and then the t2 I will get 12 a then one b then it doesn't do any thing and I need to terminate the python manually!
But if I first run the t2 then t1 I will get the following result:
b
b
b
b
ab
ab
b
b
b
b
a
a
So my questions is that why python treads different in this cases? and how can I terminate the threads and make them communicate with each other?

Before we get into bigger problems, you're not using Queue.join right.
The whole point of this function is that a producer who adds a bunch of items to a queue can wait until the consumer or consumers have finished working on all of those items. This works by having the consumer call task_done after they finish working on each item that they pulled off with get. Once there have been as many task_done calls as put calls, the queue is done. You're not doing a get anywhere, much less a task_done, so there's no way the queue can ever be finished. So, that's why you block forever after the two threads finish.
The first problem here is that your threads are doing almost no work outside of the actual synchronization. If the only thing they do is fight over a queue, only one of them is going to be able to run at a time.
Of course that's common in toy problems, but you have to think through your real problem:
If you're doing a lot of I/O work (listening on sockets, waiting for user input, etc.), threads work great.
If you're doing a lot of CPU work (calculating primes), threads don't work in Python because of the GIL, but processes do.
If you're actually primarily dealing with synchronizing separate tasks, neither one is going to work well (and processes will be worse). It may still be simpler to think in terms of threads, but it'll be the slowest way to do things. You may want to look into coroutines; Greg Ewing has a great demonstration of how to use yield from to use coroutines to build things like schedulers or many-actor simulations.
Next, as I alluded to in your previous question, making threads (or processes) work efficiently with shared state requires holding locks for as short a time as possible.
So, if you have to search a whole queue under a lock, that had better be a constant-time search, not a linear-time search. That's why I suggested using something like an OrderedSet recipe rather than a list, like the one inside the stdlib's Queue.Queue. Then this function:
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
… is only blocking the queue for a tiny fraction of a second—just long enough to look up a hash value in a table, instead of long enough to compare every element in the queue against x.
Finally, I tried to explain about race conditions on your other question, but let me try again.
You need a lock around every complete "transaction" in your code, not just around the individual operations.
For example, if you do this:
with queue locked:
see if x is in the queue
if x was not in the queue:
with queue locked:
add x to the queue
… then it's always possible that x was not in the queue when you checked, but in the time between when you unlocked it and relocked it, someone added it. This is exactly why it's possible for both threads to stop early.
To fix this, you need to put a lock around the whole thing:
with queue locked:
if x is not in the queue:
add x to the queue
Of course this goes directly against what I said before about locking the queue for as short a time as possible. Really, that's what makes multithreading hard in a nutshell. It's easy to write safe code that just locks everything for as long as might conceivably be necessary, but then your code ends up only using a single core, while all the other threads are blocked waiting for the lock. And it's easy to write fast code that just locks everything as briefly as possible, but then it's unsafe and you get garbage values or even crashes all over the place. Figuring out what needs to be a transaction, and how to minimize the work inside those transactions, and how to deal with the multiple locks you'll probably need to make that work without deadlocking them… that's not so easy.

A couple of things that I think can be improved:
Due to the GIL, you might want to use the multiprocessing (rather than threading) module. In general, CPython threading will not cause CPU intensive work to speed up. (Depending on what exactly is the context of your question, it's also possible that multiprocessing won't, but threading almost certainly won't.)
A function like your is_inqueue would likely lead to high contention.
The locked time seems linear in the number of items that need to be traversed:
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
So, instead, you could possibly do the following.
Use multiprocessing with a shared dict:
from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
# Fn definitions and such
p1 = Process(target=p1, args=(d,))
p2 = Process(target=p2, args=(d,))
within each function, check for the item like this:
def p1(d):
# Stuff
if 'foo' in d:
return

parallel writing to list in python

I got multiple parallel processes writing into one list in python. My code is:
global_list = []
class MyThread(threading.Thread):
...
def run(self):
results = self.calculate_results()
global_list.extend(results)
def total_results():
for param in params:
t = MyThread(param)
t.start()
while threading.active_count() > 1:
pass
return total_results
I don't like this aproach as it has:
An overall global variable -> What would be the way to have a local variable for the `total_results function?
The way I check when the list is returned seems somewhat clumsy, what would be the standard way?

Is your computation CPU-intensive? If so you should look at the multiprocessing module which is included with Python and offers a fairly easy to use Pool class into which you can feed compute tasks and later get all the results. If you need a lot of CPU time this will be faster anyway, because Python doesn't do threading all that well: only a single interpreter thread can run at a time in one process. Multiprocessing sidesteps that (and offers the Pool abstraction which makes your job easier). Oh, and if you really want to stick with threads, multiprocessing has a ThreadPool too.

1 - Use a class variable shared between all Worker's instances to append your results
from threading import Thread
class Worker(Thread):
results = []
...
def run(self):
results = self.calculate_results()
Worker.results.extend(results) # extending a list is thread safe
2 - Use join() to wait untill all the threads are done and let them have some computational time
def total_results(params):
# create all workers
workers = [Worker(p) for p in params]
# start all workers
[w.start() for w in workers]
# wait for all of them to finish
[w.join() for w in workers]
#get the result
return Worker.results

What kind of problems (if any) would there be combining asyncio with multiprocessing?

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently setup a 2MB cap on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?

You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor
def blocking_func(x):
time.sleep(x) # Pretend this is expensive calculations
return x * 5
#asyncio.coroutine
def main():
#pool = multiprocessing.Pool()
#out = pool.apply(blocking_func, args=(10,)) # This blocks the event loop.
executor = ProcessPoolExecutor()
out = yield from loop.run_in_executor(executor, blocking_func, 10) # This does not
print(out)
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For the majority of cases, this is function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it), that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing
def func(queue, event, lock, items):
with lock:
event.set()
for item in items:
time.sleep(3)
queue.put(item+5)
queue.close()
#asyncio.coroutine
def example(queue, event, lock):
l = [1,2,3,4,5]
p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
p.start()
while True:
result = yield from queue.coro_get()
if result is None:
break
print("Got result {}".format(result))
yield from p.coro_join()
#asyncio.coroutine
def example2(queue, event, lock):
yield from event.coro_wait()
with (yield from lock):
yield from queue.coro_put(78)
yield from queue.coro_put(None) # Shut down the worker
if __name__ == "__main__":
loop = asyncio.get_event_loop()
queue = aioprocessing.AioQueue()
lock = aioprocessing.AioLock()
event = aioprocessing.AioEvent()
tasks = [
asyncio.async(example(queue, event, lock)),
asyncio.async(example2(queue, event, lock)),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest to examine different approaches. For instance you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use less resources than forking a process for each connection. Spamassassin and Apache (mpm prefork) can use this worker model for instance. It might end up easier and more robust depending on your use case. Specifically you can make your workers die after serving a configured number of requests and be respawned by a master process thereby eliminating much of the negative effects of memory leaks.

Based on #dano's answer above I wrote this function to replace places where I used to use multiprocess pool + map.
def asyncio_friendly_multiproc_map(fn: Callable, l: list):
"""
This is designed to replace the use of this pattern:
with multiprocessing.Pool(5) as p:
results = p.map(analyze_day, list_of_days)
By letting caller drop in replace:
asyncio_friendly_multiproc_map(analyze_day, list_of_days)
"""
tasks = []
with ProcessPoolExecutor(5) as executor:
for e in l:
tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
return res

See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This documents clearly the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, I suggest you also have a look there.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.