Transform callbacks to generator in Python?

Let's say we have some library (e.g. for XML parsing) that accepts a callback and calls it every time it encounters some event (e.g. finding an XML tag). I'd like to be able to transform those callbacks into a generator that can be iterated with a for loop. Is that possible in Python without using threads or collecting all the callback results first (i.e. with lazy evaluation)?
Example:
# this is how I can produce the items
def callback(item):
    # do something with each item
    ...

parser.parse(xml_file, callback=callback)

# this is how the items should be consumed
for item in iter_parse(xml_file):
    print(item)
I've tried to study whether coroutines could be used, but it seems that coroutines are useful for pushing data from the producer, while generators pull data to the consumer.
The natural idea was that the producer and consumer would be coroutines that pass the execution flow back and forth.
I've managed to get a producer-consumer pattern working with the asyncio loop (in a similar way to this answer). However, it cannot be used like a generator in a for loop:
import asyncio

q = asyncio.Queue(maxsize=1)

@asyncio.coroutine
def produce(data):
    for v in data:
        print("Producing:", v)
        yield from q.put(v)
    print("Producer waiting")
    yield from q.put(None)
    print("Producer done")

@asyncio.coroutine
def consume():
    while True:
        print("Consumer waiting")
        value = yield from q.get()
        print("Consumed:", value)
        if value is not None:
            # process the value
            yield from asyncio.sleep(0.5)
        else:
            break
    print("Consumer done")

tasks = [
    asyncio.Task(consume()),
    asyncio.Task(produce(data=range(5)))
]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
The problem is that the result cannot be iterated in a for loop since it is managed by the loop.
When I rewrite the code so that the callback is called from an ordinary function, the problem is that asyncio.Queue.put() called from the callback doesn't block and the computation is not lazy.
import asyncio

q = asyncio.Queue(maxsize=1)

def parse(data, callback):
    for value in data:
        # yield from q.put(value)
        callback(value)

@asyncio.coroutine
def produce(data):
    @asyncio.coroutine
    def enqueue(value):
        print('enqueue()', value)
        yield from q.put(value)
    def callback(value):
        print('callback()', value)
        asyncio.async(enqueue(value))
    parse(data, callback)
    print('produce()')
    print('produce(): enqueuing sentinel value')
    asyncio.async(enqueue(None))
    print('produce(): done')

@asyncio.coroutine
def consume():
    print('consume()')
    while True:
        print('consume(): waiting')
        value = yield from q.get()
        print('consumed:', value)
        if value is not None:
            # here we'd like to yield and use this in a for loop elsewhere
            print(value)
        else:
            break
    print('consume(): done')

tasks = [
    asyncio.Task(consume()),
    asyncio.Task(produce(range(5)))
]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

# I'd like:
# for value in iter_parse(data=range(5)):
#     print('consumed:', value)
Is this kind of computation even possible with asyncio, or do I need to use greenlet or gevent? It seems that in gevent it is possible to iterate over async results in a for loop, but I don't want to depend on another library if possible, and it is not completely ready for Python 3.
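For reference, here is a minimal sketch of how the gevent variant might look; this is only an assumption of the approach, reusing the hypothetical parser.parse API from the question, with a maxsize=1 queue so the producer greenlet stays at most one item ahead:

import gevent
from gevent.queue import Queue

def iter_parse(xml_file):
    q = Queue(maxsize=1)   # blocking put() keeps the producer (almost) lazy
    sentinel = object()    # marks the end of the stream

    def callback(item):
        q.put(item)        # blocks the producer greenlet until the consumer catches up

    def producer():
        parser.parse(xml_file, callback=callback)  # hypothetical parser from the question
        q.put(sentinel)

    gevent.spawn(producer)
    while True:
        item = q.get()     # switches to the producer greenlet while waiting
        if item is sentinel:
            return
        yield item

for item in iter_parse(xml_file):
    print(item)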

Related

Python: how to check whether a shared variable, edited in another scheduled thread, has changed, without polling in a while loop

My API receives users' texts within a 900 ms window and sends them to the model, which computes their length (just a simple demo). I have already implemented this, but the approach is ugly: I open a new background scheduler thread, and the API receives each query in the main thread and puts it on a queue shared between the main thread and the new thread. The background thread periodically takes all texts from the queue and sends them to the model; after the model has processed them, the results are stored in a shared dict. In the main thread, the get_response method uses a while loop to check for the result in the shared dict. My question is how to get rid of the while loop in get_response; I'd like a more elegant method. Thanks!
This is the server code; I need to remove the while/sleep polling in get_response because it's ugly:
import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body, APIRouter
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime

logger = logging.getLogger(__name__)
app = APIRouter()

def feed_data_into_model(queue, shared_dict, lock):
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:", result)
        for index, task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id, value, action="put", lock=lock, shared_dict=shared_dict)

class TestThreading(object):
    def __init__(self, interval, queue, shared_dict, lock):
        self.interval = interval
        thread = threading.Thread(target=self.run, args=(queue, shared_dict, lock))
        thread.daemon = True
        thread.start()

    def run(self, queue, shared_dict, lock):
        while True:
            # More statements comes here
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue, shared_dict, lock)
            time.sleep(self.interval)

if __name__ != "__main__":
    # since uvicorn will init and reload the file, and __name__ will change, not as __main__, so I init variable here
    # otherwise, we will have 2 background threads (one is empty); it doesn't run but is hard to debug due to the confusion
    global queue, shared_dict, lock
    queue = Queue(maxsize=64)
    shared_dict = {}  # model result saved here!
    lock = threading.Lock()
    tr = TestThreading(0.9, queue, shared_dict, lock)

def handle_dict(key, value=None, action="put", lock=None, shared_dict=None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x: Union[str, List[str]]):
    time.sleep(3)
    if isinstance(x, str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist", lock=lock, shared_dict=shared_dict) is False
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action="get", lock=lock, shared_dict=shared_dict)
    handle_dict(task_id, None, action="delete", lock=lock, shared_dict=shared_dict)
    return value

@app.get("/{text}")
async def demo(text: str):
    global queue, shared_dict, lock
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item = [task_id, text, state, ""]
    queue.put(item)
    # TODO: await query_from_answer_dict, need to change since it's ugly to while wait the answer
    value = await get_response(task_id, lock, shared_dict)
    return 1

if __name__ == "__main__":
    # what I want to do:
    # single process run every 900ms, if queue is not empty then pop them out to model
    # and model will save result in thread-safe dict, key is task-id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
client code:
for n in {1..5}; do curl http://localhost:5555/a & done
The usual way to run a blocking task in asyncio code is to use asyncio's builtin run_in_executor to handle it for you. You can either set up an executor yourself, or let asyncio use its default one:
import asyncio
from time import sleep

def proc(t):
    print("in thread")
    sleep(t)
    return f"Slept for {t} seconds"

async def submit_task(t):
    print("submitting:", t)
    res = await loop.run_in_executor(None, proc, t)
    print("got:", res)

async def other_task():
    for _ in range(4):
        print("poll!")
        await asyncio.sleep(1)

loop = asyncio.new_event_loop()
loop.create_task(other_task())
loop.run_until_complete(submit_task(3))
Note that if loop is not defined globally, you can get it inside the function with asyncio.get_event_loop(). I've deliberately used a simple example without fastapi/uvicorn to illustrate the point, but the idea is the same: fastapi (etc.) just runs in the event loop, which is why you define coroutines for the endpoints.
The advantage of this is that we can simply await the response directly, without messing about with awaiting an event and then using some other means (shared dict with mutex, pipe, queue, whatever) to get the result out, which keeps the code clean and readable, and is likely also a good deal quicker. If, for some reason, we want to make sure it runs in processes and not threads we can make our own executor:
from concurrent.futures import ProcessPoolExecutor
e = ProcessPoolExecutor()
...
res = await loop.run_in_executor(e, proc, t)
See the docs for more information.
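For illustration, here is a rough sketch (my own, not part of the original answer) of how the question's endpoint could await the blocking model call directly; it reuses the model_work function from the question and, for simplicity, drops the 900 ms batching and the shared dict:

import asyncio
from fastapi import APIRouter

app = APIRouter()

@app.get("/{text}")
async def demo(text: str):
    loop = asyncio.get_event_loop()
    # run the blocking model call in the default thread pool and await its result
    # directly -- no shared dict, no polling loop
    result = await loop.run_in_executor(None, model_work, text)
    return {"result": result}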
Another option would be to use a multiprocessing.Pool to run the task and then apply_async. But you can't await multiprocessing futures directly. There is a library, aiomultiprocessing, that makes the two play together, but I have no experience with it and cannot see a reason to prefer it over the builtin executor for this case (running a single background task per invocation of the coro).
Lastly, do note that the main reason to avoid a polling while loop is not that it's ugly (although it is), but that it's nowhere near as performant as almost any other solution.
I think I have already got the answer: use an asyncio.Event to communicate across threads, with set(), clear(), wait() and asyncio.get_event_loop().
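A minimal sketch of how that might be wired up (my own assumption of the details, since the comment above shows no code): asyncio.Event is not thread-safe, so the worker thread signals it through loop.call_soon_threadsafe:

import asyncio
import threading
import time

results = {}  # shared dict: task_id -> model result

def worker(loop, event, task_id):
    time.sleep(1)            # simulate the model computing in a background thread
    results[task_id] = 42
    # asyncio.Event is not thread-safe, so set it from the loop's thread
    loop.call_soon_threadsafe(event.set)

async def get_response(task_id):
    loop = asyncio.get_event_loop()
    event = asyncio.Event()
    threading.Thread(target=worker, args=(loop, event, task_id), daemon=True).start()
    await event.wait()       # suspends without any polling loop
    return results.pop(task_id)

print(asyncio.run(get_response("abc")))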

Multiprocess Queue synchronization with asyncio

I want to gather data from asyncio loops running in sibling processes with Python 3.7
Ideally I would use a multiprocess.JoinableQueue, relying on its join() call for synchronization.
However, its synchronization primitives block the event loop in full (see my partial answer below for an example).
Illustrative prototype:
class MP_GatherDict(dict):
'''A per-process dictionary which can be gathered from a single one'''
def __init__(self):
self.q = multiprocess.JoinableQueue()
super().__init__()
async def worker_process_server(self):
while True:
(await?) self.q.put(dict(self)) # Put a shallow copy
(await?) self.q.join() # Wait for it to be gathered
async def gather(self):
all_dicts = []
while not self.q.empty():
all_dicts.append(await self.q.get())
self.q.task_done()
return all_dicts
Note that the put -> get -> join -> put flow might not work as expected, but this question really is about using multiprocess primitives in an asyncio event loop...
The question would then be: how best to await multiprocess primitives from an asyncio event loop?
This test shows that multiprocess.Queue.get() blocks the whole event loop:
mp_q = mp.JoinableQueue()

async def mp_queue_wait():
    try:
        print('Queue:', mp_q.get(timeout=2))
    except Exception as ex:
        print('Queue:', repr(ex))

async def main_loop_task():
    task = asyncio.get_running_loop().create_task(mp_queue_wait())
    for i in range(3):
        print(i, os.times())
        await asyncio.sleep(1)
    await task
    print(repr(task))

asyncio.run(main_loop_task())
Whose output is:
0 posix.times_result(user=0.41, system=0.04, children_user=0.0, children_system=0.0, elapsed=17208620.18)
Queue: Empty()
1 posix.times_result(user=0.41, system=0.04, children_user=0.0, children_system=0.0, elapsed=17208622.18)
2 posix.times_result(user=0.41, system=0.04, children_user=0.0, children_system=0.0, elapsed=17208623.18)
<Task finished coro=<mp_queue_wait() done,...> result=None>
So I am looking at asyncio.loop.run_in_executor() as the next possible answer; however, spawning an executor/thread just for this seems like overkill...
Here is the same test using the default executor:
async def mp_queue_wait():
    try:
        result = await asyncio.get_running_loop().run_in_executor(None, mp_q.get, True, 2)
    except Exception as ex:
        result = ex
    print('Queue:', repr(result))
    return result
And the (desired) result:
0 posix.times_result(user=0.36, system=0.02, children_user=0.0, children_system=0.0, elapsed=17210674.65)
1 posix.times_result(user=0.37, system=0.02, children_user=0.0, children_system=0.0, elapsed=17210675.65)
Queue: Empty()
2 posix.times_result(user=0.37, system=0.02, children_user=0.0, children_system=0.0, elapsed=17210676.66)
<Task finished coro=<mp_queue_wait() done, defined at /home/apozuelo/Documents/5G_SBA/Tera5G/services/db.py:211> result=Empty()>
This comes a bit late, but:
You need to create an async wrapper around the mp.JoinableQueue(), since both get() and put() block the whole process (GIL).
There are two approaches for this:
Use threads
Use asyncio.sleep() with the get_nowait() and put_nowait() methods
I chose option 2 since it is easy.
from queue import Queue, Full, Empty
from typing import Any, Generic, TypeVar
from asyncio import sleep

T = TypeVar('T')

class AsyncQueue(Generic[T]):
    """Async wrapper for queue.Queue"""
    SLEEP: float = 0.01

    def __init__(self, queue: Queue[T]):
        self._Q: Queue[T] = queue

    async def get(self) -> T:
        while True:
            try:
                return self._Q.get_nowait()
            except Empty:
                await sleep(self.SLEEP)

    async def put(self, item: T) -> None:
        while True:
            try:
                self._Q.put_nowait(item)
                return None
            except Full:
                await sleep(self.SLEEP)

    def task_done(self) -> None:
        self._Q.task_done()
        return None
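A short usage sketch (my addition, not from the answer): wrapping a multiprocessing JoinableQueue, whose get_nowait()/put_nowait()/task_done() methods duck-type to what the wrapper expects, and draining it from a coroutine:

import asyncio
import multiprocessing as mp

async def drain(aq, n):
    for _ in range(n):
        item = await aq.get()   # polls with get_nowait()/asyncio.sleep(), never blocks the loop
        print("got:", item)
        aq.task_done()

mp_q = mp.JoinableQueue()
for i in range(3):
    mp_q.put(i)

aq = AsyncQueue(mp_q)
asyncio.run(drain(aq, 3))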

How to communicate between traditional thread and asyncio thread in Python?

In Python, what's the idiomatic way to establish one-way communication between two threading.Threads, call them thread a and thread b?
a is the producer: it continuously generates values for b to consume.
b is the consumer: it reads one value generated by a, processes it with a coroutine, then reads the next value, and so on.
Illustration:
q = very_magic_queue.Queue()

def worker_of_a(q):
    while True:
        q.put(1)
        time.sleep(1)

a = threading.Thread(target=worker_of_a, args=(q,))
a.start()

async def loop(q):
    while True:
        # v must be processed in the same order as they are produced
        v = await q.get()
        print(v)

async def foo():
    pass

async def b_main(q):
    loop_fut = asyncio.ensure_future(loop(q))
    foo_fut = asyncio.ensure_future(foo())
    _ = await asyncio.wait([loop_fut, foo_fut], ...)
    # blah blah blah

def worker_of_b(q):
    asyncio.set_event_loop(asyncio.new_event_loop())
    asyncio.get_event_loop().run_until_complete(b_main(q))

b = threading.Thread(target=worker_of_b, args=(q,))
b.start()
Of course the above code doesn't work, because queue.Queue.get cannot be awaited, and asyncio.Queue cannot be used from another thread.
I also need a communication channel from b back to a.
It would be great if the solution also worked with gevent.
Thanks :)
I had a similar problem: communicating data between a thread and asyncio. The solution I used is to create a sync Queue and add methods for async get and async put, using asyncio.sleep to make them non-blocking.
Here is my queue class:
import asyncio
import queue

# class to provide a queue (sync or async morph)
class queMorph(queue.Queue):
    def __init__(self, qSize, qNM):
        super().__init__(qSize)
        self.timeout = 0.018
        self.me = f'queMorph-{qNM}'

    # Introduce methods for the async awaitable morph of the queue
    async def aget(self):
        while True:
            try:
                return self.get_nowait()
            except queue.Empty:
                await asyncio.sleep(self.timeout)
            except Exception as E:
                raise

    async def aput(self, data):
        while True:
            try:
                return self.put_nowait(data)
            except queue.Full:
                print(f'{self.me} Queue full on put..')
                await asyncio.sleep(self.timeout)
            except Exception as E:
                raise
To put/get items on the queue from the thread (synchronous) side, use the normal blocking q.get() and q.put() functions.
In the async loop, use q.aget() and q.aput(), which do not block.
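A short usage sketch (my addition, assuming the queMorph class above): a producer thread uses the blocking put() while a coroutine consumes with aget():

import asyncio
import threading

q = queMorph(qSize=8, qNM='demo')

def producer():
    for i in range(5):
        q.put(i)                 # normal blocking put from the thread side

threading.Thread(target=producer, daemon=True).start()

async def consumer():
    for _ in range(5):
        item = await q.aget()    # does not block the event loop
        print('consumed:', item)

asyncio.run(consumer())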
You can use a synchronized queue from the queue module and defer the wait to a ThreadPoolExecutor:
async def loop(q):
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=1) as executor:
        loop = asyncio.get_event_loop()
        while True:
            # v must be processed in the same order as they are produced
            v = await loop.run_in_executor(executor, q.get)
            print(v)
I've used Janus to solve this problem - it's a Python library that gives you a thread-safe queue that can be used to communicate between asyncio and a thread.
def threaded(sync_q):
    for i in range(100):
        sync_q.put(i)
    sync_q.join()

async def async_code(async_q):
    for i in range(100):
        val = await async_q.get()
        assert val == i
        async_q.task_done()

queue = janus.Queue()
fut = loop.run_in_executor(None, threaded, queue.sync_q)
await async_code(queue.async_q)

Problems with asyncio in 3.4.2 - it just terminates for some reason

As a newbie at Python, after spending many hours reading docs and other code I cannot seem to get to grips with the new asyncio module in Python 3.
It keeps terminating without a stack trace to give me a clue; it should run forever but does not.
The fundamental process concept I am trying to emulate is the following:
read from port:
open port -> read data (variable length) -> place on queue1
then process data:
get data from queue1 -> condition applies -> outcome put on queue2
then write to port:
get data from queue2 and write to port
loop around from top forever
Note: the data on the in port is sporadic and of variable length, and several blocks may arrive out of 'sequence', which is why I use asyncio. I understand asyncio will allow for the case where one block arrives, then another, before my app has responded, i.e. the call to get_io_from_port() facilitates multiple executions of the coroutine. This is why I have used the queues: to ensure that process_queue() is not blocked.
My toy example code so far:
import queue
import asyncio

@asyncio.coroutine
def process_queue(q1, q2):
    tmp = q1.Get()
    if tmp == 'ABCDEF':
        q2.put('12345')
    elif tmp == 'GHIJKL':
        q2.put =('67890')
    else:
        print('There is a data error')

@asyncio.coroutine
def put_io_to_port(writer, q2):
    if not q2.empty():
        try:
            writer.write(q2.get())
        except IOError as e:
            print('OUT Port issue: ', e)

@asyncio.coroutine
def get_io_from_port(reader, q1):
    try:
        data_i = yield from reader.read(1200)
        q1.put(data_i)
    except IOError as e:
        print('IN Port issue: ', e)

def main():
    q1 = queue()
    q2 = queue()
    loop = asyncio.get_event_loop()  # main loop declaration
    reader, writer = yield from asyncio.open_connection('192.168.1.103', 5555)
    # high-level call open streams - read and write
    print('Start')
    tasks = [
        asyncio.async(get_io_from_port(reader, q1)),
        asyncio.async(process_queue(q1, q2)),
        asyncio.async(put_io_to_port(writer, q2)), ]  # do these tasks - in this order
    loop.run_forever(tasks)  # loop through on main loop forever
    loop.close()

if __name__ == '__main__':
    main()
Also, as an aside: how does one debug this code, i.e. with tracing? What techniques could be suggested? I am using Eclipse and PyDev, but to no avail.
You've made several mistakes here. First, you're treating main like it's a normal function, but you've placed a yield from call in there, which automatically turns it into a generator. That means that when you do
if __name__ == "__main__":
    main()
main is not actually executed; the call to main() just creates a generator object that's immediately thrown away (because you're not assigning it to a variable). This is why you're having a hard time debugging - none of the code inside main is even executing. You should convert main to be a coroutine and call it using loop.run_until_complete instead.
Next, you're trying to use the queue module, which is not designed for use in a single-threaded asynchronous program. As soon as you call queue.get(), it's going to block your main thread, which means your asyncio event loop will be blocked, which means your whole program will be deadlocked. You should use the coroutine-safe asyncio.Queue instead.
You also have a race condition in put_io_to_port. You're only trying to consume from q2 if it isn't empty, but it's possible that put_io_to_port could execute before process_queue has had a chance to run and populate the queue. It looks like you would be fine if you just removed the if not q2.empty() check from put_io_to_port altogether.
Finally, you're adding your coroutines to the event loop using asyncio.async, which is fine. But you have a comment that says # do these tasks, in this order, and that's not how the program will behave with asyncio.async. It just adds all the coroutines to the event loop, and they'll all run in parallel. If you really want them to run sequentially, you should just do:
yield from get_io_from_port(reader,q1)
yield from process_queue(q1, q2)
yield from put_io_to_port(writer, q2)
But that's really not necessary here. You can run all of them at the same time and get the correct behavior; if one coroutine executes ahead of the other, it will just wait until the coroutine it depends on passes it the data it needs, and then resume execution.
You also have a few typos in there (q1.Get(), q2.put =(...), etc).
So, put all those fixes together and you get this:
import queue
import asyncio

@asyncio.coroutine
def process_queue(q1, q2):
    while True:
        tmp = yield from q1.get()
        if tmp == 'ABCDEF':
            yield from q2.put('12345')
        elif tmp == 'GHIJKL':
            yield from q2.put('67890')
        else:
            print('There is a data error')

@asyncio.coroutine
def put_io_to_port(writer, q2):
    while True:
        try:
            data = yield from q2.get()
            writer.write(data)
        except IOError as e:
            print('OUT Port issue: ', e)

@asyncio.coroutine
def get_io_from_port(reader, q1):
    while True:
        try:
            data_i = yield from reader.read(1200)
            yield from q1.put(data_i)
        except IOError as e:
            print('IN Port issue: ', e)

@asyncio.coroutine
def main():
    q1 = asyncio.Queue()
    q2 = asyncio.Queue()
    reader, writer = yield from asyncio.open_connection('192.168.1.103', 5555)
    # high-level call open streams - read and write
    print('Start')
    tasks = [
        asyncio.async(get_io_from_port(reader, q1)),
        asyncio.async(process_queue(q1, q2)),
        asyncio.async(put_io_to_port(writer, q2)), ]

if __name__ == '__main__':
    loop = asyncio.get_event_loop()  # main loop declaration
    loop.run_until_complete(main())
import queue
import asyncio

@asyncio.coroutine
def process_queue(q1, q2):
    while True:
        tmp = yield from q1.get()
        if tmp == 'ABCDEF':
            yield from q2.put('12345')
        elif tmp == 'GHIJKL':
            yield from q2.put('67890')
        else:
            print('There is a data error')

@asyncio.coroutine
def put_io_to_port(writer, q2):
    while True:
        try:
            data = yield from q2.get()
            writer.write(data)
        except IOError as e:
            print('OUT Port issue: ', e)

@asyncio.coroutine
def get_io_from_port(reader, q1):
    while True:
        try:
            data_i = yield from reader.read(1200)
            yield from q1.put(data_i)
        except IOError as e:
            print('IN Port issue: ', e)

@asyncio.coroutine
def main():
    q1 = asyncio.Queue()
    q2 = asyncio.Queue()
    reader, writer = yield from asyncio.open_connection('192.168.1.103', 5555)
    # high-level call open streams - read and write
    print('Start')
    asyncio.async(get_io_from_port(reader, q1))  # changed items so not
    asyncio.async(process_queue(q1, q2))         # in task list otherwise
    asyncio.async(put_io_to_port(writer, q2))    # they are not visible

if __name__ == '__main__':
    loop = asyncio.get_event_loop()  # main loop declaration
    loop.run_until_complete(main())
Find comments inline with code to understand the problem.

Turn functions with a callback into Python generators?

The SciPy minimization function (just to use as an example) has the option of adding a callback function that is called at each step. So I can do something like,
def my_callback(x):
    print x

scipy.optimize.fmin(func, x0, callback=my_callback)
Is there a way to use the callback function to create a generator version of fmin, so that I could do,
for x in my_fmin(func, x0):
    print x
It seems like it might be possible with some combination of yields and sends, but I can't quite think of anything.
As pointed out in the comments, you could do it in a new thread, using Queue. The drawback is that you'd still need some way to access the final result (what fmin returns at the end). My example below uses an optional callback to do something with it (another option would be to just yield it as well, though your calling code would have to differentiate between iteration results and final results):
from thread import start_new_thread
from Queue import Queue

def my_fmin(func, x0, end_callback=(lambda x: x), timeout=None):
    q = Queue()            # fmin produces, the generator consumes
    job_done = object()    # signals the processing is done

    # Producer
    def my_callback(x):
        q.put(x)

    def task():
        ret = scipy.optimize.fmin(func, x0, callback=my_callback)
        q.put(job_done)
        end_callback(ret)  # "Returns" the result of the main call

    # Starts fmin in a new thread
    start_new_thread(task, ())

    # Consumer
    while True:
        next_item = q.get(True, timeout)  # Blocks until an input is available
        if next_item is job_done:
            break
        yield next_item
Update: to block the execution of the next iteration until the consumer has finished processing the last one, it's also necessary to use task_done and join.
# Producer
def my_callback(x):
    q.put(x)
    q.join()  # Blocks until task_done is called

# Consumer
while True:
    next_item = q.get(True, timeout)  # Blocks until an input is available
    if next_item is job_done:
        break
    yield next_item
    q.task_done()  # Unblocks the producer, so a new iteration can start
Note that maxsize=1 is not necessary, since no new item will be added to the queue until the last one is consumed.
Update 2: Also note that, unless all items are eventually retrieved by this generator, the created thread will deadlock (it will block forever and its resources will never be released). The producer is waiting on the queue, and since it stores a reference to that queue, it will never be reclaimed by the gc even if the consumer is. The queue will then become unreachable, so nobody will be able to release the lock.
A clean solution for that is unknown, if possible at all (since it would depend on the particular function used in the place of fmin). A workaround could be made using timeout, having the producer raise an exception if put blocks for too long:
q = Queue(maxsize=1)

# Producer
def my_callback(x):
    q.put(x)
    q.put("dummy", True, timeout)  # Blocks until the first result is retrieved
    q.join()                       # Blocks again until task_done is called

# Consumer
while True:
    next_item = q.get(True, timeout)  # Blocks until an input is available
    q.task_done()                     # (one "task_done" per "get")
    if next_item is job_done:
        break
    yield next_item
    q.get()        # Retrieves the "dummy" object (must be after yield)
    q.task_done()  # Unblocks the producer, so a new iteration can start
Generator as coroutine (no threading)
Let's have a FakeFtp class with a retrbinary function that uses a callback, called with each successfully read chunk of data:
class FakeFtp(object):
    def __init__(self):
        self.data = iter(["aaa", "bbb", "ccc", "ddd"])

    def login(self, user, password):
        self.user = user
        self.password = password

    def retrbinary(self, cmd, cb):
        for chunk in self.data:
            cb(chunk)
Using a simple callback function has the disadvantage that it is called repeatedly and the callback function cannot easily keep context between calls.
The following code defines a process_chunks generator, which is able to receive chunks of data one by one and process them. In contrast to a simple callback, here we are able to keep all the processing within one function without losing context.
from contextlib import closing
from itertools import count

def main():
    processed = []

    def process_chunks():
        for i in count():
            try:
                # (repeatedly) get the chunk to process
                chunk = yield
            except GeneratorExit:
                # finish_up
                print("Finishing up.")
                return
            else:
                # Here process the chunk as you like
                print("inside coroutine, processing chunk:", i, chunk)
                product = "processed({i}): {chunk}".format(i=i, chunk=chunk)
                processed.append(product)

    with closing(process_chunks()) as coroutine:
        # Get the coroutine to the first yield
        coroutine.next()
        ftp = FakeFtp()
        # next line repeatedly calls `coroutine.send(data)`
        ftp.retrbinary("RETR binary", cb=coroutine.send)
        # each callback "jumps" to `yield` line in `process_chunks`

    print("processed result", processed)
    print("DONE")
To see the code in action, put the FakeFtp class, the code shown above and following line:
main()
into one file and call it:
$ python headsandtails.py
('inside coroutine, processing chunk:', 0, 'aaa')
('inside coroutine, processing chunk:', 1, 'bbb')
('inside coroutine, processing chunk:', 2, 'ccc')
('inside coroutine, processing chunk:', 3, 'ddd')
Finishing up.
('processed result', ['processed(0): aaa', 'processed(1): bbb', 'processed(2): ccc', 'processed(3): ddd'])
DONE
How it works
processed = [] is here just to show that the generator process_chunks has no problem cooperating with its external context. Everything is wrapped into def main(): to prove that there is no need to use global variables.
def process_chunks() is the core of the solution. It might have one-shot input parameters (not used here), but the main point where it receives input is each yield line, which returns whatever anyone sends via .send(data) into an instance of this generator. One could call coroutine.send(chunk) directly, but in this example it is done via a callback referring to coroutine.send.
Note that in a real solution there is no problem having multiple yields in the code; they are processed one by one. This could be used, e.g., to read (and ignore) the header of a CSV file and then continue processing the data records (see the sketch below).
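For illustration, a minimal sketch of that multiple-yield idea (my own example, in Python 3 syntax, not part of the original answer): the first yield consumes the header line, the remaining yields process the data rows:

def process_csv():
    header = yield            # the first chunk sent in is treated as the header
    print("skipping header:", header)
    while True:
        try:
            row = yield       # every further chunk is a data row
        except GeneratorExit:
            print("done")
            return
        print("row:", row)

coroutine = process_csv()
next(coroutine)               # advance to the first yield
for line in ["name,age", "alice,30", "bob,25"]:
    coroutine.send(line)      # the callback would do this in the FTP example
coroutine.close()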
We could instantiate and use the generator as follows:
coroutine = process_chunks()
# Get the coroutine to the first yield
coroutine.next()
ftp = FakeFtp()
# next line repeatedly calls `coroutine.send(data)`
ftp.retrbinary("RETR binary", cb=coroutine.send)
# each callback "jumps" to `yield` line in `process_chunks`
# close the coroutine (will throw the `GeneratorExit` exception into the
# `process_chunks` coroutine).
coroutine.close()
The real code uses the contextlib closing context manager to ensure that coroutine.close() is always called.
Conclusions
This solution does not provide the sort of iterator you consume data from in the traditional style "from the outside". On the other hand, we are able to:
use the generator "from inside"
keep all iterative processing within one function without being interrupted between callbacks
optionally use external context
provide usable results to outside
all this can be done without using threading
Credits: The solution is heavily inspired by SO answer Python FTP “chunk” iterator (without loading entire file into memory)
written by user2357112
Concept: use a blocking queue with maxsize=1 and a producer/consumer model.
The callback produces; the next call to the callback will then block on the full queue.
The consumer then yields the value from the queue, tries to get another value, and blocks on the read.
The producer is then allowed to push to the queue; rinse and repeat.
Usage:
def dummy(func, arg, callback=None):
    for i in range(100):
        callback(func(arg + i))

# Dummy example:
for i in Iteratorize(dummy, lambda x: x + 1, 0):
    print(i)

# example with scipy:
for i in Iteratorize(scipy.optimize.fmin, func, x0):
    print(i)
Can be used as expected for an iterator:
for i in take(5, Iteratorize(dummy, lambda x: x + 1, 0)):
    print(i)
Iteratorize class:
from thread import start_new_thread
from Queue import Queue

class Iteratorize:
    """
    Transforms a function that takes a callback
    into a lazy iterator (generator).
    """
    def __init__(self, func, ifunc, arg, callback=None):
        self.mfunc = func
        self.ifunc = ifunc
        self.c_callback = callback
        self.q = Queue(maxsize=1)
        self.stored_arg = arg
        self.sentinel = object()

        def _callback(val):
            self.q.put(val)

        def gentask():
            ret = self.mfunc(self.ifunc, self.stored_arg, callback=_callback)
            self.q.put(self.sentinel)
            if self.c_callback:
                self.c_callback(ret)

        start_new_thread(gentask, ())

    def __iter__(self):
        return self

    def next(self):
        obj = self.q.get(True, None)
        if obj is self.sentinel:
            raise StopIteration
        else:
            return obj
Can probably do with some cleaning up to accept *args and **kwargs for the function being wrapped and/or the final result callback.
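A sketch of what that cleanup might look like (my own Python 3 variant, assuming the wrapped function takes its callback as a keyword argument named callback):

import threading
from queue import Queue

class Iteratorize:
    """Wrap a callback-taking function into a lazy iterator, forwarding *args/**kwargs."""
    def __init__(self, func, *args, **kwargs):
        self.q = Queue(maxsize=1)
        self.sentinel = object()

        def _callback(val):
            self.q.put(val)

        def gentask():
            func(*args, callback=_callback, **kwargs)
            self.q.put(self.sentinel)

        threading.Thread(target=gentask, daemon=True).start()

    def __iter__(self):
        return self

    def __next__(self):
        obj = self.q.get()
        if obj is self.sentinel:
            raise StopIteration
        return obj

Usage stays the same as above, e.g. for i in Iteratorize(dummy, lambda x: x + 1, 0): print(i).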
How about
data = []
scipy.optimize.fmin(func, x0, callback=data.append)
for line in data:
    print line
If not, what exactly do you want to do with the generator's data?
A variant of Frits' answer, that:
Supports send to choose a return value for the callback
Supports throw to choose an exception for the callback
Supports close to gracefully shut down
Does not compute a queue item until it is requested
The complete code with tests can be found on github
import queue
import threading
import collections.abc

class generator_from_callback(collections.abc.Generator):
    def __init__(self, expr):
        """
        expr: a function that takes a callback
        """
        self._expr = expr
        self._done = False
        self._ready_queue = queue.Queue(1)
        self._done_queue = queue.Queue(1)
        self._done_holder = [False]

        # local to avoid reference cycles
        ready_queue = self._ready_queue
        done_queue = self._done_queue
        done_holder = self._done_holder

        def callback(value):
            done_queue.put((False, value))
            cmd, *args = ready_queue.get()
            if cmd == 'close':
                raise GeneratorExit
            elif cmd == 'send':
                return args[0]
            elif cmd == 'throw':
                raise args[0]

        def thread_func():
            try:
                cmd, *args = ready_queue.get()
                if cmd == 'close':
                    raise GeneratorExit
                elif cmd == 'send':
                    if args[0] is not None:
                        raise TypeError("can't send non-None value to a just-started generator")
                elif cmd == 'throw':
                    raise args[0]
                ret = expr(callback)
                raise StopIteration(ret)
            except BaseException as e:
                done_holder[0] = True
                done_queue.put((True, e))

        self._thread = threading.Thread(target=thread_func)
        self._thread.start()

    def __next__(self):
        return self.send(None)

    def send(self, value):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('send', value))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def throw(self, exc):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('throw', exc))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def close(self):
        if not self._done_holder[0]:
            self._ready_queue.put(('close',))
        self._thread.join()

    def __del__(self):
        self.close()
Which works as:
In [3]: def callback(f):
   ...:     ret = f(1)
   ...:     print("gave 1, got {}".format(ret))
   ...:     f(2)
   ...:     print("gave 2")
   ...:     f(3)
   ...:

In [4]: i = generator_from_callback(callback)

In [5]: next(i)
Out[5]: 1

In [6]: i.send(4)
gave 1, got 4
Out[6]: 2

In [7]: next(i)
gave 2, got None
Out[7]: 3

In [8]: next(i)
StopIteration
For scipy.optimize.fmin, you would use generator_from_callback(lambda c: scipy.optimize.fmin(func, x0, callback=c))
Solution to handle non-blocking callbacks
The solution using threading and queue is pretty good: high-performance and cross-platform, probably the best one.
Here I provide a not-too-bad solution, mainly for handling non-blocking callbacks, e.g. ones invoked from the parent function through threading.Thread(target=callback).start() or in other non-blocking ways.
import pickle
import select
import subprocess

def my_fmin(func, x0):
    # open a process to use as a pipeline
    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def my_callback(x):
        # x might be any object, not only str, so we use pickle to dump it
        proc.stdin.write(pickle.dumps(x).replace(b'\n', b'\\n') + b'\n')
        proc.stdin.flush()

    from scipy import optimize
    optimize.fmin(func, x0, callback=my_callback)

    # this is meant to handle non-blocking callbacks, e.g. called somewhere
    # through `threading.Thread(target=callback).start()`
    while select.select([proc.stdout], [], [], 0)[0]:
        yield pickle.loads(proc.stdout.readline()[:-1].replace(b'\\n', b'\n'))

    # close the process
    proc.communicate()
Then you can use the function like this:
# unfortunately, `scipy.optimize.fmin`'s callback is blocking.
# so this example is just for showing how-to.
for x in my_fmin(lambda x: x**2, 3):
    print(x)
Although this solution seems quite simple and readable, it's not as high-performance as the threading-and-queue solution, because:
Processes are much heavier than threads.
Passing data through a pipe instead of memory is much slower.
Besides, it doesn't work on Windows, because the select module on Windows can only handle sockets, not pipes and other file descriptors.
For a super simple approach...
def callback_to_generator():
    data = []
    method_with_callback(blah, foo, callback=data.append)
    for item in data:
        yield item
Yes, this isn't good for large data
Yes, this blocks on all items being processed first
But it still might be useful for some use cases :)
Also thanks to @winston-ewert as this is just a small variant on his answer :)
