How to multithread calls to another endpoint method in Python?

I have an API with two endpoints: one is a simple POST that receives a JSON object, and the other calls the first one multiple times, once per item in a list of JSON objects, saving each return value to a list.
First method
@app.route('/getAudience', methods=['POST', 'OPTIONS'])
def get_audience(audience_=None):
    try:
        if audience_:
            audience = audience_
        else:
            audience = request.get_json()
    except (BadRequest, ValueError):
        return make_response(jsonify(exception_response), 500)
    return get_audience_response(audience, exception_response)
Second method
@app.route('/getMultipleAudience', methods=['POST', 'OPTIONS'])
def get_multiple_audience():
    try:
        audiences = request.json
    except (BadRequest, ValueError):
        return make_response(jsonify(exception_response), 500)
    response = []
    for audience in audiences:
        new_resp = json.loads(get_audience(audience).data)
        response.append(new_resp)
    return make_response(jsonify(response))
I wanted the second method to call the first one with a thread per object in the list, so I tried this:
def get_multiple_audience():
    with app.app_context():
        try:
            audiences = request.get_json()
        except (BadRequest, ValueError):
            return make_response(jsonify(exception_response), 500)
        for audience in audiences:
            thread = Thread(target=get_audience, args=audience)
            thread.start()
        thread.join()
        return make_response(jsonify(response))
And got this error:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\site-packages\flask_cors\decorator.py", line 123, in wrapped_function
    options = get_cors_options(current_app, _options)
  File "C:\Python27\lib\site-packages\flask_cors\core.py", line 286, in get_cors_options
    options.update(get_app_kwarg_dict(appInstance))
  File "C:\Python27\lib\site-packages\flask_cors\core.py", line 299, in get_app_kwarg_dict
    app_config = getattr(app, 'config', {})
  File "C:\Python27\lib\site-packages\werkzeug\local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "C:\Python27\lib\site-packages\werkzeug\local.py", line 306, in _get_current_object
    return self.__local()
  File "C:\Python27\lib\site-packages\flask\globals.py", line 51, in _find_app
    raise RuntimeError(_app_ctx_err_msg)
RuntimeError: Working outside of application context.
So then I tried to modify the first method like this:
@app.route('/getAudience', methods=['POST', 'OPTIONS'])
def get_audience(audience_=None):
    with app.app_context():
        try:
            ...
And got the same error. Can anyone give me a hint, advice, best practice or solution?

There are multiple problems here. Firstly, here:
for audience in audiences:
    thread = Thread(target=get_audience, args=audience)
    thread.start()
thread.join()
You are only waiting for the last thread to complete. You should have a list of all the threads, and wait for all of them to complete.
threads = []
for audience in audiences:
    # note: args must be a tuple, hence the trailing comma
    thread = Thread(target=get_audience, args=(audience,))
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
The second problem is that you are returning a single response which isn't even set anywhere, but that's not how multi-threading works: you will have multiple results from all the threads and you will have to keep track of them. So you can create a results array to hold each thread's return value. Here I will use a simple sum function as an example.
results = []
threads = []

def sum(a, b):
    results.append(a + b)

@app.route("/test")
def test():
    with app.app_context():
        for i in range(5):
            t = Thread(target=sum, args=(1, 2))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
        return jsonify(results)
This will happily work, and it will return the results of all the calls to the sum() function.
Now if I change sum to:
#app.route("/mysum/a/b")
def sum(a, b):
results.append(a + b)
return jsonify(a + b)
I will get a similar error to the one you were getting earlier, namely RuntimeError: Working outside of request context., even though the return value would still be correct: [3, 3, 3, 3, 3]. What's happening here is that your sum function is now trying to return a Flask response, but it is running inside its own temporary thread and doesn't have access to any of Flask's internal contexts. So never return a value from inside a temporary worker thread; instead, have a pool of results to store the values in for future reference.
But this doesn't mean you can't have a /mysum route. Indeed, you can, but the logic has to be separated. To put it all together:
results = []
threads = []

def sum(a, b):
    return a + b

def sum_worker(a, b):
    results.append(sum(a, b))

@app.route("/mysum/<int:a>/<int:b>")
def mysum(a, b):
    return jsonify(sum(a, b))

@app.route("/test")
def test():
    with app.app_context():
        for i in range(5):
            t = Thread(target=sum_worker, args=(1, 2))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
        return jsonify(results)
Note that this code is very crude and only for demonstration purposes. I can't recommend using global variables throughout your app, so some cleanup is required.
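For example, one possible cleanup (a sketch, assuming Python 3 so that concurrent.futures is available) keeps the results local to the request instead of in module-level globals:

import concurrent.futures

def sum(a, b):
    # plain worker function: no Flask objects are touched in the worker threads
    return a + b

@app.route("/test")
def test():
    # executor.map returns results in submission order, so no
    # shared global lists are needed
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(lambda args: sum(*args), [(1, 2)] * 5))
    return jsonify(results)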

Related

I don't understand "An asyncio.Future, a coroutine or an awaitable is required"

I have issues understanding why this code doesn't work (text explanation of the problem below):
import asyncio
from random import randrange

def main_function():
    n = randrange(2)
    if n:
        return 1
    else:
        return None

def backup_function():
    return 1

async def run():
    jobs1 = [main_function() for _ in range(0, 5)]
    jobs2 = [backup_function() for _ in range(0, 5)]
    _res = await asyncio.gather(*jobs1)
    if len(_res) < 5:
        _res = await asyncio.gather(*jobs2)

if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    loop.run_until_complete(run())
OK, let's say I have an async function run with a list of jobs jobs1 that I want to get done by calling asyncio.gather. These jobs are simulated in my example by main_function, because, in short, the real function returns 'something' or None depending on whether it internally 'fails', simulated here with return 1 or return None. asyncio.gather returns a list with the results of the executed jobs, as I understand it. I thought that await waits for all jobs to finish, so _res could be an empty list or be filled with up to five 1s, depending on randrange(2) in this case, of course.
I now need to check whether all executions of jobs1 returned a 1, or whether one of them returned None. I wanted to check that with if len(_res) < 5 and, in that case, execute another asyncio.gather with backup_function, but this leads to an error:
Traceback (most recent call last):
  File "/home/cptsnuggles/run.py", line 290, in <module>
    loop.run_until_complete(run())
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/cptsnuggles/run.py", line 282, in run
    _res = await asyncio.gather(*jobs1, return_exceptions=True)
  File "/usr/lib/python3.9/asyncio/tasks.py", line 827, in gather
    fut = ensure_future(arg, loop=loop)
  File "/usr/lib/python3.9/asyncio/tasks.py", line 680, in ensure_future
    raise TypeError('An asyncio.Future, a coroutine or an awaitable is '
TypeError: An asyncio.Future, a coroutine or an awaitable is required

Process finished with exit code 1
I think I'm missing something fundamental here, but I can't figure it out alone. Thank you for your replies!
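For reference, the error occurs because jobs1 and jobs2 already contain the plain return values (1 or None), not awaitables: main_function and backup_function are regular functions that were already called, and asyncio.gather only accepts awaitables. A minimal sketch of the coroutine-based version (same names, assuming Python 3.7+ for asyncio.run):

import asyncio
from random import randrange

async def main_function():
    # a coroutine: calling it now produces an awaitable that gather accepts
    return 1 if randrange(2) else None

async def backup_function():
    return 1

async def run():
    results = await asyncio.gather(*(main_function() for _ in range(5)))
    # gather returns exactly one result per job, so check for None
    # instead of checking the length
    if any(r is None for r in results):
        results = await asyncio.gather(*(backup_function() for _ in range(5)))
    return results

if __name__ == '__main__':
    print(asyncio.run(run()))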

How to mock a tornado coroutine function using mock framework for unit testing?

The title simply describes my problem. I would like to mock _func_inner_1 with a specific return value. Thanks for any advice :)
code under test:
from tornado.gen import coroutine, Return
from tornado.testing import gen_test
from tornado.testing import AsyncTestCase
import mock

@coroutine
def _func_inner_1():
    raise Return(1)

@coroutine
def _func_under_test_1():
    temp = yield _func_inner_1()
    raise Return(temp + 1)
But this intuitive solution does not work:
class Test123(AsyncTestCase):
    @gen_test
    @mock.patch(__name__ + '._func_inner_1')
    def test_1(self, mock_func_inner_1):
        mock_func_inner_1.side_effect = Return(9)
        result_1 = yield _func_inner_1()
        print 'result_1', result_1
        result = yield _func_under_test_1()
        self.assertEqual(10, result, result)
It fails with the below error; it seems _func_inner_1 is not patched, due to its coroutine nature:
AssertionError: 2
If I apply coroutine to the mock function returned by patch:
@gen_test
@mock.patch(__name__ + '._func_inner_1')
def test_1(self, mock_func_inner_1):
    mock_func_inner_1.side_effect = Return(9)
    mock_func_inner_1 = coroutine(mock_func_inner_1)
    result_1 = yield _func_inner_1()
    print 'result_1', result_1
    result = yield _func_under_test_1()
    self.assertEqual(10, result, result)
the error becomes:
Traceback (most recent call last):
  File "tornado/testing.py", line 118, in __call__
    result = self.orig_method(*args, **kwargs)
  File "tornado/testing.py", line 494, in post_coroutine
    timeout=timeout)
  File "tornado/ioloop.py", line 418, in run_sync
    return future_cell[0].result()
  File "tornado/concurrent.py", line 109, in result
    raise_exc_info(self._exc_info)
  File "tornado/gen.py", line 175, in wrapper
    yielded = next(result)
  File "coroutine_unit_test.py", line 39, in test_1
    mock_func_inner_1 = coroutine(mock_func_inner_1)
  File "tornado/gen.py", line 140, in coroutine
    return _make_coroutine_wrapper(func, replace_callback=True)
  File "tornado/gen.py", line 150, in _make_coroutine_wrapper
    @functools.wraps(func)
  File "functools.py", line 33, in update_wrapper
    setattr(wrapper, attr, getattr(wrapped, attr))
  File "mock.py", line 660, in __getattr__
    raise AttributeError(name)
AttributeError: __name__
This is the closest solution I can find, but the mocked function will NOT be reset after the test case executes, unlike what patch does:
@gen_test
def test_4(self):
    global _func_inner_1
    mock_func_inner_1 = mock.create_autospec(_func_inner_1)
    mock_func_inner_1.side_effect = Return(100)
    mock_func_inner_1 = coroutine(mock_func_inner_1)
    _func_inner_1 = mock_func_inner_1
    result = yield _func_under_test_1()
    self.assertEqual(101, result, result)
There are two issues here:
First is the interaction between @mock.patch and @gen_test. gen_test works by converting a generator into a "normal" function; mock.patch only works on normal functions (as far as the decorator can tell, the generator returns as soon as it reaches the first yield, so mock.patch undoes all its work). To avoid this problem, you can either reorder the decorators (always put @mock.patch before @gen_test), or use the with form of mock.patch instead of the decorator form.
Second, coroutines should never raise an exception. Instead, they return a Future which will contain either a result or an exception. The special Return exception is encapsulated by the coroutine system; a plain mock is not a coroutine, so raising Return from it does not work. When you create your mocks, you must create the appropriate Future and set it as the return value, instead of using side_effect to raise an exception.
The complete solution is:
from tornado.concurrent import Future
from tornado.gen import coroutine, Return
from tornado.testing import gen_test
from tornado.testing import AsyncTestCase
import mock

@coroutine
def _func_inner_1():
    raise Return(1)

@coroutine
def _func_under_test_1():
    temp = yield _func_inner_1()
    raise Return(temp + 1)

class Test123(AsyncTestCase):
    @mock.patch(__name__ + '._func_inner_1')
    @gen_test
    def test_1(self, mock_func_inner_1):
        future_1 = Future()
        future_1.set_result(9)
        mock_func_inner_1.return_value = future_1
        result_1 = yield _func_inner_1()
        print 'result_1', result_1
        result = yield _func_under_test_1()
        self.assertEqual(10, result, result)

import unittest
unittest.main()
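If you prefer the with form of mock.patch mentioned above, which sidesteps the decorator-ordering issue entirely, the test might look like this (an untested sketch of the same idea):

class Test123(AsyncTestCase):
    @gen_test
    def test_1(self):
        future_1 = Future()
        future_1.set_result(9)
        # the patch is applied and removed inside the running test,
        # so its order relative to @gen_test no longer matters
        with mock.patch(__name__ + '._func_inner_1', return_value=future_1):
            result = yield _func_under_test_1()
        self.assertEqual(10, result, result)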

add_callback with thread pool, but Exception about "Cannot write() after finish()."

I'm using a thread pool with Tornado to do some work. This is the code:
common/thread_pool.py
import logging
import threading
import Queue

import tornado.ioloop

class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        logging.info('Worker start')
        while True:
            content = self._queue.get()
            if isinstance(content, str) and content == 'quit':
                break
            # content: (func, args, on_complete)
            func = content[0]
            args = content[1]
            on_complete = content[2]
            resp = func(args)
            tornado.ioloop.IOLoop.instance().add_callback(lambda: on_complete(resp))
            # I don't know whether it is correct to call this
            # self._queue.task_done()
        logging.info('Worker stop')

class WorkerPool(object):
    _workers = []

    def __init__(self, num):
        self._queue = Queue.Queue()
        self._size = num

    def start(self):
        logging.info('WorkerPool start %d' % self._size)
        for _ in range(self._size):
            worker = Worker(self._queue)
            worker.start()
            self._workers.append(worker)

    def stop(self):
        for worker in self._workers:
            self._queue.put('quit')
        for worker in self._workers:
            worker.join()
        logging.info('WorkerPool stopped')

    def append(self, content):
        self._queue.put(content)
gateway.py
import sys
import time
import logging

import tornado.ioloop
import tornado.web

from common import thread_pool

workers = None

class MainServerHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        start_time = time.time()
        method = 'get'
        content = (self.handle, (method, self.request, start_time), self.on_complete)
        workers.append(content)

    @tornado.web.asynchronous
    def post(self):
        start_time = time.time()
        method = 'post'
        content = (self.handle, (method, self.request, start_time), self.on_complete)
        workers.append(content)

    def handle(self, args):
        method, request, start_time = args
        # for test, just return
        return 'test test'

    def on_complete(self, res):
        logging.debug('on_complete')
        self.write(res)
        self.finish()
        return

def main(argv):
    global workers
    workers = thread_pool.WorkerPool(conf_mgr.thread_num)
    workers.start()
    application = tornado.web.Application([(r"/", MainServerHandler)])
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main(sys.argv[1:])
When I make many concurrent requests, I get this error:
ERROR: 2014-09-15 18:04:03: ioloop.py:435 * 140500107065056 Exception in callback <tornado.stack_context._StackContextWrapper object at 0x7fc8b4d6b9f0>
Traceback (most recent call last):
  File "/home/work/nlp_arch/project/ps/se/nlp-arch/gateway/gateway/../third-party/tornado-2.4.1/tornado/ioloop.py", line 421, in _run_callback
    callback()
  File "/home/work/nlp_arch/project/ps/se/nlp-arch/gateway/gateway/../common/thread_pool.py", line 39, in <lambda>
    tornado.ioloop.IOLoop.instance().add_callback(lambda: on_complete(resp))
  File "/home/work/nlp_arch/project/ps/se/nlp-arch/gateway/gateway/gateway.py", line 92, in on_complete
    self.write(res)
  File "/home/work/nlp_arch/project/ps/se/nlp-arch/gateway/gateway/../third-party/tornado-2.4.1/tornado/web.py", line 489, in write
    raise RuntimeError("Cannot write() after finish(). May be caused "
RuntimeError: Cannot write() after finish(). May be caused by using async operations without the @asynchronous decorator.
But I didn't call write after finish. I'm also using the @asynchronous decorator. At the same time, the logs show that write and finish are called by the same thread.
The issue is with the way you're adding the callback to the I/O loop. Add it like this:
tornado.ioloop.IOLoop.instance().add_callback(on_complete, resp)
And the errors will go away.
You're seeing this strange behavior because when you use a lambda function, you're creating a closure in the local scope of the function, and the variables used in that closure get bound at the point the lambda is executed, not when it's created. Consider this example:
funcs = []

def func(a):
    print a

for i in range(5):
    funcs.append(lambda: func(i))

for f in funcs:
    f()
Output:
4
4
4
4
4
Because your worker method runs in a while loop, on_complete gets rebound on every iteration, which also changes the value of on_complete inside the lambda. So if one worker thread sets on_complete for a handler A, but then gets another task and sets on_complete for a handler B before the callback scheduled for handler A has run, both callbacks end up running handler B's on_complete.
If you really wanted to use a lambda, you could also avoid this by binding on_complete in the local scope of the lambda:
tornado.ioloop.IOLoop.instance().add_callback(lambda on_complete=on_complete: on_complete(resp))
But just adding the function and its argument directly is much nicer.
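functools.partial is another standard way to bind the current values at creation time, equivalent to passing the function and its arguments to add_callback directly; a sketch:

import functools

# partial freezes on_complete and resp when the callback is created,
# just like passing them to add_callback as separate arguments
tornado.ioloop.IOLoop.instance().add_callback(functools.partial(on_complete, resp))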

Concurrently run two functions that take parameters and return lists?

I understand that two functions can run in parallel using multiprocessing or threading modules, e.g. Make 2 functions run at the same time and Python multiprocessing for parallel processes.
But the above examples only use the print function. Is it possible to run functions that return a list in parallel in Python, and if so, how?
I've tried with threading:
from threading import Thread

def func1(x):
    return [i*i for i in x]

def func2(x):
    return [i*i*i for i in x]

nums = [1,2,3,4,5]

p1 = Thread(target = func1(nums)).start()
p2 = Thread(target = func2(nums)).start()
print p1
print p2
but I got the following error:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
TypeError: 'list' object is not callable

None
None

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
TypeError: 'list' object is not callable
I've tried passing the args parameter as a tuple instead of calling the function:
import threading
from threading import Thread

def func1(x):
    return [i*i for i in x]

def func2(x):
    return [i*i*i for i in x]

nums = [1,2,3,4,5]

p1 = Thread(target = func1, args=(nums,)).start()
p2 = Thread(target = func2, args=(nums,)).start()
print p1, p2
but it only prints None None; the desired output is:
[out]:
[1, 4, 9, 16, 25] [1, 8, 27, 64, 125]
A Thread's target function cannot return a value. Or, I should say, the return value is ignored and, as such, not communicated back to the spawning thread. But here are a couple of things you can do:
1) Communicate back to the spawning thread using Queue.Queue. Note the wrapper around the original functions:
from threading import Thread
from Queue import Queue

def func1(x):
    return [i*i for i in x]

def func2(x):
    return [i*i*i for i in x]

nums = [1,2,3,4,5]

def wrapper(func, arg, queue):
    queue.put(func(arg))

q1, q2 = Queue(), Queue()
Thread(target=wrapper, args=(func1, nums, q1)).start()
Thread(target=wrapper, args=(func2, nums, q2)).start()
print q1.get(), q2.get()
2) Use global to access the result lists in your threads, as well as in the spawning thread:
from threading import Thread

list1 = list()
list2 = list()

def func1(x):
    global list1
    list1 = [i*i for i in x]

def func2(x):
    global list2
    list2 = [i*i*i for i in x]

nums = [1,2,3,4,5]

t1 = Thread(target=func1, args=(nums,))
t2 = Thread(target=func2, args=(nums,))
t1.start()
t2.start()
# wait for both threads to finish before reading the results
t1.join()
t2.join()
print list1, list2
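A third option, as a sketch: the standard library's multiprocessing.dummy provides a thread-backed Pool with the multiprocessing API, which hands the return values back directly:

from multiprocessing.dummy import Pool  # thread pool, not processes

def func1(x):
    return [i*i for i in x]

def func2(x):
    return [i*i*i for i in x]

nums = [1, 2, 3, 4, 5]
pool = Pool(2)
# apply_async runs each call in a worker thread; .get() blocks
# until the corresponding return value is ready
r1 = pool.apply_async(func1, (nums,))
r2 = pool.apply_async(func2, (nums,))
print r1.get(), r2.get()
pool.close()
pool.join()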
target should receive only the function object; the parameters should be passed via the args parameter. I cannot paste code because I am answering from my mobile phone... ;-)

Turn functions with a callback into Python generators?

The Scipy minimization function (just to use as an example) has the option of adding a callback function at each step. So I can do something like,
def my_callback(x):
    print x

scipy.optimize.fmin(func, x0, callback=my_callback)
Is there a way to use the callback function to create a generator version of fmin, so that I could do,
for x in my_fmin(func, x0):
    print x
It seems like it might be possible with some combination of yields and sends, but I can't quite think of anything.
As pointed out in the comments, you could do it in a new thread, using a Queue. The drawback is that you'd still need some way to access the final result (what fmin returns at the end). My example below uses an optional callback to do something with it (another option would be to just yield it too, though your calling code would have to differentiate between iteration results and final results):
from thread import start_new_thread
from Queue import Queue

def my_fmin(func, x0, end_callback=(lambda x: x), timeout=None):
    q = Queue()           # fmin produces, the generator consumes
    job_done = object()   # signals the processing is done

    # Producer
    def my_callback(x):
        q.put(x)

    def task():
        ret = scipy.optimize.fmin(func, x0, callback=my_callback)
        q.put(job_done)
        end_callback(ret)  # "Returns" the result of the main call

    # Starts fmin in a new thread
    start_new_thread(task, ())

    # Consumer
    while True:
        next_item = q.get(True, timeout)  # Blocks until an input is available
        if next_item is job_done:
            break
        yield next_item
Update: to block the execution of the next iteration until the consumer has finished processing the last one, it's also necessary to use task_done and join.
# Producer
def my_callback(x):
    q.put(x)
    q.join()  # Blocks until task_done is called

# Consumer
while True:
    next_item = q.get(True, timeout)  # Blocks until an input is available
    if next_item is job_done:
        break
    yield next_item
    q.task_done()  # Unblocks the producer, so a new iteration can start
Note that maxsize=1 is not necessary, since no new item will be added to the queue until the last one is consumed.
Update 2: Also note that, unless all items are eventually retrieved by this generator, the created thread will deadlock (it will block forever and its resources will never be released). The producer is waiting on the queue, and since the producer thread stores a reference to that queue, the queue will never be reclaimed by the gc even if the consumer is. The queue will then become unreachable, so nobody will be able to release the lock.
A clean solution for that is unknown, if possible at all (since it would depend on the particular function used in place of fmin). A workaround could be made using timeout, having the producer raise an exception if put blocks for too long:
q = Queue(maxsize=1)

# Producer
def my_callback(x):
    q.put(x)
    q.put("dummy", True, timeout)  # Blocks until the first result is retrieved
    q.join()                       # Blocks again until task_done is called

# Consumer
while True:
    next_item = q.get(True, timeout)  # Blocks until an input is available
    q.task_done()                     # (one "task_done" per "get")
    if next_item is job_done:
        break
    yield next_item
    q.get()        # Retrieves the "dummy" object (must be after yield)
    q.task_done()  # Unblocks the producer, so a new iteration can start
Generator as coroutine (no threading)
Let's have a FakeFtp class with a retrbinary function that invokes a callback with each successful read of a chunk of data:
class FakeFtp(object):
    def __init__(self):
        self.data = iter(["aaa", "bbb", "ccc", "ddd"])

    def login(self, user, password):
        self.user = user
        self.password = password

    def retrbinary(self, cmd, cb):
        for chunk in self.data:
            cb(chunk)
Using a simple callback function has the disadvantage that it is called repeatedly and cannot easily keep context between calls.
The following code defines a process_chunks generator, which is able to receive chunks of data one by one and process them. In contrast to a simple callback, here we are able to keep all the processing within one function without losing context.
from contextlib import closing
from itertools import count

def main():
    processed = []

    def process_chunks():
        for i in count():
            try:
                # (repeatedly) get the chunk to process
                chunk = yield
            except GeneratorExit:
                # finish_up
                print("Finishing up.")
                return
            else:
                # Here process the chunk as you like
                print("inside coroutine, processing chunk:", i, chunk)
                product = "processed({i}): {chunk}".format(i=i, chunk=chunk)
                processed.append(product)

    with closing(process_chunks()) as coroutine:
        # Get the coroutine to the first yield
        coroutine.next()
        ftp = FakeFtp()
        # next line repeatedly calls `coroutine.send(data)`
        ftp.retrbinary("RETR binary", cb=coroutine.send)
        # each callback "jumps" to `yield` line in `process_chunks`

    print("processed result", processed)
    print("DONE")
To see the code in action, put the FakeFtp class, the code shown above, and the following line:
main()
into one file and call it:
$ python headsandtails.py
('inside coroutine, processing chunk:', 0, 'aaa')
('inside coroutine, processing chunk:', 1, 'bbb')
('inside coroutine, processing chunk:', 2, 'ccc')
('inside coroutine, processing chunk:', 3, 'ddd')
Finishing up.
('processed result', ['processed(0): aaa', 'processed(1): bbb', 'processed(2): ccc', 'processed(3): ddd'])
DONE
How it works
processed = [] is here just to show that the generator process_chunks has no problem cooperating with its external context. All is wrapped into def main(): to prove there is no need to use global variables.
def process_chunks() is the core of the solution. It might have one-shot input parameters (not used here), but the main point where it receives input is each yield line, returning whatever anyone sends via .send(data) into an instance of this generator. One could call coroutine.send(chunk) directly, but in this example it is done via a callback bound to coroutine.send.
Note that in a real solution there is no problem having multiple yields in the code; they are processed one by one. This might be used e.g. to read (and ignore) the header of a CSV file and then continue processing the records with data; a sketch of that follows below.
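As a hedged sketch of that CSV idea (hypothetical input, same send-driven style as process_chunks above):

def csv_coroutine():
    header = yield           # the first send() delivers the header line
    print("columns:", header.split(","))
    while True:
        record = yield       # each later send() delivers one data row
        print("record:", record.split(","))

cor = csv_coroutine()
cor.next()                   # advance to the first yield (next(cor) on Python 3)
for line in ["id,name", "1,foo", "2,bar"]:
    cor.send(line)
cor.close()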
We could instantiate and use the generator as follows:
coroutine = process_chunks()
# Get the coroutine to the first yield
coroutine.next()
ftp = FakeFtp()
# next line repeatedly calls `coroutine.send(data)`
ftp.retrbinary("RETR binary", cb=coroutine.send)
# each callback "jumps" to `yield` line in `process_chunks`
# close the coroutine (will throw the `GeneratorExit` exception into the
# `process_chunks` coroutine).
coroutine.close()
The real code uses the contextlib closing context manager to ensure that coroutine.close() is always called.
Conclusions
This solution does not provide the sort of iterator that consumes data in the traditional style "from outside". On the other hand, we are able to:
use the generator "from inside"
keep all iterative processing within one function without being interrupted between callbacks
optionally use external context
provide usable results to outside
all this can be done without using threading
Credits: The solution is heavily inspired by the SO answer Python FTP "chunk" iterator (without loading entire file into memory), written by user2357112.
Concept: Use a blocking queue with maxsize=1 and a producer/consumer model.
The callback produces; then the next call to the callback will block on the full queue.
The consumer then yields the value from the queue, tries to get another value, and blocks on the read.
The producer is then allowed to push to the queue; rinse and repeat.
Usage:
def dummy(func, arg, callback=None):
    for i in range(100):
        callback(func(arg + i))

# Dummy example:
for i in Iteratorize(dummy, lambda x: x + 1, 0):
    print(i)

# example with scipy:
for i in Iteratorize(scipy.optimize.fmin, func, x0):
    print(i)
Can be used as expected for an iterator:
for i in take(5, Iteratorize(dummy, lambda x: x + 1, 0)):
    print(i)
Iteratorize class:
from thread import start_new_thread
from Queue import Queue

class Iteratorize:
    """
    Transforms a function that takes a callback
    into a lazy iterator (generator).
    """
    def __init__(self, func, ifunc, arg, callback=None):
        self.mfunc = func
        self.ifunc = ifunc
        self.c_callback = callback
        self.q = Queue(maxsize=1)
        self.stored_arg = arg
        self.sentinel = object()

        def _callback(val):
            self.q.put(val)

        def gentask():
            ret = self.mfunc(self.ifunc, self.stored_arg, callback=_callback)
            self.q.put(self.sentinel)
            if self.c_callback:
                self.c_callback(ret)

        start_new_thread(gentask, ())

    def __iter__(self):
        return self

    def next(self):
        obj = self.q.get(True, None)
        if obj is self.sentinel:
            raise StopIteration
        else:
            return obj
Can probably do with some cleaning up to accept *args and **kwargs for the function being wrapped and/or the final result callback; a sketch of that follows below.
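For instance, here is a hedged sketch of that generalization, reusing the imports and queue machinery above; IteratorizeAny and the on_result keyword are illustrative names, and the wrapped function is still assumed to accept a callback keyword argument:

class IteratorizeAny:
    """
    Variant of Iteratorize that forwards arbitrary *args/**kwargs
    to the wrapped function.
    """
    def __init__(self, func, *args, **kwargs):
        # optional callback invoked with the wrapped function's final return value
        self.c_callback = kwargs.pop('on_result', None)
        self.q = Queue(maxsize=1)
        self.sentinel = object()

        def _callback(val):
            self.q.put(val)

        def gentask():
            ret = func(*args, callback=_callback, **kwargs)
            self.q.put(self.sentinel)
            if self.c_callback:
                self.c_callback(ret)

        start_new_thread(gentask, ())

    def __iter__(self):
        return self

    def next(self):
        obj = self.q.get(True, None)
        if obj is self.sentinel:
            raise StopIteration
        return obj

# e.g.: for i in IteratorizeAny(scipy.optimize.fmin, func, x0): print(i)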
How about
data = []
scipy.optimize.fmin(func, x0, callback=data.append)
for line in data:
    print line
If not, what exactly do you want to do with the generator's data?
A variant of Frits' answer, that:
Supports send to choose a return value for the callback
Supports throw to choose an exception for the callback
Supports close to gracefully shut down
Does not compute a queue item until it is requested
The complete code with tests can be found on github
import queue
import threading
import collections.abc

class generator_from_callback(collections.abc.Generator):
    def __init__(self, expr):
        """
        expr: a function that takes a callback
        """
        self._expr = expr
        self._done = False
        self._ready_queue = queue.Queue(1)
        self._done_queue = queue.Queue(1)
        self._done_holder = [False]

        # local to avoid reference cycles
        ready_queue = self._ready_queue
        done_queue = self._done_queue
        done_holder = self._done_holder

        def callback(value):
            done_queue.put((False, value))
            cmd, *args = ready_queue.get()
            if cmd == 'close':
                raise GeneratorExit
            elif cmd == 'send':
                return args[0]
            elif cmd == 'throw':
                raise args[0]

        def thread_func():
            try:
                cmd, *args = ready_queue.get()
                if cmd == 'close':
                    raise GeneratorExit
                elif cmd == 'send':
                    if args[0] is not None:
                        raise TypeError("can't send non-None value to a just-started generator")
                elif cmd == 'throw':
                    raise args[0]
                ret = expr(callback)
                raise StopIteration(ret)
            except BaseException as e:
                done_holder[0] = True
                done_queue.put((True, e))

        self._thread = threading.Thread(target=thread_func)
        self._thread.start()

    def __next__(self):
        return self.send(None)

    def send(self, value):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('send', value))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def throw(self, exc):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('throw', exc))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def close(self):
        if not self._done_holder[0]:
            self._ready_queue.put(('close',))
        self._thread.join()

    def __del__(self):
        self.close()
Which works as:
In [3]: def callback(f):
   ...:     ret = f(1)
   ...:     print("gave 1, got {}".format(ret))
   ...:     f(2)
   ...:     print("gave 2")
   ...:     f(3)
   ...:

In [4]: i = generator_from_callback(callback)

In [5]: next(i)
Out[5]: 1

In [6]: i.send(4)
gave 1, got 4
Out[6]: 2

In [7]: next(i)
gave 2
Out[7]: 3

In [8]: next(i)
StopIteration
For scipy.optimize.fmin, you would use generator_from_callback(lambda c: scipy.optimize.fmin(func, x0, callback=c))
Solution to handle non-blocking callbacks
The solution using threading and queue is pretty good: high-performance and cross-platform, probably the best one.
Here I provide a not-too-bad solution, which is mainly for handling non-blocking callbacks, e.g. ones invoked from the parent function through threading.Thread(target=callback).start(), or other non-blocking ways.
import pickle
import select
import subprocess

def my_fmin(func, x0):
    # open a process to use as a pipeline
    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def my_callback(x):
        # x might be any object, not only str, so we use pickle to dump it
        proc.stdin.write(pickle.dumps(x).replace(b'\n', b'\\n') + b'\n')
        proc.stdin.flush()

    from scipy import optimize
    optimize.fmin(func, x0, callback=my_callback)

    # this is meant to handle non-blocking callbacks, e.g. called somewhere
    # through `threading.Thread(target=callback).start()`
    while select.select([proc.stdout], [], [], 0)[0]:
        yield pickle.loads(proc.stdout.readline()[:-1].replace(b'\\n', b'\n'))

    # close the process
    proc.communicate()
Then you can use the function like this:
# unfortunately, `scipy.optimize.fmin`'s callback is blocking,
# so this example is just for showing how-to.
for x in my_fmin(lambda x: x**2, 3):
    print(x)
Although this solution seems quite simple and readable, it's not as high-performance as the threading-and-queue solution, because:
Processes are much heavier than threads.
Passing data through a pipe instead of memory is much slower.
Besides, it doesn't work on Windows, because the select module on Windows can only handle sockets, not pipes and other file descriptors.
For a super simple approach...
def callback_to_generator():
    data = []
    method_with_callback(blah, foo, callback=data.append)
    for item in data:
        yield item
Yes, this isn't good for large data
Yes, this blocks on all items being processed first
But it still might be useful for some use cases :)
Also thanks to @winston-ewert, as this is just a small variant on his answer :)
