Mixing Deferred and deferToThread to run blocking code in a separate thread - python

In my Python Twisted application I need to receive data from the client, perform some database operations and, depending on the data, run some blocking code in a separate thread.
So far I have:
d = get_user(user_id)
d.addCallback(do_something_with_input_data, input_data)
d.addCallback(run_blocking_code)
d.addCallback(save_data_into_db)
d.addCallback(response_to_client)

@defer.inlineCallbacks
def get_user(self, user_id):
    user = yield get_user_from_db(user_id)
    defer.returnValue(user)

def do_something_with_input_data(user, input_data):
    # do smth...
    return results

@defer.inlineCallbacks
def run_blocking_code(results):
    threads.deferToThread(run_in_separate_thread, results)
    return results

@defer.inlineCallbacks
def save_data_into_db(results):
    yield save_in_db(results)
    defer.returnValue('OK')

def response_to_client(response):
    # send 'OK' to client
Is calling deferToThread() in run_blocking_code() a good approach? If so, how can I make save_data_into_db() wait until the thread ends?

I'd say the general concept is fine. I'd add in some errbacks, and you also need to tweak your run_blocking_code function:
from twisted.internet import defer, threads

@defer.inlineCallbacks
def run_blocking_code(results):
    # `deferToThread` returns a Deferred
    d = threads.deferToThread(run_in_separate_thread, results)
    # wait for it here (we need a yield because the function is decorated with `inlineCallbacks`)
    result = yield d
    # now return its value to the next function in the callback chain
    defer.returnValue(result)
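For the errbacks mentioned above, a minimal sketch could look like the following (handle_error and the 'ERROR' response are placeholders for illustration, not part of the original code):

from twisted.python import log

def handle_error(failure):
    # log the Failure and turn it into an error response for the client
    log.err(failure)
    return 'ERROR'

d = get_user(user_id)
d.addCallback(do_something_with_input_data, input_data)
d.addCallback(run_blocking_code)
d.addCallback(save_data_into_db)
d.addErrback(handle_error)  # catches a failure raised by any callback above it
d.addCallback(response_to_client)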

Related

Return from function if execution finished within timeout or make callback otherwise

I have a project in Python 3.5 without any usage of asynchronous features. I have to implement the following logic:
def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    # Start some_serious_job(*arguments) in a task
    # if it finishes within 3 sec:
    #     return result immediately
    # otherwise return None, but do not terminate task.
    # If the task finishes in 1 minute:
    #     call finished_callback(result)
    # else:
    #     call finished_callback(None)
    pass
The function should_return_in_3_sec() should remain synchronous, but it is up to me to write any new asynchronous code (including some_serious_job()).
What is the most elegant and pythonic way to do it?
Fork off a thread doing the serious job, let it write its result into a queue and then terminate. Read from that queue in your main thread with a timeout of three seconds. If the timeout occurs, start another thread and return None. Let the second thread read from the queue with a timeout of one minute; if that times out as well, call finished_callback(None); otherwise call finished_callback(result).
I sketched it like this:
import threading, queue

def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    result_queue = queue.Queue(1)

    def do_serious_job_and_deliver_result():
        result = some_serious_job(*arguments)
        result_queue.put(result)

    threading.Thread(target=do_serious_job_and_deliver_result).start()

    try:
        result = result_queue.get(timeout=3)
    except queue.Empty:  # timeout?
        def expect_and_handle_late_result():
            try:
                result = result_queue.get(timeout=60)
            except queue.Empty:
                finished_callback(None)
            else:
                finished_callback(result)

        threading.Thread(target=expect_and_handle_late_result).start()
        return None
    else:
        return result
The threading module has some simple timeout options, see Thread.join(timeout) for example.
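For completeness, a minimal sketch of that join-with-timeout idea (run_with_timeout and the result holder are made-up names for illustration):

import threading

def run_with_timeout(some_serious_job, arguments, timeout=3):
    holder = {}  # a Thread target cannot return a value directly, so stash it here

    def target():
        holder['result'] = some_serious_job(*arguments)

    t = threading.Thread(target=target)
    t.start()
    t.join(timeout)  # returns after `timeout` seconds even if the thread is still running
    if t.is_alive():
        return None  # job not finished yet; the thread keeps running in the background
    return holder.get('result')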
If you do choose to use asyncio, below is a partial solution to address some of your needs:
import asyncio
import time

async def late_response(task, flag, timeout, callback):
    done, pending = await asyncio.wait([task], timeout=timeout)
    callback(done.pop().result() if done else None)  # will raise an exception if some_serious_job failed
    flag[0] = True  # signal some_serious_job to stop
    return await task

async def launch_job(loop, some_serious_job, arguments, finished_callback,
                     timeout_1=3, timeout_2=5):
    flag = [False]
    task = loop.run_in_executor(None, some_serious_job, flag, *arguments)
    done, pending = await asyncio.wait([task], timeout=timeout_1)
    if done:
        return done.pop().result()  # will raise an exception if some_serious_job failed
    asyncio.ensure_future(
        late_response(task, flag, timeout_2, finished_callback))
    return None

def f(flag, n):
    for i in range(n):
        print("serious", i, flag)
        if flag[0]:
            return "CANCELLED"
        time.sleep(1)
    return "OK"

def finished(result):
    print("FINISHED", result)

loop = asyncio.get_event_loop()
result = loop.run_until_complete(launch_job(loop, f, [1], finished))
print("result:", result)
loop.run_forever()
This will run the job in a separate thread (use loop.set_default_executor(ProcessPoolExecutor()) to run a CPU-intensive task in a process instead). Keep in mind it is bad practice to terminate a process/thread forcibly - the code above uses a very simple list to signal the thread to stop (see also threading.Event / multiprocessing.Event).
While implementing your solution, you may find you want to modify your existing code to use coroutines instead of threads.
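As a side note, a minimal sketch of the same stop signal using threading.Event instead of the flag list (the job still has to check it cooperatively):

import threading
import time

def f(stop_event, n):
    for i in range(n):
        if stop_event.is_set():  # the caller asked us to stop
            return "CANCELLED"
        time.sleep(1)
    return "OK"

stop_event = threading.Event()
# pass stop_event to the job instead of the flag list,
# and call stop_event.set() where the code above sets flag[0] = True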

Python ThreadPoolExecutor wait for all futures to complete

I am trying to write a module which needs to crawl some URLs concurrently/in parallel. Since this is an expensive network IO operation rather than a CPU-heavy one, I am using ThreadPoolExecutor.
Now in my code, multiple functions add tasks to the shared thread pool.
My issue is that the main thread gets suspended before all the future objects are done processing in the callback functions.
I am a beginner dealing with futures and ThreadPoolExecutor. Any help would be appreciated.
import settings
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures

class Test(Base):
    WORKER_THREADS = settings.WORKER_THREADS

    def __init__(self, urls):
        super(Test, self).__init__()
        self.urls = urls
        self.worker_pool = ThreadPoolExecutor(max_workers=Test.WORKER_THREADS)

    def add_to_worker_queue(self, task, callback, **kwargs):
        self.logger.info("Adding task %s to worker pool.", task.func_name)
        self.worker_pool.submit(task, **kwargs).add_done_callback(callback)
        return

    def load_url(self, url):
        response = self.make_requests(urls=url)  # make_requests is in Base class (it just makes an HTTP request)
        # response is a generator, so to get the data out of it we need to iterate through it
        for res in response:
            return res

    def handle_response(self, response):
        # do some stuff with the response and add it again to the worker queue for further parallel processing
        self.add_to_worker_queue(some_task, callback_func, data=response)
        return

    def start(self):
        for url in self.urls:
            self.add_to_worker_queue(self.load_url, self.handle_response, url=[url])
        return

    def stop(self):
        self.worker_pool.shutdown(wait=True)
        return

if __name__ == "__main__":
    start_urls = ['http://stackoverflow.com/',
                  'https://docs.python.org/3.3/library/concurrent.futures.html'
                  ]
    test = Test(urls=start_urls)
    test.start()
    test.stop()
PS: I tried using the executor with a "with" statement, according to this example: https://docs.python.org/3.3/library/concurrent.futures.html#threadpoolexecutor-example
But I submit tasks to the pool one by one, and the example above waits for the future objects to complete, which defeats my purpose.
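This question appears here without an answer. One way around the problem, sketched below rather than taken from the original code, is to drive follow-up submissions from the main thread with concurrent.futures.wait instead of from done-callbacks, so the main thread always knows the full set of outstanding futures (load_url and process here are stand-ins for the methods in the question):

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def load_url(url):
    return "response for %s" % url  # stand-in for the real HTTP request

def process(response):
    return None  # stand-in: return a new URL here to schedule follow-up work

pool = ThreadPoolExecutor(max_workers=4)
pending = {pool.submit(load_url, url) for url in ['http://stackoverflow.com/']}

while pending:
    done, pending = wait(pending, return_when=FIRST_COMPLETED)
    for future in done:
        follow_up = process(future.result())
        if follow_up is not None:
            pending.add(pool.submit(load_url, follow_up))  # submitted from the main thread

pool.shutdown(wait=True)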

Is there a way to use asyncio.Queue in multiple threads?

Let's assume I have the following code:
import asyncio
import threading

queue = asyncio.Queue()

def threaded():
    import time
    while True:
        time.sleep(2)
        queue.put_nowait(time.time())
        print(queue.qsize())

@asyncio.coroutine
def async():
    while True:
        time = yield from queue.get()
        print(time)

loop = asyncio.get_event_loop()
asyncio.Task(async())
threading.Thread(target=threaded).start()
loop.run_forever()
The problem with this code is that the loop inside the async coroutine never finishes its first iteration, while the queue size keeps increasing.
Why is this happening this way and what can I do to fix it?
I can't get rid of the separate thread, because in my real code I use it to communicate with a serial device, and I haven't found a way to do that using asyncio.
asyncio.Queue is not thread-safe, so you can't use it directly from more than one thread. Instead, you can use janus, which is a third-party library that provides a thread-aware asyncio queue.
import asyncio
import threading
import janus

def threaded(squeue):
    import time
    while True:
        time.sleep(2)
        squeue.put_nowait(time.time())
        print(squeue.qsize())

@asyncio.coroutine
def async_func(aqueue):
    while True:
        time = yield from aqueue.get()
        print(time)

loop = asyncio.get_event_loop()
queue = janus.Queue(loop=loop)
loop.create_task(async_func(queue.async_q))
threading.Thread(target=threaded, args=(queue.sync_q,)).start()
loop.run_forever()
There is also aioprocessing (full-disclosure: I wrote it), which provides process-safe (and as a side-effect, thread-safe) queues as well, but that's overkill if you're not trying to use multiprocessing.
Edit
As pointed out in other answers, for simple use-cases you can use loop.call_soon_threadsafe to add to the queue as well.
If you do not want to use another library, you can schedule a coroutine from the thread. Replacing the queue.put_nowait call with the following works fine:
asyncio.run_coroutine_threadsafe(queue.put(time.time()), loop)
The variable loop represents the event loop in the main thread.
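Put together, a minimal sketch of the question's example fixed this way (using async/await syntax rather than the original @asyncio.coroutine style) could look like:

import asyncio
import threading
import time

queue = asyncio.Queue()

def threaded(loop):
    while True:
        time.sleep(2)
        # hand the put() over to the event loop thread
        asyncio.run_coroutine_threadsafe(queue.put(time.time()), loop)

async def consumer():
    while True:
        timestamp = await queue.get()
        print(timestamp)

loop = asyncio.get_event_loop()
loop.create_task(consumer())
threading.Thread(target=threaded, args=(loop,), daemon=True).start()
loop.run_forever()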
EDIT:
The reason why your async coroutine is not doing anything is that
the event loop never gives it a chance to do so. The queue object is
not threadsafe and if you dig through the cpython code you find that
this means that put_nowait wakes up consumers of the queue through
the use of a future with the call_soon method of the event loop. If
we could make it use call_soon_threadsafe it should work. The major
difference between call_soon and call_soon_threadsafe, however, is
that call_soon_threadsafe wakes up the event loop by calling loop._write_to_self(). So let's call it ourselves:
import asyncio
import threading

queue = asyncio.Queue()

def threaded():
    import time
    while True:
        time.sleep(2)
        queue.put_nowait(time.time())
        queue._loop._write_to_self()
        print(queue.qsize())

@asyncio.coroutine
def async():
    while True:
        time = yield from queue.get()
        print(time)

loop = asyncio.get_event_loop()
asyncio.Task(async())
threading.Thread(target=threaded).start()
loop.run_forever()
Then, everything works as expected.
As for the thread-safety of accessing shared objects, asyncio.Queue uses collections.deque under the hood, which has thread-safe append and popleft. Checking that the queue is not empty and then calling popleft may not be atomic, but if you consume the queue only in one thread (the one running the event loop) it should be fine.
The other proposed solutions - loop.call_soon_threadsafe from Huazuo Gao's answer and my asyncio.run_coroutine_threadsafe - are just doing this: waking up the event loop.
BaseEventLoop.call_soon_threadsafe is at hand. See the asyncio docs for details.
Simply change your threaded() like this:
def threaded():
    import time
    while True:
        time.sleep(1)
        loop.call_soon_threadsafe(queue.put_nowait, time.time())
        loop.call_soon_threadsafe(lambda: print(queue.qsize()))
Here's a sample output:
0
1443857763.3355968
0
1443857764.3368602
0
1443857765.338082
0
1443857766.3392274
0
1443857767.3403943
What about just using threading.Lock with asyncio.Queue?
import asyncio
import queue
import threading

class ThreadSafeAsyncFuture(asyncio.Future):
    """ asyncio.Future is not thread-safe
    https://stackoverflow.com/questions/33000200/asyncio-wait-for-event-from-other-thread
    """
    def set_result(self, result):
        func = super().set_result
        call = lambda: func(result)
        self._loop.call_soon_threadsafe(call)  # Warning: self._loop is undocumented


class ThreadSafeAsyncQueue(queue.Queue):
    """ asyncio.Queue is not thread-safe, threading.Queue is not awaitable
    works only with one putter to an unlimited-size queue and with several getters
    TODO: add maxsize limits
    TODO: make put a coroutine
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lock = threading.Lock()
        self.loop = asyncio.get_event_loop()
        self.waiters = []

    def put(self, item):
        with self.lock:
            if self.waiters:
                self.waiters.pop(0).set_result(item)
            else:
                super().put(item)

    async def get(self):
        with self.lock:
            if not self.empty():
                return super().get()
            else:
                fut = ThreadSafeAsyncFuture()
                self.waiters.append(fut)
        result = await fut
        return result
See also - asyncio: Wait for event from other thread
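A short usage sketch for the class above (my own illustration, which assumes the event loop exists before the queue, since __init__ calls asyncio.get_event_loop()):

import asyncio
import threading
import time

loop = asyncio.get_event_loop()
tsq = ThreadSafeAsyncQueue()

def producer():
    # a plain thread uses the synchronous put()
    for i in range(3):
        time.sleep(1)
        tsq.put(i)

async def consumer():
    for _ in range(3):
        item = await tsq.get()  # the awaitable get() runs on the event loop side
        print("got", item)

threading.Thread(target=producer, daemon=True).start()
loop.run_until_complete(consumer())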

Have Python wait for a function to finish before proceeding with the program

I have a Python program that I have written. This program calls a function within a module I have also written and passes it some data.
program:
def Response(Response):
    Resp = Response

def main():
    myModule.process_this("hello")  # Send string to myModule's process_this function
    # Should wait around here for Resp to contain the response
    print Resp
That function processes it and passes it back as a response to function Response in the main program.
myModule:
def process_this(data):
    # process data
    program.Response(data)
I checked and all the data is being passed correctly. I have left out all the imports and the data processing to keep this question as concise as possible.
I need to find some way of having Python wait for Resp to actually contain the response before proceeding with the program. I've been looking at threading with semaphores, and at the Queue module, but I'm not 100% sure how I would incorporate either into my program.
Here's a working solution with queues and the threading module. Note: if your tasks are CPU-bound rather than IO-bound, you should use multiprocessing instead.
import threading
import Queue

def worker(in_q, out_q):
    """ threadsafe worker """
    abort = False
    while not abort:
        try:
            # make sure we don't wait forever
            task = in_q.get(True, .5)
        except Queue.Empty:
            abort = True
        else:
            # process task
            response = task
            # return result
            out_q.put(response)
            in_q.task_done()

# one queue to pass tasks, one to get results
task_q = Queue.Queue()
result_q = Queue.Queue()

# start threads
t = threading.Thread(target=worker, args=(task_q, result_q))
t.start()

# submit some work
task_q.put("hello")

# wait for results
task_q.join()
print "result", result_q.get()

Python multiprocessing with twisted's reactor

I am working on an xmlrpc server which has to perform certain tasks cyclically. I am using Twisted as the core of the xmlrpc service, but I am running into a little problem:
class cemeteryRPC(xmlrpc.XMLRPC):
    def __init__(self, dic):
        xmlrpc.XMLRPC.__init__(self)

    def xmlrpc_foo(self):
        return 1

    def cycle(self):
        print "Hello"
        time.sleep(3)


class cemeteryM(base):
    def __init__(self, dic):  # dic is for cemetery
        multiprocessing.Process.__init__(self)
        self.cemRPC = cemeteryRPC()

    def run(self):
        # Start reactor on a second process
        reactor.listenTCP(c.PORT_XMLRPC, server.Site(self.cemRPC))
        p = multiprocessing.Process(target=reactor.run)
        p.start()
        while not self.exit.is_set():
            self.cemRPC.cycle()
        #p.join()


if __name__ == "__main__":
    import errno
    test = cemeteryM()
    test.start()
    # trying new method
    notintr = False
    while not notintr:
        try:
            test.join()
            notintr = True
        except OSError, ose:
            if ose.errno != errno.EINTR:
                raise ose
        except KeyboardInterrupt:
            notintr = True
How should I go about joining these two processes so that their respective joins don't block?
(I am pretty confused by "join". Why would it block? I have googled but can't find much helpful explanation of its usage. Can someone explain this to me?)
Regards
Do you really need to run Twisted in a separate process? That looks pretty unusual to me.
Try to think of Twisted's Reactor as your main loop - and hang everything you need off that - rather than trying to run Twisted as a background task.
The more normal way of performing this sort of operation would be to use Twisted's .callLater or to add a LoopingCall object to the Reactor.
e.g.
from twisted.web import xmlrpc, server
from twisted.internet import task
from twisted.internet import reactor

class Example(xmlrpc.XMLRPC):
    def xmlrpc_add(self, a, b):
        return a + b

    def timer_event(self):
        print "one second"

r = Example()
m = task.LoopingCall(r.timer_event)
m.start(1.0)

reactor.listenTCP(7080, server.Site(r))
reactor.run()
Hey asdvawev - .join() in multiprocessing works just like .join() in threading - it's a blocking call that the main thread makes to wait for the worker to shut down. If the worker never shuts down, then .join() will never return. For example:
class myproc(Process):
    def run(self):
        while True:
            time.sleep(1)
Calling run on this means that join() will never, ever return. Typically to prevent this I'll use an Event() object passed into the child process to allow me to signal the child when to exit:
class myproc(Process):
    def __init__(self, event):
        self.event = event
        Process.__init__(self)

    def run(self):
        while not self.event.is_set():
            time.sleep(1)
Alternatively, if your work is encapsulated in a queue - you can simply have the child process work off of the queue until it encounters a sentinel (typically a None entry in the queue) and then shut down.
Both of these suggestions mean that prior to calling .join() you can set the event or insert the sentinel, and when join() is called, the process will finish its current task and then exit properly.
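A minimal sketch of the sentinel approach (illustrative only, with None as the sentinel):

import multiprocessing

def worker(q):
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work, exit cleanly
            break
        print("processing", item)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    for item in range(5):
        q.put(item)
    q.put(None)  # tell the worker to shut down
    p.join()  # returns once the worker sees the sentinel and exits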
