How to efficiently do many tasks a "little later" in Python?

I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that there can be a lot of those "later" actions (thousands), so using a thread per task is not viable. I know of the existence of tools like gevent and eventlet, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).
What I'm wondering is: what are my options? Suggestions are welcome along the lines of libraries (if you've used any of the ones mentioned, please share your experiences), techniques (Python's "coroutine" support, a single thread that sleeps for a while and checks a queue), ways to use zeromq's poll or event loop to do the job, or something else.

Consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, with a timestamp of the soonest it should be serviced. Worker threads pop work off the queue, sleep until the time indicated by the priority value is reached, do the work, and then pop another item off the queue.
How about a more fleshed-out answer? mklauber makes a good point: if there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, which is available from the heapq module. Instead, we'll make use of a different synchronization primitive: a condition variable, which in Python is spelled threading.Condition.
The approach is fairly simple: peek at the heap, and if the work is current, pop it off and do that work. If there is work, but it's scheduled into the future, just wait on the condition until then, or if there's no work at all, sleep forever.
The producer does its fair share of the work: every time it adds new work, it notifies the condition, so if there are sleeping workers, they'll wake up and recheck the queue for newer work.
import heapq, time, threading

START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()

def consumer(message):
    """the actual work function. never mind the locks here, this just keeps
    the output nicely formatted. a real work function probably won't need
    it, or might need quite different synchronization"""
    with SERIALIZE_STDOUT:
        print(time.time() - START_TIME, message)

def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    with condition:
        heapq.heappush(work_queue, (prio, message))
        condition.notify()

def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                # front item is not due yet; wait until then or until notified
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()

if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    for thread in pool:
        thread.start()

    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')

    # and just to make the example a bit more friendly, tell the threads to
    # stop after all the work is done (the 'stop' item stays on the heap with
    # infinite priority so every worker eventually sees it)
    produce(work_queue, cond, float('inf'), 'stop')
    for thread in pool:
        thread.join()

This answer actually contains two suggestions - my first one and another I discovered after the first one.
sched
I suspect you are looking for the sched module.
EDIT: my bare suggestion seemed of little help after I reread it, so I decided to test the sched module to see if it could work as I suggested. Here comes my test: I would use it with a sole thread, more or less this way:
import sched, time, threading

class SchedulingThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

    def run(self):
        self.scheduler.run()

    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))

    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registered event", event)
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
First, I'd create a thread class which would have its own scheduler and a queue. At least one event would be registered in the scheduler: one for invoking a method that schedules events from the queue.
class SchedulingThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The method for scheduling events from the queue would lock the queue, schedule each event, empty the queue and schedule itself again, to look for new events some time in the future. Note that the period for looking for new events is short (one second); you may change it:
def _schedule_in_scheduler(self):
    with self.queue_lock:
        for event in self.queue:
            self.scheduler.enter(*event)
            print("Registered event", event)
        self.queue = []
    self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The class should also have a method for scheduling user events. Naturally, this method should lock the queue while updating it:
def schedule(self, function, delay):
    with self.queue_lock:
        self.queue.append((delay, 1, function, ()))
Finally, the class should invoke the scheduler main method:
def run(self):
    self.scheduler.run()
Here comes an example of using it:
def print_time():
    print("scheduled:", time.time())

if __name__ == "__main__":
    st = SchedulingThread()
    st.start()
    st.schedule(print_time, 10)
    while True:
        print("main thread:", time.time())
        time.sleep(5)
    st.join()
Its output on my machine is:
$ python schedthread.py
main thread: 1311089765.77
Registered event (10, 1, <function print_time at 0x2f4bb0>, ())
main thread: 1311089770.77
main thread: 1311089775.77
scheduled: 1311089776.77
main thread: 1311089780.77
main thread: 1311089785.77
This code is just a quick'n'dirty example and may need some work. However, I have to confess that I am a bit fascinated by the sched module, which is why I suggested it. You may want to look for other suggestions as well :)
APScheduler
Searching Google for solutions like the one I posted, I found this amazing APScheduler module. It is so practical and useful that I bet it is your solution. My previous example would be way simpler with this module:
from apscheduler.scheduler import Scheduler
import time

sch = Scheduler()
sch.start()

@sch.interval_schedule(seconds=10)
def print_time():
    print("scheduled:", time.time())
    sch.unschedule_func(print_time)

while True:
    print("main thread:", time.time())
    time.sleep(5)
(Unfortunately I did not find out how to schedule an event to execute only once, so the function has to unschedule itself. I bet it can be solved with some decorator.)
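For what it's worth, APScheduler 2.x also appears to offer add_date_job for running a function exactly once at a given datetime, which would make the self-unscheduling trick unnecessary. A hedged sketch, under the assumption that this API behaves as its name suggests:
from datetime import datetime, timedelta
from apscheduler.scheduler import Scheduler
import time

sch = Scheduler()
sch.start()

def print_once():
    print("scheduled:", time.time())

# add_date_job runs the function a single time at the given datetime
sch.add_date_job(print_once, datetime.now() + timedelta(seconds=10))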

If you have a bunch of tasks that need to get performed later, and you want them to persist even if you shut down the calling program or your workers, you should really look into Celery, which makes it super easy to create new tasks, have them executed on any machine you'd like, and wait for the results.
From the Celery page, "This is a simple task adding two numbers:"
from celery.task import task

@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(8, 8)
>>> result.wait() # wait for and return the result
16

You wrote:
one of the problem is that the process uses zeromq for communication so I would need some integration (eventlet already has it)
It seems like your choice will be heavily influenced by these details, which are a bit unclear - how zeromq is being used for communication, how many resources the integration will require, and what your requirements and available resources are.
There's a project called django-ztask which uses zeromq and provides a task decorator similar to Celery's. However, it is (obviously) Django-specific, and so may not be suitable in your case. I haven't used it; I prefer Celery myself.
I've been using Celery for a couple of projects (these are hosted at the ep.io PaaS, which provides an easy way to use it).
Celery looks like quite a flexible solution, allowing delaying of tasks, callbacks, task expiration and retrying, limiting the task execution rate, etc. It may be used with Redis, Beanstalk, CouchDB, MongoDB or an SQL database.
Example code (definition of task and asynchronous execution after a delay):
from celery.decorators import task

@task
def my_task(arg1, arg2):
    pass  # Do something

result = my_task.apply_async(
    args=[sth1, sth2],  # Arguments that will be passed to `my_task()` function.
    countdown=3,  # Time in seconds to wait before queueing the task.
)
See also a section in celery docs.

Have you looked at the multiprocessing module? It comes standard with Python. It is similar to the threading module, but runs each task in a process. You can use a Pool() object to set up a worker pool, then use the .map() method to call a function with the various queued task arguments.
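A minimal sketch of that suggestion; handle_task is a placeholder for the real work, and note that Pool.map by itself does not delay anything, it just fans the calls out to the worker processes:
from multiprocessing import Pool

def handle_task(arg):
    return arg * 2  # placeholder for the real work

if __name__ == '__main__':
    with Pool(4) as pool:  # a pool of 4 worker processes
        results = pool.map(handle_task, range(10))  # each argument goes to a free worker
    print(results)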

Pyzmq has an ioloop implementation with an API similar to that of the tornado ioloop. It implements a DelayedCallback, which may help you.
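A hedged sketch, assuming an older pyzmq where DelayedCallback is still importable from zmq.eventloop.ioloop (it has since been deprecated); the delay is given in milliseconds:
from zmq.eventloop.ioloop import IOLoop, DelayedCallback

def deferred_action():
    print("running the deferred action")

loop = IOLoop.instance()
dc = DelayedCallback(deferred_action, 15000)  # fire once, 15 seconds after start()
dc.start()
loop.start()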

Presuming your process has a run loop which can receive signals, and the length of time of each action is within the bounds of sequential operation, use signals and POSIX alarm():
signal.alarm(time)
If time is non-zero, this function requests that a SIGALRM signal be sent to the process in time seconds.
This depends on what you mean by "those "later" actions can be a lot", and whether your process already uses signals. Due to the phrasing of the question, it's unclear why an external Python package would be needed.
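A minimal sketch of the technique; keep in mind that only one alarm can be pending per process, so scheduling many actions means keeping your own list of deadlines and re-arming the alarm for the next one:
import signal

def on_alarm(signum, frame):
    print("alarm fired, do the deferred action here")

signal.signal(signal.SIGALRM, on_alarm)  # install the SIGALRM handler
signal.alarm(10)                         # request SIGALRM in 10 seconds
signal.pause()                           # sleep until a signal arrives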

Another option is to use the Python GLib bindings, in particular its timeout functions.
It's a good choice as long as you don't want to make use of multiple cores and as long as the dependency on GLib is not a problem. It handles all events in the same thread, which prevents synchronization issues. Additionally, its event framework can also be used to watch and handle IO-based (i.e. socket) events.
UPDATE:
Here's a live session using GLib:
>>> import time
>>> import glib
>>>
>>> def workon(thing):
...     print("%s: working on %s" % (time.time(), thing))
...     return True # use True for repetitive and False for one-time tasks
...
>>> ml = glib.MainLoop()
>>>
>>> glib.timeout_add(1000, workon, "this")
2
>>> glib.timeout_add(2000, workon, "that")
3
>>>
>>> ml.run()
1311343177.61: working on this
1311343178.61: working on that
1311343178.61: working on this
1311343179.61: working on this
1311343180.61: working on this
1311343180.61: working on that
1311343181.61: working on this
1311343182.61: working on this
1311343182.61: working on that
1311343183.61: working on this

Well, in my opinion you could use something called "cooperative multitasking". It's a Twisted-based thing and it's really cool. Just look at the PyCon presentation from 2010: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-cooperative-multitasking-with-twisted-getting-things-done-concurrently-11-3352182
You will need a transport queue to do this too, though...

Simple: you can inherit your class from Thread and create instances of your class with a parameter like timeout, so that each instance of your class waits for that time before doing its work.
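A minimal sketch of that idea (the class and parameter names are mine); using Event.wait for the delay also makes each task cancellable, though this still spawns one thread per task, which the question was trying to avoid:
import threading

class DelayedTask(threading.Thread):
    def __init__(self, timeout, action):
        threading.Thread.__init__(self)
        self.timeout = timeout
        self.action = action
        self.cancelled = threading.Event()

    def run(self):
        # wait() returns True only if cancel() was called before the timeout
        if not self.cancelled.wait(self.timeout):
            self.action()

    def cancel(self):
        self.cancelled.set()

DelayedTask(10, lambda: print("done")).start()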


How to avoid to start hundreds of threads when starting (very short) actions at different timings in the future

I use this method to launch a few dozen (fewer than a thousand) calls of do_it at different timings in the future:
import threading

timers = []
while True:
    for i in range(20):
        t = threading.Timer(i * 0.010, do_it, [i])  # I pass the parameter i to function do_it
        t.start()
        timers.append(t)  # so that they can be cancelled if needed
    wait_for_something_else()  # this can last from 5 ms to 20 seconds
The runtime of each do_it call is very fast (much less than 0.1 ms) and non-blocking. I would like to avoid spawning hundreds of new threads for such a simple task.
How could I do this with only one additional thread for all do_it calls?
Is there a simple way to do this with Python, without third party library and only standard library?
As I understand it, you want a single worker thread that can process submitted tasks, not in the order they are submitted, but rather in some prioritized order. This seems like a job for the thread-safe queue.PriorityQueue.
from dataclasses import dataclass, field
from threading import Thread
from typing import Any
from queue import PriorityQueue

@dataclass(order=True)
class PrioritizedItem:
    priority: float
    item: Any = field(compare=False)

def thread_worker(q: PriorityQueue[PrioritizedItem]):
    while True:
        do_it(q.get().item)
        q.task_done()

q = PriorityQueue()
t = Thread(target=thread_worker, args=(q,))
t.start()
while True:
    for i in range(20):
        q.put(PrioritizedItem(priority=i * 0.010, item=i))
    wait_for_something_else()
This code assumes you want to run forever. If not, you can add a timeout to the q.get in thread_worker, and return when the queue.Empty exception is thrown because the timeout expired. That way you'll be able to join the queue/thread after all the jobs have been processed and the timeout has expired.
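A hedged sketch of that variation (the 30-second timeout is an arbitrary choice):
from queue import Empty

def thread_worker(q: PriorityQueue[PrioritizedItem]):
    while True:
        try:
            entry = q.get(timeout=30)  # give up after 30s with no new work
        except Empty:
            return  # nothing arrived in time, let the thread exit
        do_it(entry.item)
        q.task_done()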
If you want to wait until some specific time in the future to run the tasks, it gets a bit more complicated. Here's an approach that extends the above approach by sleeping in the worker thread until the specified time has arrived, but be aware that time.sleep is only as accurate as your OS allows it to be.
from dataclasses import astuple, dataclass, field
from datetime import datetime, timedelta
from time import sleep
from threading import Thread
from typing import Any
from queue import PriorityQueue

@dataclass(order=True)
class TimedItem:
    when: datetime
    item: Any = field(compare=False)

def thread_worker(q: PriorityQueue[TimedItem]):
    while True:
        when, item = astuple(q.get())
        sleep_time = (when - datetime.now()).total_seconds()
        if sleep_time > 0:
            sleep(sleep_time)
        do_it(item)
        q.task_done()

q = PriorityQueue()
t = Thread(target=thread_worker, args=(q,))
t.start()
while True:
    now = datetime.now()
    for i in range(20):
        q.put(TimedItem(when=now + timedelta(seconds=i * 0.010), item=i))
    wait_for_something_else()
To address this problem using only a single extra thread we have to sleep in that thread, so it's possible that new tasks with higher priority could come in while the worker is sleeping. In that case the worker would process that new high priority task after it's done with the current one. The above code assumes that scenario will not happen, which seems reasonable based on the problem description. If that might happen you can alter the sleep code to repeatedly poll if the task at the front of the priority queue has come due. The disadvantage with a polling approach like that is that it would be more CPU intensive.
Also, if you can guarantee that the relative order of the tasks won't change after they've been submitted to the worker, then you can replace the priority queue with a regular queue.Queue to simplify the code somewhat.
These do_it tasks can be cancelled by removing them from the queue.
The above code was tested with the following mock definitions:
def do_it(x):
    print(x)

def wait_for_something_else():
    sleep(5)
An alternative approach that uses no extra threads would be to use asyncio, as pointed out by smcjones. Here's an approach using asyncio that calls do_it at specific times in the future by using loop.call_later:
import asyncio

def do_it(x):
    print(x)

async def wait_for_something_else():
    await asyncio.sleep(5)

async def main():
    loop = asyncio.get_running_loop()  # we are inside the running loop here
    while True:
        for i in range(20):
            loop.call_later(i * 0.010, do_it, i)
        await wait_for_something_else()

asyncio.run(main())
These do_it tasks can be cancelled using the handle returned by loop.call_later.
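For example (inside main, where loop is in scope):
handle = loop.call_later(1.0, do_it, 42)  # returns an asyncio.TimerHandle
handle.cancel()  # do_it(42) will never run if it hasn't fired yet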
This approach will, however, require either switching over your program to use asyncio throughout, or running the asyncio event loop in a separate thread.
It sounds like you want something to be non-blocking and asynchronous, but also single-processed and single-threaded (one thread dedicated to do_it).
If this is the case, and especially if any networking is involved, so long as you're not actively doing serious I/O on your main thread, it is probably worthwhile using asyncio instead.
It's designed to handle non-blocking operations, and allows you to make all of your requests without waiting for a response.
Example:
import asyncio

async def main():
    while True:
        tasks = []
        for i in range(20):
            # do_it must be a coroutine function for create_task to accept it
            tasks.append(asyncio.create_task(do_it(i)))
        await wait_for_something_else()
        for task in tasks:
            await task

asyncio.run(main())
Given the time spent on blocking I/O (seconds), you'll probably waste more time managing threads than you will save by generating a separate thread to do these other operations.
Since you have said that in your code each series of 20 do_it calls starts only when wait_for_something_else is finished, I would recommend calling the join method in each iteration of the while loop:
import threading

timers = []
while True:
    for i in range(20):
        t = threading.Timer(i * 0.010, do_it, [i])  # I pass the parameter i to function do_it
        t.start()
        timers.append(t)  # so that they can be cancelled if needed
    wait_for_something_else()  # this can last from 5 ms to 20 seconds
    for t in timers[-20:]:
        t.join()
So the requirements are: the do_it calls run in order and are cancellable; all do_it calls run in one thread, sleeping for the specified timing (maybe not with sleep); and a variable "should_run_it" checks whether each do_it should run or not (the cancellable part?).
Is it something like this?
import threading
import time

def do_it(i):
    print(f"[{i}] {time.time()}")

should_run_it = {i: True for i in range(20)}

def guard_do_it(i):
    if should_run_it[i]:
        do_it(i)

def run_do_it():
    for i in range(20):
        guard_do_it(i)
        time.sleep(0.010)

if __name__ == "__main__":
    t = threading.Timer(0.010, run_do_it)
    start = time.time()
    print(start)
    t.start()
    # should_run_it[5] = should_run_it[10] = should_run_it[15] = False  # test
    t.join()
    end = time.time()
    print(end)
    print(end - start)
I don't have a ton of experience with threading in Python, so please go easy on me. The concurrent.futures library is part of Python 3 and it's dead simple. I'm providing an example so you can see how straightforward it is.
Concurrent.futures with exactly one thread for do_it() and concurrency:
import concurrent.futures
import time

def do_it(iteration):
    time.sleep(0.1)
    print('do it counter', iteration)

def wait_for_something_else():
    time.sleep(1)
    print('waiting for something else')

def single_thread():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = (executor.submit(do_it, i) for i in range(20))
        for future in concurrent.futures.as_completed(futures):
            future.result()

def do_asap():
    wait_for_something_else()

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(single_thread), executor.submit(do_asap)]
    for future in concurrent.futures.as_completed(futures):
        future.result()
The code above uses max_workers=1 to execute do_it() in a single thread: inside single_thread(), the ThreadPoolExecutor is constrained to exactly one worker, so the twenty do_it() calls run sequentially on that one thread.
In the final block, both methods are submitted to an outer concurrent.futures thread pool executor, so both run concurrently while do_it stays on a single, non-blocking thread.
The concurrent.futures doc describes how to control the number of threads. When max_workers is not specified, the default is max_workers = min(32, os.cpu_count() + 4).

Proper approach to hang a Python program [duplicate]

I am writing a queue processing application which uses threads for waiting on and responding to queue messages delivered to the app. The main part of the application just needs to stay active. For a code example like:
while True:
    pass
or
while True:
    time.sleep(1)
Which one will have the least impact on the system? What is the preferred way to do nothing but keep a Python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the python interpreter and run this:
>>> while True:
...     pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
...     time.sleep(1)
...
Which barely even registers on the Activity Monitor (using OS X here but it should be the same for every platform).
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
    a_thread.join()
See: thread.join
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used for waiting for long running background tasks to complete:
from threading import Thread, Event
from time import sleep

def do_task():
    sleep(10)
    print('Task complete.')
    event.set()

event = Event()
Thread(target=do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
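A minimal sketch (the handler is illustrative):
import signal

def handler(signum, frame):
    print("received signal", signum)

signal.signal(signal.SIGTERM, handler)  # install a handler for SIGTERM
signal.pause()                          # sleep until any signal is delivered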
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context about what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
[Edited according to additional information in comment below.]
Maybe this is obvious, but anyway: what you could do in a case where you are reading information from blocking sockets is to have one thread read from the socket and post suitably formatted messages into a Queue, and then have the rest of your "worker" threads read from that queue; the workers will then block on reading from the queue, with no need for either pass or sleep.
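A hedged sketch of that layout; the socket setup and the handle() function are placeholders:
import queue
import threading

messages = queue.Queue()

def reader(sock):
    while True:
        data = sock.recv(4096)  # blocks on the socket, not in a busy-wait
        if not data:
            break
        messages.put(data)  # hand the formatted message to the workers

def worker():
    while True:
        message = messages.get()  # blocks until a message arrives
        handle(message)  # placeholder for the real work
        messages.task_done()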
Running a method as a background thread with sleep in Python:
import threading
import time

class ThreadingExample(object):
    """ Threading example class

    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor

        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval

        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()  # Start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            time.sleep(self.interval)

example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')

Using concurrent.futures to consume many dequeued messages at a time

I'm consuming messages from a RabbitMQ channel, and I'd like to consume n messages at a time. I think I could use a ProcessPoolExecutor (or ThreadPoolExecutor).
I just wonder if it's possible to know whether there's a free executor in the pool.
This is what I want to write:
from concurrent import futures

executor = futures.ProcessPoolExecutor(max_workers=5)
running = []

def consume(message):
    print("actually consuming a single message")

def on_message(channel, method_frame, header_frame, message):
    # this method is called once per incoming message
    future = executor.submit(consume, message)
    block_until_a_free_worker(executor, future)

def block_until_a_free_worker(executor, future):
    running.append(future)  # this grows forever!
    futures.wait(running, timeout=5, return_when=futures.FIRST_COMPLETED)

[...]
channel.basic_consume(on_message, 'my_queue')
channel.start_consuming()
I need to write the function block_until_a_free_worker.
This method should be able to check whether all the running workers are in use or not.
Alternatively, I could use any blocking executor.submit option, if available.
I tried a different approach: changing the list of futures as they are completed. I tried to explicitly add and remove futures from a list and then wait like this:
futures.wait(running, timeout=5, return_when=futures.FIRST_COMPLETED)
It seems that's not a solution either.
I could set a future.add_done_callback, and possibly count the running instances...
Any hints or ideas?
Thank you.
I gave a similar answer here.
Semaphores serve the purpose of limiting access to a resource to a set of workers.
from threading import Semaphore
from concurrent.futures import ProcessPoolExecutor

class TaskManager:
    def __init__(self, workers):
        self.pool = ProcessPoolExecutor(max_workers=workers)
        self.workers = Semaphore(workers)

    def new_task(self, function):
        """Start a new task, blocks if all workers are busy."""
        self.workers.acquire()  # flag a worker as busy
        future = self.pool.submit(function, ... )
        future.add_done_callback(self.task_done)

    def task_done(self, future):
        """Called once task is done, releases one worker."""
        self.workers.release()
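Usage would look roughly like this, assuming new_task fills in the message arguments where the ... placeholder is; note that consume must be a picklable, module-level function, since ProcessPoolExecutor ships it to another process:
tm = TaskManager(workers=5)
for _ in range(20):
    tm.new_task(consume)  # blocks here whenever all five workers are busy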

What kind of problems (if any) would there be combining asyncio with multiprocessing?

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently a 2MB cap set on incoming messages. Theoretically we could get thousands per second (though I don't know if we've seen anything like that in practice). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_func(x):
    time.sleep(x)  # Pretend this is expensive calculations
    return x * 5

@asyncio.coroutine
def main():
    # pool = multiprocessing.Pool()
    # out = pool.apply(blocking_func, args=(10,))  # This blocks the event loop.
    executor = ProcessPoolExecutor()
    out = yield from loop.run_in_executor(executor, blocking_func, 10)  # This does not
    print(out)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
For the majority of cases, this function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it) that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing

def func(queue, event, lock, items):
    with lock:
        event.set()
        for item in items:
            time.sleep(3)
            queue.put(item + 5)
    queue.close()

@asyncio.coroutine
def example(queue, event, lock):
    l = [1, 2, 3, 4, 5]
    p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
    p.start()
    while True:
        result = yield from queue.coro_get()
        if result is None:
            break
        print("Got result {}".format(result))
    yield from p.coro_join()

@asyncio.coroutine
def example2(queue, event, lock):
    yield from event.coro_wait()
    with (yield from lock):
        yield from queue.coro_put(78)
        yield from queue.coro_put(None)  # Shut down the worker

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = aioprocessing.AioQueue()
    lock = aioprocessing.AioLock()
    event = aioprocessing.AioEvent()
    tasks = [
        asyncio.async(example(queue, event, lock)),
        asyncio.async(example2(queue, event, lock)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest to examine different approaches. For instance you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use less resources than forking a process for each connection. Spamassassin and Apache (mpm prefork) can use this worker model for instance. It might end up easier and more robust depending on your use case. Specifically you can make your workers die after serving a configured number of requests and be respawned by a master process thereby eliminating much of the negative effects of memory leaks.
Based on @dano's answer above, I wrote this function to replace the places where I used to use a multiprocessing pool + map.
import asyncio
from typing import Callable
from concurrent.futures import ProcessPoolExecutor

def asyncio_friendly_multiproc_map(fn: Callable, l: list):
    """
    This is designed to replace the use of this pattern:
        with multiprocessing.Pool(5) as p:
            results = p.map(analyze_day, list_of_days)
    By letting caller drop in replace:
        asyncio_friendly_multiproc_map(analyze_day, list_of_days)
    """
    tasks = []
    with ProcessPoolExecutor(5) as executor:
        for e in l:
            tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
    res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
    return res
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This clearly documents the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures; I suggest you also have a look there.

Python threading question

I have a Python application with 2 threads. Each thread operates within a separate GUI. The GUIs need to operate independently without blocking. I am trying to figure out how to make thread_1 trigger an event in thread_2.
Below is some code. I want function foo to trigger function bar in the simplest, most elegant way possible, as quickly as possible and without consuming unnecessary resources. Below is what I've come up with.
bar_trigger = False  # global trigger for function bar.
lock = threading.Lock()

class Thread_2(threading.Thread):
    def run(self):
        global lock, bar_trigger
        while True:
            lock.acquire()
            if bar_trigger == True:
                self.bar()  # function I want to happen
                bar_trigger = False
            lock.release()
            time.sleep(100)  # sleep to preserve resources
            # would like to preserve as much resources as possible
            # and sleep as little as possible.

    def bar(self):
        print("Bar!")

class Thread_1(threading.Thread):
    def foo(self):
        global lock, bar_trigger
        lock.acquire()
        bar_trigger = True  # trigger for bar in thread2
        lock.release()
Is there a better way to accomplish this? I'm not a threading expert, so any advice on how best to trigger a method in thread_2 from within thread_1 is appreciated.
Without knowing what you're doing and what GUI framework you're using, I can't go into much more detail, but from your problem's code snippet, it sounds like you're looking for something called condition variables.
Python comes with them included by default in the threading module, as threading.Condition. You might be interested in threading.Event as well.
How are these threads instantiated? There should really be a main thread that oversees the workers. For example,
import time
import threading

class Worker(threading.Thread):
    def __init__(self, stopper):
        threading.Thread.__init__(self)
        self.stopper = stopper

    def run(self):
        while not self.stopper.is_set():
            print('Hello from Worker!')
            time.sleep(1)

stop = threading.Event()
worker = Worker(stop)
worker.start()
# ...
stop.set()
Using a shared Event object is just one way of synchronizing and sending messages between threads. There are others, and their usages depend on the specifics.
One option would be to share a queue between the threads. Thread 1 would push an instruction into the queue and thread 2 would poll that queue. When thread 2 sees the queue is non-empty, it reads off the first instruction in the queue and calls the appropriate function. This has the additional benefit of being fairly loosely coupled, which can make testing each thread in isolation easier.
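A hedged sketch of that option; the instruction format (a function name plus arguments) is just one possible convention:
import queue

instructions = queue.Queue()

# thread 1 pushes an instruction:
instructions.put(('bar', ()))

# thread 2's loop pops and dispatches it:
def thread_2_loop(dispatch_table):
    while True:
        name, args = instructions.get()  # blocks until an instruction arrives
        dispatch_table[name](*args)  # e.g. dispatch_table = {'bar': bar}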
