Using concurrent.futures to consume many dequeued messages at a time - python

I'm consuming messages from a RabbitMQ channel and I would like to consume n elements at a time. I think I could use a ProcessPoolExecutor (or ThreadPoolExecutor).
I just wonder if it's possible to know whether there's a free worker in the pool.
This is what I want to write:
executor = futures.ProcessPoolExecutor(max_workers=5)
running = []

def consume(message):
    print "actually consuming a single message"

def on_message(channel, method_frame, header_frame, message):
    # this method is called once per incoming message
    future = executor.submit(consume, message)
    block_until_a_free_worker(executor, future)

def block_until_a_free_worker(executor, future):
    running.append(future)  # this grows forever!
    futures.wait(running, timeout=5, return_when=futures.FIRST_COMPLETED)

[...]
channel.basic_consume(on_message, 'my_queue')
channel.start_consuming()
I need to write the function block_until_a_free_worker.
This method should be able to check whether all the running workers are in use or not.
Alternatively, I could use any blocking variant of executor.submit, if one is available.
I tried a different approach: updating the list of futures as they complete.
I tried to explicitly add and remove futures from a list, and then wait like this:
futures.wait(running, timeout=5, return_when=futures.FIRST_COMPLETED)
It seems it's not a solution.
I could set a future.add_done_callback, and possibly count the running instances...
Any hint or ideas?
Thank you.

I gave a similar answer here.
Semaphores serve the purpose of limiting the access to a resource to a set of workers.
from threading import Semaphore
from concurrent.futures import ProcessPoolExecutor

class TaskManager:
    def __init__(self, workers):
        self.pool = ProcessPoolExecutor(max_workers=workers)
        self.workers = Semaphore(workers)

    def new_task(self, function, *args):
        """Start a new task, blocks if all workers are busy."""
        self.workers.acquire()  # flag a worker as busy
        future = self.pool.submit(function, *args)
        future.add_done_callback(self.task_done)

    def task_done(self, future):
        """Called once task is done, releases one worker."""
        self.workers.release()
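A hypothetical usage sketch, wiring this into the question's RabbitMQ callback (consume and on_message come from the question; the rest is illustrative):
manager = TaskManager(workers=5)

def consume(message):
    print("actually consuming a single message")

def on_message(channel, method_frame, header_frame, message):
    # new_task() blocks whenever all five workers are busy, which is
    # the "block until a free worker" behaviour the question asks for
    manager.new_task(consume, message)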


Does python provide a synchronized buffer?

I'm very familiar with Python queue.Queue. This is definitely the thing you want when you want to have a reliable stream between consumer and producer threads.
However, sometimes you have producers that are faster than consumers and are forced to drop data (as with live video frame capture, for example, where we may typically want to buffer just the last one or two frames).
Does Python provide an asynchronous buffer class, similar to queue.Queue?
It's not exactly obvious how to correctly implement one using queue.Queue.
I could, for example:
buf = queue.Queue(maxsize=3)

def produce(msg):
    if buf.full():
        buf.get(block=False)  # Make space
    buf.put(msg, block=False)

def consume():
    msg = buf.get(block=True)
    work(msg)
although I don't particularly like that produce is not a locked, queue-atomic operation. A consume may start between full() and get(), for example, and it would (probably) be broken in a multi-producer scenario.
Is there an out-of-the-box solution?
There's nothing built in for this, but it appears straightforward enough to build your own buffer class that wraps a Queue, provides mutual exclusion between .put() and .get() with its own lock, and uses a Condition variable to wake up would-be consumers whenever an item is added. Like so:
import queue
import threading

class SBuf:
    def __init__(self, maxsize):
        self.q = queue.Queue()
        self.maxsize = maxsize
        self.nonempty = threading.Condition()

    def get(self):
        with self.nonempty:
            while not self.q.qsize():
                self.nonempty.wait()
            assert self.q.qsize()
            return self.q.get()

    def put(self, v):
        with self.nonempty:
            while self.q.qsize() >= self.maxsize:
                self.q.get()  # discard the oldest entry to make room
            self.q.put(v)
            assert 0 < self.q.qsize() <= self.maxsize
            self.nonempty.notify_all()
BTW, I advise against trying to build this kind of logic out of raw locks. Of course it can be done, but Condition variables are very carefully designed to save you from universes of unintended race conditions. There's a learning curve for Condition variables, but one well worth climbing: they often make things easy instead of brain-busting. Indeed, Python's threading module uses them internally to implement all sorts of things.
An Alternative
In the above, we only invoke queue.Queue methods under the protection of our own lock, so there's really no need to use a thread-safe container - we're supplying all the thread safety already.
So it would be a bit leaner to use a simpler container. Happily, a collections.deque can be configured to discard all but the most recent N entries itself, and "at C speed". Like so:
import collections
import threading

class SBuf:
    def __init__(self, maxsize):
        self.q = collections.deque(maxlen=maxsize)
        self.maxsize = maxsize
        self.nonempty = threading.Condition()

    def get(self):
        with self.nonempty:
            while not self.q:
                self.nonempty.wait()
            assert self.q
            return self.q.popleft()

    def put(self, v):
        with self.nonempty:
            self.q.append(v)  # discards oldest, if needed
            assert 0 < len(self.q) <= self.maxsize
            self.nonempty.notify()
This also changed .notify_all() to .notify(). In this use case, either works correctly, but we're only adding one item so there's no need to notify more than one consumer. If there are multiple consumers waiting, .notify_all() will wake all of them up but only the first will find a non-empty queue. The others will see that it's empty, and just .wait() again.
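A minimal demo of the deque-backed buffer, assuming the SBuf class above; the producer/consumer pacing here is made up for illustration:
import threading
import time

buf = SBuf(maxsize=2)

def producer():
    for frame in range(10):
        buf.put(frame)    # older frames are silently discarded
        time.sleep(0.01)  # fast producer

def consumer():
    for _ in range(3):
        frame = buf.get()  # slow consumer only ever sees recent frames
        print("consumed frame", frame)
        time.sleep(0.1)

threading.Thread(target=producer, daemon=True).start()
consumer()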
Queue is already multiprocessing- and multithreading-safe, in that you can't write and read from the queue at the same time. However, you are correct that there's nothing stopping the queue from getting modified between the full() and get() calls.
As such you can use a lock, which is how you can control thread access across multiple lines. The lock can only be acquired once, so if it's currently locked, all other threads will wait until it has been released before they continue.
import queue
import threading
import time

lock = threading.Lock()

def produce(msg):
    lock.acquire()
    if buf.full():
        buf.get(block=False)  # Make space
    buf.put(msg, block=False)
    lock.release()

def consume():
    msg = None
    while msg is None:
        lock.acquire()
        try:
            msg = buf.get(block=False)
        except queue.Empty:
            pass
        lock.release()
        if msg is None:
            time.sleep(0.01)  # buffer is empty, wait and try again
    work(msg)

multiprocessing python - is it possible to send jobs to the pool sequentially?

I am trying to send jobs to a Pool one by one (due to memory limits I cannot use map). Is it possible for the pool to accept a job if it has available workers, and otherwise to wait?
e.g. can I do something like this?
p = Pool(n_workers)
for item in list:
    p.work_on_this(item)
where work_on_this sends the job if the pool has more workers available, and otherwise waits until it does.
The way the problem can be solved is by using a semaphore initialized with the number of workers. You acquire the semaphore before feeding the pool and release it after a worker has completed the task.
Here's an example working with Python 2.6 or greater.
from threading import Semaphore
from multiprocessing import Pool

class TaskManager:
    def __init__(self, processes):
        self.pool = Pool(processes=processes)
        self.workers = Semaphore(processes)

    def new_task(self, function, *args):
        """Start a new task, block if all workers are busy."""
        self.workers.acquire()
        self.pool.apply_async(function, args=args, callback=self.task_done)

    def task_done(self, results):
        """Called once task is done, releases the caller if blocked."""
        self.workers.release()
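A hedged usage sketch matching the question's loop; do_work and the item range are placeholders:
def do_work(item):
    return item * 2  # stand-in for the real job

if __name__ == '__main__':
    manager = TaskManager(processes=4)
    for item in range(100):
        # blocks as soon as all workers are busy, so jobs are handed to
        # the pool one at a time as capacity frees up
        manager.new_task(do_work, item)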

How to terminate Producer-Consumer threads from main thread in Python?

I have a Producer and a Consumer thread (threading.Thread), which share a queue of type Queue.
Producer run:
while self.running:
    product = produced()  ### I/O operations
    queue.put(product)
Consumer run:
while self.running or not queue.empty():
    product = queue.get()
    time.sleep(several_seconds)
    consume(product)
Now I need to terminate both threads from the main thread, with the requirement that the queue must be empty (all consumed) before terminating.
Currently I'm using code like below to terminate these two threads:
main thread stop:
producer.running = False
producer.join()
consumer.running = False
consumer.join()
But I guess it's unsafe if there are more consumers.
In addition, I'm not sure whether the sleep will yield the schedule to the producer so that it can produce more products. In fact, I find the producer keeps "starving", but I'm not sure whether this is the root cause.
Is there a decent way to deal with this case?
You can put a sentinel object in the queue to signal the end of tasks, causing all consumers to terminate:
_sentinel = object()

def producer(queue):
    while running:
        # produce some data
        queue.put(data)
    queue.put(_sentinel)

def consumer(queue):
    while True:
        data = queue.get()
        if data is _sentinel:
            # put it back so that other consumers see it
            queue.put(_sentinel)
            break
        # Process data
This snippet is shamelessly copied from Python Cookbook 12.3.
Use a _sentinel to mark the end of the queue. None also works if no task produced by the producer is None, but a dedicated _sentinel is safer for the general case.
You don't need to put one end marker into the queue per consumer; you may not even know how many threads are consuming. Just have each consumer put the sentinel back into the queue when it finds it, so the other consumers get the signal too.
Edit 2:
a) The reason your consumers keep taking so much time is that your loop runs continuously even when you have no data.
b) I added code at the bottom that shows how to handle this.
If I understood you correctly, the producer/consumer is a continuous process, e.g. it is acceptable to delay the shutdown until you exit the current blocking I/O and process the data you received from that.
In that case, to shut down your producer and consumer in an orderly fashion, I would add communication from the main thread to the producer thread to invoke a shutdown. In the most general case, this could be a queue that the main thread can use to queue a "shutdown" code, but in the simple case of a single producer that is to be stopped and never restarted, it could simply be a global shutdown flag.
Your producer should check this shutdown condition (queue or flag) in its main loop right before it would start a blocking I/O operation (e.g. after you have finished sending other data to the consumer queue). If the flag is set, then it should put a special end-of-data code (that does not look like your normal data) on the queue to tell the consumer that a shut down is occurring, and then the producer should return (terminate itself).
The consumer should be modified to check for this end-of-data code whenever it pulls data out of the queue. If the end-of-data code is found, it should do an orderly shutdown and return (terminating itself).
If there are multiple consumers, then the producer could queue multiple end-of-data messages -- one for each consumer -- before it shuts down. Since the consumers stop consuming after they read the message, they will all eventually shut down.
Alternatively, if you do not know up-front how many consumers there are, then part of the orderly shut down of the consumer could be re-queueing the end-of-data code.
This will ensure that all consumers eventually see the end-of-data code and shut down, and when all are done, there will be one remaining item in the queue -- the end-of-data code queued by the last consumer.
EDIT:
The correct way to represent your end-of-data code is highly application-dependent, but in many cases a simple None works very well. Since None is a singleton, the consumer can use the very efficient if data is None construct to deal with the end case.
Another possibility that can be even more efficient in some cases is to set up a try/except outside your main consumer loop, such that if the except fires, it's because you were trying to unpack the data in a way that always works except when you are processing the end-of-data code, as in the sketch below.
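A sketch of that try/except idea, assuming normal items are (id, payload) tuples so that unpacking the None end-of-data code raises TypeError; process is a hypothetical work function:
def consumer(queue):
    try:
        while True:
            item_id, payload = queue.get()  # unpacking None raises TypeError
            process(item_id, payload)       # hypothetical work function
    except TypeError:
        pass  # end-of-data code reached; fall through to an orderly shutdown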
EDIT 2:
Combining these concepts with your initial code, now the producer does this:
while self.running:
    product = produced()  ### I/O operations
    queue.put(product)

for x in range(number_of_consumers):
    queue.put(None)  # Termination code
Each consumer does this:
while 1:
    product = queue.get()
    if product is None:
        break
    consume(product)
The main program can then just do this:
producer.running = False
producer.join()
for consumer in consumers:
    consumer.join()
One observation from your code: your consumer will keep trying to get something from the queue. Ideally you should handle that with a timeout and catch the Empty exception, as below; that way the while self.running or not queue.empty() condition is re-checked every timeout.
while self.running or not queue.empty():
    try:
        product = queue.get(timeout=1)
    except Empty:
        continue  # nothing to consume yet; re-check the loop condition
    time.sleep(several_seconds)
    consume(product)
I simulated your situation with producer and consumer threads. Below is sample code running with 2 producers and 4 consumers; it's working very well. Hope this helps!
import time
import threading
from Queue import Queue, Empty

"""A multi-producer, multi-consumer queue."""

# A thread that produces data
class Producer(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        threading.Thread.__init__(self, group=group, target=target, name=name,
                                  verbose=verbose)
        self.running = True
        self.name = name
        self.args = args
        self.kwargs = kwargs

    def run(self):
        out_q = self.kwargs.get('queue')
        while self.running:
            # Adding some integer
            out_q.put(10)
            # Keeping this thread asleep briefly, so as not to do too many iterations
            time.sleep(0.1)
        print 'producer {name} terminated\n'.format(name=self.name)

# A thread that consumes data
class Consumer(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        threading.Thread.__init__(self, group=group, target=target, name=name,
                                  verbose=verbose)
        self.args = args
        self.kwargs = kwargs
        self.producer_alive = True
        self.name = name

    def run(self):
        in_q = self.kwargs.get('queue')
        # Consumer should die once the producer is dead and the queue is empty.
        while self.producer_alive or not in_q.empty():
            try:
                data = in_q.get(timeout=1)
            except Empty:
                continue  # nothing to consume yet; re-check the loop condition
            # This part can be anything that consumes time
            if isinstance(data, int):
                # just doing some work; in fact you could make this sleep instead
                for i in xrange(data + 10**6):
                    pass
            else:
                pass
        print 'Consumer {name} terminated (Is producer alive={pstatus}, Is Queue empty={qstatus})!\n'.format(
            name=self.name, pstatus=self.producer_alive, qstatus=in_q.empty())

# Create the shared queue and launch both thread pools
q = Queue()
producer_pool, consumer_pool = [], []

for i in range(1, 3):
    producer_worker = Producer(kwargs={'queue': q}, name=str(i))
    producer_pool.append(producer_worker)
    producer_worker.start()

for i in xrange(1, 5):
    consumer_worker = Consumer(kwargs={'queue': q}, name=str(i))
    consumer_pool.append(consumer_worker)
    consumer_worker.start()

while 1:
    control_process = raw_input('> Y/N: ')
    if control_process == 'Y':
        for producer in producer_pool:
            producer.running = False
            # Joining this to make sure all the producers die
            producer.join()
        for consumer in consumer_pool:
            # Ideally consumers stop once producers die
            consumer.producer_alive = False
        break

What kind of problems (if any) would there be combining asyncio with multiprocessing?

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently a 2MB cap set up on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_func(x):
    time.sleep(x)  # Pretend this is expensive calculations
    return x * 5

@asyncio.coroutine
def main():
    # pool = multiprocessing.Pool()
    # out = pool.apply(blocking_func, args=(10,))  # This blocks the event loop.
    executor = ProcessPoolExecutor()
    out = yield from loop.run_in_executor(executor, blocking_func, 10)  # This does not
    print(out)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
For the majority of cases, this function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it) that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing

def func(queue, event, lock, items):
    with lock:
        event.set()
        for item in items:
            time.sleep(3)
            queue.put(item + 5)
    queue.close()

@asyncio.coroutine
def example(queue, event, lock):
    l = [1, 2, 3, 4, 5]
    p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
    p.start()
    while True:
        result = yield from queue.coro_get()
        if result is None:
            break
        print("Got result {}".format(result))
    yield from p.coro_join()

@asyncio.coroutine
def example2(queue, event, lock):
    yield from event.coro_wait()
    with (yield from lock):
        yield from queue.coro_put(78)
        yield from queue.coro_put(None)  # Shut down the worker

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = aioprocessing.AioQueue()
    lock = aioprocessing.AioLock()
    event = aioprocessing.AioEvent()
    tasks = [
        asyncio.async(example(queue, event, lock)),
        asyncio.async(example2(queue, event, lock)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest examining different approaches. For instance, you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use fewer resources than forking a process for each connection. SpamAssassin and Apache (mpm prefork) use this worker model, for instance. It might end up easier and more robust depending on your use case. Specifically, you can make your workers die after serving a configured number of requests and be respawned by a master process, thereby eliminating much of the negative effects of memory leaks.
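A minimal sketch of that shared-accept worker model, assuming a POSIX system where forked workers inherit the listening socket; the echo handling and the numbers are illustrative:
import socket
from multiprocessing import Process

def worker(server):
    # Many workers block in accept() on the same listening socket; the
    # kernel hands each incoming connection to exactly one of them.
    while True:
        conn, addr = server.accept()
        try:
            data = conn.recv(4096)
            conn.sendall(data)  # stand-in for real request processing
        finally:
            conn.close()

if __name__ == '__main__':
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 8000))
    server.listen(128)
    # fork workers *after* listen(); each child inherits the socket
    workers = [Process(target=worker, args=(server,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()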
Based on @dano's answer above, I wrote this function to replace places where I used to use multiprocessing Pool + map.
import asyncio
from typing import Callable
from concurrent.futures import ProcessPoolExecutor

def asyncio_friendly_multiproc_map(fn: Callable, l: list):
    """
    This is designed to replace the use of this pattern:
        with multiprocessing.Pool(5) as p:
            results = p.map(analyze_day, list_of_days)
    by letting the caller drop in a replacement:
        asyncio_friendly_multiproc_map(analyze_day, list_of_days)
    """
    tasks = []
    with ProcessPoolExecutor(5) as executor:
        for e in l:
            tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
        res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
    return res
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This clearly documents the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, so I suggest you also have a look there.

How to efficiently do many tasks a "little later" in Python?

I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that those "later" actions can be a lot (1000s), so using a thread per task is not viable. I know of the existence of tools like gevent and eventlet, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).
What I'm wondering is: what are my options? Suggestions are welcome, in the line of libraries (if you've used any of the mentioned ones, please share your experiences), techniques (Python's "coroutine" support, using one thread that sleeps for a while and checks a queue), ways to make use of zeromq's poll or event loop to do the job, or something else.
Consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, with a timestamp of the soonest it should be serviced. Worker threads pop work off the queue, sleep until the scheduled time (the priority value) is reached, do the work, and then pop another item off the queue.
How about a more fleshed-out answer? mklauber makes a good point: if there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, which is available from the heapq module. Instead, we'll make use of a different synchronization primitive: a condition variable, which in python is spelled threading.Condition.
The approach is fairly simple: peek at the heap, and if the work is current, pop it off and do that work. If there is work but it's scheduled for the future, wait on the condition until then; if there's no work at all, sleep forever.
The producer does its fair share of the work; every time it adds new work, it notifies the condition, so if there are sleeping workers, they'll wake up and recheck the queue for newer work.
import heapq, time, threading

START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()

def consumer(message):
    """the actual work function. nevermind the locks here, this just keeps
    the output nicely formatted. a real work function probably won't need
    it, or might need quite different synchronization"""
    SERIALIZE_STDOUT.acquire()
    print time.time() - START_TIME, message
    SERIALIZE_STDOUT.release()

def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    condition.acquire()
    heapq.heappush(work_queue, (prio, message))
    condition.notify()
    condition.release()

def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()

if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    map(threading.Thread.start, pool)

    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')

    # and just to make the example a bit more friendly, tell the threads
    # to stop after all the work is done
    produce(work_queue, cond, float('inf'), 'stop')
    map(threading.Thread.join, pool)
This answer actually has two suggestions: my first one, and another I discovered after it.
sched
I suspect you are looking for the sched module.
EDIT: my bare suggestion seemed of little help after I reread it, so I decided to test the sched module to see if it could work as I suggested. Here is my test: I would use it within a single thread, more or less this way:
import sched
import threading
import time

class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

    def run(self):
        self.scheduler.run()

    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))

    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print "Registered event", event
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
First, I'd create a thread class which would have its own scheduler and a queue. At least one event would be registered in the scheduler: one for invoking a method for scheduling events from the queue.
class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The method for scheduling events from the queue would lock the queue, schedule each event, empty the queue, and schedule itself again, to look for new events some time in the future. Note that the period for looking for new events is short (one second); you may change it:
def _schedule_in_scheduler(self):
    with self.queue_lock:
        for event in self.queue:
            self.scheduler.enter(*event)
            print "Registered event", event
        self.queue = []
    self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The class should also have a method for scheduling user events. Naturally, this method should lock the queue while updating it:
def schedule(self, function, delay):
    with self.queue_lock:
        self.queue.append((delay, 1, function, ()))
Finally, the class should invoke the scheduler main method:
def run(self):
    self.scheduler.run()
Here's an example of usage:
def print_time():
    print "scheduled:", time.time()

if __name__ == "__main__":
    st = SchedulingThread()
    st.start()
    st.schedule(print_time, 10)
    while True:
        print "main thread:", time.time()
        time.sleep(5)
    st.join()
Its output on my machine is:
$ python schedthread.py
main thread: 1311089765.77
Registered event (10, 1, <function print_time at 0x2f4bb0>, ())
main thread: 1311089770.77
main thread: 1311089775.77
scheduled: 1311089776.77
main thread: 1311089780.77
main thread: 1311089785.77
This code is just a quick'n'dirty example, and it may need some work. However, I have to confess that I am a bit fascinated by the sched module, which is why I suggested it. You may want to look for other suggestions as well :)
APScheduler
Searching Google for solutions like the one I've posted, I found this amazing APScheduler module. It is so practical and useful that I bet it is your solution. My previous example would be way simpler with this module:
from apscheduler.scheduler import Scheduler
import time

sch = Scheduler()
sch.start()

@sch.interval_schedule(seconds=10)
def print_time():
    print "scheduled:", time.time()
    sch.unschedule_func(print_time)

while True:
    print "main thread:", time.time()
    time.sleep(5)
(Unfortunately I did not find how to schedule an event to execute only once, so the function event should unschedule itself. I bet it can be solved with some decorator.)
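For what it's worth, the APScheduler 2.x API of that era appeared to offer add_date_job for one-shot execution, which would avoid the self-unscheduling trick; a sketch, under that assumption:
from datetime import datetime, timedelta
from apscheduler.scheduler import Scheduler
import time

sch = Scheduler()
sch.start()

def print_time():
    print("scheduled: %s" % time.time())

# run exactly once, ten seconds from now (assumed APScheduler 2.x API)
sch.add_date_job(print_time, datetime.now() + timedelta(seconds=10))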
If you have a bunch of tasks that need to get performed later, and you want them to persist even if you shut down the calling program or your workers, you should really look into Celery, which makes it super easy to create new tasks, have them executed on any machine you'd like, and wait for the results.
From the Celery page, "This is a simple task adding two numbers:"
from celery.task import task

@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(8, 8)
>>> result.wait() # wait for and return the result
16
You wrote:
one of the problem is that the process uses zeromq for communication so I would need some integration (eventlet already has it)
Seems like your choice will be heavily influenced by these details, which are a bit unclear: how zeromq is being used for communication, how many resources the integration will require, and what your requirements and available resources are.
There's a project called django-ztask which uses zeromq and provides a task decorator similar to Celery's. However, it is (obviously) Django-specific and so may not be suitable in your case. I haven't used it; I prefer Celery myself.
I've been using Celery for a couple of projects (these are hosted at ep.io PaaS hosting, which provides an easy way to use it).
Celery looks like quite a flexible solution, allowing delaying of tasks, callbacks, task expiration and retrying, limiting the task execution rate, etc. It may be used with Redis, Beanstalk, CouchDB, MongoDB or an SQL database.
Example code (definition of a task and asynchronous execution after a delay):
from celery.decorators import task

@task
def my_task(arg1, arg2):
    pass  # Do something

result = my_task.apply_async(
    args=[sth1, sth2],  # Arguments that will be passed to `my_task()` function.
    countdown=3,  # Time in seconds to wait before queueing the task.
)
See also the relevant section in the Celery docs.
Have you looked at the multiprocessing module? It comes standard with Python. It is similar to the threading module, but runs each task in a process. You can use a Pool() object to set up a worker pool, then use the .map() method to call a function with the various queued task arguments.
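A minimal sketch of that suggestion, with a placeholder task function; note that map() itself doesn't delay anything, it just fans the calls out over the pool:
from multiprocessing import Pool

def do_task(arg):
    return arg * 2  # stand-in for the real work

if __name__ == '__main__':
    pool = Pool()  # defaults to one worker process per CPU core
    results = pool.map(do_task, range(10))
    print(results)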
Pyzmq has an ioloop implementation with a similar api to that of the tornado ioloop. It implements a DelayedCallback which may help you.
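A sketch of how that might look, assuming the DelayedCallback API pyzmq shipped at the time (the delay is given in milliseconds):
from zmq.eventloop.ioloop import IOLoop, DelayedCallback

def fire():
    print("fired 10 seconds later")

loop = IOLoop.instance()
DelayedCallback(fire, 10 * 1000, io_loop=loop).start()  # delay in ms
loop.start()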
Presuming your process has a run loop which can receive signals, and the length of time of each action is within the bounds of sequential operation, use signals and POSIX alarm():
signal.alarm(time)
If time is non-zero, this function requests that a SIGALRM signal be sent to the process in time seconds.
This depends on what you mean by "those 'later' actions can be a lot" and on whether your process already uses signals. Due to the phrasing of the question it's unclear why an external python package would be needed.
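A minimal sketch of the alarm() approach; the handler name is mine, and note that only one alarm can be pending per process:
import signal

def on_alarm(signum, frame):
    print("the 'later' action runs now")

signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(10)  # request SIGALRM in 10 seconds; only one can be pending
signal.pause()    # stand-in for the process's run loop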
Another option is to use the Python GLib bindings, in particular its timeout functions.
It's a good choice as long as you don't want to make use of multiple cores and as long as the dependency on GLib is no problem. It handles all events in the same thread which prevents synchronization issues. Additionally, its event framework can also be used to watch and handle IO-based (i.e. sockets) events.
UPDATE:
Here's a live session using GLib:
>>> import time
>>> import glib
>>>
>>> def workon(thing):
... print("%s: working on %s" % (time.time(), thing))
... return True # use True for repetitive and False for one-time tasks
...
>>> ml = glib.MainLoop()
>>>
>>> glib.timeout_add(1000, workon, "this")
2
>>> glib.timeout_add(2000, workon, "that")
3
>>>
>>> ml.run()
1311343177.61: working on this
1311343178.61: working on that
1311343178.61: working on this
1311343179.61: working on this
1311343180.61: working on this
1311343180.61: working on that
1311343181.61: working on this
1311343182.61: working on this
1311343182.61: working on that
1311343183.61: working on this
Well, in my opinion you could use something called "cooperative multitasking". It's a Twisted-based thing and it's really cool. Just look at the PyCon presentation from 2010: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-cooperative-multitasking-with-twisted-getting-things-done-concurrently-11-3352182
You will need a transport queue to do this too, though.
Simple: you can inherit your class from Thread and create each instance of your class with a parameter like timeout, so that each thread waits for that time before doing its work.
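If I read that right, here is a minimal sketch of the idea, using Event.wait so the delay can also be cancelled (all names are mine); this is essentially what the standard threading.Timer class does:
import threading

class DelayedTask(threading.Thread):
    def __init__(self, timeout, action):
        threading.Thread.__init__(self)
        self.timeout = timeout
        self.action = action
        self._cancelled = threading.Event()

    def run(self):
        # wait() returns True only if cancel() was called before the timeout
        if not self._cancelled.wait(self.timeout):
            self.action()

    def cancel(self):
        self._cancelled.set()

def say_hello():
    print("ran 10 seconds later")

DelayedTask(10, say_hello).start()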
