I'm very familiar with Python's queue.Queue. It is definitely what you want when you need a reliable stream between consumer and producer threads.
However, sometimes producers are faster than consumers and you are forced to drop data (live video frame capture, for example, where we typically want to buffer just the last one or two frames).
Does Python provide an asynchronous buffer class, similar to queue.Queue?
It's not exactly obvious how to correctly implement one using queue.Queue.
I could, for example:
import queue
buf = queue.Queue(maxsize=3)
def produce(msg):
    if buf.full():
        buf.get(block=False)  # Make space
    buf.put(msg, block=False)
def consume():
    msg = buf.get(block=True)
    work(msg)
However, I don't particularly like that produce is not a locked, atomic operation on the queue. A consume may run between full() and get(), for example, and it would (probably) be broken in a multi-producer scenario.
Is there an out-of-the-box solution?
There's nothing built in for this, but it appears straightforward enough to build your own buffer class that wraps a Queue, provides mutual exclusion between .put() and .get() with its own lock, and uses a Condition variable to wake up would-be consumers whenever an item is added. Like so:
import queue
import threading
class SBuf:
    def __init__(self, maxsize):
        self.q = queue.Queue()
        self.maxsize = maxsize
        self.nonempty = threading.Condition()
    def get(self):
        with self.nonempty:
            while not self.q.qsize():
                self.nonempty.wait()
            assert self.q.qsize()
            return self.q.get()
    def put(self, v):
        with self.nonempty:
            while self.q.qsize() >= self.maxsize:
                self.q.get()
            self.q.put(v)
            assert 0 < self.q.qsize() <= self.maxsize
            self.nonempty.notify_all()
BTW, I advise against trying to build this kind of logic out of raw locks. Of course it can be done, but Condition variables are very carefully designed to save you from universes of unintended race conditions. There's a learning curve for Condition variables, but one well worth climbing: they often make things easy instead of brain-busting. Indeed, Python's threading module uses them internally to implement all sorts of things.
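To make the Condition pattern concrete, here is a minimal sketch of a one-slot "latest value" mailbox, which is roughly what you want when only the newest frame matters. The class name LatestValue and its _EMPTY sentinel are made up for this example; the point is only the shape of the code: take the condition, re-check the predicate in a loop, wait, and notify after every change.

```python
import threading

class LatestValue:
    """Hypothetical one-slot mailbox: put() overwrites, get() waits for a value."""

    _EMPTY = object()  # sentinel, so that None can be stored as a real value

    def __init__(self):
        self._cond = threading.Condition()
        self._value = self._EMPTY

    def put(self, value):
        with self._cond:
            self._value = value          # overwrite whatever nobody picked up
            self._cond.notify()          # wake one waiting consumer, if any

    def get(self):
        with self._cond:
            while self._value is self._EMPTY:   # always re-check in a loop
                self._cond.wait()
            value, self._value = self._value, self._EMPTY
            return value

box = LatestValue()
box.put("frame-1")
box.put("frame-2")      # replaces "frame-1", which was never consumed
print(box.get())        # frame-2
```

The while loop around wait() is the part that raw locks make hard to get right: a consumer that wakes up and finds the slot already emptied by someone else simply goes back to waiting.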
An Alternative
In the above, we only invoke queue.Queue methods under the protection of our own lock, so there's really no need to use a thread-safe container - we're supplying all the thread safety already.
So it would be a bit leaner to use a simpler container. Happily, a collections.deque can be configured to discard all but the most recent N entries itself, but "at C speed". Like so:
import collections
import threading
class SBuf:
    def __init__(self, maxsize):
        self.q = collections.deque(maxlen=maxsize)
        self.maxsize = maxsize
        self.nonempty = threading.Condition()
    def get(self):
        with self.nonempty:
            while not self.q:
                self.nonempty.wait()
            assert self.q
            return self.q.popleft()
    def put(self, v):
        with self.nonempty:
            self.q.append(v)  # discards oldest, if needed
            assert 0 < len(self.q) <= self.maxsize
            self.nonempty.notify()
This also changed .notify_all() to .notify(). In this use case, either works correctly, but we're only adding one item so there's no need to notify more than one consumer. If there are multiple consumers waiting, .notify_all() will wake all of them up but only the first will find a non-empty queue. The others will see that it's empty, and just .wait() again.
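If a consumer should give up after a while instead of blocking forever, a timeout can be layered on top of the same idea. The following is only a sketch under assumptions: the class name BoundedLatest and the timeout parameter are made up for this example and are not part of the SBuf class above. It uses Condition.wait_for(), which re-checks its predicate after every wakeup and returns a false value if the timeout expires first.

```python
import collections
import threading

class BoundedLatest:
    """Hypothetical drop-oldest buffer whose get() can time out."""

    def __init__(self, maxsize):
        self._items = collections.deque(maxlen=maxsize)
        self._nonempty = threading.Condition()

    def put(self, item):
        with self._nonempty:
            self._items.append(item)   # the deque drops the oldest item when full
            self._nonempty.notify()

    def get(self, timeout=None):
        with self._nonempty:
            # wait_for() returns the predicate's last value: the (truthy)
            # non-empty deque on success, the (falsy) empty deque on timeout.
            if not self._nonempty.wait_for(lambda: self._items, timeout=timeout):
                raise TimeoutError("no item arrived in time")
            return self._items.popleft()

buf = BoundedLatest(maxsize=2)
for frame in ("f1", "f2", "f3"):
    buf.put(frame)               # "f1" is discarded when "f3" arrives
print(buf.get(), buf.get())      # f2 f3
```

A third buf.get(timeout=0.1) at this point would raise TimeoutError, because the buffer is empty and nothing is producing.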
queue.Queue is already thread-safe, in that individual put() and get() calls made from several threads at once cannot corrupt the queue. However, you are correct that nothing stops the queue from being modified between the full() and get() calls.
For that you can use a lock, which lets you control thread access across multiple lines of code. A lock can only be held by one thread at a time, so if it's currently locked, all other threads that try to acquire it wait until it has been released before they continue.
import queue
import threading
import time
lock = threading.Lock()
def produce(msg):
    with lock:
        if buf.full():
            buf.get(block=False)  # Make space
        buf.put(msg, block=False)
def consume():
    msg = None
    while msg is None:
        try:
            with lock:
                msg = buf.get(block=False)
        except queue.Empty:
            # buffer is empty: sleep without holding the lock, then try again
            time.sleep(0.01)
    work(msg)
I use a recurring Python timer thread and would like to give it a name. Currently, Python gives every new thread the name Thread-<number> and increments the number on every new timer start. I would like the name to remain the same. The basic Thread class supports being named; Timers, however, do not:
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
Note the name=None parameter, which does not exist in Timer.
class threading.Timer(interval, function, args=None, kwargs=None)
Any ideas on how I can give a Timer a name? I guess I could derive my own Timer class and add name, but I don't think the Python interpreter would pick it up as the thread's name...
Timer is a Thread subclass, and threads have names, so you can give a timer a custom name after it's created by simply assigning a value to its name attribute. It therefore doesn't matter that the Timer constructor has no parameter for this.
If you do this a lot, you could write a relatively trivial utility function that automated doing this for you (or derive your own NamedTimer(Timer) subclass, which would be about the same number of lines of code):
try:
    from threading import Timer
except ImportError:
    from threading import _Timer as Timer  # Python <3.3
def named_timer(name, interval, function, *args, **kwargs):
    """Factory function to create named Timer objects.
    Named timers call a function after a specified number of seconds:
        t = named_timer('Name', 30.0, function)
        t.start()
        t.cancel()  # stop the timer's action if it's still waiting
    """
    # Timer takes the callback's arguments as args=/kwargs=, not as *args/**kwargs
    timer = Timer(interval, function, args=args, kwargs=kwargs)
    timer.name = name
    return timer
if __name__ == '__main__':
    def func():
        print('func() called')
    timer = named_timer('Fidgit', 3, func)
    print('timer.name: {!r}'.format(timer.name))  # -> timer.name: 'Fidgit'
    timer.start()  # Causes "func() called" to be printed after a few seconds.
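The subclass route mentioned above could look roughly like this. It is only a sketch: the class name NamedTimer and the order of its constructor arguments are choices made for this example, and it assumes Python 3.3 or later, where threading.Timer is a real class.

```python
import threading

class NamedTimer(threading.Timer):
    """Hypothetical Timer subclass that takes the thread name up front."""

    def __init__(self, name, interval, function, args=None, kwargs=None):
        super().__init__(interval, function, args=args, kwargs=kwargs)
        self.name = name  # Thread.name is an ordinary read/write property

calls = []
timer = NamedTimer("heartbeat", 0.05, calls.append, args=("tick",))
timer.start()
timer.join()          # wait for the timer thread to finish
print(timer.name)     # heartbeat
print(calls)          # ['tick']
```

Because the name is set on the Thread object itself, it is what shows up in threading.enumerate() and in logging's %(threadName)s.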
I looked for the same information and landed here. I found that one way of doing this is:
timer = Timer(interval, function)
timer.setName('AnyTimerName')
I'm very new to Python development and I need to call a function every x seconds.
So I'm trying to use a timer for that, something like:
def start_working_interval():
    def timer_tick():
        do_some_work()  # needs to be called on the main thread
        timer = threading.Timer(10.0, timer_tick)
        timer.start()
    timer = threading.Timer(10.0, timer_tick)
    timer.start()
The do_some_work() method needs to be called on the main thread, and I think using the timer causes it to execute on a different thread.
So my question is: how can I call this method on the main thread?
I'm not sure what you're trying to achieve, but I played with your code and did this:
import threading
import datetime
def do_some_work():
    print(datetime.datetime.now())
def start_working_interval():
    def timer_tick():
        do_some_work()
        timer = threading.Timer(10.0, timer_tick)
        timer.start()
    timer_tick()
start_working_interval()
So basically I create the Timer inside timer_tick(), so that it calls itself again after 10 seconds and so on, and I removed the second timer.
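Note that a threading.Timer callback still runs on the timer's own thread, not on the main thread. One possible way to get the work onto the main thread is to let the timer thread only enqueue a request and have the main thread execute it. The sketch below assumes the main thread is free to sit in a loop; the names request_tick, main_loop and TICKS are made up for this example, and the interval is shortened so it finishes quickly.

```python
import queue
import threading

tasks = queue.Queue()   # callables waiting to be run by the consuming thread
ran_on = []             # records the id of the thread that executed the work

def do_some_work():
    ran_on.append(threading.get_ident())

def request_tick(interval, remaining):
    """Runs on a timer thread: only enqueue the work, then re-arm the timer."""
    tasks.put(do_some_work)
    if remaining > 1:
        threading.Timer(interval, request_tick,
                        args=(interval, remaining - 1)).start()
    else:
        tasks.put(None)  # sentinel: tell the loop to stop

def main_loop():
    """Runs on the calling (main) thread and executes whatever was enqueued."""
    while True:
        task = tasks.get()   # blocks until a timer thread hands something over
        if task is None:
            break
        task()

TICKS = 3
threading.Timer(0.05, request_tick, args=(0.05, TICKS)).start()
main_loop()
print(len(ran_on))                             # 3
print(set(ran_on) == {threading.get_ident()})  # True
```

In a real program the loop would not stop after a fixed number of ticks, and GUI toolkits usually offer their own way to post a callable to the main thread, which is preferable to a hand-written loop.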
I needed to do this too, here's what I did:
import threading
import time
MAXBLOCKINGSECONDS=5 #maximum time that a new task will have to wait before its presence in the queue gets noticed.
class repeater:
    repeatergroup=[] #our only static data member; it holds the current list of the repeaters that need to be serviced
    def __init__(self,callback,interval):
        self.callback=callback
        self.interval=abs(interval) #because negative makes no sense, probably assert would be better.
        self.reset()
        self.processing=False
    def reset(self):
        self.nextevent=time.time()+self.interval
    def whennext(self):
        return self.nextevent-time.time() #time until next event
    def service(self):
        if time.time()>=self.nextevent:
            if self.processing: #or however you want to be re-entrant safe or thread safe
                return 0
            self.processing=True
            self.callback(self) #just stuff all your args into the class and pull them back out?
            #use this calculation if you don't want slew
            self.nextevent+=self.interval
            #reuse this calculation if you do want slew/don't want backlog
            #self.reset()
            #or put it just before the callback
            self.processing=False
            return 1
        return 0
    #this is the transition code between class and classgroup
    #I had these three as a property getter and setter but it was behaving badly/oddly
    def isenabled(self):
        return (self in self.repeatergroup)
    def start(self):
        if not (self in self.repeatergroup):
            self.repeatergroup.append(self)
            #another logical place to call reset if you don't want backlog:
            #self.reset()
    def stop(self):
        if (self in self.repeatergroup):
            self.repeatergroup.remove(self)
    #group calls; in c++ I'd make these static
    def serviceall(self): #the VB hacker in me wants to name this doevents(), the c hacker in me wants to name this probe
        ret=0
        for r in list(self.repeatergroup): #iterate over a copy, handlers may start/stop repeaters
            ret+=r.service()
        return ret
    def minwhennext(self,max): #this should probably be hidden
        ret=max
        for r in self.repeatergroup:
            ret=min(ret,r.whennext())
        return ret
    def sleep(self,seconds):
        if threading.current_thread() is not threading.main_thread(): #if we're not on the main thread, don't process handlers, just sleep.
            time.sleep(seconds)
            return
        endtime=time.time()+seconds #record when caller wants control back
        while time.time()<=endtime: #spin until then
            while self.serviceall()>0: #service each member of the group until none need service
                if (time.time()>=endtime):
                    return #break out of service loop if caller needs control back already
            #done with servicing for a while, yield control to os until we have
            #another repeater to service or it's time to return control to the caller
            #sleep for the smaller of the caller's requested blocking time and our sanity number
            #(1 min might be fine for some systems, 5 seconds is good for some systems,
            #0.25 to 0.03 might be better if there could be video refresh code waiting,
            #0.15-0.3 seems a common range for software debouncing of hardware buttons)
            minsleeptime=min(endtime-time.time(),MAXBLOCKINGSECONDS)
            minsleeptime=self.minwhennext(minsleeptime)
            time.sleep(max(0,minsleeptime))
###################################################################
# and now some demo code:
def handler1(repeater):
    print("latency is currently {0:0.7}".format(time.time()-repeater.nextevent))
    repeater.count+=repeater.interval
    print("Seconds: {0}".format(repeater.count))
def handler2(repeater): #or self if you prefer
    print("Timed message is: {0}".format(repeater.message))
    if repeater.other.isenabled():
        repeater.other.stop()
    else:
        repeater.other.start()
    repeater.interval+=1
def demo_main():
    counter=repeater(handler1,1)
    counter.count=0 #I'm still new enough to python
    counter.start()
    greeter=repeater(handler2,2)
    greeter.message="Hello world." #that this feels like cheating
    greeter.other=counter #but it simplifies everything.
    greeter.start()
    print("Currently {0} repeaters in service group.".format(len(repeater.repeatergroup)))
    print("About to yield control for a while")
    greeter.sleep(10)
    print("Got control back, going to do some processing")
    time.sleep(5)
    print("About to yield control for a while")
    counter.sleep(20) #you can use any repeater to access sleep() but
    #it will only service those currently enabled.
    #notice how it gets behind but tries to catch up, we could add repeater.reset()
    #at the beginning of a handler to make it ignore missed events, or at the
    #end to let the timing slide, depending on what kind of processing we're doing
    #and what sort of sensitivity there is to time.
    #now just replace all your main thread's calls to time.sleep() with calls to mycounter.sleep()
    #now just add a repeater.sleep(.01) or a while repeater.serviceall(): pass to any loop that will take too long.
demo_main()
There are a couple of odd things left to consider:
Would it be better to separate handlers that you'd prefer to run on the main thread from handlers where you don't care? I later went on to add a threadingstyle property which, depending on its value, would run a handler on the main thread only, on either the main thread or a shared/group thread, or stand-alone on its own thread. That way longer or more time-sensitive tasks could run closer to their scheduled time, without slowing the other threads down as much.
I wonder whether, depending on the implementation details of threading, my 'if not main thread: time.sleep(seconds); return' effectively makes it sufficiently more likely to be the main thread's turn, so that I shouldn't worry about the difference.
(It seems like adding our MAXBLOCKINGSECONDS as a third argument to the sched library could fix its notorious issue of not servicing new events after older events that lie further in the future have already hit the front of the queue.)
I have a database with a lot of records and code built on the Django framework. Right now I run a query on that database and collect the results in a list. Each record has a field called priority, and a for statement then processes the records one by one according to that priority. But I have a problem.
My database is very dynamic, and while I'm processing the current list a new record with a higher priority may appear in the database. I have to process it first, but with the current architecture I can't: I have to wait until the current list has been processed. How can I achieve my goal?
I have an alternative, but I'm not sure it is the best way: inside a while statement, I can query the database and fetch only the single record that has the highest priority.
What's your opinion of my alternative solution? Is there a better way?
You can use threading to get threads that process or sub-process your high-priority data with a Queue, as said by WhozCraig.
Here's an example of how it could look. If you want to use multiple threads and more functions than just run(), you will have to change the thread construction from thread1_high_priority = High_priority_Thread(1, 10, queue) (where the parameters are the ones used in run()) to
thread1_high_priority = High_priority_Thread(target=functionname, name=name), and change __init__ in the same way to def __init__(self, target, name):.
import threading
from queue import Queue
queue = Queue()
class High_priority_Thread(threading.Thread):
    """a threading class"""
    def __init__(self, start, stop, queue):
        threading.Thread.__init__(self)
        # not stored as self.start/self.stop: self.start would shadow Thread.start()
        self.current = start
        self.stop_at = stop
        self.queue = queue
    # Write a function, run(), that counts the higher priority data and extend it to
    # also count lower priority, or create another function for low priority data and
    # run them with a separate thread than thread 1.
    def run(self):
        while self.current != self.stop_at:
            self.current += 1
            self.queue.put(self.current)
thread1_high_priority = High_priority_Thread(1, 10, queue)  # start at 1 and stop at 10
thread1_high_priority.start()  # start thread1
thread2_lower_priority = High_priority_Thread(1, 3, queue)  # start at 1 and stop at 3
thread2_lower_priority.start()  # start thread2
# wait for both producers to finish, then drain the queue
thread1_high_priority.join()
thread2_lower_priority.join()
while not queue.empty():  # check that queue isn't empty
    out = queue.get()
    print(out)
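For the original problem, picking up a higher-priority record that appears while earlier ones are still being processed, a queue.PriorityQueue may already be enough: the consumer always takes the most urgent item currently queued. The sketch below is only an illustration. The record names and the convention that a lower number means higher priority are assumptions, and in a real Django application the producer would be whatever code notices new database rows.

```python
import queue

tasks = queue.PriorityQueue()

# Initial batch as (priority, record) pairs; lower number = more urgent (assumption).
for item in [(5, "record-a"), (3, "record-b"), (8, "record-c")]:
    tasks.put(item)

processed = []
while not tasks.empty():
    priority, record = tasks.get()
    processed.append(record)
    if record == "record-b":
        # A more urgent record shows up while we are still working.
        tasks.put((1, "record-urgent"))

print(processed)  # ['record-b', 'record-urgent', 'record-a', 'record-c']
```

The urgent record is handled right after the item that was in progress when it arrived, ahead of everything that was queued earlier with a lower priority.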
I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that there can be a lot of those "later" actions (1000s), so using a Thread per task is not viable. I know of the existence of tools like gevent and eventlet, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).
What I'm wondering is: what are my options? Suggestions are welcome, along the lines of libraries (if you've used any of those mentioned, please share your experiences), techniques (Python's "coroutine" support, or one thread that sleeps for a while and checks a queue), how to make use of zeromq's poll or eventloop to do the job, or something else.
Consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, with a timestamp of the soonest it should be serviced. Worker threads pop work off the queue, sleep until the time given by the priority value is reached, do the work, and then pop another item off the queue.
How about a more fleshed out answer. mklauber makes a good point. If there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, and one is available from the heapq module. Instead, we'll make use of a different synchronization primitive: a condition variable, which in Python is spelled threading.Condition.
The approach is fairly simple: peek at the heap, and if the work is current, pop it off and do that work. If there is work, but it's scheduled for the future, wait on the condition until then; if there's no work at all, sleep until notified.
The producer does its fair share of the work: every time it adds new work, it notifies the condition, so if there are sleeping workers, one of them wakes up and rechecks the queue for newer work.
import heapq, time, threading
START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()
def consumer(message):
    """the actual work function. nevermind the locks here, this just keeps
    the output nicely formatted. a real work function probably won't need
    it, or might need quite different synchronization"""
    with SERIALIZE_STDOUT:
        print(time.time() - START_TIME, message)
def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    with condition:
        heapq.heappush(work_queue, (prio, message))
        condition.notify()
def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()
if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    for thread in pool:  # map() is lazy in Python 3, so use an explicit loop
        thread.start()
    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')
    # and just to make the example a bit more friendly, tell the threads to stop after all
    # the work is done
    produce(work_queue, cond, float('inf'), 'stop')
    for thread in pool:
        thread.join()
This answer actually has two suggestions: my first one, and another that I discovered after the first.
sched
I suspect you are looking for the sched module.
EDIT: my bare suggestion seemed of little help once I reread it, so I decided to test the sched module to see if it can work as I suggested. Here is my test. I would use it with a single thread, more or less this way:
import sched
import threading
import time
class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
    def run(self):
        self.scheduler.run()
    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))
    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registerd event", event)
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
First, I'd create a thread class which has its own scheduler and a queue. At least one event is registered in the scheduler: the one that invokes the method for scheduling events from the queue.
class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The method for scheduling events from the queue locks the queue, schedules each event, empties the queue and schedules itself again, so that it looks for new events some time in the future. Note that the period for looking for new events is short (one second); you may change it:
    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registerd event", event)
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The class should also have a method for scheduling user events. Naturally, this method should lock the queue while updating it:
    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))
Finally, the class should invoke the scheduler main method:
    def run(self):
        self.scheduler.run()
Here is a usage example:
def print_time():
    print("scheduled:", time.time())
if __name__ == "__main__":
    st = SchedulingThread()
    st.start()
    st.schedule(print_time, 10)
    while True:
        print("main thread:", time.time())
        time.sleep(5)
    st.join()
Its output on my machine is:
$ python schedthread.py
main thread: 1311089765.77
Registerd event (10, 1, <function print_time at 0x2f4bb0>, ())
main thread: 1311089770.77
main thread: 1311089775.77
scheduled: 1311089776.77
main thread: 1311089780.77
main thread: 1311089785.77
This code is just a quick'n'dirty example and may need some work. However, I have to confess that I am a bit fascinated by the sched module, which is why I suggested it. You may want to look for other suggestions as well :)
APScheduler
Looking on Google for solutions like the one I've posted, I found this amazing APScheduler module. It is so practical and useful that I bet it is your solution. My previous example would be way simpler with this module:
from apscheduler.scheduler import Scheduler
import time
sch = Scheduler()
sch.start()
@sch.interval_schedule(seconds=10)
def print_time():
    print("scheduled:", time.time())
    sch.unschedule_func(print_time)
while True:
    print("main thread:", time.time())
    time.sleep(5)
(Unfortunately I did not find how to schedule an event to execute only once, so the function event should unschedule itself. I bet it can be solved with some decorator.)
If you have a bunch of tasks that need to get performed later, and you want them to persist even if you shut down the calling program or your workers, you should really look into Celery, which makes it super easy to create new tasks, have them executed on any machine you'd like, and wait for the results.
From the Celery page, "This is a simple task adding two numbers:"
from celery.task import task
@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(8, 8)
>>> result.wait() # wait for and return the result
16
You wrote:
one of the problems is that the process uses zeromq for communication so I would need some integration (eventlet already has it)
Your choice seems to depend heavily on these details, which are a bit unclear: how zeromq is being used for communication, how many resources the integration will require, and what your requirements and available resources are.
There's a project called django-ztask which uses zeromq and provides a task decorator similar to Celery's. However, it is (obviously) Django-specific and so may not be suitable in your case. I haven't used it; I prefer Celery myself.
I've been using Celery for a couple of projects (these are hosted at ep.io PaaS hosting, which provides an easy way to use it).
Celery looks like quite a flexible solution, allowing delayed tasks, callbacks, task expiration and retrying, limits on the task execution rate, etc. It may be used with Redis, Beanstalk, CouchDB, MongoDB or an SQL database.
Example code (definition of task and asynchronous execution after a delay):
from celery.decorators import task
@task
def my_task(arg1, arg2):
    pass  # Do something
result = my_task.apply_async(
    args=[sth1, sth2],  # Arguments that will be passed to `my_task()` function.
    countdown=3,  # Time in seconds to wait before queueing the task.
)
See also a section in celery docs.
Have you looked at the multiprocessing module? It comes standard with Python. It is similar to the threading module, but runs each task in a process. You can use a Pool() object to set up a worker pool, then use the .map() method to call a function with the various queued task arguments.
Pyzmq has an ioloop implementation with a similar api to that of the tornado ioloop. It implements a DelayedCallback which may help you.
Presuming your process has a run loop which can receive signals, and the duration of each action is short enough that the actions can run sequentially, use signals and POSIX alarm():
signal.alarm(time)
If time is non-zero, this function requests that a
SIGALRM signal be sent to the process in time seconds.
This depends on what you mean by "those "later" actions can be a lot" and on whether your process already uses signals. Given the phrasing of the question, it's unclear why an external Python package would be needed.
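A minimal sketch of the alarm approach might look like the following. It assumes a POSIX system and that the code runs on the main thread (Python only allows signal handlers to be installed there); the handler name on_alarm and the fired list are made up for this example.

```python
import signal
import time

fired = []

def on_alarm(signum, frame):
    """SIGALRM handler: keep it short and just record that the alarm went off."""
    fired.append(signum)

signal.signal(signal.SIGALRM, on_alarm)   # install the handler (main thread only)
signal.alarm(1)                           # request a SIGALRM in 1 second

# Stand-in for the process's normal run loop.
while not fired:
    time.sleep(0.05)

print(fired == [signal.SIGALRM])          # True
```

Note that a process has only one pending alarm at a time, so scheduling many delayed actions this way means keeping your own sorted list of deadlines and re-arming the alarm for the nearest one.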
Another option is to use the Python GLib bindings, in particular its timeout functions.
It's a good choice as long as you don't want to make use of multiple cores and as long as the dependency on GLib is no problem. It handles all events in the same thread which prevents synchronization issues. Additionally, its event framework can also be used to watch and handle IO-based (i.e. sockets) events.
UPDATE:
Here's a live session using GLib:
>>> import time
>>> import glib
>>>
>>> def workon(thing):
... print("%s: working on %s" % (time.time(), thing))
... return True # use True for repetitive and False for one-time tasks
...
>>> ml = glib.MainLoop()
>>>
>>> glib.timeout_add(1000, workon, "this")
2
>>> glib.timeout_add(2000, workon, "that")
3
>>>
>>> ml.run()
1311343177.61: working on this
1311343178.61: working on that
1311343178.61: working on this
1311343179.61: working on this
1311343180.61: working on this
1311343180.61: working on that
1311343181.61: working on this
1311343182.61: working on this
1311343182.61: working on that
1311343183.61: working on this
In my opinion you could use something called "cooperative multitasking". It's a Twisted-based approach and it's really cool. Have a look at this PyCon 2010 presentation: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-cooperative-multitasking-with-twisted-getting-things-done-concurrently-11-3352182
You will need a transport queue to do this too...
Simple: you can derive your class from Thread and give each instance a parameter such as timeout, so that each instance's thread waits for that long before doing its work.
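That last suggestion could be sketched as follows. The class name DelayedCall and its cancel() method are assumptions for this example. Keep in mind that this is essentially what threading.Timer already does, and that it still costs one thread per pending action, which the question says is not viable for thousands of tasks.

```python
import threading

class DelayedCall(threading.Thread):
    """Hypothetical thread that waits `timeout` seconds, then calls `function`."""

    def __init__(self, timeout, function, *args):
        super().__init__()
        self.timeout = timeout
        self.function = function
        self.args = args
        self._cancelled = threading.Event()

    def cancel(self):
        """Prevent the call if the thread is still waiting."""
        self._cancelled.set()

    def run(self):
        # Event.wait() returns False on timeout, i.e. when nobody cancelled us.
        if not self._cancelled.wait(self.timeout):
            self.function(*self.args)

results = []
call = DelayedCall(0.05, results.append, "done")
call.start()
call.join()
print(results)  # ['done']
```

Waiting on an Event rather than calling time.sleep() is what makes the pending call cancellable.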