I am writing a Python class that makes asynchronous method calls over D-Bus. When my reply_handler is called, it stores data in a list. That list may be used by another class's methods at the same time. Is this safe, or should I use only synchronized data structures such as the Queue class?
If you do not modify the list outside of the callback context, then you do not necessarily need synchronization - you will just need to be aware that the list object's state is volatile.
If the list must be modified both in the callback handler as well as, say, the main execution context (or other threads, etc.), then yes you will need synchronization.
The Python synchronized Queue works naturally for message pumps: it lets you process events sequentially, in the order they arrive, in one of your own contexts. This also benefits code simplicity and readability, since major state changes are easier to track. Callbacks generally shouldn't be too complicated anyway, as the outside context in which the callbacks are called shouldn't (and probably doesn't) have to deal with exceptions raised from your code. There are timing considerations too: the callback blocks the async emitter's context, so keeping the handler short and sweet is also good.
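As a rough illustration of the message-pump idea, here is a minimal sketch in which the callback only enqueues its result and a separate context drains the queue in arrival order. The names reply_handler, events and drain are hypothetical, and the emitter thread merely stands in for whatever invokes the callback (D-Bus is not used here).

```python
import queue
import threading

events = queue.Queue()  # thread-safe hand-off between the callback and our own context

def reply_handler(result):
    """Hypothetical async callback: enqueue the result and return quickly."""
    events.put(result)

def drain(q):
    """Collect everything currently queued, preserving arrival order."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items

def emit():
    # Stand-in for the asynchronous emitter calling our handler three times.
    for i in range(3):
        reply_handler(i)

emitter = threading.Thread(target=emit)
emitter.start()
emitter.join()

received = drain(events)  # processed later, in our own context
```

The handler stays trivial, so it cannot raise into the emitter's context or hold it up for long; all real work happens where the queue is drained.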
Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()   # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
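To make the description above concrete, here is a heavily simplified sketch of the mechanism. It is not the real threading.Condition source: MiniCondition is a hypothetical name, and timeouts, RLock support and error checking are all left out. It only shows the deque of per-waiter locks and the proxied acquire and release.

```python
import collections
import threading

class MiniCondition:
    """Simplified illustration only; not the real threading.Condition."""

    def __init__(self, lock=None):
        # Use the caller's lock, or create one internally.
        self._lock = lock if lock is not None else threading.Lock()
        self._waiters = collections.deque()  # one lock per waiting thread
        # Proxy the associated lock's methods.
        self.acquire = self._lock.acquire
        self.release = self._lock.release

    def __enter__(self):
        return self._lock.__enter__()

    def __exit__(self, *exc):
        return self._lock.__exit__(*exc)

    def wait(self):
        # Caller must already hold self._lock.
        waiter = threading.Lock()
        waiter.acquire()               # lock it so the next acquire blocks
        self._waiters.append(waiter)   # "add this thread to the list"
        self._lock.release()           # let other threads make progress
        try:
            waiter.acquire()           # sleep here until notify() unlocks it
        finally:
            self._lock.acquire()       # re-take the main lock before returning

    def notify(self, n=1):
        # Caller must already hold self._lock.
        while self._waiters and n > 0:
            self._waiters.popleft().release()  # wake one waiting thread
            n -= 1

cond = MiniCondition()
ready = []
seen = []

def consumer():
    with cond:
        while not ready:   # always re-check the predicate after waking
            cond.wait()
        seen.append(ready.pop())

t = threading.Thread(target=consumer)
t.start()
with cond:
    ready.append("item")
    cond.notify()
t.join()
```

The example works in either scheduling order: if the producer runs first, the consumer finds ready already populated and never waits.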
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
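A reduced version of that pattern might look like the sketch below. BoundedBuffer is a hypothetical class written for illustration, not the queue module's code: it keeps only two of the conditions (not_empty and not_full), both sharing one lock, and omits task tracking and timeouts.

```python
import collections
import threading

class BoundedBuffer:
    """Illustrative bounded FIFO using two Conditions that share one lock."""

    def __init__(self, maxsize):
        self._items = collections.deque()
        self._maxsize = maxsize
        self._mutex = threading.Lock()
        # Both conditions wrap the same lock, so only one thread at a time
        # can inspect or modify self._items.
        self._not_empty = threading.Condition(self._mutex)
        self._not_full = threading.Condition(self._mutex)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._maxsize:
                self._not_full.wait()      # producers sleep while full
            self._items.append(item)
            self._not_empty.notify()       # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()     # consumers sleep while empty
            item = self._items.popleft()
            self._not_full.notify()        # wake one waiting producer
            return item

buf = BoundedBuffer(maxsize=2)

def producer():
    for i in range(5):
        buf.put(i)   # blocks whenever two items are already buffered

t = threading.Thread(target=producer)
t.start()
consumed = [buf.get() for _ in range(5)]
t.join()
```

Separate conditions mean a put only wakes threads waiting to get, and vice versa, instead of waking every waiter on every change.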
I have a piece of code with a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I want to know whether this function releases the GIL, because I want my monitor thread to run even when the processing function is blocked on popleft.
Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data or a segfault to a Heartbleed-like leak of sensitive memory contents or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the run-time goes to great lengths to prevent those sorts of problems from occurring. The way it does this is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect you from general data races. It won't stop you from writing a program in which multiple threads overwrite each other's work in a common container (say, a collections.deque); it only ensures that even if you do write such a program, it won't cause the interpreter to crash, and you'll always have a valid, working deque. You can add additional application locks, as queue.Queue does, to give good concurrent semantics to your application.
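As a small sketch of such an application-level lock: each individual deque call leaves the deque valid, but a check-then-pop sequence is two steps, so another thread can empty the deque in between. The names work, work_lock, take_one and worker below are hypothetical.

```python
import collections
import threading

work = collections.deque(range(1000))
work_lock = threading.Lock()   # application-level lock, separate from the GIL
taken = []

def take_one():
    """Check and pop as one step, so no other thread can slip in between."""
    with work_lock:
        if work:
            return work.popleft()
    return None

def worker():
    mine = []
    while True:
        item = take_one()
        if item is None:       # deque drained
            break
        mine.append(item)
    with work_lock:
        taken.extend(mine)     # merge this thread's results under the lock

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock around the check and the pop, a worker could see a non-empty deque and still get an IndexError from popleft.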
Since every operation that the GIL protects is a change in the interpreter state, it never blocks on external events; since those events won't cause the interpreter state to be changed, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads. Since they are potentially all executing code in the low-level interpreter, they'll compete for the GIL, and only one thread can hold it at a time, blocking the other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple, compute bound threads, in python, you won't be affected by it, either.
Yes -- deque is thread-safe (thanks #hemanths) http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.
Is there a synchronized set class in Python? Like Queue.Queue? I'm sending messages to a JMS queue and need to handle receipts:
Keep track of sent messages in a set
When a receipt is received, remove it from the set
When the set is empty, set an Event
Something with the same interface as Queue would be perfect, but I need to be able to remove things in any order.
Look for object Locking.
http://docs.python.org/library/threading.html
Basically, you lock the thread's execution based on an object. When you're done with the object, you release it and the thread continues.
Well, Queue.Queue uses a deque under the hood (and deque is not thread safe).
So what you can do is extend the Queue.Queue class and add a new method that calls deque.remove() (if that is what you mean by removing elements in any order). Make sure your new method is thread safe; look at the Queue.Queue.put() method for an example of what you should do.
It's a bit risky, and race conditions are very hard to debug if you miss something, but I hope this gives a clear view.
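For the three steps in the question, a minimal sketch could look like this. Note that it swaps in a plain set guarded by a threading.Lock instead of subclassing Queue.Queue as suggested above; PendingSet and its method names are hypothetical, and the message ids are placeholders.

```python
import threading

class PendingSet:
    """Hypothetical lock-guarded set that signals an Event when it empties."""

    def __init__(self):
        self._items = set()
        self._lock = threading.Lock()
        self.empty = threading.Event()
        self.empty.set()                 # starts out empty

    def add(self, item):
        with self._lock:
            self._items.add(item)
            self.empty.clear()

    def discard(self, item):
        with self._lock:
            self._items.discard(item)    # any order; unknown items are ignored
            if not self._items:
                self.empty.set()

pending = PendingSet()
for msg_id in (1, 2, 3):                 # track sent messages
    pending.add(msg_id)

def receipts():
    for msg_id in (2, 3, 1):             # receipts arrive out of order
        pending.discard(msg_id)

t = threading.Thread(target=receipts)
t.start()
all_acked = pending.empty.wait(timeout=5)  # True once every receipt is in
t.join()
```

Updating the set and the Event under the same lock keeps the two consistent, so a waiter never sees the Event set while an item is still pending.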
Is there a Boost Threads equivalent to Python's threading.Event
Less specifically, is there a synchronization primitive that allows threads to pass when an internal value is set, and blocks them when not?
You should use Boost's condition variables. Condition variables avoid some of the pitfalls that can happen with event objects. I find it hard to use event objects correctly in some corner cases: multiple triggers before the event is handled, some state is changed before the handler is called, etc.
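The pattern itself is the same in any language: a flag, a mutex and a condition variable. The sketch below shows it in Python rather than C++, purely to illustrate the shape; SimpleEvent is a hypothetical class, and a Boost version would presumably pair a bool with boost::mutex and boost::condition_variable in the same way.

```python
import threading

class SimpleEvent:
    """Event-like primitive built from a condition variable (illustrative)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._flag = False

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()      # release every waiting thread

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            # wait_for re-checks the flag after each wakeup, which guards
            # against spurious wakeups and against set() running first.
            return self._cond.wait_for(lambda: self._flag, timeout)

event = SimpleEvent()
passed = []

def waiter():
    if event.wait(timeout=5):
        passed.append(True)

t = threading.Thread(target=waiter)
t.start()
event.set()
t.join()
```

Because the waiter checks the flag under the lock, it passes straight through when the flag is already set and blocks otherwise, which is the behaviour asked about.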
The examples in the Boost documentation are quite self-explanatory.
I'm learning to use the Queue module, and am a bit confused about how a queue consumer thread can be made to know that the queue is complete. Ideally I'd like to use get() from within the consumer thread and have it throw an exception if the queue has been marked "done". Is there a better way to communicate this than by appending a sentinel value to mark the last item in the queue?
original (most of this has changed; see updates below)
Based on some of the suggestions (thanks!) of Glenn Maynard and others, I decided to roll up a descendant of Queue.Queue that implements a close method. It's available in the form of a primitive (unpackaged) module. I'll clean this up a bit and package it properly when I have a bit more time. For now the module only contains the CloseableQueue class and the Closed exception class. I'm planning to expand it to also include subclasses of Queue.LifoQueue and Queue.PriorityQueue.
It's in a pretty preliminary state currently, which is to say that although it passes its test suite, I haven't actually used it for anything yet. Your mileage may vary. I'll keep this answer updated with exciting news.
The CloseableQueue class differs a bit from Glenn's suggestion in that closing the queue will prevent future puts, but not prevent future gets until the queue is emptied. This made the most sense to me; it seemed like functionality to clear the queue could be added as a separate mixin* that would be orthogonal to the closeability functionality. So basically with CloseableQueue, by closing the queue you indicate that the last element has been put. There's also an option to do this atomically by passing last=True to the final put call. Subsequent calls to put, and subsequent calls to get once the queue is emptied, as well as outstanding blocked calls matching those descriptions, will raise the Closed exception.
This is mostly useful for situations where a single producer is generating data for one or more consumers, but it could also be useful for a multi-multi arrangement where consumers are waiting for a particular item or set of items. In particular it doesn't provide a way to determine that all of a number of producers have finished production. Getting that working would entail the provision of some way to register producers (.open()?), as well as a way to indicate that producer registration is itself closed.
Suggestions and/or code reviews are quite welcome. I haven't written a whole lot of concurrency code, but hopefully the test suite is thorough enough that the fact that the code passes it is an indication of the code's quality, rather than the suite's lack thereof. I was able to reuse a bunch of the code from the Queue module's test suite: the file itself is included in this module and used as a basis for various subclasses and routines, including regression testing. This probably (hopefully) helped to avoid complete ineptitude in the testing department. The code itself just overrides Queue.get and Queue.put with fairly minimal changes, and adds the close and closed methods.
I've sort of intentionally avoided using any new-fangled fanciness like context managers in both the code itself and in the test suite in an effort to keep the code as backwards-compatible as is the Queue module itself, which is considerably backwards indeed. I'll probably add __enter__ and __exit__ methods at some point; otherwise, the contextlib's closing function should be applicable to a CloseableQueue instance.
*: Here I use the term "mixin" loosely. As the Queue module's classes are old-style, mixins would need to be mixed using class factory functions; some restrictions apply; offer void where prohibited by Guido.
update
The CloseableQueue module now provides CloseableLifoQueue and CloseablePriorityQueue classes. I've also added some convenience functions to support iteration. Still need to rework it as a proper package. There's a class factory function to allow for convenient subclassing of other Queue.Queue-derived classes.
update 2
CloseableQueue is now available via PyPI, e.g. with
$ easy_install CloseableQueue
Comments and criticism are welcome, especially from this answer's anonymous downvoter.
Queues don't inherently have the idea of being complete or done. They can be used indefinitely. To close one up when you are done, you will indeed need to put None or some other magic value at the end and write the logic to check for it, as you described. The ideal way would probably be subclassing the Queue object.
See http://en.wikipedia.org/wiki/Queue_(data_structure) to learn more about queues in general.
A sentinel is a natural way to shut down a queue, but there are a couple things to watch out for.
First, remember that you may have more than one consumer, so you need to send a sentinel once for each running consumer, and guarantee that each consumer will only consume one sentinel, to ensure that each consumer receives its shutdown sentinel.
Second, remember that Queue defines an interface, and that when possible, code should behave regardless of the underlying Queue. You might have a PriorityQueue, or you might have some other class that exposes the same interface and returns values in some other order.
Unfortunately, it's hard to deal with both of these. To deal with the general case of different queues, a consumer that's shutting down must continue to consume values after receiving its shutdown sentinel until the queue is empty. That means that it may consume another thread's sentinel. This is a weakness of the Queue interface: it should have a Queue.shutdown call to cause an exception to be thrown by all consumers, but that's missing.
So, in practice:
if you're sure you're only ever using a regular Queue, simply send one sentinel per thread.
if you may be using a PriorityQueue, ensure that the sentinel has the lowest priority.
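The first case, a regular FIFO queue with one sentinel per consumer, can be sketched as follows. STOP, consumer and NUM_CONSUMERS are hypothetical names, and the doubling is only placeholder work.

```python
import queue
import threading

STOP = object()        # unique sentinel; cannot collide with real data
NUM_CONSUMERS = 3

jobs = queue.Queue()
results = queue.Queue()

def consumer():
    while True:
        item = jobs.get()
        if item is STOP:           # each consumer swallows exactly one sentinel
            return
        results.put(item * 2)      # placeholder for real work

threads = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(STOP)                 # one sentinel per running consumer

for t in threads:
    t.join()

doubled = sorted(results.get() for _ in range(10))
```

Because a consumer exits as soon as it sees a sentinel, and the sentinels are queued after all real items in a FIFO queue, every item is processed before the last consumer stops.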
A Queue is a FIFO (first in, first out) structure, so remember that the consumer can be faster than the producer. When a consumer thread detects that the queue is empty, it normally takes one of the following actions:
Call the API to switch to the next thread.
Call the API to sleep for some milliseconds and then check the queue again.
Call the API to wait on an event (such as a new message in the queue).
If you want the consumer threads to terminate after the job is complete, then put a sentinel value in the queue to terminate the task.
The best practice way of doing this would be to have the queue itself notify a client that it has reached the 'done' state. The client can then take any action that is appropriate.
What you have suggested, periodically checking the queue to see if it is done, would be highly undesirable. Polling is an antipattern in multithreaded programming; you should always use notifications.
EDIT:
So you're saying that the queue itself knows that it's 'done' based on some criteria and needs to notify the clients of that fact. I think you are correct, and the best way to do this is by throwing when a client calls get() and the queue is in the done state. Throwing would remove the need for a sentinel value on the client side. Internally, the queue can detect that it is 'done' in any way it pleases (e.g. the queue is empty, its state was set to done, etc.), so I don't see any need for a sentinel value.
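A minimal sketch of that idea, assuming the producer explicitly marks the queue as done: DoneQueue, Done and mark_done are hypothetical names invented for this example, and the class is written from scratch on top of threading.Condition rather than taken from any existing module.

```python
import collections
import threading

class Done(Exception):
    """Raised by get() once the queue is marked done and fully drained."""

class DoneQueue:
    def __init__(self):
        self._items = collections.deque()
        self._cond = threading.Condition()
        self._done = False

    def put(self, item):
        with self._cond:
            if self._done:
                raise Done("put() after mark_done()")
            self._items.append(item)
            self._cond.notify()

    def mark_done(self):
        with self._cond:
            self._done = True
            self._cond.notify_all()    # wake every blocked consumer

    def get(self):
        with self._cond:
            while not self._items:
                if self._done:
                    raise Done("no more items")
                self._cond.wait()      # notification, not polling
            return self._items.popleft()

q = DoneQueue()
collected = [[], []]

def consumer(out):
    try:
        while True:
            out.append(q.get())
    except Done:
        pass                           # normal shutdown path

threads = [threading.Thread(target=consumer, args=(out,)) for out in collected]
for t in threads:
    t.start()
for n in range(5):
    q.put(n)
q.mark_done()
for t in threads:
    t.join()

all_items = sorted(collected[0] + collected[1])
```

Consumers still receive everything queued before mark_done(), and the number of consumers no longer matters, since no per-consumer sentinel is needed.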