Is there a Boost.Thread equivalent to Python's threading.Event?
More generally, is there a synchronization primitive that lets threads pass when an internal value is set, and blocks them when it is not?
You should use Boost's condition variables. Condition variables avoid some of the pitfalls of event objects, which I find hard to use correctly in certain corner cases: multiple triggers before the event is handled, state changing before the handler is called, and so on.
The examples in the Boost documentation are quite self-explanatory.
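For a feel of the pattern, here is a minimal sketch in Python of an event built from a condition variable and a boolean flag; the same flag-plus-condition structure translates almost line for line to boost::condition_variable, boost::mutex, and boost::unique_lock in C++:

    import threading

    class FlagEvent:
        """An event built on a condition variable plus a flag; a C++
        version would pair boost::condition_variable with boost::mutex."""

        def __init__(self):
            self._cond = threading.Condition()
            self._flag = False

        def set(self):
            with self._cond:
                self._flag = True
                self._cond.notify_all()   # wake every waiting thread

        def clear(self):
            with self._cond:
                self._flag = False

        def wait(self):
            with self._cond:
                while not self._flag:     # loop guards against spurious wakeups
                    self._cond.wait()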
The two classes represent excellent abstractions for concurrent programming, so it's a bit disconcerting that they don't support the same API.
Specifically, according to the docs:
asyncio.Future is almost compatible with concurrent.futures.Future.
Differences:
result() and exception() do not take a timeout argument and raise an exception when the future isn’t done yet.
Callbacks registered with add_done_callback() are always called via the event loop's call_soon_threadsafe().
This class is not compatible with the wait() and as_completed() functions in the concurrent.futures package.
The above list is actually incomplete; there are a couple more differences:
the running() method is absent
result() and exception() may raise InvalidStateError if called too early
Are any of these due to the inherent nature of an event loop that makes these operations either useless or too troublesome to implement?
And what is the meaning of the difference related to add_done_callback()? Either way, the callback is guaranteed to happen at some unspecified time after the future is done, so isn't it perfectly consistent between the two classes?
The core reason for the difference is in how threads (and processes) handle blocking versus how coroutines handle it. In threading, the current thread is suspended until whatever condition it is waiting on resolves and the thread can go forward. For example, in the case of futures, if you request the result of a future, it's fine to suspend the current thread until that result is available.
However, the concurrency model of an event loop is that, rather than suspending, code returns to the event loop and gets called again when ready. So it is an error to request the result of an asyncio future that doesn't have a result ready.
You might think the asyncio future could simply wait: that would be inefficient, but would it really be so bad for your coroutine to block? It turns out that letting the coroutine block very likely means the future never completes. The future's result will most likely be set by code associated with the event loop running the code that requests the result. If the thread running that event loop blocks, no code associated with the event loop can run. So blocking on the result would deadlock and prevent the result from ever being produced.
So, yes, the differences in interface are due to this inherent difference. As an example, you wouldn't want to use an asyncio future with the concurrent.futures waiter abstraction because again that would block the event loop thread.
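A short sketch of that difference in practice (hypothetical names; the result here is set by the loop itself via call_later):

    import asyncio

    async def main():
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        # Calling fut.result() at this point would raise InvalidStateError:
        # there is no way to block on it without freezing the loop itself.
        loop.call_later(0.1, fut.set_result, 42)  # loop code produces the result
        print(await fut)  # awaiting yields to the loop instead of blocking

    asyncio.run(main())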
The add_done_callback difference guarantees that callbacks will be run in the event loop. That's desirable because they will get the event loop's thread-local data. Also, a lot of coroutine code assumes that it will never run at the same time as other code from the same event loop; that is, coroutines are only thread-safe under the assumption that two coroutines from the same event loop never run simultaneously. Running the callbacks in the event loop avoids a lot of thread-safety issues and makes it easier to write correct code.
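As a rough illustration (a sketch, not the library's internals): even when the future is already done, the callback is not invoked inline; it is scheduled onto the loop and runs on the loop's own thread a tick later:

    import asyncio
    import threading

    async def main():
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        fut.set_result("done")
        # Not called inline here, even though the future is finished:
        fut.add_done_callback(
            lambda f: print(f.result(), "on", threading.current_thread().name))
        await asyncio.sleep(0)  # let the loop run the scheduled callback

    asyncio.run(main())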
concurrent.futures.Future provides a way to share results between different threads and processes, usually when you use an Executor.
asyncio.Future solves the same task, but for coroutines, which are special functions that usually run asynchronously within one process/thread. "Asynchronously" in this context means that the event loop manages the execution flow of these coroutines: it may suspend execution inside one coroutine, start executing another, and later return to the first one, all usually within one thread/process.
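A minimal side-by-side sketch of the two usage styles (illustrative values only):

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    # concurrent.futures.Future: produced by an Executor, consumed by
    # blocking the calling thread until the result arrives.
    with ThreadPoolExecutor() as pool:
        cf_future = pool.submit(pow, 2, 10)
        print(cf_future.result())           # blocks this thread; prints 1024

    # asyncio.Future: created and resolved inside an event loop,
    # consumed with await so the loop keeps running in the meantime.
    async def main():
        loop = asyncio.get_running_loop()
        aio_future = loop.create_future()
        loop.call_soon(aio_future.set_result, 1024)
        print(await aio_future)             # suspends this coroutine only

    asyncio.run(main())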
These objects (and many other threading/asyncio pairs like Lock, Event, Semaphore, etc.) look similar because the idea of concurrency is similar whether your code uses threads/processes or coroutines.
I think the main reason the objects are different is historical: asyncio was created much later than threading and concurrent.futures. It's probably impossible to change concurrent.futures.Future to work with asyncio without breaking the class's API.
Should both classes be one in an ideal world? That's probably debatable, but I see many disadvantages: while asyncio and threading look similar at first glance, they're very different in many ways, including their internal implementation and the way asyncio and non-asyncio code is written (see the async/await keywords).
I think it's probably for the best that the classes are different: we get a clean split between ways of concurrency that differ by nature (even if their similarity looks strange at first).
I have a piece of code with a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I want to know whether this function releases the GIL, because I want my monitor thread to run even while the processing thread is blocked in popleft.
Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the Python interpreter needs to adjust its collection of keys. That might involve making more room for the additional key, as well as adding the particular key to that array.
If multiple concurrent threads modify that dict, one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data to a segfault, a Heartbleed-like leak of sensitive memory contents, or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the runtime goes to great lengths to prevent those sorts of problems from occurring. The way it does so is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect against general data races. It won't protect you from writing a program in which multiple threads overwrite each other's work in a common container (say, a collections.deque); it only guarantees that even such a program won't cause the interpreter to crash: you'll always have a valid, working deque. You can add application-level locks, as queue.Queue does, to give your application good concurrent semantics.
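To make the distinction concrete, here is a sketch (hypothetical worker functions) of a race the GIL does not prevent, and the application-level lock that fixes it:

    import threading
    from collections import deque

    d = deque([1, 2, 3])

    def worker():
        # Each individual popleft is safe: the GIL keeps the deque's
        # internals consistent. But this check-then-act pair is not
        # atomic: another thread can empty the deque between the two
        # lines, so popleft may still raise IndexError.
        if d:
            d.popleft()

    lock = threading.Lock()

    def safe_worker():
        with lock:          # an application lock makes the pair atomic
            if d:
                d.popleft()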
Since every operation the GIL protects is a change to the interpreter's state, it never blocks on external events; and since those events won't cause the interpreter's state to change, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads: since they are all potentially executing code in the low-level interpreter, they'll compete for the GIL, and only one thread can hold it, blocking the other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it; and unless you have multiple compute-bound threads, you won't be affected by it in Python either.
Yes -- the linked docs say deques "support thread-safe, memory efficient appends and pops from either side" (thanks, @hemanths): http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.
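Whichever way you read the docs on single operations, note that deque.popleft never blocks; on an empty deque it raises IndexError. If the processing thread should sleep until work arrives (as the question implies), queue.Queue gives you that directly, since its get() blocks without holding the GIL. A minimal sketch with hypothetical thread functions:

    import queue
    import threading
    import time

    q = queue.Queue()

    def processor():
        while True:
            item = q.get()       # blocks, without holding the GIL, until an item arrives
            if item is None:     # sentinel object signals shutdown
                break
            print("processed", item)

    t = threading.Thread(target=processor)
    t.start()
    for i in range(3):
        q.put(i)
        time.sleep(0.1)          # the monitor/main thread keeps running meanwhile
    q.put(None)
    t.join()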
I have yet to find a clear explanation of the differences between Condition and Event classes in the threading module. Is there a clear use case where one would be more helpful than the other? All the examples I can find use a producer-consumer model as an example, where queue.Queue would be the more straightforward solution.
Simply put, you use a Condition when threads are interested in waiting for something to become true and, once it's true, in having exclusive access to some shared resource.
Whereas you use an Event when threads are just interested in waiting for something to become true.
In essence, Condition is an abstracted Event plus a Lock, but it gets more interesting when you consider that you can have several different Conditions over the same underlying lock. You could thus have different Conditions describing the state of the underlying resource, meaning you can wake only the workers interested in a particular state of the shared resource.
Another subtle difference is that Event's set() affects future calls of wait() (that is, subsequent calls of wait() will return True and won't block until clear() is called), whereas Condition's notify() (or notify_all()) doesn't (subsequent calls of wait() will block until the next call of notify()).
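A small sketch of that difference (hypothetical names):

    import threading

    # Event: set() leaves the flag set, so even a later wait() returns at once.
    event = threading.Event()
    event.set()
    print(event.wait(timeout=1))    # True, returns immediately

    # Condition: notify() only wakes threads already waiting and leaves no
    # trace, so shared state must be tracked separately and re-checked.
    cond = threading.Condition()
    done = False

    def waiter():
        with cond:
            while not done:         # guards against missed/spurious wakeups
                cond.wait()
            print("condition observed")

    t = threading.Thread(target=waiter)
    t.start()
    with cond:
        done = True                 # change state and notify under the lock
        cond.notify_all()
    t.join()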
I was reading about the GIL, and it was never really specified whether this includes the main thread or not (I assume so). The reason I ask is that I have a program with threads set up to modify a dictionary. The main thread adds/deletes entries based on player input, while another thread loops over the data, updating and changing it.
However, in some cases one thread may be iterating over the dictionary's keys while the other deletes them. If there is this so-called GIL and threads run sequentially, why am I getting "dict changed" errors? If only one is supposed to run at a time, this technically shouldn't happen.
Can anyone shed some light on such a thing? Thank you.
They are running at the same time; they just don't execute at the same time. The iterations might be interleaved. Quoting the Python docs:
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time.
So two for loops might run at the same time; there just won't be (for example) two del dict[index] statements executing at the same instant.
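You can see the interleaving problem even without threads; mutating a dict while iterating over it is exactly what raises the error the questioner is hitting:

    d = {i: i for i in range(5)}
    for key in d:
        del d[key]   # RuntimeError: dictionary changed size during iteration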
The GIL locks at a Python byte-code level, and applies to all threads, even the main thread. If you have one thread modifying a dictionary, and another iterating keys, they will interfere with each other.
"Only one runs at a time" is true, but you have to understand the unit of granularity. In the case of CPython's GIL, the granularity is a bytecode instruction, so execution can switch between threads at any bytecode.
The GIL prevents two threads from modifying the interpreter state simultaneously. It doesn't provide any consistency constraints across threads, or any kind of mutex at all, at a granularity smaller than the whole process. If you need to read and modify a dict from two threads, you should be using a mutex.
Python switches threads more often than you seem to think it does. You say "only one" is supposed to run at a time, and technically that's true, but it depends on your definition of "one." Python's atomic operations are very small. For example: adding a single item to a dictionary. Iteration over an entire dictionary can be interrupted.
You should use a lock object from the threading module to protect the operations your program needs to be atomic.
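One way to do that, sketched with a hypothetical players dict: writers hold the lock for the whole update, and readers snapshot the keys under the lock before iterating:

    import threading

    lock = threading.Lock()
    players = {}

    def updater():
        with lock:                 # writer holds the lock for the whole update
            players["alice"] = 100

    def reporter():
        with lock:                 # reader snapshots the keys under the lock...
            names = list(players)
        for name in names:         # ...then iterates safely outside it
            print(name, players.get(name))

    updater()
    reporter()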
I'm writing a Python class that makes asynchronous method calls using D-Bus. When my reply_handler is called, it stores data in a list. This list can be used by other class methods at the same time. Is this safe, or can I only use synchronized data structures like the Queue class?
If you do not modify the list outside of the callback context, then you do not necessarily need synchronization - you will just need to be aware that the list object's state is volatile.
If the list must be modified both in the callback handler as well as, say, the main execution context (or other threads, etc.), then yes you will need synchronization.
Python's synchronized Queue works naturally for message pumps, letting you perform actions sequentially, in the order the events arrive, in one of your own contexts. This benefits code simplicity and readability, since major state changes are easier to track. Callbacks generally shouldn't be too complicated anyway, as the outside context in which they are called shouldn't (and probably doesn't) have to deal with exceptions raised from your code. There are also timing considerations: the callback blocks the async emitter's context, so keeping the handler short and sweet is also good.
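A minimal sketch of that message-pump shape (hypothetical handler; the real reply_handler signature depends on your D-Bus binding):

    import queue

    events = queue.Queue()              # synchronized hand-off point

    def reply_handler(data):
        # Runs in the D-Bus/event-loop context: do the minimum, hand the
        # data off, and never let exceptions escape into the emitter.
        events.put(data)

    def process_pending():
        # Runs in your own context: drain replies in arrival order.
        while True:
            try:
                data = events.get_nowait()
            except queue.Empty:
                break
            print("handling", data)     # application-specific work here

    reply_handler({"status": "ok"})     # simulate a reply arriving
    process_pending()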