Will the collections.deque "pop" methods release the GIL?

I have a piece of code where I have a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I wanted to know if this function releases the GIL, because I want to run my monitor thread even when the processing thread is blocked on the popleft call.

Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the Python interpreter needs to adjust its collection of keys. That might involve making more room for the additional key, as well as adding that particular key to the array.
If multiple concurrent threads modify that dict, one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data to a segfault, a Heartbleed-like leak of sensitive memory contents, or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the runtime goes to great lengths to prevent those sorts of problems from occurring. The way it does so is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect against general data races. It won't stop you from writing a program in which multiple threads overwrite each other's work in a common container (say, a collections.deque); it only guarantees that even if you do write such a program, it won't crash the interpreter: you'll always have a valid, working deque. You can add application-level locks on top, as queue.Queue does, to give your application sound concurrent semantics.
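For the original question's setup, note that a deque alone never blocks: popleft() on an empty deque raises IndexError immediately rather than waiting. Here is a minimal sketch of the queue.Queue approach (all names are illustrative): the processing thread blocks in Queue.get(), which releases the GIL while it waits, so the monitor thread keeps running the whole time.

import queue
import threading
import time

work = queue.Queue()

def processor():
    while True:
        item = work.get()        # blocks without holding the GIL
        if item is None:         # illustrative shutdown sentinel
            break
        print("processing", item)

def monitor():
    for _ in range(3):
        print("monitor: queue size is roughly", work.qsize())
        time.sleep(0.1)

threading.Thread(target=processor, daemon=True).start()
threading.Thread(target=monitor).start()
work.put("job-1")
work.put("job-2")
work.put(None)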
Since every operation the GIL protects is a change to interpreter state, it is never held while blocking on external events; waiting on such events doesn't change interpreter state, so, for example, blocking on a condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads: since they are all potentially executing code in the low-level interpreter, they compete for the GIL, and only one thread can hold it at a time, blocking the others that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple compute-bound threads in Python, you won't be affected by it either.

Yes -- deque is thread-safe (thanks @hemanths); see http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.
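If you do want the "deque subclass" route from the answer above, here is a hedged sketch (class and attribute names are made up) of a deque whose popleft() blocks until an item arrives, using a threading.Condition. It only wraps append() and popleft(), so the other mutating methods would still bypass the lock:

import collections
import threading

class BlockingDeque(collections.deque):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cond = threading.Condition()

    def append(self, item):
        with self._cond:
            super().append(item)
            self._cond.notify()          # wake one waiting consumer

    def popleft(self):
        with self._cond:
            while not len(self):
                self._cond.wait()        # waiting releases the GIL
            return super().popleft()

In practice queue.Queue already provides exactly this blocking behavior, so the subclass is mostly useful when you also need deque-specific features such as maxlen or appendleft.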

Related

Why is using global in python threading bad practice?

I read all over various websites how using global is bad. I have an application where I am storing, say, 300 objects in an array. I want to have 8 threads running through these 300 objects. These objects are of different sizes, say between 10 and 50,000 integers, randomly distributed (think worst-case scenario here).
Basically, I want to start up 8 threads, do a process on an object, report or store the results, and pick up a new object, 300 times.
The solution I can think of is to set a global lock and a global counter, lock the array, get the current object, increment the counter, release the lock.
There is 1 lock for 8 threads. There is 1 counter for 8 threads. So I have 2 global objects. I store results in a dictionary, possibly also global to make it visible to all threads, but it must also be thread-safe. I am not bothering to do something stupid like subclassing Thread and passing 300/8 objects along to each thread, because multiprocessing.Pool does that for me. So how would you do it? Also, convince me that using global in this situation is bad.
Classifying approaches as either "good" or "bad" is a bit simplistic -- in practice, if a design makes sense to you and accomplishes the goals you set out to accomplish, then it doesn't matter whether other people (except possibly your boss) think it's "good" or not; it either works or it doesn't. On the other hand, if your design causes you a lot of pain and suffering, that's a sign that you might not be using the most suitable design for the task at hand.
That said, there are some valid reasons why a lot of people think that global variables are problematic, particularly when combined with multithreading.
The general problem with global variables (with or without multithreading) is that as your program grows larger, it becomes increasingly difficult to mentally keep track of which parts of your program might be reading and/or updating the global variables' values at which times -- since they are global, by definition all parts of your program have access to them, so when you're trying to trace through your program to figure out who it was who set a global variable to some unexpected value, the list of suspects can become unmanageably large. (This isn't much of a problem for small programs, but the larger your program grows, the worse it becomes -- and a lot of programmers have learned, through painful experience, that it's better to nip the problem in the bud by avoiding globals wherever possible than to have to go back and rewrite a big, complicated, buggy program later on.)
In the specific use-case of a multithreaded program, the anybody-could-be-accessing-my-global-variable-at-any-time property becomes even more fraught with peril, since in a multithreaded scenario, any (non-immutable) data that is shared between threads can only be safely accessed with proper serialization (e.g. by locking a mutex before reading/writing the shared data, and unlocking it afterwards). Ideally programmers would never accidentally read or write any shared+mutable data without locking the mutex -- but programmers are human and will inevitably make mistakes; if given the ability to do so, sooner or later you (or someone else) will forget that access to a particular global variable needs to be serialized, and will just go ahead and read/write it, and then you're in for a lot of pain, because the symptoms will be rare and random, and the cause of the fault won't be obvious.
So smart programmers try to make it impossible to fall into that sort of trap, usually by limiting access to the shared-state to a specific, small, carefully-written set of functions (a.k.a. an API) that implement the serialization correctly so that no other code has to. When doing that, you want to make sure that only the code in this particular API has access to the shared data, and that nobody else does -- something that is impossible to do with a global variable, as by definition everyone has direct access to it.
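As a minimal sketch of that idea (names are illustrative, not from the question): the shared dictionary lives inside one small class, every access goes through methods that take the lock, and no other code can touch the underlying dict.

import threading

class ResultStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}            # never handed out directly

    def put(self, key, value):
        with self._lock:
            self._results[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._results.get(key, default)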
There is also one performance-related reason why people prefer not to mix global variables and multithreading: the more serialization you have to do, the less your program can exploit the power of multiple CPU cores. In particular, it does you no good to have an 8-core CPU if 7 of your 8 threads are spending most of their time blocked, waiting for a mutex to become available.
So how does that relate to globals? It's related in that in most cases it's difficult or impossible to prove that a global variable won't ever be accessed by another thread, which means all accesses to that global variable need to be serialized. With a non-global variable, on the other hand, you can make sure to give a reference to that variable to only a single thread -- at which point you have effectively guaranteed that only that one thread will ever access the variable (since the other threads have no references to it, you know they can't access it), and because you have that guarantee, you no longer need to serialize access to that data, and now your thread can run more efficiently because it never has to block waiting for a mutex.
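Applied to the question's scenario, here is a hedged sketch of the no-globals version: the work queue and result queue are created locally and passed to each worker explicitly, so only code holding a reference can touch them, and queue.Queue does the locking that the manual lock-plus-counter would otherwise do (process() is a stand-in for whatever per-object work you actually run):

import queue
import threading

def process(obj):                     # placeholder for the real computation
    return len(obj)

def worker(tasks, results):
    while True:
        try:
            obj = tasks.get_nowait()  # thread-safe; no global lock/counter needed
        except queue.Empty:
            return
        results.put(process(obj))

def run(objects, n_threads=8):
    tasks, results = queue.Queue(), queue.Queue()
    for obj in objects:
        tasks.put(obj)
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]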
(By the way, note that CPython in particular suffers from a severe form of implicit serialization caused by Python's Global Interpreter Lock, which means that even the best multithreaded, CPU-bound Python code will be unlikely to use more than a single CPU core at a time. The only way to get around that is to use multiprocessing instead, or to do the bulk of your program's computations in a lower-level language such as C, so that it can execute without holding the GIL.)
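Since the 300-object workload described in the question is CPU-bound, that GIL caveat matters, and the multiprocessing.Pool the asker mentioned is the natural fit. A minimal sketch (again with process() standing in for the real per-object work):

import multiprocessing

def process(obj):
    return sum(obj)                   # placeholder computation

if __name__ == "__main__":
    objects = [list(range(n)) for n in (10, 500, 50000)]
    with multiprocessing.Pool(processes=8) as pool:
        results = pool.map(process, objects)   # distributes the objects across 8 processes
    print(results)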

How to manipulate the GIL when many threads need to execute?

My understanding is that the typical GIL manipulations involve, e.g., blocking I/O operations. Hence one would want to release the lock before the I/O operation and reacquire it once it has completed.
I'm currently facing a different scenario with a C extension: I am creating X windows that are exposed to Python via the Canvas class. When the method show() is called on an instance, a new UI thread is started using PyThreads (with a call to PyThread_start_new_thread). This new thread is responsible for drawing on the X window, using the Python code specified in the on_draw method of a subclass of Canvas. A pure C event loop is started in the main thread that simply checks for events on the X window and, for the time being, only captures the WM_DELETE_EVENT.
So I have potentially many threads (one for each X window) that want to execute Python code and the main thread that does not execute any Python code at all.
How do I release/acquire the GIL in order to allow the UI threads to get into the interpreter orderly?
The rule is easy: you need to hold the GIL to access Python machinery (any API starting with Py<...> and any PyObject).
So, you can release it whenever you don't need any of that.
Anything beyond this is the fundamental problem of locking granularity: potential benefits versus locking overhead. There was an experiment, around Python 1.4, to replace the GIL with more granular locks; it failed precisely because the locking overhead proved prohibitive.
That's why it's typically released around chunks of code that call external facilities which can take arbitrary time (especially if they involve waiting for external events) -- if you don't release the lock, the rest of your Python program just sits idle during that time.
Heeding this rule, you will reach your goal automatically: whenever a thread can't proceed further (whether it's waiting on I/O, on a signal from another thread, or even just on a time.sleep() to avoid a busy loop), it will release the lock and allow other threads to proceed in its stead. The GIL's scheduling mechanism strives to be fair (see issue8299 for an exploration of how fair it is), relieving the programmer of worrying about any bias stemming solely from the engine.
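You can see this behavior from pure Python, no C extension required. In this small illustration (names are made up), one thread blocks in time.sleep(), which releases the GIL, and a second thread keeps executing bytecode the whole time:

import threading
import time

def sleeper():
    time.sleep(1.0)                   # the GIL is released for the whole second
    print("sleeper done")

def counter():
    n = 0
    t0 = time.time()
    while time.time() - t0 < 0.5:     # runs freely while the sleeper is blocked
        n += 1
    print("counter made", n, "iterations during the sleep")

threading.Thread(target=sleeper).start()
threading.Thread(target=counter).start()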
I think the problem stems from the fact that the official documentation is a bit ambiguous about the meaning of non-Python created threads. Quoting from it:
When threads are created using the dedicated Python APIs (such as the threading module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them.
The parts I find off-putting are "created using the dedicated Python APIs" versus "created from C (for example by a third-party library)". As I stated in the OP, I am calling PyThread_start_new_thread. While this does create a new thread from C, the function is not part of a third-party library, but of the dedicated Python (C) API. Based on that reading, I ruled out that I actually needed to use the PyGILState_Ensure/PyGILState_Release paradigm.
As far as I can tell from what I've seen with my experiments, a thread created from C with (just) PyThread_start_new_thread should be considered as a non-Python created thread.

Python dict.get() Lock

When I use the dictionary.get() function, does it lock the whole dictionary? I am developing a multiprocess and multithreaded program. The dictionary acts as a state table to keep track of data. I have to impose a size limit on the dictionary, so whenever the limit is hit, I have to garbage-collect the table based on timestamps. The current implementation delays add operations while garbage collection is iterating through the whole table.
I will have 2 or more threads, one just to add data and one just to do garbage collection. Performance is critical in my program to handle streaming data. My program is receiving streaming data, and whenever it receives a message, it has to look for it in the state table, then add the record if it's non-existent in the first place, or copy certain information and then send it along the pipe.
I have thought of using multiprocessing to do the search and add operations concurrently, but if I use processes, I have to make a copy of the state table for each process, and in that case the overhead of keeping them synchronized is too high. I have also read that multiprocessing.manager.dict() locks access for each CRUD operation. I cannot spare that overhead, so my current approach is threading.
So my question is while one thread is doing .get(), del dict['key'] operation on the table, will the other insertion thread be blocked from accessing it?
Note: I have read through most of SO's Python-dictionary-related posts, but I cannot seem to find the answer. Most people only answer that even though Python dictionary operations are atomic, it is safer to use a Lock for insertion/update. I'm handling a huge amount of streaming data, so locking on every operation is not ideal for me. Please advise if there is a better approach.
If the process of hashing or comparing the keys in your dictionary could invoke arbitrary Python code (basically, if the keys aren't all Python built-in types implemented in C, e.g. str, int, float, etc.), then yes, it would be possible for a race condition to occur in which the GIL is released while a bucket collision is being resolved (during the equality test), and another thread could leap in and cause the object being compared to disappear from the dict. CPython tries to ensure this doesn't actually crash the interpreter, but it has been a source of errors in the past.
If that's a possibility (or you're on a non-CPython interpreter, where there is no GIL providing these basic guarantees), then you should really use a lock to coordinate access. On CPython, as long as you're on modern Python 3, the cost will be low: the GIL ensures only one thread actually runs at once, so contention lands on the GIL rather than on your lock, which should therefore be uncontended most of the time, making its incremental cost fairly small.
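For the question as asked, the coordination can be as small as one lock shared by the insertion thread and the garbage-collection thread. A hedged sketch (names are illustrative, and it assumes each record carries a "ts" timestamp, as the question implies):

import threading

table_lock = threading.Lock()
state_table = {}

def insert(key, record):
    with table_lock:
        state_table[key] = record

def collect_garbage(cutoff):
    with table_lock:
        stale = [k for k, v in state_table.items() if v["ts"] < cutoff]
        for k in stale:
            del state_table[k]        # safe: the insert thread is excluded here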
A note: you might consider using collections.OrderedDict to simplify limiting the size of your table. With OrderedDict, you can implement the size limit as a strict LRU (least-recently-used) system, with additions to the table done as:
with lock:
    try:
        try:
            odict.move_to_end(key)       # if key already existed, make sure it's "renewed"
        finally:
            odict[key] = value           # set new value whether or not key already existed
    except KeyError:
        # move_to_end raising KeyError means a newly added key, so we might
        # have grown past the size limit
        if len(odict) > maxsize:
            odict.popitem(last=False)    # pops the oldest item
and usage done as:
with lock:
    # move_to_end is optional; if using a key means it should live longer, do it;
    # if only setting a key should refresh it, omit the move_to_end
    odict.move_to_end(key)
    return odict[key]
This does need a lock, but it also reduces the work of garbage collection, when the table grows too large, from "check every key" (O(n) work) to "pop the oldest item off without looking at anything else" (O(1) work).
A lock is used to avoid race conditions, ensuring that no two threads can modify the dict at the same time. It is advisable to use one; otherwise you might run into a race condition that causes your program to fail. A simple mutex lock is enough to coordinate the two threads.

Why are Twisted's DeferredFilesystemLocks not safe for concurrent use?

In the Twisted API for DeferredFilesystemLock, it is stated that deferUntilLocked is not safe for concurrent use.
I would like to understand in what way it is unsafe and what makes it unsafe, in order to ensure that I don't misuse the file locks.
Arguably the method is actually quite safe for concurrent use. If you read the first four lines of the implementation then it's clear that an attempt at concurrent use will immediately raise AlreadyTryingToLockError.
Perhaps the warning is meant to tell you that you'll get an exception rather than useful locking behavior, though.
The implementation of that exception should provide a hint about why concurrent use isn't allowed. DeferredFilesystemLock uses some instance attributes, starting with _tryLockCall, to keep track of progress in the attempt to acquire the lock. If concurrent attempts were allowed, they would each trample over each other's use of this attribute (and others).
This could be enhanced with relative ease. All that would be necessary is to keep the state associated with the lock attempt on a new object allocated per-attempt (instead of on the DeferredFilesystemLock instance). Or, DeferredLock could help.
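A hedged sketch of the DeferredLock idea (fslock is assumed to be an existing DeferredFilesystemLock, and do_work() is illustrative): wrap each attempt in DeferredLock.run(), which acquires the lock, calls the function, and releases the lock once the returned Deferred fires, so only one deferUntilLocked() call is ever in flight at a time.

from twisted.internet.defer import DeferredLock

guard = DeferredLock()

def _unlock_and_passthrough(result):
    fslock.unlock()                   # release the filesystem lock either way
    return result

def _locked_attempt():
    d = fslock.deferUntilLocked(timeout=30)
    d.addCallback(lambda _: do_work())
    d.addBoth(_unlock_and_passthrough)
    return d

def locked_operation():
    return guard.run(_locked_attempt)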
The first and most obvious thing that comes to mind is that in concurrent situations you're never guaranteed to acquire the lock (if another thread never releases it), so you may defer forever. You could avoid this by simply passing the optional timeout to deferUntilLocked.
Other things to consider that may make this unsuitable for concurrent use:
Starvation: What if multiple threads are continually waiting to acquire the same lock - are they treated fairly, or will one thread spend longer waiting than others? Are threads guaranteed to eventually acquire the lock?
Deadlocks: If you're acquiring multiple locks at a time, and multiple threads are doing this, you may get into a situation where you have two threads both waiting on a resource that the other one holds.
Are you sure that acquired locks are always released? What if one thread acquires a lock and crashes without releasing it?
It looks to me like Twisted's implementation is fairly simple and probably doesn't take many of these things into account. Their "not safe" comment is a "there be dragons here": you may (or will) run into hard-to-troubleshoot concurrency bugs or issues if you try to use this in a concurrent application.

Python threading and GIL

I was reading about the GIL, and the material never really specified whether it includes the main thread (I assume so). The reason I ask is that I have a program with threads set up to modify a dictionary. The main thread adds/deletes entries based on player input, while another thread loops over the data, updating and changing it.
However, in some cases one thread may be iterating over the dictionary's keys while another deletes them. If there is a so-called GIL and threads run sequentially, why am I getting "dict changed" errors? If only one is supposed to run at a time, technically this should not happen.
Can anyone shed some light on such a thing? Thank you.
They are running at the same time; they just don't execute bytecode at the same time, so the iterations might be interleaved. Quoting the Python glossary:
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time.
So two for loops might be in flight at the same time; there just won't be (for example) two del dict[index] operations executing at the same instant.
The GIL locks at a Python byte-code level, and applies to all threads, even the main thread. If you have one thread modifying a dictionary, and another iterating keys, they will interfere with each other.
"Only one runs at a time" is true, but you have to understand the unit of granularity. In the case of CPython's GIL, the granularity is a bytecode instruction, so execution can switch between threads at any bytecode.
The GIL prevents two threads from modifying the interpreter state simultaneously. It doesn't provide any thread-consistency constraints, or any kind of mutex at all, at a granularity smaller than the whole process. If you need to read and modify a dict from two threads, you should be using a mutex.
Python switches threads more often than you seem to think it does. You say "only one" is supposed to run at a time, and technically that's true, but it depends on your definition of "one." Python's atomic operations are very small. For example: adding a single item to a dictionary. Iteration over an entire dictionary can be interrupted.
You should use a lock object from the threading library to isolate your program's atomic operations.
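As a minimal sketch of that advice for this dictionary case (names are illustrative): the main thread holds a lock while adding/deleting, and the update thread copies the items under the same lock, then iterates the copy, so it never observes the dict mid-change.

import threading

lock = threading.Lock()
players = {}

def on_player_input(name, data=None):
    with lock:
        if data is None:
            players.pop(name, None)        # delete based on player input
        else:
            players[name] = data           # add/update based on player input

def update_pass():
    with lock:
        snapshot = list(players.items())   # copy taken while holding the lock
    for name, data in snapshot:            # iterate the copy with the lock released
        ...                                # update/derive state here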
