I have a bunch of different methods that are not supposed to run concurrently, so I use a single lock to synchronize them. Looks something like this:
selected_method = choose_method()
with lock:
    selected_method()
In some of these methods, I sometimes call a helper function that does some slow network IO. (Let's call that one network_method()). I would like to release the lock while this function is running, to allow other threads to continue their processing.
One way to achieve this would be by calling lock.release() and lock.acquire() before and after calling the network method. However, I would prefer to keep the methods oblivious to the lock, since there are many of them and they change all the time.
I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
I tried using the locked() method on the Lock object, but that method only tells me whether the lock is held, not if it is held by the current thread.
By the way, lock is a global object and I'm fine with that.
> I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
> Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
This just sounds like entirely the wrong thing to do :(
For a start, it's bad to have a function that sometimes has some other magical side-effect depending on where you call it from. That's the sort of thing that is a nightmare to debug.
Secondly, a lock should have clear acquire and release semantics. If I look at code that says "lock(); do_something(); unlock();" then I expect it to be locked for the duration of do_something(). In fact, it is also telling me that do_something() requires a lock. If I find out that someone has written a particular do_something() which actually unlocks the lock that I just saw to be locked, I will either (a) fire them or (b) hunt them down with weapons, depending on whether I am in a position of seniority relative to them or not.
> By the way, lock is a global object and I'm fine with that.
Incidentally, this is also why globals are bad. If I modify a value, call a function, and then modify a value again, I don't want that function in the middle being able to reach back out and modify this value in an unpredictable way.
My suggestion to you is this: your lock is in the wrong place, or doing the wrong thing, or both. You say these methods aren't supposed to run concurrently, but you actually want some of them to run concurrently. The fact that one of them is "slow" can't possibly make it acceptable to remove the lock - either you need the mutual exclusion during this type of operation for it to be correct, or you do not. If the slower operation is indeed inherently safe when the others are not, then maybe it doesn't need the lock - but that implies the lock should go inside each of the faster operations, not outside them. But all of this is dependent on what exactly the lock is for.
Why not just do this?
with lock:
    before_network()
do_network_stuff()
with lock:
    after_network()
I'm using a mutex to block part of the code in the first function. Can I unlock the mutex in the second function?
For example:
import threading

mutex = threading.Lock()

def function1():
    mutex.acquire()
    # do something

def function2():
    # do something
    mutex.release()
    # do something
You certainly can do what you're asking, locking the mutex in one function and unlocking it in another one. But you probably shouldn't. It's bad design. If the code that uses those functions calls them in the wrong order, the mutex may be locked and never unlocked, or be unlocked when it isn't locked (or even worse, when it's locked by a different thread). If you can only ever call the functions in exactly one order, why are they even separate functions?
A better idea may be to move the lock-handling code out of the functions and make the caller responsible for locking and unlocking. Then you can use a with statement that ensures the lock and unlock are exactly paired up, even in the face of exceptions or other unexpected behavior.
with mutex:
    function1()
    function2()
Or if not all parts of the two functions are "hot" and need the lock held to ensure they run correctly, you might consider factoring out the parts that need the lock into a third function that runs in between the other two:
function1_cold_parts()
with mutex:
    hot_parts()
function2_cold_parts()
Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
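To make that concrete, here is a stripped-down sketch of the mechanism just described. This is not the real implementation (which lives in Lib/threading.py and also handles timeouts, re-entrant locks and error cases); it is just the core idea:
import threading
from collections import deque

class SimpleCondition:
    def __init__(self, lock=None):
        self._lock = lock or threading.Lock()
        self._waiters = deque()        # locks standing in for waiting threads

    def wait(self):
        # The caller must already hold self._lock.
        waiter = threading.Lock()
        waiter.acquire()               # lock it once...
        self._waiters.append(waiter)
        self._lock.release()           # let other threads run
        waiter.acquire()               # ...then block trying to lock it again
        self._lock.acquire()           # reacquire before returning

    def notify(self):
        # The caller must already hold self._lock.
        if self._waiters:
            self._waiters.popleft().release()   # unblock one waiting thread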
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()   # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
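For illustration, here is a minimal wait/notify sketch (the ready flag and the function names are invented for this example) that uses the Condition object itself as the context manager, relying on exactly that proxying:
import threading

cond = threading.Condition()       # creates its own Lock internally
ready = False

def waiter():
    with cond:                     # acquires the underlying lock
        while not ready:           # guard against spurious wakeups
            cond.wait()
        print('got the signal')

def signaller():
    global ready
    with cond:
        ready = True
        cond.notify()

threading.Thread(target=waiter).start()
threading.Thread(target=signaller).start()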
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
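As a rough sketch of that pattern, here is a toy bounded queue (not the real queue.Queue, which also tracks unfinished tasks for all_tasks_done) in which two Conditions share one underlying lock:
import threading
from collections import deque

class TinyQueue:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.items = deque()
        self.lock = threading.Lock()
        self.not_empty = threading.Condition(self.lock)   # share one lock
        self.not_full = threading.Condition(self.lock)

    def put(self, item):
        with self.not_full:
            while len(self.items) >= self.maxsize:
                self.not_full.wait()
            self.items.append(item)
            self.not_empty.notify()

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()
            return item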
In Python multi-threading, there are some atomic operations on built-in types (list, dict, etc.) that can be performed by multiple threads without protection. There are also operations that need to be protected by a lock.
My questions are:
1. Where can I find an official document that lists all the atomic operations? I can google some answers, but they are not "official" and are out of date.
2. Some books suggest that we should protect all shared data with locks, because an operation that is atomic today may become non-atomic, so we shouldn't rely on it. Is this correct?
3. Since locks surely have overhead, is this overhead negligible even in a big program?
Locks are used to make an operation atomic, meaning only one thread at a time can access some resource. Using many locks causes your application to lose the benefit of threading, since only one thread can access each resource at a time.
If you think about it, heavy locking doesn't make much sense: it will make your program slower than a single-threaded one, because Python still needs to manage and context-switch between the threads.
When using threads, you should try to minimize the number of locks as much as possible. Use local variables whenever possible, and make your functions do some work and return a value instead of updating an existing one.
Then you can create a Queue and collect the results.
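For example, here is a minimal sketch of that pattern (the work function is made up for illustration):
import threading
from queue import Queue

results = Queue()                  # thread-safe; no explicit lock needed

def work(n):
    total = sum(range(n))          # purely local computation, no lock needed
    results.put((n, total))        # hand the result back through the queue

threads = [threading.Thread(target=work, args=(n,)) for n in (10, 100, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not results.empty():
    print(results.get())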
Besides locks, there are semaphores. These are basically locks that a limited number of threads can hold at the same time:
> A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
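A small sketch of what that looks like in practice (the sleep stands in for some rate-limited work):
import threading
import time

pool = threading.BoundedSemaphore(3)     # at most 3 threads inside at once

def limited_work(n):
    with pool:                            # acquire() on entry, release() on exit
        time.sleep(0.1)                   # stand-in for the limited resource
        print('worker', n, 'done')

for n in range(10):
    threading.Thread(target=limited_work, args=(n,)).start()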
Python has good documentation for the threading module.
Here is a small example of a dummy function, tested using a single thread versus 3 threads. Pay attention to the impact the Lock makes on the running time:
threads (no locks) duration: 1.0949997901
threads (with locks) duration: 3.1289999485
single thread duration: 3.09899997711
import threading
import time

lock = threading.Lock()

def work():
    x = 0
    for i in range(100):
        x += i
    lock.acquire()
    print('acquired lock, do some calculations')
    time.sleep(1)
    print(x)
    lock.release()
    print('lock released')
I think you are looking for this link.
From the above link:
> An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores do not make those guarantees.
A manipulation of a list is not guaranteed to be an atomic operation, so extra care needs to be taken to make it thread-safe, using Lock, Event, Condition, Semaphore, etc.
For example, you can check this answer, which explains how lists are thread-safe.
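As a minimal sketch (the names are made up), here is the kind of compound list operation that needs a lock: the append and the trim must happen together, or two threads can interleave between them:
import threading

items = []
items_lock = threading.Lock()

def add_and_trim(value):
    with items_lock:           # make the read-modify-write sequence atomic
        items.append(value)
        if len(items) > 100:
            del items[0]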
I have a piece of code with a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I wanted to know whether this function releases the GIL, because I want to run my monitor thread even while the processing function is blocked on the popleft call.
Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the Python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause unpredictable, probably bad behavior (anything from corrupted data to a segfault, a Heartbleed-like leak of sensitive memory contents, or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the run-time goes to great lengths to prevent those sorts of problems from occurring. The way it does so is that certain parts of the interpreter, such as the modification of a dict, are surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note, however, that the scope of this lock is very narrow; it doesn't attempt to protect against general data races. It won't protect you from writing a program in which multiple threads overwrite each other's work in a common container (say, a collections.deque); it only guarantees that even if you do write such a program, it won't cause the interpreter to crash: you'll always have a valid, working deque. You can add additional application-level locks, as queue.Queue does, to give your application good concurrent semantics.
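A small sketch of that idea: an application-level lock guarding a counter that the GIL alone does not keep consistent, because counter += 1 compiles to a separate read, add and write:
import threading

counter = 0
counter_lock = threading.Lock()

def increment_many(times):
    global counter
    for _ in range(times):
        with counter_lock:     # without this, updates from two threads can be lost
            counter += 1

threads = [threading.Thread(target=increment_many, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                 # reliably 400000 with the lock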
Since every operation that the GIL protects is a change in the interpreter state, it never blocks on external events; since those events won't cause the interpreter state to be changed, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads: since they are potentially all executing code in the low-level interpreter, they compete for the GIL, and only one thread can hold it at a time, blocking the other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple compute-bound threads, you won't be affected by it either.
Yes -- deque is thread-safe (thanks @hemanths). http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.
I was reading about the GIL, and it never really specified whether this includes the main thread or not (I assume it does). The reason I ask is that I have a program with threads set up to modify a dictionary. The main thread adds/deletes entries based on player input, while another thread loops over the data, updating and changing it.
However, in some cases one thread may be iterating over the dictionary's keys while the other deletes them. If there is a so-called GIL and threads run sequentially, why am I getting "dict changed" errors? If only one is supposed to run at a time, then technically this should not happen.
Can anyone shed some light on such a thing? Thank you.
They are running at the same time; they just don't execute at the same time. The iterations might be interleaved. Quoting the Python docs:
> The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time.
So two for loops might run at the same time; there will just never be (for example) two del d[key] operations executing at the same moment.
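Incidentally, the asker's error can be reproduced deterministically even in a single thread, which shows the check is about the dict being mutated during iteration, not about two threads executing bytecode simultaneously:
d = {'a': 1, 'b': 2}
for key in d:
    del d[key]                 # RuntimeError: dictionary changed size during iteration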
The GIL locks at a Python byte-code level, and applies to all threads, even the main thread. If you have one thread modifying a dictionary, and another iterating keys, they will interfere with each other.
"Only one runs at a time" is true, but you have to understand the unit of granularity. In the case of CPython's GIL, the granularity is a bytecode instruction, so execution can switch between threads at any bytecode.
The GIL prevents two threads from modifying the interpreter state simultaneously. It doesn't provide any thread-consistency constraints, or any kind of mutex at all at a granularity smaller than the whole process. If you need to read and modify a dict from two threads, you should be using a mutex.
Python switches threads more often than you seem to think it does. You say "only one" is supposed to run at a time, and technically that's true, but it depends on your definition of "one." Python's atomic operations are very small. For example: adding a single item to a dictionary. Iteration over an entire dictionary can be interrupted.
You should use a lock object from the threading library to isolate your program's atomic operations.
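As a minimal sketch matching the question's scenario (the names are invented), one lock can guard both the iteration and the deletion:
import threading

players = {}                   # shared between the main thread and the worker
players_lock = threading.Lock()

def update_loop():             # runs in the worker thread
    with players_lock:         # the dict can't change shape during this loop
        for name, data in players.items():
            data['score'] = data.get('score', 0) + 1

def remove_player(name):       # called from the main thread on player input
    with players_lock:         # deletion can't interleave with the iteration
        players.pop(name, None)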