I am new to multithreaded programming in Python. I consulted the manual for Lock object usage and found that the typical pattern is
from threading import Lock

g_mutex = Lock()
g_mutex.acquire()
# some code
g_mutex.release()
But the lock does not specify which variable or function it is going to lock. Does Python automatically find all the critical variables that need locking? And what if I call some function that modifies some variables?
The purpose of a Lock is that at most one thread can hold it at any given time. If you acquire the lock, you can be sure no other thread holds it. It's up to you to define the semantics of the lock -- if you want to use it to guard access to some variable, just do so: acquire the lock before messing around with that variable, release it when you are done. There is no need for an explicit relation between the lock object and the variable it protects -- this is defined by the way you use it. (Also note that there is nothing Python-specific about this concept.)
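For instance, here is a minimal sketch of that convention (the counter variable and thread setup are made up for illustration): two threads share one lock and both agree to touch counter only while holding it.

import threading

counter = 0
counter_lock = threading.Lock()   # by convention, this lock guards `counter`

def increment():
    global counter
    for _ in range(100000):
        with counter_lock:        # acquire before touching counter
            counter += 1          # lock is released automatically afterwards

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # reliably 200000, because both threads honor the convention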
A Lock will not find variables or functions to lock. A Lock is a mechanism that you can use to do so yourself. For example, if you want to protect the variable foo from any concurrent modification, you must (see the sketch after this list):
create a lock for that variable
acquire it before any modification or use, anywhere in your code
release it afterwards
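A sketch of those three steps for a hypothetical shared variable foo (the try/finally makes sure the lock is released even if the update raises):

import threading

foo = 0
foo_lock = threading.Lock()   # step 1: a lock dedicated to foo

def update_foo(delta):
    global foo
    foo_lock.acquire()        # step 2: acquire before any modification or use
    try:
        foo += delta
    finally:
        foo_lock.release()    # step 3: release afterwards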
My goal is to be able to recognize at runtime that certain lock.acquire() and lock.release() calls were made, and to be able to get info on the lock object involved.
So, in theory, I do not know where or even whether they are called -- therefore, I don't want to use decorators or subclass the lock class.
My approach:
The problem is that when using sys.settrace(), the calls to acquire() and release() do not cause me to enter a new scope, so I stay in the same scope without ever knowing these functions were called.
This leads me to think that for a lock from the _thread module, the C code is executed directly (is that correct?).
So, is there a way within this approach to do that?
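For reference, a minimal sketch of the behavior described (the tracer is purely illustrative): sys.settrace() only reports events for pure-Python frames, so C-implemented methods like lock.acquire() never show up.

import sys
import threading

def tracer(frame, event, arg):
    # Fires for Python-level 'call', 'line', 'return', ... events only;
    # C-implemented functions such as lock.acquire() create no Python frame.
    if event == 'call':
        print('entering:', frame.f_code.co_name)
    return tracer

lock = threading.Lock()
sys.settrace(tracer)
lock.acquire()    # no trace event is generated for this call
lock.release()    # nor for this one
sys.settrace(None)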
Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
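To make that concrete, here is a simplified sketch of the mechanism, loosely following the pure-Python implementation in CPython's Lib/threading.py (timeouts and error handling omitted):

from collections import deque
from _thread import allocate_lock

class SimpleCondition:
    def __init__(self, lock):
        self._lock = lock          # the Lock associated with the condition
        self._waiters = deque()    # one waiter-lock per blocked thread

    def wait(self):
        waiter = allocate_lock()
        waiter.acquire()               # held once, so the next acquire blocks
        self._waiters.append(waiter)   # conceptually: enqueue this thread
        self._lock.release()           # let other threads run while we sleep
        waiter.acquire()               # blocks here until notify() releases it
        self._lock.acquire()           # reacquire before returning to caller

    def notify(self, n=1):
        for _ in range(min(n, len(self._waiters))):
            self._waiters.popleft().release()   # wake one waiting thread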
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()   # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
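A small usage sketch (the producer/consumer names are invented), with the Condition used as its own context manager:

import threading

cond = threading.Condition()   # creates its associated Lock internally
items = []

def consumer():
    with cond:                 # proxies the associated lock's acquire/release
        while not items:       # re-check the state after every wakeup
            cond.wait()        # releases the lock while blocked
        print('got:', items.pop())

def producer():
    with cond:
        items.append('hello')
        cond.notify()          # wake one waiting thread

threading.Thread(target=consumer).start()
producer()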
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
In Python multi-threading, there are some atomic types that can be accessed by multiple threads without protection (list, dict, etc.). There are also some types that need to be protected by a lock.
My question is:
Where can I find an official document that lists all atomic types? I can google some answers, but they are not "official" and are out of date.
Some books suggest that we should protect all shared data with locks, because atomic types may become non-atomic, so we shouldn't rely on them. Is this correct?
Since locks surely have overhead, is this overhead negligible even in a big program?
Locks are used for making an operation atomic. This means only one thread can access some resource at a time. Using many locks causes your application to lose the benefit of threading, as only one thread at a time can access the resource.
If you think about it, it doesn't make much sense. It will make your program slower, because Python needs to manage and context-switch between the threads.
When using threads, you should try to minimize the number of locks as much as possible. Use local variables whenever possible. Make your function do some work and return a value instead of updating an existing one.
Then you can create a Queue and collect the results.
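A minimal sketch of that pattern (the worker function and its names are made up):

import threading
from queue import Queue

results = Queue()   # thread-safe; no explicit lock needed

def work(n):
    total = sum(range(n))   # purely local computation, no shared state
    results.put(total)      # hand the result back through the queue

threads = [threading.Thread(target=work, args=(100,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([results.get() for _ in threads])   # [4950, 4950, 4950]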
Besides locks, there are semaphores. These are basically locks that a limited number of threads can hold at the same time:
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
Python has good documentation for the threading module.
Here is a small example of a dummy function tested using a single thread vs. 3 threads. Pay attention to the impact the Lock makes on the running time:
threads (no locks) duration: 1.0949997901
threads (with locks) duration: 3.1289999485
single thread duration: 3.09899997711
import threading
import time

lock = threading.Lock()

def work():
    x = 0
    for i in range(100):
        x += i
    lock.acquire()
    print('acquired lock, doing some calculations')
    time.sleep(1)
    print(x)
    lock.release()
    print('lock released')
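The post does not include the timing harness; a possible version (an assumption, not the original code) would look like this:

start = time.time()
threads = [threading.Thread(target=work) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('threads (with locks) duration:', time.time() - start)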
I think you are looking for this link.
From the above link:
An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores do not make those guarantees.
Not every manipulation of a list is an atomic operation, so extra care needs to be taken to make it thread-safe using Lock, Event, Condition, Semaphore, etc.
For example, you can check this answer, which explains to what extent lists are thread-safe.
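As an illustration (names invented for the sketch): even where each individual list operation is atomic, a check-then-act sequence is not, so it still needs a lock:

import threading

items = [1, 2, 3]
lock = threading.Lock()

def drain_unsafe():
    while items:          # check ...
        items.pop()       # ... then act: another thread may pop in between,
                          # so this can raise IndexError on an empty list

def drain_safe():
    while True:
        with lock:        # the lock turns check + pop into a single step
            if not items:
                return
            items.pop()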
Specifically I'm talking about Python. I'm trying to hack something (just a little) by seeing an object's value without ever passing it in, and I'm wondering if it is thread safe to use thread local to do that. Also, how do you even go about doing such a thing?
No -- thread-local means that each thread gets its own copy of that variable. Using it is (at least normally) thread-safe, simply because each thread uses its own variable, separate from variables of the same name that are accessible to other threads. OTOH, they're not (normally) useful for communication between threads.
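A minimal sketch of threading.local (the attribute name user is made up):

import threading

ctx = threading.local()    # each thread sees only its own attributes

def worker(name):
    ctx.user = name        # set in this thread's copy only
    print(threading.current_thread().name, 'sees', ctx.user)

for n in ('alice', 'bob'):
    threading.Thread(target=worker, args=(n,)).start()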
I have a bunch of different methods that are not supposed to run concurrently, so I use a single lock to synchronize them. Looks something like this:
selected_method = choose_method()
with lock:
    selected_method()
In some of these methods, I sometimes call a helper function that does some slow network IO. (Let's call that one network_method()). I would like to release the lock while this function is running, to allow other threads to continue their processing.
One way to achieve this would be by calling lock.release() and lock.acquire() before and after calling the network method. However, I would prefer to keep the methods oblivious to the lock, since there are many of them and they change all the time.
I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
I tried using the locked() method on the Lock object, but that method only tells me whether the lock is held, not if it is held by the current thread.
By the way, lock is a global object and I'm fine with that.
I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
This just sounds like entirely the wrong thing to do :(
For a start, it's bad to have a function that sometimes has some other magical side-effect depending on where you call it from. That's the sort of thing that is a nightmare to debug.
Secondly, a lock should have clear acquire and release semantics. If I look at code that says "lock(); do_something(); unlock();" then I expect it to be locked for the duration of do_something(). In fact, it is also telling me that do_something() requires a lock. If I find out that someone has written a particular do_something() which actually unlocks the lock that I just saw to be locked, I will either (a) fire them or (b) hunt them down with weapons, depending on whether I am in a position of seniority relative to them or not.
By the way, lock is a global object and I'm fine with that.
Incidentally, this is also why globals are bad. If I modify a value, call a function, and then modify a value again, I don't want that function in the middle being able to reach back out and modify this value in an unpredictable way.
My suggestion to you is this: your lock is in the wrong place, or doing the wrong thing, or both. You say these methods aren't supposed to run concurrently, but you actually want some of them to run concurrently. The fact that one of them is "slow" can't possibly make it acceptable to remove the lock - either you need the mutual exclusion during this type of operation for it to be correct, or you do not. If the slower operation is indeed inherently safe when the others are not, then maybe it doesn't need the lock - but that implies the lock should go inside each of the faster operations, not outside them. But all of this is dependent on what exactly the lock is for.
Why not just do this?
with lock:
    before_network()

do_network_stuff()

with lock:
    after_network()