Why does multithreading have a Lock object? - python

My understanding is that threading in reality allows only one thread to be active at a time, continuously switching between threads. This is useful for IO-bound operations, where the workload is effectively offloaded somewhere else (an API, a database, ...).
If so, why is there a need for a Lock() object? There is no risk that a variable is accessed by two threads simultaneously (as can be the case in multiprocessing), so I fail to see a real use for locks in this context.

There is no risk that a variable is accessed by two threads simultaneously
It depends on the scheduler used to implement multithreading. Context switches may occur on any interrupt, no matter what the current thread does. Therefore a thread accessing a variable may be interrupted by a clock interrupt, and another thread accessing the same variable may be activated.

First of all, locks secure whole areas; think of updating a file:
with lock:
    with open("some_file", "r+") as f:
        do_something(f)
Even single operations like
a['b'] += 1
might expand into multiple operations (read the value of a['b'], increment it, write it back to a['b']) and therefore needs to be secured by a lock:
with lock:
    a['b'] += 1
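You can see this decomposition directly with the standard dis module; a quick check (the function name incr is just for illustration):
import dis

def incr(a):
    a['b'] += 1

dis.dis(incr)  # prints separate load/add/store bytecode steps;
               # a thread switch can happen between any two of them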

Related

Correctness of modified consumer/producer

I am creating a Sound class to play notes and would like feedback on the correctness and conciseness of my design. This class differs from the typical consumer/producer in two ways:
The consumer should respond to events, such as to shut down the thread, or otherwise continue forever. The typical consumer/producer exits when the queue is empty. For example, a thread waiting in queue.get cannot handle additional notifications.
Each set of notes submitted by the producer should overwrite any unprocessed notes remaining on the queue.
Originally I had the consumer process one note at a time using the queue module. I found continually acquiring and releasing the lock without any competition to be inefficient, and as previously noted, queue.get prevents waiting on additional events. So instead of building upon that, I rewrote it into:
import threading

queue = []
condition = threading.Condition()
interrupt = threading.Event()
stop = threading.Event()

def producer():
    while some_condition:
        ns = get_notes()  # [(float,float)]
        with condition:
            queue.clear()
            queue.append(ns)
            interrupt.set()
            condition.notify()
    with condition:
        stop.set()
        condition.notify()
    thread.join()

def consumer():
    while not stop.is_set():
        with condition:
            while not (queue or stop.is_set()):
                condition.wait()
            if stop.is_set():
                break
            interrupt.clear()
            ns = queue.pop()
        ss = gen_samples(ns)  # iterator/fast
        for b in grouper(ss, size/2):
            if interrupt.is_set() or stop.is_set():
                break
            stream.write(b)

thread = threading.Thread(target=consumer)
thread.start()
producer()
My questions are as follows:
Is this thread-safe? I want to specifically point out my use of is_set without locks or synchronization (in the for-loop).
Can the events be replaced with boolean variables? I believe so as conflicting writes in both threads (data race) are guarded by the condition variable. There is a race condition between setting and checking events but I do not believe it affects program flow.
Is there a more efficient approach/algorithm utilizing different synchronization primitives from the threading module?
edit: Found and fixed a possible deadlock described in Why does Python threading.Condition() notify() require a lock?
Analyzing thread-safety in Python can take into account the Global Interpreter Lock (GIL): no two threads will execute Python code simultaneously. Assignments to variables or object fields are effectively atomic (there are no half-assigned variables) and changes propagate effectively immediately to other threads.
This means that your use of Event.is_set() is already equivalent to using plain booleans. An event is a bool guarded by a Condition. The is_set() method checks the boolean directly. The set() method acquires the Condition, sets the boolean, and notifies all waiting threads. The wait() method waits until the set() method is invoked. The clear() method acquires the Condition and unsets the boolean. Since you never wait() for any Event, and setting the boolean is atomic, the Condition in the Event is effectively unused.
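That description maps onto a minimal re-implementation sketch (illustrative only; this is not the actual stdlib source, and the timeout handling is simplified):
import threading

class MiniEvent:
    def __init__(self):
        self._cond = threading.Condition()
        self._flag = False

    def is_set(self):
        return self._flag  # checks the boolean directly, no locking

    def set(self):
        with self._cond:             # acquire the Condition's lock
            self._flag = True
            self._cond.notify_all()  # wake all waiting threads

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            if not self._flag:       # simplified: no loop around wait()
                self._cond.wait(timeout)
            return self._flag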
This might get rid of a couple of locks, but isn't really a huge efficiency win. A Condition is still an abstraction over a lock, but the built-in Queue type uses locks directly. Thus, I would assume that the built-in queue is no less performant than your solution, even for a single consumer.
Your main issue with the built-in queue is that “continually acquiring and releasing the lock without any competition [is] inefficient”. This is wrong on two counts:
Due to Python's GIL, there is little competition in either case.
Acquiring uncontested locks is very efficient.
So while your solution is probably sufficiently correct (I can see no opportunity for deadlock), it is unlikely to be noticeably more efficient than the built-in queue. (There were also some small mistakes in the posted code, like using stop instead of stop.is_set() and some syntax errors.)
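For comparison, a minimal sketch of the shutdown handling built on the standard queue module (it does not reproduce the overwrite-latest-batch or interrupt behaviour; some_condition, get_notes, and play stand in for the original helpers):
import queue
import threading

q = queue.Queue()
STOP = object()  # sentinel telling the consumer to exit

def producer():
    while some_condition:   # placeholder condition, as in the original
        q.put(get_notes())  # placeholder producer of note batches
    q.put(STOP)

def consumer():
    while True:
        ns = q.get()        # blocks; the internal lock is uncontended here
        if ns is STOP:
            break
        play(ns)            # placeholder for gen_samples/stream.write

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()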
If you are seeing poor performance with Python threads that's probably because of CPython, not because of the Queue type. I already mentioned that only one thread can run at a time due to the GIL. If multiple threads want to run, they must be scheduled by the operating system to do so and acquire the GIL. Each thread will wait for 5ms before asking the running thread to give up the GIL (in a manner quite similar to your interrupt flag). And then the thread can do useful work like acquiring a lock for a critical section that must not be interrupted by other threads.
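That 5 ms figure is CPython's switch interval, which you can inspect and tune:
import sys

print(sys.getswitchinterval())  # 0.005 seconds by default
sys.setswitchinterval(0.0005)   # make threads request the GIL more often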
Possibly, the solution could be to avoid CPython's threads.
If you have multiple CPU-bound tasks, you must use multiple processes. CPython's threads will not run in parallel. However, communication between processes is more expensive.
Consider whether you can combine the producer+consumer directly, possibly using features such as generators.
For an easier time juggling multiple tasks in the same thread, consider using async/await. Event loops are provided by the asyncio module. This is just as fast as Python's threads, with the caveat that tasks don't pre-empt (interrupt) each other. But this can be an advantage: since a task can only be suspended at an await, you don't need most locks, and it is easier to reason about the correctness of the code. The downside is that async/await might have even higher latency than using threads.
Python has a concept of “executors” that make it easy and efficient to run tasks in separate threads (for I/O-bound tasks) or separate processes (for CPU-bound tasks); see the sketch after this list.
For communicating between multiple processes, use the types from the multiprocessing module (e.g. Queue, Connection, or Value).
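A minimal sketch of both executor flavours from concurrent.futures (cpu_bound and io_bound are made-up example tasks, and the URL is a placeholder):
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import urllib.request

def cpu_bound(n):
    # pure computation: benefits from processes, not threads
    return sum(i * i for i in range(n))

def io_bound(url):
    # blocking I/O: the GIL is released while waiting
    with urllib.request.urlopen(url) as r:
        return len(r.read())

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:  # runs in parallel despite the GIL
        print(list(pool.map(cpu_bound, [10**6] * 4)))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(io_bound, ['https://example.com'] * 4)))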

Does threading.Condition maintain a collection of Thread objects?

Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()   # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
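As a concrete (if simplified) illustration of that protocol, here is a sketch of one thread waiting on a Condition while another notifies it:
import threading

cond = threading.Condition()  # creates an internal Lock for us
ready = False

def waiter():
    with cond:            # must hold the lock before calling wait()
        while not ready:  # re-check the predicate after each wakeup
            cond.wait()   # releases the lock, blocks, re-acquires it
        print('got the signal')

def signaller():
    global ready
    with cond:            # must hold the lock before calling notify()
        ready = True
        cond.notify()     # wakes (conceptually) one waiting thread

t = threading.Thread(target=waiter)
t.start()
signaller()
t.join()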

what's the best practice for accessing shared data with python multi-threading

In Python multi-threading, there are some atomic types that can be accessed by multiple threads without protection (list, dict, etc). There are also some types that need to be protected by a lock.
My questions are:
Where can I find an official document that lists all atomic types? I can google some answers, but they are not "official" and may be out of date.
Some books suggest that we should protect all shared data with locks, because atomic types may become non-atomic, so we shouldn't rely on them. Is this correct?
Because locks surely have overhead, is this overhead negligible even in a big program?
Locks are used for making an operation atomic. This means only one thread at a time can access some resource. Using many locks causes your application to lose the benefit of threading, as only one thread can access each resource.
If you think about it, it doesn't make much sense. It will make your program slower, because Python needs to manage and context-switch between the threads.
When using threads, you should try to minimize the number of locks. Use local variables whenever possible. Make your function do some work and return a value instead of updating an existing one.
Then you can create a Queue and collect the results, as in the sketch below.
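A minimal sketch of that pattern (the worker function and its workload are made up for illustration):
import threading
import queue

results = queue.Queue()

def worker(n):
    total = sum(range(n))  # work on local variables only: nothing to lock
    results.put(total)     # the Queue does its own internal locking

threads = [threading.Thread(target=worker, args=(10**5,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([results.get() for _ in range(4)])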
Besides locks, there are semaphores. These are basically locks that a limited number of threads can hold at the same time:
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
Python has a good documentation for threading module.
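For instance, a minimal sketch that lets at most three threads into a critical section at once:
import threading
import time

sem = threading.BoundedSemaphore(3)  # at most 3 holders at a time;
                                     # extra release() calls raise ValueError

def worker(n):
    with sem:            # acquire(); released when the block exits
        print('worker', n, 'running')
        time.sleep(0.1)  # simulate some work

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()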
Here is a small example of a dummy function tested using a single thread vs three threads. Pay attention to the impact the Lock makes on the running time:
threads (no locks) duration: 1.0949997901
threads (with locks) duration: 3.1289999485
single thread duration: 3.09899997711
import threading
import time

lock = threading.Lock()

def work():
    x = 0
    for i in range(100):
        x += i
    lock.acquire()
    print('acquired lock, do some calculations')
    time.sleep(1)
    print(x)
    lock.release()
    print('lock released')
I think you are looking for this link.
From the above link:
An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores do not make those guarantees.
Not every manipulation of a list is an atomic operation, so extra care needs to be taken to make it thread-safe, using a Lock, Event, Condition, or Semaphore, etc.
For example, you can check this answer, which explains how lists are thread-safe.

Why do we need locks for threads, if we have GIL?

I believe it is a stupid question, but I still can't find the answer. Actually it's better to separate it into two questions:
1) Am I right that we could have a lot of threads, but because of the GIL, only one thread is executing at any moment?
2) If so, why do we still need locks? We use locks to avoid the case when two threads are trying to read/write some shared object; but because of the GIL, two threads can't be executing at the same moment, can they?
The GIL protects the Python internals. That means:
you don't have to worry about something in the interpreter going wrong because of multithreading
most things do not really run in parallel, because Python code is executed sequentially due to the GIL
But GIL does not protect your own code. For example, if you have this code:
self.some_number += 1
That is going to read the value of self.some_number, calculate some_number + 1, and then write it back to self.some_number.
If you do that in two threads, the operations (read, add, write) of one thread and the other may be mixed, so that the result is wrong.
This could be the order of execution:
thread1 reads self.some_number (0)
thread2 reads self.some_number (0)
thread1 calculates some_number+1 (1)
thread2 calculates some_number+1 (1)
thread1 writes 1 to self.some_number
thread2 writes 1 to self.some_number
You use locks to enforce this order of execution:
thread1 reads self.some_number (0)
thread1 calculates some_number+1 (1)
thread1 writes 1 to self.some_number
thread2 reads self.some_number (1)
thread2 calculates some_number+1 (2)
thread2 writes 2 to self.some_number
EDIT: Let's complete this answer with some code which shows the explained behaviour:
import threading
import time

total = 0
lock = threading.Lock()

def increment_n_times(n):
    global total
    for i in range(n):
        total += 1

def safe_increment_n_times(n):
    global total
    for i in range(n):
        lock.acquire()
        total += 1
        lock.release()

def increment_in_x_threads(x, func, n):
    threads = [threading.Thread(target=func, args=(n,)) for i in range(x)]
    global total
    total = 0
    begin = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print('finished in {}s.\ntotal: {}\nexpected: {}\ndifference: {} ({} %)'
          .format(time.time()-begin, total, n*x, n*x-total, 100-total/n/x*100))
There are two functions which implement increment. One uses locks and the other does not.
Function increment_in_x_threads implements parallel execution of the incrementing function in many threads.
Now running this with a big enough number of threads makes it almost certain that an error will occur:
print('unsafe:')
increment_in_x_threads(70, increment_n_times, 100000)
print('\nwith locks:')
increment_in_x_threads(70, safe_increment_n_times, 100000)
In my case, it printed:
unsafe:
finished in 0.9840562343597412s.
total: 4654584
expected: 7000000
difference: 2345416 (33.505942857142855 %)
with locks:
finished in 20.564176082611084s.
total: 7000000
expected: 7000000
difference: 0 (0.0 %)
So without locks, there were many errors (33% of increments failed). On the other hand, with locks it was 20 times slower.
Of course, both numbers are blown up because I used 70 threads, but this shows the general idea.
At any moment, yes, only one thread is executing Python code (other threads may be doing I/O, running NumPy routines, and so on). So that is mostly true. However, this is trivially true on any single-processor system, and yet people still need locks on single-processor systems.
Take a look at the following code:
queue = []

def do_work():
    while queue:
        item = queue.pop(0)
        process(item)
With one thread, everything is fine. With two threads, you might get an exception from queue.pop() because the other thread called queue.pop() on the last item first. So you would need to handle that somehow. Using a lock is a simple solution. You can also use a proper concurrent queue (like in the queue module)--but if you look inside the queue module, you'll find that the Queue object has a threading.Lock() inside it. So either way you are using locks.
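A simple locked version of the same loop, which makes the emptiness check and the pop atomic (process is a stand-in for the real work, as above):
import threading

queue = []
lock = threading.Lock()

def do_work():
    while True:
        with lock:              # check-and-pop happens atomically
            if not queue:
                break
            item = queue.pop(0)
        process(item)           # do the real work outside the lock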
It is a common newbie mistake to write multithreaded code without the necessary locks. You look at code and think, "this will work just fine" and then find out many hours later that something truly bizarre has happened because threads weren't synchronized properly.
Or in short, there are many places in a multithreaded program where you need to prevent another thread from modifying a structure until you're done applying some changes. This allows you to maintain the invariants on your data, and if you can't maintain invariants, then it's basically impossible to write code that is correct.
Or put in the shortest way possible, "You don't need locks if you don't care if your code is correct."
The GIL prevents simultaneous execution of multiple threads, but not in all situations.
The GIL is temporarily released during I/O operations executed by threads. That means multiple threads can be running at the same time. That's one reason you still need locks.
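A tiny demonstration that blocking waits overlap across threads (time.sleep releases the GIL, like real blocking I/O does):
import threading
import time

def io_task():
    time.sleep(1)  # releases the GIL while sleeping

begin = time.time()
threads = [threading.Thread(target=io_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(time.time() - begin)  # roughly 1 second, not 10: the waits overlapped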
I don't know where I first found this reference (in a video or something, so it's hard to look up), but you can investigate further yourself.
UPDATE:
The few thumbs down I got signal to me that people think memory is not a good enough reference, and Google not a good enough database. While I'd disagree with that, let me provide one of the first URLs I looked up (and checked!), so the people who disliked my answer can live happily from now on:
https://wiki.python.org/moin/GlobalInterpreterLock
the GIL does not protect you from modification of the internal states of the objects that you are accessing concurrently from different threads, meaning that you can still mess things up if you don't take measures.
So, despite the fact that two threads may not be running at the same exact time, they can still be trying to manipulate the internal state of an object (one at a time, intermittently), and if you don't prevent that from happening (with some locking mechanism) your code could/will eventually fail.
Regards.

Putting a thread to sleep until event X occurs

I'm writing to many files in a threaded app and I'm creating one handler per file. I have a HandlerFactory class that manages the distribution of these handlers. What I'd like to happen is this:
thread A requests and gets foo.txt's file handle from the HandlerFactory class
thread B requests foo.txt's file handler
handler class recognizes that this file handle has been checked out
handler class puts thread B to sleep
thread A closes the file handle using a wrapper method from HandlerFactory
HandlerFactory notifies sleeping threads
thread B wakes and successfully gets foo.txt's file handle
This is what I have so far,
def get_handler(self, file_path, type):
    self.lock.acquire()
    if file_path not in self.handlers:
        self.handlers[file_path] = open(file_path, type)
    elif not self.handlers[file_path].closed:
        time.sleep(1)
    self.lock.release()
    return self.handlers[file_path]
I believe this covers the sleeping and handler retrieval successfully, but I am unsure how to wake up all threads, or even better wake up a specific thread.
What you're looking for is known as a condition variable.
See the “Condition Objects” section of the threading module documentation in the Python library reference; for Python 3 it is at https://docs.python.org/3/library/threading.html#condition-objects.
Looks like you want a threading.Semaphore associated with each handler (other synchronization objects like Events and Conditions are also possible, but a Semaphore seems simplest for your needs). Specifically, use a BoundedSemaphore: for your use case, it will raise an exception immediately for programming errors that erroneously release the semaphore more times than they acquire it, and that's exactly the reason for being of the bounded version of semaphores ;-).
Initialize each semaphore to a value of 1 when you build it (so that means the handler is available). Each using-thread calls acquire on the semaphore to get the handler (that may block it), and release on it when it's done with the handler (that will unblock exactly one of the waiting threads). That's simpler than the acquire/wait/notify/release lifecycle of a Condition, and more future-proof too, since as the docs for Condition say:
The current implementation wakes up exactly one thread, if any are waiting. However, it's not safe to rely on this behavior. A future, optimized implementation may occasionally wake up more than one thread.
while with a Semaphore you're playing it safe (the semantics whereof are safe to rely on: if a semaphore is initialized to N, there are at all times between 0 and N (inclusive) threads that have successfully acquired the semaphore and not yet released it).
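Applied to the question, a rough sketch of that design (the class shape and method names here are illustrative, not taken from the original code):
import threading

class HandlerFactory:
    def __init__(self):
        self.lock = threading.Lock()
        self.handlers = {}  # file_path -> (file object, semaphore)

    def get_handler(self, file_path, mode):
        with self.lock:
            if file_path not in self.handlers:
                # initialized to 1: the handler starts out available
                self.handlers[file_path] = (open(file_path, mode),
                                            threading.BoundedSemaphore(1))
            f, sem = self.handlers[file_path]
        sem.acquire()  # blocks until the current holder releases
        return f

    def release_handler(self, file_path):
        with self.lock:
            _, sem = self.handlers[file_path]
        sem.release()  # wakes exactly one waiting thread, if any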
You do realize that Python has a giant lock (the GIL), so you don't get most of the benefits of multi-threading, right?
Unless there is some reason for the master thread to do something with the results of each worker, you may wish to consider just forking off another process for each request. You won't have to deal with locking issues then. Have the children do what they need to do, then die. If they do need to communicate back, do it over a pipe, with XML-RPC, or through a SQLite database (which is thread-safe).
