Why does Python threading.Condition() notify() require a lock?

My question is specifically about why it was designed this way, given the unnecessary performance implication.
When thread T1 has this code:
cv.acquire()
cv.wait()
cv.release()
and thread T2 has this code:
cv.acquire()
cv.notify() # requires that lock be held
cv.release()
what happens is that T1 waits and releases the lock, then T2 acquires it and notifies cv, which wakes up T1. Now there is a race condition between T2's release and T1's reacquiring after returning from wait(). If T1 tries to reacquire first, it will be unnecessarily resuspended until T2's release() is completed.
Note: I'm intentionally not using the with statement, to better illustrate the race with explicit calls.
This seems like a design flaw. Is there any rationale known for this, or am I missing something?

This is not a definitive answer, but it covers the relevant details I've managed to gather about this problem.
First, Python's threading implementation is based on Java's. Java's Condition.signal() documentation reads:
An implementation may (and typically does) require that the current thread hold the lock associated with this Condition when this method is called.
Now, the question was why enforce this behavior in Python in particular. But first I want to cover the pros and cons of each approach.
As to why some think it's often a better idea to hold the lock, I found two main arguments:
From the moment a waiter acquire()s the lock—that is, before releasing it in wait()—it is guaranteed to be notified of signals. If the corresponding release() happened prior to signalling, this would allow the sequence (where P=Producer and C=Consumer) P: release(); C: acquire(); P: notify(); C: wait(), in which case the wait() corresponding to the acquire() of the same flow would miss the signal. There are cases where this doesn't matter (and could even be considered more accurate), but there are cases where it's undesirable. This is one argument.
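A minimal sketch of that first argument (cv, data_ready, and the producer/consumer functions here are illustrative, not from any particular codebase): because the consumer holds the lock from its predicate check until wait() atomically releases it, the producer's notify() cannot fall into the gap between the two.
import threading

cv = threading.Condition()
data_ready = False          # predicate guarded by cv's lock

def consumer():
    with cv:                # lock held from here on
        while not data_ready:
            cv.wait()       # atomically releases the lock and waits
        # ... consume ...

def producer():
    global data_ready
    with cv:                # cannot run between the consumer's check
        data_ready = True   # and its wait(), so the signal can't be missed
        cv.notify()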
When you notify() outside a lock, this may cause a scheduling priority inversion; that is, a low-priority thread might end up taking priority over a high-priority thread. Consider a work queue with one producer and two consumers (LC=Low-priority consumer and HC=High-priority consumer), where LC is currently executing a work item and HC is blocked in wait().
The following sequence may occur:
P                LC                  HC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 execute(item)      (in wait())
lock()
wq.push(item)
release()
                 acquire()
                 item = wq.pop()
                 release();
notify()
                                    (wake-up)
                                    while (wq.empty())
                                        wait();
Whereas if the notify() had happened before release(), LC wouldn't have been able to acquire() the lock before HC had been woken up, and the priority inversion shown above (the low-priority consumer stealing the item from the high-priority one) would not have occurred. This is the second argument.
The argument in favor of notifying outside the lock is high-performance threading, where a thread need not go back to sleep just to wake up again on the very next time-slice it gets—the scenario already described in my question.
Python's threading Module
In Python, as I said, you must hold the lock while notifying. The irony is that the internal implementation does not allow the underlying OS to avoid priority inversion, because it enforces a FIFO order on the waiters. Of course, a deterministic waiter order could come in handy, but the question remains why enforce such a thing, when one could argue it would be more precise to differentiate between the lock and the condition variable: in flows that require optimized concurrency and minimal blocking, acquire() should not by itself register a preceding waiting state; only the wait() call itself should.
Arguably, Python programmers would not care about performance to this extent anyway—although that still doesn't answer the question of why a standard library shouldn't make several standard behaviors possible.
One thing that remains to be said is that the developers of the threading module might have specifically wanted a FIFO order for some reason, found that this was the best way of achieving it, and wanted to establish it as a guarantee of Condition at the expense of the other (probably more prevalent) approaches. For this, they deserve the benefit of the doubt until they account for it themselves.
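A small experiment makes the FIFO order visible; this relies on observed CPython behavior and on the sleeps for scheduling, not on any documented guarantee:
import threading, time

cv = threading.Condition()
woken = []

def waiter(i):
    with cv:
        cv.wait()
        woken.append(i)

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
    time.sleep(0.05)        # let each thread enter wait() in order

for _ in threads:
    with cv:
        cv.notify()         # wakes the longest-waiting thread first
    time.sleep(0.05)        # give the woken thread time to record itself
for t in threads:
    t.join()
print(woken)                # typically [0, 1, 2] on CPython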

There are several reasons which are compelling (when taken together).
1. The notifier needs to take a lock
Pretend that Condition.notifyUnlocked() exists.
The standard producer/consumer arrangement requires taking locks on both sides:
def unlocked(qu, cv):  # qu is a thread-safe queue
    qu.push(make_stuff())
    cv.notifyUnlocked()

def consume(qu, cv):
    with cv:
        while True:  # vs. other consumers or spurious wakeups
            if qu: break
            cv.wait()
        x = qu.pop()
    use_stuff(x)
This fails because both the push() and the notifyUnlocked() can intervene between the if qu: and the wait().
Writing either of
def lockedNotify(qu, cv):
    qu.push(make_stuff())
    with cv: cv.notify()

def lockedPush(qu, cv):
    x = make_stuff()  # don't hold the lock here
    with cv: qu.push(x)
    cv.notifyUnlocked()
works (which is an interesting exercise to demonstrate). The second form has the advantage of removing the requirement that qu be thread-safe, and it costs no additional lock acquisitions to hold the lock around the call to notify() as well.
It remains to explain the preference for doing so, especially given that (as you observed) CPython does wake up the notified thread to have it switch to waiting on the mutex (rather than simply moving it to that wait queue).
2. The condition variable itself needs a lock
The Condition has internal data that must be protected in case of concurrent waits/notifications. (Glancing at the CPython implementation, I see the possibility that two unsynchronized notify()s could erroneously target the same waiting thread, which could cause reduced throughput or even deadlock.) It could protect that data with a dedicated lock, of course; since we need a user-visible lock already, using that one avoids additional synchronization costs.
3. Multiple wake conditions can need the lock
(Adapted from a comment on the blog post linked below.)
def setSignal(box, cv):
    signal = False
    with cv:
        if not box.val:
            box.val = True
            signal = True
    if signal: cv.notifyUnlocked()

def waitFor(box, v, cv):
    v = bool(v)  # to use ==
    while True:
        with cv:
            if box.val == v: break
            cv.wait()
Suppose box.val is False and thread #1 is waiting in waitFor(box,True,cv). Thread #2 calls setSignal; when it releases cv, #1 is still blocked on the condition. Thread #3 then calls waitFor(box,False,cv), finds that box.val is True, and waits. Then #2 calls notify(), waking #3, which is still unsatisfied and blocks again. Now #1 and #3 are both waiting, despite the fact that one of them must have its condition satisfied.
def setTrue(box, cv):
    with cv:
        if not box.val:
            box.val = True
            cv.notify()
Now that situation cannot arise: either #3 arrives before the update and never waits, or it arrives during or after the update and has not yet waited, guaranteeing that the notification goes to #1, which returns from waitFor.
4. The hardware might need a lock
With wait morphing and no GIL (in some alternate or future implementation of Python), the memory ordering (cf. Java's rules) imposed by the lock-release after notify() and the lock-acquire on return from wait() might be the only guarantee of the notifying thread's updates being visible to the waiting thread.
5. Real-time systems might need it
Immediately after the POSIX text you quoted we find:
however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
One blog post contains further discussion of the rationale and history of this recommendation (as well as of some of the other issues here).

A couple of months ago exactly the same question occurred to me. But since I had IPython open, looking at threading.Condition.wait?? (which shows the method's source) didn't take long to answer it myself.
In short, the wait method creates another lock called waiter, acquires it, appends it to a list and then, surprise, releases the lock on the condition itself. After that it tries to acquire the waiter once again, that is, it blocks until someone releases the waiter. Then it reacquires the lock on the condition and returns.
The notify method pops a waiter from the waiter list (waiter is a lock, as we remember) and releases it allowing the corresponding wait method to continue.
The trick, then, is that the wait method does not hold the lock on the condition itself while waiting for the notify method to release the waiter.
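Based on that reading, here is a stripped-down sketch of the mechanism; this is my own simplification, and the real implementation also handles timeouts, re-entrant locks, and error cases:
import collections, threading

class SketchCondition:
    def __init__(self, lock=None):
        self._lock = lock or threading.Lock()
        self._waiters = collections.deque()
        self.acquire = self._lock.acquire   # proxy the underlying lock
        self.release = self._lock.release

    def wait(self):
        # caller must already hold the condition's lock here
        waiter = threading.Lock()
        waiter.acquire()                    # lock it once...
        self._waiters.append(waiter)
        self.release()                      # let others in while we wait
        waiter.acquire()                    # ...then block on the second acquire
        self.acquire()                      # reacquire before returning

    def notify(self):
        # caller must hold the lock, which serializes access to _waiters
        if self._waiters:
            self._waiters.popleft().release()   # unblock the oldest waiter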
UPD1: I seem to have misunderstood the question. Is it correct that you are bothered that T1 might try to reacquire the lock on the condition before T2 releases it?
But is that possible in the context of Python's GIL? Or do you think that an I/O call could be inserted before releasing the condition, which would allow T1 to wake up and wait forever?

It's explained in the Python 3 documentation: https://docs.python.org/3/library/threading.html#condition-objects.
Note: the notify() and notify_all() methods don’t release the lock; this means that the thread or threads awakened will not return from their wait() call immediately, but only when the thread that called notify() or notify_all() finally relinquishes ownership of the lock.

What happens is that T1 waits and releases the lock, then T2 acquires it, notifies cv which wakes up T1.
Not quite. The cv.notify() call does not wake the T1 thread: It only moves it to a different queue. Before the notify(), T1 was waiting for the condition to be true. After the notify(), T1 is waiting to acquire the lock. T2 does not release the lock, and T1 does not "wake up" until T2 explicitly calls cv.release().

There is no race condition; this is how condition variables work.
When wait() is called, the underlying lock is released until a notification occurs. It is guaranteed that the caller of wait will reacquire the lock before the function returns (i.e., after the wait completes).
You're right that there could be some inefficiency if T1 was directly woken up when notify() is called. However, condition variables are typically implemented via OS primitives, and the OS will often be smart enough to realize that T2 still has the lock, so it won't immediately wake up T1 but instead queue it to be woken.
Additionally, in Python this matters less anyway: because of the GIL, only one thread can execute Python bytecode at a time, so the threads wouldn't run concurrently anyway.
Also, it's preferred to use the following forms instead of calling acquire/release directly:
with cv:
    cv.wait()
And:
with cv:
    cv.notify()
This ensures that the underlying lock is released even if an exception occurs.

Related

Do I need to terminate a started thread before starting another thread?

I have a button which calls a function. This function is run in a thread. If I click the button more than once I get the error: RuntimeError: threads can only be started once. I found the solution on SO (create a new thread each time). But if I create a new thread every time the button is clicked, what happens to the previous thread? Should I worry about the previous threads?
import tkinter as tk
from tkinter import ttk
import threading

root = tk.Tk()

# creating new thread on click
def start_rename():
    new_thread = threading.Thread(target=bulk_rename)
    new_thread.start()

def bulk_rename():
    print("renaming...")

rename_button = ttk.Button(root, text="Bulk Rename", command=start_rename)
rename_button.pack()
root.mainloop()
Here's a different way to say the same things:
A mutex (a.k.a. a "lock" or a "binary semaphore") is an object with two methods: lock() and unlock(), or acquire() and release(), or decrement() and increment() (or P() and V() in really old programs). The lock() method does not return until the calling thread "owns" the mutex, and a subsequent call to the unlock() method from the same thread relinquishes ownership.
No two threads will be allowed to own the same mutex at the same time. If two or more threads simultaneously try to lock a mutex, one will immediately "win" ownership, and the others will wait for it to be unlocked again.
Assuming each of the competing threads eventually unlocks the mutex, then all of them eventually will be allowed to own the mutex, but only one-by-one. The lock() function cannot fail. The only thing it can do is wait for the mutex to become available, and then take ownership. If some thread in a buggy program keeps ownership of some mutex forever, then a subsequent attempt by some other thread to lock() that same mutex will wait forever.
We sometimes call the part of the program that comes between a lock() call and the subsequent unlock() call a critical section.
We can use a mutex as an advisory lock.
Imagine a program with three variables, A, B, and C, that are shared by several threads. The program has an important rule: A+B+C must always equal zero. Computer scientists call a rule like that an invariant—it's a statement about the program that always is true.
But, what happens if one thread needs to perform some operation, mut(), that changes the values of A, B, and C? It cannot change all three variables simultaneously, so it must temporarily break the invariant. In that moment, some other thread could see the variables in that broken state, and the program could fail.
We fix that problem by having every thread lock() the same advisory lock (i.e., the same mutex) before accessing A, B, or C. And we make sure that A+B+C=0 again before any thread unlocks the mutex. If the thread calling mut() obeys this rule, then no other thread that also obeys the same rule will ever see A, B, and C in the "broken" state.
If none of the threads in the program ever accesses A, or B, or C without owning the mutex, then we can say that we have effectively made mut() an atomic operation.
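Here is a minimal sketch of that idea; A, B, and C are as above, everything else is illustrative:
import threading

mutex = threading.Lock()
A, B, C = 0, 0, 0            # invariant: A + B + C == 0

def mut(delta):
    global A, B
    with mutex:
        A += delta           # the invariant is broken here...
        B -= delta           # ...and restored here, before the unlock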
You actually should lock a mutex when accessing shared variables regardless of any invariant—do it even if accessing just a single, primitive flag or integer—because using mutexes on a multi-CPU machine enables the different CPUs to see a consistent view of memory. In modern systems, access to the same variable by more than one thread with no locking can lead to undefined behavior.
A program with more than one invariant may use more than one mutex object to protect them: One mutex for each. But, programmers are strongly advised to learn about deadlock before writing any program in which a single thread locks more than one mutex at the same time.
"Deadlock" is the answer to a whole other question, but TLDR, it's what happens when Thread1 owns mutexA, and it's waiting to acquire mutexB; while at the same time, Thread2 owns mutexB, and is waiting to acquire mutexA. It's a thing that actually happens sometimes in real, commercial software, that was written by smart programmers. Usually it's because there were a lot of smart programmers, who weren't always talking to each other.

Correctness of modified consumer/producer

I am creating a Sound class to play notes and would like feedback on the correctness and conciseness of my design. This class differs from the typical consumer/producer in two ways:
The consumer should respond to events, such as to shut down the thread, or otherwise continue forever. The typical consumer/producer exits when the queue is empty. For example, a thread waiting in queue.get cannot handle additional notifications.
Each set of notes submitted by the producer should overwrite any unprocessed notes remaining on the queue.
Originally I had the consumer process one note at a time using the queue module. I found continually acquiring and releasing the lock without any competition to be inefficient, and as previously noted, queue.get prevents waiting on additional events. So instead of building upon that, I rewrote it into:
import threading

queue = []
condition = threading.Condition()
interrupt = threading.Event()
stop = threading.Event()

def producer():
    while some_condition:
        ns = get_notes()  # [(float,float)]
        with condition:
            queue.clear()
            queue.append(ns)
            interrupt.set()
            condition.notify()
    with condition:
        stop.set()
        condition.notify()
    consumer.join()

def consumer():
    while not stop.is_set():
        with condition:
            while not (queue or stop.is_set()):
                condition.wait()
            if stop.is_set():
                break
            interrupt.clear()
            ns = queue.pop()
        ss = gen_samples(ns)  # iterator/fast
        for b in grouper(ss, size/2):
            if interrupt.is_set() or stop.is_set()
                break
            stream.write(b)

thread = threading.Thread(target=consumer)
thread.start()
producer()
My questions are as follows:
Is this thread-safe? I want to specifically point out my use of is_set without locks or synchronization (in the for-loop).
Can the events be replaced with boolean variables? I believe so as conflicting writes in both threads (data race) are guarded by the condition variable. There is a race condition between setting and checking events but I do not believe it affects program flow.
Is there a more efficient approach/algorithm utilizing different synchronization primitives from the threading module?
edit: Found and fixed a possible deadlock described in Why does Python threading.Condition() notify() require a lock?
Analyzing thread-safety in Python can take into account the Global Interpreter Lock (GIL): no two threads will execute Python code simultaneously. Assignments to variables or object fields are effectively atomic (there are no half-assigned variables) and changes propagate effectively immediately to other threads.
This means that your use of Event.is_set() is already equivalent to using plain booleans. An event is a bool guarded by a Condition. The is_set() method checks the boolean directly. The set() method acquires the Condition, sets the boolean, and notifies all waiting threads. The wait() method waits until set() is invoked. The clear() method acquires the Condition and unsets the boolean. Since you never wait() for any Event, and setting the boolean is atomic, the Condition in the Event is effectively unused.
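As a rough sketch of that structure (modeled on CPython's threading.py but simplified, e.g. no timeout handling):
import threading

class SketchEvent:
    def __init__(self):
        self._cond = threading.Condition()
        self._flag = False

    def is_set(self):
        return self._flag            # a plain read; atomic under the GIL

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()  # wake anyone blocked in wait()

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self):
        with self._cond:
            while not self._flag:
                self._cond.wait()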
This might get rid of a couple of locks, but isn't really a huge efficiency win. A Condition is still an abstraction over a lock, but the built-in Queue type uses locks directly. Thus, I would assume that the built-in queue is no less performant than your solution, even for a single consumer.
Your main issue with the built-in queue is that “continually acquiring and releasing the lock without any competition [is] inefficient”. This is wrong on two counts:
Due to Python's GIL, there is little competition in either case.
Acquiring uncontested locks is very efficient.
So while your solution is probably sufficiently correct (I can see no opportunity for deadlock) it is unlikely to be particularly efficient. (There are just some small mistakes, like using stop instead of stop.is_set() and some syntax errors.)
If you are seeing poor performance with Python threads that's probably because of CPython, not because of the Queue type. I already mentioned that only one thread can run at a time due to the GIL. If multiple threads want to run, they must be scheduled by the operating system to do so and acquire the GIL. Each thread will wait for 5ms before asking the running thread to give up the GIL (in a manner quite similar to your interrupt flag). And then the thread can do useful work like acquiring a lock for a critical section that must not be interrupted by other threads.
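That 5 ms interval is exposed through the sys module and can be tuned:
import sys

print(sys.getswitchinterval())   # 0.005 seconds by default
sys.setswitchinterval(0.001)     # request more frequent GIL hand-offs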
Possibly, the solution could be to avoid CPython's threads.
If you have multiple CPU-bound tasks, you must use multiple processes. CPython's threads will not run in parallel. However, communication between processes is more expensive.
Consider whether you can combine the producer+consumer directly, possibly using features such as generators.
For an easier time juggling multiple tasks in the same thread, consider using async/await. Event loops are provided by the asyncio module. This is just as fast as Python's threads, with the caveat that tasks don't pre-empt (interrupt) each other. But this can be an advantage: since a task can only be suspended at an await, you don't need most locks and it is easier to reason about the correctness of the code. The downside is that async/await might have even higher latency than using threads.
Python has a concept of "executors" that make it easy and efficient to run tasks in separate threads (for I/O-bound tasks) or separate processes (for CPU-bound tasks); see the sketch after this list.
For communicating between multiple processes, use the types from the multiprocessing module (e.g. Queue, Connection, or Value).
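As promised above, a sketch of the executor approach using concurrent.futures; the task bodies are stand-ins:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def io_task(n):                  # stand-in for I/O-bound work
    time.sleep(0.1)
    return n

def cpu_task(n):                 # stand-in for CPU-bound work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:   # threads: fine for I/O
        print(list(pool.map(io_task, range(4))))
    with ProcessPoolExecutor() as pool:               # processes: sidestep the GIL
        print(list(pool.map(cpu_task, [10**5] * 4)))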

Does the lock in asyncio.Condition have other purpose besides compatibility with threading.Condition?

I'd like to ask about asyncio.Condition. I'm not familiar with the concept, but I've known and understood locks, semaphores, and queues since my student years.
I could not find a good explanation or typical use cases, just this example. I looked at the source. The core functionality is achieved with a FIFO of futures. Each waiting coroutine adds a new future and awaits it. Another coroutine may call notify(), which sets the result of one or optionally more futures from the FIFO, and that wakes up the same number of waiting coroutines. Really simple up to this point.
However, the implementation and the usage are more complicated than this. A waiting coroutine must first acquire a lock associated with the condition in order to be able to wait (and wait() releases it while waiting). The notifier must also acquire the lock to be able to notify(). This leads to a with statement before each operation:
async with condition:
    # condition operation (wait or notify)
or else a RuntimeError occurs.
I do not understand the point of having this lock. What resource do we need to protect with the lock? In asyncio there could be always only one coroutine executing in the event loop, there are no "critical sections" as known from threading.
Is this lock really needed (why?) or is it for compatibility with threading code only?
My first idea was that it's for compatibility, but in that case, why didn't they remove the lock while preserving the usage? I.e., making
async with condition:
basically an optional no-op.
The answer for this is essentially the same as for threading.Condition vs threading.Event; a condition without a lock is an event, not a condition(*).
Conditions are used to signal that a resource is available. Whoever was waiting for the condition can use that resource until they are done with it. To ensure that no one else can use the resource, you need to lock the resource:
resource = get_some_resource()

async with resource.condition:
    await resource.condition.wait()
    # this resource is mine, no-one will touch it
    await resource.do_something_async()
# lock released, resource is available again for the next user
Note how the lock is not released after wait() resumes! Until the lock is released, no other co-routine waiting for the same condition can proceed, access to the resource is made exclusive by virtue of the lock. Note that the lock is released while waiting, so other coroutines can add themselves to the queue, but for wait() to finally return the lock must first be re-acquired.
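To make that concrete, here is a small runnable sketch; the worker/state names are mine, purely for illustration:
import asyncio

async def worker(name, cond, state):
    async with cond:
        # wait_for() re-checks the predicate and returns with the lock held
        await cond.wait_for(lambda: state["available"])
        state["available"] = False        # claim the resource exclusively
    print(name, "owns the resource")
    async with cond:
        state["available"] = True         # hand the resource back
        cond.notify()                     # wake the next waiter, if any

async def main():
    cond = asyncio.Condition()
    state = {"available": False}
    tasks = [asyncio.create_task(worker(f"w{i}", cond, state))
             for i in range(3)]
    await asyncio.sleep(0.1)              # let every worker reach wait_for()
    async with cond:
        state["available"] = True
        cond.notify()                     # release the first worker
    await asyncio.gather(*tasks)

asyncio.run(main())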
If you don't need to coordinate access to a shared resource, use an event; a condition is basically a lock and event combined into one primitive, avoiding common implementation pitfalls.
Note that multiple conditions can share locks. This would let you signal specific stages, and other coroutines can wait for that specific stage to arrive. The shared lock would coordinate access to a single resource, but different conditions are signalled when each stage is initiated.
For threading, the typical use case offered for conditions is that of a single producer and multiple consumers, all waiting on items from the producer to process. The work queue is the shared resource; the producer acquires the condition lock to push an item into the queue and then calls notify(), at which point the next consumer waiting on the condition is given the lock (as it returns from wait()) and can remove the item from the queue to work on. This doesn't quite translate to a coroutine-based application: coroutines don't have the sitting-idle-waiting-for-work problems threading systems have, and it's much easier to just spin up consumer coroutines as needed (with perhaps a semaphore to impose a ceiling).
Perhaps a better example is the aioimaplib library, which supports IMAP4 transactions in full. These transactions are asynchronous, but you need to have access to the shared connection resource. So the library uses a single Condition object and wait_for() to wait for a specific state to arrive and thus give exclusive connection access to the coroutine waiting for that transaction state.
(*): Events have a different use case from conditions, and thus behave a little differently from a condition without locking. Once set, an event needs to be cleared explicitly, while a condition 'auto-clears' when used and is never 'set' when no one is waiting on it. But if you want to signal between tasks and don't need to control access to a shared resource, then you probably want an event.

Does threading.Condition maintain a collection of Thread objects?

Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it as storing the threads is a reasonable conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()   # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
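A condensed sketch of that pattern, modeled on the queue module's design rather than copied from it:
import collections, threading

class BoundedBuffer:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._items = collections.deque()
        lock = threading.Lock()
        self.not_empty = threading.Condition(lock)  # two Conditions...
        self.not_full = threading.Condition(lock)   # ...sharing one Lock

    def put(self, item):
        with self.not_full:
            while len(self._items) >= self.maxsize:
                self.not_full.wait()
            self._items.append(item)
            self.not_empty.notify()     # the shared lock is held, so this is legal

    def get(self):
        with self.not_empty:
            while not self._items:
                self.not_empty.wait()
            item = self._items.popleft()
            self.not_full.notify()
            return item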
