Confusion around thread lock in the core program - python

My program consists of a main core function and a thread that is started inside that function to do some tasks (check order status and update the ledger). In my core program, I'm using objects of multiple classes hundreds of times, and I need my thread to be able to make changes and add data to those objects, which are shared with the core function. In that thread, I have implemented a thread lock to make sure everything runs smoothly. My question is whether I need to take the lock every time I use those objects in the core function, or whether it is sufficient to only lock the resources in the thread.
My apologies in advance for not sharing the code, I can't.

The effect of holding a lock (i.e. having acquired and not yet released it) is that any other thread that tries to acquire the same lock will be blocked until the first thread has released the lock. In other words, at any given time at most one thread can hold the lock.
If only one thread ever uses the lock, nothing is achieved: a lock protects shared data only when every thread that touches that data acquires it. So yes, the core function must also take the lock around each access to the shared objects, not just the worker thread.
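A minimal sketch of that discipline, with invented names (ledger, order_status_worker) standing in for the asker's objects, assuming a plain threading.Lock:

import threading

ledger_lock = threading.Lock()
ledger = {}   # stands in for the shared objects from the core function

def order_status_worker():
    with ledger_lock:                  # the worker locks before writing
        ledger["order_1"] = "filled"

def core():
    worker = threading.Thread(target=order_status_worker)
    worker.start()
    with ledger_lock:                  # the core must lock too, every time
        pending = [k for k, v in ledger.items() if v != "filled"]
    worker.join()
    return pending

core()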

Related

Do I need to terminate a started thread before starting another thread?

I have a button which calls a function. This function is called using a thread. If I click the button more than once I get an error: RuntimeError: threads can only be started once. I got the solution on SO (create a new thread). But if I create a new thread every time that button is clicked, what happens to the previous thread. Should I worry about the previous threads?
import tkinter as tk
from tkinter import ttk
import threading

root = tk.Tk()

# creating a new thread on each click
def start_rename():
    new_thread = threading.Thread(target=bulk_rename)
    new_thread.start()

def bulk_rename():
    print("renaming...")

rename_button = ttk.Button(root, text="Bulk Rename", command=start_rename)
rename_button.pack()
root.mainloop()
Here's a different way to say the same things:
A mutex (a.k.a. a "lock", or a "binary semaphore") is an object with two methods: lock() and unlock(), or acquire() and release(), or decrement() and increment() (or P() and V() in really old programs). The lock() method does not return until the calling thread "owns" the mutex, and a subsequent call to the unlock() method from the same thread relinquishes ownership.
No two threads will be allowed to own the same mutex at the same time. If two or more threads simultaneously try to lock a mutex, one will immediately "win" ownership, and the others will wait for it to be unlocked again.
Assuming each of the competing threads eventually unlocks the mutex, then all of them eventually will be allowed to own the mutex, but only one-by-one. The lock() function cannot fail. The only thing it can do is wait for the mutex to become available, and then take ownership. If some thread in a buggy program keeps ownership of some mutex forever, then a subsequent attempt by some other thread to lock() that same mutex will wait forever.
We sometimes call the part of the program that comes between a lock() call and the subsequent unlock() call a critical section.
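To make that concrete, here is a small demonstration (not from the original answer) in which two threads increment a shared counter inside a critical section; with the lock the result is always 200000, while without it updates can be lost even under CPython's GIL, because counter += 1 is a read-modify-write:

import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:          # entering the critical section
            counter += 1    # read-modify-write, safe only under the lock

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # always 200000; remove the lock and it can be less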
We can use a mutex as an advisory lock.
Imagine a program with three variables, A, B, and C, that are shared by several threads. The program has an important rule: A+B+C must always equal zero. Computer scientists call a rule like that an invariant—it's a statement about the program that always is true.
But, what happens if one thread needs to perform some operation, mut(), that changes the values of A, B, and C? It cannot change all three variables simultaneously, so it must temporarily break the invariant. In that moment, some other thread could see the variables in that broken state, and the program could fail.
We fix that problem by having every thread lock() the same advisory lock (i.e., the same mutex) before accessing A, B, or C. And we make sure that A+B+C=0 again before any thread calls unlock() on the mutex. If the thread calling mut() obeys this rule, then no other thread that also obeys the same rule will ever see A, B, and C in the "broken" state.
If none of the threads in the program ever accesses A, or B, or C without owning the mutex, then we can say that we have effectively made mut() an atomic operation.
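Here is that A/B/C scenario as a sketch, assuming mut() and a checking thread are the only code that touches the variables:

import threading

A, B, C = 0, 0, 0
abc_mutex = threading.Lock()

def mut():
    global A, B, C
    with abc_mutex:
        A += 1          # the invariant is broken here...
        B += 1
        C -= 2          # ...and restored here, before the mutex is released

def check_invariant():
    with abc_mutex:
        assert A + B + C == 0   # always true for threads that obey the rule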
You actually should lock a mutex when accessing shared variables regardless of any invariant—do it even if accessing just a single, primitive flag or integer—because using mutexes on a multi-CPU machine enables the different CPUs to see a consistent view of memory. In modern systems, access to the same variable by more than one thread with no locking can lead to undefined behavior.
A program with more than one invariant may use more than one mutex object to protect them: One mutex for each. But, programmers are strongly advised to learn about deadlock before writing any program in which a single thread locks more than one mutex at the same time.
"Deadlock" is the answer to a whole other question, but TLDR, it's what happens when Thread1 owns mutexA, and it's waiting to acquire mutexB; while at the same time, Thread2 owns mutexB, and is waiting to acquire mutexA. It's a thing that actually happens sometimes in real, commercial software, that was written by smart programmers. Usually it's because there were a lot of smart programmers, who weren't always talking to each other.

Does threading.Condition maintain a collection of Thread objects?

Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
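Here is a heavily simplified sketch of that mechanism (not the real threading.py code, which also handles timeouts, re-entrant locks, and error cases), just to show the deque of per-waiter locks the paragraph describes:

import threading
from collections import deque

class SimpleCondition:
    def __init__(self, lock=None):
        self._lock = lock or threading.Lock()
        self._waiters = deque()        # one freshly created lock per waiting thread
        self.acquire = self._lock.acquire
        self.release = self._lock.release

    def wait(self):
        # the caller must already hold self._lock, as with the real Condition
        waiter = threading.Lock()
        waiter.acquire()               # take the new lock once...
        self._waiters.append(waiter)
        self.release()                 # let other threads run while we block
        waiter.acquire()               # ...then block trying to take it again
        self.acquire()                 # reacquire the main lock before returning

    def notify(self):
        # the caller must already hold self._lock
        if self._waiters:
            self._waiters.popleft().release()   # wake one waiting thread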
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:   # this acquires some_lock, blocking until it's available
    do_stuff()    # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
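On a much smaller scale than Queue, here is a usage sketch (an invented Mailbox class) showing the wait/notify pattern: the consumer holds the Condition's lock, waits in a loop until there is something to take, and the producer notifies after putting an item in:

import threading

class Mailbox:
    def __init__(self):
        self._item = None
        self._cond = threading.Condition()   # creates its own Lock internally

    def put(self, item):
        with self._cond:                     # must hold the lock to notify
            self._item = item
            self._cond.notify()              # wake one waiting consumer

    def get(self):
        with self._cond:                     # must hold the lock to wait
            while self._item is None:        # loop guards against stray wakeups
                self._cond.wait()            # releases the lock while blocked
            item, self._item = self._item, None
            return item

box = Mailbox()
threading.Thread(target=lambda: box.put("hello")).start()
print(box.get())   # blocks until the producer has called put()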

Python threading: access a shared variable that is being locked at the moment

Here is a situation that seems to cause problems in my program. There are two threads: one thread has locked some shared variables, while the other wants to access one of those shared variables. What will happen in this case? Does the second thread just wait until the lock is released, or will there be errors because thread 2 is accessing locked variables?

PyQt - multithreaded application leaking memory. Should I be deleting threads when they complete?

I have a multithreaded PyQt application that is leaking memory. All the functions that leak memory are worker threads, and I'm wondering if there's something fundamentally wrong with my approach.
When the main application starts, the various worker thread instances are created from the thread classes, but they are not initially started.
When functions run that require a worker thread, the thread is initialized (data and parameters are passed from the main function, and variables are reset from within the worker instance), and then the thread is started. The worker thread does its business, then completes, but is never formally deleted.
If the function is called again, then again the thread instance is initialized, started, runs, stops, etc...
Because the threads can be called to run again and again, I never saw the need to formally delete them. I originally figured that the same variables were just being re-used, but now I'm wondering if I was mistaken.
Does this sound like the cause of my memory leak? Should I be deleting the threads when they complete even if they're going to be called again?
If this is the root of my problem, can someone point me to a code example of how to handle the thread deleting process properly? (If it matters, I'm using PyQt 4.11.3, Qt 4.8.6, and Python 3.4.3)

Does Python's main thread get garbage collected when it stops?

In a multi-threaded Python process I have a number of non-daemon threads, by which I mean threads which keep the main process alive even after the main thread has exited / stopped.
My non-daemon threads hold weak references to certain objects in the main thread, but when the main thread ends (control falls off the bottom of the file) these objects do not appear to be garbage collected, and my weak reference finaliser callbacks don't fire.
Am I wrong to expect the main thread to be garbage collected? I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
What have I missed?
Supporting materials
Output from pprint.pprint( threading.enumerate() ) showing the main thread has stopped while others soldier on.
[<_MainThread(MainThread, stopped 139664516818688)>,
<LDQServer(testLogIOWorkerThread, started 139664479889152)>,
<_Timer(Thread-18, started 139663928870656)>,
<LDQServer(debugLogIOWorkerThread, started 139664437925632)>,
<_Timer(Thread-17, started 139664463103744)>,
<_Timer(Thread-19, started 139663937263360)>,
<LDQServer(testLogIOWorkerThread, started 139664471496448)>,
<LDQServer(debugLogIOWorkerThread, started 139664446318336)>]
And since someone always asks about the use-case...
My network service occasionally misses its real-time deadlines (which causes a total system failure in the worst case). This turned out to be because logging of (important) DEBUG data would block whenever the file-system has a tantrum. So I am attempting to retrofit a number of established specialised logging libraries to defer blocking I/O to a worker thread.
Sadly the established usage pattern is a mix of short-lived logging channels which log overlapping parallel transactions, and long-lived module-scope channels which are never explicitly closed.
So I created a decorator which defers method calls to a worker thread. The worker thread is non-daemon to ensure that all (slow) blocking I/O completes before the interpreter exits, and holds a weak reference to the client-side (where method calls get enqueued). When the client-side is garbage collected the weak reference's callback fires and the worker thread knows no more work will be enqueued, and so will exit at its next convenience.
This seems to work fine in all but one important use-case: when the logging channel is in the main thread. When the main thread stops / exits the logging channel is not finalised, and so my (non-daemon) worker thread lives on keeping the entire process alive.
It's a bad idea for your main thread to end without calling join on all non-daemon threads, or to make any assumptions about what happens if you don't.
If you don't do anything very unusual, CPython (at least 2.0-3.3) will cover for you by automatically calling join on all non-daemon threads as part of _MainThread._exitfunc. This isn't actually documented, so you shouldn't rely on it, but it's what's happening to you.
Your main thread hasn't actually exited at all; it's blocking inside its _MainThread._exitfunc trying to join some arbitrary non-daemon thread. Its objects won't be finalized until the atexit handler is called, which doesn't happen until after it finishes joining all non-daemon threads.
Meanwhile, if you avoid this (e.g., by using thread/_thread directly, or by detaching the main thread from its object or forcing it into a normal Thread instance), what happens? It isn't defined. The threading module makes no reference to it at all, but in CPython 2.0-3.3, and likely in any other reasonable implementation, it falls to the thread/_thread module to decide. And, as the docs say:
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
So, if you manage to avoid joining all of your non-daemon threads, you have to write code that can handle both having them hard-killed like daemon threads, and having them continue running until exit.
If they do continue running, at least in CPython 2.7 and 3.3 on POSIX systems, the main thread's OS-level thread handle, and various higher-level Python objects representing it, may still be retained, and not get cleaned up by the GC.
On top of that, even if everything were released, you can't rely on the GC ever deleting anything. If your code depends on deterministic GC, there are many cases you can get away with it in CPython (although your code will then break in PyPy, Jython, IronPython, etc.), but at exit time is not one of them. CPython can, and will, leak objects at exit time and let the OS sort 'em out. (This is why writable files that you never close may lose the last few writes—the __del__ method never gets called, and therefore there's nobody to tell them to flush, and at least on POSIX the underlying FILE* doesn't automatically flush either.)
If you want something to be cleaned up when the main thread finishes, you have to use some kind of close function rather than relying on __del__, and you have to make sure it gets triggered via a with block around the main block of code, an atexit function, or some other mechanism.
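A sketch of that advice with invented names (AsyncLogger standing in for the deferred-logging machinery): the worker is told to finish by an explicit close(), here wired to a with block so it runs when the main code ends. In this particular setup an atexit handler alone would be too late, because, as described above, the interpreter tries to join non-daemon threads before the atexit callbacks run:

import queue
import threading

class AsyncLogger:
    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run)  # non-daemon on purpose
        self._worker.start()

    def _run(self):
        while True:
            msg = self._queue.get()
            if msg is None:            # sentinel: no more work is coming
                break
            print(msg)                 # stands in for the slow blocking I/O

    def log(self, msg):
        self._queue.put(msg)

    def close(self):
        self._queue.put(None)          # tell the worker to drain and exit
        self._worker.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

with AsyncLogger() as logger:          # close() runs when the block ends
    logger.log("deferred log line")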
One last thing:
I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
Do you actually have thread locals somewhere? Or do you just mean locals and/or globals that are only accessed in one thread?
