In a multi-threaded Python process I have a number of non-daemon threads, by which I mean threads which keep the main process alive even after the main thread has exited / stopped.
My non-daemon threads hold weak references to certain objects in the main thread, but when the main thread ends (control falls off the bottom of the file) these objects do not appear to be garbage collected, and my weak reference finaliser callbacks don't fire.
Am I wrong to expect the main thread to be garbage collected? I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
What have I missed?
Supporting materials
Output from pprint.pprint( threading.enumerate() ) showing the main thread has stopped while others soldier on.
[<_MainThread(MainThread, stopped 139664516818688)>,
<LDQServer(testLogIOWorkerThread, started 139664479889152)>,
<_Timer(Thread-18, started 139663928870656)>,
<LDQServer(debugLogIOWorkerThread, started 139664437925632)>,
<_Timer(Thread-17, started 139664463103744)>,
<_Timer(Thread-19, started 139663937263360)>,
<LDQServer(testLogIOWorkerThread, started 139664471496448)>,
<LDQServer(debugLogIOWorkerThread, started 139664446318336)>]
And since someone always asks about the use-case...
My network service occasionally misses its real-time deadlines (which causes a total system failure in the worst case). This turned out to be because logging of (important) DEBUG data would block whenever the file-system has a tantrum. So I am attempting to retrofit a number of established specialised logging libraries to defer blocking I/O to a worker thread.
Sadly the established usage pattern is a mix of short-lived logging channels which log overlapping parallel transactions, and long-lived module-scope channels which are never explicitly closed.
So I created a decorator which defers method calls to a worker thread. The worker thread is non-daemon to ensure that all (slow) blocking I/O completes before the interpreter exits, and holds a weak reference to the client-side (where method calls get enqueued). When the client-side is garbage collected the weak reference's callback fires and the worker thread knows no more work will be enqueued, and so will exit at its next convenience.
This seems to work fine in all but one important use-case: when the logging channel is in the main thread. When the main thread stops / exits the logging channel is not finalised, and so my (non-daemon) worker thread lives on keeping the entire process alive.
It's a bad idea for your main thread to end without calling join on all non-daemon threads, or to make any assumptions about what happens if you don't.
If you don't do anything very unusual, CPython (at least 2.0-3.3) will cover for you by automatically calling join on all non-daemon threads as pair of _MainThread._exitfunc. This isn't actually documented, so you shouldn't rely on it, but it's what's happening to you.
Your main thread hasn't actually exited at all; it's blocking inside its _MainThread._exitfunc trying to join some arbitrary non-daemon thread. Its objects won't be finalized until the atexit handler is called, which doesn't happen until after it finishes joining all non-daemon threads.
Meanwhile, if you avoid this (e.g., by using thread/_thread directly, or by detaching the main thread from its object or forcing it into a normal Thread instance), what happens? It isn't defined. The threading module makes no reference to it at all, but in CPython 2.0-3.3, and likely in any other reasonable implementation, it falls to the thread/_thread module to decide. And, as the docs say:
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
So, if you manage to avoid joining all of your non-daemon threads, you have to write code that can handle both having them hard-killed like daemon threads, and having them continue running until exit.
If they do continue running, at least in CPython 2.7 and 3.3 on POSIX systems, that the main thread's OS-level thread handle, and various higher-level Python objects representing it, may be still retained, and not get cleaned up by the GC.
On top of that, even if everything were released, you can't rely on the GC ever deleting anything. If your code depends on deterministic GC, there are many cases you can get away with it in CPython (although your code will then break in PyPy, Jython, IronPython, etc.), but at exit time is not one of them. CPython can, and will, leak objects at exit time and let the OS sort 'em out. (This is why writable files that you never close may lose the last few writes—the __del__ method never gets called, and therefore there's nobody to tell them to flush, and at least on POSIX the underlying FILE* doesn't automatically flush either.)
If you want something to be cleaned up when the main thread finishes, you have to use some kind of close function rather than relying on __del__, and you have to make sure it gets triggered via a with block around the main block of code, an atexit function, or some other mechanism.
One last thing:
I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
Do you actually have thread locals somewhere? Or do you just mean locals and/or globals that are only accessed in one thread?
Related
According to Python documentation:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
I know that in older versions of Python cyclic referencing would be one of the examples for this behaviour, however as I understand it, in Python 3 such cycles will successfully be destroyed upon interpreter exit.
I'm wondering what are the cases (as close to exhaustive list as possible) when the interpreter would not destroy an object upon exit.
All examples are implementation details - Python does not promise whether or not it will call __del__ for any particular objects on interpreter exit. That said, one of the simplest examples is with daemon threads:
import threading
import time
def target():
time.sleep(1000)
class HasADel:
def __del__(self):
print('del')
x = HasADel()
threading.Thread(target=target, daemon=True).start()
Here, the daemon thread prevents the HasADel instance from being garbage collected on interpreter shutdown. The daemon thread doesn't actually do anything with that object, but Python can't clean up references the daemon thread owns, and x is reachable from references the daemon thread owns.
When the interpreter exits normally, in such ways as the program ending or sys.exit being called, not all objects are guaranteed to be destroyed. There is probably some amount of logic to this, but not very simple logic. After all, the __del__ method is for freeing memory resources, not other resources (like network connections) - that's what __enter__ and __exit__ are for.
Having said that, there are situtations in which __del__ will most certainly not be called. The parallel to this is atexit functions; they are usually run at exit. However:
Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.
atexit documentation
So, there are situations in which clean-up functions, like __del__, __exit__, and functions registered with atexit will not be called:
The program is killed by a signal not handled by Python - If a program recieves a signal to stop, like SIGINT or SIGQUIT, and it doesn't handle the signal, then it will be stopped.
A Python fatal interpreter error occurs.
os._exit() is called - the documentation says:
Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.
So it is pretty clear that __del__ should not be called.
In conclusion, the interpreter does not guarantee __del__ being called, but there are situations in which it will definitely not be called.
After comparing the quoted sentence from documentation and your title, I thought you misunderstood what __del__ is and what it does.
You used the word "destroyed", and documentation said __del__ may not get called in some situations... The thing is "all" objects are get deleted after the interpreter's process finishes. __del__ is not a destructor and has nothing to do with the destruction of objects. Even if a memory leakage occurs in a process, operating systems(the ones I know at least: Linux, Windows,...) will eventually reclaim that memory for the process after it finishes. So everything is destroyed/deleted!(here and here)
In normal cases when these objects are about to get destroyed, __del__ (better known as finalizer) gets called in the very last step of destruction. In other cases mentioned by other answers, It doesn't get called.
That's why people say don't count on __del__ method for cleaning vital stuff and instead use a context manager. In some scenarios, __del__ may even revive the object by passing a reference around.
When running my code I start a thread that runs for around 50 seconds and does a lot of background stuff. If I run this program and then close it soon after, the stuff still goes on in the background for a while because the thread never dies. How can I kill the thread gracefully in my closeEvent method in my MianWindow class? I've tried setting up a method called exit(), creating a signal 'quitOperation' in the thread in question, and then tried to use
myThread.quitOperation.emit()
I expected that this would call my exit() function in my thread because I have this line in my constructor:
self.quitOperation.connect(self.exit)
However, when I use the first line it breaks, saying that 'myThread' has no attribute 'quitOperation'. Why is this? Is there a better way?
I'm not sure for python, but I assume this myThread.quitOperation.emit() emits a signal for the thread to exit. The point is that while your worker is using the thread and does not return, nor runs QCoreApplication::processEvents(), myThread will never have the chance to actually process your request (this is called thread starvation).
Correct answer may depend on the situation, and the nature of the "stuff" your thread is doing. The most common practice is that the main thread sends a signal to the worker thread where a slot sets a flag. In the blocking process you regularly check this flag. It it is set you stop whatever "stuff" you are doing, tell your worker thread that it can quit (with a signal preferably with queued connection), call a deleteLater() on the worker object itself, and return from any functions you are currently in, so that the thread's event handler can run, and clear your worker object and itself up, the finally quit.
In case your "stuff" is a huge cycle of very fast operation like simple mathematics or directory navigation one-by-one that takes only a few milliseconds each, this will be enough.
In case your "stuff" contain huge blocking parts that you have no control of (an thus you can't place this flag checking call in it), you may need to wait in the main thread until the worker thread quits.
In case you use direct connect to set the flag, or you set it directly, be sure to protect the read/write access of the flag with a QMutex to prevent inconsistent reads, or user a queued connection to ensure single thread access of the flag.
While highly discouraged, optionally you can use QThread's terminate() method to instantaneously kill the thread. You should never do this as it may cause memory leak, heap corruption, resource leaking and any nasty stuff as destructors and clean-up codes will not run, and the execution can be halted at an undesired state.
I have a multithreaded PyQt application that is leaking memory. All the functions that leak memory are worker threads, and I'm wondering if there's something fundamentally wrong with my approach.
When the main application starts, the various worker thread instances are created from the thread classes, but they are not initially started.
When functions run that require a worker thread, the thread is initialized (data and parameters are passed from the main function, and variables are reset from with in the worker instance), and then the thread is started. The worker thread does its business, then completes, but is never formally deleted.
If the function is called again, then again the thread instance is initialized, started, runs, stops, etc...
Because the threads can be called to run again and again, I never saw the need to formally delete them. I originally figured that the same variables were just get re-used, but now I'm wondering if I was mistaken.
Does this sound like the cause of my memory leak? Should I be deleting the threads when they complete even if they're going to be called again?
If this is the root of my problem, can someone point me to a code example of how to handle the thread deleting process properly? (If it matters, I'm using PyQt 4.11.3, Qt 4.8.6, and Python 3.4.3)
I have read most of the similar questions in stackoverflow, but none see to solve my problem. I use ctypes to call a function from dll file. Therefore, I can't edit the source codes of the dll file to add any "end looping" conditions. Also, this function may last long (like some printing command). I need to design a "halt" command in case that something of emergency happens while printing is processed. The only way I can do is to kill the thread.
It is never good to forcibly kill a thread. Your program should be designed to cleanly exit from threads.
You can mark it as "daemon" before starting it. If you exit the main thread it will not wait on daemonized threads.
Terminating a thread can still be done in two ways. You can asynchronously raise a Python exception in a thread, via https://docs.python.org/2/c-api/init.html#c.PyThreadState_SetAsyncExc (as stated, this requires building a C module or using ctypes to make it work). The other approach on Windows is to call the Windows API TerminateThread():
TerminateThread is used to cause a thread to exit. When this occurs,
the target thread has no chance to execute any user-mode code. DLLs
attached to the thread are not notified that the thread is
terminating. The system frees the thread's initial stack.
[...]
TerminateThread is a dangerous function that should only be used in
the most extreme cases. You should call TerminateThread only if you
know exactly what the target thread is doing, and you control all of
the code that the target thread could possibly be running at the time
of the termination. For example, TerminateThread can result in the
following problems: ...
I think this should also be doable using ctypes.
You cannot safely terminate a thread without its cooperation. Threads are not isolated within a process, so unsafely terminating a thread contaminates the process. Please, don't go down this road.
If you need this kind of isolation, you need a process. You can safely terminate a process without its cooperation, though it may leave system objects (such as files) that the process was working on in an intermediate state. In your case, that may mean a print job half-done and a page halfway in the printer. Or it may mean temporary files that don't get removed.
I'm writing to many files in a threaded app and I'm creating one handler per file. I have HandlerFactory class that manages the distribution of these handlers. What I'd like to do is that
thread A requests and gets foo.txt's file handle from the HandlerFactory class
thread B requests foo.txt's file handler
handler class recognizes that this file handle has been checked out
handler class puts thread A to sleep
thread B closes file handle using a wrapper method from HandlerFactory
HandlerFactory notifies sleeping threads
thread B wakes and successfully gets foo.txt's file handle
This is what I have so far,
def get_handler(self, file_path, type):
self.lock.acquire()
if file_path not in self.handlers:
self.handlers[file_path] = open(file_path, type)
elif not self.handlers[file_path].closed:
time.sleep(1)
self.lock.release()
return self.handlers[file_path][type]
I believe this covers the sleeping and handler retrieval successfully, but I am unsure how to wake up all threads, or even better wake up a specific thread.
What you're looking for is known as a condition variable.
Condition Variables
Here is the Python 2 library reference.
For Python 3 it can be found here
Looks like you want a threading.Semaphore associated with each handler (other synchronization objects like Events and Conditions are also possible, but a Semaphore seems simplest for your needs). (Specifically, use a BoundedSemaphore: for your use case, that will raise an exception immediately for programming errors that erroneously release the semaphone more times than they acquire it -- and that's exactly the reason for being of the bounded version of semaphones;-).
Initialize each semaphore to a value of 1 when you build it (so that means the handler is available). Each using-thread calls acquire on the semaphore to get the handler (that may block it), and release on it when it's done with the handler (that will unblock exactly one of the waiting threads). That's simpler than the acquire/wait/notify/release lifecycle of a Condition, and more future-proof too, since as the docs for Condition say:
The current implementation wakes up
exactly one thread, if any are
waiting. However, it’s not safe to
rely on this behavior. A future,
optimized implementation may
occasionally wake up more than one
thread.
while with a Semaphore you're playing it safe (the semantics whereof are safe to rely on: if a semaphore is initialized to N, there are at all times between 0 and N-1 [[included]] threads that have successfully acquired the semaphore and not yet released it).
You do realize that Python has a giant lock, so that most of the benefits of multi-threading you do not get, right?
Unless there is some reason for the master thread to do something with the results of each worker, you may wish to consider just forking off another process for each request. You won't have to deal with locking issues then. Have the children do what they need to do, then die. If they do need to communicate back, do it over a pipe, with XMLRPC, or through a sqlite database (which is threadsafe).