The deprecation of Python's PyEval_ReleaseLock has introduced a problem in our codebase: We want to terminate a Python interpreter from a C callback function using Py_EndInterpreter
So to do that, Python's Docs say that you must hold the GIL when calling this function:
void Py_EndInterpreter(PyThreadState *tstate)
Destroy the (sub-)interpreter represented by the given thread state. The given thread state must be
the current thread state. See the discussion of thread states below. When the call returns, the
current thread state is NULL. All thread states associated with this interpreter are destroyed. (The
global interpreter lock must be held before calling this function and is still held when it returns.)
Py_FinalizeEx() will destroy all sub-interpreters that haven’t been explicitly destroyed at that point.
Great! So we call PyEval_RestoreThread to restore our thread state to the thread we're about to terminate, and then call Py_EndInterpreter.
// Acquire the GIL
PyEval_RestoreThread(thread);
// Tear down the interpreter.
Py_EndInterpreter(thread);
// Now what? We still hold the GIL and we no longer have a valid thread state.
// Previously we did PyEval_ReleaseLock here, but that is now deprecated.
The documentation for PyEval_ReleaseLock says that we should either use PyEval_SaveThread or PyEval_ReleaseThread.
PyEval_ReleaseThread's documentation says that the input thread state must not be NULL. Okay, but we can't pass in the recently deleted thread state.
PyEval_SaveThread will hit a debug assertion if you try to call it after calling Py_EndInterpreter, so that's not an option either.
So, we've currently implemented a hack to get around this issue - we save the thread state of the thread that calls Py_InitializeEx in a global variable, and swap to it after calling Py_EndInterpreter.
// Acquire the GIL
PyEval_RestoreThread(thread);
// Tear down the interpreter.
Py_EndInterpreter(thread);
// Swap to the main thread state.
PyThreadState_Swap(g_init.thread_state_);
PyEval_SaveThread(); // Release the GIL. Probably.
What's the proper solution here? It seems that embedded Python is an afterthought for this API.
Similar question: PyEval_InitThreads in Python 3: How/when to call it? (the saga continues ad nauseam)
Related
According to Python documentation:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
I know that in older versions of Python cyclic referencing would be one of the examples for this behaviour, however as I understand it, in Python 3 such cycles will successfully be destroyed upon interpreter exit.
I'm wondering what are the cases (as close to exhaustive list as possible) when the interpreter would not destroy an object upon exit.
All examples are implementation details - Python does not promise whether or not it will call __del__ for any particular objects on interpreter exit. That said, one of the simplest examples is with daemon threads:
import threading
import time
def target():
time.sleep(1000)
class HasADel:
def __del__(self):
print('del')
x = HasADel()
threading.Thread(target=target, daemon=True).start()
Here, the daemon thread prevents the HasADel instance from being garbage collected on interpreter shutdown. The daemon thread doesn't actually do anything with that object, but Python can't clean up references the daemon thread owns, and x is reachable from references the daemon thread owns.
When the interpreter exits normally, in such ways as the program ending or sys.exit being called, not all objects are guaranteed to be destroyed. There is probably some amount of logic to this, but not very simple logic. After all, the __del__ method is for freeing memory resources, not other resources (like network connections) - that's what __enter__ and __exit__ are for.
Having said that, there are situtations in which __del__ will most certainly not be called. The parallel to this is atexit functions; they are usually run at exit. However:
Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.
atexit documentation
So, there are situations in which clean-up functions, like __del__, __exit__, and functions registered with atexit will not be called:
The program is killed by a signal not handled by Python - If a program recieves a signal to stop, like SIGINT or SIGQUIT, and it doesn't handle the signal, then it will be stopped.
A Python fatal interpreter error occurs.
os._exit() is called - the documentation says:
Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.
So it is pretty clear that __del__ should not be called.
In conclusion, the interpreter does not guarantee __del__ being called, but there are situations in which it will definitely not be called.
After comparing the quoted sentence from documentation and your title, I thought you misunderstood what __del__ is and what it does.
You used the word "destroyed", and documentation said __del__ may not get called in some situations... The thing is "all" objects are get deleted after the interpreter's process finishes. __del__ is not a destructor and has nothing to do with the destruction of objects. Even if a memory leakage occurs in a process, operating systems(the ones I know at least: Linux, Windows,...) will eventually reclaim that memory for the process after it finishes. So everything is destroyed/deleted!(here and here)
In normal cases when these objects are about to get destroyed, __del__ (better known as finalizer) gets called in the very last step of destruction. In other cases mentioned by other answers, It doesn't get called.
That's why people say don't count on __del__ method for cleaning vital stuff and instead use a context manager. In some scenarios, __del__ may even revive the object by passing a reference around.
In a multi-threaded Python process I have a number of non-daemon threads, by which I mean threads which keep the main process alive even after the main thread has exited / stopped.
My non-daemon threads hold weak references to certain objects in the main thread, but when the main thread ends (control falls off the bottom of the file) these objects do not appear to be garbage collected, and my weak reference finaliser callbacks don't fire.
Am I wrong to expect the main thread to be garbage collected? I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
What have I missed?
Supporting materials
Output from pprint.pprint( threading.enumerate() ) showing the main thread has stopped while others soldier on.
[<_MainThread(MainThread, stopped 139664516818688)>,
<LDQServer(testLogIOWorkerThread, started 139664479889152)>,
<_Timer(Thread-18, started 139663928870656)>,
<LDQServer(debugLogIOWorkerThread, started 139664437925632)>,
<_Timer(Thread-17, started 139664463103744)>,
<_Timer(Thread-19, started 139663937263360)>,
<LDQServer(testLogIOWorkerThread, started 139664471496448)>,
<LDQServer(debugLogIOWorkerThread, started 139664446318336)>]
And since someone always asks about the use-case...
My network service occasionally misses its real-time deadlines (which causes a total system failure in the worst case). This turned out to be because logging of (important) DEBUG data would block whenever the file-system has a tantrum. So I am attempting to retrofit a number of established specialised logging libraries to defer blocking I/O to a worker thread.
Sadly the established usage pattern is a mix of short-lived logging channels which log overlapping parallel transactions, and long-lived module-scope channels which are never explicitly closed.
So I created a decorator which defers method calls to a worker thread. The worker thread is non-daemon to ensure that all (slow) blocking I/O completes before the interpreter exits, and holds a weak reference to the client-side (where method calls get enqueued). When the client-side is garbage collected the weak reference's callback fires and the worker thread knows no more work will be enqueued, and so will exit at its next convenience.
This seems to work fine in all but one important use-case: when the logging channel is in the main thread. When the main thread stops / exits the logging channel is not finalised, and so my (non-daemon) worker thread lives on keeping the entire process alive.
It's a bad idea for your main thread to end without calling join on all non-daemon threads, or to make any assumptions about what happens if you don't.
If you don't do anything very unusual, CPython (at least 2.0-3.3) will cover for you by automatically calling join on all non-daemon threads as pair of _MainThread._exitfunc. This isn't actually documented, so you shouldn't rely on it, but it's what's happening to you.
Your main thread hasn't actually exited at all; it's blocking inside its _MainThread._exitfunc trying to join some arbitrary non-daemon thread. Its objects won't be finalized until the atexit handler is called, which doesn't happen until after it finishes joining all non-daemon threads.
Meanwhile, if you avoid this (e.g., by using thread/_thread directly, or by detaching the main thread from its object or forcing it into a normal Thread instance), what happens? It isn't defined. The threading module makes no reference to it at all, but in CPython 2.0-3.3, and likely in any other reasonable implementation, it falls to the thread/_thread module to decide. And, as the docs say:
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
So, if you manage to avoid joining all of your non-daemon threads, you have to write code that can handle both having them hard-killed like daemon threads, and having them continue running until exit.
If they do continue running, at least in CPython 2.7 and 3.3 on POSIX systems, that the main thread's OS-level thread handle, and various higher-level Python objects representing it, may be still retained, and not get cleaned up by the GC.
On top of that, even if everything were released, you can't rely on the GC ever deleting anything. If your code depends on deterministic GC, there are many cases you can get away with it in CPython (although your code will then break in PyPy, Jython, IronPython, etc.), but at exit time is not one of them. CPython can, and will, leak objects at exit time and let the OS sort 'em out. (This is why writable files that you never close may lose the last few writes—the __del__ method never gets called, and therefore there's nobody to tell them to flush, and at least on POSIX the underlying FILE* doesn't automatically flush either.)
If you want something to be cleaned up when the main thread finishes, you have to use some kind of close function rather than relying on __del__, and you have to make sure it gets triggered via a with block around the main block of code, an atexit function, or some other mechanism.
One last thing:
I would have expected that the thread-locals would be deallocated (i.e. garbage collected)...
Do you actually have thread locals somewhere? Or do you just mean locals and/or globals that are only accessed in one thread?
I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and
thread support is enabled) and reset the thread state to NULL,
returning the previous thread state (which is not NULL). If the lock
has been created, the current thread must have acquired it. (This
function is available even when thread support is disabled at compile
time.)
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not
be NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?
First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.
I am embedding Python 3.2 in a C++ application and I have several sub interpreters that run at various times in the programs (created by Py_NewInterpreter). They acquire and release the GIL at various times, but I have run into a problem when I want to destroy one of the sub interpreters.
To destroy a sub interpreter, you have to acquire the GIL. So I do this:
PyEval_AcquireLock(threadstate);
Then I destroy the interpreter with
Py_EndInterpreter(threadstate);
And you would think it would release the GIL because the thing that held it was destroyed. However, the documentation for Py_EndInterpreter says:
The given thread state must be the
current thread state. See the
discussion of thread states below.
When the call returns, the current
thread state is NULL. (The global interpreter lock must be held before calling this function and is still held when it returns.)
So if I have to hold the GIL when I destroy a sub interpreter and destroying the sub interpreter sets it to NULL and I have to have the thread that acquired the GIL to release it, how do I release the GIL after destroying a sub-interpreter?
What happens if you call PyEval_ReleaseLock() directly after you call Py_EndInterpreter()? That's what the docs tells you to do anyway. :)
I am trying to write a C++ class that calls Python methods of a class that does some I/O operations (file, stdout) at once. The problem I have ran into is that my class is called from different threads: sometimes main thread, sometimes different others. Obviously I tried to apply the approach for Python calls in multi-threaded native applications. Basically everything starts from PyEval_AcquireLock and PyEval_ReleaseLock or just global locks. According to the documentation here when a thread is already locked a deadlock ensues. When my class is called from the main thread or other one that blocks Python execution I have a deadlock.
Python> Cfunc1() - C++ func that creates threads internally which lead to calls in "my class",
It stuck on PyEval_AcquireLock, obviously the Python is already locked, i.e. waiting for C++ Cfunc1 call to complete... It completes fine if I omit those locks. Also it completes fine when Python interpreter is ready for the next user command, i.e. when thread is calling funcs in the background - not inside of a native call
I am looking for a workaround. I need to distinguish whether or not the global lock is allowed, i.e. Python is not locked and ready to receive the next command... I tried PyGIL_Ensure, unfortunately I see hang.
Any known API or solution for this ?
(Python 2.4)
Unless you have wrapped your C++ code quite peculiarly, when any Python thread calls into your C++ code, the GIL is held. You may release it in your C++ code (if you want to do some consuming task that doesn't require any Python interaction), and then will have to acquire it again when you want to do any Python interaction -- see the docs: if you're just using the good old C API, there are macros for that, and the recommended idiom is
Py_BEGIN_ALLOW_THREADS
...Do some blocking I/O operation...
Py_END_ALLOW_THREADS
the docs explain:
The Py_BEGIN_ALLOW_THREADS macro opens
a new block and declares a hidden
local variable; the
Py_END_ALLOW_THREADS macro closes the
block. Another advantage of using
these two macros is that when Python
is compiled without thread support,
they are defined empty, thus saving
the thread state and GIL
manipulations.
So you just don't have to acquire the GIL (and shouldn't) until after you've explicitly released it (ideally with that macro) and need to interact with Python in any way again. (Where the docs say "some blocking I/O operation", it could actually be any long-running operation with no Python interaction whatsoever).