Why do I need the GIL for PyMem_Malloc()?

As per this discussion, PyMem_Malloc() requires the GIL; however, if the function is nothing more than an alias for malloc(), who cares?

Because it is sometimes more than simply an alias for malloc(). Sometimes it is an alias for _PyMem_DebugMalloc() and there is some global accounting there to keep track of unique memory objects. There's no real point in releasing the GIL just for a PyMem_Malloc() call, so you're probably doing something more complicated in C. If that's the case, you can simply call malloc() and not get any of the debugging stuff.

Related

Why is using global in python threading bad practice?

I read all over various websites how using global is bad. I have an application where I am storing say, 300 objects, in an array. I want to have 8 threads running through these 300 objects. These objects are different sizes, say between 10 and 50,000 integers and randomly distributed (think worst case scenario here).
Basically, I want to start up 8 threads, do a process on an object, report or store the results, and pick up a new object, 300 times.
The solution I can think of is to set a global lock and a global counter, lock the array, get the current object, increment the counter, release the lock.
There is 1 lock for 8 threads. There is 1 counter for 8 threads. I have 2 global objects. I store results in a dictionary, possibly also global to make it visible to all threads but also threadsafe. I am not bothering to do something stupid like subclassing thread and passing along 300/8 objects to each object because multiprocessing.pool does that for me. So how would you do it? Also, convince me that using global in this situation is bad.

Classifying approaches as either "good" or "bad" is a bit simplistic -- in practice, if a design makes sense to you and accomplishes the goals you set out to accomplish, then it doesn't matter whether other people (except possibly your boss) think it's "good" or not; it either works or it doesn't. On the other hand, if your design causes you a lot of pain and suffering, that's a sign that you might not be using the most suitable design for the task at hand.
That said, there are some valid reasons why a lot of people think that global variables are problematic, particularly when combined with multithreading.
The general problem with global variables (with or without multithreading) is that as your program grows larger, it becomes increasingly difficult to mentally keep track of which parts of your program might be reading and/or updating the global variables' values at which times. Since they are global, by definition all parts of your program have access to them, so when you're trying to trace through your program to figure out who set a global variable to some unexpected value, the list of suspects can become unmanageably large. (This isn't much of a problem for small programs, but the larger your program grows, the worse this problem becomes -- and a lot of programmers have learned, through painful experience, that it's better to nip the problem in the bud by avoiding globals wherever possible in the first place than to have to go back and rewrite a big, complicated, buggy program later on.)
In the specific use-case of a multithreaded program, the anybody-could-be-accessing-my-global-variable-at-any-time property becomes even more fraught with peril, since in a multithreaded scenario, any (non-immutable) data that is shared between threads can only be safely accessed with proper serialization (e.g. by locking a mutex before reading/writing the shared data, and unlocking it afterwards). Ideally programmers would never accidentally read or write any shared+mutable data without locking the mutex -- but programmers are human and will inevitably make mistakes; if given the ability to do so, sooner or later you (or someone else) will forget that access to a particular global variable needs to be serialized, and will just go ahead and read/write it, and then you're in for a lot of pain, because the symptoms will be rare and random, and the cause of the fault won't be obvious.
So smart programmers try to make it impossible to fall into that sort of trap, usually by limiting access to the shared-state to a specific, small, carefully-written set of functions (a.k.a. an API) that implement the serialization correctly so that no other code has to. When doing that, you want to make sure that only the code in this particular API has access to the shared data, and that nobody else does -- something that is impossible to do with a global variable, as by definition everyone has direct access to it.
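As a rough illustration of that idea applied to the question's lock-plus-counter design, here is a minimal sketch in which the lock and the counter live inside one small class. The names WorkList, take, drain and run_workers are hypothetical, invented for this example; it also assumes None is never a real work item, since None is used to signal "nothing left".

```python
import threading


class WorkList:
    """Hands out items one at a time; the lock never leaves this class."""

    def __init__(self, items):
        self._items = list(items)
        self._next = 0
        self._lock = threading.Lock()

    def take(self):
        """Return the next unprocessed item, or None when all are taken."""
        with self._lock:
            if self._next >= len(self._items):
                return None
            item = self._items[self._next]
            self._next += 1
            return item


def drain(work, out):
    # Each worker appends to its own private list, so no lock is needed here.
    while True:
        item = work.take()
        if item is None:
            break
        out.append(item)


def run_workers(items, n_threads=8):
    work = WorkList(items)
    per_thread = [[] for _ in range(n_threads)]
    threads = [threading.Thread(target=drain, args=(work, out))
               for out in per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge the private result lists only after every worker has finished.
    return sorted(x for out in per_thread for x in out)
```

Code outside WorkList can only reach the counter through take(), so there is no way to forget the lock.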
There is also one performance-related reason why people prefer not to mix global variables and multithreading: the more serialization you have to do, the less your program can exploit the power of multiple CPU cores. In particular, it does you no good to have an 8-core CPU if 7 of your 8 threads are spending most of their time blocked, waiting for a mutex to become available.
So how does that relate to globals? It's related in that in most cases it's difficult or impossible to prove that a global variable won't ever be accessed by another thread, which means all accesses to that global variable need to be serialized. With a non-global variable, on the other hand, you can make sure to give a reference to that variable to only a single thread -- at which point you have effectively guaranteed that only that one thread will ever access the variable (since the other threads have no references to it, you know they can't access it), and because you have that guarantee, you no longer need to serialize access to that data, and now your thread can run more efficiently because it never has to block waiting for a mutex.
(Btw, note that CPython in particular suffers from a severe form of implicit serialization caused by its Global Interpreter Lock, which means that even the best multithreaded, CPU-bound Python code will be unlikely to use more than a single CPU core at a time. The only way to get around that is to use multiprocessing instead, or to do the bulk of your program's computations in a lower-level language such as C, so that it can execute without holding the GIL.)
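For the "how would you do it?" part of the question, one possible sketch uses the standard library's concurrent.futures.ThreadPoolExecutor, which does the hand-out-the-next-object bookkeeping itself, so no global lock, counter or results dictionary is needed. The process function and the generated objects are placeholders assumed for this example, not the asker's real workload.

```python
from concurrent.futures import ThreadPoolExecutor


def process(obj):
    # Stand-in for the real per-object work (hypothetical): sum the integers.
    return sum(obj)


def process_all(objects, workers=8):
    # map() gives each object to an idle worker thread and yields the
    # results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, objects))


# 300 lists of different sizes, standing in for the 300 objects.
objects = [list(range(n)) for n in range(10, 310)]
results = process_all(objects)
```

Because of the GIL caveat above, this only speeds things up if the work is I/O-bound or releases the GIL; for CPU-bound pure-Python work, concurrent.futures.ProcessPoolExecutor offers the same interface on top of processes.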

Unclear documentation: Sharing state between processes

There is a part in the Python documentation that is unclear to me:
https://docs.python.org/3.4/library/multiprocessing.html#sharing-state-between-processes
"As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible."
But I cannot find any description above 17.2.1.5 that describes why it is best to avoid using shared state. Any ideas?

Shared state is like a global variable, but… more global.
Not only do you have to consider what parts of your code are reading and modifying the state, but also which running copy of your code is accessing it, and how. This gets even trickier when the state is mutable, i.e. can be changed.
To make sure one thread doesn't stomp on what another thread is doing you have to coordinate access to the state. That could be done using semaphores, message-passing, software transactional memory, etc.
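A minimal sketch of the message-passing option, using threads and queue.Queue for brevity: a single owner thread is the only code that touches the state, and everyone else sends it messages. The function names and the (key, amount) message format are assumptions made up for this example; multiprocessing.Queue supports the same put/get pattern between processes.

```python
import queue
import threading


def owner(inbox, totals):
    # Only this thread ever reads or writes `totals`.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more messages will arrive
            break
        key, amount = msg
        totals[key] = totals.get(key, 0) + amount


def producer(inbox, key, count):
    # Producers never see the state; they only send messages about it.
    for _ in range(count):
        inbox.put((key, 1))


def run():
    inbox = queue.Queue()
    totals = {}
    owner_thread = threading.Thread(target=owner, args=(inbox, totals))
    owner_thread.start()
    producers = [threading.Thread(target=producer, args=(inbox, key, 1000))
                 for key in ("a", "b", "c")]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    inbox.put(None)      # all producers are done, so tell the owner to stop
    owner_thread.join()  # after this, it is safe to read totals here
    return totals
```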
See also https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil.

RAII in Python: What's the point of __del__?

At first glance, it seems like Python's __del__ special method offers much the same advantages a destructor has in C++. But according to the Python documentation (https://docs.python.org/3.4/reference/datamodel.html), there is no guarantee that your object's __del__ method ever gets called at all!
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
So in other words, the method is useless! Isn't it? A hook function that may or may not get called really doesn't do much good, so __del__ offers nothing with regard to RAII. If I have some essential cleanup, I don't need it to run some of the time, whenever the GC feels like it; I need it to run reliably, deterministically and 100% of the time.
I know that Python provides context managers, which are far more useful for that task, but why was __del__ kept around at all? What's the point?

__del__ is a finalizer. It is not a destructor. Finalizers and destructors are entirely different animals.
Destructors are called reliably, and only exist in languages with deterministic memory management (such as C++). Python's context managers (the with statement) can achieve similar effects in certain circumstances. These are reliable because the lifespan of an object is precisely fixed; in C++, objects die when they are explicitly deleted or when some scope is exited (or when a smart pointer deletes them in response to its own destruction). And that's when destructors run.
Finalizers are not called reliably. The only valid use of a finalizer is as an emergency safety net (NB: this article is written from a .NET perspective, but the concepts translate reasonably well). For instance, the file objects returned by open() automatically close themselves when finalized. But you're still supposed to close them yourself (e.g. using the with statement). This is because the objects are destroyed dynamically by the garbage collector, which may or may not run right away, and with generational garbage collection, it may or may not collect some objects in any given pass. Since nobody knows what kinds of optimizations we might invent in the future, it's safest to assume that you just can't know when the garbage collector will get around to collecting your objects. That means you cannot rely on finalizers.
In the specific case of CPython, you get slightly stronger guarantees, thanks to the use of reference counting (which is far simpler and more predictable than garbage collection). If you can ensure that you never create a reference cycle involving a given object, that object's finalizer will be called at a predictable point (when the last reference dies). This is only true of CPython, the reference implementation, and not of PyPy, IronPython, Jython, or any other implementations.
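To make the division of labour concrete, here is a small sketch in which the with statement does the reliable cleanup and __del__ is kept only as the safety net described above. The Resource class and the log list are hypothetical names used purely for illustration.

```python
class Resource:
    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        # Idempotent: calling close() a second time does nothing.
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()   # runs when the with block exits, even on an exception
        return False   # do not swallow the exception

    def __del__(self):
        # Safety net only, for callers who forgot to use `with`.
        self.close()


def use(log):
    try:
        with Resource(log):
            log.append("working")
            raise RuntimeError("boom")
    except RuntimeError:
        log.append("handled")
    return log
```

The "closed" entry is recorded before the exception handler runs, which is the deterministic ordering that RAII-style code needs.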

Because __del__ does get called. It's just that it's unclear when it will, because in CPython if you have circular references, the refcount mechanism can't take care of the object reclamation (and thus its finalization via __del__) and must delegate it to the garbage collector.
The garbage collector then has a problem: it cannot know in which order to break the circular references, because this may trigger additional problems (e.g. freeing memory that is going to be needed in the finalization of another object that is part of the collected loop, triggering a segfault).
The caveat you quote exists because the interpreter may exit for reasons that prevent it from performing the cleanup (e.g. it segfaults, or some C module impolitely calls exit()).
There's PEP 442 for safe object finalization, which was implemented in Python 3.4. I suggest you take a look at it.
https://www.python.org/dev/peps/pep-0442/
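A small sketch of what that looks like in practice, assuming CPython 3.4 or later: two objects with __del__ methods that refer to each other are not reclaimed by reference counting, but the cycle collector finalizes both. The Node class and the log list are hypothetical names for this example.

```python
import gc


class Node:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.other = None

    def __del__(self):
        self.log.append(self.name)


def finalize_cycle():
    log = []
    a, b = Node("a", log), Node("b", log)
    a.other, b.other = b, a  # build a reference cycle
    del a, b                 # the cycle keeps both refcounts above zero
    gc.collect()             # the cycle collector finds and finalizes both
    return sorted(log)       # sorted because the finalization order is unspecified
```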

Will the collections.deque "pop" methods release GIL?

I have a piece of code where I have a processing thread and a monitor thread. In the processing thread, I have a call to the collections.deque.popleft function. I wanted to know if this function releases the GIL, because I want to run my monitor thread even when the processing function is blocked on the popleft function.

Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, consider the low-level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the Python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple, concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause some unpredictable, probably bad behavior (anything from corrupted data or a segfault to a Heartbleed-like leak of sensitive memory contents or arbitrary code execution).
Since that's not the sort of state you can reasonably describe or prevent at the level of your Python application, the run-time goes to great lengths to prevent those sorts of problems from occurring. The way it does it is that a thread must hold the GIL while it executes certain parts of the interpreter, such as the modification of a dict, so that critical operations always reach a consistent state. (Threads created outside Python acquire the GIL with a PyGILState_Ensure()/PyGILState_Release() pair.)
Note however that the scope of this lock is very narrow; it doesn't attempt to protect from general data races. It won't protect you from writing a program with multiple threads overwriting each other's work in a common container (say, a collections.deque); it only guarantees that even if you do write such a program, it won't cause the interpreter to crash, and you'll always have a valid, working deque. You can add additional application locks, as in queue.Queue, to give good concurrent semantics to your application.
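A quick sketch of the "always a valid, working deque" point: several threads append to one deque with no application-level lock, and nothing is lost or corrupted, because the collections documentation describes deque appends and pops as thread-safe. The fill and run names are made up for this example.

```python
import collections
import threading


def fill(d, tag, n):
    # Each append is a single atomic operation on the shared deque.
    for i in range(n):
        d.append((tag, i))


def run(n_threads=4, per_thread=10000):
    d = collections.deque()
    threads = [threading.Thread(target=fill, args=(d, tag, per_thread))
               for tag in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d
```

Items from different threads end up interleaved in an unpredictable order; that interleaving is exactly the kind of application-level concern the GIL does not manage for you.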
Because every operation that the GIL protects is a change to the interpreter's state, the GIL is never held while a thread blocks on external events: those events don't change the interpreter's state (a signaled condition variable cannot corrupt memory), so there is nothing to protect.
The only time you might have a problem is when you have several unblocked threads. Since they are potentially all executing code in the low-level interpreter, they'll compete for the GIL, and only one thread can hold it, blocking the other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple compute-bound threads in Python, you won't be affected by it either.

Yes -- deque is thread-safe (thanks @hemanths): http://docs.python.org/2/library/collections.html#collections.deque
(An earlier version of this answer said no, that collections.deque is not thread-safe and that you should use a Queue or make your own deque subclass. That was wrong.)
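One detail worth adding for the original question: popleft() never blocks. On an empty deque it is documented to raise IndexError immediately, so there is no wait during which the GIL could be held. If the processing thread should sleep until work arrives, queue.Queue.get() does that, and in CPython the waiting thread does not hold the GIL, so a monitor thread keeps running. A minimal sketch, with hypothetical helper names:

```python
import collections
import queue
import threading


def popleft_or_none(d):
    # popleft() on an empty deque raises IndexError right away; it never waits.
    try:
        return d.popleft()
    except IndexError:
        return None


def demo():
    events = []
    q = queue.Queue()
    started = threading.Event()

    def processing():
        started.set()
        item = q.get()  # waits for an item without holding the GIL
        events.append(("got", item))

    worker = threading.Thread(target=processing)
    worker.start()
    started.wait()
    # The monitor (here the main thread) runs freely; the worker cannot
    # get past q.get() until the put() below.
    events.append(("monitor", "ran"))
    q.put("job")
    worker.join()
    return events
```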

Calling Python code from a C thread

I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and
thread support is enabled) and reset the thread state to NULL,
returning the previous thread state (which is not NULL). If the lock
has been created, the current thread must have acquired it. (This
function is available even when thread support is disabled at compile
time.)
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not
be NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?

First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.
