Python threading and GIL

Python threading and GIL - python

I was reading about the GIL and it never really specified if this includes the main thread or not (i assume so). Reason I ask is because I have a program with threads setup that modify a dictionary. The main thread adds/deletes based on player input while a thread loops the data updating and changing data.
However in some cases a thread may iterate over the dictionary keys where one could delete them. If there is a so called GIL and they are run sequentially, why am I getting dict changed errors? If only one is suppose to run at a time, then technically this should not happen.
Can anyone shed some light on such a thing? Thank you.

They are running at the same time, they just don't execute at the same time. The iterations might be interleaved. Quote Python:
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time.
So two for loops might run at the same time, there will just be no (for example) two del dict[index]'s at the same time.

The GIL locks at a Python byte-code level, and applies to all threads, even the main thread. If you have one thread modifying a dictionary, and another iterating keys, they will interfere with each other.
"Only one runs at a time" is true, but you have to understand the unit of granularity. In the case of CPython's GIL, the granularity is a bytecode instruction, so execution can switch between threads at any bytecode.

The gil prevents two threads from modifying the interpreter state simultaneously. It doesn't provide any thread consistency constraints, or any kind of mutex at all on a granularity smaller than the whole process. If you need to read and modify a dict in two threads, you should be using a mutex

Python switches threads more often than you seem to think it does. You say "only one" is supposed to run at a time, and technically that's true, but it depends on your definition of "one." Python's atomic operations are very small. For example: adding a single item to a dictionary. Iteration over an entire dictionary can be interrupted.
You should use a lock object from the threading library to isolate your program's atomic operations.

Related

Concurrency and race condition [duplicate]

Does the presence of python GIL imply that in python multi threading the same operation is not so different from repeating it in a single thread?.
For example, If I need to upload two files, what is the advantage of doing them in two threads instead of uploading them one after another?.
I tried a big math operation in both ways. But they seem to take almost equal time to complete.
This seems to be unclear to me. Can someone help me on this?.
Thanks.

Python's threads get a slightly worse rap than they deserve. There are three (well, 2.5) cases where they actually get you benefits:
If non-Python code (e.g. a C library, the kernel, etc.) is running, other Python threads can continue executing. It's only pure Python code that can't run in two threads at once. So if you're doing disk or network I/O, threads can indeed buy you something, as most of the time is spent outside of Python itself.
The GIL is not actually part of Python, it's an implementation detail of CPython (the "reference" implementation that the core Python devs work on, and that you usually get if you just run "python" on your Linux box or something.
Jython, IronPython, and any other reimplementations of Python generally do not have a GIL, and multiple pure-Python threads can execute simultaneously.
The 0.5 case: Even if you're entirely pure-Python and see little or no performance benefit from threading, some problems are really convenient in terms of developer time and difficulty to solve with threads. This depends in part on the developer, too, of course.

It really depends on the library you're using. The GIL is meant to prevent Python objects and its internal data structures to be changed at the same time. If you're doing an upload, the library you use to do the actual upload might release the GIL while it's waiting for the actual HTTP request to complete (I would assume that is the case with the HTTP modules in the standard library, but I didn't check).
As a side note, if you really want to have things running in parallel, just use multiple processes. It will save you a lot of trouble and you'll end up with better code (more robust, more scalable, and most probably better structured).

It depends on the native code module that's executing. Native modules can release the GIL and then go off and do their own thing allowing another thread to lock the GIL. The GIL is normally held while code, both python and native, are operating on python objects. If you want more detail you'll probably need to go and read quite a bit about it. :)
See:
What is a global interpreter lock (GIL)? and Thread State and the Global Interpreter Lock

Multithreading is a concept where two are more tasks need be completed simultaneously, for example, I have word processor in this application there are N numbers of a parallel task have to work. Like listening to keyboard, formatting input text, sending a formatted text to display unit. In this context with sequential processing, it is time-consuming and one task has to wait till the next task completion. So we put these tasks in threads and simultaneously complete the task. Three threads are always up and waiting for the inputs to arrive, then take that input and produce the output simultaneously.
So multi-threading works faster if we have multi-core and processors. But in reality with single processors, threads will work one after the other, but we feel it's executing with greater speed, Actually, one instruction executes at a time and a processor can execute billions of instructions at a time. So the computer creates illusion that multi-task or thread working parallel. It just an illusion.

Is "Python only run one thread in parallel" true? [duplicate]

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.
From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?
Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

Yes, because of the Global Interpreter Lock (GIL) there can only run one thread at a time. Here are some links with some insights about this:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/
From the last link an interesting quote:
Let me explain what all that means.
Threads run inside the same virtual
machine, and hence run on the same
physical machine. Processes can run
on the same physical machine or in
another physical machine. If you
architect your application around
threads, you’ve done nothing to access
multiple machines. So, you can scale
to as many cores are on the single
machine (which will be quite a few
over time), but to really reach web
scales, you’ll need to solve the
multiple machine problem anyway.
If you want to use multi core, pyprocessing defines an process based API to do real parallelization. The PEP also includes some interesting benchmarks.

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely ever find yourself using a lock statement in python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
Thus, you need to re-consider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.
If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:
stackless python
greenlets
gevent
monocle

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.
import threading
class Foo (threading.Thread):
def __init__(self,x):
self.__x = x
threading.Thread.__init__(self)
def run (self):
print str(self.__x)
for x in xrange(20):
Foo(x).start()
As you have hinted at Python threads are implemented through time-slicing. This is how they get the "parallel" effect.
In my example my Foo class extends thread, I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...
Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.

Note: wherever I mention thread i mean specifically threads in python until explicitly stated.
Threads work a little differently in python if you are coming from C/C++ background. In python, Only one thread can be in running state at a given time.This means Threads in python cannot truly leverage the power of multiple processing cores since by design it's not possible for threads to run parallelly on multiple cores.
As the memory management in python is not thread-safe each thread require an exclusive access to data structures in python interpreter.This exclusive access is acquired by a mechanism called GIL ( global interpretr lock ).
Why does python use GIL?
In order to prevent multiple threads from accessing interpreter state simultaneously and corrupting the interpreter state.
The idea is whenever a thread is being executed (even if it's the main thread), a GIL is acquired and after some predefined interval of time the
GIL is released by the current thread and reacquired by some other thread( if any).
Why not simply remove GIL?
It is not that its impossible to remove GIL, its just that in prcoess of doing so we end up putting mutiple locks inside interpreter in order to serialize access, which makes even a single threaded application less performant.
so the cost of removing GIL is paid off by reduced performance of a single threaded application, which is never desired.
So when does thread switching occurs in python?
Thread switch occurs when GIL is released.So when is GIL Released?
There are two scenarios to take into consideration.
If a Thread is doing CPU Bound operations(Ex image processing).
In Older versions of python , Thread switching used to occur after a fixed no of python instructions.It was by default set to 100.It turned out that its not a very good policy to decide when switching should occur since the time spent executing a single instruction can
very wildly from millisecond to even a second.Therefore releasing GIL after every 100 instructions regardless of the time they take to execute is a poor policy.
In new versions instead of using instruction count as a metric to switch thread , a configurable time interval is used.
The default switch interval is 5 milliseconds.you can get the current switch interval using sys.getswitchinterval().
This can be altered using sys.setswitchinterval()
If a Thread is doing some IO Bound Operations(Ex filesystem access or
network IO)
GIL is release whenever the thread is waiting for some for IO operation to get completed.
Which thread to switch to next?
The interpreter doesn’t have its own scheduler.which thread becomes scheduled at the end of the interval is the operating system’s decision. .

Use threads in python if the individual workers are doing I/O bound operations. If you are trying to scale across multiple cores on a machine either find a good IPC framework for python or pick a different language.

One easy solution to the GIL is the multiprocessing module. It can be used as a drop in replacement to the threading module but uses multiple Interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things but it gives you the advantage of real parallelization if you need it.
It also easily scales to multiple physical machines.
If you need truly large scale parallelization than I would look further but if you just want to scale to all the cores of one computer or a few different ones without all the work that would go into implementing a more comprehensive framework, than this is for you.

Try to remember that the GIL is set to poll around every so often in order to do show the appearance of multiple tasks. This setting can be fine tuned, but I offer the suggestion that there should be work that the threads are doing or lots of context switches are going to cause problems.
I would go so far as to suggest multiple parents on processors and try to keep like jobs on the same core(s).

Will the collections.deque "pop" methods release GIL?

I have a piece of code where I have a processing thread and a monitor thread. In the processing thread, I have a call to collections.deque.popleft function. I wanted to know if this function releases GIL because I want run my monitor thread even when the processing function is blocked on the popleft function

Instead of answering this specific question I'll answer a different question:
What is the Global Interpreter Lock (GIL), and when will it block my program?
In short, the GIL protects the interpreter's state from becoming corrupted by concurrent threads.
For a sense of what it is for, Consider the low level implementation of dict, which somewhere has an array of keys, organized for quick lookup. When you write some code like:
myDict['foo'] = 'bar'
the python interpreter needs to adjust its collection of keys. That might involve things like making more room for the additional key as well as adding the particular key to that array.
If multiple, concurrent threads are modifying that dict, then one thread might reallocate the array while another is in the middle of modifying it, which could cause some unpredictable, probably bad behavior (anything from corrupted data, segfault or heartbleed like memory content leak of sensitive data or arbitrary code execution)
Since that's not the sort of state you can reasonably describe or prevent at the level of your python application, the run-time goes to great lengths to prevent those sorts of problems from occuring. The way it does it is that certain parts of the interpreter, such as the modification of a dict, is surrounded by a PyGILState_Ensure()/PyGILState_Release() pair, so that critical operations always reach a consistent state.
Note however that the scope of this lock is very narrow; it doesn't attempt to protect from general data races, it won't protect you from writing a program with multiple threads overwriting each other's work in a common container (say, a collections.deque), only that even if you do write such a program, it wont' cause the interpreter to crash, you'll always have a valid, working deque. You can add additional application locks, as in queue.Queue to give good concurrent semantics to your application.
Since every operation that the GIL protects is a change in the interpreter state, it never blocks on external events; since those events won't cause the interpreter state to be changed, a signaling condition variable cannot corrupt memory.
The only time you might have a problem is when you have several unblocked threads, since they are potentially all executing code in the low level interpreter, they'll compete for the GIL, and only one thread can hold it, blocking other threads that also want to do some computation.
Unless you are writing C extensions, you probably don't need to worry about it, and unless you have multiple, compute bound threads, in python, you won't be affected by it, either.

Yes -- deque is thread-safe (thanks #hemanths) http://docs.python.org/2/library/collections.html#collections.deque
No, because collections.deque is not thread-safe. Use a Queue, or make your own deque subclass.

Python threads all executing on a single core

I have a Python program that spawns many threads, runs 4 at a time, and each performs an expensive operation. Pseudocode:
for object in list:
t = Thread(target=process, args=(object))
# if fewer than 4 threads are currently running, t.start(). Otherwise, add t to queue
But when the program is run, Activity Monitor in OS X shows that 1 of the 4 logical cores is at 100% and the others are at nearly 0. Obviously I can't force the OS to do anything but I've never had to pay attention to performance in multi-threaded code like this before so I was wondering if I'm just missing or misunderstanding something.
Thanks.

Note that in many cases (and virtually all cases where your "expensive operation" is a calculation implemented in Python), multiple threads will not actually run concurrently due to Python's Global Interpreter Lock (GIL).
The GIL is an interpreter-level lock.
This lock prevents execution of
multiple threads at once in the Python
interpreter. Each thread that wants to
run must wait for the GIL to be
released by the other thread, which
means your multi-threaded Python
application is essentially single
threaded, right? Yes. Not exactly.
Sort of.
CPython uses what’s called “operating
system” threads under the covers,
which is to say each time a request to
make a new thread is made, the
interpreter actually calls into the
operating system’s libraries and
kernel to generate a new thread. This
is the same as Java, for example. So
in memory you really do have multiple
threads and normally the operating
system controls which thread is
scheduled to run. On a multiple
processor machine, this means you
could have many threads spread across
multiple processors, all happily
chugging away doing work.
However, while CPython does use
operating system threads (in theory
allowing multiple threads to execute
within the interpreter
simultaneously), the interpreter also
forces the GIL to be acquired by a
thread before it can access the
interpreter and stack and can modify
Python objects in memory all
willy-nilly. The latter point is why
the GIL exists: The GIL prevents
simultaneous access to Python objects
by multiple threads. But this does not
save you (as illustrated by the Bank
example) from being a lock-sensitive
creature; you don’t get a free ride.
The GIL is there to protect the
interpreters memory, not your sanity.
See the Global Interpreter Lock section of Jesse Noller's post for more details.
To get around this problem, check out Python's multiprocessing module.
multiple processes (with judicious use
of IPC) are[...] a much better
approach to writing apps for multi-CPU
boxes than threads.
-- Guido van Rossum (creator of Python)
Edit based on a comment from #spinkus:
If Python can't run multiple threads simultaneously, then why have threading at all?
Threads can still be very useful in Python when doing simultaneous operations that do not need to modify the interpreter's state. This includes many (most?) long-running function calls that are not in-Python calculations, such as I/O (file access or network requests)) and [calculations on Numpy arrays][6]. These operations release the GIL while waiting for a result, allowing the program to continue executing. Then, once the result is received, the thread must re-acquire the GIL in order to use that result in "Python-land"

Python has a Global Interpreter Lock, which can prevent threads of interpreted code from being processed concurrently.
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
http://wiki.python.org/moin/GlobalInterpreterLock
For ways to get around this, try the multiprocessing module, as advised here:
Does running separate python processes avoid the GIL?

AFAIK, in CPython the Global Interpreter Lock means that there can't be more than one block of Python code being run at any one time. Although this does not really affect anything in a single processor/single-core machine, on a mulitcore machine it means you have effectively only one thread running at any one time - causing all the other core to be idle.

How do threads work in Python, and what are common Python-threading specific pitfalls?

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.
From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?
Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

Yes, because of the Global Interpreter Lock (GIL) there can only run one thread at a time. Here are some links with some insights about this:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/
From the last link an interesting quote:
Let me explain what all that means.
Threads run inside the same virtual
machine, and hence run on the same
physical machine. Processes can run
on the same physical machine or in
another physical machine. If you
architect your application around
threads, you’ve done nothing to access
multiple machines. So, you can scale
to as many cores are on the single
machine (which will be quite a few
over time), but to really reach web
scales, you’ll need to solve the
multiple machine problem anyway.
If you want to use multi core, pyprocessing defines an process based API to do real parallelization. The PEP also includes some interesting benchmarks.

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely ever find yourself using a lock statement in python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
Thus, you need to re-consider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.
If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:
stackless python
greenlets
gevent
monocle

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.
import threading
class Foo (threading.Thread):
def __init__(self,x):
self.__x = x
threading.Thread.__init__(self)
def run (self):
print str(self.__x)
for x in xrange(20):
Foo(x).start()
As you have hinted at Python threads are implemented through time-slicing. This is how they get the "parallel" effect.
In my example my Foo class extends thread, I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...
Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.

Use threads in python if the individual workers are doing I/O bound operations. If you are trying to scale across multiple cores on a machine either find a good IPC framework for python or pick a different language.

One easy solution to the GIL is the multiprocessing module. It can be used as a drop in replacement to the threading module but uses multiple Interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things but it gives you the advantage of real parallelization if you need it.
It also easily scales to multiple physical machines.
If you need truly large scale parallelization than I would look further but if you just want to scale to all the cores of one computer or a few different ones without all the work that would go into implementing a more comprehensive framework, than this is for you.

Try to remember that the GIL is set to poll around every so often in order to do show the appearance of multiple tasks. This setting can be fine tuned, but I offer the suggestion that there should be work that the threads are doing or lots of context switches are going to cause problems.
I would go so far as to suggest multiple parents on processors and try to keep like jobs on the same core(s).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.