Python threads all executing on a single core

Python threads all executing on a single core - python

I have a Python program that spawns many threads, runs 4 at a time, and each performs an expensive operation. Pseudocode:
for object in list:
t = Thread(target=process, args=(object))
# if fewer than 4 threads are currently running, t.start(). Otherwise, add t to queue
But when the program is run, Activity Monitor in OS X shows that 1 of the 4 logical cores is at 100% and the others are at nearly 0. Obviously I can't force the OS to do anything but I've never had to pay attention to performance in multi-threaded code like this before so I was wondering if I'm just missing or misunderstanding something.
Thanks.

Note that in many cases (and virtually all cases where your "expensive operation" is a calculation implemented in Python), multiple threads will not actually run concurrently due to Python's Global Interpreter Lock (GIL).
The GIL is an interpreter-level lock.
This lock prevents execution of
multiple threads at once in the Python
interpreter. Each thread that wants to
run must wait for the GIL to be
released by the other thread, which
means your multi-threaded Python
application is essentially single
threaded, right? Yes. Not exactly.
Sort of.
CPython uses what’s called “operating
system” threads under the covers,
which is to say each time a request to
make a new thread is made, the
interpreter actually calls into the
operating system’s libraries and
kernel to generate a new thread. This
is the same as Java, for example. So
in memory you really do have multiple
threads and normally the operating
system controls which thread is
scheduled to run. On a multiple
processor machine, this means you
could have many threads spread across
multiple processors, all happily
chugging away doing work.
However, while CPython does use
operating system threads (in theory
allowing multiple threads to execute
within the interpreter
simultaneously), the interpreter also
forces the GIL to be acquired by a
thread before it can access the
interpreter and stack and can modify
Python objects in memory all
willy-nilly. The latter point is why
the GIL exists: The GIL prevents
simultaneous access to Python objects
by multiple threads. But this does not
save you (as illustrated by the Bank
example) from being a lock-sensitive
creature; you don’t get a free ride.
The GIL is there to protect the
interpreters memory, not your sanity.
See the Global Interpreter Lock section of Jesse Noller's post for more details.
To get around this problem, check out Python's multiprocessing module.
multiple processes (with judicious use
of IPC) are[...] a much better
approach to writing apps for multi-CPU
boxes than threads.
-- Guido van Rossum (creator of Python)
Edit based on a comment from #spinkus:
If Python can't run multiple threads simultaneously, then why have threading at all?
Threads can still be very useful in Python when doing simultaneous operations that do not need to modify the interpreter's state. This includes many (most?) long-running function calls that are not in-Python calculations, such as I/O (file access or network requests)) and [calculations on Numpy arrays][6]. These operations release the GIL while waiting for a result, allowing the program to continue executing. Then, once the result is received, the thread must re-acquire the GIL in order to use that result in "Python-land"

Python has a Global Interpreter Lock, which can prevent threads of interpreted code from being processed concurrently.
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
http://wiki.python.org/moin/GlobalInterpreterLock
For ways to get around this, try the multiprocessing module, as advised here:
Does running separate python processes avoid the GIL?

AFAIK, in CPython the Global Interpreter Lock means that there can't be more than one block of Python code being run at any one time. Although this does not really affect anything in a single processor/single-core machine, on a mulitcore machine it means you have effectively only one thread running at any one time - causing all the other core to be idle.

Related

Multiprocessing multithreading GIL?

So, since several days I do a lot of research about multiprocessing and multithreading on python and i'm very confused about many thing. So many times I see someone talking about GIL something that doesn't allow Python code to execute on several cpu cores, but when I code a program who create many threads I can see several cpu cores are active.
1st question: What's is really GIL? does it work? I think about something like when a process create too many thread the OS distributed task on multi cpu. Am I right?
Other thing, I want take advantage of my cpus. I think about something like create as much process as cpu core and on this each process create as much thread as cpu core. Am I on the right lane?

To start with, GIL only ensures that only one cpython bytecode instruction will run at any given time. It does not care about which CPU core runs the instruction. That is the job of the OS kernel.
So going over your questions:
GIL is just a piece of code. The CPython Virtual machine is the process which first compiles the code to Cpython bytecode but it's normal job is to interpret the CPython bytecode. GIL is a piece of code that ensures a single line of bytecode runs at a time no matter how many threads are running. Cpython Bytecode instructions is what constitutes the virtual machine stack. So in a way, GIL will ensure that only one thread holds the GIL at any given point of time. (also that it keeps releasing the GIL for other threads and not starve them.)
Now coming to your actual confusion. You mention that when you run a program with many threads, you can see multiple (may be all) CPU cores firing up. So I did some experimentation and found that your findings are right (which is obvious) but the behaviour is similar in a non threaded version too.
def do_nothing(i):
time.sleep(0.0001)
return i*2
ThreadPool(20).map(do_nothing, range(10000))
def do_nothing(i):
time.sleep(0.0001)
return i*2
[do_nothing(i) for i in range(10000)]
The first one in multithreaded and the second one is not. When you compare the CPU usage by by both the programs, you will find that in both the cases multiple CPU cores will fire up. So what you noticed, although right, has not much to do with GIL or threading. CPU usage going high in multiple cores is simply because OS kernel will distribute the execution of code to different cores based on availability.
Your last question is more of an experimental thing as different programs have different CPU/io usage. You just have to be aware of the cost of creation of a thread and a process and the working of GIL & PVM and optimize the number of threads and processes to get the maximum perf out.
You can go through this talk by David Beazley to understand how multithreading can make your code perform worse (or better).

There are answers about what the Global Interpreter Lock (GIL) is here. Buried among the answers is mention of Python "bytecode", which is central to the issue. When your program is compiled, the output is bytecode, i.e. low-level computer instructions for a fictitious "Python" computer, that gets interpreted by the Python interpreter. When the interpreter is executing a bytecode, it serializes execution by acquiring the Global Interpreter Lock. This means that two threads cannot be executing bytecode concurrently on two different cores. This also means that true multi-threading is not implemented. But does this mean that there is no reason to use threading? No! Here are a couple of situations where threading is still useful:
For certain operations the interpreter will release the GIL, i.e. when doing I/O. So consider as an example the case where you want to fetch a lot of URLs from different websites. Most of the time is spent waiting for a response to be returned once the request is made and this waiting can be overlapped even if formulating the requests has to be done serially.
Many Python functions and modules are implemented in the C language and are free to release the GIL after being called if their processing requirements allow it. The numpy module is one such highly optimized package.
Consequently, threading is best used when the tasks are not CPU-intensive, i.e. they do a lot of waiting for I/O to complete, or they do a lot of sleeping, etc.

Do coroutines from PEP-492 bypass the GIL in Python 3.5?

Python 3.5 includes co-routine support with PEP-492; which is great and all.... assuming the coroutines go around the GIL; does anyone know if that's the case? Or, should I just keep using the multiprocessing module?

I would say, that coroutines do not bypass the GIL.
The reason is, that coroutines are never processed in parallel. Coroutines are a language feature that does some kind of pseudo-parallel processing without real parallel task, thread or anything else execution. Only one coroutine is ever executed at once.
Remember: Even, when using coroutines, you can still have different threads in your program!
So, the GIL is not affected, because the GIL is only a means to prevent real parallel processing of threads in specific parts of the Python interpreter, that could end in corruption of global data.
When you are using a thread-enabled version of Python, you will have the GIL -- and no thread and also no coroutines "bypass" the GIL. But the coroutines are not affected by the GIL, as threads are, since threads could be stopped by the GIL, when entering critical sections. Coroutines are not, unless a second thread is running ... (but that is the problem of the threading in your program, not of the coroutines).
Of course, you can (at least it was possible some time ago) create a Python interpreter version with no thread-support (when you really don't need it), by compiling the interpreter yourself. In such a version, the GIL should be not executed.
But you must be sure, that no module you are using, is using threading, since that module would break.
Edit:
After reading your question a second time, I guess, what you really want to ask is, if the GIL-overhead (applicable in threads) is lower in coroutines.
I would say, yes. Even when the GIL is active in your version of the interpreter. Because by limiting your execution to cooperative multiprocessing, the GIL will not (or less, when you still have more than one thread) affect your coroutines, as it is when you have multiple worker threads. There will be less (or no) contention on reserving the GIL.
There is also the webserver "Tornado" that uses coprocessing technologies for a longer time now in Python, very successfully. That should show, that coprocessing is definitively a good choice, when using Python. Also there are other examples of programs that are fast by using coprocessing technology (e.g. Nginx).

Is "Python only run one thread in parallel" true? [duplicate]

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.
From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?
Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

Yes, because of the Global Interpreter Lock (GIL) there can only run one thread at a time. Here are some links with some insights about this:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/
From the last link an interesting quote:
Let me explain what all that means.
Threads run inside the same virtual
machine, and hence run on the same
physical machine. Processes can run
on the same physical machine or in
another physical machine. If you
architect your application around
threads, you’ve done nothing to access
multiple machines. So, you can scale
to as many cores are on the single
machine (which will be quite a few
over time), but to really reach web
scales, you’ll need to solve the
multiple machine problem anyway.
If you want to use multi core, pyprocessing defines an process based API to do real parallelization. The PEP also includes some interesting benchmarks.

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely ever find yourself using a lock statement in python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
Thus, you need to re-consider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.
If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:
stackless python
greenlets
gevent
monocle

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.
import threading
class Foo (threading.Thread):
def __init__(self,x):
self.__x = x
threading.Thread.__init__(self)
def run (self):
print str(self.__x)
for x in xrange(20):
Foo(x).start()
As you have hinted at Python threads are implemented through time-slicing. This is how they get the "parallel" effect.
In my example my Foo class extends thread, I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...
Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.

Note: wherever I mention thread i mean specifically threads in python until explicitly stated.
Threads work a little differently in python if you are coming from C/C++ background. In python, Only one thread can be in running state at a given time.This means Threads in python cannot truly leverage the power of multiple processing cores since by design it's not possible for threads to run parallelly on multiple cores.
As the memory management in python is not thread-safe each thread require an exclusive access to data structures in python interpreter.This exclusive access is acquired by a mechanism called GIL ( global interpretr lock ).
Why does python use GIL?
In order to prevent multiple threads from accessing interpreter state simultaneously and corrupting the interpreter state.
The idea is whenever a thread is being executed (even if it's the main thread), a GIL is acquired and after some predefined interval of time the
GIL is released by the current thread and reacquired by some other thread( if any).
Why not simply remove GIL?
It is not that its impossible to remove GIL, its just that in prcoess of doing so we end up putting mutiple locks inside interpreter in order to serialize access, which makes even a single threaded application less performant.
so the cost of removing GIL is paid off by reduced performance of a single threaded application, which is never desired.
So when does thread switching occurs in python?
Thread switch occurs when GIL is released.So when is GIL Released?
There are two scenarios to take into consideration.
If a Thread is doing CPU Bound operations(Ex image processing).
In Older versions of python , Thread switching used to occur after a fixed no of python instructions.It was by default set to 100.It turned out that its not a very good policy to decide when switching should occur since the time spent executing a single instruction can
very wildly from millisecond to even a second.Therefore releasing GIL after every 100 instructions regardless of the time they take to execute is a poor policy.
In new versions instead of using instruction count as a metric to switch thread , a configurable time interval is used.
The default switch interval is 5 milliseconds.you can get the current switch interval using sys.getswitchinterval().
This can be altered using sys.setswitchinterval()
If a Thread is doing some IO Bound Operations(Ex filesystem access or
network IO)
GIL is release whenever the thread is waiting for some for IO operation to get completed.
Which thread to switch to next?
The interpreter doesn’t have its own scheduler.which thread becomes scheduled at the end of the interval is the operating system’s decision. .

Use threads in python if the individual workers are doing I/O bound operations. If you are trying to scale across multiple cores on a machine either find a good IPC framework for python or pick a different language.

One easy solution to the GIL is the multiprocessing module. It can be used as a drop in replacement to the threading module but uses multiple Interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things but it gives you the advantage of real parallelization if you need it.
It also easily scales to multiple physical machines.
If you need truly large scale parallelization than I would look further but if you just want to scale to all the cores of one computer or a few different ones without all the work that would go into implementing a more comprehensive framework, than this is for you.

Try to remember that the GIL is set to poll around every so often in order to do show the appearance of multiple tasks. This setting can be fine tuned, but I offer the suggestion that there should be work that the threads are doing or lots of context switches are going to cause problems.
I would go so far as to suggest multiple parents on processors and try to keep like jobs on the same core(s).

python threads & sockets

I have a "I just want to understand it" question..
first, I'm using python 2.6.5 on Ubuntu.
So.. threads in python (via thread module) are only "threads", and is just tells the GIL to run code blocks from each "thread" in a certain period of time and so and so.. and there aren't actually real threads here..
So the question is - if I have a blocking socket in one thread, and now I'm sending data and block the thread for like 5 seconds. I expected to block all the program because it is one C command (sock.send) that is blocking the thread. But I was surprised to see that the main thread continue to run.
So the question is - how can GIL is able to continue and run the rest of the code after it reaches a blocking command like send? Isn't it have to use real thread in here?
Thanks.

Python uses "real" threads, i.e. threads of the underlying platform. On Linux, it will use the pthread library (if you are interested, here is the implementation).
What is special about Python's threads is the GIL: A thread can only modify Python data structures if it holds this global lock. Thus, many Python operations cannot make use of multiple processor cores. A thread with a blocking socket won't hold the GIL though, so it does not affect other threads.
The GIL is often misunderstood, making people believe threads are almost useless in Python. The only thing the GIL prevents is concurrent execution of "pure" Python code on multiple processor cores. If you use threads to make a GUI responsive or to run other code during blocking I/O, the GIL won't affect you. If you use threads to run code in some C extension like NumPy/SciPy concurrently on multiple processor cores, the GIL won't affect you either.

Python wiki page on GIL mentions that
Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.

GIL (the Global Interpreter Lock) is just a lock, it does not run anything by itself. Rather, the Python interpreter captures and releases that lock as necessary. As a rule, the lock is held when running Python code, but released for calls to lower-level functions (such as sock.send). As Python threads are real OS-level threads, threads will not run Python code in parallel, but if one thread invokes a long-running C function, the GIL is released and another Python code thread can run until the first one finishes.

How do threads work in Python, and what are common Python-threading specific pitfalls?

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.
From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?
Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

Yes, because of the Global Interpreter Lock (GIL) there can only run one thread at a time. Here are some links with some insights about this:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/
From the last link an interesting quote:
Let me explain what all that means.
Threads run inside the same virtual
machine, and hence run on the same
physical machine. Processes can run
on the same physical machine or in
another physical machine. If you
architect your application around
threads, you’ve done nothing to access
multiple machines. So, you can scale
to as many cores are on the single
machine (which will be quite a few
over time), but to really reach web
scales, you’ll need to solve the
multiple machine problem anyway.
If you want to use multi core, pyprocessing defines an process based API to do real parallelization. The PEP also includes some interesting benchmarks.

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely ever find yourself using a lock statement in python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
Thus, you need to re-consider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.
If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:
stackless python
greenlets
gevent
monocle

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.
import threading
class Foo (threading.Thread):
def __init__(self,x):
self.__x = x
threading.Thread.__init__(self)
def run (self):
print str(self.__x)
for x in xrange(20):
Foo(x).start()
As you have hinted at Python threads are implemented through time-slicing. This is how they get the "parallel" effect.
In my example my Foo class extends thread, I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...
Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.

Use threads in python if the individual workers are doing I/O bound operations. If you are trying to scale across multiple cores on a machine either find a good IPC framework for python or pick a different language.

One easy solution to the GIL is the multiprocessing module. It can be used as a drop in replacement to the threading module but uses multiple Interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things but it gives you the advantage of real parallelization if you need it.
It also easily scales to multiple physical machines.
If you need truly large scale parallelization than I would look further but if you just want to scale to all the cores of one computer or a few different ones without all the work that would go into implementing a more comprehensive framework, than this is for you.

Try to remember that the GIL is set to poll around every so often in order to do show the appearance of multiple tasks. This setting can be fine tuned, but I offer the suggestion that there should be work that the threads are doing or lots of context switches are going to cause problems.
I would go so far as to suggest multiple parents on processors and try to keep like jobs on the same core(s).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.