python threads & sockets

I have an "I just want to understand it" question.
First, I'm using Python 2.6.5 on Ubuntu.
So: threads in Python (via the thread module) are only "threads", and the GIL just schedules code blocks from each "thread" to run for a certain period of time, so there aren't actually real threads here, right?
So the question is: if I have a blocking socket in one thread, and I send data that blocks the thread for about 5 seconds, I expected the whole program to block, because it is one C call (sock.send) that is blocking the thread. But I was surprised to see that the main thread continues to run.
So the question is: how is the GIL able to continue and run the rest of the code after it reaches a blocking call like send? Doesn't it have to use a real thread here?
Thanks.

Python uses "real" threads, i.e. threads of the underlying platform. On Linux, it will use the pthread library (if you are interested, here is the implementation).
What is special about Python's threads is the GIL: A thread can only modify Python data structures if it holds this global lock. Thus, many Python operations cannot make use of multiple processor cores. A thread with a blocking socket won't hold the GIL though, so it does not affect other threads.
The GIL is often misunderstood, making people believe threads are almost useless in Python. The only thing the GIL prevents is concurrent execution of "pure" Python code on multiple processor cores. If you use threads to make a GUI responsive or to run other code during blocking I/O, the GIL won't affect you. If you use threads to run code in some C extension like NumPy/SciPy concurrently on multiple processor cores, the GIL won't affect you either.

Python wiki page on GIL mentions that
Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.

The GIL (Global Interpreter Lock) is just a lock; it does not run anything by itself. Rather, the Python interpreter acquires and releases that lock as necessary. As a rule, the lock is held while running Python code, but released for calls to lower-level functions (such as sock.send). Although Python threads are real OS-level threads, they will not run Python code in parallel; but if one thread invokes a long-running C function, the GIL is released and another thread can run Python code until the first one finishes.
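A minimal sketch of this behavior (the function names are illustrative, and time.sleep stands in for a blocking C-level call like sock.send, since both release the GIL while they wait):

```python
import threading
import time

def blocking_io():
    # time.sleep releases the GIL while waiting, just as a blocking
    # sock.send would release it inside the C library call
    time.sleep(1)

def main_work(results):
    # Pure-Python work that runs in the main thread
    results.append(sum(range(100000)))

results = []
t = threading.Thread(target=blocking_io)
start = time.monotonic()
t.start()
main_work(results)              # finishes long before the sleep does
elapsed = time.monotonic() - start
t.join()

# The main thread was not blocked by the sleeping thread
assert elapsed < 1.0
```

If the sleeping thread held the GIL while blocked, `main_work` could not have finished before the full second elapsed.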

Related

Python GIL : limitation of multi-threading in parallel programming?

Multi-threading means spawning multiple threads in one process.
As I understand it, these threads seem to work in parallel, but actually they don't.
Not only in Python but also in other languages (C, C++, Java, etc.), only the one thread that acquired the CPU via a context switch does any work at a given moment.
So the user feels the program is working in parallel because context switching between threads happens very fast.
I think in the picture below, the GIL replaces 'context switching'; I mean, all threads work one by one.
So my question is: what is the point of the GIL in Python compared to other languages' multi-threading?

Why does Python switch threads?

In the Python documentation about threads and the GIL, it says:
In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see sys.setswitchinterval())
Why would it do this? These context switches appear to do nothing other than waste time. Wouldn't it be quicker to run each thread until it releases the GIL, and then run the next?
A thread doesn't necessarily do any I/O. You could have one thread doing number crunching and another handling I/O. Under your proposal, the number-crunching thread would never drop the GIL, so the other thread could never handle the I/O.
To ensure every thread gets to run, a thread will by default drop the GIL after 5 ms (in Python 3) if it hasn't already done so while waiting for I/O.
You can change this interval with sys.setswitchinterval().
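For instance (assuming a stock CPython 3, where the default interval is 5 ms):

```python
import sys

default = sys.getswitchinterval()   # 0.005 s on a stock CPython 3

# Make CPU-bound threads offer to give up the GIL every 1 ms instead
sys.setswitchinterval(0.001)
assert abs(sys.getswitchinterval() - 0.001) < 1e-9

# Restore the previous value
sys.setswitchinterval(default)
```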
Threading is a simple concurrency technique. For a more efficient concurrency technique look into asyncio which offers single-threaded concurrency using coroutines.
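A minimal asyncio sketch of that idea (`fetch` is a hypothetical coroutine; asyncio.sleep stands in for real non-blocking I/O):

```python
import asyncio

async def fetch(name, delay):
    # await yields control to the event loop, so both "requests"
    # wait concurrently on a single thread
    await asyncio.sleep(delay)
    return name

async def main():
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

results = asyncio.run(main())
print(results)  # ['a', 'b'], after ~0.1 s total rather than ~0.2 s
```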

Blocking I/O in Python

Newbie to Python and multithreading here.
I read some articles on blocking vs. non-blocking I/O, and the main difference seems to be that blocking I/O only allows tasks to execute sequentially, while non-blocking I/O allows multiple tasks to execute concurrently.
If that's the case, how can blocking I/O operations (some of Python's standard built-in functions) work with multiple threads?
Blocking I/O blocks the thread it's running in, not the whole process (at least in this context, and on a standard PC).
Multithreading is not affected by definition - only the current thread gets blocked.
The global interpreter lock (in CPython) is a measure put in place so that only one active Python thread executes at the same time. As frustrating as it can be, this is a good thing, because it is put in place to avoid interpreter corruption.
When a blocking operation is encountered, the current thread yields the lock and thus allows other threads to execute while the first thread is blocked. However, with CPU-bound threads (when purely Python calls are made), only one thread executes at a time no matter how many threads are running.
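A rough sketch of that effect (the timings are machine-dependent; this only illustrates the shape of the experiment on a GIL-ful CPython):

```python
import threading
import time

def crunch(n):
    # Pure-Python loop: holds the GIL except at periodic switch points
    total = 0
    for i in range(n):
        total += i
    return total

N = 2000000

start = time.monotonic()
crunch(N)
crunch(N)
sequential = time.monotonic() - start

start = time.monotonic()
threads = [threading.Thread(target=crunch, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.monotonic() - start

# On CPython, the threaded version takes about as long as the
# sequential one: only one thread can hold the GIL at a time
print("sequential: %.2fs, threaded: %.2fs" % (sequential, threaded))
```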
It is interesting to note that in Python 3.2, code was added to mitigate the effects of the global interpreter lock. It is also interesting to note that other implementations of Python do not have a global interpreter lock.
Please note this is a limitation of the Python code, and the underlying libraries may still be processing data.
Also, in many cases, when it comes to I/O, a useful way to avoid blocking is to use polling and eventing:
Polling involves checking whether the operation would block and testing whether there is data. For example, if you are trying to get data from a socket, you would use select() and poll().
Eventing involves using callbacks in such a way that your thread is triggered when a relevant I/O operation has just occurred.
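A small polling sketch using select() (a socketpair stands in for a real network connection):

```python
import select
import socket

# A connected socket pair stands in for a real network connection
a, b = socket.socketpair()
a.setblocking(False)

# Poll: ask whether reading from `a` would block right now
readable, _, _ = select.select([a], [], [], 0)
assert readable == []      # nothing sent yet; a recv would block

b.send(b"hello")

# Now select reports that data is waiting, so recv will not block
readable, _, _ = select.select([a], [], [], 1.0)
assert readable == [a]
data = a.recv(1024)

a.close()
b.close()
```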

Do coroutines from PEP-492 bypass the GIL in Python 3.5?

Python 3.5 includes coroutine support with PEP 492, which is great and all... assuming the coroutines get around the GIL. Does anyone know if that's the case? Or should I just keep using the multiprocessing module?
I would say that coroutines do not bypass the GIL.
The reason is that coroutines are never processed in parallel. Coroutines are a language feature that provides a kind of pseudo-parallel processing without any real parallel execution of tasks, threads, or anything else. Only one coroutine is ever executed at once.
Remember: even when using coroutines, you can still have different threads in your program!
So the GIL is not affected, because the GIL is only a means to prevent real parallel processing of threads in specific parts of the Python interpreter that could end in corruption of global data.
When you are using a thread-enabled version of Python, you will have the GIL, and no thread and no coroutine "bypasses" it. But coroutines are not affected by the GIL the way threads are, since threads can be stopped by the GIL when entering critical sections. Coroutines are not, unless a second thread is running (but that is a problem of the threading in your program, not of the coroutines).
Of course, you can (at least it was possible some time ago) build a Python interpreter with no thread support (when you really don't need it) by compiling the interpreter yourself. In such a version, the GIL is not used at all.
But you must be sure that no module you are using uses threading, since such a module would break.
Edit:
After reading your question a second time, I guess what you really want to ask is whether the GIL overhead (applicable to threads) is lower with coroutines.
I would say yes, even when the GIL is active in your version of the interpreter. By limiting your execution to cooperative multitasking, the GIL will not affect your coroutines (or will affect them less, when you still have more than one thread) the way it does when you have multiple worker threads. There will be less (or no) contention for the GIL.
There is also the web server Tornado, which has used coroutine-based techniques in Python very successfully for a long time now. That should show that coroutines are definitely a good choice when using Python. There are also other examples of programs that are fast thanks to event-driven, cooperative techniques (e.g. Nginx).
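A small sketch of that point: coroutines only interleave at await points and never run at the same time (the step names are illustrative):

```python
import asyncio

order = []

async def step(name):
    order.append(name + " start")
    # A coroutine only yields control at an await; between awaits it
    # runs exclusively, regardless of the GIL
    await asyncio.sleep(0)
    order.append(name + " end")

async def main():
    await asyncio.gather(step("a"), step("b"))

asyncio.run(main())
print(order)  # ['a start', 'b start', 'a end', 'b end']
```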

Python threads all executing on a single core

I have a Python program that spawns many threads, runs 4 at a time, and each performs an expensive operation. Pseudocode:
for obj in objects:
    t = Thread(target=process, args=(obj,))  # note the trailing comma: args must be a tuple
    # if fewer than 4 threads are currently running, t.start(). Otherwise, add t to queue
But when the program is run, Activity Monitor in OS X shows that 1 of the 4 logical cores is at 100% and the others are at nearly 0. Obviously I can't force the OS to do anything but I've never had to pay attention to performance in multi-threaded code like this before so I was wondering if I'm just missing or misunderstanding something.
Thanks.
Note that in many cases (and virtually all cases where your "expensive operation" is a calculation implemented in Python), multiple threads will not actually run concurrently due to Python's Global Interpreter Lock (GIL).
The GIL is an interpreter-level lock. This lock prevents execution of multiple threads at once in the Python interpreter. Each thread that wants to run must wait for the GIL to be released by the other thread, which means your multi-threaded Python application is essentially single threaded, right? Yes. Not exactly. Sort of.
CPython uses what's called "operating system" threads under the covers, which is to say each time a request to make a new thread is made, the interpreter actually calls into the operating system's libraries and kernel to generate a new thread. This is the same as Java, for example. So in memory you really do have multiple threads and normally the operating system controls which thread is scheduled to run. On a multiple processor machine, this means you could have many threads spread across multiple processors, all happily chugging away doing work.
However, while CPython does use operating system threads (in theory allowing multiple threads to execute within the interpreter simultaneously), the interpreter also forces the GIL to be acquired by a thread before it can access the interpreter and stack and can modify Python objects in memory all willy-nilly. The latter point is why the GIL exists: The GIL prevents simultaneous access to Python objects by multiple threads. But this does not save you (as illustrated by the Bank example) from being a lock-sensitive creature; you don't get a free ride. The GIL is there to protect the interpreter's memory, not your sanity.
See the Global Interpreter Lock section of Jesse Noller's post for more details.
To get around this problem, check out Python's multiprocessing module.
multiple processes (with judicious use of IPC) are [...] a much better approach to writing apps for multi-CPU boxes than threads.
-- Guido van Rossum (creator of Python)
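A minimal multiprocessing sketch along those lines (the worker function and pool size are just for illustration):

```python
from multiprocessing import Pool

def square(n):
    # Runs in a separate process, each with its own interpreter and GIL
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # All four workers can keep a CPU core busy at once,
        # unlike threads in a single CPython process
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__"` guard matters on platforms that spawn rather than fork, since child processes re-import the module.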
Edit based on a comment from @spinkus:
If Python can't run multiple threads simultaneously, then why have threading at all?
Threads can still be very useful in Python when doing simultaneous operations that do not need to modify the interpreter's state. This includes many (most?) long-running function calls that are not in-Python calculations, such as I/O (file access or network requests) and calculations on NumPy arrays. These operations release the GIL while waiting for a result, allowing the program to continue executing. Then, once the result is received, the thread must re-acquire the GIL in order to use that result in "Python-land".
Python has a Global Interpreter Lock, which can prevent threads of interpreted code from being processed concurrently.
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
http://wiki.python.org/moin/GlobalInterpreterLock
For ways to get around this, try the multiprocessing module, as advised here:
Does running separate python processes avoid the GIL?
AFAIK, in CPython the Global Interpreter Lock means that no more than one block of Python code can be run at any one time. Although this does not really affect anything on a single-processor/single-core machine, on a multicore machine it means you effectively have only one thread running at any one time, leaving all the other cores idle.
