Python multithreading best practices


I just recently read an article about the GIL (Global Interpreter Lock) in Python,
which seems to be a big issue when it comes to Python performance. So I was wondering
what the best practice would be to achieve more performance: threading or
multiprocessing? Because I hear everybody say something different, it would be
nice to have one clear answer, or at least to know the pros and cons of multithreading
versus multiprocessing.
Kind regards,
Dirk

It depends on the application, and on the python implementation that you are using.
In CPython (the reference implementation) and PyPy, the GIL only allows one thread at a time to execute Python bytecode. Other threads can meanwhile be waiting on I/O or running C extension code that has released the GIL.
It is worth noting that some other implementations, like IronPython and Jython, don't have a GIL.
A characteristic of threading is that all threads share the same interpreter and all the live objects. So threads can share global data almost without extra effort. You need to use locking to serialize access to that data, though! Imagine what would happen if two threads tried to modify the same list at the same time.
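To make the locking advice concrete, here is a minimal sketch (the names `shared`, `shared_lock` and `append_items` are made up for illustration). A single `list.append` happens to be safe in CPython, but a compound read-then-modify step, such as recording `len(shared)` and then appending, is not, so the whole step is done while holding a `threading.Lock`.

```python
import threading

shared = []                      # data shared by all threads
shared_lock = threading.Lock()   # serializes access to `shared`

def append_items(thread_id, count):
    """Append `count` entries, each recording the list length at that moment."""
    for i in range(count):
        # Reading len() and appending must happen as one step; the lock
        # prevents another thread from appending in between.
        with shared_lock:
            shared.append((thread_id, i, len(shared)))

threads = [threading.Thread(target=append_items, args=(t, 1000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 4000
```

Without the lock, two threads could read the same length before either of them appends, and the recorded positions would no longer match the list indexes.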
Multiprocessing actually runs in different processes. That sidesteps the GIL, but if large amounts of data need to be shared between processes that data has to be pickled and transported to another process via IPC where it has to be unpickled again. The multiprocessing module can take care of the messy details for you, but it still adds overhead.
So if your program wants to run Python code in parallel but doesn't need to share huge amounts of data between instances (e.g. just filenames of files that need to be processed), multiprocessing is a good choice.
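A minimal sketch of the "just pass filenames" pattern with `multiprocessing.Pool`, assuming a hypothetical `process_file()` job. Here it only burns some CPU on the name so the example runs without real files; in either case only the short filename string is pickled and sent to each worker process.

```python
import hashlib
from multiprocessing import Pool

def process_file(filename):
    """Hypothetical per-file job. A real version would open and parse the file;
    this stand-in just does repeated hashing of the name as CPU work."""
    digest = filename.encode()
    for _ in range(10_000):
        digest = hashlib.sha256(digest).digest()
    return filename, digest.hex()[:8]

def process_all(filenames, workers=4):
    # Each worker is a separate process with its own interpreter and GIL.
    # Pool.map returns results in the same order as the inputs.
    with Pool(processes=workers) as pool:
        return pool.map(process_file, filenames)

if __name__ == "__main__":  # needed where workers re-import this module (spawn)
    for name, short_hash in process_all(["a.txt", "b.txt", "c.txt"]):
        print(name, short_hash)
```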
Currently multiprocessing is the only way that I'm aware of in the standard library to use all the cores of your CPU at the same time.
On the other hand, if your tasks need to share a lot of data and most of the processing is done in extensions or is I/O, threading would be a good choice.
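As an example of threads sharing data while the heavy lifting happens in an extension, this sketch hashes several large in-memory buffers with `hashlib` on a thread pool. The `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so the threads can use several cores, and the buffers are shared directly instead of being pickled. The buffer sizes are arbitrary.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only input: every thread sees the same objects, nothing is copied.
blobs = [bytes([i]) * 5_000_000 for i in range(4)]   # four 5 MB buffers

def digest(blob):
    # hashlib releases the GIL for large inputs, so these calls can overlap.
    return hashlib.sha256(blob).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(digest, blobs))

print([d[:12] for d in digests])
```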

Related

Python asyncio or multithread for fetching data from different sources?

I have a Python app that needs to fetch data from 2-3 different sources (SQL Server, MongoDB, etc.), and it can be done in parallel, as I simply need all of the data together later and no request depends on the others.
I couldn't figure out which is better for this case: threads, processes or async/await?
I read that the differences are mostly about CPU usage and I/O. But what if I simply wish to make multiple requests simultaneously (and not sequentially)? Of course, there is almost no CPU usage here at all.

I'd suggest you have a look at Python's built-in threading module.
A quote from the docs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
In short, multithreading has the GIL (i.e. global interpreter lock) problem, but, as you can read in the documentation, the GIL is released when doing blocking I/O.
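For the three-sources case in the question, a minimal sketch with `concurrent.futures.ThreadPoolExecutor` might look like this. The `fetch_*` functions are hypothetical placeholders that use `time.sleep` to simulate waiting on SQL Server, MongoDB and an API; real driver calls would block on the network in a similar way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fetchers: time.sleep stands in for a blocking database/API call.
def fetch_sql():
    time.sleep(0.5)
    return [("row", 1)]

def fetch_mongo():
    time.sleep(0.5)
    return [{"doc": 1}]

def fetch_api():
    time.sleep(0.5)
    return {"status": "ok"}

sources = {"sql": fetch_sql, "mongo": fetch_mongo, "api": fetch_api}

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    # Submit every request first, then collect, so the waits overlap.
    futures = {name: pool.submit(fn) for name, fn in sources.items()}
    results = {name: future.result() for name, future in futures.items()}
elapsed = time.perf_counter() - start

print(sorted(results), f"{elapsed:.2f}s")
```

Because every request is submitted before any result is collected, the elapsed time should be close to the slowest single fetch rather than the sum of all three.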

Concurrency and race condition

Does the presence of Python's GIL imply that running an operation in multiple threads is not much different from repeating it in a single thread?
For example, if I need to upload two files, what is the advantage of doing them in two threads instead of uploading them one after another?
I tried a big math operation in both ways, but they seem to take almost equal time to complete.
This is unclear to me. Can someone help me with this?
Thanks.

Python's threads get a slightly worse rap than they deserve. There are three (well, 2.5) cases where they actually get you benefits:
If non-Python code (e.g. a C library, the kernel, etc.) is running, other Python threads can continue executing. It's only pure Python code that can't run in two threads at once. So if you're doing disk or network I/O, threads can indeed buy you something, as most of the time is spent outside of Python itself.
The GIL is not actually part of Python; it's an implementation detail of CPython (the "reference" implementation that the core Python devs work on, and the one you usually get if you just run "python" on your Linux box or something).
Jython, IronPython, and any other reimplementations of Python generally do not have a GIL, and multiple pure-Python threads can execute simultaneously.
The 0.5 case: Even if you're entirely pure-Python and see little or no performance benefit from threading, some problems are simply much easier to solve with threads, which saves developer time and effort. This depends in part on the developer, too, of course.
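The first case is what applies to the question's "two uploads" example. A minimal sketch, assuming a hypothetical `upload()` that simulates network waiting with `time.sleep`: run sequentially the waits add up, while in two threads they overlap. A pure-Python math loop in place of the sleep would not get this speedup in CPython, which matches what the question observed.

```python
import threading
import time

def upload(name, seconds, done):
    """Hypothetical upload; time.sleep stands in for waiting on the network."""
    time.sleep(seconds)        # the GIL is released while this thread waits
    done.append(name)

def upload_sequential(files):
    done = []
    for name, seconds in files:
        upload(name, seconds, done)
    return done

def upload_threaded(files):
    done = []
    threads = [threading.Thread(target=upload, args=(name, seconds, done))
               for name, seconds in files]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait until every upload has finished
    return done

files = [("a.bin", 0.3), ("b.bin", 0.3)]
timings = {}
for run in (upload_sequential, upload_threaded):
    start = time.perf_counter()
    run(files)
    timings[run.__name__] = time.perf_counter() - start

print(timings)
```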

It really depends on the library you're using. The GIL is meant to prevent Python objects and the interpreter's internal data structures from being changed by several threads at the same time. If you're doing an upload, the library you use to do the actual upload might release the GIL while it's waiting for the HTTP request to complete (I would assume that is the case with the HTTP modules in the standard library, but I didn't check).
As a side note, if you really want to have things running in parallel, just use multiple processes. It will save you a lot of trouble and you'll end up with better code (more robust, more scalable, and most probably better structured).
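If the "big math operation" from the question is pure Python, a process pool is the usual way to spread it over several cores. A minimal sketch using `concurrent.futures.ProcessPoolExecutor`; `big_math()` is a hypothetical stand-in for the real computation. The `if __name__ == "__main__":` guard is needed on platforms that start workers with the spawn method (the default on Windows and macOS).

```python
from concurrent.futures import ProcessPoolExecutor

def big_math(n):
    """Hypothetical CPU-bound pure-Python work: sum of squares below n."""
    return sum(i * i for i in range(n))

def run_parallel(inputs):
    # Each task runs in a separate process with its own GIL,
    # so several of them can execute Python code on different cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(big_math, inputs))

if __name__ == "__main__":
    print(run_parallel([2_000_000] * 4))
```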

It depends on the native code module that's executing. Native modules can release the GIL and then go off and do their own thing, allowing another thread to acquire the GIL. The GIL is normally held while code, both Python and native, is operating on Python objects. If you want more detail, you'll probably need to go and read quite a bit about it. :)
See:
"What is a global interpreter lock (GIL)?" and "Thread State and the Global Interpreter Lock"

Multithreading is a concept where two or more tasks need to be completed simultaneously. For example, in a word processor application there are several tasks that have to work in parallel, like listening to the keyboard, formatting the input text, and sending the formatted text to the display. With sequential processing this is time-consuming, because one task has to wait until the previous one has completed. So we put these tasks in threads and run them simultaneously: the three threads are always up and waiting for input to arrive, then take that input and produce output at the same time.
So multithreading works faster if we have multiple cores or processors (although in CPython, the GIL still lets only one thread execute Python code at a time). With a single processor, threads actually run one after the other, yet it feels as if they execute simultaneously. Only one instruction executes at a time, but a processor can execute billions of instructions per second, so the computer creates the illusion that multiple tasks or threads are working in parallel. It's just an illusion.
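The word-processor example can be sketched as a pipeline of three threads connected by `queue.Queue` objects. The stage names and the uppercase "formatting" are hypothetical stand-ins; in CPython these threads help mainly because each stage spends its time waiting for input, not because they run Python code on several cores at once.

```python
import queue
import threading

STOP = None  # sentinel passed down the pipeline to shut each stage down

def keyboard(out_q, keystrokes):
    for key in keystrokes:          # stand-in for reading real key presses
        out_q.put(key)
    out_q.put(STOP)

def formatter(in_q, out_q):
    while (item := in_q.get()) is not STOP:
        out_q.put(item.upper())     # stand-in for real text formatting
    out_q.put(STOP)

def display(in_q, screen):
    while (item := in_q.get()) is not STOP:
        screen.append(item)         # stand-in for drawing on the screen

raw, formatted, screen = queue.Queue(), queue.Queue(), []
stages = [
    threading.Thread(target=keyboard, args=(raw, "hello")),
    threading.Thread(target=formatter, args=(raw, formatted)),
    threading.Thread(target=display, args=(formatted, screen)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()

print("".join(screen))  # HELLO
```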

Python and truly concurrent threads

I've been reading for hours now and I can't completely figure out how Python multithreading can be faster than a single thread.
The question really stems from the GIL. If there is a GIL, and only one thread is really running at any single time, how can multithreading be faster than a single thread?
I read that for some operations the GIL is released (like writing to a file). Is that what makes multithreading faster?
And what about greenlets? How do those help with concurrency at all? So far the only purpose I see for them is easy switching between functions and less complicated generator (yield) functions.
EDIT: And how in the world can a server like Tornado deal with thousands of simultaneous connections?

You are correct: when Python is waiting on C code that releases the GIL (such as blocking I/O like writing to a file), other threads can run, and that is how you can get some speedup. But only one thread can execute Python bytecode at a time. Note that this is a CPython (implementation) detail, and not strictly speaking part of the Python language itself. For example, Jython and IronPython have no GIL and can fully exploit multiprocessor systems.
If you need truly concurrent programming in CPython, you should be looking at multiprocessing rather than threading.

Should I use fork or threads?

In my script, I have a function foo which basically uses pynotify to notify the user about something repeatedly after a time interval of, say, 15 minutes.
import time

def foo():
    while True:
        # Does something (e.g. shows a pynotify notification)
        time.sleep(900)  # wait 15 minutes
My main script has to interact with the user and does all the other things, so I can't just call the foo() function directly.
What's the better way of doing it, and why?
Using fork or threads?

I won't tell you which one to use, but here are some of the advantages of each:
Threads can start more quickly than processes, and threads use fewer operating system resources than processes, including memory, file handles, etc. Threads also give you the option of communicating through shared variables (although many would say this is more of a disadvantage than an advantage - See below).
Processes each have their own separate memory and variables, which means that processes generally communicate by sending messages to each other. This is much easier to do correctly than having threads communicate via shared memory. Processes can also run truly concurrently, so that if you have multiple CPU cores, you can keep all of them busy using processes. In Python*, the global interpreter lock prevents threads from making much use of more than a single core.
* - That is, CPython, which is the implementation of Python that you get if you go to http://python.org and download Python. Other Python implementations (such as Jython) do not necessarily prevent Python from running threads on multiple CPUs simultaneously. Thanks to @EOL for the clarification.
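A minimal sketch of processes communicating by sending messages, using `multiprocessing.Process` and `multiprocessing.Queue`. The `worker()` that squares numbers is a made-up placeholder task; the point is that the child never shares variables with the parent, it only receives and sends pickled messages.

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    """Hypothetical task: receive numbers, send back squares; None means stop."""
    for n in iter(inbox.get, None):
        outbox.put(n * n)

def square_in_child(numbers):
    inbox, outbox = Queue(), Queue()
    child = Process(target=worker, args=(inbox, outbox))
    child.start()
    for n in numbers:
        inbox.put(n)                 # each message is pickled and sent
    inbox.put(None)                  # tell the child to finish
    results = [outbox.get() for _ in numbers]
    child.join()                     # join only after draining the queue
    return results

if __name__ == "__main__":
    print(square_in_child([1, 2, 3]))  # [1, 4, 9]
```

The parent reads all results before calling `join()`; the multiprocessing documentation warns that joining a process that still has items buffered in a queue can deadlock.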

For these kinds of problems, neither threads nor forked processes seem the right approach. If all you want to do is notify the user of something once every 15 minutes, why not use an event loop like GLib's or Twisted's reactor? This allows you to schedule operations that should run once in a while, and get on with the rest of your program.
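The answer suggests GLib or Twisted; as a swapped-in alternative from the standard library, the same idea can be sketched with `asyncio`. `notify_periodically()` and `rest_of_program()` are hypothetical names, and the "notification" is just appended to a log instead of calling pynotify.

```python
import asyncio

async def notify_periodically(interval, notify):
    """Call notify() every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        notify()

async def rest_of_program(log):
    """Hypothetical stand-in for the interactive part of the script."""
    for step in range(3):
        log.append(f"step {step}")
        await asyncio.sleep(0.1)   # yields to the event loop, like waiting for input

async def main(interval):
    log = []
    reminder = asyncio.create_task(
        notify_periodically(interval, lambda: log.append("reminder"))
    )
    await rest_of_program(log)
    reminder.cancel()              # stop the periodic job when the program ends
    try:
        await reminder
    except asyncio.CancelledError:
        pass
    return log

if __name__ == "__main__":
    print(asyncio.run(main(interval=900)))   # 15 minutes, as in the question
```

Everything runs in one thread: the event loop switches between the reminder and the rest of the program whenever one of them awaits.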

Using multiple processes lets you exploit multiple CPU cores at the same time, while, in CPython, using threads doesn't (threads take turns using a single CPU core) -- so, if you have CPU intensive work and absolutely want to use threads, you should consider Jython or IronPython; with CPython, this consideration is often enough to sway the choice towards the multiprocessing module and away from the threading one (they offer pretty similar interfaces, because multiprocessing was designed to be easily put in place in lieu of threading).
Net of this crucial consideration, threads might often be a better choice (performance-wise) on Windows (where making a new process is a heavy task), but less often on Unix variants (Linux, BSD versions, OpenSolaris, MacOSX, ...), since making a new process is faster there (but if you're using IronPython or Jython, you should check, on the platforms you care about, that this still applies in the virtual machines in question -- CLR with either .NET or Mono for IronPython, your JVM of choice for Jython).

Processes are much simpler. Just turn them loose and let the OS handle it.
Also, processes are often much more efficient. Processes do not share a common pool of I/O resources; they are completely independent.
Python's subprocess.Popen handles everything.
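A minimal `subprocess.Popen` sketch. The child here is just another Python interpreter printing a line (a hypothetical stand-in for a separate notifier script); `communicate()` waits for it to exit and collects its output.

```python
import subprocess
import sys

# Start a separate process; the OS schedules it independently of this one.
child = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the child process')"],
    stdout=subprocess.PIPE,
    text=True,
)
output, _ = child.communicate()   # wait for the child and read its stdout
print(child.returncode, output.strip())
```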

If by fork you mean os.fork then I would avoid using that. It is not cross platform and too low level - you would need to implement communication between the processes yourself.
If you want to use a separate process, then use either the subprocess module or, if you are on Python 2.6 or later, the new multiprocessing module. This has a very similar API to the threading module, so you could start off using threads and then easily switch to processes, or vice versa.
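To illustrate how similar the two APIs are, this sketch runs the same hypothetical `notify_once()` worker first in a `threading.Thread` and then in a `multiprocessing.Process`; only the worker class and the queue class change. It is written for Python 3.

```python
import multiprocessing
import queue
import threading

def notify_once(message, results):
    """Hypothetical worker: puts its result on the queue it was given."""
    results.put(message.upper())

def run_with(worker_cls, queue_cls):
    """Run notify_once with either threading.Thread or multiprocessing.Process."""
    results = queue_cls()
    worker = worker_cls(target=notify_once, args=("reminder", results))
    worker.start()               # same start()/join() API in both modules
    value = results.get()        # read before join(), which matters for processes
    worker.join()
    return value

if __name__ == "__main__":
    print(run_with(threading.Thread, queue.Queue))                  # REMINDER
    print(run_with(multiprocessing.Process, multiprocessing.Queue))  # REMINDER
```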
For what you want to do I think I would use threads, unless """does something""" is CPU intensive and you want to take advantage of multiple cores, which I doubt in this particular case.
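A minimal sketch of the threaded approach for the question's `foo`, assuming two additions that are not in the original: a `notify` callback standing in for the pynotify call, and a `threading.Event` so the main script can stop the loop. Waiting on the event instead of calling `time.sleep(900)` lets the thread exit promptly.

```python
import threading
import time

def foo(stop_event, notify, interval=900):
    """Call notify() every `interval` seconds until stop_event is set.

    stop_event.wait(interval) sleeps like time.sleep(interval), but returns
    as soon as the event is set, so the thread can be stopped quickly.
    """
    while not stop_event.is_set():
        notify()                   # e.g. the pynotify call from the question
        stop_event.wait(interval)

stop = threading.Event()
notifier = threading.Thread(
    target=foo, args=(stop, lambda: print("time for a break")), daemon=True
)
notifier.start()

time.sleep(0.1)   # stand-in for the main script interacting with the user

stop.set()        # ask the background thread to finish
notifier.join()
```

`daemon=True` means the notifier will not keep the program alive if the main script exits without setting the event.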

Why the Global Interpreter Lock?

What is exactly the function of Python's Global Interpreter Lock?
Do other languages that are compiled to bytecode employ a similar mechanism?

In general, for any thread safety problem you will need to protect your internal data structures with locks.
This can be done with various levels of granularity.
You can use fine-grained locking, where every separate structure has its own lock.
You can use coarse-grained locking where one lock protects everything (the GIL approach).
There are various pros and cons of each method. Fine-grained locking allows greater parallelism: two threads can execute in parallel when they don't share any resources. However, there is a much larger administrative overhead. For every line of code, you may need to acquire and release several locks.
The coarse-grained approach is the opposite. Two threads can't run at the same time, but an individual thread will run faster because it's not doing so much bookkeeping. Ultimately it comes down to a tradeoff between single-threaded speed and parallelism.
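A small sketch contrasting the two granularities with hypothetical counter classes: `CoarseCounters` uses one lock for everything, while `FineCounters` gives each counter its own lock. Under CPython's GIL this illustrates the structure rather than a measurable speed difference.

```python
import threading

class CoarseCounters:
    """One lock guards every counter (coarse-grained, GIL-style)."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._counts = dict.fromkeys(names, 0)

    def increment(self, name):
        with self._lock:     # even threads bumping different counters wait here
            self._counts[name] += 1

    def get(self, name):
        with self._lock:
            return self._counts[name]

class FineCounters:
    """Each counter has its own lock (fine-grained): more locks to manage."""

    def __init__(self, names):
        # The dict is never modified after construction; each cell is [lock, count].
        self._cells = {name: [threading.Lock(), 0] for name in names}

    def increment(self, name):
        cell = self._cells[name]
        with cell[0]:        # only threads using the same counter contend
            cell[1] += 1

    def get(self, name):
        cell = self._cells[name]
        with cell[0]:
            return cell[1]

def hammer(counters, names, per_thread=10_000):
    """Start one thread per entry in `names`, each incrementing that counter."""
    def work(name):
        for _ in range(per_thread):
            counters.increment(name)

    threads = [threading.Thread(target=work, args=(name,)) for name in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {name: counters.get(name) for name in sorted(set(names))}

print(hammer(CoarseCounters(["a", "b"]), ["a", "b", "a", "b"]))  # {'a': 20000, 'b': 20000}
print(hammer(FineCounters(["a", "b"]), ["a", "b", "a", "b"]))    # {'a': 20000, 'b': 20000}
```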
There have been a few attempts to remove the GIL in Python, but the extra overhead for single-threaded code was generally too large. Some cases can actually be slower even on multi-processor machines due to lock contention.
Do other languages that are compiled to bytecode employ a similar mechanism?
It varies, and it probably shouldn't be considered a language property so much as an implementation property.
For instance, there are Python implementations such as Jython and IronPython which use the threading approach of their underlying VM, rather than a GIL approach. Additionally, the next version of Ruby looks to be moving towards introducing a GIL.

The following is from the official Python/C API Reference Manual:
The Python interpreter is not fully thread safe. In order to support multi-threaded Python programs, there's a global lock that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock -- by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.
I think it sums up the issue pretty well.
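Note that the quoted passage comes from an older version of the manual: `sys.setcheckinterval()` was deprecated in Python 3.2 and removed in 3.9. Since 3.2, CPython asks the thread holding the GIL to release it after a time-based switch interval (0.005 seconds by default), which can be read and changed with `sys.getswitchinterval()` and `sys.setswitchinterval()`:

```python
import sys

# CPython 3.2+: the GIL holder is asked to drop the lock after a time-based
# "switch interval" instead of every N bytecode instructions.
default = sys.getswitchinterval()      # documented default: 0.005 seconds
print(f"switch interval: {default} s")

sys.setswitchinterval(0.001)           # request more frequent thread switches
print(f"now: {sys.getswitchinterval()} s")

sys.setswitchinterval(default)         # restore the previous setting
```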

The global interpreter lock is a big mutex-type lock that protects reference counters from getting hosed. If you are writing pure Python code, this all happens behind the scenes, but if you are embedding Python in C, then you might have to explicitly take/release the lock.
This mechanism is not related to Python being compiled to bytecode. It's not needed for Java. In fact, it's not even needed for Jython (python compiled to jvm).
see also this question

Python, like Perl 5, was not designed from the ground up to be thread safe. Threads were grafted on after the fact, so the global interpreter lock is used to maintain mutual exclusion, ensuring that only one thread is executing code at a given time in the bowels of the interpreter.
Individual Python threads are cooperatively multitasked by the interpreter itself by cycling the lock every so often.
Grabbing the lock yourself is needed when you are talking to Python from C when other Python threads are active to 'opt in' to this protocol and make sure that nothing unsafe happens behind your back.
Other systems that have a single-threaded heritage and later evolved into multithreaded systems often have some mechanism of this sort. For instance, the Linux kernel has the "Big Kernel Lock" from its early SMP days. Gradually, as multithreading performance becomes an issue, there is a tendency to try to break these sorts of locks up into smaller pieces or to replace them with lock-free algorithms and data structures where possible to maximize throughput.

Regarding your second question, not all scripting languages use a GIL, but the alternatives they chose often make them less powerful. For instance, the threads in Ruby are green threads, not native ones.
In Python, the threads are native and the GIL only prevents them from running on different cores.
In Perl, the threads are even worse. They just copy the whole interpreter, and are far from being as usable as in Python.
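One way to see that CPython threads are native OS threads is `threading.get_native_id()` (Python 3.8+), which returns the operating system's ID for the current thread. In this sketch a `threading.Barrier` keeps all the threads alive at the same time, so no ID can be reused, and each thread reports a distinct one.

```python
import threading

N = 4
barrier = threading.Barrier(N)   # holds every thread until all have arrived
native_ids = []
ids_lock = threading.Lock()

def record_native_id():
    with ids_lock:
        native_ids.append(threading.get_native_id())  # OS-level thread ID
    barrier.wait()   # no thread exits (freeing its ID) before all have recorded

threads = [threading.Thread(target=record_native_id) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(native_ids)))  # 4
```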

Maybe this article by the BDFL will help.
