I'm curious in how the Global Interpreter Lock in python actually works. If I have a c++ application launch four separate instances of a python script will they run in parallel on separate cores, or does the GIL go even deeper then just the single process that was launched and control all python process's regardless of the process that spawned it?
The GIL only affects threads within a single process. The multiprocessing module is in fact an alternative to threading that lets Python programs use multiple cores &c. Your scenario will easily allow use of multiple cores, too.
As Alex Martelli points out you can indeed avoid the GIL by running multiple processes, I just want to add and point out that the GIL is a limitation of the implementation (CPython) and not of Python in general, it's possible to implement Python without this limitation. Stackless Python comes to mind.
Related
Here's what I understand:
The multiprocessing library uses multiple cores, so it's processing in parallel and not just simulating parallel processing like some libraries. To do this, it overrides the Python GIL.
The concurrent library doesn't override the Python GIL and so it doesn't have the issues that multiprocessing has (ie locking, hanging). So it seems like it's not actually using multiple cores.
I understand the difference between concurrency and parallelism. My question is:
How does concurrent actually work behind the scenes?
And does subprocess work like multiprocessing or concurrent?
multiprocessing and concurrent.futures both aim at running Python code in multiple processes concurrently. They're different APIs for much the same thing. multiprocessing's API was, as #András Molnár said, designed to be much like the threading module's. concurrent.futures's API was intended to be simpler.
Neither has anything to do with the GIL. The GIL is a per-process lock in CPython, and the Python processes these modules create each have their own GIL. You can't have a CPython process without a GIL, and there's no way to "override" it (although C code can release it when it's going to be running code that it knows for certain cannot execute Python code - for example, the CPython implementation routinely releases it internally when invoking a blocking I/O function in C, so that other threads can run Python code while the thread that released the GIL waits for the I/O call to complete).
The subprocess module lets you run and control other programs. Anything you can start with the command line on the computer, can be run and controlled with this module. Use this to integrate external programs into your Python code.
The multiprocessing module lets you divide tasks written in python over multiple processes to help improve performance. It provides an API very similar to the threading module; it provides methods to share data across the processes it creates, and makes the task of managing multiple processes to run Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in p
I am trying to use two threads in a server program, one for listening for any communications from clients using the Twisted library, and the other for doing some other computations on the server. In my attempt to implement the threads, it seems that the python threading library doesn't support parallel threads as answered in this question. I was wondering if there is any other python library that addresses this problem? Or any other way to circumvent this limitation?
Thank you in advance.
Python's GIL (global interpreter lock) prevents two threads to simultaneously execute Python code. Fortunately that doesn't include I/O, so if your threads do significant amounts of networking, database, or filesystem, then usual threads do work correctly. They won't let you take advantage of multiple cores for computation, but will let other threads advance while any number of them are waiting for something to happen.
If your needs are more for computation than for I/O, then threads (as implemented on Python) won't help. Better use the multiprocessing module (standard since Python 2.6), which uses a 'thread-like' API to spawn multiple processes, each one with an independent Python interpreter, and therefore it's own GIL.
In CPython, threading module doesn't utilise multiple cores because it uses global interpreter lock. However I recently found multiprocessing module from standard library which is said to sidestep the GIL. So I think with that module it is possible to utilise multiple core properly in CPython, but I wonder if I'm right.
I need to write an app which requires good utilisation of multiple cores, but it's not that performance critical so I could write it in Python, but I need to know whether this module will allow me to use multiple cores?
The multiprocessing library uses child processes; these each run in their own Python interpreter.
The OS can and will schedule these process across multiple processes and cores, yes. Because each child process is a separate Python interpreter process, the GIL does not interfere.
In my application, I have tried python threading and subprocess module to open firefox, and I have noticed that subprocess is faster than threading. what could be the reason behind this?
when to use them in place of each other?
Python (or rather CPython, the c-based implementation that is commonly used) has a Global Intepreter Lock (a.k.a. the GIL).
Some kind of locking is necessary to synchronize memory access when several threads are accessing the same memory, which is what happens inside a process. Memory is not shared by between processes (unless you specifically allocate such memory), so no lock is needed there.
The globalness of the lock prevents several threads from running python code in the same process. When running mulitiple processes, the GIL does not interfere.
So, Python code does not scale on threads, you need processes for that.
Now, had your Python code mostly been calling C-APIs (NumPy/OpenGL/etc), there would be scaling since the GIL is usually released when native code is executing, so it's alright (and actually a good idea) to use Python to manage several threads that mostly execute native code.
(There are other Python interpreter implementations out there that do scale across threads (like Jython, IronPython, etc) but these aren't really mainstream.. yet, and usually a bit slower than CPython in single-thread scenarios.)
While developing a Django app deployed on Apache mod_wsgi I found that in case of multithreading (Python threads; mod_wsgi processes=1 threads=8) Python won't use all available processors. With the multiprocessing approach (mod_wsgi processes=8 threads=1) all is fine and I can load my machine at full.
So the question: is this Python behavior normal? I doubt it because using 1 process with few threads is the default mod_wsgi approach.
The system is:
2xIntel Xeon 5XXX series (8 cores (16 with hyperthreading)) on FreeBSD 7.2 AMD64 and Python 2.6.4
Thanks all for answers.
We all found that this behavior is normal because of GIL. Here is a good explanation:
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
or stackoverflow GIL discussion: What is a global interpreter lock (GIL)?.
Will Python use all processors in thread mode? No.
Python won't use all available processors; is this Python behavior normal? Yes, it's normal because of the GIL.
For a discussion see http://mail.python.org/pipermail/python-3000/2007-May/007414.html.
You may find that having a couple (or 4) of threads per core/process can still improve performance if there is some blocking, for example waiting for a response from the database would cause that process to block other connections otherwise.
Will python use all processors in thread mode? No.
It this normal? Yes, this is normal. Python makes no effort to locate all your cores.
"1 process with few threads is default mod_wsgi approach". But that's not optimal or even desirable. That's just a default. Don't read anything into it.
If you want to use all your computer's resources, make the OS handle it. Use processes.
The distinction between multi-processing and multi-threading is hard to measure for the most part. Using processes or threads barely matters. It's usually simpler to use processes, since there's trivial OS support for this.
Bottom Line
Use multiple processes, that allows the OS (and Apache) to make as much use as possible of the system.
Threads share a limited set of I/O resources for the Process they're part of, and web page serving is I/O bound. Processes have independent I/O resources and will more easily max out your processor.
There is still hope. The GIL is only an implementation artifact of the C Python implementation that you download from python.org. Jython and IronPython are two other implementations of Python, and they have no GIL, so you may have better threading results with one of them.
Yes. Python is not really multi-threaded. Instead, there is a global lock and each thread gets to execute a few operations in turn. This makes it much more simple to write MT applications in Python since there can't be any problems with stale caches, etc.
So one Python process can only ever occupy a single CPU. To fully utilize a multi-core system, you must run several Python processes.
I don't know if it is still the case, but there is a global lock in the Python interpreter, which prevents the use of all processor resources from a single interpreter, even when using multi threading. IIRC, the global lock has to do with I/O.
It seems you are watching the result of this lock, so, personally, I would use multiple processes with a single thread.