Python threads for networking - threads don't run in parallel

I am trying to use two threads in a server program: one for listening for communications from clients using the Twisted library, and the other for doing some other computations on the server. In my attempt to implement the threads, it seems that the Python threading library doesn't run threads in parallel, as answered in this question. I was wondering if there is any other Python library that addresses this problem, or any other way to circumvent this limitation?
Thank you in advance.

Python's GIL (global interpreter lock) prevents two threads from simultaneously executing Python code. Fortunately, that doesn't include I/O, so if your threads spend significant time on networking, database, or filesystem work, ordinary threads do work correctly. They won't let you take advantage of multiple cores for computation, but they will let other threads make progress while any number of them are waiting for something to happen.
If your needs are more for computation than for I/O, then threads (as implemented in CPython) won't help. Use the multiprocessing module instead (standard since Python 2.6), which offers a thread-like API to spawn multiple processes, each with an independent Python interpreter and therefore its own GIL.
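To make that concrete, here is a minimal sketch (plain Python 3, not Twisted-specific) that pushes the computation into a separate process with multiprocessing while the main process stays free for network I/O. The queues and the heavy_computation function are illustrative names only, not part of any library:

    import multiprocessing

    def heavy_computation(task_queue, result_queue):
        # Stand-in for the server-side number crunching; it runs in its own
        # process, so it has its own interpreter and its own GIL.
        for task in iter(task_queue.get, None):   # None is the shutdown sentinel
            result_queue.put(task * task)

    if __name__ == "__main__":
        tasks = multiprocessing.Queue()
        results = multiprocessing.Queue()
        worker = multiprocessing.Process(target=heavy_computation,
                                         args=(tasks, results))
        worker.start()

        # The Twisted reactor / listening code would run here, feeding work
        # onto `tasks` and reading `results` without being starved by the
        # computation.
        tasks.put(21)
        print(results.get())   # 441

        tasks.put(None)        # tell the worker to exit
        worker.join()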

Related

Difference between multiprocessing and concurrent libraries?

Here's what I understand:
The multiprocessing library uses multiple cores, so it's processing in parallel and not just simulating parallel processing like some libraries. To do this, it overrides the Python GIL.
The concurrent library doesn't override the Python GIL and so it doesn't have the issues that multiprocessing has (i.e. locking, hanging). So it seems like it's not actually using multiple cores.
I understand the difference between concurrency and parallelism. My question is:
How does concurrent actually work behind the scenes?
And does subprocess work like multiprocessing or concurrent?
multiprocessing and concurrent.futures both aim at running Python code in multiple processes concurrently. They're different APIs for much the same thing. multiprocessing's API was, as @András Molnár said, designed to be much like the threading module's. concurrent.futures's API was intended to be simpler.
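As a rough illustration of "different APIs for much the same thing", here is the same trivial CPU-bound job expressed through both modules (the square function is just an example, not part of either API):

    from concurrent.futures import ProcessPoolExecutor
    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        # multiprocessing's Pool, with its threading-flavoured API
        with Pool(processes=4) as pool:
            print(pool.map(square, range(10)))

        # concurrent.futures' ProcessPoolExecutor, the newer, simpler API
        with ProcessPoolExecutor(max_workers=4) as executor:
            print(list(executor.map(square, range(10))))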
Neither has anything to do with the GIL. The GIL is a per-process lock in CPython, and the Python processes these modules create each have their own GIL. You can't have a CPython process without a GIL, and there's no way to "override" it (although C code can release it when it's going to be running code that it knows for certain cannot execute Python code - for example, the CPython implementation routinely releases it internally when invoking a blocking I/O function in C, so that other threads can run Python code while the thread that released the GIL waits for the I/O call to complete).
The subprocess module lets you run and control other programs. Anything you can start from the command line can be run and controlled with this module. Use it to integrate external programs into your Python code.
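For example (a minimal sketch; the firefox command is just a placeholder for whatever external program you want to launch, and capture_output requires Python 3.7+):

    import subprocess

    # Start an external program and keep running our own code while it runs.
    browser = subprocess.Popen(["firefox", "https://example.com"])

    # Run a command, wait for it to finish, and capture what it printed.
    result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
    print(result.stdout.strip())   # "hello"

    browser.terminate()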
The multiprocessing module lets you divide tasks written in Python over multiple processes to help improve performance. It provides an API very similar to the threading module's; it offers methods to share data across the processes it creates, and it makes the task of managing multiple processes running Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in parallel.

Is anyone using zeromq to coordinate multiple Python interpreters in the same process?

I love Python's global interpreter lock because it makes the underlying C code simple.
But it means that each Python interpreter main loop is restricted to one thread at a time.
This is bad because the number of cores per processor chip has been doubling frequently in recent years.
One of the supposed advantages to zeromq is that it makes multi-threaded programming "easy" or easier.
Is it possible to launch multiple Python interpreters in the same process and have them communicate only using in-process zeromq with no other shared state? Has anyone tried it? Does it work well? Please comment and/or provide links.
I don't know of any way to create multiple instances of the Python interpreter within a single process, but I do have experience with splitting multiple instances across multiple processes and communicating with zmq.
I've been using multiprocessing to implement an island-model architecture for global optimization, with zmq for managing communication between the islands. Each island is its own process with its own Python interpreter, created and managed by the master archipelago process.
Using multiprocessing allows you to launch as many independent Python interpreters as you wish, but they all reside in their own processes with a separate memory space. I believe the OS scheduler takes care of assigning processes to cores and sharing CPU time. The separate memory space is the hardest part, because it means you have to explicitly communicate. To communicate between processes, the objects/data you wish to send must be serializable, because zmq sends byte-strings.
The nice thing about zmq is that it's a piece of cake to scale across systems distributed over a network, and it's pretty lightweight. You can create just about any communication pattern you wish, using REP/REQ, PUB/SUB, or whatever.
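For instance, here is a minimal REQ/REP sketch with pyzmq and multiprocessing (the port number and message contents are arbitrary): one worker process answers a request from the parent over a local TCP socket.

    import multiprocessing
    import zmq

    def worker():
        context = zmq.Context()
        socket = context.socket(zmq.REP)
        socket.bind("tcp://127.0.0.1:5555")
        message = socket.recv()             # zmq moves byte strings
        socket.send(b"echo: " + message)
        socket.close()
        context.term()

    if __name__ == "__main__":
        proc = multiprocessing.Process(target=worker)
        proc.start()

        context = zmq.Context()
        socket = context.socket(zmq.REQ)
        socket.connect("tcp://127.0.0.1:5555")
        socket.send(b"hello island")
        print(socket.recv())                # b'echo: hello island'

        socket.close()
        context.term()
        proc.join()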
But no, it's not as easy as just spinning up a few threads from the threading module.
Edit: Also, here's a Stack Overflow question similar to yours: Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program. Inside are some more relevant links indicating that it may be possible to run multiple Python interpreters within a single process, but it doesn't look simple.

Why is a thread slower than a subprocess? When should I use subprocess in place of threading, and vice versa?

In my application, I have tried the Python threading and subprocess modules to open Firefox, and I have noticed that subprocess is faster than threading. What could be the reason behind this?
When should I use one in place of the other?
Python (or rather CPython, the C-based implementation that is commonly used) has a Global Interpreter Lock (a.k.a. the GIL).
Some kind of locking is necessary to synchronize memory access when several threads access the same memory, which is what happens inside a process. Memory is not shared between processes (unless you specifically allocate such memory), so no lock is needed there.
The global nature of the lock prevents several threads from running Python code at the same time within one process. When running multiple processes, the GIL does not interfere.
So pure-Python code does not scale across threads; you need processes for that.
Now, had your Python code mostly been calling C APIs (NumPy/OpenGL/etc.), it would scale, since the GIL is usually released while native code executes, so it's alright (and actually a good idea) to use Python to manage several threads that mostly execute native code.
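A rough way to see the first point (a sketch with an arbitrary busy loop standing in for real work): time the same pure-Python CPU-bound function on four threads and on four processes. On a multi-core machine the thread version typically takes about as long as running the work serially, while the process version finishes several times faster.

    import multiprocessing
    import threading
    import time

    def busy(n=5_000_000):
        # Pure-Python loop: it holds the GIL the whole time it runs.
        total = 0
        for i in range(n):
            total += i
        return total

    def timed(worker_cls):
        workers = [worker_cls(target=busy) for _ in range(4)]
        start = time.perf_counter()
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("4 threads:   %.2fs" % timed(threading.Thread))
        print("4 processes: %.2fs" % timed(multiprocessing.Process))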
(There are other Python interpreter implementations out there that do scale across threads, like Jython and IronPython, but these aren't really mainstream yet, and they are usually a bit slower than CPython in single-threaded scenarios.)

Will Python use all processors in thread mode?

While developing a Django app deployed on Apache mod_wsgi, I found that in the case of multithreading (Python threads; mod_wsgi processes=1 threads=8) Python won't use all available processors. With the multiprocessing approach (mod_wsgi processes=8 threads=1) all is fine and I can load my machine at full capacity.
So the question: is this Python behavior normal? I doubt it, because using 1 process with a few threads is the default mod_wsgi approach.
The system is:
2xIntel Xeon 5XXX series (8 cores (16 with hyperthreading)) on FreeBSD 7.2 AMD64 and Python 2.6.4
Thanks all for the answers.
We all found that this behavior is normal because of GIL. Here is a good explanation:
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
or the Stack Overflow GIL discussion: What is a global interpreter lock (GIL)?
Will Python use all processors in thread mode? No.
Python won't use all available processors; is this Python behavior normal? Yes, it's normal because of the GIL.
For a discussion see http://mail.python.org/pipermail/python-3000/2007-May/007414.html.
You may find that having a couple (or four) threads per core/process can still improve performance if there is some blocking; for example, a process waiting for a response from the database would otherwise block other connections.
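A small sketch of that effect (Python 3, using concurrent.futures; time.sleep stands in for a database round-trip and the numbers are arbitrary): with four threads, eight half-second waits overlap and finish in roughly one second rather than four.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def handle_request(i):
        time.sleep(0.5)        # stand-in for waiting on the database
        return i

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(handle_request, range(8)))
    print("4 threads, 8 blocking requests: %.1fs" % (time.perf_counter() - start))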
Will Python use all processors in thread mode? No.
Is this normal? Yes, this is normal. Python makes no effort to locate all your cores.
"1 process with few threads is default mod_wsgi approach". But that's not optimal or even desirable. That's just a default. Don't read anything into it.
If you want to use all your computer's resources, make the OS handle it. Use processes.
The practical difference between multi-processing and multi-threading is, for the most part, hard to measure. Using processes or threads barely matters; it's usually simpler to use processes, since there's trivial OS support for them.
Bottom Line
Use multiple processes, so that the OS (and Apache) can make as much use of the system as possible.
Threads share a limited set of I/O resources of the process they're part of, and web page serving is I/O bound. Processes have independent I/O resources and will more easily max out your processor.
There is still hope. The GIL is only an implementation artifact of the CPython implementation that you download from python.org. Jython and IronPython are two other implementations of Python, and they have no GIL, so you may have better threading results with one of them.
Yes. Python is not really multi-threaded. Instead, there is a global lock and each thread gets to execute a few operations in turn. This makes it much simpler to write MT applications in Python since there can't be any problems with stale caches, etc.
So one Python process can only ever occupy a single CPU. To fully utilize a multi-core system, you must run several Python processes.
I don't know if it is still the case, but there is a global lock in the Python interpreter, which prevents the use of all processor resources from a single interpreter, even when using multithreading. IIRC, the global lock has to do with I/O.
It seems you are watching the result of this lock, so, personally, I would use multiple processes with a single thread.

Does running separate python processes avoid the GIL?

I'm curious how the Global Interpreter Lock in Python actually works. If I have a C++ application launch four separate instances of a Python script, will they run in parallel on separate cores, or does the GIL go even deeper than just the single process that was launched and control all Python processes regardless of the process that spawned them?
The GIL only affects threads within a single process. The multiprocessing module is in fact an alternative to threading that lets Python programs use multiple cores etc. Your scenario will easily allow use of multiple cores, too.
As Alex Martelli points out, you can indeed avoid the GIL by running multiple processes. I just want to add and point out that the GIL is a limitation of the implementation (CPython) and not of Python in general; it's possible to implement Python without this limitation. Stackless Python comes to mind.
