Difference between multiprocessing and concurrent libraries? - python

Here's what I understand:
The multiprocessing library uses multiple cores, so it's processing in parallel and not just simulating parallel processing like some libraries. To do this, it overrides the Python GIL.
The concurrent library doesn't override the Python GIL and so it doesn't have the issues that multiprocessing has (i.e. locking, hanging). So it seems like it's not actually using multiple cores.
I understand the difference between concurrency and parallelism. My question is:
How does concurrent actually work behind the scenes?
And does subprocess work like multiprocessing or concurrent?

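multiprocessing and concurrent.futures (through its ProcessPoolExecutor) both aim at running Python code in multiple processes concurrently; ProcessPoolExecutor is in fact built on top of multiprocessing. (concurrent.futures also offers ThreadPoolExecutor, which uses threads instead of processes.) They're different APIs for much the same thing. multiprocessing's API was, as @András Molnár said, designed to be much like the threading module's. concurrent.futures's API was intended to be simpler.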
Neither has anything to do with the GIL. The GIL is a per-process lock in CPython, and the Python processes these modules create each have their own GIL. You can't have a standard CPython process without a GIL (the experimental free-threaded builds introduced in CPython 3.13 are the exception), and there's no way to "override" it. C code can, however, release it when it's about to run code that it knows for certain cannot execute Python code. For example, CPython routinely releases it internally when invoking a blocking I/O function in C, so that other threads can run Python code while the thread that released the GIL waits for the I/O call to complete.
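To illustrate "different APIs for much the same thing", here is a minimal sketch that runs the same CPU-bound function once through multiprocessing.Pool and once through concurrent.futures.ProcessPoolExecutor. The function cpu_bound and the worker count of 4 are arbitrary choices for the example. Either way, the work runs in separate worker processes, each with its own interpreter and its own GIL.

```python
import concurrent.futures
import multiprocessing


def cpu_bound(n):
    """Sum of squares below n: pure-Python work that holds the GIL."""
    return sum(i * i for i in range(n))


def with_multiprocessing(inputs):
    # multiprocessing.Pool: the threading-like, lower-level API.
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(cpu_bound, inputs)


def with_futures(inputs):
    # ProcessPoolExecutor: the simpler futures-based API; each worker is a
    # separate interpreter process with its own GIL.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(cpu_bound, n) for n in inputs]
        return [f.result() for f in futures]


if __name__ == "__main__":
    inputs = [10_000, 20_000, 30_000, 40_000]
    print(with_multiprocessing(inputs) == with_futures(inputs))  # True
```

The `if __name__ == "__main__":` guard matters: with the "spawn" start method (the default on Windows and macOS), worker processes re-import the main module.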

The subprocess module lets you run and control other programs. Anything you can start from the command line on the computer can be run and controlled with this module. Use it to integrate external programs into your Python code.
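A minimal sketch of running and controlling an external program with subprocess.run. To keep it runnable on any machine, the "external program" here is just a second Python interpreter (sys.executable); any command-line program could take its place.

```python
import subprocess
import sys

# Start a separate program, wait for it to finish, and capture its output.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout/stderr instead of inheriting them
    text=True,            # decode output as str rather than bytes
    check=True,           # raise CalledProcessError on a non-zero exit status
)
print(result.returncode)      # 0
print(result.stdout.strip())  # hello from a child process
```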
The multiprocessing module lets you divide tasks written in Python over multiple processes to help improve performance. It provides an API very similar to the threading module; it provides methods to share data across the processes it creates, and makes the task of managing multiple processes to run Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in parallel.
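As one example of sharing data across processes, here is a minimal sketch using multiprocessing.Value, a C integer in shared memory. The helper names add_many and run, and the worker and iteration counts, are illustrative choices. Because processes don't share ordinary Python objects, the counter has to live in shared memory, and its built-in lock keeps concurrent increments from being lost.

```python
import multiprocessing


def add_many(counter, times):
    # get_lock() makes the read-modify-write of counter.value atomic.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def run(workers=4, times=1000):
    counter = multiprocessing.Value("i", 0)  # shared C int, starts at 0
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, times))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 4000
```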

Related

Constantly launching multiprocessing Processes in python

I have a relatively CPU heavy function that I've been running as a thread in a package I have. Since it's CPU heavy I want to move it to a multiprocessing process. The nature of this function is such that it will be called very often. Is this a fair / safe use for multiprocessing?
An alternative would be to launch the function with multiprocessing and have it run continually, and accept input from somewhere, though I am new to the multiprocessing module and am not sure if I can feed one of its processes data while it's running.
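Feeding a running process is possible. A minimal sketch, assuming a hypothetical CPU-heavy function called heavy: one worker process is started once and then keeps receiving inputs over a multiprocessing.Queue until it gets a None sentinel, so there is no per-call process start-up cost.

```python
import multiprocessing


def heavy(x):
    # Hypothetical stand-in for the CPU-heavy function in the question.
    return sum(i * i for i in range(x))


def worker(tasks, results):
    # Loop until the None sentinel arrives; the process stays alive between
    # calls and handles whatever is put on the tasks queue.
    for item in iter(tasks.get, None):
        results.put((item, heavy(item)))


def main(inputs):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(tasks, results))
    proc.start()
    for x in inputs:
        tasks.put(x)  # feed data to the already-running process
    out = dict(results.get() for _ in inputs)  # drain results before joining
    tasks.put(None)   # ask the worker to exit
    proc.join()
    return out


if __name__ == "__main__":
    print(main([10, 100, 1000]))
```

A multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor that is created once and reused for every call is a higher-level way to get the same effect.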

Python threads for networking - threads don't run in parallel

I am trying to use two threads in a server program, one for listening for any communications from clients using the Twisted library, and the other for doing some other computations on the server. In my attempt to implement the threads, it seems that the Python threading library doesn't support parallel threads, as answered in this question. I was wondering if there is any other Python library that addresses this problem? Or any other way to circumvent this limitation?
Thank you in advance.
Python's GIL (global interpreter lock) prevents two threads from simultaneously executing Python code. Fortunately that doesn't include I/O, so if your threads do significant amounts of networking, database, or filesystem I/O, then ordinary threads work fine. They won't let you take advantage of multiple cores for computation, but they will let other threads advance while any number of them are waiting for something to happen.
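A minimal sketch of the I/O case, in which time.sleep stands in for a blocking network, database, or file call (an assumption made so the example needs no server). Like blocking I/O in CPython, sleep releases the GIL, so the waits overlap and five 0.2 s waits take roughly 0.2 s in total rather than 1 s.

```python
import threading
import time


def fake_io(delay, results, index):
    # Simulated blocking call; the GIL is released while this thread waits.
    time.sleep(delay)
    results[index] = delay


def run(n_threads=5, delay=0.2):
    results = [None] * n_threads
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run()
    print(f"{elapsed:.2f} s")  # close to 0.2 s, not 5 * 0.2 s
```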
If your needs are more for computation than for I/O, then threads (as implemented in CPython) won't help. Better to use the multiprocessing module (standard since Python 2.6), which uses a 'thread-like' API to spawn multiple processes, each one with an independent Python interpreter, and therefore its own GIL.
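To show how thread-like the API is, here is a minimal sketch that runs the same CPU-bound worker with threading.Thread and then with multiprocessing.Process. Only the class changes; the constructor arguments and start()/join() calls are identical. The naive prime counter count_primes is an illustrative choice of CPU-bound work. With threads, the workers share one GIL and take turns; with processes, the OS can run them on separate cores.

```python
import multiprocessing
import threading


def count_primes(limit, out):
    """Naive CPU-bound work: count primes below limit and put it on out."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    out.put(count)


def run(worker_cls, limits):
    # worker_cls is threading.Thread or multiprocessing.Process.
    out = multiprocessing.Queue()  # usable from both threads and processes
    workers = [worker_cls(target=count_primes, args=(n, out)) for n in limits]
    for w in workers:
        w.start()
    counts = sorted(out.get() for _ in limits)
    for w in workers:
        w.join()
    return counts


if __name__ == "__main__":
    limits = [20_000] * 4
    print(run(threading.Thread, limits) == run(multiprocessing.Process, limits))
```

Timing the two calls on a multi-core machine is one way to see the difference; no particular speed-up is claimed here.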

Does multiprocessing module fix CPython multi-core usage?

In CPython, the threading module doesn't utilise multiple cores because of the global interpreter lock. However, I recently found the multiprocessing module in the standard library, which is said to sidestep the GIL. So I think with that module it is possible to utilise multiple cores properly in CPython, but I wonder if I'm right.
I need to write an app which requires good utilisation of multiple cores, but it's not that performance critical, so I could write it in Python. I need to know whether this module will allow me to use multiple cores.
The multiprocessing library uses child processes; these each run in their own Python interpreter.
The OS can and will schedule these processes across multiple processors and cores, yes. Because each child process is a separate Python interpreter process, the GIL does not interfere.
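A minimal sketch that makes the child processes visible: each task reports os.getpid() from inside a multiprocessing.Pool worker. None of the IDs match the parent's, which confirms that the work ran in separate interpreter processes. The helper names and the pool size of 4 are arbitrary.

```python
import multiprocessing
import os


def whoami(_):
    # Runs inside a pool worker: report that worker's process ID.
    return os.getpid()


def worker_pids(n_tasks=8, workers=4):
    with multiprocessing.Pool(processes=workers) as pool:
        return set(pool.map(whoami, range(n_tasks)))


if __name__ == "__main__":
    pids = worker_pids()
    print(os.getpid() in pids)  # False: the work ran in child processes
    print(len(pids))            # at most 4; exact count depends on scheduling
```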

Why is a thread slower than subprocess? When should I use subprocess in place of a thread, and vice versa?

In my application, I have tried the Python threading and subprocess modules to open Firefox, and I have noticed that subprocess is faster than threading. What could be the reason behind this?
When should I use one in place of the other?
Python (or rather CPython, the C-based implementation that is commonly used) has a Global Interpreter Lock (a.k.a. the GIL).
Some kind of locking is necessary to synchronize memory access when several threads are accessing the same memory, which is what happens inside a process. Memory is not shared between processes (unless you specifically allocate such memory), so no lock is needed there.
The globalness of the lock prevents several threads from running Python code at the same time in the same process. When running multiple processes, the GIL does not interfere.
So pure-Python code does not scale across threads; you need processes for that.
If your Python code had mostly been calling C APIs (NumPy, OpenGL, etc.), it would scale, since the GIL is usually released while native code is executing. So it's alright (and actually a good idea) to use Python to manage several threads that mostly execute native code.
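One standard-library case of native code releasing the GIL is hashlib, which its documentation says releases the GIL when hashing data larger than 2047 bytes. A minimal sketch, with digest and parallel_digests as illustrative helper names: a thread pool hashes several large buffers, and because the hashing happens in C with the GIL released, those threads can run at the same time. No timings are claimed; the check only confirms the threaded results match the sequential ones.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # SHA-256 runs in C; for large buffers hashlib releases the GIL,
    # so other threads can hash concurrently.
    return hashlib.sha256(block).hexdigest()


def parallel_digests(blocks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * 1_000_000 for i in range(4)]  # four 1 MB buffers
    print(parallel_digests(blocks) == [digest(b) for b in blocks])  # True
```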
(There are other Python interpreter implementations out there that do scale across threads, like Jython and IronPython, but these aren't really mainstream yet, and they are usually a bit slower than CPython in single-threaded scenarios.)

Does running separate python processes avoid the GIL?

I'm curious about how the Global Interpreter Lock in Python actually works. If I have a C++ application launch four separate instances of a Python script, will they run in parallel on separate cores, or does the GIL go even deeper than just the single process that was launched and control all Python processes regardless of the process that spawned them?
The GIL only affects threads within a single process. The multiprocessing module is in fact an alternative to threading that lets Python programs use multiple cores &c. Your scenario will easily allow use of multiple cores, too.
As Alex Martelli points out, you can indeed avoid the GIL by running multiple processes. I just want to add that the GIL is a limitation of the implementation (CPython) and not of Python in general; it's possible to implement Python without this limitation. Jython and IronPython, for example, have no GIL (Stackless Python, despite its microthreads, still has one).
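The scenario in the question launches four separate interpreters from C++. A rough Python equivalent is sketched below using subprocess.Popen; the inline SCRIPT is a hypothetical CPU-bound stand-in for the user's own .py file. All four interpreters are started before any of them is waited on, and because each is an independent process with its own GIL, the OS is free to run them on separate cores.

```python
import subprocess
import sys

# Hypothetical CPU-bound script body, passed with -c to stay self-contained;
# in the question this would be a separate .py file.
SCRIPT = "print(sum(i * i for i in range(2_000_000)))"


def run_instances(n=4):
    # Start n independent interpreters without waiting, then collect output.
    procs = [
        subprocess.Popen([sys.executable, "-c", SCRIPT],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(n)
    ]
    return [p.communicate()[0].strip() for p in procs]


if __name__ == "__main__":
    print(run_instances())
```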
