Constantly launching multiprocessing Processes in python

I have a relatively CPU-heavy function that I've been running as a thread in a package of mine. Since it's CPU-heavy, I want to move it to a multiprocessing process. The nature of this function is such that it will be called very often. Is this a fair/safe use of multiprocessing?
An alternative would be to launch the function with multiprocessing and have it run continually, accepting input from somewhere, though I am new to the multiprocessing module and am not sure whether I can feed one of its processes data while it's running.
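A minimal sketch of that second approach, assuming the work items can be pickled; cpu_heavy here is a stand-in for the real function, and a multiprocessing.Queue is used to feed the running process:

```python
import multiprocessing as mp

def cpu_heavy(x):
    # placeholder for the actual CPU-heavy function
    return sum(i * i for i in range(x))

def worker(inbox, outbox):
    """Run continually, consuming work items until told to stop."""
    for item in iter(inbox.get, None):   # None acts as a stop sentinel
        outbox.put(cpu_heavy(item))

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inbox, outbox))
    proc.start()

    for task in (10000, 20000, 30000):   # feed data while the process runs
        inbox.put(task)
    for _ in range(3):
        print(outbox.get())

    inbox.put(None)   # sentinel: tell the worker to exit
    proc.join()
```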

Related

Difference between multiprocessing and concurrent libraries?

Here's what I understand:
The multiprocessing library uses multiple cores, so it's processing in parallel and not just simulating parallel processing like some libraries. To do this, it overrides the Python GIL.
The concurrent library doesn't override the Python GIL and so doesn't have the issues that multiprocessing has (i.e., locking, hanging). So it seems like it's not actually using multiple cores.
I understand the difference between concurrency and parallelism. My question is:
How does concurrent actually work behind the scenes?
And does subprocess work like multiprocessing or concurrent?
multiprocessing and concurrent.futures both aim at running Python code in multiple processes concurrently. They're different APIs for much the same thing. multiprocessing's API was, as @András Molnár said, designed to be much like the threading module's. concurrent.futures's API was intended to be simpler.
Neither has anything to do with the GIL. The GIL is a per-process lock in CPython, and the Python processes these modules create each have their own GIL. You can't have a CPython process without a GIL, and there's no way to "override" it (although C code can release it when it's going to be running code that it knows for certain cannot execute Python code - for example, the CPython implementation routinely releases it internally when invoking a blocking I/O function in C, so that other threads can run Python code while the thread that released the GIL waits for the I/O call to complete).
The subprocess module lets you run and control other programs. Anything you can start from the command line on the computer can be run and controlled with this module. Use this to integrate external programs into your Python code.
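For instance, a tiny sketch (the command here is just an example):

```python
import subprocess

# Run an external program, wait for it to finish, and capture its output.
result = subprocess.run(
    ["python", "--version"],            # any command line works here
    capture_output=True, text=True, check=True,
)
print(result.stdout or result.stderr)   # some Python versions print this to stderr
```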
The multiprocessing module lets you divide tasks written in Python over multiple processes to help improve performance. It provides an API very similar to the threading module's; it provides methods to share data across the processes it creates, and makes the task of managing multiple processes to run Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in parallel.
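To make the "different APIs for much the same thing" point concrete, here's a sketch of the same parallel map written both ways:

```python
import multiprocessing
import concurrent.futures

def square(x):
    return x * x

if __name__ == "__main__":
    # multiprocessing: threading-style API
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(10)))

    # concurrent.futures: simpler, future-based API
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as ex:
        print(list(ex.map(square, range(10))))
```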

Using Subprocess to Avoid GIL

I am downloading and unzipping many large files in parallel using threading, but as I understand it, the GIL limits how much of my CPU I can actually use.
When I learned about Linux in school, I remember that we had a lab in which we spawned a lot of processes using foo.py & in the command line. These processes used up all of our CPU power.
Currently, I am working in Windows, and I wonder whether I can use the subprocess module to also spawn multiple Python processes, each with its own GIL. I would split my list of download links into, say, four roughly equal lists and pass one of these sub-lists to each of four sub-processes. Then each subprocess would use threading to further speed up my downloads. I'd do the same for the unzipping, which takes even longer than the downloading.
Am I conceptualizing subprocesses correctly, and is it possible that this approach would work for my downloading and unzipping purposes?
I've searched around SO and other web resources, but I've not found much addressing such a hacky approach to multi-processing multi-threading. There was this question, which said that the main program doesn't communicate with subprocesses once the latter are spawned, but for my purposes, I would only need each subprocess to send a "finished" flag back to the main program.
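To make that concrete, here is roughly what I'm picturing; download_worker.py would be a separate (hypothetical) script that uses threading over the URLs in its argv:

```python
import subprocess
import sys

def chunk(seq, n):
    """Split seq into n roughly equal sub-lists."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

links = ["http://example.com/file%d.zip" % i for i in range(12)]  # stand-in URLs

# One Python subprocess (own interpreter, own GIL) per sub-list.
procs = [
    subprocess.Popen([sys.executable, "download_worker.py"] + sub)
    for sub in chunk(links, 4)
]

# A zero exit code from each child is effectively the "finished" flag.
for p in procs:
    p.wait()
print("all downloads finished")
```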
Thank you!

Python multithreading: how is it using multiple cores?

I am running a multithreaded application (Python 2.7.3) on an Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz. I thought it would be using only one core, but using the "top" command I see that the Python processes are constantly changing the core number. Enabling "SHOW THREADS" in top shows different threads working on different cores.
Can anyone please explain this? It is bothering me as I know from theory that multithreading is executed on a single core.
First off, multithreading means the opposite, namely that multiple cores are being utilized (via threads) at the same time. CPython is indeed crippled when it comes to this, though whenever you call into C code (this includes parts of the standard library, but also extension modules like NumPy) the lock that prevents concurrent execution of Python code may be released. You can still have multiple threads; they just won't be interpreting Python at the same time (instead, they'll take turns quite frequently). You also speak of "Python processes" -- are you confusing terminology, or is this "multithreaded" Python application in fact multiprocessing? Of course multiple Python processes can run concurrently.
However, from your wording I suspect another source of confusion. Even a single thread can run on multiple cores... just not at the same time. It is up to the operating system which thread is running on which CPU, and the OS scheduler does not necessarily re-assign a thread to the same CPU where it used to run before it was suspended (it's beneficial, as David Schwartz notes in the comments, but not vital). That is, it's perfectly normal for a single thread/process to jump from CPU to CPU.
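You can watch the "taking turns" behavior yourself with a small timing sketch (pure-Python CPU work, so the GIL serializes it even as the OS moves the threads between cores):

```python
import threading
import time

def burn():
    # pure-Python CPU work: holds the GIL, so threads can't overlap it
    sum(i * i for i in range(5000000))

start = time.perf_counter()
threads = [threading.Thread(target=burn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# On CPython this takes roughly 4x the single-call time despite 4 threads,
# even though top may show the threads hopping between cores.
print("elapsed:", time.perf_counter() - start)
```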
Threads are designed to take advantage of multiple cores when they are available. If you only have one core, they'll run on one core too. :-)
There's nothing to be concerned about, what you observe is "working as intended".

Why is thread slower than subprocess? When should I use subprocess in place of thread, and vice versa?

In my application, I have tried Python's threading and subprocess modules to open Firefox, and I have noticed that subprocess is faster than threading. What could be the reason behind this?
When should I use them in place of each other?
Python (or rather CPython, the C-based implementation that is commonly used) has a Global Interpreter Lock (a.k.a. the GIL).
Some kind of locking is necessary to synchronize memory access when several threads are accessing the same memory, which is what happens inside a process. Memory is not shared between processes (unless you specifically allocate such memory), so no lock is needed there.
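As an aside, "specifically allocate such memory" can look like this sketch using multiprocessing.Value; note that an explicit lock comes back precisely because the memory is shared again:

```python
import multiprocessing as mp

def add_many(counter, n):
    for _ in range(n):
        with counter.get_lock():   # explicit lock, since this memory IS shared
            counter.value += 1

if __name__ == "__main__":
    counter = mp.Value("i", 0)     # an int living in shared memory
    procs = [mp.Process(target=add_many, args=(counter, 10000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # 40000, thanks to the lock
```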
The globalness of the lock prevents several threads from running Python code in the same process at once. When running multiple processes, the GIL does not interfere.
So, Python code does not scale on threads, you need processes for that.
Now, had your Python code mostly been calling C APIs (NumPy/OpenGL/etc.), it would scale, since the GIL is usually released while native code is executing; so it's alright (and actually a good idea) to use Python to manage several threads that mostly execute native code.
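For example, a sketch using hashlib, whose C implementation in CPython releases the GIL while hashing large buffers, so these threads really can overlap:

```python
import hashlib
import threading

data = b"x" * (64 * 1024 * 1024)   # 64 MiB of dummy data

def digest():
    # hashlib's C code drops the GIL for large buffers,
    # so the hashing below can run on several cores at once
    hashlib.sha256(data).hexdigest()

threads = [threading.Thread(target=digest) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("done hashing across threads")
```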
(There are other Python interpreter implementations out there that do scale across threads (like Jython, IronPython, etc.), but these aren't really mainstream yet, and they're usually a bit slower than CPython in single-threaded scenarios.)

Does running separate python processes avoid the GIL?

I'm curious how the Global Interpreter Lock in Python actually works. If I have a C++ application launch four separate instances of a Python script, will they run in parallel on separate cores, or does the GIL go even deeper than just the single process that was launched and control all Python processes regardless of the process that spawned them?
The GIL only affects threads within a single process. The multiprocessing module is in fact an alternative to threading that lets Python programs use multiple cores &c. Your scenario will easily allow use of multiple cores, too.
As Alex Martelli points out, you can indeed avoid the GIL by running multiple processes. I just want to add that the GIL is a limitation of the implementation (CPython) and not of Python in general; it's possible to implement Python without this limitation. Jython comes to mind.
