If I run a Python script where I declare 6 processes using multiprocessing, but I only have 4 CPU cores, what happens to the additional 2 processes which cannot find a dedicated CPU core?
How are they executed?
If the two additional processes run as separate threads on the existing cores, will the GIL not stop their execution?
#Edit 1 - 21st Jan 2021
I have mixed up threads and processes in the question I asked. Since I now have better clarity on the concept, I would rephrase question 2 as follows (for any future reference):
If the two additional processes run in parallel with two other processes on the existing cores, will the GIL not stop their execution?
Ans: The GIL does NOT affect processes. The GIL allows only one thread to run at a time; there is no such restriction on processes, however. The system scheduler manages how the additional two processes run on the existing cores.
First, you are mixing up threads and processes: in Python, only threads, not processes, have to share a lock on their interpreter.
If you are using the multiprocessing library, then you are using Python processes, which each have their own interpreter.
When you use Python processes, their execution is managed by your operating system's scheduler, in the same manner as every other process on your computer.
If you have more processes than CPU cores, then the extra processes wait in the background until they can be scheduled.
This usually happens when another process terminates or waits on I/O, or periodically with clock interrupts.
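As a minimal sketch of this (the work function is invented for illustration), six CPU-bound processes started on a four-core machine all run to completion; the OS simply time-slices them:

import multiprocessing
import os

def work(n):
    # Invented stand-in for a CPU-bound task.
    total = sum(i * i for i in range(10_000_000))
    print(f"process {n} (pid {os.getpid()}) done: {total}")

if __name__ == "__main__":
    # Six processes on four cores: the scheduler time-slices them,
    # so all six still finish.
    procs = [multiprocessing.Process(target=work, args=(n,)) for n in range(6)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()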
It is always best practice to make sure that you use something like:

import multiprocessing

# Detect the number of cores in the system and create a semaphore with that value.
sem = multiprocessing.Semaphore(multiprocessing.cpu_count() - 1)
When you create a process, there is overhead to manage it, its memory space, and its shared memory. Also, the operating system has to run, so leaving a core free is always polite and speeds up execution.
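Here is a hedged sketch of how that semaphore could actually gate the workers (the worker function and the acquire/release pattern are illustrative additions, not part of the original snippet):

import multiprocessing

def worker(sem, n):
    # Hold the semaphore while computing, so at most
    # cpu_count() - 1 workers do heavy work at any moment.
    with sem:
        print(f"worker {n} running")
        sum(i * i for i in range(5_000_000))  # stand-in for real work

if __name__ == "__main__":
    sem = multiprocessing.Semaphore(multiprocessing.cpu_count() - 1)
    procs = [multiprocessing.Process(target=worker, args=(sem, n)) for n in range(6)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()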
Related
I have a question about multithreading in Python.
I have already tried multithreading and multiprocessing in Python.
What I get is:
with multithreading, I get duplicate results when running things in parallel. After some research, I found that multiple threads can update the same variable (a race condition).
Meanwhile, with multiprocessing, it runs smoothly, without the problem I see with multithreading.
The question:
can I use multithreading, but with a mechanism like multiprocessing? I need to migrate more than 2 million records, and I need to run that function concurrently as much as possible (my laptop only has 4 cores); that's why I want to use multithreading.
Can someone explain the above to me?
In multithreading, each thread shares the same memory space as the parent process that spawned it. But in multiprocessing, each process has its own memory space.
However, in multithreading you need to use a lock (a semaphore/mutex, e.g. threading.Lock()) to prevent the race condition. That is not to say multiprocessing cannot have race conditions; it can, if you explicitly share the same object rather than a copy of it. But by default it will copy the object.
Multithreading is also limited by Python's GIL (Global Interpreter Lock), which ensures that only one thread is running at a time. So if you have an intensive computation task running on two threads, it doesn't really get faster, as only one thread can be active at the same time.
Multiprocessing, however, overcomes this easily, as it runs multiple processes, each handled by the OS's scheduler and run in parallel.
General rule of thumb:
if your task is computationally intensive, use processes
if your task is I/O intensive, use threads
If your threads need concurrent access to the same variable/object, etc., you need to use a lock (a sketch follows below).
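A minimal sketch of that lock pattern (the counter example is invented for illustration): without the lock, the concurrent += on the shared variable can lose updates; with it, each increment is atomic.

import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # serialize the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # reliably 400000 with the lock; often less without it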
I was wondering how threads are executed at the hardware level. A process runs on a single processing core, and switching between processes involves a context switch on the processor and the MMU. How do threads switch? Secondly, when we create/spawn a new thread, is it seen by the processor as a new process would be, and scheduled as a process would be?
Also, when should one use threads and when a new process?
I know I probably sound dumb right now; that's because I have massive gaps in my knowledge that I would like to fill. Thanks in advance for taking the time to explain things to me. :)
There are a few different methods for concurrency. The threading module creates threads within the same Python process and switches between them, which means they're not really running at the same time. The asyncio module behaves similarly, but with the additional feature of letting you control where a task may be switched.
Then there is the multiprocessing module, which creates a separate Python process per worker. The workers will not have access to shared memory by default, but they can run on different CPU cores, which can provide a performance improvement for CPU-bound tasks.
Regarding when to use new threads, a good rule of thumb would be:
For I/O-bound problems, use threading or async I/O. This is because you're waiting on responses from something external, like a database or browser, and this waiting time can instead be filled by another thread running its task (see the sketch after this list).
For CPU-bound problems, use multiprocessing. This can run multiple Python processes on separate cores at the same time.
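To make the I/O-bound case concrete, here is a minimal sketch (the URLs are placeholders) that overlaps the waiting time of several downloads using a thread pool:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["https://example.com", "https://example.org"]  # placeholders

def fetch(url):
    # The GIL is released while the thread blocks on the network,
    # so the other downloads proceed in the meantime.
    with urlopen(url) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=4) as pool:
    for url, size in pool.map(fetch, URLS):
        print(url, size, "bytes")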
Disclaimer: Threading is not always a solution and you should first determine whether it is necessary and then look to implement the solution.
Think of it this way: "a thread is part of a process."
A "process" owns resources such as memory, open file-handles and network ports, and so on. All of these resources are then available to every "thread" which the process owns. (By definition, every "process" always contains at least one ("main") "thread.")
CPUs and cores, then, execute these "threads," in the context of the "process" to which they belong.
On a multi-CPU/multi-core system, it is therefore possible that more than one thread belonging to a particular process really is executing in parallel, although you can never be sure of that at any given moment, since scheduling is up to the OS.
Also: in the context of an interpreter-based language like Python, the actual situation is a little bit more complicated "behind the scenes," because the Python interpreter context exists and is seen by all of the Python threads. This adds a slight amount of overhead so that it all "just works."
On the OS level, threads are units of execution that share the same resources (memory, file descriptors, etc.). Groups of threads that belong to different processes are isolated from each other and can't access resources across the process boundary. You can think of a "plain" process as a single thread, not unlike any other thread.
OS threads are scheduled like you would expect: if there are several cores, they can run in parallel; if there are more threads / processes ready to run than there are cores, some threads get preempted after some time, paused, and another thread has a chance to run on that core.
In Python, though, the difference between threads (the threading module) and processes (the multiprocessing module) is drastic.
Python runs in a VM. Threads run within that VM. Objects within the VM are reference-counted, and are also unsafe to modify concurrently. So OS thread scheduling that preempted one thread in the middle of a VM instruction modifying an object, and gave control to another thread accessing the same object, would result in corruption.
This is why the global interpreter lock aka GIL exists. It basically prevents any computational parallelism between Python "threads": only one thread can proceed at a time, no matter how many CPU cores you have. Python threads are only good for waiting for I/O.
Multiprocessing, by contrast, runs a parallel VM (another Python interpreter) and shares select pieces of data with it in a safe way (by copying, or using shared memory). Such parallel processes can run in parallel and utilize multiple CPU cores.
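For instance, multiprocessing.Value places a single value in shared memory so that child processes can update it safely (a minimal sketch; the increment loop is invented for illustration):

import multiprocessing

def add(shared, n):
    for _ in range(n):
        with shared.get_lock():  # the Value carries its own lock
            shared.value += 1

if __name__ == "__main__":
    shared = multiprocessing.Value("i", 0)  # 'i' = C int in shared memory
    procs = [multiprocessing.Process(target=add, args=(shared, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(shared.value)  # 4000: updates from all processes are visible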
In short: Python threads ≠ OS threads.
I would like to clarify my understanding of kernel threads and user threads in a multicore environment.
Only threads created by the kernel can run on different cores of a CPU, if the CPU supports it. User-level threads are abstracted onto a single core by libraries, and hence all user-level threads run on the same core.
Python threads can run only one at a time because they need to hold the GIL, so irrespective of the implementation of Python threads, only one core can be used at a time in a multicore environment.
In Node.js there is a main thread, called the event loop, that handles all the core processing. All I/O-related activities are offloaded to worker threads. But modern computers do not use the CPU for I/O activities, and instead offload them to the I/O controllers. So the so-called worker threads are really just an abstraction for offloading I/O activities to the I/O controller. No real thread is being created.
Hence neither a Python nor a Node.js program can truly use more than one core at a time in a multicore environment.
Am I getting this right?
I am intimately familiar with neither Python nor Node.js, but I can help you out with the rest.
In my estimation, the easiest way to understand user threads is to understand how the kernel manages (kernel) threads in a single-core system. In such a system, there is only one hardware thread, i.e. only one thread can physically be in execution on the CPU at any given time. Clearly, then, in order to run multiple threads simultaneously, the kernel needs to multiplex between the threads. This is called time sharing: the kernel juggles between threads, running each for just a bit (usually in the order of, say, 10 ms) before changing to another thread. The time quantum given to each process is short enough so that it appears that the threads are being run in parallel, while in reality they are being run sequentially. This kind of apparent parallelism is called concurrency; true parallelism requires hardware support.
User threads are just the same kind of multiplexing taken one step further.
Every process initially starts with only one kernel thread, and it will not get more unless it explicitly asks the kernel. Therefore, in such a single-threaded process, all code is executed on the same kernel thread. This includes the user-space threading library responsible for creating and managing the user threads, as well as the user threads themselves. Creating user threads doesn't result in kernel threads being created - that is exactly the point of user-space threads. The library manages the user threads it creates in much the same way that the kernel manages kernel threads; they both perform thread scheduling, which means that user threads, too, are run in turns for a short time, one at a time.
You'll notice that this is highly analogous to the kernel thread scheduling described above: in this analogy, the single kernel thread the process is running on is the single core of the CPU, user threads are kernel threads and the user-space threading library is the kernel.
The situation remains largely the same if the process is running on multiple kernel threads (i.e. it has requested more threads from the kernel via a system call). User threads are just data structures local to the kernel thread they are run on, and the code executed on each user thread is simply code executed on the CPU in the context of the kernel thread; when a user thread is switched to another, the kernel thread essentially performs a jump and starts executing code in another location (indicated by the user thread's instruction pointer). Therefore, it is entirely possible to create multiple user threads from multiple kernel threads, although this would pretty much defeat the purpose of using user threads in the first place.
Here is an article about multithreading (concurrency) and multiprocessing (parallelism) in Python you might find interesting.
Finally, a word of warning: there is a lot of misinformation and confusion regarding kernel threads floating around. A kernel thread is not a thread that only executes kernel code (and threads executing kernel code aren't necessarily kernel threads, depending on how you look at it).
I hope this clears it up for you - if not, please ask for clarification and I'll try my best to provide it.
Node.js has a main thread, as you said, which executes all the JavaScript code.
All I/O operations such as fs or dns, which cost more, are offloaded by libuv (used by Node.js) to different threads. If the number of threads in the pool is larger than the number of cores on your machine, your machine's resources will be divided among them.
In the end, I/O will use the different CPU cores you have available.
Here is an article about that.
If you want to take advantage of your machine's different cores for your application, you will have to use a cluster; you can find the API there.
Hope I answered your question.
Following is my multiprocessing code. regressTuple has around 2000 items, so the following code creates around 2000 parallel processes. My Dell XPS 15 laptop crashes when this is run.
Can't the Python multiprocessing library handle the queue according to hardware availability and run the program without crashing in minimal time? Am I not doing this correctly?
Is there an API call in Python to get the possible hardware process count?
How can I refactor the code to use an input variable to get the parallel thread count (hard-coded) and loop through the threading several times till completion? In this way, after a few experiments, I will be able to get the optimal thread count.
What is the best way to run this code in minimal time without crashing? (I cannot use multi-threading in my implementation.)
Here is my code:
from multiprocessing import Process

regressTuple = [(x,) for x in regressList]
processes = []

for i in range(len(regressList)):
    processes.append(Process(target=runRegressWriteStatus, args=regressTuple[i]))

for process in processes:
    process.start()

for process in processes:
    process.join()
There are multiple things that we need to keep in mind:
The number of processes you can spin up is not limited by the number of cores on your system, but by the ulimit for your user id, which controls the total number of processes that can be launched under your user id.
The number of cores determines how many of those launched processes can actually run in parallel at one time.
Your system may be crashing because the target function that these processes run is doing something heavy and resource-intensive that the system cannot handle when multiple processes run simultaneously, or because the nprocs limit on the system is exhausted and the kernel cannot spin up new processes.
That being said, it is not a good idea to spawn as many as 2000 processes, even if you have a 16-core Intel Skylake machine, because creating a new process on the system is not a lightweight task: generating the pid, allocating memory, generating the address space, scheduling the process, context switching, and managing its entire life cycle all happen in the background. So it is a heavy operation for the kernel to generate a new process.
Unfortunately, I guess what you are trying to do is a CPU-bound task and hence limited by the hardware you have on the machine. Spinning up more processes than the number of cores on your system is not going to help at all, but creating a process pool might. So basically you want to create a pool with as many processes as you have cores on the system and then pass the input to the pool. Something like this:
import multiprocessing

def target_func(data):
    ...  # process the input data

with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
    res = pool.map(target_func, regressTuple)
Can't the Python multiprocessing library handle the queue according to hardware availability and run the program without crashing in minimal time? Am I not doing this correctly?
I don't think it's Python's responsibility to manage the queue length. When people reach for multiprocessing they tend to want efficiency, and adding system performance tests to the run queue would be overhead.
Is there an API call in Python to get the possible hardware process count?
If there were, would it know ahead of time how much memory your task will need?
How can I refactor the code to use an input variable to get the parallel thread count (hard-coded) and loop through the threading several times till completion? In this way, after a few experiments, I will be able to get the optimal thread count.
As balderman pointed out, a pool is a good way forward with this.
What is the best way to run this code in minimal time without crashing? (I cannot use multi-threading in my implementation.)
Use a pool, or take the available system memory, divide by ~3MB, and see how many tasks you can run at once.
This is probably more of a sysadmin task, balancing the bottlenecks against the queue length; but generally, if your tasks are I/O-bound, then there isn't much point in having a long task queue if all the tasks are waiting at the same T-junction to turn into the road. The tasks will then fight with each other for the next block of I/O.
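A sketch of that memory heuristic (assuming the third-party psutil package for the memory query, and the ~3MB-per-task figure from above):

import multiprocessing
import psutil  # third-party; assumed available for the memory query

TASK_MEMORY = 3 * 1024 * 1024  # ~3MB per task, per the rule of thumb above

available = psutil.virtual_memory().available
# Cap by cores as well, since the tasks here are CPU-bound.
max_tasks = min(available // TASK_MEMORY, multiprocessing.cpu_count())
print(f"run at most {max_tasks} tasks at once")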
So this is more or less a theoretical question. I have a single-core machine which is supposedly powerful but nevertheless has only one core. Now I have two choices to make:
Multithreading: As far as I know, I cannot make use of multiple cores in my machine even if I had them, because of the GIL. Hence, in this situation it makes no difference.
Multiprocessing: This is where I have a doubt. Can I do multiprocessing on a single-core machine? Or do I have to check the cores available in my machine every time and run exactly that many processes or fewer?
Can someone please guide me on the relation between multiprocessing and the cores in a machine?
I know this is a theoretical question, but my concepts are not very clear on this.
This is a big topic but here are some pointers.
Think of threads as processes that share the same address space and can access the same memory. Communication is done by shared variables. Multiple threads can run within the same process.
Processes (in this context, and roughly speaking) have their own private data and if two processes want to communicate that communication has to be done more explicitly.
When you are writing a program where the bottleneck is CPU cycles, neither threads nor processes will give you a speedup on a single-core machine.
Processes and threads are still useful for multitasking (rapid switching between (sub)programs) - this is what your operating system does because it runs far more processes than you have cores.
Processes and threads (or even coroutines!) can give you considerable speedup even on a single core machine if the tasks you are executing are I/O bound - think of fetching data from a network. For example, instead of actively waiting for data to be sent or to arrive, another process or thread can initiate the next network operation.
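For example, even on one core, coroutines let the program start the next network wait instead of idling (a minimal asyncio sketch; the sleeps stand in for network latency):

import asyncio

async def fetch(name, delay):
    # The sleep stands in for a network round trip; while one
    # coroutine waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return name

async def main():
    # Three 1-second "requests" finish in about 1 second total,
    # not 3, because the waits overlap on a single core.
    print(await asyncio.gather(fetch("a", 1), fetch("b", 1), fetch("c", 1)))

asyncio.run(main())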
Threads are preferable over processes when you don't need explicit encapsulation, due to their lower overhead. For most CPU-bound concurrent problems, and especially the large subset of "embarrassingly parallel" ones, it does not make much sense to spawn more processes than you have processors.
The Python GIL prevents two threads in the same process from running in parallel, i.e. from multiple cores executing instructions literally at the same time.
Therefore threads in Python are relatively useless for speeding up CPU-bound tasks, but can still be very useful for I/O bound tasks, because blocking operations (e.g. waiting for network data) release the GIL such that another thread can run while the other waits.
If you have multiple processors, you can have true parallelism by spawning multiple processes despite the GIL. This is only worth it for CPU bound tasks, and often you have to consider the overhead of spawning processes and the communication cost between processes.
You CAN use both multithreading and multiprocessing on single-core systems.
The GIL limits the usefulness of multithreading in pure Python for computation-bound tasks, no matter your underlying architecture. For I/O-bound tasks, threads work perfectly fine. Had they no use at all, they probably would not have been implemented in the first place.
For pure Python software, multiprocessing is always a safer choice when it comes to parallel computing. Of course, multiple processes are more expensive than multiple threads (processes do not share memory, unlike threads; processes also come with slightly higher overhead).
For single-processor machines, however, multiprocessing (and multithreading) buys you little to no extra speed for computationally heavy tasks, and may actually even slow you down a bit. But if the OS supports them (which is pretty common for desktops, workstations, clusters, etc., though perhaps not for embedded systems), they allow you to effectively run multiple I/O-bound programs simultaneously.
Long story short, it depends a bit on what you are doing...
The multiprocessing module basically spawns multiple instances of the Python interpreter, so there is no worry about the GIL.
multiprocessing uses the same API as the threading module, if you have used that previously.
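A small sketch of that symmetry (the say function is invented): Process is a near drop-in replacement for Thread, with the same start/join calls.

from threading import Thread
from multiprocessing import Process

def say(greeting):
    print(greeting)

if __name__ == "__main__":
    t = Thread(target=say, args=("hello from a thread",))
    p = Process(target=say, args=("hello from a process",))
    for worker in (t, p):
        worker.start()
    for worker in (t, p):
        worker.join()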
You seem to be confusing multiprocessing, threading (which you refer to as multithreading), and X-core processors.
No matter what, when you start Python (the CPython implementation), it will only use one core of your processor.
Threading distributes the load between the different components of your script. Suppose you have to interact with an external API: your script has to wait for the communication to finish before it proceeds. If you are making multiple similar calls, they take linear time. With threading, you can issue those calls concurrently.
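A minimal sketch of that idea (call_api simulates the external call with a sleep): the waits overlap, so five calls take roughly the time of one.

import threading
import time

def call_api(i):
    time.sleep(1)  # stands in for waiting on the external API
    print(f"call {i} finished")

start = time.time()
threads = [threading.Thread(target=call_api, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"5 calls in {time.time() - start:.1f}s")  # ~1s, not ~5s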
See also: PyPy implementation of Python