Multi Thread in Python run in parallel

Multi Thread in Python run in parallel - python

I have a question about multithread in Python.
I already tried Multithread and MultiProcessing in python.
What I get is
in MultiThread, I will get a duplicate result when run it pararelly. After research, I found that in the multiThread, the Multithread can update the same variable(race Condition).
Meanwhile, in the multi processing, it will go smoothly, without problem like in the Multithread.
The question,
Can I use Multithread, but the mechanism is like Multiprocessing? Because I need to migrate more than 2 million record, and I need to run that function asynchronously as much as possible (My laptop only have 4 cores) that's why I need to use multiThread.
Can some explain to me about the question above?

In multithreading, each thread will share the same memory space as the parent process that spawned them. But in multi-processing, each process have their own memory space.
However, in multithreading, you need to use a lock (semaphore/mutex), (e.g. threading.Lock() to prevent the race condition. It is not to say that multiprocessing will not have race condition, it can have it if you specifically share the same object and not the copy of it. But by default it will copy the object.
Multithreading is also limited by python's GIL (Global Interpreter Lock) which ensures that only one thread is running at the same time. So if you have intensive computation task running on two threads, it doesn't really make it faster, as only one can be active at the same time.
However, multiprocessing can overcome it easily, as it runs on multiple process and each process will be handled by OS's scheduler and run parallely.
General rule of thumb:
if your process is computationally intensive, use process
if your process is I/O intensive, use threads
If your thread needs concurrent access to the same var/object, etc, you need to use lock.

Related

How are threads different from process in terms of how they are executed on hardware level?

I was wondering how the threads are executed on hardware level, like a process would run on a single processing core and make a context switch on the processor and the MMU in order to switch between processes. How do threads switch? Secondly when we create/spawn a new thread will it be seen as a new process would for the processor and be scheduled as a process would?
Also when should one use threads and when a new process?
I know I probably am sounding dumb right now, that's because I have massive gaps in my knowledge that I would like fill. Thanks in advance for taking the time and explaining things to me. :)

There are a few different methods for concurrency. The threading module creates threads within the same Python process and switches between them, this means they're not really running at the same time. The same happens with the Asyncio module, however this has the additional feature of setting when a thread can be switched.
Then there is the multiprocessing module which creates a separate Python process per thread. This means that the threads will not have access to shared memory but can mean that the processes run on different CPU cores and therefore can provide a performance improvement for CPU bound tasks.
Regarding when to use new threads a good rule of thumb would be:
For I/O bound problems, use threading or async I/O. This is because you're waiting on responses from something external, like a database or browser, and this waiting time can instead be filled by another thread running it's task.
For CPU bound problems use multiprocessing. This can run multiple Python processes on separate cores at the same time.
Disclaimer: Threading is not always a solution and you should first determine whether it is necessary and then look to implement the solution.

Think of it this way: "a thread is part of a process."
A "process" owns resources such as memory, open file-handles and network ports, and so on. All of these resources are then available to every "thread" which the process owns. (By definition, every "process" always contains at least one ("main") "thread.")
CPUs and cores, then, execute these "threads," in the context of the "process" which they belong to.
On a multi-CPU/multi-core system, it is therefore possible that more than one thread belonging to a particular process really is executing in parallel. Although you can never be sure.
Also: in the context of an interpreter-based programming language system like Python, the actual situation is a little bit more complicated "behind the scenes," because the Python interpreter context does exist and will be seen by all of the Python threads. This does add a slight amount of additional overhead so that it all "just works."

On the OS level, threads are units of execution that share the same resources (memory, file descriptors, etc). Groups of threads that belong to different processes are isolated from each other, can't access resources across the process boundary. You can think of a "just process" as a single thread, not unlike any other thread.
OS threads are scheduled like you would expect: if there are several cores, they can run in parallel; if there are more threads / processes ready to run than there are cores, some threads get preempted after some time, paused, and another thread has a chance to run on that core.
In Python, though, the difference between threads (threading module) and processes (multiproceessing module) is drastic.
Python runs in a VM. Threads run within that VM. Objects within the VM are reference-counted, and also are unsafe to concurrently modify. So OS thread scheduling which can preempt one thread in the middle of a VM instruction modifying an object, and give control to another object that accesses the same object, will result in corruption.
This is why the global interpreter lock aka GIL exists. It basically prevents any computational parallelism between Python "threads": only one thread can proceed at a time, no matter how many CPU cores you have. Python threads are only good for waiting for I/O.
Unlike that, multiprocessing runs a parallel VM (Python interpreter) and shares select pieces of data with it in a safe way (by copying, or using shared memory). Such parallel processes can run in parallel and utilize multiple CPU cores.
In short: Python threads ≠ OS threads.

What is the difference between ProcessPoolExecutor and ThreadPoolExecutor?

I read the docs trying to get a basic understanding but it only shows that ProcessPoolExecutor allows to side-step the Global Interpreter Lock which I think is the way to lock a variable or function so that parallel processes do not update its value at the same time.
What I am looking for is when to use ProcessPoolExecutor and when to use ThreadPoolExecutor and what should I keep in mind while using each approach!

ProcessPoolExecutor runs each of your workers in its own separate child process.
ThreadPoolExecutor runs each of your workers in separate threads within the main process.
The Global Interpreter Lock (GIL) doesn't just lock a variable or function; it locks the entire interpreter. This means that every builtin operation, including things like listodicts[3]['spam'] = eggs, is automatically thread-safe.
But it also means that if your code is CPU-bound (that is, it spends its time doing calculations rather than, e.g., waiting on network responses), and not spending most of its time in an external library designed to release the GIL (like NumPy), only one thread can own the GIL at a time. So, if you've got 4 threads, even if you have 4 or even 16 cores, most of the time, 3 of them will be sitting around waiting for the GIL. So, instead of getting 4x faster, your code gets a bit slower.
Again, for I/O-bound code (e.g., waiting on a bunch of servers to respond to a bunch of HTTP requests you made), threads are just fine; it's only for CPU-bound code that this is an issue.
Each separate child process has its own separate GIL, so this problem goes away—even if your code is CPU-bound, using 4 child processes can still make it run almost 4x as fast.
But child processes don't share any variables. Normally, this is a good thing—you pass (copies of) values in as the arguments to your function, and return (copies of) values back, and the process isolation guarantees that you're doing this safely. But occasionally (usually for performance reasons, but also sometimes because you're passing around objects that can't be copied via pickle), this is not acceptable, so you either need to use threads, or use the more complicated explicit shared data wrappers in the multiprocessing module.

ProcessPool is for CPU bound tasks so you can benefit from multiple CPU.
Threads is for io bound tasks so you can benefit from io wait.

Why do we blame GIL if CPU can execute one process (light weight) at a time? [duplicate]

I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.

The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Can standard C Python has more than one thread running at the same time? [duplicate]

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

How to maximize performance in Python when doing many I/O bound operations?

I have a situation where I'm downloading a lot of files. Right now everything runs on one main Python thread, and downloads as many as 3000 files every few minutes. The problem is that the time it takes to do this is too long. I realize Python has no true multi-threading, but is there a better way of doing this? I was thinking of launching multiple threads since the I/O bound operations should not require access to the global interpreter lock, but perhaps I misunderstand that concept.

Multithreading is just fine for the specific purpose of speeding up I/O on the net (although asynchronous programming would give even greater performance). CPython's multithreading is quite "true" (native OS threads) -- what you're probably thinking of is the GIL, the global interpreter lock that stops different threads from simultaneously running Python code. But all the I/O primitives give up the GIL while they're waiting for system calls to complete, so the GIL is not relevant to I/O performance!
For asynchronous programming, the most powerful framework around is twisted, but it can take a while to get the hang of it if you're never done such programming. It would probably be simpler for you to get extra I/O performance via the use of a pool of threads.

Could always take a look at multiprocessing.

is there a better way of doing this?
Yes
I was thinking of launching multiple threads since the I/O bound operations
Don't.
At the OS level, all the threads in a process are sharing a limited set of I/O resources.
If you want real speed, spawn as many heavyweight OS processes as your platform will tolerate. The OS is really, really good about balancing I/O workloads among processes. Make the OS sort this out.
Folks will say that spawning 3000 processes is bad, and they're right. You probably only want to spawn a few hundred at a time.
What you really want is the following.
A shared message queue in which the 3000 URI's are queued up.
A few hundred workers which are all reading from the queue.
Each worker gets a URI from the queue and gets the file.
The workers can stay running. When the queue's empty, they'll just sit there, waiting for work.
"every few minutes" you dump the 3000 URI's into the queue to make the workers start working.
This will tie up every resource on your processor, and it's quite trivial. Each worker is only a few lines of code. Loading the queue is a special "manager" that's just a few lines of code, also.

Gevent is perfect for this.
Gevent's use of Greenlets (lightweight coroutines in the same python process) offer you asynchronous operations without compromising code readability or introducing abstract 'reactor' concepts into your mix.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.