Simple multiprocessing/threading example in python?

What are the differences between the threading module, the Thread class, and the multiprocessing module? (Maybe I have badly understood the conceptual differences between multithreading (shared memory and global variables?) and multiprocessing (truly independent processes?).)
Could you please illustrate this with the simple example below (the calculation itself doesn't matter)?
I have a loop that performs independent calculations that I wish to accelerate through parallel execution:
def myfunct(d):
    facto = 1
    for x in range(1, d + 1):  # start at 1 so the product is not zeroed out
        facto *= x
    return facto

cases = [1, 2, 3, 4]  # and so on

for d in cases:  # loop to parallelize
    print(myfunct(d))  # or store the results in a common list as they are calculated
Thanks in advance for your pedagogical answers.

No need for answers beyond the documentation
"""multiprocessing is a package that
supports spawning processes using an
API similar to the threading module.
The multiprocessing package offers
both local and remote concurrency,
effectively side-stepping the Global
Interpreter Lock by using subprocesses
instead of threads. Due to this, the
multiprocessing module allows the
programmer to fully leverage multiple
processors on a given machine. It runs
on both Unix and Windows. """
threading is about threads
multiprocessing is about processes
What else is needed?
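If a concrete illustration helps anyway, here is a minimal sketch (assuming Python 3) that applies multiprocessing.Pool to the loop from the question; swapping in multiprocessing.dummy.Pool would give the thread-based equivalent:
from multiprocessing import Pool

def myfunct(d):
    facto = 1
    for x in range(1, d + 1):
        facto *= x
    return facto

cases = [1, 2, 3, 4]

if __name__ == '__main__':
    # Each call is independent, so the pool can hand them out to worker
    # processes; results come back in the same order as `cases`.
    with Pool() as pool:
        results = pool.map(myfunct, cases)
    print(results)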

Related

Python - pooling doesn't use all cores

I'm using Pool from multiprocessing package (from multiprocessing.dummy import Pool).
I wrote a function that reads a text file and preprocesses it for a later function.
I have about 20,000 such text files, so I wanted to parallelize the process, and for this I used the pool.
I have 32 cores on the remote server that runs the code, so I tried to open 70 processes (I also tried fewer; the problem remains). This is what my system monitor looks like:
As one can see, 16 out of 32 cores don't work at all!
Any help would be appreciated.
As I said in my comment, all multiprocessing.dummy structures are intended to simulate the multiprocessing interface using regular threads which can be quite useful for testing, debugging, profiling etc. Or, as the official docs say:
multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
While Python (CPython) threading uses real system threads, and hence it is in theory possible to have your threaded code execute on different CPUs, due to the dreaded GIL no two of those threads will ever run Python code simultaneously. There are exceptions to that rule, though - all tasks that abstract system calls and wait for an event (like I/O) can execute in parallel, but the moment processing moves back into the Python domain it will be locked out by the GIL and will not be allowed to continue execution until the interpreter switches context.
Long story short, if you want to utilize multiple cores through a multiprocessing pool, do not use the adaptations and abstractions in multiprocessing.dummy (the same holds true for other dummy packages, too) and use the root multiprocessing module itself - in your case, multiprocessing.pool.Pool.
That being said, given that the threading module doesn't come with a pool interface I often find myself using multiprocessing.dummy.Pool (or multiprocessing.pool.ThreadPool) instead for I/O heavy stuff (i.e. not restricted by the GIL) when shared memory is more important than shared processing and the overhead that it incurs. It's quite possible that even with a switch to multiprocessing.pool.Pool you won't notice much of a difference if you don't do heavy post-processing when you grab the files.
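For reference, the switch the answer describes is essentially a one-line change; this is only a sketch, with a hypothetical preprocess_file standing in for the asker's function:
from multiprocessing import Pool          # real processes, sidesteps the GIL
# from multiprocessing.dummy import Pool  # thread-backed drop-in, bound by the GIL

def preprocess_file(path):
    # hypothetical placeholder for the asker's preprocessing; must return
    # something picklable so it can travel back to the parent process
    with open(path) as f:
        return len(f.read())

if __name__ == '__main__':
    paths = ['file_%d.txt' % i for i in range(20000)]  # hypothetical file list
    with Pool(processes=32) as pool:  # match the number of physical cores
        results = pool.map(preprocess_file, paths)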

Can standard CPython have more than one thread running at the same time? [duplicate]

I'm slightly confused about whether multithreading works in Python or not.
I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on StackOverflow, that multithreading is indeed possible in Python. So why does everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say threads are still useful because they work simultaneously and thus get the combined workload done faster. I mean, why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: any task that tries to get a speed boost from parallel execution using pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations), then that C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
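As a rough illustration of that advice (not part of the original answer), here is a sketch that runs the same pure-Python CPU-bound task under threads and under processes; on CPython the threaded run typically takes about as long as doing the work serially:
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count(n):
    # pure-Python loop: holds the GIL for its entire duration
    while n > 0:
        n -= 1

if __name__ == '__main__':
    work = [10**7] * 4
    for name, Executor in [('threads', ThreadPoolExecutor),
                           ('processes', ProcessPoolExecutor)]:
        start = time.perf_counter()
        with Executor(max_workers=4) as ex:
            list(ex.map(count, work))
        print(name, time.perf_counter() - start)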
Yes. :)
You have the low-level thread module and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically, if you want to multithread the code to speed up calculation it won't speed it up, as just one thread is executed at a time, but if you use it to interact with a database, for example, it will.
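A small sketch of that I/O-bound case (standard library only, hypothetical URLs): the threads spend most of their time blocked on the network, during which the GIL is released, so they overlap usefully:
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ['https://example.com/page%d' % i for i in range(10)]  # hypothetical

def fetch(url):
    # the GIL is released while the thread waits on the socket
    with urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=10) as ex:
    for url, size in ex.map(fetch, URLS):
        print(url, size)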
I feel for the poster because the answer is invariably "it depends what you want to do". However, parallel speed-up in Python has always been terrible in my experience, even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
import time
import numpy as np
import multiprocessing as mp

# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])

.... more code ....
# (howmany_within_range_rowonly and data are defined in the linked tutorial)

tasks = [2, 4, 8, 16]
for task in tasks:
    tic = time.perf_counter()
    pool = mp.Pool(task)
    results = pool.map(howmany_within_range_rowonly, [row for row in data])
    pool.close()
    toc = time.perf_counter()
    time1 = toc - tic
    print(f"parallel {time1} tasks {task}")

Python: Solving Multiple Linear Systems using Threads

I am attempting to solve multiple linear systems with Python and SciPy using threads. I am an absolute beginner when it comes to Python threads. I have attached code which distils what I'm trying to accomplish. This code works, but the execution time actually increases as one increases totalThreads. My guess is that spsolve is being treated as a critical section and is not actually being run concurrently.
My questions are as follows:
Is spsolve thread-safe?
If spsolve is blocking, is there a way around it?
Is there another linear solver package that I can be using which parallelizes better?
Is there a better way to write this code segment that will increase performance?
I have been searching the web for answers but with no luck. Perhaps, I am just using the wrong keywords. Thanks for everyone's help.
import threading
from scipy import sparse
import scipy.sparse.linalg  # makes sparse.linalg.spsolve available

# A, b, x, N and totalThreads are set up elsewhere in the real code
def Worker(threadnum, totalThreads):
    # each thread handles every totalThreads-th column
    for i in range(threadnum, N, totalThreads):
        x[:, i] = sparse.linalg.spsolve(A, b[:, i])

threads = []
for threadnum in range(totalThreads):
    t = threading.Thread(target=Worker, args=(threadnum, totalThreads))
    threads.append(t)
    t.start()

for threadnum in range(totalThreads): threads[threadnum].join()
The first thing you should understand is that, counterintuitively, Python's threading module will not let you take advantage of multiple cores. This is due to something called the Global Interpreter Lock (GIL), which is a critical part of the standard CPython implementation. See here for more info: What is a global interpreter lock (GIL)?
You should consider using the multiprocessing module instead, which gets around the GIL by spinning up multiple independent Python processes. This is a bit more difficult to work with, because different processes have different memory spaces, so you can't just share a thread-safe object between all processes and expect the object to stay synchronized between all processes. Here is a great introduction to multiprocessing: http://www.doughellmann.com/PyMOTW/multiprocessing/
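As a rough sketch of what that might look like for this question (assuming A and b pickle cleanly and each right-hand side really is independent), one could map over the columns with a process pool; this is illustrative, not a drop-in replacement for the threaded code above:
import numpy as np
from functools import partial
from multiprocessing import Pool
from scipy import sparse
from scipy.sparse.linalg import spsolve

def solve_column(i, A, b):
    # each worker process solves one right-hand side; note that A and b
    # are pickled and sent to the workers, which has its own cost
    return spsolve(A, b[:, i])

if __name__ == '__main__':
    n, N = 1000, 64                        # hypothetical sizes
    A = sparse.identity(n, format='csc')   # placeholder for the real system matrix
    b = np.random.rand(n, N)
    with Pool() as pool:
        cols = pool.map(partial(solve_column, A=A, b=b), range(N))
    x = np.column_stack(cols)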

Python multi threading Yay or nay?

I have been trying to write a simple Python application to implement a worker queue.
Every webpage I found about threading has some random guy commenting on it saying you shouldn't use Python threading because of this or that. Can someone help me out? What is up with Python threading - can I use it or not? If yes, which lib? Is the standard one good enough?
Python's threads are perfectly viable and useful for many tasks. Since they're implemented with native OS threads, they allow executing blocking system calls and keep "running" simultaneously - by calling the blocking syscall in a separate thread. This is very useful for programs that have to do multiple things at the same time (i.e. GUIs and other event loops) and can even improve performance for IO bound tasks (such as web-scraping).
However, due to the Global Interpreter Lock, which prevents the Python interpreter from actually running more than a single thread simultaneously, if you expect to distribute CPU-intensive code over several CPU cores with threads and improve performance this way, you're out of luck. You can do it with the multiprocessing module, however, which provides an interface similar to threading and distributes work using processes rather than threads.
I should also add that C extensions are not required to be bound by the GIL and many do release it, so C extensions can employ multiple cores by using threads.
So, it all depends on what exactly you need to do.
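Since the question mentions a worker queue, here is a minimal thread-based sketch using the standard library queue module (the task itself is a stand-in; threads pay off when that work is I/O-bound):
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:           # sentinel: shut this worker down
            tasks.task_done()
            break
        print('processing', item)  # stand-in for real (ideally I/O-bound) work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for item in range(20):
    tasks.put(item)
for _ in threads:                  # one sentinel per worker
    tasks.put(None)

tasks.join()
for t in threads:
    t.join()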
You shouldn't need to use threading. 95% of code does not need threads.
Yes, Python threading is perfectly valid; it's implemented through the operating system's native threads.
Use the standard library threading module, it's excellent.
GIL should provide you some information on that topic.

Parallelism in Python

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:
Message passing processes, possibly distributed across a cluster, e.g. MPI.
Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et al.
Implicit shared memory parallelism, using OpenMP.
Deciding on an approach to use is an exercise in trade-offs.
In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.
In short, what do I need to know about the different parallelization strategies in Python before choosing between them?
Generally, you describe a CPU bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.
Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.
You can write your performance critical code in C or Cython, and use Python for the glue.
The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The documentation I've linked above is a fair bit to chew on, but should provide a good basis to get started.
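For the raster use case in the question, a bare-bones sketch of that worker-pool abstraction might look like this (the chunking and the per-chunk function are placeholders, and note that each chunk is pickled to its worker):
import numpy as np
from multiprocessing import Pool

def process_chunk(chunk):
    # placeholder for the real per-chunk raster calculation
    return np.sqrt(chunk).sum()

if __name__ == '__main__':
    raster = np.random.rand(4000, 4000)   # stand-in for a large raster
    chunks = np.array_split(raster, 16)   # split into row bands
    with Pool() as pool:
        partial_sums = pool.map(process_chunk, chunks)
    print(sum(partial_sums))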
Ray is an elegant (and fast) library for doing this.
The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. Then it can be invoked asynchronously.
import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])
You can also parallelize stateful computation using actors, again by using the @ray.remote decorator.
# This assumes you already ran 'import ray' and 'ray.init()'.
import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background, also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)
It has a number of advantages over the multiprocessing module:
The same code runs on a single multi-core machine as well as a large cluster.
Data is shared efficiently between processes on the same machine using shared memory and efficient serialization.
You can parallelize Python functions (using tasks) and Python classes (using actors).
Error messages are propagated nicely.
Ray is a framework I've been helping develop.
Depending on how much data you need to process and how many CPUs/machines you intend to use, it is in some cases better to write a part of it in C (or Java/C# if you want to use jython/IronPython)
The speedup you can get from that might do more for your performance than running things in parallel on 8 CPUs.
There are many packages to do that; the most appropriate, as others said, is multiprocessing, especially with the Pool class.
A similar result can be obtained with Parallel Python, which in addition is designed to work with clusters.
Anyway, I would say go with multiprocessing.
