I am running a python program in a server having python2.7.6 . I have used threading module of python to create multiple threads . I have created 23 threads , so i am confused whether all my processor cores are actually being used or not , Is there any way i can check this in my server . Any suggestion as what should be the ideal number of threads that should be spawned according to the number of processors that we have in order to improve efficiency of my program.
David Beazly has a great talk on threading in Python. He also has a great presentation on the subject here.
Unfortunately, Python has something called the GIL which limits Python to executing a single thread at a time. To use all your cores you'd have to use multiple processes: See Multiprocessing.
Some in the Python community don't necessarily look at the GIL as a setback, you can utilize multiple cores through other means than shared-memory threading.
Look here for a great blog post on utilizing multiple cores in Python.
Edit:
The above is true for CPython (the most common and also the reference implementation of Python). There's a few "yes but" answers out there (mostly referring to multithreading on a different Python implementation) but I'd generally point people to answers like this one that describe the GIL in CPython and alternatives for utilizing multiple cores6
There is no real answer to this question and everyone might have a different view. Determining the number of threads for your application to improve performance should be decided after testing for x scenarios with y of threads. Performance depends on how the OS scheduler is scheduling your threads to the available CPU cores depending on the CPU load or number of processes running. If you have 4 cores, then it doesn't mean it's good to execute 4 threads only. In fact you can run 23 threads too. This is what we call the illusion of parallelism given to us by the scheduler by scheduling processes after processes hence making us think everything is running simultaneously.
Here is the thing
If you run 1 thread, you may not gain enough performance. As you keep increasing threads to infinity, then the scheduler will take more time to schedule threads and hamper your overall application performance.
Related
So this is more or less a theoretical question. I have a single core machine which is supposedly powerful but nevertheless only one core. Now I have two choices to make :
Multithreading: As far as my knowledge is concerned I cannot make use of multiple cores in my machines even if I had them because of GIL. Hence in this situation, it does not make any difference.
Multiprocessing: This is where I have a doubt. Can I do multiprocessing on a single core machine? Or everytime I have to check the cores available in my machine and then run exactly the same or less number of processes?
Can someone please guide me on the relation between multiprocessing and cores in a machine.
I know this is a theoretical question but my concepts are not very clear on this.
This is a big topic but here are some pointers.
Think of threads as processes that share the same address space and can access the same memory. Communication is done by shared variables. Multiple threads can run within the same process.
Processes (in this context, and roughly speaking) have their own private data and if two processes want to communicate that communication has to be done more explicitly.
When you are writing a program where the bottleneck is CPU cycles, neither threads or processes will give you a speedup on a single core machine.
Processes and threads are still useful for multitasking (rapid switching between (sub)programs) - this is what your operating system does because it runs far more processes than you have cores.
Processes and threads (or even coroutines!) can give you considerable speedup even on a single core machine if the tasks you are executing are I/O bound - think of fetching data from a network. For example, instead of actively waiting for data to be sent or to arrive, another process or thread can initiate the next network operation.
Threads are preferable over processes when you don't need explicit encapsulation due to their lower overhead. For most CPU-bound concurrent problems, and especially the large subset of "embarassingly parallel" ones, it does not make much sense to spawn more processes than you have processors.
The Python GIL prevents two threads in the same process from running in parallel, i.e. from multiple cores executing instructions literally at the same time.
Therefore threads in Python are relatively useless for speeding up CPU-bound tasks, but can still be very useful for I/O bound tasks, because blocking operations (e.g. waiting for network data) release the GIL such that another thread can run while the other waits.
If you have multiple processors, you can have true parallelism by spawning multiple processes despite the GIL. This is only worth it for CPU bound tasks, and often you have to consider the overhead of spawning processes and the communication cost between processes.
You CAN use both multithreading and multiprocessing in single core systems.
The GIL limits the usefulness of multithreading in pure Python for computation-bound tasks, no matter your underlying architecture. For I/O-bound tasks, they do work perfectly fine. Had they had not any use, they would not have been implemented in the first place, probably.
For pure Python software, multiprocessing is always a safer choice when it comes to parallel computing. Of course, multiple processes are more expensive than multiple threads (since processes do not share memory, contrarily to threads; also, processes come with slightly higher overhead compared to threads).
For single processor machines, however, multiprocessing (and multithreading) buys you little to no extra speed for computationally heavy tasks, and they should actually even slow you down a bit. But, if the OS supports them (which is pretty common for desktop, workstation, clusters, etc, but may not be common for embedded systems), they allow you to effectively run simultaneously multiple I/O-bound programs.
Long story short, it depends a bit on what you are doing...
multiprocessing module basically spawns up multiple instances of python interpreter, so there is no worry of GIL.
multiprocessing uses the same API used by threading module if you have used it previously.
You seem to be confused between multiprocessing, threading (you referring as multithreading) and X-core processor.
No matter what, when you start Python (CPython implementation) it will only use one core of your processor.
Threading is distributing the load between the different component of the script. Suppose you have to interact with an external API, your script has to wait for communication to finish until it proceeds next. You have are making multiple similar calls, it will take linear time. Whereas if you use threading, you can do those calls parallelly.
See also: PyPy implementation of Python
I'm running a computation heavy program in Python that takes about 10 minutes to run on my system. When I look at CPU usage, one of eight cores is hovering at about 70%, a second is at about 20%, and the rest are close to 0%.Is there any way I can force the program into 100% usage of a single core?
edit: I recognize that utilizing all 8 cores isn't a great option, but is there a way to force the one core into 100% usage?
For multi-core applications you should use the multiprocessing module instead of threading. Python has some (well documented) performance issues with threads. Search google for Global Interpreter Lock for more information about Python and thread performance
Python isn't really multithreaded. You can create threads as a convenient way to manage logical processes, but the application itself has a global thread lock. Because of this, you'll only ever be able to utilize a single thread.
You can use a different flavor of python, but that may provide you with other issues. If you're willing to break it out into distinct processes, you may be able to utilize the subprocess module. This creates two distinct python processes, but they can operate independently. Unfortunately, you'll have to code a method by which they communicate.
I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.
Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.
I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")
I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.
Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.
I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")
Is there an algorithm that checks whether creating a new thread pays off performance wise?
I'll set a maximum of threads that can be created anyway but if I add just one task it wouldn't be an advantage to start a new thread for that.
The programming language I use is python.
Edit 1#
Can this question even be answered or is it to general because it depends on what the threads work on?
python (at least standard CPython) is a special case, because it won't run more than one thread at a time, therefore if you are doing number-crunching on a multiple cores, then pure python isn't really the best choice.
In CPython, while running python code, only one thread is executing. It protected by the Global Interpreter Lock. If you're going IO or sleeping or waiting on the other hand, then python threads make sense.
If you are number-crunching then you probably want to do that in a C-extension anyway. Failing that the multiprocessing library provides a way for pure python code to take advantage of multiple cores.
In the general, non-python, case: the question can't be answered, because it depend on:
Will running tasks on a new thread be faster at all>
What is the cost of starting a new thread?
What sort of work do the tasks contain? (IO-bound, CPU-bound, network-bound, user-bound)
How efficient is the OS at scheduling threads?
How much shared data/locking do the tasks need?
What dependencies exist between tasks?
If your tasks are independent and CPU-bound, then running one per-CPU core is probably best - but in python you'll need multiple processes to take advantage.
Rule of thumb: if a thread is going to do input/output, it may be worth separating it.
If it's doing number-crunching then optimum number of threads is number of CPUs.
Testing will tell you.
Basicly try, and benchmark.
There is no general answer for this ... yet. But there is a trend. Since computers get more and more CPU cores (try to buy a new single CPU PC ...), using threads becomes the de-facto standard.
So if you have some work which can be parallelized, then by all means use the threading module and create a pool of workers. That won't make your code much slower in the worst case (just one thread) but it can make it much faster if a user has a more powerful PC.
In the worst case, your program will complete less than 100ms later -> people won't notice the slowdown but they will notice the speedup with 8 cores.