Is Multithreading (in python) the same as calling the script multiple times?

Is Multithreading (in python) the same as calling the script multiple times? - python

Let's assume we have some task, that could be divided into independent subtasks and we want to process these tasks in parallel on the same machine.
I read about multithreading and ran into this post, which describes GlobalInterpreterLocks. Since I do not understand fully how processes are handled under the hood, I got to ask:
Putting aside the gain of threading: Is Multithreading (in my case in python) effectively the same as calling a script multiple times?
I hope this question does not lead to far and its answer is understandable for someone whose knowledge about the things happening on the low levels of a computer are sparse. Thanks for any enlightening in this matter.

Is Multithreading (in my case in python) effectivle the same as calling a script multiple times?
In a word, no.
Due to the GIL, in Python it is far easier to achieve true parallelism by using multiple processes than it is by using multiple threads. Calling the script multiple times (presumably with different arguments) is an example of using multiple processes. The multiprocessing module is another way to achieve parallelism by using multiple processes. Both are likely to give better performance than using threads.
If I were you, I'd probably consider multiprocessing as the first choice for distributing work across cores.

It is not the same thing one is Multithreading while the other is opening separate process for one another:
here is a short explanation taken from here :
It is important to first define the differences between processes and
threads. Threads are different than processes in that they share
state, memory, and resources. This simple difference is both a
strength and a weakness for threads. On one hand, threads are
lightweight and easy to communicate with, but on the other hand, they
bring up a whole host of problems including deadlocks, race
conditions, and sheer complexity. Fortunately, due to both the GIL and
the queuing module, threading in Python is much less complex to
implement than in other languages.

Related

Why does Python not use 100% of the processor? [duplicate]

I notice when I run my heavily CPU dependant python programs, it only uses a single core. Is it possible to assign multiple cores to the program when I run it?

You have to program explicitly for multiple cores. See the Symmetric Multiprocessing options on this page for the many parallel processing solutions in Python. Parallel Python is a good choice if you can't be bothered to compare the options, look at the examples here.
Some problems can't take advantage of multiple cores though. Think about how you could possibly run up the stairs faster with the help of three friends. Not going to happen!

I wonder why nobody mentioned CPython's GIL (Global Interpreter Lock) yet. It basically means that multiple threads inside one Python interpreter cannot use the power of multiple cores because many operations are protected by a global lock in order to be thread-safe. This only applies to a small amount of applications - the CPU-bound ones. For more info, just search for the term "GIL", there are already many questions on it (like that one, for example).
This answer of course assumes that you are in fact using multiple threads, or else you won't be able to use multiple cores anyway (multiprocessing would be another possibility).

If any part of your problem can be run in parallel, you should look into the multiprocessing module

Why do we blame GIL if CPU can execute one process (light weight) at a time? [duplicate]

I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.

The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Can standard C Python has more than one thread running at the same time? [duplicate]

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Multithreading With Very Large Number of Threads

I'm working on simulating a mesh network with a large number of nodes. The nodes pass data between different master nodes throughout the network.
Each master comes live once a second to receive the information, but the slave nodes don't know when the master is up or not, so when they have information to send, they try and do so every 5 ms for 1 second to make sure they can find the master.
Running this on a regular computer with 1600 nodes results in 1600 threads and the performance is extremely bad.
What is a good approach to handling the threading so each node acts as if it is running on its own thread?
In case it matters, I'm building the simulation in python 2.7, but I'm open to changing to something else if that makes sense.

For one, are you really using regular, default Python threads available in the default Python 2.7 interpreter (CPython), and is all of your code in Python? If so, you are probably not actually using multiple CPU cores because of the global interpreter lock CPython has (see https://wiki.python.org/moin/GlobalInterpreterLock). You could maybe try running your code under Jython, just to check if performance will be better.
You should probably rethink your application architecture and switch to manually scheduling events instead of using threads, or maybe try using something like greenlets (https://stackoverflow.com/a/15596277/1488821), but that would probably mean less precise timings because of lack of parallelism.

To me, 1600 threads sounds like a lot but not excessive given that it's a simulation. If this were a production application it would probably not be production-worthy.
A standard machine should have no trouble handling 1600 threads. As to the OS this article could provide you with some insights.
When it comes to your code a Python script is not a native application but an interpreted script and as such will require more CPU resources to execute.
I suggest you try implementing the simulation in C or C++ instead which will produce a native application which should execute more efficiently.

Do not use threading for that. If sticking to Python, let the nodes perform their actions one by one. If the performance you get doing so is OK, you will not have to use C/C++. If the actions each node perform are simple, that may work. Anyway, there is no reason to use threads in Python at all. Python threads are usable mostly for making blocking I/O not to block your program, not for multiple CPU kernels utilization.
If you want to really use parallel processing and to write your nodes as if they were really separated and exchanging only using messages, you may use Erlang (http://www.erlang.org/). It is a functional language very well suited for executing parallel processes and making them exchange messages. Erlang processes do not map to OS threads, and you may create thousands of them. However, Erlang is a purely functional language and may seem extremely strange if you have never used such languages. And it also is not very fast, so, like Python, it is unlikely to handle 1600 actions every 5ms unless the actions are rather simple.
Finally, if you do not get desired performance using Python or Erlang, you may move to C or C++. However, still do not use 1600 threads. In fact, using threads to gain performance is reasonable only if the number of threads does not dramatically exceed number of CPU kernels. A reactor pattern (with several reactor threads) is what you may need in that case (http://en.wikipedia.org/wiki/Reactor_pattern). There is an excellent implementation of the reactor pattern in boost.asio library. It is explained here: http://www.gamedev.net/blog/950/entry-2249317-a-guide-to-getting-started-with-boostasio/

Some random thoughts here:
I did rather well with several hundred threads working like this in Java; it can be done with the right language. (But I haven't tried this in Python.)
In any language, you could run the master node code in one thread; just have it loop continuously, running the code for each master in each cycle. You'll lose the benefits of multiple cores that way, though. On the other hand, you'll lose the problems of multithreading, too. (You could have, say, 4 such threads, utilizing the cores but getting the multithreading headaches back. It'll keep the thread-overhead down, too, but then there's blocking...)
One big problem I had was threads blocking each other. Enabling 100 threads to call the same method on the same object at the same time without waiting for each other requires a bit of thought and even research. I found my multithreading program at first often used only 25% of a 4-core CPU even when running flat out. This might be one reason you're running slow.
Don't have your slave nodes repeat sending data. The master nodes should come alive in response to data coming in, or have some way of storing it until they do come alive, or some combination.
It does pay to have more threads than cores. Once you have two threads, they can block each other (and will if they share any data). If you have code to run that won't block, you want to run it in its own thread so it won't be waiting for code that does block to unblock and finish. I found once I had a few threads, they started to multiply like crazy--hence my hundreds-of-threads program. Even when 100 threads block at one spot despite all my brilliance, there's plenty of other threads to keep the cores busy!

Algorithm should I create a new thread?

Is there an algorithm that checks whether creating a new thread pays off performance wise?
I'll set a maximum of threads that can be created anyway but if I add just one task it wouldn't be an advantage to start a new thread for that.
The programming language I use is python.
Edit 1#
Can this question even be answered or is it to general because it depends on what the threads work on?

python (at least standard CPython) is a special case, because it won't run more than one thread at a time, therefore if you are doing number-crunching on a multiple cores, then pure python isn't really the best choice.
In CPython, while running python code, only one thread is executing. It protected by the Global Interpreter Lock. If you're going IO or sleeping or waiting on the other hand, then python threads make sense.
If you are number-crunching then you probably want to do that in a C-extension anyway. Failing that the multiprocessing library provides a way for pure python code to take advantage of multiple cores.
In the general, non-python, case: the question can't be answered, because it depend on:
Will running tasks on a new thread be faster at all>
What is the cost of starting a new thread?
What sort of work do the tasks contain? (IO-bound, CPU-bound, network-bound, user-bound)
How efficient is the OS at scheduling threads?
How much shared data/locking do the tasks need?
What dependencies exist between tasks?
If your tasks are independent and CPU-bound, then running one per-CPU core is probably best - but in python you'll need multiple processes to take advantage.

Rule of thumb: if a thread is going to do input/output, it may be worth separating it.
If it's doing number-crunching then optimum number of threads is number of CPUs.

Testing will tell you.
Basicly try, and benchmark.

There is no general answer for this ... yet. But there is a trend. Since computers get more and more CPU cores (try to buy a new single CPU PC ...), using threads becomes the de-facto standard.
So if you have some work which can be parallelized, then by all means use the threading module and create a pool of workers. That won't make your code much slower in the worst case (just one thread) but it can make it much faster if a user has a more powerful PC.
In the worst case, your program will complete less than 100ms later -> people won't notice the slowdown but they will notice the speedup with 8 cores.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.