Python, Using multiple threading.Thread objects increases execution time of each thread

I have found that when using the threading.Thread class, if I have multiple threads running at the same time, the execution of each thread slows down. Here is a small sample program that demonstrates this.
If I run it with 1 thread each iteration takes about half a second on my computer. If I run it with 4 threads each iteration takes around 4 seconds.
Am I missing some key part of subclassing the threading.Thread object?
Thanks in advance
import sys
import os
import time
from threading import Thread
class LoaderThread(Thread):
    def __init__(self):
        super(LoaderThread,self).__init__()
        self.daemon = True
        self.start()
    def run(self):
        while True:
            tic = time.time()
            x = 0
            for i in range(int(1e7)):
                x += 1
            print 'took %f sec' % (time.time()-tic)
class Test(object):
    def __init__(self, n_threads):
        self.n_threads = n_threads
        # kick off threads
        self.threads = []
        for i in range(self.n_threads):
            self.threads.append(LoaderThread())
if __name__ == '__main__':
    print 'With %d thread(s)' % int(sys.argv[1])
    test = Test(int(sys.argv[1]))
    time.sleep(10)

In CPython, only one thread can execute Python bytecode at a time because of the Global Interpreter Lock (GIL).
The GIL only matters for CPU-bound processes. IO-bound processes still get benefits from threading (as the GIL is released). Since your program is "busy" looping in python code, you don't see any performance benefits from threading here.
Note that this is a CPython (implementation) detail, and not strictly speaking part of the language python itself. For example, Jython and IronPython have no GIL and can have truly concurrent threads.
Look at the multiprocessing module rather than threading if you want true parallelism for CPU-bound work in CPython.
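As a rough illustration of that suggestion, here is a minimal sketch (written for Python 3) that runs the same kind of counting loop in separate processes with multiprocessing.Pool. The function name count_up and the worker count of 4 are assumptions made for the example, not part of the original program.

```python
import time
from multiprocessing import Pool


def count_up(n):
    """CPU-bound busy loop, similar to the loop in the question's run() method."""
    x = 0
    for _ in range(n):
        x += 1
    return x


if __name__ == "__main__":
    n_workers = 4          # assumed worker count; tune to your core count
    n_iterations = 10**7   # same loop length as in the question

    tic = time.time()
    with Pool(processes=n_workers) as pool:
        # Each call runs in a separate process with its own interpreter and GIL.
        results = pool.map(count_up, [n_iterations] * n_workers)
    print("%d processes took %f sec in total" % (n_workers, time.time() - tic))
```

On a machine with at least as many free cores as workers, the total time should stay close to the time of a single loop, although process start-up and result passing add some overhead.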

That's because CPython doesn't actually do simultaneous threading; CPython only allows one thread of Python code to run at a time: i.e.
Thread 1 runs, no other thread runs...
Thread 2 runs, no other thread runs.
This behavior is caused by the Global Interpreter Lock (GIL). However, during I/O the GIL is released, allowing I/O-bound threads to run concurrently.
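To see the I/O side of this, here is a small sketch (Python 3) that uses time.sleep as a stand-in for a blocking I/O call. The helper names fake_io and run_threads are made up for the example.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)


def run_threads(n_threads, delay):
    """Start n_threads sleepers, wait for all of them, return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    tic = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - tic


elapsed = run_threads(4, 0.5)
print("4 sleeping threads took %.2f sec" % elapsed)
```

Because the threads spend their time waiting rather than executing bytecode, the four waits overlap and the total should be close to one delay rather than four.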

Related

Some doubts about Thread Pool Executor and Thread in python

Recently, I tried to use asyncio to execute multiple blocking operations asynchronously. I used the function loop.run_in_executor, which seems to put tasks into a thread pool. As far as I know, a thread pool reduces the overhead of creating and destroying threads, because it can pick up a new task when one finishes instead of destroying the thread. I wrote the following code for a deeper understanding.
import asyncio
import time
def blocking_funa():
    print('starta')
    print('starta')
    time.sleep(4)
    print('enda')
def blocking_funb():
    print('startb')
    print('startb')
    time.sleep(4)
    print('endb')
loop = asyncio.get_event_loop()
tasks = [loop.run_in_executor(None, blocking_funa), loop.run_in_executor(None, blocking_funb)]
loop.run_until_complete(asyncio.wait(tasks))
and the output:
starta
startbstarta
startb
(wait for about 4s)
enda
endb
We can see that these two tasks run almost simultaneously. Now I use the threading module:
threads = [threading.Thread(target = blocking_funa), threading.Thread(target = blocking_funb)]
for thread in threads:
    thread.start()
    thread.join()
and the output:
starta
starta
enda
startb
startb
endb
Due to the GIL, only one thread executes at a time, so I understand this output. But how does the thread pool executor make these two tasks run almost simultaneously? What is the difference between a thread pool and a thread? And why does the thread pool look like it is not limited by the GIL?

You're not making a fair comparison, since you're joining the first thread before starting the second.
Instead, consider:
import time
import threading
def blocking_funa():
    print('a 1')
    time.sleep(1)
    print('a 2')
    time.sleep(1)
    print('enda (quick)')
def blocking_funb():
    print('b 1')
    time.sleep(1)
    print('b 2')
    time.sleep(4)
    print('endb (a few seconds after enda)')
threads = [threading.Thread(target=blocking_funa), threading.Thread(target=blocking_funb)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
The output:
a 1
b 1
b 2
a 2
enda (quick)
endb (a few seconds after enda)
Considering it hardly takes any time to run a print statement, you shouldn't read too much into the prints in the first example getting mixed up.
If you run the code repeatedly, you may find that b 2 and a 2 will change order more or less randomly. Note how in my posted result, b 2 occurred before a 2.
Also, regarding your remark "Due to the GIL limitation, only one thread is executing at the same time" - you're right that the "execution of any Python bytecode requires acquiring the interpreter lock. This prevents deadlocks (as there is only one lock) and doesn’t introduce much performance overhead. But it effectively makes any CPU-bound Python program single-threaded." https://realpython.com/python-gil/#the-impact-on-multi-threaded-python-programs
The important part there is "CPU-bound" - of course you would still benefit from making I/O-bound code multi-threaded.

Python releases and reacquires the GIL often. This means that all runnable threads contending for the GIL get short sprints. It's not parallel, just interleaved. More important for your example, Python tends to release the GIL when doing a blocking operation: the GIL is released before sleep and also when print enters the C libraries.

Python multiprocessing seems slower than regular execution

In the code below, I compute the cube of 9999, calling the function both via a thread pool and directly.
I time both approaches, and the direct calls seem to be far faster. I am running this on an 8th-gen Intel i7 processor with 16 GB of RAM, inside a Python 2.7 terminal.
I am baffled by this. Maybe I am missing something. I hope this question is helpful for people in the future.
import time
from multiprocessing.pool import ThreadPool
def cube():
    return 9999*9999*9999
print "Start Execution Threading: "
x = int(round(time.time() * 1000))
pool = ThreadPool()
for i in range(0,100):
    result = pool.apply_async(cube, ())
    result = pool.apply_async(cube, ())
    result = pool.apply_async(cube, ())
    # print result.get()
pool.close()
pool.join()
print "Stop Execution Threading: "
y = int(round(time.time() * 1000))
print y-x
print "Start Execution Main: "
x = int(round(time.time() * 1000))
for i in range(0,100):
    cube()
    cube()
    cube()
print "Stop Execution Main: "
y = int(round(time.time() * 1000))
print y-x

Using a pool means starting new worker threads (or, with true multiprocessing, new processes). That comes with quite an overhead, because the workers must be initialized and every task must be dispatched to them. As such, multi-threading only pays off, especially in Python, when you parallelize tasks that each take considerable time to execute (compared with that start-up cost) and that can be allowed to run asynchronously.
In your case, a simple multiplication executes so quickly that it will not pay off.

Because of from multiprocessing.pool import ThreadPool, you are using multi-threading and not multi-processing. CPython uses a Global Interpreter Lock to prevent more than one thread from executing Python code at the same time.
So, as your program is CPU-bound, you add the threading overhead with no benefit because of the GIL. Multi-threading does make sense in Python for I/O-bound problems, because a thread can run while others are waiting for I/O to complete.
You could try to use true multiprocessing, because then each Python process will have its own GIL, but I am still unsure of the gain, because the communication between processes adds even more overhead...
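One way to see the dispatch overhead for yourself is a sketch like the following (Python 3). The helpers time_direct and time_pooled and the pool size of 4 are arbitrary choices for the example. Both paths compute the same results; only the elapsed time differs.

```python
import time
from multiprocessing.pool import ThreadPool


def cube():
    return 9999 * 9999 * 9999


def time_direct(n_calls):
    """Call cube() n_calls times in the main thread; return (results, seconds)."""
    tic = time.perf_counter()
    results = [cube() for _ in range(n_calls)]
    return results, time.perf_counter() - tic


def time_pooled(n_calls):
    """Dispatch the same calls through a ThreadPool; return (results, seconds)."""
    tic = time.perf_counter()
    with ThreadPool(4) as pool:  # 4 workers is an arbitrary choice
        async_results = [pool.apply_async(cube) for _ in range(n_calls)]
        results = [r.get() for r in async_results]
    return results, time.perf_counter() - tic


direct_results, direct_secs = time_direct(300)
pooled_results, pooled_secs = time_pooled(300)
print("direct: %.6f sec, pooled: %.6f sec" % (direct_secs, pooled_secs))
```

For a task this small, the pooled timing mostly measures pool creation and task hand-off rather than the multiplication itself.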

Python Threading/ThreadPool implementation

I have the following two snippets showing the power of threading and was wondering what the difference is for each implementation.
from multiprocessing.dummy import Pool as ThreadPool
def threadInfiniteLoop(passedNumber):
    while 1:
        print passedNumber
if __name__ == '__main__':
    packedVals={
        'number':[0,1,2,3,4,5,6,7,8,9]
    }
    pool = ThreadPool(len(packedVals['number']))
    pool.map(func=threadInfiniteLoop,iterable=packedVals['number'])
and
import threading
def threadLoop(numberPassed):
    while 1:
        print numberPassed
if __name__ == '__main__':
    for number in range(10):
        t = threading.Thread(target=threadLoop, args=(number,))
        t.start()
What is the difference between the two snippets and the way each one initializes its threads? Is there a benefit of one over the other, and in what situation would one be more applicable than the other?

There's not much difference when you want to start a thread that runs forever.
Normally, you use a thread pool when your program continually creates new finite tasks to perform "in the background" (whatever that means).
Creating and destroying threads is relatively expensive, so it makes more sense to have a small number of threads that stick around for a long time, and then use those threads over and over again to perform the background tasks. That's what a thread pool does for you.
There's usually no point in creating a thread pool when all you want is a single thread that never terminates.
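The reuse described above can be sketched like this (Python 3). It uses concurrent.futures.ThreadPoolExecutor rather than the multiprocessing.dummy pool from the question, and background_task is a made-up name for a short, finite job.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def background_task(task_id):
    """A short, finite task; returns the name of the worker thread that ran it."""
    time.sleep(0.01)  # stand-in for a small amount of real work
    return threading.current_thread().name


# Three long-lived workers handle twenty tasks (both counts are arbitrary).
with ThreadPoolExecutor(max_workers=3) as pool:
    worker_names = list(pool.map(background_task, range(20)))

print("tasks run: %d, distinct threads used: %d"
      % (len(worker_names), len(set(worker_names))))
```

Twenty tasks are completed, but no more than three threads are ever created, which is the saving a pool is meant to provide.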

Why does not multi-threads save time in my Python script?

I have the following script, which uses the threading module in order to save time when running cycle.
import threading, time, sys
def cycle(start, end):
    for i in range(start, end):
        pass
#########################################################
thread1 = threading.Thread(target = cycle, args=(1,1000000))
thread2 = threading.Thread(target = cycle, args=(1000001,2000000))
thread1.start()
thread2.start()
print 'start join'
thread1.join()
thread2.join()
print 'end join'
However, I found that the script takes even more time than the one without multiple threads (cycle(1, 2000000)).
What might be the reason and how can I save time?

Threads are often not useful in Python because of the global interpreter lock: only one thread can run Python code at a time.
There are cases where the GIL doesn't cause much of a bottleneck, e.g. if your threads are spending most of their time calling thread-safe native (non-Python) functions, but your program doesn't appear to be one of those cases. So even with two threads, you're basically running just one thread at a time, plus there's the overhead of two threads contending for a lock.

Non blocking python process or thread

I have a simple app that listens to a socket connection. Whenever certain chunks of data come in a callback handler is called with that data. In that callback I want to send my data to another process or thread as it could take a long time to deal with. I was originally running the code in the callback function, but it blocks!!
What's the proper way to spin off a new task?

threading is the library usually used for I/O-bound (resource-bound) multithreading. The multiprocessing library is another option, but it is designed more for running CPU-intensive tasks in parallel; threading is generally the recommended library in your case.
Example
import threading, time
def my_threaded_func(arg, arg2):
    print "Running thread! Args:", (arg, arg2)
    time.sleep(10)
    print "Done!"
thread = threading.Thread(target=my_threaded_func, args=("I'ma", "thread"))
thread.start()
print "Spun off thread"
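If the callback fires often, one alternative to creating a thread per call is to hand the work to a small pool. The following is only a sketch (Python 3): on_data stands in for the socket callback and slow_handler for the long-running work, and both names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names throughout: on_data() stands for the socket callback,
# slow_handler() for the long-running work it hands off.
executor = ThreadPoolExecutor(max_workers=4)


def slow_handler(data):
    """Pretend to process a chunk of data for a while."""
    time.sleep(0.5)
    return len(data)


def on_data(data):
    """Callback: queue the work on a worker thread and return immediately."""
    return executor.submit(slow_handler, data)


tic = time.perf_counter()
future = on_data(b"some chunk")
callback_secs = time.perf_counter() - tic  # time spent inside the callback

result = future.result()  # blocks only here, when the result is actually needed
executor.shutdown()
print("callback returned after %.4f sec; handler result: %d" % (callback_secs, result))
```

The callback returns as soon as the task is queued, so the code that listens on the socket is not held up while the handler runs.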

The multiprocessing module has worker pools. If you don't need a pool of workers, you can use Process to run something in parallel with your main program.

import threading
from time import sleep
import sys
# assume function defs ...
class myThread (threading.Thread):
    def __init__(self, threadID):
        threading.Thread.__init__(self)
        self.threadID = threadID
    def run(self):
        if self.threadID == "run_exe":
            run_exe()
def main():
    itemList = getItems()
    for item in itemList:
        thread = myThread("run_exe")
        thread.start()
        sleep(.1)
        listenToSocket(item)
        thread.join() # wait for the thread to finish before looping
main()
sys.exit(0)
The sleep between thread.start() and listenToSocket(item) gives the thread time to get established before you begin to listen. I implemented this code in a unit test framework where I had to launch multiple non-blocking processes (len(itemList) of them) because my other testing framework (listenToSocket(item)) was dependent on the processes.
run_exe() can trigger a subprocess call that can be blocking (i.e. invoking pipe.communicate()), so that output data from the execution will still be printed in time with the Python script output. But the nature of threading makes this OK.
So this code solves two problems: it prints the data of a subprocess without blocking script execution, and it dynamically creates and starts multiple threads sequentially (which makes the script easier to maintain if I ever add more items to my itemList later).
