Why doesn't multi-threading save time in my Python script?

I have the following script, which uses the threading module to try to save time by splitting a loop across two threads.
import threading, time, sys

def cycle(start, end):
    for i in range(start, end):
        pass

#########################################################
thread1 = threading.Thread(target=cycle, args=(1, 1000000))
thread2 = threading.Thread(target=cycle, args=(1000001, 2000000))
thread1.start()
thread2.start()
print 'start join'
thread1.join()
thread2.join()
print 'end join'
However, I found that the script takes even more time than the single-threaded version (cycle(1, 2000000)).
What might be the reason, and how can I save time?

Threads are often not useful in Python because of the global interpreter lock: only one thread can run Python code at a time.
There are cases where the GIL doesn't cause much of a bottleneck, e.g. if your threads are spending most of their time calling thread-safe native (non-Python) functions, but your program doesn't appear to be one of those cases. So even with two threads, you're basically running just one thread at a time, plus there's the overhead of two threads contending for a lock.
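If the work is genuinely CPU-bound, the usual way to actually save time is the multiprocessing module, which sidesteps the GIL by running separate processes. Here is a minimal sketch of the same split (the no-op loop stands in for real work; a loop this cheap is not worth parallelizing at all):

import time
from multiprocessing import Process

def cycle(start, end):
    # stand-in for a real CPU-bound workload
    for i in range(start, end):
        pass

if __name__ == '__main__':
    t0 = time.time()
    p1 = Process(target=cycle, args=(1, 1000000))
    p2 = Process(target=cycle, args=(1000001, 2000000))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print('elapsed: %.3f sec' % (time.time() - t0))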

Related

Basic python threading is not working. What am I missing in this?

I am trying to use python threading and am having problems getting the threads to work independently. They seem to be running in sequential order and waiting for one to finish before starting to process the next thread. I have read other posts suggesting that I need to get more work into the threads to differentiate actual CPU work vs the CPU work of starting and managing the threads, and that a sleep timer could be used to simulate this. So I tried that and then measured the task durations.
So my code is below. It first runs three tasks sequentially with a 2 second timer. This takes about 6 seconds to run, as expected. The next section starts three threads, and they should run roughly in parallel if my understanding of threading is correct. I have played with the timers to test the overall duration of this section of code, expecting that if one timer is larger than the other two, the code will execute in an interval closest to that larger one. But what I am seeing is that it takes the same amount of time as the three running in sequence, one after the other.
I got onto this because I am writing some code to read an asynchronous queue in the background. After launching the thread to read the queue, my code seems to stop and wait until the queue reader is stopped, which it normally doesn't as it waits for messages to come in. So what happens is that it never executes the next section of code and it seems to be waiting for the thread to complete.
Also I checked the number of threads active and it remains at the same number, and when I check for the thread ID in the code (not shown) I get the same thread number coming back for every thread.
I am new to Python and am using the Jupyter environment. Is there a compile option or some other limitation that I am not aware of that is preventing the threading? Am I just not getting the concept? I don't believe that this is related to CPU cores / threading, as it would be done through logical thread cores within the Python code. I also ran a similar program in a command shell environment and got the same sequential performance.
Cut and paste this code to see what it does. What am I missing?
import threading
import logging
import datetime
import time
import random

class Parallel:
    def work(self, interval):
        time.sleep(interval)
        name = self.__repr__()
        print(name, " is complete after ", interval, " seconds")

# SetupLogger()
logging.getLogger().setLevel(logging.DEBUG)
logging.debug("thread program start time is %s", datetime.datetime.now())

thread1 = Parallel()
thread2 = Parallel()
thread3 = Parallel()

print("sequential threads::")
thread1.work(2.0)
thread2.work(2.0)
thread3.work(2.0)

logging.info("parallel threads start time is %s ", datetime.datetime.now())
start = time.time()

work1 = threading.Thread(target=thread1.work(1), daemon=True)
work1.start()
print("thread 1 is started and there are ", threading.activeCount(), " threads active")
work2 = threading.Thread(target=thread2.work(2), daemon=False)
work2.start()
print("thread 2 is started and there are ", threading.activeCount(), " threads active")
work3 = threading.Thread(target=thread3.work(5), daemon=False)
work3.start()
print("thread 3 is started and there are ", threading.activeCount(), " threads active")

# wait for all to complete
print("now wait for all to finish at ", datetime.datetime.now())
work1.join()
work2.join()
work3.join()

end = time.time()
logging.info("parallel threads end time is %s with %s elapsed", datetime.datetime.now(), str(end - start))
print("all threads completed at:", datetime.datetime.now())
In the line that initializes the thread, you are actually executing the function instead of passing a reference to it to the thread.
thread1.work(1) ----> this actually executes the function as soon as the program encounters this statement
So when your program reaches this line,
work1 = threading.Thread(target=thread1.work(1), daemon=True)
and encounters target=thread1.work(1), it simply calls the function right there and the actual thread does nothing.
thread1.work is a reference to the function, which you need to pass to your Thread object.
So remove the parentheses, pass the argument separately via args, and your code becomes
work1 = threading.Thread(target=thread1.work, daemon=True, args=(1,))
and this will behave as you expect.
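Put together, a self-contained corrected version might look like this (a sketch based on the question's code, with the daemon flags dropped since all threads are joined anyway); the elapsed time should now be close to the longest sleep, about 5 seconds, rather than 1 + 2 + 5 = 8:

import threading
import time

class Parallel:
    def work(self, interval):
        time.sleep(interval)
        print(self, "is complete after", interval, "seconds")

thread1, thread2, thread3 = Parallel(), Parallel(), Parallel()

start = time.time()
work1 = threading.Thread(target=thread1.work, args=(1,))
work2 = threading.Thread(target=thread2.work, args=(2,))
work3 = threading.Thread(target=thread3.work, args=(5,))
for w in (work1, work2, work3):
    w.start()
# the three sleeps overlap; join() waits for each thread to finish
for w in (work1, work2, work3):
    w.join()
print("elapsed:", time.time() - start)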

Python GIL strange behaviour

Look at this piece of code:
from threading import Thread
import time

cpt = 0

def myfunction():
    print("myfunction.start")
    global cpt
    for x in range(10):
        cpt += 1
        time.sleep(0.2)
        print("cpt=%d" % (cpt))
    print("myfunction.end")

thread1 = Thread(target=myfunction)
thread2 = Thread(target=myfunction)
thread1.start()
thread2.start()
This is a very basic function which reads and writes a global variable.
I am running two threads on this same function.
I have read that Python is not very efficient with multi-threading because of the GIL, which automatically locks functions or methods that access the same resources.
So I was expecting Python to first run thread1 and then thread2, but I can see in the console output that the two threads run in parallel.
So I do not understand what the GIL is really locking...
Thanks
That's because of the sleep system call, which releases the CPU (and even "exits" the interpreter for a while).
When you do time.sleep(0.2), the current thread is suspended by the system (not by Python) for the given amount of time, and the other thread is allowed to work.
Note that the print statements or threading.current_thread() calls that you might insert to spy on the threads also yield (briefly) to the system, so threads can switch because of that (remember Schrödinger's cat). The real test would be this:
from threading import Thread
import time

cpt = 0

def myfunction():
    global cpt
    for x in range(10):
        cpt += 1
        time.sleep(0.2)
    print(cpt)

thread1 = Thread(target=myfunction)
thread2 = Thread(target=myfunction)
thread1.start()
thread2.start()
Here you get
20
20
which means that the two threads took turns increasing the counter.
Now comment out the time.sleep() call, and you'll get:
10
20
which means that the first thread did all ten of its increments, ended, and only then did the second thread add the further 10 counts. With no system calls in the loop (not even a print), nothing yields to the system, and the GIL's serializing effect is at its strongest.
The GIL doesn't by itself create a performance problem; it just prevents two threads from running Python code in parallel. If you really need to run Python code in parallel, you have to use the multiprocessing module instead (with all its constraints: the pickling, the forking, ...).
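As an illustration of those constraints, here is a hedged sketch of the same counter ported to multiprocessing: a plain global no longer works across processes, so a shared multiprocessing.Value stands in for cpt, and its lock must be taken explicitly because the processes do not share a GIL:

from multiprocessing import Process, Value

def myfunction(cpt):
    for x in range(10):
        with cpt.get_lock():  # explicit lock: processes do not share a GIL
            cpt.value += 1
    print(cpt.value)

if __name__ == '__main__':
    cpt = Value('i', 0)  # an integer living in shared memory
    p1 = Process(target=myfunction, args=(cpt,))
    p2 = Process(target=myfunction, args=(cpt,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()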

Multi Threading Makes Process Slower [duplicate]

This question already has answers here:
python multi-threading slower than serial?
(3 answers)
Closed 7 years ago.
I have the following task that I would like to make faster via multi threading (python3).
import threading, time

q = []

def fill_list():
    global q
    while True:
        q.append(1)
        if len(q) >= 1000000000:
            return
The first version does not use multithreading:
t1 = time.clock()
fill_list()
tend = time.clock() - t1
print(tend)
And results in 145 seconds of run time.
The second invokes two threads:
t1 = time.clock()
thread1 = threading.Thread(target=fill_list, args=())
thread2 = threading.Thread(target=fill_list, args=())
thread1.start()
thread2.start()
thread1.join()
thread2.join()
tend = time.clock() - t1
print(tend)
This takes 152 seconds to complete.
Finally, I added a third thread.
t1 = time.clock()
thread1 = threading.Thread(target=fill_list, args=())
thread2 = threading.Thread(target=fill_list, args=())
thread3 = threading.Thread(target=fill_list, args=())
thread1.start()
thread2.start()
thread3.start()
thread1.join()
thread2.join()
thread3.join()
tend = time.clock() - t1
print(tend)
And this took 233 seconds to complete.
Obviously the more threads I add, the longer the process takes, though I am not sure why. Is this a fundamental misunderstanding of multithreading, or is there a bug in my code that is simply repeating the task multiple times instead of contributing to the same task?
Both, actually.
First of all, your task is CPU-bound, and in a Python process only one thread may be running CPU-bound Python code at any given time (this is due to the Global Interpreter Lock: https://wiki.python.org/moin/GlobalInterpreterLock ). Since it costs quite a bit of CPU to switch threads (and the more threads you have, the more often you have to pay that cost), your program doesn't speed up: it slows down.
Second, no matter what language you're using, you're modifying one object (a list) from multiple threads. But to guarantee that this does not corrupt the object, access must be synchronized. In other words, only one thread may be modifying it at any given time. Python does it automatically (thanks in part to the aforementioned GIL), but in another lower-level language like C++ you'd have to use a lock or risk memory corruption.
The optimal way to parallelize tasks across threads is to ensure that the threads are as isolated as possible. If they access shared objects, those should be read-only, and cross-thread writes should happen as infrequently as possible, through thread-aware data structures such as message queues.
(This is why the most performant parallel systems, like Erlang and Clojure, place such a high emphasis on immutable data structures and message-passing.)
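A minimal sketch of that pattern (the names here are illustrative, not from the question): each thread computes on private data and publishes a single result through a thread-safe queue.Queue, so no shared object is ever written concurrently:

import threading
import queue

results = queue.Queue()

def worker(n):
    # private, thread-local computation; no shared state is touched
    total = sum(range(n))
    # a single synchronized write to publish the result
    results.put(total)

threads = [threading.Thread(target=worker, args=(1000000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([results.get() for _ in threads])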

Why threads created with `_thread.start_new_thread` don't print anything?

I found this simple code at https://code.google.com/p/pyloadtools/wiki/CodeTutorialMultiThreading
import _thread

def hello(num):
    print('hello from thread %s\n' % num)

_thread.start_new_thread(hello, (0,))
_thread.start_new_thread(hello, (1,))
_thread.start_new_thread(hello, (2,))
But when I run this, it works in IDLE but not in Eclipse, which uses PyDev. Any idea how to fix it?
Note: I think the main program terminates before the threads run; they don't get enough time to run, I guess. How do I fix it? Maybe I should use join?
Quoting the Caveats section of the _thread documentation:
When the main thread exits, it is system defined whether the other threads survive. On most systems, they are killed without executing try ... finally clauses or executing object destructors.
When the main thread exits, it does not do any of its usual cleanup (except that try ... finally clauses are honored), and the standard I/O files are not flushed.
There are two possibilities here.
The main thread starts three threads but exits before they finish executing. So the standard I/O files, which are buffered by default, are never flushed.
Or the main thread dies and, as per the first quoted passage, all the child threads are killed with it.
Either way, you need to make sure the main thread doesn't die before the children complete.
But when you run from IDLE, the main thread stays alive, so the I/O buffers are flushed when the threads actually complete. That is why it works in IDLE but not in Eclipse.
To make sure that the main thread exits only after all the threads complete, you can make it wait for the child threads with
1. Semaphore
You can use Semaphore, like this
import _thread
import threading

def hello(num):
    print('hello from thread %s' % num)
    # Release the semaphore when the thread is actually done
    sem.release()

def create_thread(value):
    # Acquire the semaphore when the thread is created
    sem.acquire()
    _thread.start_new_thread(hello, (value,))

# Counting semaphore. At most three acquires can be outstanding;
# the next acquire call has to wait until somebody releases.
sem = threading.Semaphore(3)
for i in range(3):
    create_thread(i)

# Acquire the semaphore three more times. Each thread releases it
# when it completes, so only once we have acquired it thrice can we
# be sure that all threads have finished.
for i in range(3):
    sem.acquire()
2. Lock Objects
Alternatively, you can use the _thread.lock objects, like this
import _thread

locks = []

def hello(num, lockobject):
    print('hello from thread %s' % num)
    # Release the lock as we are done here
    lockobject.release()

def create_thread(value):
    # Create a lock and acquire it
    a_lock = _thread.allocate_lock()
    a_lock.acquire()
    # Store it in the global locks list
    locks.append(a_lock)
    # Pass it to the newly created thread, which can release it once done
    _thread.start_new_thread(hello, (value, a_lock))

for i in range(3):
    create_thread(i)

# Acquire all the locks again; each acquire succeeds only once the
# corresponding thread has released its lock, i.e. once it is done
all(lock.acquire() for lock in locks)
Now you will see that the program always prints the "hello from" messages.
Note: As the documentation says, _thread is a low-level threading API. So it is better to use a higher-level module like threading, where you can simply wait for all the threads to exit with the join method.
From https://docs.python.org/3/library/_thread.html#module-_thread
The threading module provides an easier to use and higher-level threading API built on top of this module.
The module is optional.
So please use threading, not the optional _thread module.
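For comparison, the same example rewritten with threading needs no manual semaphore or lock bookkeeping; the main thread simply joins the workers, so it cannot exit before they finish and the output is always flushed:

import threading

def hello(num):
    print('hello from thread %s' % num)

threads = [threading.Thread(target=hello, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the main thread cannot exit before the workers finish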

Python, Using multiple threading.Thread objects increases execution time of each thread

I have found that when using the threading.Thread class, if I have multiple threads running at the same time, the execution of each thread slows down. Here is a small sample program that demonstrates this.
If I run it with 1 thread each iteration takes about half a second on my computer. If I run it with 4 threads each iteration takes around 4 seconds.
Am I missing some key part of subclassing the threading.Thread object?
Thanks in advance
import sys
import os
import time
from threading import Thread

class LoaderThread(Thread):
    def __init__(self):
        super(LoaderThread, self).__init__()
        self.daemon = True
        self.start()

    def run(self):
        while True:
            tic = time.time()
            x = 0
            for i in range(int(1e7)):
                x += 1
            print 'took %f sec' % (time.time() - tic)

class Test(object):
    def __init__(self, n_threads):
        self.n_threads = n_threads
        # kick off threads
        self.threads = []
        for i in range(self.n_threads):
            self.threads.append(LoaderThread())

if __name__ == '__main__':
    print 'With %d thread(s)' % int(sys.argv[1])
    test = Test(int(sys.argv[1]))
    time.sleep(10)
In CPython, only one thread can be executing Python bytecode at any moment, because of the GIL.
The GIL only matters for CPU-bound work. IO-bound threads still benefit from threading, since the GIL is released while waiting on IO. Because your program is "busy" looping in Python code, you don't see any performance benefit from threading here.
Note that this is a CPython (implementation) detail, not strictly speaking part of the Python language itself. For example, Jython and IronPython have no GIL and can have truly concurrent threads.
Look at the multiprocessing module rather than threading if you want better concurrency in CPython.
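For instance, here is a sketch of the question's busy loop reworked with multiprocessing.Pool (a hypothetical rewrite, not the original class): four worker processes count in parallel, each with its own interpreter and its own GIL:

import time
from multiprocessing import Pool

def busy(n):
    # the same CPU-bound loop as run(), one pass per call
    x = 0
    for i in range(n):
        x += 1
    return x

if __name__ == '__main__':
    tic = time.time()
    with Pool(4) as pool:
        pool.map(busy, [int(1e7)] * 4)
    print('took %f sec' % (time.time() - tic))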
That's because CPython doesn't actually do simultaneous threading; it only allows one thread to run Python code at a time:
Thread 1 runs, no other thread runs...
Thread 2 runs, no other thread runs.
This behavior is because of the Global Interpreter Lock. However, during IO the GIL is released, allowing IO-bound threads to run concurrently.
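A quick way to see the IO side of this (a toy sketch where sleep stands in for real IO such as a network read): four threads each wait one second, yet the whole run takes about one second, because a sleeping thread releases the GIL:

import time
import threading

def fake_io():
    time.sleep(1)  # the GIL is released while the thread sleeps

tic = time.time()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('took %.2f sec' % (time.time() - tic))  # ~1.0, not ~4.0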
