Python Thread Synchronization

I want to ask about the difference between the two pieces of code below. In both I call acquire() on a lock, but why does the first code run Thread-2 only after Thread-1 has finished, while in the second code the threads run in a random order, not waiting for the previous one to finish?
First code:
import threading
import time

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
    def run(self):
        print "Starting " + self.name
        threadLock.acquire()
        print_time(self.name, self.counter, 2)
        threadLock.release()

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print "%s: %s" % (threadName, time.ctime(time.time()))
        counter -= 1

threadLock = threading.Lock()

# Create new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start new Threads
thread1.start()
thread2.start()
The result is
Starting Thread-1
Starting Thread-2
Thread-1: Thu Oct 15 08:06:09 2020
Thread-1: Thu Oct 15 08:06:10 2020
Thread-2: Thu Oct 15 08:06:13 2020
Thread-2: Thu Oct 15 08:06:17 2020
Second code:
import threading
import time
import Queue

exitFlag = 0

class myThread (threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print "Starting " + self.name
        process_data(self.name, self.q)
        print "Exiting " + self.name

def process_data(threadName, q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            data = q.get()
            queueLock.release()
            print "%s processing %s" % (threadName, data)
        else:
            queueLock.release()
        time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = Queue.Queue(10)
threads = []
threadID = 1

# Create new threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# Wait for queue to empty
while not workQueue.empty():
    pass

# Notify threads it's time to exit
exitFlag = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print "Exiting Main Thread"
The result is
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing One
Thread-3 processing Two
Thread-2 processing Three
Thread-1 processing Four
Thread-3 processing Five
Exiting Thread-3
Exiting Thread-2
Exiting Thread-1
Exiting Main Thread

In the first code both threads lock, do their work, then unlock. The first thread is started slightly earlier than the second, so it is the one that locks first. The second one can proceed past its lock only after the first unlocks, which is why the order is always the same.
In the second code, when a thread finds that the workQueue is empty, it releases the lock, sleeps, and tries again. This gives other threads an opportunity to lock and check whether there is anything in the queue.
By the time the queue is filled up, most probably all of the threads are asleep, and there is some uncertainty about the order in which they wake up. This causes the "randomness" in the order they process queue elements.
It is not clear what you mean by "not waiting for the previous one to finish", because they do wait for each other with respect to getting elements from the queue.
Also, it must be noted that your programs use an interesting mix of techniques for thread coordination: locks, a synchronised queue, sleep, busy waiting, and a global variable. This is not against the law, but more varied than it needs to be.
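The serializing effect of a single shared lock can be seen in a minimal Python 3 sketch (toy thread names and step counts, not taken from the code above): whichever thread acquires first completes its whole critical section before the other can start.

```python
import threading

lock = threading.Lock()
order = []  # records which thread appended each entry

def worker(name, steps):
    with lock:  # both threads contend for the same Lock object
        for _ in range(steps):
            order.append(name)

t1 = threading.Thread(target=worker, args=("A", 3))
t2 = threading.Thread(target=worker, args=("B", 3))
t1.start()
t2.start()
t1.join()
t2.join()

# The entries are never interleaved: one thread's critical section
# runs to completion before the other enters, e.g. A A A B B B.
print(order)
```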

Related

Multithreading freezes when using `thread.join()`

I am trying to set up 3 threads to execute 5 tasks in a queue. The idea is that the threads will first run the first 3 tasks at the same time, then 2 of the threads finish the remaining 2. But the program seems to freeze, and I can't detect anything wrong with it.
from multiprocessing import Manager
import threading
import time

global exitFlag
exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print("Starting " + self.name)
        process_data(self.name, self.q)
        print("Exiting " + self.name)

def process_data(threadName, q):
    global exitFlag
    while not exitFlag:
        if not workQueue.empty():
            data = q.get()
            print("%s processing %s" % (threadName, data))
        else:
            pass
        time.sleep(1)
    print('Nothing to Process')

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = Manager().Queue(10)
threads = []
threadID = 1

# create thread
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# fill up queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# wait queue clear
while not workQueue.empty():
    pass

# notify thread exit
exitFlag = 1

# wait for all threads to finish
for t in threads:
    t.join()
print("Exiting Main Thread")
I don't know what happened exactly, but after I remove the join() part, the program runs just fine. What I don't understand is that exitFlag is supposed to send out the signal once the queue is emptied, so it seems the signal was somehow not detected by process_data().
There are multiple issues with your code. First off, threads in CPython don't run Python code "at the same time" because of the global interpreter lock (GIL). A thread must hold the GIL to execute Python bytecode. By default a thread holds the GIL for up to 5 ms (Python 3.2+), unless it drops it earlier because it does blocking I/O. For parallel execution of Python code you would have to use multiprocessing.
You also needlessly use a Manager.Queue instead of a queue.Queue. A Manager.Queue is a queue.Queue living on a separate manager process, so you introduce a detour with IPC and memory copying for no benefit here.
The cause of your deadlock is that you have a race condition here:
if not workQueue.empty():
    data = q.get()
This is not an atomic operation. A thread can check workQueue.empty(), then drop the GIL, letting another thread drain the queue, and then proceed with data = q.get(), which will block forever if nothing is put on the queue again. Queue.empty() checks are a general anti-pattern and there is no need to use them. Use poison pills (sentinel values) instead to break a get-loop and to let the workers know they should exit. You need as many sentinel values as you have workers. Find more about iter(callable, sentinel) here.
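As a minimal sketch of the pattern (toy data, not the code from the question): iter(q.get, SENTINEL) keeps calling q.get() and stops as soon as the sentinel comes back, so the consumer blocks on get() instead of polling empty().

```python
from queue import Queue

SENTINEL = object()  # unique marker that can never equal real data
q = Queue()
for item in ["a", "b", SENTINEL]:
    q.put(item)

# iter(callable, sentinel) calls q.get() until it returns SENTINEL,
# then raises StopIteration, ending the loop cleanly.
seen = list(iter(q.get, SENTINEL))
print(seen)  # ['a', 'b']
```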
import time
from queue import Queue
from datetime import datetime
from threading import Thread, current_thread

SENTINEL = 'SENTINEL'

class myThread(Thread):
    def __init__(self, func, inqueue):
        super().__init__()
        self.func = func
        self._inqueue = inqueue
    def run(self):
        print(f"{datetime.now()} {current_thread().name} starting")
        self.func(self._inqueue)
        print(f"{datetime.now()} {current_thread().name} exiting")

def process_data(_inqueue):
    for data in iter(_inqueue.get, SENTINEL):
        print(f"{datetime.now()} {current_thread().name} "
              f"processing {data}")
        time.sleep(1)

if __name__ == '__main__':
    N_WORKERS = 3
    inqueue = Queue()
    input_data = ["One", "Two", "Three", "Four", "Five"]
    sentinels = [SENTINEL] * N_WORKERS  # one sentinel value per worker
    # enqueue input and sentinels
    for word in input_data + sentinels:
        inqueue.put(word)
    threads = [myThread(process_data, inqueue) for _ in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{datetime.now()} {current_thread().name} exiting")
Example Output:
2019-02-14 17:58:18.265208 Thread-1 starting
2019-02-14 17:58:18.265277 Thread-1 processing One
2019-02-14 17:58:18.265472 Thread-2 starting
2019-02-14 17:58:18.265542 Thread-2 processing Two
2019-02-14 17:58:18.265691 Thread-3 starting
2019-02-14 17:58:18.265793 Thread-3 processing Three
2019-02-14 17:58:19.266417 Thread-1 processing Four
2019-02-14 17:58:19.266632 Thread-2 processing Five
2019-02-14 17:58:19.266767 Thread-3 exiting
2019-02-14 17:58:20.267588 Thread-1 exiting
2019-02-14 17:58:20.267861 Thread-2 exiting
2019-02-14 17:58:20.267994 MainThread exiting
Process finished with exit code 0
If you don't insist on subclassing Thread, you could also just use multiprocessing.pool.ThreadPool a.k.a. multiprocessing.dummy.Pool which does the plumbing for you in the background.
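For instance, a rough ThreadPool equivalent of the example above (the lowercasing "work" and the 0.1 s delay are made up for illustration):

```python
import time
from multiprocessing.pool import ThreadPool  # thread-based, despite the module name

def process_data(data):
    time.sleep(0.1)  # simulate some work
    return data.lower()

input_data = ["One", "Two", "Three", "Four", "Five"]

with ThreadPool(3) as pool:  # three worker threads, like N_WORKERS above
    results = pool.map(process_data, input_data)  # blocks until all items are done

print(results)  # ['one', 'two', 'three', 'four', 'five'] -- map preserves input order
```

The pool handles worker startup, work distribution, and shutdown itself, so no sentinels, locks, or flags are needed.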

Multi-threading in Python

I am facing some issues while implementing multi-threading in python. The issue is very specific to my use case. Having gone through numerous posts on the same, I deployed the most widely suggested/used method for doing so.
I start by defining my thread class as follows.
class myThread(Thread):
    def __init__(self, graphobj, q):
        Thread.__init__(self)
        self.graphobj = graphobj
        self.q = q
    def run(self):
        improcess(self.graphobj, self.q)
After that, I define the function that does all the required processing.
def improcess(graphobj, q):
    while not exitFlag:
        queueLock.acquire()
        if not q.empty():
            photo_id = q.get()
            queueLock.release()
            # Complete processing
        else:
            queueLock.release()
Now comes the part where I am stuck. I am able to run the code below exactly as it is without any issues. However, if I try to wrap the same code in a function, it breaks down.
def train_control(graphobj, photo_ids):
    workQueue = Queue(len(photo_ids))
    for i in range(1,5):
        thread = myThread(graphobj=graphobj, q=workQueue)
        thread.start()
        threads.append(thread)
    queueLock.acquire()
    for photo_id in photo_ids:
        workQueue.put(photo_id)
    queueLock.release()
    while not workQueue.empty():
        pass
    exitFlag = 1
    for t in threads:
        t.join()
By breaking down I mean that the threads complete their work but never stop waiting, i.e. exitFlag is never set to 1. I am unsure how to make this work.
Unfortunately the design of our systems is such that this piece of codes needs to be wrapped in a function which can be invoked by another module, so pulling it out is not really an option.
Looking forward to hearing from experts on this. Thanks in advance.
Edit : Forgot to mention this in the first draft. I globally initialize exitFlag and set its value to 0.
Below is the minimum, verifiable code snippet that I created to capture this problem:
import threading
import Queue

globvar01 = 5
globvar02 = 7
exitFlag = 0
globlist = []
threads = []
queueLock = threading.Lock()
workQueue = Queue.Queue(16)

class myThread(threading.Thread):
    def __init__(self, threadID, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.q = q
    def run(self):
        print "Starting thread " + str(self.threadID)
        myfunc(self.threadID, self.q)
        print "Exiting thread " + str(self.threadID)

def myfunc(threadID, q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            thoughtnum = q.get()
            queueLock.release()
            print "Processing thread " + str(threadID)
            if (thoughtnum < globvar01):
                globlist.append([1,2,3])
            elif (thoughtnum < globvar02):
                globlist.append([2,3,4])
        else:
            queueLock.release()

def controlfunc():
    for i in range(1,5):
        thread = myThread(i, workQueue)
        thread.start()
        threads.append(thread)
    queueLock.acquire()
    for i in range(1,11):
        workQueue.put(i)
    queueLock.release()
    # Wait for queue to empty
    while not workQueue.empty():
        pass
    exitFlag = 1
    # Wait for all threads to complete
    for t in threads:
        t.join()

print "Starting main thread"
controlfunc()
print "Exiting Main Thread"
From your MCVE, the only thing missing is:
while not workQueue.empty():
    pass
global exitFlag  # Need this or `exitFlag` is a local variable only.
exitFlag = 1
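The reason the global statement matters is Python's scoping rule: an assignment inside a function makes the name local to that function unless it is declared global. A small sketch (hypothetical names, not the question's code) of the difference:

```python
flag = 0  # module-level name, like exitFlag in the question

def set_without_global():
    flag = 1  # creates a new *local* variable; the module-level flag is untouched

def set_with_global():
    global flag  # rebind the module-level name instead
    flag = 1

set_without_global()
after_local = flag   # still 0: worker threads reading the global never see a change
set_with_global()
after_global = flag  # now 1
print(after_local, after_global)  # 0 1
```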
You could eliminate the queueLock and the exitFlag, however, by using a sentinel value in the Queue to shut down the worker threads, and it eliminates the spin-waiting. Worker threads will sleep on a q.get() and the main thread won't have to spin-wait for an empty queue:
#!python2
from __future__ import print_function
import threading
import Queue

debug = 1
console = threading.Lock()

def tprint(*args, **kwargs):
    if debug:
        name = threading.current_thread().getName()
        with console:
            print('{}: '.format(name), end='')
            print(*args, **kwargs)

globvar01 = 5
globvar02 = 7
globlist = []
threads = []
workQueue = Queue.Queue(16)

class myThread(threading.Thread):
    def __init__(self, threadID, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.q = q
    def run(self):
        tprint("Starting thread " + str(self.threadID))
        myfunc(self.threadID, self.q)
        tprint("Exiting thread " + str(self.threadID))

def myfunc(threadID, q):
    while True:
        thoughtnum = q.get()
        tprint("Processing thread " + str(threadID))
        if thoughtnum is None:
            break
        elif thoughtnum < globvar01:
            globlist.append([1,2,3])
        elif thoughtnum < globvar02:
            globlist.append([2,3,4])

def controlfunc():
    for i in range(1,5):
        thread = myThread(i, workQueue)
        thread.start()
        threads.append(thread)
    for i in range(1,11):
        workQueue.put(i)
    # Wait for all threads to complete
    for t in threads:
        workQueue.put(None)
    for t in threads:
        t.join()

tprint("Starting main thread")
controlfunc()
tprint("Exiting Main Thread")
Output:
MainThread: Starting main thread
Thread-1: Starting thread 1
Thread-2: Starting thread 2
Thread-3: Starting thread 3
Thread-4: Starting thread 4
Thread-1: Processing thread 1
Thread-2: Processing thread 2
Thread-3: Processing thread 3
Thread-4: Processing thread 4
Thread-1: Processing thread 1
Thread-2: Processing thread 2
Thread-3: Processing thread 3
Thread-4: Processing thread 4
Thread-1: Processing thread 1
Thread-2: Processing thread 2
Thread-3: Processing thread 3
Thread-4: Processing thread 4
Thread-1: Processing thread 1
Thread-2: Processing thread 2
Thread-3: Exiting thread 3
Thread-4: Exiting thread 4
Thread-1: Exiting thread 1
Thread-2: Exiting thread 2
MainThread: Exiting Main Thread
You need to make sure exitFlag is set to 0 (False) before spawning any threads, otherwise in improcess() they won't do anything and the queue will remain non-empty.
This problem can happen if exitFlag is a global that wasn't cleared from a previous run.

Is the python thread method different in IDLE and pycharm?

In fact, I run and debug the code below in both IDLE (Python 3.5.2 shell) and PyCharm Community Edition 2017.2.
When I run the code many times, I find some results that confuse me. The code run in PyCharm generates this result:
Thread-3 processing One
Thread-1 processing Two
Thread-3 processing Three
Thread-2 processing Four
Thread-3 processing Five
Thread-1 processing Six
Thread-2 processing Seven
Thread-1 processing Eight
The code run in IDLE generates this result:
Thread-1 processing One
Thread-2 processing Two
Thread-3 processing Three
Thread-1 processing Four
Thread-2 processing Five
Thread-3 processing Six
Thread-1 processing Seven
Thread-2 processing Eight
As you can see: "1 3 2 3 1 2 1" versus "2 3 1 2 3 1 2". I ran the code many times and kept seeing this. So I just want to know: why does the thread behavior differ between IDEs? And could you point me to some good resources for learning threading in Python 3?
import queue
import threading
import time

exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print("Open Thread:" + self.name)
        process_data(self.name, self.q)
        print("Exit Thread:" + self.name)

def process_data(threadName, q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            data = q.get()
            print("%s processing %s" % (threadName, data))
            queueLock.release()
        else:
            queueLock.release()
        time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight"]
queueLock = threading.Lock()
workQueue = queue.Queue(10)
threads = []
threadID = 1

for tname in threadList:
    thread = myThread(threadID, tname, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

queueLock.acquire()
for word in nameList:
    #print(workQueue.empty())
    workQueue.put(word)
    #time.sleep(1)
queueLock.release()

while not workQueue.empty():
    pass
exitFlag = 1

for t in threads:
    t.join()
print("Exit Main Thread")
Threads don't guarantee any execution order; that's why you are getting different results on different executions.
So the thread behavior does not depend on the IDE.
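A small Python 3 sketch (toy worker code, not the question's) shows the same effect: every item is consumed exactly once, but which thread gets which item is up to the OS scheduler, so the pairing varies from run to run regardless of the IDE.

```python
import queue
import threading

q = queue.Queue()
for word in ["One", "Two", "Three", "Four", "Five"]:
    q.put(word)

processed = []
record_lock = threading.Lock()

def worker():
    while True:
        try:
            item = q.get_nowait()  # non-blocking; raises Empty once drained
        except queue.Empty:
            return
        with record_lock:
            processed.append((threading.current_thread().name, item))

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All five items appear exactly once; the thread/item pairing is nondeterministic.
print(processed)
```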

Python threading with multiple target functions

I'm currently trying to get my program to process several jobs from several classes and I can't get it to actually perform the process() function. This is based on one of the topics found online.
import queue
import threading
import time

class myThread (threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print ("Starting " + self.name)
        myThread.process_data(self.name, self.q)
        print ("Exiting " + self.name)
    def process_data(threadName, q):
        while not exitFlag:
            queueLock.acquire()
            if not workQueue.empty():
                data = q.get()
                queueLock.release()
                print ("{0} processing {1}".format(threadName, data))
            else:
                queueLock.release()
            time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = [myclass1.process, myclass2.process, myclass3.process, myclass5.process, myclass6.process]
queueLock = threading.Lock()
workQueue = queue.Queue()
threads = []
threadID = 1
exitFlag = 0

# Create new threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# Wait for queue to empty
while not workQueue.empty():
    pass

# Notify threads it's time to exit
exitFlag = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print ("Exiting Main Thread")
This returns as expected:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing <bound method myclass1.process of <myclass1.myclass1 object at 0x7fc163542e10>>
Thread-2 processing <bound method myclass2.process of <myclass2.myclass2 object at 0x7fc163542dd8>>
Thread-3 processing <bound method myclass3.process of <myclass3.myclass3 object at 0x7fc163542a20>>
Thread-1 processing <bound method myclass4.process of <myclass4.myclass4 object at 0x7fc16354e080>>
Thread-3 processing <bound method myclass5.process of <myclass5.myclass5 object at 0x7fc16354e438>>
Exiting Thread-1
Exiting Thread-3
Exiting Thread-2
Exiting Main Thread
I currently use Thread(target=myclassX.process) but now it's different...
How could I modify my program to actually make it perform the process() function for each class instead of just adding it as a bound method in the queue?
Tom Dalton gave me the answer in comments.
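The answer isn't spelled out above, but from the output it presumably comes down to calling the bound method instead of just printing it: q.get() hands back the method object, and adding parentheses, data(), actually runs process(). A minimal sketch with a hypothetical stand-in class:

```python
import queue
import threading

class MyClass:  # hypothetical stand-in for myclass1, myclass2, ...
    def __init__(self, name):
        self.name = name
    def process(self):
        return "processed " + self.name

work_queue = queue.Queue()
for obj in (MyClass("one"), MyClass("two")):
    work_queue.put(obj.process)  # enqueue the bound method itself

results = []

def worker():
    while True:
        try:
            func = work_queue.get_nowait()
        except queue.Empty:
            return
        results.append(func())  # the parentheses are what execute process()

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # ['processed one', 'processed two']
```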

Does Lock object in python 2 have time out?

I'm learning multithreading in Python and wrote some code to practice it:
import threading
import time

Total = 0

class myThead(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.num = num
        self.lock = threading.Lock()
    def run(self):
        global Total
        self.lock.acquire()
        print "%s acquired" % threading.currentThread().getName()
        for i in range(self.num):
            Total += 1
        print Total
        print "%s released" % threading.currentThread().getName()
        self.lock.release()

t1 = myThead(100)
t2 = myThead(100)
t1.start()
t2.start()
If I pass 100 to threads t1 and t2, they run correctly:
Thread-1 acquired
100
Thread-1 released
Thread-2 acquired
200
Thread-2 released
But when I try a bigger number, for example 10000, it prints unexpected output:
Thread-1 acquired
Thread-2 acquired
14854
Thread-1 released
15009
Thread-2 released
I tried many times but nothing changed, so I think the Lock object in Python has a timeout: if a lock is held for a long time, other threads are allowed to proceed. Can anyone explain this to me? Thank you!
No, locks do not have a timeout. What is happening is that the threads are not actually sharing the same lock, since a new one is created every time you instantiate the object in __init__. If all instances of the class will always share the same lock, you could make it a class attribute. However, explicit is better than implicit, so I would personally pass the lock as an argument to __init__. Something like this:
import threading
import time

Total = 0

class myThead(threading.Thread):
    def __init__(self, num, lock):
        threading.Thread.__init__(self)
        self.num = num
        self.lock = lock
    def run(self):
        global Total
        self.lock.acquire()
        print "%s acquired" % threading.currentThread().getName()
        for i in range(self.num):
            Total += 1
        print Total
        print "%s released" % threading.currentThread().getName()
        self.lock.release()

threadLock = threading.Lock()
t1 = myThead(100, threadLock)
t2 = myThead(100, threadLock)
t1.start()
t2.start()
That way both instances of the class share the same lock.
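A quick Python 3 sanity check (toy numbers, plain functions instead of a Thread subclass) of the shared-lock version: because both threads use the same Lock, one thread's whole loop finishes before the other starts, and the final total is always exact.

```python
import threading

total = 0
lock = threading.Lock()  # a single Lock object shared by both threads

def add(n):
    global total
    with lock:  # serializes the entire loop, like run() above
        for _ in range(n):
            total += 1

t1 = threading.Thread(target=add, args=(10000,))
t2 = threading.Thread(target=add, args=(10000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(total)  # 20000
```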
Each thread gets its own lock, so acquiring t1's lock doesn't stop t2 from acquiring its own lock.
Perhaps you could make lock a class attribute, so all instances of myThread share one.
class myThead(threading.Thread):
    lock = threading.Lock()
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.num = num
Result:
Thread-1 acquired
10000
Thread-1 released
Thread-2 acquired
20000
Thread-2 released
