Correct Approach to Threading in Python

I have a script that takes a text file as input and performs the testing. What I want to do is create two threads, divide the input text file into two parts, and run them so as to minimize the execution time. Is there a way I can do this?
Thanks
import threading
import subprocess
import time

THREAD_COUNT = 2

class myThread (threading.Thread):
    def __init__(self, ip_list):
        threading.Thread.__init__(self)
        self.input_list = ip_list

    def run(self):
        # Get lock to synchronize threads
        threadLock.acquire()
        print "python Audit.py " + (",".join(x for x in self.input_list))
        p = subprocess.Popen("python Audit.py " + (",".join(x for x in self.input_list)), shell=True)
        # Free lock to release next thread
        threadLock.release()
        while p.poll() is None:
            print('Test Execution in Progress ....')
            time.sleep(60)
        print('Not sleeping any longer. Exited with returncode %d' % p.returncode)

def split_list(input_list, split_count):
    # Yield successive chunks of size split_count
    for i in range(0, len(input_list), split_count):
        yield input_list[i:i + split_count]

if __name__ == '__main__':
    threadLock = threading.Lock()
    threads = []
    input_list = []
    with open("inputList.txt", "r") as Ptr:
        for i in Ptr:
            try:
                id = str(i).rstrip('\n').rstrip('\r')
                input_list.append(id)
            except Exception as err:
                print err
                print "Exception occured..."
    try:
        test = split_list(input_list, len(input_list) / THREAD_COUNT)
        list_of_lists = list(test)
    except Exception as err:
        print err
        print "Exception caught in splitting list"
    try:
        # Create Threads & Start
        for i in range(0, len(list_of_lists)):
            threads.append(myThread(list_of_lists[i]))
            threads[i].start()
            time.sleep(1)
        # Wait for all threads to complete
        for thread in threads:
            thread.join()
        print "Exiting Main Thread..!"
    except Exception as err:
        print err
        print "Exception caught during THREADING..."

You are trying to do two things at the same time, which is the definition of parallelism. The problem is that if you are using CPython, you won't get true parallelism from threads because of the GIL (Global Interpreter Lock). The GIL ensures that only one thread executes Python bytecode at a time, because the CPython interpreter is not thread-safe.
If you really want to run two operations in parallel, use the multiprocessing module (import multiprocessing).
Read this: Multiprocessing vs Threading Python
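A quick way to see the GIL's effect is to time a CPU-bound function run sequentially, with two threads, and then with two processes. This is only an illustrative sketch (burn_cpu and the iteration count are made up for the demo; exact timings vary by machine), but on CPython the two-thread run takes about as long as the sequential one, while the two-process run takes roughly half:
import threading
import multiprocessing
import time

def burn_cpu(n=10 ** 7):
    # Pure-Python busy loop; holds the GIL the whole time it runs.
    while n > 0:
        n -= 1

if __name__ == '__main__':
    start = time.time()
    burn_cpu()
    burn_cpu()
    print('sequential:  %.2fs' % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=burn_cpu) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('2 threads:   %.2fs' % (time.time() - start))  # about the same as sequential

    start = time.time()
    procs = [multiprocessing.Process(target=burn_cpu) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print('2 processes: %.2fs' % (time.time() - start))  # roughly half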

Some notes, in random order:
In Python, multithreading is not a good solution for computationally intensive tasks. A better approach is multiprocessing:
Python: what are the differences between the threading and multiprocessing modules?
For resources that are not shared (in your case, each line will be used exclusively by a single process), you do not need locks. A better approach is the map function of a process pool:
import multiprocessing
import subprocess

def processing_function(chunk):
    # Each worker handles its own chunk of lines; no locking needed.
    for line in chunk:
        subprocess.call(["python", "Audit.py", line])

with open('file.txt', 'r') as f:
    lines = f.readlines()

to_process = [lines[:len(lines)//2], lines[len(lines)//2:]]
p = multiprocessing.Pool(2)
results = p.map(processing_function, to_process)
If the computation requires a variable amount of time depending on the line, using Queues to move data between processes instead of mapping could help to balance the load
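Here is a hedged sketch of that queue-based alternative (worker, SENTINEL, and NUM_WORKERS are illustrative names, not from the question; the per-line invocation of Audit.py follows the sketch above): each worker pulls one line at a time from a shared multiprocessing.Queue, so a slow line only delays the worker that happened to take it.
import multiprocessing
import subprocess

NUM_WORKERS = 2
SENTINEL = None  # tells a worker there is no more work

def worker(q):
    while True:
        line = q.get()
        if line is SENTINEL:
            break
        subprocess.call(["python", "Audit.py", line])

if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(q,))
             for _ in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    with open('file.txt', 'r') as f:
        for line in f:
            q.put(line.strip())
    for _ in procs:
        q.put(SENTINEL)  # one sentinel per worker so each one stops
    for p in procs:
        p.join()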

Related

Why does my multiprocess queue not appear to be thread safe?

I am building a watchdog timer that runs another Python program, and if it fails to find a check-in from any of the threads, shuts down the whole program. This is so it will, eventually, be able to take control of needed communication ports. The code for the timer is as follows:
from multiprocessing import Process, Queue
from time import sleep
from copy import deepcopy

PATH_TO_FILE = r'.\test_program.py'
WATCHDOG_TIMEOUT = 2

class Watchdog:
    def __init__(self, filepath, timeout):
        self.filepath = filepath
        self.timeout = timeout
        self.threadIdQ = Queue()
        self.knownThreads = {}

    def start(self):
        threadIdQ = self.threadIdQ
        process = Process(target=self._executeFile)
        process.start()
        try:
            while True:
                unaccountedThreads = deepcopy(self.knownThreads)
                # Empty queue since last wake. Add new thread IDs to knownThreads,
                # and account for all known thread IDs in queue
                while not threadIdQ.empty():
                    threadId = threadIdQ.get()
                    if threadId in self.knownThreads:
                        unaccountedThreads.pop(threadId, None)
                    else:
                        print('New threadId < {} > discovered'.format(threadId))
                        self.knownThreads[threadId] = False
                # If there is a known thread that is unaccounted for, then it has
                # either hung or crashed. Shut everything down.
                if len(unaccountedThreads) > 0:
                    print('The following threads are unaccounted for:\n')
                    for threadId in unaccountedThreads:
                        print(threadId)
                    print('\nShutting down!!!')
                    break
                else:
                    print('No unaccounted threads...')
                sleep(self.timeout)
        # Account for any exceptions thrown in the watchdog timer itself
        except:
            process.terminate()
            raise
        process.terminate()

    def _executeFile(self):
        with open(self.filepath, 'r') as f:
            exec(f.read(), {'wdQueue': self.threadIdQ})

if __name__ == '__main__':
    wd = Watchdog(PATH_TO_FILE, WATCHDOG_TIMEOUT)
    wd.start()
I also have a small program to test the watchdog functionality
from time import sleep
from threading import Thread
from queue import SimpleQueue

Q_TO_Q_DELAY = 0.013

class QToQ:
    def __init__(self, processQueue, threadQueue):
        self.processQueue = processQueue
        self.threadQueue = threadQueue
        Thread(name='queueToQueue', target=self._run).start()

    def _run(self):
        pQ = self.processQueue
        tQ = self.threadQueue
        while True:
            while not tQ.empty():
                sleep(Q_TO_Q_DELAY)
                pQ.put(tQ.get())

def fastThread(q):
    while True:
        print('Fast thread, checking in!')
        q.put('fastID')
        sleep(0.5)

def slowThread(q):
    while True:
        print('Slow thread, checking in...')
        q.put('slowID')
        sleep(1.5)

def hangThread(q):
    print('Hanging thread, checked in')
    q.put('hangID')
    while True:
        pass

print('Hello! I am a program that spawns threads!\n\n')
threadQ = SimpleQueue()
Thread(name='fastThread', target=fastThread, args=(threadQ,)).start()
Thread(name='slowThread', target=slowThread, args=(threadQ,)).start()
Thread(name='hangThread', target=hangThread, args=(threadQ,)).start()
QToQ(wdQueue, threadQ)
As you can see, I need to have the threads put into a queue.Queue, while a separate object slowly feeds the output of the queue.Queue into the multiprocessing queue. If instead I have the threads put directly into the multiprocessing queue, or do not have the QToQ object sleep in between puts, the multiprocessing queue will lock up, and will appear to always be empty on the watchdog side.
Now, as the multiprocessing queue is supposed to be thread and process safe, I can only assume I have messed something up in the implementation. My solution seems to work, but also feels hacky enough that I feel I should fix it.
I am using Python 3.7.2, if it matters.
I suspect that test_program.py exits.
I changed the last few lines to this:
tq = threadQ
# tq = wdQueue # option to send messages direct to WD
t1 = Thread(name='fastThread', target=fastThread, args=(tq,))
t2 = Thread(name='slowThread', target=slowThread, args=(tq,))
t3 = Thread(name='hangThread', target=hangThread, args=(tq,))
t1.start()
t2.start()
t3.start()
QToQ(wdQueue, threadQ)
print('Joining with threads...')
t1.join()
t2.join()
t3.join()
print('test_program exit')
The calls to join() mean that the test program never exits by itself, since none of the threads ever exit.
So, as is, t3 hangs; the watchdog program detects the unaccounted-for thread and stops the test program.
If t3 is removed from the above program, then the other two threads are well behaved and the watchdog program allows the test program to continue indefinitely.

Multiple stdout w/ flush going on in Python threading

I have a small piece of code that I made to test out, and hopefully debug, the problem without having to modify the code in my main applet in Python. This has led me to build this code:
#!/usr/bin/env python
import sys, threading, time

def loop1():
    count = 0
    while True:
        sys.stdout.write('\r thread 1: ' + str(count))
        sys.stdout.flush()
        count = count + 1
        time.sleep(.3)

def loop2():
    count = 0
    print ""
    while True:
        sys.stdout.write('\r thread 2: ' + str(count))
        sys.stdout.flush()
        count = count + 2
        time.sleep(.3)

if __name__ == '__main__':
    try:
        th = threading.Thread(target=loop1)
        th.start()
        th1 = threading.Thread(target=loop2)
        th1.start()
    except KeyboardInterrupt:
        print ""
My goal with this code is to have both of these threads displaying output to stdout (with flushing) at the same time, side by side or something similar. The problem, I assume, is that since each thread flushes, it overwrites the other thread's string by default. I don't quite know how to get this to work, if it is even possible.
If you just run one of the threads, it works fine. However, I want to be able to run both threads, each with its own string, at the same time in the terminal output. Here is a picture displaying what I'm getting:
terminal screenshot
Let me know if you need more info. Thanks in advance.
Instead of allowing each thread to output to stdout, a better solution is to have one thread control stdout exclusively. Then provide a threadsafe channel for the other threads to dispatch data to be output.
One good method to achieve this is to share a Queue between all threads. Ensure that only the output thread is accessing data after it has been added to the queue.
The output thread can store the last message from each other thread and use that data to format stdout nicely. This can include clearing the output to display something like the following, updating it as each thread generates new data.
Threads
#1: 0
#2: 0
Example
Some decisions were made to simplify this example:
There are gotchas to be wary of when giving arguments to threads.
Daemon threads terminate themselves when the main thread exits. They are used to avoid adding complexity to this answer. Using them in long-running or large applications can pose problems. Other questions discuss how to exit a multithreaded application without leaking memory or locking system resources. You will need to think about how your program needs to signal an exit. Consider using asyncio to save yourself these considerations.
No newlines are used because \r carriage returns cannot clear the whole console; they only allow the current line to be rewritten.
import queue, threading
import time, sys

q = queue.Queue()
keepRunning = True

def loop_output():
    thread_outputs = dict()
    while keepRunning:
        try:
            thread_id, data = q.get_nowait()
            thread_outputs[thread_id] = data
        except queue.Empty:
            # because the queue is used to update, there's no need to wait or block
            pass
        pretty_output = ""
        for thread_id, data in thread_outputs.items():
            pretty_output += '({}:{}) '.format(thread_id, str(data))
        sys.stdout.write('\r' + pretty_output)
        sys.stdout.flush()
        time.sleep(1)

def loop_count(thread_id, increment):
    count = 0
    while keepRunning:
        msg = (thread_id, count)
        try:
            q.put_nowait(msg)
        except queue.Full:
            pass
        count = count + increment
        time.sleep(.3)

if __name__ == '__main__':
    try:
        th_out = threading.Thread(target=loop_output)
        th_out.start()
        # make sure to use args, not pass arguments directly
        th0 = threading.Thread(target=loop_count, args=("Thread0", 1))
        th0.daemon = True
        th0.start()
        th1 = threading.Thread(target=loop_count, args=("Thread1", 3))
        th1.daemon = True
        th1.start()
        # Keep the main thread alive to wait for KeyboardInterrupt
        while True:
            time.sleep(.1)
    except KeyboardInterrupt:
        print("Ended by keyboard stroke")
        keepRunning = False
        for th in [th0, th1]:
            th.join()
Example Output:
(Thread0:110) (Thread1:330)

Python threads hang and don't close

This is my first try with threads in Python.
I wrote the following program as a very simple example. It just gets a list and prints it using some threads. However, whenever there is an error, the program just hangs in Ubuntu, and I can't seem to do anything to get the command prompt back, so I have to start another SSH session to get back in.
I also have no idea what the issue with my program is.
Is there some kind of error handling I can put in to ensure it doesn't hang?
Also, any idea why Ctrl+C doesn't work (I don't have a break key)?
from Queue import Queue
from threading import Thread
import HAInstances
import logging

log = logging.getLogger()
logging.basicConfig()

class GetHAInstances:
    def oraHAInstanceData(self):
        log.info('Getting HA instance routing data')
        # HAData = SolrGetHAInstances.TalkToOracle.main()
        HAData = HAInstances.main()
        log.info('Query fetched ' + str(len(HAData)) + ' HA Instances to query')
        # for row in HAData:
        #     print row
        return(HAData)

def do_stuff(q):
    while True:
        print q.get()
        print threading.current_thread().name
        q.task_done()

oraHAInstances = GetHAInstances()
mainHAData = oraHAInstances.oraHAInstanceData()
q = Queue(maxsize=0)
num_threads = 10

for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()

for row in mainHAData:
    # print str(row[0]) + ':' + str(row[1]) + ':' + str(row[2]) + ':' + str(row[3])
    q.put((row[0], row[1], row[2], row[3]))

q.join()
In your thread method, it is recommended to use try ... except ... finally. This structure guarantees that control returns to the main thread even when errors occur.
def do_stuff(q):
    while True:
        try:
            # do your work
            pass
        except:
            # log the error
            pass
        finally:
            q.task_done()
Also, in case you want to kill your program, find out the pid of your main process and use kill <pid> to kill it. In Ubuntu or Mint, use ps -Ao pid,cmd; in the output, you can find the pid (first column) by searching for the command (second column) you typed to run your Python script.
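As a small convenience (my own suggestion, not part of the original answer), the script can print its own PID at startup so you don't have to hunt for it with ps:
import os
print("main PID: %d" % os.getpid())  # then, from a shell: kill <pid>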
Your q is hanging because your worker has errored, so your q.task_done() never got called.
You also need to add import threading in order to use print threading.current_thread().name.

How to Interrupt/Stop/End a hanging multi-threaded python program

I have a python program that implements threads like this:
class Mythread(threading.Thread):
def __init__(self, name, q):
threading.Thread.__init__(self)
self.name = name
self.q = q
def run(self):
print "Starting %s..." % (self.name)
while True:
## Get data from queue
data = self.q.get()
## do_some_processing with data ###
process_data(data)
## Mark Queue item as done
self.q.task_done()
print "Exiting %s..." % (self.name)
def call_threaded_program():
##Setup the threads. Define threads,queue,locks
threads = []
q = Queue.Queue()
thread_count = n #some number
data_list = [] #some data list containing data
##Create Threads
for thread_id in range(1, thread_count+1):
thread_name = "Thread-" + str(thread_id)
thread = Mythread(thread_name,q)
thread.daemon = True
thread.start()
##Fill data in Queue
for data_item in data_list:
q.put(data_item)
try:
##Wait for queue to be exhausted and then exit main program
q.join()
except (KeyboardInterrupt, SystemExit) as e:
print "Interrupt Issued. Exiting Program with error state: %s"%(str(e))
exit(1)
The call_threaded_program() is called from a different program.
I have the code working under normal circumstances. However if an error/exception occurs in one of the threads, then the program is stuck (as the queue join is infinitely blocking). The only way I am able to quit this program is to close the terminal itself.
What is the best way to terminate this program when a thread bails out? Is there a clean (actually I would take any way) way of doing this? I know this question has been asked numerous times, but I am still unable to find a convincing answer. I would really appreciate any help.
EDIT:
I tried removing the join on the queue and used a global exit flag as suggested in Is there any way to kill a Thread in Python?
However, now the behavior is so strange that I can't comprehend what is going on.
import threading
import Queue
import time

exit_flag = False

class Mythread(threading.Thread):
    def __init__(self, name, q):
        threading.Thread.__init__(self)
        self.name = name
        self.q = q

    def run(self):
        try:
            # Start Thread
            print "Starting %s...." % (self.name)
            # Do Some Processing
            while not exit_flag:
                data = self.q.get()
                print "%s processing %s" % (self.name, str(data))
                self.q.task_done()
            # Exit thread
            print "Exiting %s..." % (self.name)
        except Exception as e:
            print "Exiting %s due to Error: %s" % (self.name, str(e))

def main():
    global exit_flag
    ## Setup the threads. Define threads, queue, locks
    threads = []
    q = Queue.Queue()
    thread_count = 20
    data_list = range(1, 50)

    ## Create Threads
    for thread_id in range(1, thread_count + 1):
        thread_name = "Thread-" + str(thread_id)
        thread = Mythread(thread_name, q)
        thread.daemon = True
        threads.append(thread)
        thread.start()

    ## Fill data in Queue
    for data_item in data_list:
        q.put(data_item)

    try:
        ## Wait for queue to be exhausted and then exit main program
        while not q.empty():
            pass
        # Stop the threads
        exit_flag = True
        # Wait for threads to finish
        print "Waiting for threads to finish..."
        while threading.activeCount() > 1:
            print "Active Threads:", threading.activeCount()
            time.sleep(1)
        print "Finished Successfully"
    except (KeyboardInterrupt, SystemExit) as e:
        print "Interrupt Issued. Exiting Program with error state: %s" % (str(e))

if __name__ == '__main__':
    main()
The program's output is as below:
#Threads get started correctly
#The output also gets processed, but then towards the end all I see is
Active Threads: 16
Active Threads: 16
Active Threads: 16...
The program then just hangs or keeps on printing the active threads. However, since the exit flag is set to True, the threads' run method should not still be executing, so I have no clue how these threads are being kept alive or what is happening.
EDIT:
I found the problem. In the above code, the threads' get method was blocking, so they were unable to quit. Using get with a timeout instead did the trick. The modified run method is below:
def run(self):
    try:
        # Start Thread
        print "Starting %s..." % (self.name)
        # Do Some processing
        while not exit_flag:
            try:
                data = self.q.get(True, self.timeout)
                print "%s processing %s" % (self.name, str(data))
                self.q.task_done()
            except:
                print "Queue Empty or Timeout Occurred. Try Again for %s" % (self.name)
        # Exit thread
        print "Exiting %s..." % (self.name)
    except Exception as e:
        print "Exiting %s due to Error: %s" % (self.name, str(e))
If you want to force all the threads to exit when the process exits, you can set the thread's daemon flag to True before the thread is started.
http://docs.python.org/2/library/threading.html#threading.Thread.daemon
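A minimal sketch of that (in Python 3 syntax): the flag must be set before start() is called, and the daemon thread is killed automatically when the main thread exits.
import threading
import time

def worker():
    # Stand-in for real work that never finishes on its own.
    while True:
        time.sleep(1)

t = threading.Thread(target=worker)
t.daemon = True  # must be set before t.start()
t.start()
print("main thread exiting; the daemon thread dies with the process")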
I did it once in C. Basically, I had a main process that started the other ones and kept track of them, i.e. it stored the PIDs and waited for the return codes. If you have an error in a process, the code will indicate so, and then you can stop every other process. Hope this helps.
Edit:
Sorry, I forgot in my answer that you were using threads. But I think it still applies: you can either wrap or modify the thread to get a return value, or you can use a thread pool library.
how to get the return value from a thread in python?
Python thread exit code
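For illustration, here is one way (using concurrent.futures, Python 3; the linked questions cover others) to get a return value or an exception back out of a thread:
from concurrent.futures import ThreadPoolExecutor

def task(x):
    # Toy function standing in for the real threaded work.
    if x < 0:
        raise ValueError("negative input")
    return x * 2

with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(task, 21)
    print(ok.result())         # 42
    failed = pool.submit(task, -1)
    print(failed.exception())  # the ValueError raised inside the thread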

How can I catch SIGINT in threading python program?

When using the threading module and the Thread class, SIGINT (Ctrl+C in the console) cannot be caught.
Why, and what can I do?
Simple test program:
#!/usr/bin/env python
import threading

def test(suffix):
    while True:
        print "test", suffix

def main():
    for i in (1, 2, 3, 4, 5):
        threading.Thread(target=test, args=(i,)).start()

if __name__ == "__main__":
    main()
When I hit Ctrl + C, nothing happens.
Threads and signals don't mix. In Python this is even more the case than elsewhere: signals only ever get delivered to one thread (the main thread); other threads won't get the message. There's nothing you can do to interrupt threads other than the main thread; they're out of your control.
The only thing you can do here is introduce a communication channel between the main thread and whatever threads you start, using the queue module. You can then send a message to the thread and have it terminate (or do whatever else you want) when it sees the message.
Alternatively, and it's often a very good alternative, is to not use threads. What to use instead depends greatly on what you're trying to achieve, however.
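A minimal sketch of that signalling idea, using threading.Event as the shared "please stop" flag (the queue module works the same way when you also need to pass data along with the signal):
import threading
import time

stop = threading.Event()

def worker(n):
    while not stop.is_set():
        print("thread %d working" % n)
        time.sleep(0.5)
    print("thread %d exiting" % n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
try:
    while True:
        time.sleep(0.2)  # keep the main thread alive so it can receive SIGINT
except KeyboardInterrupt:
    stop.set()  # tell the workers to wrap up
    for t in threads:
        t.join()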
Basically, you can check whether the parent issued a signal by reading a queue during the work. If the parent receives a SIGINT, it sends a message across the queue (in this case, anything at all), and the children wrap up their work and exit...
import queue
import threading
import time

def fun(arg1, thread_no, queue):
    while True:
        # WORK... (`errors` and `lines` come from the elided work here)
        if queue.empty() is False or errors == 0:
            print('thread ', thread_no, ' exiting...')
            with open('output_%i' % thread_no, 'w') as f:
                for line in lines:
                    f.write(line)
            exit()

threads = []
for i, item in enumerate(items):
    threads.append(dict())
    q = queue.Queue()
    threads[i]['queue'] = q
    threads[i]['thread'] = threading.Thread(target=fun, args=(arg1, i, q))
    threads[i]['thread'].start()

try:
    time.sleep(10000)
except:
    for thread in threads:
        thread['queue'].put('TERMINATING')
