In the example below, each time you execute the program it spawns new threads with new IDs.
1. How do I terminate all the threads on task completion?
2. How can I assign a name/ID to the threads?
import threading, Queue

THREAD_LIMIT = 3
jobs = Queue.Queue(5)  # This sets up the queue object to use 5 slots
singlelock = threading.Lock()  # This is a lock so threads don't print through each other

# input list
inputlist_Values = [ (5,5),(10,4),(78,5),(87,2),(65,4),(10,10),(65,2),(88,95),(44,55),(33,3) ]

def DoWork(inputlist):
    print "Inputlist received..."
    print inputlist
    # Spawn the threads
    print "Spawning the {0} threads.".format(THREAD_LIMIT)
    for x in xrange(THREAD_LIMIT):
        print "Thread {0} started.".format(x)
        # This is the thread class that we instantiate.
        worker().start()
    # Put stuff in queue
    print "Putting stuff in queue"
    for i in inputlist:
        # Block if queue is full, and wait 5 seconds. After 5s raise Queue Full error.
        try:
            jobs.put(i, block=True, timeout=5)
        except:
            singlelock.acquire()
            print "The queue is full !"
            singlelock.release()
    # Wait for the threads to finish
    singlelock.acquire()  # Acquire the lock so we can print
    print "Waiting for threads to finish."
    singlelock.release()  # Release the lock
    jobs.join()  # This command waits for all queued jobs to be marked done.

class worker(threading.Thread):
    def run(self):
        # run forever
        while 1:
            # Try and get a job out of the queue
            try:
                job = jobs.get(True, 1)
                singlelock.acquire()  # Acquire the lock
                print self
                print "Multiplication of {0} with {1} gives {2}".format(job[0], job[1], (job[0]*job[1]))
                singlelock.release()  # Release the lock
                # Let the queue know the job is finished.
                jobs.task_done()
            except:
                break  # No more jobs in the queue

def main():
    DoWork(inputlist_Values)
How do I terminate all the threads on task completion?
You could put THREAD_LIMIT sentinel values (e.g., None) at the end of the queue and have a thread exit its run() method when it sees one.
When your main thread exits, all non-daemon threads are joined, so the program keeps running as long as any of those threads is alive. Daemon threads are terminated when your program exits.
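For example, a minimal sketch of the sentinel approach, reusing the names from the question's code (the None check inside run() is the part you would add):

# Sketch only: shut the workers down with sentinels after all real jobs are queued.
for _ in xrange(THREAD_LIMIT):
    jobs.put(None)  # one sentinel per worker thread

# ... and inside worker.run(), exit when the sentinel is seen:
#     job = jobs.get(True, 1)
#     if job is None:
#         jobs.task_done()
#         break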
How can I assign a name/ID to the threads?
You can assign a name by passing it to the constructor or by setting .name directly.
The thread identifier .ident is a read-only property that is unique among live threads; it may be reused after one thread exits and another starts.
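For instance (some_function here is just a placeholder):

t = threading.Thread(target=some_function, name="worker-1")  # name set via the constructor
t.start()
t.name = "worker-1-renamed"  # .name can also be changed later
print(t.ident)  # unique among threads that are currently alive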
You could rewrite your code using multiprocessing.dummy.Pool, which provides the same interface as multiprocessing.Pool but uses threads instead of processes:
#!/usr/bin/env python
import logging
from multiprocessing.dummy import Pool

debug = logging.getLogger(__name__).debug

def work(x_y):
    try:
        x, y = x_y  # do some work here
        debug('got %r', x_y)
        return x / y, None
    except Exception as e:
        logging.getLogger(__name__).exception('work%r failed', x_y)
        return None, e

def main():
    logging.basicConfig(level=logging.DEBUG,
                        format="%(levelname)s:%(threadName)s:%(asctime)s %(message)s")
    inputlist = [ (5,5),(10,4),(78,5),(87,2),(65,4),(10,10), (1,0), (0,1) ]
    pool = Pool(3)
    s = 0.
    for result, error in pool.imap_unordered(work, inputlist):
        if error is None:
            s += result
    print("sum=%s" % (s,))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
Output
DEBUG:Thread-1:2013-01-14 15:37:37,253 got (5, 5)
DEBUG:Thread-1:2013-01-14 15:37:37,253 got (87, 2)
DEBUG:Thread-1:2013-01-14 15:37:37,253 got (65, 4)
DEBUG:Thread-1:2013-01-14 15:37:37,254 got (10, 10)
DEBUG:Thread-1:2013-01-14 15:37:37,254 got (1, 0)
ERROR:Thread-1:2013-01-14 15:37:37,254 work(1, 0) failed
Traceback (most recent call last):
File "prog.py", line 11, in work
return x / y, None
ZeroDivisionError: integer division or modulo by zero
DEBUG:Thread-1:2013-01-14 15:37:37,254 got (0, 1)
DEBUG:Thread-3:2013-01-14 15:37:37,253 got (10, 4)
DEBUG:Thread-2:2013-01-14 15:37:37,253 got (78, 5)
sum=78.0
Threads don't stop unless you tell them to stop.
My recommendation is to add a stop variable to your Thread subclass and check it in your run loop (instead of while 1:).
An example:
class worker(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)  # initialise the Thread machinery first
        self._stop = False

    def stop(self):
        self._stop = True

    def run(self):
        # run until stopped
        while not self._stop:
            pass  # do work here
Then, when your program is quitting (for whatever reason), make sure to call the stop method on all your worker threads.
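For example, assuming the started workers were kept in a (hypothetical) workers list:

# Ask every worker to leave its loop, then wait for each one to actually finish.
for w in workers:
    w.stop()
for w in workers:
    w.join()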
About your second question, doesn't adding a name variable to your Thread subclass work for you?
Related
I'm calling a function in a for loop. If that function takes longer than 5 seconds to execute, I want to skip that iteration and move on to the next one.
I have thought about using the time library and starting a clock, but the end timer would only run after the function finishes, so I wouldn't be able to skip that specific iteration after 5 seconds.
I am attaching an example below. Hope this might help you:
from threading import Timer

class LoopStopper:
    def __init__(self, seconds):
        self._loop_stop = False
        self._seconds = seconds

    def _stop_loop(self):
        self._loop_stop = True

    def run(self, generator_expression, task):
        """ Execute a task a number of times based on the generator_expression"""
        t = Timer(self._seconds, self._stop_loop)
        t.start()
        for i in generator_expression:
            task(i)
            if self._loop_stop:
                break
        t.cancel()  # Cancel the timer if the loop ends ok.

ls = LoopStopper(5)  # 5 second timeout
ls.run(range(1000000), print)  # print numbers from 0 to 999999
Here's some code I've been experimenting with. It has a task() that iterates over its params argument and takes a random amount of time to complete each item.
I start a thread for each task, waiting for the thread to complete by monitoring a queue of return values. If the thread fails to complete in time, the main loop abandons it and starts the next thread.
The program shows which tasks fail or finish (different every time).
The tasks which finish have their results printed out (the param and the sleep time).
import threading, queue
import random
import time

def task(params, q):
    for p in params:
        s = random.randint(1, 4)
        s = s * s
        s = s / 8
        time.sleep(s)
        q.put((p, s), False)
    q.put(None, False)  # None is the sentinel value

def sampleQueue(q, ret, results):
    while not q.empty():
        item = q.get()
        if item:
            ret.append(item)
        else:
            # Found the None sentinel
            results.append(ret)
            return True
    return False

old = []
results = []

for p in [1, 2, 3, 4]:
    q = queue.SimpleQueue()
    t = threading.Thread(target=task, args=([p, p, p, p, p], q))
    t.start()
    end = time.time() + 5
    ret = []
    failed = True
    while time.time() < end:
        time.sleep(0.1)
        if sampleQueue(q, ret, results):
            failed = False
            break
    if failed:
        print(f'Task {p} failed!')
        old.append(t)
    else:
        print(f'Task {p} finished!')
        t.join()

print(results)
print(f'{len(old)} threads failed')
for t in old:
    t.join()
print('Done')
Example output:
Task 1 finished!
Task 2 finished!
Task 3 failed!
Task 4 failed!
[[(1, 1.125), (1, 1.125), (1, 2.0), (1, 0.125), (1, 0.5)], [(2, 0.125), (2, 1.125), (2, 0.5), (2, 2.0), (2, 0.125)]]
2 threads failed
Done
I will post an alternative solution using the subprocess module. You need to create a Python file with your function, call it as a subprocess, and call the wait method. If the process doesn't finish within the desired time, wait raises an exception, so you kill that process and keep going with the iteration.
As an example, this is the function you want to call:
from time import time
import sys
x = eval(sys.argv[1])
t = time()
a = [i for i in range(int(x**5))]
# pipe the computation time back to the main process
sys.stdout.write('%s' % (time() - t))
And the main script, where I call the previous function in the func.py file:
import subprocess as sp
from subprocess import Popen, PIPE

for i in range(1, 50, 1):
    # call the process
    process = Popen(['python', '~func.py', '%i' % i],
                    stdout=PIPE, stdin=PIPE)
    try:
        # if it finishes within 1 sec:
        process.wait(1)
        print('Finished in: %s s' % (process.stdout.read().decode()))
    except sp.TimeoutExpired:
        # else kill the process. It is important to kill it,
        # otherwise it will keep running.
        print('Timeout')
        process.kill()
Given the following class:
from abc import ABCMeta, abstractmethod
from time import sleep
import threading
from threading import active_count, Thread

class ScraperPool(metaclass=ABCMeta):
    Queue = []
    ResultList = []

    def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
        # Initialize attributes
        self.MaxNumWorkers = MaxNumWorkers
        self.ItemsPerWorker = ItemsPerWorker
        self.Queue = Queue  # For testing purposes.

    def initWorkerPool(self, PrintIDs=True):
        for w in range(self.NumWorkers()):
            Thread(target=self.worker, args=(w + 1, PrintIDs,)).start()
            sleep(1)  # Explicitly wait one second for this worker to start.

    def run(self):
        self.initWorkerPool()
        # Wait until all workers (i.e. threads) are done.
        while active_count() > 1:
            print("Active threads: " + str(active_count()))
            sleep(5)
        self.HandleResults()

    def worker(self, id, printID):
        if printID:
            print("Starting worker " + str(id) + ".")
        while len(self.Queue) > 0:
            self.scraperMethod()
        if printID:
            print("Worker " + str(id) + " is quitting.")
        # Todo: kill this thread.
        return

    def NumWorkers(self):
        return 1  # Simplified for testing purposes.

    @abstractmethod
    def scraperMethod(self):
        pass


class TestScraper(ScraperPool):
    def scraperMethod(self):
        # print("I am scraping.")
        # print("Scraping. Threads#: " + str(active_count()))
        temp_item = self.Queue[-1]
        self.Queue.pop()
        self.ResultList.append(temp_item)

    def HandleResults(self):
        print(self.ResultList)


ScraperPool.register(TestScraper)

scraper = TestScraper(Queue=["Jaap", "Piet"])
scraper.run()
print(threading.active_count())
# print(scraper.ResultList)
When all the threads are done, there's still one active thread - threading.active_count() on the last line gets me that number.
The active thread is <_MainThread(MainThread, started 12960)> - as printed with threading.enumerate().
Can I assume that all my threads are done when active_count() == 1?
Or can, for instance, imported modules start additional threads, so that my threads are actually done even while active_count() > 1 (which is also the condition for the loop I'm using in the run method)?
You can assume that your threads are done when active_count() reaches 1. The problem is, if any other module creates a thread, you'll never get to 1. You should manage your threads explicitly.
Example: You can put the threads in a list and join them one at a time. The relevant changes to your code are:
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
    # Initialize attributes
    self.MaxNumWorkers = MaxNumWorkers
    self.ItemsPerWorker = ItemsPerWorker
    self.Queue = Queue  # For testing purposes.
    self.WorkerThreads = []

def initWorkerPool(self, PrintIDs=True):
    for w in range(self.NumWorkers()):
        thread = Thread(target=self.worker, args=(w + 1, PrintIDs,))
        self.WorkerThreads.append(thread)
        thread.start()
        sleep(1)  # Explicitly wait one second for this worker to start.

def run(self):
    self.initWorkerPool()
    # Wait until all workers (i.e. threads) are done. We join them in order,
    # so threads further down the list may finish first, but we will get to
    # all of them eventually.
    while self.WorkerThreads:
        self.WorkerThreads.pop(0).join()
    self.HandleResults()
According to the docs, active_count() includes the main thread, so if you're at 1 you're most likely done; but if anything else in your program creates new threads, your own threads may be finished long before active_count() drops to 1.
I would recommend implementing an explicit join method on your ScraperPool: keep track of your workers and explicitly join them from the main thread when needed, instead of checking whether you're done with active_count() calls.
Also, remember the GIL...
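A minimal sketch of that idea, assuming the pool keeps the threads it started in a WorkerThreads list, as in the snippet above:

def join_workers(self):
    # Wait only for the threads this pool started, instead of polling
    # active_count(), which threads from other modules can inflate.
    for t in self.WorkerThreads:
        t.join()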
Here is my script:
# globals
MAX_PROCESSES = 50
my_queue = Manager().Queue()  # queue to store our values
stop_event = Event()  # flag which signals processes to stop
my_pool = None

def my_function(var):
    while not stop_event.is_set():
        # this will run forever for each variable found
        pass
    return

def var_scanner():
    # Since `t` could have unlimited size we'll put all `t` values in the queue
    while not stop_event.is_set():  # forever scan `values` for new items
        x = Variable.objects.order_by('var').values('var__var')
        for t in x:
            t = t.values()
            my_queue.put(t)
        time.sleep(10)

try:
    var_scanner = Process(target=var_scanner)
    var_scanner.start()
    my_pool = Pool(MAX_PROCESSES)

    while not stop_event.is_set():
        try:  # if queue isn't empty, get value from queue and create new process
            var = my_queue.get_nowait()  # getting value from queue
            p = Process(target=my_function, args=("process-%s" % var))
            p.start()
        except Queue.Empty:
            print "No more items in queue"
except KeyboardInterrupt as stop_test_exception:
    print(" CTRL+C pressed. Stopping test....")
    stop_event.set()
However, I don't think this script is exactly what I want. Here's what I was looking for when I wrote it: I want it to scan the "Variables" table, add "new" variables to the queue if they aren't already there, and run "my_function" for each variable in the queue.
I believe I have way too many while not stop_event.is_set() loops, because right now it just prints "No more items in queue" about a million times.
Please HELP!! :)
My multi-threading script raises this error:
thread.error : can't start new thread
when it reaches 460 threads:
threading.active_count() = 460
I assume the old threads keep stacking up, since the script doesn't kill them. This is my code:
import threading
import Queue
import time
import os
import csv

def main(worker):
    # Do Work
    print worker
    return

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    global q
    global city
    q = Queue.Queue()
    for x in range(20):
        t = threading.Thread(target=threader)
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()
How do I kill the old threads when their job is done? (Is return not enough?)
Your threader function never exits, so your threads never die. Since you're just processing one fixed set of work and never adding items after you start working, you could set the threads up to exit when the queue is empty.
See the following altered version of your code and the comments I added:
def threader(q):
    # let the thread die when all work is done
    while not q.empty():
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    # you don't want global variables
    #global q
    #global city
    q = Queue.Queue()
    # make sure you fill the queue *before* starting the worker threads
    for worker in workers:
        q.put(worker)
    for x in range(20):
        t = threading.Thread(target=threader, args=[q])
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    q.join()
Notice that I removed global q and instead pass q to the thread function. You don't want threads created by a previous call to end up sharing a q with new threads (edit: although q.join() prevents this anyway, it's still better to avoid globals).
I'm about to put this design into use in an application, but I'm fairly new to threading and Queue stuff in Python. Obviously the actual application is not for saying hello, but the design is the same - i.e. there is a process which takes some time to set up and tear down, but I can do multiple tasks in one hit. Tasks will arrive at random times, and often in bursts.
Is this a sensible and thread safe design?
class HelloThing(object):
    def __init__(self):
        self.queue = self._create_worker()

    def _create_worker(self):
        import threading, Queue
        def worker():
            while True:
                things = [q.get()]
                while True:
                    try:
                        things.append(q.get_nowait())
                    except Queue.Empty:
                        break
                self._say_hello(things)
                [q.task_done() for task in xrange(len(things))]
        q = Queue.Queue()
        n_worker_threads = 1
        for i in xrange(n_worker_threads):
            t = threading.Thread(target=worker)
            t.daemon = True
            t.start()
        return q

    def _say_hello(self, greeting_list):
        import time, sys
        # setup stuff
        time.sleep(1)
        # do some things
        sys.stdout.write('hello {0}!\n'.format(', '.join(greeting_list)))
        # tear down stuff
        time.sleep(1)

if __name__ == '__main__':
    print 'enter __main__'
    import time
    hello = HelloThing()
    hello.queue.put('world')
    hello.queue.put('cruel world')
    hello.queue.put('stack overflow')
    time.sleep(2)
    hello.queue.put('a')
    hello.queue.put('b')
    time.sleep(2)
    for i in xrange(20):
        hello.queue.put(str(i))
    #hello.queue.join()
    print 'finish __main__'
Thread safety is handled by the Queue implementation (you must also handle it in your _say_hello implementation if required).
Burst handling: a burst should be handled by a single thread only (e.g. say your setup/teardown takes 10 seconds; at second 1 all threads are busy with the burst from second 0, and at second 5 a new task or burst arrives but no thread is available to handle it). So a burst should be defined by a maximum number of tasks (or maybe "infinite") within a specific time window, and an entry in the queue should be a list of tasks.
How can you group the tasks of a burst into a list?
I'll provide the solution as code; it's easier to explain that way...
producer_q = Queue()

def _burst_thread():
    while True:
        available_tasks = [producer_q.get()]
        time.sleep(BURST_TIME_WINDOW)
        # I'm the single consumer, so there will be at least qsize() elements
        available_tasks.extend(producer_q.get() for i in range(producer_q.qsize()))
        consumer_q.put(available_tasks)
If you want a maximum number of messages per burst, you just need to slice available_tasks into multiple lists.
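For example, a minimal sketch of that slicing, with a hypothetical MAX_BURST_SIZE limit:

MAX_BURST_SIZE = 10  # hypothetical per-burst limit

def split_burst(available_tasks, max_size=MAX_BURST_SIZE):
    # Slice the collected tasks into chunks of at most max_size each.
    return [available_tasks[i:i + max_size]
            for i in range(0, len(available_tasks), max_size)]

# Each chunk is then queued for the consumer as its own burst:
# for chunk in split_burst(available_tasks):
#     consumer_q.put(chunk)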