How to implement a dynamic amount of concurrent threads? - python

I am launching concurrent threads doing some stuff:
concurrent = 10
q = Queue(concurrent * 2)
for j in range(concurrent):
    t = threading.Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    # process each line and assign it to an available thread
    for line in call_file:
        q.put(line)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
At the same time I have a distinct thread counting time:
def printit():
    threading.Timer(1.0, printit).start()
    print current_status

printit()
I would like to increase (or decrease) the number of concurrent threads in the main process, say every minute. I can keep a time counter in the timer thread and have it do something every minute, but how do I change the number of concurrent threads in the main process?
Is that possible, and if so, how?

This is my worker:
def UpdateProcesses(start, processnumber, CachesThatRequireCalculating, CachesThatAreBeingCalculated, CacheDict, CacheLock, IdleLock, FileDictionary, MetaDataDict, CacheIndexDict):
    NewPool()
    while start[processnumber]:
        IdleLock.wait()
        while len(CachesThatRequireCalculating) > 0 and start[processnumber] == True:
            CacheLock.acquire()
            try:
                cacheCode = CachesThatRequireCalculating[0]  # the list can be empty if another process takes the last item while we wait for the CacheLock
                CachesThatRequireCalculating.remove(cacheCode)
                print cacheCode, "starts processing by", processnumber, "process"
            except:
                CacheLock.release()
            else:
                CacheLock.release()
                CachesThatAreBeingCalculated.append(cacheCode[:3])
                Array, b, f = TIPP.LoadArray(FileDictionary[cacheCode[:2]])  # opens the dask array
                Array = ((Array[:, :, CacheIndexDict[cacheCode[:2]][cacheCode[2]]:CacheIndexDict[cacheCode[:2]][cacheCode[2]+1]].compute() / 2.**(MetaDataDict[cacheCode[:2]]["Bit Depth"]) * 255.).astype(np.uint16)).transpose([1, 0, 2])  # slices and calculates the array
                f.close()  # close the file
                if CachesThatAreBeingCalculated.count(cacheCode[:3]) != 0:  # if not, this cache is not needed anymore (the cacheCode is removed by a wavelength change)
                    CachesThatAreBeingCalculated.remove(cacheCode[:3])
                    try:  # if the object is not available the first time, try a second time
                        CacheDict[cacheCode[:3]] = Array
                    except:
                        CacheDict[cacheCode[:3]] = Array
                print cacheCode, "done processing by", processnumber, "process"
        if start[processnumber]:
            IdleLock.clear()
This is how I start them:
self.ProcessLst = []  # list with all the processes who calculate the caches
for processnumber in range(min(NumberOfMaxProcess, self.processes)):
    self.ProcessTerminateLst.append(True)
for processnumber in range(min(NumberOfMaxProcess, self.processes)):
    self.ProcessLst.append(process.Process(target=Proc.UpdateProcesses, args=(self.ProcessTerminateLst, processnumber, self.CachesThatRequireCalculating, self.CachesThatAreBeingCalculated, self.CacheDict, self.CacheLock, self.IdleLock, self.FileDictionary, self.MetaDataDict, self.CacheIndexDict,)))
    self.ProcessLst[-1].daemon = True
    self.ProcessLst[-1].start()
I close them like this:
for i in range(len(self.ProcessLst)):  # for both while loops in the processes self.ProcessTerminateLst[i] must be True, so either the process is now ready to be terminated or it is still in idle mode
    self.ProcessTerminateLst[i] = False
self.IdleLock.set()  # makes sure no process is in idle mode and all are ready to be terminated

I would use a pool. A pool has a maximum number of threads it uses at the same time, but you can submit any number of jobs; they stay in the waiting list until a thread is available. I don't think you can change the number of concurrent workers of an existing pool, though.
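If you really do need to raise or lower how many threads are actively working at once, one workaround is to start a generous, fixed number of worker threads and gate them with a semaphore whose permit count a timer/controller thread adjusts. This is only a sketch under assumptions (Python 3; do_work, the permit counts and set_concurrency_delta are hypothetical placeholders, not part of the answer above):

import queue
import threading

MAX_WORKERS = 20                      # hard upper bound on threads ever created
permits = threading.Semaphore(10)     # start with 10 "active" slots
work_queue = queue.Queue()

def do_work(item):
    pass                              # placeholder for the real per-line work

def worker():
    while True:
        item = work_queue.get()
        if item is None:              # sentinel: shut this worker down
            work_queue.task_done()
            return
        with permits:                 # only as many threads run this block as there are permits
            do_work(item)
        work_queue.task_done()

for _ in range(MAX_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

def set_concurrency_delta(delta):
    # called from the timer thread: positive delta adds permits, negative removes them
    if delta > 0:
        for _ in range(delta):
            permits.release()
    else:
        for _ in range(-delta):
            permits.acquire()         # blocks until a running worker frees a permit

The effective concurrency never exceeds MAX_WORKERS, and lowering it blocks the controller until enough workers have finished their current item.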

Related

A way to wait for currently running tasks to finish then stop in multiprocessing Pool

I have a large number of tasks (40,000 to be exact) that I am using a Pool to run in parallel. To maximize efficiency, I pass the list of all tasks at once to starmap and let them run.
I would like to have it so that if my program is broken using Ctrl+C then currently running tasks will be allowed to finish but new ones will not be started. I have figured out the signal handling part to handle the Ctrl+C breaking just fine using the recommended method and this works well (at least with Python 3.6.9 that I am using):
import os
import signal
import random as rand
import multiprocessing as mp

def init() :
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def child(a, b, c) :
    st = rand.randrange(5, 20+1)
    print("Worker thread", a+1, "sleep for", st, "...")
    os.system("sleep " + str(st))

pool = mp.Pool(initializer=init)
try :
    pool.starmap(child, [(i, 2*i, 3*i) for i in range(10)])
    pool.close()
    pool.join()
    print("True exit!")
except KeyboardInterrupt :
    pool.terminate()
    pool.join()
    print("Interrupted exit!")
The problem is that Pool seems to have no function to let the currently running tasks complete and then stop; it only has terminate and close. In the example above I use terminate, but this is not what I want, as it immediately terminates all running tasks (whereas I want to let the currently running tasks run to completion). On the other hand, close simply prevents adding more tasks, but calling close then join waits for all pending tasks to complete (40,000 of them in my real case), whereas I only want the currently running tasks to finish, not all of them.
I could somehow gradually add my tasks one by one or in chunks so I could use close and join when interrupted, but this seems less efficient unless there is a way to add a new task as soon as one finishes manually (which I'm not seeing how to do from the Pool documentation). It really seems like my use case would be common and that Pool should have a function for this, but I have not seen this question asked anywhere (or maybe I'm just not searching for the right thing).
Does anyone know how to accomplish this easily?
I tried to do something similar with concurrent.futures - see the last code block in this answer: it attempts to throttle adding tasks to the pool and only adds new tasks as tasks complete. You could change the logic to fit your needs. Maybe keep the number of pending work items slightly greater than the number of workers so you don't starve the executor. Something like:
import concurrent.futures
import random as rand
import signal
import sys
import time

def child(*args, n=0):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker {n} thread {a+1} slept for {st} - args:{args}"
    return (n, x)

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and get the result
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    #print(futures)
                    results.extend(future.result() for future in futures.done)
                    print(f'{len(results)} results so far')
                    fs = list(futures.not_done)
                print(f'add a new task {n}')
                fs.append(executor.submit(child, *args, **{'n': n}))
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover results as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            result = future.result()
            results.append(result)
    results.sort()
    # separate the results from the value used to sort
    for n, result in results:
        print(result)
Here is a way to get the results sorted in submission order without modifying the task. It uses a dictionary to relate each future to its submission order and uses it for the sort key.
# same imports as above
def child(*args):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker thread {a+1} slept for {st} - args:{args}"
    return x

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    sort_dict = {}
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and grab it
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    results.extend(future for future in futures.done)
                    print(f'{len(results)} futures completed so far')
                    fs = list(futures.not_done)
                future = executor.submit(child, *args)
                fs.append(future)
                print(f'task {n} added - future:{future}')
                sort_dict[future] = n
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover futures as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            results.append(future)
    # sort the futures
    results.sort(key=lambda f: sort_dict[f])
    # get the results
    for future in results:
        print(future.result())
You could also just add an attribute to each future and sort on that (no need for the dictionary):
...
future = executor.submit(child, *args)
# add an attribute to the future that can be sorted on
future.submitted = n
fs.append(future)
...
results.sort(key=lambda f: f.submitted)
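If you would rather not touch the executor's private _pending_work_items attribute, the same throttling can be done by counting the futures you are still holding. A minimal sketch of that variation (the in-flight limit and the toy child function are placeholders, not from the answer above):

import concurrent.futures
import time

def child(a, b, c):
    time.sleep(1)            # stand-in for the real work
    return a + b + c

if __name__ == '__main__':
    nworkers = 5
    limit = nworkers + 2     # how many futures we allow in flight
    results = []
    pending = set()
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        try:
            for i in range(100):
                if len(pending) >= limit:
                    # block until at least one in-flight task finishes
                    done, pending = concurrent.futures.wait(
                        pending, return_when=concurrent.futures.FIRST_COMPLETED)
                    results.extend(f.result() for f in done)
                pending.add(executor.submit(child, i, 2*i, 3*i))
        except KeyboardInterrupt:
            pass                 # stop submitting; fall through and drain what is running
        for f in concurrent.futures.as_completed(pending):
            results.append(f.result())
    print(len(results), "results")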

Dynamically generating new threads

I want to be able to run multiple threads without writing a new line of code for every thread I want to run. With the code below I cannot dynamically add more accountIDs or increase the number of threads just by changing thread_count.
For example this is my code now:
import threading

def get_page_list(account, thread_count):
    return list_of_pages_split_by_threads

def pull_data(page_list, account_id):
    data = api(page_list, account_id)
    return data

if __name__ == "__main__":
    accountIDs = [100]
    # number of threads to make:
    thread_count = 3
    # returns a list of pages, i.e. [[1,2,3],[4,5,6],[7,8,9,10]]
    page_lists = get_page_list(accountIDs[0], thread_count)
    t1 = threading.Thread(target=pull_data, args=(page_lists[0], accountIDs[0]))
    t2 = threading.Thread(target=pull_data, args=(page_lists[1], accountIDs[0]))
    t3 = threading.Thread(target=pull_data, args=(page_lists[2], accountIDs[0]))
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()
This is where I want to get to:
Anytime I want to add an additional thread (if the server can handle it) or add additional accountIDs, I shouldn't have to duplicate code.
i.e. (this example is what I would like to do, but the code below doesn't work; it tries to finish a whole list of pages before moving on to the next thread):
if __name__ == "__main__":
    accountIDs = [100, 101, 103]
    thread_count = 3
    for account in accountIDs:
        page_lists = get_page_list(account, thread_count)
        for pg_list in page_lists:
            t1 = threading.Thread(target=pull_data, args=(pg_list, account))
            t1.start()
            t1.join()
One way of doing it is using Pool and Queue.
The pool will keep working while there are items in the queue, without holding the main thread.
Choose one of these imports:
import multiprocessing as mp (for process based parallelization)
import multiprocessing.dummy as mp (for thread based parallelization)
Creating the workers, pool and queue:
the_queue = mp.Queue()  # store the account ids and page lists here

def worker_main(queue):
    while waiting == True:
        while not queue.empty():
            account, pageList = queue.get(True)  # get an item from the queue
            pull_data(pageList, account)

waiting = True
the_pool = mp.Pool(num_parallel_workers, worker_main, (the_queue,))
# don't forget the comma after the_queue in (the_queue,)

accountIDs = [100, 101, 103]
thread_count = 3
for account in accountIDs:
    list_of_page_lists = get_page_list(account, thread_count)
    for pg_list in list_of_page_lists:
        the_queue.put((account, pg_list))

....

waiting = False  # while you don't do this, the pool will probably never end
# not sure if it's a good practice, but you might want to have
# the pool hanging there for a while to receive more items

the_pool.close()
the_pool.join()
Another option is to fill the queue first, create the pool second, use the worker only while there are items in the queue.
Then if more data arrives, you create another queue, another pool:
import multiprocessing.dummy as mp
# if you are not using dummy, you will probably need a queue for the results too,
# as the processes will not access the vars from the main thread;
# something like worker_main(input_queue, output_queue)
# and pull_data(pageList, account, output_queue)
# and mp.Pool(num_parallel_workers, worker_main, (in_queue, out_queue))
# and you get the results from the output queue after pool.join()

the_queue = mp.Queue()  # store the account ids and page lists here

def worker_main(queue):
    while not queue.empty():
        account, pageList = queue.get(True)  # get an item from the queue
        pull_data(pageList, account)

accountIDs = [100, 101, 103]
thread_count = 3
for account in accountIDs:
    list_of_page_lists = get_page_list(account, thread_count)
    for pg_list in list_of_page_lists:
        the_queue.put((account, pg_list))

the_pool = mp.Pool(num_parallel_workers, worker_main, (the_queue,))
# don't forget the comma after the_queue in (the_queue,)

the_pool.close()
the_pool.join()

del the_queue
del the_pool
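As an aside: since all the (page list, account) pairs in this example are known up front, a plain pool map over the flattened pairs avoids the manual queue and the waiting flag entirely. A sketch of that variation (assuming Python 3, reusing the question's get_page_list and pull_data; run_all and its parameters are hypothetical):

import multiprocessing.dummy as mp   # thread-based pool, as in the second option above

def run_all(accountIDs, thread_count, num_parallel_workers):
    # build one flat list of (page_list, account) jobs
    jobs = []
    for account in accountIDs:
        for pg_list in get_page_list(account, thread_count):
            jobs.append((pg_list, account))
    # the pool runs at most num_parallel_workers jobs at a time
    with mp.Pool(num_parallel_workers) as pool:
        results = pool.starmap(pull_data, jobs)
    return results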
I couldn't get MP to work correctly, so I did this instead and it seems to work great. But MP is probably the better way to tackle this problem.
# just keeps track of the threads
threads = []
# generates a thread for whatever thread_count = N
for thread in range(thread_count):
    # the function returns a list of pages stored in page_lists; this ensures each thread gets a unique list
    page_list = page_lists[thread]
    # actual function for each thread to work on
    t = threading.Thread(target=pull_data, args=(page_list, account))
    # puts all threads into a list
    threads.append(t)
    # starts each thread
    t.start()
# after all threads are complete, back to the main thread (technically this is not needed)
for t in threads:
    t.join()
I also didn't understand why you would "need" .join(); there is a great answer here:
What is the use of join() in Python threading?
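In short, join() just makes the calling thread block until the worker finishes. A tiny illustration (the work function is only a stand-in):

import threading
import time

def work():
    time.sleep(2)
    print("worker finished")

t = threading.Thread(target=work)
t.start()
t.join()   # main thread blocks here until work() returns
print("main continues only after the worker is done")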

multiprocessing script to scan for new values and put in queue not working

Here is my script:
# globals
MAX_PROCESSES = 50
my_queue = Manager().Queue()  # queue to store our values
stop_event = Event()          # flag which signals processes to stop
my_pool = None

def my_function(var):
    while not stop_event.is_set():
        # this script will run forever for each variable found
        return

def var_scanner():
    # Since `t` could have unlimited size we'll put all `t` values in the queue
    while not stop_event.is_set():  # forever scan `values` for new items
        x = Variable.objects.order_by('var').values('var__var')
        for t in x:
            t = t.values()
            my_queue.put(t)
        time.sleep(10)

try:
    var_scanner = Process(target=var_scanner)
    var_scanner.start()
    my_pool = Pool(MAX_PROCESSES)
    while not stop_event.is_set():
        try:  # if queue isn't empty, get value from queue and create new process
            var = my_queue.get_nowait()  # getting value from queue
            p = Process(target=my_function, args=("process-%s" % var))
            p.start()
        except Queue.Empty:
            print "No more items in queue"
except KeyboardInterrupt as stop_test_exception:
    print(" CTRL+C pressed. Stopping test....")
    stop_event.set()
However, I don't think this script is exactly what I want. Here's what I was looking for when I wrote it: I want it to scan the "Variables" table, add "new" variables to the queue if they don't already exist there, and run "my_function" for each variable in the queue.
I believe I have WAY too many while not stop_event.is_set() loops, because right now it just prints out "No more items in queue" about a million times.
Please HELP!! :)
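One way to get closer to that goal is to let the scanner remember what it has already queued, block on queue.get() instead of polling with get_nowait(), and hand work to the pool with apply_async. This is only a sketch under assumptions: fetch_current_vars stands in for the Django query, and my_function is simplified.

import time
from multiprocessing import Manager, Pool, Process

MAX_PROCESSES = 50

def fetch_current_vars():
    return []                          # placeholder for Variable.objects...values(...)

def my_function(var, stop_event):
    # long-running work for one variable; checks stop_event periodically
    while not stop_event.is_set():
        time.sleep(1)

def var_scanner(my_queue, stop_event):
    seen = set()                       # remember what has already been queued
    while not stop_event.is_set():
        for var in fetch_current_vars():
            if var not in seen:        # only enqueue genuinely new variables
                seen.add(var)
                my_queue.put(var)
        time.sleep(10)

if __name__ == "__main__":
    manager = Manager()
    my_queue = manager.Queue()
    stop_event = manager.Event()
    scanner = Process(target=var_scanner, args=(my_queue, stop_event))
    scanner.start()
    pool = Pool(MAX_PROCESSES)
    try:
        while True:
            var = my_queue.get()       # blocks, so no busy loop printing "empty"
            pool.apply_async(my_function, (var, stop_event))
    except KeyboardInterrupt:
        stop_event.set()
        pool.close()
        pool.join()
        scanner.join()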

How to use Python multiprocessing queue to access GPU (through PyOpenCL)?

I have code that takes a long time to run and so I've been investigating Python's multiprocessing library in order to speed things up. My code also has a few steps that utilize the GPU via PyOpenCL. The problem is, if I set multiple processes to run at the same time, they all end up trying to use the GPU at the same time, and that often results in one or more of the processes throwing an exception and quitting.
In order to work around this, I staggered the start of each process so that they'd be less likely to bump into each other:
process_list = []
num_procs = 4
# break data into chunks so each process gets its own chunk of the data
data_chunks = chunks(data, num_procs)
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Process(target=test, args=(arg1, arg2))
    # Sticks the process in a list so that it remains accessible
    process_list.append(p)

# Start processes
j = 1
for process in process_list:
    print('\nStarting process %i' % j)
    process.start()
    time.sleep(5)
    j += 1

for process in process_list:
    process.join()
I also wrapped a try/except block around the function that calls the GPU, so that if two processes DO try to access it at the same time, the one that doesn't get access will wait a couple of seconds and try again:
wait = 2
n = 0
while True:
    try:
        gpu_out = GPU_Obj.GPU_fn(params)
    except:
        time.sleep(wait)
        print('\n Waiting for GPU memory...')
        n += 1
        if n == 5:
            raise Exception('Tried and failed %i times to allocate memory for opencl kernel.' % n)
        continue
    break
This workaround is very clunky, and even though it works most of the time, processes occasionally throw exceptions, and I feel like there should be a more efficient/elegant solution using multiprocessing.Queue or something similar. However, I'm not sure how to integrate it with PyOpenCL for GPU access.
Sounds like you could use a multiprocessing.Lock to synchronize access to the GPU:
data_chunks = chunks(data, num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Process(target=test, args=(arg1, arg2, lock))
    ...
Then, inside test where you access the GPU:
with lock:  # Only one process will be allowed in this block at a time.
    gpu_out = GPU_Obj.GPU_fn(params)
Edit:
To do this with a pool, you'd do this:
# At global scope
lock = None

def init(_lock):
    global lock
    lock = _lock

data_chunks = chunks(data, num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Pool(initializer=init, initargs=(lock,))
    p.apply(test, args=(arg1, arg2))
    ...
Or:
data_chunks = chunks(data, num_procs)
m = multiprocessing.Manager()
lock = m.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Pool()
    p.apply(test, args=(arg1, arg2, lock))
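Putting the initializer variant together into one runnable shape (a sketch only; gpu_fn and the toy chunks stand in for the question's GPU_Obj.GPU_fn and real data):

import multiprocessing

lock = None                      # set in each worker by init()

def init(_lock):
    global lock
    lock = _lock

def gpu_fn(chunk):
    return sum(chunk)            # placeholder for the real PyOpenCL call

def test(chunk):
    with lock:                   # only one worker touches the GPU at a time
        return gpu_fn(chunk)

if __name__ == '__main__':
    data_chunks = [[1, 2, 3], [4, 5], [6]]
    the_lock = multiprocessing.Lock()
    with multiprocessing.Pool(initializer=init, initargs=(the_lock,)) as pool:
        results = pool.map(test, data_chunks)
    print(results)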

multiprocessing - reading big input data - program hangs

I want to run parallel computation on some input data which is loaded from a file. (The file can be really big, so I use a generator for this.)
On a certain number of items, my code runs OK but above this threshold the program hangs (some of the worker processes do not end).
Any suggestions? (I am running this with python2.7, 8 CPUs; 5,000 lines still OK, 7,500 does not work.)
Firstly, you need an input file. Generate it in bash:
for i in {0..10000}; do echo -e "$i"'\r' >> counter.txt; done
Then, run this:
python2.7 main.py 100 counter.txt > run_log.txt
main.py:
#!/usr/bin/python2.7

import os, sys, signal, time
import Queue
import multiprocessing as mp

def eat_queue(job_queue, result_queue):
    """Eats input queue, feeds output queue
    """
    proc_name = mp.current_process().name
    while True:
        try:
            job = job_queue.get(block=False)
            if job == None:
                print(proc_name + " DONE")
                return
            result_queue.put(execute(job))
        except Queue.Empty:
            pass

def execute(x):
    """Does the computation on the input data
    """
    return x*x

def save_result(result):
    """Saves results in a list
    """
    result_list.append(result)

def load(ifilename):
    """Generator reading the input file and
       yielding it row by row
    """
    ifile = open(ifilename, "r")
    for line in ifile:
        line = line.strip()
        num = int(line)
        yield (num)
    ifile.close()
    print("file closed".upper())

def put_tasks(job_queue, ifilename):
    """Feeds the job queue
    """
    for item in load(ifilename):
        job_queue.put(item)
    for _ in range(get_max_workers()):
        job_queue.put(None)

def get_max_workers():
    """Returns optimal number of processes to run
    """
    max_workers = mp.cpu_count() - 2
    if max_workers < 1:
        return 1
    return max_workers

def run(workers_num, ifilename):
    job_queue = mp.Queue()
    result_queue = mp.Queue()

    # decide how many processes are to be created
    max_workers = get_max_workers()
    print "processes available: %d" % max_workers
    if workers_num < 1 or workers_num > max_workers:
        workers_num = max_workers

    workers_list = []
    # a process for feeding job queue with the input file
    task_gen = mp.Process(target=put_tasks, name="task_gen",
                          args=(job_queue, ifilename))
    workers_list.append(task_gen)

    for i in range(workers_num):
        tmp = mp.Process(target=eat_queue, name="w%d" % (i+1),
                         args=(job_queue, result_queue))
        workers_list.append(tmp)

    for worker in workers_list:
        worker.start()

    for worker in workers_list:
        worker.join()
        print "worker %s finished!" % worker.name

if __name__ == '__main__':
    result_list = []
    args = sys.argv
    workers_num = int(args[1])
    ifilename = args[2]
    run(workers_num, ifilename)
This is because nothing in your code takes anything off result_queue. The behavior then depends on internal queue buffering details: if "not a lot" of data is waiting, everything appears fine, but if "a lot" of data is waiting, everything freezes. Not much more can be said, because it involves layers of internal magic ;-) But the docs do warn about it:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
One easy way to repair that: First add
result_queue.put(None)
before eat_queue() returns. Then add:
count = 0
while count < workers_num:
    if result_queue.get() is None:
        count += 1
before the main program .join()s the workers. That drains the result queue, and everything shuts down cleanly then.
BTW, this code is pretty bizarre:
while True:
    try:
        job = job_queue.get(block=False)
        if job == None:
            print(proc_name + " DONE")
            return
        result_queue.put(execute(job))
    except Queue.Empty:
        pass
Why are you doing non-blocking get()? This turns into a CPU-hog "busy loop" so long as the queue is empty. The primary point of .get() is to supply an efficient way to wait for work to show up. So:
while True:
    job = job_queue.get()
    if job is None:
        print(proc_name + " DONE")
        break
    else:
        result_queue.put(execute(job))
result_queue.put(None)
does the same thing, but far more efficiently.
Queue size caution
You didn't ask about this, but let's cover it before it bites you ;-) By default, there is no bound on a Queue's size. If, e.g., you add a billion items to the Queue, it will demand enough RAM to hold a billion items. So if your producer(s) can generate work items faster than your consumer(s) can process them, memory use can get out of hand quickly.
Fortunately, that's easy to repair: specify a maximum queue size. For example,
job_queue = mp.Queue(maxsize=10*workers_num)
                     ^^^^^^^^^^^^^^^^^^^^^^^
Then job_queue.put(some_work_item) will block until consumers reduce the size of the queue to less than the maximum. This way you can process enormous problems with a queue that requires trivial RAM.
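For reference, here is a condensed sketch of the worker and driver with the three fixes above folded in (blocking get(), a None sentinel in each direction, and a bounded job queue). It reuses the question's load() and execute() and is meant to show the shape of the fix, not a drop-in replacement:

def put_tasks(job_queue, ifilename, workers_num):
    for item in load(ifilename):
        job_queue.put(item)
    for _ in range(workers_num):          # one sentinel per worker
        job_queue.put(None)

def eat_queue(job_queue, result_queue):
    proc_name = mp.current_process().name
    while True:
        job = job_queue.get()             # blocking get: no busy loop
        if job is None:                   # sentinel from the producer
            print(proc_name + " DONE")
            break
        result_queue.put(execute(job))
    result_queue.put(None)                # tell the consumer this worker is done

def run(workers_num, ifilename):
    job_queue = mp.Queue(maxsize=10*workers_num)   # bounded: producer blocks when full
    result_queue = mp.Queue()
    workers = [mp.Process(target=put_tasks, args=(job_queue, ifilename, workers_num))]
    workers += [mp.Process(target=eat_queue, args=(job_queue, result_queue))
                for _ in range(workers_num)]
    for w in workers:
        w.start()
    results, done = [], 0
    while done < workers_num:             # drain results until every worker has sent None
        item = result_queue.get()
        if item is None:
            done += 1
        else:
            results.append(item)
    for w in workers:
        w.join()
    return results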
