Limiting used resources in multithreading - python

I have been parallelizing code that calls myfunc using threading.Thread, as follows:
def myfunc(elt, other):
    subprocess.call("A matlab script that takes a while to execute")

allThreads = []
for elt in allElts:
    allThreads.append(threading.Thread(target=myfunc, args=(elt, other)))

for t in allThreads:
    t.start()

for t in allThreads:
    t.join()
Because of the large amount of data, I ran into a memory problem: some of my subprocess.call invocations raised a memory error and could not be allocated. To avoid this, I tried to limit the number of threads executing simultaneously to 8. I changed the code above to the following:
someThreads = []
k = 0
for k in range(len(allElts)):
    if k % 8 == 1:
        for t in someThreads:
            t.start()
        for t in someThreads:
            t.join()
        someThreads = []
        someThreads.append(threading.Thread(target=myfunc, args=(allElts[k], other)))
    else:
        someThreads.append(threading.Thread(target=myfunc, args=(allElts[k], other)))
    k += 1
This is supposed to create 8 threads maximum and execute them.
However, the result from this piece of code is different from the one I got before and clearly wrong. What is wrong with it?

The threads are not started until k % 8 == 1; at that point a new thread is also appended to the freshly emptied someThreads, but that thread itself is not started.
That means that at the end of the loop there will be at least one thread left in someThreads that never gets started with a call to t.start().
Instead, use a multiprocessing ThreadPool:
import multiprocessing as mp
import multiprocessing.pool as mpool

pool = mpool.ThreadPool(8)

for elt in allElts:
    pool.apply_async(myfunc, args=(elt, other))

pool.close()
pool.join()
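If you would rather stay within the standard library, concurrent.futures offers the same idea out of the box. A minimal sketch, reusing myfunc, allElts and other from the question, so treat it as an illustration rather than a drop-in:

from concurrent.futures import ThreadPoolExecutor

# Cap concurrency at 8 worker threads; myfunc, allElts and other are
# assumed to be defined as in the question.
with ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(myfunc, elt, other) for elt in allElts]
# Leaving the with-block waits for all submitted tasks to finish.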

Related

Why is my multithreading program only actually using a single thread?

I have realized that my multithreading program isn't doing what I think it's doing. The following is an MWE of my strategy. In essence, I'm creating nThreads threads but only actually using one of them. Could somebody help me understand my mistake and how to fix it?
import threading
import queue

NPerThread = 100
nThreads = 4

def worker(q: queue.Queue, oq: queue.Queue):
    while True:
        l = []
        threadIData = q.get(block=True)
        for i in range(threadIData["N"]):
            l.append(f"hello {i} from thread {threading.current_thread().name}")
        oq.put(l)
        q.task_done()

threadData = [{} for i in range(nThreads)]
inputQ = queue.Queue()
outputQ = queue.Queue()

for threadI in range(nThreads):
    threadData[threadI]["thread"] = threading.Thread(
        target=worker, args=(inputQ, outputQ),
        name=f"WorkerThread{threadI}"
    )
    threadData[threadI]["N"] = NPerThread
    threadData[threadI]["thread"].setDaemon(True)
    threadData[threadI]["thread"].start()

for threadI in range(nThreads):
    # start and end are in units of 8 bytes.
    inputQ.put(threadData[threadI])

inputQ.join()

outData = [None] * nThreads
count = 0
while not outputQ.empty():
    outData[count] = outputQ.get()
    count += 1

for i in outData:
    assert len(i) == NPerThread
    print(len(i))

print(outData)
Edit: I only actually realised that I had made this mistake after profiling.
In your sample program, the worker function is just executing so fast that the same thread is able to dequeue every item. If you add a time.sleep(1) call to it, you'll see other threads pick up some of the work.
However, it is important to understand if threads are the right choice for your real application, which presumably is doing actual work in the worker threads. As @jrbergen pointed out, because of the GIL, only one thread can execute Python bytecode at a time, so if your worker functions are executing CPU-bound Python code (meaning not doing blocking I/O or calling a library that releases the GIL), you're not going to get a performance benefit from threads. You'd need to use processes instead in that case.
I'll also note that you may want to use concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.ThreadPool for an out-of-the-box thread pool implementation, rather than creating your own.
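To illustrate that last point, here is a minimal sketch of the same workload built on concurrent.futures.ThreadPoolExecutor; the sleep is only there so each task is slow enough for the work to spread across threads:

from concurrent.futures import ThreadPoolExecutor
import threading
import time

NPerThread = 100
nThreads = 4

def work(n):
    time.sleep(0.1)  # simulate slow work so several threads get used
    return [f"hello {i} from thread {threading.current_thread().name}"
            for i in range(n)]

with ThreadPoolExecutor(max_workers=nThreads) as executor:
    outData = list(executor.map(work, [NPerThread] * nThreads))

for chunk in outData:
    assert len(chunk) == NPerThread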

Periodically restart Python multiprocessing pool

I have a Python multiprocessing pool doing a very long job that, even after thorough debugging, is not robust enough to avoid failing every 24 hours or so, because it depends on many third-party, non-Python tools with complex interactions. Also, the underlying machine has certain problems that I cannot control. Note that by failing I don't mean the whole program crashing, but some or most of the processes becoming idle because of some errors, and the app itself either hanging or continuing the job with just the processes that haven't failed.
My solution right now is to periodically kill the job manually and then restart it from where it was.
Even if it's not ideal, what I want to do now is the following: restart the multiprocessing pool periodically, programmatically, from the Python code itself. I don't really care if this implies killing the pool workers in the middle of their job. What would be the best way to do that?
My code looks like:
with Pool() as p:
    for _ in p.imap_unordered(function, data):
        save_checkpoint()
        log()
What I have in mind would be something like:
start = 0
end = 1000  # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    with Pool() as p:
        for _ in p.imap_unordered(function, current_data):
            save_checkpoint()
            log()
    start += 1
    end += 1
Or:
start = 0
end = 1000  # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    start_timeout(time=TIMEOUT)  # what would be the best way to do that without breaking multiprocessing?
    try:
        with Pool() as p:
            for _ in p.imap_unordered(function, current_data):
                save_checkpoint()
                log()
        start += 1
        end += 1
    except Timeout:
        pass
Or any suggestion you think would be better. Any help would be much appreciated, thanks!
The problem with your current code is that it iterates the multiprocessed results directly, and that call will block. Fortunately there's an easy solution: use apply_async exactly as suggested in the docs. But because of how you describe the use-case here and the failure, I've adapted it somewhat. Firstly, a mock task:
from multiprocessing import Pool, TimeoutError, cpu_count
from time import sleep
from random import randint

def log():
    print("logging is a dangerous activity: wear a hard hat.")

def work(d):
    sleep(randint(1, 100) / 100)
    print("finished working")
    if randint(1, 10) == 1:
        print("blocking...")
        while True:
            sleep(0.1)
    return d
This work function will fail with a probability of 0.1, blocking indefinitely. We create the tasks:
data = list(range(100))
nproc = cpu_count()
And then generate futures for all of them:
while data:
    print(f"== Processing {len(data)} items. ==")
    with Pool(nproc) as p:
        tasks = [p.apply_async(work, (d,)) for d in data]
Then we can try to get the tasks out manually:
        failed = []  # collects tasks that time out (needs to be initialised here)
        for task in tasks:
            try:
                res = task.get(timeout=1)
                data.remove(res)
                log()
            except TimeoutError:
                failed.append(task)
                if len(failed) < nproc:
                    print(
                        f"{len(failed)} processes are blocked,"
                        f" but {nproc - len(failed)} remain."
                    )
                else:
                    break
The controlling timeout here is the timeout to .get. It should be as long as you expect the longest process to take. Note that we detect when the whole pool is tied up and give up.
But since in the scenario you describe some threads are going to take longer than others, we can give 'failed' processes some time to recover. Thus every time a task fails we quickly check if the others have in fact succeeded:
        for task in list(failed):  # iterate over a copy, since failed is modified in the loop
            try:
                res = task.get(timeout=0.01)
                data.remove(res)
                failed.remove(task)
                log()
            except TimeoutError:
                continue
Whether this is a good addition in your case depends on whether your tasks really are as flaky as I'm guessing they are.
Exiting the context manager for the pool will terminate the pool, so we don't even need to handle that ourselves. If you have significant variation you might want to increase the pool size (thus increasing the number of tasks which are allowed to stall) or allow tasks a grace period before considering them 'failed'.
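For reference, here is one way the fragments above could be assembled into a single loop; this is only a sketch, reusing the imports and the mock work() and log() defined earlier:

data = list(range(100))
nproc = cpu_count()

while data:
    print(f"== Processing {len(data)} items. ==")
    with Pool(nproc) as p:
        tasks = [p.apply_async(work, (d,)) for d in data]
        failed = []
        for task in tasks:
            try:
                res = task.get(timeout=1)
                data.remove(res)
                log()
            except TimeoutError:
                failed.append(task)
                if len(failed) >= nproc:
                    break  # every worker is tied up; give up on this pool
        # Quick second chance for the stalled tasks before the pool is torn down.
        for task in list(failed):
            try:
                data.remove(task.get(timeout=0.01))
            except TimeoutError:
                pass
    # Exiting the with-block terminates the pool, blocked workers included,
    # and the outer loop starts a fresh pool for whatever is left in data.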

Timing a multiprocessing script

I've stumbled across a weird timing issue while using the multiprocessing module.
Consider the following scenario. I have functions like this:
import multiprocessing as mp

def workerfunc(x):
    # timehook 3
    # something with x
    # timehook 4

def outer():
    # do something
    mygen = ...  # (some generator expression)
    pool = mp.Pool(processes=8)

    # time hook 1
    result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
    # time hook 2

if __name__ == '__main__':
    outer()
I am using the time module to get a rough sense of how long my functions run. I successfully create 8 separate processes, which terminate without error. The longest time for a worker to finish is about 130 ms (measured between timehook 3 and 4).
I expected (as they are running in parallel) that the time between hook 1 and 2 would be approximately the same. Surprisingly, I get 600 ms as a result.
My machine has 32 cores and should be able to handle this easily. Can anybody give me a hint where this difference in time comes from?
Thanks!
You are using pool.apply which is blocking. Use pool.apply_async instead and then the function calls will all run in parallel, and each will return an AsyncResult object immediately. You can use this object to check when the processes are done and then retrieve the results using this object also.
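For example, a minimal sketch of that change, keeping the workerfunc and mygen placeholders from the question:

# Inside outer(), replacing the pool.apply list comprehension:
async_results = [pool.apply_async(workerfunc, args=(x,)) for x in mygen]
result = [ar.get() for ar in async_results]  # collect results after all tasks have been dispatched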
Since you are using multiprocessing and not multithreading, your performance issue is not related to the GIL (Python's Global Interpreter Lock).
I've found an interesting link explaining this with an example; you can find it at the bottom of this answer.
The GIL does not prevent a process from running on a different processor of a machine. It simply only allows one thread to run at once within the interpreter.
So multiprocessing, not multithreading, will allow you to achieve true concurrency.
Let's understand this all through some benchmarking, because only that will lead you to believe what is said above. And yes, that should be the way to learn: experience it rather than just read it or understand it. Because if you have experienced something, no amount of argument can convince you of the opposing thoughts.
import random
from threading import Thread
from multiprocessing import Process

size = 10000000  # Number of random numbers to add to list
threads = 2      # Number of threads to create
my_list = []
for i in xrange(0, threads):
    my_list.append([])

def func(count, mylist):
    for i in range(count):
        mylist.append(random.random())

def multithreaded():
    jobs = []
    for i in xrange(0, threads):
        thread = Thread(target=func, args=(size, my_list[i]))
        jobs.append(thread)
    # Start the threads
    for j in jobs:
        j.start()
    # Ensure all of the threads have finished
    for j in jobs:
        j.join()

def simple():
    for i in xrange(0, threads):
        func(size, my_list[i])

def multiprocessed():
    processes = []
    for i in xrange(0, threads):
        p = Process(target=func, args=(size, my_list[i]))
        processes.append(p)
    # Start the processes
    for p in processes:
        p.start()
    # Ensure all processes have finished execution
    for p in processes:
        p.join()

if __name__ == "__main__":
    multithreaded()
    # simple()
    # multiprocessed()
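To actually compare the variants, you could wrap each call in a simple timer, for example:

import time

start = time.time()
multithreaded()  # swap in simple() or multiprocessed() to compare
print("multithreaded() took %.2f seconds" % (time.time() - start))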
Additional information
Here you can find the source of this information and a more detailed technical explanation (bonus: there's also Guido Van Rossum quotes in it :) )

Creating Threads within a Thread in Python

I am using the threading library and want to have one thread that will call several threads. The background to this program is that I have a camera which captures images and makes them available in a class on a TCP SocketServer.
Thus I need one thread that runs the camera capturing and a second thread that runs the TCPServer, but within this thread there are several threads for each incoming connection.
This last thread means I need a thread that can create threads on its own. Unfortunately this did not work.
I managed to break down the immense code into a small snippet which represents the problem:
import threading

def adder(x, res, i):
    res[i] = res[i] + x*i;

def creator(a, threads, results):
    results = []
    for i in range(0, a):
        results.append(0)
        threads.append(threading.Thread(target=adder, args=(a, results, i)))
        threads[i].start()
    for i in range(0, len(threads)):
        threads[i].join()
    return results;

threads = [];
results = [];

mainThread = threading.Thread(target=creator, args=([5, threads, results]))
mainThread.start()
mainThread.join()

for i in range(0, len(results)):
    print results[i]
    print threads[i]
In the function creator, which is called as a thread, several threads should be created with the function adder.
However, the results are empty. Why is that so?
This is the same problem that occurs in my larger program.
You got close! :-)
The problem in the latest version of the code is that, while the global results is passed to creator(), creator() never uses it: it creates its own local results list. Of course modifying the latter has no effect on the global results, so that one remains empty. So here's a variation to repair that, but also with minor local changes to make the code more "Pythonic":
import threading

def adder(x, res, i):
    res[i] += x*i

def creator(a, threads, results):
    for i in range(a):
        results.append(0)
        t = threading.Thread(target=adder, args=(a, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

threads = []
results = []

mainThread = threading.Thread(target=creator, args=(5, threads, results))
mainThread.start()
mainThread.join()

for i in range(len(results)):
    print results[i]
    print threads[i]

Multiprocessing using maximum CPU power in Python-3.x

I'm working on the human genome, which consists of 3.2 billion characters, and I have a list of objects which need to be searched within this data. Something like this:
result_final = []
objects = ['obj1', 'obj2', 'obj3', ...]

def function(obj):
    result_1 = search_in_genome(obj)
    return(result_1)

for item in objects:
    result_2 = function(item)
    result_final.append(result_2)
Each object's search within the data takes nearly 30 seconds, and I have a few thousand objects. I noticed that while doing this serially, just 7% of the CPU and 5% of the RAM is being used. From what I've read, to reduce the computation time I should do parallel computation using queuing, threading or multiprocessing, but these seem complicated for non-experts. Could anybody help me write Python code that runs 10 simultaneous searches, and is it possible to make Python use the maximum available CPU and RAM for multiprocessing? (I'm using Python 3.3 on Windows 7 with 64 GB RAM and a 3.5 GHz Core i7 CPU.)
You can use the multiprocessing module for this:
from multiprocessing import Pool

objects = ['obj1', 'obj2', 'obj3', ...]

def function(obj):
    result_1 = search_in_genome(obj)
    return(result_1)

if __name__ == "__main__":
    pool = Pool()
    result_final = pool.map(function, objects)
This will allow you to scale the work across all available CPUs on your machine, because processes aren't affected by the GIL. You wouldn't want to run too many more tasks than there are CPUs available. Once you do that, you actually start slowing things down, because then the CPUs have to constantly switch between processes, which has a performance penalty.
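If you specifically want 10 simultaneous searches, as asked, you can pass the number of worker processes to the Pool constructor:

if __name__ == "__main__":
    pool = Pool(processes=10)  # at most 10 searches run at the same time
    result_final = pool.map(function, objects)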
OK, I'm not sure of your question, but I would do this (note that there may be a better solution, because I'm not an expert with the Queue object):
If you want to multithread your searches:
import threading

class myThread(threading.Thread):
    def __init__(self, obj):
        threading.Thread.__init__(self)
        self.result = None
        self.obj = obj

    # Function that is called when you start your Thread
    def run(self):
        # Execute your function here
        self.result = search_in_genome(self.obj)

if __name__ == '__main__':
    result_final = []
    objects = ['obj1', 'obj2', 'obj3', ...]
    # List of Threads
    listThread = []
    # Count the number of potential threads
    allThread = len(objects)
    allThreadDone = 0

    for item in objects:
        # Create one thread
        thread = myThread(item)
        # Launch that Thread
        thread.start()
        # Stock it into the list
        listThread.append(thread)

    while True:
        # Count the number of Threads that are finished
        for thread in listThread:
            if thread.result != None:
                # If a Thread is finished, count it
                allThreadDone += 1
        # If all threads are finished, then stop the program
        if allThreadDone == allThread:
            break
        # Else reset the flag and count again
        else:
            allThreadDone = 0
If someone can check and validate this code, that would be better. (Sorry for my English, by the way.)
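As a cross-check on that approach, a simpler variant (a sketch, assuming the same myThread class and search_in_genome) is to join() each thread instead of busy-polling the result attribute:

threads = [myThread(item) for item in objects]
for t in threads:
    t.start()
for t in threads:
    t.join()  # blocks until this thread has finished
result_final = [t.result for t in threads]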
