I am using the threading library and want to have one thread that spawns several other threads. The background: I have a camera that captures images and makes them available through a class on a TCP socket server.
I therefore need one thread that runs the camera capture and a second thread that runs the TCP server; within that server thread, a separate thread is created for each incoming connection.
In other words, I need a thread that can create threads on its own. Unfortunately, this did not work.
I managed to reduce the large program to a small snippet that reproduces the problem:
import threading

def adder(x, res, i):
    res[i] = res[i] + x*i

def creator(a, threads, results):
    results = []
    for i in range(0, a):
        results.append(0)
        threads.append(threading.Thread(target=adder, args=(a, results, i)))
        threads[i].start()
    for i in range(0, len(threads)):
        threads[i].join()
    return results

threads = []
results = []

mainThread = threading.Thread(target=creator, args=([5, threads, results]))
mainThread.start()
mainThread.join()

for i in range(0, len(results)):
    print(results[i])
    print(threads[i])
In the function creator, which runs as a thread, several threads should be created with the function adder.
However, results ends up empty. Why is that?
This is the same problem that occurs in my larger program.
You got close! :-)
The problem in the latest version of the code is that, while the global results is passed to creator(), creator() never uses it: it creates its own local results list. Of course modifying the latter has no effect on the global results, so that one remains empty. So here's a variation to repair that, but also with minor local changes to make the code more "Pythonic":
import threading

def adder(x, res, i):
    res[i] += x*i

def creator(a, threads, results):
    for i in range(a):
        results.append(0)
        t = threading.Thread(target=adder, args=(a, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

threads = []
results = []

mainThread = threading.Thread(target=creator, args=(5, threads, results))
mainThread.start()
mainThread.join()

for i in range(len(results)):
    print(results[i])
    print(threads[i])
I have realized that my multithreading program isn't doing what I think it's doing. The following is an MWE of my strategy. In essence, I'm creating nThreads threads but only actually using one of them. Could somebody help me understand my mistake and how to fix it?
import threading
import queue

NPerThread = 100
nThreads = 4

def worker(q: queue.Queue, oq: queue.Queue):
    while True:
        l = []
        threadIData = q.get(block=True)
        for i in range(threadIData["N"]):
            l.append(f"hello {i} from thread {threading.current_thread().name}")
        oq.put(l)
        q.task_done()

threadData = [{} for i in range(nThreads)]
inputQ = queue.Queue()
outputQ = queue.Queue()

for threadI in range(nThreads):
    threadData[threadI]["thread"] = threading.Thread(
        target=worker, args=(inputQ, outputQ),
        name=f"WorkerThread{threadI}"
    )
    threadData[threadI]["N"] = NPerThread
    threadData[threadI]["thread"].daemon = True
    threadData[threadI]["thread"].start()

for threadI in range(nThreads):
    inputQ.put(threadData[threadI])

inputQ.join()

outData = [None] * nThreads
count = 0
while not outputQ.empty():
    outData[count] = outputQ.get()
    count += 1

for i in outData:
    assert len(i) == NPerThread
    print(len(i))
print(outData)
Edit
I only realised that I had made this mistake after profiling. Here's the output, for information:
In your sample program, the worker function is just executing so fast that the same thread is able to dequeue every item. If you add a time.sleep(1) call to it, you'll see other threads pick up some of the work.
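For instance, a minimal tweak to the worker from the question (the one-second pause is arbitrary; it just simulates work slow enough that one thread cannot drain the queue alone):
import time

def worker(q: queue.Queue, oq: queue.Queue):
    while True:
        threadIData = q.get(block=True)
        time.sleep(1)  # simulate slow work so the other threads get a chance to dequeue
        l = [f"hello {i} from thread {threading.current_thread().name}"
             for i in range(threadIData["N"])]
        oq.put(l)
        q.task_done()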
However, it is important to understand whether threads are the right choice for your real application, which presumably is doing actual work in the worker threads. As @jrbergen pointed out, because of the GIL, only one thread can execute Python bytecode at a time, so if your worker functions are executing CPU-bound Python code (i.e., not doing blocking I/O or calling a library that releases the GIL), you're not going to get a performance benefit from threads. You'd need to use processes instead in that case.
I'll also note that you may want to use concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.ThreadPool for an out-of-the-box thread pool implementation, rather than creating your own.
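A sketch of what that could look like here, assuming the greeting-list work from the question (produce_greetings is a made-up name; swapping ThreadPoolExecutor for ProcessPoolExecutor gives you processes if the real work turns out to be CPU-bound):
import threading
from concurrent.futures import ThreadPoolExecutor

NPerThread = 100
nThreads = 4

def produce_greetings(n):
    # stand-in for the real per-task work from the question
    return [f"hello {i} from thread {threading.current_thread().name}"
            for i in range(n)]

with ThreadPoolExecutor(max_workers=nThreads) as pool:
    outData = list(pool.map(produce_greetings, [NPerThread] * nThreads))

for chunk in outData:
    assert len(chunk) == NPerThread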
On several occasions, I have a list of tasks that need to be executed via Python. Typically these tasks take a few seconds each, but there are hundreds of thousands of them, and threading significantly improves execution time. Is there a way to dynamically specify the number of threads a Python script should use to work through a stack of tasks?
I have had success running threads in the body of a Python script, but I have never been able to run threads correctly from within a function (I assume this is because of scoping). Below is my approach to dynamically defining a list of threads to execute several tasks.
The problem is that this approach waits for a single thread to complete before continuing through the for loop.
import threading
import sys
import time

def null_thread():
    """ used to instantiate threads """
    pass

def instantiate_threads(number_of_threads):
    """ returns a list containing the number of threads specified """
    threads_str = []
    threads = []
    index = 0
    while index < number_of_threads:
        exec("threads_str.append(f't{index}')")
        index += 1
    for t in threads_str:
        t = threading.Thread(target=null_thread())
        t.start()
        threads.append(t)
    return threads

def sample_task():
    """ dummy task """
    print("task start")
    time.sleep(10)

def main():
    number_of_threads = int(sys.argv[1])
    threads = instantiate_threads(number_of_threads)
    # a routine that assigns work to the array of threads
    index = 0
    while index < 100:
        task_assigned = False
        while not task_assigned:
            for thread in threads:
                if not thread.is_alive():
                    thread = threading.Thread(target=sample_task())
                    thread.start()
                    # script seems to wait until thread is complete before moving on...
                    print(f'index: {index}')
                    task_assigned = True
        index += 1
    # wait for threads to finish before terminating
    for thread in threads:
        while thread.is_alive():
            pass

if __name__ == '__main__':
    main()
Solved:
You could convert to using concurrent.futures.ThreadPoolExecutor, where you can set the number of threads to spawn using max_workers=<number of threads>. – user56700
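A minimal sketch of that suggestion, adapting sample_task from the question to take an index so the executor can feed it; the executor caps concurrency at max_workers, which replaces the hand-rolled thread recycling:
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def sample_task(index):
    """ dummy task """
    print(f"task start, index: {index}")
    time.sleep(10)

def main():
    number_of_threads = int(sys.argv[1])
    # at most number_of_threads tasks run at once; the with-block waits for all of them
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        pool.map(sample_task, range(100))

if __name__ == '__main__':
    main()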
I want to do threading in Python. I have 100 words and want to feed them to 6 different links. As soon as one of the links is done, it should get the next word, while the other threads are still working on their first word. The rest of my program should only continue once all 100 keywords are done. I have the following code:
threads = []

def getresults(seed):
    for link in links:
        t = threading.Thread(target=getLinkResult, args=(suggestengine, seed))
        threads.append(t)
    for thread in threads:
        thread.start()

for seed in tqdm:
    getresults(seed + a)
    getresults(seed + b)

for thread in threads:
    thread.join()
#code that should happen after
I get an error at the moment:
threads can only be started once
You are calling getresults twice, and both calls reference the same global threads list. This means that the first call to getresults starts the threads.
When you call it the second time, the previous threads, which are already running, have their .start() method invoked again.
You should create the threads locally inside getresults, start only those, and then append them to the global threads list.
Although you can do the following:
for thread in threads:
    if not thread.is_alive():
        thread.start()
it does not solve the problem: one or more threads might have already finished, so is_alive() would be False for them and they would be started again, causing the same error.
You should start only the new threads in your getresults:
threads = []

def getresults(seed):
    local_threads = []
    for link in links:
        t = threading.Thread(target=getLinkResult, args=(suggestengine, seed))
        local_threads.append(t)
        threads.append(t)
    for thread in local_threads:
        thread.start()

for seed in tqdm:
    getresults(seed + a)
    getresults(seed + b)

for thread in threads:
    thread.join()
The fastest way, though not the brightest (it works around the general problem):
from tkinter import *
import threading, time

def execute_script():
    def sub_execute():
        print("Wait 5 seconds")
        time.sleep(5)
        print("5 seconds passed by")
    threading.Thread(target=sub_execute).start()

root = Tk()
button_1 = Button(master=root, text="Execute Script", command=execute_script)
button_1.pack()
root.mainloop()
The error is explicit: you start your threads twice, which you shouldn't.
getresults(seed + a)
getresults(seed + b)
Sequencing these two calls starts the same set of threads twice. To properly do what you want, you need to make a thread pool and a task queue. Basically, you need a shared list of words to process and a mutex: each thread locks the mutex, dequeues a word, unlocks, and then processes the word, as in the sketch below.
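A sketch of that idea, reusing getLinkResult, suggestengine, and the six links from the question; the explicit lock plays the role of the mutex described above (Python's queue.Queue would do this locking for you):
import threading

words = [...]  # the 100 keywords
lock = threading.Lock()

def worker():
    while True:
        with lock:
            if not words:
                return          # no work left, let this thread exit
            seed = words.pop()  # dequeue one word under the mutex
        getLinkResult(suggestengine, seed)  # process outside the lock

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
#code that should happen after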
I have runner code that is supposed to start 5 threads; however, it only starts 1 thread (I know this because it doesn't loop). Take a look at the code:
import Handle
import threading
h = Handle.Handle()
h.StartConnection()
for i in range(0, 5):
    print("Looped")
    t = threading.Thread(target=h.Spawn())
    t.start()
It only prints "Looped" once and only runs "Spawn" once as well. Any ideas?
The issues I noticed:
1. You are replacing the t variable on each loop iteration, so in the end only one thread is assigned to it.
2. Does the Spawn function return a function? If it does, then it's okay; otherwise you should pass Spawn itself to target, not call Spawn().
3. If Spawn is long-running in nature (I assume it is), the call to it will block the loop and wait until it returns. This is why your loop prints "Looped" once and Spawn gets called just once too.
My suggestion would be like this:
import Handle
import threading

h = Handle.Handle()
h.StartConnection()

threads = []
for i in range(0, 5):
    print("Looped")
    t = threading.Thread(target=h.Spawn)
    threads.append(t)
    t.start()
I used a list, threads, to store the threads, appending each one to it before calling start. Now I can iterate over the threads list any time I want (maybe to join them?).
Also, since I assumed Spawn is a long-running function, I passed it (not its result) as the target to the Thread constructor, so it runs in the background when we call start on the thread. Now it no longer blocks the loop.
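For example, to wait for all five Spawn threads before the program exits:
# wait for every worker thread to finish
for t in threads:
    t.join()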
You are not running threads; you run the Spawn method right in the main thread. target needs to be a callable, not the result of calling it:
t = threading.Thread(target=h.Spawn)
Try this code:
import Handle
import threading

h = Handle.Handle()
h.StartConnection()

for i in range(0, 5):
    print("Looped")
    threading.Timer(5.0, h.Spawn).start()
I have been parallelizing code that calls myfunc with threading.Thread, as follows:
import subprocess
import threading

def myfunc(elt, other):
    subprocess.call("A matlab script that takes a while to execute")

allThreads = []
for elt in allElts:
    allThreads.append(threading.Thread(target=myfunc, args=(elt, other)))
for t in allThreads:
    t.start()
for t in allThreads:
    t.join()
Due to the large amount of data, I faced a memory issue: some of my subprocess.call invocations could not allocate memory. To avoid this, I tried to limit the number of threads executing simultaneously to 8. I changed the code above to the following:
someThreads = []
k = 0
for k in range(len(allElts)):
    if k % 8 == 1:
        for t in someThreads:
            t.start()
        for t in someThreads:
            t.join()
        someThreads = []
        someThreads.append(threading.Thread(target=myfunc, args=(allElts[k], other)))
    else:
        someThreads.append(threading.Thread(target=myfunc, args=(allElts[k], other)))
    k += 1
This is supposed to create at most 8 threads at a time and execute them.
However, the result from this piece of code is different from the one I got before, and it is clearly wrong. What is wrong with it?
The threads are not started until k % 8 == 1; at that point a new thread is also added to the fresh someThreads list, but it is not started.
That means that at the end of the loop there will be at least one thread in someThreads that never gets started with a call to t.start().
Instead, use a multiprocessing ThreadPool:
import multiprocessing.pool as mpool

pool = mpool.ThreadPool(8)
for elt in allElts:
    pool.apply_async(myfunc, args=(elt, other))
pool.close()
pool.join()
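A variant of the same pool, if you also want myfunc's return values or to surface exceptions it raised:
import multiprocessing.pool as mpool

pool = mpool.ThreadPool(8)
asyncResults = [pool.apply_async(myfunc, args=(elt, other)) for elt in allElts]
pool.close()
pool.join()
# .get() returns myfunc's return value and re-raises any exception it hit
results = [r.get() for r in asyncResults]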