Python: threads can only be started once - python

I want to use threading in Python. I have 100 words and want to send them to 6 different links. As soon as one of the links is done, it should get the next word, while the other threads are still working on their current word. The rest of my program should run only once all 100 keywords are done. I have the following code:
threads = []

def getresults(seed):
    for link in links:
        t = threading.Thread(target=getLinkResult, args = (suggestengine, seed))
        threads.append(t)
    for thread in threads:
        thread.start()

for seed in tqdm:
    getresults(seed + a)
    getresults(seed + b)

for thread in threads:
    thread.join()
#code that should happen after
I get an error at the moment:
threads can only be started once

You are calling getresults twice, and both calls use the same global threads list. The first call to getresults creates the threads and starts them.
The second call appends new threads to the same list and then calls .start() on every thread in it, including the ones that are already running.
You should keep the new threads in a list local to getresults, start only those, and also append them to the global threads list so they can be joined later.
Although you can do the following:
for thread in threads:
    if not thread.is_alive():
        thread.start()
it does not solve the problem: a thread that has already finished is no longer alive, so it would be started again and cause the same error.
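A minimal sketch of why the is_alive() check is not enough. The `work` function is a placeholder and not part of the original code; the point is only that a finished Thread object reports is_alive() as False, yet a second start() still raises.

```python
import threading

def work():
    pass  # placeholder task that finishes immediately

t = threading.Thread(target=work)
t.start()
t.join()  # the thread has finished by now

# A finished thread is not alive, so an is_alive() check would try to restart it
finished_and_not_alive = not t.is_alive()

try:
    t.start()  # a Thread object can never be started a second time
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(finished_and_not_alive, error_message)
```

The only way to run the same target again is to create a new Thread object for it.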

You should start only the new threads in your getresults:
threads = []

def getresults(seed):
    local_threads = []
    for link in links:
        t = threading.Thread(target=getLinkResult, args = (suggestengine, seed))
        local_threads.append(t)
        threads.append(t)
    for thread in local_threads:
        thread.start()

for seed in tqdm:
    getresults(seed + a)
    getresults(seed + b)

for thread in threads:
    thread.join()

Fastest way, but not the brightest (general problem):
from tkinter import *
import threading, time

def execute_script():
    def sub_execute():
        print("Wait 5 seconds")
        time.sleep(5)
        print("5 seconds passed by")
    # A new Thread object is created on every click, so none is started twice
    threading.Thread(target=sub_execute).start()

root = Tk()
button_1 = Button(master=root, text="Execute Script", command=execute_script)
button_1.pack()
root.mainloop()

The error is explicit. You start your threads twice, while you shouldn't.
getresults(seed + a)
getresults(seed + b)
When you make these calls one after the other, the loop that starts every thread in the list runs twice. To properly do what you want, you need to make a thread pool and a task queue. Basically, you need a shared list of words to process and a mutex. Each thread locks the mutex, reads and dequeues a word, unlocks, and then processes the word, repeating until no words are left.
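The pool-and-queue idea can be sketched as follows. This is only an illustration under assumptions: `process` is a hypothetical stand-in for getLinkResult, `NUM_WORKERS` and `run_all` are made-up names, and queue.Queue is used in place of a hand-written list plus mutex because it does the locking internally.

```python
import queue
import threading

NUM_WORKERS = 6  # assumption: one worker per link, as in the question

def process(word):
    # Hypothetical stand-in for getLinkResult(); replace with the real work
    return word.upper()

def worker(tasks, results):
    while True:
        try:
            word = tasks.get_nowait()  # Queue handles the locking itself
        except queue.Empty:
            return  # no words left, so this worker is done
        results.put(process(word))

def run_all(words):
    tasks = queue.Queue()
    results = queue.Queue()
    for word in words:
        tasks.put(word)
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()  # each Thread object is started exactly once
    for w in workers:
        w.join()   # code after this point runs only when every word is done
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)

print(run_all(["alpha", "beta", "gamma"]))
```

Each worker picks up the next word as soon as it finishes its current one, so no thread ever has to be restarted.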

Related

Starting thread after thread is finished

Let's say I want to run 10 threads at the same time and, as soon as one finishes, immediately start a new one. How can I do that?
I know I can wait for a thread to finish with thread.join(), but then all 10 threads need to finish. I want to start a new one immediately after any one of them finishes.
As I understand it, you need at most 10 threads executing at the same time.
I suggest you use threading.BoundedSemaphore().
Sample code using it is given below:
import threading
from typing import List

# 10 is passed because the requirement is that at most 10 threads execute at once
sema4 = threading.BoundedSemaphore(10)

def do_something():
    # The semaphore must actually be acquired, otherwise it limits nothing
    with sema4:
        print("I hope this cleared your doubt :)")

# Keep the threads so they can be joined later
threads_list: List[threading.Thread] = []
for i in range(100):
    thread = threading.Thread(target=do_something)
    threads_list.append(thread)  # saving thread in order to join it later
    thread.start()  # starting the thread
for thread in threads_list:
    thread.join()  # wait for every thread before the main code continues
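An alternative worth considering, swapped in here rather than taken from the answer above, is concurrent.futures.ThreadPoolExecutor from the standard library. With max_workers=10 it keeps at most 10 worker threads busy and hands a new task to a worker as soon as it becomes free. The task body below is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def do_something(i):
    # Placeholder task; replace with the real work
    return i * i

# At most 10 tasks run at the same time; a free worker picks up the next one
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(do_something, range(100)))

print(len(results))
```

Leaving the with block waits for all tasks, so no explicit join() calls are needed.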

Python multi-threading performance issue related to start()

I had some performance issues with a multi-threading code to parallelize multiple telnet probes.
Slow
My first implementation was really slow, the same as if the tasks were run sequentially:
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
for thread in threads:
    thread.start()
    thread.join()
Blazingly Fast
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
    thread.start() # <----- moved this
for thread in threads:
    thread.join()
Question
I don't get why moving the start() call changes the performance so much.
In your first implementation you are actually running the code sequentially, because calling join() immediately after start() blocks the main thread until the started thread has finished.
In your first implementation, thread.join() blocks the main thread right after each thread is started, so the next thread cannot start until the previous one is done.
According to threading.Thread.join() documentation:
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception -- or until the optional timeout occurs.
In your slow example you start the thread and wait till it is complete, then you iterate to the next thread.
Example
from threading import Thread
from time import sleep

def foo(a, b):
    while True:
        print(a + ' ' + b)
        sleep(1)

ths = []
for i in range(3):
    th = Thread(target=foo, args=('hi', str(i)))
    ths.append(th)
for th in ths:
    th.start()
    th.join()  # never returns for the first thread, so the others never start
Produces
hi 0
hi 0
hi 0
hi 0
In your slow solution you are basically not using multithreading at all. It runs a thread, waits for it to finish, and then runs the next one. There is no difference between this and running everything in one thread: you are running them in series.
The second one, on the other hand, starts all threads and then joins them. This limits the total execution time to roughly that of the longest single thread: you are running them in parallel.
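The difference can be measured with a small sketch. It assumes an I/O-like task simulated with time.sleep; the function names and the 0.2 second delay are illustrative and not taken from the original code.

```python
import threading
import time

def task():
    time.sleep(0.2)  # stands in for a blocking call such as a telnet probe

def run_start_then_join_each(n):
    begin = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=task)
        t.start()
        t.join()  # blocks here, so the next thread starts only afterwards
    return time.perf_counter() - begin

def run_start_all_then_join(n):
    begin = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(n)]
    for t in threads:
        t.start()  # all threads are running before any join
    for t in threads:
        t.join()
    return time.perf_counter() - begin

slow = run_start_then_join_each(3)
fast = run_start_all_then_join(3)
print(f"join inside the loop: {slow:.2f}s, join after all starts: {fast:.2f}s")
```

With three tasks, the first variant takes about three times the task duration, while the second takes about one task duration.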

Running methods in the same time

At first I gave this a very general title. I know about threads and processes, but I don't know which of the two is better for my case.
OK, so here is the code:
class Proces(object):
    [...]
    def Obsluz(self):
        proces = LRU(self.sekwencja, int(self.przydzielone_ramki))
        proces.Symulacja("T")
        #.thread.join()
    def Threads(self):
        thread = Thread(target = self.Obsluz)
        thread.start()
        thread.join()
and running that code:
for lru in self.lru_procesy:
    lru.Watek()
What I want to achieve is to run the method Obsluz several times at the same time with different parameters (taken from the attributes of each Proces object). The number of Proces objects is random; it can be 10, 20, 30, etc.
My code above does not run the way I want, because the threads finish one by one (because of .join()). Is it possible to run them at the same time?
Thank you!
You are just starting one worker and immediately waiting for it to finish.
To spawn several worker threads and wait for them all to finish, use something like this:
workers = []
for wid in range(nworkers):
    w = Thread(target = dowork, args = ...)
    w.start()
    workers.append(w)
# join all of the workers
for w in workers:
    w.join()
print("All done!")
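Applied to a class like the one in the question, the same idea means splitting "start" and "wait" into two methods. This is a sketch with hypothetical names (`Process`, `handle`, `start`, `wait`) and a sleep standing in for the real simulation.

```python
import threading
import time

class Process:
    def __init__(self, name, delay, log):
        self.name = name
        self.delay = delay
        self.log = log
        self.thread = None

    def handle(self):
        time.sleep(self.delay)      # stand-in for the real simulation
        self.log.append(self.name)  # record that this object has finished

    def start(self):
        self.thread = threading.Thread(target=self.handle)
        self.thread.start()         # note: no join() here

    def wait(self):
        self.thread.join()

log = []
processes = [Process("A", 0.3, log), Process("B", 0.1, log), Process("C", 0.2, log)]
for p in processes:
    p.start()  # all simulations are now running at the same time
for p in processes:
    p.wait()   # only now wait for each of them
print(log)
```

Because every object is started before any of them is waited on, the total run time is close to the longest single delay rather than the sum of all delays.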

Python: Understanding Threading Module

While learning Python's threading module I ran a simple test. Interestingly, the threads run sequentially and not in parallel. Is it possible to modify this test code so that the program executes the threads the same way multiprocessing does, in parallel?
import threading

def mySlowFunc(arg):
    print("\nStarting...", arg)
    m=0
    for i in range(arg):
        m+=i
    print('\n...Finishing', arg)

myList =[35000000, 45000000, 55000000]
for each in myList:
    thread = threading.Thread(target=mySlowFunc, args=(each,) )
    thread.daemon = True
    thread.start()
    thread.join()
print("\n Happy End \n")
REVISED CODE:
This version of the code starts 6 'threads' running in 'parallel'. But even though there are 6 threads, only two of the CPU's hardware threads are actually used (the other 6 hardware threads sit idle).
import threading

def mySlowFunc(arg):
    print("\nStarting " + str(arg) + "...")
    m=0
    for i in range(arg):
        m+=i
    print('\n...Finishing ' + str(arg))

myList =[35000000, 45000000, 55000000, 25000000, 75000000, 65000000]
for each in myList:
    thread = threading.Thread(target=mySlowFunc, args=(each,) )
    thread.daemon = False
    thread.start()
print("\n Bottom of script reached \n")
From the docs for the join method:
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
Just create a list of threads and join them after launching every single one of them.
Edit:
The threads do run concurrently, but you can think of Python's threads as running on a computer with a single core. Python's threads are best suited to I/O operations (reading or writing a big file, sending data through a socket, that sort of thing). If you want CPU power, you need to use the multiprocessing module.
If python didn't have the GIL, you ought to be able to see true parallelism by changing your code to only join after you have started all threads:
threads = []
for each in myList:
    t = threading.Thread(target=mySlowFunc, args=(each,) )
    t.daemon = True
    t.start()
    threads.append(t)
for t in threads:
    t.join()
With the above code in python, you should at least be able to see interleaving: thread #2 doing some work before thread #1 has completed. But, you won't see genuine parallelism. See the GIL link for more background.

In Python threading, how I can I track a thread's completion?

I've a python program that spawns a number of threads. These threads last anywhere between 2 seconds to 30 seconds. In the main thread I want to track whenever each thread completes and print a message. If I just sequentially .join() all threads and the first thread lasts 30 seconds and others complete much sooner, I wouldn't be able to print a message sooner -- all messages will be printed after 30 seconds.
Basically I want to block until any thread completes. As soon as a thread completes, print a message about it and go back to blocking if any other threads are still alive. If all threads are done then exit program.
One way I could think of is to have a queue that is passed to all the threads and block on queue.get(). Whenever a message is received from the queue, print it, check if any other threads are alive using threading.active_count() and if so, go back to blocking on queue.get(). This would work but here all the threads need to follow the discipline of sending a message to the queue before terminating.
I'm wondering whether this is the conventional way of achieving this behavior, or whether there are other or better ways.
Here's a variation on @detly's answer that lets you specify the messages from your main thread, instead of printing them from your target functions. This creates a wrapper function which calls your target and then prints a message before terminating. You could modify this to perform any kind of standard cleanup after each thread completes.
#!/usr/bin/python
import threading
import time

def target1():
    time.sleep(0.1)
    print("target1 running")
    time.sleep(4)

def target2():
    time.sleep(0.1)
    print("target2 running")
    time.sleep(2)

def launch_thread_with_message(target, message, args=(), kwargs=None):
    # Wrap the target so the message is printed right after it returns
    def target_with_msg(*args, **kwargs):
        target(*args, **kwargs)
        print(message)
    thread = threading.Thread(target=target_with_msg, args=args, kwargs=kwargs)
    thread.start()
    return thread

if __name__ == '__main__':
    thread1 = launch_thread_with_message(target1, "finished target1")
    thread2 = launch_thread_with_message(target2, "finished target2")
    print("main: launched all threads")
    thread1.join()
    thread2.join()
    print("main: finished all threads")
The thread needs to be checked using the Thread.is_alive() call.
Why not just have the threads themselves print a completion message, or call some other completion callback when done?
You can then just join these threads from your main program, so you'll see a bunch of completion messages and your program will terminate when they're all done, as required.
Here's a quick and simple demonstration:
#!/usr/bin/python
import threading
import time

def really_simple_callback(message):
    """
    This is a really simple callback. `sys.stdout` already has a lock built-in,
    so this is fine to do.
    """
    print(message)

def threaded_target(sleeptime, callback):
    """
    Target for the threads: sleep and call back with completion message.
    """
    time.sleep(sleeptime)
    callback("%s completed!" % threading.current_thread())

if __name__ == '__main__':
    # Keep track of the threads we create
    threads = []
    # callback_when_done is effectively a function
    callback_when_done = really_simple_callback
    for idx in range(0, 10):
        threads.append(
            threading.Thread(
                target=threaded_target,
                name="Thread #%d" % idx,
                args=(10 - idx, callback_when_done)
            )
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Note that thread #0 runs for the longest and is joined first, yet the
    # other threads' messages still appear as soon as each of them finishes.
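As an alternative to hand-written callbacks (a different technique from the answer above, not a restatement of it), the standard library's concurrent.futures.as_completed yields each task as soon as it is done, which matches "block until any thread completes". The task names and delays below are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(name, seconds):
    time.sleep(seconds)  # stand-in for work of varying duration
    return name

jobs = [("slow", 0.3), ("fast", 0.1), ("medium", 0.2)]
finished_order = []

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(task, name, seconds) for name, seconds in jobs]
    # as_completed blocks until the next future finishes, whichever it is
    for future in as_completed(futures):
        name = future.result()
        print(f"{name} completed")
        finished_order.append(name)
```

The main thread prints each message as the corresponding task finishes, without polling and without the tasks having to know about the reporting.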
What I would suggest is a loop like this:
while len(threadSet) > 0:
    time.sleep(1)
    for thread in list(threadSet):  # iterate over a copy so removing is safe
        if not thread.is_alive():
            print("Thread " + thread.name + " terminated")
            threadSet.remove(thread)
There is a 1 second sleep, so there will be a slight delay between the thread termination and the message being printed. If you can live with this delay, then I think this is a simpler solution than the one you proposed in your question.
You can let the threads push their results into a queue.Queue. Have another thread wait on this queue and print the message as soon as a new item appears.
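A minimal sketch of the queue approach, assuming each worker reports its own name when it is done. Here the main thread does the waiting instead of a separate reporter thread, and all names and delays are illustrative.

```python
import queue
import threading
import time

def worker(name, seconds, done):
    try:
        time.sleep(seconds)  # stand-in for the real work
    finally:
        done.put(name)       # always report, even if the work raised

done = queue.Queue()
jobs = {"t1": 0.3, "t2": 0.1, "t3": 0.2}
threads = [threading.Thread(target=worker, args=(name, seconds, done))
           for name, seconds in jobs.items()]
for t in threads:
    t.start()

reported = []
for _ in threads:      # exactly one message is expected per thread
    name = done.get()  # blocks until any thread reports completion
    print(f"{name} finished")
    reported.append(name)

for t in threads:
    t.join()
```

Counting the expected messages avoids having to check how many threads are still alive, and the try/finally keeps the count correct when a worker fails.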
I'm not sure I see the problem with using:
threading.active_count()
to track the number of threads that are still active?
Even if you don't know how many threads you're going to launch before starting, it seems pretty easy to track. I usually generate thread collections via a list comprehension; a simple comparison of active_count() with the list size can then tell you how many have finished.
See here: http://docs.python.org/library/threading.html
Alternatively, once you have your thread objects you can just use the .is_alive() method of the thread objects to check.
I just checked by throwing this into a multithreaded program I have and it looks fine:
for thread in threadlist:
    print(thread.is_alive())
This gives me a list of True/False values as the threads turn on and off. So you should be able to do that and check for any False value in order to see whether a thread has finished.
I use a slightly different technique because of the nature of the threads I used in my application. To illustrate, this is a fragment of a test-strap program I wrote to scaffold a barrier class for my threading class:
while threads:
    finished = set(threads) - set(threading.enumerate())
    while finished:
        ttt = finished.pop()
        threads.remove(ttt)
    time.sleep(0.5)
Why do I do it this way? In my production code, I have a time limit, so the first line actually reads "while threads and time.time() < cutoff_time". If I reach the cut-off, I then have code to tell the threads to shut down.
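The time-limit variant could look roughly like this. It is a sketch under assumptions: the workers cooperate by checking a threading.Event (named `stop` here), and the delays and the 0.3 second cut-off are made up.

```python
import threading
import time

stop = threading.Event()  # set by the main thread to ask workers to quit

def worker(seconds):
    # Work in small steps so a stop request is noticed promptly
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline and not stop.is_set():
        stop.wait(0.01)

threads = [threading.Thread(target=worker, args=(s,)) for s in (0.05, 5.0)]
for t in threads:
    t.start()

cutoff_time = time.time() + 0.3
while threads and time.time() < cutoff_time:
    # Started threads missing from enumerate() have already finished
    finished = set(threads) - set(threading.enumerate())
    for t in finished:
        threads.remove(t)
    time.sleep(0.05)

timed_out = len(threads)  # threads still running at the cut-off
stop.set()                # tell the remaining threads to shut down
for t in threads:
    t.join()
print(f"{timed_out} thread(s) had to be stopped")
```

Threads cannot be killed from outside in Python, so the shutdown only works if the worker code checks the event regularly.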
