Python - Threads not parallel

I have runner code that is supposed to start 5 threads, but it only starts 1 (I know this because the loop body only runs once). Take a look at the code:
import Handle
import threading

h = Handle.Handle()
h.StartConnection()

for i in range(0, 5):
    print("Looped")
    t = threading.Thread(target=h.Spawn())
    t.start()
It only prints "Looped" once and only runs Spawn once as well. Any ideas?

The issues I noticed:
You are replacing the t variable on each loop iteration, so at the end you only hold a reference to the last thread.
Does the Spawn function return a function? If it does, then it's okay; otherwise you should pass Spawn itself as the target, not call Spawn().
If the Spawn function is long-running in nature (I assume it is), then calling it directly blocks the loop until it returns. This is why your loop prints "Looped" once and Spawn gets called just once too.
My suggestion would be like this:
import Handle
import threading

h = Handle.Handle()
h.StartConnection()

threads = []
for i in range(0, 5):
    print("Looped")
    t = threading.Thread(target=h.Spawn)
    threads.append(t)
    t.start()
I use a list, threads, to store the threads, appending each one before calling start. Now I can iterate over the threads list any time I want (maybe to join them?).
Also, since I assumed Spawn is a long-running function, I passed it (not its result) as the target to the Thread constructor, so it runs in the background when we call start on the thread. It no longer blocks the loop.
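If you later want to wait for all five threads to finish before moving on, a minimal sketch reusing the threads list from above would be:
for t in threads:
    t.join()
print("all Spawn threads finished")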

You are not running threads; you run the Spawn method right in the main thread. target needs to be a function, not the result of calling that function:
t = threading.Thread(target=h.Spawn)
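To see the difference, here is a minimal sketch with a stand-in spawn function (not the asker's Handle class):
import threading
import time

def spawn():
    time.sleep(1)
    print("spawned")

# Wrong: spawn() runs immediately in the main thread, blocking for a second,
# and its return value (None) becomes the target.
threading.Thread(target=spawn()).start()

# Right: the function object itself is the target; it runs in the new thread.
threading.Thread(target=spawn).start()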

Try this code (note that Timer needs a callable, so pass h.Spawn, not h, and each call is delayed by 5 seconds):
import Handle
import threading

h = Handle.Handle()
h.StartConnection()

for i in range(0, 5):
    print("Looped")
    threading.Timer(5.0, h.Spawn).start()

Related

How to get only the current thread number?

This is the example code I am using:
import time
import threading
import re

def do_action():
    while True:
        x = threading.current_thread()
        print(x)
        time.sleep(60)

for _ in range(1):
    threading.Thread(target=do_action).start()
The result of the print is as follows:
<Thread(Thread-1, started 10160)>
I need to get only the number of the thread, which in this case is the number 1.
I tried to use
thread_number = re.findall(r"(\d+)", x)[0]
but an error occurs when I use it.
The 1 in the Thread-1 output is part of the default thread name generation if you don't explicitly give your thread a name. There is no guarantee that a thread will have such a number - the main thread won't, and explicitly named threads typically won't. Also, multiple threads can have the same number, if a thread is manually given a name that matches the Thread-n pattern.
If that's the number you want, you can get it by parsing the thread's name - int(thread.name.split('-')[1]) - but it's probably not the best tool for whatever job you plan to use it for.
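A sturdier alternative, if you control how the threads are created, is to give each one an explicit name up front, so no parsing of the default Thread-N pattern is needed (a minimal sketch):
import threading

def do_action():
    print(threading.current_thread().name)

threads = [threading.Thread(target=do_action, name=f"worker-{i}")
           for i in range(1, 4)]
for t in threads:
    t.start()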
If you're starting a bunch of threads and they each need to use a distinct number from 1 to n for some reason, maybe work allocation or something, just pass a number to their target function:
def do_stuff(n):
    # do stuff with n
    ...

threads = [threading.Thread(target=do_stuff, args=(i,)) for i in range(1, 11)]
for thread in threads:
    thread.start()
Threads also have ident and native_id attributes, which are None for threads that haven't been started yet and integers for threads that have started. These are identifiers that are guaranteed to be distinct for threads alive at the same time - this distinctness guarantee is process-wide for ident and system-wide for native_id. However, if one thread finishes before another starts, they may be assigned the same ident or native_id.
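A small sketch of those attributes (native_id requires Python 3.8 or newer):
import threading

def work():
    t = threading.current_thread()
    print(t.name, t.ident, t.native_id)

t = threading.Thread(target=work)
print(t.ident, t.native_id)  # both None: the thread hasn't started yet
t.start()
t.join()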
Try:
thread_number = re.findall(r"(\d+)", x.name)[0]

How to run a python script multiple times simultaneously using python and terminate all when one has finished

Maybe it's a very simple question, but I'm new to concurrency. I want a Python script that runs foo.py 10 times simultaneously, with a time limit of 60 seconds before automatically aborting. The algorithm is non-deterministic, so the executions take different amounts of time and one will finish before the others. Once the first one ends, I would like to save its execution time and the output of the algorithm, and then kill the rest of the processes.
I have seen this question, run multiple instances of python script simultaneously, and it looks very similar, but how can I add the time limit and kill the remaining processes when the first one finishes?
Thank you in advance.
I'd suggest using the threading lib, because with it you can make threads daemon threads, so that if the main thread exits for whatever reason, the other threads are killed. Here's a small example:
# Import the libs...
import threading, time

# Global variables... (List of results.)
results = []

# The subprocess you want to run several times simultaneously...
def run():
    # We declare results as a global variable.
    global results
    # Do stuff...
    results.append("Hello World! These are my results!")

n = int(input("Welcome user, how many times should I execute run()? "))

# We run the thread n times.
for _ in range(n):
    # Define the thread.
    t = threading.Thread(target=run)
    # Make the thread a daemon; if the main process exits the threads will be killed.
    t.daemon = True
    # Start the thread.
    t.start()

# Once the threads have started we can execute the main code.
# We set a timer...
startTime = time.time()
while True:
    # If the timer reaches 60 s we exit from the program.
    if time.time() - startTime >= 60:
        print("[ERROR] The script took too long to run!")
        exit()
    # Do stuff on your main thread; if the stuff is complete you can break from the while loop as well.
    results.append("Main result.")
    break

# When we break from the while loop we print the output.
print("Here are the results:")
for i in results:
    print(f"-{i}")
This example should solve your problem, but if you want to use blocking calls on the main thread, the timer would fail, so you'd need to tweak this code a bit. In that case, move the code from the main thread's loop into a new function (for example, def main():) and start the rest of the threads from that secondary "main" thread. This example may help you:
def run():
    pass

# Secondary "main" thread.
def main():
    # Start the rest of the threads (in this case I just start 1).
    localT = threading.Thread(target=run)
    localT.daemon = True
    localT.start()
    # Do stuff.
    pass

# Actual main thread...
t = threading.Thread(target=main)
t.daemon = True
t.start()
# Set up a timer and fetch the results you need with a global list or any other method...
Now, you should avoid global variables as much as possible, as they can be error-prone, but for some reason the threading lib doesn't let you return values from threads, at least not by any method I know of. I think there are other multiprocessing libs out there that do let you return values, but I don't know anything about them, so I can't explain them. Anyway, I hope this works for you.
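For what it's worth, the standard library's concurrent.futures does let you collect return values from threads; a minimal sketch (the run function here is a stand-in, not the one from the example above):
from concurrent.futures import ThreadPoolExecutor

def run(n):
    return f"result from worker {n}"

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(run, i) for i in range(10)]
    # result() blocks; the timeout raises TimeoutError after 60 s
    results = [f.result(timeout=60) for f in futures]
print(results)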
Update: OK, I was busy writing the code and didn't read the comments on the post, sorry. You can still use this method, but instead of writing code inside the threads, execute another script. You could either import it as a module or actually run it as a script; here's a question that may help you with that:
How to run one python file in another file?
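Since the goal is to run foo.py itself ten times, kill the stragglers, and record the winner's time, here is a hedged sketch using the standard subprocess module (it assumes foo.py is in the current directory and writes its result to stdout):
import subprocess
import sys
import time

# Launch 10 instances of foo.py (the script name comes from the question).
procs = [subprocess.Popen([sys.executable, "foo.py"],
                          stdout=subprocess.PIPE, text=True)
         for _ in range(10)]

start = time.time()
winner = None
while winner is None and time.time() - start < 60:
    for p in procs:
        if p.poll() is not None:      # this process has finished
            winner = p
            break
    time.sleep(0.1)

if winner is not None:
    print("execution time:", time.time() - start)
    print("output:", winner.stdout.read())
else:
    print("[ERROR] no process finished within 60 s")

# Kill everything that is still running.
for p in procs:
    if p is not winner and p.poll() is None:
        p.kill()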

A simple way to run a piece of python code in parallel?

I have this very simple python code:
Test = 1

def para():
    while True:
        if Test > 10:
            print("Test is bigger than ten")
        time.sleep(1)

# I want this to start in parallel, so that the code below keeps executing without waiting for this function to finish
para()

while True:
    Test = random.randint(1, 42)
    time.sleep(1)
    if Test == 42:
        break

# ...stop the parallel execution of para() here (kill it)
# ..some other code here
Basically, I want to run the function para() in parallel with the other code, so that the code below it doesn't have to wait for para() to end.
However, I want to be able to access the current value of the Test variable inside para() while it is running in parallel (as seen in the code example above). Later, when I decide that I am done with para() running in parallel, I would like to know how to kill it, both from the main thread and from within para() itself (self-terminate).
I have read some tutorials on threading, but almost every tutorial approaches it differently, plus I had trouble understanding some of them, so I would like to know the easiest way to run a piece of code in parallel.
Thank you.
Okay, first, here is an answer to your question, verbatim and in the simplest possible way. After that, we answer a little more fully with two examples that show two ways to do this and share access to data between the main and parallel code.
import random
from threading import Thread
import time

Test = 1
stop = False

def para():
    while not stop:
        if Test > 10:
            print("Test is bigger than ten")
        time.sleep(1)

# I want this to start in parallel, so that the code below keeps executing without waiting for this function to finish
thread = Thread(target=para)
thread.start()

while True:
    Test = random.randint(1, 42)
    time.sleep(1)
    if Test == 42:
        break

# stop the parallel execution of para() here (kill it)
stop = True
thread.join()

# ..some other code here
print('we have stopped')
And now, the more complete answer:
In the following we show two code examples that demonstrate (a) parallel execution using the threading interface, and (b) parallel execution using the multiprocessing interface. Which of these you choose depends on what you are trying to do. Threading can be a good choice when the purpose of the second thread is to wait for I/O, and multiprocessing can be a good choice when the second thread does CPU-intensive calculations.
In your example, the main code changed a variable and the parallel code only examined it. Things are different if you want to change a variable from both sides, for example to reset a shared counter. So, we will show you how to do that also.
In the following example codes:
The variables "counter", "run", and "lock" are shared between the main program and the code executed in parallel.
The function myfunc() is executed in parallel. It loops, updating counter and sleeping, until run is set to false by the main program.
The main program loops, printing the value of counter, until it reaches 5, at which point it resets the counter. After the counter reaches 5 again, it sets run to false and, finally, waits for the thread or process to exit before exiting itself.
You might notice that counter is incremented between the calls to lock.acquire() and lock.release() in the first example, or inside a with lock block in the second example.
Incrementing a counter comprises three steps: (1) reading the current value, (2) adding one to it, and (3) storing the result back into the counter. The problem comes when another thread tries to change the counter while this is happening.
We solve this by having both the main program and the parallel code acquire a lock before they change the variable, and release it when they are done. If the lock is already taken, the program or parallel code waits until it is released. This synchronizes their access to the shared data, i.e. the counter. (As an aside, see semaphores for another kind of synchronization.)
With that introduction, here is the first example, which uses threads:
# Parallel code with shared variables, using threads
from threading import Lock, Thread
from time import sleep

# Variables to be shared across threads
counter = 0
run = True
lock = Lock()

# Function to be executed in parallel
def myfunc():
    # Declare shared variables
    global run
    global counter
    global lock

    # Processing to be done until told to exit
    while run:
        sleep(1)
        # Increment the counter
        lock.acquire()
        counter = counter + 1
        lock.release()

    # Set the counter to show that we exited
    lock.acquire()
    counter = -1
    lock.release()
    print('thread exit')

# ----------------------------
# Launch the parallel function as a thread
thread = Thread(target=myfunc)
thread.start()

# Read and print the counter
while counter < 5:
    print(counter)
    sleep(1)

# Change the counter
lock.acquire()
counter = 0
lock.release()

# Read and print the counter
while counter < 5:
    print(counter)
    sleep(1)

# Tell the thread to exit and wait for it to exit
run = False
thread.join()

# Confirm that the thread set the counter on exit
print(counter)
And here is the second example, which uses multiprocessing. Notice that there are some extra steps involved to access the shared variables.
from time import sleep
from multiprocessing import Process, Value, Lock

def myfunc(counter, lock, run):
    while run.value:
        sleep(1)
        with lock:
            counter.value += 1
        print("thread %d" % counter.value)

    with lock:
        counter.value = -1
    print("thread exit %d" % counter.value)

# =======================
# The __main__ guard is required for multiprocessing on platforms
# that spawn rather than fork (Windows, recent macOS).
if __name__ == '__main__':
    counter = Value('i', 0)
    run = Value('b', True)
    lock = Lock()

    p = Process(target=myfunc, args=(counter, lock, run))
    p.start()

    while counter.value < 5:
        print("main %d" % counter.value)
        sleep(1)

    with lock:
        counter.value = 0

    while counter.value < 5:
        print("main %d" % counter.value)
        sleep(1)

    run.value = False
    p.join()
    print("main exit %d" % counter.value)
Rather than manually starting threads, it is much better to use multiprocessing.Pool. The part that runs in parallel needs to be in a function that you call with map; instead of map you can then use pool.imap.
import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":
    p = multiprocessing.Pool()
    start = time.time()
    for x in p.imap(func, [1, 5, 3]):
        print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))
Also check out:
multiprocessing.Pool: What's the difference between map_async and imap?
Also worth checking out is functools.partial, which can be used to pass multiple variables (in addition to the list).
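For example, a hedged sketch of partial with imap (the extra offset argument here is made up for illustration):
import multiprocessing
from functools import partial

def func(offset, x):
    return x + offset

if __name__ == "__main__":
    with multiprocessing.Pool() as p:
        # partial fixes offset=100, so imap only has to supply x
        for r in p.imap(partial(func, 100), [1, 5, 3]):
            print(r)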
Another trick: sometimes you don't really need multiprocessing (as in multiple cores of your processor), just multiple threads, for example to concurrently query a database over many connections at the same time. In that case just do from multiprocessing.dummy import Pool; you avoid Python spawning a separate process (which makes you lose access to all the namespaces you don't pass into the function) but keep all the benefits of a pool, just on a single CPU core. That's all you need to know about Python multiprocessing (using multiple cores) versus multithreading (using just one process, with the global interpreter lock intact).
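A minimal sketch of that thread-backed pool (the fetch function is a stand-in for an I/O-bound call):
from multiprocessing.dummy import Pool  # same API as Pool, but threads in one process

def fetch(url):
    # stand-in for an I/O-bound call (network, database, disk)
    return len(url)

with Pool(20) as pool:
    sizes = pool.map(fetch, ["https://example.com"] * 5)
print(sizes)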
Another little piece of advice: always try plain map first, without any pool. Then switch to pool.imap in the next step, once you're sure it all works.
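That workflow in miniature (same call shape, one line swapped):
import multiprocessing

def func(x):
    return x + 2

if __name__ == "__main__":
    data = [1, 5, 3]
    print(list(map(func, data)))         # step 1: serial, easy to debug
    with multiprocessing.Pool() as p:
        print(list(p.imap(func, data)))  # step 2: drop-in parallel version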

Python script is hanging AFTER multithreading

I know there are a few questions and answers related to hanging threads in Python, but my situation is slightly different as the script is hanging AFTER all the threads have been completed. The threading script is below, but obviously the first 2 functions are simplified massively.
When I run the script shown, it works. When I use my real functions, the script hangs AFTER THE LAST LINE. So, all the scenarios are processed (and a message printed to confirm), logStudyData() then collates all the results and writes to a csv. "Script Complete" is printed. And THEN it hangs.
The script with threading functionality removed runs fine.
I have tried enclosing the main script in try...except but no exception gets logged. If I use a debugger with a breakpoint on the final print and then step it forward, it hangs.
I know there is not much to go on here, but short of including the whole 1500-line script, I don't know what else to do. Any suggestions welcome!
def runScenario(scenario):
    # Do a bunch of stuff
    with lock:
        # access global variables
        pass
    pass

def logStudyData():
    # Combine results from all scenarios into a df and write to csv
    pass

def worker():
    global q
    while True:
        next_scenario = q.get()
        if next_scenario is None:
            break
        runScenario(next_scenario)
        print(next_scenario, " is complete")
        q.task_done()

import threading
from queue import Queue

global q, lock
q = Queue()
threads = []
scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']
num_worker_threads = 6
lock = threading.Lock()

for i in range(num_worker_threads):
    print("Thread number ", i)
    this_thread = threading.Thread(target=worker)
    this_thread.start()
    threads.append(this_thread)

for scenario_name in scenario_list:
    q.put(scenario_name)

q.join()
print("q.join completed")
logStudyData()
print("script complete")
As the docs for Queue.get say:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
In other words, there is no way get can ever return None, except by you calling q.put(None) on the main thread, which you don't do.
Notice that the example directly below those docs does this:
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
The second part (the join loop) is not strictly necessary; non-daemon threads are joined automatically at interpreter exit, so you can usually get away without it.
But the first one is absolutely necessary. You need to either do this, or come up with some other mechanism to tell your workers to quit. Without that, your main thread just tries to exit, which means it tries to join every worker, but those workers are all blocked forever on a get that will never happen, so your program hangs forever.
Building a thread pool may not be rocket science (if only because rocket scientists tend to need their calculations to be deterministic and hard real-time…), but it's not trivial, either, and there are plenty of things you can get wrong. You may want to consider using one of the two already-built threadpools in the Python standard library, concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.Pool. This would reduce your entire program to:
import concurrent.futures

def work(scenario):
    runScenario(scenario)
    print(scenario, " is complete")

scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as x:
    results = list(x.map(work, scenario_list))
print("all scenarios completed")
logStudyData()
print("script complete")
Obviously you'll still need a lock around any mutable variables you change inside runScenario—although if you're only using a mutable variable there because you couldn't figure out how to return values to the main thread, that's trivial with an Executor: just return the values from work, and then you can use them like this:
for result in x.map(work, scenario_list):
    do_something(result)

How to manage python threads results?

I am using this code:
def startThreads(arrayofkeywords):
    global i
    i = 0
    while len(arrayofkeywords):
        try:
            if i < maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i + 1
                thread = doStuffWith(keyword)
                thread.start()
        except KeyboardInterrupt:
            sys.exit()
    thread.join()
for threading in Python. I have almost everything done, but I don't know how to manage the results of each thread. Each thread produces an array of strings as its result; how can I join all those arrays into one safely? If I try writing into a global array, two threads could be writing at the same time.
First, you actually need to save all those thread objects to call join() on them. As written, you're saving only the last one of them, and then only if there isn't an exception.
An easy way to do multithreaded programming is to give each thread all the data it needs to run, and then have it not write to anything outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have the main thread aggregate the results into a global array. This is known as "fork/join parallelism."
If you subclass the Thread object, you can give it space to store that return value without interfering with other threads. Then you can do something like this:
class MyThread(threading.Thread):
    def __init__(self, ...):
        self.result = []
        ...

def main():
    # doStuffWith() returns a MyThread instance
    threads = [doStuffWith(k) for k in arrayofkeywords[:maxThreads]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
        ret = t.result
        # process return value here
Edit:
After looking around a bit, it seems like the above method isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead you could do something like:
def handler(outList):
    ...
    # Modify the existing object (important!)
    outList.append(1)
    ...

def doStuffWith(keyword):
    ...
    result = []
    thread = Thread(target=handler, args=(result,))
    return (thread, result)

def main():
    threads = [doStuffWith(k) for k in arrayofkeywords[:maxThreads]]
    for t in threads:
        t[0].start()
    for t in threads:
        t[0].join()
        ret = t[1]
        # process return value here
Use a Queue.Queue instance, which is intrinsically thread-safe. Each thread can .put its results to that global instance when it's done, and the main thread (when it knows all working threads are done, by .joining them for example as in #unholysampler's answer) can loop .getting each result from it, and use each result to .extend the "overall result" list, until the queue is emptied.
Edit: there are other big problems with your code -- if the maximum number of threads is less than the number of keywords, it will never terminate (you're trying to start a thread per keyword -- never fewer -- but once you've started the maximum number, you loop forever to no further purpose).
Consider instead using a threading pool, kind of like the one in this recipe, except that in lieu of queueing callables you'll queue the keywords -- since the callable you want to run in the thread is the same in each thread, just varying the argument. Of course that callable will be changed to peel something from the incoming-tasks queue (with .get) and .put the list of results to the outgoing-results queue when done.
To terminate the N threads you could, after all keywords, .put N "sentinels" (e.g. None, assuming no keyword can be None): a thread's callable will exit if the "keyword" it just pulled is None.
More often than not, Queue.Queue offers the best way to organize threading (and multiprocessing!) architectures in Python, be they generic like in the recipe I pointed you to, or more specialized like I'm suggesting for your use case in the last two paragraphs.
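A minimal Python 3 sketch of that architecture (the queue module rather than Python 2's Queue; the process function stands in for the real per-keyword work):
import queue
import threading

def process(keyword):
    return [keyword.upper()]          # stand-in for the real work

def worker(tasks, results):
    while True:
        keyword = tasks.get()
        if keyword is None:           # sentinel: time to exit
            break
        results.put(process(keyword))

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for w in workers:
    w.start()

for kw in ["alpha", "beta", "gamma"]:
    tasks.put(kw)
for _ in workers:                     # one sentinel per worker
    tasks.put(None)
for w in workers:
    w.join()

overall = []
while not results.empty():
    overall.extend(results.get())
print(overall)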
You need to keep pointers to each thread you make. As is, your code only ensures the last created thread finishes. This does not imply that all the ones you started before it have also finished.
def startThreads(arrayofkeywords):
    global i
    i = 0
    threads = []
    while len(arrayofkeywords):
        try:
            if i < maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i + 1
                thread = doStuffWith(keyword)
                thread.start()
                threads.append(thread)
        except KeyboardInterrupt:
            sys.exit()
    for t in threads:
        t.join()
    # process results stored in each thread
This also solves the problem of write access, because each thread will store its data locally. Then, after all of them are done, you can do the work of combining each thread's local data.
I know that this question is a little bit old, but the best way to do this is not to harm yourself too much in the way proposed by other colleagues :)
Please read the reference on Pool. This way you will fork-join your work:
from multiprocessing import Pool

def doStuffWith(keyword):
    return keyword + ' processed in thread'

def startThreads(arrayofkeywords):
    pool = Pool(processes=maxThreads)
    result = pool.map(doStuffWith, arrayofkeywords)
    print(result)
Writing into a global array is fine if you use a semaphore to protect the critical section. You 'acquire' the lock when you want to append to the global array, then 'release' it when you are done. This way, only one thread is ever appending to the array.
Check out http://docs.python.org/library/threading.html and search for semaphore for more info.
sem = threading.Semaphore()
...
sem.acquire()
# do dangerous stuff
sem.release()
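Tying that to the question, a small sketch where each thread appends its own list of strings to a shared list under the semaphore (the work function and keywords are stand-ins):
import threading

all_results = []
sem = threading.Semaphore()

def work(keyword):
    local = [keyword + "-a", keyword + "-b"]  # this thread's result strings
    sem.acquire()
    try:
        all_results.extend(local)             # critical section
    finally:
        sem.release()

threads = [threading.Thread(target=work, args=(k,)) for k in ["x", "y"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all_results)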
Try the semaphore's methods, acquire and release:
http://docs.python.org/library/threading.html
