Python Multiprocessing Async Can't Terminate Process

I have an infinite loop running async but I can't terminate it. Here is a similar version of my code:
from multiprocessing import Pool

test_pool = Pool(processes=1)

self.button1.clicked.connect(self.starter)
self.button2.clicked.connect(self.stopper)

def starter(self):
    global test_pool
    test_pool.apply_async(self.automatizer)

def automatizer(self):
    i = 0
    while i != 0:
        self.job1()
        # safe stop point
        self.job2()
        # safe stop point
        self.job3()
        # safe stop point

def job1(self):
    # doing some stuff

def job2(self):
    # doing some stuff

def job3(self):
    # doing some stuff

def stopper(self):
    global test_pool
    test_pool.terminate()
My problem is that terminate() inside the stopper function doesn't work. I tried putting terminate() inside the job1, job2, job3 functions; still not working. I tried putting it at the end of the loop in the starter function; again, not working. How can I stop this async process?
While stopping the process at any time is good enough, is it possible to make it stop at the points I want? I mean, if a stop command (not sure what that command would be) is given to the process, I want it to complete the steps up to the "# safe stop point" marker and then terminate.

You really should avoid using terminate() in normal operation. It should only be used in unusual cases, such as hanging or unresponsive processes. The normal way to end a process pool is to call pool.close() followed by pool.join().
These methods do require the function that your pool is executing to return, and your call to pool.join() will block your main process until it does so. I would suggest you add a multiprocessing.Queue to give yourself a way to tell your subprocess to exit:
# this import is NOT the same as multiprocessing.Queue - this is here for the
# queue.Empty exception
import Queue
import multiprocessing

queue = multiprocessing.Queue()  # not the same as a Queue.Queue()

def stopper(self):
    # don't need "global" keyword to call a global object's method -
    # it's only necessary if we want to modify a global
    queue.put("Stop")
    test_pool.close()
    test_pool.join()

def automatizer(self):
    while True:  # cleaner infinite loop - yours was never executing
        for func in [self.job1, self.job2, self.job3]:  # iterate over methods
            func()  # call each one
            # between each function call, check the queue for "poison pill"
            try:
                if queue.get(block=False) == "Stop":
                    return
            except Queue.Empty:
                pass
Since you didn't provide a more complete code sample, you'll have to figure out where to actually instantiate the multiprocessing.Queue and how to pass things around. Also, the comment from Janne Karila was correct. You should switch your code to use a single Process instead of a pool if you're only using one process at a time anyway. The Process class also uses a blocking join() method to tell it to end once it has returned. The only safe way to end processes at "known safe points" is to implement some kind of interprocess communication like I've done here. Pipes would work as well.
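By way of illustration, here is a minimal sketch of that single-Process approach (Python 3; job1/job2/job3 are placeholder functions standing in for the question's methods, and the names here are mine, not from the original code):

import multiprocessing
import queue  # only needed for the queue.Empty exception

def job1(): pass  # placeholders for the question's real jobs
def job2(): pass
def job3(): pass

def automatizer(stop_queue):
    while True:
        for job in (job1, job2, job3):
            job()
            # check for the poison pill at each safe stop point
            try:
                if stop_queue.get(block=False) == "Stop":
                    return
            except queue.Empty:
                pass

if __name__ == "__main__":
    stop_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=automatizer, args=(stop_queue,))
    proc.start()
    # later, e.g. from the GUI's stop handler:
    stop_queue.put("Stop")
    proc.join()  # returns once the worker exits at the next safe point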

Related

How to start a thread again with an Event object in Python?

I want to make a thread and control it with an event object. To be specific, I want the thread to execute whenever the event object is set and to wait otherwise, repeatedly.
The below shows a sketchy logic I thought of.
import threading
import time

e = threading.Event()

def start_operation():
    e.wait()
    while e.is_set():
        print('STARTING TASK')
        e.clear()

t1 = threading.Thread(target=start_operation)
t1.start()

e.set() # first set
e.set() # second set
I expected t1 to run once the first set had been commanded and to stop itself (due to the e.clear() inside it), and then to run again after the second set had been commanded. So, according to what I expected, it should print 'STARTING TASK' two times. But it shows it only once, which I don't understand. How am I supposed to change the code to make it run the while loop again whenever the event object is set?
The first problem is that once you exit a while loop, you've exited it. Changing the predicate back won't change anything. Forget about events for a second and just look at this code:
i = 0
while i == 0:
    i = 1
It obviously doesn't matter if you set i = 0 again later, right? You've already left the while loop, and the whole function. And your code is doing exactly the same thing.
You can fix that problem by just adding another while loop around the whole thing:
def start_operation():
    while True:
        e.wait()
        while e.is_set():
            print('STARTING TASK')
            e.clear()
However, that still isn't going to work—except maybe occasionally, by accident.
Event.set doesn't block; it just sets the event immediately, even if it's already set. So, the most likely flow of control here is:
background thread hits e.wait() and blocks.
main thread hits e.set() and sets event.
main thread hits e.set() and sets event again, with no effect.
background thread wakes up, does the loop once, calls e.clear() at the end.
background thread waits forever on e.wait().
(The fact that there's no way to avoid missed signals with events is effectively the reason conditions were invented, and that anything newer than Win32 and Python doesn't bother with events… But a condition isn't sufficient here either.)
If you want the main thread to block until the event is clear, and only then set it again, you can't do that. You need something extra, like a second event, which the main thread can wait on and the background thread can set.
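As an illustration of that "something extra", here is one possible two-event handshake (a sketch of mine, not from the original answer): the worker sets a second event when it is idle, and the main thread waits on it before setting the start event again.

import threading

start_evt = threading.Event()  # main -> worker: run the task once
idle_evt = threading.Event()   # worker -> main: safe to set again
idle_evt.set()                 # the worker starts out idle

def start_operation():
    while True:
        start_evt.wait()
        start_evt.clear()
        print('STARTING TASK')
        idle_evt.set()         # report back: run finished

t1 = threading.Thread(target=start_operation, daemon=True)
t1.start()

for _ in range(2):             # two runs, no missed signals
    idle_evt.wait()
    idle_evt.clear()
    start_evt.set()
idle_evt.wait()                # let the second run finish before exiting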
But if you want to keep track of multiple set calls, without missing any, you need to use a different sync mechanism. A queue.Queue may be overkill here, but it's dead simple to do in Python, so let's just use that. Of course you don't actually have any values to put on the queue, but that's OK; you can just stick a dummy value there:
import queue
import threading

q = queue.Queue()

def start_operation():
    while True:
        _ = q.get()
        print('STARTING TASK')

t1 = threading.Thread(target=start_operation)
t1.start()

q.put(None)
q.put(None)
And if you later want to add a way to shut down the background thread, just change it to stick values on:
import queue
import threading

q = queue.Queue()

def start_operation():
    while True:
        if q.get():
            return
        print('STARTING TASK')

t1 = threading.Thread(target=start_operation)
t1.start()

q.put(False)
q.put(False)
q.put(True)

Python script is hanging AFTER multithreading

I know there are a few questions and answers related to hanging threads in Python, but my situation is slightly different as the script is hanging AFTER all the threads have been completed. The threading script is below, but obviously the first 2 functions are simplified massively.
When I run the script shown, it works. When I use my real functions, the script hangs AFTER THE LAST LINE. So, all the scenarios are processed (and a message printed to confirm), logStudyData() then collates all the results and writes to a csv. "Script Complete" is printed. And THEN it hangs.
The script with threading functionality removed runs fine.
I have tried enclosing the main script in try...except but no exception gets logged. If I use a debugger with a breakpoint on the final print and then step it forward, it hangs.
I know there is not much to go on here, but short of including the whole 1500-line script, I don't know what else to do. Any suggestions welcome!
def runScenario(scenario):
    # Do a bunch of stuff
    with lock:
        # access global variables
        pass
    pass

def logStudyData():
    # Combine results from all scenarios into a df and write to csv
    pass

def worker():
    global q
    while True:
        next_scenario = q.get()
        if next_scenario is None:
            break
        runScenario(next_scenario)
        print(next_scenario, " is complete")
        q.task_done()

import threading
from queue import Queue

global q, lock
q = Queue()
threads = []
scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

num_worker_threads = 6
lock = threading.Lock()

for i in range(num_worker_threads):
    print("Thread number ", i)
    this_thread = threading.Thread(target=worker)
    this_thread.start()
    threads.append(this_thread)

for scenario_name in scenario_list:
    q.put(scenario_name)

q.join()
print("q.join completed")
logStudyData()
print("script complete")
As the docs for Queue.get say:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
In other words, there is no way get can ever return None, except by you calling q.put(None) on the main thread, which you don't do.
Notice that the example directly below those docs does this:
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
The second one isn't strictly necessary; you can usually get away with not doing it.
But the first one is absolutely necessary. You need to either do this, or come up with some other mechanism to tell your workers to quit. Without that, your main thread just tries to exit, which means it tries to join every worker, but those workers are all blocked forever on a get that will never happen, so your program hangs forever.
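To make that concrete, here is a minimal self-contained sketch of the sentinel pattern applied to this structure (the print is a stand-in for runScenario):

import threading
from queue import Queue

q = Queue()

def worker():
    while True:
        scenario = q.get()
        q.task_done()
        if scenario is None:       # sentinel: tell this worker to quit
            break
        print(scenario, "is complete")  # stand-in for runScenario(scenario)

num_worker_threads = 6
threads = [threading.Thread(target=worker) for _ in range(num_worker_threads)]
for t in threads:
    t.start()

for scenario in ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8']:
    q.put(scenario)
for _ in range(num_worker_threads):
    q.put(None)                    # one sentinel per worker

for t in threads:
    t.join()                       # every worker can now actually finish
print("script complete")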
Building a thread pool may not be rocket science (if only because rocket scientists tend to need their calculations to be deterministic and hard real-time…), but it's not trivial, either, and there are plenty of things you can get wrong. You may want to consider using one of the two already-built threadpools in the Python standard library, concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.Pool. This would reduce your entire program to:
import concurrent.futures

def work(scenario):
    runScenario(scenario)
    print(scenario, " is complete")

scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as x:
    results = list(x.map(work, scenario_list))
print("q.join completed")
logStudyData()
print("script complete")
Obviously you'll still need a lock around any mutable variables you change inside runScenario—although if you're only using a mutable variable there because you couldn't figure out how to return values to the main thread, that's trivial with an Executor: just return the values from work, and then you can use them like this:
for result in x.map(work, scenario_list):
    do_something(result)

(Python) Stop thread with raw input?

EDIT 9/15/16: In my original code (still posted below) I tried to use .join() with a function, which is a silly mistake because it can only be used with a thread object. I am trying to
(1) continuously run a thread that gets data and saves it to a file
(2) have a second thread, or incorporate a queue, that will stop the program once a user enters a flag (i.e. "stop"), without interrupting the data gathering/saving thread.
I need help with multithreading. I am trying to run two threads, one that handles data and the second checks for a flag to stop the program.
I learned by trial and error that I can't interrupt a while loop without my computer exploding. Additionally, I have abandoned my GUI code because it made my code too complicated with the multithreading.
What I want to do is run a thread that gathers data from an Arduino, saves it to a file, and repeats this. The second thread will scan for a flag -- which can be a raw_input? I can't think of anything else that a user can do to stop the data acquisition program.
I greatly appreciate any help on this. Here is my code (much of it is pseudocode, as you can see):
#threading
import thread
import time

global flag

def monitorData():
    print "running!"
    time.sleep(5)

def stopdata(flag):
    flag = raw_input("enter stop: ")
    if flag == "stop":
        monitorData.join()

flag = "start"
thread.start_new_thread(monitorData, ())
thread.start_new_thread(stopdata, (flag,))
The error I am getting is this when I try entering "stop" in the IDLE.
Unhandled exception in thread started by
Traceback (most recent call last):
File "c:\users\otangu~1\appdata\local\temp\IDLE_rtmp_h_frd5", line 16, in stopdata
AttributeError: 'function' object has no attribute 'join'
Once again I really appreciate any help, I have taught myself Python so far and this is the first huge wall that I've hit.
The error you see is a result of calling join on the function. You need to call join on the thread object, but thread.start_new_thread doesn't give you one; it returns a low-level thread identifier. With the threading module you can capture a reference and join it:
th1 = threading.Thread(target=monitorData)
th1.start()
# later
th1.join()
As for a solution, you can use a Queue to communicate between threads. The queue is used to send a quit message to the worker thread and if the worker does not pick anything up off the queue for a second it runs the code that gathers data from the arduino.
from threading import Thread
from Queue import Queue, Empty

def worker(q):
    while True:
        try:
            item = q.get(block=True, timeout=1)
            q.task_done()
            if item == "quit":
                print("got quit msg in thread")
                break
        except Empty:
            print("empty, do some arduino stuff")

def input_process(q):
    while True:
        x = raw_input("")
        if x == 'q':
            print("will quit")
            q.put("quit")
            break

q = Queue()
t = Thread(target=worker, args=(q,))
t.start()
t2 = Thread(target=input_process, args=(q,))
t2.start()

# waits for the `task_done` function to be called
q.join()
t2.join()
t.join()
It's possibly a bit more code than you hoped for, and having to detect an empty queue with an exception is a little ugly, but this doesn't rely on any global variables and will always exit promptly. That won't be the case with sleep-based solutions, which need to wait for any current call to sleep to finish before resuming execution.
As noted by someone else, you should really be using threading rather than the older thread module, and I would also recommend learning with Python 3 rather than Python 2.
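For reference, the queue-based design above might look like this in Python 3 (a sketch assuming only the standard library; raw_input becomes input and the Queue module becomes queue):

import threading
from queue import Queue, Empty

def worker(q):
    while True:
        try:
            item = q.get(block=True, timeout=1)
            q.task_done()
            if item == "quit":
                print("got quit msg in thread")
                break
        except Empty:
            print("empty, do some arduino stuff")

def input_process(q):
    while True:
        x = input("")          # raw_input was renamed to input in Python 3
        if x == 'q':
            print("will quit")
            q.put("quit")
            break

q = Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()
t2 = threading.Thread(target=input_process, args=(q,))
t2.start()
t2.join()
t.join()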
You're looking for something like this:
from threading import Thread
from time import sleep

# "volatile" global shared by threads
active = True

def get_data():
    while active:
        print "working!"
        sleep(3)

def wait_on_user():
    global active
    raw_input("press enter to stop")
    active = False

th1 = Thread(target=get_data)
th1.start()
th2 = Thread(target=wait_on_user)
th2.start()
th1.join()
th2.join()
You made a few obvious and a few less obvious mistakes in your code. First, join is called on a thread object, not a function. Similarly, join doesn't kill a thread, it waits for the thread to finish. A thread finishes when it has no more code to execute. If you want a thread to run until some flag is set, you normally include a loop in your thread that checks the flag every second or so (depending on how precise you need the timing to be).
Also, the threading module is preferred over the lower-level thread module, which was renamed to _thread in Python 3.
This is not possible. The thread function has to finish on its own; you can't kill it from the outside, and join only waits for it to end.

Programmatically exiting python script while multithreading

I have some code which runs routinely, and every now and then (like once a month) the program seems to hang somewhere and I'm not sure where.
I thought I would implement [what has turned out to be not quite] a "quick fix" of checking how long the program has been running for. I decided to use multithreading to call the function, and then while it is running, check the time.
For example:
import datetime
import threading

def myfunc():
    # Code goes here
    pass

t = threading.Thread(target=myfunc)
t.start()

d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        print 'Exiting!'
        raise SystemExit(0)
However, this does not close the other thread (myfunc).
What is the best way to go about killing the other thread?
The docs could be clearer about this. Raising SystemExit tells the interpreter to quit, but "normal" exit processing is still done. Part of normal exit processing is .join()-ing all active non-daemon threads. But your rogue thread never ends, so exit processing waits forever to join it.
As #roippi said, you can do
t.daemon = True
before starting it. Normal exit processing does not wait for daemon threads; your OS should then kill them when the main process exits.
Another alternative:
import os
os._exit(13) # whatever exit code you want goes there
That stops the interpreter "immediately", and skips all normal exit processing.
Pick your poison ;-)
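To make the daemon route concrete, here is a small sketch (my own, not from the answers above) that combines a daemon thread with a join timeout instead of the polling loop:

import threading
import time

def myfunc():
    while True:          # stand-in for the code that sometimes hangs
        time.sleep(1)

t = threading.Thread(target=myfunc)
t.daemon = True          # exit processing won't wait for this thread
t.start()

t.join(timeout=60)       # wait at most 60 seconds for it to finish
if t.is_alive():
    print('Exiting!')
    raise SystemExit(0)  # the daemon thread dies with the process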
There is no way to kill a thread. You must kill the target from within the target. The best way is with a hook and a queue. It goes something like this.
import datetime
import threading
from Queue import Queue, Empty

# add a kill_hook arg to your function; kill_hook is a queue
# used to pass messages from the main thread
def myfunc(kill_hook=None):
    # Code goes here
    # put this somewhere which is periodically checked;
    # an ideal place to check the hook is when logging
    try:
        if kill_hook.get_nowait():  # or kill_hook.get(True, 5) to wait longer
            print 'Exiting!'
            raise SystemExit(0)
    except Empty:
        pass

q = Queue()  # the queue used to pass the kill call
t = threading.Thread(target=myfunc, args=(q,))
t.start()

d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        # if your kill criteria are met, put something in the queue
        q.put(1)
I originally found this answer somewhere online. Hope this helps!
Another solution would be to use a separate instance of Python to monitor the other Python process, killing it from the system level with psutil.
Wow, I like the daemon and stealth os._exit solutions too!
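A rough sketch of that watchdog idea, assuming psutil is installed and with worker.py as a placeholder name for the monitored script:

import subprocess
import time
import psutil

# run the work in a separate Python process so it can be killed externally
proc = subprocess.Popen(["python", "worker.py"])  # placeholder script name
deadline = time.time() + 60

while proc.poll() is None:               # still running?
    if time.time() > deadline:
        print('Exiting!')
        psutil.Process(proc.pid).kill()  # SIGKILL: skips all Python cleanup
        break
    time.sleep(1)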

How to manage python threads results?

I am using this code:
def startThreads(arrayofkeywords):
    global i
    i = 0
    while len(arrayofkeywords):
        try:
            if i < maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i + 1
                thread = doStuffWith(keyword)
                thread.start()
        except KeyboardInterrupt:
            sys.exit()
    thread.join()
For threading in Python, I have almost everything done, but I don't know how to manage the results of each thread; each thread produces an array of strings as its result. How can I join all those arrays into one safely? Because if I try writing into a global array, two threads could be writing at the same time.
First, you actually need to save all those thread objects to call join() on them. As written, you're saving only the last one of them, and then only if there isn't an exception.
An easy way to do multithreaded programming is to give each thread all the data it needs to run, and then have it not write to anything outside that working set. If all threads follow that guideline, their writes will not interfere with each other. Then, once a thread has finished, have the main thread only aggregate the results into a global array. This is known as "fork/join parallelism."
If you subclass the Thread object, you can give it space to store that return value without interfering with other threads. Then you can do something like this:
class MyThread(threading.Thread):
    def __init__(self, ...):
        self.result = []
        ...

def main():
    # doStuffWith() returns a MyThread instance
    threads = [doStuffWith(k) for k in arrayofkeywords[:maxThreads]]
    for t in threads:
        t.start()  # start() returns None, so start after building the list
    for t in threads:
        t.join()
        ret = t.result
        # process return value here
Edit:
After looking around a bit, it seems like the above method isn't the preferred way to do threads in Python. The above is more of a Java-esque pattern for threads. Instead you could do something like:
def handler(outList):
    ...
    # Modify existing object (important!)
    outList.append(1)
    ...

def doStuffWith(keyword):
    ...
    result = []
    thread = Thread(target=handler, args=(result,))
    return (thread, result)

def main():
    threads = [doStuffWith(k) for k in arrayofkeywords[:maxThreads]]
    for t in threads:
        t[0].start()
    for t in threads:
        t[0].join()
        ret = t[1]
        # process return value here
Use a Queue.Queue instance, which is intrinsically thread-safe. Each thread can .put its results to that global instance when it's done, and the main thread (when it knows all working threads are done, by .joining them for example as in #unholysampler's answer) can loop .getting each result from it, and use each result to .extend the "overall result" list, until the queue is emptied.
Edit: there are other big problems with your code -- if the maximum number of threads is less than the number of keywords, it will never terminate (you're trying to start a thread per keyword, never fewer, but once you've started the max number you loop forever to no further purpose).
Consider instead using a threading pool, kind of like the one in this recipe, except that in lieu of queueing callables you'll queue the keywords -- since the callable you want to run in the thread is the same in each thread, just varying the argument. Of course that callable will be changed to peel something from the incoming-tasks queue (with .get) and .put the list of results to the outgoing-results queue when done.
To terminate the N threads you could, after all keywords, .put N "sentinels" (e.g. None, assuming no keyword can be None): a thread's callable will exit if the "keyword" it just pulled is None.
More often than not, Queue.Queue offers the best way to organize threading (and multiprocessing!) architectures in Python, be they generic like in the recipe I pointed you to, or more specialized like I'm suggesting for your use case in the last two paragraphs.
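Here is one possible sketch of that architecture in Python 3 (process_keyword and the queue names are placeholders of mine, not from the question):

import threading
from queue import Queue

def process_keyword(keyword):
    # placeholder: the question's real work yields a list of strings
    return [keyword, keyword + '-done']

def worker(tasks, results):
    while True:
        keyword = tasks.get()
        if keyword is None:              # sentinel: this worker is done
            break
        results.put(process_keyword(keyword))

def start_threads(keywords, max_threads=4):
    tasks, results = Queue(), Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(max_threads)]
    for t in threads:
        t.start()
    for kw in keywords:
        tasks.put(kw)
    for _ in range(max_threads):
        tasks.put(None)                  # one sentinel per thread
    for t in threads:
        t.join()
    overall = []
    while not results.empty():           # safe: all workers have exited
        overall.extend(results.get())
    return overall

print(start_threads(['foo', 'bar', 'baz']))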
You need to keep pointers to each thread you make. As is, your code only ensures the last created thread finishes. This does not imply that all the ones you started before it have also finished.
def startThreads(arrayofkeywords):
    global i
    i = 0
    threads = []
    while len(arrayofkeywords):
        try:
            if i < maxThreads:
                keyword = arrayofkeywords.pop(0)
                i = i + 1
                thread = doStuffWith(keyword)
                thread.start()
                threads.append(thread)
        except KeyboardInterrupt:
            sys.exit()
    for t in threads:
        t.join()
    # process results stored in each thread
This also solves the problem of write access, because each thread stores its data locally. Then, after all of them are done, you can combine each thread's local data.
I know that this question is a little bit old, but the best way to do this is not to harm yourself too much in the way proposed by other colleagues :)
Please read the reference on Pool. This way you will fork-join your work:
from multiprocessing import Pool

def doStuffWith(keyword):
    return keyword + ' processed in thread'

def startThreads(arrayofkeywords):
    pool = Pool(processes=maxThreads)
    result = pool.map(doStuffWith, arrayofkeywords)
    print result
Writing into a global array is fine if you use a semaphore to protect the critical section. You 'acquire' the lock when you want to append to the global array, then 'release' when you are done. This way, only one thread is ever appending to the array.
Check out http://docs.python.org/library/threading.html and search for semaphore for more info.
sem = threading.Semaphore()
...
sem.acquire()
# do dangerous stuff
sem.release()
Try some of the semaphore's methods, like acquire and release:
http://docs.python.org/library/threading.html
