Why can't subprocess successfully kill the old running process? - python

I run a program test.py.
Since it crashes frequently, I import subprocess to restart it when it stops.
Sometimes I find that subprocess can't successfully restart it.
Hence, I force the program to restart every 60 minutes.
But I find that there are sometimes two test.py processes running simultaneously.
What's wrong with my code, and how can I fix it?
I use the Windows 7 OS.
Please check the following code; thanks in advance:
import subprocess
import time
from datetime import datetime

p = subprocess.Popen(['python.exe', r'D:\test.py'], shell=True)
minutes = 1
total_time = 0
while True:
    now = datetime.now()
    # periodically restart
    total_time += 1
    if total_time % 100 == 0:
        try:
            p.kill()
        except Exception as e:
            terminated = True
        finally:
            p = subprocess.Popen(['python.exe', r'D:\test.py'], shell=True)
    # check and restart if it stops
    try:
        terminated = p.poll()
    except Exception as e:
        terminated = True
    if terminated:
        p = subprocess.Popen(['python.exe', r'D:\test.py'], shell=True)
    time.sleep(minutes * 60)

While I don't agree at all with your design, the specific problem is here:
except Exception as e:
    terminated = True
finally:
    p = subprocess.Popen(['python.exe', r'D:\test.py'], shell=True)
In the case that an Exception was thrown, you're setting terminated to true, but then immediately restarting the subprocess. Then, later, you check:
if terminated:
    p = subprocess.Popen(['python.exe', r'D:\test.py'], shell=True)
At this point, terminated is true, so it starts a new subprocess. However, it had already done that in the finally block.
Really, what you should do is simply not bother restarting it during the kill attempt:
try:
    p.kill()
except Exception:
    # We don't care, it just means it was already dead
    pass
finally:
    # Either the process is dead, or we just killed it. Either way, it needs to be restarted
    terminated = True
Then your if terminated clause will properly restart the process and you won't have a duplicate.
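Putting it all together, a minimal sketch of the corrected loop might look like this (the script path and intervals are just the values from the question; note that poll() returns the exit code, which can legitimately be 0, so compare against None explicitly):
import subprocess
import time

SCRIPT = ['python.exe', r'D:\test.py']

p = subprocess.Popen(SCRIPT, shell=True)
total_time = 0
while True:
    total_time += 1
    if total_time % 100 == 0:
        # Periodic forced restart: kill the old process first.
        try:
            p.kill()
        except Exception:
            pass  # already dead
        terminated = True
    else:
        # poll() returns None while the process is still running.
        terminated = p.poll() is not None
    if terminated:
        p = subprocess.Popen(SCRIPT, shell=True)
    time.sleep(60)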

Related

Python threads hang and don't close

This is my first try with threads in Python.
I wrote the following program as a very simple example: it just gets a list and prints it using some threads. However, whenever there is an error, the program just hangs in Ubuntu, and I can't seem to do anything to get the prompt back, so I have to start another SSH session to get back in.
I also have no idea what the issue with my program is.
Is there some kind of error handling I can put in to ensure it doesn't hang?
Also, any idea why Ctrl-C doesn't work (I don't have a break key)?
from Queue import Queue
from threading import Thread
import HAInstances
import logging

log = logging.getLogger()
logging.basicConfig()

class GetHAInstances:
    def oraHAInstanceData(self):
        log.info('Getting HA instance routing data')
        # HAData = SolrGetHAInstances.TalkToOracle.main()
        HAData = HAInstances.main()
        log.info('Query fetched ' + str(len(HAData)) + ' HA Instances to query')
        # for row in HAData:
        #     print row
        return(HAData)

def do_stuff(q):
    while True:
        print q.get()
        print threading.current_thread().name
        q.task_done()

oraHAInstances = GetHAInstances()
mainHAData = oraHAInstances.oraHAInstanceData()

q = Queue(maxsize=0)
num_threads = 10

for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()

for row in mainHAData:
    # print str(row[0]) + ':' + str(row[1]) + ':' + str(row[2]) + ':' + str(row[3])
    q.put((row[0], row[1], row[2], row[3]))

q.join()
In your thread method, it is recommended to use try ... except ... finally. This structure guarantees that q.task_done() is called even when an error occurs, so the q.join() in the main thread can eventually return.
def do_stuff(q):
    while True:
        item = q.get()
        try:
            print item  # do your work here
        except Exception:
            pass        # log the error here
        finally:
            q.task_done()
Also, in case you want to kill your program, find out the pid of your main thread and use kill <pid> to kill it. In Ubuntu or Mint, use ps -Ao pid,cmd; in the output, you can find the pid (first column) by searching for the command (second column) you typed to run your Python script.
Your q is hanging because your worker has errored, so q.task_done() never got called.
You also need to
import threading
to use
print threading.current_thread().name
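Combining the two answers above, a corrected worker might look like this (a sketch in the question's Python 2 style; the work itself is just the prints from the original):
import threading

def do_stuff(q):
    while True:
        item = q.get()
        try:
            print item
            print threading.current_thread().name
        except Exception as e:
            print 'worker error: %s' % e
        finally:
            # Always mark the item done so q.join() in the main thread can return.
            q.task_done()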

How can I track the time of a Python subprocess while it's running and update a database?

I have a subprocess that encodes a video, and what I would love to do is update a database record with the time it is taking to encode the video (so I can print it out via Ajax on a web page).
I am very close - this code updates the database and encodes the video - but the process/loop gets stuck on the final db.commit() and never exits the while True: loop. Is there a better way to do this? Here is the code I am tinkering with:
time_start = time.time()
try:
    p = subprocess.Popen(["avconv", "-y", "-t", "-i", images, "-i", music_file, video_filename],
                         universal_newlines=True, stdout=subprocess.PIPE)
    while True:
        time_now = time.time()
        elapsed_time = time_now - time_start
        progress_time = "ENCODING TIME" + str(int(elapsed_time)) + " Seconds "
        cursor.execute("UPDATE video SET status = %s WHERE id = %s", [progress_time, video_id])
        db.commit()
    out, err = p.communicate()
    retcode = p.wait()
except IOError:
    pass
else:
    print "] ENCODING OF VIDEO FINISHED:" + str(retcode)
You're right: because you have no way of exiting your infinite loop, it will just spin forever. What you need to do is check p.poll() to see if the process has exited (it will return None if it hasn't). So, something like:
while True:
    if p.poll() is not None:
        break
    ... other stuff...
or better yet:
while p.poll() is None:
    .... other stuff....
will cause your loop to terminate when the subprocess is complete. Then you can call p.communicate() to get the output.
I would also suggest using a sleep or delay in there so that your loop doesn't spin using 100% of your CPU. Only check and update your database every second, not continuously. So:
while p.poll() is None:
    time.sleep(1)
    ...other stuff...
In addition to the infinite loop issue pointed out by @clemej, there is also a possibility of a deadlock, because you don't read from the p.stdout pipe in the loop despite stdout=subprocess.PIPE, i.e., while p.poll() is None: will also loop forever if avconv generates enough output to fill its stdout OS pipe buffer.
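One way to avoid that deadlock while still polling is to drain stdout in a background thread; a rough sketch, reusing the command and variables from the question (stderr is merged into stdout here so a single thread can drain everything):
import time
import subprocess
from threading import Thread

p = subprocess.Popen(["avconv", "-y", "-t", "-i", images, "-i", music_file, video_filename],
                     universal_newlines=True,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

def drain(pipe):
    # Consume output so the child never blocks on a full pipe buffer.
    for line in iter(pipe.readline, ''):
        pass
    pipe.close()

Thread(target=drain, args=(p.stdout,)).start()

while p.poll() is None:
    time.sleep(1)
    # ... update the database record here, as in the question ...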
Also, I don't see the point of updating the progress time in the database while the process is still running. All you need is two records:
video_id, start_time  # running jobs
video_id, end_time    # finished jobs
If the job is not finished, then the progress time is current_server_time - start_time.
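For example, a rough sketch of that idea using the cursor and video_id from the question (the start_time/end_time columns are invented here, and the SQL is MySQL-flavored):
# Mark the job as started (hypothetical start_time/end_time columns).
cursor.execute("UPDATE video SET start_time = UTC_TIMESTAMP(), end_time = NULL WHERE id = %s",
               [video_id])
db.commit()
# ... run the encode ...
cursor.execute("UPDATE video SET end_time = UTC_TIMESTAMP() WHERE id = %s", [video_id])
db.commit()
# Progress for a running job can then be computed at query time:
# SELECT TIMESTAMPDIFF(SECOND, start_time, UTC_TIMESTAMP()) FROM video WHERE id = ...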
If you don't need the output then you could redirect it to devnull:
import os
from datetime import datetime
from subprocess import call

try:
    from subprocess import DEVNULL  # Python 3
except ImportError:
    DEVNULL = open(os.devnull, 'r+b', 0)

utcnow = datetime.utcnow  # helper used below

start_time = utcnow()
try:
    returncode = call(["avconv", "-y", "-t", "-i", images,
                       "-i", music_file, video_filename],
                      stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL)
finally:
    end_time = utcnow()

Why does Popen.poll() return a return code of None even though the sub-process has completed?

I have some Python code that runs on Windows and spawns a subprocess, then waits for it to complete. The subprocess isn't well behaved, so the script makes a non-blocking spawn call and watches the process on the side. If some timeout threshold is met, it kills the process off, assuming it has gone off the rails.
In some instances, which are non-reproducible, the spawned subprocess will just disappear and the watcher routine won't pick up on this fact. It'll keep watching until the timeout threshold is passed, try to kill the subprocess, get an error, and then exit.
What might be making the disappearance of the subprocess undetectable to the watcher? Why isn't the return code trapped and returned by the call to Popen.poll()?
The code I use to spawn and watch the process follows:
import subprocess
import time

def nonblocking_subprocess_call(cmdline):
    print 'Calling: %s' % (' '.join(cmdline))
    p = subprocess.Popen(cmdline, shell=False,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p

def monitor_subprocess(handle, timeout=1200):
    start_time = time.time()
    return_code = 0
    while True:
        time.sleep(60)
        return_code = handle.poll()
        if return_code == None:
            # The process is still running.
            if time.time() - start_time > timeout:
                print 'Timeout (%d seconds) exceeded -- killing process %i' % (timeout, handle.pid)
                return_code = handle.terminate()
                # give the kill command a few seconds to work
                time.sleep(5)
                if not return_code:
                    print 'Error: Failed to kill subprocess %i -- return code was: %s' % (handle.pid, str(return_code))
                # Raise an error to indicate that the process was hung up
                # and we had to kill it.
                raise RuntimeError
        else:
            print 'Process exited with return code: %i' % (return_code)
            break
    return return_code
What I'm seeing is that, in cases where the process has disappeared, the call to return_code = handle.poll() in monitor_subprocess is returning None instead of a return code. I know the process has gone away completely -- I can see that it is no longer there in Task Manager. And I know the process disappeared long before the timeout value was reached.
Can you give an example of your cmdline variable? And also what kind of subprocess are you spawning?
I ran this on a test script, calling a batch file containing the command:
ping -n 151 127.0.0.1>nul
(which sleeps for 150 seconds), and it worked fine.
It may be that your subprocess isn't terminating correctly. Also, try changing your sleep command to something like time.sleep(2).
In the past I've found this to work better than a longer sleep (especially if your subprocess is another Python process).
Also, I'm not sure if your script has this, but in the else: statement you have an extra parenthesis:
else:
    #print 'Process exited with return code: %i' % (return_code))
    # There's an extra closing parenthesis
    print 'Process exited with return code: %i' % (return_code)
    break
And how come you have a global temp_cmdline being called in the join statement:
print 'Calling: %s' % (' '.join(temp_cmdline))
I'm not sure if cmdline is being parsed from a list variable temp_cmdline, or if temp_cmdline is being created from a string split on spaces. Either way, if your cmdline variable is a string, then would it make more sense to just print it?
print 'Calling: %s' % cmdline
The poll method on subprocess objects does not seem to work too well.
I used to have the same issues while I was spawning some threads to do some job.
I suggest that you use the multiprocessing module.
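For what it's worth, a minimal sketch of that multiprocessing suggestion (the worker function here is a made-up stand-in, not the question's command):
import time
import multiprocessing

def worker():
    time.sleep(5)  # stand-in for the real work

if __name__ == '__main__':
    proc = multiprocessing.Process(target=worker)
    proc.start()
    proc.join(timeout=10)    # wait up to 10 seconds
    if proc.is_alive():      # still running after the timeout: treat as hung
        proc.terminate()
        proc.join()
    print(proc.exitcode)     # negative signal number if terminated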
Popen.poll doesn't work as expected if stdout is captured by something else; you can check by taking out this part of the code: ", stdout=subprocess.PIPE".

Run a process and kill it if it doesn't end within one hour

I need to do the following in Python. I want to spawn a process (subprocess module?), and:
if the process ends normally, to continue exactly from the moment it terminates;
if, otherwise, the process "gets stuck" and doesn't terminate within (say) one hour, to kill it and continue (possibly giving it another try, in a loop).
What is the most elegant way to accomplish this?
The subprocess module will be your friend. Start the process to get a Popen object, then pass it to a function like the one below. Note that this only raises an exception on timeout. If desired, you can catch the exception and call the kill() method on the Popen object. (kill() is new in Python 2.6, by the way.)
import time

def wait_timeout(proc, seconds):
    """Wait for a process to finish, or raise an exception after timeout"""
    start = time.time()
    end = start + seconds
    interval = min(seconds / 1000.0, .25)
    while True:
        result = proc.poll()
        if result is not None:
            return result
        if time.time() >= end:
            raise RuntimeError("Process timed out")
        time.sleep(interval)
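A possible usage sketch, combining this with the kill() suggestion above (the command line is a placeholder):
import subprocess

proc = subprocess.Popen(['some_command'])  # placeholder command
try:
    returncode = wait_timeout(proc, 60 * 60)  # one hour
except RuntimeError:
    # Timed out: kill it and continue (or retry in a loop).
    proc.kill()
    proc.wait()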
There are at least 2 ways to do this by using psutil as long as you know the process PID.
Assuming the process is created as such:
import subprocess
subp = subprocess.Popen(['progname'])
...you can get its creation time in a busy loop like this:
import psutil, time
TIMEOUT = 60 * 60 # 1 hour
p = psutil.Process(subp.pid)
while 1:
    if (time.time() - p.create_time()) > TIMEOUT:
        p.kill()
        raise RuntimeError('timeout')
    time.sleep(5)
...or simply, you can do this:
import psutil
p = psutil.Process(subp.pid)
try:
    p.wait(timeout=60*60)
except psutil.TimeoutExpired:
    p.kill()
    raise
Also, while you're at it, you might be interested in the following extra APIs:
>>> p.status()
'running'
>>> p.is_running()
True
>>>
I had a similar question and found this answer. Just for completeness, I want to add one more way to terminate a hanging process after a given amount of time: the Python signal library.
https://docs.python.org/2/library/signal.html
From the documentation:
import signal, os

def handler(signum, frame):
    print 'Signal handler called with signal', signum
    raise IOError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)  # Disable the alarm
Since you wanted to spawn a new process anyway, this might not be the best solution for your problem, though. (Note, too, that signal.alarm is only available on Unix.)
A nice, passive way is also to use a threading.Timer and set up a callback function.
from threading import Timer
import subprocess

# execute the command
p = subprocess.Popen(command)
# save the proc object - either on a class (like this example), or 'p' can be global
self.p = p
# configure and initialize the timer;
# kill_proc is a callback function which can also be a method on the class or simply a global
t = Timer(seconds, self.kill_proc)
# start the timer
t.start()
# wait for the process to return
rcode = p.wait()
t.cancel()
If the process finishes in time, wait() returns and the code continues here, and cancel() stops the timer. If, meanwhile, the timer runs out and executes kill_proc in a separate thread, wait() will also return here and cancel() will do nothing. By the value of rcode you will know whether we timed out or not (on Unix, a process killed by a signal has a negative return code). The simplest kill_proc (you can of course do anything extra there):
def kill_proc(self):
    os.kill(self.p.pid, signal.SIGTERM)
Kudos to Peter Shinners for his nice suggestion about the subprocess module. I was using exec() before and had no control over running time, and especially no way to terminate the process. My simplest template for this kind of task is the following; I just use the timeout parameter of subprocess.run() to monitor the running time. Of course you can capture standard output and error as well if needed:
from subprocess import run, TimeoutExpired, CalledProcessError

# fls is a list of script paths and f is an open log file (both from the author's context)
for file in fls:
    try:
        run(["python3.7", file], check=True, timeout=7200)  # 2 hours timeout
        print("scraped :)", file)
    except TimeoutExpired:
        message = "Timeout :( !!!"
        print(message, file)
        f.write("{message} {file}\n".format(file=file, message=message))
    except CalledProcessError:
        message = "SOMETHING HAPPENED :( !!!, CHECK"
        print(message, file)
        f.write("{message} {file}\n".format(file=file, message=message))

Python, Popen and select - waiting for a process to terminate or a timeout

I run a subprocess using:
p = subprocess.Popen("subprocess",
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdin=subprocess.PIPE)
This subprocess could either exit immediately with an error on stderr, or keep running. I want to detect either of these conditions - the latter by waiting for several seconds.
I tried this:
SECONDS_TO_WAIT = 10
select.select([],
              [p.stdout, p.stderr],
              [p.stdout, p.stderr],
              SECONDS_TO_WAIT)
but it just returns:
([],[],[])
on either condition. What can I do?
Have you tried using the Popen.poll() method? You could just do this:
p = subprocess.Popen("subprocess",
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdin=subprocess.PIPE)
time.sleep(SECONDS_TO_WAIT)
retcode = p.poll()
if retcode is not None:
    pass  # process has terminated
This will cause you to always wait 10 seconds, but if the failure case is rare this would be amortized over all the success cases.
Edit:
How about:
t_nought = time.time()
seconds_passed = 0
while p.poll() is None and seconds_passed < 10:
    seconds_passed = time.time() - t_nought
if seconds_passed >= 10:
    pass  # TIMED OUT
Additionally, looking at the select call documentation again, I think you may want to change it as follows:
SECONDS_TO_WAIT = 10
select.select([p.stderr],
              [],
              [p.stdout, p.stderr],
              SECONDS_TO_WAIT)
Since you would typically want to read from stderr, you want to know when it has something available to read (i.e., the failure case).
I hope this helps.
This is what I came up with. It works both when you need a timeout on the process and when you don't, but with a semi-busy loop.
import os
import signal
import time
import subprocess

def runCmd(cmd, timeout=None):
    '''
    Will execute a command, read the output and return it back.

    @param cmd: command to execute
    @param timeout: process timeout in seconds
    @return: a tuple of three: first stdout, then stderr, then exit code
    @raise OSError: on missing command or if a timeout was reached
    '''
    ph_out = None  # process output
    ph_err = None  # stderr
    ph_ret = None  # return code

    p = subprocess.Popen(cmd, shell=True,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # if timeout is not set, wait for the process to complete
    if not timeout:
        ph_ret = p.wait()
    else:
        fin_time = time.time() + timeout
        while p.poll() is None and fin_time > time.time():
            time.sleep(1)
        # if the timeout was reached, raise an exception
        if fin_time < time.time():
            # starting with 2.6, subprocess has a kill() method which is preferable
            # p.kill()
            os.kill(p.pid, signal.SIGKILL)
            raise OSError("Process timeout has been reached")
        ph_ret = p.returncode
    ph_out, ph_err = p.communicate()
    return (ph_out, ph_err, ph_ret)
Here is a nice example:
from threading import Timer
from subprocess import Popen, PIPE
proc = Popen("ping 127.0.0.1", shell=True)
t = Timer(60, proc.kill)
t.start()
proc.wait()
Using select and sleeping doesn't really make much sense. select (or any kernel polling mechanism) is inherently useful for asynchronous programming, but your example is synchronous. So either rewrite your code to use the normal blocking fashion or consider using Twisted:
from twisted.internet.utils import getProcessOutputAndValue
from twisted.internet import reactor

def stop(r):
    reactor.stop()

def eb(reason):
    reason.printTraceback()

def cb(result):
    stdout, stderr, exitcode = result
    # do something

getProcessOutputAndValue('/bin/someproc', []
    ).addCallback(cb).addErrback(eb).addBoth(stop)

reactor.run()
Incidentally, there is a safer way of doing this with Twisted by writing your own ProcessProtocol:
http://twistedmatrix.com/projects/core/documentation/howto/process.html
Python 3.3
import subprocess as sp

try:
    sp.check_call(["/subprocess"], timeout=10,
                  stdin=sp.DEVNULL, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
except sp.TimeoutExpired:
    pass  # timeout (the subprocess is killed at this point)
except sp.CalledProcessError:
    pass  # subprocess failed before timeout
else:
    pass  # subprocess ended successfully before timeout
See TimeoutExpired docs.
If, as you said in the comments above, you're just tweaking the output each time and re-running the command, would something like the following work?
from threading import Timer
import subprocess

WAIT_TIME = 10.0

def check_cmd(cmd):
    p = subprocess.Popen(cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    def _check():
        if p.poll() is None:
            print cmd + " did not quit within the given time period."
    # check whether the given process has exited WAIT_TIME
    # seconds from now
    Timer(WAIT_TIME, _check).start()

check_cmd('echo')
check_cmd('python')
The code above, when run, outputs:
python did not quit within the given time period.
The only downside of the above code that I can think of is the potentially overlapping processes as you keep running check_cmd.
This is a paraphrase of Evan's answer, but it takes into account the following:
Explicitly canceling the Timer object: if the timer interval were long and the process exited by its "own will", this could hang your script :(
There is an intrinsic race in the Timer approach (the timer may attempt to kill the process just after the process has died, and on Windows this will raise an exception).
import os
from subprocess import Popen
from threading import Timer

timeout_in_sec = 60 * 60  # for example: one hour

DEVNULL = open(os.devnull, "wb")
process = Popen("c:/myExe.exe", stdout=DEVNULL)  # no need for stdout

def kill_process():
    """ Kill process helper """
    try:
        process.kill()
    except OSError:
        pass  # Swallow the error

timer = Timer(timeout_in_sec, kill_process)
timer.start()
process.wait()
timer.cancel()
