How can I keep the ROS publisher publishing messages while calling a subprocess:
import subprocess
import rospy
from std_msgs.msg import String

class Pub():
    def __init__(self):
        pass

    def updateState(self, msg):
        cmd = ['python3', planner_path, "--alias", search_options, "--plan-file", plan_path, domain_path, problem_path]
        subprocess.run(cmd, shell=False, stdout=subprocess.PIPE)
        self.plan_pub.publish(msg)

    def myPub(self):
        rospy.init_node('problem_formulator', anonymous=True)
        self.plan_pub = rospy.Publisher("plan", String, queue_size=10)
        rate = rospy.Rate(10)  # 10hz
        rospy.Subscriber('model', String, self.updateState)
        rospy.sleep(1)
        rospy.spin()

if __name__ == "__main__":
    p_ = Pub()
    p_.myPub()
Since subprocess.run, like subprocess.call, is a blocking call, your subscription callback may take a long time to return. From the docs for call:
Run the command described by args. Wait for command to complete, then return the returncode attribute.
ROS itself will not call the callback again while it is already executing. This means you are blocking this callback and potentially preventing other callbacks from being called in time.
The simplest solution would be to replace it with subprocess.Popen, which will "Execute a child program in a new process" without blocking.
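For reference, a minimal sketch of that swap, keeping the cmd and pipe arguments from the question:

# Returns immediately; the planner keeps running in the background.
process = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)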
But keep in mind that this potentially starts the process multiple times quite fast.
Think about starting the process only if it is not already running. This can be achieved by checking in another thread whether the process has finished. Simple but effective: use a boolean flag. Here is a small prototype:
def updateState(self, msg):
    # Start the process only if it is not already running
    # (initialize self._process_running = False in __init__)
    if not self._process_running:
        p = subprocess.Popen(...)
        self._process_running = True

        def wait_process():
            # Poll until the process finishes, then clear the flag
            while p.poll() is None:
                time.sleep(0.1)
            self._process_running = False

        threading.Thread(target=wait_process).start()

    # Other callback code
    self.plan_pub.publish(msg)
I am preparing a Python multiprocessing tool where I use Process and Queue commands. The queue puts another script into a process to run in parallel. As a sanity check, in the queue, I want to check whether any error happens in my other script and return a flag/message if there was an error (status = os.system() will run the process and status is an error flag). But I can't pass errors from the queue/child in the consumer process back to the parent process. Following are the main parts of my code (shortened):
import os
import time
from multiprocessing import Process, Queue, Lock

command_queue = Queue()
lock = Lock()

p = Process(target=producer, args=(command_queue, lock, test_config_list_path))
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock))
    consumers.append(c)

p.daemon = True
p.start()
for c in consumers:
    c.daemon = True
    c.start()
p.join()
for c in consumers:
    c.join()

if error_flag:
    Stop_this_process_and_send_a_message!

def producer(queue, lock, ...):
    for config_path in test_config_list_path:
        queue.put((config_path, process_to_be_queued))

def consumer(queue, lock):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag = 1
        time.sleep(3)
Now I want to get that error_flag and use it in the main code to handle things. But it seems I can't pass error_flag from the consumer (child) part to the main part of the code. I'd appreciate it if someone can help with this.
Given your update, I also pass a multiprocessing.Event instance to your to_do process. This allows you to simply call wait on the event in the main process, which will block until set is called on it. Naturally, when to_do or one of its threads detects a script error, it would call set on the event after setting error_flag.value to True. This wakes up the main process, which can then call the terminate method on the process, which will do what you want. On normal completion of to_do, it is still necessary to call set on the event, since the main process blocks until the event has been set; but in this case the main process will just call join on the process.
Using a multiprocessing.Value instance alone would have required periodically checking its value in a loop, so I think waiting on a multiprocessing.Event is better. I have also made a couple of other updates to your code with comments, so please review them:
import multiprocessing
from ctypes import c_bool
...

def to_do(event, error_flag):
    # Run the tests
    wrapper_threads.main(event, error_flag)
    # on error or normal process completion:
    event.set()

def git_pull_change(path_to_repo):
    repo = Repo(path_to_repo)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        # Call to time.sleep(some_number_of_seconds) should go here, right?
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    while True:
        status = git_pull_change(git_path)
        if status:
            # The repo was just pulled, so no point in doing it again:
            #repo = Repo(git_path)
            #repo.remotes.origin.pull()
            event = multiprocessing.Event()
            error_flag = multiprocessing.Value(c_bool, False, lock=False)
            process = multiprocessing.Process(target=to_do, args=(event, error_flag))
            process.start()
            # wait for an error or normal process completion:
            event.wait()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                process.terminate()  # Kill the process
            else:
                process.join()
            break
You should always tag multiprocessing questions with the platform you are running on. Since I do not see your process-creating code within an if __name__ == '__main__': block, I have to assume you are running on a platform that uses OS fork calls to create new processes, such as Linux.
That means your newly created processes inherit the value of error_flag when they are created but for all intents and purposes, if a process modifies this variable, it is modifying a local copy of this variable that exists in an address space that is unique to that process.
You need to create error_flag in shared memory and pass it as an argument to your process:
from multiprocessing import Value
from ctypes import c_bool
...

error_flag = Value(c_bool, False, lock=False)

for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock, error_flag))
    consumers.append(c)
...

if error_flag.value:
    ...
    #Stop_this_process_and_send_a_message!

def consumer(queue, lock, error_flag):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag.value = True
        time.sleep(3)
But I have a few questions/comments for you. You have in your original code the following statement:
if error_flag:
    Stop_this_process_and_send_a_message!
But this statement is located after you have already joined all the started processes. So what processes are there to stop, and where are you sending a message to? (You potentially have multiple consumers, any of which might be setting the error_flag. By the way, there is no need to do this under a lock, since setting the value to True is an atomic action.) And since you are joining all your processes, i.e. waiting for them to complete, I am not sure why you are making them daemon processes. You are also passing a Lock instance to your producer and consumers, but it is not being used at all.
Your consumers return when they get a None record from the queue. So if you have N consumers, the last N elements put on the queue need to be None.
I also see no need for having the producer process; the main process could just as well write all the records to the queue, either before or even after it starts the consumer processes (see the sketch after these comments).
The call to time.sleep(3) you have at the end of function consumer is unreachable.
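To illustrate those points, here is a minimal sketch of how the main process could fill the queue itself (one None sentinel per consumer) and read the shared error_flag; consumer_num, test_config_list_path and process_to_be_queued are assumed to exist as in the question:

import os
from multiprocessing import Process, Queue, Value
from ctypes import c_bool

def consumer(queue, error_flag):
    while True:
        elem = queue.get()
        if elem is None:            # sentinel: no more work
            return
        if os.system(elem[1]):      # non-zero exit status means the script failed
            error_flag.value = True

if __name__ == '__main__':
    command_queue = Queue()
    error_flag = Value(c_bool, False, lock=False)
    consumers = [Process(target=consumer, args=(command_queue, error_flag))
                 for _ in range(consumer_num)]
    for c in consumers:
        c.start()
    # the main process plays the role of the producer:
    for config_path in test_config_list_path:
        command_queue.put((config_path, process_to_be_queued))
    for _ in range(consumer_num):   # one sentinel per consumer
        command_queue.put(None)
    for c in consumers:
        c.join()
    if error_flag.value:
        print('at least one test script failed')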
So the code summary above is the inner process that runs some tests in parallel. I removed the function definition from it; just assume that it is the wrapper_threads in the following code summary. Here I'll add the parent process, which is checking a variable (let's say a commit in my git repo). The following process is meant to run indefinitely, and when there is a change it will trigger the multiprocessing code in the main question:
def to_do():
    # Run the tests
    wrapper_threads.main()

def git_pull_change(path_to_repo):
    repo = Repo(path_to_repo)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    process = None
    while True:
        status = git_pull_change(git_path)
        if status:
            repo = Repo(git_path)
            repo.remotes.origin.pull()
            process = multiprocessing.Process(target=to_do)
            process.start()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                os.system('pkill -U user XXX')
                break
Now I want to propagate that error_flag from the child process to this process and stop process XXX. The problem is that I don't know how to bring that error_flag to this (grand)parent process.
import subprocess

def adbshell(command, serial=None, adbpath='adb'):
    args = [adbpath]
    if serial is not None:
        args.extend(['-s', serial])
    args.extend(['shell', command])
    return subprocess.check_output(args)

def pmpath(serial=None, adbpath='adb'):
    return adbshell('am instrument -e class............', serial=serial, adbpath=adbpath)
I have to run this test for a specific time period, and then exit if it is not working. How do I provide a timeout?
It depends on which Python version you are running.
Python 3.3 onwards:
subprocess.check_output() provides a timeout parameter. Check the signature here:
subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False, timeout=None)
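For example, the adbshell helper from the question could simply pass the timeout through (a minimal sketch; on timeout the child is killed and subprocess.TimeoutExpired is raised):

import subprocess

def adbshell(command, serial=None, adbpath='adb', timeout=None):
    args = [adbpath]
    if serial is not None:
        args.extend(['-s', serial])
    args.extend(['shell', command])
    # With timeout=None this behaves exactly as before; otherwise
    # subprocess.TimeoutExpired is raised if the command runs too long.
    return subprocess.check_output(args, timeout=timeout)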
Below Python 3.3:
You can use the threading module. Something like:
import subprocess
import threading

def run(args, timeout):
    def target():
        print 'Start thread'
        subprocess.check_output(args)
        print 'End thread'

    thread = threading.Thread(target=target)
    thread.start()         # Start executing target()
    thread.join(timeout)   # Wait for the thread, at most `timeout` seconds
    # Note: if the join times out, the subprocess itself is still running;
    # you would need a Popen handle to actually terminate it.
Note - I haven't tested the code above with threading and check_output(). Normally I use subprocess.Popen(), which offers more flexibility and handles almost all scenarios. Check the doc.
The Popen constructor provides more flexibility, as it can be used to check the exit status of the subprocess call.
Popen.poll returns None if the process has not terminated yet. Hence, call the subprocess, then sleep for the required timeout.
Consider a simple test.py which is the subprocess called from the main program:
import time

for i in range(10):
    print i
    time.sleep(2)
test.py is called from another program using subprocess.Popen:
from subprocess import Popen, PIPE
import time

cmd = Popen(['python', 'test.py'], stdout=PIPE)
print cmd.poll()
time.sleep(2)
if cmd.poll() is None:
    print "killing"
    cmd.terminate()
time.sleep(2)
This provides a timeout of 2 seconds so that the program can execute; it then checks the exit status of the process using Popen.poll, and if that returns None, the process has not terminated yet, so it kills the process.
I would like to repeatedly execute a subprocess as fast as possible. However, sometimes the process will take too long, so I want to kill it.
I use signal.signal(...) like below:
ppid = pipeexe.pid
signal.signal(signal.SIGALRM, stop_handler)
signal.alarm(1)
.....

def stop_handler(signal, frame):
    print 'Stop test'+testdir+'for time out'
    if(pipeexe.poll()==None and hasattr(signal, "SIGKILL")):
        os.kill(ppid, signal.SIGKILL)
        return False
But sometimes this code will stop the next round from executing:
Stop test/home/lu/workspace/152/treefit/test2for time out
/bin/sh: /home/lu/workspace/153/squib_driver: not found --- this is the next execution, which the program wrongly stops.
Does anyone know how to solve this? I want to stop the process right at the timeout rather than always spending a full second: time.sleep(n) usually waits the whole n seconds, and I do not want that, because the command may finish in less than 1 second.
You could do something like this:
import subprocess as sub
import threading

class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()

    def Run(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()  # use self.p.kill() if the process needs a kill -9
            self.join()

RunCmd(["./someProg", "arg1"], 60).Run()
The idea is that you create a thread that runs the command, and kill the process if the timeout exceeds some suitable value, in this case 60 seconds.
Here is something I wrote as a watchdog for subprocess execution. I use it a lot now, but I'm not so experienced, so maybe there are some flaws in it:
import subprocess
import time

def subprocess_execute(command, time_out=60):
    """executing the command with a watchdog"""
    # launching the command
    c = subprocess.Popen(command)

    # now waiting for the command to complete
    t = 0
    while t < time_out and c.poll() is None:
        time.sleep(1)  # (comment 1)
        t += 1

    # there are two possibilities for the while to have stopped:
    if c.poll() is None:
        # in the case the process did not complete, we kill it
        c.terminate()
        # and fill the return code with some error value
        returncode = -1  # (comment 2)
    else:
        # in the case the process completed normally
        returncode = c.poll()

    return returncode
Usage:
returncode = subprocess_execute(['java', '-jar', 'some.jar'])
Comments:
here, the watchdog timeout is in seconds, but it is easy to change to whatever is needed by changing the time.sleep() value; the time_out parameter will then have to be documented accordingly;
depending on what is needed, it may be more suitable here to raise an exception instead of returning an error value (a minimal sketch of that variant follows below).
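For instance, the error branch of subprocess_execute could raise instead of returning -1 (a small sketch):

if c.poll() is None:
    # the process did not complete in time: kill it and report the failure
    c.terminate()
    raise RuntimeError("command %r timed out after %d seconds" % (command, time_out))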
Documentation: I struggled a bit with the documentation of the subprocess module before understanding that subprocess.Popen is not blocking; the process is executed in parallel (maybe I am not using the correct word here, but I think it's understandable).
But since what I wrote runs linearly, I really have to wait for the command to complete, with a timeout to prevent a bug in the command from stalling the nightly execution of the script.
I guess this is a common synchronization problem in event-oriented programming with threads and processes.
If you should only ever have one subprocess running, make sure the current subprocess is killed before running the next one. Otherwise the signal handler may get a reference to the most recently started subprocess and ignore the older one.
Suppose subprocess A is running. Before the alarm signal is handled, subprocess B is launched. Just after that, your alarm signal handler attempts to kill a subprocess. As the current PID (or the current subprocess pipe object) was set to B's when launching the subprocess, B gets killed and A keeps running.
Is my guess correct?
To make your code easier to understand, I would place the part that creates a new subprocess right after the part that kills the current subprocess. That would make it clear that there is only one subprocess running at any time. The signal handler could do both the killing and the launching, as if it were the iteration block of a loop, in this case event-driven by the alarm signal every second.
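A rough sketch of that structure, where the handler both kills the current subprocess and launches the next one (the commands list is just a placeholder):

import os, signal, subprocess

commands = [['./prog_a'], ['./prog_b']]   # placeholder commands
current = None                            # the single subprocess that may be running

def start_next(signum=None, frame=None):
    """Kill the current subprocess if it is still alive, then launch the next one."""
    global current
    if current is not None and current.poll() is None:
        os.kill(current.pid, signal.SIGKILL)
        current.wait()                    # reap it so it does not linger as a zombie
    if commands:
        current = subprocess.Popen(commands.pop(0))
        signal.alarm(1)                   # re-arm the 1-second timeout for the new process

signal.signal(signal.SIGALRM, start_next) # the handler kills *and* launches
start_next()
while commands or (current is not None and current.poll() is None):
    signal.pause()                        # wait for the next alarm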
Here's what I use:
import os
import signal
import subprocess
import threading

class KillerThread(threading.Thread):
    def __init__(self, pid, timeout, event):
        threading.Thread.__init__(self)
        self.pid = pid
        self.timeout = timeout
        self.event = event
        self.setDaemon(True)

    def run(self):
        self.event.wait(self.timeout)
        if not self.event.isSet():
            try:
                os.kill(self.pid, signal.SIGKILL)
            except OSError:
                # This is raised if the process has already completed
                pass

def runTimed(dt, dir, args, kwargs):
    event = threading.Event()
    cwd = os.getcwd()
    os.chdir(dir)
    proc = subprocess.Popen(args, **kwargs)
    os.chdir(cwd)

    killer = KillerThread(proc.pid, dt, event)
    killer.start()

    (stdout, stderr) = proc.communicate()
    event.set()

    return (stdout, stderr, proc.returncode)
A bit more complex: I added an answer to a similar problem about capturing stdout, feeding stdin, and being able to terminate after some time of inactivity and/or after some overall runtime.
I have a service that is running (Twisted jsonrpc server). When I make a call to "run_procs" the service will look at a bunch of objects and inspect their timestamp property to see if they should run. If they should, they get added to a thread_pool (list) and then every item in the thread_pool gets the start() method called.
I have used this setup for several other applications where I wanted to run a function within my class with threading. However, when I use a subprocess.Popen call in the function called by each thread, the calls run one at a time instead of concurrently as I would expect.
Here is some sample code:
class ProcService(jsonrpc.JSONRPC):

    self.thread_pool = []
    self.running_threads = []
    self.lock = threading.Lock()

    def clean_pool(self, thread_pool, join=False):
        for th in [x for x in thread_pool if not x.isAlive()]:
            if join: th.join()
            thread_pool.remove(th)
            del th
        return thread_pool

    def run_threads(self, parallel=10):
        while len(self.running_threads)+len(self.thread_pool) > 0:
            self.clean_pool(self.running_threads, join=True)
            n = min(max(parallel - len(self.running_threads), 0), len(self.thread_pool))
            if n > 0:
                for th in self.thread_pool[0:n]: th.start()
                self.running_threads.extend(self.thread_pool[0:n])
                del self.thread_pool[0:n]
            time.sleep(.01)
        for th in self.running_threads+self.thread_pool: th.join()

    def jsonrpc_run_procs(self):
        for i, item in enumerate(self.items):
            if item.should_run():
                self.thread_pool.append(threading.Thread(target=self.run_proc, args=tuple([item])))
        self.run_threads(5)

    def run_proc(self, proc):
        self.lock.acquire()
        print "\nSubprocess started"
        p = subprocess.Popen('%s/program_to_run.py %s' %(os.getcwd(), proc.data), shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE,)
        stdout_value = proc.communicate('through stdin to stdout')[0]
        self.lock.release()
Any help/suggestions are appreciated.
* EDIT *
OK. So now I want to read back the output from the stdout pipe. This works some of the time, but it also fails with select.error: (4, 'Interrupted system call'). I assume this is because sometimes the process has already terminated before I try to run the communicate method. The code in the run_proc method has been changed to:
def run_proc(self, proc):
    self.lock.acquire()
    p = subprocess.Popen( #etc
    self.running_procs.append([p, proc.data.id])
    self.lock.release()
After I call self.run_threads(5) I call self.check_procs(). The check_procs method iterates over the list of running_procs and checks whether poll() is not None. How can I get the output from the pipe? I have tried both of the following:
calling check_procs once:
def check_procs(self):
    for proc_details in self.running_procs:
        proc = proc_details[0]
        while (proc.poll() == None):
            time.sleep(0.1)
        stdout_value = proc.communicate('through stdin to stdout')[0]
        self.running_procs.remove(proc_details)
        print proc_details[1], stdout_value
        del proc_details
calling check_procs in a while loop like:
while len(self.running_procs) > 0:
    self.check_procs()

def check_procs(self):
    for proc_details in self.running_procs:
        if (proc.poll() is not None):
            stdout_value = proc.communicate('through stdin to stdout')[0]
            self.running_procs.remove(proc_details)
            print proc_details[1], stdout_value
            del proc_details
I think the key code is:
self.lock.acquire()
print "\nSubprocess started"
p = subprocess.Popen( # etc
stdout_value = proc.communicate('through stdin to stdout')[0]
self.lock.release()
The explicit calls to acquire and release should guarantee serialization -- don't you observe serialization just as invariably if you do other things in this block instead of using the subprocess?
Edit: all silence here, so I'll add the suggestion to remove the locking and instead put each stdout_value on a Queue.Queue() instance -- Queue is intrinsically thread-safe (it deals with its own locking), so you can get (or get_nowait, etc.) results from it once they're ready and have been put there. In general, Queue is the best way to arrange thread communication (and often synchronization too) in Python, any time it can feasibly be arranged that way.
Specifically: add import Queue at the start; give up making, acquiring and releasing self.lock (just delete those three lines); add self.q = Queue.Queue() to the __init__; right after the call stdout_value = proc.communicate(... add one statement self.q.put(stdout_value); now, e.g., finish the jsonrpc_run_procs method with
while not self.q.empty():
    result = self.q.get()
    print 'One result is %r' % result
to confirm that all the results are there. (Normally the empty method of queues is not reliable, but in this case all threads putting to the queue are already finished, so you should be fine).
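Putting those pieces together, run_proc could then look roughly like this (a sketch; it assumes self.q = Queue.Queue() was created in __init__ as described above):

def run_proc(self, proc):
    # no lock here: each thread runs its own subprocess concurrently
    p = subprocess.Popen('%s/program_to_run.py %s' % (os.getcwd(), proc.data),
                         shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdout_value = p.communicate('through stdin to stdout')[0]
    self.q.put(stdout_value)   # thread-safe hand-off to the main thread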
Your specific problem is probably caused by the line stdout_value = proc.communicate('through stdin to stdout')[0]. Subprocess.communicate will "Wait for process to terminate", which, when used while holding a lock, makes the calls run one at a time.
What you can do is simply add the p variable to a list and use the subprocess API to wait for the subprocesses to finish, periodically polling each subprocess in your main thread.
On second look, it looks like you may have an issue on this line as well: for th in self.running_threads+self.thread_pool: th.join(). Thread.join() is another method that will wait for the thread to finish.
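A rough sketch of that polling approach, reusing the running_procs list of (Popen, id) pairs from the edit (it assumes any stdin input was written when the process was started; very large outputs would still need to be drained earlier to keep the pipe from filling up):

def check_procs(self):
    # poll each subprocess from the main thread and harvest output once it exits
    while self.running_procs:
        for proc_details in self.running_procs[:]:   # iterate over a copy while removing
            p = proc_details[0]
            if p.poll() is not None:                 # the process has finished
                stdout_value = p.stdout.read()       # pipe is at EOF, safe to drain
                print proc_details[1], stdout_value
                self.running_procs.remove(proc_details)
        time.sleep(0.1)                              # avoid busy-waiting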
I need to do the following in Python. I want to spawn a process (subprocess module?), and:
if the process ends normally, to continue exactly from the moment it terminates;
if, otherwise, the process "gets stuck" and doesn't terminate within (say) one hour, to kill it and continue (possibly giving it another try, in a loop).
What is the most elegant way to accomplish this?
The subprocess module will be your friend. Start the process to get a Popen object, then pass it to a function like this. Note that it only raises an exception on timeout. If desired, you can catch the exception and call the kill() method on the Popen object. (kill is new in Python 2.6, by the way.)
import time

def wait_timeout(proc, seconds):
    """Wait for a process to finish, or raise exception after timeout"""
    start = time.time()
    end = start + seconds
    interval = min(seconds / 1000.0, .25)

    while True:
        result = proc.poll()
        if result is not None:
            return result
        if time.time() >= end:
            raise RuntimeError("Process timed out")
        time.sleep(interval)
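Usage could look roughly like this (the command is a placeholder):

import subprocess

proc = subprocess.Popen(['some_command', 'arg'])   # placeholder command
try:
    returncode = wait_timeout(proc, 60 * 60)       # give it one hour
except RuntimeError:
    proc.kill()                                    # kill() requires Python 2.6+
    proc.wait()                                    # reap the killed process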
There are at least 2 ways to do this by using psutil as long as you know the process PID.
Assuming the process is created as such:
import subprocess
subp = subprocess.Popen(['progname'])
...you can get its creation time in a busy loop like this:
import psutil, time

TIMEOUT = 60 * 60  # 1 hour

p = psutil.Process(subp.pid)
while 1:
    if (time.time() - p.create_time()) > TIMEOUT:
        p.kill()
        raise RuntimeError('timeout')
    time.sleep(5)
...or simply, you can do this:
import psutil

p = psutil.Process(subp.pid)
try:
    p.wait(timeout=60*60)
except psutil.TimeoutExpired:
    p.kill()
    raise
Also, while you're at it, you might be interested in the following extra APIs:
>>> p.status()
'running'
>>> p.is_running()
True
>>>
I had a similar question and found this answer. Just for completeness, I want to add one more way to terminate a hanging process after a given amount of time: the Python signal library.
https://docs.python.org/2/library/signal.html
From the documentation:
import signal, os

def handler(signum, frame):
    print 'Signal handler called with signal', signum
    raise IOError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)  # Disable the alarm
Since you wanted to spawn a new process anyway, this might not be the best solution for your problem, though.
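If you do want to combine it with a subprocess, a rough sketch could look like this (assuming Python 3.5+, where the interrupted wait() is automatically restarted after the handler returns; the command is a placeholder):

import signal, subprocess

proc = subprocess.Popen(['some_long_running_command'])   # placeholder command

def handler(signum, frame):
    proc.kill()                  # the blocked wait() below then returns

signal.signal(signal.SIGALRM, handler)
signal.alarm(60 * 60)            # one hour
returncode = proc.wait()         # returns early if the handler killed the child
signal.alarm(0)                  # disable the alarm on normal completion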
A nice, passive way is also to use a threading.Timer and set up a callback function.
from threading import Timer

# execute the command
p = subprocess.Popen(command)

# save the proc object - either if you make this into a class (like the example), or 'p' can be global
self.p = p

# config and init timer
# kill_proc is a callback function which can also be added onto the class or simply be global
t = Timer(seconds, self.kill_proc)

# start timer
t.start()

# wait for the test process to return
rcode = p.wait()
t.cancel()
If the process finishes in time, wait() returns and the code continues here; cancel() then stops the timer. If instead the timer runs out and executes kill_proc in a separate thread, wait() will also continue here and cancel() will do nothing. By the value of rcode you will know whether we timed out or not. The simplest kill_proc (you can of course do anything extra there):
def kill_proc(self):
    os.kill(self.p.pid, signal.SIGTERM)
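A slightly extended kill_proc could also record that the timeout fired, so the caller does not have to infer it from rcode alone (a small sketch; the timed_out attribute is an addition, not part of the original example):

def kill_proc(self):
    self.timed_out = True        # remember that the timer, not the test, ended the process
    self.p.terminate()           # or self.p.kill() for an immediate SIGKILL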
Kudos to Peter Shinners for his nice suggestion about the subprocess module. I was using exec() before and did not have any control over the running time, and especially over terminating it. My simplest template for this kind of task is the following, and I am just using the timeout parameter of subprocess.run() to monitor the running time. Of course you can get standard out and error as well if needed:
from subprocess import run, TimeoutExpired, CalledProcessError

for file in fls:
    try:
        run(["python3.7", file], check=True, timeout=7200)  # 2 hours timeout
        print("scraped :)", file)
    except TimeoutExpired:
        message = "Timeout :( !!!"
        print(message, file)
        f.write("{message} {file}\n".format(file=file, message=message))
    except CalledProcessError:
        message = "SOMETHING HAPPENED :( !!!, CHECK"
        print(message, file)
        f.write("{message} {file}\n".format(file=file, message=message))