I have a service that is running (Twisted jsonrpc server). When I make a call to "run_procs" the service will look at a bunch of objects and inspect their timestamp property to see if they should run. If they should, they get added to a thread_pool (list) and then every item in the thread_pool gets the start() method called.
I have used this setup for several other applications where I wanted to run a function within my class with threading. However, when I use a subprocess.Popen call in the function called by each thread, the calls run one at a time instead of running concurrently like I would expect.
Here is some sample code:
class ProcService(jsonrpc.JSONRPC):
    def __init__(self):
        self.thread_pool = []
        self.running_threads = []
        self.lock = threading.Lock()
    def clean_pool(self, thread_pool, join=False):
        for th in [x for x in thread_pool if not x.isAlive()]:
            if join: th.join()
            thread_pool.remove(th)
            del th
        return thread_pool

    def run_threads(self, parallel=10):
        while len(self.running_threads) + len(self.thread_pool) > 0:
            self.clean_pool(self.running_threads, join=True)
            n = min(max(parallel - len(self.running_threads), 0), len(self.thread_pool))
            if n > 0:
                for th in self.thread_pool[0:n]: th.start()
                self.running_threads.extend(self.thread_pool[0:n])
                del self.thread_pool[0:n]
            time.sleep(.01)
        for th in self.running_threads + self.thread_pool: th.join()

    def jsonrpc_run_procs(self):
        for i, item in enumerate(self.items):
            if item.should_run():
                self.thread_pool.append(threading.Thread(target=self.run_proc, args=tuple([item])))
        self.run_threads(5)

    def run_proc(self, proc):
        self.lock.acquire()
        print "\nSubprocess started"
        p = subprocess.Popen('%s/program_to_run.py %s' % (os.getcwd(), proc.data), shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE,)
        stdout_value = proc.communicate('through stdin to stdout')[0]
        self.lock.release()
Any help/suggestions are appreciated.
* EDIT *
OK. So now I want to read back the output from the stdout pipe. This works some of the time, but it also fails with select.error: (4, 'Interrupted system call'). I assume this is because sometimes the process has already terminated before I try to run the communicate method. The code in the run_proc method has been changed to:
def run_proc(self, proc):
    self.lock.acquire()
    p = subprocess.Popen( #etc
    self.running_procs.append([p, proc.data.id])
    self.lock.release()
After I call self.run_threads(5) I call self.check_procs().
The check_procs method iterates the list of running_procs and checks whether poll() is not None. How can I get the output from the pipe? I have tried both of the following.
Calling check_procs once:
def check_procs(self):
    for proc_details in self.running_procs:
        proc = proc_details[0]
        while proc.poll() is None:
            time.sleep(0.1)
        stdout_value = proc.communicate('through stdin to stdout')[0]
        self.running_procs.remove(proc_details)
        print proc_details[1], stdout_value
        del proc_details
Calling check_procs in a while loop like:

while len(self.running_procs) > 0:
    self.check_procs()

def check_procs(self):
    for proc_details in self.running_procs:
        proc = proc_details[0]
        if proc.poll() is not None:
            stdout_value = proc.communicate('through stdin to stdout')[0]
            self.running_procs.remove(proc_details)
            print proc_details[1], stdout_value
        del proc_details
I think the key code is:
self.lock.acquire()
print "\nSubprocess started"
p = subprocess.Popen( # etc
stdout_value = proc.communicate('through stdin to stdout')[0]
self.lock.release()
The explicit calls to acquire and release do guarantee serialization -- don't you observe exactly the same serialization if you do other things in this block instead of the subprocess call?
Edit: all silence here, so I'll add the suggestion to remove the locking and instead put each stdout_value on a Queue.Queue() instance -- Queue is intrinsically threadsafe (it deals with its own locking), so you can get (or get_nowait, etc.) results from it once they're ready and have been put there. In general, Queue is the best way to arrange thread communication (and often synchronization too) in Python, whenever it can feasibly be arranged that way.
Specifically: add import Queue at the start; give up making, acquiring and releasing self.lock (just delete those three lines); add self.q = Queue.Queue() to the __init__; right after the call stdout_value = proc.communicate(... add one statement self.q.put(stdout_value); now e.g. finish the jsonrpc_run_procs method with
while not self.q.empty():
    result = self.q.get()
    print 'One result is %r' % result
to confirm that all the results are there. (Normally the empty method of queues is not reliable, but in this case all threads putting to the queue are already finished, so you should be fine).
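Put together, here is a minimal sketch of that Queue-based version; it reuses the question's jsonrpc base class, self.items, run_threads and the program_to_run.py command line, and everything else is just the changes described above:

import os
import Queue        # the module is named "queue" on Python 3
import subprocess
import threading

class ProcService(jsonrpc.JSONRPC):
    def __init__(self):
        self.thread_pool = []
        self.running_threads = []
        self.q = Queue.Queue()          # replaces self.lock

    def run_proc(self, proc):
        # No lock: every worker thread runs its own subprocess concurrently.
        p = subprocess.Popen('%s/program_to_run.py %s' % (os.getcwd(), proc.data),
                             shell=True, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
        stdout_value = p.communicate('through stdin to stdout')[0]
        self.q.put(stdout_value)        # hand the result back thread-safely

    def jsonrpc_run_procs(self):
        for item in self.items:
            if item.should_run():
                self.thread_pool.append(
                    threading.Thread(target=self.run_proc, args=(item,)))
        self.run_threads(5)
        # run_threads joins every worker, so all results are queued by now.
        while not self.q.empty():
            print 'One result is %r' % self.q.get()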
Your specific problem is probably caused by the line stdout_value = proc.communicate('through stdin to stdout')[0]. Popen.communicate will "Wait for process to terminate", which, when used with a lock, makes the calls run one at a time.
What you can do instead is add each p to a list and use the subprocess API to wait for the subprocesses to finish, periodically polling each subprocess from your main thread.
On second look, it looks like you may have an issue on this line as well: for th in self.running_threads+self.thread_pool: th.join(). Thread.join() is another method that will wait for the thread to finish.
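A rough sketch of that polling approach, reusing the command line and proc.data from the question; the up-front stdin write and the assumption that each child's output fits comfortably in the OS pipe buffer are mine, not part of the original code:

import os
import subprocess
import time

def run_proc(self, proc):
    # Start the child and return immediately; no lock is held and communicate()
    # is never called here, so the threads do not serialize each other.
    p = subprocess.Popen('%s/program_to_run.py %s' % (os.getcwd(), proc.data),
                         shell=True, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    p.stdin.write('through stdin to stdout')     # feed the input up front
    p.stdin.close()
    self.running_procs.append([p, proc.data.id])

def check_procs(self):
    # Poll from the main thread until every child has exited.
    while self.running_procs:
        for proc_details in self.running_procs[:]:   # iterate a copy; we mutate the list
            p = proc_details[0]
            if p.poll() is not None:                 # this child has finished
                stdout_value = p.stdout.read()       # drain whatever it wrote
                print proc_details[1], stdout_value
                self.running_procs.remove(proc_details)
        time.sleep(0.1)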
Related
How can I keep the ROS Publisher publishing the messages while calling a sub-process:
import subprocess
import rospy

class Pub():
    def __init__(self):
        pass

    def updateState(self, msg):
        cmd = ['python3', planner_path, "--alias", search_options, "--plan-file", plan_path, domain_path, problem_path]
        subprocess.run(cmd, shell=False, stdout=subprocess.PIPE)
        self.plan_pub.publish(msg)

    def myPub(self):
        rospy.init_node('problem_formulator', anonymous=True)
        self.plan_pub = rospy.Publisher("plan", String, queue_size=10)
        rate = rospy.Rate(10)  # 10hz
        rospy.Subscriber('model', String, updateState)
        rospy.sleep(1)
        rospy.spin()

if __name__ == "__main__":
    p_ = Pub()
    p_.myPub()
Since subprocess.run (like subprocess.call) is a blocking call, your subscription callback may take a long time:
Run the command described by args. Wait for command to complete, then return the returncode attribute.
ROS will not invoke the callback again while it is already executing. This means you are blocking this callback, and potentially other callbacks, from being called in time.
The simplest solution would be to replace the blocking call with subprocess.Popen, which will "Execute a child program in a new process" without blocking.
But keep in mind that this potentially starts the process multiple times quite fast.
Consider starting the process only if it is not already running. This can be achieved by checking in another thread whether the process has finished, using a simple but effective boolean flag. Here is a small prototype:
def updateState(self, msg):
    # Start the process if not already running
    if not self._process_running:
        p = subprocess.Popen(...)
        self._process_running = True

        def wait_process():
            while p.poll() is None:
                time.sleep(0.1)
            self._process_running = False

        threading.Thread(target=wait_process).start()

    # Other callback code
    self.plan_pub.publish(msg)
I'm making remote API calls using threads, with no join, so that the program can make the next API call without waiting for the last one to complete.
Like so:
def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return
The problem was that I needed to know when all the API calls were completed, so I moved to code that uses a queue and join.
The threads now seem to run serially.
I can't figure out how to get the join to work so that the threads execute in parallel.
What am I doing wrong?
def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.

    :param methods_list: example ((method name, args), (method_2, args), (method_3, args))
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')

    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()

    threads = []
    # starts worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)

    for method in methods_list:
        q.put(method[0](*method[1]))

    # block until all tasks are done
    q.join()

    # stop workers
    for i in range(num_worker_threads):
        q.put(None)
    for t in threads:
        t.join()

    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
    q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, this does all the work in the main thread and then puts each function's return value on the queue, which doesn't allow any parallelization aside from the printing (and printing is generally not a big enough cost to justify the thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it, queue it to be called in worker
and change the _worker function so it calls the function that does the work in the thread:
def _worker():
    while True:
        item = q.get()
        if item is None:           # check the sentinel before unpacking it
            break
        method, args = item        # Extract and unpack callable and arguments
        result = method(*args)     # Call callable with provided args and store result
        method_returns.append(result)
        _output(result)
        q.task_done()
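With those two changes, a call along these lines (the callables here are just stand-ins for real work) exercises the pool and returns once every queued method has actually run in a worker thread:

results = run_que_block([
    (pow, (2, 10)),            # each entry is (callable, args)
    (max, (3, 7)),
    (str.upper, ('hello',)),
], num_worker_threads=3)

print(results)                 # e.g. [1024, 7, 'HELLO'], in completion order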
This script used to have the Queue as a global object that could be accessed both where the threads were being instantiated and in the threaded function itself. To make things cleaner, I refactored it in a more "acceptable" way: instead of the global, I now pass the Queue's get() and task_done() methods into the threaded function when it is instantiated. However, join() now hangs indefinitely, whereas before, when the Queue was global, it would always run to completion and terminate appropriately.
For full context, here is the source: http://dpaste.com/0PD6SFX
However, here are what I think are the only relevant snippets of code. __init__ is essentially my main method, and the class it belongs to owns the Queue; run is the method of the Transcode class being run (the rest of the Transcode class isn't relevant, I feel):
def __init__(self, kargs):
    self.params = kargs

    # will store generated Transcode objects
    self.transcodes = []

    # Queue to thread ffmpeg processes with
    self.q = Queue.Queue()

    # generate transcodes with self.params for all rasters x formats
    self.generate_transcodes()

    # give the Queue the right number of task threads to make
    for i in range(self.get_num_transcodes()):
        self.q.put(i)

    for transcode in self.transcodes:
        transcode.print_commands()

    # testing code to be sure command strings are generating appropriately
    for transcode in self.transcodes:
        self.q.put(transcode)

    # kick off all transcodes by creating a new daemon (terminating on program close) thread;
    # the thread is given each Transcode's run() method, which is dynamically created within
    # the Transcode constructor given its command strings
    for transcode in self.transcodes:
        t = threading.Thread(target=transcode.run, args=(self.q.get, self.q.task_done))
        t.daemon = True
        t.start()

    print("Transcoding in progress...")

    # go through each transcode and print which process is currently underway, then sleep
    # 1 = first pass, 2 = second pass, 3 = complete
    while True:
        still_running = False
        for transcode in self.transcodes:
            if not transcode.complete:
                still_running = True
                print('Transcode %s still running!' % transcode.filename)
            if transcode.current_proc in range(3):
                print(os.path.basename(transcode.filename) + ': pass %s' % transcode.current_proc)
            else:
                print(os.path.basename(transcode.filename) + ': complete!')
            print(transcode.complete)
        if not still_running:
            break
        time.sleep(2)
        print('poll')

    # block until all items in queue are gotten and processed
    print('About to join...')
    self.q.join()
    print('End of transcode batch!')
'''
Executes sequentially the command strings given to this Transcode via subprocess; it will
first execute one, then the next, as the second command relies on the first being completed.
It will take in a get() and done() method that are Queue methods and call them at the right
moments to signify to an external Queue object that the worker thread needed for this instance
is ready to start and finish.
#param get A Queue get() function to be called at run()'s start.
#param done A Queue done() function to be called at run()'s end.
#return none
'''
def run(self, get, done):
    # get a thread from the queue
    get()

    # convert command lists to command strings
    for i in range(len(self.commands)):
        self.commands[i] = ' '.join(self.commands[i])

    # show that we're working with our first command string
    self.current_proc = 1

    # assign our current proc the first command string subprocess
    self.proc = Popen(self.commands[0], stdout=PIPE, stderr=PIPE, shell=True)

    # execute process until complete
    self.proc.communicate()
    print('Transcode %s first pass complete' % self.identifier)

    # run second command string if exists
    if len(self.commands) > 1:
        # show that we're working with second command string
        self.current_proc = 2

        # spawn second process and parse output line by line as before
        self.proc = Popen(self.commands[1], stdout=PIPE, stderr=PIPE, shell=True)

        # execute process until complete
        self.proc.communicate()
        print('Transcode %s second pass complete' % self.identifier)

    # delete log files when done
    if self.logfile and os.path.exists(self.logfile + '-0.log'):
        os.remove(self.logfile + '-0.log')
    if self.logfile and os.path.exists(self.logfile + '-0.log.mbtree'):
        os.remove(self.logfile + '-0.log.mbtree')
    if self.logfile and os.path.exists(self.logfile + '-0.log.temp'):
        os.remove(self.logfile + '-0.log.temp')
    if self.logfile and os.path.exists(self.logfile + '-0.log.mbtree.temp'):
        os.remove(self.logfile + '-0.log.mbtree.temp')

    self.complete = True
    print('Transcode %s complete' % self.identifier)

    # assign value of 3 to signify completed task
    self.current_proc = 3

    # finish up with Queue task
    done()
The solution was very silly: entries were being added to the Queue twice for each of the transcodes taking place.
# give the Queue the right number of task threads to make
for i in range(self.get_num_transcodes()):
    self.q.put(i)

# code omitted...

# testing code to be sure command strings are generating appropriately
for transcode in self.transcodes:
    self.q.put(transcode)
'Twas a vestige from refactoring. The Queue therefore held twice as many entries as there were tasks to complete, and join() kept waiting for the leftover entries to be marked done.
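The underlying rule is that Queue.join() only returns once task_done() has been called exactly as many times as put(); any extra put() leaves join() blocked forever. A tiny standalone illustration of the balanced case (Python 2 module name, matching the script):

import Queue

q = Queue.Queue()
for transcode in ('a.mp4', 'b.mp4'):   # put each work item exactly once
    q.put(transcode)

while not q.empty():
    item = q.get()
    # ... do the work for this item ...
    q.task_done()                      # one task_done() per completed get()

q.join()                               # returns immediately: puts == task_dones
print('all work accounted for')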
I have a program, P1, which I need to run about 24*20000 times with different inputs. The problem is that P1 sometimes hangs and I have to force it to quit manually (kill it). My first solution was to write a Python script that calls P1, passes the proper input, and receives the output using Popen and communicate. But because communicate waits for the output, I cannot kill the process while it is waiting for the response. I am on Windows.
I tried to use the multiprocessing module, but it only runs P1 and fails to send the input to it. I suspect this is because I am not using pipes in Popen; I experimented a little, but I could not receive the output from P1.
Any ideas?
# This code runs XLE and passes the intended input to it automatically
def startExe(programPath, programArgStr):
    p = subprocess.Popen(programPath, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    p.stdin.write(programArgStr)
    p.communicate()[0]
    # Need to kill the process if it takes longer than it should here

def main(folder):
    ..
    #loop
    programArgStr = "create-parser" + path1 + ";cd " + path2 + "/s" + command(counter) + ";exit"
    startExe(path, programArgStr)
    ..
As you can see, if P1 finishes the given task successfully it exits on its own, using the exit command passed to it.
If you're not required to use Python, you might consider using Cygwin Bash along with the timeout(1) command to run a command with a timeout. However, since Cygwin's implementation of fork() is not very fast and you're creating an enormous number of processes, you'll likely have an enormous overhead of just creating processes (I don't know if the native Windows version of Python is any better in this regard).
Alternatively, if you have the source code to P1, why don't you just modify it so that it can perform multiple iterations of whatever it is that it does in one invocation? That way, you don't have to deal with creating and killing 480,000 processes, which will make a huge difference if the amount of work that each invocation does is small.
When you call Popen, you can specify either a pipe or a file descriptor to accept stdout from the process:
Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)
You can then monitor the file/pipe you pass to popen, and if nothing is written, kill the process.
More info on the popen args is in the python docs.
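For example, here is a rough sketch of that idea, redirecting the child's output to a temporary file and killing it once the file stops growing; the one-second poll, the inactivity threshold, and the temp-file handling are my assumptions, not something from the original question:

import os
import subprocess
import tempfile
import time

def run_with_inactivity_timeout(args, inactivity_timeout=60):
    # Send the child's output to a file we can watch from the outside.
    out = tempfile.NamedTemporaryFile(delete=False)
    p = subprocess.Popen(args, stdout=out, stderr=subprocess.STDOUT)
    last_size, last_change = 0, time.time()
    while p.poll() is None:
        time.sleep(1)
        size = os.path.getsize(out.name)
        if size != last_size:
            last_size, last_change = size, time.time()
        elif time.time() - last_change > inactivity_timeout:
            p.kill()                       # nothing written for too long: give up
            break
    out.close()
    return p.wait(), out.name              # return code and path of the captured output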
Rather than using p.communicate, try just looping through the lines of output:
while True:
    line = p.stdout.readline()
    if not line:
        break
    print ">>> " + line.rstrip()
How about this approach?
from threading import Thread

def main(input):
    # your actual program, modified to take input as argument
    pass

queue = ['Your inputs stored here. Even better to make it to be a generator']

class Runner(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        while len(queue) > 0:
            input = queue.pop()
            main(input)
        return True

# Use 24 threads
for thread in xrange(24):
    Runner().start()

# You may also join threads at the end.
Of course this approach introduces some race conditions, like two threads popping from the queue list at the same time, but I have never encountered it in real life.
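If you want to rule that race out entirely, the standard-library queue already handles its own locking; here is a minimal variant of the same idea (the inputs list and the main() function are placeholders, as above):

import Queue           # "queue" on Python 3
import threading

q = Queue.Queue()
for item in inputs:                 # your inputs; placeholder name
    q.put(item)

def worker():
    while True:
        try:
            item = q.get_nowait()   # thread-safe: no two workers get the same item
        except Queue.Empty:
            return
        main(item)                  # the same main() as above, taking one input

threads = [threading.Thread(target=worker) for _ in xrange(24)]
for t in threads:
    t.start()
for t in threads:
    t.join()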
I solved my problem by editing the current code and putting the killer code in a separate file.
To do that, I added a line that writes the PID of the newly created process to a file.
# Should come before p.communicate
WriteStatus(str(p.pid) + "***" + str(gmtime().tm_hour) + "***" + str(gmtime().tm_min))
p.communicate()[0]
The process monitor is executed separately and checks every 2 minutes to see whether the processes listed in the file are still active. If they are, it kills them and removes their IDs.
def KillProcess(pid):
    subprocess.Popen("TASKKILL /PID " + str(pid) + " /F /T", shell=True)
    subprocess.Popen("TASKKILL /im WerFault.exe /F /T", shell=True)
    print "kill"

def ReadStatus(filePath):
    print "Checking" + filePath
    try:
        status = open(mainPath + filePath, 'r').readline()
    except:
        print "file removed" + filePath
        return 0
    if len(status) > 0:
        info = status.split("***")
        time = [gmtime().tm_hour, gmtime().tm_min]
        print time
        # Time differences
        difHour = time[0] - int(info[1])
        if difHour == 0:  # in the same hour
            difMin = time[1] - int(info[2])
        else:
            difMin = 60 - int(info[2]) + time[1]
        if difMin > 2:
            try:
                open(mainPath + filePath, 'w').write("")
                KillProcess(info[0])
            except:
                pass
    return 1

def monitor():
    # Read all the files
    listFiles = os.listdir(mainPath)
    while len(listFiles) > 0:
        # GO and check the contents
        for file in listFiles:
            # Open the file and calculate if the process should be killed or not
            pid = ReadStatus(file)
        # Update listFiles due to removal of files after the process
        # of each folder is done
        listFiles = os.listdir(mainPath)
        for i in range(0, 4):
            time.sleep(30)  # waits 30 sec
            subprocess.Popen("TASKKILL /im WerFault.exe /F /T", shell=True)
    # to indicate the job is done
    return 1
I would like to repeatedly execute a subprocess as fast as possible. However, sometimes the process will take too long, so I want to kill it.
I use signal.signal(...) like below:
ppid = pipeexe.pid
signal.signal(signal.SIGALRM, stop_handler)
signal.alarm(1)
.....

def stop_handler(signal, frame):
    print 'Stop test'+testdir+'for time out'
    if pipeexe.poll() == None and hasattr(signal, "SIGKILL"):
        os.kill(ppid, signal.SIGKILL)
        return False
But sometimes this code stops the next round from executing:
Stop test/home/lu/workspace/152/treefit/test2for time out
/bin/sh: /home/lu/workspace/153/squib_driver: not found ---this is the next execution; the program wrongly stops it.
Does anyone know how to solve this? I want to stop the subprocess exactly when it times out; time.sleep(n) always waits the full n seconds, and I don't want that, because the program may finish in less than 1 second.
You could do something like this:
import subprocess as sub
import threading

class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()

    def Run(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()  # use self.p.kill() if process needs a kill -9
            self.join()

RunCmd(["./someProg", "arg1"], 60).Run()
The idea is that you create a thread that runs the command, and then kill the command if it runs longer than some suitable timeout, in this case 60 seconds.
Here is something I wrote as a watchdog for subprocess execution. I use it now a lot, but I'm not so experienced so maybe there are some flaws in it:
import subprocess
import time

def subprocess_execute(command, time_out=60):
    """executing the command with a watchdog"""

    # launching the command
    c = subprocess.Popen(command)

    # now waiting for the command to complete
    t = 0
    while t < time_out and c.poll() is None:
        time.sleep(1)  # (comment 1)
        t += 1

    # there are two possibilities for the while to have stopped:
    if c.poll() is None:
        # in the case the process did not complete, we kill it
        c.terminate()
        # and fill the return code with some error value
        returncode = -1  # (comment 2)
    else:
        # in the case the process completed normally
        returncode = c.poll()

    return returncode
Usage:
returncode = subprocess_execute(['java', '-jar', 'some.jar'])
Comments:
here, the watchdog time out is in seconds; but it's easy to change to whatever needed by changing the time.sleep() value. The time_out will have to be documented accordingly;
depending on what is needed, it may be more suitable to raise an exception here instead.
Documentation: I struggled a bit with the documentation of subprocess module to understand that subprocess.Popen is not blocking; the process is executed in parallel (maybe I do not use the correct word here, but I think it's understandable).
But as what I wrote executes linearly, I really do have to wait for the command to complete, with a timeout to prevent a buggy command from stalling the nightly execution of the script.
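As a side note, on Python 3.3 and later the standard library can do the waiting-with-a-deadline itself; here is a minimal sketch of the same watchdog using communicate()'s timeout argument:

import subprocess

def subprocess_execute(command, time_out=60):
    """Run command, killing it and returning -1 if it exceeds time_out seconds."""
    c = subprocess.Popen(command)
    try:
        c.communicate(timeout=time_out)    # waits at most time_out seconds
    except subprocess.TimeoutExpired:
        c.kill()                           # the process overran: kill it
        c.communicate()                    # reap it so no zombie is left behind
        return -1
    return c.returncode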
I guess this is a common synchronization problem in event-oriented programming with threads and processes.
If you should always have only one subprocess running, make sure the current subprocess is killed before running the next one. Otherwise the signal handler may get a reference to the most recently launched subprocess and ignore the older one.
Suppose subprocess A is running. Before the alarm signal is handled, subprocess B is launched. Just after that, your alarm signal handler attempts to kill a subprocess. As the current PID (or the current subprocess pipe object) was set to B's when launching the subprocess, B gets killed and A keeps running.
Is my guess correct?
To make your code easier to understand, I would put the part that creates a new subprocess just after the part that kills the current subprocess. That would make it clear that there is only one subprocess running at any time. The signal handler could do both the killing and the launching, as if it were the iteration block of a loop, in this case event-driven by the alarm signal every second.
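A sketch of that arrangement, where the handler both kills the current child (if it is still alive) and launches the next one, so that at most one subprocess exists at any time; the commands list is a placeholder and the one-second alarm matches the question:

import signal
import subprocess
import time

commands = [['./someProg', 'arg1'], ['./someProg', 'arg2']]   # placeholder commands
current = None                                                # the single live subprocess

def launch_next():
    if not commands:
        return None
    return subprocess.Popen(commands.pop(0))

def alarm_handler(signum, frame):
    global current
    if current is not None and current.poll() is None:
        current.kill()             # this round overran its second: kill it
    current = launch_next()
    if current is not None:
        signal.alarm(1)            # re-arm for the next round

signal.signal(signal.SIGALRM, alarm_handler)
current = launch_next()
signal.alarm(1)

while current is not None or commands:
    time.sleep(0.2)                # the alarm handler drives the real work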
Here's what I use:
class KillerThread(threading.Thread):
    def __init__(self, pid, timeout, event):
        threading.Thread.__init__(self)
        self.pid = pid
        self.timeout = timeout
        self.event = event
        self.setDaemon(True)

    def run(self):
        self.event.wait(self.timeout)
        if not self.event.isSet():
            try:
                os.kill(self.pid, signal.SIGKILL)
            except OSError, e:
                # This is raised if the process has already completed
                pass

def runTimed(dt, dir, args, kwargs):
    event = threading.Event()
    cwd = os.getcwd()
    os.chdir(dir)
    proc = subprocess.Popen(args, **kwargs)
    os.chdir(cwd)

    killer = KillerThread(proc.pid, dt, event)
    killer.start()

    (stdout, stderr) = proc.communicate()
    event.set()

    return (stdout, stderr, proc.returncode)
For something a bit more complex, I added an answer to a similar problem: capturing stdout, feeding stdin, and being able to terminate after some time of inactivity and/or after some overall runtime.