Get Result of a SubProcess in Real Time
I would like to get each result (sys.stdout) in real time before the subprocess terminates.
Suppose we have the following file.py.
import time, sys

sys.stdout.write('something')
while True:
    sys.stdout.write('something else')
    time.sleep(4)
Well, I have tried the subprocess, asyncio and threading modules, but all of these approaches give me the result only once the process has finished. Ideally I would like to be able to terminate the process myself, and to receive each result (stdout, stderr) in real time rather than when the process completes.
import subprocess, sys

proc = subprocess.Popen([sys.executable, "/Users/../../file.py"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
proc.communicate()  # this one returns the result only after the process finishes
I also tried readline (proc.stdout.readline()) in a separate thread with the threading module, and with asyncio, but it likewise waits until the process completes.
The only useful thing I found is psutil.Popen(*args, **kwargs): with it I can terminate the process whenever I want and get some statistics about it. But the main issue remains: getting each sys.stdout write or print of file.py in real time (asynchronously), at the moment it is printed.
*A solution for Python 3.6 is preferred.
As noted in the comments, the first and foremost thing is to ensure that your file.py program actually writes the data the way you think it does.
For example, the program you have shown will write nothing for about 40 minutes, because that's how long it takes for 14-byte prints issued at 4-second intervals to fill up the 8-kilobyte IO buffer. Even more confusingly, some programs will appear to write data if you test them on a TTY (i.e. just run them), but not when you start them as subprocesses. This is because on a TTY stdout is line-buffered, and on a pipe it is fully buffered. When the output is not flushed, there is simply no way for another program to detect the output because it is stuck inside the subprocess's buffer that it never bothered to share with anyone.
In other words, don't forget to flush:
while True:
    # or just print('something else', flush=True)
    sys.stdout.write('something else')
    sys.stdout.flush()
    time.sleep(4)
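If you cannot modify the child program at all, a sketch of an alternative (assuming the child is itself a Python script) is to launch the interpreter with -u, which disables its output buffering entirely; the child code below is a made-up stand-in:

```python
import subprocess
import sys

# Hypothetical child that prints a few lines with no explicit flushes.
child_code = "import time\nfor i in range(3):\n    print('tick', i)\n    time.sleep(0.2)\n"

# -u forces the child interpreter into unbuffered mode, so each print
# reaches the pipe immediately instead of sitting in an 8 KB buffer.
proc = subprocess.Popen([sys.executable, "-u", "-c", child_code],
                        stdout=subprocess.PIPE)

lines = []
for raw in proc.stdout:  # yields each line as soon as it arrives
    lines.append(raw.decode().strip())
proc.wait()
print(lines)
```

Setting the environment variable PYTHONUNBUFFERED=1 for the child has the same effect when you cannot change the command line.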
With that out of the way, let's examine how to read that output. Asyncio provides a nice stream-based interface to subprocesses that is quite capable of accessing arbitrary output as it arrives. For example:
import asyncio

async def main():
    loop = asyncio.get_event_loop()
    proc = await asyncio.create_subprocess_exec(
        "python", "file.py",
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    # loop.create_task() rather than asyncio.create_task() because Python 3.6
    loop.create_task(display_as_arrives(proc.stdout, 'stdout'))
    loop.create_task(display_as_arrives(proc.stderr, 'stderr'))
    await proc.wait()

async def display_as_arrives(stream, where):
    while True:
        # 1024 chosen arbitrarily - StreamReader.read will happily return
        # shorter chunks - this allows reading in real-time.
        output = await stream.read(1024)
        if output == b'':
            break
        print('got', where, ':', output)

# run_until_complete() rather than asyncio.run() because Python 3.6
asyncio.get_event_loop().run_until_complete(main())
Related
I want to achieve something which is very similar to this.
My actual goal is to run Rasa from within python.
Taken from Rasa's site:
Rasa is a framework for building conversational software: Messenger/Slack bots, Alexa skills, etc. We’ll abbreviate this as a bot in this documentation.
It is basically a chatbot which runs in the command prompt. This is how it works on cmd:
Now I want to run Rasa from Python so that I can integrate it with my Django-based website, i.e. I want to keep taking inputs from the user, pass them to Rasa, have Rasa process the text, and get an output which I show back to the user.
I have tried this (running it from cmd as of now):
import sys
import subprocess
from threading import Thread
from queue import Queue, Empty  # python 3.x

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

def getOutput(outQueue):
    outStr = ''
    try:
        while True:  # adds output from the Queue until it is empty
            outStr += outQueue.get_nowait()
    except Empty:
        return outStr

p = subprocess.Popen('command_to_run_rasa',
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     shell=False,
                     universal_newlines=True,
                     )

outQueue = Queue()

outThread = Thread(target=enqueue_output, args=(p.stdout, outQueue))
outThread.daemon = True
outThread.start()

someInput = ""

while someInput != "stop":
    someInput = input("Input: ")  # to take input from the user
    p.stdin.write(someInput)      # passing input to be processed by the rasa command
    p.stdin.flush()
    output = getOutput(outQueue)
    print("Output: " + output + "\n")
    p.stdout.flush()
But it works only for the first line of output, not for successive input/output cycles. See the output below.
How do I get it working for multiple cycles?
I've referred to this, and I think I understand the problem in my code from it, but I don't know how to solve it.
EDIT: I'm using Python 3.6.2 (64-bit) on Windows 10
You need to keep interacting with your subprocess - at the moment once you pick the output from your subprocess you're pretty much done as you close its STDOUT stream.
Here is the most rudimentary way to continue user input -> process output cycle:
import subprocess
import sys
import time

if __name__ == "__main__":  # a guard from unintended usage
    input_buffer = sys.stdin     # a buffer to get the user input from
    output_buffer = sys.stdout   # a buffer to write rasa's output to
    proc = subprocess.Popen(["path/to/rasa", "arg1", "arg2", "etc."],  # start the process
                            stdin=subprocess.PIPE,    # pipe its STDIN so we can write to it
                            stdout=output_buffer,     # pipe directly to the output_buffer
                            universal_newlines=True)
    while True:  # run a main loop
        time.sleep(0.5)  # give some time for `rasa` to forward its STDOUT
        print("Input: ", end="", file=output_buffer, flush=True)     # print the input prompt
        print(input_buffer.readline(), file=proc.stdin, flush=True)  # forward the user input
You can replace input_buffer with a buffer coming from your remote user(s) and output_buffer with a buffer that forwards the data to your user(s) and you'll get essentially what you're looking for - the sub-process will be getting the input directly from the user (input_buffer) and print its output to the user (output_buffer).
If you need to perform other tasks while all this is running in the background, just run everything under the if __name__ == "__main__": guard in a separate thread, and I'd suggest adding a try..except block to pick up KeyboardInterrupt and exit gracefully.
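As a sketch of that suggestion (the pump function, its event, and the timings here are mine, not part of the answer's Rasa code), the blocking loop can live in a daemon thread while the main thread stays free and can exit cleanly on Ctrl+C:

```python
import threading
import time

def pump(stop_event):
    # Stand-in for the blocking input -> output loop shown above.
    while not stop_event.is_set():
        time.sleep(0.1)  # here you would forward user input to the subprocess

stop = threading.Event()
worker = threading.Thread(target=pump, args=(stop,), daemon=True)
worker.start()

try:
    time.sleep(0.3)  # the main thread is free for other tasks here
except KeyboardInterrupt:
    pass  # Ctrl+C lands here, letting us exit gracefully
finally:
    stop.set()           # tell the pump loop to finish
    worker.join(timeout=1)

print("worker alive:", worker.is_alive())
```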
But... soon enough you'll notice that it doesn't exactly work properly all the time - if rasa takes longer than the half-second wait to print its STDOUT and enter its wait-for-STDIN stage, the outputs will start to mix. This problem is considerably more complex than you might expect. The main issue is that STDOUT and STDIN (and STDERR) are separate buffers, and you cannot know when a subprocess is actually expecting something on its STDIN. This means that without a clear indication from the subprocess (like the \r\n[path]> prompt on STDOUT in the Windows CMD prompt, for example) you can only send data to the subprocess's STDIN and hope it will be picked up.
Based on your screenshot, it doesn't really give a distinguishable STDIN request prompt because the first prompt is ... :\n and then it waits for STDIN, but then once the command is sent it lists options without an indication of its end of STDOUT stream (technically making the prompt just ...\n but that would match any line preceding it as well). Maybe you can be clever and read the STDOUT line by line, then on each new line measure how much time has passed since the sub-process wrote to it and once a threshold of inactivity is reached assume that rasa expects input and prompt the user for it. Something like:
import subprocess
import sys
import threading

# we'll be using a separate thread and a timed event to request the user input
def timed_user_input(timer, wait, buffer_in, buffer_out, buffer_target):
    while True:  # user input loop
        timer.wait(wait)         # wait for the specified time...
        if not timer.is_set():   # if the timer was not stopped/restarted...
            print("Input: ", end="", file=buffer_out, flush=True)        # print the input prompt
            print(buffer_in.readline(), file=buffer_target, flush=True)  # forward the input
        timer.clear()  # reset the 'timer' event

if __name__ == "__main__":  # a guard from unintended usage
    input_buffer = sys.stdin     # a buffer to get the user input from
    output_buffer = sys.stdout   # a buffer to write rasa's output to
    proc = subprocess.Popen(["path/to/rasa", "arg1", "arg2", "etc."],  # start the process
                            stdin=subprocess.PIPE,    # pipe its STDIN so we can write to it
                            stdout=subprocess.PIPE,   # pipe its STDOUT so we can process it
                            universal_newlines=True)
    # let's build a timer which will fire off if we don't reset it
    timer = threading.Event()  # a simple Event timer
    input_thread = threading.Thread(target=timed_user_input,
                                    args=(timer,  # pass the timer
                                          1.0,    # prompt after one second
                                          input_buffer, output_buffer, proc.stdin))
    input_thread.daemon = True  # no need to keep the input thread blocking...
    input_thread.start()        # start the timer thread
    # now we'll read the `rasa` STDOUT line by line, forward it to output_buffer and reset
    # the timer each time a new line is encountered
    for line in proc.stdout:
        output_buffer.write(line)  # forward the STDOUT line
        output_buffer.flush()      # flush the output buffer
        timer.set()                # reset the timer
You can use a similar technique to check for more complex 'expected user input' patterns. There is a whole module called pexpect designed to deal with this type of task, and I wholeheartedly recommend it if you're willing to give up some flexibility.
Now... all this being said, you are aware that Rasa is built in Python, installs as a Python module and has a Python API, right? Since you're already using Python why would you call it as a subprocess and deal with all this STDOUT/STDIN shenanigans when you can directly run it from your Python code? Just import it and interact with it directly, they even have a very simple example that does exactly what you're trying to do: Rasa Core with minimal Python.
I have a server-like app I want to run from Python. It never stops until the user interrupts it. I want to continuously redirect both stdout and stderr to the parent while the app runs. Luckily, that's exactly what subprocess.run does.
Shell:
$ my-app
1
2
3
...
wrapper.py:
import subprocess
subprocess.run(['my-app'])
Executing wrapper.py:
$ python wrapper.py
1
2
3
...
I believe this works thanks to the fact that subprocess.run inherits the stdout and stderr file descriptors from the parent process. Good.
But now I need to do something when the app outputs a particular line. Imagine I want to run arbitrary Python code when an output line contains 4:
$ python wrapper.py
1
2
3
4 <-- here I want to do something
...
Or I want to remove some lines from the output:
$ python wrapper.py <-- allowed only odd numbers
1
3
...
I thought I could have a filtering function which I would somehow hook into subprocess.run; it would get called with every line of the output, regardless of whether it's stdout or stderr:
def filter_fn(line):
    if line ...:
        return line.replace(...
    ...
But how to achieve this? How to hook such or similar function into the subprocess.run call?
Note: I can't use the sh library as it has zero support for Windows.
If you want to be able to process stdout or stderr for a subprocess, just pass subprocess.PIPE for the parameter stdout (resp. stderr). You can then access the output stream from the subprocess as proc.stdout, by default as a byte stream, but you can get it as strings with universal_newlines = True. Example:
import subprocess
import sys

app = subprocess.Popen(['my-app'], stdout=subprocess.PIPE, universal_newlines=True)
for line in app.stdout:
    if line.strip() == '4':
        pass  # special processing
    else:
        sys.stdout.write(line)
What you must pay attention to is that, to be able to process output as soon as the subprocess writes it, the subprocess must flush its output after each line. By default, stdout is line-buffered when directed to a terminal - it is flushed at each newline - but size-buffered when directed to a file or pipe, meaning it is flushed only every 8k or 16k characters.
In that case, whatever you do on the caller's side, you will only get stdout when the program is finished.
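A minimal sketch of wiring the asker's hypothetical filter_fn into such a read loop (the child command here is a made-up stand-in for my-app, and this particular filter keeps only odd numbers):

```python
import subprocess
import sys

def filter_fn(line):
    # Hypothetical filter: drop even numbers, pass odd ones through.
    if int(line) % 2 == 0:
        return None
    return line

# Stand-in child that prints the numbers 1..5, one per line.
child = "for i in range(1, 6):\n    print(i)"
app = subprocess.Popen([sys.executable, "-u", "-c", child],
                       stdout=subprocess.PIPE, universal_newlines=True)

kept = []
for line in app.stdout:  # each line arrives as soon as it is flushed
    filtered = filter_fn(line.strip())
    if filtered is not None:
        kept.append(filtered)
app.wait()
print(kept)
```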
I believe this code will do it. The previous answer does not address reading from two streams at the same time, which requires asyncio; otherwise, the other answer could work for filtering stdout and then handling stderr after stdout.
This is Python 3.8, which has more descriptive method names for asyncio.
Update 2021-Aug-25: Using asyncio.run and asyncio.gather as higher level, easier to understand functions rather than manipulating the asyncio loop directly.
import sys
import asyncio

async def output_filter(input_stream, output_stream):
    while not input_stream.at_eof():
        output = await input_stream.readline()
        if not output.startswith(b"filtered"):
            output_stream.buffer.write(output)
            output_stream.flush()

async def run_command(command):
    process = await asyncio.create_subprocess_exec(
        *command, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    await asyncio.gather(
        output_filter(process.stderr, sys.stderr),
        output_filter(process.stdout, sys.stdout),
    )
    # process.communicate() will have no data to read but will close the
    # pipes that are implemented in C, whereas process.wait() will not
    await process.communicate()

def main():
    asyncio.run(run_command(["python", "sample_process.py"]))

if __name__ == "__main__":
    main()
I am running two processes simultaneously in python using the subprocess module:
import subprocess
from subprocess import PIPE

p_topic = subprocess.Popen(['rostopic', 'echo', '/msg/address'], stdout=PIPE)
p_play = subprocess.Popen(['rosbag', 'play', bagfile_path])
These are ROS processes: p_topic listens for a .bag file to be played and outputs certain information from that .bag file to the stdout stream; I want to then access this output using the p_topic.stdout object (which behaves as a file).
However, what I find happening is that the p_topic.stdout object only contains the first ~1/3 of the output lines it should have - that is, in comparison to running the two commands manually, simultaneously in two shells side by side.
I've tried waiting for many seconds for output to finish, but this doesn't change anything, its approximately the same ratio of lines captured by p_topic.stdout each time. Any hints on what this could be would be greatly appreciated!
EDIT:
Here's the reading code:
# wait for playing to stop
while p_play.poll() is None:
    time.sleep(.1)
time.sleep(X)  # wait for some time for p_topic to finish
p_topic.terminate()
output = []
for line in p_topic.stdout:
    output.append(line)
Note that the value X in time.sleep(X) doesn't make any difference
By default, when a process's stdout is not connected to a terminal, the output is block buffered. When connected to a terminal, it's line buffered. You expect to get complete lines, but you can't unless rostopic unbuffers or explicitly line buffers its stdout (if it's a C program, you can use setvbuf to make this automatic).
The other (possibly overlapping) possibility is that the pipe buffer itself is filling (pipe buffers are usually fairly small), and because you never drain it, rostopic fills the pipe buffer and then blocks indefinitely until you kill it, leaving only what managed to fit in the pipe to be drained when you read the process's stdout. In that case, you'd need to either spawn a thread to keep the pipe drained from Python, or have your main thread use select module components to monitor and drain the pipe (intermingled with polling the other process). The thread is generally easier, though you do need to be careful to avoid thread safety issues.
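A sketch of that drain-thread approach (the child command here is a made-up stand-in for rostopic; in the real case you would pass p_topic.stdout instead):

```python
import subprocess
import sys
import threading
from queue import Queue

def drain(pipe, queue):
    # Continuously read lines so the pipe buffer never fills up and
    # the child process is never blocked on a full pipe.
    for line in iter(pipe.readline, b""):
        queue.put(line)
    pipe.close()

# Stand-in child that emits more output than a typical pipe buffer
# could hold if nobody were reading it.
child = "for i in range(10000):\n    print('line', i)"
proc = subprocess.Popen([sys.executable, "-c", child], stdout=subprocess.PIPE)

q = Queue()
t = threading.Thread(target=drain, args=(proc.stdout, q), daemon=True)
t.start()

proc.wait()  # safe: the drain thread keeps emptying the pipe
t.join()     # the thread exits once it hits EOF
print(q.qsize())
```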
Is it worth trying process.communicate()/process.wait() rather than sleep? Would that solve your issue?
I have this for general purposes, so I'm not sure if you can take it and change it to what you need:
executable_Params = "{0} {1} {2} {3} {4}".format(my_Binary,
                                                 arg1,
                                                 arg2,
                                                 arg3,
                                                 arg4)

# execute the process
process = subprocess.Popen(shlex.split(executable_Params),
                           shell=False,
                           stderr=subprocess.PIPE,
                           stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
ret_code = process.wait()

if ret_code == 0:
    return 0
else:
    # get the correct message from my enum method
    error_msg = Process_Error_Codes(ret_code).name
    raise subprocess.CalledProcessError(returncode=ret_code,
                                        cmd=executable_Params)
There are a lot of good answers on Stack Overflow about how to handle output with subprocesses, async IO, and avoiding deadlock with PIPE. Something is just not sinking in for me though; I need some guidance on how to accomplish the following.
I want to run a subprocess from my python program. The subprocess generates a ton of standard output, and a little bit of standard error if things go bad. The subprocess itself takes about 20 minutes to complete. For the output and error generated, I want to be able to both log it to the terminal, and write it to a log file.
Doing the latter was easy. I just opened two files and set them as stdout and stderr on the Popen object. However, also capturing the output as lines so that I can print them continuously to the terminal has me vexed. I was thinking I could use the poll() method to poll continuously. With this, though, I'd still need to use PIPE for stdout and stderr, and call read() on them, which would block until EOF.
I think what I'm trying to accomplish is this:
start the subprocess
while process is still running:
    if there are any lines from stdout:
        print them and write them to the out log file
    if there are any lines from stderr:
        print them and write them to the err log file
    sleep for a little bit
Does that seem reasonable? If so, can someone explain how one would implement the 'if' parts here without blocking?
Thanks
Here is my select.select version:
Subprocess (foo.py):
import time
import sys

def foo():
    for i in range(5):
        print("foo %s" % i, file=sys.stdout)  # or flush=True
        sys.stdout.flush()
        time.sleep(7)

foo()
Main:
import subprocess as sp
import select

proc = sp.Popen(["python", "foo.py"], stderr=sp.PIPE, stdout=sp.PIPE)

last_line = "content"
while last_line:
    # select returns a list of streams that are ready to read;
    # an empty list means the 60-second timeout expired
    ready = select.select([proc.stdout], [], [], 60)[0]
    if not ready:
        print('timed out')
        break
    last_line = ready[0].readline()
    print(last_line)
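The pseudocode from the question can also be realized with one reader thread per stream, which sidesteps blocking reads entirely and works on Windows too, where select only supports sockets. This is a sketch; the child command and the log file names out.log / err.log are made up:

```python
import subprocess
import sys
import threading

def tee(pipe, log_path, mirror):
    # Read the pipe line by line; each line goes both to the log file
    # and to the terminal stream given in `mirror`.
    with open(log_path, "w") as log:
        for line in iter(pipe.readline, ""):
            log.write(line)
            mirror.write(line)
            mirror.flush()
    pipe.close()

# Stand-in child that writes to both stdout and stderr.
child = "import sys\nprint('out line')\nprint('err line', file=sys.stderr)"
proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)

threads = [
    threading.Thread(target=tee, args=(proc.stdout, "out.log", sys.stdout)),
    threading.Thread(target=tee, args=(proc.stderr, "err.log", sys.stderr)),
]
for t in threads:
    t.start()
proc.wait()
for t in threads:
    t.join()  # both streams are fully drained once these return
```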