I have a program which opens a subprocess and communicates with it by writing to its stdin and reading from its stdout.
proc = subprocess.Popen(['foo'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        stdin=subprocess.PIPE)
proc.stdin.write('stuff\n')
proc.stdin.flush()
The problem is that when reading, it always blocks if I call proc.stdout.read(), and when I try to read line by line using the following:
output = str()
while proc.stdout in select.select([proc.stdout], [], [])[0]:
    output += proc.stdout.readline()
it still blocks because select.select returns proc.stdout even after all the output has been read already. What can I do?
Note that I am not using proc.communicate() because I need to communicate with the process multiple times.
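One possible way around the blocking loop, sketched here under the assumption of a POSIX system (select does not work on pipes on Windows): give select a timeout and read whatever bytes are available with os.read on the pipe's file descriptor. The names read_available, idle_timeout and CHILD are hypothetical, and the child is a small Python script standing in for foo.

```python
import os
import select
import subprocess
import sys


def read_available(pipe, idle_timeout=0.5):
    """Return the bytes that can be read from `pipe` right now.

    Stops once no new data has arrived for `idle_timeout` seconds,
    or when the child closes its end of the pipe (EOF).
    """
    fd = pipe.fileno()
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], idle_timeout)
        if not ready:
            break  # timed out: nothing more to read for now
        data = os.read(fd, 4096)  # returns what is there, does not wait for 4096 bytes
        if not data:
            break  # EOF: select keeps reporting the pipe as readable, so stop here
        chunks.append(data)
    return b"".join(chunks)


# Stand-in for 'foo' (hypothetical): echoes every input line with a prefix.
CHILD = [sys.executable, "-c",
         "import sys\n"
         "for line in sys.stdin:\n"
         "    sys.stdout.write('got ' + line)\n"
         "    sys.stdout.flush()\n"]

if __name__ == "__main__":
    # bufsize=0 keeps Python from buffering either pipe on our side.
    proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, bufsize=0)
    proc.stdin.write(b"stuff\n")
    print(read_available(proc.stdout))
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
```

The timeout is a heuristic: a slow child may still produce more output after read_available returns, so a protocol with an explicit end-of-reply marker is more reliable where one is available.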
I wrote a Python script that uses subprocess.Popen to call the command-line tool look to do a binary search on a file.
For example
p = subprocess.Popen('look -b "abc" testfile.txt', executable='/bin/bash', stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
out, err = p.communicate()
result = out.decode()
print(result)
This snippet calls the system command look to binary-search the file testfile.txt for the string abc.
It works fine if you just have this snippet of code.
However, when the Python process already holds some large files in memory, the call becomes significantly slower.
For example, if you do:
a = read_a_large_file()  # e.g. several GB of data
p = subprocess.Popen('look -b "abc" testfile.txt', executable='/bin/bash', stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
out, err = p.communicate()
result = out.decode()
print(result)
a[0]
The subprocess part takes a very long time to execute. Running the look command is very fast in shell as it performs binary search on sorted files.
Any help will be appreciated! Thanks!
In general this question has a lot of answers, but they are all limited to line-by-line reading. For example this code:
def execute(cmd):
    popen = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    for stdout_line in iter(popen.stdout.readline, ""):
        yield stdout_line
    popen.stdout.close()
    return_code = popen.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
But there are output lines for example like this (where dots are added once in ~10s):
............................
They show progress of a task that runs. I don't want to stop output until the line is finished and only then print the whole dots line.
So I need to yield string when:
I can read a block of 1024 symbols of output (or just the whole output)
there are ANY symbols of output and more than 1 s has passed (whether or not the line is finished)
But I don't know how to do this.
p.s. Maybe a dup. Didn't find.
If you're on Linux you can use stdbuf -o0 in front of the command you're executing to make its stdout become unbuffered (i.e. instantaneous).
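On the reading side, the two conditions from the question (a full block of 1024 bytes, or any pending output once about 1 s has passed) could be sketched roughly as follows. This assumes a POSIX system, since select does not work on pipes on Windows. The names stream_chunks, chunk_size and max_wait are hypothetical, and the sketch yields bytes rather than decoded text.

```python
import os
import select
import subprocess
import sys
import time


def stream_chunks(cmd, chunk_size=1024, max_wait=1.0):
    """Yield the output of `cmd` as bytes, either when `chunk_size` bytes
    are pending or when pending output is `max_wait` seconds old."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=0)
    fd = proc.stdout.fileno()
    pending = b""
    deadline = None  # set when the first byte of a new chunk arrives
    while True:
        # Block indefinitely while nothing is pending, else only until the deadline.
        timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
        ready, _, _ = select.select([fd], [], [], timeout)
        if ready:
            data = os.read(fd, chunk_size - len(pending))
            if not data:
                break  # EOF: the child closed its stdout
            if deadline is None:
                deadline = time.monotonic() + max_wait
            pending += data
        if pending and (len(pending) >= chunk_size or time.monotonic() >= deadline):
            yield pending
            pending = b""
            deadline = None
    if pending:
        yield pending  # flush whatever was left at EOF
    proc.stdout.close()
    return_code = proc.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)


if __name__ == "__main__":
    # Stand-in child (hypothetical): writes a dot every 0.5 s without a newline.
    child = [sys.executable, "-c",
             "import os, time\n"
             "for _ in range(4):\n"
             "    os.write(1, b'.')\n"
             "    time.sleep(0.5)\n"]
    for chunk in stream_chunks(child, max_wait=0.2):
        print(repr(chunk))
```

This only helps if the child actually writes its dots to the pipe as they are produced; if the child block-buffers its own stdout, something like stdbuf is still needed on top.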
I am using a python script to run a process using subprocess.Popen and simultaneously store the output in a text file as well as print it on the console. This is my code:
result = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in result.stdout.readlines():  # read and store result in log file
    openfile.write("%s\n" % line)
    print("%s" % line)
The code above works, but it waits for the process to finish and collects all of its output before the for loop stores and prints anything.
I want the output at runtime: my process can take hours to complete, and I get no output during all that time.
So is there another function that gives me the output dynamically, so that as soon as the process emits its first line, that line gets printed?
The problem here is that .readlines() gets the entire output before returning, as it constructs a full list. Just iterate directly:
for line in result.stdout:
    print(line)
.readlines() returns a list of all the lines the process will return while open, i.e., it doesn't return anything until all output from the subprocess is received. To read line by line in "real time":
import sys
from subprocess import Popen, PIPE
proc = Popen(cmd, shell=True, bufsize=1, stdout=PIPE)
for line in proc.stdout:
    openfile.write(line)
    sys.stdout.buffer.write(line)
    sys.stdout.buffer.flush()
proc.stdout.close()
proc.wait()
Note: if the subprocess uses block buffering when it runs in non-interactive mode, you might need the pexpect or pty modules, or the stdbuf, unbuffer, or script commands.
Note: on Python 2, you might also need to use iter(), to get "real time" output:
for line in iter(proc.stdout.readline, ""):
    openfile.write(line)
    print line,
You can iterate over the lines one by one by using readline on the pipe:
while True:
    line = result.stdout.readline()
    if not line:  # empty string means EOF: the process closed its stdout
        break
    print line.strip()
The lines contain a trailing \n which I stripped for printing.
When the process terminates, readline returns an empty string, so you know when to stop.
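Putting the answers above together for Python 3.7+, a minimal sketch of "print and log each line as it arrives" might look like this. The function name run_and_tee and its log_path parameter are hypothetical, and the command is passed as an argument list rather than through shell=True.

```python
import os
import subprocess
import sys
import tempfile


def run_and_tee(cmd, log_path):
    """Run `cmd`, copying each output line to the console and to
    `log_path` as soon as it arrives. Returns the exit code."""
    with open(log_path, "w") as log, subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            text=True, bufsize=1) as proc:
        for line in proc.stdout:  # yields each line once it is complete
            log.write(line)
            log.flush()           # keep the log file current while the job runs
            sys.stdout.write(line)
            sys.stdout.flush()
    # Leaving the with block closes the pipe and waits for the process.
    return proc.returncode


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    code = run_and_tee([sys.executable, "-c", "print('one'); print('two')"], path)
    print("exit code:", code, "log written to", path)
```

As noted above, this only shows lines promptly if the child itself flushes its output; a child that block-buffers when writing to a pipe still needs stdbuf, a pty, or similar.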
I'm trying to send a string to the first process's stdin and chain its stdout to the second process's stdin.
The first program is paps, a text-to-PostScript converter which accepts a text file or string as input and outputs a PostScript file to stdout.
Second Program is lpr, the line printer command. The process goes like this:
Write a string to First Program's stdin
Pipe the output of the first program to the stdin of the second.
The output of the second program is handled by itself like this in unix:
echo "The String" | paps | lpr
Here is what I've tried from the python docs.
#!/usr/bin/python
import sys
from subprocess import Popen,PIPE
paps=Popen(["/usr/local/bin/paps"],stdin=PIPE,stdout=PIPE)
lpr=Popen(["/usr/bin/lpr"],stdin=paps.stdout)
paps.communicate("ABCD")
paps.stdout.close()
lpr.communicate()[0]
This is from the documentation:
#p1 = Popen(["dmesg"], stdout=PIPE)
#p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
#p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
#output = p2.communicate()[0]
In my case, the original output originates within my program and is sent to the stdin of the first process.
lpr=Popen(["/usr/bin/lpr"],stdin=paps.stdout)
How about stdout=PIPE?
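A sketch of the echo "The String" | first | second pattern, assuming the input is small enough to fit in the pipe buffers: write to the first process's stdin directly and close it, instead of calling communicate() on the first process (which would also read its stdout and so take the data away from the second process). The function name pipe_through is hypothetical, and the usage below substitutes two tiny Python children for paps and lpr so that it runs anywhere.

```python
import subprocess
import sys


def pipe_through(text, first_cmd, second_cmd):
    """Feed `text` to first_cmd, pipe its stdout into second_cmd,
    and return second_cmd's stdout as bytes."""
    first = subprocess.Popen(first_cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    first.stdout.close()   # drop our copy so `first` gets SIGPIPE if `second` exits
    first.stdin.write(text.encode())
    first.stdin.close()    # EOF tells `first` that the input is complete
    output = second.communicate()[0]
    first.wait()
    return output


if __name__ == "__main__":
    # Stand-ins (hypothetical) for paps and lpr: upper-case, then reverse.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    reverse = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
    print(pipe_through("abcd", upper, reverse))
```

For a real printer pipeline the second process's stdout usually does not need to be captured, in which case stdout=subprocess.PIPE can be dropped from the second Popen call and second.wait() used in place of communicate().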
I'm trying to talk to a child process using the python subprocess.Popen() call. In my real code, I'm implementing a type of IPC, so I want to write some data, read the response, write some more data, read the response, and so on. Because of this, I cannot use Popen.communicate(), which otherwise works well for the simple case.
This code shows my problem. It never even gets the first response, hangs at the first "Reading result". Why? How can I make this work as I expect?
import subprocess
p = subprocess.Popen(["sed", 's/a/x/g'],
                     stdout = subprocess.PIPE,
                     stdin = subprocess.PIPE)
p.stdin.write("abc\n")
print "Reading result:"
print p.stdout.readline()
p.stdin.write("cat\n")
print "Reading result:"
print p.stdout.readline()
sed's output is buffered: it only writes its data once enough has accumulated or the input stream is exhausted and closed.
Try this:
import subprocess
p = subprocess.Popen(["sed", 's/a/x/g'],
                     stdout = subprocess.PIPE,
                     stdin = subprocess.PIPE)
p.stdin.write("abc\n")
p.stdin.write("cat\n")
p.stdin.close()
print "Reading result 1:"
print p.stdout.readline()
print "Reading result 2:"
print p.stdout.readline()
Be aware that this cannot be done reliably with huge amounts of data, as writing to stdin blocks once the pipe buffer is full. The best approach is to use communicate().
I would use Popen().communicate() if you can, as it does a lot of nice things for you, but if you need to use Popen() exactly as you described, you'll need to make sed flush its output after each newline. On BSD/macOS sed that is the -l option (GNU sed uses -u, --unbuffered, for this instead, and treats -l as a line-wrap length):
p = subprocess.Popen(['sed', '-l', 's/a/x/g'],
                     stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
and your code should work fine
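The write, read, write, read pattern can be sketched in Python 3 as below. Because the right sed flag differs between implementations, this sketch uses a small Python child as a stand-in that behaves like a line-flushing sed 's/a/x/g'. CHILD, open_child and ask are hypothetical names. The important parts are that the child flushes after every line and that the parent flushes its own stdin buffer before reading.

```python
import subprocess
import sys

# Stand-in child (hypothetical): replaces 'a' with 'x' on every line,
# flushing after each one, like a line-buffered sed 's/a/x/g'.
CHILD = [sys.executable, "-c",
         "import sys\n"
         "for line in sys.stdin:\n"
         "    sys.stdout.write(line.replace('a', 'x'))\n"
         "    sys.stdout.flush()\n"]


def open_child():
    """Start the child with text-mode, line-buffered pipes."""
    return subprocess.Popen(CHILD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)


def ask(proc, line):
    """Send one line to the child and return its one-line reply."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()  # push our side's buffer to the child before reading
    return proc.stdout.readline().rstrip("\n")


if __name__ == "__main__":
    p = open_child()
    print("Reading result:", ask(p, "abc"))
    print("Reading result:", ask(p, "cat"))
    p.stdin.close()  # EOF lets the child exit
    p.stdout.close()
    p.wait()
```

This relies on the child answering each request with exactly one line; a child that sometimes replies with zero or several lines would make readline() block or fall out of step.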