python - sys.stdin.readlines(): stop reading lines in command line

My code calls sys.stdin.readlines().
What is the difference between that and sys.stdin.buffer.readlines()?
What exactly do they do?
If they read lines from the command line, how do I stop reading at a certain point and let the program continue?

1) sys.stdin is a TextIOWrapper; its purpose is to read text from stdin, so the resulting strings are actual strs. sys.stdin.buffer is a BufferedReader, and the lines you get from it are byte strings.
2) They read all the lines from stdin until they hit EOF or reach the size hint you give them.
3) If you're trying to read a single line, you can use .readline() (note: no s). Otherwise, when interacting with the program on the command line, you have to give it the EOF signal (Ctrl+D on *nix, Ctrl+Z followed by Enter on Windows).
Is there a reason you are doing this rather than just calling input() to get one text line at a time from stdin?
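To make the str versus bytes difference concrete, here is a minimal sketch. It does not touch the real stdin: the in-memory streams below are stand-ins (an assumption for illustration) built from the same classes that back sys.stdin and sys.stdin.buffer.

```python
import io

# Stand-in for the bytes a user would pipe into the program.
raw = b"first\nsecond\n"

# Text layer: the kind of result sys.stdin.readlines() gives (a list of str).
text_lines = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").readlines()

# Binary layer: the kind of result sys.stdin.buffer.readlines() gives (a list of bytes).
byte_lines = io.BufferedReader(io.BytesIO(raw)).readlines()

print(text_lines)
print(byte_lines)
```

Both calls consume the stream up to EOF, which is why an interactive session needs the EOF key combination before either one returns.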

From the docs
sys.stdin
sys.stdout
sys.stderr
File objects corresponding to the interpreter’s standard input, output and error streams. stdin is used for all interpreter input except for scripts but including calls to input(). stdout is used for the output of print() and expression statements and for the prompts of input(). The interpreter’s own prompts and (almost all of) its error messages go to stderr. stdout and stderr needn’t be built-in file objects: any object is acceptable as long as it has a write() method that takes a string argument. (Changing these objects doesn’t affect the standard I/O streams of processes executed by os.popen(), os.system() or the exec*() family of functions in the os module.)
Note: The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
So, sys.stdin.readlines() reads everything that was passed to stdin and splits it into lines (you get a list of str as a result).
sys.stdin.buffer.readlines() does the same on the underlying binary buffer, so you get a list of bytes. I'd suggest using the first method unless you need raw bytes, and not mixing the two on the same stream: the text layer reads ahead, so the buffer may already be empty while sys.stdin still holds decoded data.
If you want to stop at some moment, then use readline() to read only one line at a time.
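A minimal sketch of the readline() approach: read one line at a time and stop at a sentinel line instead of waiting for EOF. The function name read_until and the sentinel "done" are hypothetical choices for this example, and io.StringIO stands in for sys.stdin.

```python
import io

def read_until(stream, sentinel="done"):
    """Collect lines from stream until a line equal to sentinel, or EOF."""
    lines = []
    # readline() returns "" only at EOF, so iter() stops there by itself.
    for line in iter(stream.readline, ""):
        if line.rstrip("\n") == sentinel:
            break
        lines.append(line)
    return lines

# In a real program you would pass sys.stdin instead of this stand-in.
collected = read_until(io.StringIO("a\nb\ndone\nignored\n"))
print(collected)
```

Anything after the sentinel stays unread in the stream, so the program can continue and read it later if needed.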

Related

Passing input to subprocess popen at runtime based on stdout string

I am trying to run the following code:
process = subprocess.Popen(args=cmd, shell=True, stdout=subprocess.PIPE)
while process.poll() is None:
    stdoutput = process.stdout.readline()
    print(stdoutput.decode())
    if '(Y/N)' in stdoutput.decode():
        process.communicate(input=b'Y\n')
The cmd argument runs for a few minutes, after which it prompts for a confirmation, but process.communicate() is not working, and neither is process.stdin.write().
How do I send the input string 'Y' to this running process when it prompts for confirmation?
Per the doc on Popen.communicate(input=None, timeout=None):
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE.
Please try that, and if it's not sufficient, do indicate what the symptom is.
On top of the answer from @Jerry101, if the subprocess that you are calling is a Python script that uses input(), be aware that, as documented:
If the prompt argument is present, it is written to standard output without a trailing newline.
Thus if you perform readline() as in process.stdout.readline(), it would hang there waiting for the new line \n character as documented:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string
A quick fix is to append the newline \n when requesting the input(), e.g. input("(Y/N)\n") instead of just input("(Y/N)").
Related question:
Python subprocess stdout doesn't capture input prompt
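Putting both answers together, here is a hedged sketch of one way to answer such a prompt. The child command is a hypothetical stand-in (a one-line Python script that prompts without a trailing newline), and answer_prompt is a name made up for this example. It creates the process with stdin=PIPE and reads one byte at a time, so it does not depend on the prompt ending in a newline.

```python
import subprocess
import sys

# Hypothetical child: prompts without a trailing newline, then echoes the answer.
CHILD = 'answer = input("Proceed? (Y/N)"); print("got", answer)'

def answer_prompt(cmd, prompt=b"(Y/N)", reply=b"Y\n"):
    # bufsize=0 keeps the pipes unbuffered on the Python side, so bytes read
    # here are never stranded in a buffer that communicate() would bypass.
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, bufsize=0)
    seen = b""
    # readline() would block on a prompt without "\n", so read byte by byte.
    while prompt not in seen:
        chunk = proc.stdout.read(1)
        if not chunk:  # child closed stdout before prompting
            break
        seen += chunk
    # communicate() sends the reply, closes stdin and collects the rest.
    rest, _ = proc.communicate(input=reply)
    return seen + rest

output = answer_prompt([sys.executable, "-c", CHILD])
print(output.decode())
```

Byte-by-byte reads are slow for chatty programs; for those, reading larger chunks and searching the accumulated data for the prompt follows the same idea.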

Python's sub-process returns truncated output when using PIPE to read very long outputs

We have a rasterization utility developed in NodeJS that converts an HTML string to the Base64 encoding of the rendered HTML page. We use the subprocess module to run the utility and then read its STDOUT through a PIPE. The basic code that implements this is as follows:
from subprocess import run, PIPE
result = run(['capture', tmp_file.name, '--type', 'jpeg'], stdout=PIPE, stderr=PIPE, check=True)
output = result.stdout.decode('utf-8')
The output contains the Base64 string of the rendered HTML page. As Base64 is very large for large pages, I have noticed that for some HTML pages, the output is truncated and is not complete. But, this happens randomly so Base64 could be correct for a page one time but truncated next time. It is important to mention here that I'm currently using threading (10 threads) to convert HTML to Base64 images concurrently so that might play a role here.
I analyzed this in detail and found out that, under the hood, the subprocess.run method uses the _communicate method which in turn uses the os.read() method to read from the PIPE. I printed its output and found out that it's also truncated and that's why STDOUT is truncated. Strange behavior altogether.
Finally, I was able to solve this by using a file handle instead of the PIPE and it works perfectly.
with open(output_filename, 'w+') as out_file:
    result = run(['capture', tmp_file.name, '--type', 'jpeg'], stdout=out_file, stderr=PIPE, check=True)
I'm just curious why the PIPE fails to handle complete output and that too, randomly.
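When you call subprocess.run() with an argument list and without shell=True, the command is executed directly; bash is not involved.
With PIPE as stdout, the data travels through an ordinary OS pipe. The kernel buffer behind it is small (commonly 64 KiB on Linux), but a full pipe makes the writer block rather than discard data, and communicate() keeps reading until EOF. So a truncated result most likely means the child stopped writing early, for example because it exited before its pending writes to the pipe were flushed.
Capturing the output in a file is a reasonable workaround for large data, since writes to a regular file do not depend on a reader keeping up.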
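One way to check whether the pipe itself limits the output size is to capture a known, large amount of data through PIPE. This is only a sketch: the child is a hypothetical one-line Python script standing in for the capture utility, and 1,000,000 bytes is an arbitrary size chosen to be well above typical pipe buffer sizes.

```python
import subprocess
import sys

SIZE = 1_000_000  # arbitrary, far larger than a typical pipe buffer

# Hypothetical child that writes SIZE characters to its stdout and exits normally.
child = [sys.executable, "-c", f"import sys; sys.stdout.write('x' * {SIZE})"]

result = subprocess.run(child, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, check=True)
captured = len(result.stdout)
print(captured == SIZE)
```

If a check like this passes while the real utility still truncates, that points at how the utility writes and exits rather than at subprocess.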

Python: have check_call output to file continously?

Is it possible to get the following check_call procedure:
logPath="log.txt"
with open(logPath,"w") as log:
    subprocess.check_call(command, stdout = log, stderr=subprocess.STDOUT )
to output the stdout and stderr to a file continuously?
On my machine, the output is written to the file only after subprocess.check_call has finished.
To achieve this, perhaps we could modify the buffer length of the log file stream?
Not without some OS tricks.
That happens because the output is usually line-buffered (i.e. the buffer is flushed after each newline character) when it goes to a terminal, but block-buffered when it goes to a file or pipe. In the block-buffered case you won't see the output written "continuously"; it is written in chunks of 1k, 4k or whatever the block size is.
This is the default behavior of libc, so if the subprocess is written in C and using printf()/fprintf(), it will check the output if it is a terminal or a file and change the buffering mode accordingly.
The concept of buffering is (better) explained at http://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html
This is done for performance reasons (see the answer to this question).
If you can modify the subprocess's code, you can put a call to flush() after each line or wherever needed.
Otherwise there are external tools to force line buffering mode (by tricking programs into believing the output is a terminal):
unbuffer part of the expect package
stdbuf
Possibly related:
Force line-buffering of stdout when piping to tee (suggests use of unbuffer)
java subprocess does not write its output until it terminates (a shorter explanation of mine written years ago)
How to get "instant" output of "tail -f" as input? (suggests stdbuf usage)
Piping of grep is not working with tail? (only for grep)
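If the parent script may be changed instead, one alternative is to read the child's output through a pipe and write it to the log yourself, flushing after every line. This is a hedged sketch, not a drop-in replacement for check_call: run_and_log is a hypothetical helper, and the child here is a Python one-liner started with -u (unbuffered output), because the parent can only log lines as fast as the child actually emits them.

```python
import os
import subprocess
import sys
import tempfile

def run_and_log(cmd, log_path):
    """Run cmd, copying stdout and stderr to log_path line by line."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            log.write(line)
            log.flush()  # push each line to the file right away
        proc.stdout.close()
        returncode = proc.wait()
    if returncode:  # mimic check_call: fail on a non-zero exit status
        raise subprocess.CalledProcessError(returncode, cmd)
    return returncode

log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
run_and_log([sys.executable, "-u", "-c", "print('one'); print('two')"], log_path)
with open(log_path) as f:
    logged = f.read()
print(logged)
```

For a child that block-buffers its own output, this still needs one of the tools above (or a flush in the child) to make the lines arrive promptly.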

Stdin and Stdout

Can someone explain stdin and stdout? I don't understand what is the difference between using these two objects for user input and output as opposed to print and raw_input. Perhaps there is some vital information I am missing. Can anyone explain?
stdin and stdout are the streams for your operating system's standard input and output.
You use them to read data from your operating system's standard input (usually the keyboard) and write data to its standard output (your screen, the Python console, or such).
print is simply a function which writes to the operating system's stdout and adds a newline to the end.
There are more features in print than just this, but that's the basic idea.
# Simplified print implementation
import sys
def print(value, end='\n'):
    sys.stdout.write(str(value))  # write() only accepts strings
    sys.stdout.write(end)
stdin and stdout are stream representations of the standard in- and output that your OS supplies Python with.
You can do almost everything you can do with a file on these, so for many applications they are far more useful than e.g. print, which adds line breaks etc.
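A small sketch that shows print going through sys.stdout: while sys.stdout is temporarily redirected to an in-memory buffer, both print and a direct write() land in that buffer. The variable names are illustrative only.

```python
import contextlib
import io
import sys

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("via print")             # adds a trailing newline
    sys.stdout.write("via write")  # writes exactly what it is given

captured = buffer.getvalue()
print(repr(captured))
```

The only visible difference between the two calls is the newline that print appends.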

Running a command line from python and piping arguments from memory

I was wondering if there was a way to run a command line executable in python, but pass it the argument values from memory, without having to write the memory data into a temporary file on disk. From what I have seen, it seems that subprocess.Popen(args) is the preferred way to run programs from inside python scripts.
For example, I have a pdf file in memory. I want to convert it to text using the commandline function pdftotext which is present in most linux distros. But I would prefer not to write the in-memory pdf file to a temporary file on disk.
pdfInMemory = myPdfReader.read()
convertedText = subprocess.<method>(['pdftotext', ??]) <- what is the value of ??
What is the method I should call, and how should I pipe in-memory data into its first input and pipe its output back into another variable in memory?
I am guessing there are other pdf modules that can do the conversion in memory and information about those modules would be helpful. But for future reference, I am also interested about how to pipe input and output to the commandline from inside python.
Any help would be much appreciated.
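With Popen.communicate (note that stdin=subprocess.PIPE is required for the input to reach the process):
import subprocess
out, err = subprocess.Popen(["pdftotext", "-", "-"], stdin=subprocess.PIPE, stdout=subprocess.PIPE).communicate(pdf_data)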
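os.tmpfile is useful if you need a seekable thing, but it exists only in Python 2; in Python 3 use tempfile.TemporaryFile() instead. It uses a file, but it's nearly as simple as the pipe approach, and no cleanup is needed.
import tempfile
tf = tempfile.TemporaryFile()
tf.write(...)
tf.seek(0)
subprocess.Popen( ... , stdin = tf)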
This may not work on Posix-impaired OS 'Windows'.
Popen.communicate from subprocess takes an input parameter that is used to send data to stdin, you can use that to input your data. You also get the output of your program from communicate, so you don't have to write it into a file.
The documentation for communicate explicitly warns that everything is buffered in memory, which seems to be exactly what you want to achieve.
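On Python 3, subprocess.run() wraps the same mechanism: its input argument sends in-memory bytes to the child's stdin. The sketch below assumes a hypothetical stand-in filter (a Python one-liner that uppercases its input) instead of pdftotext, so that it runs anywhere.

```python
import subprocess
import sys

# Hypothetical stand-in for a filter such as pdftotext: reads stdin, writes stdout.
FILTER = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]

data_in_memory = b"hello from memory"

# input= feeds the bytes to the child's stdin; stdout=PIPE captures the result.
result = subprocess.run(FILTER, input=data_in_memory,
                        stdout=subprocess.PIPE, check=True)
converted = result.stdout
print(converted)
```

Nothing is written to disk here; both the input and the captured output live in memory, which matters for very large files.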
