missing stdout before subprocess.Popen crash [duplicate] - python

I am using a 3rd-party python module which is normally called through terminal commands. When called through terminal commands it has a verbose option which prints to terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the 3rd-party module and it does not set printing to be flushed, such as print('example', flush=True). Is there a way to force the flushing from my program without editing the 3rd-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.

The issue is most likely that many programs behave differently when run interactively in a terminal than when run as part of a pipeline (i.e. called using subprocess). It has very little to do with Python itself, and more with the Unix/Linux architecture.
As you have noted, it is possible to force a program to flush stdout even when run in a pipeline, but it requires changes to the source code, by adding explicit stdout.flush() calls (or flush=True arguments).
Another way to get real-time output is to "trick" the program into thinking it is working with an interactive terminal, using a so-called pseudo-terminal. There is a supporting module for this in the Python standard library, namely pty. Using that, you will not explicitly call subprocess.run (or Popen or ...). Instead you have to use the pty.spawn call:
import os
import pty

def prout(fd):
    # Callback that pty.spawn calls with the master side of the pseudo-terminal;
    # read the child's output here and do whatever you like with it.
    data = os.read(fd, 1024)
    while data:
        print(data.decode(), end="")
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
As can be seen, this requires a special callback function for handling stdout. Above, I just print it to the terminal, but it is of course possible to do other things with the text as well (such as logging or parsing it).
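For example, a minimal sketch of a callback that also appends everything to a log file in real time (the name progress.log is just an illustrative choice):
import os
import pty

# Line-buffered text file, so each line reaches the file as it arrives;
# "progress.log" is only an example name.
log = open("progress.log", "w", buffering=1)

def prout(fd):
    data = os.read(fd, 1024)
    while data:
        text = data.decode()
        print(text, end="")   # show progress on the terminal
        log.write(text)       # and keep a copy in the log file
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
log.close()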
Another way to trick the program is to use an external program called unbuffer. Unbuffer wraps your command and makes the program think (as with the pty call) that it is called from a terminal. This is arguably simpler if unbuffer is installed, or you are allowed to install it, on your system (it is part of the expect package). All you have to do then is to change your subprocess call to
p = subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
print(p.communicate()[0].decode(), end="")
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.

Related

How do you trivially change a Python script to capture everything written to stdout by itself and its subprocesses?

Suppose you have a big script that writes to stdout in many places, both directly (using Python features like print() or logging that goes to stdout) and indirectly by launching subprocesses which write to stdout.
Is there a trivial way to capture all this stdout?
For example, if you want the script to send an email with all its output when it completes.
By "trivial" I mean a constant rather than linear code change. Otherwise, I believe you will have to introduce redirection parameters (and some accumulation logic) into every single subrprocess call. You can capture all the output of the script itself by redirecting sys.stdout, however I don't see a similar "catch-all" trivial solution for all the subprocess calls, or indeed whatever other types of code you may be using to launch these subprocesses.
Is there any such solution, or must one use a runner script that will call this Python script as a subprocess and capture all stdout from that subprocess?
Probably the shortest way to do so (and it is not Python-specific) would be to use os.dup2(), e.g.:
import os

f = open('/tmp/OUT', 'w')
os.dup2(f.fileno(), 1)
f.close()
What it does is replace file descriptor 1 (which would normally be your stdout) with a duplicate of the file descriptor of f (which you can then close). After that, all writes to stdout end up in /tmp/OUT. The duplication is inherited, so subprocesses also have fd 1 writing to the same file.
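A small sketch of that idea, assuming /tmp/OUT is an acceptable place for the capture:
import os
import subprocess
import sys

f = open('/tmp/OUT', 'w')
os.dup2(f.fileno(), 1)   # fd 1 (stdout) now points at /tmp/OUT
f.close()

print("written by the script itself")
sys.stdout.flush()       # flush Python's own buffer before the child writes
subprocess.call(["echo", "written by a subprocess"])   # the child inherits fd 1
# Both lines end up in /tmp/OUT rather than on the terminal.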

Writing and reading stdout unbuffered to a file over SSH

I'm using Node to execute a Python script. The Python script SSH's into a server, and then runs a Pig job. I want to be able to get the standard out from the Pig job, and display it in the browser.
I'm using the Pexpect library to make the SSH calls, but this will not print the output of the Pig call until it has totally completed (at least the way I have it written). Any tips on how to restructure it?
child.sendline(command)
child.expect(COMMAND_PROMPT)
print(child.before)
I know I shouldn't be expecting the command prompt (cause that will only show up when the process ends), but I'm not sure what I should be expecting.
Repeating my comment as an answer, since it solved the issue:
If you set child.logfile_read to a writable file-like object (e.g. sys.stdout), Pexpect will forward the output there as it reads it.
child.logfile_read = sys.stdout
child.sendline(command)
child.expect(COMMAND_PROMPT)
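One caveat: on Python 3 the child's output is bytes by default, which sys.stdout will not accept directly. Spawning with an encoding (supported by recent Pexpect versions) sidesteps that; the host and command names below are just placeholders:
import sys
import pexpect

# "encoding" makes the child's output str rather than bytes, so it can be
# forwarded straight to sys.stdout.
child = pexpect.spawn("ssh user@host", encoding="utf-8")
child.logfile_read = sys.stdout
child.sendline(command)
child.expect(COMMAND_PROMPT)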

twisted reactor.spawnProcess get stdout w/o buffering on windows

I'm running an external process and I need to get the stdout immediately so I can push it to a textview. On GNU/Linux I can use "usePTY=True" to get the stdout line by line; unfortunately usePTY is not available on Windows.
I'm fairly new to twisted, is there a way to achieve the same result on Windows with some twisted (or python maybe) magic stuff?
on GNU/Linux I can use "usePTY=True" to get the stdout by line
Sort of! What usePTY=True actually does is create a PTY (a "pseudo-terminal" - the thing you always get when you log in to a shell on GNU/Linux unless you have a real terminal which no one does anymore :) instead of a boring old pipe. A PTY is a lot like a pipe but it has some extra features - but more importantly for you, a PTY is strongly associated with interactive sessions (ie, a user) whereas a pipe is pretty strongly associated with programmatic uses (think foo | bar - no user ever sees the output of foo).
This means that people tend to use existence of a PTY as stdout as a signal that they should produce output in a timely manner - because a human is waiting to see it. On the flip side, the existence of a regular old pipe as stdout is taken as a signal that another program is consuming the output and they should instead produce output in the most efficient way possible.
What this tends to mean in practice is that if a program has a PTY then it will line buffer its output and if it has a pipe then it will "block" buffer its output (usually gather up about 4kB of data before writing any of it) - because line buffering is less efficient.
The thing to note here is that it is the program you are running that does this buffering. Whether you pass usePTY=True or usePTY=False makes no direct difference to that buffering: it is just a hint to the program you are running what kind of output buffering it should do.
This means that you might run programs that block buffer even if you pass usePTY=True and vice versa.
However... Windows doesn't have PTYs. So programs on Windows can't consider PTYs as a hint for how to buffer their output.
I don't actually know if there is another hint that it is conventional for programs to respect on Windows. I've never come across one, at least.
If you're lucky, then the program you're running will have some way for you to request line-buffered output. If you're running Python, then it does - the PYTHONUNBUFFERED environment variable controls this, as does the -u command line option (and I think they both work on Windows).
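As a rough sketch of how that might look with reactor.spawnProcess (the child script name is a placeholder, and PYTHONUNBUFFERED/-u are set on the assumption that the child is a Python program):
import os
import sys
from twisted.internet import protocol, reactor

class LinePrinter(protocol.ProcessProtocol):
    def outReceived(self, data):
        # Arrives promptly because the child was started unbuffered.
        sys.stdout.write(data.decode())

    def processEnded(self, reason):
        reactor.stop()

env = dict(os.environ, PYTHONUNBUFFERED="1")
reactor.spawnProcess(LinePrinter(), sys.executable,
                     [sys.executable, "-u", "child_script.py"], env=env)
reactor.run()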
Incidentally, if you plan to pass binary data between the two processes, then you probably want to put stdio into binary mode in the child process as well:
import os, sys, msvcrt
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)

Getting live output from running unix command in python

I am using the code below for running unix commands:
import commands

cmd = 'ls -l'
(status, output) = commands.getstatusoutput(cmd)
print output
But the problem is that it shows the output only after the command has completed, and I want to see the output printed as the execution progresses.
ls -l is just a dummy command; I am using a more complex command in the actual program.
Thanks!!
Since this is homework, here's what to do instead of the full solution:
Use the subprocess.Popen class to call the executable. Note that the constructor takes a named stdout argument, and take a look at subprocess.PIPE.
Read from the Popen object's stdout pipe in a separate thread to avoid deadlocks. See the threading module.
Wait until the subprocess has finished (see Popen.wait).
Wait until the thread has finished processing the output (see Thread.join). Note that this may very well happen after the subprocess has finished.
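Putting those steps together, a rough sketch might look like this (the command name is just a placeholder):
import subprocess
import threading

def pump_output(pipe):
    # Runs in its own thread so reading stdout cannot deadlock the parent.
    for line in pipe:
        print(line.decode(), end="")

proc = subprocess.Popen(["./long_running_command"], stdout=subprocess.PIPE)
reader = threading.Thread(target=pump_output, args=(proc.stdout,))
reader.start()

proc.wait()    # wait for the subprocess to finish
reader.join()  # then wait for the thread to drain whatever is left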
If you need more help please describe your precise problem.
Unless there are simpler ways in Python which I'm not aware of, I believe you'll have to dig into the slightly more complex os.fork and os.pipe functions.
Basically, the idea is to fork your process, have the child execute your command, while having its standard output redirected to a pipe which will be read by the parent. You'll easily find examples of this kind of pattern.
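A bare-bones sketch of that pattern, with ls -l standing in for the real command:
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: point stdout at the write end of the pipe, then run the command.
    os.close(r)
    os.dup2(w, 1)
    os.execvp("ls", ["ls", "-l"])
else:
    # Parent: read the child's output line by line as it is produced.
    os.close(w)
    with os.fdopen(r) as child_out:
        for line in child_out:
            print(line, end="")
    os.waitpid(pid, 0)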
Most programs will use block-buffered output if they are not connected to a tty, so you need to run the program connected to a pty; the easiest way is to use pexpect:
import pexpect

for line in pexpect.spawn('command arg1 arg2'):
    print(line)

os.system() failing in python

I'm trying to parse some data and make graphs with python and there's an odd issue coming up. A call to os.system() seems to get lost somewhere.
The following three lines:
os.system('echo foo bar')
os.system('gnuplot test.gnuplot')
os.system('gnuplot --version')
Should print:
foo bar
Warning: empty x range [2012:2012], adjusting to [1991.88:2032.12]
gnuplot 4.4 patchlevel 2
But the only significant command in the middle seems to get dropped. The script still runs the echo and version check, and running gnuplot by itself (the gnuplot shell) works too, but there is no warning and no file output from gnuplot.
Why is this command dropped, and why completely silently?
In case it's helpful: the invocation should start gnuplot, which should open a couple of files (the instructions and a data file indicated therein) and write out an SVG file. I tried deleting the target file so it wouldn't have to overwrite, but to no avail.
This is python 3.2 on Ubuntu Natty x86_64 virtual machine with the 2.6.38-8-virtual kernel.
Is the warning printed to stderr, and is that intercepted somehow?
Try using subprocess instead, for example using
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
and checking the output.
(or plain subprocess.call might work better than os.system)
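For instance, a quick check along these lines would capture both streams and surface a non-zero exit status (assuming test.gnuplot is in the working directory):
import subprocess

try:
    out = subprocess.check_output(["gnuplot", "test.gnuplot"],
                                  stderr=subprocess.STDOUT)
    print(out.decode(), end="")
except subprocess.CalledProcessError as err:
    print("gnuplot exited with status", err.returncode)
    print(err.output.decode(), end="")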
So, it turned out the issue was something I failed to mention. Earlier in the script test.gnuplot and test.data were written, but I neglected to call the file objects' close() and verify that they got closed (still don't know how to do that last part so for now it cycles for a bit). So there was some unexpected behaviour going on there causing gnuplot to see two unreadable files, take no action, produce no output, and return 0.
I guess nobody gets points for this one.
Edit: I finally figured it out with the help of strace. Don't know how I did things before I learned how to use it.
Don't use os.system. Use the subprocess module.
os.system documentation says:
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is
preferable to using this function.
Try this:
subprocess.check_call(['gnuplot', 'test.gnuplot'])
