How do I log all activity performed by a Python script and by all the scripts it calls?
I had several Bash scripts, but I have now written a Python script that calls all of them. I would like all output produced by these scripts to be stored in a file.
The Python script is interactive, i.e. it contains raw_input lines, so I can't simply run 'python script.py | tee log.txt' for the whole script: for some reason the questions then don't appear on the screen.
Here is an excerpt from the script which calls one of the shell scripts.
cmd = "somescript.sh"
try:
retvalue = subprocess.check_call(cmd, shell=True)
except subprocess.CalledProcessError:
print ("script command has been failed")
sys.exit("exit from script")
What do you think could be done here?
Edit
Two subquestions based on Alex's answer:
How do I get the answers to the questions stored in the output file as well? For example, on the line ok = raw_input(prompt) the user will be asked a question, and I would like the answer to be logged too.
I read about Popen and communicate but didn't use them, since communicate buffers the data in memory. Here the amount of output is large, and I need to capture standard error along with standard output. Do you know whether this can be handled with Popen and communicate as well?
Making Python's own prints go to both the terminal and a file is not hard:
>>> import sys
>>> class tee(object):
...     def __init__(self, fn='/tmp/foo.txt'):
...         self.o = sys.stdout
...         self.f = open(fn, 'w')
...     def write(self, s):
...         self.o.write(s)
...         self.f.write(s)
...
>>> sys.stdout = tee()
>>> print('hello world!')
hello world!
>>>
$ cat /tmp/foo.txt
hello world!
This should work both in Python 2 and Python 3.
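One caveat worth adding (my note, not part of the original answer): some code calls sys.stdout.flush(), and interactive prompts are usually written without a trailing newline, so it is prudent to give the tee class a flush method too:

class tee(object):
    def __init__(self, fn='/tmp/foo.txt'):
        self.o = sys.stdout
        self.f = open(fn, 'w')
    def write(self, s):
        self.o.write(s)
        self.f.write(s)
    def flush(self):
        # pass flushes through so prompts appear on screen immediately
        self.o.flush()
        self.f.flush()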
To similarly direct the output from subcommands, don't use
retvalue = subprocess.check_call(cmd, shell=True)
which lets cmd's output go to its regular "standard output", but rather grab and re-emit it yourself, as follows:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
so, se = p.communicate()
print(so)
retvalue = p.returncode
This assumes you don't care about standard error (only standard output) and that the amount of output from cmd is reasonably small, since .communicate buffers that data in memory. It's easy to tweak if either assumption doesn't match what you actually want.
Edit: the OP has now clarified the specs in a long comment to this answer:
How do I get the answers to the questions stored in the output file as well? For example, on the line ok = raw_input(prompt) the user will be asked a question, and I would like the answer to be logged too.
Use a function such as:
def echoed_input(prompt):
    response = raw_input(prompt)
    # raw_input strips the trailing newline, so add one back for the log
    sys.stdout.f.write(response + '\n')
    return response
instead of just raw_input in your application code (of course, this is written specifically to cooperate with the tee class I showed above).
I read about Popen and communicate but didn't use them, since communicate buffers the data in memory. Here the amount of output is large, and I need to capture standard error along with standard output. Do you know whether this can be handled with Popen and communicate as well?
communicate is fine as long as you don't get more output (and standard-error) than comfortably fits in memory, say a few gigabytes at most depending on the kind of machine you're using.
If this hypothesis is met, just recode the above as:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
so, se = p.communicate()
print(so)
retvalue = p.returncode
i.e., just redirect the subcommand's stderr to get mixed into its stdout.
If you DO have to worry about gigabytes (or whatever) coming at you, then
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for line in p.stdout:
    sys.stdout.write(line)
p.wait()
retvalue = p.returncode
(which gets and emits one line at a time) may be preferable. This depends on cmd not expecting anything from its standard input, of course... because, if it is expecting input, it's not going to get it, and the problem starts to become challenging;-).
Python has a tracing module: trace. Usage: python -m trace --trace file.py
If you want to capture the output of any script, then on a *nix-y system you can redirect stdout and stderr to a file:
./script.py >> /tmp/outputs.txt 2>> /tmp/outputs.txt
If you want everything done by the scripts, not just what they print, then the Python trace module won't help: it does not trace things done by external scripts that your Python code executes. The only thing that can trace every action of a program is something like DTrace, if you are lucky enough to have a system that supports it. (OS X's Instruments are based on DTrace.)
Related
I'm trying to run a shell program through Python. I need to run a command; then, while it's still running and waiting for input to continue, I need to take the output the program has produced so far and process it as a string. Then I need to pass some data to the program and simulate pressing Enter.
What would be the best way to achieve this?
subprocess.Popen will work for this, but to read, then write, then read again you can't use communicate (because communicate closes the process's stdin and waits for the process to end).
Instead, you'll need to work with the process's output pipe (process.stdout below). This is tricky to get right, because reading on the process's stdout is blocking, so you sort of need to know when to stop trying to read (or know how much output the process is going to produce).
In this example, the subprocess is a shell script that writes a line of output, and then echoes whatever you give it until it reads EOF.
import subprocess

COMMAND_LINE = 'echo "Hello World!" ; cat'

process = subprocess.Popen(COMMAND_LINE, shell=True,
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE)

s = process.stdout.readline().strip()
print(s)
s2 = process.communicate(s)[0]
print(s2)
Gives:
Hello World!
Hello World!
For more complicated cases, you might think about looking at something like pexpect.
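For example, a minimal pexpect sketch (the program name and prompt text here are made up for illustration):

import pexpect

child = pexpect.spawn('some_interactive_tool')  # hypothetical command
child.expect(r'Continue\? ')   # block until the program prints its prompt
seen_so_far = child.before     # everything printed before the prompt
child.sendline('')             # simulate pressing Enter
child.expect(pexpect.EOF)      # wait for the program to finish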
Use subprocess.Popen to run your shell application and use communicate to interact with it.
I am trying to learn how to write a script, control.py, that runs another script, test.py, in a loop a certain number of times. In each run it should read test.py's output and halt it if some predefined output is printed (e.g. the text 'stop now'), after which the loop continues with its next iteration (once test.py has finished, either on its own or by force). So, something along the lines of:
for i in range(n):
    os.system('test.py someargument')
    if output == 'stop now':
        # stop the current test.py process and continue with the next iteration;
        # output here is supposed to contain what test.py prints
The problem with the above is that it does not check the output of test.py while it is running; instead it waits until the test.py process has finished on its own, right?
Basically, I am trying to learn how to use a Python script to control another one as it is running (e.g. having access to what it prints and so on).
Finally, is it possible to run test.py in a new terminal (i.e. not in control.py's terminal) and still achieve the above goals?
An attempt:
test.py is this:
from itertools import permutations
import random

perms = [''.join(p) for p in permutations('stop')]
for i in range(1000000):
    rand_ind = random.randrange(0, len(perms))
    print perms[rand_ind]
And control.py is this (following Marc's suggestion):
import subprocess

command = ["python", "test.py"]
n = 10
for i in range(n):
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        output = p.stdout.readline().strip()
        print output
        #if output == '' and p.poll() is not None:
        #    break
        if output == 'stop':
            print 'success'
            p.kill()
            break
    #Do whatever you want
    #rc = p.poll()  # exit code
You can use the subprocess module, or os.popen:
os.popen(command[, mode[, bufsize]])
Open a pipe to or from command. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'.
With subprocess I would suggest
subprocess.call(['python.exe', command])
or subprocess.Popen, which is similar to os.popen.
With popen you can read the connected file object and check whether 'stop now' is there.
os.system is not deprecated, and you can use it as well (though you won't get an object back from it); you can only check its return code at the end of execution.
With subprocess.call you can run test.py in a new terminal; or, if you only want to call test.py multiple times, you can put your script in a def main() and run main as often as you want until 'stop now' is generated.
Hope this solves your query :-) otherwise comment again.
Looking at what you wrote above, you can also redirect the output to a file directly from the OS call, os.system('test.py *args >> /tmp/mickey.txt'), and then check the file on each round.
As said, os.popen returns a file object that you can read.
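A hedged sketch of the os.popen route (assuming test.py is in the current directory and prints 'stop now' on its own line):

import os

# 'r' mode (the default) gives a readable file object connected to the pipe
pipe = os.popen('python test.py someargument')
for line in pipe:
    if line.strip() == 'stop now':
        break
status = pipe.close()  # returns None on success, otherwise the exit status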
What you are hinting at in your comment to Marc Cabos' answer is threading.
There are several ways Python can use the functionality of other files. If the content of test.py can be encapsulated in a function or class, then you can import the relevant parts into your program, giving you much greater access to the inner workings of that code.
As described in other answers, you can use the stdout of a script by running it in a subprocess. This could give you the separate terminal outputs you require.
However, if you want to run test.py concurrently and access its variables as they are changed, then you need to consider threading; see the sketch below.
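A minimal sketch of that idea, assuming test.py wraps its loop in an importable run() function and updates a module-level last_line variable (both names are hypothetical):

import threading
import time
import test  # assumes test.py exposes run() and last_line

t = threading.Thread(target=test.run)
t.start()
while t.is_alive():
    # poll shared state while the other thread updates it
    if getattr(test, 'last_line', None) == 'stop now':
        break
    time.sleep(0.1)
t.join(1.0)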
Yes, you can use Python to control another program through stdin/stdout, but when consuming another process's output there is often a problem of buffering: the other process doesn't really output anything until it's done.
There are even cases in which output is buffered or not depending on whether the program was started from a terminal.
If you are the author of both programs, it is probably better to use another interprocess channel where flushing is explicitly controlled by the code, such as sockets.
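If you control the child (as with test.py here), the simplest fix is to flush explicitly after each write:

# in the controlled program (test.py): flush so the parent sees output promptly
import sys

print 'stop now'
sys.stdout.flush()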
You can use the "subprocess" library for that.
import subprocess

command = ["python", "test.py", "someargument"]
for i in range(n):
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        output = p.stdout.readline().rstrip()
        if output == '' and p.poll() is not None:
            break
        if output == 'stop now':
            p.kill()  # do whatever you want here, e.g. stop test.py
            break
    rc = p.poll()  # exit code
I'm writing a simple wrapper over the Python debugger (pdb) and I need to parse pdb's output. But I have a problem reading text from the process pipe.
Example of my code:
import subprocess, threading, time

def readProcessOutput(process):
    while not process.poll():
        print(process.stdout.readline())

process = subprocess.Popen('python -m pdb script.py', shell=True, universal_newlines=True,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                           stdin=subprocess.PIPE)
read_thread = threading.Thread(target=readProcessOutput, args=(process,))
read_thread.start()

while True:
    time.sleep(0.5)
When I execute the given command (python -m pdb script.py) in the OS shell, I get results like this:
> c:\develop\script.py(1)<module>()
-> print('hello, world!')
(Pdb)
But when I run my script I get only the first two lines and never the pdb prompt; writing commands to stdin after this has no effect. So my question is:
why can't I read the third line? How can I avoid this problem and get the correct output?
Platform: Windows XP, Python 3.3
The third line cannot be read by readline() because it is not yet terminated by an end-of-line character: you usually see the cursor sitting right after "(Pdb) " until you type something and press Enter.
Communicating with processes that present a prompt is usually more complicated. In my experience it also pays to write an independent thread for the data writer, first of all for easier testing of the communication, so you can be sure the main thread never freezes if too much is written or read; the design can then be simplified again later.
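A minimal sketch of a reader that copes with an unterminated prompt (the prompt text '(Pdb) ' is assumed here):

def read_until_prompt(stream, prompt='(Pdb) '):
    # read one character at a time so a prompt without a trailing newline is detected
    buf = ''
    while not buf.endswith(prompt):
        ch = stream.read(1)
        if not ch:  # EOF
            break
        buf += ch
    return buf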
My problem is this: I need to get output from a subprocess, and I am using the following code to call it. (Feel free to ignore the long arguments. The important thing is the stdout=subprocess.PIPE.)
(stdout, stderr) = subprocess.Popen([self.ChapterToolPath, "-x", book.xmlPath,
                                     "-a", book.aacPath,
                                     "-o", book.outputPath + "/" + fileName + ".m4b"],
                                    stdout=subprocess.PIPE).communicate()
print stdout
Thanks to an answer below, I've been able to get the output of the program, but I still end up waiting for the process to terminate before I get anything. The interesting thing is that in my debugger, all sorts of text flies by in the console and is ignored. But the moment anything is written to the console in black (I am using PyCharm) the program continues without a problem. Could the main program be waiting for some kind of output before moving on? That would make sense, because I am trying to communicate with it... Is there a difference between text that I can see in the console and text that actually makes it to stdout? And how would I collect the text written to the console?
Thanks!
The first line of the documentation for subprocess.call() describes it as such:
Run the command described by args. Wait for command to complete, then return the returncode attribute.
Thus, it necessarily waits for the subprocess to exit.
subprocess.Popen(), by contrast, does not do this: it returns a handle on a process with which one can then communicate().
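For illustration, the contrast looks like this:

import subprocess

# call blocks until the command finishes, then returns its exit code
rc = subprocess.call(['ls', '-l'])

# Popen returns immediately; you interact with the live process afterwards
p = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
out, err = p.communicate()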
To get all output from a program:
from subprocess import check_output as qx
output = qx([program, arg1, arg2, ...])
To get output while the program is running:
from subprocess import Popen, PIPE
p = Popen([program, arg1, ...], stdout=PIPE)
for line in iter(p.stdout.readline, ''):
    print line,
There might be a buffering issue on the program's side if it prints line by line when run interactively but buffers its output when run as a subprocess. There are various solutions depending on your OS and the program; e.g., you could run it using the pexpect module.
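On Linux, one common workaround (assuming GNU coreutils' stdbuf is available) is to force the child into line-buffered mode:

from subprocess import Popen, PIPE

# stdbuf -oL makes the child line-buffer its stdout even without a terminal
p = Popen(['stdbuf', '-oL', program, arg1], stdout=PIPE)
for line in iter(p.stdout.readline, ''):
    print line,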
I have some Python code that executes an external app which works fine when the app has a small amount of output, but hangs when there is a lot. My code looks like:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errcode = p.wait()
retval = p.stdout.read()
errmess = p.stderr.read()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
There are comments in the docs that seem to indicate the potential issue. Under wait, there is:
Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
though under communicate, I see:
Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
So it is unclear to me whether I should use either of these if I have a large amount of data; the docs don't indicate what method I should use in that case.
I do need the return value from the exec, and I parse and use both stdout and stderr.
So what is an equivalent method in Python to exec an external app that is going to produce large output?
You're doing blocking reads to two files; the first needs to complete before the second starts. If the application writes a lot to stderr, and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be--since you're waiting for stdout).
There are a few ways you can fix this.
The simplest is to not intercept stderr; leave stderr=None. Errors will be output to stderr directly. You can't intercept them and display them as part of your own message. For commandline tools, this is often OK. For other apps, it can be a problem.
Another simple approach is to redirect stderr to stdout, so you only have one incoming file: set stderr=STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.
The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. This probably doesn't work in Windows.
Reading stdout and stderr independently with very large output (i.e., lots of megabytes) using select:

import subprocess
import select

proc = subprocess.Popen(cmd, bufsize=8192, shell=False,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

with open(outpath, "wb") as outf:
    dataend = False
    while (proc.returncode is None) or (not dataend):
        proc.poll()
        dataend = False
        # wait up to 1 second for either pipe to become readable
        ready = select.select([proc.stdout, proc.stderr], [], [], 1.0)
        if proc.stderr in ready[0]:
            data = proc.stderr.read(1024)
            if len(data) > 0:
                handle_stderr_data(data)  # placeholder for your own handler
        if proc.stdout in ready[0]:
            data = proc.stdout.read(1024)
            if len(data) == 0:  # read of zero bytes means EOF
                dataend = True
            else:
                outf.write(data)
How much output counts as "a lot" is subjective, so it's a little difficult to make a recommendation. If the amount of output is really large, then you likely don't want to grab it all with a single read() call anyway. You may want to try writing the output to a file and then pulling the data in incrementally, like so:
f = open('data.out', 'w')
p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=subprocess.PIPE)
errcode = p.wait()
f.close()
if errcode:
    errmess = p.stderr.read()
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
for line in open('data.out'):
    pass  # do something with each line
Glenn Maynard is right in his comment about deadlocks. However, the best way of solving this problem is to create two threads, one for stdout and one for stderr, which read those respective streams until exhausted and do whatever you need with the output.
The suggestion of using temporary files may or may not work for you, depending on the size of the output and whether you need to process the subprocess's output as it is generated.
As Heikki Toivonen has suggested, you should look at the communicate method. However, this buffers the stdout/stderr of the subprocess in memory, and you get those returned from the communicate call; this is not ideal for some scenarios. But the source of the communicate method is worth looking at.
Another example is in a package I maintain, python-gnupg, where the gpg executable is spawned via subprocess to do the heavy lifting, and the Python wrapper spawns threads to read gpg's stdout and stderr and consume them as data is produced by gpg. You may be able to get some ideas by looking at the source there, as well. Data produced by gpg to both stdout and stderr can be quite large, in the general case.
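A minimal sketch of the two-reader-thread pattern described above (cmd is the command from the question; the drain helper is illustrative):

import subprocess
import threading

def drain(stream, chunks):
    # read until EOF so the child never blocks on a full pipe buffer
    for line in iter(stream.readline, ''):
        chunks.append(line)
    stream.close()

p = subprocess.Popen(cmd, shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out_lines, err_lines = [], []
t_out = threading.Thread(target=drain, args=(p.stdout, out_lines))
t_err = threading.Thread(target=drain, args=(p.stderr, err_lines))
t_out.start()
t_err.start()
retcode = p.wait()
t_out.join()
t_err.join()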
I had the same problem. If you have to handle a large output, another good option is to use files for stdout and stderr, and pass those files as parameters.
Check the tempfile module in Python: https://docs.python.org/2/library/tempfile.html.
Something like this might work
out = tempfile.NamedTemporaryFile(delete=False)
Then you would do:
Popen(... stdout=out,...)
Then you can read the file, and erase it later.
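Pulling those pieces together, a sketch along these lines might work (cmd stands for the command from the question):

import os
import subprocess
import tempfile

out = tempfile.NamedTemporaryFile(delete=False)
err = tempfile.NamedTemporaryFile(delete=False)
p = subprocess.Popen(cmd, shell=True, stdout=out, stderr=err)
errcode = p.wait()
out.close()
err.close()
with open(out.name) as f:
    for line in f:
        pass  # process the captured output incrementally
os.unlink(out.name)  # erase the temporary files when done
os.unlink(err.name)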
You could try communicate and see if that solves your problem. If not, I'd redirect the output to a temporary file.
Here is a simple approach which captures both regular output and error output, all within Python, so shell stdout limitations don't apply:
import subprocess

com_str = 'uname -a'
command = subprocess.Popen(com_str, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output
Linux 3.11.0-20-generic SMP Fri May 2 21:32:55 UTC 2014
and
com_str = 'id'
command = subprocess.Popen(com_str, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output
uid=1000(myname) gid=1000(mygrp) groups=1000(cell),0(root)