Chaining subprocess.Popen to simulate pipes [duplicate] - python

How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk","-f","script.awk"],
stdin=subprocess.PIPE,
stdout=file("outfile.txt", "w"))
p_awk.communicate( "input data" )
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!

You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
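As a rough illustration of the "rewrite it in Python" suggestion, here is a minimal sketch. The contents of script.awk are not shown in the question, so the per-line transformation below (print the second whitespace-separated field) is purely an assumption, and the helper names are hypothetical.

```python
def second_field(line):
    """Stand-in for script.awk (an assumption): like awk '{ print $2 }'."""
    fields = line.split()
    return fields[1] if len(fields) > 1 else ""


def transform_and_sort(text, transform=second_field):
    """Apply `transform` to every input line, then sort the results."""
    return sorted(transform(line) for line in text.splitlines())


def write_sorted(text, path, transform=second_field):
    """Write the transformed, sorted lines to `path`, one per line."""
    with open(path, "w") as out:
        for line in transform_and_sort(text, transform):
            out.write(line + "\n")
```

Note that sorted() compares by code point, which is closer to LC_ALL=C sort than to a locale-aware sort, so the ordering may differ from what sort produces on a given system.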
Sidebar Why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe. This is not a Python subprocess.PIPE; it is a call to os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout, and stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.fork(), and the os.exec*() family (for example os.execvp()), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
However, it's easier to delegate that operation to the shell.
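For the curious, the steps in the sidebar can be sketched in pure Python roughly as follows. This is a POSIX-only sketch, not what any particular shell does: it simplifies the ordering by having the parent fork both children (rather than having b fork a), and run_pipeline is a hypothetical helper name.

```python
import os


def run_pipeline(cmd_a, cmd_b):
    """Run the equivalent of `cmd_a | cmd_b` and return b's exit status."""
    read_end, write_end = os.pipe()

    pid_a = os.fork()
    if pid_a == 0:  # child that becomes a
        try:
            os.dup2(write_end, 1)   # a's stdout -> write end of the pipe
            os.close(read_end)
            os.close(write_end)
            os.execvp(cmd_a[0], cmd_a)
        finally:
            os._exit(127)           # only reached if the exec failed

    pid_b = os.fork()
    if pid_b == 0:  # child that becomes b
        try:
            os.dup2(read_end, 0)    # b's stdin <- read end of the pipe
            os.close(read_end)
            os.close(write_end)
            os.execvp(cmd_b[0], cmd_b)
        finally:
            os._exit(127)

    # Parent: close both ends, otherwise b never sees end-of-file.
    os.close(read_end)
    os.close(write_end)
    os.waitpid(pid_a, 0)
    _, status = os.waitpid(pid_b, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

Something like run_pipeline(["awk", "-f", "script.awk"], ["sort"]) would then behave like awk -f script.awk | sort, with both commands inheriting the caller's other file descriptors. The amount of bookkeeping is a fair argument for letting the shell or subprocess do it.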

import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
stdin=subprocess.PIPE).communicate(some_string)

To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
    with a.stdout, open("outfile.txt", "wb") as outfile:
        b = Popen(["b"], stdin=a.stdout, stdout=outfile)
    a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()] # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()

The accepted answer sidesteps the actual question.
Here is a snippet that chains the output of multiple processes:
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1M count=100
# cmd2 : tee
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# theoretically p1 and p2 may still be running; waiting ensures we collect their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)

http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
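A minimal sketch of what that could look like, assuming Python 3 and input supplied as bytes. The function name pipe_string_through and its parameters are made up for illustration; the Popen calls follow the documented "replacing shell pipeline" pattern, with the second process writing to a file.

```python
import subprocess


def pipe_string_through(data, first_cmd, second_cmd, out_path):
    """Feed `data` (bytes) to `first_cmd | second_cmd > out_path`."""
    with open(out_path, "wb") as outfile:
        p1 = subprocess.Popen(first_cmd, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE)
        p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=outfile)
        p1.stdout.close()   # p2 now holds the only read end of the pipe
        p1.stdin.write(data)
        p1.stdin.close()    # send end-of-file to the first command
        return p1.wait(), p2.wait()
```

For the question's case this would be called as pipe_string_through(b"input data\n", ["awk", "-f", "script.awk"], ["sort"], "outfile.txt"). Nothing is read back into Python, so the size of the output is not a concern here.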

Inspired by @Cristian's answer. I ran into the same issue, but with a different command, so here is my tested example, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Started grep with its stdin connected to a pipe. grep begins running immediately, but it blocks on reading until the pipe is filled with the stdout of ps.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Called communicate() on grep to collect its stdout from the pipe.
I like this approach because it is the natural pipe concept, gently wrapped in subprocess interfaces.

The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work: it hangs forever. You need to create a pipe and a child process manually, like this:
import os
import subprocess
input = b"""\
input data
more input
""" * 10
rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:  # child: write the input into the pipe, then exit
    os.close(rd)
    os.write(wr, input)
    os.close(wr)
    os._exit(0)
p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print(out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrarily long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input)
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
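Put together, the temporary-file variant might look like the sketch below. It assumes Python 3 and bytes input; run_with_tempfile_input is a hypothetical name, and the two commands are parameters so the same helper works for any two-stage pipeline.

```python
import subprocess
from tempfile import TemporaryFile


def run_with_tempfile_input(data, first_cmd, second_cmd):
    """Return the output of `first_cmd | second_cmd` fed with `data` (bytes)."""
    with TemporaryFile() as tf:
        tf.write(data)
        tf.seek(0)          # rewind so the child reads from the beginning
        p1 = subprocess.Popen(first_cmd, stdin=tf, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(second_cmd, stdin=p1.stdout,
                              stdout=subprocess.PIPE)
        p1.stdout.close()   # let p1 get SIGPIPE if p2 exits early
        out, _ = p2.communicate()
        p1.wait()
    return out
```

For the example above the call would be run_with_tempfile_input(input, ["awk", "{ print $2; }"], ["sort"]), with input given as bytes. The trade-off is that the input touches the filesystem, but no extra process or thread is needed to feed the pipe.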
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same). The reason seems to be that python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.

EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.

For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=input_s.encode())  # bytes(str) needs an encoding on Python 3
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')

Related

How to execute '<(cat fileA fileB)' using python?

I am writing a Python program that uses other software. I was able to pass the command using subprocess.Popen. I am facing a new problem: I need to concatenate multiple files into two files and use them as the input for the external program. The command line looks like this:
extersoftware --fq --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)
I cannot use shell=True because there are other arguments I need to pass in through variables, such as --fq. (They are not limited to --fq; this is just an example.)
One possible solution is to generate an intermediate file.
This is what I have tried:
file_1 = ['cat', 'fileA_1', 'fileB_1']
p1 = Popen(file_1, stdout=PIPE)
p2 = Popen(['>', 'output_file'], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()
print output
I got the error message: OSError: [Errno 2] No such file or directory. Which part did I do wrong?
It would be better if there were no intermediate file. For this reason, I am looking at named pipes, but I do not quite understand them.
I have looked at multiple questions that have been answered here. To me they are all somehow different from my question.
Thanks in advance for all your help.
The way bash handles <(..) is to:
Create a pipe
Fork a command that writes to the write end
Substitute /dev/fd/N for the <(..), where N is the file descriptor of the read end of the pipe (try echo <(true)).
Run the command
The command will then open /dev/fd/N, and the OS will cause that to duplicate the inherited read end of the pipe.
We can do the same thing in Python:
import subprocess
import os
# Open a pipe and run a command that writes to the write end
input_fd, output_fd = os.pipe()
subprocess.Popen(["cat", "foo.txt", "bar.txt"], shell=False, stdout=output_fd)
os.close(output_fd)
# Run a command that uses /dev/fd/* to read from the read end.
# (On Python 3, also pass pass_fds=(input_fd,) here: descriptors other than
# stdin/stdout/stderr are closed in the child by default.)
proc = subprocess.Popen(["wc", "/dev/fd/" + str(input_fd)],
                        shell=False, stdout=subprocess.PIPE)
# Read that command's output
print(proc.communicate()[0])
For example:
$ cat foo.txt
Hello
$ cat bar.txt
World
$ wc <(cat foo.txt bar.txt)
2 2 12 /dev/fd/63
$ python test.py
2 2 12 /dev/fd/4
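On Python 3 the same idea needs one extra step, because Popen closes every descriptor other than stdin, stdout, and stderr in the child unless it is listed in pass_fds. The following is a sketch under that assumption; run_with_fd_argument is a hypothetical helper name, and it relies on the platform providing /dev/fd.

```python
import os
import subprocess


def run_with_fd_argument(writer_cmd, reader_cmd):
    """Emulate `reader_cmd <(writer_cmd)` and return the reader's stdout."""
    read_fd, write_fd = os.pipe()
    writer = subprocess.Popen(writer_cmd, stdout=write_fd)
    os.close(write_fd)          # the parent must not keep the write end open
    reader = subprocess.Popen(
        reader_cmd + ["/dev/fd/%d" % read_fd],
        stdout=subprocess.PIPE,
        pass_fds=(read_fd,),    # keep the read end open in the child
    )
    os.close(read_fd)           # the reader has its own copy now
    out, _ = reader.communicate()
    writer.wait()
    return out
```

With the files from the example, run_with_fd_argument(["cat", "foo.txt", "bar.txt"], ["wc"]) plays the role of wc <(cat foo.txt bar.txt). The descriptor number in the reported filename depends on what else the process has open.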
Process substitution returns the device filename that is being used. You will have to assign the pipe to a higher FD (e.g. 20) by passing a function to preexec_fn that uses os.dup2() to copy it, and then pass the FD device filename (e.g. /dev/fd/20) as one of the arguments of the call.
def assignfd(fd, handle):
    def assign():
        os.dup2(handle, fd)
    return assign
...
p2 = Popen(['cat', '/dev/fd/20'], preexec_fn=assignfd(20, p1.stdout.fileno()))
...
It's actually possible to have it both ways: using a shell, while passing a list of arguments through unambiguously in a way that doesn't allow them to be shell-parsed.
Use bash explicitly rather than shell=True to ensure that you have support for <(), and use "$@" to refer to the additional argv array elements, like so:
subprocess.Popen(['bash', '-c',
    'extersoftware "$@" --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)',
    "_",    # this is a dummy passed in as argv[0] of the interpreter
    "--fq", # this is substituted into the shell by the "$@"
])
If you wanted to independently pass in all three arrays -- extra arguments, and the exact filenames to be passed to each cat instance:
BASH_SCRIPT=r'''
declare -a filelist1=( )
filelist1_len=$1; shift
while (( filelist1_len-- > 0 )); do
  filelist1+=( "$1" ); shift
done
filelist2_len=$1; shift
while (( filelist2_len-- > 0 )); do
  filelist2+=( "$1" ); shift
done
extersoftware "$@" --f <(cat "${filelist1[@]}") <(cat "${filelist2[@]}")
'''
subprocess.Popen(['bash', '-c', BASH_SCRIPT, ''] +
  [str(len(filelist1))] + filelist1 +
  [str(len(filelist2))] + filelist2 +
  ["--fq"],
)
You could put more interesting logic in the embedded shell script as well, were you so inclined.
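To see that arguments passed this way really are not re-parsed by the shell, a small self-contained sketch may help. It uses plain sh instead of bash so that it runs without bash installed (so <() is not available in it), and run_shell_script is a hypothetical helper name.

```python
import subprocess


def run_shell_script(script, args):
    """Run `script` under sh, exposing `args` to it as "$@"."""
    # "_" fills $0; everything after it becomes $1, $2, ...
    return subprocess.check_output(["sh", "-c", script, "_"] + list(args))
```

Calling run_shell_script('printf "%s\\n" "$@"', ["two words", "$(echo no)"]) prints each argument on its own line exactly as given: the space does not split the first argument, and the command substitution in the second is never evaluated, because only the fixed script text is parsed by the shell.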
In this specific case, we may use:
import subprocess
import os
if __name__ == '__main__':
    input_fd1, output_fd1 = os.pipe()
    subprocess.Popen(['cat', 'fileA_1', 'fileB_1'],
                     shell=False, stdout=output_fd1)
    os.close(output_fd1)
    input_fd2, output_fd2 = os.pipe()
    subprocess.Popen(['cat', 'fileA_2', 'fileB_2'],
                     shell=False, stdout=output_fd2)
    os.close(output_fd2)
    proc = subprocess.Popen(['extersoftware', '--fq', '--f',
                             '/dev/fd/' + str(input_fd1), '/dev/fd/' + str(input_fd2)], shell=False)
Change log:
Reformatted the code so it should be easier to read now (and hopefully still syntactically correct). It's tested in Python 2.6.6 on Scientific Linux 6.5 and everything looks fine.
Removed unnecessary semicolons.

Python communicate vs shell=True

I'm trying to do the right thing by porting a Python script that invokes a number of shell command lines via
subprocess.call(... | ... | ... , shell=True)
to one that avoids the security risk of shell=True by using Popen. So I have written a little sample script to try things out. It executes the command line
awk '{print $1 " - " $2}' < scores.txt | sort | python uppercase.py > teams.txt
as follows:
with open('teams.txt', 'w') as destination:
    with open('scores.txt', 'r') as source:
        p3 = Popen(['python', 'uppercase.py'], stdin=PIPE, stdout=destination)
        p2 = Popen(['sort'], stdin=PIPE, stdout=p3.stdin)
        p1 = Popen(['awk', '{print $1 " - " $2}'], stdin=source, stdout=p2.stdin)
        p1.communicate()
This program works with a small data set.
Now I was struck by the following line from the documentation of the communicate method:
Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
What? But I have huge files that need to be awk'd and sorted, among other things. The reason I tried to use communicate in the first place is that I saw this warning for subprocess.call:
Note Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
I'm really confused. It seems my choices are:
use call with shell=True (security risk, they say)
use PIPE with call (but then risk deadlock)
use Popen and communicate (but my data is too large, 100s of megabytes).
What am I missing? How do I create a multi-process pipeline in Python for very large files without shell=True, or is shell=True acceptable?
The note about "buffered in memory" only makes sense if you use something like stdout=PIPE. It doesn't apply to stdout=file (os.dup2() does the redirecting at OS file descriptor level, there is nothing to do for .communicate() method).
Don't use call and PIPE. call() is (simplified) just Popen().wait() i.e., it does not read from the pipe. Do not use PIPE unless you read from (write to) the pipe (there is no point).
In your code p1.communicate() doesn't read any data. You could replace it with p1.wait(). Your code is missing p3.stdin.close(); ... ; p2.stdin.close(); ... ; p3.wait(), p2.wait()
Otherwise, the code works for large files.
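With those missing calls filled in, the question's code could look something like the sketch below. It is wrapped in a hypothetical run_three_stages function with the commands as parameters; the structure (start the last stage first, hand each stage's stdin to the previous stage) is kept from the question.

```python
from subprocess import Popen, PIPE


def run_three_stages(src_path, dst_path, cmd1, cmd2, cmd3):
    """Run `cmd1 < src | cmd2 | cmd3 > dst` and return the three exit codes."""
    with open(src_path, "rb") as source, open(dst_path, "wb") as destination:
        p3 = Popen(cmd3, stdin=PIPE, stdout=destination)
        p2 = Popen(cmd2, stdin=PIPE, stdout=p3.stdin)
        p3.stdin.close()   # p2 holds its own copy of this pipe end
        p1 = Popen(cmd1, stdin=source, stdout=p2.stdin)
        p2.stdin.close()   # likewise, p1 holds its own copy
        return p1.wait(), p2.wait(), p3.wait()
```

For the question's pipeline the call would be run_three_stages('scores.txt', 'teams.txt', ['awk', '{print $1 " - " $2}'], ['sort'], ['python', 'uppercase.py']). The data flows between the child processes through OS pipes and never passes through the Python process, so file size is not limited by memory.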
On shell=True
If the command is hardcoded (as in your question) then there is no security risk. If the command may come from an untrusted source then it doesn't matter how you run this command (the untrusted source may run whatever it likes in this case). If only some arguments come from an untrusted source then you could use the plumbum module to avoid reimplementing the pipeline yourself:
from plumbum.cmd import awk, sort, python
(awk['{print $1 " - " $2}'] < untrusted_filename | sort |
python["uppercase.py"] > "teams.txt")()
See How do I use subprocess.Popen to connect multiple processes by pipes?

Python: how to get the final output of multiple system commands?

There are many posts here on SO, like this one: Store output of subprocess.Popen call in a string
There is a problem with complicated commands. For example, if I need to get output from this
ps -ef|grep something|wc -l
Subprocess won't do the job, because the argument for subprocess is [name of program, arguments], so it is not possible to use more sophisticated commands (more programs, pipes, etc.).
Is there a way to capture the output of a chain of multiple commands?
Just pass the shell=True option to subprocess
import subprocess
subprocess.check_output('ps -ef | grep something | wc -l', shell=True)
For a no-shell, clean version using the subprocess module, you can use the following example (from the documentation):
output = `dmesg | grep hda`
becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The Python program essentially does here what the shell does: it sends the output of each command to the next one in turn. An advantage of this approach is that the programmer has full control on the individual standard error outputs of the commands (they can be suppressed if needed, logged, etc.).
That said, I generally prefer the subprocess.check_output('ps -ef | grep something | wc -l', shell=True) shell-delegation approach suggested by nneonneo: it is general, very legible and convenient.
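To make the point about per-command standard error concrete, here is a small sketch of the no-shell variant in which the first command's stderr is discarded and the second command's stderr is captured separately. The function name is hypothetical, and subprocess.DEVNULL requires Python 3.3 or later.

```python
import subprocess


def pipeline_with_separate_stderr(cmd1, cmd2):
    """Run `cmd1 | cmd2`, discarding cmd1's stderr and capturing cmd2's."""
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits first
    out, err = p2.communicate()
    p1.wait()
    return out, err, (p1.returncode, p2.returncode)
```

For example, pipeline_with_separate_stderr(["ps", "-ef"], ["grep", "something"]) returns grep's output, anything grep wrote to stderr, and both exit codes, which a single shell=True call cannot separate as easily.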
Well, another alternative would just be to implement part of the command in plain Python. For example,
count = 0
for line in subprocess.check_output(['ps', '-ef']).decode().split('\n'):
    if something in line:  # or re.search(something, line) to use regex
        count += 1
print(count)

Linking subprocesses in Python

Hi, I have a question about linking input and output between subprocesses in Python. I am trying to simplify the program by passing the output of one step directly to another subprocess, rather than writing it to a file and then starting another process to run on that file.
E.g. First process uses SAMTOOLS to output a specific chromosome from a large bam file.
So...
bigfile.bam is read in and outputs chromosome22.bam
The next subprocess uses BEDTOOLS to convert that chromosome22.bam to chromosome22.bed
So...
chromosome22.bam is read in and outputs chromosome22.bed
What I want to do is pass the stdout of the first process into the second so there is no need for the intermediate file.
So far I have this...
for x in 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,'X','Y':
    subprocess.call("%s view -bh %s %s > %s/%s/%s.bam" % (samtools,bam,x,bampath,out,x), shell=True)
This makes the chromosome[1-22,X,Y].bam files. But can I avoid this and put another subprocess command in the same loop to convert them to bed files?
The command for bed conversion is:
bedpath/bedtools bamtobed -i [bamfile] > [bedfile]
Please have a look at the replacing shell pipeline example in the documentation.
output=$(dmesg | grep hda)
becomes:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The explanation is:
The p1.stdout.close() call after starting the p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1.
No need to use python here. Much easier in shell. But essentially, it works the same as in python.
If bedtools can read from stdin, you can e.g. do
#!/bin/sh
for x in `seq 1 22` X Y; do
    $samtools view -bh $bam $x | $bedtools bamtobed > $bampath/$out/$x.bed
done
Depending on how bedtools was designed, you might also need to use -i - to have it read from stdin.
If you stick with Python, I strongly recommend learning how to do this
without doing it all in shell,
without producing shell commands that you need to escape properly to avoid errors.
subprocess is safer to use when you use the array-based syntax and no shell.
Make that two subprocess invocations, one for each command. See http://docs.python.org/library/subprocess.html#replacing-shell-pipeline for more details.
cmd1 = [samtools, "view", "-bh", bam, x]
cmd2 = [bedtools, "bamtobed"]
c1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
c2 = subprocess.Popen(cmd2, stdin=c1.stdout, stdout=open(outputfilename, "w"))
c1.stdout.close()
c2.communicate()
Yes, you can use the pipe functionality. See if you can read from stdin for the bamtobed process ... if you can, try the following. This way you save on the disk IO time assuming the processing load is light.
SLIGHT modification:
proc1.stdout is now the stdin for the 2nd process.
proc1 = subprocess.Popen("%s view -bh %s %s" % (samtools, bam, x), shell=True, stdout=subprocess.PIPE)
proc2 = subprocess.Popen("bedpath/bedtools bamtobed > %s" % (outFileName,), shell=True, stdin=proc1.stdout)
proc1.stdout.close()  # let proc1 receive SIGPIPE if proc2 exits first
proc2.wait()

Interactive Python script output stored in some file

How do I perform logging of all activities that are done by a Python script and all scripts that are called from it?
I had several Bash scripts but now wrote a Python script which call all of these Bash scripts. I would like to have all output produced from these scripts stored in some file.
The script is an interactive Python script, i.e. it contains raw_input lines, so I couldn't just run 'python script.py | tee log.txt' on the whole Python script, because for some reason the questions are not shown on the screen.
Here is an excerpt from the script which calls one of the shell scripts.
cmd = "somescript.sh"
try:
    retvalue = subprocess.check_call(cmd, shell=True)
except subprocess.CalledProcessError:
    print("script command has failed")
    sys.exit("exit from script")
What do you think could be done here?
Edit
Two subquestions based on Alex's answer:
How can the answers to the questions be stored in the output file as well? For example, on the line ok = raw_input(prompt) the user is asked a question, and I would like the answer logged as well.
I read about Popen and communicate but didn't use them, since communicate buffers the data in memory. Here the amount of output is big and I need to handle standard error along with standard output. Do you know if this is possible with Popen and the communicate method as well?
Making Python's own prints go to both the terminal and a file is not hard:
>>> import sys
>>> class tee(object):
...     def __init__(self, fn='/tmp/foo.txt'):
...         self.o = sys.stdout
...         self.f = open(fn, 'w')
...     def write(self, s):
...         self.o.write(s)
...         self.f.write(s)
...
>>> sys.stdout = tee()
>>> print('hello world!')
hello world!
>>>
$ cat /tmp/foo.txt
hello world!
This should work both in Python 2 and Python 3.
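One caveat: an object with only a write method fails with AttributeError as soon as something calls flush() on it, for instance print(..., flush=True) on Python 3. A slightly fuller sketch of the same idea follows; the class name Tee and its stream parameter are illustrative and not part of any library.

```python
import sys


class Tee(object):
    """File-like object that copies every write to a stream and a log file."""

    def __init__(self, path, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self.log = open(path, "w")

    def write(self, text):
        self.stream.write(text)
        self.log.write(text)

    def flush(self):
        self.stream.flush()
        self.log.flush()

    def close(self):
        self.log.close()
```

It is installed the same way, with sys.stdout = Tee('/tmp/foo.txt'), and it only sees what Python itself writes to sys.stdout. Output from child processes bypasses it unless it is read and re-emitted.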
To similarly direct the output from subcommands, don't use
retvalue = subprocess.check_call(cmd, shell=True)
which lets cmd's output go to its regular "standard output", but rather grab and re-emit it yourself, as follows:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
so, se = p.communicate()
print(so)
retvalue = p.returncode
assuming you don't care about standard-error (only standard-output) and the amount of output from cmd is reasonably small (since .communicate buffers that data in memory) -- it's easy to tweak if either assumption doesn't correspond to what you exactly want.
Edit: the OP has now clarified the specs in a long comment to this answer:
How to make the answers on the questions stored in the output file as well? For example on line ok = raw_input(prompt) the user will be asked for the question and I would like the answer logged as well.
Use a function such as:
def echoed_input(prompt):
    response = raw_input(prompt)
    sys.stdout.f.write(response)
    return response
instead of just raw_input in your application code (of course, this is written specifically to cooperate with the tee class I showed above).
I read about Popen and communicate and didn't use since it buffers the data in memory. Here amount of output is big and I need to care about standard-error with standard-output as well. Do you know if this is possible to handle with Popen and communicate method as well?
communicate is fine as long as you don't get more output (and standard-error) than comfortably fits in memory, say a few gigabytes at most depending on the kind of machine you're using.
If this hypothesis is met, just recode the above as, instead:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
so, se = p.communicate()
print(so)
retvalue = p.returncode
i.e., just redirect the subcommand's stderr to get mixed into its stdout.
If you DO have to worry about gigabytes (or whatever) coming at you, then
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
for line in p.stdout:
    sys.stdout.write(line)
p.wait()
retvalue = p.returncode
(which gets and emits one line at a time) may be preferable (this depends on cmd not expecting anything from its standard input, of course... because, if it is expecting anything, it's not going to get it, and the problem starts to become challenging;-).
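Combining the line-at-a-time loop with a log file gives something like the sketch below. The helper name run_and_log is made up, and universal_newlines=True is used so that the lines arrive as text on Python 3; like the loop above, it assumes the command does not need anything on its standard input.

```python
import subprocess
import sys


def run_and_log(cmd, log_path, shell=False):
    """Run cmd, echoing its combined stdout/stderr and appending it to a log."""
    proc = subprocess.Popen(cmd, shell=shell, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    with open(log_path, "a") as log:
        for line in proc.stdout:    # one line at a time, not all in memory
            sys.stdout.write(line)
            log.write(line)
    proc.stdout.close()
    return proc.wait()
```

In the script from the question, retvalue = run_and_log("somescript.sh", "log.txt", shell=True) would take the place of the check_call line, with a nonzero return value indicating failure.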
Python has a tracing module: trace. Usage: python -m trace --trace file.py
If you want to capture the output of any script, then on a *nix-y system you can redirect stdout and stderr to a file:
./script.py >> /tmp/outputs.txt 2>> /tmp/outputs.txt
If you want everything done by the scripts, not just what they print, then the python trace module won't trace things done by external scripts that your python executes. The only thing that can trace every action done by a program would be something like DTrace, if you are lucky enough to have a system that supports it. (OS X Instruments are based on DTrace)
