Output from a process - python

I have to execute a command and store its output in a file. The output spans multiple pages, and I have to press Enter multiple times to see the complete output (similar to what happens when man returns multiple pages). I am thinking of using the subprocess module, but how do I provide input to the process when it prompts?

Disclaimer: I don't know which command you're actually executing so this is just a stab in the dark.
You should not have to provide any input.
Piping the output of the command to cat solves your problem:
less testfile.txt | cat
Also, if your goal is to store the output in another file, you can simply do this (this will overwrite):
less testfile.txt > testfilecopy.txt
(and this will append):
less testfile.txt >> logfile.txt
See: https://unix.stackexchange.com/questions/15855/how-to-dump-a-man-page
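The same idea carries over to Python: when a command's stdout goes to a file instead of a terminal, pagers such as less normally print everything without waiting for Enter. A minimal sketch under that assumption (dump_output is a hypothetical helper name; setting PAGER=cat is a convention honored by man and git, among others, but not by every program):

```python
import os
import subprocess


def dump_output(cmd, path):
    """Run cmd (a list of arguments) and write everything it prints to path.

    stdout is a regular file, not a terminal, so pagers such as less or
    man's pager normally print everything without waiting for Enter.
    PAGER=cat is an extra precaution for programs that honor that variable.
    Returns the command's exit status.
    """
    env = dict(os.environ, PAGER="cat")
    with open(path, "w") as out:
        # stderr=STDOUT sends error messages to the same file.
        completed = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, env=env)
    return completed.returncode


# Example (assumes a Unix system with man installed):
# dump_output(["man", "ls"], "ls_manual.txt")
```

If the program still waits for a keypress even with its output redirected, it is reading the terminal directly, and the pexpect route described below is the better fit.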

The best solution is to check whether the program supports a command-line flag to run in "batch mode", disable paging, or otherwise suppress such waits. But I guess you have already done that. The fact that you have to enter "-help" interactively tells me it is probably not a standard Unix command; those are usually quite easy to run in a subprocess.
Your best bet in that case would be to use Expect. A pure-Python equivalent is available as pexpect.
Expect scripts tend to be fairly ugly and error-prone, so you have to be diligent with error handling. I have only limited practical experience with it, as I have only modified some of our existing scripts and have not yet written one myself, but from those scripts I know they work, and they work reliably.

Related

Debug a Python program which seems paused for no reason

I am writing a Python program to analyze log files. So basically I have about 30000 medium-size log files and my Python script is designed to perform some simple (line-by-line) analysis of each log file. Roughly it takes less than 5 seconds to process one file.
So once I set up the processing, I left it running, and when I came back about 14 hours later, my Python script had simply paused right after analyzing one log file: it seems it never wrote the analysis output for that file to the file system, and it made no further progress.
I checked the memory usage and it seems fine (less than 1 GB). I also tried writing to the file system (touch test), which works as normal. So my question is: how should I proceed to debug the issue? Could anyone share some thoughts on that? I hope this is not too general. Thanks.
You may use the trace module ("Trace or track Python statement execution") and/or pdb, the Python debugger.
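A minimal sketch of driving the standard-library trace module from code. analyze_lines is a hypothetical stand-in for the per-file analysis; with trace=True each executed line is echoed to stdout, so the last lines printed before a hang show where execution stopped. The same can be done without code changes via python -m trace --trace yourscript.py.

```python
import trace


def analyze_lines(lines):
    # Hypothetical stand-in for the per-file log analysis.
    total = 0
    for line in lines:
        total += len(line)
    return total


# trace=True prints each line as it executes; count=False skips coverage counts.
tracer = trace.Trace(count=False, trace=True)
result = tracer.runfunc(analyze_lines, ["first line", "second"])
print(result)
```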
Try this tool https://github.com/khamidou/lptrace with command:
sudo python lptrace -p <process_id>
It will print every Python function your program calls, which may help you understand where your program is stuck or looping forever.
If it prints nothing, your program is probably stuck, so try
pstack <process_id>
to check the stack trace and find out where it is stuck. The output of pstack is C frames, but you can probably still find something useful in it to solve your problem.
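A different standard-library option, not mentioned in the answers above, is faulthandler (Python 3.3+). Since each file normally takes under 5 seconds, a per-file watchdog can dump the Python traceback of every thread when one file takes far longer. A minimal sketch; process_all, analyze, and the 60-second default are illustrative assumptions:

```python
import faulthandler


def process_all(paths, analyze, timeout=60.0):
    """Call analyze(path) for each path with a per-file watchdog.

    If one call runs longer than `timeout` seconds, faulthandler writes the
    traceback of every thread to stderr, showing the line the script is
    stuck on. exit=False keeps the process running afterwards.
    """
    results = {}
    for path in paths:
        faulthandler.dump_traceback_later(timeout, exit=False)
        try:
            results[path] = analyze(path)
        finally:
            # Disarm the watchdog once this file is done (or has failed).
            faulthandler.cancel_dump_traceback_later()
    return results
```

On Unix, calling faulthandler.register(signal.SIGUSR1) at startup and later running kill -USR1 <process_id> dumps the traceback on demand instead.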

Why use Python's os module methods instead of executing shell commands directly?

I am trying to understand what is the motivation behind using Python's library functions for executing OS-specific tasks such as creating files/directories, changing file attributes, etc. instead of just executing those commands via os.system() or subprocess.call()?
For example, why would I want to use os.chmod instead of doing os.system("chmod...")?
I understand that it is more "pythonic" to use Python's available library methods as much as possible instead of just executing shell commands directly. But, is there any other motivation behind doing this from a functionality point of view?
I am only talking about executing simple one-line shell commands here. When we need more control over the execution of the task, I understand that using subprocess module makes more sense, for example.
It's faster, os.system and subprocess.call create new processes which is unnecessary for something this simple. In fact, os.system and subprocess.call with the shell argument usually create at least two new processes: the first one being the shell, and the second one being the command that you're running (if it's not a shell built-in like test).
Some commands are useless in a separate process. For example, if you run os.system("cd dir/"), it will change the current working directory of the child process, but not of the Python process. You need to use os.chdir for that.
You don't have to worry about special characters interpreted by the shell. os.chmod(path, mode) will work no matter what the filename is, whereas os.system("chmod 777 " + path) will fail horribly if the filename is something like ; rm -rf ~. (Note that you can work around this if you use subprocess.call without the shell argument.)
You don't have to worry about filenames that begin with a dash. os.chmod("--quiet", mode) will change the permissions of the file named --quiet, but os.system("chmod 777 --quiet") will fail, as --quiet is interpreted as an option. This is true even for subprocess.call(["chmod", "777", "--quiet"]).
You have fewer cross-platform and cross-shell concerns, as Python's standard library is supposed to deal with that for you. Does your system have a chmod command? Is it installed? Does it support the parameters that you expect it to support? The os module tries to be as cross-platform as possible and documents where that is not possible.
If the command you're running has output that you care about, you need to parse it, which is trickier than it sounds: it's easy to forget corner cases (filenames with spaces, tabs and newlines in them), even when you don't care about portability.
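A minimal sketch of the cd and leading-dash points above, assuming a POSIX system with a chmod program on the PATH (the function names are hypothetical). The "--" argument, which POSIX chmod treats as the end of options, is the usual workaround when you do call the external program:

```python
import os
import subprocess


def cd_in_child_is_lost():
    """Return True if running 'cd' in a child shell leaves our cwd unchanged."""
    before = os.getcwd()
    subprocess.run("cd ..", shell=True, check=True)  # changes the child's cwd only
    return os.getcwd() == before


def chmod_native(path, mode):
    """os.chmod passes path straight to the OS: no shell, no option parsing."""
    os.chmod(path, mode)
    return os.stat(path).st_mode & 0o777


def chmod_external(path, mode_text):
    """Run the chmod program; '--' marks the end of options."""
    subprocess.run(["chmod", mode_text, "--", path], check=True)
    return os.stat(path).st_mode & 0o777
```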
It is safer. To give you an idea, here is an example script:
import os
# Unsafe: the user's text is pasted straight into a shell command.
file = input("Please enter a file: ")  # raw_input in Python 2
os.system("chmod 777 " + file)
If the user entered test; rm -rf ~, this would then delete the home directory.
This is why it is safer to use the built-in function, and also why you should prefer subprocess (with a list of arguments) over os.system.
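If building a shell command string truly cannot be avoided, the standard library's shlex.quote (Python 3.3+) turns the user's text into a single literal shell word. A minimal sketch (chmod_command is a hypothetical helper, and mode is assumed to come from the program, not the user); os.chmod, which needs no quoting at all, is still the better choice:

```python
import shlex


def chmod_command(path, mode="777"):
    """Build a chmod command line that is safe to hand to a POSIX shell."""
    # shlex.quote wraps the argument in single quotes (escaping any quotes
    # inside it), so ';', '~' and spaces lose their special meaning.
    # '--' stops chmod from reading a leading '-' in path as an option.
    return "chmod " + mode + " -- " + shlex.quote(path)
```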
There are four strong cases for preferring Python's more-specific methods in the os module over using os.system or the subprocess module when executing a command:
Redundancy - spawning another process is redundant and wastes time and resources.
Portability - Many of the methods in the os module are available on multiple platforms, while many shell commands are OS-specific.
Understanding the results - Spawning a process to execute arbitrary commands forces you to parse the results from the output and understand if and why a command has done something wrong.
Safety - A process can potentially execute any command it's given. This is a weak design and it can be avoided by using specific methods in the os module.
Redundancy (see redundant code):
You're actually executing a redundant "middle-man" on your way to the eventual system calls (chmod in your example). This middle man is a new process or sub-shell.
From os.system:
Execute the command (a string) in a subshell ...
And subprocess is just a module to spawn new processes.
You can do what you need without spawning these processes.
Portability (see source code portability):
The os module's aim is to provide generic operating-system services, and its description starts with:
This module provides a portable way of using operating system dependent functionality.
You can use os.listdir on both Windows and Unix. Trying to use os.system / subprocess for this functionality will force you to maintain two calls (for ls / dir) and check what operating system you're on. This is not as portable and will cause even more frustration later on (see Handling Output).
Understanding the command's results:
Suppose you want to list the files in a directory.
If you're using os.system("ls") / subprocess.call(['ls']), you can only get the process's output back, which is basically a big string with the file names.
How can you tell a file with a space in its name from two files?
What if you have no permission to list the files?
How should you map the data to python objects?
These are only off the top of my head, and while there are solutions to these problems, why solve again a problem that was already solved for you?
This is an example of following the Don't Repeat Yourself principle (often referred to as "DRY") by not repeating an implementation that already exists and is freely available to you.
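For the directory-listing example, a minimal sketch with os.scandir (Python 3.6+ for the with-statement form; list_directory is a hypothetical name) shows how the library answers the questions above: each entry is one file, and errors arrive as exceptions rather than text to parse:

```python
import os


def list_directory(path):
    """Return sorted (name, is_directory) pairs for the entries in path.

    Each DirEntry is one file, so "my file.txt" can't be mistaken for two
    names, and a missing permission raises PermissionError instead of
    printing to stderr.
    """
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.is_dir()) for entry in entries)


try:
    print(list_directory("."))
except PermissionError as exc:
    print("cannot list directory:", exc)
```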
Safety:
os.system and subprocess are powerful. That's good when you need this power, but dangerous when you don't. When you use os.listdir, you know it cannot do anything other than list files or raise an error. When you use os.system or subprocess to achieve the same behaviour, you can potentially end up doing something you did not mean to do.
Injection Safety (see shell injection examples):
If you use input from the user as a new command, you've basically given them a shell. This is much like SQL injection, which gives the user a shell into the database.
An example would be a command of the form:
# ... read some user input
os.system(user_input + " some continuation")
This can be easily exploited to run any arbitrary code using the input: NASTY COMMAND;# to create the eventual:
os.system("NASTY COMMAND; # some continuation")
There are many such commands that can put your system at risk.
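By contrast, when subprocess receives a list of arguments and no shell, the user's text reaches the child as one argument, so ";" and "#" are ordinary characters. A minimal sketch; the child here is a tiny Python program that prints its first argument, standing in (as an assumption) for any real command that takes user-supplied text:

```python
import subprocess
import sys


def pass_as_argument(user_input):
    """Hand user_input to a child process as one argv entry, with no shell."""
    # The child prints sys.argv[1]; no shell ever sees the string.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout
```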
For a simple reason: when you call a shell function, it creates a subshell that is destroyed after your command exits, so if you change directory in that shell, it does not affect your environment in Python.
Besides, creating a subshell is time-consuming, so running OS commands through one will hurt your performance.
EDIT
I had some timing tests running:
In [379]: %timeit os.chmod('Documents/recipes.txt', 0755)
10000 loops, best of 3: 215 us per loop
In [380]: %timeit os.system('chmod 0755 Documents/recipes.txt')
100 loops, best of 3: 2.47 ms per loop
In [382]: %timeit call(['chmod', '0755', 'Documents/recipes.txt'])
100 loops, best of 3: 2.93 ms per loop
The built-in function runs more than 10 times faster.
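The timings above come from an IPython session under Python 2 (0755 is Python 2 octal syntax; Python 3 writes 0o755). A minimal sketch of the same comparison for Python 3, assuming a POSIX system with chmod on the PATH; absolute numbers vary by machine, so none are shown:

```python
import os
import subprocess
import tempfile
import timeit


def time_chmod(number=100):
    """Return average seconds per call for os.chmod vs. the chmod program."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        native = timeit.timeit(lambda: os.chmod(path, 0o755), number=number)
        external = timeit.timeit(
            lambda: subprocess.call(["chmod", "0755", path]), number=number)
    finally:
        os.remove(path)
    return native / number, external / number


native, external = time_chmod()
print("os.chmod: %.1f us, chmod program: %.1f us" % (native * 1e6, external * 1e6))
```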
EDIT2
There may be cases where invoking an external executable yields better results than a Python package. I just remembered an email from a colleague of mine reporting that gzip called through subprocess performed much better than the Python package he had used. But that is certainly not the case for standard OS packages emulating standard OS commands.
Shell calls are OS-specific, whereas Python os module functions are not, in most cases. And they avoid spawning a subprocess.
It's far more efficient. The "shell" is just another OS binary which contains a lot of system calls. Why incur the overhead of creating the whole shell process just for that single system call?
The situation is even worse when you use os.system for something that's not a shell built-in. You start a shell process which in turn starts an executable which then (two processes away) makes the system call. At least subprocess would have removed the need for a shell intermediary process.
This isn't specific to Python. systemd is such an improvement to Linux startup times for the same reason: it makes the necessary system calls itself instead of spawning a thousand shells.

How to write python code to access input and output from a program written in C?

There is a program written and compiled in C, with typical data input from a Unix shell; on the other hand, I'm using Windows.
I need to send input to this program from the output of my own code written in Python.
What is the best way to go about doing this? I've read about pexpect, but not sure really how to implement it; can anyone explain the best way to go about this?
I recommend you use the Python subprocess module.
It is the replacement for the os.popen() function, and it lets you execute a program while interacting with its standard input/output/error streams through pipes.
example use:
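import subprocess
# Start the program with pipes connected to its stdin and stdout.
process = subprocess.Popen(["test.exe"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process.stdin.write(b"hello world !")  # pipes carry bytes in Python 3
process.stdin.close()  # send EOF before reading, otherwise both sides can block
print(process.stdout.read().decode('latin1'))
process.stdout.close()
status = process.wait()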
If you don't need to respond to interactive questions and prompts, don't bother with pexpect; just use Popen.communicate, as suggested by Adrien Plisson.
However, if you do need pexpect, the best way to get started is to look through the examples on its home page, then start reading the documentation once you've got a handle on what exactly you need to do.
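For the non-interactive case, subprocess.run with input= (Python 3.7+ for capture_output and text) writes the data, closes stdin, collects stdout, and waits for the program to exit; it uses Popen.communicate internally. A minimal sketch; run_with_input is a hypothetical name, and test.exe is the program name from the example above:

```python
import subprocess


def run_with_input(cmd, text):
    """Send text to cmd's stdin and return everything it wrote to stdout."""
    # check=True raises CalledProcessError if the program exits non-zero.
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout


# Example: print(run_with_input(["test.exe"], "hello world !"))
```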

unpredictable behaviour with python subprocess calls

I'm writing a python script that performs a series of operations in a loop, by making subprocess calls, like so:
os.system('./svm_learn -z p -t 2 trial-input model')
os.system('./svm_classify test-input model pred')
os.system('python read-svm-rank.py')
score = os.popen('python scorer.py -g gold-test -i out').readline()
When I make the calls individually one after the other in the shell they work fine. But within the script they always break. I've traced the source of the error and it seems that the output files are getting truncated towards the end (leading me to believe that calls are being made without previous ones being completed).
I tried with subprocess.Popen and then using the wait() method of the Popen object, but to no avail. The script still breaks.
Any ideas what's going on here?
I'd probably first rewrite a little to use the subprocess module instead of the os module.
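A minimal sketch of that rewrite, with the question's command lines split into argument lists (run_pipeline is a hypothetical helper). subprocess.run waits for each child, and check=True turns a non-zero exit status into CalledProcessError, so a failing step stops the loop instead of leaving later steps to read incomplete files. This makes failures visible; it is not guaranteed to fix the truncation by itself:

```python
import subprocess
import sys


def run_pipeline(steps, score_cmd):
    """Run each command in steps in order, then return score_cmd's first output line."""
    for cmd in steps:
        subprocess.run(cmd, check=True)  # blocks until this step has exited
    result = subprocess.run(score_cmd, check=True, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


# The commands from the question, as argument lists (no shell involved).
SVM_STEPS = [
    ["./svm_learn", "-z", "p", "-t", "2", "trial-input", "model"],
    ["./svm_classify", "test-input", "model", "pred"],
    [sys.executable, "read-svm-rank.py"],
]
SCORE_CMD = [sys.executable, "scorer.py", "-g", "gold-test", "-i", "out"]
# score = run_pipeline(SVM_STEPS, SCORE_CMD)
```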
Then I'd probably scrutinize what's going wrong by studying a system call trace:
http://stromberg.dnsalias.org/~strombrg/debugging-with-syscall-tracers.html
Hopefully there'll be an "E" error code near the end of the file that'll tell you what error is being encountered.
Another option would be to comment out subsets of your subprocesses (assuming the n+1th doesn't depend heavily on the output of the nth), to pin down which one of them is having problems. After that, you could sprinkle some extra error reporting in the offending script to see what it's doing.
But if you're not put off by C-ish syscall traces, that might be easier.

os.system() failing in python

I'm trying to parse some data and make graphs with python and there's an odd issue coming up. A call to os.system() seems to get lost somewhere.
The following three lines:
os.system('echo foo bar')
os.system('gnuplot test.gnuplot')
os.system('gnuplot --version')
Should print:
foo bar
Warning: empty x range [2012:2012], adjusting to [1991.88:2032.12]
gnuplot 4.4 patchlevel 2
But the only significant command in the middle seems to get dropped. The script still runs the echo and version check, and running gnuplot by itself (the gnuplot shell) works too, but there is no warning and no file output from gnuplot.
Why is this command dropped, and why completely silently?
In case it's helpful, the invocation should start gnuplot, it should open a couple of files (the instructions and a data file indicated therein) and write out to an SVG file. I tried deleting the target file so it wouldn't have to overwrite, but to no avail.
This is python 3.2 on Ubuntu Natty x86_64 virtual machine with the 2.6.38-8-virtual kernel.
Is the warning printed to stderr, and that is intercepted somehow?
Try using subprocess instead, for example using
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
and checking the output.
(or plain subprocess.call might work better than os.system)
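A minimal sketch of that suggestion (Python 3.7+ for text=True; run_and_capture is a hypothetical name). With stderr=subprocess.STDOUT, anything the program writes to stderr, such as gnuplot's warning, ends up in the returned string, and a non-zero exit status raises CalledProcessError instead of passing silently:

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd and return its stdout and stderr combined as one string."""
    # stderr=STDOUT merges the error stream into the captured output.
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT, text=True)


# Example: print(run_and_capture(["gnuplot", "test.gnuplot"]))
```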
So, it turned out the issue was something I failed to mention. Earlier in the script, test.gnuplot and test.data were written, but I neglected to call the file objects' close() and verify that they were closed (I still don't know how to do that last part, so for now it cycles for a bit). So there was some unexpected behaviour going on there, causing gnuplot to see two unreadable files, take no action, produce no output, and return 0.
I guess nobody gets points for this one.
Edit: I finally figured it out with the help of strace. Don't know how I did things before I learned how to use it.
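For the "verify that they got closed" part: writing the files inside with-blocks closes (and therefore flushes) them before gnuplot starts, and every file object has a .closed attribute you can check. A minimal sketch; write_text and plot are hypothetical helpers, the file names come from the question, and the gnuplot commands and data are supplied by the caller:

```python
import subprocess


def write_text(path, text):
    """Write text to path and return the (now closed) file object."""
    with open(path, "w") as f:
        f.write(text)
    # Leaving the with-block called f.close(), which flushed Python's write
    # buffer to the file; f.closed is now True.
    return f


def plot(script_text, data_text):
    # Both files are complete on disk before gnuplot starts.
    write_text("test.data", data_text)
    write_text("test.gnuplot", script_text)
    subprocess.check_call(["gnuplot", "test.gnuplot"])
```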
Don't use os.system; use the subprocess module.
os.system documentation says:
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is
preferable to using this function.
Try this:
subprocess.check_call(['gnuplot', 'test.gnuplot'])
