How can I call in2csv from a Python script?

When I try calling the below code, I run into the following error: "You must specify a format when providing data via STDIN (pipe)."
subprocess.call(["in2csv", "--format", "xls", a_file, ">", output_file], shell=True)
I'm not sure why this is the case, because I am telling it what the input format is. I've looked at the docs, which aren't clear about the distinction between --format and -f.
Update: I've changed it to use argparse to simplify passing the arguments, following a recommendation I found. I'm also using Popen, which according to the docs is safer than using the shell=True flag.
parser = argparse.ArgumentParser()
parser.add_argument('in2csv')
parser.add_argument('--format')
parser.add_argument('xls')
parser.add_argument(a_file)
parser.add_argument(">")
parser.add_argument(output_file)
args = parser.parse_args()
print args
subprocess.Popen(args)

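Errors like the one you've seen are a symptom of the shell getting confused by what was passed in, for instance because of a space in a filename. Here there is a second cause: on POSIX, combining a list with shell=True means only the first item ("in2csv") is run as the command and the remaining items become arguments to the shell itself, so in2csv starts with no arguments and falls back to reading STDIN. It is indeed best to avoid using the shell when spawning processes from Python.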
Instead of adding ">" and output_file as arguments, try redirecting the output using the stdout keyword argument, which takes a file that output will be written to.
Assuming:
a_file is a string with the name of your input file, and
output_file is a string with the name of your desired output file,
a working call might look like:
with open(output_file, 'wb') as of:
    subprocess.check_call(["in2csv", "--format", "xls", a_file],
                          stdout=of)
It's not necessary to use argparse here; it's meant for handling command lines coming in to your program, rather than going out from it.
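As a minimal sketch of the same redirection in Python 3, the snippet below uses the current Python interpreter as a stand-in for in2csv (an assumption made only so the example runs anywhere). run_to_file is a hypothetical helper name, not part of subprocess or csvkit.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(cmd, output_path):
    """Run cmd without a shell and write its stdout to output_path."""
    with open(output_path, "wb") as out:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(cmd, stdout=out)


# Stand-in child process (assumption): prints one CSV-like line to stdout.
child = [sys.executable, "-c", "print('a,b,c')"]
# A space in the output file name is harmless because no shell is involved.
output_file = os.path.join(tempfile.mkdtemp(), "out file.csv")
run_to_file(child, output_file)

with open(output_file) as f:
    result = f.read().strip()
print(result)
```

With the real tool, the command list would be ["in2csv", "--format", "xls", a_file] as in the answer above.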

Related

Using a glob to generate arguments with subprocess.run()

I want to use metaflac (https://linux.die.net/man/1/metaflac) command from within a python script.
from subprocess import run
flac_files = "/home/fricadelle/Artist - Album (2008)/*.flac"
run(['metaflac', '--add-replay-gain', flac_files])
I get
The FLAC file could not be opened. Most likely the file does not exist
or is not readable.
if I add shell = True to the run function I'd get:
ERROR: you must specify at least one FLAC file;
metaflac cannot be used as a pipe
So what do I do wrong? Thanks!
PS: of course the command works fine in a shell:
metaflac --add-replay-gain /home/fricadelle/Artist\ -\ Album\ \(2008\)/*.flac
Unless you specify shell=True (and as a first approximation, you should never specify shell=True), the arguments you provide are passed as is, with no shell expansions, word-splitting or dequoting. So the filename you pass as an argument is precisely /home/fricadelle/Artist - Album (2008)/*.flac, which is not the name of any file. (That's why you don't need to add backslashes before the spaces and parentheses. If you specified shell=True -- and I repeat, you really should avoid that -- then you would need to include backslashes so that the shell doesn't split the name into several different words.)
When you type a pattern such as /home/fricadelle/Artist\ -\ Album\ \(2008\)/*.flac in a shell with the * unquoted, the shell will try to expand it to a list of all the files whose names match the pattern, and will then pass that list as separate arguments. Since subprocess.run doesn't do this, you will have to do it yourself, which you would normally do with glob.glob. For example,
from subprocess import run
from glob import glob
flac_files = "/home/fricadelle/Artist - Album (2008)/*.flac"
run(['metaflac', '--add-replay-gain'] + glob(flac_files))
Note: unlike the shell, glob.glob will return an empty list if the pattern matches no files. You really should check for this error rather than invoke metaflac with no filename options.
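One way to add the check for an empty match list is sketched below. expand_pattern is a hypothetical helper name, and the directory is a throwaway created with tempfile so the example is self-contained; the metaflac command list is built but deliberately not executed.

```python
import glob
import os
import tempfile


def expand_pattern(pattern):
    """Return the sorted matches for pattern, or raise if there are none."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no files match %r" % pattern)
    return matches


# Throwaway directory whose name contains spaces and parentheses.
album_dir = os.path.join(tempfile.mkdtemp(), "Artist - Album (2008)")
os.makedirs(album_dir)
for name in ("01.flac", "02.flac", "cover.jpg"):
    open(os.path.join(album_dir, name), "w").close()

flac_files = expand_pattern(os.path.join(album_dir, "*.flac"))
# The list that would be handed to subprocess.run (not executed here).
cmd = ["metaflac", "--add-replay-gain"] + flac_files
print([os.path.basename(p) for p in flac_files])
```

Raising before the subprocess call keeps the failure message about the pattern rather than about metaflac receiving no file names.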
See the answer here for a better explanation.
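Globbing doesn't work the way you're expecting it to here. An alternative is to specify shell=True, but then you'll need to drop the list and pass a single command string, and the spaces and parentheses in the directory name must be escaped for the shell while the * stays unquoted:
run(r'metaflac --add-replay-gain /home/fricadelle/Artist\ -\ Album\ \(2008\)/*.flac', shell=True)
That should do the trick.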

subprocess.call on path with space

The string that contains the file path looks like this in the console:
>>> target_file
'src//data//annual_filings//ABB Ltd//ABB_ar_2015.pdf'
I got the target_file from a call to os.walk
The goal is to build a command to run in subprocess.call
Something like:
from subprocess import call
cmd_ = r'qpdf-7.0.0/bin/qpdf --password=%s --decrypt %s %s' %('', target_file, target_file)
call([cmd_])
I tried different variations: setting shell to either True or False, replacing the // with / or \, and so on.
The issue seems to be the space in the folder name (I cannot change the folder name).
The Python code needs to run on Windows.
You have to define cmd_ as a list of arguments, not a list with a single string in it; otherwise subprocess treats the whole string as the name of the command and doesn't even try to split the arguments:
cmd_ = ['qpdf-7.0.0/bin/qpdf', '--password=', '--decrypt', target_file, target_file]
call(cmd_)
Leave the quoting to subprocess.
As a side note, there's no need to double the slashes. It works, but it's unnecessary.
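To see that each list item arrives as exactly one argument, spaces included, here is a small sketch. It uses the current Python interpreter as a stand-in for qpdf (an assumption so the example runs without qpdf installed); the child simply prints the arguments it received.

```python
import subprocess
import sys

target_file = "src/data/annual_filings/ABB Ltd/ABB_ar_2015.pdf"

# Stand-in child (assumption): prints each received argument on its own line.
child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
cmd = [sys.executable, "-c", child_code, "--decrypt", target_file, target_file]

completed = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
received = completed.stdout.splitlines()
print(received)
```

The path containing "ABB Ltd" shows up as a single argument each time, with no manual quoting.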

Can I set stdin as required with Python argparse?

I have a script that uses some arguments and some stdin data.
For checking arguments I use argparse.ArgumentParser
Is it possible to check if any stdin data is given? Something like that:
parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), default=sys.stdin, required=True)
but this example gives this error:
TypeError: 'required' is an invalid argument for positionals
No. argparse won't read from whatever file you pass it, be it given on the command line or stdin. You will get an open file handle, with not even a single byte/char consumed.
Simply read the data yourself, for instance with data = args.infile.read() (assuming args is the result of parsing).
You can then test whether it is empty with a simple if not data: ...
But usually, if you expect data in a specific format, the best is to simply try to parse it, and raise an error if you fail. Either empty data is invalid (json for instance), or it is valid but then it should be an acceptable input.
(As for the error, required only tells whether some option must be given on the command line or not, for --options and -o options. Positionals are always required unless you change their number with nargs.)
The error is just because of the required=True parameter; and the message tells you what is wrong. It should be:
parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), default=sys.stdin)
By 'calling' this infile, as opposed to '--infile', you've created a positional argument. argparse itself determines whether it is required or not. With nargs='?' it can't be required. It's by definition optional (but not an optionals argument :) ).
The FileType type lets you name a file (or '-') in the commandline. It will open it (stdin is already open) and assign it to the args.infile attribute. It does nothing more.
So after parsing, using args.infile gives you access to this open file, which you can read as needed (and optionally close if not stdin).
So this is a convenient way of letting your users specify which file should be opened for use in your code. It was intended for simple scripts that read one file, do something, and write to another.
But if all you are looking at is stdin, there isn't any point in using this type. sys.stdin is always available for reading. And there isn't any way of making the parser read stdin. It parses sys.argv which comes from the commandline.
There is a fromfile_prefix_chars feature (typically used with an @ prefix) that tells the parser to read command-line strings from a file. It parses the file and splices the values into the argument list. See the argparse docs.
From the docs for the add_argument() method:
required - Whether or not the command-line option may be omitted (optionals only).
The required keyword is only used for options (e.g., -f or --foo), not for positional arguments. Just take it out.
parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
                    default=sys.stdin)
After parsing, args.infile will be either an open file object for the named file or the sys.stdin file object. You would need to read that file to see if there is anything in it. Reading can be risky: if stdin is a terminal and nothing is typed, you may block forever. But that just means that the user didn't follow instructions.
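A minimal sketch of "read it yourself and check" follows. The isatty() test is an addition of mine, not something the answers above propose: it rejects the case where no file was named and stdin is an interactive terminal, instead of blocking on read(). read_input and build_parser are hypothetical names, and the temporary file stands in for a user-supplied path.

```python
import argparse
import sys
import tempfile


def build_parser():
    parser = argparse.ArgumentParser()
    # Optional positional: a file name, '-' for stdin, or omitted (stdin).
    parser.add_argument("infile", nargs="?",
                        type=argparse.FileType("r"), default=sys.stdin)
    return parser


def read_input(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # No file named and stdin is a terminal: nothing was piped in, so fail
    # early rather than wait for keyboard input.
    if args.infile is sys.stdin and sys.stdin.isatty():
        parser.error("no input: pass a file name or pipe data on stdin")
    data = args.infile.read()
    if args.infile is not sys.stdin:
        args.infile.close()
    if not data:
        parser.error("input is empty")
    return data


# Demonstration with a temporary file standing in for a user-supplied path.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")
data = read_input([tmp.name])
print(repr(data))
```

parser.error prints the usage message and exits with status 2, which matches how argparse reports its own argument errors.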

Python subprocess library: Running grep command from Python

I am trying to run the grep command from my Python module using the subprocess library. Since I am doing this operation on a doc file, I am using the third-party catdoc tool to get the content as plain text, which I want to store in a file. I don't know where I am going wrong, but the program fails to generate the plain text file and therefore never gets a grep result. I have gone through the error log, but it's empty. Thanks for all the help.
def search_file(name, keyword):
    # Extract and save the text from doc file
    catdoc_cmd = ['catdoc', '-w' , name, '>', 'testing.txt']
    catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    output = catdoc_process.communicate()[0]
    grep_cmd = []
    # Search the keyword through the text file
    grep_cmd.extend(['grep', '%s' %keyword , 'testing.txt'])
    print grep_cmd
    p = subprocess.Popen(grep_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    stdoutdata = p.communicate()[0]
    print stdoutdata
On UNIX, specifying shell=True together with a list causes the first item to be treated as the command string to execute, with all subsequent items treated as arguments to the shell itself. Thus, the > won't have any effect (with /bin/sh -c, the items after the command string are not part of the command).
Therefore, if you keep shell=True, you should pass the whole command as a single string:
catdoc_cmd = ['catdoc -w "%s" > testing.txt' % name]
A better solution, though, would probably be to just read the text out of the subprocess' stdout, and process it using re or Python string operations:
catdoc_cmd = ['catdoc', '-w' , name]
catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
for line in catdoc_process.stdout:
    if keyword in line:
        print line.strip()
I think you're trying to pass the > to the shell, but that's not going to work the way you've done it. If you want to spawn a process, you should arrange for its standard output to be redirected. Fortunately, that's really easy to do: open the file you want the output to go to for writing and pass it to Popen using the stdout keyword argument instead of PIPE. (PIPE attaches the output to a pipe, which you can read with communicate().)
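The "read the output and search it in Python" approach can be sketched in Python 3 as follows. The child process here is the current Python interpreter standing in for catdoc (an assumption, so the example runs without catdoc installed), and search_output is a hypothetical helper name.

```python
import subprocess
import sys


def search_output(cmd, keyword):
    """Run cmd without a shell and return the stdout lines containing keyword."""
    completed = subprocess.run(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, text=True, check=True)
    return [line for line in completed.stdout.splitlines() if keyword in line]


# Stand-in for ['catdoc', '-w', name] (assumption): prints three lines.
child = [sys.executable, "-c",
         "print('alpha'); print('beta keyword'); print('gamma')"]
matches = search_output(child, "keyword")
print(matches)
```

This removes both the intermediate testing.txt file and the separate grep process.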

How to use subprocess when multiple arguments contain spaces?

I'm working on a wrapper script that will exercise a VMware executable, allowing for the automation of virtual machine startup/shutdown/register/deregister actions. I'm trying to use subprocess to handle invoking the executable, but the spaces in the executable's path and in its parameters are not being handled correctly by subprocess. Below is a code fragment:
vmrun_cmd = r"c:/Program Files/VMware/VMware Server/vmware-cmd.bat"
def vm_start(target_vm):
    list_arg = "start"
    list_arg2 = "hard"
    if vm_list(target_vm):
        p = Popen([vmrun_cmd, target_vm, list_arg, list_arg2], stdout=PIPE).communicate()[0]
        print p
    else:
        vm_register(target_vm)
        vm_start(target_vm)
def vm_list2(target_vm):
    list_arg = "-l"
    p = Popen([vmrun_cmd, list_arg], stdout=PIPE).communicate()[0]
    for line in p.split('\n'):
        print line
If I call the vm_list2 function, I get the following output:
$ ./vmware_control.py --list
C:\Virtual Machines\QAW2K3Server\Windows Server 2003 Standard Edition.vmx
C:\Virtual Machines\ubunturouter\Ubuntu.vmx
C:\Virtual Machines\vacc\vacc.vmx
C:\Virtual Machines\EdgeAS-4.4.x\Other Linux 2.4.x kernel.vmx
C:\Virtual Machines\UbuntuServer1\Ubuntu.vmx
C:\Virtual Machines\Other Linux 2.4.x kernel\Other Linux 2.4.x kernel.vmx
C:\Virtual Machines\QAClient\Windows XP Professional.vmx
If I call the vm_start function, which requires a path-to-vm parameter, I get the following output:
$ ./vmware_control.py --start "C:\Virtual Machines\ubunturouter\Ubuntu.vmx"
'c:\Program' is not recognized as an internal or external command,
operable program or batch file.
Apparently, the presence of a second parameter with embedded spaces is altering the way that subprocess is interpreting the first parameter. Any suggestions on how to resolve this?
python2.5.2/cygwin/winxp
If you have spaces in the path, the easiest way I've found to get them interpreted properly is this.
subprocess.call('""' + path + '""')
I don't know why exactly it needs double double quotes, but that is what works.
I believe that list2cmdline(), which is doing the processing of your list args, splits any string arg on whitespace unless the string contains double quotes. So I would expect
vmrun_cmd = r'"c:/Program Files/VMware/VMware Server/vmware-cmd.bat"'
to be what you want.
You'll also likely want to surround the other arguments (like target_vm) in double quotes on the assumption that they, too, each represent a distinct arg to present to the command line. Something like
r'"%s"' % target_vm
(for example) should suit.
See the list2cmdline documentation
'c:\Program' is not recognized as an internal or external command, operable program or batch file.
To get this message, you are either:
Using shell=True:
vmrun_cmd = r"c:\Program Files\VMware\VMware Server\vmware-cmd.bat"
subprocess.Popen(vmrun_cmd, shell=True)
Changing vmrun_cmd in another part of your code
Getting this error from something inside vmware-cmd.bat
Things to try:
Open a python prompt, run the following command:
subprocess.Popen([r"c:\Program Files\VMware\VMware Server\vmware-cmd.bat"])
If that works, then quoting issues are out of the question. If not, you've isolated the problem.
In Python on MS Windows, the subprocess.Popen class uses the CreateProcess API to start the process. CreateProcess takes a string rather than something like an array of arguments. Python uses subprocess.list2cmdline to convert the list of args to a string for CreateProcess.
If I were you, I'd see what subprocess.list2cmdline(args) returns (where args is the first argument of Popen). It would be interesting to see if it is putting quotes around the first argument.
Of course, this explanation might not apply in a Cygwin environment.
Having said all this, I don't have MS Windows.
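The check suggested above can be run on any platform, since list2cmdline is a plain Python helper in the subprocess module (it is not part of the documented public API, so treat this as a diagnostic sketch rather than something to rely on). The paths are the ones from the question.

```python
import subprocess

args = [r"c:\Program Files\VMware\VMware Server\vmware-cmd.bat",
        r"C:\Virtual Machines\ubunturouter\Ubuntu.vmx",
        "start", "hard"]

# Build the single command-line string that Popen would pass on Windows.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

Arguments that contain whitespace come back wrapped in double quotes, while plain ones such as start and hard are left bare.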
One problem is that if the command is surrounded with quotes and doesn't have spaces, that could also confuse the shell.
So I do this:
if ' ' in raw_cmd:
    fmt = '"%s"'
else:
    fmt = '%s'
cmd = fmt % raw_cmd
That was quite a hard problem for the last three hours. Nothing stated so far worked, neither using r"" nor Popen with a list and so on. What did work in the end was a combination of a format string and r"". So my solution is this:
subprocess.Popen("{0} -f {1}".format(pathToExe, r'"%s"' % pathToVideoFileOrDir))
where both variables pathToExe and pathToVideoFileOrDir have whitespaces in their path. Using \" within the formatted string did not work and resulted in the same error that the first path is not detected any longer correctly.
Possibly stupid suggestion, but perhaps try the following, to remove subprocess + spaces from the equation:
import os
from subprocess import Popen, PIPE
os.chdir(
    os.path.join("C:\\", "Program Files", "VMware", "VMware Server")
)
p = Popen(
    ["vmware-cmd.bat", target_vm, list_arg, list_arg2],
    stdout=PIPE
).communicate()[0]
It might also be worth trying..
p = Popen(
    [os.path.join("C:\\", "Program Files", "VMware", "VMware Server", "vmware-cmd.bat"), ...
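You probably don't want to use PIPE without reading from it.
If the subprogram writes more output than the pipe buffer holds (roughly 64KB on many systems) and nothing is reading the pipe, the subprogram is likely to block and your process will hang.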
http://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
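For comparison, here is a small sketch of reading a large amount of output through PIPE without stalling, by letting communicate() drain the pipe. The child is the current Python interpreter writing about 200 KB (a stand-in chosen for illustration, not the VMware script).

```python
import subprocess
import sys

# Stand-in child (assumption): writes about 200 KB to stdout, which is more
# than a typical pipe buffer holds.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE)
# communicate() keeps reading until the child closes stdout and then waits
# for it to exit, so the child never blocks on a full pipe.
output, _ = proc.communicate()
print(len(output), proc.returncode)
```

The hang described above tends to appear when code calls wait() on the process first and only reads the pipe afterwards.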
subprocess.Popen has a keyword argument, shell, which makes the command run as if a shell had parsed your arguments. Setting shell=True should do what you want.
Why are you using r""? I believe that if you remove the "r" from the beginning, it will be treated as a standard string which may contain spaces. Python should then properly quote the string when sending it to the shell.
Here's what I don't like
vmrun_cmd = r"c:/Program Files/VMware/VMware Server/vmware-cmd.bat"
You've got spaces in the name of the command itself -- which is baffling your shell. Hence the "'c:\Program' is not recognized as an internal or external command,
operable program or batch file."
Option 1 -- put your .BAT file somewhere else. Indeed, put all your VMWare somewhere else. Here's the rule: Do Not Use "Program Files" Directory For Anything. It's just wrong.
Option 2 -- quote the vmrun_cmd value
vmrun_cmd = r'"c:/Program Files/VMware/VMware Server/vmware-cmd.bat"'
