Using a variable containing double quotes with subprocess - python

I am having some trouble with the subprocess module. I would like the module to run the shell command equivalent to 'ls -l "/path/to/file/with possible space in directory/or with space in name"'. Subprocess works fine when the filename is not a variable. If the filename is a variable that contains the quotes, then it doesn't work.
Code that doesn't work:
import subprocess
archive_file_list = "/var/tmp/list"
archive = open(archive_file_list, "r")
for line in archive:
    noreturnline = line[:-1]
    quotedline = "\"" + noreturnline + "\""
    if extension == "zip":
        print quotedline
        archivelist = subprocess.check_output(['ls', '-l', quotedline])
        print archivelist
Code that works:
archivelist = subprocess.check_output(['ls', '-l', "/path/to/file/with possible space in directory/or with space in name"])
Here is the output for the code that doesn't work:
"/path/to/file/with possible space in directory/or with space in name"
ls: cannot access "/path/to/file/with possible space in directory/or with space in name" No such file or directory
Traceback (most recent call last):
File "./archive_test.py", line 12, in <module>
archivelist = subprocess.check_output(['ls', '-l', quotedline])
File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['ls', '-l', '"/path/to/file/with possible space in directory/or with space in name"']' returned non-zero exit status 2
Before you ask - yes, I have already verified that "/path/to/file/with possible space in directory/or with space in name" does in fact exist by running 'ls -l' from the command line.
Any help would be appreciated. Thanks in advance.

In the command that works (which is the best option there is):
archivelist = subprocess.check_output(['ls', '-l', "/path/to/file/with possible space in directory/or with space in name"])
the third argument is actually /path/to/file/with possible space in directory/or with space in name (without the quotes), which is the filename that exists, so the command works.
Since shell=True isn't set, no shell is involved: the arguments are passed as-is to the underlying exec call, so spaces and other special characters are preserved.
If you add more quotes, they're not removed; they're passed literally to ls.
Since there's no file called "/path/to/file/with possible space in directory/or with space in name" (with the quotes as part of the name), the file/dir isn't found.
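One way to see this for yourself is to have the child process echo back the argument it received. The sketch below uses the running Python interpreter (sys.executable) as a stand-in for ls so that it runs anywhere; the path is the question's example and does not need to exist. It assumes Python 3.7+ for the text= argument.

```python
import subprocess
import sys

# The question's path, plus the same path wrapped in literal double quotes
# (this is what quotedline contained).
path = "/path/to/file/with possible space in directory/or with space in name"
quoted = '"' + path + '"'

# A child process that prints its first argument; no shell is involved.
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

plain_out = subprocess.check_output(echo_arg + [path], text=True).rstrip("\n")
quoted_out = subprocess.check_output(echo_arg + [quoted], text=True).rstrip("\n")

print(plain_out)   # the spaces survive without any quoting
print(quoted_out)  # the double quotes arrive as part of the argument
```

The second call is exactly what happened to ls in the question: it was asked about a name that really starts and ends with a double-quote character.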
There's another (dirty) way of calling a command: passing the full command as a single string instead of a list of parameters. That would also work here. On Windows it works without shell=True, because a process receives its command line as one string anyway; on Unix-like systems shell=True is required, because without it the whole string is taken as the name of the program to run:
subprocess.check_output('ls -l "/path/to/file/with possible space in directory/or with space in name"')
But your working list-based approach is cleaner, especially if you don't know the directory name in advance because it's a parameter. Let subprocess do the heavy lifting for you.
On Unix-like systems the string approach requires shell=True, which exposes your program to shell injection, just like any os.system call: a file name with ;rm -rf / appended would run that command, and $(...) sub-shells inside a name would be evaluated.
Final note: if you're really planning to use ls and parse its output, don't do it (http://mywiki.wooledge.org/ParsingLs), use standard os.listdir, os.path.getsize/getmtime & os.stat calls to get the information you need.
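Putting the answer together, here is a minimal sketch of the question's loop without ls: strip only the trailing newline (line[:-1] would also chop the last character of a final line that has no newline), pass the path through unchanged, and use os.stat for the size and modification time. The function name describe and the temporary demo files are hypothetical.

```python
import os
import tempfile
import time

def describe(list_file):
    """Return (path, size, mtime) for each path listed one per line in list_file."""
    results = []
    with open(list_file) as archive:
        for line in archive:
            path = line.rstrip("\n")  # drop only the newline; add no quotes
            if not path:
                continue              # skip blank lines
            st = os.stat(path)        # spaces in the path need no special handling
            results.append((path, st.st_size, time.ctime(st.st_mtime)))
    return results

# Demo: a directory and a file whose names both contain spaces.
with tempfile.TemporaryDirectory() as tmp:
    subdir = os.path.join(tmp, "dir with space")
    os.mkdir(subdir)
    target = os.path.join(subdir, "file with space.zip")
    with open(target, "wb") as fh:
        fh.write(b"12345")
    list_file = os.path.join(tmp, "list")
    with open(list_file, "w") as fh:
        fh.write(target + "\n")
    rows = describe(list_file)

for row in rows:
    print(row)
```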

Related

I am trying to print the last line of every file in a directory using a shell command from a Python script

I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list))
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python (3.5 or later), you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    line = ''  # stays empty if the file has no lines
    with open(filename, 'r') as handle:
        for line in handle:
            pass
    # print the last one only
    print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
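A minimal sketch of the seek-from-the-end idea that sidesteps the need to guess how far back to seek: it reads fixed-size blocks backwards from the end until the buffer holds the whole last line. The function name last_line and the 4096-byte default block size are illustrative choices; it returns an empty string for an empty file.

```python
import os
import tempfile

def last_line(path, block_size=4096):
    """Return the last line of a file, reading backwards from the end in blocks."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new offset, i.e. the file size
        data = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # Ignoring the file's final byte (a possible trailing newline), a newline
            # anywhere else in the buffer means the whole last line has been read.
            if b"\n" in data[:-1]:
                break
    if data.endswith(b"\n"):
        data = data[:-1]
    return data.rsplit(b"\n", 1)[-1].rstrip(b"\r").decode("utf-8", errors="replace")

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as fh:
        fh.write("first\nsecond\nthird line\n")
    # A tiny block size forces several backward reads.
    result = last_line(sample, block_size=4)

print(result)  # third line
```

Only the tail of each file is read, so the cost depends on the length of the last line rather than the size of the file.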
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").splitlines()
Edit (as suggested by @tripleee): you probably don't want to do this, as it quickly gets out of hand. Python has good built-in functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of the files and folders inside that folder. Once you have this, len(names) gives you what the first command was trying to compute.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
The ls -l /home/demo/ | wc -l command also doesn't give the right count: ls -l prints an extra "total N" line at the top, where N is the number of disk blocks used by the listed files, not the number of files.
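A minimal sketch of both approaches side by side, using a throwaway directory; the function name list_dir is hypothetical. Note that glob's * does not match names starting with a dot, while os.listdir returns every entry, and neither distinguishes files from directories unless you check with os.path.isfile.

```python
import glob
import os
import tempfile

def list_dir(directory):
    """Return (full_paths, names, file_count) for the entries in directory."""
    full_paths = sorted(glob.glob(os.path.join(directory, "*")))  # full paths; skips dotfiles
    names = sorted(os.listdir(directory))                          # bare names; includes dotfiles
    file_count = sum(1 for n in names if os.path.isfile(os.path.join(directory, n)))
    return full_paths, names, file_count

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", ".hidden"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subdir"))
    full_paths, names, file_count = list_dir(tmp)

print(names)       # ['.hidden', 'a.txt', 'b.txt', 'subdir']
print(file_count)  # 3
```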
You could likely use a loop without much issue:
import os
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    with open(f, 'rb') as fh:
        lines = fh.readlines()
    # the with block closes fh; empty files have no last line
    last = lines[-1].decode() if lines else ''
    print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode() call is optional. For text files "utf-8" usually works; if a file mixes binary and text, a single-byte encoding such as "iso-8859-1" will decode it without errors, since it maps every byte to a character.
You are not able to store the file names because os.system does not return the command's output the way you expect.
From the docs:
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system runs the shell command as-is and only returns its exit status; to capture the output of shell commands, use Python's subprocess module instead.
Note: in your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.

Calling python script (that uses subprocess to call grep) from sub-directory causes error

I'm calling grep via a Python script and storing the output in a list. I only want grep to search files with specific file extensions, i.e. .c, .cpp, and .h. I'm using the function subprocess.check_output().
(grep_pattern.py):
#!/usr/bin/env python
import subprocess
grep_str = subprocess.check_output(['grep', '-rl', '--include=*.{c,cpp,h}', 'pattern', '.'])
print grep_str
The code works fine when I call grep_pattern.py from the same directory, i.e. ./grep_pattern.py, or from any directories above it, e.g. ./scripts/python/grep_pattern.py, or ./python/grep_pattern.py. However, the code returns the following error if I call it from any directory below it, e.g. ../grep_pattern.py or ../../grep_pattern.py
File "./grep_include.py", line 7, in <module>
ls_output_str = subprocess.check_output(['grep', '-rl', '--include=*.{c,cpp,h}', 'pattern', '.'])
File "/<path>/lib/python2.7/subprocess.py", line 575, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['grep', '-rl', '--include=*.{c,cpp,h}', 'pattern', '/projects/<path>/APACHE\n']' returned non-zero exit status 1
What's strange is if I remove the '--include=*.{c,cpp,h}' option, the code works regardless of where it is called from.
Curly braces aren't part of glob pattern syntax, and grep's --include and --exclude arguments don't expand them. Instead, they're an instruction to the shell (brace expansion, a feature of bash and similar shells) to create several variants of the argument/word in which they appear. You have no shell here, so nothing is honoring these instructions.
Expanding them out might look as follows:
grep_str = subprocess.check_output(
['grep', '-rl',
'--include=*.c',
'--include=*.cpp',
'--include=*.h',
'pattern', '.'])
With this done, the --include patterns can actually match filenames that exist on your disk, so grep can find matches for the pattern and exit with status 0 (success). grep exits with status 1 when no lines match, which is why check_output raised CalledProcessError with exit status 1.
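A minimal sketch of the same fix as a reusable helper, assuming a grep that supports --include (GNU grep does). The name grep_files is hypothetical. Because grep uses exit status 1 for "no match", the sketch uses subprocess.run (Python 3.5+) and treats status 1 as an empty result rather than an error.

```python
import os
import subprocess
import tempfile

def grep_files(pattern, root=".", extensions=("c", "cpp", "h")):
    """Return files under root that contain pattern, limited to the given extensions.

    The {c,cpp,h} brace expansion is done here in Python, since no shell runs."""
    includes = ["--include=*." + ext for ext in extensions]
    cmd = ["grep", "-rl"] + includes + [pattern, root]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode == 1:   # grep's exit status for "no lines matched"
        return []
    if result.returncode != 0:   # 2 means a real error (bad option, unreadable file, ...)
        raise subprocess.CalledProcessError(result.returncode, cmd)
    return result.stdout.splitlines()

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in (("a.c", "pattern\n"), ("b.txt", "pattern\n"), ("c.h", "other\n")):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    found = grep_files("pattern", tmp)
    missing = grep_files("no-such-text", tmp)

print(found)    # only the .c file; b.txt is excluded by --include
print(missing)  # []
```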
By the way -- the \n on the end of the path given in your exception is a bit of a red flag. Unless you actually have a directory name ending in a literal newline (which is possible, but rare), ensure that you're properly stripping trailing newlines when received from readline() or similar.

how to handle white space in filename when using subprocess in python

I'm using subprocess to remove files in Python, and some of the file names have whitespace in them. How can I handle this?
For example, I have a file named '040513 data.txt'
subprocess.call(['rm', '040513 data.txt'], shell=True)
But I got an error like IOError: [Errno 2] No such file or directory
How could I fix this?
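Your call already passes a list, but on Unix, combined with shell=True, only the first list item is used as the shell command; the remaining items become arguments to the shell itself, so rm never sees the file name. Drop shell=True and pass the list of args to call. This takes care of the parameters, and you also avoid the shell=True security issue: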
subprocess.call(['rm', '040513 data.txt'])
If for any reason you wanted to use shell=True then you could also use quotes to escape blanks as you would do in a shell:
subprocess.call('rm "040513 data.txt"', shell=True)
You can also escape the space with a backslash, for example (note the raw string, so Python leaves the backslash for the shell):
cmd = r"rm 040513\ data.txt"
subprocess.call(cmd, shell=True)
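Hand-placed quotes or backslashes break as soon as a name contains the quote character itself (for example an apostrophe). A minimal sketch for a POSIX system with rm, using temporary demo files; shlex.quote (Python 3.3+) produces a shell-safe version of any name when shell=True cannot be avoided.

```python
import os
import shlex
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    spaced = os.path.join(tmp, "040513 data.txt")
    apostrophe = os.path.join(tmp, "bob's data.txt")
    for p in (spaced, apostrophe):
        open(p, "w").close()  # create empty demo files

    # Preferred: list form, no shell, so no quoting is needed at all.
    subprocess.check_call(["rm", spaced])

    # If a shell is genuinely required, let shlex.quote escape the name.
    subprocess.check_call("rm " + shlex.quote(apostrophe), shell=True)

    remaining = os.listdir(tmp)

print(remaining)  # []
```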

Passing a filename with an apostrophe into scp using python

I'm trying to write a python script to copy files from a remote server to a local directory via scp.
Because I'm running this on an OpenELEC distribution (a minimal HTPC Linux distro whose filesystem is read-only except for the user's home directory, which makes it impractical to install a Python SSH module), I'm doing this the ugly way and just passing the filename to the scp command via os.system.
SCPCopy = "scp -c blowfish -C user@host:\"" + pipes.quote(file) + "\" /storage/downloads/incoming/"
SCPCopy = SCPCopy.replace('\n','')
os.system(SCPCopy)
This works, except for filenames containing an apostrophe.
Below is an example of what gets passed to os.system in a file with an apostrophe:
scp -c blowfish -C user@host:"'/media/sdi1/home/data/bob'"'"'s file.avi'" /storage/downloads/incoming/
And the error:
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
It looks like pipes.quote(x) is escaping the apostrophe (as it should), but obviously the syntax is still incorrect. I've experimented with ditching pipes.quote(x) and replacing apostrophes with /' but that isn't getting me anywhere either.
As scp is based on SSH, the filenames you give to it are subject to shell escaping on the remote side as well. Thus you need to escape twice.
A correctly escaped cmdline for the shell:
scp -c blowfish -C user@host:"\"/media/sdi1/home/data/bob's file\"" /storage/.../
To put this in a Python string without adding yet another level of escaping, we can use a raw triple-quoted string, which keeps the backslashes and double quotes intact:
r"""scp -c blowfish -C user@host:"\"/media/sdi1/home/data/bob's file\"" /storage/.../"""
If you do it programmatically (e.g. using the deprecated pipes.quote, or shlex.quote on Python 3.3+), then don't add any quotes around the filename yourself (in your example above, you added double quotes around the already-quoted filename).
fp = "/media/sdi1/home/data/bob's file.avi"
fp = "user@host:" + pipes.quote(pipes.quote(fp))
cmdline = "scp -c blowfish -C " + fp + " /storage/downloads/incoming/"
os.system(cmdline)
This is admittedly confusing. For a simple model, the whole point of pipes.quote is to escape the input so that the input will be parsed by the shell as exactly one word, which is equal to the input.
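To check the two levels of quoting, here is a small sketch using shlex.quote, the non-deprecated replacement for pipes.quote (Python 3.3+), with shlex.split standing in for a POSIX shell's word splitting: each split strips exactly one level of quoting, so quoting twice survives two shells.

```python
import shlex

fp = "/media/sdi1/home/data/bob's file.avi"

once = shlex.quote(fp)     # the form the remote shell must receive
twice = shlex.quote(once)  # the form the local shell must receive

# shlex.split performs POSIX-shell-style word splitting and quote removal,
# so each call below stands in for one shell stripping one level of quoting.
after_local = shlex.split(twice)
after_remote = shlex.split(after_local[0])

print(twice)
print(after_remote == [fp])  # True
```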
The following is a more generally correct way (and yields the same result):
fp = "/media/sdi1/home/data/bob's file.avi"
# the filepath argument escaped for ssh/scp on the remote side
fp = pipes.quote(fp)
commandargs = ["scp", "-c", "blowfish", "-C", "user@host:"+fp, "/storage/downloads/incoming/"]
# escape all words for the local shell, and then concatenate space-separated
cmdline = " ".join(map(pipes.quote, commandargs))
os.system(cmdline)
It expresses the intent more clearly: controlling exactly which words the shell will parse.
But why start a shell in the first place? We don't need one, and without it we save the escaping on the local side. To spawn a process directly with our arguments, use a function from the os.exec* family.
fp = pipes.quote("/media/sdi1/home/data/bob's file.avi")
commandargs = ["scp", "-c", "blowfish", "-C", "user@host:"+fp, "/storage/downloads/incoming/"]
if os.fork() == 0:
    os.execvp("scp", commandargs)
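On Python 3 the fork/exec pair can be replaced by subprocess, which also waits for scp to finish. A minimal sketch, assuming shlex.quote (Python 3.3+) in place of pipes.quote; the helper name scp_fetch_args and the user@host placeholder are illustrative, and the question's -c blowfish option is left out (add it back if your scp accepts that cipher).

```python
import shlex

def scp_fetch_args(host, remote_path, local_dir):
    """Build an argv list for scp (hypothetical helper).

    Only one level of quoting is applied: the remote shell still parses the
    path, but no local shell is involved when the list is executed directly."""
    return ["scp", "-C", host + ":" + shlex.quote(remote_path), local_dir]

args = scp_fetch_args("user@host", "/media/sdi1/home/data/bob's file.avi",
                      "/storage/downloads/incoming/")
print(args)

# To actually copy (needs scp and a reachable host):
#   import subprocess
#   subprocess.check_call(args)
```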

Passing shell commands with Python os.system() or subprocess.check_call()

I'm trying to call 'sed' from Python and having troubles passing the command line via either subprocess.check_call() or os.system().
I'm on Windows 7, but using the 'sed' from Cygwin (it's in the path).
If I do this from the Cygwin shell, it works fine:
$ sed 's/&nbsp;/\ /g' <"C:foobar" >"C:foobar.temp"
In Python, I've got the full pathname I'm working with in "name". I tried:
command = r"sed 's/&nbsp;/\ /g' " + "<" '\"' + name + '\" >' '\"' + name + '.temp' + '\"'
subprocess.check_call(command, shell=True)
All the concatenation is there to make sure I have double quotes around the input and output filenames (in case there are blank spaces in the Windows file path).
I also tried it replacing the last line with:
os.system(command)
Either way, I get this error:
sed: -e expression #1, char 2: unterminated `s' command
'amp' is not recognized as an internal or external command,
operable program or batch file.
'nbsp' is not recognized as an internal or external command,
operable program or batch file.
Yet, as I said, it works OK from the console. What am I doing wrong?
The shell used by subprocess is probably not the shell you want. You can specify the shell with executable='path/to/executable'. Different shells have different quoting rules.
Even better might be to skip subprocess altogether, and write this as pure Python:
with open("c:foobar") as f_in:
    with open("c:foobar.temp", "w") as f_out:
        for line in f_in:
            f_out.write(line.replace('&nbsp;', ' '))
I agree with Ned Batchelder's assessment, but you might want to consider the following code, which likely does what you ultimately want to accomplish and is easy to write with the help of Python's fileinput module:
import fileinput
f = fileinput.input('C:foobar', inplace=1)
for line in f:
    line = line.replace('&nbsp;', ' ')
    print line,
f.close()
print 'done'
This updates the given file in place, as the name of the inplace keyword suggests. There's also an optional backup= keyword (not used above) which saves a copy of the original file if desired.
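For reference, a Python 3 version of the same idea, as a minimal sketch: fileinput.input can be used as a context manager, and with inplace=True whatever print() writes goes into the file, so end="" is needed to avoid doubling the newlines. The function name and demo file are hypothetical; backup=".bak" shows the optional backup keyword.

```python
import fileinput
import os
import tempfile

def replace_nbsp_in_place(path, backup=".bak"):
    """Replace every '&nbsp;' with a space, rewriting the file in place."""
    # With inplace=True, fileinput redirects standard output into the file,
    # so print() writes each modified line back instead of to the console.
    with fileinput.input(path, inplace=True, backup=backup) as f:
        for line in f:
            print(line.replace("&nbsp;", " "), end="")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foobar")
    with open(path, "w") as fh:
        fh.write("a&nbsp;b\nc&nbsp;&nbsp;d\n")
    replace_nbsp_in_place(path)
    with open(path) as fh:
        new_text = fh.read()
    with open(path + ".bak") as fh:  # the backup keeps the original contents
        old_text = fh.read()

print(new_text)
```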
BTW, a word of caution about using something like C:foobar to specify the file name because on Windows it means a file of that name in whatever the current directory is on drive C:, which might not be what you want.
I think you'll find that, in Windows Python, it's not actually using the Cygwin shell to run your command; it's using cmd.exe instead.
And, cmd doesn't play well with single quotes the way bash does.
You only have to do the following to confirm that:
c:\pax> echo hello >hello.txt
c:\pax> type "hello.txt"
hello
c:\pax> type 'hello.txt'
The system cannot find the file specified.
I think the best idea would be to use Python itself to process the file. The Python language is a cross-platform one which is meant to remove all those platform-specific inconsistencies, such as the one you've just found.
