Find (bash command) doesn't work with subprocess? - python

I have renamed a css class name in a number of (python-django) templates. The css rules, however, are spread across multiple files in multiple directories. I have a python snippet that starts at the root dir and then recursively renames the class in all the css files.
from os import walk, curdir
import subprocess
COMMAND = "find %s -iname *.css | xargs sed -i s/[Ff][Oo][Oo]/bar/g"
test_command = 'echo "This is just a test. DIR: %s"'
def renamer(command):
    print command # Please ignore the print commands.
    proccess = subprocess.Popen(command.split(), stdout = subprocess.PIPE)
    op = proccess.communicate()[0]
    print op
for root, dirs, files in walk(curdir):
    if root:
        command = COMMAND % root
        renamer(command)
It doesn't work, gives:
find ./cms/djangoapps/contentstore/management/commands/tests -iname *.css | xargs sed -i s/[Ee][Dd][Xx]/gurukul/g
find: paths must precede expression: |
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
find ./cms/djangoapps/contentstore/views -iname *.css | xargs sed -i s/[Ee][Dd][Xx]/gurukul/g
find: paths must precede expression: |
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
When I copy and run the same command (printed above), find doesn't error out and sed either gets no input files or it works.
What is wrong with the python snippet?

You're not trying to run a single command, but a shell pipeline of multiple commands, and you're trying to do it without invoking the shell. That can't possibly work. The way you're doing this, | is just one of the arguments to find, which is why find is telling you that it doesn't understand that argument with that "paths must precede expression: |" error.
You can fix that by adding shell=True to your Popen.
But a better solution is to do the pipeline in Python and keep the shell out of it. See Replacing Older Functions with the subprocess Module in the docs for an explanation, but I'll show an example.
Meanwhile, you should never use split to split a command line. The best solution is to write the list of separate arguments instead of joining them up into a string just to split them out. If you must do that, use the shlex module; that's what it's for. But in your case, even that won't help you, because you're inserting random strings verbatim, which could easily have spaces or quotes in them, and there's no way anything—shlex or otherwise—can reconstruct the data in the first place.
So:
from subprocess import Popen, PIPE
pfind = Popen(['find', root, '-iname', '*.css'], stdout=PIPE)
pxargs = Popen(['xargs', 'sed', '-i', 's/[Ff][Oo][Oo]/bar/g'],
               stdin=pfind.stdout, stdout=PIPE)
pfind.stdout.close()  # lets find receive SIGPIPE if xargs exits first
output = pxargs.communicate()
But there's an even better solution here.
Python has os.walk to do the same thing as find, and its own re module to use instead of sed. You could simulate xargs easily too, but there's really no need to. So, why not use them?
Or, conversely, bash is much better at driving and connecting up simple commands than Python, so if you'd rather use find and sed instead of os.walk and re.sub, why write the driving script in Python in the first place?
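As a rough sketch of the pure-Python route, os.walk and re.sub can stand in for find and sed. This assumes the files are UTF-8 text and small enough to read into memory; the helper name rename_in_css is made up for illustration.

```python
import os
import re

def rename_in_css(root, pattern, replacement):
    """Apply re.sub to every *.css file under root; return the changed paths."""
    regex = re.compile(pattern)
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Like find's -iname, match the extension case-insensitively.
            if not name.lower().endswith('.css'):
                continue
            path = os.path.join(dirpath, name)
            # newline='' keeps the file's original line endings untouched.
            with open(path, encoding='utf-8', newline='') as handle:
                text = handle.read()
            new_text = regex.sub(replacement, text)
            if new_text != text:
                with open(path, 'w', encoding='utf-8', newline='') as handle:
                    handle.write(new_text)
                changed.append(path)
    return changed
```

Calling rename_in_css(os.curdir, r'[Ff][Oo][Oo]', 'bar') would then do the whole job in one pass, with no quoting problems because no shell is involved.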

The problem is the pipe. To use a shell pipe (|) inside the command with the subprocess module, you have to pass shell=True, and pass the command as a single string rather than a split list.
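A minimal sketch of the shell=True route, assuming a POSIX shell is available. The helper names run_in_shell and build_rename_command are hypothetical; shlex.quote is used so a directory name with spaces or quotes cannot break the command line.

```python
import shlex
import subprocess

def run_in_shell(command):
    """Run a command line through the shell and return its stdout as text."""
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
    output = process.communicate()[0]
    return output.decode()

def build_rename_command(root):
    # shlex.quote protects spaces and metacharacters in the directory name;
    # the glob and the sed script are quoted so the shell passes them through.
    template = "find %s -iname '*.css' | xargs sed -i 's/[Ff][Oo][Oo]/bar/g'"
    return template % shlex.quote(root)
```

Note that the whole pipeline is one string here; splitting it into a list first would defeat the purpose of shell=True.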

Related

Save output of a command execution with subprocess

I am trying to get output of subprocess.Popen in variable.
It is working fine for pwd command, but not working for pwdx $(pgrep -U $USER -f SimpleHTTPServer) command.
This works:
(Pdb++) p = subprocess.Popen("pwd", stdout=subprocess.PIPE)
(Pdb++) result = p.communicate()[0]
(Pdb++) result
'xyz'
This is not working:
(Pdb++) subprocess.Popen("pwdx $(pgrep -U $USER -f SimpleHTTPServer)", stdout=subprocess.PIPE)
*** OSError: [Errno 2] No such file or directory
Can someone please let me know how can I save the output of it to a variable?
If you want to pass a command with arguments to Popen(), you have to pass it as a list, like so:
subprocess.Popen(['/bin/ls', '-lat'])
If you just pass a single string as in your example, it assumes the entire thing is the command name, and obviously there is no command literally named pwdx $(pgrep -U $USER -f SimpleHTTPServer).
As the docs and previous answers already state, the command you want to execute via subprocess.Popen() needs to be passed as a list.
From the docs:
Note in particular that options (such as -input) and arguments (such
as eggs.txt) that are separated by whitespace in the shell go in
separate list elements, while arguments that need quoting or backslash
escaping when used in the shell (such as filenames containing spaces) are single list elements.
Another useful tip you can get from the docs is to use shlex.split() to help you to properly split your command into a list.
Beware, though, that special shell parameters (e.g. $USER) and command substitution such as $(...) are not expanded by subprocess.Popen() unless you set the shell=True option, which you shouldn't do without reading the docs' Security Considerations.
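To make the difference concrete, here is a small sketch comparing str.split with shlex.split on a made-up command line (the grep command is only an illustration). It also shows one way to expand an environment variable yourself when no shell is involved; shlex does not perform command substitution, so something like $(pgrep ...) would still need to be run as a separate subprocess call.

```python
import os
import shlex

command = 'grep -r "hello world" $HOME'

naive = command.split()        # ['grep', '-r', '"hello', 'world"', '$HOME']
proper = shlex.split(command)  # ['grep', '-r', 'hello world', '$HOME']

# Without a shell nothing expands $HOME, so do it explicitly if needed.
expanded = [os.path.expandvars(arg) for arg in proper]
```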

I am trying to print the last line of every file in a directory using shell command from python script

I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
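One way to sidestep guessing the seek distance is to read backwards in chunks until a line break shows up. The following is only a sketch of that idea; the function name last_line and the default chunk size are arbitrary choices, and trailing blank lines are ignored rather than reported as an empty last line.

```python
import os

def last_line(path, chunk_size=4096):
    """Return the last non-blank line of a file as bytes, reading from the end."""
    with open(path, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        pos = handle.tell()
        data = b''
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            handle.seek(pos)
            data = handle.read(step) + data
            # Ignore trailing newlines when looking for the line break.
            stripped = data.rstrip(b'\r\n')
            newline = stripped.rfind(b'\n')
            if newline != -1:
                return stripped[newline + 1:]
        # No line break found: the whole file is a single line (or empty).
        return data.rstrip(b'\r\n')
```

For small files this reads everything in one step, so it costs nothing extra; for large files it only touches the last chunk or two.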
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the count that the first command was meant to produce.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
The ls -l /home/demo/ | wc -l command also does not give the correct value, because ls -l prints a "total X" line at the top (the total number of disk blocks used, not the number of files), and wc -l counts that line as well.
You could likely use a loop without much issue:
import os
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    # The with block closes the file, so no explicit close() is needed.
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
    print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional. For text, "utf-8" usually works; if the file mixes binary and text, something such as "iso-8859-1" should usually work, since it can decode any byte.
You are not able to store the file names because os.system does not return the command's output as you expect it to.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes shell commands as they are, without capturing anything. To get the output of these shell commands you have to use Python's subprocess module.
Note: In your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.

for fi in sys.argv[1:]: argument list too long

I am trying to execute a python script on all text files in a folder:
for fi in sys.argv[1:]:
And I get the following error
-bash: /usr/bin/python: Argument list too long
The way I call this Python function is the following:
python functionName.py *.txt
The folder has around 9000 files. Is there some way to run this function without having to split my data in more folders etc? Splitting the files would not be very practical because I will have to execute the function in even more files in the future... Thanks
EDIT: Based on the selected correct reply and the comments of the replier (Charles Duffy), what worked for me is the following:
printf '%s\0' *.txt | xargs -0 python ./functionName.py
because I don't have a valid shebang..
This is an OS-level problem (limit on command line length), and is conventionally solved with an OS-level (or, at least, outside-your-Python-process) solution:
find . -maxdepth 1 -type f -name '*.txt' -exec ./your-python-program '{}' +
...or...
printf '%s\0' *.txt | xargs -0 ./your-python-program
Note that this runs your-python-program once per batch of files found, where the batch size is dependent on the number of names that can fit in ARG_MAX; see the excellent answer by Marcus Müller if this is unsuitable.
No. That is a kernel limitation for the length (in bytes) of a command line.
Typically, you can determine that limit by doing
getconf ARG_MAX
which, at least for me, yields 2097152 (bytes), which means about 2MB.
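The same limit can be read from Python on Unix-like systems with os.sysconf. The sketch below also adds a rough size estimate for an argument list; the helper names are made up, and the estimate is only a lower bound because the environment variables and pointer arrays count towards the limit too.

```python
import os

def arg_max():
    """Return the system limit on the size of arguments plus environment."""
    return os.sysconf('SC_ARG_MAX')

def rough_argv_size(args):
    # Each argument is stored with a terminating NUL byte.
    return sum(len(os.fsencode(arg)) + 1 for arg in args)
```

Comparing rough_argv_size(names) against arg_max() gives a quick indication of whether a list of file names has any chance of fitting on one command line.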
I recommend using python to work through a folder yourself, i.e. giving your python program the ability to work with directories instead of individual files, or to read file names from a file.
The former can easily be done using os.walk(...), whereas the second option is (in my opinion) the more flexible one. Use the argparse module to give your python program an easy-to-use command line syntax, then add an argument of a file type (see reference documentation), and python will automatically be able to understand special filenames like -, meaning you could instead of
for fi in sys.argv[1:]
do
for fi in opts.file_to_read_filenames_from.read().split(chr(0))
which would even allow you to do something like
find -iname '*.txt' -type f -print0|my_python_program.py -file-to-read-filenames-from -
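Put together, the argparse side of this could look roughly like the sketch below. The function name read_filenames and the help text are assumptions; the option is declared with a single leading dash only to match the command line shown above.

```python
import argparse
import sys

def read_filenames(argv=None):
    """Parse the command line and return the NUL-separated names it points to."""
    parser = argparse.ArgumentParser(description='Process a list of files.')
    parser.add_argument(
        '-file-to-read-filenames-from',
        dest='file_to_read_filenames_from',
        type=argparse.FileType('r'),   # '-' is understood as standard input
        default=sys.stdin,
        help='file with NUL-separated names, or - for standard input')
    opts = parser.parse_args(argv)
    source = opts.file_to_read_filenames_from
    data = source.read()
    if source is not sys.stdin:
        source.close()
    # find -print0 ends every name with NUL, so drop the empty last piece.
    return [name for name in data.split(chr(0)) if name]
```

The main loop then becomes for fi in read_filenames(): and the number of files is no longer bounded by the command line length.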
Don't do it this way. Pass the pattern to your python script (e.g. call it as python functionName.py "*.txt") and expand it using glob (https://docs.python.org/2/library/glob.html).
I would suggest using the glob module. With this module you invoke your program like:
python functionName.py "*.txt"
Then the shell will not expand *.txt into file names, because the quotes prevent it. Your Python program will receive *.txt in its argument list and you can pass it to glob.glob():
for fi in glob.glob(sys.argv[1]):
    ...

Capture output of complex shell command in python

I would like to embed a command in a python script and capture the output. In this scenario I'm trying to use "find" to find an indeterminate number of files in an indeterminate number of subdirs, and grep each matching file for a string, something like:
grep "rabbit" `find . -name "*.txt"`
I'm running Python 2.6.6 (yeah, I'm sorry too, but can't budge the entire organization for this right now).
I've tried a bunch of things using subprocess, shlex, etc. that have been suggested in here, but I haven't found a syntax that will swallow this; it either fails outright or ends up taking the "find ..." part as the search string for grep, etc. Suggestions appreciated.
Ken
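import subprocess
find_p = subprocess.Popen(["find", ".", "-name", "*.txt"], stdout=subprocess.PIPE)
# stdout=PIPE on the second process is what lets Python capture the matches.
grep_p = subprocess.Popen(["xargs", "grep", "rabbit"], stdin=find_p.stdout, stdout=subprocess.PIPE)
find_p.stdout.close()
output = grep_p.communicate()[0]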

Avoid subprocess.Popen auto escaping my backslashes in grep

I'm trying to write an svn pre-commit hook in python. Part of this involves checking the diff file to see if there are any actual file changes (as opposed to just property changes).
I have a working grep command which I can execute fine on the shell
grep "^\(Added: \|Modified: \|Deleted: \)" diff filename | grep -v 'svn:'
However, when I put it through subprocess.Popen it escapes all my backslashes, which knackers the regexp.
Executing command: ['grep', '"^\\Added: \\|Modified: \\|Deleted: \\)", ...]
How do I avoid this?
NB: I'm aware that I can pipe results between subprocesses and I can do the two greps that way. I need help getting the first one working first though :/
NB2: I also tried using filterdiff --clean instead and couldn't get it to work. Searching for Added, Modified or Deleted lines, removing those with 'svn:' in and checking I had some results seemed to work though.
Python code:
command = ['grep', '"^\(Added: \|Modified: \|Deleted: \)"', filename]
sys.stdout.write('Executing command: %s\n' % (command))
p = subprocess.Popen(command,
                     stdin = subprocess.PIPE,
                     stdout = subprocess.PIPE,
                     stderr = subprocess.STDOUT,
                     shell = True)
data = p.stdout.read()
if len(data) == 0:
    sys.stdout.write("Diff does not contain any file modifications.\n")
    exit(0)
You need to consider what you want grep to see in its command line arguments.
The first argument needs to be the literal string "^\(Added: \|Modified: \|Deleted: \)", so that means that it shouldn't include the double quotes but should include the backslashes.
The way to express this kind of string is to use Python raw strings:
command = ['grep', r'^\(Added: \|Modified: \|Deleted: \)', filename]
A good way to check what you're actually running is to replace grep by echo so you can at least see what you're passing to the command.
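The doubled backslashes in the "Executing command" output come from printing a list, which shows the repr of each string; they are not what the child process receives. The sketch below illustrates this with a tiny Python child process as a portable stand-in for echo (the helper name show_argument is made up).

```python
import subprocess
import sys

pattern = r'^\(Added: \|Modified: \|Deleted: \)'

def show_argument(arg):
    """Return the argument exactly as a child process receives it."""
    child = 'import sys; sys.stdout.write(sys.argv[1])'
    return subprocess.check_output([sys.executable, '-c', child, arg]).decode()

print(repr(pattern))           # repr doubles each backslash for display
print(show_argument(pattern))  # the child sees single backslashes
```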
