I would like to embed a command in a python script and capture the output. In this scenario I'm trying to use "find" to find an indeterminate number of files in an indeterminate number of subdirs, and grep each matching file for a string, something like:
grep "rabbit" `find . -name "*.txt"`
I'm running Python 2.6.6 (yeah, I'm sorry too, but can't budge the entire organization for this right now).
I've tried a bunch of things using subprocess, shlex, etc. that have been suggested here, but I haven't found a syntax that will swallow this; every attempt either fails outright or ends up passing the "find ..." part as the search string for grep, etc. Suggestions appreciated.
Ken
import subprocess
find_p = subprocess.Popen(["find", ".", "-name", "*.txt"], stdout=subprocess.PIPE)
grep_p = subprocess.Popen(["xargs", "grep", "rabbit"], stdin=find_p.stdout)
grep_p.wait()
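Since the goal is to capture the output in Python rather than just print it to the terminal, a minimal variation (still fine on Python 2.6) pipes the final stage back into the script with communicate():

import subprocess

find_p = subprocess.Popen(["find", ".", "-name", "*.txt"],
                          stdout=subprocess.PIPE)
grep_p = subprocess.Popen(["xargs", "grep", "rabbit"],
                          stdin=find_p.stdout, stdout=subprocess.PIPE)
find_p.stdout.close()  # lets find receive SIGPIPE if grep exits early
output = grep_p.communicate()[0]  # all matching lines, as one string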
I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os

temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1

command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list = [os.system(command)]

for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
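For reference, a minimal sketch of the same call with subprocess.run() (Python 3.5+; the captured output ends up on the returned object):

import os
import subprocess

# capture stdout instead of letting it go to the terminal
result = subprocess.run(['tail', '-qn1'] + os.listdir('.'),
                        stdout=subprocess.PIPE, check=True)
output = result.stdout  # bytes; call .decode() if you want str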
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
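As a sketch (assuming no line is longer than, say, 4096 bytes; the helper name is mine, not from the original), you could seek to near the end and keep only the last line:

import os

def last_line(filename, max_line_len=4096):
    # hypothetical helper: read only the tail of a (possibly large) file
    with open(filename, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        # jump back at most max_line_len bytes from the end
        handle.seek(max(0, size - max_line_len))
        lines = handle.read().splitlines()
    # if the length assumption holds, the last element is the complete last line
    return lines[-1].decode('utf-8', 'replace') if lines else ''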
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> import subprocess
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it quickly gets out of hand. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of the files and folders inside that folder. Once you have this, len(names) gives you the count that your first command was computing.
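For example, the whole first command collapses to one line:

>>> no_of_files = len(glob.glob("/home/demo/*"))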
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
The ls -l /home/demo/ | wc -l pipeline also does not give the correct count, because ls -l prints a "total X" line at the top along with other info.
You could likely use a loop without much issue:
import os

files = [f for f in os.listdir('.') if os.path.isfile(f)]

for f in files:
    with open(f, 'rb') as fh:
        # the with block closes fh automatically, so no explicit close is needed
        last = fh.readlines()[-1].decode()
    print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
import subprocess

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional. For text, "utf-8" usually works; if the files are a mix of binary and text, a single-byte encoding such as "iso-8859-1" (which accepts any byte sequence) usually works as a fallback.
You are not able to store the file names because os.system does not return the output the way you expect it to. For more information, see the documentation quoted below.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes shell commands as-is; to capture the output of these shell commands you have to use the Python subprocess module.
Note: in your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.
I am trying to execute a python script on all text files in a folder:
for fi in sys.argv[1:]:
And I get the following error
-bash: /usr/bin/python: Argument list too long
The way I call this Python function is the following:
python functionName.py *.txt
The folder has around 9000 files. Is there some way to run this function without having to split my data into more folders, etc.? Splitting the files would not be very practical because I will have to execute the function on even more files in the future... Thanks
EDIT: Based on the selected correct reply and the comments of the replier (Charles Duffy), what worked for me is the following:
printf '%s\0' *.txt | xargs -0 python ./functionName.py
because I don't have a valid shebang.
This is an OS-level problem (limit on command line length), and is conventionally solved with an OS-level (or, at least, outside-your-Python-process) solution:
find . -maxdepth 1 -type f -name '*.txt' -exec ./your-python-program '{}' +
...or...
printf '%s\0' *.txt | xargs -0 ./your-python-program
Note that this runs your-python-program once per batch of files found, where the batch size is dependent on the number of names that can fit in ARG_MAX; see the excellent answer by Marcus Müller if this is unsuitable.
No. That is a kernel limitation for the length (in bytes) of a command line.
Typically, you can determine that limit by doing
getconf ARG_MAX
which, at least for me, yields 2097152 (bytes), which means about 2MB.
I recommend using Python to work through the folder yourself, i.e. giving your Python program the ability to work with directories instead of individual files, or to read the file names from a file.
The former can easily be done using os.walk(...), whereas the second option is (in my opinion) the more flexible one. Use the argparse module to give your Python program an easy-to-use command-line syntax, then add an argument with a file type (see the reference documentation), and Python will automatically understand special filenames like -, meaning that instead of
for fi in sys.argv[1:]:
you could do
for fi in opts.file_to_read_filenames_from.read().split(chr(0)):
which would even allow you to do something like
find -iname '*.txt' -type f -print0|my_python_program.py -file-to-read-filenames-from -
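A minimal sketch of that argparse setup (the option name is chosen here to match the loop above; adjust to taste):

import argparse

parser = argparse.ArgumentParser()
# '-' as the value makes argparse hand the program sys.stdin
parser.add_argument('--file-to-read-filenames-from',
                    type=argparse.FileType('r'), default='-')
opts = parser.parse_args()

# names separated by NUL bytes, as produced by find ... -print0
for fi in opts.file_to_read_filenames_from.read().split(chr(0)):
    if fi:  # the trailing NUL leaves one empty entry behind
        print(fi)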
Don't do it this way. Pass the mask to your Python script (e.g. call it as python functionName.py "*.txt") and expand it using glob (https://docs.python.org/2/library/glob.html).
Consider using the glob module. With it, you invoke your program like:
python functionName.py "*.txt"
The shell then does not expand *.txt into file names. Your Python program receives the literal *.txt in its argument list, and you can pass it to glob.glob():
for fi in glob.glob(sys.argv[1]):
...
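Put together, a minimal runnable sketch (with print as a stand-in for whatever the real script does per file):

import glob
import sys

# invoked as: python functionName.py "*.txt"   (the quotes stop shell expansion)
for fi in glob.glob(sys.argv[1]):
    print(fi)  # stand-in for the real per-file work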
I am trying to make a python script that will open a directory, apply a perl script to every file in that directory, and get its output in either multiple text files or just one.
I currently have:
import shlex, subprocess
arg_str = "perl tilt.pl *.pdb > final.txt"
arg = shlex.split(arg_str)

import os
framespdb = os.listdir("prac_frames")
for frames in framespdb:
    subprocess.Popen(arg, stdout=True)
I keep getting *.pdb not found. I am very new to all of this, so any help completing this script would be appreciated.
*.pdb not found means exactly that: there is no *.pdb in whatever directory you're running the script from... and as I read the code, I don't see anything to imply it's inside prac_frames when it runs the perl script.
you probably need os.chdir(path) before the Popen.
How do I "cd" in Python?
...using a python script to run somewhat dubious syscalls to perl may offend some people, but everyone's done it. Aside from that, I'd point out:
Always specify full paths (this becomes a problem if you later want to run your job automatically from cron or from an environment that doesn't have your PATH).
i.e. run which perl and put that full path in.
./*.pdb would be better, but not as good as /full/path/*.pdb (which you could use instead of the os.chdir option).
subprocess.Popen(arg, stdout=True)
does not expand filename wildcards. To handle your *.pdb, use shell=True.
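For instance, a sketch that avoids the shell entirely by letting Python expand the pattern (this assumes tilt.pl and the .pdb files are reachable from the working directory):

import glob
import subprocess

with open('final.txt', 'w') as out:
    # Python expands *.pdb itself, so no shell and no wildcard problem
    for frame in sorted(glob.glob('prac_frames/*.pdb')):
        subprocess.call(['perl', 'tilt.pl', frame], stdout=out)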
I am trying to match the following string and not having any luck. Below you will find my attempt.
LOG FORMAT:
riskserver.2014-04-07-08:45:01.log
I think I will only need the year, month and date. So I was attempting a wildcard *, which Python 2.7 does not seem to like:
cmd = 'tail -n10000 /opt/rubedo/log/riskserver.'+nowFormat+*'
Help is very much appreciated here. Thanks, I hope I explained this well and someone can understand.
I am using subprocess with grep involved.
tail: cannot open `/opt/rubedo/log/riskserver.2014-04-08' for reading: No such file or directory
grep: not: No such file or directory
EDIT:
now = datetime.datetime.now().strftime("%H:%M:%S")
nowFormat = datetime.datetime.now().strftime("%Y\-%m\-%d")
Globbing is what the shell does with the splat (*): e.g. ls *.txt gets expanded by the Linux shell into ls f1.txt f2.txt f3.txt f4.txt ..., so ls actually receives the list of matching files, not the pattern string. That is what they mean in the comments.
nowFormat = "2014-04-07"
cmd = 'tail -n10000 /opt/rubedo/log/riskserver.'+nowFormat+'*'
os.system(cmd)  # this executes it through your Linux shell; you should see the output, although this call will not give you access to the output in Python
or in Python
import glob
fnames = glob.glob('/opt/rubedo/log/riskserver.'+nowFormat+'*')
print fnames
I have renamed a css class name in a number of (python-django) templates. The css rules, however, are spread across multiple files in multiple directories. I have a python snippet that starts at the root dir and then recursively updates all the css files.
from os import walk, curdir
import subprocess
COMMAND = "find %s -iname *.css | xargs sed -i s/[Ff][Oo][Oo]/bar/g"
test_command = 'echo "This is just a test. DIR: %s"'
def renamer(command):
    print command  # Please ignore the print commands.
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
    op = process.communicate()[0]
    print op

for root, dirs, files in walk(curdir):
    if root:
        command = COMMAND % root
        renamer(command)
It doesn't work, gives:
find ./cms/djangoapps/contentstore/management/commands/tests -iname *.css | xargs sed -i s/[Ee][Dd][Xx]/gurukul/g
find: paths must precede expression: |
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
find ./cms/djangoapps/contentstore/views -iname *.css | xargs sed -i s/[Ee][Dd][Xx]/gurukul/g
find: paths must precede expression: |
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
When I copy and run the same command (printed above), find doesn't error out and sed either gets no input files or it works.
What is wrong with the python snippet?
You're not trying to run a single command, but a shell pipeline of multiple commands, and you're trying to do it without invoking the shell. That can't possibly work. The way you're doing this, | is just one of the arguments to find, which is why find is telling you that it doesn't understand that argument with that "paths must precede expression: |" error.
You can fix that by adding shell=True to your Popen.
But a better solution is to do the pipeline in Python and keep the shell out of it. See Replacing Older Functions with the subprocess Module in the docs for an explanation, but I'll show an example.
Meanwhile, you should never use split to split a command line. The best solution is to write the list of separate arguments instead of joining them up into a string just to split them out. If you must do that, use the shlex module; that's what it's for. But in your case, even that won't help you, because you're inserting random strings verbatim, which could easily have spaces or quotes in them, and there's no way anything—shlex or otherwise—can reconstruct the data in the first place.
So:
from subprocess import Popen, PIPE

# find lists the .css files; xargs hands them to sed in batches
pfind = Popen(['find', root, '-iname', '*.css'], stdout=PIPE)
pxargs = Popen(['xargs', 'sed', '-i', 's/[Ff][Oo][Oo]/bar/g'],
               stdin=pfind.stdout, stdout=PIPE)
pfind.stdout.close()  # lets find receive SIGPIPE if xargs exits first
output = pxargs.communicate()[0]
But there's an even better solution here.
Python has os.walk to do the same thing as find; you can simulate xargs easily, though there's really no need to; and Python has its own re module to use instead of sed. So, why not use them?
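A sketch of that pure-Python route, mirroring the sed expression above (case-insensitive foo to bar in every .css file under the current directory):

import os
import re

pattern = re.compile('foo', re.IGNORECASE)  # same effect as s/[Ff][Oo][Oo]/bar/g

for root, dirs, files in os.walk('.'):
    for name in files:
        if name.lower().endswith('.css'):
            path = os.path.join(root, name)
            with open(path) as f:
                text = f.read()
            with open(path, 'w') as f:
                f.write(pattern.sub('bar', text))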
Or, conversely, bash is much better at driving and connecting up simple commands than Python, so if you'd rather use find and sed instead of os.walk and re.sub, why write the driving script in Python in the first place?
The problem is the pipe. To run a shell pipeline via the subprocess module you have to pass shell=True, so that a shell is present to interpret the |; otherwise, wire the processes together yourself as shown above.
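For example, with shell=True the whole pipeline goes to the shell as a single string (note the quoting around *.css and the sed expression, which the original command string was missing):

import subprocess

# the shell, not Python, now interprets both the pipe and the glob
output = subprocess.check_output(
    "find . -iname '*.css' | xargs sed -i 's/[Ff][Oo][Oo]/bar/g'",
    shell=True)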