I've seen a lot of other pages here that talk about sending grep output to a file, but I can't seem to make any of those work.
I have
subprocess.run(['grep', '-o', searchFor, filename])
if this wasn't a subprocess I would use something like
grep -o searchfor filename >> outputfile.txt
and when I try to use > or >> or anything else in the subprocess, I can't get it to output to a simple txt file. I assume this is because almost all of the pages I've seen are writing to a file from a regular shell command, while I'm trying to do it within a subprocess. My guess is that my syntax is wrong. Where should > or >> (or whatever I should be using) go? I've tried after the ] and before it, and many other combinations.
Open the file in write (>) or append (>>) mode, and pass the resulting file object as stdout in the subprocess.run call.
with open('outputfile.txt', 'w') as fd:
    subprocess.run(['grep', '-o', searchFor, filename], stdout=fd)
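If you want the appending behaviour of >> instead, open the file in append mode:

with open('outputfile.txt', 'a') as fd:
    subprocess.run(['grep', '-o', searchFor, filename], stdout=fd)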
Related
I wrote a program which searches for the oldest logs, and then I want to check whether those logs contain entries from a given date, for example "Jul 30 22:40".
I would like to delete these logs.
But I did not find anything like this here or anywhere else.
Could you maybe help me?
import subprocess
import gzip

firstresult = []
rightlines = []

var = subprocess.Popen('find /var/log/syslog* -mtime +%i' % specific_delete_range, stderr=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
out, err = var.communicate()
out = out.decode('ascii')
for line in out.split():
    firstresult.append(line)
for element in firstresult:
    with gzip.open(element, 'rb') as f:
        for line in f:
            if my_str_as_bytes in line:
                rightlines.append(line)
So the lines which are in the list "rightlines" should be deleted.
It is not possible to 'delete lines' in the middle of a file. Even if this were possible for a regular file, it would not be possible for a compressed file, because the compressed file is composed of 'blocks', and it is very likely that the blocks will not be aligned on line boundaries.
As an alternative, consider extracting the content that should remain in the file into a new file, and then renaming the new file over the old file.
The following bash script looks for the pattern "P" in zipped log files, and replaces each file's content with a new file that does not have lines matching the pattern "P".
Note: The script will not handle uncompressed files (similar to the way the OP's script works). The pattern /var/log/syslog* was modified to select only compressed files (/var/log/syslog*.gz). This may need adjustment based on the actual suffix used for compressed files.
days=30 # Change to whatever file age
P="Jul 30 22:40" # Pattern to remove
for file in $(zfgrep -l "$P" $(find /var/log/syslog*.gz -mtime +$days)) ; do
# Extract content, re-compress and overwrite old files
zfgrep -v "$P" $file | gzip > $file.new && mv $file.new $file
done
In some sense doing this in Python is mildly crazy when it's so much easier to do succinctly in a shell script. But here is a go at refactoring your code.
You generally should avoid subprocess.Popen() if you can; your code would be easier and more idiomatic with subprocess.run(). But in this case, when find can potentially return a lot of matches, we might want to process the files as they are reported, rather than wait for the subprocess to finish and then collect its output. Using code from this Stack Overflow answer, and adapting it in accordance with Actual meaning of 'shell=True' in subprocess to avoid shell=True, try something like
#!/usr/bin/env python3
from subprocess import Popen, PIPE
import gzip
from tempfile import NamedTemporaryFile
import shutil
import os
with Popen(
        ['find', '/var/log', '-name', 'syslog*',
         '-mtime', '+%i' % specific_delete_range],
        stdout=PIPE, bufsize=1, text=True) as p:
    for filename in p.stdout:
        filename = filename.rstrip('\n')
        temp = NamedTemporaryFile(delete=False)
        with gzip.open(filename, 'rb') as f, gzip.open(temp, 'wb') as z:
            for line in f:
                if my_str_as_bytes not in line:
                    z.write(line)
        temp.close()
        os.unlink(filename)
        shutil.copy(temp.name, filename)
        os.unlink(temp.name)
With text=True we don't have to decode the output from Popen. The lines from gzip are still binary bytes; we could decode them, of course, but instead encoding the search string into bytes, as you have done, is more efficient.
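For example, the pattern from the question can be prepared once, up front:

my_str_as_bytes = 'Jul 30 22:40'.encode('ascii')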
The beef here is using a temporary file for the filtered result, and then moving it back on top of the original file once we are done writing it.
NamedTemporaryFile has some sad quirks on Windows, but lucky for you, you are not on Windows.
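As an aside, if you create the temporary file in the same directory as the original (so that both live on the same filesystem), a single atomic rename can replace the unlink/copy/unlink sequence; a minimal sketch:

temp = NamedTemporaryFile(dir=os.path.dirname(filename), delete=False)
# ... filter into temp as above, then:
os.replace(temp.name, filename)  # atomic rename when both paths are on one filesystem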
I am trying to make an output txt file by using subprocess in Python 3.6, but the thing is that the documentation does not really show me how to do this on Windows. For instance,
import subprocess
subprocess.run(["ls", "-l"])
FileNotFoundError: [WinError 2] The system cannot find the file specified
does not work on my computer somehow, and neither do other examples.
Could you kindly give me some hints to complete this code?
f = open('output.txt', 'w')
subprocess.check_output( ? , shell=True, ? )
print("Example")
print("Example")
f.close()
EDIT: make sure you use subprocess.run or subprocess.Popen
Windows differences aside (like martineau said in the comments to your OP, ls won't work on Windows, you need to use the dir command), you want to use subprocess.PIPE to be able to store the output of a command in a variable. Then you should be able to iterate through that variable, writing it to a file line by line, something like:
# Save the output of the `dir` command
var = subprocess.run('dir', stdout=subprocess.PIPE, shell=True, universal_newlines=True)

# Iterate through the lines saved in `var`, and write them to `file`, line by line
with open('path/to/file', 'a') as file:
    for line in var.stdout.splitlines():
        file.write(line + '\n')
(universal_newlines=True makes stdout a string on Python 3.6; the text= alias only arrived in 3.7.)
I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
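For instance, a minimal sketch that seeks near the end of each file, assuming a guessed upper bound on the line length and a non-empty file:

import os

def last_line(filename, max_len=1024):
    # max_len is an assumed bound on line length; adjust for your data.
    with open(filename, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - max_len))
        # The chunk may start mid-line; the last element is the final line.
        return handle.read().splitlines()[-1].decode()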
os.system returns the exit code of the command and not the output. Try using subprocess.check_output with shell=True
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the first command.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
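For instance, with a hypothetical file.txt in that folder:
>>> glob.glob("/home/demo/*")
['/home/demo/file.txt']
>>> os.listdir("/home/demo")
['file.txt']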
The ls -l /home/demo/ | wc -l command also does not give the correct count, as ls -l prints a "total X" line at the top (the total disk blocks used) along with other info.
You could likely use a loop without much issue:
files = [f for f in os.listdir('.') if os.path.isfile(f)]

for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
        print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
import subprocess

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional, although for text "utf-8" usually works; if it's a combination of binary/text/etc., then something such as "iso-8859-1" should usually work.
You are not able to store file names because os.system does not return the output as you expect it to. For more information see: this.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes shell commands as-is. To get the output of these shell commands you have to use the Python subprocess module.
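A quick illustration of the difference:
>>> import os, subprocess
>>> os.system('echo hello')  # prints to the terminal, returns the exit status
hello
0
>>> subprocess.check_output('echo hello', shell=True)  # returns the output
b'hello\n'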
Note: In your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.
I am trying to run the bash command pdfcrack in Python on a remote server. This is my code:
bashCommand = "pdfcrack -f pdf123.pdf > myoutput.txt"
import subprocess
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
I, however, get the following error message:
Non-option argument myoutput2.txt
Error: file > not found
Can anybody see my mistake?
The first argument to Popen is a list containing the command name and its arguments. > is not an argument to the command, though; it is shell syntax. You could simply pass the entire line to Popen and instruct it to use the shell to execute it:
process = subprocess.Popen(bashCommand, shell=True)
(Note that since you are redirecting the output of the command to a file, though, there is no reason to set its standard output to a pipe, because there will be nothing to read.)
A better solution, though, is to let Python handle the redirection.
process = subprocess.Popen(['pdfcrack', '-f', 'pdf123.pdf'], stdout=subprocess.PIPE)
# process.stdout yields bytes, so the output file must be opened in binary mode
with open('myoutput.txt', 'wb') as fh:
    for line in process.stdout:
        fh.write(line)
        # Do whatever else you want with line
Also, don't use str.split as a replacement for the shell's word splitting. A valid command line like pdfcrack -f "foo bar.pdf" would be split into the incorrect list ['pdfcrack', '-f', '"foo', 'bar.pdf"'], rather than the correct list ['pdfcrack', '-f', 'foo bar.pdf'].
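If you genuinely need to split a command string the way the shell would, the standard library's shlex.split handles the quoting correctly:
>>> import shlex
>>> shlex.split('pdfcrack -f "foo bar.pdf"')
['pdfcrack', '-f', 'foo bar.pdf']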
> is interpreted by the shell, but is not valid otherwise.
So, that would work (don't split, use as-is):
process = subprocess.Popen(bashCommand, shell=True)
(and stdout=subprocess.PIPE isn't useful since all output is redirected to the output file)
But it would be better to use native Python for the redirection to the output file, and to pass the arguments as a list (which handles quote protection if needed):
with open("myoutput.txt","w") as f:
process = subprocess.Popen(["pdfcrack","-f","pdf123.pdf"], stdout=subprocess.PIPE)
f.write(process.read())
process.wait()
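Simpler still, you can hand the file object straight to Popen as stdout, so no copy loop is needed at all:

with open("myoutput.txt", "wb") as f:
    process = subprocess.Popen(["pdfcrack", "-f", "pdf123.pdf"], stdout=f)
    process.wait()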
Your mistake is the > in the command.
It isn't treated as a redirection to a file, because normally bash does that, and here you run the command without bash.
Try shell=True if you want to use bash; then you don't have to split the command into a list.
subprocess.Popen("pdfcrack -f pdf123.pdf > myoutput.txt", shell=True)
I am trying to run the grep command from my Python module using the subprocess library. Since I am doing this operation on a doc file, I am using the catdoc third-party tool to get the content as a plain text file. I want to store the content in a file. I don't know where I am going wrong, but the program fails to generate the plain text file and eventually to get the grep result. I have gone through the error log, but it's empty. Thanks for all the help.
def search_file(name, keyword):
    # Extract and save the text from the doc file
    catdoc_cmd = ['catdoc', '-w', name, '>', 'testing.txt']
    catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    output = catdoc_process.communicate()[0]
    grep_cmd = []
    # Search for the keyword in the text file
    grep_cmd.extend(['grep', '%s' % keyword, 'testing.txt'])
    print grep_cmd
    p = subprocess.Popen(grep_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    stdoutdata = p.communicate()[0]
    print stdoutdata
On UNIX, specifying shell=True will cause the first argument to be treated as the command to execute, with all subsequent arguments treated as arguments to the shell itself. Thus, the > won't have any effect (since with /bin/sh -c, all arguments after the command are ignored).
Therefore, you should actually use
catdoc_cmd = ['catdoc -w "%s" > testing.txt' % name]
A better solution, though, would probably be to just read the text out of the subprocess' stdout, and process it using re or Python string operations:
catdoc_cmd = ['catdoc', '-w', name]
catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in catdoc_process.stdout:
    if keyword in line:
        print line.strip()
I think you're trying to pass the > to the shell, but that's not going to work the way you've done it. If you want to spawn a process, you should arrange for its standard output to be redirected. Fortunately, that's really easy to do; all you have to do is open the file you want the output to go to for writing and pass it to Popen using the stdout keyword argument, instead of PIPE, which causes it to be attached to a pipe which you can then read with communicate().
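A minimal sketch of that approach, reusing name and keyword from the question:

# Redirect catdoc's output straight into the file; no shell needed
with open('testing.txt', 'w') as out:
    subprocess.call(['catdoc', '-w', name], stdout=out)
# Then read grep's result back through a pipe
p = subprocess.Popen(['grep', keyword, 'testing.txt'], stdout=subprocess.PIPE)
print p.communicate()[0]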