I'm working on a script that calls an executable on input/output files. I'm using subprocess to shell out to the exe and send its stdout to a log file. The problem is that I would like a simple ASCII file, but I'm getting what looks like a hexadecimal file. I'm just learning to program in Python (or any language, for that matter), so I'm assuming there's some type of formatting I can do, but I just don't get it. I've done a fair bit of searching on this site and others, but I haven't found anything like what I'm using subprocess for. The "outRadcorr" part is what I need help with most. Any ideas? More code on request.
# Import system modules
import os, sys, string, traceback, time, datetime
import params
from subprocess import Popen, PIPE, STDOUT
# ... some code here ...
# Write stdout to log file
rad_log_file = open(dsFolder + '\\radcorr.log', 'w')

# create windows bat file function
def rad_bat_writer(radcorr_bat):
    with open(dsFolder + '\\radcorr.bat', 'a') as rad_bat_file:
        rad_bat_file.write(radcorr_bat + '\n')

# loop through the files in raw file list to run radiometric correction
for rawfiles in rawFolderList:
    # Define the file base
    rawBase = rawfiles.split(".")[0]
    print '\nProcessing file: %s \n' % rawBase
    # define variables from raw file to process radcorr
    radFile = rawfiles
    pixFile = '%s.pix' % rawBase
    attFile = '%s.att' % rawBase
    # grab radcorr input/output files and build the radcorr.exe command
    radcorr = ('C:\\itres\\bin\\radcorr.exe -f 1 -j 100 -g 50 -s -1 -n -1 %s %s %s -i '
               + rawFolder + '\\%s,rb -o ' + radFolder + '\\%s -a '
               + radFolder + '\\%s -c C:\\itres\\rad_cal_files\\%s -I 0 -v 0 -r Y -R Y -^^ 2'
               ) % (sum, scatter, shift, radFile, pixFile, attFile, rad_prefix)
    # print out radcorr command
    print radcorr
    # Execute radcorr and write stdout to console and log
    outRadcorr = Popen(radcorr, stdout=PIPE, stderr=STDOUT)
    for line in outRadcorr.stdout:
        sys.stdout.write(line)
        rad_log_file.write(line)
    # write the command to the bat file
    rad_bat_writer(radcorr)
    # Close out exe
    outRadcorr.wait()

# Close log file
rad_log_file.close()
I was using UltraEdit to view the file, and UltraEdit can display files in hex mode; I may be confusing the terminology here. The file looks normal in Notepad/Notepad++/WordPad etc. As I ran the script in debug mode, I could step through the loop file by file. For the first two files (1 GB per file), the output log looked fine. Once the radcorr.log file grew past 10 KB, the output changed from normal ASCII to something that displayed like a hex/binary file. I can't post images yet, but just google "ultraedit hexadecimal".
Still not sure why it switched to this format. The final output size was 45 KB. I changed to view/edit hex mode in UltraEdit and it looks fine. Just wanted to put it out there to see if others have any ideas why, given that I opened the log with 'w' and not 'wb', for instance.
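A quick way to check whether the log actually contains binary bytes (a guess on my part: some editors switch to a hex view when they detect binary content such as NUL bytes, and console apps often emit bare carriage returns for progress output):

# Hypothetical check, run after the script finishes
with open(dsFolder + '\\radcorr.log', 'rb') as f:
    data = f.read()
print 'NUL bytes: %d, bare CRs: %d' % (data.count('\x00'), data.count('\r'))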
I do appreciate all your help. @J.F. Sebastian, I'll have to test the code you posted; it will probably help fix potential bugs down the road.
I wrote a program which searches for the oldest logs, and then I want to check whether those logs contain lines from, for example, the date "Jul 30 22:40".
I would like to delete those lines.
But I did not find anything like this here or anywhere else.
Could you maybe help me?
var = subprocess.Popen('find /var/log/syslog* -mtime +%i' % specific_delete_range,
                       stderr=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
out, err = var.communicate()
out = out.decode('ascii')
for line in out.split():
    firstresult.append(line)

for element in firstresult:
    with gzip.open(element, 'rb') as f:
        for line in f:
            if my_str_as_bytes in line:
                rightlines.append(line)
So the lines which are in the list "rightlines" should be deleted.
It is not possible to 'delete lines' in the middle of a file. Even if this were possible for a regular file, it would not be possible for a compressed file, because the compressed file is composed of 'blocks', and it is very likely that the blocks will not be aligned on line boundaries.
As an alternative, consider extracting the content to be kept into a new file, and then renaming the new file over the old file.
The following bash script looks for the pattern "P" in zipped log files, and replaces each file's content with a version that does not have lines matching the pattern "P".
Note: The script only handles compressed files (similar to the way the OP's script works). The pattern /var/log/syslog* was modified to select only compressed files (/var/log/syslog*.gz). This may need adjustment based on the actual suffix used for compressed files.
days=30           # Change to whatever file age
P="Jul 30 22:40"  # Pattern to remove

for file in $(zfgrep -l "$P" $(find /var/log/syslog*.gz -mtime +$days)) ; do
    # Extract content, re-compress and overwrite old files
    zfgrep -v "$P" $file | gzip > $file.new && mv $file.new $file
done
In some sense doing this in Python is mildly crazy when it's so much easier to do succinctly in a shell script. But here is a go at refactoring your code.
You should generally avoid subprocess.Popen() if you can; your code would be easier and more idiomatic with subprocess.run(). But in this case, when find can potentially return a lot of matches, we might want to process the files as they are reported, rather than wait for the subprocess to finish and then collect its output. Using code from this Stack Overflow answer, and adapting in accordance with Actual meaning of 'shell=True' in subprocess to avoid shell=True, try something like
#!/usr/bin/env python3
from subprocess import Popen, PIPE
import gzip
from tempfile import NamedTemporaryFile
import shutil
import os

with Popen(
        ['find', '/var/log', '-name', 'syslog*', '-mtime', '+' + str(specific_delete_range)],
        stdout=PIPE, bufsize=1, text=True) as p:
    for filename in p.stdout:
        filename = filename.rstrip('\n')
        temp = NamedTemporaryFile(delete=False)
        with gzip.open(filename, 'rb') as f, gzip.open(temp, 'wb') as z:
            for line in f:
                if my_str_as_bytes not in line:
                    z.write(line)
        temp.close()
        os.unlink(filename)
        shutil.copy(temp.name, filename)
        os.unlink(temp.name)
With text=True we don't have to decode the output from Popen. The lines from gzip are still binary bytes; we could decode them, of course, but encoding the search string into bytes instead, as you have done, is more efficient.
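For example, the search pattern could be prepared once, up front (the exact pattern string is an assumption, carried over from the bash answer above):

my_str_as_bytes = 'Jul 30 22:40'.encode('ascii')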
The beef here is using a temporary file for the filtered result, and then moving it back on top over the original file once we are done writing it.
NamedTemporaryFile has some sad quirks on Windows, but lucky for you, you are not on Windows.
I am using Inkscape to take an input single-page PDF file and output an SVG file. The following works from the command line:
c:\progra~1\Inkscape\inkscape -z -f "N:\pdf_skunkworks\inflation-report-may-2018-page0.pdf" -l "N:\pdf_skunkworks\inflation-report-may-2018-page0.svg"
where -z is short for --without-gui, -f is short for the input file, and -l is short for --export-plain-svg.
I could not get the equivalent to work from Python, either passing the command line as one long string or as separate arguments. stderr and stdout give no error, as they both print None:
import subprocess #import call,subprocess
#completed = subprocess.run(["c:\Progra~1\Inkscape\Inkscape.exe",r"-z -f \"N:\pdf_skunkworks\inflation-report-may-2018-page0.pdf\" -l \"N:\pdf_skunkworks\inflation-report-may-2018-page0.svg\""])
completed = subprocess.run(["c:\Progra~1\Inkscape\Inkscape.exe","-z", r"-f \"N:\pdf_skunkworks\inflation-report-may-2018-page0.pdf\"" , r"-l \"N:\pdf_skunkworks\inflation-report-may-2018-page0.svg\""])
print ("stderr:" + str(completed.stderr))
print ("stdout:" + str(completed.stdout))
Just to test the OS plumbing, I wrote some VBA code (my normal language) and this works:
Sub TestShellToInkscape()
'* Tools->References->Windows Script Host Object Model (IWshRuntimeLibrary)
Dim sCmd As String
sCmd = "c:\progra~1\Inkscape\inkscape -z -f ""N:\pdf_skunkworks\inflation-report-may-2018-page0.pdf"" -l ""N:\pdf_skunkworks\inflation-report-may-2018-page0.svg"""
Debug.Print sCmd
Dim oWshShell As IWshRuntimeLibrary.WshShell
Set oWshShell = New IWshRuntimeLibrary.WshShell
Dim lProc As Long
lProc = oWshShell.Run(sCmd, 0, True)
End Sub
So I'm obviously doing something silly in the Python code. I'm sure an experienced Python programmer could solve this easily.
Swap your slashes:
import subprocess #import call,subprocess
completed = subprocess.run(['c:/Progra~1/Inkscape/Inkscape.exe',
'-z',
'-f', r'N:/pdf_skunkworks/inflation-report-may-2018-page0.pdf' ,
'-l', r'N:/pdf_skunkworks/inflation-report-may-2018-page0.svg'])
print ("stderr:" + str(completed.stderr))
print ("stdout:" + str(completed.stdout))
Windows accepts forward slashes as path separators, so Python can pass them straight through, while your backslashes were acting as escape prefixes in the non-raw strings. Note also that with the list form of subprocess.run each element is passed to the program as-is: the escaped quotes in your version became literal parts of the arguments, and each flag and its path need to be separate list items.
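Alternatively, keeping the backslashes also works if you use raw strings without the embedded quotes; a sketch with the same paths and flags:

import subprocess

completed = subprocess.run([r'c:\Progra~1\Inkscape\Inkscape.exe',
                            '-z',
                            '-f', r'N:\pdf_skunkworks\inflation-report-may-2018-page0.pdf',
                            '-l', r'N:\pdf_skunkworks\inflation-report-may-2018-page0.svg'])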
I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os

temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1

command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list = [os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
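With subprocess.run() that might look like this (a sketch; capture_output needs Python 3.7+):

import os
import subprocess

result = subprocess.run(['tail', '-qn1'] + os.listdir('.'),
                        capture_output=True, text=True, check=True)
output = result.stdout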
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
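A minimal sketch of that approach (the 1024-byte guess is an assumption; it must be at least as long as the longest possible last line, and empty files are not handled):

import os

def last_line(filename, guess=1024):
    # Seek to near the end of the file and read only the tail
    with open(filename, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - guess))
        # The chunk may begin mid-line; the last element is the full last line
        return handle.read().splitlines()[-1].decode('utf-8', 'replace')

for filename in os.listdir('.'):
    if os.path.isfile(filename):
        print(last_line(filename))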
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the first command.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
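A quick illustration of the difference (the file names here are just examples):

>>> glob.glob("/home/demo/*")
['/home/demo/file.txt', ...]
>>> os.listdir("/home/demo")
['file.txt', ...]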
The ls -l /home/demo/ | wc -l command does not give the correct value either, as ls -l prints a "total X" line at the top (the total disk blocks used), so the count is off by one.
You could likely use a loop without much issue:
files = [f for f in os.listdir('.') if os.path.isfile(f)]

for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
    print('file: {0}\n{1}\n'.format(f, last))

(The with block closes the file automatically, so no explicit fh.close() is needed.)
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional, although for text "utf-8" usually works; if it's a combination of binary/text/etc., then something such as "iso-8859-1" should usually work.
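For instance, a hedged sketch of a fallback (continuing the readlines() loop above) for when the bytes are not valid UTF-8:

raw = fh.readlines()[-1]
try:
    last = raw.decode('utf-8')
except UnicodeDecodeError:
    # iso-8859-1 maps every byte to a character, so this never raises
    last = raw.decode('iso-8859-1')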
You are not able to store the file names because os.system does not return output the way you expect it to. For more information see this.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes shell commands as-is; to capture the output of these shell commands you have to use the Python subprocess module.
Note: In your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.
This is baffling me.
I have a python script that does some work on a Windows platform to generate an XML file, download some images and then call an external console application to generate a video using the xml and images the python script has generated.
The application I call with Popen is supposed to return a status, i.e. [success], [invalid] or [fail], dependent on how it interprets the data I pass it.
If I use my generation script to generate the information and then call the console application separately in another script it works fine and I get success and a video is generated.
Successful code (please ignore the test prints!):
print ("running console app...")
cmd = '"C:/Program Files/PropertyVideos/propertyvideos.console.exe" -xml data/feed2820712.xml -mpeg -q normal'
print (cmd)
p = subprocess.Popen(cmd , stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output = p.communicate()[0]
print ("\n----\n[" + output + "]\n----\n")
if output == "[success]":
print "\nHURRAHHHHH!!!!!!!"
print ("finished...")
But if I include the same code at the end of the script that generates the info to feed the console application, it runs for about 2 seconds and output = [].
Same code, just ran at the end of a different script...
EDIT:
Thanks to Dave, Dgrant and ThomasK, it seems that the generation script is not closing the file, as redirecting stderr to stdout shows:
Unhandled Exception: System.IO.IOException: The process cannot access the file 'C:\videos\data\feed2820712.xml' because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
However I AM closing the file:
Extract from the generation script:
xmlfileObj.write('</FeedSettings>\n') # write the last line
xmlfileObj.close

# sleep to allow files to close
time.sleep(10)

# NOW GENERATE THE VIDEO
print ("initialising video...")
cmd = '"C:/Program Files/PropertyVideos/propertyvideos.console.exe" -xml data/feed2820712.xml -mpeg -q normal'
print (cmd)
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = p.communicate()[0]
print ("\n----\n[" + output + "]\n----\n")
if output == "[success]":
    print "\nHURRAHHHHH!!!!!!!"
print ("finished...")
Any help would be appreciated.
You're not closing the file. Your code says:
xmlfileObj.close
when it should be
xmlfileObj.close()
Edit: Just to clarify: the code xmlfileObj.close is a valid Python expression which returns a reference to the built-in close method of a file (or file-like) object. Since it is a valid expression it is perfectly legal code, but it has no side effects. Specifically, it does not actually call the close() method. You need to include the parentheses to do that.
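Incidentally, a with block sidesteps this class of bug entirely, since the file is closed for you when the block exits; a sketch using the file from the question:

with open('data/feed2820712.xml', 'a') as xmlfileObj:
    xmlfileObj.write('</FeedSettings>\n')  # write the last line
# xmlfileObj is guaranteed closed here, even if an exception occurred,
# so the console app can open the file without a sharing violation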
I have a script which connects to a database and gets all records which satisfy the query. These records are files present on a server, so now I have a text file which has all the file names in it.
I want a script which would know:
What is the size of each file in the output.txt file?
What is the total size of all the files present in that text file?
Update:
I would like to know how can I achieve my task using Perl programming language, any inputs would be highly appreciated.
Note: I do not have any specific language constraint; it could be either Perl or Python, as long as I can run it from the Unix prompt. Currently I am using the bash shell and have an sh and a py script. How can this be done?
My scripts:
#!/usr/bin/ksh
export ORACLE_HOME=database specific details
export PATH=$ORACLE_HOME/bin:path information
sqlplus database server information<<EOF
SET HEADING OFF
SET ECHO OFF
SET PAGESIZE 0
SET LINESIZE 1000
SPOOL output.txt
select * from my table_name;
SPOOL OFF
EOF
I know du -h would be the command I should be using, but I am not sure how my script should look. I have tried something in Python; I am totally new to it, and this is my first attempt.
Here it is:
import os

folderpath = 'folder_path'
file = open('output file which has all listing of query result', 'r')
for line in file:
    filename = line.strip()
    filename = filename.replace(' ', '\ ')
    fullpath = folderpath + filename
    # print (fullpath)
    os.system('du -h ' + fullpath)
File names in the output text file for example are like: 007_009_Bond Is Here_009_Yippie.doc
Any guidance would be highly appreciated.
Update:
How can I move all the files which are present in the output.txt file to some other folder location using Perl?
After doing step 1, how can I delete all the files which are present in the output.txt file?
Any suggestions would be highly appreciated.
In Perl, the -s filetest operator is probably what you want.
use strict;
use warnings;
use File::Copy;

my $folderpath = 'the_path';
my $destination = 'path/to/destination/directory';

open my $IN, '<', 'path/to/infile';

my $total;
while (<$IN>) {
    chomp;
    my $size = -s "$folderpath/$_";
    print "$_ => $size\n";
    $total += $size;
    move("$folderpath/$_", "$destination/$_") or die "Error when moving: $!";
}
print "Total => $total\n";
Note that -s gives size in bytes not blocks like du.
On further investigation, perl's -s is equivalent to du -b. You should probably read the man pages on your specific du to make sure that you are actually measuring what you intend to measure.
If you really want the du values, change the assignment to $size above to:
my ($size) = split(' ', `du "$folderpath/$_"`);
Eyeballing, you can make YOUR script work this way:
1) Delete the line filename=filename.replace(' ', '\ '). Escaping is more complicated than that; you should just quote the full path, or use a Python library to escape it based on the specific OS;
2) You are probably missing a delimiter between the path and the file name;
3) You need single quotes around the full path in the call to os.system.
This works for me:
#!/usr/bin/python
import os
folderpath='/Users/andrew/bin'
file=open('ft.txt','r')
for line in file:
filename=line.strip()
fullpath=folderpath+"/"+filename
os.system('du -h '+"'"+fullpath+"'")
The file "ft.txt" has file names with no path and the path part is '/Users/andrew/bin'. Some of the files have names that would need to be escaped, but that is taken care of with the single quotes around the file name.
That will run du -h on each file in the .txt file, but does not give you the total. This is fairly easy in Perl or Python.
Here is a Python script (based on yours) to do that:
#!/usr/bin/python
import os

folderpath = '/Users/andrew/bin/testdir'
file = open('/Users/andrew/bin/testdir/ft.txt', 'r')
blocks = 0
i = 0
template = '%d total files in %d blocks using %d KB\n'
for line in file:
    i += 1
    filename = line.strip()
    fullpath = folderpath + "/" + filename
    if os.path.exists(fullpath):
        info = os.stat(fullpath)
        blocks += info.st_blocks
        print `info.st_blocks` + "\t" + fullpath
    else:
        print '"' + fullpath + "'" + " not found"
print `blocks` + "\tTotal"
print " " + template % (i, blocks, blocks*512/1024)
Notice that you do not have to quote or escape the file name this time; Python does it for you. This calculates file sizes using allocation blocks, the same way that du does. If I run du -ahc against the same files that I have listed in ft.txt I get the same number (well, kinda; du reports it as 25M and I get 24324 KB), but it reports the same number of blocks. (Side note: "blocks" are always assumed to be 512 bytes under Unix, even though the actual block size on larger discs is always larger.)
Finally, you may want to consider making your script so that it can read a command line group of files rather than hard coding the file and the path in the script. Consider:
#!/usr/bin/python
import os, sys

total_blocks = 0
total_files = 0
template = '%d total files in %d blocks using %d KB\n'
print
for arg in sys.argv[1:]:
    print "processing: " + arg
    blocks = 0
    i = 0
    file = open(arg, 'r')
    for line in file:
        abspath = os.path.abspath(arg)
        folderpath = os.path.dirname(abspath)
        i += 1
        filename = line.strip()
        fullpath = folderpath + "/" + filename
        if os.path.exists(fullpath):
            info = os.stat(fullpath)
            blocks += info.st_blocks
            print `info.st_blocks` + "\t" + fullpath
        else:
            print '"' + fullpath + "'" + " not found"
    print "\t" + template % (i, blocks, blocks*512/1024)
    total_blocks += blocks
    total_files += i
print template % (total_files, total_blocks, total_blocks*512/1024)
You can then execute the script (after chmod +x [script_name].py) with ./script.py ft.txt; it will use the directory of each file given on the command line as the assumed path to the files listed inside it. You can process multiple list files as well.
You can do it in your shell script itself.
You have all the file names in your spooled file output.txt; all you have to add at the end of the existing script is:
xargs -d '\n' du -ch < output.txt
(du reads file names from its arguments, not from stdin, so xargs is needed here; -c adds a grand total at the end, and -d '\n', a GNU xargs option, keeps file names containing spaces intact.) This gives the size of each file and also a total at the end.
You can use the Python skeleton that you've sketched out and add os.path.getsize(fullpath) to get the size of each individual file.
For example, if you wanted a dictionary with the file name and size you could:
sizes = dict((f.strip(), os.path.getsize(os.path.join(folderpath, f.strip()))) for f in file)
(the strip() removes the trailing newline from each line, and the join prepends the folder path, matching the fullpath you build in your loop).
Keep in mind that the result from os.path.getsize(...) is in bytes so you'll have to convert it to get other units if you want.
In general os.path is a key module for manipulating files and paths.
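Putting that together, a minimal sketch along the lines of your script (folderpath and the output file name are assumptions carried over from the question):

import os

folderpath = 'folder_path'
total = 0
with open('output.txt') as listing:
    for line in listing:
        fullpath = os.path.join(folderpath, line.strip())
        size = os.path.getsize(fullpath)
        print('%s => %d bytes' % (fullpath, size))
        total += size
print('Total => %d bytes' % total)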