File Manipulation: Scripting Question - python

I have a script which connects to database and gets all records which statisfy the query. These record results are files present on a server, so now I have a text file which has all file names in it.
I want a script which would know:
What is the size of each file in the output.txt file?
What is the total size of all the files present in that text file?
Update:
I would like to know how can I achieve my task using Perl programming language, any inputs would be highly appreciated.
Note: I do not have any specific language constraint, it could be either Perl or Python scripting language which I can run from the Unix prompt. Currently I am using the bash shell and have sh and py script. How can this be done?
My scripts:
#!/usr/bin/ksh
export ORACLE_HOME=database specific details
export PATH=$ORACLE_HOME/bin:path information
sqlplus database server information<<EOF
SET HEADING OFF
SET ECHO OFF
SET PAGESIZE 0
SET LINESIZE 1000
SPOOL output.txt
select * from my table_name;
SPOOL OFF
EOF
I know du -h would be the command which I should be using but I am not sure how should my script be, I have tried something in python. I am totally new to Python and it's my first time effort.
Here it is:
import os
folderpath='folder_path'
file=open('output file which has all listing of query result','r')
for line in file:
filename=line.strip()
filename=filename.replace(' ', '\ ')
fullpath=folderpath+filename
# print (fullpath)
os.system('du -h '+fullpath)
File names in the output text file for example are like: 007_009_Bond Is Here_009_Yippie.doc
Any guidance would be highly appreciated.
Update:
How can I move all the files which are present in output.txt file to some other folder location using Perl ?
After doing step1, how can I delete all the files which are present in output.txt file ?
Any suggestions would be highly appreciated.

In perl, the -s filetest operator is probaby what you want.
use strict;
use warnings;
use File::Copy;
my $folderpath = 'the_path';
my $destination = 'path/to/destination/directory';
open my $IN, '<', 'path/to/infile';
my $total;
while (<$IN>) {
chomp;
my $size = -s "$folderpath/$_";
print "$_ => $size\n";
$total += $size;
move("$folderpath/$_", "$destination/$_") or die "Error when moving: $!";
}
print "Total => $total\n";
Note that -s gives size in bytes not blocks like du.
On further investigation, perl's -s is equivalent to du -b. You should probably read the man pages on your specific du to make sure that you are actually measuring what you intend to measure.
If you really want the du values, change the assignment to $size above to:
my ($size) = split(' ', `du "$folderpath/$_"`);

Eyeballing, you can make YOUR script work this way:
1) Delete the line filename=filename.replace(' ', '\ ') Escaping is more complicated than that, and you should just quote the full path or use a Python library to escape it based on the specific OS;
2) You are probably missing a delimiter between the path and the file name;
3) You need single quotes around the full path in the call to os.system.
This works for me:
#!/usr/bin/python
import os
folderpath='/Users/andrew/bin'
file=open('ft.txt','r')
for line in file:
filename=line.strip()
fullpath=folderpath+"/"+filename
os.system('du -h '+"'"+fullpath+"'")
The file "ft.txt" has file names with no path and the path part is '/Users/andrew/bin'. Some of the files have names that would need to be escaped, but that is taken care of with the single quotes around the file name.
That will run du -h on each file in the .txt file, but does not give you the total. This is fairly easy in Perl or Python.
Here is a Python script (based on yours) to do that:
#!/usr/bin/python
import os
folderpath='/Users/andrew/bin/testdir'
file=open('/Users/andrew/bin/testdir/ft.txt','r')
blocks=0
i=0
template='%d total files in %d blocks using %d KB\n'
for line in file:
i+=1
filename=line.strip()
fullpath=folderpath+"/"+filename
if(os.path.exists(fullpath)):
info=os.stat(fullpath)
blocks+=info.st_blocks
print `info.st_blocks`+"\t"+fullpath
else:
print '"'+fullpath+"'"+" not found"
print `blocks`+"\tTotal"
print " "+template % (i,blocks,blocks*512/1024)
Notice that you do not have to quote or escape the file name this time; Python does it for you. This calculates file sizes using allocation blocks; the same way that du does it. If I run du -ahc against the same files that I have listed in ft.txt I get the same number (well kinda; du reports it as 25M and I get the report as 24324 KB) but it reports the same number of blocks. (Side note: "blocks" are always assumed to be 512 bytes under Unix even though the actual block size on larger disc is always larger.)
Finally, you may want to consider making your script so that it can read a command line group of files rather than hard coding the file and the path in the script. Consider:
#!/usr/bin/python
import os, sys
total_blocks=0
total_files=0
template='%d total files in %d blocks using %d KB\n'
print
for arg in sys.argv[1:]:
print "processing: "+arg
blocks=0
i=0
file=open(arg,'r')
for line in file:
abspath=os.path.abspath(arg)
folderpath=os.path.dirname(abspath)
i+=1
filename=line.strip()
fullpath=folderpath+"/"+filename
if(os.path.exists(fullpath)):
info=os.stat(fullpath)
blocks+=info.st_blocks
print `info.st_blocks`+"\t"+fullpath
else:
print '"'+fullpath+"'"+" not found"
print "\t"+template % (i,blocks,blocks*512/1024)
total_blocks+=blocks
total_files+=i
print template % (total_files,total_blocks,total_blocks*512/1024)
You can then execute the script (after chmod +x [script_name].py) by ./script.py ft.txt and it will then use the path to the command line file as the assumed path to the files "ft.txt". You can process multiple files as well.

You can do it in your shell script itself.
You have all the files names in your spooled file output.txt, all you have to add at the end of existing script is:
< output.txt du -h
It will give size of each file and also a total at the end.

You can use the Python skeleton that you've sketched out and add os.path.getsize(fullpath) to get the size of individual file.
For example, if you wanted a dictionary with the file name and size you could:
dict((f, os.path.getsize(f)) for f in file)
Keep in mind that the result from os.path.getsize(...) is in bytes so you'll have to convert it to get other units if you want.
In general os.path is a key module for manipulating files and paths.

Related

I am trying to print the last line of every file in a directory using shell command from python script

I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list))
os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
with open (filename, 'r') as handle:
for line in handle:
pass
# print the last one only
print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
os.system returns the exitcode of the command and not the output. Try using subprocess.check_output with shell=True
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by #tripleee) you probably don't want to do this as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the first command.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
The ls -l /home/demo/ | wc -l command is also not the correct value as ls -l will show you "total X" on top mentioning how many total files it found and other info.
You could likely use a loop without much issue:
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
with open(f, 'rb') as fh:
last = fh.readlines()[-1].decode()
print('file: {0}\n{1}\n'.format(f, last))
fh.close()
Output:
file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
for f in files:
print('file: {0}'.format(f))
subprocess.check_call(['tail', '-n', '1', f])
print('\n')
The decode is optional, although for text "utf-8" usually works or if it's a combination of binary/text/etc then maybe something such as "iso-8859-1" usually should work.
you are not able to store file names because os.system does not return output as you expect it to be. For more information see : this.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes linux shell commands as it is. for getting output for these shell commands you have to use python subprocess
Note : In your case you can get file names using either glob module or os.listdir(): see How to list all files of a directory

Using Makefile bash to save the contents of a python file

For those who are curious as to why I'm doing this: I need specific files in a tar ball - no more, no less. I have to write unit tests for make check, but since I'm constrained to having "no more" files, I have to write the check within the make check. In this way, I have to write bash(but I don't want to).
I dislike using bash for unit testing(sorry to all those who like bash. I just dislike it so much that I would rather go with an extremely hacky approach than to write many lines of bash code), so I wrote a python file. I later learned that I have to use bash because of some unknown strict rule. I figured that there was a way to cache the entire content of the python file into a single string in the bash file, so I could take the string literal in bash and write to a python file and then execute it.
I tried the following attempt (in the following script and result, I used another python file that's not unit_test.py, so don't worry if it doesn't actually look like a unit test):
toStr.py:
import re
with open("unit_test.py", 'r+') as f:
s = f.read()
s = s.replace("\n", "\\n")
print(s)
And then I piped the results out using:
python toStr.py > temp.txt
It looked something like:
#!/usr/bin/env python\n\nimport os\nimport sys\n\n#create number of bytes as specified in the args:\nif len(sys.argv) != 3:\n print("We need a correct number of args : 2 [NUM_BYTES][FILE_NAME].")\n exit(1)\nn = -1\ntry:\n n = int(sys.argv[1])\nexcept:\n print("Error casting number : " + sys.argv[1])\n exit(1)\n\nrand_string = os.urandom(n)\n\nwith open(sys.argv[2], 'wb+') as f:\n f.write(rand_string)\n f.flush()\n f.close()\n\n
I tried taking this as a string literal and echoing it into a new file and see whether I could run it as a python file but it failed.
echo '{insert that giant string above here}' > new_unit_test.py
I wanted to take this statement above and copy it into my "bash unit test" file so I can just execute the python file within the bash script.
The resulting file looked exactly like {insert giant string here}. What am I doing wrong in my attempt? Are there other, much easier ways where I can hold a python file as a string literal in a bash script?
the easiest way is to only use double-quotes in your python code, then, in your bash script, wrap all of your python code in one pair of single-quotes, e.g.,
#!/bin/bash
python -c 'import os
import sys
#create number of bytes as specified in the args:
if len(sys.argv) != 3:
print("We need a correct number of args : 2 [NUM_BYTES][FILE_NAME].")
exit(1)
n = -1
try:
n = int(sys.argv[1])
except:
print("Error casting number : " + sys.argv[1])
exit(1)
rand_string = os.urandom(n)
# i changed ""s to ''s below -webb
with open(sys.argv[2], "wb+") as f:
f.write(rand_string)
f.flush()
f.close()'

Escape $ in filename

I have a list of files on my filesystem which I'd like to chmod to 664 via python.
On of the filenames/dirpaths (I am not allowed to change the filename nor dirpaths!!!) is:
/home/media/Music/Ke$ha/song.mp3 (NOTE $ is a literal, not a variable!)
I receive the files in a list: ['/some/path/file1', '/some/otherpath/file2', etc...]
If I try to run the following code:
files = ['/home/media/Music/Ke$ha/song.mp3']
for file in files:
os.chmod(file, 0664)
It complains that it cannot find /home/media/Music/Ke$ha/song.mp3. Most likely (I guess) because the called shell tries to expand $ha, which is obviously wrong.
The 'Ke$ha' file is just an example, there are many more files with escape characters in it (e.g. /home/media/Music/Hill's fire/song.mp3)
The question I have is: How can I elegantly convince python and/or the shell to handle these files properly?
Kind regards,
Robert Nagtegaal.
You can do like this
files=["/home/media/Music/Ke$ha/song.mp3", "/home/media/Music/Hill's fire/song.mp3"]
import os,re
os.system("chmod 777 " + re.escape(files[i]))
How about this, a raw string? Also is your username 'media'?
files = [r'/home/media/Music/Ke$ha/song.mp3']

Python: Write stdout to log file; output is hexadecimal not ascii

I'm working on a script to call an executable for i/o files. I'm using subprocess and trying to shell out the exe and the stdout to a log file. Problem is I would like to output a simple ascii file and I'm getting a hexadecimal file. Just really learning to program python (or any language for that matter) so, I'm assuming there some type of formatting I can do but I just don't get it. I've done a fair bit of searching on this site and others but I haven't anything like what I'm using subprocess for. The "outRadcorr" is what I need help on the most...Any ideas? More code on request.
Import system modules
import os, sys, string, traceback, time, datetime
import params
from subprocess import Popen, PIPE, STDOUT
...some code here.....
Write stdout to log file
rad_log_file = open(dsFolder + '\\radcorr.log', 'w')
# loop through the files in raw file list to run radiometric correction
for rawfiles in rawFolderList:
# Define the file base
rawBase = rawfiles.split(".")[0]
print ('\nProcessing file: %s \n')%( rawBase )
# define variables from raw file to process radcorr
radFile = rawfiles
pixFile = ('%s.pix')%( rawBase )
attFile = ('%s.att')%( rawBase )
# create windose bat file function
def rad_bat_writer( radcorr_bat ):
with open(dsFolder + '\\radcorr.bat', 'a') as rad_bat_file:
rad_bat_file.write(radcorr_bat + '\n')
# grab radcor input/output files and run radcorr.exe
radcorr = ('C:\\itres\\bin\\radcorr.exe -f 1 -j 100 -g 50 -s -1 -n -1 %s %s %s -i '+ rawFolder + '\%s,rb -o ' + radFolder + '\%s -a ' \
+ radFolder + '\%s -c C:\\itres\\rad_cal_files\\%s -I 0 -v 0 -r Y -R Y -^^ 2') %( sum,scatter,shift,radFile,pixFile,attFile,rad_prefix )
# print out radcorr command
print radcorr
# Execute radcorr and write stdout
outRadcorr = Popen("{};".format(radcorr), stdout=PIPE, stderr=STDOUT)
for line in outRadcorr.stdout:
sys.stdout.write(line)
rad_log_file.write(line)
# write output to log
rad_bat_writer( radcorr )
# Close out exe and log files
outRadcorr.wait()
rad_log_file.close()
I was using UltraEdit to view the file, which you can view hex files in. I was not looking in hex mode. I might be confusing the terminology here. File looks normal in NotePad/NotePad++/WordPad etc. As I ran the script in debug mode, I could hit the loop file by file. First two files (1GB/file), the output log looked fine. Once the radcorr.log file was over 10kb, I change from a normal ascii output to this binary file viewed that looked hex. I can't post images yet, but just google ultraedit hexadecimal.
Still not sure why it moved to this format. Finally, output size was 45kb. I change to view/edit hex mode in UltraEdit and it looks fine. Just wanted to get it out there to see if others had any ideas why when I specified the log to be 'w' and not 'wb', for instance.
I do appreciate all you help. #J.F. Sebastian I'll have to test the code you posted, probably help fix potential bugs down the road.

running a system command in a python script

I have been going through "A byte of Python" to learn the syntax and methods etc...
I have just started with a simple backup script (straight from the book):
#!/usr/bin/python
# Filename: backup_ver1.py
import os
import time
# 1. The files and directories to be backed up are specified in a list.
source = ['"C:\\My Documents"', 'C:\\Code']
# Notice we had to use double quotes inside the string for names with spaces in it.
# 2. The backup must be stored in a main backup directory
target_dir = 'E:\\Backup' # Remember to change this to what you will be using
# 3. The files are backed up into a zip file.
# 4. The name of the zip archive is the current date and time
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'
# 5. We use the zip command to put the files in a zip archive
zip_command = "zip -qr {0} {1}".format(target, ' '.join(source))
# Run the backup
if os.system(zip_command) == 0:
print('Successful backup to', target)
else:
print('Backup FAILED')
Right, it fails. If I run the zip command in the terminal it works fine. I think it fails because the zip_command is never actually run. And I don't know how to run it.
Simply typing out zip_command does not work. (I am using python 3.1)
Are you sure that the Python script is seeing the same environment you have access to when you enter the command manually in the shell? It could be that zip isn't on the path when Python launches the command.
It would help us if you could format your code as code; select the code parts, and click on the "Code Sample" button in the editor toolbar. The icon looks like "101/010" and if you hold the mouse pointer over it, the yellow "tool tip" box says "Code Sample <pre></pre> Ctrl+K"
I just tried it, and if you paste code in to the StackOverflow editor, lines with '#' will be bold. So the bold lines are comments. So far so good.
Your strings seem to contain backslash characters. You will need to double each backslash, like so:
target_dir = 'E:\\Backup'
This is because Python treats the backslash specially. It introduces a "backslash escape", which lets you put a quote inside a quoted string:
single_quote = '\''
You could also use a Python "raw string", which has much simpler rules for a backslash. A raw string is introduced by r" or r' and terminated by " or ' respectively. examples:
# both of these are legal
target_dir = r"E:\Backup"
target_dir = r'E:\Backup'
The next step I recommend is to modify your script to print the command string, and just look at the string and see if it seems correct.
Another thing you can try is to make a batch file that prints out the environment variables, and have Python run that, and see what the environment looks like. Especially PATH.
Here is a suggested example:
set
echo Trying to run zip...
zip
Put those in a batch file called C:\mytest.cmd, and then have your Python code run it:
result_code = os.system("C:\\mytest.cmd")
print('Result of running mytest was code', result_code)
If it works, you will see the environment variables printed out, then it will echo "Trying to run zip...", then if zip runs it will print a message with the version number of zip and how to run it.
zip command only work in linux not for windows.. thats why it make an error..

Categories