In an Anaconda shell environment on Windows with perl, m2-base and maybe some other packages installed:
$ echo "/" > junk
$ more junk
"/"
$ perl -pi.bak -e "s/\"\/\"/\"\\\\\"/" junk
$ more junk junk.bak
::::::::::::::
junk
::::::::::::::
"\"
::::::::::::::
junk.bak
::::::::::::::
"/"
I want to replicate this in Python. My script is this:
import subprocess
cmd = 'perl -pi.bak -e "s/\"\/\"/\"\\\\\"/" junk'
subprocess.call(cmd, shell = True)
which gives the following output:
$python test_perl.py
Substitution replacement not terminated at -e line 1.
I have tried different combinations of backslashes, different quotation styles, and using a different delimiter in perl (i.e. replacing the / with something like #), but can't seem to figure out how to crack this nut.
UPDATE
subprocess.call(['perl', '-pi.bak', '-e', "s!\\\"\/\"!\"\\\\\\\"!", 'junk'], shell = True) works, but I'm confused about why subprocess does not need extra quotes to encapsulate the perl switch statement. Any insights would be appreciated.
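For comparison, a minimal sketch of the same call without shell=True: passing the argument list straight to perl means no shell quoting layer is involved at all, only Perl's own escaping (assuming perl here is the m2-base one on PATH):
import subprocess

# Hedged sketch: without shell=True each list element reaches perl verbatim,
# so only Perl-level escaping is needed: s!"/"!"\\"! turns "/" into "\".
subprocess.call(['perl', '-pi.bak', '-e', r's!"/"!"\\"!', 'junk'])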
--
For more info on what I'm actually doing, I am trying to install a Python module that was designed for Linux/Unix on my Windows platform in an Anaconda environment. For one part, I need to replace "/" with "\" in some of the files of the module. I am aware that I could edit the files directly and use something like os.path.split instead of just split("/") but I am trying to create a file that does all the work, so all one needs to do is clone the git repository and run a setup script.
The following Python demo code emulates the perl -i.bak ... behaviour.
The problem description does not explain why the OP resorts to Perl for a simple substitution while preserving a .bak backup copy.
Python has enough muscle to perform such an operation in just a few lines of code.
import os

ext_bak = '.bak'
file_in = 'path_substitute.txt'
file_bak = file_in + ext_bak

# remove backup file if it exists
if os.path.exists(file_bak):
    os.remove(file_bak)

# rename original file to backup
os.rename(file_in, file_bak)

f = open(file_bak, 'r')  # read from backup file
o = open(file_in, 'w')   # write to a file with the original name

for line in f:
    o.write(line.replace('/', '\\'))  # replace / with \ and write

# close files
f.close()
o.close()
Input path_substitute.txt
some path /opt/pkg/dir_1/file_1 word_1
other path /opt/pkg/dir_2/file_2 word_2
one more /opt/pkg/dir_3/file_3 word_3
Output path_substitute.txt
some path \opt\pkg\dir_1\file_1 word_1
other path \opt\pkg\dir_2\file_2 word_2
one more \opt\pkg\dir_3\file_3 word_3
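For completeness, the standard library's fileinput module can emulate perl -pi.bak even more directly; a minimal sketch:
import fileinput

# inplace=True redirects print() into the file being read;
# backup='.bak' keeps the original, just like perl -pi.bak.
for line in fileinput.input('path_substitute.txt', inplace=True, backup='.bak'):
    print(line.replace('/', '\\'), end='')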
Related
I'm migrating a codebase from 2.7 to 3.6, and would like to ensure that all files that use print do the __future__ import.
How do I find/grep/ack recursively through a multi-package codebase to find all files that use print but don't have the from __future__ import print_function?
I know that 2to3 should handle this automatically, but I've seen one instance where there is a print expression of the form print("updated database: ", db_name) in a file that does not include the print-function import. Running 2to3-2.7 on this file transforms the line to print(("updated database: ", db_name)), which changes the output. I would like to find all instances where this problem might arise, in order to fix them before running the automated tool.
If you don't mind doing this in Python itself:
import os
for folder, subfolder, files in os.walk('/my/project/dir'):
    scripts = [f for f in files if f.endswith('.py')]
    for script in scripts:
        path = os.path.join(folder, script)
        with open(path, 'r') as file:
            text = file.read()
        if "print" in text and "print_function" not in text:
            print("future print import not found in file", path)
Two egreps (works with Mac egrep and GNU egrep; untested with others) --
#!/bin/bash
fileswithprint=` egrep --files-with-matches --include '*.py' --recursive -w print $* `
# -R: Follow all symbolic links, unlike -r
# too long: see xargs
egrep --files-without-match '^from __future__ .*print_function' $fileswithprint
I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list = [os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
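A sketch of the same call with subprocess.run() (Python 3.5+), which returns a CompletedProcess object whose stdout you can capture:
import os
import subprocess

# check=True raises CalledProcessError if tail exits nonzero
result = subprocess.run(['tail', '-qn1'] + os.listdir('.'),
                        stdout=subprocess.PIPE, check=True)
output = result.stdout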
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
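A hedged sketch of that seek-based approach, assuming no line is longer than the chunk read back from the end:
def last_line(filename, chunk=4096):
    # Read only the final `chunk` bytes instead of the whole file.
    with open(filename, 'rb') as handle:
        handle.seek(0, 2)                     # 2 = seek relative to end of file
        size = handle.tell()
        handle.seek(max(0, size - chunk))
        lines = handle.read().splitlines()
        return lines[-1].decode() if lines else ''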
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get messy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the count your first command was computing.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath (/home/demo/file.txt) while os.listdir will just give you the filename (file.txt).
The ls -l /home/demo/ | wc -l count is also off, because ls -l prints a "total X" summary line (a block count, not a file count) at the top of its output.
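A sketch that counts only regular files, which is what the ls -l | wc -l pipeline was trying to approximate:
import os

demo = '/home/demo'
names = [f for f in os.listdir(demo)
         if os.path.isfile(os.path.join(demo, f))]   # skip subdirectories
no_of_files = len(names)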
You could likely use a loop without much issue:
import os

files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
    print('file: {0}\n{1}\n'.format(f, last))
    # no explicit fh.close() needed; the with block closes the file
Output:
file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
import subprocess

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional, although for text "utf-8" usually works or if it's a combination of binary/text/etc then maybe something such as "iso-8859-1" usually should work.
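For example, a small sketch of defensive decoding when the encoding is uncertain:
raw = b'caf\xe9'                                  # bytes in an unknown encoding
print(raw.decode('utf-8', errors='replace'))      # survives bad bytes: 'caf\ufffd'
print(raw.decode('iso-8859-1'))                   # latin-1 maps every byte, never fails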
You are not able to store the file names because os.system does not return the command's output; it returns the command's exit status.
From the docs:
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes the shell command as-is; to capture the output of a shell command you have to use the Python subprocess module.
Note: In your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.
I'd like to execute the following UNIX command in Python:
cd 2017-02-10; pwd; echo missing > 123.txt
The date directory DATE = 2017-02-10 and OUT = 123.txt are already variables in Python so I have tried variations of
call("cd", DATE, "; pwd; echo missing > ", OUT)
using the subprocess.call function, but I'm struggling to find documentation on running multiple UNIX commands at once, normally separated with ; or redirected with >.
Doing the commands on separate lines in Python doesn't work either, because each call "forgets" what was executed on the previous line and essentially resets.
You can pass a shell script as a single argument, with strings to be substituted as out-of-band arguments, as follows:
import subprocess

date = '2017-02-10'
out = '123.txt'
subprocess.call(
    ['cd "$1"; pwd; echo missing >"$2"',  # shell script to run
     '_',    # $0 for that script
     date,   # $1 for that script
     out,    # $2 for that script
    ], shell=True)
This is much more secure than substituting your date and out values into a string which is evaluated by the shell as code, because these values are treated as literals: A date of $(rm -rf ~) will not in fact try to delete your home directory. :)
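If you ever do have to interpolate values into a shell string instead, shlex.quote (Python 3.3+) is the usual way to neutralize metacharacters; a hedged sketch:
import shlex
import subprocess

date = '2017-02-10'
out = '123.txt'
# shlex.quote() wraps each value so any shell metacharacters in it stay literal
cmd = 'cd {}; pwd; echo missing > {}'.format(shlex.quote(date), shlex.quote(out))
subprocess.call(cmd, shell=True)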
Doing the commands on separate lines in Python doesn't work either because it "forgets" what was executed on the previous line and essentially resets.
This is because if you have separate calls to subprocess.call it will run each command in its own shell, and the cd call has no effect on the later shells.
One way around that would be to change the directory in the Python script itself before doing the rest. Whether or not this is a good idea depends on what the rest of the script does. Do you really need to change directory? Why not just write "missing" to 2017-02-10/123.txt from Python directly? Why do you need the pwd call?
Assuming you're looping through a list of directories and want to output the full path of each and also create files with "missing" in them, you could perhaps do this instead:
import os

base = "/path/to/parent"
for DATE, OUT in [["2017-02-10", "123.txt"], ["2017-02-11", "456.txt"]]:
    date_dir = os.path.join(base, DATE)
    print(date_dir)
    out_path = os.path.join(date_dir, OUT)
    out = open(out_path, "w")
    out.write("missing\n")
    out.flush()
    out.close()
The above could use some error handling in case you don't have permission to write to the file or the directory doesn't exist, but your shell commands don't have any error handling either.
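A sketch of the same loop with minimal error handling added, assuming missing date directories should be created rather than cause a failure:
import os

base = "/path/to/parent"
for DATE, OUT in [["2017-02-10", "123.txt"], ["2017-02-11", "456.txt"]]:
    date_dir = os.path.join(base, DATE)
    try:
        os.makedirs(date_dir, exist_ok=True)   # Python 3.2+: create if absent
        with open(os.path.join(date_dir, OUT), "w") as out_file:
            out_file.write("missing\n")
    except OSError as exc:
        print("skipping {}: {}".format(date_dir, exc))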
>>> date = "2017-02-10"
>>> command = "cd " + date + "; pwd; echo missing > 123.txt"
>>> import os
>>> os.system(command)
I'm trying to call 'sed' from Python and having troubles passing the command line via either subprocess.check_call() or os.system().
I'm on Windows 7, but using the 'sed' from Cygwin (it's in the path).
If I do this from the Cygwin shell, it works fine:
$ sed 's/&nbsp;/\&amp;nbsp;/g' <"C:foobar" >"C:foobar.temp"
In Python, I've got the full pathname I'm working with in "name". I tried:
command = r"sed 's/ /\ /g' " + "<" '\"' + name + '\" >' '\"' + name + '.temp' + '\"'
subprocess.check_call(command, shell=True)
All the concatenation is there to make sure I have double quotes around the input and output filenames (in case there are blank spaces in the Windows file path).
I also tried it replacing the last line with:
os.system(command)
Either way, I get this error:
sed: -e expression #1, char 2: unterminated `s' command
'amp' is not recognized as an internal or external command,
operable program or batch file.
'nbsp' is not recognized as an internal or external command,
operable program or batch file.
Yet, as I said, it works OK from the console. What am I doing wrong?
The shell used by subprocess is probably not the shell you want. You can specify the shell with executable='path/to/executable'. Different shells have different quoting rules.
Even better might be to skip subprocess altogether, and write this as pure Python:
with open("c:foobar") as f_in:
with open("c:foobar.temp", "w") as f_out:
for line in f_in:
f_out.write(line.replace(' ', ' '))
I agree with Ned Batchelder's assessment, but you might want to consider the following code instead, because it likely does what you ultimately want to accomplish and is easy with the help of Python's fileinput module:
import fileinput

f = fileinput.input('C:foobar', inplace=1)
for line in f:
    line = line.replace('&nbsp;', '&amp;nbsp;')
    print line,
f.close()
print 'done'
This will effectively update the given file in place as use of the keyword suggests. There's also an optional backup= keyword -- not used above -- which will save a copy of the original file if desired.
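For instance, a sketch of the backup variant (only the input() call changes):
import fileinput

# backup='.orig' keeps the untouched original as C:foobar.orig
f = fileinput.input('C:foobar', inplace=1, backup='.orig')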
BTW, a word of caution about using something like C:foobar to specify the file name because on Windows it means a file of that name in whatever the current directory is on drive C:, which might not be what you want.
I think you'll find that, in Windows Python, it's not actually using the CygWin shell to run your command, it's instead using cmd.exe.
And, cmd doesn't play well with single quotes the way bash does.
You only have to do the following to confirm that:
c:\pax> echo hello >hello.txt
c:\pax> type "hello.txt"
hello
c:\pax> type 'hello.txt'
The system cannot find the file specified.
I think the best idea would be to use Python itself to process the file. The Python language is a cross-platform one which is meant to remove all those platform-specific inconsistencies, such as the one you've just found.
I have a script which connects to a database and gets all records which satisfy the query. These records are files present on a server, so now I have a text file which has all the file names in it.
I want a script which would know:
What is the size of each file in the output.txt file?
What is the total size of all the files present in that text file?
Update:
I would like to know how can I achieve my task using Perl programming language, any inputs would be highly appreciated.
Note: I do not have any specific language constraint; it could be either Perl or Python, whichever I can run from the Unix prompt. Currently I am using the bash shell and have sh and py scripts. How can this be done?
My scripts:
#!/usr/bin/ksh
export ORACLE_HOME=database specific details
export PATH=$ORACLE_HOME/bin:path information
sqlplus database server information<<EOF
SET HEADING OFF
SET ECHO OFF
SET PAGESIZE 0
SET LINESIZE 1000
SPOOL output.txt
select * from my table_name;
SPOOL OFF
EOF
I know du -h would be the command I should be using, but I am not sure how my script should look. I have tried something in Python; I am totally new to it and this is my first attempt.
Here it is:
import os

folderpath = 'folder_path'
file = open('output file which has all listing of query result', 'r')
for line in file:
    filename = line.strip()
    filename = filename.replace(' ', '\ ')
    fullpath = folderpath + filename
    # print (fullpath)
    os.system('du -h ' + fullpath)
File names in the output text file for example are like: 007_009_Bond Is Here_009_Yippie.doc
Any guidance would be highly appreciated.
Update:
How can I move all the files which are present in the output.txt file to some other folder location using Perl?
After doing step 1, how can I delete all the files which are present in the output.txt file?
Any suggestions would be highly appreciated.
In Perl, the -s filetest operator is probably what you want.
use strict;
use warnings;
use File::Copy;

my $folderpath  = 'the_path';
my $destination = 'path/to/destination/directory';
open my $IN, '<', 'path/to/infile';

my $total;
while (<$IN>) {
    chomp;
    my $size = -s "$folderpath/$_";
    print "$_ => $size\n";
    $total += $size;
    move("$folderpath/$_", "$destination/$_") or die "Error when moving: $!";
}
print "Total => $total\n";
print "Total => $total\n";
Note that -s gives size in bytes not blocks like du.
On further investigation, perl's -s is equivalent to du -b. You should probably read the man pages on your specific du to make sure that you are actually measuring what you intend to measure.
If you really want the du values, change the assignment to $size above to:
my ($size) = split(' ', `du "$folderpath/$_"`);
Eyeballing, you can make YOUR script work this way:
1) Delete the line filename=filename.replace(' ', '\ '). Escaping is more complicated than that, and you should just quote the full path or use a Python library to escape it based on the specific OS;
2) You are probably missing a delimiter between the path and the file name;
3) You need single quotes around the full path in the call to os.system.
This works for me:
#!/usr/bin/python
import os

folderpath = '/Users/andrew/bin'
file = open('ft.txt', 'r')
for line in file:
    filename = line.strip()
    fullpath = folderpath + "/" + filename
    os.system('du -h ' + "'" + fullpath + "'")
The file "ft.txt" has file names with no path and the path part is '/Users/andrew/bin'. Some of the files have names that would need to be escaped, but that is taken care of with the single quotes around the file name.
That will run du -h on each file in the .txt file, but does not give you the total. This is fairly easy in Perl or Python.
Here is a Python script (based on yours) to do that:
#!/usr/bin/python
import os

folderpath = '/Users/andrew/bin/testdir'
file = open('/Users/andrew/bin/testdir/ft.txt', 'r')
blocks = 0
i = 0
template = '%d total files in %d blocks using %d KB\n'
for line in file:
    i += 1
    filename = line.strip()
    fullpath = folderpath + "/" + filename
    if os.path.exists(fullpath):
        info = os.stat(fullpath)
        blocks += info.st_blocks
        print `info.st_blocks` + "\t" + fullpath
    else:
        print '"' + fullpath + '"' + " not found"
print `blocks` + "\tTotal"
print " " + template % (i, blocks, blocks*512/1024)
Notice that you do not have to quote or escape the file name this time; Python does it for you. This calculates file sizes using allocation blocks; the same way that du does it. If I run du -ahc against the same files that I have listed in ft.txt I get the same number (well kinda; du reports it as 25M and I get the report as 24324 KB) but it reports the same number of blocks. (Side note: "blocks" are always assumed to be 512 bytes under Unix even though the actual block size on larger disc is always larger.)
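A sketch of that block arithmetic in isolation, assuming the POSIX convention that st_blocks counts 512-byte units:
import os

info = os.stat('ft.txt')                  # any existing file
kb = info.st_blocks * 512 // 1024         # blocks -> KB, matching du's accounting
print('%d blocks, %d KB' % (info.st_blocks, kb))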
Finally, you may want to consider making your script so that it can read a command line group of files rather than hard coding the file and the path in the script. Consider:
#!/usr/bin/python
import os, sys

total_blocks = 0
total_files = 0
template = '%d total files in %d blocks using %d KB\n'
print
for arg in sys.argv[1:]:
    print "processing: " + arg
    blocks = 0
    i = 0
    file = open(arg, 'r')
    for line in file:
        abspath = os.path.abspath(arg)
        folderpath = os.path.dirname(abspath)
        i += 1
        filename = line.strip()
        fullpath = folderpath + "/" + filename
        if os.path.exists(fullpath):
            info = os.stat(fullpath)
            blocks += info.st_blocks
            print `info.st_blocks` + "\t" + fullpath
        else:
            print '"' + fullpath + '"' + " not found"
    print "\t" + template % (i, blocks, blocks*512/1024)
    total_blocks += blocks
    total_files += i
print template % (total_files, total_blocks, total_blocks*512/1024)
You can then execute the script (after chmod +x [script_name].py) as ./script.py ft.txt; it will use the directory containing each command-line file as the assumed path to the files listed in it. You can process multiple list files as well.
You can do it in your shell script itself.
You have all the file names in your spooled file output.txt, so all you have to add at the end of the existing script is:
xargs -d '\n' du -ch < output.txt
(-d '\n' is GNU xargs syntax and keeps file names containing spaces intact.) The -c flag makes du print the size of each file and also a grand total at the end.
You can use the Python skeleton that you've sketched out and add os.path.getsize(fullpath) to get the size of each individual file.
For example, if you wanted a dictionary mapping file name to size you could:
dict((line.strip(), os.path.getsize(folderpath + '/' + line.strip())) for line in file)
(each line has to be stripped of its trailing newline and joined to the folder path before it names an existing file).
Keep in mind that the result from os.path.getsize(...) is in bytes so you'll have to convert it to get other units if you want.
In general os.path is a key module for manipulating files and paths.
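Putting that together, a minimal sketch that reads output.txt, prints each size, and totals them (folderpath here is hypothetical; substitute the real directory):
import os

folderpath = '/path/to/files'             # hypothetical; use the real directory
total = 0
with open('output.txt') as listing:
    for line in listing:
        path = os.path.join(folderpath, line.strip())
        size = os.path.getsize(path)      # bytes; raises OSError if the file is missing
        print('%s => %d bytes' % (path, size))
        total += size
print('Total => %d bytes' % total)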