How can I write a find and exec command in Python?

I have a shell script that does
find /tmp/test/* -name "*.json" -exec python /python/path {} \;
It looks for all the JSON files in specific directories and runs my other Python script on each one.
How can I do the same thing in a Python script?

I'm not sure if I understood your question, but if you are trying to execute a shell command from a Python script, you can use os.system():
import os
os.system('ls -l')
See the os.system documentation for details.
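Applied to the question's find invocation, that is a one-liner (a sketch assuming the question's paths; note that the backslash in \; has to be escaped inside a Python string):
import os

# Run the original shell command unchanged; os.system passes it to a shell
os.system('find /tmp/test/* -name "*.json" -exec python /python/path {} \\;')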

import glob
import os
import subprocess

for json_file in glob.glob("/home/tmp/*.json"):
    subprocess.Popen(["python", "/path/to/my.py", json_file], env=os.environ).communicate()

If you want to use Python instead of find, start with os.walk (official doc) to get the files. Once you have them (or as you get them), act on them however you like.
From that page:
import os

for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        # act on the file

I guess you want to adjust your other Python file /python/path/script.py, so that you only need this one file:
#!/usr/bin/env python
import sys
import glob
import os

#
# parse the command line arguments
#
for i, n in enumerate(sys.argv):
    # Debug output
    print "arg nr %02i: %s" % (i, n)
    # Store the args in variables
    # (0 is the filename of the script itself)
    if i == 1:
        path = sys.argv[1]     # the 1st arg is the path, like "/tmp/test"
    if i == 2:
        pattern = sys.argv[2]  # a pattern to match, like "'*.json'"

#
# merge path and pattern
# os.path makes it win/linux compatible (/ vs \ ...) and other stuff
#
fqps = os.path.join(path, pattern)

#
# Do something with your files
#
for filename in glob.glob(fqps):
    print filename
    # Do your stuff here with one file
    with open(filename, 'r') as f:  # 'r' = only read from file ('w' = write)
        lines = f.readlines()
    # at this point, the file is closed again!
    for line in lines:
        print line
    # and so on ...
Then you can use the one script like this:
/python/path/script.py /tmp/test/ '*.json'
(Without needing to write python in front, thanks to the very first line, called the shebang. But you need to make the script executable once, using chmod +x /python/path/script.py.)
Of course you can omit the 2nd arg and assign a default value to pattern, or only use one arg in the first place. I did it this way to demonstrate os.path.join() and the quoting of arguments that should not be expanded by bash (compare the effect of using '*.json' and *.json on the printed list of arguments at the start).
The argparse module offers better / more sophisticated ways of handling command line arguments; see the sketch below.
And, as a bonus, using a main() function is advisable as well, to keep an overview if your script gets larger or is used by other Python scripts.
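As a rough illustration, here is a minimal argparse sketch for the same two positional arguments (an assumption about how you might structure it, not the script above rewritten in full; the help strings are made up):
#!/usr/bin/env python
import argparse

parser = argparse.ArgumentParser(description="Process JSON files in a directory.")
parser.add_argument("path", help="directory to search, e.g. /tmp/test")
parser.add_argument("pattern", nargs="?", default="*.json",
                    help="glob pattern to match (default: *.json)")
args = parser.parse_args()
print "path: %s, pattern: %s" % (args.path, args.pattern)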

What you need is
find /tmp/test/* -name "*.json" -exec sh -c "python /python/path {}" \;

Related

Using regular expression in subprocess module

I am trying to automate a particular process using the subprocess module in Python. For example, I have a set of files whose names start with the word plot followed by 8 digits, and I want to copy them using subprocess.run:
copyfiles = subprocess.run(['cp', '-r', 'plot*', 'dest'])
When I run the code, it returns the error cp: 'plot*': No such file or directory.
How can I execute such commands using the subprocess module? If I give a full filename, the above code works without any errors.
I have found a useful, though probably not the most efficient, code fragment in this post, which uses the standard-library module shlex. What I propose: use the os.listdir method to iterate over the folder containing the files to copy, save the names in a list file_list, filter it with a lambda function to extract the specific file names, define the command as a string, and use subprocess.Popen() to execute a child process that copies the files to a destination folder.
import shlex
import os
import subprocess

# change directory to where your files are located
os.chdir('C:/example_folder/')

# you can use the os.getcwd function to check the current working directory
print(os.getcwd())

# extract file names
file_list = os.listdir()
file_list = list(filter(lambda file_name: file_name.startswith('plot'), file_list))

# command template
cmd = 'find test_folder -iname %s -exec cp {} dest_folder ;'
for file in file_list:
    subprocess.Popen(shlex.split(cmd % file))
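For comparison, the same copy can also be done with no subprocess at all, which sidesteps the pattern-expansion problem entirely. A minimal sketch, assuming the matches are plain files and the dest folder already exists:
import glob
import shutil

# Expand the pattern in Python instead of relying on a shell
for path in glob.glob('plot*'):
    shutil.copy(path, 'dest')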

How to split a full file path into a path and a file name without an extension

How do I split a full file path into a path and a file name without an extension?
I'm looking for any files with the extension .conf:
find /path -name .conf
/path/file1.conf
/path/smth/file2.conf
/path/smth/file3.conf
/path/smth/smth1/.conf
...
/path/smt//*.conf
I need the output as one string (without the extension .conf):
/path;file1|path/smth;file2;file3|...
What's the best way to do it?
I was thinking of a solution: save the output of find to a file and process it in a loop, but maybe there is a more effective way.
Sorry for any mistakes, I'm a newbie.
Thanks for your feedback, guys!
Since you mentioned .conf, does this help?
kent$ basename -s .conf '/path/smth/file2.conf'
file2
kent$ dirname '/path/smth/file2.conf'
/path/smth
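The Python equivalents of those two calls, for reference (a small sketch using only os.path):
import os

path = '/path/smth/file2.conf'
print(os.path.dirname(path))                        # /path/smth
print(os.path.splitext(os.path.basename(path))[0])  # file2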
To do this in Bash:
find /path/ -type f -name "*.conf"
Note that if you want to do this in a Bash script, you can store /path/ in a variable, for instance one named directory, and change the command like so:
find "$directory" -type f -name "*.conf"
To do this in Python:
import os

PATH = '/path/'
test_files = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames
              if os.path.splitext(f)[1] == '.conf']
There are some other ways to do this in Python as well.
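For instance, a pathlib variant that yields the directory and the extension-less name directly (a sketch, assuming Python 3.4+):
from pathlib import Path

# p.parent is the directory, p.stem is the file name without its extension
for p in Path('/path').rglob('*.conf'):
    print('%s;%s' % (p.parent, p.stem))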
bash parameter parsing is easy, fast, and lightweight.
for fp in /path/file1.conf /path/smth/file2.conf /path/smth/file3.conf; do
    p="${fp%/*}"   # % strips the pattern from the end (minimal, non-greedy)
    f="${fp##*/}"  # ## strips the pattern from the beginning (max-match, greedy)
    f="${f%.*}"    # end-strip the already path-cleaned filename to remove the extension
    echo "$p, $f"
done
/path, file1
/path/smth, file2
/path/smth, file3
To get what you apparently want as your formatting -
declare -A paths                       # associative array
while read -r fp; do
    p=${fp%/*} f=${fp##*/};            # preparse path and filename
    paths[$p]="${paths[$p]};${f%.*}";  # p as key, stacked/delimited val
done < file
Then stack/delimit your datasets.
for p in "${!paths[#]}"; do printf "%s|" "$p${paths[$p]}"; done; echo
/path;file1|/path/smth;file2;file3|
For each key, print key/val and a delimiter. echo at end for a newline.
If you don't want the trailing pipe, assign it all to one var in the second loop instead of printing it out, and trim the trailing pipe at the end.
$: for p in "${!paths[@]}"; do out="$out$p${paths[$p]}|"; done; echo "${out%|}"
/path;file1|/path/smth;file2;file3
Some folk will tell you not to use bash for anything this complex. Be aware that it can lead to ugly maintenance, especially if the people maintaining it behind you aren't bash experts and can't be bothered to go RTFM.
If you actually needed that embedded space in your example then your rules are inconsistent and you'll have to explain them.
If you have the file paths in a list, you can do this using a dictionary, with the path as key and the filenames as value:
aa = ['/path/file1.conf', '/path/smth/file2.conf', '/path/smth/file3.conf']
f = {}
for x in aa:
    temp = x[:-len(".conf")].split("/")
    filename = temp[-1]
    path = "/".join(temp[:-1])
    if path in f:
        f[path] = f[path] + ";" + filename
    else:
        f[path] = filename
result = ""
for x in f:
    result = result + str(x) + ";" + f[x] + "|"
print(result)

Bash command doesn't run properly in Python

I have the following files in a directory
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Using ls Co-sqp* filters so that the output is
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
However, in a python script, I used
cmd = ["ls", self.prefix+".pdos_atm*wfc*"]
output = subprocess.Popen(cmd,stdout=subprocess.PIPE,shell=True).communicate()[0]
print(output)
return output.splitlines()
and the output contains both files
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
What am I doing wrong in the python code that causes the script to not filter the ls output correctly?
To expand on IanAuld's comment, I would like to offer you two solutions that solve your problem without relying on calling a subprocess. Using subprocesses is kind of clunky, and for finding files Python offers several powerful and more Pythonic options.
I created a folder named files containing two files with the names you described and a python script in the parent folder:
find_files.py
files/
    Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
    Copilot
Now there are several options to do this:
import glob
import os

# a glob solution
# you can use wildcards etc. like in the command line
print(glob.glob("files/Co-sq*"))

# an os solution
# you can walk through all files and only print/keep those that start with your desired string
for root, dirs, files in os.walk("files/"):
    for file in files:
        if file.startswith("Co-sq"):
            print(file)
I would prefer the glob solution for finding files, because it is quite flexible and easy to use. As the name suggests, glob allows you to use glob patterns as you know them from the command line.
Your command isn't bash, it's ls. There's no shell involved to expand your filename pattern self.prefix+".pdos_atm*wfc*", so ls gets it as a literal argument, as if you'd entered (in a shell):
ls 'Co-sqp-C70.pdos_atm*wfc*'
You have at least these options:
Invoke a shell to expand the pattern:
cmd = [ "sh", "-c", "ls " + self.prefix + ".pdos_atm*wfc*"]
Equivalently:
cmd = "ls " + self.prefix + ".pdos_atm*wfc*"
This is risky if self.prefix isn't guaranteed to be shell-clean.
Expand the glob, and pass the results to ls:
import glob
cmd = ["ls"] + glob.glob(self.prefix+".pdos_atm*wfc*")
You're still using ls, which is not intended to give parseable output. Don't do any more than simply passing the output to a user (e.g. in a log file).
Expand the glob, and process the results in python:
import glob

for file in glob.glob(self.prefix + ".pdos_atm*wfc*"):
    some_function(file)
You should do this if you want to examine the file entries in any way.

Apply function to all files in directory

I have written a function for data processing and need to apply it to a large amount of files in a directory.
The function works when applied to individual files.
def getfwhm(x):
    import numpy as np
    st = np.std(x[:,7])
    fwhm = 2*np.sqrt(2*np.log(2))*st
    file = open('all_fwhm2.txt', 'at')
    file.write("fwhm = %.6f\n" % (fwhm))
    file.close()
    file = open('all_fwhm2.txt', 'rt')
    print file.read()
    file.close()
I now want to use this on a larger scale. So far I have written this code:
import os
import fwhmfunction
files=os.listdir(".")
fwhmfunction.getfwhm(files)
But I get the following error
File "fwhmfunction.py", line 11, in getfwhm
st=np.std(x[:,7])
TypeError: list indices must be integers, not tuple
I am writing in python using spyder.
Thanks for your help!
In the spirit of Unix, you should separate the program into two parts:
the program which acts on the given file
the program which applies a given script to a given list of files (glob, or whatever)
So here's a sample of the first:
# name: script.py
import sys
File = sys.argv[1]
# do something here
print File
(it is better to use argparse to parse the args, but we used argv to keep it simple)
As to the second part, there is an excellent unix tool already:
$ find . -maxdepth 1 -mindepth 1 -name "*.txt" | parallel python2.7 script.py {}
Here you get an extra bonus: parallel task execution.
If you're on Windows, you can write something simple (sequential) in Python:
# name: apply_script_to_glob.py
import sys, os
from glob import glob

Script = sys.argv[1]
Glob = sys.argv[2]
Files = glob(Glob)
for File in Files:
    os.system("python2.7 " + Script + " " + File)
(again, we didn't use argparse or check anything, to keep it simple). You'd call the script with:
$ python2.7 apply_script_to_glob.py "script.py" "*.txt"
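Tying this back to the original getfwhm function: the error came from passing the whole list of filenames to it instead of one data array at a time. A minimal sketch of the per-file driver, assuming each data file is a whitespace-separated numeric table with at least 8 columns (so np.loadtxt can read it):
# name: run_fwhm.py
import glob
import numpy as np
import fwhmfunction

for fname in glob.glob("*.txt"):
    data = np.loadtxt(fname)      # one file -> one 2-D numeric array
    fwhmfunction.getfwhm(data)    # apply the function to that array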

Bulk renaming of files based on lookup

I have a folder full of image files such as
1500000704_full.jpg
1500000705_full.jpg
1500000711_full.jpg
1500000712_full.jpg
1500000714_full.jpg
1500000744_full.jpg
1500000745_full.jpg
1500000802_full.jpg
1500000803_full.jpg
I need to rename the files based on a lookup from a text file which has entries such as,
SH103239 1500000704
SH103240 1500000705
SH103241 1500000711
SH103242 1500000712
SH103243 1500000714
SH103244 1500000744
SH103245 1500000745
SH103252 1500000802
SH103253 1500000803
SH103254 1500000804
So, I want the image files to be renamed,
SH103239_full.jpg
SH103240_full.jpg
SH103241_full.jpg
SH103242_full.jpg
SH103243_full.jpg
SH103244_full.jpg
SH103245_full.jpg
SH103252_full.jpg
SH103253_full.jpg
SH103254_full.jpg
How can I do this job the easiest way? Can anyone write me a quick command or script that does this for me, please? I have a lot of these image files and manual renaming isn't feasible.
I am on Ubuntu, but depending on the tool I can switch to Windows if need be. Ideally I would love to have it in a bash script so that I can learn more, or in simple Perl or Python.
Thanks
EDIT: Had to Change the file names
Here's a simple Python 2 script to do the rename.
#!/usr/bin/env python
import os

# A dict with keys being the old filenames and values being the new filenames
mapping = {}

# Read through the mapping file line-by-line and populate 'mapping'
with open('mapping.txt') as mapping_file:
    for line in mapping_file:
        # Split the line along whitespace
        # Note: this fails if your filenames have whitespace
        new_name, old_name = line.split()
        mapping[old_name] = new_name

suffix = '_full'

# List the files in the current directory
for filename in os.listdir('.'):
    root, extension = os.path.splitext(filename)
    if not root.endswith(suffix):
        # File doesn't end with this suffix; ignore it
        continue
    # Strip off the number of characters that make up suffix
    stripped_root = root[:-len(suffix)]
    if stripped_root in mapping:
        os.rename(filename, mapping[stripped_root] + suffix + extension)
Various bits of the script are hard-coded that really shouldn't be. These include the name of the mapping file (mapping.txt) and the filename suffix (_full). These could presumably be passed in as arguments and interpreted using sys.argv.
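For instance, a hypothetical two-argument variant of that idea:
import sys

# Fall back to the hard-coded defaults when no arguments are given
mapping_filename = sys.argv[1] if len(sys.argv) > 1 else 'mapping.txt'
suffix = sys.argv[2] if len(sys.argv) > 2 else '_full'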
This will work for your problem:
#!/usr/bin/perl
while (<DATA>) {
    my($new, $old) = split;
    rename("${old}_full.jpg", "${new}_full.jpg")
        || die "can't rename ${old}_full.jpg to ${new}_full.jpg: $!";
}
__END__
__END__
SH103239 1500000704
SH103240 1500000705
SH103241 1500000711
SH103242 1500000712
SH103243 1500000714
SH103244 1500000744
SH103245 1500000745
SH103252 1500000802
SH103253 1500000803
SH103254 1500000804
Switch to ARGV from DATA to read the lines from a particular input file.
Normally for mass rename operations, I use something more like this:
#!/usr/bin/perl
# rename script by Larry Wall
#
# eg:
# rename 's/\.orig$//' *.orig
# rename 'y/A-Z/a-z/ unless /^Make/' *
# rename '$_ .= ".bad"' *.f
# rename 'print "$_: "; s/foo/bar/ if <STDIN> =~ /^y/i' *
# find /tmp -name '*~' -print | rename 's/^(.+)~$/.#$1/'
($op = shift) || die "Usage: rename expr [files]\n";
chomp(@ARGV = <STDIN>) unless @ARGV;
for (@ARGV) {
    $was = $_;
    eval $op;
    die if $@;  # means eval 'failed'
    rename($was, $_) unless $was eq $_;
}
I’ve a more full-featured version, but that should suffice.
#!/bin/bash
for FILE in *_full.jpg; do
    OLD=${FILE%_full.jpg}  # Strip off the "_full.jpg" suffix.
    NEW=$(awk -v OLD="$OLD" '$2==OLD {print $1}' map.txt)
    mv "$FILE" "${NEW}_full.jpg"
done
A rewrite of Wesley's using generators:
import os

# Build old -> new mapping (each line is "NEWNAME OLDNAME", hence the reversal)
with open('mapping.txt') as mapping_file:
    mapping = dict(line.split()[::-1] for line in mapping_file)

suffix = '_full'
# (filename, root, extension) triples for everything in the current directory
rootextiter = ((filename,) + os.path.splitext(filename) for filename in os.listdir('.'))
# (oldname, newname) pairs for the files that have a mapping entry
mappediter = (
    (filename, mapping[root[:-len(suffix)]] + suffix + extension)
    for filename, root, extension in rootextiter
    if root.endswith(suffix) and root[:-len(suffix)] in mapping
)
for oldname, newname in mappediter:
    os.rename(oldname, newname)
This is very straightforward to do in Bash, assuming that there's a lookup entry for each file and a file for each lookup entry.
#!/bin/bash
while read -r to from
do
    if [ -e "${from}_full.jpg" ]
    then
        mv "${from}_full.jpg" "${to}_full.jpg"
    fi
done < lookupfile.txt
If the lookup file has many more entries than there are files then this approach may be inefficient. If the reverse is true then an approach that iterates over the files may be inefficient. However, if the numbers are close then this may be the best approach since it doesn't have to actually do any lookups.
If you'd prefer a lookup version that's pure-Bash:
#!/bin/bash
declare -A lookup  # associative arrays must be declared
while read -r to from
do
    lookup[$from]=$to
done < lookupfile.txt

for file in *_full.jpg
do
    base=${file%_full.jpg}
    mv "$file" "${lookup[$base]}_full.jpg"
done
I modified Wesley's code to work for my specific situation. I had a mapping file, sort.txt, that consisted of different .pdf files and numbers indicating the order I want them in, based on output from DOM manipulation of a website. I wanted to combine all these separate PDF files into a single PDF while retaining the order they are in on the website, so I wanted to prepend numbers according to their tree location in the navigation menu.
1054 spellchecking.pdf
1055 using-macros-in-the-editor.pdf
1056 binding-macros-with-keyboard-shortcuts.pdf
1057 editing-macros.pdf
1058 etc........
Here is the Code I came up with:
import os

# A dict with keys being the old filenames and values being the new filenames
mapping = {}

# Read through the mapping file line-by-line and populate 'mapping'
with open('sort.txt') as mapping_file:
    for line in mapping_file:
        # Split the line along whitespace
        # Note: this fails if your filenames have whitespace
        new_name, old_name = line.split()
        mapping[old_name] = new_name

# List the files in the current directory
for filename in os.listdir('.'):
    root, extension = os.path.splitext(filename)
    # rename: put the number first to allow sorting by name,
    # then append the original filename + extension
    if filename in mapping:
        print "yay"  # to make coding fun
        os.rename(filename, mapping[filename] + root + extension)
I didn't have a suffix like _full, so I didn't need that code. Other than that it's the same code. I've never really touched Python, so this was a good learning experience for me.
Read in the text file and create a hash keyed by the current file name, so that files['1500000704'] = 'SH103239' and so on. Then go through the files in the current directory, grab the new filename from the hash, and rename the file.
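A minimal sketch of that recipe in Python (the lookup file name lookup.txt is hypothetical; files without a lookup entry are left alone):
import os

# Build the hash: old number -> new name
files = {}
with open('lookup.txt') as fh:
    for line in fh:
        new, old = line.split()
        files[old] = new

# Rename each image according to the hash
for fname in os.listdir('.'):
    if fname.endswith('_full.jpg'):
        old = fname[:-len('_full.jpg')]
        if old in files:
            os.rename(fname, files[old] + '_full.jpg')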
Here's a fun little hack:
paste -d " " lookupfile.txt lookupfile.txt | cut -d " " -f 2,3 | sed "s/\([ ]\|$\)/_full.jpg /g;s/^/mv /" | sh
import os, re

mapping = <Insert your mapping here>  # dictionary of key/value entries (lookup)
for k, v in mapping.items():
    for f in os.listdir("."):
        if re.match('1500', f):  # executes code on specific files
            os.rename(f, f.replace(k, v))