I have a python script to convert json files to csv. It currently converts all files, but I want it to run only on those json files that have not been converted to csv already. All are in the same directory. How can I modify my code:
#!/bin/bash
# file: foo.sh
for f in *.json; do
    python ~/bin/convert.py "$f" "-fcsv"
done
Assuming your script creates basename.csv for an input file named basename.json
for f in *.json; do
    test -e "${f%.json}.csv" && continue
    python ~/bin/convert.py "$f" "-fcsv"
done
The shell parameter expansion ${variable%pattern} produces the value of variable with the shortest suffix matching the glob pattern removed.
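If you'd rather keep the skip logic in Python itself, here is a minimal sketch of the same idea (assuming convert.py accepts the same filename and -fcsv arguments as above):

import glob
import os
import subprocess

for f in glob.glob('*.json'):
    # skip files that already have a matching .csv next to them
    if os.path.isfile(os.path.splitext(f)[0] + '.csv'):
        continue
    subprocess.call(['python', os.path.expanduser('~/bin/convert.py'), f, '-fcsv'])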
This leverages GNU find and uses Python for the rest. Since you are already running a Python script, I am assuming that Python is installed on the system.
Full command below:
find . \( -type f -regextype posix-extended -regex '.*\.json' \) -exec python -c "import sys, os; filename, file_extension = os.path.splitext(sys.argv[1]); os.path.isfile(filename + '.csv') or os.system('python ~/bin/convert.py ' + filename + file_extension + ' -fcsv')" {} \;
searches for files with the .json extension:
find . \( -type f -regextype posix-extended -regex '.*\.json' \)
find passes each matched file as the {} argument; the inline Python reads it via sys.argv[1] and splits it into name and extension:
filename, file_extension = os.path.splitext(sys.argv[1])
checks whether a .csv with the same basename already exists; if not, runs the convert.py program (an if statement cannot follow semicolons in a python -c one-liner, so the short-circuiting or is used instead):
os.path.isfile(filename + '.csv') or os.system('python ~/bin/convert.py ' + filename + file_extension + ' -fcsv')
You can put the entire command into a bash script and use \ to break it up over multiple lines, as shown here: How can I split a shell command over multiple lines when using an IF statement?
In an Anaconda shell environment on Windows with perl, m2-base and maybe some other packages installed:
$ echo "/" > junk
$ more junk
"/"
$ perl -pi.bak -e "s/\"\/\"/\"\\\\\"/" junk
$ more junk junk.bak
::::::::::::::
junk
::::::::::::::
"\"
::::::::::::::
junk.bak
::::::::::::::
"/"
I want to replicate this in Python. My script is this:
import subprocess
cmd = 'perl -pi.bak -e "s/\"\/\"/\"\\\\\"/" junk'
subprocess.call(cmd, shell = True)
which gives the following output:
$ python test_perl.py
Substitution replacement not terminated at -e line 1.
I have tried different combinations of backslashes, different quotation styles, and using a different delimiter in perl (i.e. replacing the / with something like #), but can't seem to figure out how to crack this nut.
UPDATE
subprocess.call(['perl', '-pi.bak', '-e', "s!\\\"\/\"!\"\\\\\\\"!", 'junk'], shell = True) works, but I'm confused about why subprocess does not need extra quotes to encapsulate the perl switch statement. Any insights would be appreciated.
--
For more info on what I'm actually doing, I am trying to install a Python module that was designed for Linux/Unix on my Windows platform in an Anaconda environment. For one part, I need to replace "/" with "\" in some of the files of the module. I am aware that I could edit the files directly and use something like os.path.split instead of just split("/") but I am trying to create a file that does all the work, so all one needs to do is clone the git repository and run a setup script.
The following Python demo code emulates the perl -i.bak ... behaviour.
The problem description does not explain why the OP resorts to Perl for a simple substitution while keeping a .bak file as a backup copy.
Python has enough muscle to perform such an operation in just a few lines of code.
import os

ext_bak = '.bak'
file_in = 'path_substitute.txt'
file_bak = file_in + ext_bak

# remove backup file if it exists
if os.path.exists(file_bak):
    os.remove(file_bak)

# rename original file to backup
os.rename(file_in, file_bak)

f = open(file_bak, 'r')  # read from backup file
o = open(file_in, 'w')   # write to a file with the original name

for line in f:
    o.write(line.replace('/', '\\'))  # replace / with \ and write

# close files
f.close()
o.close()
Input path_substitute.txt
some path /opt/pkg/dir_1/file_1 word_1
other path /opt/pkg/dir_2/file_2 word_2
one more /opt/pkg/dir_3/file_3 word_3
Output path_substitute.txt
some path \opt\pkg\dir_1\file_1 word_1
other path \opt\pkg\dir_2\file_2 word_2
one more \opt\pkg\dir_3\file_3 word_3
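For comparison, Python's fileinput module can emulate the perl -pi.bak behaviour even more directly; a minimal sketch (inplace=True rewrites the file in place while backup='.bak' keeps the original):

import fileinput

# each line printed to stdout replaces the corresponding line in the file
for line in fileinput.input('path_substitute.txt', inplace=True, backup='.bak'):
    print(line.replace('/', '\\'), end='')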
I would like to process .wav files in Python. In particular, I would like to perform the following operation
sox input.wav -c 1 -r 16000 output.wav
in every .wav file in my folder. My code is below:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import glob
import subprocess
segments= []
for filename in glob.glob('*.wav'):
    new_filename = "converted_" + filename
    subprocess.call("sox" + filename + "-c 1 -r 16000" + new_filename, shell=True)
However, it is not working as expected: it's not calling my command.
When you write
subprocess.call("sox" + filename + "-c 1 -r 16000" + new_filename, shell=True)
what's actually going to be executed for an exemplary TEST.WAV file looks like this:
soxTEST.WAV-c 1 -r 16000converted_TEST.WAV
So you're missing the spaces in between. A nice solution using Python's f-strings (Formatted string literals) would be something like this:
subprocess.call(f"sox {filename} -c 1 -r 16000 {new_filename}", shell=True)
However, I'd recommend switching over to subprocess.run and dropping the shell=True flag:
subprocess.run(["sox", filename, "-c 1", "-r 16000", new_filename])
More information is available in the docs: https://docs.python.org/3/library/subprocess.html
Note: Read the Security Considerations section before using shell=True.
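If you do find yourself stuck with shell=True (for example, because you need shell globbing or pipes), a common precaution is to quote every interpolated filename with shlex.quote; a sketch, assuming the same filename and new_filename variables as above:

import shlex
import subprocess

# shlex.quote prevents spaces or shell metacharacters in filenames
# from being interpreted by the shell
cmd = f"sox {shlex.quote(filename)} -c 1 -r 16000 {shlex.quote(new_filename)}"
subprocess.run(cmd, shell=True)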
I'm migrating a codebase from 2.7 to 3.6, and would like to ensure that all files that use print do the __future__ import.
How do I find/grep/ack recursively through a multi-package codebase to find all files that use print but don't have the from __future__ import print_function?
I know that 2to3 should handle this automatically, but I've seen one instance where there is a print expression of the form print("updated database: ", db_name) in a file that does not include the print-function import. Running 2to3-2.7 on this file transforms the line to print(("updated database: ", db_name)), which changes the output. I would like to find all instances where this problem might arise in order to fix them before running the automated tool.
If you don't mind doing this in Python itself:
import os

for folder, subfolder, files in os.walk('/my/project/dir'):
    scripts = [f for f in files if f.endswith('.py')]
    for script in scripts:
        path = os.path.join(folder, script)
        with open(path, 'r') as file:
            text = file.read()
        if "print" in text and "print_function" not in text:
            print("future print import not found in file", path)
Two egreps (works with macOS egrep and GNU egrep; others untested):
#!/bin/bash
fileswithprint=$( egrep --files-with-matches --include '*.py' --recursive -w print "$@" )
# -R: follow all symbolic links, unlike -r
# if the file list is too long, see xargs
egrep --files-without-match '^from __future__ .*print_function' $fileswithprint
I wrote a program to take the standard deviation of a single set of data. I have over 200 folders, each with its own set of data. I am attempting to write a bash file that will execute this program for all folders (while outputting all of the standard deviations into a master file, as dictated in the Python script).
So far I have:
#!/bin/bash
for D in SAND; do python sample.py
[ -d "$D" -a -x "$D/all" ] && "$D/all"
done
Note: SAND is my directory.
But this does not work. Please help.
In addition, when I try other examples and run them, I keep getting the error:
Traceback (most recent call last):
File "sample.py", line 1, in <module>
f=open("default")
IOError: [Errno 2] No such file or directory: 'default'
even though I DO have the data file of "default" in the folders.
The below assumes that SAND is the literal name of your directory.
First choice: Use a loop.
for d in SAND/*/all; do
    python sample.py "$d"
done
...or, if you need to change into the directory that's found...
orig_dir=$PWD
for d in SAND/*/all; do
    (cd "$d/.." && exec python "$orig_dir/sample.py" all)
done
Second choice: Use find.
I'd suggest searching directly for the targets named all:
find SAND -name all -exec python sample.py '{}' '+'
Alternately, with POSIX find, you can have find invoke a shell to perform more logic:
find SAND -type d -exec bash -c \
'for d; do [[ -d "$d/all" ]] && python sample.py "$d/all"; done' _ '{}' +
If SAND is a variable name, not a literal directory, change SAND in the above to "$SAND", with the quotes (and, ideally, make it lower-case -- by convention, only environment variables and shell builtin variables should be all-caps to avoid namespace conflicts).
Alternatively, you could skip bash altogether and modify your Python script. os.walk() allows you to visit each directory in turn:
import os, sys

for arg in sys.argv[1:] or ['.']:
    for dirpath, _, filenames in os.walk(arg):
        for filename in filenames:
            if filename == 'all':
                all_file = os.path.join(dirpath, filename)
                default_file = os.path.join(dirpath, 'default')
                # ... whatever you do with SAND/foo/all
                # foo = open(all_file)
                # std_dev = bar(foo)
                # ... I'll just print them
                print all_file, default_file
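To make those placeholder comments concrete, here is a hedged sketch of the inner logic, assuming each default file holds one number per line (the master_stdevs.txt name is made up for the example):

import os
import statistics

with open('master_stdevs.txt', 'w') as master:
    for dirpath, _, filenames in os.walk('SAND'):
        if 'default' not in filenames:
            continue
        with open(os.path.join(dirpath, 'default')) as fh:
            values = [float(line) for line in fh if line.strip()]
        # statistics.stdev needs at least two data points
        master.write('{} {}\n'.format(dirpath, statistics.stdev(values)))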
I'm on an Ubuntu platform and have a directory containing many .py files and subdirectories (also containing .py files). I would like to add a line of text to the top of each .py file. What's the easiest way to do that using Perl, Python, or shell script?
find . -name \*.py | xargs sed -i '1i Line of text here'
Edit: from tchrist's comment, handle filenames with spaces.
Assuming you have GNU find and xargs (as you specified the linux tag on the question)
find . -name \*.py -print0 | xargs -0 sed -i '1i Line of text here'
Without GNU tools, you'd do something like:
while IFS= read -r filename; do
    { echo "new line"; cat "$filename"; } > tmpfile && mv tmpfile "$filename"
done < <(find . -name \*.py -print)
for a in `find . -name '*.py'`; do
    cp "$a" "$a.cp"
    echo "Added line" > "$a"
    cat "$a.cp" >> "$a"
    rm "$a.cp"
done
#!/usr/bin/perl

use Tie::File;
for (@ARGV) {
    tie my @array, 'Tie::File', $_ or die $!;
    unshift @array, "A new line";
}
To process all .py files in a directory recursively run this command in your shell:
find . -name '*.py' | xargs perl script.pl
This will:
- recursively walk all directories starting with the current working directory
- modify only those files whose filenames end with '.py'
- preserve file permissions (unlike open(filename, 'w'))
fileinput also gives you the option of backing up your original files before modifying them.
import fileinput
import os
import sys

for root, dirs, files in os.walk('.'):
    for line in fileinput.input(
            (os.path.join(root, name) for name in files if name.endswith('.py')),
            inplace=True,
            # backup='.bak'  # uncomment this if you want backups
    ):
        if fileinput.isfirstline():
            sys.stdout.write('Add line\n{l}'.format(l=line))
        else:
            sys.stdout.write(line)
import os

for root, dirs, files in os.walk(directory):
    for name in files:
        if name.endswith('.py'):
            path = os.path.join(root, name)
            with open(path, 'r') as file_ptr:
                old_content = file_ptr.read()
            with open(path, 'w') as file_ptr:
                file_ptr.write(your_new_line)
                file_ptr.write(old_content)
As far as I know you can't insert at the beginning of a file in Python; you can only rewrite the file or append to it.
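One more variation on the same rewrite idea, using a single handle opened in 'r+' mode (a sketch; the whole file still gets rewritten, it just avoids a second open call):

with open(path, 'r+') as file_ptr:
    old_content = file_ptr.read()
    file_ptr.seek(0)  # jump back to the start of the file
    file_ptr.write(your_new_line + old_content)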
What's the easiest way to do that using Perl, Python, or shell script?
I'd use Perl, but that's because I know Perl much better than I know Python. Heck, maybe I'd do this in Python just to learn it a bit better.
The easiest way is to use the language that you're familiar with and can work with. And that's probably the best way, too.
If these are all Python scripts, I take it you know Python or have access to a bunch of people who know Python. So, you're probably better off doing the project in Python.
However, it's possible with shell scripts too, and if you know shell the best, be my guest. Here's a little, completely untested shell script right off the top of my head:
find . -type f -name "*.py" | while IFS= read -r file
do
    sed '1i\
I want to insert this line
' "$file" > "$file.temp"
    mv "$file.temp" "$file"
done