Using a Python tool on multiple files

There is a tool I wrote in Python that analyzes a PDF file passed on the command line:
c:\python "my_tool.py" -s "my_pdf.pdf"
I want to test the tool on 1000 files. How can I run the tool on all 1000 of them?
I used this:
for /f %%f in ('dir /b C:\Users\Test\Desktop\CVE_2010-2883_PDF_25files') do echo %%f
but how can I specify the tool and the -s argument?

Try it like this:
@echo off
for /f "delims=" %%f in ('dir /a-d /b "C:\Users\Test\Desktop\CVE_2010-2883_PDF_25files\*.pdf"') do (
    c:\python "my_tool.py" -s "C:\Users\Test\Desktop\CVE_2010-2883_PDF_25files\%%f"
)

You can pipe a listing of all the .pdf files to the script with xargs:
ls *.pdf | xargs -n 1 python "my_tool.py" -s
or with this script:
for i in *.pdf
do
    python "my_tool.py" -s "$i"
done

If you have the Unix find command, you can use:
find . -type f -name "*.pdf" -exec c:\python "my_tool.py" -s {} \;
This runs your command on each of the .pdf files in the current directory and its subdirectories.
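If you'd rather stay in Python, a small driver script can do the looping and call the existing tool once per file. A minimal sketch, assuming my_tool.py sits next to the driver and the same interpreter can run both (the folder path is just the one from your question):
import subprocess
import sys
from pathlib import Path

pdf_dir = Path(r"C:\Users\Test\Desktop\CVE_2010-2883_PDF_25files")

for pdf in sorted(pdf_dir.glob("*.pdf")):
    # invoke the existing tool once per file, passing the full path after -s
    subprocess.run([sys.executable, "my_tool.py", "-s", str(pdf)], check=False)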

You can make your life a lot easier by making sure the tool itself can search a directory for all the .pdf files:
import glob
import os

def get_files(directory):
    # hand every .pdf in the directory to the existing analysis routine
    for i in glob.iglob(os.path.join(directory, '*.pdf')):
        do_something(i)

if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', help='enter filename')
    parser.add_argument('-d', '--directory', help='enter directory of pdf files')
    args = parser.parse_args()
    if args.directory:
        get_files(args.directory)
    if args.file:
        do_something(args.file)
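With that entry point merged into my_tool.py, the whole folder can then be processed in a single call, for example:
python my_tool.py -d C:\Users\Test\Desktop\CVE_2010-2883_PDF_25files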


How to share arguments/variables between executable python script and python script?

I have a script (test.py) that I run as a command from the terminal, which calls another script (main.py). I want to store the arguments entered by the user as variables, and then pass them on to the second script.
E.g. when I run 'test -t foo' I want to save 'foo' as 'test=foo', and then when I call 'os.system("python main.py")' at the end of test.py, I want main.py to print 'foo'.
This is what I have so far:
test.py
import os, argparse
parser = argparse.ArgumentParser()
parser.add_argument("-t", "--test", action="store", help="Store argument as variable")
args = parser.parse_args()
#I'm not sure how to save the argument as a variable
os.system("python main.py") #I need to keep this line - please see the comments below
terminal commands
chmod +x test.py
mv test.py test
mkdir -p ~/bin
cp test ~/bin
echo 'export PATH=$PATH":$HOME/bin"' >> ~/.profile
main.py
from __main__ import *  # this does not work
if args.test:
    print(#variable)
In case it is helpful for anyone, I have found a way around the problem:
test.py
import os, argparse
parser = argparse.ArgumentParser()
parser.add_argument("-t", "--test", action="store", type=str, default="None", help="Store argument as variable")
args = parser.parse_args()
with open ("variables.py", 'w') as variables:
variables.writelines(args.test)
os.system("python main.py")
terminal commands
chmod +x test.py
mv test.py test
mkdir -p ~/bin
cp test ~/bin
echo 'export PATH=$PATH":$HOME/bin"' >> ~/.profile
main.py
with open ("variables.py", 'r') as variables:
test = variables.read()
print(test)
It's probably not very pythonic but it does the trick.
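A slightly cleaner alternative (a sketch, not tested against your setup) is to forward the value to main.py as a command-line argument instead of writing a variables.py file:
test.py
import argparse
import subprocess
import sys

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--test", action="store", type=str, default="None",
                    help="Store argument as variable")
args = parser.parse_args()

# pass the parsed value straight to main.py as its first argument
subprocess.run([sys.executable, "main.py", args.test])
main.py
import sys

# read the value forwarded by test.py
test = sys.argv[1] if len(sys.argv) > 1 else "None"
print(test)
This avoids the temporary file and keeps both scripts independent of each other's globals.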

Processing files from a shell script as arguments to a Python script

I am manually running the file from the command line:
python script.py input_files/input.txt text_out/output.json
while inside script.py there is:
input_path = sys.argv[1]
out_path = sys.argv[2]
Now I have a shell script and I want to process all the files in one go, but I am facing an issue.
My shell script is below.
There are two folders: 1) input_files and 2) text_out.
for i in input_files/*.txt;
do name=`echo "$i" | cut -d'.' -f1`
echo "$name"
python script.py -i "$i" text_out/"${name}.json"
done
But when I execute the .sh as stated above, it throws an error because sys.argv is not being picked up properly:
out_path = sys.argv[2]
IndexError: list index out of range
If you can advise what to change in the .py or in the .sh script, that would be kind.
I don't know exactly why you're getting an IndexError, but it doesn't really matter, since you're also passing -i after script.py, so out_path cannot be what you expect.
$ cat script.py
import sys; print(len(sys.argv)); print(sys.argv); print({i:v for i, v in enumerate(sys.argv)})
$ (set -x; i=input_files/foo.txt; name=`echo "$i" | cut -d'.' -f1`; python script.py -i "$i" text_out/"${name}.json")
+ i=input_files/foo.txt
++ echo input_files/foo.txt
++ cut -d. -f1
+ name=input_files/foo
+ python script.py -i input_files/foo.txt text_out/input_files/foo.json
4
['script.py', '-i', 'input_files/foo.txt', 'text_out/input_files/foo.json']
{0: 'script.py', 1: '-i', 2: 'input_files/foo.txt', 3: 'text_out/input_files/foo.json'}
I recommend using argparse whenever you need to deal with CLI arguments in Python. It will give better feedback and reduce the ambiguity of looking directly at indices.
$ cat script2.py
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input-file', type=argparse.FileType('r'))
parser.add_argument('output_file', type=argparse.FileType('w'))
print(parser.parse_args())
$ (set -x; i=input_files/foo.txt; name=`echo "$i" | cut -d'.' -f1`; python script2.py -i "$i" text_out/"${name}.json")
+ i=input_files/foo.txt
++ echo input_files/foo.txt
++ cut -d. -f1
+ name=input_files/foo
+ python script2.py -i input_files/foo.txt text_out/input_files/foo.json
Namespace(input_file=<_io.TextIOWrapper name='input_files/foo.txt' mode='r' encoding='UTF-8'>, output_file=<_io.TextIOWrapper name='text_out/input_files/foo.json' mode='w' encoding='UTF-8'>)
So, for your original script (which reads positional sys.argv values), just drop the -i from the loop:
for i in input_files/*.txt;
do
    name=`echo "$i" | cut -d'.' -f1`
    echo "$name"
    python script.py "$i" text_out/"${name}.json"
done
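If you prefer to keep the looping in Python as well, a small driver (a sketch, assuming script.py takes positional input and output paths as in the question) avoids the shell quoting issues and keeps the output files directly under text_out/:
import subprocess
import sys
from pathlib import Path

out_dir = Path("text_out")
out_dir.mkdir(exist_ok=True)

for txt in sorted(Path("input_files").glob("*.txt")):
    # text_out/foo.json rather than text_out/input_files/foo.json
    out_path = out_dir / (txt.stem + ".json")
    subprocess.run([sys.executable, "script.py", str(txt), str(out_path)], check=True)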
You get the IndexError: list index out of range error because you try to access the list at index 2 for out_path (in your Python script).
But in your shell script you're passing only one argument to the Python script (at this line: python script.py text_out/"${name}.json").
Your first example (python script.py input_files/input.txt text_out/output.json) works, of course, because you're passing two arguments to the Python script. That's why you can access sys.argv at index 1 and 2.
You should check len(sys.argv) to know how many arguments were passed to the Python script.
Example Shell Script
Your shell script should look something like this to get rid of the IndexError:
for i in input_files/*.txt;
do
    name=`echo "$i" | cut -d'.' -f1`
    echo "$name"
    # Pass two args into your Python script
    python script.py "$i" text_out/"${name}.json"
done
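On the Python side, a quick length check before indexing gives a clearer message than the bare IndexError. A minimal sketch of the top of script.py:
import sys

if len(sys.argv) < 3:
    sys.exit("usage: python script.py INPUT_PATH OUTPUT_PATH")

input_path = sys.argv[1]
out_path = sys.argv[2]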

Invalid syntax while concatenating MPEG files on Windows with Python

I am trying to concatenate all the MPEG files into one new file on Windows 7. I adjusted the environment variables and ran the code from the Python shell, but it gives "invalid syntax". Any help? I am new to Python and the ffmpeg library.
My code:
ffmpeg -f concat -i <(for f in glob.glob("*.mpeg"); do echo "file '$PWD/$f'"; done) -c copy output.mpeg
Thanks
Your example code is a mix of Python code and Bash code, so it will run neither in the Python shell nor in a Bash shell :)
On Linux it works in Bash as two commands:
(Windows probably doesn't have the printf command)
printf "file '%s'\n" *.wav > input.txt
ffmpeg -f concat -i input.txt -c copy output.mpeg
Python version which doesn't need Bash:
#!/usr/bin/env python3
import os
import sys
import glob
import subprocess

# get Current Working Directory (CWD)
pwd = os.getcwd()

# get list of files
if len(sys.argv) > 1:
    #filenames = sys.argv[1:]           # Linux (the shell already expands the glob)
    filenames = glob.glob(sys.argv[1])  # Windows (expand the glob ourselves)
else:
    filenames = glob.glob("*.mpg")
#print(filenames)

# generate "input.txt" file
with open("input.txt", "w") as f:
    for name in filenames:
        f.write("file '{}/{}'\n".format(pwd, name))
        #f.write("file '{}'\n".format(name))

# run ffmpeg
subprocess.run('ffmpeg -f concat -i input.txt -c copy output.mpeg', shell=True)
And you can run it with or without an argument, e.g. "*.wav":
python script.py *.wav
(tested only on Linux)
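As a side note on the subprocess.run call above: if you'd rather avoid shell=True, the same ffmpeg invocation can be made with an argument list (a sketch, assuming ffmpeg is on PATH):
import subprocess

# equivalent call without going through the shell
subprocess.run(["ffmpeg", "-f", "concat", "-i", "input.txt", "-c", "copy", "output.mpeg"], check=True)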
printf (and other Bash commands) for Windows: GnuWin32

Use curl to download multiple files

I have to use cURL on Windows from a Python script. My goal is to use the Python script to get all files from a remote directory ... preferably into a local directory. After that I will compare each file with the files stored locally. I am able to get one file at a time, but I need to get all of the files from the remote directory.
Could someone please advise how to get multiple files?
I use this command:
curl.exe -o file1.txt sftp:///dir1/file1.txt -k -u user:password
Thanks.
I haven't tested this, but I think you could just try launching each shell command as a separate process to run them simultaneously. Obviously, this might be a bad idea if you have a large set of files, so you might need to manage that more carefully. Here's some untested code, and you'd need to edit the 'cmd' variable in the get_file function, of course.
from multiprocessing import Process
import subprocess

def get_file(filename):
    cmd = '''curl.exe -o {} sftp:///dir1/{} -k -u user:password'''.format(filename, filename)
    subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)  # run the shell command

files = ['file1.txt', 'file2.txt', 'file3.txt']

if __name__ == '__main__':
    # the __main__ guard is needed on Windows so multiprocessing can safely re-import this module
    for filename in files:
        p = Process(target=get_file, args=(filename,))  # create a process which passes filename to get_file()
        p.start()
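If the file list is large, a bounded pool keeps the number of simultaneous curl processes under control. A sketch along the same lines (the sftp:/// URL, credentials, and file names are the placeholders from the question):
from concurrent.futures import ThreadPoolExecutor
import subprocess

def get_file(filename):
    cmd = 'curl.exe -o {} sftp:///dir1/{} -k -u user:password'.format(filename, filename)
    subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)

files = ['file1.txt', 'file2.txt', 'file3.txt']

# at most 4 downloads run at the same time
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(get_file, files))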

Bash script to find files in folders

I have a couple of folders like this:
Main/
    a/
    b/
    c/
    ...
I have to pass the input files abc1.txt, abc2.txt, ... from each of these folders, respectively, as an input file to my Python program.
The script right now is:
for i in `cat file.list`
do
    echo $i
    cd $i
    # works on the assumption that there is only one .txt file
    inputfile=`ls | grep .txt`
    echo $inputfile
    python2.7 ../getDOC.py $inputfile
    sleep 10
    cd ..
done
echo "Script executed successfully"
I want the script to work correctly regardless of the number of .txt files.
Can anyone let me know if there is a built-in shell command to fetch the right .txt files when there are multiple of them?
The find command is well suited for this with -exec:
find /path/to/Main -type f -name "*.txt" -exec python2.7 ../getDOC.py {} \; -exec sleep 10 \;
Explanation:
find - invoke find
/path/to/Main - The directory to start your search at. By default find searches recursively.
-type f - Only consider files (as opposed to directories, etc)
-name "*.txt" - Only find the files with .txt extension. This is quoted so bash doesn't auto-expand the wildcard * via globbing.
-exec ... \; - For each such result found, run the following command on it:
python2.7 ../getDOC.py {} - the {} is where each search result from find gets substituted.
sleep 10 - sleep for 10 seconds each time the Python script is run on a file. Remove this if you don't want it to sleep.
Better, using globs:
shopt -s globstar nullglob
for i in Main/**/*.txt; do
    python2.7 ../getDOC.py "$i"
    sleep 10
done
This example is recursive and requires bash 4.
find . -name "*.txt" | xargs -n 1 python2.7 ../getDOC.py
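A pure-Python equivalent of the recursive walk is also possible (a sketch, assuming the same relative path to getDOC.py as in the question and that python2.7 is on PATH):
import subprocess
import time
from pathlib import Path

for txt in sorted(Path("Main").rglob("*.txt")):
    # run the existing script on every .txt found anywhere under Main/
    subprocess.run(["python2.7", "../getDOC.py", str(txt)], check=True)
    time.sleep(10)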
