I've got an existing .pyw script (with a GUI) and want to turn it into a batch process.
The script itself converts a PDF to a new PDF (for backup), but it's a bit annoying because I can only process one file at a time.
Here are the inputs:
Input-Filepath
Output-Filepath
Is there a way to pass a folder path so that it converts all the existing files inside?
You could automate it using your favourite shell. I like bash:
The scripts are untested!
A version that is invoked as <thescript> inputfile1 inputfile2 ... inputfileN and writes inputfileN_out.pdf for each input:
for ((i = 1; i <= $#; ++i)); do
inputfile="${!i}"
outputfile="${inputfile}_out.pdf"
<your python file> "$inputfile" "$outputfile"
done
And here is a version that takes a folder path, processes every PDF file found inside it, and writes pdffilename_out.pdf for each:
while IFS= read -r -d '' inputfile; do
outputfile="${inputfile}_out.pdf"
<your python file> "$inputfile" "$outputfile"
done < <(find "$1" -type f -iname '*.pdf' -print0)
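If you'd rather stay in Python, here is a minimal, untested sketch of the same folder-based approach; "convert.py" is only a placeholder for your existing script, and it assumes that script accepts the input and output paths as command-line arguments:

import subprocess
import sys
from pathlib import Path

folder = Path(sys.argv[1])                  # folder passed on the command line

for pdf in sorted(folder.glob("*.pdf")):
    outputfile = pdf.with_name(pdf.stem + "_out.pdf")
    # "convert.py" stands in for your existing conversion script
    subprocess.run([sys.executable, "convert.py", str(pdf), str(outputfile)],
                   check=True)

Call it as: python batch_convert.py /path/to/folder (the wrapper name is arbitrary).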
Related
I need to run a python script for multiple input files and for each one, I want to generate a new corresponding output file (e.g. for input_16jun.txt I want the output file to be 16jun_output.txt). I tried doing something like:
nohup python script.py input_{16..22}jun.txt > {16..22}jun_output.txt &
But I keep getting an "ambiguous redirect" error. Does anyone know how to fix this, or is there a better approach?
Looping over each input file like this with bash should work (the ${f:6:-4} slice needs bash 4.2 or newer):
for f in input_*.txt; do python script.py "$f" > "${f:6:-4}_output.txt"; done
Alternatively if you want to do the loop in a python script.
import glob
import os

input_files = glob.glob("input_*.txt")
for f in input_files:
    stem = f[len("input_"):-len(".txt")]  # "input_16jun.txt" -> "16jun"
    os.system("python script.py {} > {}_output.txt".format(f, stem))
If you want to run script.py in parallel (rather than sequentially) you can also consider using the python multiprocessing package.
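For instance, here is a rough, untested sketch of the same loop run in parallel with multiprocessing.Pool; the file naming mirrors the question's input_.../..._output.txt convention, and subprocess is used in place of os.system:

import glob
import subprocess
from multiprocessing import Pool

def convert(f):
    stem = f[len("input_"):-len(".txt")]        # "input_16jun.txt" -> "16jun"
    with open("{}_output.txt".format(stem), "w") as out:
        # run script.py on one input file, redirecting its stdout
        subprocess.run(["python", "script.py", f], stdout=out, check=True)

if __name__ == "__main__":
    with Pool(4) as pool:                       # 4 worker processes; tune as needed
        pool.map(convert, glob.glob("input_*.txt"))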
I have 3 different files: one Python file and two .bat files. They communicate with each other (hopefully).
When I execute Process_Videos.bat by itself (double-clicking it in Windows Explorer) it works fine, but whenever I call it from the Python file it doesn't work at all; it just says "Press any key to continue..."
I really need to have this structure, calling Process_Videos.bat from a Python file, since I am extracting some web info. The pythonExecute.bat just works as a trigger for the entire process.
I have also tried the subprocess approach, but that is not working either.
The files and respective code:
pythonExecute.bat
python "D:\\tests\\pythonCall.py"
pythonCall.py
import os
os.system('D:\\tests\\3.asc\\Process_Videos_asc.bat')
Process_Videos.bat
@echo off
setlocal EnableDelayedExpansion
set "FolderBaseName=TestName"
set "DropBoxFolder=D:\tests\3.asc\myDropBoxFolder"
set "BaseOutputFolder=D:\tests\3.asc\TEMP"
for %%I in (*.png) do (
set "slaveName=%%~nI"
set "slaveName=!slaveName:~6!
set "OutputFolder=%BaseOutputFolder%_!slaveName!"
echo !slaveName!
md "!OutputFolder!" 2>nul
for %%J in (*.mp4*) do (
ffmpeg -i "%%~fJ" -i "%%~fI" -filter_complex overlay "!OutputFolder!\%%~nJ.mp4"
)
"C:\Program Files\WinRAR\rar.exe" a -cfg- -ep1 -inul -m5 "%DropBoxFolder%\%FolderBaseName%_!slaveName!" "!slaveName:~6!\*"
rd /S /Q "!OutputFolder!"
)
pause
You need to:
a) Invoke your batch file from the directory it is in (e.g. by changing directory first), and
b) Get rid of the pause at the end of the batch file.
You should also consider replacing the batch file altogether; Python can do everything it does much more neatly.
The accepted answer to this SO question gives some very good tips.
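As an untested sketch of point (a): subprocess can run the batch file with its working directory set explicitly, so the *.png / *.mp4 globs inside it resolve correctly. The paths below just mirror the ones in the question, and point (b) still applies (remove the pause, or the call will block):

import subprocess

# run the batch file with its own folder as the working directory
subprocess.run(
    ["cmd", "/c", r"D:\tests\3.asc\Process_Videos_asc.bat"],
    cwd=r"D:\tests\3.asc",
    check=True,          # raise if the batch file exits with an error code
)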
I am trying to execute a python script on all text files in a folder:
for fi in sys.argv[1:]:
And I get the following error
-bash: /usr/bin/python: Argument list too long
The way I call this Python function is the following:
python functionName.py *.txt
The folder has around 9,000 files. Is there some way to run this function without having to split my data across more folders? Splitting the files would not be very practical because I will have to run the function on even more files in the future... Thanks
EDIT: Based on the selected correct reply and the comments of the replier (Charles Duffy), what worked for me is the following:
printf '%s\0' *.txt | xargs -0 python ./functionName.py
because my script doesn't have a valid shebang.
This is an OS-level problem (limit on command line length), and is conventionally solved with an OS-level (or, at least, outside-your-Python-process) solution:
find . -maxdepth 1 -type f -name '*.txt' -exec ./your-python-program '{}' +
...or...
printf '%s\0' *.txt | xargs -0 ./your-python-program
Note that this runs your-python-program once per batch of files found, where the batch size is dependent on the number of names that can fit in ARG_MAX; see the excellent answer by Marcus Müller if this is unsuitable.
No. That is a kernel limitation for the length (in bytes) of a command line.
Typically, you can determine that limit by doing
getconf ARG_MAX
which, at least for me, yields 2097152 (bytes), which means about 2MB.
I recommend using Python to work through the folder yourself, i.e. giving your Python program the ability to work with directories instead of individual files, or to read file names from a file.
The former can easily be done using os.walk(...), whereas the second option is (in my opinion) the more flexible one. Use the argparse module to give your Python program an easy-to-use command-line syntax, then add an argument of a file type (see the reference documentation), and Python will automatically understand the special filename -, meaning you could, instead of
for fi in sys.argv[1:]
do
for fi in opts.file_to_read_filenames_from.read().split(chr(0))
which would even allow you to do something like
find . -iname '*.txt' -type f -print0 | ./my_python_program.py --file-to-read-filenames-from -
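As a concrete, untested sketch of that argparse idea (the option name and the per-file work are placeholders):

import argparse

parser = argparse.ArgumentParser()
# passing '-' on the command line makes argparse hand us sys.stdin
parser.add_argument("--file-to-read-filenames-from",
                    type=argparse.FileType("r"), required=True)
opts = parser.parse_args()

# the names arrive NUL-separated (find ... -print0)
for fi in opts.file_to_read_filenames_from.read().split(chr(0)):
    if not fi:
        continue
    print("processing", fi)        # replace with the real per-file work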
Don't do it this way. Pass the mask to your Python script (e.g. call it as python functionName.py "*.txt") and expand it using glob (https://docs.python.org/2/library/glob.html).
I would think about using the glob module. With this module you invoke your program like:
python functionName.py "*.txt"
then the shell will not expand *.txt into file names. Your Python program will receive *.txt in its argument list, and you can pass it into glob.glob():
for fi in glob.glob(sys.argv[1]):
...
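Put together, functionName.py could look roughly like this (an untested sketch; the per-file work is left as a stub):

import glob
import sys

# the quoted "*.txt" pattern arrives unexpanded in sys.argv[1]
for fi in glob.glob(sys.argv[1]):
    with open(fi) as handle:
        pass                        # process each file here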
I use gsutil in a Linux environment for managing files in GCS. I enjoy being able to use the command
gsutil -m cp -I gs://...
preceded by some other command that feeds the file list to gsutil on stdin for uploading; in doing so, I can maintain a local list of files that have been uploaded, or generate specific patterns to upload and hand them off.
I would like to be able to do a similar command like
gsutil -m rm -I gs://...
to scrub files similarly. Presently, I build a big list of files to remove and run it with the following code:
while read line
do
gsutil rm gs://...
done < "$myfile.txt"
This is extraordinarily slow compared to the multithreaded "gsutil -m rm..." command, and enabling the -m flag has no effect when you have to process files one at a time from a list. I also experimented with just running
gsutil -m rm gs://.../* # remove everything
<my command> | gsutil -m cp -I gs://.../ # put back the pieces that I want
but this involves recopying a lot of data and wastes a lot of time; the data is already there and just needs to have some removed. Any thoughts would be appreciated. Also, I don't have a lot of flexibility on either end with renaming files; otherwise, a quick rename before uploading would handle all of this.
As an interim solution, since we don't have a -I option for rm right now, how about just creating a string of all the objects you want to delete in your loop and then using gsutil -m rm to delete them? You could also do this with a simple Python script that invokes the gsutil command as a separate process.
Expanding on your earlier example, maybe something like the following (disclaimer: my bash-fu isn't the greatest, and I haven't tested this):
objects=''
while read line
do
objects="$objects gs://$line"
done < "$myfile.txt"
gsutil -m rm $objects
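And for the Python route mentioned above, an untested sketch that builds the same list and hands it to a single gsutil -m rm call; the list file name is a placeholder, and like the loop above it assumes the file holds bucket/object paths without the gs:// prefix:

import subprocess

# read the object list built earlier, one name per line
with open("myfile.txt") as listing:
    objects = ["gs://" + line.strip() for line in listing if line.strip()]

if objects:
    # one multithreaded rm instead of one gsutil process per object
    subprocess.run(["gsutil", "-m", "rm"] + objects, check=True)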
For anyone wondering, I wound up doing what Zach Wilt indicated above. For reference, I was removing on the order of a couple thousand files from each of 5 directories, so roughly 10,000 files in total. Doing this without the "-m" switch was taking upwards of 30 minutes; with the "-m" switch, it takes less than 30 seconds. Zoom!
For a more robust example: I am using this to keep Google Cloud Storage files in sync with local files. Each day, I have a program that dumps lots of incremental files, plus a handful that are "rolled up". After a week, the incremental files get scrubbed locally automatically, and the same should happen in GCS to save space. Here's how to do it:
#!/bin/bash
# get the full date strings for touch
start=`date --date='-9 days' +%x`
end=`date --date='-8 days' +%x`
# other vars
mon=`date --date='-9 days' +%b | tr [A-Z] [a-z]`
day=`date --date='-9 days' +%d`
# display start and finish times
echo "Cleaning files from $start"
# update start and finish times
touch --date="$start" /tmp/start1
touch --date="$end" /tmp/end1
# repeat for all servers
for dr in "dir1" "dir2" "dir3" ...
do
# list files in range and build retention file
find /local/path/$dr/ -newer /tmp/start1 ! -newer /tmp/end1 > "$dr-local.txt"
# get list of all files from appropriate folder on GCS
gsutil ls gs://gcs_path/$mon/$dr/$day/ > "$dr-gcs.txt"
# formatting the host list file
sed -i "s|gs://gcs_path/$mon/$dr/$day/|/local/path/$dr/|" "$dr-gcs.txt"
# build sed command file to delete matches
while read line
do
echo "\|$line|d" >> "$dr-del.txt"
done < "$dr-local.txt"
# run command file to strip lines for files that need to remain
sed -f "$dr-del.txt" <"$dr-gcs.txt" >"$dr-out.txt"
# convert local names to GCS names
sed -i "s|/local/path/$dr/|gs://gcs_path/$mon/$dr/$day/|" "$dr-out.txt"
# new variable to hold string
del=""
# convert newline separated file to one long string
while read line
do
del="$del$line "
done < "$dr-out.txt"
# remove all files matching the final output
gsutil -m rm $del
# cleanup files
rm $dr-local.txt
rm $dr-gcs.txt
rm $dr-del.txt
rm $dr-out.txt
done
You'll need to modify this to fit your needs, but it is a concrete and working method for deleting files locally and then synchronizing the change to Google Cloud Storage. Thanks again to Zach Wilt.
youtube-dl is a Python script that allows one to download YouTube videos. It supports an option for batch downloads:
-a FILE, --batch-file=FILE
file containing URLs to download ('-' for stdin)
I want to set up some sort of queue so I can simply append URLs to a file and have youtube-dl process them. Currently, it does not remove entries from the batch file as it downloads them. I see the '-' (stdin) option and don't know if I can use it to my advantage.
In effect, I'd like to run youtube-dl as some form of daemon which will check the queue file and download the URLs it contains.
How can I do this?
Plain tail -f will not work, because the script reads all of its input at once.
It will work if you modify the script to perform a continuous read of the batch file.
Then simply run the script as:
% ./youtube-dl -a batch.txt -c
When you append some data into batch.txt, say:
% echo "http://www.youtube.com/watch?v=j9SgDoypXcI" >>batch.txt
The script will start downloading the video you just appended to the batch.
This is the patch you should apply to the latest version of "youtube-dl":
2278,2286d2277
< while True:
<     batchurls = batchfd.readlines()
<     if not batchurls:
<         time.sleep(1)
<         continue
<     batchurls = [x.strip() for x in batchurls]
<     batchurls = [x for x in batchurls if len(x) > 0]
<     for bb in batchurls:
<         retcode = fd.download([bb])
Hope it helps,
Happy video watching
;)
NOTE: Due to code restructuring, this patch no longer applies. It would be interesting to see whether this could be added to the upstream code.
You might be able to get away with using tail -f to read from your file. It will not exit when it reaches end-of-file but will wait for more data to be appended to the file.
>video.queue # erase and/or create queue file
tail -f video.queue | youtube-dl -a -
Since tail -f does not exit, youtube-dl should continue reading file names from stdin and never exit.
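If you'd rather not rely on youtube-dl's own stdin handling at all, here is an untested sketch of a small Python watcher that tails the queue file itself and launches youtube-dl once per appended URL (it assumes youtube-dl is on your PATH and that the queue file already exists):

import subprocess
import time

QUEUE = "video.queue"

def follow(path):
    # yield lines appended to the file, like tail -f
    with open(path) as fh:
        fh.seek(0, 2)               # start at the current end of the file
        while True:
            line = fh.readline()
            if not line:
                time.sleep(1)
                continue
            yield line.strip()

for url in follow(QUEUE):
    if url:
        subprocess.call(["youtube-dl", url])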