Simple queue for youtube-dl in the Linux shell - python

youtube-dl is a Python script that allows one to download YouTube videos. It supports an option for batch downloads:
-a FILE, --batch-file=FILE
file containing URLs to download ('-' for stdin)
I want to set up some sort of queue so I can simply append URLs to a file and have youtube-dl process them. Currently, it does not remove entries from the batch file. I see the option for '-' (stdin) and don't know if I can use this to my advantage.
In effect, I'd like to run youtube-dl as some form of daemon which will watch the queue file and download the URLs it contains.
How can I do this?

The tail -f approach will not work because the script reads all of its input at once.
It will work if you modify the script to perform a continuous read of the batch file.
Then simply run the script as:
% ./youtube-dl -a batch.txt -c
When you append a URL to batch.txt, say:
% echo "http://www.youtube.com/watch?v=j9SgDoypXcI" >>batch.txt
the script will start downloading the video you just appended to the batch.
This is the patch you should apply to the latest version of "youtube-dl":
2278,2286d2277
<     while True:
<         batchurls = batchfd.readlines()
<         if not batchurls:
<             time.sleep(1)
<             continue
<         batchurls = [x.strip() for x in batchurls]
<         batchurls = [x for x in batchurls if len(x) > 0]
<         for bb in batchurls:
<             retcode = fd.download([bb])
Hope it helps,
Happy video watching
;)
NOTE: Due to code restructuring, this patch will no longer apply. It would be interesting to see whether this could be added to the upstream code.
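In the meantime, you can get the same effect without patching youtube-dl at all: a small wrapper that polls the batch file and hands each new URL to youtube-dl as a separate process. This is only a minimal sketch; the queue file name (batch.txt) and the ./youtube-dl path are assumptions taken from the example above, and the file is expected to already exist:
#!/usr/bin/env python
# queue_watcher.py -- sketch: poll the batch file and pass each new URL to youtube-dl.
import subprocess
import time

QUEUE_FILE = "batch.txt"  # assumed queue file, created beforehand

def watch(queue_file):
    seen = 0  # byte offset of the last URL we handed off
    while True:
        with open(queue_file) as fd:
            fd.seek(seen)
            lines = fd.readlines()
            seen = fd.tell()
        for url in (line.strip() for line in lines):
            if url:
                # -c resumes partial downloads, mirroring the example above
                subprocess.call(["./youtube-dl", "-c", url])
        time.sleep(1)

if __name__ == "__main__":
    watch(QUEUE_FILE)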

You might be able to get away with using tail -f to read from your file. It will not exit when it reaches end-of-file but will wait for more data to be appended to the file.
>video.queue # erase and/or create queue file
tail -f video.queue | youtube-dl -a -
Since tail -f does not exit, youtube-dl should continue reading file names from stdin and never exit.


FFmpeg-split.py can't determine video length

First, I'm not a developer. I'm trying to split a movie into 1-minute clips using the ffmpeg-split.py Python script. I made sure FFmpeg is installed by trying a simple command, and it worked like magic:
ffmpeg -i soccer.mp4 -ss 00:00:00 -codec copy -t 10 soccer1.mp4
A new video file was created in the same folder.
I saved FFmpeg-split.py in the same directory, updated the Python PATH, and typed the following command:
python ffmpeg-split.py -f soccer.mp4 -s 10
what I got back was:
can't determine video length
I believe it just can't find the file. I switched video files and even deleted it and got the same message.
Any ideas?
First time I've seen that name! Since you were able to run ffmpeg from the command line and execute basic Python, I recommend following my example; it should avoid any weird directory or path issues in the given script (which I ignored). So let me set the .py script aside and share the following:
Assuming you ran
ffmpeg -i soccer.mp4 ...stuff... soccer1.mp4
from a Windows command line...
It would be better to write
ffmpeg -t 10 -i "Z:\\full\\input\\path.mp4" -c copy "Z:\\full\\output\\path.mp4"
This says: run ffmpeg; -t is the duration to grab, in seconds; -i introduces the input file; the full input path is quoted because of spaces and the like; -c copy copies all the streams without re-encoding (the magic trick); the full output path is quoted as well to be safe; and nothing else.
"Piping" through python to windows works great for this:
import subprocess

def pegRunner(cmd):  # Takes a list of strings we'll pass to Windows.
    command = [x for x in cmd]  # copy the list ("peg" is short for mpeg)
    result = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, err = result.communicate()
    print result.wait()
    return "pegRannered"

#########
# Find the duration from the file's properties or something. If you need to do this
# often it's more complicated. Let's say you found 4 mins 33 secs.
############
leng = 4*60 + 33           # total time in seconds
last_dur = int(leng % 60)  # remaining time after the four one-minute clips
if last_dur == 0:
    num_vids = int(leng/60)
else:
    num_vids = int(leng/60) + 1

for i in range(num_vids):
    da_command = ['ffmpeg']
    da_command.append('-ss')
    da_command.append(str(i*60))
    da_command.append('-t')
    if i != num_vids - 1:
        da_command.append('60')
    else:
        da_command.append(str(last_dur))
    da_command.append('-i')
    da_command.append('Z:\\full\\input\\path.mp4')  # this format!
    da_command.append('-c')
    da_command.append('copy')
    # optionally, to overwrite existing output: da_command.append('-y')
    da_command.append('Z:\\full\\output\\path\\filename_' + str(i) + '.mp4')
    print pegRunner(da_command)
    print "Finished " + str(i) + " filez."
This should handle the one-minute pieces and provide a good starting place for driving ffmpeg from Python.
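If you need the duration programmatically rather than reading it off the file's properties, ffprobe (installed alongside ffmpeg) can report it. A minimal sketch in the same Python 2 style, assuming ffprobe is on the PATH; the input path is the same placeholder as above:
import subprocess

def get_duration_seconds(path):
    # Ask ffprobe for the container duration as a bare number of seconds.
    out = subprocess.check_output([
        'ffprobe', '-v', 'error',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        path,
    ])
    return float(out.strip())

leng = int(get_duration_seconds('Z:\\full\\input\\path.mp4'))  # feeds the loop above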

Serial Numbers from a Storage Controller over SSH

Background
I'm working on a bash script to pull serial numbers and part numbers from all the devices in a server rack. My goal is to be able to run a single script (inventory.sh) and walk away while it generates text files containing the information I need. I'm using bash for maximum compatibility; the RHEL 6.7 systems do have Perl and Python installed, but with minimal libraries. So far I haven't had to use anything other than bash, but I'm not against calling a Perl or Python script from my bash script.
My Problem
I need to retrieve the serial numbers and part numbers from the drives in a Dot Hill Systems AssuredSAN 3824, as well as the serial numbers of the equipment inside. The only way I have found to get all the information I need is to connect over SSH and run the following three commands, dumping the output to a local file:
show controllers
show frus
show disks
Limitations:
I don't have "sshpass" installed, and would prefer not to install it.
The Controller is not capable of storing SSH keys (no option in its custom shell).
The Controller also cannot write or transfer local files.
The Rack does NOT have access to the Internet.
I looked at paramiko, but while Python is installed I do not have pip.
I also cannot use CPAN.
For what it's worth, the output comes back in XML format. (I've already written the code to parse it in bash.)
Right now I think my best option would be to have a library for Python or Perl in the folder with my other scripts, and write a script to dump the commands' output to files that I can parse with my bash script. Which language is easier to just provide a library in a file? I'm looking for a library that is as small and simple as possible to use. I just need a way to get the output of those commands to XML files. Right now I am just using ssh 3 times in my script and having to enter the password each time.
Have a look at SNMP. There is a reasonable chance that you can use SNMP tools to remotely extract the information you need. The manufacturer should be able to provide you with the MIBs.
I ended up contacting the manufacturer and asking my question. They said that the system isn't set up for connecting without a password, and their SNMP is very basic and won't provide the information I need. They said to connect to the system with FTP and use "get logs " to download an archive of the configuration and logs. Not exactly ideal, as it takes 4 minutes just to run that one command, but it seems to be my only option. Below is the script I wrote to retrieve the file automatically by adding the login credentials to the .netrc file. This works on RHEL 6.7:
#!/bin/bash
#Retrieve the logs and configuration from a Dot Hill Systems AssuredSAN 3824 automatically.
#Modify "LINE" and "HOST" to fit your configuration.
LINE='machine <IP> login manage password <password>'
HOST='<IP>'
AUTOLOGIN="/root/.netrc"
FILE='logfiles.zip'

#Check for and verify the autologin file
if [ -f $AUTOLOGIN ]; then
    printf "Found auto-login file, checking for proper entry... \r"
    READLINE=`cat $AUTOLOGIN | grep "$LINE"`
    #Append the line to the end of .netrc if file exists but not the line.
    if [ "$LINE" != "$READLINE" ]; then
        printf "Proper entry not found, creating it... \r"
        echo "$LINE" >> "$AUTOLOGIN"
    else
        printf "Proper entry found... \r"
    fi
#Create the Autologin file if it doesn't exist
else
    printf "Auto-Login file does not exist, creating it and setting permissions...\r"
    echo "$LINE" > "$AUTOLOGIN"
    chmod 600 "$AUTOLOGIN"
fi

#Start getting the information from the controller. (This takes a VERY long time)
printf "Retrieving Storage Controller data, this will take awhile... \r"
ftp $HOST << SCRIPT
get logs $FILE
SCRIPT
exit 0
This gave me a bunch of files in the zip, but all I needed was the "store_....logs" file. It was about 500,000 lines long: the first portion is the entire configuration in XML format, then the configuration in text format, followed by the logs from the system. I parsed the file and stripped off the logs at the end, which cut it down to 15,000 lines. From there I divided it into two files (config.xml and config.txt). I then pulled the XML output of the 3 commands that I needed and wrote it to the 3 files my previously written script searches for. Now my inventory script pulls in everything it needs, albeit pretty slowly since it waits 4 minutes for the system to generate the zip file. I hope this helps someone in the future.
Edit:
Waiting 4 minutes for the system to compile the logs was taking too long. So I ended up using paramiko and Python scripts to dump the output of the commands to files that my other code can parse. The script accepts the IP of the Controller as a parameter. Here it is for those interested. Thank you again for all the help.
#!/usr/bin/env python
#Saves output of "show disks" from the storage Controller to an XML file.
import paramiko
import sys
import re
import xmltodict

IP = sys.argv[1]
USERNAME = "manage"
PASSWORD = "password"
FILENAME = "./logfiles/disks.xml"
cmd = "show disks"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

try:
    client.connect(IP, username=USERNAME, password=PASSWORD)
    stdin, stdout, stderr = client.exec_command(cmd)
except Exception as e:
    sys.exit(1)

data = ""
for line in stdout:
    if re.search('#', line):
        pass
    else:
        data += line

client.close()

f = open(FILENAME, 'w+')
f.write(data)
f.close()
sys.exit(0)
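Since "show controllers" and "show frus" need the exact same treatment, the script above generalizes easily. Here is a minimal sketch of a parameterized version (controller IP, command, and output file passed as arguments) under the same assumptions about the "manage" credentials; it is untested against the device:
#!/usr/bin/env python
# Sketch: run one controller command over SSH and save its XML output.
# Usage: ./dump_cmd.py <controller-ip> "show frus" ./logfiles/frus.xml
import re
import sys
import paramiko

ip, cmd, filename = sys.argv[1], sys.argv[2], sys.argv[3]

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(ip, username="manage", password="password")
stdin, stdout, stderr = client.exec_command(cmd)

# drop the prompt/echo lines (they contain '#') and keep everything else
data = "".join(line for line in stdout if not re.search('#', line))
client.close()

with open(filename, 'w') as f:
    f.write(data)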

python: unable to find files in recently changed directory (OSx)

I'm automating some tedious shell tasks, mostly file conversions, in a kind of blunt force way with os.system calls (Python 2.7). For some bizarre reason, however, my running interpreter doesn't seem to be able to find the files that I just created.
Example code:
import os, time, glob
# call a node script to template a word document
os.system('node wordcv.js')
# print the resulting document to pdf
os.system('launch -p gowdercv.docx')
# move to the directory that pdfwriter prints to
os.chdir('/users/shared/PDFwriter/pauliglot')
print glob.glob('*.pdf')
I expect to have a length 1 list with the resulting filename, instead I get an empty list.
The same occurs with
pdfs = [file for file in os.listdir('/users/shared/PDFwriter/pauliglot') if file.endswith(".pdf")]
print pdfs
I've checked by hand, and the expected files are actually where they're supposed to be.
Also, I was under the impression that os.system blocked, but just in case it doesn't, I also stuck a time.sleep(1) in there before looking for the files. (That's more than enough time for the other tasks to finish.) Still nothing.
Hmm. Help? Thanks!
You should add a wait after the call to launch. Launch will spawn the task in the background and return before the document is finished printing. You can either put in some arbitrary sleep statements or if you want you can also check for file existence if you know what the expected filename will be.
import time
# print the resulting document to pdf
os.system('launch -p gowdercv.docx')
# give word about 30 seconds to finish printing the document
time.sleep(30)
Alternative:
import time
# print the resulting document to pdf
os.system('launch -p gowdercv.docx')
# wait for a maximum of 90 seconds
for x in xrange(0, 90):
    time.sleep(1)
    if os.path.exists('/path/to/expected/filename'):
        break
Reference for potentially needing a longer than 1 second wait here
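If you don't know the exact output filename, a variation on the same polling idea is to snapshot the directory before printing and wait for a new .pdf to show up. A minimal sketch in the question's Python 2 style, reusing the PDFwriter path and launch command from the question:
import glob
import os
import time

out_dir = '/users/shared/PDFwriter/pauliglot'
before = set(glob.glob(os.path.join(out_dir, '*.pdf')))

os.system('launch -p gowdercv.docx')

# poll for up to 90 seconds until a new PDF appears
new_pdfs = set()
for _ in xrange(90):
    time.sleep(1)
    new_pdfs = set(glob.glob(os.path.join(out_dir, '*.pdf'))) - before
    if new_pdfs:
        break
print list(new_pdfs)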

Turn .pyw Python-script to batch process

I've got an existing .pyw-script (with a GUI) and want to turn it into a batch process.
The script itself converts a PDF to a new PDF (for backup), but it's a bit annoying because I can only process 1 file at a time.
Here are the inputs:
Input-Filepath
Output-Filepath
Is there a way I can pass a folder path and have it convert all the existing files inside?
You could automate it using your favourite console script. I like bash:
The scripts are untested!
A version that expects: <thescript> inputfile1 inputfile2 ... inputfileN and outputs inputfileN_out.pdf
for ((i = 1; i <= $#; ++i)); do
    inputfile="${!i}"
    outputfile="${inputfile}_out.pdf"
    <your python file> "$inputfile" "$outputfile"
done
And here is a version that takes a folder and processes all pdf files found and outputs pdffilename_out.pdf
while read -d$'\0' -r inputfile; do
    outputfile="${inputfile}_out.pdf"
    <your python file> "$inputfile" "$outputfile"
done < <(find "$1" -type f -iname '*.pdf' -print0)
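If you'd rather keep everything in Python, the same batch loop can live in a small wrapper script. This is only a sketch: 'yourscript.pyw' is a placeholder for your existing converter, which is assumed to accept the input and output paths as its two arguments:
import glob
import os
import subprocess
import sys

folder = sys.argv[1]  # folder containing the PDFs to convert

for inputfile in glob.glob(os.path.join(folder, '*.pdf')):
    outputfile = inputfile + '_out.pdf'  # same naming as the bash version
    # call the existing single-file script once per PDF
    subprocess.call([sys.executable, 'yourscript.pyw', inputfile, outputfile])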

"gsutil rm" command using STDIN

I use gsutil in a Linux environment for managing files in GCS. I enjoy being able to use the command
gsutil -m cp -I gs://...
preceded by some other command that passes a list of files on stdin to gsutil for uploading; in doing so, I can maintain a local list of files that have been uploaded, or generate specific patterns to upload and hand them off.
I would like to be able to do a similar command like
gsutil -m rm -I gs://...
to scrub files similarly. Presently, I build a big list of files to remove and run it with the following code:
while read line
do
    gsutil rm gs://...
done < "$myfile.txt"
This is extraordinarily slow compared to the multithreaded "gsutil -m rm..." command, and enabling the -m flag has no effect when you have to process files one at a time from a list. I also experimented with just running
gsutil -m rm gs://.../* # remove everything
<my command> | gsutil -m cp -I gs://.../ # put back the pieces that I want
but this involves recopying a lot of data and wastes a lot of time; the data is already there and just needs to have some removed. Any thoughts would be appreciated. Also, I don't have a lot of flexibility on either end with renaming files; otherwise, a quick rename before uploading would handle all of this.
As an interim solution, since we don't have a -I option for rm right now, how about just creating a string of all the objects you want to delete in your loop and then using gsutil -m rm to delete it? You could also do this with a simple python script that invokes the gsutil command from within python as a separate process.
Expanding on your earlier example, maybe something like the following (disclaimer: my bash-fu isn't the greatest, and I haven't tested this):
objects=''
while read line
do
    objects="$objects gs://$line"
done < "$myfile.txt"
gsutil -m rm $objects
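And the Python variant mentioned above might look something like this; a minimal sketch (untested) where the list file and the gs:// prefix are placeholders, and gsutil is invoked once as a separate process so -m can parallelize the deletes:
#!/usr/bin/env python
# Sketch: read object names from a local list file and delete them in one
# multi-threaded gsutil call.
import subprocess
import sys

def bulk_delete(list_file, bucket_prefix='gs://your-bucket/'):
    with open(list_file) as f:
        objects = [bucket_prefix + line.strip() for line in f if line.strip()]
    if objects:
        # one gsutil invocation so -m (multi-threading) can do its work;
        # for very long lists you may need to chunk the arguments
        subprocess.call(['gsutil', '-m', 'rm'] + objects)

if __name__ == '__main__':
    bulk_delete(sys.argv[1])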
For anyone wondering, I wound up doing what Zach Wilt indicated above. For reference, I was removing on the order of a couple thousand files from a span of 5 directories, so roughly 10,000 files. Doing this without the "-m" switch was taking upwards of 30 minutes; with the "-m" switch, it takes less than 30 seconds. Zoom!
For a robust example: I am using this to update Google Cloud Storage files to match local files. Each day, I have a program that dumps lots of incremental files, and also a handful that are "rolled up". After a week, the incremental files get scrubbed locally automatically, but the same should happen in GCS to save space. Here's how to do this:
#!/bin/bash
# get the full date strings for touch
start=`date --date='-9 days' +%x`
end=`date --date='-8 days' +%x`
# other vars
mon=`date --date='-9 days' +%b | tr '[A-Z]' '[a-z]'`
day=`date --date='-9 days' +%d`
# display start and finish times
echo "Cleaning files from $start"
# update start and finish times
touch --date="$start" /tmp/start1
touch --date="$end" /tmp/end1
# repeat for all servers
for dr in "dir1" "dir2" "dir3" ...
do
    # list files in range and build retention file
    find /local/path/$dr/ -newer /tmp/start1 ! -newer /tmp/end1 > "$dr-local.txt"
    # get list of all files from appropriate folder on GCS
    gsutil ls gs://gcs_path/$mon/$dr/$day/ > "$dr-gcs.txt"
    # formatting the host list file
    sed -i "s|gs://gcs_path/$mon/$dr/$day/|/local/path/$dr/|" "$dr-gcs.txt"
    # build sed command file to delete matches
    while read line
    do
        echo "\|$line|d" >> "$dr-del.txt"
    done < "$dr-local.txt"
    # run command file to strip lines for files that need to remain
    sed -f "$dr-del.txt" <"$dr-gcs.txt" >"$dr-out.txt"
    # convert local names to GCS names
    sed -i "s|/local/path/$dr/|gs://gcs_path/$mon/$dr/$day/|" "$dr-out.txt"
    # new variable to hold string
    del=""
    # convert newline separated file to one long string
    while read line
    do
        del="$del$line "
    done < "$dr-out.txt"
    # remove all files matching the final output
    gsutil -m rm $del
    # cleanup files
    rm $dr-local.txt
    rm $dr-gcs.txt
    rm $dr-del.txt
    rm $dr-out.txt
done
You'll need to modify this to fit your needs, but it is a concrete and working method for deleting files locally and then synchronizing the change to Google Cloud Storage. Thanks again to @Zach Wilt.
