I have a list of YouTube links from which I'd like to download just the audio (a list of albums that I'm then going to turn into .wav files to analyze). I've been using Pytube, but it's very slow, and I'm hoping to find a way to compress the file before it actually downloads or processes so that it arrives faster. The code I'm using is below:
from pytube import YouTube
import time
t1 = time.time()
myAudioStream = YouTube("https://www.youtube.com/watch?v=U_SLL3-NEMM").streams.last()
t2 = time.time()
print(t2-t1)
myAudioStream.download("C:\\Users\\MyUser\\Python Projects\\AlbumFiles\\")
t3 = time.time()
print(t3-t2)
The link in there currently is just a single song, since I wanted to get an idea of how long it would take, and it still takes about 200 seconds. If I want to download something 4-8x larger, it will probably be quite a while before it finishes. Is there something I can do when processing this data to speed this up?
There is a free, cross-platform (Windows/Mac/Linux) command-line program named youtube-dl that can convert YouTube videos to mp3 files.
To show a list of the available formats for a specific YouTube URL (denoted by <URL> below), run:
youtube-dl -F <URL>
Some of the available formats for a specific YouTube URL are audio only and they are identified as audio only in the results of youtube-dl -F <URL>.
youtube-dl can convert YouTube videos to mp3 files with the following command:
youtube-dl -f your-choice-of-format --extract-audio --audio-format mp3 <URL>
where your-choice-of-format is replaced by the integer format code selected from the results of youtube-dl -F <URL>.
A YouTube video has to be downloaded before it can be converted as part of the execution of the above command, because youtube-dl cannot convert a video to mp3 format unless it has access to it, so youtube-dl downloads the entire video as a temporary file and then deletes the temporary file automatically when it is done converting it.
youtube-dl can be installed on any OS that has Python installed with this command:
python3 -m pip install youtube-dl
In addition to converting YouTube videos to mp3 files, youtube-dl has an amazing list of capabilities including downloading playlists and channels, downloading multiple videos from a list of URLs in a text file, and downloading part of a playlist or channel by specifying the start NUMBER and the end NUMBER of the batch of videos that you want to download from a playlist as follows:
youtube-dl -f FORMAT -ci --playlist-start NUMBER --playlist-end NUMBER <URL-of-playlist>
There's something else you can do with youtube-dl if you already bought a CD and found the music video of a song from that CD on YouTube. You can download the music video, remove its audio track, and replace it with a high definition audio track from your own CD.
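As a rough sketch of the audio replacement described above, the ffmpeg arguments could be assembled in Python as follows. The function name and the file names (video.mp4, track.wav, out.mp4) are hypothetical; this assumes ffmpeg is installed and that the CD track and the video are already aligned in time, which the command itself does not check.

```python
import subprocess


def build_replace_audio_command(video_path, audio_path, out_path):
    """Return ffmpeg arguments that keep the video stream and swap in new audio."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: the downloaded music video
        "-i", audio_path,   # input 1: the audio track ripped from your CD
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-shortest",        # stop when the shorter of the two inputs ends
        out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_replace_audio_command("video.mp4", "track.wav", "out.mp4")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once ffmpeg is installed
```

The original audio is dropped simply because it is never mapped into the output.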
So I'd like to just report the results of the post above. I know this might belong in a comment, but I tried slightly different methods and would like to provide the code. I looked at different approaches people used to call youtube-dl and compared the speed.
So in all of my methods, I used youtube-dl, because it was so much faster than Pytube. I'm not sure what makes Pytube so much slower, but if someone wants to comment an explanation, I am interested!
First method: Using os.system to run the command line
import os
os.system('youtube-dl --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=U_SLL3-NEMM')
Result: about 30 seconds, and produced an MP3.
Second method: Embedding youtube-dl as a library
# The importable module is youtube_dl (underscore); a hyphen is not valid in a module name
import youtube_dl
with youtube_dl.YoutubeDL({}) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=U_SLL3-NEMM'])
Result: About 10 seconds, and produced an MKV file (which takes more storage space than the MP3)
Third method: Running the command line with subprocess
from subprocess import call
command = "youtube-dl --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=U_SLL3-NEMM"
call(command.split(), shell=False)
Result: Similar to first method with os; 30 seconds, output was an MP3.
EDIT: I have found a way to output the fastest method (embedding youtube-dl) as a wav, mp3, or whatever (in my case, .wav). Here is where I found it. It edits some of the initial settings of the import, which ends up changing the output file. Sorry if this is all obvious to some of you! Just explaining for other new programmers who stumble upon this.
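The page referred to in the EDIT is not reproduced here, so the following is only a sketch of one common way to do this, not necessarily the exact code that was found. It passes an options dictionary to YoutubeDL that selects the best audio stream and runs the FFmpegExtractAudio postprocessor with wav as the target codec. The helper names and the output directory are hypothetical, and ffmpeg must be installed for the conversion step.

```python
import os


def build_wav_options(output_dir="."):
    """Options asking youtube-dl for the best audio stream, converted to .wav."""
    return {
        "format": "bestaudio/best",
        # Name each file after the video title, inside output_dir.
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # requires ffmpeg on the PATH
            "preferredcodec": "wav",
        }],
    }


def download_as_wav(urls, output_dir="."):
    """Download every URL in `urls` and convert the audio to .wav."""
    import youtube_dl  # imported here so the options can be inspected without it

    with youtube_dl.YoutubeDL(build_wav_options(output_dir)) as ydl:
        ydl.download(list(urls))


if __name__ == "__main__":
    download_as_wav(["https://www.youtube.com/watch?v=U_SLL3-NEMM"])
```

Because download accepts a list, the whole list of album links from the question can be passed in one call.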
I have a bunch of videos from which I want to extract specific sections (either as videos or as frames). I get the specific sections from a .json file where the start and end frames are stored according to labels, like 'cat in video', 'dog in video'. I have an existing Python method that uses OpenCV, following the method mentioned here, but I found a one-liner using ffmpeg which is much faster and more efficient than my Python script, except that I have to manually fill in the start and end frames in this command.
ffmpeg -i in.mp4 -vf select='between(n\,x\,y)' -vsync 0 frames%d.png
I read a few questions about working with .json files in a shell script or passing arguments to a batch script, which look quite complicated and might mess up my system. Since I'm not familiar with working with .json files in a shell/batch script, I'm not sure how to start. Can anyone point me in the right direction on how to make a batch script that can read variables from a .json file and insert them into my ffmpeg command?
Since you're already familiar with Python, I suggest you use it to parse the JSON files; then you can use the ffmpeg-python library, which is an ffmpeg binding for Python. It also has a crop function, which I assume is what you need.
An alternative would be to use the os.system('ffmpeg <arguments>') calls from a Python script, which allows you to run external tools from the script.
Python natively supports JSON through its built-in json package.
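To make the suggestion concrete, here is a minimal sketch that reads the labels with the json package and fills the start and end frames into the ffmpeg command from the question. The JSON layout (a label mapped to "start" and "end" frame numbers), the file name sections.json, and the function name are assumptions, since the question does not show the actual file. It uses subprocess with an argument list rather than os.system, so no shell quoting is needed.

```python
import json
import subprocess


def build_frame_commands(labels, in_path="in.mp4"):
    """Build one ffmpeg argument list per labelled section.

    `labels` is assumed to look like {"cat in video": {"start": 10, "end": 50}}.
    """
    commands = []
    for label, span in labels.items():
        prefix = label.replace(" ", "_")
        # The backslashes escape the commas for ffmpeg's filter parser.
        select = f"select=between(n\\,{span['start']}\\,{span['end']})"
        commands.append([
            "ffmpeg", "-i", in_path,
            "-vf", select,
            "-vsync", "0",
            f"{prefix}_%d.png",
        ])
    return commands


if __name__ == "__main__":
    # "sections.json" is a hypothetical file name.
    with open("sections.json", encoding="utf-8") as f:
        labels = json.load(f)
    for cmd in build_frame_commands(labels):
        subprocess.run(cmd, check=True)
```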
As for doing this in Python, an alternative approach is to try my ffmpegio-core package:
import ffmpegio
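# x and y are the start and end frame numbers; the raw f-string keeps "\," as a literal backslash-comma
ffmpegio.transcode('in.mp4', 'frames%d.png', vf=rf"select='between(n\,{x}\,{y})'", vsync=0)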
If the videos are constant frame rate, it could be faster to specify the start and end timestamps as input options:
fs = ffmpegio.probe.video_streams_basic('in.mp4')[0]['frame_rate']
ffmpegio.transcode('in.mp4', 'frames%d.png', ss_in=x/fs, to_in=y/fs, vsync=0)
If you don't know the frame rate, you are calling ffprobe and ffmpeg for each file, so there is a tradeoff. But if your input video is long, it could be worthwhile.
But if speed is your primary goal, calling FFmpeg directly is always the fastest.
ffmpegio GitHub repo
Currently, I'm working on a python3 script that helps me sort the Google Photos takeout files. The Takeout service for Google Photos actually strips all the metadata of an image/video into a separate JSON file.
This script helps me merge the timestamp from the JSON file into its corresponding photo or video. To achieve this, I'm currently using ExifTool by Phil Harvey, which is a Perl executable. I call this tool in a subprocess to edit the date tags in the EXIF/metadata.
This process is quite heavy and takes a large amount of time. Then I realised that most of my photos are JPG and most of my videos are MP4. It is very easy to edit the EXIF data of JPG files in Python using some of the available libraries, and for the smaller proportion of photos in other formats, such as PNG, I can still use ExifTool.
This has drastically improved the runtime of my script. Now I want to know whether there is any way to edit the creation dates of MP4 files natively in Python, which should in theory execute faster than the subprocess method.
Please help! Thanks in advance.
I'm not too familiar with it, but ffmpeg seems like an option for just the MP4s.
import shlex
import subprocess
cmd = 'ffmpeg -i "file.mp4" -codec copy -metadata timestamp="new_time_here" "output.mp4"'
subprocess.call(shlex.split(cmd))
modified from:
https://www.reddit.com/r/learnpython/comments/3yotj2/comment/cyfiyb7/?utm_source=share&utm_medium=web2x&context=3
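A hedged variant of the same idea, for the case where the timestamp comes from the JSON file: the sketch below converts a Unix timestamp in seconds into an ISO 8601 UTC string and builds the ffmpeg arguments as a list. It assumes that the MP4 creation date is carried by the creation_time metadata key, which differs from the timestamp key in the command above and should be verified against your ffmpeg build. The function names, file names, and example timestamp are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def to_iso_utc(unix_seconds):
    """Convert a Unix timestamp (int or numeric string) to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(int(unix_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_set_date_command(in_path, out_path, unix_seconds):
    """Build ffmpeg arguments that copy the streams and set creation_time."""
    return [
        "ffmpeg", "-i", in_path,
        "-codec", "copy",  # no re-encoding, only the container is rewritten
        # Assumption: creation_time is the key the MP4 muxer uses for the date.
        "-metadata", f"creation_time={to_iso_utc(unix_seconds)}",
        out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names and timestamp, for illustration only.
    cmd = build_set_date_command("file.mp4", "output.mp4", "1577836800")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once ffmpeg is installed
```

This still launches one subprocess per file, so it does not answer the "natively in Python" part of the question; it only avoids the shell string and the manual date formatting.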
I have a friend who gave me a very specific problem: he has a list of plain-text words (for the purposes of this post I'll call it list.txt and populate it with basic words), and he wants to pull an audio-only file for each word on this list from YouTube. So I decided youtube-dl would be the fastest tool.
Since list.txt is a plain word list and not a list of YouTube links, it is harder to download in bulk. After reading all the documentation, I think the simplest way is to use the search feature built into youtube-dl.
List.txt
blue
river
red
(ETC...)
So basically I want something that does this, but since there are a lot more than just 3 items, running each command by hand isn't really practical.
youtube-dl.exe -f bestaudio ytsearch:blue
youtube-dl.exe -f bestaudio ytsearch:river
youtube-dl.exe -f bestaudio ytsearch:red
I've been looking all day for something similar to this bit of code below that can take a .txt file and search youtube and download 1 audio/video per item on this list.
youtube-dl.exe -f bestaudio ytsearch:list.txt
I work more with network stuff than with coding, so I'm kind of out of my depth here and only really have basic coding skills, so any help is much appreciated.
Solution that ended up working for me
Also, because I need file conversion, I used ffmpeg, which youtube-dl has built-in support for.
youtube-dl -c --title --batch-file test.txt --default-search "ytsearch" -x
-c Force resume of partially downloaded files.
--title Not really sure; it doesn't seem to make a difference anyhow.
-x As alastairtree pointed out, it's a better way to get audio.
P.S. Thanks to all who tried to help me and my friend solve our problem.
If list.txt contains just the words, use the --batch-file option (or -a for short) and set a default search provider (YouTube search in this case):
youtube-dl -a list.txt --default-search ytsearch -x
Note that I replaced -f bestaudio with -x (or --extract-audio). This has two advantages: It works with videos without dedicated audio streams (extremely rare these days), and it corrects the m4a file so that it can be read on all music players. You can also pass in --audio-format mp3 (or e.g. opus instead of mp3) to get the videos all in a desired output format.
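If you would rather drive this from a script, the same thing can be sketched in Python: read the words, turn each into a ytsearch1: query (the first search result only), and run youtube-dl once per word. The function names are hypothetical, and this is an illustration rather than a replacement for the one-line command above.

```python
import subprocess


def parse_words(lines):
    """Return the non-empty, stripped words from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def build_search_command(word):
    """Build youtube-dl arguments that fetch audio for the first search hit."""
    return ["youtube-dl", "-x", f"ytsearch1:{word}"]


if __name__ == "__main__":
    with open("list.txt", encoding="utf-8") as f:
        words = parse_words(f)
    for word in words:
        # check=False so one failed search does not stop the whole batch
        subprocess.run(build_search_command(word), check=False)
```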
If you are willing to not use Python, and as you are on Windows, you could just use PowerShell to iterate over the lines in the file; see Read file line by line in PowerShell
The url for ordinary people to watch the video is: http://v.youku.com/v_show/id_XNjM5NDU1OTUy.html
This video is split into 14 flv pieces, 5 of which are advertising flvs.
If I open the Developer Tools of IE11 and keep capturing the network flow during the whole process of watching the video (It must be the whole process, or the server doesn't send all of the video flv urls to IE11), the flv urls will be captured by IE11 and then I can copy the data of the flv urls which the below picture displays in a red line box:
Then I can change the data into a list of url-strings and use Python to download them.
But this is a real hassle.
I have tried to match the source code of http://v.youku.com/v_show/id_XNjM5NDU1OTUy.html against the flv URLs, but found nothing. So I guess there must be a function, some JavaScript, or something else in the code that tells the server to send all the flv URLs. Am I right?
So,
1. How can I get all the URLs of a video's flv pieces using only Python?
2. What should I learn to solve this kind of problem?
After all, using the Developer Tools of IE11, waiting for the whole video to play (nearly one hour), copying the related data to a txt file, and finally using Python to parse the txt file is really quite troublesome.
Thanks in advance.
I think you could get some insights from Youtube-dl. It is a set of python scripts created to "download Youtube videos and a few more sites". Go to their Download section and get the full source tarball. I think that could be useful in some way, at least to give you some directions on how to deal with flv pieces.
I want to write a python program that could extract audio from a video file (e.g. video.avi).
Is there any good library for it? And where should I start from?
I tried to use PyMedia, but I couldn't install it on my Mac OS X (Mountain Lion).
EDIT:
The problem is that video.avi is not completely available. Someone is writing to it and adding some frames to it every second. So I want to write Python code that gets the video as it comes in, extracts the audio from it, and writes it to a file (e.g. audio.mp3, audio.wav).
I don't know if ffmpeg can wait for the video to be copied to video.avi.
And I cannot wait for the video to be copied completely and then do the audio extraction. I have to do it as it comes.
I don't have a complete solution, but from the ffmpeg docs, it looks like ffmpeg can read an incomplete file by piping to stdin. Example from the docs:
cat test.wav | ffmpeg -i pipe:0
If you must use Python, you can always call ffmpeg using subprocess.
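A minimal sketch of that idea, under stated assumptions: the script keeps reading video.avi as it grows and forwards each new chunk to ffmpeg's stdin, stopping once no new data has arrived for a number of polls. The function names and the idle-poll heuristic are hypothetical, and whether ffmpeg can decode a partially written AVI from a pipe depends on how the file is being written, so treat this as a starting point rather than a tested solution.

```python
import subprocess
import time


def read_growing_file(f, chunk_size=65536, max_idle_polls=10, poll_interval=1.0):
    """Yield chunks from a file that is still being written.

    Stops after `max_idle_polls` consecutive reads that return no data.
    """
    idle = 0
    while idle < max_idle_polls:
        chunk = f.read(chunk_size)
        if chunk:
            idle = 0
            yield chunk
        else:
            idle += 1
            time.sleep(poll_interval)


def extract_audio_while_writing(video_path, audio_path):
    """Pipe the growing video into ffmpeg and write only the audio."""
    cmd = ["ffmpeg", "-i", "pipe:0", "-vn", audio_path]  # -vn drops the video
    with open(video_path, "rb") as src:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
        for chunk in read_growing_file(src):
            proc.stdin.write(chunk)
        proc.stdin.close()  # signals end of input to ffmpeg
        return proc.wait()


if __name__ == "__main__":
    extract_audio_while_writing("video.avi", "audio.wav")
```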