Currently, I'm working on a Python 3 script that helps me sort Google Photos Takeout files. The Takeout service for Google Photos strips all the metadata of an image/video into a separate JSON file.
The script I'm working on merges the timestamp present in the JSON file back into its corresponding photo or video. To achieve this, I'm currently using ExifTool by Phil Harvey, which is a Perl executable. I call this tool in a subprocess to edit the date tags in the EXIF/metadata.
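For the conversion step itself, here is a minimal sketch: parse the sidecar JSON and turn its epoch timestamp into the EXIF date string format. The `photoTakenTime.timestamp` field name is what Takeout sidecars typically contain; adjust it if your export differs.

```python
import json
from datetime import datetime, timezone

def takeout_to_exif_date(sidecar_json: str) -> str:
    """Convert a Takeout sidecar's photo time to EXIF's date format."""
    meta = json.loads(sidecar_json)
    # Takeout sidecars typically store the time as a Unix-epoch string
    # under photoTakenTime.timestamp (assumption; check your export)
    ts = int(meta["photoTakenTime"]["timestamp"])
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y:%m:%d %H:%M:%S")

print(takeout_to_exif_date('{"photoTakenTime": {"timestamp": "1577836800"}}'))
# 2020:01:01 00:00:00
```

The resulting string is in the `YYYY:MM:DD HH:MM:SS` form that EXIF date tags expect, so it can be handed to whichever library writes the tag.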
This process is quite hefty and takes a large amount of time. Then I realised that most of my photos are JPGs and my videos are MP4s. It is very easy to edit the EXIF data of JPG files in Python using some of the available libraries, and for the smaller proportion of photos, like PNGs, I can still use ExifTool.
This has drastically improved the runtime of my script. Now I want to know whether there is any way to edit the creation dates of MP4 files natively in Python, which could theoretically execute faster than the subprocess method.
Please help! Thanks in advance.
I'm not too familiar with it, but ffmpeg seems like an option for just the MP4s.
import shlex, subprocess

# creation_time is the MP4 container's standard date field
cmd = 'ffmpeg -i "file.mp4" -codec copy -metadata creation_time="new_time_here" "output.mp4"'
subprocess.call(shlex.split(cmd))
modified from:
https://www.reddit.com/r/learnpython/comments/3yotj2/comment/cyfiyb7/?utm_source=share&utm_medium=web2x&context=3
I have a bunch of videos from which I want to extract specific sections (either as videos or as frames). I get the specific sections from a .json file where the start and end frames are stored according to labels, like 'cat in video' or 'dog in video'. I have an existing method in Python using OpenCV (the method mentioned here), but I found a one-liner using ffmpeg which is much faster and more efficient than my Python script, except that I have to manually fill in the start and end frames in this command:
ffmpeg -i in.mp4 -vf select='between(n\,x\,y)' -vsync 0 frames%d.png
I read a few questions about working with .json files in a shell script or passing arguments to a batch script, which looks quite complicated and might mess up my system. Since I'm not familiar with working with .json files in a shell/batch script, I'm not sure how to start. Can anyone point me in the right direction on how to make a batch script that can read variables from a .json file and feed them into my ffmpeg command?
Since you're already familiar with Python, I suggest using it to parse the JSON files; then you can use the ffmpeg-python library, which is an ffmpeg binding for Python. It also has a crop function, which I assume is what you need.
An alternative would be to use os.system('ffmpeg <arguments>') calls from a Python script, which lets you run external tools from the script.
Python natively supports JSON with its built-in json package.
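For instance, here is a minimal sketch that parses a sidecar file and builds the same ffmpeg command shown above as an argument list. The JSON layout with start/end keys is an assumption; adapt it to your actual file:

```python
import json

def build_extract_cmd(infile, x, y, out_pattern):
    # The same one-liner as above, expressed as a list for subprocess.run
    return ["ffmpeg", "-i", infile,
            "-vf", f"select='between(n\\,{x}\\,{y})'",
            "-vsync", "0", out_pattern]

# Assumed sidecar layout: {"cat in video": {"start": 120, "end": 360}, ...}
sections = json.loads('{"cat in video": {"start": 120, "end": 360}}')
for label, frames in sections.items():
    cmd = build_extract_cmd("in.mp4", frames["start"], frames["end"],
                            label.replace(" ", "_") + "_%d.png")
    print(cmd)  # hand this list to subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids any shell quoting issues with the labels.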
As for doing this in Python, here is an alternative approach you can try with my ffmpegio-core package:
import ffmpegio

# x and y are your start and end frame numbers
ffmpegio.transcode('in.mp4', 'frames%d.png', vf=f"select='between(n\\,{x}\\,{y})'", vsync=0)
If the videos are constant frame rate, it could be faster to specify the start and end timestamps as input options:
fs = ffmpegio.probe.video_streams_basic('in.mp4')[0]['frame_rate']
ffmpegio.transcode('in.mp4', 'frames%d.png', ss_in=x/fs, to_in=y/fs, vsync=0)
If you don't know the frame rate beforehand, you end up calling both ffprobe and ffmpeg for each file, so there is a tradeoff. But if your input video is long, it could be worthwhile.
But if speed is your primary goal, calling FFmpeg directly is always the fastest.
ffmpegio GitHub repo
I have a list of YouTube links from which I'd like to download just the sound file (a list of albums that I'm going to turn into .wav files to analyze). I've been using Pytube, but it's very slow, and I'm hoping to find a way to compress the file before it actually downloads or processes so it can deliver the file faster. The code I'm using is below:
from pytube import YouTube
import time
t1 = time.time()
myAudioStream = YouTube("https://www.youtube.com/watch?v=U_SLL3-NEMM").streams.last()
t2 = time.time()
print(t2-t1)
myAudioStream.download("C:\\Users\\MyUser\\Python Projects\\AlbumFiles\\")
t3 = time.time()
print(t3-t2)
The link currently in there is just a song, since I wanted to get an idea of how long it'd take, and it still takes about 200 seconds. If I want to download something 4-8x larger, it will probably take quite a while to finish. Is there something I can do when processing this data to speed it up?
There is a free, cross-platform (Windows/Mac/Linux) command-line program named youtube-dl that can convert YouTube videos to mp3 files.
Show a list of the available formats for a specific YouTube URL which I have denoted by <URL> in the following line of code.
youtube-dl -F <URL>
Some of the available formats for a specific YouTube URL are audio only and they are identified as audio only in the results of youtube-dl -F <URL>.
youtube-dl can convert YouTube videos to mp3 files with the following command:
youtube-dl -f your-choice-of-format --extract-audio --audio-format mp3 <URL>
where your-choice-of-format is replaced by a format number (an integer) selected from the results of youtube-dl -F <URL>.
A YouTube video has to be downloaded before it can be converted, because youtube-dl cannot convert a video to mp3 format unless it has access to it. So youtube-dl downloads the entire video as a temporary file and then deletes the temporary file automatically when the conversion is done.
youtube-dl can be installed on any OS that has Python, with this command:
python3 -m pip install youtube-dl
In addition to converting YouTube videos to mp3 files, youtube-dl has an amazing list of capabilities, including downloading playlists and channels, downloading multiple videos from a list of URLs in a text file, and downloading part of a playlist or channel by specifying the start NUMBER and end NUMBER of the batch of videos that you want:
youtube-dl -f FORMAT -ci --playlist-start NUMBER --playlist-end NUMBER <URL-of-playlist>
There's something else you can do with youtube-dl if you already bought a CD and found the music video of one of its songs on YouTube: you can download the music video, remove its audio track, and replace it with a high-definition audio track from your own CD.
So I'd like to report the results of the post above. I know this might belong in a comment, but I tried slightly different methods and would like to provide the code. I looked at the different approaches people use to call youtube-dl and compared their speed.
So in all of my methods, I used youtube-dl, because it was so much faster than Pytube. I'm not sure what makes Pytube so much slower, but if someone wants to comment an explanation, I am interested!
First method: using os.system to run the command line
import os
os.system('youtube-dl --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=U_SLL3-NEMM')
Result: about 30 seconds, and it produced an MP3.
Second method: Embedding youtube-dl as a library
import youtube_dl

with youtube_dl.YoutubeDL({}) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=U_SLL3-NEMM'])
Result: about 10 seconds, and it produced an MKV file (taking more storage space than the MP3).
Third method: running the command line with subprocess
from subprocess import call
command = "youtube-dl --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=U_SLL3-NEMM"
call(command.split(), shell=False)
Result: similar to the first method with os; 30 seconds, and the output was an MP3.
EDIT: I have found a way to make the fastest method (embedding youtube-dl) output a .wav, .mp3, or whatever you like (in my case, .wav). Here is where I found it. It changes some of the initial settings of the import, which ends up changing the output file. Sorry if this is all obvious to some of you! Just explaining for other new programmers who stumble upon this.
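For other newcomers, the settings change amounts to passing an options dict to YoutubeDL. The keys below follow youtube-dl's embedding example (ffmpeg must be installed for the conversion step); treat this as a sketch rather than the exact code from the linked page:

```python
# Options dict in the style of youtube-dl's embedding example: a
# post-processing step runs ffmpeg to extract and convert the audio
ydl_opts = {
    "format": "bestaudio/best",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",
        "preferredcodec": "wav",  # or "mp3"
    }],
}

# Hypothetical usage (needs youtube-dl and ffmpeg installed):
# import youtube_dl
# with youtube_dl.YoutubeDL(ydl_opts) as ydl:
#     ydl.download(["https://www.youtube.com/watch?v=U_SLL3-NEMM"])
```

Changing preferredcodec is all it takes to switch the output format.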
I want to extract video duration metadata from every video file in a specified directory and then view the total duration.
I need to extract the data from as many as several thousand videos overall. In Windows I can view the total duration of many files manually by selecting them in Explorer and going into details. For 1500 mp4 files it takes about 20 seconds to make the calculations and show the total time, which is much faster than what I'm currently getting when iterating with FFprobe.
This is how I'm currently getting the result with FFprobe:
for filename in dirFiles:
    print(subprocess.check_output(['ffprobe', '-i', filename, '-show_entries', 'format=duration',
                                   '-sexagesimal', '-v', 'quiet', '-of', 'csv=p=0']))
What is the faster way to do this?
I solved the problem with the mutagen module, as it handles files quite fast.
import mutagen.mp4

mp4 = mutagen.mp4.MP4(filename)
duration += mp4.info.length  # duration in seconds
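If you also want the total shown the way ffprobe's -sexagesimal flag prints it, the standard library can format the summed seconds (a small sketch):

```python
from datetime import timedelta

def to_sexagesimal(seconds):
    # round() keeps the H:MM:SS string free of a long fractional tail
    return str(timedelta(seconds=round(seconds)))

print(to_sexagesimal(8025.3))  # 2:13:45
```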
There is no "best" way, just a best way for your use case. os.stat doesn't have it because video duration is not part of any POSIX file system; file systems don't care about the contents of a file. (What is the duration of a text file? What is the resolution of an executable?) If you don't like ffprobe, try mediainfo, or MP4Box, or any other tool that can read media files.
I want to write a python program that could extract audio from a video file (e.g. video.avi).
Is there any good library for it? And where should I start from?
I tried to use PyMedia, but I couldn't install it on my Mac OS X (Mountain Lion) machine.
EDIT:
The problem is that video.avi is not completely available: someone is writing to it and adding frames every second. So I want to write code in Python that takes the video as it comes, extracts the audio from it, and writes it to a file (e.g. audio.mp3 or audio.wav).
I don't know if ffmpeg can wait for the video to be copied to video.avi.
And I cannot wait for the video to be copied completely and then do the audio extraction. I have to do it as it comes.
I don't have a complete solution, but from the ffmpeg docs it looks like ffmpeg can read an incomplete file when it's piped to stdin. Example from the docs:
cat test.wav | ffmpeg -i pipe:0
If you must use python, you can always call ffmpeg using subprocess.
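A sketch of that idea in Python: build the ffmpeg argument list once, then feed the growing file to ffmpeg's stdin in chunks. The loop is untested against a live, still-growing file, so treat it as a starting point.

```python
import subprocess

def ffmpeg_stdin_cmd(audio_out):
    # -i pipe:0 makes ffmpeg read its input from stdin; -vn keeps audio only
    return ["ffmpeg", "-i", "pipe:0", "-vn", audio_out]

# Hypothetical usage against the growing file:
# proc = subprocess.Popen(ffmpeg_stdin_cmd("audio.wav"), stdin=subprocess.PIPE)
# with open("video.avi", "rb") as f:
#     for chunk in iter(lambda: f.read(65536), b""):
#         proc.stdin.write(chunk)
# proc.stdin.close()
# proc.wait()
```

For a truly live source you would also need to keep polling the file for new data instead of stopping at the current end-of-file.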
Looking for easiest way to do the following:
I have created 10,000 unique QR codes with unique filenames.
I have one postcard design (.ai, .eps, .pdf - doesn't matter) with a placeholder for the QR code and for the unique filename (sans the .png extension).
How would I go about inserting each of the 10,000 PNGs into 10,000 copies of the PDF file? (I need to do the same with the unique filename/text string that represents each QR code.)
Since I'm really no good at programming, it doesn't matter which tools you use, as long as you hold my hand - or there is a link to beginner documentation.
however:
I am trying to learn python - so that is preferred.
I work a little bit with R - but that will not be the easiest solution.
If this can be done directly from the terminal with a shell script, then hallelujah :-)
But really - if you know of a solution - then please post it, regardless of the tools.
Thanks in advance.
You can do it in Python using pyPdf to merge documents.
Basically, you create a PDF with your QRCode placed where you want it in the end.
You can use an in-memory buffer (cStringIO/StringIO, or io.BytesIO in Python 3) to store the created PDF file in memory.
You can find pyPdf here; there's an example that shows how you would add a watermark to a file, and you should follow the same logic.
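A sketch of that logic using pypdf (the current successor of pyPdf) together with reportlab to draw the overlay. The library calls follow their current docs, but the loop itself is untested, and the page size and coordinates are placeholders you would adjust to your design:

```python
from io import BytesIO
from pathlib import Path

def card_text(png_path):
    # The text printed on each card is the QR filename sans .png
    return Path(png_path).stem

# Hypothetical merge loop (pypdf + reportlab, both pip-installable):
# from pypdf import PdfReader, PdfWriter
# from reportlab.pdfgen import canvas
# for png in sorted(Path("qrcodes").glob("*.png")):
#     buf = BytesIO()                        # overlay PDF kept in memory
#     c = canvas.Canvas(buf, pagesize=(420, 298))
#     c.drawImage(str(png), x=300, y=180, width=90, height=90)
#     c.drawString(300, 160, card_text(png))
#     c.save()
#     page = PdfReader("postcard.pdf").pages[0]
#     page.merge_page(PdfReader(buf).pages[0])  # same logic as the watermark example
#     writer = PdfWriter()
#     writer.add_page(page)
#     with open(f"card_{card_text(png)}.pdf", "wb") as out:
#         writer.write(out)
```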