I want to extract video duration metadata from every video file in a specified directory and then view the total duration.
I need to extract the data from as many as several thousand videos overall. On Windows I can view the total duration for many files manually by selecting them in Explorer and opening the details pane. For 1500 MP4 files it takes about 20 seconds to compute and display the total time, which is much faster than what I'm currently getting by iterating with FFprobe.
Here is how I'm currently getting the result with FFprobe:
import subprocess
for filename in dirFiles:
    print(subprocess.check_output(['ffprobe', '-i', filename, '-show_entries', 'format=duration', '-sexagesimal', '-v', 'quiet', '-of', 'csv=p=0']))
What is a faster way to do this?
I solved the problem with the mutagen module, as it handles files quite fast:
import mutagen.mp4
mp4 = mutagen.mp4.MP4(filename)
duration += mp4.info.length
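For context, a minimal sketch of the full loop, assuming the files all live in a directory named videos/ and are MP4s (mutagen.mp4 only reads MP4/M4A containers); the directory name is a placeholder:

import os
import mutagen.mp4

duration = 0.0
for filename in os.listdir("videos"):  # "videos" is a hypothetical directory
    if filename.lower().endswith(".mp4"):
        mp4 = mutagen.mp4.MP4(os.path.join("videos", filename))
        duration += mp4.info.length  # length is reported in seconds
hours, rest = divmod(duration, 3600)
minutes, seconds = divmod(rest, 60)
print("Total: %d:%02d:%05.2f" % (hours, minutes, seconds))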
There is no "best" way, just a best way for your use case. os.stat doesn't have it because video duration is not part of any POSIX file system (file systems don't care about the contents of a file; what is the duration of a text file? what is the resolution of an executable?). If you don't like ffprobe, try mediainfo, or MP4Box, or any other tool that can read media files.
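If mediainfo is installed, a hedged sketch of calling it per file (the --Inform template below is how I believe mediainfo is usually asked for the duration in milliseconds; verify against your install's documentation):

import subprocess

def duration_ms(filename):
    # Ask mediainfo for only the General/Duration field (reported in milliseconds).
    out = subprocess.check_output(["mediainfo", "--Inform=General;%Duration%", filename])
    return float(out.strip() or 0)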
Related
Currently, I'm working on a Python 3 script that helps me sort Google Photos Takeout files. The Takeout service for Google Photos strips all the metadata of an image/video into a separate JSON file.
The script I'm working on helps me merge the timestamp present in the JSON file into its corresponding photo or video. To achieve this, I'm currently using ExifTool by Phil Harvey, which is a Perl executable. I call this tool in a subprocess to edit the date tags in the EXIF/metadata.
This process is quite heavy and takes a large amount of time. Then I realised that most of my photos are JPG and most of my videos are MP4; it is very easy to edit the EXIF data of JPG files in Python using one of the available libraries, and for the smaller proportion of photos such as PNG I can still use ExifTool.
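For the JPG part, a minimal sketch of what I mean, assuming the piexif library and that the Takeout JSON stores the timestamp under photoTakenTime.timestamp (adjust to your actual JSON layout):

import json
from datetime import datetime, timezone
import piexif

def apply_json_date(jpg_path, json_path):
    with open(json_path) as f:
        meta = json.load(f)
    # "photoTakenTime" is an assumption about the Takeout JSON structure.
    ts = int(meta["photoTakenTime"]["timestamp"])
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y:%m:%d %H:%M:%S")
    exif_dict = piexif.load(jpg_path)
    exif_dict["Exif"][piexif.ExifIFD.DateTimeOriginal] = stamp.encode("ascii")
    piexif.insert(piexif.dump(exif_dict), jpg_path)  # rewrites the JPG in place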
Handling the JPGs natively has drastically improved the runtime of my script. Now I want to know: is there any way to edit the creation dates of MP4 files natively in Python, which should in theory be faster than the subprocess approach?
Please help! Thanks in advance.
I'm not too familiar with it, but ffmpeg seems like an option for just the MP4s.
import shlex
import subprocess
cmd = 'ffmpeg -i "file.mp4" -codec copy -metadata timestamp="new_time_here" "output.mp4"'
subprocess.call(shlex.split(cmd))
modified from:
https://www.reddit.com/r/learnpython/comments/3yotj2/comment/cyfiyb7/?utm_source=share&utm_medium=web2x&context=3
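A follow-up note not from the original answer: for MP4/MOV containers the creation date is usually carried by the creation_time metadata key rather than timestamp, so a variant worth trying (verify against your ffmpeg build) is:

cmd = 'ffmpeg -i "file.mp4" -codec copy -metadata creation_time="2020-01-01T12:00:00" "output.mp4"'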
I have been trying to build a project in which a Flask application can automatically concatenate a selected set of videos onto a 'core video'.
Users can upload a video, which is sent to Amazon S3 for storage.
All videos are preprocessed with MoviePy into MP4 files running at 24 fps, without audio, at a resolution of 720p.
After this preprocessing, the video is uploaded to Amazon S3.
From all new uploads in S3, a queue is created from which an administrator can approve or delete videos.
All approved videos end up in a list that is concatenated with the current 'core video'.
This is done using Python's MoviePy library:
from moviepy.editor import VideoFileClip, concatenate_videoclips, AudioFileClip

videos_to_concat = [VideoFileClip(core_video.s3_link)]
for video in approved_videos:
    videos_to_concat.append(VideoFileClip(video.s3_link))
result = concatenate_videoclips(videos_to_concat, method="compose")
Later, some audio is added over the full duration of the video:
result_with_audio = result.set_audio(AudioFileClip("some_audio.mp3"))
The problem, however, is that without throwing any errors, some videos freeze after the first couple of frames once concatenation has completed successfully. A frame remains stationary for the duration of the original clip, although the audio keeps playing. When the next clip is loaded, it either plays normally or shows the same behaviour of freezing after a couple of frames. There seems to be no obvious pattern.
Initially I thought the mistake might be that ffmpeg does not download videos from the normal S3 link properly, but that would not explain why the biggest video at the beginning and some other videos are rendered correctly while others aren't.
Could it be that this is caused by a difference in codecs (libx264 vs. mpeg4)?
Or is accessing the files by URL and feeding that directly to MoviePy a potential source of trouble? (VideoFileClip("https://amazon.s3.link.to.file.here.mp4"))
Should I try to download all files and then concatenate them locally, or am I right to assume that the current approach should work?
When inspecting the files, nothing obvious like filename, file type, or resolution seems to be the issue; the preprocessing seems to do what it should.
I would love to hear any ideas on how the corruption of the resulting concatenated video could be explained and hopefully resolved.
Okay, I did manage to figure it out in the end. The problem was solved by downloading all videos with the boto3 client that Amazon provides for Python. Once all videos were downloaded to the web server's local storage, concatenation worked without any issues.
I'm guessing that this might have something to do with S3 not serving the entire video file instantly through the link. In the end it seems quite logical to just use the provided S3 client to download and store the videos before performing any edits with MoviePy.
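For reference, a minimal sketch of that download step; the bucket name, key attribute, and local paths are placeholders, and it assumes the objects are addressed by bucket/key rather than by URL:

import os
import boto3

s3 = boto3.client("s3")
local_paths = []
for video in approved_videos:  # same list as in the question
    local_path = os.path.join("/tmp", video.s3_key)  # s3_key is a hypothetical attribute
    s3.download_file("my-video-bucket", video.s3_key, local_path)
    local_paths.append(local_path)
# Build the VideoFileClip objects from local_paths instead of the S3 links.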
I have the URL of a WAV-format audio file, which is basically the audio of a call. I want to find the duration of the WAV file, which means the duration of the call. I do not want to download the WAV file, as I have to repeat this step for a large number of such records. Is there any way to do this in Python?
A little more information about where the audio file is would be helpful.
Such as the URL or the location of the file. You might be able to use Beautiful Soup to scrape the length if it's mentioned on a web page; otherwise, you might be able to use some sort of API call.
https://github.com/quodlibet/mutagen would be helpful to look at.
You will need to download at least part of each WAV file to be able to read the header and parse it with e.g. the built-in wave module.
Alternatively, if all of the files are uncompressed PCM and have the same format, you can just look at the file size (which you can get with an HTTP HEAD request) and estimate the approximate duration.
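A rough sketch combining both ideas, assuming the server honours HTTP Range requests and the files have a canonical 44-byte PCM RIFF header (no extra chunks before the data chunk):

import struct
import urllib.request

def wav_duration_seconds(url):
    # Fetch only the canonical 44-byte WAV header via a Range request.
    req = urllib.request.Request(url, headers={"Range": "bytes=0-43"})
    with urllib.request.urlopen(req) as resp:
        header = resp.read(44)
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    byte_rate = struct.unpack("<I", header[28:32])[0]  # audio bytes per second
    # Total size from a HEAD request; subtract the header to approximate the data size.
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        total_size = int(resp.headers["Content-Length"])
    return (total_size - 44) / byte_rate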
The PyDub library, for me, is pretty much ideal for converting audio formats. I recently used it to write a command-line audio converter for about 200 audio files, and it saved me from having to buy or look for an audio converter that would let me queue up songs and other audio files for conversion. But I quickly noticed that it replaced my audio files. For me, that was ideal. But what if I didn't want PyDub to replace the audio files, but rather duplicate them in a different format? I could just copy the files into the directory and convert them, but is there no way to do this from within PyDub? I looked into it and couldn't find a way, nor could I find a question about it, so maybe this isn't a very common thing to do.
Thanks!
When you export an audio segment, you can always specify a new name for the file (or use the same name but in a different folder):
from pydub import AudioSegment
song = AudioSegment.from_file("/path/to/file.mp3", format="mp3")
song.export("/path/to/new/filename.mp4", format="mp4")
Hope this helps:
from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_mp3("XXXXX/y.mp3")
chunk_length_ms = 1000  # pydub works in milliseconds, so 1000 ms = one second
chunks = make_chunks(myaudio, chunk_length_ms)  # split the audio into one-second chunks
for i, chunk in enumerate(chunks):
    chunk.export('path_where_chunks_go/chunk{0}.mp3'.format(i), format='mp3')
I've got a program that downloads a RAR file split into parts (part01, part02, etc.) across the internet. It downloads part01 first, then part02, and so on.
After some tests, I found out that using, for example, UnRAR2 for Python, I can extract the first part of the file (an .avi file) contained in the archive and play its first minutes. When I add another part, it extracts a bit more, and so on. What I wonder is: is it possible to make it extract single files WHILE downloading them?
I'd need it to start extracting part01 without having to wait for it to finish downloading... is that possible?
Thank you very much!
Matteo
You are talking about an .avi file inside the rar archives. Are you sure the archives are actually compressed? Video files released by the warez scene do not use compression:
Ripped movies are still packaged due to the large filesize, but compression is disallowed and the RAR format is used only as a container. Because of this, modern playback software can easily play a release directly from the packaged files, and even stream it as the release is downloaded (if the network is fast enough).
(I'm thinking VLC, BSPlayer, KMPlayer, Dziobas Rar Player, rarfilesource, rarfs,...)
You can check for the compression as follows:
Open the first .rar archive in WinRAR. (name.part01.rar, or name.rar for old-style volume names)
Click the info button.
If Version to extract indicates 2.0, then the archive uses no compression (unless you have decade-old RARs). You will see that Total size and Packed size are equal.
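The same check can be scripted; a small sketch assuming the third-party rarfile module is installed and the archive volumes are present locally (it simply compares packed and unpacked sizes per entry):

import rarfile

rf = rarfile.RarFile("name.part01.rar")
for info in rf.infolist():
    stored = info.file_size == info.compress_size  # equal sizes suggest "store" mode
    print(info.filename, "stored (no compression)" if stored else "compressed")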
is it possible to make it extract single files WHILE downloading them?
Yes. When no compression is used, you can write your own program to extract the files. (I know of someone who wrote a script to directly download the movie from external RAR files, but it's not public and I don't have it.) Because you mentioned Python, I suggest you take a look at rarfile 2.2 by Marko Kreen, like the author of pyarrfs did. The archive is just the file chopped up with headers (RAR blocks) added. Extraction will be a copy operation that you need to pause until the next archive is downloaded.
I strongly believe it is also possible for compressed files. Your approach here will be different because you must use unrar to extract the compressed files. I have to add that there is also a free RARv3 implementation to extract rars implemented in The Unarchiver.
I think this parameter for (un)rar will make it possible:
-vp Pause before each volume
By default RAR asks for confirmation before creating
or unpacking next volume only for removable disks.
This switch forces RAR to ask such confirmation always.
It can be useful if disk space is limited and you wish
to copy each volume to another media immediately after
creation.
It will give you the possibility to pause the extraction until the next archive is downloaded.
I believe that this won't work if the rar was created with the 'solid' option enabled.
When the solid option is used for rars, all packed files are treated as one big file stream. This should not cause any problems if you always start from the first file even if it doesn't contain the file you want to extract.
I also think it will work with passworded archives.
I highly doubt it. By the nature of compression (from my understanding), every bit is needed to decompress it. It seems that the source you are downloading from intentionally broke the AVI into pieces before compression, but once compression is applied, whatever was compressed becomes one atomic unit. So they kindly broke the whole AVI into parts, but each part is still an atomic unit.
But I'm not an expert in compression.
The only test I can currently think of is something like: curl http://example.com/Part01 | unrar.
I don't know if this was asked with a specific language in mind, but it is possible to stream a compressed RAR directly from the internet and have it decompressed on the fly. I can do this with my C# library http://sharpcompress.codeplex.com/
The RAR format is actually kind of nice. It has headers preceding each entry and the compressed data itself does not require random access on the stream of bytes.
For multi-part files, you'd have to fully extract part 1 first, then continue writing when part 2 is available.
All of this is possible with my RarReader API. Solid archives are also streamable (in fact, they're only streamable: you can't randomly access files in a solid archive, so you pretty much have to extract them all at once).