Programmatically accessing PTS times in MP4 container - python

Background
For a research project, we are recording video data from two cameras and feeding a synchronization pulse directly into the microphone ADC every second.
Problem
We want to derive, for each camera frame, a timestamp in the clock of the pulse source so that we can relate the camera images temporally. With our current method (see below), we compute an offset of around 2 frames between the cameras. Unfortunately, visual inspection of the videos shows that we are clearly 6 frames off (at least at one point).
I assume this is because we are relating the audio and video signals incorrectly (see below).
Approach I think I need help with
I read that the MP4 container should store PTS (presentation timestamp) values for both video and audio. How do we access those programmatically? Python would be perfect, but if we have to call ffmpeg via system calls, we can do that too ...
What we currently fail with
The original idea was to find video and audio times as
audio_sample_times = np.arange(N_audiosamples) / audio_sampling_rate
video_frame_times = np.arange(N_videoframes) / video_frame_rate
then identify audio_pulse_times on the audio_sample_times axis, compute the relative position of each video frame time between the audio_pulse_times surrounding it, and map it to the same relative position between the corresponding source_pulse_times.
However, a first indication that this approach is problematic is that, for some videos, N_audiosamples/audio_sampling_rate differs from N_videoframes/video_frame_rate by several frame durations.
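For reference, a minimal sketch of that mapping step, assuming audio_pulse_times and source_pulse_times have already been determined (same lengths, both in seconds):
import numpy as np

# audio_pulse_times: pulse onsets detected on the audio_sample_times axis
# source_pulse_times: emission times of the same pulses in the source clock
# np.interp places each video frame time at the same relative position between
# the surrounding source pulses as it has between the surrounding audio pulses.
video_times_in_source_clock = np.interp(video_frame_times,
                                        audio_pulse_times,
                                        source_pulse_times)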
What I have found by now
OpenCV's cv2.CAP_PROP_POS_MSEC seems to do exactly what we already do, rather than access any PTS ...
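For reference, this is the kind of per-frame query we looked at (a rough sketch, not what we ended up using):
import cv2

cap = cv2.VideoCapture("video.mp4")
frame_times_ms = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Timestamp associated with the frame just read, in milliseconds (backend-dependent)
    frame_times_ms.append(cap.get(cv2.CAP_PROP_POS_MSEC))
cap.release()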
Edit: What I took from the winning answer
import av
import numpy as np
from tqdm import tqdm

container = av.open(video_path)

signal = []
audio_sample_times = []
video_sample_times = []

for frame in tqdm(container.decode(video=0, audio=0)):
    if isinstance(frame, av.audio.frame.AudioFrame):
        # audio pts is in the stream's time_base; here it is assumed to be 1/sample_rate
        sample_times = (frame.pts + np.arange(frame.samples)) / frame.sample_rate
        audio_sample_times += list(sample_times)
        # take channel 0 of the interleaved audio samples
        signal_f_ch0 = frame.to_ndarray().reshape((-1, len(frame.layout.channels))).T[0]
        signal += list(signal_f_ch0)
    elif isinstance(frame, av.video.frame.VideoFrame):
        video_sample_times.append(float(frame.pts * frame.time_base))

signal = np.abs(np.array(signal))
audio_sample_times = np.array(audio_sample_times)
video_sample_times = np.array(video_sample_times)
Unfortunately, in my particular case, all pts are consecutive and gapless, so the result is the same as with the naive solution ...
From picture clues, we identified a section of ~10 s in the videos somewhere within which they desync, but we can't find any trace of that in the data.

You need to run ffprobe to retrieve the PTS times. I don't know the exact command, but if you're ok with another package, try ffmpegio:
pip install ffmpegio-core
# OR
pip install ffmpegio  # if you also want to use it to read video frames & audio samples
If you're on Windows, see this doc on where ffmpeg.exe can be found automatically.
Then you can run:
import ffmpegio
frames = ffmpegio.probe.frames('video.mp4', intervals=10)
This returns the frame info of the first 10 packets (streams mixed, in pts order) as a list of dicts. If you remove the intervals argument, it retrieves every frame (which will take a long time).
Inspect each dict in frames and decide which entries you need (say 'media_type', 'stream_index', 'pts' and 'pts_time'). Then add an entries argument containing these:
frames = ffmpegio.probe.frames('video.mp4', intervals=10,
                               entries=['media_type', 'stream_index', 'pts', 'pts_time'])
Once you're happy with what it returns, incorporate it into your program.
The intervals argument accepts many different formats; please read the doc.
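For instance (untested, building on the entries above), splitting that output into per-stream pts lists could look like:
import ffmpegio

# Without intervals this walks every packet, so it may take a while on long files
frames = ffmpegio.probe.frames('video.mp4',
                               entries=['media_type', 'stream_index', 'pts', 'pts_time'])

# 'pts_time' is already in seconds; group it per media type
video_pts = [float(f['pts_time']) for f in frames if f['media_type'] == 'video']
audio_pts = [float(f['pts_time']) for f in frames if f['media_type'] == 'audio']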
What this or any other ffprobe-based approach does not offer is getting this info together with the frame data. You need to read the frame timing data separately and merge it with the data yourself. If you prefer a solution with more control (but perhaps more coding), look into pyav, which wraps FFmpeg's underlying libraries. I'm fairly certain you can retrieve pts together with the frame data there.
Disclaimer: this function has not been tested extensively, so you may encounter issues. If you do, please report them on GitHub and I'll fix them as soon as possible.

Related

How to get the last frame of a video with imageio in python?

I want to grab 5 evenly distributed frames of a video, including the first and last frame. The answer to Python - Extracting and Saving Video Frames helped me loop over the video and get frames. However, I didn't find out how to know when the last frame is coming. Also, looping over the whole video seems a bit expensive.
Is there a better way of getting 5 specific frames (e.g. every 20% of the video), or at least an easy way of getting the total frame number? I already tried multiplying duration and fps from the metadata, but those numbers seem to be rounded and give a wrong number.
Thank you for your help.
Whether or not this is possible depends on your container and codec being used. Assuming the codec allows retrieving the total number of frames, you can do something like this:
import imageio.v3 as iio
import numpy as np

my_video = "imageio:cockatoo.mp4"

props = iio.improps(my_video, plugin="pyav")

# Make sure the codec knows the number of frames
assert props.shape[0] != -1

for idx in np.linspace(0, props.shape[0] - 1, 5, dtype=int):
    # imageIO < 2.21.0
    image = iio.imread(my_video, index=idx)

    # imageIO > 2.21.0
    # image = iio.imread(my_video, index=idx, plugin="pyav")

    iio.imwrite(f"delete_me/frame_{idx:03d}.png", image)
A few notes:
you want to use pyav for the call to iio.improps, because it doesn't decode frames, so it is fast.
some codecs or containers (especially when streaming video) either don't report or can't report the number of frames stored within. In this case props.shape[0] will be -1 and we can't use this method.
normally I'd recommend using pyav for the imread call as well, but there is a bug in the plugin that causes an EoF exception when trying to index= to the last frame. This will be fixed by #855, though I can't say if it will make it into this weekly release or the next.
if the codec used doesn't guarantee a constant framerate, you won't get any speed advantage over iio.imiter because we can't safely predict how far into the video to seek. If you know that your video has a constant framerate, however, you can use the constant_framerate kwarg of the pyav plugin to speed things up. (IMPORTANT: if the video has a variable framerate and you set this anyway, the seek will not be accurate.)
As usual, you can use iio.imopen if you want to do all of these operations without reopening the video each time.
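For example, a rough sketch (untested) of the same selection loop using iio.imopen so that the file is opened only once; the plugin considerations from the notes above still apply:
import imageio.v3 as iio
import numpy as np

my_video = "imageio:cockatoo.mp4"

# Get the frame count without decoding (see the notes above)
props = iio.improps(my_video, plugin="pyav")
assert props.shape[0] != -1

with iio.imopen(my_video, "r") as file:
    for idx in np.linspace(0, props.shape[0] - 1, 5, dtype=int):
        image = file.read(index=idx)  # seek to and decode a single frame
        iio.imwrite(f"delete_me/frame_{idx:03d}.png", image)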

Video Overlay System with OpenCV

I'm trying to implement a video overlay solution such as this one: https://www.videologixinc.com/, where there is no delay in the original source video.
My problem is that, with OpenCV, all the necessary drawings (circles, text, etc.) require the entire frame to be processed and then returned before it can be displayed. Is there any solution where I could overlay the information on the original source without introducing delay or frame drops? (The additional information - drawings, text - can be displayed with a delay, but not the original video pipeline.)
Multiprocessing could make things faster, but I would still have delay or frame drops.
I was also wondering whether it would be better to have two simultaneous applications, maybe on two different computers - one to read the frames and do the processing, and another to somehow receive that information and overlay it on the original video pipeline.
Any thoughts? Thank you all!
[Diagram: an example of the data pipeline in this case, without interfering with the original video flow]

Presenting parts of a pre-prepared image array in Shady

I'm interested in migrating from Psychtoolbox to Shady for my stimulus presentation. I looked through the online docs, but it is not very clear to me how to replicate in Shady what I'm currently doing in Matlab.
What I do is actually very simple. For each trial,
I load from disk a single image (I do luminance linearization off-line), which contains all the frames I plan to display in that trial (the stimulus is 1000x1000 px, and I present 25 frames, hence the image is 5000x5000px. I only use BW images, so I have a single int8 value per pixel).
I transfer the entire image from the CPU to the GPU
At some point (externally controlled) I copy the first frame to the video buffer and present it
At some other point (externally controlled) I trigger the presentation of the remaining 24 frames (copying the relevant part of the image to video buffer for each video frame, and then calling flip()).
The external control happens by having another machine communicate with the stimulus presentation code over TCP/IP. After the control PC sends a command to the presentation PC and this is executed, the presentation PC needs to send back an acknowledgement message to the control PC. I need to send three ACK messages, one when the first frame appears on screen, one when the 2nd frame appears on screen, and one when the 25th frame appears on screen (this way the control PC can easily verify if a frame has been dropped).
In matlab I do this by calling the blocking method flip() to present a frame, and when it returns I send the ACK to the control PC.
That's it. How would I do that in shady? Is there an example that I should look at?
The places to look for this information are the docstrings of Shady.Stimulus and Shady.Stimulus.LoadTexture, as well as the included example script animated-textures.py.
Like most things Python, there are multiple ways to do what you want. Here's how I would do it:
w = Shady.World()
s = w.Stimulus( [frame00, frame01, frame02, ...], multipage=True )
where each frameNN is a 1000x1000-pixel numpy array (either floating-point or uint8).
Alternatively you can ask Shady to load directly from disk:
s = w.Stimulus('trial01/*.png', multipage=True)
where directory trial01 contains twenty-five 1000x1000-pixel image files, named (say) 00.png through 24.png so that they get sorted correctly. Or you could supply an explicit list of filenames.
Either way, whether you loaded from memory or from disk, the frames are all transferred to the graphics card in that call. You can then (time-critically) switch between them with:
s.page = 0 # or any number up to 24 in your case
Note that, due to our use of the multipage option, we're using the "page" animation mechanism (create one OpenGL texture per frame) instead of the default "frame" mechanism (create one 1000x25000 OpenGL texture) because the latter would exceed the maximum allowable dimensions for a single texture on many graphics cards. The distinction between these mechanisms is discussed in the docstring for the Shady.Stimulus class as well as in the aforementioned interactive demo:
python -m Shady demo animated-textures
To prepare the next trial, you might use .LoadPages() (new in Shady version 1.8.7). This loops through the existing "pages" loading new textures into the previously-used graphics-card texture buffers, and adds further pages as necessary:
s.LoadPages('trial02/*.png')
Now, you mention that your established workflow is to concatenate the frames into a single 5000x5000-pixel image. My solutions above assume that you have done the work of cutting it up again into 1000x1000-pixel frames, presumably using numpy calls (it sounds like you might be doing the equivalent in Matlab at the moment). If you're going to keep saving as 5000x5000, the best way of staying in control of things might indeed be to maintain your own code for cutting it up (a numpy sketch of that is given at the end of this answer). But it's worth mentioning that you could take the entirely different strategy of transferring it all in one go:
s = w.Stimulus('trial01_5000x5000.png', size=1000)
This loads the entire pre-prepared 5000x5000 image from disk (or again from memory, if you want to pass a 5000x5000 numpy array instead of a filename) into a single texture in the graphics card's memory. However, because of the size specification, the Stimulus will only show the lower-left 1000x1000-pixel portion of the array. You can then switch "frames" by shifting the carrier relative to the envelope. For example, if you were to say:
s.carrierTranslation = [-1000, -2000]
then you would be looking at the frame located one "column" across and two "rows" up in your 5x5 array.
As a final note, remember that you could take advantage of Shady's on-the-fly gamma-correction and dithering - they're happening anyway unless you explicitly disable them, though of course they have no physical effect if you leave the stimulus .gamma at 1.0 and use integer pixel values. So you could generate your stimuli as separate 1000x1000 arrays, each containing unlinearized floating-point values in the range [0.0, 1.0], and let Shady worry about everything beyond that.
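If you do keep the 5000x5000 mosaic and cut it up yourself, one possible numpy sketch (assuming the 25 frames are tiled row by row in reading order; big_image is a hypothetical 5000x5000 array):
import numpy as np

mosaic = np.asarray(big_image)                 # shape (5000, 5000)
frames = (mosaic.reshape(5, 1000, 5, 1000)     # (tile row, pixel row, tile column, pixel column)
                .transpose(0, 2, 1, 3)         # (tile row, tile column, pixel row, pixel column)
                .reshape(25, 1000, 1000))      # 25 frames in reading order
s = w.Stimulus(list(frames), multipage=True)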

Python: total number of video frames

I wish to use python to open a video file (avi, wmv, mp4), determine the total number of frames contained within the video, and save an arbitrary frame from within the video as an image file.
I have looked at pyffmpeg, but I do not know how to obtain the total number of frames contained in the video without iterating over each (which is incredibly slow). My code to obtain the number of frames in a video is given below:
import pyffmpeg

stream = pyffmpeg.VideoStream()
stream.open('video.avi')

frame_no = 0
# Very inefficient code:
while stream.GetFramNo(frame_no):
    frame_no = frame_no + 1
Is there a way in which I can do this efficiently? If not, please suggest an alternative extension or approach; code fragments would be a nice bonus.

How to extract the bitrate and other statistics of a video file with Python

I am trying to extract the prevailing bitrate of a video file (e.g. an .mkv file containing a movie) at a regular sampling interval of 1-10 seconds under conditions of normal playback - kind of like what you see in VLC's statistics window during playback of the file.
Can anyone suggest the best way to bootstrap the coding of such an analyser? Is there a library that provides an API to such information that people know of? Perhaps a Python wrapper for ffmpeg or equivalent tool that processes video files and can thereby extract such statistics.
What I am really aiming for is a CSV format file containing the seconds offset and the average or actual bitrate in KiB/s at that offset into the asset.
Update:
I built pyffmpeg and wrote the following spike:
import pyffmpeg

reader = pyffmpeg.FFMpegReader(False)
reader.open("/home/mark/Videos/BBB.m2ts", pyffmpeg.TS_VIDEO)
tracks = reader.get_tracks()

# Called for each frame
def obs(f):
    pass

tracks[0].set_observer(obs)
reader.run()
But observing the frame information (f) in the callback does not appear to give me any hooks to calculate per-second bitrates. In fact, bitrate calculations within pyffmpeg are measured across the entire file (filesize / duration), so the treatment within the library is very superficial. Clearly its focus is on extracting i-frames and other frame/GOP-specific work.
Something like these:
http://code.google.com/p/pyffmpeg/
http://pymedia.org/
You should be able to do this with gstreamer. http://pygstdocs.berlios.de/pygst-tutorial/seeking.html has an example of a simple media player. It calls
pos_int = self.player.query_position(gst.FORMAT_TIME, None)[0]
periodically. All you have to do is call query_position() a second time with gst.FORMAT_BYTES, do some simple math, and voila! Bitrate vs. time.
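Roughly (untested, using the old pygst API from that tutorial), the periodic sampling could look like this:
import gst  # old pygst bindings, as used in the linked tutorial


class BitrateSampler:
    """Computes the average bitrate (KiB/s) between successive calls."""

    def __init__(self, player):
        self.player = player
        self.last_time = 0.0
        self.last_bytes = 0

    def sample(self):
        pos_ns = self.player.query_position(gst.FORMAT_TIME, None)[0]
        pos_bytes = self.player.query_position(gst.FORMAT_BYTES, None)[0]
        seconds = pos_ns / float(gst.SECOND)
        dt = seconds - self.last_time
        rate = ((pos_bytes - self.last_bytes) / 1024.0) / dt if dt > 0 else 0.0
        self.last_time, self.last_bytes = seconds, pos_bytes
        return seconds, rate

# Call sampler.sample() from the same periodic timeout as the tutorial's
# position query and write each (offset, rate) pair as a CSV row.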
