How to get a video's data with Python

I need to retrieve all the information about a video, such as frame rate, size, bit depth, length, and also all of the frame data (the pixels, e.g. as 2D numpy arrays). Is there a function in Python that can load a video and then retrieve all of its data? Thanks a lot!
I know there is such a function in MATLAB and I'm looking for a way of doing this in Python.

You'll benefit from the OpenCV library. You can install it several ways, including pip (pip install opencv-python) or conda. Here are a few previous SO threads discussing your question; a short sketch follows the links below.
Looping through frames: the numpy array you're looking for is the frame variable immediately underneath the while statement
Retrieving frame rate and video duration
Retrieving frame rate
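As a starting point, here's a minimal sketch of pulling the basic properties and the raw frames with OpenCV (the file name is a placeholder; note that the frame count reported by some containers is only an estimate):

import cv2

cap = cv2.VideoCapture("video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # may be approximate
duration_s = n_frames / fps if fps else None

while True:
    ok, frame = cap.read()       # frame is a height x width x 3 numpy array (BGR)
    if not ok:
        break
    # ... process frame ...
cap.release()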


Programmatically accessing PTS times in MP4 container

Background
For a research project, we are recording video data from two cameras and feeding a synchronization pulse directly into the microphone ADC every second.
Problem
We want to derive a frame time stamp in the clock of the pulse source for each camera frame to relate the camera images temporally. With our current methods (see below), we get a frame offset of around 2 frames between the cameras. Unfortunately, inspection of the video shows that we are clearly 6 frames off (at least at one point) between the cameras.
I assume that this is because we are relating audio and video signal wrong (see below).
Approach I think I need help with
I have read that the MP4 container should carry PTS times for both video and audio. How do we access those programmatically? Python would be perfect, but if we have to call ffmpeg via system calls, we can do that too ...
What we currently fail with
The original idea was to find video and audio times as
audio_sample_times = np.arange(N_audiosamples) / audio_sampling_rate
video_frame_times = np.arange(N_videoframes) / video_frame_rate
then identify audio_pulse_times in audio_sample_times base, calculate the relative position of each video_time to the audio_pulse_times around it, and select the same relative value to the corresponding source_pulse_times.
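Concretely, that mapping step boils down to piecewise-linear interpolation, roughly like this sketch (pulse detection itself is omitted; audio_pulse_times and source_pulse_times are assumed to be already-known arrays of equal length):

import numpy as np

# audio_pulse_times: pulse times detected in the audio timeline (seconds)
# source_pulse_times: emission times of the same pulses in the source clock (seconds)
# For each video frame time, find its relative position between the surrounding
# audio pulses and map it to the same relative position between the source pulses.
video_times_in_source_clock = np.interp(video_frame_times,
                                        audio_pulse_times,
                                        source_pulse_times)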
However, a first indication that this approach is problematic is that, for some videos, N_audiosamples/audio_sampling_rate and N_videoframes/video_frame_rate differ by several frames' worth of time.
What I have found by now
OpenCV's cv2.CAP_PROP_POS_MSEC seems to compute exactly what we already compute, rather than accessing any PTS ...
Edit: What I took from the winning answer
import av
import numpy as np
from tqdm import tqdm

container = av.open(video_path)
signal = []
audio_sample_times = []
video_sample_times = []
for frame in tqdm(container.decode(video=0, audio=0)):
    if isinstance(frame, av.audio.frame.AudioFrame):
        # assumes the audio stream's time_base is 1/sample_rate, i.e. pts counts samples
        sample_times = (frame.pts + np.arange(frame.samples)) / frame.sample_rate
        audio_sample_times += list(sample_times)
        signal_f_ch0 = frame.to_ndarray().reshape((-1, len(frame.layout.channels))).T[0]
        signal += list(signal_f_ch0)
    elif isinstance(frame, av.video.frame.VideoFrame):
        # pts * time_base converts the frame's PTS into seconds
        video_sample_times.append(float(frame.pts * frame.time_base))

signal = np.abs(np.array(signal))
audio_sample_times = np.array(audio_sample_times)
video_sample_times = np.array(video_sample_times)
Unfortunately, in my particular case, all pts are consecutive and gapless, so the result is the same as with the naive solution ...
From visual clues, we identified a ~10 s section of the videos somewhere in which they desynchronize, but we can't find any trace of that in the data.
You need to run ffprobe to retrieve the PTS times. I don't know the exact command, but if you're ok with another package, try ffmpegio:
pip install ffmpegio-core
# or, if you also want to use it to read video frames & audio samples:
pip install ffmpegio
If you're on Windows, see this doc on where ffmpeg.exe can be found automatically.
Then you can run
import ffmpegio
frames = ffmpegio.probe.frames('video.mp4', intervals=10)
This will return the frame info as a list of dicts for the first 10 packets (of mixed streams, in pts order). If you drop the intervals argument, it retrieves every frame (which can take a long time).
Inspect the dicts in frames and decide which entries you need (say 'media_type', 'stream_index', 'pts', and 'pts_time'), then add an entries argument containing them:
frames = ffmpegio.probe.frames('video.mp4', intervals=10,
                               entries=['media_type', 'stream_index', 'pts', 'pts_time'])
Once you're happy with what it returns, incorporate it into your program.
The intervals argument accepts many different formats, please read the doc.
What this or any other FFmpeg-based approach does not offer is getting this info together with the frame data. You need to read the frame timing data separately and merge it with the data yourself. If you prefer a solution with more control (but perhaps more coding), look into pyav, which interfaces the underlying libraries of FFmpeg. I'm fairly certain you can retrieve pts together with the frame data there.
Disclaimer: this function has not been tested extensively, so you may encounter an issue. If you do, please report it on GitHub and I'll fix it ASAP.
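If you go the pyav route instead, a minimal sketch for collecting the video PTS values without decoding any frames might look like this (untested; demux order follows the file, so containers with B-frames may need the final sort):

import av

container = av.open("video.mp4")              # path is a placeholder
video_stream = container.streams.video[0]
pts_seconds = []
for packet in container.demux(video_stream):
    if packet.pts is None:                    # skip the flush packet at end of stream
        continue
    pts_seconds.append(float(packet.pts * video_stream.time_base))
pts_seconds.sort()                            # demux order may differ from presentation order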

How to get the last frame of a video with imageio in python?

I want to grab 5 frames of a video distributed evenly, including the first and last frame. The answer to this helped me loop over the video and get frames. However, I didn't find out how to know when the last frame is coming up. Also, looping over the whole video seems a bit expensive.
Python - Extracting and Saving Video Frames
Is there a better way of getting 5 specific frames (e.g. every 20% of the video), or at least an easy way of getting the total frame count? I already tried multiplying duration and fps from the metadata, but those numbers seem to be rounded and give a wrong result.
Thank you for your help.
Whether or not this is possible depends on your container and codec being used. Assuming the codec allows retrieving the total number of frames, you can do something like this:
import imageio.v3 as iio
import numpy as np

my_video = "imageio:cockatoo.mp4"
props = iio.improps(my_video, plugin="pyav")

# Make sure the codec knows the number of frames
assert props.shape[0] != -1

for idx in np.linspace(0, props.shape[0] - 1, 5, dtype=int):
    # imageIO < 2.21.0
    image = iio.imread(my_video, index=idx)
    # imageIO > 2.21.0
    # image = iio.imread(my_video, index=idx, plugin="pyav")
    iio.imwrite(f"delete_me/frame_{idx:03d}.png", image)
A few notes:
you want to use pyav for the call to iio.improps, because it doesn't decode frames, so it is fast.
some codecs or containers (especially when streaming video) either don't report or can't report the number of frames stored within. In this case props.shape[0] will be -1 and we can't use this method.
normally I'd recommend using pyav for the imread call as well, but there is a bug in the plugin that causes an EoF exception when trying to index= to the last frame. This will be fixed by #855, though I can't say if it will make it into this weekly release or the next.
if the codec used doesn't guarantee constant framerate, you won't get any speed advantages over iio.imiter because we can't safely predict how far into the video to seek. If you know that your video has a constant framerate, however, you can use the constant_framerate kwarg of the pyav plugin to speed things up. (IMPORTANT: if the video has variable framerate and you set this anyway the seek will not be accurate.)
As usual, you can use iio.imopen if you want to do all of these operations without reopening the video each time.
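For that last point, a rough sketch (untested; the exact read/properties behaviour of the pyav plugin can differ between imageio versions, and the last-frame seek bug mentioned above may still apply):

import imageio.v3 as iio
import numpy as np

my_video = "imageio:cockatoo.mp4"
props = iio.improps(my_video, plugin="pyav")   # cheap: no frames are decoded
assert props.shape[0] != -1

with iio.imopen(my_video, "r", plugin="pyav") as file:
    for idx in np.linspace(0, props.shape[0] - 1, 5, dtype=int):
        image = file.read(index=int(idx))      # the video stays open between reads
        iio.imwrite(f"delete_me/frame_{idx:03d}.png", image)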

Object tracking without openCV

I am trying to build an algorithm to detect some objects and track them over time. My input data is a multi-stack TIFF file, which I read as a np array. I apply a U-Net model to create a binary mask and then identify the coordinates of single objects using scipy.
Up to here everything more or less works, but I just cannot get my head around the tracking. I have a dictionary where keys are the frame numbers and values are lists of tuples. Each tuple contains the coordinates of one object.
Now I have to link the objects together, which on paper seems pretty simple. I was hoping there was a function or a package to do so (ideally something similar to TrackMate or M2track in ImageJ), but I cannot find anything like that. I am considering writing my own nearest-neighbour tool, but I'd like to know whether there is a less painful way (and I would also like to be able to use more advanced metrics).
The other option I considered is using cv2, but this would require converting the data in a format cv2 likes, which will significantly slow down the code. In addition, I would like to keep the data as close as possible to the original input, so no cv2 for me.
I solved it using trackpy.
http://soft-matter.github.io/trackpy/v0.5.0/
trackpy properly reads multi-stack TIFF files (OpenCV can't).
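For reference, a minimal sketch of feeding the per-frame coordinate dictionary from the question into trackpy's linking step (untested; column names follow trackpy's convention, and search_range/memory are placeholder values):

import pandas as pd
import trackpy as tp

# coords: dict mapping frame number -> list of (x, y) tuples, as described above
rows = [{"frame": f, "x": x, "y": y}
        for f, points in coords.items()
        for x, y in points]
features = pd.DataFrame(rows)

# Link detections across frames by nearest-neighbour matching;
# memory lets an object vanish for a few frames and keep its track ID.
tracks = tp.link(features, search_range=15, memory=3)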

Is there a way to generate a gif in python without consuming an excessive amount of RAM?

I'm writing a little application to generate a GIF from a kifu file (a file format used to record games of Japanese chess). I'm currently using Matplotlib to draw the board and the pieces, and the matplotlib.animation.FuncAnimation class combined with numpngw.AnimatedPNGWriter to write the GIF. However, it uses more than 800 MB of RAM to generate a single GIF with 80 frames. On reflection, that value is not surprising, because (from my understanding) each frame is 1700x1000 pixels in color, so keeping every frame in memory needs at least 1700*1000*80*(bytes per pixel), which is already around 400 MB at 3 bytes per pixel.
Is there a way to reduce this memory usage, either with matplotlib or with another library? I suppose I need to compress or write out frames right after creating them instead of keeping them all raw, but I can't figure out how to do that.
Thank you very much
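One direction consistent with that idea (a rough, untested sketch; figure size, dpi, and frame duration below are placeholders) is to hand each frame to an incremental GIF writer as soon as it is rendered, so only one raw frame is ever held in memory:

import imageio.v2 as imageio
import matplotlib.pyplot as plt
import numpy as np

with imageio.get_writer("game.gif", mode="I", duration=0.5) as writer:
    for move in range(80):
        fig, ax = plt.subplots(figsize=(17, 10), dpi=100)
        # ... draw the board position after this move ...
        fig.canvas.draw()
        frame = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # drop the alpha channel
        writer.append_data(frame)
        plt.close(fig)  # free the figure so frames don't accumulate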

Video Overlay System with OpenCV

I'm trying to implement a video overlay solution such as this one: https://www.videologixinc.com/, where there is no delay in the original source video.
My problem is that, with OpenCV, all the necessary drawings (circles, text, etc.) require the entire frame to be processed and returned before it can be displayed. Is there any solution where I could overlay the information on the original source without introducing delay or frame drops? (The additional information can be displayed with a delay - drawings, text - but not the original video pipeline.)
Multiprocessing could make things faster, but I would still have delay or frame drops.
I was also thinking it might be better to have two simultaneous applications, maybe on two different computers - one to read the frames and do the processing, and another to just receive, somehow, the information and overlay it on the original video pipeline.
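For illustration, a rough single-machine sketch of that decoupling (untested; a webcam at index 0 stands in for the real source): the capture/display loop never waits for the detector and simply draws whatever overlay result is currently available, even if it lags a few frames behind.

import queue
import threading
import cv2

frames_to_process = queue.Queue(maxsize=1)
latest_overlay = {"shapes": None}          # most recent detector output, may be stale

def detector():
    # Heavy processing runs here, completely decoupled from the display loop.
    while True:
        frame = frames_to_process.get()
        # ... run detection on frame ...
        latest_overlay["shapes"] = [((100, 100), 30)]   # placeholder result

threading.Thread(target=detector, daemon=True).start()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if not frames_to_process.full():
        frames_to_process.put_nowait(frame.copy())      # hand a copy to the detector
    shapes = latest_overlay["shapes"]
    if shapes:                                          # draw the latest available result
        for center, radius in shapes:
            cv2.circle(frame, center, radius, (0, 255, 0), 2)
    cv2.imshow("video", frame)
    if cv2.waitKey(1) == 27:                            # Esc to quit
        break
cap.release()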
Any thoughts? Thank you all!
[Diagram: an example data pipeline in this case, without interfering with the original video flow]
