How can I extract frames from a video file using Python3?
For example, I want to get 16 pictures from a video and combine them into a 4x4 grid.
I don't want 16 separate images at the end, I want one image containing 16 frames from the video.
import av
container = av.open('/home/uguraba/Downloads/equals/equals.mp4')
# pick the first video stream in the file
video = next(s for s in container.streams if s.type == 'video')
for packet in container.demux(video):
    for frame in packet.decode():
        if frame.index % 3000 == 0:
            frame.to_image().save('/home/uguraba/Downloads/equals/frame-%04d.jpg' % frame.index)
With this script I can get frames, but a lot of frames get saved. Can I take specific frames, like 5000, 7500 and 10000?
Also, how can I see the total number of frames?
Use PyMedia or PyAV to access the image data, and PIL/Pillow to manipulate it into the desired form.
These libraries have plenty of examples, so with basic knowledge of video muxing/demuxing and image editing you should be able to do it pretty quickly. It's not as complicated as it may seem at first.
Essentially, you demux the video stream into frames, going frame by frame.
You get the picture either in its original (e.g. JPEG) or raw form and push it into PIL/Pillow.
You do what you want with it (resizing, etc.); PIL provides everything necessary.
Then you paste it into one big image at the desired position.
That's all.
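The steps above can be sketched with PyAV and Pillow. This is a minimal sketch, not tested against your file: the helper names (`evenly_spaced`, `make_grid`, `contact_sheet`), the 320x180 tile size and the choice of 16 evenly spaced frames are illustrative assumptions. The total frame count is read from `stream.frames`, which some containers leave at 0, and frames are counted with `enumerate` rather than relying on `frame.index`.

```python
from PIL import Image


def evenly_spaced(total, count):
    """Pick `count` frame indices spread evenly over 0..total-1."""
    if total <= 0 or count <= 0:
        return []
    step = total / count
    return [int(step * i + step / 2) for i in range(count)]


def make_grid(images, cols, tile_size):
    """Paste PIL images into one sheet, `cols` per row, each resized to tile_size."""
    tile_w, tile_h = tile_size
    rows = (len(images) + cols - 1) // cols
    sheet = Image.new("RGB", (cols * tile_w, rows * tile_h))  # black background
    for i, img in enumerate(images):
        x, y = (i % cols) * tile_w, (i // cols) * tile_h
        sheet.paste(img.resize(tile_size), (x, y))
    return sheet


def contact_sheet(path, cols=4, rows=4, tile_size=(320, 180)):
    import av  # PyAV; imported here so the helpers above work without it

    with av.open(path) as container:
        stream = container.streams.video[0]
        total = stream.frames  # frame count stored in the container; 0 if unknown
        if total == 0:
            raise ValueError("frame count unknown; decode once to count frames")
        wanted = set(evenly_spaced(total, cols * rows))
        picked = []
        for index, frame in enumerate(container.decode(stream)):
            if index in wanted:
                picked.append(frame.to_image())  # PIL.Image in RGB
                if len(picked) == len(wanted):
                    break  # stop decoding once every wanted frame is collected
    return make_grid(picked, cols, tile_size)
```

For example, `contact_sheet('equals.mp4').save('grid.jpg')` would write a single 1280x720 image. To grab specific frames such as 5000, 7500 and 10000 instead, replace `wanted` with that set.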
You can do that with OpenCV 3 (its Python wrapper) and NumPy.
First capture the frames, keep each one in its own variable, and finally paste them all into a bigger matrix.
import numpy as np
import cv2
cap = cv2.VideoCapture(video_source)  # video_source: path to your video file
# capture 4 frames
_, frame1 = cap.read()
_, frame2 = cap.read()
_, frame3 = cap.read()
_, frame4 = cap.read()
cap.release()
# 'glue' the frames using NumPy vertical/horizontal stacks
big_frame = np.vstack((np.hstack((frame1, frame2)),
                       np.hstack((frame3, frame4))))
# Show the 4 frames as a single 2x2 image
cv2.imshow('result', big_frame)
cv2.waitKey(0)
To compile and install OpenCV3 and Numpy in Python3 you can follow this tutorial.
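The snippet above builds a 2x2 grid from the first four frames. For the 4x4 grid in the question, a sketch along these lines could seek to chosen frames and generalize the same hstack/vstack idea. Assumptions: `frame_count`, `grab_frames` and `tile` are hypothetical helper names; `CAP_PROP_FRAME_COUNT` comes from container metadata and may be approximate; and seeking with `CAP_PROP_POS_FRAMES` can be slow, or land near rather than exactly on the requested frame, with some codecs.

```python
import numpy as np


def tile(frames, cols):
    """Stack equally sized frames into a grid, `cols` frames per row."""
    rows = [np.hstack(frames[i:i + cols]) for i in range(0, len(frames), cols)]
    return np.vstack(rows)


def frame_count(path):
    import cv2  # imported here so tile() can be used without OpenCV

    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # from metadata; may be approximate
    cap.release()
    return total


def grab_frames(path, indices):
    import cv2

    cap = cv2.VideoCapture(path)
    frames = []
    for index in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(index))  # seek to the wanted frame
        ok, frame = cap.read()
        if not ok:  # past the end, or the seek failed
            break
        frames.append(frame)
    cap.release()
    return frames
```

For example, `indices = np.linspace(0, frame_count('video.mp4') - 1, 16).astype(int)` followed by `big = tile(grab_frames('video.mp4', indices), 4)`. Calling `grab_frames` with `[5000, 7500, 10000]` covers the question about specific frames. `tile` expects equally sized frames and a frame count that is a multiple of `cols`.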
You can implement a kind of "control panel" from 4 different video sources with something like this:
import numpy as np
import cv2
cam1 = cv2.VideoCapture(video_source1)
cam2 = cv2.VideoCapture(video_source2)
cam3 = cv2.VideoCapture(video_source3)
cam4 = cv2.VideoCapture(video_source4)
while True:
    more1, frame_cam1 = cam1.read()
    more2, frame_cam2 = cam2.read()
    more3, frame_cam3 = cam3.read()
    more4, frame_cam4 = cam4.read()
    # stop when any source runs out of frames or the user presses q/Q
    if not all([more1, more2, more3, more4]) or cv2.waitKey(1) & 0xFF in (ord('q'), ord('Q')):
        break
    # all four frames must have the same size for the stacks to work
    big_frame = np.vstack((np.hstack((frame_cam1, frame_cam2)),
                           np.hstack((frame_cam3, frame_cam4))))
    # Show the 4 sources as a single 2x2 image
    cv2.imshow('result', big_frame)
for cam in (cam1, cam2, cam3, cam4):
    cam.release()
cv2.destroyAllWindows()
print('END. One or more sources ended.')
How can I display multiple cameras in one window? (OpenCV)
Using the code from "Capturing video from two cameras in OpenCV at once", I open multiple cameras in separate windows, but I want to show them in one.
I found code for concatenating images, but it doesn't work with cameras.
The same question was asked here previously, but no answer was given.
You can do this using NumPy methods.
Option 1: np.vstack/np.hstack
Option 2: np.concatenate
Note 1: These methods will fail if your frames have different sizes, because you are trying to operate on matrices of different dimensions. That's why I resized one of the frames to match the other.
Note 2: OpenCV also has hconcat and vconcat methods, but I didn't try them in Python.
Example code (using my camera feed and a video):
import cv2
import numpy as np
capCamera = cv2.VideoCapture(0)
capVideo = cv2.VideoCapture("desk.mp4")
while True:
    isNextFrameAvail1, frame1 = capCamera.read()
    isNextFrameAvail2, frame2 = capVideo.read()
    if not isNextFrameAvail1 or not isNextFrameAvail2:
        break
    # cv2.resize expects (width, height), i.e. (shape[1], shape[0])
    frame2Resized = cv2.resize(frame2, (frame1.shape[1], frame1.shape[0]))
    # ---- Option 1 ----
    #numpy_vertical = np.vstack((frame1, frame2Resized))
    numpy_horizontal = np.hstack((frame1, frame2Resized))
    # ---- Option 2 ----
    #numpy_vertical_concat = np.concatenate((frame1, frame2Resized), axis=0)
    #numpy_horizontal_concat = np.concatenate((frame1, frame2Resized), axis=1)
    cv2.imshow("Result", numpy_horizontal)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
capCamera.release()
capVideo.release()
cv2.destroyAllWindows()
Result: (for horizontal concat)
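Instead of resizing as in Note 1, frames of different heights can also be padded with black rows before stacking, which keeps each frame's aspect ratio. This is a swapped-in alternative, not what the answer above does; `pad_to_height` and `side_by_side` are hypothetical names, and all frames must have the same number of channels (e.g. all BGR).

```python
import numpy as np


def pad_to_height(frame, height):
    """Add black rows at the bottom of an H x W x C frame until it is `height` tall."""
    missing = height - frame.shape[0]
    if missing <= 0:
        return frame
    return np.pad(frame, ((0, missing), (0, 0), (0, 0)), mode="constant")


def side_by_side(*frames):
    """hstack frames of different heights without distorting them."""
    height = max(f.shape[0] for f in frames)
    return np.hstack([pad_to_height(f, height) for f in frames])
```

Inside the loop above, `cv2.imshow("Result", side_by_side(frame1, frame2))` would then replace the resize step.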
Hi, could you give me some help here? My intent with this code is to apply a simple threshold to a series of images that make up a video. The main problem is that I can't store 237 frames in a single variable, in this case outputdata.
The video I want to threshold has a shape of (284,640,352,3). As you can see in threshOneFrame, I drop the RGB channels, because only grayscale is needed for the threshold method. Even so, this gives me an np.array bigger than 63Gb.
You may find it weird that I set the outputdata index manually inside the video.nextFrame() loop, but only this way could I make the code run without errors. But when I look at the value of the outputdata variable in the debugger, it shows all the nested np.arrays filled with -1.
import skvideo.io
import skvideo.datasets
import numpy as np
import argparse
import cv2
import time
def threshOneFrame(frame):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(frame, (7, 7), 0)
    thresh = cv2.threshold(blurred, 20, 255, cv2.THRESH_BINARY)
    thresh = thresh[1].astype(np.uint8)
    return thresh
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", required=True,
                help="path to input video file")
args = vars(ap.parse_args())
vidSh = skvideo.io.vread(args["video"])
vidSh = vidSh.shape[:3]
video = skvideo.io.FFmpegReader(args["video"])
outputdata = np.zeros(vidSh, dtype=np.int8)
for id, frame in enumerate(video.nextFrame()):
    frame = threshOneFrame(frame)
    outputdata[id] = frame
    if cv2.waitKey(1) == ord('q'):
        break
lapstime = time.time()
skvideo.io.vwrite("outputvideo" + time.ctime(lapstime) + ".mp4", outputdata)
As you say, the output video is far too large to hold all of its frames uncompressed in memory at once.
The solution is to write output frames to disk as you create them. One way to do that: create an skvideo.io.FFmpegWriter instance, then in the for loop call writer.writeFrame(frame) to write out each frame, and call writer.close() after the loop.
Another possibility: use, e.g., the PIL library to write each frame as an image file: Image.fromarray(frame).save(f'frame-{id:06}.png'). Then, outside of Python, run FFmpeg to convert the images into a video.
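A minimal sketch of the image-file suggestion: process each frame and save it as soon as it is produced, so no full-video array is kept in memory. Assumptions: `stream_to_pngs` and `binarize` are hypothetical helpers, and `binarize` is a plain-NumPy stand-in for the question's `cv2.threshold(..., THRESH_BINARY)` without the Gaussian blur; `threshOneFrame` can be passed as `process` instead. Incidentally, the -1 values in the question most likely come from `dtype=np.int8`: 255 does not fit in a signed byte, so NumPy stores it as -1, whereas `np.uint8` keeps 0/255.

```python
import os

import numpy as np
from PIL import Image


def binarize(gray, level=20):
    """NumPy equivalent of THRESH_BINARY: 255 where gray > level, else 0 (uint8)."""
    return np.where(gray > level, 255, 0).astype(np.uint8)


def stream_to_pngs(frames, out_dir, process=binarize):
    """Process each frame and write it to disk immediately; returns the frame count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for index, frame in enumerate(frames):
        mask = process(frame)
        Image.fromarray(mask).save(os.path.join(out_dir, f"frame-{index:06}.png"))
        count += 1
    return count
```

With the question's reader this would be `stream_to_pngs(video.nextFrame(), 'frames', process=threshOneFrame)`, followed by FFmpeg to assemble the PNGs into a video.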
I want to get each frame from a video as an image. Background: I have written a neural network that is able to recognize hand signs. Now I want to start a video stream where each image/frame of the stream is put through the neural network. To fit it into my neural network, I want to render each frame and reduce the image to 28*28 pixels. In the end it should look similar to this:
I have searched the web and found out that I can use cv2.VideoCapture to get the stream. But how can I pick each image of the stream, render it, and print the result back onto the screen? My code looks like this so far:
import numpy as np
import cv2
import tensorflow as tf
def imageToLabel(imgArr, checkpointLoad):
    new_model = tf.keras.models.load_model(checkpointLoad)
    imgArrNew = imgArr.reshape(1, 28, 28, 1) / 255
    prediction = new_model.predict(imgArrNew)
    label = np.argmax(prediction)
    return label
cap = cv2.VideoCapture(0)
# Todo: each Frame/Image from the video should be saved as a variable and open imageToLabel()
# Todo: before the image is handed to the method, it needs to be translated into a 28*28 np Array
# Todo: the returned Label should be printed onto the video (otherwise it can be )
i = 0
while True:
    # Capture frame-by-frame
    # Load model once and pass it as a parameter
    ret, frame = cap.read()
    if not ret:
        break
    i += 1
    image = cv2.imwrite('database/{index}.png'.format(index=i), frame)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
frame is the color image you get from the stream (OpenCV stores it in BGR channel order).
gray is the grayscale converted image.
I suppose your network takes grayscale images because of its input shape. Therefore you need to first resize the image to (28,28) and then pass it to your imageToLabel function:
resizedImg = cv2.resize(gray,(28,28))
label = imageToLabel(resizedImg,yourModel)
Now that you know the prediction, you can draw it on the frame using e.g. cv2.putText() and then show the resulting image instead of the plain frame.
If you want to use parts of the image for your network you can slice the image like this:
slicedImg = gray[50:150,50:150]
resizedImg = cv2.resize(slicedImg,(28,28))
label = imageToLabel(resizedImg,yourModel)
If you're not familiar with indexing in Python, you might want to take a look at this.
Also, if you want it to look like the linked video, you can draw a green (0,255,0) rectangle from e.g. (50,50) to (150,150) with cv2.rectangle().
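Putting the answer's steps together (grayscale, slice, resize to 28x28, predict, draw), a capture loop could look like this sketch. Assumptions: `model` is a Keras model loaded once beforehand (e.g. with `tf.keras.models.load_model`) instead of inside `imageToLabel` on every frame; the region of interest (50..150 on both axes) and the helper names `to_model_input` and `run` are illustrative.

```python
import numpy as np


def to_model_input(img28):
    """Turn a 28x28 grayscale uint8 image into a (1, 28, 28, 1) float batch in [0, 1]."""
    return img28.reshape(1, 28, 28, 1).astype(np.float32) / 255.0


def run(model, roi=(50, 150, 50, 150), camera=0):
    import cv2  # imported here so to_model_input() can be used without OpenCV

    y0, y1, x0, x1 = roi
    cap = cv2.VideoCapture(camera)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray[y0:y1, x0:x1], (28, 28))  # slice, then shrink
        label = int(np.argmax(model.predict(to_model_input(small))))
        # draw the region and the predicted label onto the color frame in place
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
        cv2.putText(frame, str(label), (x0, y0 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```

Loading the model once keeps the per-frame cost down to a single `predict` call.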
I am attempting to use opencv_python to break an mp4 file down into its frames so I can later open them with Pillow, or at least use the images to run my own methods on them.
I understand that the following snippet of code gets a frame from a live video or a recorded video.
import cv2
cap = cv2.VideoCapture("myfile.mp4")
boolean, frame = cap.read()
What exactly does the read function return, and how can I create an array of images that I can modify?
Adapted from "How to process images of a video, frame by frame, in video streaming using OpenCV and Python". Untested. The frames are read as NumPy arrays and appended to a list, which is converted to a NumPy array once all the frames have been read.
import cv2
import numpy as np
images = []
cap = cv2.VideoCapture("./out.mp4")
while not cap.isOpened():
    cap = cv2.VideoCapture("./out.mp4")
    print("Wait for the header")
pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
while True:
    frame_ready, frame = cap.read()  # get the frame
    if frame_ready:
        # The frame is ready and already captured
        # cv2.imshow('video', frame)
        # store the current frame (it is already a NumPy array)
        images.append(frame)
        pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
    else:
        # The next frame is not ready, so we try to read it again
        cap.set(cv2.CAP_PROP_POS_FRAMES, pos_frame - 1)
        print("frame is not ready")
        # It is better to wait for a while for the next frame to be ready
        cv2.waitKey(1000)
    if cv2.waitKey(10) == 27:
        break
    if cap.get(cv2.CAP_PROP_POS_FRAMES) == cap.get(cv2.CAP_PROP_FRAME_COUNT):
        # If the number of captured frames is equal to the total number of frames,
        # we stop
        break
cap.release()
all_frames = np.array(images)
Simply use this code to get an array of frames from your video:
import cv2
import numpy as np
frames = []
video = cv2.VideoCapture("spiderino_turning.mp4")
while True:
    read, frame = video.read()
    if not read:
        break
    frames.append(frame)
video.release()
frames = np.array(frames)
But regarding your question, video.read() returns two values. The first one (read in the example code) indicates whether the frame was read successfully (True on success, False on any error). The second value is the frame, which is empty (None) if the read attempt was unsuccessful, and a 3D array (i.e., a color image) otherwise.
But why can a read attempt be unsuccessful?
If you are reading from a camera, any problem with the camera (e.g., the cable is disconnected or the camera's battery is dead) can cause an error.
If you are reading from a video, the read attempt will fail when all the frames are read, and there are no more.
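Two practical notes on the array approach, as a sketch. OpenCV returns frames in BGR channel order, while Pillow (which the question wants to use) expects RGB, so the channels need to be swapped before `Image.fromarray`. Also, keeping every frame in memory costs N x H x W x 3 bytes: 1,000 frames of 1920x1080 already take about 6.2 GB. The helper names below are hypothetical.

```python
import numpy as np
from PIL import Image


def bgr_to_rgb(frame):
    """Reverse the channel axis of an OpenCV (BGR) frame to get RGB."""
    return np.ascontiguousarray(frame[..., ::-1])


def frames_to_pil(frames):
    """Convert OpenCV frames (H x W x 3, uint8, BGR) into Pillow images."""
    return [Image.fromarray(bgr_to_rgb(f)) for f in frames]
```

For example, `frames_to_pil(frames)[0].save('first.png')` would save the first frame with correct colors.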
I am looking for a way to concatenate a directory of image files (e.g., JPEGs) into a movie file (MOV, MP4, AVI) with Python. Ideally, this would also allow me to take multiple JPEGs from that directory and "paste" them into a grid that forms one frame of the movie. Which modules could achieve this?
You could use the Python interface of OpenCV; in particular, a VideoWriter could probably do the job. From what I understand of the docs, the following would do what you want:
w = cvCreateVideoWriter(filename, -1, <your framerate>,
<your frame size>, is_color=1)
and, in a loop, for each file:
cvWriteFrame(w, frame)
Note that I have not tried this code, but I think that I got the idea right. Please tell me if it works.
Here's a cut-down version of a script I have that took frames from one video, modified them (that code is taken out), and wrote them to another video. Maybe it'll help.
import cv2
fourcc = cv2.VideoWriter_fourcc(*'XVID')
# the frame size (width, height) must match the frames you write
out = cv2.VideoWriter('out_video.avi', fourcc, 24, (704, 240))
c = cv2.VideoCapture('in_video.avi')
while True:
    _, f = c.read()
    if f is None:
        break
    f2 = f.copy()  # make a copy of the frame
    # do a bunch of stuff (missing)
    out.write(f2)  # write the frame to the output video
c.release()
out.release()
If you have a bunch of images, load them in a loop and just write one image after another to your vid.
I finally got a working version of the project that led me to this question.
Now I want to contribute the knowledge I gained.
Here is my solution for taking all pictures in the current directory and converting them into a video, centering each one on a black background, so it works for images of different sizes.
import glob
import cv2
import numpy as np
DESIRED_SIZE = (800, 600)
SLIDE_TIME = 5  # Seconds each image
FPS = 24
fourcc = cv2.VideoWriter.fourcc(*'X264')
writer = cv2.VideoWriter('output.avi', fourcc, FPS, DESIRED_SIZE)
for file_name in glob.iglob('*.jpg'):
    img = cv2.imread(file_name)
    # Resize image to fit into DESIRED_SIZE
    height, width, _ = img.shape
    proportion = min(DESIRED_SIZE[0]/width, DESIRED_SIZE[1]/height)
    new_size = (int(width*proportion), int(height*proportion))
    img = cv2.resize(img, new_size)
    # Center the image in a black frame of DESIRED_SIZE
    target_size_img = np.zeros((DESIRED_SIZE[1], DESIRED_SIZE[0], 3), dtype='uint8')
    width_offset = (DESIRED_SIZE[0] - new_size[0]) // 2
    height_offset = (DESIRED_SIZE[1] - new_size[1]) // 2
    target_size_img[height_offset:height_offset+new_size[1],
                    width_offset:width_offset+new_size[0]] = img
    # repeat the same frame so each image stays on screen for SLIDE_TIME seconds
    for _ in range(SLIDE_TIME * FPS):
        writer.write(target_size_img)
writer.release()
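For the grid part of the question, a sketch along these lines could tile batches of images into one frame before writing it. Assumptions: the helper names, the 2x2 layout, the 400x300 tile size, the `mp4v` codec and 1 frame per second are illustrative; files are taken in sorted name order, and unreadable files (for which `cv2.imread` returns `None`) are skipped.

```python
import glob

import numpy as np


def grid_frame(tiles, cols):
    """Tile equally sized H x W x C images row by row; empty slots stay black."""
    h, w, c = tiles[0].shape
    rows = (len(tiles) + cols - 1) // cols
    canvas = np.zeros((rows * h, cols * w, c), dtype=tiles[0].dtype)
    for i, tile in enumerate(tiles):
        r, col = divmod(i, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = tile
    return canvas


def images_to_grid_video(pattern, out_path, cols=2, per_frame=4,
                         tile_size=(400, 300), fps=1):
    import cv2  # imported here so grid_frame() can be used without OpenCV

    tw, th = tile_size
    rows = (per_frame + cols - 1) // cols
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (cols * tw, rows * th))
    images = [cv2.imread(f) for f in sorted(glob.glob(pattern))]
    images = [img for img in images if img is not None]
    for start in range(0, len(images), per_frame):
        tiles = [cv2.resize(img, tile_size) for img in images[start:start + per_frame]]
        # pad the last batch with black tiles so every frame has the same size
        tiles += [np.zeros((th, tw, 3), np.uint8)] * (per_frame - len(tiles))
        writer.write(grid_frame(tiles, cols))
    writer.release()
```

For example, `images_to_grid_video('*.jpg', 'grid.mp4')` would write one 800x600 frame per group of four images.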
Is it actually important to you that the solution uses Python and produces a movie file? Or are these just your expectations of what a solution would look like?
If you just want to play back a bunch of JPEG files as a movie, you can do it without Python, and without cluttering up your computer with .avi/.mov/.mp4 files, by going to and using your mouse to select image files from your hard drive. The "movie" plays back in your web browser.