I am creating training data where the idea is to capture images from a PiCamera and store them in a JSON encoded file that I can load later into my neural network for training.
Since I'm using a Pi board, memory is obviously a constraint, so I can't capture a large number of images and then serialize them all at once.
I would like to serialize each image as I capture it, so that in case of failure I will not have lost all the data.
def trainer(LEFT, RIGHT):
    # capture frames from the camera
    with open('robot-train.json', 'w') as train_file:
        writer = csv.writer(open('robot-train.csv', 'w'), delimiter=',', quotechar='|')
        for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
            data = {}
            # grab the raw NumPy array representing the image, then initialize the timestamp
            data['image'] = frame.array
            data['time'] = time.time()
            data['left'] = LEFT
            data['right'] = RIGHT
            # human readable version
            writer.writerow([data['time'], data['left'], data['right']])
            train_file.write(json.dumps(data, cls=NumpyEncoder))
            # prepare for next image
            rawCapture.truncate(0)
However, I'm getting the error:
  File "/home/pi/pololu-rpi-slave-arduino-library/pi/xiaonet.py", line 30, in default
    return json.JSONEncoder(self, obj)
RuntimeError: maximum recursion depth exceeded
What is the correct way to do this?
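For reference, this particular recursion usually happens when default() hands back a brand-new encoder object instead of delegating to the base class, so the encoder keeps being asked to serialize its own output. A minimal sketch of an encoder that converts NumPy arrays to lists (an assumption, since the original NumpyEncoder class isn't shown) could look like this:

import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder: turn NumPy arrays into plain lists."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Delegate everything else to the base class instead of constructing
        # a new encoder, which is what triggers the infinite recursion.
        return json.JSONEncoder.default(self, obj)

Writing each record on its own line, e.g. train_file.write(json.dumps(data, cls=NumpyEncoder) + '\n'), also keeps every captured frame on disk as soon as it is taken, which matches the goal of not losing data on failure.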
I am trying to write a function that creates a new MP4 video from a set of frames taken from another video. The frames will be given in PIL.Image format and are often cropped to include only a part of the input video, but all images will have the same dimensions.
What I have tried:
import av

def modify_image(img):
    return img

test_input = av.open('input_vid.mp4')
test_output = av.open('output_vid.mp4', 'w')

in_stream = test_input.streams.video[0]
out_stream = test_output.add_stream(template=in_stream)

for frame in test_input.decode(in_stream):
    img_frame = frame.to_image()
    # Some possible modifications to img_frame...
    img_frame = modify_image(img_frame)
    out_frame = av.VideoFrame.from_image(img_frame)
    out_packet = out_stream.encode(out_frame)
    print(out_packet)

test_input.close()
test_output.close()
And the error that I got:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[23], line 11
      8 img_frame = frame.to_image()
     10 out_frame = av.VideoFrame.from_image(img_frame)
---> 11 out_packet = out_stream.encode(out_frame)
     12 print(out_packet)
     15 test_input.close()

File av\stream.pyx:153, in av.stream.Stream.encode()
File av\codec\context.pyx:490, in av.codec.context.CodecContext.encode()
File av\frame.pyx:52, in av.frame.Frame._rebase_time()

ValueError: Cannot rebase to zero time.
I followed the answer given in "How to create a video out of frames without saving it to disk using python?" and ran into the same issue.
Comparing the original VideoFrame and the VideoFrame created from the image, I found that the pts values of the new frames are None instead of integer values. Overwriting the pts value of the new frame with the original value still causes the same error, and overwriting the dts value of the new frame gives the following error:
AttributeError: attribute 'dts' of 'av.frame.Frame' objects is not writable
Is there a way to modify the dts value, or possibly another method to create a video from a set of PIL.Image objects?
Using add_stream(template=in_stream) is only documented in the Remuxing example.
It's probably possible to use template=in_stream when re-encoding, but we have to set the time-base, and set the PTS timestamp of each encoded packet.
I found a discussion here (I didn't try it).
Instead of using template=in_stream, we may stick to the code sample from my other answer, and copy a few parameters from the input stream to the output stream.
Example:
in_stream = test_input.streams.video[0]
codec_name = in_stream.codec_context.name # Get the codec name from the input video stream.
fps = in_stream.codec_context.rate # Get the framerate from the input video stream.
out_stream = test_output.add_stream(codec_name, str(fps))
out_stream.width = in_stream.codec_context.width # Set frame width to be the same as the width of the input stream
out_stream.height = in_stream.codec_context.height # Set frame height to be the same as the height of the input stream
out_stream.pix_fmt = in_stream.codec_context.pix_fmt # Copy pixel format from input stream to output stream
#stream.options = {'crf': '17'} # Select low crf for high quality (the price is larger file size).
We also have to "Mux" the video frame:
test_output.mux(out_packet)
At the end, we have to flush the encoder before closing the file:
out_packet = out_stream.encode(None)
test_output.mux(out_packet)
Code sample:
import av

# Build input_vid.mp4 using FFmpeg CLI (for testing):
# ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=100 -vcodec libx264 -crf 10 -pix_fmt yuv444p input_vid.mp4

test_input = av.open('input_vid.mp4')
test_output = av.open('output_vid.mp4', 'w')

in_stream = test_input.streams.video[0]

#out_stream = test_output.add_stream(template=in_stream)  # Using template=in_stream is not working (probably meant to be used for re-muxing and not for re-encoding).

codec_name = in_stream.codec_context.name  # Get the codec name from the input video stream.
fps = in_stream.codec_context.rate  # Get the framerate from the input video stream.
out_stream = test_output.add_stream(codec_name, str(fps))
out_stream.width = in_stream.codec_context.width  # Set frame width to be the same as the width of the input stream
out_stream.height = in_stream.codec_context.height  # Set frame height to be the same as the height of the input stream
out_stream.pix_fmt = in_stream.codec_context.pix_fmt  # Copy pixel format from input stream to output stream
#stream.options = {'crf': '17'}  # Select low crf for high quality (the price is larger file size).

for frame in test_input.decode(in_stream):
    img_frame = frame.to_image()
    out_frame = av.VideoFrame.from_image(img_frame)  # Note: to_image and from_image is not required in this specific example.
    out_packet = out_stream.encode(out_frame)  # Encode video frame
    test_output.mux(out_packet)  # "Mux" the encoded frame (add the encoded frame to MP4 file).
    print(out_packet)

# Flush the encoder
out_packet = out_stream.encode(None)
test_output.mux(out_packet)

test_input.close()
test_output.close()
I have a list of image frames, frames, that I would like to be able to display in a Streamlit application: st.video(frames_converted).
Challenges:
Streamlit serves HTML5, and HTML5 video requires H.264 encoding.
I want to complete all processing in memory (as opposed to the much more common approach of saving to a temporary file).
Current attempt:
## Convert frames to video for streamlit
height, width, layers = frames[0].shape
codec = cv.VideoWriter_fourcc(*'H264')
fps = 1
video = cv.VideoWriter("temp_video", codec, fps, (width, height))  # Initialize video object

for frame in frames:
    video.write(frame)

cv.destroyAllWindows()
video.release()

st.video(video)
Current Blocker
RuntimeError: Invalid binary data format: <class 'cv2.VideoWriter'>
We may encode an "in memory" MP4 video using PyAV, as described in my following answer - the video is stored in a BytesIO object.
We may pass the BytesIO object as input to Streamlit (or convert the BytesIO object to a bytes array and use the array as input).
Code sample:
import numpy as np
import cv2  # OpenCV is used only for writing text on image (for testing).
import av
import io
import streamlit as st

n_frames = 100  # Select number of frames (for testing).
width, height, fps = 192, 108, 10  # Select video resolution and framerate.

output_memory_file = io.BytesIO()  # Create BytesIO "in memory file".

output = av.open(output_memory_file, 'w', format="mp4")  # Open "in memory file" as MP4 video output
stream = output.add_stream('h264', str(fps))  # Add H.264 video stream to the MP4 container, with framerate = fps.
stream.width = width  # Set frame width
stream.height = height  # Set frame height
#stream.pix_fmt = 'yuv444p'  # Select yuv444p pixel format (better quality than default yuv420p).
stream.pix_fmt = 'yuv420p'  # Select yuv420p pixel format for wider compatibility.
stream.options = {'crf': '17'}  # Select low crf for high quality (the price is larger file size).


def make_sample_image(i):
    """ Build synthetic "raw BGR" image for testing """
    p = width//60
    img = np.full((height, width, 3), 60, np.uint8)
    cv2.putText(img, str(i+1), (width//2-p*10*len(str(i+1)), height//2+p*10), cv2.FONT_HERSHEY_DUPLEX, p, (255, 30, 30), p*2)  # Blue number
    return img


# Iterate the created images, encode and write to MP4 memory file.
for i in range(n_frames):
    img = make_sample_image(i)  # Create OpenCV image for testing (resolution 192x108, pixel format BGR).
    frame = av.VideoFrame.from_ndarray(img, format='bgr24')  # Convert image from NumPy Array to frame.
    packet = stream.encode(frame)  # Encode video frame
    output.mux(packet)  # "Mux" the encoded frame (add the encoded frame to MP4 file).

# Flush the encoder
packet = stream.encode(None)
output.mux(packet)
output.close()

output_memory_file.seek(0)  # Seek to the beginning of the BytesIO.
#video_bytes = output_memory_file.read()  # Convert BytesIO to bytes array
#st.video(video_bytes)
st.video(output_memory_file)  # Streamlit supports BytesIO object - we don't have to convert it to bytes array.

# Write BytesIO from RAM to file, for testing:
#with open("output.mp4", "wb") as f:
#    f.write(output_memory_file.getbuffer())

#video_file = open('output.mp4', 'rb')
#video_bytes = video_file.read()
#st.video(video_bytes)
We can't use cv.VideoWriter, because it does not support in-memory video encoding (cv.VideoWriter requires a "true file").
I have a function that returns a frame as a result. I wanted to know how to make a video out of a for-loop with this function, without saving every frame to disk and then creating the video.
What I have from now is something similar to:
import cv2

out = cv2.VideoWriter('video.mp4', cv2.VideoWriter_fourcc(*'DIVX'), 14.25, (500, 258))

for frame in frames:
    img_result = MyImageTreatmentFunction(frame)  # returns a numpy array image
    out.write(img_result)

out.release()
Then the video will be created as video.mp4 and I can access it afterwards. I'm asking myself if there's a way to hold this video in a variable that I can easily convert to bytes later. My purpose is to send the video via an HTTP POST request.
I've looked at ffmpeg-python and OpenCV, but I didn't find anything that applies to my case.
We may use PyAV for encoding an "in memory" file.
PyAV is a Pythonic binding for the FFmpeg libraries.
The interface is relatively low level, but it allows us to do things that are not possible using other FFmpeg bindings.
Here are the main stages for creating MP4 in memory using PyAV:
Create BytesIO "in memory file":
output_memory_file = io.BytesIO()
Use PyAV to open "in memory file" as MP4 video output file:
output = av.open(output_memory_file, 'w', format="mp4")
Add H.264 video stream to the MP4 container, and set codec parameters:
stream = output.add_stream('h264', str(fps))
stream.width = width
stream.height = height
stream.pix_fmt = 'yuv444p'
stream.options = {'crf': '17'}
Iterate the OpenCV images, convert image to PyAV VideoFrame, encode, and "Mux":
for i in range(n_frames):
    img = make_sample_image(i)  # Create OpenCV image for testing (resolution 192x108, pixel format BGR).
    frame = av.VideoFrame.from_ndarray(img, format='bgr24')
    packet = stream.encode(frame)
    output.mux(packet)
Flush the encoder and close the "in memory" file:
packet = stream.encode(None)
output.mux(packet)
output.close()
The following code sample encodes 100 synthetic images into an "in memory" MP4 file.
Each synthetic image is an OpenCV (BGR) image with a sequential blue frame number (used for testing).
At the end, the memory file is written to an output.mp4 file for testing.
import numpy as np
import cv2
import av
import io

n_frames = 100  # Select number of frames (for testing).
width, height, fps = 192, 108, 23.976  # Select video resolution and framerate.

output_memory_file = io.BytesIO()  # Create BytesIO "in memory file".

output = av.open(output_memory_file, 'w', format="mp4")  # Open "in memory file" as MP4 video output
stream = output.add_stream('h264', str(fps))  # Add H.264 video stream to the MP4 container, with framerate = fps.
stream.width = width  # Set frame width
stream.height = height  # Set frame height
stream.pix_fmt = 'yuv444p'  # Select yuv444p pixel format (better quality than default yuv420p).
stream.options = {'crf': '17'}  # Select low crf for high quality (the price is larger file size).


def make_sample_image(i):
    """ Build synthetic "raw BGR" image for testing """
    p = width//60
    img = np.full((height, width, 3), 60, np.uint8)
    cv2.putText(img, str(i+1), (width//2-p*10*len(str(i+1)), height//2+p*10), cv2.FONT_HERSHEY_DUPLEX, p, (255, 30, 30), p*2)  # Blue number
    return img


# Iterate the created images, encode and write to MP4 memory file.
for i in range(n_frames):
    img = make_sample_image(i)  # Create OpenCV image for testing (resolution 192x108, pixel format BGR).
    frame = av.VideoFrame.from_ndarray(img, format='bgr24')  # Convert image from NumPy Array to frame.
    packet = stream.encode(frame)  # Encode video frame
    output.mux(packet)  # "Mux" the encoded frame (add the encoded frame to MP4 file).

# Flush the encoder
packet = stream.encode(None)
output.mux(packet)
output.close()

# Write BytesIO from RAM to file, for testing
with open("output.mp4", "wb") as f:
    f.write(output_memory_file.getbuffer())
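Since the stated goal is to send the video via HTTP POST, here is a hypothetical follow-up sketch using the requests library (the URL and form-field name are placeholders, not part of the original question or answer):

import requests

output_memory_file.seek(0)  # Rewind to the beginning of the in-memory MP4 before uploading.
response = requests.post(
    'http://example.com/upload',  # Placeholder URL.
    files={'video': ('output.mp4', output_memory_file, 'video/mp4')},  # Upload the BytesIO as a file.
)
print(response.status_code)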
I am trying to create a video editor, where obviously, you will be able to remove and add frames. My thinking, was to convert the video file itself into an array of frames which can then be manipulated.
Using this answer's code, I did that. This works fine for small video files, but with big video files a memory error can quickly occur - because, of course, memory is holding hundreds of uncompressed images.
This is the exact code I am using:
import numpy
import cv2

def video_to_frames(file):
    """Splits a video file into a numpy array of frames"""
    video = cv2.VideoCapture(file)
    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    buffer = numpy.empty((frame_count, frame_height, frame_width, 3), numpy.dtype("uint8"))
    index_count = 0
    running = True
    while(index_count < frame_count and running):  # Reads each frame to the array
        running, buffer[index_count] = video.read()
        index_count += 1
    video.release()
    return buffer  # Returns the numpy array of frames

print(video_to_frames("Video.mp4"))
Finally, here is the exact memory error I got: MemoryError: Unable to allocate 249. GiB for an array with shape (46491, 1000, 1920, 3) and data type uint8
So I have two questions really:
Is this the most efficient way to go about manipulating a video?
If it is, how can I go about storing all those frames without running into a memory error?
Thank you.
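One common way to avoid holding every uncompressed frame in memory at once (a sketch only, and not necessarily the most efficient design for a full editor) is to read and process the frames lazily with a generator, keeping only the frames that are actually being worked on:

import cv2

def iter_frames(file):
    """Hypothetical helper: yield frames one at a time instead of preallocating a huge array."""
    video = cv2.VideoCapture(file)
    try:
        while True:
            ok, frame = video.read()
            if not ok:
                break
            yield frame
    finally:
        video.release()

# Example usage: process frames one by one, writing or discarding them as you go.
for i, frame in enumerate(iter_frames("Video.mp4")):
    pass  # e.g. edit, re-encode, or skip the frame here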
I am attempting to use opencv_python to break an mp4 file down into its frames so I can later open them with pillow, or at least be able to use the images to run my own methods on them.
I understand that the following snippet of code gets a frame from a live video or a recorded video.
import cv2
cap = cv2.VideoCapture("myfile.mp4")
boolean, frame = cap.read()
What exactly does the read function return, and how can I create an array of images which I can modify?
Adapted from "How to process images of a video, frame by frame, in video streaming using OpenCV and Python". Untested. However, the frames are read into a numpy array and appended to a list, which is converted to a numpy array when all the frames have been read.
import cv2
import numpy as np

images = []

cap = cv2.VideoCapture("./out.mp4")
while not cap.isOpened():
    cap = cv2.VideoCapture("./out.mp4")
    cv2.waitKey(1000)
    print("Wait for the header")

pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
while True:
    frame_ready, frame = cap.read()  # get the frame

    if frame_ready:
        # The frame is ready and already captured as a numpy array
        # cv2.imshow('video', frame)
        # store the current frame
        images.append(frame)
        pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
    else:
        # The next frame is not ready, so we try to read it again
        cap.set(cv2.CAP_PROP_POS_FRAMES, pos_frame - 1)
        print("frame is not ready")
        # It is better to wait for a while for the next frame to be ready
        cv2.waitKey(1000)

    if cv2.waitKey(10) == 27:
        break
    if cap.get(cv2.CAP_PROP_POS_FRAMES) == cap.get(cv2.CAP_PROP_FRAME_COUNT):
        # If the number of captured frames is equal to the total number of frames,
        # we stop
        break

all_frames = np.array(images)
Simply use this code to get an array of frames from your video:
import cv2
import numpy as np

frames = []

video = cv2.VideoCapture("spiderino_turning.mp4")

while True:
    read, frame = video.read()
    if not read:
        break
    frames.append(frame)

frames = np.array(frames)
but regarding your question, video.read() returns two values. The first one (read in the example code) indicates whether the frame was successfully read (i.e., True on success and False on any error). The second returned value is the frame, which can be empty if the read attempt is unsuccessful, or a 3D array (i.e., a color image) otherwise.
But why can a read attempt be unsuccessful?
If you are reading from a camera, any problem with the camera (e.g., the cable is disconnected or the camera's battery is dead) can cause an error.
If you are reading from a video, the read attempt will fail when all the frames are read, and there are no more.
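Since the question mentions opening the frames with Pillow, note that OpenCV returns frames as BGR NumPy arrays; a small hypothetical sketch of converting one frame to a PIL.Image (assuming Pillow is installed) would be:

import cv2
from PIL import Image

video = cv2.VideoCapture("myfile.mp4")
read, frame = video.read()  # frame is a BGR NumPy array when read is True
if read:
    pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # Convert BGR to RGB for Pillow.
    pil_image.save("frame0.png")
video.release()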