I am using ffmpeg to convert a video into images, which are then processed by my Python program. Originally I used ffmpeg to save the images to disk first, then read them one by one with Python.
This works fine, but in an effort to speed up the program I am trying to skip the storage step and work only with the images in memory.
I use the following ffmpeg command and Python subprocess call to pipe the output from ffmpeg to Python:
command = "ffmpeg.exe -i ADD\\sg1-original.mp4 -r 1 -f image2pipe pipe:1"
pipe = subprocess.Popen(ffmpeg-command, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
image = Image.new(pipe.communicate()[0])
The image variable can then be used by my program. The problem is that if I send more than one image from ffmpeg, all the data ends up in this single variable. I need a way to separate the images. The only way I can think of is splitting on the JPEG end-of-image marker (0xff, 0xd9). This works, but is unreliable.
What have I missed regarding piping files with subprocess? Is there a way to read only one file at a time from the pipe?
One solution to this would be to use the ppm format, which has a predictable size:
ffmpeg -i movie.mp4 -r 1 -f image2pipe -vcodec ppm pipe:1
The format is specified here: http://netpbm.sourceforge.net/doc/ppm.html
And looks something like this:
P6 # magic number
640 480 # width height
255 # colors per channel
<data>
Where <data> will be exactly 640 * 480 * 3 bytes (assuming there are 255 or fewer colors per channel, i.e. one byte per sample).
Note that this is an uncompressed format, so it may potentially take up quite a bit of memory if you read it all at once. You may consider switching your algorithm to:
pipe = subprocess.Popen(ffmpeg_command, stdout=subprocess.PIPE, stderr=sys.stderr)
while True:
    chunk = pipe.stdout.read(4096)
    if not chunk:
        break
    # ... process chunk of data ...
Note that the subprocess' stderr is set to the current process' stderr; this is important because, otherwise, the stderr buffer could fill up (as nothing would be reading it) and cause a deadlock.
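If you would rather pull the stream apart one frame at a time instead of in fixed-size chunks, the predictable PPM layout makes that straightforward. The following is a minimal sketch of my own (not part of the original answer): it parses each PPM header and then reads exactly width * height * 3 bytes of pixel data, assuming ffmpeg's usual maxval of 255 (one byte per sample).

import subprocess

def read_token(stream):
    # Read one whitespace-delimited ASCII token from the PPM header.
    token = b''
    while True:
        ch = stream.read(1)
        if not ch or (ch.isspace() and token):
            return token
        if not ch.isspace():
            token += ch

def read_ppm_frame(stream):
    # Return (width, height, raw RGB bytes), or None at end of stream.
    if read_token(stream) != b'P6':
        return None
    width = int(read_token(stream))
    height = int(read_token(stream))
    read_token(stream)  # maxval; assumed to be 255, i.e. one byte per sample
    return width, height, stream.read(width * height * 3)

pipe = subprocess.Popen('ffmpeg -i movie.mp4 -r 1 -f image2pipe -vcodec ppm pipe:1'.split(),
                        stdout=subprocess.PIPE)
while True:
    frame = read_ppm_frame(pipe.stdout)
    if frame is None:
        break
    width, height, rgb = frame
    # ... process one frame's worth of RGB data ...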
Related
I am trying to grab frames with an ffmpeg command and show them using the OpenCV function cv2.imshow(). The snippet below gives a distorted black-and-white image from the RTSP stream (see the FFmpeg output linked in the original post).
I have tried the ffplay command, but it just displays the video directly; I am not able to access the frames or apply any image processing.
import subprocess as sp
import cv2
import numpy

command = ['C:/ffmpeg/ffmpeg.exe',
           '-i', 'rtsp://192.168.1.12/media/video2',
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']

pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

while True:
    raw_image = pipe.stdout.read(420*360*3)
    # transform the bytes read into a numpy array
    image = numpy.fromstring(raw_image, dtype='uint8')
    image = image.reshape((360, 420, 3))
    cv2.imshow('hello', image)
    cv2.waitKey(1)
    # throw away the data in the pipe's buffer.
    pipe.stdout.flush()
You're using the wrong output format; it should be -f rawvideo. This should fix your primary problem. The current -f image2pipe wraps the RGB data in an image format (I don't know which, maybe BMP since the rawvideo codec is being used?), so the frames are not shown correctly.
Other tips (a rough combined sketch follows these tips):
If your data is grayscale, use -pix_fmt gray and read 420*360 bytes at a time.
I don't know the difference in speed, but I use np.frombuffer instead of np.fromstring (which is deprecated).
pipe.stdout.flush() is a dangerous move IMO, as the buffer may hold a partial frame. Consider setting bufsize to an exact integer multiple of the frame size in bytes.
If you expect processing to take much longer than the input frame rate, you may want to reduce the output frame rate with -r to match the processing rate (to avoid transferring unneeded data from ffmpeg to Python).
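Putting the rawvideo fix and the tips above together, a rough sketch (mine, not the answerer's exact code; the RTSP URL and the 420x360 resolution are taken from the question and may need adjusting) could look like this:

import subprocess as sp
import numpy as np
import cv2

width, height = 420, 360
frame_size = width * height * 3  # rgb24 = 3 bytes per pixel

command = ['C:/ffmpeg/ffmpeg.exe',
           '-i', 'rtsp://192.168.1.12/media/video2',
           '-f', 'rawvideo',       # raw frames, no image container
           '-pix_fmt', 'rgb24',
           '-']

# bufsize as an exact multiple of the frame size avoids partial frames in the buffer.
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10 * frame_size)

while True:
    raw_image = pipe.stdout.read(frame_size)
    if len(raw_image) < frame_size:
        break  # stream ended or was interrupted
    image = np.frombuffer(raw_image, dtype=np.uint8).reshape((height, width, 3))
    # ffmpeg outputs RGB here; OpenCV expects BGR for display.
    cv2.imshow('frame', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()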
The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once which takes forever and runs out of memory easily.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on 1 thread while np reads it from another thread, which I haven't tested yet. For now, this problem is not solved.
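To make the chunked-read idea above concrete, here is a rough sketch of my own (not the asker's final solution): it assumes ffmpeg is on the PATH and asks for raw 16-bit PCM on stdout, so each chunk can be handed to np.frombuffer as soon as it arrives, without waiting for the subprocess to finish. The file name and stream parameters are placeholders.

import subprocess
import numpy as np

# Decode any input format to raw signed 16-bit PCM (mono, 44.1 kHz) on stdout.
command = ['ffmpeg', '-i', 'input.m4a',
           '-f', 's16le', '-acodec', 'pcm_s16le',
           '-ac', '1', '-ar', '44100', '-']
proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

chunk_size = 44100 * 2  # one second of mono 16-bit samples
while True:
    chunk = proc.stdout.read(chunk_size)
    if not chunk:
        break
    samples = np.frombuffer(chunk, dtype=np.int16)
    # ... write `samples` (or the raw bytes) to the output stream ...
proc.wait()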
I think the Python package https://github.com/irmen/pyminiaudio can be helpful. You can stream an audio file like this:
import miniaudio

audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100  # the input audio will be resampled at this sampling rate
n_channels = 1                # either 1 or 2
waveform_duration = 30        # in seconds
offset = 15                   # this means that we read only in the interval [15s, duration of file]

waveform_generator = miniaudio.stream_file(
    filename=audio_path,
    sample_rate=target_sampling_rate,
    seek_frame=int(offset * target_sampling_rate),
    frames_to_read=int(waveform_duration * target_sampling_rate),
    output_format=miniaudio.SampleFormat.FLOAT32,
    nchannels=n_channels)

for waveform in waveform_generator:
    # do something with the waveform....
    pass
I know for sure that this works on mp3, ogg, wav and flac, but for some reason it does not on mp4/aac, and I am actually looking for a way to read mp4/aac.
I am using python to do some basic image processing, and want to extend it to process a video frame by frame.
I get the video as a blob from a server - .webm encoded - and have it in python as a byte string (b'\x1aE\xdf\xa3\xa3B\x86\x81\x01B\xf7\x81\x01B\xf2\x81\x04B\xf3\x81\x08B\x82\x88matroskaB\x87\x81\x04B\x85\x81\x02\x18S\x80g\x01\xff\xff\xff\xff\xff\xff\xff\x15I\xa9f\x99*\xd7\xb1\x83\x0fB#M\x80\x86ChromeWA\x86Chrome\x16T\xaek\xad\xae\xab\xd7\x81\x01s\xc5\x87\x04\xe8\xfc\x16\t^\x8c\x83\x81\x01\x86\x8fV_MPEG4/ISO/AVC\xe0\x88\xb0\x82\x02\x80\xba\x82\x01\xe0\x1fC\xb6u\x01\xff\xff\xff\xff\xff\xff ...).
I know that there is cv.VideoCapture, which can do almost what I need. The problem is that I would have to first write the file to disk, and then load it again. It seems much cleaner to wrap the string, e.g., into an IOStream, and feed it to some function that does the decoding.
Is there a clean way to do this in python, or is writing to disk and loading it again the way to go?
According to this post, you can't use cv.VideoCapture to decode an in-memory stream.
You may decode the stream by "piping" it to FFmpeg.
The solution is a bit complicated; writing to disk is much simpler and probably a cleaner solution.
I am posting a solution using FFmpeg (and FFprobe).
There are Python bindings for FFmpeg, but this solution executes FFmpeg as an external application using the subprocess module.
(The Python bindings work well with FFmpeg, but piping to FFprobe does not.)
I am using Windows 10, and I put ffmpeg.exe and ffprobe.exe in the execution folder (you may set the execution path as well).
For Windows, download the latest (statically linked) stable version.
I created a standalone example that performs the following:
Generate a synthetic video and save it to a WebM file (used as input for testing).
Read the file into memory as binary data (replace this with your blob from the server).
Pipe the binary stream to FFprobe to find the video resolution.
In case the resolution is known in advance, you may skip this part.
Piping to FFprobe makes the solution more complicated than it needs to be.
Pipe the binary stream to FFmpeg's stdin for decoding, and read decoded raw frames from the stdout pipe.
Writing to stdin is done in chunks using a Python thread.
(The reason for using stdin and stdout instead of named pipes is Windows compatibility.)
Piping architecture:
 --------------------  Encoded    ---------  Decoded      --------------
| Input WebM encoded |   data    | ffmpeg  | raw frames  | reshape to   |
| stream (VP9 codec) | --------> | process | ----------> | NumPy array  |
 -------------------- stdin PIPE  --------- stdout PIPE   --------------
Here is the code:
import numpy as np
import cv2
import io
import subprocess as sp
import threading
import json
from functools import partial
import shlex

# Build synthetic video and read binary data into memory (for testing):
#########################################################################
width, height = 640, 480

sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size={}x{}:rate=1 -vcodec vp9 -crf 23 -t 50 test.webm'.format(width, height)))

with open('test.webm', 'rb') as binary_file:
    in_bytes = binary_file.read()
#########################################################################

# https://stackoverflow.com/questions/5911362/pipe-large-amount-of-data-to-stdin-while-using-subprocess-popen/14026178
# https://stackoverflow.com/questions/15599639/what-is-the-perfect-counterpart-in-python-for-while-not-eof
# Write to stdin in chunks of 1024 bytes.
def writer():
    for chunk in iter(partial(stream.read, 1024), b''):
        process.stdin.write(chunk)
    try:
        process.stdin.close()
    except BrokenPipeError:
        pass  # For an unknown reason there is a BrokenPipeError when executing FFprobe.

# Get the resolution of the video frames using FFprobe
# (in case the resolution is known, skip this part):
################################################################################
# Open an in-memory binary stream
stream = io.BytesIO(in_bytes)

process = sp.Popen(shlex.split('ffprobe -v error -i pipe: -select_streams v -print_format json -show_streams'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

pthread = threading.Thread(target=writer)
pthread.start()
pthread.join()

in_bytes = process.stdout.read()
process.wait()

p = json.loads(in_bytes)

width = (p['streams'][0])['width']
height = (p['streams'][0])['height']
################################################################################

# Decoding the video using FFmpeg:
################################################################################
stream.seek(0)

# FFmpeg input PIPE: WebM encoded data as a stream of bytes.
# FFmpeg output PIPE: decoded video frames in BGR format.
process = sp.Popen(shlex.split('ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

thread = threading.Thread(target=writer)
thread.start()

# Read the decoded video frame by frame, and display each frame (using cv2.imshow)
while True:
    # Read a raw video frame from stdout as a bytes array.
    in_bytes = process.stdout.read(width * height * 3)

    if not in_bytes:
        break  # Break the loop when there are no more bytes.

    # Transform the bytes read into a NumPy array
    in_frame = np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3])

    # Display the frame (for testing)
    cv2.imshow('in_frame', in_frame)

    if cv2.waitKey(100) & 0xFF == ord('q'):
        break

if not in_bytes:
    # Wait for the writer thread to end, but only if the loop was not exited by pressing 'q'
    thread.join()

try:
    process.wait(1)
except sp.TimeoutExpired:
    process.kill()  # In case 'q' was pressed.
################################################################################

cv2.destroyAllWindows()
Remark:
In case you are getting an error like "file not found: ffmpeg...", try using the full path.
For example (in Linux): '/usr/bin/ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'
Two years after Rotem wrote his answer there is now a cleaner / easier way to do this using ImageIO.
Note: Assuming ffmpeg is in your path, you can generate a test video to try this example using: ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 testsrc.webm
import imageio.v3 as iio
from pathlib import Path
webm_bytes = Path("testsrc.webm").read_bytes()
# read all frames from the bytes string
frames = iio.imread(webm_bytes, index=None, format_hint=".webm")
frames.shape
# Output:
# (300, 720, 1280, 3)
for frame in iio.imiter(webm_bytes, format_hint=".webm"):
    print(frame.shape)
# Output:
# (720, 1280, 3)
# (720, 1280, 3)
# (720, 1280, 3)
# ...
To use this you'll need the ffmpeg backend (which implements a solution similar to what Rotem proposed): pip install imageio[ffmpeg]
In response to Rotem's comment a bit of explanation:
The above snippet uses imageio==2.16.0. The v3 API is an upcoming user-facing API that streamlines reading and writing. The API has been available since imageio==2.10.0; however, on versions older than 2.16.0 you will have to use import imageio as iio and call iio.v3.imiter and iio.v3.imread.
The ability to read video bytes has existed forever (>5 years and counting) but has (as I am just now realizing) never been documented directly ... so I will add a PR for that soon™ :)
On older versions (tested on v2.9.0) of ImageIO (v2 API) you can still read video byte strings; however, this is slightly more verbose:
import imageio as iio
import numpy as np
from pathlib import Path
webm_bytes = Path("testsrc.webm").read_bytes()
# read all frames from the bytes string
frames = np.stack(iio.mimread(webm_bytes, format="FFMPEG", memtest=False))
# iterate over frames one by one
reader = iio.get_reader(webm_bytes, format="FFMPEG")
for frame in reader:
    print(frame.shape)
reader.close()
There is a pythonic way to do this by using the decord package.
import io
from decord import VideoReader

# This is the bytes object of your video.
video_str

# Load the video
file_obj = io.BytesIO(video_str)
container = VideoReader(file_obj)

# Get the total number of video frames
len(container)

# Access the NDArray of the (i+1)-th frame
container[i]
You can learn more about decord in decord github repo.
You can learn more about video IO in mmaction repo. See DecordInit for using decord IO.
I have a gif that I am taking apart frame by frame in order to write text onto it. I used ffmpeg to put the frames (saved as individual .png files) back together and it worked nicely. This is the code I used:
ffmpeg -f image2 -i newimg%d.png out.gif
But now I want to use the python wrapper ffmpy. Following the docs, I tried a variety of things but I keep getting syntax errors.
Here is one instance of my efforts:
ff = ffmpy(FFmpeg(inputs = {ffmpeg -f image2 -i "newimg%d.png"}, outputs = {"gif_with_text.gif"}))
ff.run()
In this attempt, the syntax error points to the "2" in image2. Could someone help me out? Note: I'm new to Python, let alone ffmpeg and ffmpy.
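For reference, as far as I recall from ffmpy's documentation, the inputs and outputs arguments are plain Python dicts mapping file names to option strings (or None). A rough sketch of that call style, untested against this particular GIF, would be:

import ffmpy

ff = ffmpy.FFmpeg(
    inputs={'newimg%d.png': '-f image2'},   # options string for this input
    outputs={'gif_with_text.gif': None})    # no extra output options
ff.run()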
I am trying to convert an .h264 file created with python from an incoming stream to xvid format with ffmpeg.
The file is 30 min long at 12 fps. However, the conversion automatically creates a file that is 25 fps and thus 14.4 min long. If I set the fps like ffmpeg -i test.h264 -r 12 test.avi, it creates a video that is 14.4 min long with an fps of 12.
How can I make it see the incoming video as 12 fps? I tried recording immediately in the xvid codec in Python using FOURCC, but on macOS the only codec that seems to work is mp4v. I also tried MP4Box, which creates the right video duration and fps, but for which I cannot set the xvid codec (which I need).
The options are the same for input and output. If they are set before -i, they are applied to the input file; after -i, they are applied to the output.
Everything is explained in the doc.
ffmpeg -r 12 -i inputAt12fps.h264 -r 25 outputAt25Fps.avi
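Applied to the question above, that means putting -r 12 before -i so the h264 stream is read at 12 fps; the libxvid encoder name is an assumption on my part (it is the usual FFmpeg encoder for Xvid):
ffmpeg -r 12 -i test.h264 -c:v libxvid test.avi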