Is it possible in Python with the latest cv2 to bind MJPEG output from the camera directly to a stream, without having to do source -> cv2.read() -> numpy array -> cv2.imencode(".jpg") -> MJPEG? I am looking to do source -> MJPEG in a Pythonic way.
Latency is a major issue, so any advice, including options beyond cv2, would be appreciated.
No. OpenCV is not a media library. Its video I/O is not intended or made for this.
I would advise using PyAV, which is the only proper Python wrapper around FFmpeg's libraries that I know of. PyAV comes with a few examples to give you a feel for how it works.
The basic problem then is how to use FFmpeg's libraries to enumerate the available video devices, query their modes, select the mode you want, and move packets around.
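For the MJPEG case specifically: if the camera can deliver MJPEG natively, you can demux its packets with PyAV and forward them without ever decoding to a numpy array. A rough sketch follows; the device name, input format and size are assumptions (on Windows you would open the device with format="dshow" instead of "v4l2"):

import av

# Ask the driver for MJPEG directly. "/dev/video0", "mjpeg" and "1280x720"
# are placeholders - adjust them for your camera.
container = av.open("/dev/video0", format="v4l2",
                    options={"input_format": "mjpeg", "video_size": "1280x720"})
video = container.streams.video[0]

for packet in container.demux(video):
    if packet.size == 0:
        continue
    jpeg_bytes = bytes(packet)  # already a JPEG-compressed frame, no decode needed
    # write jpeg_bytes into your MJPEG stream (e.g. a multipart HTTP response)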
The final goal would be to capture the regular webcam feed, manipulate it in some way (blur face, replace background, ...) and then output the result in some way so that the manipulated feed can be chosen as input for whatever application expects a webcam (Discord, Teams, ...).
I am working on a Windows machine and would prefer to do this in Python. This combination has me lost, at the moment.
Capturing and manipulating the feed is easy with https://pypi.org/project/opencv-python/,
but the step of exposing the manipulated feed as a webcam seems overly complicated.
Apparently, on Linux there are Python libraries just offering that functionality, but they do not work on Windows. Everything that sounded like it could hint towards a good solution went directly into C++ country. There are programs which basically do what I want, e.g. webcamoid (https://webcamoid.github.io/) and I could hack together a solution which captures and processes the feed via Python, then uses webcamoid to record the output and feed it into a virtual webcam. But I'd much prefer to do the whole thing in one.
I have been searching around a bit and found these questions on stackoverflow on the topic:
Using OpenCV Output as Webcam (uses C++ but also gives a Python solution - however, pyfakewebcam does not work on Windows)
How do I stream to a new video source? (not really answered, just links to other question)
How to simulate a webcam device (more C++ hints, links to msdn's Writing a Custom Media Source)
Artificial webcam on windows (basically what I want, but in C++ again)
Writing a virtual webcam? (more explanation on how this might work in C++)
I am getting the strong impression that I need C++ for this or have to work on Linux. However, lacking both a Linux machine and any setup as well as experience in programming in C++, this seems like a large amount of work for the "toy project" this was supposed to be. But maybe I am just missing an obvious library or functionality somewhere?
Hence, the question is: Is there a way to expose a "webcam" stream via Python on Windows?
And, one last idea: What if I used a docker container with a Linux Python environment to implement the functionality I want. Could that container then stream a "virtual webcam" to the host?
You can do this by using pyvirtualcam.
First, you need to install it using pip:
pip install pyvirtualcam
Then go to This Link and download the zip file from the latest release
Unzip it and navigate to \bin\[your system's bitness (32bit or 64bit)]
Open Command Prompt in that directory and type
regsvr32 /n /i:1 "obs-virtualsource.dll"
This will register a fake camera to your computer
and if you want to unregister the camera then run this command:
regsvr32 /u "obs-virtualsource.dll"
Now you can send frames to the camera using pyvirtualcam
This is a sample:
import pyvirtualcam
import numpy as np
with pyvirtualcam.Camera(width=1280, height=720, fps=30) as cam:
    while True:
        frame = np.zeros((cam.height, cam.width, 4), np.uint8)  # RGBA
        frame[:, :, :3] = cam.frames_sent % 255  # grayscale animation
        frame[:, :, 3] = 255
        cam.send(frame)
        cam.sleep_until_next_frame()
I am stuck finding help for my next project. My use case is as follows:
1) Read frames from an mp4 file.
2) Detect faces inside the frames.
3) Store or display the final output.
The same use case should then be executed with a YUV420P (raw) video.
I am very new to the OpenCV platform, but I am quite familiar with GStreamer and Linux interface programming.
Please help me find any reference (example) for the same.
Since you are familiar with GStreamer, there is an OpenCV-based facedetect element: https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad-plugins/html/gst-plugins-bad-plugins-facedetect.html.
Reading and writing raw video data should then be trivial for you, I guess? ;-)
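If you want to drive that element from Python rather than from gst-launch, a minimal sketch using the GStreamer Python bindings could look like the following. It assumes gst-plugins-bad was built with OpenCV support, and input.mp4 is a placeholder file name:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# facedetect comes from gst-plugins-bad built with OpenCV support.
pipeline = Gst.parse_launch(
    "filesrc location=input.mp4 ! decodebin ! videoconvert ! "
    "facedetect ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)

# Block until the stream finishes or an error occurs, then shut down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)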
There is a very similar post on OpenCV's forum. Basically you have to build your OpenCV from source and enable the extra features for working with additional video formats. For that you need ffmpeg (which you will probably also have to build from source), gstreamer (if you want to use it) and FOURCC support, which lets you open a cv::VideoCapture / cv::VideoWriter with CV_FOURCC(...) set to the codec you are using (the list of abbreviations can be found here).
I'm using OpenCV with Python, but I can actually switch to C++, so if it matters, please answer the question with that in mind.
I'm writing an .avi file (joining multiple .avi files into one) using
cv2.VideoWriter([filename, fourcc, fps, frameSize[, isColor]])
but recently found out that I can't write an .avi file larger than 2 GB with it. It is even mentioned there: "Due to this OpenCV for video containers supports only the avi extension, its first version. A direct limitation of this is that you cannot save a video file larger than 2 GB."
But right now I've got no time to learn a new library like FFmpeg; I need to do this very fast.
How can I write this file using C++ or Python with my knowledge of OpenCV, or at least keep the input part in OpenCV, using cv::Mat as frames?
This limitation was removed in OpenCV 3.0, due to the introduction of new file formats such as .mkv, which do support video files larger than 2 GB.
See Does OpenCV 3.0 Still Has Limits On VideoWriter Size?.
NOTE: The documentation and examples weren't updated yet, so maybe this should be considered experimental.
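As a rough illustration (codec availability depends on your OpenCV/FFmpeg build, so the FOURCC below is an assumption), writing to an .mkv container looks the same as writing an .avi:

import cv2
import numpy as np

w, h, fps = 1280, 720, 30
fourcc = cv2.VideoWriter_fourcc(*"X264")           # assumed codec; use one your build supports
writer = cv2.VideoWriter("out.mkv", fourcc, fps, (w, h))

for i in range(300):
    frame = np.full((h, w, 3), i % 255, np.uint8)  # dummy frame for the sketch
    writer.write(frame)

writer.release()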
You have answered your own question, but I'm afraid it isn't the answer you want.
From your link:
As you can see things can get really complicated with videos. However, OpenCV is mainly a computer vision library, not a video stream, codec and write one. Therefore, the developers tried to keep this part as simple as possible. Due to this OpenCV for video containers supports only the avi extension, its first version. A direct limitation of this is that you cannot save a video file larger than 2 GB. Furthermore you can only create and expand a single video track inside the container. No audio or other track editing support here. Nevertheless, any video codec present on your system might work. If you encounter some of these limitations you will need to look into more specialized video writing libraries such as FFMpeg or codecs as HuffYUV, CorePNG and LCL.
What this paragraph says is that the developers of OpenCV made a design choice: you cannot write video files larger than 2 GB using OpenCV, for the specific reason that it is a computer vision library, not a video tool.
Unfortunately, if you want to write videos larger than 2 GB you are going to need to learn to use FFmpeg or something similar (it isn't that hard, and it works well alongside OpenCV).
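One common pattern, sketched below under the assumption that an ffmpeg binary is on your PATH, is to keep reading and processing frames with OpenCV and pipe the raw frames into ffmpeg, which then handles the container and has no 2 GB AVI limit. The resolution, fps and codec here are placeholders:

import subprocess
import numpy as np

w, h, fps = 1920, 1080, 30
proc = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pix_fmt", "bgr24", "-s", f"{w}x{h}", "-r", str(fps),
     "-i", "-",                                    # raw BGR frames from stdin
     "-c:v", "libx264", "-pix_fmt", "yuv420p", "out.mp4"],
    stdin=subprocess.PIPE)

for i in range(300):
    frame = np.zeros((h, w, 3), np.uint8)          # replace with your cv::Mat / ndarray
    proc.stdin.write(frame.tobytes())

proc.stdin.close()
proc.wait()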
I am trying to use the OpenCV VideoWriter object with MPEG-1 encoding to create videos. I am aiming at writing only two images to that video, and I would like to know how much the first image helps in compressing the second one. In other words, I want to find the file size before writing the 2nd image and after. My questions are:
Is there any way to perform this process using Opencv?
Is there a way to avoid writing to disk and just get the size of the compressed video (after adding the second image)?
Are there any other good alternatives to reach my goals?
I suggest you learn GStreamer framework which has Python bindings available.
http://gstreamer.freedesktop.org/modules/gst-python.html
It works best on Linux platforms; some OSX support is available.
GStreamer provides "sane", but very powerful and very complex, APIs for procedural video and audio generation.
See also:
GStreamer: status of Python bindings and encoding video with mixed audio
Alternatively, you can write out frames to image files and assemble them into a video using the ffmpeg command. This might work on Microsoft Windows platforms too.
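A rough sketch of that frames-to-files route (the file names and frame rate are placeholders; the ffmpeg invocation is run separately on the command line):

import cv2

cap = cv2.VideoCapture("input.mp4")                # placeholder input
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frame_{i:05d}.png", frame)       # one image file per frame
    i += 1
cap.release()

# Then assemble the images into a video, e.g.:
#   ffmpeg -framerate 30 -i frame_%05d.png -c:v mpeg1video out.mpg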
I'm trying to use a Python script called deepzoom.py to convert large overhead renders (often over 1 gigapixel) to the Deep Zoom image format (i.e., the Google-Maps-esque tile format), but unfortunately it's powered by PIL, which usually ends up crashing due to memory limitations. The creator has said he's delving into VIPS, but even nip2 (the GUI frontend for VIPS) fails to open the image. In another question by someone else (though on the same topic), someone suggested OpenImageIO, which looks like it has the ability and has Python wrappers, but there aren't any proper binaries provided, and trying to compile it on Windows is a nightmare.
Are there any alternative libraries for Python I can use? I've tried PythonMagickWand (wrapper for ImageMagick) and PythonMagick (wrapper for GraphicsMagick), but both of those also run into memory problems.
I had a very similar problem and I ended up solving it by using netpbm, which works fine on windows. Netpbm had no problem with converting huge .png files and then slicing, cropping, re-combining (using pamcrop, pamdice, and pamundice) and converting back to .png without using much memory at all. I just included the necessary netpbm binaries and dlls with my application and called them from python.
It sounds like you're trying to use georeferenced imagery or something similar, for which a GIS solution sounds more appropriate. I'd use GDAL -- it's an excellent library and comes with easy-to-use Python bindings via Swig.
On Windows, the easiest way to install it is via Frank Warmerdam's FWTools package.
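If you go the GDAL route, reading the image in windows keeps memory use low. A minimal sketch (the file name and tile size are placeholders):

from osgeo import gdal

ds = gdal.Open("huge.tif")                         # placeholder file name
band = ds.GetRasterBand(1)
# Read a 1024x1024 window starting at the top-left corner;
# only that window is pulled into memory.
tile = band.ReadAsArray(xoff=0, yoff=0, win_xsize=1024, win_ysize=1024)
print(tile.shape)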
I'm able to use pyvips to read images with size (50000, 50000, 3):
import pyvips
import numpy as np

img = pyvips.Image.new_from_file('xxx.jpg')
arr = np.ndarray(buffer=img.write_to_memory(),
                 dtype=np.uint8,
                 shape=[img.height, img.width, img.bands])
Is a partial load useful? If you use PIL and the image format is .BMP: you can open() an image file (which doesn't load it), then do a crop(), and then load() - which will only actually load the part of the image you've selected by the crop. It will probably also work with TGA, maybe even with JPG, and less efficiently with PNG and other formats.
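A short sketch of that open/crop/load trick, assuming (as described above) a format such as BMP where PIL can decode only the selected region; the file names are placeholders:

from PIL import Image

im = Image.open("huge.bmp")            # open() only reads the header
tile = im.crop((0, 0, 1024, 1024))     # left, upper, right, lower
tile.load()                            # decodes just the cropped region (for BMP)
tile.save("tile.png")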
libvips comes with a very fast DeepZoom creator that can work with images of any size. Try:
$ vips dzsave huge.tif mydz
This will write the tiles to mydz_files and also write a mydz.dzi info file for you. It's typically 10x faster than deepzoom.py and has no size limit.
See this chapter in the manual for an introduction to dzsave.
You can do the same thing from Python using pyvips like this:
import pyvips
my_image = pyvips.Image.new_from_file("huge.tif", access="sequential")
my_image.dzsave("mydz")
The access="sequential" tells pyvips it can stream the image rather than having to read the whole thing into memory.