I am trying to compute a rough "quality" metric for a video, which takes the following into consideration:
"Smoothness" of video; i.e., the opposite of how "choppy" it is
Image quality; i.e., if there are a lot of compression artifacts, the quality score should decrease
I came across https://github.com/aizvorski/scikit-video, but the code seems to be littered with FIXMEs and TODOs, and on top of that there are barely any comments or documentation.
Is there a Python library, or even a program with a CLI, for computing video quality, or perhaps a set of libraries that will help me compute the above two metrics separately?
Image Quality
I would think that "Image Quality" is largely a function of bit-depth (or effective bit-depth) and bit-rate.
You can parse ffmpeg output to get this information. PIL or PyQt/PySide can also do this.
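A minimal sketch of the ffmpeg route, assuming the `ffprobe` binary that ships with ffmpeg is on your PATH. It asks for the first video stream's codec, pixel format, bit depth, bit rate and frame rate as JSON, then derives bits per pixel per frame. That last number is a crude, hypothetical compression indicator (for the same codec, fewer bits per pixel usually means more aggressive compression), not a standard quality metric.

```python
import json
import subprocess
from fractions import Fraction


def probe_video_stream(path):
    """Run ffprobe on `path` and return parsed info for its first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries",
        "stream=codec_name,width,height,pix_fmt,bit_rate,bits_per_raw_sample,avg_frame_rate",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_stream_info(result.stdout)


def _to_int(value):
    """ffprobe reports some numbers as strings and uses 'N/A' when unknown."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_stream_info(ffprobe_json):
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        raise ValueError("no video stream found")
    s = streams[0]
    width, height = int(s["width"]), int(s["height"])
    bit_rate = _to_int(s.get("bit_rate"))
    try:
        fps = Fraction(s.get("avg_frame_rate", "0/1"))
    except (ValueError, ZeroDivisionError):  # e.g. "0/0" for unknown rates
        fps = Fraction(0)
    info = {
        "codec": s.get("codec_name"),
        "pix_fmt": s.get("pix_fmt"),
        "bit_depth": _to_int(s.get("bits_per_raw_sample")),
        "bit_rate": bit_rate,
        "fps": float(fps),
        "bits_per_pixel": None,
    }
    # Hypothetical heuristic: average encoded bits spent per pixel per frame.
    if bit_rate and fps > 0:
        info["bits_per_pixel"] = bit_rate / (width * height * float(fps))
    return info
```

`probe_video_stream("clip.mp4")` runs the real tool; `parse_stream_info` is split out so the parsing can be checked without a video file.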
Smoothness
For smoothness, you may need to use some type of optical flow algorithm and get deltas from frame to frame.
OpenCV looks like a project that does many of these things.
I'm trying to get the black region from an image using TensorFlow. Until now I was using OpenCV, but it fails to get the whole region because the grayscale variation is very complicated.
The image I'm using is a photo of an electric meter. The whole meter is (normally) white, except for the part with the numbers, which is black. I want to isolate this part in order to read the numbers later on.
To date, I have been using the OpenCV function findContours with a fixed threshold.
I have seen that TensorFlow is very powerful, so I think this should not be a problem, but I can't find any documentation. Any hints? Thanks!
TensorFlow is a general-purpose math library that is unique in two respects:
It provides automatic differentiation.
It has efficient kernels built to run on either the CPU or GPU.
It does have a library of image functions, but it's nowhere near as extensive as OpenCV, and will never be. Those are mostly for data augmentation (as it pertains to ML) and data loading.
Note that you can run OpenCV code on the GPU in many cases (I'm not sure about findContours in particular), so sticking with OpenCV should be considered.
Within TensorFlow, by contrast, you would have to rewrite that function yourself. Looking at the code (which I linked in your question), it doesn't look very hard to do. You could replicate it with symbolic TensorFlow operations in relatively short order, but nothing like it exists pre-built in TensorFlow, nor is it likely to in the future.
I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to be so that each camera can do part of the overall process (without needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
Find the intrinsic/extrinsic values of each camera (good illustration here)
Solve for the transformation that will rectify your cameras' images (good illustration here)
Capture a pair of images
Transform the images according to Step 2.
Perform stereo-correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
I have a video of a road/building and I want to create a 3D model out of it. The scene I am looking at is rigid and the drone is moving. I assume not having any extra info like camera pose, accelerations or GPS position. I would love to find a python implementation that I can adapt to my liking.
So far, I have decided to use the OpenCV calcOpticalFlowFarneback() for optical flow, which seems reasonably fast and accurate. With it, I can get the Fundamental Matrix F with findFundamentalMat(). So far so good.
Now, according to the tutorial I am following here, I am supposed to magically have the Calibration Matrix of the camera, which I obviously don't have and don't plan to have available in the future app I am developing.
After some long research, I have found a paper (Self-calibration of a moving camera from point correspondences and fundamental matrices) from 1997 that defines what I am looking for (with a nice summary here). I am looking for the simplest/easiest implementation possible, and I am stuck with these problems:
If the camera I am going to use changes exposure and focus automatically (no zoom), are the intrinsic parameters of the camera going to change?
I am not familiar with the Homotopy Continuation Method for solving equations numerically, and it seems to be slow.
I intend to use the Extended Kalman Filter, but do not know where to start, knowing that a bad initialization leads to non-convergence.
Digging some more, I found an open-source Multi-Camera Self-Calibration toolbox written for Octave, with a Python wrapper. My last resort will be to break down the code and rewrite it in Python directly. Any other options?
Note: I do not want to use a chessboard or the planarity constraint.
Is there any other way to very accurately self-calibrate my camera? After 20 years of research since 1997, has anyone come up with a more straightforward method?
Is this a one-shot thing, or are you developing an app to process lots of videos like these automatically?
If the former, I'd rather use an integrated tool like Blender. Look up one of the motion tracking (or "matchmoving") tutorials on YouTube to get an idea of it, for example this one.
Imagine someone taking a burst shot with a camera: they will end up with multiple images, but since no tripod or stand was used, the images will be slightly different from each other.
How can I align them so that they overlay neatly, and crop out the edges?
I have searched a lot, but most of the solutions either did a 3D reconstruction or used MATLAB.
e.g. https://github.com/royshil/SfM-Toy-Library
Since I'm very new to OpenCV, I would prefer an easy-to-implement solution.
I have generated many datasets by manually rotating and cropping images in MSPaint but any link containing corresponding datasets(slightly rotated and translated images) will also be helpful.
EDIT: I found a solution here
http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms
which gives close approximations to rotation and translation vectors.
How can I do better than this?
It depends on what you mean by "better" (accuracy, speed, low memory requirements, etc.). One classic approach is to align each frame #i (with i > 1) with the first frame, as follows:
Local feature detection, for instance via SIFT or SURF (link)
Descriptor extraction (link)
Descriptor matching (link)
Alignment estimation via perspective transformation (link)
Transform image #i to match image 1 using the estimated transformation (link)
I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
None of the Python samples really apply
The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, making extracting the relevant knowledge and converting it to the Python API problematic.
The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find strongly connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?
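OpenCV contains an implementation of HOG-based pedestrian detection (at the time of writing it was only in the latest SVN version, and undocumented). It even comes with a pre-trained detector and a Python wrapper. With the current cv2 bindings, the basic usage is as follows:
import cv2

hog = cv2.HOGDescriptor()  # default 64x128 pedestrian detection window
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
img = cv2.imread(file)  # or read a frame from a cv2.VideoCapture
# The default grouping threshold is 2, matching group_threshold=2 in the old API.
found, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(32, 32), scale=1.05)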
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete python example (both in the OpenCV sources).
Nick,
What you are looking for is not people detection, but motion detection. If you tell us more about what you are trying to solve, we can give a better answer.
Anyway, there are many ways to do motion detection, depending on what you are going to do with the results. The simplest is frame differencing followed by thresholding; a more complex one is proper background modeling -> foreground subtraction -> morphological operations -> connected component analysis, followed by blob analysis if required. Download the OpenCV source code and look in the samples directory; you might find what you are looking for there. There is also an O'Reilly book on OpenCV.
Hope this helps,
Nand
This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking
This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs, and then decide that each one is a person. The problem is that this will not work very well, since people tend to walk together, pass each other and so on, so a blob might very well consist of two persons, and you will then see that blob splitting and merging as they walk along.
You will need some method of discriminating between multiple persons in one blob. This is not a problem I expect anyone to be able to answer in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that there exist products which do this: Autoliv has a product that detects pedestrians using an IR camera on a car, and I have seen other products that count customers entering and exiting stores.