I am in a position where relatively low-resolution images are provided via an API (higher-resolution images are not available), and I need to generate high-resolution images from them.
I've taken a look at PIL and it's great for just about everything... except scaling up images.
It has the common resizing algorithms (example usage follows the list):
Nearest Neighbor
Bilinear
Bicubic
Anti-aliased
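For reference, upscaling with one of those built-in filters looks like this; the file names and the 4x factor are just placeholders of mine:

from PIL import Image

img = Image.open("low_res.png")
w, h = img.size
# BICUBIC is usually the best of PIL's built-in filters for upscaling
big = img.resize((w * 4, h * 4), Image.BICUBIC)
big.save("upscaled.png")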
I would like to use fractal resizing (as per Jeff's post on Coding Horror), but alas, PIL has no support for this kind of resizing.
Further Google searches turn up no alternative libraries that provide fractal image resizing either.
Does such a thing exist or do I really have to buckle down and write my own fractal resizing algorithm?
I'm no expert but from my current vantage point, that looks like a pretty steep learning curve :(
If no such library exists, maybe you have some advice where to learn about fractal compression algorithms?
Such algorithms exist, but you're definitely not going to find them in Python. To start with, you can look at this paper:
Daniel Glasner, Shai Bagon, and Michal Irani, "Super-Resolution from a Single Image," in Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan, 2009.
It is very much state of the art, highly sophisticated, and produces promising results. If you ever turn it into a Python implementation, please release it to the public :)
Fractal-based image scaling is still quite unusual and hasn't settled down onto one accepted-best algorithm yet. I'm afraid you aren't going to find it in standard image processing libraries yet.
It's also not invariably preferable to bicubic. It can have artefacts which will be undesirable for some kinds of images. For me, Jeff's example image looks a bit weird and unnatural around the sharp edges like the right-hand-side of the nose. Better for some values of ‘better’, sure, but I wouldn't blanket-apply it to all my images.
(This is the case with other ‘advanced’ upscaling techniques, too, including the better-known and more widely-implemented Lanczos/Sinc method.)
Check out this solution using Residual Dense Networks:
https://github.com/idealo/image-super-resolution
Good documentation and rather easy to use. They even have Docker builds and Google Colab notebooks available. See the docs.
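If I remember their README correctly, basic usage is along these lines (the "psnr-small" weight name is from their docs; double-check it, and the file names are mine):

import numpy as np
from PIL import Image
from ISR.models import RDN

lr_img = np.array(Image.open("low_res.png"))

# load one of the pre-trained Residual Dense Network weight sets
rdn = RDN(weights="psnr-small")
sr_img = rdn.predict(lr_img)

Image.fromarray(sr_img).save("high_res.png")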
I have a video of a road/building and I want to create a 3D model out of it. The scene I am looking at is rigid and the drone is moving. I assume not having any extra info like camera pose, accelerations or GPS position. I would love to find a python implementation that I can adapt to my liking.
So far, I have decided to use the OpenCV calcOpticalFlowFarneback() for optical flow, which seems reasonably fast and accurate. With it, I can get the Fundamental Matrix F with findFundamentalMat(). So far so good.
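For concreteness, the step I just described looks roughly like this; the grid spacing and RANSAC thresholds are arbitrary choices of mine:

import cv2
import numpy as np

prev_gray = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

# dense optical flow between two consecutive frames
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# sample the dense flow on a sparse grid to get point correspondences
h, w = prev_gray.shape
ys, xs = np.mgrid[0:h:10, 0:w:10].reshape(2, -1)
pts1 = np.float32(np.column_stack([xs, ys]))
pts2 = pts1 + flow[ys, xs]

# robustly estimate the fundamental matrix from the correspondences
F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)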
Now, according to the tutorial I am following here, I am supposed to magically have the Calibration Matrix of the camera, which I obviously don't have nor plan to have available in the future app I am developing.
After some long research, I found a 1997 paper, "Self-calibration of a moving camera from point correspondences and fundamental matrices," that defines what I am looking for (with a nice summary here). I am looking for the simplest/easiest implementation possible, and I am stuck on these problems:
If the camera I am going to use changes exposure and focus automatically (no zoom), are the intrinsic parameters of the camera going to change?
I am not familiar with the Homotopy Continuation Method for solving equations numerically, plus they seem to be slow.
I intend to use the Extended Kalman Filter, but do not know where to start, knowing that a bad initialization leads to non-convergence.
Digging some more, I found an open-source Multi-Camera Self-Calibration toolbox written for Octave with a Python wrapper. My last resort will be to break down the code and rewrite it in Python directly. Any other options?
Note: I do not want to use a chessboard target or the planarity constraint.
Is there any other way to very accurately self-calibrate my camera? After 20 years of research since 1997, has anyone come up with a more straightforward method?
Is this a one-shot thing, or are you developing an app to process lots of videos like these automatically?
If the former, I'd rather use an integrated tool like Blender. Look up one of the motion-tracking (or "matchmoving") tutorials on YouTube to get an idea of it, for example this one.
I am trying to compute a rough "quality" metric for a video, which takes the following into consideration:
"Smoothness" of video; i.e., the opposite of how "choppy" it is
Image quality; i.e., if there are a lot of compression artifacts, the quality score should decrease
I came across https://github.com/aizvorski/scikit-video, but the code seems to be littered with FIXMEs and TODOs, and on top of that there's barely any comments or documentation.
Is there a Python library, or even a program with a CLI, for computing video quality, or perhaps a set of libraries that will help me compute the above two metrics separately?
Image Quality
I would think that "Image Quality" is largely a function of bit-depth (or effective bit-depth) and bit-rate.
You can parse ffmpeg output to get this information. PIL or PyQt/PySide can also do this.
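As a rough sketch of the ffmpeg route, using the ffprobe tool that ships with ffmpeg (the exact JSON keys are worth double-checking against your build):

import json
import subprocess

def probe_video(path):
    # ffprobe emits per-stream and container metadata as JSON
    out = subprocess.check_output([
        "ffprobe", "-v", "quiet", "-print_format", "json",
        "-show_format", "-show_streams", path,
    ])
    info = json.loads(out.decode("utf-8"))
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "bit_rate": int(info["format"].get("bit_rate", 0)),
        "pix_fmt": video.get("pix_fmt"),  # e.g. yuv420p implies 8-bit
        "width": video.get("width"),
        "height": video.get("height"),
    }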
Smoothness
For smoothness, you may need to use some type of optical flow algorithm and get deltas from frame to frame.
OpenCV looks like a project that does many of these things.
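A minimal sketch of the optical-flow idea with OpenCV, where choppiness shows up as large jumps in the average motion magnitude between frames (the scoring at the end is my own crude choice, not an established metric):

import cv2
import numpy as np

def smoothness_score(path, max_frames=300):
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    magnitudes = []
    for _ in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # dense optical flow between consecutive frames
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        magnitudes.append(mag.mean())
        prev_gray = gray
    cap.release()
    # choppy video -> large frame-to-frame jumps in motion magnitude
    return float(np.std(np.diff(magnitudes)))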
I am working with python and opencv on a piece of software which should compare two images and return as result a value representing their similarity.
I tried first with histograms, and then with SIFT and SURF, but the first method is not localized, while the second and third are slow and do not fit my dataset content very well (mostly pictures of crowds).
I would like to avoid people detectors, so I would like to apply some algorithm based on edge and texture comparison. Can you give me some hints or online resources?
This is an interesting, although challenging problem! Recently, I came across an article by the University of California, San Diego's Vision Group about classifying scenes of crowds. Here is the link: Urban Tribes: Analyzing Group Photos from a Social Perspective.
As you can see, there is no one-size-fits-all solution, but I would think that this should provide you a good place to start from.
What you're asking is a general image classification framework.
Try googling: image classification, scene classification, image indexing and retrieval.
In most cases, you'll have to use a multimodal descriptor. Use color, texture, entropy, keypoints, edge histograms.
You can read this and try that.
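As a hedged sketch of what one component of such a multimodal comparison might look like (localized color, which addresses the "not localized" complaint about plain histograms; the grid size, bin counts, and function names are my own choices):

import cv2
import numpy as np

def grid_color_hist(img, grid=4):
    # split the image into grid x grid cells so the descriptor is localized
    h, w = img.shape[:2]
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist = cv2.calcHist([cell], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            feats.append(cv2.normalize(hist, hist).flatten())
    return np.concatenate(feats)

def similarity(path_a, path_b):
    a, b = cv2.imread(path_a), cv2.imread(path_b)
    fa, fb = grid_color_hist(a), grid_color_hist(b)
    # cosine similarity between the concatenated local histograms
    return float(np.dot(fa, fb) /
                 (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-9))

Texture and edge-histogram features could be concatenated onto the same vector in the same way.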
I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
None of the Python samples really apply
The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, making extracting the relevant knowledge and converting it to the Python API problematic.
The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find strongly connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?
The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a python wrapper. The basic usage is as follows:
from cv import *

storage = CreateMemStorage(0)
img = LoadImage(file)  # or read from camera
# run the multi-scale HOG detector; each result describes a detected person
found = list(HOGDetectMultiScale(img, storage, win_stride=(8, 8),
                                 padding=(32, 32), scale=1.05,
                                 group_threshold=2))
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete python example (both in the OpenCV sources).
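If you later end up on a release with the cv2 bindings, the rough equivalent looks like this (a sketch; the file name is a placeholder):

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("frame.png")  # or read from camera
# rects holds (x, y, w, h) boxes, one per detected person
rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(32, 32), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)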
Nick,
What you are looking for is not people detection, but motion detection. If you tell us a lot more about what you are trying to solve/do, we can answer better.
Anyway, there are many ways to do motion detection, depending on what you are going to do with the results. The simplest one would be frame differencing followed by thresholding, while a complex one could be proper background modeling -> foreground subtraction -> morphological ops -> connected-component analysis, followed by blob analysis if required (a sketch of the simple approach is below). Download the OpenCV code and look in the samples directory; you might see what you are looking for. Also, there is an O'Reilly book on OpenCV.
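A minimal sketch of the differencing-plus-thresholding approach with the newer cv2 bindings (the threshold and minimum blob size are arbitrary values of mine; findContours here uses the OpenCV 4.x return signature):

import cv2

cap = cv2.VideoCapture(0)  # 0 = default camera; a file path also works
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # differencing followed by thresholding
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    # morphological op to fill holes, then connected components via contours
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:  # ignore tiny blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("motion", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()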
Hope this helps,
Nand
This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking
This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs, and then decide whether each one is a person. The problem is that this will not work very well, since people tend to walk together and pass each other, so a blob may well contain two persons, and you will then see that blob splitting and merging as they walk along.
You will need some method of discriminating between multiple persons in one blob. This is not a problem I expect anyone to be able to answer in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that there exist products which do this: Autoliv has a product to detect pedestrians using an IR camera on a car, and I have seen other products which deal with counting customers entering and exiting stores.
For my next project I plan to create images with text and graphics. I'm comfortable with Ruby, but interested in learning Python. I figured this may be a good time, because PIL looks like a great library to use. However, I don't know how it compares to what Ruby has to offer (e.g. RMagick and ruby-gd). From what I can gather, PIL has better documentation (does ruby-gd even have a homepage?) and more features. Just wanted to hear a few opinions to help me decide.
Thanks.
Vince
PIL is a good library, use it. ImageMagick (what RMagick wraps) is a very heavy library that should be avoided if possible. It's good for doing local processing of images, say, a batch photo editor, but far too processor-inefficient for common image-manipulation tasks on the web.
EDIT: In response to the question, PIL supports drawing vector shapes. It can draw polygons, curves, lines, fills, and text. I've used it in a project to produce rounded alpha corners on PNG images on the fly over the web. It essentially has most of the drawing features of GDI+ (on Windows) or GTK (in GNOME on Linux).
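A small sketch of PIL's ImageDraw module in action (the sizes, colors, and file name are arbitrary):

from PIL import Image, ImageDraw

# a blank RGBA canvas with a fully transparent background
img = Image.new("RGBA", (200, 100), (255, 255, 255, 0))
draw = ImageDraw.Draw(img)

draw.polygon([(10, 80), (60, 10), (110, 80)], fill=(200, 30, 30, 255))
draw.line([(10, 90), (190, 90)], fill=(0, 0, 0, 255), width=2)
draw.text((120, 20), "hello", fill=(0, 0, 0, 255))  # uses the default font

img.save("example.png")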
PIL has been around for a long time and is very stable, so it's probably a good candidate for your first Python project. The PIL documentation includes a helpful tutorial, which should get you up to speed quickly.
ImageMagick is a huge library and will do everything under the sun, but many report memory issues with the RMagick variant, and I have personally found it to be overkill for my needs.
As you say, ruby-gd is a little thin on the ground when it comes to English documentation... but GD is a doddle to install on most platforms, and there is a little wrapper with some helpful examples called gruby that's worth a look. (If you're after alpha transparency, make sure you install the latest GD lib.)
For overall community and blog help, PIL's the way.