I'd like to compare ORB, SIFT, BRISK, AKAZE, etc. to find which works best for my specific image set. I'm interested in the final alignment of images.
Is there a standard way to do it?
I'm considering this solution: take each algorithm, extract the features, compute the homography and transform the image.
Now I need to check which transformed image is closer to the target template.
Maybe I can repeat the process with the target template and the transformed image and look for the homography matrix closest to the identity, but I'm not sure how to compute this closeness exactly. And I'm not sure which algorithm I should use for this check; I suppose a fixed one.
Or I could do some pixel-level comparison between the images using a perceptual difference hash (dHash). But I suspect the resulting Hamming distance may not be very good for images that will be nearly identical.
I could blur them and do a simple subtraction, but that sounds quite weak.
Thanks for any suggestions.
EDIT: I have thousands of images to test. These are real-world pictures. The images are documents of different kinds, some with a lot of graphics, others mostly geometrical. I have about 30 different templates. I suspect different templates work best with different algorithms (I know the template in advance, so I could pick the best one).
Right now I use cv2.matchTemplate to find some reference patches in the transformed images and I compare their locations to the reference ones. It works but I'd like to improve over this.
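For reference, here is a rough sketch of how I imagine measuring "closeness to the identity" (untested; it warps the image corners with the homography and takes their mean displacement):

import numpy as np
import cv2

def distance_from_identity(H, width, height):
    """Mean displacement (in pixels) of the image corners under H.
    A perfect re-alignment would give a value near 0."""
    corners = np.float32([[0, 0], [width, 0], [width, height], [0, height]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H)
    return float(np.mean(np.linalg.norm(warped - corners, axis=2)))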
From your question, it seems like the task is not to compare the feature extractors themselves, but rather to find which type of feature extractor leads to the best alignment.
For this, you need two things:
a way to perform the alignment using the features from different extractors
a way to check the accuracy of the alignment
The algorithm you suggested is a good approach for doing the alignment. To check its accuracy, you need to know what a good alignment is.
You may start with an alignment you already know. The easiest way to know the alignment between two images is if you performed the inverse operation yourself. For example, starting with one image, you rotate it by some amount, translate/crop/scale it, or combine all these operations. Knowing how you obtained the image, you can derive your ideal alignment (the one that undoes your operations).
Then, having the ideal alignment and the alignment produced by your algorithm, you can use a metric to evaluate its accuracy, depending on your definition of "good alignment".
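A minimal sketch of this evaluation, assuming grayscale images and one of OpenCV's binary-descriptor extractors (ORB here; the function name and parameter choices are only illustrative):

import numpy as np
import cv2

def alignment_error(template, detector, true_H, n_matches=200):
    """Warp the template with a known homography, re-estimate the homography
    with the given detector, and report the mean corner error in pixels."""
    h, w = template.shape[:2]
    warped = cv2.warpPerspective(template, true_H, (w, h))

    kp1, des1 = detector.detectAndCompute(template, None)
    kp2, des2 = detector.detectAndCompute(warped, None)
    # NORM_HAMMING suits binary descriptors (ORB/BRISK/AKAZE); use NORM_L2 for SIFT.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:n_matches]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    est_H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Compare where the ideal and estimated homographies send the image corners.
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    ideal = cv2.perspectiveTransform(corners, true_H)
    estimated = cv2.perspectiveTransform(corners, est_H)
    return float(np.mean(np.linalg.norm(ideal - estimated, axis=2)))

# Example: a small synthetic rotation as the known transform.
# template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
# true_H = np.vstack([cv2.getRotationMatrix2D((100, 100), 5, 1.0), [0, 0, 1]])
# print(alignment_error(template, cv2.ORB_create(2000), true_H))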
Related
I have an image comparison problem.
To be more precise, I have a test image (a building taken from outside, could be a house, an apartment, a big public building) and I need to compare it against 100,000 other building images in my DB.
Is there an effective method to output the top X images (the most similar, if not the same) in the most accurate way possible to date?
A number of Stack Overflow answers guided me towards feature matching with OpenCV, but sadly I failed to progress (hitting bad accuracy and therefore roadblocks in terms of ways to improve it).
For instance, this is the test image that I would like to compare (white house, south side): [test_image]
and these are the images in my DB: [pic1_DB] [pic2_DB] [pic3_DB] [pic4_DB] [pic5_DB]
The desired/ideal output would be "the test image is the same building as that in Pic1, Pic3, Pic4 and Pic5".
And the test image is significantly different from Pic2.
Thank you all.
matchTemplate won't work well in this case, as it needs an exact match in size and viewpoint.
An OpenCV feature-based method might work. You can try a SIFT-based method first. But the general assumption is that the rotation, translation and perspective changes are bounded: for an image pair, one picture cannot be taken from 20 m away and the other from 10 km away. These assumptions are made so that the features can be associated.
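A rough sketch of the SIFT route (function names from OpenCV 4.x; the ratio threshold is only illustrative), where the number of "good" matches can serve as a similarity score for ranking the DB images:

import cv2

sift = cv2.SIFT_create()

def match_score(img1, img2, ratio=0.75):
    """Count 'good' SIFT matches between two grayscale images (Lowe's ratio test)."""
    _, des1 = sift.detectAndCompute(img1, None)
    _, des2 = sift.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    return sum(1 for p in matches
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)

# Rank DB images against the test image by score:
# scores = {name: match_score(test_img, db_img) for name, db_img in db.items()}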
A deep-learning-based method might work well given enough data; take PoseNet for reference. It can match the same building from different geometric viewpoints and associate them correctly.
Each method has pros and cons. You have to decide which method you can afford to use.
Regards
Dr. Yuan Shenghai
For pixel-wise similarity, you may use res = cv2.matchTemplate(img1, img2, cv2.TM_CCOEFF_NORMED); similarity = res[0][0], which uses the standard correlation coefficient to evaluate similarity (first make sure the two input images are the same size).
For chromatic similarity, you may calculate the histogram of each image with cv2.calcHist, then measure the similarity between the histograms with a metric of your choice.
For intuitive similarity, I'm afraid you have to use some machine learning or deep learning method since "similar" is a rather vague concept here.
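A small sketch of the first two ideas, assuming same-size BGR inputs (the 8x8x8 histogram bins and the correlation metric are just one possible choice):

import cv2

def pixel_similarity(img1, img2):
    # Standard correlation coefficient; both images must have the same size.
    return float(cv2.matchTemplate(img1, img2, cv2.TM_CCOEFF_NORMED)[0][0])

def color_similarity(img1, img2):
    # Compare 3D BGR histograms with the correlation metric (1.0 = identical).
    h1 = cv2.calcHist([img1], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    h2 = cv2.calcHist([img2], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    cv2.normalize(h1, h1)
    cv2.normalize(h2, h2)
    return float(cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL))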
I would like to test whether a set of documents shares some special similarity by looking at a graph built from each one's vector representation, shown together with a text dataset of other documents. I guess that they will appear close together in the visualization.
Is the solution to use doc2vec to calculate a vector for each document and plot it? Can it be done in an unsupervised way? Which Python library should I use to get those beautiful 2D and 3D representations of Word2vec?
I'm not sure what you're asking, but if you want a way to check whether vectors are of the same type, you could use K-Means.
K-Means makes a number K of clusters out of a list of vectors, so if you choose a good K (not too low, so it still separates something, but not too high, so it isn't overly discriminant), it could work.
K-Means roughly works this way:
init_center(K)           # randomly set K vectors that will be the centers of your clusters
while not converge():    # tricky: there are many ways to check for convergence; the easiest is to check whether the centers have moved since the last iteration
    associate_vector()   # associate every vector with its closest center
    re_calculate_center()  # put each center at the... well, center of its points, simply by taking the mean of all the vectors in the cluster
This GIF is probably clearer than my explanation:
And this article (where the GIF is from) is even clearer, although it uses Java:
https://picoledelimao.github.io/blog/2016/03/12/multithreaded-k-means-in-java/
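In practice you would not implement K-Means by hand; a short sketch with scikit-learn, where the random vectors stand in for your doc2vec output:

import numpy as np
from sklearn.cluster import KMeans

# Placeholder: rows would be your doc2vec vectors, one per document.
vectors = np.random.rand(100, 50)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(vectors)
labels = kmeans.labels_            # cluster index for each document
centers = kmeans.cluster_centers_  # one center vector per cluster

# Documents that share a label ended up in the same cluster.
print(labels[:10])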
I have multiple images of the same object taken at different angles, and there are many such objects. I need to match a test image, taken later at a random angle, to the particular object (with a similar background) it belongs to, by matching it against those images. The objects are light installations inside a building. The same object may be installed at different places, but the backgrounds are different.
I used mean shift error, template matching from OpenCV, and the Structural Similarity Index, but with poor accuracy.
How about image fingerprinting or SIFT/SURF?
The state of the art for such object recognition tasks are convolutional neural networks, but you will need a large labelled training set, which might rule that out. Otherwise SIFT/SURF is probably what you are looking for. They are pretty robust towards most transformations.
I would comment, but I don't have enough rep. I would suggest using feature matching along with SIFT or SURF. You could use a homography matrix, as it would help with the object being at different angles. Here is a tutorial on how to do just that: Feature Matching
I hope this helps.
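A short sketch of that approach (ORB is used here as a free stand-in for SIFT/SURF; the RANSAC inlier count serves as the match score):

import numpy as np
import cv2

def inlier_count(query, candidate, ratio=0.75):
    """Match ORB features and count RANSAC homography inliers.
    More inliers -> more likely the same object, even at a different angle."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(query, None)
    kp2, des2 = orb.detectAndCompute(candidate, None)
    if des1 is None or des2 is None:
        return 0
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 4:
        return 0
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0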
I took some photos that I am attempting to map/transform onto satellite images on Google Maps. Normally, I would need only 4 pairs of points to apply a perspective transform effectively. However, this is not useful in my case because of two reasons:
Mainly, the poor resolution of Google's satellite images (poor for my application, at least) makes it harder to pinpoint the exact points that correspond with those on my photos.
I think Google's satellite images are stitched together slightly imperfectly, meaning that even with perfectly chosen pairs of points I might be a little off, because the points on the Google images are slightly off themselves.
As a result, I would like to conduct a least-squares estimation of the perspective transform using more than 4 points, so that I can get a better fit. However, I have no idea how to do so.
I am using Python with PIL and/or OpenCV for this, so a solution using those libraries would be helpful.
A homography is slightly more powerful than an affine transform (it doesn't preserve parallel lines). It requires 4 points or more (findHomography uses RANSAC and selects its best set of inliers using a linear solution; this is followed by non-linear optimization of the distance residual in a least-squares sense). You have to provide as many matches as you can (>= 4), but try to avoid too many inaccurate matches.
The original statistical model for least squares is ML (maximum likelihood), which finds an optimal solution in the presence of noise. RANSAC compensates for the presence of outliers. There is nothing in the algorithm, though, that compensates for systematic biases. If they cannot be modeled as noise or outliers, a solution is not well defined. If the number of inliers (after rejecting outliers) is less than 4, the solution won't be found.
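A minimal sketch of this with OpenCV, assuming you have already collected more than four correspondences (the coordinates below are placeholders):

import numpy as np
import cv2

# Placeholder correspondences: >= 4 (photo_x, photo_y) -> (map_x, map_y) pairs.
photo_pts = np.float32([[10, 20], [400, 25], [395, 300], [15, 290], [200, 150], [310, 80]])
map_pts   = np.float32([[112, 50], [518, 61], [505, 341], [120, 330], [315, 195], [430, 120]])

# RANSAC rejects outliers, then the inlier set is refined in a least-squares sense.
# (Passing 0 as the method instead would fit all points without outlier rejection.)
H, inlier_mask = cv2.findHomography(photo_pts, map_pts, cv2.RANSAC, 5.0)

# Warp the photo into the map's coordinate frame (output size is up to you).
# photo = cv2.imread("photo.jpg")
# warped = cv2.warpPerspective(photo, H, (600, 400))
print(H)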
As you may have heard, there is an online font recognition service called WhatTheFont.
I'm curious about the tech behind this tool. I think we can basically separate it into two parts:
Generate images from font files of various formats; refer to http://www.fileinfo.com/filetypes/font for a list of font file extensions.
Compare submitted image with all generated images
I would appreciate it if you could share some advice or Python code to implement the two steps above.
As the OP states, there are two parts (and probably also a third part):
Use PIL to generate images from fonts.
Use an image analysis toolkit, like OpenCV (which has Python bindings), to compare different shapes. There are a variety of standard techniques to compare different objects to see whether they're similar. For example, scale-invariant moments work fairly well and are part of the OpenCV toolkit.
Most of the standard tools in #2 are designed to look for similar but not necessarily identical shapes, but for font comparison this might not be what you want, since the differences between fonts can be based on very fine details. For fine-detail analysis, try comparing the x and y profiles of a perimeter path around each letter, appropriately normalized, of course. (This, or a more mathematically complicated variant of it, has been used with good success in font analysis.)
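A rough sketch of steps 1 and 2, using PIL for rendering and OpenCV's Hu-moment comparison for the shape matching (font paths and sizes are placeholders):

import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, font_path, size=64):
    """Step 1: render a character with PIL and return it as a binary OpenCV image."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size * 2, size * 2), 255)
    ImageDraw.Draw(img).text((size // 2, size // 2), char, font=font, fill=0)
    arr = np.array(img)
    _, binary = cv2.threshold(arr, 128, 255, cv2.THRESH_BINARY_INV)
    return binary

def shape_distance(img1, img2):
    """Step 2: compare the two glyph shapes via Hu moments (lower = more similar)."""
    return cv2.matchShapes(img1, img2, cv2.CONTOURS_MATCH_I1, 0)

# Hypothetical usage:
# a = render_glyph("A", "Arial.ttf")
# b = render_glyph("A", "Times.ttf")
# print(shape_distance(a, b))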
I can't offer Python code, but here are two possible approaches.
"Eigen-characters." In face recognition, given a large training set of normalized facial images, you can use principal component analysis (PCA) to obtain a set of "eigenfaces" which, when the training faces are projected upon this subspace, exhibit the greatest variance. The "coordinates" of the input test faces with respect to the space of eigenfaces can be used as the feature vector for classification. The same thing can be done with textual characters, i.e., many versions of the character 'A'.
Dynamic Time Warping (DTW). This technique is sometimes used for handwritten character recognition. The idea is that the trajectory taken by the tip of a pencil (i.e., dx/dt, dy/dt) is similar for similar characters. DTW makes the comparison invariant to some of the variations across instances of a single person's writing. Similarly, the outline of a character can represent a trajectory. This trajectory then becomes the feature vector for each font set. I guess the DTW part is not as necessary for font recognition, because a machine creates the characters rather than a human. But it may still be useful to disambiguate spatial ambiguities.
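A tiny sketch of the "eigen-characters" idea with scikit-learn (the random arrays stand in for flattened, normalized renderings of one character in many fonts):

import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 200 flattened 32x32 glyph images of the character 'A'.
glyphs = np.random.rand(200, 32 * 32)

pca = PCA(n_components=20)
features = pca.fit_transform(glyphs)   # "eigen-character" coordinates, one row per glyph

# Classify a test glyph by nearest neighbour in the reduced space.
test = np.random.rand(1, 32 * 32)
test_feat = pca.transform(test)
nearest = int(np.argmin(np.linalg.norm(features - test_feat, axis=1)))
print("closest training glyph:", nearest)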
This question is a little old, so here goes an updated answer.
You should take a look at the paper DeepFont: Identify Your Font from An Image. Basically, it's a neural network trained on tons of images. It was presented commercially in this video.
Unfortunately, there is no official code available. However, there is an independent implementation available here. You'll need to train it yourself, since weights are not provided, but the code is really easy to follow. Also, consider that this implementation covers only a few fonts.
There is also a link to the dataset and a repo to generate more data.
Hope it helps.