Python - Best way to find similar images in a directory

Python - Best way to find similar images in a directory - python

Here is the effect I am trying to achieve - Imagine a user submits an image, then a python script to cycle through each JPEG/PNG for a similar image in the current working directory.
Close to how Google image search works (when you submit your image and it returns similar ones). Should I use PIL or OpenCV?
Preferably using Python3.4 by the way, but Python 2.7 is fine.
Wilson

I mean, why not use both? It's trivial to convert PIL images into OpenCV images and vice-versa, and both have niche functions that can make your life easier. Pair them up with sklearn and numpy, and you're cooking with gas.

I created the undouble library in Python which seems a match for your issue.
It uses Hash functions to detect (near-)identical images in for example a directory. It works using a multi-step process of pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and the grouping of images based on a threshold value.

Related

using python to obtain objects in an image

I have this picture. I need to identify the animal in this picture as shown using an image processing algorithm. I'm thinking of using Python for this. But I don't know which algorithm to use and I don't know where to start. Where should I start?
image

The best place to start is fast.ai. You will find the videos you need and the code. You can do it on any computer, even a cheap laptop. You will also want to look into LIME model explanations for image classifiers.

Simple Way to Compare Two Images in Python

I need to copy images from 'Asset' folder in Windows 10 which has background images automatically downloaded. Some of these images will never be displayed and at some point deleted. To make sure I have seen all the new images before they are deleted I have created a Python script that copy these images into a different folder. To efficient I need a way to compare two images those that only the new ones are copied. All I need to do is to have a function that takes two images compare them with a simple approach to be sure that the two images are not visually identical. A simple test would be to take an image file copy it and compare the copy and the original, in which case the function should be able to tell that those are the same images.
How can I compare two images in python? I need simple and efficient way to do it. Several answers I have read are a bit complicated.

I encountered a similar problem before. I used PIL.Image.tobytes() to convert the image to a byte object, then call hash() on the byte object and compared the hash values.

Compare two images in python
Option 1:
Use ImageChops module and it contains a number of arithmetical image operations, called channel operations (“chops”). These can be used for various purposes, including special effects, image compositions, algorithmic painting, and more.
Example:
ImageChops.difference(image1, image2) ⇒ image
Returns the absolute value of the difference between the two images.
out = abs(image1 - image2)
Option 2:
Scikit-image is an image processing toolbox for SciPy.
In scikit-image, please use the compare_ssim to Compute the mean structural similarity index between two images.
References:
Python Compare Two Images

Creating a fingerprint database

I was wondering how to create a fingerprint database. If fingerprints are stored as images, how do you compare images in a database, or create an image search engine like TinEye?
I know this is a big subject, but I'm just looking for a starting point. Can this be done using Python/Django libraries and MySQL?

OpenCV comes with a sample program that does what you are looking for. It's called find_obj.py. Pull it up in your editor and change:
surf = cv2.SURF(1000)
to
surf = cv2.SURF(100)
This should find lots of "inlier" points of interest in the negative of the fingerprint scan.
You can play around with a number of the variables and eventually find the best configuration for the sort of images you're comparing. It's also fairly straightforward to alter the sample to allow you to compare a single image against an entire directory.
I should point out that this will only be effective for the sort of digitized fingerprint scans used by law enforcement.

The Python Imaging Library is probably the best library to get started on image processing with.
The library most commonly used for real time image processing (you don't need real time, but you can't go wrong with fast) is OpenCV. It has Python Bindings and built-in feature detection algorithms. See also this comparison.
For an overview of image comparison algorithms take a look at this question.

As a very simple approach you can crawl all images and compute a hash for each.
Later on, when user submits an image for a search, you compute a hash for that too and look for the same hash in your database.
However, this is really simplistic approach and will only work if searched for exact image copies. Ideally, each image should be converted to some simplified feature set (to have tolerance against different versions of the same image --- different formats, sizes, noise, etc.) used for comparison. For instance, it could be worth trying convert images (both crawled and submitted for search) to grayscale of 128x128 size and compute hash of that.

Python: Manipulating a 16-bit .tiff image in PIL &/or pygame: convert to 8-bit somehow?

Hello all,
I am working on a program which determines the average colony size of yeast from a photograph, and it is working fine with the .bmp images I tested it on. The program uses pygame, and might use PIL later.
However, the camera/software combo we use in my lab will only save 16-bit grayscale tiff's, and pygame does not seem to be able to recognize 16-bit tiff's, only 8-bit. I have been reading up for the last few hours on easy ways around this, but even the Python Imaging Library does not seem to be able to work with 16-bit .tiff's, I've tried and I get "IOError: cannot identify image file".
import Image
img = Image.open("01 WT mm.tif")
My ultimate goal is to have this program be user-friendly and easy to install, so I'm trying to avoid adding additional modules or requiring people to install ImageMagick or something.
Does anyone know a simple workaround to this problem using freeware or pure python? I don't know too much about images: bit-depth manipulation is out of my scope. But I am fairly sure that I don't need all 16 bits, and that probably only around 8 actually have real data anyway. In fact, I once used ImageMagick to try to convert them, and this resulted in an all-white image: I've since read that I should use the command "-auto-levels" because the data does not actually encompass the 16-bit range.
I greatly appreciate your help, and apologize for my lack of knowledge.
P.S.: Does anyone have any tips on how to make my Python program easy for non-programmers to install? Is there a way, for example, to somehow bundle it with Python and pygame so it's only one install? Can this be done for both Windows and Mac? Thank you.
EDIT: I tried to open it in GIMP, and got 3 errors:
1) Incorrect count for field "DateTime" (27, expecting 20); tag trimmed
2) Sorry, can not handle images with 12-bit samples
3) Unsupported layout, no RGBA loader
What does this mean and how do I fit it?

py2exe is the way to go for packaging up your application if you are on a windows system.
Regarding the 16bit tiff issue:
This example http://ubuntuforums.org/showthread.php?t=1483265 shows how to convert for display using PIL.
Now for the unasked portion question: When doing image analysis, you want to maintain the highest dynamic range possible for as long as possible in your image manipulations - you lose less information that way. As you may or may not be aware, PIL provides you with many filters/transforms that would allow you enhance the contrast of an image, even out light levels, or perform edge detection. A future direction you might want to consider is displaying the original image (scaled to 8 bit of course) along side a scaled image that has been processed for edge detection.
Check out http://code.google.com/p/pyimp/wiki/screenshots for some more examples and sample code.

I would look at pylibtiff, which has a pure python tiff reader.
For bundling, your best bet is probably py2exe and py2app.

This is actually a 2 part question:
1) 16 bit image data mangling for Python - I usually use GDAL + Numpy. This might be a bit too much for your requirements, you can use PIL + Numpy instead.
2) Release engineering Python apps can get messy. Depending on how complex your app is you can get away with py2deb, py2app and py2exe. Learning distutils will help too.

Adding multiple images to one canvas in Python

I want to load a number of images from harddrive and place them on a larger white background. And I want to do it in Python. I am wondering what is the best way of doing that. I am using a windows machine and I can use any library I want. Any pointer to a webpage or a sample code that can point me to a good direction would be appreciated.
Something like this:

A very popular image processing library for Python is PIL. The official PIL tutorial might be useful, especially the section about "Cutting, Pasting and Merging Images".

PIL isn't enough. Try also with PIL the aggdraw library.
But aggdraw also isn't enough. It works poorly with transparency. I mean 0.5-1 gray pixel around opaque object over the transpparent area.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.