OpenCV - Best Image Recognition Algorithm - python

I am working on a "UPS Package Detection" program. The app takes a photo of my porch every minute from a Raspberry Pi. I then run the image through feature matching against a "cube" template to detect cubes (packages are cubes, right??)
I decide that a package has been delivered if there are 3 or more observations (see the two images below). I find my algorithm crude, and I know I can do better. Can someone please recommend a better way for me to detect whether a package has been delivered?
(I am using Python) Not a package - number of observations very low
a package - number of observations high

No, in your case the packages are not cubes. They are more like blobs.
If you want to detect any package on your porch, not just UPS, you can:
A) Create a standardized environment for taking the picture (constant illumination, clear background, a pattern on the floor, etc.). In other words, you have to create a package drop zone that always looks the same to your camera, except when a package is present.
or
B) (the hard way) Compare the images over time. I assume the image we see is taken outside, so there will be different brightness during the day, shadows, etc. You can pre-process the image accordingly. For example, use a threshold that is calculated as a percentage of the brightest pixel in the image. Then you can compare images over time: if there is a big blob where there was none in the previous images, there might be a package (a rough sketch of this idea appears at the end of this answer).
or
C) If you want to detect UPS packages only, you can use OCR or try to match the UPS logo. This approach will not only detect packages, but also the UPS guy himself :)
Whichever way you go, have fun; it sounds like a really nice home project.
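If you go with option B, a minimal sketch of the idea might look like this (assuming OpenCV and NumPy; the file names, relative threshold, and minimum blob area are placeholders you would need to tune):

import cv2
import numpy as np

def largest_new_blob_area(prev_path, curr_path, rel_thresh=0.6):
    # Compare two porch snapshots and return the area of the largest
    # region that is bright now but was not bright before.
    prev = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread(curr_path, cv2.IMREAD_GRAYSCALE)

    def binarize(img):
        # Threshold as a percentage of the brightest pixel to compensate
        # for changing daylight.
        t = int(rel_thresh * img.max())
        return cv2.threshold(img, t, 255, cv2.THRESH_BINARY)[1]

    diff = cv2.subtract(binarize(curr), binarize(prev))
    # Remove small speckles (shadows, leaves, noise).
    diff = cv2.morphologyEx(diff, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(diff, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max((cv2.contourArea(c) for c in contours), default=0)

# A new blob larger than some tuned area suggests a delivered package.
if largest_new_blob_area("porch_prev.jpg", "porch_curr.jpg") > 5000:
    print("Possible package detected")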

Related

Python quality inspection with opencv (ssim)

I'm currently an intern at a quality inspection company. My job is to write a program that can detect faulty products (for example, a missing screw). They take a picture of every single product. My idea is to choose an image which could serve as a benchmark and compare the other images to it using the SSIM score, and maybe mark the faulty part with a rectangle. Is this a viable idea? (It's a strange internship, because it seems like I'm the only one who can code there...) That's why I'm asking here.
It sounds like a good idea if your goal is to distinguish different objects within images by comparing them against a benchmark image.
But in my experience, the SSIM score is sensitive to angle, lighting, and environment.
So in conclusion, if your goal is to tell different objects apart, your idea should work. But if your goal is to compare images of exactly the same object taken under slightly different conditions, it may not be reliable.
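If you do try the SSIM idea, a rough sketch could look like this (assuming scikit-image is installed and the benchmark and sample photos are already aligned and the same size; file names and the minimum region area are placeholders):

import cv2
from skimage.metrics import structural_similarity

# Load the benchmark ("golden") product photo and the photo to inspect.
benchmark = cv2.imread("benchmark.png", cv2.IMREAD_GRAYSCALE)
sample = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)

# full=True also returns a per-pixel similarity map.
score, diff = structural_similarity(benchmark, sample, full=True)
print(f"SSIM score: {score:.3f}")

# Turn the dissimilar regions into contours and draw rectangles around them.
diff = ((1 - diff) * 255).astype("uint8")
_, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

annotated = cv2.cvtColor(sample, cv2.COLOR_GRAY2BGR)
for c in contours:
    if cv2.contourArea(c) > 100:  # ignore tiny noise regions
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("faulty_regions.png", annotated)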

How to find an exact match of an image in hashed data with openCV

For my school project, I need to find images in a large dataset. I'm working with Python and OpenCV. So far, I've managed to find an exact match of an image in the dataset, but it takes a lot of time even though I only had 20 images for the test code. So I've searched a few pages of Google and tried the code on these pages:
image hashing
building an image hashing search engine
feature matching
Also, I've been thinking of searching through the hashed dataset, saving the paths, and then finding the best feature-matching image among them. But most of the time, my narrowed-down working set is very different from my query image.
Image hashing is really great. It looks like what I need, but there is a problem: I need to find an exact match, not similar photos. So if you have any suggestion, or a piece of code that might help or improve the reference code I've linked, can you share it with me? I'd be really happy to try or research whatever you send or suggest.
OpenCV is probably the wrong tool for this. The algorithms there are geared towards finding similar matches, not exact ones. The general idea is to use machine learning to teach the code to recognize what a car looks like so it can detect cars in videos, even when the color or shape changes (driving in the shadow, different make, etc.).
I've found two approaches that work well when trying to build an image database.
Use a normal hash algorithm like SHA-256 plus maybe some metadata (file or image size) to find matches
Resize the image down to 4x4 or even 2x2. Use the pixel RGB values as "hash".
The first approach is to reduce the image to a number. You can then put the number in a lookup table. When searching for the image, apply the same hashing algorithm to the image you're looking for. Use the new number to look in the table. If it's there, you have a match.
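A minimal sketch of that first approach, assuming an in-memory dictionary is enough as the lookup table (directory and file names are placeholders):

import hashlib
from pathlib import Path

def file_hash(path):
    # Hash the raw file bytes; byte-identical files always get identical hashes.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Build the lookup table once for the whole dataset.
table = {}
for path in Path("dataset").glob("*.png"):
    table.setdefault(file_hash(path), []).append(path)

# Looking up the query image is then a single dictionary access.
candidates = table.get(file_hash("query.png"), [])
print("Exact-match candidates:", candidates)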
Note: In all cases, hashing can produce the same number for different pictures. So you have to compare all the pixels of two pictures to make sure it's really an exact match. That's why it sometimes helps to add information like the picture size (in pixels, not file size in bytes).
The second approach lets you find pictures which look the same to the eye but are in fact slightly different. Imagine cropping off a single pixel column on the left or tilting the image by 0.01°. To you, the images will be the same, but to a computer they will be totally different. The second approach tries to average such small changes out. The cost is that you will get more collisions, especially for B&W pictures.
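And a sketch of the second approach; the 4x4 size and file names are just examples:

import cv2

def tiny_hash(path, size=4):
    # Shrink the image to size x size and use the raw pixel values as the key.
    img = cv2.imread(path)
    small = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)
    return small.tobytes()  # hashable, so it can be used as a dict key

# Two near-identical images (e.g. one cropped by a single pixel column)
# will usually shrink to the same 4x4 thumbnail.
print(tiny_hash("a.jpg") == tiny_hash("a_cropped.jpg"))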
Finding exact image matches using hash functions can be done with the undouble library (disclaimer: I am also the author). It works using a multi-step process of pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and then grouping images based on a threshold value.

Reading multiple invoices from an image using OCR/computer vision

I wish to extract key-value pairs from the following image that consists of 2 invoices.
Image example
I am using AWS Textract to achieve this; however, I'd like to be able to map the key-value pairs back to the invoices. For example, 'Cornbread SVC' should be mapped to bill #1 and '1 #1 CHKN PLATE' should be mapped to bill #2.
One approach I thought of was to perform some pre-processing on the image to find the number of bills and their coordinates, and then crop the image accordingly. So basically, 5 bills on an image would yield the coordinates of 5 bills, and the original image would be cropped 5 times as per the different bill dimensions. Each bill would then be sent as a separate image to AWS Textract.
However, I have not been able to figure out a method to detect the number of bills in an image and their boundary coordinates.
Any help would be appreciated. I am open to using any other APIs or methods to achieve this.
As you've already mentioned, it would be necessary to split the bills before you do any OCR. There are some techniques to achieve this.
You could use OpenCV and detect white paper in the image, see. From my experience, I can tell you that it will work when the background of the image is dark enough. It won't work when you take a picture on, for example, a white table. Therefore, the user experience achieved with this approach won't be satisfying - sometimes it works, sometimes it doesn't.
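A rough sketch of that OpenCV idea, assuming the receipts are clearly brighter than the background (the file name and the minimum area fraction are placeholders to tune):

import cv2

img = cv2.imread("receipts.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# White paper should end up as large bright regions after Otsu thresholding.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

bills = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 0.05 * img.shape[0] * img.shape[1]:  # keep only large regions
        bills.append(img[y:y + h, x:x + w])

# Each crop can then be sent to Textract (or any other OCR) separately.
for i, bill in enumerate(bills):
    cv2.imwrite(f"bill_{i}.png", bill)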
If it is a mobile app, you could ask your user to draw a rectangle around each receipt. A similar approach for a single document is used in mobile scanners, example.
The last option, which I prefer, is to use a scanning app/SDK and force the user to simply take a picture of a single receipt at a time. It may sound a bit rigid and uncool, but it works all the time. Let's face it - the more steps you have with a chance of failure, the more failures will happen. In the invoice data extraction process you have at least the following steps:
image capture
image processing
OCR - not 100% accurate
recognition of data (what is invoice number, etc.) - not 100% accurate
So you already have at least two steps that are not 100% accurate. Why add another step that cannot work in 100% of cases when you can achieve the same feature by taking separate images?

OpenCV decentralized processing for stereo vision

I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to be so that each camera can do part of the overall process (without needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for more in-depth):
Find the intrinsic/extrinsic values of each camera (good illustration here)
Solve for the transformation that will rectify your cameras' images (good illustration here)
Capture a pair of images
Transform the images according to Step 2.
Perform stereo-correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
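As a hedged sketch of how that split could look with OpenCV (it assumes the calibration results from Steps 1-2 are already available as NumPy arrays; all names are placeholders):

import cv2
import numpy as np

# --- Once, after calibration (Steps 1 and 2) ---
# K, dist: one camera's intrinsics/distortion; R, P: its rectification
# rotation and projection matrix from cv2.stereoRectify; image_size = (w, h).
def make_rectify_maps(K, dist, R, P, image_size):
    return cv2.initUndistortRectifyMap(K, dist, R, P, image_size, cv2.CV_16SC2)

# --- On each camera (Steps 3 and 4), no communication needed ---
def capture_and_rectify(raw_image, maps):
    map1, map2 = maps
    return cv2.remap(raw_image, map1, map2, cv2.INTER_LINEAR)

# --- On the host computer (Step 5) ---
def disparity(rect_left, rect_right):
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
    # StereoSGBM returns fixed-point disparities scaled by 16.
    return matcher.compute(rect_left, rect_right).astype(np.float32) / 16.0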
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.

How can I detect and track people using OpenCV?

I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
None of the Python samples really apply
The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, making extracting the relevant knowledge and converting it to the Python API problematic.
The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find strongly connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?
The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a python wrapper. The basic usage is as follows:
from cv import *

storage = CreateMemStorage(0)
img = LoadImage(file)  # or read from camera
found = list(HOGDetectMultiScale(img, storage, win_stride=(8, 8),
                                 padding=(32, 32), scale=1.05,
                                 group_threshold=2))
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete python example (both in the OpenCV sources).
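For what it's worth, in more recent OpenCV versions the same pre-trained people detector is exposed through the cv2.HOGDescriptor class; a small sketch (the file name is a placeholder):

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("frame.png")
rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(32, 32), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)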
Nick,
What you are looking for is not people detection, but motion detection. If you tell us a lot more about what you are trying to solve/do, we can answer better.
Anyway, there are many ways to do motion detection depending on what you are going to do with the results. The simplest one would be frame differencing followed by thresholding, while a complex one could be proper background modeling -> foreground subtraction -> morphological ops -> connected component analysis, followed by blob analysis if required. Download the OpenCV code and look in the samples directory. You might see what you are looking for. Also, there is an O'Reilly book on OpenCV.
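A bare-bones sketch of that simple differencing-and-thresholding idea (the threshold and minimum blob area are arbitrary values you would tune):

import cv2

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Difference against the previous frame, then threshold.
    diff = cv2.absdiff(gray, prev)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)

    # Connected regions of the mask are the moving areas.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    moving = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

    prev = gray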
Hope this helps,
Nand
This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking
This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs, and then decide whether each one is a person. The problem is that this will not work very well, since people tend to walk together and pass each other, so a blob might very well consist of two people, and you will then see that blob splitting and merging as they walk along.
You will need some method of discriminating between multiple persons in one blob. This is not a problem I expect anyone to be able to answer in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that there exist products which do this: Autoliv has a product to detect pedestrians using an IR camera on a car, and I have seen other products which count customers entering and exiting stores.
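As a hedged illustration of that segmentation-plus-blobs idea, OpenCV's built-in background subtractor can do the foreground extraction (the video source and area threshold are placeholders):

import cv2
import numpy as np

cap = cv2.VideoCapture("hallway.mp4")  # placeholder video source
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break

    fg = subtractor.apply(frame)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)  # remove speckle noise
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Each large blob is a candidate person; merged/split blobs are the hard part.
    people = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 1500]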
