Ideal images for Viola-Jones Haar cascade in OpenCV - python

I am trying to detect specific objects in images using Haar cascade in OpenCV.
Let's say I am interested in detecting stop signs in landscape images. When defining positive image samples for my training set, which would be the best kind of image: (a) full images with my object, (b) a medium crop or (c) a tight crop?
Similarly, what's best for negative images? Does this influence overfitting? I would also appreciate any other general tips from those with experience. Thanks.
Image ref: http://kaitou-ace.deviantart.com/art/Stop-sign-on-a-country-road-Michigan-271990933

Your positive samples should contain only the features you want to detect, so image (c) is the right choice for positives.
As for negative samples, you want EVERYTHING else. That is obviously unrealistic, but if you will use your detector in a specific environment, then training on that environment as negatives is the right way to go, i.e. lots of pictures of landscapes and similar scenes that don't have stop signs in them.

The best choice is (c), because (a) and (b) contain too many features around the border of the sign that are of no interest to you.
Not only are they not useful, they can seriously compromise the performance of the algorithm.
In case (c) the algorithm's only job is to recognize when the current window contains the features you are looking for.
But what about (a) and (b)?
In those cases the algorithm has to detect the interesting features in just one corner of the window (and, unfortunately, that corner could be anywhere), while at the same time staying consistent with the virtually infinite variety of content that could surround that corner.
You would need a huge number of samples, and even if you eventually reached an acceptable hit rate, the job of separating positives from negatives would be so difficult that the running time would be very high.
As for collecting negatives, ideally you should pick images that reproduce the scenes your final detector will actually run against.
For example, if you think indoor images are irrelevant, just discard them. If you think a certain kind of landscape is where your detector will run, keep plenty of those.
But this is only theoretical; I feel the improvement would be negligible. Just collect as many images as you can: it is the number of different images that really matters.
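For what it's worth, here is a minimal, hedged sketch (Python) of how the sample lists consumed by OpenCV's opencv_createsamples / opencv_traincascade tools might be generated. The positives/ and negatives/ folder names are placeholders for wherever you keep your tight crops and your sign-free landscapes.

# Build the annotation files expected by opencv_createsamples / opencv_traincascade.
# "positives/" holds tight crops of the object (option c); "negatives/" holds
# background images with no stop signs. Both paths are hypothetical.
import os
import cv2

with open("positives.info", "w") as info:
    for name in sorted(os.listdir("positives")):
        path = os.path.join("positives", name)
        img = cv2.imread(path)
        if img is None:
            continue
        h, w = img.shape[:2]
        info.write(f"{path} 1 0 0 {w} {h}\n")   # one object filling the whole crop

with open("bg.txt", "w") as bg:
    for name in sorted(os.listdir("negatives")):
        bg.write(os.path.join("negatives", name) + "\n")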

Related

Measure the rate of growth of a crack from Video

My experiment involves subjecting a substance to pressure that makes the substance eventually crack. The crack grows with time and pressure applied. I have a set-up to take a picture of the substance at fixed intervals of time.
I need to measure how fast the crack grows. How do I go about this? (I can code in Python.)
Is there a way to measure the growth speed of the crack live, or from one frame to the next?
Google drive link to series of pictures taken - https://drive.google.com/open?id=189cv8B4rm3lhSgT6OYfI_aN0Xmqi-tYi
Kindly advise.
I tried floodFill from OpenCV as per the suggestions to this question, but the returned mask is as shown:
import cv2
import numpy as np

resized = cv2.imread("frame.png")  # the resized video frame (path is illustrative)
h, w = resized.shape[:2]
mask = np.zeros((h + 2, w + 2), np.uint8)
seed = (int(w / 2), int(h / 2))  # centre seed (computed but not used in the call below)
floodflags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)  # example flags (not shown in the original post)
# Floodfill from point (0, 0); the filled region is written into mask
num, im, mask, rect = cv2.floodFill(resized, mask, (0, 0), (255, 0, 0), (10,) * 3, (10,) * 3, floodflags)
I thought that if I could get the coordinates of the bounding box that encloses the crack, I could track those coordinates across frames and measure the size of the crack, and eventually its speed.
I tried thresholding as below:
th, im_th = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)  # binarise at a fixed grey level of 100
This gives:
I'm unsure if this will let me filter out the background and draw a bounding box over the crack alone. Please advise.
Thanks in advance.
Depending on how slowly the crack forms, you probably don't need a video; you'll likely wind up sampling every X frames anyway and throwing all of the extra frames away. What you want is enough frames to capture the incremental changes in the crack, without collecting so many frames that processing becomes too computationally expensive.
If you can carefully control the lighting conditions in your setup, then you're in luck! This becomes a very simple problem. You can take a histogram of the pixels (OpenCV has functions for this, and so do PIL and NumPy); you should get two families of color: one that is the color of the outside of the substance, and another that is biased by the shadow in the crack.
You can also try dramatically increasing the contrast in each image/frame in order to get a binary mask of the crack, or running an edge detector over the image. These techniques will lead to frames that are substantially easier to process than the raw footage. You can even feed these into a skeletonization process in order to generate a vector-based representation of the line, in XY image coordinates.
If you can't control the lighting, or the sample is a similar color to the crack, you'll probably need to use object detection techniques, but it's unlikely there's an existing "crack detector," so you may either need to build your own, or look for what other detectors serve as a good proxy for the color and shape of the forming crack.
I'd highly recommend trying the first option if at all possible; pixel and histogram math is far easier than other techniques.
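To make the first option concrete, here is a rough sketch in Python/OpenCV. The frame filenames, the fixed threshold of 60, and the assumption that the crack is the largest dark blob in the frame are all placeholders you would need to adapt to your footage.

# Per-frame crack bounding box via thresholding (OpenCV 4 return signatures).
import cv2

def crack_bbox(path, thresh=60):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # the crack is darker than the surrounding substance, so use an inverted threshold
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)  # assume the crack is the largest dark blob
    return cv2.boundingRect(largest)              # (x, y, w, h)

# Compare consecutive frames taken dt seconds apart to estimate growth speed
dt = 3.0
prev = crack_bbox("frame_000.png")
curr = crack_bbox("frame_001.png")
if prev and curr:
    growth_px = curr[2] - prev[2]                 # change in bounding-box width
    print("growth speed:", growth_px / dt, "pixels per second")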
I appreciate you are only just getting started, but you have some issues with your video. Firstly, the lighting is not the best and it is not consistent, because people are moving around in front of it and casting shadows; it also doesn't illuminate the background behind the crack well. It would be better if the light were at the height of the crack and shining more into it, so that it better illuminates the background behind the crack. Secondly, you could do without the camera moving partway through the experiment!
Thirdly, if you want to measure things you need to calibrate, which at the very least means putting a ruler in the image, or scale lines on your background at fixed intervals. If you are doing all that, you may as well make life easy for yourself and put markers of a specific colour/pattern, each different, on the top and bottom of the frame plates that are applying the load.
Finally, you want to do something like a flood fill, or a fill just within the confines of your material (probably by masking), to fill the crack with a different colour. It is then pretty simple to measure the length of the crack and its left-most extent.
With a proper segmentation approach you are going to have a detailed geometry of the object extracted from a single frame. For example:
If you process multiple frames you will be able to see how the geometry evolves over time. Having that, it should be easy to compare polygons to find shape changes, cracks, etc. (a small code sketch is at the end of this answer):
I used to work with 4K video to get all required details and good accuracy. You might not need all that data, but video is still way more flexible.
Here is a complete example: https://youtu.be/g2KyfrBtTA4
Provide some examples if you want to get more detailed recommendations.
Update
Real examples are always helpful. So you can segment a crack:
or a substance:
or both:
Basically, you need to improve the overall quality of the input (focus, background under the substance, etc.).
As Mark Setchell showed, you might get unwanted background as part of the result shape (the right side of the crack), so it is better to make sure that will not happen or just try to analyze only the substance.
Anyway, your task doesn't seem to be complex. It might even be trivial if you can improve the image quality and simplify the environment a bit (a specific background, etc.).
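As a hedged illustration of comparing segmented geometry between frames: once you have a segmentation you trust, something along these lines could quantify the evolution. The Otsu threshold and the filenames below are only stand-ins for your real segmentation and data.

# Compare the segmented shape between two frames (OpenCV 4 return signatures).
import cv2

def segment_substance(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)     # largest blob = the substance (assumption)

c1 = segment_substance("frame_010.png")
c2 = segment_substance("frame_020.png")

# Simple per-frame descriptors; a growing crack changes the perimeter faster than the area
print("area change:", cv2.contourArea(c2) - cv2.contourArea(c1))
print("perimeter change:", cv2.arcLength(c2, True) - cv2.arcLength(c1, True))
print("shape dissimilarity:", cv2.matchShapes(c1, c2, cv2.CONTOURS_MATCH_I1, 0.0))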

Requesting Guidance: What approach to follow to do Quality Control on small and thin metal ring shafts using Computer Vision?

I am new to the Computer Vision field and looking for your guidance on identifying an approach to tackle the following scenario:
What approach to follow to do Quality Control on small and thin metal rings using Computer Vision
Putting below the detailed requirement (this is the best I can share):
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
Following checks we need to do:
1. Surface coating of the ring peeled off
2. Portion of the ring chipped off
3. Scratch on the ring's surface
4. Width of the ring is uneven
5. Dent on the ring
6. Entire surface of the ring is not completely horizontal to the plane; perhaps due to a dent, part of the ring rests on the plane surface, creating a 1 or 2 degree angle (I have marked no. 6 as 'uneven surface' in the attached picture)
I have also attached another picture marking the quality issues found on a random ring: elevated view with marked QC issues.
Scenario:
One single ring can have one or more than one of the above mentioned 6 defects
Issue 1 & 3 can occur at either surface of the ring and we need to check both the surfaces
We need to QC on one single ring at a time
Challenge:
- Need to set up a work station to capture image or video of each ring under check
How many cameras will be there in that work station, and at what angle should each camera be?
As we need to check both sides of the ring, we need to decide whether:
we will place the ring on a transparent surface and take the image,
or
we need to flip the ring after the image is taken on one side
Next challenge is what computer vision technique we should employ to identify all these issues
For the time being we are doing some research around OpenCV's background subtraction methods
It will be helpful to get some insight from you on what would be a better/feasible approach
Since this is for a student project I'll emphasize image processing more than other aspects of an application. See the bottom section for considerations for real-world applications.
That aside, a general comment: implementing vision for quality control (QC) is hard to get right. If the product to be inspected is cheap (e.g. a ring, a small plastic thing), and if the result of the vision inspection is a borderline pass/fail, or uncertain, you can reject the part. If the part to be inspected is expensive (e.g. a large assembly for a tractor, individual CPUs, medical devices near the end of the production line), then you have to have very well defined specifications, and the system needs to be made as robust as possible.
In general, you want to optimize imaging for each type of defect. For example, the camera location, lens, and lighting to detect scratches may be quite different than what is needed for dimensional gauging (a.k.a. dimensional measurement).
Machine Vision vs. Computer Vision
When you search online for algorithms, equipment, and techniques specific to vision for industrial automation, including the quality control of parts on production lines, then for English-language websites favor the term "machine vision" instead of "computer vision."
https://en.wikipedia.org/wiki/Machine_vision
Machine vision is the common industry term for image processing (+ cameras + lighting + ...) for industrial use. Although different people may use different terminology, and the terminology isn't as important as learning techniques, you'll find a lot of material by searching for "machine vision." The term "computer vision" tends to be used for non-industrial applications, and for academic research, though in languages other than English the terms "machine vision" and "computer vision" may be the same. By comparison, "medical imaging" is similar to machine vision, but involves application of image processing to medical applications.
Lighting
Most importantly, you must control the lighting. Ambient lighting, such as desk lamps, overhead lights, etc., is not only useless for a vision system inspecting parts in production, but will typically interfere with image processing. You might find some defects sometimes with poorly controlled light, but to generate the most consistent results, you'll need to set up lights in specific locations, run the lights at specific, verifiable intensities, and have your vision system detect when something has gone wrong with the lighting.
There are "machine vision lights" designed especially for specific applications such as finding scratches in shiny surfaces, making shiny surfaces look less shiny, to backlight parts (which is useful for dimensional gauging), to illuminate parts from low angles, and so on. Read about different types of lighting.
https://smartvisionlights.com/
https://www.vision-systems.com/content/dam/VSD/solutionsinvision/Resources/lighting_tips_white_paper.pdf
Rather than spend a lot of money on special lights, you can mock them up:
LED flashlight or single LED (as a "point" light source)
Bright light + translucent sheet of plastic (for backlighting)
White tissue paper or some other diffusing material in front of a bright light
...
The importance of lighting cannot be overstated. Controlling lighting conditions improves the chance of success, and is typically necessary to achieve the accuracy of measurement or pass/fail assessment required in real-world environments.
Accuracy, Correctness, Usefulness
At some point you'll probably wonder whether machine learning is useful or necessary for the application. The question to ask yourself (or the customer) is this: what percentage of defects would need to be detected?
For example, if a chip is missing from the ring that could be a fatal defect. Is the ring used in some safety-critical application? If so, vision inspection for QC would have to be extraordinarily robust.
Even if you're familiar with the terms "accuracy" and "precision," make sure they have very clear meanings as you consider image processing problems:
https://en.wikipedia.org/wiki/Accuracy_and_precision
So, what percentage of chip defects needs to be found? 90%? 95%? 98%?
Using the term "accurate" more loosely to mean "the vision system gets the measurement correct and/or finds the defects we know are there," what is the accuracy of the most accurate machine learning algorithm you've read about? Or at least, what would qualify as reasonably impressive accuracy for machine learning? 95%? 98%?
If you're making measurements of machine parts on a production line, then you would typically want the accuracy of dimensional measurements and defect detection to be 99% or better. For high-value products, and products such as electronic components that are highly sensitive to defects, accuracy may need to be 99.999% or better. Think of it this way: if a manufacturer is making thousands or tens of thousands of parts, they don't want garbage parts to make it past your vision system several times a day.
Machine learning for image processing has been around a long time. Processing speeds, memory, and training set sizes have improved, and there have been improvements in algorithms as well, but it's important to note that machine learning is suitable only for some applications, and will fail miserably at other applications.
Techniques
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
Get the exact diameter, including tolerances. If the nominal diameter is 3.000 inches, then the tolerance might be expressed in thousandths of an inch. You may not need to know that for a student project, but if you were proposing a solution for a factory owner, you wouldn't want to even suggest a price or timeline for delivery without having complete specs for the part, and numerous samples of the part.
From the one image it's not possible to be too specific about what a defect might look like--the same part can have different defects in different factories, or even on different production lines of the same factory--but we can make some guesses.
1. Surface coating of the ring peeled off
From the one image it's not clear what the surface coating is supposed to look like, or what's underneath. You must provide at least one image of a good part, and at least one image for each type of defect.
What is the surface coating? Anodization? Paint? Enamel? Plastic? Cheese? Whatever the case, knowing what material it is, and how that material degrades, will give some clues about what sort of vision setup may help detect problems with the coating. Changes in coating quality can affect apparent texture (e.g. edge content), brightness/darkness (intensity), color, shininess, and so on.
For the moment, let's assume the coating peeling off changes the brightness or texture of the uncoated surface vs. the remaining coated surface. Then your image processing might look something like the following:
Determine whether a ring is in the image
Segment the ring from the background. That is, use an algorithm such as connected components (OpenCV's findContours()), SIFT, or some other technique to identify the presence and location of a rigid object of known size and shape from the background.
Isolate further processing to just those pixels corresponding to the surface of the part.
Use some technique to find clusters of different texture differences, brightness differences, etc. This is where a better description of the coating is required. If lighting and lens parameters are "fixed," you can consider generating a histogram of brightness values in the image (0 = black, 255 = white) and then comparing the histogram of good parts and bad parts--is there some statistical difference? Or you might use connected components (findContours() again) to cluster pixels of different colors, assuming the lack of coating changes the apparent color of the part: maybe the coating is brown and the part is silvery.
It's hard to guess what technique would be relevant here without photos and/or a much more specific description of the coating. Hopefully this makes it clear why specs are important.
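As a purely illustrative sketch of that idea (not a production recipe), assuming fixed lighting, an Otsu threshold that separates the ring from the background, and placeholder filenames and cut-off values:

# Isolate the ring, then compare its brightness histogram to a known-good part.
import cv2
import numpy as np

def ring_histogram(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, bin_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bin_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ring = max(contours, key=cv2.contourArea)          # assume the ring is the largest blob
    mask = np.zeros_like(gray)
    cv2.drawContours(mask, [ring], -1, 255, thickness=-1)
    hist = cv2.calcHist([gray], [0], mask, [64], [0, 256])
    return cv2.normalize(hist, hist).flatten()

good = ring_histogram("good_part.png")
test = ring_histogram("test_part.png")
score = cv2.compareHist(good, test, cv2.HISTCMP_CORREL)
print("reject" if score < 0.9 else "pass", score)      # 0.9 is an arbitrary cut-off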
Coatings can be absent in different ways: peeling, small absences (voids), partially scraped away, etc. It can be difficult to predict in advance what the shape and size of missing coating may be.
When the size and shape of a defect is hard to predict, but when the defect is associated with a difference in image intensity (pixel brightness) or color, then explore these ideas:
Generate an "edge image" in which you find brightness/color transitions. You start with the grayscale or color image, then use Sobel or Canny or some other algorithm to generate an image of edge intensities.
Apply statistical methods to determine how "edgy" an image is. Are there more than N pixels (or more than 5% of all pixels) with an edge strength greater than S?
Once you have some basic algorithm that identifies the difference between good parts and parts with some missing coating, then you could consider using machine learning to review lots (lots!) of samples to help determine the best parameterization. For example, how do you know what number of edge pixels or edge pixel strength should be considered "bad"?
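A hedged sketch of such an edge statistic might look like the following; the strength threshold S and the 5% cut-off are hypothetical numbers that would have to be tuned against real good and bad samples (and in practice you would restrict the statistic to the ring mask, not the whole frame):

# Fraction of pixels whose Sobel gradient magnitude exceeds a strength threshold S.
import cv2
import numpy as np

gray = cv2.imread("test_part.png", cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
magnitude = cv2.magnitude(gx, gy)

S = 80.0                                               # edge-strength threshold (hypothetical)
edgy_fraction = np.count_nonzero(magnitude > S) / magnitude.size
print("suspect coating defect" if edgy_fraction > 0.05 else "looks ok", edgy_fraction)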
2. Portion of the ring chipped off
It depends on whether the chip is visible just from the part's outline. For example, if you placed the part on a light table (a.k.a. "backlight"), would you always see a defect considered to be a "chip"? Or could the chip just be on the top surface facing the camera?
To find chips on edges, having the part on a backlight simplifies matters greatly.
Identify the location and orientation of the part (e.g. using connect components, normalized correlation, SIFT, or whatever algorithm is suitable for the part and accuracy of location required).
Find edges corresponding to the outer and inner rings of the part.
Fit a circle or nearly circular ellipse to the edge points using a Hough circle fit, a RANSAC circle fit, or (meh) a least-squares circle fit, parameterized to the known dimensions (in pixels) of the outer and inner ring diameters.
For the points used for the circle fits, find the point-to-circle (or point-to-ellipse) shortest distance. The larger this distance, the more likely you have a chip or missing chunk.
To ensure you're finding indentations, chips, or whatever, and not just individual "noise" edge points, examine the points in order going clockwise or anticlockwise, and only consider a series of perimeter points to be a defect if N successive points have a median (or possibly mean) point-to-circle distance greater than some threshold distance.
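Here is a rough, illustrative sketch of the radial-deviation idea using a crude centroid-based circle fit (a Hough or RANSAC fit would be more robust); the backlit-image assumption, the filename, and the 3-pixel tolerance are placeholders:

# Backlit part: extract the outer contour, fit a crude circle, flag large deviations.
import cv2
import numpy as np

gray = cv2.imread("backlit_ring.png", cv2.IMREAD_GRAYSCALE)
_, bin_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # part is dark on a bright backlight
contours, _ = cv2.findContours(bin_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
outer = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float32)

center = outer.mean(axis=0)                            # crude fit: centroid + mean radius
radii = np.linalg.norm(outer - center, axis=1)
deviation = np.abs(radii - radii.mean())

print("worst point-to-circle deviation:", deviation.max(), "pixels")
print("possible chip" if deviation.max() > 3.0 else "outline looks circular")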
A simpler approach could be to fit a black-and-white mask--a template--representing a good part to the current location and rotation of the part to be inspected. If the template and sample part are aligned very precisely, and if you perform image subtraction, then you may be fortunate enough to get clusters or pixels where there are defects. But this method is fairly crude, and harder to make robust.
There are machine learning techniques to identify chips on edges, but you'd need lots of part samples to train them. Optionally, if you don't have enough samples, you can use the same samples with slightly modified lighting, at different locations in the image, with manually added defects, etc., to help train the algorithm. But that's another discussion altogether.
3. Scratch on the ring's surface
See the link above about different types of lighting. You'll need to experiment with a few different lighting configurations to figure out what works for your part.
Generally, though, scratches are likely to have difference in brightness and "edginess" (image edge content) relative to the rest of the part. If you're lucky, a scratch can reveal a different color.
Scratches can vary so much in appearance, area, and shape that it would be hard to parameterize an algorithm to catch them all. Once again, statistical analysis of edge content, brightness, and color tends to be useful.
In general: to achieve the best results for a particular QC inspection, you'll need to engineer a system specifically for the part. Your vision system may be configurable, and there can be different combinations of lights and cameras for different types of QC inspection, but for any particular defect detection you want to control the appearance of the part as much as possible. Relying on software to do all the work yields a less robust system that customers will typically yank out and throw away.
4. Width of the ring is uneven
This is almost an example of dimensional gauging or optical gauging. If you're just looking for unevenness, you don't necessarily need to measure the diameter in engineering units such as millimeters: you can just measure pixels. BUT the effort required to ensure your measurement in pixels is accurate will typically lead you to measuring in millimeters anyway.
Assuming the optical setup is correct and (more or less) calibrated, which I'll describe below, here's a basic process:
Identify the position and location of the part
From the algorithm that finds the part, or from a follow-on algorithm that identifies edge pixels (e.g. Sobel, Canny, ...), find the edge pixels just for the outer diameter of the ring.
Perform a circle/ellipse fit to the edge pixels, and eliminate outlier pixels that don't actually belong to the circle/ellipse.
Have your algorithm start with the 1st pixel in the list of edge pixels corresponding to the outer diameter.
From that 1st pixel, find the edge pixel farthest away. Ideally, this would be the point diametrically opposite.
Cycle through all pixels, finding the distance to the farthest pixel. (This is not optimal in terms of speed, but simpler to code.)
Generate a histogram of all distances.
Make a determination of good/bad based on the histogram of point-to-point distances (a rough sketch follows the list of conditions below).
You might call a part "bad" for one or more of the following conditions:
At least N point-to-point distances exceed a distance of P pixels
The standard deviation of point-to-point distances exceeds some threshold T
...
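A hedged sketch of that point-to-farthest-point measurement, with subsampling to keep the pairwise distance computation manageable; the tolerances P, N and T are placeholders, not real specs:

# For each outer-edge point, find the farthest edge point (roughly the diameter
# through that point), then look at the spread of those distances.
import cv2
import numpy as np

gray = cv2.imread("backlit_ring.png", cv2.IMREAD_GRAYSCALE)
_, bin_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(bin_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
pts = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float32)[::5]   # subsample

diff = pts[:, None, :] - pts[None, :, :]               # pairwise differences
dist = np.linalg.norm(diff, axis=2)
diameters = dist.max(axis=1)                           # farthest point for each edge point

P, N, T = 2.0, 10, 1.0                                 # hypothetical tolerances (pixels)
nominal = np.median(diameters)
outliers = np.count_nonzero(np.abs(diameters - nominal) > P)
bad = outliers > N or diameters.std() > T
print("uneven width suspected" if bad else "width looks consistent")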
Measurement of distance depends on the consistency of point-to-point distances at different locations within the image. If you perform accurate, precise measurements of distance, you'll notice that an object of fixed length appears to vary in length depending on its location in the image: if the object is located in the center of the image it may appear to be 57.5 pixels long, but in one corner of the image it may appear to be 56.2 pixels long.
To correct for these irregularities, you can...
Perform a nonlinear flatness correction. This will also correct for non-normal alignment of the camera to the part, though you want to start with the optical axis of the camera as normal (perpendicular) to the surface of the part as possible.
Make a few quick measurements to estimate how much measurements vary.
5. Dent on the ring
6. Entire surface of the ring is not completely horizontal to the plane; perhaps due to a dent, part of the ring rests on the plane surface, creating a 1 or 2 degree angle (I have marked no. 6 as 'uneven surface' in the attached picture)
Use cameras imaging from the sides. Make sure the background is simple.
A 1- to 2-degree difference could be hard to detect using a camera placed directly overhead. If you're lucky you could detect that the outer edge of the part is more elliptical than circular, but the ability to detect this would depend on the color and thickness of the part. Also, you wouldn't necessarily be able to distinguish between a misshapen part and one resting at an angle--but for some inspections that's okay, since both are defects.
HOWEVER, in a real-world application the customer might not be happy if you reject parts that are otherwise good but happen to be sitting at a slight angle. A mechanical fixture might fix the problem by ensuring parts lie flat.
I have also attached another picture marking the quality issues found on a random ring: elevated view with marked QC issues.
The image isn't clear enough. Put the part on a simpler background and tinker with lighting to make it more obvious what the differences are between good and bad.
One single ring can have one or more than one of the above mentioned 6 defects
Run one algorithm after the other. You may also have to turn different lights on and off before running each algorithm (or rather, each chain of algorithms).
Issue 1 & 3 can occur at either surface of the ring and we need to check both the surfaces
We need to QC on one single ring at a time
You may have to write an algorithm to detect whether multiple rings happen to be present. Even if you weren't asked to do this specifically, this happens in production, and your professor may surprise you with it. At least have an idea how you would detect the presence of multiple rings.
That's another aspect of vision: you may start thinking of what algorithms and lighting are necessary to solve "the problem," but you'll also spend a lot of time figuring out everything that could go wrong, and writing software to detect those conditions to ensure you don't yield a false result. For example, what happens if the lights turn off? What if two rings are present? What if the ring isn't fully within the field of view? What if dirt gets on the surface the part is resting on? What if the lens gets dirty (which it will)?
A few principles:
Provide the best image for image processing before you consider what algorithm would work best.
Understand what accuracy/success rate is necessary, and measure it.
Get as many samples as you possibly can: hundreds, thousands if possible. Having a chance to measure "online" (in real production) is helpful.
Real-world applications
If it were a real-world application--that is, if you went into the field of vision professionally--there are many more steps that may seem less difficult, but that turn out to be critical:
How rings come into view (or into "station"): on a moving conveyor? placed by a robot? in some container?
What triggers vision inspection of the ring -- a programmable logic controller, a "light curtain" the ring passes through, or whether the vision system itself has to determine when a ring is ready for inspection.
How results are communicated to other equipment. (This can be a huge hassle, and an otherwise good vision system can be rejected by a customer if communications aren't designed and implemented properly.)
Whether you are guaranteed to see only one ring at a time
This isn't to say university isn't the real world: just that you probably won't lose tens or hundreds of thousands of Euros/pounds/dollars if you happen to overlook something.
You can look at how face recognition is done:
Face detection.
Face alignment and normalization.
Features extraction.
Comparing features with pattern.
But in your case, you can skip step 3 and compare the output of step 2 with the reference image. Depending on the conditions, additional filtering may be necessary.

Discard images from a group of similar images

I am generating images (thumbnails) from a video every 3 seconds. Now I need to discard/remove all the similar images. Is there a way I could do this?
I generate the thumbnails using FFMPEG. I read about various image-diff solutions, like the ones given in this SO post, but I do not want to do this manually. What parameters should be considered, and how, to tell whether a particular image is similar to the other images present?
You can calculate the Structural Similarity Index (SSIM) between images and, based on the score, keep or discard an image. There are other measures you can use as well; basically you need any method that returns a similarity score. Try PIL or OpenCV:
https://pillow.readthedocs.io/en/3.1.x/reference/ImageChops.html?highlight=difference
https://www.pyimagesearch.com/2017/06/19/image-difference-with-opencv-and-python/
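As a small, illustrative sketch (assuming a recent scikit-image, which is what the pyimagesearch article above uses for SSIM), with an arbitrary 0.9 keep/discard threshold and placeholder filenames:

# Keep a thumbnail only if it differs enough from the last one we kept.
import cv2
from skimage.metrics import structural_similarity

def too_similar(path_a, path_b, threshold=0.9):
    a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    b = cv2.resize(b, (a.shape[1], a.shape[0]))        # SSIM needs equal-sized inputs
    return structural_similarity(a, b) >= threshold

kept = ["thumb_000.jpg"]
for name in ["thumb_001.jpg", "thumb_002.jpg"]:
    if not too_similar(kept[-1], name):
        kept.append(name)
print(kept)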
I don't have enough reputation to comment my idea on your problem, so I will just go ahead and post it as an answer in the hope of helping you.
I am quite confused about the term "similar", but since you are referring to video frames I am going to assume that you want to avoid having "similar" frames that were captured because of poor camera movement. If that's the case, you might want to consider using salient point descriptors.
To be more specific, you can detect salient points (using, for instance, Harris) and then use a point descriptor algorithm (such as SURF) and discard the frames that have been found to have "too many" similar points with a pre-selected frame.
Keep in mind that for the above process to be successful, the frames must be as sharp as possible; I guess you don't want to extract a blurred frame as a thumbnail anyway. So applying blurred-image detection might be useful in your case.
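A rough sketch of that idea, using ORB in place of Harris+SURF (SURF lives in opencv-contrib and is patent-encumbered); the 50-match threshold is arbitrary and would need tuning:

# Count descriptor matches between two frames; too many matches = near-duplicate.
import cv2

orb = cv2.ORB_create(nfeatures=1000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_count(path_a, path_b):
    a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    _, da = orb.detectAndCompute(a, None)
    _, db = orb.detectAndCompute(b, None)
    if da is None or db is None:
        return 0
    return len(bf.match(da, db))

if match_count("frame_a.jpg", "frame_b.jpg") > 50:
    print("frames look near-duplicate; discard one")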

Can we train a haar-cascade to detect numbers and alphabets?

We want to train on a particular font: all the letters from A-Z and all the numbers from 0-9. How many positive and negative samples of each would do the job?
It would be a tedious task to do, but Tesseract is not accurate enough at reading the number plates of moving vehicles. Any other suggestions for doing the task?
I am quoting from the following Wikipedia article- https://en.m.wikipedia.org/wiki/Automatic_number_plate_recognition
There are seven primary algorithms that the software requires for identifying a license plate:
1. Plate localization – responsible for finding and isolating the plate on the picture.
2. Plate orientation and sizing – compensates for the skew of the plate and adjusts the dimensions to the required size.
3. Normalization – adjusts the brightness and contrast of the image.
4. Character segmentation – finds the individual characters on the plates.
5. Optical character recognition.
6. Syntactical/geometrical analysis – check characters and positions against country-specific rules.
7. The averaging of the recognised value over multiple fields/images to produce a more reliable or confident result, especially since any single image may contain a reflected light flare, be partially obscured, or suffer other temporary effects.
Coming back to your question: Haar cascades can be used to localise number plates. However, for the OCR part I would personally recommend a CNN. You can find an implementation here: https://matthewearl.github.io/2016/05/06/cnn-anpr/
There is also this library specialised in the task: https://github.com/openalpr/openalpr -- check that out as well.
For a Haar cascade: https://github.com/opencv/opencv/blob/master/data/haarcascades/haarcascade_licence_plate_rus_16stages.xml
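A quick, hedged sketch of using that cascade for plate localisation, assuming the XML file has been downloaded next to the script; the detectMultiScale parameters and the input filename are untuned guesses:

# Localise plate candidates, then hand each crop to the OCR stage (e.g. a CNN).
import cv2

cascade = cv2.CascadeClassifier("haarcascade_licence_plate_rus_16stages.xml")
gray = cv2.cvtColor(cv2.imread("car.jpg"), cv2.COLOR_BGR2GRAY)
plates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 10))

for (x, y, w, h) in plates:
    roi = gray[y:y + h, x:x + w]                       # crop for the OCR stage
    print("candidate plate at", (x, y, w, h))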
Good luck

Detect the location of an image within a larger image

How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.
There is a quick and dirty solution, and that's simply sliding a window over the target image and computing some measure of similarity at each location, then picking the location with the highest similarity. Then you compare the similarity to a threshold, if the score is above the threshold, you conclude the image is there and that's the location; if the score is below the threshold, then the image isn't there.
As a similarity measure, you can use normalized correlation or sum of squared differences (aka L2 norm). As people mentioned, this will not deal with scale changes. So you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement.
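For illustration only, a multi-scale sliding-window sketch with OpenCV's matchTemplate (normalized correlation); the filenames, the scale range and the 0.8 acceptance threshold are arbitrary:

# Slide the (rescaled) original over the larger image and keep the best correlation.
import cv2
import numpy as np

scene = cv2.imread("large_image.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)

best = (0.0, None, None)                               # (score, location, scale)
for scale in np.linspace(0.3, 1.5, 13):
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > scene.shape[0] or t.shape[1] > scene.shape[1]:
        continue
    result = cv2.matchTemplate(scene, t, cv2.TM_CCOEFF_NORMED)
    _, score, _, loc = cv2.minMaxLoc(result)
    if score > best[0]:
        best = (score, loc, scale)

score, loc, scale = best
print("found" if score > 0.8 else "not found", score, loc, scale)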
A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching", people looked at that problem in that context. The methods that are used are generally something like this:
Preprocessing of the original image
Run an "interest point detector". This will find a few points in the image which are easily localizable, e.g. corners. There are many detectors, a detector called "harris-affine" works well and is pretty popular (so implementations probably exist). Another option is to use the Difference-of-Gaussians (DoG) detector, it was developed for SIFT and works well too.
At each interest point, extract a small sub-image (e.g. 30x30 pixels)
For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but can cope with scale well, and in your case scale is the only thing that changes.
At the end of this stage, you will have a set of descriptors.
Testing (with the new test image).
First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
Next, you look for matches. Ideally, to each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, then you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is.
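A hedged sketch of that pipeline with OpenCV's SIFT (available in the main cv2 module since OpenCV 4.4); the 0.75 ratio test and the 10-match minimum are conventional but arbitrary choices, and the filenames are placeholders:

# Match SIFT descriptors and locate the original inside the larger image.
import cv2
import numpy as np

sift = cv2.SIFT_create()
small = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
large = cv2.imread("large_image.png", cv2.IMREAD_GRAYSCALE)

kp1, des1 = sift.detectAndCompute(small, None)
kp2, des2 = sift.detectAndCompute(large, None)

good = []
for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:   # Lowe's ratio test
        good.append(pair[0])

if len(good) >= 10:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("original found; this homography maps its corners into the larger image:")
    print(H)
else:
    print("not enough matches:", len(good))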
You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross correlating is correlating two different signals.)
What correlation does for you, over simply checking for exact matches, is that it will tell you where the best matches are, and how good they are. Flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now and SO won't let me delete this answer.
http://en.wikipedia.org/wiki/Autocorrelation is my first instinct.
Take a look at Scale-Invariant Feature Transforms; there are many different flavors that may be more or less tailored to the type of images you happen to be working with.
