I'm making a document scanner for a college project. My code works quite well on uniformly lit images, but I ran into problems detecting the document when the background surface has even a small amount of light reflection (or too much light).
I first tried various simple snippets I found online, then different morphological operations, with the result that my code is now a little messy and inaccurate.
Here's the code:
import cv2
import numpy as np
from matplotlib import pyplot as plt

def scanner(img):
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    image = cv2.imread(img)
    original = image.copy()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    contrast = clahe.apply(gray)
    blurred = cv2.medianBlur(contrast, 21)
    canny = cv2.Canny(blurred, 0, 70)
    dilated = cv2.dilate(canny, cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)), iterations=3)
    closing = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8), iterations=10)
    # OpenCV 3.x signature; in 4.x findContours returns only (contours, hierarchy)
    contimage, contours, hierarchy = cv2.findContours(closing, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
    target = None
    for c in contours:
        p = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.09 * p, True)
        if len(approx) == 4:
            target = approx
            cv2.drawContours(image, [target], -1, (0, 255, 0), 2)
            break
    plt.figure(figsize=(20, 20))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("final")
    plt.show()
Here's an example of the code working:
input1, output1, input2, output2
And not working:
input1, output1, input2, output2
This image shows a successful segmentation:
This image shows a failed segmentation:
It looks like the main issue is that in the failing images there isn't sufficient contrast between the paper and the table it's placed on. In the ones that work, it's basically white paper on a dark background, so segmenting out the paper is relatively easy; when you place the paper on a light-colored surface, there isn't enough contrast between the paper and the background to tell what is what. Unfortunately, in image processing there is only so much you can do when the input image is bad, so there is no easy automated fix for this. I can think of a few workarounds, but they all require extra work.
One would be, instead of having your program automatically detect where the paper is, to have a static box that the user must place the document inside of, and simply capture the contents that way. Probably the simplest to implement, but since it seems you WANT to detect the document automatically, this is probably not what you're looking for.
Two would be to add an intermediate step that lets the user pick a specific threshold value to apply to the image. You would take the picture, have the user set a threshold so that the paper ends up white and the background dark, then use that as a template to find the boundary of the paper, which you can segment from the original image. This is probably the most work, but it is closest to what you're looking for.
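A minimal sketch of option two, assuming a trackbar for the user-picked threshold and OpenCV 4.x findContours signatures (the window name and the accept-on-Esc convention are just placeholder choices):

import cv2

def pick_paper_contour(gray):
    """Let the user pick a global threshold, then return the largest contour as the paper boundary."""
    cv2.namedWindow("threshold")
    cv2.createTrackbar("value", "threshold", 128, 255, lambda v: None)
    while True:
        t = cv2.getTrackbarPos("value", "threshold")
        _, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY)
        cv2.imshow("threshold", binary)
        if cv2.waitKey(30) == 27:  # Esc accepts the current threshold
            break
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)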
Three would be similar to number one, but instead of a set area you place the document within, you take the picture and then have the user manually select where the corners are and segment it that way. More work than #1, less work than #2, but probably still not what you're looking for.
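For option three, once the user has picked four corners the actual segmentation is just a perspective warp; a sketch, where the corner order and the output size are my own assumptions:

import numpy as np
import cv2

def warp_from_corners(image, corners, out_w=850, out_h=1100):
    """corners: four (x, y) points picked by the user, ordered TL, TR, BR, BL."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))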
Finally, you could just leave it as is and use it knowing that you need a sufficiently dark background for it to work correctly. There are probably other workarounds, but with a lot of image processing you are constrained by the quality of your image, and there isn't always a software solution for exactly what you're looking to do.
Related
I have read through dozens of questions on this topic here (see e.g. 1, 2, 3). There are a lot of helpful explanations of how to play around with parameters, watershedding, etc. Yet no matter what I have tried to put together, I am still not managing a halfway-passable count of the cells in my image.
Here are two examples of the kind of images I need to process.
Initially I was trying to count all the cells, but because of the difference in focus at the edges (where it gets blurrier) I thought it might be easier to count cells within a rectangle the user selects.
I was hopeful this would improve the results, but as you can see, HoughCircles is both selecting as circles empty spaces with nothing in them, and missing many cells:
Other algorithms I have tried have fared worse.
My code:
import cv2
import numpy as np

cap = cv2.VideoCapture(video_file)
frames = []
while True:
    frame_exists, curr_frame = cap.read()
    if frame_exists:
        frames.append(curr_frame)
    else:
        break
frames = [cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) for frame in frames]

for img in frames:
    circles = cv2.HoughCircles(img,
                               cv2.HOUGH_GRADIENT,
                               minDist=10,
                               dp=1.1,
                               param1=4,  # the lower the number the more circles found
                               param2=13,
                               minRadius=4,
                               maxRadius=10)
    if circles is not None:
        circles = np.round(circles[0, :]).astype("int")
        for (x, y, r) in circles:
            cv2.circle(img, (x, y), r, (0, 255, 0), 1)
    cv2.imshow("result", img)
    cv2.waitKey(0)  # wait for a key press so each frame is actually displayed
Edit: adding my (not very helpful) preprocessing code:
denoise = cv2.fastNlMeansDenoising(img, h=4.0, templateWindowSize=15, searchWindowSize=21)
thresh=cv2.adaptiveThreshold(denoise,255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,41,2)
(and then I passed thresh to HoughCircles instead of img)
It just didn't seem to make any difference...
I believe these are not circular enough for Hough to work well; you would have to lower param2 too much to account for the lack of uniformity. I would recommend looking into the cv2.findContours method instead, using your 'thresh' image.
https://docs.opencv.org/4.x/dd/d49/tutorial_py_contour_features.html
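As a rough sketch (the area bounds are invented numbers to tune, `thresh` and `img` are the variables from the question, and you may need THRESH_BINARY_INV in the preprocessing so the cells come out white):

import cv2

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Keep only blobs whose area is roughly cell-sized; both bounds are guesses to tune.
cells = [c for c in contours if 30 < cv2.contourArea(c) < 500]
print(len(cells), "candidate cells")
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
cv2.drawContours(vis, cells, -1, (0, 255, 0), 1)
cv2.imshow("cells", vis)
cv2.waitKey(0)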
I have this set of images I want to de-noise in order to run OCR on them:
I am trying to read the 1973 from the image.
I have tried
import cv2
import numpy as np

img = cv2.imread('uxWbP.png', 0)
img = cv2.resize(img, (0, 0), fx=2, fy=2)
copy_img = np.copy(img)
# adaptive threshold as the image has different lighting conditions in different areas
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 2)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# kill small contours
for i_cnt, cnt in enumerate(sorted(contours, key=lambda x: cv2.boundingRect(x)[0])):
    _area = cv2.contourArea(cnt)
    x, y, w, h = cv2.boundingRect(cnt)
    x_y_area = w * h
    if 10000 < x_y_area < 400000:
        pass
        # cv2.rectangle(copy_img, (x, y), (x + w, y + h), (255, 0, 255), 2)
        # cv2.putText(copy_img, str(int(x_y_area)) + ' , ' + str(w) + ' , ' + str(h), (x, y + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        # cv2.drawContours(copy_img, [cnt], 0, (0, 255, 0), 1)
    elif x_y_area < 10000:
        # write over small contours
        cv2.drawContours(thresh, [cnt], -1, 255, -1)
cv2.imshow('img', copy_img)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)
Which significantly improves the image to:
Any recommendations on how to filter this image sufficiently, either by improving the filtered image or by taking a completely different approach from the start, so that I could run OCR or some ML detection scripts on it? I'd like to split out the numbers for detection, but I'm open to other methods as well.
Another thing to try, either separately from the blurring or in combination with it, is the erosion/dilation game, as hinted at in the comment by #eldesgraciado, to whom I think a good part of the credit for these answers should go.
These two (erosion and dilation) can be applied one after the other, repeatedly. I think the trick is to change the kernel size. Anyway, I know I've used that to reduce noise in the past. Here's one example of dilation:
>>> import cv2
>>> import numpy as np
>>> im_0 = cv2.imread("FWM8b.png")
>>> k_size = 3
>>> kernel = np.ones((k_size, k_size), np.uint8)
>>> im_dilated = cv2.dilate(im_0, kernel, iterations=1)
>>> cv2.imshow("d", im_dilated)
>>> cv2.waitKey(0)
Make whatever kernel you want for erosion, and check out the effects.
>>> im_eroded = cv2.erode(im_0, kernel, iterations=1)
>>> cv2.imshow("erosion", im_eroded)
>>> cv2.waitKey(0)
Edit with possible improvements:
>>> im_blurred = cv2.GaussianBlur(im_dilated, (0, 0), 3)
>>> im_better = cv2.addWeighted(im_0, 0.5, im_blurred, 1.2, 0)
# Getting closer.
^ dilated, blurred, and combined (added) with original, 1st way
# Even better, I think.
>>> im_better2 = cv2.addWeighted(im_0, 0.9, im_blurred, 1.7, 0)
^ dilated, blurred, and combined (added) with original, 2nd way
You could do artifact removal, but be careful not to get rid of the stalk of the 7. If you can keep the 7 together, you can do connected-component analysis and keep the biggest connected components.
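A minimal sketch of that idea, assuming `binary` is an 8-bit image with the glyphs white on black (the area cutoff is a guess):

import cv2
import numpy as np

# binary: 8-bit image, glyphs white on black (invert your threshold if needed)
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
cleaned = np.zeros_like(binary)
for i in range(1, n):  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] >= 50:  # keep only reasonably large components
        cleaned[labels == i] = 255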
You could sum the pixel values in each column and each row, which would probably lead to something like this (a very rough approximation, sketched in a hurry). Note that I was much more careful with the green curve (sums of columns), but the consistency of the scaling is probably off.
Note that this is really a sum of (255 - pixel_value). That could give you rectangles where your to-be-found glyphs (digits) should be. You could build a 2-D map of column_pixel_sum + row_pixel_sum, or just do some approximation, as I have done below.
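Those row/column sums are projection profiles; a sketch assuming a grayscale image `gray` with dark digits on a light background (the 0.3 cutoff is a guess):

import numpy as np

ink = 255 - gray.astype(np.int64)   # darker pixels contribute more
col_profile = ink.sum(axis=0)       # one value per column (the "green curve")
row_profile = ink.sum(axis=1)       # one value per row
# Columns/rows whose profile exceeds some fraction of the peak likely contain glyphs.
col_mask = col_profile > 0.3 * col_profile.max()
row_mask = row_profile > 0.3 * row_profile.max()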
Also, feel free to rotate the image (or take pixel sums at different angles), and combine the information from each rotation.
Lots of other things to try ... the suggestion by #eldesgraciado of a noise model is especially intriguing.
Another thing you could try is to create a "noise model" and subtract it from the original image. First, apply a Gaussian blur with very low parameters, just barely blurring the image, then subtract this mask from the image. From here, the steps are experimental: blur and threshold the difference again, and save that image. Run this pre-processing with various parameters, saving the final binary image each time, then average the masks obtained so far. The persistent blobs should be the ones you are looking for... like some sort of spatial bandstop, I guess...
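A sketch of that procedure with guessed parameters (the sigmas, thresholds, and number of runs are all experimental, and `gray` is assumed to be the grayscale input):

import cv2
import numpy as np

def noise_model_pass(gray, sigma, thresh_val):
    """One run: subtract a barely-blurred version, then blur and threshold the difference."""
    barely_blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
    diff = cv2.subtract(gray, barely_blurred)
    diff = cv2.GaussianBlur(diff, (0, 0), sigma)
    _, mask = cv2.threshold(diff, thresh_val, 255, cv2.THRESH_BINARY)
    return mask

masks = [noise_model_pass(gray, s, t) for s in (0.8, 1.2, 1.6) for t in (10, 20, 30)]
# The blobs that persist across parameter sets are the ones to keep.
average_mask = np.mean(np.stack(masks), axis=0).astype(np.uint8)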
Keep experimenting.
Unsharp mask (my other answer) on this result image. More noise gone, but hurts the 7.
My first thought is to put on a Gaussian blur for a sort of "unsharp filter". (I think my second idea is better; it combines this blur-and-add with the erosion/dilation game. I posted it as a separate answer, because I think it is a different-enough strategy to merit that.) #eldesgraciado noted frequency stuff, which is basically what we're doing here. I'll put on some code and explanation. (Here is one answer to an SO post that has a lot about sharpening - the answer linked is a more variable unsharp mask written in Python. Do take the time to look at other answers - including this one, one of many simple implementations that look just like mine - though some are written in different programming languages.) You'll need to mess with parameters. It's possible this won't work, but it's the first thing I thought of.
>>> import cv2
>>> im_0 = cv2.imread("FWM8b.png")
>>> cv2.imshow("FWM8b.png", im_0)
>>> cv2.waitKey(0)
## Press any key.
>>> ## Here's where we get to frequency. We'll use a Gaussian Blur.
## We want to take out the "frequency" of changes from white to black
## and back to white that are less than the thickness of the "1973"
>>> k_size = 0 ## This is the kernel size - the "width frequency",
## if you will. Using zero gives a width based on sigmas in
## the Gaussian function.
## You'll want to experiment with this and the other
## parameters, perhaps trying to run OCR over the image
## after each combination of parameters.
## Hint, avoid even numbers, and think of it as a radius
>>> gs_border = 3
>>> im_blurred = cv2.GaussianBlur(im_0, (k_size, k_size), gs_border)
>>> cv2.imshow("gauss", im_blurred)
>>> cv2.waitKey(0)
Okay, my parameters probably didn't blur this enough. The parts of the words that you want to get rid of aren't really blurry. I doubt you'll even see much of a difference from the original, but hopefully you'll get the idea.
We're going to multiply the original image by a value, multiply the blurry image by a value, and subtract value*blurry from value*orig. Code will be clearer, I hope.
>>> orig_img_multiplier = 1.5
>>> blur_subtraction_factor = -0.5
>>> gamma = 0
>>> im_better = cv2.addWeighted(im_0, orig_img_multiplier, im_blurred, blur_subtraction_factor, gamma)
>>> cv2.imshow("First shot at fixing", im_better)
Yeah, not too much different. Mess around with the parameters, try to do the blur before you do your adaptive threshold, and try some other methods. I can't guarantee it will work, but hopefully it will get you started going somewhere.
Edit
This is a great question. Responding to the tongue-in-cheek criticism of #eldesgraciado
Ah, naughty, naughty. Trying to break them CAPTCHA codes, huh? They are difficult to break for a reason. The text segmentation, as you see, is non-trivial. In your particular image, there’s a lot of high-frequency noise, you could try some frequency filtering first and see what result you get.
I submit the following from the Wikipedia article on reCAPTCHA (archived).
reCAPTCHA has completely digitized the archives of The New York Times and books from Google Books, as of 2011.[3] The archive can be searched from the New York Times Article Archive.[4] Through mass collaboration, reCAPTCHA was helping to digitize books that are too illegible to be scanned by computers, as well as translate books to different languages, as of 2015.[5]
Also look at this article (archived).
I don't think this CAPTCHA is part of Massive-scale Online Collaboration, though.
Edit: Some other type of sharpening will be needed. I just realized that I'm applying 1.5 and -0.5 multipliers to pixels which usually have values very close to 0 or 255, meaning I'm probably just recovering the original image after the sharpening. I welcome any feedback on this.
Also, from comments with #eldesgraciado:
Someone probably knows a better sharpening algorithm than the one I used. Blur it enough, and maybe threshold on average values over an n-by-n grid (pixel density). I don't know too much about the whole adaptive-thresholding-then-contours thing. Maybe that could be re-done after the blurring...
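One possible reading of the "average over an n-by-n grid" idea, sketched with a grayscale image `blurred` and guessed numbers for the block size and cutoff:

import cv2

n = 8  # block size in pixels; a guess to experiment with
h, w = blurred.shape[:2]
# INTER_AREA averages each n-by-n block into one pixel, i.e. a pixel-density map.
density = cv2.resize(blurred, (w // n, h // n), interpolation=cv2.INTER_AREA)
_, density_mask = cv2.threshold(density, 100, 255, cv2.THRESH_BINARY)
# Scale the coarse mask back up if you want to apply it to the original image.
density_mask = cv2.resize(density_mask, (w, h), interpolation=cv2.INTER_NEAREST)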
Just to give you some ideas ...
Here's a blur with k_size = 5
Here's a blur with k_size = 25
Note those are the BLURS, not the fixes. You'll likely need to mess with the orig_img_multiplier and blur_subtraction_factor based on the frequency (I can't remember exactly how, so I can't really tell you how it's done.) Don't hesitate to fiddle with gs_border, gamma, and anything else you might find in the documentation for the methods I've shown.
Good luck with it.
By the way, the frequency is more something based on the 2-D Fast Fourier Transform, and possibly based on kernel details. I've just messed around with this stuff myself - definitely not an expert AND definitely happy if someone wants to give more details - but I hope I've given a basic idea. Adding some jitter noise (up and down or side to side blurring, rather than radius-based), might be helpful as well.
I am using OpenCV in Python to detect cracks in concrete. I am able to use Canny edge detection to detect the cracks. Next, I need to fill the edges. I used the floodFill operation of OpenCV, but some of the gaps are filled whereas some are not. The image on the left is the input image, whereas the one on the right is the flood-filled image. I am guessing this is because my edges have breaks at some points. How do I solve this?
My code for floodfilling:
import cv2
import numpy as np

im_th1 = imginput
im_floodfill = im_th1.copy()
# Mask used for flood filling.
# Note the mask needs to be 2 pixels larger than the image in each dimension.
h, w = im_th1.shape[:2]
mask = np.zeros((h + 2, w + 2), np.uint8)
# Floodfill from point (5, 5)
cv2.floodFill(im_floodfill, mask, (5, 5), 255)
# Invert floodfilled image
im_floodfill_inv = cv2.bitwise_not(im_floodfill)
# Combine the two images to get the foreground.
im_out = im_th1 | im_floodfill_inv
cv2.imshow("Foreground", im_out)
cv2.waitKey(0)
I found the solution to what I was looking for. Posting it here as it might be of use to others. After some research on the internet, it was just two lines of code, as suggested in this answer: How to complete/close a contour in python opencv?
The code that worked for me is :
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
dilated = cv2.dilate(image, kernel)
eroded=cv2.erode(dilated,kernel)
The result is in the attached image, which shows the before and after results.
I see this so often here on SO: everybody wants to use edge detection and then fill in the area between the edges.
Unless you use a method for edge detection that purposefully creates a closed contour, detected edges will likely not form a closed contour. And you cannot flood-fill a region unless you have a closed contour.
In most of these cases, some filtering and a simple threshold suffice. For example:
import PyDIP as dip
import matplotlib.pyplot as pp
img = dip.Image(pp.imread('oJAo7.jpg')).TensorElement(1) # From OP's other question
img = img[4:698,6:]
lines = dip.Tophat(img, 10, polarity='black')
dip.SetBorder(lines, [0], [2])
lines = dip.PathOpening(lines, length=100, polarity='opening', mode={'robust'})
lines = dip.Threshold(lines, method='otsu')[0]
This result is obtained after a simple top-hat filter, which keeps only thin things, followed by a path opening, which keeps only long things. This combination removes large-scale shading, as well as the small bumps and things. After the filtering, a simple Otsu threshold yields a binary image that marks all pixels in the crack.
Notes:
The input image is the one OP posted in another question, and is the input to the images posted in this question.
I'm using PyDIP, which you can get on GitHub and need to compile yourself. Hopefully soon we'll have a binary distribution. I'm an author.
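If you would rather stay in OpenCV, a rough approximation of the same idea is a black-hat filter followed by Otsu; note this drops the path opening, so the keep-only-long-things step is lost (the kernel size is a guess):

import cv2

gray = cv2.imread('oJAo7.jpg', cv2.IMREAD_GRAYSCALE)
# Black-hat keeps thin dark structures (the cracks) and removes large-scale shading.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
_, crack_mask = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)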
I am trying to make a program that can identify a road in a scene, and I proceeded to use morphological filtering and the watershed algorithm. However, the program produces either mediocre or bad results. It seems to do okay (though not well enough) if the road takes up most of the scene. In other pictures, however, it turns out that the sky gets segmented instead (watershed with the clouds).
I tried to see if I can preform more image processing to improve the results, but this is the best I have so far and don't know how to move forward to improve my program.
How can I improve my program?
Code:
import numpy as np
import cv2
from matplotlib import pyplot as plt
import imutils
def invert_img(img):
    img = (255 - img)
    return img
#img = cv2.imread('images/coins_clustered.jpg')
img = cv2.imread('images/road_4.jpg')
img = imutils.resize(img, height = 300)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
thresh = invert_img(thresh)
# noise removal
kernel = np.ones((3,3), np.uint8)
opening = cv2.morphologyEx(thresh,cv2.MORPH_OPEN,kernel, iterations = 4)
# sure background area
sure_bg = cv2.dilate(opening,kernel,iterations=3)
#sure_bg = cv2.morphologyEx(sure_bg, cv2.MORPH_TOPHAT, kernel)
# Finding sure foreground area
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform,0.7*dist_transform.max(),255,0)
# Finding unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg,sure_fg)
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg)
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==255] = 0
'''
imgray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
imgray = cv2.GaussianBlur(imgray, (5, 5), 0)
img = cv2.Canny(imgray,200,500)
'''
markers = cv2.watershed(img,markers)
img[markers == -1] = [255,0,0]
cv2.imshow('background',sure_bg)
cv2.imshow('foreground',sure_fg)
cv2.imshow('threshold',thresh)
cv2.imshow('result',img)
cv2.waitKey(0)
For a start, segmentation problems are hard, and the more general you want the solution to be, the harder it gets. Road segmentation is a well-known problem, and I'm sure you can find many papers that tackle it from various directions.
Something that helps me get ideas for computer vision problems is asking what makes the object so easy for me to detect and so hard for a computer.
For example, let's look at the road in your images. What distinguishes it from the background?
1. Distinct gray color.
2. Always has two shoulder lines in white.
3. Always in the bottom section of the image.
4. Always has a separation line in the middle (yellow/white).
5. Pretty smooth.
6. Wider at the bottom, vanishing into the horizon.
Now, after we have found some unique features, we need to find ways to quantify them, so it will be obvious to the algorithm as it is obvious to us.
1. Work on the RGB (or even better, HSV) image; don't convert it to gray at the beginning and lose all the color data. Look for the gray area! (A sketch of this, combined with point 3, follows this list.)
2. Again, let's find white regions (inside gray ones). You can try edge detection at the specific orientation of the shoulder lines; you are looking for a line that spans about half the height of the image, etc.
3. Let's delete the upper half of the image. You will hardly ever have road there, and you will get rid of a lot of noise in your algorithm.
4. See 2...
5. Let's check the local standard deviation, or some other smoothness feature.
6. If we find a shape, let's check whether it fits what we expect.
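As a sketch of points 1 and 3 (the HSV bounds for "grayish asphalt" are rough guesses that will need tuning):

import cv2
import numpy as np

img = cv2.imread('images/road_4.jpg')
h, w = img.shape[:2]
bottom = img[h // 2:, :]                      # point 3: ignore the upper half (mostly sky)
hsv = cv2.cvtColor(bottom, cv2.COLOR_BGR2HSV)
# point 1: "gray" means low saturation and mid-range value; these bounds are guesses
road_mask = cv2.inRange(hsv, (0, 0, 60), (180, 60, 200))
road_mask = cv2.morphologyEx(road_mask, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))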
I know these are just ideas and I don't claim they are easy to implement, but if you want to improve your algorithm you must give it more "knowledge", just as you have.
Exploit some domain knowledge; in other words, make some simplifying assumptions. Even basic things like "the camera's not upside down" and "the pavement has a uniform hue" will improve the common case.
If you can treat crossroads as a special case, then finding the edges of the roadway may be a simpler and more useful task than finding the roadway itself.
I read this blog post where the author uses a laser and a webcam to estimate the distance of a piece of cardboard from the webcam.
I had another idea about that. I don't want to calculate the distance from the webcam.
I want to check whether an object is approaching the webcam. The algorithm, as I see it, will be something like:
Detect the object in the webcam feed.
If the object is approaching the webcam it'll grow larger and larger in the video feed.
Use this data for further calculations.
Since I want to detect random objects, I am using the findContours() method to find the contours in the video feed. Using that, I will at least have the outlines of the objects in the video feed. The source code is:
import numpy as np
import cv2

vid = cv2.VideoCapture(0)
ans, instant = vid.read()
average = np.float32(instant)
cv2.accumulateWeighted(instant, average, 0.01)
background = cv2.convertScaleAbs(average)

while(1):
    _, f = vid.read()
    imgray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(imgray, 127, 255, 0)
    diff = cv2.absdiff(f, background)
    cv2.imshow("input", f)
    cv2.imshow("Difference", diff)
    if cv2.waitKey(5) == 27:
        break
cv2.destroyAllWindows()
The output is:
I am stuck here. I have the contours stored in an array. What do I do with it when the size increases? How do I proceed?
One trouble here is recognising and differentiating the moving objects from other stuff in the video feed. An approach might be to let the camera 'learn' what the background looks like with no object. Then you can constantly compare its input against this background. One way to get the background is to use a running average.
Any difference greater than a small threshold means there is a moving object. If you constantly display this difference, you basically have a motion tracker. The size of the objects is simply the sum of all the non-zero (thresholded) pixels, or their bounding rectangles. You can track this size and use it to guess whether the object is moving closer or further. Morphological operations can help group the contours into one cohesive object.
Since it will be tracking ANY movement, if there are two objects, they will be counted together. Here is where you can use the contours to find and track individual objects, e.g. using the contour bounds or centroids. You could also possibly separate them by colour.
Here are some results using this strategy (the grey blob is my hand):
It actually did a fairly good job of guessing which way my hand was moving.
Code:
import cv2
import numpy as np

AVERAGE_ALPHA = 0.2  # 0-1 where 0 never adapts, and 1 instantly adapts
MOVEMENT_THRESHOLD = 30  # Lower values pick up more movement
REDUCED_SIZE = (400, 600)
MORPH_KERNEL = np.ones((10, 10), np.uint8)

def reduce_image(input_image):
    """Make the image easier to deal with."""
    reduced = cv2.resize(input_image, REDUCED_SIZE)
    reduced = cv2.cvtColor(reduced, cv2.COLOR_BGR2GRAY)
    return reduced

# Initialise
vid = cv2.VideoCapture(0)
average = None
old_sizes = np.zeros(20)
size_update_index = 0

while True:
    got_frame, frame = vid.read()
    if got_frame:
        # Reduce image
        reduced = reduce_image(frame)
        if average is None:
            average = np.float32(reduced)

        # Get background
        cv2.accumulateWeighted(reduced, average, AVERAGE_ALPHA)
        background = cv2.convertScaleAbs(average)

        # Get thresholded difference image
        movement = cv2.absdiff(reduced, background)
        _, threshold = cv2.threshold(movement, MOVEMENT_THRESHOLD, 255, cv2.THRESH_BINARY)

        # Apply morphology to help find object
        dilated = cv2.dilate(threshold, MORPH_KERNEL, iterations=10)
        closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, MORPH_KERNEL)

        # Get contours
        contours, _ = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(closed, contours, -1, (150, 150, 150), -1)

        # Find biggest bounding rectangle
        areas = [cv2.contourArea(c) for c in contours]
        if areas != list():
            max_index = np.argmax(areas)
            max_cont = contours[max_index]
            x, y, w, h = cv2.boundingRect(max_cont)
            cv2.rectangle(closed, (x, y), (x + w, y + h), (255, 255, 255), 5)

            # Guess movement direction
            size = w * h
            if size > old_sizes.mean():
                print("Towards")
            else:
                print("Away")

            # Update object size
            old_sizes[size_update_index] = size
            size_update_index += 1
            if size_update_index >= len(old_sizes):
                size_update_index = 0

        # Display image
        cv2.imshow('RaptorVision', closed)
        cv2.waitKey(1)  # needed so the window actually refreshes each frame
Obviously this needs more work in terms of identifying, selecting and tracking the objects etc (at the moment it does horribly if there is something else moving in the background). There are also many parameters to vary and tweak (the ones set are what worked well for my system). I'll leave that up to you though.
Some links:
background extraction
motion tracking
If you want to get a bit more high-tech with the background removal, have a look here:
wallflower
Detect the object in the webcam feed.
If the object is approaching the webcam it'll grow larger and larger in the video feed.
Use this data for further calculations.
Good idea.
If you want to use the contour detection approach, you could do it the following way:
You have a series of images I1, I2, ..., In.
Do a contour detection on each one: C1, C2, ..., Cn (a contour is a set of points in OpenCV).
Take a large enough sample of points from your contours on images i and i+1: S_i ⊆ C_i, i ∈ 1...n.
For every point in your sample, find the nearest point on contour i+1. This gives you trajectories for all your points.
Check whether these trajectories point mostly outwards (the tricky part ;)
If they point outwards for a sufficient number of frames, your contour got bigger.
Alternatively, you could try to prune the points that are not part of the correct contour and work with a covering rectangle. It's very easy to check the size that way, but I don't know how easy it will be to choose the "correct" points.
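A rough sketch of steps 3-5, where the sample size and the centroid-based "outwards" test are my own assumptions:

import numpy as np

def mostly_outwards(contour_a, contour_b, samples=50):
    """Return True if sampled points of contour_a move away from its centroid in contour_b."""
    pts_a = contour_a.reshape(-1, 2).astype(np.float64)
    pts_b = contour_b.reshape(-1, 2).astype(np.float64)
    centroid = pts_a.mean(axis=0)
    idx = np.linspace(0, len(pts_a) - 1, min(samples, len(pts_a))).astype(int)
    outward_votes = 0
    for p in pts_a[idx]:
        nearest = pts_b[np.argmin(np.linalg.norm(pts_b - p, axis=1))]  # nearest point on next contour
        if np.linalg.norm(nearest - centroid) > np.linalg.norm(p - centroid):
            outward_votes += 1
    return outward_votes > len(idx) // 2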