How to define top and left points from a cropped numpy rectangle? - python

Good day. In the following code, I am able to crop a rectangular ROI from the first frame.
The final outcome of this while loop is an ROI stored as a numpy array named "monitor_region".
video = cv2.VideoCapture("Rob.mp4")
ret, frame = video.read()
roi_status = False
while(roi_status == False):
    roi = cv2.selectROI("Region Selection by ROI", frame, False)
    if(not all(roi)):
        print("Undefined monitor region.")
        continue
    monitor_region = frame[int(roi[1]):int(roi[1]+roi[3]), int(roi[0]):int(roi[0]+roi[2])]
    cv2.imshow("Selected Region", monitor_region)
    if(cv2.waitKey(0) & 0xFF == 8): #backspace to save
        print("Monitor region has been saved.")
        roi_status = True
cv2.destroyAllWindows()
Since this "monitor_region" is a rectangular portion of the entire frame, I am looking for a feasible way to define its left and top points in order to define a range for checking (as illustrated). In the following code, I am able to define the width and height of the ROI.
monitor_width = monitor_region.shape[1]
monitor_height = monitor_region.shape[0]
However, I still lack the left and top points. Once I have obtained both the top and left points of the ROI, I can perform the x and y point check below, which I use to determine whether an object exists within the ROI or not.
if((monitor_left < Point_x < (monitor_left + monitor_width)) and (monitor_top < Point_y < (monitor_top + monitor_height))):

selectROI returns a tuple of exactly the values you seek: the left bound, the top bound, the width and the height of the selected ROI.
If you print(roi) you will get something like (100,150,300,200)
You can unpack the values like this:
x,y,width,height = roi
Where x is the left bound and y is the top bound.
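For example, a minimal sketch of how these values plug into the containment check from the question (Point_x and Point_y are assumed to come from your own detection code):

monitor_left, monitor_top, monitor_width, monitor_height = roi
# True when the point lies strictly inside the selected ROI
if (monitor_left < Point_x < monitor_left + monitor_width) and (monitor_top < Point_y < monitor_top + monitor_height):
    print("Object is inside the ROI.")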

Related

Motion Detector - Disregard Background Motion/Incorrect ID

I have a script in python which acts as a motion detector. I read a video file using cv2, convert to grayscale, and do simple background subtraction from the current frame to detect motion, which I draw a rectangle over. The video is eventually saved as a new file, where I can finally view it.
This works fine, except sometimes the starting frame (background frame) already has motion in it, or sometimes there are features in the background which move but which I don't want to detect (e.g. if I was detecting people, I wouldn't be interested in a flag blowing in the breeze). So I want to somehow disregard 'stationary' movement (i.e. motion which does not move vertically/horizontally over the course of the video). However, I'm having trouble with my approach. There don't seem to be any functions or scripts on the internet to solve this.
One idea I had was to draw a larger rectangle over the original, and then if the original rectangle doesn't leave the outer rectangle (which stays put) over the video, then that motion can be cancelled altogether. I have no idea how to implement this. I have managed to draw a larger rectangle, but it follows the original and doesn't stay in place.
Does anyone have any idea how I might be able to do this? Or any resources they could point me in? Thank you. Below is my code starting from when I draw the rectangles.
for c in cnts:
    # if the contour is too small, ignore it
    if cv2.contourArea(c) < min_area:
        continue
    # compute the bounding box for the contour, draw it on the frame, and update the text
    (x, y, w, h) = cv2.boundingRect(c)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    text = "Occupied" # frame is occupied
    half_w = int(w/2) # get 50% sizing width
    half_h = int(h/2) # get 50% sizing height
    x_surr = int(x - (half_w/2))
    y_surr = int(y - (half_h/2))
    w_surr = (w + half_w)
    h_surr = (h + half_h)
    cv2.rectangle(frame, (x_surr, y_surr), (x_surr + w_surr, y_surr + h_surr), (255, 255, 255), 2)
I think this code might help you. Basically it compares the value of each pixel in the current frame to the corresponding value of that pixel in the average of the previous n frames. When no motion is present, the difference is all black. When there is motion, it shows the colour of the moving object. Since it keeps a running average of recent frames, you should be able to filter out slight movements such as flags fluttering. You will probably need to play around with some thresholding on the final image to get the result you want.
Stillness:
Motion:
import cv2

def main():
    # define the length of the list of the number of recent frames to keep track of
    NUMBER_FRAMES_TO_TRACK = 30
    # start the webcam
    cap = cv2.VideoCapture(1)
    ret, frame = cap.read()
    if ret == False:
        print("No webcam detected.")
        return
    # generate a list of recent frames
    recent_frames = [frame for n in range(NUMBER_FRAMES_TO_TRACK)]
    # start the video loop
    while True:
        ret, frame = cap.read()
        if ret == False:
            break
        # update the list of recent frames with the most recent frame
        recent_frames = recent_frames[1:]
        recent_frames.append(frame)
        # calculate the average of all recent frames
        average = recent_frames[0]
        for i in range(len(recent_frames)):
            if i == 0:
                pass
            else:
                alpha = 1.0/(i + 1)
                beta = 1.0 - alpha
                average = cv2.addWeighted(recent_frames[i], alpha, average, beta, 0.0)
        # find the difference between the current frame and the average of recent frames
        difference = cv2.subtract(frame, average)
        # show the results
        cv2.imshow("video", frame)
        cv2.imshow("average", average)
        cv2.imshow("difference", difference)
        key = cv2.waitKey(1)
        if key == ord('q'):
            break
    cv2.destroyAllWindows()
    cap.release()

if __name__ == "__main__":
    main()
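As noted above, you will probably want to threshold the difference image; a minimal sketch of that step, which could go inside the video loop right after difference is computed (the threshold value of 25 is an assumed starting point to tune, not part of the original code):

# keep only sufficiently large changes as motion
gray_diff = cv2.cvtColor(difference, cv2.COLOR_BGR2GRAY)
_, motion_mask = cv2.threshold(gray_diff, 25, 255, cv2.THRESH_BINARY)
cv2.imshow("motion mask", motion_mask)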

Removing blank space around a circle shaped mask

I have an image of a circular-shaped mask, which is essentially a colored circle within a black image.
I want to remove all the blank space around the mask, such that the boundaries of the image align with the circle as such:
I've written up a script to do this by searching through every column and row until a pixel with a value greater than 0 appears. Searching from left to right, right to left, top to bottom, and bottom to top gets me the mask boundaries, allowing me to crop the original image. Here is the code:
ROWS, COLS, _ = img.shape
BORDER_RIGHT = (0,0)
BORDER_LEFT = (0,0)
right_found = False
left_found = False
# find borders of blank space for removal.
# left and right border
print('Searching for Right and Left corners')
for col in tqdm(range(COLS), position=0, leave=True):
    for row in range(ROWS):
        if left_found and right_found:
            break
        # searching from left to right
        if not left_found and N.sum(img[row][col]) > 0:
            BORDER_LEFT = (row, col)
            left_found = True
        # searching from right to left
        if not right_found and N.sum(img[row][-col]) > 0:
            BORDER_RIGHT = (row, img.shape[1] + (-col))
            right_found = True
BORDER_TOP = (0,0)
BORDER_BOTTOM = (0,0)
top_found = False
bottom_found = False
# top and bottom borders
print('Searching for Top and Bottom corners')
for row in tqdm(range(ROWS), position=0, leave=True):
    for col in range(COLS):
        if top_found and bottom_found:
            break
        # searching top to bottom
        if not top_found and N.sum(img[row][col]) > 0:
            BORDER_TOP = (row, col)
            top_found = True
        # searching bottom to top
        if not bottom_found and N.sum(img[-row][col]) > 0:
            BORDER_BOTTOM = (img.shape[0] + (-row), col)
            bottom_found = True
# crop left and right borders
new_img = img[:, BORDER_LEFT[1]:BORDER_RIGHT[1], :]
# crop top and bottom borders
new_img = new_img[BORDER_TOP[0]:BORDER_BOTTOM[0], :, :]
I was wondering whether there is a more efficient way to do this. With larger images, this can be quite time-consuming, especially if the mask is relatively small with respect to the original image shape. Thanks!
Assuming you have only this object inside the image, there are two ways to do this:
You can threshold the image, then use numpy.where to find all locations that are non-zero, then use numpy.min and numpy.max on the appropriate row and column locations that come out of numpy.where to give you the bounding rectangle.
You can first find the contour points of the object after you threshold with cv2.findContours. This should result in a single contour, so once you have these points you put this through cv2.boundingRect to return the top-left corner of the rectangle followed by the width and height of its extent.
The first method will work if there is a single object and efficiently at that. The second one will work if there is more than one object, but you have to know which contour the object of interest is in, then you simply index into the output of cv2.findContours and pipe this through cv2.boundingRect to get the rectangular dimensions of the object of interest.
However, the takeaway is that either of these methods is much more efficient than the approach you have proposed where you are manually looping over each row and column and calculating sums.
Pre-processing
This set of steps is common to both methods. In summary, we read in the image, convert it to grayscale, then threshold it. I didn't have access to your original image, so I read it in from Stack Overflow and cropped it so that the axes are not showing. This applies to the second method as well.
Here's a reconstruction of your image where I've taken a snapshot.
First I'll read in the image directly from the Internet as well as import the relevant packages I need to get the job done:
import skimage.io as io
import numpy as np
import cv2
img = io.imread('https://i.stack.imgur.com/dj1a8.png')
Thankfully, Scikit image has a method that reads in images directly from the Internet: skimage.io.imread.
After, I'm going to convert the image to grayscale, then threshold it:
img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
im = img_gray > 40
I use OpenCV's cv2.cvtColor to convert the image from colour to grayscale. After that, I threshold the image so that any intensity above 40 is set to True and everything else is set to False. I chose the threshold of 40 by trial and error until the mask appeared circular. Taking a look at this image we get:
Method #1
As I illustrated above, use numpy.where on the thresholded image, then use numpy.min and numpy.max to find the appropriate top-left and bottom-right corners and crop the image:
(r, c) = np.where(im == 1)
min_row, min_col = np.min(r), np.min(c)
max_row, max_col = np.max(r), np.max(c)
im_crop = img[min_row:max_row+1, min_col:max_col+1]
numpy.where for a 2D array will return a tuple of row and column locations that are non-zero. If we find the minimum row and column location, that corresponds to the top-left corner of the bounding rectangle. Similarly, the maximum row and column location corresponds to the bottom-right corner of the bounding rectangle. What's nice is that numpy.min and numpy.max work in a vectorised fashion, meaning that they operate on entire NumPy arrays in a single sweep. This logic is used above, then we index into the original colour image to crop out the range of rows and columns that contain the object of interest. im_crop contains that result. Note that we need to add 1 to the maximum row and column when indexing, as slicing end indices are exclusive; adding 1 ensures we include the pixel locations at the bottom-right corner of the rectangle.
We therefore get:
Method #2
We will use cv2.findContours to find all contour points of all objects in the image. Because there's a single object, only one contour should result, so we use this contour to pipe into cv2.boundingRect to find the top-left corner of the bounding rectangle of the object, combined with its width and height to crop out the image.
cnt, _ = cv2.findContours(im.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
x, y, w, h = cv2.boundingRect(cnt[0])
im_crop = img[y:y+h, x:x+w]
Take note that we have to convert the thresholded image into unsigned 8-bit integer, as that is the type that the function is expecting. Furthermore, we use cv2.RETR_EXTERNAL as we only want to retrieve the coordinates of the outer perimeter of any objects we see in the image. We also use cv2.CHAIN_APPROX_NONE to return every possible contour point on the object. The cnt is a list of contours that was found in the image. The size of this list should only be 1, so we index into this directly and pipe this into cv2.boundingRect. We then use the top-left corner of the rectangle, combined with its width and height to crop out the object.
We therefore get:
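One caveat worth noting: cv2.findContours returns two values in OpenCV 2.4 and 4.x but three in 3.x, so if the unpacking above fails on your installation, a small compatibility shim (my addition, not part of the original answer) is:

res = cv2.findContours(im.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnt = res[0] if len(res) == 2 else res[1]  # contours are the first or second element
x, y, w, h = cv2.boundingRect(cnt[0])
im_crop = img[y:y+h, x:x+w]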
Full Code
Here's the full code listing from start to finish. I've left comments below to delineate what methods #1 and #2 are. For now, method #2 has been commented out, but you can decide whichever one you want to use by simply commenting and uncommenting the relevant code.
import skimage.io as io
import cv2
import numpy as np
img = io.imread('https://i.stack.imgur.com/dj1a8.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
im = img_gray > 40
# Method #1
(r, c) = np.where(im == 1)
min_row, min_col = np.min(r), np.min(c)
max_row, max_col = np.max(r), np.max(c)
im_crop = img[min_row:max_row+1, min_col:max_col+1]
# Method #2
#cnt, _ = cv2.findContours(im.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
#x, y, w, h = cv2.boundingRect(cnt[0])
#im_crop = img[y:y+h, x:x+w]

Find minimal number of rectangles in the image

I have binary images where rectangles are placed randomly and I want to get the positions and sizes of those rectangles.
If possible I want the minimal number of rectangles necessary to exactly recreate the image.
On the left is my original image and on the right the image I get after applying scipy's find_objects()
(as suggested for this question).
import numpy as np
import scipy.ndimage

# image = scipy.ndimage.zoom(image, 9, order=0)
labels, n = scipy.ndimage.measurements.label(image, np.ones((3, 3)))
bboxes = scipy.ndimage.measurements.find_objects(labels)
img_new = np.zeros_like(image)
for bb in bboxes:
    img_new[bb[0], bb[1]] = 1
This works fine if the rectangles are far apart, but if they overlap and build more complex structures this algorithm just gives me the largest bounding box (upsampling the image made no difference). I have the feeling that there should already exist a scipy or opencv method which does this.
I would be glad to know if somebody has an idea on how to tackle this problem or even better knows of an existing solution.
As a result I want a list of rectangles (i.e. lower-left corner : upper-right corner) in the image. The condition is that when I redraw those filled rectangles I want to get exactly the same image as before. If possible, the number of rectangles should be minimal.
Here is the code for generating sample images (and a more complex example original vs scipy)
import numpy as np

def random_rectangle_image(grid_size, n_obstacles, rectangle_limits):
    n_dim = 2
    rect_pos = np.random.randint(low=0, high=grid_size-rectangle_limits[0]+1,
                                 size=(n_obstacles, n_dim))
    rect_size = np.random.randint(low=rectangle_limits[0],
                                  high=rectangle_limits[1]+1,
                                  size=(n_obstacles, n_dim))
    # Crop rectangle size if it goes over the boundaries of the world
    diff = rect_pos + rect_size
    ex = np.where(diff > grid_size, True, False)
    rect_size[ex] -= (diff - grid_size)[ex].astype(int)
    img = np.zeros((grid_size,)*n_dim, dtype=bool)
    for i in range(n_obstacles):
        p_i = np.array(rect_pos[i])
        ps_i = p_i + np.array(rect_size[i])
        img[tuple(map(slice, p_i, ps_i))] = True
    return img

img = random_rectangle_image(grid_size=64, n_obstacles=30,
                             rectangle_limits=[4, 10])
Here is something to get you started: a naïve algorithm that walks your image and creates rectangles as large as possible. As it is now, it only marks the rectangles but does not report back coordinates or counts. This is to visualize the algorithm alone.
It does not need any external libraries except for PIL, to load and access the left side image when saved as a PNG. I'm assuming a border of 15 pixels all around can be ignored.
from PIL import Image

def fill_rect(pixels, xp, yp, w, h):
    # fill the rectangle interior with red, then draw an orange border
    for y in range(h):
        for x in range(w):
            pixels[xp+x, yp+y] = (255,0,0,255)
    for y in range(h):
        pixels[xp, yp+y] = (255,192,0,255)
        pixels[xp+w-1, yp+y] = (255,192,0,255)
    for x in range(w):
        pixels[xp+x, yp] = (255,192,0,255)
        pixels[xp+x, yp+h-1] = (255,192,0,255)

def find_rect(pixels, x, y, maxx, maxy):
    # assume we're at the top left
    # get max horizontal span
    width = 0
    height = 1
    while x+width < maxx and pixels[x+width, y] == (0,0,0,255):
        width += 1
    # now walk down as long as the next row is fully black across the current width
    while y+height < maxy:
        row_is_black = True
        for w in range(x, x+width, 1):
            if pixels[w, y+height] != (0,0,0,255):
                row_is_black = False
                break
        if not row_is_black:
            break
        height += 1
    # fill rectangle
    fill_rect(pixels, x, y, width, height)

image = Image.open('A.png')
pixels = image.load()
width, height = image.size
print(width, height)

for y in range(16, height-15, 1):
    for x in range(16, width-15, 1):
        if pixels[x, y] == (0,0,0,255):
            find_rect(pixels, x, y, width, height)

image.show()
From the output
you can observe that the detection algorithm can be improved, as, for example, the "obvious" two top-left rectangles are split up into 3. Similarly, the larger structure in the center also contains one rectangle more than absolutely needed.
Possible improvements are either to adjust the find_rect routine to locate a best fit¹, or store the coordinates and use math (beyond my ken) to find which rectangles may be joined.
¹ A further idea on this. Currently all found rectangles are immediately filled with the "found" color. You could try to detect obviously multiple rectangles, and then, after marking the first, the other rectangle(s) to check may then either be black or red. Off the cuff I'd say you'd need to try different scan orders (top-to-bottom or reverse, left-to-right or reverse) to actually find the minimally needed number of rectangles in any combination.
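If you also want the coordinates reported back rather than only marked, one possible sketch (my own variation, assuming find_rect is changed to end with return (x, y, width, height)) is to collect them in the main scan:

rects = []  # will hold (x, y, width, height) tuples
for y in range(16, height-15, 1):
    for x in range(16, width-15, 1):
        if pixels[x, y] == (0,0,0,255):
            rects.append(find_rect(pixels, x, y, width, height))
print(len(rects), "rectangles found")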

Optical Flow using OpenCV - Horizontal and Vertical Components

I have the following code that finds the optical flow of 2 images (or 2 frames of a video), colour coded. What I want are the horizontal and vertical components of the optical flow separately (as separate images).
Here is the code I have so far:
import cv2
import numpy as np

frame1 = cv2.imread('my1.bmp')
frame2 = cv2.imread('my2.bmp')
prvs = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
next = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(frame1)
hsv[...,1] = 255
while(1):
    next = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prvs, next, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1])
    hsv[...,0] = ang*180/np.pi/2
    hsv[...,2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    cv2.imshow('frame2', rgb)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
    elif k == ord('s'):
        cv2.imwrite('opticalmyhsv.pgm', rgb)
cap.release()
cv2.destroyAllWindows()
This is what the optical flow looks like given my two images:
If you want to visualize the horizontal and vertical component separately, you can visualize both separately as grayscale images. I'll make it such that a colour of gray denotes no motion, black denotes the maximum amount of motion in the frame going to the left (negative) while white denotes the maximum amount of motion in the frame going towards the right (positive).
The output of calcOpticalFlowFarneback is a 3D numpy array where the first slice denotes the amount of horizontal (x) displacement while the second slice denotes the amount of vertical (y) displacement.
As such, all you need to do is define two separate 2D numpy arrays that will store these values so we can display them to the user. However, you're going to need to normalize the flow for display such that no motion is a rough gray, motion to the extreme left is black, or intensity 0, and motion to the extreme right is white, or intensity 255.
Therefore, all you would need to do is modify your code to show two OpenCV windows for the horizontal and vertical motion like so:
import cv2
import numpy as np
frame1 = cv2.imread('my1.bmp')
frame2 = cv2.imread('my2.bmp')
prvs = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY)
next = cv2.cvtColor(frame2,cv2.COLOR_BGR2GRAY)
flow = cv2.calcOpticalFlowFarneback(prvs, next, 0.5, 3, 15, 3, 5, 1.2, 0)
# Change here
horz = cv2.normalize(flow[...,0], None, 0, 255, cv2.NORM_MINMAX)
vert = cv2.normalize(flow[...,1], None, 0, 255, cv2.NORM_MINMAX)
horz = horz.astype('uint8')
vert = vert.astype('uint8')
# Change here too
cv2.imshow('Horizontal Component', horz)
cv2.imshow('Vertical Component', vert)
k = cv2.waitKey(0) & 0xff
if k == ord('s'): # Change here
    cv2.imwrite('opticalflow_horz.pgm', horz)
    cv2.imwrite('opticalflow_vert.pgm', vert)
cv2.destroyAllWindows()
I've modified the code so that there is no while loop as you're only finding the optical flow between two predetermined frames. You're not grabbing frames off of a live source, like a camera, so we can just show both of the images not in a while loop. I've made the wait time for waitKey set to 0 so that you wait indefinitely until you push a key. This pretty much simulates your while loop behaviour from before, but it doesn't burden your CPU needlessly with wasted cycles. I've also removed some unnecessary variables, like the hsv variable as we aren't displaying both horizontal and vertical components colour coded. We also just compute the optical flow once.
In any case, with the above code we compute the optical flow, extract the horizontal and vertical components separately, normalize the components between the range of [0,255], cast to uint8 so that we can display the results then show the results. I've also modified your code so that if you wanted to save the components, it'll save the horizontal and vertical components as two separate images.
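One aside on the normalization: cv2.NORM_MINMAX stretches whatever range is present in the flow, so zero motion does not necessarily land exactly on mid-gray. If you literally want the gray-means-no-motion convention described earlier, a sketch of a fixed symmetric mapping (max_disp is an assumed bound on the displacement you care about) would be:

max_disp = 10.0  # assumed maximum displacement in pixels; tune for your footage
horz = np.clip(128 + 127 * flow[...,0] / max_disp, 0, 255).astype('uint8')
vert = np.clip(128 + 127 * flow[...,1] / max_disp, 0, 255).astype('uint8')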
Edit
In your comments, you want to display a sequence of images using the same logic we have created above. You have a list of file names that you want to cycle through. That isn't very difficult to do. Simply take your strings and put them into a list and compute the optical flow between pairs of images by using the file names stored in this list. I'll modify the code such that when we reach the last element of the list, we will wait for the user to push something. Until then, we will cycle through each pair of images until the end. In other words:
import cv2
import numpy as np

# Create list of names here from my1.bmp up to my20.bmp
list_names = ['my' + str(i+1) + '.bmp' for i in range(20)]

# Read in the first frame
frame1 = cv2.imread(list_names[0])
prvs = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# Set counter to read the second frame at the start
counter = 1

# Until we reach the end of the list...
while counter < len(list_names):
    # Read the next frame in
    frame2 = cv2.imread(list_names[counter])
    next = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Calculate optical flow between the two frames
    flow = cv2.calcOpticalFlowFarneback(prvs, next, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Normalize horizontal and vertical components
    horz = cv2.normalize(flow[...,0], None, 0, 255, cv2.NORM_MINMAX)
    vert = cv2.normalize(flow[...,1], None, 0, 255, cv2.NORM_MINMAX)
    horz = horz.astype('uint8')
    vert = vert.astype('uint8')

    # Show the components as images
    cv2.imshow('Horizontal Component', horz)
    cv2.imshow('Vertical Component', vert)

    # Change - Make next frame previous frame
    prvs = next.copy()

    # If we get to the end of the list, simply wait indefinitely
    # for the user to push something
    if counter == len(list_names)-1:
        k = cv2.waitKey(0) & 0xff
    else: # Else, wait for 1 second for a key
        k = cv2.waitKey(1000) & 0xff

    if k == 27:
        break
    elif k == ord('s'): # Change
        cv2.imwrite('opticalflow_horz' + str(counter) + '-' + str(counter+1) + '.pgm', horz)
        cv2.imwrite('opticalflow_vert' + str(counter) + '-' + str(counter+1) + '.pgm', vert)

    # Increment counter to go to next frame
    counter += 1

cv2.destroyAllWindows()
The above code will cycle through pairs of frames and wait for 1 second between each pair, giving you the opportunity to either break out of the display or save the horizontal and vertical components to file. Bear in mind that whatever frames you save are indexed with two numbers that tell you which pair of frames they correspond to. Before the next iteration happens, the next frame becomes the previous frame, so prvs gets replaced by a copy of next. At the beginning of the loop, the next frame gets read in appropriately.
Hope this helps. Good luck!

Trim scanned images with PIL?

What would be the approach to trim an image that's been input using a scanner and therefore has a large white/black area?
The entropy solution seems problematic and computationally over-intensive. Why not edge detect?
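A minimal sketch of that edge-detect idea using only PIL (the file name and the threshold of 30 are assumptions, not taken from the question):

from PIL import Image, ImageFilter

image = Image.open('scan.png').convert('L')         # assumed input file
edges = image.filter(ImageFilter.FIND_EDGES)        # highlight intensity transitions
mask = edges.point(lambda p: 255 if p > 30 else 0)  # suppress faint scanner noise
box = mask.getbbox()                                # bounding box of the remaining edges
if box:
    image.crop(box).show()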
I just wrote this Python code to solve this same problem for myself. My background was dirty white-ish, so the criterion I used was darkness and color. I simplified this by just taking the smallest of the R, G or B values for each pixel, so that black or saturated red both stood out the same. I also used the average of a fixed number of the darkest pixels for each row or column. Then I started at each edge and worked my way in until I crossed a threshold.
Here is my code:
#these values set how sensitive the bounding box detection is
threshold = 200 #the average of the darkest values must be _below_ this to count (0 is darkest, 255 is lightest)
obviousness = 50 #how many of the darkest pixels to include (1 would mean a single dark pixel triggers it)

from PIL import Image
import numpy as np

def find_line(vals):
    #implement edge detection once, use many times
    for i, tmp in enumerate(vals):
        tmp.sort()
        average = float(sum(tmp[:obviousness]))/len(tmp[:obviousness])
        if average <= threshold:
            return i
    return i #i is left over from failed threshold finding, it is the bounds

def getbox(img):
    #get the bounding box of the interesting part of a PIL image object
    #this is done by getting the darkest of the R, G or B value of each pixel
    #and finding where the edge gets dark/colored enough
    #returns a tuple of (left,upper,right,lower)
    width, height = img.size #for making a 2d array
    retval = [0, 0, width, height] #values will be disposed of, but this is a black image's box
    pixels = list(img.getdata())
    vals = [] #store the value of the darkest color
    for pixel in pixels:
        vals.append(min(pixel)) #the darkest of the R,G or B values
    #make 2d array
    vals = np.array([vals[i * width:(i + 1) * width] for i in range(height)])
    #start with upper bounds
    forupper = vals.copy()
    retval[1] = find_line(forupper)
    #next, do lower bounds
    forlower = vals.copy()
    forlower = np.flipud(forlower)
    retval[3] = height - find_line(forlower)
    #left edge, same as before but rotate the data so left edge is top edge
    forleft = vals.copy()
    forleft = np.swapaxes(forleft, 0, 1)
    retval[0] = find_line(forleft)
    #and right edge is bottom edge of rotated array
    forright = vals.copy()
    forright = np.swapaxes(forright, 0, 1)
    forright = np.flipud(forright)
    retval[2] = width - find_line(forright)
    if retval[0] >= retval[2] or retval[1] >= retval[3]:
        print("error, bounding box is not legit")
        return None
    return tuple(retval)

if __name__ == '__main__':
    image = Image.open('cat.jpg')
    box = getbox(image)
    print("result is: ", box)
    result = image.crop(box)
    result.show()
For starters, here is a similar question. Here is a related question. And another related question.
Here is just one idea; there are certainly other approaches. I would select an arbitrary crop edge, measure the entropy* on either side of the line, then re-select the crop line (probably using something like a bisection method) until the entropy of the cropped-out portion falls below a defined threshold. As I see it, you may need to resort to a brute-force root-finding method, as you will not have a good indication of when you have cropped too little. Then repeat for the remaining 3 edges.
*I recall discovering that the entropy method in the referenced website was not completely accurate, but I could not find my notes (I'm sure it was in a SO post, however.)
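For reference, a minimal sketch of the kind of entropy measure being described (my own assumption of the computation; the referenced method may differ):

import numpy as np

def region_entropy(region):
    # Shannon entropy of a grayscale PIL image region, in bits per pixel
    hist = np.bincount(np.asarray(region.convert('L')).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())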
Edit:
Other criteria for the "emptiness" of an image portion (other than entropy) might be contrast ratio or contrast ratio on an edge-detect result.
