This is more of a theoretical question than one about a specific code issue.
I have done a bit of facial landmark detection using Haar Cascades, but this time I have a different type of video on my hands. It's a side view of a horse's eye (camera is mounted to the side of the head) so essentially what I see is a giant eye. I tried using Haar Cascades but it's no use, since there is no face to be detected in my video.
I was wondering what the best way to detect the eye and blinks would be on this horse? Do I try to customize a dlib facial landmark detector? I didn't find much information on animal landmarks.
Thanks in advance! :)
I used an object tracker to continue locating the eye after drawing a bounding box around it on the first frame.
I created a set width and height bounding box since we can roughly assume that the eye isn't growing or shrinking relative to the camera. When drawing the bounding box for the tracker, we have to include more than just the eye since it would otherwise lose track of the object whenever they blink.
I looked for whether the saturation of the bounded area dropped below a threshold in each frame as a check for whether or not they blinked. The blue box is the bounding box returned by the tracker, the green box is the area I'm cropping and checking the saturation level of.
Here's a graph of the saturation level over the course of the video:
You can clearly see the areas where the horse blinked.
Here's a (heavily compressed to fit the 2 MB limit) gif of the result:
import cv2
import numpy as np
import math
# tuplifies things for opencv
def tup(p):
return (int(p[0]), int(p[1]));
# returns the center of the box
def getCenter(box):
x = box[0];
y = box[1];
x += box[2] / 2.0;
y += box[3] / 2.0;
return [x,y];
# rescales image by percent
def rescale(img, scale):
h,w = img.shape[:2];
h = int(h*scale);
w = int(w*scale);
return cv2.resize(img, (w,h));
# load video
cap = cv2.VideoCapture("blinking.mov");
scale = 0.5;
# font stuff
font = cv2.FONT_HERSHEY_SIMPLEX;
org = (50, 50);
fontScale = 1;
font_color = (255, 255, 0);
thickness = 2;
# set up tracker
tracker = cv2.TrackerCSRT_create(); # I'm using OpenCV 3.4
backup = cv2.TrackerCSRT_create();
# grab the first frame
_, frame = cap.read();
frame = rescale(frame, scale);
# init tracker
box = cv2.selectROI(frame, False);
tracker.init(frame, box);
backup.init(frame, box);
cv2.destroyAllWindows();
# set center bounds
width = 75;
height = 60;
# save numbers
file_index = 0;
# blink counter
blinks = 0;
blink_thresh = 35;
blink_trigger = True;
# show video
done = False;
while not done:
# get frame
ret, frame = cap.read();
if not ret:
break;
frame = rescale(frame, scale);
# choose a color space
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV);
h,s,v = cv2.split(hsv);
channel = s;
# grab tracking box
ret, box = tracker.update(frame);
if ret:
# get the center
center = getCenter(box);
x, y = center;
# make box on center
tl = [x - width, y - height];
br = [x + width, y + height];
tl = tup(tl);
br = tup(br);
# get top left and bottom right
p1 = [box[0], box[1]];
p2 = [p1[0] + box[2], p1[1] + box[3]];
p1 = tup(p1);
p2 = tup(p2);
# draw a roi around the image
cv2.rectangle(frame, p1, p2, (255,0,0), 3);
cv2.rectangle(frame, tl, br, (0,255,0), 3);
cv2.circle(frame, tup(center), 6, (0,0,255), -1);
# get the channel average in the box
slc = channel[tl[1]:br[1], tl[0]:br[0]];
ave = np.mean(slc);
# if it dips below a set value, then trigger a blink
if ave < blink_thresh:
if blink_trigger:
blinks += 1;
blink_trigger = False;
else:
blink_trigger = True;
# draw blink count
frame = cv2.putText(frame, "Blinks: " + str(blinks), org, font, fontScale,
font_color, thickness, cv2.LINE_AA);
# show
cv2.imshow("Frame", frame);
key = cv2.waitKey(1);
# check keypress
done = key == ord('q');
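For reference, a minimal sketch of how the saturation graph could be produced, assuming each frame's ave value is appended to a list (here called saturation_history, a hypothetical name) inside the loop above:
import matplotlib.pyplot as plt

def plot_saturation(saturation_history, blink_thresh=35):
    # saturation_history: mean saturation of the cropped green box, one value per frame
    plt.plot(saturation_history)
    plt.axhline(blink_thresh, color="r", linestyle="--", label="blink threshold")
    plt.xlabel("frame")
    plt.ylabel("mean saturation")
    plt.legend()
    plt.show()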
I have a project that combines data from two different sensors, a TFLuna LiDAR and a Raspberry Pi Camera Module V2, for an object-detection self-driving vehicle. I've tried threading both sensors to allow concurrent operation, but I have trouble displaying the data together. Sometimes it works, but the LiDAR data won't continuously update itself. Here's the code I'm using for the LiDAR:
def read_tfluna_data():
while True:
counter = ser.in_waiting # count the number of bytes of the serial port
if counter > 8:
bytes_serial = ser.read(9) # read 9 bytes
ser.reset_input_buffer() # reset buffer
if bytes_serial[0] == 0x59 and bytes_serial[1] == 0x59: # check first two bytes
distance = bytes_serial[2] + bytes_serial[3]*256 # distance in next two bytes
return distance
class lidar:
def update(self):
# Keep looping indefinitely until the thread is stopped
while True:
distance = read_tfluna_data()
return distance
def __init__(self):
distance = Thread(target=self.update,args=())
distance.start()
and this is the code for the display:
while True:
# Grab frame from video stream
frame1 = videostream.read()
# Acquire frame and resize to expected shape [1xHxWx3]
frame = frame1.copy()
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame_resized = cv2.resize(frame_rgb, (width, height))
input_data = np.expand_dims(frame_resized, axis=0)
# Normalize pixel values if using a floating model (i.e. if model is non-quantized)
if floating_model:
input_data = (np.float32(input_data) - input_mean) / input_std
# Perform the actual detection by running the model with the image as input
interpreter.set_tensor(input_details[0]['index'],input_data)
interpreter.invoke()
# Retrieve detection results
boxes = interpreter.get_tensor(output_details[boxes_idx]['index'])[0] # Bounding box coordinates of detected objects
classes = interpreter.get_tensor(output_details[classes_idx]['index'])[0] # Class index of detected objects
scores = interpreter.get_tensor(output_details[scores_idx]['index'])[0] # Confidence of detected objects
# Loop over all detections and draw detection box if confidence is above minimum threshold
for i in range(len(scores)):
if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):
# Get bounding box coordinates and draw box
# Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
ymin = int(max(1,(boxes[i][0] * imH)))
xmin = int(max(1,(boxes[i][1] * imW)))
ymax = int(min(imH,(boxes[i][2] * imH)))
xmax = int(min(imW,(boxes[i][3] * imW)))
cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
# Draw label
object_name = labels[int(classes[i])] # Look up object name from "labels" array using class index
label = '%s: %d%%' % (object_name, int(scores[i]*100)) # Example: 'person: 72%'
labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2) # Get font size
label_ymin = max(ymin, labelSize[1] + 10) # Make sure not to draw label too close to top of window
cv2.rectangle(frame, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0], label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED) # Draw white box to put label text in
cv2.putText(frame, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 2) # Draw label text
#Draw LiDAR Distance in corner of frame
cv2.putText(frame,'Distance: %s cm' %dist_data,(900, 700),cv2.FONT_HERSHEY_SIMPLEX,1,(255,0,0),2,cv2.LINE_AA)
#Draw Rudder in corner of frame
cv2.putText(frame,'Rudder dir: forward!',(50, 700),cv2.FONT_HERSHEY_SIMPLEX,1,(255,0,0),2,cv2.LINE_AA)
# All the results have been drawn on the frame, so it's time to display it.
cv2.imshow('Object detector', frame)
# Press 'q' to quit
if cv2.waitKey(1) == ord('q'):
break
I've tried putting the .update() in the while True loop, but it stalls the program and it ended up crashing. I've tried turning the serial port on/off, edited the boot.config to allow UART, and turned off the USB serial.
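One likely issue is that update() returns after the first reading, so the background thread exits immediately and the distance never refreshes. Here's a minimal sketch (the port name and baud rate are assumptions) of a reader thread that keeps only the latest distance in an attribute, which the display loop can read on every frame:
import threading
import serial

class LidarReader:
    def __init__(self, port="/dev/serial0", baud=115200):
        self.ser = serial.Serial(port, baud)
        self.distance = None
        self.running = True
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        # keep reading frames and storing the latest distance; never return
        while self.running:
            if self.ser.in_waiting > 8:
                data = self.ser.read(9)
                self.ser.reset_input_buffer()
                if data[0] == 0x59 and data[1] == 0x59:
                    self.distance = data[2] + data[3] * 256

    def stop(self):
        self.running = False
        self.thread.join()
The display loop would then read lidar.distance when drawing the overlay instead of calling a blocking read function.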
So I have started a program that takes two images: a model image and an image with a change. I want it to detect the differences and show them to me by circling them. I have run into an issue finding the difference coordinates, as my circle keeps ending up in the middle of the image.
This is the code I have:
import cv2 as cv
import numpy as np
from PIL import Image, ImageChops
#Ideal Image and The main Image
img2= cv.imread("ideal.jpg")
img1 = cv.imread("Actual.jpg")
#Verifies if there is or isn't a difference in the image for the if statement
diff = cv.subtract(img2, img1)
results = not np.any(diff)
#Tells the user if there is a difference between the model image and the given image
if results is True:
    print("The Images are the same!")
else:
    print("The images are different")
#This is to make the image show the difference to circle
img_1=Image.open("Actual.jpg")
img_2=Image.open("ideal.jpg")
diff=ImageChops.difference(img_1,img_2)
diff.save("Differance.jpg")
#Reads the image Just saved
Differance = cv.imread("Differance.jpg", 0)
#Resize the Image to make it smaller
img1s = cv.resize(img1, (0, 0), fx=0.5, fy=0.5)
Differance = cv.resize(Differance, (0, 0), fx=0.5, fy=0.5)
# Find anything not black, i.e. the difference
nz = cv.findNonZero(Differance)
# Find top, bottom, left and right edges of the difference
a = nz[:,0,0].min()
b = nz[:,0,0].max()
c = nz[:,0,1].min()
d = nz[:,0,1].max()
# Average top and bottom edges, left and right edges, to give centre
c0 = (a+b)/2
c1 = (c+d)/2
#The Center Coords
c3 = (int(c0),int(c1))
#Values for the code below so it doesn't look messy
radius = 50
color = (0, 0, 255)
thickness = 2
#This places a circle around the center of the difference
Finished = cv.circle(img1s, c3, radius, color, thickness)
#Saves the Final Image with the circle around it
cv.imwrite("Final.jpg", Finished)
And the two images are attached (1, 2).
This code currently takes both images and blacks out the background, leaving only the difference within the image. The program is then meant to take the location of the difference and place a circle around its center on the main image (the one with the difference in it).
Your main problem is the JPG format, which changes pixels to compress the image better, and this creates differences across the whole area. If you display diff or difference, you should see many gray pixels.
I hope you can see the pixels below the ball.
If you use PNG for the original image (without the ball), later use that image to create the image with the ball, and also save it as PNG, then the code will work correctly.
My version without PIL.
Press any key to close the window with the image.
import cv2 as cv
import numpy as np
# load images
img1 = cv.imread("img1.png")
img2 = cv.imread("img2.png")
# calculate difference
diff = cv.subtract(img1, img2) # other order `(img2, img1)` gives worse result
# saves difference
cv.imwrite("difference.png", diff)
# show difference - press any key to close
cv.imshow('diff', diff)
cv.waitKey(0)
cv.destroyWindow('diff')
if not np.any(diff):
print("The images are the same!")
else:
print("The images are differant")
# resize images to make them smaller
#img1_resized = cv.resize(img1, (0, 0), fx=0.5, fy=0.5)
#diff_resized = cv.resize(diff, (0, 0), fx=0.5, fy=0.5)
img1_resized = img1
diff_resized = diff
# convert to grayscale (without saving and loading again)
diff_resized = cv.cvtColor(diff_resized, cv.COLOR_BGR2GRAY)
# find anything not black in the difference
non_zero = cv.findNonZero(diff_resized)
#print(non_zero)
# find top, bottom, left and right edges of the difference
x_min = non_zero[:,0,0].min()
x_max = non_zero[:,0,0].max()
y_min = non_zero[:,0,1].min()
y_max = non_zero[:,0,1].max()
print('x:', x_min, x_max)
print('y:', y_min, y_max)
sizes = [x_max-x_min+1, y_max-y_min+1]
print('width :', sizes[0])
print('height:', sizes[1])
# center
center_x = (x_min + x_max) // 2
center_y = (y_min + y_max) // 2
center = (center_x, center_y)
print('center:', center)
# radius
radius = max(sizes) // 2
print('radius:', radius)
color = (0, 0, 255)
thickness = 2
# draw circle around the center of the difference
finished = cv.circle(img1_resized, center, radius, color, thickness)
# saves final image with circle
#cv.imwrite("final.png", finished)
# show final image - press any key to close
cv.imshow('finished', finished)
cv.waitKey(0)
cv.destroyWindow('finished')
img1.png
img2.png
difference.png
final.png
EDIT:
If you work with JPG, then you can try to reduce the noise:
diff = cv.subtract(img1, img2)
diff_gray = cv.cvtColor(diff, cv.COLOR_BGR2GRAY)
diff_gray[diff_gray < 50] = 0
For different images you may need different values instead of 50.
You may also try thresholding
(_, diff_gray) = cv.threshold(diff_gray, 50, 0, cv.THRESH_TOZERO)
It may also need other functions like blur(), erode(), or dilate().
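For example, a minimal sketch of such a cleanup step (the blur size, kernel size, and iteration counts are assumptions that may need tuning per image):
import cv2 as cv
import numpy as np

def clean_difference(diff_gray, thresh=50):
    # suppress low-intensity JPG noise, then erode/dilate to remove small speckles
    _, mask = cv.threshold(diff_gray, thresh, 255, cv.THRESH_BINARY)
    mask = cv.medianBlur(mask, 5)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv.erode(mask, kernel, iterations=1)
    mask = cv.dilate(mask, kernel, iterations=2)
    return mask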
You do not need PIL:
take the difference image
threshold it
use findContours to find the regions
if contours are found, draw them (see the completed snippet below)
import cv2
# difference_gray is the grayscale difference image; out_image is a copy of the image to draw on
_, thresh = cv2.threshold(difference_gray, 50, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
for cnt in contours:
    out_image = cv2.drawContours(out_image, [cnt], 0, (255,0,0), -1)
    (x,y),radius = cv2.minEnclosingCircle(cnt)
    center = (int(x),int(y))
    radius = int(radius)
    out_image = cv2.circle(out_image,center,radius,(0,255,0),2)
I wrote a small script in Python where I'm trying to extract or crop only the part of the playing card that represents the artwork, removing all the rest. I've been trying various methods of thresholding but couldn't get there. Also note that I can't simply record the position of the artwork manually, because it's not always in the same position or size, but it's always a rectangular shape where everything else is just text and borders.
from matplotlib import pyplot as plt
import cv2
import numpy as np
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret,binary = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
binary = cv2.bitwise_not(binary)
kernel = np.ones((15, 15), np.uint8)
closing = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
plt.imshow(closing),plt.show()
The current output is the closest thing I could get. I could be on the right track and try some further wrangling to draw a rectangle around the white parts, but I don't think it's a sustainable method:
As a last note, see the cards below: not all frames are exactly the same size or position, but there's always a piece of artwork with only text and borders around it. It doesn't have to be cut super precisely, but the art is clearly a "region" of the card, surrounded by other regions containing some text. My goal is to capture the region of the artwork as well as I can.
I used Hough line transform to detect linear parts of the image.
The crossings of all lines were used to construct all possible rectangles, which do not contain other crossing points.
Since the part of the card you are looking for is always the biggest of those rectangles (at least in the samples you provided), I simply chose the biggest of those rectangles as the winner.
The script works without user interaction.
import cv2
import numpy as np
from collections import defaultdict
def segment_by_angle_kmeans(lines, k=2, **kwargs):
#Groups lines based on angle with k-means.
#Uses k-means on the coordinates of the angle on the unit circle
#to segment `k` angles inside `lines`.
# Define criteria = (type, max_iter, epsilon)
default_criteria_type = cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER
criteria = kwargs.get('criteria', (default_criteria_type, 10, 1.0))
flags = kwargs.get('flags', cv2.KMEANS_RANDOM_CENTERS)
attempts = kwargs.get('attempts', 10)
# returns angles in [0, pi] in radians
angles = np.array([line[0][1] for line in lines])
# multiply the angles by two and find coordinates of that angle
pts = np.array([[np.cos(2*angle), np.sin(2*angle)]
for angle in angles], dtype=np.float32)
# run kmeans on the coords
labels, centers = cv2.kmeans(pts, k, None, criteria, attempts, flags)[1:]
labels = labels.reshape(-1) # transpose to row vec
# segment lines based on their kmeans label
segmented = defaultdict(list)
for i, line in zip(range(len(lines)), lines):
segmented[labels[i]].append(line)
segmented = list(segmented.values())
return segmented
def intersection(line1, line2):
#Finds the intersection of two lines given in Hesse normal form.
#Returns closest integer pixel locations.
#See https://stackoverflow.com/a/383527/5087436
rho1, theta1 = line1[0]
rho2, theta2 = line2[0]
A = np.array([
[np.cos(theta1), np.sin(theta1)],
[np.cos(theta2), np.sin(theta2)]
])
b = np.array([[rho1], [rho2]])
x0, y0 = np.linalg.solve(A, b)
x0, y0 = int(np.round(x0)), int(np.round(y0))
return [[x0, y0]]
def segmented_intersections(lines):
#Finds the intersections between groups of lines.
intersections = []
for i, group in enumerate(lines[:-1]):
for next_group in lines[i+1:]:
for line1 in group:
for line2 in next_group:
intersections.append(intersection(line1, line2))
return intersections
def rect_from_crossings(crossings):
#find all rectangles without other points inside
rectangles = []
# Search all possible rectangles
for i in range(len(crossings)):
x1= int(crossings[i][0][0])
y1= int(crossings[i][0][1])
for j in range(len(crossings)):
x2= int(crossings[j][0][0])
y2= int(crossings[j][0][1])
#Search all points
flag = 1
for k in range(len(crossings)):
x3= int(crossings[k][0][0])
y3= int(crossings[k][0][1])
#Dont count double (reverse rectangles)
if (x1 > x2 or y1 > y2):
flag = 0
#Dont count rectangles with points inside
elif ((((x3 >= x1) and (x2 >= x3))and (y3 > y1) and (y2 > y3) or ((x3 > x1) and (x2 > x3))and (y3 >= y1) and (y2 >= y3))):
if(i!=k and j!=k):
flag = 0
if flag:
rectangles.append([[x1,y1],[x2,y2]])
return rectangles
if __name__ == '__main__':
#img = cv2.imread('TAJFp.jpg')
#img = cv2.imread('Bj2uu.jpg')
img = cv2.imread('yi8db.png')
width = int(img.shape[1])
height = int(img.shape[0])
scale = 380/width
dim = (int(width*scale), int(height*scale))
# resize image
img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
img2 = img.copy()
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(5,5),cv2.BORDER_DEFAULT)
# Parameters of Canny and Hough may have to be tweaked to work for as many cards as possible
edges = cv2.Canny(gray,10,45,apertureSize = 7)
lines = cv2.HoughLines(edges,1,np.pi/90,160)
segmented = segment_by_angle_kmeans(lines)
crossings = segmented_intersections(segmented)
rectangles = rect_from_crossings(crossings)
#Find biggest remaining rectangle
size = 0
for i in range(len(rectangles)):
x1 = rectangles[i][0][0]
x2 = rectangles[i][1][0]
y1 = rectangles[i][0][1]
y2 = rectangles[i][1][1]
if(size < (abs(x1-x2)*abs(y1-y2))):
size = abs(x1-x2)*abs(y1-y2)
x1_rect = x1
x2_rect = x2
y1_rect = y1
y2_rect = y2
cv2.rectangle(img2, (x1_rect,y1_rect), (x2_rect,y2_rect), (0,0,255), 2)
roi = img[y1_rect:y2_rect, x1_rect:x2_rect]
cv2.imshow("Output",roi)
cv2.imwrite("Output.png", roi)
cv2.waitKey()
These are the results with the samples you provided:
The code for finding line crossings can be found here: find intersection point of two lines drawn using houghlines opencv
You can read more about Hough Lines here.
We know that cards have straight boundaries along the x and y axes. We can use this to extract parts of the image. The following code implements detecting horizontal and vertical lines in the image.
import cv2
import numpy as np
def mouse_callback(event, x, y, flags, params):
global num_click
if num_click < 2 and event == cv2.EVENT_LBUTTONDOWN:
num_click = num_click + 1
print(num_click)
global upper_bound, lower_bound, left_bound, right_bound
upper_bound.append(max(i for i in hor if i < y) + 1)
lower_bound.append(min(i for i in hor if i > y) - 1)
left_bound.append(max(i for i in ver if i < x) + 1)
right_bound.append(min(i for i in ver if i > x) - 1)
filename = 'image.png'
thr = 100 # edge detection threshold
lined = 50 # number of consecutive True pixels required for a row/column to be counted as a line
num_click = 0 # select only twice
upper_bound, lower_bound, left_bound, right_bound = [], [], [], []
winname = 'img'
cv2.namedWindow(winname)
cv2.setMouseCallback(winname, mouse_callback)
img = cv2.imread(filename, 1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.Canny(gray, thr, 3*thr)
height, width, _ = img.shape
# find horizontal lines
hor = []
for i in range (0, height-1):
count = 0
for j in range (0, width-1):
if bw[i,j]:
count = count + 1
else:
count = 0
if count >= lined:
hor.append(i)
break
# find vertical lines
ver = []
for j in range (0, width-1):
count = 0
for i in range (0, height-1):
if bw[i,j]:
count = count + 1
else:
count = 0
if count >= lined:
ver.append(j)
break
# draw lines
disp_img = np.copy(img)
for i in hor:
cv2.line(disp_img, (0, i), (width-1, i), (0,0,255), 1)
for i in ver:
cv2.line(disp_img, (i, 0), (i, height-1), (0,0,255), 1)
while num_click < 2:
cv2.imshow(winname, disp_img)
cv2.waitKey(10)
disp_img = img[min(upper_bound):max(lower_bound), min(left_bound):max(right_bound)]
cv2.imshow(winname, disp_img)
cv2.waitKey() # Press any key to exit
cv2.destroyAllWindows()
You just need to click two areas to include. A sample click area and the corresponding result are as follows:
Results from other images:
I don't think it is possible to automatically crop the artwork ROI using traditional image processing techniques due to the dynamic nature of the colors, dimensions, locations, and textures for each card. You would have to look into machine/deep learning and train your own classifier if you want to do it automatically. Instead, here's a manual approach to select and crop a static ROI from an image.
The idea is to use cv2.setMouseCallback() and event handlers to detect if the mouse has been clicked or released. For this implementation, you can extract the artwork ROI by holding down the left mouse button and dragging to select the desired ROI. Once you have selected the desired ROI, press c to crop and save the ROI. You can reset the ROI using the right mouse button.
Saved artwork ROIs
Code
import cv2
class ExtractArtworkROI(object):
def __init__(self):
# Load image
self.original_image = cv2.imread('1.png')
self.clone = self.original_image.copy()
cv2.namedWindow('image')
cv2.setMouseCallback('image', self.extractROI)
self.selected_ROI = False
# ROI bounding box reference points
self.image_coordinates = []
def extractROI(self, event, x, y, flags, parameters):
# Record starting (x,y) coordinates on left mouse button click
if event == cv2.EVENT_LBUTTONDOWN:
self.image_coordinates = [(x,y)]
        # Record ending (x,y) coordinates on left mouse button release
elif event == cv2.EVENT_LBUTTONUP:
# Remove old bounding box
if self.selected_ROI:
self.clone = self.original_image.copy()
# Draw rectangle
self.selected_ROI = True
self.image_coordinates.append((x,y))
cv2.rectangle(self.clone, self.image_coordinates[0], self.image_coordinates[1], (36,255,12), 2)
print('top left: {}, bottom right: {}'.format(self.image_coordinates[0], self.image_coordinates[1]))
print('x,y,w,h : ({}, {}, {}, {})'.format(self.image_coordinates[0][0], self.image_coordinates[0][1], self.image_coordinates[1][0] - self.image_coordinates[0][0], self.image_coordinates[1][1] - self.image_coordinates[0][1]))
# Clear drawing boxes on right mouse button click
elif event == cv2.EVENT_RBUTTONDOWN:
self.selected_ROI = False
self.clone = self.original_image.copy()
def show_image(self):
return self.clone
def crop_ROI(self):
if self.selected_ROI:
x1 = self.image_coordinates[0][0]
y1 = self.image_coordinates[0][1]
x2 = self.image_coordinates[1][0]
y2 = self.image_coordinates[1][1]
# Extract ROI
self.cropped_image = self.original_image.copy()[y1:y2, x1:x2]
# Display and save image
cv2.imshow('Cropped Image', self.cropped_image)
cv2.imwrite('ROI.png', self.cropped_image)
else:
print('Select ROI before cropping!')
if __name__ == '__main__':
extractArtworkROI = ExtractArtworkROI()
while True:
cv2.imshow('image', extractArtworkROI.show_image())
key = cv2.waitKey(1)
# Close program with keyboard 'q'
if key == ord('q'):
cv2.destroyAllWindows()
exit(1)
# Crop ROI
if key == ord('c'):
extractArtworkROI.crop_ROI()
I have a bunch of images with respective bounding box co-ordinates (x,y,w,h). Some of the bounding boxes are rectangular, so firstly I want to make them square while still centered on the region of interest. Using the following example of an apple, with a bounding box on the stalk, I'd want to expand the box to a square while still keeping it centered on the stalk.
Secondly, after I've extracted the contents of the bounding box, I want to capture contextual information by increasing the bounding box size by n pixels, extracting again, and repeating. After that, I want to shift the geometric center of the region of interest by just a few pixels and repeat the multiple-box extraction. Like the image below, where the differently colored boxes represent the different boxes I want to extract; the right image shows the small shift in center that I want to achieve.
I have an idea on how to do this in numpy, but are there any higher-level functions/libraries that would help me with defining the bounding box and manipulating it as such?
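For reference, a minimal sketch of the squaring, expansion, and center-shift steps described above, using plain (x, y, w, h) tuples (the helper names are illustrative, not from any particular library):
def square_box(x, y, w, h):
    # grow the shorter side so the box becomes square, keeping the same center
    side = max(w, h)
    cx, cy = x + w / 2, y + h / 2
    return int(cx - side / 2), int(cy - side / 2), side, side

def expand_box(x, y, w, h, n):
    # grow the box by n pixels on every side
    return x - n, y - n, w + 2 * n, h + 2 * n

def shift_box(x, y, w, h, dx, dy):
    # move the box by (dx, dy) pixels
    return x + dx, y + dy, w, h
Each crop is then just img[y:y+h, x:x+w], with clamping to the image borders where needed.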
I use this image to demonstrate the same effect:
The code, with comments as description:
#!/usr/bin/python3
# 2017.11.25 17:10:34 CST
# 2017.12.01 11:23:02 CST
import cv2
import numpy as np
## Read and copy
img = cv2.imread("cat.jpg")
canvas = img.copy()
## set and crop the ROI
x,y,w,h = bbox = (180, 100, 50, 100)
cv2.rectangle(canvas, (x,y), (x+w,y+h), (0,0,255), 2)
croped = img[y:y+h, x:x+w]
cv2.imshow("croped", croped)
## get the center and the radius
cx = x+w//2
cy = y+h//2
cr = max(w,h)//2
## set offset, then repeatedly enlarge the ROI
dr = 10
for i in range(0,4):
r = cr+i*dr
cv2.rectangle(canvas, (cx-r, cy-r), (cx+r, cy+r), (0,255,0), 1)
croped = img[cy-r:cy+r, cx-r:cx+r]
cv2.imshow("croped{}".format(i), croped)
## display
cv2.imshow("source", canvas)
cv2.waitKey()
cv2.destroyAllWindows()
The result:
Handling Corner Cases (Improved)
Hi, I literally just used GitHub Copilot to generate this code, which increases the size of the bounding box by 10 percent, and it can also handle corner cases.
# expand bounding box by 10 percent
x = x - (w * 0.1)
y = y - (h * 0.1)
w = w + (w * 0.2)
h = h + (h * 0.2)
# make sure bounding box is within frame
x = max(int(x), 0)
y = max(int(y), 0)
w = min(int(w), imw - x)
h = min(int(h), imh - y)
# getting the bounding box
face_image = frame[y:y+h, x:x+w]
Before and After
White represents the original bounding box and green the new one.
A JavaScript solution with proportional bounding box values:
const expansionFactor = 0.1;
const x = imageWidth * boundingBox["Left"];
const y = imageHeight * boundingBox["Top"];
const w = imageWidth * boundingBox["Width"];
const h = imageHeight * boundingBox["Height"];
const left = Math.max(x - w * expansionFactor, 0);
const top = Math.max(y - h * expansionFactor, 0);
const width = Math.min(w + w * 2 * expansionFactor, imageWidth - left);
const height = Math.min(h + h * 2 * expansionFactor, imageHeight - top);
I'm still hacking together a book scanning script, and for now, all I need is to be able to automagically detect a page turn. The book fills up 90% of the screen (I'm using a cruddy webcam for the motion detection), so when I turn a page, the direction of motion is basically in that same direction.
I have modified a motion-tracking script, but derivatives are getting me nowhere:
#!/usr/bin/env python
import cv, numpy
class Target:
def __init__(self):
self.capture = cv.CaptureFromCAM(0)
cv.NamedWindow("Target", 1)
def run(self):
# Capture first frame to get size
frame = cv.QueryFrame(self.capture)
frame_size = cv.GetSize(frame)
grey_image = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_8U, 1)
moving_average = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_32F, 3)
difference = None
movement = []
while True:
# Capture frame from webcam
color_image = cv.QueryFrame(self.capture)
# Smooth to get rid of false positives
cv.Smooth(color_image, color_image, cv.CV_GAUSSIAN, 3, 0)
if not difference:
# Initialize
difference = cv.CloneImage(color_image)
temp = cv.CloneImage(color_image)
cv.ConvertScale(color_image, moving_average, 1.0, 0.0)
else:
cv.RunningAvg(color_image, moving_average, 0.020, None)
# Convert the scale of the moving average.
cv.ConvertScale(moving_average, temp, 1.0, 0.0)
# Minus the current frame from the moving average.
cv.AbsDiff(color_image, temp, difference)
# Convert the image to grayscale.
cv.CvtColor(difference, grey_image, cv.CV_RGB2GRAY)
# Convert the image to black and white.
cv.Threshold(grey_image, grey_image, 70, 255, cv.CV_THRESH_BINARY)
# Dilate and erode to get object blobs
cv.Dilate(grey_image, grey_image, None, 18)
cv.Erode(grey_image, grey_image, None, 10)
# Calculate movements
storage = cv.CreateMemStorage(0)
contour = cv.FindContours(grey_image, storage, cv.CV_RETR_CCOMP, cv.CV_CHAIN_APPROX_SIMPLE)
points = []
while contour:
# Draw rectangles
bound_rect = cv.BoundingRect(list(contour))
contour = contour.h_next()
pt1 = (bound_rect[0], bound_rect[1])
pt2 = (bound_rect[0] + bound_rect[2], bound_rect[1] + bound_rect[3])
points.append(pt1)
points.append(pt2)
cv.Rectangle(color_image, pt1, pt2, cv.CV_RGB(255,0,0), 1)
num_points = len(points)
if num_points:
x = 0
for point in points:
x += point[0]
x /= num_points
movement.append(x)
if len(movement) > 0 and numpy.average(numpy.diff(movement[-30:-1])) > 0:
print 'Left'
else:
print 'Right'
# Display frame to user
cv.ShowImage("Target", color_image)
# Listen for ESC or ENTER key
c = cv.WaitKey(7) % 0x100
if c == 27 or c == 10:
break
if __name__=="__main__":
t = Target()
t.run()
It detects the average motion of the average center of all of the boxes, which is extremely inefficient. How would I go about detecting such motions quickly and accurately (i.e. within a threshold)?
I'm using Python, and I plan to stick with it, as my whole framework is based on Python.
Any help is appreciated, so thank you all in advance. Cheers.
I haven't used OpenCV in Python before, just a bit in C++ with openframeworks.
For this I presume OpticalFlow's velx,vely properties would work.
For more on how Optical Flow works check out this paper.
HTH
Why don't you use cv.GoodFeaturesToTrack? It may solve the script runtime ... and shorten the code ...
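For what it's worth, here is a minimal sketch of that suggestion using the modern cv2 API (cv2.goodFeaturesToTrack plus cv2.calcOpticalFlowPyrLK); the corner parameters and the displacement threshold are assumptions to tune:
import cv2
import numpy as np

def page_turn_direction(prev_gray, curr_gray, thresh=2.0):
    # track strong corners from the previous frame into the current one
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return None
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_old = pts[status.flatten() == 1]
    good_new = nxt[status.flatten() == 1]
    if len(good_new) == 0:
        return None
    # the mean horizontal displacement gives the dominant motion direction
    dx = float(np.mean(good_new[:, 0, 0] - good_old[:, 0, 0]))
    if dx > thresh:
        return "Right"
    if dx < -thresh:
        return "Left"
    return None
Called on consecutive grayscale frames, this gives a per-frame left/right vote without any contour bookkeeping.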