I was looking for automated ways of doing some basic color corrections and I came across this blog post.
https://www.pyimagesearch.com/2021/02/15/automatic-color-correction-with-opencv-and-python/
python color_correction.py --reference ref.jpg --input input.jpg
To summarize the blog post: it identifies the Pantone color card in a given input image and modifies the image's histogram to match the colors of a reference Pantone color card that contains the true colors, so any color shift due to lighting is corrected in the input color card.
I have one query as an extension to the use case described in the blog post. The histogram matching works well between the two images cropped to the boundaries of the color cards, but it is only applied to the cropped region of the input image where the color card is present. I want to apply this histogram transformation to the entire input image, beyond the color card as well. How do I go about doing that?
Can we save the transformation from the match_histograms function and apply it to the whole image?
Edit 1:
Here is what I tried.
https://github.com/Sum-Al/color_correction
OK, here is the final working script. Thanks also to "code-lukas" for the hints.
You just need one color-optimized input image and another image that is not color optimized. Both images must contain a color card with ArUco markers
(you can glue the markers onto the corners of any card so they can be detected).
https://github.com/dazzafact/image_color_correction
Input (color optimized):
Input (not color optimized):
Color-optimized output image:
Run the script with these arguments:
python color_correction.py --reference ref.jpg --input input.jpg --output out.jpg
https://github.com/dazzafact/image_color_correction
from imutils.perspective import four_point_transform
from skimage import exposure
import numpy as np
import argparse
import imutils
import cv2
import sys
from os.path import exists
import os.path as pathfile
from PIL import Image
def find_color_card(image):
# load the ArUCo dictionary, grab the ArUCo parameters, and
# detect the markers in the input image
arucoDict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_ARUCO_ORIGINAL)
arucoParams = cv2.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv2.aruco.detectMarkers(image,
arucoDict, parameters=arucoParams)
# try to extract the coordinates of the color correction card
try:
# otherwise, we've found the four ArUco markers, so we can
# continue by flattening the ArUco IDs list
ids = ids.flatten()
# extract the top-left marker
i = np.squeeze(np.where(ids == 923))
topLeft = np.squeeze(corners[i])[0]
# extract the top-right marker
i = np.squeeze(np.where(ids == 1001))
topRight = np.squeeze(corners[i])[1]
# extract the bottom-right marker
i = np.squeeze(np.where(ids == 241))
bottomRight = np.squeeze(corners[i])[2]
# extract the bottom-left marker
i = np.squeeze(np.where(ids == 1007))
bottomLeft = np.squeeze(corners[i])[3]
# we could not find color correction card, so gracefully return
except:
return None
# build our list of reference points and apply a perspective
# transform to obtain a top-down, bird’s-eye view of the color
# matching card
cardCoords = np.array([topLeft, topRight,
bottomRight, bottomLeft])
card = four_point_transform(image, cardCoords)
# return the color matching card to the calling function
return card
def _match_cumulative_cdf_mod(source, template, full):
"""
Return modified full image array so that the cumulative density function of
source array matches the cumulative density function of the template.
"""
src_values, src_unique_indices, src_counts = np.unique(source.ravel(),
return_inverse=True,
return_counts=True)
tmpl_values, tmpl_counts = np.unique(template.ravel(), return_counts=True)
# calculate normalized quantiles for each array
src_quantiles = np.cumsum(src_counts) / source.size
tmpl_quantiles = np.cumsum(tmpl_counts) / template.size
interp_a_values = np.interp(src_quantiles, tmpl_quantiles, tmpl_values)
# Here we compute values which the channel RGB value of full image will be modified to.
interpb = []
for i in range(0, 256):
interpb.append(-1)
# first compute which values in src image transform to and mark those values.
for i in range(0, len(interp_a_values)):
frm = src_values[i]
to = interp_a_values[i]
interpb[frm] = to
# some of the pixel values might not be there in interp_a_values, interpolate those values using their
# previous and next neighbours
prev_value = -1
prev_index = -1
for i in range(0, 256):
if interpb[i] == -1:
next_index = -1
next_value = -1
            for j in range(i + 1, 256):
                if interpb[j] >= 0:
                    next_value = interpb[j]
                    next_index = j
                    break  # stop at the nearest valid next neighbour
if prev_index < 0:
interpb[i] = (i + 1) * next_value / (next_index + 1)
elif next_index < 0:
interpb[i] = prev_value + ((255 - prev_value) * (i - prev_index) / (255 - prev_index))
else:
interpb[i] = prev_value + (i - prev_index) * (next_value - prev_value) / (next_index - prev_index)
else:
prev_value = interpb[i]
prev_index = i
# finally transform pixel values in full image using interpb interpolation values.
wid = full.shape[1]
hei = full.shape[0]
ret2 = np.zeros((hei, wid))
for i in range(0, hei):
for j in range(0, wid):
ret2[i][j] = interpb[full[i][j]]
return ret2
def match_histograms_mod(inputCard, referenceCard, fullImage):
"""
    Return the modified full image by matching the histograms of the input and
    reference cards and applying that transformation to fullImage.
"""
if inputCard.ndim != referenceCard.ndim:
raise ValueError('Image and reference must have the same number '
'of channels.')
matched = np.empty(fullImage.shape, dtype=fullImage.dtype)
for channel in range(inputCard.shape[-1]):
matched_channel = _match_cumulative_cdf_mod(inputCard[..., channel], referenceCard[..., channel],
fullImage[..., channel])
matched[..., channel] = matched_channel
return matched
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-r", "--reference", required=True,
help="path to the input reference image")
ap.add_argument("-v", "--view", required=False, default=False, action='store_true',
help="Image Preview?")
ap.add_argument("-o", "--output", required=False, default=False,
help="Image Output Path")
ap.add_argument("-i", "--input", required=True,
help="path to the input image to apply color correction to")
args = vars(ap.parse_args())
# load the reference image and input images from disk
print("[INFO] loading images...")
# raw = cv2.imread(args["reference"])
# img1 = cv2.imread(args["input"])
file_exists = pathfile.isfile(args["reference"])
print(file_exists)
if not file_exists:
    print('[WARNING] Reference file does not exist: '+str(args["reference"]))
sys.exit()
raw = cv2.imread(args["reference"])
img1 = cv2.imread(args["input"])
# resize the reference and input images
#raw = imutils.resize(raw, width=301)
#img1 = imutils.resize(img1, width=301)
raw = imutils.resize(raw, width=600)
img1 = imutils.resize(img1, width=600)
# display the reference and input images to our screen
if args['view']:
cv2.imshow("Reference", raw)
cv2.imshow("Input", img1)
# find the color matching card in each image
print("[INFO] finding color matching cards...")
rawCard = find_color_card(raw)
imageCard = find_color_card(img1)
# if the color matching card is not found in either the reference
# image or the input image, gracefully exit
if rawCard is None or imageCard is None:
print("[INFO] could not find color matching card in both images")
sys.exit(0)
# show the color matching card in the reference image and input image,
# respectively
if args['view']:
cv2.imshow("Reference Color Card", rawCard)
cv2.imshow("Input Color Card", imageCard)
# apply histogram matching from the color matching card in the
# reference image to the color matching card in the input image
print("[INFO] matching images...")
# imageCard2 = exposure.match_histograms(img1, ref,
# inputCard = exposure.match_histograms(inputCard, referenceCard, multichannel=True)
result2 = match_histograms_mod(imageCard, rawCard, img1)
# (the original blog post displayed the matched card here; in this script the
# full matched image is shown below when --view is passed)
if args['view']:
cv2.imshow("result2", result2)
if args['output']:
    file_ok = args['output'].lower().endswith(('.png', '.jpg', '.jpeg', '.tiff', '.bmp', '.gif'))
if file_ok:
cv2.imwrite(args['output'], result2)
print("[SUCCESSUL] Your Image was written to: "+args['output']+"")
else:
print("[WARNING] Sorry, But this is no valid Image Name "+args['output']+"\nPlease Change Parameter!")
if args['view']:
cv2.waitKey(0)
if not args['view']:
if not args['output']:
        print('[EMPTY] You need at least one parameter: "--view" or "--output".')
If you follow the skimage tutorial, you can derive the following approach, which works with any kind of image rather than requiring a colour palette:
import matplotlib.pyplot as plt
import numpy as np
from skimage import data
from skimage import exposure
from skimage.exposure import match_histograms
reference = np.array(data.coffee(), dtype=np.uint8)
image = np.array(data.chelsea(), dtype=np.uint8)
matched = match_histograms(image, reference, channel_axis=-1)
test = match_histograms(matched, image, channel_axis=-1)
fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(8, 3), sharex=True, sharey=True)
for aa in (ax1, ax2, ax3):
aa.set_axis_off()
ax1.imshow(image)
ax1.set_title('Source')
ax2.imshow(matched)
ax2.set_title('Reference')
ax3.imshow(test)
ax3.set_title('Matched')
plt.tight_layout()
plt.show()
Which yields the following result:
You can also have a look at the corresponding histograms:
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(8, 8))
for i, img in enumerate((image, matched, test)):
for c, c_color in enumerate(('red', 'green', 'blue')):
img_hist, bins = exposure.histogram(img[..., c], source_range='dtype')
axes[c, i].plot(bins, img_hist / img_hist.max())
img_cdf, bins = exposure.cumulative_distribution(img[..., c])
axes[c, i].plot(bins, img_cdf)
axes[c, 0].set_ylabel(c_color)
axes[0, 0].set_title('Source')
axes[0, 1].set_title('Reference')
axes[0, 2].set_title('Matched')
plt.tight_layout()
plt.show()
As you can observe, the histograms of the reference and the 'matched' image look similar after the matching process.
Edit: Note that the multichannel argument is deprecated in favor of the channel_axis argument as of version 0.19. Reference
Edit 2: If you want to store this 'transformation' you have two options:
The first and straightforward method is to keep passing your reference image when applying the matching.
The alternative would be to store the relevant quantiles for each channel calculated by skimage's _match_cumulative_cdf function, which is used by match_histograms under the hood, and to apply the interpolation in the same manner as the function does (a sketch of applying the stored mapping follows the function below).
def _match_cumulative_cdf(source, template):
"""
Return modified source array so that the cumulative density function of
its values matches the cumulative density function of the template.
"""
src_values, src_unique_indices, src_counts = np.unique(source.ravel(),
return_inverse=True,
return_counts=True)
tmpl_values, tmpl_counts = np.unique(template.ravel(), return_counts=True)
# calculate normalized quantiles for each array
src_quantiles = np.cumsum(src_counts) / source.size
tmpl_quantiles = np.cumsum(tmpl_counts) / template.size
interp_a_values = np.interp(src_quantiles, tmpl_quantiles, tmpl_values)
return interp_a_values[src_unique_indices].reshape(source.shape)
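Here is a minimal sketch of that second option, assuming 8-bit images and reusing the variable names from the color-card script earlier in this thread (imageCard, rawCard, img1). The helper name build_channel_lut and the rounding/clipping choices are my own illustration, not part of skimage:
import numpy as np
import cv2

def build_channel_lut(source_card, template_card):
    """Derive a 256-entry lookup table (uint8 -> uint8) from one channel of the two card crops.

    Mirrors skimage's _match_cumulative_cdf: values that occur in the source card
    get their matched value; all other 0..255 values are filled in by interpolation.
    """
    src_values, src_counts = np.unique(source_card.ravel(), return_counts=True)
    tmpl_values, tmpl_counts = np.unique(template_card.ravel(), return_counts=True)
    src_quantiles = np.cumsum(src_counts) / source_card.size
    tmpl_quantiles = np.cumsum(tmpl_counts) / template_card.size
    matched_values = np.interp(src_quantiles, tmpl_quantiles, tmpl_values)
    # extend the mapping to every possible 8-bit input value
    lut = np.interp(np.arange(256), src_values, matched_values)
    return np.clip(np.round(lut), 0, 255).astype(np.uint8)

# hypothetical usage with the card crops and full image from the script above:
# luts = [build_channel_lut(imageCard[..., c], rawCard[..., c]) for c in range(3)]
# corrected = cv2.merge([cv2.LUT(img1[..., c], luts[c]) for c in range(3)])
Because the transformation is reduced to a 256-entry lookup table per channel, it can also be saved (for example with np.save) and re-applied later to any number of images shot under the same lighting.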
I don't have much experience with PIL. I have these images extracted from a stack of microscopy images of cells; each cell is in a mask within an image of size 30x30. I've been struggling to place these cells on a black background as close as possible to each other without overlapping.
My code is the following:
def spread_circles(circles, rad, iterations,step):
radsqr = rad**2
for i in range(iterations):
for ix,c in enumerate(circles):
vecs = c-circles
dists = np.sum((vecs)**2,axis=1)
if len(dists)>0:
push = (vecs[dists<radsqr,:].T*dists[dists<radsqr]).T
push = np.sum(push,axis=0)
pushmag = np.sum(push*push)**0.5
if pushmag>0:
push = push/pushmag*step
circles[ix]+=push
return circles
def gen_image(sample,n_iter, height=850, width = 850, max_shape=30, num_circles=150):
circles = np.random.uniform(low=max_shape,high=height-max_shape,size=(num_circles,2))
circles = spread_circles(circles, max_shape, n_iter, 1).astype(int)
img = Image.new(mode='F',size=(height,width),color=0).convert('RGBA')
final1 = Image.new("RGBA", size=(height,width))
final1.paste(img, (0,0), img)
for n,c in enumerate(circles):
foreground = sample[n]
final1.paste(foreground, (c[0],c[1]), foreground)
return final1
But it's hard to avoid overlapping if I run only a few iterations, and if I increase the iterations the cells end up too spread out, like this:
What I want is something similar to what's inside the red circles that I drew:
I need them as close together as they can get, almost like tiles. How can I do that?
I have started thinking about this and have got a couple of strategies implemented. Anyone else fancying some fun is more than welcome to borrow, steal, appropriate or hack any chunks of my code that they can use! I'll probably play some more tomorrow.
#!/usr/bin/env python3
from PIL import Image, ImageOps
import numpy as np
from glob import glob
import math
def checkCoverage(im):
"""Determines percentage of image that is cells rather than background"""
N = np.count_nonzero(im)
return N * 100 / im.size
def loadImages():
"""Load all cell images in current directory into list of trimmed Numpy arrays"""
images = []
for filename in glob('*.png'):
# Open and convert to greyscale
im = Image.open(filename).convert('L')
# Trim to bounding box
im = im.crop(im.getbbox())
images.append(np.array(im))
return images
def Strategy1():
"""Get largest image and pad all images to that size - at least it will tesselate perfectly"""
images = loadImages()
N = len(images)
# Find height of tallest image and width of widest image
maxh = max(im.shape[0] for im in images)
maxw = max(im.shape[1] for im in images)
# Determine how many images we will pack across and down the output image - could be improved
Nx = int(math.sqrt(N))+1
Ny = int(N/Nx)+1
print(f'Padding {N} images each to height:{maxh} x width:{maxw}')
# Create output image
res = Image.new('L', (Nx*maxw,Ny*maxh), color=0)
# Pack all images from list onto regular grid
x, y = 0, 0
for im in images:
this = Image.fromarray(im)
h, w = im.shape
# Pack this image into top-left of its grid-cell, unless
# a) in first row, in which case pack to bottom
# b) in first col, in which case pack to right
thisx = x*maxw
thisy = y*maxh
if y==0:
thisy += maxh - h
if x==0:
thisx += maxw - w
res.paste(this, (thisx,thisy))
x += 1
if x==Nx:
x = 0
y += 1
# Trim extraneous black edges
res = res.crop(res.getbbox())
# Save as JPEG so we don't find it as a PNG in next strategy
res.save('strategy1.jpg')
cov = checkCoverage(np.array(res))
print(f'Strategy1 coverage: {cov}')
def Strategy2():
"""Rotate all images to portrait (tall rather than wide) and order by height so we tend to stack equal height images side-by-side"""
tmp = loadImages()
# Recreate list with all images in portrait format, i.e. tall
portrait = []
for im in tmp:
if im.shape[0] >= im.shape[1]:
# Already portrait, add as-is
portrait.append(im)
else:
# Landscape, so rotate
portrait.append(np.rot90(im))
images = sorted(portrait, key=lambda x: x.shape[0], reverse=True)
N = len(images)
maxh, maxw = 31, 31
# Determine how many images we will pack across and down the output image
Nx = int(math.sqrt(N))+1
Ny = int(N/Nx)+1
print(f'Packing images by height')
# Create output image
resw, resh = Nx*maxw, Ny*maxh
res = Image.new('L', (resw,resh), color=0)
# Pack all from list
xpos, ypos = 0, 0
# Pack first row L->R, second row R->L and alternate
packToRight = True
for im in images:
thisw, thish = im.shape
this = Image.fromarray(im)
if packToRight:
if xpos+thisw < resw:
# If it fits to the right, pack it there
res.paste(this,(xpos,ypos))
xpos += thisw
else:
# Else start a new row, pack at right end and continue packing to left
packToRight = False
res.paste(this,(resw-thisw,ypos))
ypos = res.getbbox()[3]
else:
if xpos>thisw:
# If it fits to the left, pack it there
res.paste(this,(xpos-thisw,ypos))
xpos -= thisw
else:
# Else start a new row, pack at left end and continue packing to right
ypos = res.getbbox()[3]
packToRight = True
res.paste(this,(0,ypos))
# Trim any black edges
res = res.crop(res.getbbox())
# Save as JPEG so we don't find it as a PNG in next strategy
res.save('strategy2.jpg')
cov = checkCoverage(np.array(res))
print(f'Strategy2 coverage: {cov}')
Strategy1()
Strategy2()
Strategy1 gives this at 42% coverage:
Strategy2 gives this at 64% coverage:
I wrote a small script in Python where I'm trying to extract or crop the part of a playing card that represents the artwork only, removing all the rest. I've been trying various methods of thresholding but couldn't get there. Also note that I can't simply record the position of the artwork manually, because it's not always in the same position or size; it is, however, always a rectangular shape where everything else is just text and borders.
from matplotlib import pyplot as plt
import numpy as np
import cv2
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret,binary = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
binary = cv2.bitwise_not(binary)
kernel = np.ones((15, 15), np.uint8)
closing = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
plt.imshow(closing),plt.show()
The current output is the closest thing I could get. I may be on the right track and could try some further wrangling to draw a rectangle around the white parts, but I don't think that's a sustainable method:
As a last note, see the cards below: not all frames are exactly the same size or in the same position, but there is always a piece of artwork with only text and borders around it. It doesn't have to be cut super precisely, but the art is clearly a "region" of the card, surrounded by other regions containing some text. My goal is to capture the region of the artwork as well as I can.
I used the Hough line transform to detect the linear parts of the image.
The crossings of all lines were used to construct all possible rectangles that do not contain other crossing points.
Since the part of the card you are looking for is always the biggest of those rectangles (at least in the samples you provided), I simply chose the biggest of those rectangles as the winner.
The script works without user interaction.
import cv2
import numpy as np
from collections import defaultdict
def segment_by_angle_kmeans(lines, k=2, **kwargs):
#Groups lines based on angle with k-means.
#Uses k-means on the coordinates of the angle on the unit circle
#to segment `k` angles inside `lines`.
# Define criteria = (type, max_iter, epsilon)
default_criteria_type = cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER
criteria = kwargs.get('criteria', (default_criteria_type, 10, 1.0))
flags = kwargs.get('flags', cv2.KMEANS_RANDOM_CENTERS)
attempts = kwargs.get('attempts', 10)
# returns angles in [0, pi] in radians
angles = np.array([line[0][1] for line in lines])
# multiply the angles by two and find coordinates of that angle
pts = np.array([[np.cos(2*angle), np.sin(2*angle)]
for angle in angles], dtype=np.float32)
# run kmeans on the coords
labels, centers = cv2.kmeans(pts, k, None, criteria, attempts, flags)[1:]
labels = labels.reshape(-1) # transpose to row vec
# segment lines based on their kmeans label
segmented = defaultdict(list)
for i, line in zip(range(len(lines)), lines):
segmented[labels[i]].append(line)
segmented = list(segmented.values())
return segmented
def intersection(line1, line2):
#Finds the intersection of two lines given in Hesse normal form.
#Returns closest integer pixel locations.
#See https://stackoverflow.com/a/383527/5087436
rho1, theta1 = line1[0]
rho2, theta2 = line2[0]
A = np.array([
[np.cos(theta1), np.sin(theta1)],
[np.cos(theta2), np.sin(theta2)]
])
b = np.array([[rho1], [rho2]])
x0, y0 = np.linalg.solve(A, b)
x0, y0 = int(np.round(x0)), int(np.round(y0))
return [[x0, y0]]
def segmented_intersections(lines):
#Finds the intersections between groups of lines.
intersections = []
for i, group in enumerate(lines[:-1]):
for next_group in lines[i+1:]:
for line1 in group:
for line2 in next_group:
intersections.append(intersection(line1, line2))
return intersections
def rect_from_crossings(crossings):
#find all rectangles without other points inside
rectangles = []
# Search all possible rectangles
for i in range(len(crossings)):
x1= int(crossings[i][0][0])
y1= int(crossings[i][0][1])
for j in range(len(crossings)):
x2= int(crossings[j][0][0])
y2= int(crossings[j][0][1])
#Search all points
flag = 1
for k in range(len(crossings)):
x3= int(crossings[k][0][0])
y3= int(crossings[k][0][1])
#Dont count double (reverse rectangles)
if (x1 > x2 or y1 > y2):
flag = 0
#Dont count rectangles with points inside
elif ((((x3 >= x1) and (x2 >= x3))and (y3 > y1) and (y2 > y3) or ((x3 > x1) and (x2 > x3))and (y3 >= y1) and (y2 >= y3))):
if(i!=k and j!=k):
flag = 0
if flag:
rectangles.append([[x1,y1],[x2,y2]])
return rectangles
if __name__ == '__main__':
#img = cv2.imread('TAJFp.jpg')
#img = cv2.imread('Bj2uu.jpg')
img = cv2.imread('yi8db.png')
width = int(img.shape[1])
height = int(img.shape[0])
scale = 380/width
dim = (int(width*scale), int(height*scale))
# resize image
img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
img2 = img.copy()
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(5,5),cv2.BORDER_DEFAULT)
# Parameters of Canny and Hough may have to be tweaked to work for as many cards as possible
edges = cv2.Canny(gray,10,45,apertureSize = 7)
lines = cv2.HoughLines(edges,1,np.pi/90,160)
segmented = segment_by_angle_kmeans(lines)
crossings = segmented_intersections(segmented)
rectangles = rect_from_crossings(crossings)
#Find biggest remaining rectangle
size = 0
for i in range(len(rectangles)):
x1 = rectangles[i][0][0]
x2 = rectangles[i][1][0]
y1 = rectangles[i][0][1]
y2 = rectangles[i][1][1]
if(size < (abs(x1-x2)*abs(y1-y2))):
size = abs(x1-x2)*abs(y1-y2)
x1_rect = x1
x2_rect = x2
y1_rect = y1
y2_rect = y2
cv2.rectangle(img2, (x1_rect,y1_rect), (x2_rect,y2_rect), (0,0,255), 2)
roi = img[y1_rect:y2_rect, x1_rect:x2_rect]
cv2.imshow("Output",roi)
cv2.imwrite("Output.png", roi)
cv2.waitKey()
These are the results with the samples you provided:
The code for finding line crossings can be found here: find intersection point of two lines drawn using houghlines opencv
You can read more about Hough Lines here.
We know that cards have straight boundaries along the x and y axes. We can use this to extract parts of the image. The following code implements detecting horizontal and vertical lines in the image.
import cv2
import numpy as np
def mouse_callback(event, x, y, flags, params):
global num_click
if num_click < 2 and event == cv2.EVENT_LBUTTONDOWN:
num_click = num_click + 1
print(num_click)
global upper_bound, lower_bound, left_bound, right_bound
upper_bound.append(max(i for i in hor if i < y) + 1)
lower_bound.append(min(i for i in hor if i > y) - 1)
left_bound.append(max(i for i in ver if i < x) + 1)
right_bound.append(min(i for i in ver if i > x) - 1)
filename = 'image.png'
thr = 100 # edge detection threshold
lined = 50 # number of consecutive True pixels required for an axis to be counted as a line
num_click = 0 # select only twice
upper_bound, lower_bound, left_bound, right_bound = [], [], [], []
winname = 'img'
cv2.namedWindow(winname)
cv2.setMouseCallback(winname, mouse_callback)
img = cv2.imread(filename, 1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.Canny(gray, thr, 3*thr)
height, width, _ = img.shape
# find horizontal lines
hor = []
for i in range (0, height-1):
count = 0
for j in range (0, width-1):
if bw[i,j]:
count = count + 1
else:
count = 0
if count >= lined:
hor.append(i)
break
# find vertical lines
ver = []
for j in range (0, width-1):
count = 0
for i in range (0, height-1):
if bw[i,j]:
count = count + 1
else:
count = 0
if count >= lined:
ver.append(j)
break
# draw lines
disp_img = np.copy(img)
for i in hor:
cv2.line(disp_img, (0, i), (width-1, i), (0,0,255), 1)
for i in ver:
cv2.line(disp_img, (i, 0), (i, height-1), (0,0,255), 1)
while num_click < 2:
cv2.imshow(winname, disp_img)
cv2.waitKey(10)
disp_img = img[min(upper_bound):max(lower_bound), min(left_bound):max(right_bound)]
cv2.imshow(winname, disp_img)
cv2.waitKey() # Press any key to exit
cv2.destroyAllWindows()
You just need to click two areas to include. A sample click area and the corresponding result are as follows:
Results from other images:
I don't think it is possible to automatically crop the artwork ROI using traditional image processing techniques due to the dynamic nature of the colors, dimensions, locations, and textures for each card. You would have to look into machine/deep learning and train your own classifier if you want to do it automatically. Instead, here's a manual approach to select and crop a static ROI from an image.
The idea is to use cv2.setMouseCallback() and event handlers to detect if the mouse has been clicked or released. For this implementation, you can extract the artwork ROI by holding down the left mouse button and dragging to select the desired ROI. Once you have selected the desired ROI, press c to crop and save the ROI. You can reset the ROI using the right mouse button.
Saved artwork ROIs
Code
import cv2
class ExtractArtworkROI(object):
def __init__(self):
# Load image
self.original_image = cv2.imread('1.png')
self.clone = self.original_image.copy()
cv2.namedWindow('image')
cv2.setMouseCallback('image', self.extractROI)
self.selected_ROI = False
# ROI bounding box reference points
self.image_coordinates = []
def extractROI(self, event, x, y, flags, parameters):
# Record starting (x,y) coordinates on left mouse button click
if event == cv2.EVENT_LBUTTONDOWN:
self.image_coordinates = [(x,y)]
        # Record ending (x,y) coordinates on left mouse button release
elif event == cv2.EVENT_LBUTTONUP:
# Remove old bounding box
if self.selected_ROI:
self.clone = self.original_image.copy()
# Draw rectangle
self.selected_ROI = True
self.image_coordinates.append((x,y))
cv2.rectangle(self.clone, self.image_coordinates[0], self.image_coordinates[1], (36,255,12), 2)
print('top left: {}, bottom right: {}'.format(self.image_coordinates[0], self.image_coordinates[1]))
print('x,y,w,h : ({}, {}, {}, {})'.format(self.image_coordinates[0][0], self.image_coordinates[0][1], self.image_coordinates[1][0] - self.image_coordinates[0][0], self.image_coordinates[1][1] - self.image_coordinates[0][1]))
# Clear drawing boxes on right mouse button click
elif event == cv2.EVENT_RBUTTONDOWN:
self.selected_ROI = False
self.clone = self.original_image.copy()
def show_image(self):
return self.clone
def crop_ROI(self):
if self.selected_ROI:
x1 = self.image_coordinates[0][0]
y1 = self.image_coordinates[0][1]
x2 = self.image_coordinates[1][0]
y2 = self.image_coordinates[1][1]
# Extract ROI
self.cropped_image = self.original_image.copy()[y1:y2, x1:x2]
# Display and save image
cv2.imshow('Cropped Image', self.cropped_image)
cv2.imwrite('ROI.png', self.cropped_image)
else:
print('Select ROI before cropping!')
if __name__ == '__main__':
extractArtworkROI = ExtractArtworkROI()
while True:
cv2.imshow('image', extractArtworkROI.show_image())
key = cv2.waitKey(1)
# Close program with keyboard 'q'
if key == ord('q'):
cv2.destroyAllWindows()
exit(1)
# Crop ROI
if key == ord('c'):
extractArtworkROI.crop_ROI()
Given the coordinates of four arbitrary points in an image (which are guaranteed to form a rectangle), I want to extract the patch that they represent and get a vectorized (flat) representation of the same. How can I do this?
I saw the answer to this question and using it I am able to reach to the patch that I require. For example, given the image coordinates of the 4 corners of the green rectangle in this image:
I am able to get to the patch and get something like:
using the following code:
p1 = (334,128)
p2 = (438,189)
p3 = (396,261)
p4 = (292,200)
pts = np.array([p1, p2, p3, p4])
mask = np.zeros((img.shape[0], img.shape[1]))
cv2.fillConvexPoly(mask, pts, 1)
mask = mask.astype(bool)
out = np.zeros_like(img)
out[mask] = img[mask]
patch = img[mask]
cv2.imwrite(img_name, out)
However, the problem is that the patch variable I obtain is simply a flat array of all the pixels of the image that belong to the patch, in the row-major order in which the image is stored.
What I want is for the patch variable to contain the pixels in an order that forms a genuine image, so that I can perform operations on it. Is there an OpenCV function I should be aware of that would help me do this?
Thanks!
This is how you can implement this:
Code:
import math

# create a subimage with the outer limits of the points
subimg = out[128:261,292:438]
# calculate the angle between the 2 'lowest' points, the 'bottom' line
myradians = math.atan2(p3[0]-p4[0], p3[1]-p4[1])
# convert to degrees
mydegrees = 90-math.degrees(myradians)
# create rotationmatrix
h,w = subimg.shape[:2]
center = (w/2, h/2)  # getRotationMatrix2D expects the center as (x, y)
M = cv2.getRotationMatrix2D(center, mydegrees, 1)
# rotate subimage
rotatedImg = cv2.warpAffine(subimg, M, (w, h))  # dsize is (width, height)
Result:
Next, the black areas in the image can be easily cropped by removing all rows/columns that are 100% black.
Final result:
Code:
# convert image to grayscale
img = cv2.cvtColor(rotatedImg, cv2.COLOR_BGR2GRAY)
# sum each row and each column of the image
sumOfCols = np.sum(img, axis=0)
sumOfRows = np.sum(img, axis=1)
# Find the first and last row / column that has a sum value greater than zero,
# which means its not all black. Store the found values in variables
for i in range(len(sumOfCols)):
if sumOfCols[i] > 0:
x1 = i
print('First col: ' + str(i))
break
for i in range(len(sumOfCols)-1,-1,-1):
if sumOfCols[i] > 0:
x2 = i
print('Last col: ' + str(i))
break
for i in range(len(sumOfRows)):
if sumOfRows[i] > 0:
y1 = i
print('First row: ' + str(i))
break
for i in range(len(sumOfRows)-1,-1,-1):
if sumOfRows[i] > 0:
y2 = i
print('Last row: ' + str(i))
break
# create a new image based on the found values
finalImage = rotatedImg[y1:y2,x1:x2]
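As a side note (not part of the answer above): since the four corner points are already known, a single perspective warp can produce the upright patch directly, without the separate rotation and black-border cropping steps. This is essentially what imutils.perspective.four_point_transform (used in the color-card script earlier in this thread) does. Below is a rough self-contained sketch; the corner-ordering heuristic and the helper name extract_patch are my own assumptions:
import numpy as np
import cv2

def extract_patch(img, pts):
    """Warp the quadrilateral given by pts (4 points, any order) to an upright patch."""
    pts = np.asarray(pts, dtype=np.float32)
    # order the corners: top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()           # y - x for each point
    tl, br = pts[np.argmin(s)], pts[np.argmax(s)]
    tr, bl = pts[np.argmin(d)], pts[np.argmax(d)]
    # output size taken from the longer of each pair of opposite edges
    w = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    h = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    src = np.array([tl, tr, br, bl], dtype=np.float32)
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, M, (w, h))

# hypothetical usage with the points from the question:
# patch = extract_patch(img, [(334, 128), (438, 189), (396, 261), (292, 200)])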
I'm still hacking together a book scanning script, and for now, all I need is to be able to automagically detect a page turn. The book fills up 90% of the screen (I'm using a cruddy webcam for the motion detection), so when I turn a page, the direction of motion is basically in that same direction.
I have modified a motion-tracking script, but derivatives are getting me nowhere:
#!/usr/bin/env python
import cv, numpy
class Target:
def __init__(self):
self.capture = cv.CaptureFromCAM(0)
cv.NamedWindow("Target", 1)
def run(self):
# Capture first frame to get size
frame = cv.QueryFrame(self.capture)
frame_size = cv.GetSize(frame)
grey_image = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_8U, 1)
moving_average = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_32F, 3)
difference = None
movement = []
while True:
# Capture frame from webcam
color_image = cv.QueryFrame(self.capture)
# Smooth to get rid of false positives
cv.Smooth(color_image, color_image, cv.CV_GAUSSIAN, 3, 0)
if not difference:
# Initialize
difference = cv.CloneImage(color_image)
temp = cv.CloneImage(color_image)
cv.ConvertScale(color_image, moving_average, 1.0, 0.0)
else:
cv.RunningAvg(color_image, moving_average, 0.020, None)
# Convert the scale of the moving average.
cv.ConvertScale(moving_average, temp, 1.0, 0.0)
# Minus the current frame from the moving average.
cv.AbsDiff(color_image, temp, difference)
# Convert the image to grayscale.
cv.CvtColor(difference, grey_image, cv.CV_RGB2GRAY)
# Convert the image to black and white.
cv.Threshold(grey_image, grey_image, 70, 255, cv.CV_THRESH_BINARY)
# Dilate and erode to get object blobs
cv.Dilate(grey_image, grey_image, None, 18)
cv.Erode(grey_image, grey_image, None, 10)
# Calculate movements
storage = cv.CreateMemStorage(0)
contour = cv.FindContours(grey_image, storage, cv.CV_RETR_CCOMP, cv.CV_CHAIN_APPROX_SIMPLE)
points = []
while contour:
# Draw rectangles
bound_rect = cv.BoundingRect(list(contour))
contour = contour.h_next()
pt1 = (bound_rect[0], bound_rect[1])
pt2 = (bound_rect[0] + bound_rect[2], bound_rect[1] + bound_rect[3])
points.append(pt1)
points.append(pt2)
cv.Rectangle(color_image, pt1, pt2, cv.CV_RGB(255,0,0), 1)
num_points = len(points)
if num_points:
x = 0
for point in points:
x += point[0]
x /= num_points
movement.append(x)
if len(movement) > 0 and numpy.average(numpy.diff(movement[-30:-1])) > 0:
print 'Left'
else:
print 'Right'
# Display frame to user
cv.ShowImage("Target", color_image)
# Listen for ESC or ENTER key
c = cv.WaitKey(7) % 0x100
if c == 27 or c == 10:
break
if __name__=="__main__":
t = Target()
t.run()
It detects the average motion of the average center of all of the boxes, which is extremely inefficient. How would I go about detecting such motions quickly and accurately (i.e. within a threshold)?
I'm using Python, and I plan to stick with it, as my whole framework is based on Python.
Any help is appreciated, so thank you all in advance. Cheers.
I haven't used OpenCV in Python before, just a bit in C++ with openframeworks.
For this I presume OpticalFlow's velx,vely properties would work.
For more on how Optical Flow works check out this paper.
HTH
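As a rough sketch of that idea with the modern cv2 API (my own illustration, not the answerer's code): compute dense optical flow between consecutive grayscale frames and look at the sign of the mean horizontal component. The 1.0-pixel threshold is an arbitrary assumption, and which sign means 'Left' or 'Right' depends on the camera orientation, so swap the labels if needed.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
if not ok:
    raise RuntimeError("could not read from the webcam")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # dense optical flow (Farneback); flow[..., 0] is the horizontal component
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_dx = float(np.mean(flow[..., 0]))
    if mean_dx > 1.0:        # threshold in pixels, chosen arbitrarily
        print('Right')
    elif mean_dx < -1.0:
        print('Left')
    prev_gray = gray
    cv2.imshow("frame", frame)
    if cv2.waitKey(7) & 0xFF == 27:  # ESC to quit
        break

cap.release()
cv2.destroyAllWindows()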
Why don't you use cv.GoodFeaturesToTrack? It may solve the script runtime issue... and shorten the code...
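A possible sketch of that suggestion using the modern cv2 names (cv2.goodFeaturesToTrack plus Lucas-Kanade tracking). The parameter values and the 2-pixel threshold are assumptions for illustration, and again the Left/Right labels may need swapping for a particular camera setup:
import cv2
import numpy as np

def page_turn_direction(prev_gray, gray, threshold=2.0):
    """Track corner features between two grayscale frames and return the
    dominant horizontal motion direction ('Left', 'Right' or None)."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=10)
    if p0 is None:
        return None
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good = status.ravel() == 1
    if not good.any():
        return None
    dx = p1[good, 0, 0] - p0[good, 0, 0]   # horizontal displacement of tracked points
    shift = float(np.median(dx))           # median is robust to a few bad tracks
    if shift > threshold:
        return 'Right'
    if shift < -threshold:
        return 'Left'
    return None

# hypothetical usage inside the existing capture loop:
# direction = page_turn_direction(prev_gray, gray)
# if direction:
#     print(direction)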