Consider the following image:
and the following bounding contour (a smoothed version of the output of a text-detection neural network on the above image), so this contour is a given:
I need to warp both images so that I end up with a straight enough text line that can be fed to a text recognition neural network:
using a piecewise affine transformation or some other method, ideally with an implementation (or at least the key points of one) in Python.
I know how to find the medial axis, order its points, simplify it (e.g. using the Douglas-Peucker algorithm), and find the corresponding points on a straight line.
EDIT: the question can be rephrased, naively, as follows:
Have you tried the "puppet warp" feature in Adobe Photoshop? You specify "joint" points on an image and move them to the desired places to warp the image. We can compute the source points from a simplified medial axis (e.g. 20 points instead of 200) and the corresponding target points on a straight line. How do we perform a piecewise affine transformation using these two sets of points (source and target)?
EDIT: modified the images, my bad
Papers
Here's a paper that produces the desired result:
A Novel Technique for Unwarping Curved Handwritten Texts Using Mathematical Morphology and Piecewise Affine Transformation
Another paper: A novel method for straightening curved text-lines in stylistic documents
Similar questions:
Straighten B-Spline
Challenge : Curved text extraction using python
How to convert curves in images to lines in Python?
Deforming an image so that curved lines become straight lines
Straightening a curved contour
The full code is also available in this notebook; Runtime -> Run all reproduces the result.
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from scipy import interpolate
from scipy.spatial import distance
from shapely.geometry import LineString, GeometryCollection, MultiPoint
from skimage.morphology import skeletonize
from sklearn.decomposition import PCA
from warp import PiecewiseAffineTransform # https://raw.githubusercontent.com/TimSC/image-piecewise-affine/master/warp.py
# Helper functions
def extendline(line, length):
a = line[0]
b = line[1]
lenab = distance.euclidean(a, b)
cx = b[0] + ((b[0] - a[0]) / lenab * length)
cy = b[1] + ((b[1] - a[1]) / lenab * length)
return [cx, cy]
def XYclean(x, y):
xy = np.concatenate((x.reshape(-1, 1), y.reshape(-1, 1)), axis=1)
# make PCA object
pca = PCA(2)
# fit on data
pca.fit(xy)
# transform into pca space
xypca = pca.transform(xy)
newx = xypca[:, 0]
newy = xypca[:, 1]
    # sort along the principal axis so the points are ordered along the text line
    indexSort = np.argsort(newx)
    newx = newx[indexSort]
    newy = newy[indexSort]
# add some more points (optional)
f = interpolate.interp1d(newx, newy, kind='linear')
newX = np.linspace(np.min(newx), np.max(newx), 100)
newY = f(newX)
# #smooth with a filter (optional)
# window = 43
# newY = savgol_filter(newY, window, 2)
# return back to old coordinates
xyclean = pca.inverse_transform(np.concatenate((newX.reshape(-1, 1), newY.reshape(-1, 1)), axis=1))
xc = xyclean[:, 0]
yc = xyclean[:, 1]
return np.hstack((xc.reshape(-1, 1), yc.reshape(-1, 1))).astype(int)
def contour2skeleton(cnt):
x, y, w, h = cv2.boundingRect(cnt)
cnt_trans = cnt - [x, y]
    bim = np.zeros((h, w), dtype=np.uint8)
    bim = cv2.drawContours(bim, [cnt_trans], -1, color=255, thickness=cv2.FILLED)
    sk = skeletonize(bim > 0)
#####
    skeleton_yx = np.argwhere(sk > 0)
    skeleton_xy = np.flip(skeleton_yx, axis=1)  # swap (row, col) into (x, y)
    xx, yy = skeleton_xy[:, 0], skeleton_xy[:, 1]
skeleton_xy = XYclean(xx, yy)
skeleton_xy = skeleton_xy + [x, y]
return skeleton_xy
mm = cv2.imread('cont.png', cv2.IMREAD_GRAYSCALE)
plt.imshow(mm)
cnts, _ = cv2.findContours(mm.astype('uint8'), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cont = cnts[0].reshape(-1, 2)
# find skeleton
sk = contour2skeleton(cont)
mm = np.zeros_like(mm)
cv2.polylines(mm, [sk], False, 255, 2)
plt.imshow(mm)
# simplify the skeleton
ln = LineString(sk).simplify(2)
sk_simp = np.int0(ln.coords)
mm = np.zeros_like(mm)
for pt in sk_simp:
cv2.circle(mm, pt, 5, 255, -1)
plt.imshow(mm)
# extend both ends of the skeleton
print(len(sk_simp))
a, b = sk_simp[1], sk_simp[0]
c1 = np.int0(extendline([a, b], 50))
sk_simp = np.vstack([c1, sk_simp])
a, b = sk_simp[-2], sk_simp[-1]
c2 = np.int0(extendline([a, b], 50))
sk_simp = np.vstack([sk_simp, c2])
print(len(sk_simp))
cv2.circle(mm, c1, 10, 255, -1)
cv2.circle(mm, c2, 10, 255, -1)
plt.imshow(mm)
########
# find the target points
########
pts1 = sk_simp.copy()
dists = [distance.euclidean(p1, p2) for p1, p2 in zip(pts1[:-1], pts1[1:])]
zip1 = list(zip(pts1[:-1], dists))
# find the first 2 target points
a = pts1[0]
b = a - (dists[0], 0)
pts2 = [a, b, ]
for z in zip1[1:]:
lastpt = pts2[-1]
pt, dst = z
ln = [a, lastpt]
c = extendline(ln, dst)
pts2.append(c)
pts2 = np.int0(pts2)
ln1 = LineString(pts1)
ln2 = LineString(pts2)
GeometryCollection([ln1.buffer(5), ln2.buffer(5),
MultiPoint(pts2), MultiPoint(pts1)])
########
# create translated copies of the source and target points
# (the offset of 50 is arbitrary)
pts1 = np.vstack([pts1 + [0, 50], pts1 + [0, -50]])
pts2 = np.vstack([pts2 + [0, 50], pts2 + [0, -50]])
MultiPoint(pts1)
########
# performing the warping
im = Image.open('orig.png')
dstIm = Image.new(im.mode, im.size, color=(255, 255, 255))
# Perform transform
PiecewiseAffineTransform(im, pts1, dstIm, pts2)
plt.figure(figsize=(10, 10))
plt.imshow(dstIm)
1- Find the medial axis, e.g. using skimage.morphology.skeletonize, and simplify it, e.g. using shapely's object.simplify (I used a tolerance of 2). The medial axis points are in white:
2- Find the corresponding points on a straight line, using the distance between each point and the next:
3- Also add extra points at the ends, colored blue, so that the points cover the entire contour length:
4- Create 2 copies of the source and target points, one translated up and the other translated down (I chose an offset of 50 here), so the source points now look like this. Note that a simple upward/downward displacement may not be the best approach for all contours, e.g. if the contour curves by more than 45 degrees:
5- Using the code here, perform the PiecewiseAffineTransform using the source and target points. Here's the result; it's straight enough:
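As a side note, a similar warp can be done without the external warp.py by using skimage.transform.PiecewiseAffineTransform. A minimal sketch, assuming pts1/pts2 are the padded source/target arrays from step 4 (skimage's warp treats the given transform as the output-to-input mapping, hence the estimation direction):
import numpy as np
from PIL import Image
from skimage.transform import PiecewiseAffineTransform, warp

img = np.array(Image.open('orig.png'))
tform = PiecewiseAffineTransform()
# estimate the mapping from the straight target points back to the
# curved source points, since warp() uses the transform as the inverse map
tform.estimate(pts2.astype(float), pts1.astype(float))
straightened = warp(img, tform, output_shape=img.shape[:2], cval=1.0)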
If the goal is to just unshift each column, then:
import numpy as np
from PIL import Image
source_img = Image.open("73614379-input-v2.png")
contour_img = Image.open("73614379-map-v3.png").convert("L")
assert source_img.size == contour_img.size
contour_arr = np.array(contour_img) != 0 # convert to boolean array
col_offsets = np.argmax(
contour_arr, axis=0
) # find the first non-zero row for each column
assert len(col_offsets) == source_img.size[0] # sanity check
min_nonzero_col_offset = np.min(
col_offsets[col_offsets > 0]
) # find the minimum non-zero row
target_img = Image.new("RGB", source_img.size, (255, 255, 255))
for x, col_offset in enumerate(col_offsets):
offset = col_offset - min_nonzero_col_offset if col_offset > 0 else 0
target_img.paste(
source_img.crop((x, offset, x + 1, source_img.size[1])), (x, 0)
)
target_img.save("unshifted3.png")
With the new input and the new contour from the OP, this outputs the following image:
I have a stationary camera that rapidly takes photos of a continuously moving product from a fixed position and angle (a translation-only perspective). I need to stitch all the images into a panoramic picture. I've tried using the Stitcher class. It worked, but it took a long time to compute.
I also tried another method using the SIFT detector and the FLANN-based matcher, finding the homography and then warping the images. This method works fine if I only use two images. For multiple images it still doesn't stitch them properly. Does anyone know the best and fastest image stitching algorithm for this case?
This is my code which uses the Stitcher class.
import time
import cv2
import os
import numpy as np
import sys
def main():
    # read input images
    imgs = []
    path = 'pics_rotated/'
    for (root, dirs, files) in os.walk(path):
        images = [f for f in files]
        print(images)
        for i in range(0, len(images)):
            curImg = cv2.imread(path + images[i])
            imgs.append(curImg)

    stitcher = cv2.Stitcher.create(mode=0)
    status, result = stitcher.stitch(imgs)
    if status != cv2.Stitcher_OK:
        print("Can't stitch images, error code = %d" % status)
        sys.exit(-1)
    cv2.imwrite("imagesout/output.jpg", result)

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print("Time --->>>>>", end - start)
    cv2.destroyAllWindows()
Briefing
Although the OpenCV Stitcher class provides lots of methods and options to perform stitching, I find it hard to use because of its complexity.
Therefore, I will try to provide the minimal and fastest way to perform stitching.
If you are curious about more sophisticated approaches, such as exposure compensation, I highly recommend looking at the detailed sample code.
As a side note, I would be grateful if someone could convert the following functions to use the Stitcher class.
Introduction
In order to combine multiple images into the same perspective, the following operations are needed:
Detect and match features.
Compute homography (perspective transform between frames).
Warp one image onto the other perspective.
Combine the base and warped images while keeping track of the shift in origin.
Given the combination pattern, stitch multiple images.
Feature detection and matching
What are features?
They are distinguishable parts, like corners of a square, that are preserved across images.
There are different algorithms proposed for obtaining these characteristic points, like Harris, ORB, SIFT, SURF, etc.
See cv::Feature2d for the full list.
I will use SIFT because it is accurate and sufficiently fast.
A feature consists of a KeyPoint, which is the location in the image, and a descriptor, which is a set of numbers (e.g. a 128-D vector) that represents the properties of the feature.
After finding distinct points in images, we need to match the corresponding point pairs.
See cv::DescriptorMatcher.
I will use the FLANN-based descriptor matcher.
First, we initialize the descriptor and matcher classes.
descriptor = cv.SIFT.create()
matcher = cv.DescriptorMatcher.create(cv.DescriptorMatcher.FLANNBASED)
Then, we find the features in each image.
(kps, desc) = descriptor.detectAndCompute(image, mask=None)
Now we find the corresponding point pairs.
if desc1 is not None and desc2 is not None and len(desc1) >= 2 and len(desc2) >= 2:
    rawMatch = matcher.knnMatch(desc2, desc1, k=2)
matches = []
# ensure the distance is within a certain ratio of each other (i.e. Lowe's ratio test)
ratio = 0.75
for m in rawMatch:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
        matches.append((m[0].trainIdx, m[0].queryIdx))
Homography computation
Homography is the perspective transformation from one view to another.
Parallel lines in one view may not be parallel in another, like a road receding toward the sunset.
We need to have at least 4 corresponding point pairs.
More pairs than that mean redundant data, which has to be reconciled or eliminated.
The homography matrix transforms a point in the initial view to its warped position.
It is a 3x3 matrix that is computed by the Direct Linear Transform algorithm.
There are 8 DoF and the last element in the matrix is 1.
[pt2] = H * [pt1]
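For instance, applying H to a single point means multiplying in homogeneous coordinates and then dividing by the last component. A small numeric sketch with an arbitrary example matrix:
import numpy as np

H = np.array([[1.0, 0.1, 5.0],
              [0.0, 1.2, 3.0],
              [0.001, 0.0, 1.0]])  # arbitrary example homography
pt1 = np.array([10.0, 20.0, 1.0])  # point (10, 20) in homogeneous coordinates
pt2_h = H @ pt1
pt2 = pt2_h[:2] / pt2_h[2]         # perspective divide back to (x, y)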
Now that we have corresponding point matches, we compute the homography.
The method we use to handle redundant data is RANSAC, which randomly selects 4 point pairs and uses the best fitting result.
See cv::findHomography for more options.
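The pts1 and pts2 arrays used below are the pixel coordinates of the matched keypoints. A sketch of how they can be assembled, assuming kps1/kps2 come from detectAndCompute on the two images:
import numpy as np

# matches holds (trainIdx, queryIdx) pairs; trainIdx indexes kps1 and
# queryIdx indexes kps2, because we called knnMatch(desc2, desc1)
pts1 = np.float32([kps1[i].pt for (i, _) in matches])
pts2 = np.float32([kps2[j].pt for (_, j) in matches])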
if len(matches) >= 4:
(H, status) = cv.findHomography(pts1, pts2, cv.RANSAC)
Warping to perspective
By computing homography, we know which point in the source image corresponds to which point in the destination image.
In order not to lose information from the source image, we need to pad the destination image by the amount that transformed points fall into negative regions.
At the same time, we need to keep track of the shift amount of the origin for stitching multiple images.
Auxiliary functions
# find the ROI of a transformation result
def warpRect(rect, H):
x, y, w, h = rect
    corners = np.float32([[x, y], [x, y + h - 1], [x + w - 1, y], [x + w - 1, y + h - 1]])
    # perspectiveTransform applies H with the perspective divide; it expects shape (N, 1, 2)
    extremum = cv.perspectiveTransform(corners.reshape(-1, 1, 2), H).reshape(-1, 2)
    minx, miny = np.min(extremum[:, 0]), np.min(extremum[:, 1])
    maxx, maxy = np.max(extremum[:, 0]), np.max(extremum[:, 1])
xo = int(np.floor(minx))
yo = int(np.floor(miny))
wo = int(np.ceil(maxx - minx))
ho = int(np.ceil(maxy - miny))
outrect = (xo, yo, wo, ho)
return outrect
# homography matrix is translated to fit in the screen
def coverH(rect, H):
# obtain bounding box of the result
x, y, _, _ = warpRect(rect, H)
# shift amount to the first quadrant
xpos = int(-x if x < 0 else 0)
ypos = int(-y if y < 0 else 0)
# correct the homography matrix so that no point is thrown out
T = np.array([[1, 0, xpos], [0, 1, ypos], [0, 0, 1]])
H_corr = T.dot(H)
return (H_corr, (xpos, ypos))
# pad image to cover ROI, return the shift amount of origin
def addBorder(img, rect):
x, y, w, h = rect
tl = (x, y)
br = (x + w, y + h)
top = int(-tl[1] if tl[1] < 0 else 0)
bottom = int(br[1] - img.shape[0] if br[1] > img.shape[0] else 0)
left = int(-tl[0] if tl[0] < 0 else 0)
right = int(br[0] - img.shape[1] if br[0] > img.shape[1] else 0)
img = cv.copyMakeBorder(img, top, bottom, left, right, cv.BORDER_CONSTANT, value=[0, 0, 0])
orig = (left, top)
return img, orig
def size2rect(size):
return (0, 0, size[1], size[0])
Warping function
def warpImage(img, H):
# tweak the homography matrix to move the result to the first quadrant
H_cover, pos = coverH(size2rect(img.shape), H)
# find the bounding box of the output
x, y, w, h = warpRect(size2rect(img.shape), H_cover)
width, height = x + w, y + h
# warp the image using the corrected homography matrix
    warped = cv.warpPerspective(img, H_cover, (width, height))
# make the external boundary solid black, useful for masking
warped = np.ascontiguousarray(warped, dtype=np.uint8)
gray = cv.cvtColor(warped, cv.COLOR_RGB2GRAY)
_, bw = cv.threshold(gray, 1, 255, cv.THRESH_BINARY)
# https://stackoverflow.com/a/55806272/12447766
major = cv.__version__.split('.')[0]
if major == '3':
_, cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
else:
cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
warped = cv.drawContours(warped, cnts, 0, [0, 0, 0], lineType=cv.LINE_4)
return (warped, pos)
Combining warped and destination images
This is the step where image enhancement such as exposure compensation becomes involved.
In order to keep things simple, we will use mean value blending.
The easiest solution would be to overwrite the existing data in the destination image, but the averaging operation is not a burden for us.
# only the non-zero pixels are weighted to the average
def mean_blend(img1, img2):
assert(img1.shape == img2.shape)
locs1 = np.where(cv.cvtColor(img1, cv.COLOR_RGB2GRAY) != 0)
blended1 = np.copy(img2)
blended1[locs1[0], locs1[1]] = img1[locs1[0], locs1[1]]
locs2 = np.where(cv.cvtColor(img2, cv.COLOR_RGB2GRAY) != 0)
blended2 = np.copy(img1)
blended2[locs2[0], locs2[1]] = img2[locs2[0], locs2[1]]
blended = cv.addWeighted(blended1, 0.5, blended2, 0.5, 0)
return blended
def warpPano(prevPano, img, H, orig):
# correct homography matrix
T = np.array([[1, 0, -orig[0]], [0, 1, -orig[1]], [0, 0, 1]])
H_corr = H.dot(T)
# warp the image and obtain shift amount of origin
result, pos = warpImage(prevPano, H_corr)
xpos, ypos = pos
# zero pad the result
rect = (xpos, ypos, img.shape[1], img.shape[0])
result, _ = addBorder(result, rect)
# mean value blending
idx = np.s_[ypos : ypos + img.shape[0], xpos : xpos + img.shape[1]]
result[idx] = mean_blend(result[idx], img)
# crop extra paddings
x, y, w, h = cv.boundingRect(cv.cvtColor(result, cv.COLOR_RGB2GRAY))
result = result[y : y + h, x : x + w]
# return the resulting image with shift amount
return (result, (xpos - x, ypos - y))
Stitching multiple images given combination pattern
# base image is the last image in each iteration
def blend_multiple_images(images, homographies):
N = len(images)
assert(N >= 2)
assert(len(homographies) == N - 1)
pano = np.copy(images[0])
pos = (0, 0)
for i in range(N - 1):
img = images[i + 1]
# get homography matrix
H = homographies[i]
# warp pano onto image
pano, pos = warpPano(pano, img, H, pos)
return (pano, pos)
The method above successively warps the previously combined image, called pano, onto the next image.
A combination pattern, however, may have junction points that give the best stitching view.
For example
1 2 3
4 5 6
The best pattern to combine these images is
1 -> 2 <- 3
|
V
4 -> 5 <- 6
Therefore, we need one last function to combine 1 & 2 with 2 & 3, or 1235 with 456 at node 5.
from operator import sub
# no warping here, useful for combining two different stitched images
# the image at given origin coordinates must be the same
def patchPano(img1, img2, orig1=(0,0), orig2=(0,0)):
# bottom right points
br1 = (img1.shape[1] - 1, img1.shape[0] - 1)
br2 = (img2.shape[1] - 1, img2.shape[0] - 1)
# distance from orig to br
diag2 = tuple(map(sub, br2, orig2))
# possible pano corner coordinates based on img1
extremum = np.array([(0, 0), br1,
tuple(map(sum, zip(orig1, diag2))),
tuple(map(sub, orig1, orig2))])
bb = cv.boundingRect(extremum)
# patch img1 to img2
pano, shift = addBorder(img1, bb)
orig = tuple(map(sum, zip(orig1, shift)))
idx = np.s_[orig[1] : orig[1] + img2.shape[0] - orig2[1],
orig[0] : orig[0] + img2.shape[1] - orig2[0]]
subImg = img2[orig2[1] : img2.shape[0], orig2[0] : img2.shape[1]]
pano[idx] = mean_blend(pano[idx], subImg)
return (pano, orig)
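For example, the top row of the 2x3 pattern above could be combined like this (a sketch; H_1to2 and H_3to2 are hypothetical homographies mapping images 1 and 3 onto image 2):
# both partial panos end on image 2 as the base, so the returned
# origins both point at image 2 inside each pano, as patchPano requires
pano12, orig12 = blend_multiple_images([img1, img2], [H_1to2])
pano32, orig32 = blend_multiple_images([img3, img2], [H_3to2])
top, orig_top = patchPano(pano12, pano32, orig12, orig32)
The same idea is applied to the bottom row and then vertically at node 5.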
For a quick demo, you can run the Python code in GitHub.
If you want to use the above methods in C++, you can have a look at Stitch library.
Any PR or edit to this post is welcome.
As an alternative to the last step that @Burak gave, this is the approach I used, since I knew the number of images in each row (chunk); multiStitching is simply a function that stitches images horizontally:
def stitchingImagesHV(img_list, size):
"""
    As our multi-stitching algorithm works along the horizontal axis, we hack
    it to also stitch vertically: each row is stitched into "stitch_img" and
    rotated, the same technique is applied to the rotated rows, and the final
    result is rotated back to the original orientation.
"""
# Generate row chunks of "size" length from image list
chunks = [img_list[i:i + size] for i in range(0, len(img_list), size)]
list_rotated_images = []
for i in range(len(chunks)):
stitch_img = multiStitching(chunks[i])
stitch_img_rotated = cv2.rotate(stitch_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
list_rotated_images.append(stitch_img_rotated.astype('uint8'))
stitch_img2 = multiStitching(list_rotated_images)
return cv2.rotate(stitch_img2, cv2.ROTATE_90_CLOCKWISE)
I am using Voronoi diagrams for image processing (procedurally generated stippling).
In order to do this I need to create a list (cells) of lists (coords_within_cell) of tuples (x, y pixel locations).
I have developed a couple of brute-force algorithms to accomplish this (see below), but they are too slow to process more than ~10 points. The scipy spatial utilities seem to be more than 1000x more efficient. Because of this, I would like to use scipy to generate the Voronoi diagram:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Voronoi.html
Using scipy to generate the Voronoi diagram is fairly simple but unfortunately I cannot figure out how to convert the cell areas into pixel coordinates. What is the best way to do this?
I found a related question, but it has no answers and it was deleted: https://web.archive.org/web/20200120151304/https://stackoverflow.com/questions/57703129/converting-a-voronoi-diagram-into-bitmap
Brute Force Algorithm 1 (too slow)
import math
import random
from PIL import Image
def distance(x1, y1, x2, y2):
return math.hypot(x2 - x1, y2 - y1)
# define the size of the x and y bounds
screen_width = 1260
screen_height = 1260
# define the number of points that should be used
number_of_points = 16
# randomly generate a list of n points within the given x and y bounds
point_x_coordinates = random.sample(range(0, screen_width), number_of_points)
point_y_coordinates = random.sample(range(0, screen_height), number_of_points)
points = list(zip(point_x_coordinates, point_y_coordinates))
# each point needs to have a corresponding list of pixels
point_pixels = []
for i in range(len(points)):
point_pixels.append([])
# for each pixel within bounds, determine which point it is closest to and add it to the corresponding list in point_pixels
for pixel_y_coordinate in range(screen_height):
for pixel_x_coordinate in range(screen_width):
distance_to_closest_point = float('inf')
        closest_point_index = 0
for point_index, point in enumerate(points):
distance_to_point = distance(pixel_x_coordinate, pixel_y_coordinate, point[0], point[1])
if(distance_to_point < distance_to_closest_point):
closest_point_index = point_index
distance_to_closest_point = distance_to_point
point_pixels[closest_point_index].append((pixel_x_coordinate, pixel_y_coordinate))
# each point needs to have a corresponding centroid
point_pixels_centroid = []
for pixel_group in point_pixels:
x_sum = 0
y_sum = 0
for pixel in pixel_group:
x_sum += pixel[0]
y_sum += pixel[1]
x_average = x_sum / len(pixel_group)
y_average = y_sum / len(pixel_group)
point_pixels_centroid.append((round(x_average), round(y_average)))
# display the resulting voronoi diagram
display_voronoi = Image.new("RGB", (screen_width, screen_height), "white")
for pixel_group in point_pixels:
rgb = random.sample(range(0, 255), 3)
for pixel in pixel_group:
display_voronoi.putpixel( pixel, (rgb[0], rgb[1], rgb[2], 255) )
for centroid in point_pixels_centroid:
print(centroid)
display_voronoi.putpixel( centroid, (1, 1, 1, 255) )
display_voronoi.show()
Brute Force Algorithm 2 (also too slow):
Based on this concept.
import math
import random
from PIL import Image
def distance(x1, y1, x2, y2):
return math.hypot(x2 - x1, y2 - y1)
# define the size of the x and y bounds
screen_width = 500
screen_height = 500
# define the number of points that should be used
number_of_points = 4
# randomly generate a list of n points within the given x and y bounds
point_x_coordinates = random.sample(range(0, screen_width), number_of_points)
point_y_coordinates = random.sample(range(0, screen_height), number_of_points)
points = list(zip(point_x_coordinates, point_y_coordinates))
# each point needs to have a corresponding list of pixels
point_pixels = []
for i in range(len(points)):
point_pixels.append([])
# for each pixel within bounds, determine which point it is closest to and add it to the corresponding list in point_pixels
# do this by continuously growing circles outwards from the points
# if circles overlap, then whoever was there first claims the location
# keep track of whether pixels have been used or not
# this is done via a 2D list of booleans
is_drawn_on = []
for i in range(screen_width):
is_drawn_on.append([])
for j in range(screen_height):
is_drawn_on[i].append(False)
circles_are_growing = True
radius = 1
while(circles_are_growing):
circles_are_growing = False
for point_index, point in enumerate(points):
for i in range(point[0] - radius, point[0] + radius):
for j in range(point[1] - radius, point[1] + radius):
# print(str(i)+" vs "+str(len(is_drawn_on)))
if(i >= 0 and i < len(is_drawn_on)):
if(j >= 0 and j < len(is_drawn_on[i])):
if(not is_drawn_on[i][j] and distance(i, j, point[0], point[1]) <= radius):
point_pixels[point_index].append((i, j))
circles_are_growing = True
is_drawn_on[i][j] = True
radius += 1
# each point needs to have a corresponding centroid
point_pixels_centroid = []
for pixel_group in point_pixels:
x_sum = 0
y_sum = 0
for pixel in pixel_group:
x_sum += pixel[0]
y_sum += pixel[1]
x_average = x_sum / len(pixel_group)
y_average = y_sum / len(pixel_group)
point_pixels_centroid.append((round(x_average), round(y_average)))
# display the resulting voronoi diagram
display_voronoi = Image.new("RGB", (screen_width, screen_height), "white")
for pixel_group in point_pixels:
rgb = random.sample(range(0, 255), 3)
for pixel in pixel_group:
display_voronoi.putpixel( pixel, (rgb[0], rgb[1], rgb[2], 255) )
for centroid in point_pixels_centroid:
print(centroid)
display_voronoi.putpixel( centroid, (1, 1, 1, 255) )
display_voronoi.show()
Rather than build and interrogate the Voronoi diagram directly, it is easier to build and query a standard search tree. Below is my modification of your code using scipy.spatial.KDTree to determine the closest point for each pixel location followed by an image of the result (a 500x500 image with 500 Voronoi points).
The code is still a little slow but now scales well in the number of Voronoi points. This could be faster if you avoid building the list of pixel locations for each Voronoi cell and instead just directly set data in the image.
The fastest solution may involve building the Voronoi diagram and walking through it one pixel at a time, associating each pixel with the closest Voronoi cell and looking at neighboring Voronoi cells when needed (since the previous pixel gives a very good guess for the Voronoi cell of the next pixel). But that would involve writing a lot more code than using the KDTree naively like this, and probably would not yield huge gains: the slow part of the code at this point is building all the per-pixel arrays/data, which can be cleaned up independently.
import math
import random
from PIL import Image
from scipy import spatial
import numpy as np
# define the size of the x and y bounds
screen_width = 500
screen_height = 500
# define the number of points that should be used
number_of_points = 500
# randomly generate a list of n points within the given x and y bounds
point_x_coordinates = random.sample(range(0, screen_width), number_of_points)
point_y_coordinates = random.sample(range(0, screen_height), number_of_points)
points = list(zip(point_x_coordinates, point_y_coordinates))
# each point needs to have a corresponding list of pixels
point_pixels = []
for i in range(len(points)):
point_pixels.append([])
# build a search tree
tree = spatial.KDTree(points)
# build a list of pixed coordinates to query
pixel_coordinates = np.zeros((screen_height * screen_width, 2))
i = 0
for pixel_y_coordinate in range(screen_height):
for pixel_x_coordinate in range(screen_width):
pixel_coordinates[i] = np.array([pixel_x_coordinate, pixel_y_coordinate])
i = i+1
# for each pixel within bounds, determine which point it is closest to and add it to the corresponding list in point_pixels
[distances, indices] = tree.query(pixel_coordinates)
i = 0
for pixel_y_coordinate in range(screen_height):
for pixel_x_coordinate in range(screen_width):
point_pixels[indices[i]].append((pixel_x_coordinate, pixel_y_coordinate))
i = i+1
# each point needs to have a corresponding centroid
point_pixels_centroid = []
for pixel_group in point_pixels:
x_sum = 0
y_sum = 0
for pixel in pixel_group:
x_sum += pixel[0]
y_sum += pixel[1]
x_average = x_sum / max(len(pixel_group),1)
y_average = y_sum / max(len(pixel_group),1)
point_pixels_centroid.append((round(x_average), round(y_average)))
# display the resulting voronoi diagram
display_voronoi = Image.new("RGB", (screen_width, screen_height), "white")
for pixel_group in point_pixels:
rgb = random.sample(range(0, 255), 3)
for pixel in pixel_group:
display_voronoi.putpixel( pixel, (rgb[0], rgb[1], rgb[2], 255) )
for centroid in point_pixels_centroid:
#print(centroid)
display_voronoi.putpixel( centroid, (1, 1, 1, 255) )
#display_voronoi.show()
display_voronoi.save("test.png")
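As hinted above, a sketch of the vectorized variant that skips the per-pixel Python loops and writes colors directly into the image (same points list as before; colors are assigned per Voronoi point in a single indexing operation):
import numpy as np
from PIL import Image
from scipy import spatial

tree = spatial.KDTree(points)
# all pixel coordinates as an (H*W, 2) array of (x, y) pairs
xs, ys = np.meshgrid(np.arange(screen_width), np.arange(screen_height))
coords = np.column_stack([xs.ravel(), ys.ravel()])
_, indices = tree.query(coords)
# one random color per Voronoi point, written in one shot
palette = np.random.randint(0, 255, size=(len(points), 3), dtype=np.uint8)
img = palette[indices].reshape(screen_height, screen_width, 3)
Image.fromarray(img).save("voronoi_fast.png")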
scipy.interpolate.griddata does exactly that, and more or less by the same method as in Alex's answer:
import numpy as np
from numpy.random import default_rng
from scipy.interpolate import griddata
# define the size of the x and y bounds
screen_width = 1260
screen_height = 1260
# define the number of points that should be used
number_of_points = 16
# randomly generate a list of n points within the given x and y bounds
rng = default_rng()
points = rng.random((number_of_points,2)) * [screen_width, screen_height]
grid_x, grid_y = np.mgrid[0:screen_width, 0:screen_height]
labels = griddata(points, np.arange(number_of_points), (grid_x, grid_y), method='nearest')
Then, you can use np.where(labels==10) to get the coordinates of all the pixels that belong to cell #10.
Or you can use all the machinery in scipy.ndimage to measure various properties of labeled regions, for instance the center of gravity, as sketched below.
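A sketch of computing each cell's center of gravity from the labels array (ndimage returns (row, col) coordinates):
from scipy import ndimage

# centroid of each labeled Voronoi cell, labels running 0..number_of_points-1
centroids = ndimage.center_of_mass(
    np.ones_like(labels), labels=labels, index=np.arange(number_of_points)
)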
If you want to display colorful cells:
from matplotlib.pyplot import imsave
rgb = rng.integers(0, 255, size=(number_of_points,3))
rgb_labels = rgb[labels]
imsave('test.png', rgb_labels)
Given the coordinates of four arbitrary points in an image (which are guaranteed to form a rectangle), I want to extract the patch that they represent and get a vectorized (flat) representation of the same. How can I do this?
I saw the answer to this question, and using it I am able to get to the patch that I require. For example, given the image coordinates of the 4 corners of the green rectangle in this image:
I am able to get to the patch and get something like:
using the following code:
p1 = (334,128)
p2 = (438,189)
p3 = (396,261)
p4 = (292,200)
pts = np.array([p1, p2, p3, p4])
mask = np.zeros((img.shape[0], img.shape[1]))
cv2.fillConvexPoly(mask, pts, 1)
mask = mask.astype(np.bool)
out = np.zeros_like(img)
out[mask] = img[mask]
patch = img[mask]
cv2.imwrite(img_name, out)
However, the problem is that the patch variable that I obtain is simply an array of all pixels of the image that belong to the patch, when the image is read as a matrix in row-major order.
What I want is for the patch variable to contain the pixels in an order that forms a genuine image, so that I can perform operations on it. Is there an OpenCV function I should be aware of that would help me do this?
Thanks!
This is how you can implement this:
Code:
# create a subimage with the outer limits of the points
subimg = out[128:261,292:438]
# calculate the angle between the 2 'lowest' points, the 'bottom' line
myradians = math.atan2(p3[0]-p4[0], p3[1]-p4[1])
# convert to degrees
mydegrees = 90-math.degrees(myradians)
# create rotation matrix (the center must be given in (x, y) order)
h, w = subimg.shape[:2]
center = (w/2, h/2)
M = cv2.getRotationMatrix2D(center, mydegrees, 1)
# rotate subimage (the dsize argument is (width, height))
rotatedImg = cv2.warpAffine(subimg, M, (w, h))
Result:
Next, the black areas in the image can be easily cropped by removing all rows/columns that are 100% black.
Final result:
Code:
# convert image to grayscale
img = cv2.cvtColor(rotatedImg, cv2.COLOR_BGR2GRAY)
# sum each row and each column of the image
sumOfCols = np.sum(img, axis=0)
sumOfRows = np.sum(img, axis=1)
# Find the first and last row / column that has a sum value greater than zero,
# which means it's not all black. Store the found values in variables
for i in range(len(sumOfCols)):
if sumOfCols[i] > 0:
x1 = i
print('First col: ' + str(i))
break
for i in range(len(sumOfCols)-1,-1,-1):
if sumOfCols[i] > 0:
x2 = i
print('Last col: ' + str(i))
break
for i in range(len(sumOfRows)):
if sumOfRows[i] > 0:
y1 = i
print('First row: ' + str(i))
break
for i in range(len(sumOfRows)-1,-1,-1):
if sumOfRows[i] > 0:
y2 = i
print('Last row: ' + str(i))
break
# create a new image based on the found values
finalImage = rotatedImg[y1:y2,x1:x2]
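As a side note, since the four corners are known, a more direct alternative to rotate-and-crop is to map the quad straight onto an axis-aligned rectangle with a perspective transform. A sketch, assuming the same p1..p4 corner order (top-left, top-right, bottom-right, bottom-left) and the original img:
import numpy as np
import cv2

src = np.float32([p1, p2, p3, p4])
w = int(np.linalg.norm(np.subtract(p2, p1)))  # approximate output width
h = int(np.linalg.norm(np.subtract(p4, p1)))  # approximate output height
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
M = cv2.getPerspectiveTransform(src, dst)
patch = cv2.warpPerspective(img, M, (w, h))  # a normal row-major image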
I am trying to circular mask an image in Python. I found some example code on the web, but I'm not sure how to change the maths to get my circle in the correct place.
I have an image image_data of type numpy.ndarray with shape (3725, 4797, 3):
total_rows, total_cols, total_layers = image_data.shape
X, Y = np.ogrid[:total_rows, :total_cols]
center_row, center_col = total_rows/2, total_cols/2
dist_from_center = (X - total_rows)**2 + (Y - total_cols)**2
radius = (total_rows/2)**2
circular_mask = (dist_from_center > radius)
I see that this code applies the (squared) Euclidean distance to calculate dist_from_center, but I don't understand the X - total_rows and Y - total_cols part. This produces a mask that is a quarter of a circle, centered on the top-left of the image.
What role are X and Y playing on the circle? And how can I modify this code to produce a mask that is centered somewhere else in the image instead?
The algorithm you got online is partly wrong, at least for your purposes. If we have the following image, we want it masked like so:
The easiest way to create a mask like this is how your algorithm goes about it, but it's not presented in the way that you want, nor does it give you the ability to modify it in an easy way. What we need to do is look at the coordinates for each pixel in the image, and get a true/false value for whether or not that pixel is within the radius. For example, here's a zoomed in picture showing the circle radius and the pixels that were strictly within that radius:
Now, to figure out which pixels lie inside the circle, we'll need the indices of each pixel in the image. The function np.ogrid() gives two vectors, each containing the pixel locations (or indices): there's a column vector for the column indices and a row vector for the row indices:
>>> np.ogrid[:4,:5]
[array([[0],
[1],
[2],
[3]]), array([[0, 1, 2, 3, 4]])]
This format is useful for broadcasting so that if we use them in certain functions, it will actually create a grid of all the indices instead of just those two vectors. We can thus use np.ogrid() to create the indices (or pixel coordinates) of the image, and then check each pixel coordinate to see if it's inside or outside the circle. To tell whether it's inside the circle, we can simply find the Euclidean distance from the center to every pixel location; if that distance is less than the circle radius, we mark that pixel as included in the mask, and if it's greater, we exclude it.
Now we've got everything we need to make a function that creates this mask. Furthermore we'll add a little bit of nice functionality to it; we can send in the center and the radius, or have it automatically calculate them.
def create_circular_mask(h, w, center=None, radius=None):
if center is None: # use the middle of the image
center = (int(w/2), int(h/2))
if radius is None: # use the smallest distance between the center and image walls
radius = min(center[0], center[1], w-center[0], h-center[1])
Y, X = np.ogrid[:h, :w]
dist_from_center = np.sqrt((X - center[0])**2 + (Y-center[1])**2)
mask = dist_from_center <= radius
return mask
In this case, dist_from_center is a matrix with the specified height and width. It broadcasts the column and row index vectors into a matrix, where the value at each location is the distance from the center. If we were to visualize this matrix as an image (scaling it into the proper range), then it would be a gradient radiating from the center we specify:
So when we compare it to radius, it's identical to thresholding this gradient image.
Note that the final mask is a matrix of booleans; True if that location is within the radius from the specified center, False otherwise. So we can then use this mask as an indicator for a region of pixels we care about, or we can take the opposite of that boolean (~ in numpy) to select the pixels outside that region. So using this function to color pixels outside the circle black, like I did up at the top of this post, is as simple as:
h, w = img.shape[:2]
mask = create_circular_mask(h, w)
masked_img = img.copy()
masked_img[~mask] = 0
But if we wanted to create a circular mask at a different point than the center, we could specify it (note that the function is expecting the center coordinates in x, y order, not the indexing row, col = y, x order):
center = (int(w/4), int(h/4))
mask = create_circular_mask(h, w, center=center)
Which, since we're not giving a radius, would give us the largest radius so that the circle would still fit in the image bounds:
Or we could let it calculate the center but use a specified radius:
radius = h/4
mask = create_circular_mask(h, w, radius=radius)
Giving us a centered circle with a radius that doesn't extend exactly to the smallest dimension:
And finally, we could specify any radius and center we wanted, including a radius that extends outside the image bounds (and the center can even be outside the image bounds!):
center = (int(w/4), int(h/4))
radius = h/2
mask = create_circular_mask(h, w, center=center, radius=radius)
What the algorithm you found online does is equivalent to setting the center to (0, 0) and setting the radius to h:
mask = create_circular_mask(h, w, center=(0, 0), radius=h)
I'd like to offer a way to do this that doesn't involve the np.ogrid() function. I'll crop an image called "robot.jpg", which is 491 x 491 pixels. For readability I'm not going to define as many variables as I would in a real program:
Import libraries:
import matplotlib.pyplot as plt
from matplotlib import image
import numpy as np
Import the image, which I'll call "z". This is a color image so I'm also pulling out just a single color channel. Following that, I'll display it:
z = image.imread('robot.jpg')
z = z[:,:,1]
zimg = plt.imshow(z,cmap="gray")
plt.show()
robot.jpg as displayed by matplotlib.pyplot
To wind up with a numpy array (image matrix) with a circle in it to use as a mask, I'm going to start with this:
x = np.linspace(-10, 10, 491)
y = np.linspace(-10, 10, 491)
x, y = np.meshgrid(x, y)
x_0 = -3
y_0 = -6
mask = np.sqrt((x-x_0)**2+(y-y_0)**2)
Note the equation of a circle on that last line, where x_0 and y_0 define the center point of the circle in a grid that is 491 elements tall and wide. Because I defined the grid to go from -10 to 10 in both x and y, it is within that system of units that x_0 and y_0 set the center point of the circle with respect to the center of the image.
To see what that produces I run:
maskimg = plt.imshow(mask,cmap="gray")
plt.show()
Our "proto" masking circle
To turn that into an actual binary-valued mask, I'm just going to take every pixel below a certain value and set it to 0, and take every pixel above a certain value and set it to 256. The "certain value" will determine the radius of the circle in the same units defined above, so I'll call that 'r'. Here I'll set 'r' to something and then loop through every pixel in the mask to determine if it should be "on" or "off":
r = 7
for x in range(491):
    for y in range(491):
        if mask[x,y] < r:
            mask[x,y] = 0
        elif mask[x,y] >= r:
            mask[x,y] = 256
maskimg = plt.imshow(mask,cmap="gray")
plt.show()
The mask
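As an aside, the same thresholding can be done without the explicit loops:
mask = np.where(mask < r, 0, 256)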
Now I'll just multiply the mask by the image element-wise, then display the result:
z_masked = np.multiply(z,mask)
zimg_masked = plt.imshow(z_masked,cmap="gray")
plt.show()
To invert the mask I can just swap the 0 and the 256 in the thresholding loop above, and if I do that I get:
Masked version of robot.jpg
The other answers work, but they are slow, so I will propose an answer using skimage.draw.disk. It is faster and I find it simple to use: simply specify the center of the circle and the radius, then use the output to create a mask:
import numpy as np
from skimage.draw import disk

mask = np.zeros((10, 10), dtype=np.uint8)
row = 4
col = 5
radius = 5
# disk takes the center as a (row, col) tuple; passing shape clips the
# returned indices to the image bounds
rr, cc = disk((row, col), radius, shape=mask.shape)
mask[rr, cc] = 1
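For example, applying the mask to a same-shaped grayscale image img (a sketch):
masked_img = img.copy()
masked_img[mask == 0] = 0  # black out everything outside the disk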