I am looking to OCR some digital numbers in a couple of different formats. I have a function which levels text on the horizontal plane, enabling me to create bounding boxes in OpenCV, and it works for one of my digit images. However, the second digit style is slightly leaning (italicised). This sometimes works, but I have found that the decimal point mostly gets lost, as it gets incorporated into one of the digits' bounding rectangles.
Is there a way to align the digits based on the vertical lines of the actual digit?
Below is my working function for the horizontal plane:
import math
import cv2
import numpy as np
from scipy import ndimage

def deskew(img):
    img_edges = cv2.Canny(img, 100, 100, apertureSize=3)
    lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=20, maxLineGap=50)
    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:  # one (x1, y1, x2, y2) segment per detected line
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        angles.append(angle)
    med_angle = np.median(angles)
    rotated_img = ndimage.rotate(img, med_angle, cval=255)
    cv2.imshow("rotated image", rotated_img)
    cv2.waitKey(0)
    return rotated_img
Below is the type of image/digit format I am trying to deskew and OCR. I have found through some manual entries that an angle of around 5 degrees seems to work for accurately drawing separate bounding rectangles that capture the digits and decimal points.
Below is the manually adjusted angle, showing all digits and the decimal point captured, which can be OCR'd.
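Since the question asks about aligning on the digits' vertical strokes, here is a rough sketch of one way that could look (my addition, not part of the original post): estimate the dominant near-vertical stroke angle with HoughLinesP and shear the image, rather than rotate it. The angle window, Hough parameters, and the sign convention of the shear are assumptions and may need adjusting for a given font.

import math
import cv2
import numpy as np

def deslant(img):
    edges = cv2.Canny(img, 100, 100, apertureSize=3)
    lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 30,
                            minLineLength=10, maxLineGap=5)
    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        if angle < 0:
            angle += 180                  # fold into 0..180
        if abs(angle - 90) < 20:          # keep only near-vertical strokes
            angles.append(angle)
    slant = np.median(angles) - 90        # degrees away from vertical
    h, w = img.shape[:2]
    # horizontal shear that maps the slanted strokes back to vertical
    M = np.float32([[1, math.tan(math.radians(slant)), 0], [0, 1, 0]])
    return cv2.warpAffine(img, M, (w, h), borderValue=255)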
I am building a video game overlay that sends data back to the player to create a custom HUD, just for fun.
I am trying to read an image of a video game compass and determine the exact orientation of the compass to be a part of my HUD.
Example photo which shows the compass at the top of the screen:
(The compass is currently facing ~170°. Note: the position of the compass is also fixed.)
Obviously, when I do the image processing I will only be looking at the compass, not the whole screen.
This has been more challenging for me compared to previous computer vision aspects of my HUD. I have been trying to process the image using cv2 and from there use some object detection to find the "needle" of the compass.
I am struggling to get a triangle shape detection on either needle that will help me know my orientation.
The solution could be lower-tech and hackier, perhaps just searching for the pixel on the edge of the compass and determining that is the end of the needle.
One solution I do not think is viable is using object detection to find a picture of a compass facing true north and then calculating the rotation of the current compass, because the background of the compass does not rotate; only the needle does.
So far I have applied Hough Circle Transform as seen here:
https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_houghcircles/py_houghcircles.html#hough-circles
Which has helped me get a circle around my compass as well as the middle of my compass. However, I cannot find a good solution for finding the facing of the needle compared to the middle of the compass.
I understand this is a pretty open-ended question but I am looking for any theoretical solutions that would help me implement a solution. Anything would help as this is a strange problem for me and I am struggling to think how to go about solving it.
In general I would suggest looking at a thin ring just beneath the border of your compass (this will give you the lowest error). You could either work on an image which is a polar transform of this ring, or directly on the ring itself, looking for the center of gravity of the color red. This center of gravity, with respect to the center of your compass, should give you the angle. Most likely you don't even need the polar transform.
im = cv.imread("RPc9Q.png")
(x, y, w, h) = (406, 14, 29, 29)
warped = cv.warpPolar(
    src=im,
    dsize=(512, 512),
    center=(x + (w-1)/2, y + (h-1)/2),
    maxRadius=(w-1)/2,
    flags=cv.WARP_POLAR_LINEAR | cv.INTER_LINEAR
)
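For the "directly on the ring" variant, here is a minimal sketch (my addition, not from the original answer; the redness threshold of 40 is an assumption):

import numpy as np
import cv2 as cv

im = cv.imread("RPc9Q.png")
(x, y, w, h) = (406, 14, 29, 29)
cx, cy = x + (w - 1) / 2, y + (h - 1) / 2

# mask of sufficiently red pixels inside the compass box
b, g, r = cv.split(im[y:y+h, x:x+w].astype(np.float32))
red = (r - np.maximum(b, g)) > 40

ys, xs = np.nonzero(red)
dx = xs.mean() + x - cx       # centroid relative to the compass center
dy = ys.mean() + y - cy
# y grows downward, so atan2 runs clockwise from east;
# +90 matches the compass convention used below (east = 90 degrees)
angle = (np.degrees(np.arctan2(dy, dx)) + 90) % 360
print("angle:", angle)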
Here's some more elaboration on the polar warp approach.
- polar warp
- take a column of pixels, which is a circle in the source picture
- plot it to see what's there
- argmax to find the red bits of the arrow
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

im = cv.imread("RPc9Q.png") * np.float32(1/255)
(x, y, w, h) = (406, 14, 29, 29)

# polar warp...
steps_angle = 360 * 2
steps_radius = 512
warped = cv.warpPolar(
    src=im,
    dsize=(steps_radius, steps_angle),
    center=(x + (w-1)/2, y + (h-1)/2),
    maxRadius=(w-1)/2,
    flags=cv.WARP_POLAR_LINEAR | cv.INTER_LANCZOS4
)

# goes 360 degrees, starting from 90 degrees (east) clockwise
# sample at 85% of "full radius", picked manually
col = int(0.85 * steps_radius)

# for illustration (imshow here is a display helper, e.g. plt.imshow)
imshow(cv.rotate(
    cv.line(warped.copy(), (col, 0), (col, warped.shape[0]), (0, 0, 255), 1),
    rotateCode=cv.ROTATE_90_COUNTERCLOCKWISE))

signal = warped[:, col, 2]  # red channel, that column

# polar warp coordinate system:
# first row of pixels is sampled at exactly 90 degrees (east)
samplepoints = np.arange(steps_angle) / steps_angle * 360 + 90

imax = np.argmax(signal)  # peak

def vertex_parabola(y1, y2, y3):
    # sub-sample peak refinement: vertex of the parabola through three samples
    return 0.5 * (y1 - y3) / (y3 - 2*y2 + y1)

# print("samples around maximum:", signal[imax-1:imax+2] * 255)
imax += vertex_parabola(*signal[imax-1:imax+2].astype(np.float32))
# that slice will blow up in your face if the index gets close to the edges;
# either use np.roll() or drop the correction entirely

angle = imax / steps_angle * 360 + 90  # ~= samplepoints[imax]
print("angle:", angle)  # 176.2

plt.figure(figsize=(16, 4))
plt.xlim(90, 360 + 90)
plt.xticks(np.arange(90, 360 + 90, 45))
plt.plot(
    samplepoints, signal, 'k-',
    samplepoints, signal, 'k.')
plt.axvline(x=angle, color='r', linestyle='-')
plt.show()
I have been able to solve my question with the feedback provided.
First I grab the image of the compass:
step_1
After processing the image, I crop out the middle and edges of the compass, as seen here:
step_2
Now I have a cropped compass with only a little bit of red showing where the needle points. I then mask out the red part of the image.
step_3
From there it is a simple operation to find the center of the blob, which roughly indicates where the needle is pointing. Although this is not perfectly accurate, I believe it will work for my purposes.
step_4
Now that I know where the needle end is, it should be easy to calculate the direction based on that.
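For that last step, a sketch of the bearing calculation (my addition; the names cx, cy for the compass center and mask for the binary red mask from step 3 are assumed):

import numpy as np
import cv2

M = cv2.moments(mask, binaryImage=True)
tip_x = M["m10"] / M["m00"]   # centroid of the red needle blob
tip_y = M["m01"] / M["m00"]
# y grows downward, so atan2 runs clockwise from east;
# +90 turns that into a compass bearing (east = 90 degrees)
bearing = (np.degrees(np.arctan2(tip_y - cy, tip_x - cx)) + 90) % 360
print("bearing:", bearing)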
Some references:
Finding red color in image using Python & OpenCV
https://www.geeksforgeeks.org/python-opencv-find-center-of-contour/
I am using a script to measure the size of objects using OpenCV. For my pixel_to_mm_ratio I use an ArUco marker; this ratio is computed from exact values.
To draw boxes around the other objects I first find contours and then draw rectangles around them with cv.minAreaRect(). My problem is that the width (w) and height (h) are not given as exact numbers (floats) but are already rounded (integers):
rect = cv.minAreaRect(cnt)
(x, y), (w, h), angle = rect
Through the rounding of these numbers (w and h) from the rectangles around contours, I later get an inaccuracy when calculating the width and height in mm.
object_width = w / pixel_mm_ratio
object_height = h / pixel_mm_ratio
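For reference, a minimal sketch of how the ratio itself might be obtained from the marker (my reconstruction, not from the question; the dictionary choice, the file name, and the 50 mm marker size are assumptions):

import cv2

img = cv2.imread("objects.png")  # hypothetical file name
# older opencv-contrib-python API; OpenCV >= 4.7 uses cv2.aruco.ArucoDetector
aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_5X5_50)
parameters = cv2.aruco.DetectorParameters_create()
corners, ids, _ = cv2.aruco.detectMarkers(img, aruco_dict, parameters=parameters)
# marker perimeter in pixels divided by perimeter in mm (4 sides x 50 mm)
pixel_mm_ratio = cv2.arcLength(corners[0], True) / (4 * 50)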
Is there a way to get the exact values from cv2.minAreaRect()? Or another way to grab these values?
Thanks in advance!
I'm working on an OpenCV-based project in Python, and I have to calculate/extract and visually show the vanishing point from existing lines.
My first task is to detect lines, that's very easy with Canny and HoughLinesP functions:
import cv2
import numpy as np

img = cv2.imread('.image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # imread returns BGR, not RGB
edges = cv2.Canny(gray, 500, 460)
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 30, maxLineGap=250)
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 128), 1)
cv2.imwrite('linesDetected.jpg', img)
But I want to calculate/extrapolate the vanishing point of all the lines, to find (and plot) where they cross each other, like the image below.
I know I need to add a bigger frame to plot the continuation of the lines to find where they cross (the vanishing point), but I'm very lost at this point.
Thanks so much!!
Instead of the probabilistic Hough transform implementation cv2.HoughLinesP, if you use the traditional one, cv2.HoughLines, the lines are represented in parametric space (ρ, θ). The parametric space relates to the actual point coordinates as ρ = x·cosθ + y·sinθ, where ρ is the perpendicular distance from the origin to the line, and θ is the angle formed by this perpendicular line and the horizontal axis, measured counter-clockwise.
lines = cv2.HoughLines(edges, 1, np.pi/180, 200)
for line in lines:
    rho, theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 10000*(-b))
    y1 = int(y0 + 10000*(a))
    x2 = int(x0 - 10000*(-b))
    y2 = int(y0 - 10000*(a))
    cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 1)
As you can see below, the projection of vanishing lines already starts to appear.
Now if we play with the parameters for this specific image and skip already-parallel vertical lines, we can get a better set of vanishing lines.
# fine-tune parameters
lines = cv2.HoughLines(edges, 0.7, np.pi/120, 120, min_theta=np.pi/36, max_theta=np.pi - np.pi/36)
for line in lines:
    rho, theta = line[0]
    # skip near-vertical lines
    if abs(theta - np.pi/90) < np.pi/9:
        continue
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 10000*(-b))
    y1 = int(y0 + 10000*(a))
    x2 = int(x0 - 10000*(-b))
    y2 = int(y0 - 10000*(a))
    cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 1)
At this step, there are multiple options to find the intersection point of the lines, the vanishing points. I will list some of them below.
Best approximation: All of these lines have a known (ρ, θ) and (ideally) pass through one of only two (x, y) points; call the left one (x0, y0) and the right one (x1, y1). If you create a linear system over all the lines using the equation above, ρ = x·cosθ + y·sinθ, each row can be written as ρ_n = [cosθ_n sinθ_n]·[x y]^T. This turns the problem into a linear regression, and you can solve for the best (x, y) point. You can order the lines based on their slope and create two linear systems, one for (x0, y0) and one for (x1, y1). A least-squares sketch of this option follows the code below.
Cumbersome solution: As mentioned in one of the comments, you can find the pairwise intersections of all lines, cluster them based on proximity, and threshold the clusters based on the number of intersections. Then you can output the cluster means of the two most populated clusters.
Trivial image-based solution: Since you already have the image of the intersections, you can do some image processing to find the points. This is by no means an exact solution; it is merely a quick and approximate one. You can get rid of the lines by an erosion with a kernel the same size as your lines. Then you can strengthen the intersections by a dilation with a larger kernel. Then, if you do a closing operation with a slightly larger kernel, only the strongest intersections will remain. You can output the means of these blobs as the vanishing points.
Below, you can see the line image before, and the resulting left and right blobs image after running the code below.
# delete lines
kernel = np.ones((3,3),np.uint8)
img2 = cv2.erode(img2,kernel,iterations = 1)
# strengthen intersections
kernel = np.ones((9,9),np.uint8)
img2 = cv2.dilate(img2,kernel,iterations = 1)
# close remaining blobs
kernel = np.ones((11,11),np.uint8)
img2 = cv2.erode(img2,kernel,iterations = 1)
img2 = cv2.dilate(img2,kernel,iterations = 1)
cv2.imwrite('points.jpg', img2)
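For completeness, a sketch of the first option (my addition): stack one equation ρ_n = x·cosθ_n + y·sinθ_n per line and solve the overdetermined system in the least-squares sense. This is for a single group of lines; split the output of cv2.HoughLines into left and right groups by slope first, as described above.

import numpy as np

rho = lines[:, 0, 0]
theta = lines[:, 0, 1]
A = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # one row per line
xy, residuals, rank, sv = np.linalg.lstsq(A, rho, rcond=None)
print("vanishing point:", xy)   # best-fit (x, y)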
If you want to find the vanishing point of an image, you need to detect the lines. For this you can use the Hough transform; it will find all possible lines in the image, and you can tune its parameters according to your needs. The vanishing point is then where most of the lines intersect. This is a type of estimation, so it is not exactly correct, but it is usually a very good estimate. You can also use other forms of the Hough transform according to your need.
In this case the standard Hough transform is enough.
I'm desperately trying to find my way within OpenCV to detect lines using HoughLines or any other method. I'm starting from a document image and using a structuring element and erosion to obtain a binary image with lines.
I managed to obtain the following file, but I can't seem to obtain HoughLines that follow what seem to me (and here, probably, is the issue) to be obvious lines. Any idea on how to go forward, or should I start from scratch using other methods?
The ultimate goal is to extract the lines of the documents as separate images and then try some ML algorithm for handwritten text recognition.
I think that Hough Lines should work in your case. Running
lines = cv2.HoughLines(img_thr, 1, np.pi / 180, threshold=800)
where img_thr is your binary image gives quite a good result:
The lines can be sorted by the y coordinate of their left end (for example), and then two consecutive lines will form a quadrilateral, which can be extracted using cv2.getPerspectiveTransform and cv2.warpPerspective.
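A sketch of extracting the strip between two consecutive (nearly horizontal) lines (my addition; the endpoint array points and the width w come from the full code further below):

import cv2
import numpy as np

top = points[i]               # [[x1, y1], [x2, y2]] of the upper line
bottom = points[i + 1]        # the line just below it
src = np.float32([top[0], top[1], bottom[1], bottom[0]])
H = int(max(bottom[0][1] - top[0][1], bottom[1][1] - top[1][1]))
dst = np.float32([[0, 0], [w, 0], [w, H], [0, H]])
M = cv2.getPerspectiveTransform(src, dst)
strip = cv2.warpPerspective(img_orig, M, (w, H))   # one text line as an image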
There are a few problems, which need to be solved to make this procedure more robust:
The algorithm can return multiple lines for each line in the image, so they need to be deduplicated (see the sketch after the full code below).
There may be some false positive lines, so you need some condition to remove them. I think that looking at the slope of the line and the distances between consecutive lines should do the job.
The effect of the threshold parameter in cv2.HoughLines depends highly on the image resolution, so you should resize images to some constant size before running this procedure.
Full code:
img_orig = url_to_image('https://i.stack.imgur.com/PXDKG.png')  # original image
img_thr = url_to_image('https://i.stack.imgur.com/jZChK.png')   # binary image
h, w, _ = img_thr.shape
img_thr = img_thr[:, :, 0]

lines = cv2.HoughLines(img_thr, 1, np.pi / 180, threshold=800)

img_copy = img_orig.copy()
points = []
for rho, theta in lines[:, 0]:
    a, b = np.cos(theta), np.sin(theta)
    x0, y0 = a*rho, b*rho
    x1, x2 = 0, w
    y1 = y0 + a*((0 - x0) / -b)
    y2 = y0 + a*((w - x0) / -b)
    cv2.line(img_copy, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 4)
    points.append([[x1, y1], [x2, y2]])
points = np.array(points)
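As an example of the deduplication mentioned above, a greedy sketch (my addition; the ρ and θ tolerances are assumptions):

drho, dtheta = 20, np.pi / 60   # tolerances in pixels and radians
kept = []
for rho, theta in lines[:, 0]:
    # keep a line only if no already-kept line is within both tolerances
    if all(abs(rho - r) > drho or abs(theta - t) > dtheta for r, t in kept):
        kept.append((rho, theta))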
I need help with an algorithm I've been working on. I'm trying to detect all the lines in a thresholded image and then output only those that are parallel. The thresholded image shows the object of interest, and I filter this image through a Canny edge detector. This edge image is then passed through the probabilistic Hough transform. Now, I want the algorithm to be capable of detecting parallel lines in any image. I had in mind to do this by detecting the coordinates of all the lines and calculating their slope (and from that the angle). Parallel lines must have the same, or almost the same, angle, and in that way I could output only the lines with the same angle. I could maybe draw an imaginary line in the image and then use it as a reference for all the detected lines? I just don't understand how to use the coordinates of all the lines detected through the function cv2.HoughLinesP(). The documentation of this function says that the output is a 4D array, and this is confusing for me. This is a part of my code:
Line Detection through Probabilistic Hough Transform
rho_res = .1                 # [pixels]
theta_res = np.pi / 180.     # [radians]
threshold = 50               # [number of votes]
min_line_length = 100        # [pixels]
max_line_gap = 40            # [pixels]
lines = cv2.HoughLinesP(edge_image, rho_res, theta_res, threshold, np.array([]),
                        minLineLength=min_line_length, maxLineGap=max_line_gap)
Draw lines
if lines is not None:
    for i in range(0, len(lines)):   # was len(linesP), which is undefined here
        coords = lines[i][0]
        # note: a vertical segment makes this denominator zero
        slope = (float(coords[3]) - coords[1]) / (float(coords[2]) - coords[0])
        cv2.line(img, (coords[0], coords[1]), (coords[2], coords[3]), (0, 0, 255), 2, cv2.LINE_AA)
Any idea on how I could extrapolate all the detected lines and then output only those that are parallel? I have tried a few algorithms online but none seems to work. Again, my problem is understanding and working with the output variables of the function cv2.HoughLinesP(). I have also found code that is supposed to calculate the slope. I tried it, but it just gives me one value (one slope). I want the slopes of all the lines in the image.
Project the Hough transform onto the angle axis. This gives you a 1D signal as a function of theta that is proportional to the “amount of line” in that orientation. Peaks in this signal indicate orientations that have many parallel lines. Find the largest peak; that gives you a theta.
Now go back to the Hough transform image, and detect peaks with this value of theta (maybe allow a little bit of wiggle). Now you’ll have all parallel lines at this orientation.
Sorry I can’t give you code that works with cv2.HoughLinesP, I don’t know this function. I hope this description gives you a starting point.
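For what it's worth, a sketch of the projection idea using cv2.HoughLines instead (my addition; the Hough threshold and the tolerance are assumptions):

import numpy as np
import cv2

lines = cv2.HoughLines(edge_image, 1, np.pi / 180, 50)
theta = lines[:, 0, 1]
# project onto the angle axis: histogram of line orientations
hist, bin_edges = np.histogram(theta, bins=180, range=(0, np.pi))
dominant = bin_edges[np.argmax(hist)] + (bin_edges[1] - bin_edges[0]) / 2
wiggle = np.pi / 90   # ~2 degrees of tolerance
parallel = lines[np.abs(theta - dominant) < wiggle]   # lines at the peak orientation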
Calculate the slope (angle) for all lines in the range 0..Pi using the atan2 function. To limit the range to positive angles, add Pi to negative results.
Sort the results by slope. Walk through the sorted list and group close values: these lines are nearly parallel. Note that you might get a long run of slightly different neighboring values where the start and end of the run differ a lot, so use some (angular) threshold to break the run. A sketch of this grouping follows below.
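A sketch of that procedure (my addition), using the segment output of cv2.HoughLinesP as in the question; the 2-degree threshold is an assumption:

import math

angles = []
for x1, y1, x2, y2 in lines[:, 0]:
    a = math.atan2(y2 - y1, x2 - x1)
    angles.append(a + math.pi if a < 0 else a)   # fold into 0..Pi

angles.sort()
tol = math.radians(2)         # angular threshold that breaks a run
groups, current = [], [angles[0]]
for a in angles[1:]:
    if a - current[0] < tol:  # compare to the run's start, not its last value
        current.append(a)
    else:
        groups.append(current)
        current = [a]
groups.append(current)
# every group with more than one member is a set of (near-)parallel lines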
I just don't understand how to use the coordinates of all the lines detected through the function cv2.HoughLinesP(). The documentation of this function says that the output is a 4D array and this is confusing for me.
The 4D array is just the output vector of detected lines. Each line is represented by a 4-element vector (x1, y1, x2, y2), where (x1, y1) and (x2, y2) are the end points of each detected line segment.
Please refer to the attached picture to get an idea of what those mean. Keep in mind that those coordinates are in image space.
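Concretely, the result has shape (N, 1, 4), so each segment unpacks like this (a minimal sketch):

for segment in lines:          # lines is the result of cv2.HoughLinesP
    x1, y1, x2, y2 = segment[0]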