I am trying to train a YOLO model.
For this purpose I have divided my 224*224 input image into a 14*14 grid.
Now suppose there's an object whose centre is located at (Bx, By), taking (0, 0) as the top left of the image, and which has width Bw and height Bh.
Required_prediction=[Pc,Bx,By,Bw,Bh]
where Pc is the probability that the required object is present.
Thus the output of the model will be 14*14*5.
My question is what should the output Label be ?
All cells [0,0,0,0,0], and only the cell containing the centre of the required object set to [Pc,Bx,By,Bw,Bh], OR
All cells [0,0,0,0,0], except the whole area covered by the required object labelled as [Pc,Bx,...]?
ALSO
For Bx,By,Bw,Bh, should the centre of the object be specified with respect to the top left of the image, or with respect to the grid cell the centre falls into?
"All cells [0,0,0,0,0] and only the cell containing the center of the object set to [Pc,Bx,By,Bw,Bh]" is the right choice, given the assumption that you divided the image into a 14*14 grid.
However, in real-world problems the image is usually split at several scales, which means you may split the image into 14*14, 8*8 and 4*4 grids to address different sizes of objects.
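As an illustration, a minimal sketch of how such a 14*14*5 label tensor could be built for a single object (the normalisation and the cell-relative encoding used here are one common YOLO convention, assumed rather than stated in the question):

import numpy as np

GRID = 14
IMG = 224
CELL = IMG / GRID   # 16 pixels per cell

def make_label(bx, by, bw, bh):
    # build a 14x14x5 target for a single object, all inputs in pixels
    label = np.zeros((GRID, GRID, 5), dtype=np.float32)
    col, row = int(bx // CELL), int(by // CELL)        # cell containing the object centre
    label[row, col] = [1.0,                            # Pc
                       (bx % CELL) / CELL,             # centre x relative to the cell
                       (by % CELL) / CELL,             # centre y relative to the cell
                       bw / IMG,                       # width relative to the image
                       bh / IMG]                       # height relative to the image
    return label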
I have a depth image but I am manipulating it like a (2d) grayscale image to analyze the shape of the figures I have.
I am trying to get the width (distance) of a shape, as given by this image. The width is shown by the red line, which also follows the direction of vector v2.
I have the vectors shown in the image, resulting from a 2-component PCA used to obtain the direction of the shape (the shape in the picture is cropped, since I just need the width marked in red, on this part of the shape).
I have no clue how to rotate the points to the origin, or how to project the points onto the line and then calculate the width, e.g. by computing the Euclidean distance from the minimum to the maximum.
How do I get the width spanned by a set of points that are not axis-aligned?
I managed it using a rotated bounding box from cv2, as described in this solution.
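For reference, a minimal sketch of that rotated-bounding-box idea (the points array below is a placeholder for the shape's pixel coordinates):

import numpy as np
import cv2

# placeholder: N x 2 array of (x, y) pixel coordinates belonging to the shape
points = np.float32([[10, 12], [42, 30], [55, 48], [21, 26], [33, 35]])

# fit the minimum-area rotated rectangle around the points
(cx, cy), (w, h), angle = cv2.minAreaRect(points)

# take the shorter side of the rotated box as the width of the shape
width = min(w, h)
print(width)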
I am working on camera calibration using OpenCV in Python. I've already done the calibration using cv2.calibrateCamera and got the camera matrix and distortion coefficients. I have also evaluated the validity of the camera matrix; in other words, the estimated focal length is very close to the lens's focal length from the datasheet (I know the pixel size and the focal length in mm from the datasheet). I should mention that in order to undistort new images I follow the instructions below, as I NEED to keep all source pixels in the undistorted images.
alpha = 1.   # to keep all source pixels
scale = 1.   # to change the output image size
w, h = 200, 200  # original image size captured by the camera
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coefs, (w, h), alpha, (int(scale*w), int(scale*h)))
mapx, mapy = cv2.initUndistortRectifyMap(camera_matrix, dist_coefs, None, newcameramtx, (w, h), 5)
dst = cv2.remap(img, mapx, mapy, cv2.INTER_CUBIC)
x_, y_, w_, h_ = roi
dst_cropped = dst[y_:y_+h_, x_:x_+w_]
And now the issues and my questions:
The source images suffer from high positive radial distortion, and the dst images resulting from the undistortion process are satisfying: the positive radial distortion appears to be cancelled, at least visually. Because of alpha = 1. I also have all source pixels in the dst image. However, the roi is really small and it crops a region in the middle of the image; I could say that dst_cropped only contains the pixels close to the center of dst. According to the links below:
cv2.getOptimalNewCameraMatrix returns ROI of [0,0,0,0] on some data sets
https://answers.opencv.org/question/28438/undistortion-at-far-edges-of-image/?answer=180493#post-id-180493
I found that the probable issue might be my dataset, so I tried to balance it to have more images with the chessboard close to the image boundaries. I repeated the calibration and the obtained results are very close to the first trial, yet the same effect is still present in the dst_cropped images. I tried to play with the alpha parameter as well, but any value less than 1. does not keep all source pixels in the dst image.

Considering all of the above, it seems I'm obliged to keep using the dst images instead of the dst_cropped ones. But then another issue arises from the dst size, which is the same as the source image (w,h). It is clear that because of alpha=1. the dst contains all source pixels as well as zero pixels, but my question is how I can keep the resolution as before. If I'm not mistaken, it seems all points are mapped and then the resulting image is scaled down to fit (w,h). So my question is: how can I force the undistortion to KEEP the resolution as before? For example, if some points are mapped to (-100,-100) or (300,300), the dst should be [400,400] and not [200,200]. How can I expand the image instead of scaling it down?
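For illustration, a minimal sketch of what is being asked for, reusing the scale factor already present in the snippet above (the factor of 2 is only an illustrative value; whether this fully preserves the original sampling depends on the distortion):

scale = 2.                                      # e.g. double the output canvas (illustrative value)
new_size = (int(scale * w), int(scale * h))
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coefs, (w, h), alpha, new_size)
mapx, mapy = cv2.initUndistortRectifyMap(camera_matrix, dist_coefs, None, newcameramtx, new_size, 5)
dst_big = cv2.remap(img, mapx, mapy, cv2.INTER_CUBIC)   # 400x400 output, so source pixels are not scaled down as much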
Thanks in advance for your help or advice.
I have an object with 2 codes printed on it as text. The text is curved: half of the text is on the top side and the other half is on the bottom side of the object. Here is my sample image.
I am using OpenCV, deep learning approaches and Tesseract to OCR its codes.
In the logical approach (not the deep approach), I first used HoughCircles() and logPolar() to align the text in a line, then used Tesseract as in this example's sample code. But because of distortion in the aligned text, Tesseract fails to OCR it.
In the deep approach I can't find a good solution for curved-text OCR in TensorFlow or Torch. There are many sources for text detection, but not for recognition.
Regards, John
Why not transform the circular text to linear? Similar to this: De-skew characters in binary image, just a bit more complicated. So detect (or manually select) the center of the circle and convert the image to an unrotated one ...
So create a new image with dimensions 6.28*max_radius, 2*max_radius and copy the pixels using polar unwrapping ... simply convert each target pixel position into polar coordinates and convert that to the Cartesian source pixel position.
I do not code in Python or OpenCV, but here is a simple C++ example of this:
//---------------------------------------------------------------------------
picture pic0,pic1;                      // pic0 - original input image, pic1 - output
//---------------------------------------------------------------------------
void ExtractCircularText(int x0,int y0) // pic0 -> pic1, center = (x0,y0)
    {
    int x,y,xx,yy,RR;
    float fx,fy,r,a,R;
    // resize target image: R = largest distance from center to any image corner
    x=        -x0; y=        -y0; a=sqrt((x*x)+(y*y)); R=a;
    x=pic0.xs-x0; y=        -y0; a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    x=        -x0; y=pic0.ys-y0; a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    x=pic0.xs-x0; y=pic0.ys-y0; a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    R=ceil(R); RR=R;
    pic1.resize((628*RR)/100,RR<<1);    // ~2*pi*R wide, 2*R tall
    for (yy=0;yy<pic1.ys;yy++)
     for (xx=0;xx<pic1.xs;xx++)
        {
        // pic1 position xx,yy -> polar coordinates a,r
        a=xx; a/=R; r=yy;
        // a,r -> pic0 position
        fx=r*cos(a); x=x0+fx;
        fy=r*sin(a); y=y0+fy;
        // copy pixel
        if ((x>=0)&&(x<pic0.xs))
         if ((y>=0)&&(y<pic0.ys))
            {
            pic1.p[         yy][pic1.xs-1-xx]=pic0.p[y][x]; // 2 mirrors as the text is not uniformly oriented
            pic1.p[pic1.ys-1-yy][          xx]=pic0.p[y][x];
            }
        }
    pic1.save("out.png");
    }
//---------------------------------------------------------------------------
I use my own picture class for images, so the relevant members are:
xs,ys is size of image in pixels
p[y][x].dd is pixel at (x,y) position as 32 bit integer type
clear(color) clears entire image with color
resize(xs,ys) resizes image to new resolution
And finally the resulting image:
I made 2 copies of the unrotated image (hence the 2*max_radius height) so I can copy the image in 2 modes, to make both orientations of the text readable (as they are mirrored relative to each other).
The text will be straighter if you choose the center (x0,y0) more precisely. I just clicked it with the mouse on the center of the circle, but I doubt the center of the text is the same as the center of that circle/disc. After some clicking, this is the best center I could find:
The result suggests that neither of the two texts nor the disc share the same center ...
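For the Python/OpenCV side, a minimal sketch of the same polar unwrapping using cv2.warpPolar (available in newer OpenCV versions); the file name, center and radius below are placeholder assumptions:

import numpy as np
import cv2

img = cv2.imread("disc.jpg")            # placeholder file name
center = (512.0, 512.0)                 # placeholder: chosen/clicked circle center (x0, y0)
max_radius = 512.0                      # placeholder: distance from the center to the farthest corner

# unwrap: each row of the result is a ray from the center (x = radius, y = angle)
polar = cv2.warpPolar(img, (int(max_radius), int(2 * np.pi * max_radius)),
                      center, max_radius, cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)

# rotate so text that ran along a circle becomes roughly horizontal,
# and keep a 180-degree rotated copy for the half of the text that runs the other way around
unwrapped = cv2.rotate(polar, cv2.ROTATE_90_COUNTERCLOCKWISE)
flipped = cv2.flip(unwrapped, -1)
cv2.imwrite("unwrapped.png", unwrapped)
cv2.imwrite("flipped.png", flipped)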
The quality of the input image is not good; you should improve it before doing this (maybe even binarization is a good idea). Also, storing it as JPG is not a good idea, as its lossy compression adds more noise. Take a look at these:
Enhancing dynamic range and normalizing illumination
OCR and character similarity
PS. The center could be computed geometrically from the selected text (arc): simply find the most distant points on it (its edges) and the point in the middle between them on the arc. From those you can compute the arc center and radius ... or even fit it ...
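For completeness, a minimal sketch of that geometric idea (the three points below are placeholders for the two ends of the text arc and a point midway along it):

import numpy as np

# placeholders: the two ends of the text arc and a point midway along the arc
(ax, ay), (bx, by), (cx, cy) = (120.0, 40.0), (200.0, 100.0), (150.0, 45.0)

# circumcenter of the three points = center of the circle passing through them
d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d

radius = np.hypot(ax - ux, ay - uy)
print((ux, uy), radius)        # d == 0 would mean the three points are collinear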
The black dot is a perfect feature for centering, and the polar unwarping seems to work fine; the deformation of the characters is negligible.
The failure of Tesseract might be explained by the low image quality (blur).
[Updated The Question at the End]
I'm trying to detect a design pattern of simple geometrical shapes in a 640x480 image. I have divided the image into 32x32 blocks and am checking in which block each shape's center lies.
Based on this calculation I created a 160x120 NumPy matrix of zeros (float32), with
col=640/4
row=480/4
Each time a shape is found, its center is calculated and checked to see which block it falls into. The corresponding item, along with its 8 neighbors in the 160x120 NumPy array, is set to 1. In the end the 160x120 NumPy array is displayed as a grayscale image with a black background and white pixels representing the blocks of detected shapes.
As shown in the image below.
The image in the top left corner represents the 160x120 NumPy array. No issue so far.
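For reference, a minimal sketch of how such a grid might be filled for one detected shape center (the 4-pixel block size and the example center coordinates are assumptions taken from the description above):

import numpy as np

g_quadrants = np.zeros((480 // 4, 640 // 4), dtype=np.float32)   # 120 rows x 160 columns

cx, cy = 322, 241                       # placeholder: detected shape center in pixels
row, col = cy // 4, cx // 4             # block that contains the center
g_quadrants[max(row - 1, 0):row + 2, max(col - 1, 0):col + 2] = 1.0   # the block and its 8 neighbors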
As you can see, the newly generated image has a white line on a black background. I want to find rho, theta, x0, y0, x1, y1 for this line, so I decided to use the HoughLines transform.
The code for this is as follows:
edges = cv2.Canny(np.uint8(g_quadrants), 50, 150, apertureSize=3)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)
print(lines)
Here g_quadrants is the 160x120 matrix representing a grayscale image, but the output of cv2.HoughLines is simply None.
Please help me with this.
Update:
The small window with a black-and-white image (np.float32, treated as grayscale) displaying a white line is what I actually get when I:
Divide the 640x480 in 32x32 blocks
Find the triangles in the image
Create a 32x32 matrix to map the results for each block
Set the corresponding matrix element to 1 if a triangle is found in that block
Zoomed View:
You can see there are white pixels forming a straight line. There may be some unwanted detections. I need to eliminate the unwanted lone pixels and reconstruct a continuous straight line; that may be achieved by dilating and then eroding the image. I need to find x0, y0, x1, y1, rho, theta for this line.
There may be more than one line. In that case I need to find the top 2 lines by length.
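A minimal sketch of that dilate-then-erode idea followed by a probabilistic Hough transform; the kernel size and thresholds are placeholder values to be tuned (note that a vote threshold of 200 is about the longest line a 160x120 image can even contain, which is likely why cv2.HoughLines returns None):

import numpy as np
import cv2

# g_quadrants is the 160x120 float32 grid of 0s and 1s described above
img = np.uint8(g_quadrants * 255)

# dilate then erode (morphological closing) to bridge small gaps in the line
kernel = np.ones((3, 3), np.uint8)
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

# probabilistic Hough returns segments (x0, y0, x1, y1) directly
segments = cv2.HoughLinesP(closed, 1, np.pi / 180, threshold=30,
                           minLineLength=20, maxLineGap=5)
if segments is not None:
    # keep the two longest segments
    segs = sorted(segments[:, 0, :],
                  key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]),
                  reverse=True)[:2]
    for x0, y0, x1, y1 in segs:
        theta = np.arctan2(x1 - x0, -(y1 - y0))           # angle of the line's normal
        rho = x0 * np.cos(theta) + y0 * np.sin(theta)     # signed distance from the origin
        print(x0, y0, x1, y1, rho, theta)                 # sign/range conventions may differ from cv2.HoughLines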
This question already has an answer here: Displaying stitched images together without cutoff using warpAffine.
In short, my question is how do I put an image on top of another by specifying specific coordinates for the added image? I would need to extend the "canvas" of the base image as needed so that the added image doesn't get cropped.
Here's the extended version:
My project is to take pictures extracted from a drone video and make a rough map with them, by aligning one photo with the last. I know there is software I can use to do this, like Agisoft Photoscan, but my goal is to create a more lightweight, rough solution.
So here's my plan, which I intend to do with each frame:
1. Use estimateRigidTransform to generate the transformation matrix to align curr_photo with the last photo, base
2. Calculate the bounding rectangle needed to enclose the resulting image (using transformations of the four corners)
3. Modify the transformation matrix so that the top left of the bounding box is at the origin
4. Apply the transformation to the current photo, using the bounding rectangle's width and height to ensure none of the resulting image gets cropped
5. Superimpose the current image on the last image (making sure no cropping of either image occurs), by adding curr_image to base at the proper coordinates. This step is what I am asking about.
Here is the code that does steps one to four.
import numpy as np
import cv2
base = cv2.imread("images/frame_03563.jpg")
curr_photo = cv2.imread("images/frame_03564.jpg")
height, width = curr_photo.shape[:2]
# Step 1
# which transformation is required to go from curr_photo to base?
transformation = cv2.estimateRigidTransform(curr_photo, base, True)
# Step 2
# add a line to the affine transformation matrix so it can be used by
# perspectiveTransform
three_by_three = np.array([
transformation[0],
transformation[1],
[0, 0, 1]], dtype="float32")
# get corners of curr_photo (to be transformed)
corners = np.array([
[0, 0],
[width - 1, 0],
[width - 1, height - 1],
[0, height - 1]
])
# where do the corners of the image go
trans_corners = cv2.perspectiveTransform(np.float32([corners]), three_by_three)
# get the bounding rectangle for the four corner points (and thus, the transformed image)
bx, by, bwidth, bheight = cv2.boundingRect(trans_corners)
# Step 3
# modify transformation matrix so that the top left of the bounding box is at the origin
transformation[0][2] = transformation[0][2] - bx
transformation[1][2] = transformation[1][2] - by
# Step 4
# transform the image in a window the size of its bounding rectangle (so no cropping)
mod_curr_photo = cv2.warpAffine(curr_photo, transformation, (bwidth, bheight))
# for viewing
cv2.imshow("base", base)
cv2.imshow("current photo", curr_photo)
cv2.imshow("image2 transformed to image 1", mod_curr_photo)
cv2.waitKey()
I've also attached two sample images. I used the first one as the base, but it works either way.
Edit: I have now turned the answer linked below into a Python module, which you can now grab from GitHub here.
I answered this question a few weeks ago. The answer should contain everything needed to accomplish what you're after; the only thing I don't discuss there is alpha blending or other techniques to blend the borders of the images together as you would with a panorama or similar.
In order not to crop the warped photo, you need to calculate the needed padding beforehand, because the warp itself could reference negative indices, in which case it won't draw them. So you need to calculate the warp locations first, pad your image enough to account for the indices outside your image bounds, and then modify your warp matrix to add those translations in so they get warped to positive values.
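As a rough sketch of that idea, continuing from the question's code (so numpy as np, cv2, base, mod_curr_photo, bx, by, bwidth and bheight are assumed to exist exactly as in that snippet; the padding arithmetic here is illustrative and not copied from the linked answer):

# size of the base image
base_h, base_w = base.shape[:2]

# bounds of a canvas that contains both the base image and the warped photo,
# expressed in the base image's coordinate frame (bx, by may be negative)
x_min, y_min = min(0, bx), min(0, by)
x_max, y_max = max(base_w, bx + bwidth), max(base_h, by + bheight)
canvas = np.zeros((y_max - y_min, x_max - x_min, 3), dtype=base.dtype)

# paste the base image at its offset, then overlay the warped photo at its own offset
canvas[-y_min:-y_min + base_h, -x_min:-x_min + base_w] = base
region = canvas[by - y_min:by - y_min + bheight, bx - x_min:bx - x_min + bwidth]
mask = mod_curr_photo.sum(axis=2) > 0          # crude mask of non-black warped pixels
region[mask] = mod_curr_photo[mask]

cv2.imshow("combined", canvas)
cv2.waitKey()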
This allows you to create an image like this:
Image from Oxford's VGG.