I have an object with 2 codes printed on it. The text is curved: half of the text is on the top side and the other half is on the bottom side of the object. Here is my sample image.
I am using OpenCV, deep learning approaches, and Tesseract to OCR its code.
In the logical approach (not the deep learning one) I first used HoughCircles() and logPolar() to align the text in a line and then ran Tesseract, as in this example sample code. But because of distortion in the aligned text, Tesseract fails to OCR it.
In the deep learning approach I can't find an optimal solution for curved-text OCR in TensorFlow or PyTorch. There are many sources for text detection, but not for recognition.
Regards, John
Why not transform the circular text to linear? Similar to this: De-skew characters in binary image, just a bit more complicated. So detect (or manually select) the center of the circle and convert the image to an unrotated one ...
So create a new image with dimensions 6.28*max_radius by 2*max_radius and copy the pixels using polar unwrapping ... simply convert each target pixel position into polar coordinates and convert that to the Cartesian source pixel position.
I do not code in Python or OpenCV, but here is a simple C++ example of this:
//---------------------------------------------------------------------------
picture pic0,pic1;                      // pic0 - original input image, pic1 - output
//---------------------------------------------------------------------------
void ExtractCircularText(int x0,int y0) // pic0 -> pic1, center = (x0,y0)
    {
    int x,y,xx,yy,RR;
    float fx,fy,r,a,R;
    // resize target image: R = distance from (x0,y0) to the farthest image corner
    x=        -x0; y=        -y0; a=sqrt((x*x)+(y*y)); R=a;
    x=pic0.xs-x0;  y=        -y0; a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    x=        -x0; y=pic0.ys-y0;  a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    x=pic0.xs-x0;  y=pic0.ys-y0;  a=sqrt((x*x)+(y*y)); if (R<a) R=a;
    R=ceil(R); RR=R;
    pic1.resize((628*RR)/100,RR<<1);    // width ~ 6.28*R (full circle), height = 2*R
    for (yy=0;yy<pic1.ys;yy++)
     for (xx=0;xx<pic1.xs;xx++)
        {
        // pic1 position xx,yy -> polar coordinates a,r
        a=xx; a/=R; r=yy;
        // a,r -> pic0 position
        fx=r*cos(a); x=x0+fx;
        fy=r*sin(a); y=y0+fy;
        // copy pixel
        if ((x>=0)&&(x<pic0.xs))
         if ((y>=0)&&(y<pic0.ys))
            {
            pic1.p[          yy][pic1.xs-1-xx]=pic0.p[y][x]; // 2 mirrors as the text is not uniformly oriented
            pic1.p[pic1.ys-1-yy][          xx]=pic0.p[y][x];
            }
        }
    pic1.save("out.png");
    }
//---------------------------------------------------------------------------
I use my own picture class for images, so some of its members are:
xs,ys - size of the image in pixels
p[y][x].dd - pixel at (x,y) position, as a 32-bit integer type
clear(color) - clears the entire image with color
resize(xs,ys) - resizes the image to a new resolution
And finally the resulting image:
I made 2 copies of the unrotated image (hence the 2*max_radius height) so I can copy the image in 2 modes, making both orientations of the text readable (as they are mirrored to each other).
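If you prefer to stay in Python/OpenCV instead of porting the C++ above, roughly the same unwrapping can be sketched with cv2.warpPolar; the center, file names and output size below are placeholders you would have to fill in yourself:

import cv2
import numpy as np

img = cv2.imread('disc.png')               # placeholder input file
h, w = img.shape[:2]
center = (w // 2, h // 2)                  # placeholder: use the real disc center here
max_r = int(np.hypot(w, h) / 2)            # radius large enough to cover the whole image

# warpPolar puts the radius on the x axis and the angle on the y axis,
# so the circular text comes out vertical; rotate to make it horizontal
polar = cv2.warpPolar(img, (max_r, int(2 * np.pi * max_r)), center, max_r,
                      cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)
unrolled = cv2.rotate(polar, cv2.ROTATE_90_COUNTERCLOCKWISE)

# like the 2 copies above: a 180-degree rotated version so the other half
# of the text also reads left to right
flipped = cv2.flip(unrolled, -1)

cv2.imwrite('unrolled.png', unrolled)
cv2.imwrite('unrolled_flipped.png', flipped)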
The text will be straighter if you choose the center (x0,y0) more precisely. I just clicked it with the mouse at the center of the circle, but I doubt the center of the text has the same center as that circle/disc. After some clicking, this is the best center I could find:
The result suggests that neither of the two texts nor the disc share the same center ...
The quality of the input image is not good; you should improve it before doing this (maybe even binarization is a good idea). Also, storing it as JPG is not a good idea, as its lossy compression adds more noise. Take a look at these:
Enhancing dynamic range and normalizing illumination
OCR and character similarity
PS. The center could be computed geometrically from the selected text (arc): simply find the most distant points on it (its ends) and the point on the arc midway between them. From those three points you can compute the arc's center and radius... or even fit it ...
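Here is a small sketch of that geometric computation (pure NumPy; the three example points are placeholders, they would be the two ends of the text arc plus a point in the middle of the arc):

import numpy as np

def circle_from_3_points(p1, p2, p3):
    # circumcenter of three non-collinear 2D points
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    center = np.array([ux, uy])
    radius = float(np.hypot(*(center - np.array(p1, dtype=float))))
    return center, radius

# arc end points and arc mid point (placeholder coordinates)
center, radius = circle_from_3_points((120.0, 40.0), (40.0, 120.0), (63.0, 63.0))
print(center, radius)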
The black dot is a perfect feature for centering, and the polar unwarping seems to work fine; the deformation of the characters is negligible.
The failure of Tesseract might be explained by the low image quality (blur).
I have an image (given here), and I have the centroids and areas of every small and big defect present in it. For example, I have three lists x, y and area, where x and y are the coordinates of the centroids of the defects (every yellow object counts as a defect) in the image, and area is the area of each defect computed from its contour. I want to show a density map or heatmap over this image that clearly shows that a defect with a larger area produces a higher peak than a defect with a smaller area. How can I do this in Python? For reference I have attached one more image from a paper (given here); there, based on a KDE and a weighted KDE of the image, it is clearly shown that the bigger defect (big yellow circle) has the higher peak.
So you are trying to draw a heatmap superimposed on an image to represent what you are calling the "defects" in the image (it's not clear from your explanation what those are; maybe deviations from a reference image?). This sounds like it would be VERY confusing for a viewer to interpret, having to mentally separate the heatmap pixels from the pixels of the image itself. Much better would be to create a new blank image with the same dimensions as the original, then plot points in that image whose centers (x,y) represent the locations in the original image and whose radius/color represent the area.
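A minimal sketch of that suggestion (the x, y, area lists and the image size are placeholders for your real detection results):

import cv2
import numpy as np

# placeholder detection results: centroids and contour areas of the defects
x = [50, 180, 300]
y = [80, 200, 150]
area = [40.0, 400.0, 1500.0]

h, w = 480, 640                                  # same size as the original image
canvas = np.zeros((h, w, 3), dtype=np.uint8)

max_area = max(area)
for cx, cy, a in zip(x, y, area):
    radius = max(2, int(np.sqrt(a)))             # radius grows with defect area
    weight = int(255 * a / max_area)             # color encodes the relative area
    cv2.circle(canvas, (int(cx), int(cy)), radius, (0, weight, 255 - weight), -1)

cv2.imwrite('defect_map.png', canvas)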
I am working on a stereo vision project. My goal is to locate the 3D coordinates of a point on a target marked by a laser pointer.
I do the stereo calibration with full-size pictures. After getting the parameters, initUndistortRectifyMap is applied to get the mapping data map1 and map2.
cv.initUndistortRectifyMap( cameraMatrix, distCoeffs, R, newCameraMatrix, size, m1type[, map1[, map2]] ) -> map1, map2
Since my target is just a small area and I would like to increase the acquisition fps, my cameras acquire only the ROI instead of full-size pictures.
Here comes my problem.
Can I remap just the ROI of an image instead of the full picture?
It is easy to remap a picture of the same size as map1 and map2 with the remap function; however, how can I remap just the ROI of the picture?
cv.remap( src, map1, map2, interpolation[, dst[, borderMode[, borderValue]]] ) -> dst
Note: I tried to crop the ROI out of map1 and map2, but remap is not simply mapping pixels from the source picture to the destination picture.
According to https://stackoverflow.com/a/34265822/18306909, I cannot directly use map_x and map_y to get the destination of the ROI.
As stated in the docs you refer to, it is dst(x, y) = src(map_x(x, y), map_y(x, y)). Transforming points dst -> src is easy (a lookup in map_x and map_y), but the OP wants the other (more natural) direction: src -> dst. This is admittedly confusing because cv::remap works "inversely" (for numerical stability reasons). I.e., in order to map an image src -> dst, you supply a mapping from dst -> src. Unfortunately, that's only efficient when transforming many points on a regular grid (i.e. an image). Transforming a single random point is pretty difficult. – pasbi Feb 24, 2021 at 10:17
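One possible way to make that work for a cropped capture, sketched here as an assumption rather than a verified recipe: map1/map2 hold full-frame source coordinates for every rectified pixel, so you can crop the maps to the rectified window you need and subtract the sensor ROI offset before remapping the small frame. Rectified pixels whose source falls outside the captured ROI will just come out as border color.

import cv2
import numpy as np

# --- placeholders standing in for the real calibration results ---
full_w, full_h = 1920, 1200
cameraMatrix = np.array([[1000.0, 0.0, full_w / 2],
                         [0.0, 1000.0, full_h / 2],
                         [0.0, 0.0, 1.0]])
distCoeffs = np.zeros(5)
R = np.eye(3)
newCameraMatrix = cameraMatrix.copy()

# full-resolution maps (one float32 map per axis)
map1, map2 = cv2.initUndistortRectifyMap(cameraMatrix, distCoeffs, R,
                                         newCameraMatrix, (full_w, full_h),
                                         cv2.CV_32FC1)

# ROI the camera actually delivers: top-left corner (rx, ry) in the full frame
rx, ry, rw, rh = 800, 500, 320, 240
roi_frame = np.zeros((rh, rw, 3), np.uint8)      # placeholder for the captured ROI

# rectified output window we care about (placeholder), cut out of the maps;
# subtract the ROI offset because the maps store full-frame source coordinates
ox, oy, ow, oh = 800, 500, 320, 240
roi_map1 = map1[oy:oy + oh, ox:ox + ow] - rx
roi_map2 = map2[oy:oy + oh, ox:ox + ow] - ry

rectified_roi = cv2.remap(roi_frame, roi_map1, roi_map2, cv2.INTER_LINEAR)
print(rectified_roi.shape)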
I am trying to train a YOLO model.
For this purpose I have divided my 224*224 input image into a 14*14 grid.
Now suppose there is an object whose centre is located at (Bx, By), taking (0,0) as the top left of the image, and it has width Bw and height Bh.
Required_prediction=[Pc,Bx,By,Bw,Bh]
where Pc is the probability that the required object is present.
Thus the output of the model will be 14*14*5.
My question is: what should the output label be?
All boxes [0,0,0,0,0], with only the box containing the centre of the required object set to [pc,bx,by,bw,bh], OR
all boxes [0,0,0,0,0] except the whole area covered by the required object, labelled as [pc,bx, . . . ]?
ALSO,
for bx, by, bw, bh: should the centre of the object be specified with respect to the top left of the image or of the grid cell the coordinate falls into?
All boxes [0,0,0,0,0], with the box containing the center of the required object set to [pc,bx,by,bw,bh], is the right choice given that you divided the image into a 14*14 grid.
But in real-world problems the image is split at several scales, which means you may split the image into 14*14, 8*8 and 4*4 grids to handle different object sizes.
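For completeness, a small sketch of how such a 14*14*5 label tensor could be filled under the first option (here bx, by are stored as offsets inside the responsible grid cell and bw, bh are normalized by the image size, which is the usual YOLO encoding; treat the exact normalization as an assumption):

import numpy as np

IMG_SIZE = 224
GRID = 14
CELL = IMG_SIZE / GRID                 # 16 pixels per cell

def make_label(boxes):
    # boxes: list of (Bx, By, Bw, Bh) in pixels, centers w.r.t. the image top left
    label = np.zeros((GRID, GRID, 5), dtype=np.float32)   # every cell starts as [0,0,0,0,0]
    for Bx, By, Bw, Bh in boxes:
        col = int(Bx // CELL)          # grid cell that contains the object center
        row = int(By // CELL)
        bx = (Bx - col * CELL) / CELL  # center offset inside that cell, 0..1
        by = (By - row * CELL) / CELL
        bw = Bw / IMG_SIZE             # size normalized by the image, 0..1
        bh = Bh / IMG_SIZE
        label[row, col] = [1.0, bx, by, bw, bh]            # Pc = 1 only here
    return label

label = make_label([(100.0, 60.0, 40.0, 32.0)])            # one example object
print(label.shape, label[60 // 16, 100 // 16])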
[Updated The Question at the End]
I'm trying to detect a design pattern of simple geometrical shapes in a 640x480 image. I have divided the image into 32x32 blocks and I am checking which block each shape's center lies in.
Based on this calculation I created a numpy matrix of 160x120 zeros (float32), with
col = 640/4
row = 480/4
Each time a shape is found, its center is calculated and the block it falls into is determined. The corresponding item, along with its 8 neighbors, in the 160x120 numpy array is set to 1. In the end the 160x120 numpy array is displayed as a grayscale image with a black background, where white pixels represent the blocks of detected shapes.
As shown in the image below.
The image in top left corner represents the 160x120 numpy array. No issue so far.
As you can see, the newly generated image has a white line on a black background. I want to find the rho, theta, x0, y0, x1, y1 of this line, so I decided to use the HoughLines transform for this.
The code for this is as follows:
edges = cv2.Canny(np.uint8(g_quadrants), 50, 150, apertureSize=3)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)
print(lines)
Here g_quadrants is the 160x120 matrix representing a grayscale image, but the output of cv2.HoughLines is nothing but None.
Please help me with this.
Update:
The small window with a black-and-white image (np.float32, treated as grayscale) displaying a white line is what I actually get when I:
Divide the 640x480 in 32x32 blocks
Find the triangles in the image
Create a 32x32 matrix to map the results for each block
Update the corresponding matrix element by 1 if a triangle is found in a block
Zoomed View:
You can see there are white pixels forming a straight line. There may be some unwanted detections; I need to eliminate the unwanted lone pixels and reconstruct a continuous straight line. That may be achieved by dilating and then eroding the image. I need to find the x0, y0, x1, y1, rho, theta of this line.
There may be more than one line. In that case I need to find the top 2 lines with respect to length.
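A possible sketch of that clean-up plus line extraction (kernel sizes and Hough parameters here are guesses, not values from the question; note that on a 160x120 image the vote threshold must be far lower than the 200 used above):

import cv2
import numpy as np

# placeholder standing in for the real 160x120 float32 grid image with 0/1 values;
# the fake line is ~3 px thick, like the 3x3 neighborhood marking described above
g_quadrants = np.zeros((120, 160), np.float32)
cv2.line(g_quadrants, (10, 20), (140, 100), 1.0, 3)

grid_u8 = (np.clip(g_quadrants, 0, 1) * 255).astype(np.uint8)

# close small gaps along the line (dilate then erode); stray isolated pixels are
# ignored anyway by the minimum segment length below
kernel = np.ones((3, 3), np.uint8)
cleaned = cv2.morphologyEx(grid_u8, cv2.MORPH_CLOSE, kernel)

# probabilistic Hough gives the segment end points (x0, y0, x1, y1) directly
segments = cv2.HoughLinesP(cleaned, 1, np.pi / 180, threshold=20,
                           minLineLength=20, maxLineGap=5)

if segments is not None:
    # keep the two longest segments
    longest = sorted(segments[:, 0],
                     key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]),
                     reverse=True)[:2]
    for x0, y0, x1, y1 in longest:
        theta = np.arctan2(y1 - y0, x1 - x0) + np.pi / 2   # normal direction of the line
        rho = x0 * np.cos(theta) + y0 * np.sin(theta)
        print(x0, y0, x1, y1, rho, theta)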
I have an image, and using steganography I want to save data in the border pixels only.
In other words, I want to save the data only in the least significant bits (LSB) of the border pixels of the image.
Is there any way to get the border pixels so I can store data (a text of at most 15 characters) in them?
Please help me out...
OBTAINING BORDER PIXELS:
Masking operations are one of many ways to obtain the border pixels of an image. The code would be as follows:
import cv2
import numpy as np

a = cv2.imread('cal1.jpg')
bw = 20                                    # width of border required
mask = np.ones(a.shape[:2], dtype="uint8")
cv2.rectangle(mask, (bw, bw), (a.shape[1] - bw, a.shape[0] - bw), 0, -1)
output = cv2.bitwise_and(a, a, mask=mask)
cv2.imshow('out', output)
cv2.waitKey(5000)
After I get an array of ones with the same dimensions as the input image, I use the cv2.rectangle function to draw a rectangle of zeros. The first argument is the image you want to draw on, the second argument is the start (x,y) point, and the third argument is the end (x,y) point. The fourth argument is the color, and -1 is the thickness of the rectangle drawn (-1 fills the rectangle). You can find the documentation for the function here.
Now that we have our mask, we can use the cv2.bitwise_and function (documentation) to AND the pixels. Basically, pixels that are ANDed with '1' pixels in the mask retain their values, while pixels ANDed with '0' pixels in the mask are set to 0. This way you will have the following output:
The input image was :
You have the border pixels now!
Using LSB planes to store your info is not a good idea. It makes sense when you think about it: a simple lossy compression would destroy most of your hidden data. Saving your image as JPEG would result in lost or severely corrupted info. If you still want to try LSB, look into bit-plane slicing. Through bit-plane slicing you basically obtain the bit planes (from MSB to LSB) of the image. (image from researchgate.net)
I have done it in MATLAB and I'm not quite sure about doing it in Python. In MATLAB,
the function bitget(image, 1) returns the LSB of the image. I found a question on bit-plane slicing using Python here. Though unanswered, you might want to look into the posted code.
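In Python the same bit-plane slicing can be done with NumPy's bit operators; a minimal sketch for a grayscale uint8 image (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread('cal1.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder file name

# bit plane k (0 = LSB ... 7 = MSB): shift right, keep the lowest bit
planes = [((img >> k) & 1) * 255 for k in range(8)]

cv2.imwrite('lsb_plane.png', planes[0])              # counterpart of bitget(image, 1) in MATLAB
cv2.imwrite('msb_plane.png', planes[7])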
To access the border pixels and enter data into them:
The shape of an image is accessed with t = img.shape. It returns a tuple of the number of rows, columns, and channels; the channels are indexed 0, 1, 2 (OpenCV loads them in B, G, R order). r is a variable holding the values to be stored (int(r[0]) is the first of them).
import cv2

img = cv2.imread('xyz.png')
t = img.shape
print(t)

component = 2                      # channel index 2 = red channel in OpenCV's BGR order
r = [10, 20, 30, 40]               # example values to store (placeholder data)

# write one value into the chosen channel of each of the four corner pixels
img.itemset((0, 0, component), int(r[0]))
img.itemset((0, t[1] - 1, component), int(r[1]))
img.itemset((t[0] - 1, 0, component), int(r[2]))
img.itemset((t[0] - 1, t[1] - 1, component), int(r[3]))

# read the values back
print(img.item(0, 0, component))
print(img.item(0, t[1] - 1, component))
print(img.item(t[0] - 1, 0, component))
print(img.item(t[0] - 1, t[1] - 1, component))

cv2.imwrite('output.png', img)
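Putting the two answers together, here is a rough sketch (my own assumption about how one could do it, not code taken from above) of writing a short ASCII message into the LSBs of the first border row and reading it back; the output must be a lossless format such as PNG, otherwise the LSBs are destroyed:

import cv2

def embed_in_top_row(img, text):
    # write the bits of `text` into the LSB of the blue channel along the first row
    bits = []
    for ch in text.encode('ascii'):
        bits.extend((ch >> i) & 1 for i in range(7, -1, -1))   # MSB first
    if len(bits) > img.shape[1]:
        raise ValueError('message too long for one border row')
    out = img.copy()
    for col, bit in enumerate(bits):
        out[0, col, 0] = (out[0, col, 0] & 0xFE) | bit          # replace only the LSB
    return out

def extract_from_top_row(img, n_chars):
    bits = [int(img[0, col, 0] & 1) for col in range(n_chars * 8)]
    chars = [sum(b << (7 - i) for i, b in enumerate(bits[k:k + 8]))
             for k in range(0, len(bits), 8)]
    return bytes(chars).decode('ascii')

img = cv2.imread('xyz.png')                  # placeholder file name
stego = embed_in_top_row(img, 'hello world')
cv2.imwrite('stego.png', stego)              # lossless, so the hidden bits survive
print(extract_from_top_row(cv2.imread('stego.png'), len('hello world')))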