Does Tesseract do image resizing internally? - python

OpenCV doesn't read image metadata, so we can't get the DPI of an image from it. When someone asks a DPI-related OCR question on Stack Overflow,
most of the answers say we don't need the DPI, only the pixel size:
Changing image DPI for usage with tesseract
Change dpi of an image in OpenCV
Elsewhere, where the question is not about DPI but about improving OCR accuracy, people suggest that setting the DPI to 300 will improve accuracy:
Tesseract OCR How do I improve result?
Best way to recognize characters in screenshot?
In addition, Tesseract's official page says this about it:
Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images.
After some Google searching, I found the following:
We can't tell the image resolution from its height and width alone.
We want an image resolution high enough to support accurate OCR.
Font size is typically a unit of length, not pixels: 72 points make one inch, so a 12pt font is 1/6 inch tall.
For a 300 ppi image with 12pt text, the text height is 300 × 1/6 = 50 pixels.
For a 60 ppi image, the text height is 60 × 1/6 = 10 pixels.
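The point-to-pixel arithmetic above can be written as a small helper. This is only a sketch of that arithmetic; the function name `text_height_px` is made up for illustration and is not part of OpenCV or Tesseract.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(font_size_pt, ppi):
    """Convert a font size in points to a height in pixels at a given ppi."""
    return font_size_pt * ppi / POINTS_PER_INCH


# The two cases worked through above, plus 120 ppi.
height_at_300 = text_height_px(12, 300)
height_at_120 = text_height_px(12, 120)
height_at_60 = text_height_px(12, 60)
```

Note that this gives the nominal font height, which is larger than the x-height that the Tesseract documentation suggests counting.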
Below quoted one is from the tesseract official page.
Is there a Minimum / Maximum Text Size? (It won’t read screen text!)
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be “noise removed”.
Using LSTM there seems also to be a maximum x-height somewhere around 30 px. Above that, Tesseract doesn’t produce accurate results. The legacy engine seems to be less prone to this (see https://groups.google.com/forum/#!msg/tesseract-ocr/Wdh_JJwnw94/24JHDYQbBQAJ).
From this, I came to the following conclusion:
We need text at a 10 to 12 pt font size for OCR. That means at 120 ppi (pixels per inch) we need text about 20 pixels tall, and at 300 ppi we need text about 50 pixels tall.
If OpenCV doesn't read the DPI information, what DPI value does Tesseract assume for an image loaded with OpenCV's imread method?
Does Tesseract resize the image internally based on its DPI?
If I resize the image with OpenCV, and Tesseract does resize internally based on DPI, then I need to set the DPI to 300. What is the easiest way to set the DPI with OpenCV + pytesseract? I know we can do this with PIL.

To answer your questions:
DPI is only really relevant when scanning documents - it's a measure of how many dots per inch are used to represent the scanned image. Once tesseract is processing images, it only cares about pixels.
Not as far as I can tell.
The SO answer you linked to relates to writing an image, not reading an image.
I think I understand the core of what you're trying to get at. You're trying to improve the accuracy of your results as it relates to font/text size.
Generally speaking, tesseract seems to work best on text that is about 32 px tall.
Manual resizing
If you're working on a small set of images or a consistent group of images, you can manually resize those images to have capital letters that are approximately 32 pixels tall. That should theoretically give the best results in tesseract.
Automatic resizing
I'm working with an inconsistent data set, so I need an automated approach to resizing images. What I do is to find the bounding boxes for text within the image (using tesseract itself, but you could use EAST or something similar).
Then, I calculate the median height of these bounding boxes. Using that, I can calculate how much I need to resize the image so that the median height of a capital letter in the image is ~32 px tall.
Once I've resized the image, I rerun tesseract and hope for the best. Yay!
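As a rough illustration of the scaling step described above (this is not the code from the Gist), the resize factor can be derived from the median bounding-box height. The names `resize_factor` and `resized_dimensions` are hypothetical, and the box heights are assumed to have been collected already.

```python
from statistics import median

TARGET_HEIGHT_PX = 32  # text height this answer reports as working best


def resize_factor(box_heights, target_height=TARGET_HEIGHT_PX):
    """Return the scale factor that brings the median box height to the target."""
    heights = [h for h in box_heights if h > 0]
    if not heights:
        return 1.0  # no text boxes found: leave the image unchanged
    return target_height / median(heights)


def resized_dimensions(width, height, factor):
    """Return the new (width, height) in whole pixels after scaling."""
    return max(1, round(width * factor)), max(1, round(height * factor))


factor = resize_factor([14, 16, 18])            # median height is 16 px
new_size = resized_dimensions(800, 600, factor)
```

With pytesseract, the heights could come from the "height" entries of pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT), filtered to rows with non-empty text, and the factor could then be passed to cv2.resize as fx and fy. A word's bounding-box height only approximates the height of a capital letter, so the result is a starting point rather than an exact fit.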
Hope that helps somewhat! :)
Bonus: I shared my source code for this function in this Gist

Related

How does cv2.IMREAD_GRAYSCALE convert 16 bit images to 8 bit (OpenCV python)?

I am importing images in python using OpenCV.
import cv2
img = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
The image is a 16 bit .png or .tif, and is converted to 8 bit by cv2.IMREAD_GRAYSCALE (as expected). I am happy that OpenCV converts my image. I am aware that I can use cv2.IMREAD_UNCHANGED if I want a 16bit image.
I just want to know how OpenCV is converting my image from 16 to 8 bit. E.g. via typical normalisation, or in some other way that might saturate pixels? For my downstream problems it is more important that the general range is preserved.
I have checked the OpenCV documentation and cannot find an explanation for this. I cannot find a similar question.
EDIT: My images are single channel.
A uint16 (16-bit) image uses values from 0 to 2^16-1, while a uint8 (8-bit) image uses values from 0 to 2^8-1 only.
If you simply cast the original values to uint8, saturation destroys a lot of information.
This shows up as quality degradation.
It is a consequence of bit depth.
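To make the difference concrete, here is a small sketch contrasting a saturating cast with a linear rescale on a few sample values. It uses plain Python lists rather than image arrays, the function names are invented for illustration, and it does not establish which conversion cv2.imread actually applies. Comparing the result against the same file loaded with cv2.IMREAD_UNCHANGED would be one way to check.

```python
def saturate_to_8bit(values):
    """Clip each 16-bit value to the 8-bit maximum (a saturating cast)."""
    return [min(v, 255) for v in values]


def rescale_to_8bit(values):
    """Map 0..65535 onto 0..255 by keeping only the high byte."""
    return [v >> 8 for v in values]


samples = [0, 200, 1000, 32768, 65535]
saturated = saturate_to_8bit(samples)  # everything above 255 collapses to 255
rescaled = rescale_to_8bit(samples)    # overall range preserved, fine detail lost
```

The saturating version keeps small values exactly but flattens the rest of the range, while the rescaled version preserves the general range at the cost of merging neighbouring values.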
“Bit-depth” determines the smallest changes you can make, relative to some range of values. If our scale is brightness from pure black to pure white, then the 4 values we get from a 2-bit number would include: black, dark midtones, light midtones, and white. This is a pretty lumpy scale and not very useful for a photograph. But if we have enough bits, we have enough gray values to make what appears to be a perfectly smooth gradient from black to white.
Here’s an example comparing a black to white gradient at different bit depths. The embedded image here is just an example, click here to see the full resolution image in the JPEG2000 format with bit depths up to 14-bits. Depending on the quality of your monitor, you can probably only display differences up to 8-10 bits.
All software designs use and implement the same process.

Anti-aliasing of random dot stereograms

I recently completed some Python (2.7) code for generating random dot stereograms based on this paper. The output is fairly good, though I have noticed that, even with a smooth gradient in the depth map, the output stereogram lacks these smooth gradients, instead having varying levels of depth. I believe this to be due to the DPI chosen when generating the image. While the detail of the depth can be increased by increasing the DPI, this becomes impractical as the convergence point becomes more difficult to reach.
Here are two examples. First at 75 DPI and second at 175 DPI. On the 75 DPI image, distinct "triangles" of depth can be seen. In the 175 DPI image, these are less pronounced but the guidance dots at the bottom of the image are further apart, and therefore viewing the 3D image is more difficult.
I'm looking to modify my current code to anti-alias the 3D image in order to smooth out the gradients even with a lower DPI. I have tried using SSAA on the depth map and pattern and generating the stereogram, then reducing the image size again with an antialiasing filter. However this seems to just contain the stereogram to the left of the image. For example, if I make the image 4 times bigger, the stereogram is limited to the left hand quarter of the image. The rest is just random noise and cannot be viewed. How would I go about antialiasing the image hidden in the stereogram? My code is almost the same as the algorithm described in the paper, so an antialiasing algorithm based on that would be perfect.
The problem I was having, with the stereogram being confined to the left of the image, was caused by not extending the same array to reflect the larger depth map. This caused everything beyond the original length of the depth map to be randomly generated noise.
After solving this problem, a second problem arose, in that the 3D image was distorted by the anti-aliasing, causing more gradient issues than it was solving. My solution for this was to increase the DPI setting in the code. For example, if I increased the size of the depth map by 4x, the stereogram must be generated with a DPI 4 times greater (300, rather than 75). When scaled down again, this produced excellent results.
This image uses 2x SSAA, making the gradients comparable with the 175DPI image from the question, but with a much easier converging point.
This image uses 4x SSAA, and I find the jaggies barely visible at all. The noise here becomes a lot more blurred and the general colour of the image becomes quite grey. I have found this effect can be avoided by pregenerating the noise and scaling that up by the same AA factor. This is demonstrated in the next image.

Image resize to a specific height and width in django python

I have a django based website in which I have created profiles of people working in the organisation. Since this is a redesign effort, I used the already existing profile pictures. The size of current profile image style is 170x190px. Since the images already exist and are of different sizes, I want to crop them to the size specified above. But how do I decide from which side I have to crop?
Currently, I have applied a 170x190px style to all the images when displaying them in profiles, but most of them look distorted because the aspect ratios do not match.
I have tried PIL thumbnail function but it does not fit the need.
Please suggest a solution.
Well, you have to resize the pictures, but the image ratio has a huge impact on the final result. You cannot simply resize images to 170x190px without first adjusting their ratio, so you have to adjust the images (not just crop them!) before resizing to get the best possible output. This can be done in the following ways:
Crop them manually to the desired ratio (17:19). (This takes a while if you have plenty of images.)
Create a script that adds padding to images whose ratio is close to the required one, and marks all images whose ratio is far from the desired one as 'human cropping required' so you can fix their ratio yourself later (semi-manual, so it may still be really time consuming).
Spend some time writing a face recognition function, then process the images with it to find faces and crop them from the original image, but first add padding at the top and bottom of the face to achieve the desired ratio (17:19). (recommended)
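For the scripted route, one simple default is a centered crop to the 17:19 ratio, which assumes the subject sits roughly in the middle of the picture. The sketch below only computes the crop box; the function name `center_crop_box` is hypothetical, and it is not a substitute for the face-based approach.

```python
def center_crop_box(width, height, target_w=170, target_h=190):
    """Return (left, top, right, bottom) for a centered crop at the target ratio."""
    if width * target_h > height * target_w:
        # Image is too wide: keep the full height, trim left and right equally.
        new_w = round(height * target_w / target_h)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already matches): keep the full width, trim top and bottom.
    new_h = round(width * target_h / target_w)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


square_box = center_crop_box(400, 400)  # too wide for 17:19
tall_box = center_crop_box(170, 380)    # too tall for 17:19
```

The returned tuple has the form that Pillow's Image.crop expects, after which the cropped image can be resized to (170, 190) without distortion. For portraits where the face is near the top, a centered crop may cut it off, which is where the face detection option becomes worthwhile.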
Some links which may be useful for you:
Face Recognition With Python, in Under 25 Lines of Code
facereclib module, they probably are able to help you.
Image Manipulation, The Hitchhiker’s Guide
Good luck !
Use sorl-thumbnail, you don't need to crop every image manually.

image rendering issue in psychopy

I am a long-time psychopy user, and I just upgraded to 1.81.03 (from 1.78.x). In one experiment, I present images (.jpgs) to the user and ask for a rating scale response. The code worked fine before the update, but now I am getting weird artifacts on some images. For example, here is one image I want to show:
But here is what shows up [screencapped]:
You can see that one border is missing. This occurs for many of my images, though it is not always the same border, and sometimes two or three borders are missing.
Does anyone have an idea about what might be going on?
I received this information from the psychopy-users group (Michael MacAskill):
As a general point, you should avoid using .jpgs for line art: they aren't designed for this (if you zoom in, in the internal corners of your square, you'll see the typical compression artefacts that their natural image-optimised compression algorithm introduces when applied to line art). .png format is optimal for line art. It is lossless and for this sort of image will still be very small file-size wise.
Graphics cards sometimes do scaling-up and then down-scaling of bitmaps, which can lead to issues like this with single-pixel width lines. Perhaps this is particularly the issue here because (I think) this image was supposed to be 255 × 255 pixels, and cards will sometimes scale up to the nearest power-of-two size (256 × 256) and then down again, so easy to see how the border might be trimmed.
I grabbed your image off SO, it seemed to have a surrounding border around the black line to make it 321 × 321 in total. I made that surround transparent and saved it as .png (another benefit of png vs jpg). It displays without problems (although a version cropped to just the precise dimensions of the black line did show the error you mentioned). (Also, the compression artefacts are still there, as I just made this png directly from the jpg). See attached file.
If this is the sort of simple stimulus you are showing, you might want to use ShapeStim/Polygon stimuli instead of bitmaps. They will always be drawn precisely, without any scaling issues, and there wouldn't be the need for any jiggery pokery.
Why this changed from 1.78 I'm not sure. The issue is also there in 1.82.00

How can I improve ReportLab image quality?

I'm building a label printer. It consists of a logo and some text, not tough. I have already spent 3 days trying to get the original SVG logo to draw to screen but the SVG is too complex, using too many gradients, etc.
So I have a high quality bitmapped logo (as a JPG or PNG) and I'm drawing that on a ReportLab canvas. The image in question is much larger than 85*123px. I did this hoping ReportLab would embed the whole thing and scale it accordingly. Here's how I'm doing it:
canvas.drawImage('logo.jpg', 22+xoffset, 460, 85, 123)
The problem is, my assumption was incorrect. It seems to scale it down to 85*123px at screen resolution and that means when it's printed, it doesn't look great.
Does ReportLab have any DPI commands for canvases or documents so I can keep the quality sane?
Having previously worked at the ReportLab company, I can tell you that raster images do not go through any automatic resampling/downscaling while being included in the PDF. The 85*123 dimensions you are using are not pixels, but points (pt) which are a physical unit like millimetres or inches.
I would suggest printing the PDF with different quality images to confirm this or otherwise zooming in very, very far using your PDF viewer. It will always look a bit fuzzy in a PDF viewer as the image is resampled twice (once in the imaging software and then again to the pixels available to the PDF viewer).
This is how I would calculate what size in pixels to make a raster image for it to print well at a given physical size:
Assume I want the picture to be 2 inches wide. There are 72 points in an inch, so the width in my code would be 144. I know that a good crisp resolution to print at is 300dpi (dots per inch), so the raster image is saved at 600px wide.
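The same calculation can be sketched in code, together with its inverse (the resolution an existing image ends up with at a given drawn size). Both function names are made up for illustration and are not part of ReportLab.

```python
import math

POINTS_PER_INCH = 72  # PDF points per inch


def pixels_needed(size_pt, dpi=300):
    """Pixels required along one side to print size_pt points at the given dpi."""
    return math.ceil(size_pt * dpi / POINTS_PER_INCH)


def effective_dpi(size_px, size_pt):
    """Resolution an image of size_px pixels has when drawn size_pt points wide."""
    return size_px * POINTS_PER_INCH / size_pt


two_inch_width = pixels_needed(144)                      # the 2 inch example above
label_logo = (pixels_needed(85), pixels_needed(123))     # the 85 x 123 pt logo
check = effective_dpi(600, 144)
```

For the 85 x 123 pt logo in the question, this suggests a source image of at least 355 x 513 pixels for 300dpi output.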
One option that I thought of while writing the question is: increase the size of the PDF and let the printer sort things out.
If I just multiplied all my numbers by 5 and the printer did manage to figure things out, I'd have close to 350DPI... But I'm making quite an assumption.
I don't know if it will work for all but in my case it did.
I only needed to add a logo at the top, so I used drawImage()
but shrank the logo to a third of its original size:
c.drawImage(company_logo,225,750,width=(483/3),height=(122/3))
I had to know the real size of the company logo beforehand so that it does not get distorted.
I hope it helps!
