Why can't tesseract-ocr detect text which is in a box? - python

Consider this experiment:
I have two images: one with free text and another with the text inside a box (surrounded by a border).
If I run tesseract-ocr on these two images, the free-text image outputs 'Text' while the boxed image outputs nothing ('').
Why is that?
As a fix, I can crop the borders with some image processing, but I am trying to understand what is causing this problem.
[free text image]
[boxed image]
So far,
I cropped the borders of the image using the logic below (we should feed it the outer-border contour-cropped image), and then I was able to detect the text. However, I still don't understand why tesseract isn't detecting the boxed text. Feel free to experiment with the attached images.
# The code below shifts (x, y) and shrinks (h, w) so that the
# new values describe a slightly smaller box enclosed by the
# original box, cutting away the border.
y = y + int(0.025*h)
x = x + int(0.025*w)
h = h - int(0.05*h)
w = w - int(0.05*w)
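
For reference, a fuller sketch of that workaround; the filename, thresholding step, and contour search are my additions (assuming OpenCV 4 and the pytesseract wrapper), not part of the original snippet:
import cv2
import pytesseract

img = cv2.imread('boxed.png')  # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# The border is the largest external contour; take its bounding box.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
# Shrink the box by 2.5% per side so the border lines fall outside the crop.
y = y + int(0.025 * h)
x = x + int(0.025 * w)
h = h - int(0.05 * h)
w = w - int(0.05 * w)
print(pytesseract.image_to_string(img[y:y + h, x:x + w]))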

Related

Using PIL to draw individual pixels, but the image is blurry

I am trying to create an image made up of coloured squares. I only need each square to be one pixel large, as it is just a single block colour. However, when I use this code, the image generated is extremely blurry. Is there any way to make the borders sharp?
from PIL import Image, ImageColor

def fancycolnw2(seq, m):
    data = numbwall(seq, m)  # numbwall is the asker's own helper
    for i in range(len(data)):
        for j in range(len(data[i])):
            if data[i][j] == ' ':
                data[i][j] = -1
    im = Image.new('RGBA', (len(data[0]), len(data)))  # one pixel per data cell
    for i in range(len(data) - 1):
        for j in range(len(data[i]) - 1):
            if data[i][j] == -1:
                im.putpixel((j, i), ImageColor.getcolor('black', 'RGBA'))
            if data[i][j] == 0:
                im.putpixel((j, i), ImageColor.getcolor('red', 'RGBA'))
            if data[i][j] == 1:
                im.putpixel((j, i), ImageColor.getcolor('blue', 'RGBA'))
            if data[i][j] == 2:
                im.putpixel((j, i), ImageColor.getcolor('grey', 'RGBA'))
    im.show()
    im.save('simplePixel.png')  # or any image format
The result I get looks like this:
[resulting image]
It is the correct image, I just wish the boundaries between pixels were sharp. Any help would be greatly appreciated!
The image is perfectly sharp, but rather small. I suspect that you are "zooming in" to view it clearer, and that whatever program you are zooming with is filtering the image, because with most images this looks better. You need to find a viewing program that uses "nearest neighbour" resampling when zooming in, or generate a larger image to start with, for example by setting a 4-by-4 pixel block rather than individual pixels.
(Also, the code says "# or any image format". Don't use JPEG for this, as the lossy compression will likely wreck your image.)
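
A minimal sketch of the "generate a larger image" option, assuming a 10x scale factor and the filename from the question:
from PIL import Image

im = Image.open('simplePixel.png')
# Nearest-neighbour resampling keeps every cell a crisp solid block.
big = im.resize((im.width * 10, im.height * 10), Image.NEAREST)
big.save('simplePixelBig.png')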

Python - Find out aspect ratio of image

I'm writing a script that automatically lets me download images from a website and set them as my background. But I only want landscape/horizontal pictures. Is there a way for Python to check the aspect ratio of an image and then discard it based on whether it's horizontal or vertical? And would this be easier to do after downloading the pictures, or before, using Selenium?
You can achieve this with the Python OpenCV library.
With OpenCV you can read the width and height of your image, like this:
import cv2

im = cv2.imread('your_image.jpg')  # imread takes a local file path, not a URL
h, w, c = im.shape
# h is the image height and w is the width.
# Note that for portrait images w is always less than h.
# Then apply your download condition:
if w < h:
    pass  # do not download image code
else:
    pass  # download image code
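
If pulling in OpenCV just for this check feels heavy, Pillow can do the same thing; a minimal sketch, with a placeholder filename:
from PIL import Image

with Image.open('candidate.jpg') as im:  # placeholder filename
    w, h = im.size

if w > h:
    print('landscape: keep it')
else:
    print('portrait or square: discard it')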

Programming a picture maker template in Python possible?

I'm looking for a library that makes it possible to "create pictures" (or even videos) with the following functions:
Accepting picture inputs
Resizing said inputs to fit given template / scheme
Positioning the pictures in pre-set layers or coordinates
A rather schematic way to look at this:
[schematic: a black rectangle containing red boxes]
where the red spots represent e.g. text, picture (or, if possible, video) elements.
The end goal would be to give the .py script multiple input pictures and have it create a finished version like the one sketched above.
I tried looking into Python PIL, but I wasn't able to find what I was looking for.
Yes, it is possible to do this with Python.
The library you are looking for is OpenCV (https://opencv.org/).
Some basic OpenCV Python tutorials: https://docs.opencv.org/master/d9/df8/tutorial_root.html
1) You can use the imread() function to read images from files.
2) You can use the resize() function to resize the images.
3) You can create an empty master numpy array matching the size and depth (colour depth) of the black rectangle in the figure you have shown, resize your image, and copy the contents into the empty array starting from the position you want.
Below is some sample code that does something close to what you might need; you can modify it to suit your actual needs. (Since your requirements are not fully clear, the code is written so that it can at least guide you.)
import numpy as np
import cv2
import matplotlib.pyplot as plt

# You can store most of these values in another file and load them.
# You can modify this to set the dimensions of the background image.
BG_IMAGE_WIDTH = 100
BG_IMAGE_HEIGHT = 100
BG_IMAGE_COLOR_DEPTH = 3

# This will act as the black bounding box you have shown in your figure.
# You can also load another image instead of creating an empty background image.
empty_background_image = np.zeros(
    (BG_IMAGE_HEIGHT, BG_IMAGE_WIDTH, BG_IMAGE_COLOR_DEPTH),
    dtype=np.uint8  # np.int is deprecated; uint8 matches 8-bit image data
)

# Loading an image.
# This will be copied later into one of those red boxes you have shown.
IMAGE_PATH = "./image1.jpg"
foreground_image = cv2.imread(IMAGE_PATH)

# Setting the resize target and top-left position with respect to the bg image.
X_POS = 4
Y_POS = 10
RESIZE_TARGET_WIDTH = 30
RESIZE_TARGET_HEIGHT = 30

# Resizing
foreground_image = cv2.resize(
    src=foreground_image,
    dsize=(RESIZE_TARGET_WIDTH, RESIZE_TARGET_HEIGHT),
)

# Copying this into the background image
empty_background_image[
    Y_POS: Y_POS + RESIZE_TARGET_HEIGHT,
    X_POS: X_POS + RESIZE_TARGET_WIDTH
] = foreground_image

# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colours.
plt.imshow(cv2.cvtColor(empty_background_image, cv2.COLOR_BGR2RGB))
plt.show()

get_y() value of the image bottom in fpdf in python

I'm using Python FPDF to generate a PDF. The PDF generally contains text followed by images followed by text and so on, but the cells that contain text are overlapping with the images above them. I tried to calculate the image height and pass that to set_y for the next cell to avoid overlapping, still no luck. So I tried using get_y(), but it returns the 'y' value of the previous text cell instead of the bottom of the image.
So how can I get the 'y-coordinate' of the bottom of the image?
I was facing the same issue and resolved it by using the Pillow library.
Pillow gives you the height and width of the image in pixels; dividing the pixel height by 10 produced a value that worked in the millimetre units FPDF uses by default.
After this, I was able to get the y coordinate exactly below the inserted image, and I then inserted another image exactly below the previous one.
from PIL import Image

y = pdf.get_y()
img_height = Image.open(imagePath).height / 10  # rough pixels-to-mm conversion
y = y + img_height + 10  # 10 mm of extra padding
pdf.set_y(y)
That's how you get it; to set the y coordinate, I used pdf.set_y(y).
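
A sketch of a related approach that computes the rendered height from the image's aspect ratio instead of the divide-by-ten shortcut; it assumes the classic PyFPDF API, and the path and sizes are placeholders:
from fpdf import FPDF
from PIL import Image

pdf = FPDF()  # default unit is millimetres
pdf.add_page()
pdf.set_font('Arial', size=12)
pdf.cell(0, 10, 'Text above the image', ln=1)

image_path = 'photo.jpg'  # placeholder path
width_mm = 100
px_w, px_h = Image.open(image_path).size
height_mm = width_mm * px_h / px_w  # rendered height follows the aspect ratio

y = pdf.get_y()
pdf.image(image_path, x=10, y=y, w=width_mm)
pdf.set_y(y + height_mm + 5)  # move the cursor below the image plus 5 mm padding
pdf.cell(0, 10, 'Text below the image', ln=1)
pdf.output('out.pdf')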

Use Python / PIL or similar to shrink whitespace

Any ideas how to use Python with the PIL module to shrink the whitespace around an image's content? I know this can be achieved with Gimp. I'm trying to package my app as small as possible, and a GIMP install is not an option for the end user.
Say you have 2 images, one 400x500, the other 200x100. Both are white with a 100x100 text block somewhere within each image's boundaries. What I'm trying to do is automatically strip the whitespace around that text and load that 100x100 text-block image into a variable for further text extraction.
It's obviously not this simple, and just running the text extraction on the whole image won't work! I just wanted to ask about the basic process. There is not much available on Google about this topic. If solved, perhaps it could help someone else as well...
Thanks for reading!
If you put the image into a numpy array, it's simple to find the edges, which you can then crop with PIL. Here I'm assuming that the whitespace is the colour (255,255,255); you can adjust to your needs:
from PIL import Image
import numpy as np

im = Image.open("test.png")
pix = np.asarray(im)
pix = pix[:, :, 0:3]  # Drop the alpha channel
idx = np.where(pix - 255)[0:2]  # Indices of non-white pixels; drop the colour axis
# list() is needed in Python 3, where map() returns an iterator
box = list(map(min, idx))[::-1] + list(map(max, idx))[::-1]
region = im.crop(box)
region_pix = np.asarray(region)
To show what the results look like, I've left the axis labels on so you can see the size of the box region:
from pylab import *
subplot(121)
imshow(pix)
subplot(122)
imshow(region_pix)
show()
The general algorithm would be to find the colour of the top-left pixel, and then do a spiral scan inwards until you find a pixel not of that colour. That will define one edge of your bounding box. Keep scanning until you have found each of the other edges.
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
might be of some help. You can use the simple bounding-box method described in that tutorial or @Tyler Eaves' spiral suggestion, which works equally well.
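
For completeness, PIL can do that scan itself via ImageChops; a sketch of the same top-left-pixel idea (a full-image diff rather than a literal spiral, but it yields the same box), with a placeholder filename:
from PIL import Image, ImageChops

im = Image.open('test.png').convert('RGB')
# Build a flat image filled with the top-left pixel's colour...
bg = Image.new('RGB', im.size, im.getpixel((0, 0)))
# ...and crop to the bounding box of everything that differs from it.
bbox = ImageChops.difference(im, bg).getbbox()
if bbox:
    cropped = im.crop(bbox)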
