How to detect and rotate images in python - python

I have multiple PDF invoices that I am trying to parse. I convert them to images and use OCR to get the text from the images. In one of the PDFs, 2 of the 3 pages are rotated by 90 degrees. How do I detect these rotated pages and rotate them correctly so that the OCR returns the correct information?

To keep the image intact, you can set the parameter 'expand' to True
image = image.rotate(270, expand=True)

Here is a solution that works for one image but you can do it for a list of images and check each image before saving it back to PDF:
# import library
from PIL import Image
# open image file
f = Image.open('test.jpg')
# convert to RGB so it can be saved as PDF
pdf = f.convert('RGB')
# if width is greater than height, rotate it to get portrait
if pdf.width > pdf.height:
    pdf = pdf.rotate(270, expand=True)
# save pdf
pdf.save('test.pdf')
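The single-image check extends naturally to a list of page images. The following is a minimal sketch, assuming the pages are already loaded as PIL images and that every page is meant to be portrait; the function name `to_portrait` and the blank stand-in pages are made up for illustration.

```python
from PIL import Image


def to_portrait(pages):
    """Return RGB copies of the pages, rotating landscape ones to portrait."""
    fixed = []
    for page in pages:
        page = page.convert("RGB")
        if page.width > page.height:
            # expand=True grows the canvas so nothing is cut off
            page = page.rotate(270, expand=True)
        fixed.append(page)
    return fixed


# Two blank stand-in pages: one portrait, one landscape
pages = [
    Image.new("RGB", (100, 200), "white"),
    Image.new("RGB", (200, 100), "white"),
]
fixed = to_portrait(pages)
print([p.size for p in fixed])
```

Pillow can then write all pages into one file with `fixed[0].save('out.pdf', save_all=True, append_images=fixed[1:])`, where `out.pdf` is a placeholder name. Note that the width/height check cannot tell whether a landscape page needs a turn to the left or to the right, so some pages may still come out upside down.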

When you say they are rotated, is it as simple as all pages being meant to be in portrait orientation while some are in landscape? You should be able to read the orientation of each page from the PDF metadata; if that is not available for some reason, you can use simple logic to determine it, such as rotated = image.width > image.height
With Pillow/PIL it would be easy to rotate the image before OCR:
if rotated:
    # expand=True keeps the whole page instead of cropping it to the old canvas
    image = image.rotate(270, expand=True)
Presumably pages could also end up upside down. Unless you have reliable metadata from the PDF, you might have to OCR first with the most likely direction (say, a 90 degree clockwise turn, which is what rotate(270) does, since Pillow counts angles counter-clockwise) and, if that returns no text, try again after rotating by 180 degrees.
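The try-and-retry idea can be sketched as a small helper that runs OCR on each candidate orientation and keeps the one that yields the most text. This is only an illustration: `best_rotation` is a hypothetical name, the `ocr` argument is whatever callable you use to turn an image into a string (for example `pytesseract.image_to_string`, if you use pytesseract), and text length is a crude score. The stand-in `fake_ocr` below exists only so the sketch runs without an OCR engine.

```python
from PIL import Image


def best_rotation(image, ocr, angles=(0, 270, 90, 180)):
    """Try each angle and return (angle, rotated_image) with the most OCR text."""
    best_angle, best_image, best_len = None, None, -1
    for angle in angles:
        candidate = image.rotate(angle, expand=True)
        text_len = len(ocr(candidate).strip())
        if text_len > best_len:
            best_angle, best_image, best_len = angle, candidate, text_len
    return best_angle, best_image


# Stand-in OCR: "reads" text only when the red marker is in the top-left corner
def fake_ocr(im):
    return "invoice total" if im.getpixel((0, 0)) == (255, 0, 0) else ""


page = Image.new("RGB", (4, 2), "white")
page.putpixel((0, 1), (255, 0, 0))  # marker in the bottom-left corner

angle, upright = best_rotation(page, fake_ocr)
print(angle, upright.size)
```

With a real OCR engine, a confidence score or a dictionary-word count would probably be a more robust measure than raw text length.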

You can use imutils to rotate without cutting out image boundaries after rotation.
import cv2 as cv
import imutils
img = cv.imread('your_image.png')
# rotate_bound returns the rotated image, so keep the result
rotated = imutils.rotate_bound(img, 270)  # 270 for anti-clockwise or 90 for clockwise
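For exact quarter turns, the same no-cropping result can also be obtained with NumPy alone, since OpenCV images are NumPy arrays. A small sketch, using a tiny made-up array in place of a real image:

```python
import numpy as np

# Stand-in for an image loaded with cv.imread: 2 rows, 3 columns
img = np.array([[1, 2, 3],
                [4, 5, 6]])

anticlockwise = np.rot90(img, k=1)   # 90 degrees anti-clockwise
clockwise = np.rot90(img, k=-1)      # 90 degrees clockwise

print(anticlockwise.shape, clockwise.shape)
```

`np.rot90` returns a view rather than a copy, so wrapping the result in `np.ascontiguousarray` may be needed before passing it to functions that expect contiguous memory.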

Related

How can I display an image to me (as an user) and select a section of that image?

To explain myself better: I want to write code that displays an image and then lets me select or crop a region of it with the mouse while it is displayed.
For example, this code crops a fixed section of the image:
from PIL import Image
im = Image.open("test.jpg")
crop_rectangle = (50, 50, 200, 200)
cropped_im = im.crop(crop_rectangle)
cropped_im.show()
That is basically it. All I want is to crop an image at the points or coordinates I choose. I have been searching but cannot find any library that helps me do so.
NOTE: The code I am showing is an answer on this post, in case you want to check it out: How can i select a part of a image using python?
EDIT: Here is something I found that may help as a first step, although it is not entirely what I am looking for. With some modifications to the code, it at least lets me find the coordinates I want in the image, and I can then carry on processing the image. It does not solve my issue yet, but it is a start: using the "cv2.setMouseCallback" method.
The comments I received from Mark Setchell and bfris enlightened me, one with a C++ code and another one with a recommendation of OpenCV.
But in addition to their comments, I found this article where they explain exactly what I want to do, so I consider my question answered. Select ROI or Multiple ROIs [Bounding box] in OPENCV python.
import cv2
import numpy as np
#image_path
img_path="image.jpeg"
#read image
img_raw = cv2.imread(img_path)
#select ROI function
roi = cv2.selectROI(img_raw)
#print rectangle points of selected roi
print(roi)
#Crop selected roi from raw image
roi_cropped = img_raw[int(roi[1]):int(roi[1]+roi[3]), int(roi[0]):int(roi[0]+roi[2])]
#show cropped image
cv2.imshow("ROI", roi_cropped)
cv2.imwrite("crop.jpeg",roi_cropped)
#hold window
cv2.waitKey(0)
or
import cv2
import numpy as np
#image_path
img_path="image.jpeg"
#read image
img_raw = cv2.imread(img_path)
#select ROIs function
ROIs = cv2.selectROIs("Select Rois",img_raw)
#print rectangle points of selected roi
print(ROIs)
#Crop selected roi from raw image
#counter to save image with different name
crop_number=0
#loop over every bounding box saved in array "ROIs"
for rect in ROIs:
    #each rect is (x, y, width, height)
    x1=rect[0]
    y1=rect[1]
    x2=rect[2]
    y2=rect[3]
    #crop roi from original image
    img_crop=img_raw[y1:y1+y2,x1:x1+x2]
    #show cropped image
    cv2.imshow("crop"+str(crop_number),img_crop)
    #save cropped image
    cv2.imwrite("crop"+str(crop_number)+".jpeg",img_crop)
    crop_number+=1
#hold window
cv2.waitKey(0)
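Both snippets slice the image with an (x, y, width, height) rectangle. That slicing step can be pulled into a small helper, sketched here with a blank NumPy array standing in for the image so it runs without a GUI. The name `crop_roi` is made up for illustration, and the guard assumes that an empty or cancelled selection is reported as a rectangle with zero width or height.

```python
import numpy as np


def crop_roi(img, roi):
    """Crop an (x, y, width, height) box from an image array."""
    x, y, w, h = (int(v) for v in roi)
    if w <= 0 or h <= 0:
        return None  # nothing was selected
    # rows are indexed by y, columns by x
    return img[y:y + h, x:x + w]


# Stand-in for cv2.imread(...): 100 rows, 200 columns, 3 channels
img = np.zeros((100, 200, 3), dtype=np.uint8)

crop = crop_roi(img, (50, 20, 30, 40))
print(crop.shape)                    # (40, 30, 3)
print(crop_roi(img, (0, 0, 0, 0)))   # None
```

In the interactive versions, the tuple returned by cv2.selectROI, or each row returned by cv2.selectROIs, would be passed as `roi`.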

Advanced cropping with Python Imaging Library

I have an A4 PNG image with some text in it, on a transparent background. How can I crop the image so that only the text remains? I am aware of cropping in PIL, but if I use fixed values, the same code will not work for another image that has the text in a different place. So how can I find where the text, sticker, or any other content is placed on that large, empty image and crop it so that the content fits exactly?
Thanks in advance!
You can do this by extracting the alpha channel and cropping to that. So, if this is your input image:
Here it is again, smaller and on a chessboard background so you can see its full extent:
The code looks like this:
#!/usr/bin/env python3
from PIL import Image
# Load image
im = Image.open('image.png')
# Extract alpha channel as new Image and get its bounding box
alpha = im.getchannel('A')
bbox = alpha.getbbox()
# Apply bounding box to original image
res = im.crop(bbox)
res.save('result.png')
Here is the result:
And again on a chessboard pattern so you can see its full extent:
Keywords: Image processing, Python, PIL/Pillow, trim to alpha, crop to alpha, trim to transparency, crop to transparency.
from PIL import Image
im = Image.open("image.png")
im.getbbox()
im2 = im.crop(im.getbbox())
im2.save("result.png")
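Two edge cases are worth guarding against when cropping to the alpha channel: an image without an alpha channel, and an image that is fully transparent, for which getbbox() returns None. A minimal sketch, with a made-up helper name `crop_to_alpha` and a synthetic page instead of a real file:

```python
from PIL import Image


def crop_to_alpha(im):
    """Crop an image to the bounding box of its non-transparent pixels."""
    im = im.convert("RGBA")  # guarantees that an alpha channel exists
    bbox = im.getchannel("A").getbbox()
    if bbox is None:
        return im  # fully transparent: nothing to crop to
    return im.crop(bbox)


# Transparent 100x140 page with an opaque red 40x20 block on it
page = Image.new("RGBA", (100, 140), (0, 0, 0, 0))
page.paste((255, 0, 0, 255), (20, 30, 60, 50))

print(crop_to_alpha(page).size)  # (40, 20)
```

An image that had no alpha channel to begin with is treated as fully opaque after the conversion, so it is returned uncropped.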

How to change the grey scale value of a region in an image?

I am new to Python and not really sure how to attack this problem.
What I am trying to do is to take a black and white image and change the value of the edge (x pixels thick) from 255 to some other greyscale value.
I need to do this to a set of png images inside of a folder. All images will be geometric (mostly a combination of straight lines) no crazy curves or patterns. Using Python 3.
Please check the images.
A typical file will look like this:
https://drive.google.com/open?id=13ls1pikNsO7ZbsHatC6cOr4O6Fj0MPOZ
I think this is what you want. The comments should explain pretty well what is going on:
#!/usr/bin/env python3
import numpy as np
from PIL import Image, ImageFilter
from skimage.morphology import dilation, square
# Open input image and ensure it is greyscale
image = Image.open('XYbase.png').convert('L')
# Find the edges
edges = image.filter(ImageFilter.FIND_EDGES)
# Convert edges to Numpy array and dilate (fatten) with our square structuring element
selem = square(6)
fatedges = dilation(np.array(edges),selem)
# Make Numpy version of our original image and set all fatedges to brightness 128
imnp = np.array(image)
imnp[np.nonzero(fatedges)] = 128
# Convert Numpy image back to PIL image and save
Image.fromarray(imnp).save('result.png')
So, if I start with this image:
The (intermediate) edges look like this:
And I get this as the result:
If you want the outlines fatter/thinner, increase/decrease the 6 in:
selem = square(6)
If you want the outlines lighter/darker, increase/decrease the 128 in:
imnp[np.nonzero(fatedges)] = 128
Keywords: image, image processing, fatten, thicken, outline, trace, edge, highlight, Numpy, PIL, Pillow, edge, edges, morphology, structuring element, skimage, scikit-image, erode, erosion, dilate, dilation.
I can interpret your question in a much simpler way, so I thought I'd answer that simpler question too. Maybe you already have a grey-ish edge around your shapes (like the Google drive files you shared) and just want to change all pixels that are neither black nor white into a different colour - and the fact that they are edges is irrelevant. That is much easier:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
# Open input image and ensure it is greyscale
image = Image.open('XYBase.png').convert('L')
# Make Numpy version
imnp = np.array(image)
# Set all pixels that are neither black nor white to 220
imnp[(imnp>0) & (imnp<255)] = 220
# Convert Numpy image back to PIL image and save
Image.fromarray(imnp).save('result.png')
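The question asks for this to be applied to a set of PNG images in a folder, while the code handles a single file. One way to wrap the simpler variant in a loop is sketched below. The function names and the idea of writing results to a separate output folder are assumptions made for illustration, not part of the original answer.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def recolour_greys(image, value=220):
    """Set every pixel that is neither black nor white to `value`."""
    imnp = np.array(image.convert("L"))
    imnp[(imnp > 0) & (imnp < 255)] = value
    return Image.fromarray(imnp)


def process_folder(src, dst, value=220):
    """Recolour every PNG in `src` and save it under the same name in `dst`."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.png")):
        with Image.open(path) as image:
            result = recolour_greys(image, value)
        out_path = dst / path.name
        result.save(out_path)
        written.append(out_path)
    return written
```

A call such as `process_folder('input_pngs', 'output_pngs')` (both folder names are placeholders) leaves the originals untouched, which makes it easy to compare before and after.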

How can I calculate the perimeter of an object in an image?

I have an image. I want to get the perimeter of every object in my image. For example, in this image, the perimeter of an object is 33 (the number of pixels at its edges).
I have written the following algorithm, but it is very slow.
Does anyone have an idea how to speed up the algorithm?
What I have tried:
def cal_perimeter_object(object, image):
    peri_ = 0
    for pixel_ in image:
        if pixel_is_in_neigbor_of_object() is True:
            peri_ += 1
    return peri_
As mentioned in the comment by @Piinthesky, the first step is to have a boolean (or labelled) image in which you know the label of the object whose contour you want to find. There are a number of ways of doing this, the simplest of which is thresholding. Once you have your labelled image, you can find the perimeter in a number of ways, e.g. as the number of pixels along the border. To give you a head start, here is a way to do it on the image you put in the link. I have used scikit-image, but there are other Python libraries you may use.
# If your python version is not 3.x uncomment line below
#from __future__ import print_function
from skimage.measure import label, regionprops
import skimage.io as io
# read in the image (enter the path where you downloaded it on your computer below)
im = io.imread('/home/kola/Downloads/perimeter.png')
# To simplify things I am only using the first channel and thresholding
# to get a boolean image
bw = im[:,:,0] > 230
regions = regionprops(bw.astype(int))
print(regions[0].perimeter)
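The question defines the perimeter as the number of pixels at the edge of the object, which is not necessarily the same number that a contour-length estimate gives. A vectorised count of edge pixels can be sketched with NumPy alone. This assumes a boolean mask of one object and treats a pixel as an edge pixel when at least one of its four direct neighbours is background; both the helper name and that neighbourhood choice are assumptions.

```python
import numpy as np


def count_boundary_pixels(mask):
    """Count object pixels that touch the background (4-neighbourhood)."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with background so pixels on the image border count as edge pixels
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    centre = padded[1:-1, 1:-1]
    # A pixel is interior if it and all four direct neighbours are object pixels
    interior = (
        centre
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return int(np.count_nonzero(mask & ~interior))


# A solid 3x3 square inside a 5x5 image: every pixel except the centre is an edge
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
print(count_boundary_pixels(mask))  # 8
```

For a labelled image, the same function can be called once per object with `labels == k` as the mask.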

Use Python / PIL or similar to shrink whitespace

Any ideas how to use Python with the PIL module to shrink select all? I know this can be achieved with GIMP. I'm trying to package my app as small as possible, so a GIMP install is not an option for the end user.
Say you have 2 images, one is 400x500, other is 200x100. They both are white with a 100x100 textblock somewhere within each image's boundaries. What I'm trying to do is automatically strip the whitespace around that text, load that 100x100 image textblock into a variable for further text extraction.
It's obviously not this simple, so just running the text extraction on the whole image won't work! I just wanted to query about the basic process. There is not much available on Google about this topic. If solved, perhaps it could help someone else as well...
Thanks for reading!
If you put the image into a numpy array, it's simple to find the edges of the content, which you can then crop with PIL. Here I'm assuming that the whitespace is the color (255,255,255); you can adjust this to your needs:
from PIL import Image
import numpy as np
im = Image.open("test.png")
pix = np.asarray(im)
pix = pix[:,:,0:3] # Drop the alpha channel
idx = np.where(pix-255)[0:2] # Drop the color when finding edges
# list() is needed on Python 3, where map() returns an iterator
box = list(map(min,idx))[::-1] + list(map(max,idx))[::-1]
region = im.crop(box)
region_pix = np.asarray(region)
To show what the results look like, I've left the axis labels on so you can see the size of the box region:
from pylab import *
subplot(121)
imshow(pix)
subplot(122)
imshow(region_pix)
show()
The general algorithm would be to find the color of the top left pixel, and then do a spiral scan inwards until you find a pixel not of that color. That will define one edge of your bounding box. Keep scanning until you hit one more of each edge.
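The idea of taking the top left pixel as the background color can also be sketched without an explicit scan, by letting Pillow compute the bounding box of everything that differs from that color. This swaps the spiral scan for `ImageChops.difference` plus `getbbox()`, so it is a variation on the suggestion rather than the algorithm itself; `trim_background` is a made-up name.

```python
from PIL import Image, ImageChops


def trim_background(im):
    """Crop away the border that has the same color as the top-left pixel."""
    im = im.convert("RGB")
    bg_color = im.getpixel((0, 0))
    background = Image.new("RGB", im.size, bg_color)
    # Pixels equal to the background become black (zero) in the difference
    diff = ImageChops.difference(im, background)
    bbox = diff.getbbox()
    if bbox is None:
        return im  # the whole image is background
    return im.crop(bbox)


# White 40x50 image with a black 10x10 block standing in for the text
im = Image.new("RGB", (40, 50), "white")
im.paste((0, 0, 0), (10, 20, 20, 30))

print(trim_background(im).size)  # (10, 10)
```

Scanned or JPEG-compressed images rarely have a perfectly uniform background, so a tolerance (for example thresholding the difference image) would likely be needed in practice.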
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
might be of some help. You can use the simple bounding box method described in that tutorial or @Tyler Eaves's spiral suggestion, which works equally well.
