getbbox method from the Python Imaging Library (PIL) not working - python

I want to crop an image down to its minimal size by cutting off the white areas on the borders. I tried the solution suggested in this forum, Crop a PNG image to its minimum size, but the getbbox() method of PIL returns a bounding box of the same size as the image, i.e., it doesn't seem to recognize the blank areas around it. I tried the following:
>>> import Image
>>> im = Image.open("myfile.png")
>>> print im.format, im.size, im.mode
PNG (2400, 1800) RGBA
>>> print im.getbbox()
(0, 0, 2400, 1800)
I checked that my image really has white croppable borders by cropping the image with GIMP's autocrop. I also tried with PS and EPS versions of the figure, without luck.
Any help would be highly appreciated.

The trouble is that getbbox() crops off black borders; from the docs: "Calculates the bounding box of the non-zero regions in the image."
import Image
im=Image.open("flowers_white_border.jpg")
print im.format, im.size, im.mode
print im.getbbox()
# white border output:
JPEG (300, 225) RGB
(0, 0, 300, 225)
im=Image.open("flowers_black_border.jpg")
print im.format, im.size, im.mode
print im.getbbox()
# black border output:
JPEG (300, 225) RGB
(16, 16, 288, 216) # cropped as desired
We can do an easy fix for white borders by first inverting the image using ImageOps.invert and then calling getbbox():
import ImageOps
im=Image.open("flowers_white_border.jpg")
invert_im = ImageOps.invert(im)
print invert_im.getbbox()
# output:
(16, 16, 288, 216)
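To actually crop the asker's RGBA PNG, here is a minimal sketch (assuming the borders are pure white; ImageOps.invert does not handle the alpha channel, so the image is converted to RGB before inverting):
import Image, ImageOps

im = Image.open("myfile.png")
# invert a copy without the alpha channel, since ImageOps.invert only supports L/RGB
bbox = ImageOps.invert(im.convert("RGB")).getbbox()
if bbox:
    im.crop(bbox).save("myfile_cropped.png")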

Related

OpenCV process all text to be black on white (segmentation)

Is it possible to somehow make it so that all text in a document is black on white after thresholding? I've been looking online a lot but I haven't been able to come to a solution. My current thresholded image is: https://i.ibb.co/Rpqcp7v/thresh.jpg
The document needs to be read by an OCR, and for that I need the areas that are currently white on black to be inverted. How would I go about doing this? My current code:
# thresholding
def thresholding(image):
    # thresholds the image into a binary image (black and white)
    return cv2.threshold(image, 120, 255, cv2.THRESH_BINARY)[1]
Use a median filter to estimate the dominant color (the background).
Then subtract the image from that... you'll get white text on a black background. I'm using the absolute difference. Invert for black on white.
import cv2 as cv

im = cv.imread("thresh.jpg", cv.IMREAD_GRAYSCALE)
im = cv.pyrDown(cv.pyrDown(im))  # picture too large for Stack Overflow
bg = cv.medianBlur(im, 51)  # suitably large kernel to cover all text
out = 255 - cv.absdiff(bg, im)  # black text on white background
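If the OCR stage needs a strictly binary image, one possible follow-up (my addition, not part of the original answer) is to re-threshold the cleaned result with Otsu:
# re-binarize: Otsu picks the threshold automatically now that the background is flat
out_bin = cv.threshold(out, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)[1]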

Writing text on a grayscale low resolution image

I have been trying to write text to an 80x80 16-bit grayscale image and have been having some trouble getting it to work.
I am currently using:
image = im[0]/255.0 #where im is just an np array of images (which are 80x80 np arrays)
# font
font = cv2.FONT_HERSHEY_SIMPLEX
# org
org = (40, 15)
# fontScale
fontScale = 0.3
# Blue color in BGR
color = (255.0)
# Line thickness of 2 px
thickness = 1
# Using cv2.putText() method
image = cv2.putText(image, 'Out:16', org, font, fontScale, color, thickness, cv2.LINE_AA)
# Displaying the image
cv2.imshow(window_name, image)
However, not only does the text look pretty awful and take up a lot of space (I can't go lower without it becoming illegible), but the image also becomes all black except for the text, which is white.
Would there be a better way to write text to a low-resolution image (i.e., make the text smaller)? And why does the image turn all black?
EDIT:
I tried using ImageDraw() and the result is all grey:
from PIL import Image, ImageFont, ImageDraw
# creating a image object
image = Image.fromarray(im[0]/255.0)
draw = ImageDraw.Draw(image)
# specified font size
font = ImageFont.truetype('./arial.ttf', 10)
text = 'fyp:16'
# drawing text size
draw.text((5, 5), text, font = font, align ="left")
It looks like the main issue is converting the image type to float.
Assume (please verify it):
im[0] is 16-bit grayscale, and im[0].dtype is dtype('uint16').
image = im[0]/255.0 implies that you want to convert the range from 16-bit grayscale to the range of uint8.
Note: for converting the range from [0, 2^16-1] to [0, 255] you need to divide by (2**16-1)/255 = 257.0. But this is not the main issue.
The main issue is converting the type to float.
The valid range of float images in OpenCV is [0, 1].
All values above 1.0 are white pixels, and 0.5 is a gray pixel.
You can keep the image type uint16 - you don't have to convert it to uint8.
A white text color for uint16 type is 2**16-1 = 65535 (not 255).
Here is a code sample that works with 16-bit grayscale (and uint16 type):
import numpy as np
import cv2
im = np.full((80, 80), 10000, np.uint16)  # 16-bit grayscale synthetic image - set all pixels to 10000
cv2.circle(im, (40, 40), 10, 0, 20, cv2.LINE_8)  # draw a black circle - synthetic image
#image = im[0]/255.0  # where im is just an np array of images (which are 80x80 np arrays)
image = im  # where im is just an np array of images (which are 80x80 np arrays)
color = 2**16-1  # 65535 is white for a 16-bit image
# Using cv2.putText() method
image = cv2.putText(image, 'Out:16', (40, 15), cv2.FONT_HERSHEY_SIMPLEX, 0.3, color, 1, cv2.LINE_AA)
# Displaying the image
cv2.imshow("image", image)
cv2.waitKey()
The above code creates synthetic 16-bit grayscale for testing.
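If you do want to work in float, a minimal sketch of that alternative (my assumption, following the [0, 1] rule above rather than the author's code):
# scale the full 16-bit range into [0, 1] so cv2.imshow displays it correctly
image_f = im.astype(np.float32) / 65535.0
cv2.putText(image_f, 'Out:16', (40, 15), cv2.FONT_HERSHEY_SIMPLEX, 0.3, 1.0, 1)
cv2.imshow("float image", image_f)
cv2.waitKey()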
Converting from 16-bit grayscale to 8-bit grayscale:
# https://stackoverflow.com/questions/11337499/how-to-convert-an-image-from-np-uint16-to-np-uint8
uint8_image = cv2.convertScaleAbs(image, alpha=255.0/(2**16-1))  # Convert uint16 image to uint8 image (2**16-1 scaled to 255)
The above conversion assumes image is full range 16 bits (pixel range [0, 65535]).
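If the image does not use the full 16-bit range, a possible alternative (my assumption, not from the original answer) is to stretch it with cv2.normalize before converting:
# stretch the actual min/max of the image to [0, 255], then drop to uint8
uint8_image = cv2.normalize(image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)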
About the font:
OpenCV is computer-vision oriented, and its text drawing capabilities are limited.
Why is the image black?
It's hard to answer without knowing the values of im[0].
It could be that im[0] is not a 16-bit grayscale at all.
It could be that the values of im[0] are very small.
It could be that the type of im[0] is not uint16.
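A quick way to check these possibilities (my suggestion, not from the original answer):
# inspect the raw frame before any conversion
print(im[0].dtype, im[0].shape, im[0].min(), im[0].max())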
Drawing text using Pillow (PIL):
The quality of the small text is much better compared to OpenCV.
You can find more about quality text rendering here.
Continue with the uint8 image:
from PIL import Image, ImageDraw, ImageFont

pil_image = Image.fromarray(uint8_image)
draw = ImageDraw.Draw(pil_image)
# specified font size
font = ImageFont.truetype('./arial.ttf', 10)
text = 'fyp:16'
# drawing text size
draw.text((5, 5), text, 255, font = font, align ="left")
pil_image.show()
Result:
I don't really know why your text looks weird compared to the above result.

Cannot draw a line in an image without creating its copy (issue exists only in .png images)

The code below doesn't draw any line on a PNG image:
import cv2
import matplotlib.pyplot as plt

imgPath = "./images/dummy.png"
img = cv2.imread(imgPath,cv2.IMREAD_UNCHANGED)
imgBGR = img[:,:,:3]
imgMask = img[:,:,3]
cv2.line(imgBGR, (200,100), (250,100), (0,100,255), thickness=9, lineType=cv2.LINE_AA)
plt.imshow(imgBGR[:,:,::-1])
Creating a copy of the BGR channels and using it to draw the line works:
imgPath = "./images/dummy.png"
img = cv2.imread(imgPath,cv2.IMREAD_UNCHANGED)
imgBGR = img[:,:,:3]
imgMask = img[:,:,3]
imgBGRCopy = imgBGR.copy()
cv2.line(imgBGRCopy, (200,100), (250,100), (0,100,255), thickness=9, lineType=cv2.LINE_AA)
plt.imshow(imgBGRCopy[:,:,::-1])
Can someone please explain why?
For some mysterious reason, OpenCV fails to draw on a NumPy slice.
See the following post for example: Why cv2.line can't draw on 1 channel numpy array slice inplace?
The error is not related to PNG vs JPEG, or to BGR vs BGRA.
The error is because you are trying to draw on a slice.
For example, the following code also fails:
imgBGR = cv2.imread('test.jpg', cv2.IMREAD_UNCHANGED)
imgB = imgBGR[:,:,0] # Get a slice of imgBGR
cv2.circle(imgB, (100, 100), 10, 5)
As an alternative, you can draw on the BGRA image.
According to the Drawing Functions documentation:
Note The functions do not support alpha-transparency when the target image is 4-channel. In this case, the color[3] is simply copied to the repainted pixels. Thus, if you want to paint semi-transparent shapes, you can paint them in a separate buffer and then blend it with the main image.
The following code is working:
img = cv2.imread(imgPath,cv2.IMREAD_UNCHANGED)
cv2.line(img, (200,100), (250,100), (0,100,255), thickness=9, lineType=cv2.LINE_AA)
imgBGR = img[:,:,:3]
plt.imshow(imgBGR[:,:,::-1])
Note:
Python and OpenCV are "Open Source", so in theory we can follow the source code and demystify the problem.
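Another workaround (a hedged sketch, not given in the original answer) is to make the slice contiguous before drawing; like .copy(), this allocates a new buffer that OpenCV can draw on:
import cv2
import numpy as np

img = cv2.imread("./images/dummy.png", cv2.IMREAD_UNCHANGED)
imgBGR = np.ascontiguousarray(img[:, :, :3])  # contiguous copy of the BGR channels
cv2.line(imgBGR, (200, 100), (250, 100), (0, 100, 255), thickness=9, lineType=cv2.LINE_AA)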

How do I add a transparent overlay to an image using pillow?

I'm trying to add a transparent overlay to a JPEG image. In the example below, the desired result would be a red image with a pieslice where it is light red.
My input is a JPEG image. Since JPEG doesn't have an alpha channel, I thought I could convert it to an 'RGBA' image and paste the overlay on it:
from PIL import Image, ImageDraw
# img = Image.open('input.jpg').convert('RGBA')
img = Image.new('RGBA', (400,400), (255,0,0))
img2 = Image.new('RGBA', (400,400))
draw2 = ImageDraw.Draw(img2)
draw2.pieslice([0,0,400,400], 90, 180, fill='white')
img.putalpha(128)
img.save('img.png')
img2.save('img2.png')
img.paste(img2)
img.save('img1+2.png')
However, this doesn't have the desired effect, and Windows Photos cannot even open it correctly.
I looked at the blend and alpha_composite functions, but they don't have the desired effect for me either. I don't want to lower the alpha values of the background outside of the overlay.
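For reference, a minimal sketch of one way this could be done (my assumption of the intended result, not part of the original question): keep the overlay fully transparent except for a semi-transparent white pieslice, then use Image.alpha_composite so the background outside the slice keeps its original alpha:
from PIL import Image, ImageDraw

img = Image.new('RGBA', (400, 400), (255, 0, 0, 255))   # stand-in for the converted JPEG
overlay = Image.new('RGBA', (400, 400), (0, 0, 0, 0))   # fully transparent overlay
draw = ImageDraw.Draw(overlay)
draw.pieslice([0, 0, 400, 400], 90, 180, fill=(255, 255, 255, 128))  # semi-transparent white slice
result = Image.alpha_composite(img, overlay)
result.convert('RGB').save('result.jpg')                # drop alpha before saving as JPEG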

Create text mask using Python+PIL or Imagemagick

I need to programmatically create a text mask over a solid colour with the text being transparent.
eg.
Can this be done with either Imagemagick or the Python imaging library?
This works for the inverse of what I want, so solid text over a transparent background.
convert -size 720x405 xc:transparent -fill pink -gravity center -pointsize 150 -font font.ttf -draw "text 0,0 CUT" output.png
I would prefer to use PIL/Pillow if possible.
UPDATED:
This is what I have so far...
font = ImageFont.truetype('font.ttf', 200, encoding='unic')
bg = Image.new('RGBA', (720, 404), '#0099ff')
mask = Image.new('1', (720, 404))
draw = ImageDraw.Draw(mask)
draw.text((0, 0), '#HELLO', font=font, fill='#ffffff')
bg.paste('#00000000', (0, 0), mask)
bg.save('text.png', 'PNG')
However this throws an error:
ValueError: unknown color specifier: '#00000000'
If I set it to a valid colour it works as expected, so I know I'm close; I just can't reference the transparency.
UPDATED:
In the end I just created another blank image and used that rather than defining the colour. So:
trans = Image.new('RGBA', (720, 404))
bg.paste(trans, (0, 0), mask)
This is relatively simple in PIL.
Here's a pretty good PIL reference.
There are many ways to do it - here's one (see the sketch after these steps):
1. Create an image with the size and background color you want, in RGBA mode.
2. Create a blank image of the same size in mode "1" to be used as a mask.
3. Write the text on the mask image using the ImageDraw module.
4. Use Image.paste(colour, box, mask) to paste the colour (0, 0, 0, 0) everywhere the text exists in the mask.
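A minimal sketch following those four steps (the font path, size and text are placeholders taken from the question):
from PIL import Image, ImageDraw, ImageFont

# 1. background image in the desired colour, RGBA mode
bg = Image.new('RGBA', (720, 404), '#0099ff')

# 2. blank mode "1" image of the same size, used as the mask
mask = Image.new('1', (720, 404))

# 3. write the text onto the mask
font = ImageFont.truetype('font.ttf', 200)
draw = ImageDraw.Draw(mask)
draw.text((0, 0), '#HELLO', font=font, fill=255)

# 4. paste the fully transparent colour through the mask, cutting the text out
bg.paste((0, 0, 0, 0), (0, 0), mask)
bg.save('text.png', 'PNG')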
