I'm writing a script that automatically downloads images from a website and sets one as my background. But I only want landscape/horizontal pictures. Is there a way for Python to see the aspect ratio of an image and then discard it based on whether it's horizontal or vertical? And would this be easier to do after downloading the pictures, or before, using Selenium?
You can achieve this with the Python OpenCV library (cv2).
With OpenCV you can read the width and height of your chosen image, like this:
import cv2

im = cv2.imread('your_image.jpg')  # note: cv2.imread needs a local file path, not a URL
h, w, c = im.shape
# h is the image height and w is the width; for portrait images w is always less than h
# then apply your download condition
if w < h:
    pass  # discard / do not keep the image
else:
    pass  # keep the image and set it as background
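Regarding the "before or after downloading" part of the question: Pillow's Image.open is lazy and only parses the file header, so with a streamed request you can often read the dimensions from the first few kilobytes and abort early. A minimal sketch of the idea (the is_landscape helper and the in-memory demo images are mine, not from the question):

```python
from io import BytesIO
from PIL import Image

def is_landscape(image_bytes):
    # Image.open only parses the header, so for a streamed download
    # you can usually decide after the first few KB have arrived.
    with Image.open(BytesIO(image_bytes)) as im:
        w, h = im.size
    return w > h

# Demo with two in-memory images instead of real downloads
buf = BytesIO()
Image.new('RGB', (1920, 1080)).save(buf, format='PNG')
print(is_landscape(buf.getvalue()))  # True: wider than tall

buf = BytesIO()
Image.new('RGB', (1080, 1920)).save(buf, format='PNG')
print(is_landscape(buf.getvalue()))  # False: portrait
```

With requests you would pass stream=True and feed the first chunk(s) into the same check before deciding whether to finish the download.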
Probably an unusual question, but I am currently looking for a way to make PIL display image files more slowly.
Ideally so that you can see how the image builds up, pixel by pixel, from left to right.
Does anyone have an idea how to implement something like this?
It is a purely visual thing, so it is not essential.
Here is an example:
from PIL import Image
im = Image.open("sample-image.png")
im.show()
Is there a way to "slow down" im.show()?
AFAIK, you cannot do this directly with PIL's Image.show(), because it actually saves your image as a file to /var/tmp/XXX and then passes that file to your OS's standard image viewer to display on the screen; there is no further interaction with the viewer process after that. So if you draw in another pixel, the viewer will not be aware, and if you call Image.show() again, it will save a new copy of your image and invoke another viewer, which will give you a second window rather than updating the first!
There are several possibilities to get around it:
use OpenCV's cv2.imshow() which does allow updates
use tkinter to display the changing image
create an animated GIF and start a new process to display that
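For completeness, the third option (an animated GIF that reveals the image row by row) can be sketched with Pillow alone; a generated gradient stands in for the actual image here:

```python
from PIL import Image

# A small gradient stands in for the real image
im = Image.new('RGB', (120, 80))
im.putdata([(x * 2, y * 3, 100) for y in range(80) for x in range(120)])
w, h = im.size

# Build ~20 frames, each revealing a few more rows from the top
step = max(1, h // 20)
frames = []
for rows in range(step, h + 1, step):
    frame = Image.new('RGB', (w, h))               # start from a black canvas
    frame.paste(im.crop((0, 0, w, rows)), (0, 0))  # reveal the top `rows` rows
    frames.append(frame)

frames[0].save('buildup.gif', save_all=True,
               append_images=frames[1:], duration=100, loop=0)
print(len(frames))
```

Any GIF viewer (or a browser) will then play the build-up animation in a loop.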
I chose the first, using OpenCV, as the path of least resistance:
#!/usr/bin/env python3
import cv2
import numpy as np
from PIL import Image
# Open image
im = Image.open('paddington.png')
# Make BGR Numpy version for OpenCV
BGR = np.array(im)[:,:,::-1]
h, w = BGR.shape[:2]
# Make empty image to fill in slowly and display
d = np.zeros_like(BGR)
# Use a counter to avoid redrawing and waiting after every single pixel
count = 0
for y in range(h):
    for x in range(w):
        d[y, x] = BGR[y, x]
        count += 1
        if count % 400 == 0:
            cv2.imshow("SlowLoader", d)
            cv2.waitKey(1)
# Wait for one final keypress to exit
cv2.waitKey(0)
Increase the 400 near the end to make it faster, updating the screen after a greater number of pixels, or decrease it to update the screen after fewer pixels, so you see them appear more slowly.
As I cannot share a movie on StackOverflow, I made an animated GIF to show how that looks:
I decided to try and do it with tkinter as well. I am no expert on tkinter but the following works just the same as the code above. If anyone knows tkinter better, please feel free to point out my inadequacies - I am happy to learn! Thank you.
#!/usr/bin/env python3
import numpy as np
from tkinter import *
from PIL import Image, ImageTk
# Create Tkinter Window and Label
root = Tk()
video = Label(root)
video.pack()
# Open image
im = Image.open('paddington.png')
# Make Numpy version for simpler pixel access
RGB = np.array(im)
h, w = RGB.shape[:2]
# Make empty image to fill in slowly and display
d = np.zeros_like(RGB)
# Use a counter to avoid redrawing and waiting after every single pixel
count = 0
for y in range(h):
    for x in range(w):
        d[y, x] = RGB[y, x]
        count += 1
        if count % 400 == 0:
            # Convert the frame for Tkinter
            img = Image.fromarray(d)
            imgtk = ImageTk.PhotoImage(image=img)
            # Set the image on the label
            video.config(image=imgtk)
            # Update the window
            root.update()
I have an A4 PNG image with some text in it; the background is transparent. My question is: how can I crop the image to only the text? I am aware of cropping in PIL, but if I use fixed values, it will not be able to crop another image that has the text in a different place. So how can I find where the text, sticker, or any other content is placed on that big, empty image, and crop it so the content fits exactly?
Thanks in advance!
You can do this by extracting the alpha channel and cropping to that. So, if this is your input image:
Here it is again, smaller and on a chessboard background so you can see its full extent:
The code looks like this:
#!/usr/bin/env python3
from PIL import Image
# Load image
im = Image.open('image.png')
# Extract alpha channel as new Image and get its bounding box
alpha = im.getchannel('A')
bbox = alpha.getbbox()
# Apply bounding box to original image
res = im.crop(bbox)
res.save('result.png')
Here is the result:
And again on a chessboard pattern so you can see its full extent:
Keywords: Image processing, Python, PIL/Pillow, trim to alpha, crop to alpha, trim to transparency, crop to transparency.
from PIL import Image

im = Image.open("image.png")
# getbbox() returns the bounding box of the non-zero (non-transparent) region
im2 = im.crop(im.getbbox())
im2.save("result.png")
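The same idea extends to images that have a solid background instead of transparency: difference the image against a flat background colour with ImageChops and crop to the difference's bounding box. A sketch (the crop_to_content helper and the assumed white background are mine, not from the answers above):

```python
from PIL import Image, ImageChops

def crop_to_content(im, bg_color=(255, 255, 255)):
    # Difference against a flat background image; getbbox() then
    # finds the extent of everything that is not background-coloured.
    bg = Image.new('RGB', im.size, bg_color)
    diff = ImageChops.difference(im.convert('RGB'), bg)
    bbox = diff.getbbox()
    return im.crop(bbox) if bbox else im

# Demo: a small red square on a white canvas
page = Image.new('RGB', (600, 800), (255, 255, 255))
page.paste(Image.new('RGB', (20, 20), (255, 0, 0)), (30, 40))
print(crop_to_content(page).size)  # (20, 20)
```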
I'm trying to use Wand, the simple MagickWand API binding for Python, to extract pages from a PDF, stitch them together into a single longer ("taller") image, and pass that image to Google Cloud Vision for OCR text detection. I keep running up against Google Cloud Vision's 10 MB file-size limit.
I thought a good way to get the file size down might be to eliminate all color channels and just feed Google a B&W image. I figured out how to get grayscale, but how can I make my color image into a B&W ("bilevel") one? I'm also open to other suggestions for getting the file size down. Thanks in advance!
from wand.image import Image

selected_pages = [0, 1]
imageFromPdf = Image(filename=pdf_filepath + str(selected_pages), resolution=600)
pages = len(imageFromPdf.sequence)
image = Image(
    width=imageFromPdf.width,
    height=imageFromPdf.height * pages
)
for i in range(pages):
    image.composite(
        imageFromPdf.sequence[i],
        top=imageFromPdf.height * i,
        left=0
    )
image.colorspace = 'gray'
image.alpha_channel = False
image.format = 'png'
image
The following are several methods of getting a bilevel output from Python Wand (0.5.7). The last one needs IM 7 to work. One note from my testing: in IM 7, the first two results are swapped in terms of dithering or not dithering. I have reported this to the Python Wand developer.
Input:
from wand.image import Image
from wand.display import display
# Using Wand 0.5.7, all images are not dithered in IM 6 and all images are dithered in IM 7
with Image(filename='lena.jpg') as img:
    with img.clone() as img_copy1:
        img_copy1.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=False, measure_error=False)
        img_copy1.auto_level()
        img_copy1.save(filename='lena_monochrome_no_dither.jpg')
        display(img_copy1)

    with img.clone() as img_copy2:
        img_copy2.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=True, measure_error=False)
        img_copy2.auto_level()
        img_copy2.save(filename='lena_monochrome_dither.jpg')
        display(img_copy2)

    with img.clone() as img_copy3:
        img_copy3.threshold(threshold=0.5)
        img_copy3.save(filename='lena_threshold.jpg')
        display(img_copy3)

    # only works in IM 7
    with img.clone() as img_copy4:
        img_copy4.auto_threshold(method='otsu')
        img_copy4.save(filename='lena_threshold_otsu.jpg')
        display(img_copy4)
First output using IM 6:
Second output using IM 7:
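Back to the original goal of getting under Google Cloud Vision's 10 MB limit: once the image is bilevel, saving it in Pillow's 1-bit mode '1' as a PNG (or Group 4 TIFF) typically shrinks the file considerably compared with 8-bit grayscale, since each pixel needs only one bit. A minimal Pillow sketch with a generated stand-in page (file names and the synthetic page are mine):

```python
from PIL import Image

# Stand-in grayscale "page" with a dark block of content on it
page = Image.new('L', (800, 600), 255)
page.paste(Image.new('L', (400, 50), 0), (100, 100))

# Convert to bilevel (mode '1' = 1 bit per pixel) and save as PNG
bw = page.convert('1')
bw.save('page_bw.png', optimize=True)
print(bw.mode)  # 1
```

The same conversion works on Wand's output if you round-trip it through Pillow, or you can stay in Wand and set image.type = 'bilevel' before saving.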
I have multiple PDF invoices which I am trying to parse. I convert them to images and use OCR to get text from the images. One of the PDFs has 2 out of 3 pages rotated by 90 degrees. How do I detect these rotated pages and rotate them correctly so the OCR returns the right information?
To keep the image intact, you can set the parameter 'expand' to True
image = image.rotate(270, expand=True)
Here is a solution that works for one image but you can do it for a list of images and check each image before saving it back to PDF:
# import library
from PIL import Image

# open image file
f = Image.open('test.jpg')

# convert to RGB so it can be saved as a PDF
pdf = f.convert('RGB')

# if width > height, rotate it to get portrait
if pdf.width > pdf.height:
    pdf = pdf.rotate(270, expand=True)

# save as PDF
pdf.save('test.pdf')
When you say they are rotated, is it as simple as this: they are all meant to be in portrait orientation, but some pages are in landscape orientation? You should either be able to read the orientation of the pages from the PDF metadata, or, if that's not available for some reason, use simple logic to determine it, like rotated = image.width > image.height
With Pillow/PIL it would be easy to rotate the image before OCR:
if rotated:
    image = image.rotate(270)
Presumably there could be a case of pages being upside down and unless you have reliable metadata from the PDF, then you might have to first OCR with the most likely direction (say counter-clockwise 90 degrees as per above) and if that doesn't return any text try again after rotating 180 degrees.
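Putting the width/height check and the rotation together, a small helper makes this reusable per page (a sketch; it assumes pages are meant to be portrait and does not handle the upside-down case described above):

```python
from PIL import Image

def fix_orientation(page):
    # Rotate landscape pages 270° (i.e. 90° clockwise) back to portrait;
    # expand=True keeps the whole page visible after rotation.
    if page.width > page.height:
        page = page.rotate(270, expand=True)
    return page

print(fix_orientation(Image.new('RGB', (300, 200))).size)  # (200, 300)
print(fix_orientation(Image.new('RGB', (200, 300))).size)  # (200, 300)
```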
You can use imutils to rotate without cutting out image boundaries after rotation.
import cv2 as cv
import imutils
img = cv.imread('your_image.png')
img = imutils.rotate_bound(img, 270)  # rotate_bound returns the rotated image; 270 for anti-clockwise or 90 for clockwise
Consider this experiment:
I have two images, one with free text and another with text with-in a box (surrounded by border)
If I run tesseract-ocr on these two images, the free-text image outputs 'Text' while the boxed image outputs nothing ('')
Why is that?
As a fix, I can crop borders using some image processing but I am trying to know what's causing this problem.
free image
Boxed image
So far, I cropped the borders of the image using the logic below [we should feed it the outer-border contour-cropped image], and then I was able to detect the text. However, I still don't understand why tesseract isn't detecting the boxed text. Feel free to experiment with the attached images.
# Below code modifies (x, y) and (height, width)
# so that the new values describe a smaller box enclosed
# by the original box
y = y + int(0.025 * h)
x = x + int(0.025 * w)
h = h - int(0.05 * h)
w = w - int(0.05 * w)