When I convert images to greyscale with PIL it rotates them - python

When I convert images to greyscale with PIL, it rotates them.
How do I disable this?
from PIL import Image
import os
path = 'spanish_pages_photos/'
pathContents = os.listdir(path)
list = []
# get file names and append to list
for i in pathContents:
    list.append(i)
list = sorted(list)
#loop through and change to grey scale
for i in list[2:]:
    img = Image.open(f'spanish_pages_photos/{i}').convert('L')
    img.save(f'spanish_pages_photos/{i}')
print('finished')

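The EXIF data can contain an "EXIF Orientation" field that tells viewers how to rotate the photo for display. Pillow does not apply it when opening the file, and the tag is typically not written back when the converted image is saved, so the result looks rotated. Try auto-orienting with PIL.ImageOps.exif_transpose() before converting.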
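A minimal sketch of that suggestion, assuming Pillow 6.0 or later (where ImageOps.exif_transpose is available). The function name to_greyscale_upright and its arguments are placeholders, not part of the original question:

```python
from PIL import Image, ImageOps


def to_greyscale_upright(src_path, dst_path):
    """Apply the EXIF orientation, then convert to greyscale and save."""
    with Image.open(src_path) as img:
        # exif_transpose returns an image rotated/flipped according to the
        # EXIF Orientation tag (unrotated if the tag is absent).
        upright = ImageOps.exif_transpose(img)
        upright.convert("L").save(dst_path)
```

In the loop from the question, the Image.open(...).convert('L') and img.save(...) lines could be replaced by one call to this helper per file.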

Related

Insert Image into a pdf at a specified location

I'm relatively new to python, and I'm trying to replace an image at a given location.
The idea is to check if the extracted image in the PDF matches the image I want to replace. If it does, I extract the location and put the new image in its place. I'm done with the extracting and checking part. Could someone please help me with the latter part?
Step 1: convert mypdf.pdf to full_page_image.jpg
from pdf2image import convert_from_path
pages = convert_from_path('mypdf.pdf', 500)
pages[x].save('full_page_image.jpg', 'JPEG') #where x is your page number minus one
Step 2: overlay image_to_be_added onto full_page_image
import cv2
import numpy as np
full_page_image = cv2.imread('full_page_image.jpg')
image_to_be_added = cv2.imread('image_to_be_added.jpg')
final_image = full_page_image.copy()
final_image[100:400,100:400,:] = image_to_be_added[100:400,100:400,:] #adjust the numbers according to the dimensions of the image_to_be_added
cv2.imwrite('final_image.jpg', final_image)
Step 3: convert final_image.jpg to final_pdf.pdf
from PIL import Image
final_image2 = Image.open(r'final_image.jpg')
final_image3 = final_image2.convert('RGB')
final_image3.save(r'final_pdf.pdf')
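The overlay in step 2 hard-codes its slice bounds. As a rough sketch, assuming the images are NumPy arrays shaped like the ones cv2.imread returns (height x width x channels), the bounds can be derived from the overlay's own shape. The helper name paste_at and the coordinates are illustrative assumptions:

```python
import numpy as np


def paste_at(page, overlay, x, y):
    """Return a copy of `page` with `overlay` placed with its top-left corner at (x, y)."""
    h, w = overlay.shape[:2]
    if x < 0 or y < 0 or y + h > page.shape[0] or x + w > page.shape[1]:
        raise ValueError("overlay does not fit inside the page at this position")
    result = page.copy()
    # Image arrays are indexed rows first (y), then columns (x).
    result[y:y + h, x:x + w] = overlay
    return result


# Tiny synthetic example: a black 6x8 "page" and a white 2x3 "overlay".
page = np.zeros((6, 8, 3), dtype=np.uint8)
overlay = np.full((2, 3, 3), 255, dtype=np.uint8)
combined = paste_at(page, overlay, x=4, y=1)
```

With real files, page and overlay would come from cv2.imread, and x and y would be the location found during the matching step.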

batch processing of images in python opencv is not working

I used this code to read a series of PNG images in a folder, but it only reads one image successfully. What is the reason for that?
import cv2
from glob import glob
for fn in glob('*.png'):
    im = cv2.imread(fn)
You've only got one variable (called im) so it can only hold one image. You probably want a list of images:
# Make empty list
imgs = []
for fn in glob('*.png'):
    im = cv2.imread(fn, cv2.IMREAD_COLOR)
    imgs.append(im)
Or, you can use a "list comprehension":
imgs = [ cv2.imread(fn, cv2.IMREAD_COLOR) for fn in glob('*.png') ]

Extract an image from a PDF in python

I'm trying to extract images from a PDF using PyPDF2, but the image my code extracts looks very different from what it should look like.
Here's the pdf I'm using:
https://www.hbp.com/resources/SAMPLE%20PDF.pdf
Here's my code:
import PyPDF2
pdf_filename = "SAMPLE.pdf"
pdf_file = open(pdf_filename, 'rb')
cond_scan_reader = PyPDF2.PdfFileReader(pdf_file)
page = cond_scan_reader.getPage(0)
xObject = page['/Resources']['/XObject'].getObject()
i = 0
for obj in xObject:
    # print(xObject[obj])
    if xObject[obj]['/Subtype'] == '/Image':
        if xObject[obj]['/Filter'] == '/DCTDecode':
            # DCTDecode means the stream is already JPEG data
            data = xObject[obj]._data
            img = open("{}".format(i) + ".jpg", "wb")
            img.write(data)
            img.close()
            i += 1
Since I need to keep the image in its original colour mode, I can't just convert it to RGB if it was CMYK, because I need that information.
Also, I'm trying to get the DPI of the images I extract from a PDF. Is that information always stored in the image?
Thanks in advance.
I used pdfreader to extract the image from your example.
The image uses ICCBased colorspace with the value of N=4 and Intent value of RelativeColorimetric. This means that the "closest" PDF colorspace is DeviceCMYK.
All you need is to convert the image to RGB and invert the colors.
Here is the code:
from pdfreader import SimplePDFViewer
import PIL.ImageOps
fd = open("SAMPLE PDF.pdf", "rb")
viewer = SimplePDFViewer(fd)
viewer.render()
img = viewer.canvas.images['Im0']
# this displays ICCBased 4 RelativeColorimetric
print(img.ColorSpace[0], img.ColorSpace[1].N, img.Intent)
pil_image = img.to_Pillow()
pil_image = pil_image.convert("RGB")
inverted = PIL.ImageOps.invert(pil_image)
inverted.save("sample.png")
Read more on PDF objects: Image (sec. 8.9.5), InlineImage (sec. 8.9.7)
Hope this works: you probably need to use another library such as Pillow.
Here is an example:
from PIL import Image
image = Image.open("path_to_image")
if image.mode == 'CMYK':
    image = image.convert('RGB')
image.save("path_to_image.jpg")
Reference: Convert from CMYK to RGB
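On the DPI part of the question: an image file only carries a resolution if whatever wrote it stored one, so it cannot be relied on. As a rough sketch, Pillow exposes the value as the "dpi" entry of Image.info when the file has density metadata. The helper name read_dpi is hypothetical:

```python
from PIL import Image


def read_dpi(path):
    """Return the (x, y) DPI stored in the file, or None if it has none."""
    with Image.open(path) as img:
        # Pillow fills info["dpi"] only when the file carries density
        # metadata (for example a JFIF density or a PNG pHYs chunk).
        return img.info.get("dpi")
```

A None result means the file itself says nothing about resolution; in that case the effective resolution would have to be worked out from how large the image is drawn on the PDF page.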

How to delete images with the low pixel value in a folder

I'm having trouble removing images from a folder.
Here is what I have done:
import os,glob
from PIL import Image
from skimage import io
import numpy as np
path = "/Users/Xin/Desktop/SVM-Image-Classification-master/Folder"
# Delete images with the low pixel value
for filename in os.listdir(path):
    images = Image.open(os.path.join(path,filename))
    print(images)
    print(np.mean(images))
    print(os.listdir(path))
    if np.mean(images) < 10:
        os.listdir(path).remove(filename)
print(os.listdir(path))
I expected the images with a low mean pixel value to be deleted. However, as the output below shows, the image that I want to delete is still in the list.
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x1C19FE37F0>
9.507644653320312
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x1C198F2E10>
10.004150390625
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x1C19FE37F0>
10.897491455078125
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=256x256 at 0x1C198F2F98>
10.406112670898438
['0543_AD_axial_090_PET.jpg', '0543_AD_axial_091_PET.jpg', '0543_AD_axial_093_PET.jpg', '0543_AD_axial_092_PET.jpg']
Can anyone help me?
Thanks
You are just removing the filename from the temporary list created by os.listdir(path). If you want to remove the file completely from disk, you need to use os.remove.
For example:
for filename in os.listdir(path):
    images = Image.open(os.path.join(path,filename))
    if np.mean(images) < 10:
        os.remove(os.path.join(path, filename))
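A slightly fuller sketch of the same idea, assuming Pillow and NumPy are installed and the folder contains only image files. It closes each file before deleting it (which matters on Windows) and returns the names it removed. The function name delete_dark_images is illustrative, and the default threshold of 10 is taken from the question:

```python
import os

import numpy as np
from PIL import Image


def delete_dark_images(folder, threshold=10):
    """Delete images whose mean pixel value is below `threshold`."""
    removed = []
    # os.listdir returns a new list on every call, so iterate over one snapshot.
    for filename in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, filename)
        with Image.open(full_path) as img:
            mean_value = np.asarray(img).mean()
        # The file is closed at this point, so it can be removed from disk.
        if mean_value < threshold:
            os.remove(full_path)
            removed.append(filename)
    return removed
```

Calling os.listdir(folder) again afterwards reflects the deletions, because it re-reads the directory from disk.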

Storing/Retrieving array images with Pyautogui (AttributeError: 'Image' object has no attribute 'read')

I'm trying to locate an image, then store another image relative to the first one within an array. Afterwards, I want those images to drop into a word document using the docx library. Currently, I'm getting the following error, despite a few different solutions I've tried below. Here's the code:
import sys
import PIL
import pyautogui
import docx
import numpy
def grab_paperclip_images():
    '''
    This'll look at the documents that're on
    the current screen, and create images of
    each document with a paperclip. I'll be
    testing on an unsorted screen first.
    '''
    image_array = []
    clip_array = find_all_objects("WHITE_PAPERCLIP.png")
    for item in clip_array:
        coordinates = item[0]+45, item[1], 222, item[3]
        image_array.append(pyautogui.screenshot(region=coordinates))
    return image_array
doc = docx.Document()
images = grab_paperclip_images()
for image in images:
    #print image
    #yields: [<PIL.Image.Image image mode=RGB size=222x12 at 0x7CC7770>,etc]
    #Tried this - no dice
    #img = PIL.Image.open(image)
    #doc.add_picture(img)
    doc.add_picture(image)
doc.save("testDoc.docx")
Please let me know what I'm misunderstanding, and if you see any suggestions to make the code more pythonic, better scoped, etc.
As always, thanks for the help, sincerely!
Figured out a way around this. I had to save the images to disk. I could still reference the array, but I couldn't reference the image without saving it. Here's my workaround:
def grab_paperclip_images():
    '''
    This'll look at the documents that're on
    the current screen, and create images of
    each document with a paperclip. I'll be
    testing it on an unsorted screen first.
    INSPIRATION:
    bottom_record = pyautogui.screenshot(
        "LAST_RECORD.png",
        region=(
            last_clip[0],
            last_clip[1]+18,
            1100,
            14
        )
    )
    '''
    image_array = []
    clip_array = find_all_objects("WHITE_PAPERCLIP.png")
    count = 0
    for item in clip_array:
        coordinates = item[0]+45, item[1], 222, item[3]
        filename = "image"+str(count)+".png"
        image = pyautogui.screenshot(filename, region=coordinates)
        image_array.append(filename)
        count += 1
    return image_array
doc = docx.Document()
images = grab_paperclip_images()
for image in images:
    doc.add_picture(image)
doc.save("dingding2.docx")
delete_all(images)
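An alternative that avoids temporary files, offered as a sketch: python-docx's add_picture accepts either a file path or a file-like object, which is why passing a PIL Image directly fails with the 'read' AttributeError. Writing the screenshot into an io.BytesIO buffer gives it something it can read. The helper name image_to_stream is an assumption for illustration:

```python
import io

from PIL import Image


def image_to_stream(image, fmt="PNG"):
    """Encode a PIL image into an in-memory, file-like buffer."""
    buffer = io.BytesIO()
    image.save(buffer, format=fmt)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


# Usage with python-docx (not executed here):
#     doc.add_picture(image_to_stream(screenshot))
```

With this, the first version of grab_paperclip_images (which keeps PIL images in the list) could be used as-is, and no files need to be cleaned up afterwards.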
