Multi-Page tiff resizing python - python

First post here, although I have already spent days searching through various questions. I'm using Python 3.6 and Pillow for TIFF processing.
I would like to automate one of our manual tasks by resizing some very large images to match A4 format. We work with TIFF files, which often contain more than one page. So I wrote:
from PIL import Image, ImageSequence
...
def image_resize(path, dcinput, file):
    dcfake = read_config(configlocation)["resize"]["dcfake"]
    try:
        imagehandler = Image.open(path+file)
        imagehandler = imagehandler.resize((2496, 3495), Image.ANTIALIAS)
        imagehandler.save(dcinput+file, optimize=True, quality=95)
    except Exception:
        ...  # handler omitted in the original post
But the (not so) obvious problem is that only the first page of the TIFF gets converted. That is not what I expected from this library. I dug further and found a way to iterate over each page of the TIFF and save it as a separate file:
imagehandler = Image.open(path+file)
for i, page in enumerate(ImageSequence.Iterator(imagehandler)):
    page = page.resize((2496, 3495), Image.ANTIALIAS)
    page.save(dcinput + "proces%i.tif" %i, optimize=True, quality=95, save_all=True)
Now I could use ImageMagick or some internal commands to merge the separate pages into one file, but I don't want to do that, as it complicates the code.
My question: is there a unicorn that can help me with either:
1) resizing all pages of a given multi-page TIFF on the fly
2) building one TIFF from several TIFFs
I'd like to stick to Python modules only.
Thanks.

Take a look at this example. It will make every page of a TIF file four times smaller (by halving width and height of every page):
from PIL import Image
from PIL import ImageSequence
from PIL import TiffImagePlugin
INFILE = 'multipage_tif_example.tif'
OUTFILE = 'multipage_tif_resized.tif'
print('Resizing TIF pages')
pages = []
imagehandler = Image.open(INFILE)
for page in ImageSequence.Iterator(imagehandler):
    # // keeps the sizes as integers (in Python 3, / would produce floats)
    new_size = (page.size[0] // 2, page.size[1] // 2)
    # LANCZOS is the same filter as ANTIALIAS, a name that Pillow 10 removed
    page = page.resize(new_size, Image.LANCZOS)
    pages.append(page)
print('Writing multipage TIF')
with TiffImagePlugin.AppendingTiffWriter(OUTFILE) as tf:
    for page in pages:
        page.save(tf)
        tf.newFrame()
It's supposed to work since late Pillow 3.4.x versions (works with version 5.1.0 on my machine).
Resources:
AppendingTiffWriter discussed here.
Sample TIF files can be downloaded here.
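A possibly simpler route, assuming a Pillow version whose TIFF writer accepts save_all and append_images: collect the resized pages and write them with a single save call. This is only a sketch. The function name resize_tiff_pages and the file paths are made up for illustration, and the default target size (2496, 3495) is the one from the question.

```python
from PIL import Image, ImageSequence


def resize_tiff_pages(src, dst, size=(2496, 3495)):
    """Resize every page of a multi-page TIFF and write them to one file.

    Hypothetical helper: `src` and `dst` are file paths, `size` is (width, height).
    """
    with Image.open(src) as im:
        # resize() returns a new image, so each page survives the next seek
        pages = [page.resize(size, Image.LANCZOS)
                 for page in ImageSequence.Iterator(im)]
    # save_all=True writes the first page plus everything in append_images
    pages[0].save(dst, save_all=True, append_images=pages[1:])
    return len(pages)
```

Called as resize_tiff_pages("in.tif", "out.tif"), it returns the number of pages written. All resized pages are held in memory at once, so very large documents may still call for the AppendingTiffWriter approach.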

Related

Convert all images into one PDF?

I would like to finish my script. I have tried a lot to solve this, but as a beginner I failed.
I have a function using imageio that takes images from a website. After that, I would like to resize all the images to 63x88 and put them all into one PDF.
full_path = os.path.join(filePath1, name + ".png")
if os.path.exists(full_path):
    number = 1
    while True:
        full_path = os.path.join(filePath1, name + str(number) + ".png")
        if not os.path.exists(full_path):
            break
        number += 1
imageio.imwrite(full_path, im_padded.astype(np.uint8))
os.chmod(full_path, mode=0o777)
thanks for answer
We (ImageIO) currently don't have a PDF reader/writer. There is a long-standing feature request for it, which hasn't been implemented yet because nobody has been willing to contribute it.
Regarding the loading of images, we have an example for this in the docs:
import imageio as iio
from pathlib import Path
images = list()
for file in Path("path/to/folder").iterdir():
    im = iio.imread(file)
    images.append(im)
The caveat is that this particular example assumes that you want to read all images in a folder, and that there are only images in said folder. If either of these assumptions doesn't apply to you, you can easily customize the snippet.
Regarding the resizing of images, you have several options, and I recommend scikit-image's resize function.
To then get all the images into a PDF, you could have a look at matplotlib, which can generate a figure which you can save as a PDF file. The exact steps to do so will depend on the desired layout of your resulting pdf.
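If pulling in matplotlib feels heavy for this, one alternative is Pillow, which can resize images and also write several of them into a single PDF. Note that this swaps in a different library from the ones recommended above. The sketch below assumes the files are PNGs in one folder and that 63x88 means pixels. The name images_to_pdf is hypothetical.

```python
from pathlib import Path

from PIL import Image


def images_to_pdf(folder, out_pdf, size=(63, 88)):
    """Resize every PNG in `folder` and write them as pages of one PDF.

    Hypothetical helper; `size` is (width, height) in pixels.
    """
    pages = []
    for file in sorted(Path(folder).glob("*.png")):
        with Image.open(file) as im:
            # Pillow's PDF writer does not accept RGBA, so convert to RGB first
            pages.append(im.convert("RGB").resize(size))
    if not pages:
        raise ValueError("no PNG files found in {}".format(folder))
    # save_all=True writes the first image plus everything in append_images
    pages[0].save(out_pdf, save_all=True, append_images=pages[1:])
    return len(pages)
```

Each image becomes its own page. Arranging several images on one page would need a layout step that this sketch does not cover.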

Converting pdf to png with python (without pdf2image)

I want to convert a pdf (one page) into a png file.
I installed pdf2image and get this error:
popler is not installed in windows.
According to this question:
Poppler in path for pdf2image, poppler should be installed and PATH modified.
I cannot do any of those (I don't have the necessary permissions in the system I am working with).
I had a look at opencv and PIL and none seems to offer the possibility to make this transformation:
PIL (see here https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html?highlight=pdf#pdf) does not offer the possibility to read pdfs, only to save images as pdfs.
The same goes for openCV.
Any suggestion how to make the pdf to png transformation ? I can install any python library but I can not touch the windows installation.
thanks
Here is a snippet that generates PNG images of arbitrary resolution (dpi):
import fitz  # PyMuPDF
file_path = "my_file.pdf"
dpi = 300  # choose desired dpi here
zoom = dpi / 72  # zoom factor, standard: 72 dpi
magnify = fitz.Matrix(zoom, zoom)  # magnifies in x, resp. y direction
doc = fitz.open(file_path)  # open document
for page in doc:
    pix = page.get_pixmap(matrix=magnify)  # render page to an image
    pix.save(f"page-{page.number}.png")
Generates PNG files named page-0.png, page-1.png, ...
By choosing dpi < 72 thumbnail page images would be created.
PyMuPDF supports pdf to image rasterization without requiring any external dependencies.
Sample code to do a basic pdf to png transformation:
import fitz  # PyMuPDF, imported as fitz for backward compatibility reasons
file_path = "my_file.pdf"
doc = fitz.open(file_path)  # open document
for page in doc:
    pix = page.get_pixmap()  # render page to an image
    pix.save(f"page_{page.number}.png")  # page.number is the 0-based page index

Only one image from 5 is downloaded and it knocks out an error

import requests
from PIL import Image
url_shoes_for_choice = [
    "https://content.adidas.co.in/static/Product-CM7531/Unisex_OUTDOOR_SANDALS_CM7531_1.jpg",
    "https://cdn.shopify.com/s/files/1/0080/1374/2161/products/product-image-897958210_640x.jpg?v=1571713841",
    "https://cdn.chamaripashoes.com/media/catalog/product/cache/9/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_8_3.jpg",
    "https://ae01.alicdn.com/kf/HTB1EyKjaI_vK1Rjy0Foq6xIxVXah.jpg_q50.jpg",
    "https://www.converse.com/dw/image/v2/BCZC_PRD/on/demandware.static/-/Sites-cnv-master-catalog/default/dwb9eb8c43/images/a_107/167708C_A_107X1.jpg"
]
def img():
    for url in url_shoes_for_choice:
        image = requests.get(url, stream=True).raw
        out = Image.open(image)
        out.save('image/image.jpg', 'jpg')
if __name__=="__main__":
    img()
Error:
OSError: cannot identify image file <_io.BytesIO object at 0x7fa185c52d58>
The problem is that one of the images causes trouble with the byte data returned by requests.get(url, stream=True).raw. I'm not sure, but I guess the data of the 3rd image is invalid byte data. So instead of using the raw stream, we can fetch the response content and wrap it in BytesIO, which fixes the byte data.
I fixed one more thing in your original code: I added numbering to the images so each one is saved under a different name.
from io import BytesIO
def img():
    for count, url in enumerate(url_shoes_for_choice):
        image = requests.get(url, stream=True)
        with BytesIO(image.content) as f:
            with Image.open(f) as out:
                # out.show() # See the images
                out.save('image/image{}.jpg'.format(count))
(This works fine, but I'm not sure what the main issue was. If anyone knows exactly what the issue is, please comment and explain.)
I opened the first link in my browser and saved the image. It's actually a webp file.
$ file Unisex_OUTDOOR_SANDALS_CM7531_1.webp
Unisex_OUTDOOR_SANDALS_CM7531_1.webp: RIFF (little-endian) data, Web/P image, VP8 encoding, 500x500, Scaling: [none]x[none], YUV color, decoders should clamp
You explicitly tell the image library that it should expect a jpg. When you remove that parameter and let it figure it out on its own using out.save('image/image.jpg') the first image successfully downloads for me.
The first two images work this way if you make sure to save each under a different name:
def img():
    i = 0
    for url in url_shoes_for_choice:
        i+=1
        image = requests.get(url, stream=True).raw
        out = Image.open(image)
        out.save('image{}.jpg'.format(i))
The third is a valid JPEG file, as is the fourth, although both use the JFIF standard 1.01, which I am hearing of for the first time. I'm pretty sure you'll have to figure out support for such different file types.
It is worth noting that if I download the images in Chrome and open those with Python, nothing fails. So Chrome might be adding information to the file.
The documentation of PIL/pillow explains here that you need a new enough version for animated images, but that is not your problem.
Support for animated WebP files will only be enabled if the system
WebP library is v0.5.0 or later. You can check webp animation support
at runtime by calling features.check("webp_anim").
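Since a URL ending in .jpg may actually serve a WebP (as the first link does), one option is to let Pillow identify the real format from the downloaded bytes and pick the file extension from that. This is a minimal sketch. The helper name save_with_detected_format and the stem argument are hypothetical, and it assumes the installed Pillow can read the format in question.

```python
from io import BytesIO

from PIL import Image


def save_with_detected_format(data, stem):
    """Write raw image bytes to `stem` plus an extension matching the real format.

    Hypothetical helper; `data` is the downloaded content as bytes.
    """
    with Image.open(BytesIO(data)) as im:
        # im.format reflects the file contents (e.g. "JPEG", "PNG", "WEBP"),
        # regardless of what the URL's extension claims
        ext = im.format.lower()
    filename = "{}.{}".format(stem, ext)
    # Write the original bytes unchanged instead of re-encoding the image
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

With requests, this could be called as save_with_detected_format(requests.get(url).content, "image/image0"). Image.open still raises OSError for bytes that Pillow cannot identify at all.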

img2pdf: one page of pdf with one image?

The source file is here. The fetch code is sify. It's just one JPG. If you can't download it, please contact bbliao#126.com.
However, this image doesn't work with the fpdf package, and I don't know why. You can try it.
So I have to use img2pdf. With the following code I converted this image to PDF successfully:
import os
import img2pdf
t = os.listdir()  # every file in the current directory
with open('bb.pdf', 'wb') as f:
    f.write(img2pdf.convert(t))
However, when multiple images are combined into one PDF file, img2pdf just joins the images head to tail, so every page size equals its image size. For example, the first page of the PDF is 30 cm x 40 cm, the second is 20 cm x 10 cm, the third is 15 x 13... That's ugly.
I want the same page size (A4, for example) and the same image size on every page of the PDF, with one image per page.
Glancing at the documentation for img2pdf, it allows you to set the paper size by passing layout details to the convert call:
import img2pdf
letter = (img2pdf.in_to_pt(8.5), img2pdf.in_to_pt(11))
layout = img2pdf.get_layout_fun(letter)
with open('test.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg','image2.jpg'], layout_fun=layout))

Python/wand code causes "Killed" when converting large PDFs

I have been working on setting up a PDF conversion-to-png and cropping script with Python 3.6.3 and the wand library.
I tried Pillow, but it's lacking the conversion part. I am experimenting with extracting the alpha channel because I want to feed the images to an OCR, at a later point, so I turned to trying the code provided in this SO answer.
A couple of issues came out: the first is that if the file is large, I get a "Killed" message from the terminal. The second is that it seems rather picky with the file, i.e. files that get converted properly by imagemagick's convert or pdftoppm in the command line, raise errors with wand.
I am mostly concerned with the first one though, and would really appreciate a check from more knowledgeable coders. I suspect it might come from the way the loop is structured:
from wand.image import Image
from wand.color import Color
def convert_pdf(filename, path, resolution=300):
    all_pages = Image(filename=path+filename, resolution=resolution)
    for i, page in enumerate(all_pages.sequence):
        with Image(page) as img:
            img.format = 'png'
            img.background_color = Color('white')
            img.alpha_channel = 'remove'
            image_filename = '{}.png'.format(i)
            img.save(filename=path+image_filename)
I noticed that the script outputs all files at the end of the process rather than one by one, which I guess puts an unnecessary burden on memory and ultimately causes a SEGFAULT or something similar.
Thanks for checking out my question, and for any hints.
Yes, your line:
all_pages = Image(filename=path+filename, resolution=resolution)
Will start a GhostScript process to render the entire PDF to a huge temporary PNM file in /tmp. Wand will then load that massive file into memory and hand out pages from it as you loop.
The C API to MagickCore lets you specify which page to load, so you could perhaps render a page at a time, but I don't know how to get the Python wand interface to do that.
You could try pyvips. It renders PDFs incrementally by making direct calls to libpoppler, so there are no processes being started and stopped and no temporary files.
Example:
#!/usr/bin/python3
import sys
import pyvips
def convert_pdf(filename, resolution=300):
    # n is number of pages to load, -1 means load all pages
    all_pages = pyvips.Image.new_from_file(filename, dpi=resolution, n=-1,
                                           access="sequential")
    # That'll be RGBA ... flatten out the alpha
    all_pages = all_pages.flatten(background=255)
    # the PDF is loaded as a very tall, thin image, with the pages joined
    # top-to-bottom ... we loop down the image cutting out each page
    n_pages = all_pages.get("n-pages")
    page_width = all_pages.width
    # integer division keeps the crop coordinates as whole pixels
    page_height = all_pages.height // n_pages
    for i in range(0, n_pages):
        page = all_pages.crop(0, i * page_height, page_width, page_height)
        print("writing {}.tif ..".format(i))
        page.write_to_file("{}.tif".format(i))
convert_pdf(sys.argv[1])
On this 2015 laptop with this huge PDF, I see:
$ /usr/bin/time -f %M:%e ../pages.py ~/pics/Audi_US\ R8_2017-2.pdf
writing 0.tif ..
writing 1.tif ..
....
writing 20.tif ..
720788:35.95
So 35s to render the entire document at 300dpi, and a peak memory use of 720MB.
