Convert all images into one PDF - Python

I would like to finish my script. I tried a lot to solve this, but as a beginner I failed.
I have a function that uses imageio to take images from a website. After that, I would like to resize all images to 63x88 and put all of them in one PDF.
import os
import imageio
import numpy as np

# Pick a file name that does not exist yet: name.png, name1.png, name2.png, ...
full_path = os.path.join(filePath1, name + ".png")
if os.path.exists(full_path):
    number = 1
    while True:
        full_path = os.path.join(filePath1, name + str(number) + ".png")
        if not os.path.exists(full_path):
            break
        number += 1
imageio.imwrite(full_path, im_padded.astype(np.uint8))
os.chmod(full_path, mode=0o777)
Thanks for any answers.
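The numbering loop in the question can be wrapped in a small reusable helper. A minimal sketch; `unique_png_path` is a hypothetical name, and the `name.png`, `name1.png`, `name2.png`, ... scheme mirrors the loop above:

```python
import os


def unique_png_path(folder, name):
    """Return folder/name.png, or folder/nameN.png with the smallest N >= 1
    that does not exist yet (same naming scheme as the loop above)."""
    path = os.path.join(folder, name + ".png")
    number = 1
    while os.path.exists(path):
        path = os.path.join(folder, name + str(number) + ".png")
        number += 1
    return path
```

The call site then reduces to `full_path = unique_png_path(filePath1, name)` before `imageio.imwrite(...)`.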

We (ImageIO) currently don't have a PDF reader/writer. There is a long-standing feature request for it, which hasn't been implemented yet because nobody has volunteered to contribute it.
Regarding the loading of images, we have an example for this in the docs:
import imageio as iio
from pathlib import Path

images = list()
for file in Path("path/to/folder").iterdir():
    im = iio.imread(file)
    images.append(im)
The caveat is that this particular example assumes that you want to read all images in a folder, and that there are only images in that folder. If either of these doesn't apply to you, you can easily customize the snippet.
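One way to customize the snippet, assuming you only want files with common image extensions and want them in a stable, sorted order. `image_files` and the extension set are hypothetical choices:

```python
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your files).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def image_files(folder):
    """Return the image files directly inside `folder`, sorted by name,
    skipping subfolders and files with other extensions."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

The loading loop then becomes `images = [iio.imread(p) for p in image_files("path/to/folder")]`.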
Regarding the resizing of images, you have several options, and I recommend scikit-image's resize function.
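A sketch of the resize step with scikit-image, assuming 63x88 means 63 pixels wide and 88 pixels tall and that the input is an 8-bit array like the ones imageio returns. Note that `skimage.transform.resize` takes the output shape as (rows, columns), i.e. (height, width). `resize_to_card` is a hypothetical helper name:

```python
from skimage.transform import resize
from skimage.util import img_as_ubyte


def resize_to_card(image, width=63, height=88):
    """Resize an image array to height x width pixels.

    resize() expects (rows, cols) = (height, width) and returns floats
    in [0, 1]; img_as_ubyte converts the result back to uint8.
    """
    out_shape = (height, width) + image.shape[2:]  # keep the channel axis, if any
    resized = resize(image, out_shape, anti_aliasing=True)
    return img_as_ubyte(resized)
```

If 63x88 is meant in millimetres rather than pixels, the pixel size depends on the resolution (DPI) you want, and the target shape should be scaled accordingly.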
To then get all the images into one PDF, you could have a look at matplotlib, which can generate a figure that you can save as a PDF file. The exact steps depend on the layout you want for the resulting PDF.
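For a simple layout with one image per page, matplotlib's `PdfPages` can collect several figures into a single PDF. A sketch, assuming each image is a NumPy array and that each page should be 63 x 88 mm (an assumption; adjust `page_mm` to your layout). `images_to_pdf` is a hypothetical name:

```python
from matplotlib.backends.backend_pdf import PdfPages
from matplotlib.figure import Figure

MM_PER_INCH = 25.4


def images_to_pdf(images, pdf_path, page_mm=(63, 88)):
    """Write each image (a NumPy array) on its own page of one PDF file.

    page_mm is the page size in millimetres (width, height); matplotlib
    expects figure sizes in inches, hence the conversion.
    Returns the number of pages written.
    """
    figsize = (page_mm[0] / MM_PER_INCH, page_mm[1] / MM_PER_INCH)
    with PdfPages(pdf_path) as pdf:
        for image in images:
            fig = Figure(figsize=figsize)
            ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole page
            ax.imshow(image)
            ax.axis("off")
            pdf.savefig(fig)
        return pdf.get_pagecount()
```

A different technique, if matplotlib is not needed: Pillow can write a multi-page PDF directly with `first.save("out.pdf", save_all=True, append_images=rest)`, where `first` and `rest` are PIL images.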

Related

How do I convert a multiple paged PDF into a PNG image per pdf page in Python

Amateur Python developer here. I'm working on a project where I take multiple PDFs, each with a varying number of pages (1-20ish), and turn them into PNG files to use with pytesseract later.
I'm using pdf2image and poppler on a test PDF that has 3 pages. The problem is that it only converts the last page of the PDF to a PNG. I thought, "maybe the program is making the same file name for each PDF page, and with each iteration it overwrites the file until only the last page remains." So I tried to write the program so that it changes the file name with each iteration. Here's the code:
from pdf2image import convert_from_path

images = convert_from_path('/Users/jacobpatty/vscode_projects/badger_colors/test_ai/10254_Craigs_Plumbing.pdf', 200)
file_name = 'ping_from_ai_test.png'
file_number = 0
for image in images:
    file_number =+ 1
    file_name = 'ping_from_ai_test' + str(file_number) + '.png'
    image.save(file_name)
This failed in 2 ways: it only made 2 PNG files ('ping_from_ai_test.png' and 'ping_from_ai_test1.png') instead of 3, and when I opened them, they both showed the last PDF page again. I don't know what to do at this point. Any ideas?
As far as I can see, your code only outputs a single file. The problem is a typo in your code.
The line
file_number =+ 1
is actually an assignment:
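file_number = (+1)
so file_number is 1 on every iteration, every page is saved as ping_from_ai_test1.png, and each save overwrites the previous one. That is why only the last page survives.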
This should probably be
file_number += 1
Alternatively, try this instead of looping with for image in images:
for n in range(len(images)):
    images[n].save('test' + str(n) + '.png')
Does that work?
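A more idiomatic variant uses `enumerate`, which yields the index and the page together, so there is no hand-maintained counter to get wrong. A sketch, where `save_pages` and the `page_` prefix are hypothetical names; it works with the PIL images that `convert_from_path` returns:

```python
def save_pages(images, prefix="page_"):
    """Save each image as prefix1.png, prefix2.png, ... and return the file names."""
    names = []
    for number, image in enumerate(images, start=1):
        file_name = f"{prefix}{number}.png"
        image.save(file_name)
        names.append(file_name)
    return names
```

Usage: `save_pages(convert_from_path("document.pdf", 200), prefix="ping_from_ai_test")`.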

pdf2image conversion of multi page PDFs to images returns the last page on all images

When I use the pdf2image Python package and pass a multi-page PDF to convert_from_bytes() or convert_from_path(), the output list does contain multiple images, but all of them show the last PDF page (whereas I expected each image to represent one of the PDF pages).
The output looks something like this:
Any idea why this would happen? I can't find any solution to this online. I've found some vague suggestions that the use_cropbox argument might help, but changing it has no effect.
def convert(self, opened_file):
    # Read PDF and convert pages to PPM image objects
    try:
        _ppm_pages = self.pdf2image.convert_from_bytes(
            opened_file.read(),
            grayscale=True
        )
    except Exception as e:
        print(f"[CreateJPEG] Could not convert PDF pages to JPEG image due to error: \n '{e}'")
        return
    # Do stuff with _ppm_pages
    for img in _ppm_pages:
        img.show()  # ...all images in that list are of the last page
Sometimes the output is instead an empty 1x1 image, for which I also haven't found a reason. So if you have any idea what that is about, please let me know!
Thanks in advance,
Simon
EDIT: Added code.
EDIT: When I try this in a separate notebook, it actually works fine.
I've removed a few detours from my original code, and now it works. I'm still not sure what the underlying cause was, though...
All the same, thanks for your help, everyone!
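To check whether the returned pages really contain the same pixels (rather than just being displayed that way), one could hash each page's pixel data and look at its size, which also flags the empty 1x1 case. This is a diagnostic sketch, not a fix; `describe_pages` and `all_pages_identical` are hypothetical helpers that work on any list of PIL images:

```python
import hashlib


def describe_pages(pages):
    """Return (size, short hash of the pixel data) for each PIL image.

    Equal hashes mean the pages really contain the same pixels;
    a size of (1, 1) flags the 'empty 1x1 image' case.
    """
    return [
        (page.size, hashlib.sha256(page.tobytes()).hexdigest()[:12])
        for page in pages
    ]


def all_pages_identical(pages):
    """True if every page has exactly the same size and pixel data."""
    return len(set(describe_pages(pages))) <= 1
```

For example, `print(describe_pages(_ppm_pages))` right after the conversion shows whether the list itself is wrong or only the later processing.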
I'm using this right now:
from pdf2image import convert_from_path
imgSet = convert_from_path(pathToPDF, 500)
That gives me a list of images in imgSet, one per page.
I guess you have to do something like this, as shown in the package's unit tests:
with open("./tests/test.pdf", "rb") as pdf_file:
    images_from_bytes = convert_from_bytes(pdf_file.read(), fmt="jpg")
    self.assertTrue(images_from_bytes[0].format == "JPEG")

Convert image format in series with Python

Sorry for the trivial question, but I'm new to Python.
I'm trying to convert a series of JPEG images to BMP format and resize them.
I managed to do it for a single image, but now I can't automate the process so that the conversion runs over the whole series.
This is my script:
from PIL import Image
img = Image.open("C:/Users/***/Documents/images/1.jpg")
new_img = img.resize((320,240))
new_img.save("C:/Users/***/Documents/immages_bmp/1.bmp")
The images are numbered sequentially from 1 to 10000.
Does anyone know how to help me implement a for loop to automate the process?
Thank you so much for your help
Something like:
from PIL import Image
from glob import glob
import os

myDir = '/Users/me/pictures'
pic_list = glob(os.path.join(myDir, '*.jpg'))
for pic in pic_list:
    # resize, and build the new .bmp name from the old file name
    img = Image.open(pic)
    new_img = img.resize((320, 240))
    newName = os.path.splitext(pic)[0] + '.bmp'  # safer than str.replace on the whole path
    new_img.save(newName)
This should catch all the images regardless of their naming convention, and it lets you edit the list of names before you resize them (or not).
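Since the question's files are numbered 1 to 10000 and the output should go to a separate folder, a loop over the numbers also works and processes the files in numeric order (glob returns them in arbitrary order, and a plain string sort would put 10.jpg before 2.jpg). A sketch, where `convert_series` is a hypothetical helper and missing numbers are simply skipped:

```python
import os

from PIL import Image


def convert_series(src_dir, dst_dir, last=10000, size=(320, 240)):
    """Resize src_dir/1.jpg ... src_dir/<last>.jpg and save them as BMP in dst_dir.

    Returns the number of files converted.
    """
    os.makedirs(dst_dir, exist_ok=True)
    converted = 0
    for n in range(1, last + 1):
        src = os.path.join(src_dir, f"{n}.jpg")
        if not os.path.exists(src):
            continue  # tolerate gaps in the numbering
        with Image.open(src) as img:
            img.resize(size).save(os.path.join(dst_dir, f"{n}.bmp"))
        converted += 1
    return converted
```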

Multi-Page tiff resizing python

First post here, although I have already spent days searching here for various queries. Python 3.6, Pillow and TIFF processing.
I would like to automate one of our manual tasks: resizing some very large images to match A4 format. We work with TIFF files that sometimes (often) contain more than one page. So I wrote:
from PIL import Image, ImageSequence
...
def image_resize(path, dcinput, file):
    dcfake = read_config(configlocation)["resize"]["dcfake"]
    try:
        imagehandler = Image.open(path+file)
        # Image.LANCZOS is the same filter as Image.ANTIALIAS, which Pillow 10 removed
        imagehandler = imagehandler.resize((2496, 3495), Image.LANCZOS)
        imagehandler.save(dcinput+file, optimize=True, quality=95)
    except Exception:
        ...  # error handling not shown in the original
But the (not so) obvious problem is that only the first page of the TIFF is converted. This is not what I expected from this library. After some digging, I found a way to iterate over the pages of a TIFF and save each one as a separate file:
imagehandler = Image.open(path+file)
for i, page in enumerate(ImageSequence.Iterator(imagehandler)):
    page = page.resize((2496, 3495), Image.LANCZOS)  # Image.ANTIALIAS before Pillow 10
    page.save(dcinput + "proces%i.tif" % i, optimize=True, quality=95, save_all=True)
Now I could use ImageMagick or some internal commands to merge the pages into one file, but that is not what I want, as it complicates the code.
My question: is there a unicorn that can help me with either:
1) resizing all pages of a given multi-page TIFF on the fly, or
2) building a TIFF from a few TIFFs?
I'd like to stick to Python modules only.
Thx.
Take a look at this example. It makes every page of a TIFF file four times smaller (by halving the width and height of every page):
from PIL import Image
from PIL import ImageSequence
from PIL import TiffImagePlugin

INFILE = 'multipage_tif_example.tif'
OUTFILE = 'multipage_tif_resized.tif'

print('Resizing TIF pages')
pages = []
imagehandler = Image.open(INFILE)
for page in ImageSequence.Iterator(imagehandler):
    # resize() needs integer sizes, so use floor division
    new_size = (page.size[0] // 2, page.size[1] // 2)
    page = page.resize(new_size, Image.LANCZOS)  # Image.ANTIALIAS before Pillow 10
    pages.append(page)

print('Writing multipage TIF')
# new=True truncates OUTFILE instead of appending to a file left by a previous run
with TiffImagePlugin.AppendingTiffWriter(OUTFILE, True) as tf:
    for page in pages:
        page.save(tf)
        tf.newFrame()
It is supposed to work with Pillow from the late 3.4.x releases onward (it works with version 5.1.0 on my machine).
Resources:
AppendingTiffWriter discussed here.
Sample TIF files can be downloaded here.
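Pillow can also write a multi-page TIFF in a single call with `save_all=True` and `append_images`, which avoids `AppendingTiffWriter` entirely. A sketch, assuming a reasonably recent Pillow with multi-frame TIFF saving; `resize_tiff_pages` and its `scale` parameter are hypothetical:

```python
from PIL import Image, ImageSequence


def resize_tiff_pages(infile, outfile, scale=0.5):
    """Resize every page of a multi-page TIFF and write them all to one new TIFF.

    Returns the number of pages written.
    """
    with Image.open(infile) as im:
        pages = [
            page.resize(
                (max(1, int(page.width * scale)), max(1, int(page.height * scale))),
                Image.LANCZOS,
            )
            for page in ImageSequence.Iterator(im)
        ]
    # The first page is saved normally; the others are appended as extra frames.
    pages[0].save(outfile, save_all=True, append_images=pages[1:])
    return len(pages)
```

The same `save_all`/`append_images` call also covers building one TIFF from the pages of several TIFFs: collect all the pages into one list first.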

Chunking a directory and applying image blending using PIL. Can't save images correctly with Python

Sorry for the title... The goal of this script is to take a folder full of images that are listed in a particular order, chunk the images into groups of 3, and blend each group of 3 together using PIL. The blending itself works as I want: when I call imgbld2.show(), it creates 4 images in a temporary folder.
My problem is that when I save the images using imgbld2.save(), it only saves the first created image into 4 image files, instead of 4 created images into 4 separate files.
I can work around this by pointing another script at the temp folder and retrieving the images with glob.glob(). But that would require running the script on a freshly restarted computer, which seems too messy for my taste.
Is there a better way to achieve what I'm trying to do? Or is there a saving method that I'm missing?
Any help would be appreciated, here is the code:
from PIL import Image
import os.path
import glob

#Lists Directory
Dir = os.listdir('/path/to/Directory/of/Images')
#Glob all jpgs
im = glob.glob( '/path/to/Directory/of/Images/*.jpg')
#sort jpg according to name
imsort = sorted(im)

def chunker(imsort,size = 3):
    for i in range(0, len(imsort), size):
        yield imsort[i:i + size]

print('what does it look like?')
for j in chunker(imsort):
    print(j)
    img1 = Image.open(j[0])
    img2 = Image.open(j[1])
    img3 = Image.open(j[2])
    imgbld1 = Image.blend(img1, img2, 0.3)
    imgbld2 = Image.blend(imgbld1, img3, 0.3)
    imgbld2.show()
    imgbld2.save('path/to/new/folder/' + 'blended' , 'JPEG')
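In the loop above, save() is called with the same path on every iteration, so each chunk writes over the previous result. A minimal sketch of one way to give every chunk its own file, assuming the images in a chunk share the same size (which Image.blend requires) and that an incomplete final chunk should be skipped; `blend_chunks` and the `blended_N.jpg` pattern are hypothetical:

```python
import glob
import os

from PIL import Image


def chunker(items, size=3):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def blend_chunks(src_dir, dst_dir, alpha=0.3):
    """Blend each group of three JPEGs and save blended_1.jpg, blended_2.jpg, ..."""
    os.makedirs(dst_dir, exist_ok=True)
    paths = sorted(glob.glob(os.path.join(src_dir, "*.jpg")))
    saved = []
    for number, chunk in enumerate(chunker(paths), start=1):
        if len(chunk) < 3:
            break  # the last chunk is incomplete when len(paths) % 3 != 0
        # convert("RGB") loads the pixels and gives all three images the same mode
        img1, img2, img3 = (Image.open(p).convert("RGB") for p in chunk)
        blended = Image.blend(Image.blend(img1, img2, alpha), img3, alpha)
        out_path = os.path.join(dst_dir, f"blended_{number}.jpg")
        blended.save(out_path, "JPEG")
        saved.append(out_path)
    return saved
```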
