I have lots of PDF files, each with multiple embedded images that need to be rotated.
I know I can extract the images, rotate them and then reconstruct the PDF, but is there any way to add a PDF command so that the images rotate in place?
Ideally, I'd like a Python PDF library that lets me do that.
Edit:
One important detail: each page can have multiple images, and each image needs to be rotated by a different angle. Think of the task as straightening the images in a PDF.
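Under the hood, a PDF paints an image with the `Do` operator, which maps the image's unit square through the current transformation matrix. The `cm` operator concatenates a matrix onto that, and `q`/`Q` save and restore the graphics state. So rotating one image in place, without touching anything else on the page, amounts to wrapping its drawing operators in a rotation about the image's centre. The sketch below only computes that wrapper. The helper names are hypothetical, and it assumes you already know each image's bounding box and angle. You would still need to locate each image's operators in the page content stream and splice the result back in, for instance with a library that parses content streams, such as pypdf's `ContentStream`.

```python
import math


def rotation_about_point(angle_deg, cx, cy):
    """PDF matrix [a b c d e f] rotating counterclockwise by angle_deg about (cx, cy).

    PDF maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f).
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Equivalent to: translate (cx, cy) to the origin, rotate, translate back.
    e = cx - cos_t * cx + sin_t * cy
    f = cy - sin_t * cx - cos_t * cy
    return [cos_t, sin_t, -sin_t, cos_t, e, f]


def apply_matrix(matrix, x, y):
    """Map a point through a PDF matrix (handy for checking the result)."""
    a, b, c, d, e, f = matrix
    return a * x + c * y + e, b * x + d * y + f


def wrap_image_ops(image_ops, angle_deg, bbox):
    """Wrap an image's drawing operators in an extra rotation about its centre.

    image_ops: the original operators, e.g. "q 200 0 0 100 50 50 cm /Im0 Do Q"
    bbox: (x0, y0, x1, y1) of the drawn image, in the same coordinate system.
    """
    x0, y0, x1, y1 = bbox
    matrix = rotation_about_point(angle_deg, (x0 + x1) / 2, (y0 + y1) / 2)
    numbers = " ".join(f"{v:.4f}" for v in matrix)
    # q ... Q keeps the extra rotation from leaking into later drawing operations.
    return f"q {numbers} cm {image_ops} Q"
```

For example, `wrap_image_ops("q 200 0 0 100 50 50 cm /Im0 Do Q", -3.5, (50, 50, 250, 150))` would straighten an image that is tilted 3.5 degrees counterclockwise. Because each image gets its own wrapper, every image can use a different angle.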
Here is one way to do it with PyPDF2:
import PyPDF2

# Legacy PyPDF2 (< 3.0) API. rotateClockwise() only accepts multiples of 90
# and rotates the whole page, not individual images on it.
with open('original.pdf', 'rb') as pdf_in:
    pdf_reader = PyPDF2.PdfFileReader(pdf_in)
    pdf_writer = PyPDF2.PdfFileWriter()
    for pagenum in range(pdf_reader.numPages):
        page = pdf_reader.getPage(pagenum)
        page.rotateClockwise(180)  # angle in degrees
        pdf_writer.addPage(page)
    with open('rotated.pdf', 'wb') as pdf_out:
        pdf_writer.write(pdf_out)
Hope this solves your problem.
Related
I have 2 PDF files, each containing only one page. Both pages are the same size and have the same orientation (landscape). I tried to use reportlab to merge these 2 pages and save the result to a new PDF file, but the result is confusing: every time I merge them, one of the pages is somehow rotated by 90 degrees.
Here is the code. Does anybody know what I've done wrong?
from PyPDF2 import PdfFileReader, PdfFileWriter

with open("base.pdf", "rb") as base_file, open("hello.pdf", "rb") as hello_file:
    base_pdf = PdfFileReader(base_file)
    hello_pdf = PdfFileReader(hello_file)
    new_pdf = PdfFileWriter()
    base_page = base_pdf.getPage(0)
    hello_page = hello_pdf.getPage(0)
    base_page.mergePage(hello_page)  # overlay hello_page on top of base_page
    new_pdf.addPage(base_page)
    with open("merged.pdf", "wb") as output_stream:
        new_pdf.write(output_stream)
Each page in a PDF file can be rotated on the fly by 90, 180 or 270 degrees when a PDF viewer loads it.
One of your PDFs has pages that are genuinely landscape. The other has portrait pages with /Rotate set to 90 or 270, so they only look landscape when displayed.
The software you are using, deliberately or accidentally, does not take this difference into account when it overlays one page on the other.
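You can see the difference by comparing the stored page box with the /Rotate value: a portrait /MediaBox with /Rotate 90 or 270 is displayed as landscape. Below is a minimal sketch of that check. `displayed_size` is a hypothetical helper, and the width, height and rotation would come from your PDF library. In pypdf, for example, they are `page.mediabox.width`, `page.mediabox.height` and `page.rotation`.

```python
def displayed_size(width, height, rotate):
    """Return (width, height) as a viewer shows the page, given the stored
    page-box size and the page's /Rotate value (a multiple of 90)."""
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    # A quarter turn (90 or 270 degrees, or a negative equivalent) swaps the sides.
    if rotate % 180 == 90:
        return height, width
    return width, height
```

Two pages that both look like landscape Letter, `displayed_size(792, 612, 0)` and `displayed_size(612, 792, 90)`, report the same displayed size. The second page's content is still drawn in portrait coordinates, though, and that is what a naive merge overlays.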
You can use cpdf to regularize the rotated file prior to processing with your library. This will remove the soft rotation, and counter-rotate the page dimensions and content to compensate:
cpdf -upright in.pdf -o out.pdf
If you look into the documentation of the PDF library you are already using, you might find a way to do it there too.
I would like to finish my script. I tried a lot to solve this, but as a beginner I failed.
I have a function that uses imageio to take images from a website. After that, I would like to resize all the images to 63x88 and put them all in one PDF.
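import os

import imageio
import numpy as np

# filePath1, name and im_padded come from the rest of my script.
# Pick a file name that doesn't overwrite an existing image:
# name.png, then name1.png, name2.png, ...
full_path = os.path.join(filePath1, name + ".png")
if os.path.exists(full_path):
    number = 1
    while True:
        full_path = os.path.join(filePath1, name + str(number) + ".png")
        if not os.path.exists(full_path):
            break
        number += 1
imageio.imwrite(full_path, im_padded.astype(np.uint8))
os.chmod(full_path, mode=0o777)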
Thanks in advance for any answers.
We (ImageIO) currently don't have a PDF reader/writer. There is a long-standing feature request for it, which hasn't been implemented yet because nobody has volunteered to contribute it.
Regarding the loading of images, we have an example for this in the docs:
import imageio as iio
from pathlib import Path

images = list()
for file in Path("path/to/folder").iterdir():
    im = iio.imread(file)
    images.append(im)
The caveat is that this example assumes you want to read all the images in a folder, and that the folder contains only images. If either assumption doesn't apply to you, you can easily adapt the snippet.
For resizing the images you have several options; I recommend scikit-image's resize function.
To then get all the images into a PDF, you could look at matplotlib, which can generate a figure that you can save as a PDF file. The exact steps depend on the layout you want for the resulting PDF.
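If you'd rather not go through matplotlib, Pillow (a different library from the ones mentioned above) can resize images and write a multi-page PDF directly, one image per page, via `Image.save(..., save_all=True, append_images=[...])`. Below is a minimal sketch. It assumes "63x88" means pixels and that the images are already loaded as arrays, like the `images` list above. `images_to_pdf` is a hypothetical helper, and LANCZOS resampling is just one reasonable choice.

```python
import numpy as np
from PIL import Image


def images_to_pdf(arrays, out_path, size=(63, 88)):
    """Resize each image array to `size` (width, height) in pixels and save
    them all into one PDF, one image per page."""
    pages = []
    for array in arrays:
        image = Image.fromarray(np.asarray(array, dtype=np.uint8))
        # Convert to RGB so every page uses a mode Pillow's PDF writer accepts.
        image = image.convert("RGB").resize(size, Image.LANCZOS)
        pages.append(image)
    if not pages:
        raise ValueError("no images to save")
    first, rest = pages[0], pages[1:]
    first.save(out_path, save_all=True, append_images=rest)
```

With the loading loop above, `images_to_pdf(images, "cards.pdf")` would give one 63x88 page per image.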
The source file is here. The fetch code is sify. It's just one JPG. If you can't download it, please contact bbliao#126.com.
However, this image doesn't work with the fpdf package, and I don't know why. You can try it.
So I have to use img2pdf. With the following code I converted this image to PDF successfully:
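import os
import img2pdf

t = sorted(f for f in os.listdir() if f.lower().endswith(('.jpg', '.jpeg', '.png')))  # image files only
with open('bb.pdf', 'wb') as f:
    f.write(img2pdf.convert(t))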
However, when multiple images are combined into one PDF file, img2pdf just concatenates them one after another, so every page is the size of its image. For example, the first page of the PDF is 30 cm x 40 cm, the second is 20 cm x 10 cm, the third is 15 x 13... That's ugly.
I want the same page size (A4, for example) and the same image size on every page of the PDF, with one image per page.
Glancing at the img2pdf documentation, you can set the paper size by passing a layout function to the convert call:
import img2pdf

# US Letter page size, converted from inches to PDF points
letter = (img2pdf.in_to_pt(8.5), img2pdf.in_to_pt(11))
layout = img2pdf.get_layout_fun(letter)
with open('test.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg', 'image2.jpg'], layout_fun=layout))
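For the "same page size for every image" part, the arithmetic a fit-to-page layout has to do is simple. Pick one page size (A4 is 210 mm x 297 mm, and PDF measures in points, 72 per inch), then scale each image by the largest factor that keeps it inside the page, and centre it. Below is a minimal sketch of that computation. `fit_on_page` is a hypothetical helper, not img2pdf's own code. With img2pdf itself, passing an A4 tuple such as `(img2pdf.mm_to_pt(210), img2pdf.mm_to_pt(297))` to `get_layout_fun` instead of `letter` should give you A4 pages.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def mm_to_pt(mm):
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * PT_PER_INCH


A4 = (mm_to_pt(210), mm_to_pt(297))  # roughly 595 x 842 pt


def fit_on_page(img_w, img_h, page=A4, margin=0.0):
    """Scale an img_w x img_h image to fit inside the page minus a margin,
    keeping its aspect ratio, and centre it.

    Returns (x, y, width, height) of the placed image, in the page's units.
    """
    page_w, page_h = page
    box_w, box_h = page_w - 2 * margin, page_h - 2 * margin
    scale = min(box_w / img_w, box_h / img_h)
    w, h = img_w * scale, img_h * scale
    return (page_w - w) / 2, (page_h - h) / 2, w, h
```

If every image should also end up the same size on the page, pass a fixed box (for example `page=(400, 300)`) instead of the whole page and place the result on an A4 page yourself.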
Let's say you have a PDF page with various complex elements on it.
The objective is to crop a region of the page (to extract only one of the elements) and then paste it into another PDF page.
Here is a simplified version of my code:
from PyPDF2 import PdfFileReader, PdfFileWriter


def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (on its first page)
        reader = PdfFileReader(infp)
        page = reader.getPage(0)
        # Crop the tree. The coordinates below are only illustrative
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]
        # Create an empty document and add a single page containing only the cropped page
        writer = PdfFileWriter()
        writer.addPage(page)
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)


def insert_tree_into_page(tree_document, text_document):
    with open(text_document, 'rb') as text_fp, open(tree_document, 'rb') as tree_fp:
        # Load the first page of the document containing 'text text text text...'
        text_page = PdfFileReader(text_fp).getPage(0)
        # Load the previously cropped tree (cropped using 'extract_tree')
        tree_page = PdfFileReader(tree_fp).getPage(0)
        # Overlay the text page and the tree crop
        text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
        # Save the result into a new empty document
        output = PdfFileWriter()
        output.addPage(text_page)
        with open('merged_document.pdf', 'wb') as output_stream:
            output.write(output_stream)


# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')
# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')
The extract_tree method seems to work: it generates a PDF file containing only the cropped region (in the example, the tree).
The problem is that when I try to paste the tree into the new page, the star and the house from the original image are pasted as well.
I tried something that actually worked: convert your first output (the PDF containing only the tree) to DOCX, then convert it back from DOCX to PDF before merging it with the other PDF pages. It works (only the tree is merged).
May I ask how you implemented an interface that defines the bounds of the crop?
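I had exactly the same issue. The crop box only controls which part of the page a viewer displays. The rest of the content is still in the page's content stream, and the merge copies all of it. In the end, the solution for me was to make a small edit to the PyPDF2 source code (from this pull request, which never made it into the master branch). What you need to do is insert these lines into the _mergePage method of the PageObject class in the file pdf.py: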
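# Clip page2's content to its trimBox before merging: "x y w h re" adds a
# rectangle, "W" makes it the clipping path and "n" ends the path without painting it.
page2Content = ContentStream(page2Content, self.pdf)
page2Content.operations.insert(0, [list(map(FloatObject, [
    page2.trimBox.getLowerLeft_x(), page2.trimBox.getLowerLeft_y(),
    page2.trimBox.getWidth(), page2.trimBox.getHeight()])), "re"])
page2Content.operations.insert(1, [[], "W"])
page2Content.operations.insert(2, [[], "n"])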
(See the pull request for exactly where to put them.) With that done, you can crop the section of a PDF you want and merge it with another page without issues. There's no need to save the cropped section into a separate PDF unless you want to.
from PyPDF2 import PdfFileReader, PdfFileWriter

with open('document1.pdf', 'rb') as tree_fp, open('document2.pdf', 'rb') as text_fp:
    tree_page = PdfFileReader(tree_fp).getPage(0)
    text_page = PdfFileReader(text_fp).getPage(0)
    # Crop the region to keep; with the patch above, only this region is merged
    tree_page.cropBox.lowerLeft = [100, 200]
    tree_page.cropBox.upperRight = [250, 300]
    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
    output = PdfFileWriter()
    output.addPage(text_page)
    with open('merged_document.pdf', 'wb') as out_fp:
        output.write(out_fp)
Maybe there's a better way to insert that code without editing the source directly. I'd be grateful if anyone finds one, as this is admittedly a slightly dodgy hack.
I am using Python to crop PDF pages.
Everything works fine, but how do I change the page size (width)?
This is my crop code:
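from PyPDF2 import PdfFileReader, PdfFileWriter

input_pdf = PdfFileReader(open('my.pdf', 'rb'))
output = PdfFileWriter()
p = input_pdf.getPage(1)  # second page (page indices start at 0)
(w, h) = p.mediaBox.upperRight
p.mediaBox.upperRight = (w / 4, h)  # keep only the left quarter of the page
output.addPage(p)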
When I crop pages, I need to resize them as well. How can I do this?
This answer is long overdue, and maybe older versions of PyPDF2 didn't have this functionality, but it's actually quite simple with version 1.26.0:
import PyPDF2

pdf = "YOUR PDF FILE PATH.pdf"
pdf = PyPDF2.PdfFileReader(pdf)
page0 = pdf.getPage(0)
page0.scaleBy(0.5)  # float representing scale factor - this happens in-place
writer = PyPDF2.PdfFileWriter()  # create a writer to save the updated results
writer.addPage(page0)
with open("YOUR OUTPUT PDF FILE PATH.pdf", "wb+") as f:
    writer.write(f)
Do you want to scale the page after you crop it? You can use p.scale(factor_x, factor_y) to do that.
You can also apply scaling, rotation or translation directly in the merge function call, by using the functions:
mergePage()
mergeRotatedPage()
mergeRotatedScaledPage()
mergeRotatedScaledTranslatedPage()
mergeScaledPage()
mergeScaledTranslatedPage()
mergeTransformedPage()
mergeTranslatedPage()
Or use addTransformation() on a page object.