I have a file similar to this.
What I want to do is replace the footer of the page with my own footer. What would be the best way to do this? Can I crop the footer part (fixed size from bottom) and merge my own created footer with every page? Or is there a library available that could automatically extract footer from the page? I don't have much experience in this regard. I have tried some libraries including pypdf2 and reportlab but I was not able to find any help regarding footer extraction.
Any help will be greatly appreciated.
This is a bit of hack solution, create an image with required footer text and then run this. Adjust the coordinates as required.
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.BytesIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawImage('yourFooterImage.png', 0, 2, 800, 45)
can.save()
# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Code taken from here: Add text to Existing PDF using Python
My output:
Related
My Situation
Hi, I'm relatively new to coding with Python, and I'm currently working on a "PDF-Writer" which should read in PDF files and then draw a certain String to it plus some other stuff. I'm doing this with canvas and PyPDF2 and it works fine.
My question is referring to this question:
(Add text to Existing PDF using Python)
Others already commented under the post but didn't post any answers there that's why I'm asking this question. I would have commented under the post, but sadly, I don't have the reputation to comment. But I think resolving the issue by asking a new question is also fine.
My problems
My first problem is that I can only write on the first page of the document. I would like to write on all pages of the document.
My second problem is that the first page of the document is also the last page for some reason.
Here my Code:
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.lib.units import cm
file = input("Give path to the file >> ")
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
# register font
pdfmetrics.registerFont(TTFont('Arial', 'Arial.ttf'))
can.setFont(psfontname= "Arial",size = 10)
# create a new string
can.drawString(93, 54.5, "2022-12-21")
# create a white rectangle
can.setFillColorRGB(254,254,254)
can.rect(92.5,65,2*cm,0.34*cm, fill=1,stroke=0)
# save the pdf
can.save()
# move to the beginning of the StringIO buffer
packet.seek(0)
# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open(file, "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.pages[0]
page.merge_page(new_pdf.pages[0])
output.append_pages_from_reader(existing_pdf)
output.add_page(page)
# finally, write "output" to a real file
output_stream = open("destination3.pdf", "wb")
output.write(output_stream)
output_stream.close()
print("PDF created!")
What I've tried
Well, I added output.append_pages_from_reader(existing_pdf) to the code which led to the problems. Now I'm searching after a properly working Code :)
Well I got the answer, here is an excerpt:
existing_pdf = PdfFileReader(open(filename2, "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing pages
for x in range(existing_pdf.getNumPages()):
page = existing_pdf.pages[x]
page.merge_page(new_pdf.pages[0])
output.add_page(page)
# finally, write "output" to a real file
output_stream = open(outputPath, "wb")
output.write(output_stream)
output_stream.close()
print("PDF created!")
This code does exactly what I was expecting in my question above.
I am trying to write some string to a PDF file at some position.
I found a way to do this and implemented it like this:
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
It throws me an error at can.save() line
The error :
File "/home/corleone/miniconda3/lib/python3.6/site-packages/reportlab/pdfgen/canvas.py", line 1237, in save
self._doc.SaveToFile(self._filename, self)
File "/home/corleone/miniconda3/lib/python3.6/site-packages/reportlab/pdfbase/pdfdoc.py", line 224, in SaveToFile
f.write(data)
TypeError: string argument expected, got 'bytes'
Have read up at a lot of places on the internet. Found the same method everywhere. Is it the wrong way to do. Am I missing something?
Figured out I just had to use BytesIO instead of StringsIO.
Also open() instead of file() because I'm using Python3.
Here is the working script:
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.BytesIO()
# Create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.setFont('Helvetica-Bold', 24)
can.drawString(10, 100, "Hello world")
can.showPage()
can.save()
# Move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# Read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# Add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# Finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
I have a function that gets a page from a PDF file via pyPdf2 and should convert the first page to a png (or jpg) with Pillow (PIL Fork)
from PyPDF2 import PdfFileWriter, PdfFileReader
import os
from PIL import Image
import io
# Open PDF Source #
app_path = os.path.dirname(__file__)
src_pdf= PdfFileReader(open(os.path.join(app_path, "../../../uploads/%s" % filename), "rb"))
# Get the first page of the PDF #
dst_pdf = PdfFileWriter()
dst_pdf.addPage(src_pdf.getPage(0))
# Create BytesIO #
pdf_bytes = io.BytesIO()
dst_pdf.write(pdf_bytes)
pdf_bytes.seek(0)
file_name = "../../../uploads/%s_p%s.png" % (name, pagenum)
img = Image.open(pdf_bytes)
img.save(file_name, 'PNG')
pdf_bytes.flush()
That results in an error:
OSError: cannot identify image file <_io.BytesIO object at 0x0000023440F3A8E0>
I found some threads with a similar issue, (PIL open() method not working with BytesIO) but I cannot see where I am wrong here, as I have pdf_bytes.seek(0) already added.
Any hints appreciated
Per document:
write(stream) Writes the collection of pages added to this object out
as a PDF file.
Parameters: stream – An object to write the file to. The object must
support the write method and the tell method, similar to a file
object.
So the object pdf_bytes contains a PDF file, not an image file.
The reason why there are codes like above work is: sometimes, the pdf file just contains a jpeg file as its content. If your pdf is just a normal pdf file, you can't just read the bytes and parse it as an image.
And refer to as a more robust implementation: https://stackoverflow.com/a/34116472/334999
[![enter image description here][1]][1]
import glob, sys, fitz
# To get better resolution
zoom_x = 2.0 # horizontal zoom
zoom_y = 2.0 # vertical zoom
mat = fitz.Matrix(zoom_x, zoom_y) # zoom factor 2 in each dimension
filename = "/xyz/abcd/1234.pdf" # name of pdf file you want to render
doc = fitz.open(filename)
for page in doc:
pix = page.get_pixmap(matrix=mat) # render page to an image
pix.save("/xyz/abcd/1234.png") # store image as a PNG
Credit
[Convert PDF to Image in Python Using PyMuPDF][2]
https://towardsdatascience.com/convert-pdf-to-image-in-python-using-pymupdf-9cc8f602525b
I need to add some extra text to an existing PDF using Python, what is the best way to go about this and what extra modules will I need to install.
Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.
Edit: pypdf and ReportLab look good but neither one will allow me to edit an existing PDF, are there any other options?
Example for [Python 2.7]:
from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Example for Python 3.x:
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.pages[0]
page.merge_page(new_pdf.pages[0])
output.add_page(page)
# finally, write "output" to a real file
output_stream = open("destination.pdf", "wb")
output.write(output_stream)
output_stream.close()
I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:
read your PDF using PdfFileReader(), we'll call this input
create a new pdf containing your text to add using ReportLab, save this as a string object
read the string object using PdfFileReader(), we'll call this text
create a new PDF object using PdfFileWriter(), we'll call this output
iterate through input and apply .mergePage(*text*.getPage(0)) for each page you want the text added to, then use output.addPage() to add the modified pages to a new document
This works well for simple text additions. See PyPDF's sample for watermarking a document.
Here is some code to answer the question below:
packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
<do something with canvas>
can.save()
packet.seek(0)
input = PdfFileReader(packet)
From here you can merge the pages of the input file with another document.
pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.
cpdf will do the job from the command-line. It isn't python, though (afaik):
cpdf -add-text "Line of text" input.pdf -o output .pdf
Leveraging David Dehghan's answer above, the following works in Python 2.7.13:
from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Don't use mergePage, It may not work for some pdfs
You should use mergeRotatedTranslatedPage
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen.canvas import Canvas
page_to_merge = 0 #Refers to the First page of PDF
xcoor = 250 #To be changed according to your pdf
ycoor = 650 #To be changed according to your pdf
input_pdf = PdfFileReader(open("Source.pdf", "rb"))
page_count = input_pdf.getNumPages()
inputpdf_page_to_be_merged = input_pdf.getPage(page_to_merge)
packet = io.BytesIO()
c = Canvas(packet,pagesize=(inputpdf_page_to_be_merged.mediaBox.getWidth(),inputpdf_page_to_be_merged.mediaBox.getHeight()))
c.drawString(xcoor,ycoor,"Hello World")
c.save()
packet.seek(0)
overlay_pdf = PdfFileReader(packet)
overlay = overlay_pdf.getPage(0)
output = PdfFileWriter()
for PAGE in range(page_count):
if PAGE == page_to_merge:
inputpdf_page_to_be_merged.mergeRotatedTranslatedPage(overlay,
inputpdf_page_to_be_merged.get('/Rotate') or 0,
overlay.mediaBox.getWidth()/2, overlay.mediaBox.getWidth()/2)
output.addPage(inputpdf_page_to_be_merged)
else:
Page_in_pdf = input_pdf.getPage(PAGE)
output.addPage(Page_in_pdf)
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
If you're on Windows, this might work:
PDF Creator Pilot
There's also a whitepaper of a PDF creation and editing framework in Python. It's a little dated, but maybe can give you some useful info:
Using Python as PDF Editing and Processing Framework
You may have better luck breaking the problem down into converting PDF into an editable format, writing your changes, then converting it back into PDF. I don't know of a library that lets you directly edit PDF but there are plenty of converters between DOC and PDF for example.
I need to add some extra text to an existing PDF using Python, what is the best way to go about this and what extra modules will I need to install.
Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.
Edit: pypdf and ReportLab look good but neither one will allow me to edit an existing PDF, are there any other options?
Example for [Python 2.7]:
from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Example for Python 3.x:
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.pages[0]
page.merge_page(new_pdf.pages[0])
output.add_page(page)
# finally, write "output" to a real file
output_stream = open("destination.pdf", "wb")
output.write(output_stream)
output_stream.close()
I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:
read your PDF using PdfFileReader(), we'll call this input
create a new pdf containing your text to add using ReportLab, save this as a string object
read the string object using PdfFileReader(), we'll call this text
create a new PDF object using PdfFileWriter(), we'll call this output
iterate through input and apply .mergePage(*text*.getPage(0)) for each page you want the text added to, then use output.addPage() to add the modified pages to a new document
This works well for simple text additions. See PyPDF's sample for watermarking a document.
Here is some code to answer the question below:
packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
<do something with canvas>
can.save()
packet.seek(0)
input = PdfFileReader(packet)
From here you can merge the pages of the input file with another document.
pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.
cpdf will do the job from the command-line. It isn't python, though (afaik):
cpdf -add-text "Line of text" input.pdf -o output .pdf
Leveraging David Dehghan's answer above, the following works in Python 2.7.13:
from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()
#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
Don't use mergePage, It may not work for some pdfs
You should use mergeRotatedTranslatedPage
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen.canvas import Canvas
page_to_merge = 0 #Refers to the First page of PDF
xcoor = 250 #To be changed according to your pdf
ycoor = 650 #To be changed according to your pdf
input_pdf = PdfFileReader(open("Source.pdf", "rb"))
page_count = input_pdf.getNumPages()
inputpdf_page_to_be_merged = input_pdf.getPage(page_to_merge)
packet = io.BytesIO()
c = Canvas(packet,pagesize=(inputpdf_page_to_be_merged.mediaBox.getWidth(),inputpdf_page_to_be_merged.mediaBox.getHeight()))
c.drawString(xcoor,ycoor,"Hello World")
c.save()
packet.seek(0)
overlay_pdf = PdfFileReader(packet)
overlay = overlay_pdf.getPage(0)
output = PdfFileWriter()
for PAGE in range(page_count):
if PAGE == page_to_merge:
inputpdf_page_to_be_merged.mergeRotatedTranslatedPage(overlay,
inputpdf_page_to_be_merged.get('/Rotate') or 0,
overlay.mediaBox.getWidth()/2, overlay.mediaBox.getWidth()/2)
output.addPage(inputpdf_page_to_be_merged)
else:
Page_in_pdf = input_pdf.getPage(PAGE)
output.addPage(Page_in_pdf)
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()
If you're on Windows, this might work:
PDF Creator Pilot
There's also a whitepaper of a PDF creation and editing framework in Python. It's a little dated, but maybe can give you some useful info:
Using Python as PDF Editing and Processing Framework
You may have better luck breaking the problem down into converting PDF into an editable format, writing your changes, then converting it back into PDF. I don't know of a library that lets you directly edit PDF but there are plenty of converters between DOC and PDF for example.