I am reading text from one PDF page by page, doing some operation on the extracted text in each iteration, and I want to create a new PDF that saves the edited text from each run.
I tried the following with PyPDF2:
import PyPDF2
output = PyPDF2.PdfFileWriter()
pdf = "pdfte.pdf"
Obj_pdfFile = open(pdf, 'rb')
pdfReader = PyPDF2.PdfFileReader(Obj_pdfFile, strict=False)
pages = pdfReader.numPages
for page in range(pages):
    pageObj = pdfReader.getPage(page)
    pdf_text = pageObj.extractText()
    upper = pdf_text.upper()
    #print(pdf_text)
    output.addPage(input.getPage(upper))  # I thought this would work, but no luck
I know I need to pass a page here, but basically I'm looking for how to save the edited text in a new PDF. I know I'm missing the code that actually writes the PDF, but that's exactly what I need help with; I've never worked with PDFs before.
Also, is there a better option for doing this?
PyPDF2 is great for handling PDF files as documents, but not as an editor. I wanted to do the same thing you tried, but only made it possible with reportlab, as many other answers here do. Note that here
output.addPage(input.getPage(upper))  # I thought this would work, but no luck
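upper is a string, but getPage() expects an integer page index, and addPage() expects a page object such as the one returned by
PyPDF2.PdfFileReader(pdffile).getPage(0)
(Also, input in your code is Python's built-in input() function, not a PDF reader.)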
Here is what worked for me on Python 2.7:
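import PyPDF2
from io import BytesIO  # in-memory file; works on Python 2.7 and 3 (the original used StringIO)
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A6  # choose your page size here
temp = BytesIO()
can = canvas.Canvas(temp, pagesize=A6)
can.drawString(10, 405, "Your string on this position")
can.save()
temp.seek(0)  # rewind so PyPDF2 reads the PDF reportlab just wrote
lector = PyPDF2.PdfFileReader(temp)
output.addPage(lector.getPage(0))  # output is your PyPDF2 PdfFileWriter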
Now output contains a page with your string on it; write it to disk with output.write() on a file opened in 'wb' mode to get the new PDF. Hope someone finds it useful.
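One thing to watch when you go this route: reportlab's drawString() draws a single line at a fixed position and, as far as I know, does not wrap text or interpret newlines, so a page's worth of extracted text has to be split up first. Below is a stdlib-only sketch of that layout step. The function name layout_text and its defaults (40 characters per line, 14 pt leading, starting at y=405 to match the A6 example above) are my own assumptions, not part of reportlab.

```python
import textwrap


def layout_text(text, max_chars=40, top=405.0, bottom=10.0, leading=14.0, x=10.0):
    """Split text into pages of (x, y, line) tuples meant for canvas.drawString().

    Hypothetical helper: max_chars is a rough guess at how many characters
    fit on one line; it is not a reportlab setting.
    """
    lines = []
    for paragraph in text.splitlines():
        # textwrap.wrap() returns [] for an empty line; keep it as a blank line
        lines.extend(textwrap.wrap(paragraph, max_chars) or [""])

    # Number of lines whose baseline stays at or above `bottom`
    per_page = int((top - bottom) // leading) + 1

    pages = []
    for start in range(0, len(lines), per_page):
        chunk = lines[start:start + per_page]
        # Each line sits `leading` points below the previous one
        pages.append([(x, top - i * leading, line) for i, line in enumerate(chunk)])
    return pages
```

For each page returned, you would call can.drawString(x, y, line) for every tuple and then can.showPage() to start a new page, with can.save() at the end. Since real glyph widths depend on the font, reportlab's stringWidth() would give a more precise wrap width than a fixed character count.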
I'm learning Python and I'm creating a project:
I'm making a program that takes a ready PDF file (with forms), creates an overlay PDF, and merges the overlay onto the template to produce a final PDF file (with filled forms).
I have some problems with encoding/decoding, probably related to binary mode. I've searched a lot on Stack Overflow and YouTube and spent about 5 hours, but I didn't find (or didn't understand) the solution.
Code:
import fpdf
from PyPDF2 import PdfFileWriter, PdfFileReader
# input to fill the form
name_surname = input("Type name and surname\n")
# filenames
overlay_pdf_file_name = 'overlay_PDF.pdf'
pdf_template_file_name = 'pdf_template.pdf'
result_pdf_file_name = 'final_PDF.pdf'
# fpdf operations
pdf = fpdf.FPDF(format='letter', unit='pt')
pdf.add_page()
pdf_style = ''
pdf.set_font("Arial", style=pdf_style, size=15)
# placing a cell and filling it with the text from the input
pdf.set_xy(70, 87)
pdf.cell(150, 15, txt=name_surname, border=0, ln=0)
pdf.output(overlay_pdf_file_name)
pdf.close()
# below copied solution from GitHub: https://gist.github.com/dwayneblew/79da32727358b502f6ec
# Take the PDF you created above and overlay it on your template PDF
# Open your template PDF
pdf_template = PdfFileReader(open(pdf_template_file_name, 'rb'))
# Get the first page from the template
template_page = pdf_template.getPage(0)
# Open your overlay PDF that was created earlier
overlay_pdf = PdfFileReader(open(overlay_pdf_file_name, 'rb'))
# Merge the overlay page onto the template page
template_page.mergePage(overlay_pdf.getPage(0))
# Write the result to a new PDF file
output_pdf = PdfFileWriter()
output_pdf.addPage(template_page)
output_pdf.write(open(result_pdf_file_name, "wb"))
This gives me a correctly filled file, but only if I use plain (ASCII) characters.
If I input Polish characters like 'ąćżł', I get an error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0105' in position 50: ordinal not in range(256)
I can't understand why (probably something to do with encoding or writing in binary mode), and I can't find a solution for this...
Does anyone know the solution?
Problem solved:
First, I had to install fpdf2 (because I had an old version of fpdf):
pip install fpdf2
Then I downloaded DejaVuSansCondensed.ttf (from: https://github.com/reingart/pyfpdf/releases/download/binary/fpdf_unicode_font_pack.zip), put it in the application folder, and added two lines of code:
pdf.add_font('DejaVu', fname='DejaVuSansCondensed.ttf')
pdf.set_font('DejaVu', size=14)
That solved the problem. Thank you everybody for helping me.
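Why the font change helps: the built-in core fonts such as Arial are limited to a single-byte Latin-1 encoding, which is exactly what the 'latin-1' codec error says, while a TrueType font registered with add_font() can carry Unicode glyphs. Below is a small stdlib-only sketch for spotting the offending characters before rendering; the function name unencodable_chars is my own, not an fpdf API.

```python
def unencodable_chars(text, encoding="latin-1"):
    """Return the distinct characters in text that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Record each problem character once, in order of appearance
            if ch not in bad:
                bad.append(ch)
    return bad
```

For example, if unencodable_chars(name_surname) is non-empty, the text needs a Unicode TTF font like DejaVu rather than a core font.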
import fitz
text_rectangle = fitz.Rect(450,20,550,120)
file_handle = fitz.open(input_file)
first_page = file_handle[0]
text = 'SAS Automation'
first_page.insertTextbox(text_rectangle, f'{text}')
file_handle.save(output_file)
The code above adds the text to the PDF mirrored, and I don't know why. I tried the insertText method and the morph argument of insertTextbox, but still found no solution.
Any help? Thanks in advance.
This usually happens when the PDF page has its own orientation or coordinate transformations. Calling page.clean_contents() standardizes the page and should be done before the first insertion of any item.
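One common cause, as I understand it: inserted text is appended to the page's existing content stream, so if that content leaves a transformation active (for example a "cm" operator with a negative scale inside a "q" save block that is never closed with "Q"), the new text inherits it and comes out mirrored. The stdlib sketch below is a rough heuristic for spotting unclosed "q" blocks; the function name is hypothetical, and it does not parse strings, comments, or inline images, nor does it catch a flip applied at the top level without "q".

```python
def open_graphics_states(content):
    """Count 'q' (save graphics state) operators left without a matching 'Q'.

    Rough heuristic: splits on whitespace instead of properly tokenizing
    the content stream, so treat the result as a hint only.
    """
    depth = 0
    for token in content.split():
        if token == "q":
            depth += 1
        elif token == "Q" and depth > 0:
            depth -= 1
    return depth
```

With PyMuPDF, the raw stream should be available via page.read_contents() (it returns bytes, so decode it, e.g. with 'latin-1') before passing it in. A non-zero result suggests the page needs cleaning before you insert anything.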
In my case, it seems to be an issue with the PDF file.
I fixed it by generating another copy of the PDF file:
I used Photoshop and saved it as PDF. You can also try "Print to PDF".
HTH
I tried updating my existing PDF file, but that wasn't the right way to overcome this problem. In the end, I created a new PDF file instead:
file_handle = fitz.open()
first_page = file_handle.new_page()  # newPage() in older PyMuPDF; using file_handle[0] of the original file caused the issue
The source file is here (the extraction code is sify). It's just one JPG. If you can't download it, please contact bbliao#126.com.
However, this image doesn't work with the fpdf package, and I don't know why. You can try it.
So I had to use img2pdf. With the following code I converted this image to PDF successfully:
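import os
import img2pdf
t = sorted(f for f in os.listdir() if f.lower().endswith(('.jpg', '.jpeg', '.png')))  # image files only, not bb.pdf itself
with open('bb.pdf', 'wb') as f:
    f.write(img2pdf.convert(t))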
However, when multiple images are combined into one PDF file, img2pdf just places the images one after another, so every page size equals the image size. For example, the first page of the PDF is 30 cm x 40 cm, the second is 20 cm x 10 cm, the third is 15 cm x 13 cm... That's ugly.
I want every page to be the same size (A4, for example) and every image to be the same size on its page, with one image per PDF page.
Glancing at the img2pdf documentation, it lets you set the paper size by passing a layout function to the convert call:
import img2pdf
letter = (img2pdf.in_to_pt(8.5), img2pdf.in_to_pt(11))
# for A4 instead: (img2pdf.mm_to_pt(210), img2pdf.mm_to_pt(297))
layout = img2pdf.get_layout_fun(letter)
with open('test.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg', 'image2.jpg'], layout_fun=layout))
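To see what "same page size, one image per page" works out to, here is a stdlib-only sketch of fitting an image inside a fixed page while keeping its aspect ratio and centring it. My understanding is that img2pdf's default fit mode behaves similarly once a page size is given, but this is an illustration of the arithmetic, not img2pdf's code; fit_into_page and the A4 constant are my own names.

```python
def fit_into_page(img_w, img_h, page_w, page_h):
    """Scale an image to fit inside a page (aspect ratio kept) and centre it.

    Returns (x, y, width, height) in the page's units, here PDF points.
    """
    # The tighter of the two ratios decides the scale, so nothing overflows
    scale = min(page_w / img_w, page_h / img_h)
    width, height = img_w * scale, img_h * scale
    # Split the leftover space evenly on both sides
    return ((page_w - width) / 2, (page_h - height) / 2, width, height)


# A4 in PDF points: 1 inch = 72 pt = 25.4 mm
A4 = (210 / 25.4 * 72, 297 / 25.4 * 72)
```

A tall 3000x4000 image fills the full A4 width with a margin at the top and bottom, while a wide image fills the width and gets larger vertical margins; either way every page stays A4.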
Let's say you have a PDF page with various complex elements inside.
The objective is to crop a region of the page (to extract only one of the elements) and then paste it into another PDF page.
Here is a simplified version of my code:
import PyPDF2

def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (in its first page)
        reader = PyPDF2.PdfFileReader(infp)
        page = reader.getPage(0)
        # Crop the tree. Coordinates below are only referential
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]
        # Create an empty document and add a single page containing only the cropped page
        writer = PyPDF2.PdfFileWriter()
        writer.addPage(page)
        # Write while infp is still open: PyPDF2 reads page data lazily
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

def insert_tree_into_page(tree_document, text_document):
    with open(text_document, 'rb') as text_fp, open(tree_document, 'rb') as tree_fp:
        # Load the first page of the document containing 'text text text text...'
        text_page = PyPDF2.PdfFileReader(text_fp).getPage(0)
        # Load the previously cropped tree (cropped using 'extract_tree')
        tree_page = PyPDF2.PdfFileReader(tree_fp).getPage(0)
        # Overlay the text page and the tree crop
        text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
        # Save the result into a new empty document
        output = PyPDF2.PdfFileWriter()
        output.addPage(text_page)
        with open('merged_document.pdf', 'wb') as outputStream:
            output.write(outputStream)

# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')
# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')
The extract_tree function seems to work: it generates a PDF file containing only the cropped region (in the example, the tree).
The problem is that when I paste the tree into the new page, the star and the house from the original image are pasted as well.
I tried something that actually worked: convert your first output (the PDF containing only the tree) to DOCX, then convert it back from DOCX to PDF before merging it with other PDF pages. That way only the tree gets merged.
May I ask how you implemented an interface that defines the bounds of the crop?
I had the exact same issue. In the end, the solution for me was to make a small edit to the PyPDF2 source code (from this pull request, which never made it into the master branch). Insert these lines into the _mergePage method of the PageObject class in pdf.py:
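# Clip page2's content to its trim box before merging:
# "re" appends the rectangle, "W" makes it the clipping path, "n" ends the path without painting it
page2Content = ContentStream(page2Content, self.pdf)
page2Content.operations.insert(0, [list(map(FloatObject, [page2.trimBox.getLowerLeft_x(), page2.trimBox.getLowerLeft_y(), page2.trimBox.getWidth(), page2.trimBox.getHeight()])), "re"])
page2Content.operations.insert(1, [[], "W"])
page2Content.operations.insert(2, [[], "n"])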
(See the pull request for exactly where to put them.) With that done, you can crop the section of a PDF you want and merge it with another page without issues. There's no need to save the cropped section into a separate PDF unless you want to.
from PyPDF2 import PdfFileReader, PdfFileWriter
with open('document1.pdf', 'rb') as tree_fp, open('document2.pdf', 'rb') as text_fp:
    tree_page = PdfFileReader(tree_fp).getPage(0)
    text_page = PdfFileReader(text_fp).getPage(0)
    # Crop the region to keep; with the patch above, only this area is merged
    tree_page.cropBox.lowerLeft = [100, 200]
    tree_page.cropBox.upperRight = [250, 300]
    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
    output = PdfFileWriter()
    output.addPage(text_page)
    with open('merged_document.pdf', 'wb') as out_fp:
        output.write(out_fp)
Maybe there's a better way of doing this that inserts that code without directly editing the source code. I'd be grateful if anyone finds a way to do it as this admittedly is a slightly dodgy hack.
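For anyone wondering what the patched lines actually do: they prepend "x y width height re W n" to the merged page's content, so everything it draws outside that rectangle is clipped away (a crop box alone only changes what a viewer displays, it does not remove content). Here is a stdlib-only sketch that builds the same operator sequence as plain content-stream text; clip_to_box is a hypothetical name, and the surrounding q/Q pair is my addition to keep the clip from affecting anything drawn afterwards.

```python
def clip_to_box(content, box):
    """Wrap PDF content-stream text so everything it draws is clipped to box.

    box is (llx, lly, urx, ury) in PDF points, e.g. a page's crop or trim box.
    're' appends a rectangle given as (x, y, width, height), 'W' makes it the
    clipping path, and 'n' ends the path without painting it.
    """
    llx, lly, urx, ury = box
    width, height = urx - llx, ury - lly
    prefix = "q {:g} {:g} {:g} {:g} re W n".format(llx, lly, width, height)
    # 'Q' restores the graphics state, which also discards the clip
    return "{}\n{}\nQ".format(prefix, content)
```

With the crop box from the example, (100, 200, 250, 300), the prefix becomes "q 100 200 150 100 re W n", which matches the lowerLeft x/y plus width/height that the patch computes.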
So I have been tasked with creating a PDF that lets the end user enter information into it and then print it or save it. The PDF I am trying to create is rendered from a PDF template that has fillable fields. The problem is that every time I use a Python library (PyPDF2, pdfrw, reportlab, etc.) to create this PDF, it gets flattened and the fields are no longer fillable after export. Is there anything out there that will accomplish this goal? I don't mind taking the flat template file and rendering an HTML form onto it, as long as it works. The PDF was made in Acrobat Pro, and I made sure to remove the password.
My Python version is 2.7.
If anyone has done this before, that information would be super helpful. Thanks!
I'm facing the same problem. I just found out that the form fields in output.pdf are not fillable in Okular or Firefox, but can still be filled when opened in Google Chrome or Chromium.
I used these lines:
from PyPDF2 import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas

# Draw the image onto a one-page "watermark" PDF
c = canvas.Canvas('watermark.pdf')
c.drawImage('testplot.png', 350, 550, width=150, height=150)
c.save()

output_file = PdfFileWriter()
with open('watermark.pdf', 'rb') as wm_fp, open('template.pdf', 'rb') as in_fp:
    watermark = PdfFileReader(wm_fp)
    input_file = PdfFileReader(in_fp)
    page_count = input_file.getNumPages()
    for page_number in range(page_count):
        print("Plotting png to {} of {}".format(page_number + 1, page_count))
        input_page = input_file.getPage(page_number)
        input_page.mergePage(watermark.getPage(0))
        output_file.addPage(input_page)
    # Write while the input files are still open (PyPDF2 reads page data lazily)
    with open('output.pdf', 'wb') as outputStream:
        output_file.write(outputStream)
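A plausible explanation for the viewer difference (my understanding, not verified against this exact file): fillable fields live in two places, as widget annotations on each page and in the /AcroForm dictionary of the document catalog. PdfFileWriter.addPage() copies pages together with their annotations but, as far as I know, not the template's /AcroForm entry, and viewers differ in whether they still let you fill such orphaned widgets. The stdlib-only sketch below is a rough check for comparing template.pdf and output.pdf; form_markers is a hypothetical name, and because it is a plain byte search it can miss keys stored inside compressed object streams.

```python
def form_markers(pdf_bytes):
    """Report whether raw PDF bytes mention the keys an interactive form uses.

    Heuristic only: a plain byte search, so keys inside compressed object
    streams are missed (false negatives are possible, false positives rare).
    """
    return {
        "acroform": b"/AcroForm" in pdf_bytes,  # document-level form dictionary
        "fields": b"/Fields" in pdf_bytes,      # the form's field list
        "widgets": b"/Widget" in pdf_bytes,     # per-page field annotations
    }
```

Reading each file with open(path, 'rb').read() and passing the bytes in, widgets present but "acroform" False in output.pdf (while true in template.pdf) would point to the missing form dictionary.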