Create a pdf in python and keep the coordinates of elements

I need to generate an examination template for an online school website, and I need to know the coordinates of each answer box in order to crop them later.
Is it possible to generate a PDF and get the coordinates of each element inside it? (For example, inserting a black square as an image in the PDF and getting its coordinates?)
I found many libraries for creating PDFs, such as pyPdf and PyPDF2, but I didn't find a way to get coordinates.
Thank you for your suggestions and advice.

You could use reportlab. It would allow you to access coordinates by specifying them yourself:
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.lib.pagesizes import letter
import io
buf = io.BytesIO()
doc = SimpleDocTemplate(buf, rightMargin=inch/2, leftMargin=inch/2, topMargin=inch/2, bottomMargin=inch/2, pagesize=letter)
styles = getSampleStyleSheet()
answers = []
answers.append(Paragraph('Data for Answer box', styles['Normal']))
doc.build(answers)
# write the in-memory PDF to disk; binary mode is required because buf holds bytes
with open('answers.pdf', 'wb') as school_pdf:
    school_pdf.write(buf.getvalue())
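If you place the answer boxes yourself, you can keep a record of each box while laying out the page and convert it to pixel coordinates when cropping a scan later. The following is only a minimal sketch under some assumptions: boxes are described in PDF points (1/72 inch) with the origin at the bottom-left of the page, and the scanned page is an unskewed image at a known DPI. The names `AnswerBox` and `to_pixel_crop` are hypothetical and not part of reportlab.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # size of one PDF user-space unit


@dataclass(frozen=True)
class AnswerBox:
    """A rectangle in PDF points, origin at the bottom-left of the page."""
    x: float       # left edge
    y: float       # bottom edge
    width: float
    height: float


def to_pixel_crop(box, page_height_pt, dpi):
    """Return (left, upper, right, lower) in pixels, origin at the top-left."""
    scale = dpi / POINTS_PER_INCH
    left = round(box.x * scale)
    right = round((box.x + box.width) * scale)
    # Flip the y axis: PDF measures up from the bottom, images down from the top.
    upper = round((page_height_pt - (box.y + box.height)) * scale)
    lower = round((page_height_pt - box.y) * scale)
    return left, upper, right, lower


# US Letter is 612 x 792 points; a 2 x 1 inch box, 1 inch from the left, 9 inches up.
box = AnswerBox(x=72, y=648, width=144, height=72)
crop = to_pixel_crop(box, page_height_pt=792, dpi=300)
print(crop)
```

The same `x`, `y`, `width` and `height` values could be passed to reportlab's `canvas.rect(x, y, width, height)` when drawing the box, so the PDF and the stored coordinates stay in sync.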

Related

How to align text in table cells using Borb

I am creating a PDF document using borb and trying to align text within table cells.
from decimal import Decimal
from pathlib import Path
from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import SingleColumnLayout
from borb.pdf import Paragraph
from borb.pdf import PDF
from borb.pdf import Alignment
from borb.pdf import TableCell
from borb.pdf import FixedColumnWidthTable
from borb.pdf import Table
pdf = Document()
page = Page()
pdf.add_page(page)
layout = SingleColumnLayout(page)
layout.add(
    FixedColumnWidthTable(number_of_columns=1, number_of_rows=1)
    .add(Paragraph(
        """
Report generated on 2022-01-01 at 00:00 am (UTC)
Date: 01 Jan
""",
        text_alignment=Alignment.RIGHT,
        padding_top=Decimal(12),
        respect_newlines_in_text=True,
        font_size=Decimal(10))))
with open(Path("output.pdf"), "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)
But the text is not aligned to the far right; it ends in the middle of the cell. Do you know how to align the text to the right border of the table?

Disclaimer: I am the author of borb.
You are experiencing the difference between the horizontal_alignment of a LayoutElement and the text_alignment of that element.
When performing layout on a text-carrying LayoutElement, the logic is roughly the following:
1. How wide is this text going to be? That will be the width of this LayoutElement. This step takes into account possible overflow, moving text to the next line, etc.
2. Now that we've determined the width (and height) of the LayoutElement, we take text_alignment into account.
Your text fits on roughly half the Page, so the content box of your Paragraph is roughly half the page wide. Applying text_alignment inside that box only moves the lines relative to each other, so it has little visible effect.
I would suggest you simply set the horizontal_alignment to Alignment.RIGHT.
I also wonder why you're using a Table to perform layout. This seems like bad document design.
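The two-step logic described above can be illustrated with a small toy model. This is not borb's code, only a sketch of the idea under the assumption that the content box is as wide as the widest line; the function `line_offsets` and the widths used are made up for illustration.

```python
def line_offsets(line_widths, available_width,
                 horizontal_alignment="LEFT", text_alignment="LEFT"):
    """Return the left x offset of each line inside a container."""
    # The content box is only as wide as the widest line (capped by the container).
    box_width = min(max(line_widths), available_width)
    # horizontal_alignment positions the whole box inside the container.
    box_x = available_width - box_width if horizontal_alignment == "RIGHT" else 0
    # text_alignment positions each line inside the box.
    offsets = []
    for width in line_widths:
        inner = box_width - width if text_alignment == "RIGHT" else 0
        offsets.append(box_x + inner)
    return offsets


widths = [250, 60]  # a long line and a short line, in points
only_text = line_offsets(widths, 500, text_alignment="RIGHT")
both = line_offsets(widths, 500,
                    horizontal_alignment="RIGHT", text_alignment="RIGHT")
print(only_text)
print(both)
```

With only text_alignment set, the short line is pushed to the right edge of a box that itself ends mid-container; with horizontal_alignment also set, both lines end at the container's right edge.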

create pdf with fpdf in python. can not move image to the right in a loop

I use FPDF to generate a PDF with Python, and I have a problem I am looking for a solution to. In a folder "images" there are pictures that I would like to display, each on ONE page. I managed that, though maybe not elegantly. Unfortunately I can't move the picture to the right. It looks like pdf.set_y won't work in the loop.
from fpdf import FPDF
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir('../images') if isfile(join('../images', f))]
pdf = FPDF('L')
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
onlyfiles = ['VI 1.png', 'VI 2.png', 'VI 3.png']
y = 10
for image in onlyfiles:
    pdf.set_x(10)
    pdf.set_y(y)
    pdf.cell(10, 10, 'please help me' + image, 0, 0, 'L')
    y = y + 210  # to go to the next page
    pdf.set_x(120)
    pdf.set_y(50)
    pdf.image('../images/' + image, w=pdf.w/2.0, h=pdf.h/2.0)
pdf.output('problem.pdf', 'F')
It would be great if you have a solution or some help for me. Thanks a lot
greets alex

I see the issue. You want to specify the x and y location in the call to pdf.image(). That assessment is based on the documentation for image here: https://pyfpdf.readthedocs.io/en/latest/reference/image/index.html
Note also that pdf.set_y() moves x back to the left margin, so the pdf.set_x(120) call just before it has no lasting effect.
So you can instead do this (showing just the for loop here):
for image in onlyfiles:
    pdf.set_x(10)
    pdf.set_y(y)
    pdf.cell(10, 10, 'please help me' + image, 0, 0, 'L')
    y = y + 210  # to go to the next page
    # new: x and y are passed directly to image();
    # increase `x` from 120 to, say, 150 to move the image further to the right
    pdf.image('../images/' + image, x=120, y=50, w=pdf.w/2.0, h=pdf.h/2.0)
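If the goal is to push the image flush against the right side of the page, the `x` value can be computed instead of guessed. A small sketch, assuming the default unit (millimetres) and an A4 landscape page of roughly 297 x 210 mm, which is what `FPDF('L')` should give; `right_aligned_x` and the 10 mm margin are hypothetical choices, not part of FPDF.

```python
A4_LANDSCAPE_MM = (297.0, 210.0)  # page width and height in millimetres


def right_aligned_x(page_width, image_width, right_margin=10.0):
    """Return the x position that puts an image flush with the right margin."""
    x = page_width - right_margin - image_width
    if x < 0:
        raise ValueError("image is wider than the printable area")
    return x


page_w, page_h = A4_LANDSCAPE_MM
image_w = page_w / 2.0  # same width as used in the loop above
x = right_aligned_x(page_w, image_w)
print(x)
```

In the real script, `pdf.w` would replace the hard-coded page width, and the result would be passed as `x=` to `pdf.image()`.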

You can check pdfme library. It's the most powerful library in python to create PDF documents. You can add urls, footnotes, headers and footers, tables, anything you would need in a PDF document.
The only problem I see is that currently pdfme only supports jpg format for images. But if that's not a problem it will help you with your task.
Check the docs here.

Disclaimer: I am the author of pText, the library I will use in this solution.
Let's start by creating an empty Document:
pdf: Document = Document()
# create empty page
page: Page = Page()
# add page to document
pdf.append_page(page)
Next we are going to load an image using Pillow:
import requests
from PIL import Image as PILImage
im = PILImage.open(
    requests.get(
        "https://365psd.com/images/listing/fbe/ups-logo-49752.jpg", stream=True
    ).raw
)
Now we can create a LayoutElement Image and use its layout method:
Image(im).layout(
    page,
    bounding_box=Rectangle(Decimal(20), Decimal(724), Decimal(64), Decimal(64)),
)
Keep in mind the origin (in PDF space) is at the lower left corner.
Lastly, we need to store the Document:
# attempt to store PDF
with open("output.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, pdf)
You can obtain pText either on GitHub or from PyPI.
There are many more examples; check them out to find out more about working with images.

Python reportlab dynamically create new page after first page is completely filled with text

Python 3.8 x64 | Windows 10 x64 | reportlab v3.5.46 (open-source)
I have been searching everywhere for an answer to this, to no avail. I just want to create a PDF document that contains a lot of text. After the first page is completely full of text, the remaining text should flow onto a second page. After the second page is completely full of text, the remaining text should flow onto a third page (and so on).
Every answer I find online says to use the canvas.showPage() method. This is not a great solution because in all of those examples the first page is not completely filled with text, so adding a new page is a manual step. In my case I do not know when the first page will be full, so I do not know when I need to create a new page using canvas.showPage().
I need to somehow detect when the first page cannot hold any more text and, when this occurs, create a new page to hold the remaining text.
From reading the reportlab documentation, I am not sure how to achieve this in a practical, Pythonic implementation. There are also platypus.PageBreak() and BaseDocTemplate.afterPage(), but I am not sure what they do because the documentation on them is sparse.
I don't think the code I am using will be of much value for my question, but it is included below for reference. The function parameter my_text is a multi-page amount of text.
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.pagesizes import LETTER
from reportlab.lib.units import inch
def create_pdf_report_text_object(my_text):
    canvas = Canvas('Test.pdf', pagesize=LETTER)
    canvas.setFont('Helvetica', size=10)
    text_object = canvas.beginText(x=1 * inch, y=10 * inch)
    for line in my_text.splitlines(False):
        text_object.textLine(line.rstrip())
    canvas.drawText(text_object)
    canvas.save()

I believe one solution to your question is to use a Frame. The frame detects when it is full, and it is re-created on every new page until the text runs out.
Please see the example below as a start for your own solution (it is complete: just copy, paste, and run the code, and a PDF called "Example_output.pdf" should be created).
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import cm
from reportlab.platypus import BaseDocTemplate, Frame, KeepTogether, PageTemplate, Paragraph
styleSheet = getSampleStyleSheet()
########################################################################
def create_pdf():
    """
    Create a pdf
    """
    # Create a frame
    text_frame = Frame(
        x1=3.00 * cm,  # From left
        y1=1.5 * cm,  # From bottom
        height=19.60 * cm,
        width=15.90 * cm,
        leftPadding=0 * cm,
        bottomPadding=0 * cm,
        rightPadding=0 * cm,
        topPadding=0 * cm,
        showBoundary=1,
        id='text_frame')
    # Create text
    L = [Paragraph("""What concepts does PLATYPUS deal with?""", styleSheet['Heading2']),
         Paragraph("""
         The central concepts in PLATYPUS are Flowable Objects, Frames, Flow
         Management, Styles and Style Sheets, Paragraphs and Tables. This is
         best explained in contrast to PDFgen, the layer underneath PLATYPUS.
         PDFgen is a graphics library, and has primitive commands to draw lines
         and strings. There is nothing in it to manage the flow of text down
         the page. PLATYPUS works at the conceptual level of a desktop publishing
         package; you can write programs which deal intelligently with graphic
         objects and fit them onto the page.
         """, styleSheet['BodyText']),
         Paragraph("""
         How is this document organized?
         """, styleSheet['Heading2']),
         Paragraph("""
         Since this is a test script, we'll just note how it is organized.
         The top of each page contains commentary. The bottom half contains
         example drawings and graphic elements to which the commentary will
         relate. Down below, you can see the outline of a text frame, and
         various bits and pieces within it. We'll explain how they work
         on the next page.
         """, styleSheet['BodyText']),
         ]
    # Building the story: repeat the sample text 20 times so it spans several pages
    story = L * 20
    story.append(KeepTogether([]))
    # Establish a document
    doc = BaseDocTemplate("Example_output.pdf", pagesize=letter)
    # Creating a page template
    frontpage = PageTemplate(id='FrontPage',
                             frames=[text_frame]
                             )
    # Adding the template to the document
    doc.addPageTemplates(frontpage)
    # Building doc from the story
    doc.build(story)
# ----------------------------------------------------------------------
if __name__ == "__main__":
    create_pdf()  # Writing the pdf
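If you prefer to stay with the lower-level Canvas API from the question, the "detect when the page is full" step can also be done by hand: work out how many lines fit between the margins and split the text into page-sized chunks. The sketch below shows only that arithmetic, in plain Python. It assumes every entry already fits on one line (no wrapping), and `paginate` is a hypothetical helper, not a reportlab function.

```python
def paginate(lines, page_height, top_margin, bottom_margin, leading):
    """Split lines into pages, starting a new page when no more lines fit."""
    usable = page_height - top_margin - bottom_margin
    lines_per_page = int(usable // leading)
    if lines_per_page < 1:
        raise ValueError("margins leave no room for text")
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


# Letter page: 792 pt high, 72 pt (1 inch) margins, 12 pt leading.
lines = [f"line {n}" for n in range(120)]
pages = paginate(lines, page_height=792, top_margin=72,
                 bottom_margin=72, leading=12)
print([len(p) for p in pages])
```

Each chunk could then be written with its own text object (beginText, textLine, drawText as in the question), followed by a call to canvas.showPage() before the next chunk.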

How to extract images and image BBox coordinates using python?

I am trying to extract images from a PDF along with the BBox coordinates of each image.
I tried the pdfrw library. It identifies image objects, and they have an attribute called media box that holds some coordinates, but I am not sure those are the correct bbox coordinates, since for some PDFs it shows something like this:
['0', '0', '684', '864']
But the image doesn't start at the origin of the page, so I don't think it is the bbox.
I tried the following code using pdfrw:
import pdfrw, os
from pdfrw import PdfReader, PdfWriter
from pdfrw.findobjs import page_per_xobj
outfn = 'extract.' + os.path.basename(path)
pages = list(page_per_xobj(PdfReader(path).pages, margin=0.5*72))
writer = PdfWriter(outfn)
writer.addpages(pages)
writer.write()
How do I get the image along with its bbox coordinates?
sample pdf : https://drive.google.com/open?id=1IVbj1b3JfmSv_BJvGUqYvAPVl3FwC2A-

I found a way to do it with a library called pdfplumber. It is built on top of pdfminer and works consistently in my use case. Moreover, it is MIT-licensed, so I can use it for my office work.
import pdfplumber
pdf_obj = pdfplumber.open(doc_path)
page = pdf_obj.pages[page_no]
images_in_page = page.images
page_height = page.height
image = images_in_page[0] # assuming images_in_page has at least one element, only for understanding purpose.
image_bbox = (image['x0'], page_height - image['y1'], image['x1'], page_height - image['y0'])
cropped_page = page.crop(image_bbox)
image_obj = cropped_page.to_image(resolution=400)
image_obj.save(path_to_save_image)
Worked well for tables and images in my case.
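The y-axis flip in the snippet above can be wrapped in a small helper that also clamps the box to the page, since, as far as I know, a crop box that extends past the page edge may be rejected. This is a sketch in plain Python; `image_crop_box` is a hypothetical name, and the sample values (including the 684 x 864 page size, borrowed from the question) are made up.

```python
def image_crop_box(image, page_width, page_height):
    """Convert a bottom-left-origin image box to a top-left-origin crop box.

    `image` is a mapping with x0, y0, x1, y1 keys in PDF points, with y
    measured from the bottom of the page, like the dicts used above.
    """
    x0 = max(0, image["x0"])
    x1 = min(page_width, image["x1"])
    # Flip the y axis and keep the result inside the page.
    top = max(0, page_height - image["y1"])
    bottom = min(page_height, page_height - image["y0"])
    if x0 >= x1 or top >= bottom:
        raise ValueError("image lies entirely outside the page")
    return (x0, top, x1, bottom)


sample = {"x0": -5, "y0": 100, "x1": 300, "y1": 400}
bbox = image_crop_box(sample, page_width=684, page_height=864)
print(bbox)
```

The returned tuple has the same (x0, top, x1, bottom) order as `image_bbox` in the answer, so it could be passed to `page.crop()` in its place.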

Embed .SVG files into PDF using reportlab

I have written a script in python that produces matplotlib graphs and puts them into a pdf report using reportlab.
I am having difficulty embedding SVG image files into my PDF file. I've had no trouble using PNG images but I want to use SVG format as this produces better quality images in the PDF report.
This is the error message I am getting:
IOError: cannot identify image file
Does anyone have suggestions or have you overcome this issue before?

Yesterday I succeeded in using svglib to add an SVG image as a reportlab Flowable.
The drawing that svglib returns is an instance of reportlab's Drawing:
from reportlab.graphics.shapes import Drawing
and a reportlab Drawing inherits from Flowable:
from reportlab.platypus import Flowable
Here is a minimal example that also shows how to scale it correctly (you only need to specify path and factor):
from svglib.svglib import svg2rlg
drawing = svg2rlg(path)
sx = sy = factor
drawing.width, drawing.height = drawing.minWidth() * sx, drawing.height * sy
drawing.scale(sx, sy)
#if you want to see the box around the image
drawing._showBoundary = True
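One way to pick `factor` is to compute the largest uniform scale that still fits the drawing into the space you have. A minimal sketch in plain Python; `fit_scale` is a hypothetical helper, and the sizes are example values that would normally come from `drawing.width`, `drawing.height` and your frame dimensions.

```python
def fit_scale(width, height, max_width, max_height):
    """Return the largest uniform scale that fits width x height in the limits."""
    if width <= 0 or height <= 0:
        raise ValueError("drawing must have a positive size")
    return min(max_width / width, max_height / height)


# An 800 x 600 drawing that has to fit into a 400 x 450 area.
factor = fit_scale(width=800, height=600, max_width=400, max_height=450)
print(factor)
```

Using the same factor for both axes keeps the aspect ratio of the SVG intact.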

As mentioned by skidzo, you can totally do this with the svglib package, which you can find here: https://pypi.python.org/pypi/svglib/
According to the website, Svglib is a pure-Python library for reading SVG files and converting them (to a reasonable degree) to other formats using the ReportLab Open Source toolkit.
You can use pip to install svglib.
Here is a complete example script:
# svg_demo.py
from reportlab.graphics import renderPDF
from svglib.svglib import svg2rlg
def svg_demo(image_path, output_path):
    # convert the SVG to a reportlab Drawing, then render it to a PDF file
    drawing = svg2rlg(image_path)
    renderPDF.drawToFile(drawing, output_path)
if __name__ == '__main__':
    svg_demo('/path/to/image.svg', 'svg_demo.pdf')

skidzo's answer is very helpful, but isn't a complete example of how to use an SVG file as a flowable in a reportlab PDF. Hopefully this is helpful for others trying to figure out the last few steps:
from io import BytesIO
import matplotlib.pyplot as plt
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph
from svglib.svglib import svg2rlg
def plot_data(data):
    # Plot the data using matplotlib.
    plt.plot(data)
    # Save the figure to SVG format in memory.
    svg_file = BytesIO()
    plt.savefig(svg_file, format='SVG')
    # Rewind the file for reading, and convert to a Drawing.
    svg_file.seek(0)
    drawing = svg2rlg(svg_file)
    # Scale the Drawing.
    scale = 0.75
    drawing.scale(scale, scale)
    drawing.width *= scale
    drawing.height *= scale
    return drawing
def main():
    styles = getSampleStyleSheet()
    pdf_path = 'sketch.pdf'
    doc = SimpleDocTemplate(pdf_path)
    data = [1, 3, 2]
    story = [Paragraph('Lorem ipsum!', styles['Normal']),
             plot_data(data),
             Paragraph('Dolores sit amet.', styles['Normal'])]
    doc.build(story)
main()

You need to make sure you are importing PIL (Python Imaging Library) in your code so that ReportLab can use it to handle image types like SVG. Otherwise it can only support a few basic image formats.
That said, I recall having some trouble, even when using PIL, with vector graphics. I don't know if I tried SVG but I remember having a lot of trouble with EPS.
