ReportLab PDF, Table header repeating in first page - python

I am using the ReportLab library to generate a PDF. The table header is repeated on the first page, and I need to remove the duplicate. The other pages render correctly.
I am using repeatRows=1 to show the header at the top of each page.
PDF generation code is here
python==2.7, reportlab==2.7
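
One cause worth ruling out, which the question does not confirm, is that the header row ends up in the table data twice. repeatRows=1 repeats only the first row of the data, so a second copy would be drawn as an ordinary body row on page one. A minimal sketch under that assumption follows; `with_single_header`, `build_table`, and the sample rows are hypothetical names used for illustration.

```python
def with_single_header(header, rows):
    """Return table data in which the header row appears exactly once, at the top.

    Note: any body row identical to the header is dropped as well.
    """
    header = list(header)
    body = [list(r) for r in rows if list(r) != header]
    return [header] + body


def build_table(header, rows):
    """Build a ReportLab Table whose first row repeats on every page."""
    # Imported here so the helper above can be used without ReportLab installed.
    from reportlab.platypus import Table

    data = with_single_header(header, rows)
    # repeatRows=1 repeats the first data row whenever the table splits across pages.
    return Table(data, repeatRows=1)


header = ["Item", "Qty"]
rows = [["Item", "Qty"], ["Apple", 3], ["Pear", 5]]  # header accidentally included
print(with_single_header(header, rows))
# [['Item', 'Qty'], ['Apple', 3], ['Pear', 5]]
```

If the data already contains the header only once, the duplicate is more likely drawn somewhere else, for example by a page template or a separate flowable, and this sketch would not help.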

Related

Python: How to append a FPDF2 table to an existing pdf page?

I have an FPDF2 table created using this script. I used to output it to a blank page and merge it to an existing pdf, which works fine.
But now we need to add the table to an existing page in the pdf and then if it doesn't fit, we insert new pages. And that's the problem.
FPDF doesn't seem to be able to draw to an existing page. I know I can use reportlab canvas can.drawString() to draw to an existing page, but I don't know if reportlab can draw an FPDF object.
Also, if I were to ditch FPDF and use only reportlab to draw a table, I don't know how to detect the end of the page and insert a new page if needed. I'm not starting at the start of a page, I'll be starting somewhere in the middle.
I would prefer to be able to use the FPDF2 script I already have and somehow add the output at a specific x,y position in a page though, if possible. Have you ever had this issue?
I also have PyPDF2 installed and used in the same project, but I think that only reportlab can do the job. Maybe I need to detect the end of the page via PyPDF2 and write to the page via reportlab?
Since no answer exists as of now and since I need an answer to finish my task, I did the following:
I added the FPDF table to a blank PDF page, and pushed it down, by using set_y(100), to have a half blank page to work with.
And I took a screenshot of the items which need to be placed above the table and then added them to the same page by using reportlab canvas.drawImage()
If there's a better solution, please post an answer and I'll accept it and refactor my code. For now, I'll accept my answer to close this question.
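
On the "detect the end of the page" part of the question: if the row height is known, the page breaks can be worked out with plain arithmetic before anything is drawn. The sketch below is library-independent and assumes fixed-height rows and PDF-style coordinates where y grows upward; `paginate_rows` and all of the numbers are hypothetical.

```python
def paginate_rows(n_rows, row_height, start_y, page_top, bottom_margin):
    """Split n_rows into per-page row counts.

    The first page starts at start_y (somewhere mid-page); every later
    page starts at page_top. A row fits while y - row_height stays at or
    above bottom_margin.
    """
    if row_height <= 0 or page_top - row_height < bottom_margin:
        raise ValueError("a row must fit on an empty page")
    pages = []
    y = start_y
    count = 0
    for _ in range(n_rows):
        if y - row_height < bottom_margin:
            pages.append(count)  # current page is full (possibly with 0 rows)
            count = 0
            y = page_top         # continue at the top of a fresh page
        y -= row_height
        count += 1
    pages.append(count)
    return pages


# 30 rows of height 20, starting at y=400 on the existing page
print(paginate_rows(n_rows=30, row_height=20, start_y=400, page_top=770, bottom_margin=72))
# [16, 14]
```

The first count is the number of rows to draw on the existing page; each later count corresponds to one inserted page. With a ReportLab canvas, `showPage()` is the call that ends the current page before drawing continues on the next one.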

How to extract tables from PDFs while pulling in non-table text section identifiers

I'm working through extracting tables using pdfplumber in Python from a PDF that has mostly-consistent structure between pages.
My goal is to extract each of the 2 tables under each section header (white font highlighted blue) on each page. See screenshot below for the structure of the PDF. The yellow highlights are my targeted extractions.
Challenge: How to set up the code such that it's clear which extract_tables() output is associated with each text section header.
As you can see the section headers and table description are not built into the table but are instead text elements of the page.
Some pages have 1, 2, or 3 section headers that may or may not contain the desired tables.
If the tables aren't present under a section, I intend to ignore that section or populate an empty table.
Sometimes the section header starts at the bottom of the page and tables don't start until the next page.
There are not any cases where a table "spills" over to the next page. So this might ease complexity.
I can loop through all pages and pull and append all tables of this specified structure together, but without the text section header and subsequent description line, I don't know what each table represents.
Alternative approach that was unsuccessful:
I tried a pure text extraction approach (not relying on extract_table at all), but the null fields in a table (dark gray) did not register as empty cells. As a result, for Table 1 in the screenshot below, the $1000 and $20 were incorrectly placed under Facility1. Can this text extraction approach be improved to recognize those null columns as blank?
Screenshot of PDF: [image not included]
End Goal: [image not included]
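
One way to tie each table to its section header is to compare vertical positions: every table belongs to the closest header that precedes it in reading order, which also covers a header at the bottom of one page whose tables start on the next. The sketch below assumes the header positions have already been collected (for example from the "top" value of words returned by pdfplumber's `page.extract_words()`) and the table positions from `page.find_tables()` (each table has a `.bbox` and an `.extract()` method). How the header text is recognised is left out, and `assign_tables_to_headers` and the sample data are hypothetical.

```python
def assign_tables_to_headers(headers, tables):
    """Pair each table with the closest section header that precedes it.

    headers: list of (page_number, top, text)
    tables:  list of (page_number, top, rows)
    'top' is the distance from the top of the page, so reading order is
    (page_number, top) ascending. Header texts are assumed to be unique.
    """
    result = {text: [] for _, _, text in headers}
    ordered_headers = sorted(headers, key=lambda h: (h[0], h[1]))
    for page, top, rows in sorted(tables, key=lambda t: (t[0], t[1])):
        current = None
        for h_page, h_top, text in ordered_headers:
            if (h_page, h_top) <= (page, top):
                current = text  # latest header at or before this table
            else:
                break
        if current is not None:
            result[current].append(rows)
    return result


headers = [(1, 100, "Section A"), (1, 700, "Section B"), (2, 400, "Section C")]
tables = [(1, 150, [["a1"]]), (1, 300, [["a2"]]), (2, 60, [["b1"]]), (2, 200, [["b2"]])]
print(assign_tables_to_headers(headers, tables))
# {'Section A': [[['a1']], [['a2']]], 'Section B': [[['b1']], [['b2']]], 'Section C': []}
```

Sections without tables keep an empty list, which matches the stated intent to ignore them or fill in an empty table.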

Is there a way to remove page number text object from a PDF via Python?

I am trying to remove the page number text object from each page of a PDF that I generate automatically each month. I have a Python script that creates the final PDF I need, but I am unable to automatically remove the incorrect page numbers and replace them with new page numbers in the correct order (I have to piece this PDF together from multiple source PDFs).
Any thoughts on the best way to do this?

How do I link to a specific page of a PDF document inside a cell in Excel?

I am writing Python code that writes a hyperlink into an Excel file. This hyperlink should open a specific page in a PDF document.
I am trying something like
Worksheet.write_url('A1',"C:/Users/...../mypdf#page=3") but this doesn't work. Please let me know how this can be done.
Are you able to open the pdf file directly to a specific page even without xlsxwriter? I can not.
From Adobe's official site:
To target an HTML link to a specific page in a PDF file, add
#page=[page number] to the end of the link's URL.
For example, this HTML tag opens page 4 of a PDF file named
myfile.pdf:
Note: If you use UNC server locations (\\servername\folder) in a link,
set the link to open to a set destination using the procedure in the
following section.
If you use URLs containing local hard drive addresses (c:\folder), you cannot link to page numbers or set destinations.
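
Given that restriction, the `#page=` fragment only has a chance of working when the PDF is reachable through a URL rather than a local drive path. A minimal sketch of building such a link with the standard library follows; `pdf_page_url` and the example.com address are hypothetical, and whether the fragment is honoured still depends on the viewer that opens the link.

```python
from urllib.parse import urlsplit, urlunsplit


def pdf_page_url(url, page):
    """Return url with its fragment replaced by page=<page>."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment="page=%d" % page))


print(pdf_page_url("http://www.example.com/myfile.pdf", 4))
# http://www.example.com/myfile.pdf#page=4
```

The resulting string can then be passed to xlsxwriter in the same way as in the question, for example `worksheet.write_url('A1', pdf_page_url(...))`.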

style hyperlinks in reportlab pdfs

I am using rst2pdf to generate a PDF. I am using links to sections and they appear as hyperlinks in the PDF. If I hover over the link I can see it says "Go to page XXX". Is there a way to insert that page number into the text, so that it can be seen on hardcopies?
I only started using reportlab recently. Maybe you need to use the superscript tag?
p = Paragraph("<link href='http://someurl' color='blue'><u>Some text</u><super> [goto page xx]</super></link>", customstyle)
What it may look like: [image not included]
