How to generate a PDF index page? - python

I have a python script that exports 772 pdfs and combines them into a multi-page pdf binder. While exporting each PDF, it also adds the name of the current pdf as an entry in a text file. After the whole binder is created, the text file has an entry for each PDF page in the same order as the PDF binder. I need to use this text file to create an index page at the beginning of the PDF, preferably linking to each page in the document.
If I have to do this task manually, I will (and I'm open to suggestions), but I hope to find a way to automate this.
Also, this doesn't have to be done in Python, but it would be nice to fit it in with my current script.
Thanks for the feedback,
Tanner

Poking around in the docs for arcpy.mapping, I can see that you weren't kidding about "it's limited".
Rather than adding new pages, have you considered adding bookmarks to the PDF?
And the only Python software I could dig up that can add bookmarks was pdfrecycle. It's in version 0.05, so I'm gonna go out on a limb and guess it might not be too stable.
If you're willing to use Java or C# there's iText and iTextSharp (but I'm biased). There are quite a few other PDF libraries floating around capable of manipulating existing PDFs... pick a language and start googling.
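For what it's worth, here is a minimal sketch of the bookmark idea in Python. It assumes the pypdf package (the maintained successor to pyPdf/PyPDF2, not one of the libraries named above) and that the text file holds exactly one name per line, one line per binder page, as the question describes. The function names and file paths are hypothetical.

```python
def outline_entries(names_text):
    """Turn the exported names file into (title, zero_based_page) pairs."""
    titles = [line.strip() for line in names_text.splitlines() if line.strip()]
    return [(title, index) for index, title in enumerate(titles)]


def add_bookmarks(binder_path, names_path, out_path):
    """Copy the binder and add one bookmark per page (assumes pypdf)."""
    # Imported here so outline_entries() works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    with open(names_path, encoding="utf-8") as handle:
        entries = outline_entries(handle.read())

    reader = PdfReader(binder_path)
    if len(entries) != len(reader.pages):
        raise ValueError("names file and binder have different lengths")

    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    for title, page_index in entries:
        # Each bookmark jumps to the page with the same position in the file.
        writer.add_outline_item(title, page_index)

    with open(out_path, "wb") as handle:
        writer.write(handle)
```

Something like add_bookmarks("binder.pdf", "names.txt", "binder_bookmarked.pdf") could be called at the end of the export script. This gives a clickable bookmark panel rather than a printed index page, so it only helps if that is an acceptable substitute.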

PDFsam will merge PDFs and create an index with links based on each individual PDF file name or title.
I initially downloaded PDFsam Basic because it will auto organize the PDFs to be merged in order of folder structure instead of only alphabetically. To add multiple PDFs from various folders I go to a directory, search "." to locate and select all the PDFs to add. I think the PDFsam Enhanced allows you to simply drag and drop an entire folder directory. Highly recommend.

Related

Switching the order of pages of a pdf in python

I'm currently working on some PDF file generation in python for nametags. However, in my freshly generated files I have all fronts and then all backs instead of a front, then the corresponding back, then the next front and so on. I would like to correct that after the files have been generated.
So I have the following:
p1f, p2f, p3f,... ,p1b, p2b, p3b,...
Where pn describes the n-th page, f is for front and b is for back. What I want to end up with is:
p1f, p1b, p2f, p2b, p3f, p3b,...
What are possible ways to approach this? What libraries could I use?
Thanks in advance!
For libraries you can use PyPDF2 or pdfrw.
For approaches I'd suggest when you have small files:
load them into memory, reorder pages, and write them back to disk.
If a PDF file is too large you could split pages into separate files and build the output file one page after another.
However it is safe to say that there are more efficient ways to do this.
Also you might want to check PDF-Shuffler which is a python-gtk tool to perform such tasks on a non programmatic basis.
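The "load, reorder, write back" approach could look roughly like this. It is only a sketch: it assumes the pypdf package (the successor to PyPDF2, with the same reader/writer classes) and that the file holds all fronts followed by the same number of backs. The function names are made up for illustration.

```python
def interleave_order(total_pages):
    """Page indices that turn f1..fn, b1..bn into f1, b1, f2, b2, ..."""
    if total_pages % 2:
        raise ValueError("expected an even page count (fronts, then backs)")
    half = total_pages // 2
    order = []
    for i in range(half):
        order.append(i)         # i-th front
        order.append(half + i)  # its matching back
    return order


def reorder_pdf(src_path, dst_path):
    """Write a copy of src_path with fronts and backs interleaved."""
    # Imported here so interleave_order() works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in interleave_order(len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

For a six-page file, interleave_order(6) gives [0, 3, 1, 4, 2, 5], i.e. p1f, p1b, p2f, p2b, p3f, p3b.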

Insert png image programmatically in pdf file at specific location

Looking for easiest way to do the following:
I have created 10,000 unique QR-codes, with unique filenames.
I have one postcard design (.ai, eps, pdf - doesn't matter) with a placeholder for the QR code and for the unique filename (sans .png extension).
How would I go about inserting each of the 10,000 PNGs into 10,000 copies of the pdf file? (And I need to do the same with the unique filename/text string that represents each QR code.)
Since I am really no good with programming, it doesn't matter which tools to use. As long as you hold my hand - or there is a link to a beginners documentation.
however:
I am trying to learn python - so that is preferred.
I work a little bit with R - but that will not be the easiest solution.
If this can be done directly from the terminal with a shell script then hallelujah :-)
But really - if you know of a solution - then please post it, regardless of the tools.
Thanks in advance.
You can do it in Python using pyPdf to merge documents.
Basically, you create a PDF with your QRCode placed where you want it in the end.
You can use the (c)StringIO module to store the created PDF file in memory.
You can find pyPDF here; there's an example that shows how you would add a watermark to a file, you should be following the same logic.
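As a rough sketch of that watermark-style logic: draw the QR code and its label on an in-memory page, then merge that page onto the postcard. This assumes ReportLab for the drawing and pypdf (the successor to pyPdf) for the merge, and that the postcard is a one-page PDF. The coordinates in qr_box and label_pos are placeholders in PDF points, not values from the question, and the function names are hypothetical.

```python
import io
from pathlib import Path


def label_for(png_path):
    """Text printed with the code: the file name without its .png extension."""
    return Path(png_path).stem


def make_postcard(template_path, png_path, out_path,
                  qr_box=(400, 50, 100, 100), label_pos=(400, 35)):
    """Stamp one QR image and its label onto a copy of the template."""
    # Imported here so label_for() works without these packages installed.
    from pypdf import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    # Re-read the template for every card: merge_page changes the page in place.
    page = PdfReader(template_path).pages[0]
    size = (float(page.mediabox.width), float(page.mediabox.height))

    # Build the overlay in memory instead of writing a temporary file.
    buffer = io.BytesIO()
    overlay = canvas.Canvas(buffer, pagesize=size)
    x, y, width, height = qr_box  # placeholder position, measured in points
    overlay.drawImage(str(png_path), x, y, width=width, height=height)
    overlay.drawString(label_pos[0], label_pos[1], label_for(png_path))
    overlay.save()
    buffer.seek(0)

    page.merge_page(PdfReader(buffer).pages[0])
    writer = PdfWriter()
    writer.add_page(page)
    with open(out_path, "wb") as handle:
        writer.write(handle)
```

Looping over the folder of PNGs and calling make_postcard once per file would produce the 10,000 postcards. The placeholder coordinates need adjusting to match the design.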

Hide information in a PDF file in Python

In Python, I have files generated by ReportLab. Now, I need to extract some pages from that PDF and hide confidential information.
I can create a PDF file with blacked-out spots and use pyPdf to mergePage, but people can still select and copy-paste the information under the blacked-out spots.
Is there a way to make those spots completely confidential?
For example, I need to hide addresses on the pages. How would I do it?
Thanks,
Basically you'll have to remove the corresponding text drawing commands in the PDF's page content stream. It's much easier to generate the pages twice, once with the confidential information, once without them.
It might be possible (I don't know ReportLab enough) to specially craft the PDF in a way that the confidential information is easier accessible (e.g. as separate XObjects) for deletion. Still you'd have to do pretty low-level operations on the PDF -- which I would advise against.
(Sorry, I was not able to log on when I posted the question...)
Unfortunately, the document cannot be regenerated at will (context sensitive), and those PDF files (about 35) are 3000+ pages.
I was thinking about using pdf2ps and then ps2pdf to convert back, but there is a lot of quality loss.
pdf2ps -dLanguageLevel=3 input.pdf - | ps2pdf14 - output.pdf
And if I use "pdftops" instead, the text is still selectable. If there is a way to make it non-selectable like with "pdf2ps" but with better quality, it will do too.
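Since there are about 35 files, the same round trip could be driven from Python. This is only a sketch that wraps the exact command line above. It assumes pdf2ps and ps2pdf14 are on the PATH and does nothing about the quality loss. The function names are hypothetical.

```python
import subprocess


def roundtrip_commands(src, dst):
    """The two halves of: pdf2ps -dLanguageLevel=3 src - | ps2pdf14 - dst"""
    to_ps = ["pdf2ps", "-dLanguageLevel=3", src, "-"]
    to_pdf = ["ps2pdf14", "-", dst]
    return to_ps, to_pdf


def flatten_via_postscript(src, dst):
    """Pipe pdf2ps into ps2pdf14, as the shell one-liner does."""
    to_ps, to_pdf = roundtrip_commands(src, dst)
    first = subprocess.Popen(to_ps, stdout=subprocess.PIPE)
    second = subprocess.Popen(to_pdf, stdin=first.stdout)
    # Close our copy of the pipe so pdf2ps sees a broken pipe if ps2pdf14 exits.
    first.stdout.close()
    second_status = second.wait()
    first_status = first.wait()
    if first_status != 0 or second_status != 0:
        raise RuntimeError(f"conversion failed for {src}")
```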

Generating & Merging PDF Files in Python

I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
From doing a bit of search, it seems that I can use reportlab for creating content and pyPdf for merging PDF's together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other side there's pyPdf which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
Look into docutils and reStructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py
I've used this, it creates very beautiful documents and the markup is extensible! Later you could take the same code and run it through rst2html to create a website out of it!
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
You could generate a document through, for example, TeX, or OpenOffice, or whatever gives you the most comfortable bindings and then print the document with a pdf printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.

How to include page in PDF in PDF document in Python

I am using reportlab toolkit in Python to generate some reports in PDF format. I want to use some predefined parts of documents already published in PDF format to be included in generated PDF file. Is it possible (and how) to accomplish this in reportlab or in python library?
I know I can use some other tools like PDF Toolkit (pdftk) but I am looking for Python-based solution.
I'm currently using PyPDF to read, write, and combine existing PDFs and ReportLab to generate new content. Using the two packages seemed to work better than any single package I was able to find.
If you want to place existing PDF pages in your Reportlab documents I recommend pdfrw. Unlike PageCatcher it is free.
I've used it for several projects where I need to add barcodes etc to existing documents and it works very well. There are a couple of examples on the project page of how to use it with Reportlab.
A couple of things to note though:
If the source PDF contains errors (due to the originating program following the PDF spec imperfectly for example), pdfrw may fail even though something like Adobe Reader has no apparent problems reading the PDF. pdfrw is currently not very fault tolerant.
Also, pdfrw works by being completely agnostic to the actual content of the PDF page you are placing. So for example, you wouldn't be able to use pdfrw to inspect a page to see if it contains a certain string of text in the lower right-hand corner. However if you don't need to do anything like that you should be fine.
There is an add-on for ReportLab — PageCatcher.
