I have a python routine that builds a PDF file of around 100 pages. I would like to add something like an index page, with links to pages inside the same file.
I would like to know if that is even possible.
You can add a preprocessing step before your PDF generation code by creating an intermediate (text-based) file. Extract the relevant links and titles from it (with a regex or something similar) and add them to the index as needed. It is an extra step, but it gives you more flexibility in case you need to modify this functionality in the future.
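A minimal sketch of that preprocessing step. It assumes the intermediate file separates pages with a form-feed character and marks titles with a leading "# "; both conventions are hypothetical, so substitute whatever your generator actually emits.

```python
import re

# Hypothetical conventions for the intermediate text file:
# pages are separated by form feeds, titles are lines starting with "# ".
PAGE_BREAK = "\f"
TITLE_RE = re.compile(r"^# (.+)$", re.MULTILINE)


def collect_index(text):
    """Return (title, page_number) pairs, with pages counted from 1."""
    entries = []
    for page_number, page in enumerate(text.split(PAGE_BREAK), start=1):
        for match in TITLE_RE.finditer(page):
            entries.append((match.group(1).strip(), page_number))
    return entries


def format_index(entries, width=40):
    """Render the entries as dotted index lines, one per title."""
    return [f"{title} {'.' * max(2, width - len(title))} {page}"
            for title, page in entries]


# Made-up sample: four pages, the third one has no title.
sample = "# Intro\nsome text\f# Methods\nmore text\fno title here\f# Results\n"
index = collect_index(sample)
```

If the index is inserted at the front of the document, remember to shift every page number by the length of the index itself.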
Related
I am looking for a kind of database which can search in separate files eg. pdf, xls, doc that I get from different suppliers. My idea is something like this:
For example, I need to search for a part number and check different data about it. The file containing the part number must then be opened with the part number marked. If there are multiple hits, the database should display a list of the various files containing the searched item number. The list should act as links that open the file with the item number selected when selecting one from the list.
Does this already exist or how do I approach it?
Today, it's all assembled into a single PDF file of more than 1000 pages, and it's a time-consuming and laborious process to maintain.
I've only used vba in connection with Excel, so maybe it's too complicated for me. But is it possible for a programmer without spending 1000 hours on it?
Please help me :-)
Either Access or Excel could do this. I noticed the Python tag. I'm sure Python could handle this as well, although it seems more like a database solution would be best. It sounds like a one-to-many scenario. See the link below for some ideas of how this technique works.
https://www.tutorialspoint.com/ms_access/ms_access_one_to_many_relationship.htm
Also, below is a link with a whole bunch of MS Access templates. Take a look at that and hopefully that will give you some ideas of how to get started.
https://www.microsoftaccessexpert.com/Microsoft-Access-Templates.aspx
I agree, keeping this in a PDF with 1000 pages is NOT the way to go!!
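The same idea can be sketched with Python's built-in sqlite3 module instead of Access. The table names, column names, and sample rows below are made up for illustration, and extracting the part numbers from the pdf, xls, and doc files is a separate step that is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent index
conn.executescript("""
    CREATE TABLE files (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL UNIQUE
    );
    CREATE TABLE hits (
        part_number TEXT NOT NULL,
        file_id     INTEGER NOT NULL REFERENCES files(id),
        page        INTEGER
    );
    CREATE INDEX hits_by_part ON hits(part_number);
""")


def add_hit(part_number, path, page=None):
    """Record that part_number occurs in the file at path."""
    conn.execute("INSERT OR IGNORE INTO files(path) VALUES (?)", (path,))
    (file_id,) = conn.execute(
        "SELECT id FROM files WHERE path = ?", (path,)).fetchone()
    conn.execute("INSERT INTO hits VALUES (?, ?, ?)",
                 (part_number, file_id, page))


def find_files(part_number):
    """Return (path, page) for every file that mentions part_number."""
    return conn.execute(
        "SELECT f.path, h.page FROM hits h JOIN files f ON f.id = h.file_id "
        "WHERE h.part_number = ? ORDER BY f.path, h.page",
        (part_number,)).fetchall()


# Made-up sample data
add_hit("AB-1234", "supplier_a/catalog.pdf", 12)
add_hit("AB-1234", "supplier_b/prices.xls")
add_hit("ZZ-9", "supplier_a/catalog.pdf", 40)
```

A search for one part number then returns the list of files to show as links; opening a file with the number highlighted depends on the viewer and is outside this sketch.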
I'm currently working on some PDF file generation in Python for nametags. However, my freshly generated files contain all the fronts followed by all the backs, instead of a front, then the corresponding back, then the next front, and so on. I would like to correct that after the files have been generated.
So I have the following:
p1f, p2f, p3f,... ,p1b, p2b, p3b,...
Where pn describes the n-th page, f is for front and b is for back. What I want to end up with is:
p1f, p1b, p2f, p2b, p3f, p3b,...
What are possible ways to approach this? What libraries could I use?
Thanks in advance!
For libraries you can use PyPDF2 or pdfrw.
As for approaches, with small files I'd suggest you load them into memory, reorder the pages, and write them back to disk.
If a PDF file is too large, you could split the pages into separate files and build the output file one page at a time.
However, it is safe to say that there are more efficient ways to do this.
You might also want to check out PDF-Shuffler, a python-gtk tool that performs such tasks without any programming.
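The in-memory approach could look roughly like this. It is a sketch that assumes the file holds exactly as many backs as fronts, and it uses pypdf, the current name of the PyPDF2 project; the function names are my own.

```python
def interleaved_order(page_count):
    """Map [f1..fn, b1..bn] to [f1, b1, f2, b2, ...] as 0-based indices."""
    if page_count % 2:
        raise ValueError("expected an equal number of fronts and backs")
    half = page_count // 2
    order = []
    for i in range(half):
        order += [i, half + i]
    return order


def reorder_pdf(src_path, dst_path):
    """Write a copy of src_path with fronts and backs interleaved."""
    # Imported here so the ordering logic above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in interleaved_order(len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as out:
        writer.write(out)
```

For a six-page file the order is 0, 3, 1, 4, 2, 5, which is p1f, p1b, p2f, p2b, p3f, p3b.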
I have a python script that exports 772 pdfs and combines them into a multi-page pdf binder. While exporting each PDF, it also adds the name of the current pdf as an entry in a text file. After the whole binder is created, the text file has an entry for each PDF page in the same order as the PDF binder. I need to use this text file to create an index page at the beginning of the PDF, preferably linking to each page in the document.
If I have to do this task manually, I will (and I'm open to suggestions), but I hope to find a way to automate this.
Also, this doesn't have to be done in Python, but it would be nice to fit it in with my current script.
Thanks for the feedback,
Tanner
Poking around in the docs for arcpy.mapping, I can see that you weren't kidding about "it's limited".
Rather than adding new pages, have you considered adding bookmarks to the PDF?
The only Python software I could dig up that can add bookmarks was pdfrecycle. It's at version 0.05, so I'm gonna go out on a limb and guess it might not be too stable.
If you're willing to use Java or C# there's iText and iTextSharp (but I'm biased). There are quite a few other PDF libraries floating around capable of manipulating existing PDFs... pick a language and start googling.
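For the bookmark idea, here is a sketch using the pypdf library, which is an assumption on my part since it is not one of the tools named above. It assumes the text file has one name per line, in the same order as the pages of the binder; the function names and paths are placeholders.

```python
def read_entries(lines):
    """Turn the name list into (title, zero_based_page_index) pairs."""
    titles = [line.strip() for line in lines if line.strip()]
    return [(title, index) for index, title in enumerate(titles)]


def add_bookmarks(binder_path, names_path, out_path):
    """Copy the binder and add one bookmark per line of the names file."""
    # Imported here so read_entries can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    with open(names_path, encoding="utf-8") as handle:
        entries = read_entries(handle)

    reader = PdfReader(binder_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    for title, page_index in entries:
        writer.add_outline_item(title, page_index)
    with open(out_path, "wb") as out:
        writer.write(out)
```

Bookmarks show up in the viewer's side panel rather than as a printed index page, so the page count of the binder stays the same.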
PDFsam will merge PDFs and create an index with links based on each individual PDF file name or title.
I initially downloaded PDFsam Basic because it will auto organize the PDFs to be merged in order of folder structure instead of only alphabetically. To add multiple PDFs from various folders I go to a directory, search "." to locate and select all the PDFs to add. I think the PDFsam Enhanced allows you to simply drag and drop an entire folder directory. Highly recommend.
I have several pdf files of some lecture slides. I want to do the following: print every pdf file to another pdf file in which there are 6 slides per page and then merge all the resulting files to one big file while making sure that every original file starts on an odd page number (Edit: obviously, it will be printed in duplex) (possibly adding blank pages when necessary).
Is that possible?
Edit: For those interested, this is for printing a LOT of course material for an exam... And I need to do this for a lot of courses.
If it were me, I would use PDFjam or a similar tool to perform the 6-up on each of the source documents.
I would then use PyPDF to calculate the number of pages in each, add a blank page if necessary, and merge the rest of the pages. Something like:
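    # pypdf is the current release of PyPDF; older versions spell these
    # PdfFileReader, PdfFileWriter and addPage.
    from pypdf import PdfReader, PdfWriter

    blank_page = PdfReader('blank.pdf').pages[0]
    dest = PdfWriter()
    for source in sources:  # sources: paths of the 6-up PDFs, in print order
        pdf = PdfReader(source)
        for page in pdf.pages:
            dest.add_page(page)
        if len(pdf.pages) % 2:  # odd number of pages in source
            dest.add_page(blank_page)
    dest.write('merged.pdf')  # example output name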
It appears PyPDF does also have support for merging pages with resize and relocate, so theoretically, it should also work for creating an n-up document, though I see no example code for that.
For putting multiple slides on one page, pdfnup from the PDFjam package is your friend.
For inserting the blank pages, I'm not sure; maybe you can convince pdfjam to do this as well. But can't you just turn off duplexing in the print settings?
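If you want to script the 6-up step, something like the following should work. It assumes pdfjam is installed and on the PATH; the 2x3 grid and the function names are my own choices, not something given above.

```python
import subprocess


def nup_command(source, dest, columns=2, rows=3):
    """Build the pdfjam call that puts columns x rows slides on each page."""
    return ["pdfjam", "--nup", f"{columns}x{rows}", source, "--outfile", dest]


def make_handout(source, dest):
    """Run pdfjam; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(nup_command(source, dest), check=True)
```

Calling make_handout once per lecture file gives you the inputs for the merge step.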
I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then use Python to just add the dynamic parts. Is this a simple process?
From a bit of searching, it seems that I can use reportlab for creating content and pyPdf for merging PDFs together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then use Python to just add the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other side there's pyPdf which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
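A rough sketch of that overlay-and-merge route, using ReportLab for the overlay and pypdf (the current name of the pyPdf project) for the merge. The field names and coordinates are hypothetical, and the overlay is drawn on ReportLab's default page size, so adjust both to your template.

```python
import io

# Hypothetical field positions in PDF points (1/72 inch), origin bottom-left.
FIELD_POSITIONS = {"date": (400, 700), "cost": (400, 680)}


def overlay_items(values):
    """Pair each known field's position with the text to draw there."""
    return [(x, y, str(values[name]))
            for name, (x, y) in FIELD_POSITIONS.items() if name in values]


def fill_template(template_path, out_path, values):
    """Stamp the dynamic values onto the first page of the static PDF."""
    # Imported here so overlay_items works without the PDF libraries.
    from pypdf import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    # Draw only the dynamic text on an otherwise empty in-memory page.
    buffer = io.BytesIO()
    overlay = canvas.Canvas(buffer)
    for x, y, text in overlay_items(values):
        overlay.drawString(x, y, text)
    overlay.save()
    buffer.seek(0)

    writer = PdfWriter()
    for page in PdfReader(template_path).pages:
        writer.add_page(page)
    writer.pages[0].merge_page(PdfReader(buffer).pages[0])
    with open(out_path, "wb") as out:
        writer.write(out)
```

The user can keep editing the static PDF freely; only the coordinates need updating if the layout moves.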
Look into docutils and reStructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py
I've used this; it creates very beautiful documents and the markup is extensible! Later you could take the same source and run it through rst2html to create a website out of it!
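As a sketch of how the dynamic bits could be filled in, here is a reST template rendered with the standard library's string.Template and then passed to the rst2pdf command. The template wording and field names are invented for the example, and rst2pdf has to be installed for the last step.

```python
import subprocess
from string import Template

# Invented template text; $date and $cost are the dynamic fields.
CONFIRMATION = Template("""\
Booking confirmation
====================

:Date: $date
:Cost: $cost

Thank you for booking with us.
""")


def render_rst(date, cost):
    """Fill the dynamic fields into the reST source."""
    return CONFIRMATION.substitute(date=date, cost=cost)


def build_pdf(rst_path, pdf_path):
    """Compile a reST file to PDF with the rst2pdf command-line tool."""
    subprocess.run(["rst2pdf", rst_path, "-o", pdf_path], check=True)
```

Write the output of render_rst to a file, then call build_pdf on it.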
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
You could generate a document through, for example, TeX, or OpenOffice, or whatever gives you the most comfortable bindings and then print the document with a pdf printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.