How to automate the conversion of PPT into PDFs? - python

I currently have a template for certificate in PPT format and wish to convert (or print) it to PDF.
I have a python list of people names that I am passing to python in the form of a list.
I need help with library and code to write a python script to automate this process by using a loop to edit the ppt with the names in the list and export the ppts to generate PDFs iteratively

You might be better off converting to PDF first with a placeholder text and then try to replace the text in the PDF using pdf-redactor: https://github.com/JoshData/pdf-redactor
A more reliable and future-proof option would be to replicate the template in Latex, where you can then very easily substitute the text.

Related

How to convert docx to pdf in python dash

I am trying to deploy a python dash app to Heroku server. I perform some calculations based on user input and I create a docx report (I have a specific .docx template and I fill it with data each time). My problem is that before sending that report to the user, I want to convert it to pdf. I was already using for offline use, docx2pdf however I do not know if heroku has word embedded. Apart from that, I do not want to save the .docx and then convert it to pdf and then delete that .docx. I want to do all the conversions when the files are in memory, not stored.
Can you suggest me any other way to do that?
I tried docx2pdf without success, and I do not want this approach either because I have to save the file.
I tried another solution that I found, conver the docx to html and then to pdf with pdfkit, but the conversion from docx to html really messed up the document.
I also tried several other minor libraries without success.

Python : Convert multiple images as multiple pages in pdf for windows

How to convert multiple images(jpeg) as a pdf file with multiple pages in windows.
Using Image library, i can convert every image as single pdf, i can merge those converted files to a single pdf file using pdfminer, but it is two way work.
I try to download MagicK, but couldn't get binary for windows. Is it possible to achieve using PIL ?
I'm not totally sure, but you can create a report with jasperReport and create a pdf file after. I believe python also can work with jasper reports.
what do you think? maybe is too much work.

Processing a PDF for information extraction

I am working on a project where I have a pdf file which describes one of the health policy. What I need to do is extract the information from this PDF and try to save it in some form such that I can answer the questions related to the policy by extracting info from this PDf.
This PDF is too big, so I want to divide the PDF according to the different sections so that when a query related to some particular area comes in then I wont have to go through the entire document.
I tried solving this using some pdf converters which converts the PDFs into the HTMLs. But these converters wont convert the PDF to HTML properly so that headings will have heading tag. Also even if I convert this properly and get the proper sections out of the document, I am not getting how to store this data.(I mean in which form should I store this Data).
Is there any other solution with which I can achieve this. I am using Python and also I can use NLTK if needed. Also the format is not fixed for the PDfs, I mean to say my code should work on any kind of PDFs.
PDFMiner is great in that it has location for every bit of text it gets from the PDF. It won't be nicely put in header tags or anything like that, but if you have a consistent PDF structure in your docs you might be able to get something working.

Create PDF from Word template in Python

I need to generate reports in python. The report needs to contain a header and footer on each page, some text that won't change, some text that's dynamic, and some charts.
I've created a template using Word, and I'm looking for a way of replacing placeholders such as [+my_placeholder+] with text content/charts/whatever.
Is there anything that can let me use Word documents (or something Word can write) as a template to create a PDF in Python? Since I've already created sample reports in Word I want to reuse what I've got instead of having to recreate them using ReportLab or HTML (I know about ReportLab, pyPDF and also xhtml2pdf).
about the first part, editing word templates, you can have a look at this question: Reading/Writing MS Word files in Python
There is a package for editing word files, called python-docx.
For converting word files into pdf documents, there should be a variety of different tools, including command line tools. So you could use python to edit your document and then use a converter tool that you call from your python code to create the pdf.

Creating a PDF from XML using XSL FO w/ Python

Could someone point me in the right direction of hopefully a library or code examples, any resources on how to take XML and create a PDF using XSL-FO in Python? If I should have to use an XML renderer, then which XML renderer is recommended?
If you want to run XSLT programmatically with Python, you want lxml.
However, if you just need to create a .fo file from a defined XSL/XML pair, you might as well just use xsltproc, which is available on any Unix-y system on the command line, including OS X.
Once you have the .fo file, use Fop to transform that to PDF.
You may want to try XHTML2PDF. It's very easy to use. Create your XHTML template. Use Jinja2 (or similar) to fill in the template. Convert to PDF.

Categories