Convert docx to PDF using Linux and Python - python

I am looking for a way to convert a docx to PDF using Python on Linux. So far, everything I have found that works only runs on Windows. Is there a way to do it in Python without using LibreOffice?

Related

Alternative ways to convert PPTX Template to PDF using Python

I have an automated Python script that edits a PPTX template and converts it to PDF. However, it only works in a Windows environment, because the conversion depends on win32com.client. I'll be porting the script to Linux, where that dependency will not work.
Any suggestions or workarounds for converting PPTX -> PDF in Python without using win32com or Aspose.Slides?
P.S. I'm also unable to install LibreOffice on Linux due to insufficient privileges.

How to convert a XLSX file to PDF with Python 2.7

I have an .xlsx file that I generate using xlsxwriter in a Python script (version 2.7). I am trying to find a way to convert a worksheet in the file to PDF. I have not yet found a module that suits my needs: simple, lightweight, and installable using pip.
Any suggestions, let's hear them! Thanks to all!
Try using a combination of openpyxl and PDFwriter, as shown in this example.

Clean up XML of a DOCX document with python / Linux binary

It could be some kind of question similar to this one
But the methods described there aren't applicable to my situation. I'm looking for a tool to use from Python, or just a standalone Linux binary. Everything I've found so far is Windows/MS Office-related :(
Is there any way to simply clean docx tags in Linux?
Thanks!
I've tried using headless LibreOffice as a converter from DOCX to DOCX, and it seemed to help in most cases.
libreoffice --headless --convert-to docx ./Copyright\ license.docx
Nevertheless, this way needs more testing.
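The same command can be driven from Python. The following is only a minimal sketch, assuming a libreoffice binary is on the PATH; the helper names (build_convert_command, convert) and the "cleaned" output directory are made up for illustration. Writing to a separate directory via --outdir avoids overwriting the input file when converting DOCX to DOCX.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="docx", outdir="cleaned",
                          binary="libreoffice"):
    """Return the argument list for a headless LibreOffice conversion.

    The helper name and the default output directory are assumptions
    made for this sketch, not part of LibreOffice itself.
    """
    return [
        binary,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format="docx", outdir="cleaned"):
    """Run the conversion; raises if LibreOffice is missing or fails."""
    if shutil.which("libreoffice") is None:
        raise RuntimeError("libreoffice was not found on PATH")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # Passing a list (not a shell string) means spaces in file names
    # need no escaping.
    subprocess.run(build_convert_command(source, target_format, outdir),
                   check=True)
```

For example, convert("Copyright license.docx") would write the re-saved file into ./cleaned, assuming LibreOffice is installed.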

how to convert text files to pdf files without reportlab in python?

I have a problem when using reportlab with py2exe. The script works normally under Python, but the reportlab modules raise many errors when I run the exe file built by py2exe. Can you suggest a library or some Python code to convert text files (with tables) to PDF format without using reportlab? Thanks.
I used pyPdf in the past, which is quite good for quick and dirty solutions, though I would hesitate before using it for large projects.

Converting a PDF to a series of images with Python

I'm attempting to use Python to convert a multi-page PDF into a series of JPEGs. I can split the PDF up into individual pages easily enough with available tools, but I haven't been able to find anything that can convert PDFs to images.
PIL does not work, as it can't read PDFs. The two options I've found are using either GhostScript or ImageMagick through the shell. This is not a viable option for me, since this program needs to be cross-platform, and I can't be sure either of those programs will be available on the machines it will be installed and used on.
Are there any Python libraries out there that can do this?
ImageMagick has Python bindings.
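Here's what worked for me using the Python ghostscript module (installed with '$ pip install ghostscript'):
import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    # For a multi-page PDF, put a page counter such as %03d in
    # jpeg_output_path; otherwise each page overwrites the previous one.
    args = ["pdf2jpeg",  # first entry is the program name; its value doesn't matter
            "-dNOPAUSE",  # don't pause for a prompt after each page
            "-sDEVICE=jpeg",  # render to JPEG
            "-r144",  # resolution in DPI
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)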
I also installed Ghostscript 9.18 on my computer and it probably wouldn't have worked otherwise.
You can't avoid the Ghostscript dependency. Even ImageMagick relies on Ghostscript for its PDF reading functions. The reason for this is the complexity of the PDF format: a PDF doesn't just contain bitmap information, but mostly vector shapes, transparencies, etc.
Furthermore, it is quite complex to figure out which of these objects appear on which page.
So the correct rendering of a PDF page is clearly out of scope for a pure Python library.
The good news is that Ghostscript is pre-installed on many Windows and Linux systems, because it is also needed by all those PDF printers (except Adobe Acrobat).
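Since Ghostscript may or may not be present on a given machine, a program can check for it at runtime before attempting the conversion. This is only a rough sketch: the function names are hypothetical, and the list of executable names (gs on Linux/macOS, gswin64c and gswin32c on Windows) is an assumption about typical installs. The flags mirror the ones used in the answer above, plus -dBATCH so that Ghostscript exits when it is done.

```python
import shutil
import subprocess

# Assumed executable names for typical Ghostscript installs.
GS_CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the path of the first Ghostscript binary on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def gs_jpeg_args(executable, pdf_path, output_pattern, dpi=144):
    """Build the command line that renders each PDF page to a JPEG."""
    return [
        executable,
        "-dNOPAUSE",                      # no prompt between pages
        "-dBATCH",                        # exit after the last page
        "-sDEVICE=jpeg",
        f"-r{dpi}",
        f"-sOutputFile={output_pattern}",  # e.g. page-%03d.jpg
        pdf_path,
    ]


def pdf_to_jpegs(pdf_path, output_pattern="page-%03d.jpg", dpi=144):
    """Render the PDF with Ghostscript, failing clearly if it is absent."""
    gs = find_ghostscript()
    if gs is None:
        raise RuntimeError("Ghostscript was not found on PATH")
    subprocess.run(gs_jpeg_args(gs, pdf_path, output_pattern, dpi),
                   check=True)
```

Checking up front lets the program report a clear error, or fall back to another tool, on machines where Ghostscript is missing.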
If you're using Linux, some distributions come with a command-line utility called 'pdftopbm' out of the box. Check out netpbm.
Perhaps relevant: http://www.swftools.org/gfx_tutorial.html
