Docx to pdf using pandoc in python


I am quite new to Python, so this may be a silly question, but I can't seem to find the solution anywhere.
I have a Django site that I am running locally on my machine, just for development.
On the site I want to convert a docx file to PDF, and I want to use pandoc to do this. I know there are other methods, such as online APIs or Python modules such as "docx2pdf". However, I want to use pandoc for deployment reasons.
I have installed pandoc from my terminal using brew install pandoc, so it should be installed correctly.
In my Django project I am doing:
import pypandoc
import docx
from django.http import FileResponse

def making_a_doc_function(request):
    doc = docx.Document()
    doc.add_heading("MY DOCUMENT")
    doc.save('thisisdoc.docx')
    pypandoc.convert_file('thisisdoc.docx', 'docx', outputfile="thisisdoc.pdf")
    pdf = open('thisisdoc.pdf', 'rb')
    response = FileResponse(pdf)
    return response
The docx file gets created with no problem, but no PDF is created. I am getting an error that says:
Pandoc died with exitcode "4" during conversion: b'cannot produce pdf output from docx\n'
Does anyone have any ideas?

The second argument to convert_file is the output format or, in this case, the format through which pandoc generates the PDF. Pandoc doesn't know how to produce a PDF through docx, hence the error.
Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.
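A minimal sketch of the corrected conversion step, using the 'latex' variant suggested above. The helper names pdf_name_for and convert_docx_to_pdf are hypothetical and only serve the illustration. It also assumes that pypandoc is installed and that a LaTeX engine is available, since pandoc builds PDFs through LaTeX by default.

```python
from pathlib import Path


def pdf_name_for(docx_path):
    """Return the PDF file name that sits next to the given docx file."""
    return str(Path(docx_path).with_suffix(".pdf"))


def convert_docx_to_pdf(docx_path):
    """Convert a docx file to PDF with pandoc and return the PDF path.

    Hypothetical helper: 'latex' is the intermediate format through which
    pandoc produces the PDF; the .pdf extension of outputfile tells pandoc
    to write a PDF.
    """
    # Imported lazily so that pdf_name_for can be used without pypandoc.
    import pypandoc

    pdf_path = pdf_name_for(docx_path)
    pypandoc.convert_file(docx_path, "latex", outputfile=pdf_path)
    return pdf_path
```

In the Django view, the returned path could then be opened in 'rb' mode and passed to FileResponse, as in the question.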

Related

Convert the docx file into pdf in python

I am working on a report generator. I installed python-docx with pip install python-docx and imported it with import docx.
I have created a new docx file and edited it, but I want to save it as a PDF instead of a docx file. The script will later be converted into an EXE file.
Please help.
(pip install python-docx)
from docx import Document
doc=Document()
doc.add_heading('Report', 0)
# Now to save file, I know to save in docx,
# But, I want to save in pdf
# I can not finish the program and then manually convert
# As this script will run as an
# **EXE**
doc.save('report.docx')
I tried saving with doc.save('report.pdf'), but it did not work.

I found something here: https://medium.com/analytics-vidhya/3-methods-to-convert-docx-files-into-pdf-files-using-python-b03bd6a56f45 I personally think the easiest way to do it is the docx2pdf module.

You can use the Python package docx2pdf*:
pip install docx2pdf
Then import and call the convert function after saving the document with doc.save('report.docx'):
from docx2pdf import convert
convert("report.docx", "report.pdf")
The docx file must be created before it can be converted.
* docx2pdf requires Microsoft Word to be installed, so it does not work on a Linux machine.

Try using the msoffice2pdf library using Microsoft Office or LibreOffice installed in the environment.
https://pypi.org/project/msoffice2pdf/

Converting (ideally) doc to pdf with python or docx to pdf, but I get error

I am working on iOS with Spyder (Anaconda), trying the following code to convert the docx files in a directory (folder_path):
from docx2pdf import convert
import os

no_pdfs = []
i = 1
for filename in os.listdir(os.path.normcase(folder_path)):
    filename = os.path.join(folder_path, filename)
    try:
        convert(filename, os.path.splitext(filename)[0] + '.pdf')
        print(f"DONE - {i}: {os.path.basename(filename)}")
        i += 1
    except Exception:
        no_pdfs.append(os.path.basename(filename))
print(no_pdfs)
I use try/except in my code because of the .DS_Store file that appears on iOS, and nothing happens.
If I call convert() directly, I get the error: ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html but I am not really able to understand what goes wrong.
Also, my initial files are not actually .docx but .doc, and I would really appreciate advice on how to convert doc to pdf, or doc to docx to pdf.
Any help will be much appreciated!

If you haven't resolved this yet, you could try installing Parallels Access from the Apple store on your iOS device, but it sounds like you just have to update your packages (pip install --upgrade PackageName). The code you provided might be working, but the error is being raised by the packages mentioned in the message. Also, with docx2pdf, have you installed Word on your device? From the creator of docx2pdf: "Unfortunately, it requires Microsoft Office to be installed and thus only works on Windows and macOS. – Al Johri"
Also, "An efficient way to convert document to pdf format" is worth reading.
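One way to avoid catching every exception just to skip .DS_Store is to select the Word files explicitly before converting. The sketch below assumes docx2pdf (and Microsoft Word) is installed. The helper names find_word_files and convert_folder are hypothetical, not part of docx2pdf.

```python
from pathlib import Path


def find_word_files(folder_path, suffixes=(".docx",)):
    """Return the Word files in folder_path, sorted, skipping hidden files
    such as .DS_Store and anything with a different extension."""
    folder = Path(folder_path)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix.lower() in suffixes
    )


def convert_folder(folder_path):
    """Convert every .docx file in folder_path and return the failures."""
    # Imported lazily so find_word_files can be used without docx2pdf.
    from docx2pdf import convert

    failed = []
    done = 0
    for src in find_word_files(folder_path):
        try:
            convert(str(src), str(src.with_suffix(".pdf")))
            done += 1
            print(f"DONE - {done}: {src.name}")
        except Exception as exc:
            # Keep the reason so real errors are not silently hidden.
            failed.append((src.name, repr(exc)))
    return failed
```

Recording the exception next to the file name makes it easier to tell a missing dependency apart from a file that genuinely cannot be converted.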

How to read '.doc' file with python-docx module

I'm trying to read a .doc file with the python-docx module.
I'm doing:
import docx
path = 'Sample-doc-file-100kb.doc'
doc = docx.Document(path)
#extracting texts from doc
This works fine for .docx, but for a .doc file it gives the error ValueError: file 'Sample-doc-file-100kb.doc' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml'.
I searched and found that the docx module doesn't work with the older .doc format. I also looked into converting doc to docx, but all the solutions are Windows-dependent.
I'm running this code on AWS Lambda, so I can't use those methods.
Is there any way to either convert doc to docx (platform independent) or read the .doc file?

"convert doc to docx (platform independent)"
If you are able to provide a working LibreOffice or OpenOffice installation, then you might try using unoconv to do the doc to docx conversion, as it "is a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export."
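As an alternative to unoconv, LibreOffice itself can be called in headless mode. The sketch below swaps in that technique and assumes a soffice executable is on PATH. The helper names build_soffice_cmd and convert_with_libreoffice are hypothetical, and whether LibreOffice can be made available in your environment is a separate question.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_cmd(src, target_format="docx", outdir=None, executable="soffice"):
    """Build the LibreOffice command line for a headless conversion."""
    src = Path(src)
    # By default, write the result next to the source file.
    outdir = Path(outdir) if outdir is not None else src.parent
    return [
        executable,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(src),
    ]


def convert_with_libreoffice(src, target_format="docx", outdir=None):
    """Run the conversion; raises if soffice is missing or exits with an error."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice was not found on PATH")
    subprocess.run(build_soffice_cmd(src, target_format, outdir), check=True)
```

The same call with target_format="pdf" would go straight from doc to PDF, which may make the intermediate docx unnecessary.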

On Ubuntu, you can install antiword, which extracts the text from .doc files, with this command:
apt-get install antiword

X11 Tkinter + PIL + py2app = IOError cannot identify image file

I have a problem with a python program (python 2.7.3, X11 Tkinter, py2app 0.6.4, MacOS X 10.7.4) that I'm trying to export to py2app. The problem only started occurring in the standalone py2app-ified app version of the program. When I run the python source file from which the app was created, the problem does not exist, so I feel it must have something to do with the py2app export.
The problem: When I start the GUI, the first time I try to load a valid image file, the image fails to load, and I get the following error from the PIL Image module:
File "Image.pyc", line 1980, in open
IOError: cannot identify image file
When I then (without closing the GUI or anything) try to open the exact same file, it loads perfectly, no errors or problems. This happens every time, with any image file I try - the first attempt to load fails, subsequent attempts succeed. I should add that after that first error, no image files ever fail to load - even if they are different from the first one.
A few notes:
- The image file is a sequence, and is very large (around 300 MB), so to speed up the loading process, I use a mmap. I have tried removing the mmap step and handing a regular file object directly to ImagePIL.open, and the problem is unaffected.
- I also tried seeking to the beginning of the file before giving it to ImagePIL.open, but that had no effect.
- The py2app setup file is pretty vanilla - it just includes a few config files and an icon.
Here is the relevant part of the offending image load function:
import Image as ImagePIL
import mmap as m
...
...
def loadImage(self):
    errorLog.debug("Attempting to open image \""+self.filenameVar.get()+"\"")
    try:
        if self.fileMap is not None:
            self.fileMap.close()
        imageFile = open(self.filenameVar.get(), 'r')
        self.fileMap = m.mmap(imageFile.fileno(), 0, prot=m.PROT_READ)
        # self.fileMap.seek(0)
        self.imageSeries = ImagePIL.open(self.fileMap)
        imageFile.close()
    except IOError:
        errorLog.exception("Failed to open image \""+self.filenameVar.get()+"\"")
        return
I'm pretty stumped - any ideas? Thanks in advance!
Edit: I should add that Tkinter, PIL, and py2app were installed using MacPorts 2.1.2, on the off chance that helps.

It seems that py2app does not include PIL's image plugins into the application bundle even though one of the py2app recipes tries to ensure that they are included.
One thing you could try is to build with "python setup.py py2app --packages=PIL" and then use "import PIL.Image as ImagePIL" to use it.
I don't understand yet why the PIL recipe doesn't work, it might be something in the way MacPorts builds python packages (I don't use MacPorts myself).

The problem is the result of an inconsistency between Pillow version 3.0.0 and py2app.
I suggest two workarounds:
- Use OpenCV instead of PIL.
- Uninstall the current version of Pillow and install an earlier one, such as 1.7.8.

python wkhtmltopdf to Generate pdf

I am able to generate the PDF using the wkhtmltopdf command line, but when I use it through the Python library:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf(
    url='http://www.wikipedia.org',
    output_file='a.pdf',
)
I get:
'Exception: Missing url and output file arguments'

I think there is an issue with the current version. I had the same issues, and if you look at their Github issues page, someone posted the same issue two days ago.
This should have worked also, according to their documentation:
python -m wkhtmltopdf.main google.com ~/google.pdf
But instead I get:
optparse.OptionConflictError: option -h/--header-html: conflicting option string(s): -h
Since it's a wrapper, I'm guessing the underlying application was updated, but the wrapper has not been.

The problem is caused by typos and a rewritten API in wkhtmltopdf/main.py.
Right now the API is:
from wkhtmltopdf import WKhtmlToPdf
wkhtmltopdf = WKhtmlToPdf('http://www.wikipedia.org','out.pdf')
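Since the command-line tool itself works for the asker, another option is to bypass the wrapper and call the wkhtmltopdf executable through subprocess. This is a sketch under that assumption. The helper names build_wkhtmltopdf_cmd and render_pdf are hypothetical and not part of the wkhtmltopdf Python package.

```python
import shutil
import subprocess


def build_wkhtmltopdf_cmd(url, output_file, executable="wkhtmltopdf"):
    """Build the basic command line: wkhtmltopdf <url> <output file>."""
    return [executable, url, output_file]


def render_pdf(url, output_file):
    """Render url to output_file; raises if the tool is missing or fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    subprocess.run(build_wkhtmltopdf_cmd(url, output_file), check=True)
    return output_file
```

Calling render_pdf('http://www.wikipedia.org', 'a.pdf') would mirror the example from the question without depending on the wrapper's changing API.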
