Convert the docx file into pdf in python - python

I am working on a report generator. I installed python-docx with pip install python-docx and
imported docx.
I have created a new docx file and edited it, but I want to save it as a PDF instead of a docx file. The script will also be converted into an EXE file.
Please help.
(pip install python-docx)
from docx import Document
doc=Document()
doc.add_heading('Report', 0)
# Now to save file, I know to save in docx,
# But, I want to save in pdf
# I can not finish the program and then manually convert
# As this script will run as an
# **EXE**
doc.save('report.docx')
I tried saving with doc.save('report.pdf'), but it did not work.

I found something here: https://medium.com/analytics-vidhya/3-methods-to-convert-docx-files-into-pdf-files-using-python-b03bd6a56f45 I personally think the easiest way to do it is the docx2pdf module.

You can use the Python package docx2pdf:
pip install docx2pdf
Then import and call the convert function after saving the document with doc.save('report.docx'):
from docx2pdf import convert
convert("report.docx", "report.pdf")
The docx file must exist before you convert it.
Note that docx2pdf requires Microsoft Word to be installed, so it does not work on a Linux machine.

Try the msoffice2pdf library, which uses the Microsoft Office or LibreOffice installation available in the environment.
https://pypi.org/project/msoffice2pdf/
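If LibreOffice is the installed office suite, its command-line converter can also be called directly. The following is only a sketch under that assumption: the helper names build_soffice_command and convert_with_libreoffice are made up for illustration, and the executable is assumed to be reachable on PATH as soffice or libreoffice.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless LibreOffice PDF conversion."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_with_libreoffice(docx_path, out_dir="."):
    """Convert docx_path to PDF inside out_dir and return the expected PDF path."""
    # Assumption: the executable is named soffice or libreoffice and is on PATH.
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice executable not found on PATH")
    subprocess.run(build_soffice_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

In the report generator this would be called right after doc.save('report.docx'), for example convert_with_libreoffice('report.docx'). The target machine still needs LibreOffice installed, even when the script is packaged as an EXE.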

Related

Converting (ideally) doc to pdf with python or docx to pdf, but I get error

I am working on iOS with Spyder (Anaconda), trying the following code to convert the docx files in a directory (folder_path):
from docx2pdf import convert
import os
no_pdfs = []
i=1
for filename in os.listdir(os.path.normcase(folder_path)):
    filename = os.path.join(folder_path, filename)
    try:
        convert(filename, os.path.splitext(filename)[0]+'.pdf')
        print(f"DONE - {i}: {os.path.basename(filename)}")
        i += 1
    except Exception:
        no_pdfs.append(os.path.basename(filename))
print(no_pdfs)
I use try/except in my code because of the .DS_Store file that appears on iOS, yet nothing happens.
If I call convert() directly, I get the error: ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html but I cannot work out what is going wrong.
One more thing: my initial files are actually .doc, not .docx, so I would really like some advice on how to convert doc to pdf, or doc to docx to pdf.
Any help will be much appreciated!
If you haven't resolved this yet, you could try installing Parallels Access from the Apple store on your iOS device, but it sounds like you just have to update your packages (pip install --upgrade PackageName). The code you provided might be working, but the error is being raised by the packages named in the message. Also, with docx2pdf, have you installed Word on your device? From the creator of docx2pdf: "Unfortunately, it requires Microsoft Office to be installed and thus only works on Windows and macOS. – Al Johri"
Also, An efficient way to convert document to pdf format is worth reading.
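As a side note on the loop in the question: a bare except hides the real error for every file, and .DS_Store gets passed to the converter like any other entry. A possible sketch is to filter the directory listing first and keep the error messages. The helper names find_docx_files and convert_folder are hypothetical, and convert_one stands for whatever converter is used (for example convert from docx2pdf).

```python
import os


def find_docx_files(folder_path):
    """Return sorted full paths of the .docx files directly inside folder_path."""
    paths = []
    for name in sorted(os.listdir(folder_path)):
        # Skip hidden files such as .DS_Store and Word lock files like ~$report.docx.
        if name.startswith(".") or name.startswith("~$"):
            continue
        if name.lower().endswith(".docx"):
            paths.append(os.path.join(folder_path, name))
    return paths


def convert_folder(folder_path, convert_one):
    """Call convert_one(src, dst) for each .docx file and return (done, failed)."""
    done, failed = [], []
    for src in find_docx_files(folder_path):
        dst = os.path.splitext(src)[0] + ".pdf"
        try:
            convert_one(src, dst)
            done.append(os.path.basename(src))
        except Exception as exc:
            # Keep the message so failures are visible instead of silent.
            failed.append((os.path.basename(src), str(exc)))
    return done, failed
```

With docx2pdf installed, this could be used as done, failed = convert_folder(folder_path, convert), and printing failed then shows why each file was skipped.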

How to read '.doc' file with python-docx module

I'm trying to read a .doc file with the python-docx module.
I'm doing:
import docx
path = 'Sample-doc-file-100kb.doc'
doc = docx.Document(path)
#extracting texts from doc
This works fine for .docx but gives ValueError: file 'Sample-doc-file-100kb.doc' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml' error for .doc file.
I searched and found that this docx module doesn't work with the older .doc file format. I also looked into converting doc to docx, but all the solutions are Windows dependent.
I'm running this code on AWS Lambda, so I can't use those methods.
Is there any way either to convert doc to docx (platform independent) or to read the .doc file directly?
Regarding "convert to doc to docx (platform independent)":
If you are able to provide a working LibreOffice or OpenOffice installation, then you might try using unoconv for the doc to docx conversion, as it "is a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export."
On Ubuntu, antiword can be installed with this command:
apt-get install antiword
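Before choosing a tool, it can help to check what the file really is, because the extension is not always reliable. This is a rough sketch only: sniff_word_format is a made-up helper that looks at the file signature, and a ZIP signature shows only that the file is a ZIP container (which is what .docx uses), not that it is a valid Word document.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Office container (.doc, .xls, .ppt)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP container, used by .docx


def sniff_word_format(path):
    """Guess 'doc', 'docx' or 'unknown' from the first bytes of the file."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(OLE2_MAGIC):
        return "doc"
    if header.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"
```

A file reported as 'doc' would need conversion (or a reader such as antiword) before python-docx can open it, while 'docx' can be passed to docx.Document directly.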

Docx to pdf using pandoc in python

I am quite new to Python, so this may be a silly question, but I can't seem to find the solution anywhere.
I have a Django site that I am running locally on my machine just for development.
On the site I want to convert a docx file to pdf, and I want to use pandoc to do this. I know there are other methods, such as online APIs or Python modules such as "docx2pdf". However, I want to use pandoc for deployment reasons.
I have installed pandoc from my terminal using brew install pandoc,
so it should be installed correctly.
In my Django project I am doing:
import pypandoc
import docx
from django.http import FileResponse
def making_a_doc_function(request):
    doc = docx.Document()
    doc.add_heading("MY DOCUMENT")
    doc.save('thisisdoc.docx')
    pypandoc.convert_file('thisisdoc.docx', 'docx', outputfile="thisisdoc.pdf")
    pdf = open('thisisdoc.pdf', 'rb')
    response = FileResponse(pdf)
    return response
The docx file gets created without a problem, but no pdf is created. I am getting an error that says:
Pandoc died with exitcode "4" during conversion: b'cannot produce pdf output from docx\n'
Does anyone have any ideas?
The second argument to convert_file is output format, or, in this case, the format through which pandoc generates the pdf. Pandoc doesn't know how to produce a PDF through docx, hence the error.
Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.
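For comparison, the same conversion can be run through the pandoc command line without pypandoc. This is a sketch, not a drop-in replacement for the view: build_pandoc_command and docx_to_pdf are hypothetical helper names, pandoc is assumed to be on PATH, and PDF output additionally needs a PDF engine (a LaTeX installation by default).

```python
import subprocess


def build_pandoc_command(src, dst, pdf_engine=None):
    """Return the pandoc argument list for converting src to dst."""
    # pandoc picks the output format from the extension given to -o.
    cmd = ["pandoc", str(src), "-o", str(dst)]
    if pdf_engine:
        # Optional: choose the engine explicitly, e.g. "xelatex".
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def docx_to_pdf(src, dst, pdf_engine=None):
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(src, dst, pdf_engine), check=True)
```

In the view this would replace the convert_file line with docx_to_pdf('thisisdoc.docx', 'thisisdoc.pdf'). If pandoc reports a missing pdflatex, the cause is the absent PDF engine rather than the Python code.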

Can I update a part of HDF5 file using h5py python library?

I can't find any command at http://docs.h5py.org/en/latest/high/file.html to do that.

Is there a faster method to load a yaml file than the standard .load method? Django/Python

I am loading a big yaml file and it is taking forever. I am wondering whether there is a faster method than yaml.load().
I have read that there is a CLoader class, but I haven't been able to run it.
The website that suggested CLoader asks me to do this:
Download the source package PyYAML-3.08.tar.gz and unpack it.
Go to the directory PyYAML-3.08 and run:
$ python setup.py install
If you want to use LibYAML bindings, which are much faster than the pure Python version, you need to download and install LibYAML.
Then you may build and install the bindings by executing
$ python setup.py --with-libyaml install
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter:
from yaml import load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
This looks like it will work, but I don't have a setup.py file anywhere in my Django project and therefore can't install or import any of these things.
Can anyone help me figure out how to do this, or let me know about another, faster loading method?
Thanks for the help!!
I have no idea what's faster; bspymaster's ideas might be the most useful.
When you download PyYAML-3.08.tar.gz, there is a setup.py inside the archive that you can run.
Note that to use LibYAML, you need to download this: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
Then build it using the instructions from http://pyyaml.org/wiki/LibYAML
You will need a set of build tools, which should already be installed on Linux/Unix. For OS X, make sure Xcode is installed. I'm not sure about Windows.
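Once PyYAML is installed, the fallback import from the question can be wrapped in a small loading function. This is a minimal sketch and assumes PyYAML is importable. It uses the CSafeLoader/SafeLoader pair rather than the CLoader/Loader pair quoted above, which is a deliberate substitution for untrusted input, and load_yaml_text is just an illustrative name.

```python
import yaml

try:
    # Present only when PyYAML was built with the LibYAML bindings.
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    # Pure-Python fallback: same results, slower parsing.
    from yaml import SafeLoader


def load_yaml_text(text):
    """Parse YAML text with the fastest safe loader that is available."""
    return yaml.load(text, Loader=SafeLoader)
```

Whether the C classes are available depends on how PyYAML was installed, so checking SafeLoader.__name__ is a quick way to see which loader was picked.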
