Converting rtf to pdf using python - python

I am new to Python and have been given the task of converting RTF files to PDF using Python. I googled and found some code (not exactly RTF to PDF), tried it, and adapted it to my requirement, but I cannot get it to work.
I have used the code below:
import sys
import os
import comtypes.client
#import win32com.client
rtfFormatPDF = 17
in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])
rtf= comtypes.client.CreateObject('Rtf.Application')
rtf.Visible = True
doc = rtf.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=rtfFormatPDF)
doc.Close()
rtf.Quit()
But it throws the error below:
Traceback (most recent call last):
File "C:/Python34/Lib/idlelib/rtf_to_pdf.py", line 12, in <module>
word = comtypes.client.CreateObject('Rtf.Application')
File "C:\Python34\lib\site-packages\comtypes\client\__init__.py", line 227, in CreateObject
clsid = comtypes.GUID.from_progid(progid)
File "C:\Python34\lib\site-packages\comtypes\GUID.py", line 78, in from_progid
_CLSIDFromProgID(str(progid), byref(inst))
File "_ctypes/callproc.c", line 920, in GetResult
OSError: [WinError -2147221005] Invalid class string
Can anyone help me with this?
I would really appreciate it if someone could suggest a better and faster way of doing this. I have around 200,000 files to convert.
Anisha

I used Marks's advice and changed it back to Word.Application, with my source pointing to the RTF files. It works perfectly! The process was slow, but still faster than the Java application my team was using. I have attached the final code to my question.
Final code, which works with the Word application:
import os
import comtypes.client

wdFormatPDF = 17  # Word's constant for the PDF file format
input_dir = 'input directory'
output_dir = 'output directory'

for subdir, dirs, files in os.walk(input_dir):
    for file in files:
        in_file = os.path.join(subdir, file)
        # Build the output name from the input name without its extension
        output_file = os.path.splitext(file)[0]
        out_file = os.path.join(output_dir, output_file + '.pdf')
        word = comtypes.client.CreateObject('Word.Application')
        doc = word.Documents.Open(in_file)
        doc.SaveAs(out_file, FileFormat=wdFormatPDF)
        doc.Close()
        word.Quit()
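Since the loop above starts and quits Word for every file, one possible speed-up for a large batch is to share a single Word instance. The following is only a sketch under that assumption: plan_conversions and convert_all are hypothetical helper names, the ".rtf" filter is an assumption about the input files, and the COM calls are the same ones used in the final code above. It needs Windows with Word installed to actually convert anything.

```python
import os


def plan_conversions(input_dir, output_dir, extension=".rtf"):
    """Yield (absolute input path, absolute output path) pairs for matching files."""
    for subdir, _dirs, files in os.walk(input_dir):
        for name in sorted(files):
            if name.lower().endswith(extension):
                stem = os.path.splitext(name)[0]
                yield (
                    os.path.abspath(os.path.join(subdir, name)),
                    os.path.abspath(os.path.join(output_dir, stem + ".pdf")),
                )


def convert_all(input_dir, output_dir):
    """Convert every planned file with one shared Word instance."""
    # Imported here so that plan_conversions can be used without comtypes.
    import comtypes.client

    wd_format_pdf = 17  # same constant as wdFormatPDF above
    word = comtypes.client.CreateObject("Word.Application")
    try:
        for in_file, out_file in plan_conversions(input_dir, output_dir):
            doc = word.Documents.Open(in_file)
            try:
                doc.SaveAs(out_file, FileFormat=wd_format_pdf)
            finally:
                doc.Close()  # close the document even if SaveAs fails
    finally:
        word.Quit()  # quit Word once, after the whole batch
```

Calling convert_all('input directory', 'output directory') would then process the whole tree in one Word session. Note that files with the same name in different subfolders map to the same output path, as in the original loop.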

If you have LibreOffice installed on your system, this is the best solution I have found.
import os
os.system('soffice --headless --convert-to pdf filename.rtf')
# os.system('libreoffice --headless --convert-to pdf filename.rtf')
# os.system('libreoffice6.3 --headless --convert-to pdf filename.rtf')
The command name may vary between versions and platforms, but this is the best solution I have come across.
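The same command can also be run through subprocess, which makes failures visible instead of silently ignoring the exit code. This is a minimal sketch, assuming the soffice binary is on the PATH; build_command and convert are hypothetical helper names, and --outdir is used to choose where the PDF is written.

```python
import subprocess
from pathlib import Path


def build_command(rtf_path, out_dir, binary="soffice"):
    """Return the argument list for one headless conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(rtf_path),
    ]


def convert(rtf_path, out_dir, binary="soffice"):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(rtf_path, out_dir, binary), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(rtf_path).stem + ".pdf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.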

Related

I want to open a JSON file in Python but got an error saying "No such file or directory"

I wrote the code like this:
intents = json.loads(open('intents.json').read())
Check that your intents.json file is in the same folder as your Python file.
You can use, for example, the built-in os module to check whether the file exists, and os.path for path manipulation. See the official documentation at https://docs.python.org/3/library/os.path.html
import os

file = 'intents.json'
# absolute path of the current working directory
w_dir = os.path.abspath('.')
if os.path.isfile(os.path.join(w_dir, file)):
    with open(file, 'r') as fd:
        fd.read()
else:
    print('Such file does not exist here "{}"...'.format(w_dir))
You can try opening the file with a normal file operation and then use json.load (for a file object) or json.loads (for a string) to parse the data as needed. I am not familiar with the one-line form in your question, so I would split it into two steps.
You can open the file like this:
f = open(file_name)
Then parse the data:
data = json.load(f)
You can refer to this link for more info and reference
https://www.geeksforgeeks.org/read-json-file-using-python/
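Putting the two suggestions together, a small helper can check for the file first and report which directory it looked in. This is only a sketch: load_intents and its base_dir parameter are hypothetical names, not part of the json module.

```python
import json
from pathlib import Path


def load_intents(filename="intents.json", base_dir=None):
    """Load a JSON file from base_dir (default: the current working directory)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = base / filename
    if not path.is_file():
        # Say where we looked, since a wrong working directory is the usual cause.
        raise FileNotFoundError(f"{path} not found; current directory is {Path.cwd()}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

To load the file relative to the script rather than the working directory, pass base_dir=Path(__file__).parent.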

Getting Assertion error while reading the PDF file python - pypdf2

I am getting the below error when I try to read a PDF file.
Code:
from PyPDF2 import PdfFileReader
import os
os.chdir("Path to dir")
pdf_document = 'sample.pdf'
pdf = PdfFileReader(pdf_document,'rb') #Error here
Error:
Traceback (most recent call last):
File "/home/krishna/PycharmProjects/sample/sample.py", line 9, in <module>
pdf = PdfFileReader(filehandle)
File "/home/krishna/PycharmProjects/AI_DRC/venv/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1084, in __init__
self.read(stream)
File "/home/krishna/PycharmProjects/AI_DRC/venv/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1838, in read
assert start >= last_end
AssertionError
NOTE: File is 18 MB in size
I wrote this and it works for me. The PDF is in the same folder as the script; you can also build the path as a string with os.
import PyPDF2
# Open the file by name; a path built with os works as well
pdf_file = PyPDF2.PdfFileReader("Sample.pdf")
# Get the first page (index zero) and extract its text
page_content = pdf_file.getPage(0).extractText()
print(page_content)  # you can then do whatever you want with it
I think the problem with your program is the "rb" argument. That is a mode for the built-in open(); the second positional parameter of PdfFileReader is strict, not a file mode, so passing the file name alone is enough. PyPDF2 provides the classes PdfFileReader, PdfFileWriter and PdfFileMerger for working with PDF files.
Hope it helps.
If you encounter any problem, just mention it and I will try to get back to you.

.doc to pdf using python errors

I'm trying to convert Word documents to PDF using Python. I am currently on Python 3.8.
".doc to pdf using python"
import sys
import os
import comtypes.client
wdFormatPDF = 17
in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])
word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
The above code answers my question, but it works only on my machine. When I deploy it onto a VM and run it remotely using a scheduler, I get the error below:
File "File Name", line 10, in <module>
word = comtypes.client.CreateObject('Word.Application')
File "Path", init__.py",line1225, in CoCreateInstance
----------------------------------------------------
File"_ctypes/callproc.c",line 930, in getResult
PermissionError:[WinError -2147024891] Access is denied
FYI, Word is installed on the VM.
Thanks

fileinput error (WinError 32) when replacing string in a file with Python

After searching for a long time, I still can't find an answer to my problem (I am new to Python).
Here is what I'm trying to do :
Prompt the user to insert an nlps version and ncs version (both are just server builds)
Get all the filenames ending in .properties in a specified folder
Read those files to find the old nlps and ncs version
Replace, in the same .properties files, the old nlps and ncs versions by the ones given by the user
Here is my code so far :
import glob, os
import fileinput

nlpsversion = str(input("NLPS Version : "))
ncsversion = str(input("NCS Version : "))
directory = "C:/Users/x/Documents/Python_Test"

def getfilenames():
    filenames = []
    os.chdir(directory)
    for file in glob.glob("*.properties"):
        filenames.append(file)
    return filenames

properties_files = getfilenames()

def replaceversions():
    nlpskeyword = "NlpsVersion"
    ncskeyword = "NcsVersion"
    for i in properties_files:
        searchfile = open(i, "r")
        for line in searchfile:
            if line.startswith(nlpskeyword):
                old_nlpsversion = str(line.split("=")[1])
            if line.startswith(ncskeyword):
                old_ncsversion = str(line.split("=")[1])
        for line in fileinput.FileInput(i,inplace=1):
            print(line.replace(old_nlpsversion, nlpsVersion))

replaceversions()
In the .properties files, the versions would be written like :
NlpsVersion=6.3.107.3
NcsVersion=6.4.000.29
I am able to get old_nlpsversion and old_ncsversion to be 6.3.107.3 and 6.4.000.29. The problem occurs when I try to replace the old versions with the ones the user entered. I get the following error:
C:\Users\X\Documents\Python_Test>python replace.py
NLPS Version : 15
NCS Version : 16
Traceback (most recent call last):
File "replace.py", line 43, in <module>
replaceversions()
File "replace.py", line 35, in replaceversions
for line in fileinput.FileInput(i,inplace=1):
File "C:\Users\X\AppData\Local\Programs\Python\Python36-32\lib\fileinput.py", line 250, in __next__
line = self._readline()
File "C:\Users\X\AppData\Local\Programs\Python\Python36-32\lib\fileinput.py", line 337, in _readline
os.rename(self._filename, self._backupfilename)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'test.properties' -> 'test.properties.bak'
It may be that my own process is the one using the file, but I can't figure out how to replace the versions in the same file without an error. I've tried figuring it out myself; there are a lot of threads and resources out there on replacing strings in files, and I tried all of them, but none of them really worked for me (as I said, I'm new to Python, so excuse my lack of knowledge).
Any suggestions/help is very welcome.
You are not releasing the file. You open it read-only and then attempt to write to it while it is still open. A better construct is the with statement, which closes the file for you. You are also playing fast and loose with your variable scope, and watch the case of your variable names (nlpsversion versus nlpsVersion). fileinput may be a bit of overkill for what you are trying to do.
import glob, os

def getfilenames(directory):
    filenames = []
    os.chdir(directory)
    for file in glob.glob("*.properties"):
        filenames.append(file)
    return filenames

def replaceversions(properties_files, nlpsversion, ncsversion):
    nlpskeyword = "NlpsVersion"
    ncskeyword = "NcsVersion"
    for i in properties_files:
        with open(i, "r") as searchfile:
            lines = []
            for line in searchfile:  # read every line
                if line.startswith(nlpskeyword):  # update the nlpsversion
                    old_nlpsversion = str(line.split("=")[1].strip())
                    line = line.replace(old_nlpsversion, nlpsversion)
                if line.startswith(ncskeyword):  # update the ncsversion
                    old_ncsversion = str(line.split("=")[1].strip())
                    line = line.replace(old_ncsversion, ncsversion)
                lines.append(line)  # store changed and unchanged lines
        # At the end of the with block, Python closes the file.
        # Now write the modified information back to the file.
        with open(i, "w") as outfile:  # file opened for writing
            for line in lines:
                outfile.write(line)  # each line already carries its own newline
        # At the end of the with block, Python closes the file.

if __name__ == '__main__':
    nlpsversion = str(input("NLPS Version : "))
    ncsversion = str(input("NCS Version : "))
    directory = "C:/Users/x/Documents/Python_Test"
    properties_files = getfilenames(directory)
    replaceversions(properties_files, nlpsversion, ncsversion)
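As a variation on the same read-then-write idea, the replacement can be done with a regular expression that rewrites whatever follows the key, so the old version never needs to be looked up. This is a sketch, not the approach from the answer above: replace_versions is a hypothetical name, and it assumes the keys appear at the start of a line in the form NlpsVersion=... and NcsVersion=... as shown in the question.

```python
import re
from pathlib import Path


def replace_versions(directory, nlps_version, ncs_version):
    """Rewrite NlpsVersion/NcsVersion values in every .properties file in directory."""
    new_values = {"NlpsVersion": nlps_version, "NcsVersion": ncs_version}
    # Match "<key>=<rest of line>" at the start of a line and capture the key.
    pattern = re.compile(r"^(NlpsVersion|NcsVersion)=.*$", re.MULTILINE)
    for path in Path(directory).glob("*.properties"):
        # read_text() opens and closes the file before we write to it.
        text = path.read_text()
        updated = pattern.sub(
            lambda m: f"{m.group(1)}={new_values[m.group(1)]}", text
        )
        path.write_text(updated)
```

Because each file is fully read and closed before it is rewritten, nothing holds it open while the new content is written.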

Python: Creating and writing to a file error

I am having trouble creating and writing to a text file in Python. I am running Python 3.5.1 and have the following code to try and create and write to a file:
from os import *
custom_path = "MyDirectory/"
if not path.exists(custom_path):
    mkdir(custom_path)
text_path = custom_path + "MyTextFile.txt"
text_file = open(text_path, "w")
text_file.write("my text")
But I get a TypeError saying an integer is required (got type str) at the line text_file = open(text_path, "w").
I don't know what I'm doing wrong as my code is just about identical to that of several tutorial sites showing how to create and write to files.
Also, does the above code create the text file if it doesn't exist, and if not how do I create it?
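Please don't import everything from the os module:
from os import path, mkdir
custom_path = "MyDirectory/"
if not path.exists(custom_path):
    mkdir(custom_path)
text_path = custom_path + "MyTextFile.txt"
text_file = open(text_path, 'w')
text_file.write("my text")
text_file.close()
This is because the os module also has an open function, and from os import * replaces the built-in open with it. os.open expects integer flags as its second argument, which is why passing "w" raises the TypeError. And yes, opening a file in "w" mode creates it if it does not exist.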
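The same steps can be wrapped in a small function that avoids the name clash by importing os as a module and lets a with block close the file. This is a minimal sketch; write_text_file is a hypothetical helper name.

```python
import os


def write_text_file(directory, filename, text):
    """Create directory if needed, then write text to filename inside it."""
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    file_path = os.path.join(directory, filename)
    # "w" creates the file when it is missing and truncates it otherwise;
    # the with block closes it even if write() raises.
    with open(file_path, "w") as text_file:
        text_file.write(text)
    return file_path
```

For example, write_text_file("MyDirectory", "MyTextFile.txt", "my text") covers the case from the question.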
