Converting xls to pdf - python

I'm trying to execute a script to convert a XLS file to a PDF file:
from win32com import client
import win32api
input_file = r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\attachments\75090058\Status\Verwerking\25063599.xls"
#give your file name with valid path
output_file = r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\attachments\75090058\Status\Verwerking\test.pdf"
#give valid output file name and path
app = client.DispatchEx("Excel.Application")
app.Interactive = False
app.Visible = False
Workbook = app.Workbooks.Open(input_file)
try:
Workbook.ActiveSheet.ExportAsFixedFormat(0, output_file)
except Exception as e:
print("Failed to convert in PDF format.Please confirm environment meets all the requirements and try again")
print(str(e))
finally:
Workbook.Close()
app.Exit()
However, getting the following error:
File "xls.py", line 9, in <module>
app = client.DispatchEx("Excel.Application")
File "C:\Python38\lib\site-packages\win32com\client\__init__.py", line 113, in DispatchEx
dispatch = pythoncom.CoCreateInstanceEx(clsid, None, clsctx, serverInfo, (pythoncom.IID_IDispatch,))[0]
pywintypes.com_error: (-2147024894, 'The system cannot find the file specified.', None, None)
I've closed Excel.
Please help

This seems to be a problem caused by the fact, that the referenced COM-Interface cannot be found. Normally you have to make sure, that the referenced Office-Tools are installed correctly or that the reference OCS-module is registered correctly.
Maybe you could try to use the pandas lib instead of COM? See enter link description here

Related

Deep Learning model on M1 chip(TensorFlow and Keras) [duplicate]

I am trying to read a png file into a python-flask application running in docker and am getting an error that says
ValueError: Could not find a format to read the specified file in mode
'i'
i have uploaded a file using an HTML file and now i am trying to read it for further processing. i see that scipy.misc.imread is deprecated and i am trying to replace this with imageio.imread
if request.method=='POST':
file = request.files['image']
if not file:
return render_template('index.html', label="No file")
#img = misc.imread(file)
img = imageio.imread(file)
i get this error :
File "./appimclass.py", line 34, in make_prediction
img = imageio.imread(file)
File "/usr/local/lib/python3.6/site-packages/imageio/core/functions.py", line 221, in imread
reader = read(uri, format, "i", **kwargs)
File "/usr/local/lib/python3.6/site-packages/imageio/core/functions.py", line 139, in get_reader
"Could not find a format to read the specified file " "in mode %r" % mode
Different, but in case helpful. I had an identical error in a different library (skimage), and the solution was to add an extra 'plugin' parameter like so -
image = io.imread(filename,plugin='matplotlib')
Had the exact same problem recently, and the issue was a single corrupt file. Best is to use something like PIL to check for bad files.
import os
from os import listdir
from PIL import Image
dir_path = "/path/"
for filename in listdir(dir_path):
if filename.endswith('.jpg'):
try:
img = Image.open(dir_path+"\\"+filename) # open the image file
img.verify() # verify that it is, in fact an image
except (IOError, SyntaxError) as e:
print('Bad file:', filename)
#os.remove(dir_path+"\\"+filename) (Maybe)
I had this problem today, and found that if I closed the file before reading it into imageio the problem went away.
Error was:
File "/home/vinny/pvenvs/chess/lib/python3.6/site-packages/imageio/core/functions.py", line 139, in get_reader "Could not find a format to read the specified file " "in mode %r" % mode ValueError: Could not find a format to read the specified file in mode 'i'
Solution:
Put file.close() before images.append(imageio.imread(filename)), not after.
Add the option "pilmode":
imageio.imread(filename,pilmode="RGB")
It worked for me.
I encountered the same error, and at last, I found it was because the picture was damaged.
I had accidentally saved some images as PDF, so the error occurred. resolved after deleting those incompatible format images.

Getting Python error -->PermissionError: [WinError 32] The process cannot access the file because it is being used by another process

I'm trying to read a csv file from local path using python & then I process the records of csv file into json structure , finally printing them on console. I have written the code within try & except block.I'm expecting that if any exception happens in the try block while reading the data from csv file, the except block should print that exception has occured & it should move the csv file from current location to folder called errored. But while testing via simulating the errored scenario ,it is unable to move the csv in errored folder.Instead it throws error:- "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process".Below is the code:-
try:
global df
df = pd.read_csv('CBD_BU_FULL.csv', encoding='UTF-8', dtype=str)
df = df.assign(FILE_TYPE ='BU')
data = df.to_json(orient = "records", lines=False).split('\n')
print(data)
except:
print("An exception occurred")
os.rename('CBD_BU_FULL.csv', '/Errored/CBD_BU_FULL.csv')
It's quite possible that pd.read_csv isn't properly closing the file since it is crashing mid read, I would try opening the file yourself so that in the except your own program is definitely closing the file and this may fix your issue.
import traceback
import pandas as pd
try:
with open('CBD_BU_FULL.csv', "r") as f:
df = pd.read_csv(f, encoding='UTF-8', dtype=str)
df = df.assign(FILE_TYPE ='BU')
data = df.to_json(orient = "records", lines=False).split('\n')
print(data)
except:
traceback.print_exc(1)
os.rename('CBD_BU_FULL.csv', '/Errored/CBD_BU_FULL.csv')
You cannot use os.rename if the file is in use. You could use shutil.copy instead.

Python - Converting XLSX to PDF

I have always used win32com module in my development server to easily convert from xlsx to pdf:
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
o.DisplayAlerts = False
wb = o.Workbooks.Open("test.xlsx")))
wb.WorkSheets("sheet1").Select()
wb.ActiveSheet.ExportAsFixedFormat(0, "test.pdf")
o.Quit()
However, I have deployed my Django app in production server where I don't have Excel application installed and it raises the following error:
File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\__init__.p
y", line 95, in Dispatch
dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,c
lsctx)
File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 114, in _GetGoodDispatchAndUserName
return (_GetGoodDispatch(IDispatch, clsctx), userName)
File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py
", line 91, in _GetGoodDispatch
IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.II
D_IDispatch)
com_error: (-2147221005, 'Invalid class string', None, None)
Is there any good alternative to convert from xlsx to PDF in Python?
I have tested xtopdf with PDFWriter, but with this solution you need to read and iterate the range and write lines one by one. I wonder if there is a more direct solution similar to win32com.client.
Thanks!
As my original answer was deleted and is eventually a bit useful, I repost it here.
You could do it in 3 steps:
excel to pandas: pandas.read_excel
pandas to HTML: pandas.DataFrame.to_html
HTML to pdf: python-pdfkit (git), python-pdfkit (pypi.org)
import pandas as pd
import pdfkit
df = pd.read_excel("file.xlsx")
df.to_html("file.html")
pdfkit.from_file("file.html", "file.pdf")
install:
sudo pip3.6 install pandas xlrd pdfkit
sudo apt-get install wkhtmltopdf
This is a far more efficient method than trying to load a redundant script that is hard to find and was wrtten in Python 2.7.
Load excel spread sheet into a DataFrame
Write the DataFrame to a HTML file
Convert the html file to an image.
dirname, fname = os.path.split(source)
basename = os.path.basename(fname)
data = pd.read_excel(source).head(6)
css = """
"""
text_file = open(f"{basename}.html", "w")
# write the CSS
text_file.write(css)
# write the HTML-ized Pandas DataFrame
text_file.write(data.to_html())
text_file.close()
imgkitoptions = {"format": "jpg"}
imgkit.from_file(f"{basename}.html", f'{basename}.png', options=imgkitoptions)
try:
os.remove(f'{basename}.html')
except Exception as e:
print(e)
return send_from_directory('./', f'{basename}.png')
Taken from here https://medium.com/#andy.lane/convert-pandas-dataframes-to-images-using-imgkit-5da7e5108d55
Works really well, I have XLSX files converting on the fly and displaying as image thumbnails on my application.
from openpyxl import load_workbook
from PDFWriter import PDFWriter
workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active
pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')
ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
s = ''
for cell in row:
if cell.value is None:
s += ' ' * 11
else:
s += str(cell.value).rjust(10) + ' '
pw.writeLine(s)
pw.savePage()
pw.close()
I have been using this and it works fine

Python win32com 'Invalid number of parameters'

I am trying to use win32com to convert multiple xlsx files into xls using the following code:
import win32com.client
f = r"./input.xlsx"
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(f)
xl.ActiveWorkbook.SaveAs("./somefile.xls", FileFormat=56)
which is failing with the following error:
Traceback (most recent call last):
File "xlsx_conv.py", line 6, in <module>
xl.ActiveWorkbook.SaveAs("./somefile.xls", FileFormat=56)
File "C:\python27\lib\site-packages\win32com\gen_py\00020813-0000-0000-C000-000000000046x0x1x9.py", line 46413, in SaveAs
, Local, WorkIdentity)
pywintypes.com_error: (-2147352562, 'Invalid number of parameters.', None, None)
Some more details:
I can do other commands to the workbook i.e. wb.Worksheets.Add()and set xl.Visible=True to view the workbook. and even do wb.Save() but can't do a wb.SaveAs()
The COM exception is due to the missing filename argument as f = r"./input.xlsx" cannot be found. Had you used Excel 2013+, you would have received a more precise exception message with slightly different error code:
(-2147352567, 'Exception occurred.', (0, 'Microsoft Excel', "Sorry, we
couldn't find ./input.xlsx. Is it possible it was moved,
renamed or deleted?", 'xlmain11.chm', 0, -2146827284), None)
While your path does work in Python's native context pointing to the directory the called .py script resides, it does not in interfacing with an external API, like Windows COM, as full path is required.
To resolve, consider using Python's built-in os to extract the current directory path of script and concatenate with os.path.join() in the Excel COM method. Also, below uses try/except/finally to properly end the Excel.exe process in background regardless if exception is raised or not.
import os
import win32com.client as win32
cd = os.path.dirname(os.path.abspath(__file__))
try:
f = os.path.join(cd, "input.xlsx")
xl = win32.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(f)
xl.ActiveWorkbook.SaveAs(os.path.join(cd, "input.xls"), FileFormat=56)
wb.Close(True)
except Exception as e:
print(e)
finally:
wb = None
xl = None
I spent quite a lot of time searching for a proper solution but the only thing I found out is that the script I wrote yesterday today is not working. In addition the same script works on other computers, so I guess this is something broken in the windows/MsExcel/Python environment, but I can't figure out where.
I found a work around to the "SaveAs" problem and it is just to use the "Save" function. Luckily that still works and does not block me from carrying on with my tasks. I hope this help.
import win32com.client as win32
from shutil import copyfile
# you need to define these two:
# src, is the absolute path to the excel file you want to open.
# dst, is the where you want to save as the file.
copyfile(src, dst)
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(PATH_DATASET_XLS)
ws = wb.Worksheets(DATASET_WORKING_SHEET)
# do some stuff
ws.Cells( 1, 'A' ).Value = "hello"
# Saving changes
wb.Save() # <- this is the work around
excel.Application.Quit()

check if a file is open in Python

In my app, I write to an excel file. After writing, the user is able to view the file by opening it. But if the user forgets to close the file before any further writing, a warning message should appear. So I need a way to check this file is open before the writing process. Could you supply me with some python code to do this task?
If all you care about is the current process, an easy way is to use the file object attribute "closed"
f = open('file.py')
if f.closed:
print 'file is closed'
This will not detect if the file is open by other processes!
source: http://docs.python.org/2.4/lib/bltin-file-objects.html
I assume that you're writing to the file, then closing it (so the user can open it in Excel), and then, before re-opening it for append/write operations, you want to check that the file isn't still open in Excel?
This is how you could do that:
while True: # repeat until the try statement succeeds
try:
myfile = open("myfile.csv", "r+") # or "a+", whatever you need
break # exit the loop
except IOError:
input("Could not open file! Please close Excel. Press Enter to retry.")
# restart the loop
with myfile:
do_stuff()
For windows only
None of the other provided examples would work for me when dealing with this specific issue with excel on windows 10. The only other option I could think of was to try and rename the file or directory containing the file temporarily, then rename it back.
import os
try:
os.rename('file.xls', 'tempfile.xls')
os.rename('tempfile.xls', 'file.xls')
except OSError:
print('File is still open.')
You could use with open("path") as file: so that it automatically closes, else if it's open in another process you can maybe try
as in Tims example you should use except IOError to not ignore any other problem with your code :)
try:
with open("path", "r") as file: # or just open
# Code here
except IOError:
# raise error or print
Using
try:
with open("path", "r") as file:#or just open
may cause some troubles when file is opened by some other processes (i.e. user opened it manually).
You can solve your poblem using win32com library.
Below code checks if any excel files are opened and if none of them matches the name of your particular one, openes a new one.
import win32com.client as win32
xl = win32.gencache.EnsureDispatch('Excel.Application')
my_workbook = "wb_name.xls"
xlPath="my_wb_path//" + my_workbook
if xl.Workbooks.Count > 0:
# if none of opened workbooks matches the name, openes my_workbook
if not any(i.Name == my_workbook for i in xl.Workbooks):
xl.Workbooks.Open(Filename=xlPath)
xl.Visible = True
#no workbooks found, opening
else:
xl.Workbooks.Open(Filename=xlPath)
xl.Visible = True
'xl.Visible = True is not necessary, used just for convenience'
Hope this will help
Try this method if the above methods corrupt your excel file.
This function attempts to rename the file with its own name. If the file has already been opened, the edit will be reject by the os and an OSError exception will be raised. It does not touch the inner code so it will not corrupt your excel files. LMK if it worked for you.
def check_file_status(self):
try:
os.rename("file1.xlsx", "file1.xlsx")
print("File is closed.")
except OSError:
print("File is opened.")
if myfile.closed == False:
print("File is still open ################")
Just use this function. It will close any already opened excel file
import os
def close():
try:
os.system('TASKKILL /F /IM excel.exe')
except Exception:
print("KU")
close()

Categories