I'm trying to loop through .xlsb file types in a folder and convert them to .csv in Python 3+ and Windows 10 and have pieced together the code below with help from SO. I want to save the new .csv as the original .xlsb name but am having issues - I have this so far:
import os
import glob
import win32com.client
path = r'C:\Users\folder\Desktop\Test Binary'
all_files_test = glob.glob(os.path.join(path, "*.xlsb"))
for file in all_files_test:
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
doc = excel.Workbooks.Open(file)
doc.SaveAs(Filename="C:\\Users\\folder\\Desktop\\Test Binary\\file.csv",FileFormat = 6) #overwrites file each time, need to substitute 'file'
doc.Close(True)
excel.Quit()
excel.Quit()
Which of course just overwrites each new iteration each time as 'file.csv'. How can I substitute the .xlsb name for each .csv name to SaveAs separate files? Thanks in advance.
Simply use str.replace on file variable to change extension. And consider wrapping in try/except to cleanly release COM objects regardless of error or not.
path = r'C:\Users\folder\Desktop\Test Binary'
all_files_test = glob.glob(os.path.join(path, "*.xlsb"))
for file in all_files_test:
try:
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
doc = excel.Workbooks.Open(file)
csv_name = file.replace('.xlsb', '.csv')
doc.SaveAs(Filename = csv_name, FileFormat = 6)
doc.Close(True)
excel.Quit()
except Exception as e:
print(e)
finally:
doc = None
excel = None
And to go one level deeper use combination of os.path.basename and os.path.join:
path = r'C:\Users\folder\Desktop\Test Binary'
...
csv_name = os.path.basename(file).replace('.xlsb', '.csv')
doc.SaveAs(Filename = os.path.join(path, 'Conversion_Files', csv_name), FileFormat = 6)
Parfait's answer is good, but has a few flaws. I have remedied those (that I have noticed) in this answer, and refactored out some context managers to make the logic easier to understand (and hence easier to modify).
It now prints failed files to sys.stdout (to let you recover, Unix-style, by replacing the for loop with repeated input() / f.readline()[:-1] calls), and only opens the Excel COM object once; this should be a lot faster.
I have also added support for recursively performing this match, but this feature requires Python 3.5 or above in order to work.
import os
import glob
import traceback
from contextlib import contextmanager
import win32com.client
from pythoncom import com_error
PATH = r'C:\Users\folder\Desktop\Test Binary'
#contextmanager
def open_excel():
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = False
try:
yield excel
finally:
excel.Quit()
#contextmanager
def open_workbook(excel, filename):
doc = excel.Workbooks.Open(filename)
try:
yield doc
finally:
doc.Close(True)
all_files_test = glob.glob(os.path.join(PATH, "**.xlsb"), recursive=True)
with excel_cm() as excel:
for file in all_files_test:
try:
with open_workbook(file) as doc:
doc.SaveAs(Filename=file[:-4] + 'csv', FileFormat=6)
except com_error as e:
print(file)
traceback.print_exc()
Related
I would like to auto-size the cells and print the result in pdf.The file excel has a single sheet but the name is not costant. This is what I was able to produce by looking online, can anyone help me?
import os
import comtypes.client
SOURCE_DIR = './'
TARGET_DIR = './'
app = comtypes.client.CreateObject('Excel.Application')
app.Visible = False
app.DisplayAlerts = False
infile = os.path.join(os.path.abspath(SOURCE_DIR), 'test.xlsx')
outfile = os.path.join(os.path.abspath(TARGET_DIR), 'test.pdf')
doc = app.Workbooks.Open(infile)
worksheet=doc.Sheets[1]
worksheet.PageSetup.Orientation=2
doc.ExportAsFixedFormat(0, outfile, 1, 0)
doc.Close()
app.Quit()
This convert the file in pdf but I still have to add the auto-size but I am not able to...
P.S. It is quite slow maybe due to the fact that I save the doc but I do changes on the worksheet?
ty for your time
Was easier than I thought I just added:
worksheet.Columns.AutoFit()
I would like to ask for help with a Python script that is supposed to loop through a directory on a drive. Basically, what I want to do is convert over 10,0000 DBF files to CSV. So far, I can achieve this on an individual dbf file by using using the dbfread and Pandas packages. Running this script over 10,000 individual times is obviously not feasible, hence why I want automate the task by writing a script that will loop through each dbf file in the directory.
Here is what I would like to do.
Define the directory
Write a for loop that will loop through each file in the directory
Only open a file with the extension '.dbf'
Convert to Pandas DataFrame
Define the name for the output file
Write to CSV and place file in a new directory
Here is the code that I was using to test whether I could convert an individual '.dbf' file to a CSV.
from dbfread import DBF
import pandas as pd
table = DBF('Name_of_File.dbf')
#I originally kept receiving a unicode decoding error
#So I manually adjusted the attributes below
table.encoding = 'utf-8' # Set encoding to utf-8 instead of 'ascii'
table.char_decode_errors = 'ignore' #ignore any decode errors while reading in the file
frame = pd.DataFrame(iter(table)) #Convert to DataFrame
print(frame) #Check to make sure Dataframe is structured proprely
frame.to_csv('Name_of_New_File')
The above code worked exactly as it was intended.
Here is my code to loop through the directory.
import os
from dbfread import DBF
import pandas as pd
directory = 'Path_to_diretory'
dest_directory = 'Directory_to_place_new_file'
for file in os.listdir(directory):
if file.endswith('.DBF'):
print(f'Reading in {file}...')
dbf = DBF(file)
dbf.encoding = 'utf-8'
dbf.char_decode_errors = 'ignore'
print('\nConverting to DataFrame...')
frame = pd.DataFrame(iter(dbf))
print(frame)
outfile = frame.os.path.join(frame + '_CSV' + '.csv')
print('\nWriting to CSV...')
outfile.to_csv(dest_directory, index = False)
print('\nConverted to CSV. Moving to next file...')
else:
print('File not found.')
When I run this code, I receive a DBFNotFound error that says it couldn't find the first file in the directory. As I am looking at my code, I am not sure why this is happening when it worked in the first script.
This is the code from the dbfread package from where the exception is being raised.
class DBF(object):
"""DBF table."""
def __init__(self, filename, encoding=None, ignorecase=True,
lowernames=False,
parserclass=FieldParser,
recfactory=collections.OrderedDict,
load=False,
raw=False,
ignore_missing_memofile=False,
char_decode_errors='strict'):
self.encoding = encoding
self.ignorecase = ignorecase
self.lowernames = lowernames
self.parserclass = parserclass
self.raw = raw
self.ignore_missing_memofile = ignore_missing_memofile
self.char_decode_errors = char_decode_errors
if recfactory is None:
self.recfactory = lambda items: items
else:
self.recfactory = recfactory
# Name part before .dbf is the table name
self.name = os.path.basename(filename)
self.name = os.path.splitext(self.name)[0].lower()
self._records = None
self._deleted = None
if ignorecase:
self.filename = ifind(filename)
if not self.filename:
**raise DBFNotFound('could not find file {!r}'.format(filename))** #ERROR IS HERE
else:
self.filename = filename
Thank you any help provided.
os.listdir returns the file names inside the directory, so you have to join them to the base path to get the full path:
for file_name in os.listdir(directory):
if file_name.endswith('.DBF'):
file_path = os.path.join(directory, file_name)
print(f'Reading in {file_name}...')
dbf = DBF(file_path)
In my flask web app, I am writing data from excel to a temporary file which I then parse in memory. This method works fine with xlrd but it does not with openpyxl.
Here is how I am writing to a temporary file which I then parse with xlrd.
xls_str = request.json.get('file')
try:
xls_str = xls_str.split('base64,')[1]
xls_data = b64decode(xls_str)
except IndexError:
return 'Invalid form data', 406
save_path = os.path.join(tempfile.gettempdir(), random_alphanum(10))
with open(save_path, 'wb') as f:
f.write(xls_data)
f.close()
try:
bundle = parse(save_path, current_user)
except UnsupportedFileException:
return 'Unsupported file format', 406
except IncompatibleExcelException as ex:
return str(ex), 406
finally:
os.remove(save_path)]
When I use openpyxl with the code above it complains about an unsupported type but that is because I'm using a temporary file to parse the data hence it doesn't have an ".xlsx" extension and even if I added it, it would not work because its not a excel file after all.
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format,
please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
What should I do?
Why not create a temp excel file with openpyxl instead. Give this example a try. I did something similar in the past.
from io import BytesIO
from openpyxl.writer.excel import save_virtual_workbook
from openpyxl import Workbook
def create_xlsx():
wb = Workbook()
ws = wb.active
row = ('Hello', 'Boosted_d16')
ws.append(row)
return wb
#app.route('/', methods=['GET'])
def main():
xlsx = create_xlsx()
filename = BytesIO(save_virtual_workbook(xlsx))
return send_file(
filename,
attachment_filename='test.xlsx',
as_attachment=True
)
I have this script which downloads an excel file from google drive, updates the external link then runs a desired macro. However, I would like it so that the file is deleted from the path after the macro has been run so that I don't need to update the script to change the file name every time.
I've included the script only from the run macro part.
def run_macro(workbook_name, com_instance):
wb = com_instance.workbooks.open(workbook_name)
com_instance.AskToUpdateLinks = False
try:
wb.UpdateLink(Name=wb.LinkSources())
except Exception as e:
print(e)
finally:
wb.Close(True)
wb = None
return True
def main():
dir_root = ("C:\\users\\ciara\\desktop\\test4.xlsm")
xl_app = Dispatch("Excel.Application")
xl_app.Visible = False
xl_app.DisplayAlerts = False
for root, dirs, files in os.walk(dir_root):
for fn in files:
if fn.endswith(".xlsx") and fn[0] is not "~":
run_macro(os.path.join(root, fn), xl_app)
xl_app.Quit()
xl = None
import unittest
import os.path
import win32com.client
class ExcelMacro(unittest.TestCase):
def test_excel_macro(self):
try:
xlApp = win32com.client.DispatchEx('Excel.Application')
xlsPath = os.path.expanduser('C:\\users\\ciara\\desktop\\test4.xlsm')
wb = xlApp.Workbooks.Open(Filename=xlsPath)
xlApp.Run('BridgeHit')
wb.Save()
xlApp.Quit()
print("Macro ran successfully!")
except:
print("Error found while running the excel macro!")
xlApp.Quit()
if __name__ == "__main__":
unittest.main()
if os.path.exists('C:\\users\\ciara\\desktop\\test4.xlsm'):
os.remove('C:\\users\\ciara\\desktop\\test4.xlsm')
else:
print('File does not exists')
The console displays does not display even 'File does not exists'.
If there is any other feedback on the code above I would appreciate it! I'm just starting to learn
Try adding an indent to your if-else block.
You should add the file removal if-else block to ExcelMacro class. Then only, before running (or after) a macro, it could remove the file if it exists.
I am trying to use win32com to convert multiple xlsx files into xls using the following code:
import win32com.client
f = r"./input.xlsx"
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(f)
xl.ActiveWorkbook.SaveAs("./somefile.xls", FileFormat=56)
which is failing with the following error:
Traceback (most recent call last):
File "xlsx_conv.py", line 6, in <module>
xl.ActiveWorkbook.SaveAs("./somefile.xls", FileFormat=56)
File "C:\python27\lib\site-packages\win32com\gen_py\00020813-0000-0000-C000-000000000046x0x1x9.py", line 46413, in SaveAs
, Local, WorkIdentity)
pywintypes.com_error: (-2147352562, 'Invalid number of parameters.', None, None)
Some more details:
I can do other commands to the workbook i.e. wb.Worksheets.Add()and set xl.Visible=True to view the workbook. and even do wb.Save() but can't do a wb.SaveAs()
The COM exception is due to the missing filename argument as f = r"./input.xlsx" cannot be found. Had you used Excel 2013+, you would have received a more precise exception message with slightly different error code:
(-2147352567, 'Exception occurred.', (0, 'Microsoft Excel', "Sorry, we
couldn't find ./input.xlsx. Is it possible it was moved,
renamed or deleted?", 'xlmain11.chm', 0, -2146827284), None)
While your path does work in Python's native context pointing to the directory the called .py script resides, it does not in interfacing with an external API, like Windows COM, as full path is required.
To resolve, consider using Python's built-in os to extract the current directory path of script and concatenate with os.path.join() in the Excel COM method. Also, below uses try/except/finally to properly end the Excel.exe process in background regardless if exception is raised or not.
import os
import win32com.client as win32
cd = os.path.dirname(os.path.abspath(__file__))
try:
f = os.path.join(cd, "input.xlsx")
xl = win32.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(f)
xl.ActiveWorkbook.SaveAs(os.path.join(cd, "input.xls"), FileFormat=56)
wb.Close(True)
except Exception as e:
print(e)
finally:
wb = None
xl = None
I spent quite a lot of time searching for a proper solution but the only thing I found out is that the script I wrote yesterday today is not working. In addition the same script works on other computers, so I guess this is something broken in the windows/MsExcel/Python environment, but I can't figure out where.
I found a work around to the "SaveAs" problem and it is just to use the "Save" function. Luckily that still works and does not block me from carrying on with my tasks. I hope this help.
import win32com.client as win32
from shutil import copyfile
# you need to define these two:
# src, is the absolute path to the excel file you want to open.
# dst, is the where you want to save as the file.
copyfile(src, dst)
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(PATH_DATASET_XLS)
ws = wb.Worksheets(DATASET_WORKING_SHEET)
# do some stuff
ws.Cells( 1, 'A' ).Value = "hello"
# Saving changes
wb.Save() # <- this is the work around
excel.Application.Quit()