Adding new sheet to 2000 xlsx files - Openpyxl is very slow - python

I have a working Python script. My only issue is how slow it is: I have 2000 files, and each one takes nearly a minute to add a new worksheet tab.
Is my code missing something? I feel like I may be doing something wrong. Or should I be using a different library?
Here is my script:
import os
from openpyxl import load_workbook
from pathlib import Path
my_path = "C:\\Desktop\\test"
for filename in Path(my_path).glob('*.xlsx'):
    try:
        wb = load_workbook(filename)
        wb.create_sheet("Export")
        wb.save(filename)
        wb.close()
    except Exception as e:
        print(e)
    finally:
        wb = None

Related

Openpyxl is not saving the excel file, why?

here is my code:
import openpyxl as op
from openpyxl import Workbook
from openpyxl import load_workbook
from openpyxl import worksheet
path = ...
op.load_workbook(path)
wb = Workbook()
ws2 = wb.create_sheet('Sheet2')
wb.save(filename = 'NameOfTheFile.xlsx')
Everything works except saving the file. When I add:
print(wb.sheetnames)
it does print the Sheet2 I just added, but when I open the folder where the file lives, the file has not been updated and that sheet is nowhere to be found.

Win32Com Save As Variable Name in Python 3.6

I'm trying to loop through .xlsb file types in a folder and convert them to .csv in Python 3+ and Windows 10 and have pieced together the code below with help from SO. I want to save the new .csv as the original .xlsb name but am having issues - I have this so far:
import os
import glob
import win32com.client
path = r'C:\Users\folder\Desktop\Test Binary'
all_files_test = glob.glob(os.path.join(path, "*.xlsb"))
for file in all_files_test:
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    doc = excel.Workbooks.Open(file)
    doc.SaveAs(Filename="C:\\Users\\folder\\Desktop\\Test Binary\\file.csv",FileFormat = 6) #overwrites file each time, need to substitute 'file'
    doc.Close(True)
    excel.Quit()
excel.Quit()
Which of course just overwrites each new iteration each time as 'file.csv'. How can I substitute the .xlsb name for each .csv name to SaveAs separate files? Thanks in advance.
Simply use str.replace on the file variable to change the extension. Also consider wrapping the work in try/except/finally so the COM objects are released cleanly whether or not an error occurs.
path = r'C:\Users\folder\Desktop\Test Binary'
all_files_test = glob.glob(os.path.join(path, "*.xlsb"))
for file in all_files_test:
    try:
        excel = win32com.client.Dispatch("Excel.Application")
        excel.Visible = False
        doc = excel.Workbooks.Open(file)
        csv_name = file.replace('.xlsb', '.csv')
        doc.SaveAs(Filename = csv_name, FileFormat = 6)
        doc.Close(True)
        excel.Quit()
    except Exception as e:
        print(e)
    finally:
        doc = None
        excel = None
And to save into a subfolder one level deeper, use a combination of os.path.basename and os.path.join:
path = r'C:\Users\folder\Desktop\Test Binary'
...
csv_name = os.path.basename(file).replace('.xlsb', '.csv')
doc.SaveAs(Filename = os.path.join(path, 'Conversion_Files', csv_name), FileFormat = 6)
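One caveat with str.replace is that it rewrites every occurrence of '.xlsb' in the path, including one that happens to sit in a folder name. A minimal sketch of the same idea using os.path.splitext, which only touches the final extension; the helper name csv_path_for and its out_dir parameter are made up for illustration:

```python
import os

def csv_path_for(xlsb_path, out_dir=None):
    """Return the .csv path that corresponds to an .xlsb path (hypothetical helper)."""
    # splitext splits off only the last extension, so folder names are left alone
    root, _ext = os.path.splitext(xlsb_path)
    csv_path = root + ".csv"
    if out_dir is not None:
        # Keep just the file name and place it in the output folder
        csv_path = os.path.join(out_dir, os.path.basename(csv_path))
    return csv_path
```

In the loop above this would be used as doc.SaveAs(Filename=csv_path_for(file), FileFormat=6).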
Parfait's answer is good, but has a few flaws. I have remedied those (that I have noticed) in this answer, and refactored out some context managers to make the logic easier to understand (and hence easier to modify).
It now prints failed files to sys.stdout (to let you recover, Unix-style, by replacing the for loop with repeated input() / f.readline()[:-1] calls), and only opens the Excel COM object once; this should be a lot faster.
I have also added support for recursively performing this match, but this feature requires Python 3.5 or above in order to work.
import os
import glob
import traceback
from contextlib import contextmanager
import win32com.client
from pythoncom import com_error
PATH = r'C:\Users\folder\Desktop\Test Binary'
@contextmanager
def open_excel():
    # Start a single Excel instance and make sure it quits afterwards
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    try:
        yield excel
    finally:
        excel.Quit()
@contextmanager
def open_workbook(excel, filename):
    # Open a workbook and always close it, even if SaveAs fails
    doc = excel.Workbooks.Open(filename)
    try:
        yield doc
    finally:
        doc.Close(True)
# "**" must be a path component of its own for the match to be recursive
all_files_test = glob.glob(os.path.join(PATH, "**", "*.xlsb"), recursive=True)
with open_excel() as excel:
    for file in all_files_test:
        try:
            with open_workbook(excel, file) as doc:
                # file[:-4] drops "xlsb" but keeps the dot
                doc.SaveAs(Filename=file[:-4] + 'csv', FileFormat=6)
        except com_error as e:
            print(file)
            traceback.print_exc()
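On the recursive matching: as far as I can tell, glob only treats "**" as "any number of folders" when it is a whole path component and recursive=True is passed; a pattern such as "**.xlsb" behaves like "*.xlsb". A small sketch that demonstrates this on a throwaway directory tree (find_xlsb is a hypothetical helper, not part of the answer above):

```python
import glob
import os
import tempfile

def find_xlsb(root, recursive=True):
    """List .xlsb files under root (hypothetical helper)."""
    if recursive:
        # "**" as its own component matches zero or more nested folders
        pattern = os.path.join(root, "**", "*.xlsb")
    else:
        pattern = os.path.join(root, "*.xlsb")
    return sorted(glob.glob(pattern, recursive=recursive))

# Quick demonstration on a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("top.xlsb", os.path.join("sub", "nested.xlsb")):
        open(os.path.join(tmp, rel), "w").close()
    print(find_xlsb(tmp))                   # top-level and nested file
    print(find_xlsb(tmp, recursive=False))  # top-level file only
```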

Accessing UpdateLinks() in COM Object using Python

I am working on automating an Excel file which is linked to certain .csv files.
Those .csv files are created from a SAS Code which is run every Quarter.
The files created are timestamped accordingly for example XYZ_201603.csv and XYZ_201606.csv and so on.
I need to update the links on my Excel File so that it automatically changes the link to the file from next quarter. I am trying to do this using Python win32com.client and my code looks like
from win32com.client import Dispatch
xl_app = Dispatch("Excel.Application")
xl_app.Visible = True
xl_app.DisplayAlerts = False
wb = xl_app.workbooks.open(r"C:\Users\XYZ\Desktop\Test\Summary.xlsx")
xl_app.AskToUpdateLinks = False
try:
    wb.UpdateLink(Name=r"C:\Users\XYZ\Desktop\Test\XYZ_201606.csv")
except Exception as e:
    print(e)
finally:
    wb.Close(True)
    wb = None
xl_app.Quit()
xl_app = None
Whenever I run this, I get the following error
(-2147352567,'Exception occured.',(0,'Microsoft Excel','UpdateLink method of
Workbook class failed','xlmain11.chm',0,-2146827284),None)
Can somebody tell me what is going wrong here? Also, in case I have multiple links, how do I tell which link needs to be changed to what? Can I pass a dictionary of directories of updated datasets?
The code and the approach have been taken from this answer on Stack Overflow:
Update Links in for Excel Spreadsheet Using Python
If you review the Microsoft Documentation, it seems that the UpdateLink method can be called without any parameters. Therefore this program should work:
import win32com.client as win32
xl_app = win32.gencache.EnsureDispatch("Excel.Application")
xl_app.Visible = True
xl_app.DisplayAlerts = False
wb = xl_app.Workbooks.Open(r"C:\Users\XYZ\Desktop\Test\Summary.xlsx")  # names are case-sensitive with EnsureDispatch
wb.UpdateLink()
wb.Save()
wb.Close()
xl_app.Quit()
I'm not sure if my solution solves your issue, but I had the same problem and I used LinkSources() and ChangeLink() instead
newSource = r"C:\Users\XYZ\Desktop\Test\XYZ_201606.csv"
oldSource = wb.LinkSources()
wb.ChangeLink(Name = oldSource[0], NewName = newSource, Type = 1)
Hope it helps!
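For the "multiple links" part of the question, one option is to build a dictionary from old source to new source and call ChangeLink once per entry. A rough sketch, assuming the file names carry a _YYYYMM stamp as in XYZ_201603.csv; next_quarter and relink are hypothetical helper names, and wb is the workbook object from the snippets above:

```python
import re

def next_quarter(name):
    """Advance the _YYYYMM stamp in a file name by three months (assumed naming scheme)."""
    def bump(match):
        year, month = int(match.group(1)), int(match.group(2))
        month += 3
        if month > 12:
            # Roll over into the next year
            year, month = year + 1, month - 12
        return "_%04d%02d." % (year, month)
    # Only the first stamp followed by a dot is rewritten
    return re.sub(r"_(\d{4})(\d{2})\.", bump, name, count=1)

def relink(wb, mapping):
    """Point every existing link found in mapping at its new source."""
    sources = wb.LinkSources() or ()  # treated as empty when nothing is returned
    for old in sources:
        if old in mapping:
            wb.ChangeLink(Name=old, NewName=mapping[old], Type=1)
```

The mapping could then be built from the workbook itself, for example relink(wb, {old: next_quarter(old) for old in (wb.LinkSources() or ())}).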

Is there a way to edit/save an xlsx file that is open in Excel?

I want to do something like this:
import openpyxl as x
wb = x.load_workbook(filename)
# do some edit to the workbook
wb.save(filename)
The file specified by filename is currently open in Excel. Excel locks the file, so I get a permission denied error when running the above code. Is there a way to edit/save it?
from openpyxl import load_workbook
ifile = 'Whales.xlsx'
wb = load_workbook(filename=ifile)
# do some edit to the workbook
wb.save(ifile)
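If closing the file in Excel first is not an option, one possible workaround is to catch the permission error and write the result to a sibling file instead. This does not edit the locked file itself. A sketch, where save_with_fallback and the "_edited" suffix are made up for illustration and save is any callable that takes a path, such as wb.save:

```python
import os

def save_with_fallback(save, path, suffix="_edited"):
    """Call save(path); if the file is locked, save next to it instead (hypothetical helper)."""
    try:
        save(path)
        return path
    except PermissionError:
        # Assumption: the lock shows up as PermissionError, as described in the question
        root, ext = os.path.splitext(path)
        alt = root + suffix + ext
        save(alt)
        return alt
```

With the code above this would be called as save_with_fallback(wb.save, ifile), and the returned path tells you where the workbook actually ended up.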

how to output xlsx generated by Openpyxl to browser?

I have been using Stack Overflow for a while now and it has helped me very often. Now I have a problem I couldn't solve myself or through searching.
I'm trying to send an Excel file generated by openpyxl to the browser, as I used to do with PHPExcel. The method appears to be the same, but I only get a broken file. My code looks like this:
from openpyxl.workbook import Workbook
from openpyxl.writer.excel import ExcelWriter
from openpyxl.writer.excel import save_virtual_workbook
from openpyxl.cell import get_column_letter
from StringIO import StringIO
print 'Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
print 'Content-Disposition: attachment;filename="results.xlsx"'
print 'Cache-Control: max-age=0\n'
output = StringIO()
wb = Workbook()
ws = wb.worksheets[0]
ws.cell('A1').value = 3.14
wb.save(output)
print output.getvalue()
#print save_virtual_workbook(wb)
I use the version 1.5.8 and python 2.7.
None of the approaches works. When I just use it from desktop and not browser it works flawlessly.
I would be very thankful for help.
P.S. please don't tell me that using other language or program would be easier. I need to solve this with python.
This works for me. I use Python 2.7, the latest openpyxl, and send_file from Flask.
... code ...
import StringIO
from openpyxl import Workbook
wb = Workbook()
ws = wb.active # worksheet
ws.title = "Excel Using Openpyxl"
c = ws.cell(row=5, column=5)
c.value = "Hi on 5,5"
out = StringIO.StringIO()
wb.save(out)
out.seek(0)
return send_file(out, mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
                 attachment_filename='xxl.xlsx', as_attachment=True)
output = HttpResponse(mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
file_name = "Test.xlsx"
output['Content-Disposition'] = 'attachment; filename='+ file_name
wb = Workbook()
ws = wb.worksheets[0]
ws.cell('A1').value = 3.14
wb.save(output)
return output
I used these tips to download my files with openpyxl. Hope that helps.
Writing the xlsx output to disk and then serving it up via Apache worked perfectly, but putting it out directly caused errors in Excel and other issues.
I added a couple of extra steps and made one minor change to your code:
buffer=output.getvalue()
In the HTTP headers:
print "Content-Length: " + str(len(buffer))
And used write() instead of print() to push the buffer into the standard output stream:
stdout.write(buffer)
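The same three steps (grab the bytes, send Content-Length, write the raw buffer) might look like this in Python 3, where the binary side of stdout is sys.stdout.buffer. This is only a sketch: write_attachment is a hypothetical helper, and payload is assumed to be the finished xlsx bytes (for example from a BytesIO that the workbook was saved into):

```python
import sys

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

def write_attachment(payload, filename, out=None):
    """Write CGI headers followed by a binary body to a binary stream."""
    if out is None:
        out = sys.stdout.buffer  # bytes interface of stdout in Python 3
    headers = [
        "Content-Type: " + XLSX_MIME,
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % len(payload),
    ]
    # A blank line separates the headers from the body
    out.write(("\r\n".join(headers) + "\r\n\r\n").encode("ascii"))
    out.write(payload)
    out.flush()
```

Writing bytes directly avoids the extra newline and any text-mode translation that print would add to the file contents.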
Your script works for me as you expect, without alterations.
I can only assume you have a problem with your cgi script setup.
Make sure the directory where the script lives actually gets served by the web server. On Apache you can achieve this with:
ScriptAlias /cgi-bin/ /home/WWW/localhost/cgi-bin/
Make sure the script is executable by setting the script permissions. For command-line operation (python scriptname) that was not necessary, but for your web browser it is. Also make sure the user the web server runs as can execute the script, as the web server probably does not run as you.
Because Excel uses a binary format you should be using BytesIO to buffer.
from io import BytesIO
But what error are you getting if you use save_virtual_workbook() which does this for you?
I had the same problem.
The solution is to switch stdout to binary mode:
import os
import sys
import msvcrt
from shutil import copyfileobj
print 'Content-Type:application/octet-stream; name="{}"'.format(os.path.basename(xls_file))
print 'Content-Disposition:attachment; filename="{}"'.format(os.path.basename(xls_file))
print "Content-Length: " + str(os.path.getsize(xls_file))
print 'Cache-Control: max-age=0\r\n'
msvcrt.setmode (1, os.O_BINARY) # stdout = 1
sys.stdout.flush()
with open(xls_file, 'rb') as fobj:
    copyfileobj(fobj, sys.stdout)
If you want to build an HTML table that looks like your spreadsheet, you probably want to work with CSV. Either do this, instead of Excel, OR convert your Excel to CSV after you build it.
In any case, once you have the data in CSV format, then it's simply a matter of using python to build the HTML page and looping through the CSV data, while inserting your <table>, <tr>, and <td> tags, as appropriate.
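A minimal sketch of that loop, using only the standard library; csv_to_html_table is a hypothetical helper name, and the CSV is taken as a string here for simplicity:

```python
import csv
import html
import io

def csv_to_html_table(csv_text):
    """Turn CSV text into a bare HTML table, one <tr> per row (hypothetical helper)."""
    lines = ["<table>"]
    for row in csv.reader(io.StringIO(csv_text)):
        # Escape each cell so characters like < and & do not break the markup
        cells = "".join("<td>%s</td>" % html.escape(cell) for cell in row)
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)
```

The returned string can then be dropped into the page body; styling and header cells are left out on purpose.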
