In Excel, a workbook knows whether it has already been given a filename. This changes Excel's behavior when one clicks the save button: if the workbook has no name, the save button redirects to the "Save As" action. I wonder if openpyxl has a similar mechanism. This would help me in a function such as:
def smartSaveXLbook(wb, defaultName='MyBook.xlsx'):
    if wb.properties.title:  # this does not work. No wb passes this test :-(
        print("wb has a name :", wb.properties.title)  # wb.properties.title always empty
        wb.save()  # wants to save with the current existing name
    else:
        wb.save(defaultName)  # Simplified version here. I will grant name uniqueness.
No, and for good reason: once openpyxl has read the file, it releases it, and any other application can do what it wants with it. But you can easily write a wrapper function that stores the name used to open the workbook. You could then use parameters to control the behavior: overwrite the original, or always save a new file?
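The wrapper idea could be sketched roughly as follows. This is a minimal illustration, not part of openpyxl: `NamedWorkbook`, `unique_name`, `open_named`, and the `overwrite` parameter are hypothetical names, and the only openpyxl calls assumed are `load_workbook(path)` and `Workbook.save(filename)`.

```python
import os


class NamedWorkbook:
    """Pairs a workbook with the filename it was opened from (hypothetical helper)."""

    def __init__(self, wb, filename=None):
        self.wb = wb              # any object with a .save(filename) method
        self.filename = filename  # None means "never loaded from or saved to disk"

    def save(self, default_name="MyBook.xlsx", overwrite=True):
        """Save under the remembered name, or fall back to default_name."""
        target = self.filename or default_name
        if not overwrite:
            target = unique_name(target)
        self.wb.save(target)
        self.filename = target
        return target


def unique_name(path):
    """Return path unchanged, or with a numeric suffix if it already exists."""
    base, ext = os.path.splitext(path)
    candidate, n = path, 1
    while os.path.exists(candidate):
        candidate = f"{base}_{n}{ext}"
        n += 1
    return candidate


def open_named(path):
    """Load a workbook with openpyxl and remember where it came from."""
    from openpyxl import load_workbook  # imported lazily; requires openpyxl
    return NamedWorkbook(load_workbook(path), path)
```

With this in place, `open_named("book.xlsx").save()` would write back to `book.xlsx`, while `NamedWorkbook(Workbook()).save()` would fall back to the default name, which is close to Excel's "Save" versus "Save As" distinction.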
I have a program that imports raw data, interprets them, generates an excel file, and writes information to multiple sheets based on what was interpreted.
I inherited this program from someone else. They wrote the program using openpyxl, and they have several functions that load and save the excel file. These functions are called several times throughout the program. The larger the raw data file is, the more times the functions are called. This takes a lot of time for the program to run. A raw data file of 260kb takes over 2 hours for my laptop to process. I am struggling to make the excel file only save once, but that is what I hope to accomplish.
Here is how it was originally written in the createExcel.py file the previous developer wrote. All of these functions are called by main.py and other .py files several times.
(in createExcel.py)
def create_excel(file_name):
    # create the shell/format of the sheets in the excel file
    # but does not write anything other than headers
    wb = Workbook()
    ...# lots of code
    wb.save(f'{file_name}.xlsx')

def insert_sheet_1_data(array):
    # take data that was interpreted in another .py file and add
    # the information to the relevant sheet in the excelfile
    wb = load_workbook(f"{fileName}.xlsx")
    ...
    wb.save(f'{file_name}.xlsx')
...
...
def insert_sheet_5_data(array):
    # take data that was interpreted in another .py file and add
    # the information to the relevant sheet in the excelfile
    wb = load_workbook(f"{fileName}.xlsx")
    ...
    wb.save(f'{file_name}.xlsx')
I tried declaring wb = Workbook outside of the functions in the createExcel.py. I then deleted all of the load_workbook calls and wb.save calls in the above functions. Then I imported createExcel into the main.py file, and wrote a line to save the wb file as the last line of code in main.py:
(in createExcel.py)
wb = Workbook()

def create_excel(file_name):
    # create the shell/format of the sheets in the excel file
    # but does not write anything other than headers

def insert_sheet_1_data(array):
    # take data that was interpreted in another .py file and add
    # the information to the relevant sheet in the excelfile
...
def insert_sheet_5_data(array):
    # take data that was interpreted in another .py file and add
    # the information to the relevant sheet in the excelfile

(in main.py)
import createExcel
...
#rest of the code
wb = createExcel.wb
wb.save(f'{file_name}'
# end
The original code works, but it takes an incredibly long time to execute. The new code is much, much faster, and through debug mode I see that it still goes through and executes all of the openpyxl-related calls, but it never creates an Excel file on my computer (that I can find), and it doesn't throw any errors about trying to save a file that doesn't exist, so I am not sure what to make of it.
Any insight as to what I am doing wrong here would be really appreciated! Thanks so much.
You mentioned that a 260 KB input takes over 2 hours, which is roughly half a minute per KiB.
That seems very very high.
There must be some relevant details that the original post does not mention.
You wrote
# (in createExcel.py)
wb = Workbook()
That is not a Best Practice; please don't do that.
More generally, avoid doing time consuming work at import time
and especially avoid ".save()" side effects at that time.
Better to defer such actions until run time, when a function
has been explicitly called.
At the end of def create_excel(...), consider appending
    return wb
so you can communicate results through the call stack
instead of through the filesystem.
You also wrote
# (in main.py)
import createExcel
...
# other things
wb = createExcel.wb
wb.save(f'{file_name}'
The import is fine, but grabbing a reference to createExcel.wb
is not -- you don't want the import to spend time creating that object.
Better that you assign wb = createExcel.create_excel(...).
That is, defer doing the work until the caller actually needs that object.
Consider defining
def other_things(...):
    ...
so you can call it when appropriate, rather than at import time.
An overall theme to this code is that you wish to avoid
doing useless repeated work.
Consider structuring the code in this way:
class MyWorkbook:
    def __init__(self):
        self.wb = Workbook()  # or even: self.wb = self.create_excel(...)
Now all the various methods can cheaply access
the existing self.wb attribute, perhaps
adding sheets to it as they see fit.
At the end you can .save() just once.
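Put together, these suggestions might look like the sketch below. It assumes only that the workbook object offers `create_sheet(title=...)`, item access by sheet title, and `save(filename)`, as openpyxl's `Workbook` does, and that a worksheet offers `append(row)`. The names `ReportBuilder`, `insert_rows`, and `main`, and the sheet titles, are hypothetical.

```python
class ReportBuilder:
    """Holds one workbook in memory and saves it exactly once (hypothetical helper)."""

    def __init__(self, workbook, sheet_headers):
        self.wb = workbook
        # Create the shell of each sheet: a title plus a header row, no data yet.
        for title, headers in sheet_headers.items():
            ws = self.wb.create_sheet(title=title)
            ws.append(headers)

    def insert_rows(self, title, rows):
        """Append interpreted data to an existing sheet without touching the disk."""
        ws = self.wb[title]
        for row in rows:
            ws.append(row)

    def save(self, file_name):
        """Write the workbook to disk; intended to be called once, at the very end."""
        path = f"{file_name}.xlsx"
        self.wb.save(path)
        return path


def main():
    # Nothing here runs at import time; the caller decides when to do the work.
    from openpyxl import Workbook  # imported lazily; requires openpyxl

    builder = ReportBuilder(Workbook(), {"Sheet 1 data": ["id", "value"]})
    builder.insert_rows("Sheet 1 data", [[1, 10.5], [2, 11.0]])
    builder.save("report")
```

Because every method works on the same in-memory `self.wb`, the repeated `load_workbook` and `save` round trips disappear, and the explicit `.xlsx` suffix in `save` makes the output file easy to find.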
I am using an Excel template that has 6 tabs (all unprotected) and writing data to each worksheet using the openpyxl module.
Once the Excel file is created, opening it does not show all the data unless I click the "Enable Editing" pop-up.
Is there an attribute in openpyxl to disable this?
This sounds like Windows has quarantined files received over a network. As this is done when the files are received, there is no way to avoid this when creating the files.
I solved this for me.
I found the answer here:
https://codereview.stackexchange.com/questions/240136/python-script-for-refreshing-and-preprocessing-data-in-excel-report-files
I only used the refresh function and it basically opened the excel file, click/refreshed, and closed/saved. You see an Excel file appear briefly on the screen. I'll insert this in a loop to go through all the files I am creating. It might take a little while to run hundreds, but much faster than open-click-save.
Here is all the code I used:
import win32com.client as win32

def refresh(directory, file_name):
    xlapp = win32.DispatchEx('Excel.Application')
    xlapp.DisplayAlerts = False
    xlapp.Visible = True
    xlbook = xlapp.Workbooks.Open(directory + '\\' + file_name)
    xlbook.RefreshAll()
    xlbook.Save()
    xlbook.Close()
    xlapp.Quit()
    return()
I'm a total novice when it comes to programming. I'm trying to write a Python 3 program that will produce an Excel workbook based on the contents of a CSV file. So far, I understand how to create the workbook, and I'm able to dynamically create worksheets based on the contents of the CSV file, but I'm having trouble writing to each individual worksheet.
Note, in the example that follows, I'm providing a static list, but my program dynamically creates a list of names based on the contents of the CSV file: the number of names that will be appended to the list varies from 1 to 60, depending on the assay in question.
import xlsxwriter
workbook = xlsxwriter.Workbook('C:\\Users\\Jabocus\\Desktop\\Workbook.xlsx')
list = ["a", "b", "c", "d"]
for x in list:
    worksheet = workbook.add_worksheet(x)
    worksheet.write("A1", "Hello!")
workbook.close()
If I run the program as it appears above, I get a SyntaxError, and IPython points to workbook.close() as the source of the problem.
However, if I exclude the line where I try to write "Hello!" to cell A1 in every worksheet, the program runs as I'd expect: I end up with Workbook.xlsx on my desktop, and it has 4 worksheets named a, b, c, and d.
The for loop seemed like a good choice to me, because my program will need to handle a variety of CSV formats (I'd rather write one program that can process data from every assay at my lab than a program for each assay).
My hope was that by using worksheet.write() in the way that I did, Python would know that I want to write to the worksheet that I just created (i.e. I thought worksheet would act as the name for each worksheet during each iteration of the loop despite explicitly naming each worksheet something new).
It feels like the iteration is the problem here, and I know that it has something to do with how I'm trying to reference each Worksheet in the write() step (because I'm not giving any of the Worksheet objects an explicit name), but I don't know how to proceed. What's a good way that I might approach this problem?
I'm not sure exactly what is wrong with your code, but I can tell you this:
I copied your code exactly (except for changing the path to be my desktop) and it worked fine.
I believe your issue could be one of three things:
1. You have a buggy/old version of XlsxWriter.
2. You have a file called Workbook.xlsx on your Desktop already that is corrupted or causing some issues (open in another program).
3. You have some code other than what you posted.
To account for all of these possibilities, I would recommend that you:
1. Reinstall XlsxWriter:
   In a Command Prompt run pip uninstall XlsxWriter followed by pip install XlsxWriter
2. Change the filename of the workbook you are opening:
   workbook = xlsxwriter.Workbook('C:\\Users\\Jabocus\\Desktop\\Workbook2.xlsx')
3. Try running the code that you posted exactly, then incrementally add to it until it stops working.
Did you try something like worksheet.write(0, 0, 'Hello')
instead of worksheet.write('A1', 'Hello')?
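On the question of how `worksheet` refers to each sheet: inside the loop the name is simply rebound to the object returned by `add_worksheet()` on each pass, so writing through it targets the sheet that was just created. If the sheets need to be revisited after the loop, one option is to keep them in a dictionary keyed by name. The sketch below assumes a workbook object with `add_worksheet(name)` whose return value has `write(cell, value)`, as in XlsxWriter; `add_named_sheets` and `main` are hypothetical names.

```python
def add_named_sheets(workbook, names, greeting="Hello!"):
    """Create one worksheet per name, write to A1, and return them keyed by name."""
    sheets = {}
    for name in names:
        # 'worksheet' is rebound on every iteration to the newly created sheet.
        worksheet = workbook.add_worksheet(name)
        worksheet.write("A1", greeting)
        sheets[name] = worksheet
    return sheets


def main():
    import xlsxwriter  # imported lazily; requires XlsxWriter

    workbook = xlsxwriter.Workbook("Workbook.xlsx")
    sheets = add_named_sheets(workbook, ["a", "b", "c", "d"])
    # Any sheet can still be reached by name after the loop has finished.
    sheets["c"].write("B1", "written after the loop")
    workbook.close()
```

The dictionary is only needed when a sheet must be written to again later; for a single pass, the loop variable alone is enough.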
A file which has data I need to access is being generated in xlsx format yet I need it to be in xls format for my purpose.
I'm able to use win32com.client to open the file however the save cannot fully complete due to Compatibility Checker dialog pop up which notifies you of loss of formatting/features by going from a xlsx --> xls format.
Here's the code which currently doesn't allow me to save the file as execution hangs waiting for the dialog to close, any help would be much appreciated:
import win32com.client

excel = win32com.client.Dispatch('Excel.Application')
excel.DisplayAlerts = False
in_file = u"C:\\Path\\to\\infile\\infile.xlsx"
out_file = u"C:\\Path\\to\\outfile\\outfile.xls"
wb = excel.Workbooks.Open(in_file)
wb.CheckCompatibility = False
wb.DoNotPromptForConvert = True
wb.SaveAs(out_file, FileFormat=56) #Execution hangs here
wb.Close()
excel.Quit()
I've seen other similar posts which mention the methods and attributes I've already set in this script. I've also modified the registry value to ShowCompatDialog = 0.
MSDN says on Workbook.DoNotPromptForConvert property:
"true to prompt the user to convert the workbook; otherwise, false".
Write in your code:
wb.DoNotPromptForConvert = False
UPDATE: Solved this issue using this Excel Add-In
For reference: Changing the registry value was working however the values were being reset daily on the internal network which I develop in. Without being able to edit the registry values myself b/c I don't possess the admin rights to do so, the above solution was the only thing that ended up solving my problem.
I have an Excel macro that deletes a sheet, copies another sheet and renames it to the same name of the deleted sheet. This works fine when run from Excel, but when I run it by calling the macro from Python I get the following error message:
Run-time error '1004' - Cannot rename a sheet to the same name as
another sheet, a referenced object library or a workbook referenced by
VisualBasic.
The macro has code like the following:
Sheets("CC").Delete
ActiveWindow.View = xlPageBreakPreview
Sheets("FY").Copy After:=Sheets(Sheets.Count)
Sheets(Sheets.Count).Name = "CC"
and the debugger highlights the error on the last line where the sheet is renamed. I've also tried putting these calls directly in python but get the same error message.
Any suggestions are much appreciated!
Thanks.
I ran the code inside Excel VBA.
I am guessing that the following line is failing.
Sheets("CC").Delete
And that is the reason, you can't give the new sheet same name as existing (non-deleted) sheet.
Put Application.DisplayAlerts = False before Sheets("CC").Delete and Application.DisplayAlerts = True once you are finished with the code.
I haven't used python but it seems the library is swallowing that error for you and letting you go ahead to the next statement.
Hope that helps.
Behind the scenes, VB and VBA are maintaining references to COM objects for the application, worksheets etc. This is why you have the globals 'Application', 'Worksheets' etc. It is possible that VBA is still holding a reference to the worksheet, so Excel hasn't tidied it up properly.
Try not using these implicit globals and referencing the items in the object model explicitly. Alternatively you could do it directly in Python.
Here's a python script that will do something like what you want:
import win32com.client
xl = win32com.client.Dispatch('Excel.Application')
xl.Visible = True
wb = xl.Workbooks.Add()
wb.Worksheets[0].Delete()
wb.Worksheets.Add()
wb.Worksheets[0].Name = 'Sheet1'