How to save in openpyxl without losing formulae? - python

Because I need to parse and then use the actual data in cells, I open an xlsm in openpyxl with data_only = True.
This has proved very useful. Now though, having the same need for an xlsm that contains formulae in cells, I find that when I save my changes, the formulae are missing from the saved version.
Are data_only = True and formulae mutually exclusive? If not, how can I access the actual value in cells without losing the formulae when I save?
When I say I lose the formulae, I mean that the results of the formulae (sums, concatenations, etc.) are preserved, but the formulae themselves are no longer displayed when a cell is clicked.
UPDATE:
To confirm whether the formulae were being preserved, I re-opened the saved xlsm, this time with data_only left as False, and checked the value of a cell that had been constructed using a formula. Had the formulae been preserved, opening the xlsm with data_only set to False should have returned the formula. Instead it returns the actual text value (which is not what I want).

If you want to preserve the integrity of the workbook, i.e. retain the formulae, then you cannot use data_only=True. The documentation makes this very clear.

Part of your question was: Are data_only = True and formulae mutually exclusive?
The answer to that, in openpyxl, is yes.
But this is not intrinsic to Excel. You could have a library like openpyxl which gives you access to both the formulas and their results. This is unlikely to happen, since the maintainer(s) of openpyxl are philosophically opposed to this idea.
So, how you're expected to handle your kind of situation in openpyxl is to load the workbook twice: once with data_only=True just to read the data (which you keep in memory), then load it again as a "different" workbook with data_only=False to get a writable version.
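The two-pass approach described above could look roughly like the following. This is a minimal sketch, not a definitive recipe: the helper name load_values_and_formulas, the file name demo.xlsx and the cell addresses are made up for illustration, and the demo uses a plain .xlsx so that it runs without Excel. For an .xlsm you would presumably pass keep_vba=True on the second load so the macros survive the save.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def load_values_and_formulas(path, keep_vba=False):
    """Hypothetical helper: return (cached_values, editable_workbook)."""
    # Pass 1: cached results only. This workbook must never be saved,
    # otherwise the formulae are replaced by their last computed values.
    values_wb = load_workbook(path, data_only=True)
    cached = {
        cell.coordinate: cell.value
        for row in values_wb.active.iter_rows()
        for cell in row
    }
    values_wb.close()

    # Pass 2: formulae intact. This is the copy to modify and save.
    formulas_wb = load_workbook(path, keep_vba=keep_vba)
    return cached, formulas_wb


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")

    # Build a small workbook containing a formula.
    wb = Workbook()
    ws = wb.active
    ws["A1"], ws["A2"], ws["A3"] = 1, 2, "=SUM(A1:A2)"
    wb.save(path)

    cached, editable = load_values_and_formulas(path)
    editable.active["B1"] = "edited"   # change something unrelated
    editable.save(path)

    # Re-open without data_only to check that the formula survived.
    saved_formula = load_workbook(path).active["A3"].value
    print(saved_formula)
    print(cached["A3"])
```

One caveat the demo makes visible: the cached value of A3 is None here, because the file was written by openpyxl and never opened in Excel, so no result was ever computed and stored. With a workbook last saved by Excel, the first pass returns the computed results.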
The "canonical" way of modifying an existing workbook with Python while preserving everything else (including formatting, formulas, charts, macros, etc.) is to use a COM interface (such as PyWin32, or higher-level wrappers like pywinauto or xlwings) to control a running instance of Excel. Of course, this is only possible if you are running on a machine with Excel installed.

Related

Can a pycel object be saved as an Excel Workbook

After parsing Excel file to Python and evaluating the workbook using pycel, can the pycel object be saved as an Excel file maintaining all original formatting, etc? I.e. only values need to be updated.
TL;DR
No, you cannot save a pycel object back into Excel.
Why not?
The basic problem is that pycel is based on openpyxl. Openpyxl is used to read (and write if needed) Excel spreadsheets. However, while openpyxl has the computed values available for formula cells for a workbook it read in, it does not really allow those computed values to be saved back into a workbook it writes. It doesn't really make sense to save a different computed value for a formula cell, since the cell's value will be recomputed once it is opened back up in Excel.
While it is true that pycel has the information available to properly populate a new value when the workbook is written, it evidently is not a use case that was important to the openpyxl authors or contributors.
Please note that the openpyxl maintainers gladly took pull requests to make it run better with pycel. It seems likely they would be open to discussing a PR for writing values into workbooks.

Python 3 and Excel, Finding complex module to use

I've been looking for ages to find a suitable module to interact with excel, which needs to do the following:
Check a column of cells for an "incorrect" value and change it
Check for empty cells, and if so, replace it
Check that a cell value is consistent with the contents of another cell (for example, if one cell is called Datasheet, the code in another cell = DS) and, if not, change it.
I've looked at openpyxl, but I am running Python 3 and I can only seem to find it working for Python 2.
I've seen a few others, but they seem to focus mainly on creating a new spreadsheet and simple writing/reading.
The pandas library is great for working with Excel files. It can read Excel files easily, and you then have access to a lot of tools. You can do all the operations you mentioned above, and you can also save your result in Excel format.
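As a rough illustration of the three checks from the question, here is a small pandas sketch. The column names (name, code, status), the placeholder values and the file names in the comments are all assumptions made up for this example; the data is built in memory so the snippet runs without an Excel file.

```python
import pandas as pd

# Stand-in data; a real sheet would be loaded with something like
# pd.read_excel("report.xlsx") (hypothetical file name).
df = pd.DataFrame({
    "name": ["Datasheet", "Manual", "Datasheet"],
    "code": ["DS", None, "XX"],
    "status": ["ok", "N/A", "ok"],
})

# 1. Check a column for an "incorrect" value and change it.
df["status"] = df["status"].replace("N/A", "unknown")

# 2. Check for empty cells and replace them.
df["code"] = df["code"].fillna("MISSING")

# 3. Check that one cell is consistent with another and fix it if not:
#    every row named "Datasheet" should carry the code "DS".
mismatch = (df["name"] == "Datasheet") & (df["code"] != "DS")
df.loc[mismatch, "code"] = "DS"

print(df)

# Saving would be df.to_excel("cleaned.xlsx", index=False), again with a
# hypothetical file name.
```

Note that writing back with to_excel produces a new sheet of plain values, so this route suits data clean-up rather than editing a workbook whose formatting or formulae have to be kept.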

Limit the number of worksheets in an Excel file using Python

I need to limit the number of worksheets in an Excel file to a specific number using Python. The user should not be able to add a new worksheet once the number of sheets in that workbook reaches a particular number.
I couldn't find any solution using xlsxwriter, openpyxl or xlrd
Is there any option available using some other packages?
Excel doesn't have such functionality built in. You can only disallow creating new sheets by protecting the workbook or with a VBA handler that reverses the operation immediately.
Protecting the workbook also disables other worksheet operations like moving, renaming and hiding/unhiding which may or may not be desirable.
OTOH, the VBA handler can be more intelligent than the one on the link:
    Private Sub Workbook_NewSheet(ByVal Sh As Object)
        If ThisWorkbook.Worksheets.Count > <Maximum> Then
            With Application
                .ScreenUpdating = False
                .DisplayAlerts = False
                Sh.Delete
                .DisplayAlerts = True
                .ScreenUpdating = True
            End With
        End If
    End Sub
Of course, this will have no effect if one edits the file with a 3rd-party package that doesn't run VBA, or disables macros in Excel.
To have macros, the workbook must be saved as .xlsm, or Excel would give an error upon opening.
See Working with VBA Macros — XlsxWriter Documentation for a Python implementation. openpyxl cannot work with macros, only preserve them at most, and xlrd appears to be designed only for reading, not editing. Alternatively, there's always Excel's own COM interface, which pywin32 can use.
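For the simpler option mentioned earlier, protecting the workbook structure, a minimal openpyxl sketch might look like this. The sheet name and the password are placeholders, and the workbook is saved to an in-memory buffer only so the example is self-contained.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.workbook.protection import WorkbookProtection

wb = Workbook()
wb.create_sheet("Second")  # illustrative sheet name

# Lock the workbook structure. In Excel this blocks adding, deleting,
# moving, renaming and hiding/unhiding sheets until it is unprotected.
wb.security = WorkbookProtection(
    workbookPassword="example-password",  # placeholder password
    lockStructure=True,
)

# Save to memory and reload to confirm the flag was written.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
reloaded = load_workbook(buffer)
print(reloaded.security.lockStructure, len(reloaded.sheetnames))
```

As with the VBA handler, this only constrains users working in Excel's interface; a script using openpyxl or a similar package can still add sheets to the file.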

openpyxl and stdev.p name error

I have a script to format a bunch of data and then push it into excel, where I can easily scrub the broken data, and do a bit more analysis.
As part of this I'm pushing quite a lot of data to excel, and want excel to do some of the legwork, so I'm putting a certain number of formulae into the sheet.
Most of these ("=AVERAGE(...)", "=A1+3", etc.) work absolutely fine, but when I add the standard deviation ("=STDEV.P(...)") I get a name error when I open the file in Excel 2013.
If I click in the cell within Excel and hit Enter (i.e. don't change anything within the cell), the cell re-calculates without the name error, so I'm a bit confused.
Is there anything extra that needs to be done to get this to work?
Has anyone else had any experience of this?
Thanks,
Will
--
I've investigated further and this is the issue:
When saving the formula "STDEV.P" openpyxl saves it as:
"=_xludf.STDEV.P(...)"
which is correct for many formulae, but not this one.
The result should be:
"=_xlfn.STDEV.P(...)"
When I explicitly change the function to the latter, it works as expected.
I'll file a bug report, so hopefully this is done automatically in the future.
I suspect that there might be a subtle difference in what you think you need to write as the formula and what is actually required. openpyxl itself does nothing with the formula, not even check it. You can investigate this by comparing two files (one from openpyxl, one from Excel) with ostensibly the same formula. The difference might be simple – using "." for decimals and "," as a separator between values even if English isn't the language – or it could be that an additional feature is required: Microsoft has continued to extend the specification over the years.
Once you have some pointers please submit a bug report on the openpyxl issue tracker.
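The workaround found in the question, writing the prefixed function name explicitly, can be sketched as follows. The sample numbers and cell addresses are made up. The FORMULAE set in openpyxl.utils lists the function names openpyxl knows about, and the openpyxl documentation says that functions missing from it need the _xlfn. prefix; the snippet prints the membership test rather than assuming its result.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.utils import FORMULAE

wb = Workbook()
ws = wb.active

# Some sample data in A1:A5 (made-up values).
for row, value in enumerate([2, 4, 4, 4, 5], start=1):
    ws.cell(row=row, column=1, value=value)

# Is the function one that openpyxl lists as known?
print("STDEV.P" in FORMULAE)

# Write the formula with the explicit prefix, as in the question.
ws["B1"] = "=_xlfn.STDEV.P(A1:A5)"

# openpyxl stores the string as given; round-trip it to check.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
stored = load_workbook(buffer).active["B1"].value
print(stored)
```

openpyxl does not evaluate the formula, so the result only appears once the file is opened in Excel (or another application that recalculates).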

Editing workbooks with rich text in openpyxl

I was wondering if openpyxl can read and/or write rich text into excel. I am aware that this question was asked once before in 2012 linked below, but I am not sure if this has changed.
As it stands load_workbook() seems to throw away rich text formatting.
As for a specific problem, I need to open, edit, and save a workbook where some cells have both superscripted and normal text in one cell. When I save the workbook, the format of the first character of the cell is applied to the rest of the cell.
Here is the 2012 question:
How do I find the formatting for a subset of text in an Excel document cell
After looking around, it seems like rich text was implemented in openpyxl (based on the issues list on openpyxl's bitbucket):
https://bitbucket.org/openpyxl/openpyxl/issues?q=rich+text
But I am still unclear on how to use it (if I interpreted the issues list correctly at all). If it helps at all, I am not actually editing the contents of these cells; I simply need them not to lose their formatting on save.
Any thoughts would be greatly appreciated.
Thanks!
Best
Formatting below the level of the cell is not supported by openpyxl. To use it you'd have to implement your own code when writing as openpyxl just stores whatever strings it receives. Full read/write support would add a great deal of complexity.
