Broken Excel output: Openpyxl formula settings? - python

I am creating some Excel spreadsheets from pandas DataFrames using the pandas.ExcelWriter().
Issue:
For some string input, this creates broken .xlsx files that Excel has to repair when opening them ("We found a problem with some content"; the repair log says a formula was removed, see the error messages below).
I assume this happens because Excel interprets the cell content not as a string but as a formula which it cannot parse, e.g. when a string value starts with "=".
Question:
When using xlsxwriter as the engine, I can solve this issue by setting the argument options = {"strings_to_formulas": False}.
Is there a similar argument for openpyxl?
Troubleshooting:
I found the data_only argument of openpyxl's load_workbook(), but it only seems to apply to reading files; I cannot get it to work with ExcelWriter().
Not all output values are strings, so I'd like to avoid converting all output to str.
I could not find an applicable question here.
Any hints are much appreciated, thanks!
Error messages:
We found a problem with some content in 'file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes
The log after opening says:
[...] summary="Following is a list of removed records:">Removed Records: Formula from /xl/worksheets/sheet1.xml part [...]
Code
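import pandas

# output_file and df are defined earlier in the script
excelout = pandas.ExcelWriter(output_file, engine="openpyxl")
df.to_excel(excelout)
excelout.save()  # pandas >= 1.5: use excelout.close(); save() was removed in pandas 2.0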
Versions:
pandas 0.24.2
openpyxl 2.5.6
Excel 2016 for Mac (but replicates on Win)
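As far as I know, openpyxl has no writer option equivalent to XlsxWriter's strings_to_formulas. A possible workaround (a sketch, not taken from this thread): openpyxl records a string that starts with "=" as a formula by setting cell.data_type to "f", and setting it back to "s" after the value is assigned makes openpyxl write the cell as plain text. The helper force_text_cells is hypothetical and assumes the sheet contains no formulas you actually want to keep.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def force_text_cells(ws):
    """Turn every cell openpyxl classified as a formula back into plain text.

    Assumes the worksheet holds no intentional formulas.
    """
    for row in ws.iter_rows():
        for cell in row:
            if cell.data_type == "f":
                # The value keeps its leading "="; only the stored type changes.
                cell.data_type = "s"


wb = Workbook()
ws = wb.active
ws.append(["=not a formula (", 42])  # openpyxl marks A1 as a formula here
force_text_cells(ws)

# Round-trip through an in-memory file to see what was actually written.
buf = BytesIO()
wb.save(buf)
buf.seek(0)
reloaded = load_workbook(buf).active
print(repr(reloaded["A1"].value), reloaded["A1"].data_type)
```

With pandas, the same helper could presumably be run on excelout.sheets[sheet_name] after df.to_excel(excelout) and before the writer is saved or closed; I have not checked this against pandas 0.24.2 specifically.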

I've struggled with this issue too.
I found a strange solution for formulas.
I had to replace all ; (semicolon) signs with , (comma) in the formulas.
When I opened the resulting xlsx file with Excel, this error didn't arise and the formula in Excel showed the usual ;.
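This probably works because the .xlsx file stores formulas in the US-English (invariant) syntax, where "," separates arguments; Excel only displays ";" because of its regional settings. A blind str.replace would also change semicolons inside string literals, so here is a minimal sketch (the helper name is hypothetical) that skips quoted text. It assumes there are no decimal commas (1,5 would also have to become 1.5) and no array constants such as {1;2}, where ";" is the row separator even in the file format.

```python
def semicolons_to_commas(formula: str) -> str:
    """Replace ';' argument separators with ',' outside double-quoted strings."""
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # Excel escapes a quote inside a string as "", which toggles the
            # flag twice and so leaves the string state unchanged.
            in_string = not in_string
        elif ch == ";" and not in_string:
            ch = ","
        out.append(ch)
    return "".join(out)


print(semicolons_to_commas('=IF(A1>0;"yes;no";B1)'))
```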

I spent FAR too long trying to figure out this error.
It turned out I had an extra bracket, so the formula wasn't valid.
I know 99% of people will read this and say "that's not the issue" and move on, but take your formula and paste it into Excel if you can (replacing dynamic values as best you can) and see if Excel accepts it.
If it does, move on and find whatever the other cause is, but if you find it doesn't like the formula, maybe I just saved you a couple of hours....
My command: f'''=IF(ISBLANK(E{row}),FALSE," "))'''
Tiny command, could not understand what was wrong with it. :facepalm:
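Part of the "paste it into Excel" check can be done before writing the file. A minimal sketch, not a real formula parser: the hypothetical helper check_parentheses only verifies that parentheses outside string literals balance, which is enough to catch the extra bracket in the command above.

```python
def check_parentheses(formula: str) -> None:
    """Raise ValueError if parentheses outside string literals are unbalanced."""
    depth = 0
    in_string = False
    for pos, ch in enumerate(formula):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue  # brackets inside "..." are literal text
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at position {pos}: {formula!r}")
    if depth > 0:
        raise ValueError(f"{depth} unclosed '(' in {formula!r}")


row = 2
for formula in (f'=IF(ISBLANK(E{row}),FALSE," "))', f'=IF(ISBLANK(E{row}),FALSE," ")'):
    try:
        check_parentheses(formula)
        print("ok:", formula)
    except ValueError as exc:
        print("rejected:", exc)
```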

Related

Opening an Excel File in Python Disables Dynamic Arrays

I have an Excel workbook that uses functions like OFFSET, UNIQUE, and FILTER, which spill into other cells. I'm using Python to analyze and write some data to the workbook, but after doing so these formulas turn into normal array formulas. This means they now take up a fixed number of cells (however many they took up before opening the file in Python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting Enter, but there are so many of these formulas that it's more work to fix them than to just print the data to a text file and paste it into Excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue I also tried xlsxwriter and the DataFrame to_excel function from pandas. Both had the same issue as openpyxl. For context, I am on Python 3.11 and using the most recent versions of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
    ws.append([i])  # appends the values 0-9 as new rows in column A
wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0 to 9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it an array formula, so it won't spill. Hitting Enter in the formula removes the array and the sheet properly shows the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
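No answer to this question is included here. My understanding (an assumption, not confirmed in the thread) is that Excel stores a spilling formula as an array formula (<f t="array" ref="...">) and marks its <c> element with a cm attribute pointing into xl/metadata.xml, which is what flags it as dynamic; openpyxl does not carry that metadata over, so after saving only a plain array formula is left, which Excel shows with curly brackets. The stdlib sketch below only helps check that hypothesis by listing the array formulas in one sheet's XML and whether their cell still has cm. The part name xl/worksheets/sheet1.xml is an assumption and depends on the workbook's sheet order.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by the worksheet XML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def array_formulas(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """List array formulas in one worksheet part of an .xlsx file.

    Returns (cell, ref, formula_text, has_cm) tuples, where has_cm tells
    whether the <c> element still carries the cell-metadata index "cm".
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_part))
    result = []
    for cell in root.findall(".//m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is not None and formula.get("t") == "array":
            result.append(
                (cell.get("r"), formula.get("ref"), formula.text, "cm" in cell.attrib)
            )
    return result
```

Calling array_formulas on workbook.xlsx and on workbook_2.xlsx (with the part name of the 'main' sheet) and comparing the has_cm flags would show whether the marker was lost in the round trip.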

Pandas read_excel returning nan for cells having simple formula if excel file is created by program [duplicate]

This question already has answers here:
Pandas read_excel - returning nan for cells having formula
(3 answers)
I use pd.read_excel to read an Excel file which was created by openpyxl and downloaded from a URL.
The parsed DataFrame gives NaN where the cell value is a formula.
# the cell's formula is simply =100-3
    0
0 NaN
I tried opening it manually with MS Office, clicking the "edit" button and saving it, and that solves the problem.
# after saving the file in Excel, the problem is solved, e.g. 97
    0
0  97
Is there a way to do this programmatically, ideally without using MS Excel or win32com? Thanks
not enough points to comment but this probably can help you:
stackoverflowanswer
After doing some searches, I found my question may be a duplicate of (or similar to):
pandas-read-excel-returning-nan-for-cells-having-formula
and found more explanations from:
python-openpyxl-read-xlsx-data-after-writing-on-existing-xlsx-with-formula
openpyxl-data-only-gives-only-a-none-answer-when-storing-a-variable
python-openpyxl-data-only-true-returning-none
refresh-excel-external-data-with-python
Some notes (conclusions):
openpyxl can write but doesn't calculate Excel formulas; with the data_only=True argument it only reads the cached value from the last calculation by MS Excel or another application, if there is one.
To solve this manually, as @Orlando's answer mentioned, open the file in Excel and save it (Excel automatically calculates and stores the formula results).
To solve this programmatically (with Excel installed), use win32com to open and save it (see this answer).
To solve this programmatically (without Excel), you must calculate the results of the formula strings yourself or with a module like formulas, then set the calculated value back into the cell (warning: this deletes the formula). If you also want to keep the formula with a default/cached value, use XlsxWriter, which can write a formula in a cell together with a cached value.
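For the XlsxWriter route in the last note: write_formula() accepts an optional value argument after the cell format, which is stored as the formula's cached result, so readers that do not recalculate (openpyxl with data_only=True, and pandas.read_excel on top of it) see that value. A sketch that assumes you compute the value yourself (97 here) and writes to an in-memory buffer so the stored XML can be inspected.

```python
import io
import zipfile

import xlsxwriter

buf = io.BytesIO()
workbook = xlsxwriter.Workbook(buf, {"in_memory": True})
worksheet = workbook.add_worksheet()
# write_formula(row, col, formula, cell_format, value): the last argument is
# the cached result seen by readers that do not recalculate.
worksheet.write_formula(0, 0, "=100-3", None, 97)
workbook.close()

# Look at the stored cell: the formula and the cached value sit side by side.
with zipfile.ZipFile(buf) as zf:
    sheet_xml = zf.read("xl/worksheets/sheet1.xml").decode("utf-8")
print(sheet_xml[sheet_xml.index("<c "):sheet_xml.index("</c>") + 4])
```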
For me, because my formula is very simple, I use eval like:
import openpyxl

wb = openpyxl.load_workbook('./test_formula2.xlsx')
ws = wb.active
ws.cell(2, 2).value  # '=100-1'
eval(ws.cell(2, 2).value[1:])  # slice off the leading '=', e.g. 99
to get the calculated result.
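eval() executes whatever Python the cell text contains, so it is risky on files you didn't write, and Excel and Python operators differ (for example ^ is exponentiation in Excel but bitwise XOR in Python). A safer sketch for constant arithmetic like =100-1 walks the expression with the standard ast module and only allows numbers, + - * / and parentheses; cell references, functions and ^ raise ValueError. safe_excel_arith is a hypothetical helper name.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_excel_arith(formula: str) -> float:
    """Evaluate a constant-only formula such as '=100-1' without eval()."""
    expr = formula[1:] if formula.startswith("=") else formula
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"cannot parse formula {formula!r}") from exc

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names (A1), calls (SUM(...)), ^ (BitXor) etc. are rejected.
        raise ValueError(f"unsupported element in {formula!r}: {ast.dump(node)}")

    return walk(tree)


print(safe_excel_arith("=100-1"))
```

In the snippet above, safe_excel_arith(ws.cell(2, 2).value) would take the place of the eval() call.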
You can use formulas
The following snippet seems to work:
import formulas
xl_model = formulas.ExcelModel().loads('test_formula.xlsx').finish()
xl_model.calculate()
xl_model.write(dirpath='.')
This will write a "TEST_FORMULA.XLSX" (all caps for some reason) file with calculated values in place of the formulas. Importantly, this does not rely on Excel.
Here is the formulas documentation if you need to dig into it.

Excel removes a formula set by Pandas, but setting the formula manually makes the formula work

After checking this post and seeing that there was no response, I have opened this one.
I am trying to set a formula in an Excel cell through Pandas in Python. So far it worked by specifying the formula as text but with a new formula I am having problems:
=FILTER(SHEET1!A2:I456,(IF(SHEET2!D9=0,SHEET1!D2:D456>SHEET2!D9,SHEET1!D2:D456>=SHEET2!D9)),"No data")
(In the Python code, the " characters around the "No data" (if_empty) argument are written as \")
If I open the Excel file after the code runs, Excel complains that there is a problem and I have to accept a "recover", which shows that the formula has been removed; the cell then displays 0.
After that, if I put the same formula (with " instead of \") manually into the same cell, it works and the information is displayed.
I have tried to specify the cells with $ ($A$2) without success... I have also checked the Excel options, and calculation is set to "Automatic".
What is the problem?
Regards.
After some more research I have found the problem. I'm using Office 365, in case it affects this answer.
What was driving me crazy was that the handwritten formula in Excel worked. My workaround was to write the contents of the formula as text without the = sign, so that Excel would not interpret it as a formula; then I opened Excel, went to that cell, typed the = by hand, and when I pressed Enter the data was displayed.
As I use Excel in Spanish, but with Pandas you have to write everything in English notation, I decided to look at what Excel stored internally when I typed the = by hand and the formula worked. What I did was:
Change the file extension from .xlsx to .zip.
Open the zip and go to the path: xl/worksheets/sheet[number].xml.
Find the formula field, looking for <f> or </f>.
At that point I noticed that the content, instead of starting with:
FILTER(....)
I found:
_xlfn._xlws.FILTER(....)
So in the PANDAS code I changed:
cell_formula = f"=FILTER(...)"
by:
cell_formula = f"=_xlfn._xlws.FILTER(...)"
And then:
workbook = pandas_writer.book
worksheet = workbook.sheetnames[sheet_name]
worksheet.write_array_formula("A2:Y109", "{" + cell_formula + "}")
workbook.close()
And now when I open Excel I don't get the error and the formula shows the result. I could not find this function in this section of the XlsxWriter documentation or in the Microsoft documentation.
So if this happens to you, fix the function by hand in Excel, save the changes and inspect the internal XML that Excel generates.
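The manual fix described above, adding _xlfn._xlws. in front of FILTER, can also be applied in code before the formula is written. A minimal sketch with a deliberately small, hypothetical prefix table: the FILTER entry comes from the XML found above, while the SORT and UNIQUE entries are my assumption and should be checked the same way (save the formula from Excel and look at the <f> element). The regex would also rewrite a matching name inside a string literal, which this sketch does not guard against.

```python
import re

# Hypothetical, intentionally partial table: function name -> prefix that
# Excel writes in the file. Verify each entry against a file saved by Excel.
FUTURE_PREFIXES = {
    "FILTER": "_xlfn._xlws.",
    "SORT": "_xlfn._xlws.",
    "UNIQUE": "_xlfn.",
}

# A bare function name followed by "(" that is not already prefixed
# (no "." or word character right before it, so "_xlfn._xlws.FILTER(" and
# "SORTBY(" are left alone).
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(FUTURE_PREFIXES) + r")\(", re.IGNORECASE
)


def add_future_prefixes(formula: str) -> str:
    """Prefix newer Excel functions so the stored formula matches Excel's XML."""
    return _PATTERN.sub(
        lambda m: FUTURE_PREFIXES[m.group(1).upper()] + m.group(1).upper() + "(",
        formula,
    )


print(add_future_prefixes('=FILTER(SHEET1!A2:I456,SHEET1!D2:D456>0,"No data")'))
```

Recent XlsxWriter versions also document a use_future_functions workbook option and a write_dynamic_array_formula() method that handle this themselves, which may be simpler if your version has them.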

Python Read value formula from xlsx

I have a problem reading the value of a formula from an xlsx file. To get the value I use openpyxl, but when I read it I get "None".
This is my code:
from openpyxl import load_workbook

wb = load_workbook('output.xlsx', data_only=True)  # read cached values instead of formulas
sh = wb["Sheet1"]
val = sh['C5'].value
The file output.xlsx contains the formula =A1+B1 in cell C5, which shows 2, but I can't get this value.
Can anybody help me?
Maybe I need another library to read the value of a formula from an xlsx file. Is there an example of how to do it?
I know it's possible to convert this file into another format for reading, but that is not applicable for this task.
There are a number of solutions to get the value from an Excel cell. It all depends on the environment you are in to get the evaluated cell value.
If you have used Excel to create the xls or xlsx file, there is usually a value in the cell. It's not guaranteed, because it is possible to turn off re-calc on save, but it is usual. And if someone has turned off re-calc on save, the value may not be correct.
If the xls or xlsx file has been created by a non-Excel library (e.g. openpyxl, xlwt), then unless you've explicitly set the value of the cell, it may not have one.
Thanks to the Python universe, there are options.
Pycel, xlcalculator, DataNitro, Formulas and Schedula can read an Excel file and evaluate the formula for a cell. None of these solutions need Excel installed. Most are likely to be OS independent.
I can't find a datanitro link
disclaimer: I am the project owner of xlcalculator.
An example using xlcalculator:
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'output.xlsx'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
val1 = evaluator.evaluate('Sheet1!C5')
print("value 'evaluated' for Sheet1!C5:", val1)
I came across this problem yesterday and found a few posts on Stack Overflow about it. Please check Charlie Clark's post on Read Excel cell value and not the formula computing it - openpyxl.
"openpyxl does not evaluate formulae. When you open an Excel file with
openpyxl you have the choice either to read the formulae or the last
calculated value. If, as you indicate, the formula is dependent upon
add-ins then the cached value can never be accurate. As add-ins
outside the file specification they will never be supported. Instead
you might want to look at something like xlwings which can interact
with the Excel runtime. "
Using xlwings to read the value of a formula cell seems to be a solution, but it doesn't work for me because xlwings only works on macOS and Windows.

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!
I had the same problem... and eventually found the following solution with openpyxl:
from openpyxl import load_workbook

def Write_Workbook(path, Some_value):
    wb = load_workbook(path)
    ws = wb["Sheet_name"]  # get_sheet_by_name() is deprecated in openpyxl
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
Doing this, my file was saved with all previously inserted formulas preserved (note that openpyxl works with .xlsx/.xlsm files, not the older .xls format).
Hope this helps!
I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)
As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.
We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If you have a double quote in your formula, replace it with two double quotes to escape it, so = "a" & "b" becomes "= ""a"" & ""b""".
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
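The conversion step in this answer has to undo both parts of the quoting: the outer quotes (the [1:-1] slice) and the doubled "" inside. A minimal sketch of that step; text_to_formula is a hypothetical helper, and it keeps the leading = just like the snippet above, so strip it as well if your writer expects formula text without it.

```python
def text_to_formula(cell_text: str) -> str:
    """Convert '"=SUM(C5:L5)"'-style quoted text back into formula text."""
    if len(cell_text) < 2 or not (cell_text.startswith('"') and cell_text.endswith('"')):
        raise ValueError(f"not a quoted formula: {cell_text!r}")
    # Drop the outer quotes, then turn each escaped "" back into a single ".
    return cell_text[1:-1].replace('""', '"')


print(text_to_formula('"= ""a"" & ""b"""'))
```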
