Calculating Excel sheets without opening them (openpyxl or xlwt) - python

I made a script that opens a .xls file, writes a few new values in it, then saves the file.
Later, the script opens it again, and wants to find the answers in some cells which contain formulas.
If I read that cell with openpyxl, I get the formula (e.g. "=A1*B1").
And if I enable data_only, I get nothing (None).
Is there a way to let Python calculate the .xls file? (or should I try PyXll?)

I realize this question is old, but I ran into the same problem and extensive searching didn't produce an answer.
The solution is in fact quite simple so I will post it here for posterity.
Let's assume you have an xlsx file that you have modified with openpyxl. As Charlie Clark mentioned, openpyxl will not calculate the formulas, but if you open the file in Excel the formulas are calculated automatically. So all you need to do is open the file in Excel and save it again.
To do this you can use the win32com module.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'absolute/path/to/your/file')
# this must be the absolute path (r'C:/abc/def/ghi')
workbook.Save()
workbook.Close()
excel.Quit()
That's it. I've seen all these suggestions to use Pycel or Koala, but that seems like overkill if all you need to do is tell Excel to open and save.
Granted, this solution only works on Windows.

There is actually a project that takes Excel formulas and evaluates them using Python: Pycel. Pycel uses Excel itself (via COM) to extract the formulas, so in your case you would skip that part. The project probably has something useful that you can use, but I can't vouch for its maturity or completeness. It was not really developed for the general public.
There is also a newer project called Koala which builds on both Pycel and OpenPyXL.
Another approach, if you can't use Excel but you can calculate the results of the formulas yourself (in your Python code), is to write both the value and the formula into a cell (so that when you read the file, you can just pull the value, and not worry about the formula at all). As of this writing, I haven't found a way to do it in OpenPyXL, but XlsxWriter can do it. From the documentation:
XlsxWriter doesn’t calculate the value of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened. This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas, such as Excel Viewer, or some mobile applications will only display the 0 results.
If required, it is also possible to specify the calculated result of the formula using the options value parameter. This is occasionally necessary when working with non-Excel applications that don’t calculate the value of the formula. The calculated value is added at the end of the argument list:
worksheet.write_formula('A1', '=2+2', num_format, 4)
With this approach, when it's time to read the value, you would use OpenPyXL's data_only option. (For other people reading this answer: If you use xlrd, then only the value is available anyway.)
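As a rough illustration of the round trip described above, here is a minimal sketch that writes a formula together with a precomputed result using XlsxWriter and reads it back with openpyxl's data_only option. The helper names, the temporary file and the sample formula are made up for the example, and it assumes both libraries are installed.

```python
import os
import tempfile

import xlsxwriter
from openpyxl import load_workbook


def write_formula_with_value(path, cell, formula, value):
    """Write `formula` into `cell` and store `value` as its cached result."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    # Third argument is the cell format (None = default formatting);
    # the fourth is the result we computed ourselves in Python.
    worksheet.write_formula(cell, formula, None, value)
    workbook.close()


def read_cached_value(path, cell):
    """Read the stored result of `cell`, ignoring the formula text."""
    workbook = load_workbook(path, data_only=True)
    return workbook.active[cell].value


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.xlsx")  # hypothetical file name
    write_formula_with_value(demo_path, "A1", "=2+2", 4)
    cached = read_cached_value(demo_path, "A1")
    print(cached)
```

The stored result is only as good as the Python calculation behind it: nothing checks that it agrees with what Excel would compute for the formula.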
Finally, if you do have Excel, then perhaps the most straightforward and reliable thing you can do is automate the opening and resaving of your file in Excel (so that it will calculate and write the values of the formulas for you). xlwings is an easy way to do this from either Windows or Mac.

The formulas module works for me. For details, please refer to https://pypi.org/project/formulas/
from os import path
from openpyxl import load_workbook
import formulas
# The variable spreadsheet holds the full path (including filename) of the Excel spreadsheet with unevaluated formulae
fpath = path.basename(spreadsheet)
dirname = path.dirname(spreadsheet)
# loads() receives only the file name here, so run this from the spreadsheet's directory
xl_model = formulas.ExcelModel().loads(fpath).finish()
xl_model.calculate()
xl_model.write(dirpath=dirname)
#Use openpyxl to open the updated excel spreadsheet now
wb = load_workbook(filename=spreadsheet,data_only=True)
ws = wb.active

I ran into the same problem, and after some time researching I ended up using pyoo ( https://pypi.org/project/pyoo/ ), which targets OpenOffice/LibreOffice and is therefore available on all platforms. It is also more straightforward, since it communicates with the office suite natively and doesn't require saving and closing the file. I tried several other libraries but ran into the following problems:
xlwings: Only works on Windows/macOS with Excel installed, so I couldn't evaluate it.
koala: Seems to be broken after the networkx 2.4 update.
openpyxl: As pointed out by others, it can't calculate formulas, so I was looking into combining it with pycel to get values. In the end I didn't try it because I found pyoo. openpyxl+pycel might not work as of now, since pycel also relies on the networkx library.

No, and in openpyxl there never will be. I think there is a Python library that purports to implement an engine for such formulae, which you can use.

xlcalculator can do this job. https://github.com/bradbase/xlcalculator
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'use_case_01.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
# First!A2
# value is 0.1
#
# Fourth!A2
# formula is =SUM(First!A2+1)
val1 = evaluator.evaluate('Fourth!A2')
print("value 'evaluated' for Fourth!A2:", val1)
evaluator.set_cell_value('First!A2', 88)
# now First!A2 value is 88
val2 = evaluator.evaluate('Fourth!A2')
print("New value for Fourth!A2 is", val2)
This results in the following output:
file_name use_case_01.xlsm ignore_sheets []
value 'evaluated' for Fourth!A2: 1.1
New value for Fourth!A2 is 89

Related

Excel removes a formula set by Pandas, but setting the formula manually makes the formula work

After checking this post and seeing that it has no response, I have opened this one.
I am trying to set a formula in an Excel cell through Pandas in Python. So far this has worked by specifying the formula as text, but a new formula is giving me problems:
=FILTER(SHEET1!A2:I456,(IF(SHEET2!D9=0,SHEET1!D2:D456>SHEET2!D9,SHEET1!D2:D456>=SHEET2!D9)),"No data")
(In the Python code, the " characters around the "No data" fallback are escaped as \".)
If I open the Excel file after running the code, Excel complains that there is a problem and I have to accept a "recover" prompt, which reports that the formula has been removed; the cell then displays 0.
After that, if I enter the same formula (with " instead of \") manually in the same cell, it works and the information is displayed.
I have tried specifying the cells with $ ($A$2) without success. I have also checked the Excel options, and formula calculation is set to "Automatic".
What is the problem?
Regards.
After some more research I found the problem. I'm using Office 365, in case that affects this answer.
What was driving me crazy was that the formula worked when typed by hand in Excel. My workaround was to write the formula as text without the = sign so that Excel would not interpret it as a formula, then open Excel, go to that cell, and type the = by hand; when I pressed Enter, the data was displayed.
Since I use Excel in Spanish but have to write everything in English notation with Pandas, I decided to look at what Excel stored internally when I typed the = by hand and the formula worked. What I did was:
Change the file extension from .xlsx to .zip.
Open the zip and go to the path: xl/worksheets/sheet[number].xml.
Find the formula field, looking for <f> or </f>.
At that point I noticed that the content, instead of starting with:
FILTER(....)
I found:
_xlfn._xlws.FILTER(....)
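The manual inspection above (rename to .zip, open the sheet XML, look for <f>) can also be scripted. Here is a minimal sketch using only the standard library. The function name and the default sheet part are assumptions for illustration, and the demo builds a tiny hand-written archive in memory instead of reading a real workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the worksheet XML inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_formulas(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: (formula_text, cached_value_or_None)} for one sheet."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(sheet_part))
    found = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is None:
            continue  # plain value cell, no formula stored
        value = cell.find("m:v", NS)
        found[cell.get("r")] = (
            formula.text,
            None if value is None else value.text,
        )
    return found


# Demo input: a hand-written sheet, not a complete workbook.
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1"><f>_xlfn._xlws.FILTER(B1:B3,B1:B3&gt;1)</f><v>2</v></c></row>'
    '<row r="2"><c r="A2"><f>SUM(B1:B3)</f></c></row>'
    "</sheetData></worksheet>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/worksheets/sheet1.xml", SHEET_XML)

formulas_found = list_formulas(buffer)
for ref, (text, cached) in formulas_found.items():
    print(ref, text, cached)
```

For a real file, pass its path instead of the in-memory buffer and adjust sheet_part to the sheet number you are interested in.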
So in the PANDAS code I changed:
cell_formula = f"=FILTER(...)"
to:
cell_formula = f"=_xlfn._xlws.FILTER(...)"
And then:
workbook = pandas_writer.book
worksheet = workbook.sheetnames[sheet_name]
worksheet.write_array_formula("A2:Y109", "{" + cell_formula + "}")
workbook.close()
Now when I open Excel I don't get the error and the formula shows its result. Looking afterwards at this section of the XlsxWriter documentation and at the Microsoft documentation, I could not find this function listed.
So if this happens to you, fix the function by hand in Excel, save the changes, and inspect the internal XML that Excel generates.
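If several formulas need the same treatment, the prefix fix can be wrapped in a small helper. This is only a sketch: the function name and the mapping are illustrative, and only the FILTER entry is taken from the XML inspected in this answer. The other entries are assumptions that should be confirmed the same way before relying on them.

```python
import re

# FILTER's prefix comes from the inspected sheet XML; the remaining
# entries are assumptions to verify against a file saved by Excel.
FUTURE_FUNCTION_PREFIXES = {
    "FILTER": "_xlfn._xlws.",
    "SORT": "_xlfn._xlws.",
    "UNIQUE": "_xlfn.",
    "XLOOKUP": "_xlfn.",
}

# Match a listed function name directly followed by "(", unless it is
# already prefixed (preceded by ".") or is the tail of a longer name.
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(FUTURE_FUNCTION_PREFIXES) + r")\(",
    flags=re.IGNORECASE,
)


def add_future_prefixes(formula):
    """Prefix newer Excel functions so the stored formula is accepted."""

    def _prefix(match):
        name = match.group(1).upper()
        return FUTURE_FUNCTION_PREFIXES[name] + name + "("

    return _PATTERN.sub(_prefix, formula)


fixed = add_future_prefixes('=FILTER(A1:A3,A1:A3>1,"No data")')
print(fixed)
```

The helper does not parse the formula, so a function name that appears inside a quoted string would also be rewritten. XlsxWriter's documentation also describes a use_future_functions workbook option for the same purpose; check whether your installed version supports it.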

is it possible to insert a excel formula value not formula

I am trying to insert a value using an Excel formula, which works, but I want to save the resulting value rather than the formula. Here is the code I have so far:
print("Adding formula to " + filename)
for i, cellObj in enumerate(sheet_formula['P'], 1):
    cellObj.value = '=IF(AND(OR(A{0}="g_m",A{0}="s_m"),ISNUMBER(SEARCH("A", E{0}))), "A", VLOOKUP(A{0},\'i ma\'!A:B, 2, FALSE))'.format(i)
sheet_formula.cell(row=1, column=16).value = 'C'
This code inserts the formula, but I want to save the value, not the formula.
"The basic reasons for abandoning openpyxl are: (1) XLS file processing is not supported; (2) the bug of testing current version style preservation is not solved; If you encounter the above two problems, give up openpyxl and embrace xlwings. There is no way out." Grabbed from here.
"It's possible using xlwings which uses pywin32 objects to interact with Excel, rather than just reading/writing xlsx or csv documents like openpyxl and pandas. This way, Excel actually executes the formula, and xlwings grabs the result." Grabbed from here.
So while it does not seem possible using openpyxl alone (or any other Python library that does not support xls file processing), it is possible using xlwings. I have added a simple sample below: I opened a fresh workbook, added a formula, and replaced the formula with its calculated value.
import xlwings as xw
app = xw.App(visible=False, add_book=False)
wb = app.books.add()
ws = wb.sheets.active
ws['A1'].value = '=3+5'
ws['A1'].value = ws['A1'].value
wb.save(r'C:\Users\...\test.xlsx')
wb.close()
app.quit()
exit()
Hopefully the above helps. Please keep in mind that I'm a Python beginner!
For those who are interested, a good explanation of the difference between openpyxl and xlwings can be found here, and a somewhat similar problem with some answers can be found here.
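If Excel is not available, another route is to compute the result in Python and store the plain value instead of the formula. The sketch below mirrors the IF/AND/OR/SEARCH/VLOOKUP formula from the question. The function name, the sample rows and the lookup dictionary (standing in for columns A:B of the 'i ma' sheet) are all hypothetical.

```python
def resolve_p_value(a_value, e_value, lookup):
    """Python stand-in for the column P formula from the question.

    `lookup` plays the role of 'i ma'!A:B and is expected to map
    lower-cased keys to the value VLOOKUP would return.
    """
    key = str(a_value).lower()
    # Excel's "=" on text and its SEARCH function ignore case.
    if key in ("g_m", "s_m") and "a" in str(e_value).lower():
        return "A"
    # Exact-match VLOOKUP; returns None where Excel would show #N/A.
    return lookup.get(key)


# Hypothetical sample data: (column A, column E) pairs.
rows = [("g_m", "xAy"), ("s_m", "zzz"), ("other", "A")]
lookup = {"s_m": "S", "other": "O"}

results = [resolve_p_value(a, e, lookup) for a, e in rows]
print(results)
```

In the loop from the question, the returned value would be assigned to cellObj.value in place of the formula string, so the saved file contains only values.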

Python Read value formula from xlsx

I have a problem reading the value of a formula from an xlsx file. I use openpyxl to get the value, but what I get back is "None".
This is my code:
from openpyxl import load_workbook
wb = load_workbook('output.xlsx', data_only=True)
sh = wb["Sheet1"]
val = sh['C5'].value
The file output.xlsx contains the formula "C5=A1+B1" and cell C5 shows 2, but I can't get this value.
Can anybody help me?
Maybe I need another library to read the value of the formula from the xlsx file. Is there a sample showing how to do it?
I know that it's possible to convert this file into another format for reading, but that is not applicable for this task.
There are a number of solutions to get the value from an Excel cell. It all depends on the environment you are in to get the evaluated cell value.
If you used Excel to create the xls or xlsx file, there is usually a value in the cell. This is not guaranteed, because it is possible to turn off recalculation on save, but it is the usual case. And if someone has turned off recalculation on save, the value may not be correct.
If the xls or xlsx file was created by a non-Excel library (e.g. openpyxl, xlwt), the cell may not have a value unless you have explicitly set one.
Thanks to the Python universe there are options.
Pycel, xlcalculator, DataNitro, Formulas and Schedula can read an Excel file and evaluate the formula for a cell. None of these solutions need Excel installed. Most are likely to be OS independent.
(I can't find a DataNitro link.)
Disclaimer: I am the project owner of xlcalculator.
An example using xlcalculator:
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'output.xlsx'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
val1 = evaluator.evaluate('Sheet1!C5')
print("value 'evaluated' for Sheet1!C5:", val1)
I came across this problem yesterday and found a few posts on Stack Overflow discussing it. Please check Charlie Clark's post on "Read Excel cell value and not the formula computing it - openpyxl".
"openpyxl does not evaluate formulae. When you open an Excel file with
openpyxl you have the choice either to read the formulae or the last
calculated value. If, as you indicate, the formula is dependent upon
add-ins then the cached value can never be accurate. As add-ins
outside the file specification they will never be supported. Instead
you might want to look at something like xlwings which can interact
with the Excel runtime. "
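The two read modes mentioned in the quote can be compared side by side. The following sketch assumes openpyxl is installed; the helper name is made up, and the file is created in a temporary directory so that it has never been opened in Excel.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_both_ways(path, sheet, cell):
    """Return (formula_text, cached_value) for one cell."""
    formula = load_workbook(path)[sheet][cell].value
    cached = load_workbook(path, data_only=True)[sheet][cell].value
    return formula, cached


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.xlsx")

    # Build a workbook with openpyxl only; no spreadsheet application
    # ever calculates it, so no result is stored next to the formula.
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws["A1"] = 1
    ws["B1"] = 1
    ws["C5"] = "=A1+B1"
    wb.save(path)

    formula, cached = read_both_ways(path, "Sheet1", "C5")
    print(formula, cached)
```

A cached value of None here means the file was last saved by something that does not calculate formulas, which is the situation described in the question.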
Using xlwings to read the value of a formula cell seems to be a solution but it doesn't work for me because xlwings only works on Mac OS and Windows.

python pandas formula to dataframe

I am creating a dataframe with a bunch of calculations and adding new columns using these formulas (calculations). Then I am saving the dataframe to an Excel file.
I lose the formula after I save the file and open the file again.
For example, I am using something like:
total = 16
for s in range(total):
    df_summary['Slopes(avg)' + str(s)]= df_summary[['Slope_S' + str(s)]].mean(axis=1)*df_summary['Correction1']/df_summary['Correction2'].mean(axis=1)
How can I make sure this formula appears in my excel file I write to, similar to how we have a formula in an excel worksheet?
You can write formulas to an excel file using the XlsxWriter module. Use .write_formula() https://xlsxwriter.readthedocs.org/worksheet.html#worksheet-write-formula. If you're not attached to using an excel file to store your dataframe you might want to look into using the pickle module.
import pickle
# to save
pickle.dump(df,open('saved_df.p','wb'))
# to load
df = pickle.load(open('saved_df.p','rb'))
I think my answer here may be relevant. The short of it is that you need to use openpyxl (or possibly xlrd if they've added support for it) to extract the formula, and then xlsxwriter to write the formula back in. It can definitely be done.
This assumes, of course, as @jay s pointed out, that you first write Excel formulas into the DataFrame. (This solution is an alternative to pickling.)
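Writing Excel formulas into the DataFrame can be as simple as storing strings that start with "=". A minimal sketch, assuming pandas and openpyxl are installed; the column names and numbers are invented for the example and are not the asker's data.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data standing in for the asker's slope/correction columns.
df = pd.DataFrame({"slope": [2.0, 4.0], "correction": [0.5, 0.25]})

# With index=False the header goes in row 1, so data starts in row 2:
# "slope" lands in column A and "correction" in column B.
first_data_row = 2
df["corrected"] = [
    f"=A{row}*B{row}"
    for row in range(first_data_row, first_data_row + len(df))
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary.xlsx")
    df.to_excel(path, index=False, engine="openpyxl")

    # Read back without data_only to confirm the formula text was stored.
    stored = load_workbook(path)["Sheet1"]["C2"].value
    print(stored)
```

The formulas are stored but not calculated, so the cells show results only after the file has been opened in a spreadsheet application.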

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!
I had the same problem and eventually found the following module:
from openpyxl import load_workbook
def Write_Workbook():
    wb = load_workbook(path)
    ws = wb["Sheet_name"]  # wb.get_sheet_by_name() is deprecated
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
==> Doing this, my file was saved with all the previously inserted formulas preserved.
Hope this helps!
I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)
As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.
We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If your formula contains a double quote, replace it with two double quotes to escape it, so = "a" & "b" becomes "= ""a"" & ""b"" ".
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
