python pandas formula to dataframe - python

I am creating a dataframe with a bunch of calculations and adding new columns using these formulas (calculations). Then I am saving the dataframe to an Excel file.
I lose the formula after I save the file and open the file again.
For example, I am using something like:
total = 16
for s in range(total):
df_summary['Slopes(avg)' + str(s)]= df_summary[['Slope_S' + str(s)]].mean(axis=1)*df_summary['Correction1']/df_summary['Correction2'].mean(axis=1)
How can I make sure this formula appears in my excel file I write to, similar to how we have a formula in an excel worksheet?

You can write formulas to an excel file using the XlsxWriter module. Use .write_formula() https://xlsxwriter.readthedocs.org/worksheet.html#worksheet-write-formula. If you're not attached to using an excel file to store your dataframe you might want to look into using the pickle module.
import pickle
# to save
pickle.dump(df,open('saved_df.p','wb'))
# to load
df = pickle.load(open('saved_df.p','rb'))

I think my answer here may be responsive. The short of it is you need to use openpyxl (or possibly xlrd if they've added support for it) to extract the formula, and then xlsxwriter to write the formula back in. It can definitely be done.
This assumes, of course, as #jay s pointed out, that you first write Excel formulas into the DataFrame. (This solution is an alternative to pickling.)

Related

Pandas read_excel returning nan for cells having simple formula if excel file is created by program [duplicate]

This question already has answers here:
Pandas read_excel - returning nan for cells having formula
(3 answers)
Closed last month.
I use pd.read_excel to read a excel file which is created by openpyxl and downloaded from a url.
The parsed dataframe will give nan if the cell value is a formula.
# which formula is simply =100-3
0
0 NaN
I try to open it manually with MS Office, click "edit" button, and save it, the problem is solved.
# after saving the excel, problem is solved, e.g. 97
0
0 97
I want to know is there a solution that do it programmatically? and if without using MS Excel or win32com will be great. Thanks
not enough points to comment but this probably can help you:
stackoverflowanswer
After doing some searches, I found my question may be duplicated with (or similar to):
pandas-read-excel-returning-nan-for-cells-having-formula
and found more explanations from:
python-openpyxl-read-xlsx-data-after-writing-on-existing-xlsx-with-formula
openpyxl-data-only-gives-only-a-none-answer-when-storing-a-variable
python-openpyxl-data-only-true-returning-none
refresh-excel-external-data-with-python
Some notes (conclusions):
openpyxl can write but doesn't caculate the excel formula, it just read cached value from last calculation by MS excel or other applications if possible with data_only=True arguments.
for solving this manually, like #Orlando's answer mentioned, open excel apps and save it (will automatically calculate/produce the formula results)
for solving this programatically (with excel app installed), you just use win32com open and save it. (see this answer)
for solving this programatically (without excel app), you must calculate the results from excel formula string by yourself or some module like formulas, then set the caculated value back to cell (Warning: this will delete the formula) . If you also want to keep formula with default/cached value, you should use XlsxWriter which can write formula in cell with a default/cached value.
For me, because my formula is very simple, I use eval like:
import openpyxl
wb = openpyxl.load_workbook('./test_formula2.xlsx')
ws = wb.active
ws.cell(2,2).value # '=100-1'
eval(ws.cell(2,2).value[1:]) # slice after '=', e.g. 99
to get the calculated result.
You can use formulas
The following snippet seems to work:
import formulas
xl_model = formulas.ExcelModel().loads('test_formula.xlsx').finish()
xl_model.calculate()
xl_model.write(dirpath='.')
This will write a "TEST_FORMULA.XLSX" (all caps for some reason) file with calculated values in place of the formulas. Importantly, this does not rely on Excel.
Here is the formulas documentation if you need to dig into it.

Open an existing excel file and writing a new line in Python

I've searched through some old answers about my problem, but couldn't find an answer.
The problem: I want to open an existing Excel file, than write a new-line with a list and than save the current Excel-File.
My current code is:
import pandas
import pandas as pd
l_bsp = range(1,13)
df = pd.read_excel("Existing_file.xlsx")
df.loc[df.shape[0]+1] = l_bsp
print(df)
Right now it only changes my Dataframe without changing the Excel File. How to I add the list into the existing excel file?
Thank you.
Your code only read the Excel file without writing it back. You can use command such as df.to_excel(). For better understanding of the writing process, suggest you also take a look at the pandas user guide on Writing Excel files

Using Python to load template excel file, insert a DataFrame to specific lines and save as a new file

I'm having troubles writing something that I believe should be relatively easy.
I have a template excel file, that has some visualizations on it with a few spreadsheets. I want to write a scripts that loads the template, inserts an existing dataframe rows to specific cells on each sheet, and saves the new excel file as a new file.
The template already have all the cells designed and the visualization, so i will want to insert this data only without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
dataframe=df,
row_range=(5, 0),
col_range=(2, 0))
0 as the value of the second element of row_range or col_range means that there is no ending row or column specified, if you need specific ending row/column you can replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modyfing below to suit your needs should be quite straight-forward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()

Calculating Excel sheets without opening them (openpyxl or xlwt)

I made a script that opens a .xls file, writes a few new values in it, then saves the file.
Later, the script opens it again, and wants to find the answers in some cells which contain formulas.
If I call that cell with openpyxl, I get the formula (ie: "=A1*B1").
And if I activate data_only, I get nothing.
Is there a way to let Python calculate the .xls file? (or should I try PyXll?)
I realize this question is old, but I ran into the same problem and extensive searching didn't produce an answer.
The solution is in fact quite simple so I will post it here for posterity.
Let's assume you have an xlsx file that you have modified with openpyxl. As Charlie Clark mentioned openpyxl will not calculate the formulas, but if you were to open the file in excel the formulas would be automatically calculated. So all you need to do is open the file and then save it using excel.
To do this you can use the win32com module.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'absolute/path/to/your/file')
# this must be the absolute path (r'C:/abc/def/ghi')
workbook.Save()
workbook.Close()
excel.Quit()
That's it. I've seen all these suggestions to use Pycel or Koala, but that seems like a bit of overkill if all you need to do is tell excel to open and save.
Granted this solution is only for windows.
There is actually a project that takes Excel formulas and evaluates them using Python: Pycel. Pycel uses Excel itself (via COM) to extract the formulas, so in your case you would skip that part. The project probably has something useful that you can use, but I can't vouch for its maturity or completeness. It was not really developed for the general public.
There is also a newer project called Koala which builds on both Pycel and OpenPyXL.
Another approach, if you can't use Excel but you can calculate the results of the formulas yourself (in your Python code), is to write both the value and the formula into a cell (so that when you read the file, you can just pull the value, and not worry about the formula at all). As of this writing, I haven't found a way to do it in OpenPyXL, but XlsxWriter can do it. From the documentation:
XlsxWriter doesn’t calculate the value of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened. This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas, such as Excel Viewer, or some mobile applications will only display the 0 results.
If required, it is also possible to specify the calculated result of the formula using the options value parameter. This is occasionally necessary when working with non-Excel applications that don’t calculate the value of the formula. The calculated value is added at the end of the argument list:
worksheet.write_formula('A1', '=2+2', num_format, 4)
With this approach, when it's time to read the value, you would use OpenPyXL's data_only option. (For other people reading this answer: If you use xlrd, then only the value is available anyway.)
Finally, if you do have Excel, then perhaps the most straightforward and reliable thing you can do is automate the opening and resaving of your file in Excel (so that it will calculate and write the values of the formulas for you). xlwings is an easy way to do this from either Windows or Mac.
The formula module works for me. For detail please refer to https://pypi.org/project/formulas/
from openpyxl import load_workbook
import formulas
#The variable spreadsheet provides the full path with filename to the excel spreadsheet with unevaluated formulae
fpath = path.basename(spreadsheet)
dirname = path.dirname(spreadsheet)
xl_model = formulas.ExcelModel().loads(fpath).finish()
xl_model.calculate()
xl_model.write(dirpath=dirname)
#Use openpyxl to open the updated excel spreadsheet now
wb = load_workbook(filename=spreadsheet,data_only=True)
ws = wb.active
I run into the same problem, and after some time researching I ended up using pyoo ( https://pypi.org/project/pyoo/ ) which is for openoffice/libreoffice so available in all platforms and is more straightforward since communicates natively and doesn't require to save/close the file . I tried several other libraries but found the following problems
xlswings: Only works if you have Excel installed and Windows/MacOS so I couldn't evaluate
koala : Seems that it's broken, after networkx 2.4 update.
openpyxl: As pointed out by others, it isn't able to calculate formulas so I was looking into combining it with pycel to get values. I didn 't finally tried because I found pyoo . Openpyxl+pycel might not work as of now, since pycel is also relying on networkx library.
No, and in openpyxl there will never be. I think there is a Python library that purports to implements an engine for such formualae which you can use.
xlcalculator can do this job. https://github.com/bradbase/xlcalculator
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'use_case_01.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
# First!A2
# value is 0.1
#
# Fourth!A2
# formula is =SUM(First!A2+1)
val1 = evaluator.evaluate('Fourth!A2')
print("value 'evaluated' for Fourth!A2:", val1)
evaluator.set_cell_value('First!A2', 88)
# now First!A2 value is 88
val2 = evaluator.evaluate('Fourth!A2')
print("New value for Fourth!A2 is", val2)
Which results in the following output;
file_name use_case_01.xlsm ignore_sheets []
value 'evaluated' for Fourth!A2: 1.1
New value for Fourth!A2 is 89

Delete excel row with Python

I'm doing some testing using python-excel modules. I can't seem to find a way to delete a row in an excel sheet using these modules and the internet hasn't offered up a solution. Is there a way to delete a row using one of the python-excel modules?
In my case, I want to open an excel sheet, read the first row, determine if it contains some valid data, if not, then delete it.
Any suggestions are welcome.
xlwt provides as the module name suggests Excel writer (creation rather than modification) funcionality.
xlrd on the other hand provides Excel reader funcionality.
If your source excel file is rather simple (no fancy graphs, pivot tables, etc.), you should proceed this way:
with xlrd module read the contents of the targeted excel file, and then with xlwt module create new excel file which contains the necessary rows.
If you, however are running this on windows platform , you might be able to manipulate Excel directly through Microsoft COM objects, see old book reference.
I was having the same issue but found a walk around:
Use a custom filter process (Reader>Filter1>Filter2>...>Writer) to generate a copy of the source excel file but with a blank column inserted at the front. Let's call this file augmented.xls.
Then, read augmented.xls into a xlrd.Workbook object, rb, using xlrd.open_workbook().
Use xlutils.copy.copy() to convert rb into a xlwt.Workbook object, wb.
Set the value of the first column of each of the to-be-deleted rows as "x" (or other values as a marker) in wb.
Save wb back to augmented.xls.
Use another custom filter process to generate a resulting excel file from augmented.xls by omitting those rows with "x" in the first column and shifting all columns one column left (equivalent to deleting the first column of markers).
Information and examples of defining a filter process can be found in http://www.simplistix.co.uk/presentations/python-excel.pdf
Hope this help in some way.
You can use the library openpyxl. When opening a file it is both for reading and for writing. Then, with a simple function you can achieve that:
from openpyxl import load_workbook
wb = load_workbook(filename)
ws = wb.active()
first_row = ws[1]
# Your code here using first_row
if first_row not valid:
ws.delete_rows(1, amount=1)

Categories