Preserving features in xlsx file with python


I'm doing some work with Python and Excel. I have to modify an .xlsx document and then save it. The problem is that the original document has special formatting and styles, and I need to preserve them after my changes. This is the code I'm using:
import openpyxl as xl
# open the file
wb = xl.load_workbook("CR_Accounts_Dashboard_V4_20170127.xlsx")
# ...
# Do some stuff
# ...
# save the file
wb.save("CR_Accounts_Dashboard_V4_20170127.xls")
After saving the file, the original formatting and styles have been removed.
[Screenshot: one sheet in the original file]
[Screenshot: the same sheet after working on the file and saving it]
Here is another example:
[Screenshot: another sheet in the original file]
[Screenshot: that sheet after working on the file and saving it]
Does anyone have any idea about preserving the format and style?

I could not reproduce your problem.
Which version of openpyxl are you using?
Try the following, without doing anything else, and check whether the problem persists:
wb = xl.load_workbook("CR_Accounts_Dashboard_V4_20170127.xlsx")
wb.save("CR_Accounts_Dashboard_V4_20170127.xls")
Tested with Python 3.4.2, openpyxl 2.4.1 and LibreOffice 4.3.3.2.

It says quite clearly in the documentation that charts in existing files are not preserved.
For the rest, it looks like you may be using tables or even pivot tables. Support for worksheet tables will be in version 2.4.4 (you'll need to use a checkout until it's released), but pivot tables are not supported: it's a lot of work and so far no one has been prepared to sponsor the development.

Related

How to split an Excel workbook by worksheet while preserving grouping

I am doing some excel reports for work and am given a book exported from SSRS daily. The book is nicely set up, with groupings applied to every sheet for an effect similar to pivot tables.
However the book comes with 32 sheets, and I eventually need to send out each sheet individually as a distinct report. Right now I am splitting them up manually, but I am wondering if there is a way to automate this while preserving the grouping.
I previously tried something like:
import xlrd
import pandas as pd
targetWorkbook = xlrd.open_workbook(r'report.xlsx', on_demand=True)
xlsxDoc = pd.ExcelFile('report.xlsx')
for sheet in targetWorkbook.sheet_names():
    reportDF = pd.read_excel(xlsxDoc, sheet)
    reportDF.to_excel("report - {}.xlsx".format(sheet))
However, since I'm converting each sheet to a pandas DataFrame, the grouping is lost.
There are multiple ways to read/interact with excel docs in python, but I can't find a clear way to pick out a sheet and save it as its own document without losing the grouping.

This is my full answer. I have used the Worksheets().Move() method. The main idea is to use win32com.client library.
This was tested and works on my Windows 10 system with Excel 2013 installed, and Python 3.7. The grouping format was moved intact with the worksheets. I am still working on getting the looping to work. I will revise my answer again when I get the looping to work.
My example has 3 worksheets, each with different grouping (subtotal) formats.
#
# Refined .Move() method, save new file using Active Worksheet property.
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb0 = excel.Workbooks.Open(r'C:\python\so\original.xlsx')
excel.Visible = True
# Move sheet1.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet1.xlsx')
# Move sheet2, which is now the front sheet.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet2.xlsx')
# Save single remaining sheet as sheet3.
wb0.SaveAs(r'C:\python\so\sheet3.xlsx')
wb0.Close()
excel.Application.Quit()
You would also need to install pywin32, which is not a standard library item.
https://github.com/mhammond/pywin32
pip install pywin32
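For readers without Excel or Windows, a possible alternative (not the method used in the answer above) is to load the workbook with openpyxl once per sheet, remove every other sheet, and save the result. This is a minimal sketch that assumes the grouping is stored as row outline levels, which openpyxl reads and writes; anything openpyxl does not read would still be lost. The function name split_sheet and the sheet names "North" and "South" are hypothetical, and the in-memory workbook stands in for the SSRS export.

```python
import io

from openpyxl import Workbook, load_workbook


def split_sheet(source_bytes, keep_title):
    """Return a workbook (as bytes) containing only the sheet named keep_title."""
    wb = load_workbook(io.BytesIO(source_bytes))
    for ws in list(wb.worksheets):
        if ws.title != keep_title:
            wb.remove(ws)
    wb.active = 0  # make sure the active index points at the remaining sheet
    out = io.BytesIO()
    wb.save(out)
    return out.getvalue()


# Stand-in for the exported report: two sheets, with rows 2-4 grouped on each.
demo = Workbook()
demo.active.title = "North"
demo.create_sheet("South")
for ws in demo.worksheets:
    for row in range(1, 6):
        ws.cell(row=row, column=1, value=row)
    for row in range(2, 5):
        ws.row_dimensions[row].outlineLevel = 1
source = io.BytesIO()
demo.save(source)
source_bytes = source.getvalue()

# One output workbook per sheet, keyed by sheet title.
titles = load_workbook(io.BytesIO(source_bytes)).sheetnames
parts = {title: split_sheet(source_bytes, title) for title in titles}

north = load_workbook(io.BytesIO(parts["North"]))
print(north.sheetnames, north["North"].row_dimensions[3].outlineLevel)
```

To write real files, replace the BytesIO objects with paths such as "report - {}.xlsx".format(title).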

xlwt/xlutils.copy doesn't preserve cell (which I didn't touch) format

I am writing a Python script to add a new sheet to an .xls file, using xlrd, xlutils.copy and xlwt. Here is what my code looks like:
import xlrd
from xlutils.copy import copy
rb = xlrd.open_workbook(MY_FILE_PATH, formatting_info=True)
wb = copy(rb)  # copy() returns an xlwt.Workbook, so no separate xlwt.Workbook() is needed
sht1 = wb.add_sheet('newSheet')
# -- add some data
wb.save(MY_FILE_PATH)
The thing is, the formats of some cells that I didn't touch in the existing sheets (as you can see, I only add a new sheet) get changed. To be specific, there are two changes:
Some cells that originally had a date format (by default yyyy/m/d) now have a custom format (with format string m/d/yy).
I lose all the formats I set in the conditional formatting.
Could someone tell me how I can preserve the format of the cells that I don't need to modify? I am using Python 3.5.5 (64-bit) on Windows, with xlrd 1.1.0, xlutils 2.0.0 and xlwt 1.3.0. Thank you very much!
Update:
I did more testing by changing the last call from wb.save(MY_FILE_PATH) to wb.save(MY_FILE_PATH_2), i.e. saving the file under a new name. The file only changes at the save call (the original MY_FILE_PATH stayed the same in this case). The newly saved file was actually smaller than the original, even though it had a sheet added. This suggests that some formatting information was lost in the save call. From what I can see, at least the conditional formatting was lost, which reduced the size (I assume the change of date format doesn't affect the file size much).

It looks like xlrd doesn't support conditional formatting yet.
You can check the error logs by passing verbosity=1 to the open_workbook function:
rb=xlrd.open_workbook(MY_FILE_PATH, formatting_info=True, verbosity=1)
Alternatively, openpyxl seems to support conditional formatting, so you could try that package instead.
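As a rough check of the openpyxl suggestion, the sketch below adds one conditional formatting rule and round-trips the workbook through an in-memory save and load to see whether the rule survives. Note that openpyxl works with .xlsx files, not .xls, so the original file would have to be converted first. The cell values, range and fill colour are made up for illustration.

```python
import io

from openpyxl import Workbook, load_workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
for row, value in enumerate([3, 8, 12], start=1):
    ws.cell(row=row, column=1, value=value)

# Highlight cells in A1:A3 whose value is greater than 5.
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
ws.conditional_formatting.add(
    "A1:A3", CellIsRule(operator="greaterThan", formula=["5"], fill=red_fill)
)

# Save and reload in memory, then collect the rules that are still present.
buffer = io.BytesIO()
wb.save(buffer)
reloaded = load_workbook(io.BytesIO(buffer.getvalue()))
surviving_rules = [
    rule for cf in reloaded.active.conditional_formatting for rule in cf.rules
]
print(len(surviving_rules), surviving_rules[0].operator)
```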

Excel formatting in python without loading workbook

I am trying to format an Excel document in Python that I am creating in the same script. All of the answers I have found involve loading an existing workbook into Python and formatting it from there. In my script, I currently write the entire unformatted Excel sheet, save the file, then immediately reload the document into Python to format it. This is the only workaround I can find that gives me an active sheet.
writer=pd.ExcelWriter(file_name, engine='openpyxl')
writer.save()#saving my file
wb=load_workbook(file_name) #reloading file to format
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
This works for changing things such as column width, but I would like a way to format the sheet without saving and reloading. Is there a way to get around the need for an active sheet when nothing has been written to file_name yet? I want a way to remove lines 2 and 3, whatever that takes.

The object that pandas creates in ExcelWriter depends on the "engine" you give it. In this case you're passing "openpyxl", so ExcelWriter creates an openpyxl.Workbook() object. You can create a new workbook directly in openpyxl with Workbook(); see:
https://openpyxl.readthedocs.io/en/default/tutorial.html#create-a-workbook
It is created with one active sheet. Basically:
import openpyxl
wb = openpyxl.Workbook()
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
...would do the job
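If the data still has to come from a pandas DataFrame, the worksheet that the openpyxl engine creates can be formatted before the file is written, through the writer's sheets mapping. This is a minimal sketch under assumed names: the DataFrame contents, the sheet name "Report" and the temporary file path are all made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"name": ["alpha", "beta"], "amount": [1.5, 2.5]})
file_name = os.path.join(tempfile.mkdtemp(), "report.xlsx")

# The file is written once, when the with-block exits.
with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Report", index=False)
    ws = writer.sheets["Report"]  # the openpyxl worksheet pandas just filled
    ws.column_dimensions["A"].width = 33
    ws.column_dimensions["B"].width = 16

# Reload only to confirm that the widths were stored.
check = load_workbook(file_name)["Report"]
print(check.column_dimensions["A"].width, check.column_dimensions["B"].width)
```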

Your title is misleading: you're working in Pandas and dumping to Excel. Pandas does allow some formatting for this but, because it tries to support different Python libraries (openpyxl, xlsxwriter and xlwt) there are restrictions on this.
For full control openpyxl provides support for Pandas' DataFrame objects: http://openpyxl.readthedocs.io/en/latest/pandas.html

Calculating Excel sheets without opening them (openpyxl or xlwt)

I made a script that opens a .xls file, writes a few new values in it, then saves the file.
Later, the script opens it again, and wants to find the answers in some cells which contain formulas.
If I read that cell with openpyxl, I get the formula (e.g. "=A1*B1").
And if I activate data_only, I get nothing.
Is there a way to let Python calculate the .xls file? (or should I try PyXll?)

I realize this question is old, but I ran into the same problem and extensive searching didn't produce an answer.
The solution is in fact quite simple so I will post it here for posterity.
Let's assume you have an xlsx file that you have modified with openpyxl. As Charlie Clark mentioned, openpyxl will not calculate the formulas, but if you open the file in Excel the formulas are calculated automatically. So all you need to do is open the file and then save it using Excel.
To do this you can use the win32com module.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'absolute/path/to/your/file')
# this must be the absolute path (r'C:/abc/def/ghi')
workbook.Save()
workbook.Close()
excel.Quit()
That's it. I've seen all these suggestions to use Pycel or Koala, but that seems like overkill if all you need to do is tell Excel to open and save.
Granted, this solution is Windows-only.
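The behaviour described in the question can be reproduced without Excel. The sketch below writes a formula with openpyxl and reads the file back twice, once normally and once with data_only=True; because no spreadsheet application has calculated and stored a result, the second read finds no cached value. The cell contents are made up for illustration.

```python
import io

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["B1"] = 3
ws["C1"] = "=A1*B1"

buffer = io.BytesIO()
wb.save(buffer)
data = buffer.getvalue()

# Default load returns the formula text; data_only=True returns the cached
# result, which does not exist until Excel (or similar) has saved the file.
formula_view = load_workbook(io.BytesIO(data))
value_view = load_workbook(io.BytesIO(data), data_only=True)

formula_text = formula_view.active["C1"].value
cached_value = value_view.active["C1"].value
print(repr(formula_text), repr(cached_value))
```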

There is actually a project that takes Excel formulas and evaluates them using Python: Pycel. Pycel uses Excel itself (via COM) to extract the formulas, so in your case you would skip that part. The project probably has something useful that you can use, but I can't vouch for its maturity or completeness. It was not really developed for the general public.
There is also a newer project called Koala which builds on both Pycel and OpenPyXL.
Another approach, if you can't use Excel but you can calculate the results of the formulas yourself (in your Python code), is to write both the value and the formula into a cell (so that when you read the file, you can just pull the value, and not worry about the formula at all). As of this writing, I haven't found a way to do it in OpenPyXL, but XlsxWriter can do it. From the documentation:
XlsxWriter doesn’t calculate the value of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened. This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas, such as Excel Viewer, or some mobile applications will only display the 0 results.
If required, it is also possible to specify the calculated result of the formula using the options value parameter. This is occasionally necessary when working with non-Excel applications that don’t calculate the value of the formula. The calculated value is added at the end of the argument list:
worksheet.write_formula('A1', '=2+2', num_format, 4)
With this approach, when it's time to read the value, you would use OpenPyXL's data_only option. (For other people reading this answer: If you use xlrd, then only the value is available anyway.)
Finally, if you do have Excel, then perhaps the most straightforward and reliable thing you can do is automate the opening and resaving of your file in Excel (so that it will calculate and write the values of the formulas for you). xlwings is an easy way to do this from either Windows or Mac.
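The approach of storing both the formula and a precomputed value can be sketched as follows, assuming XlsxWriter and openpyxl are both installed. The formula and the value 4 come from the documentation excerpt above; writing to an in-memory buffer and the default sheet name "Sheet1" are choices made for this illustration.

```python
import io

import xlsxwriter
from openpyxl import load_workbook

buffer = io.BytesIO()
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet()
# No cell format (None); 4 is the result we calculated ourselves.
worksheet.write_formula("A1", "=2+2", None, 4)
workbook.close()
data = buffer.getvalue()

# The formula is still in the file, and data_only=True now finds a value.
cell_with_formula = load_workbook(io.BytesIO(data))["Sheet1"]["A1"].value
cell_with_value = load_workbook(io.BytesIO(data), data_only=True)["Sheet1"]["A1"].value
print(repr(cell_with_formula), repr(cell_with_value))
```

The stored value is only as correct as your own calculation; nothing checks it against the formula until a spreadsheet application recalculates the file.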

The formulas module works for me. For details, please refer to https://pypi.org/project/formulas/
from os import path
from openpyxl import load_workbook
import formulas
# The variable spreadsheet holds the full path (with filename) of the Excel spreadsheet with unevaluated formulae
fpath = path.basename(spreadsheet)
dirname = path.dirname(spreadsheet)
xl_model = formulas.ExcelModel().loads(fpath).finish()
xl_model.calculate()
xl_model.write(dirpath=dirname)
#Use openpyxl to open the updated excel spreadsheet now
wb = load_workbook(filename=spreadsheet,data_only=True)
ws = wb.active

I ran into the same problem, and after some time researching I ended up using pyoo ( https://pypi.org/project/pyoo/ ), which is for OpenOffice/LibreOffice and so is available on all platforms. It is also more straightforward, since it communicates natively and doesn't require saving or closing the file. I tried several other libraries but found the following problems:
xlwings: Only works if you have Excel installed on Windows/macOS, so I couldn't evaluate it.
koala: Seems to be broken after the networkx 2.4 update.
openpyxl: As pointed out by others, it can't calculate formulas, so I was looking into combining it with pycel to get values. I didn't end up trying that because I found pyoo. openpyxl + pycel might not work as of now, since pycel also relies on the networkx library.

No, and in openpyxl there never will be. I think there is a Python library that purports to implement an engine for such formulae, which you can use.

xlcalculator can do this job. https://github.com/bradbase/xlcalculator
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'use_case_01.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
# First!A2
# value is 0.1
#
# Fourth!A2
# formula is =SUM(First!A2+1)
val1 = evaluator.evaluate('Fourth!A2')
print("value 'evaluated' for Fourth!A2:", val1)
evaluator.set_cell_value('First!A2', 88)
# now First!A2 value is 88
val2 = evaluator.evaluate('Fourth!A2')
print("New value for Fourth!A2 is", val2)
This results in the following output:
file_name use_case_01.xlsm ignore_sheets []
value 'evaluated' for Fourth!A2: 1.1
New value for Fourth!A2 is 89

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!

I had the same problem, and eventually found the following module:
from openpyxl import load_workbook
def Write_Workbook():
    wb = load_workbook(path)
    ws = wb["Sheet_name"]  # replaces get_sheet_by_name(), which is deprecated
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
Doing this, my file was saved with all previously inserted formulas preserved.
Hope this helps!
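A self-contained way to check this claim is sketched below: it builds a small workbook containing a formula, edits a different cell after reloading, and confirms that the formula text is still there after the second save. Note that openpyxl handles .xlsx files, so the .xls files from the question would need converting first. The cell contents are made up, and in-memory buffers stand in for path.

```python
import io

from openpyxl import Workbook, load_workbook

# Build a stand-in for the existing file, including one formula.
original = Workbook()
sheet = original.active
sheet.title = "Sheet_name"
sheet["A1"] = 10
sheet["A2"] = 20
sheet["A3"] = "=SUM(A1:A2)"
first_save = io.BytesIO()
original.save(first_save)

# Reload, change one value, and save again.
wb = load_workbook(io.BytesIO(first_save.getvalue()))
ws = wb["Sheet_name"]
ws.cell(row=2, column=1).value = 99
second_save = io.BytesIO()
wb.save(second_save)

result = load_workbook(io.BytesIO(second_save.getvalue()))["Sheet_name"]
print(result["A2"].value, result["A3"].value)
```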

I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)

As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.

We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If you have a double quote in your formula, replace it with two double quotes, as this will escape it (so = "a" & "b" becomes "= ""a"" & ""b"" ").
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
