Custom Excel column using pandas [duplicate] - python

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.
The code I have so far is simple enough. Say I have a dataframe called df:
writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")
I was looking over the pandas docs, and I don't really see any options to set column widths. Is there a trick to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?
(I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)

Inspired by user6178746's answer, I have the following:
# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    # index=False keeps sheet column numbers aligned with enumerate(df);
    # if the index is written, every set_column call needs idx + 1 instead
    df.to_excel(writer, sheet_name=sheetname, index=False)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.close()  # writer.save() was removed in pandas 2.0
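The width rule used in the loop above (longest rendered value or header, plus padding) can be isolated in a small pure-Python helper, which makes it easy to check without writing a workbook. This is only a sketch: the name fit_width and the optional cap parameter are assumptions for illustration, not part of pandas or XlsxWriter.

```python
def fit_width(header, values, padding=1, cap=None):
    """Return a column width in characters: longest rendered value or header, plus padding."""
    longest = max([len(str(header))] + [len(str(v)) for v in values])
    width = longest + padding
    # Optionally clamp very wide columns (hypothetical parameter).
    return min(width, cap) if cap is not None else width


widths = [fit_width(name, col) for name, col in
          [("id", [1, 22, 333]), ("description", ["a", "bb"])]]
print(widths)
```

Each result can then be passed as the third argument of worksheet.set_column(idx, idx, width).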

Dynamically adjust all the column lengths
writer = pd.ExcelWriter('/path/to/output/file.xlsx')
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')
for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(str(column)))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)
writer.close()  # writer.save() was removed in pandas 2.0
Manually adjust a column using Column Name
col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
Manually adjust a column using Column Index
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
In case any of the above is failing with
AttributeError: 'Worksheet' object has no attribute 'set_column'
make sure to install xlsxwriter:
pip install xlsxwriter
For a more comprehensive explanation you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.

I'm posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved the issue I was having: I iterate through each column and use worksheet.set_column to set the column width to the maximum length of that column's contents.
One important note: the code below sizes each column to the longer of its header and its widest value, plus a padding of 2. Hope this helps someone :)
import pandas as pd
import sqlalchemy as sa
import urllib.parse
read_server = 'serverName'
read_database = 'databaseName'
# On Python 2 this function was urllib.quote_plus
read_params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)
#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)
#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')
#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)
#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.close()  # writer.save() was removed in pandas 2.0

There is a nice package that I started to use recently called StyleFrame.
It takes a DataFrame and lets you style it very easily.
By default, the column widths auto-adjust.
For example:
from StyleFrame import StyleFrame
import pandas as pd
df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3],
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()
You can also change the column widths:
sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)
UPDATE 1
In version 1.4, the best_fit argument was added to StyleFrame.to_excel.
See the documentation.
UPDATE 2
Here's a sample of code that works for StyleFrame 3.x.x:
from styleframe import StyleFrame
import pandas as pd
columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer,
    best_fit=columns,
    columns_and_rows_to_freeze='B2',
    row_to_add_filters=0,
)
excel_writer.save()

There is probably no automatic way to do it right now, but since you use openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) lets you specify a sane value (in character widths):
writer.sheets['Summary'].column_dimensions['A'].width = 15
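openpyxl keys column_dimensions by column letter rather than by number, so setting widths in a loop needs an index-to-letter conversion. openpyxl ships openpyxl.utils.get_column_letter for this (it takes a 1-based number). The sketch below shows the same bijective base-26 conversion in plain Python, with a hypothetical function name and a zero-based index, in case it helps to see what the conversion does.

```python
def column_letter(index):
    """Convert a zero-based column index to an Excel column letter (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("index must be non-negative")
    n = index + 1  # Excel letters form a 1-based, bijective base-26 system
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print([column_letter(i) for i in (0, 25, 26, 27, 701, 702)])
```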

You can do this with pandas and xlsxwriter; the code below works in Python 3.x. For more details on working with XlsxWriter and pandas, this link might be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html
import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
# set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()  # writer.save() was removed in pandas 2.0

I found that it was more useful to adjust the column width based on the column header rather than the column content.
Using df.columns.values.tolist(), I generate a list of the column headers and use the lengths of these headers to determine the widths of the columns.
See the full code below:
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)
workbook = writer.book  # Access the workbook
worksheet = writer.sheets[sheetname]  # Access the worksheet
header_list = df.columns.values.tolist()  # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(str(header_list[i])))  # Set column widths based on len(header)
writer.close()  # Save the Excel file (writer.save() was removed in pandas 2.0)

At work, I am always writing dataframes to Excel files. So instead of writing the same code over and over, I have created a module. Now I just import it and use it to write and format the Excel files. There is one downside, though: it takes a long time if the dataframe is very large.
So here is the code:
import os
import pandas as pd
def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                                  datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format (named header_format to avoid shadowing the built-in format).
        header_format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, header_format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to avoid overly wide columns
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.close()  # writer.save() was removed in pandas 2.0
    return out_path  # output_dir + output_name would drop the path separator

Combining the other answers and comments, and also supporting multi-indices:
def autosize_excel_columns(worksheet, df):
    autosize_excel_columns_df(worksheet, df.index.to_frame())
    autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)
def autosize_excel_columns_df(worksheet, df, offset=0):
    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),
            len(str(series.name))
        )) + 1
        worksheet.set_column(idx+offset, idx+offset, max_len)
sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.save()

You can solve the problem by calling the following function, where df is the dataframe whose columns you want to size and sheetname is the Excel sheet where the modifications should take place:
def auto_width_columns(df, sheetname):
    # relies on an existing `writer` (pd.ExcelWriter, xlsxwriter engine) in the enclosing scope
    workbook = writer.book
    worksheet = writer.sheets[sheetname]
    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)

import openpyxl
..
for col in _ws.columns:
    max_length = 0
    # column letter of this column, e.g. 'A' or 'AA' (Cell.column_letter, openpyxl 2.6+)
    col_name = col[0].column_letter
    for cell in col:
        if cell.value is not None:
            # measure the string form so numbers and dates are counted too
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width

Yes, there is something you can do to the xlsx file after the fact to adjust the column widths.
Use xlwings to autofit the columns. It's a pretty simple solution; see the last 6 lines of the example code. The advantage of this procedure is that you don't have to worry about font size, font type or anything else.
Requirement: an Excel installation.
import pandas as pd
import xlwings as xw
path = r"test.xlsx"
# Export your dataframe in question.
df = pd._testing.makeDataFrame()
df.to_excel(path)
# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = xw.Book(path)
    for ws in wb.sheets:
        ws.autofit(axis="columns")
    wb.save(path)
    wb.close()

The easiest solution is to specify the column width in the set_column method.
for worksheet in writer.sheets.values():
    worksheet.set_column(0, last_column_value, required_width_constant)

This function works for me; it also fixes the index width.
def write_to_excel(writer, X, sheet_name, sep_only=False):
    #writer=writer object
    #X=dataframe
    #sheet_name=name of sheet
    #sep_only=True:write only as separate excel file, False: write as sheet to the writer object
    #output_folder and prefix_excel_save are globals defined elsewhere
    if sheet_name=="":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)
            #fix column widths
            worksheet = writer.sheets[sheet_name]  # pull worksheet object
            for idx, col in enumerate(X.columns):  # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(),  # len of largest item
                    len(str(series.name))  # len of column name/header
                    )) + 1  # adding a little extra space
                worksheet.set_column(idx+1, idx+1, max_len)  # set column width (+1 because the index occupies column 0)
            #fix index width
            max_len=pd.Series(X.index.values).astype(str).map(len).max()+1
            worksheet.set_column(0, 0, max_len)
        if sep_only:
            print(f'{sheet_name} is written as separate file')
        else:
            print(f'{sheet_name} is written as separate file')
            print(f'{sheet_name} is written as sheet')
    return writer
call example:
writer = write_to_excel(writer, dataframe, "Statistical_Analysis")

I may be a bit late to the party, but this code works when using 'openpyxl' as your engine; sometimes pip install xlsxwriter won't solve the issue. The code below works like a charm. Edit any part as you wish.
import functools
from decimal import Decimal
import openpyxl.utils.cell
def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if not text:
        return 0
    lines = text.split("\n")
    return max(len(line) for line in lines)
def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places
        return str(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize())
    else:
        return str(v)
def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):
    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and take the longest one (or the header, if longer)
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length + margin)
        sheet.column_dimensions[openpyxl.utils.cell.get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index:  # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin
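One caveat with the rounding helper above: Decimal.normalize() strips trailing zeros by switching to exponent notation, so a float such as 100.0 is rendered as '1E+2' and measured as four characters. If that matters for your data, a format-based variant is one possible alternative. The function name below is hypothetical and the sketch is not part of the original answer.

```python
from decimal import Decimal


def float_to_short_str(value, decimals=3):
    """Render a float with at most `decimals` decimal places and no trailing zeros."""
    text = f"{value:.{decimals}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


# Behaviour of the Decimal-based approach, for comparison:
print(str(Decimal(100.0).quantize(Decimal("1.000")).normalize()))  # 1E+2
print(float_to_short_str(100.0))    # 100
print(float_to_short_str(3.14159))  # 3.142
```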

An openpyxl version based on @alichaudry's code.
The code 1) loads an Excel file, 2) adjusts the column widths and 3) saves it.
import pandas as pd
def auto_adjust_column_widths(excel_file : "Excel File Path", extra_space = 1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).
    Parameters
    ----------
    excel_file :
        excel_file to adjust column widths.
    extra_space :
        extra column width in addition to the value-based-widths
    """
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter
    wb = load_workbook(excel_file)
    for ws in wb:
        df = pd.DataFrame(ws.values,)
        # Series.iteritems() was removed in pandas 2.0; items() is the replacement
        for i,r in (df.astype(str).applymap(len).max(axis=0) + extra_space).items():
            ws.column_dimensions[get_column_letter(i+1)].width = r
    wb.save(excel_file)

Related

How to read a specific cell with the pandas library?

I want to read a specific cell, H6, from an Excel sheet. So I tried it like this:
import pandas as pd
excel_file = './docs/fruit.xlsx'
df = pd.read_excel(excel_file,'Overzicht')
sheet = df.active
x1 = sheet['H6'].value
print(x1)
But then I get this error:
File "C:\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'active'
So my question is: how do I read a specific cell from a sheet in an Excel file?
Thank you
OK, I tried with openpyxl:
import openpyxl
path = "./docs/fruit.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
cell_obj = sheet_obj.cell(row = 6, column = 9)
print(cell_obj.value)
But then the formula is printed. Like this:
=(H6*1000)/F6/G6
and not the value: 93
You can do this using openpyxl directly or pandas (which uses openpyxl internally)...
Using openpyxl
You will need to use data_only=True when you open the file. Also, make sure you know the row and column number. To read the data in H6, the row is 6 and the column is 8 (H is the 8th column).
import openpyxl
path = "./docs/Schoolfruit.xlsx"
wb_obj = openpyxl.load_workbook(path, data_only=True)
sheet_obj = wb_obj.active ## Or use sheet_obj = wb_obj['Sheet1'] if you know sheet name
val = sheet_obj.cell(row = 6, column = 8).value
print(val)
Using Pandas
The other option is to use pandas read_excel(), which reads the whole sheet into a dataframe. You can then use iloc[] or at[] to read the specific cell. Note that this is probably the less efficient solution if you need to read just one cell...
Another point to note: once the data is read into a dataframe, row 1 is treated as the header and the first data row gets index 0. So the row number would be 4 instead of 6. Similarly, the first column is now 0 and not 1, which changes the position to [4,7].
import pandas as pd
path = "./docs/Schoolfruit.xlsx"
df = pd.read_excel(path, 'Sheet1')
print(df.iloc[4,7])
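The offset arithmetic described above (one row consumed by the header, both axes zero-based) can be written down as a small helper. This is a sketch: a1_to_iloc and its header_rows parameter are hypothetical names, and it assumes read_excel was called with the default single header row and no index column.

```python
import re


def a1_to_iloc(ref, header_rows=1):
    """Translate an A1-style reference into zero-based (row, col) for DataFrame.iloc."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, row_text = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)  # 1-based column number
    row = int(row_text) - 1 - header_rows  # skip header rows, then go zero-based
    if row < 0:
        raise ValueError(f"{ref!r} lies inside the header")
    return row, col - 1


print(a1_to_iloc("H6"))  # (4, 7)
```

With the result unpacked as row, col, the lookup becomes df.iloc[row, col].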
I found a solution and hope it works for you.
import pandas as pd
excel_file = './docs/Schoolfruit.xlsx'
df = pd.read_excel(excel_file, sheet_name='active' ,header=None, skiprows=1)
print(df[7][4])
7: the H column (column index starts at 0)
4: the 6th row (the first row is skipped and the index starts at 0)

Modifying the header format in Excel (Pandas output) after splitting files

I have a problem with modifying Excel output using pandas. I've just started to learn Python, so my code is probably far from optimal.
I need to divide an Excel report into many files (based on representative ID), so there will be a few thousand files after filtering one report.
The thing is that pandas has a default export style and I need to format it (i.e. change the column width and format the header a little to make it look nicer).
Splitting the files worked well and, surprisingly, I even managed to apply the column width by iterating through the Excel files in the working directory.
I tried to use "worksheet.write" or "worksheet.set_column", but it returned an error.
Please find my code below. Many thanks in advance for any suggestions:
import pandas as pd
from openpyxl.styles import Alignment, Border, Side, Font, Color, colors
from openpyxl import Workbook
import os
from openpyexcel import load_workbook
# define excel file path
excel_file_path = 'C:/Users/user/Desktop/python/test.xlsx'
df = pd.read_excel(excel_file_path)
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1',index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# inserting report code (used later in file name)
code = input('Please insert report code:')
writer.save()
# define variable to split files based on ID column
split_values = df['ID'].unique()
# splitting files
for value in split_values:
    df1 = df[df['ID'] == value]
    output_file_name = "Report_" + str(value) + "_" + str(code) + "_Example.xlsx"
    df1.to_excel(output_file_name, index=False)
# define header format
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# add some cell formats
cell_format = workbook.add_format()
cell_format.set_font_name('Bodoni MT Black')
cell_format.set_font_size(15)
# iterate through current working directory and select appropriate sheets
path_to_xls = os.getcwd()  # or r'<path>'
for xls in os.listdir(path_to_xls):
    if xls.endswith(".xls") or xls.endswith(".xlsx"):
        f = load_workbook(filename=xls)
        sheet = f.active
        # change column width - it surprisingly worked
        sheet.column_dimensions['B'].width = 20
        # tried different methods below but these don't work
        #sheet.set_column('C:C', None, cell_format)
        #sheet.write('A5', None, cell_format)
        #sheet.set_row(0, None, cell_format)
        #for col_num, value in enumerate(df.columns.values):
        #    sheet.write(0, col_num + 1, value, header_format)
        f.save(xls)
        f.close()
print('Done')

Color every other 2 columns of a dataframe into an excel?

I have a huge dataframe and I need to write it to an Excel sheet such that every other pair of columns is colored, except the 1st column.
For example:
If there are columns 1 to 100,
column 2,3 must be red
then 4,5 non colored
then 6,7 again red
then 8,9 non colored
and it goes on and on till last column of the dataframe.
In Excel, select the columns containing your data or the entire spreadsheet. Click Conditional Formatting on the Home ribbon. Click New Rule. Click "Use a formula to determine which cells to format". In the formula box, enter =OR(MOD(COLUMN(A1),4)=2,MOD(COLUMN(A1),4)=3). Click the Format button. Select the Fill tab. Set the fill color to what you want. Hit OK a few times and you should be done.
This fills the cells whose column number is equal to 2 or 3 mod 4.
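To check that the mod-4 rule picks out the columns the question asks for, the same condition can be evaluated in plain Python. The function name is hypothetical; the condition mirrors the Excel formula, with 1-based column numbers as returned by COLUMN().

```python
def columns_to_color(last_column):
    """Return the 1-based column numbers that the MOD(COLUMN(), 4) rule would fill."""
    return [c for c in range(1, last_column + 1) if c % 4 in (2, 3)]


print(columns_to_color(9))  # [2, 3, 6, 7]
```

Columns 2, 3, 6 and 7 are selected while 1, 4, 5, 8 and 9 are left alone, matching the pattern described in the question.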
I came up with the following solution:
import pandas as pd
import numpy as np
columns = 13
data = np.array([np.arange(10)]*columns).T
df = pd.DataFrame(data=data)
df = df.fillna(0) # with 0s rather than NaNs
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'bg_color': '#FFC7CE'})
for col in range(2, columns+1, 4):
    worksheet.set_column(col, col + 1, cell_format=format1)
writer.close()  # writer.save() was removed in pandas 2.0
Iterate from index 2 (second col) up to columns+1 (indexing starts from 1 in Excel), color 2 columns at once and then move 4 indices further. The only problem right now is that it colors the whole column (even the unfilled cells); I'll look for a solution for that later.
You need to translate the integer indices to Excel-style labels with a function and use conditional_format if you want to color only the cells that contain values:
import pandas as pd
import numpy as np
columns = 13
data = np.array([np.arange(10)]*columns).T
df = pd.DataFrame(data=data)
df = df.fillna(0) # with 0s rather than NaNs
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'bg_color': '#FFC7CE'})
def colnum_string(n):
    string = ""
    n+=1 #just because we have index saved in first col
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
for col in range(2, columns+1, 4):
    str1 = colnum_string(col)+"2" #omitting header, 1 if header
    str2 = colnum_string(col+1)+str(11) #number of rows+1 (header)
    ids = str1+":"+str2
    print(ids)
    worksheet.conditional_format(ids, {'type': 'no_blanks',
                                       'format': format1})
writer.close()  # writer.save() was removed in pandas 2.0
If your problem is only with the iteration, then:
for i in range(2, columns, 4):
    setColumnColor(i)
    setColumnColor(i+1)
You start coloring from the 2nd column and color 2 columns at a time, then iterate over the columns in steps of 4.
But if your problem is finding a method to set dataframe colors, then this thread is for you: Colouring one column of pandas dataframe
You can do it in Excel:
Start by converting the range to a table (Ctrl+T).
Then switch to the Design tab, untick Banded Rows and select Banded Columns instead.
Right-click on the table style.
Click Duplicate.
Click First Column Stripe.
Set Stripe Size to 2.
Note: see https://www.ablebits.com/office-addins-blog/2014/03/13/alternate-row-column-colors-excel/#alternating-row-tables

Export a DataFrame to Excel, values only without the formatting

I need to write a pandas DataFrame to an Excel worksheet. There are currencies, percentages and text. This script is expected to be run periodically, updating the data without changing the manually defined formatting.
pandas to_excel() seems hardcoded to force a specific format.
I created my own function to write a DataFrame to a file:
def write_sheet1(filename, data_ws, df, start_row=2, start_col=2):
    """Working for xlsx and xls only.
    args:
        start_row: df row +2; does not include header and is 1 based indexed.
    """
    writer = pd.ExcelWriter(filename.lower(), engine='openpyxl')
    import openpyxl
    try:
        wb = openpyxl.load_workbook(filename)
    except FileNotFoundError:
        wb = openpyxl.Workbook()
    if data_ws not in wb.sheetnames:
        wb.create_sheet(data_ws)
    # Create the worksheet if it does not yet exist.
    writer.book = wb
    writer.sheets = {x.title: x for x in wb.worksheets}
    ws = writer.sheets[data_ws]
    # Fill with blanks.
    for row in ws:
        for cell in row:
            cell.value = None
    # Write manually to avoid overwriting formats.
    # Column names.
    ws.cell(1, 1).value = df.columns.name
    for icol, col_name in zip(range(2, len(df.columns) + 2), df.columns):
        ws.cell(1, icol).value = col_name
    # Row headers.
    for irow, row_name in zip(range(2, len(df.index) + 2), df.index):
        ws.cell(irow, 1).value = row_name
    # Body cells.
    for row, irow in zip([x[1] for x in df.iloc[start_row - 2:].iterrows()], list(range(start_row, len(df.index) + 2))):
        """"""
        for cell, icol in zip([x[1] for x in row.iloc[start_col - 2:].items()], list(range(start_col, len(df.columns) + 2))):
            """"""
            ws.cell(irow, icol).value = cell  # Skip the index.
    for row in ws.values:
        print('\t'.join(str(x or '') for x in row))
    print('Saving.')
    while True:
        try:
            writer.save()
            break
        except PermissionError:
            print(f'Please close {filename} before we can write to it!')
            time.sleep(2)
    writer.close()
    print('Done saving df.')
Result: It worked the first time, but subsequent runs appended the DataFrame to the existing data in the worksheet.
[EDIT: Actually, this first function gets the job done! The bug was upstream; the DataFrame itself had the appended data.]
I also tried monkey patching pandas:
def write_sheet2(filename, data_ws, df, start_row=2, start_col=2):
    """Write df values only to excel.
    Monkey patch pandas' built-in .to_excel()
    """
    def get_formatted_cells(self):
        for cell in itertools.chain(self._format_header(), self._format_body()):
            yield cell
    from pandas.io.formats.excel import ExcelFormatter
    ExcelFormatter.get_formatted_cells = get_formatted_cells
    writer = pd.ExcelWriter(filename, engine='xlsxwriter', mode='w')
    df.to_excel(writer, sheet_name=data_ws)
    writer.save()
    writer.close()
    print('Done writing.')
Results: The same as the native .to_excel().
How do I write a general function to export the values only of a pandas DataFrame to an Excel file without modifying existing formatting?
Replacement for .to_excel(), writes values only:
def to_excel(filename, data_ws, df, start_row=2, start_col=2):
    """Replacement for pandas .to_excel().
    For .xlsx and .xls formats only.
    args:
        start_row: df row +2; does not include header and is 1 based indexed.
    """
    writer = pd.ExcelWriter(filename.lower(), engine='openpyxl')
    import openpyxl
    try:
        wb = openpyxl.load_workbook(filename)
    except FileNotFoundError:
        wb = openpyxl.Workbook()
    if data_ws not in wb.sheetnames:
        wb.create_sheet(data_ws)
    # Create the worksheet if it does not yet exist.
    writer.book = wb
    writer.sheets = {x.title: x for x in wb.worksheets}
    ws = writer.sheets[data_ws]
    # Fill with blanks.
    try:
        for row in ws:
            for cell in row:
                cell.value = None
    except TypeError:
        pass
    # Write manually to avoid overwriting formats.
    # Column names.
    ws.cell(1, 1).value = df.columns.name
    for icol, col_name in zip(range(2, len(df.columns) + 2), df.columns):
        ws.cell(1, icol).value = col_name
    # Row headers.
    for irow, row_name in zip(range(2, len(df.index) + 2), df.index):
        ws.cell(irow, 1).value = row_name
    # Body cells.
    for row, irow in zip([x[1] for x in df.iloc[start_row - 2:].iterrows()], list(range(start_row, len(df.index) + 2))):
        for cell, icol in zip([x[1] for x in row.iloc[start_col - 2:].items()], list(range(start_col, len(df.columns) + 2))):
            ws.cell(irow, icol).value = cell  # Skip the index.
    for row in ws.values:
        print('\t'.join(str(x or '') for x in row))
    print('Saving.')
    while True:
        try:
            writer.save()
            break
        except PermissionError:
            print(f'Please close {filename} before we can write to it!')
            time.sleep(2)
    writer.close()
    print('Done saving df.')
If anyone comes to this question in the future, this code seems to work. It can be cleaned up a little though, possibly by converting the DataFrame to a list of lists, to avoid the DataFrame overhead and to avoid processing the column and index names separately.
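The clean-up suggested above could look roughly like the following: flatten the header, index and body into one list of lists, then walk it once. This is a sketch only; grid_from_parts and iter_cells are hypothetical helpers, and with a real DataFrame the inputs would come from df.columns, df.index and df.values.tolist().

```python
def grid_from_parts(columns, index, rows, corner=None):
    """Combine column names, index labels and body rows into one list of lists."""
    grid = [[corner] + list(columns)]
    for label, row in zip(index, rows):
        grid.append([label] + list(row))
    return grid


def iter_cells(grid, start_row=1, start_col=1):
    """Yield (row, col, value) triples using 1-based worksheet coordinates."""
    for r, row in enumerate(grid, start=start_row):
        for c, value in enumerate(row, start=start_col):
            yield r, c, value


grid = grid_from_parts(["price", "share"], ["apple", "pear"], [[1.5, 0.4], [2.0, 0.6]])
for r, c, value in iter_cells(grid):
    print(r, c, value)  # with openpyxl: ws.cell(r, c).value = value
```

Writing only .value for each triple, as the original function does, is what keeps the manually defined formatting in place.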

Save Pandas DataFrames with formulas to xlsx files

In a pandas DataFrame I have some "cells" with values and some that need to contain Excel formulas. I have read that I can get formulas with
link = 'HYPERLINK("#Groups!A' + str(someInt) + '"; "LINKTEXT")'
xlwt.Formula(link)
and store them in the dataframe.
When I try to save my dataframe as an xlsx file with
writer = pd.ExcelWriter("pandas" + str(fileCounter) + ".xlsx", engine = "xlsxwriter")
df.to_excel(writer, sheet_name = "Paths", index = False)
# insert more sheets here
writer.save()
I get the error:
TypeError: Unsupported type <class 'xlwt.ExcelFormula.Formula'> in write()
So I tried to write my formula as a string to my dataframe, but Excel wants to restore the file content and then fills all formula cells with 0's.
Edit: I managed to get it to work with regular strings, but nevertheless I would be interested in a solution for xlwt formulas.
So my question is: how do I save dataframes with formulas to xlsx files?
Since you are using xlsxwriter, strings are parsed as formulas by default ("strings_to_formulas: Enable the worksheet.write() method to convert strings to formulas. The default is True"), so you can simply specify formulas as strings in your dataframe.
Example of a formula column which references other columns in your dataframe:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
writer = pd.ExcelWriter("foo.xlsx", engine="xlsxwriter")
df["product"] = None
df["product"] = (
'=INDIRECT("R[0]C[%s]", 0)+INDIRECT("R[0]C[%s]", 0)'
% (
df.columns.get_loc("col1") - df.columns.get_loc("product"),
df.columns.get_loc("col2") - df.columns.get_loc("product"),
)
)
df.to_excel(writer, index=False)
writer.save()
After writing the df using table.to_excel(writer, sheet_name=...), I use write_formula() as in this example (edited to add the full loop). To write all the formulas in your dataframe, read each formula from your dataframe.
# replace the right side below with reading the formula from your dataframe
# e.g., formula_to_write = df.loc(...)
rows = table.shape[0]
for row_num in range(1 + startrow, rows + startrow + 1):
    formula_to_write = '=I{} * (1 - AM{})'.format(row_num+1, row_num+1)
    worksheet.write_formula(row_num, col, formula_to_write)
Later in the code (I seem to recall one of these might be redundant, but I haven't looked it up):
writer.save()
workbook.close()
Documentation is here.
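The off-by-one between XlsxWriter's zero-based row numbers and the one-based rows in A1 notation is easy to get wrong, so it may help to generate the (row, formula) pairs separately before writing them. The helper below is a hypothetical sketch in plain Python; the column letters I and AM are taken from the snippet above, and startrow plays the same role as in to_excel.

```python
def formula_rows(n_rows, startrow=0, template="=I{r} * (1 - AM{r})"):
    """Yield (zero_based_row, formula) pairs for n_rows data rows below one header row."""
    first = startrow + 1  # the header occupies row `startrow`
    for row_num in range(first, first + n_rows):
        # A1 references are 1-based, so the sheet row is row_num + 1.
        yield row_num, template.format(r=row_num + 1)


for row_num, formula in formula_rows(3):
    print(row_num, formula)  # e.g. worksheet.write_formula(row_num, col, formula)
```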
You just need to save as usual; keep in mind to write the formula as a string.
You can also use f-strings with variables.
writer = pd.ExcelWriter(FILE_PATH ,mode='a', if_sheet_exists='overlay')
col_Q_index = 3
best_formula = f'=max(L1,N98,Q{col_Q_index})'
formula_df = pd.DataFrame([[best_formula]])
formula_df.to_excel(writer, sheet_name=SHEET_NAME, startrow=i, startcol=17, index=False, header=False)
writer.save()
