pandas xlsxwriter, format table header - not sheet header - python

I'm saving a pandas DataFrame with to_excel using xlsxwriter. I've managed to format all of my data (set column width, font size, etc.) except for changing the header's font, and I can't find a way to do it. Here's my example:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
The penultimate line that tries to set format for the header does nothing.

I think you first need to reset the default header style; then you can change it:
pd.core.format.header_style = None
All together:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
pd.core.format.header_style = None
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
Explanation by jmcnamara (thank you):
In Excel, a cell format overrides a row format, which in turn overrides a column format. The pd.core.format.header_style is converted to a format and applied to each cell in the header. As such, the default cannot be overridden by set_row(). Setting pd.core.format.header_style to None means that the header cells don't have a user-defined format, so they can be overridden by set_row().
EDIT: In version 0.18.1 you have to change
pd.core.format.header_style = None
to:
pd.formats.format.header_style = None
EDIT: from version 0.20 this changed again
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
thanks krvkir.
EDIT: from version 0.24 this is now required
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
thanks Chris Vecchio.
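The precedence rule jmcnamara describes can be checked with XlsxWriter alone. In the minimal sketch below (the helper name write_precedence_demo and the temporary file are illustrative, and openpyxl is assumed to be installed for reading the file back), row 0 gets a bold row format, A1 is written with its own italic cell format, and B1 is written with no format. A1 should come back italic but not bold, because the cell format replaces the row format rather than merging with it, while B1 picks up the row format. pandas' default header cells are in A1's position.

```python
import os
import tempfile

import xlsxwriter
from openpyxl import load_workbook


def write_precedence_demo(path):
    """Write one row showing that a cell format beats a row format."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("Sheet1")

    row_fmt = workbook.add_format({"bold": True})
    cell_fmt = workbook.add_format({"italic": True})

    # Row format for the whole first row, like set_row(0, ...) in the question.
    worksheet.set_row(0, None, row_fmt)
    # A1 has its own cell format, just like a pandas header cell does.
    worksheet.write(0, 0, "cell format", cell_fmt)
    # B1 has no cell format, so it falls back to the row format.
    worksheet.write(0, 1, "row format")
    workbook.close()


path = os.path.join(tempfile.mkdtemp(), "precedence_demo.xlsx")
write_precedence_demo(path)

ws = load_workbook(path)["Sheet1"]
print(ws["A1"].font.b, ws["A1"].font.i)  # cell format only
print(ws["B1"].font.b)                   # inherited from the row format
```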

An update for anyone who comes across this post and is using Pandas 0.20.1.
It seems the required code is now
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Apparently the excel submodule isn't imported automatically, so simply trying pandas.io.formats.excel.header_style = None alone will raise an AttributeError.
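Because the attribute has moved between releases, one option is to patch it only for the duration of a write and restore it afterwards. The sketch below assumes pandas 0.24 or later, where the default lives on ExcelFormatter (per the answers above); pandas_header_style is a hypothetical helper name, not a pandas API, and it still relies on a private pandas attribute that may change again.

```python
import contextlib
import os
import tempfile

import pandas as pd
import pandas.io.formats.excel  # not imported automatically by "import pandas"
from openpyxl import load_workbook

_MISSING = object()


@contextlib.contextmanager
def pandas_header_style(style=None):
    """Temporarily replace the style pandas gives Excel header cells."""
    formatter = pandas.io.formats.excel.ExcelFormatter
    # Keep whatever the class defines (a dict or a property, depending on version).
    original = formatter.__dict__.get("header_style", _MISSING)
    formatter.header_style = style
    try:
        yield
    finally:
        if original is _MISSING:
            del formatter.header_style
        else:
            formatter.header_style = original


df = pd.DataFrame({"test_data": [1, 2, 3, 4, 5]})
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

with pandas_header_style(None):
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="test", index=False)
        header_fmt = writer.book.add_format(
            {"font_name": "Arial", "font_size": 10, "bold": True})
        # The header cells have no pandas format now, so the row format applies.
        writer.sheets["test"].set_row(0, None, header_fmt)

ws = load_workbook(path)["test"]
print(ws["A1"].value, ws["A1"].font.name, ws["A1"].font.b)
```

Restoring the attribute in finally keeps other to_excel() calls in the same process unaffected. The header style is read while to_excel() runs, so only that call needs to sit inside the with block.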

Another option for Pandas 0.25 (probably also 0.24). Likely not the best way to do it, but it worked for me.
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None

For pandas 0.24:
The code below doesn't work anymore:
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Instead, create a cell format object and rewrite the first row's content (your header) one cell at a time with that format object.
Because this doesn't depend on pandas' internal header style, it should keep working in future versions.
For example:
import pandas as pd
df = pd.DataFrame({'test_data': [1, 2, 3, 4, 5]})
path_output, sheet_name = 'test.xlsx', 'Sheet1'
# [1] write df to excel as usual
writer = pd.ExcelWriter(path_output, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# [2] do formats for other rows and columns first
# [3] create a format object with your desired formatting for the header (example values)
headercellformat = workbook.add_format({'bold': True, 'font_name': 'Arial'})
# [4] write to the 0th (header) row **one cell at a time**, with column name and format
for columnnum, columnname in enumerate(df.columns):
    worksheet.write(0, columnnum, columnname, headercellformat)
writer.close()

In pandas 0.20 the solution of the accepted answer changed again.
The format that should be set to None can be found at:
pandas.io.formats.excel.header_style

If you do not want pandas to set the header style at all, you can instead pass header=False to to_excel(). Note that pandas then writes no header row, so row 0 holds the first data row:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 5),
columns=pd.date_range('2019-01-01', periods=5, freq='M'))
file_path='output.xlsx'
writer = pd.ExcelWriter(file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False)
workbook = writer.book
fmt = workbook.add_format({'num_format': 'mm/yyyy', 'bold': True})
worksheet = writer.sheets['Sheet1']
worksheet.set_row(0, None, fmt)
writer.close()

Unfortunately, add_format is not available anymore.

Related

Add a number format to an xlsx file using Python

I have data like the following in abc.xlsx:
date,qty,price,profitprice,sellprice
20200501,11,900,,20
And using Python I want the output to be:
date,qty,price,profitprice,sellprice
20200501,11.00,900.00,,20.00
Can anyone help with this?
How can I read each column with its values, add a number format, and save the result to an xlsx file?
Based on this answer by Akshit Khurana:
import pandas as pd
df = pd.read_excel("initial.xlsx")
writer = pd.ExcelWriter("formatted.xlsx", engine = "xlsxwriter")
df.to_excel(writer, index=False, header=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '0.00'})
worksheet.set_column('C:E', None, format1) # Adds formatting to columns C-E
writer.close()
I believe the two other answers posted here do not work for the same reason this question was asked.
You can use the dtype parameter of read_excel:
pd.read_excel('abc.xlsx', dtype={'profitprice': float, 'sellprice': float})
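The 'C:E' range in the first answer is tied to column positions. As a sketch, the Excel letters can instead be looked up from the header names with XlsxWriter's xl_col_to_name() utility. This assumes index=False (so DataFrame column i lands in Excel column i), and write_with_number_format is a hypothetical helper. Converting with dtype=float changes the pandas dtype, but whether 11 shows as 11 or 11.00 in Excel is decided by the cell's number format.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook
from xlsxwriter.utility import xl_col_to_name


def write_with_number_format(df, path, columns, num_format="0.00", sheet_name="Sheet1"):
    """Write df with xlsxwriter and apply num_format to the named columns."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        fmt = writer.book.add_format({"num_format": num_format})
        worksheet = writer.sheets[sheet_name]
        for name in columns:
            # With index=False, DataFrame column i is written to Excel column i.
            letter = xl_col_to_name(df.columns.get_loc(name))
            # Column format: applies to cells that pandas wrote without a format.
            worksheet.set_column(f"{letter}:{letter}", None, fmt)


# Same layout as the question's abc.xlsx.
df = pd.DataFrame({"date": [20200501], "qty": [11], "price": [900],
                   "profitprice": [float("nan")], "sellprice": [20]})
path = os.path.join(tempfile.mkdtemp(), "formatted.xlsx")
write_with_number_format(df, path, ["qty", "price", "profitprice", "sellprice"])

ws = load_workbook(path)["Sheet1"]
print(ws["B2"].value, ws["B2"].number_format)
```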

pandas.ExcelWriter set_rotation does not rotate text [duplicate]

How to apply Format object to index values in pandas

Question: I have a DataFrame with column names and their values, but when I apply a Format object to the column headings, it has no effect.
Code:
import pandas as pd
root = "C:\\Users\\543904\\Desktop\\New folder\\"
data = {'name': ["aparna", "pankaj"],
        'degree': ["MBA", "BCA"],
        'score': [90, 40]}
df = pd.DataFrame(data)
writer = pd.ExcelWriter(root + 'output.xlsx', engine="xlsxwriter")
df.to_excel(writer, sheet_name='df', index = False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()
This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
'bold': True,
'text_wrap': True,
'valign': 'top',
'fg_color': '#D7E4BC',
'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
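A self-contained version of that approach, applied to the question's data, might look like the sketch below. It assumes index=False, so unlike the docs snippet the header is written starting at column 0 rather than col_num + 1; the temporary output path is illustrative.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"name": ["aparna", "pankaj"],
                   "degree": ["MBA", "BCA"],
                   "score": [90, 40]})
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # No pandas header; the data starts one row down so row 0 is free.
    df.to_excel(writer, sheet_name="df", startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets["df"]
    header_format = workbook.add_format({
        "bold": True,
        "text_wrap": True,
        "align": "center",
        "valign": "top",
        "fg_color": "#0ef0ce",
        "border": 1,
    })
    worksheet.set_row(0, 20)  # height only; the cells carry the format
    # index=False, so DataFrame column i goes to Excel column i (no + 1).
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)

ws = load_workbook(path)["df"]
print([cell.value for cell in ws[1]])
```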

Open existing workbook with ExcelWriter [duplicate]

I use pandas to write to an Excel file in the following fashion:
import pandas
writer = pandas.ExcelWriter('Masterfile.xlsx')
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()
Masterfile.xlsx already contains a number of different tabs. However, it does not yet contain "Main".
Pandas correctly writes to the "Main" sheet; unfortunately, it also deletes all the other tabs.
The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter gives a clue that something like this might work out:
import pandas
from openpyxl import load_workbook
book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()
UPDATE: Starting from pandas 1.3.0, the following function no longer works properly, because DataFrame.to_excel() and pd.ExcelWriter() have changed: a new if_sheet_exists parameter was introduced, which breaks the function below.
Here you can find an updated version of append_df_to_excel() that works with pandas 1.3.0+.
Here is a helper function:
import os
import pandas as pd
from openpyxl import load_workbook
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None
    Usage examples:
    >>> append_df_to_excel('d:/temp/test.xlsx', df)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name,
            startrow=startrow if startrow is not None else 0,
            **to_excel_kwargs)
        return
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # try to open an existing workbook
    writer.book = load_workbook(filename)
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row
    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    # copy existing sheets
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
Tested with the following versions:
Pandas 1.2.3
Openpyxl 3.0.5
With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:
import pandas
from openpyxl import load_workbook
with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
# That's it!
Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:
import pandas as pd
with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer:
    data_filtered.to_excel(writer)
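On pandas 1.4 or later, mode='a' can be combined with if_sheet_exists='overlay' to append rows below the existing data in a sheet, which covers the main use of the append_df_to_excel() helper above without assigning writer.book. A minimal sketch, assuming openpyxl is installed; append_rows is a hypothetical name and the sample data is made up.

```python
import os
import tempfile

import pandas as pd


def append_rows(df, path, sheet_name="Sheet1"):
    """Append df's rows (no header) below the last used row of an existing sheet."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # openpyxl's max_row is 1-based, so it equals the 0-based index
        # of the first empty row, which is what startrow expects.
        startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=False, index=False)


path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")
first = pd.DataFrame({"Diff1": [1, 2, 3], "Diff2": [4, 5, 6]})
first.to_excel(path, sheet_name="Main", index=False)  # header + 3 rows

append_rows(pd.DataFrame({"Diff1": [7], "Diff2": [8]}), path, sheet_name="Main")
print(pd.read_excel(path, sheet_name="Main"))
```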
I know this is an older thread, but this is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you already have created. In that case, xlwings is a better option - it allows you to write to the excel book and keeps the charts/chart data.
Simple example:
import xlwings as xw
import pandas as pd
#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5
#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')
ws = wb.sheets['chartData']
ws.range('A1').options(index=False).value = df
wb.save('C:\\data\\bookwithChart_updated.xlsx')
xw.apps[0].quit()
Old question, but I am guessing some people still search for this - so...
I find this method nice because all worksheets are loaded into a dictionary of sheet name and DataFrame pairs, created by pandas with the sheet_name=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.
Note: pandas 0.21.0 renamed the "sheetname" parameter to "sheet_name"; the code below uses the new name (use sheetname=None on older versions).
# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheet_name=None)
# all worksheets are accessible as dataframes.
# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']
# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df
# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe
# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:
    for ws_name, df_sheet in ws_dict.items():
        # index=False: the sheets were read without an index column, so writing
        # the default index would add an extra column on every round trip
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)
For the example in the 2013 question:
ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheet_name=None)
ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]
with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)
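A small end-to-end run of that round trip is sketched below, with a freshly written two-sheet workbook standing in for Masterfile.xlsx (the sheet names and data are made up). Only cell values survive, since the DataFrames do not carry charts or formatting, so this fits workbooks that hold plain data.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")

# Build a two-sheet workbook to stand in for an existing file.
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Existing", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Other", index=False)

# sheet_name=None returns {sheet name: DataFrame} for every sheet, in file order.
ws_dict = pd.read_excel(path, sheet_name=None)
ws_dict["Main"] = pd.DataFrame({"Diff1": [0.5], "Diff2": [1.5]})

# Rewrite every sheet; index=False keeps the round trip from adding index columns.
with pd.ExcelWriter(path) as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)

print(list(pd.read_excel(path, sheet_name=None)))
```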
There is a better solution in pandas 0.24:
with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)
So upgrade your pandas now:
pip install --upgrade pandas
The solution by @MaxU does not work with updated versions of Python and related packages. It raises the error:
"zipfile.BadZipFile: File is not a zip file"
I wrote a new version of the function that works with the updated versions of Python and related packages; I tested it with Python 3.9, openpyxl 3.0.6 and pandas 1.2.3.
In addition, I added more features to the helper function:
It now resizes all columns based on cell content width, so all values are visible (see "resizeColumns").
You can choose whether NaN values are displayed as NaN or as empty cells (see "na_rep").
Added "startcol": you can start writing from a specific column; otherwise writing starts at column 0.
Here is the function:
import pandas as pd
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
                       truncate_sheet=False, resizeColumns=True, na_rep='NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
        filename : File path or existing ExcelWriter
                   (Example: '/path/to/file.xlsx')
        df : DataFrame to save to workbook
        sheet_name : Name of sheet which will contain DataFrame.
                     (default: 'Sheet1')
        startrow : upper left cell row to dump data frame.
                   Per default (startrow=None) calculate the last row
                   in the existing DF and write to the next row...
        truncate_sheet : truncate (remove and recreate) [sheet_name]
                         before writing DataFrame to Excel file
        resizeColumns : default = True. Resizes all columns based on cell content width
        to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                          [can be dictionary]
        na_rep : default = 'NA'. If you want blank cells instead of NaN, pass na_rep=''
    Returns: None
    *******************
    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    Features of the new helper function:
    1) Now it works with Python 3.9 and the latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) It now resizes all columns based on cell content width, so all values are visible (SEE "resizeColumns")
    3) You can choose whether NaN is displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol": you can start writing from a specific column, otherwise writing starts at col = 0
    *******************
    """
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    try:
        f = open(filename)
        # The file exists; close the handle that was only opened as a check
        f.close()
    except IOError:
        # print("File not accessible")
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        # startrow = -1
        startrow = 0
    if startcol is None:
        startcol = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)
    if resizeColumns:
        ws = writer.book[sheet_name]
        def auto_format_cell_width(ws):
            # max_column is 1-based and inclusive, so go up to max_column + 1
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2
        auto_format_cell_width(ws)
    # save the workbook
    writer.save()
Example Usage:
# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
'colors': ['red', 'white', 'blue'],
'colorsTwo': ['yellow', 'white', 'blue'],
'NaNcheck': [float('NaN'), 1, float('NaN')],
})
# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"
# RUN THE FOLLOWING LINES ONE AT A TIME TO SEE THE DIFFERENT UPDATES TO THE EXCEL FILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
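The resizeColumns step can also be used on its own. The sketch below is a standalone variant of auto_format_cell_width() (autofit_columns and its padding argument are hypothetical names): it iterates up to and including ws.max_column so the last column is resized, and it skips empty cells instead of measuring the string 'None'. Widths are character counts, so they only approximate what Excel renders for proportional fonts.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def autofit_columns(ws, padding=2):
    """Set each column's width to its longest value (as text) plus padding."""
    # max_column is 1-based and inclusive, hence the + 1.
    for col_idx in range(1, ws.max_column + 1):
        letter = get_column_letter(col_idx)
        longest = max(
            (len(str(cell.value)) for cell in ws[letter] if cell.value is not None),
            default=0,
        )
        ws.column_dimensions[letter].width = longest + padding


wb = Workbook()
ws = wb.active
ws.append(["numbers", "colors", "colorsTwo"])
ws.append([1, "red", "yellow"])
autofit_columns(ws)
print(ws.column_dimensions["A"].width, ws.column_dimensions["C"].width)
```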
import pandas
from openpyxl import load_workbook
def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                                   header=None,
                                                                   index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)
        master_writer.save()
    except Exception as e:
        raise e
This works fine; the only problem is that the formatting of the master file (the file to which we add the new sheet) is lost.
writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)
I hope "keep_date_col" helps you.
I used the answer described here
from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()
book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(xlsFilename, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()
The solution by @MaxU worked very well. I have just one suggestion:
If truncate_sheet=True is specified, then "startrow" should NOT be taken from the existing sheet. I suggest:
if startrow is None and sheet_name in writer.book.sheetnames:
    if not truncate_sheet:  # truncate_sheet would use startrow if provided (or zero below)
        startrow = writer.book[sheet_name].max_row
I'd recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:
import sys
import xlwings as xw
import pandas as pd
from xlsxwriter.utility import xl_col_to_name
# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # print error message if unable to get the current workbook, then exit
        print('Unable to grab the current Workbook')
        sys.exit(1)
    else:
        return activeWb
# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    last_cell = myBook.sheets[sheetName].range("A1").current_region.last_cell
    lastRow = last_cell.row
    # xlwings column numbers are 1-based; xl_col_to_name() expects a 0-based index
    lastCol = xl_col_to_name(last_cell.column - 1)
    return str(lastRow), lastCol + str(lastRow)
activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1, 2, 3])
# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')
This seems to work very well for my applications, because .xlsm workbooks can be very tricky. You can execute this as a Python script, or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.
You can write to an existing Excel file without overwriting its other sheets by opening a pandas.ExcelWriter in append mode (mode='a', which requires the openpyxl engine) and passing it to DataFrame.to_excel(). DataFrame.to_excel() itself has no mode parameter.
Here's an example:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Open the existing file in append mode and write the DataFrame to a new sheet
# (in pandas >= 1.3, an existing sheet name raises an error unless if_sheet_exists is set)
with pd.ExcelWriter('existing_file.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet2', index=False)
Method:
Creates the file if it is not present
Appends to an existing Excel file under the given sheet name
import pandas as pd
from openpyxl import load_workbook
def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl')
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError:
        df.to_excel(file, **kwds)
Usage:
df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])
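With pandas 1.3 or later, the same create-or-append behavior can be had without assigning writer.book, by choosing the writer mode based on whether the file exists. A sketch under that assumption (openpyxl installed; write_sheet is a hypothetical name). Note that if_sheet_exists='replace' overwrites a sheet of the same name rather than appending rows to it.

```python
import os
import tempfile

import pandas as pd


def write_sheet(df, path, sheet_name, **to_excel_kwargs):
    """Write df to sheet_name, creating the file if needed and keeping other sheets."""
    if os.path.exists(path):
        # Append mode keeps the other sheets; a same-named sheet is replaced in place.
        writer = pd.ExcelWriter(path, engine="openpyxl", mode="a",
                                if_sheet_exists="replace")
    else:
        writer = pd.ExcelWriter(path, engine="openpyxl", mode="w")
    with writer:
        df.to_excel(writer, sheet_name=sheet_name, **to_excel_kwargs)


path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_sheet(df_a, path, "Sheet a", index=False)          # creates the file
write_sheet(df_b, path, "Sheet b", index=False)          # adds a second sheet
write_sheet(df_a.head(3), path, "Sheet a", index=False)  # replaces "Sheet a"
print(list(pd.read_excel(path, sheet_name=None)))
```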

xlsxwriter not applying format to header row of dataframe - Python Pandas

I am trying to take a DataFrame and create a spreadsheet from it using xlsxwriter.
I am trying to apply some formatting to the header row, but the only formatting that seems to work on that row is the row height. The exact same formatting options work on the other rows of the DataFrame.
Please see the code below.
The red color (and the height) is applied to all rows except the header row (row 2): the red color is applied to both row 0 and row 3, but only the height is applied to row 2.
Any help would be much appreciated.
import numpy as np
import pandas as pd
from pandas import DataFrame
from IPython import display
import xlsxwriter
WorkBookName="test.xlsx"
df3=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
writer = pd.ExcelWriter(WorkBookName, engine='xlsxwriter')
df3.to_excel(writer, sheet_name="sheet",index=False,startrow=2)
workbook = writer.book
worksheet = writer.sheets["sheet"]
worksheet.write(0,0,"text string")
worksheet.write(0,1,"text string")
worksheet.write(0,2,"text string")
worksheet.write(0,3,"text string")
color_format = workbook.add_format({'color': 'red'})
worksheet.set_row(0,50,color_format)
worksheet.set_row(2,50,color_format)
worksheet.set_row(3,50,color_format)
writer.close()
display.FileLink(WorkBookName)
You are trying to change the formatting of the header, so you should first reset the default header settings:
pd.core.format.header_style = None
Then apply the formatting as required:
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
Here is the complete working code:
d=pd.DataFrame({'a':['a','a','b','b'],
'b':['a','b','c','d'],
'c':[1,2,3,4]})
d=d.groupby(['a','b']).sum()
pd.core.format.header_style = None
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
d.to_excel(writer, sheet_name='Sheet1')
worksheet = writer.sheets['Sheet1']
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
writer.save()
If you have pandas 0.22, you must use pd.io.formats.excel.header_style = None instead. Check this git page out.
As far as I understand, pandas sets its own format on the header row. There are ways to reset it, but those solutions weren't very reliable, and it was also quite hard to actually format it.
The accepted answer uses the same format for all cells, while I only wanted to format the header row.
I solved it by writing out the header cells with the desired format:
import pandas as pd
# The data that we're feeding to ExcelWriter
df = pd.DataFrame(
{
"Col A": ["a", "a", "b", "b"],
"Col B": ["a", "b", "c", "d"],
"Col C": [1, 2, 3, 4],
}
)
# The Excel file we're creating
writer = pd.ExcelWriter("pandas_out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Sheet1", index=False) # Prevents Pandas from outputting an index
# The variables we'll use to do our modifications
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
worksheet.set_row(0, 30)  # Set header row height to 30
# Find more info here: https://xlsxwriter.readthedocs.io/format.html#format-methods-and-format-properties
header_format = workbook.add_format(
{
"bold": True,
"valign": "vcenter",
"align": "center",
"bg_color": "#d6d6d6",
"border": True,
}
)
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)
# Set format of data
format1 = workbook.add_format({"align": "center"})
worksheet.set_column('A:Z', 10, format1)  # Width of cell
writer.close()
