I am trying to create a spreadsheet from a dataframe using xlsxwriter.
I am trying to apply some formatting to the header row, but the only formatting that takes effect on that row is the row height. The exact same formatting options work on the other rows of the dataframe.
Please see the code below.
The red color and the height are applied to row 0 and row 3, but only the height is applied to the header row (row 2).
Any help would be much appreciated.
import pandas as pd
from IPython import display
import xlsxwriter
WorkBookName="test.xlsx"
df3=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
writer = pd.ExcelWriter(WorkBookName, engine='xlsxwriter')
df3.to_excel(writer, sheet_name="sheet",index=False,startrow=2)
workbook = writer.book
worksheet = writer.sheets["sheet"]
worksheet.write(0,0,"text string")
worksheet.write(0,1,"text string")
worksheet.write(0,2,"text string")
worksheet.write(0,3,"text string")
color_format = workbook.add_format({'color': 'red'})
worksheet.set_row(0,50,color_format)
worksheet.set_row(2,50,color_format)
worksheet.set_row(3,50,color_format)
writer.save()
display.FileLink(WorkBookName)
You are trying to change the formatting of the header, so you should first reset the default header settings:
pd.core.format.header_style = None
Then apply the formatting as required:
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
Here is the complete working code:
d=pd.DataFrame({'a':['a','a','b','b'],
'b':['a','b','c','d'],
'c':[1,2,3,4]})
d=d.groupby(['a','b']).sum()
pd.core.format.header_style = None
writer = pd.ExcelWriter('pandas_out.xlsx', engine='xlsxwriter')
workbook = writer.book
d.to_excel(writer, sheet_name='Sheet1')
worksheet = writer.sheets['Sheet1']
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column('A:C',5, format)
writer.save()
If you are on pandas 0.22, you must use pd.io.formats.excel.header_style = None instead.
As far as I understand, pandas applies its own format to the header row (the row of column names). There are ways to reset it, but those solutions weren't very reliable, and it was also quite hard to actually format that row afterwards.
The accepted answer uses the same format for all cells, while I just wanted to format the header row.
I solved it by writing out the column headers again with the desired format:
import pandas as pd
# The data that we're feeding to ExcelWriter
df = pd.DataFrame(
{
"Col A": ["a", "a", "b", "b"],
"Col B": ["a", "b", "c", "d"],
"Col C": [1, 2, 3, 4],
}
)
# The Excel file we're creating
writer = pd.ExcelWriter("pandas_out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Sheet1", index=False) # Prevents Pandas from outputting an index
# The variables we'll use to do our modifications
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
worksheet.set_row(0, 30) # Set header row height to 30
# Find more info here: https://xlsxwriter.readthedocs.io/format.html#format-methods-and-format-properties
header_format = workbook.add_format(
{
"bold": True,
"valign": "vcenter",
"align": "center",
"bg_color": "#d6d6d6",
"border": True,
}
)
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)
# Set format of data
format1 = workbook.add_format({"align": "center"})
worksheet.set_column('A:Z', 10, format1) # Width of cell
writer.close()  # writer.save() was removed in pandas 2.0
I want to add a "second" header to my Excel file using a pandas dataframe.
The Excel file already has its values and header, but I want to add a new row above the header that consists of a single cell spanning all the header columns, with its text centered.
Something like this:
How can I do this?
You can use MultiIndex.from_product, although the text will not be centered:
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])
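To see what that line does to the column labels, here is a minimal sketch on a made-up two-column frame (the frame and its values are only an illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical two-column frame, used only to show the effect on the labels.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Every existing column name is paired with the new top-level label 'Result'.
df.columns = pd.MultiIndex.from_product([["Result"], df.columns])

print(df.columns.tolist())  # [('Result', 'A'), ('Result', 'B')]
print(df.columns.nlevels)   # 2
```

Each column is now addressed by a tuple, for example df[("Result", "A")].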
EDIT:
import numpy as np
import pandas as pd
# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 6), columns=list('ABCDEF'))
# Create a pandas Excel writer using the XlsxWriter engine
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Write the data starting at row 1, leaving row 0 free for the merged header
df.to_excel(writer, sheet_name='Sheet1', startrow=1, index=False)
# Get workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a centered format for the merged cell
merge_format = workbook.add_format({'align': 'center'})
# Merge the cells of row 0 across all column names
len_cols = len(df.columns)
worksheet.merge_range(0, 0, 0, len_cols - 1, 'Result', merge_format)
writer.close()
I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green and column c in yellow, but only down to the last row of data.
Using xlsxwriter, I found examples that highlight an entire column with ".add_format", but I don't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['','',''],
'b':[1,2,2],
'c':[1,2,2]}
With xlsxwriter I apply formats in two different ways: with the set_column function, if I don't mind the format extending to the end of the sheet, and with for loops, if I do not want the format to extend past the data (for example for borders and background colors).
This is how you can apply the formats to your dataframe:
import pandas as pd
# Create a test df
data = {'a': ['','',''], 'b': [1,2,2], 'c': [1,2,2]}
df = pd.DataFrame(data)
# Write the df to an Excel file using the xlsxwriter engine
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})
# Start iterating through the columns and the rows to apply the format
for row in range(df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_orange)
# Alternative syntax
#for row in range(df.shape[0]):
#    worksheet.write(f'A{row+2}', df.iloc[row,0], format_orange)
for row in range(df.shape[0]):
    worksheet.write(row+1, 1, df.iloc[row,1], format_green)
# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)
# Finally write the file
writer.close()  # writer.save() was removed in pandas 2.0
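The per-cell loop can also be wrapped in a small helper that takes a mapping from column name to color, which covers the yellow column from the question as well. This is only a sketch: write_with_column_colors and its parameters are hypothetical names, and it assumes the frame is written with index=False so that the header sits in row 0.

```python
import os
import tempfile

import pandas as pd


def write_with_column_colors(df, path, colors, sheet_name="Sheet1"):
    """Write df to path and color only the data cells of the columns in `colors`.

    Hypothetical helper: `colors` maps a column name to an XlsxWriter color.
    Returns the number of cells that were rewritten with a format.
    """
    written = 0
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        for name, color in colors.items():
            col = df.columns.get_loc(name)
            fmt = workbook.add_format({"bg_color": color})
            for row in range(len(df)):
                # Row 0 holds the header, so the data starts at row 1.
                worksheet.write(row + 1, col, df.iloc[row, col], fmt)
                written += 1
    return written


df = pd.DataFrame({"a": ["", "", ""], "b": [1, 2, 2], "c": [1, 2, 2]})
path = os.path.join(tempfile.mkdtemp(), "highlight.xlsx")
count = write_with_column_colors(
    df, path, {"a": "orange", "b": "green", "c": "yellow"}
)
print(count)  # 9 cells: 3 columns x 3 data rows
```

Because the format is attached to individual cells, the coloring stops at the last data row instead of running down the whole column.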
I'm saving pandas DataFrame to_excel using xlsxwriter. I've managed to format all of my data (set column width, font size etc) except for changing header's font and I can't find the way to do it. Here's my example:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
The penultimate line that tries to set format for the header does nothing.
I think you first need to reset the default header style; then you can change it:
pd.core.format.header_style = None
All together:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
pd.core.format.header_style = None
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
Explanation by jmcnamara, thank you:
In Excel, a cell format overrides a row format, which in turn overrides a column format. The pd.core.format.header_style is converted to a format and applied to each cell in the header. As such, the default cannot be overridden by set_row(). Setting pd.core.format.header_style to None means that the header cells don't have a user-defined format, and thus the row format from set_row() can take effect.
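The precedence rule can be checked with XlsxWriter alone, without pandas. The sketch below writes one cell with its own format and one without, applies a row format, and then reads the style ids back from the generated sheet XML. The function name is hypothetical, and reading xl/worksheets/sheet1.xml with a regular expression relies on how XlsxWriter currently lays out its output, so treat it as an illustration rather than a supported API.

```python
import os
import re
import tempfile
import zipfile

import xlsxwriter


def cell_style_ids():
    """Return the style ids stored for A1 and B1 (hypothetical demo helper)."""
    path = os.path.join(tempfile.mkdtemp(), "precedence_demo.xlsx")
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()

    cell_fmt = workbook.add_format({"bold": True})
    row_fmt = workbook.add_format({"font_color": "red"})

    worksheet.write(0, 0, "has a cell format", cell_fmt)  # like a pandas header cell
    worksheet.write(0, 1, "no cell format")               # free to take the row format
    worksheet.set_row(0, None, row_fmt)
    workbook.close()

    # An .xlsx file is a zip archive; the cell styles are the s="..." attributes.
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("xl/worksheets/sheet1.xml").decode("utf-8")
    return dict(re.findall(r'<c r="(A1|B1)" s="(\d+)"', xml))


ids = cell_style_ids()
print(ids["A1"] != ids["B1"])  # True: A1 keeps its own format, B1 gets the row format
```

A1 plays the role of a header cell written by pandas: its own format wins, so set_row() only changes the cells around it.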
EDIT: In version 0.18.1 you have to change
pd.core.format.header_style = None
to:
pd.formats.format.header_style = None
EDIT: from version 0.20 this changed again
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
thanks krvkir.
EDIT: from version 0.24 this is now required
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
thanks Chris Vecchio.
An update for anyone who comes across this post and is using Pandas 0.20.1.
It seems the required code is now
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Apparently the excel submodule isn't imported automatically, so simply trying pandas.io.formats.excel.header_style = None alone will raise an AttributeError.
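Since the location of header_style has moved between releases, one option is a small helper that imports the submodule explicitly and tries the locations mentioned in this thread, newest first. This is a sketch only: reset_header_style is a hypothetical name, it covers just the pandas 0.20+ locations, and it still patches pandas internals, so it may stop working in a future release.

```python
import importlib


def reset_header_style():
    """Set pandas' default Excel header style to None and return the path used.

    Hypothetical helper; it only knows the locations reported for pandas >= 0.20.
    """
    # Importing the submodule explicitly avoids the AttributeError described above.
    excel = importlib.import_module("pandas.io.formats.excel")

    formatter = getattr(excel, "ExcelFormatter", None)
    if formatter is not None and hasattr(formatter, "header_style"):
        # Location reported for pandas 0.24 and later.
        formatter.header_style = None
        return "pandas.io.formats.excel.ExcelFormatter.header_style"

    if hasattr(excel, "header_style"):
        # Location reported for pandas 0.20 to 0.23.
        excel.header_style = None
        return "pandas.io.formats.excel.header_style"

    raise AttributeError("no known header_style location in this pandas version")


patched = reset_header_style()
print(patched)
```

Call it once before to_excel(); the change is global for the running process.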
Another option for Pandas 0.25 (probably also 0.24). Likely not the best way to do it, but it worked for me.
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
For pandas 0.24:
The code below doesn't work anymore:
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Instead, create a cell format object and rewrite the first row's content (your header) one cell at a time with that format.
This approach does not depend on pandas internals, so it should keep working across versions.
Use the following outline (path_output, sheet_name and df are placeholders for your own values):
# [1] write df to excel as usual
writer = pd.ExcelWriter(path_output, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# [2] do formats for other rows and columns first
# [3] create a format object with your desired formatting for the header, let's name it: headercellformat
headercellformat = workbook.add_format({'bold': True})  # example properties only
# [4] write to the 0th (header) row **one cell at a time**, with columnname and format
for columnnum, columnname in enumerate(list(df.columns)):
    worksheet.write(0, columnnum, columnname, headercellformat)
writer.close()
In pandas 0.20 the solution from the accepted answer changed again.
The format that should be set to None can now be found at:
pandas.io.formats.excel.header_style
If you do not want to change the header style for pandas globally, you can alternatively pass header=False to to_excel:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 5),
columns=pd.date_range('2019-01-01', periods=5, freq='M'))
file_path='output.xlsx'
writer = pd.ExcelWriter(file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False )
workbook = writer.book
fmt = workbook.add_format({'num_format': 'mm/yyyy', 'bold': True})
worksheet = writer.sheets['Sheet1']
worksheet.set_row(0, None, fmt)
writer.close()  # writer.save() was removed in pandas 2.0
Unfortunately, add_format is not available anymore.
Question: I have a data frame with column names and their respective values. But when I apply a format object to the column headings, it has no effect.
Code:
import pandas as pd
root = "C:\\Users\\543904\\Desktop\\New folder\\"
dict = {'name':["aparna", "pankaj"],
'degree': ["MBA", "BCA"],
'score':[90, 40]}
df = pd.DataFrame(dict)
writer = pd.ExcelWriter(root + 'output.xlsx', engine = "xlsxwriter")
df.to_excel(writer, sheet_name='df', index = False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()  # writer.save() was removed in pandas 2.0
Expected: [screenshot]
Actual: [screenshot]
This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
'bold': True,
'text_wrap': True,
'valign': 'top',
'fg_color': '#D7E4BC',
'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
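Applied to the data frame from the question, the same idea might look like the sketch below. It assumes index=False, so the headers start in column 0 rather than column 1, and it writes to a temporary directory instead of the asker's folder; the format properties are taken from the question's Format_Object.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj"], "degree": ["MBA", "BCA"], "score": [90, 40]}
)
# Temporary location, assumed only for this example.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # Skip pandas' own header and leave row 0 free for the custom one.
    df.to_excel(writer, sheet_name="df", startrow=1, header=False, index=False)

    workbook = writer.book
    worksheet = writer.sheets["df"]

    header_format = workbook.add_format(
        {
            "bold": True,
            "text_wrap": True,
            "align": "center",
            "valign": "top",
            "border": 1,
            "bg_color": "#0ef0ce",
        }
    )

    # No index column is written, so the headers line up with columns 0..n-1.
    for col_num, value in enumerate(df.columns):
        worksheet.write(0, col_num, value, header_format)

    # The row height can still be set with set_row(); no row format is needed.
    worksheet.set_row(0, 20)

print(os.path.exists(path))  # True
```

Leaving the with block saves and closes the file, so no explicit writer.close() call is needed.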