How to apply Format object to index values in pandas - python

Question: I have a data frame with named columns and their values. But when I apply a Format object to the column headings, the formatting has no effect.
Code:
import pandas as pd
root = "C:\\Users\\543904\\Desktop\\New folder\\"
dict = {'name':["aparna", "pankaj"],
'degree': ["MBA", "BCA"],
'score':[90, 40]}
df = pd.DataFrame(dict)
writer = pd.ExcelWriter(root + 'output.xlsx', engine = "xlsxwriter")
df.to_excel(writer, sheet_name='df', index = False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()  # writer.save() was removed in pandas 2.0

This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
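Applied to the dataframe from the question, the same approach might look like the sketch below. It assumes the index is not written (index=False), so the headings start at column 0 rather than `col_num + 1`. The temporary output path is a placeholder and not part of the original post.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "name": ["aparna", "pankaj"],
    "degree": ["MBA", "BCA"],
    "score": [90, 40],
})

# Hypothetical output location; replace with your own folder.
out_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    # Leave row 0 free and suppress the header pandas would write.
    df.to_excel(writer, sheet_name="df", startrow=1, header=False, index=False)

    workbook = writer.book
    worksheet = writer.sheets["df"]

    # Same properties the question set through the set_*() methods.
    header_format = workbook.add_format({
        "bold": True,
        "text_wrap": True,
        "align": "center",
        "valign": "top",
        "border": 1,
        "bg_color": "#0ef0ce",
    })

    # Set the header row height, then write each heading as a formatted cell.
    worksheet.set_row(0, 20)
    for col_num, value in enumerate(df.columns):
        worksheet.write(0, col_num, value, header_format)
```

Because each heading is written as a cell with its own format, there is no pandas default cell format left to take precedence over it.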

Related

How to apply a defined function of format to a dataframe

I have multiple data frames (df, df1, df2,...). I want to apply my own formatting to each and then export them to Excel (Excel-file, Excel-file1, Excel-file2,...).
I am thinking of writing a formatting function and applying it to my data frames, but I do not know how to go about this.
# Create a Pandas Excel writer using XlsxWriter
writer = pd.ExcelWriter(r'N:\Excel-file.xlsx', engine='xlsxwriter')
# Skip one row to insert a defined header, turn off the default header, and remove index
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# The xlsxwriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# The default format of the workbook
workbook.formats[0].set_font_size(12)
workbook.formats[0].set_align('right')
# Header format
header_format = workbook.add_format({
    'bold': True,
    'border': 0})
header_format.set_font_size(14)
header_format.set_align('center')
# Write the column headers with the defined format
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)
# Export the Excel file
writer.close()
The function I defined looks like this:
def format(df, file_name):
    with pd.ExcelWriter(r'file_name.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, df4, df5, df6]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")
The problem is that only df6 ends up formatted, and I get a single Excel file named "file_name". Is there any way to fix this?
pandas.DataFrame.apply() iterates the dataframe, on which it is called, by rows (or by columns) and applies provided transformation and returns one result per row (or column). The transformation logic should consider this fact and should process it as if it is individual row (or column). Per your source code above, you seem to be applying the logic to entire dataframe (df1) on which apply() is called.
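As a small illustration of that row-wise and column-wise behaviour, here is a sketch with a made-up two-column frame (the column names `a` and `b` are only for this example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1: the function receives one row (a Series) at a time.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# axis=0 (the default): the function receives one column at a time.
col_max = df.apply(lambda col: col.max())

print(row_sums.tolist())  # [11, 22, 33]
print(col_max.tolist())   # [3, 30]
```

In both cases the function sees a single row or column, never the whole frame, which is why apply() is not a fit for "export this entire dataframe to a file".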
I assume your problem statement is that you have multiple data frames (df, df1, df2,...) and you want to export them to individual Excel files by applying some common transformation logic.
You can collect them into a list and process them individually in a loop. Since format() does not return a result (i.e. it does not transform the list element), a plain for loop is the way to go. Also, use the with syntax so the file is closed automatically, which avoids leaked or orphaned file handles. Finally, the writer must be given the file_name argument itself: the literal r'file_name.xlsx' makes every call write to the same file, so each dataframe overwrites the previous one and only the last survives.
def format(df, file_name):
    # Use the file_name argument, not the literal string 'file_name.xlsx'
    with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, ...]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")

Is there a way to set pandas to default YYYY-MM-DD instead of YYYY-MM-DD 00:00:00?

Can I set pandas to default to YYYY-MM-DD? I am getting YYYY-MM-DD 00:00:00 instead. Is there a way to make sure, by default, that the zeros don't appear when I export to Excel/CSV?
Updated per comment request:
I have a function that looks like this:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I create a df out of it:
df1 = pd.DataFrame(x1)
along with others:
# this concats the df horizontally
df_merged1 = pd.concat([df1, df0, df2, df0, df3, df0, df4], axis=1)
df_merged1.reset_index(inplace=True)
Then I export that to excel:
writer = pd.ExcelWriter('Test1.xlsx', engine='xlsxwriter')
df_merged1.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df_merged1.columns.values):
    worksheet.write(0, col_num, value, header_format)
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column(0, 11, 30, format)
writer.close()
The exported excel file has multiple date columns each one showing the extra 00:00:00 at the end. Is it possible to have it only as YYYY-MM-DD?
Thanks
The solution is to pass a datetime_format when creating the writer.
Create a tiny dataframe for testing:
import pandas as pd
from datetime import datetime
df = pd.DataFrame([datetime(2021, 3, 4, 20, 48, 5)])
This is the dataframe so far:
                    0
0 2021-03-04 20:48:05
Create the writer:
writer = pd.ExcelWriter("exemple.xlsx", datetime_format='hh:mm:ss')
df.to_excel(writer, sheet_name="Sheet1")
writer.close()
Note: I used hh:mm:ss, but it could be any Excel number format. For the date-only output asked about here, use datetime_format='yyyy-mm-dd'.
For more details, see the pandas documentation for ExcelWriter.
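The question also mentions CSV export, which ExcelWriter does not cover. A minimal sketch for that case, assuming a made-up frame with a `date` column, is to pass `date_format` to `to_csv`, or to convert the column to plain date objects before exporting:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-04 20:48:05", "2021-03-05 08:00:00"]),
    "value": [1, 2],
})

# CSV: date_format controls how datetime values are rendered as text.
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)

# Alternative: drop the time component by converting to datetime.date objects.
# This discards the time of day, so only do it if the time is not needed.
df_dates = df.assign(date=df["date"].dt.date)
print(df_dates["date"].iloc[0])  # 2021-03-04
```

When a column holds plain date objects, ExcelWriter formats it with its date_format argument rather than datetime_format.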

painting a cell in excel with condition using python

I am creating an excel report that should give me a result of automatic tests. It should say if they failed/ passed.
I have created the excel report from csv using this code:
import os
import pandas as pd
import string
writer = pd.ExcelWriter("file.xlsx", engine="xlsxwriter")
df = pd.read_csv("K:\\results.csv")
df.to_excel(writer, sheet_name=os.path.basename("K:\\results.csv"))
# skip 2 rows
df.to_excel(writer, sheet_name='Sheet1', startrow=2, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
'bold': True,
'fg_color': '#ffcccc',
'border': 1})
# create dictionary for map length of columns
d = dict(zip(range(25), string.ascii_uppercase))
print (d)
max_len = d[len(df.columns) - 1]
print(max_len)
# C
# dynamically set merged columns in first row
worksheet.merge_range('A1:' + max_len + '1', 'This Sheet is for Personal Details')
for col_num, value in enumerate(df.columns.values):
    # write to second row
    worksheet.write(1, col_num, value, header_format)
    column_len = df[value].astype(str).str.len().max()
    column_len = max(column_len, len(value)) + 3
    worksheet.set_column(col_num, col_num, column_len)
writer.close()
Now, if a cell contains the word "success", I want to color it green, and if a cell contains "fail", I want to color it red. How can I format a specific cell in the Excel file based on what is written in it?
Thanks.
You could use a conditional format for this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['success', 'bar', 'fail', 'foo', 'success']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format for fail. Light red fill with dark red text.
fail_format = workbook.add_format({'bg_color': '#FFC7CE',
                                   'font_color': '#9C0006'})
# Add a format for pass. Green fill with dark green text.
pass_format = workbook.add_format({'bg_color': '#C6EFCE',
                                   'font_color': '#006100'})
# Apply conditional formats to the cell range.
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'fail',
                                       'format': fail_format})
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'success',
                                       'format': pass_format})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See the XlsxWriter docs on Working with Conditional Formatting. Note that you can also use a numerical (row, col) range instead of an A1-style range such as B2:B6; see conditional_format().
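A sketch of the numerical-range variant, which avoids hard-coding the A1 string when the dataframe length varies. The temporary output path is a placeholder; `xl_range` from `xlsxwriter.utility` is used only to show which A1 range the numbers correspond to.

```python
import os
import tempfile

import pandas as pd
from xlsxwriter.utility import xl_range

df = pd.DataFrame({"Data": ["success", "bar", "fail", "foo", "success"]})

# Hypothetical output location; replace with your own path.
out_path = os.path.join(tempfile.mkdtemp(), "pandas_conditional.xlsx")

# Column 0 holds the index, so the data sits in column 1, rows 1..len(df).
first_row, last_row, col = 1, len(df), 1
a1_range = xl_range(first_row, col, last_row, col)

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1")
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]

    fail_format = workbook.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
    pass_format = workbook.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})

    # Same rules as above, addressed as (first_row, first_col, last_row, last_col).
    for text, fmt in (("fail", fail_format), ("success", pass_format)):
        worksheet.conditional_format(
            first_row, col, last_row, col,
            {"type": "text", "criteria": "containing", "value": text, "format": fmt},
        )

print(a1_range)  # B2:B6
```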

pandas xlsxwriter, format table header - not sheet header

I'm saving a pandas DataFrame with to_excel using xlsxwriter. I've managed to format all of my data (column width, font size, etc.) except for the header's font, and I can't find a way to change it. Here's my example:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.close()
The set_row() call that tries to set the format for the header does nothing.
I think you first need to reset the default header style; then you can change it:
pd.core.format.header_style = None
All together:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
pd.core.format.header_style = None
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.close()
Explanation from jmcnamara, thank you:
In Excel, a cell format overrides a row format, which in turn overrides a column format. The pd.core.format.header_style is converted to a format and is applied to each cell in the header. As such, the default cannot be overridden by set_row(). Setting pd.core.format.header_style to None means that the header cells don't have a user-defined format, and thus it can be overridden by set_row().
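The precedence rule can be pictured with a toy model. This is only an illustration of the ordering described above, not XlsxWriter or pandas code; the function name and the strings standing in for format objects are made up.

```python
def effective_format(cell_format=None, row_format=None, column_format=None):
    """Return the format that wins under the rule cell > row > column."""
    for fmt in (cell_format, row_format, column_format):
        if fmt is not None:
            return fmt
    return None


# Header cell written by pandas with its default style: set_row() loses.
print(effective_format(cell_format="pandas default", row_format="header_fmt"))
# → pandas default

# Header style cleared, so the row format from set_row() applies.
print(effective_format(cell_format=None, row_format="header_fmt", column_format="font_fmt"))
# → header_fmt
```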
EDIT: In version 0.18.1 you have to change
pd.core.format.header_style = None
to:
pd.formats.format.header_style = None
EDIT: from version 0.20 this changed again
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
thanks krvkir.
EDIT: from version 0.24 this is now required
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
thanks Chris Vecchio.
An update for anyone who comes across this post and is using Pandas 0.20.1.
It seems the required code is now
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Apparently the excel submodule isn't imported automatically, so simply trying pandas.io.formats.excel.header_style = None alone will raise an AttributeError.
Another option for Pandas 0.25 (probably also 0.24). Likely not the best way to do it, but it worked for me.
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
For pandas 0.24:
The code below doesn't work anymore:
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Instead, create a cell formatting object, and re-write the first row's content (your header) one cell at a time with the new cell formatting object.
Now, you are future proof.
Use the following code (path_output and sheet_name are placeholders for your own values):
# [1] write df to excel as usual
writer = pd.ExcelWriter(path_output, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# [2] do formats for other rows and columns first
# [3] create a format object with your desired formatting for the header, let's name it: headercellformat
headercellformat = workbook.add_format({'bold': True})  # example properties only
# [4] write to the 0th (header) row **one cell at a time**, with columnname and format
for columnnum, columnname in enumerate(list(df.columns)):
    worksheet.write(0, columnnum, columnname, headercellformat)
writer.close()
In pandas 0.20 the solution of the accepted answer changed again.
The format that should be set to None can be found at:
pandas.io.formats.excel.header_style
If you do not want to change the header style for pandas globally, you can alternatively pass header=False to to_excel:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 5),
                  columns=pd.date_range('2019-01-01', periods=5, freq='M'))
file_path='output.xlsx'
writer = pd.ExcelWriter(file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False )
workbook = writer.book
fmt = workbook.add_format({'num_format': 'mm/yyyy', 'bold': True})
worksheet = writer.sheets['Sheet1']
worksheet.set_row(0, None, fmt)
writer.close()
Unfortunately, add_format is not available anymore.
