Pandas dataframe. Add an additional header row merging all columns - python

I want to add a "second" header to my Excel file using a pandas dataframe.
The sheet already has its values and header, but I want to add a new row above the header containing a single cell that spans the width of all the column headers, with its text centered.
Something like this:
How can I do this?

Use MultiIndex.from_product, but text is not centered:
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])
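A small sketch of what this assignment does to the columns, assuming a hypothetical three-column frame. It only inspects the result in memory and does not write anything to Excel:

```python
import pandas as pd

# Hypothetical sample frame; any DataFrame with flat column labels works.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list('ABC'))

# Pair the single top-level label 'Result' with every existing column name.
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])

print(df.columns.nlevels)
print(df.columns.tolist())

# Columns are now addressed by (top, bottom) tuples.
print(df[('Result', 'B')].tolist())
```

Each original column keeps its data; only its label gains the extra top level.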
EDIT: To center the text, write the data one row lower and merge the cells of the first row with XlsxWriter:
import numpy as np
import pandas as pd
# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 6), columns=list('ABCDEF'))
# Create a pandas Excel writer using the XlsxWriter engine
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Write the data starting at the second row, leaving the first row free
df.to_excel(writer, sheet_name='Sheet1', startrow=1, index=False)
# Get the workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a custom style that centers the text
merge_format = workbook.add_format({'align': 'center'})
# Merge the first row across as many columns as the DataFrame has
len_cols = len(df.columns)
worksheet.merge_range(0, 0, 0, len_cols - 1, 'Result', merge_format)
# writer.save() was removed in pandas 2.0; close() writes the file
writer.close()

Related

How to apply a defined function of format to a dataframe

I have multiple data frames (df, df1, df2, ...). I want to apply my own formatting to each of them and then export them to Excel (Excel-file, Excel-file1, Excel-file2, ...).
I am thinking of writing a formatting function and applying it to my data frames, but I do not know how to go about this.
# Create a Pandas Excel writer using XlsxWriter
writer = pd.ExcelWriter(r'N:\Excel-file.xlsx', engine='xlsxwriter')
# Skip one row to insert a defined header, turn off the default header, and remove index
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# The xlsxwriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# The default format of the workbook
workbook.formats[0].set_font_size(12)
workbook.formats[0].set_align('right')
# Header format
header_format = workbook.add_format({
    'bold': True,
    'border': 0})
header_format.set_font_size(14)
header_format.set_align('center')
# Write the column headers with the defined format
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)
# Export the Excel file
writer.close()
The function I defined looks like this:
def format(df, file_name):
    with pd.ExcelWriter(r'file_name.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, df4, df5, df6]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")
The problem is that the format function seems to be applied only to df6, and I get a single Excel file named "file_name". Is there any way to fix this?
pandas.DataFrame.apply() iterates over the data frame on which it is called, by rows or by columns, applies the provided transformation, and returns one result per row (or column). The transformation logic has to take this into account and process its input as an individual row (or column). In your source code, you seem to be applying the logic to the entire data frame (df1) on which apply() is called.
I assume your problem statement is that you have multiple data frames (df, df1, df2, ...) and you want to export them to individual Excel files after applying some common transformation logic.
You can collect them into a list and process them individually in a loop. Since format() does not return a result (i.e. it does not transform the list element), plain iteration with a for loop is the way to go. Also, consider using the with syntax so the file is closed automatically, which avoids leaked or orphaned file handles. Note that the path passed to pd.ExcelWriter must be the file_name argument, not the literal string r'file_name.xlsx'; with the literal, every call overwrites the same file, so only the last data frame survives.
def format(df, file_name):
    # Use the file_name argument itself; it already ends in .xlsx
    with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, ...]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")
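A minimal, standard-library-only sketch of how a literal path such as r'file_name.xlsx' differs from using the function parameter. The helpers export_literal and export_param are hypothetical, and short strings written as text files stand in for the real data frames and the Excel export:

```python
import tempfile
from pathlib import Path


def export_literal(payload, file_name, out_dir):
    # The path is a string literal, so the file_name argument is ignored.
    (out_dir / 'file_name.xlsx').write_text(payload)


def export_param(payload, file_name, out_dir):
    # The argument itself is used as the path.
    (out_dir / file_name).write_text(payload)


payloads = ['df1', 'df2', 'df3']  # stand-ins for real data frames

with tempfile.TemporaryDirectory() as tmp:
    literal_dir = Path(tmp) / 'literal'
    param_dir = Path(tmp) / 'param'
    literal_dir.mkdir()
    param_dir.mkdir()
    for i, payload in enumerate(payloads):
        name = f"Excel-file{i+1}.xlsx"
        export_literal(payload, name, literal_dir)
        export_param(payload, name, param_dir)
    literal_files = sorted(p.name for p in literal_dir.iterdir())
    param_files = sorted(p.name for p in param_dir.iterdir())
    last_literal_content = (literal_dir / 'file_name.xlsx').read_text()

print(literal_files)
print(param_files)
print(last_literal_content)
```

With the literal path, each call overwrites the previous file, so a single file holding the last payload remains; with the parameter, one file per payload is produced.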

Python: xlsxwriter highlight cells by range without condition

I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green, and column c in yellow, but only down to the last row of data.
Using xlsxwriter, I found examples that highlight an entire column with ".add_format", but I don't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['','',''],
      'b': [1,2,2],
      'c': [1,2,2]}
With xlsxwriter, I apply formats in two different ways: with the set_column function, if you don't mind the format extending to the end of the sheet, and with for loops when the format should stop at the last row of data (for example for borders and background colors).
So this is how you can apply format to your dataframe:
import pandas as pd
# Create a test df
data = {'a': ['','',''], 'b': [1,2,2], 'c': [1,2,2]}
df = pd.DataFrame(data)
# Write the DataFrame using the XlsxWriter engine
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})
# Iterate through the rows of each column to apply the format cell by cell
for row in range(df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_orange)
# Alternative syntax
#for row in range(df.shape[0]):
#    worksheet.write(f'A{row+2}', df.iloc[row,0], format_orange)
for row in range(df.shape[0]):
    worksheet.write(row+1, 1, df.iloc[row,1], format_green)
# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)
# Finally write the file (writer.save() was removed in pandas 2.0)
writer.close()
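The per-cell loop can be folded into a small helper so that each column, including column c in yellow as the question asks, is colored only down to the last data row. This is a sketch under assumptions: write_colored_column and the colors mapping are hypothetical names, the output goes to a temporary directory, and the xlsxwriter package must be installed:

```python
import os
import tempfile

import pandas as pd


def write_colored_column(worksheet, values, col_idx, cell_format, first_row=1):
    """Rewrite the values of one column with a format, one cell at a time."""
    for offset, value in enumerate(values):
        worksheet.write(first_row + offset, col_idx, value, cell_format)


df = pd.DataFrame({'a': ['', '', ''], 'b': [1, 2, 2], 'c': [1, 2, 2]})
colors = {'a': 'orange', 'b': 'green', 'c': 'yellow'}  # hypothetical mapping

out_path = os.path.join(tempfile.mkdtemp(), 'highlight.xlsx')
with pd.ExcelWriter(out_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    for col_idx, name in enumerate(df.columns):
        fill = workbook.add_format({'bg_color': colors[name]})
        # tolist() converts NumPy scalars to plain Python values.
        write_colored_column(worksheet, df[name].tolist(), col_idx, fill)
```

Because the header occupies row 0, first_row defaults to 1; rows below the data are never written, so they stay unformatted.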

How to apply Format object to index values in pandas

Question: I have a data frame with column names and their respective values. But when I apply a format object to the column headings, it has no effect.
Code:
import pandas as pd
# Backslashes are escaped so the path is a valid string literal
root = "C:\\Users\\543904\\Desktop\\New folder\\"
dict = {'name': ["aparna", "pankaj"],
        'degree': ["MBA", "BCA"],
        'score': [90, 40]}
df = pd.DataFrame(dict)
writer = pd.ExcelWriter(root + 'output', engine = "xlsxwriter")
df.to_excel(writer, sheet_name='df', index = False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()
This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)

painting a cell in excel with condition using python

I am creating an Excel report that shows the results of automated tests. It should say whether each test passed or failed.
I have created the Excel report from a CSV file using this code:
import os
import string
import pandas as pd
writer = pd.ExcelWriter("file.xlsx", engine="xlsxwriter")
df = pd.read_csv("K:\\results.csv")
df.to_excel(writer, sheet_name=os.path.basename("K:\\results.csv"))
# skip 2 rows
df.to_excel(writer, sheet_name='Sheet1', startrow=2, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'fg_color': '#ffcccc',
    'border': 1})
# create a dictionary mapping column positions to letters (0 -> 'A', ..., 25 -> 'Z')
d = dict(zip(range(26), string.ascii_uppercase))
print(d)
max_len = d[len(df.columns) - 1]
print(max_len)
# e.g. C for a three-column file
# dynamically set merged columns in first row
worksheet.merge_range('A1:' + max_len + '1', 'This Sheet is for Personal Details')
for col_num, value in enumerate(df.columns.values):
    # write to second row
    worksheet.write(1, col_num, value, header_format)
    column_len = df[value].astype(str).str.len().max()
    column_len = max(column_len, len(value)) + 3
    worksheet.set_column(col_num, col_num, column_len)
# writer.save() was removed in pandas 2.0
writer.close()
Now, if a cell contains the word "success", I want to color it green, and if a cell contains "fail", I want to color it red. How can I access a specific cell in the Excel file based on what is written in it?
Thanks.
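The letter lookup in the code above covers only single-letter columns, so it fails for sheets wider than column Z. A minimal sketch of a conversion that also handles two- and three-letter columns; col_index_to_letters and header_merge_range are hypothetical helper names (XlsxWriter ships a comparable xl_col_to_name function in xlsxwriter.utility, and merge_range also accepts numeric row and column arguments):

```python
def col_index_to_letters(index):
    """Convert a zero-based column index to Excel letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # work with a 1-based number for bijective base 26
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def header_merge_range(n_cols, row=1):
    """Build the A1-style range covering n_cols columns in the given 1-based row."""
    return f"A{row}:{col_index_to_letters(n_cols - 1)}{row}"


print(header_merge_range(3))
print(header_merge_range(30))
```

The resulting string can be passed to merge_range in place of the concatenated 'A1:' + max_len + '1' expression.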
You could use a conditional format for this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['success', 'bar', 'fail', 'foo', 'success']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format for fail. Light red fill with dark red text.
fail_format = workbook.add_format({'bg_color': '#FFC7CE',
                                   'font_color': '#9C0006'})
# Add a format for pass. Green fill with dark green text.
pass_format = workbook.add_format({'bg_color': '#C6EFCE',
                                   'font_color': '#006100'})
# Apply conditional formats to the cell range.
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'fail',
                                       'format': fail_format})
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'success',
                                       'format': pass_format})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See the XlsxWriter docs on Working with Conditional Formatting. Note that you can also pass a numerical (row, col) range instead of an A1-style range such as B2:B6; see the documentation for conditional_format().
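When the number of rows is not known in advance, the numeric form of the range can be computed from the data frame instead of hard-coding B2:B6. A minimal sketch; column_data_range is a hypothetical helper, and it assumes a single header row and zero-based row and column numbers:

```python
def column_data_range(n_rows, col, startrow=0, header=True):
    """Return (first_row, first_col, last_row, last_col) for one column's data cells."""
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    # The data starts one row below startrow when a header row is written.
    first_row = startrow + (1 if header else 0)
    last_row = first_row + n_rows - 1
    return (first_row, col, last_row, col)


# Five data rows; with the index in column 0, the 'Data' column is column 1.
cell_range = column_data_range(n_rows=5, col=1)
print(cell_range)
```

The tuple describes the same cells as 'B2:B6' and can be unpacked into the call, for example worksheet.conditional_format(*cell_range, {...}), with len(df) supplying n_rows.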

Writing to a specific range/Column with Pandas

I'm attempting to copy from column Range AP:AR of workbook 1 to Range A:C of workbook 2 through Pandas data frames.
I have successfully read the data frame below from workbook 1, and I now want to write it into the specified range of workbook 2. So AP:AR to AQ:AS.
I have tried:
#df.to_excel(writer, 'AP')
I have also tried the following:
#df = pd.write_excel(filename, skiprows = 2, parse_cols = 'AP:AR')
pd.writer = pd.ExcelWriter('output.xlsx', columns = 'AP:AR')
pd.writer.save()
For example:
import pandas as pd
filename = 'C:/ workbook 1.xlsx'
# usecols replaces the parse_cols argument of older pandas versions
df = pd.read_excel(filename, skiprows=2, usecols='A:C')
writer = pd.ExcelWriter('C:/DRAX/ workbook 2.xlsx')
df.to_excel(writer, 'AQ')
writer.close()
print(df)
It reads correctly, but writes to Cell column ‘B’ instead of AQ.
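The second positional argument of to_excel is the sheet name, so 'AQ' only names the sheet. To choose the column where the dataframe starts, use the startcol parameter, which is an integer starting from 0.
So you should change the line
df.to_excel(writer, 'AQ')
to
df.to_excel(writer, startcol=42) # AQ has the index of 42
By default the index is written as well and takes the first of those columns; pass index=False if the data itself should start in AQ.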
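The zero-based number for a column such as AQ does not have to be counted by hand. A minimal sketch of the conversion; col_letters_to_index is a hypothetical helper name, not part of pandas:

```python
def col_letters_to_index(letters):
    """Convert Excel column letters to a zero-based index ('A' -> 0, 'AQ' -> 42)."""
    index = 0
    for ch in letters.upper():
        if not 'A' <= ch <= 'Z':
            raise ValueError(f"invalid column letters: {letters!r}")
        # Each position is a base-26 digit in which A counts as 1.
        index = index * 26 + (ord(ch) - ord('A') + 1)
    if index == 0:
        raise ValueError("column letters must not be empty")
    return index - 1


print(col_letters_to_index('AQ'))
```

The result can be passed directly, for example startcol=col_letters_to_index('AQ').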