I have data in a pandas dataframe. I need different columns to have different alignment (some centered, some left-aligned). After I write in the column headers, I then set the formatting of each column using:
centered = workbook.add_format()
centered.set_align('center')
right = workbook.add_format()
right.set_align('right')
left = workbook.add_format()
left.set_align('left')
cols = len(df.columns)
worksheet.set_column(1, 1, cell_format = left)
worksheet.set_column(2, 2, cell_format = centered)
worksheet.set_column(3, 3, cell_format = right)
Then I loop through the rows in my df and write each row in using:
for index, row in df.iterrows():
    worksheet.write_row(index, 1, list(row))
This gets me this result:
Now I want to highlight each row of data where the "Right" column has a value of 1.
To do this, I create a new format object and I modify the writing of data to:
highlight_format = workbook.add_format()
highlight_format.set_fg_color('#f0e5ec')
for index, row in df.iterrows():
    if row['Right'] == 1:
        worksheet.write_row(index, 1, list(row), cell_format = highlight_format)
    else:
        worksheet.write_row(index, 1, list(row))
This is the result:
It is as if the new format overwrites the old one. How can I get it so that the old format stays, and just the new formatting (just the highlight) is added?
This is the desired result:
If you superimpose a row and column format at runtime in Excel then it will create a third format that is a combination of the row and column formats and apply it to the cell in the intersection.
XlsxWriter doesn't automagically create and apply a new format like that so if you want cells formatted with the properties of two combined formats you will need to explicitly create and apply that format to the relevant cells. This will require a bit of logic and work but there is no simple workaround.
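A minimal sketch of the "explicitly create and apply" route, assuming the cells are written one at a time so each can receive its own format. The helper names build_formats and write_rows are hypothetical and not part of XlsxWriter; the only library calls used are workbook.add_format() and worksheet.write().

```python
def build_formats(workbook, column_props, extra_props):
    """Return (plain, combined) lists holding one format per column.

    column_props is a list of property dicts, one per column.
    extra_props holds the properties to superimpose, e.g. a fill color.
    """
    plain = [workbook.add_format(dict(props)) for props in column_props]
    combined = [workbook.add_format({**props, **extra_props})
                for props in column_props]
    return plain, combined


def write_rows(worksheet, rows, plain, combined, flag_col,
               first_row=1, first_col=1):
    """Write rows cell by cell, using the combined formats when the
    value in position flag_col of the row equals 1."""
    for row_offset, row in enumerate(rows):
        formats = combined if row[flag_col] == 1 else plain
        for col_offset, value in enumerate(row):
            worksheet.write(first_row + row_offset, first_col + col_offset,
                            value, formats[col_offset])
```

With the question's data this might be called as build_formats(workbook, [{'align': 'left'}, {'align': 'center'}, {'align': 'right'}], {'bg_color': '#f0e5ec'}) followed by write_rows(worksheet, df.values.tolist(), plain, combined, flag_col=2), assuming "Right" is the third column. Building the formats once up front keeps the number of format objects at two per column.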
However, in this particular case you could apply a conditional format to highlight the rows based on the value in Column C (which seems to be what you are trying to do).
Update, here is an example with conditional formatting:
import random
import xlsxwriter
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
centered = workbook.add_format()
centered.set_align('center')
right = workbook.add_format()
right.set_align('right')
left = workbook.add_format()
left.set_align('left')
highlight_format = workbook.add_format({'bg_color': '#FFC7CE',
                                        'font_color': '#9C0006'})
worksheet.set_column(0, 0, cell_format = left)
worksheet.set_column(1, 1, cell_format = centered)
worksheet.set_column(2, 2, cell_format = right)
# Simulate the data.
for row_num in range(9):
    worksheet.write(row_num, 0, 'Data')
    worksheet.write(row_num, 1, 'Data')
    worksheet.write(row_num, 2, random.choice(('Y', 'N')))
# Add a conditional format.
worksheet.conditional_format(0, 0, 8, 2, {'type': 'formula',
                                          'criteria': '=$C1="Y"',
                                          'format': highlight_format})
workbook.close()
Output:
I would like to automatically display a specific range of columns when opening an excel spreadsheet created with xlsxwriter.
This is illustrated in the example below where the first row and column are frozen while the range of columns being displayed start at E and not at B (but note that I don't want B:D to be hidden, I just want to start the range of column displayed at E)
Is this doable?
The XlsxWriter worksheet.freeze_panes() method has 2 optional parameters (top_row, left_col) which set the first visible row and column of the scrolling (non-frozen) area. Like this:
import xlsxwriter
workbook = xlsxwriter.Workbook('panes.xlsx')
worksheet = workbook.add_worksheet()
side_format = workbook.add_format({'bold': True,
                                   'fg_color': '#D7E4BC',
                                   'border': 1})
worksheet.freeze_panes(0, 1, 0, 4)
# Some sample data.
for row_num in range(0, 50):
    worksheet.write(row_num, 0, 'Frozen', side_format)
    for col_num in range(1, 26):
        worksheet.write(row_num, col_num, row_num + 1)
workbook.close()
Output:
I was using set_row to apply a background color to the rows of a table based on an "if" condition (described here). It colored the entire row, whereas the table has only 15 columns, so I came up with a workaround (kudos to SO) using conditional formatting:
(max_row, max_col) = df.shape
format1 = workbook.add_format({"bg_color": "#FFFFFF"})
format2 = workbook.add_format({"bg_color": "#E4DFEC"})
tmp_format = format1
tmp_val = 0
for i in range(0, max_row):
    if df.loc[i]["chain_id"] != tmp_val:
        tmp_format = format2 if tmp_format == format1 else format1
    tmp_val = df.loc[i]["chain_id"]
    worksheet.conditional_format(
        i + 1,
        0,
        i + 1,
        max_col - 1,
        {
            "type": "formula",
            "criteria": '=$A1<>"mustbeabetterway"',
            "format": tmp_format,
        },
    )
Not only is it super inelegant, but it also creates thousands of conditional formats, which makes the Excel workbook laggy.
There must be a better way to color a row between two column indexes.
There are several ways to format the file. I have been using for loops (not ideal for very large dataframes, but it still gets the job done). Basically, I iterate through the rows and columns up to the point I want (usually the last row or last column) and apply the format to every cell using the worksheet's write method (for more info have a look here: https://xlsxwriter.readthedocs.io/worksheet.html#worksheet-write ). You do not need conditional formatting unless you want to highlight different values with specific colors.
import pandas as pd
df = pd.DataFrame({'Column A': [1,2,3,4],
'Column B': ['a','b','c','d'],
'Column C': ['A','B','C','D']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define your formats
format_red = workbook.add_format({'bg_color': '#FFC7CE'})
format_yellow = workbook.add_format({'bg_color': '#FFEB9C', 'italic': True})
format_green = workbook.add_format({'bg_color': '#C6EFCE', 'bold': True})
# Format the entire first column down to the dataframe's last row
for row in range(0, df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_red)
# Format the entire row from 2nd column until the dataframe's last column
for col in range(1, df.shape[1]):
    worksheet.write(2, col, df.iloc[1,col], format_green)
# Format the entire row from 1st column until the dataframe's last column
for col in range(0, df.shape[1]):
    worksheet.write(4, col, df.iloc[3,col], format_yellow)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
Initial output:
Final output:
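The same cell-by-cell idea can also cover the banding asked about in the question. The following is a rough sketch, assuming the rows are already sorted so that equal chain_id values are adjacent and that the data contains no missing values. The helper names band_indices and write_banded_rows are hypothetical; worksheet.write() is the only XlsxWriter call.

```python
def band_indices(values):
    """Return 0 or 1 for each value, flipping whenever the value
    differs from the one before it."""
    bands = []
    current = 0
    previous = None
    for i, value in enumerate(values):
        if i > 0 and value != previous:
            current = 1 - current
        previous = value
        bands.append(current)
    return bands


def write_banded_rows(worksheet, rows, group_values, formats,
                      first_row=1, first_col=0):
    """Rewrite every cell with the format of its band, so the fill
    stops at the last data column instead of spanning the whole row."""
    bands = band_indices(group_values)
    for row_offset, (row, band) in enumerate(zip(rows, bands)):
        for col_offset, value in enumerate(row):
            worksheet.write(first_row + row_offset, first_col + col_offset,
                            value, formats[band])
```

A call might look like write_banded_rows(worksheet, df.values.tolist(), df['chain_id'].tolist(), (format1, format2)). Because the format is attached to each written cell, no conditional formats are created at all.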
I have a script that is using xlsxwriter to produce a workbook with about a dozen columns, all from a certain manipulated df.
I sort the df and add it to a table before exporting to Excel.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
It magically creates a table with alternate coloring (bands). I wish to control the formatting in the following fashion:
The df is sorted by a column called case_id (among other columns)
I wish to use no more than two bg colors
Every time the value in case_id changes, I wish to switch to the different color.
In other words - create bands by the value.
I thought about using conditional formatting but it's not quite what I need. I'm agnostic to the value... In pseudocode, it can be something like this:
create two variables, one for each desired formatting (format1, format2), and temp_format = format1
go over the worksheet, row by row
if the value in case_id differs from the case_id of the previous row, toggle temp_format to the other one.
set row format to temp_format
Implemented the pseudocode.
format1 = workbook.add_format({'bg_color': '#777CF4'})
format2 = workbook.add_format({'bg_color': '#3FCBFF'})
tmp_format = format1
tmp_val = 0
for i in range(0, max_row):
    if df.loc[i]['chain_id'] != tmp_val:
        tmp_format = format2 if tmp_format == format1 else format1
    tmp_val = df.loc[i]['chain_id']
    worksheet.set_row(i+1, None, tmp_format)  # i+1 because the header occupies row 0 of the worksheet
I'm trying to output a Pandas dataframe into an excel file using xlsxwriter. However I'm trying to apply some rule-based formatting; specifically trying to merge cells that have the same value, but having trouble coming up with how to write the loop. (New to Python here!)
See below for output vs output expected:
(As you can see based off the image above I'm trying to merge cells under the Name column when they have the same values).
Here is what I have thus far:
#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)
#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)
for row_index in range(1,len(car_report)):
    if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name']:
        #find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
        #from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)
        pass
Any help is greatly appreciated!
Thank you!
Your logic is almost correct; however, I took a slightly different approach:
1) Sort the column so that all equal values are grouped together.
2) Reset the index (using reset_index(), probably with drop=True).
3) Then capture the rows where a new value starts. For that purpose, create a list and add row 1 to it, because the first value always starts there.
4) Then iterate over that list and handle some special cases:
4a) If a value occupies only one row, the merge_range method will give an error because it cannot merge a single cell. In that case, replace merge_range with the write method.
4b) With this algorithm you'll get an index error when handling the last value of the list (because the code looks up the next index position, and the last value of the list has none). So we need to state explicitly that on an index error (which means we are at the last value) we merge or write down to the last row of the dataframe.
4c) Finally, I did not take into account columns that contain blank or null cells. In that case the code needs to be adjusted.
Lastly, the code might look a bit confusing: keep in mind that pandas rows are 0-indexed with the headers kept separately, while in xlsxwriter the header occupies row 0 and the first data row is row 1.
Here is a working example to achieve exactly what you want to do:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
    if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
        startCells.append(row)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
lastRow = len(df)
for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
Output:
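The start-row bookkeeping in steps 3 and 4 can also be sketched with itertools.groupby from the standard library, which groups consecutive equal values and so avoids the IndexError handling. This is only an illustration: merge_spans and write_merged_column are hypothetical helper names, and the column is assumed to be sorted already with no blank cells.

```python
from itertools import groupby


def merge_spans(values, header_rows=1):
    """Return (first_row, last_row, value) worksheet spans, one per run
    of consecutive equal values, offset by the number of header rows."""
    spans = []
    row = header_rows
    for value, group in groupby(values):
        count = len(list(group))
        spans.append((row, row + count - 1, value))
        row += count
    return spans


def write_merged_column(worksheet, values, col, cell_format, header_rows=1):
    """Merge each multi-row span and write single-row spans normally."""
    for first_row, last_row, value in merge_spans(values, header_rows):
        if first_row == last_row:
            worksheet.write(first_row, col, value, cell_format)
        else:
            worksheet.merge_range(first_row, col, last_row, col,
                                  value, cell_format)
```

For the test data above, a call such as write_merged_column(worksheet, df['Name'].tolist(), 0, merge_format) would merge rows 1 to 2 for Tesla, write Toyota as a single cell in row 3, and merge rows 4 to 6 for Ford.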
Alternate Approach:
One can use the unique() function to get each distinct value (car name in this example) and then look up the index of the rows where it appears. Using the above test data:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
for car in df['Name'].unique():
    # find indices and add one to account for header
    u = df.loc[df['Name'] == car].index.values + 1
    if len(u) < 2:
        pass  # do not merge cells if there is only one car name
    else:
        # merge cells using the first and last indices
        worksheet.merge_range(u[0], 0, u[-1], 0, car, merge_format)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
I think this is a better answer to your problem:
import pandas as pd
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Group the row positions by 'Name' and take the first and last position of each group.
# This assumes the default 0..n-1 index and that equal names sit in consecutive rows.
bounds = df.index.to_series().groupby(df['Name'], sort=False).agg(['min', 'max'])
# Create an empty list to store the merge ranges
merge_ranges = []
# Add 1 to each row to skip the header, and ignore names that occupy
# a single row, because a single cell cannot be merged
for name, row in bounds.iterrows():
    if row['min'] != row['max']:
        merge_ranges.append((int(row['min']) + 1, 0, int(row['max']) + 1, 0, name))
# Write the dataframe to an excel file and apply the merge ranges
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'valign': 'vcenter'})
for merge_range in merge_ranges:
    worksheet.merge_range(*merge_range, merge_format)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
Alternate approach: instead of merging cells with xlsxwriter, you can also use a pivot table and write that to Excel.
dataframe = pd.pivot_table(df, index=[column name...])
dataframe.to_excel('test.xlsx')
Should "just work" with set_index() and to_excel()
my_index_cols = ['Name'] # this can also be a list of multiple columns
df.set_index(my_index_cols).to_excel('filename.xlsx', index=True, header=None)
see also: https://stackoverflow.com/a/68208815/2098573
I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green and column c in yellow, but only down to the last row of data.
Using xlsxwriter, I found examples for highlighting the entire column with ".add_format", but I don't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['','',''],
      'b': [1,2,2],
      'c': [1,2,2]}
With xlsxwriter I apply formats in two different ways: with the set_column function (if you don't mind the format extending to the end of the sheet), and with for loops when I do not want the format to extend that far (for example for borders and background colors).
So this is how you can apply formats to your dataframe:
import pandas as pd
# Create a test df
data = {'a': ['','',''], 'b': [1,2,2], 'c': [1,2,2]}
df = pd.DataFrame(data)
# Import the file through xlsxwriter
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})
# Start iterating through the columns and the rows to apply the format
for row in range(df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_orange)
# Alternative syntax
#for row in range(df.shape[0]):
#    worksheet.write(f'A{row+2}', df.iloc[row,0], format_orange)
for row in range(df.shape[0]):
    worksheet.write(row+1, 1, df.iloc[row,1], format_green)
# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)
# Finally write the file (writer.save() was removed in pandas 2.0)
writer.close()
Output: