I have a script that is using xlsxwriter to produce a workbook with about a dozen columns, all from a certain manipulated df.
I sort the df and add it to a table before exporting to Excel.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
It magically creates a table with alternating colors (bands). I wish to control the formatting in the following fashion:
The df is sorted by a column called case_id (among other columns)
I wish to use no more than two bg colors
Every time the value in case_id changes, I wish to switch to the different color.
In other words - create bands by the value.
I thought about using conditional formatting, but it's not quite what I need: I'm agnostic to the actual value. In pseudocode, it can be something like this:
create two variables, one for each desired formatting (format1, format2), and temp_format = format1
go over the worksheet, row by row
if the value in case_id differs from the case_id of the previous row, toggle temp_format to the other one.
set row format to temp_format
Implemented the pseudocode.
format1 = workbook.add_format({'bg_color': '#777CF4'})
format2 = workbook.add_format({'bg_color': '#3FCBFF'})
tmp_format = format1
tmp_val = 0
for i in range(0, max_row):
    if df.loc[i]['chain_id'] != tmp_val:
        tmp_format = format2 if tmp_format == format1 else format1
        tmp_val = df.loc[i]['chain_id']
    worksheet.set_row(i+1, None, tmp_format)  # i+1 because the header occupies worksheet row 0
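The toggling logic above can also be computed up front, independently of the worksheet. A minimal sketch, assuming the key column is already sorted so that equal values are adjacent; the helper name band_flags and the sample values are hypothetical and not part of xlsxwriter or pandas:

```python
from itertools import groupby


def band_flags(values):
    """Return 0 or 1 for each row, flipping every time the value changes."""
    flags = []
    for group_number, (_, run) in enumerate(groupby(values)):
        # Every row in a run of equal values gets the same band.
        flags.extend([group_number % 2] * len(list(run)))
    return flags


case_ids = [101, 101, 102, 103, 103, 103]  # made-up sample data
flags = band_flags(case_ids)
print(flags)
```

Each flag can then select one of the two formats for worksheet row i + 1, for example (format1, format2)[flags[i]].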
I have a xlsx file, for all of its sheet I need to change the formatting of the header row to apply background color to it.
But when I add formatting to the row, it is applied not only to the columns that contain my data but also to all the empty columns.
Here is what I have tried :
for sheet_name in xlwriter.sheets:
    ws = xlwriter.sheets[sheet_name]
    ws.freeze_panes(1, 0)  # Freeze the first row.
    cell_format = workbook.add_format({'bg_color': 'yellow'})
    cell_format.set_bold()
    cell_format.set_font_color('red')
    cell_format.set_border(1)
    ws.set_row(0, cell_format=cell_format)
P.S.: I have tried the solutions from the other questions that were suggested for this one, but none of them works for me.
I think it is better to loop through the column names and write each cell separately, instead of set_row:
for col, val in enumerate(df.columns):
    ws.write(0, col, val, cell_format)
There is another option using styler, but I think there's a bug with the border.
I was using set_row to apply bg color formatting to a table given an "if" condition (described here). It colored the entire row, although the table has only 15 columns, so I came up with a workaround (kudos to SO) using conditional formatting:
(max_row, max_col) = df.shape
format1 = workbook.add_format({"bg_color": "#FFFFFF"})
format2 = workbook.add_format({"bg_color": "#E4DFEC"})
tmp_format = format1
tmp_val = 0
for i in range(0, max_row):
    if df.loc[i]["chain_id"] != tmp_val:
        tmp_format = format2 if tmp_format == format1 else format1
        tmp_val = df.loc[i]["chain_id"]
    worksheet.conditional_format(
        i + 1,
        0,
        i + 1,
        max_col - 1,
        {
            "type": "formula",
            "criteria": '=$A1<>"mustbeabetterway"',
            "format": tmp_format,
        },
    )
Not only is it super inelegant, but it also creates thousands of conditional formats that make the Excel workbook laggy.
There must be a better way to color a row between column indexes.
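One possible way to avoid a rule per row is to write the band flag into a spare helper column and add a single formula rule for the whole range. This is only a sketch under assumptions: flags is a precomputed list of 0/1 band flags (one per data row), the helper column is the first column to the right of the data, and add_band_rule and col_letter are hypothetical names. Only write, set_column, and conditional_format are XlsxWriter worksheet methods.

```python
def col_letter(col):
    """Convert a zero-based column index to an Excel column name (0 -> 'A')."""
    name = ''
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        name = chr(ord('A') + rem) + name
    return name


def add_band_rule(worksheet, flags, n_cols, band_format):
    """Write the flags next to the data and shade flagged rows with ONE rule."""
    if not flags:
        return None
    helper_col = n_cols  # assumption: first free column right of the data
    for i, flag in enumerate(flags):
        worksheet.write(i + 1, helper_col, flag)  # +1 skips the header row
    # Hide the helper column so it does not clutter the sheet.
    worksheet.set_column(helper_col, helper_col, None, None, {'hidden': True})
    # The formula is written relative to the top-left cell of the range (row 2).
    criteria = f'=${col_letter(helper_col)}2=1'
    worksheet.conditional_format(
        1, 0, len(flags), n_cols - 1,
        {'type': 'formula', 'criteria': criteria, 'format': band_format},
    )
    return criteria
```

Rows whose flag is 0 keep the default background, so only one format and one rule are needed for two-color banding.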
There are several ways to format the file. I have been using for loops (not ideal for very large dataframes, but they still get the job done). Basically, I iterate through the rows and columns up to the point I want (usually the last row or last column) and apply the format to every cell using the worksheet's write method (for more info, see https://xlsxwriter.readthedocs.io/worksheet.html#worksheet-write ). You do not need conditional formatting unless you want to highlight different values with specific colors.
import pandas as pd
df = pd.DataFrame({'Column A': [1,2,3,4],
'Column B': ['a','b','c','d'],
'Column C': ['A','B','C','D']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define your formats
format_red = workbook.add_format({'bg_color': '#FFC7CE'})
format_yellow = workbook.add_format({'bg_color': '#FFEB9C', 'italic': True})
format_green = workbook.add_format({'bg_color': '#C6EFCE', 'bold': True})
# Format the entire first column down to the dataframe's last row
for row in range(0, df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_red)
# Format the 2nd data row from the 2nd column to the dataframe's last column
for col in range(1, df.shape[1]):
    worksheet.write(2, col, df.iloc[1,col], format_green)
# Format the 4th data row from the 1st column to the dataframe's last column
for col in range(0, df.shape[1]):
    worksheet.write(4, col, df.iloc[3,col], format_yellow)
writer.close()  # writer.save() was removed in pandas 2.0
Initial output:
Final output:
I have data in a pandas dataframe. I need different columns to have different alignment (some centered, some left-aligned). After I write in the column headers, I then set the formatting of each column using:
centered = workbook.add_format()
centered.set_align('center')
right = workbook.add_format()
right.set_align('right')
left = workbook.add_format()
left.set_align('left')
cols = len(df.columns)
worksheet.set_column(1, 1, cell_format = left)
worksheet.set_column(2, 2, cell_format = centered)
worksheet.set_column(3, 3, cell_format = right)
Then I loop through the rows in my df and write each row in using:
for index, row in df.iterrows():
    worksheet.write_row(index, 1, list(row))
This gets me this result:
Now I want to highlight each row of data where the "Right" column has a value of 1.
To do this, I create a new format object and I modify the writing of data to:
highlight_format = workbook.add_format()
highlight_format.set_fg_color('#f0e5ec')
for index, row in df.iterrows():
    if row['Right'] == 1:
        worksheet.write_row(index, 1, list(row), cell_format = highlight_format)
    else:
        worksheet.write_row(index, 1, list(row))
This is the result:
It is as if the new format overwrites the old one. How can I get it so that the old format stays, and just the new formatting (just the highlight) is added?
This is the desired result:
If you superimpose a row and column format at runtime in Excel then it will create a third format that is a combination of the row and column formats and apply it to the cell in the intersection.
XlsxWriter doesn't automagically create and apply a new format like that so if you want cells formatted with the properties of two combined formats you will need to explicitly create and apply that format to the relevant cells. This will require a bit of logic and work but there is no simple workaround.
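One way to do that by hand is to keep the format properties as plain dictionaries and merge them before calling add_format(). A minimal sketch; the helper name combine_props is hypothetical, while 'align' and 'bg_color' are standard XlsxWriter format properties:

```python
def combine_props(*prop_dicts):
    """Merge format property dicts; later dicts win on conflicting keys."""
    combined = {}
    for props in prop_dicts:
        combined.update(props)
    return combined


centered = {'align': 'center'}
highlight = {'bg_color': '#f0e5ec'}

centered_highlight = combine_props(centered, highlight)
print(centered_highlight)

# With a real workbook this dict would be turned into a format object:
#   fmt = workbook.add_format(centered_highlight)
```

One combined format is needed per column alignment, and the highlighted rows then have to be written cell by cell with worksheet.write(row, col, value, fmt) so that each column gets its matching combined format.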
However, in this particular case you could apply a conditional format to highlight the rows based on the value in Column C (which seems to be what you are trying to do).
Update, here is an example with conditional formatting:
import random
import xlsxwriter
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
centered = workbook.add_format()
centered.set_align('center')
right = workbook.add_format()
right.set_align('right')
left = workbook.add_format()
left.set_align('left')
highlight_format = workbook.add_format({'bg_color': '#FFC7CE',
'font_color': '#9C0006'})
worksheet.set_column(0, 0, cell_format = left)
worksheet.set_column(1, 1, cell_format = centered)
worksheet.set_column(2, 2, cell_format = right)
# Simulate the data.
for row_num in range(9):
    worksheet.write(row_num, 0, 'Data')
    worksheet.write(row_num, 1, 'Data')
    worksheet.write(row_num, 2, random.choice(('Y', 'N')))
# Add a conditional format.
worksheet.conditional_format(0, 0, 8, 2, {'type': 'formula',
'criteria': '=$C1="Y"',
'format': highlight_format})
workbook.close()
Output:
I'm trying to output a Pandas dataframe into an excel file using xlsxwriter. However I'm trying to apply some rule-based formatting; specifically trying to merge cells that have the same value, but having trouble coming up with how to write the loop. (New to Python here!)
See below for output vs output expected:
(As you can see based off the image above I'm trying to merge cells under the Name column when they have the same values).
Here is what I have thus far:
#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)
#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)
for row_index in range(1,len(car_report)):
    if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name']:
        #find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
        #from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)
        pass
Any help is greatly appreciated!
Thank you!
Your logic is almost correct; however, I approached your problem slightly differently:
1) Sort the column so that all equal values are grouped together.
2) Reset the index (using reset_index(), possibly with drop=True).
3) Capture the rows where a new value starts. For that purpose, create a list and add row 1 to it, because the first group always starts there.
4) Then iterate over that list and check some conditions:
4a) If a value occupies only one row, merge_range cannot be used because a single cell cannot be merged. In that case, replace merge_range with the write method.
4b) With this algorithm you'll get an IndexError when handling the last value of the list (it is compared with the value at the next index position, and the last value has no next position). So we need to state explicitly that on an IndexError (which means we are at the last value) we merge or write down to the last row of the dataframe.
4c) Finally, I did not consider the case where the column contains blank or null cells. In that case the code needs to be adjusted.
Lastly, the code might look a bit confusing: keep in mind that pandas indexes the first data row as 0 (the headers are separate), while for xlsxwriter the header row is row 0 and the first data row is row 1.
Here is a working example to achieve exactly what you want to do:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
    if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
        startCells.append(row)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
lastRow = len(df)
for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)
writer.close()  # writer.save() was removed in pandas 2.0
Output:
Alternate Approach:
One can use the unique() function to find the index assigned to each unique value (car name in this example). Using the above test data,
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
for car in df['Name'].unique():
    # find indices and add one to account for header
    u=df.loc[df['Name']==car].index.values + 1
    if len(u) <2:
        pass # do not merge cells if there is only one car name
    else:
        # merge cells using the first and last indices; u is already shifted
        # by one, so use the name itself rather than looking it up via u[0]
        worksheet.merge_range(u[0], 0, u[-1], 0, car, merge_format)
writer.close()  # writer.save() was removed in pandas 2.0
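Both approaches rely on equal names sitting in adjacent rows. A sketch of computing the merge spans from consecutive runs only, so that a name reappearing further down gets its own span; merge_spans is a hypothetical helper, and the default offset assumes a single header row:

```python
from itertools import groupby


def merge_spans(values, header_rows=1):
    """Return (first_row, last_row, value) for each run of equal values."""
    spans = []
    row = header_rows  # worksheet row of the first data row
    for value, run in groupby(values):
        length = len(list(run))
        spans.append((row, row + length - 1, value))
        row += length
    return spans


names = ['Tesla', 'Tesla', 'Toyota', 'Ford', 'Ford', 'Ford']
spans = merge_spans(names)
print(spans)
```

A span whose first and last row are equal would be written with worksheet.write(); longer spans would go to worksheet.merge_range(first, 0, last, 0, value, merge_format).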
I think this is a better answer to your problem:
import pandas as pd
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Use the groupby() function to group the rows by 'Name' (sort=False keeps the original order)
grouped = df.groupby('Name', sort=False)
# Use the first() function to find the first row of each group
first_rows = grouped.first()
# Create a new column 'start_row' that contains the index of the first row of each group
first_rows['start_row'] = first_rows.index.map(lambda x: (df['Name'] == x).idxmax())
# Create a new column 'end_row' that contains the index of the last row of each group
# (reversing the mask makes idxmax() return the last match instead of the first)
first_rows['end_row'] = first_rows.index.map(lambda x: (df['Name'] == x)[::-1].idxmax())
# Create an empty list to store the merge ranges
merge_ranges = []
# Iterate over the first_rows dataframe and add the merge ranges to the list,
# shifting by one row for the header and skipping groups with a single row
for name, row in first_rows.iterrows():
    if row['start_row'] != row['end_row']:
        merge_ranges.append((int(row['start_row']) + 1, 0, int(row['end_row']) + 1, 0, name))
# Write the dataframe to an excel file and apply the merge ranges
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter'})
for merge_range in merge_ranges:
    worksheet.merge_range(*merge_range, merge_format)
writer.close()  # writer.save() was removed in pandas 2.0
Alternate Approach : Other than xlsxwriter you can also use a pivot table.
dataframe=pd.pivot_table(df,index=[column name...])
dataframe.to_excel('test.xlsx')  # to_excel() takes the output path; 'test.xlsx' is a placeholder
Should "just work" with set_index() and to_excel()
my_index_cols = ['Name'] # this can also be a list of multiple columns
df.set_index(my_index_cols).to_excel('filename.xlsx', index=True, header=None)
see also: https://stackoverflow.com/a/68208815/2098573
I'm using python with the pandas and xlsxwriter modules to format an excel file that is generated dynamically. I need certain rows to be colored yellow, depending on the content of one column. Here is my code ("data" is a pandas DataFrame where each row represents one person). Look for the # comment that points you to the relevant code I'm talking about:
def format_excel(data):
    writer = pd.ExcelWriter('InviteList.xlsx', engine='xlsxwriter')
    data.style.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    # Styles
    yellow = workbook.add_format({'bg_color': '#fdf2d0', 'border': 1, 'border_color': '#C0C0C0'})
    # Hacky bypass of default header style that pandas imposes
    for idx, val in enumerate(data.columns):
        worksheet.write(0, idx, val)
    # THIS IS THE RELEVANT CODE
    for row, employee in data.iterrows():
        if data.loc[row, 'rsvp'] == 'maybe':
            worksheet.conditional_format(row + 1, 0, row + 1, 15, {'type': 'no_errors', 'format': yellow})
    writer.close()  # writer.save() was removed in pandas 2.0
So basically the for loop checks to see if the row contains a 'maybe' in the 'rsvp' column and if so, uses the yellow formatting object on that row. This works fine, HOWEVER...
Let's say row 4 and 7 get colored yellow in my excel sheet. Now if I select a column and sort that column alphabetically or something like that, the yellow formatting STAYS in rows 4 and 7 instead of MOVING along with the content it needs to stay with.
So it looks like my implementation permanently locks rows 4 and 7 with yellow formatting, when what I need is for it to be more dynamic: it should stick with the rows containing a "maybe" in the 'rsvp' column, no matter where I move them.
EDIT:
Ok I fixed my issue by using worksheet.set_row instead of worksheet.conditional_format. But then I had an issue where the color would stick with the correct row but other formatting like font size and text wrap wouldn't, so I also had to include those in the definition of my yellow format object.
I think you don't need the loop at the end.
Let's say your "maybe" criterion is in column A and the data you want colored yellow is in column B.
You only need to write the conditional format once, without a loop, like this:
worksheet.conditional_format('B1:B5', {'type': 'formula',
'criteria': '=$A1="maybe"',
'format': yellow})
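Generalizing the idea, a single formula rule can cover every column of every data row, keyed on one column. A sketch, assuming one header row and data starting in column A; add_row_highlight and its parameters are hypothetical, while conditional_format(first_row, first_col, last_row, last_col, options) is the XlsxWriter worksheet method:

```python
def add_row_highlight(worksheet, n_rows, n_cols, key_letter, value, fmt):
    """Highlight every data row whose key column equals `value` with ONE rule.

    Assumes `value` contains no double quotes, since it is embedded in the
    formula string as-is.
    """
    first_row, last_row = 1, n_rows  # worksheet row 0 holds the header
    # The formula is relative to the top-left cell of the range; the "$"
    # pins the key column while the row number is left free to vary.
    criteria = f'=${key_letter}{first_row + 1}="{value}"'
    worksheet.conditional_format(
        first_row, 0, last_row, n_cols - 1,
        {'type': 'formula', 'criteria': criteria, 'format': fmt},
    )
    return criteria
```

Called as add_row_highlight(worksheet, len(data), len(data.columns), 'A', 'maybe', yellow), this would replace the per-row loop. Because the rule is evaluated against each row's own cell, the highlight should follow the data when rows are sorted in Excel.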