Xlsxwriter change format colour based on if statement - python

I have a python script that creates a report.
I would like to apply colour a to row a and then colour b to row b, ...
I could create a variable formatA = workbook.add_format({'bg_color': 'red'}) and formatB = workbook.add_format({'bg_color': 'blue'}) and so on.
However, I think it would be easier to create a single format = workbook.add_format({'bg_color': 'red'}) and then, while looping through the rows I want to colour, call format.set_bg_color('blue') before applying it.
But this does not change the colour row by row: blue ends up applied to every row.

The Modifying Formats section of the XlsxWriter documentation on the Format class explains this problem very well. Essentially, it is not possible to use a Format and then redefine it for later use, because a Format is applied to cells in its final state.
As an alternative, consider writing a function that creates a new format each time it is called. You can then call this function whenever you need a different cell colour, or any other format property you add to that function.
I've provided an example below of how you might do this.
import pandas as pd

df = pd.DataFrame({'col_a': ['a', 'b', 'c'],
                   'col_b': [1, 2, 3]})

def format_function(workbook, bg_color='red'):
    # Create and return a new Format object for each colour.
    return workbook.add_format({'bg_color': bg_color})

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Set the row format without changing the row height (None).
worksheet.set_row(1, None, format_function(workbook, 'red'))
worksheet.set_row(2, None, format_function(workbook, 'blue'))
worksheet.set_row(3, None, format_function(workbook, 'pink'))

writer.close()  # writer.save() was removed in pandas 2.0
Expected output: rows 2, 3 and 4 of the sheet filled red, blue and pink respectively.
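A variation on the idea above, as a minimal sketch: since every call to add_format() creates a new Format object, a small cache can hand back one Format per colour when the same colours recur across many rows. The names make_format_getter, get_format and colour_rows are illustrative and not part of XlsxWriter; only workbook.add_format() and worksheet.set_row() are library calls.

```python
def make_format_getter(workbook):
    """Return a function that creates each background-colour Format only once."""
    cache = {}

    def get_format(bg_color):
        # add_format() is called the first time a colour is requested;
        # later requests for the same colour reuse the stored Format.
        if bg_color not in cache:
            cache[bg_color] = workbook.add_format({'bg_color': bg_color})
        return cache[bg_color]

    return get_format


def colour_rows(workbook, worksheet, row_colours):
    """Apply a background colour to each (zero-indexed) row in row_colours."""
    get_format = make_format_getter(workbook)
    for row, colour in row_colours.items():
        # None keeps the default row height, as in the example above.
        worksheet.set_row(row, None, get_format(colour))
```

With the workbook and worksheet from the example above, colour_rows(workbook, worksheet, {1: 'red', 2: 'blue', 3: 'pink', 4: 'red'}) would create three Format objects for the four rows.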

Related

xlsxwriter - Modifying formatting whenever value changes

I have a script that is using xlsxwriter to produce a workbook with about a dozen columns, all from a certain manipulated df.
I sort the df and add it to a table before exporting to Excel.
worksheet.add_table(0, 0, max_row, max_col - 1, {'columns': column_settings})
It magically creates a table with alternating colours (bands). I wish to control the formatting in the following fashion:
The df is sorted by a column called case_id (among other columns)
I wish to use no more than two bg colors
Every time the value in case_id changes, I wish to switch to the different color.
In other words - create bands by the value.
I thought about using conditional formatting, but it's not quite what I need: I'm agnostic to the value itself. In pseudocode, it could look something like this:
create two variables, one for each desired format (format1, format2), and set temp_format = format1
go over the worksheet, row by row
if the value in case_id differs from the case_id of the previous row, toggle temp_format to the other one
set row format to temp_format
Implemented the pseudocode.
format1 = workbook.add_format({'bg_color': '#777CF4'})
format2 = workbook.add_format({'bg_color': '#3FCBFF'})
tmp_format = format1
tmp_val = 0
for i in range(0, max_row):
    # Switch to the other colour whenever the value in the key column changes.
    if df.loc[i, 'chain_id'] != tmp_val:
        tmp_format = format2 if tmp_format == format1 else format1
        tmp_val = df.loc[i, 'chain_id']
    # Worksheet rows are offset by 1 compared to the df because of the header row.
    worksheet.set_row(i + 1, None, tmp_format)
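The toggling logic can also be separated from the worksheet calls, which makes it easier to check on its own. The sketch below is one possible arrangement, not the asker's code: band_indices and apply_bands are hypothetical helper names, and the first data row is assumed to start in band 0.

```python
def band_indices(values):
    """Return 0 or 1 for each value, flipping whenever the value changes."""
    bands = []
    current = 0
    previous = None
    for position, value in enumerate(values):
        # The first row never flips; later rows flip when the value differs
        # from the one in the row before.
        if position > 0 and value != previous:
            current = 1 - current
        bands.append(current)
        previous = value
    return bands


def apply_bands(worksheet, values, formats, first_row=1):
    """Set a row format per value; formats is a pair such as (format1, format2)."""
    for offset, band in enumerate(band_indices(values)):
        # first_row=1 skips the header row written above the data.
        worksheet.set_row(first_row + offset, None, formats[band])
```

For example, band_indices([7, 7, 9, 9, 9, 4]) gives [0, 0, 1, 1, 1, 0], and apply_bands(worksheet, df['chain_id'], (format1, format2)) could stand in for the loop above.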

Python: xlsxwriter highlight cells by range without condition

I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green and column c in yellow, but only down to the last row of data.
Using xlsxwriter, I found examples that highlight an entire column with ".add_format", but I don't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['', '', ''],
      'b': [1, 2, 2],
      'c': [1, 2, 2]}
With xlsxwriter I apply formats in two different ways: with the set_column function (if you don't mind the format extending to the end of the sheet), or with for loops when the format should stop at the last row of data (for example for borders and background colours).
This is how you can apply the formats to your dataframe:
import pandas as pd

# Create a test df
data = {'a': ['', '', ''], 'b': [1, 2, 2], 'c': [1, 2, 2]}
df = pd.DataFrame(data)

# Write the df to an Excel file using the xlsxwriter engine
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})

# Iterate through the rows and rewrite each cell of the column with its format
for row in range(df.shape[0]):
    worksheet.write(row + 1, 0, df.iloc[row, 0], format_orange)

# Alternative syntax
# for row in range(df.shape[0]):
#     worksheet.write(f'A{row + 2}', df.iloc[row, 0], format_orange)

for row in range(df.shape[0]):
    worksheet.write(row + 1, 1, df.iloc[row, 1], format_green)

# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)

# Finally write the file
writer.close()  # writer.save() was removed in pandas 2.0
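Output: the three data cells in column A are filled orange and those in column B green, while column C is bold and centred.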
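The per-column loops can be folded into one helper when several columns each need their own fill. This is a sketch under assumptions: write_formatted_columns is a hypothetical name, columns is a plain dict of column name to list of values (for a DataFrame, something like df.to_dict('list')), and the columns are assumed to sit in dict order starting at column A.

```python
def write_formatted_columns(worksheet, columns, formats, first_row=1):
    """Rewrite each cell with its column's format, stopping at the last data row.

    columns: dict mapping column name -> list of cell values
    formats: dict mapping column name -> Format (columns not listed are skipped)
    """
    for col_index, (name, values) in enumerate(columns.items()):
        cell_format = formats.get(name)
        if cell_format is None:
            continue  # leave columns without a format untouched
        for offset, value in enumerate(values):
            # worksheet.write(row, col, value, format); first_row=1 skips the header
            worksheet.write(first_row + offset, col_index, value, cell_format)
```

With the objects from the answer above, the call might look like write_formatted_columns(worksheet, df.to_dict('list'), {'a': format_orange, 'b': format_green}).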

Why does 'vertical-align: middle' work for OpenPyXL, but not for XlsxWriter engine in to_excel() when applied as style?

I couldn't find a matching issue on the pandas GitHub issue tracker regarding my problem, and before opening one, I just wanted to make sure, that I don't miss any (obvious) mistake on my side. Most of the Q&As here and elsewhere focus on headers, and the specific problems within that scope.
So, basically, I want to use pandas styling and specifically export it to Excel. According to the linked documentation, vertical-align is supported, and according to this test for the to_excel() function, the correct usage for centred vertical alignment is 'vertical-align: middle'.
If I use the OpenPyXL engine in to_excel(), that actually works pretty well, see the following (not so minimal) example:
import pandas as pd

def align(data):
    return pd.DataFrame('text-align: center', index=data.index, columns=data.columns)

def valign(data):
    return pd.DataFrame('vertical-align: middle', index=data.index, columns=data.columns)

def whitespace(data):
    return pd.DataFrame('white-space: normal', index=data.index, columns=data.columns)

d = {'col1': ['first\nsecond', 'only one'], 'col2': ['only one', 'first\nsecond']}
df = pd.DataFrame(data=d)
df = df.style.apply(align, axis=None).apply(valign, axis=None).apply(whitespace, axis=None)

with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='Test sheet')
I apply three different styles:
centred horizontal alignment; just as a general test, that the styling works
centred vertical alignment; as described above
'white-space: normal'; as this is necessary to have properly "expanded" cells in Excel along with the centred vertical alignment
The output for the OpenPyXL engine, as stated, is fine.
Now, let's switch the engine to XlsxWriter:
with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Test sheet')
With this engine, the vertical alignment is set to bottom. Nevertheless, the horizontal alignment as well as the whitespace/wrapping are set correctly. Also, 'vertical-align: top' works as expected. It seems that specifically 'vertical-align: middle' doesn't work properly with the XlsxWriter engine.
Is there anything I can further test – or is this just a plain bug in pandas?
Some further remarks:
The OpenPyXL engine works, of course. But, I would like to use the XlsxWriter engine, since the post-processing – from my point of view – becomes easier.
I could manually generate the whole Excel document with XlsxWriter, that's true. But the usage/maintenance of the final code must be as easy as possible, since co-workers using this code aren't expected to be Python/pandas experts.
If that's an actual bug, it should be fixed nonetheless. :-)
EDIT: I should've mentioned that in the first place; so here's an answer to Hans' comment: I can explicitly generate the correct output solely using the XlsxWriter engine, like this:
import pandas as pd

d = {'col1': ['first\nsecond', 'only one'], 'col2': ['only one', 'first\nsecond']}
df = pd.DataFrame(data=d)

with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Test sheet')
    workbook = writer.book
    worksheet = writer.sheets['Test sheet']
    fmt = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'text_wrap': True})
    worksheet.set_column('B:C', None, fmt)
Notice: There's no df.style.apply!
The problem here is that I specifically need another, non-trivial pandas style which changes the background colour with respect to some constraints. This can't be done with (XlsxWriter's) conditional formatting, or at least not as easily as with the pandas style. Therefore, all cells already have a style, which can't be overridden later, cf. the following code:
import pandas as pd

def whitespace(data):
    return pd.DataFrame('white-space: normal', index=data.index, columns=data.columns)

d = {'col1': ['first\nsecond', 'only one'], 'col2': ['only one', 'first\nsecond']}
df = pd.DataFrame(data=d)
df = df.style.apply(whitespace, axis=None)

with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Test sheet')
    workbook = writer.book
    worksheet = writer.sheets['Test sheet']
    fmt = workbook.add_format({'align': 'center', 'valign': 'vcenter'})
    worksheet.set_column('B:C', None, fmt)
As you can see, the wrapping from the pandas style is applied, whereas the horizontal and vertical alignments from the XlsxWriter formatting are ignored.

Xlsxwriter: format three cell ranges in same worksheet

I would like to format A1:E14 as US Dollars, F1:K14 as percentages and A15:Z1000 as US Dollars. Is there a way to do this in XlsxWriter?
I know how to format full columns as Dollars/Percentages, but I don't know how to format parts of columns -- whatever I do last will overwrite Columns F:K.
Data is starting in pandas so happy to solve the problem there. The following does not seem to work:
sheet.set_column('A1:E14', None, money_format)
More Code:
with pd.ExcelWriter(write_path) as writer:
    book = writer.book
    money_fmt = book.add_format({'num_format': '$#,##0'})
    pct_fmt = book.add_format({'num_format': '0.00%'})
    # call func that creates a worksheet named total with no format
    df.to_excel(writer, sheet_name='Total', startrow=0)
    other_df.to_excel(writer, sheet_name='Total', startrow=15)
    writer.sheets['Total'].set_column('A1:E14', 20, money_fmt)
    writer.sheets['Total'].set_column('F1:K14', 20, pct_fmt)
    writer.sheets['Total'].set_column('F15:Z1000', 20, money_fmt)
I cannot see a way to achieve per cell formatting using just xlsxwriter with Pandas, but it would be possible to apply the formatting in a separate step using openpyxl as follows:
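import openpyxl
import pandas as pd

def write_format(ws, cell_range, number_format):
    # Set the number format of every cell in the range, keeping the values.
    for row in ws[cell_range]:
        for cell in row:
            cell.number_format = number_format

sheet_name = "Total"

with pd.ExcelWriter(write_path) as writer:
    # write_worksheet() stands for the asker's own function that writes the unformatted sheet.
    write_worksheet(df, writer, sheet_name=sheet_name)

# Reopen the saved file with openpyxl and apply the formats in a second pass.
wb = openpyxl.load_workbook(write_path)
ws = wb[sheet_name]  # get_sheet_by_name() is deprecated
money_fmt = '$#,##0_-'
pct_fmt = '0.00%'
write_format(ws, 'A1:G1', money_fmt)
write_format(ws, 'A1:E14', money_fmt)
write_format(ws, 'F1:K14', pct_fmt)
write_format(ws, 'F15:Z1000', money_fmt)
wb.save(write_path)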
When attempted with xlsxwriter, it always overwrites the existing data from Pandas. But if Pandas is then made to re-write the data, it then overwrites any applied formatting. There does not appear to be any method to apply formatting to an existing cell without overwriting the contents. For example, the write_blank() function states:
"This method is used to add formatting to a cell which doesn't contain a string or number value."
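One route that stays within XlsxWriter, offered as a sketch rather than as part of the answer above: a conditional format attaches a Format to a cell range without rewriting the cell contents, and the 'no_blanks' type matches every non-empty cell in that range. format_ranges is a hypothetical helper name; worksheet.conditional_format() and the 'no_blanks' type are XlsxWriter features. Empty cells in the range stay unformatted under this approach.

```python
def format_ranges(worksheet, range_formats):
    """Attach a Format to each A1-style range via a 'no_blanks' conditional format.

    range_formats: list of (cell_range, cell_format) pairs,
                   for example ('A1:E14', money_fmt)
    """
    for cell_range, cell_format in range_formats:
        # The rule applies to every non-blank cell in the range and leaves
        # the values already written by pandas untouched.
        worksheet.conditional_format(
            cell_range, {'type': 'no_blanks', 'format': cell_format}
        )
```

Inside the question's with block, and assuming the writer was created with engine='xlsxwriter', the three set_column calls might become format_ranges(writer.sheets['Total'], [('A1:E14', money_fmt), ('F1:K14', pct_fmt), ('F15:Z1000', money_fmt)]).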

pandas: how to format cells after exporting to Excel

I am exporting some pandas dataframes to Excel:
df.to_excel(writer, sheet)
wb = writer.book
ws = writer.sheets[sheet]
ws.write(1, 4, "DataFrame contains ...")
writer.close()  # writer.save() was removed in pandas 2.0
I understand I can use the Format class (http://xlsxwriter.readthedocs.org/format.html) to format cells as I write them to Excel. However, I can't figure out a way to apply a formatting style after cells have already been written to Excel. E.g. how do I make the item in row = 2 and column = 3 of the dataframe I have exported to Excel bold and horizontally centred?
It should look like this:
new_style = {'bold': True, 'align': 'center'}
ws.apply_style(sheet, 'C2', new_style)
This relies on an apply_style() method that would need to be added to XlsxWriter (it is not part of the library):
def apply_style(self, sheet_name, cell, cell_format_dict):
    """Apply style for any cell, with value or not. Overwrites cell with joined
    cell_format_dict and existing format and with existing or blank value"""
    written_cell_data = self.written_cells[sheet_name].get(cell)
    if written_cell_data:
        existing_value, existing_cell_format_dict = self.written_cells[sheet_name][cell]
        updated_format = dict(existing_cell_format_dict or {}, **cell_format_dict)
    else:
        existing_value = None
        updated_format = cell_format_dict
    self.write_cell(sheet_name, cell, existing_value, updated_format)
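Or you can write it like this:
# set_bold() and set_align() return None, so the properties are passed to add_format() instead of being chained
new_style = wb.add_format({'bold': True, 'align': 'center'})
# written_cells is the same hypothetical cache used by apply_style() above
ws.write(2, 3, ws.written_cells[sheet].get('C2'), new_style)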
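With stock XlsxWriter, a cell's format is set when the cell is written, so a common workaround is to write the same value again together with the new format. The sketch below assumes the DataFrame was exported with its header row and index column (the to_excel() defaults), so DataFrame position (r, c) lands at worksheet position (r + 1, c + 1); restyle_cell is a hypothetical helper name.

```python
def restyle_cell(worksheet, df, df_row, df_col, cell_format,
                 header_rows=1, index_cols=1):
    """Rewrite one DataFrame value in place with a new Format.

    df_row and df_col are zero-based positions in the DataFrame; header_rows
    and index_cols are the offsets added by to_excel() (both 1 by default).
    """
    value = df.iat[df_row, df_col]
    # Writing the cell again replaces its earlier format as well as its value,
    # so the original value is passed back in unchanged.
    worksheet.write(df_row + header_rows, df_col + index_cols, value, cell_format)
```

Using the names from the question, this might be called as restyle_cell(ws, df, 1, 2, wb.add_format({'bold': True, 'align': 'center'})) before the writer is closed.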
