comparing two columns and highlighting differences in dataframe

What would be the best way to compare two columns of a DataFrame and highlight the cells where they differ?
import pandas as pd
df = pd.DataFrame({'ID': ['one2', 'one3', 'one3', 'one4'],
                   'Volume': [5.0, 6.0, 7.0, 2.2],
                   'BOX': ['one', 'two', 'three', 'four'],
                   'BOX2': ['one', 'two', 'five', 'one hundred']})
I am trying to compare the BOX and BOX2 columns, and I'd like to highlight the differences between them.

Maybe you can do something like this:
df.style.apply(lambda x: (x != df['BOX']).map({True: 'background-color: red; color: white', False: ''}), subset=['BOX2'])
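The lambda above can also be written as a named helper, which makes the resulting styles easy to inspect before anything is rendered. This is only a minimal sketch using the question's df; the helper name diff_css and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['one2', 'one3', 'one3', 'one4'],
                   'Volume': [5.0, 6.0, 7.0, 2.2],
                   'BOX': ['one', 'two', 'three', 'four'],
                   'BOX2': ['one', 'two', 'five', 'one hundred']})

def diff_css(col, other, css='background-color: red; color: white'):
    """Return one CSS string per cell of `col`: `css` where it differs from `other`."""
    return col.ne(other).map({True: css, False: ''})

# Inspect the styles as plain data, without rendering
styles = diff_css(df['BOX2'], df['BOX'])
print(styles.tolist())

# With a Styler (rendering needs jinja2), extra keyword arguments are passed through:
# df.style.apply(diff_css, other=df['BOX'], subset=['BOX2'])
```

Keeping the comparison in a function that returns plain strings means the mask can be checked in a script or a test, not only visually in Jupyter.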

You might try something like:
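def hl(d):
    # Start from an empty style table with the same shape as the data
    styles = pd.DataFrame('', columns=d.columns, index=d.index)
    # Style both columns on the rows where BOX and BOX2 differ
    styles.loc[d['BOX'].ne(d['BOX2']), ['BOX', 'BOX2']] = 'background: yellow'
    return styles
df.style.apply(hl, axis=None)
To highlight the whole row instead: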
def hl(d):
    styles = pd.DataFrame('', columns=d.columns, index=d.index)
    # No column selector: every cell in a differing row is styled
    styles.loc[d['BOX'].ne(d['BOX2'])] = 'background: yellow'
    return styles
df.style.apply(hl, axis=None)
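The two variants differ only in which columns receive the style, so they could be folded into one function with a switch. A rough sketch under that assumption; highlight_diff and its whole_row parameter are made-up names for illustration:

```python
import pandas as pd

def highlight_diff(d, a='BOX', b='BOX2', whole_row=False, css='background: yellow'):
    """Return a DataFrame of CSS strings marking rows where columns a and b differ."""
    styles = pd.DataFrame('', index=d.index, columns=d.columns)
    differs = d[a].ne(d[b])
    # Either every column of the row, or just the two compared columns
    target = list(d.columns) if whole_row else [a, b]
    styles.loc[differs, target] = css
    return styles

df = pd.DataFrame({'ID': ['one2', 'one3', 'one3', 'one4'],
                   'Volume': [5.0, 6.0, 7.0, 2.2],
                   'BOX': ['one', 'two', 'three', 'four'],
                   'BOX2': ['one', 'two', 'five', 'one hundred']})

cells = highlight_diff(df)
rows = highlight_diff(df, whole_row=True)

# With a Styler (rendering needs jinja2):
# df.style.apply(highlight_diff, axis=None, whole_row=True)
```

Note that ne treats two missing values as different, so rows where both columns are NaN would also be highlighted unless they are filtered out first.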

Related

Color dataframe rows by condition in Pandas

I need to color rows in my DataFrame according to a condition on one of its columns. For example:
test = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 1, 4]})
def color(score):
    return "background-color: " + ("#ffff00;" if score < 4 else "#ff0000;")
test.style.applymap(color, subset=["A"])
But this way only column A gets colored.
I can also write it like this:
def color1(score):
    return "background-color: #ffff00;"
def color2(score):
    return "background-color: #ff0000;"
temp = test.style.applymap(color1, subset=(test[test["A"] < 4].index, slice(None)))
temp.applymap(color2, subset=(test[test["A"] >= 4].index, slice(None)))
This colors the rows correctly, but it takes multiple applymap calls. Is there any way to achieve this without the multiple calls?

In the first example, subset=["A"] tells the Styler to apply the function only to column A. If you remove it, the function is applied to every cell of the DataFrame. (In pandas 2.1 and later, Styler.applymap is deprecated in favor of Styler.map, which takes the same arguments.)
import pandas as pd
test = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 1, 4]})
def color(score):
    return "background-color: " + ("#ffff00;" if score < 4 else "#ff0000;")
test.style.applymap(color)
If you want different formatting in different columns, you can write one function per column and chain the calls.
def color2(score):
    # An empty string leaves the cell unstyled
    return "background-color: #80ffff;" if score < 4 else ""
test.style.applymap(color, subset='A') \
    .applymap(color2, subset='B')

For a more complicated mask or set of styles, you can build a DataFrame of styles and pass it to Styler.apply with axis=None:
def highlight(x):
    c1 = "background-color: #ffff00;"
    c2 = "background-color: #ff0000;"
    m = x["A"] < 4
    # DataFrame of styles, same shape as the data
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # set whole rows by condition
    df1.loc[m, :] = c1
    df1.loc[~m, :] = c2
    # print the DataFrame of styles to check it
    print(df1)
    return df1
test.style.apply(highlight, axis=None)
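As an alternative to building the whole style table at once, Styler.apply with axis=1 calls a function once per row, and that function returns one CSS string per cell. A small sketch of that variant; color_row is a hypothetical name:

```python
import pandas as pd

test = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 1, 4]})

def color_row(row):
    """Return the same CSS string for every cell in the row, chosen by column A."""
    css = "background-color: #ffff00;" if row["A"] < 4 else "background-color: #ff0000;"
    return [css] * len(row)

# The function can be checked on a single row without a Styler
first = color_row(test.iloc[0])
fourth = color_row(test.iloc[3])

# With a Styler (rendering needs jinja2):
# test.style.apply(color_row, axis=1)
```

The row-wise form is short and readable, but it runs Python code for every row, so the axis=None approach is likely the better fit for large frames.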

Highlight a column incl. header with style in Pivottable [duplicate]

I am using the pandas Styler to give some columns a background color, based on the name of the column header.
While this works as intended, the background color of the column header doesn't change.
Here is the part of my script where the style is applied:
def highlight_col(x):
    if x.name in added_columns:
        return ['background-color: #67c5a4'] * x.shape[0]
    elif x.name in dropped_columns:
        return ['background-color: #ff9090'] * x.shape[0]
    else:
        return ['background-color: None'] * x.shape[0]
old = old.style.apply(highlight_col, axis=0)
Is there a way to apply style.apply() not only to the cells below the column header, but to the complete column including the header?
Edit:
For clarification, in the Excel output, "Header 2" should have the same background color as the cells below it.

Okay, I think I figured out a way to format a column header using HTML/CSS selectors.
Using much of your code as setup:
import numpy as np
import pandas as pd
df = pd.DataFrame('some value', columns=['Header1', 'Header2', 'Header3'], index=np.arange(12))
# Each variable holds a single name here, so `in` below is a substring test on that string
added_columns = 'Header2'
dropped_columns = 'Header1'
def highlight_col(x):
    if x.name in added_columns:
        return ['background-color: #67c5a4'] * x.shape[0]
    elif x.name in dropped_columns:
        return ['background-color: #ff9090'] * x.shape[0]
    else:
        return [''] * x.shape[0]
# nth-child is 1-based and the first header cell belongs to the index, hence the +2
col_loc_add = df.columns.get_loc(added_columns) + 2
col_loc_drop = df.columns.get_loc(dropped_columns) + 2
df.style.apply(highlight_col, axis=0)\
    .set_table_styles(
        [{'selector': f'th:nth-child({col_loc_add})',
          'props': [('background-color', '#67c5a4')]},
         {'selector': f'th:nth-child({col_loc_drop})',
          'props': [('background-color', '#ff9090')]}])
Note: I am using f-strings, which are a Python 3.6+ feature.
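With more than a couple of columns, the selector entries could be generated from a mapping of column name to color. The following is a sketch under the same assumption as above (a single visible index level, so the first column header is the second th); header_styles and index_levels are hypothetical names:

```python
import pandas as pd

def header_styles(columns, colors, index_levels=1):
    """Build set_table_styles entries that color header cells by column name.

    colors maps column name -> CSS color. index_levels is the number of
    header cells that precede the first data column (assumed to be 1).
    """
    styles = []
    for name, color in colors.items():
        # nth-child is 1-based, so add 1 on top of the index cells
        pos = columns.get_loc(name) + index_levels + 1
        styles.append({'selector': f'th:nth-child({pos})',
                       'props': [('background-color', color)]})
    return styles

df = pd.DataFrame('some value', columns=['Header1', 'Header2', 'Header3'], index=range(12))
table_styles = header_styles(df.columns, {'Header2': '#67c5a4', 'Header1': '#ff9090'})
print(table_styles)

# With a Styler (rendering needs jinja2):
# df.style.set_table_styles(table_styles)
```

Selectors like these are CSS, so they affect the HTML rendering; they should not be expected to carry over to a file written with to_excel.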

You can use np.vstack() to stack the column names on top of the values and create a fresh DataFrame to apply the function to, then export to Excel with header=False.
Using @Scott's data and piR's function,
Setup:
df = pd.DataFrame('some value', columns=['Header1', 'Header2', 'Header3'], index=np.arange(12))
def f(dat, c='red'):
    return [f'background-color: {c}' for i in dat]
You can do:
pd.DataFrame(np.vstack((df.columns, df.to_numpy())), columns=df.columns).style.apply(
    f, subset=['Header2']).to_excel('file.xlsx', header=False, index=False)
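To see what the stacking step produces before any styling or export, the intermediate frame can be built on its own. A minimal sketch with a shortened version of the same data (three rows instead of twelve); the variable name with_header is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame('some value', columns=['Header1', 'Header2', 'Header3'], index=np.arange(3))

# Row 0 of the new frame holds the column names, the data rows follow
with_header = pd.DataFrame(np.vstack((df.columns, df.to_numpy())), columns=df.columns)
print(with_header)
```

Because the header becomes an ordinary data row, a column-wise style function colors it like any other cell. The trade-off is that names and values now share each column, so numeric columns would likely end up with object dtype in the exported frame.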

Pandas style.background_gradient ignore NaN

I have the following code to dump the DataFrame results into an HTML table, such that the columns in TIME_FRAMES are colored according to a colormap from seaborn.
import seaborn as sns
TIME_FRAMES = ["24h", "7d", "30d", "1y"]
# Set CSS properties for th elements in dataframe
th_props = [
    ('font-size', '11px'),
    ('text-align', 'center'),
    ('font-weight', 'bold'),
    ('color', '#6d6d6d'),
    ('background-color', '#f7f7f9')
]
# Set CSS properties for td elements in dataframe
td_props = [
    ('font-size', '11px')
]
cm = sns.light_palette("green", as_cmap=True)
# Note: `styles` is not defined in the snippet as posted; presumably it is built from th_props and td_props
s = (results.style.background_gradient(cmap=cm, subset=TIME_FRAMES)
     .set_table_styles(styles))
# Styler.render() was replaced by Styler.to_html() in later pandas versions
a = s.render()
with open("test.html", "w") as f:
    f.write(a)
From this, I get the warning:
/python3.7/site-packages/matplotlib/colors.py:512: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
Also, the columns 30d and 1y don't get rendered correctly, because they contain NaNs. How can I make the NaNs be ignored, so that the colors are computed using only the valid values? Setting the NaNs to 0 is not a valid option, as NaNs here have a meaning of their own.

A bit late, but for future reference.
I had the same problem, and here is how I solved it:
import pandas as pd
import numpy as np
dt = pd.DataFrame({'col1': [1,2,3,4,5], 'col2': [4,5,6,7,np.nan], 'col3': [8,2,6,np.nan,np.nan]})
First, fill the NaNs with a value larger than anything in the data:
dt.fillna(dt.max().max() + 1, inplace=True)
A function to color the font of this maximum value white:
def color_max_white(val, max_val):
    color = 'white' if val == max_val else 'black'
    return 'color: %s' % color
A function to color the background of the maximum value white:
def highlight_max(data, color='white'):
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_max = data == data.max()
        return [attr if v else '' for v in is_max]
    else:  # DataFrame from .apply(axis=None)
        is_max = data == data.max().max()
        return pd.DataFrame(np.where(is_max, attr, ''),
                            index=data.index, columns=data.columns)
Putting everything together:
max_val = dt.max().max()
(dt.style.format("{:.2f}")
   .background_gradient(cmap='Blues', axis=None)
   .applymap(lambda x: color_max_white(x, max_val))
   .apply(highlight_max, axis=None))

This works fine for me:
df.style.applymap(lambda x: 'color: transparent' if pd.isnull(x) else '')
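The same idea can be expressed as a table-wise function that blanks both the text and the background of missing cells in one pass. This is only a sketch; blank_nan is a hypothetical helper, and it assumes the rendering lets a later style override the gradient's background for the same cell, which is how CSS normally behaves in HTML output.

```python
import numpy as np
import pandas as pd

def blank_nan(data, css='background-color: white; color: transparent'):
    """Return a DataFrame of CSS strings: `css` for missing cells, '' elsewhere."""
    return pd.DataFrame(np.where(data.isna(), css, ''),
                        index=data.index, columns=data.columns)

dt = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [4, 5, 6, 7, np.nan],
                   'col3': [8, 2, 6, np.nan, np.nan]})

styles = blank_nan(dt)
print(styles)

# With a Styler (rendering needs jinja2 and matplotlib), applied after the gradient:
# dt.style.background_gradient(cmap='Blues').apply(blank_nan, axis=None)
```

Unlike filling the NaNs with a sentinel value, this leaves the data untouched, so the missing values keep their meaning.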

@quant's answer almost worked for me, but my background gradient would still use the max value to calculate the color gradient. I implemented @night-train's suggestion to set the color map, then used two functions:
import copy
import matplotlib.pyplot as plt
import numpy as np
# plt.cm.get_cmap was removed in recent matplotlib releases; plt.get_cmap works across versions
cmap = copy.copy(plt.get_cmap("Blues"))
cmap.set_under("white")
def color_nan_white(val):
    """Color the nan text white"""
    if np.isnan(val):
        return 'color: white'
    return ''
def color_nan_white_background(val):
    """Color the nan cell background white"""
    if np.isnan(val):
        return 'background-color: white'
    return ''
And then I applied them to my DataFrame, again borrowing from @quant with a slight modification for ease:
(df.style
   .background_gradient(axis='index')
   .applymap(color_nan_white)
   .applymap(color_nan_white_background)
)
Then it worked perfectly.

How do I remove and re-sort (reindex) columns after applying style in python pandas?

Is there a way to remove columns or rows after applying a style in pandas? And to re-sort them?
styled = df.style.apply(colorize, axis=None)
# columns to keep: everything that does not end with "_x"
yonly = list(sorted(set(styled.columns) - set(df.filter(regex='_x$').columns)))
### Remove columns that end with "_x" here
styled.to_excel('styled.xlsx', engine='openpyxl', freeze_panes=(1,1))
Most things I tried were unavailable, i.e.
styled.reindex(columns=yonly) returned AttributeError: 'Styler' object has no attribute 'reindex'
styled.columns = yonly returned AttributeError: 'list' object has no attribute 'get_indexer'
styled = styled[yonly] returned TypeError: 'Styler' object is not subscriptable
Follow-up from "Colour specific cells from two columns that don't match, using python pandas style.where (or otherwise) and export to excel".

After @jezrael's comment to remove the columns before styling and colouring, I got my answer :)
The solution was to pass an extra argument that makes the original DataFrame df available to the styling function, and to colour df_tmp, which has the "_x" columns removed. :)
df = pd.DataFrame({
    'config_dummy1': ["dummytext"] * 10,
    'a_y': ["a"] * 10,
    'config_size_x': ["textstring"] * 10,
    'config_size_y': ["textstring"] * 10,
    'config_dummy2': ["dummytext"] * 10,
    'a_x': ["a"] * 10
})
df.at[5, 'config_size_x'] = "xandydontmatch"
df.at[9, 'config_size_y'] = "xandydontmatch"
df.at[0, 'a_x'] = "xandydontmatch"
df.at[3, 'a_y'] = "xandydontmatch"
print(df)
def color(x, extra):
    c1 = 'color: #ffffff; background-color: #ba3018'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # select only the columns ending with _x or _y, sorted so each _x sits next to its _y
    cols = sorted(extra.filter(regex='_x$|_y$').columns)
    # loop over the pairs and assign the style by mask
    for colx, coly in zip(cols[::2], cols[1::2]):
        # True where the paired columns differ
        m = extra[colx] != extra[coly]
        df1.loc[m, [coly]] = c1
    return df1
yonly = list(sorted(set(df.columns) - set(df.filter(regex='_x$').columns)))
df_tmp = df[yonly]
df_tmp.style.apply(color, axis=None, extra=df).to_excel('styled.xlsx', engine='openpyxl')
Thank you wonderful people of SO! :D
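The sort-and-zip pairing in the answer above relies on every "_x" column having a "_y" partner; a single unpaired column would shift all the following pairs. One possible way around that is to match the columns by name. A sketch only, with hypothetical helper names xy_pairs and color_pairs and a shortened example frame:

```python
import pandas as pd

def xy_pairs(columns, x_suffix='_x', y_suffix='_y'):
    """Return (x_col, y_col) pairs that share a base name; unpaired columns are skipped."""
    cols = list(columns)
    present = set(cols)
    pairs = []
    for c in cols:
        if c.endswith(x_suffix):
            partner = c[:-len(x_suffix)] + y_suffix
            if partner in present:
                pairs.append((c, partner))
    return pairs

def color_pairs(x, extra, css='color: #ffffff; background-color: #ba3018'):
    """Style the _y cells of x wherever the matching _x and _y columns of extra differ."""
    styles = pd.DataFrame('', index=x.index, columns=x.columns)
    for colx, coly in xy_pairs(extra.columns):
        if coly in styles.columns:
            styles.loc[extra[colx] != extra[coly], coly] = css
    return styles

df = pd.DataFrame({'a_y': ['a', 'b'],
                   'config_size_x': ['s', 's'],
                   'config_size_y': ['s', 't'],
                   'a_x': ['a', 'a']})
df_tmp = df[['a_y', 'config_size_y']]
styles = color_pairs(df_tmp, df)
print(styles)

# With a Styler (rendering needs jinja2):
# df_tmp.style.apply(color_pairs, axis=None, extra=df)
```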

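I'm not sure about re-sorting, but if you only want to remove (hide) some columns, use the Styler's hide_columns method.
For example, to hide columns 'A' and 'B':
styled.hide_columns(['A', 'B'])
In pandas 1.4 and later, hide_columns is deprecated in favor of styled.hide(['A', 'B'], axis='columns').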

I had a similar scenario in which I had to color the background of one DataFrame based on another DataFrame. I wrote a function that colors cells according to the value ranges in the other DataFrame:
def colval(val, z1):
    df1 = pd.DataFrame('', index=val.index, columns=val.columns)  # DataFrame of styles
    n_rows, n_cols = z1.shape
    for x in range(n_rows):
        for y in range(1, n_cols):  # the first column is skipped
            # check the range in the dependent DataFrame
            # and color the matching cell in the other one
            if z1.iloc[x, y] >= 4.5:
                df1.iloc[x, y] = 'background-color: red'
            elif z1.iloc[x, y] <= -4.5:
                df1.iloc[x, y] = 'background-color: yellow'
    return df1
df_tocol.style.apply(colval, axis=None, z1=diff_df)
Hope this helps!
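The nested loops above could probably be replaced by a vectorized version built on np.select. This sketch assumes, like the original, that both frames have the same shape and that every column after the first is numeric; colval_vectorized, hi and lo are hypothetical names, and the example frames are made up:

```python
import numpy as np
import pandas as pd

def colval_vectorized(val, z1, hi=4.5, lo=-4.5):
    """Return CSS strings for val, chosen from the values of z1 (first column skipped)."""
    styles = pd.DataFrame('', index=val.index, columns=val.columns)
    ref = z1.iloc[:, 1:].to_numpy()
    # First matching condition wins, everything else stays unstyled
    styles.iloc[:, 1:] = np.select(
        [ref >= hi, ref <= lo],
        ['background-color: red', 'background-color: yellow'],
        default='')
    return styles

df_tocol = pd.DataFrame({'name': ['a', 'b'], 'v1': [1.0, 2.0], 'v2': [3.0, 4.0]})
diff_df = pd.DataFrame({'name': ['a', 'b'], 'v1': [5.0, 0.0], 'v2': [-5.0, 4.5]})

styles = colval_vectorized(df_tocol, diff_df)
print(styles)

# With a Styler (rendering needs jinja2):
# df_tocol.style.apply(colval_vectorized, axis=None, z1=diff_df)
```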
