I have an issue with my code for highlighting specific cells in an Excel file when I export my DataFrame. The cells to highlight with a background color are the outliers of each column. Outliers are calculated with a for loop over each column.
Here is the code where I calculate the outliers for each column:
for col in dfmg.columns.difference(['Sbj', 'expertise', 'gender']):
    Q1c = dfmg[col].quantile(0.25)
    Q3c = dfmg[col].quantile(0.75)
    IQRc = Q3c - Q1c
    lowc = Q1c - 1.5 * IQRc
    uppc = Q3c + 1.5 * IQRc
Then I created this function to define how to highlight cells:
def colors(v):
    for v in dfmg[col]:
        if v < lowc or v > uppc:
            color = 'yellow'
            return 'background-color: %s' % color
And I apply my function to a new df:
df_colored = dfmg.style.applymap(colors)
The problem is that when I export df_colored, everything is yellow! Where am I going wrong?
Thanks for help!
You can build a DataFrame of styles and pass it to Styler.apply with axis=None. I modified your solution so it needs no loops:
def highlight(x):
    c1 = 'background-color: yellow'
    cols = x.columns.difference(['Sbj', 'expertise', 'gender'])
    Q1 = x[cols].quantile(0.25)
    Q3 = x[cols].quantile(0.75)
    IQR = Q3 - Q1
    low = Q1 - 1.5 * IQR
    up = Q3 + 1.5 * IQR
    # boolean mask of outliers, computed on the numeric columns only
    mask = x[cols].lt(low) | x[cols].gt(up)
    # DataFrame with the same index and column names as the original, filled with empty strings
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # set the style where the mask is True
    df1[cols] = df1[cols].mask(mask, c1)
    return df1

dfmg.style.apply(highlight, axis=None)
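To check the approach above on a small frame, here is a minimal self-contained sketch. The sample data, the column names score and time, and the helper name outlier_styles are made up for illustration; only the 1.5 * IQR rule and the excluded column names come from the question.

```python
import pandas as pd

def outlier_styles(x, skip=('Sbj', 'expertise', 'gender')):
    """Return a DataFrame of CSS strings that marks IQR outliers per column."""
    cols = x.columns.difference(list(skip))
    q1 = x[cols].quantile(0.25)
    q3 = x[cols].quantile(0.75)
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    upp = q3 + 1.5 * iqr
    # Compare only the measured columns; the Series of bounds align on column names.
    mask = x[cols].lt(low) | x[cols].gt(upp)
    styles = pd.DataFrame('', index=x.index, columns=x.columns)
    styles[cols] = styles[cols].mask(mask, 'background-color: yellow')
    return styles

# Hypothetical sample data: one clear outlier in each measured column.
dfmg = pd.DataFrame({
    'Sbj': ['s1', 's2', 's3', 's4', 's5', 's6'],
    'score': [10, 11, 12, 11, 10, 50],
    'time': [1.0, 1.1, -9.0, 0.9, 1.0, 1.2],
})
styles = outlier_styles(dfmg)
print(styles)
```

Passing the same function to the Styler, for instance dfmg.style.apply(outlier_styles, axis=None).to_excel('colored.xlsx', engine='openpyxl') (the file name is only an example), should then export just those two cells in yellow.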
Related
I have a pandas dataframe with a MultiIndex. The row index levels are 'time' and 'type', while the columns are built from tuples. The dataframe stores the price and size of three cryptocurrency pairs (either info about trades or about the best_bids). The details are not really important, but the dataframe looks like this
I would like to change the color of the rows for which 'type' == 'Buy Trade' (let's say I want to make the text of these rows green, and red otherwise).
How can I do it?
You can download the csv of the dataframe from here https://github.com/doogersPy/files/blob/main/dataframe.csv and then load the dataframe with
df = pd.read_csv('dataframe.csv',index_col=[0,1], header=[0,1])
I have tried a method similar to the one presented in this other question, but df.style.apply does not work with non-unique MultiIndexes (as in my case): my dataframe has entries with the same time value.
In fact, I have tried the following code
def highlight(ob):
    c1 = f"background-color: #008000;"
    c2 = f"background-color: #ff0000;"
    m = ob.index.get_level_values('type') == 'Buy Trade'
    # DataFrame of styles
    df1 = pd.DataFrame('', index=ob.index, columns=ob.columns)
    # set columns by condition
    df1.loc[m, :] = c1
    df1.loc[~m, :] = c2
    # for check DataFrame of styles
    return df1

df.style.apply(highlight,axis=None)
but I get the error
KeyError: 'Styler.apply and .applymap are not compatible with
non-unique index or columns.'
I solved it with the following method, which resets the index so that it becomes unique before styling:
col = df.reset_index().columns
idx = df.reset_index().index

def highlight(ob):
    c_g = f"color: #008000;"  # green
    c_r = f"color: #ff0000;"  # red
    c_b = f"color: #000000;"  # black
    mBuy = (ob['type'] == 'Buy Trade')
    mSell = (ob['type'] == 'Sell Trade')
    mOB = (ob['type'] == 'OB Update')
    # DataFrame of styles
    df1 = pd.DataFrame('', index=idx, columns=col)
    # set rows by condition
    df1.loc[mBuy] = c_g
    df1.loc[mSell] = c_r
    df1.loc[mOB] = c_b
    # for check DataFrame of styles
    return df1

df.reset_index().style.apply(highlight,axis=None)
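The reset_index workaround can be tried on a tiny frame. The sketch below uses made-up data with single-level columns (the real dataframe has tuple columns), and the helper name row_text_colors is hypothetical; it only illustrates that the index becomes unique after reset_index and that one CSS string can be repeated across each row.

```python
import pandas as pd

# Hypothetical data: two rows share the same (time, type) pair, so the index is not unique.
index = pd.MultiIndex.from_tuples(
    [('10:00', 'Buy Trade'), ('10:00', 'Buy Trade'),
     ('10:01', 'Sell Trade'), ('10:02', 'OB Update')],
    names=['time', 'type'],
)
df = pd.DataFrame({'price': [1.0, 1.1, 0.9, 1.0], 'size': [5, 3, 2, 7]}, index=index)
print(df.index.is_unique)  # False here, which is what trips Styler.apply

flat = df.reset_index()  # 'time' and 'type' become columns; the index is now 0..n-1

def row_text_colors(ob):
    # Pick one CSS string per row from the 'type' column; unmatched types fall back to black.
    colors = ob['type'].map({
        'Buy Trade': 'color: #008000;',
        'Sell Trade': 'color: #ff0000;',
    }).fillna('color: #000000;')
    # Repeat each row's CSS string across every column.
    return pd.DataFrame({c: colors for c in ob.columns}, index=ob.index)

styles = row_text_colors(flat)
print(styles)
```

The function can then be passed as flat.style.apply(row_text_colors, axis=None), since flat has a unique index.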
I need to color rows in my DataFrame according to a condition on some column. For example:
test = pd.DataFrame({"A": [1,2,3,4,5], "B":[5,3,2,1,4]})

def color(score):
    return f"background-color:" + (" #ffff00;" if score < 4 else "#ff0000")

test.style.applymap(color, subset = ["A"])
But this way only column A gets colored:
I can also write it like this:
def color1(score):
    return f"background-color: #ffff00;"

def color2(score):
    return f"background-color: #ff0000;"

temp = test.style.applymap(color1, subset = (test[test["A"]< 4].index, slice(None)))
temp.applymap(color2, subset = (test[test["A"] >= 4].index, slice(None)))
This way the colors are right, but I am struggling with the multiple applymap calls. Is there any way to achieve my goal without multiple calls?
In the first example, subset='A' tells applymap to style only column A. If you remove it, the function is applied to the entire DataFrame.
import pandas as pd
test = pd.DataFrame({"A": [1,2,3,4,5], "B":[5,3,2,1,4]})
def color(score):
    return f"background-color:" + (" #ffff00;" if score < 4 else "#ff0000")

test.style.applymap(color)
If you want to apply different formatting in different columns you can easily design something handling one column at a time.
def color2(score):
    return f"background-color: #80FFff;" if score < 4 else None

test.style.applymap(color, subset='A') \
    .applymap(color2, subset='B')
If you need a more complicated mask or set of styles, create a DataFrame of styles and pass it to Styler.apply with axis=None:
def highlight(x):
    c1 = f"background-color: #ffff00;"
    c2 = f"background-color: #ff0000;"
    m = x["A"]< 4
    # DataFrame of styles
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # set columns by condition
    df1.loc[m, :] = c1
    df1.loc[~m, :] = c2
    # for check DataFrame of styles
    print (df1)
    return df1

test.style.apply(highlight, axis=None)
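A row-wise variant is another way to get the whole-row coloring in a single call. This is only a sketch: the function name color_row is made up, and the preview step uses plain DataFrame.apply so the resulting styles can be inspected without rendering anything.

```python
import pandas as pd

test = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 1, 4]})

def color_row(row):
    # One CSS string per cell in the row, chosen from the value in column A.
    css = "background-color: #ffff00;" if row["A"] < 4 else "background-color: #ff0000;"
    return [css] * len(row)

# Preview the styles by applying the same function row by row.
preview = test.apply(color_row, axis=1, result_type="expand")
preview.columns = test.columns
print(preview)
```

With the Styler the call would be test.style.apply(color_row, axis=1). For large frames the axis=None version with a boolean mask is likely faster, because it avoids calling a Python function once per row.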
I'm trying to highlight all cells that are before the current date. However I am highlighting all cells instead of just the old dates.
import pandas as pd
from datetime import datetime
#get file
df = pd.read_excel(r'C:\Users\cc-621.xlsx')
df
# sort the data by date
df['Date'] = pd.to_datetime(df['Expiration Date'])
df = df.sort_values(by='Expiration Date')
df = df.reset_index()
df
# sort by todays date
today = datetime.now(tz=None)
today
def expired(self):
    for index, rows in df.iterrows():
        color = 'red' if rows['Expiration Date'] < today else 'green'
        return 'background-color: %s' %color

new = df.style.applymap(expired)
new
The idea is to create a new DataFrame filled with styles and pass it to Styler.apply with axis=None. Here numpy.where picks the style for each cell of the 'Expiration Date' column according to the condition:
import numpy as np

def expired(x):
    c1 = 'background-color: red'
    c2 = 'background-color: green'
    # DataFrame of styles, empty strings mean "no style"
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    m = x['Expiration Date'] < today
    df1['Expiration Date'] = np.where(m, c1, c2)
    return df1

df.style.apply(expired, axis=None)
To color entire rows by the condition, use DataFrame.mask:
def expired(x):
    c1 = 'background-color: red'
    c2 = 'background-color: green'
    # start with every cell green, then turn the expired rows red
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)
    m = x['Expiration Date'] < today
    df1 = df1.mask(m, c1)
    return df1
applymap is executed on each cell. You don't need to loop over each row if using this. However, it seems you are trying to highlight the entire row, so you likely want apply by row. Using this method, you have to return an array with the same size as each row.
def expired(val):
    # val is one row; return one style per cell in that row
    return ['background-color: red' if val['Expiration Date'] < today else ''] * len(val)

new = df.style.apply(expired, axis=1)
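To make the row coloring reproducible, the sketch below fixes "today" to a constant and uses made-up data (the Item column and the dates are hypothetical). It builds the DataFrame of styles directly so the result can be checked before handing the function to the Styler.

```python
import pandas as pd

# Hypothetical data and a fixed "today" so the result does not depend on the run date.
today = pd.Timestamp('2024-06-01')
df = pd.DataFrame({
    'Item': ['a', 'b', 'c'],
    'Expiration Date': pd.to_datetime(['2024-01-15', '2024-12-31', '2024-05-31']),
})

def expired_styles(x):
    red = 'background-color: red'
    green = 'background-color: green'
    # Start with every cell green, then overwrite the expired rows with red.
    styles = pd.DataFrame(green, index=x.index, columns=x.columns)
    is_expired = x['Expiration Date'] < today
    styles.loc[is_expired, :] = red
    return styles

styles = expired_styles(df)
print(styles)
```

The Styler call would then be df.style.apply(expired_styles, axis=None).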
Is there a way to remove columns or rows after applying style in python pandas? And re-sort them?
styled = df.style.apply(colorize, axis=None)
#remove _x columns
yonly = list(sorted(set(styled.columns) - set(df.filter(regex='_x$').columns)))
###Remove columns that end with "_x" here
styled.to_excel('styled.xlsx', engine='openpyxl', freeze_panes=(1,1))
Most things I tried did not work, e.g.:
styled.reindex(columns=yonly) returned AttributeError: 'Styler' object has no attribute 'reindex'
styled.columns = yonly returned AttributeError: 'list' object has no attribute 'get_indexer'
styled = styled[yonly] returned TypeError: 'Styler' object is not subscriptable
Follow-up from Colour specific cells from two columns that don't match, using python pandas style.where (or otherwise) and export to excel
After @jezrael's comment suggesting I remove the columns before styling and colouring, I got my answer :)
The solution was to pass an extra argument that makes the original dataframe df available to the styling function, and to colour the dataframe df_tmp, which has the "_x" columns removed. :)
df = pd.DataFrame({
    'config_dummy1': ["dummytext"] * 10,
    'a_y': ["a"] * 10,
    'config_size_x': ["textstring"] * 10,
    'config_size_y': ["textstring"] * 10,
    'config_dummy2': ["dummytext"] * 10,
    'a_x': ["a"] * 10
})
df.at[5, 'config_size_x'] = "xandydontmatch"
df.at[9, 'config_size_y'] = "xandydontmatch"
df.at[0, 'a_x'] = "xandydontmatch"
df.at[3, 'a_y'] = "xandydontmatch"
print(df)
def color(x, extra):
    c1 = 'color: #ffffff; background-color: #ba3018'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # select only columns ending with _x and _y, sorted
    cols = sorted(extra.filter(regex='_x$|_y$').columns)
    # loop by pairs and assign style by mask
    for colx, coly in zip(cols[::2],cols[1::2]):
        # pairs columns
        m = extra[colx] != extra[coly]
        df1.loc[m, [coly]] = c1
    return df1

yonly = list(sorted(set(df.columns) - set(df.filter(regex='_x$').columns)))
df_tmp = df[yonly]
df_tmp.style.apply(color, axis=None, extra=df).to_excel('styled.xlsx', engine='openpyxl')
Thank you wonderful people of SO! :D
I'm not sure about re-sorting, but if you only want to remove (hide) some columns, use Styler's hide_columns method (deprecated since pandas 1.4 in favor of Styler.hide with axis='columns').
For example, to hide columns 'A' and 'B':
styled.hide_columns(['A', 'B'])
I had a similar scenario in which I had to color the background of one dataframe based on the values of another dataframe. I created a function that colors cells according to the ranges of the other dataframe:
def colval(val, z1):
    df1 = pd.DataFrame('', index=val.index, columns=val.columns)  # dataframe for coloring
    colm = z1.shape
    for x in range(colm[0]):
        for y in range(1, colm[1]):
            # check the range in the dependent dataframe
            # and color the other one
            if z1.iloc[x, y] >= 4.5:
                df1.iloc[x, y] = 'background-color: red'
            elif z1.iloc[x, y] <= -4.5:
                df1.iloc[x, y] = 'background-color: yellow'
    return df1

df_tocol.style.apply(colval, axis=None, z1=diff_df)
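The nested loops can probably be replaced by a vectorized version. The sketch below assumes, like the loop above, that both frames have the same shape and that the first column is skipped (for example because it holds labels); the name colval_vectorized and the sample frames are made up for illustration.

```python
import numpy as np
import pandas as pd

def colval_vectorized(val, z1):
    """Build the style frame for val from the values of z1, skipping column 0."""
    ref = z1.iloc[:, 1:].to_numpy(dtype=float)
    out = np.full(val.shape, '', dtype=object)
    # np.select picks the first matching condition per cell, else the default.
    out[:, 1:] = np.select(
        [ref >= 4.5, ref <= -4.5],
        ['background-color: red', 'background-color: yellow'],
        default='',
    )
    return pd.DataFrame(out, index=val.index, columns=val.columns)

# Hypothetical frames: diff_df drives the colors applied to df_tocol.
df_tocol = pd.DataFrame({'id': ['a', 'b'], 'p': [10, 20], 'q': [30, 40]})
diff_df = pd.DataFrame({'id': ['a', 'b'], 'p': [5.0, 0.0], 'q': [-5.0, 4.5]})
styles = colval_vectorized(df_tocol, diff_df)
print(styles)
```

It is used the same way as the loop version: df_tocol.style.apply(colval_vectorized, axis=None, z1=diff_df).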
Hope this helps!