Python: conditional color in barplot - python

I'm making a horizontal barplot. I need one specific bar (type=milk) to have a green fill color, and gray for the other types. The dataframe is:
df = pd.DataFrame({'val': [1, 2, 3, 6, 7, 8],
                   'type': ["honey", "bread", "coffee", "bread", "honey", "milk"]})
I have tried this without success:
clrs = ['green' if ((x == milk) else 'gray' for x in type]
ax.barh(df['type'], (df['val']), align='center', colors=clrs)
Any ideas?

Yes, you could change the patch of the appropriate bar. Note that barh takes the keyword color, not colors, and that milk is the sixth row, so its patch has index 5. Something like:
b = ax.barh(df['type'], df['val'], align='center', color='gray')
b.patches[5].set_color('green')
However, I don't understand why you are repeating the same labels multiple times.
If you didn't have repeated elements, you could find the label milk automatically like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')
df = pd.DataFrame({'val': np.array([1, 2, 3, 6]), 'types': ["honey", "coffee", "bread","milk"]})
b = plt.barh(df['types'], df['val'], align='center', color='gray')
index_milk = df[df['types']=='milk'].index[0]
b.patches[index_milk].set_color('green')
plt.show()
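If the repeated labels are intentional, a per-bar color list works as well. The following is a minimal sketch, assuming every row should get its own bar; using the row index as the bar position is my own choice (not part of the answer above), made so that repeated labels are not drawn on top of each other.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "val": [1, 2, 3, 6, 7, 8],
    "type": ["honey", "bread", "coffee", "bread", "honey", "milk"],
})

# One color per row: green where the label is "milk", gray elsewhere
clrs = ["green" if t == "milk" else "gray" for t in df["type"]]

fig, ax = plt.subplots()
# Assumption: the row index is the bar position, so each row gets its own bar
bars = ax.barh(df.index, df["val"], align="center", color=clrs)
ax.set_yticks(df.index)
ax.set_yticklabels(df["type"])
```

Call plt.show() afterwards to display the figure.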

Related

Automatically find and add the coordinates to add Annotations (e.g. count) on a Boxplot made from a Dictionary of uneven Lists

I'm pretty new to programming and I'm really frustrated by a problem that I thought should be really easy...
Case: Let's say I have a dictionary with lists of uneven length; the number of keys (strings) and values (numbers) could change at any time.
Need: I want to annotate (add text or whatever) some information (e.g. count) to each subplot or category (each key is an individual category).
Problem: I found many solutions for evenly sized categories, which apparently don't work for me (e.g. Solution).
I also found some answers (e.g. Solution) saying that I should first get the coordinates of each key on the x-axis and then do an inverted transformation to work with the "log scales". That was the best solution so far, but unfortunately it does not really fit the coordinates, and I couldn't get and add the points automatically before calling plt.show().
I could also guess the coordinates by trial and error in the transformation method or with an offset (e.g. Solution). But as I said, my dictionary could change at any time, and then I would have to do it again every time!
I think there should be a much simpler way to solve this problem, but I couldn't find it.
Here is the simplified example of my Code and what I tried:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
fig.text(x = 0.25, y = 0, s = str(len(dictionary["a"])))
fig.text(x = 0.75, y = 0, s = str(len(dictionary["b"])))
plt.show()
crd = np.vstack((ax.get_xticks(), np.zeros_like(ax.get_xticks()))).T
ticks = ax.transAxes.inverted().transform(ax.transData.transform(crd))
print(ticks[:,0])
# ab = AnnotationBbox(TextArea("text"), xy=(1, 0), xybox =(0, -30), boxcoords="offset points",pad=0,frameon=False )
# ax.add_artist(ab)
Output of my code
As I understand it, you may want something like this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
dictionary["cex"] = [1, 2, 3]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
ticksList = ax.get_xticks()
print(ticksList)
# Pair each tick position with its list, so no index arithmetic is needed
for x, values in zip(ticksList, dictionary.values()):
    ax.text(x, 0, str(len(values)), fontdict={'horizontalalignment': 'center'})
plt.show()
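The vertical placement can also be left to matplotlib by mixing coordinate systems. This is a minimal sketch, assuming the counts should sit just above the plot area: ax.get_xaxis_transform() returns a blended transform in which x is in data coordinates (the tick positions) and y is in axes coordinates (0 = bottom, 1 = top), so nothing has to be recomputed when the dictionary changes. The "n=" label text and the 1.01 offset are my own choices.

```python
import matplotlib.pyplot as plt

dictionary = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 5, 6, 7],
    "cex": [1, 2, 3],
}

fig, ax = plt.subplots()
ax.boxplot(list(dictionary.values()))
ax.set_xticklabels(list(dictionary.keys()))

# x in data coordinates, y in axes coordinates (independent of the data range)
trans = ax.get_xaxis_transform()

labels = []
for x, values in zip(ax.get_xticks(), dictionary.values()):
    # 1.01 puts the text slightly above the top edge of the axes
    txt = ax.text(x, 1.01, f"n={len(values)}", transform=trans,
                  ha="center", va="bottom")
    labels.append(txt)
```

Because y is given in axes coordinates, the annotations stay in place even on a log-scaled y-axis. Call plt.show() afterwards to display the figure.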

Plotly: How to make all plots grayscale?

I am using Plotly to generate a few line plots in Python. With a sample code like this:
from plotly import offline as plot, subplots as subplot, graph_objects as go
fig = subplot.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
trace1 = go.Scatter(x = [1, 2, 3], y = [1, 2, 3])
trace2 = go.Scatter(x = [1, 2, 3], y = [4, 5, 6])
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
config_test_plot = {'displaylogo': False, 'displayModeBar': False, 'scrollZoom': True}
test_plot_html = plot.plot(fig, output_type='div', include_plotlyjs=False, config= config_test_plot)
I am able to get the required plots. However, I want to be able to get all my plots in grayscale. I see that none of the Plotly default themes are of this type. Is there any way I can do this?
You haven't specified whether to assign a grey color scheme for your entire plot, or only for your lines. But just to not make things easy for myself, I'm going to assume the former. In that case, I would:
use template = 'plotly_white' for the figure elements not directly connected to your dataset, and
assign a grey scale to all lines using n_colors(lowcolor, highcolor, n_colors, colortype='tuple').
Example plot:
But as @S3DEV mentions, using the Greys color palette could be a way to go too, and this is accessible through:
# In:
px.colors.sequential.Greys
# Out:
# ['rgb(255,255,255)',
# 'rgb(240,240,240)',
# 'rgb(217,217,217)',
# 'rgb(189,189,189)',
# 'rgb(150,150,150)',
# 'rgb(115,115,115)',
# 'rgb(82,82,82)',
# 'rgb(37,37,37)',
# 'rgb(0,0,0)']
And this would work perfectly for your use case with a limited number of lines. In that case you could just use this setup:
from plotly import offline as plot, subplots as subplot, graph_objects as go
import plotly.express as px
from itertools import cycle
fig = subplot.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
trace1 = go.Scatter(x = [1, 2, 3], y = [1, 2, 3])
trace2 = go.Scatter(x = [1, 2, 3], y = [4, 5, 6])
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
# Start from the darkest grey; the lightest shade is white and would be invisible
colors = cycle(px.colors.sequential.Greys[::-1])
for d in fig.data:
    d.line.color = next(colors)
fig.show()
And get:
I assume that this is what you were looking for. One considerable drawback, though, is that the number of colors in px.colors.sequential.Greys is limited, so I had to use a cycle to assign the line colors of your data. In contrast, n_colors(lowcolor, highcolor, n_colors, colortype='tuple') lets you define a starting color, an end color, and the number of colors to interpolate between them, which gives a complete scale for all your lines. This also lets you adjust the brightness of the colors to your liking. So you could get several variants of the same plot, each with a different grey range.
Here's a complete setup for those figures if you would like to experiment with that as well:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
from plotly.colors import n_colors
pd.set_option('display.max_rows', None)
pd.options.plotting.backend = "plotly"
# data sample
nperiods = 200
np.random.seed(123)
cols = 'abcdefghijkl'
df = pd.DataFrame(np.random.randint(-10, 12, size=(nperiods, len(cols))),
columns=list(cols))
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0] =1000
df = df.cumsum()#.reset_index()
greys_all = n_colors('rgb(0, 0, 0)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
greys_dark = n_colors('rgb(0, 0, 0)', 'rgb(200, 200, 200)', len(cols)+1, colortype='rgb')
greys_light = n_colors('rgb(200, 200, 200)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
greys = n_colors('rgb(100, 100, 100)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
fig = df.plot(title = 'Greys_light', template='plotly_white', color_discrete_sequence=greys_light)
fig.update_layout(template='plotly_white')
fig.show()

Python: Barplot colored according to a third variable

Currently I am trying to create a Barplot that shows the amount of reviews for an app per week. The bar should however be colored according to a third variable which contains the average rating of the reviews in each week (range: 1 to 5).
I followed the instructions of the following post to create the graph: Python: Barplot with colorbar
The code works fine:
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "count", "score"])
# Convert to lists
data_x = list(df["week"])
data_hight = list(df["count"])
data_color = list(df["score"])
#Create Barplot:
data_color = [x / max(data_color) for x in data_color]
fig, ax = plt.subplots(figsize=(15, 4))
my_cmap = plt.get_cmap('RdYlGn')
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(1,5))
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax)
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Now to the issue: As you might notice, the value of the average score in week 4 is "1.2". The barplot, however, indicates that the value lies around "2.5". I understand that this stems from the following line of code, which standardizes the values by dividing them by the max value:
data_color = [x / max(data_color) for x in data_color]
Unfortunately, I am not able to change this command so that the colors reflect the absolute values of the scores; e.g. with an average score of 1.2, the last bar should be colored deep red, not light orange. I tried plugging in the raw (non-standardized) score values to solve the issue; however, doing so gives all bars the same green color... Since this is only my second Python project, I have a hard time comprehending the process behind this and would be very thankful for any advice or solution.
Cheers Neil
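You correctly identified that the normalization is the problem here. The linked code by valued SO user @ImportanceOfBeingEarnest divides by the maximum, which maps the values to the interval [0, 1] under the implicit assumption that the data range starts at 0. (Passing the raw scores made every bar green because a colormap called with floats expects values in [0, 1] and maps anything above 1 to its top color.) If you want another normalization range [normmin, normmax], you have to take this into account during the normalization: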
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "mycount", "score"])
# Not necessary to convert to lists, pandas series or numpy array is also fine
data_x = df.week
data_hight = df.mycount
data_color = df.score
#Create Barplot:
normmin=1
normmax=5
data_color = [(x-normmin) / (normmax-normmin) for x in data_color] #see the difference here
fig, ax = plt.subplots(figsize=(15, 4))
my_cmap = plt.get_cmap('RdYlGn')
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(normmin,normmax))
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax)
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Sample output:
Obviously, this does not check that all values are indeed within the range [normmin, normmax], so a better script would make sure that all values adhere to this specification. We could, alternatively, address this problem by clipping the values that are outside the normalization range:
#...
import numpy as np
#.....
#Create Barplot:
normmin=1
normmax=3.5
data_color = [(x-normmin) / (normmax-normmin) for x in np.clip(data_color, normmin, normmax)]
#....
You may also have noticed another change that I introduced. You don't have to provide lists; pandas Series or NumPy arrays are fine, too. And if your column names don't clash with pandas attributes or methods such as count, you can access them as df.ABC instead of df["ABC"].
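The rescaling and the clipping can also be delegated to matplotlib's Normalize class, so that the bars and the colorbar share one definition of the range. This is a minimal sketch, assuming the same data and the range [1, 5]; the colorbar label 'Average score' is my own choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30, 1.2]]
df = pd.DataFrame(data, columns=["week", "mycount", "score"])

# clip=True maps values outside [vmin, vmax] to the nearest end of the range
norm = Normalize(vmin=1, vmax=5, clip=True)
my_cmap = plt.get_cmap('RdYlGn')

# norm() rescales the scores to [0, 1]; the colormap turns them into RGBA rows
colors = my_cmap(norm(df["score"].to_numpy()))

fig, ax = plt.subplots(figsize=(15, 4))
ax.bar(df["week"], df["mycount"], color=colors)

# Reusing the same norm keeps the colorbar consistent with the bar colors
sm = ScalarMappable(cmap=my_cmap, norm=norm)
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label('Average score', rotation=270, labelpad=25)
```

With this range, a score of 1.2 is mapped to 0.05, i.e. close to the red end of RdYlGn. Call plt.show() afterwards to display the figure.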

seaborn plot_marginals multiple kdeplots

I would like to be able to plot multiple overlaid kde plots on the y axis margin (don't need the x axis margin plot). Each kde plot would correspond to the color category (there are 4) so that I would have 4 kde's each depicting the distribution of one of the categories. This is as far as I got:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
x = [106405611, 107148674, 107151119, 107159869, 107183396, 107229405, 107231917, 107236097,
107239994, 107259338, 107273842, 107275873, 107281000, 107287770, 106452671, 106471246,
106478110, 106494135, 106518400, 106539079]
y = np.array([ 9.09803208, 5.357552 , 8.98868469, 6.84549005,
8.17990909, 10.60640521, 9.89935692, 9.24079133,
8.97441459, 9.09803208, 10.63753055, 11.82336724,
7.93663794, 8.74819285, 8.07146236, 9.82336724,
8.4429435 , 10.53332973, 8.23361968, 10.30035256])
x1 = pd.Series(x, name="$V$")
x2 = pd.Series(y, name="$Distance$")
col = np.array([2, 4, 4, 1, 3, 4, 3, 3, 4, 1, 4, 3, 2, 4, 1, 1, 2, 2, 3, 1])
g = sns.JointGrid(x=x1, y=x2)
g = g.plot_joint(plt.scatter, c=col, edgecolor="black", cmap=plt.get_cmap('RdBu', 11))
cax = g.fig.add_axes([1, .25, .02, .4])
plt.colorbar(cax=cax, ticks=np.linspace(1,11,11))
g.plot_marginals(sns.kdeplot, color="black", shade=True)
To plot a distribution of each category, I think the best way is to first combine the data into a pandas dataframe. Then you can loop through each unique category by filtering the dataframe and plot the distribution using calls to sns.kdeplot.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
x = np.array([106405611, 107148674, 107151119, 107159869, 107183396, 107229405,
107231917, 107236097, 107239994, 107259338, 107273842, 107275873,
107281000, 107287770, 106452671, 106471246, 106478110, 106494135,
106518400, 106539079])
y = np.array([9.09803208, 5.357552 , 8.98868469, 6.84549005,
8.17990909, 10.60640521, 9.89935692, 9.24079133,
8.97441459, 9.09803208, 10.63753055, 11.82336724,
7.93663794, 8.74819285, 8.07146236, 9.82336724,
8.4429435 , 10.53332973, 8.23361968, 10.30035256])
col = np.array([2, 4, 4, 1, 3, 4, 3, 3, 4, 1, 4, 3, 2, 4, 1, 1, 2, 2, 3, 1])
# Combine data into DataFrame
df = pd.DataFrame({'V': x, 'Distance': y, 'col': col})
# Define colormap and create corresponding color palette
cmap = sns.diverging_palette(20, 220, as_cmap=True)
colors = sns.diverging_palette(20, 220, n=4)
# Plot data onto seaborn JointGrid
g = sns.JointGrid(x='V', y='Distance', data=df, ratio=2)
g = g.plot_joint(plt.scatter, c=df['col'], edgecolor="black", cmap=cmap)
# Loop through unique categories and plot individual kdes
for c in df['col'].unique():
    sns.kdeplot(df['Distance'][df['col']==c], ax=g.ax_marg_y, vertical=True,
                color=colors[c-1], shade=True)
    sns.kdeplot(df['V'][df['col']==c], ax=g.ax_marg_x, vertical=False,
                color=colors[c-1], shade=True)
This is, in my opinion, a much better and cleaner solution than my original answer, in which I needlessly redefined the seaborn kdeplot because I had not thought to do it this way. Thanks to mwaskom for pointing that out. Also note that the legend labels are removed in the posted solution, which is done using
g.ax_marg_x.legend_.remove()
g.ax_marg_y.legend_.remove()

Color seaborn boxplot based in DataFrame column name

I'd like to create a list of boxplots with the color of the box dependent on the name of the pandas.DataFrame column I use as input.
The column names contain strings that indicate an experimental condition based on which I want the box of the boxplot colored.
I do this to make the boxplots:
sns.boxplot(data = data.dropna(), orient="h")
plt.show()
This creates a beautiful list of boxplots with correct names. Now I want to give every boxplot that has 'prog +, DMSO+' in its name a red color, leaving the rest as blue.
I tried creating a dictionary with column names as keys and colors as values:
color = {}
for column in data.columns:
    if 'prog+, DMSO+' in column:
        color[column] = 'red'
    else:
        color[column] = 'blue'
And then using the dictionary as color:
sns.boxplot(data = data.dropna(), orient="h", color=color[column])
plt.show()
This does not work, understandably (there is no loop to go through the dictionary). So I make a loop:
for column in data.columns:
    sns.boxplot(data = data[column], orient='h', color=color[column])
plt.show()
This does make boxplots of different colors, but all on top of each other and without the correct labels. If I could somehow put these boxplots nicely in one plot below each other, I'd be almost at what I want. Or is there a better way?
You should use the palette parameter, which handles multiple colors, rather than color, which handles a specific one. You can give palette a name, an ordered list, or a dictionary. The latter seems best suited to your question:
import seaborn as sns
sns.set_color_codes()
tips = sns.load_dataset("tips")
pal = {day: "r" if day == "Sat" else "b" for day in tips.day.unique()}
sns.boxplot(x="day", y="total_bill", data=tips, palette=pal)
You can set the facecolor of individual boxes after plotting them all in one go, using ax.artists[i].set_facecolor('r'). (On matplotlib 3.5 and later, the boxes are stored in ax.patches instead of ax.artists.)
For example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(
[[2, 4, 5, 6, 1],
[4, 5, 6, 7, 2],
[5, 4, 5, 5, 1],
[10, 4, 7, 8, 2],
[9, 3, 4, 6, 2],
[3, 3, 4, 4, 1]
],columns=['bar', 'prog +, DMSO+ 1', 'foo', 'something', 'prog +, DMSO+ 2'])
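ax = sns.boxplot(data=df, orient='h')
# Boxes live in ax.patches on matplotlib >= 3.5 and in ax.artists on older versions
boxes = list(ax.patches) or list(ax.artists)
for i, box in enumerate(boxes):
    if 'prog +, DMSO+' in df.columns[i]:
        box.set_facecolor('r')
    else:
        box.set_facecolor('b')
plt.tight_layout()
plt.show()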
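The palette approach from the first answer can be combined with the column names from the question. This is a minimal sketch, assuming the data may be reshaped to long form first; the names condition, value and long_df are my own and not part of either answer.

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    [[2, 4, 5, 6, 1],
     [4, 5, 6, 7, 2],
     [5, 4, 5, 5, 1],
     [10, 4, 7, 8, 2],
     [9, 3, 4, 6, 2],
     [3, 3, 4, 4, 1]],
    columns=['bar', 'prog +, DMSO+ 1', 'foo', 'something', 'prog +, DMSO+ 2'])

# Map every column name to a color based on a substring match
pal = {col: 'red' if 'prog +, DMSO+' in col else 'blue' for col in df.columns}

# Long form: one row per (condition, value) pair
long_df = df.melt(var_name='condition', value_name='value')

# A categorical y variable gives horizontal boxes, one per condition
ax = sns.boxplot(data=long_df, x='value', y='condition', palette=pal)
```

Because the colors are looked up by name, the result does not depend on the order in which the boxes are drawn. Recent seaborn versions may emit a warning that palette is used without hue; call plt.show() from matplotlib.pyplot to display the figure.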
