I have two dictionaries: selected candidates and rejected candidates.
The structure is as shown below:
selected={"name":score} #same for rejected
I want to show the selected candidates in green and the rejected candidates in red.
How can I do that?
I tried it this way, but it gives me an unexpected result:
#Husain Shaikh
#test 3 matplotlib
import matplotlib.pyplot as plt
selected={"Husain":92, "Asim":65,"Chirag": 74 }
rejected={"Absar":70,"premraj":57}
plt.bar(range(len(selected)),list(selected.values()),color="green")
plt.xticks(range(len(selected)),list(selected.keys()))
plt.bar(range(len(rejected)),list(rejected.values()),color="red")
plt.xticks(range(len(rejected)),list(rejected.keys()))
plt.xlabel("Candidates")
plt.ylabel("Score")
plt.plot()
plt.show()
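You can try something like this. In your code both plt.bar calls start at x = 0, so the red bars are drawn over the green ones, and the second plt.xticks call overwrites the first. Shifting the rejected bars past the selected ones and setting all tick labels in one call avoids both problems: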
import matplotlib.pyplot as plt
selected={"Husain":92, "Asim":65,"Chirag": 74 }
rejected={"Absar":70,"premraj":57}
selected_candidates_number = len(selected)
rejected_candidates_number = len(rejected)
plt.bar(range(selected_candidates_number ),list(selected.values()),color="green")
plt.bar(range(selected_candidates_number,selected_candidates_number +rejected_candidates_number ),list(rejected.values()),color="red")
plt.xticks(range(selected_candidates_number +rejected_candidates_number), list(selected.keys()) + list(rejected.keys()))
plt.xlabel("Candidates")
plt.ylabel("Score")
plt.plot()
plt.show()
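As an alternative sketch (not taken from the answer above), the two dictionaries can be merged into parallel lists of labels, scores and colors, so that a single plt.bar call draws every bar. The helper names build_bar_spec and plot_candidates are made up for this illustration.

```python
def build_bar_spec(selected, rejected):
    """Return parallel lists of labels, scores and colors for one bar() call."""
    labels = list(selected) + list(rejected)
    scores = list(selected.values()) + list(rejected.values())
    colors = ["green"] * len(selected) + ["red"] * len(rejected)
    return labels, scores, colors


def plot_candidates(selected, rejected):
    # Imported here so build_bar_spec() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    labels, scores, colors = build_bar_spec(selected, rejected)
    positions = range(len(labels))
    plt.bar(positions, scores, color=colors)  # one color per bar
    plt.xticks(positions, labels)
    plt.xlabel("Candidates")
    plt.ylabel("Score")
    plt.show()
```

Calling plot_candidates(selected, rejected) with the dictionaries from the question should draw three green bars followed by two red ones.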
Here you go. I am only showing the relevant/modified part of the code
# Import numpy, matplotlib and data here
loc_s = np.arange(len(selected))+0.1 # Offsetting the tick-label location
loc_r = np.arange(len(rejected))-0.1 # Offsetting the tick-label location
xtick_loc = list(loc_s) + list(loc_r)
xticks = list(selected.keys())+ list(rejected.keys())
plt.bar(loc_s,list(selected.values()),color="green", width=0.2,label='Selected')
plt.bar(loc_r,list(rejected.values()),color="red", width=0.2,label='Rejected')
plt.xticks(xtick_loc, xticks, rotation=45)
# Labels and legend here
The following code plots a horizontal bar chart in a decreasing order. I would like to change the colors of the bars, so they fade out as the values decrease. In this case California will stay as it is, but Minnesota will be very light blue, almost transparent.
I know that I can manually hardcode the values in a list of colors, but is there a better way to achieve this?
x_state = df_top_states["Percent"].nlargest(10).sort_values(ascending=True).index
y_percent = df_top_states["Percent"].nlargest(10).sort_values(ascending=True).values
plt_size = plt.figure(figsize=(9,6))
plt.barh(x_state, y_percent)
plt.title("Top 10 States with the most number of accidents (2016 - 2021)", fontsize=16)
plt.ylabel("State", fontsize=13)
plt.yticks(size=13)
plt.xlabel("% of Total Accidents", fontsize=13)
plt.xticks(size=13)
plt.tight_layout()
plt.show()
You could create a list of colors with decreasing alpha from the list of percentages. Here is some example code:
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import seaborn as sns # to set the 'darkgrid' style
import pandas as pd
import numpy as np
sns.set_style('darkgrid')
# first, create some suitable test data
df_top_states = pd.DataFrame({"Percent": np.random.rand(20)**3},
index=["".join(np.random.choice([*'abcdef'], np.random.randint(3, 9))) for _ in range(20)])
df_top_states["Percent"] = df_top_states["Percent"] / df_top_states["Percent"].sum() * 100
df_largest10 = df_top_states["Percent"].nlargest(10).sort_values(ascending=True)
x_state = df_largest10.index
y_percent = df_largest10.values
max_perc = y_percent.max()
fig = plt.figure(figsize=(9, 6))
plt.barh(x_state, y_percent, color=[to_rgba('dodgerblue', alpha=perc / max_perc) for perc in y_percent])
plt.title("Top 10 States with the most number of accidents (2016 - 2021)", fontsize=16)
plt.ylabel("State", fontsize=13)
plt.yticks(size=13)
plt.xlabel("% of Total Accidents", fontsize=13)
plt.xticks(size=13)
plt.margins(y=0.02) # less vertical margin
plt.tight_layout()
plt.show()
PS: Note that plt.figure(...) returns a matplotlib figure, not some kind of size element.
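With alpha = perc / max_perc, bars with very small values can become almost invisible. A minimal sketch of one way around that, assuming positive values: add a lower bound to the alpha. The min_alpha parameter and the helper names scaled_alphas and faded_colors are assumptions for illustration, not part of the answer above.

```python
def scaled_alphas(values, min_alpha=0.15):
    """Map each value to an alpha in [min_alpha, 1], relative to the largest value."""
    largest = max(values)
    return [min_alpha + (1 - min_alpha) * v / largest for v in values]


def faded_colors(values, base_color="dodgerblue", min_alpha=0.15):
    # Imported lazily so scaled_alphas() works without matplotlib installed.
    from matplotlib.colors import to_rgba

    return [to_rgba(base_color, alpha=a) for a in scaled_alphas(values, min_alpha)]
```

The result of faded_colors(y_percent) could then be passed as the color argument of plt.barh.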
I generate two plots in seaborn which share y-axis. I'm wondering how I can make the shared y-axis labels center-aligned. I am looking for some ideas and improvements. The plot is attached.
import seaborn as sns
import matplotlib.pylab as plt
import numpy as np
import string
import random
labels = []
for i in range(10):
    labels.append(''.join(random.choices(string.ascii_lowercase, k=4)))
    labels.append(''.join(random.choices(string.ascii_lowercase, k=7)))
score = np.abs(np.random.randn(20))
fig, axes = plt.subplots(1,2 , figsize=(5,5 ), sharey=True )
for ii in range(2):
    ti = sns.barplot(y=[j for j in range(len(score))], x=score, ax=axes[ii],
                     orient='h')
    ti.set_yticklabels(labels)
    if ii == 0:
        ti.invert_xaxis()
        ti.yaxis.tick_right()
fig.tight_layout(w_pad=0, pad=1)
plt.show()
ti.set_yticklabels(labels) accepts text properties such as ha (horizontal alignment) and ma (alignment of multi-line text). I used ha together with the position argument, and the labels now appear centered. You can adjust these values further to get the exact layout you want; see the other text parameters in the matplotlib documentation.
Below is the updated code, showing only the part that changed:
for ii in range(2):
    ti = sns.barplot(y=[j for j in range(len(score))], x=score, ax=axes[ii],
                     orient='h')
    ti.set_yticklabels(labels, ha="center", position=(1.2, 0))  # ha and position adjustments.
    if ii == 0:
        ti.invert_xaxis()
        ti.yaxis.tick_right()
plt.tight_layout(w_pad=0, pad=2.5) # Some padding adjustments.
plt.show()
This is the color palette seaborn used by default when I colored the scatter points by a column with categorical values.
Is there a way to get the name or the colors of the palette being used?
I get this color scheme at first, but as soon as I use a different scheme for a Plotly chart, I can't get back to this palette for the same chart.
This is not the scheme returned by sns.color_palette. It could also be a matplotlib color scheme.
Minimal reproducible example
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
# download data
df = pd.read_csv("https://www.statlearning.com/s/Auto.csv")
df.head()
# remove rows with "?"
df.drop(df.index[~df.horsepower.str.isnumeric()], axis=0, inplace=True)
df['horsepower'] = pd.to_numeric(df.horsepower, errors='coerce')
# plot 1 (gives the desired color-palette)
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
# plot 2
# Converting column cylinder to factor before using for 'color'
df.cylinders = df.cylinders.astype('category')
# Scatter plot - Cylinders as hue
pal = ['#fdc086','#386cb0','#beaed4','#33a02c','#f0027f']
col_map = dict(zip(sorted(df.cylinders.unique()), pal))
fig = px.scatter(df, y='mpg', x='year', color='cylinders',
color_discrete_map=col_map,
hover_data=['name','origin'])
fig.update_layout(width=800, height=400, plot_bgcolor='#fff')
fig.update_traces(marker=dict(size=8, line=dict(width=0.2,color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.show()
# plot 1 run again
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
The specific palette you have mentioned is the cubehelix and you can get it using:
sns.cubehelix_palette()
You can get the colours using indexing:
sns.cubehelix_palette()[:]
# [[0.9312692223325372, 0.8201921796082118, 0.7971480974663592],
# [0.8559578605899612, 0.6418993116910497, 0.6754191211563135],
# [0.739734329496642, 0.4765280683170713, 0.5959617419736206],
# [0.57916573903086, 0.33934576125314425, 0.5219003947563425],
# [0.37894937987025, 0.2224702044652721, 0.41140014301575434],
# [0.1750865648952205, 0.11840023306916837, 0.24215989137836502]]
In general, check the official docs. When you need to look up one of seaborn's defaults (the only case in which you would not know what the palette is, since otherwise you defined it yourself), you can always check the source code on GitHub.
In your first graph, cylinders is a numeric variable of type int64, so seaborn uses a single color (purple in this case) and shades it to indicate the magnitude of the value, so that 8 cylinders is darker than 4. This is done on purpose, so you can easily tell the values apart by the shade of the color.
Once you convert the column to categorical halfway through, there is no longer such a relationship between the cylinder values: 8 cylinders is not twice as much as 4 cylinders anymore; they are essentially two totally different categories. To avoid associating the shade of the color with the magnitude of the variable (since the values are no longer numeric and the relationship no longer exists), a categorical color palette is used by default, in which every color is distinct from the others.
To solve your problem, you need to cast cylinders back to int64 before drawing your final chart:
df.cylinders = df.cylinders.astype('int64')
This restores the variable to a numeric one, which allows seaborn to use gradients of the same color to represent the size of the values, so your final plot will look just like the first one.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")
# download data
df = pd.read_csv("https://www.statlearning.com/s/Auto.csv")
df.head()
# remove rows with "?"
df.drop(df.index[~df.horsepower.str.isnumeric()], axis=0, inplace=True)
df['horsepower'] = pd.to_numeric(df.horsepower, errors='coerce')
# plot 1 (gives the desired color-palette)
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
# plot 2
# Converting column cylinder to factor before using for 'color'
df.cylinders = df.cylinders.astype('category')
# Scatter plot - Cylinders as hue
pal = ['#fdc086','#386cb0','#beaed4','#33a02c','#f0027f']
col_map = dict(zip(sorted(df.cylinders.unique()), pal))
fig = px.scatter(df, y='mpg', x='year', color='cylinders',
color_discrete_map=col_map,
hover_data=['name','origin'])
fig.update_layout(width=800, height=400, plot_bgcolor='#fff')
fig.update_traces(marker=dict(size=8, line=dict(width=0.2,color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.show()
# plot 1 run again
df.cylinders = df.cylinders.astype('int64')
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
One way to get it back is with sns.set(). But that doesn't tell us the name of the color scheme.
I am creating bar graphs for data that comes from series. However the names (x-axis values) are extremely long. If they are rotated 90 degrees it is impossible to read the entire name and get a good image of the graph. 45 degrees is not much better. I am looking for a way to label the x-axis by numbers 1-15 and then have a legend listing the names that correspond to each number.
This is the completed function I have so far, including creating the series from a larger dataframe
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch

def graph_average_expressions(TAD_matches, CAGE):
    """graphs the top 15 expression levels of each lncRNA"""
    for i, row in TAD_matches.iterrows():
        mask = (
            CAGE['short_description'].isin(row['peak_ID'])
        )  # finds expression level for peaks in an lncRNA
        average = CAGE[mask].iloc[:, 8:].mean(axis=0).astype('float32').sort_values().tail(n=15)
        # made a new df of the top 15 highest expression levels for all averaged groups
        # a group is the peaks that belong to the same lncRNA
        cell_type = list(average.index)
        expression = list(average.values)
        average_df = pd.DataFrame(
            list(zip(cell_type, expression)),
            columns=['cell_type', 'expression']
        )
        colors = sns.color_palette(
            'husl',
            n_colors=len(cell_type)
        )
        p = sns.barplot(
            x=average_df.index,
            y='expression',
            data=average_df,
            palette=colors
        )
        # map each cell type to its bar color and build the legend entries by hand
        cmap = dict(zip(average_df.cell_type, colors))
        patches = [Patch(color=v, label=k) for k, v in cmap.items()]
        plt.legend(
            handles=patches,
            bbox_to_anchor=(1.04, 0.5),
            loc='center left',
            borderaxespad=0
        )
        plt.title('expression_levels_of_lncRNA_' + row['lncRNA_name'])
        plt.xlabel('cell_type')
        plt.ylabel('expression')
        plt.show()
Here is an example of the data I am graphing
CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532 1.583428
Neutrophils_donor3.CNhs11905 1.832527
CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483 1.858384
CD14_monocytes_treated_with_Candida_donor1.CNhs13473 1.873013
CD14_Monocytes_donor2.CNhs11954 2.041607
CD14_monocytes_treated_with_Candida_donor2.CNhs13488 2.112112
CD14_Monocytes_donor3.CNhs11997 2.195365
CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469 2.974203
Eosinophils_donor3.CNhs12549 3.566822
CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470 3.685389
CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471 4.409062
CD14_monocytes_treated_with_Candida_donor3.CNhs13494 5.546789
CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492 5.673991
Neutrophils_donor1.CNhs10862 8.352045
Neutrophils_donor2.CNhs11959 11.595509
With the new code above this is the graph I get, but no legend or title.
Here is a somewhat different route: build a string that maps the x values to the names and add it to the figure.
I made my own DataFrame for illustration.
from matplotlib import pyplot as plt
import pandas as pd
import string,random
df = pd.DataFrame({'name':[''.join(random.sample(string.ascii_letters,15))
for _ in range(10)],
'data':[random.randint(1,20) for _ in range(10)]})
Make the plot.
fig,ax = plt.subplots()
ax.bar(df.index,df.data)
Make the legend.
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
Add the legend as a Text artist and adjust the plot to accommodate it.
t = ax.text(.7,.2,x_legend,transform=ax.figure.transFigure)
fig.subplots_adjust(right=.65)
plt.show()
plt.close()
That can be made dynamic by getting and using the Text artist's size and the Figure's size.
# using imports and DataFrame from above
fig,ax = plt.subplots()
r = fig.canvas.get_renderer()
ax.bar(df.index,df.data)
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
t = ax.text(0,.1,x_legend,transform=ax.figure.transFigure)
# find the width of the Text and place it on the right side of the Figure
twidth = t.get_window_extent(renderer=r).width
*_,fwidth,fheight = fig.bbox.extents
tx,ty = t.get_position()
tx = .95 - (twidth/fwidth)
t.set_position((tx,ty))
# adjust the right edge of the plot/Axes
ax_right = tx - .05
fig.subplots_adjust(right=ax_right)
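The question asks for the bars to be numbered 1-15 rather than from 0. A minimal sketch of that variation, reusing the fixed text position from the first example above; the helper names numbered_legend and plot_with_numbered_legend are hypothetical.

```python
def numbered_legend(names, start=1):
    """Return tick numbers and a multi-line 'number - name' legend string."""
    numbers = list(range(start, start + len(names)))
    text = "\n".join(f"{n} - {name}" for n, name in zip(numbers, names))
    return numbers, text


def plot_with_numbered_legend(names, values):
    # Imported lazily so numbered_legend() works without matplotlib installed.
    import matplotlib.pyplot as plt

    numbers, text = numbered_legend(names)
    fig, ax = plt.subplots()
    ax.bar(numbers, values)
    ax.set_xticks(numbers)  # show every number, 1..len(names)
    # Place the text in figure coordinates and leave room for it on the right.
    ax.text(0.7, 0.2, text, transform=fig.transFigure)
    fig.subplots_adjust(right=0.65)
    plt.show()
```

For long names such as the cell types in the question, the text position and the right margin would probably need adjusting, for example with the dynamic approach shown above.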
Set up the DataFrame:
Verify that the index of the DataFrame to be plotted has been reset, so that it consists of integers beginning at 0, and use the index as the x-axis.
Plot the values on the y-axis.
Option 1A: Seaborn hue
The easiest way is probably to use seaborn.barplot and use the hue parameter with the 'names'
Seaborn: Choosing color palettes
This plot is using husl
Additional options for the husl palette can be found at seaborn.husl_palette
The bars will not be centered for this option, because they are placed according to the number of hue levels, and there are 15 levels in this case.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# plt styling parameters
# (on Matplotlib 3.6+ this style is named 'seaborn-v0_8')
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, hue='names')
# place the legend to the right of the plot
plt.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 1B: Seaborn palette
Using the palette parameter instead of hue, places the bars directly over the ticks.
This option requires "manually" associating 'names' with the colors and creating the legend.
patches uses Patch to create each item in the legend. (e.g. the rectangle, associated with color and name).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, palette=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 2: pandas.DataFrame.plot
This option also requires "manually" associating 'names' with the palette and creating the legend using Patch.
Choosing Colormaps in Matplotlib
This plot is using tab20c
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
# plt styling parameters
# (on Matplotlib 3.6+ this style is named 'seaborn-v0_8')
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# choose a color map with enough colors for the number of bars
colors = [plt.cm.tab20c(np.arange(len(df)))]
# plot the dataframe
df.plot.bar(color=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors[0]))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Reproducible DataFrame
data = {'names': ['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532', 'Neutrophils_donor3.CNhs11905', 'CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483', 'CD14_monocytes_treated_with_Candida_donor1.CNhs13473', 'CD14_Monocytes_donor2.CNhs11954', 'CD14_monocytes_treated_with_Candida_donor2.CNhs13488', 'CD14_Monocytes_donor3.CNhs11997', 'CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469', 'Eosinophils_donor3.CNhs12549', 'CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470', 'CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471', 'CD14_monocytes_treated_with_Candida_donor3.CNhs13494', 'CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492', 'Neutrophils_donor1.CNhs10862', 'Neutrophils_donor2.CNhs11959'],
'values': [1.583428, 1.832527, 1.858384, 1.873013, 2.041607, 2.112112, 2.195365, 2.974203, 3.566822, 3.685389, 4.409062, 5.546789, 5.673991, 8.352045, 11.595509]}
df = pd.DataFrame(data)
Is it possible to draw a bar plot without the y-axis (or x-axis)? In presentations all redundant information has to be removed, so I would like to start by deleting the axes. I did not find any helpful information in the matplotlib documentation. Maybe there are better solutions than pyplot?
Edit: I would like to have lines around the bars except the axis at the bottom. Is this possible?
#!/usr/bin/env python
import matplotlib.pyplot as plt
ind = (1,2,3)
width = 0.8
fig = plt.figure(1)
p1 = plt.bar(ind,ind)
# plt.show()
fig.savefig("test.svg")
Edit: When using plt.show(), I did not notice that the y-axis is still there, just without ticks.
To make the axes not visible, try something like
import matplotlib.pyplot as plt
ind = (1,2,3)
width = 0.8
fig,a = plt.subplots()
p1 = a.bar(ind,ind)
a.xaxis.set_visible(False)
a.yaxis.set_visible(False)
plt.show()
Is this what you meant?
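If the goal is to keep outlines around the bars while removing most of the frame, one possible sketch is to hide individual spines instead of whole axes. This is one reading of the question's edit, not a confirmed requirement: it keeps only the bottom axis line and outlines each bar. The function name bare_bar_chart is made up for this example.

```python
import matplotlib.pyplot as plt


def bare_bar_chart(positions, heights):
    """Draw outlined bars with only the bottom axis line left visible."""
    fig, ax = plt.subplots()
    ax.bar(positions, heights, edgecolor="black", linewidth=1)
    # Hide the frame lines on three sides; the bottom spine stays.
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.yaxis.set_visible(False)  # hides y ticks and tick labels
    return fig, ax
```

A call such as fig, ax = bare_bar_chart((1, 2, 3), (1, 2, 3)) followed by fig.savefig("test.svg") would mirror the question's example; adding "bottom" to the tuple of hidden sides would remove the last axis line as well.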
Here is the code I used at the end. It is not minimal anymore. Maybe it helps.
import matplotlib.pyplot as plt
import numpy as np
def adjust_spines(ax, spines):
    for loc, spine in ax.spines.items():
        if loc in spines:
            # keep this spine as it is
            # (Spine.set_smart_bounds() was removed in Matplotlib 3.4)
            continue
        else:
            spine.set_color('none')  # don't draw spine
    # turn off ticks where there is no spine
    if 'left' in spines:
        ax.yaxis.set_ticks_position('left')
    else:
        # no yaxis ticks
        ax.yaxis.set_ticks([])

def nbar(samples, data, err, bWidth=0.4, bSafe=True, svgName='out'):
    fig, a = plt.subplots(frameon=False)
    if len(data) != len(samples):
        print("length(data) must be equal to length(samples)!")
        return
    ticks = np.arange(len(data))
    p1 = plt.bar(ticks, data, bWidth, yerr=err)
    # bars are centered on their x positions by default (Matplotlib 2.0+),
    # so the tick labels go at the same positions
    plt.xticks(ticks, samples)
    adjust_spines(a, ['bottom'])
    a.xaxis.tick_bottom()
    if bSafe:
        fig.savefig(svgName + ".svg")

samples = ('Sample1', 'Sample2','Sample3')
qyss = (91, 44, 59)
qysserr = (1,5,4)
nbar(samples,qyss,qysserr,svgName="test")
Thanks to all contributors.