Fitting a large number of bars into a matplotlib barh graph - python

I'm trying to make a horizontal bar graph with a large number of elements/bars with matplotlib's barh function. However, I'm having a couple of problems with bars being too close together and their labels being illegible (see image below):
I first tried changing the figure size, setting figsize=(10,40) and increasing the height up from 40, to no avail.
I also tried bumping up the spacing between bars from 0.2 to 0.3 (in the positions list), but it seems that going any higher than a spacing of 0.2 makes some of the bars disappear. In other words, there seem to be clusters of ~5 bars that are too close together that get spaced properly at 0.3, but all the bars between these clusters disappear.
The code is shown below (adapted from the mpl docs/examples). I'm sure there's a rather easy fix here that I'm just too much of a novice to see. Alternatively, I could try graphing this in MATLAB, but I prefer Python for quality and simplicity. What could I change to make my bar graph legible?
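Code:
import os
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# pos_genus (the Gram-positive genera) and paths are defined elsewhere and are not shown here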
genus = {'Parasutterella': 1, 'Anaerobaculum': 1, 'Clostridiales': 1, 'Butyrivibrio': 1, 'Anaerococcus': 1, 'Neisseria': 1, 'Campylobacter': 1, 'Intestinibacter': 1, 'Erysipelatoclostridium': 1, 'Tannerella': 1, 'Barnesiella': 1, 'Enterobacter': 1, 'Odoribacter': 1, 'Arcobacter': 1, 'Dialister': 1, 'Alistipes': 1, 'Collinsella': 2, 'Synergistes': 2, 'Burkholderiales': 2, 'Gordonibacter': 2, 'Tyzzerella': 2, 'Providencia': 2, 'Weissella': 2, 'Enterobacteriaceae': 2, 'Flavonifractor': 2, 'Prevotella': 2, 'Klebsiella': 2, 'Citrobacter': 2, 'Actinomyces': 2, 'Proteus': 2, 'Catenibacterium': 2, 'Propionibacterium': 2, 'Mitsuokella': 2, 'butyrate-producing': 2, 'Parvimonas': 2, 'Phascolarctobacterium': 2, 'Desulfovibrio': 2, 'Cedecea': 2, 'Finegoldia': 2, 'Slackia': 3, '[Bacteroides]': 3, 'Hafnia': 3, 'Acidaminococcus': 3, 'Bifidobacterium': 3, 'Sutterella': 3, 'Anaerofustis': 3, 'Paraprevotella': 3, 'Oxalobacter': 3, 'Yokenella': 3, 'Leuconostoc': 3, 'Dermabacter': 3, 'Megamonas': 4, 'Staphylococcus': 4, 'Fusobacterium': 4, 'Anaerostipes': 4, 'Bilophila': 4, 'Butyricicoccus': 4, 'Parabacteroides': 4, 'Erysipelotrichaceae': 4, 'Anaerotruncus': 4, 'Listeria': 4, 'Corynebacterium': 5, 'Pseudoflavonifractor': 5, 'Dorea': 5, 'Streptococcus': 6, 'Roseburia': 6, 'Helicobacter': 6, 'Eggerthella': 6, 'Acinetobacter': 6, '[Clostridium': 6, 'Ruminococcaceae': 6, 'Dysgonomonas': 6, '[Eubacterium]': 6, 'Enterococcus': 6, 'Subdoligranulum': 7, 'Faecalibacterium': 7, 'Blautia': 8, 'Holdemania': 8, 'Bacteroides': 8, 'Marvinbryantia': 8, 'Coprococcus': 9, 'Eubacterium': 9, 'Lactobacillus': 9, 'Paenisporosarcina': 9, 'Turicibacter': 9, 'Ruminococcus': 10, 'Coprobacillus': 11, 'Ralstonia': 11, 'Peptoclostridium': 11, 'Pseudomonas': 13, 'Desulfitobacterium': 14, 'Bacillus': 15, 'Streptomyces': 26, '[Clostridium]': 29, 'Paenibacillus': 32, 'Lachnospiraceae': 32, 'Clostridium': 35}
barWidth = 0.125
labels = list(genus.keys())
cols = len(labels)
bars = []
positions = [(i+1)*0.2 for i in range(cols)]
for key in labels:
    bars.append(genus[key])
fig,ax = plt.subplots()
rects = []
for i in range(len(bars)):
    if labels[i] in pos_genus:
        rects.append(ax.barh(y=positions[i], width=bars[i], height=barWidth, color='#000000',label='Gram Positive'))
    else:
        rects.append(ax.barh(y=positions[i], width=bars[i], height=barWidth, color='#E8384F',label='Gram Negative'))
ax.set_title('Genus')
ax.set_yticks(positions)
ax.set_yticklabels(labels)
ax.set_ylabel('Genus')
ax.set_xlabel('Number of Organisms')
#ax.set_ylim(positions[0]-barWidth,positions[-1]+barWidth)
ax.set_xlim(0,40)
blk_patch = mpatches.Patch(color='#000000', label='Gram Positive')
red_patch = mpatches.Patch(color='#E8384F', label='Gram Negative')
plt.legend(handles=[blk_patch, red_patch])
#plt.figure(figsize=(10,50))
bar_path = os.path.join(paths['Figures'], "{0}_horiz_bar.png".format(str('genus')))
plt.savefig(bar_path,dpi=300,bbox_inches='tight')
plt.show()
Illegible barh plot:
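One thing that may be worth trying, sketched here with a shortened subset of the data: give each bar one unit of vertical space (integer positions) and pass a figsize whose height grows with the number of bars directly to plt.subplots. The commented-out plt.figure(figsize=...) call in the code above would open a new, empty figure rather than resize the one being drawn, which may be why changing the size seemed to have no effect. The 0.25 inch per bar, the 3 inch minimum and the font size are assumptions to tune, not recommended values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Shortened subset of the genus counts from the question, for illustration only.
genus = {"Neisseria": 1, "Prevotella": 2, "Hafnia": 3, "Listeria": 4,
         "Dorea": 5, "Blautia": 8, "Bacillus": 15, "Clostridium": 35}

labels = list(genus)
counts = [genus[name] for name in labels]
positions = list(range(len(labels)))  # one unit of y-space per bar

# Assumption: about 0.25 inch per bar, with a minimum height of 3 inches.
fig_height = max(3.0, 0.25 * len(labels))
fig, ax = plt.subplots(figsize=(10, fig_height))

bars = ax.barh(positions, counts, height=0.8, color="#E8384F")
ax.set_yticks(positions)
ax.set_yticklabels(labels, fontsize=8)
ax.set_ylim(-0.5, len(labels) - 0.5)  # trim the empty margin above and below
ax.set_xlabel("Number of Organisms")
fig.tight_layout()
```

With the full dictionary of 97 genera the same rule gives a figure roughly 24 inches tall, which can then be saved with savefig as in the original code.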

Related

A way to update figure layout in a for loop for each subplot (Plotly)

Is there a way I can update each figure's layout in a loop like this? I added each layout to a list and am looping through each but can't seem to update the figures in the subplot:
# Data Visualization
from plotly.subplots import make_subplots
import plotly.graph_objects as go
epoch_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
loss_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
val_loss_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
error_rate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
val_error_rate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
layout_list = []
loss_plots = [go.Scatter(x=epoch_list,
                         y=loss_list,
                         mode='lines',
                         name='Loss',
                         line=dict(width=4)),
              go.Scatter(x=epoch_list,
                         y=val_loss_list,
                         mode='lines',
                         name='Validation Loss',
                         line=dict(width=4))]
loss_layout = dict(font_color='black',
                   title_font_color='black',
                   title=dict(text='Loss Graph',
                              font_size=30),
                   xaxis_title=dict(text='Epochs',
                                    font_size=25),
                   yaxis_title=dict(text='Loss',
                                    font_size=25),
                   legend=dict(font_size=15))
loss_figure = go.Figure(data=loss_plots)
layout_list.append(loss_layout)
error_plots = [go.Scatter(x=epoch_list,
                          y=error_rate,
                          mode='lines',
                          name='Error Rate',
                          line=dict(width=4)),
               go.Scatter(x=epoch_list,
                          y=val_error_rate,
                          mode='lines',
                          name='Validation Error Rate',
                          line=dict(width=4))]
error_rate_layout = dict(font_color='black',
                         title_font_color='black',
                         title=dict(text='Error Rate Graph',
                                    font_size=30),
                         xaxis_title=dict(text='Epochs',
                                          font_size=25),
                         yaxis_title=dict(text='Error Rate',
                                          font_size=25),
                         legend=dict(font_size=15))
error_figure = go.Figure(data=error_plots)
layout_list.append(error_rate_layout)
metric_figure = make_subplots(
    rows=3, cols=2,
    specs=[[{}, {}],
           [{}, {}],
           [{}, {}]])
for t in loss_figure.data:
    metric_figure.append_trace(t, row=1, col=1)
for t in error_figure.data:
    metric_figure.append_trace(t, row=1, col=2)
for (figure, layout) in zip(metric_figure, layout_list):
    figure.update_layout(layout)
metric_figure.show()
It seems that doing this doesn't work either as the layout does not transfer over because I am looping through the traces only:
loss_figure = go.Figure(data=loss_plots, layout=loss_layout)
You can use Python dict merging techniques:
metric_figure.update_layout({**loss_layout, **error_rate_layout})
Alternatively, if the layouts are stored in figures:
metric_figure.update_layout({**loss_figure.to_dict()["layout"], **error_figure.to_dict()["layout"]})
Both of these are of limited use, because sub-plot layouts differ significantly from those of individual figures. A sub-plot figure has its own x-axis and y-axis definitions (one pair per sub-plot), and where dictionary keys overlap only one value can be used, for example title.

Specify specific value in the plot

How can I plot only the rows where the data column equals check?
Data
data    x-axis  y-axis  result
abc     2       1       negative
abc     3       1       negative
check   1       1       positive
abc     4       1       positive
Code
import matplotlib.pyplot as plt
import seaborn as sns
ax1 = sns.scatterplot(data=df, x="x-axis", y="y-axis", hue="result", markers='x', s=950, label=None)
#ax1.set(xlabel=None, ylabel=None, xticklabels=[], yticklabels=[])
ax1.set_yticks((0, 1, 2, 3, 4, 5, 6, 7, 8), minor=False)
ax1.set_xticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], minor=False)
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.setp(ax1.get_legend().get_texts(), fontsize='14') # for legend text
ax1.plot()
Plot
Use df[df['data']=='check'] to select only those rows, and pass the result as the data when plotting.
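A minimal sketch of that boolean-mask selection, with the table from the question rebuilt by hand (the variable name check_rows is illustrative):

```python
import pandas as pd

# The sample table from the question.
df = pd.DataFrame({
    "data": ["abc", "abc", "check", "abc"],
    "x-axis": [2, 3, 1, 4],
    "y-axis": [1, 1, 1, 1],
    "result": ["negative", "negative", "positive", "positive"],
})

# Boolean mask: True only for rows whose "data" column equals "check".
check_rows = df[df["data"] == "check"]
print(check_rows)

# The filtered frame can then be passed to the plotting call in place of df,
# e.g. sns.scatterplot(data=check_rows, x="x-axis", y="y-axis", hue="result")
```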

Textposition not displaying on plotly

I am trying to plot the accuracy of the training and test set of my neural network using plotly.
I also want to add a marker at the maximum value of each curve, with text showing what that value is. I tried doing something like in this example.
Here is my MCVE:
import plotly.graph_objects as go
data = {
    'test acc': [1, 2, 3, 4, 5, 6, 7, 9, 10],
    'train acc': [3, 5, 5, 6, 7, 8, 9, 10, 8]
}
fig = go.Figure()
color_train = 'rgb(255, 0, 0)'
color_test = 'rgb(0, 255, 0)'
assert len(data["train acc"]) == len(data["test acc"])
x = list(range(len(data["train acc"])))
fig.add_trace(go.Scatter(x=x,
                         y=data["train acc"],
                         mode='lines',
                         name='train acc',
                         line_color=color_train))
fig.add_trace(go.Scatter(x=x,
                         y=data["test acc"],
                         mode='lines',
                         name='test acc',
                         line_color=color_test))
# Max points
train_max = max(data["train acc"])
test_max = max(data["test acc"])
# ATTENTION! this will only give you first occurrence
train_max_index = data["train acc"].index(train_max)
test_max_index = data["test acc"].index(test_max)
fig.add_trace(go.Scatter(x=[train_max_index],
                         y=[train_max],
                         mode='markers',
                         name='max value train',
                         text=['{}%'.format(int(train_max * 100))],
                         textposition="top center",
                         marker_color=color_train))
fig.add_trace(go.Scatter(x=[test_max_index],
                         y=[test_max],
                         mode='markers',
                         name='max value test',
                         text=['{}%'.format(int(test_max*100))],
                         textposition="top center",
                         marker_color=color_test))
fig.update_layout(title='Train vs Test accuracy',
                  xaxis_title='epochs',
                  yaxis_title='accuracy (%)'
                  )
fig.show()
However, my output figure is the following:
As you can see, the value is not being displayed as in the example I found.
How can I make it appear?
If you'd only like to highlight a few certain values, use add_annotation(). In your case just find the max and min Y for the X that you'd like to put into focus. Lacking a data sample from your side, here's how I'd do it with a generic data sample:
Plot:
Code:
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='browser'
fig = go.Figure()
xVars1=[0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars1=[0, 1, 3, 2, 4, 3, 4, 6, 5]
xVars2=[0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars2=[0, 4, 5, 1, 2, 2, 3, 4, 2]
fig.add_trace(go.Scatter(
    x=xVars1,
    y=yVars1
))
fig.add_trace(go.Scatter(
    x=xVars2,
    y=yVars2
))
fig.add_annotation(
    x=yVars1.index(max(yVars1)),
    y=max(yVars1),
    text="yVars1 max")
fig.add_annotation(
    x=yVars2.index(max(yVars2)),
    y=max(yVars2),
    text="yVars2 max")
fig.update_annotations(dict(
    xref="x",
    yref="y",
    showarrow=True,
    arrowhead=7,
    ax=0,
    ay=-40
))
fig.update_layout(showlegend=False)
fig.show()

How to create properly filled lines in Plotly when there are data gaps

Based on https://plot.ly/python/line-charts/#filled-lines, one can run the code below
import plotly.graph_objects as go
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x_rev = x[::-1]
y = [5, 2.5, 5, 7.5, 5, 2.5, 7.5, 4.5, 5.5, 5]
y_upper = [5.5, 3, 5.5, 8, 6, 3, 8, 5, 6, 5.5]
y_lower = [4.5, 2, 4.4, 7, 4, 2, 7, 4, 5, 4.75]
y_lower_rev = y_lower[::-1]
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x, y=y,
    line_color='rgb(0,176,246)',
    name='Mid line',
))
fig.add_trace(go.Scatter(
    x=x+x_rev,
    y=y_upper+y_lower_rev,
    fill='toself',
    fillcolor='rgba(0,176,246,0.2)',
    line_color='rgba(255,255,255,0)',
    name='Filled lines working properly',
))
fig.update_traces(mode='lines')
fig.show()
And successfully get the plot below
However in case there are data gaps, the filled portions do not seem to work properly (e.g. first and second connected component), at least with the code tried below.
What is the right way/code to have both data gaps and filled lines?
x_for_gaps_example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
x_for_gaps_example_rev = x_for_gaps_example[::-1]
y_with_gaps =[5, 15, None, 10, 5, 0, 10, None, 15, 5, 5, 10, 20, 15, 5]
y_upper_with_gaps = [i+1 if i is not None else None for i in y_with_gaps]
y_lower_with_gaps = [i-2 if i is not None else None for i in y_with_gaps][::-1]
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_for_gaps_example,
    y=y_with_gaps,
    name='Mid Line with <b>Gaps</b>'
))
fig.add_trace(go.Scatter(
    x=x_for_gaps_example+x_for_gaps_example_rev,
    y=y_upper_with_gaps+y_lower_with_gaps,
    fill='toself',
    fillcolor='rgba(0,176,246,0.2)',
    line_color='rgba(255,255,255,0)',
    name='Filled Lines not working properly with <b>gaps</b>'
))
fig.show()
It seems to be quite an old plotly bug:
Refer to:
https://github.com/plotly/plotly.js/issues/1132
and:
https://community.plot.ly/t/scatter-line-plot-fill-option-fills-gaps/21264
One solution might be to break your whole filling trace down into multiple pieces and add them to the figure one by one. However, this might be a bit complicated, because it requires extra computation to determine where each filled area begins and ends.
You can actually improve your chart a bit by setting the connectgaps property to true, which results in this:
But, that looks somewhat weird ;)
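The idea of breaking the filled trace into pieces, mentioned above, can be sketched in plain Python: split the series into runs of consecutive non-None points and build one closed outline per run. The helper name band_polygons is hypothetical, and the +1/-2 offsets are taken from the question's example; each returned (xs, ys) pair could then be added as its own go.Scatter(x=xs, y=ys, fill='toself', ...) trace.

```python
def band_polygons(x, y, upper_offset=1, lower_offset=2):
    """Return one closed (xs, ys) outline per run of consecutive non-None y values."""
    polygons = []
    run = []  # (x, y) points of the current gap-free run

    def flush():
        if run:
            xs = [px for px, _ in run]
            upper = [py + upper_offset for _, py in run]
            lower = [py - lower_offset for _, py in run]
            # Walk along the upper edge, then back along the lower edge.
            polygons.append((xs + xs[::-1], upper + lower[::-1]))
            run.clear()

    for px, py in zip(x, y):
        if py is None:
            flush()      # a gap ends the current run
        else:
            run.append((px, py))
    flush()              # close the final run
    return polygons


x = list(range(1, 16))
y = [5, 15, None, 10, 5, 0, 10, None, 15, 5, 5, 10, 20, 15, 5]
polygons = band_polygons(x, y)
print(len(polygons))  # 3 gap-free runs
```

Because every outline is closed on its own, no fill can leak across a gap; the cost is one extra trace per run, which may clutter the legend unless showlegend is turned off for them.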

calculate histogram peaks in python

In Python, how do I calculate the peaks of a histogram?
I tried this:
import numpy as np
from scipy.signal import argrelextrema
data = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 1, 2, 3, 4,
5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9,
12,
15, 16, 17, 18, 19, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24,]
h = np.histogram(data, bins=[0, 5, 10, 15, 20, 25])
hData = h[0]
peaks = argrelextrema(hData, np.greater)
But the result was:
(array([3]),)
I'd expect it to find the peaks in bin 0 and bin 3.
Note that the peaks span more than 1 bin. I don't want a peak that spans more than 1 bin to be counted as several peaks.
I'm open to another way to get the peaks.
Note:
>>> h[0]
array([19, 15, 1, 10, 5])
>>>
In computational topology, the formalism of persistent homology provides a definition of "peak" that seems to address your need. In the 1-dimensional case the peaks are illustrated by the blue bars in the following figure:
A description of the algorithm is given in this Stack Overflow answer to a peak detection question.
The nice thing is that this method not only identifies the peaks but also quantifies their "significance" in a natural way.
A simple and efficient implementation (as fast as sorting numbers) and the source material for the above answer are given in this blog article:
https://www.sthu.org/blog/13-perstopology-peakdetection/index.html
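To make the idea concrete, here is a rough sketch of 1-dimensional persistence written from the general description above. It is not the implementation from the linked article, and the function name persistent_peaks is hypothetical. Values are visited from highest to lowest; a value with no already-visited neighbour starts a new peak, and when two peaks meet, the lower one "dies" with a persistence equal to its height minus the height at which they merged.

```python
def persistent_peaks(values):
    """Return (index, persistence) for each peak of a 1-D sequence, most persistent first."""
    parent = {}       # union-find forest; each root is the peak index of its component
    persistence = {}  # peak index -> height difference at which the peak was merged away

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sweep from the highest value to the lowest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for i in order:
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            persistence[i] = float("inf")  # a new peak is born; it is alive so far
            continue
        # Join the neighbouring components; the one with the highest peak survives.
        survivor = max(roots, key=lambda r: values[r])
        parent[i] = survivor
        for r in roots - {survivor}:
            persistence[r] = values[r] - values[i]  # birth height minus merge height
            parent[r] = survivor
    return sorted(persistence.items(), key=lambda kv: kv[1], reverse=True)


print(persistent_peaks([19, 15, 1, 10, 5]))  # [(0, inf), (3, 9)]
```

For the histogram counts in the question this reports bin 0 (the global maximum, which never dies) and bin 3 with persistence 9. A run of equal neighbouring values is absorbed into one component, so it counts as a single peak, and thresholding on persistence gives a way to discard insignificant peaks.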
Try the findpeaks library.
pip install findpeaks
# Your input data:
data = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,]
# import library
from findpeaks import findpeaks
# Find some peaks using the smoothing parameter.
fp = findpeaks(lookahead=1, interpolate=10)
# fit
results = fp.fit(data)
# Make plot
fp.plot()
# Results with respect to original input data.
results['df']
# Results based on interpolated smoothed data.
results['df_interp']
I wrote an easy function:
import numpy as np
def find_peaks(a):
    x = np.array(a)
    max_val = np.max(x)
    length = len(a)
    ret = []
    for i in range(length):
        ispeak = True
        # compare with the left neighbour, if there is one
        if i-1 >= 0:
            ispeak &= (x[i] > 1.8 * x[i-1])
        # compare with the right neighbour, if there is one
        if i+1 < length:
            ispeak &= (x[i] > 1.8 * x[i+1])
        ispeak &= (x[i] > 0.05 * max_val)
        if ispeak:
            ret.append(i)
    return ret
I defined a peak as a value bigger than 180% of each of its neighbors and bigger than 5% of the max value. Of course you can adapt these values as you prefer in order to find the best setup for your problem.
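For the histogram in the question, the peak in bin 0 is most likely missed by argrelextrema because an edge bin has no left neighbour to exceed. A minimal sketch of a workaround using only NumPy: pad the counts with -inf on both sides and apply the same strict greater-than-both-neighbours test. The variable names are illustrative.

```python
import numpy as np

counts = np.array([19, 15, 1, 10, 5])  # h[0] from the question

# Pad with -inf so the first and last bins have a (lower) neighbour to compare against.
padded = np.concatenate(([-np.inf], counts, [-np.inf]))

inner = padded[1:-1]
is_peak = (inner > padded[:-2]) & (inner > padded[2:])  # strictly above both neighbours
peaks = np.flatnonzero(is_peak)
print(peaks)  # [0 3]
```

Note that a plateau of exactly equal counts would not be reported by this strict test, so it suits histograms like the one above where neighbouring bins differ.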
