Specify specific value in the plot - python

How can I plot only the rows where the data column is check?
Data
data   x-axis  y-axis  result
abc    2       1       negative
abc    3       1       negative
check  1       1       positive
abc    4       1       positive
Code
ax1 = sns.scatterplot(data=df, x="x-axis", y="y-axis", hue="result", markers='x', s=950, label=None)
#ax1.set(xlabel=None, ylabel=None, xticklabels=[], yticklabels=[])
ax1.set_yticks((0, 1, 2, 3, 4, 5, 6, 7, 8), minor=False)
ax1.set_xticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], minor=False)
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.setp(ax1.get_legend().get_texts(), fontsize='14') # for legend text
ax1.plot()
Plot

Use df[df['data'] == 'check'] to select only those rows, and pass the result as the data when plotting.
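A minimal sketch of that filter, assuming the table from the question is loaded into a pandas DataFrame (the name check_rows is introduced here for illustration only):

```python
import pandas as pd

# Rebuild the sample table from the question.
df = pd.DataFrame({
    "data": ["abc", "abc", "check", "abc"],
    "x-axis": [2, 3, 1, 4],
    "y-axis": [1, 1, 1, 1],
    "result": ["negative", "negative", "positive", "positive"],
})

# Boolean mask: True only where the data column equals "check".
check_rows = df[df["data"] == "check"]
print(check_rows)
```

The filtered frame can then be passed in place of df, for example sns.scatterplot(data=check_rows, ...), with the other arguments unchanged.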

Related

A way to update figure layout in a for loop for each subplot (Plotly)

Is there a way I can update each figure's layout in a loop like this? I added each layout to a list and am looping through each but can't seem to update the figures in the subplot:
# Data Visualization
from plotly.subplots import make_subplots
import plotly.graph_objects as go
epoch_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
loss_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
val_loss_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
error_rate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
val_error_rate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
layout_list = []
loss_plots = [go.Scatter(x=epoch_list,
                         y=loss_list,
                         mode='lines',
                         name='Loss',
                         line=dict(width=4)),
              go.Scatter(x=epoch_list,
                         y=val_loss_list,
                         mode='lines',
                         name='Validation Loss',
                         line=dict(width=4))]
loss_layout = dict(font_color='black',
                   title_font_color='black',
                   title=dict(text='Loss Graph',
                              font_size=30),
                   xaxis_title=dict(text='Epochs',
                                    font_size=25),
                   yaxis_title=dict(text='Loss',
                                    font_size=25),
                   legend=dict(font_size=15))
loss_figure = go.Figure(data=loss_plots)
layout_list.append(loss_layout)
# Error-rate traces use error_rate / val_error_rate, which were defined above but unused
error_plots = [go.Scatter(x=epoch_list,
                          y=error_rate,
                          mode='lines',
                          name='Error Rate',
                          line=dict(width=4)),
               go.Scatter(x=epoch_list,
                          y=val_error_rate,
                          mode='lines',
                          name='Validation Error Rate',
                          line=dict(width=4))]
error_rate_layout = dict(font_color='black',
                         title_font_color='black',
                         title=dict(text='Error Rate Graph',
                                    font_size=30),
                         xaxis_title=dict(text='Epochs',
                                          font_size=25),
                         yaxis_title=dict(text='Error Rate',
                                          font_size=25),
                         legend=dict(font_size=15))
error_figure = go.Figure(data=error_plots)
layout_list.append(error_rate_layout)
metric_figure = make_subplots(
    rows=3, cols=2,
    specs=[[{}, {}],
           [{}, {}],
           [{}, {}]])
for t in loss_figure.data:
    metric_figure.append_trace(t, row=1, col=1)
for t in error_figure.data:
    metric_figure.append_trace(t, row=1, col=2)
for (figure, layout) in zip(metric_figure, layout_list):
    figure.update_layout(layout)
metric_figure.show()
It seems that doing this doesn't work either as the layout does not transfer over because I am looping through the traces only:
loss_figure = go.Figure(data=loss_plots, layout=loss_layout)
You can use Python dict-merging techniques:
metric_figure.update_layout({**loss_layout, **error_rate_layout})
Alternatively, if the layouts are stored in the figures:
metric_figure.update_layout({**loss_figure.to_dict()["layout"], **error_figure.to_dict()["layout"]})
Both approaches are of limited use, because subplot layouts differ significantly from the layouts of individual figures. A subplot figure has different x-axis and y-axis definitions than the individual figures, and where dictionary keys overlap only one value can be kept, for example title.
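To make the key-overlap point concrete, here is a small sketch in plain Python. It uses trimmed-down copies of the two layout dicts from the question; the names merged and overwritten are introduced here for illustration only.

```python
# Trimmed-down copies of the layouts from the question.
loss_layout = dict(title=dict(text='Loss Graph'),
                   xaxis_title=dict(text='Epochs'),
                   yaxis_title=dict(text='Loss'))
error_rate_layout = dict(title=dict(text='Error Rate Graph'),
                         xaxis_title=dict(text='Epochs'),
                         yaxis_title=dict(text='Error Rate'))

# On duplicate keys, the dict unpacked last wins.
merged = {**loss_layout, **error_rate_layout}

# Keys whose loss_layout value did not survive the merge.
overwritten = [key for key in loss_layout if merged[key] != loss_layout[key]]
print(merged['title']['text'])
print(overwritten)
```

Only the error-rate title and y-axis title survive, so the merged dict cannot describe both panels. For per-panel titles, make_subplots accepts subplot_titles, and update_xaxes / update_yaxes accept row and col, which may fit this case better than merging whole layouts.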

Fitting a large number of bars into a matplotlib barh graph

I'm trying to make a horizontal bar graph with a large number of elements/bars with matplotlib's barh function. However, I'm having a couple of problems with bars being too close together and their labels being illegible (see image below):
I first tried changing the figure size, setting figsize=(10,40) and increasing the height up from 40, to no avail.
I also tried bumping up the spacing between bars from 0.2 to 0.3 (in the positions list), but it seems that going any higher than a spacing of 0.2 makes some of the bars disappear. In other words, there seem to be clusters of ~5 bars that are too close together that get spaced properly at 0.3, but all the bars between these clusters disappear.
The code is shown below (adapted from the mpl docs/examples). I'm sure there's rather an easy fix here that I'm just too much of a novice to realize. Alternatively, I could try graphing this in matlab but I prefer python for quality and simplicity. Are there improvements I could make that would make my bar graph legible?
Code:
import os
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
genus = {'Parasutterella': 1, 'Anaerobaculum': 1, 'Clostridiales': 1, 'Butyrivibrio': 1, 'Anaerococcus': 1, 'Neisseria': 1, 'Campylobacter': 1, 'Intestinibacter': 1, 'Erysipelatoclostridium': 1, 'Tannerella': 1, 'Barnesiella': 1, 'Enterobacter': 1, 'Odoribacter': 1, 'Arcobacter': 1, 'Dialister': 1, 'Alistipes': 1, 'Collinsella': 2, 'Synergistes': 2, 'Burkholderiales': 2, 'Gordonibacter': 2, 'Tyzzerella': 2, 'Providencia': 2, 'Weissella': 2, 'Enterobacteriaceae': 2, 'Flavonifractor': 2, 'Prevotella': 2, 'Klebsiella': 2, 'Citrobacter': 2, 'Actinomyces': 2, 'Proteus': 2, 'Catenibacterium': 2, 'Propionibacterium': 2, 'Mitsuokella': 2, 'butyrate-producing': 2, 'Parvimonas': 2, 'Phascolarctobacterium': 2, 'Desulfovibrio': 2, 'Cedecea': 2, 'Finegoldia': 2, 'Slackia': 3, '[Bacteroides]': 3, 'Hafnia': 3, 'Acidaminococcus': 3, 'Bifidobacterium': 3, 'Sutterella': 3, 'Anaerofustis': 3, 'Paraprevotella': 3, 'Oxalobacter': 3, 'Yokenella': 3, 'Leuconostoc': 3, 'Dermabacter': 3, 'Megamonas': 4, 'Staphylococcus': 4, 'Fusobacterium': 4, 'Anaerostipes': 4, 'Bilophila': 4, 'Butyricicoccus': 4, 'Parabacteroides': 4, 'Erysipelotrichaceae': 4, 'Anaerotruncus': 4, 'Listeria': 4, 'Corynebacterium': 5, 'Pseudoflavonifractor': 5, 'Dorea': 5, 'Streptococcus': 6, 'Roseburia': 6, 'Helicobacter': 6, 'Eggerthella': 6, 'Acinetobacter': 6, '[Clostridium': 6, 'Ruminococcaceae': 6, 'Dysgonomonas': 6, '[Eubacterium]': 6, 'Enterococcus': 6, 'Subdoligranulum': 7, 'Faecalibacterium': 7, 'Blautia': 8, 'Holdemania': 8, 'Bacteroides': 8, 'Marvinbryantia': 8, 'Coprococcus': 9, 'Eubacterium': 9, 'Lactobacillus': 9, 'Paenisporosarcina': 9, 'Turicibacter': 9, 'Ruminococcus': 10, 'Coprobacillus': 11, 'Ralstonia': 11, 'Peptoclostridium': 11, 'Pseudomonas': 13, 'Desulfitobacterium': 14, 'Bacillus': 15, 'Streptomyces': 26, '[Clostridium]': 29, 'Paenibacillus': 32, 'Lachnospiraceae': 32, 'Clostridium': 35}
barWidth = 0.125
labels = list(genus.keys())
cols = len(labels)
bars = []
positions = [(i+1)*0.2 for i in range(cols)]
for key in labels:
    bars.append(genus[key])
fig,ax = plt.subplots()
rects = []
for i in range(len(bars)):
    if labels[i] in pos_genus:
        rects.append(ax.barh(y=positions[i], width=bars[i], height=barWidth, color='#000000',label='Gram Positive'))
    else:
        rects.append(ax.barh(y=positions[i], width=bars[i], height=barWidth, color='#E8384F',label='Gram Negative'))
ax.set_title('Genus')
ax.set_yticks(positions)
ax.set_yticklabels(labels)
ax.set_ylabel('Genus')
ax.set_xlabel('Number of Organisms')
#ax.set_ylim(positions[0]-barWidth,positions[-1]+barWidth)
ax.set_xlim(0,40)
blk_patch = mpatches.Patch(color='#000000', label='Gram Positive')
red_patch = mpatches.Patch(color='#E8384F', label='Gram Negative')
plt.legend(handles=[blk_patch, red_patch])
#plt.figure(figsize=(10,50))
bar_path = os.path.join(paths['Figures'], "{0}_horiz_bar.png".format(str('genus')))
plt.savefig(bar_path,dpi=300,bbox_inches='tight')
plt.show()
Illegible barh plot:

Textposition not displaying on plotly

I am trying to plot the accuracy of the training and test sets of my neural network using plotly.
I also want to add a marker at the maximum of each curve, with text next to it that shows what that maximum value was. I tried doing something like in this example.
Here is my MCVE:
import plotly.graph_objects as go
data = {
'test acc': [1, 2, 3, 4, 5, 6, 7, 9, 10],
'train acc': [3, 5, 5, 6, 7, 8, 9, 10, 8]
}
fig = go.Figure()
color_train = 'rgb(255, 0, 0)'
color_test = 'rgb(0, 255, 0)'
assert len(data["train acc"]) == len(data["test acc"])
x = list(range(len(data["train acc"])))
fig.add_trace(go.Scatter(x=x,
                         y=data["train acc"],
                         mode='lines',
                         name='train acc',
                         line_color=color_train))
fig.add_trace(go.Scatter(x=x,
                         y=data["test acc"],
                         mode='lines',
                         name='test acc',
                         line_color=color_test))
# Max points
train_max = max(data["train acc"])
test_max = max(data["test acc"])
# ATTENTION! this will only give you first occurrence
train_max_index = data["train acc"].index(train_max)
test_max_index = data["test acc"].index(test_max)
fig.add_trace(go.Scatter(x=[train_max_index],
                         y=[train_max],
                         mode='markers',
                         name='max value train',
                         text=['{}%'.format(int(train_max * 100))],
                         textposition="top center",
                         marker_color=color_train))
fig.add_trace(go.Scatter(x=[test_max_index],
                         y=[test_max],
                         mode='markers',
                         name='max value test',
                         text=['{}%'.format(int(test_max*100))],
                         textposition="top center",
                         marker_color=color_test))
fig.update_layout(title='Train vs Test accuracy',
                  xaxis_title='epochs',
                  yaxis_title='accuracy (%)'
                  )
fig.show()
However, my output figure is the following:
As you can see, the value is not being displayed as in the example I found.
How can I make it appear?
If you'd only like to highlight a few certain values, use add_annotation(). In your case just find the max and min Y for the X that you'd like to put into focus. Lacking a data sample from your side, here's how I'd do it with a generic data sample:
Plot:
Code:
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='browser'
fig = go.Figure()
xVars1=[0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars1=[0, 1, 3, 2, 4, 3, 4, 6, 5]
xVars2=[0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars2=[0, 4, 5, 1, 2, 2, 3, 4, 2]
fig.add_trace(go.Scatter(
    x=xVars1,
    y=yVars1
))
fig.add_trace(go.Scatter(
    x=xVars2,
    y=yVars2
))
fig.add_annotation(
    x=yVars1.index(max(yVars1)),
    y=max(yVars1),
    text="yVars1 max")
fig.add_annotation(
    x=yVars2.index(max(yVars2)),
    y=max(yVars2),
    text="yVars2 max")
fig.update_annotations(dict(
    xref="x",
    yref="y",
    showarrow=True,
    arrowhead=7,
    ax=0,
    ay=-40
))
fig.update_layout(showlegend=False)
fig.show()
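The code above uses the list index of the maximum directly as the x position, which only works when the x values happen to equal their indices. A slightly more general sketch in plain Python follows; max_annotation is a hypothetical helper introduced here, not part of plotly, and it also puts the maximum value into the label, as the question asked.

```python
def max_annotation(x, y, label):
    """Return annotation settings for the first maximum of y (hypothetical helper)."""
    idx = y.index(max(y))  # index() returns the first occurrence only
    return dict(x=x[idx], y=y[idx], text="{}: {}".format(label, y[idx]))


xVars1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars1 = [0, 1, 3, 2, 4, 3, 4, 6, 5]
xVars2 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
yVars2 = [0, 4, 5, 1, 2, 2, 3, 4, 2]

annotations = [
    max_annotation(xVars1, yVars1, "yVars1 max"),
    max_annotation(xVars2, yVars2, "yVars2 max"),
]
print(annotations)
```

Each dict could then be handed to the figure with fig.add_annotation(**settings), since x, y and text are the same keyword arguments used in the answer's code.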

How to create properly filled lines in Plotly when there are data gaps

Based on https://plot.ly/python/line-charts/#filled-lines, one can run the code below
import plotly.graph_objects as go
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x_rev = x[::-1]
y = [5, 2.5, 5, 7.5, 5, 2.5, 7.5, 4.5, 5.5, 5]
y_upper = [5.5, 3, 5.5, 8, 6, 3, 8, 5, 6, 5.5]
y_lower = [4.5, 2, 4.4, 7, 4, 2, 7, 4, 5, 4.75]
y_lower_rev = y_lower[::-1]
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x, y=y,
    line_color='rgb(0,176,246)',
    name='Mid line',
))
fig.add_trace(go.Scatter(
    x=x+x_rev,
    y=y_upper+y_lower_rev,
    fill='toself',
    fillcolor='rgba(0,176,246,0.2)',
    line_color='rgba(255,255,255,0)',
    name='Filled lines working properly',
))
fig.update_traces(mode='lines')
fig.show()
And successfully get the plot below
However in case there are data gaps, the filled portions do not seem to work properly (e.g. first and second connected component), at least with the code tried below.
What is the right way/code to successfully combine data gaps and filled lines?
x_for_gaps_example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
x_for_gaps_example_rev = x_for_gaps_example[::-1]
y_with_gaps =[5, 15, None, 10, 5, 0, 10, None, 15, 5, 5, 10, 20, 15, 5]
y_upper_with_gaps = [i+1 if i is not None else None for i in y_with_gaps]
y_lower_with_gaps = [i-2 if i is not None else None for i in y_with_gaps][::-1]
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_for_gaps_example,
    y=y_with_gaps,
    name='Mid Line with <b>Gaps</b>'
))
fig.add_trace(go.Scatter(
    x=x_for_gaps_example+x_for_gaps_example_rev,
    y=y_upper_with_gaps+y_lower_with_gaps,
    fill='toself',
    fillcolor='rgba(0,176,246,0.2)',
    line_color='rgba(255,255,255,0)',
    name='Filled Lines not working properly with <b>gaps</b>'
))
fig.show()
It seems to be quite an old plotly bug:
Refer to:
https://github.com/plotly/plotly.js/issues/1132
and:
https://community.plot.ly/t/scatter-line-plot-fill-option-fills-gaps/21264
One solution might be to break your whole filling trace down into multiple pieces, one per run of data without gaps, and add each piece to the figure separately. However, this might be a bit complicated, because the outline of each filled area then has to be computed separately.
You can also improve your chart a bit by setting the connectgaps property to True, which results in this:
But, that looks somewhat weird ;)
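For the first suggestion (splitting the fill into pieces), here is a rough sketch in plain Python of how the outlines could be computed. The helper names contiguous_segments and band_polygons are made up for this example, and the band offsets (+1 above, -2 below) are taken from the question's code.

```python
def contiguous_segments(x, y):
    """Split (x, y) into runs of consecutive points whose y is not None."""
    segments, current = [], []
    for xi, yi in zip(x, y):
        if yi is None:
            if current:  # a gap closes the current run
                segments.append(current)
                current = []
        else:
            current.append((xi, yi))
    if current:
        segments.append(current)
    return segments


def band_polygons(x, y, above=1, below=2):
    """Build one closed outline per run: upper edge, then lower edge reversed."""
    polygons = []
    for run in contiguous_segments(x, y):
        xs = [xi for xi, _ in run]
        upper = [yi + above for _, yi in run]
        lower = [yi - below for _, yi in run]
        polygons.append((xs + xs[::-1], upper + lower[::-1]))
    return polygons


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
y = [5, 15, None, 10, 5, 0, 10, None, 15, 5, 5, 10, 20, 15, 5]
polygons = band_polygons(x, y)
print(len(polygons))
```

Each (px, py) pair in polygons could then be added as its own trace with fill='toself', using the same fillcolor and line_color as in the question, so that no fill is drawn across a gap.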

bar graph with wrong width

I want to create a bar graph for a dataframe that contains multiple categories, with a different color for each category. Below is my simplified code and the resulting graph. The top subplot is a regular bar graph in one color; the bottom subplot is color coded, but the bar width is messed up. Any suggestions? Thanks!
import random
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Cat': [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4],
                   'A': [2, 3, 6, 7, 9, 10, 15, 18, 22, 23, 24, 25],
                   'B': random.sample(range(1, 20), 12)})
fig = plt.figure(figsize=(15, 15/2.3))
ax = plt.subplot(2, 1, 1)
plt.bar(df.A, df.B)
plt.xlim(0, 30)
ax = plt.subplot(2, 1, 2)
for cat in df.Cat.unique():
    df_ = df.loc[(df.Cat==cat), :]
    plt.bar(df_.A, df_.B, width=0.5)
plt.xlim(0, 30)
plt.show()
