How do I show only available values on the x-axis - python

I would like to plot a chart with plotly that shows only the existing values on the x-axis.
When I execute the code below, the chart shown in the following image appears:
The ranges on both the x-axis and the y-axis are evenly spaced from zero up to the maximum value.
import plotly.graph_objs as go
from plotly.offline import plot
xValues = [1, 2, 27, 50]
yValues = [7, 1, 2, 3]
trace = go.Scatter( x = xValues, y = yValues, mode='lines+markers', name='high limits' )
plottedData = [trace]
plot( plottedData )
Now, I would like to show only the existing values on the x-axis. In my example, I want just the values [1, 2, 27, 50] to appear, with equal spacing between them. Is this possible? If yes, how?

You can force xaxis.type to be "category", like this:
plot( dict(data=plottedData, layout=go.Layout(xaxis = {"type": "category"} )))

Related

How to draw custom error bars with plotly?

I have a data frame with one column that describes y-axis values and two more columns that describe the upper and lower bounds of a confidence interval. I would like to use those values to draw error bars using plotly. I am aware that plotly can draw confidence intervals (using the error_y and error_y_minus keyword arguments), but not in the way I need, because those values are interpreted as offsets added to and subtracted from the y-values. Instead, I would like to define the upper and lower positions directly:
For instance, how could I use plotly and this example data frame
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'x':[0, 1, 2],
'y':[6, 10, 2],
'ci_upper':[8,11,2.5],
'ci_lower':[5,9,1.5]})
to produce a plot like this?
Use Plotly Express to create the bar chart.
Use asymmetric error bars (https://plotly.com/python/error-bars/#asymmetric-error-bars), converting your bounds into offsets by subtraction: ci_upper - y for the upper error and y - ci_lower for the lower error.
import pandas as pd
import plotly.express as px
df = pd.DataFrame(
{"x": [0, 1, 2], "y": [6, 10, 2], "ci_upper": [8, 11, 2.5], "ci_lower": [5, 9, 1.5]}
)
px.bar(df, x="x", y="y").update_traces(
error_y={
"type": "data",
"symmetric": False,
"array": df["ci_upper"] - df["y"],
"arrayminus": df["y"] - df["ci_lower"],
}
)

How can I set the width and height in a plotly theme?

Context
I use the .plot method of pandas dataframes throughout a JupyterLab notebook, with the plotting backend set to plotly and the default plotly theme set to "plotly".
Every time I plot, I call .update_layout afterwards to set the width, height and margins. I do that because I plan to export the notebook to reveal.js slides, and without those properties the output is unpredictable.
This is my example code, which creates a 200x200 plot without any margins.
import pandas as pd
import plotly.io as pio
pd.options.plotting.backend = "plotly"
pio.templates.default = "plotly"
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
df = pd.DataFrame({"x": x, "y": y})
fig = df.plot(x=x, y=y)
fig.update_layout(width=200, height=200, margin=dict(l=0, r=0, t=0, b=0))
fig.show()
Since I want this plot size and these margins in all my plots, I wanted to make a theme that I can set at the beginning, so that I don't have to call .update_layout on every figure.
What I've tried
I tried this:
import pandas as pd
import plotly.io as pio
pd.options.plotting.backend = "plotly"
# Create a custom theme and set it as default
pio.templates["custom"] = pio.templates["plotly"]
pio.templates["custom"].layout.margin = dict(l=0, r=0, t=0, b=0)
pio.templates["custom"].layout.width = 200
pio.templates["custom"].layout.height = 200
pio.templates.default = "custom"
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
df = pd.DataFrame({"x": x, "y": y})
fig = df.plot(x=x, y=y)
fig.show()
Unfortunately, the resulting plot doesn't adhere to the size specifications, although the margin setting is respected.
Question
How can I create a plotly theme to create plots of a specified size?
It turns out I was missing the autosize property in my template.
When I set it to False:
pio.templates["custom"].layout.autosize = False
a 200x200 plot comes out.

Bokeh Graph Starting or Finishing with NaN Y-Axis Values Is Not Shown

I am trying to get a graph that shows all of the x-axis values even though the first y-axis values are NaN. It appears that the graph starts at the first non-NaN y-axis value. Here is the example:
from bokeh.plotting import figure, output_file, show
output_file("line.html")
p = figure(width=400, height=400)
# add a line renderer with a NaN
n = float('nan')
p.line(
[1, 2, 3, 4, 5], # x-axis
[n, n, 7, 2, 4], # y-axis
line_width=2
)
show(p)
This is the result:
As you can see, the first two x-axis values aren't shown.
I was hoping to find a way to force the Bokeh graph to show all x values, or a workaround with the same effect.
You can explicitly set the start (or end) of the plot's x (or y) range, like so:
from bokeh.plotting import figure, output_file, show
output_file("line.html")
p = figure(width=400, height=400)
# add a line renderer with a NaN
n = float('nan')
x_values = [1, 2, 3, 4, 5] # x-axis
y_values = [n, n, 7, 2, 4] # y-axis
p.line(x=x_values, y=y_values, line_width=2)
# of course in real life it'd make sense to do max and min of x and y,
# but this is all we need for your specific example.
p.x_range.start = min(x_values)
show(p)
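Following the comment in the answer about using both min and max, here is a minimal sketch that fixes both ends of the x range when the figure is created. It assumes a Bokeh version where figure accepts width/height and a (start, end) tuple for x_range (which Bokeh turns into a fixed Range1d); output_file/show are omitted so the snippet only builds the plot.

```python
from bokeh.plotting import figure

nan = float("nan")
x_values = [1, 2, 3, 4, 5]
y_values = [nan, nan, 7, 2, 4]

# A (start, end) tuple pins the x range to cover every x value,
# including those whose y value is NaN
p = figure(width=400, height=400, x_range=(min(x_values), max(x_values)))
p.line(x=x_values, y=y_values, line_width=2)
```

A fixed range has no automatic padding, so you may want to subtract and add a small margin to min and max if the line should not touch the plot edges.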

Disable hue nesting in Seaborn

When plotting a bar chart with Seaborn and using the hue parameter to color the bars according to a column's values, rows with identical x and hue values are nested, or aggregated, and only a single bar is shown. The image below illustrates the problem. Patient number 1 has two samples of sample_type 1, with values 10 and 20. The two values have been nested, and both are represented as a single bar (the average of the two).
I'd like to avoid this nesting and instead get something like the image below.
Is this possible to achieve? A minimal example is below. Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
"patient_number": [1, 1, 1, 2, 2, 2],
"sample_type": [1, 1, 2, 1, 2, 3],
"value": [10, 20, 15, 10, 11, 12]
})
sns.barplot(x="patient_number", y="value", hue="sample_type", data=df)
plt.show()
The following approach obtains the desired plot:
Seaborn's hue= parameter defines both the color and the position of the bars.
Per patient, an extra field ('idx') contains a unique number for each of the desired bars. This field restarts from 0 for every patient and is added to the dataframe.
'idx' can then be used as hue='idx' to get the desired columns, although they will simply be colored sequentially.
To get one color per sample type, an extra column contains a factorized version of the sample types (0 for the first type, 1 for the next, etc.).
Seaborn generates the bars per hue value, one for each patient. These bars can be accessed as a list via ax.patches. If a patient doesn't have a value for a given 'idx', a dummy bar will be added to the list.
By iterating through the patients and then through the 'idx' values, all bars can be visited and colored by sample type. As the ordering of the bars is a bit tricky, an adequate renumbering is needed.
The legend needs to be changed to reflect the sample types.
The given data is extended a bit to test different numbers of samples per patient, and sample types that aren't simple consecutive numbers.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
'patient_number': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
'sample_type': ['st1', 'st1', 'st2', 'st1', 'st2', 'st3', 'st4', 'st4', 'st4', 'st4', 'st4'],
'value': [10, 20, 15, 10, 11, 12, 1, 2, 3, 4, 5]
})
df['idx'] = df.groupby('patient_number').cumcount()
df['sample_factors'], sample_labels = pd.factorize(df['sample_type'])
ax = sns.barplot(x='patient_number', y='value', hue='idx', data=df)
colors = plt.get_cmap('Set2').colors  # https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
handles = [None for _ in sample_labels]
num_patients = len(ax.patches) // (df['idx'].max() + 1)
for i, (patient_id, group) in enumerate(df.groupby('patient_number')):
for j, factor in enumerate(group['sample_factors']):
patch = ax.patches[i + j * num_patients]
patch.set_color(colors[factor])
handles[factor] = patch
ax.legend(handles=handles, labels=list(sample_labels), title='Sample type')
plt.show()
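If relying on the ordering of ax.patches feels fragile (it depends on how the Seaborn version in use creates bars), the same layout can be drawn directly with Matplotlib. This is a different technique from the answer above, sketched under the assumption that centering each patient's bars around its tick is acceptable; the helper columns n and x and the dict pos are hypothetical names. Each sample type gets one ax.bar call, so it gets one color and one legend entry.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'patient_number': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'sample_type': ['st1', 'st1', 'st2', 'st1', 'st2', 'st3', 'st4', 'st4', 'st4', 'st4', 'st4'],
    'value': [10, 20, 15, 10, 11, 12, 1, 2, 3, 4, 5]
})

# Bar index within each patient and number of bars per patient
df['idx'] = df.groupby('patient_number').cumcount()
df['n'] = df.groupby('patient_number')['value'].transform('size')

# Patients sit at 0, 1, 2, ...; each patient's bars are centered around its position
patients = sorted(df['patient_number'].unique())
pos = {p: i for i, p in enumerate(patients)}
width = 0.8 / df['n'].max()
df['x'] = df['patient_number'].map(pos) + (df['idx'] - (df['n'] - 1) / 2) * width

colors = plt.get_cmap('Set2').colors
fig, ax = plt.subplots()
# One bar call per sample type: a single color and a single legend entry per type
for k, (sample_type, group) in enumerate(df.groupby('sample_type', sort=False)):
    ax.bar(group['x'], group['value'], width=width, color=colors[k], label=sample_type)

ax.set_xticks(range(len(patients)))
ax.set_xticklabels([str(p) for p in patients])
ax.set_xlabel('patient_number')
ax.set_ylabel('value')
ax.legend(title='Sample type')
```

Call plt.show() to display the figure. Every row becomes its own bar, so no values are aggregated.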

Pyplot/Matplotlib: Binary data with strings on x-axis

I know it's such a basic thing, but due to ridiculous time constraints and the severity of the situation I'm forced to ask something like this:
I've got two arrays of 160,000 entries each. One contains strings (the names I need to use), the other contains the corresponding 1s and 0s.
I'm trying to make a simple "step" graph in pyplot with the array of names along the X-axis and 0 and 1 along the Y-axis.
I have this currently:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 2, 4, 5, 9]
bindata = [0,1,1,0,1,1,0,0,0,1]
xaxis = np.arange(0, data[-1] + 1)
yaxis = np.array(bindata)
plt.step(xaxis, yaxis)
plt.xlabel('Filter Degree Combinations')
plt.ylabel('Negative Or Positive')
plt.title("Car 1")
#plt.savefig('foo.png') #For saving
plt.show()
It gives me this:
But I want something like this:
I cobbled the code together from some examples, tutorials and Stack Overflow questions, but I run into "ValueError: x and y must have same first dimension" so often that I'm not getting anywhere when I try to experiment my way forward.
You can achieve the desired plot by specifying the tick labels and their positions on the x-axis using plt.xticks. The first argument, range(0, 10, 2), gives the positions, and the second gives the label strings:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 2, 4, 5, 9]
bindata = [0,1,1,0,1,1,0,0,0,1]
xaxis = np.arange(0, data[-1] + 1)
yaxis = np.array(bindata)
plt.step(xaxis, yaxis)
xlabels = ['Josh', 'Anna', 'Kevin', 'Sophie', 'Steve'] # <-- specify tick-labels
plt.xlabel('Filter Degree Combinations')
plt.ylabel('Negative Or Positive')
plt.title("Car 1")
plt.xticks(range(0, 10, 2), xlabels) # <-- assign tick-labels
plt.show()
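For the real case with one name per data point, a minimal sketch: using np.arange(len(names)) as the x values guarantees that x and y have the same length (avoiding the "x and y must have same first dimension" error), and only every n-th name is used as a tick label so 160,000 labels don't overlap. The names and bits arrays below are hypothetical stand-ins for your data, and the choice of roughly ten labels is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the real 160,000-entry arrays
names = [f"name{i}" for i in range(20)]
bits = np.random.default_rng(0).integers(0, 2, size=len(names))

# One x position per name, so x and y always have the same length
x = np.arange(len(names))
fig, ax = plt.subplots()
ax.step(x, bits)

# Label only every n-th position so the labels stay readable
every = max(1, len(names) // 10)
ax.set_xticks(x[::every])
ax.set_xticklabels([names[i] for i in x[::every]], rotation=90)
ax.set_yticks([0, 1])
ax.set_xlabel('Filter Degree Combinations')
ax.set_ylabel('Negative Or Positive')
ax.set_title("Car 1")
```

Call plt.show() to display it, or fig.savefig('foo.png') to save it.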
