Filling the space between bars in a bar graph made with Plotly - python

I have a plotly bar graph. The measurements illustrated by the graph are not directly adjacent; there is space between them. I'd like to fill the space between measurements, making them the same value as the previous measurement. Is this possible with plotly?
Edit for clarification: Let's say I have these measurements: [ 20=3, 25=3, 27=3, 30=10, 31=10, 50=2, 56=2 ] -- I'd want data points 20, 25, and 27 to appear as one big bar on the bar graph (filling the space on the x-axis between 20 and 27), 30 and 31, to be the same bar, and 50 and 56 to be the same bar. The reason I want this is that I have millions of empty points in the graph, and if I fill them all manually, the graph grinds the browser to a halt.

One option is to draw the measurements as a scatter plot and add the bars as layout shapes. The simpler approach of setting connectgaps: False and fill: tozeroy doesn't work here.
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode()

meas_x = [20, 25, 27, 30, 31, 50, 56]
meas_y = [3, 3, 3, 10, 10, 2, 2]

trace1 = go.Scatter(
    x=meas_x,
    y=meas_y,
    mode='markers'
)

# Emit one rectangle per run of equal y values.
# The trailing None is a sentinel that closes the last run; it is kept
# out of meas_x / meas_y so it never ends up in the plotted trace.
shapes = []
x = meas_x[0]
y = meas_y[0]
for i, m_y in enumerate(meas_y[1:] + [None]):
    if y != m_y:
        shapes.append({
            'type': 'rect',
            'x0': x,
            'y0': 0,
            'x1': meas_x[i],
            'y1': y,
            'fillcolor': '#d3d3d3',
        })
        if m_y is not None:
            # Start the next run
            y = m_y
            x = meas_x[i + 1]

fig = {
    'data': [trace1],
    'layout': go.Layout(shapes=shapes)
}
plotly.offline.iplot(fig)
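The run detection can also be written with itertools.groupby, which may be easier to read. This is only a sketch: runs_to_shapes is a made-up helper name, and it assumes the measurements are already sorted by x.

```python
from itertools import groupby

def runs_to_shapes(xs, ys, fillcolor="#d3d3d3"):
    """Return one rect-shape dict per run of consecutive equal y values."""
    shapes = []
    # groupby clusters neighbouring (x, y) pairs that share the same y
    for y, run in groupby(zip(xs, ys), key=lambda pair: pair[1]):
        run_xs = [x for x, _ in run]
        shapes.append({
            "type": "rect",
            "x0": run_xs[0],   # first x of the run
            "x1": run_xs[-1],  # last x of the run
            "y0": 0,
            "y1": y,
            "fillcolor": fillcolor,
        })
    return shapes

shapes = runs_to_shapes([20, 25, 27, 30, 31, 50, 56], [3, 3, 3, 10, 10, 2, 2])
```

The resulting list can be passed to go.Layout(shapes=shapes) exactly as in the answer's code.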

Related

Python Plotly figure with secondary x axis linked to primary

I've been struggling with this seemingly simple task: How to align two x axis with related data. In my case one axis is in Celsius and the other in Fahrenheit.
What I want to achieve is to obtain alignment of the two x axis so that:
32°F = 0°C
And
50°F = 10°C
With this relation, the two datasets will be aligned in terms of temperature.
I want to have both unit sets on the same graph so that the viewer can interpret the data according to the units they are used to.
Here is my code:
import plotly.graph_objects as go
from plotly.graph_objs.layout import XAxis

layout = go.Layout(
    title="Double X Axis Example",
    xaxis=XAxis(
        title="Celsius"
    ),
    xaxis2=XAxis(
        title="Fahrenheit",
        overlaying='x',
        side='top',
    ),
    yaxis=dict(
        title="Y values"
    ),
)
# Create figure with secondary x-axis
fig = go.Figure(layout=layout)
# Add traces
fig.add_trace(
    go.Scatter(x=[10, 20, 30], y=[4.5, 6, 5], name="data set in celsius"),
)
fig.add_trace(
    go.Scatter(x=[40, 60, 80], y=[4, 5, 6.5], name="data set in fahrenheit", xaxis='x2'),
)
fig.show()
Here is the resulting figure with the unaligned axes (10°C = 40°F !?):
Thank you,
In this case it might help to set the ranges for the x-axes, something like this:
fig.add_trace(
    go.Scatter(x=[10, 20, 30], y=[4.5, 6, 5], name="data set in celsius", xaxis="x"),
)
fig.add_trace(
    go.Scatter(x=[40, 60, 80], y=[4, 5, 6.5], name="data set in fahrenheit", xaxis="x2"),
)
# 0°C = 32°F and 100°C = 212°F, so these two ranges line up
fig.update_layout(
    xaxis=dict(range=[0, 100]),
    xaxis2=dict(range=[32, 212]),
)
...possibly calculating the range needed for the first x-axis and then deriving the range of the second x-axis from it.
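A rough sketch of that idea, for what it's worth: find the overall extent of the data in Celsius, then convert both ends to Fahrenheit so the two ranges describe the same temperatures. The helper names (c_to_f, f_to_c, linked_ranges) are invented for this example.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def linked_ranges(celsius_values, fahrenheit_values):
    """Return ([c_lo, c_hi], [f_lo, f_hi]) covering both data sets."""
    # Express everything in Celsius to find the common extent
    all_c = list(celsius_values) + [f_to_c(f) for f in fahrenheit_values]
    lo, hi = min(all_c), max(all_c)
    # Convert the same two end points, so the axes stay aligned
    return [lo, hi], [c_to_f(lo), c_to_f(hi)]

c_range, f_range = linked_ranges([10, 20, 30], [40, 60, 80])
```

The two lists can then be passed on as fig.update_layout(xaxis=dict(range=c_range), xaxis2=dict(range=f_range)).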
Here is my solution. I set the ranges of the first and second x-axes to [0, 100] and [32, 212], respectively, which cover the same temperatures (0°C = 32°F and 100°C = 212°F). To align the two axes, I gave each of them 26 evenly spaced tick marks; because the ranges are equivalent and the tick counts are equal, each pair of ticks marks the same temperature. This assumes that most data sets to be plotted lie between 0 and 100 degrees Celsius (or, for data in Fahrenheit, between 32 and 212 degrees). The ranges are wider than the sample data, so the traces won't span the full width of the x-axes.
import numpy as np
import plotly.graph_objects as go
arr1 = np.array([10, 20, 30])
arr2 = np.array([4.5, 6, 5])
arr3 = np.array([40, 60, 80])
arr4 = np.array([4, 5, 6.5])
fig = go.Figure(go.Scatter(x=arr1, y=arr2, name='data set in celsius'))
fig.add_trace(go.Scatter(x=arr3, y=arr4, xaxis='x2', name='data set in fahrenheit'))
fig.update_layout(
    title_text='Double X Axis Example',
    template='plotly_dark',
    legend=dict(yanchor='top', y=0.875, xanchor='right', x=1),
    yaxis=dict(domain=[0.05, 0.875], title='Y values',
               spikemode='toaxis', spikesnap='cursor'),
    xaxis=dict(position=0, title='Celsius',
               spikemode='across', spikesnap='cursor',
               tickmode='array', tickvals=np.linspace(0, 100, 26), range=[0, 100]),
    xaxis2=dict(position=0.9, title='Fahrenheit', anchor='free',
                overlaying='x', side='top',
                spikemode='across', spikesnap='cursor',
                tickmode='array', tickvals=np.linspace(32, 212, 26), range=[32, 212]),
)
fig.show()

How do I show only available values in the x-axis

I would like to plot a chart with plotly that shows only the existing values in the x-axis.
When I execute the code below, a chart like the one in the following image appears:
Both the x-axis and the y-axis are scaled evenly from zero up to the maximum value.
import plotly.graph_objs as go
from plotly.offline import plot
xValues = [1, 2, 27, 50]
yValues = [7, 1, 2, 3]
trace = go.Scatter( x = xValues, y = yValues, mode='lines+markers', name='high limits' )
plottedData = [trace]
plot( plottedData )
Now, I would like to show only the existing values on the x-axis. In my example, I want just the values [1, 2, 27, 50] to appear, evenly spaced. Is this possible? If so, how?
You can force the xaxis.type to be category like this:
plot( dict(data=plottedData, layout=go.Layout(xaxis = {"type": "category"} )))
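If the axis has to stay numeric for some reason, a different technique (not the category axis above) is to plot against evenly spaced positions and relabel the ticks with the real values. A minimal sketch using plain dicts, assuming the x values are already in the order they should appear:

```python
x_values = [1, 2, 27, 50]
y_values = [7, 1, 2, 3]

# Evenly spaced slots 0, 1, 2, 3 stand in for the real x values
positions = list(range(len(x_values)))

figure = {
    "data": [{
        "type": "scatter",
        "x": positions,
        "y": y_values,
        "mode": "lines+markers",
        "name": "high limits",
    }],
    "layout": {
        "xaxis": {
            "tickmode": "array",
            "tickvals": positions,                       # where the ticks go
            "ticktext": [str(v) for v in x_values],      # what the ticks say
        }
    },
}
```

The dict can be handed to plot(figure) in the same way as the figure dict in the answer.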

Label specific bubbles in Plotly bubble chart

I am having trouble figuring out how to label specific bubbles in a plot.ly bubble chart. I want certain "outlier" bubbles to have text written inside the bubble instead of via hover text.
Let's say I have this data:
import plotly.plotly as py
import plotly.graph_objs as go
trace0 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 11, 12, 13],
    mode='markers',
    marker=dict(
        size=[40, 60, 80, 100],
    )
)
data = [trace0]
py.iplot(data, filename='bubblechart-size')
I'd like to only add text markers on bubbles that correspond to (1,10) and (4,13). Furthermore, is it possible to control the location of text markers?
You can achieve this with annotations.
This allows you to write any text you want on the chart and reference it to your data. You can also control where the text appears using position anchors or by applying an additional calculation on top of the x and y data. For example:
x_data = [1, 2, 3, 4]
y_data = [10, 11, 12, 13]
z_data = [40, 60, 80, 100]
annotations = [
    dict(
        x=x,
        y=y,
        text='' if 4 > x > 1 else str(z),  # Some conditional to define outliers
        showarrow=False,
        xanchor='center',  # Position of text relative to x axis (left/right/center)
        yanchor='middle',  # Position of text relative to y axis (top/bottom/middle)
    ) for x, y, z in zip(x_data, y_data, z_data)
]
trace0 = go.Scatter(
    x=x_data,
    y=y_data,
    mode='markers',
    marker=dict(
        size=z_data,
    )
)
data = [trace0]
layout = go.Layout(annotations=annotations)
py.iplot(go.Figure(data=data, layout=layout), filename='bubblechart-size')
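A small variation, sketched here as an assumption rather than part of the answer: instead of emitting empty-text annotations for the ordinary bubbles, filter them out so that only the outliers get an annotation at all. The outlier_annotations helper and its is_outlier callback are hypothetical names.

```python
def outlier_annotations(xs, ys, labels, is_outlier):
    """Build annotation dicts only for points where is_outlier(x, y) is true."""
    return [
        dict(
            x=x,
            y=y,
            text=str(label),
            showarrow=False,
            xanchor="center",   # horizontal anchor of the text
            yanchor="middle",   # vertical anchor of the text
        )
        for x, y, label in zip(xs, ys, labels)
        if is_outlier(x, y)
    ]

annotations = outlier_annotations(
    [1, 2, 3, 4],
    [10, 11, 12, 13],
    [40, 60, 80, 100],
    lambda x, y: x <= 1 or x >= 4,  # same outlier rule as in the answer
)
```

The list goes into go.Layout(annotations=annotations) unchanged.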
Edit
If using cufflinks, then the above can be adapted slightly to:
bubbles_to_annotate = df[(df['avg_pos'] < 2) | (df['avg_pos'] > 3)]  # Some conditional to define outliers
annotations = [
    dict(
        x=row['avg_pos'],
        y=row['avg_neg'],
        text=row['subreddit'],
        showarrow=False,
        xanchor='center',  # Position of text relative to x axis (left/right/center)
        yanchor='middle',  # Position of text relative to y axis (top/bottom/middle)
    ) for _, row in bubbles_to_annotate.iterrows()
]
# Axis titles match the plotted columns: avg_pos on x, avg_neg on y
df.iplot(kind='bubble', x='avg_pos', y='avg_neg', size='counts',
         text='subreddit', xTitle='Average Positive Sentiment',
         yTitle='Average Negative Sentiment', annotations=annotations,
         filename='simple-bubble-chart')
You will still need to define the annotations since you need a conditional argument. Then pass this directly to cufflinks via annotations.

Assign specific colours to data in Matplotlib pie chart

I'm trying to create pie charts with matplotlib in which the colour of each category is fixed.
I've got a function which creates a pie chart from sets of value and category data. Here's one example:
Category  Value
TI        65
Con       43
FR        40
TraI      40
Bug       38
Data      22
Int       15
KB        12
Other     8
Dep       7
PW        6
Uns       5
Perf      4
Dep       3
The problem is that the data differs from one instance to another, and that in turn changes the order of the categories. Thus, each category gets labelled a different colour each time I generate a chart. I could sort the data alphabetically every time, but that causes two problems: some categories are missing from some datasets, and I'd prefer it sorted by size anyway so that the smallest wedges are oriented horizontally.
How can I set matplotlib to assign colours depending on, say, the index of a pandas.Series?
Here's the code that I'm using to generate a pie chart:
import numpy as np
import matplotlib.pyplot as plt

slices = [62, 39, 39, 38, 37, 21, 15, 9, 6, 7, 6, 5, 4, 3]
cmap = plt.cm.prism
colors = cmap(np.linspace(0., 1., len(slices)))
labels = [u'TI', u'Con', u'FR', u'TraI', u'Bug', u'Data', u'Int', u'KB', u'Other', u'Dep', u'PW', u'Uns', u'Perf', u'Dep']
fig = plt.figure(figsize=[10, 10])
ax = fig.add_subplot(111)
# make_autopct is defined in the EDIT below
pie_wedge_collection = ax.pie(slices, colors=colors, labels=labels, labeldistance=1.05, autopct=make_autopct(slices))
for pie_wedge in pie_wedge_collection[0]:
    pie_wedge.set_edgecolor('white')
titlestring = 'Issues'
ax.set_title(titlestring)
EDIT: I forgot to explain the autopct function, it's for adding value and percentage labels:
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.2f}% ({v:d})'.format(p=pct, v=val)
    return my_autopct
Here is a simpler alternative to @tmdavison's answer.
Let's first see the problem with an MWE:
import matplotlib.pyplot as plt
labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']
sizes = [15, 30, 45, 10]
fig, ax = plt.subplots(1, 2)
ax[0].pie(sizes, labels=labels)
ax[1].pie(sizes[1:], labels=labels[1:])
This produces the problem plots:
The problem is that in the left-hand plot, Hogs is coloured in orange, but in the right-hand plot Hogs is coloured in blue (with a similar mix-up for Logs and Dogs).
We would like the colours for the labels to be the same across both plots. We can do this by specifying a dictionary of colours to use:
labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']
sizes = [15, 30, 45, 10]
colours = {'Frogs': 'C0',
           'Hogs': 'C1',
           'Dogs': 'C2',
           'Logs': 'C3'}
fig, ax = plt.subplots(1, 2)
ax[0].pie(sizes,
          labels=labels,
          colors=[colours[key] for key in labels])
ax[1].pie(sizes[1:],
          labels=labels[1:],
          colors=[colours[key] for key in labels[1:]])
This works to create the plot:
Here we see that the labels are represented by the same colours across both plots, as desired.
If you have lots of categories it can be cumbersome to manually set a colour for each category. In this case you could construct the colours dictionary as:
colours = dict(zip(labels, plt.cm.tab10.colors[:len(labels)]))
If you have more than 10 categories (and no more than 20) you would instead use:
colours = dict(zip(labels, plt.cm.tab20.colors[:len(labels)]))
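Since some categories can be missing from some datasets, it may help to build the dictionary once from the full list of categories rather than from whichever labels a given chart happens to contain. A rough sketch, where fixed_colour_map is a made-up helper and the 'C0' to 'C9' names refer to Matplotlib's default colour cycle (so colours repeat beyond ten categories):

```python
def fixed_colour_map(all_categories, n_colours=10):
    """Map every known category to a stable 'C<n>' colour name."""
    # Sort so the mapping does not depend on the order of any one dataset
    ordered = sorted(set(all_categories))
    return {cat: f"C{i % n_colours}" for i, cat in enumerate(ordered)}

colours = fixed_colour_map(['Frogs', 'Hogs', 'Dogs', 'Logs'])

# A chart that lacks 'Frogs' still gets the same colours for the rest
subset_labels = ['Hogs', 'Dogs', 'Logs']
subset_colours = [colours[key] for key in subset_labels]
```

subset_colours can then be passed as colors=subset_colours to ax.pie, as in the example above.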
Here's an idea you could try. Make a dictionary from your labels and colors, so each label is mapped to a color. Then, after making the pie chart, go in and assign the facecolor of each wedge using this dictionary.
Here's an untested bit of code which might do what you are looking for:
import numpy as np
import matplotlib.pyplot as plt

def mypie(slices, labels, colors):
    # Map each label to its colour
    colordict = {}
    for l, c in zip(labels, colors):
        print(l, c)
        colordict[l] = c
    fig = plt.figure(figsize=[10, 10])
    ax = fig.add_subplot(111)
    pie_wedge_collection = ax.pie(slices, labels=labels, labeldistance=1.05)  # , autopct=make_autopct(slices))
    for pie_wedge in pie_wedge_collection[0]:
        pie_wedge.set_edgecolor('white')
        # Look the colour up by the wedge's label, not by its position
        pie_wedge.set_facecolor(colordict[pie_wedge.get_label()])
    titlestring = 'Issues'
    ax.set_title(titlestring)
    return fig, ax, pie_wedge_collection

slices = [37, 39, 39, 38, 62, 21, 15, 9, 6, 7, 6, 5, 4, 3]
cmap = plt.cm.prism
colors = cmap(np.linspace(0., 1., len(slices)))
labels = [u'TI', u'Con', u'FR', u'TraI', u'Bug', u'Data', u'Int', u'KB', u'Other', u'Dep', u'PW', u'Uns', u'Perf', u'Dep']
fig, ax, pie_wedge_collection = mypie(slices, labels, colors)
plt.show()

Create a Diverging Stacked Bar Chart in matplotlib

I have lists of data indicating responses to likert questions with a one (very unhappy) to five (very happy) scale. I would like to create a page of plots showing these lists as skewed stacked horizontal bar charts. The lists of responses can be of different sizes (e.g. when someone has opted out of answering a particular question). Here is a minimal example of the data:
likert1 = [1.0, 2.0, 1.0, 2.0, 1.0, 3.0, 3.0, 4.0, 4.0, 1.0, 1.0]
likert2 = [5.0, 4.0, 5.0, 4.0, 5.0, 3.0]
I would like to be able to plot this with something like:
plot_many_likerts(likert1, likert2)
At the moment I've written a function to iterate over the lists, and plot each one as its own subplot on a shared figure in matplotlib:
from collections import Counter
import matplotlib.pyplot as plt

def plot_many_likerts(*lsts):
    # get the figure and the list of axes for this plot
    fig, axlst = plt.subplots(len(lsts), sharex=True)
    for i in range(len(lsts)):
        likert_horizontal_bar_list(lsts[i], axlst[i], xaxis=[1.0, 2.0, 3.0, 4.0, 5.0])
        axlst[i].axis('off')
    plt.show()

def likert_horizontal_bar_list(lst, ax, xaxis):
    cnt = Counter(lst)
    # del (cnt[None])
    i = 0
    colour_float = 0.00001
    previous_right = 0
    for key in sorted(xaxis):
        # current Matplotlib names the bar position `y` (older releases used `bottom`)
        ax.barh(y=0, width=cnt[key], height=0.4, left=previous_right,
                color=plt.cm.jet(colour_float), label=str(key))
        i += 1
        previous_right = previous_right + cnt[key]
        colour_float = float(i) / float(len(xaxis))
This works reasonably well and creates stacked bar charts that all share the same scale (i.e. the widths share a common axis). Here is a screenshot:
What is currently Produced http://s7.postimg.org/vh0j816gn/figure_1.jpg
What I would like is to have these two plots centered on midpoints of the mode of the datasets (the datasets will have the same range). For instance:
What I would like to see http://s29.postimg.org/z0qwv4ryr/figure_2.jpg
Suggestions on how I might do this?
I needed to make a divergent bar chart for some likert data. I was using pandas, but the approach would probably be similar without it. The key mechanism is to add in an invisible buffer at the start.
import pandas as pd
import matplotlib.pyplot as plt

likert_colors = ['white', 'firebrick', 'lightcoral', 'gainsboro', 'cornflowerblue', 'darkblue']
dummy = pd.DataFrame([[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [10, 4, 2, 10, 5]],
                     columns=["SD", "D", "N", "A", "SA"],
                     index=["Key 1", "Key B", "Key III"])
middles = dummy[["SD", "D"]].sum(axis=1)+dummy["N"]*.5
longest = middles.max()
complete_longest = dummy.sum(axis=1).max()
dummy.insert(0, '', (middles - longest).abs())
dummy.plot.barh(stacked=True, color=likert_colors, edgecolor='none', legend=False)
z = plt.axvline(longest, linestyle='--', color='black', alpha=.5)
z.set_zorder(-1)
plt.xlim(0, complete_longest)
xvalues = range(0,complete_longest,10)
xlabels = [str(x-longest) for x in xvalues]
plt.xticks(xvalues, xlabels)
plt.show()
There are many limitations to this approach. First, bars no longer get a black outline, and the legend will have an extra blank element. I just hid the legend (I figure there's probably a way to hide just the individual element). I'm not sure of a convenient way to make the bars have an outline without also adding the outline to the buffer element.
First, we establish some colors and dummy data. Then we calculate the combined width of the two left-most columns plus half of the middle column (which I know to be "SD", "D", and "N", respectively). I find the row where this width is largest, and use it to calculate the buffer needed for the other rows. Next, I insert this new buffer column into the first column position with a blank title (which felt gross, lemme tell you). For good measure, I also added a vertical line (axvline) behind the middle of the middle bar based on the advice of [2]. Finally, I adjust the x-axis to have the proper scale by offsetting its labels.
You might want more horizontal space on the left - you can easily do so by adding to "longest".
[2] Heiberger, Richard M., and Naomi B. Robbins. "Design of diverging stacked bar charts for Likert scales and other applications." Journal of Statistical Software 57.5 (2014): 1-32.
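The buffer calculation described in this answer can be sketched without pandas, which may make the arithmetic easier to follow. left_buffers is a hypothetical helper name, and the sketch assumes every row has the same odd number of categories, ordered from most negative to most positive.

```python
def left_buffers(rows):
    """Width of the invisible leading bar that centres each row's middle category."""
    mid = len(rows[0]) // 2
    # Distance from the left edge of each row to the centre of its middle category
    middles = [sum(row[:mid]) + row[mid] / 2 for row in rows]
    longest = max(middles)
    # Rows with a shorter left side get a wider buffer
    return [longest - m for m in middles]

rows = [[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [10, 4, 2, 10, 5]]
buffers = left_buffers(rows)
```

These are the same values that the answer inserts as the blank first column of the DataFrame.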
I too recently needed to make a divergent bar chart for some Likert data. I took a slightly different approach than @austin-cory-bart.
I modified an example from the gallery instead and created this:
import numpy as np
import matplotlib.pyplot as plt
category_names = ['Strongly disagree', 'Disagree',
'Neither agree nor disagree', 'Agree', 'Strongly agree']
results = {
'Question 1': [10, 15, 17, 32, 26],
'Question 2': [26, 22, 29, 10, 13],
'Question 3': [35, 37, 7, 2, 19],
'Question 4': [32, 11, 9, 15, 33],
'Question 5': [21, 29, 5, 5, 40],
'Question 6': [8, 19, 5, 30, 38]
}
def survey(results, category_names):
    """
    Parameters
    ----------
    results : dict
        A mapping from question labels to a list of answers per category.
        It is assumed all lists contain the same number of entries and that
        it matches the length of *category_names*. The order is assumed
        to be from 'Strongly disagree' to 'Strongly agree'.
    category_names : list of str
        The category labels.
    """
    labels = list(results.keys())
    data = np.array(list(results.values()))
    data_cum = data.cumsum(axis=1)
    middle_index = data.shape[1] // 2
    # Shift each row so the centre of the middle category sits at x = 0
    offsets = data[:, :middle_index].sum(axis=1) + data[:, middle_index] / 2

    # Color Mapping
    category_colors = plt.get_cmap('coolwarm_r')(
        np.linspace(0.15, 0.85, data.shape[1]))

    fig, ax = plt.subplots(figsize=(10, 5))

    # Plot Bars
    for i, (colname, color) in enumerate(zip(category_names, category_colors)):
        widths = data[:, i]
        starts = data_cum[:, i] - widths - offsets
        ax.barh(labels, widths, left=starts, height=0.5,
                label=colname, color=color)

    # Add Zero Reference Line
    ax.axvline(0, linestyle='--', color='black', alpha=.25)

    # X Axis
    ax.set_xlim(-90, 90)
    ax.set_xticks(np.arange(-90, 91, 10))
    ax.xaxis.set_major_formatter(lambda x, pos: str(abs(int(x))))

    # Y Axis
    ax.invert_yaxis()

    # Remove spines
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['left'].set_visible(False)

    # Legend
    ax.legend(ncol=len(category_names), bbox_to_anchor=(0, 1),
              loc='lower left', fontsize='small')

    # Set Background Color
    fig.set_facecolor('#FFFFFF')

    return fig, ax

fig, ax = survey(results, category_names)
plt.show()
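The x limits of -90 and 90 above are hard-coded for this particular data. One way to derive them from the data, sketched here as an assumption rather than part of the gallery example, is to find the widest extent on either side of zero and round it up to the tick spacing. symmetric_xlim is a made-up helper and assumes an odd number of categories per row.

```python
import math

def symmetric_xlim(rows, step=10):
    """Smallest multiple of `step` that contains every bar on both sides of zero."""
    extent = 0.0
    for row in rows:
        mid = len(row) // 2
        left = sum(row[:mid]) + row[mid] / 2   # width drawn left of zero
        right = sum(row) - left                # width drawn right of zero
        extent = max(extent, left, right)
    return math.ceil(extent / step) * step

rows = [
    [10, 15, 17, 32, 26],
    [26, 22, 29, 10, 13],
    [35, 37, 7, 2, 19],
    [32, 11, 9, 15, 33],
    [21, 29, 5, 5, 40],
    [8, 19, 5, 30, 38],
]
limit = symmetric_xlim(rows)
```

With that, the axis setup could become ax.set_xlim(-limit, limit) and ax.set_xticks(range(-limit, limit + 1, 10)).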
