Plotly: How to add vertical lines at specified points? - python

I have a data frame plot of a time series along with a list of numeric values at which I'd like to draw vertical lines. The plot is an interactive one created using the cufflinks package. Here is an example of three time series with 1000 time values each; I'd like to draw vertical lines at 500 and 800. My attempt using "axvline" is based upon suggestions I've seen in similar posts:
import numpy as np
import pandas as pd
import cufflinks
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig=df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.axvline([500,800], linewidth=5,color="black", linestyle="--")
fig.show()
The error message states 'Figure' object has no attribute 'axvline'.
I'm not sure whether this message is due to my lack of understanding about basic plots or stems from a limitation of using iplot.

The answer:
To add a line to an existing plotly figure, just use:
fig.add_shape(type='line',...)
The details:
I gather this is the post you've seen, since you're mixing in matplotlib. As has been stated in the comments, axvline has nothing to do with plotly; it was only used as an example of how you could have done it using matplotlib. Using plotly, I'd go for fig.add_shape(go.layout.Shape(type="line")). But before you try it out for yourself, please be aware that cufflinks has been deprecated. I really liked cufflinks, but now there are better options for building both quick and detailed graphs. If you'd like to stick to one-liners similar to iplot, I'd suggest using plotly.express. The only hurdle in your case is changing your dataset from a wide format to the long format preferred by plotly.express. The snippet below does just that:
Code:
import numpy as np
import pandas as pd
import plotly.express as px

# sample data: three series in wide format
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
# reshape from wide to long format for plotly.express
df['id'] = df.index
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
# plotly line figure
fig = px.line(df, x='id', y='value', color='variable')
# lines to add as (label, x-position) pairs; a dict would silently
# drop the first of the two 'a' entries because of the duplicate key
lines = [('a', 500), ('c', 700), ('a', 900), ('b', 950)]
# add lines using absolute references
for label, x in lines:
    fig.add_shape(type='line',
                  yref="y",
                  xref="x",
                  x0=x,
                  y0=df['value'].min()*1.2,
                  x1=x,
                  y1=df['value'].max()*1.2,
                  line=dict(color='black', width=3))
    fig.add_annotation(
        x=x,
        y=1.06,
        yref='paper',
        showarrow=False,
        text=label)
fig.show()
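A variation on the loop above, as a minimal sketch: a plotly shape can reference the plot area instead of the data on the y-axis (yref='paper'), so each line spans the full height regardless of the data range. The helper name vline_shapes is made up for this example; the dict keys are the usual layout.shapes attributes.

```python
def vline_shapes(x_positions, color="black", width=3, dash="dash"):
    """Build layout.shapes dicts for full-height vertical lines.

    `vline_shapes` is a hypothetical helper name, not part of plotly.
    """
    return [
        dict(
            type="line",
            xref="x", x0=x, x1=x,       # x given in data coordinates
            yref="paper", y0=0, y1=1,   # y spans the whole plot area
            line=dict(color=color, width=width, dash=dash),
        )
        for x in x_positions
    ]


shapes = vline_shapes([500, 800])
print(shapes)
# With an existing figure: fig.update_layout(shapes=shapes)
```

Plotly 4.12 and later also provide fig.add_vline(x=500), which draws the same kind of line in a single call.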

Not sure if this is what you want, but adding two scatter traces seems to work:
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig = df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.add_scatter(x=[500]*100, y=np.linspace(-4,4,100), name='lower')
fig.add_scatter(x=[800]*100, y=np.linspace(-4,4,100), name='upper')
fig.show()

Related

Extracting plotly.express selection in JupyterLab

I want to extract the indices or a mask from a selection made in a plotly.express figure. The figure is created in JupyterLab.
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig.show()
This figure shows the untouched figure.
This figure shows an arbitrary selection. From this selection, I would like to extract a list of indices or a boolean mask, or anything that will allow the selection to be extracted from the original DataFrame.
There seem to be some attributes/functions intended to help with this, such as fig.data[0].selectedpoints, but I have been unable to use them.
plotly is version: '4.14.3'
As far as I know, there is no way to get the range selected by the user. The attribute you pointed out in your question, selectedpoints, is there for the author of a graph to highlight specific points; it is set by the creator rather than read from a user's choice. I have customized this feature with information from this page.
import plotly.express as px
import plotly.graph_objects as go

df = px.data.iris()
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['sepal_width'],
                         y=df['sepal_length'],
                         mode='markers',
                         marker=dict(color='rgb(0, 45, 240)', size=10)))
fig.update_layout(width=600,
                  height=550,
                  autosize=False,
                  xaxis=dict(zeroline=False),
                  hovermode='closest')
fig.show()
# highlight points 15..44 by their position in the trace
inds = [15+k for k in range(30)]
fig.data[0].update(selectedpoints=inds,
                   selected=dict(marker=dict(color='red')),  # color of selected points
                   unselected=dict(marker=dict(color='rgb(200,200, 200)',  # color of unselected points
                                               opacity=0.9)))
fig.show()
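Since the question asks for indices or a boolean mask, here is a small sketch of how a list of point positions such as inds maps back onto the DataFrame. The frame below is a made-up stand-in for the iris data, and the sketch assumes a single trace whose points are in the same order as the rows.

```python
import numpy as np
import pandas as pd

# Stand-in frame with made-up values; with the iris data this would be px.data.iris()
df = pd.DataFrame({"sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
                   "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0]})

inds = [1, 3]  # positions as they would be passed to selectedpoints

# Rows of the original frame at those positions
selected_rows = df.iloc[inds]

# Equivalent boolean mask over the whole frame
mask = np.zeros(len(df), dtype=bool)
mask[inds] = True

print(selected_rows)
print(mask)
```

selectedpoints counts positions within one trace, so a figure split into several traces (for example by color in plotly.express) would need this mapping done per trace.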

Plotly: How to plot time series in Dash Plotly

I've searched for days and didn't find an answer. How can I plot time series data in Dash Plotly as a line graph with selectable lines?
My data (a pandas dataframe) describes the GDP of different countries. The index is the country, the columns are years.
I can't find a way to pass the data to a Dash Plotly line graph. What are my x and y values?
fig = px.line(df, x=?, y=?)
By the looks of it, the solution in your example should be:
fig = px.line(df, x=df.index, y = df.columns)
Plot 1 - plot by columns as they appear in your dataset
From here, if you'd like to display countries in the legend and have time on the x-axis, you can just add df = df.T into the mix and get:
Plot 2 - transposed dataframe to show time on the x-axis
Details
There's a multitude of possibilities when it comes to plotting time series with plotly. Your example displays a dataset of a wide format. With the latest versions, plotly handles datasets of both long and wide format elegantly straight out of the box. If you need specifics of long and wide data in plotly you can also take a closer look here.
The code snippet below uses the approach described above, but in order for this to work for you exactly the same way, your countries will have to be set as the dataframe row index. But you've stated that they are, so give it a try and let me know how it works out for you. And one more thing: you can freely select which traces to display by clicking the years in the plot legend. The figure produced by the snippet below can also be directly implemented in Dash by following the steps under the section What About Dash? here.
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
# sample dataframe of a wide format
np.random.seed(5); cols = ['Canada', 'France', 'Germany']
X = np.random.randn(6,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
df['Year'] = pd.date_range('2020', freq='Y', periods=len(df)).year.astype(str)
df = df.T
df.columns = df.iloc[-1]
df = df.head(-1)
df.index.name = 'Country'
# Want time on the x-axis? ###
# just include:
# df = df.T
##############################
# plotly
fig = px.line(df, x=df.index, y = df.columns)
fig.update_layout(template="plotly_dark")
fig.show()
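To make the wide, transposed and long layouts mentioned above concrete, here is a pandas-only sketch. The numbers and the column name GDP are made up for illustration.

```python
import pandas as pd

# Made-up GDP-like values: countries as the row index, years as columns
wide = pd.DataFrame(
    {"2020": [1.0, 2.0], "2021": [1.5, 2.5], "2022": [2.0, 3.0]},
    index=pd.Index(["Canada", "France"], name="Country"),
)

# Time on the x-axis: transpose so that the years become the index
by_year = wide.T

# Long format: one row per (Country, Year) pair
long = wide.reset_index().melt(id_vars="Country", var_name="Year", value_name="GDP")

print(by_year)
print(long)
```

The long frame could then be plotted with px.line(long, x='Year', y='GDP', color='Country'), which gives one selectable line per country.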

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms in Python using Seaborn and I have a csv file with about 900 rows. I'm importing the file as a pandas dataframe and attempting to plot that, but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now. But the heatmap only represents about 49 rows.
Here is an image of the clustermap I've obtained but it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.
An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to plot points on a plane from a csv file:
import numpy as np
import matplotlib.pyplot as plt
import csv

# The column names 'TYPE', 'X' and 'Y' and the value of `types` are
# placeholders; adapt them to the layout of your file
types = 'some_type'
temp = np.zeros((128,128), dtype = int)
with open('diff_exp_gene.csv') as infile:
    for row in csv.DictReader(infile):
        if row['TYPE'] == types:
            # count how many rows fall on each (X, Y) cell
            temp[int(row['Y'])][int(row['X'])] += 1
plt.imshow(temp, cmap = 'hot', origin = 'lower')
plt.show()
As far as I know, keywords that apply to seaborn heatmaps also apply to clustermap, since sns.clustermap passes them on to sns.heatmap. In that case, all you need to do in your example is to set yticklabels=True as a keyword argument in sns.clustermap(). That will make the labels of all 900 rows appear.
By default, it is set as "auto" to avoid overlap. The same applies to the xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html

Plot stacked bar chart from pandas data frame

I have a dataframe:
payout_df.head(10)
What would be the easiest, smartest and fastest way to replicate the following excel plot?
I've tried different approaches, but couldn't get everything into place.
Thanks
If you just want a stacked bar chart, then one way is to use a loop to plot each column in the dataframe and keep track of the cumulative sum, which you then pass as the bottom argument of pyplot.bar:
import pandas as pd
import matplotlib.pyplot as plt
# If it's not already a datetime
payout_df['payout'] = pd.to_datetime(payout_df.payout)
cumval=0
fig = plt.figure(figsize=(12,8))
for col in payout_df.columns[~payout_df.columns.isin(['payout'])]:
    plt.bar(payout_df.payout, payout_df[col], bottom=cumval, label=col)
    cumval = cumval+payout_df[col]
_ = plt.xticks(rotation=30)
_ = plt.legend(fontsize=18)
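The running cumval in the loop above is, for each column, the sum of all columns plotted before it. A short sketch with made-up numbers shows the same bottoms computed in one step:

```python
import pandas as pd

# Made-up values: one row per payout date, one column per stacked series
values = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Bottom of each bar = row-wise cumulative sum minus the bar's own height,
# i.e. the total of all columns to its left
bottoms = values.cumsum(axis=1) - values
print(bottoms)
```

Each column of bottoms is what the loop would pass as bottom when plotting the matching column of values.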
Although I don't have your data, I think the following code will produce the desired graph:
import pandas as pd
import matplotlib.pyplot as plt
df['payout'] = pd.to_datetime(df['payout'])
grouped = df.groupby(pd.Grouper(key='payout', freq='M')).sum()
# label the bars by year; plot(x=...) expects a column label, not an array
grouped.index = grouped.index.year
grouped.plot(kind='bar', stacked=True)
plt.show()
I don't know how to reproduce this fancy x-axis style. Also, your payout column must be a datetime, otherwise pd.Grouper won't work (see the available frequencies in the pandas documentation).
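To illustrate the monthly aggregation step on its own, here is a sketch with made-up payouts. It swaps pd.Grouper for Series.dt.to_period('M'), which groups by calendar month in the same way; the column names x and y are hypothetical.

```python
import pandas as pd

# Made-up payouts; the column names 'x' and 'y' are hypothetical
df = pd.DataFrame({
    "payout": ["2020-01-05", "2020-01-20", "2020-02-10"],
    "x": [1, 2, 3],
    "y": [10, 20, 30],
})
# Plain strings have no .dt accessor, so convert to datetime first
df["payout"] = pd.to_datetime(df["payout"])

# Group on the calendar month of each payout and sum the value columns
monthly = df.groupby(df["payout"].dt.to_period("M"))[["x", "y"]].sum()
print(monthly)
```

The result has one row per month and can be passed to monthly.plot(kind='bar', stacked=True).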

Multiple series in a trace for plotly

I dynamically generate a pandas dataframe where columns are months, index is day-of-month, and values are cumulative revenue. This is fairly easy, b/c it just pivots a dataframe that is month/dom/rev.
But now I want to plot it in plotly. Since every month the columns will expand, I don't want to manually add a trace per month. But I can't seem to have a single trace incorporate multiple columns. I could've sworn this was possible.
revs = Scatter(
    x=df.index,
    y=[df['2016-Aug'], df['2016-Sep']],
    name=['rev', 'revvv'],
    mode='lines'
)
data=[revs]
fig = dict( data=data)
iplot(fig)
This generates an empty graph, no errors. Ideally I'd just pass df[df.columns] to y. Is this possible?
You were probably thinking about cufflinks. You can plot a whole dataframe with Plotly using the iplot function without data replication.
An alternative would be to use pandas.plot to get a matplotlib object, which is then converted via plotly.tools.mpl_to_plotly and plotted. The whole procedure can be shortened to one line:
plotly.plotly.plot_mpl(df.plot().figure)
The output is virtually identical, just the legend needs tweaking.
import plotly
import pandas as pd
import random
import cufflinks as cf
from collections import OrderedDict
# random sample data, one column per month
data = OrderedDict()
for month in ['2016-Aug', '2016-Sep']:
    data[month] = [random.randrange(i * 10, i * 100) for i in range(1, 30)]
#using cufflinks
df = pd.DataFrame(data, index=[i for i in range(1, 30)])
fig = df.iplot(asFigure=True, kind='scatter', filename='df.html')
plot_url = plotly.offline.plot(fig)
print(plot_url)
#using mpl_to_plotly
plot_url = plotly.offline.plot(plotly.tools.mpl_to_plotly(df.plot().figure))
print(plot_url)
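If you would rather stay with plain plotly, a single trace cannot hold several columns, but a list comprehension can build one trace per column so that new months are picked up automatically. This is a minimal sketch with made-up revenue numbers; the traces are plain dicts of the kind plotly accepts in a figure's data list.

```python
import pandas as pd

# Made-up cumulative revenue: day-of-month index, one column per month
df = pd.DataFrame({"2016-Aug": [10, 25, 40], "2016-Sep": [12, 30, 55]},
                  index=[1, 2, 3])

# One scatter trace per column, named after the column
traces = [
    dict(type="scatter", x=df.index.tolist(), y=df[col].tolist(),
         name=col, mode="lines")
    for col in df.columns
]
fig = dict(data=traces)
print([t["name"] for t in traces])
```

The resulting fig dict can be passed to iplot(fig) as in the question, or wrapped in plotly.graph_objects.Figure(fig).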
