I dynamically generate a pandas dataframe where columns are months, index is day-of-month, and values are cumulative revenue. This is fairly easy, b/c it just pivots a dataframe that is month/dom/rev.
But now I want to plot it in plotly. Since every month the columns will expand, I don't want to manually add a trace per month. But I can't seem to have a single trace incorporate multiple columns. I could've sworn this was possible.
revs = Scatter(
x=df.index,
y=[df['2016-Aug'], df['2016-Sep']],
name=['rev', 'revvv'],
mode='lines'
)
data=[revs]
fig = dict( data=data)
iplot(fig)
This generates an empty graph, no errors. Ideally I'd just pass df[df.columns] to y. Is this possible?
You were probably thinking about cufflinks. You can plot a whole dataframe with Plotly using the iplot function without data replication.
An alternative would be to use pandas.plot to get an matplotlib object which is then converted via plotly.tools.mpl_to_plotly and plotted. The whole procedure can be shortened to one line:
plotly.plotly.plot_mpl(df.plot().figure)
The output is virtually identical, just the legend needs tweaking.
import plotly
import pandas as pd
import random
import cufflinks as cf
data = plotly.tools.OrderedDict()
for month in ['2016-Aug', '2016-Sep']:
data[month] = [random.randrange(i * 10, i * 100) for i in range(1, 30)]
#using cufflinks
df = pd.DataFrame(data, index=[i for i in range(1, 30)])
fig = df.iplot(asFigure=True, kind='scatter', filename='df.html')
plot_url = plotly.offline.plot(fig)
print(plot_url)
#using mpl_to_plotly
plot_url = plotly.offline.plot(plotly.tools.mpl_to_plotly(df.plot().figure))
print(plot_url)
Related
I'm a beginner in Python.
In my internship project I am trying to plot bloxplots from data contained in a csv
I need to plot bloxplots for each of the 4 (four) variables showed above (AAG, DENS, SRG e RCG). Since each variable presents values in the range from [001] to [100], there will be 100 boxplots for each variable, which need to be plotted in a single graph as shown in the image.
This is the graph I need to plot, but for each variable there will be 100 bloxplots as each one has 100 columns of values:
The x-axis is the "Year", which ranges from 2025 to 2030, so I need a graph like the one shown in figure 2 for each year and the y-axis is the sets of values for each variable.
Using Pandas-melt function and seaborn library I was able to plot only the boxplots of a column. But that's not what I need:
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
mdf= df.melt(id_vars=['Year'], value_vars='AAG[001]')
print(mdf)
ax=sns.boxplot(x='Year', y='value',width = 0.2, data=mdf)
Result of the code above:
What can I try to resolve this?
The following code gives you five subplots, where each subplot only contains the data of one variable. Then a boxplot is generated for each year. To change the range of columns used for each variable, change the upper limit in var_range = range(1, 101), and to see the outliers change showfliers to True.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
variables = ["AAG", "DENS", "SRG", "RCG", "Thick"]
period = range(2025, 2031)
var_range = range(1, 101)
fig, axes = plt.subplots(2, 3)
flattened_axes = fig.axes
flattened_axes[-1].set_visible(False)
for i, var in enumerate(variables):
var_columns = [f"TB_acc_{var}[{j:05}]" for j in var_range]
data = df.melt(id_vars=["Period"], value_vars=var_columns, value_name=var)
ax = flattened_axes[i]
sns.boxplot(x="Period", y=var, width=0.2, data=data, ax=ax, showfliers=False)
plt.tight_layout()
plt.show()
output:
I am having trouble eliminating datetime gaps within a dataset that i'm trying to create a very simple line chart in plotly express and I have straight lines on the graph connecting datapoints over a gap in the data (weekends).
Dataframe simply has an index of datetime (to the hour) called sale_date, and cols called NAME, COST with approximately 30 days worth of data.
df['sale_date'] = pd.to_datetime(df['sale_date'])
df = df.set_index('sale_date')
px.line(df, x=df.index, y='COST', color='NAME')
I've seen a few posts regarding this issue and one recommended setting datetime as the index, but it still yields the gap lines.
The data in the example may not be the same as yours, but the point is that you can change the x-axis data to string data instead of date/time data, or change the x-axis type to category, and add a scale and tick text.
import pandas as pd
import plotly.express as px
import numpy as np
np.random.seed(2021)
date_rng = pd.date_range('2021-08-01','2021-08-31', freq='B')
name = ['apple']
df = pd.DataFrame({'sale_date':pd.to_datetime(date_rng),
'COST':np.random.randint(100,3000,(len(date_rng),)),
'NAME':np.random.choice(name,size=len(date_rng))})
df = df.set_index('sale_date')
fig= px.line(df, x=[d.strftime('%m/%d') for d in df.index], y='COST', color='NAME')
fig.show()
xaxis update
fig= px.line(df, x=df.index, y='COST', color='NAME')
fig.update_xaxes(type='category',
tickvals=np.arange(0,len(df)),
ticktext=[d.strftime('%m/%d') for d in df.index])
I've searched for days and didn't find an answer. How can I plot a time series data in Dash Plotly as a linegraph with selectable lines?
My data (pandas dataframe) describes GDP of different countrys. Index is country, column is years.
I don't find a solution to pass the data to Dash Plotly linegraph. What are my x and y values?
fig = px.line(df, x=?, y=?)
By the looks of it, the solution in your example should be:
fig = px.line(df, x=df.index, y = df.columns)
Plot 1 - plot by columns as they appear in your dataset
From here, if you'd like to display countries in the legend and have time on the x-axis, you can just add df = df.T into the mix and get:
Plot 2 - transposed dataframe to show time on the x-axis
Details
There's a multitude of possibilites when it comes to plotting time series with plotly. Your example displays a dataset of a wide format. With the latest versions, plotly handles both datasets of long and wide format elegantly straight out of the box. If you need specifics of long and wide data in plotly you can also take a closer look here.
The code snippet below uses the approach described above, but in order for this to work for you exactly the same way, your countries will have to be set as the dataframe row index. But you've stated that they are, so give it a try and let me know how it works out for you. And one more thing: you can freely select which traces to display by clicking the years in the plot legend. The figure produced by the snippet below can also be directly implemented in Dash by following the steps under the section What About Dash? here.
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.io as pio
# sample dataframe of a wide format
np.random.seed(5); cols = ['Canada', 'France', 'Germany']
X = np.random.randn(6,len(cols))
df=pd.DataFrame(X, columns=cols)
df.iloc[0]=0;df=df.cumsum()
df['Year'] = pd.date_range('2020', freq='Y', periods=len(df)).year.astype(str)
df = df.T
df.columns = df.iloc[-1]
df = df.head(-1)
df.index.name = 'Country'
# Want time on the x-axis? ###
# just include:
# df = df.T
##############################
# plotly
fig = px.line(df, x=df.index, y = df.columns)
fig.update_layout(template="plotly_dark")
fig.show()
I have a data frame plot of a time series along with a list of numeric values at which I'd like to draw vertical lines. The plot is an interactive one created using the cufflinks package. Here is an example of three time series in 1000 time values, I'd like to draw vertical lines at 500 and 800. My attempt using "axvlinee" is based upon suggestions I've seen for similar posts:
import numpy as np
import pandas as pd
import cufflinks
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig=df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.axvline([500,800], linewidth=5,color="black", linestyle="--")
fig.show()
The error message states 'Figure' object has no attribute 'axvline'.
I'm not sure whether this message is due to my lack of understanding about basic plots or stems from a limitation of using igraph.
The answer:
To add a line to an existing plotly figure, just use:
fig.add_shape(type='line',...)
The details:
I gather this is the post you've seen since you're mixing in matplotlib. And as it has been stated in the comments, axvline has got nothing to do with plotly. That was only used as an example for how you could have done it using matplotlib. Using plotly, I'd either go for fig.add_shape(go.layout.Shape(type="line"). But before you try it out for yourself, please b aware that cufflinks has been deprecated. I really liked cufflinks, but now there are better options for building both quick and detailed graphs. If you'd like to stick to one-liners similat to iplot, I'd suggest using plotly.express. The only hurdle in your case is changing your dataset from a wide to a long format that is preferred by plotly.express. The snippet below does just that to produce the following plot:
Code:
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.offline import iplot
#
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
df['id'] = df.index
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
# plotly line figure
fig = px.line(df, x='id', y='value', color='variable')
# lines to add, specified by x-position
lines = {'a':500,'c':700,'a':900,'b':950}
# add lines using absolute references
for k in lines.keys():
#print(k)
fig.add_shape(type='line',
yref="y",
xref="x",
x0=lines[k],
y0=df['value'].min()*1.2,
x1=lines[k],
y1=df['value'].max()*1.2,
line=dict(color='black', width=3))
fig.add_annotation(
x=lines[k],
y=1.06,
yref='paper',
showarrow=False,
text=k)
fig.show()
Not sure if this is what you want, adding two scatter seems to work:
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig = df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.add_scatter(x=[500]*100, y=np.linspace(-4,4,100), name='lower')
fig.add_scatter(x=[800]*100, y=np.linspace(-4,4,100), name='upper')
fig.show()
Output:
I'm trying to plot a histogram with date data using plotly. I would like to plot it with bin sizes corresponding to weeks, and that doesn't seem to work. I searched for documentation about it but didn't find anything.
Here is the code I have. I tried (line 5): 'D7' and 'W1'. That doesn't work (plotly seems not to recognize argument, and set it to one bin per day). What's strange is that 'M1', 'M3' etc... seem to work
fig = go.Figure(data=[go.Histogram(x=df.col,
xbins=dict(
start='2018-01-01',
end='2018-12-31',
size='D7'),
autobinx=False)])
fig.update_layout(
title=go.layout.Title(
text="title",
xref="paper",
x=0.5
),
xaxis_title_text='xaxis title',
yaxis_title_text='yaxis title'
)
fig.show()
Would someone have any information about this problem ?
Thanks
xbins.size is specified in milliseconds by default. To get weekly bins, set xbins.size to 604800000 (7 days with 86,400,000 milliseconds each).
Plotly provides the format xM to get monthly bins because this use case requires more complicated calculations in the background, as monthly bins do not have a uniform size.
It seems that a resampled data source and a bar plot is what you're really looking for:
Plot:
Here, the source data based on daily observations DatetimeIndex(['2020-01-01', '2020-01-02', ... , '2020-07-18'], have been resampled to show sum per week for a certain stock price.
Code:
# Imports
import pandas as pd
#import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
#from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# data, random sample to illustrate stocks
np.random.seed(12345)
rows = 200
x = pd.Series(np.random.randn(rows),index=pd.date_range('1/1/2020', periods=rows)).cumsum()
y = pd.Series(x-np.random.randn(rows)*5,index=pd.date_range('1/1/2020', periods=rows))
df = pd.concat([y,x], axis = 1)
df.columns = ['StockA', 'StockB']
# resample daily data to weekly sums
df2=df.reset_index()
df3=df2.resample('W-Mon', on='index').mean()
# build and show plotly plot
fig = go.Figure([go.Bar(x=df3.index, y=df3['StockA'])])
fig.show()
Let me know how this works for you.