I'm trying to plot a histogram of date data using plotly. I would like the bin size to correspond to weeks, but that doesn't seem to work. I searched the documentation and didn't find anything about it.
Here is the code I have. For the size argument (line 5) I tried 'D7' and 'W1'. Neither works: plotly does not seem to recognize the value and falls back to one bin per day. What's strange is that 'M1', 'M3', etc. do seem to work.
fig = go.Figure(data=[go.Histogram(x=df.col,
xbins=dict(
start='2018-01-01',
end='2018-12-31',
size='D7'),
autobinx=False)])
fig.update_layout(
title=go.layout.Title(
text="title",
xref="paper",
x=0.5
),
xaxis_title_text='xaxis title',
yaxis_title_text='yaxis title'
)
fig.show()
Does anyone have any information about this problem?
Thanks
xbins.size is specified in milliseconds by default. To get weekly bins, set xbins.size to 604800000 (7 days with 86,400,000 milliseconds each).
Plotly provides the format xM to get monthly bins because this use case requires more complicated calculations in the background, as monthly bins do not have a uniform size.
It seems that a resampled data source and a bar plot is what you're really looking for:
Plot:
Here, the source data, based on daily observations with DatetimeIndex(['2020-01-01', '2020-01-02', ... , '2020-07-18']), have been resampled to show the mean per week for a certain stock price.
Code:
# Imports
import pandas as pd
#import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
#from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# data, random sample to illustrate stocks
np.random.seed(12345)
rows = 200
x = pd.Series(np.random.randn(rows),index=pd.date_range('1/1/2020', periods=rows)).cumsum()
y = pd.Series(x-np.random.randn(rows)*5,index=pd.date_range('1/1/2020', periods=rows))
df = pd.concat([y,x], axis = 1)
df.columns = ['StockA', 'StockB']
# resample daily data to weekly means (weeks ending on Monday)
df2=df.reset_index()
df3=df2.resample('W-Mon', on='index').mean()
# build and show plotly plot
fig = go.Figure([go.Bar(x=df3.index, y=df3['StockA'])])
fig.show()
Let me know how this works for you.
I am analysing a stock portfolio.
I downloaded data from yahoo finance and created a data frame.
What I want to do now is analyse and plot the distributions of simple returns and log returns. I can do this for one stock, but (and here is my question) I also want to plot the distributions of all the stocks in the same graph, to spot their different behaviours.
I can plot the single stock return distribution but not the multiple graphs one.
#Import libraries
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from pandas_datareader import data as pdr
#Set start/end time and get stocks data from Yahoo Finance
end = dt.datetime.now()
start = end - dt.timedelta(weeks=104)
stocks_list = ['COST', 'NIO', 'AMD']
df = pdr.get_data_yahoo(stocks_list, start, end)
#Rename 'Adj Close' to be able to create a more accessible variable
df.rename(columns = {"Adj Close" : "Adj_Close"}, inplace = True)
AClose = df.Adj_Close
#Import plotly
import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)
pd.options.plotting.backend = 'plotly'
#Plot the Adjusted Close price with plotly
AClose.plot()
#Plot the SIMPLE return distrib. chart of the Adj_Close price of 'COST' with plotly
AClose['COST'].pct_change().plot(kind='hist')
#Plot the SIMPLE return distrib. chart of the Adj_Close price of ALL the stocks
**QUESTION 1**
#Compute log returns
log_returns = np.log(df.Adj_Close/df.Adj_Close.shift(1)).dropna()
#Plot the LOG returns distrib. chart of the Adj_Close price of 'COST'
log_returns['COST'].plot(kind = 'hist')
#Plot the LOG return distrib. chart of the Adj_Close price of ALL the stocks
**QUESTION 2**
What I am trying to achieve is something like this, but I want to plot all the stocks, not just one, and I want to do it with plotly so I can toggle each stock's data in and out from the graph's legend.
Normal Vs Stock return
I wrote the code on the understanding that you want a histogram with all the returns in different colors. What remains is to add a normal distribution curve, which can be drawn as a scatter trace in line mode. Simple returns can be graphed with the same technique. If you want to compare each stock against the normal curve separately, I think each comparison should go in its own subplot.
log_max = log_returns.max().max()
log_min = log_returns.min().min()
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Histogram(x=log_returns.COST, name='COST'))
fig.add_trace(go.Histogram(x=log_returns.NIO, name='NIO'))
fig.add_trace(go.Histogram(x=log_returns.AMD, name='AMD'))
fig.update_xaxes(range=[log_min, log_max])
fig.update_layout(autosize=True, height=450, barmode='overlay', title='Log returns: COST,NIO,AMD')
fig.update_traces(opacity=0.5)
fig.show()
I am trying to represent CDC Delay of Care data as a line graph but am having some trouble formatting the y axis so that it is a percentage to the hundredths place. I would also like for the x axis to show every year in the range selected.
Here is my code:
import pandas as pd
from isolation import isolate_total_stub, isolate_age_stub
import matplotlib.pyplot as plt
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
plt.plot(x_axis, y_axis)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()
The graph that displays looks like this:
line graph
Excuse me for being a novice. I have been googling for an hour now, and instead of continuing to search for an answer I thought it would be more productive to ask StackOverflow.
Thank you for your time.
To change the format of Y-axis, you can use set_major_formatter
To change X-axis to date in year format, you will need to use set_major_locator, assuming that your date is in datetime format
To change format of X-axis, you can again use the set_major_formatter
I am showing a small example below with dummy data. Hope this works.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib.dates as mdate
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = ['2000-01-01', '2004-01-01', '2008-01-01', '2012-01-01', '2016-01-01', '2020-01-01']
year=pd.to_datetime(year) ## Convert string to datetime
plt.figure(figsize=(12,5)) ## Added so the Years don't overlap on each other
plt.plot(year, estimate)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.gca().yaxis.set_major_formatter(FormatStrFormatter('%.2f')) ## Formats Y-axis labels with two decimal places
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator) ## Changes datetime to years - 1 label per year
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%Y')) ## Shows X-axis in Years
plt.gcf().autofmt_xdate() ## Rotates X-labels, if you want to use it
plt.show()
Output plot
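If the Y-axis labels should also carry a percent sign, and the years are stored as plain integers rather than datetimes, a variation along these lines might work. The numbers are made-up dummy data, and it assumes the ESTIMATE values are already in percentage points (8.3 meaning 8.3%).

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Hypothetical dummy data: integer years and estimates in percentage points
years = [2014, 2015, 2016, 2017, 2018]
estimate = [8.0, 7.1, 6.4, 6.9, 7.3]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(years, estimate)
ax.set_xlabel('Year')
ax.set_ylabel('Percentage')

# xmax=100 means a value of 100 is shown as 100%; decimals=2 gives hundredths
pct = PercentFormatter(xmax=100, decimals=2)
ax.yaxis.set_major_formatter(pct)

# One tick per year in the data
ax.set_xticks(years)
```

Setting the ticks explicitly from the data sidesteps the date locator entirely, which may be simpler when the YEAR column is numeric.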
I am having trouble eliminating datetime gaps in a dataset that I'm trying to plot as a very simple line chart in plotly express. The graph shows straight lines connecting the data points across the gaps in the data (weekends).
Dataframe simply has an index of datetime (to the hour) called sale_date, and cols called NAME, COST with approximately 30 days worth of data.
df['sale_date'] = pd.to_datetime(df['sale_date'])
df = df.set_index('sale_date')
px.line(df, x=df.index, y='COST', color='NAME')
I've seen a few posts regarding this issue and one recommended setting datetime as the index, but it still yields the gap lines.
The data in the example may not be the same as yours, but the point is that you can either change the x-axis data to strings instead of date/time values, or change the x-axis type to category and set the tick values and tick text yourself.
import pandas as pd
import plotly.express as px
import numpy as np
np.random.seed(2021)
date_rng = pd.date_range('2021-08-01','2021-08-31', freq='B')
name = ['apple']
df = pd.DataFrame({'sale_date':pd.to_datetime(date_rng),
'COST':np.random.randint(100,3000,(len(date_rng),)),
'NAME':np.random.choice(name,size=len(date_rng))})
df = df.set_index('sale_date')
fig= px.line(df, x=[d.strftime('%m/%d') for d in df.index], y='COST', color='NAME')
fig.show()
xaxis update
fig= px.line(df, x=df.index, y='COST', color='NAME')
fig.update_xaxes(type='category',
tickvals=np.arange(0,len(df)),
ticktext=[d.strftime('%m/%d') for d in df.index])
I dynamically generate a pandas dataframe where columns are months, index is day-of-month, and values are cumulative revenue. This is fairly easy, b/c it just pivots a dataframe that is month/dom/rev.
But now I want to plot it in plotly. Since every month the columns will expand, I don't want to manually add a trace per month. But I can't seem to have a single trace incorporate multiple columns. I could've sworn this was possible.
revs = Scatter(
x=df.index,
y=[df['2016-Aug'], df['2016-Sep']],
name=['rev', 'revvv'],
mode='lines'
)
data=[revs]
fig = dict( data=data)
iplot(fig)
This generates an empty graph, no errors. Ideally I'd just pass df[df.columns] to y. Is this possible?
You were probably thinking about cufflinks. You can plot a whole dataframe with Plotly using the iplot function without data replication.
An alternative would be to use pandas.plot to get a matplotlib object, which is then converted via plotly.tools.mpl_to_plotly and plotted. (In plotly 4 and later, the plotly.plotly module lives in the separate chart_studio package.) The whole procedure can be shortened to one line:
plotly.plotly.plot_mpl(df.plot().figure)
The output is virtually identical, just the legend needs tweaking.
import plotly
import pandas as pd
import random
import cufflinks as cf
from collections import OrderedDict
# build sample data: one column of random values per month
data = OrderedDict()
for month in ['2016-Aug', '2016-Sep']:
data[month] = [random.randrange(i * 10, i * 100) for i in range(1, 30)]
#using cufflinks
df = pd.DataFrame(data, index=[i for i in range(1, 30)])
fig = df.iplot(asFigure=True, kind='scatter', filename='df.html')
plot_url = plotly.offline.plot(fig)
print(plot_url)
#using mpl_to_plotly
plot_url = plotly.offline.plot(plotly.tools.mpl_to_plotly(df.plot().figure))
print(plot_url)
I am learning python pandas + matplotlib + seaborn plotting and data visualization from a "R Lattice" perspective. I am still getting my legs. Here is a basic question that I could not get to work just right. Here's the example:
# envir (this is running in an iPython notebook)
%pylab inline
# imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# generate some data
nRows = 500
df = pd.DataFrame({'c1' : np.random.choice(['A','B','C','D'], size=nRows),
'c2' : np.random.choice(['P','Q','R'], size=nRows),
'i1' : np.random.randint(20,50, nRows),
'i2' : np.random.randint(0,10, nRows),
'x1' : 3 * np.random.randn(nRows) + 90,
'x2' : 2 * np.random.randn(nRows) + 89,
't1' : pd.date_range('10/3/2014', periods=nRows)})
# plot a lattice like plot
# 'hue=' is like 'groups=' in R
# 'col=' is like "|" in lattice formula interface
g = sns.FacetGrid(df, col='c1', hue='c2', size=4, col_wrap=2, aspect=2)
g.map(scatter, 't1', 'x1', s=20)
g.add_legend()
I would like the x axis to plot in an appropriate date time format, not as an integer. I am ok specify the format (YYYY-MM-DD, for example) as a start.
However it would be better if the time range was inspected and the appropriate scale was produced. In R Lattice (and other plotting systems), if the x variable is a datetime, a "pretty" function would determine if the range was large and implied YYYY only (say, for plotting 20 year time trend), YYYY-MM (for plotting something that was a few years)... or YYYY-MM-DD HH:MM:SS format for high frequency time series data (i.e. something sampled every 100 mS). That was done automatically. Is there anything like that available for this case?
One other really basic question on this example (I am almost embarrassed to ask). How can I get a title on this plot?
Thanks!
Randall
It looks like seaborn does not support datetime on the axes in lmplot yet. However, it does support it in a few of its other plots. In the meantime, I would suggest adding your need to the issue in the link above, since it currently seems there isn't enough perceived need for them to address it.
As for a title, FacetGrid itself has no set_title() method, but you can set a figure-level title with suptitle() on the underlying matplotlib figure. That would look something like this:
.
.
.
g = sns.FacetGrid(df, col='c1', hue='c2', size=4, col_wrap=2, aspect=2)
g.map(scatter, 't1', 'x1', s=20)
g.add_legend()
Then simply add:
g.fig.suptitle('Check out that beautiful facet plot!')