Pyplot - Plotting multiple distributions from a dataframe - python

I am analysing a stock portfolio.
I downloaded data from yahoo finance and created a data frame.
What I want to do now is analyse and plot the distributions of simple returns and log returns. I want to be able to do it for one stock, but also (and here is my question) to plot all the stocks' distributions in the same graph so that I can spot their different behaviours.
I can plot the return distribution for a single stock, but not for several stocks in one graph.
#Import libraries
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from pandas_datareader import data as pdr
#Set start/end time and get stocks data from Yahoo Finance
end = dt.datetime.now()
start = end - dt.timedelta(weeks=104)
stocks_list = ['COST', 'NIO', 'AMD']
df = pdr.get_data_yahoo(stocks_list, start, end)
#Rename 'Adj Close' to be able to create a more accessible variable
df.rename(columns = {"Adj Close" : "Adj_Close"}, inplace = True)
AClose = df.Adj_Close
#Import plotly
import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)
pd.options.plotting.backend = 'plotly'
#Plot the Adjusted Close price with plotly
AClose.plot()
#Plot the SIMPLE return distrib. chart of the Adj_Close price of 'COST' with plotly
AClose['COST'].pct_change().plot(kind='hist')
#Plot the SIMPLE return distrib. chart of the Adj_Close price of ALL the stocks
**QUESTION 1**
#Compute log returns
log_returns = np.log(df.Adj_Close/df.Adj_Close.shift(1)).dropna()
#Plot the LOG returns distrib. chart of the Adj_Close price of 'COST'
log_returns['COST'].plot(kind = 'hist')
#Plot the LOG return distrib. chart of the Adj_Close price of ALL the stocks
**QUESTION 2**
What I am trying to achieve is something like this, but I want to plot all the stocks and not just one, and I want to do it with plotly so I can toggle each stock's data in and out from the graph's legend.
[Figure: Normal vs. stock return]

I wrote the code below on the understanding that you want one histogram with all the returns, each stock in a different color. To add the normal distribution, I think all that is needed is a scatter trace in line mode. Simple returns can be graphed using the same technique. I think each comparison should go in its own subplot.
log_max = log_returns.max().max()
log_min = log_returns.min().min()
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Histogram(x=log_returns.COST, name='COST'))
fig.add_trace(go.Histogram(x=log_returns.NIO, name='NIO'))
fig.add_trace(go.Histogram(x=log_returns.AMD, name='AMD'))
fig.update_xaxes(range=[log_min, log_max])
fig.update_layout(autosize=True, height=450, barmode='overlay', title='Log returns: COST,NIO,AMD')
fig.update_traces(opacity=0.5)
fig.show()

Related

How to plot two columns on one line graph using pandas and seaborn?

I want to plot columns High and Low into the same line graph using seaborn.
They are from the same csv file. I managed to graph High on the y-axis but I am struggling to get the Low in.
For reference, the files are from Kaggle.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
# Read Data from CSV:
micro = pd.read_csv('Microsoft Corporationstock.csv', parse_dates=['Date'])
covid = pd.read_csv('day_wise.csv', parse_dates=['Date'])
# Filter Data Needed:
start_date = covid['Date'].loc[0]
index_date = int(micro[micro['Date'] == start_date].index.values)
# Microsoft Stock Plot:
mg = sns.relplot(x='Date', y='High', kind='line', data=micro.loc[index_date:])
mg.set_xticklabels(rotation=30)
plt.ylabel('Market Value')
mg.fig.autofmt_xdate()
plt.show()
[Figure: High stock graph]
In general seaborn plots work better in "melted" format.
Here you can melt the data into Date, Type (High or Low), and Market Value. Then use y='Market Value' and hue='Type' to get both High and Low in the same plot:
mg = sns.relplot(
    data=micro.loc[index_date:].melt(
        id_vars='Date',
        value_vars=['High', 'Low'],  # melt only the two columns to plot
        var_name='Type',
        value_name='Market Value',
    ),
    x='Date',
    y='Market Value',
    hue='Type',
    kind='line',
)
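To see what the melt step produces before it reaches seaborn, here is a small sketch on made-up numbers. The values and the extra Volume column are hypothetical and only mimic the layout of the Kaggle file.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per date, one column per measure.
micro = pd.DataFrame({
    "Date": pd.date_range("2020-01-22", periods=3),
    "High": [167.5, 166.8, 165.9],
    "Low": [165.4, 164.9, 163.1],
    "Volume": [100, 120, 90],
})

# Long ("melted") format: one row per (Date, Type) pair.
# value_vars keeps Volume out of the result.
long_form = micro.melt(
    id_vars="Date",
    value_vars=["High", "Low"],
    var_name="Type",
    value_name="Market Value",
)
print(long_form)
```

The result has six rows (three dates times two types), which is the shape that hue='Type' expects.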
Does changing it to this work? If it's similar to matplotlib it might.
plot_y = ['High', 'Low']
fig, ax = plt.subplots()
for y in plot_y:
    # lineplot draws on the given axes; relplot would open a new figure on every call
    sns.lineplot(x='Date', y=y, data=micro.loc[index_date:], ax=ax, label=y)
fig.autofmt_xdate()
ax.set_ylabel('Market Value')
plt.show()

Plotly - Histogram bins size to weeks

I'm trying to plot a histogram with date data using plotly. I would like to plot it with bin sizes corresponding to weeks, and that doesn't seem to work. I searched for documentation about it but didn't find anything.
Here is the code I have. On line 5 I tried 'D7' and 'W1'. Neither works (plotly does not seem to recognize the argument and falls back to one bin per day). What's strange is that 'M1', 'M3', etc. do seem to work.
fig = go.Figure(data=[go.Histogram(x=df.col,
xbins=dict(
start='2018-01-01',
end='2018-12-31',
size='D7'),
autobinx=False)])
fig.update_layout(
title=go.layout.Title(
text="title",
xref="paper",
x=0.5
),
xaxis_title_text='xaxis title',
yaxis_title_text='yaxis title'
)
fig.show()
Would someone have any information about this problem ?
Thanks
xbins.size is specified in milliseconds by default. To get weekly bins, set xbins.size to 604800000 (7 days with 86,400,000 milliseconds each).
Plotly provides the format xM to get monthly bins because this use case requires more complicated calculations in the background, as monthly bins do not have a uniform size.
It seems that a resampled data source and a bar plot is what you're really looking for:
Plot:
Here, the source data, based on daily observations with DatetimeIndex(['2020-01-01', '2020-01-02', ... , '2020-07-18']), have been resampled to show the mean per week for a certain stock price.
Code:
# Imports
import pandas as pd
#import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
#from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# data, random sample to illustrate stocks
np.random.seed(12345)
rows = 200
x = pd.Series(np.random.randn(rows),index=pd.date_range('1/1/2020', periods=rows)).cumsum()
y = pd.Series(x-np.random.randn(rows)*5,index=pd.date_range('1/1/2020', periods=rows))
df = pd.concat([y,x], axis = 1)
df.columns = ['StockA', 'StockB']
# resample daily data to weekly means (weeks ending on Monday)
df2=df.reset_index()
df3=df2.resample('W-MON', on='index').mean()
# build and show plotly plot
fig = go.Figure([go.Bar(x=df3.index, y=df3['StockA'])])
fig.show()
Let me know how this works for you.

Creating a visualization with 2 Y-Axis scales

I am currently trying to plot the price of the 1080 graphics card against the price of bitcoin over time, but the scales of the Y axis are just way off. This is my code so far:
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
from matplotlib.pyplot import *
import numpy as np
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
date = pd.to_datetime(GPUDATA["Date"])
price = GPUDATA["Price_USD"]
date1 = pd.to_datetime(BCDATA["Date"])
price1 = BCDATA["Close"]
plot(date, price)
plot(date1, price1)
And that produces this:
The GPU prices, of course, are in blue and the price of bitcoin is in orange. I am fairly new to visualizations and I'm having a rough time finding anything online that could help me fix this issue. Some of the suggestions I found on here seem to deal with plotting data from a single datasource, but my data comes from 2 datasources.
One has entries of the GPU price in a given day, the other has the open, close, high, and low price of bitcoin in a given day. I am struggling to find a solution, any advice would be more than welcome! Thank you!
What you want to do is twin the X-axis, such that both plots will share the X-axis, but have separate Y-axes. That can be done in this way:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
gpu_dates = pd.to_datetime(GPUDATA["Date"])
gpu_prices = GPUDATA["Price_USD"]
btc_dates = pd.to_datetime(BCDATA["Date"])
btc_prices = BCDATA["Close"]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx() # Create a new Axes object sharing ax1's x-axis
ax1.plot(gpu_dates, gpu_prices, color='blue')
ax2.plot(btc_dates, btc_prices, color='red')
As you have not provided sample data, I am unable to show a relevant demonstration, but this should work.
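For a demonstration without the CSV files, here is a sketch on synthetic series. The numbers are invented and only chosen so that the two scales differ by roughly a factor of ten, as in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two CSV files (assumption).
dates = pd.date_range("2017-01-01", periods=120, freq="D")
rng = np.random.default_rng(2)
gpu_prices = pd.Series(550 + rng.normal(0, 5, 120).cumsum(), index=dates)
btc_prices = pd.Series(5000 + rng.normal(0, 150, 120).cumsum(), index=dates)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis on the right, same x-axis as ax1

ax1.plot(gpu_prices.index, gpu_prices, color="blue")
ax2.plot(btc_prices.index, btc_prices, color="red")

# Color the axis labels to match the lines so each scale is easy to read.
ax1.set_ylabel("GPU price (USD)", color="blue")
ax2.set_ylabel("Bitcoin close (USD)", color="red")
fig.autofmt_xdate()
```

Add plt.show() at the end when running this as a script. Each line now fills its own vertical range instead of the GPU price being flattened by the Bitcoin scale.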

aggregate tick data to open high low close non time related

I would like to consolidate tick data stored in a pandas dataframe into open-high-low-close (OHLC) format, aggregated not by time but for every 100 ticks. After that I would like to display the result in a candlestick chart using matplotlib.
I solved this already for a time related aggregation using a pandas dataset with two values: TIMESTAMP and PRICE. The TIMESTAMP already has the pandas date format so I work with that:
df["TIMESTAMP"]= pd.to_datetime(df["TIMESTAMP"])
df = df.set_index(['TIMESTAMP'])
data_ohlc = df['PRICE'].resample('15Min').ohlc()
Is there any function that resamples a dataset into OHLC format using a count of ticks instead of a time frame?
After that it comes to visualization, so for plotting I have to change date format to mdates. The candlestick_ohlc function requires a mdate format:
data_ohlc["TIMESTAMP"] = data_ohlc["TIMESTAMP"].apply(mdates.date2num)
from mpl_finance import candlestick_ohlc
candlestick_ohlc(ax1,data_ohlc.values,width=0.005, colorup='g', colordown='r',alpha=0.75)
So is there any function to display a candle stick chart without mdates because by aggregating tick data there would be no time relation?
As there seems to be no built-in function for this problem, I wrote one myself. The given dataframe needs to have the actual values in the column "PRICE":
def get_pd_ohlc(mydf, interval):
    ## use a copy, so that the new column doesn't affect the original dataset
    mydf = mydf.copy()
    ## Add a new column that numbers the tick intervals, starting at 1
    mydf["interval"] = [1 + x // interval for x in range(len(mydf))]
    ##Step 1: Group
    grouped = mydf.groupby('interval')
    ##Step 2: Calculate different aggregations
    myopen = grouped['PRICE'].first()
    myhigh = grouped['PRICE'].max()
    mylow = grouped['PRICE'].min()
    myclose = grouped['PRICE'].last()
    ##Step 3: Generate Dataframe:
    pd_ohlc = pd.DataFrame({'OPEN':myopen,'HIGH':myhigh,'LOW':mylow,'CLOSE':myclose})
    return pd_ohlc

pd_100 = get_pd_ohlc(df, 100)
print(pd_100.head())
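The same tick-count grouping can probably be written more compactly with pandas' groupby and its ohlc() aggregation. This is a sketch on made-up data: the ticks frame and the ticks_per_bar name are assumptions for illustration, and the resulting columns are lowercase (open, high, low, close) rather than the uppercase names used above.

```python
import numpy as np
import pandas as pd

# Hypothetical tick data: 250 prices in a "PRICE" column.
ticks = pd.DataFrame({"PRICE": np.arange(250, dtype=float)})

ticks_per_bar = 100
# Integer division of the row position gives a bar number:
# 0 for ticks 0-99, 1 for ticks 100-199, 2 for the remaining 50 ticks.
bar_id = np.arange(len(ticks)) // ticks_per_bar

# SeriesGroupBy.ohlc() returns first, max, min and last value per group.
ohlc = ticks["PRICE"].groupby(bar_id).ohlc()
print(ohlc)
```

Note that the last bar is built from fewer than 100 ticks whenever the number of rows is not a multiple of ticks_per_bar.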
I also found a solution to display it. The module mpl_finance has a function, candlestick2_ohlc, that does not need any datetime information. Here is the code:
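#Making plot
import matplotlib.pyplot as plt
# mpl_finance is deprecated; the same function is available as
# mplfinance.original_flavor.candlestick2_ohlc
from mpl_finance import candlestick2_ohlc
# set the default size before the figure is created, otherwise it does not apply to it
plt.rcParams['figure.figsize'] = (16,8)
fig = plt.figure()
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=6, colspan=1)
#Making candlestick plot from the pd_100 frame computed above
candlestick2_ohlc(ax1, pd_100['OPEN'], pd_100['HIGH'],
                  pd_100['LOW'], pd_100['CLOSE'], width=0.5,
                  colorup='#008000', colordown='#FF0000', alpha=1)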

Function to plot stock price - why are they plotting together and not individually?

I am trying to automate graphing the stock price and its moving averages. I have a list of stocks, I would like to create a chart with the stock's price, its 50 day moving average, and its 200 day moving average. I want to do this for x number of stocks in my list. When I run it, why are they plotting over top of each other on the same graph, rather than individually?
import pandas as pd
from pandas import DataFrame
from matplotlib import pyplot as plt
import pandas.io.data as web
import datetime as dt
stocks = ['AAPL', 'DVN', 'XOM']
start = '2010-01-01'
end = dt.datetime.today()
def plotStock(stock):
    df = web.DataReader(stock, 'yahoo', start, end)['Adj Close']
    df.plot()
    pd.rolling_mean(df, 50).plot(style='k')
    pd.rolling_mean(df, 200).plot(style='--')
    plt.title(stock, fontsize=10)
    plt.savefig(stock + '.png', bbox_inches='tight')

for stock in stocks:
    plotStock(stock)
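Every call to plot() draws on the current axes, so all the stocks end up on the same graph. To get a separate set of axes per stock, create a new figure (or a new subplot) on each call. See the matplotlib subplot example for more details.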
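A minimal sketch of one figure per stock follows. It uses random-walk prices instead of a download, so the data is an assumption, and plot_stock is a hypothetical rewrite of the question's function. It also uses Series.rolling(...).mean(), which replaces pd.rolling_mean in current pandas versions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

stocks = ["AAPL", "DVN", "XOM"]
dates = pd.date_range("2010-01-01", periods=400, freq="B")
rng = np.random.default_rng(3)


def plot_stock(name, prices):
    # A fresh figure and axes on every call keeps the stocks apart.
    fig, ax = plt.subplots()
    prices.plot(ax=ax, label="Adj Close")
    prices.rolling(50).mean().plot(ax=ax, style="k", label="50-day MA")
    prices.rolling(200).mean().plot(ax=ax, style="--", label="200-day MA")
    ax.set_title(name, fontsize=10)
    ax.legend()
    return fig


figures = {}
for name in stocks:
    # Random-walk prices standing in for the downloaded 'Adj Close' series.
    prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)
    figures[name] = plot_stock(name, prices)
```

Each figure can then be saved on its own, for example with figures["AAPL"].savefig("AAPL.png", bbox_inches="tight").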
