faceting and x axis selection in seaborn - python

I am working with a DataFrame of Bitcoin data from Yahoo Finance. I set up a list of cryptocurrencies and I would like to:
a. limit the x axis to the last 2 months
b. put all the graphs together, faceted next to each other in a grid of plots, as is possible with ggplot in R.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
# For reading stock data from yahoo
from pandas.io.data import DataReader
# For time stamps
from datetime import datetime
# For division
from __future__ import division
tech_list = ['BTC','TTC','DGC','DEE','PPC']
end = datetime.now()
start = datetime(end.year - 1,end.month,end.day)
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = DataReader(stock,'yahoo',start,end)
DEE['Volume'].plot(legend=True,figsize=(10,4))
Should I change something in the time definition or in seaborn itself?
thanks

Your question has nothing to do with seaborn; it is basic matplotlib. In your case seaborn only changes the style of the figures, so I left it out of the code below, but you can import it if you want to change the style.
To get only the last 2 months of data, set start to two months before today. This is easy to do with relativedelta:
from dateutil.relativedelta import relativedelta
start = dt.datetime(end.year,end.month,end.day) - relativedelta(months=2)
Changing the x-axis limits is more painful if you plot with pandas.DataFrame.plot(). But if you want to import all the data and then select only the last 2 months, you can use the same relativedelta trick with indexing, like this:
DEE.loc[DEE.index >= end - relativedelta(months=2),'Volume'].plot()
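The index-based selection can be tried without downloading anything. This is a minimal sketch on synthetic data: the Volume values and the fixed end date are made up for illustration and stand in for whatever your downloader returns.

```python
import datetime as dt

import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical stand-in for downloaded data: one year of daily rows.
end = dt.datetime(2021, 6, 30)
index = pd.date_range(end=end, periods=365, freq="D")
df = pd.DataFrame({"Volume": np.arange(365)}, index=index)

# Keep only the rows from the last two calendar months.
cutoff = end - relativedelta(months=2)
recent = df.loc[df.index >= cutoff, "Volume"]

print(recent.index.min(), recent.index.max(), len(recent))
```

Calling recent.plot() then draws only that window, so the x axis needs no further limits.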
As for putting the graphs together, your question is not phrased clearly, so I can only guess that you meant stacking all the stocks one under another as subplots. I rewrote your code to do this:
import pandas as pd
# For reading stock data from yahoo
from pandas.io.data import DataReader
import matplotlib.pyplot as plt
%matplotlib inline
# For time stamps
import datetime as dt
from dateutil.relativedelta import relativedelta
end = dt.datetime.now()
start = dt.datetime(end.year-1,end.month,end.day)
tech_list = {'BTC':None,'TTC':None,'DGC':None,'DEE':None,'PPC':None}
for stock in tech_list.keys():
    tech_list[stock] = DataReader(stock,'yahoo',start,end)
months_to_plot = relativedelta(months=2)
fig = plt.figure(figsize=(8,10))
for (n,stock) in enumerate(tech_list.keys()):
    # subplot indices start at 1, not 0
    ax = fig.add_subplot(len(tech_list),1,n+1)
    tech_list[stock].loc[tech_list[stock].index >= end - months_to_plot,'Volume'].plot(ax=ax)
    ax.set_title(stock)
plt.tight_layout()
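If "faceting" meant a grid of panels rather than a single column, plt.subplots can build the grid directly. The sketch below is only an illustration under assumptions: the data is random, the two-column layout is an arbitrary choice, and the frames dictionary is a hypothetical stand-in for the downloaded DataFrames.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one Volume series per ticker.
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=120, freq="D")
tickers = ["BTC", "TTC", "DGC", "DEE", "PPC"]
frames = {
    t: pd.DataFrame({"Volume": rng.integers(100, 1000, len(index))}, index=index)
    for t in tickers
}

# Only plot the last two months of each series.
cutoff = index[-1] - pd.DateOffset(months=2)

ncols = 2
nrows = -(-len(tickers) // ncols)  # ceiling division
fig, axes = plt.subplots(nrows, ncols, figsize=(10, 8))
flat_axes = axes.ravel()

for ax, ticker in zip(flat_axes, tickers):
    data = frames[ticker]
    data.loc[data.index >= cutoff, "Volume"].plot(ax=ax)
    ax.set_title(ticker)

# Hide any unused panel in the grid.
for ax in flat_axes[len(tickers):]:
    ax.set_visible(False)

fig.tight_layout()
```

In a notebook you would drop the matplotlib.use("Agg") line and let the inline backend display the figure.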
P.S. Please be more clear in the future with your questions if you want a concise answer. The code you provided contains extra lines, not necessary for your question, which makes it more difficult to understand what exactly you want to do. Your question, on the other hand, is not nearly as detailed as it could be.

Related

Plotly Calmap (heatmap calendar): Mark certain dates

I have used calmap to analyze whether data has been imported correctly over the past 2.5 years. I have a few gaps, some of which are easy to explain with holidays. However, I need to google search for those dates to confirm because they are not marked as such by default in the output-calendar. Is there a way to do that automatically? There is none in the documentation for highlighting specific dates. I was thinking of adding an extra df with only holidays for the respective years but I couldn't figure out how to add two dataframes (with different coloring) to the calendar.
Here's the function I am referencing:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from plotly_calplot import calplot
all_days = pd.date_range('1/1/2019', periods=730, freq='D')
days = np.random.choice(all_days, 500)
events = pd.Series(np.random.randn(len(days)), index=days)
fig = calplot(events, cmap='YlGn', colorbar=False, title="Fantastic Calendar")
fig.show()

How to add a string comment above every single candle using mplfinance.plot() or any similar package?

I want to add a string comment above every single candle using the mplfinance package.
Is there a way to do it using mplfinance or any other package?
Here is the code I used:
import pandas as pd
import mplfinance as mpf
import matplotlib.animation as animation
from mplfinance import *
import datetime
from datetime import date, datetime
fig = mpf.figure(style="charles",figsize=(7,8))
ax1 = fig.add_subplot(1,1,1 , title='ETH')
def animate(ival):
    idf = pd.read_csv("test1.csv", index_col=0)
    idf['minute'] = pd.to_datetime(idf['minute'], format="%m/%d/%Y %H:%M")
    idf.set_index('minute', inplace=True)
    ax1.clear()
    mpf.plot(idf, ax=ax1, type='candle', ylabel='Price US$')
ani = animation.FuncAnimation(fig, animate, interval=250)
mpf.show()
You should be able to do this using Axes.text().
After calling mpf.plot(), call ax1.text() once for each text that you want (in your case, once per candle).
There is an important caveat regarding the x-axis values that you pass into ax1.text():
If you do not specify show_nontrading=True, it defaults to False. In that case the x value that you pass into ax1.text() must be the row number of the candle where you want the text, counting from 0 for the first row in your DataFrame.
On the other hand, if you do set show_nontrading=True, then the x value that you pass into ax1.text() needs to be a matplotlib date number. You can convert the pandas datetimes from your DataFrame's DatetimeIndex into matplotlib dates as follows:
import matplotlib.dates as mdates
my_mpldates = mdates.date2num(idf.index.to_pydatetime())
I suggest using the first option (DataFrame row number) because it is simpler. I am currently working on an mplfinance enhancement that will allow you to enter the x-axis values as any type of datetime object (which is the more intuitive way to do it) however it may be another month or two until that enhancement is complete, as it is not trivial.
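To make the two kinds of x value concrete, here is a small sketch that prints both for the same rows. The five-day index is invented for illustration; with real data it would be your DataFrame's DatetimeIndex.

```python
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical index standing in for df.index.
index = pd.date_range("2021-01-04", periods=5, freq="D")

# Option 1 (show_nontrading=False): x is just the row number.
row_positions = list(range(len(index)))

# Option 2 (show_nontrading=True): x is a matplotlib date number.
date_positions = mdates.date2num(index.to_pydatetime())

for row, num, ts in zip(row_positions, date_positions, index):
    print(row, num, ts.date())
```

The absolute date numbers depend on matplotlib's epoch setting, which is one more reason the row-number option is easier to reason about.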
Code example, using data from the mplfinance repository examples data folder:
import pandas as pd
import mplfinance as mpf
infile = 'data/yahoofinance-SPY-20200901-20210113.csv'
# take rows [18:25] to keep the demo small:
df = pd.read_csv(infile, index_col=0, parse_dates=True).iloc[18:25]
fig, axlist = mpf.plot(df,type='candle',volume=True,
                       ylim=(330,345),returnfig=True)
x = 1
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'Custom\nText\nHere')
x = 3
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'High here\n= '+str(y-1),fontstyle='italic')
x = 5
y = df.loc[df.index[x],'High']+1
axlist[0].text(x-0.2,y,'More\nCustom\nText\nHere',fontweight='bold')
mpf.show()
Comments on the above code example:
I am setting ylim=(330,345) to provide a little extra room above the candles for the text. In practice you might choose the upper limit dynamically, for example high_ylim = 1.03*max(df['High'].values).
Notice that for the first two candles with text, the text begins at the center of the candle. The third text call uses x-0.2 to position the text more over the center of the candle.
For this example, the y location of the text is determined by taking the high of that candle and adding 1 (y = df.loc[df.index[x],'High']+1). Of course adding 1 is arbitrary, and in practice, depending on the magnitude of your prices, adding 1 may be too little or too much. Instead you may want to add a small percentage, for example 0.2 percent:
y = df.loc[df.index[x],'High']
y = y * 1.002
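Since the question asks for text above every candle, the percentage offset can be wrapped in a loop. The following is a sketch on a plain matplotlib Axes with made-up prices; annotate_highs is a hypothetical helper, not part of mplfinance, and the line plot merely stands in for the candles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical price data; only the High column matters here.
df = pd.DataFrame(
    {"High": [100.0, 102.0, 101.0, 105.0]},
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)

def annotate_highs(ax, frame, labels, pad=0.002):
    """Write one label above each row's High, raised by the fraction `pad`."""
    texts = []
    for x, (high, label) in enumerate(zip(frame["High"], labels)):
        # x is the row number, matching the show_nontrading=False convention.
        texts.append(ax.text(x, high * (1 + pad), label, ha="center", va="bottom"))
    return texts

fig, ax = plt.subplots()
ax.plot(range(len(df)), df["High"], marker="o")  # stand-in for the candles
ax.set_ylim(df["High"].min() * 0.97, df["High"].max() * 1.03)  # headroom for text
texts = annotate_highs(ax, df, ["a", "b", "c", "d"])
```

With mplfinance you would presumably pass axlist[0] from a returnfig=True call in place of ax; ha="center" also removes the need for the manual x-0.2 nudge.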
Here is the plot the above code generates:

Adding signals on the candle chart

I would like to plot signals on my chart. Is there a way to do it on a candlestick chart?
I did the following and got stuck :(
!pip install yfinance
!pip install mplfinance
import yfinance as yf
import mplfinance as mpf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=yf.download('BTC-USD',start='2008-01-04',end='2021-06-3',interval='1d')
buy=np.where((df['Close'] > df['Open']) & (df['Close'].shift(1) < df['Open'].shift(1)),1,0)
fig = plt.figure(figsize = (20,10))
mpf.plot(df,figsize=(20,12),type ='candle',volume=True);
# any idea how to add the signal?
import yfinance as yf
import mplfinance as mpf
import numpy as np
df = yf.download('BTC-USD', start='2008-01-04', end='2021-06-3', interval='1d').tail(50)
buy = np.where((df['Close'] > df['Open']) & (df['Close'].shift(1) < df['Open'].shift(1)), 1, np.nan) * 0.95 * df['Low']
apd = [mpf.make_addplot(buy, scatter=True, markersize=100, marker=r'$\Uparrow$', color='green')]
mpf.plot(df, type='candle', volume=True, addplot=apd)
I just added .tail() for better visualization.
Output:
You place signals on the plot using the "make additional plot" API: mpf.make_addplot(data,**kwargs). The data that you pass to make_addplot must be the same length as your original candlestick DataFrame (so that mplfinance can line it up with the candlesticks). If you do not want to plot a signal at every location, fill the data with NaN values everywhere except where you do want a signal.
The return value from ap = mpf.make_addplot() is then passed into mpf.plot(df,addplot=ap) using the addplot kwarg.
You can see many examples in this tutorial on adding your own technical studies to plots.
Take the time (maybe 10 minutes or so) to go carefully through the entire tutorial. It will be time well spent.
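The "same length, NaN where there is no signal" requirement can be checked with pandas alone before anything is plotted. This sketch uses a handful of invented OHLC rows; the signal rule (an up candle following a down candle) and the 0.95 factor are taken from the code in this thread.

```python
import pandas as pd

# Hypothetical OHLC rows, just enough to trigger two signals.
df = pd.DataFrame(
    {
        "Open":  [10.0, 11.0, 10.0, 12.0, 11.0],
        "Close": [11.0, 10.0, 12.0, 11.0, 13.0],
        "Low":   [9.0, 9.5, 9.8, 10.5, 10.8],
    },
    index=pd.date_range("2021-06-01", periods=5, freq="D"),
)

# Up candle that follows a down candle.
is_buy = (df["Close"] > df["Open"]) & (df["Close"].shift(1) < df["Open"].shift(1))

# Same length as df: marker price where there is a signal, NaN elsewhere.
buy_markers = (df["Low"] * 0.95).where(is_buy)

print(buy_markers)
```

A series built this way keeps the DataFrame's index and length, which is what make_addplot needs to line the markers up with the candles.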

How to create a step chart in pandas with time series data from two separate data sources

I have two time-series datasets that I want to make a step-chart of.
The time series data is between Monday 2015-04-20 and Friday 2015-04-24.
The first dataset contains 26337 rows with values ranging from 0-1.
The second dataset contains 80 rows with values between 0-4.
First dataset represents motion sensor values in a room, with around 2-3 minutes between each measurement. 1 indicates the room is occupied, 0 indicates that it is empty. The second contains data from a survey where users could fill in how many people were in the same room, at the time they were answering the survey.
Now I want to compare this data, to find out how well the sensor performs. Obviously there is a lot of data that is "missing" in the second set. Is there a way to fill in the "blanks" in a step chart?
Each row has the following format:
Header
Timestamp (%Y-%m-%d %H:%M:%S),value
Example:
Time,Occupancy
24-04-2015 21:40:33,1
24-04-2015 21:43:11,0
.....
So far I have managed to import the first dataset and make a plot of it. Unfortunately the x-axis is not showing dates, but a lot of numbers:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
data = open('PIRDATA.csv')
ts = pd.Series.from_csv(data, sep=',')
plot(ts);
Result:
How would I go on from here on now?
Try using pandas to read the data, with the date column as the index (parsing its values as dates).
data = pd.read_csv('PIRDATA.csv', index_col=0, parse_dates=True)
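The sample rows in the question are written day-month-year, so it may be safer to spell out the format instead of relying on automatic parsing. A minimal sketch, using the two sample rows from the question as an in-memory file (the StringIO buffer stands in for PIRDATA.csv):

```python
import io

import pandas as pd

# Two rows in the format shown in the question (day-month-year).
raw = io.StringIO(
    "Time,Occupancy\n"
    "24-04-2015 21:40:33,1\n"
    "24-04-2015 21:43:11,0\n"
)

data = pd.read_csv(raw, index_col=0)
# Spell out the format so day and month cannot be swapped.
data.index = pd.to_datetime(data.index, format="%d-%m-%Y %H:%M:%S")

print(data.index.dtype)
```

Once the index is a DatetimeIndex, data.plot() labels the x axis with dates instead of plain numbers.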
To achieve your step chart objective, try:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter
from matplotlib.dates import HourLocator
small_dataset = pd.read_csv('SURVEY_RESULTS_WEEK1.csv', header=0,index_col=0, parse_dates=True)
big_dataset = pd.read_csv('PIRDATA_RAW_CONVERTED_DATETIME.csv', header=0,index_col=0, parse_dates=True)
small_dataset.rename(columns={'Occupancy': 'Survey'}, inplace=True)
big_dataset.rename(columns={'Occupancy': 'PIR'}, inplace=True)
big = big_dataset.plot()
big.xaxis.set_major_formatter(DateFormatter('%y-%m-%d H: %H'))
big.xaxis.set_major_locator(HourLocator(np.arange(0, 25, 6)))
big.set_ylabel('Occupancy')
small_dataset.plot(ax=big, drawstyle='steps')
fig = plt.gcf()
fig.suptitle('PIR and Survey Occupancy Comparison')
plt.show()
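To compare the two sources numerically, and not only visually, one option is to carry each survey answer forward onto the sensor timestamps, which mirrors what a step plot draws. This is a sketch on invented samples; treating a survey answer as valid until the next answer arrives is an assumption, not something the data guarantees.

```python
import pandas as pd

# Hypothetical samples: dense sensor readings and sparse survey answers.
sensor = pd.Series(
    [1, 0, 1, 1, 0],
    index=pd.to_datetime([
        "2015-04-20 09:00", "2015-04-20 09:03", "2015-04-20 09:06",
        "2015-04-20 09:09", "2015-04-20 09:12",
    ]),
    name="PIR",
)
survey = pd.Series(
    [2, 0],
    index=pd.to_datetime(["2015-04-20 09:01", "2015-04-20 09:08"]),
    name="Survey",
)

# Carry each survey answer forward until the next one arrives.
survey_filled = survey.reindex(sensor.index, method="ffill")

# Compare only where a survey answer is already known.
known = survey_filled.notna()
agreement = ((survey_filled[known] > 0) == (sensor[known] == 1)).mean()

print(survey_filled)
print("agreement:", agreement)
```

Sensor rows that precede the first survey answer stay NaN and are excluded from the comparison.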

Time-series boxplot in pandas

How can I create a boxplot for a pandas time-series where I have a box for each day?
Sample dataset of hourly data where one box should consist of 24 values:
import numpy as np
import pandas as pd
n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01",
                                   periods=n,
                                   freq="H"))
ts.plot()
I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (like in ts.plot()), so being able to work with the datetime index would be great.
There is a similar question for R/ggplot2 here, if it helps to clarify what I want.
If it's an option for you, I would recommend using seaborn, which is a wrapper around matplotlib. You could do it yourself by looping over the groups from your time series, but that's much more work.
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
fig, ax = plt.subplots(figsize=(12,5))
# recent seaborn versions require x and y as keyword arguments
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
Which gives:
Note that I'm passing the day of year as the grouper to seaborn; if your data spans multiple years this won't work. You could then consider something like:
ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
Edit: for 3-hourly boxes you could use this as a grouper, but it only works if the timestamps have no minutes or smaller units set:
[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
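An alternative way to build the same kind of 3-hour grouper is DatetimeIndex.floor, which rounds each timestamp down to the start of its block and so also copes with minutes and seconds. A small sketch on a two-day hourly series (the values are placeholders):

```python
import numpy as np
import pandas as pd

ts = pd.Series(
    np.arange(48, dtype=float),
    index=pd.date_range("2014-02-01", periods=48, freq="h"),
)

# Round every timestamp down to the start of its 3-hour block.
blocks = ts.index.floor("3h")
sizes = ts.groupby(blocks).size()

print(sizes.head())
```

The blocks array has one entry per row, so it can be passed as the x grouper in place of the list comprehension.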
(Not enough rep to comment on accepted solution, so adding an answer instead.)
The accepted code has two small errors: (1) it needs the numpy import and (2) the x and y parameters in the boxplot statement need to be swapped. The following produces the plot shown.
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
fig, ax = plt.subplots(figsize=(12,5))
# recent seaborn versions require x and y as keyword arguments
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
I have a solution that may be helpful. It uses only native pandas and allows for hierarchical date-time grouping (i.e. spanning years). The key is that if you pass a function to groupby(), it is called on each element of the DataFrame's index. If your index is a DatetimeIndex (or similar), you can use all of the datetime convenience methods to build the groups!
Try this:
import numpy as np
import pandas as pd
n = 480
ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, figsize=(12,9), rot=90)
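For comparison, the "loop over the groups yourself" route mentioned earlier can be sketched with plain matplotlib. This assumes the same 480-point hourly series as above; grouping on the normalized (midnight) timestamps keeps real dates as group keys, so data spanning several years still sorts correctly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(n),
               index=pd.date_range("2014-02-01", periods=n, freq="h"))

# One array of 24 hourly values per calendar day.
days = ts.index.normalize()
groups = {day: values.to_numpy() for day, values in ts.groupby(days)}

fig, ax = plt.subplots(figsize=(12, 5))
result = ax.boxplot(list(groups.values()))
ax.set_xticklabels([day.strftime("%Y-%m-%d") for day in groups], rotation=90)
fig.tight_layout()
```

The tick labels here are formatted strings placed at positions 1 to 20, so this gives readable dates but not a true datetime axis.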
