aggregate tick data to open high low close non time related - python

I would like to consolidate tick data stored in a pandas DataFrame into open-high-low-close (OHLC) format, aggregated per 100 ticks rather than per time interval. After that I would like to display the result in a candlestick chart using matplotlib.
I have already solved this for a time-based aggregation using a pandas DataFrame with two columns: TIMESTAMP and PRICE. The TIMESTAMP column is converted to the pandas datetime format, so I work with that:
df["TIMESTAMP"]= pd.to_datetime(df["TIMESTAMP"])
df = df.set_index(['TIMESTAMP'])
data_ohlc = df['PRICE'].resample('15Min').ohlc()
Is there a function that resamples a dataset into OHLC format by a count of ticks instead of by a time frame?
After that comes visualization. For plotting I have to convert the dates to matplotlib date numbers, because the candlestick_ohlc function requires them:
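import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc
data_ohlc = data_ohlc.reset_index()  # TIMESTAMP is the index after resample
data_ohlc["TIMESTAMP"] = data_ohlc["TIMESTAMP"].apply(mdates.date2num)
candlestick_ohlc(ax1, data_ohlc.values, width=0.005, colorup='g', colordown='r', alpha=0.75)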
So is there a function to display a candlestick chart without mdates? With tick data aggregated by count there is no time relation.

As there seems to be no built-in function for this problem, I wrote one myself. The given DataFrame needs to have the actual values in the column "PRICE":
def get_pd_ohlc(mydf, interval):
    ## use a copy, so that the new column doesn't affect the original dataset
    mydf = mydf.copy()
    ## Add a new column that numbers the tick intervals (1, 2, 3, ...)
    mydf["interval"] = [1 + x // interval for x in range(len(mydf))]
    ##Step 1: Group
    grouped = mydf.groupby('interval')
    ##Step 2: Calculate different aggregations
    myopen = grouped['PRICE'].first()
    myhigh = grouped['PRICE'].max()
    mylow = grouped['PRICE'].min()
    myclose = grouped['PRICE'].last()
    ##Step 3: Generate Dataframe:
    pd_ohlc = pd.DataFrame({'OPEN':myopen,'HIGH':myhigh,'LOW':mylow,'CLOSE':myclose})
    return pd_ohlc
pd_100 = get_pd_ohlc(df,100)
print(pd_100.head())
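For comparison, the same grouping can probably be written more compactly with integer division and a named aggregation. This is only a sketch: the helper name `ohlc_by_tick_count` and the synthetic prices are made up for illustration, and the bars are numbered from 0 rather than from 1.

```python
import numpy as np
import pandas as pd


def ohlc_by_tick_count(prices: pd.Series, ticks_per_bar: int) -> pd.DataFrame:
    """Aggregate a price series into OHLC bars of `ticks_per_bar` ticks each."""
    # Bar number for every row: 0 for the first N ticks, 1 for the next N, ...
    bar_id = np.arange(len(prices)) // ticks_per_bar
    return prices.groupby(bar_id).agg(
        OPEN="first", HIGH="max", LOW="min", CLOSE="last"
    )


# Synthetic ticks, only for illustration: 250 prices counting up from 0.
ticks = pd.DataFrame({"PRICE": np.arange(250, dtype=float)})
bars = ohlc_by_tick_count(ticks["PRICE"], 100)
print(bars)
```

The last bar simply contains whatever ticks are left over (50 in this synthetic case), so it may need to be dropped if only complete bars are wanted.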
I also found a solution to display it. The mpl_finance module has a function candlestick2_ohlc that does not need any datetime information. Here is the code:
#Making plot
import matplotlib.pyplot as plt
from mpl_finance import candlestick2_ohlc
plt.rcParams['figure.figsize'] = (16,8)  # set before the figure is created
fig = plt.figure()
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=6, colspan=1)
#Making candlestick plot from the frame returned by get_pd_ohlc
candlestick2_ohlc(ax1, pd_100['OPEN'], pd_100['HIGH'],
                  pd_100['LOW'], pd_100['CLOSE'], width=0.5,
                  colorup='#008000', colordown='#FF0000', alpha=1)
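If mpl_finance is not available, candles indexed by bar number can also be drawn with plain matplotlib. The following is a minimal sketch, not the mpl_finance implementation: the OHLC values are hypothetical, and the candles are built from `Axes.vlines` (wicks) and `Axes.bar` (bodies).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical OHLC bars; in practice this would be the frame from get_pd_ohlc.
bars = pd.DataFrame({
    "OPEN":  [10.0, 11.0, 10.5],
    "HIGH":  [11.5, 11.2, 12.0],
    "LOW":   [9.8, 10.2, 10.4],
    "CLOSE": [11.0, 10.5, 11.8],
})

fig, ax = plt.subplots(figsize=(8, 4))
x = list(range(len(bars)))  # the bar number replaces the date on the x-axis
colors = ["g" if c >= o else "r" for o, c in zip(bars["OPEN"], bars["CLOSE"])]

# Wicks: one vertical line per bar from low to high.
ax.vlines(x, bars["LOW"], bars["HIGH"], colors=colors, linewidth=1)
# Bodies: a rectangle from open to close (the height is negative for down bars).
ax.bar(x, bars["CLOSE"] - bars["OPEN"], bottom=bars["OPEN"], width=0.5, color=colors)

ax.set_xlabel("Bar number")
ax.set_ylabel("Price")
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) in practice
```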

Related

Python Pyplot - Format Plotted Graph's Y Axis as a Percent to 2 Decimal Places

I am trying to represent CDC Delay of Care data as a line graph but am having some trouble formatting the y axis so that it is a percentage to the hundredths place. I would also like for the x axis to show every year in the range selected.
Here is my code:
import pandas as pd
from isolation import isolate_total_stub, isolate_age_stub
import matplotlib.pyplot as plt
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
plt.plot(x_axis, y_axis)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()
The graph that displays looks like this:
[image: line graph]
Excuse me for being a novice. I have been googling for an hour now, and instead of continuing to search for an answer I thought it would be more productive to ask Stack Overflow.
Thank you for your time.
To change the format of the Y-axis, you can use set_major_formatter.
To show one tick per year on the X-axis, you will need to use set_major_locator, assuming that your dates are in datetime format.
To change the format of the X-axis labels, you can again use set_major_formatter.
I am showing a small example below with dummy data. Hope this works.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib.dates as mdate
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = ['2000-01-01', '2004-01-01', '2008-01-01', '2012-01-01', '2016-01-01', '2020-01-01']
year=pd.to_datetime(year) ## Convert string to datetime
plt.figure(figsize=(12,5)) ## Added so the Years don't overlap on each other
plt.plot(year, estimate)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.gca().yaxis.set_major_formatter(FormatStrFormatter('%.2f')) ## Formats Y-axis labels with two decimal places
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator) ## Changes datetime to years - 1 label per year
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%Y')) ## Shows X-axis in Years
plt.gcf().autofmt_xdate() ## Rotates X-labels, if you want to use it
plt.show()
[image: output plot]
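If the labels should also carry a percent sign, matplotlib's `PercentFormatter` is one option. This is a sketch under two assumptions that the question does not confirm: the estimates are already in percent units (8 means 8%), and the years are plain integers rather than datetimes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

estimate = [8, 7.1, 11, 10.6, 8, 8.3]  # same dummy values as above
year = [2000, 2004, 2008, 2012, 2016, 2020]  # assumed to be integers here

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(year, estimate)
ax.set_xlabel("Year")
ax.set_ylabel("Percentage")

# xmax=100 means the data are already in percent units (8 -> "8.00%").
pct = PercentFormatter(xmax=100, decimals=2)
ax.yaxis.set_major_formatter(pct)
ax.set_xticks(year)  # one tick for every year present in the data
fig.canvas.draw()  # render; use plt.show() in an interactive session
```

For fractions such as 0.08, `xmax=1` would be the matching setting.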

How to add a string comment above every single candle using mplfinance.plot() or any similar package?

I want to add a string comment above every single candle using the mplfinance package.
Is there a way to do it using mplfinance or any other package?
Here is the code I used:
import pandas as pd
import mplfinance as mpf
import matplotlib.animation as animation
from mplfinance import *
import datetime
from datetime import date, datetime
fig = mpf.figure(style="charles",figsize=(7,8))
ax1 = fig.add_subplot(1,1,1 , title='ETH')
def animate(ival):
    idf = pd.read_csv("test1.csv", index_col=0)
    idf['minute'] = pd.to_datetime(idf['minute'], format="%m/%d/%Y %H:%M")
    idf.set_index('minute', inplace=True)
    ax1.clear()
    mpf.plot(idf, ax=ax1, type='candle', ylabel='Price US$')
ani = animation.FuncAnimation(fig, animate, interval=250)
mpf.show()
You should be able to do this using Axes.text()
After calling mpf.plot() then call
ax1.text()
for each text that you want (in your case for each candle).
There is an important caveat regarding the x-axis values that you pass into ax1.text():
If you do not specify show_nontrading=True, it defaults to False, in which case the x-axis value that you pass into ax1.text() for the position of the text must be the row number of the candle where you want the text, counting from 0 for the first row in your DataFrame.
On the other hand, if you do set show_nontrading=True, then the x-axis value that you pass into ax1.text() must be a matplotlib date number. You can convert the pandas datetimes from your DataFrame's DatetimeIndex into matplotlib dates as follows:
import matplotlib.dates as mdates
my_mpldates = mdates.date2num(idf.index.to_pydatetime())
I suggest using the first option (DataFrame row number) because it is simpler. I am currently working on an mplfinance enhancement that will allow you to enter the x-axis values as any type of datetime object (which is the more intuitive way to do it) however it may be another month or two until that enhancement is complete, as it is not trivial.
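To make the two coordinate systems concrete, here is a small sketch that prints both x values for the same rows. The three-day index is hypothetical and only stands in for `idf.index`.

```python
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical index standing in for idf.index.
index = pd.date_range("2021-01-04", periods=3, freq="D")

x_row_numbers = list(range(len(index)))               # for show_nontrading=False
x_mpl_dates = mdates.date2num(index.to_pydatetime())  # for show_nontrading=True

for ts, row, num in zip(index, x_row_numbers, x_mpl_dates):
    print(ts.date(), row, num)
```

The row numbers are always 0, 1, 2, ..., while the matplotlib date numbers are floats counted in days, so consecutive daily rows differ by 1.0.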
Code example, using data from the mplfinance repository examples data folder:
import pandas as pd
import mplfinance as mpf
infile = 'data/yahoofinance-SPY-20200901-20210113.csv'
# take rows [18:25] to keep the demo small:
df = pd.read_csv(infile, index_col=0, parse_dates=True).iloc[18:25]
fig, axlist = mpf.plot(df,type='candle',volume=True,
                       ylim=(330,345),returnfig=True)
x = 1
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'Custom\nText\nHere')
x = 3
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'High here\n= '+str(y-1),fontstyle='italic')
x = 5
y = df.loc[df.index[x],'High']+1
axlist[0].text(x-0.2,y,'More\nCustom\nText\nHere',fontweight='bold')
mpf.show()
Comments on the above code example:
I am setting the ylim=(330,345) in order to provide a little extra room above the candles for the text. In practice you might choose the high dynamically as perhaps high_ylim = 1.03*max(df['High'].values).
Notice that for the first two candles with text, the text begins at the center of the candle. The third text call uses x-0.2 to position the text more over the center of the candle.
For this example, the y location of the text is determined by taking the high of that candle and adding 1 (y = df.loc[df.index[x],'High']+1). Of course adding 1 is arbitrary, and in practice, depending on the magnitude of your prices, adding 1 may be too little or too much. Instead you may want to add a small percentage, for example 0.2 percent:
y = df.loc[df.index[x],'High']
y = y * 1.002
Here is the plot the above code generates:
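Since the question asks for a label above every candle, the individual text calls can be wrapped in a loop. The helper below is a hypothetical sketch: `annotate_candles` is not part of mplfinance, and a plain matplotlib Axes with made-up highs is used only to keep the example self-contained. With mplfinance, `ax` would be `axlist[0]` from `mpf.plot(..., returnfig=True)`, with show_nontrading left at its default so that x is the row number.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def annotate_candles(ax, df, labels, pad=0.002):
    """Write labels[i] just above the high of row i; x is the row number."""
    texts = []
    for x, (high, label) in enumerate(zip(df["High"], labels)):
        y = high * (1 + pad)  # small relative offset above the candle
        texts.append(ax.text(x, y, label, ha="center", va="bottom"))
    return texts


# Hypothetical highs and a plain Axes, only to keep the sketch self-contained.
df = pd.DataFrame({"High": [335.0, 338.5, 341.0]})
fig, ax = plt.subplots()
texts = annotate_candles(ax, df, ["a", "b", "c"])
```

Using `ha="center"` centers each label on its candle, which avoids the manual x-0.2 adjustment.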

Plotly Express Chart Gaps Even with Index

I am having trouble eliminating datetime gaps in a very simple Plotly Express line chart: straight lines on the graph connect data points across gaps in the data (weekends).
The DataFrame simply has a datetime index (to the hour) called sale_date and columns called NAME and COST, with approximately 30 days' worth of data.
df['sale_date'] = pd.to_datetime(df['sale_date'])
df = df.set_index('sale_date')
px.line(df, x=df.index, y='COST', color='NAME')
I've seen a few posts regarding this issue and one recommended setting datetime as the index, but it still yields the gap lines.
The data in the example may not be the same as yours, but the point is that you can change the x-axis data to string data instead of date/time data, or change the x-axis type to category, and add a scale and tick text.
import pandas as pd
import plotly.express as px
import numpy as np
np.random.seed(2021)
date_rng = pd.date_range('2021-08-01','2021-08-31', freq='B')
name = ['apple']
df = pd.DataFrame({'sale_date':pd.to_datetime(date_rng),
                   'COST':np.random.randint(100,3000,(len(date_rng),)),
                   'NAME':np.random.choice(name,size=len(date_rng))})
df = df.set_index('sale_date')
fig= px.line(df, x=[d.strftime('%m/%d') for d in df.index], y='COST', color='NAME')
fig.show()
xaxis update
fig= px.line(df, x=df.index, y='COST', color='NAME')
fig.update_xaxes(type='category',
                 tickvals=np.arange(0,len(df)),
                 ticktext=[d.strftime('%m/%d') for d in df.index])

How to create a Boxplot with Timestamp using Matplotlib and Seaborn?

I have been trying to get a boxplot with each box representing an emotion over a period of time.
The data frame used to plot this contains a timestamp and an emotion name. I have tried converting the timestamp into a string first, then to datetime, and finally to int64. This resulted in gaps between the x labels, as seen in the plot. I have tried the same without converting to int64, but matplotlib doesn't seem to allow dates in the plot.
I'm attaching the code I have used here:
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib qt
import pandas as pd
import numpy as np
from datetime import datetime
import seaborn as sns
data = pd.read_csv("TX-governor-sentiment.csv")
## check data types
data.dtypes
# drop rows with all missing values
data = data.dropna(how='all')
## transforming the timestamp column
#convert from obj type to string then to date type
data['timestamp2'] = data['timestamp']
data['timestamp2'] = pd.to_datetime(data['timestamp2'].astype(str), format='%m/%d/%Y %H:%M')
# convert to number format with the following logic:
# yyyymmddhourmin --> this allows us to treat dates as a continuous variable
data['timestamp2'] = data['timestamp2'].dt.strftime('%Y%m%d%H%M')
data['timestamp2'] = data['timestamp2'].astype('int64')
print (data[['timestamp','timestamp2']])
#data transformation for data from Orange
df = pd.DataFrame(columns=('timestamp', 'emotion'))
for index, row in data.iterrows():
    if row['sentiment'] == 0:
        df.loc[index] = [row['timestamp2'], 'Neutral']
    else:
        df.loc[index] = [row['timestamp2'], row['Emotion']]
# Plot using Seaborn & Matplotlib
#convert timestamp in case it's not in number format
df['timestamp'] = df['timestamp'].astype('int64')
fig = plt.figure(figsize=(10,10))
#colors = {"Neutral": "grey", "Joy": "pink", "Surprise":"blue"}
#visualize as boxplot
plot_ = sns.boxplot(x="timestamp", y="emotion", data=df, width=0.5,whis=np.inf);
#add data point on top
plot_ = sns.stripplot(x="timestamp", y="emotion", data=df, alpha=0.8, color="black");
fig.canvas.draw()
#modify ticks and labels
plt.xlim([202003010000,202004120000])
plt.xticks([202003010000, 202003150000, 202003290000, 202004120000], ['2020/03/01', '2020/03/15', '2020/03/29', '2020/04/12'])
#add colors
for patch in plot_.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
Please let me know how I can overcome this problem of gaps in the boxplot. Thank you!

Matplotlib x-axis ticks, fixed location for dates

In the timeline plot I'm making, I want the date ticks to show only specified dates. (In my example I show ticks for the 'a' events, but it can be any list of ticks.) I found how to do it when the x-axis data is numeric (upper subplot in my example), but this won't work with the timestamp date type (bottom plot).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
myData = pd.DataFrame({'date':['2019-01-15','2019-02-10','2019-03-20','2019-04-17','2019-05-23','2019-06-11'],'cnt':range(6),'event':['a','b','a','b','a','b']})
myData['date'] = [pd.Timestamp(j) for j in myData['date']]
start = pd.Timestamp('2019-01-01')
stop = pd.Timestamp('2019-07-01')
inxa = myData.loc[myData['event'] == 'a'].index
inxb = myData.loc[myData['event'] == 'b'].index
# create two plots, one with 'cnt' as x-axis, the other 'dates' on x-axis.
fig, ax = plt.subplots(2,1,figsize=(16,9))
ax[0].plot((0,6),(0,0), 'k')
ax[1].plot((start, stop),(0,0))
for g in inxa:
    ax[0].plot((myData.loc[g,'cnt'],myData.loc[g,'cnt']),(0,1),c='r')
    ax[1].plot((myData.loc[g,'date'],myData.loc[g,'date']),(0,1),c='r')
for g in inxb:
    ax[0].plot((myData.loc[g,'cnt'],myData.loc[g,'cnt']),(0,2),c='b')
    ax[1].plot((myData.loc[g,'date'],myData.loc[g,'date']),(0,2),c='b')
xlist0 = myData.loc[myData['event']=='a','cnt']
xlist1 = myData.loc[myData['event']=='a','date']
ax[0].xaxis.set_major_locator(ticker.FixedLocator(xlist0))
# ax[1].xaxis.set_major_locator(**???**)
Couldn't find a sufficient duplicate, maybe I didn't look hard enough. There are a number of ways to do this:
Converting to numbers first or using the underlying values of a Pandas DateTime Series
xticks = [mdates.date2num(z) for z in xlist1]
# or
xticks = xlist1.values
and at least a couple ways to use it/them
ax[1].xaxis.set_major_locator(ticker.FixedLocator(xticks))
ax[1].xaxis.set_ticks(xticks)
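Put together as a standalone sketch, the first variant (convert with date2num, then use FixedLocator) could look like this. The event dates are the 'a' rows from the question; the DateFormatter pattern is an assumption about how the labels should read.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Dates of the 'a' events from the question's example data.
event_dates = [dt.datetime(2019, 1, 15), dt.datetime(2019, 3, 20), dt.datetime(2019, 5, 23)]

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot([dt.datetime(2019, 1, 1), dt.datetime(2019, 7, 1)], [0, 0], "k")
for d in event_dates:
    ax.plot([d, d], [0, 1], c="r")

# Ticks only at the event dates, labelled as year-month-day.
tick_positions = mdates.date2num(event_dates)
ax.xaxis.set_major_locator(ticker.FixedLocator(tick_positions))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.canvas.draw()  # render; use plt.show() in an interactive session
```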
Date tick labels
How to set the xticklabels for date in matplotlib
how to get ticks every hour?
...
