MACD stock indicator function using ewm() from pandas library - python

Here is the test code for my MACD function; however, the values I am getting are incorrect. I don't know if it is because my span is in days and my data is in 2-minute increments, or if it is a separate issue. Any help would be much appreciated :)
import yfinance as yf
import pandas as pd
import pandas_ta as ta
import numpy as np
import datetime as dt
import time
dataTSLA = yf.download(tickers='TSLA', period='1mo', interval='2m', auto_adjust=True)
def indicatorMACD(data):
    exp1 = data['Close'].ewm(span=12, adjust=False).mean()
    exp2 = data['Close'].ewm(span=26, adjust=False).mean()
    macd = exp1 - exp2
    signalLine = macd.ewm(span=9, adjust=False).mean()
    return [macd, signalLine]
print(indicatorMACD(dataTSLA))
Getting an output of around 0.66 for macd and 0.23 for signal when it should be -0.23 and -0.64 respectively.

Use min_periods instead of adjust.
Code:
import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
df = pdr.DataReader('BTC-USD' , data_source='yahoo' , start='2020-01-01')
df
Function definition:
def MACD(DF,a,b,c):
    df=DF.copy()
    df['MA FAST'] = df['Close'].ewm(span=a , min_periods = a).mean()
    df['MA SLOW'] = df['Close'].ewm(span=b , min_periods = b).mean()
    df['MACD'] = df['MA FAST'] - df['MA SLOW']
    df['Signal'] = df['MACD'].ewm(span= c , min_periods = c).mean()
    df.dropna(inplace=True)
    return df
Function call:
data = MACD(df , 12,26,9)
data
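On the question of days versus 2-minute bars: the span passed to ewm() counts rows, not calendar days, so span=12 on 2-minute data is a 12-bar (24-minute) EMA. If the reference values come from a daily chart, one option is to resample to daily closes first. Below is a minimal sketch of that idea. The synthetic random-walk prices and the helper name macd are assumptions for illustration; they stand in for the downloaded data and are not taken from either post.

```python
import numpy as np
import pandas as pd

def macd(close, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) using recursive EMAs (adjust=False)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line

# Hypothetical 2-minute prices covering 10 full days (synthetic random walk)
idx = pd.date_range("2023-01-02", periods=10 * 720, freq="2min")
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 0.1, len(idx)).cumsum(), index=idx)

# Here the spans 12/26/9 mean 12/26/9 two-minute bars
intraday_macd, intraday_signal = macd(close)

# Here the spans mean 12/26/9 days, because each row is one daily close
daily_close = close.resample("1D").last()
daily_macd, daily_signal = macd(daily_close)

print(intraday_macd.tail(1))
print(daily_macd.tail(1))
```

The two results generally differ, so it is worth checking which bar size the expected values were computed on before changing the EMA parameters.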

Related

Xarray resample inter annually

I am trying to resample my data annually, but struggle to set the start day of resampling.
import xarray as xr
import numpy as np
import pandas as pd
da = xr.DataArray(
    np.linspace(0, 11, num=36),
    coords=[
        pd.date_range(
            "15/12/1999", periods=36,
        )
    ],
    dims="time",
)
da.resample(time="1Y").mean()
What I am trying to achieve is to get the means of the following periods: 15/12/1999-15/12/2000, 15/12/2000-15/12/2001, 15/12/2001-15/12/2002, ...
I have solved it by shifting the time axis back to the first day of the month and using the corresponding pandas anchored offset. Afterwards, I shift the time axis forward again.
import xarray as xr
import numpy as np
import pandas as pd
da = xr.DataArray(
    np.concatenate([np.zeros(365), np.ones(365)]),
    coords=[
        pd.date_range(
            "06/15/2017", "06/14/2019", freq='D'
        )
    ],
    dims="time",
)
days_to_first_of_month = pd.Timedelta(days=int(da.time.dt.day[0])-1)
da['time'] = da.time - days_to_first_of_month
month = da.time.dt.strftime("%b")[0].values
resampled = da.resample(time=f'AS-{month}').sum()
resampled['time'] = resampled.time + days_to_first_of_month
print(resampled)
Is there a more efficient or clean way?
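One possible alternative, sketched here with plain pandas rather than xarray, is to skip the shifting and instead label every timestamp with the number of whole years elapsed since the start date, then group by that label. The variable names (before_anniversary, period, totals) are made up for this sketch, and whether it is cleaner than the anchored-offset approach is a matter of taste. The same labels could presumably be attached as a coordinate and grouped on in xarray, but that is not tested here.

```python
import numpy as np
import pandas as pd

# Same synthetic data as above, held in a pandas Series for simplicity
times = pd.date_range("2017-06-15", "2019-06-14", freq="D")
s = pd.Series(np.concatenate([np.zeros(365), np.ones(365)]), index=times)

start = times[0]

# True where a timestamp falls before the start date's anniversary
# within its own calendar year
before_anniversary = (times.month < start.month) | (
    (times.month == start.month) & (times.day < start.day)
)

# Whole years elapsed since the start date: 0, 1, 2, ...
period = times.year.to_numpy() - start.year - np.asarray(before_anniversary).astype(int)

# One total per 15 June to 14 June window
totals = s.groupby(period).sum()
print(totals)
```

The group labels are integers (0 for the first window, 1 for the second), so the window start dates would have to be reconstructed separately if they are needed on the result.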

I do not know why the graph is not displayed and the code keeps executing for a long time

There is no error in the code (I believe),
but when I run the program, the graph does not appear in the plot window. It just says "executing code",
and I've waited about an hour but nothing shows up.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
excel_df = pd.read_csv('data.csv', header=None)
bool_idx = excel_df < 0.006
valid_data = excel_df[bool_idx]
true_data = valid_data.dropna()
tt = np.array(true_data.iloc[0:-1, 0])
print(tt)
tt2 = np.array(true_data.iloc[1:, 0])
print(tt2)
ts = abs(tt - tt2)
print(ts)
ind = np.array(np.where([ts < 0.001]))
graph1 = plt.plot(ind)
print(ind)
true_data0001 = true_data.iloc[0, ind]
print(true_data0001)
no error

How to apply euclidean distance to dataframe. Calculate each row

Please help me with this problem. I have been working on it for about 2 weeks and still can't solve it.
I want to use "apply" on a dataframe that I got from the Alpha Vantage API.
I want to apply the Euclidean distance to each row of the dataframe.
import math
import numpy as np
import pandas as pd
from scipy.spatial import distance
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from sklearn.neighbors import KNeighborsRegressor
from alpha_vantage.timeseries import TimeSeries
from services.KEY import getApiKey
ts = TimeSeries(key=getApiKey(), output_format='pandas')
This is the chart I get:
My chart (sorry, I can't post the image because of my reputation)
My code:
stock, meta_data = ts.get_daily_adjusted(symbol, outputsize='full')
stock = stock.sort_values('date')
open = stock['1. open'].values
low = stock['3. low'].values
high = stock['2. high'].values
close = stock['4. close'].values
sorted_date = stock.index.get_level_values(level='date')
stock_numpy_format = np.stack((sorted_date, open, low, high, close), axis=1)
df = pd.DataFrame(stock_numpy_format, columns=['date', 'open', 'low', 'high', 'close'])
df = df[df['open']>0]
df = df[(df['date'] >= "2016-01-01") & (df['date'] <= "2018-12-31")]
df = df.reset_index(drop=True)
df['close_next'] = df['close'].shift(-1)
df['daily_return'] = df['close'].pct_change(1)
df['daily_return'].fillna(0, inplace=True)
stock_numeric_close_dailyreturn = df['close', 'daily_return']
stock_normalized = (stock_numeric_close_dailyreturn - stock_numeric_close_dailyreturn.mean()) / stock_numeric_close_dailyreturn.std()
euclidean_distances = stock_normalized.apply(lambda row: distance.euclidean(row, date_normalized) , axis=1)
distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx":euclidean_distances.index})
distance_frame.sort_values("dist", inplace=True)
second_smallest = distance_frame.iloc[1]["idx"]
most_similar_to_date = df.loc[int(second_smallest)]["date"]
I want my chart to look like this:
The chart that I want
And this is the code that produced that chart:
distance_columns = ['Close', 'DailyReturn']
stock_numeric = stock[distance_columns]
stock_normalized = (stock_numeric - stock_numeric.mean()) / stock_numeric.std()
stock_normalized.fillna(0, inplace = True)
date_normalized = stock_normalized[stock["Date"] == "2016-06-29"]
euclidean_distances = stock_normalized.apply(lambda row: distance.euclidean(row, date_normalized), axis = 1)
distance_frame = pandas.DataFrame(data = {"dist": euclidean_distances, "idx": euclidean_distances.index})
distance_frame.sort_values("dist", inplace=True)
second_smallest = distance_frame.iloc[1]["idx"]
most_similar_to_date = stock.loc[int(second_smallest)]["Date"]
I tried to figure it out: it seems that "apply" behaves differently on the dataframe I get in pandas format from the API than on one read from a CSV file.
Is there an alternative that gives the same output for both formats (pandas and CSV)?
Thank you!
NB: sorry if my English is bad.
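For what it's worth, the row-wise distance can also be computed without apply. Below is a minimal sketch, assuming a small made-up dataframe with the same column names as above ('date', 'close', 'daily_return'); the values are invented for illustration and do not come from the API. The reference row is taken as a 1-D Series with .iloc[0], since a boolean filter on its own returns a one-row DataFrame rather than a vector.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the API result
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-06-27", "2016-06-28", "2016-06-29", "2016-06-30"]),
    "close": [10.0, 11.0, 12.0, 12.5],
    "daily_return": [0.0, 0.10, 0.09, 0.04],
})

cols = ["close", "daily_return"]
normalized = (df[cols] - df[cols].mean()) / df[cols].std()

# Reference row as a 1-D Series (not a one-row DataFrame)
ref = normalized[df["date"] == "2016-06-29"].iloc[0]

# Euclidean distance from every row to the reference row, vectorized
dist = np.sqrt(((normalized - ref) ** 2).sum(axis=1))

# Closest row other than the reference row itself
closest_other = dist.drop(ref.name).idxmin()
most_similar_date = df.loc[closest_other, "date"]
print(most_similar_date)
```

Dropping the reference row before idxmin() plays the same role as taking the second-smallest entry after sorting.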

datetime groupby on a multiindex

If I have a multiindex set up like:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from io import StringIO
csv = u"""string,date,number
a string1,2/5/11 9:16am,1.0
a string2,3/5/11 10:44pm,2.0
a string3,4/22/11 12:07pm,3.0
a string4,4/22/11 12:10pm,4.0
a string5,4/29/11 11:59am,1.0
a string6,5/2/11 1:41pm,2.0
a string7,5/2/11 2:02pm,3.0
a string8,5/2/11 2:56pm,4.0
a string9,5/2/11 3:00pm,5.0
a string10,5/2/14 3:02pm,6.0
a string11,5/2/14 3:18pm,7.0"""
df = pd.read_csv(StringIO(csv))
df['date']=pd.to_datetime(df['date'],format='%m/%d/%y %I:%M%p')
df.index = df['date']
df.index = pd.MultiIndex.from_tuples(zip(df['date'], df['string']), names=['alpha', 'bravo'])
How can I do a groupby on the alpha index by month and then sum? What I've tried is:
df.groupby(level='alpha').sum().groupby(df.index.month).sum()
which clearly doesn't work.
Like this?
df.groupby(df.index.get_level_values('alpha').month).number.sum()
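Note that grouping by .month alone merges the same month across different years (May 2011 and May 2014 in the sample data). If that is not wanted, one option is to group by a monthly period instead. Here is a small sketch on a shortened version of the sample data; the four rows and the names by_month and by_period are picked for illustration only.

```python
import pandas as pd

# Shortened version of the sample data from the question
dates = pd.to_datetime(
    ["2011-04-22 12:07", "2011-04-29 11:59", "2011-05-02 13:41", "2014-05-02 15:02"]
)
strings = ["a string3", "a string5", "a string6", "a string10"]
df = pd.DataFrame(
    {"number": [3.0, 1.0, 2.0, 6.0]},
    index=pd.MultiIndex.from_arrays([dates, strings], names=["alpha", "bravo"]),
)

alpha = df.index.get_level_values("alpha")

# Month number only: May 2011 and May 2014 land in the same group
by_month = df.groupby(alpha.month)["number"].sum()

# Year and month together: each calendar month is its own group
by_period = df.groupby(alpha.to_period("M"))["number"].sum()

print(by_month)
print(by_period)
```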

How to more efficiently calculate a rolling ratio

My data has more than 3000 rows.
Below is the code for computing the 20-day value (Volume Ratio in the stock market).
It takes more than 2 minutes.
Is there a good way to reduce the running time?
import pandas as pd
import numpy as np
from pandas.io.data import DataReader
import matplotlib.pylab as plt
data = DataReader('047040.KS','yahoo',start='2010')
data['vr']=0
data['Volume Ratio']=0
data['acend']=0
data['vr'] = np.sign(data['Close']-data['Open'])
data['vr'] = np.where(data['vr']==0,0.5,data['vr'])
data['vr'] = np.where(data['vr']<0,0,data['vr'])
data['acend'] = np.multiply(data['Volume'],data['vr'])
for i in range(len(data['Open'])):
    if i<19:
        data['Volume Ratio'][i]=0
    else:
        data['Volume Ratio'][i] = ((sum(data['acend'][i-19:i]))/((sum(data['Volume'][i-19:i])-sum(data['acend'][i-19:i]))))*100
Consider using conditional row selection and rolling().sum() instead of the loop:
ratio = data['acend'].rolling(window=20).sum() / (data['Volume'].rolling(window=20).sum() - data['acend'].rolling(window=20).sum()) * 100
data.loc[data.index[:20], 'Volume Ratio'] = 0
data.loc[data.index[20:], 'Volume Ratio'] = ratio.iloc[20:]
Or, simplified: rolling(window=20).sum() yields np.nan for the first 19 rows (window - 1), so just use .fillna(0):
data['new_col'] = data['acend'].rolling(window=20).sum().div(data['Volume'].rolling(window=20).sum().subtract(data['acend'].rolling(window=20).sum())).mul(100).fillna(0)
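To see the rolling version end to end, here is a self-contained sketch on synthetic data. The random Open/Close/Volume values are an assumption standing in for the Yahoo download. One difference from the loop in the question: rolling(window=20) covers 20 rows including the current one, whereas the slice [i-19:i] covers the 19 rows before the current one, so the numbers will not match the loop exactly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded price data
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "Open": rng.uniform(90, 110, n),
    "Close": rng.uniform(90, 110, n),
    "Volume": rng.integers(1_000, 5_000, n).astype(float),
})

# Weight per bar: 1 for an up bar, 0.5 for an unchanged bar, 0 for a down bar
direction = np.sign(data["Close"] - data["Open"])
weight = np.select([direction > 0, direction == 0], [1.0, 0.5], default=0.0)
data["acend"] = data["Volume"] * weight

# 20-row rolling sums replace the Python loop
up = data["acend"].rolling(window=20).sum()
total = data["Volume"].rolling(window=20).sum()

# Up volume divided by the remaining volume, in percent; warm-up rows become 0
data["Volume Ratio"] = (up / (total - up) * 100).fillna(0)
print(data["Volume Ratio"].tail())
```

Computing each rolling sum once and reusing it avoids evaluating the same window three times, which the one-line version does.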
