I am working step by step through Analytics Vidhya's Time Series forecasting tutorial, posted a while ago. I am at the step where we calculate the exponential moving average.
Link to the article: https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/
Here is Vidhya's code:
expwighted_avg = pd.ewma(ts_log, halflife=12)
plt.plot(ts_log)
plt.plot(expwighted_avg, color='red')
My code:
expwavg = a.ewm(span=12, adjust=True).mean()
plt.plot(a)
plt.plot(expwavg, color='red')
a is my dataset. I believe the function has changed and I am using the most up-to-date one. Any help with this would be appreciated.
Error: 'list' object has no attribute 'ewm' (or 'ewma')
Thanks,
I suspect that a is not actually a DataFrame. You may want to try this first:
# assuming you have previously done:
# import pandas as pd
adf = pd.DataFrame.from_records(a)
adf.head()
If the data appears to be structured as you intend, then your command will likely work:
expwavg = adf.ewm(span=12, adjust=True).mean()
plt.plot(adf)
plt.plot(expwavg, color='red')
If this does not work, you will likely need to post some of the code that precedes the three lines you have already posted.
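If a really is a plain Python list (which the 'list' object has no attribute 'ewm' error suggests), wrapping it in a Series is also enough. A minimal sketch, with made-up numbers standing in for the real dataset:
import pandas as pd
import matplotlib.pyplot as plt

a = [float(x) for x in range(1, 61)]   # hypothetical stand-in for the real data
s = pd.Series(a)                       # plain lists have no .ewm; pandas objects do
expwavg = s.ewm(span=12, adjust=True).mean()
plt.plot(s)
plt.plot(expwavg, color='red')
plt.show()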
I'm trying to copy the second exercise ("Forecasts constrained to an interval") in the link below:
https://otexts.com/fpp2/limits.html
What the link does is fit an ARIMA with forecasts constrained to an interval, using a certain logarithmic transformation and then a back-transformation at the end. But the example in the link uses the R language, and I can't find a similar example for Python no matter how much I search.
Can anyone tell me how I can do the exact same thing described in the link with Python? I'm certain it is possible using the statsmodels library, but I'm not sure how to exactly replicate the transformation constraints.
The standard ARIMA in Python:
from statsmodels.tsa.arima_model import ARIMA
import numpy as np
model = ARIMA(series, order=(0,1,1))
model_fit = model.fit(trend='nc',full_output=True, disp=1)
print(model_fit.summary())
I have a feeling that I need to add something like this somewhere (transformation formula):
series = np.log((series-a)/(b-series))
as well as the back-transformation formula. But since these lines don't produce explicit errors, I can't be sure whether I'm coding it right.
Also, I'm stuck at where I should be adding the transformation and back-transformation. I would appreciate it if someone could explain how the exercise in the link could be replicated in Python.
P.S. The 'transformation' here has nothing to do with making the time series stationary. I didn't mention stationarity because it's unrelated to my current question. The link above uses the word 'transformation' for the logarithmic formula that constrains the time series to lie between 'a' and 'b'.
What I tried so far:
series = np.log((series-a)/(b-series))
model = ARIMA(series, order=(0,1,1))
model_fit = model.fit(trend='c',full_output=True, disp=1)
print(model_fit.summary())
fore = model_fit.forecast(steps=1)
fore = (b-a)*np.exp(fore)/(1+np.exp(fore)) + a
It is clear from the link you referred to in the question that the transformation takes place just before forecasting. So:
you do the transformation on your data
forecast using an ARIMA model on the transformed data
reverse the transformation on the predicted data!
a = 50
b = 400
# Transformation on the data
train = np.log((series-a)/(b-series))
# Choose suitable order
model = ARIMA(train,order=(2,2,2))
results = model.fit()
start=len(train)
# One-step-ahead forecast: with start=len(train), end must be >= start,
# so end=start gives a single step; set end further out for a longer horizon
predictions = results.predict(start=start, end=start, dynamic=False, typ='levels')
# reverse transformation
predictions = ((b-a)*np.exp(predictions)/(1+np.exp(predictions))) + a
Passing dynamic=False means that forecasts at each point are generated using the full history up to that point (all lagged values).
Passing typ='levels' predicts the levels of the original endogenous variables. If we'd used the default typ='linear', we would have seen linear predictions in terms of the differenced endogenous variables.
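Putting the pieces together, a minimal end-to-end sketch on synthetic data (the bounds a=50/b=400 and the ARIMA order are just the illustrative choices from above; note that in statsmodels >= 0.13 this import moved to statsmodels.tsa.arima.model.ARIMA):
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA

a, b = 50, 400
np.random.seed(0)
# synthetic random-walk series kept strictly inside (a, b)
series = pd.Series(200 + np.cumsum(np.random.normal(0, 2, 300))).clip(a + 1, b - 1)

# scaled logit transform: maps the interval (a, b) onto the whole real line
train = np.log((series - a) / (b - series))

model = ARIMA(train, order=(2, 2, 2))
results = model.fit()

# one-step-ahead, out-of-sample forecast on the transformed scale
fore = results.predict(start=len(train), end=len(train), typ='levels')

# inverse (logistic) transform pulls the forecast back into (a, b)
fore = (b - a) * np.exp(fore) / (1 + np.exp(fore)) + a
print(fore)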
I have encountered the following problem when trying to make a boxplot of one column in a pandas.DataFrame vs another one. Here is the code:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(60))
df.columns = ['Values']
names = ('one','two','three')*int(df.shape[0]/3)
df['Names'] = names
df.plot(x='Names', y='Values', kind='box')
df.boxplot(column='Values', by='Names')
I expect the two plots to be the same, but the two commands produce different charts.
Is this expected behavior, and if so, how should the expression for the first plot be changed to match the second one?
.boxplot() and .plot(kind='box')/.plot.box() are separate implementations. The problem with .plot(kind='box')/.plot.box() is that although the by argument exists, it is not implemented and is therefore ignored (see this issue for example; it was never properly documented), meaning you won't be able to reproduce the result you get with .boxplot().
Tl;dr: .plot(kind='box')/.plot.box() is implemented poorly; use .boxplot() instead.
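If you specifically want the .plot(kind='box') route anyway, one workaround (a sketch, not an official replacement for by=) is to reshape the data so each group becomes its own column first:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(60), columns=['Values'])
df['Names'] = ('one', 'two', 'three') * int(df.shape[0] / 3)

# one column per group; equal group sizes keep the frame rectangular
wide = pd.DataFrame({name: grp['Values'].reset_index(drop=True)
                     for name, grp in df.groupby('Names')})
wide.plot(kind='box')
plt.show()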
The title outlines my problem with the following script (please run it first and then read my final question):
Now the whole code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader as pdr
from sklearn.linear_model import LinearRegression
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import datetime
tickers=['EXO.MI','LDO.MI']
end=datetime.date.today()
gap=datetime.timedelta(days=650)
start=end- gap
Bank=pdr.get_data_yahoo(tickers,start=start,end=end)
bank_matrix=Bank['Adj Close']
bank_matrix=bank_matrix.dropna()
exor=bank_matrix['EXO.MI']
leonardo=bank_matrix['LDO.MI']
Regressione=pd.DataFrame(data=np.zeros((len(exor),3)),columns=['Intercetta','Hedge','Residuals'],index=bank_matrix['EXO.MI'].index)
lookback=20
Hedge=[]
Intercetta=[]
Residuals=[]
for i in range(lookback,len(exor)):
    reg=LinearRegression().fit(bank_matrix[['LDO.MI']][i-lookback+1:i],bank_matrix[['EXO.MI']][i-lookback+1:i])
    # Regressione.iloc[Regressione[i,'Hedge']]=reg.coef_[0]
    Hedge.append(reg.coef_[0])
    Intercetta.append(reg.intercept_)
    y_pred=reg.predict(bank_matrix[['LDO.MI']][lookback:])
    Residuals.append(bank_matrix[['EXO.MI']][lookback:].to_numpy()-y_pred)
Regressione=pd.DataFrame(list(zip(Intercetta,Hedge,Residuals)),columns=['Intercetta','Hedge','Residuals'])
Regressione.set_index(bank_matrix[['EXO.MI']].index[lookback:],inplace=True)
The code works; however, I have two questions:
Is that 'reg._residues' the real residuals between the real y (the values of 'EXO.MI') and the predicted y? I ask because the plot of the residuals was anything but normally distributed or stationary.
I'm going crazy: how can I compute the everyday residuals in a for loop?
I mean, I tried to:
take the difference between the real y values and reg.predict
do the manual computation: y_predicted = Intercetta + Hedge*bank_matrix[['LDO.MI']]
But Python always reports errors, and I honestly find it very hard to understand how Python works here.
Thanks
It's still not 100% clear to me what you want to do here, but I hope this will get you somewhere.
First of all, your code runs fine if you just add import datetime at the beginning and drop the .to_numpy() call, i.e. replace Residuals.append(bank_matrix[['EXO.MI']][lookback:].to_numpy()-y_pred) with Residuals.append(bank_matrix[['EXO.MI']][lookback:]-y_pred).
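Written out, the corrected loop from the question then looks like this (only the last line changes: subtracting two pandas objects keeps the DatetimeIndex on each residual series):
for i in range(lookback, len(exor)):
    reg = LinearRegression().fit(bank_matrix[['LDO.MI']][i-lookback+1:i],
                                 bank_matrix[['EXO.MI']][i-lookback+1:i])
    Hedge.append(reg.coef_[0])
    Intercetta.append(reg.intercept_)
    y_pred = reg.predict(bank_matrix[['LDO.MI']][lookback:])
    Residuals.append(bank_matrix[['EXO.MI']][lookback:] - y_pred)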
Then you can visually check your residuals for each sub-period using:
for df in Residuals:
    df.plot.hist()
Using Residuals[-3:] will plot the last three residual series of your calculations.
You can also easily run a Shapiro-Wilk test for normality for each of your residual series and append the results in a dataframe:
from scipy import stats
shapiro=[]
for df in Residuals[-3:]:
    shapiro.append(stats.shapiro(df[df.columns[0]].values))
df_shapiro = pd.DataFrame(shapiro)
df_shapiro[0] returns the W-statistic and df_shapiro[1] returns the p-values.
Take a closer look at the p-values using:
df_pVal=df_shapiro[1].to_frame()
df_pVal['alpha']=0.05
df_pVal.plot()
Take a look here for more information on how to use the test.
The question remains what exactly you're aiming to do here; a detailed explanation would be great. Until then, I hope my effort gets you a few steps further.
I'm currently writing code involving some financial calculations, in particular some exponential moving averages. To do the job I have tried pandas and talib:
talib_ex=pd.Series(talib.EMA(self.PriceAdjusted.values,timeperiod=200),self.PriceAdjusted.index)
pandas_ex=self.PriceAdjusted.ewm(span=200,adjust=True,min_periods=200-1).mean()
They both work fine, but they provide different results at the beginning of the array.
Is there some parameter to change in pandas's EWMA, or is it a bug I should worry about?
Thanks in advance
Luca
For the talib EMA, the formula is the standard recursion EMA[i] = alpha*price[i] + (1-alpha)*EMA[i-1], with alpha = 2/(timeperiod+1).
So when using pandas, if you want to make the pandas EMA the same as talib's, you should use it as:
pandas_ex=self.PriceAdjusted.ewm(span=200,adjust=False,min_periods=200-1).mean()
Set adjust to False, per the documentation (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html), if you want to use the same formula as talib:
When adjust is True (default), weighted averages are calculated using weights (1-alpha)^(n-1), (1-alpha)^(n-2), ..., 1-alpha, 1.
When adjust is False, weighted averages are calculated recursively as:
weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].
For reference, see also:
https://en.wikipedia.org/wiki/Moving_average
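As a quick sanity check, here is a minimal sketch (with a made-up price series) showing that adjust=False reproduces that recursion. Note that talib additionally seeds its EMA with the simple average of the first timeperiod values rather than with the first value alone, which may account for the small residual differences mentioned below:
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0])
span = 3
alpha = 2 / (span + 1)

# pandas recursive EWMA with adjust=False
pandas_ewm = prices.ewm(span=span, adjust=False).mean()

# manual recursion: ema[0] = x[0]; ema[i] = (1-alpha)*ema[i-1] + alpha*x[i]
manual = [prices.iloc[0]]
for x in prices.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

print(np.allclose(pandas_ewm.values, manual))  # True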
PS: however, in my project I still find some small differences between talib and pandas.ewm, and I don't know why yet...
Good day
This is my maiden Stack Overflow question so I hope I get it right and don't break any rules.
I work as a fund manager, so I do not have a computer science background; I am, however, learning Python at the moment.
I am trying to fit historical data which includes multiple time series, and I think I have managed to do this. The next step is to use this fit to predict future values for these time series. I have looked at the StatsModels documentation but can't quite make heads or tails of it.
I am using xlwings and linking to excel. My code is as follows:
import numpy as np
from xlwings import Workbook, Range
import statsmodels.api as sm
import statsmodels
import pandas
def Fit_the_AR():
    dataRange = Range('Sheet1','rDataToFit').value
    dateRange = Range('Sheet1', 'rDates').value
    titleRange = Range('Sheet1', 'rTitles').value
    ARModel = statsmodels.tsa.vector_ar.var_model.VAR(dataRange,dateRange,titleRange,freq='m')
    statsmodels.tsa.vector_ar.var_model.VAR.fit(ARModel,1, 'ols', None, 'c', True)
    Range('Sheet2','B2').value = ARModel.endog_names
    Range('Sheet2','B3').value = ARModel.endog
I thought I would have to use the predict method, but I am not sure how to get all the parameters required for it.
Any help or pointing in the right direction would be much appreciated. I can provide an excel file of the data if need be. Thank you.
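Not an xlwings-specific answer, but a minimal sketch of the fit-then-forecast pattern with statsmodels' VAR on synthetic monthly data (the column names and the 12-month horizon are made up; writing the result back to Excel is left to Range(...).value as in the question):
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# hypothetical stand-in for the data read from Excel: two monthly series
idx = pd.date_range('2015-01-31', periods=48, freq='M')
data = pd.DataFrame(np.random.randn(48, 2).cumsum(axis=0),
                    index=idx, columns=['series_a', 'series_b'])

model = VAR(data)
results = model.fit(1)                 # one lag, as in the question

# forecasting needs the last k_ar observations as the starting point
lag_order = results.k_ar
forecast = results.forecast(data.values[-lag_order:], steps=12)
print(forecast)                        # 12 steps ahead, one column per series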