I am using Facebook Prophet to predict daily COVID cases for April 2022. I have a dataset of daily COVID cases up to March 2022. I have included holidays and external regressors such as a weekday/weekend flag and daily COVID testing, but my predictions are still nowhere near the actual case counts.
The MAE and MAPE are both high.
This is the plot of the original data points -
This is my dataset: y is the daily new cases, new_tests is the daily number of tests, and workingday is 1 for a weekday and 0 for a weekend.
This is the additional holiday dataset -
I used these parameters after cross-validation:
m = Prophet(growth="linear",
            yearly_seasonality=True,
            weekly_seasonality=True,
            daily_seasonality=False,
            holidays=holidays,
            seasonality_mode="multiplicative",
            n_changepoints=10,
            seasonality_prior_scale=5,
            holidays_prior_scale=5,
            changepoint_prior_scale=0.01)
m.add_regressor('workingday')
m.add_regressor('new_tests')
m.fit(training_set)
But I got very poor results. The blue line shows the predictions and the black dots are the original points.
The Mean Absolute Error and Root Mean Squared Error are 68862 and 72610 respectively, whereas the Mean Absolute Percentage Error is 2702.99.
How can I tune my hyperparameters, or what other regressors could I add, to get a lower error and closer predictions? Thanks.
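For reference, the three error metrics quoted above can be computed as in the following minimal sketch. The function names are my own and the sample numbers are made up for illustration (they are not the COVID data). MAPE divides each error by the actual value, so it becomes very large when forecasts are many times the actuals; an MAE near 68862 together with a MAPE near 2703% would be consistent with actual daily counts far smaller than the forecasts, although that is only a rough reading of the two figures.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; zero actuals are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Made-up illustrative values, not taken from the question.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 3))
print("MAPE:", round(mape(actual, predicted), 3), "%")
```

Comparing MAE against the typical size of y, rather than looking at MAPE alone, can help tell whether the forecast is off by a constant level or by a multiplicative factor.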
I have been trying to implement a moving-average crossover strategy using the bt library. The strategy: a 65 (SPY) / 35 (AGG) portfolio by default. If the 50-day moving average for SPY crosses above the 200-day moving average, the investor goes 10% overweight SPY, so the portfolio changes to 75/25. If the SPY 50-day moving average crosses below the 200-day moving average, the portfolio changes to 55/45. I believe the problem lies in bt.algos.WeighSpecified: I think it only accepts a single numerical value per security. Is it even possible to use alternating weights? Thanks
I have the following code:
spy_agg_data_MA
spy_agg_data_MA['mean_close_50'] = spy_agg_data_MA['SPY'].rolling(50).mean()
spy_agg_data_MA['mean_close_200'] = spy_agg_data_MA['SPY'].rolling(200).mean()
spy_agg_data_MA.dropna(inplace = True)
spy_agg_data_MA # Dataset that contains SPY, AGG, 50 MA, 200MA
spy_agg_data_MA['SPY_weights'] = np.where(spy_agg_data_MA['mean_close_50']>spy_agg_data_MA['mean_close_200'], .75,.65)
spy_agg_data_MA['AGG_weights'] = np.where(spy_agg_data_MA['mean_close_50']>spy_agg_data_MA['mean_close_200'], .25,.35)
spy_agg_data_MA #Now, the dataset has two more columns; spy weight and agg weight depending on MAs
SPY_weights = spy_agg_data_MA['SPY_weights']
AGG_weights = spy_agg_data_MA['AGG_weights']
target_weights = pd.concat([SPY_weights,AGG_weights], axis = 1)
target_weights #This might be redundant, but I put the two columns into another dataset
name = 'SPY_AGG_weightrebal'
strategy_2 = bt.Strategy(
    name,
    [
        bt.algos.RunWeekly(),
        bt.algos.SelectAll(),
        bt.algos.WeighSpecified(SPY = target_weights['SPY_weights'], AGG = target_weights['AGG_weights']),
        bt.algos.Rebalance()
    ]
)
backtest_2 = bt.Backtest(strategy_2, spy_agg_data_MA[['SPY','AGG']])
res = bt.run(backtest_2)
The error: cannot convert the series to <class 'float'>
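The weighting rule described in the question can be written out without any backtesting library, as in the sketch below. It is only an illustration: the helper names and the tiny price list are hypothetical, and the short windows (2 and 3) stand in for 50 and 200 so the example stays readable. It follows the prose rule (75/25 above, 55/45 below, 65/35 until both averages exist), whereas the posted np.where calls use .65/.35 for the "below" case. As far as I recall from the bt documentation, WeighSpecified expects one fixed number per ticker, and bt.algos.WeighTarget is the algo that accepts a DataFrame of date-indexed weights with ticker-named columns; please verify that against your installed version.

```python
def rolling_mean(values, window):
    """Trailing mean; None until `window` observations are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

def target_weights(prices, fast=50, slow=200, base=0.65, tilt=0.10):
    """Return a list of (SPY weight, AGG weight) tuples, one per price."""
    fast_ma = rolling_mean(prices, fast)
    slow_ma = rolling_mean(prices, slow)
    weights = []
    for f, s in zip(fast_ma, slow_ma):
        if f is None or s is None or f == s:
            spy = base            # default 65/35 allocation
        elif f > s:
            spy = base + tilt     # fast above slow: overweight SPY
        else:
            spy = base - tilt     # fast below slow: underweight SPY
        weights.append((round(spy, 2), round(1 - spy, 2)))
    return weights

# Hypothetical prices with short windows, for illustration only.
example = target_weights([1, 2, 3, 4, 3, 2, 1], fast=2, slow=3)
for spy_w, agg_w in example:
    print(spy_w, agg_w)
```

A per-date table like this is what would need to be passed to the backtest, instead of a single Series per ticker.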
#Initialize the model here
model = GFS(resolution='half', set_type='latest')
#the location I want to forecast the irradiance, and also the timezone
latitude, longitude, tz = 15.134677754177943, 120.63806622424912, 'Asia/Manila'
start = pd.Timestamp(datetime.date.today(), tz=tz)
end = start + pd.Timedelta(days=7)
#pulling the data from the GFS
raw_data = model.get_processed_data(latitude, longitude, start, end)
raw_data = pd.DataFrame(raw_data)
data = raw_data
#Description of the PV system we are using
system = PVSystem(surface_tilt=10, surface_azimuth=180, albedo=0.2,
                  module_type='glass_polymer',
                  module=module, module_parameters=module,
                  temperature_model_parameters=temperature_model_parameters,
                  modules_per_string=24, strings_per_inverter=32,
                  inverter=inverter, inverter_parameters=inverter,
                  racking_model='insulated_back')
#Using the ModelChain
mc = ModelChain(system, model.location, orientation_strategy=None,
                aoi_model='no_loss', spectral_model='no_loss',
                temp_model='sapm', losses_model='no_loss')
mc.run_model(data)
mc.total_irrad.plot()
plt.ylabel('Plane of array irradiance ($W/m^2$)')
plt.legend(loc='best')
Here is a picture of the result:
I have been getting the same irradiance values for several days now, so I believe something is wrong. I would expect the values to differ at least somewhat from day to day.
Forecasting Irradiance
I think the reason the days all look the same is that the forecast data predicts those days to be consistently overcast, so there's not necessarily anything "wrong" with the values being very similar across days -- it's just several cloudy days in a row. Take a look at raw_data['total_clouds'] and see how little variation there is for this forecast (nearly always 100% cloud cover). Also note that if you print the actual values of mc.total_irrad, you'll see that there is some minor variation day-to-day that is too small to appear on the plot.
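To make the last point concrete, here is a minimal sketch of how one might summarise day-to-day variation numerically instead of reading it off the plot. The helper name and the sample values are hypothetical (they are not taken from the forecast in the question); with real data you would feed in the timestamps and values of the irradiance series you want to inspect.

```python
from datetime import datetime

def daily_peaks(samples):
    """Return {date: maximum value} for an iterable of (datetime, value) pairs."""
    peaks = {}
    for timestamp, value in samples:
        day = timestamp.date()
        peaks[day] = max(peaks.get(day, value), value)
    return peaks

# Hypothetical plane-of-array values in W/m^2, for illustration only.
samples = [
    (datetime(2022, 1, 1, 9), 41.0), (datetime(2022, 1, 1, 12), 63.2),
    (datetime(2022, 1, 2, 9), 40.6), (datetime(2022, 1, 2, 12), 62.9),
    (datetime(2022, 1, 3, 9), 41.3), (datetime(2022, 1, 3, 12), 63.5),
]

peaks = daily_peaks(samples)
spread = max(peaks.values()) - min(peaks.values())
for day, peak in sorted(peaks.items()):
    print(day, peak)
print("spread between daily peaks:", round(spread, 2))
```

A spread of well under one W/m^2 on an axis that spans tens of W/m^2 would look like identical curves on a plot even though the numbers differ.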
I have a set of data from January 2012 to December 2014 that shows some trend and seasonality. I want to make a prediction for the next 2 years (from January 2015 to December 2017) using the Holt-Winters method from statsmodels.
The data set is the following one:
date,Data
Jan-12,153046
Feb-12,161874
Mar-12,226134
Apr-12,171871
May-12,191416
Jun-12,230926
Jul-12,147518
Aug-12,107449
Sep-12,170645
Oct-12,176492
Nov-12,180005
Dec-12,193372
Jan-13,156846
Feb-13,168893
Mar-13,231103
Apr-13,187390
May-13,191702
Jun-13,252216
Jul-13,175392
Aug-13,150390
Sep-13,148750
Oct-13,173798
Nov-13,171611
Dec-13,165390
Jan-14,155079
Feb-14,172438
Mar-14,225818
Apr-14,188195
May-14,193948
Jun-14,230964
Jul-14,172225
Aug-14,129257
Sep-14,173443
Oct-14,188987
Nov-14,172731
Dec-14,211194
It looks as follows:
I'm trying to build a Holt-Winters model, first to check how well it reproduces the past data (that is, a new graph where I can see whether my parameters give a good fit to the past) and then to forecast the next years. I made the in-sample prediction with the following code, but I'm not able to produce the forecast.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Data loading
data = pd.read_csv('setpoints.csv', parse_dates=['date'], index_col=['date'])
df_data = pd.DataFrame(data, columns=['Data'])
df_data['Data'].index.freq = 'MS'
train, test = df_data['Data'], df_data['Data']
model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=12).fit()
period = ['Jan-12', 'Dec-14']
pred = model.predict(start=period[0], end=period[1])
df_data['Data'].plot(label='Train')
test.plot(label='Test')
pred.plot(label='Holt-Winters')
plt.legend(loc='best')
plt.show()
The result looks like this:
Does anyone know how to forecast it?
I think there is a misconception here. You shouldn't use the same data for training and testing. The test data are data points that your model has not seen yet; this is how you measure how well the model performs. So I used the last three months of your data as the test set.
As for the prediction, we can pass different start and end points.
Also notice that I used 'mul' (multiplicative) for the seasonal component, which performs better on your data:
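import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# read in data and convert date column to MS frequency
df = pd.read_csv('setpoints.csv')  # file name taken from the question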
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
df = df.set_index('date').asfreq('MS')
# split data in train, test
train = df.loc[:'2014-09-01']
test = df.loc['2014-10-01':]
# train model and predict
model = ExponentialSmoothing(train, seasonal='mul', seasonal_periods=12).fit()
#model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=12).fit()
pred_test = model.predict(start='2014-10-01', end='2014-12-01')
pred_forecast = model.predict(start='2015-01-01', end='2017-12-01')
# plot data and prediction
df.plot(figsize=(15,9), label='Train')
pred_test.plot(label='Test')
pred_forecast.plot(label='Forecast')
plt.legend()
plt.savefig('figure.png')  # save before show(); afterwards the figure may be empty
plt.show()
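The hold-out idea can also be checked without statsmodels. The sketch below keeps the same split (last three months as test) on the data from the question, but it swaps in a seasonal-naive baseline (repeat the value from 12 months earlier) instead of Holt-Winters, so it is a reference point rather than a replacement for the model above. The function names are my own.

```python
# Monthly values Jan-12 .. Dec-14, copied from the question.
values = [
    153046, 161874, 226134, 171871, 191416, 230926,
    147518, 107449, 170645, 176492, 180005, 193372,
    156846, 168893, 231103, 187390, 191702, 252216,
    175392, 150390, 148750, 173798, 171611, 165390,
    155079, 172438, 225818, 188195, 193948, 230964,
    172225, 129257, 173443, 188987, 172731, 211194,
]

def seasonal_naive(train, horizon, season=12):
    """Forecast each step with the value observed one season earlier."""
    start = len(train) - season
    return [train[start + (h % season)] for h in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hold out the last three months (Oct-14 .. Dec-14) as unseen test data.
train, test = values[:-3], values[-3:]
forecast = seasonal_naive(train, horizon=len(test))

print("forecast:", forecast)
print("actual  :", test)
print("MAPE (%):", round(mape(test, forecast), 2))
```

If a fitted Holt-Winters model cannot beat this baseline on the held-out months, that is usually a hint to revisit the trend and seasonal settings.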
I am trying to use Prophet, the forecasting package by Facebook, and it works great with, say, 150 rows of data. But when I try to model fewer than 100 rows, it gives me very strange predictions. In R it gives me the same prediction for all dates, and in Python it gives me very bad predictions.
My data is weekly from 2018 week 1 to 2019 week 40.
This is my code:
(python)
predictionSize=6
new_train_df = data[:-predictionSize]
new_test_df = data[len(data)-predictionSize:]
m_new = Prophet(weekly_seasonality=True,yearly_seasonality=True)
m_new.fit(new_train_df)
new_future = m_new.make_future_dataframe(periods=predictionSize,freq='W')
new_forecast = m_new.predict(new_future)
new_ypred = new_forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(6)
Using this code gives me negative values for yhat.
My question is: are the predictions bad because the dataset is too small for Prophet?
Do let me know if you need any other information. The data has both weekly and yearly seasonality.
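As a quick check on how much history that date range actually contains, the row count can be worked out as below. This is only a sketch and it assumes ISO week numbering with one row per week and no gaps, which the question does not state. Under that assumption the series has fewer rows than two full yearly cycles, so a yearly pattern is seen less than twice; and with a single observation per week there is no within-week variation for a weekly (day-of-week) seasonality term to pick up.

```python
from datetime import date

# Assumption: ISO weeks, one observation per week, no missing weeks.
first_week = date.fromisocalendar(2018, 1, 1)   # Monday of 2018 week 1
last_week = date.fromisocalendar(2019, 40, 1)   # Monday of 2019 week 40

n_weeks = (last_week - first_week).days // 7 + 1
prediction_size = 6                              # held-out rows, as in the question
n_train = n_weeks - prediction_size

print("total weekly rows:", n_weeks)
print("training rows    :", n_train)
print("yearly cycles in training data:", round(n_train / 52, 2))
```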
I am attempting to make a forecast of a stock's volatility some time into the future (say 90 days). It seems that GARCH is a traditionally used model for this.
I have implemented this below using Python's arch library. Everything I do is explained in the comments. The only thing that needs to be changed to run the code is to provide your own daily prices in place of the call that retrieves them from my own API.
import utils
import numpy as np
import pandas as pd
import arch
import matplotlib.pyplot as plt
ticker = 'AAPL' # Ticker to retrieve data for
forecast_horizon = 90 # Number of days to forecast
# Retrieve prices from IEX API
prices = utils.dw.get(filename=ticker, source='iex', iex_range='5y')
df = prices[['date', 'close']].copy()  # copy to avoid SettingWithCopyWarning
df['daily_returns'] = np.log(df['close']).diff() # Daily log returns
df['monthly_std'] = df['daily_returns'].rolling(21).std() # Standard deviation across trading month
df['annual_vol'] = df['monthly_std'] * np.sqrt(252) # Annualize monthly standard deviation
df = df.dropna().reset_index(drop=True)
# Convert decimal returns to %
returns = df['daily_returns'] * 100
# Fit GARCH model
am = arch.arch_model(returns[:-forecast_horizon])
res = am.fit(disp='off')
# Calculate fitted variance values from model parameters
# Convert variance to standard deviation (volatility)
# Revert previous multiplication by 100
fitted = 0.1 * np.sqrt(
res.params['omega'] +
res.params['alpha[1]'] *
res.resid**2 +
res.conditional_volatility**2 *
res.params['beta[1]']
)
# Make forecast
# Convert variance to standard deviation (volatility)
# Revert previous multiplication by 100
forecast = 0.1 * np.sqrt(res.forecast(horizon=forecast_horizon).variance.values[-1])
# Store actual, fitted, and forecasted results
vol = pd.DataFrame({
'actual': df['annual_vol'],
'model': np.append(fitted, forecast)
})
# Plot Actual vs Fitted/Forecasted
plt.plot(vol['actual'][:-forecast_horizon], label='Train')
plt.plot(vol['actual'][-forecast_horizon - 1:], label='Test')
plt.plot(vol['model'][:-forecast_horizon], label='Fitted')
plt.plot(vol['model'][-forecast_horizon - 1:], label='Forecast')
plt.legend()
plt.show()
For Apple, this produces the following plot:
Clearly, the fitted values are consistently far lower than the actual values, and this results in the forecast being a huge underestimate too. (This is a poor example given that Apple's volatility was unusually high in this test period, but with every company I try, the model's fitted values are always too low.)
Am I doing everything correctly, and the GARCH model just isn't very powerful, or is modelling volatility simply very difficult? Or is there some error I am making?
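One thing that may be worth checking in the code above is the units of the two series being compared. The returns are multiplied by 100 before fitting, so the model's volatility is a daily figure in percent, while the "actual" series is an annualized figure in decimals. The sketch below spells out that conversion together with the GARCH(1,1) variance recursion; the function names and parameter values are hypothetical and chosen only so the arithmetic is easy to follow. Under these assumptions the conversion factor is sqrt(252)/100, roughly 0.159, rather than 0.1, which on its own would make the fitted line sit lower than the rolling estimate.

```python
import math

TRADING_DAYS = 252

def garch11_variance(omega, alpha, beta, prev_resid, prev_variance):
    """One step of the GARCH(1,1) recursion for the conditional variance."""
    return omega + alpha * prev_resid ** 2 + beta * prev_variance

def annualize_pct_vol(daily_pct_vol, trading_days=TRADING_DAYS):
    """Daily volatility in percent -> annualized volatility in decimals."""
    return daily_pct_vol / 100.0 * math.sqrt(trading_days)

# Hypothetical parameters and inputs, in percent-squared units.
variance = garch11_variance(omega=0.1, alpha=0.1, beta=0.8,
                            prev_resid=1.0, prev_variance=1.0)
daily_pct_vol = math.sqrt(variance)

print("daily vol (%)         :", round(daily_pct_vol, 4))
print("annualized (decimal)  :", round(annualize_pct_vol(daily_pct_vol), 4))
print("0.1 * daily vol       :", round(0.1 * daily_pct_vol, 4))
```

Note also that a 21-day rolling standard deviation and a conditional volatility are different estimators, so even with matching units the two lines are not expected to coincide exactly.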