Struggling to build an ARIMA model in Python that is even close to useful for predicting household electricity usage. Would appreciate any thoughts and suggestions. (It might just be a silly error in my implementation!)
Some design thoughts:
The data is messy in general, but there is clear daily seasonality (usage drops overnight and while the household is at work/school) and weekly seasonality (weekday usage differs from weekend usage).
I have tried the automated functions in statsmodels, sktime, fbprophet and pmdarima (auto_arima) with no luck; I don't think these take seasonality into account particularly well.
I am currently trying a more manual approach: statsmodels' SARIMAX with only daily seasonality incorporated (see code and results below), possibly adding Fourier terms as exogenous variables to handle the weekly seasonality.
I will consider adding exogenous variables (like temperature) to account for annual seasonality, but first I just want something reasonable on a smaller time scale (3-6 months).
The approach I am trying to get working: use the Box-Jenkins method to specify a SARIMA model for daily seasonality only (images below).
(1) Looking at Dickey-Fuller and KPSS for the time series, there appears to be minimal trend to correct for (expected), but ACF and PACF charts show significant seasonality (daily, weekly).
(2) Taking differences to account for the weekly and daily seasonality, then a further first-order difference, quickly yields a series with minimal remaining seasonality that is stationary. This should be a good sign and suggests there is a model that can predict this behaviour!
One more plot shows the difference between the original and differenced data when we zoom in on a typical week.
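The week-and-day differencing in step (2) can be sketched in pandas, assuming a half-hourly Series (48 observations per day, 336 per week); the series here is a made-up toy example, not the real usage data:

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly usage with a pure daily pattern (48 obs/day).
idx = pd.date_range("2023-01-02", periods=4 * 336, freq="30min")
pattern = np.sin(np.linspace(0, 2 * np.pi, 48, endpoint=False))
usage = pd.Series(np.tile(pattern, 4 * 7), index=idx)

# Weekly difference (lag 336), daily difference (lag 48), then first-order.
diffed = usage.diff(336).diff(48).diff(1).dropna()
# For this purely periodic toy series the result is exactly zero.
```

On real data the residual won't be zero, of course, but its ACF/PACF is what the seasonal orders get read from.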
(3) Finally, I trained a SARIMA model as follows. I set D=d=0 since there is no identifiable trend (expected), p=2 to give the model the opportunity to learn from the most recent behaviour, m=48 for the seasonal period (daily, since the data is in 30-minute intervals), and P=Q=1 to capture the seasonal effects at t-48.
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(
    train_data,
    trend='n',
    order=(2, 0, 0),
    seasonal_order=(1, 0, 1, 48),
)
results = model.fit()
I am able to get an exponential smoothing model working, but I had expected a double-seasonal ARIMA to blow it out of the water. Any thoughts and suggestions are most welcome. Thank you in advance!
I am a bit confused about how to identify the seasonal component of a SARIMA model. I am currently forecasting rates (ocean carrier rates, to be specific). The first thing I did was convert the original rates to log differences, i.e. log(P2) - log(P1), since I want to forecast the change in the rate itself. I then checked the series for stationarity, and it was stationary. This is what the seasonally decomposed series looks like:
After that, I ran a basic ARIMA model and chose p, d, q based on the ACF and PACF plots, but the predictions were pretty bad, which was expected.
I am now trying a SARIMA model instead. I obtained the seasonal component via seasonal_decompose and plotted the ACF and PACF over 52 lags (I have weekly data), and this is what I see. How do I choose the right seasonal (P, D, Q) component for SARIMA?
If you want to check for an anomaly in stock data, many studies use a linear regression. Say you want to check for a Monday effect, meaning that Monday is significantly worse than the other days.
I understood that we can use a regression like: return = a + b * DummyMon + e
where a is the constant, b the regression coefficient, DummyMon the dummy for Monday, and e the error term.
That's what I used in python:
First you add a constant to the anomaly:
anomaly = sm.add_constant(anomaly)
Then you build the model:
model = sm.OLS(returns, anomaly)  # note: 'return' is a reserved word in Python, so use e.g. 'returns'
Then you fit the model:
results = model.fit()
I wonder if this is the correct model setup.
In this case a plot of the linear regression would just show two vertical bands of returns, above 0 (not Monday) and above 1 (Monday). It looks pretty strange. Is this correct?
Should I somehow try to use the time (t) in the regression? If so, how can I do it with python? I thought about giving each date an increasing number, but then I wondered how to treat weekends.
I would assume that with many data points both approaches give similar results if the time series is stationary, right? In the end I am doing a cross-sectional analysis and don't care about the time-series aspect in this case, correct? (I have heard about GARCH models etc., where this is different.)
Well, I am just learning and hope someone could give me some ideas about the topic.
Thank you very much in advance.
For time series analysis tasks (such as forecasting or anomaly detection), you may need a more advanced model, such as a Recurrent Neural Network (RNN) from deep learning. You can assign any time step to an RNN cell; in your case, each RNN cell could represent a day, or maybe an hour or half a day, etc.
The main purpose of RNNs is to let the model capture the time dependencies in the data. For example, if Monday has a bad effect, the corresponding RNN cells will learn appropriate parameters. I would recommend doing some further research on them. Here are some resources that may help:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
(Also includes different types of RNN)
https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e
And you can use the TensorFlow, Keras or PyTorch libraries.
I'm working with a dataset of a bunch of different features for stock forecasting for a project to help teach myself programming and to get into machine learning. I am running into some issues with predicting future data (or time series forecasting) and I was hoping someone out there could give me some advice! Any advice or criticism you could provide will be greatly appreciated.
Below I've listed detailed examples of the three implementations I have tried for forecasting time series data. I could be wrong, but I don't believe this is a mechanical code issue, because the results are consistent despite my re-coding it a few times (the only thing I can really think of is not using MinMaxScaler correctly; see closing thoughts). It could, however, be a higher-level design mistake.
I didn't post any code for the project here because it was starting to turn into a wall of words and I had three separate examples, but if you have any questions or think it would benefit your assistance to see the code used for any of the below examples or the data used for all of them feel free to let me know and I'll link whatever's needed.
The three forecasting implementations I have tried:
1) - A sliding-window implementation. Input data is shifted backwards in timesteps (x-1, x-2, ...); target data is the current timestep (x). The data used for the first forecast is n rows of test data shifted in the same manner as the input data. For every subsequent prediction the oldest timestep is removed and the new prediction is appended to the window, maintaining the same total number of timesteps while progressing forward in time.
2) - Input data is just x, target data is shifted 30 timesteps forward for prediction (y+1, y+2 ... y+30). Attempting to forecast future by taking the first sample of x in test data and predicting 30 steps into the future with it.
3) - Combination of both methods, input data is shifted backward and in the example shown below, 101 timesteps including the present timestep (x-100, x-99 ... x) were used. Target data, similar to implementation 2, is shifted 30 timesteps into the future (y+1, y+2... y+30). With this, I am attempting to forecast the future by taking 101 timesteps of the first n-rows of test data and predicting 30 steps into the future from there.
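For concreteness, the window/target construction used in implementations 1 and 3 can be sketched as a small helper; the function name is made up and it assumes a 1-D numpy array:

```python
import numpy as np

def make_windows(series, n_lags, horizon=1):
    """Build (X, y): each X row holds n_lags past steps, each y row the next `horizon` steps."""
    X, y = [], []
    for i in range(n_lags, len(series) - horizon + 1):
        X.append(series[i - n_lags:i])   # lags (x-n_lags, ..., x-1)
        y.append(series[i:i + horizon])  # targets (y, y+1, ..., y+horizon-1)
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)
X, y = make_windows(series, n_lags=3, horizon=2)
# X[0] = [0, 1, 2], y[0] = [3, 4]; shapes (6, 3) and (6, 2)
```

Implementation 1 is the horizon=1 case with iterative re-feeding; implementation 3 is n_lags=101, horizon=30 done in one shot.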
For all tests, I cut off the end of my dataset at an arbitrary amount (last ~10% of total dataset), split everything before the cutoff into training/validation (80/20) and saved everything after the cutoff for testing & forecasting purposes.
As for network architectures, I've tried a bunch of different ones, from bidirectional LSTMs to a multi-input CNN/GRU to a WaveNet-like CNN, and all produce predictions that are bad in a similar enough way that I feel this is either a data-manipulation problem or a problem of me not understanding how model.predict(), or my model's output, works.
The architectures I will be using for each implementation below are:
1) causal dilation CNN
2) two-layer LSTM
neural network architecture diagrams here: https://imgur.com/a/cY2RWNG
For every example below the model's weights were tuned by the model training on the training data (first 80% of dataset) and attempting to achieve the lowest validation loss possible using the validation data (last 20% of dataset).
--- First Implementation ---
(unfortunately, there's an image limit on stack overflow for my current rating or whatever so I've put each implementation into its own album)
Implementation 1 - Graphs for both CNN/LSTM: Images 1-7 https://imgur.com/a/36DZCIf
In every training/validation test graph, black represents the actual data and red the predicted data. In the forecasting predictions, blue represents the predictions made and orange the actual close price on a longer time scale than the prediction (for better scale); all forecast predictions are 30 days into the future.
Using this methodology, and displaying the actual close price against the predicted close price in every instance:
Image 1 - sliding window set up for this implementation using one and two features and a range of numbers for viewing ease
CNN:
(images 2 & 3 description in album)
Image 4 - Sliding window approach of forecasting every feature in the data, with the prediction for close price plotted against the actual close price. The timesteps start at the first row of the cutoff data.
When the first prediction is made I append the prediction to the end of this row and remove the first timestep, repeating for every future timestep I wish to predict.
I really don't even know what to say about this prediction, it's really bad...
LSTM:
(images 5 & 6 description in album)
Image 7 - Sliding window prediction: https://i.imgur.com/Ywf6xvr.png
This prediction seems to capture the trend somewhat accurately, I guess, but the starting point is nowhere near the last known data point, which is confusing.
--- Second Implementation ---
Implementation 2 - Graphs for both CNN/LSTM: Images 1-7
https://imgur.com/a/3CAk1xc
For this attempt, I made the target prediction many timesteps into the future. With this implementation, the model takes in the current timestep (x) of features and attempts to predict the closing price at y+1, y+2, y+3, etc. There is only one prediction here: a sequence of time steps into the future.
The same graphing and network conventions as in implementation 1 apply here too.
Image 1 - Set up of input and target data, using a range and only one or two features for viewing ease.
CNN:
(images 2 & 3 description in album)
Image 4 - Plotting all 30 predictions made from the first row of feature data after the cutoff... this is horrible. Why does the start again land nowhere near the last known data point? I don't understand how it can predict y+1 being so far from the closing price at x, when in every instance of its training y+1 was almost certainly extremely close to x.
LSTM:
(images 5 & 6 description in album)
Image 7 - All 30 predictions into the future made from the first row of cutoff data: again, all over the place, and the predictions start nowhere near the last actual data point. Not sure what else to add.
It's starting to appear that either my CNN implementation is poorly done or LSTM is just a better solution here. Regardless, the predictions and actual forecasting are still terrible so I'll hold my judgment on the network architecture until I get something that looks remotely like an actual forecast prediction.
--- Third Implementation ---
Implementation 3 - Graphs for both CNN/LSTM: Images 1-7
https://imgur.com/a/clcKFF8
This was the final idea I had for forecasting the future, and it's essentially a combination of the first two. For this implementation, I take x-n (x, x-1, x-2, x-3) etc., which is similar to the first implementation, and set the target data to y+1, y+2, y+3, which is similar to the second implementation. My goal for predicting with this was the same strategy as the second implementation, where I would predict 30 days into the future, but instead of doing so on one timestep of features, I'd do so on many timesteps into the past. I had hoped this implementation would give the prediction enough supporting data to accurately forecast the future.
Image 1 - Input data or "x" and Target data or "y" implementation and set up. I use a range of numbers again. In this example, the input data has 2 features, includes the present timestep (x) and 4 timesteps shifted backward (x-1, x-2, x-3, x-4) and the target data has 5 timesteps into the future (y+1, y+2, y+3, y+4, y+5)
CNN:
(images 2 & 3 description in album)
Image 4 - 30 predictions into the future using 101 timesteps of x
This is probably the worst result yet and that's despite the prediction having way more timesteps back of data to use.
LSTM:
(images 5 & 6 description in album)
Image 7 - 30 predictions on input data row of 101 timesteps.
This one actually has some action to it, I guess, but it's all over the place, doesn't start near the last actual data point, and is clearly not accurate at all.
--- Closing Thoughts ---
I've also tried removing the target variable (close price) from the input data, but it doesn't seem to change much, and I would think the past n days of closing data should be available to a model anyway.
Originally I MinMaxScaled all of my data on my pre-processing page and did not inverse_transform any of it; the results were basically just as bad as the examples above. For the examples above I min-max scaled the prediction, validation & test datasets separately to the range 0.2-0.8. For the actual forecasting predictions, I inverse_transformed the data before plotting it against the actual closing price, which was never transformed.
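On the scaling point: one common pitfall worth checking is fitting a separate scaler on each split, which puts train and test on different scales. A sketch of the usual pattern (fit once on the training split, reuse everywhere), with made-up data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for a single price column.
data = np.arange(100, dtype=float).reshape(-1, 1)
train, test = data[:80], data[80:]

# Fit the scaler on the training split only, then reuse it, so that all
# splits share one scale and inverse_transform maps back correctly.
scaler = MinMaxScaler(feature_range=(0.2, 0.8))
train_s = scaler.fit_transform(train)
test_s = scaler.transform(test)       # NOT fit_transform
preds = scaler.inverse_transform(test_s)
```

If each split gets its own scaler, a model trained on one scale is then asked to predict on another, which can produce exactly the "starts nowhere near the last known point" symptom.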
If I am doing something fundamentally wrong in the above examples I would love to know as I'm just starting out and this is my first real programming/machine learning project.
A few other things relating to this that I've come across / tried:
I've experimented briefly with a stateful model where I reset_states() after every prediction, with some moderate success.
I've read that sequence-to-sequence models can be useful for forecasting time series data, but I'm really not sure what that architecture is designed to do with time series, despite reading into it quite a bit, and am thus not sure how to implement or test it.
I tried a bidirectional LSTM because a random Stack Overflow post suggested it for time series forecasting, but the results were mediocre, and from what I understand of how it works it doesn't seem to make much sense in this situation. I've only tried it with the first implementation above, though; let me know if it's worth looking into further.
Any tips/criticism at all that you could provide would be greatly appreciated, I'm really not sure how to progress from here. Thanks in advance!
I have been through that. For me, the sliding-window approach with LSTMs/NNs worked like magic on small time series, but on a bigger series with data coming in hourly over a few years it failed miserably.
Later on I ditched LSTMs and GBTs and started using the algorithms from statsmodels.tsa, ARIMA and SARIMA most of the time; I'd suggest you read about them too. They are very easy to implement: no need to worry about sliding windows or shifting data back a few timestamps, it takes care of all that. Just train, tune the parameters and predict the next timestamps.
Sometimes I also faced issues where my time series had missing timestamps and data, so I had to impute those values; or the frequency I trained on (hourly, weekly, monthly) differed from the frequency I wanted to predict on, so I had to bring the data into the right form too. I hit the different-frequency issue while visualising on a plot as well.
import statsmodels.api as sm

model = sm.tsa.SARIMAX(train_df, order=(1, 0, 1), seasonal_order=(1, 1, 0, 24))
results = model.fit()
Other than the data pre-processing part (imputing missing data, training at the right frequency) and some logic for parameter tuning, these few lines are all you need; your data frame just needs a date-formatted index and the time series in its columns.
Hey all, I was wondering how I could connect two different line plots on the same graph together in matplotlib?
If that is not ideal, then I could combine the two dataframes; however, I would then need a way to change the color of the line plot at a certain x point.
I want to indicate where the predicted sales are on the graph. Here is a picture of what my code and graph currently look like (red is actual, green is predicted).
Here is the link to my ipython notebook https://github.com/neil90/Learning_LinearReg/blob/master/Elantra%20Regression_Practice.ipynb
My original dataset was 50 observations (small, I know), which I split into training and test sets. I got an R² of 0.72 on my test set. I then looked online to see if I could find the independent variables for the 12 months after the dataset, and lo and behold I was able to (though I am not sure of their accuracy). I then wanted to predict sales with my model, hence the green line.
It is always possible to connect two points using a single plot command, both in MATLAB and in Python. In MATLAB, for example:
P1 = df1(:,end); % last point in the first plot
P2 = df2(:,1);   % first point in the second plot
plot([P1(1,1) P2(1,1)], [P1(2,1) P2(2,1)])
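In matplotlib/pandas terms, the usual trick is to prepend the last actual point to the predicted series so the two lines meet; the data below is made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales: actual history and a prediction continuing it.
actual = pd.Series([10, 12, 11, 13], index=pd.RangeIndex(4))
predicted = pd.Series([14, 15, 13], index=pd.RangeIndex(4, 7))

# Prepend the last actual point to the prediction so the lines join up.
joined = pd.concat([actual.tail(1), predicted])

fig, ax = plt.subplots()
ax.plot(actual.index, actual.values, color="red", label="actual")
ax.plot(joined.index, joined.values, color="green", label="predicted")
ax.legend()
```

The color change at the join point then marks exactly where the predicted sales begin.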
But this is not good practice. As you said, the green graph is your prediction; why not include more inputs when calculating the predicted points?
So I assume that you have a set of data, and then you divided it in two parts, a part for training and the other part for testing the learned model.
The ratio is typically 70/30 or 80/20, unless you have a validation set as well.
So after you train your model on the training part of the data, you should check the model's error (does it converge?). As far as I can see, your prediction has a huge error; it cannot even re-generate a sample it has seen before (i.e. the last point on the red graph).
So first try to plot the prediction over the whole training data to see how accurate the learned model is. I am pretty sure you need to reduce the training error, either by running more iterations or by changing the structure of the model.
Because you have not mentioned many details, most of the ideas here are based on assumptions.