Connecting 2 line plots - python

Hey all, I was wondering how I could connect 2 different line plots on the same graph together in matplotlib?
If that is not ideal, I could combine the 2 dataframes, but then I would need a way to change the color of the line plot at a certain x point.
I want to indicate where the predicted sales are on the graph. Here is a picture of what my code and graph currently look like (red is actual, green is predicted).
Here is the link to my ipython notebook https://github.com/neil90/Learning_LinearReg/blob/master/Elantra%20Regression_Practice.ipynb
My original dataset was 50 observations (small, I know), which I split into training and test sets. I got an R² of 0.72 on my test set. I then looked online to see if I could find the independent variables for the 12 months after the dataset and, lo and behold, I was able to (although I am not sure of their accuracy). I then wanted to predict the sales with my model, hence the green line.

It is always possible to connect two points using a single plot command, both in MATLAB and in Python, for example (MATLAB syntax):
P1 = df1(:, end); % last point of the first plot, as an [x; y] column
P2 = df2(:, 1);   % first point of the second plot, as an [x; y] column
plot([P1(1,1) P2(1,1)], [P1(2,1) P2(2,1)]) % draw a line segment from P1 to P2
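Since the question is about matplotlib, the same idea in Python looks roughly like this; df_actual and df_pred are placeholder DataFrames, each with a date index and a 'sales' column:
import matplotlib.pyplot as plt

# plot the two series in their own colours
plt.plot(df_actual.index, df_actual['sales'], color='red', label='actual')
plt.plot(df_pred.index, df_pred['sales'], color='green', label='predicted')

# bridge the gap with a segment from the last actual point to the first predicted point
plt.plot([df_actual.index[-1], df_pred.index[0]],
         [df_actual['sales'].iloc[-1], df_pred['sales'].iloc[0]],
         color='green')

plt.legend()
plt.show()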
But this is not good practice. As you said, the green graph is your prediction part, so why not include more inputs when calculating the predicted points?
So I assume that you have a set of data which you divided into two parts: one part for training and the other for testing the learned model.
The ratio should be around 70/30 or 80/20, unless you have a validation part as well.
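For reference, a minimal scikit-learn sketch of such a split, assuming X and y hold your features and target:
from sklearn.model_selection import train_test_split

# hold out 20% of the data for testing; the remaining 80% is used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)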
So after you train your model on the training part of the data, you should check the error of the model (does it converge?). As far as I can see, your prediction part has a huge error; it cannot even reproduce samples it has seen before (i.e. the last point on the red graph).
So first try to plot the prediction over the whole training data to see how accurate the learned model is. I am fairly sure you need to reduce the training error, either by running more iterations or by changing the structure of the model.
Because you have not given many details, most of these ideas are based on assumptions.

Related

Classification of accelerometer data

I am trying to classify accelerometer data into 4 classes: 1, 2, 3, 4. The training dataset looks like the following:
The training labels are contained in another file and are provided for only every 10th observation. This is what it looks like:
Now I am not sure how to interpret this. Should I only use the training_labels dataset to train a model? In that case, I don't know why the first dataset is given, and using only the second set would lead to a loss of information. I thought of doing a left outer join of the first dataset with the second and using 'bfill' in df.fillna() to get rid of the NaN values, then training on that data, but I am not sure whether this is the right approach. I am still a beginner at machine learning, so any help is appreciated.
EDIT: The data comes from an online course I am doing. It says: "Because the accelerometers are sampled at high frequency, the labels in train_labels are only provided for every 10th observation."
If you can afford to discard 90% of your data, you can use only the observations that have labels. You can also take the mean/median x, y, z coordinates of the 10 observations covered by each provided label, or apply the same label to the last 10 observations. Those approaches all seem legitimate to me.
Probably the sampling frequency was unnecessarily high, so you can assume the labels do not change that quickly, but this also depends on the problem at hand.
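A rough pandas sketch of these options; the frame and column names (df with x, y, z columns, labels with a label column, and the timestamp merge key) are assumptions:
import pandas as pd

# df: accelerometer readings; labels: one label per 10th observation
merged = df.merge(labels, on='timestamp', how='left')

# Option 1: keep only the labelled rows (discards ~90% of the data)
labelled_only = merged.dropna(subset=['label'])

# Option 2: aggregate each block of 10 readings to its mean and attach that block's label
agg = (merged.groupby(merged.index // 10)
             .agg({'x': 'mean', 'y': 'mean', 'z': 'mean', 'label': 'last'}))

# Option 3: propagate each label backwards over the 10 readings it covers
merged['label'] = merged['label'].bfill()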

Why can my perceptron not perfectly separate a number of points that is less than the number of features?

I am quite new to machine learning and decided that a good way to start getting some experience would be to play around with some real databases and the Python scikit-learn library. I used Haberman's Survival data, a binary classification task, which can be found at https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival. I trained a few perceptrons using this data. At some point, I decided to demonstrate the concept of overfitting, so I mapped all 306 data points, each with 3 features, to a very high dimension by generating all polynomial terms up to and including the 11th degree. That is 364 features (more than the 306 data points). Yet, when I trained the model, I did not achieve zero in-sample error. I figured the reason might be that some points coincide but have different labels, so I removed duplicate data points, but I still could not achieve zero in-sample error. Here is the interesting part of my code using scikit-learn:
from sklearn.linear_model import Perceptron
from sklearn import preprocessing

perceptron = Perceptron()
polynomial = preprocessing.PolynomialFeatures(11)  # all polynomial terms up to degree 11
perceptron.fit(polynomial.fit_transform(X), Y)     # fit on the expanded feature matrix
print(perceptron.score(polynomial.fit_transform(X), Y))  # in-sample accuracy
And the output I got was a mere 0.7, an accuracy far from the 1.0 (100%) I expected. What am I missing?
You only have 11 polynomial degrees. If you want to be guaranteed to hit every point, you need almost as many, if not more, polynomial degrees than datapoints, because each additional degree allows the graph to bend again.
Having a bunch of features of the same degree can't really increase your complexity in the way you expect: if your function is first degree, for example, you can't expect it to be anything other than linear, regardless of how many like terms it has.
So while you may have more features than datapoints, since you don't have more polynomial degrees than datapoints, most of your features are effectively tweaking the same weights.
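As a sanity check on both points, you can inspect how many columns the expansion actually produces and whether the perceptron even converged; a rough sketch, assuming X and Y are the Haberman features and labels:
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import PolynomialFeatures

X_poly = PolynomialFeatures(11).fit_transform(X)
print(X_poly.shape)  # (306, 364): columns generated by the degree-11 expansion

# give the perceptron plenty of passes over the data before judging separability
clf = Perceptron(max_iter=10000, tol=None)
clf.fit(X_poly, Y)
print(clf.n_iter_, clf.score(X_poly, Y))  # epochs actually run, and in-sample accuracy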

Is there a way to create a graph comparing hyper-parameters vs model accuracy with TRAINS python package?

I would like to run multiple experiments, then report model accuracy per experiment.
I'm training a toy MNIST example with pytorch (v1.1.0), but the goal is, once I can compare performance for the toy problem, to have it integrated with the actual code base.
As I understand the TRAINS python package, with the "two lines of code" all my hyper-parameters are already logged (Command line argparse in my case).
What do I need to do in order to report a final scalar and then be able to sort through all the different training experiments (with their hyper-parameters) to find the best one?
What I'd like to get is one or more graphs where the X-axis shows hyper-parameter values and the Y-axis shows the validation accuracy.
I assume you are referring to https://pypi.org/project/trains/ (https://github.com/allegroai/trains), of which I'm one of the maintainers.
You can manually create a plot with a single point: X-axis for the hyper-parameter value, Y-axis for the accuracy.
from trains import Task

number_layers = 10
accuracy = 0.95
Task.current_task().get_logger().report_scatter2d(
    "performance", "accuracy", iteration=0,
    mode='markers', scatter=[(number_layers, accuracy)])
Assuming your hyper-parameter is "number_layers" with current value 10, and the accuracy for the trained model is 0.95.
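Alternatively, if you just want a single final metric per experiment, you can report it as a scalar and compare it across experiments in the same way (a sketch; the title/series names are arbitrary):
from trains import Task

# report the final validation accuracy as a single scalar for this experiment
Task.current_task().get_logger().report_scalar(
    "validation", "accuracy", value=0.95, iteration=0)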
Then when you compare the experiments, you get a plot of hyper-parameter value versus accuracy, one point per experiment.

Financial Time Series Forecasting with Keras/Tensorflow: Three forecasting methods tried, three poor results had, what am I doing wrong?

I'm working with a dataset of a bunch of different features for stock forecasting, for a project to help teach myself programming and get into machine learning. I am running into some issues with predicting future data (time series forecasting) and I was hoping someone out there could give me some advice! Any advice or criticism you could provide will be greatly appreciated.
Below I've listed detailed examples of the three implementations I have tried for forecasting time series data. I could be wrong on this, but I don't believe this is a mechanical code issue, because all of the results are consistent despite me re-coding it a few times (the only thing I can really think of here is not using MinMaxScaler correctly; see closing thoughts). It could, however, be a higher-level mistake in my approach.
I didn't post any code for the project here because it was starting to turn into a wall of words and I had three separate examples, but if you have any questions, or think it would help to see the code or data used for any of the examples below, let me know and I'll link whatever's needed.
The three forecasting implementations I have tried (a windowing sketch follows the list):
1) A sliding window implementation. Input data is shifted backwards in timesteps (x-1, x-2, ...); target data is the current timestep (x). Data used for the first forecast is n rows of test data shifted in the same manner as the input data. For every subsequent prediction, the oldest timestep is removed and the new prediction is appended to the front of the prediction row, keeping the total number of timesteps constant while progressing forward in time.
2) Input data is just x; target data is shifted 30 timesteps forward (y+1, y+2, ..., y+30). I attempt to forecast the future by taking the first sample of x in the test data and predicting 30 steps into the future from it.
3) A combination of both methods: input data is shifted backward, and in the example shown below 101 timesteps including the present one (x-100, x-99, ..., x) were used. Target data, as in implementation 2, is shifted 30 timesteps into the future (y+1, y+2, ..., y+30). With this, I attempt to forecast the future by taking 101 timesteps of the first n rows of test data and predicting 30 steps ahead.
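Roughly, the windowing for the third (most general) scheme looks like the sketch below; it's simplified to a single 1-D series of closing prices (my real data has more features), and make_windows / series are just illustrative names:
import numpy as np

def make_windows(series, n_past, n_future):
    # X[i]: the n_past values ending at step i; y[i]: the n_future values after step i
    X, y = [], []
    for i in range(n_past - 1, len(series) - n_future):
        X.append(series[i - n_past + 1:i + 1])
        y.append(series[i + 1:i + 1 + n_future])
    return np.array(X), np.array(y)

X, y = make_windows(series, n_past=101, n_future=30)  # implementation 3
# implementation 1 roughly corresponds to n_future = 1; implementation 2 to n_past = 1, n_future = 30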
For all tests, I cut off the end of my dataset at an arbitrary point (the last ~10% of the total dataset), split everything before the cutoff into training/validation (80/20), and saved everything after the cutoff for testing and forecasting purposes.
As for network architectures, I've tried a bunch of different ones, from bidirectional LSTMs to multi-input CNN/GRU models to a WaveNet-like CNN, and all produce predictions that are bad in a similar enough way that I feel this is either a data-manipulation problem or a problem of me not understanding how model.predict() or my model's output works.
The architectures I will be using for each implementation below are:
1) a causal dilated CNN
2) a two-layer LSTM
Neural network architecture diagrams are here: https://imgur.com/a/cY2RWNG
For every example below, the model's weights were tuned by training on the training data (the first 80% of the dataset) while aiming for the lowest possible validation loss on the validation data (the last 20%).
--- First Implementation ---
(unfortunately, there's an image limit on Stack Overflow at my current reputation, so I've put each implementation into its own album)
Implementation 1 - Graphs for both CNN/LSTM: Images 1-7 https://imgur.com/a/36DZCIf
In every training/validation test graph, black represents the actual data and red the predicted data. In the forecasting plots, blue represents the predictions made and orange the actual close price on a longer time scale than the prediction, for better scale. All forecast predictions are 30 days into the future.
Using this methodology, and displaying the actual close price against the predicted close price in every instance:
Image 1 - Sliding window set-up for this implementation, using one and two features and a range of numbers for ease of viewing.
CNN:
(images 2 & 3 description in album)
Image 4 - Sliding window approach forecasting every feature in the data, with the prediction for the close price plotted against the actual close price. The timesteps start at the first row of the cutoff data.
When the first prediction is made, I append it to the end of this row and remove the first timestep, repeating for every future timestep I wish to predict.
I really don't even know what to say about this prediction; it's really bad...
LSTM:
(images 5 & 6 description in album)
Image 7 - Sliding window prediction: https://i.imgur.com/Ywf6xvr.png
This prediction seems to capture the trend somewhat, I guess, but the starting point is nowhere near the last known data point, which is confusing.
--- Second Implementation ---
Implementation 2 - Graphs for both CNN/LSTM: Images 1-7
https://imgur.com/a/3CAk1xc
For this attempt, I made the target prediction many timesteps into the future. With this implementation, the model takes in the current timestep (x) of features and attempts to predict the closing price at y+1, y+2, y+3, etc. There is only one prediction here: a sequence of timesteps into the future.
The same graphing and network conventions as in implementation 1 apply here too.
Image 1 - Set-up of input and target data, using a range and only one or two features for ease of viewing.
CNN:
(images 2 & 3 description in album)
Image 4 - Plotting all 30 predictions made from the first row of data features after the cutoff... this is horrible. Why does it again start nowhere near the last known data point? I don't understand how it can predict y+1 so far from the closing price of x, when in every instance of its training y+1 was almost certainly extremely close to x.
LSTM:
(images 5 & 6 description in album)
Image 7 - All 30 predictions into the future made from the first row of cutoff data. Again, the predictions are all over the place and start nowhere near the last actual data point; not sure what else to add.
It's starting to look like either my CNN implementation is poorly done or an LSTM is just the better choice here. Regardless, the predictions and actual forecasts are still terrible, so I'll withhold judgment on the network architecture until I get something that looks remotely like an actual forecast.
--- Third Implementation ---
Implementation 3 - Graphs for both CNN/LSTM: Images 1-7
https://imgur.com/a/clcKFF8
This was the final idea I had for forecasting the future, and it's essentially a combination of the first two. For this implementation, I take n past timesteps (x, x-1, x-2, x-3, etc.), similar to the first implementation, and set the target data to y+1, y+2, y+3, similar to the second. My goal was the same as in the second implementation: predict 30 days into the future, but instead of doing so from one timestep of features, do so from many timesteps into the past. I had hoped this would give the prediction enough supporting data to forecast the future accurately.
Image 1 - Input data ("x") and target data ("y") set-up. I use a range of numbers again. In this example, the input data has 2 features and includes the present timestep (x) plus 4 timesteps shifted backward (x-1, x-2, x-3, x-4); the target data has 5 timesteps into the future (y+1, y+2, y+3, y+4, y+5).
CNN:
(images 2 & 3 description in album)
Image 4 - 30 predictions into the future using 101 timesteps of x.
This is probably the worst result yet, despite the prediction having far more past timesteps of data to use.
LSTM:
(images 5 & 6 description in album)
Image 7 - 30 predictions on an input row of 101 timesteps.
This one actually has some movement to it, I guess, but it's all over the place, doesn't start near the last actual data point, and is clearly not accurate at all.
--- Closing Thoughts ---
I've also tried removing the target variable (close price) from the input data, but it doesn't seem to change much, and I'd think the past n days of closing data should be available to a model anyway.
Originally I MinMaxScaled all of my data on my pre-processing page and did not inverse_transform any of it; the results were basically just as bad as the examples above. For the examples above I min-max scaled the prediction, validation and test datasets separately to the range 0.2-0.8. For the actual forecasting predictions, I inverse_transformed the data before plotting it against the actual closing price, which was never transformed.
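For reference, the scaling pattern I believe is standard (fit the scaler on the training split only, reuse it on the other splits, and inverse_transform the predictions) looks roughly like this; I'm not certain my code follows it exactly, and train/valid/test are just placeholder arrays:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0.2, 0.8))
train_scaled = scaler.fit_transform(train)   # fit the scaler on training data only
valid_scaled = scaler.transform(valid)       # reuse the same scaling for validation
test_scaled = scaler.transform(test)         # ...and for the held-out test/forecast data

# predictions come back in scaled units; invert with the same scaler before plotting
# preds_unscaled = scaler.inverse_transform(preds_scaled)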
If I am doing something fundamentally wrong in the above examples I would love to know as I'm just starting out and this is my first real programming/machine learning project.
A few other things relating to this that I've come across / tried:
I've experimented briefly with a stateful model where I reset_states() after every prediction, with some moderate success.
I've read that sequence-to-sequence models can be useful for forecasting time series, but I'm really not sure what that setup is designed to do with time series despite reading into it quite a bit, and so I'm not sure how to implement or test it.
I tried a bidirectional LSTM because a random Stack Overflow post suggested it for time-series forecasting... the results were very mediocre, though, and from what I understand of how it works it doesn't seem to make much sense in this situation. I've only tried it with the first implementation above; let me know if it's something to look more into.
Any tips/criticism at all that you could provide would be greatly appreciated, I'm really not sure how to progress from here. Thanks in advance!
I have been through that. For me, the sliding window approach with an LSTM/NN worked like magic for small time series, but on a bigger time series, with data coming in on an hourly basis for a few years, it failed miserably.
Later on I ditched LSTMs and GBTs and started using algorithms from statsmodels.tsa, ARIMA and SARIMA most of the time; I'd suggest you read about them too. They are very easy to implement: no need to worry about sliding windows or shifting data a few timestamps back, it takes care of all that. Just train, tune the parameters, and predict the next timestamps.
Sometimes I also faced issues where my time series had missing timestamps and data, so I had to impute those values, or the frequency I trained on (hourly, weekly, monthly) differed from the frequency I wanted to predict at, so I had to bring the data into the right form too. I hit the different-frequency issue while visualising on a plot as well.
import statsmodels.api as sm
model = sm.tsa.SARIMAX(train_df, order=(1, 0, 1), seasonal_order=(1, 1, 0, 24))  # (p,d,q) and (P,D,Q,s)
results = model.fit()
Other than the data pre-processing part (imputing missing data, training at the right frequency, and some logic for parameter tuning), you will need just these few lines; your DataFrame should have a date-formatted index and the time-series data in its columns.
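For the missing-timestamp and frequency issues I mentioned, a rough pandas sketch (the 'sales' column name and hourly frequency are only assumptions), plus forecasting the next steps from the fitted results:
# reindex to a regular hourly frequency; gaps become NaN, then interpolate them
train_df = train_df.asfreq('H')
train_df['sales'] = train_df['sales'].interpolate()

# after results = model.fit(), forecast the next 24 hours
forecast = results.forecast(steps=24)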

Why does ARIMA fit properly but generate flat predictions?

Model Fits but the Predictions Fail
Using a (4,0,13) ARIMA model on the data shown in the first picture below yields flat predictions (also shown in the second picture below). I am not sure why the model can fit the data in the training set but then predict nothing afterwards. I found another question here which said I needed to add a seasonal component; I detail my experience with that below.
The Time Series (zoomed in)
The Predictions*
* The predictions plot shows all the training data as well as the validation data after the orange vertical line. The training fit is rounded to integers (real-valued observations are not possible in this dataset). Note the prediction is just flat and then dies.
Problem Definition
I have 15-minute interval data and want to apply a SARIMA model to it. It has a daily seasonality, defined from 7am to 9pm, which gives a seasonal period of 4 * 15 = 60 (4 fifteen-minute periods per hour * 15 hours). I first tested for stationarity with the Augmented Dickey-Fuller test. This passed, so I started analyzing the ACF and PACF to determine the SARIMA parameters.
Parameter Determination
(p,d,q)
ACF & PACF on Original Data
From this, I see there is no unit root (the ACF and PACF do not sum to 1), and that we need to difference the series since there is no sharp cut-off in the ACF.
ACF & PACF on Differenced Data
From this, I see it is slightly overdifferenced, so I may want to try no integrated term and add an AR term at 15 (the point where the ACF in the original plot enters the bands). I also add an MA term here.
(P,D,Q)s
I now look for the seasonal component. I do a seasonal difference of period 60 since that's where the spike is in the plots.
Seasonal difference
Seeing this, I should add 2 MA terms to the seasonal component (rules 13 and 7 from here), but the site also says not to use more than 1 seasonal MA term usually, so I leave it at 1.
Model
This leaves me with a SARIMA(0,1,1)(0,1,1,60) model. However, I run out of memory trying to fit this model (Python, using the statsmodels SARIMA function).
Question
Did I choose the parameters correctly? Is this data even fittable by ARIMA/SARIMA? And lastly, would the 60-period SARIMA actually work, so that I just need to find a way to run it on a bigger machine?
I guess the tl;dr question is: what am I doing wrong?
Feel free to go into detail. I want to become well informed about time series, so more information is better!
To select the best-fitting model, use the AIC/BIC criteria to find the model that scores best, testing different combinations of p and q (and P and Q); see the sketch below.
Further, as a rule of thumb the model normally satisfies: p + d + q + P + D + Q < 6.
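A minimal sketch of such an AIC-based search with statsmodels (the candidate ranges, the seasonal period of 60, and the name train are assumptions taken from the question):
import itertools
import statsmodels.api as sm

best_aic, best_order = float('inf'), None
for p, q, P, Q in itertools.product(range(2), range(2), range(2), range(2)):
    try:
        res = sm.tsa.SARIMAX(train, order=(p, 1, q),
                             seasonal_order=(P, 1, Q, 60)).fit(disp=False)
    except Exception:
        continue  # skip combinations that fail to fit or converge
    if res.aic < best_aic:
        best_aic, best_order = res.aic, (p, 1, q, P, 1, Q)
print(best_order, best_aic)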
BR,
A.
