Support vector regression time series forecasting - Python

I have a dataset of peak load for a year. It's a simple two-column dataset with the date and the load (kWh).
I want to train on the first 9 months and then let it predict the next three months. I can't get my head around how to implement SVR. I understand my 'y' would be the predicted value in kWh, but what about my X values?
Can anyone help?

Given a multi-variable regression, y = f(x_1, x_2, ..., x_n).
Regression is a multi-dimensional fit, which can be hard to visualize in one's head since it is not limited to 3D.
The better question might be: which inputs are consequential to the output value y?
Since you have the code for loadavg in the kernel source, you can use its input parameters.

For Python (I suppose it will be the same for R):
Collect the data in this way:
[x_{i-9}, x_{i-8}, ..., x_i] vs [x_{i+1}, x_{i+2}, x_{i+3}]
The first vector is your input vector; the second vector is your output vector (or value if you like). Use the fit method, for example: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR.fit
You can try scaling, removing outliers, applying weights and so on. Play :)
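
For concreteness, here is a minimal sketch of that windowing step with scikit-learn. The file name peak_load.csv, the window sizes, the train/test split, and the SVR hyperparameters are placeholder assumptions, not taken from the question:

import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

def make_windows(series, n_in=10, n_out=3):
    # Turn a 1-D series into (X, y) pairs: n_in past values -> n_out future values
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

# Hypothetical file: date in column 0, load (kWh) in column 1, one header row
load = np.loadtxt("peak_load.csv", delimiter=",", usecols=1, skiprows=1)
X, y = make_windows(load)

# SVR predicts a single output, so wrap it to predict all 3 future steps at once
split = int(len(X) * 0.75)  # roughly the first 9 months for training
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0))
model.fit(X[:split], y[:split])
predictions = model.predict(X[split:])

Scaling the load values before fitting (e.g. with StandardScaler) usually helps SVR, which ties in with the scaling advice above.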

Related

Top features of linear regression in python

So I had to create a linear regression in Python, but this dataset has over 800 columns. Is there any way to see which columns are contributing most to the linear regression model? Thank you.
Look at the coefficients for each of the features, ignoring their sign (a quick sketch follows the list below):
A large absolute value means the feature is heavily contributing.
A value close to zero means the feature is not contributing much.
A value of zero means the feature is not contributing at all.
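
A minimal sketch of reading the coefficients off a fitted model, assuming X is a DataFrame of features, y is the target, and the features are on comparable scales (otherwise standardise them first, since raw coefficients are not directly comparable):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)
# Rank features by the absolute value of their coefficient
importance = pd.Series(np.abs(model.coef_), index=X.columns).sort_values(ascending=False)
print(importance.head(20))  # the 20 most heavily contributing features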
You can measure the correlation between each independent variable and the dependent variable, for example:
corr(X1, Y), corr(X2, Y), ..., corr(Xn, Y)
and then test the model using the N most correlated variables.
There are more sophisticated methods to perform dimensionality reduction:
PCA (Principal Component Analysis)
(https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c)
Forward Feature Construction
Use XGBoost in order to measure feature importance for each variable and then select the N most important variables
(How to get feature importance in xgboost?)
There are many ways to perform this action and each one has pros and cons.
https://machinelearningmastery.com/dimensionality-reduction-algorithms-with-python/
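
As a small illustration of the XGBoost option mentioned above (a sketch, assuming X is a DataFrame and y is the target; the hyperparameter and the cutoff of 50 features are arbitrary):

import pandas as pd
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=200).fit(X, y)
# Rank columns by the importance XGBoost assigns to them
importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
top_features = importance.head(50).index  # keep the N most important columns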
If you are just looking for variables with a high correlation, I would do something like this:
import pandas as pd

cols = df.columns
for c in cols:
    # Set this threshold to whatever you would like
    if df['Y'].corr(df[c]) > .7:
        print(c, df['Y'].corr(df[c]))
After you have decided what threshold/columns you want, you can append c to a list.

Solution to a single feature logistic regression problem

So I'm having a hard time conceptualizing how to make a mathematical representation of my solution for a simple logistic regression problem. I understand what is happening conceptually and have implemented it, but I am answering a question which asks for a final solution.
Say I have a simple two-column dataset denoting something like the likelihood of getting a promotion per year worked, so the likelihood would increase as the person accumulates experience. Here X denotes the year and Y is a binary indicator for receiving a promotion:
X | Y
1 | 0
2 | 1
3 | 0
4 | 1
5 | 1
6 | 1
I implement logistic regression to find the probability per year worked of receiving a promotion, and get an output set of probabilities that seem correct.
I get an output weight vector that is two items, which makes sense as there are only two inputs: the number of years X, and, since I fix the intercept to handle bias, an added column of 1s. So one weight for years, one for bias.
So I have two questions about this.
Since it is easy to get an equation of the form y = mx + b as a decision boundary for something like linear regression or a PLA, how can I similarly denote a mathematical solution with the weights of the logistic regression model? Say I have a weights vector [0.9, -0.34]; how can I convert this into an equation?
Secondly, I am performing gradient descent, which returns a gradient, and I multiply that by my learning rate. Am I supposed to update the weights at every epoch? My gradient never returns zero in this case, so I am always updating.
Thank you for your time.
The logistic regression is trying to map the input value (x = years) to the output value (y = likelihood) through this relationship:
L(x) = 1 / (1 + exp(-(theta * x + b)))
where theta and b are the weights you are trying to find.
The decision boundary will then be defined as L(x) > p or L(x) < p, where L(x) is the right-hand side of the equation above. That is the relationship you want.
You can transform it into a more linear form, like that of linear regression, by rearranging the exponential and taking the log of both sides: log(L(x) / (1 - L(x))) = theta * x + b.
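
As a worked illustration with the weights [0.9, -0.34] from the question (assuming the first entry is the slope theta and the second is the bias b, which is an assumption about the ordering):

import numpy as np

theta, b = 0.9, -0.34

def probability(x):
    # P(promotion | x years worked) under the fitted logistic model
    return 1.0 / (1.0 + np.exp(-(theta * x + b)))

# Decision boundary at p = 0.5: theta*x + b = 0  =>  x = -b / theta
boundary = -b / theta
print(probability(np.arange(1, 7)))  # probabilities for years 1..6
print(boundary)                      # about 0.38 years

So with these example weights the model predicts a promotion (p > 0.5) for any x above roughly 0.38 years, which is the equation-style answer the question is after.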

Regression analysis for linear regression

I have a regression model where my target variable (days) is quantitative and ranges between 2 and 30. My RMSE is 2.5, and all the X variables are nominal (categorical), hence I have dummy-encoded them.
I want to know what would be a good value of RMSE? I want to get something within 1-1.5 or even less, but I am unaware of what I should do to achieve that.
Note: I have already tried feature selection and removing features with low importance.
Any ideas would be appreciated.
If your x values are categorical then it does not necessarily make much sense to bind them to a uniform grid. Who's to say categories A and B should be spaced the same distance apart as B and C? Assuming they are will only lead to an incorrect representation of your results.
Since the choice of scale is the unknown here, for visualisation you would be better off setting the uniform x grid to the day number and then seeing where the categories fall on the y scale under a linear relationship.
RMS Error doesn't come into it at all if you don't have quantitative data for x and y.

Training a neural network when the order of indices in the target label is meaningless, arbitrary

I am trying to train an NN to approximate some coordinates from some 2d data - a matrix for a conv net and a long vector for a traditional NN.
So the network takes the data as input, outputs 6 floats and tries to learn from labels, x1,y1, x2,y2, x3,y3 by minimizing mean squared error.
The issue is that the ordering of the points in the label is meaningless. x1,y1 could just as easily be x3,y3 or vice versa. Nothing in the input data distinguishes one point from any other besides its position.
But the network, by minimizing sum((approximation - target)^2)/6, assumes that order does matter, so that even if the network guessed, for example, .42,.23 when there was a point at .44,.22, the cost could be very high if it put that guess in indices 0 and 1 while .44,.22 sat in indices 2 and 3 of the target label. As is, the network cannot learn, because it has no way of knowing which point should go first, second, or third for any given training sample, even if it does learn the function that maps the input data to a set of coordinates.
Are there any elegant solutions to this problem?
I have tried artificially ordering the points, such that points are sorted by x value. (x1 is always the smallest and x3 is always the largest). This helps, but it is not ideal as the network then has to learn an additional function on top of finding the coordinates of these points.
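
For reference, a minimal sketch of that sort-by-x preprocessing, assuming the labels come as a numpy array of shape (N, 6) laid out as [x1, y1, x2, y2, x3, y3]:

import numpy as np

def sort_points_by_x(labels):
    # Reorder each sample's three (x, y) points by ascending x
    pts = labels.reshape(-1, 3, 2)            # (N, 3 points, 2 coords)
    order = np.argsort(pts[:, :, 0], axis=1)  # per-sample ordering of the x values
    pts_sorted = np.take_along_axis(pts, order[:, :, None], axis=1)
    return pts_sorted.reshape(-1, 6)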
Please don't tell me to just use other methods (I am), but I would like to explore how those other methods stack up against DNNs.
thank you!

Get the best features from an n x m matrix

I have an X matrix with 1000 features (columns) and 100 rows of float elements, and y, a target vector of two classes, 0 and 1; the dimension of y is (100, 1). I want to compute the 10 best features in this matrix which discriminate the 2 classes. I tried to use the chi-square test defined in scikit-learn, but X contains float elements.
Can you help me and tell me a function that I can use?
Thank you.
I am not sure what you mean by X is of float elements. Chi2 works for non-negative histogram data (i.e. l1 normalized). If your data doesn't satisfy this, you have to use another method.
There is a whole module of feature selection algorithms in scikit-learn. Have you read the docs? The simplest one would be using SelectKBest.
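
A minimal sketch with SelectKBest as suggested above. Since chi2 needs non-negative inputs, this uses the ANOVA F-test (f_classif), which accepts float features; it assumes X has shape (100, 1000) and y has shape (100, 1):

from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=10)
X_top10 = selector.fit_transform(X, y.ravel())  # reduced matrix, shape (100, 10)
top_idx = selector.get_support(indices=True)    # column indices of the 10 best features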
Recursive Feature Elimination (RFE) has been really effective for me. This method fits an estimator that assigns weights to all the features and removes the feature with the least weight; the step is applied repeatedly until we reach the desired number of features (in your case, 10).
http://scikit-learn.org/stable/modules/feature_selection.html#recursive-feature-elimination
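
A hedged sketch of RFE from the link above; the choice of LogisticRegression as the base estimator is an assumption, and any estimator exposing coef_ or feature_importances_ would work:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y.ravel())
top_idx = rfe.get_support(indices=True)  # indices of the 10 selected features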
As far as I know, if your data is correlated, L1 penalty selection might not be the best idea. Correct me if I'm wrong.
