I applied linear regression on some features to predict a target, with 10-fold cross-validation.
MinMax scaling was applied to both the features and the target.
Then the features were standardized.
When I run the model, r2 is 0.65 and MSE is 0.02.
But when I use the targets as they are, without MinMax scaling, I get the same r2 while the MSE increases a lot, to 18.
My question is: do we have to preprocess targets the same way we preprocess features? And which of the values above is correct, given that the MSE got much bigger without scaling the target?
Some people say we have to scale the targets too, while others say no.
Thanks in advance.
Whether you scale your target or not will change the 'meaning' of your error. For example, consider 2 different targets, one ranged [0, 100] and another ranged [0, 10000]. If you run models against them (with no scaling), an MSE of 20 would mean different things for the two models. In the former case it would be disastrous, while in the latter case it would be pretty decent.
So the fact that you get a lower MSE with the target ranged [0, 1] than with the original target is not surprising.
At the same time, the r2 value is independent of the range, since it is calculated from variances.
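As a quick check, here is a minimal sketch on synthetic data (not your dataset): rescaling the target rescales the MSE by the square of the factor, while r2 stays the same.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

for scale in (1.0, 100.0):  # same target, two different ranges
    y_scaled = y * scale
    preds = LinearRegression().fit(X, y_scaled).predict(X)
    print(f"scale={scale}: MSE={mean_squared_error(y_scaled, preds):.4f}, "
          f"r2={r2_score(y_scaled, preds):.4f}")
# MSE grows with the square of the scale; r2 is identical in both runs.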
Scaling allows you to compare model performance for different targets, among other things.
Also, for some model types (like NNs) scaling is more important, since gradient-based training behaves better when inputs and targets are on comparable scales.
Hope it helps!
For data that are one-dimensional or consist of a single column, calculating the error (RMSE) is simple. We can use a Python library, for instance:
from sklearn.metrics import mean_squared_error
RMSE = mean_squared_error(y_actual, y_predicted, squared=False)  # squared=False returns RMSE rather than MSE
Models can have multiple output columns: 2, 10, or even 100.
How do I calculate RMSE if the data has several columns?
For example:
import numpy as np

y_act = np.array([1.022, 0.94, 1.278, 2.096, 1.404,
                  2.035, 1.622, 2.348, 1.909, 1.678,
                  1.638, 1.742, 2.279, 1.878, 2.045])
y_actual = y_act.reshape((5, 3))
y_pred = np.array([1.021, 0.84, 1.111, 2.091, 1.314,
                   2.131, 1.622, 2.348, 1.888, 1.178,
                   1.238, 1.632, 2.119, 1.677, 2.145])
y_predicted = y_pred.reshape((5, 3))
RMSE(y_actual - y_predicted)?
How does the formula of the error change?
The formula remains the same. How you use it will depend on your use case.
In your case you have 5 samples with 3 outputs (columns), perhaps the results of running your data through three different models.
The difference from the 1D version is how you want to treat each output: in the 1D version you have just one value, here you have 3. You can do one of three things with them.
Leave them as they are: you get one MSE value for each of the 3 outputs.
Uniformly average them: take the average of the three MSE values to get a single MSE value.
Weighted average: take a weighted average of the three MSE values.
These options are available in sklearn via the 'multioutput' parameter of its metric functions.
Here is an example comparing manual computation with the package computation.
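This is a sketch using the arrays from the question; note that squared=False makes mean_squared_error return RMSE (newer sklearn versions expose this as a separate root_mean_squared_error function instead), and the weights in the last call are made up purely for illustration.

import numpy as np
from sklearn.metrics import mean_squared_error

y_actual = np.array([1.022, 0.94, 1.278, 2.096, 1.404,
                     2.035, 1.622, 2.348, 1.909, 1.678,
                     1.638, 1.742, 2.279, 1.878, 2.045]).reshape((5, 3))
y_predicted = np.array([1.021, 0.84, 1.111, 2.091, 1.314,
                        2.131, 1.622, 2.348, 1.888, 1.178,
                        1.238, 1.632, 2.119, 1.677, 2.145]).reshape((5, 3))

# Manual computation: one RMSE per output column.
manual = np.sqrt(np.mean((y_actual - y_predicted) ** 2, axis=0))

# Package computation of the three options discussed above.
per_output = mean_squared_error(y_actual, y_predicted,
                                multioutput='raw_values', squared=False)
uniform = mean_squared_error(y_actual, y_predicted,
                             multioutput='uniform_average', squared=False)
weighted = mean_squared_error(y_actual, y_predicted,
                              multioutput=[0.5, 0.3, 0.2], squared=False)

print(manual)    # matches per_output
print(uniform)   # equals per_output.mean()
print(weighted)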
I'm new to ML and would be grateful for any assistance. I ran a linear regression using training set A and test set A. I saved the linear regression model and would now like to use the same model to predict the test set A target using features from test set B. Each time I run the model, it throws the error below.
How can I successfully predict on a test data set whose features and target have different shapes?
Input
print(testB.shape)
print(testA.shape)
Output
(2480, 5)
(1315, 6)
Input
saved_model = joblib.load(filename)
testB_result = saved_model.score(testB_features, testA_target)
print(testB_result)
Output
ValueError: Found input variables with inconsistent numbers of samples: [1315, 2480]
Thanks again
The shapes are inconsistent, which is why the error is being thrown. Have you tried reshaping the data so that the two sets have the same shape? From a quick look, testB has more samples and one less column than testA.
Think about it: if you have trained your model with 5 features, you cannot then ask the same model to make a prediction given 6. You speak of using a linear regressor; the equation is roughly:
y = b + w0*x0 + w1*x1 + w2*x2 + ... + w(N-1)*x(N-1)
Where {
y is your output/label
N is the number of features
b is the bias term
w(i) is the ith weight
x(i) is the ith feature value
}
You have trained a linear regressor with 5 features, effectively producing the following
y (your output/label) = b + w0*x0 + w1*x1 + w2*x2 + w3*x3 + w4*x4
You then ask it to make a prediction given 6 features but it only knows how to deal with 5.
Aside from that issue, the sample counts also don't match: testB has 2480 samples and testA has 1315. These need to match, as the model will make 2480 predictions but you only give it 1315 target values to compare against. How can it compute a score for the 1165 samples with no matching target? Do you see now why the data has to be reshaped?
EDIT
Assuming you have datasets with an equal number of features as discussed above, you may now look at reshaping (removing data from) testB like so:
testB = testB[:1315, :]
testB.shape
(1315, 5)
Or, if you would prefer a solution using the numpy API:
testB = np.delete(testB, np.s_[0:(len(testB)-len(testA))], axis=0)
testB.shape
(1315, 5)
Keep in mind that doing this slices out a number of samples. If those samples matter (which they can), it may be better to introduce a pre-processing step to handle the missing values, namely imputing them. It is also worth noting that the data you are trimming should be shuffled first (unless it already is), as you may otherwise be removing a systematic part of the data the model should be evaluated on. Neglecting to do this could result in a model that may not generalise as well as you hoped.
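As a rough sketch of that shuffling idea (assuming testB and testA are NumPy arrays, as above):

import numpy as np

rng = np.random.default_rng(42)              # fixed seed so the trim is reproducible
testB_shuffled = rng.permutation(testB)      # permutes rows (the first axis) only
testB_trimmed = testB_shuffled[:len(testA)]  # keep as many rows as testA has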
I'm fairly new to deep learning and Keras, and this problem has bothered me for weeks. Hope I can get some hints from here.
Features:
I simulated two variables, each with 10k samples drawn from a standard normal distribution: A ~ Norm(0, 1); B ~ Norm(0, 1).
Labels
And I derived two labels from the simulated variables: y1 = A * B; y2 = A / B.
Model
Input dimension: 2
Hidden layers: 4 dense layers, each 32 neurons wide
Output layers: a dense layer with 1 neuron
Activation functions: ReLU for all the activation functions
Compilation: 'MSE' as the loss function, 'Adam' as the optimizer with a learning rate of 1e-05
Tasks
Finally, I set up three tasks for MLP to learn:
(1) Use A, B to predict y1;
(2) Use A, B to predict y2;
(3) Use A, 1/B to predict y2
Validation
Use 'validation_split = 0.2' to validate the model
Results and Inference
The model easily reaches an MSE below 1 on both the training and validation sets after 10~15 epochs in task 1. However, I always get a very high training loss, 30k+, for the other two tasks.
[update] I also evaluated the results with the Pearson correlation coefficient, which returned ~0.7 for task 1 and <0.01 for tasks 2 and 3.
This is weird to me, since multiplication (y1) and division (y2) seem mathematically equivalent. I then looked into the distribution of 1/B and found that it has extremely long tails on each side. I suppose this might be the source of the difficulty, but I couldn't figure out any strategy for it. I also tried normalizing 1/B before training, but had no luck with it.
Any advice or comment is welcome. I can't find discussion of this on the web or in books, and I really want to make some progress. Thank you.
y2 has a very different distribution from y1; specifically, it contains values with much larger absolute values. This means that comparing the losses directly isn't really fair.
It's kinda like estimating the mass of a person vs. estimating the mass of a planet, and being upset that you're off by millions of pounds.
For an illustration, try calculating the loss on all three problems, but with an estimator that only ever guesses 0.0. I suspect that problem 1 will have much lower loss than the other two.
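A quick sketch of that check, reconstructing the simulation described in the question:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(10_000)
B = rng.standard_normal(10_000)
y1, y2 = A * B, A / B

# MSE of an estimator that always guesses 0.0 is just the mean of y**2.
print("task 1 baseline MSE:", np.mean(y1 ** 2))  # close to 1
print("task 2 baseline MSE:", np.mean(y2 ** 2))  # enormous: 1/B has heavy tails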
When I use count:poisson instead of rmse, I see nloglikelihood values. Now I am not sure how to compare those numbers with rmse or mae.
Lower is definitely better, but I don't get the intuition for actual error size that rmse or mae gives.
For example -> train-poisson-nloglik:2.01885 val-poisson-nloglik:2.02898
Can we say here that the predicted and actual values differ by an error of 2.02?
Can someone explain with a small example?
Thanks.
There is a good post on the computation of the value here
Just to be more exhaustive, the value is:
mean(log(factorial(label)) + preds - label*log(preds))
where the log-factorial term is the log(y!) from the Poisson likelihood.
If you compare with the true formula of the negative log-likelihood, it should be a sum instead of a mean. I guess they chose to take the mean so that train and test values are more comparable.
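As a small worked example (the labels and predicted rates are made up; scipy's gammaln(y + 1) computes log(y!)):

import numpy as np
from scipy.special import gammaln  # gammaln(y + 1) == log(y!)

labels = np.array([0.0, 1.0, 3.0, 2.0])  # made-up observed counts
preds = np.array([0.5, 1.2, 2.8, 1.9])   # made-up predicted Poisson rates

nloglik = np.mean(gammaln(labels + 1) + preds - labels * np.log(preds))
print(nloglik)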
Finally, to answer the question: the likelihood is the probability that the data came from the distribution with a specific parameter. In the Poisson model, the parameters are just the set of predictions. So the better your predictions, the greater the probability, and the smaller the associated negative log-likelihood.
rmse and mae are based on the expected difference between the prediction and the truth, whereas the negative log-likelihood looks at a probability.
Let's say my ground truth (output) is a number in the range [0, 100].
Is it possible to learn to predict a number that minimizes the delta from the original number (ground truth), given the input?
The objective types of Keras are here http://keras.io/objectives/
I think your best bet would be to use mean squared error (loss='mse'), which penalizes predictions based on the square of their difference from the ground truth. You would also want to use linear activations (the default) for the last layer.
If you're especially concerned about keeping the predictions within the range [0, 100], you could create a modified objective function that penalizes predictions outside [0, 100] even more than quadratically, but that's probably not necessary and you could instead just clip the predictions using np.clip(predictions, 0, 100).
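A minimal sketch of that setup (the layer size and input dimension are made up; written against the classic Keras Sequential API that the linked page documents):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=10))  # 10 input features, made up
model.add(Dense(1))                                    # linear activation by default
model.compile(loss='mse', optimizer='adam')

# model.fit(X_train, y_train, ...)                      # train as usual
# predictions = np.clip(model.predict(X_test), 0, 100)  # keep within [0, 100]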