Intuition behind nloglikelihood value in xgboost poisson run - python

When I use count:poisson instead of rmse I see nloglikelihood values, and I am not sure how to compare those numbers with rmse or mae.
The lower the value the better, certainly, but I don't get the intuitive sense of the error magnitude that rmse or mae gives.
For example: train-poisson-nloglik:2.01885, val-poisson-nloglik:2.02898
Here, can we say the actual values differ by an error of 2.02?
Can someone explain with a small example?
Thanks.

There is a good post on the computation of the value here
Just to be more exhaustive, the value is:
mean(log(factorial(label)) + preds - label*log(preds))
If you compare with the exact formula of the negative log-likelihood, it should be the sum instead of the mean. I guess they chose to take the mean so that the train and test values are more comparable.
Finally, to answer the question: the likelihood is the probability that the data came from the distribution with a specific parameter. In the Poisson model, the parameters are just the set of predictions. So the better your predictions, the greater this probability and the smaller the associated negative log-likelihood.
rmse and mae are based on the expected difference between the prediction and the truth, whereas the negative log-likelihood looks at a probability.
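A minimal sketch of that computation (the helper name poisson_nloglik is just illustrative, and whether it matches xgboost's reported value to the last digit depends on xgboost's exact implementation), using gammaln(label + 1) for the log-factorial term:

import numpy as np
from scipy.special import gammaln  # gammaln(y + 1) == log(y!)

def poisson_nloglik(labels, preds):
    # Mean Poisson negative log-likelihood: -log P(y | lambda) = lambda - y*log(lambda) + log(y!)
    return np.mean(preds - labels * np.log(preds) + gammaln(labels + 1))

labels = np.array([0, 1, 3, 2, 5])
good_preds = np.array([0.2, 1.1, 2.8, 2.2, 4.7])  # close to the labels
bad_preds = np.array([4.0, 0.1, 0.5, 6.0, 1.0])   # far from the labels

print(poisson_nloglik(labels, good_preds))  # smaller value -> better fit
print(poisson_nloglik(labels, bad_preds))   # larger value  -> worse fit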

Related

RMSE for Multidimensional Data

For data that are one-dimensional, i.e. consist of a single column, calculating the error (RMSE) is simple. We can use a Python library, for instance:
from sklearn.metrics import mean_squared_error
# squared=False returns the RMSE; squared=True (the default) would return the MSE
RMSE = mean_squared_error(y_actual, y_predicted, squared=False)
Models can have multiple output columns: 2, 10, or even 100.
How do you calculate RMSE when the data has several columns?
For example:
import numpy as np

y_act = np.array([1.022, 0.94, 1.278, 2.096, 1.404,
                  2.035, 1.622, 2.348, 1.909, 1.678,
                  1.638, 1.742, 2.279, 1.878, 2.045])
y_actual = y_act.reshape((5, 3))
y_pred = np.array([1.021, 0.84, 1.111, 2.091, 1.314,
                   2.131, 1.622, 2.348, 1.888, 1.178,
                   1.238, 1.632, 2.119, 1.677, 2.145])
y_predicted = y_pred.reshape((5, 3))
RMSE(y_actual - y_predicted)?
How does the formula of the error change?
The formula remains the same. How you use it will depend on your use case.
In your case you have 5 samples with 3 outputs (columns), for example because you ran the same data through 3 different models and collected one column of predictions from each.
The difference from the 1D version is how you want to treat each output: in the 1D version you get a single error value, whereas here you get 3. You can do three things with them:
Leave them as they are: you get one MSE value per output, i.e. 3 values.
Uniformly average them: take the mean of the 3 MSE values to get a single value.
Weighted average: take a weighted average of the 3 MSE values.
These options are available in sklearn's mean_squared_error through the multioutput parameter ('raw_values', 'uniform_average', or an array of weights).
Here is an example comparing manual computation with the package computation:
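A small sketch using the arrays from the question (multioutput is a parameter of sklearn.metrics.mean_squared_error; squared=False asks for the root of the error and is replaced by root_mean_squared_error in newer scikit-learn versions):

import numpy as np
from sklearn.metrics import mean_squared_error

y_actual = np.array([1.022, 0.94, 1.278, 2.096, 1.404,
                     2.035, 1.622, 2.348, 1.909, 1.678,
                     1.638, 1.742, 2.279, 1.878, 2.045]).reshape((5, 3))
y_predicted = np.array([1.021, 0.84, 1.111, 2.091, 1.314,
                        2.131, 1.622, 2.348, 1.888, 1.178,
                        1.238, 1.632, 2.119, 1.677, 2.145]).reshape((5, 3))

# Manual computation: one RMSE per output column
manual_per_column = np.sqrt(((y_actual - y_predicted) ** 2).mean(axis=0))

# Package computation: the three options described above
per_column = mean_squared_error(y_actual, y_predicted, multioutput='raw_values', squared=False)
averaged = mean_squared_error(y_actual, y_predicted, multioutput='uniform_average', squared=False)
weighted = mean_squared_error(y_actual, y_predicted, multioutput=[0.5, 0.3, 0.2], squared=False)

print(manual_per_column)  # matches per_column
print(per_column, averaged, weighted)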

How to find the model precision of Faster_rcnn_inception_v2?

Please help. I ran eval.py from the TensorFlow object detection model to find the precision, and I got the data below.
Can someone explain to me whether these results are OK or not, and what I can do? I am new to these subjects.
(screenshot of the eval.py output)
To explain the concepts in detail:
For object detection, average precision is considered instead of plain precision.
AP (Average Precision) is a popular metric for measuring the accuracy of object detectors.
It computes the average of the precision values over recall values from 0 to 1.
So what you need to do in your case is create a DataFrame with the objects, the actual values, and the predicted values; a prediction counts as TRUE (a true positive) if its IoU value is >= 0.5.
IoU measures the overlap between two boundaries. We use it to measure how much our predicted boundary overlaps with the ground truth (the real object boundary). In some datasets, we predefine an IoU threshold (say 0.5) for classifying whether the prediction is a true positive or a false positive.
Then you can calculate the average precision accordingly.
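As a small illustration of the IoU check (the box coordinates and helper function are made up for the example; boxes are given as (x_min, y_min, x_max, y_max)):

def iou(box_a, box_b):
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = area_a + area_b - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

ground_truth = (50, 50, 150, 150)
prediction = (60, 60, 160, 160)

value = iou(ground_truth, prediction)  # about 0.68 here
print("TRUE (true positive)" if value >= 0.5 else "FALSE (false positive)")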
Hope this answer helps you, Happy Learning!

Likelihood Ratio Test and pseudo Rsquared differ significantly (Logistic Regression)

I apply a logistic regression and I would like to test the statistical significance of my overall model.
Now, McFadden's pseudo-Rsquared, Rsquared = 1 - lnL(c)/lnL(null), gives a measure of how much the model improves on the null model (often loosely read as the share of variation explained) - where lnL(c) denotes the maximized log-likelihood of the fitted model and lnL(null) the corresponding value for the null model (intercept only, no covariates).
The likelihood-ratio test statistic is LR = 2 * (lnL(c) - lnL(null)), which follows a chi-squared distribution and can be tested for significance according to the model's degrees of freedom.
Anyway, the chi-squared test gives a p-value that is highly significant, but the pseudo-Rsquared is only around 0.021.
Why do the Rsquared and the overall p-value differ so much?
Using an accuracy calculation on some test data, metrics.accuracy_score(y_test, y_pred), I see that the accuracy on the test data is only around 55% (on the training data it is around 60%).
Can someone help me to interpret my results?
Maybe there is a correlation which is significant, but the impact is still small: since you are doing classification, you could check whether examples with this variable (=1 in the binary case) have a slightly higher or lower probability of belonging to class 1 than those without it (=0 in the binary case). For example:
examples with the variable equal to 1 have a 50% chance of belonging to class 1, while examples with the variable equal to 0 have a 48% chance.
If there are many examples with that variable, the effect can still be significant (low p-value), but the variable alone will hardly predict the right class (it explains little of the variation - low pseudo-Rsquared).
This might be the reference which could help you understand this graphically for another problem: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
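If it helps, here is a rough simulation sketch (assuming statsmodels is installed; the 50%/48% probabilities mirror the example above) showing a significant likelihood-ratio p-value next to a tiny McFadden pseudo-Rsquared:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000

x = rng.integers(0, 2, size=n)      # binary covariate
p = np.where(x == 1, 0.50, 0.48)    # shifts the class-1 probability only slightly
y = rng.binomial(1, p)

model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(model.llr_pvalue)   # likelihood-ratio test p-value (tends to be tiny with this many samples)
print(model.prsquared)    # McFadden pseudo R-squared (stays close to zero)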

Wasserstein loss can be negative?

I'm currently training a WGAN in keras with (approx) Wasserstein loss as below:
from keras import backend as K

def wasserstein_loss(y_true, y_pred):
    return K.mean(y_true * y_pred)
However, this loss can obviously be negative, which is weird to me.
I trained the WGAN for 200 epochs and got the critic Wasserstein loss training curve below.
The above loss is calculated by
d_loss_valid = critic.train_on_batch(real, np.ones((batch_size, 1)))
d_loss_fake = critic.train_on_batch(fake, -np.ones((batch_size, 1)))
d_loss, _ = 0.5*np.add(d_loss_valid, d_loss_fake)
The resulting generated sample quality is great, so I think I trained the WGAN correctly. However, I still cannot understand why the Wasserstein loss can be negative while the model still works. According to the original WGAN paper, the Wasserstein loss can be used as a performance indicator for the GAN, so how should we interpret it? Am I misunderstanding anything?
The Wasserstein loss is a measure of the Earth Mover's distance, which is a distance between two probability distributions. In TensorFlow it is typically implemented as d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real), which can obviously give a negative number if d_fake moves too far to the other side of the d_real distribution. You can see it in your plot, where during training the real and fake distributions change sides until they converge around zero. So as a performance measurement you can use it to see how far the generator is from the real data, and on which side it currently is.
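A small numeric sketch (the critic scores are made up) of why the reported number can take either sign, and how the Keras trick from the question relates to the TensorFlow-style expression:

import numpy as np

d_real = np.array([0.9, 1.2, 0.7, 1.0])   # critic scores on a real batch
d_fake = np.array([1.5, 1.8, 1.1, 1.6])   # critic scores on a fake batch

# TensorFlow-style critic loss from the answer above
tf_style = d_fake.mean() - d_real.mean()

# Keras trick from the question: mean(y_true * y_pred) with labels +1 (real) and -1 (fake),
# then averaging the two batch losses with the factor 0.5
keras_style = 0.5 * (np.mean(1 * d_real) + np.mean(-1 * d_fake))

print(tf_style)     # positive here because fakes score higher; flips sign when the sides swap
print(keras_style)  # equal to -0.5 * tf_style, so it can also be negative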
See the distributions plot (not reproduced here). P.S. the loss in that plot is crossentropy, not Wasserstein.
Perhaps this article can help you more, if you have not read it yet. However, the other question is how the optimizer can minimize a negative loss (to zero).
It looks like I cannot comment on the answer given by Sergeiy Isakov because I do not have enough reputation. I wanted to comment because I think that information is not correct.
In principle, the Wasserstein distance cannot be negative, because a distance metric cannot be negative. The actual expression (dual form) for the Wasserstein distance involves a supremum over all 1-Lipschitz functions. Since it is a supremum, we always take the Lipschitz function that gives the largest value in order to obtain the Wasserstein distance. However, the value we compute with a WGAN is only an estimate, not the real Wasserstein distance. If the number of inner critic iterations is low, the critic may not have enough iterations to move that estimate to a positive value.
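For reference, the dual (Kantorovich-Rubinstein) form mentioned above can be written as:

W(P_r, P_g) = sup over all 1-Lipschitz f of E_{x ~ P_r}[f(x)] - E_{x ~ P_g}[f(x)]

and the WGAN critic is only one candidate f, not the supremum itself, which is why its value can drop below zero.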
Thought experiment: if we obtain a Wasserstein estimate that is negative, we can always negate the critic function to make the estimate positive. That means there exists a Lipschitz function that gives a positive value larger than the one that gave the negative value. So the true Wasserstein distance cannot be negative, since by definition we take the supremum over all 1-Lipschitz functions.

Random Forest Regression MAE

I'm trying to create a random forest regression model to predict a continuous variable with the following characteristics (after log-transformation):
count 2868.000000
mean 4.566630
std 1.233496
min -0.356675
25% 3.745966
50% 4.604670
75% 5.452325
max 8.141889
The best result I get in terms of negative mean absolute error is -0.64. How should I evaluate this result? Is it good/decent/bad?
Thank you very much.
Here is a link to the docs.
The simple answer to your question is: yes, an NMAE of -0.64 can be a good result.
NMAE is just the negated MAE: you can remove the minus sign and read it as a normal MAE. A large MAE (a strongly negative NMAE) is a bad result, while a small MAE (an NMAE close to zero) is a good one - depending, of course, on how much error you can tolerate.
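As a small sketch of where the minus sign comes from (scikit-learn scorers follow a "greater is better" convention, so MAE is reported as its negative; the toy data below is only for illustration):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Toy data just to show the sign convention; your own features/target go here
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         scoring='neg_mean_absolute_error', cv=5)

nmae = scores.mean()   # negative by convention
mae = -nmae            # drop the sign to read it as a plain MAE
print(nmae, mae)

One way to put your 0.64 in context is to compare it with the spread of the (log-transformed) target above: its standard deviation is about 1.23 and its interquartile range about 1.7.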
