How to find the model precision of Faster_rcnn_inception_v2? - python

Please help. I ran eval.py of the TensorFlow detection model.
I want to find the precision, and this is the data I got.
Can someone explain to me whether it is OK or not and what I can do, please?
I am new to these subjects.
[screenshot of the eval.py output omitted]

To explain the concepts in detail:
For object detection, Average Precision is used instead of plain precision.
AP (Average Precision) is a popular metric for measuring the accuracy of object detectors.
It is the average of the precision values over recall values from 0 to 1, i.e. the area under the precision-recall curve.
So what you need to do in your case is to create a DataFrame with the objects, the actual values and the predicted values, where a prediction counts as TRUE (a true positive) if its IoU value is >= 0.5.
IoU measures the overlap between two boundaries. We use it to measure how much our predicted boundary overlaps with the ground truth (the real object boundary). In some datasets, we predefine an IoU threshold (say 0.5) for classifying whether a prediction is a true positive or a false positive.
Then you can calculate the average precision accordingly.
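As a rough illustration, here is a minimal numpy sketch of that calculation; the TP/FP flags (coming from the IoU >= 0.5 rule) and the ground-truth count are made up:

import numpy as np

# Hypothetical detections, sorted by confidence, flagged True if their IoU
# with a ground-truth box is >= 0.5 (true positive) and False otherwise.
is_tp = np.array([True, True, False, True, False, False, True])
n_ground_truth = 5          # total ground-truth objects across the images

tp_cum = np.cumsum(is_tp)
fp_cum = np.cumsum(~is_tp)
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / n_ground_truth

# AP as the area under the precision-recall curve (plain rectangular rule;
# VOC/COCO use interpolated variants of this).
ap = np.sum((recall - np.concatenate(([0.0], recall[:-1]))) * precision)
print(f"AP @ IoU>=0.5: {ap:.3f}")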
Hope this answer helps you, Happy Learning!

Related

Does log_softmax combine with cross entropy loss learn to decrease the denominator and increase the numerator?

I've trained a dual encoder. I intend to use a metric-based log softmax with cross entropy as the loss function. The formula is in the image below:
My idea is to decrease the distance between the anchor and the centroid (prototype) of the positive class, and to increase the distance between the anchor and the other, negative classes.
Note that the negative sign is necessary since I want a high probability for close-by vectors and a low probability for distant vectors.
But 70% of the classes in the data have just one instance, so the distance between the anchor and its own prototype is 0. I still hope this formula pushes the anchor away from the negative classes.
Am I right?
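To make the setup concrete, here is a minimal numpy sketch of a distance-based log softmax of this kind (the squared Euclidean distance, embedding size and prototypes are all assumptions, since the image with the exact formula is not reproduced here):

import numpy as np

def proto_loss(anchor, prototypes, positive_idx):
    # Cross entropy over a softmax of *negative* squared distances to the
    # class prototypes: nearby prototypes get high probability, distant ones low.
    dists = np.sum((prototypes - anchor) ** 2, axis=1)
    logits = -dists
    logits = logits - logits.max()                        # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log softmax
    return -log_probs[positive_idx]

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 8))   # 5 classes, 8-dim embeddings (made up)
anchor = prototypes[2] + 0.1 * rng.normal(size=8)
print(proto_loss(anchor, prototypes, positive_idx=2))
# Even when the positive distance is ~0 (singleton classes), this loss still
# decreases as the negative prototypes move further away from the anchor.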

Likelihood Ratio Test and pseudo Rsquared differ significantly (Logistic Regression)

I apply a logistic regression and I would like to test the statistical significance of my overall model.
Now, the pseudo R-squared (McFadden), Rsquared = 1 - L(c)/L(null), is interpreted as the share of variance explained by the model, where L(c) denotes the maximized log-likelihood of the fitted model and L(null) the corresponding value for the null model (no covariates, only an intercept).
The likelihood-ratio test statistic is LR = 2 * (L(c) - L(null)), which follows a chi-squared distribution and can be tested for significance according to the model's degrees of freedom.
Anyway, I use the chi-squared statistic to calculate a p-value, which is highly significant, but the pseudo R-squared is only around 0.021.
Why do the R-squared and the overall p-value differ so much?
Using an accuracy calculation on some test data, metrics.accuracy_score(y_test, y_pred), I see that the accuracy on the test data is only around 55% (for the training data it is around 60%).
Can someone help me to interpret my results?
Maybe there is a correlation which is significant but whose impact is still small. Since you are doing classification, you could check whether examples with this variable equal to 1 (in the binary case) have a slightly higher or lower probability of belonging to class 1 than those with the variable equal to 0. For example:
examples with the variable being 1 have a 50% chance of belonging to class 1, while examples with the variable being 0 have a 48% chance of belonging to class 1.
If there are lots of examples with that variable, the effect might still be significant (p-value), but it will hardly predict the right class on its own (explain the variance - R-squared).
This reference might help you understand this graphically for another problem: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
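As a hypothetical sketch of that check (the data and effect sizes are made up), you can group the data by the binary variable, compare the class-1 rates per group, and put the chi-squared test next to it:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up data: one binary covariate x and the binary outcome y.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=10_000)
# class-1 probability 0.48 when x == 0 and 0.50 when x == 1 (a tiny effect)
y = (rng.random(10_000) < np.where(x == 1, 0.50, 0.48)).astype(int)

df = pd.DataFrame({"x": x, "y": y})
print(df.groupby("x")["y"].mean())          # class-1 rate per group

chi2, p, _, _ = chi2_contingency(pd.crosstab(df["x"], df["y"]))
print(f"chi2={chi2:.2f}, p={p:.3f}")
# With enough examples the p-value can become small even though rates of
# 0.48 vs 0.50 barely help to predict the class on their own.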

Negative reward in reinforcement learning

I can't wrap my head around this question: how exactly do negative rewards help the machine avoid them?
The question originates from Google's solution for the game Pong. By their logic, once a game is finished (the agent won or lost a point), the environment returns a reward (+1 or -1). Any intermediate state returns 0 as reward. That means each win/loss produces a reward array of either [0,0,0,...,0,1] or [0,0,0,...,0,-1]. Then they discount and standardize the rewards:
#rwd - array with rewards (ex. [0,0,0,0,0,0,1]), args.gamma is 0.99
prwd = discount_rewards(rwd, args.gamma)
prwd -= np.mean(prwd)
prwd /= np.std(prwd)
discount_rewards is supposed to be some kind of standard function; an implementation can be found here. The result for a win (+1) could be something like this:
[-1.487 , -0.999, -0.507, -0.010, 0.492, 0.999, 1.512]
For a loss (-1):
[1.487 , 0.999, 0.507, 0.010, -0.492, -0.999, -1.512]
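For reference, the numbers above come out of the usual backwards discounted running sum followed by the standardization from the snippet; a minimal sketch (my own version, not the implementation linked as "here"):

import numpy as np

def discount_rewards(rewards, gamma):
    # Discounted running sum, accumulated backwards from the final reward.
    discounted = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted

rwd = [0, 0, 0, 0, 0, 0, 1]
prwd = discount_rewards(rwd, 0.99)
prwd -= np.mean(prwd)
prwd /= np.std(prwd)
print(np.round(prwd, 3))   # matches the win example above (approx. [-1.487, -0.999, -0.507, -0.010, 0.492, 0.999, 1.512])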
As a result, each move gets rewarded. Their loss function looks like this:
loss = tf.reduce_sum(processed_rewards * cross_entropies + move_cost)
Please help me answer the following questions:
The cross-entropy function can produce output from 0 to +inf. Right?
The TensorFlow optimizer minimizes the loss by absolute value (it doesn't care about the sign; the perfect loss is always 0). Right?
If statement 2 is correct, then a loss of 7.234 is equally as bad as -7.234. Right?
If everything above is correct, then how does a negative reward tell the machine that it was bad, and a positive one that it was good?
I also read this answer, but I still didn't manage to grasp exactly why negative is worse than positive. It makes more sense to me to have something like:
loss = tf.reduce_sum(tf.pow(cross_entropies, reward))
But that experiment didn't go well.
The cross-entropy function can produce output from 0 to +inf. Right?
Yes, but only because we multiply it by -1. Think of the natural sign of log(p): since p is a probability (i.e. between 0 and 1), log(p) ranges over (-inf, 0], so -log(p) ranges over [0, +inf).
The TensorFlow optimizer minimizes the loss by absolute value (it doesn't care about the sign; the perfect loss is always 0). Right?
Nope, the sign matters. It sums up all losses with their signs intact.
If statement 2 is correct, then a loss of 7.234 is equally as bad as -7.234. Right?
See below, a loss of 7.234 is much better than a loss of -7.234 in terms of increasing the reward. The overall positive loss indicates our agent is making a series of good decisions.
If everything above is correct, then how does a negative reward tell the machine that it was bad, and a positive one that it was good?
Normalizing Rewards to Generate Returns in reinforcement learning makes a very good point that the signed rewards are there to control the size of the gradient. The positive / negative rewards perform a "balancing" act for the gradient size. This is because a huge gradient from a large loss would cause a large change to the weights. Thus if your agent makes as many mistakes as it does proper moves, the overall update for that batch should not be large.
"Tensorflow optimizer minimize loss by absolute value (doesn't care about sign, perfect loss is always 0). Right?"
Wrong. Minimizing the loss means trying to achieve as small a value as possible. That is, -100 is "better" than 0. Accordingly, -7.2 is better than 7.2. Thus, a value of 0 really carries no special significance, besides the fact that many loss functions are set up such that 0 determines the "optimal" value. However, these loss functions are usually set up to be non-negative, so the question of positive vs. negative values doesn't arise. Examples are cross entropy, squared error etc.
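As a toy numpy check of the sign argument (the probabilities are made up; this is not the Pong code), look at how the per-move loss reward * cross_entropy changes with the probability of the action that was actually taken:

import numpy as np

def move_loss(p_taken_action, reward):
    cross_entropy = -np.log(p_taken_action)   # in [0, +inf)
    return reward * cross_entropy             # the quantity being minimized

for reward in (+1.0, -1.0):
    print(reward, round(move_loss(0.5, reward), 3), round(move_loss(0.9, reward), 3))
# reward +1: loss goes 0.693 -> 0.105 as p grows, so minimizing pushes p up
#            (the move is reinforced)
# reward -1: loss goes -0.693 -> -0.105 as p grows, so minimizing pushes p down
#            (the move is discouraged), and the loss can become arbitrarily negative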

Intuition behind nloglikelihood value in xgboost poisson run

When I use count:poisson instead of rmse I see nloglikelihood values, and I am not sure how to compare those numbers with RMSE or MAE.
Definitely the lower the value the better, but I don't get the intuitive sense of actual error that RMSE or MAE gives.
For example -> train-poisson-nloglik:2.01885 val-poisson-nloglik:2.02898
Can we say here that the actual values differ by an error of 2.02?
Can someone explain with a small example?
Thanks.
There is a good post on the computation of the value here
Just to be more exhaustive, the value is:
mean(log(factorial(label)) + preds - label*log(preds))
If you compare with the true formula of the negative log-likelihood, it should be a sum instead of a mean. I guess they chose to take the mean so that the train and test values are more comparable.
Finally, to answer the question: the likelihood is the probability that the data came from the distribution with a specific parameter. In the Poisson model, the parameters are just the set of predictions. So the better your predictions, the greater the probability and the smaller the associated negative log-likelihood.
RMSE and MAE are based on the expected difference between prediction and truth, whereas the negative log-likelihood looks at a probability.
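As a small illustration (made-up counts and predictions; log(label!) is computed via lgamma(label + 1), matching the formula above), you can recompute the metric yourself and compare it with what xgboost reports:

import numpy as np
from scipy.special import gammaln

def poisson_nloglik(labels, preds, eps=1e-16):
    preds = np.maximum(preds, eps)
    # mean of: log(label!) + pred - label * log(pred)
    return np.mean(gammaln(labels + 1) + preds - labels * np.log(preds))

labels = np.array([0.0, 1.0, 3.0, 2.0, 5.0])   # made-up observed counts
preds = np.array([0.4, 1.2, 2.5, 2.0, 4.7])    # made-up predicted Poisson means
print(poisson_nloglik(labels, preds))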

sigmoid output for detection class returns incorrect performance

Summary of my problem: I have a detection task (binary classification, unbalanced problem). I use a sigmoid to classify samples. The reported f-score, precision and recall seem to consider both classes; e.g. the true positives seem to be the total number of correctly classified samples, not the number of samples belonging to class '1' that are correctly classified.
Longer explanation: In my experiment I have demographic data about persons and I have to predict whether they bought a product or not. I used PCA to reduce the initial features to just 4 features, and the data is stored in a csv file (the first column has the class labels, '0' and '1'). Note that most people didn't buy, so the two classes are very unbalanced. I use the CSVDataset class to read it:
dataset: &train !obj:pylearn2.datasets.csv_dataset.CSVDataset {
    path: 'input.csv',
    task: 'classification'
}
I want to start with a simple classification model and I use the f-score as the performance measure. Therefore, my first idea was to use an MLP model with a single sigmoid layer (the default 'detection' monitor provides recall, precision and f-score):
model: !obj:pylearn2.models.mlp.MLP {
    layers: [
        !obj:pylearn2.models.mlp.Sigmoid {
            layer_name: 'y',
            dim: 2,
            irange: .005
        }
    ],
    nvis: 4,
}
My initial idea was to set dim to 1 (the decision rule would be: if the output is > 0.5 choose class '1', if < 0.5 choose class '0'). However, I got the error ValueError: Can't convert to VectorSpace of dim 1. Expected either dim=2 (merged one-hots) or 2 (concatenated one-hots), so I decided to set dim to 2 (the decision rule would be: if out1 > out0 choose '1', otherwise choose '0').
In my train.yaml I follow more or less the softmax example notebook provided in the documentation. For example, I use BGD algorithm and set the batch_size as the total number of examples in the training set (74164 examples, a small dataset!) just to avoid confusion when checking the performance manually.
The model was trained with the train.py script that is provided and everything seemed fine, until I had a look at the results. As mentioned earlier, it is a detection problem where the class to detect ('1') happens very rarely. Therefore, I was very surprised to see high values for the reported train_y_f1 (best result is approx. 94%, after one epoch).
To check this, I computed the f-score manually using the provided predict_csv.py script and then loading the predictions. I saw that in fact there were only misses (every '1' was classified as '0'), so precision, recall and f-score should all be zero. Why does the detection monitor report higher values?
After some investigation, I found that the MLP has an output for each class, and I verified (I computed it manually and got the same numbers) that the true positives and false positives defined in get_detection_channels_from_state() actually refer to both classes, '1' and '0'; e.g. the true positives are the number of vectors belonging to '1' classified as '1' plus the number of vectors belonging to '0' classified as '0'. So the MLP is classifying everything as '0', and since nearly all vectors belong to '0', the performance looks good. This is a known problem for unbalanced detection problems, where the correct classification rate is not a suitable measure; it is the reason why we have measures such as the f-score or AUC. However, if tp and fp in get_detection_channels_from_state() consider both classes, then the reported f-score is not useful (not to me, at least).
I can imagine that this is known to the designer of the Sigmoid class, so I can only assume that I am doing something wrong. Hopefully somebody can give me a hint :)
Note: I have submitted this question to the pylearn2 user mailing list. If I get an answer I will copy it here...
The pylearn2 monitor calculates the f1 score, % misclass, etc. for each batch, not for the entire epoch. When it generates the report, the f1 score is the mean of the f1 scores over all the batches in the epoch. Reporting the mean over all batches works just fine for quantities like misclass:
misclass[n] is the score for the nth batch
misclass_epoch = mean(misclass[0], misclass[1], ..., misclass[n])
However, you can't make the same statement for the f1 score:
f1_epoch != mean(f1[0], f1[1], ..., f1[n])
where f1[n] = 2 * precision[n] * recall[n] / (precision[n] + recall[n])
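A quick numeric illustration of that with sklearn (made-up labels, two batches of four):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 0, 0,  1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0,  0, 1, 1, 0])

# per-batch F1 (two batches of 4), then the mean a per-batch monitor would report
batch_f1 = [f1_score(y_true[i:i+4], y_pred[i:i+4]) for i in (0, 4)]
print(np.mean(batch_f1))          # 0.75  (mean of per-batch F1 scores)
print(f1_score(y_true, y_pred))   # ~0.667 (F1 over the pooled epoch)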
For demonstration purposes, try setting the batch size to be the size of the data set (you can get away with this in the mnist example). The f1 score will then be correct.
So the best advice is to keep an eye on quantities in the monitor like misclass, where the mean over the batches is the same as the value for the epoch. Once you've trained the NN, you can make predictions for your entire validation set and calculate the f1 score at that point.
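For that last step, a hypothetical sklearn sketch (the labels and the two sigmoid outputs per example are made up; in practice you would load them from your csv and from the predict_csv.py output):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical arrays: y_val holds the true 0/1 labels of the validation set,
# scores holds the two sigmoid outputs per example.
y_val = np.array([0, 0, 1, 0, 1, 0, 0, 1])
scores = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.6, 0.4],
                   [0.4, 0.6], [0.8, 0.2], [0.3, 0.7], [0.55, 0.45]])

y_pred = (scores[:, 1] > scores[:, 0]).astype(int)   # out1 > out0 -> class '1'
precision, recall, f1, _ = precision_recall_fscore_support(
    y_val, y_pred, average="binary", pos_label=1)
print(precision, recall, f1)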
