Using torchmetrics with nan values - python

I'm working on a DL project using PyTorch Lightning and torchmetrics.
I'm using a metric that is simply irrelevant for some examples, which means there are batches for which this metric comes out as NaN.
The problem is that the epoch-wise aggregated value of the metric is then also NaN.
Is there a workaround? torchmetrics is very convenient and I would like to avoid switching away from it.
I have seen the torchmetrics.MeanMetric object (and similar ones), but I couldn't make it work. If the solution goes through this kind of object, I would very much appreciate an example.
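To illustrate the kind of solution I'm hoping for, here is a minimal sketch with MeanMetric, assuming a torchmetrics version recent enough to support the nan_strategy argument (worth checking for your install):

    import torch
    from torchmetrics import MeanMetric

    # nan_strategy='ignore' drops NaN values from the running aggregate
    metric = MeanMetric(nan_strategy='ignore')

    # simulate per-batch metric values, one of which is NaN
    for batch_value in [0.8, float('nan'), 0.6]:
        metric.update(torch.tensor(batch_value))

    print(metric.compute())  # tensor(0.7000) -- the NaN batch is excluded

Inside a LightningModule, the idea would be to update such a MeanMetric with the per-batch value of the real metric and log it as usual, so batches where the value is NaN are simply left out of the epoch-wise mean.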


MNLogit fit and summary display all nan

I am new to the ML world.
I'm trying to do logistic regression with statsmodels. However, when I fit the model I get current function value: nan.
I tried checking whether the dataframe is finite, since I saw that might be the cause, but it turned out to be fine.
I referred to the links below, but they did not work in my case.
Update: I still have not figured it out.
Referred links:
MNLogit in statsmodel returning nan
numpy: Invalid value encountered in true_divide
Can someone please help me with this?
(Screenshots in the original post: the finite-values check, the error with current function value as nan, and the all-NaN summary.)
After reviewing your problem and the solution identified in referred link #1 (FYI, it produces the same error as shown in your screen captures), it seems you need to use a different solving method.
In your code, you can try the following to apply the same solution as link #1:

    result = logit_model.fit(method='bfgs')
I'm sorry if you already tried this. Unfortunately, I cannot test it without the dataset; let me know if that works.
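For completeness, here is what that could look like end-to-end; the data below is made up purely to make the snippet runnable, since I don't have yours:

    import numpy as np
    import statsmodels.api as sm

    # stand-in data just for illustration; use your own X and y
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 3)))
    y = rng.integers(0, 3, size=200)  # three outcome classes

    logit_model = sm.MNLogit(y, X)
    # 'bfgs' often converges in cases where the default solver returns nan
    result = logit_model.fit(method='bfgs', maxiter=200)
    print(result.summary())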

If I'm trying to predict a label for a sample, but the sample is missing features, how should I deal with it?

I'm having a conceptual issue right now; I know that sklearn does not like it when .predict() is used on examples with NaN values, but what should I do if I want to predict a label for an example with NaN/missing features?
Currently, I'm replacing the NaN cells with -999 as a placeholder measure, but I'm not sure that's a good idea. Unfortunately, searching for information about missing values in prediction samples doesn't yield helpful results.
One approach you could try is to fill in the missing values in your test example with the same values you used to fill in missing values in your training dataset. For example, if you fill in missing values for a feature with the mean of the training data, you could use that same mean to fill in the missing value in your test example.
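For example, with pandas (the column names here are made up):

    import pandas as pd

    train = pd.DataFrame({'age': [25, 30, None, 40], 'income': [50, None, 70, 80]})
    test = pd.DataFrame({'age': [None, 35], 'income': [60, None]})

    # compute the fill values from the training data only...
    train_means = train.mean()

    # ...then apply them to both splits, so no test statistics leak in
    train_filled = train.fillna(train_means)
    test_filled = test.fillna(train_means)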
Machine learning models perform better when your data is complete, so it is advisable to impute missing values with a summary statistic or with the same information as a closely located data point (using kNN, for instance).
Scikit-learn contains a suite of algorithms to impute missing values. The most common method is to use SimpleImputer with a "mean" strategy.
You can also take a simpler approach and use pandas to either fill all NAs in your dataset with fillna() or remove them with dropna().
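A minimal sketch of both options (toy data, made up for illustration):

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    X_train = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
    X_test = np.array([[np.nan, 5.0]])

    # scikit-learn: fit the imputer on the training data, reuse it for test data
    imputer = SimpleImputer(strategy='mean')
    X_train_imputed = imputer.fit_transform(X_train)
    X_test_imputed = imputer.transform(X_test)

    # pandas alternatives
    df = pd.DataFrame(X_train, columns=['a', 'b'])
    df_filled = df.fillna(df.mean())  # fill NAs with column means
    df_dropped = df.dropna()          # or drop rows containing NAs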
It is important that you familiarize yourself with the data you are working with. Sometimes missing data has a meaning of its own. For instance, when working with income data, some very affluent people have refused to disclose their income, whereas people with low income would always disclose it. In this case, if the income of the former group were replaced with 0 or the mean, the results of the prediction could be invalid.
Have a look at this step-by-step guide on how to handle missing data in Python.

How does pairwise comparison training work in XGBoost XGBRanker?

I'm interested in learning to rank with pairwise comparison. While working on this I found that XGBoost has a model called XGBRanker which works very well.
I want to find out how XGBRanker manages the training data to get such low memory usage and such good results (it uses LambdaMART, I believe). I imagine it must use some kind of lookup table for the features, and perhaps create the pairs iteratively, or avoid using all possible pairs of differently-labelled items within one group.
I tried looking through the source code but everything keeps referring to some other XGBoost method and I haven't been able to understand it so far.
I would like to create a similar method to train NNs for pairwise comparison but handling the training data has been a huge hurdle so far.
So, more generally, my question would be: how are the pairs created in pairwise ranking algorithms (RankNet, LambdaRank and so on)? Are all pairs used? Only a percentage? Is there some other way of doing this? If you're working with >100,000 items, you would easily get into the range of hundreds of millions of pairs.
I hope someone has some information about this or knows who might.
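To make the question concrete, here is the naive scheme I have in mind; this is purely illustrative and certainly not XGBoost's actual implementation:

    import itertools
    import random

    def make_pairs(labels, group_ids, max_pairs_per_group=None, seed=0):
        """Toy pair construction for pairwise ranking.

        Pairs are only formed within a group (query) and only between items
        with different labels; sampling caps the quadratic blow-up.
        """
        rng = random.Random(seed)
        groups = {}
        for idx, gid in enumerate(group_ids):
            groups.setdefault(gid, []).append(idx)

        pairs = []
        for indices in groups.values():
            candidates = [(i, j) for i, j in itertools.combinations(indices, 2)
                          if labels[i] != labels[j]]
            if max_pairs_per_group is not None and len(candidates) > max_pairs_per_group:
                candidates = rng.sample(candidates, max_pairs_per_group)
            pairs.extend(candidates)
        return pairs

    # two queries, items labelled by relevance
    labels    = [2, 1, 0, 0, 1, 2]
    group_ids = [0, 0, 0, 1, 1, 1]
    print(make_pairs(labels, group_ids, max_pairs_per_group=2))

My guess is that pairwise rankers avoid ever materializing such a pair list and instead enumerate (or sample) pairs per group on the fly inside the gradient computation, but I'd appreciate confirmation and details.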

Can the NGBoost algorithm handle missing values automatically?

I came across a new GBDT algorithm named NGBoost, created by stanfordmlgroup. I want to use it, and ran

    pip install ngboost==0.2.0

to install it.
I then trained on a dataset whose missing values I had neither imputed nor deleted.
However, I get an error:

    Input contains NaN, infinity or a value too large for dtype('float32').

Does this mean NGBoost cannot handle missing values automatically the way xgboost does?
There are two possibilities with this error.
1- You have some really large values. Check the max of your columns.
2- The algorithm doesn't support NaN and inf values, so you have to handle them as you would with some other regression models.
Here's a response from one of the ngboost creators about that:
Hey #omsuchak, thanks for the suggestion. There is no one "natural" or good way to generically handle missing data. If ngboost were to do this for you, we would be making a number of choices behind the scenes that would be obscured from the user.
If we limited ourselves to use cases where the base learner is a regression tree (like we do with the feature importances) there are some reasonable default choices for what to do with missing data. Implementing those strategies here is probably not crazy hard to do but it's also not a trivial task. Either way, I'd want the user to have a transparent choice about what is going on. I'd be open to review pull requests on that front as they satisfy that requirement, but it's not something I plan on working on myself in the foreseeable future. I'll close for now but if anyone wants to try to add this please feel free to comment.
You can then look at other answers about how to solve this, for example with the sklearn.impute.MissingIndicator module (to indicate the presence of missing values to the model) or with one of the imputer classes.
If you need a practical example, you can try the survival example (located in the repo!).
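As a rough illustration of that combination (random toy data; NGBRegressor from the ngboost package, imputation via scikit-learn):

    import numpy as np
    from ngboost import NGBRegressor
    from sklearn.impute import MissingIndicator, SimpleImputer

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of the entries
    y = rng.normal(size=100)

    # impute the missing values and append indicator columns so the model
    # still "sees" where data was missing
    imputer = SimpleImputer(strategy='median')
    indicator = MissingIndicator(features='all')
    X_full = np.hstack([imputer.fit_transform(X), indicator.fit_transform(X)])

    ngb = NGBRegressor().fit(X_full, y)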

Impute multiple missing values in a feature-vector

Edited post
This is a short and somewhat clarified version of the original post.
We've got a training dataset (some features are significantly correlated). The feature space has 20 dimensions (all continuous).
We need to train a nonparametric imputer (kNN or tree-based regression) on the training data; most features form nonlinear subspaces, and we can't assume a distribution for any of them.
We need to predict multiple missing values in query data (a query feature-vector can have up to 13 missing features, so the imputer should handle any combination of missing features) using the trained imputer. NOTE: the imputer should not be retrained/fitted on the query data in any way (as is done in all the mainstream R packages I've found so far: Amelia, impute, mi and mice...). That is, the imputation should be based solely on the training data.
The purpose for all this is described below.
A small data sample is down below.
Original post (TL;DR)
Simply put, I've got some sophisticated data imputation to do. We've got a training dataset of ~100k 20D samples and a smaller testing dataset. Each feature/dimension is a continuous variable, but the scales differ. There are two distinct classes. Both datasets are very NA-inflated (NAs are not equally distributed across dimensions). I use sklearn.ensemble.ExtraTreesClassifier for classification and, although tree ensembles can handle cases with missing data, there are three reasons to perform imputation:
This way we get votes from all trees in a forest when classifying a query dataset (not just from the trees that don't rely on a missing feature).
We don't lose data during training.
The scikit-learn implementations of tree ensembles (both ExtraTrees and RandomForest) do not handle missing values. But this point is not that important; if it weren't for the former two, I would have just used rpy2 + some nice R implementation.
Things are quite simple with the training dataset because I can apply a class-specific median imputation strategy to deal with missing values (sketched below), and this approach has been working fine so far. Obviously this approach can't be applied to a query - we don't have the classes to begin with. Since we know that the classes will likely have significantly different shares in the query, we can't apply a class-indifferent approach, because that might introduce bias and reduce classification performance; therefore we need to impute missing values from a model.
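For reference, the class-specific median imputation on the training set is essentially a one-liner in pandas (the file path is hypothetical; the columns match the sample data further below):

    import pandas as pd

    train = pd.read_csv('train.csv')  # hypothetical path; columns v1..v5 + 'category'

    # fill each feature's NAs with the median of that feature within the same class
    feature_cols = [c for c in train.columns if c != 'category']
    train[feature_cols] = (train.groupby('category')[feature_cols]
                                .transform(lambda col: col.fillna(col.median())))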
Linear models are not an option for several reasons:
all features are correlated to some extent;
theoretically we can get all possible combinations of missing features in a sample feature-vector; even though our tool requires at least 7 non-missing features, we would end up with ~1e6 possible models, which doesn't look very elegant if you ask me.
Tree-based regression models aren't good for the very same reason. So we ended up picking kNN (k nearest neighbours) - ball tree or LSH with a radius threshold, to be more specific. This approach fits the task quite well, because the dimensions (ergo the distances) are correlated, hence we get nice performance in extremely NA-rich cases, but there are several drawbacks:
I haven't found a single implementation in Python (including impute, sklearn.preprocessing.Imputer, orange) that handles feature-vectors with different sets of missing values; that is, we want a single imputer for all possible combinations of missing features.
kNN uses pairwise point distances for prediction/imputation. As I've already mentioned, our variables have different scales, hence the feature space must be normalised prior to distance estimation, and we need to know the theoretical max/min values for each dimension to scale it properly. This is not so much a problem as a matter of architectural simplicity (a user will have to provide a vector of min/max values); see the sketch after this list.
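For what it's worth, the scaling step itself is trivial once the bounds are provided (a sketch with made-up bounds):

    import numpy as np

    def scale_with_bounds(X, mins, maxs):
        """Min-max scale each dimension with user-supplied theoretical bounds.
        NaNs pass straight through, so missing entries stay missing."""
        mins = np.asarray(mins, dtype=float)
        maxs = np.asarray(maxs, dtype=float)
        return (X - mins) / (maxs - mins)

    X = np.array([[5.0, np.nan], [7.5, 300.0]])
    print(scale_with_bounds(X, mins=[0, 0], maxs=[10, 1000]))
    # [[0.5   nan]
    #  [0.75  0.3]]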
So here is what I would like to hear from you:
Are there any classic ways to address the kNN-related issues given in the list above? I believe this must be a common case, yet I haven't found anything specific on the web.
Is there a better way to impute data in our case? What would you recommend? Please provide implementations in Python (R and C/C++ are considered as well).
Data
Here is a small sample of the training data set. I reduced the number of features to make it more readable. The query data has identical structure, except for the obvious absence of category information.
v1 v2 v3 v4 v5 category
0.40524 0.71542 NA 0.81033 0.8209 1
0.78421 0.76378 0.84324 0.58814 0.9348 2
0.30055 NA 0.84324 NA 0.60003 1
0.34754 0.25277 0.18861 0.28937 0.41394 1
NA 0.71542 0.10333 0.41448 0.07377 1
0.40019 0.02634 0.20924 NA 0.85404 2
0.56404 0.5481 0.51284 0.39956 0.95957 2
0.07758 0.40959 0.33802 0.27802 0.35396 1
0.91219 0.89865 0.84324 0.81033 0.99243 1
0.91219 NA NA 0.81033 0.95988 2
0.5463 0.89865 0.84324 0.81033 NA 2
0.00963 0.06737 0.03719 0.08979 0.57746 2
0.59875 0.89865 0.84324 0.50834 0.98906 1
0.72092 NA 0.49118 0.58814 0.77973 2
0.06389 NA 0.22424 0.08979 0.7556 2
Based on the new update, I think I would recommend against kNN or tree-based algorithms here. Since imputation is the goal, and not a by-product of the method you're choosing, you need an algorithm that learns to complete incomplete data.
To me this seems very well suited to a denoising autoencoder. If you're familiar with neural networks, it's the same basic principle: instead of training to predict labels, you train the model to predict its input data, with a notable twist.
The 'denoising' part refers to an intermediate step where you randomly set some percentage of the input data to 0 before attempting to reconstruct it. This forces the algorithm to learn richer features and to complete the data when pieces are missing. In your case I would recommend a low amount of dropout during training (since your data is already missing features) and no dropout at test time.
It would be difficult to write a helpful example without looking at your data first, but the basics of what an autoencoder does (as well as a complete code implementation) are covered here: http://deeplearning.net/tutorial/dA.html
That link uses a Python module called Theano, which I would HIGHLY recommend for the job. Its flexibility trumps that of every other module I've looked at for machine learning, and I've looked at a lot. It's not the easiest thing to learn, but if you're going to be doing a lot of this kind of work, I'd say it's worth the effort. If you don't want to go through all that, you can still implement a denoising autoencoder in Python without it.
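To give a rough feel for the moving parts anyway, here is a minimal sketch in PyTorch rather than Theano (the layer sizes, the corruption rate, and the random stand-in data are all arbitrary choices):

    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        """20 inputs to match the 20D feature space; hidden size is arbitrary."""
        def __init__(self, n_features=20, n_hidden=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
            self.decoder = nn.Linear(n_hidden, n_features)

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = DenoisingAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.rand(64, 20)                      # stand-in for one training batch
    mask = (torch.rand_like(x) > 0.1).float()   # randomly zero ~10% of the inputs
    corrupted = x * mask

    optimizer.zero_grad()
    loss = loss_fn(model(corrupted), x)         # reconstruct the clean input
    loss.backward()
    optimizer.step()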
