How to write a scikit-learn estimator with different predict results - python

I'm trying to wrap a new method called "GEMSEC: Embedding with Self Clustering" in a model class that conforms to scikit-learn's conventions.
I read about the predict function here, and it seems that predict must return an array of shape [n_samples,] or [n_samples, n_outputs].
The model I'm implementing does two different things, learning embeddings (representations) and clustering, and I don't know what my predict function should return to fit the predict conventions defined by scikit-learn.
Thanks in advance
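One convention that fits this split: follow the clusterer API, so predict returns the cluster label per sample (shape [n_samples,]) and populates labels_, while the learned embedding is exposed through transform, as dimensionality reducers do. Below is a minimal sketch along those lines; the _train_gemsec helper and its random placeholder body are hypothetical stand-ins for the actual GEMSEC training step.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin, TransformerMixin

class GEMSECEstimator(BaseEstimator, ClusterMixin, TransformerMixin):
    """Sketch: predict follows the clusterer convention (one label per
    sample); the embeddings go through transform instead."""

    def __init__(self, n_clusters=8, dim=16):
        self.n_clusters = n_clusters
        self.dim = dim

    def fit(self, X, y=None):
        # _train_gemsec is a hypothetical stand-in for the real training loop
        self.embedding_, self.labels_ = self._train_gemsec(X)
        return self

    def predict(self, X):
        # clusterer convention: an array of shape [n_samples,] of labels
        return self.labels_

    def transform(self, X):
        # embedding convention: an array of shape [n_samples, dim]
        return self.embedding_

    def _train_gemsec(self, X):
        # placeholder producing embeddings/labels of the right shapes only
        rng = np.random.default_rng(0)
        n = len(X)
        return rng.normal(size=(n, self.dim)), rng.integers(0, self.n_clusters, n)
```

With this split, fit_predict(X) gives the cluster assignments and fit(X).transform(X) gives the embeddings, so both outputs stay within scikit-learn's shape conventions.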

Related

How to evaluate Gaussian Process Latent Variable Model

I am following a tutorial on the Gaussian Process Latent Variable Model; here is the link: https://pyro.ai/examples/gplvm.html
It is a dimensionality-reduction method.
Now I want to evaluate the model and compute the accuracy and a confusion matrix. Is it possible to do so?
I think I have found my answer: I have to train a model that takes the transformed (dimension-reduced) data as input, and after training that model I can evaluate it.
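As a sketch of that evaluation step, you can train any classifier on the GPLVM output and score it with scikit-learn's metrics; the random X_reduced and y arrays below are only stand-ins for your real dimension-reduced data and known labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# stand-ins for the GPLVM output: replace with your dimension-reduced data
rng = np.random.default_rng(0)
X_reduced = rng.normal(size=(150, 2))  # e.g. 2-D latent coordinates
y = rng.integers(0, 3, size=150)       # known class labels

X_tr, X_te, y_tr, y_te = train_test_split(X_reduced, y, test_size=0.25,
                                          random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred))
print(confusion_matrix(y_te, pred))
```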

How to solve the problem of imbalanced data using fit_generator [duplicate]

I am trying to use Keras to fit a CNN model to classify images. The data set has many more images from certain classes, so it's unbalanced.
I read different things on how to weight the loss to account for this in Keras, e.g.:
https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras, which is nicely explained. But it's always explained for the fit() function, not for fit_generator().
Indeed, the fit_generator() function doesn't seem to have the 'class_weights' parameter; instead it has 'weighted_metrics', whose description I don't understand: "weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing."
How can I go from 'class_weights' to 'weighted_metrics'? Would anyone have a simple example?
We do have class_weight in fit_generator (Keras v2.2.2). According to the docs:
class_weight: Optional dictionary mapping class indices (integers) to
a weight (float) value, used for weighting the loss function (during
training only). This can be useful to tell the model to "pay more
attention" to samples from an under-represented class.
Assume you have two classes [positive and negative]. Since class_weight expects a dictionary mapping class indices to weights, you can pass it to fit_generator with:
model.fit_generator(gen, class_weight={0: 0.7, 1: 1.3})
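If you don't want to hand-pick the weights, a common approach is to derive them from the label frequencies, for example with scikit-learn's compute_class_weight. A sketch; y_train below is a toy stand-in for your real training labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 0, 1, 1])  # toy imbalanced labels
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes,
                               y=y_train)
class_weight = dict(zip(classes, weights))  # here: {0: 0.75, 1: 1.5}

# then: model.fit_generator(train_gen, class_weight=class_weight, ...)
```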

With scikit-learn, how to use predict_proba in fit_predict?

I am implementing custom estimators with the scikit-learn library and its Pipeline, BaseEstimator, TransformerMixin and other base classes. (You can check the API here.)
Given a pipeline, you can call pipeline.fit(X) then pipeline.predict(X), or you can use pipeline.fit_predict(X), which is a bit faster because it applies the necessary transformations once instead of twice (once for the fit and once for the predict). So it is an optimization for when you want to predict on the same dataset you used to fit.
But some models, like classifiers or clusterers, have a method called predict_proba that returns the probability of each class or cluster assignment.
From the scikit glossary (link):
fit_predict
Used especially for unsupervised, transductive estimators, this fits
the model and returns the predictions (similar to predict) on the
training data. In clusterers, these predictions are also stored in the
labels_ attribute, and the output of .fit_predict(X) is usually
equivalent to .fit(X).predict(X). The parameters to fit_predict
are the same as those to fit.
predict_proba
A method in classifiers and clusterers that are able to return
probability estimates for each class/cluster. Its input is usually
only some observed data, X.
If the estimator was not already fitted, calling this method should
raise an exceptions.NotFittedError.
Output conventions are like those for decision_function except in the
binary classification case, where one column is output for each class
(while decision_function outputs a 1d array). For binary and
multiclass predictions, each row should add to 1.
Like other methods, predict_proba should only be present when the
estimator can make probabilistic predictions (see duck typing). This
means that the presence of the method may depend on estimator
parameters (e.g. in linear_model.SGDClassifier) or training data
(e.g. in model_selection.GridSearchCV) and may only appear after
fitting.
I am looking for a way to get a fit_predict_proba method which has the same advantages as fit_predict but returns probabilities.
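One way to get that (a sketch, not part of the scikit-learn API; the fit_predict_proba helper below is hypothetical) is to replicate what Pipeline.fit_predict does internally: fit_transform each intermediate step once, then fit the final estimator on the transformed data and call its predict_proba on that same data:

```python
from sklearn.pipeline import Pipeline

def fit_predict_proba(pipeline, X, y=None):
    """Hypothetical helper: fit a Pipeline and return predict_proba on the
    training data, applying the intermediate transformations only once.
    Assumes plain transformer steps (no 'passthrough', no memory caching)."""
    Xt = X
    for _, step in pipeline.steps[:-1]:
        Xt = step.fit_transform(Xt, y)   # transform once, reuse the result
    final_estimator = pipeline.steps[-1][1]
    final_estimator.fit(Xt, y)
    return final_estimator.predict_proba(Xt)
```

Usage then mirrors fit_predict: proba = fit_predict_proba(pipeline, X) instead of pipeline.fit(X).predict_proba(X).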

How to make a user-defined callable weight function for sklearn KNN?

I am trying to make custom weights for the sklearn KNN classifier, similar to here.
The documentation just briefly mentions that you can set custom weights as a user-defined function which accepts an array of distances and returns an array of the same shape containing the weights (here).
How can I make such a function for squared-distance or linear weights?
I went through countless pages of SO, but without any luck.
Is there a walkthrough or a correct example?
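Here is a sketch of both variants. The callable receives the distance matrix of shape (n_queries, n_neighbors) and must return an array of the same shape; the zero-distance guards are my own additions to avoid division by zero on exact duplicates:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

def inverse_squared(dist):
    # weight = 1 / d^2, with zero distances clamped so exact duplicates
    # dominate without producing inf
    dist = np.where(dist == 0, 1e-12, dist)
    return 1.0 / dist ** 2

def linear(dist):
    # weights fall off linearly with distance within each query row;
    # the small floor keeps the farthest neighbor from getting exactly 0
    max_d = dist.max(axis=1, keepdims=True) + 1e-12
    return np.maximum(1.0 - dist / max_d, 1e-6)

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5, weights=inverse_squared).fit(X, y)
print(knn.score(X, y))
```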

neural nets - How can I associate a confidence with my loss function?

I am trying to do OCC (one-class classification) using an autoencoder-based neural network.
To make a long story short, I train my neural network with 200 matrices, each containing 128 data elements. Those are then compressed (see autoencoder).
Once the training is done, I pass a new matrix (test data) to my neural net, and based on the loss function I know whether the data I passed belongs to the target class or not.
I would like to know how I can compute a classification confidence in % based on the loss I obtain when passing test data.
Thanks
In case it helps, I am using TensorFlow
Well, actually you normally try to minimize your cost function (or, in the case of one training observation, your loss function). The probability of the class you want to predict is normally not obtained from the loss function but from, for example, a sigmoid output layer: you need a function that goes from 0 to 1 and behaves like a probability. Where did you get the idea of using the loss function to evaluate your probability? But I am not an expert in one-class classification (or outlier detection)... I guess you actually want the probability of your observation not belonging to your class, right?
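If all you have is a reconstruction loss, one heuristic (a sketch, not a calibrated probability) is to compare the test loss to the distribution of losses on held-out in-class validation data and squash the result into [0, 1]; the function and the toy arrays below are hypothetical:

```python
import numpy as np

def confidence_from_loss(test_loss, val_losses):
    """Map a reconstruction loss to a pseudo-confidence in [0, 1] by
    z-scoring it against in-class validation losses. Low loss (typical
    of the target class) maps to a value near 1."""
    mu, sigma = np.mean(val_losses), np.std(val_losses)
    z = (test_loss - mu) / (sigma + 1e-12)
    return 1.0 / (1.0 + np.exp(z))  # logistic squashing

val_losses = np.array([0.10, 0.12, 0.09, 0.11, 0.13])  # toy values
print(confidence_from_loss(0.11, val_losses))  # ~0.5: typical in-class loss
print(confidence_from_loss(0.50, val_losses))  # ~0: likely out-of-class
```

This gives a monotone score you can report as a percentage, but as noted above, it is not a true class probability unless you calibrate it.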
