Multi-target regression and GridSearchCV - python

Let's say I have the following pipeline: GridSearchCV(MultiOutputRegressor(Regressor)).
I am training a model on multiple targets using MultiOutputRegressor.
How does GridSearchCV operate when it comes to optimizing hyperparameters?
Does it find the optimal hyperparameters for each target individually, or the set that is best on average across all targets?
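For context, a minimal sketch of that setup (the regressor, parameter grid, and data below are placeholders, not from the question): GridSearchCV scores the wrapped MultiOutputRegressor as a single estimator, so it selects one parameter combination shared by every per-target regressor, chosen by the aggregate cross-validation score (with the default scoring, the R² uniformly averaged over the targets), not a separate optimum per target.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, Y = make_regression(n_samples=200, n_features=10, n_targets=3, random_state=0)

search = GridSearchCV(
    MultiOutputRegressor(Ridge()),
    # parameters of the inner regressor are addressed via the estimator__ prefix
    param_grid={"estimator__alpha": [0.1, 1.0, 10.0]},
    cv=5,  # default scoring = MultiOutputRegressor.score, i.e. R² averaged over targets
)
search.fit(X, Y)
print(search.best_params_)  # one alpha shared by all three per-target regressors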

Related

How to access the predictions made in scikit-learn's GridSearchCV function?

I'm using scikit-learn's GridSearchCV to implement hyperparameter tuning for a classifier model. As I've understood from the documentation of GridSearchCV, you can query for attributes such as best estimator, best score, et cetera, but I would be interested in getting the predicted y-class labels which were used to calculate the best score attribute in GridSearchCV.
Is there a way to access these predictions?
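GridSearchCV does not store the per-sample predictions from the cross-validation folds; a common workaround (a sketch, not from the original thread) is to re-run the cross-validation with the winning parameters via cross_val_predict, which reproduces rather than retrieves those predictions:

from sklearn.model_selection import cross_val_predict
# Assumes `search` is an already-fitted GridSearchCV and X, y are the data it was fitted on.
# best_estimator_ is cloned internally, so it is re-fitted per fold with the best parameters.
y_pred_cv = cross_val_predict(search.best_estimator_, X, y, cv=search.cv)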

Outlier detection in scikit-learn (Isolation Forest) in a pipeline

I have run into a problem: I can't use the Isolation Forest algorithm in a scikit-learn pipeline. I am trying to predict credit card default using the Kaggle Credit Card Fraud Detection dataset. I am trying to fit everything after data partitioning inside pipelines in order to avoid data leakage (using a pipeline in every cross-validation fold, because I get an almost 100% F1-score with Logistic Regression in K-fold cross-validation without pipelines, which suggests leakage). Most machine learning algorithms work this way (Logistic Regression, Random Forest Classifier, etc.), but not some anomaly detection algorithms such as IsolationForest. How can I fit these anomaly detection algorithms inside a Pipeline? Thanks.
Some details for X and Y (Y: 0 is a normal transaction, 1 is a fraudulent transaction):
from sklearn.pipeline import Pipeline   # original import not shown; the answer below assumes it was this one
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('smote', SMOTE()),
    ('IF', IsolationForest())
])
print(cross_val_score(pipe, X, Y, scoring='f1_weighted', cv=5))
# Result: [3.01179163e-06 3.53204982e-06 6.55363495e-06 3.51940600e-06 4.52981524e-06]
Without further information, I would guess that your Pipeline import is from sklearn.pipeline. Just replace it with:
from imblearn.pipeline import Pipeline
For further information, this helped me.
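Why the import matters (my reading, not spelled out in the original answer): scikit-learn's Pipeline requires every intermediate step to implement transform, which SMOTE does not (it only provides fit_resample), whereas imblearn's Pipeline accepts samplers and applies the resampling only while fitting, never when predicting. A minimal corrected version of the snippet above:

from imblearn.pipeline import Pipeline          # instead of sklearn.pipeline
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('smote', SMOTE()),        # resampling happens only during fit on each training fold
    ('IF', IsolationForest())
])
print(cross_val_score(pipe, X, Y, scoring='f1_weighted', cv=5))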

Does the refit option in gridsearchcv re-select features?

I'm using GridSearchCV to train a logistic regression classifier. What I want to know is whether the refit option re-selects features based on the chosen hyper-parameter C, or simply uses the features selected during the cross-validation procedure and only re-fits the coefficient values without re-selecting features.
As per the documentation of GridSearchCV:
1. Refit an estimator using the best found parameters on the whole dataset.
2. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance.
From here, Confused with respect to working of GridSearchCV, you can see the significance of the refit parameter below:
refit : boolean
Refit the best estimator with the entire dataset.
If “False”, it is impossible to make predictions using this GridSearchCV instance after fitting.
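To make this concrete, a sketch under the assumption that feature selection is itself a step of the searched estimator (e.g. a Pipeline containing SelectKBest, which is not shown in the original question): with refit=True the entire best pipeline is re-fitted on the whole dataset, so the feature-selection step runs again on all of the data with the winning parameters, rather than reusing a selection made inside any single cross-validation fold.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

pipe = Pipeline([
    ('select', SelectKBest(f_classif)),
    ('clf', LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(
    pipe,
    param_grid={'select__k': [5, 10], 'clf__C': [0.1, 1, 10]},
    cv=5,
    refit=True,  # the best pipeline, including SelectKBest, is re-fitted on all of X, y
)
search.fit(X, y)
# The support mask below comes from the refit on the full dataset, not from any single CV fold.
print(search.best_estimator_.named_steps['select'].get_support())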

Online version of the Ridge regression classifier in scikit-learn?

I'm trying a range of online classifiers in the scikit-learn library to train a model on huge data. I found there are many classifiers supporting partial_fit, allowing for incremental learning. I want to use the Ridge regression classifier in this setting, but it does not seem to implement partial_fit. Is there an alternative model that can do this in sklearn?
sklearn.linear_model.SGDClassifier supports partial_fit; its loss options include ‘hinge’, ‘log’, ‘modified_huber’, ‘squared_hinge’, and ‘perceptron’.
sklearn.linear_model.SGDRegressor also supports partial_fit; its default loss is ‘squared_loss’, and the possible values are ‘squared_loss’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’.
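A sketch (my addition, not from the original answer): SGDRegressor with a squared-error loss and an L2 penalty is essentially ridge regression trained by stochastic gradient descent, and it supports partial_fit for incremental learning. Note the loss is spelled 'squared_loss' in older scikit-learn versions and 'squared_error' in recent ones.

import numpy as np
from sklearn.linear_model import SGDRegressor

# Squared loss + L2 penalty = an online/stochastic approximation of ridge regression.
# (Use loss='squared_loss' on older scikit-learn versions.)
model = SGDRegressor(loss='squared_error', penalty='l2', alpha=1e-3)

rng = np.random.default_rng(0)
for _ in range(10):  # stream the data in chunks instead of loading it all at once
    X_chunk = rng.normal(size=(1000, 5))
    y_chunk = X_chunk @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)
    model.partial_fit(X_chunk, y_chunk)

print(model.coef_)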

Scikit pipeline with different datasets

I am wondering if it is possible to use a pipeline in scikit-learn in the following way:
I want to train a model on dataset A and then make predictions with the same model on dataset B. I could then use GridSearch to find the best parameters for the pipeline, using the predictions on dataset B as the measure.
I know how to write a normal pipeline and use it with GridSearch, but I can't see how I can work with two datasets.
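One possible way to do this (my sketch, not an answer from the original thread, with placeholder data standing in for datasets A and B) is to concatenate A and B and hand GridSearchCV a single predefined split that always trains on the A rows and evaluates on the B rows, e.g. with PredefinedSplit:

import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Placeholder data standing in for the asker's datasets A (fit) and B (score).
rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(100, 5)), rng.normal(size=100)
X_b, y_b = rng.normal(size=(40, 5)), rng.normal(size=40)

X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])
# -1 marks rows that are always in the training set (A); 0 marks the single test fold (B).
test_fold = np.r_[np.full(len(X_a), -1), np.zeros(len(X_b))]

pipe = Pipeline([('scale', StandardScaler()), ('model', Ridge())])
search = GridSearchCV(pipe, {'model__alpha': [0.1, 1.0, 10.0]}, cv=PredefinedSplit(test_fold))
search.fit(X, y)   # every candidate is fitted on A and scored on B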
