How to apply undersampling or oversampling to a dataset in Python?

I have an imbalanced dataset and I'm trying to use undersampling. Perhaps nobody has a solution to my exact error, but if that's the case, any alternative approach would be appreciated.
This is what I've done:
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_train_resampled, y_train_resampled = rus.fit_sample(X_train, y_train)
However, I keep getting the error:
AttributeError: 'RandomUnderSampler' object has no attribute '_validate_data'
I saw this post, RandomUnderSampler' object has no attribute 'fit_resample', but the answer didn't work. Upgrading the library didn't help, and switching to fit_resample gave me the exact same error.
Any ideas on how to fix this error, or another way of applying undersampling?
UPDATE: The full traceback is below (I can't show the real data, for privacy reasons).
Regarding versions: my Python is 3.7 and scikit-learn is 0.23.1.
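For reference, current releases of imbalanced-learn expose fit_resample rather than fit_sample, and this kind of AttributeError usually points to an imbalanced-learn install that does not match the installed scikit-learn (or to two environments being mixed). Below is a minimal sketch of both the library call and a plain pandas fallback; it assumes X_train is a DataFrame and y_train a Series with an aligned index, and it is not a confirmed fix for this exact traceback.
# Option 1: imbalanced-learn's current API (assumes imblearn and scikit-learn
# versions are compatible, e.g. both freshly installed together).
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=0)
X_res, y_res = rus.fit_resample(X_train, y_train)

# Option 2: manual random undersampling with pandas, as a library-free fallback.
import pandas as pd

train = X_train.assign(_target=y_train)             # _target is a temporary column
n_minority = train['_target'].value_counts().min()  # size of the smallest class
balanced = (train.groupby('_target', group_keys=False)
                 .apply(lambda g: g.sample(n=n_minority, random_state=0)))
X_res = balanced.drop(columns='_target')
y_res = balanced['_target']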

Related

How to find the sklearn version of a pickled model?

I have a pickled sklearn model that I need to get running. The model, however, was trained with an unknown version of sklearn.
When I inspect the model in a debugger, I find a bunch of strange tracebacks inside, instead of the attributes you'd expect, for example:
decision_function -> 'RandomForestClassifier' object has no attribute 'decision_function'
fit_predict -> 'RandomForestClassifier' object has no attribute 'fit_predict'
score_samples -> 'RandomForestClassifier' object has no attribute 'score_samples'
How can I get this model to run? Do these error messages hint at anything?
EDIT: The solution is to brute-force search the sklearn version. In my case, once I got to the correct major version, the error message pointed me to the correct minor version.
Just as #rickhg12hs suggested, python -m pickletools your_pickled_model_file does the job!
The output is quite long, so I recommend using head:
python -m pickletools your_pickled_model_file | head -100
For models pickled with scikit-learn 0.18 or later, you can read the version with
model.__getstate__()['_sklearn_version']
so at the very least you will know whether it is pre-0.18 or newer.
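A small sketch combining both approaches is below; rf_model.pkl is a hypothetical filename, and the __getstate__ lookup only applies to models pickled with scikit-learn 0.18 or later.
# Inspect the raw pickle stream without loading it (useful when loading fails):
#   python -m pickletools rf_model.pkl | head -100
import pickle

with open('rf_model.pkl', 'rb') as f:   # hypothetical filename
    model = pickle.load(f)

# '_sklearn_version' is stored in the estimator state from scikit-learn 0.18 onward.
state = model.__getstate__()
print(state.get('_sklearn_version', 'pre-0.18 (no version recorded)'))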

How do I get the feature order from an xgboost pickle model?

I am trying to get predictions from my pickled xgboost model on new data, but I get the error "ValueError: Feature shape mismatch".
The reason is that I need to pass the features in exactly the same order the model was built with. For that I am using the following code, but it's not working:
feature_order = model_pkl.get_booster().feature_names
X_new = X_new[feature_order]
Y_static = model_pkl.predict(X_new)
feature_order is returning None. It was working with xgboost 1.1.1, but now I am using xgboost 1.4.2.
Can you please help me get the feature order from my pickled xgboost model, so that I can pass my data in the same order and not run into the error?
Use:
feature_order = model_pkl.feature_names
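A more defensive sketch is below; it checks both attributes, since which one is populated can depend on the xgboost version and on whether the model was trained on a DataFrame. The getattr fallback and the assumption that X_new is a pandas DataFrame are illustrations, not part of the answer above.
# Recover the training-time feature order from whichever attribute the installed
# xgboost version exposes, then reorder the new data to match it.
feature_order = getattr(model_pkl, 'feature_names', None)
if feature_order is None:
    feature_order = model_pkl.get_booster().feature_names  # sklearn-wrapper models

if feature_order is not None:
    X_new = X_new[feature_order]   # assumes X_new is a pandas DataFrame

y_pred = model_pkl.predict(X_new)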

Unable to save tensorflow model containing the DNNRegressor estimator

I am a beginner in machine learning, and I have a small question.
I have been working on a machine learning project where I use TensorFlow to predict future values. The dataset is one-hot encoded first, the encoded columns are combined into feature columns, and then the estimator is created. For the estimator I use DNNRegressor; no Keras is used anywhere in the code.
model = tf.estimator.DNNRegressor(hidden_units=[100, 100, 100],
                                  feature_columns=feature_columns,
                                  optimizer=tf.optimizers.Adam(learning_rate=0.01),
                                  activation_fn=tf.nn.relu)
Now, I tried using Pickle for saving. However, I get this error:
AttributeError: Can't pickle local object 'DNNRegressorV2.__init__.<locals>._model_fn'
I tried the same using joblib, but I got the following issue:
PicklingError: Can't pickle <function DNNRegressorV2.__init__.<locals>._model_fn at 0x00000175F1A0E948>: it's not found as tensorflow_estimator.python.estimator.canned.dnn.DNNRegressorV2.__init__.<locals>._model_fn
Following this, I tried this code:
tf.keras.models.save_model(model, filepath, overwrite=True,
                           include_optimizer=True, save_format=None,
                           signatures=None, options=None)
But I got the error:
AttributeError: 'DNNRegressorV2' object has no attribute 'built'
I have also tried other methods such as model.save() and model.to_json(), and saving through the API, but none of them worked.
Can someone help me out with this?
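For what it's worth, a tf.estimator model is not a Keras model, so the Keras save paths won't apply; the estimator API exposes export_saved_model instead. A minimal sketch under that assumption is below; the export directory and the parsing serving input function are illustrations, not the original poster's code.
# Export the estimator as a SavedModel instead of pickling it.
# Assumes `model` is the DNNRegressor and `feature_columns` is the list used to build it.
import tensorflow as tf

feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

export_path = model.export_saved_model('exported_dnn_regressor', serving_input_fn)

# The exported directory can later be reloaded for inference:
export_dir = export_path.decode() if isinstance(export_path, bytes) else export_path
loaded = tf.saved_model.load(export_dir)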

Getting "AttributeError: Can't get attribute 'DeprecationDict' on <module 'sklearn.utils.deprecation'>" while executing an ML project

AttributeError: Can't get attribute 'DeprecationDict' on <module 'sklearn.utils.deprecation'>
The error is raised on this line:
model = pickle.load(open('rf_regression_model.pkl', 'rb'))
You used a new version of sklearn to load a model that was trained with an old version of sklearn.
So the options are:
- Retrain the model with the current version of sklearn, if you have the training script and data, or
- Fall back to the older sklearn version reported in the warning message.
Try this:
with open('rf_regression_model.pkl', 'rb') as f:
    model = pickle.load(f)
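If you don't know which older release to fall back to, the version the model was pickled with can often be read straight out of the pickle stream without unpickling it, since scikit-learn 0.18+ stores it under '_sklearn_version'. The sketch below is an illustration of that idea, not part of the posted answers, and it assumes the version string survives in readable form in the stream.
# Scan the raw pickle for the recorded scikit-learn version without unpickling it.
import io
import pickletools

with open('rf_regression_model.pkl', 'rb') as f:
    data = f.read()

out = io.StringIO()
pickletools.dis(data, out=out)
dump = out.getvalue()

# The version string usually appears right after the '_sklearn_version' key.
if '_sklearn_version' in dump:
    idx = dump.index('_sklearn_version')
    print(dump[idx:idx + 200])   # print a small window around the key
else:
    print('No _sklearn_version found (likely pickled with scikit-learn < 0.18)')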

AttributeError: 'hyperopt_estimator' object has no attribute '_best_preprocs'

I'm trying to optimize the performance of a sklearn.svm.LinearSVC using hyperopt-sklearn. I'm using the LinearSVC to predict image labels on the CIFAR-10 dataset, and I'm hoping that hyperopt will give me improved performance.
However, despite multiple runs, when I run the following:
model = HyperoptEstimator(classifier=hpc.svc_linear('mySVC'),
                          max_evals=10, trial_timeout=10)
model.fit(x, y)
I get this error:
AttributeError: 'hyperopt_estimator' object has no attribute '_best_preprocs'
This is a bug, as noted in an open issue on the hyperopt-sklearn GitHub, but it has no solution yet. Has anyone been able to bypass this error?
