I'm trying to use RandomUnderSampler. I have installed the imblearn module correctly, but I still get the error: name 'RandomUnderSampler' is not defined. Is there a specific reason for this? Can someone please help?
from imblearn.under_sampling import RandomUnderSampler

# Random under-sampling and over-sampling with imbalanced-learn
def random_under_sampling(X, Y):
    rus = RandomUnderSampler(return_indices=True)
    X_rus, y_rus, id_rus = rus.fit_sample(X, Y)
    print('Removed indexes:', id_rus)
    plot_2d_space(X_rus, y_rus, 'Random under-sampling')
[Screenshot: the actual method definition]
[Screenshot: where I call the method]
Since you seem to be using IPython, make sure you first execute the line that imports the imblearn library (e.g. with Ctrl-Enter):
from imblearn.under_sampling import RandomUnderSampler
After that, the module is imported and the name RandomUnderSampler (a class, not a function) is defined.
If this does not work, could you restart the notebook and execute all the statements up to the random_under_sampling function, to make sure nothing was missed?
I have already read the posts here, here and here, so please don't mark this as a duplicate.
I am trying to run the tutorial provided here (binary classification of breast cancer).
When I execute the code below, I get the following error:
explainer = lime_tabular.LimeTabularExplainer(X_train, mode="classification",
class_names=breast_cancer.target_names,
feature_names=breast_cancer.feature_names,
)
explainer
NameError: name 'lime_tabular' is not defined
But my code already has the following import statements:
import lime
import lime.lime_tabular
What is causing this issue?
You are not giving a name to the imported submodule: import lime.lime_tabular binds only the top-level name lime in your namespace, not lime_tabular.
You can either use the fully qualified name lime.lime_tabular when calling it in your code,
or change the second import line to from lime import lime_tabular
I prefer the second approach.
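The difference between the two import forms can be seen with a standard-library package; here os.path stands in for lime.lime_tabular, and names_bound_by is a hypothetical helper that runs an import statement in a fresh namespace and reports which names it created:

```python
def names_bound_by(statement):
    """Execute an import statement in an empty namespace and return the new names."""
    namespace = {}
    exec(statement, namespace)
    # exec() inserts __builtins__ into the globals dict; it is not an imported name
    namespace.pop("__builtins__", None)
    return set(namespace)


print(names_bound_by("import os.path"))       # {'os'}
print(names_bound_by("from os import path"))  # {'path'}
```

The first form only makes os available (so you must write os.path), which mirrors why lime_tabular is undefined after import lime.lime_tabular.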
I am trying to import 'jaccard_similarity_score' from the 'sklearn' package, but I am unable to do so. When I run the cell in Jupyter Notebook, I get an error. I tried restarting the kernel (as suggested in one of the posts on Stack Overflow), but that didn't work for me. I've attached a screenshot of the error:
Any help is appreciated. Thanks in advance.
In recent versions of scikit-learn, jaccard_similarity_score has been replaced by jaccard_score, so the import has changed.
Instead of writing:
from sklearn.metrics import jaccard_similarity_score
you should write: from sklearn.metrics import jaccard_score
Note: jaccard_score defaults to pos_label=1, so when your labels are strings you must set pos_label explicitly, for example:
jaccard_score(y_test, dt_yhat, pos_label="PAIDOFF")
Here the valid labels for pos_label are: array(['COLLECTION', 'PAIDOFF'], dtype='<U10')
References:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.jaccard_score.html#sklearn.metrics.jaccard_score
https://github.com/DiamondLightSource/SuRVoS/issues/103#issuecomment-731122304
Instead of importing jaccard_similarity_score, you should import jaccard_score. Keep in mind that, with non-numeric labels, jaccard_score also needs the pos_label argument.
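A minimal, self-contained sketch of the pos_label requirement, using made-up labels in the style of the PAIDOFF/COLLECTION example (assumes scikit-learn 0.21 or later, where jaccard_score exists):

```python
from sklearn.metrics import jaccard_score

# Hypothetical string labels, not real data
y_true = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF"]
y_pred = ["PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF"]

try:
    # The default pos_label=1 is not one of the labels, so this raises
    jaccard_score(y_true, y_pred)
except ValueError as err:
    print("Without pos_label:", err)

# Jaccard index of the positive class: TP / (TP + FP + FN) = 2 / (2 + 0 + 1)
score = jaccard_score(y_true, y_pred, pos_label="PAIDOFF")
print(score)
```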
The following code raises the error:
from imblearn.under_sampling import NearMiss
nm = NearMiss()
X_res,y_res=nm.fit_sample(X,Y)
You are probably trying to undersample your imbalanced dataset. For this purpose, you can use RandomUnderSampler instead of NearMiss.
Try the following code:
from imblearn.under_sampling import RandomUnderSampler
under_sampler = RandomUnderSampler()
X_res, y_res = under_sampler.fit_resample(X, y)
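Now your dataset is balanced; if y is a pandas Series, you can verify this with y_res.value_counts(). Note that the method is fit_resample: fit_sample was renamed in imbalanced-learn and has been removed from recent releases.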
Cheers!
Instead of an "imblearn" package, my conda installed a package named "imbalanced-learn"; that's why it does not take the data. But it is strange that Jupyter Notebook doesn't tell me that "imblearn" isn't installed.
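For context, imbalanced-learn is the name under which the imblearn module is distributed on PyPI and conda-forge, so seeing "imbalanced-learn" in the package list is expected. A small standard-library sketch for checking what the notebook's kernel actually sees (describe_install is a hypothetical helper; run it in the same kernel as the failing notebook):

```python
import importlib.util
import sys
from importlib import metadata


def describe_install(dist_name, module_name):
    """Return (installed distribution version or None, whether the module is importable)."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    importable = importlib.util.find_spec(module_name) is not None
    return version, importable


# The interpreter running the kernel; conda may have installed into another one
print(sys.executable)
print(describe_install("imbalanced-learn", "imblearn"))
```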
I'm not sure why the model isn't defined
The code is taken from here:
https://github.com/DariusAf/MesoNet/blob/master/example.py
Code:
from classifiers import *
from pipeline import *
from keras.preprocessing.image import ImageDataGenerator
classifier = Meso4()
classifier.load('Meso4_DF')
gives the error:
classifier = Meso4()
NameError: name 'Meso4' is not defined
The reason for this is that Meso4 is defined in classifiers.py, as you can see here.
Strictly speaking, your problem would be solved by also downloading the classifiers.py file and putting it in the same directory as your example.py file.
However, you should, in general, refrain from copy-pasting code from GitHub unless you know what you are doing, and if you have to wonder whether you do, you don't.
Therefore, I recommend actually cloning the repo and working from the local copy.
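If from classifiers import * runs without an import error but Meso4 is still undefined, one possible cause is that Python found a different module called classifiers on its search path. A quick standard-library check of which file a module name resolves to (module_origin is a hypothetical helper):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Should point at classifiers.py inside your local clone of the repo
print(module_origin("classifiers"))
```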
I have built a custom sklearn pipeline, as follows:
pipeline = make_pipeline(
SelectColumnsTransfomer(features_to_use),
ToDummiesTransformer('feature_0', prefix='feat_0', drop_first=True, dtype=bool), # Dummify customer_type
ToDummiesTransformer('feature_1', prefix='feat_1'), # Dummify the feature
ToDummiesTransformer('feature_2', prefix='feat_2'), # Dummify
ToDummiesTransformer('feature_3', prefix='feat_3'), # Dummify
)
pipeline.fit(df)
The classes SelectColumnsTransfomer and ToDummiesTransformer are custom sklearn steps that inherit from BaseEstimator and TransformerMixin.
To serialise this object, I use:
from sklearn.externals import joblib
joblib.dump(pipeline, 'data_pipeline.joblib')
but when I deserialise it with
pipeline = joblib.load('data_pipeline.joblib')
I get AttributeError: module '__main__' has no attribute 'SelectColumnsTransfomer'.
I have read other similar questions and followed the instructions in this blog post here, but couldn't solve the issue.
I am copy-pasting the classes and importing them in the code. If I create a simplified version of this exercise, everything works; the problem occurs because I am running some tests with pytest, and when I run pytest it seems it doesn't see my custom classes. In fact, there is this other part of the error:
self = <sklearn.externals.joblib.numpy_pickle.NumpyUnpickler object at 0x7f821508a588>, module = '__main__', name = 'SelectColumnsTransfomer'
which suggests that NumpyUnpickler doesn't see SelectColumnsTransfomer, even though the test imports it.
My test code:
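import pytest
import joblib  # standalone joblib (older code used sklearn.externals.joblib)
from app.pipeline import *  # the pipeline objects
# SelectColumnsTransfomer and ToDummiesTransformer
# are here!

@pytest.fixture(scope="module")
def clf():
    pipeline = joblib.load("persistence/data_pipeline.joblib")
    return pipeline  # return the loaded pipeline, not the fixture function

def test_fake(clf):
    assert True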
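The traceback (module = '__main__') hints at the cause: pickle, which joblib builds on, stores a class by reference, as its module name plus qualified name, not by value. If the pipeline was dumped from a script or notebook where the transformers were defined at top level, the file records __main__.SelectColumnsTransfomer; under pytest, __main__ is pytest's own entry point, which doesn't define it, and importing the class in the test module does not change where the unpickler looks. A commonly used fix is to define the classes in an importable module (for example app/pipeline.py), import them from there in the code that dumps the pipeline, and dump it again so the file records app.pipeline instead. A minimal sketch of the by-reference behaviour with the standard pickle module and a hypothetical SelectColumns class:

```python
import pickle


class SelectColumns:
    """Hypothetical stand-in for a custom transformer such as SelectColumnsTransfomer."""

    def __init__(self, columns):
        self.columns = columns


payload = pickle.dumps(SelectColumns(["feature_0", "feature_1"]))

# pickle stores the class as (module, qualified name) plus the instance state;
# the class body itself is not part of the payload.
print(SelectColumns.__module__)                      # "__main__" when run as a script
print(b"SelectColumns" in payload)                   # True
print(SelectColumns.__module__.encode() in payload)  # True
```

Loading this payload in another process only works if that process can import the recorded module and find SelectColumns in it.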
I had the same error message when I was trying to save a PyTorch model class like this:
import torch.nn as nn

class custom(nn.Module):
    def __init__(self):
        super(custom, self).__init__()
        print("Class loaded")

model = custom()
I then used joblib to dump this model:
from joblib import dump
dump(model, 'some_filepath.jobjib')
The issue was that I ran the code above in a Kaggle kernel, then downloaded the dumped file and tried to load it locally with this script:
from joblib import load
model = load('some_filepath.jobjib')  # load() takes the file path, not the model
I fixed the issue by running all of these snippets locally on my computer, instead of creating the class and dumping the model on Kaggle but loading it on my local machine. I wanted to add this here because the comments on the answer by @DarioB confused me with their reference to a 'function', which didn't apply in my simpler case.
I had a similar issue with sklearn and complex pipelines.
I used cloudpickle 2.0.0 (Python 3.10) instead of pickle or joblib to dump the model, and then loaded it with joblib without errors.
Hope this helps.
Note: the model was dumped from a Jupyter notebook and loaded inside a Python script.
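This likely works because cloudpickle serializes classes and functions defined in __main__ by value, embedding their definitions in the payload, whereas pickle stores them by reference. The resulting bytes can be read with the standard pickle machinery (which joblib.load also builds on), but the loading environment still needs cloudpickle installed. A minimal sketch with a hypothetical Custom class; it runs in a single process, so it only shows the API, and the by-value benefit appears when the loading script does not define Custom:

```python
import pickle

import cloudpickle


class Custom:
    """Hypothetical model class defined at the top level of a script or notebook."""

    def __init__(self, scale):
        self.scale = scale

    def apply(self, x):
        return x * self.scale


# Dump with cloudpickle instead of pickle/joblib ...
blob = cloudpickle.dumps(Custom(3))

# ... and load with the standard pickle machinery
restored = pickle.loads(blob)
print(restored.apply(2))  # 6
```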