All,
I'm facing an issue while trying to use the KNN imputer (from fancyimpute) in a pipeline. I have listed my workflow below.
I have separated my numerical and categorical variables and built a pipeline as follows:
numeric_transformer = Pipeline(steps=[
    ('imputer', KNN(k=3)),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, num_attr),
        ('cat', categorical_transformer, cat_attr)])
I want to use the KNN imputer to impute the missing values in the numerical columns. I then fit a logistic regression:
clf_logreg = Pipeline(steps=[('preprocessor', preprocessor),
                             ('classifier', LogisticRegression())])
clf_logreg.fit(X_train, Y_train)
The above code chunk worked fine, but when I try to predict with the fitted pipeline, I get the error below. Please help me out. Thanks.
train_pred_logreg = clf_logreg.predict(X_train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-121-f17e49913947> in <module>
1 #train_pred_logreg = clf_logreg.predict(X_train)
----> 2 test_pred_logreg = clf_logreg.predict(X_test)
3
4 print(confusion_matrix(y_true=Y_train, y_pred = train_pred_logreg))
5
/opt/conda/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
114
115 # lambda, but not partial, allows help() to work with update_wrapper
--> 116 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
117 # update the docstring of the returned function
118 update_wrapper(out, self.fn)
/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
419 Xt = X
420 for _, name, transform in self._iter(with_final=False):
--> 421 Xt = transform.transform(Xt)
422 return self.steps[-1][-1].predict(Xt, **predict_params)
423
/opt/conda/lib/python3.6/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
537 'remainder keyword')
538
--> 539 Xs = self._fit_transform(X, None, _transform_one, fitted=True)
540 self._validate_output(Xs)
541
/opt/conda/lib/python3.6/site-packages/sklearn/compose/_column_transformer.py in _fit_transform(self, X, y, func, fitted)
418 message=self._log_message(name, idx, len(transformers)))
419 for idx, (name, trans, column, weight) in enumerate(
--> 420 self._iter(fitted=fitted, replace_strings=True), 1))
421 except ValueError as e:
422 if "Expected 2D array, got 1D array instead" in str(e):
/opt/conda/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
919 # remaining jobs.
920 self._iterating = False
--> 921 if self.dispatch_one_batch(iterator):
922 self._iterating = self._original_iterator is not None
923
/opt/conda/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
/opt/conda/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
/opt/conda/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
/opt/conda/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
/opt/conda/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
/opt/conda/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in _transform_one(transformer, X, y, weight, **fit_params)
693
694 def _transform_one(transformer, X, y, weight, **fit_params):
--> 695 res = transformer.transform(X)
696 # if we have a weight for this transformer, multiply output
697 if weight is None:
/opt/conda/lib/python3.6/site-packages/sklearn/pipeline.py in _transform(self, X)
538 Xt = X
539 for _, _, transform in self._iter():
--> 540 Xt = transform.transform(Xt)
541 return Xt
542
/opt/conda/lib/python3.6/site-packages/fancyimpute/solver.py in transform(self, X, y)
223 "doesn't support inductive mode. Only %s.fit_transform is "
224 "supported at this time." % (
--> 225 self.__class__.__name__, self.__class__.__name__))
ValueError: KNN.transform not implemented! This imputation algorithm likely doesn't support inductive mode. Only KNN.fit_transform is supported at this time.
When I try to use fit_transform on the whole pipeline, as the error message suggests, I get the error below:
clf_logreg.fit_transform(X_train, Y_train)
AttributeError: 'LogisticRegression' object has no attribute 'transform'
Try clf_logreg.named_steps['preprocessor'].fit_transform(X_train) to test whether the KNN imputation works on its own. As for the AttributeError: LogisticRegression has no transform method because it is a classifier, so fit_transform cannot be called on the full pipeline.
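More fundamentally, fancyimpute's KNN only supports fit_transform, so it can never work as a pipeline step at predict time. If you can upgrade to scikit-learn >= 0.22, its built-in KNNImputer implements transform and drops in directly; here is a minimal sketch of the same workflow, assuming the num_attr and cat_attr column lists from the question:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

numeric_transformer = Pipeline(steps=[
    ('imputer', KNNImputer(n_neighbors=3)),  # inductive: supports transform()
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, num_attr),
    ('cat', categorical_transformer, cat_attr)])

clf_logreg = Pipeline(steps=[('preprocessor', preprocessor),
                             ('classifier', LogisticRegression())])
clf_logreg.fit(X_train, Y_train)
train_pred_logreg = clf_logreg.predict(X_train)  # transform() now exists, so predict works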
Related
Working through the project in Chapter 2 of Hands-On Machine Learning with Scikit-Learn and TensorFlow, I get a TypeError when trying to run the data through a pipeline prior to building a model.
The TypeError says that fit_transform() takes 2 positional arguments and yet 3 are given. I'm not sure what I'm doing wrong, as I'm following along with the exercise as best I can. Please advise, and let me know if more information is needed; I tried to stick to the minimum amount of code required to reproduce the error. Thanks for any insight you can kindly provide.
The code is as follows:
# Load the Data
import os
import tarfile
from six.moves import urllib

download_root = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
housing_path = os.path.join('datasets', 'housing')
housing_url = download_root + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=housing_url, housing_path=housing_path):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, 'housing.tgz')
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
# Pipeline Build
import numpy as np
from sklearn.preprocessing import Imputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, LabelBinarizer
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, x, y=None):
        return self
    def transform(self, x):
        return x[self.attribute_names].values

rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        rooms_per_house = X[:, rooms_ix] / X[:, household_ix]
        pop_per_house = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_house, pop_per_house, bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_house, pop_per_house]

num_attribs = list(house_num)
cat_attribs = ['ocean_proximity']

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy='median')),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler())
])
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])

from sklearn.compose import ColumnTransformer
full_pipeline = ColumnTransformer([
    ('num_pipeline', num_pipeline, num_attribs),
    ('cat_pipeline', cat_pipeline, cat_attribs),
])
The TypeError encountered is as follows:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-50-925e65d2e69a> in <module>
----> 1 house_prep = full_pipeline.fit_transform(house)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
474 self._validate_remainder(X)
475
--> 476 result = self._fit_transform(X, y, _fit_transform_one)
477
478 if not result:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _fit_transform(self, X, y, func, fitted)
418 message=self._log_message(name, idx, len(transformers)))
419 for idx, (name, trans, column, weight) in enumerate(
--> 420 self._iter(fitted=fitted, replace_strings=True), 1))
421 except ValueError as e:
422 if "Expected 2D array, got 1D array instead" in str(e):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
922 self._iterating = self._original_iterator is not None
923
--> 924 while self.dispatch_one_batch(iterator):
925 pass
926
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
714 with _print_elapsed_time(message_clsname, message):
715 if hasattr(transformer, 'fit_transform'):
--> 716 res = transformer.fit_transform(X, y, **fit_params)
717 else:
718 res = transformer.fit(X, y, **fit_params).transform(X)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
391 return Xt
392 if hasattr(last_step, 'fit_transform'):
--> 393 return last_step.fit_transform(Xt, y, **fit_params)
394 else:
395 return last_step.fit(Xt, y, **fit_params).transform(Xt)
TypeError: fit_transform() takes 2 positional arguments but 3 were given
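For what it's worth, this particular TypeError almost always comes from LabelBinarizer: its signature is fit_transform(self, y) because it is designed for target labels, so when the pipeline calls fit_transform(X, y) a third positional argument gets passed (that is exactly the last_step.fit_transform(Xt, y, **fit_params) frame above). A minimal sketch of a fix, assuming the same attribute lists as above and scikit-learn >= 0.20, swaps in the feature-side OneHotEncoder (and SimpleImputer for the deprecated Imputer):
from sklearn.impute import SimpleImputer          # replaces the deprecated Imputer
from sklearn.preprocessing import OneHotEncoder   # feature-side encoder

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', SimpleImputer(strategy='median')),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler())
])
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    # OneHotEncoder.fit_transform(X, y=None) accepts the (X, y) call a
    # Pipeline makes, which LabelBinarizer.fit_transform(y) does not.
    ('one_hot', OneHotEncoder(handle_unknown='ignore'))
])
full_pipeline = ColumnTransformer([
    ('num_pipeline', num_pipeline, num_attribs),
    ('cat_pipeline', cat_pipeline, cat_attribs),
])
house_prep = full_pipeline.fit_transform(house)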
Trying to implement the code from here. The code was working fine, then stopped; I restarted, updated, etc. I keep getting:
TypeError: issubclass() arg 2 must be a class or tuple of classes
smote = SMOTE(random_state=45)
X_train1, X_test1, y_train1, y_test1 = train_test_split(Xall, yall,
                                                        test_size=0.3, random_state=123)
# fit smote on training data
balanced_X1, balanced_y1 = smote.fit_sample(X_train1, y_train1)
# smote outputs numpy arrays, therefore transformed to df
balanced_X1 = pd.DataFrame(data=balanced_X1, columns=X_train1.columns)
balanced_y1 = pd.DataFrame(data=balanced_y1, columns=['y'])
# hyperparameter tuning: create hyperparameter grid and fit
param_grid = {'penalty': ['l1', 'l2'],
              'C': [0.001, 0.01, 0.1, 1, 10, 100]}
clf = GridSearchCV(LogisticRegression(random_state=123),
                   param_grid,
                   cv=5)
best = clf.fit(balanced_X1, balanced_y1)
print('Best Penalty:', best.best_estimator_.get_params()['penalty'])
print('Best C:', best.best_estimator_.get_params()['C'])
I am just trying to run a grid search on this code but can't get past the error. Please help.
The error message in full:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in
_fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,
fit_params, return_train_score, return_parameters, return_n_test_samples,
return_times, return_estimator, error_score)
513 else:
--> 514 estimator.fit(X_train, y_train, **fit_params)
515
~\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py in fit(self,
X, y, sample_weight)
1492 """
-> 1493 solver = _check_solver(self.solver, self.penalty, self.dual)
1494
~\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py in
_check_solver(solver, penalty, dual)
431 "Specify a solver to silence this warning.",
--> 432 FutureWarning)
433
TypeError: issubclass() arg 2 must be a class or tuple of classes
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-13-925cae965240> in <module>
6 param_grid,
7 cv=5)
----> 8 cv.fit(X_train1, y_train1)
9 print('Best Penalty:', cv.best_estimator_.get_params()['penalty'])
10 print('Best C:', cv.best_estimator_.get_params()['C'])
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self,
X, y, groups, **fit_params)
685 return results
686
--> 687 self._run_search(evaluate_candidates)
688
689 # For multi-metric evaluation, store the best_index_,
best_params_ and
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in
_run_search(self, evaluate_candidates)
1146 def _run_search(self, evaluate_candidates):
1147 """Search all candidates in param_grid"""
-> 1148 evaluate_candidates(ParameterGrid(self.param_grid))
1149
1150
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in
evaluate_candidates(candidate_params)
664 for parameters, (train, test)
665 in product(candidate_params,
--> 666 cv.split(X, y, groups)))
667
668 if len(out) < 1:
~\Anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
919 # remaining jobs.
920 self._iterating = False
--> 921 if self.dispatch_one_batch(iterator):
922 self._iterating = self._original_iterator is not None
923
~\Anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self,
iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~\Anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~\Anaconda3\lib\site-packages\joblib\_parallel_backends.py in
apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~\Anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self,
batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~\Anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\Anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in
_fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters,
fit_params, return_train_score, return_parameters, return_n_test_samples,
return_times, return_estimator, error_score)
526 "raised or error_score=np.nan to adopt
the "
527 "behavior from version 0.22.",
--> 528 FutureWarning)
529 raise
530 elif isinstance(error_score, numbers.Number):
TypeError: issubclass() arg 2 must be a class or tuple of classes
UPDATE: I just ran this code without SMOTE, which I thought was the likely culprit, and it turns out that the issue lies somewhere in scikit-learn.
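For anyone who hits this: both failing frames in the traceback are scikit-learn trying to issue a FutureWarning (first in _check_solver, because no solver= was given, then again while reporting the failed fit), which suggests something in the environment has broken the warnings machinery; reinstalling scikit-learn in a fresh environment is the usual cure. As a sketch of a workaround, assuming the warning path is the trigger, passing solver explicitly avoids the deprecation warning altogether (note the 'l1' penalty requires the liblinear or saga solver anyway):
# Hedged workaround sketch: an explicit solver= means _check_solver never
# issues the FutureWarning that the traceback dies in; 'liblinear' supports
# both penalties in the grid. balanced_X1/balanced_y1 are from the question.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {'penalty': ['l1', 'l2'],
              'C': [0.001, 0.01, 0.1, 1, 10, 100]}
clf = GridSearchCV(LogisticRegression(random_state=123, solver='liblinear'),
                   param_grid, cv=5)
best = clf.fit(balanced_X1, balanced_y1.values.ravel())  # ravel() avoids a column-vector warning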
I'm trying to run sklearn's ColumnTransformer over individual Pipelines. My dataframe is called listings_prepared. All features are float or int. The dataframe is clean; the only issue is a few missing values, marked NaN, so SimpleImputer should handle them.
Sklearn provides documentation here on running pipelines through ColumnTransformer, which is what I've followed.
First, I create the pipelines using Pipeline:
num_pipeline = Pipeline([
    ('num_imputer', SimpleImputer(strategy='median')),
    ('num_scaler', StandardScaler()),
])
disc_pipeline = Pipeline([
    ('disc_imputer', SimpleImputer(strategy='most_frequent')),
    ('disc_scaler', StandardScaler(), disc_attribs),
])
cat_pipeline = Pipeline([
    ('cat_imputer', SimpleImputer(strategy='most_frequent')),
    ('cat_ohe', OneHotEncoder(categories='auto', drop='first',
                              sparse=False)),
])
amen_pipeline = Pipeline([
    ('amen_imputer', SimpleImputer(strategy='most_frequent')),
])
Then, I run them through ColumnTransformer:
listings_pipeline = ColumnTransformer([
    ('num', num_pipeline, num_attribs),
    ('disc', disc_pipeline, disc_attribs),
    ('cat', cat_pipeline, cat_attribs),
    ('amen', amen_pipeline, amen_attribs),
])
X_train = listings_pipeline.fit_transform(listings_explore)
Here's the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-511-9d5100fe0f5d> in <module>
12 ('amen', amen_pipeline, amen_attribs),
13 ])
---> 14 X_train = listings_pipeline.fit_transform(listings_explore_pipeline)
~/anaconda3/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in fit_transform(self, X, y)
466 self._validate_remainder(X)
467
--> 468 result = self._fit_transform(X, y, _fit_transform_one)
469
470 if not result:
~/anaconda3/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in _fit_transform(self, X, y, func, fitted)
410 message=self._log_message(name, idx, len(transformers)))
411 for idx, (name, trans, column, weight) in enumerate(
--> 412 self._iter(fitted=fitted, replace_strings=True), 1))
413 except ValueError as e:
414 if "Expected 2D array, got 1D array instead" in str(e):
~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
922 self._iterating = self._original_iterator is not None
923
--> 924 while self.dispatch_one_batch(iterator):
925 pass
926
~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
714 with _print_elapsed_time(message_clsname, message):
715 if hasattr(transformer, 'fit_transform'):
--> 716 res = transformer.fit_transform(X, y, **fit_params)
717 else:
718 res = transformer.fit(X, y, **fit_params).transform(X)
~/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
385 """
386 last_step = self._final_estimator
--> 387 Xt, fit_params = self._fit(X, y, **fit_params)
388 with _print_elapsed_time('Pipeline',
389 self._log_message(len(self.steps) - 1)):
~/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
270 fit_transform_one_cached = memory.cache(_fit_transform_one)
271
--> 272 fit_params_steps = {name: {} for name, step in self.steps
273 if step is not None}
274 for pname, pval in fit_params.items():
~/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in <dictcomp>(.0)
270 fit_transform_one_cached = memory.cache(_fit_transform_one)
271
--> 272 fit_params_steps = {name: {} for name, step in self.steps
273 if step is not None}
274 for pname, pval in fit_params.items():
ValueError: too many values to unpack (expected 2)
Why isn't this working?
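For a future reader: the failing frame is the dict comprehension {name: {} for name, step in self.steps}, which unpacks each Pipeline step into exactly two values. The disc_pipeline above passes a three-element tuple, ('disc_scaler', StandardScaler(), disc_attribs); the column list belongs in the ColumnTransformer entry, not in the Pipeline step. A sketch of the fix:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

disc_pipeline = Pipeline([
    ('disc_imputer', SimpleImputer(strategy='most_frequent')),
    # Pipeline steps must be (name, transformer) pairs; the columns are
    # already selected by ('disc', disc_pipeline, disc_attribs) in the
    # ColumnTransformer, so disc_attribs does not belong here.
    ('disc_scaler', StandardScaler()),
])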
I'm new to PyTorch. I'm trying to do cross-validation, and I found the skorch library, which allows users to use sklearn functions with a torch model. So, I define a neural network class:
torch.manual_seed(42)

class Netcross(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 30)
        self.sig1 = nn.Tanh()
        #self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(30, 30)
        self.sig2 = nn.Sigmoid()
        self.out = nn.Linear(30, 1)
        self.out_act = nn.Sigmoid()
        #self.fc1.weight = torch.nn.Parameter(torch.rand(50, 5))

    def forward(self, x):
        x = self.fc1(x)
        x = self.sig1(x)
        #x = self.dout(x)
        x = self.fc2(x)
        x = self.sig2(x)
        x = self.out(x)
        y = self.out_act(x)
        return y
crossnet1 = NeuralNet(
    Netcross,
    max_epochs=5,
    criterion=torch.nn.BCELoss,
    # user defined coeff.
    callbacks=[epoch_acc, epoch_f1, epoch_phi],
    optimizer=torch.optim.SGD,
    optimizer__momentum=0.9,
    lr=0.85,
)
inputs = Variable(x_traintensor)
labels = Variable(y_traintensor)
crossnet1.fit(inputs, labels)
So far everything is fine: the fit returns credible results without any errors. The problem appears when I try to use the GridSearchCV function:
from sklearn.model_selection import GridSearchCV

param_grid = {'max_epochs': [5, 10, 20],
              'lr': [0.1, 0.65, 0.8]}
gs = GridSearchCV(estimator=crossnet1, param_grid=param_grid,
                  refit=False, cv=3, scoring="accuracy")
gs.fit(inputs, labels)
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-41-e1f3dbd9a2b0> in <module>
3 labels1 = torch.from_numpy(np.array(labels))
4
----> 5 gs.fit(inputs1, labels1)
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
720 return results_container[0]
721
--> 722 self._run_search(evaluate_candidates)
723
724 results = results_container[0]
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1189 def _run_search(self, evaluate_candidates):
1190 """Search all candidates in param_grid"""
-> 1191 evaluate_candidates(ParameterGrid(self.param_grid))
1192
1193
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
709 for parameters, (train, test)
710 in product(candidate_params,
--> 711 cv.split(X, y, groups)))
712
713 all_candidate_params.extend(candidate_params)
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
915 # remaining jobs.
916 self._iterating = False
--> 917 if self.dispatch_one_batch(iterator):
918 self._iterating = self._original_iterator is not None
919
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
516 start_time = time.time()
517
--> 518 X_train, y_train = _safe_split(estimator, X, y, train)
519 X_test, y_test = _safe_split(estimator, X, y, test, train)
520
~\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in _safe_split(estimator, X, y, indices, train_indices)
201 X_subset = X[np.ix_(indices, train_indices)]
202 else:
--> 203 X_subset = safe_indexing(X, indices)
204
205 if y is not None:
~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in safe_indexing(X, indices)
214 indices.dtype.kind == 'i'):
215 # This is often substantially faster than X[indices]
--> 216 return X.take(indices, axis=0)
217 else:
218 return X[indices]
TypeError: take(): argument 'index' (position 1) must be Tensor, not numpy.ndarray
What is wrong?
Change your input and labels to np.ndarray (see examples here). Those will be cast to torch.Tensor automatically by skorch when needed. All in all, change your
inputs = Variable(x_traintensor)
labels = Variable(y_traintensor)
to:
inputs = x_traintensor.numpy()  # assuming x_traintensor is a torch.Tensor
labels = y_traintensor.numpy()  # assuming y_traintensor is a torch.Tensor
BTW. torch.autograd.Variable is deprecated; you should use torch.tensor(data, requires_grad=True) instead. In this case, inputs and labels do not need gradients, hence Variable is even more out of place.
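Putting the pieces together, a minimal sketch of the working grid search (assuming x_traintensor and y_traintensor are float tensors, with y shaped (n, 1) as BCELoss expects given the model above, and crossnet1 defined as in the question):
import numpy as np
from sklearn.model_selection import GridSearchCV

# sklearn's CV splitters index X and y with numpy integer arrays, which is
# exactly what Tensor.take() rejected; numpy inputs avoid the problem and
# skorch converts them back to tensors internally.
inputs = x_traintensor.numpy().astype(np.float32)
labels = y_traintensor.numpy().astype(np.float32)

param_grid = {'max_epochs': [5, 10, 20],
              'lr': [0.1, 0.65, 0.8]}
gs = GridSearchCV(estimator=crossnet1, param_grid=param_grid,
                  refit=False, cv=3, scoring='accuracy')
gs.fit(inputs, labels)
print(gs.best_params_)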
I'm trying to combine a textual column of movie plots I have in a dataset with a categorical column of each movie's rating (the MPAA rating: G, PG, PG-13, R; not an IMDb user score). I'm using sklearn's FeatureUnion object, but I keep getting an error about the fit_transform method being called with too many positional arguments. Here's my code:
# create training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    movie_ratings[['Genre', 'Plot']],
    pd.get_dummies(movie_ratings['Rated']),
    random_state=56)

''' create a processing pipeline and feature union '''
# create function transformers
get_genre_data = FunctionTransformer(lambda x: x['Genre'], validate=False)
get_plot_data = FunctionTransformer(lambda x: x['Plot'], validate=False)

# obtain the data
genres = get_genre_data.fit_transform(movie_ratings)
plots = get_plot_data.fit_transform(movie_ratings)

# join the processing in a feature union
join_data_formats = FeatureUnion(
    transformer_list=[
        ('genres', Pipeline([
            ('selector', get_genre_data),
            ('one_hot_encoder', LabelEncoder())
        ])),
        ('plots', Pipeline([
            ('selector', get_plot_data),
            ('count_vectorizer', CountVectorizer(tokenizer=nltk.tokenize)),
            ('tfidf_transformer', TfidfTransformer())
        ]))
    ]
)

# instantiate a nested pipeline
pipeline = Pipeline([
    ('feature_union', join_data_formats),
    ('neural_network', MLPClassifier(alpha=0.01, hidden_layer_sizes=(100,),
                                     early_stopping=False, verbose=True))
])

# fit the pipeline to the training data
pipeline.fit(X_train, y_train)
...and the error being thrown is:
34 # # fit the pipeline to the training data
---> 35 pipeline.fit(X_train, y_train)
...
TypeError: fit_transform() takes 2 positional arguments but 3 were given
Where am I going wrong? Thanks much for the help!
UPDATE: here's the full stack trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-171-f57d9b24a9c8> in <module>()
28 # print(y_test.shape)
29
---> 30 pipeline.fit(X_train, y_train)
31 y_pred = pipeline.predict(X_test)
32
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
246 This estimator
247 """
--> 248 Xt, fit_params = self._fit(X, y, **fit_params)
249 if self._final_estimator is not None:
250 self._final_estimator.fit(Xt, y, **fit_params)
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params)
211 Xt, fitted_transformer = fit_transform_one_cached(
212 cloned_transformer, None, Xt, y,
--> 213 **fit_params_steps[name])
214 # Replace the transformer of the step with the fitted
215 # transformer. This is necessary when loading the transformer
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\memory.py in __call__(self, *args, **kwargs)
360
361 def __call__(self, *args, **kwargs):
--> 362 return self.func(*args, **kwargs)
363
364 def call_and_shelve(self, *args, **kwargs):
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, weight, X, y, **fit_params)
579 **fit_params):
580 if hasattr(transformer, 'fit_transform'):
--> 581 res = transformer.fit_transform(X, y, **fit_params)
582 else:
583 res = transformer.fit(X, y, **fit_params).transform(X)
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
737 delayed(_fit_transform_one)(trans, weight, X, y,
738 **fit_params)
--> 739 for name, trans, weight in self._iter())
740
741 if not result:
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, weight, X, y, **fit_params)
579 **fit_params):
580 if hasattr(transformer, 'fit_transform'):
--> 581 res = transformer.fit_transform(X, y, **fit_params)
582 else:
583 res = transformer.fit(X, y, **fit_params).transform(X)
~\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
281 Xt, fit_params = self._fit(X, y, **fit_params)
282 if hasattr(last_step, 'fit_transform'):
--> 283 return last_step.fit_transform(Xt, y, **fit_params)
284 elif last_step is None:
285 return Xt
TypeError: fit_transform() takes 2 positional arguments but 3 were given
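The likely culprit here is the same quirk as in the Hands-On ML question above: LabelEncoder.fit_transform is defined as fit_transform(self, y) because it is meant for target labels, so when the 'genres' Pipeline calls fit_transform(X, y) a third positional argument is passed, which is exactly the TypeError shown. A minimal sketch of a fix, assuming Genre should be one-hot encoded as a feature and scikit-learn >= 0.20 (which accepts string categories), selects the column as a 2-D frame and uses OneHotEncoder instead:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# select a 2-D frame so OneHotEncoder gets the shape it expects
get_genre_data = FunctionTransformer(lambda x: x[['Genre']], validate=False)

join_data_formats = FeatureUnion(transformer_list=[
    ('genres', Pipeline([
        ('selector', get_genre_data),
        # OneHotEncoder accepts the (X, y) call a Pipeline makes;
        # LabelEncoder does not.
        ('one_hot_encoder', OneHotEncoder(handle_unknown='ignore'))
    ])),
    ('plots', Pipeline([
        ('selector', get_plot_data),
        ('count_vectorizer', CountVectorizer()),  # default tokenizer for simplicity
        ('tfidf_transformer', TfidfTransformer())
    ]))
])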