k-fold cross validation using tensorflow

I have created an artificial neural network and am trying to estimate its accuracy with k-fold cross-validation. After I run the last line, execution does not progress any further; it has been stuck there for more than 20 minutes. I cannot figure out where I am going wrong. Can anyone help? Below is the code I have used.
Thanks in advance.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X=X[:,1:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential #required to initialize ann
from keras.layers import Dense #required to build the layers of ann
def build_classifier():
    classifier=Sequential()
    classifier.add(Dense(kernel_initializer="uniform", activation="relu", input_dim=11, units=6))
    classifier.add(Dense(kernel_initializer="uniform", activation="relu", units=6))
    classifier.add(Dense(kernel_initializer="uniform", activation="sigmoid",units=1))
    classifier.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
    return classifier
classifier=KerasClassifier(build_fn=build_classifier, batch_size=10, epochs=100)
accuracies=cross_val_score(estimator=classifier,X=X_train,y=y_train,cv=10,n_jobs=-1)

I had the same issue with the exact same code. It seems Windows has a problem with "n_jobs": if you remove that argument from the "accuracies = .." line, it will start working. It may take a long time to run, but it will work and show each epoch being updated.
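A minimal sketch of the change described above, assuming scikit-learn is installed. To keep it self-contained, a LogisticRegression and synthetic data from make_classification stand in for the KerasClassifier and the churn dataset from the question; both stand-ins are assumptions and not part of the original code. The only relevant point is that n_jobs is left at its default, so the folds are evaluated one after another in the current process.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 200 samples with 11 features,
# matching the input_dim=11 used in the question.
X_train, y_train = make_classification(n_samples=200, n_features=11, random_state=0)

# Stand-in estimator (assumption); in the question this is the KerasClassifier.
classifier = LogisticRegression(max_iter=1000)

# n_jobs is omitted, so the 10 folds run sequentially in this process
# instead of being dispatched to parallel workers.
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)

# Mean and spread of the per-fold accuracies.
print(accuracies.mean(), accuracies.std())
```

With the real KerasClassifier the call would look the same apart from the estimator, and each fold would then print its own epoch progress.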

Related

The first argument to `Layer.call` must always be passed

import os
from pylab import rcParams
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set()
from numpy import *
from scipy import stats
from pandas.plotting import scatter_matrix
import sklearn
import warnings
from imblearn.over_sampling import SMOTE
import tensorflow as tf
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
data = pd.read_excel(r'Attrition Data Exercise.xlsx')
X = data.iloc[:, 3:-1].values
y = data.iloc[:, -1].values
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder
ct = ColumnTransformer(
    transformers=[
        ('one_encoder', OneHotEncoder(), [2, 5, 11, 13, 28]),
        ('ord_encoder', OrdinalEncoder(), [0])],
    remainder='passthrough')
X = np.array(ct.fit_transform(X))
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dropout(rate=0.3))
ann.add(tf.keras.layers.Dense(units=6, activation='relu', kernel_regularizer='l1', bias_regularizer='l2'))
ann.add(tf.keras.layers.Dropout(rate=0.3))
ann.add(tf.keras.layers.Dense(units=3, activation='relu', kernel_regularizer='l1', bias_regularizer='l2'))
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
opt = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-08)
ann.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy', tf.keras.metrics.Recall()])
The code above runs successfully. The error occurs when I run the following code in a cell:
pipe = Pipeline([('smt', SMOTE()), ('model', KerasClassifier(build_fn = ann, verbose = 0, epochs=170))])
weights = np.linspace(0.5, 0.5, 1)
gsc = GridSearchCV(
    estimator = pipe,
    param_grid = {
        'smt__sampling_strategy' : weights
    },
    scoring = 'f1',
    cv = 4)
grid_result = gsc.fit(X_train, y_train)
The code above results in the following error:
ValueError: The first argument to `Layer.call` must always be passed
Any idea what I might be doing wrong or what can be improved?
I tried replacing KerasClassifier with KerasRegressor just to see whether anything would change, but nothing did. What is actually going wrong?
I'm trying to use the Pipeline class from imblearn together with GridSearchCV to find the best parameters for classifying an imbalanced dataset. I want to resample only the training set and leave the validation set untouched, which imblearn's Pipeline seems to do. However, I'm getting an error while implementing the accepted solution.
A screenshot of the complete error trace is attached.

@danr got it right, many thanks to him. I was getting the same error when using KerasClassifier with sklearn's cross_val_score. Passing a lambda as build_fn solved the problem. I had a function create_model that created a Keras Sequential model. Corrected code that runs smoothly (tensorflow 2.4.1):
from sklearn.model_selection import cross_val_score
# Create a KerasClassifier using best params determined using RandomizedSearchCV above
model = KerasClassifier(build_fn = lambda: create_model(learning_rate = 0.01, activation = 'tanh'), epochs = 50, batch_size = 32, verbose = 0)
# Calculate the accuracy score for each fold
kfolds = cross_val_score(model, X, y, cv = 3)
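The idea behind the fix, as far as I understand it, is that build_fn should be something that builds a model when called, not a model that has already been built (as with build_fn = ann in the question) and not the return value of the factory. The sketch below illustrates only that calling convention in plain Python. create_model here is a hypothetical stand-in that returns a dict rather than a Keras model, and build_fn_def is an assumed name for a plain function that does the same job as the lambda.

```python
def create_model(learning_rate=0.001, activation="relu"):
    # Hypothetical stand-in for a function that builds and compiles a Keras model.
    return {"learning_rate": learning_rate, "activation": activation}

# Not a factory: this is one object, built once, before any fold is run.
built_once = create_model(learning_rate=0.01, activation="tanh")

# A factory: nothing is built until the lambda is called.
build_fn_lambda = lambda: create_model(learning_rate=0.01, activation="tanh")

# An equivalent factory written as a plain function.
def build_fn_def():
    return create_model(learning_rate=0.01, activation="tanh")

# Each call yields a new, independent object with the same settings,
# which is what a cross-validation loop needs for every fold.
first = build_fn_lambda()
second = build_fn_lambda()
third = build_fn_def()
print(first is second, first == second == third)
```

Either form can be passed as build_fn; the lambda is simply the shortest way to fix the arguments of create_model.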

What does the error mean and how to fix it - "ValueError: query data dimension must match training data dimension"

I am trying to write the code for K-NN.
Below is my code. I know that the issue is in `predict()`, but I am not able to figure out how to fix it.
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('UniversalBank.csv')
X = dataset.iloc[:, [1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13]].values
y = dataset.iloc[:,9].values
#Splitting the dataset to training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state= 0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting the classifier to training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train,y_train)
#Predicting the test results
y_pred = classifier.predict(X_test)

Trouble importing Keras

Here is the complete code.
The top part runs fine until I import keras.
I have tried uninstalling and reinstalling keras, but the error is still there.
# Classification template
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
#Removing 1 Dummy Variable to avoid Dummy Variable Trap
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2: Let's make the ANN
#Importing the keras library
import keras.backend
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()

AttributeError: module 'tensorflow.python.keras.backend' has no attribute 'get_graph'
The solution (as found in the comments) was to install keras version 2.2.4, e.g.:
pip install 'keras==2.2.4'
If you are on a later version, you may try using this function instead:
keras.backend.image_data_format()

ValueError in DL

I get this error after choosing the number of epochs: ValueError: Input arrays should have the same number of samples as target arrays. Found 5516 input samples and 12870 target samples. Any suggestions are welcome. Thanks in advance.
I'm using a dataset with many categorical variables, which add up to 95 columns after creating the dummy variables. The code runs flawlessly up to the point where I choose the number of epochs, and then I get this error. What is the reason for it? It matters to me for two reasons: first, knowing the cause would help me in the future, and second, I'm unable to proceed.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('zrpl_data.csv')
X = dataset.iloc[:, 0:6].values
y = dataset.iloc[:, 6].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])
labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_3.fit_transform(X[:, 2])
labelencoder_X_4 = LabelEncoder()
X[:, 3] = labelencoder_X_4.fit_transform(X[:, 3])
labelencoder_X_5 = LabelEncoder()
X[:, 4] = labelencoder_X_5.fit_transform(X[:, 4])
onehotencoder = OneHotEncoder(categorical_features = [0,1,2,3,4])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,
random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
import keras
from keras.models import Sequential
from keras.layers import Dense
classifier = Sequential()
classifier.add(Dense(output_dim=47,
init='uniform',activation='relu',input_dim=95))
classifier.add(Dense(output_dim=47, init='uniform',activation='relu'))
classifier.add(Dense(output_dim=1,
init='uniform',activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit (X_train, y_train, batch_size=10,nb_epoch=100)

You have 5516 input (feature) samples and 12870 target samples, and the two counts must be equal. Before training the model, double-check the dimensions of both arrays.
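One way to do that check, sketched under the assumption that the inputs and targets are NumPy arrays (or anything np.shape accepts). check_same_samples is a hypothetical helper, not a Keras or scikit-learn function, and the toy arrays only stand in for the real X_train and y_train.

```python
import numpy as np

def check_same_samples(X, y):
    """Raise a ValueError if X and y disagree on the number of samples (rows)."""
    n_inputs, n_targets = np.shape(X)[0], np.shape(y)[0]
    if n_inputs != n_targets:
        raise ValueError(f"X has {n_inputs} samples but y has {n_targets}")
    return n_inputs

# Toy arrays (assumption): 8 rows of 3 features, with 8 matching targets.
X_train = np.zeros((8, 3))
y_train = np.zeros(8)
print(X_train.shape, y_train.shape)
n = check_same_samples(X_train, y_train)

# A mismatch like the one in the question is caught before fit() is ever called.
try:
    check_same_samples(np.zeros((5, 3)), np.zeros(12))
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

Printing the two shapes right after each preprocessing step usually shows which step changed the row count of one array but not the other.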

Error pickling scikit-learn model

I am not able to pickle my model below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
script_dir = os.path.dirname(__file__)
abs_file_path = os.path.join(script_dir, 'Churn_Modelling.csv')
# Importing the dataset
dataset = pd.read_csv(abs_file_path)
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features=[1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2 - Now let's make the ANN!
# Importing the Keras libraries and packages
from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras import backend
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
# Adding the output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size=10, epochs=100, validation_split=0.1)
# Part 3 - Making predictions and evaluating the model
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Predicting a single new observation
new_prediction = classifier.predict(sc.transform(np.array([[0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))
new_prediction = (new_prediction > 0.5)
I have tried using
from sklearn.externals import joblib
joblib.dump(classifier, 'model.pkl')
and
import pickle
with open('classifier.pkl', 'wb') as fid:
    pickle.dump(classifier, fid, 2)
With both, I get PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed
What am I doing wrong? Your insights are much appreciated.
