What is the difference or relationship between the Neural Network (NN) epoch and the max_iter parameter in scikit-learn?
For instance, in the code below I evaluate the NN model for max_iter from 1 up to 10000 and compute the Mean Absolute Error at each step. Can each of these iterations be seen as an epoch?
Thank you very much!
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

for i in range(1, 10000, 10):
    clf = MLPRegressor(max_iter=i, solver='lbfgs', alpha=1e-6,
                       activation='relu',  # relu improved training a lot
                       hidden_layer_sizes=hidden_layer_sizes, random_state=1)
    clf.fit(X_train_scaled, y_train)
    mae_B = cross_val_score(clf, X_train_scaled, y_train,
                            scoring="neg_mean_absolute_error", cv=10)
    print(i, float(-mae_B.mean()), clf.score(X_train_scaled, y_train),
          clf.score(X_test_scaled, y_test))
For the stochastic solvers ('sgd' and 'adam'), max_iter is the maximum number of epochs the model is trained for; for 'lbfgs' it counts optimizer iterations instead. It is a maximum because training can stop earlier, based on the other termination criteria, tol and n_iter_no_change. So instead of looping over different max_iter values, tweak tol and n_iter_no_change if you want to avoid overfitting.
Try the following: set a reasonably large number of epochs in max_iter, and then play with n_iter_no_change and tol. See the reference documentation for details.
clf = MLPRegressor(max_iter=50, solver='lbfgs', alpha=1e-6, activation='relu',
                   hidden_layer_sizes=hidden_layer_sizes, random_state=1,
                   tol=1e-3, n_iter_no_change=5)  # note: n_iter_no_change only takes effect with solver='sgd' or 'adam'
clf.fit(X_train_scaled, y_train)
mae_B = cross_val_score(clf, X_train_scaled, y_train,
                        scoring="neg_mean_absolute_error", cv=10)
print(float(-mae_B.mean()), clf.score(X_train_scaled, y_train),
      clf.score(X_test_scaled, y_test))
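Rather than refitting with ever-larger max_iter, note that a fitted estimator already records how long it actually trained. A minimal sketch, assuming X_train_scaled and y_train from the question:

from sklearn.neural_network import MLPRegressor

clf = MLPRegressor(max_iter=500, solver='adam', random_state=1)
clf.fit(X_train_scaled, y_train)
print(clf.n_iter_)       # iterations/epochs actually run before a stopping criterion fired
print(clf.loss_curve_)   # training loss after each epoch ('sgd'/'adam' solvers only)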
Related
I have 5 different sets of data and want to find evaluation metrics for every set using neural network regression. I realized that the R^2 drops on every loop iteration, in order. I'm pretty sure there is a point in my code that I need to identify, but I am not sure where.
My code:
import numpy as np
import pandas as pd
from scipy.special import rel_entr
from sklearn.model_selection import train_test_split, KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

for sensor, sensorlol, name in zip(sensors, sensorslol, names):
    x_train, x_test, y_train, y_test = train_test_split(sensor, reference, test_size=0.2, random_state=42)
    x_trainl, x_testl, y_trainl, y_testl = train_test_split(sensorlol, referencelol, test_size=0.2, random_state=42)
    kf = KFold(7, shuffle=True, random_state=42)
    ann = MLPRegressor(hidden_layer_sizes=int(node), activation='relu',
                       learning_rate='constant', learning_rate_init=initl, shuffle=False)
    ann.fit(x_train, y_train)
    m_predictionlol = cross_val_predict(ann, sensorlol, referencelol, cv=kf)
    R2lol = r2_score(referencelol, m_predictionlol)
    MAElol = mean_absolute_error(referencelol, m_predictionlol)
    RMSElol = mean_squared_error(referencelol, m_predictionlol)  # note: this is the MSE; wrap in np.sqrt for a true RMSE
    MBElol = np.mean(m_predictionlol - referencelol)
    r_2lol.append(R2lol)
    maelol.append(MAElol)
    rmselol.append(RMSElol)
    mbelol.append(MBElol)
    sumref = np.sum(referencelol)
    probref = referencelol / sumref
    sumtest = np.sum(m_predictionlol)
    probtest = m_predictionlol / sumtest
    KLlol = sum(rel_entr(probtest, probref))
    kllol.append(KLlol)
    del m_predictionlol, sensorlol

dataframe1 = pd.DataFrame(list(zip(lst, r_2lol, maelol, rmselol, mbelol, kllol)),
                          columns=['Sensor', 'R^2', 'MAE', 'RMSE', 'MBE', 'KL'])
And results:
Sensor R^2 MAE RMSE MBE KL
0 I 0.803568 1.776084 5.702426 0.097944 0.044695
1 H 0.739653 2.013070 7.557870 0.102656 0.053525
2 L 0.722556 2.074596 8.054198 -0.143503 0.058237
3 G 0.696291 2.193398 8.816680 0.261528 0.062377
4 J 0.677972 2.251240 9.348475 -0.000313 0.068745
I have added a random seed to reduce the randomness of the model, and I delete the model and the metric variables in every loop to avoid any carry-over.
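Note that the MLPRegressor in the loop above is itself unseeded; a minimal sketch of what seeding it as well might look like (random_state=42 is an arbitrary choice, matching the other seeds used here):

ann = MLPRegressor(hidden_layer_sizes=int(node), activation='relu',
                   learning_rate='constant', learning_rate_init=initl,
                   shuffle=False, random_state=42)  # fixed seed for weight initialization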
I am using TensorFlow and Keras for a binary classification problem.
I only have a training set of 81 samples (test size 21), but ~1900 features. I know that is too few samples for that many features, but it is a biological problem (gene-expression data), so I have to deal with it.
My model looks like this (using different numbers of neurons per layer, different numbers of hidden layers, regularization, and dropout to deal with the high-dimensional data):
import keras
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout

model = Sequential()
model.add(Input((input_shape,)))
for i in range(num_hidden):
    model.add(Dense(n_neurons, activation="relu",
                    kernel_regularizer=keras.regularizers.l1_l2(l1_reg, l2_reg)))
    model.add(Dropout(dropout_rate))
model.add(Dense(1, activation="sigmoid"))
ann_optimizer = keras.optimizers.Adam()
model.compile(loss="binary_crossentropy",
              optimizer=ann_optimizer, metrics=['accuracy'])
I am using a 10 fold nested cross validation and grid search in the inner fold like this:
import numpy as np
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import ShuffleSplit, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

# fit and evaluate the model
# configure the inner cross-validation procedure (5-fold: 80% inner training set, 20% inner test set)
cv_inner = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# define the model
ann = KerasRegressor(build_fn=regressionModel_sequential, input_shape=X_train.shape[1],
                     batch_size=batch_size)
# use a pipeline to prevent leaky preprocessing (StandardScaler fit on the 80% inner training set only)
pipe = Pipeline(steps=[('scaler', StandardScaler()), ('ann', ann)])
# define the grid search with inner CV to find good parameters
grid_search_result = GridSearchCV(
    pipe, param_grid, n_jobs=-1, cv=cv_inner, refit=True, verbose=0)
# refit=True refits a final model on the entire inner training set
# execute the search
grid_search_result.fit(X_train, y_train, ann__verbose=0)
logger.info('>>>>> est=%.3f, params=%s' % (grid_search_result.best_score_, grid_search_result.best_params_))
# rebuild the model with the best parameters to get a loss curve
ann_val = regressionModel_sequential(input_shape=X_train.shape[1],
                                     n_neurons=grid_search_result.best_params_['ann__n_neurons'],
                                     l1_reg=grid_search_result.best_params_['ann__l1_reg'],
                                     l2_reg=grid_search_result.best_params_['ann__l2_reg'],
                                     num_hidden=grid_search_result.best_params_['ann__num_hidden'],
                                     dropout_rate=grid_search_result.best_params_['ann__dropout_rate'])
# validation with the outer 20%
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
history = ann_val.fit(X_train, y_train, batch_size=batch_size, verbose=0,
                      validation_split=0.25, shuffle=True,
                      epochs=grid_search_result.best_params_['ann__epochs'])
plot_history(history, directory, i)
# use the best grid-search result to predict on the outer test set
y_predicted = ann_val.predict(X_test)
# print predictions
logger.info(y_predicted[:5])
logger.info(y_test[:5])
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_predicted))
mae = metrics.mean_absolute_error(y_test, y_predicted)  # was mean_squared_error, which contradicts the variable name
r_squared = metrics.r2_score(y_test, y_predicted)
My loss curve looks good, but the accuracy is very bad (loss and accuracy plots from one outer fold not reproduced here).
Does anyone have suggestions on what I could do to improve my results?
I also know that the underlying biological question is very hard and maybe not solvable at all.
I have written a simple neural network (MLPRegressor) to fit simple data-frame columns. To find an optimal architecture, I also defined it as a function, to see whether it converges to a pattern. But every time I run the model it gives me a different result than the previous run, and I do not know why. Since it is fairly difficult to make the question reproducible, I cannot post the data, but I can post the architecture of the network here:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def MLP():  # after 50
    nn = 30
    nl = 25
    a = 2
    s = 0
    learn = 2
    learn_in = 4.22220046e-05
    max_i = 1000
    return nn, nl, a, s, learn, learn_in, max_i

def process(df):
    y = df.iloc[:, -1]
    X = df.drop(columns=['col3'])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)
    return X_train, X_test, y_train, y_test

def MLPreg(x_train, y_train):
    nn, nl, a, s, learn, learn_in, max_i = MLP()
    act = ['identity', 'logistic', 'relu', 'tanh']  # 'identity' = linear
    activ = act[a]
    sol = ['lbfgs', 'sgd', 'adam']
    solv = sol[s]
    l_r = ['constant', 'invscaling', 'adaptive']
    lr = l_r[learn]
    model = MLPRegressor(hidden_layer_sizes=(nl, nn), activation=activ, solver=solv, alpha=0.00001, batch_size='auto',
                         learning_rate=lr, learning_rate_init=learn_in, power_t=0.5, max_iter=max_i, shuffle=True, random_state=None,
                         tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False,
                         validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
    model.fit(x_train, y_train)
    return model
Even though I have tried to turn off every setting that makes the model produce randomness, I receive a different MSE on every run.
The model below is the simplest version.
def MLPreg(x_train, y_train):
    model = MLPRegressor(hidden_layer_sizes=(100,),
                         activation='relu',
                         solver='adam',
                         alpha=0.0001,
                         batch_size='auto',
                         learning_rate='constant',
                         learning_rate_init=0.001,
                         power_t=0.4,
                         max_iter=100)
    model.fit(x_train, y_train)
    return model
The first time mse:
2.6935335013259937e-05
2.7836293087120013e-05
7.218691932722961e-05
4.950603795598673e-05
4.743424330664441e-06
The second time mse:
3.6520542498579784e-06
1.151821946860996e-05
3.0840569586230768e-06
1.4008729128558944e-05
9.326142225670172e-06
And so on.
Looking at the documentation of MLPRegressor, these two arguments are important for reproducible results:
shuffle : bool, default=True
    Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.
Set shuffle=False to have the same behavior between runs.
random_state : int, RandomState instance, default=None
    Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls.
Follow these instructions and set an int value for this argument. Neither of your code examples currently does that (one sets it to None, which is the non-reproducible default, the other omits it).
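A minimal sketch of your simplest version with both arguments pinned down (the specific values are arbitrary choices):

model = MLPRegressor(hidden_layer_sizes=(100,),
                     activation='relu',
                     solver='adam',
                     max_iter=100,
                     shuffle=False,    # identical sample order in every run
                     random_state=0)   # fixed seed for weight initialization and batch sampling
model.fit(x_train, y_train)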
I am trying different classifiers with different parameters on a dataset provided to us as part of a course project. We have to try to get the best performance on the dataset. The dataset is actually a reduced version of the Online News Popularity dataset.
I have tried SVM, Random Forest, and SVM with 5-fold cross-validation, and they all seem to give approximately 100% training accuracy, while the testing accuracy is between 60% and 70%. I think the testing accuracy is fine, but the training accuracy bothers me.
I would say maybe it is a case of overfitting, but none of my classmates seem to be getting similar results, so maybe the problem is with my code.
Here is the code for my cross-validation and random forest classifier. I would be very grateful if you could help me find out why I am getting such a high training accuracy.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def crossValidation(X_train, X_test, y_train, y_test, numSplits):
    skf = StratifiedKFold(n_splits=numSplits, shuffle=True)  # use the numSplits argument instead of a hardcoded 5
    Cs = np.logspace(-3, 3, 10)
    gammas = np.logspace(-3, 3, 10)
    ACC = np.zeros((10, 10))
    DEV = np.zeros((10, 10))
    for i, gamma in enumerate(gammas):
        for j, C in enumerate(Cs):
            acc = []
            for train_index, dev_index in skf.split(X_train, y_train):
                X_cv_train, X_cv_dev = X_train[train_index], X_train[dev_index]
                y_cv_train, y_cv_dev = y_train[train_index], y_train[dev_index]
                clf = SVC(C=C, kernel='rbf', gamma=gamma)
                clf.fit(X_cv_train, y_cv_train)
                acc.append(accuracy_score(y_cv_dev, clf.predict(X_cv_dev)))
            ACC[i, j] = np.mean(acc)
            DEV[i, j] = np.std(acc)
    i, j = np.argwhere(ACC == np.max(ACC))[0]
    clf1 = SVC(C=Cs[j], kernel='rbf', gamma=gammas[i], decision_function_shape='ovr')
    clf1.fit(X_train, y_train)
    y_predict_train = clf1.predict(X_train)
    y_pred_test = clf1.predict(X_test)
    print("Train Accuracy :: ", accuracy_score(y_train, y_predict_train))
    print("Test Accuracy :: ", accuracy_score(y_test, y_pred_test))

def randomForestClassifier(X_train, X_test, y_train, y_test):
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    y_predict_train = clf.predict(X_train)
    y_pred_test = clf.predict(X_test)
    print("Train Accuracy :: ", accuracy_score(y_train, y_predict_train))
    print("Test Accuracy :: ", accuracy_score(y_test, y_pred_test))
There are two usual reasons for training accuracy and testing accuracy to differ this much:
1. A different distribution of the training data and the testing data (because a part of the dataset was selected).
2. Overfitting of the model to the training data.
Since you already apply cross-validation, it seems you should think about another remedy.
I recommend that you apply some feature-selection or feature-reduction approach (like PCA) to tackle the overfitting problem; a sketch follows below.
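A minimal sketch of that idea, assuming X_train, y_train, X_test, y_test already exist (the number of components and the SVC settings are arbitrary choices, not tuned values):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# scale, project onto 20 principal components, then classify
pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=20)),
                 ('svc', SVC(C=1.0, kernel='rbf'))])
pipe.fit(X_train, y_train)
print("Train Accuracy :: ", accuracy_score(y_train, pipe.predict(X_train)))
print("Test Accuracy :: ", accuracy_score(y_test, pipe.predict(X_test)))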
I would like to test the accuracy by epoch in scikit-learn. However, so far, I have been unsuccessful.
This is the part of my code that classifies with MLPClassifier:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

NUM_EPOCHS = 1000
LOG_FOR_EVERY = 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = MLPClassifier(hidden_layer_sizes=(18, 175, 256), batch_size=528,
                    learning_rate_init=0.0001, beta_1=0.001,
                    beta_2=0.001, max_iter=1, warm_start=True)
for i in range(NUM_EPOCHS):
    clf.fit(X_train, y_train.ravel())
    if i % LOG_FOR_EVERY == 0:
        print(i, clf.score(X_test, y_test))  # log test accuracy every LOG_FOR_EVERY epochs
I have also made a graph of this result, but I need to make it continuous and further increase the accuracy.
Why is the accuracy not increasing?
MLPClassifier.fit does not accept an epochs argument; that keyword belongs to the Keras API, and passing it here raises a TypeError. In scikit-learn, the equivalent is to train one epoch at a time and score after each pass, which partial_fit does cleanly (it performs a single iteration over the given data per call).
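A minimal sketch, assuming X_train, X_test, y_train, y_test and NUM_EPOCHS from your question:

import numpy as np
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(18, 175, 256), batch_size=528,
                    learning_rate_init=0.0001)
classes = np.unique(y_train)  # partial_fit needs the full label set on the first call
accuracies = []
for epoch in range(NUM_EPOCHS):
    clf.partial_fit(X_train, y_train.ravel(), classes=classes)
    accuracies.append(clf.score(X_test, y_test))  # test accuracy after each epoch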