I have written a simple neural network (MLPRegressor) to fit some simple data frame columns. To find an optimal architecture, I also defined it as a function to see whether it converges to a pattern. But every time I run the model, it gives me a different result than the previous run, and I do not know why. Since it is fairly difficult to make the question reproducible, I cannot post the data, but I can post the architecture of the network here:
def MLP():  # After 50
    nn = 30
    nl = 25
    a = 2            # index into the activation list below
    s = 0            # index into the solver list below
    learn = 2        # index into the learning-rate schedule list below
    learn_in = 4.22220046e-05
    max_i = 1000
    return nn, nl, a, s, learn, learn_in, max_i
def process(df):
    y = df.iloc[:, -1]
    X = df.drop(columns=['col3'])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)
    return X_train, X_test, y_train, y_test
def MLPreg(x_train, y_train):
    nn, nl, a, s, learn, learn_in, max_i = MLP()
    act = ['identity', 'logistic', 'relu', 'tanh']  # 'identity' = linear
    activ = act[a]
    sol = ['lbfgs', 'sgd', 'adam']
    solv = sol[s]
    l_r = ['constant', 'invscaling', 'adaptive']
    lr = l_r[learn]
    model = MLPRegressor(hidden_layer_sizes=(nl, nn), activation=activ, solver=solv, alpha=0.00001, batch_size='auto',
                         learning_rate=lr, learning_rate_init=learn_in, power_t=0.5, max_iter=max_i, shuffle=True, random_state=None,
                         tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False,
                         validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
    model.fit(x_train, y_train)
    return model
Even though I have tried to turn off every setting that makes the model behave randomly, I get a different MSE on every run. The model below is the simplest version.
def MLPreg(x_train, y_train):
    model = MLPRegressor(hidden_layer_sizes=(100,),
                         activation='relu',
                         solver='adam',
                         alpha=0.0001,
                         batch_size='auto',
                         learning_rate='constant',
                         learning_rate_init=0.001,
                         power_t=0.4,
                         max_iter=100)
    model.fit(x_train, y_train)
    return model
The MSE values from the first run:
2.6935335013259937e-05
2.7836293087120013e-05
7.218691932722961e-05
4.950603795598673e-05
4.743424330664441e-06
The MSE values from the second run:
3.6520542498579784e-06
1.151821946860996e-05
3.0840569586230768e-06
1.4008729128558944e-05
9.326142225670172e-06
And so on.
Looking at the documentation of MLPRegressor, these two arguments are important for reproducible results:
shuffle : bool, default=True
    Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.
Set shuffle=False to have the same behavior between runs.
random_state : int, RandomState instance, default=None
    Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls.
Follow these instructions and set an int value for this argument. Neither of your code examples currently does that (one sets it to None, which is the non-reproducible default, the other omits it).
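For example, here is a minimal sketch on synthetic placeholder data (an assumption, since the original data is not posted) showing that passing an int random_state makes repeated fits give the same MSE:
# Minimal sketch on synthetic data: an int random_state fixes weight
# initialization and batch sampling, so repeated runs give the same MSE.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X @ np.array([0.5, -1.0, 2.0]) + 0.01 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)

model = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam',
                     max_iter=1000, random_state=0)  # int seed -> reproducible results
model.fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))  # identical value on every run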
Related
I have 5 different sets of data and want to find evaluation metrics for every set using neural network regression. I noticed that the R^2 drops on every loop iteration in a monotonic manner. I'm pretty sure there is a point I need to identify, but I'm not sure where in my code.
My code:
for sensor, sensorlol, name in zip(sensors, sensorslol, names):
    x_train, x_test, y_train, y_test = train_test_split(sensor, reference, test_size=0.2, random_state=42)
    x_trainl, x_testl, y_trainl, y_testl = train_test_split(sensorlol, referencelol, test_size=0.2, random_state=42)
    kf = KFold(7, shuffle=True, random_state=42)
    ann = MLPRegressor(hidden_layer_sizes=int(node), activation='relu',
                       learning_rate='constant', learning_rate_init=initl, shuffle=False)
    ann.fit(x_train, y_train)
    m_predictionlol = cross_val_predict(ann, sensorlol, referencelol, cv=kf)
    R2lol = r2_score(referencelol, m_predictionlol)
    MAElol = mean_absolute_error(referencelol, m_predictionlol)
    RMSElol = mean_squared_error(referencelol, m_predictionlol)  # note: this computes MSE, not RMSE
    MBElol = np.mean(m_predictionlol - referencelol)
    r_2lol.append(R2lol)
    maelol.append(MAElol)
    rmselol.append(RMSElol)
    mbelol.append(MBElol)
    sumref = np.sum(referencelol)
    probref = referencelol / sumref
    sumtest = np.sum(m_predictionlol)
    probtest = m_predictionlol / sumtest
    KLlol = sum(rel_entr(probtest, probref))
    kllol.append(KLlol)
    del m_predictionlol, sensorlol

dataframe1 = pd.DataFrame(list(zip(lst, r_2lol, maelol, rmselol, mbelol, kllol)),
                          columns=['Sensor', 'R^2', 'MAE', 'RMSE', 'MBE', 'KL'])
And results:
Sensor R^2 MAE RMSE MBE KL
0 I 0.803568 1.776084 5.702426 0.097944 0.044695
1 H 0.739653 2.013070 7.557870 0.102656 0.053525
2 L 0.722556 2.074596 8.054198 -0.143503 0.058237
3 G 0.696291 2.193398 8.816680 0.261528 0.062377
4 J 0.677972 2.251240 9.348475 -0.000313 0.068745
I have added a random seed to reduce the randomness of the model, and I delete the model outputs and metric variables on every loop to avoid any carry-over.
I am building an MLP using TensorFlow 2.0. I am plotting the learning curve and also using keras.evaluate on both the training and test data to see how well it performs. The code I'm using:
history = model.fit(X_train, y_train, batch_size=32,
                    epochs=200, validation_split=0.2, verbose=0)

# evaluate the model
eval_result_tr = model.evaluate(X_train, y_train)
eval_result_te = model.evaluate(X_test, y_test)
print("[training loss, training accuracy]:", eval_result_tr)
print("[test loss, test accuracy]:", eval_result_te)
# [training loss, training accuracy]: [0.5734676122665405, 0.9770742654800415]
# [test loss, test accuracy]: [0.7273344397544861, 0.9563318490982056]

# plot the learning curve
import matplotlib.pyplot as plt
plt.plot(history.history["loss"], label='eğitim')          # Turkish: 'training'
plt.plot(history.history['val_loss'], label='doğrulama')   # Turkish: 'validation'
plt.xlabel("Öğrenme ivmesi")                                # Turkish: 'learning momentum' (x-axis is the epoch index)
plt.ylabel("Hata payı")                                     # Turkish: 'error' (the loss)
plt.title("Temel modelin öğrenme eğrisi")                   # Turkish: 'learning curve of the base model'
plt.legend()
The output is:
My question is: how does keras.evaluate() calculate the training loss to be 0.5734676122665405? I take the average of history.history["loss"], but it returns a different value (0.7975356701016426).
Or am I mistaken to begin with in trying to evaluate the model's performance on the training data with eval_result_tr = model.evaluate(X_train, y_train)?
For community benefit, adding @Dr. Snoopy's answer here in the answer section:
This has been asked many times before: the loss you see is computed with changing weights during training, while evaluating outside of training uses fixed weights, so you will always see a different loss value.
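To illustrate the point, here is a small sketch on synthetic data (not the OP's model or data): the per-epoch entries of history.history["loss"] are averaged over batches while the weights are still changing, whereas model.evaluate() does a single pass with the final, fixed weights, so the numbers differ.
# Sketch on synthetic data: compare the loss recorded during fit() with the loss
# computed by evaluate() using the final weights.
import numpy as np
from tensorflow import keras

rng = np.random.RandomState(0)
X = rng.rand(500, 10).astype("float32")
y = (X.sum(axis=1) > 5).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print("mean of per-epoch losses:", np.mean(history.history["loss"]))   # averaged over changing weights
print("last epoch loss:         ", history.history["loss"][-1])
print("evaluate() on same data: ", model.evaluate(X, y, verbose=0)[0])  # single pass, fixed final weights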
I am using TensorFlow and Keras for a binary classification problem.
I have only a training set of 81 samples (test size 21), but ~1900 features. I know these are too few samples and too many features, but it is a biological problem (gene-expression data), so I have to deal with it.
My model looks like this (using a varying number of neurons per layer, a varying number of hidden layers, regularization, and dropout to deal with the high-dimensional data):
model = Sequential()
model.add(Input((input_shape,)))
for i in range(num_hidden):
    model.add(Dense(n_neurons, activation="relu",
                    kernel_regularizer=keras.regularizers.l1_l2(l1_reg, l2_reg)))
    model.add(Dropout(dropout_rate))
model.add(Dense(1, activation="sigmoid"))
ann_optimizer = keras.optimizers.Adam()
model.compile(loss="binary_crossentropy",
              optimizer=ann_optimizer, metrics=['accuracy'])
I am using 10-fold nested cross-validation with a grid search in the inner fold, like this:
# fit and evaluate the model
# configure the inner cross-validation procedure (5-fold; 80% inner training, 20% inner test)
cv_inner = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# define the model
ann = KerasRegressor(build_fn=regressionModel_sequential, input_shape=X_train.shape[1],
                     batch_size=batch_size)
# use a pipeline to prevent leaky preprocessing (StandardScaler fitted on the 80% inner-training dataset)
pipe = Pipeline(steps=[('scaler', StandardScaler()), ('ann', ann)])
# define the grid search with inner cv to get good parameters
grid_search_result = GridSearchCV(
    pipe, param_grid, n_jobs=-1, cv=cv_inner, refit=True, verbose=0)
# refit=True fits a final model on the entire inner-training dataset
# execute the search
grid_search_result.fit(X_train, y_train, ann__verbose=0)
logger.info('>>>>> est=%.3f, params=%s' % (grid_search_result.best_score_, grid_search_result.best_params_))
# rebuild the best model to get a loss curve
ann_val = regressionModel_sequential(input_shape=X_train.shape[1],
                                     n_neurons=grid_search_result.best_params_['ann__n_neurons'],
                                     l1_reg=grid_search_result.best_params_['ann__l1_reg'],
                                     l2_reg=grid_search_result.best_params_['ann__l2_reg'],
                                     num_hidden=grid_search_result.best_params_['ann__num_hidden'],
                                     dropout_rate=grid_search_result.best_params_['ann__dropout_rate'])
# validation with the outer 20%
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
history = ann_val.fit(X_train, y_train, batch_size=batch_size, verbose=0,
                      validation_split=0.25, shuffle=True,
                      epochs=grid_search_result.best_params_['ann__epochs'])
plot_history(history, directory, i)
# use the best grid-search result for predicting on the outer test dataset
y_predicted = ann_val.predict(X_test)
# print predictions
logger.info(y_predicted[:5])
logger.info(y_test[:5])
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_predicted))
mae = metrics.mean_squared_error(y_test, y_predicted)  # note: this computes MSE, not MAE
r_squared = metrics.r2_score(y_test, y_predicted)
My loss seems good (see the loss plot), but the accuracy is very bad (see the accuracy plot; example from one outer fold).
Does anyone have suggestions on what I could do to improve my results?
I also know that the underlying biological question is very hard, maybe impossible, to solve.
What is the difference or relationship between the Neural Network (NN) epoch and the max_iter parameter in scikit-learn?
For instance, as can be seen in the code below, can evaluating the NN model for max_iter from 1 up to 10000, computing the Mean Absolute Error at each value, be seen as evaluating per epoch? See the image/link below, please!
Thank you very much!
for i in range(1, 10000, 10):
    clf = MLPRegressor(max_iter=i, solver='lbfgs', alpha=1e-6, activation='relu',  # relu improved training a lot
                       hidden_layer_sizes=hidden_layer_sizes, random_state=1)
    clf.fit(X_train_scaled, y_train)
    mae_B = cross_val_score(clf, X_train_scaled, y_train, scoring="neg_mean_absolute_error", cv=10)
    print(i, float(-mae_B.mean()), clf.score(X_train_scaled, y_train), clf.score(X_test_scaled, y_test))
max_iter is equivalent to the maximum number of epochs you want the model to be trained for. It is called a maximum because learning can also stop before reaching that number of iterations, based on other termination criteria (n_iter_no_change). Hence, do not loop over different values of max_iter; instead, tweak tol and n_iter_no_change if you want to avoid overfitting.
Try the following: set a reasonably large number of epochs in max_iter and then play with n_iter_no_change and tol. Reference Doc.
clf = MLPRegressor(max_iter=50, solver='lbfgs', alpha=1e-6, activation='relu',
                   hidden_layer_sizes=hidden_layer_sizes, random_state=1,
                   tol=1e-3, n_iter_no_change=5)
clf.fit(X_train_scaled, y_train)
mae_B = cross_val_score(clf, X_train_scaled, y_train, scoring="neg_mean_absolute_error", cv=10)
print(i, float(-mae_B.mean()), clf.score(X_train_scaled, y_train), clf.score(X_test_scaled, y_test))
I am using the following fit function:
history = model.fit(x=[X1_train, X2_train, X3_train],
                    y=y_train,
                    batch_size=50,
                    epochs=20,
                    verbose=2,
                    validation_split=0.3,
                    # validation_data=([X1_test, X2_test, X3_test], y_test),
                    class_weight={0: 1, 1: 10})
and getting an average val_acc of 0.7. But when running again, this time with the validation_data option (using data from the same dataset that I had kept aside, around 30% of the training data in size), I get an average val_acc of 0.35. Are there any reasons for such a difference?
As requested by the OP, I am posting my comment as an answer and will try to elaborate more:
When you set the validation_split argument, the validation samples are selected from the last samples in the training data and labels (i.e. X_train and y_train). Now, in this specific case, if the proportion of class labels in these selected samples is not the same as the proportion of class labels in the data you provide via the validation_data argument, then you should not necessarily expect the validation loss to be the same in these two cases. And that's simply because your model may have different accuracy on each of the classes.
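One rough way to check this (a sketch, assuming y_train and y_test are array-like binary labels, as in the question's setup) is to compare the class proportions in the last 30% of y_train, which is what validation_split=0.3 evaluates on, with the proportions in the held-out labels passed via validation_data:
# Sketch: compare label proportions of the implicit validation_split slice
# (Keras takes the *last* fraction of the training arrays, without shuffling)
# with the labels of the explicit hold-out set.
import numpy as np

def class_proportions(labels):
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values, counts / counts.sum()))

val_fraction = 0.3
split_point = int(len(y_train) * (1 - val_fraction))
print("validation_split slice:", class_proportions(y_train[split_point:]))
print("explicit validation set:", class_proportions(y_test))
If these two distributions differ noticeably, a gap between the two val_acc values is expected, and stratifying the hold-out split (or the training data order) would make the comparison fairer.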