Can I optimize three parameters with a DistributionLambda layer?

I need to create a neural network layer to optimize the three parameters of a GeneralizedExtremeValue distribution (loc, scale, and concentration), but when more than one parameter is passed to a DistributionLambda layer, all training metrics are nan and the output distribution is empty.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import tensorflow_probability as tfp
import tensorflow as tf
tfd = tfp.distributions
tfkl = tf.keras.layers
tfpl = tfp.layers
# INPUT DATASETS ==========================================================
# creating a GEV distribution
dist = tfd.GeneralizedExtremeValue(loc=2, scale=1, concentration=0.1)
# sampling the GEV to create a dataset
dataset = dist.sample(10**(5))
# creating some noise on the sample
def add_eps(values):
    eps = np.random.randn(len(values))
    return values + eps
x_train = 0.75 * add_eps(dataset[:(8*10**4)])
y_train = dataset[:(8*10**4)]
x_test = 0.75 * add_eps(dataset[(8*10**4):])
# plotting the input datasets
fig = plt.figure()
sns.histplot(dataset, bins=100,
             stat='probability', kde=True,
             color='r', label='Sample from GEV: y_train')
sns.histplot(x_train, bins=100,
             stat='probability', kde=True,
             color='b', label='Modified sample from GEV: x_train')
plt.legend()
plt.show()
Plotting x_train and y_train suggests the input datasets are correct. Even so, the network outputs a distribution whose mean and stddev are nan. I tried changing the batch_size, the epochs, and the number of outputs of the Dense layer, but nothing worked.
# CREATING BAYESIAN NETWORK ================================================
# model architecture
model = tf.keras.Sequential([
    tfkl.Dense(3, input_shape=(1,)),
    tfpl.DistributionLambda(
        lambda t: tfd.GeneralizedExtremeValue(loc=t[0],
                                              scale=t[1],
                                              concentration=t[2]))
])
negloglik = lambda y_true, y_pred: -y_pred.log_prob(y_true)
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01),
              loss=negloglik)
# Model fitting
history = model.fit(x_train, y_train,
                    validation_split=0.2, epochs=100,
                    verbose=True, shuffle=True,
                    batch_size=300)
# Predictions
y_pred_mean = model(x_test.numpy().reshape(-1, 1)).mean()
y_pred_std = model(x_test.numpy().reshape(-1, 1)).stddev()

Try adding some constraints to the parameters. I'm not familiar with this distribution in particular, but I'm quite sure sigma (scale) must be greater than zero, so in your case:
scale = 1e-3 + tf.math.softplus(0.01 * t[1])
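A minimal sketch of the constrained model (not tested on this exact setup): note that t coming out of the Dense layer has shape (batch, 3), so t[1] indexes the batch dimension; slicing the last axis with t[..., i] is assumed here to be the intended per-parameter indexing.
# Sketch: constrain scale to be positive and slice the last axis of t.
model = tf.keras.Sequential([
    tfkl.Dense(3, input_shape=(1,)),
    tfpl.DistributionLambda(
        lambda t: tfd.GeneralizedExtremeValue(
            loc=t[..., 0],
            scale=1e-3 + tf.math.softplus(0.01 * t[..., 1]),  # keep scale > 0
            concentration=t[..., 2]))
])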

Related

Python TensorFlow issue with fitting a line to data

I've recently been trying to use TensorFlow in my projects, and I attempted to follow the Basic Regression Using Keras guide. However, I am having issues with fitting the line to the data (loss and prediction vs. data). I've normalized my data and run it through 1000 epochs, and the data seems fine. Here is the data and the code I've used. Does anyone know why the prediction is so different from the data?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)
train_dataset = df.sample(frac=0.8, random_state = 0)
test_dataset = df.drop(train_dataset.index)
train_dataset.describe().transpose()
train_features = train_dataset.copy()
test_features = test_dataset.copy()
train_labels = train_features.pop('Max')
test_labels = test_features.pop('Max')
train_dataset.describe().transpose()[['mean','std']]
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(train_features))
print(normalizer.mean.numpy())
first = np.array(train_features[:1])
with np.printoptions(precision=2, suppress=True):
    print('First example:', first)
    print()
    print('Normalized:', normalizer(first).numpy())
date = np.array(train_features['Date Lifted'])
date_normalizer = layers.Normalization(input_shape=[1,], axis=None)
date_normalizer.adapt(date)
date_model = tf.keras.Sequential([
    date_normalizer,
    layers.Dense(units=1)
])
date_model.summary()
date_model.predict(date[:10])
date_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='mean_absolute_error')
%%time
history = date_model.fit(
    train_features['Date Lifted'],
    train_labels,
    epochs=100,
    # Suppress logging.
    verbose=0,
    # Calculate validation results on 20% of the training data.
    validation_split=0.2)
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
def plot_loss(history):
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.ylim([0, 1000])
    plt.xlabel('Epoch')
    plt.ylabel('Error [Max]')
    plt.legend()
    plt.grid(True)
plot_loss(history)
test_results = {}
test_results['date_model'] = date_model.evaluate(
    test_features['Date Lifted'],
    test_labels, verbose=0)
x = tf.linspace(0, 250, 251)
y = date_model.predict(x)
def plot_horsepower(x, y):
    plt.scatter(train_features['Date Lifted'], train_labels, label='Data')
    plt.plot(x, y, color='k', label='Predictions')
    plt.xlabel('Date Lifted')
    plt.ylabel('Max')
    plt.legend()
plot_horsepower(x, y)
Your model only has a single neuron. I imagine you'll see better results with a more complex model: more layers and more units per layer, something like the sketch below.
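As a minimal sketch (the two 64-unit hidden layers are an arbitrary assumption, not a tuned architecture), a deeper model on the same normalized input might look like:
# Sketch: same normalizer, plus two hidden layers before the output.
date_model = tf.keras.Sequential([
    date_normalizer,
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(units=1)
])
date_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='mean_absolute_error')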

How can I plot the actual vs. predicted values for a neural network?

I am trying to plot the actual vs. predicted values of the neural network I created using Keras. What I want exactly is a scatter plot of my data and the best-fit curve for the training and testing datasets. Below is the code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
import os
#Load the dataset from excel
data = pd.read_csv('C:\\Excel Files\\Neural Network\\diabetes1.csv', sep=';')
#Viewing the Data
data.head(5)
import seaborn as sns
data['Outcome'].value_counts().plot(kind = 'bar')
#Split into input (x) and output (y) variables
predictors = data.iloc[:, 0:8]
response = data.iloc[:, 8]
#Create training and testing vars
X_train, X_test, y_train, y_test = train_test_split(predictors, response, test_size=0.2)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
# Define the Keras model - layer-by-layer Sequential model
kerasmodel = Sequential()
kerasmodel.add(Dense(12, input_dim=8, activation='relu')) # First hidden layer: 12 neurons, 8 inputs, ReLU activation
kerasmodel.add(Dense(8, activation='relu')) # ReLU to avoid the vanishing/exploding gradient problem
kerasmodel.add(Dense(1, activation='sigmoid')) # Output layer; sigmoid since the output is binary
# Note: weight and bias initialization are done by Keras's default method, 'glorot_uniform'
# Compiling the model
kerasmodel.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fitting the model
kerasmodel.fit(X_train, y_train, epochs=50, batch_size=10)
# Train accuracy
_, accuracy = kerasmodel.evaluate(X_train, y_train)
print('Train Accuracy: %.2f' % (accuracy*100))
I want to have plots like these:
From what I understood from your question, this will give you a scatter plot of the actual_y values and a line plot of the predicted_y values:
import matplotlib.pyplot as plt
plt.plot(predicted_y, linestyle='dotted')
x = []
for i in range(0, len(actual_y)):
    x.append(i)
plt.scatter(x, actual_y)
plt.show()
Predict for both x_train and x_test with the model, then draw sns.regplot() (after import seaborn as sns) with the actual y values on the horizontal axis and the predicted values on the vertical axis, as two separate plots for the train and test sets. It plots the scattered points together with a regression line; if the slope is equal to 1 and the intercept equal to 0, or close to those values, the model fits very well.
For more detail, it is better to also compute scipy.stats.linregress for both the train and test sets to derive the slope and intercept, as in the sketch below.
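A minimal sketch of this approach, assuming y_train, y_pred_train, y_test, and y_pred_test already hold the actual and predicted values as 1-D arrays:
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import linregress
# Separate actual-vs-predicted plots for the train and test sets.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.regplot(x=y_train, y=y_pred_train, ax=ax1)
ax1.set(xlabel='Actual', ylabel='Predicted', title='Train')
sns.regplot(x=y_test, y=y_pred_test, ax=ax2)
ax2.set(xlabel='Actual', ylabel='Predicted', title='Test')
plt.show()
# Slope close to 1 and intercept close to 0 indicate a good fit.
for name, actual, pred in [('train', y_train, y_pred_train),
                           ('test', y_test, y_pred_test)]:
    res = linregress(actual, pred)
    print(name, 'slope:', res.slope, 'intercept:', res.intercept)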
Thanks @Soroush Mirzaei
Thank you so much! Your suggestion has solved my problem.
May I ask whether it is possible to plot two data sets using sns.regplot() with a single best-fit curve for both? For example, I currently get two data sets and two fit lines, one for each, like the image below:
[1]: https://i.stack.imgur.com/avGR1.png
Instead, I want the two data sets, the orange and the blue, with one fit curve for both.
Below is the code that I am using:
sns.regplot(x=y_train, y=y_pred_train, label="Training Data")
plt.xlabel('Actual G*')
plt.ylabel('Predicted G*')
plt.title('Actual vs. Predicted')
plt.legend(loc="upper left")
sns.regplot(x=y_test, y=y_pred_test, label="Testing Data")

How to plot the ROC curve for an ANN with 10-fold cross-validation in Keras using Python?

I was trying to produce a ROC plot for all 10 experiments of a 10-fold cross-validation for an ANN in Keras. I have been stuck on this for a week and cannot find a solution. Could anyone help? I tried the code from the following scikit-learn example (https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html) and wanted to use a wrapper so the Keras model could be used in scikit-learn, but it shows errors. My code in Python:
## Creating NN in Keras
# Load libraries
import numpy as np
from keras import models
from keras import layers
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
# Set random seed
np.random.seed(7)
#Create Function That Constructs Neural Network
# Create function returning a compiled network
def create_network():
    # Start neural network
    network = models.Sequential()
    # Add fully connected layer with a ReLU activation function
    network.add(layers.Dense(units=25, activation='relu', input_shape=(X.shape[1],)))
    # Add fully connected layer with a ReLU activation function
    network.add(layers.Dense(units=X.shape[1], activation='relu'))
    # Add fully connected layer with a sigmoid activation function
    network.add(layers.Dense(units=1, activation='sigmoid'))
    # Compile neural network
    network.compile(loss='binary_crossentropy',  # Cross-entropy loss
                    optimizer='adam',            # Adam optimizer
                    metrics=['accuracy'])        # Accuracy performance metric
    # Return compiled network
    return network
###
#Wrap Function In KerasClassifier
# Wrap Keras model so it can be used by scikit-learn
neural_network = KerasClassifier(build_fn=create_network,
                                 epochs=150,
                                 batch_size=10,
                                 verbose=0)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import auc
from sklearn.metrics import plot_roc_curve
from sklearn.model_selection import StratifiedKFold
n_samples, n_features = X.shape
# Add noisy features
random_state = np.random.RandomState(0)
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]
# #############################################################################
# Classification and ROC analysis
# Run classifier with cross-validation and plot ROC curves
cv = StratifiedKFold(n_splits=10)
classifier = neural_network
tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)
fig, ax = plt.subplots()
for i, (train, test) in enumerate(cv.split(X, y)):
    classifier.fit(X[train], y[train])
    viz = plot_roc_curve(classifier, X[test], y[test],
                         name='ROC fold {}'.format(i),
                         alpha=0.3, lw=1, ax=ax)
    interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(viz.roc_auc)
ax.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',
        label='Chance', alpha=.8)
mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)
ax.plot(mean_fpr, mean_tpr, color='b',
        label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc, std_auc),
        lw=2, alpha=.8)
std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
ax.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                label=r'$\pm$ 1 std. dev.')
ax.set(xlim=[-0.05, 1.05], ylim=[-0.05, 1.05],
       title="Receiver operating characteristic example")
ax.legend(loc="lower right")
plt.show()
It shows the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-29-f10078491154> in <module>()
40 viz = plot_roc_curve(classifier, X[test], y[test],
41 name='ROC fold {}'.format(i),
---> 42 alpha=0.3, lw=1, ax=ax)
43 interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
44 interp_tpr[0] = 0.0
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_plot/roc_curve.py in plot_roc_curve(estimator, X, y, sample_weight, drop_intermediate, response_method, name, ax, **kwargs)
170 )
171 if not is_classifier(estimator):
--> 172 raise ValueError(classification_error)
173
174 prediction_method = _check_classifer_response_method(estimator,
ValueError: KerasClassifier should be a binary classifier
I had the same question. I found this link very informative: https://www.kaggle.com/kanncaa1/roc-curve-with-k-fold-cv. I have modified it for my case as below:
seed = 7
np.random.seed(seed)
tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)
i = 1
fig, ax = plt.subplots()
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
# for i, (train, test) in enumerate(cv.split(X_13 , target)):
for train, test in kfold.split(X_train, y_train):
    # Create model
    model = Sequential()
    model.add(Dense(100, input_dim=X_train.shape[1], activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(80, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    sgd = SGD(lr=0.1, momentum=0.8)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    model.fit(X_train[train], y_train[train], epochs=100, batch_size=15, verbose=0)
    # Evaluate the model on the held-out fold
    y_pred_keras = model.predict_proba(X_train[test]).ravel()
    fpr, tpr, thresholds = roc_curve(y_train[test], y_pred_keras)
    tprs.append(interp(mean_fpr, fpr, tpr))
    roc_auc = auc(fpr, tpr)
    aucs.append(roc_auc)
    plt.plot(fpr, tpr, lw=2, alpha=0.3, label='ROC fold %d (AUC = %0.2f)' % (i, roc_auc))
    i = i + 1
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='black')
mean_tpr = np.mean(tprs, axis=0)
mean_auc = auc(mean_fpr, mean_tpr)
plt.plot(mean_fpr, mean_tpr, color='blue',
         label=r'Mean ROC (AUC = %0.2f)' % (mean_auc), lw=2, alpha=1)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.legend(loc="lower right")
plt.show()
Hope it could help!
I have just answered what seems to be a copy of this post (apart from variable names) here.
I am not sure whether it is an exact duplicate, since the question comes from a different account, but it seems like it. Here is a copy of my answer in case one of these is closed as a duplicate.
This is an implementation detail that is (probably) missing from this wrapper library.
Sklearn simply checks whether an attribute called _estimator_type is present on the estimator and is set to the string value "classifier". You can see that by looking into sklearn's source code on GitHub.
def is_classifier(estimator):
    """Return True if the given estimator is (probably) a classifier.

    Parameters
    ----------
    estimator : object
        Estimator object to test.

    Returns
    -------
    out : bool
        True if estimator is a classifier and False otherwise.
    """
    return getattr(estimator, "_estimator_type", None) == "classifier"
All you need to do is to add this attribute to your classifier object manually.
classifier = KerasClassifier(build_fn=create_network,
                             epochs=10,
                             batch_size=100,
                             verbose=2)
classifier._estimator_type = "classifier"
I have tested it and it works.

Spiral problem, why does my loss increase in this neural network using Keras?

I'm trying to solve the spiral problem using Keras with 3 spirals instead of 2, using a strategy similar to the one I used for 2 spirals. The problem is that my loss now grows exponentially instead of decreasing, with the same parameters I used for 2 spirals (the network has 3 outputs instead of a binary one). I'm not quite sure what could be happening here; could anyone help? I have tried various epochs, learning rates, and batch sizes.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.optimizers import RMSprop
from Question1.utils import create_neural_network, create_test_data
EPOCHS = 250
BATCH_SIZE = 20
def main():
    df = three_spirals(1000)
    # Set up data
    x_train = df[['x-coord', 'y-coord']].values
    y_train = df['class'].values
    # Don't need y_test; can inspect visually whether it worked or not
    x_test = create_test_data()
    # Scale data
    scaler = MinMaxScaler()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    relu_model = create_neural_network(layers=3,
                                       neurons=[40, 40, 40],
                                       activation='relu',
                                       optimizer=RMSprop(learning_rate=0.001),
                                       loss='categorical_crossentropy',
                                       outputs=3)
    # Train networks
    relu_model.fit(x=x_train, y=y_train, epochs=EPOCHS, verbose=1, batch_size=BATCH_SIZE)
    # Predictions on test data
    relu_predictions = relu_model.predict_classes(x_test)
    models = [relu_model]
    test_predictions = [relu_predictions]
    # Plot
    plot_data(models, test_predictions)
And here is the create_neural_network function:
def create_neural_network(layers, neurons, activation, optimizer, loss, outputs=1):
    if layers != len(neurons):
        raise ValueError("Number of layers doesn't match the number of neuron layers.")
    model = Sequential()
    for i in range(layers):
        model.add(Dense(neurons[i], activation=activation))
    # Output
    if outputs == 1:
        model.add(Dense(outputs))
    else:
        model.add(Dense(outputs, activation='softmax'))
    model.compile(optimizer=optimizer,
                  loss=loss)
    return model
I have worked it out. For the output data, it isn't like binary classification where you only need one column. For multiclass classification you need a column for each class you want to classify, so having y take the values 0, 1, 2 was incorrect. The correct way is to have y0, y1, y2, each of which is 1 if the sample belongs to that class and 0 otherwise (one-hot encoding), as in the sketch below.
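A minimal sketch of that conversion, assuming y holds integer class labels 0, 1, 2 (tf.keras.utils.to_categorical does the one-hot encoding):
import numpy as np
import tensorflow as tf
# Integer class labels, as in the original (incorrect) setup.
y = np.array([0, 2, 1, 0])
# One column per class: 1 for the sample's class, 0 elsewhere.
y_onehot = tf.keras.utils.to_categorical(y, num_classes=3)
print(y_onehot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]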

Overfitting and data leakage in tensorflow/keras neural network

Good morning, I'm new to machine learning and neural networks. I am trying to build a fully connected neural network to solve a regression problem. The dataset is composed of 18 features and 1 label, all of which are physical quantities.
You can find the code below, along with a figure of the loss function's evolution over the epochs. I am not sure whether there is overfitting. Can someone explain why there is or is not overfitting?
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from keras import optimizers
from sklearn.metrics import r2_score
from keras import regularizers
from keras import backend
from tensorflow.keras import regularizers
from keras.regularizers import l2
# =============================================================================
# Choose the test size
# =============================================================================
test_size = 0.2
dataset = pd.read_csv('DataSet.csv', decimal=',', delimiter = ";")
label = dataset.iloc[:,-1]
features = dataset.drop(columns = ['Label'])
y_max_pre_normalize = max(label)
y_min_pre_normalize = min(label)
def denormalize(y):
    final_value = y*(y_max_pre_normalize-y_min_pre_normalize)+y_min_pre_normalize
    return final_value
# =============================================================================
# Split
# =============================================================================
X_train1, X_test1, y_train1, y_test1 = train_test_split(features, label, test_size = test_size, shuffle = True)
y_test2 = y_test1.to_frame()
y_train2 = y_train1.to_frame()
# =============================================================================
# Normalize
# =============================================================================
scaler1 = preprocessing.MinMaxScaler()
scaler2 = preprocessing.MinMaxScaler()
X_train = scaler1.fit_transform(X_train1)
X_test = scaler2.fit_transform(X_test1)
scaler3 = preprocessing.MinMaxScaler()
scaler4 = preprocessing.MinMaxScaler()
y_train = scaler3.fit_transform(y_train2)
y_test = scaler4.fit_transform(y_test2)
# =============================================================================
# Create the network
# =============================================================================
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model = Sequential()
model.add(Dense(60, input_shape = (X_train.shape[1],), activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dropout(0.2))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dropout(0.2))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dense(1,activation = 'linear',kernel_initializer='glorot_uniform'))
model.compile(loss = 'mse', optimizer = optimizer, metrics = ['mse'])
history = model.fit(X_train, y_train, epochs=100,
                    validation_split=0.1, shuffle=True, batch_size=250)
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
y_train_pred = denormalize(y_train_pred)
y_test_pred = denormalize(y_test_pred)
plt.figure()
plt.plot((y_test1),(y_test_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')
plt.xlabel('True')
plt.ylabel('Predicted')
plt.title('Test')
plt.figure()
plt.plot((y_train1),(y_train_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')
plt.xlabel('True')
plt.ylabel('Predicted')
plt.title('Train')
plt.figure()
plt.plot(loss_values,'b',label = 'training loss')
plt.plot(val_loss_values,'r',label = 'val training loss')
plt.xlabel('Epochs')
plt.ylabel('Loss Function')
plt.legend()
print("\n\nThe R2 score on the test set is:\t{:0.3f}".format(r2_score(y_test_pred, y_test1)))
print("The R2 score on the train set is:\t{:0.3f}".format(r2_score(y_train_pred, y_train1)))
from sklearn import metrics
# Measure MSE error.
score = metrics.mean_squared_error(y_test_pred,y_test1)
print("\n\nFinal score test (MSE): %0.4f" %(score))
score1 = metrics.mean_squared_error(y_train_pred,y_train1)
print("Final score train (MSE): %0.4f" %(score1))
score2 = np.sqrt(metrics.mean_squared_error(y_test_pred, y_test1))
print("Final score test (RMSE): %0.4f" % score2)
score3 = np.sqrt(metrics.mean_squared_error(y_train_pred, y_train1))
print("Final score train (RMSE): %0.4f" % score3)
EDIT:
I also tried computing feature importances and raising n_epochs; these are the results:
With feature importance:
Without feature importance:
Looks like you don't have overfitting! Your training and validation curves are descending together and converging. The clearest sign you could get of overfitting would be a deviation between these two curves, something like this:
Since your two curves are descending and are not diverging, it indicates your NN training is healthy.
HOWEVER! Your validation curve is suspiciously below the training curve. This hints at possible data leakage (train and test data have been mixed somehow). There is more info in a nice, short blog post. In general, you should split the data before any other preprocessing (normalizing, augmentation, shuffling, etc.), as in the sketch below.
Other causes for this could be some type of regularization (dropout, BN, etc.) that is active while computing the training accuracy and deactivated when computing the validation/test accuracy.
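A minimal sketch of that ordering, reusing the question's features and label; the point is that the scaler is fit only on the training split and merely applied to the test split:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# Split first, before any preprocessing.
X_train1, X_test1, y_train1, y_test1 = train_test_split(
    features, label, test_size=0.2, shuffle=True)
# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train1)
X_test = scaler.transform(X_test1)  # transform only; never fit on test data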
Overfitting is when the model does not generalize to data other than the training data. When this happens you will have a very (!) low training loss but a high validation loss. You can think of it this way: if you have N points, you can fit a degree N-1 polynomial so that you get zero training loss (your model hits all your training points perfectly). But if you apply that model to some other data, it will most likely produce a very high error (see the image below). Here the red line is our model and the green is the true data (+ noise); in the last picture we get zero training error. In the first, the model is too simple (high train/high validation error); the second is good (low train/low validation error); the third and last is too complex, i.e. overfitting (very low train/high validation error). A runnable sketch of the polynomial example follows below.
A neural network can behave in the same way, so by looking at your training vs. validation error you can conclude whether it overfits or not.
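A minimal sketch of the polynomial example (the data-generating function and noise level are arbitrary assumptions):
import numpy as np
rng = np.random.default_rng(0)
N = 10
x_train = np.linspace(0, 1, N)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(N)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)
# A degree N-1 polynomial hits all N training points: zero training error.
coeffs = np.polyfit(x_train, y_train, deg=N - 1)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print("train MSE: %.2e, test MSE: %.2e" % (train_mse, test_mse))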
No, this is not overfitting, as your validation loss isn't increasing.
Nevertheless, if I were you I would be a little bit skeptical. Try to train your model for even more epochs and watch out for the validation loss.
What you definitely should do is check the following:
- are there duplicates or near-duplicates in the data (this creates information leakage from the train split to the validation split)
- are there features that have a causal connection to the target variable
Edit:
Usually, you have some random component in a real-world dataset, so rules observed in the train data aren't 100% true for the validation data.
Your plot shows that the validation loss decreases even more than the train loss does. Usually, you get to some point in training where the rules you observe in the train data are too specific to describe the whole data; that's when overfitting begins. Hence, it is weird that your validation loss doesn't increase again.
Please check whether your validation loss approaches zero when you train for more epochs. If that's the case, I would check your dataset very carefully.
Let's assume there is a kind of information leakage from the train set to the validation set (through duplicate records, for example). Your model would change its weights to describe very specific rules, and when applied to new data it would fail miserably, since the observed connections are not really general.
Another common data problem is that features may have inverted causality.
The fact that the validation loss is generally lower than the train loss probably depends on dropout and regularization, since they are applied during training but not during prediction/testing.
I put some emphasis on this because a tiny bug or an error in the data can "fuck up" your whole model.
