Edit: For anyone interested. I made it slight better. I used L2 regularizer=0.0001, I added two more dense layers with 3 and 5 nodes with no activation functions. Added doupout=0.1 for the 2nd and 3rd GRU layers.Reduced batch size to 1000 and also set loss function to mae
Important note: I discovered that my TEST dataframe wwas extremely small compared to the train one and that is the main Reason it gave me very bad results.
I have a GRU model which has 12 features as inputs and I'm trying to predict output power. I really do not understand though whether I choose
1 layer or 5 layers
50 neurons or 512 neuron
10 epochs with a small batch size or 100 eopochs with a large batch size
Different optimizers and activation functions
Dropput and L2 regurlarization
Adding more dense layer.
Increasing and Decreasing learning rate
My results are always the same and doesn't make any sense, my loss and val_loss loss is very steep in first 2 epochs and then for the rest it becomes constant with small fluctuations in val_loss
Here is my code and a figure of losses, and my dataframes if needed:
Dataframe1: https://drive.google.com/file/d/1I6QAU47S5360IyIdH2hpczQeRo9Q1Gcg/view
Dataframe2: https://drive.google.com/file/d/1EzG4TVck_vlh0zO7XovxmqFhp2uDGmSM/view
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from google.colab import files
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc=TensorBoardColab() # Tensorboard
from keras.layers.core import Dense
from keras.layers.recurrent import GRU
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from keras import regularizers
from keras.layers import Dropout
df10=pd.read_csv('/content/drive/My Drive/Isolation Forest/IF 10 PERCENT.csv',index_col=None)
df2_10= pd.read_csv('/content/drive/My Drive/2019 Dataframe/2019 10minutes IF 10 PERCENT.csv',index_col=None)
X10_train= df10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_train=X10_train.values
y10_train= df10['Power_kW']
y10_train=y10_train.values
X10_test= df2_10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_test=X10_test.values
y10_test= df2_10['Power_kW']
y10_test=y10_test.values
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X10_train= x_scale.fit_transform(X10_train)
y10_train= y_scale.fit_transform(y10_train.reshape(-1,1))
X10_test= x_scale.fit_transform(X10_test)
y10_test= y_scale.fit_transform(y10_test.reshape(-1,1))
X10_train = X10_train.reshape((-1,1,12))
X10_test = X10_test.reshape((-1,1,12))
Early_Stop=EarlyStopping(monitor='val_loss', patience=3 , mode='min',restore_best_weights=True)
# creating model using Keras
model10 = Sequential()
model10.add(GRU(units=200, return_sequences=True, input_shape=(1,12),activity_regularizer=regularizers.l2(0.0001)))
model10.add(GRU(units=100, return_sequences=True))
model10.add(GRU(units=50))
#model10.add(GRU(units=30))
model10.add(Dense(units=1, activation='linear'))
model10.compile(loss=['mse'], optimizer='adam',metrics=['mse'])
model10.summary()
history10=model10.fit(X10_train, y10_train, batch_size=1500,epochs=100,validation_split=0.1, verbose=1, callbacks=[TensorBoardColabCallback(tbc),Early_Stop])
score = model10.evaluate(X10_test, y10_test)
print('Score: {}'.format(score))
y10_predicted = model10.predict(X10_test)
y10_predicted = y_scale.inverse_transform(y10_predicted)
y10_test = y_scale.inverse_transform(y10_test)
plt.scatter( df2_10['WindSpeed_mps'], y10_test, label='Measurements',s=1)
plt.scatter( df2_10['WindSpeed_mps'], y10_predicted, label='Predicted',s=1)
plt.legend()
plt.savefig('/content/drive/My Drive/Figures/we move on curve6 IF10.png')
plt.show()
I think the units of GRU are very high there. Too many GRU units might cause vanishing gradient problem. For starting, I would choose 30 to 50 units of GRU. Also, a bit higher learning rate e. g. 0.001.
If the dataset is publicly available can you please give me the link so that I can experiment on that and inform you.
I made it slightly better. I used L2 regularizer=0.0001, I added two more dense layers with 3 and 5 nodes with no activation functions. Added doupout=0.1 for the 2nd and 3rd GRU layers.Reduced batch size to 1000 and also set loss function to mae
Important note: I discovered that my TEST dataframe was extremely small compared to the train one and that is the main Reason it gave me very bad results.
Related
UPDATE: This question was for Tensorflow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this issue on 1.x.
Training the exact same model/parameters/data on keras/tensorflow does not give reproducible results and the loss is significantly different each time you train the model. There are many stackoverflow questions about that (eg, How to get reproducible results in keras ) but the recommend workarounds don't seem to work for me or many other people on StackOverflow. OK, it is what it is.
But given that limitation of non-reproducibility with keras on tensorflow -- what's the best practice for comparing models and choosing hyper parameters? I'm testing different architectures and activations, but since the loss estimate is different each time, I'm never sure if one model is better than the other. Is there any best practice for dealing with this?
I don't think the issue has anything to do with my code, but just in case it helps; here's a sample program:
import os
#stackoverflow says turning off the GPU helps reproducibility, but it doesn't help for me
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ['PYTHONHASHSEED']=str(1)
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers
import random
import pandas as pd
import numpy as np
#StackOverflow says this is needed for reproducibility but it doesn't help for me
from tensorflow.keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
#make some random data
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))
def run(x, y):
#StackOverflow says you have to set the seeds but it doesn't help for me
tf.set_random_seed(1)
np.random.seed(1)
random.seed(1)
os.environ['PYTHONHASHSEED']=str(1)
model = keras.Sequential([
keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
keras.layers.Dense(20, activation='relu'),
keras.layers.Dense(10, activation='relu'),
keras.layers.Dense(1, activation='linear')
])
NUM_EPOCHS = 500
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
predictions = model.predict(x).flatten()
loss = model.evaluate(x, y) #This prints out the loss by side-effect
#Each time we run it gives a wildly different loss. :-(
run(df, y)
run(df, y)
run(df, y)
Given the non-reproducibility, how can I evaluate whether changes in my hyper-parameters and architecture are helping or not?
It's sneaky, but your code does, in fact, lack a step for better reproducibility: resetting the Keras & TensorFlow graphs before each run. Without this, tf.set_random_seed() won't work properly - see correct approach below.
I'd exhaust all the options before tossing the towel on non-reproducibility; currently I'm aware of only one such instance, and it's likely a bug. Nonetheless, it's possible you'll get notably differing results even if you follow through all the steps - in that case, see "If nothing works", but each is clearly not very productive, thus it's best on focusing attaining reproducibility:
Definitive improvements:
Use reset_seeds(K) below
Increase numeric precision: K.set_floatx('float64')
Set PYTHONHASHSEED before the Python kernel starts - e.g. from terminal
Upgrade to TF 2, which includes some reproducibility bug fixes, but mind performance
Run CPU on a single thread (painfully slow)
Do not import from tf.python.keras - see here
Ensure all imports are consistent (i.e. don't do from keras.layers import ... and from tensorflow.keras.optimizers import ...)
Use a superior CPU - for example, Google Colab, even if using GPU, is much more robust against numeric imprecision - see this SO
Also see related SO on reproducibility
If nothing works:
Rerun X times w/ exact same hyperparameters & seeds, average results
K-Fold Cross-Validation w/ exact same hyperparameters & seeds, average results - superior option, but more work involved
Correct reset method:
def reset_seeds(reset_graph_with_backend=None):
if reset_graph_with_backend is not None:
K = reset_graph_with_backend
K.clear_session()
tf.compat.v1.reset_default_graph()
print("KERAS AND TENSORFLOW GRAPHS RESET") # optional
np.random.seed(1)
random.seed(2)
tf.compat.v1.set_random_seed(3)
print("RANDOM SEEDS RESET") # optional
Running TF on single CPU thread: (code for TF1-only)
session_conf = tf.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
The problem appears to be solved in Tensorflow 2.0 (at least on simple models)! Here is a code snippet that seems to yield repeatable results.
import os
####*IMPORANT*: Have to do this line *before* importing tensorflow
os.environ['PYTHONHASHSEED']=str(1)
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers
import random
import pandas as pd
import numpy as np
def reset_random_seeds():
os.environ['PYTHONHASHSEED']=str(1)
tf.random.set_seed(1)
np.random.seed(1)
random.seed(1)
#make some random data
reset_random_seeds()
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))
def run(x, y):
reset_random_seeds()
model = keras.Sequential([
keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
keras.layers.Dense(20, activation='relu'),
keras.layers.Dense(10, activation='relu'),
keras.layers.Dense(1, activation='linear')
])
NUM_EPOCHS = 500
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
predictions = model.predict(x).flatten()
loss = model.evaluate(x, y) #This prints out the loss by side-effect
#With Tensorflow 2.0 this is now reproducible!
run(df, y)
run(df, y)
run(df, y)
Putting only the code of below, it works. The KEY of the question, VERY IMPORTANT, is to call the function reset_seeds() every time before running the model. Doing that you will obtain reproducible results as I checked in the Google Collab.
import numpy as np
import tensorflow as tf
import random as python_random
def reset_seeds():
np.random.seed(123)
python_random.seed(123)
tf.random.set_seed(1234)
reset_seeds()
You have a couple option for stabilizing performance...
1) Set the seed for your intializers so they are always initialized to the same values.
2) More data generally results in a more stable convergence.
3) Lower learning rates and bigger batch sizes are also good for more predictable learning.
4) Training based on a fixed amount of epochs instead of using callbacks to modify hyperparams during train.
5) K-fold validation to train on different subsets. The average of these folds should result in a fairly predictable metric.
6) Also you have the option of just training multiple times and taking an average of this.
I am very new to Keras, neural networks and machine learning having just started to learn yesterday. I decided to try predicting the experience over an hour (0 to 23) (for a game and my own generated data-set) that a user would earn. Currently running what I have the predictions seem to be very low and very poor. I have tried a relu activation, which produced predictions all to be zero and from a bit of research, LeakyReLU.
This is the code I have for the prediction model so far:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("experience.csv", delimiter=",")
X = dataset[: ,0]
Y = dataset[: ,1]
model = Sequential()
model.add(Dense(12, input_dim = 1, activation=LeakyReLU(0.3)))
model.add(Dense(8, activation=LeakyReLU(0.3)))
model.add(Dense(1, activation=LeakyReLU(0.3)))
model.compile(loss = 'mean_absolute_error', optimizer='adam', metrics = ['accuracy'])
model.fit(X, Y, epochs=120, batch_size=10, verbose = 0)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
I have also tried playing around with the hidden levels of the network, but honestly have no idea how many there should be or a good way to justify an amount.
If it helps here is the data-set I have been using:
https://raw.githubusercontent.com/NightShadeII/xpPredictor/master/experience.csv
Thankyou for any help
Looking at your data it does not seem like a classification problem.
You have two options:
-> Look at the second column and bucket them depending on the ranges and make classes that can be predicted, for instance: 0, 1, 2 etc. Now it tries to train but does not have enough examples for millions of classes that it thinks you are trying to predict.
-> If you want real valued output and not classes, try using linear regression.
I'm trying to understand how to implement neural networks. So I made my own dataset. Xtrain is numpy.random floats. Ytrain is sign(sin(1/x^3).
Try to implement neural networks gave me very poor results. 30%accuracy. Random Forest with 100 trees give 97%. But I heard that NN can approximate any function. What is wrong in my understanding?
import numpy as np
import keras
import math
from sklearn.ensemble import RandomForestClassifier as RF
train = np.random.rand(100000)
test = np.random.rand(100000)
def g(x):
if math.sin(2*3.14*x) > 0:
if math.cos(2*3.14*x) > 0:
return 0
else:
return 1
else:
if math.cos(2*3.14*x) > 0:
return 2
else:
return 3
def f(x):
x = (1/x) ** 3
res = [0, 0, 0, 0]
res[g(x)] = 1
return res
ytrain = np.array([f(x) for x in train])
ytest = np.array([f(x) for x in test])
train = np.array([[x] for x in train])
test = np.array([[x] for x in test])
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding, LSTM
model = Sequential()
model.add(Dense(100, input_dim=1))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(4))
model.add(Activation('softmax'))
model.compile(optimizer='sgd',
loss='categorical_crossentropy',
metrics=['accuracy'])
P.S. I tried out many layers, activation functions, loss functions, optimizers, but never got more than 30% accuracy :(
I suspect that the 30% accuracy is a combination of small learning rate setting and a small training-step setting.
I ran your code snippet with model.fit(train, ytrain, nb_epoch=5, batch_size=32), after 5 epoch's training it yields about 28% accuracy. With the same setting but increasing the training steps to nb_epoch=50, the loss drops to ~1.157 ish and the accuracy raises to 40%. Further increase training steps should lead the model to further converging. Other than that, you can also try to configure the model with a larger learning rate setting which could make the converging faster :
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True), metrics=['accuracy'])
Although be careful don't set the learning rate to be too large otherwise your loss could blow up.
EDIT:
NN is known for having the potential for modeling extremely complex function, however, whether or not the model actually produce a good performance is a matter of how the model is designed, trained, and many other matters related to the specific application.
Zhongyu Kuang's answer is correct in stating that you may need to train it longer or with a different learning rate.
I'll add that the deeper your network, the longer you'll need to train it before it converges. For a relatively simple function like sign(sin(1/x^3)), you may be able to get away with a smaller network than the one you're using.
Additionally, softmax probably isn't the best output layer. You just need to yield -1 or 1. A single tanh unit seems like it would do well. softmax is generally used when you want to learn a probability distribution over a finite set. (You'll probably want to switch your error function from cross entropy to mean square error for similar reasons.)
Try a network with one sigmoidal hidden layer and an output layer with just one tanh unit. Then play around with the layer size and learning rate. Maybe add a second hidden layer if you can't get results with just one, but I wouldn't be surprised if it's unnecessary.
Addendum: In this approach, you'll replace f(x) with a direct calculation of the target function instead of the one-hot vector you're using currently.
I have constructed an ANN in keras which has 1 input layer(3 inputs), one output layer (1 output) and two hidden layers with with 12 and 3 nodes respectively.
The way i construct and train my network is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted output.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:3]
Y = dataset[:,3]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=3, init='uniform', activation='relu'))
model.add(Dense(3, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Sorted output csv file looks like:
so after 150 epochs i get: loss: 0.6932 - acc: 0.5000 - val_loss: 0.6970 - val_acc: 0.1429
My question is: how could i modify my NN in order to achieve higher accuracy?
You could try the following things. I have written this roughly in the order of importance - i.e. the order I would try things to fix the accuracy problem you are seeing:
Normalise your input data. Usually you would take mean and standard deviation of training data, and use them to offset+scale all further inputs. There is a standard normalising function in sklearn for this. Remember to treat your test data in the same way (using the mean and std from the training data, not recalculating it)
Train for more epochs. For problems with small numbers of features and limited training set sizes, you often have to run for thousands of epochs before the network will converge. You should plot the training and validation loss values to see whether the network is still learning, or has converged as best as it can.
For your simple data, I would avoid relu activations. You may have heard they are somehow "best", but like most NN options, they have types of problems where they work well, and others where they are not best choice. I think you would be better off with tanh or sigmoid activations in hidden layers for your problem. Save relu for very deep networks and/or convolutional problems on images/audio.
Use more training data. Not clear how much you are feeding it, but NNs work best with large amounts of training data.
Provided you already have lots of training data - increase size of hidden layers. More complex relationships require more hidden neurons (and sometimes more layers) for the NN to be able to express the "shape" of the decision surface. Here is a handy browser-based network allowing you to play with that idea and get a feel for it.
Add one or more dropout layers after the hidden layers or add some other regularisation. The network could be over-fitting (although with a training accuracy of 0.5 I suspect it isn't). Unlike relu, using dropout is pretty close to a panacea for tougher NN problems - it improves generalisation in many cases. A small amount of dropout (~0.2) might help with your problem, but like most hyper-parameters, you will need to search for the best values.
Finally, it is always possible that the relationship you want to find that allows you to predict Y from X is not really there. In which case it would be a correct result from the NN to be no better than guessing at Y.
Neil Slater already provided a long list of helpful general advices.
In your specific examaple, normalization is the important thing. If you add the following lines to your code
...
X = dataset[:,0:3]
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
you will get 100% accuracy on your toy data, even with much simpler network structures. Without normalization, the optimizer won't work.
Below is the simple example of multi-class classification task with
IRIS data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
'''One hot encode a numpy array of objects (e.g. strings)'''
uniques, ids = np.unique(arr, return_inverse=True)
return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the epoch the accuracy will increase until it drop at
# certain point. Epoch 50 accuracy 0.99, and after that drop to 0.977, with
# epoch 70
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
Where first parameter of Dense is 16 and second is 3.
Why two layers uses two different values for Dense?
How do we choose what's the best value for Dense?
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keep the one with the lowest loss value or better accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide a range and steps into that range, like 8 to 64 neurons, in powers of two (8, 16, 32, 64), and try each combination of the parameters. This is obviously requires an exponential number of models to be trained and tested and takes a lot of time.
Random search: Do the same but just define a range for each parameter and try a random set of parameters, drawn from an uniform distribution over each range. You can try as many parameters sets you want, for as how long you can. This is just a informed random guess.
Unfortunately there is no other way to tune such parameters. About layers having different number of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
There is no known way to determine a good network structure evaluating the number of inputs or outputs. It relies on the number of training examples, batch size, number of epochs, basically, in every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradient problems. On the other side, a lower number of units can cause a model to have high bias and low accuracy values. Once again, it depends on the size of data used for training.
Sadly it is trying some different values that give you the best adjustments. You may choose the combination that gives you the lowest loss and validation loss values, as well as the best accuracy for your dataset, as said in the previous post.
You could do some proportion on your number of units value, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a categorisation AI system. The num_classes are the number of different categories the system has to choose. For instance, in the iris dataset from Keras, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than with other random values. We need to adjust the parameters to the training dataset by making some different tries and then analyse the results seeking for the best combination of parameters.
My suggestion is to use EarlyStopping(). Then check the number of epochs and accuracy with test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
rlp = lrd = ReduceLROnPlateau(monitor = 'val_loss',patience = 2,verbose = 1,factor = 0.8, min_lr = 1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size = 128, validation_split=0.1, verbose = 1, callbacks=[lrd,es])