I am training a relatively small neural network using Python 3.10.7 on a Mac Studio, with TensorFlow installed in a virtualenv.
Most epochs take about 2.3 seconds, but every once in a while there is an epoch which takes between 700 and 1100 seconds:
Epoch 4292/5000
5/5 [==============================] - 2s 358ms/step - loss: 0.0017 - accuracy: 0.1168 - val_loss: 0.0029 - val_accuracy: 0.1073
Epoch 4293/5000
5/5 [==============================] - 2s 382ms/step - loss: 0.0018 - accuracy: 0.1237 - val_loss: 0.0029 - val_accuracy: 0.1073
Epoch 4294/5000
5/5 [==============================] - 1063s 266s/step - loss: 0.0018 - accuracy: 0.1081 - val_loss: 0.0029 - val_accuracy: 0.1107
Epoch 4295/5000
5/5 [==============================] - 2s 354ms/step - loss: 0.0019 - accuracy: 0.1263 - val_loss: 0.0029 - val_accuracy: 0.1003
Epoch 4296/5000
5/5 [==============================] - 2s 350ms/step - loss: 0.0019 - accuracy: 0.1315 - val_loss: 0.0029 - val_accuracy: 0.1142
I hand-picked the batch size to optimize the time spent on every epoch, and my training code is as follows:
sgd = tf.keras.optimizers.SGD(learning_rate=0.005, momentum=0.95, nesterov=True)
model.compile(optimizer=sgd, loss=tf.keras.losses.MeanSquaredError(), metrics=['accuracy'])
history = model.fit(X, y, validation_data=(X_test, y_test), epochs=epochs, batch_size=240, shuffle=True)
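To track exactly which epochs are slow, this is roughly the kind of callback I could add (a sketch only; EpochTimer and its 10-second threshold are not part of my actual training code):
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Record wall-clock time per epoch to spot the occasional slow one."""
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()
    def on_epoch_end(self, epoch, logs=None):
        elapsed = time.perf_counter() - self._start
        if elapsed > 10:  # arbitrary threshold for flagging a slow epoch
            print(f"\nEpoch {epoch + 1} took {elapsed:.1f}s")

# history = model.fit(X, y, validation_data=(X_test, y_test), epochs=epochs,
#                     batch_size=240, shuffle=True, callbacks=[EpochTimer()])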
I am using the following packages:
pip freeze | grep tensorflow
tensorflow-estimator==2.9.0
tensorflow-macos==2.9.0
tensorflow-metal==0.5.0
According to Apple's Activity Monitor, my GPU usage is between 80% and 90%.
I tried reducing the batch size, but the long epochs still happen every once in a while.
I will also note that during training my computer is not connected to any display.
I am learning Python deep learning tools from the official TensorFlow website.
I am trying to build several text-classification networks following the tutorials, but the LSTM does not work as expected.
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras import utils
from tensorflow.keras import losses
import matplotlib.pyplot as plt
seed = 42
BATCH_SIZE = 64
train_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/train',
    validation_split=0.2,
    subset='training',
    batch_size=BATCH_SIZE,
    seed=seed)
val_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/train',
    validation_split=0.2,
    subset='validation',
    batch_size=BATCH_SIZE,
    seed=seed)
test_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/test',
    batch_size=BATCH_SIZE)
class_names = train_ds.class_names
train_ds = train_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
VOCAB_SIZE = 1000
MAX_SEQUENCE_LENGTH = 500
encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_sequence_length=MAX_SEQUENCE_LENGTH)
encoder.adapt(train_ds.map(lambda text, label: text))
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(VOCAB_SIZE, 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    # tf.keras.layers.LSTM(128),
    # tf.keras.layers.Dense(64, activation='relu'),
    # tf.keras.layers.Conv1D(64, 5, padding="valid", activation="relu", strides=2),
    # tf.keras.layers.GlobalMaxPooling1D(),
    # tf.keras.layers.GRU(64),
    # tf.keras.layers.SimpleRNN(64),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(4)
])
model.summary()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
history = model.fit(train_ds, epochs=10,
                    validation_data=val_ds)
This is my complete code; the core part is the same as in the tutorials.
But the training output is as follows:
Epoch 1/10
100/100 [==============================] - 33s 273ms/step - loss: 9.6882 - accuracy: 0.2562 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 2/10
100/100 [==============================] - 25s 250ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 3/10
100/100 [==============================] - 25s 252ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 4/10
100/100 [==============================] - 25s 254ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 5/10
100/100 [==============================] - 25s 255ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 6/10
100/100 [==============================] - 26s 256ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 7/10
100/100 [==============================] - 26s 257ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 8/10
100/100 [==============================] - 26s 258ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 9/10
100/100 [==============================] - 26s 258ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 10/10
100/100 [==============================] - 26s 256ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
The accuracy does not increase and the loss does not decrease at all, as if the model is not being trained. The accuracy stays at roughly the reciprocal of the number of classes (e.g. for a binary classification problem it would stay around 0.5, and for a four-class problem around 0.25).
Later I compared with a CNN, changing only the LSTM layers to CNN layers as in the tutorials, and it worked as expected (same dataset, same model.compile() and model.fit() parameters).
I also tried a GRU, and the same problem occurs.
I don't get it.
Am I missing some configuration for RNN-like models? Can somebody help me with this problem? Thanks!
P.S.
tensorflow-macos 2.9
tensorflow-metal 0.5.0
Chip: Apple M1
Datasets: https://storage.googleapis.com/download.tensorflow.org/data/stack_overflow_16k.tar.gz (the same as in the tutorials).
I tried changing the optimizer (SGD, Adam) and the learning rate, but it does not help. It does not look like overfitting.
Methods I tried:
Keras accuracy does not change
Update 2023-01-30
I ran the same code on my Linux server and it works as expected. It may be a bug in tensorflow-macos.
Update 2023-02-01
I tried the official TensorFlow build for macOS on the M1, simply
conda install tensorflow
and it works. So the problem appears to be with tensorflow-macos GPU support. I also tried again using the CPU only with tensorflow-macos, and it works.
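For reference, one way to force CPU-only execution while tensorflow-metal is installed (a minimal sketch of the kind of test I ran, not my exact code):
import tensorflow as tf

# Hide the Metal GPU so tensorflow-macos falls back to the CPU.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should now list only CPU devices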
Conclusion:
RNN-like models have a problem on tensorflow-macos with the GPU.
I am writing a neural network for translating texts from Russian to English, but I ran into the problem that my network produces a large loss and outputs that are very far from the correct answer.
Below is the LSTM that I build using Keras:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, RepeatVector, Dense
from keras import optimizers

def make_model(in_vocab, out_vocab, in_timesteps, out_timesteps, n):
    model = Sequential()
    model.add(Embedding(in_vocab, n, input_length=in_timesteps, mask_zero=True))
    model.add(LSTM(n))
    model.add(Dropout(0.3))
    model.add(RepeatVector(out_timesteps))
    model.add(LSTM(n, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Dense(out_vocab, activation='softmax'))
    model.compile(optimizer=optimizers.RMSprop(lr=0.001), loss='sparse_categorical_crossentropy')
    return model
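Side note: lr is the legacy argument name here; on recent Keras versions the equivalent compile call (including the accuracy metric that the log below reports) would look roughly like this:
from tensorflow.keras import optimizers

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])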
The training log is shown below:
Epoch 1/10
3/3 [==============================] - 5s 1s/step - loss: 8.3635 - accuracy: 0.0197 - val_loss: 8.0575 - val_accuracy: 0.0563
Epoch 2/10
3/3 [==============================] - 2s 806ms/step - loss: 7.9505 - accuracy: 0.0334 - val_loss: 8.2927 - val_accuracy: 0.0743
Epoch 3/10
3/3 [==============================] - 2s 812ms/step - loss: 7.7977 - accuracy: 0.0349 - val_loss: 8.2959 - val_accuracy: 0.0571
Epoch 4/10
3/3 [==============================] - 3s 825ms/step - loss: 7.6700 - accuracy: 0.0389 - val_loss: 8.5628 - val_accuracy: 0.0751
Epoch 5/10
3/3 [==============================] - 3s 829ms/step - loss: 7.5595 - accuracy: 0.0411 - val_loss: 8.5854 - val_accuracy: 0.0743
Epoch 6/10
3/3 [==============================] - 3s 807ms/step - loss: 7.4604 - accuracy: 0.0406 - val_loss: 8.7633 - val_accuracy: 0.0743
Epoch 7/10
3/3 [==============================] - 2s 815ms/step - loss: 7.3475 - accuracy: 0.0436 - val_loss: 8.9103 - val_accuracy: 0.0743
Epoch 8/10
3/3 [==============================] - 3s 825ms/step - loss: 7.2548 - accuracy: 0.0455 - val_loss: 9.0493 - val_accuracy: 0.0721
Epoch 9/10
3/3 [==============================] - 2s 814ms/step - loss: 7.1751 - accuracy: 0.0449 - val_loss: 9.0740 - val_accuracy: 0.0788
Epoch 10/10
3/3 [==============================] - 3s 831ms/step - loss: 7.1132 - accuracy: 0.0479 - val_loss: 9.2443 - val_accuracy: 0.0773
And these are the parameters that I pass for training:
model = make_model(  # the sizes of the tokenized vocabularies
                   russian_vocab_size,
                   english_vocab_size,
                   # maximum sentence lengths
                   max_russian_sequence_length,
                   max_english_sequence_length,
                   512)
model.fit(preproc_russian_sentences,   # all tokenized Russian sentences, passed with shape (X, Y)
          preproc_english_sentences,   # all tokenized English sentences, passed with shape (X, Y, 1)
          epochs=10,
          batch_size=1024,
          validation_split=0.2,
          callbacks=None,
          verbose=1)
Thank you in advance.
I am trying to improve my model training performance following the Better performance with the tf.data API guide. However, I have observed that the performance with .cache() is almost the same, or even worse, compared to the same settings without .cache().
datafile_list = load_my_files()
RAW_BYTES = 403*4
BATCH_SIZE = 32
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=datafile_list, record_bytes=RAW_BYTES, num_parallel_reads=10, buffer_size=1024*RAW_BYTES)
raw_dataset = raw_dataset.map(tf.autograph.experimental.do_not_convert(decode_and_prepare),
                              num_parallel_calls=tf.data.AUTOTUNE)
raw_dataset = raw_dataset.cache()
raw_dataset = raw_dataset.shuffle(buffer_size=1024)
raw_dataset = raw_dataset.batch(BATCH_SIZE)
raw_dataset = raw_dataset.prefetch(tf.data.AUTOTUNE)
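To separate the input-pipeline cost from the model cost, one could also time full passes over the dataset directly (a minimal sketch; the benchmark helper below is mine, not part of my training code):
import time

def benchmark(dataset, num_passes=2):
    # Iterate the dataset end to end and report the wall-clock time of each pass.
    # With an effective cache, the second pass should be much faster than the first.
    for i in range(num_passes):
        start = time.perf_counter()
        for _ in dataset:
            pass
        print(f"pass {i}: {time.perf_counter() - start:.1f}s")

# benchmark(raw_dataset)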
The data in datafile_list total 9.92 GB, which easily fits in the system's total physical RAM (100 GB). System swap is disabled.
By training the model using the dataset:
model = build_model()
model.fit(raw_dataset, epochs=5, verbose=2)
results in:
Epoch 1/5
206247/206247 - 126s - loss: 0.0043 - mae: 0.0494 - mse: 0.0043
Epoch 2/5
206247/206247 - 125s - loss: 0.0029 - mae: 0.0415 - mse: 0.0029
Epoch 3/5
206247/206247 - 129s - loss: 0.0027 - mae: 0.0397 - mse: 0.0027
Epoch 4/5
206247/206247 - 125s - loss: 0.0025 - mae: 0.0386 - mse: 0.0025
Epoch 5/5
206247/206247 - 125s - loss: 0.0024 - mae: 0.0379 - mse: 0.0024
This result is frustrating. According to the docs:
The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory. Subsequent iterations will use the cached data.
And from this guide:
When iterating over this dataset, the second iteration will be much faster than the first one thanks to the caching.
However, the elapsed time of every epoch is almost the same. In addition, during training both CPU and GPU usage are very low (see the images below).
Commenting out the line raw_dataset = raw_dataset.cache() does not produce any notable difference in the results:
Epoch 1/5
206067/206067 - 129s - loss: 0.0042 - mae: 0.0492 - mse: 0.0042
Epoch 2/5
206067/206067 - 127s - loss: 0.0028 - mae: 0.0412 - mse: 0.0028
Epoch 3/5
206067/206067 - 134s - loss: 0.0026 - mae: 0.0393 - mse: 0.0026
Epoch 4/5
206067/206067 - 127s - loss: 0.0024 - mae: 0.0383 - mse: 0.0024
Epoch 5/5
206067/206067 - 126s - loss: 0.0023 - mae: 0.0376 - mse: 0.0023
Based on the docs, my expectation was that using cache would result in a much faster training time. I would like to know what I am doing wrong.
Attachments
GPU usage during training using cache:
GPU usage during training WITHOUT cache:
System Stats (Memory, CPU etc) during training using cache:
System Stats (Memory, CPU etc) during training WITHOUT cache:
Just a small observation using Google Colab. According to the docs:
Note: For the cache to be finalized, the input dataset must be iterated through in its entirety. Otherwise, subsequent iterations will not use cached data.
And
Note: cache will produce exactly the same elements during each iteration through the dataset. If you wish to randomize the iteration order, make sure to call shuffle after calling cache.
I did notice a few differences when using caching and iterating over the dataset beforehand. Here is an example.
Prepare data:
import random
import struct
import tensorflow as tf
import numpy as np
RAW_N = 2 + 20*20 + 1
bytess = random.sample(range(1, 5000), RAW_N*4)
with open('mydata.bin', 'wb') as f:
    f.write(struct.pack('1612i', *bytess))

def decode_and_prepare(register):
    register = tf.io.decode_raw(register, out_type=tf.float32)
    inputs = register[2:402]
    label = tf.random.uniform(()) + register[402:]
    return inputs, label
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin']*7000, record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(decode_and_prepare)
Train the model without caching and without iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).batch(32).prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 4s 3ms/step - loss: 0.1425
Epoch 2/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41be037d0>
Train the model with caching but without iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).cache().batch(32).prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 4s 2ms/step - loss: 0.1428
Epoch 2/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 2s 3ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41fa87810>
Train the model with caching and with iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).cache().batch(32).prefetch(tf.data.AUTOTUNE)
_ = list(train_ds.as_numpy_iterator()) # iterate dataset beforehand
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 3s 3ms/step - loss: 0.1427
Epoch 2/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41ac9c850>
Conclusion: Caching and iterating over the dataset beforehand do seem to have an effect on training time, but note that this example used only 7000 copies of a single small file.
I built an algorithm in Python for dataset classification with Keras. It's a very simple LSTM network with one input layer, one hidden layer (LSTM), and one dense output layer.
My data consists of some analog measurements: 63 sets for training and 36 sets for testing, each set having 3 channels with 19200 samples per channel, so (following what I understood from reading the documentation) the input shapes I needed were x = (63,19200,3) and y = (36,19200,3). (If you want additional information about the type of data, I can explain more.)
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Input
from keras.layers import Dropout
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras import initializers
from keras import optimizers
# Fix random seed for reproducibility.
np.random.seed(1)
# Loading data (shapes: X_test (36,19200,3), y_test (36,3), X_train (63,19200,3), y_train (63,3))
(X_test, y_test), (X_train, y_train) = np.load('path.npy',allow_pickle=True)
data = [(X_test, y_test), (X_train, y_train)]
# Manually separating the validation data.
x_val = X_train[-10:]
y_val = y_train[-10:]
X_train = X_train[:-10]
y_train = y_train[:-10]
# Creating model.
model = Sequential()
model.add(Input(shape=(19200,3)))
model.add(LSTM(50, name = 'LSTM', activation='tanh',recurrent_activation='tanh', kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05, seed=1), bias_initializer=initializers.zeros()))
model.add(Dense(1, name = 'Saida', activation='sigmoid', kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05, seed=1), bias_initializer=initializers.zeros()))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
history = model.fit(X_train, y_train, epochs=20, batch_size=12, shuffle=True, validation_data=(x_val, y_val))
# Final evaluation of the model.
scores = model.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))
Very simple, but not that organized, still working on that.
And for this run, the results are:
Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
LSTM (LSTM) (None, 50) 10800
_________________________________________________________________
Saida (Dense) (None, 1) 51
=================================================================
Total params: 10,851
Trainable params: 10,851
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/20
5/5 [==============================] - 17s 3s/step - loss: 0.6866 - accuracy: 0.6792 - val_loss: 0.6956 - val_accuracy: 0.0000e+00
Epoch 2/20
5/5 [==============================] - 20s 4s/step - loss: 0.6814 - accuracy: 0.8113 - val_loss: 0.6979 - val_accuracy: 0.0000e+00
Epoch 3/20
5/5 [==============================] - 21s 4s/step - loss: 0.6915 - accuracy: 0.7925 - val_loss: 0.7002 - val_accuracy: 0.0000e+00
Epoch 4/20
5/5 [==============================] - 24s 5s/step - loss: 0.6810 - accuracy: 0.7925 - val_loss: 0.7025 - val_accuracy: 0.0000e+00
Epoch 5/20
5/5 [==============================] - 25s 5s/step - loss: 0.6828 - accuracy: 0.7925 - val_loss: 0.7048 - val_accuracy: 0.0000e+00
Epoch 6/20
5/5 [==============================] - 24s 5s/step - loss: 0.6703 - accuracy: 0.8302 - val_loss: 0.7072 - val_accuracy: 0.0000e+00
Epoch 7/20
5/5 [==============================] - 24s 5s/step - loss: 0.6787 - accuracy: 0.7925 - val_loss: 0.7095 - val_accuracy: 0.0000e+00
Epoch 8/20
5/5 [==============================] - 26s 5s/step - loss: 0.6963 - accuracy: 0.7547 - val_loss: 0.7117 - val_accuracy: 0.0000e+00
Epoch 9/20
5/5 [==============================] - 25s 5s/step - loss: 0.6776 - accuracy: 0.7925 - val_loss: 0.7141 - val_accuracy: 0.0000e+00
Epoch 10/20
5/5 [==============================] - 25s 5s/step - loss: 0.6640 - accuracy: 0.8302 - val_loss: 0.7164 - val_accuracy: 0.0000e+00
Epoch 11/20
5/5 [==============================] - 24s 5s/step - loss: 0.6626 - accuracy: 0.8491 - val_loss: 0.7187 - val_accuracy: 0.0000e+00
Epoch 12/20
5/5 [==============================] - 24s 5s/step - loss: 0.6504 - accuracy: 0.8491 - val_loss: 0.7210 - val_accuracy: 0.0000e+00
Epoch 13/20
5/5 [==============================] - 24s 5s/step - loss: 0.6729 - accuracy: 0.7925 - val_loss: 0.7233 - val_accuracy: 0.0000e+00
Epoch 14/20
5/5 [==============================] - 24s 5s/step - loss: 0.6602 - accuracy: 0.8302 - val_loss: 0.7257 - val_accuracy: 0.0000e+00
Epoch 15/20
5/5 [==============================] - 25s 5s/step - loss: 0.6857 - accuracy: 0.7547 - val_loss: 0.7281 - val_accuracy: 0.0000e+00
Epoch 16/20
5/5 [==============================] - 23s 5s/step - loss: 0.6630 - accuracy: 0.8113 - val_loss: 0.7305 - val_accuracy: 0.0000e+00
Epoch 17/20
5/5 [==============================] - 25s 5s/step - loss: 0.6633 - accuracy: 0.7925 - val_loss: 0.7328 - val_accuracy: 0.0000e+00
Epoch 18/20
5/5 [==============================] - 24s 5s/step - loss: 0.6600 - accuracy: 0.8302 - val_loss: 0.7352 - val_accuracy: 0.0000e+00
Epoch 19/20
5/5 [==============================] - 25s 5s/step - loss: 0.6670 - accuracy: 0.8113 - val_loss: 0.7374 - val_accuracy: 0.0000e+00
Epoch 20/20
5/5 [==============================] - 24s 5s/step - loss: 0.6534 - accuracy: 0.8302 - val_loss: 0.7399 - val_accuracy: 0.0000e+00
2/2 [==============================] - 1s 314ms/step - loss: 0.7171 - accuracy: 0.4167
Accuracy: 41.67%
Summarizing: the loss is high but decreases very slowly. The accuracy varies, but in the end it stabilizes at the same value (usually 0.7925 or 0.8113). And the accuracy on the validation set doesn't respond at all to the changes in the other metrics.
My main concern is that the validation data is not behaving as it should. I have already tried changing the optimizer, the activation functions of every layer, the weight initializers, the number of epochs (I went up to 100 several times but nothing changed), the batch size, shuffling the data with the Keras function and with Python's built-in method, and so on.
The only thing I did not try was changing the input shapes, but, as I mentioned earlier, this was the only way I got the 3D array to be accepted by the input layer.
If you have any tips on what could be changed to achieve more consistent results, I would be very grateful.
Any additional commentary will be happily accepted.
This is my first question here and I am not a native English speaker, so sorry if any information was not very clear.
Cheers, Matheus Zimmermann.
I think you can apply the to_categorical method (a one-hot encoding approach) to the y_train, y_val and y_test variables.
I hope that after applying it, your validation accuracy will behave properly.
I faced the same kind of problem before.
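A minimal sketch of that suggestion, using the variable names from the question (note that with one-hot targets the loss would also have to become categorical_crossentropy and the output layer would need one unit per class):
from tensorflow.keras.utils import to_categorical

# Assumes the y_* arrays contain integer class indices.
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)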
I was building a CNN model to predict whether an X-ray image is COVID-infected or not. During training, this is what I was getting as accuracy and loss in each epoch:
Epoch 1/20
43/43 [==============================] - 157s 4s/step - loss: 16.5535 - accuracy: 0.8844 - val_loss: 1.6308 - val_accuracy: 0.9914
Epoch 2/20
43/43 [==============================] - 153s 4s/step - loss: 9.3576 - accuracy: 0.9647 - val_loss: 1.8470 - val_accuracy: 0.9871
Epoch 3/20
43/43 [==============================] - 152s 4s/step - loss: 4.8507 - accuracy: 0.9720 - val_loss: 2.1491 - val_accuracy: 0.9871
Epoch 4/20
43/43 [==============================] - 153s 4s/step - loss: 2.8917 - accuracy: 0.9772 - val_loss: 0.5409 - val_accuracy: 0.9914
Epoch 5/20
43/43 [==============================] - 153s 4s/step - loss: 1.7138 - accuracy: 0.9831 - val_loss: 0.4102 - val_accuracy: 0.9957
Epoch 6/20
43/43 [==============================] - 153s 4s/step - loss: 2.4398 - accuracy: 0.9801 - val_loss: 5.5315 - val_accuracy: 0.9569
Epoch 7/20
43/43 [==============================] - 153s 4s/step - loss: 4.3175 - accuracy: 0.9661 - val_loss: 0.5032 - val_accuracy: 0.9914
Epoch 8/20
43/43 [==============================] - 152s 4s/step - loss: 1.7567 - accuracy: 0.9816 - val_loss: 0.5169 - val_accuracy: 0.9914
Epoch 9/20
43/43 [==============================] - 153s 4s/step - loss: 1.5359 - accuracy: 0.9786 - val_loss: 0.2652 - val_accuracy: 0.9957
Epoch 10/20
43/43 [==============================] - 153s 4s/step - loss: 0.9022 - accuracy: 0.9897 - val_loss: 0.1173 - val_accuracy: 0.9957
Epoch 11/20
43/43 [==============================] - 153s 4s/step - loss: 0.9991 - accuracy: 0.9801 - val_loss: 0.2755 - val_accuracy: 0.9871
When I run the classification report on the predictions, the accuracy is only 50%. Why is this happening? Can someone please explain?
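The report is generated with something like the following (simplified sketch; x_test and y_test stand in for my actual test arrays, and a single sigmoid output is assumed here):
from sklearn.metrics import classification_report

# x_test and y_test are placeholders for the held-out images and integer labels.
probs = model.predict(x_test)
preds = (probs > 0.5).astype(int).ravel()  # assuming a single sigmoid output unit
print(classification_report(y_test, preds))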
After training many models, I have come to realize that accuracy doesn't help much; you should focus on the loss rather than on the accuracy. In this case, I would suggest reducing the learning rate, since I can see that the training is not very stable. Also, I guess that you are not using Batch Normalization layers in your network, which may be why you are getting an accuracy of 50%. I can't say much, as you have not provided many details (such as your model structure, optimizer, and loss function), but try reducing the learning rate and introducing Batch Normalization into your model.
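A minimal sketch of those two suggestions, with a hypothetical architecture since the original model was not posted: a reduced learning rate and a BatchNormalization layer after the convolution.
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical CNN fragment; only the two suggested changes matter here.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(224, 224, 3)),
    layers.BatchNormalization(),  # added normalization
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # reduced from the default 1e-3
              loss='binary_crossentropy',
              metrics=['accuracy'])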