training model for different batch sizes in keras - python

I want to train my model with different batch sizes, e.g. [64, 128].
I am doing it with a for loop, like below:
epochs = 2
batch_sizes = [128, 256]
for batch_size in batch_sizes:
    history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                        callbacks=[early_stopping, chk], validation_data=(x_test, y_test))
With the above code, my model produces the following results:
Epoch 1/2
311/311 [==============================] - 157s 494ms/step - loss: 0.2318 - f1: 0.0723
Epoch 2/2
311/311 [==============================] - 152s 488ms/step - loss: 0.1402 - f1: 0.4360
Epoch 1/2
156/156 [==============================] - 137s 877ms/step - loss: 0.1197 - f1: 0.5450
Epoch 2/2
156/156 [==============================] - 136s 871ms/step - loss: 0.1132 - f1: 0.5756
It looks like the model continues training from where it left off after finishing batch size 64. I want the model to be trained from scratch for the next batch size. How can I do that?
P.S. What I have tried:
epochs = 2
batch_sizes = [128, 256]
for batch_size in batch_sizes:
    history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                        callbacks=[early_stopping, chk], validation_data=(x_test, y_test))
    keras.backend.clear_session()
That did not work either.

You can write a function that defines the model and call it before each fit call. If you keep reusing the same model object, its weights are updated during training and stay that way after fit returns, which is why you need to rebuild the model for every batch size. For example:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.rand(1000, 5)
Y = np.random.rand(1000, 1)

def build_model():
    # A fresh model with newly initialized weights on every call
    model = Sequential()
    model.add(Dense(64, input_shape=(X.shape[1],)))
    model.add(Dense(Y.shape[1]))
    model.compile(loss='mse', optimizer='Adam')
    return model

epoch = 2
batch_sizes = [128, 256]
for i in range(len(batch_sizes)):
    model = build_model()
    history = model.fit(X, Y, batch_sizes[i], epochs=epoch, verbose=2)
    model.save('Model_' + str(batch_sizes[i]) + '.h5')
Then, the output looks like:
Epoch 1/2
8/8 - 0s - loss: 0.3164
Epoch 2/2
8/8 - 0s - loss: 0.1367
Epoch 1/2
4/4 - 0s - loss: 0.7221
Epoch 2/2
4/4 - 0s - loss: 0.4787
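The step counts in that log (8/8, then 4/4) follow directly from the batch size: with 1000 samples, each epoch takes ceil(1000 / batch_size) steps. Here is a minimal sketch of that arithmetic, assuming the last partial batch is kept. The helper name steps_per_epoch is made up for illustration and is not a Keras function.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# Same sample count and batch sizes as in the answer's example
for batch_size in [128, 256]:
    print(batch_size, steps_per_epoch(1000, batch_size))
```

The loss also starts high again in the second run (0.7221 after 0.1367), which is what you would expect when the second model really starts from freshly initialized weights.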

Related

tf.keras.callbacks.ModelCheckpoint ignores the montior parameter and always uses loss

I am running tf.keras.callbacks.ModelCheckpoint with the accuracy metric, but loss is used to decide which checkpoints to save. I have tested this in different environments (my computer and Colab) and with two different pieces of code, and hit the same issue each time. Here is example code and the results:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
import shutil
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(
        optimizer="rmsprop",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
    shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(r'D:\deep_learning\tf_keras\semantic_segmentation\logs', 'mymodel_{epoch}')
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        # The saved model name will include the current epoch.
        filepath=ckpt_path,
        montior="val_accuracy",
        # save the model weights with best validation accuracy
        mode='max',
        save_best_only=True,  # only save the best weights
        save_weights_only=False,
        # only save model weights (not whole model)
        verbose=1
    )
]
model = get_compiled_model()
model.fit(
    x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks, validation_split=0.2, steps_per_epoch=1
)
1/1 [==============================] - ETA: 0s - loss: 2.6475 - accuracy: 0.0000e+00
Epoch 1: val_loss improved from -inf to 2.32311, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_1
1/1 [==============================] - 6s 6s/step - loss: 2.6475 - accuracy: 0.0000e+00 - val_loss: 2.3231 - val_accuracy: 0.1142
Epoch 2/3
1/1 [==============================] - ETA: 0s - loss: 1.9612 - accuracy: 1.0000
Epoch 2: val_loss improved from 2.32311 to 2.34286, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_2
1/1 [==============================] - 5s 5s/step - loss: 1.9612 - accuracy: 1.0000 - val_loss: 2.3429 - val_accuracy: 0.1187
Epoch 3/3
1/1 [==============================] - ETA: 0s - loss: 2.8378 - accuracy: 0.0000e+00
Epoch 3: val_loss did not improve from 2.34286
1/1 [==============================] - 5s 5s/step - loss: 2.8378 - accuracy: 0.0000e+00 - val_loss: 2.2943 - val_accuracy: 0.1346
In your code you wrote montior instead of monitor. The callback does not recognize a parameter with that name, so the misspelled keyword has no effect and the default monitor (val_loss) is used. If you write it as below, you get what you want:
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath=ckpt_path,
        monitor="val_accuracy",
        mode='max',
        save_best_only=True,
        save_weights_only=False,
        verbose=1
    )
]
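As for why no error was raised: a constructor that accepts arbitrary extra keyword arguments cannot tell a typo from a legitimate option. The toy class below is not Keras code. ToyCheckpoint and its parameters are made up to illustrate the mechanism, on the assumption that the ModelCheckpoint version used in the question swallowed the unknown keyword in a similar way.

```python
class ToyCheckpoint:
    """Hypothetical stand-in for a callback whose constructor accepts **kwargs."""

    def __init__(self, filepath, monitor="val_loss", mode="auto", **kwargs):
        self.filepath = filepath
        self.monitor = monitor
        self.mode = mode
        # Unknown keywords land here and are never looked at again.
        self.ignored = kwargs


typo = ToyCheckpoint("ckpt", montior="val_accuracy", mode="max")
fixed = ToyCheckpoint("ckpt", monitor="val_accuracy", mode="max")

print(typo.monitor)   # val_loss
print(typo.ignored)   # {'montior': 'val_accuracy'}
print(fixed.monitor)  # val_accuracy
```

A quick way to catch this kind of mistake is to read the verbose=1 log: in the question it reports val_loss, not val_accuracy, which shows which quantity is actually being monitored.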

Binary classification model using BERT encoder stuck at 50% accuracy

I'm trying to train a simple model for the Yelp binary classification task.
Load BERT encoder:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)
_, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Load data:
data, info = tfds.load('yelp_polarity_reviews', with_info=True, batch_size=-1, as_supervised=True)
train_x_orig, train_y_orig = tfds.as_numpy(data['train'])
train_x = encode_examples(train_x_orig)
train_y = train_y_orig
Use BERT to embed the data:
encoder_output = bert_encoder.predict(train_x)
Setup the model:
inputs = keras.Input(shape=(768,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(8, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
sgd = SGD(lr=0.0001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
Train:
model.fit(encoder_output[0], train_y, batch_size=64, epochs=3)
# encoder_output[0].shape === (10000, 1, 768)
# y_train.shape === (100000,)
Training results:
Epoch 1/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6921 - accuracy: 0.5455
Epoch 2/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6918 - accuracy: 0.5455
Epoch 3/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6915 - accuracy: 0.5412
Epoch 4/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6913 - accuracy: 0.5407
Epoch 5/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6911 - accuracy: 0.5358
I tried different learning rates, but the main issue seems to be that training takes 1 second and the accuracy stays at ~0.5. Am I not setting up the inputs/model correctly?
Your BERT model is not being trained. It has to be placed before the dense layers and trained as part of the model. The input layer should take not precomputed BERT vectors but the sequence of tokens, cropped to max_length and padded. Here is example code: https://keras.io/examples/nlp/text_extraction_with_bert/ (see the beginning of the create_model function).
Alternatively, you can use Trainer from transformers.
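The "cropped to max_length and padded" step can be sketched in plain Python. This is only an illustration: the function name pad_or_truncate, the token IDs and pad_id=0 are all made up here, and a real tokenizer would normally do this for you (and take care of special tokens when truncating).

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])      # crop sequences that are too long
    mask = [1] * len(ids)                   # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


# Made-up token IDs, one shorter and one longer than max_length
short_ids, short_mask = pad_or_truncate([7, 8, 9], max_length=5)
long_ids, long_mask = pad_or_truncate([7, 8, 9, 10, 11, 12], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Fixed-length integer sequences like these, together with the mask, are the kind of input the model's input layer would receive instead of the 768-dimensional vectors used in the question.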

Why is my model overfitting on the second epoch?

I'm a beginner in deep learning, and I'm trying to train a model to classify different ASL hand signs using MobileNet_v2 and Inception.
Here is my code, which creates an ImageDataGenerator for the training and validation sets:
# Reformat Images and Create Batches
IMAGE_RES = 224
BATCH_SIZE = 32
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.4
)
train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_RES, IMAGE_RES),
    batch_size=BATCH_SIZE,
    subset='training'
)
val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_RES, IMAGE_RES),
    batch_size=BATCH_SIZE,
    subset='validation'
)
Here is the code to train the model:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
                                   input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
    feature_extractor,
    layers.Dense(5, activation='softmax')
])
model.summary()
# Train the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
EPOCHS = 5
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator),
                    epochs=EPOCHS,
                    validation_data=val_generator,
                    validation_steps=len(val_generator)
                    )
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation but the model still overfits so I'm wondering if I've done something wrong in my code.
Your dataset is very small. Try splitting it with different random seeds and check whether the problem persists.
If it does, add regularization and reduce the complexity of the network.
Also experiment with different optimizers and a smaller learning rate (try a learning-rate scheduler).
It seems your dataset is very small, with some true outputs separated by only a small distance in input space. That is why the model fits those points so easily.
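For the learning-rate suggestion, a schedule can be as simple as a function of the epoch index. The sketch below is plain Python. The function name step_decay, the starting rate, the decay factor and the interval are arbitrary values picked for illustration, not recommendations for this dataset.

```python
def step_decay(epoch, lr, drop=0.5, every=2):
    """Multiply the learning rate by `drop` every `every` epochs (not at epoch 0)."""
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr


# Simulate five epochs, feeding the current rate back in each time
lr = 1e-3
schedule = []
for epoch in range(5):
    lr = step_decay(epoch, lr)
    schedule.append(lr)

print(schedule)
```

A function that takes (epoch, lr) and returns the new rate is the shape that tf.keras.callbacks.LearningRateScheduler expects, so something like this could be passed to model.fit through the callbacks argument.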

Why is the loss so high for a 4-input, 2-output function approximation using an MLP?

My code is very simple, since my understanding is that an MLP can approximate any function:
def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(5, activation='tanh', input_shape=(4, ), name='a'),
        tf.keras.layers.Dense(5, activation='tanh'),
        tf.keras.layers.Dense(2, activation='sigmoid', name='b')])
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
    model.compile(loss='mse', optimizer=optimizer)
    return model

def train_benchmark_NN(x, y, epochs=10000):
    model = build_model()
    es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20, verbose=0)
    history = model.fit(x, y, batch_size=1000, epochs=epochs, validation_split=0.2, verbose=1, callbacks=[es])
    return model, history
I tried different numbers of layers (like [256, 128, 64, 32]), nodes, optimizers, initializers, and activation functions. I also tried handling the two outputs separately instead of training one model for both, but the result is bad too. Actually, I don't have a good sense of how heavy my model should be for data like this. I tried training a model on some known functions with the same number of inputs and outputs, and it is always very hard when the function looks like
y1=math.cos(x1)+math.cos(x2)+math.cos(x3)+math.cos(x4).
Can anyone tell me whether I should try a much heavier model, whether I did something wrong in my code, or whether I have to preprocess the data differently? I only normalized it with z-score. The data size is ~6000 in total.
Current results:
Epoch 62/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2711 - val_loss: 3.9427
Epoch 63/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2686 - val_loss: 3.9444
Epoch 64/10000
4936/4936 [==============================] - 0s 3us/sample - loss: 0.2661 - val_loss: 3.9457
If I change validation_split from 0.2 to 0.01, the result becomes very different:
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3729 - val_loss: 0.0589
Epoch 96/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3683 - val_loss: 0.0356
Epoch 97/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3702 - val_loss: 0.0381
i: 0 , err_mean: 2.383471436639142
Although the val_loss became much smaller, that is probably because the validation set isn't big enough; when I plot the errors, they still look the same.
Some visualization of the relationships in my data:
inputs are x1-car speed, x2-engine torque, x3-DOC temperature, x4-DPF temperature
outputs are y1-tailpipe CO gas, y2-tailpipe HC gas.
y1 against x1, x2, x3, x4 are shown below:
Should this function be easy to approximate at all? Thanks!
I plotted the errors against the targets. It seems the model didn't learn at all, because the errors are highly correlated with the targets.
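That last observation can be checked numerically: if a model outputs roughly the same value for every input, the error (target minus prediction) is just the target shifted by a constant, so the two are almost perfectly correlated. Here is a small NumPy sketch with synthetic data. The arrays are made up and are not the emissions data from the question, and error_target_corr is a helper name chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
targets = rng.normal(size=1000)

# A "model" that has learned nothing: it predicts (almost) the mean everywhere.
constant_pred = np.full_like(targets, targets.mean()) + rng.normal(scale=0.01, size=1000)
# A "model" that tracks the targets, up to some independent noise.
good_pred = targets + rng.normal(scale=0.1, size=1000)


def error_target_corr(y_true, y_pred):
    """Pearson correlation between the errors and the targets."""
    return np.corrcoef(y_true - y_pred, y_true)[0, 1]


corr_constant = error_target_corr(targets, constant_pred)
corr_good = error_target_corr(targets, good_pred)
print(corr_constant)  # expected to be close to 1
print(corr_good)      # expected to be close to 0
```

So a strong error-versus-target correlation, as described above, is consistent with predictions that barely vary with the inputs.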

0% accuracy with evaluate_generator but 75% accuracy during training with same data - what is going on?

I'm encountering a very strange issue with a Keras model using ImageDataGenerator, fit_generator, and evaluate_generator.
I'm creating the model like so:
classes = <list of classes>
num_classes = len(classes)
pretrained_model = Sequential()
pretrained_model.add(ResNet50(include_top=False, weights='imagenet', pooling='avg'))
pretrained_model.add(Dense(num_classes, activation='softmax'))
pretrained_model.layers[0].trainable = False
pretrained_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
And I'm training it like this:
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=1, verbose=1)
fit_generator prints loss: 1.0297 - acc: 0.7546.
Then, I am trying to evaluate the model on the exact same data it was trained on.
debug_gen = idg_final.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes, shuffle=True)
print(pretrained_model.evaluate_generator(debug_gen, steps=100))
Which prints [10.278913383483888, 0.0].
Why is the accuracy so different on the same exact data?
Edit: I also wanted to point out that sometimes the accuracy is above 0.0. For example, when I use a model trained for five epochs, evaluate_generator returns 6% accuracy.
Edit 2: Based on the answers below I made sure to train for more epochs and that the ImageDataGenerator for evaluation did not have random shifts and rotations. I'm still getting very high accuracy during training and extremely low accuracy during evaluation on the same dataset.
I'm training like this:
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=10, verbose=1)
Which prints the following:
Found 9850 images belonging to 4251 classes.
Epoch 1/10
308/308 [==============================] - 3985s 13s/step - loss: 8.9218 - acc: 0.0860
Epoch 2/10
308/308 [==============================] - 3555s 12s/step - loss: 3.2710 - acc: 0.3403
Epoch 3/10
308/308 [==============================] - 3594s 12s/step - loss: 1.8597 - acc: 0.5836
Epoch 4/10
308/308 [==============================] - 3656s 12s/step - loss: 1.2712 - acc: 0.7058
Epoch 5/10
308/308 [==============================] - 3667s 12s/step - loss: 0.9556 - acc: 0.7795
Epoch 6/10
308/308 [==============================] - 3689s 12s/step - loss: 0.7665 - acc: 0.8207
Epoch 7/10
308/308 [==============================] - 3693s 12s/step - loss: 0.6581 - acc: 0.8498
Epoch 8/10
308/308 [==============================] - 3618s 12s/step - loss: 0.5874 - acc: 0.8636
Epoch 9/10
308/308 [==============================] - 3823s 12s/step - loss: 0.5144 - acc: 0.8797
Epoch 10/10
308/308 [==============================] - 4334s 14s/step - loss: 0.4835 - acc: 0.8854
And I'm evaluating like this, on the exact same dataset:
idg_debug = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
)
debug_gen = idg_debug.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes)
print(pretrained_model.evaluate_generator(debug_gen))
Which prints the following very low accuracy: [10.743386410747084, 0.0001015228426395939]
The full code is here.
Two things I suspect.
1 - No, your data is not the same.
You're using three types of augmentation in ImageDataGenerator, and it seems there isn't a random seed being set. So, test data is not equal to training data.
And as it seems, you're also training for only one epoch, which is very little (unless you really have tons of data, but since you're using augmentation, maybe that's not the case). (PS: I don't see the steps_per_epoch argument in your fit_generator call...)
So, if you want to see good results, here are some options:
- remove the augmentation arguments from the generator for this test (for both training and test data), i.e. remove width_shift_range, height_shift_range and rotation_range;
- if not, train for much longer, long enough for your model to get used to all kinds of augmented images (five epochs still seems to be far too few);
- or set a random seed so that the test data is guaranteed to equal the training data (the seed argument of flow_from_directory).
2 - (This may happen if you're very new to Keras/programming, so please ignore if it's not the case) You might be running the code that defines the model again when testing.
If you run the code that defines the model again, it will replace all your previous training with random weights.
3 - Since we're out of suggestions:
Maybe save the weights instead of saving the model. I usually do this instead of saving the models. (For some reason I don't understand, I've never been able to load a model like that)
def createModel():
    ....
model = createModel()
...
model.fit_generator(....)
np.save('model_weights.npy',model.get_weights())
model = createModel()
model.set_weights(np.load('model_weights.npy'))
...
model.evaluate_generator(...)
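One caveat with the snippet above: model.get_weights() returns a list of arrays with different shapes, and recent NumPy versions may refuse to np.save such a ragged list directly. A hedged alternative is np.savez, which stores each array separately. In the sketch below the list named weights is a made-up stand-in for model.get_weights(), and the file name is arbitrary.

```python
import os
import tempfile

import numpy as np

# Stand-in for model.get_weights(): a kernel and a bias with different shapes.
weights = [np.random.rand(5, 64), np.random.rand(64)]

path = os.path.join(tempfile.mkdtemp(), "model_weights.npz")
np.savez(path, *weights)  # positional arrays are stored as arr_0, arr_1, ...

# Read the arrays back in their original order.
with np.load(path) as data:
    restored = [data[f"arr_{i}"] for i in range(len(data.files))]

# With a real model, `restored` would be passed to model.set_weights(restored).
print([w.shape for w in restored])
```

Keras models also have their own save_weights and load_weights methods, which avoid the manual round trip through NumPy files.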
Hint:
It's not related to the bug, but make sure that the base model layer is really layer 0. If I remember well, sequential models have an input layer and you should actually be making layer 1 untrainable instead.
Use the model.summary() to confirm the number of untrainable parameters.
