I am trying to run the demo code from the official TensorFlow website. I am attaching the full code (copied and rearranged) here for convenience:
import tensorflow as tf
# print("1")
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import time
import os
# print("2")
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
# @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

# @tf.function
def test_step(x, y):
    val_logits = model(x, training=False)
    val_acc_metric.update_state(y, val_logits)
inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
model = keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        loss_value = train_step(x_batch_train, y_batch_train)
        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * 64))
    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()
    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        test_step(x_batch_val, y_batch_val)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))
print("end")
For no apparent reason, this code hits a segmentation fault in TensorFlow 2.3.1 right at the beginning:
>python dummy.py
2021-03-11 17:45:52.231509: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Segmentation fault (core dumped)
Interestingly, if I put some random print statements at the very start (the print("1") etc. statements), the code executes to the end and only hits the segmentation fault at the very end (redundant output not shown):
Start of epoch 1
Training loss (for one batch) at step 0: 1.0215
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.9116
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.4894
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.5636
Seen so far: 38464 samples
Training acc over epoch: 0.8416
Validation acc: 0.8296
Time taken: 3.16s
end
Segmentation fault (core dumped)
Another observation: if I uncomment the @tf.function decorators on top of my train_step and test_step functions, the code segfaults again, but only after it prints
Start of epoch 0
Can someone explain what is going wrong with my TensorFlow package?
It was due to an older version of Ubuntu. I was using Ubuntu 14; after upgrading to Ubuntu 18, the issue was resolved.
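If you want to check which OS release and glibc your Python process is actually running against, a small diagnostic sketch (my own, not part of the original answer) could look like this:
import platform

print(platform.platform())          # kernel / distro string
print(platform.libc_ver())          # ('glibc', '<version>') on Linux
with open("/etc/os-release") as f:  # Ubuntu release details
    print(f.read())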
I have experimented with training a simple TensorFlow model in two scenarios: passing my model to my training loop (and to the subfunctions called from the training loop), versus not passing my model to the training loop. The two cases give different results. When I pass my model to the training functions, the model is trained properly. But in the second scenario something is wrong, because the model is apparently not trained. I am baffled, and I wonder if it's a scope thing.
To be more specific, my setup dynamically creates a new, larger model at each iteration of a for loop (adding some layers each time) and then trains the resulting model. As stated above, I train the model in two scenarios, passing the model to the training subfunctions or not, and I obtain different results depending on which one I do. I verify this by giving the model test samples (class 0 MNIST images) and checking whether the correct classification is output. The models trained by passing the model as an argument are trained correctly, but if I do not do this, then only the first model created by the for loop is trained correctly; the later ones produce incorrect class predictions. Can this be explained?
The code below is for training by not passing model as an argument.
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
from tensorflow.keras.optimizers import Adam
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import time
epochs = 200
input_shape = (28,28,1)
num_classes=10
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
x_train = np.expand_dims(x_train, -1)
x_val = np.expand_dims(x_val, -1)
x_test = np.expand_dims(x_test, -1)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_val = x_val.astype("float32")
y_test_sorted_args_0=np.where(y_test == 0)
x_test_0=x_test[y_test_sorted_args_0]
y_test_0=np.full( (x_test_0.shape)[0],0)
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        mod_output = model(x, training=True)
        loss_value = loss_fn(y, mod_output)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    acc_metric.update_state(y, mod_output)
    return loss_value

@tf.function
def test_step(x, y):
    val = model(x, training=False)
    acc_metric.update_state(y, val)
def train(epochs):
    for epoch in range(epochs):
        print("\nStart of epoch %d" % (epoch,))
        start_time = time.time()
        # Iterate over the batches of the dataset.
        for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
            loss_value = train_step(x_batch_train, y_batch_train)
            # Log every 200 batches.
            if step % 200 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (step, float(loss_value))
                )
                print("Seen so far: %d samples" % ((step + 1) * batch_size))
        # Display metrics at the end of each epoch.
        train_acc = acc_metric.result()
        print("Training acc over epoch: %.4f" % (float(train_acc),))
        # Reset training metrics at the end of each epoch
        acc_metric.reset_states()
        # Run a validation loop at the end of each epoch.
        for x_batch_val, y_batch_val in val_dataset:
            test_step(x_batch_val, y_batch_val)
        val_acc = acc_metric.result()
        val_acc_metric.reset_states()
        print("Validation acc: %.4f" % (float(val_acc),))
        print("Time taken: %.2fs" % (time.time() - start_time))
max_hidden=7
for num_hidden_layers in range(1, max_hidden, 3):
    model1 = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Flatten(),
        ]
    )
    for i in range(1, num_hidden_layers + 1):
        model1.add(layers.Dense(150, activation="relu"))
    model1.add(layers.Dense(num_classes, activation="softmax"))
    model = model1
    train(epochs)
    # Verify that the model is properly trained by checking that it correctly predicts images from class 0.
    # The more class 0 predictions we have, the better.
    for sample_index in range(0, 10):
        x_sample = x_test_0[sample_index, :, :, :]
        x_sample = np.expand_dims(x_sample, 0)
        print(tf.math.argmax(model(x_sample), axis=1))
        time.sleep(1)
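For reference, the working scenario described above (passing the model into the training functions) would look roughly like the sketch below; the exact signatures are my reconstruction, not the original code:
@tf.function
def train_step(model, x, y):
    with tf.GradientTape() as tape:
        mod_output = model(x, training=True)
        loss_value = loss_fn(y, mod_output)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    acc_metric.update_state(y, mod_output)
    return loss_value

# train() would likewise take the model and forward it, e.g.
#     loss_value = train_step(model, x_batch_train, y_batch_train)
# Note that tf.function retraces for each distinct model object passed in,
# so each freshly built model1 is the one that actually gets trained.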
I want the batch normalization running statistics (mean and variance) to converge at the end of training, which requires increasing the batch norm momentum from some initial value to 1.0. I managed to change the momentum using a custom Callback, but it only works if my model is compiled in eager mode. Toy example (it sets momentum=1.0 after epoch zero, after which moving_mean should stop updating):
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    # run_eagerly=True,
)

class BatchNormMomentumCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        last_bn_layer = None
        for layer in self.model.layers:
            if isinstance(layer, tf.keras.layers.BatchNormalization):
                if epoch == 0:
                    layer.momentum = 0.99
                else:
                    layer.momentum = 1.0
                last_bn_layer = layer
        if last_bn_layer:
            tf.print("Momentum=" + str(last_bn_layer.moving_mean[-1].numpy()))  # Should not change after epoch 1
batchnorm_decay = BatchNormMomentumCallback()
model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[batchnorm_decay], verbose=0)
Output (this is what I get when run_eagerly=False):
Momentum=0.0
Momentum=-102.20184
Momentum=-106.04614
Momentum=-116.36204
Momentum=-129.995
Momentum=-123.70443
Expected output (this is what I get when run_eagerly=True):
Momentum=0.0
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606
I guess this happens because in graph mode TF compiles the model as a graph with the momentum defined as 0.99, and then uses this value in the graph (so the momentum is not updated by BatchNormMomentumCallback).
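As a small illustration of that trace-time capture behaviour (a toy snippet of mine, not from the model above): a plain Python value read inside a tf.function is baked into the graph when the function is first traced, so later changes to the Python variable are not seen.
import tensorflow as tf

momentum = 0.99  # plain Python float

@tf.function
def get_momentum():
    # The Python value is captured as a constant at trace time.
    return tf.constant(momentum)

print(get_momentum().numpy())  # 0.99
momentum = 1.0
print(get_momentum().numpy())  # still 0.99 -- the traced graph does not see the update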
Question:
Is there a way to update that compiled momentum value inside the graph while training? I want to update the momentum without eager mode (i.e. keeping run_eagerly=False), because training efficiency is important.
I would recommend simply using a custom training loop for your use case. You will have all the flexibility you need:
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
batch_norm_layer = model.layers[2]
@tf.function
def train_step(epoch, model, batch):
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
        train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

epochs = 6
for epoch in range(epochs):
    tf.print("\nStart of epoch %d" % (epoch,))
    tf.print("Momentum = ", batch_norm_layer.moving_mean[-1], summarize=-1)
    for batch in ds_train:
        train_step(epoch, model, batch)
    train_acc = train_acc_metric.result()
    tf.print("Training acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()
Start of epoch 0
Momentum = 0
Training acc over epoch: 0.9158
Start of epoch 1
Momentum = -20.2749767
Training acc over epoch: 0.9634
Start of epoch 2
Momentum = -20.2749767
Training acc over epoch: 0.9755
Start of epoch 3
Momentum = -20.2749767
Training acc over epoch: 0.9826
Start of epoch 4
Momentum = -20.2749767
Training acc over epoch: 0.9876
Start of epoch 5
Momentum = -20.2749767
Training acc over epoch: 0.9915
A simple test shows that the function with the tf.function decorator performs way better:
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
import timeit
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
batch_norm_layer = model.layers[2]
@tf.function
def train_step(epoch, model, batch):
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
        train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

def train_step_without_tffunction(epoch, model, batch):
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
        train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

epochs = 6
for epoch in range(epochs):
    tf.print("\nStart of epoch %d" % (epoch,))
    tf.print("Momentum = ", batch_norm_layer.moving_mean[-1], summarize=-1)
    test = True
    for batch in ds_train:
        train_step(epoch, model, batch)
        if test:
            tf.print("TF function:", timeit.timeit(lambda: train_step(epoch, model, batch), number=10))
            tf.print("Eager function:", timeit.timeit(lambda: train_step_without_tffunction(epoch, model, batch), number=10))
            test = False
    train_acc = train_acc_metric.result()
    tf.print("Training acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()
Start of epoch 0
Momentum = 0
TF function: 0.02285163299893611
Eager function: 0.11109527599910507
Training acc over epoch: 0.9229
Start of epoch 1
Momentum = -88.1852188
TF function: 0.024091466999379918
Eager function: 0.1109461480009486
Training acc over epoch: 0.9639
Start of epoch 2
Momentum = -88.1852188
TF function: 0.02331122400210006
Eager function: 0.11751473100230214
Training acc over epoch: 0.9756
Start of epoch 3
Momentum = -88.1852188
TF function: 0.02656845700039412
Eager function: 0.1121610670015798
Training acc over epoch: 0.9830
Start of epoch 4
Momentum = -88.1852188
TF function: 0.02821972700257902
Eager function: 0.15709391699783737
Training acc over epoch: 0.9877
Start of epoch 5
Momentum = -88.1852188
TF function: 0.02441513300072984
Eager function: 0.10921925399816246
Training acc over epoch: 0.9917
Another option is to declare the momentum as a variable
momentum = tf.Variable(0.99, trainable=False)
# pass into the BN layer
tf.keras.layers.BatchNormalization(momentum=momentum)
Then you can have a callback that updates the momentum
class BNMomentumUpdate(tf.keras.callbacks.Callback):
    def __init__(self, momentum):
        super().__init__()
        self.momentum = momentum

    def on_epoch_end(self, epoch, logs=None):
        if epoch > 0:
            self.momentum.assign(1.)
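Putting the pieces together, a minimal usage sketch (assuming, as above, that the BatchNormalization layer accepts a tf.Variable for momentum; the model and dataset are placeholders):
momentum = tf.Variable(0.99, trainable=False)

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(momentum=momentum),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# The callback flips the variable to 1.0 after epoch 0; since the graph reads
# a variable rather than a baked-in constant, the change should take effect
# even with run_eagerly=False.
model.fit(ds_train, epochs=6, callbacks=[BNMomentumUpdate(momentum)])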
I am trying to compare the results of my own training loop with those given by Keras fit. I provide a code snippet below (RUN_TYPE = 1 runs my own training loop; anything else runs Keras fit). As you can see:
I seed the random number generators, so my generated training set is exactly the same (checked).
I use the same function to create the DNN architecture.
I use the same hyperparameter values (learning rate, optimizer parameters, batch_size, etc.).
I specifically use shuffle=False in the Keras fit function to disable any batch shuffling.
When I set batch_size to the size of the training sample, I get the same loss in the two versions. As soon as batch_size < training sample size, i.e. an epoch takes multiple steps, my results diverge. Having made the algorithm deterministic, I don't understand where the discrepancy comes from.
Any tips?
RUN_TYPE = 2
import numpy as np
import time
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import LambdaCallback
import tensorflow.keras.backend as K
np.random.seed(1)
tf.random.set_seed(2)
@tf.function
def training_step(x, y, model, opt):
    print(x)
    print(y)
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        mseLoss = keras.losses.MeanSquaredError()
        loss = mseLoss(y, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss
# HyperParameters Initialisation
# learning_rate, beta_1, beta_2, blablabla.
if RUN_TYPE == 1:
    X_train, y_train = createDataSet(n)
    train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    train_dataset = train_dataset.batch(batch_size)
    # An optimizer for updating the trainable variables
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=beta_1, beta_2=beta_2)
    # Create an instance of the model
    model = createModel(X_train.shape[1:])
    # Train the model
    for epoch in range(epochs):
        print("\nStart of epoch %d" % (epoch,))
        for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
            loss = training_step(x_batch_train, y_batch_train, model, optimizer)
            print("Step " + str(step) + " loss = " + str(loss.numpy()))
else:
    X, Y = createDataSet(n)
    model = createModel(X.shape[1:])
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate, beta_1=beta_1, beta_2=beta_2)
    model.compile(loss="mse", optimizer=optimizer)
    history = model.fit(X, Y, epochs=epochs, batch_size=batch_size, verbose=0, shuffle=False)
I'm new to TensorFlow and Keras. To get started, I followed the https://www.tensorflow.org/tutorials/quickstart/advanced tutorial. I'm now adapting it to train on CIFAR10 instead of the MNIST dataset. I recreated this model https://keras.io/examples/cifar10_cnn/ and I'm trying to run it in my own codebase.
Logically, if the model, batch size, and optimizer are all the same, the two should perform identically, but they don't. I thought I might be making a mistake in preparing the data, so I copied the model.fit call from the Keras example into my script, and it still performs better. Using .fit gives me around 75% accuracy in 25 epochs, while the manual method needs around 60 epochs to get there. With .fit I also achieve a slightly better maximum accuracy.
What I want to know is: is .fit doing something behind the scenes that optimizes training? What do I need to add to my code to get the same performance? Am I doing something obviously wrong?
Thanks for your time.
Main code:
import tensorflow as tf
from tensorflow import keras
import msvcrt
from Plotter import Plotter
#########################Configuration Settings#############################
BatchSize = 32
ModelName = "CifarModel"
############################################################################
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print("x_train",x_train.shape)
print("y_train",y_train.shape)
print("x_test",x_test.shape)
print("y_test",y_test.shape)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).batch(BatchSize)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BatchSize)
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0001,decay=1e-6)
# Create an instance of the model
model = ModelManager.loadModel(ModelName,10)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
########### Using this function I achieve better results ##################
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=BatchSize,
          epochs=100,
          validation_data=(x_test, y_test),
          shuffle=True,
          verbose=2)
############################################################################
########### Using the below code I achieve worse results ##################
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
epoch = 0
InterruptLoop = False
while InterruptLoop == False:
    # Shuffle training data
    train_ds.shuffle(1000)
    epoch = epoch + 1
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()
    for images, labels in train_ds:
        train_step(images, labels)
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)
    test_accuracy = test_accuracy.result() * 100
    train_accuracy = train_accuracy.result() * 100
    # Print update to console
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch,
                          train_loss.result(),
                          train_accuracy,
                          test_loss.result(),
                          test_accuracy))
    # Check if keyboard pressed
    while msvcrt.kbhit():
        char = str(msvcrt.getch())
        if char == "b'q'":
            InterruptLoop = True
            print("Stopping loop")
The model:
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPool2D
from tensorflow.keras import Model
class ModelData(Model):
    def __init__(self, NumberOfOutputs):
        super(ModelData, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu', padding='same', input_shape=(32,32,3))
        self.conv2 = Conv2D(32, 3, activation='relu')
        self.maxpooling1 = MaxPool2D(pool_size=(2,2))
        self.dropout1 = Dropout(0.25)
        ############################
        self.conv3 = Conv2D(64, 3, activation='relu', padding='same')
        self.conv4 = Conv2D(64, 3, activation='relu')
        self.maxpooling2 = MaxPool2D(pool_size=(2,2))
        self.dropout2 = Dropout(0.25)
        ############################
        self.flatten = Flatten()
        self.d1 = Dense(512, activation='relu')
        self.dropout3 = Dropout(0.5)
        self.d2 = Dense(NumberOfOutputs, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.maxpooling1(x)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.maxpooling2(x)
        x = self.dropout2(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.dropout3(x)
        x = self.d2(x)
        return x
I know this question already has an answer, but I faced the same problem, and the solution turned out to be something different that's not specified in the documentation.
I copy and paste here the answer (and the relevant link) I found on GitHub, which solved the issue in my case:
The problem is caused by broadcasting in your loss function in the custom loop. Make sure that the dimensions of predictions and label is equal. At the moment (for MAE) they are [128,1] and [128]. Just make use of tf.squeeze or tf.expand_dims.
Link: https://github.com/tensorflow/tensorflow/issues/28394
Basic translation: when computing the loss, always be sure of the tensors' shapes.
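To make the shape issue concrete, here is a small illustrative snippet (the tensors and loss are placeholders of mine, not from the post above):
import tensorflow as tf

labels = tf.zeros([128, 1])    # labels often arrive with shape (batch, 1)
predictions = tf.zeros([128])  # model output with shape (batch,)

# Broadcasting (128, 1) against (128,) silently produces a (128, 128) matrix
# of differences, so the loss is computed over the wrong pairs. Align the
# ranks first:
loss_fn = tf.keras.losses.MeanAbsoluteError()
loss = loss_fn(tf.squeeze(labels, axis=-1), predictions)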
Mentioning the solution here (answer section), even though it is present in the comments, for the benefit of the community.
On the same dataset, the accuracy can differ between Keras Model.fit and a model trained with a custom TensorFlow loop, mainly when the data is shuffled: shuffling changes how the data is split between training and testing (or validation), so the two cases (Keras and TensorFlow) end up with different train and test data.
If we want to observe similar results on the same dataset with a similar architecture in Keras and in TensorFlow, we can turn off shuffling of the data, as sketched below.
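For illustration, turning shuffling off in both paths might look like this (a minimal sketch with placeholder names, not code from the question):
# Keras path: disable batch shuffling in fit()
model.fit(x_train, y_train, batch_size=32, epochs=10, shuffle=False)

# Custom-loop path: simply do not call .shuffle() on the dataset
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)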
Hope this helps. Happy Learning!
I've been trying to replicate the eager execution tutorial using the MNIST dataset, but it doesn't work... I get no error, but it looks as if the code just stalls in the first round; the output is
Epoch 000: Loss: nan, Accuracy: 9.898%
I've tested other MNIST examples and they progressed much faster (I waited for about 30 minutes).
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
#Load dataset
mnist = tf.contrib.learn.datasets.load_dataset('mnist')
train_data = mnist.train.images
train_labels = np.asarray(mnist.train.labels, dtype = np.int32)
test_data = mnist.test.images
test_labels = np.asarray(mnist.test.labels, dtype = np.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.shuffle(buffer_size=100000)
train_dataset = train_dataset.batch(10)
features, label = iter(train_dataset).next()
print("example features:", features[0])
print("example label:", label[0])
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(784,)),  # input shape required
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3)
])

def loss(model, x, y):
    y_ = model(x)
    return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01)
train_loss_results = []
train_accuracy_results = []
num_epochs = 200
for epoch in range(num_epochs):
    epoch_loss_avg = tf.contrib.eager.metrics.Mean()
    epoch_accuracy = tf.contrib.eager.metrics.Accuracy()
    for x, y in train_dataset:
        grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.variables),
                                  global_step=tf.train.get_or_create_global_step())
        epoch_loss_avg(loss(model, x, y))
        epoch_accuracy(tf.argmax(model(x), axis=1, output_type=tf.int32), y)
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
    if epoch % 50 == 0:
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                                    epoch_loss_avg.result(),
                                                                    epoch_accuracy.result()))
P.S. I know that the model probably won't be efficient; it's just a test run.