Suppose you have a basic model similar to this:
input_layer = layers.Input(shape=(50,20))
layer = layers.Dense(123, activation = 'relu')(input_layer)
layer = layers.LSTM(128, return_sequences = True)(layer)
outputs = layers.Dense(20, activation='softmax')(layer)
model = Model(input_layer,outputs)
How would you implement CTC loss? I tried something from the Keras OCR code example, like this:
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.backend.ctc_batch_cost

    def call(self, y_true, y_pred):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        # At test time, just return the computed predictions
        return y_pred
labels = layers.Input(shape=(None,), dtype="float32")
input_layer = layers.Input(shape=(50,20))
layer = layers.Dense(123, activation = 'relu')(input_layer)
layer = layers.LSTM(128, return_sequences = True)(layer)
outputs = layers.Dense(20, activation='softmax')(layer)
output = CTCLayer()(labels,outputs)
model = Model(input_layer,outputs)
However, when it came to model.fit it started to fall apart because I don't know how to feed the model the extra "label" input. I find the approach in the tutorial quite convoluted, so what would be a better and more efficient way to implement CTC loss?
The only thing you are doing wrong is the model creation: model = Model(input_layer, outputs) should be model = Model([input_layer, labels], output). That said, you can also compile the model with tf.nn.ctc_loss as the loss if you don't want to have two inputs:
def my_loss_fn(y_true, y_pred):
    loss_value = tf.nn.ctc_loss(y_true, y_pred, y_true_length, y_pred_length,
                                logits_time_major=False)
    return tf.reduce_mean(loss_value, axis=-1)

model.compile(optimizer='adam', loss=my_loss_fn)
Something like this. Note that the code above is not tested, and you still need to compute the y_pred and y_true lengths, but you can do that the same way as in the CTC layer; a rough sketch is below.
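For example, a minimal, untested sketch of that single-input variant could look like the following; the way the lengths are derived from the tensor shapes mirrors the CTC layer, while the blank_index choice and the data names (x_train, padded_labels) are assumptions rather than part of the original answer:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def ctc_loss_fn(y_true, y_pred):
    # Derive the lengths from the tensor shapes, as the CTCLayer does.
    batch_len = tf.shape(y_true)[0]
    logit_length = tf.fill([batch_len], tf.shape(y_pred)[1])   # time steps produced by the model
    label_length = tf.fill([batch_len], tf.shape(y_true)[1])   # padded label length
    loss = tf.nn.ctc_loss(
        labels=tf.cast(y_true, tf.int32),
        logits=y_pred,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=-1,            # last class is the blank, as in ctc_batch_cost
    )
    return tf.reduce_mean(loss)

# Plain single-input model; labels are fed as ordinary targets to model.fit.
input_layer = layers.Input(shape=(50, 20))
x = layers.Dense(123, activation='relu')(input_layer)
x = layers.LSTM(128, return_sequences=True)(x)
# Note: tf.nn.ctc_loss expects unnormalized logits, so a linear activation here
# (rather than softmax) is usually the safer choice.
outputs = layers.Dense(20)(x)
model = keras.Model(input_layer, outputs)
model.compile(optimizer='adam', loss=ctc_loss_fn)
# model.fit(x_train, padded_labels, ...)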
I'm trying to train a slot-filling model with an LSTM in TensorFlow, but I get a shape error after the first epoch.
Error message: "Input to reshape is a tensor with 5632 values, but the requested shape has 12800"
I used a sequence model and defined timestep = 128, which is the length of the input sequence (after padding).
Input data in model.fit: x_train.shape = (7244, 128), y_train.shape = (7244, 128)
I have no idea why the model works in the first epoch but fails afterwards. Any suggestions are appreciated!
batch_size = 100
sequence_input = Input(shape=(128,), dtype='int32', batch_size=batch_size)
embedding_layer = Embedding(embeddings.shape[0],
                            embedding_dim,
                            weights=[embeddings],
                            input_length=128,
                            name='embeddings', mask_zero=True)
# The embedding layer uses a dict built from GloVe 300d vectors.
# I set input_length = 128 because I guess this parameter also refers to the time step? Not entirely sure.
embedded_sequences = embedding_layer(sequence_input)
x = Bidirectional(GRU(90,
                      stateful=True,
                      return_sequences=True,
                      name='lstm_layer',
                      go_backwards=True))(embedded_sequences)
x = Dense(50, activation="sigmoid")(x)
pred = Dense(9, activation="sigmoid")(x)
_model = Model(sequence_input, pred)
_model.compile(loss='SparseCategoricalCrossentropy',
               optimizer='adam',
               metrics=['accuracy'])
_model.summary()
history = _model.fit(x_train, y_train,
                     epochs=10, batch_size=batch_size, shuffle=False,
                     validation_data=(sequences['train'], labels['train']))
(model summary screenshot omitted)
I have a model for OCR, and I came to realise that after 2-3 epochs the output of the model is always the same. When I predicted values and looked at the output of each layer, I realised that all layers after the first layer in the LSTM block output the same values no matter the input. Here is the model (or the parts related to the problem):
Processing = layers.Reshape((12,9472))(encoder)
Processing = layers.Dense(128, activation='relu')(Processing)
lstm = layers.Bidirectional(layers.LSTM(256, return_sequences = True))(Processing)
lstm = layers.Bidirectional(layers.LSTM(128, return_sequences = True))(lstm)
lstm = layers.Bidirectional(layers.LSTM(64, return_sequences = True))(lstm)
outputs = layers.Dense(358,activation=tf.keras.layers.LeakyReLU(alpha=0.1))(lstm)
outputs = layers.Dense(358, activation=tf.keras.layers.LeakyReLU(alpha=0.1))(outputs)
outputs = layers.Dense(l, activation='softmax',name='output')(outputs)
output = CTCLayer()(labels,outputs)
Everything after (and including) the first LSTM layer outputs the same value (not counting the CTC layer, which is removed for predictions).
Before picking the model apart I thought it might be a dying-ReLU problem, so I replaced all of the ReLU activations with leaky ReLU. So what is causing this? Is there something wrong with my implementation, or what might be causing everything after the LSTM layer to "die", and how would I fix the underlying issue?
FYI: the CTCLayer is the one from the code example on the Keras website.
BTW: another weird thing is that even after it starts outputting the same thing, the loss keeps decreasing for a while (from 20.5 to 16.2), so it is still learning. I'm pretty sure it has nothing to do with the learning rate, as I experimented with extremely small values (1e-10; it just took a lot longer to reach the point where all the outputs become the same, which from my observation happens somewhere between a loss of 22 and 16).
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.backend.ctc_batch_cost

    def call(self, y_true, y_pred):
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        return y_pred
I am following this tutorial on Keras, but I don't know how to correctly save this model with its custom layer after training and then load it.
This problem has been mentioned here and here, but apparently none of those solutions work for this Keras example. Can anyone point me in the right direction?
P.S: here is the main part of the code:
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.backend.ctc_batch_cost

    def call(self, y_true, y_pred):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        # At test time, just return the computed predictions
        return y_pred


def build_model():
    # Inputs to the model
    input_img = layers.Input(
        shape=(img_width, img_height, 1), name="image", dtype="float32"
    )
    labels = layers.Input(name="label", shape=(None,), dtype="float32")

    # First conv block
    x = layers.Conv2D(
        32,
        (3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="Conv1",
    )(input_img)
    x = layers.MaxPooling2D((2, 2), name="pool1")(x)

    # Second conv block
    x = layers.Conv2D(
        64,
        (3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="Conv2",
    )(x)
    x = layers.MaxPooling2D((2, 2), name="pool2")(x)

    # We have used two max pool with pool size and strides 2.
    # Hence, downsampled feature maps are 4x smaller. The number of
    # filters in the last layer is 64. Reshape accordingly before
    # passing the output to the RNN part of the model
    new_shape = ((img_width // 4), (img_height // 4) * 64)
    x = layers.Reshape(target_shape=new_shape, name="reshape")(x)
    x = layers.Dense(64, activation="relu", name="dense1")(x)
    x = layers.Dropout(0.2)(x)

    # RNNs
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)

    # Output layer
    x = layers.Dense(len(characters) + 1, activation="softmax", name="dense2")(x)

    # Add CTC layer for calculating CTC loss at each step
    output = CTCLayer(name="ctc_loss")(labels, x)

    # Define the model
    model = keras.models.Model(
        inputs=[input_img, labels], outputs=output, name="ocr_model_v1"
    )
    # Optimizer
    opt = keras.optimizers.Adam()
    # Compile the model and return
    model.compile(optimizer=opt)
    return model
# Get the model
model = build_model()
model.summary()
epochs = 100
early_stopping_patience = 10
# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True
)
# Train the model
history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=epochs,
    callbacks=[early_stopping],
)
# Get the prediction model by extracting layers till the output layer
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
prediction_model.summary()
The problem is not actually with Keras's saving methods. The characters set is inconsistent between runs because a Python set does not preserve ordering. Add the code below after creating the characters set to solve the issue:
characters = sorted(list(characters))
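To make the ordering issue concrete, here is a small, hypothetical illustration; the point is only that iterating a Python set gives no guaranteed order across runs, so the character-to-index mapping built from it can differ between the run that trained the model and the run that loads it:
# In the tutorial the vocabulary is built from a set of characters, roughly:
# characters = set(char for label in labels for char in label)
characters = {'2', '7', 'a', 'b', 'c'}

# Iterating a set can yield a different order in a different Python process
# (string hashing is randomized), so index 0 may map to a different character
# after the model is reloaded in a new session.
print(list(characters))

# Sorting pins down one deterministic order, keeping the label <-> index mapping stable:
characters = sorted(list(characters))
print(characters)  # ['2', '7', 'a', 'b', 'c']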
@Amirhosein, check out these functions in the Horovod repository:
Serialize:
https://github.com/horovod/horovod/blob/6f0bb9fae826167559501701d4a5a0380284b5f0/horovod/spark/keras/util.py#L115
Deserialize:
https://github.com/horovod/horovod/blob/6f0bb9fae826167559501701d4a5a0380284b5f0/horovod/spark/keras/remote.py#L267
Example of use for deserialization:
https://github.com/horovod/horovod/blob/6f0bb9fae826167559501701d4a5a0380284b5f0/horovod/spark/keras/remote.py#L118
If you are using custom objects, like custom metrics or a custom loss function, you will need to use custom_object_scope as in the example.
It uses a package called cloudpickle (https://pypi.org/project/cloudpickle/) under the hood to convert the Keras model to a string and back.
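For the OCR example in this question specifically, a minimal, untested sketch of the usual Keras route is below; the file names are placeholders, but custom_objects / custom_object_scope is the standard way to tell Keras how to rebuild a custom layer such as CTCLayer when loading:
from tensorflow import keras

# Save the full training model (architecture + weights). File name is only an example.
model.save("ocr_model.h5")

# Reload it, telling Keras how to reconstruct the custom CTC layer.
with keras.utils.custom_object_scope({"CTCLayer": CTCLayer}):
    restored_model = keras.models.load_model("ocr_model.h5")

# Equivalent one-liner:
# restored_model = keras.models.load_model("ocr_model.h5", custom_objects={"CTCLayer": CTCLayer})

# Note: for deserialization to work, CTCLayer.__init__ may need to accept the
# extra keyword arguments Keras stores in the layer config, e.g.
#     def __init__(self, name=None, **kwargs):
#         super().__init__(name=name, **kwargs)

# For inference only, it is often simpler to save just the prediction model,
# which contains no custom layer at all:
prediction_model.save("ocr_prediction_model.h5")
For this to round-trip cleanly, the character set must also be stable between runs, which is what the sorted(list(characters)) fix above addresses.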
I am trying to write a custom loss function for a Keras NN model, but it seems like the loss function is outputting the wrong value. My loss function is:
def tangle_loss3(input_tensor):
    def custom_loss(y_true, y_pred):
        true_diff = y_true - input_tensor
        pred_diff = y_pred - input_tensor
        normalized_diff = K.abs(tf.math.divide(pred_diff, true_diff))
        normalized_diff = tf.reduce_mean(normalized_diff)
        return normalized_diff
    return custom_loss
Then I use it in this simple feed-forward network:
input_layer = Input(shape=(384,), name='input')
hl_1 = Dense(64, activation='elu', name='hl_1')(input_layer)
hl_2 = Dense(32, activation='elu', name='hl_2')(hl_1)
hl_3 = Dense(32, activation='elu', name='hl_3')(hl_2)
output_layer = Dense(384, activation=None, name='output')(hl_3)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss=tangle_loss3(input_layer), optimizer=optimizer)
Then, to test whether the loss function is working, I created a random input and target vector and did the NumPy calculation of what I expect, but it does not match the result from Keras.
X = np.random.rand(1, 384)
y = np.random.rand(1, 384)
np.mean(np.abs((model.predict(X) - X)/(y - X)))
# returns some number
model.test_on_batch(X, y)
# always returns 0.0
Why does my loss function always return zero? And should these answers match?
I misunderstood your issue and have updated my method; it should work now. I stack the input layer and the output layer to get a new tensor that I pass as the model output.
def tangle_loss3(y_true, y_pred):
    true_diff = y_true - y_pred[0]
    pred_diff = y_pred[1] - y_pred[0]
    normalized_diff = tf.abs(tf.math.divide(pred_diff, true_diff))
    normalized_diff = tf.reduce_mean(normalized_diff)
    return normalized_diff
input_layer = Input(shape=(384,), name='input')
hl_1 = Dense(64, activation='elu', name='hl_1')(input_layer)
hl_2 = Dense(32, activation='elu', name='hl_2')(hl_1)
hl_3 = Dense(32, activation='elu', name='hl_3')(hl_2)
output_layer = Dense(384, activation=None, name='output')(hl_3)
out = tf.stack([input_layer, output_layer])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model = tf.keras.models.Model(input_layer, out)
model.compile(loss=tangle_loss3, optimizer=optimizer)
And now when I calculate the loss it works:
X = np.random.rand(1, 384)
y = np.random.rand(1, 384)
np.mean(np.abs((model.predict(X)[1] - X)/(y - X)))
# returns some number
model.test_on_batch(X, y)
Note that I have to use model.predict(X)[1] because we get two outputs, the input and the output layers' results. This is a hacky solution, but it works.
Custom losses work best as a single, non-nested custom_loss(y_true, y_pred). You can add a Keras Subtract layer for the output and then use a new label, new_label = label - input, right before you add it to the training pipeline.
Then only the custom loss is needed; a rough sketch of this idea is below.
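This is an untested sketch of that suggestion, reusing the layer sizes and names from the question; the Subtract layer and the shifted labels are the only changes, and everything else is assumed to match the original setup:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Subtract

# Non-nested loss: y_pred is already (prediction - input) and y_true is (label - input).
def tangle_loss(y_true, y_pred):
    return tf.reduce_mean(tf.abs(tf.math.divide(y_pred, y_true)))

input_layer = Input(shape=(384,), name='input')
hl_1 = Dense(64, activation='elu', name='hl_1')(input_layer)
hl_2 = Dense(32, activation='elu', name='hl_2')(hl_1)
hl_3 = Dense(32, activation='elu', name='hl_3')(hl_2)
output_layer = Dense(384, activation=None, name='output')(hl_3)

# The model now outputs the difference between its prediction and its input.
diff = Subtract()([output_layer, input_layer])

model = tf.keras.models.Model(input_layer, diff)
model.compile(loss=tangle_loss, optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))

# Shift the labels the same way before training: new_label = label - input.
X = np.random.rand(32, 384)
y = np.random.rand(32, 384)
model.fit(X, y - X, epochs=1)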
I am a beginner in Keras. I want to manually update the model weights in Keras to get a deeper understanding of gradient descent. However, when I tried it, the model either does not converge or the loss even explodes. My steps are listed as follows:
First, I use a Keras Sequential model to fit the quadratic function y = 2*x*x - 7*x + 11.
Below is the code using the Sequential model:
model = Sequential()
model.add(Dense(64, input_dim = 1, activation = 'relu'))
model.add(Dense(32, activation = 'relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
(plots omitted: the training loss and the fitted curve vs. the original function)
Then I use the following code to update the weights manually:
class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.layer1 = Dense(64, input_shape=(1, ))
        self.layer2 = Dense(32)
        self.layer3 = Dense(1)

    def forward(self, x):
        y = keras.activations.relu(self.layer1(x))
        y = keras.activations.relu(self.layer2(y))
        y = self.layer3(y)
        return y

def loss_fun(y_pred, y):
    return keras_backend.mean(keras.losses.mean_squared_error(y, y_pred))

def compute_loss(model, x, y, loss_fun=loss_fun):
    logits = model.forward(x)
    mse = loss_fun(y, logits)
    return mse, logits

def compute_gradients(model, x, y, loss_fun=loss_fun):
    with tf.GradientTape() as tape:
        loss, _ = compute_loss(model, x, y, loss_fun)
    return tape.gradient(loss, model.trainable_variables), loss

def apply_gradients(optimizer, gradients, variables):
    optimizer.apply_gradients(zip(gradients, variables))

def train_batch(x, y, model, optimizer):
    '''
    one step batch training
    '''
    gradients, loss = compute_gradients(model, x, y)
    apply_gradients(optimizer, gradients, model.trainable_variables)
    return loss
model2 = MyModel()
epochs = 200
optimizer = keras.optimizers.Adam(learning_rate=0.01)  # from what I looked up, 0.01 is the default learning rate in Keras
loss = []
x_train = np.expand_dims(x_train, axis=0)
y_train = np.expand_dims(y_train, axis=0)

for i in range(epochs):
    l = train_batch(x_train, y_train, model2, optimizer)
    loss.append(l)
    if i % 10 == 0:
        print(f'current loss = {l}')
The loss curve then looks like this (plot omitted).
I also tried another way to manually update the weights:
epochs = 200
lr = 0.01
optimizer = keras.optimizers.Adam(learning_rate=0.01)
loss = []
x_train = np.expand_dims(x_train, axis=0)
y_train = np.expand_dims(y_train, axis=0)
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

for i in range(epochs):
    y_pred = model5.forward(x_train)
    l = k.mean(keras.losses.mean_squared_error(y_train, y_pred))
    gradients = k.gradients(l, model5.trainable_weights)
    new_weights = model5.get_weights() - 0.001 * np.array(gradients)
    model5.set_weights(new_weights)
    if i % 10 == 0:
        loss.append(l)
        print(f'{i}th loss is: {l}')
In this case, the loss explodes (plot omitted). Where is the problem?
I have figured out where the problem is.
When getting the model through the following code:
model = MyModel()
The trainable variables of the model are empty.
When I try to print them using this:
print(model.trainable_variables)
it outputs
[]
I tried to make the weights trainable manually with the following code:
for layers in model.layers:
    layers.trainable = True
But it still doesn't work at all.
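For what it is worth, a hedged note on that symptom: in a subclassed model, the Dense layers only create their weights when they are first built, i.e. the first time the model is called on real input, so trainable_variables is expected to be empty right after construction, and setting trainable = True does not create any weights by itself. A minimal illustration, assuming the MyModel class from above:
import numpy as np

model = MyModel()
print(model.trainable_variables)        # [] - the layers have not been built yet

# Calling the model once (or calling model.build) creates the weights.
_ = model.forward(np.zeros((1, 1), dtype='float32'))
print(len(model.trainable_variables))   # 6 - kernel and bias of the three Dense layers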