TensorBoard: adding output image to callback - python

I built a network that attempts to predict raster images of surface temperatures.
The output of the network is a (1000, 1000) size array, representing a raster image. For training and testing these are compared to the real raster of their respective samples.
I understand how to add the training image to my TensorBoard callback but I'd like to also add the network's output image to the callback, so that I could compare them visually. Is this possible?
inputs = Input(shape=(2,))
x = Dense(4)(inputs)
x = Reshape((2, 2))(x)
Where Reshape would be the last layer (or one before some deconvolution layer).

Depending on the TensorFlow version you are using, I have two different pieces of code to suggest. I will assume you are on version 2.0 or later and post the code I use for image-to-image models in that case. I basically initialize the callback with a noisy input image (I am doing denoising, but you can easily adapt this to your problem) and the corresponding ground-truth image, then use the model to run inference after each epoch.
"""Inspired by https://github.com/sicara/tf-explain/blob/master/tf_explain/callbacks/grad_cam.py"""
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import Callback
class TensorBoardImage(Callback):
def __init__(self, log_dir, image, noisy_image):
super().__init__()
self.log_dir = log_dir
self.image = image
self.noisy_image = noisy_image
def set_model(self, model):
self.model = model
self.writer = tf.summary.create_file_writer(self.log_dir, filename_suffix='images')
def on_train_begin(self, _):
self.write_image(self.image, 'Original Image', 0)
def on_train_end(self, _):
self.writer.close()
def write_image(self, image, tag, epoch):
image_to_write = np.copy(image)
image_to_write -= image_to_write.min()
image_to_write /= image_to_write.max()
with self.writer.as_default():
tf.summary.image(tag, image_to_write, step=epoch)
def on_epoch_end(self, epoch, logs={}):
denoised_image = self.model.predict_on_batch(self.noisy_image)
self.write_image(denoised_image, 'Denoised Image', epoch)
So typically you would use it in the following way:
# define the model
model = Model(inputs, outputs)

# define the callback
image_tboard_cback = TensorBoardImage(
    log_dir=log_dir + '/images',
    image=val_gt[0:1],
    noisy_image=val_noisy[0:1],
)

# fit the model
model.fit(
    x,
    y,
    callbacks=[image_tboard_cback,],
)
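If you adapt this to your raster setup, one detail worth noting: tf.summary.image expects a 4-D batch of shape (k, height, width, channels), so a single (1000, 1000) raster needs batch and channel dimensions added before it is logged. A minimal sketch of that adaptation (the arrays here are random stand-ins, not your data):
import numpy as np

# Hedged sketch: add batch and channel axes so tf.summary.image accepts the raster.
ground_truth_raster = np.random.rand(1000, 1000).astype('float32')   # stand-in for one real raster
raster_for_tb = ground_truth_raster[np.newaxis, :, :, np.newaxis]    # shape (1, 1000, 1000, 1)
sample_input = np.random.rand(1, 2).astype('float32')                # stand-in for one (2,)-shaped model input

image_tboard_cback = TensorBoardImage(
    log_dir='./logs/images',
    image=raster_for_tb,        # ground-truth raster, logged once when training begins
    noisy_image=sample_input,   # here this is simply the model input, not a noisy image
)
If your network outputs shape (1, 1000, 1000), the prediction written in on_epoch_end would need the same trailing channel axis before tf.summary.image accepts it; ending the model with Reshape((1000, 1000, 1)) is one simple way to handle that.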
If you use a version prior to 2.0, I can direct you to this gist I wrote (which is a bit more intricate).

Related

I am running into a gradient computation inplace error

I am running this code (https://github.com/ayu-22/BPPNet-Back-Projected-Pyramid-Network/blob/master/Single_Image_Dehazing.ipynb) on a custom dataset but I am running into this error.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Please refer to the code link above for clarification of where the error is occurring.
I am running this model on a custom dataset; the data loader part is pasted below.
import os

import torch
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import Dataset

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    #transforms.RandomResizedCrop(256),
    #transforms.RandomHorizontalFlip(),
    #transforms.ColorJitter(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])


class Flare(Dataset):
    def __init__(self, flare_dir, wf_dir, transform=None):
        self.flare_dir = flare_dir
        self.wf_dir = wf_dir
        self.transform = transform
        self.flare_img = os.listdir(flare_dir)
        self.wf_img = os.listdir(wf_dir)

    def __len__(self):
        return len(self.flare_img)

    def __getitem__(self, idx):
        f_img = Image.open(os.path.join(self.flare_dir, self.flare_img[idx])).convert("RGB")
        for i in self.wf_img:
            if (self.flare_img[idx].split('.')[0][4:] == i.split('.')[0]):
                wf_img = Image.open(os.path.join(self.wf_dir, i)).convert("RGB")
                break
        f_img = self.transform(f_img)
        wf_img = self.transform(wf_img)
        return f_img, wf_img


flare_dir = '../input/flaredataset/Flare/Flare_img'
wf_dir = '../input/flaredataset/Flare/Without_Flare_'
flare_img = os.listdir(flare_dir)
wf_img = os.listdir(wf_dir)
wf_img.sort()
flare_img.sort()
print(wf_img[0])
train_ds = Flare(flare_dir, wf_dir, train_transform)
train_loader = torch.utils.data.DataLoader(dataset=train_ds,
                                           batch_size=BATCH_SIZE,
                                           shuffle=True)
To get a better idea of the dataset class, you can compare my dataset class with the link pasted above.
Your code is getting stuck in the backpropagation step of your GAN network.
You have defined your backward pass as follows:
def backward(self, unet_loss, dis_loss):
    dis_loss.backward(retain_graph=True)
    self.dis_optimizer.step()
    unet_loss.backward()
    self.unet_optimizer.step()
So in your backward pass, you first propagate dis_loss (the combination of the discriminator and adversarial losses) and then unet_loss (the combination of the UNet, SSIM and content losses), but unet_loss is still connected to the discriminator's output. Taking the optimizer step for dis_loss before unet_loss has run its backward pass modifies the discriminator's weights in place, so PyTorch can no longer compute the gradients unet_loss needs and raises this error. I would recommend changing the code as follows:
def backward(self, unet_loss, dis_loss):
    dis_loss.backward(retain_graph=True)
    unet_loss.backward()
    self.dis_optimizer.step()
    self.unet_optimizer.step()
This should get your training started, but you can still experiment with retain_graph=True. A minimal reproduction of the failure mode is sketched below.
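For illustration, here is a small, hypothetical toy reproduction of the same failure mode (two linear layers standing in for the generator and discriminator; this is not the BPPNet code):
import torch
from torch import nn, optim

gen = nn.Linear(4, 4)   # stands in for the UNet / generator
dis = nn.Linear(4, 1)   # stands in for the discriminator
gen_opt = optim.Adam(gen.parameters())
dis_opt = optim.Adam(dis.parameters())

x = torch.randn(2, 4)
fake = gen(x)
dis_loss = dis(fake.detach()).mean()   # discriminator loss
unet_loss = dis(fake).mean()           # generator loss, still depends on dis's weights

dis_loss.backward(retain_graph=True)
# dis_opt.step()                       # stepping here modifies dis's weights in place,
                                       # so unet_loss.backward() would raise the error above
unet_loss.backward()                   # run both backward passes first...
dis_opt.step()                         # ...then step both optimizers
gen_opt.step()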
And great work on BPPNet!

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

I get an error using gradient visualization with transfer learning in TF 2.0. The gradient visualization works on a model that does not use transfer learning.
When I run my code I get the error:
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("block5_conv3/Identity:0", shape=(None, 14, 14, 512), dtype=float32)
When I run the code below it errors. I think there's an issue with the naming conventions or connecting inputs and outputs from the base model, vgg16, to the layers I'm adding. Really appreciate your help!
"""
Broken example when grad_model is created.
"""
!pip uninstall tensorflow
!pip install tensorflow==2.0.0
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
IMAGE_PATH = '/content/cat.3.jpg'
LAYER_NAME = 'block5_conv3'
model_layer = 'vgg16'
CAT_CLASS_INDEX = 281
imsize = (224,224,3)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
plt.figure()
plt.imshow(img)
img = tf.io.read_file(IMAGE_PATH)
img = tf.image.decode_jpeg(img)
img = tf.cast(img, dtype=tf.float32)
# img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img, (224,224))
img = tf.reshape(img, (1, 224,224,3))
input = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                         input_shape=(imsize[0], imsize[1], imsize[2]))
# base_model.trainable = False
flat = layers.Flatten()
dropped = layers.Dropout(0.5)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
fc1 = layers.Dense(16, activation='relu', name='dense_1')
fc2 = layers.Dense(16, activation='relu', name='dense_2')
fc3 = layers.Dense(128, activation='relu', name='dense_3')
prediction = layers.Dense(2, activation='softmax', name='output')
for layr in base_model.layers:
    if ('block5' in layr.name):
        layr.trainable = True
    else:
        layr.trainable = False
x = base_model(input)
x = global_average_layer(x)
x = fc1(x)
x = fc2(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = input, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
This portion of the code is where the error lies. I'm not sure what is the correct way to label inputs and outputs.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs=[model.input, model.get_layer(model_layer).input],
                                   outputs=[model.get_layer(model_layer).get_layer(LAYER_NAME).output,
                                            model.output])
print(model.get_layer(model_layer).get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(img)
    loss = predictions[:, 1]
The section below is for plotting a heatmap of gradcam.
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
# Apply guided backpropagation
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = gate_f * gate_r * grads
# Average gradients spatially
weights = tf.reduce_mean(guided_grads, axis=(0, 1))
# Build a ponderated map of filters according to gradients importance
cam = np.ones(output.shape[0:2], dtype=np.float32)
for index, w in enumerate(weights):
    cam += w * output[:, :, index]
# Heatmap visualization
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
plt.figure()
plt.imshow(output_image)
plt.show()
I also asked this to the tensorflow team on github at https://github.com/tensorflow/tensorflow/issues/37680.
I figured it out. If you set up the model extending the vgg16 base model with your own layers, rather than inserting the base model into a new model like a layer, then it works.
First set up the model and be sure to declare the input_tensor.
inp = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_tensor=inp,
                                         input_shape=(imsize[0], imsize[1], imsize[2]))
This way we don't have to include a line like x=base_model(inp) to show what input we want to put in. That's already included in tf.keras.applications.VGG16(...).
Instead of putting this vgg16 base model inside another model, it's easier to do gradcam by adding layers to the base model itself. I grab the output of the last layer of VGG16 (with the top removed), which is the pooling layer.
block5_pool = base_model.get_layer('block5_pool')
x = global_average_layer(block5_pool.output)
x = fc1(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = inp, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
Now, I grab the layer for visualization, LAYER_NAME='block5_conv3'.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs=[model.input],
                                   outputs=[model.output, model.get_layer(LAYER_NAME).output])
print(model.get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    predictions, conv_outputs = grad_model(img)
    loss = predictions[:, 1]
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
We (I plus a number of team members developing a project) found a similar problem with Grad-CAM code that we took from a tutorial.
That code did not work with a model consisting of a VGG19 base model plus a few extra layers added on top of it. The problem was that the VGG19 base model was inserted as a "layer" inside our model, and the Grad-CAM code apparently did not know how to deal with it: we were getting a "Graph disconnected..." error. After some debugging (carried out by another team member, not me) we managed to modify the original code so that it works for this kind of model that contains another model inside it. The idea is to pass the inner model as an extra argument to the GradCAM class. Since this may be helpful to others, I am including the modified code below (we also renamed the GradCAM class to My_GradCAM).
class My_GradCAM:
    def __init__(self, model, classIdx, inner_model=None, layerName=None):
        self.model = model
        self.classIdx = classIdx
        self.inner_model = inner_model
        if self.inner_model is None:
            self.inner_model = model
        self.layerName = layerName

    [...]

        gradModel = tensorflow.keras.models.Model(inputs=[self.inner_model.inputs],
                                                  outputs=[self.inner_model.get_layer(self.layerName).output,
                                                           self.inner_model.output])
Then the class can be instantiated by adding the inner model as the extra argument, e.g.:
cam = My_GradCAM(model, None, inner_model=model.get_layer("vgg19"), layerName="block5_pool")
I hope this helps.
Edit: Credit to Mirtha Lucas for doing the debugging and finding the solution.
After a lot of struggle, I condensed the way to draw the heat map when you are using transfer learning. Here is the Keras official tutorial.
The issue I encountered is that, when trying to draw the heat map from my model, the DenseNet can only be seen as a single functional layer inside my model, so make_gradcam_heatmap cannot reach the layers nested inside that functional layer (it shows up as just the 5th layer).
Therefore, to mirror the Keras official tutorial, I need to use only the DenseNet as the model for visualization. Here are the steps:
Take the inner DenseNet model out of your model:
dense_model = dense_model.get_layer('densenet121')
Copy the weights from the DenseNet model into your newly initialized model:
inputs = tf.keras.Input(shape=(224, 224, 3))
model = model_builder(weights="imagenet", include_top=True, input_tensor=inputs)

for layer, dense_layer in zip(model.layers[1:], dense_model.layers[1:]):
    layer.set_weights(dense_layer.get_weights())

relu = model.get_layer('relu')
x = tf.keras.layers.GlobalAveragePooling2D()(relu.output)
outputs = tf.keras.layers.Dense(5)(x)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
Draw the heat map
preprocess_input = keras.applications.densenet.preprocess_input
img_array = preprocess_input(get_img_array(img_path, size=(224, 224)))
heatmap = make_gradcam_heatmap(img_array, model, 'bn')
plt.matshow(heatmap)
plt.show()
get_img_array, make_gradcam_heatmap and save_and_display_gradcam are kept unchanged. Follow the Keras tutorial and you are good to go.

Tensorflow Callback as Custom Metric for CTC

In an attempt to yield more metrics during the training of my model (written in TensorFlow version 2.1.0), like the Character Error Rate (CER) and Word Error Rate (WER), I created a callback to pass to the fit function of my model. It is able to generate the CER and WER at the end of an epoch.
It's my second choice as I wanted to create a custom metric for this, but you can only use keras backend functionality for custom metrics. Does anyone have any advice on how to convert the callback below into a Custom Metric (which can then be calculated during training on the validation and/or training data)?
Some roadblocks I encountered are:
Failure to convert the K.ctc_decode result to a sparse tensor
How can you calculate a distance like edit-distance using the Keras backend?
import editdistance
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K


class Metrics(tf.keras.callbacks.Callback):
    def __init__(self, valid_data, steps):
        """
        valid_data is a TFRecordDataset with batches of 100 elements per batch, shuffled and repeated infinitely.
        steps defines the number of batches per epoch
        """
        super(Metrics, self).__init__()
        self.valid_data = valid_data
        self.steps = steps

    def on_train_begin(self, logs={}):
        self.cer = []
        self.wer = []

    def on_epoch_end(self, epoch, logs={}):
        imgs = []
        labels = []
        for idx, (img, label) in enumerate(self.valid_data.as_numpy_iterator()):
            if idx >= self.steps:
                break
            imgs.append(img)
            labels.extend(label)

        imgs = np.array(imgs)
        labels = np.array(labels)

        out = self.model.predict((batch for batch in imgs))
        input_length = len(max(out, key=len))

        out = np.asarray(out)
        out_len = np.asarray([input_length for _ in range(len(out))])

        decode, log = K.ctc_decode(out,
                                   out_len,
                                   greedy=True)

        decode = [[[int(p) for p in x if p != -1] for x in y] for y in decode][0]

        for (pred, lab) in zip(decode, labels):
            dist = editdistance.eval(pred, lab)
            self.cer.append(dist / (max(len(pred), len(lab))))
            self.wer.append(not np.array_equal(pred, lab))

        print("Mean CER: {}".format(np.mean([self.cer], axis=1)[0]))
        print("Mean WER: {}".format(np.mean([self.wer], axis=1)[0]))
Solved in TF 2.3.1, but should apply for previous versions of 2.x as well.
Some remarks:
Information on how to properly implement a custom TensorFlow metric is scarce. The question implied the use of a callback to implement the metric; this has longer epochs as a consequence (due to the explicit extra calculation of the metric in on_epoch_end), or so I believe. Implementing it as a subclass of tensorflow.keras.metrics.Metric seems the right way, and it yields results while the epoch is ongoing (if verbose is set correctly).
Calculating the edit distance for the CER is quite easily done using tf.edit_distance (on sparse tensors); this can subsequently be used to calculate the WER with some TF logic.
Alas, I am yet to find out how to implement both the CER and WER in one metric (as it has quite some duplicate code); if anyone knows how to do so, please contact me.
Custom metrics can simply be added into the compilation of your TF model:
self.model.compile(optimizer=opt, loss=loss, metrics=[CERMetric(), WERMetric()])
import tensorflow as tf
from tensorflow.keras import backend as K


class CERMetric(tf.keras.metrics.Metric):
    """
    A custom Keras metric to compute the Character Error Rate
    """
    def __init__(self, name='CER_metric', **kwargs):
        super(CERMetric, self).__init__(name=name, **kwargs)
        self.cer_accumulator = self.add_weight(name="total_cer", initializer="zeros")
        self.counter = self.add_weight(name="cer_count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        input_shape = K.shape(y_pred)
        input_length = tf.ones(shape=input_shape[0]) * K.cast(input_shape[1], 'float32')

        decode, log = K.ctc_decode(y_pred,
                                   input_length,
                                   greedy=True)

        decode = K.ctc_label_dense_to_sparse(decode[0], K.cast(input_length, 'int32'))
        y_true_sparse = K.ctc_label_dense_to_sparse(y_true, K.cast(input_length, 'int32'))

        decode = tf.sparse.retain(decode, tf.not_equal(decode.values, -1))
        distance = tf.edit_distance(decode, y_true_sparse, normalize=True)

        self.cer_accumulator.assign_add(tf.reduce_sum(distance))
        self.counter.assign_add(len(y_true))

    def result(self):
        return tf.math.divide_no_nan(self.cer_accumulator, self.counter)

    def reset_states(self):
        self.cer_accumulator.assign(0.0)
        self.counter.assign(0.0)
class WERMetric(tf.keras.metrics.Metric):
    """
    A custom Keras metric to compute the Word Error Rate
    """
    def __init__(self, name='WER_metric', **kwargs):
        super(WERMetric, self).__init__(name=name, **kwargs)
        self.wer_accumulator = self.add_weight(name="total_wer", initializer="zeros")
        self.counter = self.add_weight(name="wer_count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        input_shape = K.shape(y_pred)
        input_length = tf.ones(shape=input_shape[0]) * K.cast(input_shape[1], 'float32')

        decode, log = K.ctc_decode(y_pred,
                                   input_length,
                                   greedy=True)

        decode = K.ctc_label_dense_to_sparse(decode[0], K.cast(input_length, 'int32'))
        y_true_sparse = K.ctc_label_dense_to_sparse(y_true, K.cast(input_length, 'int32'))

        decode = tf.sparse.retain(decode, tf.not_equal(decode.values, -1))
        distance = tf.edit_distance(decode, y_true_sparse, normalize=True)

        correct_words_amount = tf.reduce_sum(tf.cast(tf.not_equal(distance, 0), tf.float32))

        self.wer_accumulator.assign_add(correct_words_amount)
        self.counter.assign_add(len(y_true))

    def result(self):
        return tf.math.divide_no_nan(self.wer_accumulator, self.counter)

    def reset_states(self):
        self.wer_accumulator.assign(0.0)
        self.counter.assign(0.0)
Alas, I am yet to find out how to implement both the CER and WER in one metric (as it has quite some duplicate code); if anyone knows how to do so, please contact me.
Hey, this solution really helped me a lot. There are now TensorFlow 2.10 releases, so for that version I wrote a combined WER and CER metric; here is the final working code:
import tensorflow as tf
class CWERMetric(tf.keras.metrics.Metric):
    """ A custom TensorFlow metric to compute the Character Error Rate
    """
    def __init__(self, name='CWER', **kwargs):
        super(CWERMetric, self).__init__(name=name, **kwargs)
        self.cer_accumulator = tf.Variable(0.0, name="cer_accumulator", dtype=tf.float32)
        self.wer_accumulator = tf.Variable(0.0, name="wer_accumulator", dtype=tf.float32)
        self.counter = tf.Variable(0, name="counter", dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight=None):
        input_shape = tf.keras.backend.shape(y_pred)
        input_length = tf.ones(shape=input_shape[0], dtype='int32') * tf.cast(input_shape[1], 'int32')

        decode, log = tf.keras.backend.ctc_decode(y_pred, input_length, greedy=True)

        decode = tf.keras.backend.ctc_label_dense_to_sparse(decode[0], input_length)
        y_true_sparse = tf.cast(tf.keras.backend.ctc_label_dense_to_sparse(y_true, input_length), "int64")

        decode = tf.sparse.retain(decode, tf.not_equal(decode.values, -1))
        distance = tf.edit_distance(decode, y_true_sparse, normalize=True)

        correct_words_amount = tf.reduce_sum(tf.cast(tf.not_equal(distance, 0), tf.float32))

        self.wer_accumulator.assign_add(correct_words_amount)
        self.cer_accumulator.assign_add(tf.reduce_sum(distance))
        self.counter.assign_add(len(y_true))

    def result(self):
        return {
            "CER": tf.math.divide_no_nan(self.cer_accumulator, tf.cast(self.counter, tf.float32)),
            "WER": tf.math.divide_no_nan(self.wer_accumulator, tf.cast(self.counter, tf.float32))
        }
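As a usage sketch, the combined metric is attached at compile time just like the separate CER/WER metrics above. The model and loss below are stand-ins, not part of the original answer:
import tensorflow as tf

# Hedged usage sketch: a dummy model just to show where the metric goes;
# plug in your own CTC network and CTC loss in practice.
inputs = tf.keras.Input(shape=(None, 64))            # (time steps, character-space size)
outputs = tf.keras.layers.Dense(64, activation='softmax')(inputs)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',       # placeholder; a CTC loss in practice
              metrics=[CWERMetric()])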
I still need to check whether it calculates CER and WER correctly; if I find that something is missing, I'll update this.

TensorFlow + Keras multi gpu model with inference

I am trying to do image classification using Keras's Xception model, modeled after this code. However, I want to use multiple GPUs to do batch-parallel image classification using this function. I believe it is possible and I have the original code working without multi-GPU support; however, I cannot get the multi_gpu_model function to work as I would expect. I am following this example for the multi-GPU case. This is my code (it is the backend of a Flask app): it instantiates the model, makes a prediction on an example ndarray when the class is created, and then expects a base64-encoded image in the classify function:
import os
from keras.preprocessing import image as preprocess_image
from keras.applications import Xception
from keras.applications.inception_v3 import preprocess_input, decode_predictions
from keras.utils import multi_gpu_model
import numpy as np
import tensorflow as tf
import PIL.Image
from numpy import array


class ModelManager:
    def __init__(self, model_path):
        self.model_name = 'ImageNet'
        self.model_version = '1.0'
        self.batch_size = 32
        height = 224
        width = 224
        num_classes = 1000
        # self.model = tf.keras.models.load_model(os.path.join(model_path, 'ImageNetXception.h5'))
        with tf.device('/cpu:0'):
            model = Xception(weights=None,
                             input_shape=(height, width, 3),
                             classes=num_classes, include_top=True)
        # Replicates the model on 8 GPUs.
        # This assumes that your machine has 8 available GPUs.
        self.parallel_model = multi_gpu_model(model, gpus=8)
        self.parallel_model.compile(loss='categorical_crossentropy',
                                    optimizer='rmsprop')
        print("Loaded Xception model.")
        x = np.empty((1, 224, 224, 3))
        self.parallel_model.predict(x, batch_size=self.batch_size)
        self.graph = tf.get_default_graph()
        self.graph.finalize()

    def classify(self, ids, images):
        results = []
        all_images = np.empty((0, 224, 224, 3))
        # all_images = []
        for image_id, image in zip(ids, images):
            # This does the same as keras.preprocessing.image.load_img
            image = image.convert('RGB')
            image = image.resize((224, 224), PIL.Image.NEAREST)
            x = preprocess_image.img_to_array(image)
            x = np.expand_dims(x, axis=0)
            x = preprocess_input(x)
            all_images = np.append(all_images, x, axis=0)
            # all_images.append(x)
        # a = array(all_images)
        # print(type(a))
        # print(a[0])
        with self.graph.as_default():
            preds = self.parallel_model.predict(all_images, batch_size=288)
        #print(type(preds))
        top3 = decode_predictions(preds, top=3)[0]
        print(top3)
        output = [((t[1],) + t[2:]) for t in top3]
        predictions = [
            {'label': label, 'probability': probability * 100.0}
            for label, probability in output
        ]
        results.append({
            'id': 1,
            'predictions': predictions
        })
        print(len(results))
        return results
The part I am not sure about is what to pass to the predict function. Currently I am creating an ndarray of the preprocessed images I want classified and passing that to predict. The function returns, but the preds variable doesn't hold what I expect. I tried looping through the preds object, but decode_predictions errors when I pass a single item and returns only one prediction when I pass the whole preds ndarray. In the example code they don't use the decode_predictions function, so I'm not sure how to use it with the response from parallel_model.predict. Any help or resources are appreciated, thanks.
The following site illustrates how to do that correctly: link
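As a hedged aside on the decode_predictions confusion in the question: decode_predictions takes the whole 2-D batch of class probabilities and returns one list of (class_id, label, score) tuples per image, so indexing the result with [0] keeps only the first image. A minimal, self-contained sketch with random stand-in data (the variable names mirror the question's classify method but are assumptions here):
import numpy as np
from keras.applications.inception_v3 import decode_predictions

# Stand-in for parallel_model.predict(all_images): one probability row per image.
preds = np.random.rand(4, 1000)
preds = preds / preds.sum(axis=1, keepdims=True)
ids = ['img_a', 'img_b', 'img_c', 'img_d']   # stand-in image ids

results = []
for image_id, top3 in zip(ids, decode_predictions(preds, top=3)):
    predictions = [{'label': label, 'probability': float(score) * 100.0}
                   for (_, label, score) in top3]
    results.append({'id': image_id, 'predictions': predictions})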

CTC loss goes down and stops

I'm trying to train a captcha recognition model. The model consists of pretrained ResNet CNN layers + a bidirectional LSTM + a fully connected layer. It reached 90% sequence accuracy on captchas generated by the Python library captcha. The problem is that these generated captchas seem to place each character in a similar location. When I randomly add spaces between characters, the model no longer works, so I wonder whether the LSTM is learning segmentation during training. I then tried CTC loss. At first the loss goes down pretty quickly, but it stays at about 16 without any significant drop later. I tried different numbers of LSTM layers and different numbers of units: 2 layers of LSTM reach a lower loss but still don't converge, and 3 layers behave just like 2 layers. The loss curve:
#encoding:utf8
import os
import sys
import torch
import warpctc_pytorch
import traceback
import torchvision
from torch import nn, autograd, FloatTensor, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import MultiStepLR
from tensorboard import SummaryWriter
from pprint import pprint
from net.utils import decoder
from logging import getLogger, StreamHandler
logger = getLogger(__name__)
handler = StreamHandler(sys.stdout)
logger.addHandler(handler)
from dataset_util.utils import id_to_character
from dataset_util.transform import rescale, normalizer
from config.config import MAX_CAPTCHA_LENGTH, TENSORBOARD_LOG_PATH, MODEL_PATH
class CNN_RNN(nn.Module):
    def __init__(self, lstm_bidirectional=True, use_ctc=True, *args, **kwargs):
        super(CNN_RNN, self).__init__(*args, **kwargs)
        model_conv = torchvision.models.resnet18(pretrained=True)
        for param in model_conv.parameters():
            param.requires_grad = False

        modules = list(model_conv.children())[:-1]  # delete the last fc layer.
        for param in modules[8].parameters():
            param.requires_grad = True

        self.resnet = nn.Sequential(*modules)  # CNN with fixed parameters from resnet as feature extractor
        self.lstm_input_size = 512 * 2 * 2
        self.lstm_hidden_state_size = 512
        self.lstm_num_layers = 2
        self.chracter_space_length = 64
        self._lstm_bidirectional = lstm_bidirectional
        self._use_ctc = use_ctc
        if use_ctc:
            self._max_captcha_length = int(MAX_CAPTCHA_LENGTH * 2)
        else:
            self._max_captcha_length = MAX_CAPTCHA_LENGTH

        if lstm_bidirectional:
            self.lstm_hidden_state_size = self.lstm_hidden_state_size * 2  # so that hidden size for one direction in bidirection lstm is the same as vanilla lstm
            self.lstm = nn.LSTM(self.lstm_input_size, self.lstm_hidden_state_size // 2, dropout=0.5, bidirectional=True, num_layers=self.lstm_num_layers)
        else:
            self.lstm = nn.LSTM(self.lstm_input_size, self.lstm_hidden_state_size, dropout=0.5, bidirectional=False, num_layers=self.lstm_num_layers)  # dropout doen't work for one layer lstm

        self.ouput_to_tag = nn.Linear(self.lstm_hidden_state_size, self.chracter_space_length)
        self.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)
        # self.dropout_lstm = nn.Dropout()

    def init_hidden_status(self, batch_size):
        if self._lstm_bidirectional:
            self.hidden = (autograd.Variable(torch.zeros((self.lstm_num_layers * 2, batch_size, self.lstm_hidden_state_size // 2))),
                           autograd.Variable(torch.zeros((self.lstm_num_layers * 2, batch_size, self.lstm_hidden_state_size // 2))))  # number of layers, batch size, hidden dimention
        else:
            self.hidden = (autograd.Variable(torch.zeros((self.lstm_num_layers, batch_size, self.lstm_hidden_state_size))),
                           autograd.Variable(torch.zeros((self.lstm_num_layers, batch_size, self.lstm_hidden_state_size))))  # number of layers, batch size, hidden dimention
    def forward(self, image):
        '''
        :param image: # batch_size, CHANNEL, HEIGHT, WIDTH
        :return:
        '''
        features = self.resnet(image)  # [batch_size, 512, 2, 2]
        batch_size = image.shape[0]
        features = [features.view(batch_size, -1) for i in range(self._max_captcha_length)]
        features = torch.stack(features)
        self.init_hidden_status(batch_size)
        output, hidden = self.lstm(features, self.hidden)
        # output = self.dropout_lstm(output)
        tag_space = self.ouput_to_tag(output.view(-1, output.size(2)))  # [MAX_CAPTCHA_LENGTH * BATCH_SIZE, CHARACTER_SPACE_LENGTH]
        tag_space = tag_space.view(self._max_captcha_length, batch_size, -1)

        if not self._use_ctc:
            tag_score = F.log_softmax(tag_space, dim=2)  # [MAX_CAPTCHA_LENGTH, BATCH_SIZE, CHARACTER_SPACE_LENGTH]
        else:
            tag_score = tag_space

        return tag_score
    def train_net(self, data_loader, eval_data_loader=None, learning_rate=0.008, epoch_num=400):
        try:
            if self._use_ctc:
                loss_function = warpctc_pytorch.warp_ctc.CTCLoss()
            else:
                loss_function = nn.NLLLoss()

            # optimizer = optim.SGD(filter(lambda p: p.requires_grad, self.parameters()), momentum=0.9, lr=learning_rate)
            # optimizer = MultiStepLR(optimizer, milestones=[10,15], gamma=0.5)
            # optimizer = optim.Adadelta(filter(lambda p: p.requires_grad, self.parameters()))
            optimizer = optim.Adam(filter(lambda p: p.requires_grad, self.parameters()))
            self.tensorboard_writer.add_scalar("learning_rate", learning_rate)

            tensorbard_global_step = 0
            if os.path.exists(os.path.join(TENSORBOARD_LOG_PATH, "resume_step")):
                with open(os.path.join(TENSORBOARD_LOG_PATH, "resume_step"), "r") as file_handler:
                    tensorbard_global_step = int(file_handler.read()) + 1

            for epoch_index, epoch in enumerate(range(epoch_num)):
                for index, sample in enumerate(data_loader):
                    optimizer.zero_grad()
                    input_image = autograd.Variable(sample["image"])  # batch_size, 3, 255, 255
                    tag_score = self.forward(input_image)

                    if self._use_ctc:
                        tag_score, target, tag_score_sizes, target_sizes = self._loss_preprocess_ctc(tag_score, sample)
                        loss = loss_function(tag_score, target, tag_score_sizes, target_sizes)
                        loss = loss / tag_score.size(1)
                    else:
                        target = sample["padded_label_idx"]
                        tag_score, target = self._loss_preprocess(tag_score, target)
                        loss = loss_function(tag_score, target)

                    print("Training loss: {}".format(float(loss)))
                    self.tensorboard_writer.add_scalar("training_loss", float(loss), tensorbard_global_step)
                    loss.backward()
                    optimizer.step()

                    if index % 250 == 0:
                        print(u"Processing batch: {} of {}, epoch: {}".format(index, len(data_loader), epoch_index))
                        self.evaluate(eval_data_loader, loss_function, tensorbard_global_step)
                    tensorbard_global_step += 1

                self.save_model(MODEL_PATH + "_epoch_{}".format(epoch_index))
        except KeyboardInterrupt:
            print("Exit for KeyboardInterrupt, save model")
            self.save_model(MODEL_PATH)

            with open(os.path.join(TENSORBOARD_LOG_PATH, "resume_step"), "w") as file_handler:
                file_handler.write(str(tensorbard_global_step))
        except Exception as excp:
            logger.error(str(excp))
            logger.error(traceback.format_exc())
    def predict(self, image):
        # TODO ctc version
        '''
        :param image: [batch_size, channel, height, width]
        :return:
        '''
        tag_score = self.forward(image)
        # TODO ctc
        # if self._use_ctc:
        #     tag_score = F.softmax(tag_score, dim=-1)
        #     decoder.decode(tag_score)

        confidence_log_probability, indexes = tag_score.max(2)

        predicted_labels = []
        for batch_index in range(indexes.size(1)):
            label = ""
            for character_index in range(self._max_captcha_length):
                if int(indexes[character_index, batch_index]) != 1:
                    label += id_to_character[int(indexes[character_index, batch_index])]
            predicted_labels.append(label)

        return predicted_labels, tag_score

    def predict_pil_image(self, pil_image):
        try:
            self.eval()
            processed_image = normalizer(rescale({"image": pil_image}))["image"].view(1, 3, 255, 255)
            result, tag_score = self.predict(processed_image)
            self.train()
        except Exception as excp:
            logger.error(str(excp))
            logger.error(traceback.format_exc())
            return [""], None

        return result, tag_score
    def evaluate(self, eval_dataloader, loss_function, step=0):
        total = 0
        sequence_correct = 0
        character_correct = 0
        character_total = 0
        loss_total = 0
        batch_size = eval_dataloader.batch_size
        true_predicted = {}
        self.eval()
        for sample in eval_dataloader:
            total += batch_size
            input_images = sample["image"]
            predicted_labels, tag_score = self.predict(input_images)

            for predicted, true_label in zip(predicted_labels, sample["label"]):
                if predicted == true_label:  # dataloader is making label a list, use batch_size=1
                    sequence_correct += 1

                for index, true_character in enumerate(true_label):
                    character_total += 1
                    if index < len(predicted) and predicted[index] == true_character:
                        character_correct += 1

                true_predicted[true_label] = predicted

            if self._use_ctc:
                tag_score, target, tag_score_sizes, target_sizes = self._loss_preprocess_ctc(tag_score, sample)
                loss_total += float(loss_function(tag_score, target, tag_score_sizes, target_sizes) / batch_size)
            else:
                tag_score, target = self._loss_preprocess(tag_score, sample["padded_label_idx"])
                loss_total += float(loss_function(tag_score, target))  # averaged over batch index

        print("True captcha to predicted captcha: ")
        pprint(true_predicted)
        self.tensorboard_writer.add_text("eval_ture_to_predicted", str(true_predicted), global_step=step)

        accuracy = float(sequence_correct) / total
        avg_loss = float(loss_total) / (total / batch_size)
        character_accuracy = float(character_correct) / character_total
        self.tensorboard_writer.add_scalar("eval_sequence_accuracy", accuracy, global_step=step)
        self.tensorboard_writer.add_scalar("eval_character_accuracy", character_accuracy, global_step=step)
        self.tensorboard_writer.add_scalar("eval_loss", avg_loss, global_step=step)
        self.zero_grad()
        self.train()
    def _loss_preprocess(self, tag_score, target):
        '''
        :param tag_score: value return by self.forward
        :param target: sample["padded_label_idx"]
        :return: (processed_tag_score, processed_target) ready for NLLoss function
        '''
        target = target.transpose(0, 1)
        target = target.contiguous()
        target = target.view(target.size(0) * target.size(1))
        tag_score = tag_score.view(-1, self.chracter_space_length)

        return tag_score, target

    def _loss_preprocess_ctc(self, tag_score, sample):
        target_2d = [
            [int(ele) for ele in sample["padded_label_idx"][row, :] if int(ele) != 0 and int(ele) != 1]
            for row in range(sample["padded_label_idx"].size(0))]
        target = []
        for ele in target_2d:
            target.extend(ele)
        target = autograd.Variable(torch.IntTensor(target))

        # tag_score = F.softmax(F.sigmoid(tag_score), dim=-1)
        tag_score_sizes = autograd.Variable(torch.IntTensor([self._max_captcha_length] * tag_score.size(1)))
        target_sizes = autograd.Variable(sample["captcha_length"].int())

        return tag_score, target, tag_score_sizes, target_sizes
    # def visualize_graph(self, dataset):
    #     '''Since pytorch use dynamic graph, an input is required to visualize graph in tensorboard'''
    #     # warning: Do not run this, the graph is too large to visualize...
    #     sample = dataset[0]
    #     input_image = autograd.Variable(sample["image"].view(1, 3, 255, 255))
    #     tag_score = self.forward(input_image)
    #     self.tensorboard_writer.add_graph(self, tag_score)

    def save_model(self, model_path):
        self.tensorboard_writer.close()
        self.tensorboard_writer = None  # can't be pickled
        torch.save(self, model_path)
        self.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)

    @classmethod
    def load_model(cls, model_path=MODEL_PATH, *args, **kwargs):
        net = cls(*args, **kwargs)
        if os.path.exists(model_path):
            model = torch.load(model_path)
            if model:
                model.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)
                net = model

        return net

    def __del__(self):
        if self.tensorboard_writer:
            self.tensorboard_writer.close()
if __name__ == "__main__":
from dataset_util.dataset import dataset, eval_dataset
data_loader = DataLoader(dataset, batch_size=2, shuffle=True)
eval_data_loader = DataLoader(eval_dataset, batch_size=2, shuffle=True)
net = CNN_RNN.load_model()
net.train_net(data_loader, eval_data_loader=eval_data_loader)
# net.predict(dataset[0]["image"].view(1, 3, 255, 255))
# predict_pil_image test code
# from config.config import IMAGE_PATHS
# import glob
# from PIL import Image
#
# image_paths = glob.glob(os.path.join(IMAGE_PATHS.get("EVAL"), "*.png"))
# for image_path in image_paths:
# pil_image = Image.open(image_path)
# predicted, score = net.predict_pil_image(pil_image)
# print("True value: {}, predicted: {}".format(os.path.split(image_path)[1], predicted))
print("Done")
The code above is the main part. If you need other components to make it run, leave a comment. I've been stuck here for quite a long time; any advice on training CRNN + CTC is appreciated.
I've been training with CTC loss and encountered the same problem. I know this is a rather late answer, but hopefully it'll help someone else who's researching this. After trial and error and a lot of research, there are a few things worth knowing when it comes to training with CTC (assuming your model is set up correctly):
The quickest way for the model to lower the cost is to predict only blanks. This is noted in a few papers and blogs: see http://www.tbluche.com/ctc_and_blank.html
The model learns to predict only blanks first, then it starts picking up on the error signal with regard to the correct underlying labels. This is also explained in the link above. In practice, I noticed that my model starts to learn the real underlying labels/targets after a couple hundred epochs, and the loss starts decreasing dramatically again, similar to what is shown for the toy example here: https://thomasmesnard.github.io/files/CTC_Poster_Mesnard_Auvolat.pdf
These parameters have a great impact on whether your model converges or not: learning rate, batch size and epoch number.
You have a few questions, so I will try to answer them one by one.
First, why does adding spaces to the captcha break the model?
A neural network learns to deal with the data it is trained on. If you change the distribution of the data (for example by adding spaces between characters), there is no guarantee that the network will generalize, as you hint at in your question. It is possible that the captchas you train on always have the characters in the same positions, or at the same distance from one another, so your model learns to exploit this by looking only in those positions. If you want your network to generalize to a specific scenario, you should explicitly train on that scenario; in your case, you should add random spaces during training as well.
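As a hedged illustration of that last point, here is one way the training captchas could be generated with random spacing using the captcha library mentioned in the question (the label format and image size are assumptions):
import random
import string

from captcha.image import ImageCaptcha

generator = ImageCaptcha(width=255, height=255)

def random_spaced_label(length=5):
    # pick random characters and randomly insert spaces between them,
    # so character positions are no longer fixed across samples
    chars = random.choices(string.ascii_uppercase + string.digits, k=length)
    return ''.join(c + (' ' if random.random() < 0.5 else '') for c in chars).strip()

label = random_spaced_label()
generator.write(label, 'sample.png')   # writes a captcha image with irregular spacing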
Second, why does the loss not go below 16?
Clearly, from the fact that your training loss is also stalled at 16 (like your validation loss), the problem is that your model simply doesn't have the capacity to deal with the complexity of the problem. In other words, your model is underfitting. You had the correct reflex to try to increase the capacity of your network, but increasing the capacity of the LSTM didn't help. Thus, the next logical conclusion is that the convolutional part of your network is not powerful enough. Here are a few things you might want to try, from most likely to succeed (in my opinion) to least likely:
Make the convnet trainable: I notice that you are using a pretrained convnet whose weights you are not fine-tuning. That could be a problem: whatever your convnet was trained on, it might not develop the features required to deal with captchas. You should try learning the weights of the convnet too, in order to develop useful features for captchas (see the sketch after this list).
Use a deeper convnet: this is the naive thing to do. Your convnet doesn't produce good enough features; try a more powerful, deeper one. (But you should definitely try this only after you've made the convnet trainable.)
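For the first suggestion, a minimal sketch of what unfreezing the feature extractor could look like, based on the resnet18 setup in the question (variable names outside the question's code are assumptions):
import torchvision
from torch import nn, optim

# Unfreeze the whole ResNet feature extractor instead of only its last block,
# so the convnet can learn captcha-specific features.
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = True

feature_extractor = nn.Sequential(*list(model_conv.children())[:-1])

# The question's filter(...) pattern then picks these parameters up automatically
# once requires_grad is True.
optimizer = optim.Adam(filter(lambda p: p.requires_grad, feature_extractor.parameters()))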
From my experience, training an RNN model with CTC loss is not an easy task. The model may not converge at all if the training is not set up carefully. Here are my suggestions:
Check the CTC loss output during training. For a model that will converge, the CTC loss at each batch fluctuates notably. If you observe that the CTC loss shrinks almost monotonically to a stable value, the model is most likely stuck in a local minimum.
Use short samples to pretrain your model. Though we have advanced RNN structures like LSTM and GRU, it's still hard to back-propagate through the RNN for long sequences.
Enlarge sample variety. You can even add artificial samples to help your model escape from local minima.
F.Y.I., we've just open-sourced a new deep learning framework, Dandelion, which has a built-in CTC objective and an interface much like PyTorch's. You could try your model with Dandelion and compare it with your current implementation.
