U-Net with Pixel-wise weighted cross entropy: Input dimension errors - python

I have been using Zhixuhao's implementation of U-Net to try to do semantic binary segmentation and I modified it slightly using suggestions from this Stackoverflow answer:
Keras, binary segmentation, add weight to loss function
to be able to do a pixel-wise weighted binary cross-entropy, as they do in the original U-Net paper (see page 5), to force my U-Net to learn border pixels. Essentially the idea is to add a lambda layer that computes the pixel-wise weighted cross-entropy within the model itself and then use an "identity loss" that just copies the output of the network.
Here is what my input data looks like:
input image groundtruth weights
And here is what my code looks like:
def unet(pretrained_weights = None,input_size = (256,256,1)):
inputs = Input(input_size)
# [... Unet architecture from Zhixuhao's model.py file...]
conv10 = Conv2D(1, 1, activation = 'sigmoid', name='true_output')(conv9)
mask_weights = Input(input_size)
true_masks = Input(input_size)
loss1 = Lambda(weighted_binary_loss, output_shape=input_size, name='loss_output')([conv10, mask_weights, true_masks])
model = Model(inputs = [inputs, mask_weights, true_masks], outputs = loss1)
model.compile(optimizer = Adam(lr = 1e-4), loss =identity_loss)
And added those two functions:
def weighted_binary_loss(X):
y_pred, weights, y_true = X
loss = keras.losses.binary_crossentropy(y_pred, y_true)
loss = multiply([loss, weights])
return loss
def identity_loss(y_true, y_pred):
return y_pred
And finally here is the relevant part of my main.py:
input_size = (256,256,1)
target_size = (256,256)
myGene = trainGenerator(5,'data/moma/train','img','seg','wei',data_gen_args,save_to_dir=None,target_size=target_size)
model = unet(input_size=input_size)
model_checkpoint = ModelCheckpoint('unet_moma_weights.hdf5',monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(myGene,steps_per_epoch=300,epochs=5,callbacks=[model_checkpoint])
Now this code runs fine, I can train my U-Net and it does learn border pixels, but only if I resize my input images to be 256*256 in size. If I instead use input_size=(256,32,1) and target_size=(256,32) in main.py , which is the relevant dimensions for my data and that allows me to use bigger batch sizes, I get the following error:
ValueError: Operands could not be broadcast together with shapes (256,
32, 1) (256, 32)
For the line loss = multiply([loss, weights]). And indeed the weights have one extra singleton dimension. I don't understand why the error is not raised when I use 256*256 inputs, but I tried to make both inputs the same dimensions with either k.expand_dims() or Reshape(), but while the code does not issue an error and the loss converges, when I test my network on extra inputs I get blank outputs (ie fully grey or white or black images, or stuff that has nothing to do with my inputs).
So this is a lot of text for the following question: Why does multiply() issue an error in the 256*32 case and not 256*256, and why creating/removing dimensions on the inputs does not help?
Thanks!
ps: In order to get the network to output the actual prediction instead of the pixel-wise loss after training, I remove the loss layer and the two extra input layers with the following code:
new_model = Model(inputs=model.inputs,outputs=model.get_layer("true_output").output)
new_model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy')
new_model.set_weights(model.get_weights())
This works fine (again in the 256*256 case at least)

So for anyone who stumbles upon this question, here is how I implemented the loss function:
def pixelwise_weighted_binary_crossentropy(y_true, y_pred):
'''
Pixel-wise weighted binary cross-entropy loss.
The code is adapted from the Keras TF backend.
(see their github)
Parameters
----------
y_true : Tensor
Stack of groundtruth segmentation masks + weight maps.
y_pred : Tensor
Predicted segmentation masks.
Returns
-------
Tensor
Pixel-wise weight binary cross-entropy between inputs.
'''
try:
# The weights are passed as part of the y_true tensor:
[seg, weight] = tf.unstack(y_true, 2, axis=-1)
seg = tf.expand_dims(seg, -1)
weight = tf.expand_dims(weight, -1)
except:
pass
epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
y_pred = tf.math.log(y_pred / (1 - y_pred))
zeros = array_ops.zeros_like(y_pred, dtype=y_pred.dtype)
cond = (y_pred >= zeros)
relu_logits = math_ops.select(cond, y_pred, zeros)
neg_abs_logits = math_ops.select(cond, -y_pred, y_pred)
entropy = math_ops.add(relu_logits - y_pred * seg, math_ops.log1p(math_ops.exp(neg_abs_logits)), name=None)
# This is essentially the only part that is different from the Keras code:
return K.mean(math_ops.multiply(weight, entropy), axis=-1)

Related

TFX AutoEncoder model output in serve_tf_examples_fn

I am building a TFX pipeline that takes data input from CSV file and trains an autoencoder. The issue I am facing is that when I get output from model in serve_tf_examples_fn it is a tensor of shape [1000, 17] I want ot calculate reconstruction loss and then perform thresholding on this output, none of the TF2 functions work inside this function like tensor.numpy(), etc. I have the labels for the dataset, which is why I want to compute reconstruction loss against traformed features and then perform thresholding and return labels back to tfma.EvalConfig.
EvalConfig:
eval_config = tfma.EvalConfig(
model_specs=[
tfma.ModelSpec(
signature_name="serving_default",
label_keys=features.LABEL_KEY
# preprocessing_function_names=["transform_features"],
)
],
...
trainer.py>>serve_tf_examples_fn()
#tf.function
def serve_tf_examples_fn(serialized_tf_examples):
"""Returns the output to be used in the serving signature."""
feature_spec = tf_transform_output.raw_feature_spec()
# feature_spec.pop(features.LABEL_KEY)
parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
transformed_features = model.tft_layer(parsed_features)
print(model.summary())
reconstructions = model(transformed_features)
print(reconstructions)
# sys.exit()
return {"outputs": reconstructions}
What I want to do is:
Get model output
Calculate reconstruction loss against transformed_features
Threshold that loss and generate predictions
Return those predictions to EvalConfig

Three Dimensional Output Loss

I am currently working on implementing a time series prediction task that will produce labels across a sequence (batch, steps, features) -> (batch, steps, classes). I have a TimeDistributed layer as my final layer and have a three dimensional output due to this, I seem to be getting terrible accuracy. I am wondering if this is due to the three dimensional output not being calculated correctly in the loss. Is there a better way to do this?
K.clear_session()
def acc(y_true, y_pred):
y_pred = tf.argmax(y_pred, 2)
y_true = tf.squeeze(y_true, -1)
return categorical_accuracy(y_true, y_pred)
def loss(labels, logits):
return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
def build_model():
char_in = Input(shape=(None, None)) #sequence length, word char length
char_emb = Embedding(char_emb_weights.shape[1], 32, weights=char_emb_weights, trainable=False)(char_in)
char_GRU = TimeDistributed(Bidirectional(GRU(32, recurrent_initializer='glorot_uniform'), 'concat'))(char_emb)
lstm = LSTM(64, return_sequences=True, recurrent_initializer='glorot_uniform')(char_GRU)
dense = TimeDistributed(Dense(16, activation='relu'))(lstm)
output = TimeDistributed(Dense(3, activation='softmax'))(dense)
#output = CRF(target_size, sparse_target=True)(dense)
m = Model(inputs=[word_in, char_in], outputs=output)
m.compile(optimizer='sgd', loss=loss, metrics=[acc])
return m
I solved this issue, using a model with lower hidden nodes per layer reduced model complexity and allowed for convergence.
That being said, I am still looking for an explanation as to why this is and also I am curious about a three dimensional output and how loss is calculated over time if anyone can provide an answer.

How to properly use CategoricalCrossentropy loss for image segmentation in Tensorflow 2.0/Keras?

I am trying to train a U-Net derivative to do single-class image segmentation but am having problems using the tf.keras.losses.SparseCategoricalCrossentropy() and tf.keras.losses.CategoricalCrossentropy() functions in Keras. Which is the more appropriate and how to use it properly?
If I try to use SpareCategoricalCrossentropy, I get the error:
Received a label value of 1 which is outside the valid range of [0, 1)
If I try to use CategoricalCrossentropy, I get:
You are passing a target array of shape (3600, 64, 64, 1) while using as loss categorical_crossentropy. categorical_crossentropy expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via: y_binary = tf.keras.utils.to_categorical(y_int)
Using to_categorical for my mask vs background segmentation problem, it increases the last dimension to 2, which should not be necessary. My prediction should be a number between 0 and 1 in a single "channel".
Model definition snippet:
input_x = tf.keras.Input(batch_shape=(batch_size, xsze, ysze, 3), name='input_x')
predictions = tf.keras.layers.Conv2D(1, [1, 1], activation='linear', name='output_x')(drop11)
loss = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(optimizer=tf.keras.optimizers.Adam(), # Optimizer
loss=loss,
metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
checkpointer = tf.keras.callbacks.ModelCheckpoint(session_name + '_backup.h5', save_best_only=True, monitor = 'acc', verbose = 0)
early_stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss',patience=5, verbose=1,min_delta=0.005)
history = model.fit(data_train, roi_train,
batch_size=batch_size,
epochs = 10,
validation_data=(data_val, roi_zoom_val),callbacks=[checkpointer,early_stopper])
My roi_train is a numpy array with 0's and 1's of type float32.
Since you only have one class and you want each value in the segmentation map to be between 0 and 1, then you should use sigmoid as the activation of last layer and binary_crossentropy as the loss function. That's because for each pixel you are facing a binary decision: does this pixel belong to foreground or background?

from_logits=True and from_logits=False get different training result for tf.losses.CategoricalCrossentropy for UNet

I am doing the image semantic segmentation job with unet, if I set the Softmax Activation for last layer like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...
and then using loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
The training will not converge even for only one training image.
But if I do not set the Softmax Activation for last layer like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...
and then using loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
The training will converge for one training image.
My groundtruth dataset is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
mask = cv2.imread(spath, 0)
seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))
Why? Is there something wrong for my usage?
This is my experiment code of git: https://github.com/honeytidy/unet
You can checkout and run (can run on cpu). You can change the Activation layer and from_logits of CategoricalCrossentropy and see what i said.
Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
from_logits = True signifies the values of the loss obtained by the model are not normalized and is basically used when we don't have any softmax function in our model. For e.g. https://www.tensorflow.org/tutorials/generative/dcgan in this model they have not used a softmax activation function or in other words we can say it helps in numerical stability.
By default, all of the loss function implemented in Tensorflow for classification problem uses from_logits=False. Remember in case of classification problem, at the end of the prediction, usually one wants to produce output in terms of probabilities.
Just look at the image below, the last layer of the network(just before softmax function)
So the sequence is Neural Network ⇒ Last layer output ⇒ Softmax or Sigmoid function ⇒ Probability of each class.
For example in the case of a multi-class classification problem, where output can be y1, y2, ....... yn one wants to produce each output with some probability. (see the output layer). Now, this output layer will get compared in cross-entropy loss function with the true label.
Let us take an example where our network produced the output for the classification task. Assume your Neural Network is producing output, then you convert that output into probabilities using softmax function and calculate loss using a cross-entropy loss function
# output produced by the last layer of NN
nn_output_before_softmax = [3.2, 1.3, 0.2, 0.8]
# converting output of last layer of NN into probabilities by applying softmax
nn_output_after_softmax = tf.nn.softmax(nn_output_before_softmax)
# output converted into softmax after appling softmax
print(nn_output_after_softmax.numpy())
[0.77514964 0.11593805 0.03859243 0.07031998]
y_true = [1.0, 0.0, 0.0, 0.0]
Now there are two scenarios:
One is explicitly using the softmax (or sigmoid) function
One is not using softmax function separately and wants to include in the calculation of loss function
1) One is explicitly using the softmax (or sigmoid) function
When one is explicitly using softmax (or sigmoid) function, then, for the classification task, then there is a default option in TensorFlow loss function i.e. from_logits=False. So here TensorFlow is assuming that whatever the input that you will be feeding to the loss function are the probabilities, so no need to apply the softmax function.
# By default from_logits=False
loss_taking_prob = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
loss_1 = loss_taking_prob(y_true, nn_output_after_softmax)
print(loss_1)
tf.Tensor(0.25469932, shape=(), dtype=float32)
2) One is not using the softmax function separately and wants to include it in the calculation of the loss function. This means that whatever inputs you are providing to the loss function is not scaled (means inputs are just the number from -inf to +inf and not the probabilities). Here you are letting TensorFlow perform the softmax operation for you.
loss_taking_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
loss_2 = loss_taking_logits(y_true, nn_output_before_softmax)
print(loss_2)
tf.Tensor(0.2546992, shape=(), dtype=float32)
Please do remember that you using from_logits=False when it should be True leads to taking softmax of probabilities and producing incorrect model
I guess the problem comes from the softmax activation function. Looking at the doc I found that sotmax is applied to the last axis by default. Can you look at model.summary() and check if that is what you want ?
For softmax to work properly, you must make sure that:
You are using 'channels_last' as Keras default channel config.
This means the shapes in the model will be like (None, height, width, channels)
This seems to be your case because you are putting n_classes in the last axis. But it's also strange because you are using Conv2D and your output Y should be (1, height, width, n_classes) and not that strange shape you are using.
Your Y has only zeros and ones (not 0 and 255 as usually happens to images)
Check that Y.max() == 1 and Y.min() == 0
You may need to have Y = Y / 255.
Only one class is correct (your data does not have more than one path/channel with value = 1).
Check that (Y.sum(axis=-1) == 1).all() is True

Doing Multi-Label classification with BERT

I want to use BERT model to do multi-label classification with Tensorflow.
To do so, I want to adapt the example run_classifier.py from BERT github repository, which is an example on how to use BERT to do simple classification, using the pre-trained weights given by Google Research. (For example with BERT-Base, Cased)
I have X different labels which have value of either 0 or 1, so I want to add to the original BERT model a new Dense layer of size X and using the sigmoid_cross_entropy_with_logits activation function.
So, for the theorical part I think I am OK.
The problem is that I don't know how I can append a new output layer and retrain only this new layer with my dataset, using the existing BertModel class.
Here is the original create_model() function from run_classifier.py where I guess I have to do my modifications. But I am a bit lost on what to do.
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
labels, num_labels, use_one_hot_embeddings):
"""Creates a classification model."""
model = modeling.BertModel(
config=bert_config,
is_training=is_training,
input_ids=input_ids,
input_mask=input_mask,
token_type_ids=segment_ids,
use_one_hot_embeddings=use_one_hot_embeddings)
output_layer = model.get_pooled_output()
hidden_size = output_layer.shape[-1].value
output_weights = tf.get_variable(
"output_weights", [num_labels, hidden_size],
initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable(
"output_bias", [num_labels], initializer=tf.zeros_initializer())
with tf.variable_scope("loss"):
if is_training:
# I.e., 0.1 dropout
output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
logits = tf.matmul(output_layer, output_weights, transpose_b=True)
logits = tf.nn.bias_add(logits, output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)
log_probs = tf.nn.log_softmax(logits, axis=-1)
one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
loss = tf.reduce_mean(per_example_loss)
return (loss, per_example_loss, logits, probabilities)
And here is the same function, with some of my modifications, but where there is things missing (and wrong things too? )
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels):
"""Creates a classification model."""
model = modeling.BertModel(
config=bert_config,
is_training=is_training,
input_ids=input_ids,
input_mask=input_mask,
token_type_ids=segment_ids)
output_layer = model.get_pooled_output()
hidden_size = output_layer.shape[-1].value
output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())
with tf.variable_scope("loss"):
if is_training:
# I.e., 0.1 dropout
output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
logits = tf.matmul(output_layer, output_weights, transpose_b=True)
logits = tf.nn.bias_add(logits, output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)
log_probs = tf.nn.log_softmax(logits, axis=-1)
per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_example_loss)
return (loss, per_example_loss, logits, probabilities)
The other things I have adapted in the code and for which I had no problem :
DataProcessor to load and parse my custom dataset
Changing the type of labels variable from numerical values to arrays everywhere it is used
So, if anyone knows what I should do to resolve my problem, or even point out some obvious mistake I may have done, I would be glad to hear it.
Notes :
I found this article that correspond pretty well to what I am trying to do, but it use PyTorch, and I can not translate it into Tensorflow.
You want to replace the softmax that models a single distribution over possible outputs (all scores sum up to one) with sigmoid which models an independent distribution for each class (there is yes/no distribution for each output).
So, you correctly change the loss function, but you also need to change how you compute the probabilities. It should be:
probabilities = tf.sigmoid(logits)
In this case, you don't need the log_probs.

Categories