TFX AutoEncoder model output in serve_tf_examples_fn - python

I am building a TFX pipeline that takes input data from a CSV file and trains an autoencoder. The issue I am facing is that the model output I get in serve_tf_examples_fn is a tensor of shape [1000, 17]. I want to calculate the reconstruction loss and then threshold it, but none of the eager-only TF2 functions, such as tensor.numpy(), work inside this function. I have labels for the dataset, which is why I want to compute the reconstruction loss against the transformed features, threshold it, and return the resulting labels to tfma.EvalConfig.
EvalConfig:
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name="serving_default",
            label_key=features.LABEL_KEY,  # label_key takes a single string; label_keys expects a map
            # preprocessing_function_names=["transform_features"],
        )
    ],
    ...
trainer.py, serve_tf_examples_fn():
@tf.function
def serve_tf_examples_fn(serialized_tf_examples):
    """Returns the output to be used in the serving signature."""
    feature_spec = tf_transform_output.raw_feature_spec()
    # feature_spec.pop(features.LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    model.summary()
    reconstructions = model(transformed_features)
    print(reconstructions)
    # sys.exit()
    return {"outputs": reconstructions}
What I want to do is:
1. Get the model output.
2. Calculate the reconstruction loss against transformed_features.
3. Threshold that loss and generate predictions.
4. Return those predictions to EvalConfig.
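Inside serve_tf_examples_fn (a tf.function), tensors are symbolic graph tensors, so .numpy() is not available; the loss and the threshold have to be written as TensorFlow ops. A minimal sketch, assuming the model's input is a single dense [batch, num_features] tensor built from the transformed features; the helper name reconstruction_outputs and the threshold value are hypothetical:

```python
import tensorflow as tf

@tf.function
def reconstruction_outputs(inputs, reconstructions, threshold):
    """Per-example reconstruction error and thresholded predictions, using only TF ops.

    inputs, reconstructions: float tensors of shape [batch, num_features].
    threshold: scalar; examples whose error exceeds it are labeled 1 (anomalous).
    """
    # Mean squared error per example: reduce over the feature axis only.
    loss = tf.reduce_mean(tf.square(inputs - reconstructions), axis=-1)
    # Tensor comparison plus cast replaces any Python-side `if` or .numpy() logic.
    predictions = tf.cast(loss > threshold, tf.int64)
    return {"reconstruction_loss": loss, "predictions": predictions}

x = tf.constant([[0.0, 1.0], [2.0, 2.0]])
x_hat = tf.constant([[0.0, 1.0], [0.0, 0.0]])
out = reconstruction_outputs(x, x_hat, tf.constant(1.0))
print(out["reconstruction_loss"].numpy())  # [0. 4.]
print(out["predictions"].numpy())          # [0 1]
```

In the serving function, the same ops would be applied to reconstructions and the model's dense input, and the keys of the returned dict become the signature's outputs. Pointing TFMA at the "predictions" output (for example through prediction_key in tfma.ModelSpec) is an assumption to check against your TFMA version.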

Related

How can I update weights of a model with two different loss function at the same time on TF2

I want to train a classification model with two losses as follows:
model.compile(optimizer=adam)
@tf.function
def train(model, inputs_data_1, inputs_data_2, y):
    with tf.GradientTape(persistent=True) as tape:
        logits1, features1 = model(inputs_data_1)  # logits: output of fully-connected layer
        logits2, features2 = model(inputs_data_2)  # features: output of feature extractor
        loss_fn1 = cross_entropy(y, logits1)
        loss_fn2 = euclidean_dist(features1 - features2)
        losses = loss_fn1 + loss_fn2
    optim.apply_gradients(zip(tape.gradient(losses, model.trainable_weights), model.trainable_variables))
When I try this, it just stops without an error.
I didn't change the input data with tf.split or tf.reshape.
How can I compile the model and train it with two losses?
Please give me some opinions or a code implementation I can refer to for this problem. Thank you.
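One way to train with two losses in a custom loop is to add them inside a single GradientTape and call apply_gradients once; compile() is not needed for this. A minimal sketch, assuming a hypothetical toy model that returns (logits, features), integer class labels, and the mean Euclidean distance between the two feature batches as the second loss; the layer sizes are arbitrary:

```python
import tensorflow as tf

# Hypothetical toy model returning (logits, features), like the one in the question.
inputs = tf.keras.Input(shape=(8,))
features = tf.keras.layers.Dense(4, activation="relu", name="feature_extractor")(inputs)
logits = tf.keras.layers.Dense(3, name="classifier")(features)
model = tf.keras.Model(inputs, [logits, features])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x1, x2, y):
    with tf.GradientTape() as tape:
        logits1, features1 = model(x1, training=True)
        logits2, features2 = model(x2, training=True)
        loss1 = ce(y, logits1)  # classification loss
        # Euclidean distance per example; the small epsilon keeps the sqrt
        # gradient finite when both feature vectors are identical.
        dist = tf.sqrt(tf.reduce_sum(tf.square(features1 - features2), axis=-1) + 1e-12)
        loss2 = tf.reduce_mean(dist)
        total = loss1 + loss2
    # The gradient of the sum is the sum of each loss's gradient.
    grads = tape.gradient(total, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss1, loss2

tf.random.set_seed(0)
x1 = tf.random.normal((16, 8))
x2 = tf.random.normal((16, 8))
y = tf.random.uniform((16,), maxval=3, dtype=tf.int32)
for step in range(3):
    l1, l2 = train_step(x1, x2, y)
    print(step, float(l1), float(l2))
```

If the two losses have very different scales, a weighting factor on one of them (for example loss1 + alpha * loss2, with alpha a hyperparameter you choose) is a common adjustment.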

What's the best way to access single gradients in a batch in TensorFlow?

I'm currently analyzing how gradients develop over the course of training of a CNN using Tensorflow 2.x. What I want to do is compare each gradient in a batch to the gradient resulting for the whole batch. At the moment I use this simple code snippet for each training step:
[...]
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
[...]
# One training step
# x_train is a batch of input data, y_train the corresponding labels
def train_step(model, optimizer, x_train, y_train):
    # Process batch
    with tf.GradientTape() as tape:
        batch_predictions = model(x_train, training=True)
        batch_loss = loss_object(y_train, batch_predictions)
    batch_grads = tape.gradient(batch_loss, model.trainable_variables)
    # Do something with gradient of whole batch
    # ...
    # Process each data point in the current batch
    for index in range(len(x_train)):
        with tf.GradientTape() as single_tape:
            single_prediction = model(x_train[index:index+1], training=True)
            single_loss = loss_object(y_train[index:index+1], single_prediction)
        single_grad = single_tape.gradient(single_loss, model.trainable_variables)
        # Do something with gradient of single data input
        # ...
    # Use batch gradient to update network weights
    optimizer.apply_gradients(zip(batch_grads, model.trainable_variables))
    train_loss(batch_loss)
    train_accuracy(y_train, batch_predictions)
My main problem is that computation time explodes when I calculate each gradient individually, even though TensorFlow should already have done these calculations when computing the batch gradient. The reason is that GradientTape, like compute_gradients, always returns a single gradient regardless of whether one or several data points were given, so this computation has to be repeated for each data point.
I know that I could compute the batch gradient used to update the network from all the single gradients calculated for each data point, but this saves only a small amount of computation time.
Is there a more efficient way to compute single gradients?
You can use the jacobian method of the gradient tape to get the Jacobian matrix, which will give you the gradients for each individual loss value:
import tensorflow as tf
# Make a random linear problem
tf.random.set_seed(0)
# Random input batch of ten four-vector examples
x = tf.random.uniform((10, 4))
# Random weights
w = tf.random.uniform((4, 2))
# Random batch label
y = tf.random.uniform((10, 2))
with tf.GradientTape() as tape:
    tape.watch(w)
    # Prediction (matrix product)
    p = x @ w
    # Loss: averaged over the last axis only, so one value per example, shape (10,)
    loss = tf.losses.mean_squared_error(y, p)
# Compute Jacobian
j = tape.jacobian(loss, w)
# The Jacobian gives you the gradient for each loss value
print(j.shape)
# (10, 4, 2)
# Gradient of the loss wrt the weights for the first example
tf.print(j[0])
# [[0.145728424 0.0756840706]
#  [0.103099883 0.0535449386]
#  [0.267220169 0.138780832]
#  [0.280130595 0.145485848]]
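For a Keras model like the one in the question, the same idea works only if the loss is not reduced to a scalar: with its default reduction, SparseCategoricalCrossentropy returns the batch mean, whose Jacobian is just the batch gradient. A minimal sketch, assuming a small hypothetical dense model in place of the CNN and reduction="none"; it also shows that the per-example gradients average to the usual batch gradient:

```python
import tensorflow as tf

tf.random.set_seed(0)
# Hypothetical small model; the question's CNN would be used the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(5, activation="tanh"),
    tf.keras.layers.Dense(3),
])
# reduction="none" keeps one loss value per example instead of the batch mean.
per_example_loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none")

x = tf.random.normal((8, 4))
y = tf.random.uniform((8,), maxval=3, dtype=tf.int32)

with tf.GradientTape(persistent=True) as tape:
    losses = per_example_loss(y, model(x, training=True))  # shape (8,)
    batch_loss = tf.reduce_mean(losses)                     # the usual scalar batch loss

# One Jacobian per variable, each of shape (batch, *variable.shape).
per_example_grads = tape.jacobian(losses, model.trainable_variables)
batch_grads = tape.gradient(batch_loss, model.trainable_variables)
del tape  # release the persistent tape's resources

for g_i, g in zip(per_example_grads, batch_grads):
    print(g_i.shape, g.shape)
```

tape.jacobian still computes one gradient per example, but it vectorizes the work (via pfor) instead of running a Python loop, which is typically much faster; memory grows with batch size times the number of parameters.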

U-Net with Pixel-wise weighted cross entropy: Input dimension errors

I have been using Zhixuhao's implementation of U-Net to try to do semantic binary segmentation and I modified it slightly using suggestions from this Stackoverflow answer:
Keras, binary segmentation, add weight to loss function
to be able to do a pixel-wise weighted binary cross-entropy, as they do in the original U-Net paper (see page 5), to force my U-Net to learn border pixels. Essentially the idea is to add a lambda layer that computes the pixel-wise weighted cross-entropy within the model itself and then use an "identity loss" that just copies the output of the network.
Here is what my input data looks like:
[Images: input image, groundtruth, weights]
And here is what my code looks like:
def unet(pretrained_weights = None,input_size = (256,256,1)):
    inputs = Input(input_size)
    # [... Unet architecture from Zhixuhao's model.py file...]
    conv10 = Conv2D(1, 1, activation = 'sigmoid', name='true_output')(conv9)
    mask_weights = Input(input_size)
    true_masks = Input(input_size)
    loss1 = Lambda(weighted_binary_loss, output_shape=input_size, name='loss_output')([conv10, mask_weights, true_masks])
    model = Model(inputs = [inputs, mask_weights, true_masks], outputs = loss1)
    model.compile(optimizer = Adam(lr = 1e-4), loss =identity_loss)
    return model
And added those two functions:
def weighted_binary_loss(X):
    y_pred, weights, y_true = X
    loss = keras.losses.binary_crossentropy(y_pred, y_true)
    loss = multiply([loss, weights])
    return loss
def identity_loss(y_true, y_pred):
    return y_pred
And finally here is the relevant part of my main.py:
input_size = (256,256,1)
target_size = (256,256)
myGene = trainGenerator(5,'data/moma/train','img','seg','wei',data_gen_args,save_to_dir=None,target_size=target_size)
model = unet(input_size=input_size)
model_checkpoint = ModelCheckpoint('unet_moma_weights.hdf5',monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(myGene,steps_per_epoch=300,epochs=5,callbacks=[model_checkpoint])
Now this code runs fine: I can train my U-Net and it does learn border pixels, but only if I resize my input images to 256*256. If I instead use input_size=(256,32,1) and target_size=(256,32) in main.py, which are the relevant dimensions for my data and allow me to use bigger batch sizes, I get the following error:
ValueError: Operands could not be broadcast together with shapes (256, 32, 1) (256, 32)
for the line loss = multiply([loss, weights]). Indeed, the weights have one extra singleton dimension. I don't understand why the error is not raised when I use 256*256 inputs. I tried to give both inputs the same dimensions with either K.expand_dims() or Reshape(), but although the code then runs without error and the loss converges, when I test my network on new inputs I get blank outputs (i.e. fully grey, white, or black images, or output that has nothing to do with my inputs).
So this is a lot of text for the following question: why does multiply() raise an error in the 256*32 case but not in the 256*256 case, and why doesn't adding or removing dimensions on the inputs help?
Thanks!
P.S.: To get the network to output the actual prediction instead of the pixel-wise loss after training, I remove the loss layer and the two extra input layers with the following code:
new_model = Model(inputs=model.inputs,outputs=model.get_layer("true_output").output)
new_model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy')
new_model.set_weights(model.get_weights())
This works fine (again, at least in the 256*256 case).
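A likely explanation of the 256*256 vs 256*32 difference: the error message shows the per-sample shapes being compared aligned from the right, NumPy-style. binary_crossentropy averages over the last axis, so loss is (H, W) per sample while weights is (H, W, 1). Aligned from the right, (256, 32) against (256, 32, 1) pairs 32 with 256 and fails, whereas (256, 256) against (256, 256, 1) happens to line up and silently broadcasts to (256, 256, 256), a meaningless tensor rather than a weighted loss map. The NumPy sketch below reproduces only that shape arithmetic (the helper name merged_shape is hypothetical; Keras is not involved):

```python
import numpy as np

def merged_shape(loss_shape, weight_shape):
    """Right-aligned (NumPy-style) broadcast of two per-sample shapes, or None if incompatible."""
    try:
        return np.broadcast_shapes(loss_shape, weight_shape)
    except ValueError:
        return None

results = {}
for H, W in [(256, 256), (256, 32)]:
    loss_shape = (H, W)       # binary_crossentropy averages away the last (channel) axis
    weight_shape = (H, W, 1)  # the weight map keeps its channel axis
    results[(H, W)] = merged_shape(loss_shape, weight_shape)
    # Dropping the weights' singleton channel axis gives the intended element-wise product.
    squeezed = merged_shape(loss_shape, weight_shape[:-1])
    print((H, W), "->", results[(H, W)], "| squeezed weights ->", squeezed)
```

Separately, keras.losses.binary_crossentropy expects its arguments in the order (y_true, y_pred), while weighted_binary_loss calls it as (y_pred, y_true), which may be worth checking as well.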
So for anyone who stumbles upon this question, here is how I implemented the loss function:
import tensorflow as tf
from tensorflow.keras import backend as K

def pixelwise_weighted_binary_crossentropy(y_true, y_pred):
    '''
    Pixel-wise weighted binary cross-entropy loss.
    The code is adapted from the Keras TF backend.
    (see their github)
    Parameters
    ----------
    y_true : Tensor
        Stack of groundtruth segmentation masks + weight maps.
    y_pred : Tensor
        Predicted segmentation masks.
    Returns
    -------
    Tensor
        Pixel-wise weighted binary cross-entropy between inputs.
    '''
    # The weights are passed as part of the y_true tensor (last axis: [mask, weight]).
    # A bare try/except here would only hide errors and leave seg/weight undefined.
    seg, weight = tf.unstack(y_true, 2, axis=-1)
    seg = tf.expand_dims(seg, -1)
    weight = tf.expand_dims(weight, -1)
    # Convert probabilities back to logits, clipping to avoid log(0).
    epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
    y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
    y_pred = tf.math.log(y_pred / (1 - y_pred))
    # Numerically stable sigmoid cross-entropy (tf.where replaces the old math_ops.select).
    zeros = tf.zeros_like(y_pred, dtype=y_pred.dtype)
    cond = (y_pred >= zeros)
    relu_logits = tf.where(cond, y_pred, zeros)
    neg_abs_logits = tf.where(cond, -y_pred, y_pred)
    entropy = tf.add(relu_logits - y_pred * seg, tf.math.log1p(tf.exp(neg_abs_logits)))
    # This is essentially the only part that is different from the Keras code:
    return K.mean(tf.multiply(weight, entropy), axis=-1)
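The hand-written logits expression above is the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)), which is what tf.nn.sigmoid_cross_entropy_with_logits computes. A compact sketch of the same weighting using that built-in (a swap-in, not the answer's code), together with how the mask and weight map could be packed into one two-channel y_true as the tf.unstack call expects; the helper name and the random data are hypothetical:

```python
import numpy as np
import tensorflow as tf

def weighted_bce_from_stacked(y_true, y_pred, eps=1e-7):
    """Hypothetical compact variant: y_true[..., 0] is the mask, y_true[..., 1] the weight map."""
    y_true = tf.convert_to_tensor(y_true, dtype=tf.float32)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    seg = y_true[..., 0:1]     # (batch, H, W, 1); slicing keeps the channel axis
    weight = y_true[..., 1:2]  # (batch, H, W, 1)
    # Recover logits from the sigmoid probabilities, clipping to avoid log(0).
    p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    logits = tf.math.log(p / (1.0 - p))
    # Built-in stable form of max(x, 0) - x * z + log(1 + exp(-|x|)).
    entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=seg, logits=logits)
    return tf.reduce_mean(weight * entropy, axis=-1)  # (batch, H, W)

# Pack mask and weight map along the last axis so both reach the loss as y_true.
rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(2, 256, 32, 1)).astype(np.float32)
weights = rng.uniform(1.0, 10.0, size=(2, 256, 32, 1)).astype(np.float32)
y_true = np.concatenate([mask, weights], axis=-1)  # (2, 256, 32, 2)
y_pred = rng.uniform(0.0, 1.0, size=(2, 256, 32, 1)).astype(np.float32)

loss = weighted_bce_from_stacked(y_true, y_pred)
print(loss.shape)  # (2, 256, 32)
```

Because both the mask and the weight map keep a channel axis, the multiplication is element-wise for any H and W, which avoids the accidental broadcast discussed in the question.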

Why does my training loss oscillate while training the final layer of AlexNet with pre-trained weights?

I am working on texture classification and, based on previous work, I am trying to modify the final layer of AlexNet to have 20 classes and train only that layer for my multi-class classification problem.
I am using Tensorflow-GPU on an NVIDIA GTX 1080, Python3.6 on Ubuntu 16.04.
I am using the gradient descent optimizer and the Estimator class to build this. I am also using two dropout layers for regularization. Therefore, my hyperparameters are the learning rate, batch_size, and weight_decay. I have tried batch sizes of 50, 100, and 200, weight decays of 0.005 and 0.0005, and learning rates of 1e-3, 1e-4, and 1e-5. The training loss curves for all of these values follow similar trends.
My training loss curve does not decrease monotonically and instead seems to oscillate. I have provided a TensorBoard visualization for learning rate=1e-5, weight decay=0.0005, and batch_size=200.
Please assist in understanding what went wrong and how I could possibly rectify it.
[Image: TensorBoard visualization for the case specified above]
# Create the Estimator
classifier = tf.estimator.Estimator(model_fn=cnn_model)
# Set up logging for predictions
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=10)
# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": train_data},y=train_labels,batch_size=batch_size,num_epochs=None,shuffle=True)
classifier.train(input_fn=train_input_fn, steps=200000, hooks=[logging_hook])
# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
#Sections of the cnn_model
#Output Config
predictions = {"classes": tf.argmax(input=logits, axis=1),  # Generate predictions (for PREDICT and EVAL mode)
               "probabilities": tf.nn.softmax(logits, name="softmax_tensor")}  # Add `softmax_tensor` to the graph. It is used for PREDICT and by the `logging_hook`.
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
# Calculate Loss (for both TRAIN and EVAL modes)
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=20)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
#Training Config
if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    tf.summary.scalar('training_loss', loss)
    summary_hook = tf.train.SummarySaverHook(save_steps=10, output_dir='outputs', summary_op=tf.summary.merge_all())
    train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op, training_hooks=[summary_hook])
# Evaluation Metric- Accuracy
eval_metric_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])}
print(time.time()-t)
tf.summary.scalar('eval_loss', loss)
ac = tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])
tf.summary.scalar('eval_accuracy', ac[1])  # tf.metrics.accuracy returns (value, update_op); log the scalar
evaluation_hook = tf.train.SummarySaverHook(save_steps=10, output_dir='outputseval', summary_op=tf.summary.merge_all())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops, evaluation_hooks=[evaluation_hook])
Are you selecting your mini-batches randomly? It looks like you have a high variance across your mini-batches which leads to a high variance of the loss at different iterations.
I assume the x-axis in your plot is iterations rather than epochs, and that the training data presented every ~160 iterations is harder to predict, which leads to the periodic drop in your loss curve. How does your validation loss behave?
Possible solutions/ideas:
- Try randomizing your training data selection in a better way.
- Check your training data for mislabeled examples.
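For the first suggestion, one common approach is to reshuffle the full index set at the start of every epoch, so no batch is tied to a fixed slice of the data. A related check: if the ~160-iteration period matches len(train_data) / batch_size, the pattern lines up with epoch boundaries, which points to data ordering. A minimal NumPy sketch; the helper epoch_batches is hypothetical and does not use the Estimator input pipeline:

```python
import numpy as np

def epoch_batches(num_examples, batch_size, num_epochs, seed=0):
    """Yield index arrays; every epoch is a fresh random permutation of all examples."""
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        order = rng.permutation(num_examples)
        for start in range(0, num_examples, batch_size):
            yield order[start:start + batch_size]

# Example: 1,000 examples with batch size 200 gives 5 iterations per epoch.
batches = list(epoch_batches(1000, 200, num_epochs=2))
print(len(batches))  # 10
```

Each index array can then select a batch with train_data[idx] and train_labels[idx].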

Image retraining in tensorflow, changing the simple softmax layer to multilayer CNN

TensorFlow has released a tutorial for transfer learning named Image retraining, which can be found here:
https://www.tensorflow.org/tutorials/image_retraining
What they do is take a pre-trained Inception v3 model, change only the very last layer (the softmax regression layer), and train it on the new dataset. This is very understandable and in fact common practice in transfer learning.
I have tried their method on my dataset (which is small) and applied all the suggestions for getting better results, from data augmentation to changing the number of steps, but I did not otherwise modify their code. The accuracy I got is relatively poor, ~70%.
I am considering training a small neural network on top of the given model, namely, changing the last layer from a simple regression to a more sophisticated network.
Here is the part of their code where they modify the softmax layer:
def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor):
    """Adds a new softmax and fully-connected layer for training.
    We need to retrain the top layer to identify our new classes, so this function
    adds the right operations to the graph, along with some variables to hold the
    weights, and then sets up all the gradients for the backward pass.
    The set up for the softmax and fully-connected layers is based on:
    https://tensorflow.org/versions/master/tutorials/mnist/beginners/index.html
    Args:
      class_count: Integer of how many categories of things we're trying to
      recognize.
      final_tensor_name: Name string for the new final node that produces results.
      bottleneck_tensor: The output of the main CNN graph.
    Returns:
      The tensors for the training and cross entropy results, and tensors for the
      bottleneck input and ground truth input.
    """
    with tf.name_scope('input'):
        bottleneck_input = tf.placeholder_with_default(
            bottleneck_tensor, shape=[None, BOTTLENECK_TENSOR_SIZE],
            name='BottleneckInputPlaceholder')
        ground_truth_input = tf.placeholder(tf.float32,
                                            [None, class_count],
                                            name='GroundTruthInput')
    # Organizing the following ops as `final_training_ops` so they're easier
    # to see in TensorBoard
    layer_name = 'final_training_ops'
    with tf.name_scope(layer_name):
        with tf.name_scope('weights'):
            layer_weights = tf.Variable(tf.truncated_normal([BOTTLENECK_TENSOR_SIZE, class_count], stddev=0.001), name='final_weights')
            variable_summaries(layer_weights)
        with tf.name_scope('biases'):
            layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
            variable_summaries(layer_biases)
        with tf.name_scope('Wx_plus_b'):
            logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
            tf.summary.histogram('pre_activations', logits)
    final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
    tf.summary.histogram('activations', final_tensor)
    with tf.name_scope('cross_entropy'):
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            labels=ground_truth_input, logits=logits)
        with tf.name_scope('total'):
            cross_entropy_mean = tf.reduce_mean(cross_entropy)
    tf.summary.scalar('cross_entropy', cross_entropy_mean)
    with tf.name_scope('train'):
        train_step = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(
            cross_entropy_mean)
    return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
            final_tensor)
def add_evaluation_step(result_tensor, ground_truth_tensor):
    """Inserts the operations we need to evaluate the accuracy of our results.
    Args:
      result_tensor: The new final node that produces results.
      ground_truth_tensor: The node we feed ground truth data
      into.
    Returns:
      Tuple of (evaluation step, prediction).
    """
    with tf.name_scope('accuracy'):
        with tf.name_scope('correct_prediction'):
            prediction = tf.argmax(result_tensor, 1)
            correct_prediction = tf.equal(
                prediction, tf.argmax(ground_truth_tensor, 1))
        with tf.name_scope('accuracy'):
            evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', evaluation_step)
    return evaluation_step, prediction
However, I am facing two main problems. First, I am not sure whether this is a good idea: would I just be wasting my effort on something useless? Second, they use the simple MNIST tutorial as the model for the last layer; if I wanted to use their Expert MNIST tutorial (https://www.tensorflow.org/get_started/mnist/pros) instead, I am lost on what to do or how to configure it.
Any suggestions on what I can do?
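Adding a hidden layer is a reasonable experiment, though with a small dataset a larger head can also overfit, so it is worth comparing against the single-layer baseline on a validation set. Since the bottleneck is a flat vector, the convolutional layers of the Expert MNIST tutorial do not apply directly; roughly, its fully connected layer with dropout is the part that carries over. A minimal sketch of what could replace the Wx_plus_b block, written with tf.compat.v1 so it runs in a graph like the retraining script; the function name, hidden_units, the dropout placeholder, and the 2048-wide bottleneck used in the demo are assumptions:

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1

def add_hidden_layer_logits(bottleneck_input, bottleneck_size, class_count,
                            hidden_units=256, dropout_rate=None):
    """Hypothetical deeper head: bottleneck -> ReLU hidden layer -> (dropout) -> logits."""
    with tf1.name_scope('hidden_layer'):
        # He-style init for the ReLU layer; a tiny stddev would make it learn very slowly.
        w1 = tf1.Variable(tf1.truncated_normal([bottleneck_size, hidden_units],
                                               stddev=(2.0 / bottleneck_size) ** 0.5),
                          name='hidden_weights')
        b1 = tf1.Variable(tf1.zeros([hidden_units]), name='hidden_biases')
        hidden = tf1.nn.relu(tf1.matmul(bottleneck_input, w1) + b1)
        if dropout_rate is not None:
            hidden = tf1.nn.dropout(hidden, rate=dropout_rate)
    with tf1.name_scope('Wx_plus_b'):
        w2 = tf1.Variable(tf1.truncated_normal([hidden_units, class_count], stddev=0.001),
                          name='final_weights')
        b2 = tf1.Variable(tf1.zeros([class_count]), name='final_biases')
        logits = tf1.matmul(hidden, w2) + b2
    return logits

graph = tf.Graph()
with graph.as_default():
    bottleneck = tf1.placeholder(tf.float32, [None, 2048], name='BottleneckInput')
    labels = tf1.placeholder(tf.float32, [None, 5], name='GroundTruthInput')
    rate = tf1.placeholder_with_default(0.0, shape=[], name='dropout_rate')  # 0.0 at eval time
    logits = add_hidden_layer_logits(bottleneck, 2048, 5, dropout_rate=rate)
    # Same loss and optimizer style as the original add_final_training_ops.
    cross_entropy = tf1.reduce_mean(
        tf1.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
    train_step = tf1.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
    init = tf1.global_variables_initializer()

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2048)).astype(np.float32)
y = np.eye(5, dtype=np.float32)[[0, 1, 2, 3]]
with tf1.Session(graph=graph) as sess:
    sess.run(init)
    _, loss_value = sess.run([train_step, cross_entropy], {bottleneck: x, labels: y, rate: 0.5})
    out = sess.run(logits, {bottleneck: x})
print(out.shape)  # (4, 5)
```

The softmax, cross-entropy, and evaluation ops from the original script would stay the same; only the layer producing logits changes.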
