I've recently been trying to implement a multi-class classification LSTM architecture, based on this example: biLSTM example
After I changed
self.label = tf.placeholder(tf.int32, [None])
to
self.label = tf.placeholder(tf.int32, [None, self.n_class])
The model seems to train normally, yet I am having trouble with this step:
self.loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y_hat, labels=self.label))
# prediction
self.prediction = tf.argmax(tf.nn.softmax(y_hat), 1)
Even though the model learns normally, the predictions do not seem to work for multiple classes. I was wondering how one should code the self.prediction object so that it emits a vector of predictions for individual instances?
Thank you very much.
In general, tf.nn.softmax returns a vector of probabilities for each instance. You just can't see them because you are using tf.argmax, which returns the index of the largest value, so you get a single class index per instance instead. Just remove tf.argmax and you should be fine.
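For illustration, a minimal sketch (reusing the y_hat logits from the question) of what each variant returns:

probs = tf.nn.softmax(y_hat)             # shape [batch, n_class]: per-class probabilities
self.prediction = probs                  # a full probability vector per instance
pred_classes = tf.argmax(probs, axis=1)  # shape [batch]: a single class index per instance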
Related
I am building a model with 3 classes: [0,1,2]
After training, the .predict function returns a list of class probabilities instead of class labels.
I was checking the Keras documentation but could not figure out what I did wrong.
.predict_classes is not working anymore, and I did not have this problem with previous classifiers. I already tried different activation functions (relu, sigmoid, etc.).
If I understand correctly, the number in Dense(3, ...) defines the number of classes.
outputs1 = Dense(3, activation='softmax')(att_out)
model1 = Model(inputs1, outputs1)
model1.summary()
model1.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=['accuracy'])
model1.fit(x=text_pad, y=train_y, batch_size=batch_size, epochs=epochs, verbose=1, shuffle=True)
y_pred = model1.predict(test_text_matrix)
Output example:
[[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]]
Output I want:
[1,2,0,0,0,1,2,0]
Thank you for any ideas.
You did not do anything wrong: predict has always returned the raw output of the model, and for a classifier that has always been the per-class probabilities.
predict_classes is only available for Sequential models, not for functional ones.
But there is an easy solution, you just need to take the argmax on the last dimension and you will get class indices:
import numpy as np

y_probs = model1.predict(test_text_matrix)
y_pred = np.argmax(y_probs, axis=-1)
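For example, applying the argmax to probability rows like the ones above (illustrative values) yields one class index per row:

import numpy as np

y_probs = np.array([[0.34, 0.33, 0.32],
                    [0.20, 0.30, 0.50]])
print(np.argmax(y_probs, axis=-1))  # -> [0 2]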
My question is about the practical implementation of "Domain Adaptation" into a functional model in keras with tensorflow backend.
Description of the problem:
I have a collection of particle collision samples, each consisting of n variables. One half of them is simulated data with certain class labels (e.g. "W-Boson"). The other half is real collision data, which is not labeled. The key idea is to set up a Keras model with two outputs: one for classifying the class of a sample and one for classifying the domain, i.e. whether it is simulated or real data. The catch is that the model shall be trained so that the domain classifier performs very poorly. This is achieved by flipping the sign of the incoming gradient from the domain end of the network during training, a technique called "Domain Adaptation". The model is thereby expected to learn domain-invariant features, or in other words, to perform the same on simulated and real collision data.
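For context, the gradient reversal trick itself can be sketched in a few lines with tf.custom_gradient (a minimal illustration, not the code from the repository linked below; hp_lambda matches the value used in the prototype):

import tensorflow as tf

def grad_reverse(x, hp_lambda=0.3):
    # Identity in the forward pass; scales and flips the gradient in the backward pass.
    @tf.custom_gradient
    def _reverse(inner_x):
        def grad(dy):
            return -hp_lambda * dy
        return tf.identity(inner_x), grad
    return _reverse(x)

Such a function could then be wrapped in a Lambda layer in place of the GradientReversal layer.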
The framework I am working with has an existing functional Keras model, which I wanted to expand with said domain classifier. This is a prototype I came up with:
# common layers
inputs = keras.Input(shape=(n_variables, ))
X = layers.Dense(units=50, activation="relu")(inputs)
# domain end
flip_layer = flipGradientTF.GradientReversal(hp_lambda=0.3)(X)
X_domain = layers.Dense(units=50, activation="relu")(flip_layer)
domain_out = layers.Dense(units=2, activation="softmax", name="domain_out")(X_domain)
# class end
X_class = layers.Dense(units=50, activation="relu")(X)
class_out = layers.Dense(units=n_classes, activation="softmax", name="class_out")(X_class)
The code for flipGradientTF is taken from https://github.com/michetonu/gradient_reversal_keras_tf
And further on, the code for compiling and training the model:
model = keras.Model(inputs=inputs, outputs=[class_out, domain_out])
model.compile(optimizer="adam", loss=loss_function, metrics="accuracy")
# train model
model.fit(
    x=train_data,
    y=[train_class_labels, train_domain_labels],
    batch_size=200,
    epochs=200,
    sample_weight={"class_out": class_weights, "domain_out": None}
)
For train_data I am passing the dataframe that contains the data from both domains. Since I have tried both "categorical_crossentropy" and "sparse_categorical_crossentropy" as the loss_function, train_class_labels and train_domain_labels were either in the one-hot representation or in the integer representation. My biggest issue is figuring out what to use for the class labels of the unlabeled data, and this led to a gut feeling that I am on the wrong track here.
So in a nutshell:
Is this implementation strategy legitimate and, assuming it is, what should I do about the class labels for the unlabeled data? And if it is not, what would be a better way of attacking this problem?
Any help would be much appreciated :)
I can't find the info in the documentation so I am asking here.
I have a multioutput model with 3 different outputs:
model = tf.keras.Model(inputs=[input], outputs=[output1, output2, output3])
The predicted labels for validation are constructed from these 3 outputs to form only one; it is a post-processing step. The dataset used for training is a dataset of those 3 intermediate outputs, while for validation I evaluate on a dataset of labels instead of the 3 kinds of intermediate data.
I would like to evaluate my model using a custom metric that handles the post-processing and the comparison with the ground truth.
My question is, in the code of the custom metric, will y_pred be a list of the 3 outputs of the model?
class MyCustomMetric(tf.keras.metrics.Metric):
    def __init__(self, name='my_custom_metric', **kwargs):
        super(MyCustomMetric, self).__init__(name=name, **kwargs)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # ? is y_pred a list [batch_output_1, batch_output_2, batch_output_3] ?
        pass

    def result(self):
        pass

# one single metric handling the 3 outputs?
model.compile(optimizer=tf.compat.v1.train.RMSPropOptimizer(0.01),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[MyCustomMetric()])
With your given model definition, this is a standard multi-output Model.
model = tf.keras.Model(inputs=[input], outputs=[output_1, output_2, output_3])
In general, all (custom) metrics as well as (custom) losses will be called on every output separately (as y_pred)! Within the loss/metric function you will only see one output together with the one corresponding target tensor.
By passing a list of loss functions (length == number of outputs of your model) you can specify which loss will be used for which output:
model.compile(optimizer=Adam(), loss=[loss_for_output_1, loss_for_output_2, loss_for_output_3], loss_weights=[1, 4, 8])
The total loss (which is the objective function to minimize) will be the additive combination of all losses multiplied with the given loss weights.
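With the loss_weights above, the minimized objective is: total_loss = 1 * loss_for_output_1 + 4 * loss_for_output_2 + 8 * loss_for_output_3.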
It is almost the same for the metrics! Here you can pass (as for the loss) a list (length == number of outputs) of metrics and tell Keras which metric to use for which of your model outputs.
model.compile(optimizer=Adam(), loss='mse', metrics=[metrics_for_output_1, metrics_for_output_2, metrics_for_output_3])
Here metrics_for_output_X can be either a function or a list of functions, which will all be called with the one corresponding output_X as y_pred.
This is explained in detail in the documentation of multi-output models in Keras. They also show examples for using dictionaries (to map loss/metric functions to a specific output) instead of lists.
https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models
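For instance, if the three output layers were given explicit names (hypothetical here, since the question's outputs are unnamed), the dictionary form could look like this:

model.compile(optimizer=Adam(),
              loss={'out_1': 'mse', 'out_2': 'mse', 'out_3': 'mse'},
              metrics={'out_1': ['mae'], 'out_2': ['mae'], 'out_3': ['mae']})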
Further information:
If I understand you correctly, you want to train your model using a loss function that compares the three model outputs with three ground-truth values, and you want to do some sort of performance evaluation by comparing a value derived from the three model outputs with a single ground-truth value.
Usually a model is trained on the same objective it is evaluated on; otherwise you might get poorer results when evaluating it!
Anyway... to evaluate your model on a single label, I suggest you either:
1. (The clean solution)
Rewrite your model and incorporate the post-processing steps. Add all the necessary operations (as layers) and map them to an auxiliary output. For training your model, you can set the loss_weight of the auxiliary output to zero. Merge your datasets so you can feed your model the model input, the intermediate target outputs, as well as the labels. As explained above, you can then define a metric comparing the auxiliary model output with the given target labels, as in the sketch below.
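A minimal sketch of option 1, assuming post_process is your (hypothetical) merging function operating on the list of the three output tensors, and reusing the names from the question:

from tensorflow import keras
from tensorflow.keras import layers

# output1, output2, output3 are the model's existing output tensors.
aux_out = layers.Lambda(post_process, name="aux_out")([output1, output2, output3])

model = keras.Model(inputs=[input], outputs=[output1, output2, output3, aux_out])
model.compile(optimizer="adam",
              loss=["categorical_crossentropy"] * 3 + ["mse"],  # dummy loss for aux_out
              loss_weights=[1.0, 1.0, 1.0, 0.0],                # aux_out does not affect training
              metrics={"aux_out": [MyCustomMetric()]})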
2.
Or you train your model as-is and derive the metric, e.g. in a custom callback, by calculating your post-processing steps on the three outputs of model.predict(input).
This will make it necessary to write custom summaries if you want to track those values in your TensorBoard! That's why I would not recommend this solution.
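For completeness, a sketch of option 2 as a custom callback, where post_process, val_inputs, and val_labels are hypothetical placeholders for your merging step and validation data:

import numpy as np
import tensorflow as tf

class PostProcessEval(tf.keras.callbacks.Callback):
    def __init__(self, val_inputs, val_labels, post_process):
        super().__init__()
        self.val_inputs = val_inputs
        self.val_labels = val_labels
        self.post_process = post_process

    def on_epoch_end(self, epoch, logs=None):
        # Derive the single prediction from the three raw model outputs.
        out1, out2, out3 = self.model.predict(self.val_inputs)
        derived = self.post_process(out1, out2, out3)
        acc = np.mean(derived == self.val_labels)
        print(f"epoch {epoch}: post-processed validation accuracy = {acc:.4f}")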
I have a CNN that I built using TensorFlow 2.0. I need to access the outputs of the intermediate layers. I was going over other Stack Overflow questions that were similar, but they all had solutions involving the Keras Sequential model.
I have tried using model.layers[index].output but I get
Layer conv2d has no inbound nodes.
I can post my code here (which is super long), but I am sure that even without it someone can point me to how it can be done using just TensorFlow 2.0 in eager mode.
I stumbled onto this question while looking for an answer, and it took me some time to figure out, as I use the model subclassing API in TF 2.0 by default (as in here: https://www.tensorflow.org/tutorials/quickstart/advanced).
If somebody is in a similar situation, all you need to do is assign the intermediate output you want as an attribute of the class. Then keep the test_step without the @tf.function decorator and create a decorated copy of it, say val_step, for efficient internal computation of validation performance during training. As a short example, I have modified a few functions of the tutorial from the link accordingly. I'm assuming we need to access the output after flattening.
def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    self.intermediate = x  # assign it as an object attribute for accessing later
    x = self.d1(x)
    return self.d2(x)

# Remove the @tf.function decorator from test_step for prediction
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)
    return

# Create a decorated val_step for the object's internal use during training
@tf.function
def val_step(images, labels):
    return test_step(images, labels)
Now when you run model.predict() after training, using the un-decorated test_step, you can access the intermediate output with model.intermediate, which will be an EagerTensor whose value is obtained simply by model.intermediate.numpy(). However, if you don't remove the @tf.function decorator from test_step, it would return a Tensor whose value is not so straightforward to obtain.
Thanks for answering my earlier question. I wrote this simple example to illustrate how what you're trying to do might be done in TensorFlow 2.x, using the MNIST dataset as the example problem.
The gist of the approach:
Build an auxiliary model (aux_model in the example below), which is a so-called "functional model" with multiple outputs. The first output is the output of the original model and will be used for loss calculation and backprop, while the remaining output(s) are the intermediate-layer outputs that you want to access.
Use tf.GradientTape() to write a custom training loop and expose the detailed gradient values on each individual variable of the model. Then you can pick out the gradients that are of interest to you. This requires that you know the ordering of the model's variables. But that should be relatively easy for a sequential model.
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# Scale to [0, 1] and add the channel dimension the model expects.
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# This is the original model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28, 1]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

# Make an auxiliary model that exposes the output from the intermediate layer
# of interest, which is the first Dense layer in this case.
aux_model = tf.keras.Model(inputs=model.inputs,
                           outputs=model.outputs + [model.layers[1].output])

# Define a custom training loop using `tf.GradientTape()`, to make it easier
# to access gradients on specific variables (the kernel and bias of the first
# Dense layer in this case).
cce = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.optimizers.Adam()

with tf.GradientTape() as tape:
    # Do a forward pass on the model, retrieving the intermediate layer's output.
    y_pred, intermediate_output = aux_model(x_train)
    print(intermediate_output)  # Now you can access the intermediate layer's output.

    # Compute loss, to enable backprop.
    loss = cce(tf.one_hot(y_train, 10), y_pred)

# Do backprop. `gradients` here are for all variables of the model.
# But we know we want the gradients on the kernel and bias of the first
# Dense layer, which happen to be the first two variables of the model.
gradients = tape.gradient(loss, aux_model.variables)

# This is the gradient on the first Dense layer's kernel.
intermediate_layer_kernel_gradients = gradients[0]
print(intermediate_layer_kernel_gradients)

# This is the gradient on the first Dense layer's bias.
intermediate_layer_bias_gradients = gradients[1]
print(intermediate_layer_bias_gradients)

# Update the variables of the model.
optimizer.apply_gradients(zip(gradients, aux_model.variables))
The most straightforward solution would go like this:

mid_layer = model.get_layer("layer_name")

You can then wrap that layer's output in a small model and treat it like one, for instance:

mid_layer_model = tf.keras.Model(inputs=model.inputs, outputs=mid_layer.output)
mid_layer_model.predict(X)
Oh, also, to get the name of a hidden layer, you can use this:
model.summary()
This will give you some insight into the layer inputs/outputs as well.
I have a "How can I do that" question with keras :
Assume I have a first neural network, say NNa, which has 4 inputs (x, y, z, t) and is already trained.
I have a second neural network, say NNb, whose loss function depends on the first neural network.
The custom loss function of NNb, customLossNNb, calls the prediction of NNa with a fixed grid (x, y, z) and just modifies the last variable t.
Here, in pseudo-Python code, is what I would like to do to train the second NN, NNb:
grid = np.mgrid[0:10:1, 0:10:1, 0:10:1].reshape(3, -1).T
Y[:, 0] = time
Y[:, 1] = something

def customLossNNb(NNa, grid):
    def diff(y_true, y_pred):
        for ii in range(y_true.shape[0]):
            currentInput = concatenation of grid and y_true[ii,0]  # pseudocode
            toto[ii, :] = NNa.predict(currentInput)
            # some stuff with toto
        return  # ...
    return diff
Then
NNb.compile(loss=customLossNNb(NNa, K.variable(grid)), optimizer='Adam')
NNb.fit(input, Y)
In fact, the line that causes me trouble is currentInput = concatenation of grid and y_true[ii,0].
I tried to pass the grid to customLossNNb as a tensor with K.variable(grid). But I can't define a new tensor inside the loss function, something like currentY with shape (grid.shape[0], 1), filled with y[ii,0] (i.e. the current t), and then concatenate grid and currentY to build currentInput.
Any ideas?
Thanks
You can include your custom loss function in the graph using the functional API of Keras. In that case the model can be used as a function, something like this:
for l in NNa.layers:
    l.trainable = False

x = Input(size)
y = NNb(x)
z = NNa(y)
The predict method will not work here, since the loss function must be part of the graph, while predict returns a np.array.
First, make NNa untrainable. Notice that you should do this recursively if your model has inner models.
def makeUntrainable(layer):
    layer.trainable = False

    if hasattr(layer, 'layers'):
        for l in layer.layers:
            makeUntrainable(l)

makeUntrainable(NNa)
Then you have two options:
Attach NNa to the end of your model (notice that both y_true and y_pred will be changed)
Then change your targets (predict with NNa) for correct results since your model is now expecting the output of NNa, not NNb.
Create a custom loss function that uses NNa inside it, without changing your targets
Option 1 - Attaching models
inputs = NNb.inputs
outputs = NNa(NNb.outputs)  # make sure NNb outputs 4 tensors to match NNa's inputs
fullModel = Model(inputs, outputs)

# changing the targets:
newY_train = NNa.predict(oldY_train)
Option 2 - Creating a custom loss
Warning: please test whether NNa's weights are really frozen while training this configuration.
from keras.losses import binary_crossentropy

def customLoss(true, pred):
    true = NNa(true)
    pred = NNa(pred)

    # use one of the usual losses or create your own
    return binary_crossentropy(true, pred)
NNb.compile(optimizer=anything, loss=customLoss)