I'm trying to train a neural network in Keras, but I'm getting an error that there are no gradients for any variable, which may imply that the graph is disconnected.
I'm copying here a stripped-down version of the code, with only the bits related to the model definition.
The model accepts two inputs that will be fed, one at a time, to the same shared model: the encoder.
The two outputs of the encoder are then concatenated and sent to a dense layer to compute the final output.
I don't get what's wrong; it looks like when instantiating the encoder I'm creating additional trainable variables that are not used anywhere.
For the network layout I took inspiration from the official Keras docs:
https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers
def _get_encoder(self, model_input_shape):
    encoder_input = Input(shape=model_input_shape)
    x = encoder_input
    x = Conv2D(32, (3, 3), strides=1, padding="same")(x)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(alpha=0.1)(x)
    latent_z = Flatten()(x)
    latent_z = Dense(self.latent_dim)(latent_z)
    encoder = Model(
        encoder_input,
        latent_z,
        name='encoder'
    )
    return encoder
def build_model(self):
    model_input_shape = (self.height, self.width, self.depth)
    model_input_1 = Input(shape=model_input_shape)
    model_input_2 = Input(shape=model_input_shape)
    self.encoder = self._get_encoder(model_input_shape)
    z_1 = self.encoder(model_input_1)
    z_2 = self.encoder(model_input_2)
    x = concatenate([z_1, z_2])
    prediction = Dense(1, activation='sigmoid')(x)
    self.network = Model(
        inputs=[model_input_1, model_input_2],
        outputs=[prediction],
        name='network'
    )

network.network.compile(
    optimizer='rmsprop',
    loss='mse',
    metrics=['mae'])

H = network.network.fit(
    x=train_gen,
    validation_data=test_gen,
    epochs=EPOCHS,
    steps_per_epoch=STEPS,
    validation_steps=STEPS)
I found the problem. My custom data generator was returning a list [x, y] instead of a tuple (x, y), where x is the input and y the target. A simple mistake that was causing seemingly unrelated errors.
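For reference, a minimal sketch of the fix (the actual generator wasn't posted, so the names and shapes here are hypothetical):

import numpy as np

def pair_generator(batch_size=8, shape=(64, 64, 3)):
    """Hypothetical two-input generator. Keras expects each batch
    as an (inputs, targets) tuple, not a list."""
    while True:
        x1 = np.random.random((batch_size,) + shape)
        x2 = np.random.random((batch_size,) + shape)
        y = np.random.randint(0, 2, size=(batch_size, 1))
        # yield [[x1, x2], y]   # wrong: a list is misinterpreted by fit()
        yield [x1, x2], y       # correct: a tuple of (inputs, targets)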
I have been playing around with Variational Autoencoders for some days. I am trying to fit a small toy function with a small model.
I first implemented the model using the Keras Functional API, with the following code:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

tfd = tfp.distributions
tfpl = tfp.layers

def define_tfp_encoder(latent_dim, n_inputs=2, kl_weight=1):
    prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_dim))
    input_x = Input((n_inputs,))
    input_c = Input((1,))  # currently unused
    dense = Dense(25, activation='relu', name='tfpenc/dense_1')(input_x)
    dense = Dense(32, activation='relu', name='tfpenc/dense_2')(dense)
    dense_z_params = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim), name='tfpenc/z_params')(dense)
    dense_z = tfpl.MultivariateNormalTriL(latent_dim, name='tfpenc/z')(dense_z_params)
    # activity_regularizer=tfpl.KLDivergenceRegularizer(prior)  # weight=kl_weight
    kld = tfpl.KLDivergenceAddLoss(prior, name='tfpenc/kld_add')(dense_z)
    model = Model(inputs=input_x, outputs=kld)
    return model
def define_tfp_decoder(latent_dim, n_inputs=2):
    input_c = Input((1,), name='tfpdec/cond_input')  # currently unused
    input_n = Input((latent_dim,))
    dense = Dense(15, activation='relu', name='tfpdec/dense_1')(input_n)
    dense = Dense(32, activation='relu', name='tfpdec/dense_2')(dense)
    dense = Dense(tfpl.IndependentNormal.params_size(n_inputs), name='tfpdec/output')(dense)
    output = tfpl.IndependentNormal((n_inputs,))(dense)
    model = Model(input_n, output)
    return model
def get_custom_unconditional_vae():
    latent_size = 5
    encoder = define_tfp_encoder(latent_dim=latent_size)
    decoder = define_tfp_decoder(latent_dim=latent_size)
    encoder.trainable = True
    decoder.trainable = True
    x = encoder.input
    z = encoder.output
    out = decoder(z)
    vae = Model(inputs=x, outputs=out)
    vae.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')
    return encoder, decoder, vae
The VAE model was then fitted and trained for 3000 epochs.
However, it only produced garbage for the very simple quadratic function it was supposed to fit.
Now here is the strange part:
When creating the exact same model using the Sequential API, it works as expected and the desired function gets approximated nicely:
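(The Sequential version was not included in the original post; below is a minimal sketch of what it presumably looked like, reusing the imports above, with layer sizes mirroring the functional encoder and decoder.)

latent_size = 5
prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_size))

encoder_seq = tf.keras.Sequential([
    Dense(25, activation='relu', input_shape=(2,)),
    Dense(32, activation='relu'),
    Dense(tfpl.MultivariateNormalTriL.params_size(latent_size)),
    tfpl.MultivariateNormalTriL(latent_size),
    tfpl.KLDivergenceAddLoss(prior),
])
decoder_seq = tf.keras.Sequential([
    Dense(15, activation='relu', input_shape=(latent_size,)),
    Dense(32, activation='relu'),
    Dense(tfpl.IndependentNormal.params_size(2)),
    tfpl.IndependentNormal((2,)),
])
vae_seq = tf.keras.Sequential([encoder_seq, decoder_seq])
vae_seq.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')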
And it becomes even stranger for me:
After running tf.random.set_seed(None), the model created using the Functional API also works as expected. What am I missing or not understanding correctly so far? I assume that there are some differences regarding tf.random.set_seed when using the Sequential vs. the Functional API, but... ?
Thanks in advance,
codax
EDIT: I forgot to mention that setting a seed (e.g. tf.random.set_seed(123)) leads to identical results for both models, neither fitting the desired function.
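For what it's worth, a minimal sketch of how to make the comparison reproducible (this is an assumption about the intended experiment, not code from the post):

import tensorflow as tf

# Reset the global seed immediately before building each model so that
# both variants start from identical weight initializations.
tf.random.set_seed(123)
encoder, decoder, vae = get_custom_unconditional_vae()

tf.random.set_seed(123)
# ... build the Sequential version here for an apples-to-apples comparison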
I want to build a network that should be able to verify images (e.g. human faces). As I understand it, the best solution for that is a Siamese network with a triplet loss. I didn't find any ready-made implementations, so I decided to create my own.
But I have a question about Keras. For example, here's the structure of the network:
And the code is something like this:
embedding = Sequential([
    Flatten(),
    Dense(1024, activation='relu'),
    Dense(64),
    Lambda(lambda x: K.l2_normalize(x, axis=-1))
])

input_a = Input(shape=shape, name='anchor')
input_p = Input(shape=shape, name='positive')
input_n = Input(shape=shape, name='negative')

emb_a = embedding(input_a)
emb_p = embedding(input_p)
emb_n = embedding(input_n)

out = Concatenate()([emb_a, emb_p, emb_n])

model = Model([input_a, input_p, input_n], out)
model.compile(optimizer='adam', loss=<triplet_loss>)
I defined only one embedding model. Does this mean that once the model starts training, the weights will be the same for each input?
If so, how can I extract the embedding weights from the model?
Yes, in the triplet loss setup the weights should be shared across all three networks, i.e. anchor, positive and negative.
In TensorFlow 1.x you could achieve weight sharing by passing reuse=True in tf.layers.
In TensorFlow 2.x, however, tf.layers has been moved to tf.keras.layers and the reuse functionality has been removed.
To achieve weight sharing you can write a custom layer that takes the parent layer and reuses its weights.
Below is a sample example that does this.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Activation

class SharedConv(tf.keras.layers.Layer):
    def __init__(
            self,
            filters,
            kernel_size,
            strides=1,
            padding='same',
            dilation_rates=(1, 1),
            activation=None,
            use_bias=True,
            **kwargs
    ):
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.dilation_rates = dilation_rates
        self.activation = activation
        self.use_bias = use_bias
        super().__init__(**kwargs)

    def build(self, input_shape):
        # The parent convolution whose kernel (and bias) will be reused.
        self.conv = Conv2D(
            self.filters,
            self.kernel_size,
            strides=self.strides,
            padding=self.padding,
            dilation_rate=self.dilation_rates[0],
            use_bias=self.use_bias
        )
        self.act1 = Activation(self.activation)
        self.act2 = Activation(self.activation)

    def call(self, inputs, **kwargs):
        # First path: the layer's own convolution.
        x1 = self.conv(inputs)
        x1 = self.act1(x1)
        # Second path: a raw conv op that reuses the same kernel weights.
        x2 = tf.nn.conv2d(
            inputs,
            self.conv.weights[0],
            padding=self.padding.upper(),  # tf.nn.conv2d expects 'SAME'/'VALID'
            strides=self.strides,
            dilations=self.dilation_rates[1]
        )
        if self.use_bias:
            x2 = x2 + self.conv.weights[1]
        x2 = self.act2(x2)
        return x1, x2
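A quick smoke test of the layer (my own usage sketch, not part of the original answer; with equal strides and dilations on both paths the two outputs coincide):

import numpy as np

layer = SharedConv(filters=8, kernel_size=3, activation='relu')
x = np.random.random((1, 32, 32, 3)).astype('float32')
y1, y2 = layer(x)
print(y1.shape, y2.shape)  # (1, 32, 32, 8) (1, 32, 32, 8)
# Both paths used the same kernel and bias, so y1 == y2 here.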
I will answer how to extract the embeddings (adapted from my GitHub post).
My trained siamese model looked like this:
siamese_model.summary()
Note that my newly redefined model is basically the same as the one highlighted in yellow.
I then redefined the model I wanted to use for extracting embeddings (it should be the same model you defined, except it will not have the multiple inputs of the siamese network), which looked like this:
siamese_embeddings_model = build_siamese_model(input_shape)
siamese_embeddings_model.summary()
Then I just extracted the weights from my trained siamese model and set them on my new model:
embeddings_weights = siamese_model.layers[-3].get_weights()
siamese_embeddings_model.set_weights(embeddings_weights)
Then you can feed a new image to the new model to extract the embeddings:
vector = siamese_embeddings_model.predict(image)
len(vector[0]) will print 150 because of my final Dense layer (which is the output vector).
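Putting those steps together as one runnable sketch (same assumptions as above: build_siamese_model is your single-branch model, and layers[-3] happens to be where the shared embedding submodel sits in my architecture, so adjust the index for yours):

# Rebuild the single-branch embedding network and copy the trained weights.
siamese_embeddings_model = build_siamese_model(input_shape)
embeddings_weights = siamese_model.layers[-3].get_weights()
siamese_embeddings_model.set_weights(embeddings_weights)

# Embed a single image (note the added batch dimension).
vector = siamese_embeddings_model.predict(image[None, ...])
print(len(vector[0]))  # 150 in my case, the size of the final Dense layer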
I want to use a classification model inside another model as a layer, since I thought that Keras models can also be used as layers. This is the code of the first model:
cencoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
ccondi_input = keras.layers.Input(shape=[1], dtype=np.int32)
ccondi_layer = tf.keras.layers.concatenate([cencoder_inputs, ccondi_input], axis=1)
cembeddings = keras.layers.Embedding(vocab_size, 4)
cencoder_embeddings = cembeddings(ccondi_layer)
clstm = keras.layers.LSTM(128)(cencoder_embeddings)
cout_layer = keras.layers.Dense(16, activation="softmax")(clstm)
classification_model = keras.Model(inputs=[cencoder_inputs, ccondi_input], outputs=[cout_layer])
classification_model.compile(optimizer="Nadam", loss="sparse_categorical_crossentropy", metrics=["accuracy"], experimental_run_tf_function=False)
I train this model, save it, and reload it as class_model, setting trainable=False.
This is the code of my model, which should use the model above as a layer:
encoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
decoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
condi_input = keras.layers.Input(shape=[1], dtype=np.int32)
class_layer = class_model((encoder_inputs, condi_input))
# That's how I use the class model. Compilation goes fine so far
class_pred_layer = keras.layers.Lambda(lambda x: tf.reshape(tf.cast(tf.keras.backend.argmax(x, axis=1), dtype=tf.int32),shape=(tf.shape(encoder_inputs)[0],1)))(class_layer)
# Lambda and reshape layer, so I get one integer prediction per batch element
condi_layer = tf.keras.layers.concatenate([encoder_inputs, condi_input, class_pred_layer], axis=1)
embeddings = keras.layers.Embedding(vocab_size, 2)
encoder_embeddings = embeddings(condi_layer)
decoder_embeddings = embeddings(decoder_inputs)
encoder_1 = keras.layers.LSTM(64, return_sequences=True, return_state=True)
encoder_lstm_bidirectional_1 = keras.layers.Bidirectional(encoder_1)
encoder_output, state_h1, state_c1, state_h2, state_c2 = encoder_lstm_bidirectional_1(encoder_embeddings)
encoder_state = [Concatenate()([state_h1, state_h2]), Concatenate()([state_c1, state_c2])]
decoder_lstm = keras.layers.LSTM(64*2, return_sequences=True, return_state=True, name="decoder_lstm")
print(encoder_output.shape)
decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_embeddings,initial_state=encoder_state)
print(decoder_outputs.shape)
attn_layer = AttentionLayer(name="attention_layer")
attn_out, attn_states = attn_layer([encoder_output, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1, name="decoder_concat_layer")([decoder_outputs, attn_out])
decoder_dense_out = keras.layers.TimeDistributed(keras.layers.Dense(vocab_size, activation="softmax"))
decoder_outputs = decoder_dense_out(decoder_concat_input)
model = keras.Model(inputs=[encoder_inputs, decoder_inputs, condi_input], outputs=[decoder_outputs])
When I execute model.fit(), I receive the following error:
Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'input_21:0' shape=(None, 35) dtype=int32>]
I thought trained models could easily be used as layers; what am I doing wrong?
I also already looked into this post, but it didn't help me either.
Thanks for your help!
OK, I will do two things: (1) give you a working example where I had to call a model inside another model, and (2) try to give you a hint on what your problem could be here (I can't really understand the code, but I had the same error in the past).
1.
This is an example of a model that uses another model as a hidden layer:
def model_test(input_shape, sub_model):
    inputs = Input(input_shape)
    eblock_1_1 = dense_convolve(inputs, n_filters=growth_rate)
    eblock_1_2 = dense_convolve(eblock_1_1, n_filters=growth_rate)
    dblock_1_1 = dense_convolve(eblock_1_2, n_filters=growth_rate)
    dblock_1_2 = dense_convolve(dblock_1_1, n_filters=growth_rate)
    final_convolution = Conv3D(2, (1, 1, 1), padding='same', activation='relu')(dblock_1_2)
    intermedio = sub_model(final_convolution)
    layer = LeakyReLU(alpha=0.3)(intermedio)
    model = Model(inputs=inputs, outputs=layer)
    return model
I call it like this:
with strategy.scope():
    sub_model = tf.keras.models.load_model('link_to_the_model')
    sub_model.trainable = False
    model = model_test(INPUT_SIZE, sub_model)
    model.compile(optimizer=Adam(lr=0.1),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=None)
I just tested this on Google Colab with Keras.
2.
I had the same error some time ago when I tried to call a function with eager execution inside a model. The problem is that training is executed in graph mode (you can find some info about this online, e.g. https://towardsdatascience.com/eager-execution-vs-graph-execution-which-is-better-38162ea4dbf6).
If the problem is the call to the model, maybe try what I did: pass the model in as a parameter, call it inside with a layer as its argument, and use it as a simple layer.
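One extra debugging tip from me (not part of the original answer): you can force fit() to run eagerly to get clearer stack traces when hunting this kind of error.

import tensorflow as tf

# Run everything eagerly instead of compiling to a graph. This is much
# slower, so enable it only while debugging.
tf.config.run_functions_eagerly(True)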
I am trying to replicate (a much smaller version of) the AlphaGo Zero system. However, in the network model I am having a problem. The loss function I am supposed to implement is the following:

l = (z - v)^2 - pi^T * log(p) + c * ||theta||^2

Where:
z is the label (a real value between -1 and 1) for one of the two heads of the network, and v is that value as predicted by the network.
pi is the target probability distribution over all actions, and p is the probability distribution over all actions predicted by the network.
c is the L2 regularization parameter.
I pass to the network a list of channels (representing the game state) and an array (the same size as pi and p) representing which actions are valid (1 if valid, 0 otherwise).
As you can see, the loss function uses both the targets and the network predictions. But after extensive searching, I found that when implementing a custom loss function I can only pass y_true and y_pred as parameters, even though I have two "y_true"s and two "y_pred"s. I have tried using indexing to get those values, but I'm pretty sure it is not working.
The network model and the custom loss function are in the code below:
def custom_loss(y_true, y_pred):
    # I am pretty sure this does not work
    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_true[1]

    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)

    return mse_loss - cross_entropy_loss
def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape=(5, 5, 6), name='States_Channels_Input')
    valid_actions_dist = Input(shape=(32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    # Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    # Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])
    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])
    return model
# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...
x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]
model.fit(x_train, y_train, epochs=10)
In short, my question is: how do I correctly implement this custom loss function that uses both the network outputs and the label values?
I checked their Git repository and there is a lot going on. As shown in the equation, the final loss is the combination of three different losses, and the network minimizes this combined loss. Their loss code (TensorFlow 1.x) is below:
# train ops
policy_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))

value_cost = params['value_cost_weight'] * tf.reduce_mean(
    tf.square(value_output - labels['value_tensor']))

reg_vars = [v for v in tf.trainable_variables()
            if 'bias' not in v.name and 'beta' not in v.name]
l2_cost = params['l2_strength'] * \
    tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])

combined_cost = policy_cost + value_cost + l2_cost
You can refer to this and make your changes accordingly.
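In Keras specifically, one way to get the same effect without indexing into y_true/y_pred is to give each head its own loss and let compile() sum them. A sketch, assuming the output layer names from the model above (the L2 term is already handled by the kernel_regularizer arguments):

import tensorflow.keras.backend as K

def prob_dist_loss(y_true, y_pred):
    # Cross-entropy between the target distribution pi and the prediction p.
    return -K.sum(y_true * K.log(y_pred + K.epsilon()), axis=-1)

model.compile(
    optimizer='adam',
    loss={'Output_Dist': prob_dist_loss, 'Output_Value': 'mse'},
    # Optional weights, in case the two terms need rebalancing.
    loss_weights={'Output_Dist': 1.0, 'Output_Value': 1.0},
)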
Well, I have a problem setting up a network consisting of a CNN + autoencoder for a classification task. The main idea is to use the CNN-generated embedding as the input of an autoencoder for the embedding reconstruction process. I was able to define both architectures, but I couldn't merge them into a single graph.
def autoencoder(cnn_out):
    xreal = keras.layers.Input(tensor=cnn_out)
    (...)
    xhat = keras.layers.Dense(cnn_out.shape[1], activation='sigmoid')(dec)
    ae = keras.models.Model(inputs=xreal, outputs=xhat)
    loss_mse = mse_loss(xreal, xhat)
    ae.add_loss(loss_mse)
    return ae
def cnnae_model(input_shape):
    h1 = keras.layers.Conv2D(8, strides=(1, 1), kernel_size=kernel, kernel_regularizer=r.l2(kl), padding='same')(X)
    (...)
    h5 = keras.layers.AveragePooling2D(pool_size=(2, 2))(h5)
    xreal = keras.layers.Flatten()(h5)
    cnn = keras.models.Model(inputs=X, outputs=xreal)
    cnn_ae = keras.models.Model(inputs=cnn.input, outputs=autoencoder(cnn.output).output)
    return cnn_ae

input_shape = (128, 64, 3)
model = cnnae_model(input_shape)
model.compile(loss=contrastive_loss, batch_size=16, optimizer=rms, metrics=[accuracy], callbacks=[reduce_lr])
The following error message appears when I try to compile the model:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("flatten_11/Identity:0", shape=(None, 2048), dtype=float32) at layer "input_50". The following previous layers were accessed without issue: []
I made some modifications to your code and produced a working version (one without the error you reported). There are a few changes that have to do with how the output layers are called when connecting up the different submodels, but hopefully you can relate it back to your original model. There is some additional information here that might help clarify: https://www.tensorflow.org/guide/keras/functional#using_the_same_graph_of_layers_to_define_multiple_models. I hope this helps.
import tensorflow as tf
import numpy as np

print(tf.__version__)
tf.keras.backend.clear_session()

# Code with issue:
def autoencoder(cnn_out):
    xreal = cnn_out  # tf.keras.layers.Input(tensor=cnn_out)
    dec = xreal
    xhat = tf.keras.layers.Dense(cnn_out.shape[1], activation='sigmoid', name='AE_Dense')(dec)
    # ae = tf.keras.models.Model(inputs=xreal, outputs=xhat, name='AE_Model')
    # loss_mse = mse_loss(xreal, xhat)
    # ae.add_loss(loss_mse)
    return xhat  # return last layer of model

def cnnae_model(input_shape):
    # CNN model start:
    X = tf.keras.layers.Input(input_shape, name='CNN_Input')
    h1 = tf.keras.layers.Conv2D(8, kernel_size=(2, 2), padding='same', name='CNN_Conv2D')(X)
    h5 = h1
    h5 = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), name='CNN_AvgPooling2D')(h5)
    xreal = tf.keras.layers.Flatten(name='CNN_myFlatten')(h5)
    cnn = tf.keras.models.Model(inputs=X, outputs=xreal, name='CNN_Model')
    # CNN model end.
    ae_output = autoencoder(xreal)
    cnn_ae = tf.keras.models.Model(inputs=cnn.input, outputs=ae_output, name='cnn_ae_model')
    return cnn_ae

input_shape = (128, 64, 3)
model = cnnae_model(input_shape)

print('model.summary():')
print(model.summary())

model.compile(optimizer='rmsprop', loss='mse')

x_train = np.random.random((2, 128, 64, 3))
y_train = np.random.random((2, 16384))

print('x_train.shape:')
print(x_train.shape)
print('y_train.shape:')
print(y_train.shape)

model.fit(x_train, y_train, epochs=1)
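If you still want the MSE reconstruction penalty that the commented-out ae.add_loss() was providing, one option is to attach it to the composed model instead of the nested one. This is my own suggestion, not part of the answer above, and it assumes a TF 2.x functional model:

def cnnae_model_with_recon_loss(input_shape):
    X = tf.keras.layers.Input(input_shape, name='CNN_Input')
    h = tf.keras.layers.Conv2D(8, kernel_size=(2, 2), padding='same')(X)
    h = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(h)
    xreal = tf.keras.layers.Flatten()(h)
    xhat = tf.keras.layers.Dense(xreal.shape[1], activation='sigmoid')(xreal)
    model = tf.keras.models.Model(inputs=X, outputs=xhat)
    # Reconstruction penalty between the CNN embedding and its decoding,
    # added on the single composed graph so nothing is disconnected.
    model.add_loss(tf.reduce_mean(tf.square(xreal - xhat)))
    return model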