Hi everyone,
Hope you're all doing great.
I am currently struggling with a problem that seems trivial at first glance; however, I was unable to find a solution no matter how much research I did.
I am currently working on implementing a new method (layerwise relevance propagation) on top of an existing pipeline built around an autoencoder architecture, defined by the following class:
class Autoencoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.encoder = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(500, activation='relu', input_shape=(input_size,)),
                tf.keras.layers.Dense(200, activation='relu'),
                tf.keras.layers.Dense(50, activation='relu'),
            ]
        )
        self.decoder = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(200, activation='relu', input_shape=(50,)),
                tf.keras.layers.Dense(500, activation='relu'),
                tf.keras.layers.Dense(input_size)
            ]
        )

    def call(self, inputs):
        return self.decoder(self.encoder(inputs))
The autoencoder is implemented as a sequence of two sequential models: an encoder followed by a decoder.
For this reason, running autoencoder.layers yields:
[<keras.engine.sequential.Sequential at 0x7f190014f080>,
<keras.engine.sequential.Sequential at 0x7f190012a400>]
I am currently working on a neural network method that requires the input model to be a flat sequence of layers; in other words, the output of autoencoder.layers should be the following:
[<keras.layers.core.Dense at 0x7f191a2aa438>,
<keras.layers.core.Dense at 0x7f191a2aa748>,
<keras.layers.core.Dense at 0x7f191a2aad30>,
<keras.layers.core.Dense at 0x7f1900123cf8>,
<keras.layers.core.Dense at 0x7f190012ab00>,
<keras.layers.core.Dense at 0x7f190012ae10>]
To that end, I have tried to define a new model as follows:
model = Sequential([autoencoder.encoder.layers[0],
autoencoder.encoder.layers[1],
autoencoder.encoder.layers[2],
autoencoder.decoder.layers[0],
autoencoder.decoder.layers[1],
autoencoder.decoder.layers[2]])
Unfortunately, using this method, I get the following error when building a relevance propagation graph (which is basically a backward propagation graph):
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 1114), dtype=tf.float32, name='dense_input'), name='dense_input', description="created by layer 'dense_input'") at layer "dense". The following previous layers were accessed without issue: []
In case you want to look deeper into the code responsible for the error:
class LayerwiseRelevancePropagation:
    def __init__(self, model, alpha=2, epsilon=1e-7):
        # Initialization
        self.model = model
        self.alpha = alpha
        self.beta = 1 - alpha
        self.epsilon = epsilon
        # Retrieve network parameters
        self.names, self.activations, self.weights = get_model_params(self.model)
        self.num_layers = len(self.names)
        # Build relevance propagation graph
        self.relevance = self.relevance_propagation()
        print(self.model.input)
        self.lrp_runner = K.function(inputs=self.model.input,
                                     outputs=self.relevance)

    def relevance_propagation(self):
        """Builds graph for relevance propagation."""
        # Forward pass
        r = self.model.output
        # Relevance propagation
        for i in range(self.num_layers - 2, -2, -1):
            if i == -1:
                r = self.backprop_fc(self.weights[i + 1][0], self.weights[i + 1][1], tf.ones_like(self.model.input), r)
            elif 'dense' in self.names[i + 1]:
                r = self.backprop_fc(self.weights[i + 1][0], self.weights[i + 1][1], self.activations[i], r)
            else:
                raise Exception("Error: layer type not recognized.")
        return r

    def backprop_fc(self, w, b, a, r):
        # Positive relevance
        w_p = K.maximum(w, 0.)
        b_p = K.maximum(b, 0.)
        z_p = K.dot(a, w_p) + b_p + self.epsilon
        s_p = r / z_p
        c_p = K.dot(s_p, K.transpose(w_p))
        # Negative relevance
        w_n = K.minimum(w, 0.)
        b_n = K.minimum(b, 0.)
        z_n = K.dot(a, w_n) + b_n - self.epsilon
        s_n = r / z_n
        c_n = K.dot(s_n, K.transpose(w_n))
        return a * (self.alpha * c_p + self.beta * c_n)
Any help would be greatly appreciated!
Thank you in advance for your time :)
It seems I managed to come up with a solution. It is not as elegant as I would have hoped, but it does solve the problem:
# Create the model with the same architecture and a "clean" network graph
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(500, activation='relu', input_shape=(input_size,)),
     tf.keras.layers.Dense(200, activation='relu'),
     tf.keras.layers.Dense(50, activation='relu'),
     tf.keras.layers.Dense(200, activation='relu'),
     tf.keras.layers.Dense(500, activation='relu'),
     tf.keras.layers.Dense(input_size)])
# Copy the weights
model.set_weights(autoencoder.get_weights())
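A quick sanity check (a sketch; it assumes input_size is defined and that the Autoencoder class chains encoder and decoder in its call method, as above):
import numpy as np

print(model.layers)  # now lists the six Dense layers individually

x = np.random.rand(1, input_size).astype('float32')
# The rebuilt model should reproduce the original autoencoder exactly
print(np.allclose(autoencoder(x).numpy(), model(x).numpy()))  # expected: True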
Thanks again for your help!
If I understood your question correctly, then this short code snippet should give you an idea of how you can achieve the desired behavior:
import tensorflow as tf
from tensorflow.keras import layers, Model

def model_test():
    encoder = tf.keras.Sequential(
        [
            tf.keras.layers.Dense(500, activation='relu', input_shape=(10,)),
            tf.keras.layers.Dense(200, activation='relu'),
            tf.keras.layers.Dense(50, activation='relu'),
        ]
    )
    decoder = tf.keras.Sequential(
        [
            tf.keras.layers.Dense(200, activation='relu', input_shape=(50,)),
            tf.keras.layers.Dense(500, activation='relu'),
            tf.keras.layers.Dense(10)
        ]
    )
    # Call every layer on a fresh Input so they are rewired into one graph
    inputs = layers.Input(shape=(10,))
    x = inputs
    for layer in encoder.layers:
        x = layer(x)
    for layer in decoder.layers:
        x = layer(x)
    model = Model(inputs=inputs, outputs=x)
    model.summary()
    print(model.layers)
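Because each layer is called on a fresh Input node, model.layers on the resulting model lists the six Dense layers individually, and the backward relevance-propagation graph can be built on it without hitting the "graph disconnected" error.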
Is there any layer in Keras which calculates the derivative w.r.t. the input? For example, if x is the input and the first layer is f(x), then the next layer's output should be f'(x). There are multiple questions here about this topic, but all of them involve computing the derivative outside the model. In essence, I want to create a neural network whose loss function involves both the Jacobian and Hessian w.r.t. the inputs.
I've tried the following:
import numpy as np
import keras
from keras.layers import Dense
import keras.backend as K

def create_model():
    x = keras.Input(shape=(10,))
    layer = Dense(1, activation="sigmoid")
    output = layer(x)
    jac = K.gradients(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model

model = create_model()
X = np.random.uniform(size=(3, 10))
This gives the error tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
So I tried using that:
def create_model2():
    with tf.GradientTape() as tape:
        x = keras.Input(shape=(10,))
        layer = Dense(1, activation="sigmoid")
        output = layer(x)
    jac = tape.gradient(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model

model = create_model2()
X = np.random.uniform(size=(3, 10))
but this tells me 'KerasTensor' object has no attribute '_id'.
Both these methods work fine outside the model. My end goal is to use the Jacobian and Hessian in the loss function, so alternative approaches would also be appreciated.
Not sure what exactly you want to do, but maybe try a custom Keras layer with tf.gradients:
import tensorflow as tf

tf.random.set_seed(111)

class GradientLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(GradientLayer, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function
    def call(self, inputs):
        outputs = self.dense(inputs)
        return tf.gradients(outputs, inputs)

def create_model2():
    gradient_layer = GradientLayer()
    inputs = tf.keras.layers.Input(shape=(10,))
    outputs = gradient_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

model = create_model2()
X = tf.random.uniform((3, 10))
print(model(X))
tf.Tensor(
[[-0.07935508 -0.12471244 -0.0702782 -0.06729251 0.14465885 -0.0818079
-0.08996294 0.07622238 0.11422144 -0.08126545]
[-0.08666676 -0.13620329 -0.07675356 -0.07349276 0.15798753 -0.08934557
-0.09825202 0.08324542 0.12474566 -0.08875315]
[-0.08661086 -0.13611545 -0.07670406 -0.07344536 0.15788564 -0.08928795
-0.09818865 0.08319173 0.12466521 -0.08869591]], shape=(3, 10), dtype=float32)
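If the end goal is a Jacobian (or Hessian) term inside the loss rather than as a model output, another option is to override train_step and nest two GradientTapes. A minimal sketch, assuming a regression setup and a hypothetical penalty weight of 0.01 (a Hessian term would follow the same pattern with one more nested tape):
import tensorflow as tf

class JacobianPenaltyModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as outer_tape:
            with tf.GradientTape() as inner_tape:
                inner_tape.watch(x)  # x is a tensor, not a variable
                y_pred = self(x, training=True)
                base_loss = self.compiled_loss(y, y_pred)
            jac = inner_tape.gradient(y_pred, x)  # d(output)/d(input)
            # Hypothetical penalty term on the input Jacobian
            loss = base_loss + 0.01 * tf.reduce_mean(tf.square(jac))
        grads = outer_tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}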
I want to build a network that should be able to verify images (e.g. human faces). As I understand it, the best solution for that is a Siamese network with a triplet loss. I didn't find any ready-made implementations, so I decided to create my own.
But I have a question about Keras. For example, here's the structure of the network:
And the code is something like this:
embedding = Sequential([
Flatten(),
Dense(1024, activation='relu'),
Dense(64),
Lambda(lambda x: K.l2_normalize(x, axis=-1))
])
input_a = Input(shape=shape, name='anchor')
input_p = Input(shape=shape, name='positive')
input_n = Input(shape=shape, name='negative')
emb_a = embedding(input_a)
emb_p = embedding(input_p)
emb_n = embedding(input_n)
out = Concatenate()([emb_a, emb_p, emb_n])
model = Model([input_a, input_p, input_n], out)
model.compile(optimizer='adam', loss=<triplet_loss>)
I defined only one embedding model. Does this mean that once the model starts training, the weights will be the same for each input?
If so, how can I extract the embedding weights from the model?
Yes, in a triplet loss setup the weights should be shared across all three branches, i.e. anchor, positive, and negative.
In TensorFlow 1.x, you could achieve weight sharing with reuse=True in tf.layers.
But in TensorFlow 2.x, tf.layers has been moved to tf.keras.layers and the reuse functionality has been removed.
To achieve weight sharing you can write a custom layer that takes the parent layer and reuses its weights.
Below is a sample example of doing the same:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Activation

class SharedConv(tf.keras.layers.Layer):
    def __init__(
            self,
            filters,
            kernel_size,
            strides=(1, 1),
            padding='same',
            dilation_rates=(1, 1),
            activation=None,
            use_bias=True,
            **kwargs
    ):
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.dilation_rates = dilation_rates
        self.activation = activation
        self.use_bias = use_bias
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.conv = Conv2D(
            self.filters,
            self.kernel_size,
            strides=self.strides,
            padding=self.padding,
            dilation_rate=self.dilation_rates[0],
            use_bias=self.use_bias
        )
        self.act1 = Activation(self.activation)
        self.act2 = Activation(self.activation)

    def call(self, inputs, **kwargs):
        # First branch: the Conv2D layer itself
        x1 = self.conv(inputs)
        x1 = self.act1(x1)
        # Second branch: reuse the same kernel via tf.nn.conv2d
        x2 = tf.nn.conv2d(
            inputs,
            self.conv.weights[0],
            padding=self.padding.upper(),
            strides=self.strides,
            dilations=self.dilation_rates[1]
        )
        if self.use_bias:
            x2 = x2 + self.conv.weights[1]
        x2 = self.act2(x2)
        return x1, x2
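Note that the question's original code already shares weights, because the same embedding Sequential object is called on all three inputs. For completeness, here is a sketch of a triplet loss that works with the concatenated output; the 0.2 margin and the 64-unit embedding size (taken from the question's Dense(64)) are assumptions:
import tensorflow.keras.backend as K

def triplet_loss(margin=0.2, emb_dim=64):
    def loss(y_true, y_pred):
        # y_pred is the concatenation [anchor | positive | negative];
        # y_true is a dummy label and is unused
        a = y_pred[:, :emb_dim]
        p = y_pred[:, emb_dim:2 * emb_dim]
        n = y_pred[:, 2 * emb_dim:]
        pos_dist = K.sum(K.square(a - p), axis=-1)
        neg_dist = K.sum(K.square(a - n), axis=-1)
        return K.maximum(pos_dist - neg_dist + margin, 0.0)
    return loss

model.compile(optimizer='adam', loss=triplet_loss())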
I will answer how to extract the embeddings (referencing my GitHub post):
My trained siamese model looked like this:
siamese_model.summary()
Note that my newly redefined model is basically the same as the one highlighted in yellow
I then redefined the model which I wanted to use for extracting embeddings (it should be the same model you defined, except now it will not have the multiple inputs of the siamese network), which looked like this:
siamese_embeddings_model = build_siamese_model(input_shape)
siamese_embeddings_model.summary()
Then I just extracted the weights from my trained siamese model and set them into my new model:
embeddings_weights = siamese_model.layers[-3].get_weights()
siamese_embeddings_model.set_weights(embeddings_weights)
Then you can supply a new image to extract the embeddings from the new model:
vector = siamese_embeddings_model.predict(image)
len(vector[0]) will print 150 because of my final dense layer (which is the output vector).
I'm trying to train a neural network in Keras, but I'm getting an error that there are no gradients for any variable, which may imply that the graph is disconnected.
I'm copying here a stripped-down version of the code with only the bits related to the model definition.
The model accepts two inputs that are fed, one at a time, to the same shared model: the encoder.
The two outputs of the encoder are then concatenated and sent to a dense layer to compute the final output.
I don't get what's wrong; it looks like when instantiating the encoder I'm creating additional trainable variables that are not used anywhere.
For the network layout I took inspiration from the official Keras docs:
https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers
def _get_encoder(self, model_input_shape):
    encoder_input = Input(shape=model_input_shape)
    x = encoder_input
    x = Conv2D(32, (3, 3), strides=1, padding="same")(x)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(alpha=0.1)(x)
    latent_z = Flatten()(x)
    latent_z = Dense(self.latent_dim)(latent_z)
    encoder = Model(
        encoder_input,
        latent_z,
        name='encoder'
    )
    return encoder

def build_model(self):
    model_input_shape = (self.height, self.width, self.depth)
    model_input_1 = Input(shape=model_input_shape)
    model_input_2 = Input(shape=model_input_shape)
    self.encoder = self._get_encoder(model_input_shape)
    z_1 = self.encoder(model_input_1)
    z_2 = self.encoder(model_input_2)
    x = concatenate([z_1, z_2])
    prediction = Dense(1, activation='sigmoid')(x)
    self.network = Model(
        inputs=[model_input_1, model_input_2],
        outputs=[prediction],
        name='network'
    )

network.network.compile(
    optimizer='rmsprop',
    loss='mse',
    metrics=['mae'])

H = network.network.fit(
    x=train_gen,
    validation_data=test_gen,
    epochs=EPOCHS,
    steps_per_epoch=STEPS,
    validation_steps=STEPS)
I found the problem: my custom data generator was returning a list [x, y] instead of a tuple (x, y), where x is the input and y the target. A simple mistake that was causing totally unrelated errors.
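For illustration, a minimal sketch of the distinction (shapes are placeholders matching a two-input model): the outermost structure of each generator batch must be a tuple, while the inputs themselves may still be a list.
import numpy as np

def train_gen():
    while True:
        x1 = np.random.rand(8, 64, 64, 3)         # first input
        x2 = np.random.rand(8, 64, 64, 3)         # second input
        y = np.random.randint(0, 2, size=(8, 1))  # target
        yield [x1, x2], y      # outer tuple (x, y): works
        # yield [[x1, x2], y]  # outer list: triggers the no-gradients error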
I am trying to replicate (a way smaller version of) the AlphaGo Zero system. However, in the network model, I am having a problem. The loss function I am supposed to implement is the following:
l = (z - v)^2 - pi^T * log(p) + c * ||theta||^2
Where:
z is the label (a real value between -1 and 1) of one of the two heads of the network, and v is this value as predicted by the network.
pi is the label: a probability distribution over all actions, and p is the probability distribution over all actions predicted by the network.
c is the L2 regularization parameter.
I pass to the network a list of channels (representing the game state) and an array (the same size as pi and p) indicating which actions are valid (1 if valid, 0 otherwise).
As you can see, the loss function uses both the targets and the network predictions. But after extensive searching, when implementing my custom loss function, I can only receive the parameters y_true and y_pred, even though I have two "y_true"s and two "y_pred"s. I have tried using indexing to get those values, but I'm pretty sure it is not working.
The network model and the custom loss function are in the code below:
def custom_loss(y_true, y_pred):
    # I am pretty sure this does not work
    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_true[1]
    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)
    return mse_loss - cross_entropy_loss
def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape=(5, 5, 6), name='States_Channels_Input')
    valid_actions_dist = Input(shape=(32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    # Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    # Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])
    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])
    return model

# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...
x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]
model.fit(x_train, y_train, epochs=10)
In short, my question is: how do I correctly implement this custom loss function that uses both the network's outputs and the label values?
I checked their Git repository and there is a lot going on. As shown in the equation, the final loss is a combination of three different losses, and the network minimizes this combined loss. Their loss code is below:
# train ops
policy_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))

value_cost = params['value_cost_weight'] * tf.reduce_mean(
    tf.square(value_output - labels['value_tensor']))

reg_vars = [v for v in tf.trainable_variables()
            if 'bias' not in v.name and 'beta' not in v.name]
l2_cost = params['l2_strength'] * \
    tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])

combined_cost = policy_cost + value_cost + l2_cost
You can refer to this and make your changes accordingly.
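In Keras, a common equivalent (a sketch using the question's model, with the output layer names defined there) is to give each head its own standard loss and let the kernel_regularizer terms supply the L2 part, since Keras adds regularization losses to the total automatically:
model = Model(inputs=[state_channels, valid_actions_dist],
              outputs=[output_prob_dist, output_value])
model.compile(optimizer='adam',
              loss={'Output_Dist': 'categorical_crossentropy',
                    'Output_Value': 'mse'},
              loss_weights={'Output_Dist': 1.0, 'Output_Value': 1.0})
# Targets are passed per head, so each loss sees its own y_true/y_pred
model.fit(x_train,
          {'Output_Dist': dist_probs_label, 'Output_Value': who_won_label},
          epochs=10)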
I want to program a neural network and I'm using the Keras library for it. One dataset is divided into a random number of subsets (1-100). Unused subsets are set to zero. One subset consists of 2*4+1 binary input values. The architecture should look like this (the weights of all subset networks should be shared):
 InA1(4)  InB1(4)              _
     \      /                   \
    FCNA  FCNB                  |
       \   /                    |
    Concatenate                 |
         |       \  100x (InA2, InB2, InC2, InA3, ...)
        FCN      /
InC(1)   |      |
     \   |      |
      \  |     _/
    Concatenate
         |
        FCN
         |
       Out(1)
I have looked through a number of tutorials and examples, but I haven't found a proper method to implement this network. Here is what I have tried so far:
from keras.layers import Input, Dense, Concatenate
from keras.models import Sequential

# define arrays for training set input
InA = []
InB = []
InC = []
for i in range(100):
    InA.append(Input(shape=(4,), dtype='int32'))
    InB.append(Input(shape=(4,), dtype='int32'))
    InC.append(Input(shape=(1,), dtype='int32'))

NetA = Sequential()
NetA.add(Dense(4, input_shape=(4,), activation="relu"))
NetA.add(Dense(3, activation="relu"))

NetB = Sequential()
NetB.add(Dense(4, input_shape=(4,), activation="relu"))
NetB.add(Dense(3, activation="relu"))

NetMergeAB = Sequential()
NetMergeAB.add(Dense(1, input_shape=(3, 2), activation="relu"))

# merging all subsample networks of InA, InB
MergeList = []
for i in range(100):
    NetConcat = Concatenate()([NetA(InA[i]), NetB(InB[i])])
    MergedNode = NetMergeAB(NetConcat)
    MergeList.append(MergedNode)
    MergeList.append(InC[i])

# merging also InC
FullConcat = Concatenate()(MergeList)

# put in fully connected net
ConcatNet = Sequential()
ConcatNet.add(Dense(10, input_shape=(2, 100), activation="relu"))
ConcatNet.add(Dense(6, activation="relu"))
ConcatNet.add(Dense(4, activation="relu"))
ConcatNet.add(Dense(1, activation="relu"))
Output = ConcatNet(FullConcat)
The problem is that I either get a "no Tensor" error, or it doesn't work at all. Does someone have an idea how to solve this properly?
You can achieve that network architecture easily with the functional API and not use Sequential at all:
InA = Input(shape=(4,), dtype='int32')
InB = Input(shape=(4,), dtype='int32')
InC = Input(shape=(1,), dtype='int32')

netA = Dense(4, activation="relu")(InA)
netA = Dense(3, activation="relu")(netA)

netB = Dense(4, activation="relu")(InB)
netB = Dense(3, activation="relu")(netB)

netMergeAB = concatenate([netA, netB])
netMergeAB = Dense(1, activation="relu")(netMergeAB)

fullConcat = concatenate([netMergeAB, InC])

out = Dense(10, activation="relu")(fullConcat)
out = Dense(6, activation="relu")(out)
out = Dense(4, activation="relu")(out)
out = Dense(1, activation="relu")(out)

model = Model([InA, InB, InC], out)
You might need to adjust it slightly but the overall idea should be clear.
Using the code from the question author's answer:
ActInA = Input(shape=(4,), dtype='int32')
ActInB = Input(shape=(4,), dtype='int32')
ActInC = Input(shape=(1,), dtype='int32')
NetA = Dense(4, activation="relu")(ActInA)
NetA = Dense(3, activation="relu")(NetA)
NetB = Dense(4, activation="relu")(ActInB)
NetB = Dense(3, activation="relu")(NetB)
NetAB = concatenate([NetA, NetB])
NetAB = Dense(1, activation="relu")(NetAB)
Now we build a model for this subset of the net:
mymodel = Model([ActInA, ActInB], NetAB)
Now the important part from the Keras docs:
All models are callable, just like layers
This means you can simply do something like this:
NetMergeABC = []
for i in range(100):
    NetMergeABC.append(mymodel([ActInA_array[i], ActInB_array[i]]))
Because you reuse the layers, the weights will be shared.
I have changed my code and I hope that it is clearer now:
NetMergeABC = []
for i in range(100):
    ActInA = Input(shape=(4,), dtype='int32')
    ActInB = Input(shape=(4,), dtype='int32')
    ActInC = Input(shape=(1,), dtype='int32')
    NetA = Dense(4, activation="relu")(ActInA)
    NetA = Dense(3, activation="relu")(NetA)
    NetB = Dense(4, activation="relu")(ActInB)
    NetB = Dense(3, activation="relu")(NetB)
    NetAB = concatenate([NetA, NetB])
    NetAB = Dense(1, activation="relu")(NetAB)
    NetMergeABC.append(NetAB)
    NetMergeABC.append(ActInC)

NetABC = concatenate(NetMergeABC)
NetABC = Dense(10, activation="relu")(NetABC)
NetABC = Dense(6, activation="relu")(NetABC)
NetABC = Dense(4, activation="relu")(NetABC)
NetABC = Dense(1, activation="relu")(NetABC)
The problem now is that (I guess) the weights of NetA/B/C 1-100 aren't shared.
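Right: the Dense layers are re-created on every loop iteration, so each of the 100 branches gets fresh weights. Following the first answer, build the shared sub-model once and call it inside the loop. A sketch (assuming float inputs, since Dense layers expect floats, and a sigmoid output for the binary target):
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Build the shared sub-network once: a single set of weights
sub_a = Input(shape=(4,))
sub_b = Input(shape=(4,))
a = Dense(4, activation="relu")(sub_a)
a = Dense(3, activation="relu")(a)
b = Dense(4, activation="relu")(sub_b)
b = Dense(3, activation="relu")(b)
ab = Dense(1, activation="relu")(concatenate([a, b]))
shared_ab = Model([sub_a, sub_b], ab)

# Call the same sub-model on every subset: the weights are reused 100x
inputs, merged = [], []
for i in range(100):
    in_a = Input(shape=(4,))
    in_b = Input(shape=(4,))
    in_c = Input(shape=(1,))
    inputs += [in_a, in_b, in_c]
    merged += [shared_ab([in_a, in_b]), in_c]

x = concatenate(merged)
x = Dense(10, activation="relu")(x)
x = Dense(6, activation="relu")(x)
x = Dense(4, activation="relu")(x)
out = Dense(1, activation="sigmoid")(x)  # assumption: binary output
model = Model(inputs, out)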