I want to program a neural network and I'm using the Keras library for it. One dataset is divided into a random number of subsets (1-100). Unused subsets are set to zero. One subset consists of 2*4+1 binary input values. The architecture should look like this (the weights of all subset networks should be shared):
. InA1(4) InB1(4) _
. \ / \
. FCNA FCNB |
. \ / |
. Concatenate |
. | \ 100x (InA2, InB2, InC2, InA3, ...)
. FCN /
.InC(1) | |
. \ / |
. \ / _/
. Concatenate
. |
. FCN
. |
. Out(1)
I have looked through a number of tutorials and examples, but I can't find a proper way to implement that network. Here is what I have tried so far:
from keras import *

# define arrays for training set input
InA = []
InB = []
InC = []
for i in range(100):
    InA.append(Input(shape=(4,), dtype='int32'))
    InB.append(Input(shape=(4,), dtype='int32'))
    InC.append(Input(shape=(1,), dtype='int32'))

NetA = Sequential()
NetA.add(Dense(4, input_shape=(4,), activation="relu"))
NetA.add(Dense(3, activation="relu"))

NetB = Sequential()
NetB.add(Dense(4, input_shape=(4,), activation="relu"))
NetB.add(Dense(3, activation="relu"))

NetMergeAB = Sequential()
NetMergeAB.add(Dense(1, input_shape=(3, 2), activation="relu"))

# merging all subsample networks of InA, InB
MergeList = []
for i in range(100):
    NetConcat = Concatenate()([NetA(InA[i]), NetB(InB[i])])
    MergedNode = NetMergeAB(NetConcat)
    MergeList.append(MergedNode)
    MergeList.append(InC[i])

# merging also InC
FullConcat = Concatenate()(MergeList)

# put in fully connected net
ConcatNet = Sequential()
ConcatNet.add(Dense(10, input_shape=(2, 100), activation="relu"))
ConcatNet.add(Dense(6, activation="relu"))
ConcatNet.add(Dense(4, activation="relu"))
ConcatNet.add(Dense(1, activation="relu"))
Output = ConcatNet(FullConcat)
The problem is that either I get a "no Tensor" error, or it doesn't work at all. Does anyone have an idea how to solve this properly?
You can achieve that network architecture easily with the functional API, without using Sequential at all:
InA, InB = [Input(shape=(4,), dtype='int32') for _ in range(2)]
InC = Input(shape=(1,), dtype='int32')
netA = Dense(4, activation="relu")(InA)
netA = Dense(3, activation="relu")(netA)
netB = Dense(4, activation="relu")(InB)
netB = Dense(3, activation="relu")(netB)
netMergeAB = concatenate([netA, netB])
netMergeAB = Dense(1, activation="relu")(netMergeAB)
fullConcat = concatenate([netMergeAB, InC])
out = Dense(10, activation="relu")(fullConcat)
out = Dense(6, activation="relu")(out)
out = Dense(4, activation="relu")(out)
out = Dense(1, activation="relu")(out)
model = Model([InA, InB, InC], out)
You might need to adjust it slightly but the overall idea should be clear.
Using the code from the question author's answer:
ActInA = Input(shape=(4,), dtype='int32')
ActInB = Input(shape=(4,), dtype='int32')
ActInC = Input(shape=(1,), dtype='int32')
NetA = Dense(4, activation="relu")(ActInA)
NetA = Dense(3, activation="relu")(NetA)
NetB = Dense(4, activation="relu")(ActInB)
NetB = Dense(3, activation="relu")(NetB)
NetAB = concatenate([NetA, NetB])
NetAB = Dense(1, activation="relu")(NetAB)
Now we build a model for this subset of the net:
mymodel = Model([ActInA, ActInB], NetAB)
Now the important part from the Keras docs:
All models are callable, just like layers
This means you can simply do something like this:
NetMergeABC = []
for i in range(100):
    NetMergeABC.append(mymodel([ActInA_array[i], ActInB_array[i]]))
Because you reuse the layers, the weights will be shared.
I have changed my code and I hope that it is clearer now:
NetMergeABC = []
for i in range(100):
    ActInA = Input(shape=(4,), dtype='int32')
    ActInB = Input(shape=(4,), dtype='int32')
    ActInC = Input(shape=(1,), dtype='int32')
    NetA = Dense(4, activation="relu")(ActInA)
    NetA = Dense(3, activation="relu")(NetA)
    NetB = Dense(4, activation="relu")(ActInB)
    NetB = Dense(3, activation="relu")(NetB)
    NetAB = concatenate([NetA, NetB])
    NetAB = Dense(1, activation="relu")(NetAB)
    NetMergeABC.append(NetAB)
    NetMergeABC.append(ActInC)
NetABC = concatenate(NetMergeABC)
NetABC = Dense(10, activation="relu")(NetABC)
NetABC = Dense(6, activation="relu")(NetABC)
NetABC = Dense(4, activation="relu")(NetABC)
NetABC = Dense(1, activation="relu")(NetABC)
The problem now is that (I guess) the weights of NetA/B/C 1-100 aren't shared.
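For reference, a sketch of the shared-weight variant (layer sizes taken from the code above; all names are illustrative): build the per-subset network once and call it inside the loop, so every iteration reuses the same layers and therefore the same weights.
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Build the per-subset network ONCE so its weights are shared across all 100 calls
subA = Input(shape=(4,))
subB = Input(shape=(4,))
a = Dense(4, activation="relu")(subA)
a = Dense(3, activation="relu")(a)
b = Dense(4, activation="relu")(subB)
b = Dense(3, activation="relu")(b)
ab = Dense(1, activation="relu")(concatenate([a, b]))
subset_model = Model([subA, subB], ab)

inputs, merged = [], []
for _ in range(100):
    InA = Input(shape=(4,))
    InB = Input(shape=(4,))
    InC = Input(shape=(1,))
    inputs += [InA, InB, InC]
    merged.append(subset_model([InA, InB]))  # reuses the same weights every time
    merged.append(InC)

x = concatenate(merged)
x = Dense(10, activation="relu")(x)
x = Dense(6, activation="relu")(x)
x = Dense(4, activation="relu")(x)
out = Dense(1, activation="relu")(x)
full_model = Model(inputs, out)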
Related
I have a neural net model with multiple sub-models. Model2 includes Model1 in its topology. I want to train Model1 first, separately from Model2, and then pass the trained parameters of Model1 to Model2. Does an approach like the one below pass the parameters automatically, or do I need to explicitly get the weights and biases from Model1 and pass them to Model2 during initialization?
Model1:
main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
x = embedding_layer(main_input)
x = CuDNNLSTM(KG_EMBEDDING_DIM, return_sequences=True)(x)
x = Avg(x)
x = Dense(KG_EMBEDDING_DIM)(x)
x = Activation('relu')(x)
# entity_extraction = Reshape([KG_EMBEDDING_DIM])(x)
entity_extraction = Transpose(x)
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(entity_extraction))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m1 = Model(inputs=main_input, outputs=final_output)
m1.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
m1.summary()
Model2:
x = embedding_layer(main_input)
x = CuDNNLSTM(LSTM_HIDDEN_SIZE, return_sequences=True)(x)
Avg = keras.layers.core.Lambda(lambda x: K.mean(x, axis=1), output_shape=(LSTM_HIDDEN_SIZE,))
x = Avg(x)
x = Dense(LSTM_HIDDEN_SIZE)(x)
main_lstm_out = Activation('relu')(x)
lstm_hidden_and_entity = Concatenate(axis=0)([Transpose(main_lstm_out), entity_extraction])
print("lstm_hidden_and_entity", K.int_shape(lstm_hidden_and_entity))
# input("continue?")
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(lstm_hidden_and_entity))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m2 = Model(inputs=main_input, outputs=final_output)
m2.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
Notice that Model2 uses entity_extraction in its structure, which is a layer output from Model1 trained before the initialization of Model2. So with this kind of approach, does it transfer the parameters?
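As a way to check this, here is a minimal illustrative snippet (assuming m1 and m2 above have both been built): if Model2 was constructed by reusing the same layer objects (embedding_layer, the Dense layers producing entity_extraction, etc.), their variables are literally the same objects, so training m1 also updates them for m2; if the layers were re-created, nothing is transferred automatically.
# Illustrative check: count weight tensors that are the same Python objects in both models
shared = set(id(w) for w in m1.weights) & set(id(w) for w in m2.weights)
print(len(shared), "weight tensors are shared between m1 and m2")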
Hi everyone,
Hope you're all doing great.
I am currently struggling with a problem that seems trivial at first glance; however, I was not able to find a solution no matter how much research I did.
I am currently working on implementing a new method (layerwise relevance propagation) on top of an existing pipeline built around an autoencoder architecture, which is implemented by the following class:
class Autoencoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self._encoder = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(500, activation='relu', input_shape=(input_size,)),
                tf.keras.layers.Dense(200, activation='relu'),
                tf.keras.layers.Dense(50, activation='relu'),
            ]
        )
        self._decoder = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(200, activation='relu', input_shape=(50,)),
                tf.keras.layers.Dense(500, activation='relu'),
                tf.keras.layers.Dense(input_size)
            ]
        )
The autoencoder is implemented as a sequence of two sequential models - an encoder followed by a decoder.
For this reason, running autoencoder.layers yields:
[<keras.engine.sequential.Sequential at 0x7f190014f080>,
<keras.engine.sequential.Sequential at 0x7f190012a400>]
I am currently working on a neural network method that requires the input model to be a sequence of layers; in other words, the output of autoencoder.layers should be the following:
[<keras.layers.core.Dense at 0x7f191a2aa438>,
<keras.layers.core.Dense at 0x7f191a2aa748>,
<keras.layers.core.Dense at 0x7f191a2aad30>,
<keras.layers.core.Dense at 0x7f1900123cf8>,
<keras.layers.core.Dense at 0x7f190012ab00>,
<keras.layers.core.Dense at 0x7f190012ae10>]
To that end, I have tried to define a new model as follows:
model = Sequential([autoencoder.encoder.layers[0],
autoencoder.encoder.layers[1],
autoencoder.encoder.layers[2],
autoencoder.decoder.layers[0],
autoencoder.decoder.layers[1],
autoencoder.decoder.layers[2]])
Unfortunately, using this method, I get the following error when building a relevance propagation graph (which is basically a backward propagation graph):
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 1114), dtype=tf.float32, name='dense_input'), name='dense_input', description="created by layer 'dense_input'") at layer "dense". The following previous layers were accessed without issue: []
In case you want to look deeper into the code responsible for the error:
class LayerwiseRelevancePropagation:
    def __init__(self, model, alpha=2, epsilon=1e-7):
        # Initialization
        self.model = model
        self.alpha = alpha
        self.beta = 1 - alpha
        self.epsilon = epsilon
        # Retrieve network parameters
        self.names, self.activations, self.weights = get_model_params(self.model)
        self.num_layers = len(self.names)
        # Build relevance propagation graph
        self.relevance = self.relevance_propagation()
        print(self.model.input)
        self.lrp_runner = K.function(inputs=self.model.input,
                                     outputs=self.relevance)

    def relevance_propagation(self):
        """Builds graph for relevance propagation."""
        # Forward pass
        r = self.model.output
        # Relevance propagation
        for i in range(self.num_layers - 2, -2, -1):
            if i == -1:
                r = self.backprop_fc(self.weights[i + 1][0], self.weights[i + 1][1], tf.ones_like(self.model.input), r)
            elif 'dense' in self.names[i + 1]:
                r = self.backprop_fc(self.weights[i + 1][0], self.weights[i + 1][1], self.activations[i], r)
            else:
                raise Exception("Error: layer type not recognized.")
        return r

    def backprop_fc(self, w, b, a, r):
        # Positive relevance
        w_p = K.maximum(w, 0.)
        b_p = K.maximum(b, 0.)
        z_p = K.dot(a, w_p) + b_p + self.epsilon
        s_p = r / z_p
        c_p = K.dot(s_p, K.transpose(w_p))
        # Negative relevance
        w_n = K.minimum(w, 0.)
        b_n = K.minimum(b, 0.)
        z_n = K.dot(a, w_n) + b_n - self.epsilon
        s_n = r / z_n
        c_n = K.dot(s_n, K.transpose(w_n))
        return a * (self.alpha * c_p + self.beta * c_n)
Any help would be greatly appreciated!
Thank you in advance for your time :)
It seems like I managed to come up with a solution. It is not as elegant as I would have hoped, but it does solve the problem.
# Create the model with the same architecture and a "clean" network graph
model = tf.keras.Sequential(
[tf.keras.layers.Dense(500, activation='relu', input_shape=(input_size, )),
tf.keras.layers.Dense(200, activation='relu'),
tf.keras.layers.Dense(50, activation='relu'),
tf.keras.layers.Dense(200, activation='relu', input_shape=(50, )),
tf.keras.layers.Dense(500, activation='relu'),
tf.keras.layers.Dense(input_size)])
# Copy the weights
model.set_weights(autoencoder.get_weights())
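As a quick sanity check (a sketch assuming TF2 eager execution, and that input_size and the trained autoencoder from above are in scope), the flattened model reproduces the original encoder/decoder outputs:
import numpy as np

# Illustrative check: the flattened model should match encoder -> decoder exactly
x = np.random.rand(8, input_size).astype("float32")
original = autoencoder._decoder(autoencoder._encoder(x))
flattened = model(x)
print(np.allclose(original.numpy(), flattened.numpy(), atol=1e-6))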
Thanks for your help again!
If I understood your question correctly, then this short code snippet should give you an idea of how you can achieve the desired behavior:
import tensorflow as tf
from tensorflow.keras import layers, Model

def model_test():
    encoder = tf.keras.Sequential(
        [
            tf.keras.layers.Dense(500, activation='relu', input_shape=(10,)),
            tf.keras.layers.Dense(200, activation='relu'),
            tf.keras.layers.Dense(50, activation='relu'),
        ]
    )
    decoder = tf.keras.Sequential(
        [
            tf.keras.layers.Dense(200, activation='relu', input_shape=(50,)),
            tf.keras.layers.Dense(500, activation='relu'),
            tf.keras.layers.Dense(10)
        ]
    )
    inputs = layers.Input(shape=(10,))
    x = inputs
    for layer in encoder.layers:
        x = layer(x)
    for layer in decoder.layers:
        x = layer(x)
    model = Model(inputs=inputs, outputs=x)
    model.summary()
    print(model.layers)
I'm trying to train a neural network in Keras, but I'm getting an error that there are no gradients for any variable, which may imply that the graph is disconnected.
I'm copying here a stripped-down version of the code with only the part related to the model definition.
The model accepts two inputs that will be fed, one at a time, to the same shared model: the encoder.
The two outputs of the encoder are then concatenated and sent to a dense layer to compute the final output.
I don't get what's wrong; it looks like when instantiating the encoder I'm creating additional trainable variables that are not used anywhere.
For the network layout I took inspiration from the official Keras docs:
https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers
def _get_encoder(self, model_input_shape):
    encoder_input = Input(shape=model_input_shape)
    x = encoder_input
    x = Conv2D(32, (3, 3), strides=1, padding="same")(x)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(alpha=0.1)(x)
    latent_z = Flatten()(x)
    latent_z = Dense(self.latent_dim)(latent_z)
    encoder = Model(
        encoder_input,
        latent_z,
        name='encoder'
    )
    return encoder

def build_model(self):
    model_input_shape = (self.height, self.width, self.depth)
    model_input_1 = Input(shape=model_input_shape)
    model_input_2 = Input(shape=model_input_shape)
    self.encoder = self._get_encoder(model_input_shape)
    z_1 = self.encoder(model_input_1)
    z_2 = self.encoder(model_input_2)
    x = concatenate([z_1, z_2])
    prediction = Dense(1, activation='sigmoid')(x)
    self.network = Model(
        inputs=[model_input_1, model_input_2],
        outputs=[prediction],
        name='network'
    )
network.network.compile(
    optimizer='rmsprop',
    loss='mse',
    metrics=['mae'])

H = network.network.fit(
    x=train_gen,
    validation_data=test_gen,
    epochs=EPOCHS,
    steps_per_epoch=STEPS,
    validation_steps=STEPS)
I found the problem: my custom data generator was returning a list [x, y] instead of a tuple (x, y), where x is the input and y the target. A simple mistake that was causing completely unrelated errors.
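For reference, a minimal sketch of the difference (the generator and array names here are illustrative, not from my actual code): each batch has to be yielded as a tuple so Keras can split it into inputs and targets.
import numpy as np

def pair_generator(x1, x2, y, batch_size=32):
    """Illustrative generator for a two-input model like the one above."""
    while True:
        idx = np.random.randint(0, len(y), batch_size)
        # Correct: a tuple (inputs, targets), where inputs is the pair fed to the shared encoder
        yield (x1[idx], x2[idx]), y[idx]
        # Wrong: yielding a list [inputs, targets] makes Keras treat the whole list
        # as model input, so no targets reach the loss and no gradients flow.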
I am trying to replicate (a much smaller version of) the AlphaGo Zero system. However, in the network model, I am having a problem. The loss function I am supposed to implement is the following (the AlphaGo Zero loss):
l = (z - v)^2 - pi^T * log(p) + c * ||theta||^2
Where:
z is the label (a real value between -1 and 1) of one of the two heads of the network, and v is the value predicted by the network.
pi is the target probability distribution over all actions, and p is the probability distribution over all actions predicted by the network.
c is the L2 regularization parameter.
I pass to the network a list of channels (representing the game state) and an array (the same size as pi and p) representing which actions are valid (1 if valid, 0 otherwise).
As you can see, the loss function uses both the targets and the network predictions. But after extensive searching, I found that a custom loss function only receives y_true and y_pred as parameters, even though I have two "y_true"s and two "y_pred"s. I have tried using indexing to get those values, but I'm pretty sure it is not working.
The model and the custom loss function are in the code below:
def custom_loss(y_true, y_pred):
    # I am pretty sure this does not work
    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_true[1]
    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)
    return mse_loss - cross_entropy_loss
def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape=(5, 5, 6), name='States_Channels_Input')
    valid_actions_dist = Input(shape=(32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    # Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    # Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])
    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])
    return model
# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...
x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]
model.fit(x_train, y_train, epochs=10)
In short, my question is: how do I correctly implement this custom loss function, which uses both the network outputs and the label values?
I checked their Git repository and there is a lot going on. As shown in the equation, the final loss is the combination of three different losses, and the network minimizes this combined loss. Their loss code is below:
# train ops
policy_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))

value_cost = params['value_cost_weight'] * tf.reduce_mean(
    tf.square(value_output - labels['value_tensor']))

reg_vars = [v for v in tf.trainable_variables()
            if 'bias' not in v.name and 'beta' not in v.name]
l2_cost = params['l2_strength'] * \
    tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])

combined_cost = policy_cost + value_cost + l2_cost
You can refer to this and adapt your code accordingly.
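If you prefer to stay within Keras' standard losses, here is a sketch (assuming the two-output define_model above) of expressing the same combination with per-output losses at compile time; the c * ||theta||^2 term is already supplied by the kernel_regularizer arguments on the layers:
# Sketch: per-output losses on the two-headed model from define_model().
# 'Output_Dist' gets the policy cross-entropy, 'Output_Value' the value MSE;
# the L2 term comes from the kernel_regularizer already attached to every layer.
model.compile(
    optimizer='adam',
    loss={'Output_Dist': 'categorical_crossentropy', 'Output_Value': 'mse'},
    loss_weights={'Output_Dist': 1.0, 'Output_Value': 1.0},
)
model.fit(
    [channels_input, valid_actions_dist_input],
    {'Output_Dist': dist_probs_label, 'Output_Value': who_won_label},
    epochs=10,
)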
I saw this code from https://github.com/raducrs/Applications-of-Deep-Learning/blob/master/Image%20captioning%20Flickr8k.ipynb and tried to run it in Google Colab; however, when I run the code below it gives me an error. It says
Merge is deprecated
I wonder how I can run this code with the latest version of Keras.
LSTM_CELLS_CAPTION = 256
LSTM_CELLS_MERGED = 1000
image_pre = Sequential()
image_pre.add(Dense(100, input_shape=(IMG_FEATURES_SIZE,), activation='relu', name='fc_image'))
image_pre.add(RepeatVector(MAX_SENTENCE,name='repeat_image'))
caption_model = Sequential()
caption_model.add(Embedding(VOCABULARY_SIZE, EMB_SIZE,
weights=[embedding_matrix],
input_length=MAX_SENTENCE,
trainable=False, name="embedding"))
caption_model.add(LSTM(EMB_SIZE, return_sequences=True, name="lstm_caption"))
caption_model.add(TimeDistributed(Dense(100, name="td_caption")))
combined = Sequential()
combined.add(Merge([image_pre, caption_model], mode='concat', concat_axis=1,name="merge_models"))
combined.add(Bidirectional(LSTM(256,return_sequences=False, name="lstm_merged"),name="bidirectional_lstm"))
combined.add(Dense(VOCABULARY_SIZE,name="fc_merged"))
combined.add(Activation('softmax',name="softmax_combined"))
predictive = Model([image_pre.input, caption_model.input],combined.output)
Merge(mode='concat') is now Concatenate(axis=1).
The following generates a graph correctly on colab.
import numpy as np
from tensorflow.python import keras
from keras.layers import *
from keras.models import Model, Sequential
IMG_FEATURES_SIZE = 10
MAX_SENTENCE = 80
VOCABULARY_SIZE = 1000
EMB_SIZE = 100
embedding_matrix = np.zeros((VOCABULARY_SIZE, EMB_SIZE))
LSTM_CELLS_CAPTION = 256
LSTM_CELLS_MERGED = 1000
image_pre = Sequential()
image_pre.add(Dense(100, input_shape=(IMG_FEATURES_SIZE,), activation='relu', name='fc_image'))
image_pre.add(RepeatVector(MAX_SENTENCE,name='repeat_image'))
caption_model = Sequential()
caption_model.add(Embedding(VOCABULARY_SIZE, EMB_SIZE,
weights=[embedding_matrix],
input_length=MAX_SENTENCE,
trainable=False, name="embedding"))
caption_model.add(LSTM(EMB_SIZE, return_sequences=True, name="lstm_caption"))
caption_model.add(TimeDistributed(Dense(100, name="td_caption")))
merge = Concatenate(axis=1,name="merge_models")([image_pre.output, caption_model.output])
lstm = Bidirectional(LSTM(256,return_sequences=False, name="lstm_merged"),name="bidirectional_lstm")(merge)
output = Dense(VOCABULARY_SIZE, name="fc_merged", activation='softmax')(lstm)
predictive = Model([image_pre.input, caption_model.input], output)
predictive.compile('sgd', 'binary_crossentropy')
predictive.summary()
Description:
This is a model with 2 inputs per sample: an image and a caption (a sequence of words).
The input graphs merge at the concatenation point (name='merge_models')
The image is processed simply by a Dense layer (you may want to add convolutions to the image branch); the output of this dense layer is then copied MAX_SENTENCE times in preparation for the merge.
The captions are processed by an LSTM and a Dense layer.
The merge results in MAX_SENTENCE time-steps each with features from both branches.
The combined branch then ends up predicting one class out of VOCABULARY_SIZE.
The model.summary() is a good way to understand the graph.
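For completeness, a hedged sketch of a training call on this two-input model; the arrays below are dummy placeholders whose shapes match the constants above, not real data:
import numpy as np

n_samples = 32
# Pre-extracted image features, integer-encoded captions, one-hot word targets (all dummy)
img_feats  = np.random.rand(n_samples, IMG_FEATURES_SIZE).astype("float32")
captions   = np.random.randint(0, VOCABULARY_SIZE, size=(n_samples, MAX_SENTENCE))
next_words = np.zeros((n_samples, VOCABULARY_SIZE), dtype="float32")

predictive.fit([img_feats, captions], next_words, epochs=1, batch_size=8)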