Multi-input models using Keras (Model API) - python

I've been trying to construct a multiple-input model using Keras. I am coming from the Sequential model with only one input, which was fairly straightforward. I have been looking at the documentation (https://keras.io/getting-started/functional-api-guide/) and some answers here on StackOverflow (How to "Merge" Sequential models in Keras 2.0?). Basically, what I want is to train one model on two inputs: one input is a piece of text and the other is a set of hand-picked features extracted from that text. The hand-picked feature vectors are of constant length. Below is what I've tried so far:
left = Input(shape=(7801,), dtype='float32', name='left_input')
left = Embedding(7801, self.embedding_vector_length, weights=[self.embeddings],
                 input_length=self.max_document_length, trainable=False)(left)
right = Input(shape=(len(self.z_train), len(self.z_train[0])), dtype='float32', name='right_input')
for i, filter_len in enumerate(filter_sizes):
    left = Conv1D(filters=128, kernel_size=filter_len, padding='same', activation=c_activation)(left)
    left = MaxPooling1D(pool_size=2)(left)
left = CuDNNLSTM(100, unit_forget_bias=1)(left)
right = CuDNNLSTM(100, unit_forget_bias=1)(right)
left_out = Dense(3, activation=activation, kernel_regularizer=l2(l_2), activity_regularizer=l1(l_1))(left)
right_out = Dense(3, activation=activation, kernel_regularizer=l2(l_2), activity_regularizer=l1(l_1))(right)
for i in range(self.num_outputs):
    left_out = Dense(3, activation=activation, kernel_regularizer=l2(l_2), activity_regularizer=l1(l_1))(left_out)
    right_out = Dense(3, activation=activation, kernel_regularizer=l2(l_2), activity_regularizer=l1(l_1))(right_out)
left_model = Model(left, left_out)
right_model = Model(right, right_out)
concatenated = merge([left_model, right_model], mode="concat")
out = Dense(3, activation=activation, kernel_regularizer=l2(l_2), activity_regularizer=l1(l_1), name='output_layer')(concatenated)
self.model = Model([left_model, right_model], out)
self.model.compile(loss=loss, optimizer=optimizer, metrics=[cosine, mse, categorical_accuracy])
This gives the error:
TypeError: Input layers to a `Model` must be `InputLayer` objects. Received inputs: Tensor("cu_dnnlstm_1/strided_slice_16:0", shape=(?, 100), dtype=float32). Input 0 (0-based) originates from layer type `CuDNNLSTM`.

The error is clear (and you're almost there). The code is currently attempting to set the inputs to the models [left_model, right_model]; instead, the inputs must be the Input layers [left, right]. The relevant part of the code sample above should read:
self.model = Model([left, right], out)
Note that in your code the variable left is immediately reassigned to the Embedding output, so keep a separate reference to the actual Input layer (e.g. left_input) and pass that to Model. See my answer here as reference: Merging layers, especially the second example.
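For reference, here is a minimal, self-contained sketch of the two-input pattern (my own example: the sequence length, feature-vector length and layer sizes are placeholders, and it uses Concatenate instead of the deprecated merge):
from tensorflow.keras.layers import Input, Embedding, Conv1D, MaxPooling1D, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model

# Text branch: integer token ids -> embedding -> conv -> LSTM
left_input = Input(shape=(100,), dtype='int32', name='left_input')   # 100 = max document length (placeholder)
left = Embedding(7801, 300)(left_input)                              # vocab size from the question, dim is a placeholder
left = Conv1D(filters=128, kernel_size=3, padding='same', activation='relu')(left)
left = MaxPooling1D(pool_size=2)(left)
left = LSTM(100)(left)

# Hand-picked feature branch: fixed-length float vector
right_input = Input(shape=(20,), dtype='float32', name='right_input')  # 20 = feature length (placeholder)
right = Dense(100, activation='relu')(right_input)

# Merge the two branches and classify
merged = Concatenate()([left, right])
out = Dense(3, activation='softmax', name='output_layer')(merged)

model = Model(inputs=[left_input, right_input], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')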

Related

Unexpected behaviour of from_logits in BinaryCrossentropy?

I am playing with a naive U-net that I'm deploying on MNIST as a toy dataset.
I am seeing a strange behaviour in the way the from_logits argument works in tf.keras.losses.BinaryCrossentropy.
From what I understand, if in the last layer of any neural network activation='sigmoid' is used, then in tf.keras.losses.BinaryCrossentropy you must use from_logits=False. If instead activation=None, you need from_logits=True. Either of them should work in practice, although from_logits=True appears more stable (e.g., Why does sigmoid & crossentropy of Keras/tensorflow have low precision?). This is not the case in the following example.
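As a standalone sanity check of that equivalence (my own snippet, with made-up numbers, not part of the U-net code): the loss computed on raw logits with from_logits=True should match the loss computed on the sigmoid of those logits with from_logits=False.
import tensorflow as tf

logits = tf.constant([[-2.0, 0.5, 3.0]])    # hypothetical raw network outputs
targets = tf.constant([[0.0, 1.0, 1.0]])    # hypothetical binary targets

bce_from_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_from_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)

print(bce_from_logits(targets, logits).numpy())             # loss on raw logits
print(bce_from_probs(targets, tf.sigmoid(logits)).numpy())  # loss on sigmoid outputs
# the two printed values should agree up to numerical precision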
So, my unet goes as follows (the full code is at the end of this post):
def unet(input, init_depth, activation):
    # do stuff that defines layers
    # last layer is a 1x1 convolution
    output = tf.keras.layers.Conv2D(1, (1,1), activation=activation)(previous_layer)  # shape = (28, 28, 1)
    return tf.keras.Model(input, output)
Now I define two models, one with the activation in the last layer:
input = Layers.Input((28,28,1))
model_withProbs = unet(input, 4, activation='sigmoid')
model_withProbs.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                        optimizer=tf.keras.optimizers.Adam())  # from_logits=False since the sigmoid is already present
and one without
model_withLogits = unet(input, 4, activation=None)
model_withLogits.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                         optimizer=tf.keras.optimizers.Adam())  # from_logits=True since there is no activation
If I'm right, they should have exactly the same behaviour.
Instead, the prediction for model_withLogits has pixel values up to 2500 or so (which is wrong), while for model_withProbs I get values between 0 and 1 (which is right). You can check out the figures I get here
I thought about the issue of stability (from_logits=True is more stable), but this problem appears even before training (see here). Moreover, the problem appears exactly when I pass from_logits=True (that is, for model_withLogits), so I don't think stability is relevant.
Does anybody have any clue of why this is happening? Am I missing anything fundamental here?
Post Scriptum: Codes
Re-purposing MNIST for segmentation.
I load MNIST:
(x_train, labels_train), (x_test, labels_test) = tf.keras.datasets.mnist.load_data()
I am re-purposing MNIST for a segmentation task by setting to one all the non-zero values of x_train:
x_train = x_train/255 #normalisation
x_test = x_test/255
Y_train = np.zeros(x_train.shape) #create segmentation map
Y_train[x_train>0] = 1 #Y_train is zero everywhere but where the digit is drawn
Full unet network:
def unet(input, init_depth, activation):
    conv1 = Layers.Conv2D(init_depth, (2,2), activation='relu', padding='same')(input)
    pool1 = Layers.MaxPool2D((2,2))(conv1)
    drop1 = Layers.Dropout(0.2)(pool1)
    conv2 = Layers.Conv2D(init_depth*2, (2,2), activation='relu', padding='same')(drop1)
    pool2 = Layers.MaxPool2D((2,2))(conv2)
    drop2 = Layers.Dropout(0.2)(pool2)
    conv3 = Layers.Conv2D(init_depth*4, (2,2), activation='relu', padding='same')(drop2)
    #pool3 = Layers.MaxPool2D((2,2))(conv3)
    #drop3 = Layers.Dropout(0.2)(conv3)
    # upsampling
    up1 = Layers.Conv2DTranspose(init_depth*2, (2,2), strides=(2,2))(conv3)
    up1 = Layers.concatenate([conv2, up1])
    conv4 = Layers.Conv2D(init_depth*2, (2,2), padding='same')(up1)
    up2 = Layers.Conv2DTranspose(init_depth, (2,2), strides=(2,2), padding='same')(conv4)
    up2 = Layers.concatenate([conv1, up2])
    conv5 = Layers.Conv2D(init_depth, (2,2), padding='same')(up2)
    last = Layers.Conv2D(1, (1,1), activation=activation)(conv5)
    return tf.keras.Model(inputs=input, outputs=last)

failing to load weights into model XCeption-CNN

I build the same model and then load the weights. It had worked before but now it does not. Maybe I am missing something obvious, but I thought I checked everything.
def make_model(trainable=False):
    xception = Xception(include_top=False, weights='imagenet', input_shape=input_shape)
    xception.trainable = trainable
    inputs = Input(shape=input_shape, name='xception_input')
    x = xception(inputs, training=False)
    x = Flatten()(x)
    x = Dense(256, activation='swish', name='xception_int', trainable=True)(x)
    x = Dropout(0.6)(x)
    outputs = Dense(17, activation='softmax', name='xception_output')(x)
    model = Model(inputs, outputs)
    return model
then I load the weights.
model.load_weights('models/xception2_weights.h5')
The end of the error message differs by platform. On Mac it says:
ValueError: Shapes (32,) and (3, 3, 32, 64) are incompatible
on Windows:
ValueError: axes don't match array
From what it says on the Mac, I guess it has something to do with the Xception part. I had used load_weights in the same way successfully before; I don't know why it fails now.
If anyone can help, that would be great
I identified that a model built around tf.keras.applications.Xception with the functional API has a different structure (and therefore different weight shapes) than one that wraps tf.keras.applications.Xception inside a Sequential model.
The same mismatch also happens if you remove the top layer, add a new classifier, and then go back to the original Xception shape with a Dense(1000) output.
If your initial model was generated with a different classifier, and you added more layers and then went back, the weights will also mismatch. Check the layers of the original model against the new one and compare the shapes.
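If it helps, here is a rough way to compare the shapes stored in the weight file against the shapes the rebuilt model expects (my own sketch using h5py; the file path and make_model are the ones from the question):
import h5py

def show(name, obj):
    # print the shape of every weight array stored in the .h5 file
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)

with h5py.File('models/xception2_weights.h5', 'r') as f:
    f.visititems(show)

# compare against the weight shapes of the freshly built model
model = make_model(trainable=False)
for w in model.weights:
    print(w.name, w.shape)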

Adding a Concatenated layer to TensorFlow 2.0 (using Attention)

In building a model that uses TensorFlow 2.0 Attention I followed the example given in the TF docs. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention
The last line in the example is
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
Then the example has the comment
# Add DNN layers, and create Model.
# ...
So it seemed logical to do this
model = tf.keras.Sequential()
model.add(input_layer)
This produces the error
TypeError: The added layer must be an instance of class Layer.
Found: Tensor("concatenate/Identity:0", shape=(None, 200), dtype=float32)
UPDATE (after #thushv89 response)
What I am trying to do in the end is add an attention layer to the following model, which works well as-is (or convert it to an attention model).
model = tf.keras.Sequential()
model.add(layers.Embedding(vocab_size, embedding_nodes, input_length=max_length))
model.add(layers.LSTM(20))
#add attention here?
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', metrics=['accuracy'])
My data looks like this
4912,5059,5079,0
4663,5145,5146,0
4663,5145,5146,0
4840,5117,5040,0
Where the first three columns are the inputs and the last column is binary and the goal is classification. The data was prepared similarly to this example with a similar purpose, binary classification. https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
So, the first thing is that Keras has three APIs when it comes to creating models:
Sequential - (which is what you're using here)
Functional - (which is what I'm using in the solution)
Subclassing - creating Python classes to represent custom models/layers
The model created in the tutorial is not meant to be used with the Sequential API; it is built with the functional API. So you have to do the following. Note that I've taken the liberty of defining the dense layers with arbitrary parameters (e.g. the number of output classes, which you can change as needed).
import tensorflow as tf
# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')
# ... the code in the middle
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
# Add DNN layers, and create Model.
# ...
dense_out = tf.keras.layers.Dense(50, activation='relu')(input_layer)
pred = tf.keras.layers.Dense(10, activation='softmax')(dense_out)
model = tf.keras.models.Model(inputs=[query_input, value_input], outputs=pred)
model.summary()
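For the model in the update, a minimal functional-API sketch with a self-attention layer over the LSTM states could look like this (my own rewrite; layer sizes are kept from the question where possible, and the Attention placement plus the GlobalAveragePooling1D step are just one reasonable choice):
import tensorflow as tf
from tensorflow.keras import layers

# vocab_size, embedding_nodes and max_length as in the original Sequential model
inputs = tf.keras.Input(shape=(max_length,), dtype='int32')
x = layers.Embedding(vocab_size, embedding_nodes)(inputs)
h = layers.LSTM(20, return_sequences=True)(x)    # keep per-timestep states for attention
att = layers.Attention()([h, h])                 # self-attention: query = value = LSTM states
pooled = layers.GlobalAveragePooling1D()(att)    # collapse the time dimension
outputs = layers.Dense(1, activation='sigmoid')(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(loss='mean_squared_error', metrics=['accuracy'])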

Make fixed timestep length LSTM Keras model free timestep length

I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence, using a CNN stacked on the hidden states of the LSTM.
In my setup (don't ask why) I only need the CNN task during training; the labels it predicts have no use in the final product. So, in Keras, one can train an LSTM model without specifying the input sequence length, like this:
l_input = Input(shape=(None,), dtype="int32", name=input_name)
However, if I add the CNN stacked on the LSTM hidden states I need to set a fixed sequence length for the model.
l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)
The problem is that once I have trained the model with a fixed timesteps_size I can no longer use it to predict longer sequences.
In other frameworks this is not a problem. But in Keras, I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.
Here is a simplified version of the model
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)
# Sequential output
l_out1 = TimeDistributed(Dense(len(labels.keys()),
                               activation="softmax"))(l_blstm)
# Global output
conv1 = Conv1D(filters=5, kernel_size=10)(l_embs)
conv1 = Flatten()(MaxPooling1D(pool_size=2)(conv1))
conv2 = Conv1D(filters=5, kernel_size=8)(l_embs)
conv2 = Flatten()(MaxPooling1D(pool_size=2)(conv2))
conv = Concatenate()([conv1, conv2])
conv = Dense(50, activation="relu")(conv)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(conv)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
optimizer = Adam()
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
I would like to know if anyone here has faced this issue, whether there are any solutions for deleting layers from a model after training and, more importantly, how to reshape the input layer size after training.
Thanks
Variable timestep length is a problem not because of the convolution layers (actually, the good thing about convolution layers is that they do not depend on the input size), but because of the Flatten layers, which need an input of known size. Instead, you can use Global Pooling layers. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). Considering these two points, it might be better to write your model like this:
# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)
conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)
gpool = GlobalAveragePooling1D()(conv2)
x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
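To illustrate the point about variable timestep length (my own minimal check, with placeholder vocabulary size and layer sizes): once Flatten is replaced by a global pooling layer, the input can be declared with shape=(None,) and the same graph accepts sequences of any length.
import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     GlobalAveragePooling1D, Dense)
from tensorflow.keras.models import Model

l_input = Input(shape=(None,), dtype="int32")   # no fixed timestep size
x = Embedding(1000, 100)(l_input)               # vocabulary size is a placeholder
x = Conv1D(filters=16, kernel_size=5)(x)
x = MaxPooling1D(pool_size=2)(x)
x = GlobalAveragePooling1D()(x)                 # collapses the (variable) time dimension
out = Dense(5, activation='softmax')(x)
model = Model(inputs=l_input, outputs=out)

# Both calls work, even though the sequence lengths differ:
model.predict(np.random.randint(0, 1000, size=(2, 30)))
model.predict(np.random.randint(0, 1000, size=(2, 75)))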

Python Keras Multiple Input Layers - How to Concatenate/Merge?

In Python, I am trying to build a neural network model using Sequential in Keras to perform binary classification. Note that X is a numpy array of time series data of shape 59x1000x3 (samples x timesteps x features) and D is a numpy array of shape 59x100 (samples x auxiliary features). I want to pass the time series through an LSTM layer, and then augment it at a later layer with the accompanying features (i.e. concatenate two layers).
My code to fit the model is below:
def fit_model(X, y, D, neurons, batch_size, nb_epoch):
    model = Sequential()
    model.add(LSTM(units=neurons, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dropout(0.1))
    model.add(Dense(10))
    input1 = Sequential()
    d = K.variable(D)
    d_input = Input(tensor=d)
    input1.add(InputLayer(input_tensor=d_input))
    input1.add(Dropout(0.1))
    input1.add(Dense(10))
    final_model = Sequential()
    merged = Concatenate([model, input1])
    final_model.add(merged)
    final_model.add(Dense(1, activation='sigmoid'))
    final_model.compile(loss='binary_crossentropy', optimizer='adam')
    final_model.fit(X, y, batch_size=batch_size, epochs=nb_epoch)
    return final_model
I get the following error:
ValueError: A Concatenate layer should be called on a list of at least 2 inputs
I tried using various permutations of merge/concatenate/the functional api/not the functional api, but I keep landing with some sort of error. I've seen answers using Merge from keras.engine.topology. However, it seems to now be deprecated. Any suggestions to fix the error when using Sequential or how to convert the code to the functional API would be appreciated. Thanks.
You are incorrectly passing a Model and a Sequential model as arguments to the Concatenate layer's constructor:
merged = Concatenate([model, input1])
A Concatenate layer must instead be instantiated and then called on a list of tensors, i.e. the outputs of the branches you want to merge, for example:
merged = Concatenate()([branch1_output, branch2_output])
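For the functional-API conversion that the question asks about, a minimal sketch could look like this (my own rewrite; layer sizes are taken from the question where available, and the auxiliary features D are fed as a second model input rather than through K.variable):
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, Concatenate
from tensorflow.keras.models import Model

def fit_model(X, y, D, neurons, batch_size, nb_epoch):
    # Time-series branch
    ts_input = Input(shape=(X.shape[1], X.shape[2]), name='series')
    ts = LSTM(units=neurons)(ts_input)
    ts = Dropout(0.1)(ts)
    ts = Dense(10)(ts)

    # Auxiliary-feature branch
    aux_input = Input(shape=(D.shape[1],), name='aux')
    aux = Dropout(0.1)(aux_input)
    aux = Dense(10)(aux)

    # Merge the two branches and classify
    merged = Concatenate()([ts, aux])
    out = Dense(1, activation='sigmoid')(merged)

    model = Model(inputs=[ts_input, aux_input], outputs=out)
    model.compile(loss='binary_crossentropy', optimizer='adam')
    model.fit([X, D], y, batch_size=batch_size, epochs=nb_epoch)
    return model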
