The keras model is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(...)(x) # shape of (None, 4)
y_merged = Concatenate(...)([y_pred1, y_pred2])
model = Model(input_x, y_merged)
y_pred1 and y_pred2 are the results I want the model to learn to predict.
But the loss function fcn1 for the y_pred1 branch needs the y_pred2 prediction, so I have to concatenate the outputs of the two branches into y_merged so that fcn1 has access to y_pred2.
The problem is that I want to use the Concatenate layer to join the y_pred1 (None, 80, 80, 2) output with the y_pred2 (None, 4) output, but I don't know how to do that.
How can I reshape the (None, 4) tensor to (None, 80, 80, 1)? For example, by filling a (None, 80, 80, 1) tensor with the 4 elements of y_pred2 and zeros everywhere else.
Is there any better solution than using the Concatenate layer?
Maybe this extracted piece of code could help you:
tf.print(condi_input.shape)
# shape is TensorShape([None, 1])
condi_i_casted = tf.expand_dims(condi_input, 2)
tf.print(condi_i_casted.shape)
# shape is TensorShape([None, 1, 1])
broadcasted_val = tf.broadcast_to(condi_i_casted, shape=tf.shape(decoder_outputs))
tf.print(broadcasted_val.shape)
# shape is TensorShape([None, 23, 256])
When you want to broadcast a value, first think about what exactly you want to broadcast. In this example, condi_input has shape (None, 1) and served as a condition for my encoder-decoder LSTM network. To match the dimensionality of the encoder states of the LSTM, I first had to use tf.expand_dims() to expand the condition value from a shape like [[1]] to [[[1]]].
This is what you need to do first. If your dense layer produces a softmax prediction, you might want to use tf.argmax() first so you only have one value, which is much easier to broadcast. It also works with 4 values, but keep in mind that the dimensions need to be compatible: broadcasting only repeats axes of size 1, so you can add size-1 axes to a (None, 4) tensor and repeat along those, but you cannot stretch the 4 itself to, say, 6 or 8 (that would be tiling, not broadcasting).
Then you can use tf.broadcast_to() to broadcast your value into the desired shape. After that you have two tensors with matching shapes that you can concatenate.
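For the shapes in the question above, a minimal standalone sketch of this approach (with dummy tensors standing in for the real branch outputs, and a fixed batch size just for illustration) could look like this:
import tensorflow as tf

y_pred1 = tf.zeros((8, 80, 80, 2))   # stand-in for the conv branch output
y_pred2 = tf.zeros((8, 4))           # stand-in for the dense branch output

# (8, 4) -> (8, 1, 1, 4), then repeat the 4 values over the 80x80 grid
y_pred2_exp = tf.expand_dims(tf.expand_dims(y_pred2, 1), 1)
y_pred2_bc = tf.broadcast_to(y_pred2_exp, (8, 80, 80, 4))

y_merged = tf.concat([y_pred1, y_pred2_bc], axis=-1)
print(y_merged.shape)  # (8, 80, 80, 6)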
Hope this helps you out.
Figured it out. The code is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(4)(x) # (None, 4)
# =========transform to concatenate:===========
y_pred2_matrix = Lambda(lambda x: K.expand_dims(K.expand_dims(x, -1)))(y_pred2) # (None,4, 1,1)
y_pred2_matrix = ZeroPadding2D(padding=((0,76),(0,79)))(y_pred2_matrix) # (None, 80, 80,1)
y_merged = Concatenate(axis=-1)([y_pred1, y_pred2_matrix]) # (None, 80, 80, 3)
The 4 elements of y_pred2 can then be recovered inside the loss function as y_merged[:, :4, 0, 2].
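For completeness, here is a sketch of how a loss like fcn1 could pull the two predictions back out of y_merged; the slicing follows the layout above, but the actual loss math is just a placeholder of my own (the original post does not show fcn1):
from tensorflow.keras import backend as K

def fcn1(y_true, y_pred):
    # channels 0-1 hold the (80, 80, 2) conv prediction
    map_pred = y_pred[..., :2]
    map_true = y_true[..., :2]
    # the 4 dense outputs sit at rows 0-3, column 0 of the zero-padded channel 2
    vec_pred = y_pred[:, :4, 0, 2]
    # placeholder: replace with the real fcn1 that combines map_pred and vec_pred
    return K.mean(K.square(map_true - map_pred)) + 0.0 * K.sum(vec_pred)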
I want to obtain the output of intermediate sub-model layers with tf2.keras. Here is a model composed of two sub-modules:
input_shape = (100, 100, 3)
def model1():
    input = tf.keras.layers.Input(input_shape)
    cov = tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1')(input)
    embedding_model = tf.keras.Model(input, cov, name='model1')
    return embedding_model

def model2(embedding_model):
    input_sequence = tf.keras.layers.Input((None,) + input_shape)
    sequence_embedding = tf.keras.layers.TimeDistributed(embedding_model, name='time_dis1')
    emb = sequence_embedding(input_sequence)
    att = tf.keras.layers.Attention()([emb, emb])
    dense1 = tf.keras.layers.Dense(64, name='dense1')(att)
    outputs = tf.keras.layers.Softmax()(dense1)
    final_model = tf.keras.Model(inputs=input_sequence, outputs=outputs, name='model2')
    return final_model
embedding_model = model1()
model2 = model2(embedding_model)
print(model2.summary())
output:
Model: "model2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, None, 100, 1 0
__________________________________________________________________________________________________
time_dis1 (TimeDistributed) (None, None, 98, 98, 896 input_2[0][0]
__________________________________________________________________________________________________
attention (Attention) (None, None, 98, 98, 0 time_dis1[0][0]
time_dis1[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, None, 98, 98, 2112 attention[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (None, None, 98, 98, 0 dense1[0][0]
==================================================================================================
Total params: 3,008
Trainable params: 3,008
Non-trainable params: 0
And then I want to get the output of intermediate layers of model1 and model2:
model1_output_layer = model2.get_layer('time_dis1').layer.get_layer('cov1')
output1 = model1_output_layer.get_output_at(0)
output2 = model2.get_layer('dense1').get_output_at(0)
output_tensors = [output1,output2]
model2_input = model2.input
submodel = tf.keras.Model([model2_input],output_tensors)
input_data2 = np.zeros((1,10,100,100,3))
result = submodel.predict([input_data2])
print(result)
Running on tf2.3, the error I am getting is:
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 115, in __init__
self._init_graph_network(inputs, outputs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 191, in _init_graph_network
self.inputs, self.outputs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 931, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(None, 100, 100, 3), dtype=float32) at layer "cov1". The following previous layers were accessed without issue: ['time_dis1', 'attention', 'dense1']
But the following code works:
model1_input = embedding_model.input
model2_input = model2.input
submodel = tf.keras.Model([model1_input,model2_input],output_tensors)
input_data1 = np.zeros((1,100,100,3))
input_data2 = np.zeros((1,10,100,100,3))
result = submodel.predict([input_data1,input_data2])
print(result)
But this is not what I want. It is strange: model1 is part of model2, so why do we need to feed an extra input tensor? Sometimes it is hard to get such an extra tensor, especially for complex models.
> so why do we need to input an extra tensor
The short answer is that TensorFlow doesn't know how to make the connection between the inputs that you expect it to make. The problem arises because you're passing a Model (instead of a Layer) to your TimeDistributed layer. This leaves the Input layer of your model1 hanging unless you explicitly pass it an input. The TimeDistributed layer is not smart enough to handle models in this way.
My solution would depend on the answer to the following question,
Why do you need model1? All it has is a Conv2D layer. You can easily do
sequence_embedding = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1'),
    name='time_dis1'
)
If you do this, you then need to change the following lines,
model1_output_layer = model2.get_layer('time_dis1').layer.get_layer('cov1')
output1 = model1_output_layer.get_output_at(0)
to something like (the exact output you want will depend on what you're actually after)
model1_output_layer = model2.get_layer('time_dis1')
output1 = model1_output_layer.output
# This output1 may need further processing depending on what you need
# e.g. if you need mean embeddings over time axis
output_mean = tf.reduce_mean(output1, axis=1)
This is because you can't access the output of a layer that is nested inside a TimeDistributed layer. The layer passed to the TimeDistributed layer doesn't actually get called on its own and so doesn't have a defined output; it just sits there as a template that the TimeDistributed layer uses to compute its output. So, to get the output, you need to access it via the TimeDistributed layer.
If you try to do it the way you have it (instead of my way), you'll get,
AttributeError: Layer cov1 has no inbound nodes.
You may ask, "why did it work before"?
It's because, before, you had a Model there instead of a Layer. Because the Conv2D layer was wrapped by the model, its output was defined (the model had an Input layer). And this feeds back to the reason it complained about the missing input from model1 when you tried to define the submodel.
I know this explanation may make your head spin as the reasons behind this error are quite convoluted. But going through it a few times will hopefully help.
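To make this concrete, here is a minimal end-to-end sketch of the rewrite suggested above (my own assembly of the pieces, with the Conv2D wrapped directly by TimeDistributed); the shapes in the comments match the summary from the question:
import numpy as np
import tensorflow as tf

input_shape = (100, 100, 3)

input_sequence = tf.keras.layers.Input((None,) + input_shape)
sequence_embedding = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1'),
    name='time_dis1')
emb = sequence_embedding(input_sequence)
att = tf.keras.layers.Attention()([emb, emb])
dense1 = tf.keras.layers.Dense(64, name='dense1')(att)
outputs = tf.keras.layers.Softmax()(dense1)
model2 = tf.keras.Model(inputs=input_sequence, outputs=outputs, name='model2')

# both intermediate outputs now live in one connected graph
output1 = model2.get_layer('time_dis1').output   # per-frame conv embeddings
output2 = model2.get_layer('dense1').output
submodel = tf.keras.Model(model2.input, [output1, output2])

result = submodel.predict(np.zeros((1, 10, 100, 100, 3)))
print([r.shape for r in result])  # [(1, 10, 98, 98, 32), (1, 10, 98, 98, 64)]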
I'm working with a CNN-LSTM model on Tensorflow 2.0 + Keras to perform sequence classification. My model is defined as follows:
inp = Input(input_shape)
rshp = Reshape((input_shape[0]*input_shape[1], 1), input_shape=input_shape)(inp)
cnn1 = Conv1D(100, 9, activation='relu')(rshp)
cnn2 = Conv1D(100, 9, activation='relu')(cnn1)
mp1 = MaxPooling1D((3,))(cnn2)
cnn3 = Conv1D(50, 3, activation='relu')(mp1)
cnn4 = Conv1D(50, 3, activation='relu')(cnn3)
gap1 = AveragePooling1D((3,))(cnn4)
dropout1 = Dropout(rate=dropout[0])(gap1)
flt1 = Flatten()(dropout1)
rshp2 = Reshape((input_shape[0], -1), input_shape=flt1.shape)(flt1)
bilstm1 = Bidirectional(LSTM(240,
                             return_sequences=True,
                             recurrent_dropout=dropout[1]),
                        merge_mode=merge)(rshp2)
dense1 = TimeDistributed(Dense(30, activation='relu'))(rshp2)
dropout2 = Dropout(rate=dropout[2])(dense1)
prediction = TimeDistributed(Dense(1, activation='sigmoid'))(dropout2)
model = Model(inp, prediction, name="CNN-bLSTM_per_segment")
print(model.summary(line_length=75))
Where input_shape = (60, 60). This definition, however, raises the following error:
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
At first, I thought it was because the rshp2 layer could not reshape the flt1 output to shape (60, X). So I added a printing block before the Bidirectional(LSTM) layer:
print('reshape1: ', rshp.shape)
print('cnn1: ', cnn1.shape)
print('cnn2: ', cnn2.shape)
print('mp1: ', mp1.shape)
print('cnn3: ', cnn3.shape)
print('cnn4: ', cnn4.shape)
print('gap1: ', gap1.shape)
print('flatten 1: ', flt1.shape)
print('reshape 2: ', rshp2.shape)
And the shapes were:
reshape 1: (None, 3600, 1)
cnn1: (None, 3592, 100)
cnn2: (None, 3584, 100)
mp1: (None, 1194, 100)
cnn3: (None, 1192, 50)
cnn4: (None, 1190, 50)
gap1: (None, 396, 50)
flatten 1: (None, 19800)
reshape 2: (None, 60, None)
Looking at the flt1 layer, its output shape is (None, 19800), which can be reshaped as (60, 330), but for some reason the (60, -1) of the rshp2 layer is not working as intended, as evidenced by the print reshape 2: (None, 60, None). When I try to reshape as (60, 330) it works just fine. Does anyone know why the (-1) is not working?
-1 is working.
From Reshape documentation, https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape
the layer returns a tensor with shape (batch_size,) + target_shape
So, the batch size stays the same, the other dimensions are calculated based on your target_shape.
From the doc, look at the last example,
# also supports shape inference using `-1` as dimension
model.add(tf.keras.layers.Reshape((-1, 2, 2)))
model.output_shape
(None, None, 2, 2)
If you pass -1 in your target shape, Keras will store None for that axis. This is useful if you expect variable-length data in that axis, but if your data shape is always the same, just hard-code the dimension; the concrete value will then appear when you print the shape later.
N.B.: Also, there is no need to specify input_shape=input_shape for your intermediate layers in the functional API. The model will infer that for you.
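A tiny standalone illustration of the hard-coded variant (a toy stand-in for the flattened tensor from the question, computing the second dimension from the statically known size):
from tensorflow.keras.layers import Input, Flatten, Reshape

inp = Input((60, 330))
flt = Flatten()(inp)                 # (None, 19800), like flt1 in the question

features = int(flt.shape[-1]) // 60  # 330, derived from the known flattened size
rshp2 = Reshape((60, features))(flt)
print(rshp2.shape)                   # (None, 60, 330)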
I'm not sure whether this is possible in Keras, but I would like to know if there is any way to concatenate specific values from LSTM layers.
I would like to encode the whole sequence using an LSTM, but use only specific parts of it for prediction.
For example:
I'm using two input sequences of length 5 (so the input shapes are (None, 5) and (None, 5)).
Then I embed the sequences, so the shapes become (None, 5, 300) and (None, 5, 300).
Then I encode each sequence with an LSTM layer with 200 units, and the final shapes are (None, 5, 200) and (None, 5, 200).
Now I would like to concatenate not the whole sequences, but the last 4 timesteps encoded by lstm_1 and the first timestep from lstm_2.
> input_1 = Input(shape=(5, ))
> emb_1 = Embedding(..., 300, ...)(input_1)
> lstm_1 = CuDNNLSTM(200, ...)(emb_1)
>
> input_2 = Input(shape=(5, ))
> emb_2 = Embedding(..., 300, ...)(input_2)
> lstm_2 = CuDNNLSTM(200, ...)(emb_2)
>
> # here is the problem
> emb = concatenate([lstm_1[??], lstm_2[??])
>
> d1 = Dense(...)(emb)
> out = Dense(..., activation="softmax")(d1)
Not sure if I'm making any sense, but I would like to know if it's possible using the Keras functional API.
Best regards,
Daniel
So I didn't realize that lstm_1 and lstm_2 are tensors that I can slice. The solution is simple: you just need to wrap the slicing in a Lambda layer.
lstm_1_lambda = Lambda(lambda x: x[:, -4:, :])(lstm_1)
lstm_2_lambda = Lambda(lambda x: x[:, :1, :])(lstm_2)
emb = concatenate([lstm_1_lambda, lstm_2_lambda], axis=1)
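For reference, a self-contained shape check of this idea (with plain LSTM standing in for CuDNNLSTM and made-up vocabulary/embedding sizes):
from tensorflow.keras.layers import Input, Embedding, LSTM, Lambda, concatenate

input_1 = Input(shape=(5,))
input_2 = Input(shape=(5,))
lstm_1 = LSTM(200, return_sequences=True)(Embedding(1000, 300)(input_1))
lstm_2 = LSTM(200, return_sequences=True)(Embedding(1000, 300)(input_2))

last_four = Lambda(lambda x: x[:, -4:, :])(lstm_1)   # (None, 4, 200)
first_one = Lambda(lambda x: x[:, :1, :])(lstm_2)    # (None, 1, 200)
emb = concatenate([last_four, first_one], axis=1)    # (None, 5, 200)
print(emb.shape)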
Best regards,
Daniel
I have two separately designed CNNs for two different features (image and text) of the same data, and the output has two classes.
In the very last layer:
For the image (ResNet) branch, I would like to use "he_normal" as the initializer:
flatten1 = Flatten()(image_maxpool)
dense = Dense(output_dim=2, kernel_initializer="he_normal")(flatten1)
but for the text CNN, I would like to use the default "glorot_normal":
flatten2 = Flatten()(text_maxpool)
output = Dense(output_dim=2, kernel_initializer="glorot_normal")(flatten2)
the flatten1 and flatten2 have sizes:
flatten_1 (Flatten) (None, 512)
flatten_2 (Flatten) (None, 192)
Is there any way I can concatenate these two flatten layers into one long vector of size 512 + 192 = 704 and feed it to a dense layer, where the weights for the first 512 inputs and the last 192 inputs use two separate kernel_initializers, and produce a 2-class output?
something like this:
merged_tensor = merge([flatten1, flatten2], mode='concat', concat_axis=1)
output = Dense(output_dim=2,
kernel_initializer for [:512]='he_normal',
kernel_initializer for [512:]='glorot_normal')(merged_tensor)
Edit: I think I have gotten this to work with the following code (thanks to @Aechlys):
def my_init(shape, shape1, shape2):
    x = initializers.he_normal()(shape1)
    y = initializers.glorot_normal()(shape2)
    return tf.concat([x, y], 0)
class_num = 2
flatten1 = Flatten()(image_maxpool)
flatten2 = Flatten()(text_maxpool)
merged_tensor = concatenate([flatten1, flatten2],axis=-1)
output = Dense(output_dim=class_num,
               kernel_initializer=lambda shape: my_init(shape,
                                                        shape1=(512, class_num),
                                                        shape2=(192, class_num)),
               activation='softmax')(merged_tensor)
I have to manually hard-code the sizes 512 and 192, because I failed to get the size of flatten1 and flatten2 via flatten1.get_shape().as_list(), which gave me [None, None] although it should be [None, 512]. Other than that it should be fine.
Oh my, have I had fun with this one. You have to create your own kernel initializer:
def my_init(shape, dtype=None, *, shape1, shape2):
    x = keras.initializers.he_normal()(shape1, dtype=dtype)
    y = keras.initializers.glorot_normal()(shape2, dtype=dtype)
    return tf.concat([x, y], 0)
Then you call it via a lambda function within the Dense layer.
Unfortunately, as you can see, I have not been able to deduce the shapes programmatically yet. I may update this answer when I do. But if you know the shapes beforehand, you can pass them as constants:
DENSE_UNITS = 64
input_t = Input((1,25))
input_i = Input((1,35))
input_a = Concatenate(axis=-1)([input_t, input_i])
dense = Dense(DENSE_UNITS,
              kernel_initializer=lambda shape: my_init(shape,
                                                       shape1=(int(input_t.shape[-1]), DENSE_UNITS),
                                                       shape2=(int(input_i.shape[-1]), DENSE_UNITS)))(input_a)
tf.keras.Model(inputs=[input_t, input_i], outputs=dense)
Out: <tensorflow.python.keras._impl.keras.engine.training.Model at 0x19ff7baac88>
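One way to avoid passing the tuples by hand (my own sketch, not part of the original answer): in tf.keras the initializer is called with the full kernel shape, so the per-part shapes can be derived from it. The 512 split point is still an assumption that has to be hard-coded:
import tensorflow as tf
from tensorflow import keras

N_IMAGE = 512  # assumed size of the image-branch flatten output

def my_init(shape, dtype=None):
    # `shape` is the full kernel shape Keras passes in, e.g. (704, 2)
    x = keras.initializers.he_normal()((N_IMAGE, shape[1]), dtype=dtype)
    y = keras.initializers.glorot_normal()((shape[0] - N_IMAGE, shape[1]), dtype=dtype)
    return tf.concat([x, y], 0)

# toy usage on a merged tensor of width 704
merged = keras.layers.Input((704,))
out = keras.layers.Dense(2, kernel_initializer=my_init, activation='softmax')(merged)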
I am trying to adapt TensorFlow r0.12 code (from https://github.com/ZZUTK/Face-Aging-CAAE) to version 1.2.1, and I am having issues with optimizer.minimize().
In this case I am using GradientDescent, but the following error message is only slightly different (in terms of shapes provided) when I try with different optimizers:
ValueError: Shape must be rank 0 but is rank 1 for 'GradientDescent/update_E_conv0/w/ApplyGradientDescent' (op: 'ApplyGradientDescent') with input shapes: [5,5,1,64], [1], [5,5,1,64].
Here [5,5] is my kernel size, 1 is the number of input channels, and 64 is the number of filters in the first convolution. This is the convolutional encoder network it refers to:
E_conv0: (100, 128, 128, 64)
E_conv1: (100, 64, 64, 128)
E_conv2: (100, 32, 32, 256)
E_conv3: (100, 16, 16, 512)
E_conv4: (100, 8, 8, 1024)
E_conv5: (100, 4, 4, 2048)
...
This is code that's triggering the error:
self.EG_optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=EG_learning_rate,
    beta1=beta1
).minimize(
    loss=self.loss_EG,
    global_step=self.EG_global_step,
    var_list=self.E_variables + self.G_variables
)
Where:
EG_learning_rate = tf.train.exponential_decay(
    learning_rate=learning_rate,
    global_step=self.EG_global_step,
    decay_steps=size_data / self.size_batch * 2,
    decay_rate=decay_rate,
    staircase=True
)
self.EG_global_step = tf.get_variable(name='global_step',shape=1, initializer=tf.constant_initializer(0), trainable=False)
And
self.E_variables = [var for var in trainable_variables if 'E_' in var.name]
self.G_variables = [var for var in trainable_variables if 'G_' in var.name]
self.loss_EG = tf.reduce_mean(tf.abs(self.input_image - self.G))
After some debugging I now believe the problem comes from the minimize() method. The error seems to be attributed to the last parameter (var_list) but when I try to comment out the second or third parameter, the error remains the same and is just attributed to the first parameter (loss).
I have changed the code with respect to the one currently on GitHub to adapt it to the new version, so I worked a lot on tf.variable_scope(tf.get_variable_scope(), reuse=True). Could this be the cause?
Thank you so much in advance!
It's tricky to decode, since it comes from an internal op, but this error message points to the cause:
ValueError: Shape must be rank 0 but is rank 1 for 'GradientDescent/update_E_conv0/w/ApplyGradientDescent' (op: 'ApplyGradientDescent')
with input shapes: [5,5,1,64], [1], [5,5,1,64].
One of the inputs to the ApplyGradientDescent op is a rank 1 tensor (i.e. a vector) when it should be a rank 0 tensor (i.e. a scalar). Looking at the definition of the ApplyGradientDescent op, the only scalar input is alpha, or the learning rate.
Therefore, it appears that the EG_learning_rate tensor is a vector when it should be a scalar. A simple fix would be to "slice" a scalar from the EG_learning_rate tensor when you construct the tf.train.GradientDescentOptimizer:
scalar_learning_rate = EG_learning_rate[0]
self.EG_optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=scalar_learning_rate
    # note: GradientDescentOptimizer takes no beta1 argument; that parameter
    # belongs to Adam-style optimizers, so it is dropped here
).minimize(
    loss=self.loss_EG,
    global_step=self.EG_global_step,
    var_list=self.E_variables + self.G_variables
)
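As a side note (an inference from the code shown in the question, not something covered by the answer above): the vector learning rate most likely originates from EG_global_step being created with shape=1, which makes it a one-element vector that tf.train.exponential_decay then propagates into the decayed rate. Creating the step variable as a scalar would avoid the slicing altogether, for example:
self.EG_global_step = tf.get_variable(
    name='global_step',
    shape=[],  # scalar instead of shape=1, so the decayed learning rate is rank 0
    initializer=tf.constant_initializer(0),
    trainable=False
)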