Keras concatenation of certain values from LSTM layers - python

I'm not sure if this is possible or not in Keras, but I would like to know is there any way to concatenate specific values from LSTM layers.
I would like to encode whole sequence using LSTM but for prediction use only specific columns.
For example:
I'm using two input sequences with size of 5 (so my input shape is for one input is (None, 5) and (None, 5).
Then I embed sequences and my input shape is (None, 5, 300) and (None, 5, 300)
Then I encode sequence with LSTM layers with 200 LSTM cells and my final shape is (None, 5, 200) and (None, 5, 200).
Now I would like not to concatenate the whole sequence but the last 4 words encoded in lstm_1 and the first word in lstm_2.
> input_1 = Input(shape=(5, ))
> emb_1 = Embedding(..., 300, ...)(input_1)
> lstm_1 = CuDNNLSTM(200, ...)(emb_1)
>
> input_2 = Input(shape=(5, ))
> emb_2 = Embedding(..., 300, ...)(input_2)
> lstm_2 = CuDNNLSTM(200, ...)(emb_2)
>
> # here is the problem
> emb = concatenate([lstm_1[??], lstm_2[??])
>
> d1 = Dense(...)(emb)
> out = Dense(..., activation="softmax")(d1)
Not sure if I'm making any sense but I would like to know if it's possible using Keras functional API.
Best regards,
Daniel

So I didn't realize that lstm_1 and lstm_2 are actually Numpy arrays that I can concatenate. So the solution is simple, you just need to wrap it in Lambda layer.
lstm_1_lambda = Lambda(lambda x: x[:, -4:, :])(lstm_1)
lstm_2_lambda = Lambda(lambda x: x[:, :1, :])(lstm_2)
emb = concatenate([lstm_1_lambda, lstm_1_lambda)
Best regards,
Daniel

Related

Concatenate differently shaped keras layer outputs

The keras model is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(...)(x) # shape of (None, 4)
y_merged = Concatenate(...)([y_pred1, y_pred2])
model = Model(input_x, y_merged)
y_pred1 and y_pred2 are the results I want the model to learn to predict.
But the loss function fcn1 for the y_pred1 branch need y_pred2 prediction results, so I have to concatenate the results of the two branches to get y_merged, so that fcn1 will have access to y_pred2.
The problem is, I want to use the Concatenate layer to concatenate the y_pred1 (None, 4) output with the y_pred2 (None, 80, 80, 2) output, but I don't know how to do that.
How can I reshape the (None, 4) to (None, 80, 80, 1)? For example, by filling the (None, 80, 80, 1) with the 4 elements in y_pred2 and zeros.
Is there any better solutions than using the Concatenate layer?
Maybe this extracted piece of code could help you:
tf.print(condi_input.shape)
# shape is TensorShape([None, 1])
condi_i_casted = tf.expand_dims(condi_input, 2)
tf.print(condi_i_casted.shape)
# shape is TensorShape([None, 1, 1])
broadcasted_val = tf.broadcast_to(condi_i_casted, shape=tf.shape(decoder_outputs))
tf.print(broadcasted_val.shape)
# shape is TensorShape([None, 23, 256])
When you want to broadcast a value, first think about what exactly you want to broadcast. In this example, condi_input has shape(None,1) and helped me as a condition for my encoder-decoder lstm network. To match all dimensionalities, of the encoder states of the lstm, first I had to use tf.expand_dims() to expand the condition value from a shape like [[1]] to [[[1]]].
This is what you need to do first. If you have a prediction as a softmax from the dense layers, you might want to use tf.argmax() first, so you only have one value, which is way easier to broadcast. However, its also possible with 4 but keep in mind, that the dimensions need to match. You cannot broadcast shape(None,4) to shape(None,6), but to shape(None,8) since 8 is devidable through 4.
Then you you can use tf.broadcast() to broadcast your value into the desired shape. Then you have two shapes, you can concatenate together.
hope this helps you out.
Figured it out, the code is like this:
input_x = Input(shape=input_shape)
x=Conv2D(...)(input_x)
...
y_pred1 = Conv2D(...)(x) # shape of (None, 80, 80, 2)
y_pred2 = Dense(4)(x) # (None, 4)
# =========transform to concatenate:===========
y_pred2_matrix = Lambda(lambda x: K.expand_dims(K.expand_dims(x, -1)))(y_pred2) # (None,4, 1,1)
y_pred2_matrix = ZeroPadding2D(padding=((0,76),(0,79)))(y_pred2_matrix) # (None, 80, 80,1)
y_merged = Concatenate(axis=-1)([y_pred1, y_pred2_matrix]) # (None, 80, 80, 3)
The 4 elements of y_pred2 can be indexed as y_merged[None, :4, 0, 2]

Python (TensorFlow) - Concatenating tensor objects with different dimensions

I have a question regarding concatenating two Tensor object in Tensorflow. As you can see in the code below, I would like to concatenate a2 and b1. a2 has shape (None, 1, 512) and b1 has shape (None, 34, 512). I would like to concatenate them along the second argument, thus axis=1.
a_input = Input(shape=(20480,))
b_input = Input(shape=(34,))
a1 = Dense(embedding_dim)(image_input) #N: Activation specification needed here? # shape = (None, 512)
a2 = K.expand_dims(image_emb, axis=1) # shape = (None, 1, 512)
b1 = Embedding(num_words, embedding_dim, mask_zero=True)(caption_input) # shape = (None, 34, 512)
c = concatenate((a2, b1), axis=1)
However, if I execute the code above, I obtain the following error
ValueError: Dimension 0 in both shapes must be equal, but are 512 and 1. Shapes are [512] and [1]. for '{{node concatenate_28/concat_1}} = ConcatV2[N=2, T=DT_BOOL, Tidx=DT_INT32](concatenate_28/ones_like, concatenate_28/ExpandDims, concatenate_28/concat_1/axis)' with input shapes: [?,1,512], [?,34,1], [] and with computed input tensors: input[2] = <1>.
What am I doing wrong here? How can this be solved?
Looking forward to some suggestions!
Providing the solution here (Answer Section), even though it is present in the comment section for the benefit of the community.
ValueError: Dimension 0 in both shapes must be equal, but are 512 and 1. Shapes are [512] and [1]. for '{{node concatenate_28/concat_1}} = ConcatV2[N=2, T=DT_BOOL, Tidx=DT_INT32](concatenate_28/ones_like, concatenate_28/ExpandDims, concatenate_28/concat_1/axis)' with input shapes: [?,1,512], [?,34,1], [] and with computed input tensors: input[2] = <1>.
This error was resolved when modifying code from
b1 = Embedding(num_words, embedding_dim, mask_zero=True)(caption_input)
to
b1 = Embedding(num_words, embedding_dim, mask_zero=False)(caption_input)
Complete updated code in below
a_input = Input(shape=(20480,))
b_input = Input(shape=(34,))
a1 = Dense(embedding_dim)(image_input) #N: Activation specification needed here? # shape = (None, 512)
a2 = K.expand_dims(image_emb, axis=1) # shape = (None, 1, 512)
b1 = Embedding(num_words, embedding_dim, mask_zero=False)(caption_input) # shape = (None, 34, 512)
c = concatenate((a2, b1), axis=1)

Tensorflow 2.0: Shape inference with Reshape returns None dimension

I'm working with a CNN-LSTM model on Tensorflow 2.0 + Keras to perform sequence classification. My model is defined as following:
inp = Input(input_shape)
rshp = Reshape((input_shape[0]*input_shape[1], 1), input_shape=input_shape)(inp)
cnn1 = Conv1D(100, 9, activation='relu')(rshp)
cnn2 = Conv1D(100, 9, activation='relu')(cnn1)
mp1 = MaxPooling1D((3,))(cnn2)
cnn3 = Conv1D(50, 3, activation='relu')(mp1)
cnn4 = Conv1D(50, 3, activation='relu')(cnn3)
gap1 = AveragePooling1D((3,))(cnn4)
dropout1 = Dropout(rate=dropout[0])(gap1)
flt1 = Flatten()(dropout1)
rshp2 = Reshape((input_shape[0], -1), input_shape=flt1.shape)(flt1)
bilstm1 = Bidirectional(LSTM(240,
return_sequences=True,
recurrent_dropout=dropout[1]),
merge_mode=merge)(rshp2)
dense1 = TimeDistributed(Dense(30, activation='relu'))(rshp2)
dropout2 = Dropout(rate=dropout[2])(dense1)
prediction = TimeDistributed(Dense(1, activation='sigmoid'))(dropout2)
model = Model(inp, prediction, name="CNN-bLSTM_per_segment")
print(model.summary(line_length=75))
Where input_shape = (60, 60). This definition, however, raises the following error:
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
At first, I thought it was because the rshp2 layer could not reshape the flt1 output to shape (60, X). So I added a printing block before the Bidirectional(LSTM)) layer:
print('reshape1: ', rshp.shape)
print('cnn1: ', cnn1.shape)
print('cnn2: ', cnn2.shape)
print('mp1: ', mp1.shape)
print('cnn3: ', cnn3.shape)
print('cnn4: ', cnn4.shape)
print('gap1: ', gap1.shape)
print('flatten 1: ', flt1.shape)
print('reshape 2: ', rshp2.shape)
And the shapes were:
reshape 1: (None, 3600, 1)
cnn1: (None, 3592, 100)
cnn2: (None, 3584, 100)
mp1: (None, 1194, 100)
cnn3: (None, 1192, 50)
cnn4: (None, 1190, 50)
gap1: (None, 396, 50)
flatten 1: (None, 19800)
reshape 2: (None, 60, None)
Looking at the flt1 layer, its output shape is (19800,), which can be reshaped as (60, 330), but for some reason the (60, -1) of the rshp2 layer is not working as intended, evidenced by the print reshape 2: (None, 60, None). When I try to reshape as (60, 330) it works just fine. Does anyone knows why the (-1) is not working?
-1 is working.
From Reshape documentation, https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape
the layer returns a tensor with shape (batch_size,) + target_shape
So, the batch size stays the same, the other dimensions are calculated based on your target_shape.
From the doc, look at the last example,
# also supports shape inference using `-1` as dimension
model.add(tf.keras.layers.Reshape((-1, 2, 2)))
model.output_shape
(None, None, 2, 2)
If you pass -1 in your target shape, the Keras will store None, this is useful if you expect variable-length data in that axis, but if your data shape is always same, just put the dimension hard-coded that will place the dimension when you print the shape later.
N.B: Also no need to specify input_shape=input_shape for your intermediate layers in functional API. The model will infer that for you.

Keras LSTM Model for text-generation purpose

I am a beginner with Keras and in writing Neural Networks models and actually I'm trying to write a LSTM for text-generation purpose, without success. What am I doing wrong?
I read this question: here
and other articles but there is something I am missing I can't get, sorry if I seem dumb.
The goal
My purpose is to generate english articles of a fixed length (1500 by now).
Suppose I have a 20k records dataset in sequences (articles, basically) of different lengths, I set a fixed length for all articles (MAX_SEQUENCE_LENGTH=1500) and tokenized them, getting a matrix (X, my training-data) looking like:
[[ 0 0 0 ... 88 664 206]
[ 0 0 0 ... 1 93 140]
[ 0 0 0 ... 3 173 2283]
...
[ 50 2761 4 ... 167 148 156]
[ 0 0 0 ... 10 77 206]
[ 0 0 0 ... 167 148 156]]
with a shape of 20000x1500
the output of my LSTM should be a 1 x MAX_SEQUENCE_LENGTH array of tokens.
My model looks like that:
def generator_model(sequence_input, embedded_sequences, output_shape):
layer = LSTM(16,return_sequences = True)(embedded_sequences)
layer = LSTM(32,return_sequences = True)(layer)
layer = Flatten()(layer)
output = Dense(output_shape, activation='softmax')(layer)
generator = Model(sequence_input, output)
return generator
with:
sequence_input = Input(batch_shape=(1, 1,1500), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
output_shape = MAX_SEQUENCE_LENGTH
the LSTM is supposed to train, with model.fit(), on a training-set of 20k x MAX_SEQUENCE_LENGTH shape (X).
and getting an array of tokens with 1 x MAX_SEQUENCE_LENGTH shape as output when I call model.predict(seed), with seed a random noise array.
compile, fit and predict
comments for the following section:
. generator.compile works, the model is given in edit section of ths post.
. generator.fit compile, epochs=1 param is for testing-purpose, will be BATCH_NUM
. now i have some doubts on the y I give to generator.fit, by now I'm giving a matrix of 0 as target output, if I generate it with a different shape from the X.shape[0], it throw the error, this means it needs to have a label for every record in X. but if I give him a matrix of 0 as target for model.fit, isn't it going to predict just arrays of 0?
. the error is giving is always the same, despite i use the noise_generator() or noise_integer_generator(), i believe it's because it doesn't like the y_shape param i'm giving
embedding_layer = load_embeddings(word_index)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
embedded_sequences = embedding_layer(sequence_input)
generator = generator_model(sequence_input, embedded_sequences, X.shape[1])
print(generator.summary())
generator.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
Xnoise = generate_integer_noise(MAX_SEQUENCE_LENGTH)
y_shape = np.zeros((X.shape[0],), dtype=int)
generator.fit(X, y_shape, epochs=1)
acc = generator.predict(Xnoise, verbose=1)
But actually I'm getting the following error
ValueError: Error when checking input: expected input_1 to have shape (1500,) but got array with shape (1,)
when I call:
Xnoise = generate_noise(samples_number=MAX_SEQUENCE_LENGTH)
generator.predict(Xnoise, verbose=1)
The noise I give is a 1 x 1500 array, but it seems it's expecting a (1500,) matrix, So there must be some kind of error in the shape settings for my output.
Is my model correct for my purpose? or did I wrote something really really stupid I can't see?
Thanks for the help you can give me, I appreciate that!
edit
Changelog:
v1.
###
- Changed model structure, now return_sequences = True and using shape instead of batch_shape
###
- Changed
sequence_input = Input(batch_shape=(1,1,1500), dtype='int32')
to
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
###
- Changed the error the model is giving
v2.
###
- Changed generate_noise() code
###
- Added generate_integer_noise() code
###
- Added full sequence with the model compile, fit and predict
###
- Added model.fit summary under the model summary, in the tail of the post
generate_noise() code:
def generate_noise(samples_number, mean=0.5, stdev=0.1):
noise = np.random.normal(mean, stdev, (samples_number, MAX_SEQUENCE_LENGTH))
print(noise.shape)
return noise
which print: (1500,)
generate_integer_noise() code:
def generate_integer_noise(samples_number):
noise = []
for _ in range(0, samples_number):
noise.append(np.random.randint(1, MAX_NB_WORDS))
Xnoise = np.asarray(noise)
return Xnoise
my function load_embeddings() is as follow:
def load_embeddings(word_index, embeddingsfile='Embeddings/glove.6B.%id.txt' %EMBEDDING_DIM):
embeddings_index = {}
f = open(embeddingsfile, 'r', encoding='utf8')
for line in f:
values = line.split(' ') #split the line by spaces
word = values[0] #each line starts with the word
coefs = np.asarray(values[1:], dtype='float32') #the rest of the line is the vector
embeddings_index[word] = coefs #put into embedding dictionary
f.close()
print('Found %s word vectors.' % len(embeddings_index))
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
embedding_vector = embeddings_index.get(word)
if embedding_vector is not None:
# words not found in embedding index will be all-zeros.
embedding_matrix[i] = embedding_vector
embedding_layer = Embedding(len(word_index) + 1,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=False)
return embedding_layer
model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 1500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 1500, 300) 9751200
_________________________________________________________________
lstm_1 (LSTM) (None, 1500, 16) 20288
_________________________________________________________________
lstm_2 (LSTM) (None, 1500, 32) 6272
_________________________________________________________________
flatten_1 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_1 (Dense) (None, 1500) 72001500
=================================================================
Total params: 81,779,260
Trainable params: 72,028,060
Non-trainable params: 9,751,200
_________________________________________________________________
model.fit() summary (using a 999-sized dataset for testing, instad of the 20k-sized):
999/999 [==============================] - 62s 62ms/step - loss: 0.5491 - categorical_accuracy: 0.9680
I rewrote full answer, now it works (at least compiles and runs, can't say anything about convergence).
First, I don't know why you use sparse_categorical_crossentropy instead of categorical_crossentropy? It could be important. I change the model a bit, so it compiles and use a categorical_crossentropy. If you need a sparse one, change the shape of a target.
Also, I change batch_shape to shape argument, because it allows to use batches of different shape. It's easier to work with.
And the last edit: you should change generate_noise, because an Embedding layer awaits a numbers from (0, max_features), not the normally distributed floats (see a comment in the function).
EDIT
Addressing the last comments, I've removed a generate_noise and post modified generate_integer_noise function:
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
import numpy as np
def generate_integer_noise(samples_number):
"""
samples_number is a number of samples, i.e. first dimension in (some, 1500)
"""
return np.random.randint(1, MAX_NB_WORDS, size=(samples_number, MAX_SEQUENCE_LENGTH))
MAX_SEQUENCE_LENGTH = 1500
"""
Tou can use your definition of embedding layer,
I post to make a reproducible example
"""
max_features, embed_dim = 10, 300
embedding_matrix = np.zeros((max_features, embed_dim))
output_shape = MAX_SEQUENCE_LENGTH
embedded_layer = Embedding(
max_features,
embed_dim,
weights=[embedding_matrix],
trainable=False
)
def generator_model(embedded_layer, output_shape):
"""
embedded_layer: Embedding keras layer
output_shape: shape of the target
"""
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, ))
embedded_sequences = embedded_layer(sequence_input) # Set trainable to the True if you wish to train
layer = LSTM(32, return_sequences=True)(embedded_sequences)
layer = LSTM(64, return_sequences=True)(layer)
output = LSTM(output_shape)(layer)
generator = Model(sequence_input, output)
return generator
generator = generator_model(embedded_layer, output_shape)
noise = generate_integer_noise(32)
# generator.predict(noise)
generator.compile(loss='categorical_crossentropy', optimizer='adam')
generator.fit(noise, noise)

How to make Keras have two different initialisers in a dense layer?

I have two separately designed CNNs for two different features(image and text) of the same data, and the output has two classes
In the very last layer:
for image (resnet), I would like to use "he_normal" as the initializer
flatten1 = Flatten()(image_maxpool)
dense = Dense(output_dim=2, kernel_initializer="he_normal")(flatten1)
but for the text CNNs, i would like to use the default "glorot_normal"
flatten2 = Flatten()(text_maxpool)
output = Dense(output_dim=2, kernel_initializer="glorot_normal")(flatten2)
the flatten1 and flatten2 have sizes:
flatten_1 (Flatten) (None, 512)
flatten_2 (Flatten) (None, 192)
is there anyway i can concate these two flatten layers and have a long dense layer with a size 192+512 = 704, where the first 192 and second 512 has two seperate kernel_initializer, and produce a 2-class outputs?
something like this:
merged_tensor = merge([flatten1, flatten2], mode='concat', concat_axis=1)
output = Dense(output_dim=2,
kernel_initializer for [:512]='he_normal',
kernel_initializer for [512:]='glorot_normal')(merged_tensor)
Edit: I think I have gotten this work by having the following codes(thanks to #Aechlys):
def my_init(shape, shape1, shape2):
x = initializers.he_normal()(shape1)
y = initializers.glorot_normal()(shape2)
return tf.concat([x,y], 0)
class_num = 2
flatten1 = Flatten()(image_maxpool)
flatten2 = Flatten()(text_maxpool)
merged_tensor = concatenate([flatten1, flatten2],axis=-1)
output = Dense(output_dim=class_num, kernel_initializer=lambda shape: my_init(shape,\
shape1=(512,class_num),\
shape2=(192,class_num)),\
activation='softmax')(merged_tensor)
I have to manually add the shape size 512 and 192, because I failed to get the size of flatten1 and flatten1 via the code
flatten1.get_shape().as_list()
,which gave me [none, none], althought it should be [None, 512], other than that it should be fine
Oh my, have I had fun with this one. You have to create your own kernel intializer:
def my_init(shape, dtype=None, *, shape1, shape2):
x = keras.initializers.he_normal()(shape1, dtype=dtype)
y = keras.initializers.glorot_normal()(shape2, dtype=dtype)
return tf.concat([x,y], 0)
Then you will call it via lambda function within the Dense function:
Unfortunately, as you can see, I have not been able to deduce the shape programatically, yet. I may update this answer when I do. But, if you know the shape beforehand you can pass them as constants:
DENSE_UNITS = 64
input_t = Input((1,25))
input_i = Input((1,35))
input_a = Concatenate(axis=-1)([input_t, input_i])
dense = Dense(DENSE_UNITS, kernel_initializer=lambda shape: my_init(shape,
shape1=(int(input_t.shape[-1]), DENSE_UNITS),
shape2=(int(input_i.shape[-1]), DENSE_UNITS)))(input_a)
tf.keras.Model(inputs=[input_t, input_i], outputs=dense)
Out: <tensorflow.python.keras._impl.keras.engine.training.Model at 0x19ff7baac88>

Categories