Possible to add a bidirectional LSTM before a CNN in Keras?

Possible to add a bidirectional LSTM before a CNN in Keras? - python

I am currently working on a system that classifies whether two sentences share the same content or not. For this purpose I use pretrained word vectors, so there is an array with the word vectors of sentence one (s1) and an array with the word vectors of sentence 2 (s2). In order to classify whether they are similar or not I create a matrix by comparing all vectors in s1 pairwise with the vectors in s2. This matrix is then fed into a CNN classifier and trained on the data. This all pretty straight forward.
Now I would like to enhance this system by making using bidirectional LSTMs on s1 and s2. The bidirectional LSTM should be used in order to get the hidden state of each vector in s1 and s2 and these hidden states should then be compared in the same way by pairwise cosine similarity as the vectors of s1 and s2 are compared before. This is in order to capture the information of the sentence context of each word in s1 and s2.
Now the question is how to do this in Keras. Currently I am using numpy/sklearn to create the matrices which are then fed as training data into Keras.
I found one implementation of what I want to do in plain tensorflow (https://github.com/LiuHuiwen/Pairwise-Word-Interaction-Modeling-by-Tensorflow-1.0/blob/master/model.py).
I assume that I will have to change the input data to consist of just the two arrays of vectors of s1 and s2. Then I have to run the biLSTM first, get the hidden states, convert everything into matrices and feed this into the CNN. The example in plain tensorflow seems to be quite clear to me, but I cannot come up with an idea of how to do this in Keras. Is it possible at all in Keras or does one have to resort to tensorflow directly in order to do the necessary calculations on the output of the biLSTM?

Keras RNN layer including LSTM can return not only the last output in the output sequence but also the full sequence from all hidden layers using return_sequences=True option.
https://keras.io/layers/recurrent/
When you want to connect Bi-Directional LSTM layer before CNN layer, the following code is an example:
from keras.layers import Input, LSTM, Bidirectional, Conv1D
input = Input(shape=(50, 200))
seq = Bidirectional(LSTM(16, return_sequences=True))(input)
cnn = Conv1D(32, 3, padding="same", activation="relu")(seq)
Please note: If you want to use Conv2D layer after Bi-Directional LSTM layer, reshaping to ndim=4 is required for input of Conv2D, like the following code:
from keras.layers import Input, LSTM, Bidirectional, Conv2D, Reshape
input = Input(shape=(50, 200))
seq = Bidirectional(LSTM(16, return_sequences=True))(input)
seq = Reshape((50, 32, 1))(seq)
cnn = Conv2D(32, (3, 3), padding="same", activation="relu")(seq)

Related

How to make array have same number of samples?

I am a beginner using CNN and Keras and I am trying to make a program to predict whether someone could develop diabetes using data in a CSV file. I think I am getting confused with how to reshape the array as I am receiving the error:
ValueError: Data cardinality is ambiguous:
x sizes: 8
y sizes: 768
Make sure all arrays contain the same number of samples
Here is the code:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
# read in the csv file using pandas
data = pd.read_csv("diabetes.csv")
# extract the input and output columns from the dataframe
X = data.drop(columns=['Outcome'])
y = data['Outcome']
# reshape the input data into the shape expected by a CNN
X = X.values.reshape(8, 768, 1)
# create a Sequential model in Keras
model = Sequential()
# add a 2D convolutional layer with 32 filters and a kernel size of 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(8, 768, 1)))
# add a flatten layer to flatten the output from the convolutional layer
model.add(Flatten())
# add a fully-connected layer with 64 units and a ReLU activation
model.add(Dense(64, activation="relu"))
# add a fully-connected layer with 10 units and a softmax activation
model.add(Dense(10, activation="softmax"))
# compile the model using categorical crossentropy loss and an Adam optimizer
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# fit the model using the input and output data
model.fit(X, y)
# print prediction
print(model.predict(10, 139, 80, 0, 0, 27.1, 1.441, 57))

Tldr, you probably don't want a CNN in this case.
First off, I’m assuming your data looks something like the following, if that’s not the case the rest of the post may be way off target:
enter image description here
So there are 768 rows or patients, 8 inputs for each row, and 1 output (known as the label).
Convolutional layers are used when there is an input signal that you wish to analyze. In 2d, this would be something like a grid of pixels, or in 1d it might be time series data. Your data is neither – each row of the data represents a single 8-dimensional data point (i.e. a single patient) at a single point in time, so you very likely don’t want to use a convolutional layer at all.
For more information, you can read up on the differences between convnets and fully connected neural networks here: https://ai.stackexchange.com/questions/5546/what-is-the-difference-between-a-convolutional-neural-network-and-a-regular-neur?rq=1
“CNN, in specific, has one or more layers of convolution units. A convolution unit receives its input from multiple units from the previous layer which together create a proximity. Therefore, the input units (that form a small neighborhood) share their weights.
The convolution units (as well as pooling units) are especially beneficial as:
• They reduce the number of units in the network (since they are many-to-one mappings). This means, there are fewer parameters to learn which reduces the chance of overfitting as the model would be less complex than a fully connected network.
• They consider the context/shared information in the small neighborhoods. This feature is very important in many applications such as image, video, text, and speech processing/mining as the neighboring inputs (eg pixels, frames, words, etc) usually carry related information."
A very naïve, very basic NN for a problem like this would just use Dense, i.e. fully connected layers.
In Keras, you can do the following:
model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(8,)))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
Note that the last layer is a single neuron, since you have only one output. If you were classifying images as one of say 10 categories (dog, cat, bird, etc), then you would use 10 output nodes in the last layer, softmax them, and use categorical cross entropy. Here, with a single condition, you only need a single output node and note that the loss function should probably be binary crossentropy – i.e. you’re trying to detect the presence or absence of the condition.
Hope this helps.

how to Create Multi Head Convolutional layers merging as Temporal dimension in keras

I want to implement a time-series prediction model which has a window of non-image matrixes as input , each matrix to be processed by a Conv2d layer at the first layer and then the output of this conv layers merged as time dimension to be passed to a recurrent layer like LSTM,
one way is to use Time-Distribution technique but TimeDistributed layer apply the same layer to several inputs. And it produce one output per input to get the result in time, the Time-Distribution technique will share the same weights among all convolution heads which is not what I want, for example If you injects 5 Matrixes, the weights are not tweaked 5 times, but only once, and distributed to every blocks defined in the current Time Distributed layer. how can I avoid this and have independent Convolutional heads with outputs merging as time dimension for the next layer?
I have tried to implement it as following
Matrix_Dimention=20;
Input_Window=4;
Input_Matrixes=[]
ConvLayers=[]
for i in range(0 , Input_Window):
Inp_Matrix=layers.Input(shape=(Matrix_Dimention,Matrix_Dimention,1));
Input_Matrixes.append(Inp_Matrix);
conv=layers.Conv2D(64, 5, activation='relu', input_shape=(Matrix_Dimention,Matrix_Dimention,1))(Inp_Matrix)
ConvLayers.append(conv);
#Temporal Concatenation
Spatial_Layers_Concate = layers.Concatenate(ConvLayers); # this causes error : Inputs to a layer should be tensors
#Temporal Component
LSTM_Layer=layers.LSTM(activation='relu',return_sequences=False)(Spatial_Layers_Concate )
Model = keras.Model(Input_Matrixes, LSTM_Layer)
Model.compile(optimizer='adam', loss=keras.losses.MeanSquaredError)
it would be great if you provide your answer by correcting my implementation or provide your own if there is a better way to form this idea , tnx.

Adaptation module design for stacking two CNNs

I'm trying to stack two different CNNs using an adaptation module to bridge them, but I'm having a hard time determining the adaption module's layer hyperparameters correctly.
To be more precise, I would like to train the adaptation module to bridge two convolutional layers:
Layer A with output shape: (29,29,256)
Layer B with input shape: (8,8,384)
So, after Layer A, I sequentially add the adaptation module, for which I choose:
Conv2D layer with 384 filters with kernel size: (3,3) / Output shape: (29,29,384)
MaxPool2D with pool size: (2,2), strides: (4,4) and padding: "same" / Output shape: (8,8,384)
Finally, I try to add layer B to the model, but I get the following error from tensorflow:
InvalidArgumentError: Dimensions must be equal, but are 384 and 288 for '{{node batch_normalization_159/FusedBatchNormV3}} = FusedBatchNormV3[T=DT_FLOAT, U=DT_FLOAT, data_format="NHWC", epsilon=0.001, exponential_avg_factor=1, is_training=false](Placeholder, batch_normalization_159/scale, batch_normalization_159/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp_1)' with input shapes: [?,8,8,384], [288], [288], [288], [288].
There's a minimal reproducible example of it:
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.mobilenet import MobileNet
from keras.layers import Conv2D, MaxPool2D
from keras.models import Sequential
mobile_model = MobileNet(weights='imagenet')
server_model = InceptionResNetV2(weights='imagenet')
hybrid = Sequential()
for i, layer in enumerate(mobile_model.layers):
if i <= 36:
layer.trainable = False
hybrid.add(layer)
hybrid.add(Conv2D(384, kernel_size=(3,3), padding='same'))
hybrid.add(MaxPool2D(pool_size=(2,2), strides=(4,4), padding='same'))
for i, layer in enumerate(server_model.layers):
if i >= 610:
layer.trainable = False
hybrid.add(layer)

Sequential models only support models where the layers are arranged like a linked list - each layer takes the output of only one layer, and each layer's output is only fed to a single layer. Your two base models have residual blocks, which breaks the above assumption, and turns the model architecture into directed acyclic graph (DAG).
To do what you want to do, you'll need to use the Functional API. With the Functional API, you explicitly control the intermediate activations aka KerasTensors.
For the first model, you can skip that extra work and just create a new model from a subset of the existing graph like this
sub_mobile = keras.models.Model(mobile_model.inputs, mobile_model.layers[36].output)
Wiring some of the layers of the second model is much more difficult. It's easy to slice off the end of a keras model - it's much more difficult to slice of the beginning because of the need for a tf.keras.Input placeholder. To do this successfully, you'll need to write a model walking algorithm that go through the layers, tracks the output KerasTensors, then calls each layer with the new inputs to create a new output KerasTensor.
You could avoid all that work by simply finding some source code for an InceptionResNet and adding layers via Python rather than introspecting an existing model. Here's one which may fit the bill.
https://github.com/yuyang-huang/keras-inception-resnet-v2/blob/master/inception_resnet_v2.py

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7 tensorflow
I am experimenting Time series forecasting w Tensorflow
I understand the second line creates a LSTM RNN i.e. a Recurrent Neural Network of type Long Short Term Memory.
Why do we need to add a Dense(1) layer in the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
Tutorial for Dense() says
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
would you like to rephrase or elaborate on need for Dense() here ?

The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which transforms each input step of size #features into a latent representation of size 32. You want to predict a single value so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense Layer (Fully-Connected Neural Network) with one neuron in the output which, obviously, produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.

Well in the tutorial you are following Time series forecasting, they are trying to forecast temperature (6 hrs ahead). For which they are using an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
Dense layer is nothing but a regular fully-connected NN layer. In this case you are bringing down the output dimensionality to 1, which should represent some proportionality (need not be linear) to the temperature you are trying to predict. There are other layers you could use as well. Check out, Keras Layers.
If you are confused about the input and output shape of LSTM, check out
I/O Shape.

How to use Bidirectional RNN and Conv1D in keras when shapes are not matching?

I am brand new to Deep-Learning so I'm reading though Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try and implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials or code snippets I've encountered do not implement anything remotely similar to this (e.g. image recognition) or use an older version of keras with different functions and usage.
What I'm trying to do is a variation of this paper:
(1) convert DNA sequences to one-hot encoding vectors; ✓
(2) use a 1 dimensional convolutional neural network; ✓
(3) with max pooling; ✓
(4) send the output to a bidirectional RNN; ⓧ
(5) classify the input;
I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?
Note:
The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py but I simplified the output layer to just do binary classification. This processed was described (kind of) in https://github.com/fchollet/keras/issues/3322 but I cannot get it to work with the updated keras. The original code (and the 2nd link) work on a very large dataset so I am generating some fake data to illustrate the concept. They are also using an older version of keras where key functionality changes have been made since then.
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential
# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0) # For keras?
# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25
# Generate sequences
https://pastebin.com/GvfLQte2
# Build model
# ===========
# Input Layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS,4))
# Hidden Layers
y = Conv1D(100, 10, strides=1, activation="relu", )(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences = True, activation="tanh", ))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)
# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)
model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy", )
model.summary()
# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
# 1049 input_shape = tensor_shape.TensorShape(input_shape).as_list()
# 1050 batch_size = input_shape[0] if self.stateful else None
# -> 1051 self.input_dim = input_shape[2]
# 1052 self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
# 1053
# IndexError: list index out of range

You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.
So, the problem is simply the presence of the Flatten layer, which destroys the shape.
These are the shapes used by Conv1D and LSTM:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
Length is the same as timeSteps, and channels is the same as features.
Using the Bidirectional wrapper won't change a thing either. It will only duplicate your output features.
Classifying.
If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False. (Or you may use some flatten + dense instead after)
If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True. You should not flatten the data after them.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.