I'm having some trouble with the LSTM implementation in Keras.
My training set is structured as follows:
number of sequences: 5358
the length of each sequence is 300
each element of the sequence is a vector of 54 features
I'm unsure how to shape the input for a stateful LSTM.
Following this tutorial, http://philipperemy.github.io/keras-stateful-lstm/, I've created the subsequences (in my case there are 1452018 subsequences with window_size = 30).
What is the best option to reshape the data for a stateful LSTM's input?
What does the timestep of the input mean in this case? And why?
Is the batch_size related to the timestep?
I'm unsure on how to shape the input for a stateful LSTM.
LSTM(100, stateful=True)
But before using a stateful LSTM, ask yourself: do I really need a stateful LSTM? See here and here for more details.
What is the best option to reshape the data for a stateful LSTM's input?
It really depends on the problem at hand. However, I think you do not need any reshaping; just feed the data directly into Keras:
input_layer = Input(shape=(300, 54))
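Note, though, that if you do go stateful, Keras needs a fully specified batch shape rather than just (300, 54). A minimal sketch, assuming a batch size of 6 (which evenly divides your 5358 sequences) and a placeholder sigmoid output head:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

# Stateful LSTMs need a fixed batch size, given via batch_shape.
input_layer = Input(batch_shape=(6, 300, 54))
lstm = LSTM(100, stateful=True)(input_layer)
output = Dense(1, activation='sigmoid')(lstm)
model = Model(inputs=input_layer, outputs=output)
model.compile("adam", loss='binary_crossentropy')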
What does the timestep of the input mean in this case? And why?
In your example the timestep is 300. See here for further details on timesteps. [Figure: 5 timesteps being fed into the LSTM network.]
Is the batch_size related to the timestep?
No, it has nothing to do with batch_size. More details on batch_size can be found here.
Here is some simple code based on the description you provided. It might give you some intuition:
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

# Dummy data matching the shapes you describe:
# 5358 sequences, 300 timesteps each, 54 features per timestep.
x_train = np.zeros(shape=(5358, 300, 54))
y_train = np.zeros(shape=(5358, 1))

input_layer = Input(shape=(300, 54))
lstm = LSTM(100)(input_layer)
dense1 = Dense(20, activation='relu')(lstm)
dense2 = Dense(1, activation='sigmoid')(dense1)

model = Model(inputs=input_layer, outputs=dense2)
model.compile("adam", loss='binary_crossentropy')
model.fit(x_train, y_train, batch_size=512)
I'm trying to implement an LSTM model for DNA sequence classification, but at the moment it is unusable because of how long it takes to train (25 seconds per epoch over 6.5K sequences, about 4 ms per sample, and we need to train several versions of the model over hundreds of thousands of sequences).
A DNA sequence can be represented as a string of A, C, G, and T; for example, "ACGGGTGACAT" is a single DNA sequence. Each sequence belongs to one of two categories that I am trying to predict, and each sequence contains 1000 characters.
Initially, my model did not include an Embedding layer; instead, I manually converted each sequence into a one-hot encoded matrix (4 rows by 1000 columns), and the model didn't work great but was incredibly fast. By this point, though, I had read online that using an embedding layer has clear advantages. So I added an embedding layer, and instead of using the one-hot encoded matrix I converted the sequences into integers, with each character represented by a different integer.
Indeed the model works much better now, but it is about 30 times slower and impossible to work with. Is there something I can do here to speed up the embedding layer?
Here are the functions for constructing and fitting my model:
from tensorflow.keras.layers import Embedding, Dense, LSTM, Activation
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

def build_model():
    # initialize a sequential model
    model = Sequential()
    # add embedding layer
    model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
    # add LSTM layer
    model.add(LSTM(5))
    # add Dense NN layer
    model.add(Dense(units=2))
    model.add(Activation('softmax'))
    optimizer = Adam(clipnorm=1.)
    model.compile(
        loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy']
    )
    return model

def train_model(X_train, y_train, epochs, batch_size):
    model = build_model()
    # y_train is initially a list of zeroes and ones; it needs to be converted to categorical
    y_train = to_categorical(y_train)
    history = model.fit(
        X_train, y_train, epochs=epochs, batch_size=batch_size
    )
    return model, history
Any help will be greatly appreciated - after much googling and trial-and-error, I can't seem to speed this up.
A possible suggestion is to use a "cheaper" RNN, such as SimpleRNN, instead of LSTM, since it has fewer parameters to train. In some simple testing, I got a ~3x speed-up over LSTM with the same Embedding processing you currently have. I'm not sure whether you can reduce the sequence length from 1000 to a lower number, but that might be a direction to explore as well. I hope this helps.
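For illustration, a minimal sketch of that swap, reusing the layer sizes from build_model above (so the sizes are assumptions carried over from the question, not tuned values):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Activation

model = Sequential()
model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
model.add(SimpleRNN(5))  # fewer parameters than LSTM(5), so cheaper per step
model.add(Dense(units=2))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])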
I'm trying to set up a Conv1D layer as the input layer in Keras.
The dataset is 1000 timesteps, and each timestep has 1 feature.
After reading a bunch of answers I reshaped my dataset to be in the following format of (n_samples, timesteps, features), which corresponds to the following in my case:
train_data = (78968, 1000, 1)
test_data = (19742, 1000, 1)
train_target = (78968,)
test_target = (19742,)
I then create and compile the model using the following lines:
model = Sequential()
model.add(Conv1D(64, 4, input_shape=(1000, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Dense(1))
optimizer = Adam(decay=1.000 - 0.999)
model.compile(optimizer=optimizer,
              loss='mean_squared_error',
              metrics=['mean_absolute_error', 'mean_squared_error'])
Then I try to fit. Note that train_target and test_target are pandas Series, so I'm calling .values to convert them to NumPy arrays; I suspect there might be an issue there?
training = model.fit(train_data,
                     train_target.values,
                     validation_data=(test_data, test_target.values),
                     epochs=epochs,
                     verbose=1)
The model compiles, but I get an error when I try to fit:
Error when checking target: expected dense_4 to have 3 dimensions,
but got array with shape (78968, 1)
I've tried every combination of reshaping the data and can't get this to work.
I've used Keras with only Dense layers before, for a different project where input_dim was specified instead of input_shape, so I'm not sure what I'm doing wrong here. I've read almost every Stack Overflow question about data shape issues, and I'm afraid the problem is elsewhere. Any help is appreciated, thank you.
Under the line model.add(MaxPooling1D(pool_size=2)), add the line model.add(Flatten()) and your problem will be solved. The Flatten layer will convert your data into the correct shape; please see this page for more information: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten
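Roughly, the fixed model would look like this (a sketch based on the code in the question):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(64, 4, input_shape=(1000, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())   # collapses (timesteps, channels) to a flat vector
model.add(Dense(1))    # now outputs shape (batch, 1), matching the targets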
The problem is the following. I have a categorical prediction task with a vocabulary size of 25K. On one of the inputs (input vocab 10K, output dim, i.e. embedding size, 50), I want to introduce a trainable weight matrix for a matrix multiplication between the input embedding (shape (1, 50)) and the weights (shape (50, 128)), with no bias, and the resulting vector score is an input for a prediction task along with other features.
The crux is that, I think, the trainable weight matrix varies for each input if I simply add it in. I want this weight matrix to be common across all inputs.
I should clarify: by input here I mean training examples. So all examples would learn some example-specific embedding and be multiplied by a shared weight matrix.
After every so many epochs, I intend to do a batch update to learn these common weights (or use other target variables to do multiple output prediction)
LSTM? Is that something I should look into here?
With the exception of an Embedding layer, layers apply to all examples in the batch.
Take as an example a very simple network:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(4,))
h1 = Dense(2, activation='relu', use_bias=False)(inp)
out = Dense(1)(h1)
model = Model(inp, out)
This is a simple network with one input layer, one hidden layer, and an output layer. If we take the hidden layer as an example: this layer has a weight matrix of shape (4, 2). At each iteration the input data, which is a matrix of shape (batch_size, 4), is multiplied by the hidden layer weights (the feed-forward phase). Thus the h1 activation is dependent on all samples. The loss is also computed on a per-batch_size basis. The output layer has shape (batch_size, 1). Given that in the forward phase all the batch samples affected the output values, the same is true for backprop and gradient updates.
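To see this concretely, you can push two differently sized batches through the network; both go through the exact same (4, 2) kernel (a small check I'm adding here, assuming the model defined above):
import numpy as np

batch_a = np.random.rand(32, 4)  # 32 samples
batch_b = np.random.rand(64, 4)  # 64 samples

# Both batches are transformed by the same shared weights.
print(model.predict(batch_a).shape)  # (32, 1)
print(model.predict(batch_b).shape)  # (64, 1)
print(model.count_params())          # 4*2 + (2*1 + 1) = 11 trainable parameters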
When one is dealing with text, the problem is often specified as predicting a specific label from a sequence of words. This is modelled with a shape of (batch_size, sequence_length, word_index). Let's take a very basic example:
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model

sequence_length = 80
emb_vec_size = 100
vocab_size = 10_000

def make_model():
    inp = Input(shape=(sequence_length, 1))
    # (batch, seq_len, 1) -> (batch, seq_len, 1, emb_vec_size)
    emb = Embedding(vocab_size, emb_vec_size)(inp)
    # squeeze the extra axis back out
    emb = Reshape((sequence_length, emb_vec_size))(emb)
    h1 = Dense(64)(emb)
    recurrent = LSTM(32)(h1)
    output = Dense(1)(recurrent)
    model = Model(inp, output)
    model.compile('adam', 'mse')
    return model

model = make_model()
model.summary()
You can copy and paste this into colab and see the summary.
What this example is doing is:
Transforming a sequence of word indices into a sequence of word embedding vectors.
Applying a Dense layer called h1 to all the batches (and all the elements in the sequence); this layer reduces the dimensions of the embedding vector. It is not a typical element of a network that processes text (in isolation), but it seemed to match your question.
Using a recurrent layer to reduce the sequence into a single vector per example.
Predicting a single label from the "sentence" vector.
If I understand the problem correctly, you can reuse layers or even whole models inside another model.
Here is an example with a Dense layer. Let's say you have 10 inputs:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# defining 10 inputs in a list, each with shape (X,)
# (X is a placeholder for your feature dimension)
inputs = [Input(shape=(X,), name='input_{}'.format(k)) for k in range(10)]

# defining a common Dense layer
D = Dense(64, name='one_layer_to_rule_them_all')
nets = [D(inp) for inp in inputs]

model = Model(inputs=inputs, outputs=nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
This code is not going to work if the inputs have different shapes: the first call to D defines its properties. In this example the outputs are set directly to nets, but of course you can concatenate, stack, or do whatever you want (see the sketch below).
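For example, a hypothetical single-output head on top of the shared layer (the Concatenate and final Dense here are my own illustration, not part of the original setup):
from tensorflow.keras.layers import Concatenate

merged = Concatenate()(nets)                  # shape (batch, 10 * 64)
head = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=inputs, outputs=head)
model.compile(optimizer='adam', loss='binary_crossentropy')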
Now, if you have some trainable model, you can use it instead of D:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# defining 10 inputs in a list, each with shape (X,)
inputs = [Input(shape=(X,), name='input_{}'.format(k)) for k in range(10)]

# applying a shared model (special_model, defined elsewhere) with the same
# weights for all inputs
nets = [special_model(inp) for inp in inputs]

model = Model(inputs=inputs, outputs=nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
The weights of this model are shared among all inputs.
My input data consists of 10 samples, each of which has 200 time steps, while each time step is described by a vector of 30 dimensions.
In addition, each time step has a 3-dimensional one-hot vector which describes the action taken at that particular time step. With that being said, I am trying to build a model which gets fed all previous actions and then predicts which action would be best to take next.
I tried to get this working with tflearn and tensorflow but with limited success so far.
Simple sample code:
import tflearn

SAMPLES = 10
TIME_STEPS = 200
DATA_DIMENSIONS = 30
LABEL_CLASSES = 3

x = []
y = []

# Generate fake data.
for i in range(SAMPLES):
    sequences = []
    outputs = []
    for t in range(TIME_STEPS):
        d = []
        for f in range(DATA_DIMENSIONS):
            d.append(1)
        sequences.append(d)
        outputs.append([0, 0, 1])
    x.append(sequences)
    y.append(outputs)

print("X1:", len(x), ", X2:", len(x[0]), ", X3:", len(x[0][0]))
print("Y1:", len(y), ", Y2:", len(y[0]), ", Y3:", len(y[0][0]))

# Define model.
net = tflearn.input_data([None, TIME_STEPS, DATA_DIMENSIONS], name='input')
net = tflearn.lstm(net, 128, dropout=0.8, return_seq=True)
net = tflearn.fully_connected(net, LABEL_CLASSES, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(net)

# Fit model.
model.fit({'input': x}, {'targets': y},
          n_epoch=1,
          snapshot_step=1000,
          show_metric=True, run_id='test', batch_size=32)
Error
ValueError: Cannot feed value of shape (10, 200, 3) for Tensor
'targets/Y:0', which has shape '(?, 3)'
As far as I understand, the input_data should be correct. However, the output data is apparently wrong; at least, TensorFlow throws an error. That is probably because my model expects one label per sample rather than one label per time step.
Can I even achieve my goal with an LSTM, and if so, how do I have to set up my model?
Thanks,
Robert
As the error suggests, there is a shape mismatch between the expected size of your targets tensor and that of the data you actually provide for it. Let us break it down.
From what I understand, you have a labeled action for every timestep of your sequences. This means that the labels you provide should have a shape of (10, 200, 3). This seems to be the case from the error message. Good.
So we now know the error comes from what the network generates.
=================
Input data -> (10, 200, 30)
LSTM -> (10, 128) (because return_seq=False)
FullyConnected -> (10, 3).
=================
So that explains the second part of the error message: your network indeed produces an output with shape (10, 3), which mismatches that of your data.
I think you missed the return_seq argument of the LSTM. As is usually the case with RNN implementations, there is a parameter telling the layer whether to return outputs for the whole sequence or only for the last timestep. Here, by default, it is the second option, which is why you don't get an output with the expected shape. Use return_seq=True.
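For comparison, here is a minimal sketch of the same per-timestep setup in plain Keras (my own illustration, not the tflearn API), where return_sequences=True plays the role of return_seq=True:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

model = Sequential([
    # return the hidden state at every timestep, not just the last one
    LSTM(128, return_sequences=True, input_shape=(200, 30)),
    # apply the same softmax classifier at each timestep
    TimeDistributed(Dense(3, activation='softmax')),
])
model.compile('adam', 'categorical_crossentropy')

# Targets now carry one label per timestep: shape (samples, 200, 3).
x = np.ones((10, 200, 30))
y = np.tile([0, 0, 1], (10, 200, 1)).astype('float32')
model.fit(x, y, epochs=1, batch_size=32)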
I am brand new to deep learning, so I'm reading through Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try to implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials or code snippets I've encountered do not implement anything remotely similar to this (e.g. image recognition), or they use an older version of Keras with different functions and usage.
What I'm trying to do is a variation of this paper:
(1) convert DNA sequences to one-hot encoding vectors; ✓
(2) use a 1 dimensional convolutional neural network; ✓
(3) with max pooling; ✓
(4) send the output to a bidirectional RNN; ⓧ
(5) classify the input;
I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?
Note:
The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py, but I simplified the output layer to just do binary classification. This process was described (kind of) in https://github.com/fchollet/keras/issues/3322, but I cannot get it to work with the updated Keras. The original code (and the 2nd link) work on a very large dataset, so I am generating some fake data to illustrate the concept. They are also using an older version of Keras where key functionality has changed since then.
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential

# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0)  # For keras?

# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25

# Generate sequences (elided): https://pastebin.com/GvfLQte2

# Build model
# ===========
# Input layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS, 4))

# Hidden layers
y = Conv1D(100, 10, strides=1, activation="relu")(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences=True, activation="tanh"))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)

# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)
model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
# 1049 input_shape = tensor_shape.TensorShape(input_shape).as_list()
# 1050 batch_size = input_shape[0] if self.stateful else None
# -> 1051 self.input_dim = input_shape[2]
# 1052 self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
# 1053
# IndexError: list index out of range
You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.
So, the problem is simply the presence of the Flatten layer, which destroys the shape.
These are the shapes used by Conv1D and LSTM:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
Length is the same as timeSteps, and channels is the same as features.
Using the Bidirectional wrapper won't change a thing either. It will only duplicate your output features.
Classifying.
If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False (or you may use some Flatten + Dense combination after it instead).
If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True, and you should not flatten the data after them.
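Putting this together for the code in the question, a sketch of the corrected hidden layers (assuming you classify each sequence as a whole) would be:
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Dense
from tensorflow.keras.models import Model

NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2

input_layer = Input(shape=(NUMBER_OF_POSITIONS, 4))
y = Conv1D(100, 10, strides=1, activation="relu")(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)  # still (batch, length, channels)
# No Flatten here: the RNN consumes (batch, timeSteps, features) directly.
y = Bidirectional(SimpleRNN(100, return_sequences=False, activation="tanh"))(y)
y = Dense(100, activation='relu')(y)
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)

model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()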