Running out of memory when running Tf.Keras model - python

I'm building a model to predict 1148 rows of 160000 columns to a number of 1-9. I've done a similar thing before in keras, but am having trouble transfering the code to tensorflow.keras. Running the program produces the following error:
(1) Resource exhausted: 00M when allocating tensor with shape(1148,1,15998,9) and type float......k:0/device:GPU:0 by allocator GPU_0_bfc..............
[[{{node conv1d/conv1d-0-0-TransposeNCHWToNWC-LayoutOptimizer}}]]
This is caused by the following code. It appears to be a memory issue, but I'm unsure why memory would be an issue. Advice would be appreciated.
num_classes=9
y_train = to_categorical(y_train,num_classes)
x_train = x_train.reshape((1148, 160000, 1))
y_train = y_train.reshape((1148, 9))
input_1 = tf.keras.layers.Input(shape=(160000,1))
conv1 = tf.keras.layers.Conv1D(num_classes, kernel_size=3, activation='relu')(input_1)
flatten_1 = tf.keras.layers.Flatten()(conv1)
output_1 = tf.keras.layers.Dense(num_classes, activation='softmax')(flatten_1)
model = tf.keras.models.Model(input_1, output_1)
my_optimizer = tf.keras.optimizers.RMSprop()
my_optimizer.lr = 0.02
model.compile(optimizer=my_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, steps_per_epoch=20)
predictions = model.predict(x_test)
Edit: model.summary
Layer-Output shape-Param#
Input_1 (inputLayer) none, 160000,1. 0 Conv1d (Conv1D) none,159998, 9
36 flatten (Flatten) none,1439982. 0 dense (Dense) none, 9. 12959847
Total Params: 12,959,883 Trainable Params 12,959,883

Without more information it is hard to give a concrete answer.
what hardware are you running on? How much memory do you have available?
At which point in the code does the error occur?
Some things you can try:
change from 32-bit float to 16 bit float, if you haven't already (2x memory reduction)
reduce the batch size by adding batch_size=16 inside model.fit (default is 32) (2x memory reduction)
If that's still not enough you need to think about applying dimensionality reduction to your feature space, which is very high dimensional (160,000)

This might sound very silly but in my case I was getting (1) Resource exhausted error because I didn't had enough space in my main harddrive. After cleaning out some space, my training scripts start working again.

Related

Keras seem to ignore my batch_size and tries to fit all data in GPU memory

I have a RNN with 2 input arrays, and 2 output arrays, the sizes are:
input1 = (339679, 90, 15)
input2 =(339679, 90, 27)
output1 = 339679,2
output2 = 339679,16
The code to create the RNN LSTM is (I will only show one of the two RNN, the other one is identical but with 16 outputs and getting input sizes from input2):
inputs = Input(shape=(n_in1.shape[1], n_in1.shape[2]), name='inputs')
lstmA1 = LSTM(1024, return_sequences=True, name="lstmA1") (inputs)
lstmA2 = LSTM(512//1, return_sequences=True, name="lstmA2")(lstmA1)
lstmA3 = LSTM(512//2, return_sequences=True, name="lstmA3")(lstmA2)
lstmA4 = LSTM(512//2, return_sequences=True, name="lstmA4")(lstmA3)
lstmA5 = LSTM(512//4, return_sequences=False, name="lstmA5")(lstmA4)
denseA1 = DenseBND(lstmA5, 512//1, "denseA1", None, False, 0.2)
denseA2 = DenseBND(denseA1, 512//4, "denseA2", None, False, 0.2)
denseA3 = DenseBND(denseA2, 512//8, "denseA3", None, False, 0.2)
outputsA = Dense(2, name="outputsA")(denseA3)
Here, n_in1 is input1 I described before, so the shape given is 90,15
DenseBND is just a function that returns a Dense layer with BatchNormalization and dropout. In this case BatchNormalization is False, the activation function is None and Dropout is 0.2 So it just returns Dense layer with linear activation function and 20% Dropout.
And finally, the line to train it:
model.fit( {'inputsA': np.array(n_input1), 'inputsB': np.array(n_input2)},
{'outputsA': np.array(n_output1), 'outputsB': np.array(n_output2)},
validation_split=0.1, epochs=1000, batch_size=256,
callbacks=callbacks_list)
You can see that the validation_split is 0.1 and the batch_size is 256
Nevertheless, when I try to train it I get the following error:
ResourceExhaustedError: OOM when allocating tensor with shape[335376,90,27] and type float on /job:
As you can see, it seems it tries to fit the whole dataset into GPU memory, instead of going batch by batch. I ever set batch_size to 1, and this error persists. The first number 335376 is a 90% of my dataset (this number is different to the one above, the one above is the one that works, this one doesn't).
Shouldn't it try to allocate tensor with shape 256,90,27?
No, keras is not ignoring your batch size.
You're trying to create numpy arrays with too large dimensions.
'inputsB': np.array(n_input2) this allocates a really large numpy array, so even before the training starts, this numpy conversion is not possible due to limited memory.
You need to use data generators which doesn't laod the full data at once into memory.
Ref: https://keras.io/api/preprocessing/

Dataset shape mismatch Conv1D input layer

Trying to set up a Conv1D layer to be the input layer in keras.
The dataset is 1000 timesteps, and each timestep has 1 feature.
After reading a bunch of answers I reshaped my dataset to be in the following format of (n_samples, timesteps, features), which corresponds to the following in my case:
train_data = (78968, 1000, 1)
test_data = (19742, 1000, 1)
train_target = (78968,)
test_target = (19742,)
I later create and compile the code using the following lines
model = Sequential()
model.add(Conv1D(64, (4), input_shape = (1000,1) ))
model.add(MaxPooling1D(pool_size=2))
model.add(Dense(1))
optimizer = opt = Adam(decay = 1.000-0.999)
model.compile(optimizer=optimizer,
loss='mean_squared_error',
metrics=['mean_absolute_error','mean_squared_error'])
Then I try to fit, note, train_target and test_target are pandas series so i'm calling DataFrame.values to convert to numpy array, i suspect there might be an issue there?
training = model.fit(train_data,
train_target.values,
validation_data=(test_data, test_target.values),
epochs=epochs,
verbose=1)
The model compiles but I get an error when I try to fit
Error when checking target: expected dense_4 to have 3 dimensions,
but got array with shape (78968, 1)
I've tried every combination of reshaping the data and can't get this to work.
I've used keras with dense layers only before for a different project where the input_dimension was specificied instead of the input_shape, so I'm not sure what I'm doing wrong here. I've read almost every stack overflow question about data shape issues and I'm afraid the problem is elsewhere, any help is appreciated, thank you.
Under the line model.add(MaxPooling1D(pool_size=2)), add one line model.add(Flatten()), your problem will be solved. Flatten function will help you convert your data into correct shape, please see this site for more information https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten

Building/Training 1D CNN for sequence-InvalidArgument

I'm building and training a CNN for a sequence, and have been using RNN's successfully, but am running into issues with CNN.
Here's the code, cnn1 is first (more complex model), tried getting a simpler one to fit and getting errors on both:
The shapes are as follows:
xtrain (5206, 19, 4)
ytrain (5206, 4)
xvalid (651, 19, 4)
yvalid (651, 4)
xtest (651, 19, 4)
ytest (651, 4)
I've tried just about every combination of kernel sizes and nodes I can think of, tried 2 different model builds.
model_cnn1.add(keras.layers.Conv1D(32, (4), activation='relu'))
model_cnn1.add(keras.layers.MaxPooling1D((4)))
model_cnn1.add(keras.layers.Conv1D(32, (4), activation='relu'))
model_cnn1.add(keras.layers.MaxPooling1D((4)))
model_cnn1.add(keras.layers.Conv1D(32, (4), activation='relu'))
model_cnn1.add(keras.layers.Dense(4))
model_cnn2 = keras.models.Sequential([
keras.layers.Conv1D(100,(4),input_shape=(19,4),activation='relu'),
keras.layers.MaxPooling1D(4),
keras.layers.Dense(4)
])
model_cnn2.compile(loss='mse',optimizer='adam',metrics= ['mse','accuracy'])
model_cnn2.fit(X_train_tf,y_train_tf,epochs=25)
Output is 1/25 epochs, not entirely run, then on cnn1 I receive some variation of (final line):
ValueError: Negative dimension size caused by subtracting 4 from 1 for
'max_pooling1d_26/MaxPool' (op: 'MaxPool') with input shapes:
[?,1,1,32]
on cnn2 (simpler) I get error (final line):
InvalidArgumentError: Incompatible shapes: [32,4,4] vs. [32,4]
[[{{node metrics_6/mse/SquaredDifference}}]]
[Op:__inference_keras_scratch_graph_6917]
In general, is there some rule I should be following here for kernels/nodes/etc? I always seem to get these errors on the shape.
I'm hoping after I build a model of each type I'll understand the ins and outs--no pun intended--but it's driving me crazy!
I've tried every combination of
You can read up on the docs of the Conv1D and MaxPooling1D to read that these layers change the output shape depending on the value for strides. In your case you can keep the output shape for Conv1D equal by specifying a padding. MaxPooling1D changes the output shape by definition. With strides = 4, the output shape will be 4 times smaller in fact. I'd suggest carefully reading the docs to figure out exactly what happens and learning about the underlying theory of CNNs as to why this happens.

Stateful LSTM and stream predictions

I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, with a shape the like below sample (numbers below are just placeholders for the purpose of explanation), each batch is labeled 0 or 1:
Data:
[
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
...
]
i.e: batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so batch has shape (m73))
Target:
[
[1]
[0]
[1]
...
]
On my production environment data is a stream of samples with 3 features ([1,2,3],[1,2,3]...). I would like to stream each sample as it arrives to my model and get the intermediate probability without waiting for the entire batch (7) - see the animation below.
One of my thoughts was padding the batch with 0 for the missing samples,
[[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]] but that seems to be inefficient.
Will appreciate any help that will point me in the right direction of both saving the LSTM intermediate state in a persistent way, while waiting for the next sample and predicting on a model trained on a specific batch size with partial data.
Update, including model code:
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]
first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features),
return_sequences=True, activation='tanh')
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt,
metrics=['accuracy', keras_metrics.precision(),
keras_metrics.recall(), f1])
Model Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100, 32) 6272
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 100, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 32) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 100, 16) 3136
_________________________________________________________________
dropout_2 (Dropout) (None, 100, 16) 0
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 100, 16) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 1600) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 1601
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________
I think there might be an easier solution.
If your model does not have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True
Warning: your model has layers that act on the length dimension !!
The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal. If the Flatten layer is expecting 7 steps, you will always need 7 steps.
So, before applying my answer below, fix your model to not use the Flatten layer. Instead, it can just remove the return_sequences=True for the last LSTM layer.
The following code fixed that and also prepares a few things to be used with the answer below:
def createModel(forTraining):
#model for training, stateful=False, any batch size
if forTraining == True:
batchSize = None
stateful = False
#model for predicting, stateful=True, fixed batch size
else:
batchSize = 1
stateful = True
model = Sequential()
first_lstm = LSTM(32,
batch_input_shape=(batchSize, num_samples, num_features),
return_sequences=True, activation='tanh',
stateful=stateful)
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
#this is the last LSTM layer, use return_sequences=False
model.add(LSTM(16, return_sequences=False, stateful=stateful, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
#don't add a Flatten!!!
#model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
if forTraining == True:
compileThisModel(model)
With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.
The usage of a stateful model as a solution for your question
First, train this new model again, because it has no Flatten layer:
trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)
Now, with this trained model, you can simply create a new model exactly the same way you created the trained model, but marking stateful=True in all its LSTM layers. And we should copy the weights from the trained model.
Since these new layers will need a fixed batch size (Keras' rules), I assumed it would be 1 (one single stream is coming, not m streams) and added it to the model creation above.
predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())
And voilĂ . Just predict the outputs of the model with a single step:
pseudo for loop as samples arrive to your model:
prob = predictingModel.predict_on_batch(sample)
#where sample.shape == (1, 1, 3)
When you decide that you reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model thinking it should be mended at the end of the previous one.
Saving and loading states
Just get and set them, saving with h5py:
def saveStates(model, saveName):
f = h5py.File(saveName,'w')
for l, lay in enumerate(model.layers):
#if you have nested models,
#consider making this recurrent testing for layers in layers
if isinstance(lay,RNN):
for s, stat in enumerate(lay.states):
f.create_dataset('states_' + str(l) + '_' + str(s),
data=K.eval(stat),
dtype=K.dtype(stat))
f.close()
def loadStates(model, saveName):
f = h5py.File(saveName, 'r')
allStates = list(f.keys())
for stateKey in allStates:
name, layer, state = stateKey.split('_')
layer = int(layer)
state = int(state)
K.set_value(model.layers[layer].states[state], f.get(stateKey))
f.close()
Working test for saving/loading states
import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K
def createModel():
inp = Input(batch_shape=(1,None,3))
out = LSTM(5,return_sequences=True, stateful=True)(inp)
out = LSTM(2, stateful=True)(out)
out = Dense(1)(out)
model = Model(inp,out)
return model
def saveStates(model, saveName):
f = h5py.File(saveName,'w')
for l, lay in enumerate(model.layers):
#if you have nested models, consider making this recurrent testing for layers in layers
if isinstance(lay,RNN):
for s, stat in enumerate(lay.states):
f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))
f.close()
def loadStates(model, saveName):
f = h5py.File(saveName, 'r')
allStates = list(f.keys())
for stateKey in allStates:
name, layer, state = stateKey.split('_')
layer = int(layer)
state = int(state)
K.set_value(model.layers[layer].states[state], f.get(stateKey))
f.close()
def printStates(model):
for l in model.layers:
#if you have nested models, consider making this recurrent testing for layers in layers
if isinstance(l,RNN):
for s in l.states:
print(K.eval(s))
model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states
print('model1')
printStates(model1)
print('model2')
printStates(model2)
saveStates(model1,'testStates5')
loadStates(model2,'testStates5')
print('model1')
printStates(model1)
print('model2')
printStates(model2)
Considerations on the aspects of the data
In your first model (if it is stateful=False), it considers that each sequence in m is individual and not connected to the others. It also considers that each batch contains unique sequences.
If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous sequence). And then you would need m batches of 1 sequence. -> m x (1, 7 or None, 3).
If I understood correctly, you have batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so batch has shape (m*7*3)).
In any Keras RNN you can set the
return_sequences flag to True to become the intermediate states, i.e., for every batch, instead of the definitive prediction, you will get the corresponding 7 outputs, where output i represents the prediction at stage i given all inputs from 0 to i.
But you would be getting all at once at the end. As far as I know, Keras doesn't provide a direct interface for retrieving the throughput whilst the batch is being processed. This may be even more constrained if you are using any of the CUDNN-optimized variants. What you can do is basically to regard your batch as 7 succesive batches of shape (m*1*3), and feed them progressively to your LSTM, recording the hidden state and prediction at each step. For that, you can either set return_state to True and do it manually, or you can simply set statefulto True and let the object keep track of it.
The following Python2+Keras example should exactly represent what you want. Specifically:
allowing to save the whole LSTM intermediate state in a persistent way
while waiting for the next sample
and predicting on a model trained on a specific batch size that may be arbitrary and unknown.
For that, it includes an example of stateful=True for easiest training, and return_state=True for most precise inference, so you get a flavor of both approaches. It also assumes that you get a model that has been serialized and from which you don't know much about. The structure is closely related to the one in Andrew Ng's course, who is definitely more authoritative than me in the topic. Since you don't specify how the model has been trained, I assumed a many-to-one training setup, but this could be easily adapted.
from __future__ import print_function
from keras.layers import Input, LSTM, Dense
from keras.models import Model, load_model
from keras.optimizers import Adam
import numpy as np
# globals
SEQ_LEN = 7
HID_DIMS = 32
OUTPUT_DIMS = 3 # outputs are assumed to be scalars
##############################################################################
# define the model to be trained on a fixed batch size:
# assume many-to-one training setup (otherwise set return_sequences=True)
TRAIN_BATCH_SIZE = 20
x_in = Input(batch_shape=[TRAIN_BATCH_SIZE, SEQ_LEN, 3])
lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, stateful=True)
dense = Dense(OUTPUT_DIMS, activation='linear')
m_train = Model(inputs=x_in, outputs=dense(lstm(x_in)))
m_train.summary()
# a dummy batch of training data of shape (TRAIN_BATCH_SIZE, SEQ_LEN, 3), with targets of shape (TRAIN_BATCH_SIZE, 3):
batch123 = np.repeat([[1, 2, 3]], SEQ_LEN, axis=0).reshape(1, SEQ_LEN, 3).repeat(TRAIN_BATCH_SIZE, axis=0)
targets = np.repeat([[123,234,345]], TRAIN_BATCH_SIZE, axis=0) # dummy [[1,2,3],,,]-> [123,234,345] mapping to be learned
# train the model on a fixed batch size and save it
print(">> INFERECE BEFORE TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
m_train.compile(optimizer=Adam(lr=0.5), loss='mean_squared_error', metrics=['mae'])
m_train.fit(batch123, targets, epochs=100, batch_size=TRAIN_BATCH_SIZE)
m_train.save("trained_lstm.h5")
print(">> INFERECE AFTER TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
##############################################################################
# Now, although we aren't training anymore, we want to do step-wise predictions
# that do alter the inner state of the model, and keep track of that.
m_trained = load_model("trained_lstm.h5")
print(">> INFERECE AFTER RELOADING TRAINED MODEL:", m_trained.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
# now define an analogous model that allows a flexible batch size for inference:
x_in = Input(shape=[SEQ_LEN, 3])
h_in = Input(shape=[HID_DIMS])
c_in = Input(shape=[HID_DIMS])
pred_lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, return_state=True, name="lstm_infer")
h, cc, c = pred_lstm(x_in, initial_state=[h_in, c_in])
prediction = Dense(OUTPUT_DIMS, activation='linear', name="dense_infer")(h)
m_inference = Model(inputs=[x_in, h_in, c_in], outputs=[prediction, h,cc,c])
# Let's confirm that this model is able to load the trained parameters:
# first, check that the performance from scratch is not good:
print(">> INFERENCE BEFORE SWAPPING MODEL:")
predictions, hs, zs, cs = m_inference.predict([batch123,
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
batch_size=1)
print(predictions)
# import state from the trained model state and check that it works:
print(">> INFERENCE AFTER SWAPPING MODEL:")
for layer in m_trained.layers:
if "lstm" in layer.name:
m_inference.get_layer("lstm_infer").set_weights(layer.get_weights())
elif "dense" in layer.name:
m_inference.get_layer("dense_infer").set_weights(layer.get_weights())
predictions, _, _, _ = m_inference.predict([batch123,
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
batch_size=1)
print(predictions)
# finally perform granular predictions while keeping the recurrent activations. Starting the sequence with zeros is a common practice, but depending on how you trained, you might have an <END_OF_SEQUENCE> character that you might want to propagate instead:
h, c = np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)), np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))
for i in range(len(batch123)):
# about output shape: https://keras.io/layers/recurrent/#rnn
# h,z,c hold the network's throughput: h is the proper LSTM output, c is the accumulator and cc is (probably) the candidate
current_input = batch123[i:i+1] # the length of this feed is arbitrary, doesn't have to be 1
pred, h, cc, c = m_inference.predict([current_input, h, c])
print("input:", current_input)
print("output:", pred)
print(h.shape, cc.shape, c.shape)
raw_input("do something with your prediction and hidden state and press any key to continue")
Additional information:
Since we have two forms of state persistency:
1. The saved/trained parameters of the model that are the same for each sequence
2. The a, c states that evolve throughout the sequences and may be "restarted"
It is interesting to take a look at the guts of the LSTM object. In the Python example that I provide, the a and c weights are explicitly handled, but the trained parameters aren't, and it may not be obvious how they are internally implemented or what do they mean. They can be inspected as follows:
for w in lstm.weights:
print(w.name, w.shape)
In our case (32 hidden states) returns the following:
lstm_1/kernel:0 (3, 128)
lstm_1/recurrent_kernel:0 (32, 128)
lstm_1/bias:0 (128,)
We observe a dimensionality of 128. Why is that? this link describes the Keras LSTM implementation as follows:
The g is the recurrent activation, p is the activation, Ws are the kernels, Us are the recurrent kernels, h is the hidden variable which is the output too and the notation * is an element-wise multiplication.
Which explains the 128=32*4 being the parameters for the affine transformation happening inside each one of the 4 gates, concatenated:
The matrix of shape (3, 128) (named kernel) handles the input for a given sequence element
The matrix of shape (32, 128) (named recurrent_kernel) handles the input for the last recurrent state h.
The vector of shape (128,) (named bias), as usual in any other NN setup.
Note: This answer assumes that your model in training phase is not stateful. You must understand what an stateful RNN layer is and make sure that the training data has the corresponding properties of statefulness. In short it means there is a dependency between the sequences, i.e. one sequence is the follow-up to another sequence, which you want to consider in your model. If your model and training data is stateful then I think other answers which involve setting stateful=True for the RNN layers from the beginning are simpler.
Update: No matter the training model is stateful or not, you can always copy its weights to the inference model and enable statefulness. So I think solutions based on setting stateful=True are shorter and better than mine. Their only drawback is that the batch size in these solutions must be fixed.
Note that the output of a LSTM layer over a single sequence is determined by its weight matrices, which are fixed, and its internal states which depends on the previous processed timestep. Now to get the output of LSTM layer for a single sequence of length m, one obvious way is to feed the entire sequence to the LSTM layer in one go. However, as I stated earlier, since its internal states depends on the previous timestep, we can exploit this fact and feed that single sequence chunk by chunk by getting the state of LSTM layer at the end of processing a chunk and pass it to the LSTM layer for processing the next chunk. To make it more clear, suppose the sequence length is 7 (i.e. it has 7 timesteps of fixed-length feature vectors). As an example, it is possible to process this sequence like this:
Feed the timesteps 1 and 2 to the LSTM layer; get the final state (call it C1).
Feed the timesteps 3, 4 and 5 and state C1 as the initial state to the LSTM layer; get the final state (call it C2).
Feed the timesteps 6 and 7 and state C2 as the initial state to the LSTM layer; get the final output.
That final output is equivalent to the output produced by the LSTM layer if we had feed it the entire 7 timesteps at once.
So to realize this in Keras, you can set the return_state argument of LSTM layer to True so that you can get the intermediate state. Further, don't specify a fixed timestep length when defining the input layer. Instead use None to be able to feed the model with sequences of arbitrary length which enables us to process each sequence progressively (it's fine if your input data in training time are sequences of fixed-length).
Since you need this chuck processing capability in inference time, we need to define a new model which shares the LSTM layer used in training model and can get the initial states as input and also gives the resulting states as output. The following is a general sketch of it could be done (note that the returned state of LSTM layer is not used when training the model, we only need it in test time):
# define training model
train_input = Input(shape=(None, n_feats)) # note that the number of timesteps is None
lstm_layer = LSTM(n_units, return_state=True)
lstm_output, _, _ = lstm_layer(train_input) # note that we ignore the returned states
classifier = Dense(1, activation='sigmoid')
train_output = classifier(lstm_output)
train_model = Model(train_input, train_output)
# compile and fit the model on training data ...
# ==================================================
# define inference model
inf_input = Input(shape=(None, n_feats))
state_h_input = Input(shape=(n_units,))
state_c_input = Input(shape=(n_units,))
# we use the layers of previous model
lstm_output, state_h, state_c = lstm_layer(inf_input,
initial_state=[state_h_input, state_c_input])
output = classifier(lstm_output)
inf_model = Model([inf_input, state_h_input, state_c_input],
[output, state_h, state_c]) # note that we return the states as output
Now you can feed the inf_model as much as the timesteps of a sequence are available right now. However, note that initially you must feed the states with vectors of all zeros (which is the default initial value of states). For example, if the sequence length is 7, a sketch of what happens when new data stream is available is as follows:
state_h = np.zeros((1, n_units,))
state_c = np.zeros((1, n_units))
# three new timesteps are available
outputs = inf_model.predict([timesteps, state_h, state_c])
out = output[0,0] # you may ignore this output since the entire sequence has not been processed yet
state_h = outputs[0,1]
state_c = outputs[0,2]
# after some time another four new timesteps are available
outputs = inf_model.predict([timesteps, state_h, state_c])
# we have processed 7 timesteps, so the output is valid
out = output[0,0] # store it, pass it to another thread or do whatever you want to do with it
# reinitialize the state to make them ready for the next sequence chunk
state_h = np.zeros((1, n_units))
state_c = np.zeros((1, n_units))
# to be continued...
Of course you need to do this in some kind of loop or implement a control flow structure to process the data stream, but I think you get what the general idea looks like.
Finally, although your specific example is not a sequence-to-sequence model, but I highly recommend to read the official Keras seq2seq tutorial which I think one can learn a lot of ideas from it.
As far as I know, because of the static graph in Tensorflow, there is no efficient way to feed inputs with different length from the training input length.
Padding is the official way to work around with that, but it is less efficient and memory consuming. I suggest you look into Pytorch, which will be trivial to fix your problem.
There are a lot of great posts to build lstm with Pytorch, and you will understand the benefit of dynamic graph once you see them.

Simple stateful nn Keras, can fit model but not predict

I'm trying to create a simple stateful neural network in keras to wrap my head around how to connect Embedding layers and LSTM's. I have a piece of text where I have mapped every character to a integer and would like to send in one character at a time to predict the next character. I have done this earlier where I have sent in 8 characters at a time and got that to work well (using return_sequences=True and TimeDistributed(Dense)). But this time I want to only send in 1 character at a time and this is where my problem arises.
The code I use to set up my model:
n_fac = 32
vocab_size = len(chars)
n_hidden = 256
batch_size=64
model = Sequential()
model.add(Embedding(vocab_size,n_fac,input_length=1,batch_input_shape=(batch_size,1)))
model.add(BatchNormalization())
model.add(LSTM(n_hidden,stateful=True))
model.add(Dense(vocab_size,activation='softmax'))
model.summary() gives me the following:
Layer (type) Output Shape Param # Connected to
embedding_1 (Embedding) (64, 1, 32) 992 embedding_input_1[0][0]
batchnormalization_1 (BatchNorma (64, 1, 32) 128 embedding_1[0][0]
lstm_1 (LSTM) (64, 256) 295936 batchnormalization_1[0][0]
dense_1 (Dense) (64, 31) 7967 lstm_1[0][0]
Total params: 305,023
Trainable params: 304,959
Non-trainable params: 64
The code I use to set up my training data:
text = ... #Omitted for simplicity. Just setting text to some kind of literature work
text = text.lower() #Simple model, therefor only using lower case characters
idx2char = list(set(list(text)))
char2idx = {char:idx for idx,char in enumerate(idx2char)}
text_in_idx = [char2idx[char] for char in text]
x = text_idx[:-1]
y = text_idx[1:]
Compiling and training my network:
model.compile(optimizer=Adam(lr=1e-4),loss='sparse_categorical_crossentropy')
nb_epoch = 10
for i in range(nb_epoch):
model.reset_states()
model.fit(x,y,nb_epoch=1,batch_size=batch_size,shuffle=False)
Training works as it should, the loss is reduced with each epoch.
Now I want to try out my trained network but have no idea how to give it a character to predict the next. I start out by resetting its states and then want to start feeding it one char at a time.
I tried a couple of different inputs but all of them failed. These are not qualified guesses.
#The model uses integers for characters, therefor integers are sent as input
model.predict([1]) #Type error
model.predict(np.array([1])) #Value error
model.predict(np.array([1])[np.newaxis,:]) #Value error
model.predict(np.array([1])[:,np.newaxis]) #Value error
Am I forced to send in something of length batch_size or how am I supposed to send in data for the model to predict something?
The error text for Value error is very long and obscure so I omitted it. I can supply it if needed.
Using theano backend with keras.

Categories