I'm very new to keras and also to python.
I have a time series dataset with different sequence lengths (for example 1st sequence is 484000x128, 2nd sequence is 563110x128, etc)
I've put the sequences in 3D array.
My question is how to define the input shape, because I'm confused. I was using DL4J but the concept is different in defining the network configuration.
Here is my first trial code:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding,LSTM,Dense,Dropout
## Loading dummy data
sequences = np.array([[[1,2,3],[1,2,3]], [[4,5,6],[4,5,6],[4,5,6]]])
y = np.array([[[0],[0]], [[1],[1],[1]]])
x_test=np.array([[2,3,2],[4,6,7],[1,2,1]])
y_test=np.array([0,1,1])
n_epochs=40
# model configration
model = Sequential()
model.add(LSTM(100, input_shape=(3,1), activation='tanh', recurrent_activation='hard_sigmoid')) # 100 num of LSTM units
model.add(LSTM(100, activation='tanh', recurrent_activation='hard_sigmoid'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print(model.summary())
## training with batches of size 1 (each batch is a sequence)
for epoch in range(n_epochs):
for seq, label in zip(sequences, y):
model.train(np.array([seq]), [label]) # train a batch at a time..
scores=model.evaluate(x_test, y_test) # evaluate batch at a time..
Here is the docs on input shapes for LSTMs:
Input shapes
3D tensor with shape (batch_size, timesteps, input_dim), (Optional) 2D
tensors with shape (batch_size, output_dim).
Which implies that you you're going to need timesteps with a constant size for each batch.
The canonical way of doing this is padding your sequences using something like keras's padding utility
then you can try:
# let say timestep you choose: is 700000 and dimension of the vectors are 128
timestep = 700000
dims = 128
model.add(LSTM(100, input_shape=(timestep, dim),
activation='tanh', recurrent_activation='hard_sigmoid'))
I edited the answer to remove the batch_size argument. With this setup the batch size is unspecified, you could set that when you fitting the model (in model.fit()).
Related
I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I've figured out that if I use just a CNN layer that keeps the output size equal to the input size + LSTM and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling for about 3 days with this problem, I posted the above question here, on StackOverflow. About 2 hours after posting the questions I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation, which, relatively sparse, can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
tf.nn.ctc_loss(
labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of considering the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer creates a tensor of different dimensions for the RNN's input, the ctc loss function crashes.
Now fit runs without crashing.
Note: I already read keras forward pass with tensorflow variable as input but it did not help.
I'm training an auto-encoder unsupervised neural-network with Keras with the MNIST database:
import keras, cv2
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255.0
x_test = x_test.reshape(10000, 784).astype('float32') / 255.0
model = Sequential()
model.add(Dense(100, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(784, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='sgd')
history = model.fit(x_train, x_train, batch_size=1, epochs=1, verbose=0)
Then I would like to get the output vector when the input vector is x_test[i]:
for i in range(100):
x = x_test[i]
a = model(x)
cv2.imshow('img', a.reshape(28,28))
cv2.waitKey(0)
but I get this error:
All inputs to the layer should be tensors.
How should I modify this code to do a forward pass of an input vector in the neural network, and get a vector in return?
Also how to get the activation after, say, the 2nd layer? i.e. don't propagate until the last layer, but get the output after the 2nd layer.
Example: input: vector of size 784, output: vector of size 10
To run a model after you've finished training it you need to use keras predict(). This will evaluate the graph, given your input data. Note that the input data must be the same dimensions as the specified model inputs, which in your case looks to be [None, 784]. Keras does not require you to specify the batch dimension but you still need a 2D array going in. Do something like..
x = x_test[5]
x = x[numpy.newaxis,:]
out_val = model.predict(x)[0]
if you just want to process a single value.
The numpy.newaxis is required to make a 2D array and thus match your input size. You can skip this if you pass in an array of values to evaluate all at once.
With Keras/Tensorflow, your model is a graph/function, not standard python procedural code. You can't call it with data directly. You need to create functions and then call the functions. To get the output from an intermediate layer you can do something like..
OutFunc = K.function([model.input], [model.layers[2].output])
out_val = OutFunc([x])[0]
again, keep in mind there is a batch dimension on the input which will be produced in the output. There's a number of posts on getting data from intermediate layers if you need some additional examples. For instance see Keras, How to get the output of each layer?
An other way to do this than the accepted answer: when x is just a (784,) or (784,1) numpy array, we can use this:
model.predict([[x]])
with a double [[...]].
I want to implement Recurrent Neural network with GRU using Keras in python. I have problem in running code and I change variables more and more but it doesn't work. Do you have an idea for solve it?
inputs = 42 #number of columns input
num_hidden =50 #number of neurons in the layer
outputs = 1 #number of columns output
num_epochs = 50
batch_size = 1000
learning_rate = 0.05
#train (125973, 42) 125973 Rows and 42 Features
#Labels (125973,1) is True Results
model = tf.contrib.keras.models.Sequential()
fv=tf.contrib.keras.layers.GRU
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
#model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
#model.add(fv((1,42),activation='tanh', return_sequences=True))
#model.add(Dropout(0.2)) # Dropout overfitting
model.add(fv(42, activation='tanh'))
model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
model.add(tf.keras.layers.Dense(1000,activation='softsign'))
#model.add(tf.keras.layers.Activation("softsign"))
start = time.time()
# sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# model.compile(loss="mse", optimizer=sgd)
model.compile(loss="mse", optimizer="Adam")
inp = np.array(train)
oup = np.array(labels)
X_tr = inp[:batch_size].reshape(-1, batch_size, inputs)
model.fit(X_tr,labels,epochs=20, batch_size=batch_size)
However I get the following error:
ValueError: Error when checking target: expected dense to have shape (1000,) but got array with shape (1,)
Here, you have mentioned input vector shape to be 1000.
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
However, shape of your training data (X_tr) is 1-D
Check your X_tr variable and have same dimension for input layer.
If you read the error carefully you would realize there is a shape mismatch between the shapes of labels you provide, which is (None, 1), and the shape of output of model, which is (None, 1):
ValueError: Error when checking target: <--- This means the output shapes
expected dense to have shape (1000,) <--- output shape of model
but got array with shape (1,) <--- the shape of labels you give when training
Therefore you need to make them consistent. You just need to change the number of units in the last layer to 1 since there is one output per input sample:
model.add(tf.keras.layers.Dense(1, activation='softsign')) # 1 unit in the output
#x_train.shape = 7x5x5 numpy array
#y_train.shape = 3x5x5 numpy array
#x_test.shape = (7,) numpy array
#y_test.shape = (3,) numpy array I have binary output as 0 or 1.
timeteps = 5
data_dim = 5
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=`(timesteps,data_dim)))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=1)
score = model.evaluate(X_test,y_test,batch_size=1)
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (3, 5, 5)
I am trying to model LSTM using random data and this error occurs. I have tried many things but I could not succeed.
Thanks in advance.
There are a few problems/misunderstands here?
You can see that your y is actually 3 dimensional. However, the last lstm layer, you have return sequences as false, meaning that the LSTM is returning a single 32 long vector and sending that into the dense layer.
Furthermore, the use of multiple LSTMS here seems to lack purpose, though it does not necessarily harm anything.
In order to fit your presumed data, you would want the last lstm to have return_sequences as True, and have the number of neurons in that lstm not 32, but rather 5, as in the final dimension of your y data.
You could also not have it at all (since you already have two lstms before that, and instead make the second lstm only have 5 neurons and have the final lstm layer be removed entirely. You would then use a time distributed wrapper on the last dense layer
model.add(TimeDistrubuted(Dense(1,activation='sigmoid')))
which says to apply the same dense layer to every timestep of the data, which is required by the shape of your y data.
I have a dataset with 5K rows (-1K for validation) and 17 columns, including the last one (the target integer binary label).
My model is simply this 2-layer LSTM:
model = Sequential()
model.add(Embedding(output_dim=64, input_dim=17))
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
class_mode='binary')
After loading my dataset with pandas
df_train = pd.read_csv(train_file)
train_X, train_y = df_train.values[:, :-1], df_train['target'].values
and trying to run my model, I get this error:
Exception: When using TensorFlow, you should define explicitly the number of timesteps of your sequences. - If your first layer is an Embedding, make sure to pass it an "input_length" argument. Otherwise, make sure the first layer has an "input_shape" or "batch_input_shape" argument, including the time axis.
What should I put in input_length? The total rowcount?
Since my dataframe has a shape as train_X=(4000, 17) train_y=(4000,) how can I prepare it to feed this kind of model? I have to change my input data shape?
Thanks for any help!! (=
It looks like Keras uses the static unrolling approach to build recurrent networks (such as LSTMs) on TensorFlow. The input_length should be the length of the longest sequence that you want to train: so if each row of your CSV file train_file is a comma-delimited sequence of symbols, it should be the number of symbols in the longest row.