I'm building an RNN to analyze motion capture (MoCap) data using TensorFlow, Pandas, and Keras.
About my data:
Data is obtained through pandas.read_csv and has a shape of (832, 165)
Each row denotes a whole frame of data in a movement sequence (832 frames)
Each column denotes the rotational data for a joint (165 joints total)
I'm attempting to feed the data in one row at a time. The output should be the next frame in the movement sequence. I keep running into different types of errors when running model.fit.
I've attached a series of photos representing the different attempts to make the model work. If someone could provide some guidance as to why it's not working and how to fix it, I'd greatly appreciate it.
As a side note, each version of my code is different. I'm okay with using any of them as long as it ends up working, so when providing feedback, please identify which version of my code you're talking about.
Uses tf.data.Dataset as input
Version 1 Code / Output
Version 2 Code / Output
Version 3 Code / Output
Uses pandas arrays for input and target
Version 4 Code / Output
Version 5 Code / Output
Using Version 4 as a basis for troubleshooting, I noticed that you are passing incompatible shapes to the layers.
This line model.add(keras.layers.InputLayer(input_shape = (N_TIMESTEPS, N_FEATURES))) expects every sample to have the shape (N_TIMESTEPS, N_FEATURES).
Your data, however, has the shape (832, 165), which is N_SAMPLES on the first index and N_FEATURES on the second. The N_TIMESTEPS dimension is missing.
First, you should create a modified dataset that will generate a shape of (N_SAMPLES, N_TIMESTEPS, N_FEATURES).
Here is an example to generate a dummy dataset:
data = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
target = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
The N_TIMESTEPS in your data is important in LSTM as it determines how many TIME_STEPS to consider per update.
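For the real data, one way to get from (832, 165) to (N_SAMPLES, N_TIMESTEPS, N_FEATURES) is a sliding window. Here is a minimal NumPy sketch, assuming overlapping windows with a stride of 1 and a target that is the same window shifted one frame ahead. The make_windows helper is hypothetical, and random numbers stand in for the CSV contents:

```python
import numpy as np

def make_windows(frames, n_timesteps):
    """Slice a (n_frames, n_features) array into overlapping windows.

    inputs[i]  = frames[i     : i + n_timesteps]
    targets[i] = frames[i + 1 : i + n_timesteps + 1]  (one frame ahead)
    """
    n_windows = frames.shape[0] - n_timesteps
    inputs = np.stack([frames[i:i + n_timesteps] for i in range(n_windows)])
    targets = np.stack([frames[i + 1:i + n_timesteps + 1] for i in range(n_windows)])
    return inputs, targets

# Stand-in for data.values from pandas.read_csv (assumption: all columns numeric).
frames = np.random.rand(832, 165).astype("float32")
inputs, targets = make_windows(frames, n_timesteps=10)
print(inputs.shape, targets.shape)  # (822, 10, 165) (822, 10, 165)
```

With this layout the number of samples drops from 832 to 822, because the last window still needs one following frame as its target.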
Here is the complete code used to simulate successful execution in Google Colab.
%tensorflow_version 2.x # To ensure latest Tensorflow version in Google Colab
import tensorflow as tf
import tensorflow.keras as keras
print(tf.__version__) # Tensorflow 2.2.0-rc3
BATCH_SIZE = 1
N_TIMESTEPS = 10
#Data is obtained through pandas.read_csv and has a shape of (832, 165)
#Each row denotes a whole frame of data in a movement sequence (832 frames)
#Each column denotes the rotational data for a joint (165 joints total)
# N_SAMPLES = data.values.shape[0]
# N_FEATURES = data.values.shape[1]
N_SAMPLES = 832
N_FEATURES = 165
def get_compiled_model():
    model = keras.Sequential()
    model.add(keras.layers.InputLayer(input_shape = (N_TIMESTEPS, N_FEATURES)))
    model.add(keras.layers.LSTM(35, activation = 'relu', return_sequences = True))
    model.add(keras.layers.LSTM(35, activation = 'relu', return_sequences = True))
    model.add(keras.layers.Dense(165, activation = 'tanh'))
    model.compile(optimizer = 'adam',
                  loss = 'mse',
                  metrics = ['accuracy'])
    return model
model = get_compiled_model()
model.summary()
data = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
target = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
model.fit(data, target, epochs = 15, batch_size = BATCH_SIZE, shuffle = False)
Hope this helps you.
You can read more in the TensorFlow Keras guide on RNNs at this link.
Related
I'm working on implementing a 2D (Perhaps 1D) CNN+ LSTM classifier for Network Traffic classification purposes. The CNN will essentially be used as a feature extractor and the LSTM would work for the classification.
I have used the TimeDistributed layer to help combine the CNN and LSTM layers together (Code attached.)
Since the input size varies dynamically, the number of data points has been indicated with None.
no_rows=20 (Number of packets considered per flow for classification)
no_cols=7 (Number of features considered for each packet)
Despite using the TimeDistributed layer wrap, I am facing some input dimension issues. Not quite sure how to resolve this.
Using Reshape as a layer to resolve this was one of the many fixes I came across but didn't work. Kindly let me know how to build this structure and how to fix my code.
Thanks !
(Using a Linux based AWS instance, Ubuntu 16.04 and Tensorflow backend to implement the code)
Used Reshape layer from Keras core layers to fix the output of the CNN but did not resolve the issue.
Had to remove the Flatten layer and replace it with GlobalMaxPooling2D layer due to the presence of dynamically changing input size.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.30, random_state = 36)
model = Sequential()
# Adding CNN Model layers
model.add(TimeDistributed(Conv2D(32, kernel_size = (4 , 2), strides = 1, padding='valid', activation = 'relu', input_shape = (None,20,7,1))))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Conv2D(64, kernel_size = (4 , 2), strides = 1, padding='valid', activation = 'relu')))
model.add(TimeDistributed(BatchNormalization()))
#model.add(TimeDistributed(Reshape((-1,1))))
model.add(TimeDistributed(GlobalMaxPooling2D()))
#model.add(Reshape((1,1)))
# Adding LSTM layers
model.add(LSTM(128, recurrent_dropout=0.2))
model.add(Dropout(rate = 0.2))
model.add(Dense(100))
model.add(Dropout(rate = 0.4))
model.add(Dense(108))
model.add(Dense(num_classes,activation='softmax'))
# Compiling this model
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
#print(model.summary())
#Training the Network
history = model.fit(X_train, Y_train, batch_size=32, epochs = 1, validation_data=(X_test, Y_test))
Running the code snippet mentioned above I face the following error message:
"Input tensor must be of rank 3, 4 or 5 but was {}.".format(n + 2))
ValueError: Input tensor must be of rank 3, 4 or 5 but was 2.
One thing you can do is to make a batch of a fixed input (number of frames) from your video source and process that. The code for doing that would be:
import cv2
import numpy as np

def get_data(video_source, batch_size, frame_to_process):
    x_data = []
    # Read the video from the file path
    cap = cv2.VideoCapture(video_source)
    for i in range(batch_size):
        # Frames that make up one entry of the batch
        frames = []
        for j in range(frame_to_process):
            ret, frame = cap.read()
            if not ret:
                # print('No frames found!')
                break
            # converting the frame to gray
            # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # resizing the frame to a particular input shape
            # frame = cv2.resize(frame,(30,30),interpolation=cv2.INTER_AREA)
            frames.append(frame)
        # appending each batch entry
        x_data.append(frames)
    # Free the video handle once all frames are read
    cap.release()
    return x_data
# number of frames in each batch
frame_to_process = 30
# size of each batch
batch_size = 32
# make batch of video inputs
X_data = np.array(get_data(video_source, batch_size, frame_to_process))
Tip: Also, instead of using Conv2D with TimeDistributed, you can use a ConvLSTM layer (ConvLSTM2D in Keras), which can give a small performance improvement.
Anyway, if you want to process frames dynamically, you can convert the code to PyTorch, which has dynamic graphs and lets you feed inputs with a variable batch size.
Dynamic Computation Graph
Difference between Static and Dynamic Graphs
I have some troubles with the LSTM implementation in Keras.
My training set is structured as follows:
number of sequences: 5358
the length of each sequence is 300
each element of the sequence is a vector of 54 features
I'm unsure on how to shape the input for a stateful LSTM.
Following this tutorial: http://philipperemy.github.io/keras-stateful-lstm/, I've created the subsequences (in my case there are 1452018 subsequences with a window_size = 30).
What is the best option to reshape the data for a stateful LSTM's input?
What does the timestep of the input mean in this case? And why?
Is the batch_size related to the timestep?
I'm unsure on how to shape the input for a stateful LSTM.
LSTM(100, stateful=True)
But before using a stateful LSTM, ask yourself: do I really need a stateful LSTM? See here and here for more details.
What is the best option to reshape the data for a stateful LSTM's input?
It really depends on the problem at hand. However, I think you do not need any reshaping; just feed the data directly into Keras:
input_layer = Input(shape=(300, 54))
What does the timestep of the input mean in this case? And why?
In your example the number of timesteps is 300. See here for further details on timesteps. In the following picture, we have 5 timesteps that we feed into the LSTM network.
Is the batch_size related to the timestep?
No, it has nothing to do with batch_size. More details on batch_size can be found here.
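To make that independence concrete, here is a small sketch using the numbers from the question. It only does shape arithmetic and trains nothing. The stride of 1 for the sliding window is an assumption (it does reproduce the 1452018 subsequences quoted in the question), and the variable names are purely illustrative:

```python
import math

# Numbers taken from the question.
n_sequences, timesteps, features = 5358, 300, 54
batch_size = 512  # a free choice; changing it never changes `timesteps`

# Shape of each batch when fitting on an array of shape (5358, 300, 54).
n_batches = math.ceil(n_sequences / batch_size)
batch_shapes = [
    (min(batch_size, n_sequences - b * batch_size), timesteps, features)
    for b in range(n_batches)
]
print(n_batches)         # 11
print(batch_shapes[0])   # (512, 300, 54)
print(batch_shapes[-1])  # (238, 300, 54)

# Assumption: windows of length 30 are cut with stride 1 inside each sequence.
window_size = 30
n_subsequences = n_sequences * (timesteps - window_size + 1)
print(n_subsequences)    # 1452018
```

Only the first axis of each batch depends on batch_size; the timestep axis stays at 300 (or at 30 if you train on the windowed subsequences).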
Here is a simple example based on the description that you provided. It might give you some intuition:
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense
# Dummy data: 5358 sequences, 300 timesteps, 54 features, one binary label each
x_train = np.zeros(shape=(5358, 300, 54))
y_train = np.zeros(shape=(5358, 1))
input_layer = Input(shape=(300, 54))
lstm = LSTM(100)(input_layer)
dense1 = Dense(20, activation='relu')(lstm)
dense2 = Dense(1, activation='sigmoid')(dense1)
model = Model(inputs=input_layer, outputs=dense2)
model.compile("adam", loss='binary_crossentropy')
model.fit(x_train, y_train, batch_size=512)
I've been working with LSTMs for a while and I think I have grasped the main concepts. I have been playing with the Keras environment to get a better idea of how LSTMs work, so I decided to train a neural network to classify the MNIST dataset.
I know that when I train a LSTM I should give a tensor as an input (number of samples, time steps, features). I reshaped the image from a 28x28 to a single vector of 784 elements (1x784) and then I make the input_shape = (60000, 1, 784). Eventually I tried to change the number of time steps and my new input_shape becomes (60000,16,49).
What I don't understand is why when I change the number of time steps the feature vector changes from 784 to 49. I think I don't really understand the concept of time steps in an LSTM. Could you please explain it better? Possibly referring to this particular case?
Furthermore, when I increase the number of time steps the accuracy gets lower. Why is that? Shouldn't it be higher?
Thank you.
edit
from __future__ import print_function
import numpy as np
import struct
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation
from keras.utils import np_utils
train_im = open('train-images-idx3-ubyte','rb')
train_la = open('train-labels-idx1-ubyte','rb')
test_im = open('t10k-images-idx3-ubyte','rb')
test_la = open('t10k-labels-idx1-ubyte','rb')
##training images and labels
magic,num_ima = struct.unpack('>II', train_im.read(8))
rows,columns = struct.unpack('>II', train_im.read(8))
img = np.fromfile(train_im,dtype=np.uint8).reshape(rows*columns, num_ima) #784*60000
magic_l, num_l = struct.unpack('>II', train_la.read(8))
lab = np.fromfile(train_la, dtype=np.int8) #1*60000
## test images and labels
magic, num_test = struct.unpack('>II', test_im.read(8))
rows,columns = struct.unpack('>II', test_im.read(8))
img_test = np.fromfile(test_im,dtype=np.uint8).reshape(rows*columns, num_test) #784x10000
magic_l, num_l = struct.unpack('>II', test_la.read(8))
lab_test = np.fromfile(test_la, dtype=np.int8) #1*10000
batch = 50
epoch=15
hidden_units = 10
classes = 1
a, b = img.T.shape[0:]
img = img.reshape(img.T.shape[0],-1,784)
img_test = img_test.reshape(img_test.T.shape[0],-1,784)
lab = np_utils.to_categorical(lab, 10)
lab_test = np_utils.to_categorical(lab_test, 10)
print(img.shape[0:])
model = Sequential()
model.add(LSTM(40,input_shape =img.shape[1:], batch_size = batch))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer = 'RMSprop', loss='mean_squared_error', metrics = ['accuracy'])
model.fit(img, lab, batch_size = batch,epochs=epoch,verbose=1)
scores = model.evaluate(img_test, lab_test, batch_size=batch)
predictions = model.predict(img_test, batch_size = batch)
print('LSTM test score:', scores[0])
print('LSTM test accuracy:', scores[1])
edit 2
Thank you very much, when I do so I get the following error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 3750 input samples and 60000 target samples.
I know that I should reshape the output as well but I don't know what shape it should have.
Timesteps represent states in time like frames extracted from a video. The shape of the input passed to the LSTM should be in the form (num_samples,timesteps,input_dim). If you want 16 timesteps you should reshape your data as (num_samples//timesteps, timesteps, input_dims)
img=img.reshape(3750,16,784)
So with your batch_size=50, it will pass 50*16 images at a time.
Right now, because you keep num_samples constant at 60000, the reshape splits input_dims instead: each 784-element image is cut into 16 pieces of 49 features.
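The difference between the two reshapes can be checked directly in NumPy. This is only a sketch: random bytes stand in for the MNIST images, and the variable names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 60000 flattened MNIST images (random data, not real digits).
images = rng.integers(0, 256, size=(60000, 784), dtype=np.uint8)

# Keeping num_samples fixed: every image is cut into 16 steps of 49 pixels.
split_features = images.reshape(60000, 16, 49)

# Keeping input_dims fixed: 16 consecutive images form one sequence.
grouped_images = images.reshape(60000 // 16, 16, 784)

print(split_features.shape)   # (60000, 16, 49)
print(grouped_images.shape)   # (3750, 16, 784)

# First layout: the 16 steps of sample 0 are pieces of image 0.
print(np.array_equal(split_features[0].ravel(), images[0]))  # True
# Second layout: step 1 of sample 0 is the whole of image 1.
print(np.array_equal(grouped_images[0, 1], images[1]))       # True
```

Both layouts are valid LSTM inputs; they just answer different questions (one image read in 16 chunks versus a sequence of 16 different images).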
edit:
The target array must have the same number of samples as the input, i.e. 3750 in your case. All the time steps in a sequence will share the same label. You have to decide what you are going to do with those MNIST sequences. Your current model classifies those sequences (not digits) into 10 classes.
My input data consists of 10 samples, each of which has 200 time steps, while each time step is described by a vector of 30 dimensions.
In addition, each time step has a 3-dimensional vector (one-hot encoding) which describes the action that was taken at that particular time step. With that being said, I am trying to build a model which is fed all previous actions and then predicts which action would be the best to take next.
I tried to get this working with tflearn and tensorflow but with limited success so far.
Simple sample code:
import numpy as np
import operator
import tflearn
from tflearn import regression
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.embedding_ops import embedding
from tflearn.layers.recurrent import bidirectional_rnn, BasicLSTMCell
from tflearn.data_utils import to_categorical, pad_sequences
SAMPLES = 10
TIME_STEPS = 200
DATA_DIMENSIONS = 30
LABEL_CLASSES = 3
x = []
y = []
# Generate fake data.
for i in range(SAMPLES):
    sequences = []
    outputs = []
    for i in range(TIME_STEPS):
        d = []
        for i in range(DATA_DIMENSIONS):
            d.append(1)
        sequences.append(d)
        outputs.append([0,0,1])
    x.append(sequences)
    y.append(outputs)
print("X1:", len(x), ", X2:", len(x[0]), ", X3:", len(x[0][0]))
print("Y1:", len(y), ", Y2:", len(y[0]), ", Y3:", len(y[0][0]))
# Define model
net = tflearn.input_data([None, TIME_STEPS, DATA_DIMENSIONS], name='input')
net = tflearn.lstm(net, 128, dropout=0.8, return_seq=True)
net = tflearn.fully_connected(net, LABEL_CLASSES, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(net)
# Fit model.
model.fit({'input': x}, {'targets': y},
          n_epoch=1,
          snapshot_step=1000,
          show_metric=True, run_id='test', batch_size=32)
Error
ValueError: Cannot feed value of shape (10, 200, 3) for Tensor
'targets/Y:0', which has shape '(?, 3)'
As far as I understand, the input_data should be correct. However, the output data is apparently wrong, at least, Tensorflow throws an error. That is probably because my model expects one label per sample rather than one label per time step.
Can I even achieve my goal with an LSTM, and if so, how do I have to set up my model?
Thanks,
Robert
As the error suggests, there is a shape mismatch between the expected size of your targets tensor, and the one of the data you actually provide for it. Let us break it down.
From what I understand, you have a labeled action for every timestep of your sequences. This means that the labels that you provide should have a shape (10, 200, 3). This seems to be the case from the error message. Good.
So we now know the error comes from what the network generates.
=================
Input data -> (10, 200, 30)
LSTM -> (10, 128) (because return_seq=False)
FullyConnected -> (10, 3).
=================
So that explains the second part of the error message, your network indeed produces an output with shape (10, 3) which mismatches the one of your data.
I think you missed the return_seq argument of the LSTM. As is usually the case with RNN implementations, you have a parameter telling if you want the layer to return outputs for the whole sequence, or only for the last timestep. Here by default it is the second option, that is why you don't get an output with the expected shape. Use return_seq=True.
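To see what that flag changes in terms of shapes, here is a minimal NumPy sketch. It uses a plain tanh RNN as a stand-in, not tflearn's LSTM, and the simple_rnn function and its weights are hypothetical. Only the output shapes are meant to carry over:

```python
import numpy as np

def simple_rnn(x, w_in, w_rec, return_seq=False):
    """Plain tanh RNN over x of shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    outputs = []
    for t in range(timesteps):
        # New hidden state from the current input and the previous state.
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_seq:
        # One output per timestep: (batch, timesteps, units)
        return np.stack(outputs, axis=1)
    # Only the last timestep: (batch, units)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 200, 30))          # same shape as the question's input
w_in = rng.normal(size=(30, 128)) * 0.1     # hypothetical input weights
w_rec = rng.normal(size=(128, 128)) * 0.1   # hypothetical recurrent weights

last_only = simple_rnn(x, w_in, w_rec)
full_seq = simple_rnn(x, w_in, w_rec, return_seq=True)
print(last_only.shape)  # (10, 128)
print(full_seq.shape)   # (10, 200, 128)
```

A per-timestep target of shape (10, 200, 3) can only line up with the second form, once a 3-unit output layer is applied to every timestep.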
I am brand new to deep learning, so I'm reading through Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try to implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials and code snippets I've encountered either do not implement anything remotely similar to this (e.g. image recognition) or use an older version of Keras with different functions and usage.
What I'm trying to do is a variation of this paper:
(1) convert DNA sequences to one-hot encoding vectors; ✓
(2) use a 1 dimensional convolutional neural network; ✓
(3) with max pooling; ✓
(4) send the output to a bidirectional RNN; ⓧ
(5) classify the input;
I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?
Note:
The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py but I simplified the output layer to just do binary classification. This process was described (kind of) in https://github.com/fchollet/keras/issues/3322 but I cannot get it to work with the updated Keras. The original code (and the 2nd link) work on a very large dataset, so I am generating some fake data to illustrate the concept. They also use an older version of Keras, and key functionality has changed since then.
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential
# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0) # For keras?
# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25
# Generate sequences
https://pastebin.com/GvfLQte2
# Build model
# ===========
# Input Layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS,4))
# Hidden Layers
y = Conv1D(100, 10, strides=1, activation="relu", )(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences = True, activation="tanh", ))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)
# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)
model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy", )
model.summary()
# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
# 1049 input_shape = tensor_shape.TensorShape(input_shape).as_list()
# 1050 batch_size = input_shape[0] if self.stateful else None
# -> 1051 self.input_dim = input_shape[2]
# 1052 self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
# 1053
# IndexError: list index out of range
You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.
So, the problem is simply the presence of the Flatten layer, which destroys the shape.
These are the shapes used by Conv1D and LSTM:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
Length is the same as timeSteps, and channels is the same as features.
Using the Bidirectional wrapper won't change a thing either. It will only duplicate your output features.
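As a rough check of the shapes you should see once the first Flatten is removed, here is the arithmetic for the model in the question. It assumes the Keras defaults of padding="valid" for Conv1D and MaxPooling1D and merge_mode="concat" for Bidirectional; the two helper functions are hypothetical, not Keras APIs:

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    # Output length of a 1D convolution with "valid" padding.
    return (length - kernel_size) // stride + 1

def pool1d_valid_length(length, pool_size, stride):
    # Output length of 1D max pooling with "valid" padding.
    return (length - pool_size) // stride + 1

positions, channels = 40, 4            # Input(shape=(NUMBER_OF_POSITIONS, 4))
filters, rnn_units = 100, 100

conv_len = conv1d_valid_length(positions, kernel_size=10)
pool_len = pool1d_valid_length(conv_len, pool_size=5, stride=5)

print(("batch", positions, channels))     # input:         ('batch', 40, 4)
print(("batch", conv_len, filters))       # Conv1D:        ('batch', 31, 100)
print(("batch", pool_len, filters))       # MaxPooling1D:  ('batch', 6, 100)
print(("batch", pool_len, 2 * rnn_units)) # Bidirectional: ('batch', 6, 200)
```

The pooled output is already three-dimensional, with 6 steps of 100 features, which is exactly what the recurrent layer expects.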
Classifying.
If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False. (Alternatively, keep return_sequences=True and put a Flatten + Dense after it.)
If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True. You should not flatten the data after them.