Understanding the structure of my LSTM model - python

I'm trying to solve the following problem:
I have time series data from a number of devices. Each device recording is of length 3000, and every data point captured has 4 measurements, so my data is shaped (number of device recordings, 3000, 4).
I'm trying to produce a vector of length 3000 where each data point is one of 3 labels (y1, y2, y3), so my desired output shape is (number of device recordings, 3000, 1). I have labeled data for training.
I'm trying to use an LSTM model for this, as 'classification as I move along time series data' seems like an RNN type of problem.
I have my network set up like this:
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(3, input_shape=(3000, 4), return_sequences=True))
model.add(LSTM(3, activation='softmax', return_sequences=True))
model.summary()
and the summary looks like this:
Model: "sequential_23"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_29 (LSTM) (None, 3000, 3) 96
_________________________________________________________________
lstm_30 (LSTM) (None, 3000, 3) 84
=================================================================
Total params: 180
Trainable params: 180
Non-trainable params: 0
_________________________________________________________________
All looks good and well in the output space, as I can use the result at each timestep to determine which of my three categories belongs to that particular timestep (I think).
But I only have 180 trainable parameters, so I'm guessing that I am doing something horribly wrong.
Can someone help me understand why I have so few trainable parameters? Am I misinterpreting how to set up this LSTM? Or am I just worrying over nothing?
Do those 3 units mean I only have 3 LSTM 'blocks', and that the model can only look back 3 observations?

In a simplistic viewpoint, you can consider an LSTM layer as an augmented Dense layer with a memory (hence enabling efficient processing of sequences). So the concept of "units" is the same for both: the number of neurons or feature units of these layers, or in other words, the number of distinctive features these layers can extract from the input.
Therefore, when you set the number of units to 3 for the LSTM layer, it more or less means that this layer can only extract 3 distinctive features from the input timesteps (note that the number of units has nothing to do with the length of the input sequence, i.e. the entire input sequence will be processed by the LSTM layer no matter what the number of units or the length of the input sequence is).
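In fact, the parameter counts in your summary follow directly from the standard LSTM formula, 4 * (units * (input_dim + units) + units), so the small totals are simply a consequence of using 3 units. A quick sanity check in Python:
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units)
    return 4 * (units * (input_dim + units) + units)

print(lstm_params(3, 4))  # 96 -> your first LSTM layer (4 input features)
print(lstm_params(3, 3))  # 84 -> your second LSTM layer (3 inputs from the layer below)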
Usually, this might be sub-optimal (though, it really depends on the difficulty of the specific problem and dataset you are working on; i.e. maybe 3 units might be enough for your problem/dataset, and you should experiment to find out). Therefore, often a higher number is chosen for the number of units (common choices: 32, 64, 128, 256), and also the classification task is delegated to a dedicated Dense layer (or sometimes called "softmax layer") at the top of the model.
For example, considering the description of your problem, a model with 3 stacked LSTM layers and a Dense classification layer at the top might look like this:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))
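To train it end to end, here is one possible sketch of compiling and fitting, assuming your inputs are stored in an array X_train of shape (num_recordings, 3000, 4) and your integer labels in y_train of shape (num_recordings, 3000, 1) with values in {0, 1, 2} (X_train and y_train are placeholder names):
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# X_train and y_train stand in for your own arrays
model.fit(X_train, y_train, batch_size=16, epochs=10, validation_split=0.1)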

Related

Training a multi-variate multi-series regression problem with stateful LSTMs in Keras

I have time series of P processes, each of varying length but all having 5 variables (dimensions). I am trying to predict the estimated lifetime of a test process. I am approaching this problem with a stateful LSTM in Keras. But I am not sure if my training process is correct.
I divide each sequence into batches of length 30, so each sequence has the shape (s_i, 30, 5), where s_i is different for each of the P sequences (s_i = len(P_i) // 30). I append all sequences into my training data, which has the shape (N, 30, 5), where N = s_1 + s_2 + ... + s_P.
Model:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

# design network
model = Sequential()
model.add(LSTM(32, batch_input_shape=(1, train_X[0].shape[1], train_X[0].shape[2]), stateful=True, return_sequences=True))
model.add(LSTM(16, return_sequences=False))
model.add(Dense(1, activation="linear"))
model.compile(loss='mse', optimizer=Adam(lr=0.0005), metrics=['mse'])
The model.summary() looks like
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (1, 30, 32) 4864
_________________________________________________________________
lstm_2 (LSTM) (1, 16) 3136
_________________________________________________________________
dense_1 (Dense) (1, 1) 17
=================================================================
Training loops:
for epoch in range(epochs):
    mean_tr_acc = []
    mean_tr_loss = []
    for seq in range(train_X.shape[0]):  # 24
        # train on whole sequence batch by batch
        for batch in range(train_X[seq].shape[0]):  # 68
            b_loss, b_acc = model.train_on_batch(np.expand_dims(train_X[seq][batch], axis=0), train_Y[seq][batch][-1])
            mean_tr_acc.append(b_acc)
            mean_tr_loss.append(b_loss)
        # reset lstm internal states after training of each complete sequence
        model.reset_states()
Edit:
The problem with the loss graph was that I was dividing the values in my custom loss, making them too small. If I remove the division and plot the loss on a logarithmic scale, it looks alright.
New Problem:
Once the training is done, I am trying to predict. I show my model 30 time-samples of a new process, so the input shape is the same as the batch_input_shape during training, i.e. (1, 30, 5). The predictions I am getting for different batches of the same sequence are all the same.
I am almost sure I am doing something wrong in the training process. If anyone could help me out, I would be grateful. Thanks.
Edit 2:
So the model predicts exactly the same results only if it has been trained for more than 20 epochs. Otherwise the prediction values are very close but still a bit different. I guess this is due to some kind of overfitting. Help!!!
The loss for 25 epochs looks like this:
Usually when results are the same, it's because your data isn't normalized. I suggest you center your data to mean=0 and std=1 with a simple standardization (i.e. (data - mean) / std). Try transforming it like this before training and testing. Differences in how data is normalized between the training and testing sets can also cause problems, which may be the cause of the discrepancy in your train vs. test loss. Always use the same normalization technique for all your data.
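A minimal sketch of what I mean, assuming your data lives in arrays called train_X and test_X (placeholder names) with the feature dimension last:
import numpy as np

# compute the statistics on the training data only ...
mean = train_X.reshape(-1, train_X.shape[-1]).mean(axis=0)
std = train_X.reshape(-1, train_X.shape[-1]).std(axis=0)

# ... and apply exactly the same transform to both train and test
train_X = (train_X - mean) / std
test_X = (test_X - mean) / std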

What strategy should I use in my CNN to go from a 3D volume to a 2D plane?

What strategy should I use in my CNN to go from a 3D volume to a 2D plane as the output layer? Can I even have a 2D layer as the output?
I am trying to develop a network whose input is a 320x320x3 image and whose output should be 68x2.
I know one way to do it would be to start from 320x320x3 and, after a few layers, flatten my 3D feature maps and then shorten them down to a 1D array of 136. But I am trying to understand if I could somehow go down to the desired 2D dimensions at the final layer.
Thanks,
Shubham
Edit: I might have misread your question initially. If your intention is to have 136 output nodes that can be arranged in a 68x2 matrix (and not to have a 68x68x2 image in the output, as I thought at first), then you can use a Reshape layer after your final dense layer with 136 units:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Reshape
model = Sequential()
model.add(Conv2D(32, 3, input_shape=(320, 320, 3)))
model.add(Flatten())
model.add(Dense(136))
model.add(Reshape((68, 2)))
model.summary()
This will give you the following model, with the desired shape in the output:
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 318, 318, 32) 896
_________________________________________________________________
flatten_2 (Flatten) (None, 3235968) 0
_________________________________________________________________
dense_2 (Dense) (None, 136) 440091784
_________________________________________________________________
reshape_1 (Reshape) (None, 68, 2) 0
=================================================================
Total params: 440,092,680
Trainable params: 440,092,680
Non-trainable params: 0
Make sure to provide your training labels in the same shape when fitting the model.
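For instance, a rough sketch (with made-up array names and a placeholder mse loss) of how fitting would look once your labels are already shaped (num_samples, 68, 2):
import numpy as np

# dummy data just to illustrate the expected shapes
X_train = np.random.rand(8, 320, 320, 3).astype('float32')
y_train = np.random.rand(8, 68, 2).astype('float32')

model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=1, batch_size=2)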
(original answer, might still be relevant)
Yes, this is commonly done in semantic segmentation models, where the inputs are images and the outputs are tensors of the same height and width of the images, and with the number of channels equal to the number of classes in the output. If you want to do this in TensorFlow or Keras, you can look up existing implementations, for instance of U-Net architectures.
A core feature of these models is that these networks are fully convolutional: they consist only of convolutional layers. Typically, the feature maps in these models go from 'wide and shallow' (big feature maps in the spatial dimensions with few channels) at first, to 'small and deep' (small spatial dimensions, high-dimensional channel dimension) and back to the desired output dimension. Hence the U-shape.
There are a lot of ways to go from 320x320x3 to 68x2 with a fully convolutional network, but the input and output of your model would basically look like this:
import keras
from keras import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(32, 3, activation='relu', input_shape=(320,320,3)))
# Include more convolutional layers, pooling layers, upsampling layers etc
...
# At the end of the model, add your final Conv2D layer with 2 filters
# and the required activation function
model.add(Conv2D(2, 3, activation='softmax'))

Keras Sequential Dense input layer - and MNIST: why do images need to be reshaped?

I'm asking this because I feel I'm missing something fundamental.
By now almost everyone knows that the MNIST images are 28x28 pixels. The Keras documentation tells me this about Dense:
Input shape nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
So a newbie like me would assume that the images could be fed to the model as a 28x28 matrix. Yet every tutorial I found goes through various gymnastics to convert the images to a single 784-long feature vector.
Sometimes by
num_pixels = X_train.shape[1] * X_train.shape[2]
model.add(Dense(num_pixels, input_dim=num_pixels, activation='...'))
or
num_pixels = np.prod(X_train.shape[1:])
model.add(Dense(512, activation='...', input_shape=(num_pixels,)))
or
model.add(Dense(units=10, input_dim=28*28, activation='...'))
history = model.fit(X_train.reshape((-1,28*28)), ...)
or even:
model = Sequential([Dense(32, input_shape=(784,)), ...),])
So my question is simply - why? Can't Dense just accept an image as-is or, if necessary, just process it "behind the scenes", as it were? And if, as I suspect, this processing has to be done, is any of these methods (or others) inherently preferable?
As requested by the OP (i.e. the Original Poster), I will mention the answer I gave in my comment and elaborate on it further.
Can't Dense just accept an image as-is or, if necessary, just process
it "behind the scenes", as it were?
Simply no! That's because currently the Dense layer is applied on the last axis. Therefore, if you feed it an image of shape (height, width) or (height, width, channels), Dense layer would be only applied on the last axis (i.e. width or channels). However, when the image is flattened, all the units in the Dense layer would be applied on the whole image and each unit is connected to all the pixels with different weights. To further clarify this consider this model:
from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28*28,)))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 10) 7850
=================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
As you can see, there are 7850 parameters in the Dense layer: each unit is connected to all the pixels (28*28*10 + 10 bias params = 7850). Now consider this model:
model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28,28)))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 28, 10) 290
=================================================================
Total params: 290
Trainable params: 290
Non-trainable params: 0
_________________________________________________________________
In this case there are only 290 parameters in the Dense layer. Here each unit in the Dense layer is connected to all the pixels as well, but the difference is that the weights are shared across the first axis (28*10 + 10 bias params = 290). It is as though the features are extracted from each row of the image compared to the previous model which extracted features across the whole image. And therefore this (i.e. weight sharing) may or may not be useful for your application.
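If you want to avoid reshaping the data yourself, one common alternative (a sketch, not something from the original question) is to put a Flatten layer at the start of the model so the flattening happens "behind the scenes":
from keras import models, layers

model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))  # flattens each 28x28 image to a 784-vector inside the model
model.add(layers.Dense(10, activation='softmax'))
model.summary()  # the Dense layer again has 7850 parameters, same as feeding pre-flattened 784-vectors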

What is the architecture behind the Keras LSTM Layer implementation?

How does the input dimensions get converted to the output dimensions for the LSTM Layer in Keras? From reading Colah's blog post, it seems as though the number of "timesteps" (AKA the input_dim or the first value in the input_shape) should equal the number of neurons, which should equal the number of outputs from this LSTM layer (delineated by the units argument for the LSTM layer).
From reading this post, I understand the input shapes. What I am baffled by is how Keras plugs the inputs into each of the LSTM "smart neurons".
Keras LSTM reference
Example code that baffles me:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
From this, I would think that the LSTM layer has 10 neurons and each neuron is fed a vector of length 64. However, it seems it has 32 neurons and I have no idea what is being fed into each. I understand that for the LSTM to connect to the Dense layer, we can just plug all 32 outputs to each of the 2 neurons. What confuses me is the InputLayer to the LSTM.
(similar SO post but not quite what I need)
Revisited and updated in 2020: I was partially correct! The architecture is 32 neurons. The 10 represents the timestep value. Each neuron is being fed a 64-length vector (maybe representing a word vector), representing 64 features (perhaps 64 words that help identify a word) over 10 timesteps.
The 32 represents the number of neurons. It represents how many hidden states there are for this layer and also represents the output dimension (since we output a hidden state at the end of each LSTM neuron).
Lastly, the 32-dimensional output vector generated from the 32 neurons at the last timestep is then fed to a Dense layer of 2 neurons, which basically means plug the 32 length vector to both neurons, with weights on the input and activation.
More reading with somewhat helpful answers:
Understanding Keras LSTMs
What exactly am I configuring when I create a stateful LSTM layer with N units
Initializing LSTM hidden states with Keras
I don't think you are right. Actually, the number of timesteps does not affect the number of parameters in an LSTM.
from keras.layers import LSTM
from keras.models import Sequential

time_step = 13
feature = 5
hidden_feature = 10

model = Sequential()
model.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model.summary()

time_step = 100
model2 = Sequential()
model2.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model2.summary()
The result:
Using TensorFlow backend.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 10) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 10) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
@Sticky, you are wrong in your interpretation.
input_shape = (batch_size, sequence_length/timesteps, feature_size). So your input tensor is 10x64 (like 10 words, each with 64 features, just like a word embedding). The 32 is the number of neurons, which makes the output vector size 32.
The output will have the following shape:
(batch, timesteps, units) if return_sequences=True.
(batch, units) if return_sequences=False.
The memory states will have a size of "units".
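To see these shapes concretely, here is a small sketch comparing the two settings of return_sequences (the summaries will show (None, 10, 32) and (None, 32) respectively):
from keras.models import Sequential
from keras.layers import LSTM

m1 = Sequential()
m1.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
m1.summary()  # output shape: (None, 10, 32) -> one 32-dim hidden state per timestep

m2 = Sequential()
m2.add(LSTM(32, input_shape=(10, 64), return_sequences=False))
m2.summary()  # output shape: (None, 32) -> only the last hidden state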

How to give the 1D input to Convolutional Neural Network(CNN) using Keras?

I'm solving a regression problem with a Convolutional Neural Network (CNN) using the Keras library. I have gone through many examples but failed to understand the concept of the input shape for a 1D convolution.
This is my data set: 1 target variable with 3 raw signals.
For visualization, the 5 segments of the sensor signal are shown here; each segment has its own meaning.
I want to give segment-wise sensor values as input to the 1D convolution layer, but the problem is that the segments are of variable length.
This is my CNN architecture.
I tried to build my CNN model, but I am confused:
model = Sequential()
model.add(Conv1D(5, 7, activation='relu',input_shape=input_shape))
model.add(MaxPooling1D(pool_length=4))
model.add(Conv1D(4, 7, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
So, how can I give input to the Conv1D layer of a CNN in Keras? Or should I set a fixed-size input for Conv1D? If so, how?
My understanding is that the input_shape should be (time_steps, n_features), where time_steps would be the length of the segments (sequence of sensor signals) and n_features the number of channels (3 in your case, as you have 3 different sensors).
Therefore, the input to your network should have 3 dimensions (batch, steps, channels), where batch is the different segments.
I've only worked with fixed time_steps. If you really can't use segments of the same length, you might try to pad them with zeros.
On the Keras documentation they say that you may use (None, 3) as the input_shape for variable-length sequences of 3-dimensional vectors, but I never used it this way.
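A minimal sketch of the zero-padding approach, assuming a list called segments where each element is an array of shape (segment_length_i, 3), and a hypothetical common length of 400 (pick the length of your longest segment):
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

max_len = 400  # hypothetical fixed length
X = np.zeros((len(segments), max_len, 3), dtype='float32')
for i, seg in enumerate(segments):
    X[i, :len(seg), :] = seg[:max_len]  # copy the segment, leave the rest as zero padding

model = Sequential()
model.add(Conv1D(32, 7, activation='relu', input_shape=(max_len, 3)))
model.add(GlobalMaxPooling1D())  # collapses the time dimension
model.add(Dense(1))              # single regression output
model.compile(loss='mse', optimizer='adam')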
