My training data is an overlapping sliding window over users' daily data. Its shape is (1470, 3, 256, 18):
1470 batches of 3 days of data, where each day has 256 samples of 18 features each.
My targets' shape is (1470,):
one label value for each batch.
I want to train an LSTM to predict a [3-day batch] -> [one target].
Each day's 256 samples are padded with -10 for days that had fewer than 256 samples.
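For illustration, each day is padded to 256 samples roughly like this (a rough sketch with made-up array names, not my actual preprocessing code):
import numpy as np

NUM_SAMPLES, NUM_FEATURES, PAD_VALUE = 256, 18, -10.0

def pad_day(day_samples):
    # Pad one day's (n, 18) array with rows of -10 up to 256 samples.
    padded = np.full((NUM_SAMPLES, NUM_FEATURES), PAD_VALUE, dtype=np.float32)
    padded[:len(day_samples)] = day_samples
    return padded

full_day = np.random.randn(256, 18)    # a fully observed day
short_day = np.random.randn(100, 18)   # a day with only 100 samples
window = np.stack([pad_day(full_day), pad_day(full_day), pad_day(short_day)])
print(window.shape)   # (3, 256, 18); stacking 1470 such windows gives (1470, 3, 256, 18)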
I've written the following code to build the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Masking, Flatten
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
from tensorflow.keras import metrics

def build_model(num_samples, num_features):
    opt = RMSprop(0.001)
    model = Sequential()
    model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
    model.add(LSTM(32, return_sequences=True, activation='tanh'))
    model.add(Dropout(0.3))
    model.add(LSTM(16, return_sequences=False, activation='tanh'))
    model.add(Dropout(0.3))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(8, activation='tanh'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer=opt, metrics=['mae', 'mse'])
    return model
model = build_model(256,18)
model.summary()
Model: "sequential_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
masking_7 (Masking) (None, 256, 18) 0
_________________________________________________________________
lstm_14 (LSTM) (None, 256, 32) 6528
_________________________________________________________________
dropout_7 (Dropout) (None, 256, 32) 0
_________________________________________________________________
lstm_15 (LSTM) (None, 16) 3136
_________________________________________________________________
dropout_8 (Dropout) (None, 16) 0
_________________________________________________________________
dense_6 (Dense) (None, 16) 272
_________________________________________________________________
dense_7 (Dense) (None, 8) 136
_________________________________________________________________
dense_8 (Dense) (None, 1) 9
=================================================================
Total params: 10,081
Trainable params: 10,081
Non-trainable params: 0
_________________________________________________________________
I can see that the shapes are incompatible, but I can't figure out how to change the code to fit my problem.
Any help would be appreciated
Update: I've reshaped my data like so:
train_data.reshape(1470*3, 256, 18)
Is that right?
I think you are looking for TimeDistributed(LSTM(...)) (source)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, TimeDistributed, LSTM, Dropout, Dense

day, num_samples, num_features = 3, 256, 18

model = Sequential()
model.add(Masking(mask_value=-10., input_shape=(day, num_samples, num_features)))
model.add(TimeDistributed(LSTM(32, return_sequences=True, activation='tanh')))
model.add(Dropout(0.3))
model.add(TimeDistributed(LSTM(16, return_sequences=False, activation='tanh')))
model.add(Dropout(0.3))
model.add(Dense(16, activation='tanh'))
model.add(Dense(8, activation='tanh'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['mae', 'mse'])
model.summary()
I've been trying to train a CNN model on facial data to create emojis from facial expressions. I'm actually new to machine learning. The code isn't actually my own, but I keep getting this ValueError while trying to train the model:
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv2d. Consider increasing the input size. Received input shape [None, 100, 100, 1] which would produce output shape with a zero or negative value in a dimension.
The code which I'm trying to run is:
def cnn_model():
    num_of_classes = get_num_of_classes()
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(image_x, image_y, 1), activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(10, 10), strides=(10, 10), padding='same'))
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.6))
    model.add(Dense(num_of_classes, activation='softmax'))
    sgd = optimizers.SGD(lr=1e-2)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    filepath = "cnn_model_keras.h5"
    checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint1]
    from keras.utils import plot_model
    plot_model(model, to_file='model.png', show_shapes=True)
    return model, callbacks_list
The values used are num_of_classes = 12 and image_x = image_y = 100.
I reorganized your code. You are defining a function to build the model, so it would be better to keep only what is related to the model architecture inside it and to keep the callbacks and model plotting out of the function.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras import optimizers
import numpy as np

def cnn_model():
    image_x = 100
    image_y = 100
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(image_x, image_y, 1), activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(10, 10), strides=(10, 10), padding='same'))
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.6))
    model.add(Dense(12, activation='softmax'))
    sgd = optimizers.SGD(lr=1e-2)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

########################
# Model summary
model = cnn_model()
model.summary()

#################### TEST
input = np.ones((1, 100, 100, 1))
print("Output:", model.predict(input))
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 96, 96, 32) 832
_________________________________________________________________
batch_normalization (BatchNo (None, 96, 96, 32) 128
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 10, 10, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 3200) 0
_________________________________________________________________
dense (Dense) (None, 1024) 3277824
_________________________________________________________________
batch_normalization_1 (Batch (None, 1024) 4096
_________________________________________________________________
dropout (Dropout) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 12) 12300
=================================================================
Total params: 3,295,180
Trainable params: 3,293,068
Non-trainable params: 2,112
_________________________________________________________________
2022-07-03 15:15:36.927515: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Output: [[0.06614503 0.10535268 0.07621874 0.08486015 0.08070944 0.08046351
0.06786356 0.06059184 0.10280456 0.05683669 0.12510006 0.09305366]]
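Following that advice, here is a rough sketch of how training could then be wired up outside cnn_model(); the train_images / train_labels arrays below are random stand-ins, not real data:
from tensorflow.keras.callbacks import ModelCheckpoint
import numpy as np

# Stand-in data purely for illustration; replace with your real images and labels.
train_images = np.random.rand(64, 100, 100, 1).astype("float32")
train_labels = np.eye(12)[np.random.randint(0, 12, 64)]   # one-hot labels, 12 classes

model = cnn_model()

# The checkpoint callback lives with the training code, outside cnn_model().
# With metrics=['accuracy'] in TF 2.x the validation metric is named 'val_accuracy'.
checkpoint = ModelCheckpoint("cnn_model_keras.h5", monitor="val_accuracy",
                             verbose=1, save_best_only=True, mode="max")

model.fit(train_images, train_labels,
          validation_split=0.1,
          epochs=2, batch_size=16,
          callbacks=[checkpoint])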
I'm currently trying to take my first steps with Keras on top of TensorFlow to classify time series data. I was able to get a pretty simple model running, but after some feedback it was recommended that I use multiple GRU layers in a row and add the TimeDistributed wrapper around my Dense layers. Here is the model I was trying:
model = Sequential()
model.add(GRU(100, input_shape=(n_timesteps, n_features), return_sequences=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(TimeDistributed(Dense(units=100, activation='relu')))
model.add(TimeDistributed(Dense(n_outputs, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I am receiving the following error message when trying to fit the model with the input having a shape of (2357, 128, 11) (2357 samples, 128 timesteps, 11 features):
ValueError: Error when checking target: expected time_distributed_2 to have 3 dimensions, but got array with shape (2357, 5)
This is the output for model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_1 (GRU) (None, 128, 100) 33600
_________________________________________________________________
gru_2 (GRU) (None, 128, 100) 60300
_________________________________________________________________
gru_3 (GRU) (None, 128, 100) 60300
_________________________________________________________________
gru_4 (GRU) (None, 128, 100) 60300
_________________________________________________________________
gru_5 (GRU) (None, 128, 100) 60300
_________________________________________________________________
gru_6 (GRU) (None, 128, 100) 60300
_________________________________________________________________
time_distributed_1 (TimeDist (None, 128, 100) 10100
_________________________________________________________________
time_distributed_2 (TimeDist (None, 128, 5) 505
=================================================================
Total params: 345,705
Trainable params: 345,705
Non-trainable params: 0
So what is the correct way to put multiple GRU layers in a row and add the TimeDistributed wrapper to the following Dense layers? I will be very grateful for any helpful input.
If you set return_sequences = False in your last GRU layer (and put plain Dense layers on top instead of TimeDistributed ones), the code will work.
You only need return_sequences = True when the output of an RNN is fed as input to another RNN, so that the time dimension is preserved. When you set return_sequences = False, the output is only the last hidden state (instead of the hidden state at every time step), and the time dimension disappears.
That is why, when you set return_sequences = False, the output drops from N dimensions to N-1.
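For concreteness, here is a minimal sketch of that change (trimmed to three GRU layers for brevity; n_timesteps=128, n_features=11 and n_outputs=5 are taken from your question):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

n_timesteps, n_features, n_outputs = 128, 11, 5

model = Sequential()
model.add(GRU(100, input_shape=(n_timesteps, n_features), return_sequences=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
# ... further intermediate GRU layers, all with return_sequences=True ...
model.add(GRU(100, return_sequences=False, go_backwards=True, dropout=0.5))  # last GRU drops the time axis
model.add(Dense(100, activation='relu'))            # plain Dense; TimeDistributed is not needed on a 2-D output
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()   # final output shape (None, 5), matching targets of shape (2357, 5)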
I want to train a model to predict a person's emotion from physical signals. I have one physical signal and am using it as the input feature:
ECG (electrocardiography)
I want to use a CNN architecture to extract features from the data, and then use these extracted features to feed a classical decision tree classifier. Below you can see my CNN approach without the decision tree:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(3, activation = 'softmax'))
I want to edit this code so that, instead of model.add(Dense(3, activation='softmax')) as the output layer, there is a working decision tree. I have tried to save the outputs of the last convolutional layer like this:
output = model.layers[-6].output
And when I printed out the output variable, result was this;
THE OUTPUT: Tensor("conv1d_56/Relu:0", shape=(?, 8971, 30),
dtype=float32)
I guess the output variable holds the extracted features. Now, how can I feed my decision tree classifier with the data stored in the output variable? Here is the decision tree from scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(criterion = 'entropy')
dtc.fit()
How should I feed the fit() method? Thanks in advance.
To extract a vector of features that you can pass on to another algorithm, you need a fully connected layer before your softmax layer. Something like this will add in a 128 dimensional layer just before your softmax layer:
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, BatchNormalization, Flatten, Dense
from tensorflow.keras import regularizers

model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation = 'softmax'))
If you then run model.summary() you can see the name of the layers:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_9 (Conv1D) (None, 17941, 15) 915
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 8970, 15) 0
_________________________________________________________________
dropout_10 (Dropout) (None, 8970, 15) 0
_________________________________________________________________
batch_normalization_9 (Batch (None, 8970, 15) 60
_________________________________________________________________
conv1d_10 (Conv1D) (None, 8911, 30) 27030
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 2227, 30) 0
_________________________________________________________________
dropout_11 (Dropout) (None, 2227, 30) 0
_________________________________________________________________
batch_normalization_10 (Batc (None, 2227, 30) 120
_________________________________________________________________
flatten_6 (Flatten) (None, 66810) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 8551808
_________________________________________________________________
dropout_12 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 3) 387
=================================================================
Total params: 8,580,320
Trainable params: 8,580,230
Non-trainable params: 90
_________________________________________________________________
Once your network has been trained you can create a new model where the output layer becomes 'dense_7' and it'll generate 128 dimensional feature vectors:
feature_vectors_model = Model(model.input, model.get_layer('dense_7').output)
dtc_features = feature_vectors_model.predict(your_X_data) # fit your decision tree on this data
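To answer the fit() part of the question directly, here is a rough sketch of fitting the scikit-learn tree on those features; y_train and your_X_test are hypothetical placeholders for your labels and a held-out set of signals:
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(criterion='entropy')
dtc.fit(dtc_features, y_train)   # y_train: one integer emotion label per row of your_X_data

# Predict on new signals by running them through the same feature extractor first.
test_features = feature_vectors_model.predict(your_X_test)   # your_X_test: hypothetical held-out signals
print(dtc.predict(test_features))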
In my last post linked here, it was said that I have to modify my model for it to be better. To quote the only answerer's comment to my questions (again, thank you, Sir):
The accuracy of prediction is a metric of how good your neural network architecture is and it also depends on your train/validation data. You will have to tune your neural network in such a way that you generalize well by adjusting the hyper parameters such as number of layers, type of layers, learning rate, optimizer etc. ...
I would like to know how to do the things mentioned, or at the least be pointed in the right direction. I am honestly lost in both theory and practice.
The only thing I have been able to do is raise the number of epochs above 100. I have also cleaned the images to be identified as much as I can.
Currently, here is how I create my model. It is only based on Tensorflow 2.0's tutorial.
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def createModel():
    # Build the tf.keras.Sequential model by stacking layers.
    # Choose an optimizer and loss function used for training:
    model = tf.keras.models.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model = createModel()
model.fit(x_train, y_train, epochs=102, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
It gave out a validation accuracy of around .9800 for me. But its performance against images of handwritten characters I've extracted from documents is dismal. I would also like it to be extended such that it can also read other selected characters, but I guess that can be another question for another day.
Thanks!
You could have multiple Convolution/Max Pooling layers at the beginning that perform feature extraction by scanning the image. After that you use a fully connected network like you did before, followed by a softmax.
You could create a model with a CNN that way:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential
# Create the model
model = Sequential()
# Add the 1st Convolution/ max pool
model.add(Conv2D(40, kernel_size=5, padding="same",input_shape=(28, 28, 1), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# 2nd convolution / max pool
model.add(Conv2D(200, kernel_size=3, padding="same", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# 3rd convolution/ max pool
model.add(Conv2D(512, kernel_size=3, padding="valid", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# Reduce dimensions from 2d to 1d
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
# Add dropout to prevent overfitting
model.add(Dropout(0.5))
# Final fullyconnected layer
model.add(Dense(10, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Which returns the following model:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 28, 28, 40) 1040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 40) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 14, 14, 200) 72200
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 200) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 10, 10, 512) 922112
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 32768) 0
_________________________________________________________________
dense_1 (Dense) (None, 100) 3276900
_________________________________________________________________
dropout_1 (Dropout) (None, 100) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 1010
=================================================================
Total params: 4,273,262
Trainable params: 4,273,262
Non-trainable params: 0
_________________________________________________________________
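One practical note, as a rough sketch rather than part of the model itself: because this model uses Conv2D and categorical_crossentropy, the MNIST arrays from your code need a channel axis and one-hot labels before calling model.fit:
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add the channel axis expected by Conv2D and scale pixels to [0, 1].
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# categorical_crossentropy expects one-hot targets
# (sparse_categorical_crossentropy would accept the raw integer labels instead).
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))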
I want to apply a CNN and an LSTM to my data; I just chose a small set of data. My training data's size is (400, 50) and my testing data's size is (200, 50).
With only the CNN model it works without any errors; I only get errors when adding the LSTM part:
model = Sequential()
model.add(Conv1D(filters=8,
kernel_size=16,
padding='valid',
activation='relu',
strides=1, input_shape=(50,1)))
model.add(MaxPooling1D(pool_size=2,strides=None, padding='valid', input_shape=(50,1))) # strides=None means strides=pool_size
model.add(Conv1D(filters=8,
kernel_size=8,
padding='valid',
activation='relu',
strides=1))
model.add(MaxPooling1D(pool_size=2,strides=None, padding='valid',input_shape=(50,1)))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2)) # 100 num of LSTM units
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(TimeDistributed(Dense(256, activation='softmax')))
# # # 4. Compile model
print('########################### Compilation of the model ######################################')
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print(model.summary())
print('###########################Fitting the model ######################################')
# # # # # 5. Fit model on training data
x_train = x_train.reshape((400,50,1))
print(x_train.shape) # (400,50,1)
x_test = x_test.reshape((200,50,1))
print(x_test.shape) # (200,50,1)
model.fit(x_train, y_train, batch_size=100, epochs=100,verbose=0)
print(model.summary())
# # # # # 6. Evaluate model on test data
score = model.evaluate(x_test, y_test, verbose=0)
print (score)
This is the error:
Traceback (most recent call last):
File "CNN_LSTM_Based_Attack.py", line 156, in <module>
model.fit(x_train, y_train, batch_size=100, epochs=100,verbose=0)
File "/home/doc/.local/lib/python2.7/site-packages/keras/models.py", line 853, in fit
initial_epoch=initial_epoch)
File "/home/doc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1424, in fit
batch_size=batch_size)
File "/home/doc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1304, in _standardize_user_data
exception_prefix='target')
File "/home/doc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 127, in _standardize_input_data
str(array.shape))
ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (400, 256)
You can find the whole summary for this model here (I am new to LSTMs; this is the first time I have used one):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 35, 8) 136
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 17, 8) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 17, 8) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 10, 8) 520
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 5, 8) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 5, 8) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 5, 32) 5248
_________________________________________________________________
lstm_2 (LSTM) (None, 5, 32) 8320
_________________________________________________________________
lstm_3 (LSTM) (None, 5, 32) 8320
_________________________________________________________________
lstm_4 (LSTM) (None, 5, 32) 8320
_________________________________________________________________
lstm_5 (LSTM) (None, 5, 32) 8320
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 256) 8448
=================================================================
Total params: 47,632
Trainable params: 47,632
Non-trainable params: 0
_________________________________________________________________
When I replace these lines of code:
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2)) # 100 num of LSTM units
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True,
activation='tanh', recurrent_activation='hard_sigmoid',
dropout=0.2,recurrent_dropout=0.2))
model.add(TimeDistributed(Dense(256, activation='softmax')))
With only this line:
model.add(LSTM(26, activation='tanh'))
Then it works very well.
I would be grateful if you could help me please.
LSTM layers expect input of shape (samples, time steps, features). When stacking LSTMs you should set return_sequences = True, which gives an output of shape (samples, time steps, units) and lets the layers fit together. Set return_sequences = False on the last LSTM layer if you only want to predict one step ahead (i.e. a single value per sequence); if you don't, the model will predict as many time steps as there are in the input. You can of course also predict a different number of steps (e.g. given 50 past observations, predict the next 10), but that is a little trickier in Keras.
In your case the Conv/MaxPool layers output 5 "time steps" and you have return_sequences = True on the last LSTM layer, so your y would have to have shape (samples, 5, 256). Otherwise, set return_sequences = False on the last LSTM layer and don't use TimeDistributed, since you only predict one step ahead.
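For concreteness, a rough sketch of what that looks like, trimmed to two LSTM layers for brevity and keeping your Dense(256, activation='softmax') head, which appears to match y_train of shape (400, 256) (whether softmax with binary_crossentropy is the right pairing depends on what those 256 targets represent):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

model = Sequential()
model.add(Conv1D(filters=8, kernel_size=16, padding='valid', activation='relu',
                 strides=1, input_shape=(50, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=8, kernel_size=8, padding='valid', activation='relu', strides=1))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(32, return_sequences=True, activation='tanh', dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=False, activation='tanh', dropout=0.2, recurrent_dropout=0.2))  # last LSTM: no time axis
model.add(Dense(256, activation='softmax'))   # plain Dense instead of TimeDistributed(Dense(...))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.summary()   # output shape (None, 256), matching y_train of shape (400, 256)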