I am using Keras Tuner to optimize hyperparameters: the number of hidden layers, neurons per layer, activation function, and learning rate. I have a time series regression problem with 31 inputs, 32 outputs, and N data samples.
My original X_train shape is (N, 31) and Y_train shape is (N, 32). To match the shape Keras expects, I reshape them as follows:
X_train.shape: (N, 31, 1)
Y_train.shape: (N, 32)
So X_train.shape[1] is 31 and Y_train.shape[1] is 32. When I run the hyperparameter search, the following error occurs:
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 20)
What am I missing, and what is the issue?
LSTM layers expect a 3D input tensor with shape [batch, timesteps, features]. Since the number of layers is itself a tuning parameter, whenever two or more LSTM layers are stacked, every LSTM after the first also expects a 3D input. You therefore need to set return_sequences=True on each LSTM except the last one, so that its output tensor keeps ndim=3 (batch size, timesteps, hidden state) and can be fed into the next LSTM layer.
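For illustration, here is a minimal sketch of a tuner build function along those lines, assuming the (timesteps=31, features=1) input and 32 outputs from the question; the hyperparameter names and ranges are just placeholders:

from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.Input(shape=(31, 1)))  # (timesteps, features), as in the question
    n_lstm = hp.Int('num_lstm_layers', 1, 3)
    for i in range(n_lstm):
        model.add(layers.LSTM(
            units=hp.Int(f'units_{i}', 32, 128, step=32),
            # every LSTM except the last must return the full sequence
            # so that the following LSTM still receives a 3D tensor
            return_sequences=(i < n_lstm - 1)))
    model.add(layers.Dense(32))  # 32 regression outputs
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='mse')
    return model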
Related
I'm trying to fit a tf.data.Dataset as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
INPUT_NEURONS = 10
OUTPUT_NEURONS = 1
features = tf.random.normal((1000, INPUT_NEURONS))
labels = tf.random.normal((1000, OUTPUT_NEURONS))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
def build_model():
    model = keras.Sequential(
        [
            layers.Dense(3, input_shape=[INPUT_NEURONS]),
            layers.Dense(OUTPUT_NEURONS),
        ]
    )
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model
model = build_model()
model.fit(dataset, epochs=2, verbose=2)
However, I'm getting the following error:
ValueError: Input 0 of layer sequential is incompatible with the
layer: expected axis -1 of input shape to have value 10 but received
input with shape (10, 1)
The model.summary() looks good though:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 3) 33
_________________________________________________________________
dense_1 (Dense) (None, 1) 4
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
Is Keras Model.fit() actually suited to a tf.data.Dataset? If so, what am I doing wrong here?
as far as I know, training using batches is optional, a hyperparameter to use or not during model development
Not exactly; it's not optional. TF-Keras is designed to work with batches. The first dimension in the summary always corresponds to batch_size, and None indicates that the model accepts any batch size.
Most of the time you want your model to accept any batch size. If you use stateful LSTMs, however, you need to define a static batch_size.
After you put your data into a tf.data.Dataset, the elements do not carry a batch dimension:
dataset.element_spec
>> (TensorSpec(shape=(10,), dtype=tf.float32, name=None),
TensorSpec(shape=(1,), dtype=tf.float32, name=None))
When using tf.data, the batch_size argument of Model.fit() is ignored, so batching has to be done on the dataset itself. Also, you may not always know how many elements a tf.data.Dataset contains.
In this situation it makes sense (I'll explain why) to batch after creating the dataset:
dataset.batch(3).element_spec
>> (TensorSpec(shape=(None, 10), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 1), dtype=tf.float32, name=None))
tf.data is generally used with medium-to-large datasets, so batching after creation allows vectorized transformations. Consider these scenarios:
You have 5M rows of signal data and want to apply an FFT. If you don't batch before the FFT step, it will be applied element by element.
You have a dataset of 100K images and want to apply some transformations or other operations. A batched dataset allows faster, vectorized transformations and saves a lot of time.
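Applied to the example above, the fix is simply to batch the dataset before calling fit (a batch size of 32 is an arbitrary choice here):

# Batch the dataset before fitting; Model.fit ignores batch_size for tf.data inputs.
batched_dataset = dataset.batch(32)

model = build_model()
model.fit(batched_dataset, epochs=2, verbose=2)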
I am learning to fit an LSTM model to a multi-class classification dataset (eight genres of music), but I'm unsure about the input shape for the Keras model.
I've followed the tutorials here:
How to reshape input data for LSTM model
Multi-Class Classification Tutorial with the Keras Deep Learning Library
Sequence Classification with LSTM Recurrent Neural Networks in Python with Keras
My data is like this:
vector_1,vector_2,...vector_30,genre
23.5 20.5 3 pop
.
.
.
(7678)
I reshaped my data to (7678, 1, 30): 7678 pieces of music, 1 timestep, and 30 features. For the genre labels I used train_labels = pd.get_dummies(df['genre'])
Here is my model:
# build a sequential model
model = Sequential()
# Keras convention: input_shape=(timesteps, features), i.e. (1, 30) from scaled_train
model.add(LSTM(32, input_shape=(1, 30), return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
# to avoid overfitting
model.add(Dropout(0.3))
# output layer
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Fitting the model
model.fit(scaled_train,train_labels,epochs=5,validation_data=(scaled_validation,valid_labels))
But when trying to fit the model, I got the error ValueError: Shapes (None, 8) and (None, 1, 8) are incompatible. Is there anything I did wrong in the code? Any help is highly appreciated.
The shape of my data
print(scaled_train.shape)
print(train_labels.shape)
print(scaled_validation.shape)
print(valid_labels.shape)
(7678, 1, 30)
(7678, 8)
(450, 30)
(450, 8)
EDIT
I've tried How to stack multiple lstm in keras?
But I still get the error ValueError: Input 0 of layer sequential_21 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 30]
As the name suggests, return_sequences=True returns the full sequence (including the timestep dimension). That's why your output shape is (None, 1, 8): the timestep dimension is kept, and it isn't flattened automatically when it passes through the Dense layer. Try:
model = Sequential()
model.add(LSTM(32, input_shape=(1, 30), return_sequences=False))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(8, activation='softmax'))
I guess this doesn't happen if you uncomment the second LSTM layer?
When I tried to experiment with CNN pruning, I got stuck right at the beginning because I couldn't explain the weight dimensions to myself.
The CNN has the following structure (from model.layers):
conv2d (64 filters with filter dimension 5x5)
max_pooling2d
dropout
conv2d (128 filters with filter dimension 5x5)
max_pooling2d
flatten
dense (128 units)
dense (39 classes)
The corresponding weights have the following dimensions (from .get_weights()):
conv2d: shape(5,5,1,64)
max_pooling2d: shape(64,)
dropout: shape(5,5,64,128)
conv2d: shape(128,)
max_pooling2d: shape(6272,128)
flatten: shape(128,)
dense: shape(128,39)
dense: shape(39,)
Please have a look at the Conv2D layers and their parameters and dimensions. The first Conv2D layer (conv2d: shape(5,5,1,64)) seems to have an explainable number of weights: 5 x 5 (filter size) and 64 filters.
What is unclear to me is why the second Conv2D layer (conv2d: shape(128,)) only has 128 entries in the weights array. The dropout layer before it (dropout: shape(5,5,64,128)) seems to have the weight dimensions I would expect the Conv2D layer to have.
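A quick way to check which arrays belong to which layer is to print them per layer rather than via the flattened model.get_weights() list; a minimal sketch, assuming the trained model is in a variable named model:

for layer in model.layers:
    # each layer returns only its own weights, e.g. [kernel, bias] for Conv2D/Dense
    print(layer.name, [w.shape for w in layer.get_weights()])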
I have the following code, which works on the pre-trained VGG model but fails on the ResNet and Inception models.
from tensorflow import keras
from tensorflow.keras.models import Sequential

vgg_model = keras.applications.vgg16.VGG16(weights='imagenet')
type(vgg_model)
vgg_model.summary()
model = Sequential()
for layer in vgg_model.layers:
model.add(layer)
Now, changing the model to ResNet as follows:
resnet_model=keras.applications.resnet50.ResNet50(weights='imagenet')
type(resnet_model)
resnet_model.summary()
model = Sequential()
for layer in resnet_model.layers:
model.add(layer)
gives the following error:
ValueError: Input 0 is incompatible with layer res2a_branch1: expected axis -1 of input shape to have value 64 but got shape (None, 56, 56, 256)
The problem is that, unlike VGG, ResNet does not have a sequential architecture (i.e. some layers are connected to more than one layer, there are skip connections, etc.). Therefore you cannot iterate over the layers of the model and connect each one to the previous layer (i.e. sequentially). You can plot the architecture of the model using plot_model() to get a better picture of this point.
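For example, a minimal sketch of plotting it (the output filename is arbitrary; plot_model needs pydot and graphviz installed):

from tensorflow.keras.utils import plot_model

# draws the full graph, including the skip connections that break the Sequential approach
plot_model(resnet_model, to_file='resnet50.png', show_shapes=True)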
I am trying to use an LSTM for multi-class classification of time series data.
The training set has dimensions (390, 179), i.e. 390 objects with 179 time steps each.
There are 37 possible classes.
I would like to use a Keras model with just an LSTM and activation layer to classify input data.
I also need the hidden states for all the training data and test data passed through the model, at every step of the LSTM (not just the final state).
I know return_sequences=True is needed, but I'm having trouble getting dimensions to match.
Below is some code I've tried, but I've also tried a ton of other combinations of calls from assorted StackExchange answers and GitHub issues. In all of them I get one dimension mismatch or another.
I don't know how to extract the hidden state representations from the model.
We have X_train.shape = (390, 1, 179) and Y_train.shape = (390, 37) (one-hot binary vectors).
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_units = 8
n_sequence = 179
n_class = 37
x = Input(shape=(1, n_sequence))
y = LSTM(n_units, return_sequences=True)(x)
z = Dense(n_class, activation='softmax')(y)
model = Model(inputs=[x], outputs=[z])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X_train, Y_train, epochs=100, batch_size=128)
Y_test_predict = model.predict(X_test, batch_size=128)
This is what the above gives me:
ValueError: A target array with shape (390, 37) was passed for an output of shape (None, 1, 37) while using as loss 'categorical_crossentropy'. This loss expects targets to have the same shape as the output.
Your input shape should look like this: (samples, timesteps, features).
Here samples is how many sequences you have, timesteps is how long your sequences are, and features is how many values you feed in at each timestep.
If you set return_sequences=True, your label array should have the shape (samples, timesteps, output features).
There didn't seem to be any way to build a working trainable model while also returning the hidden states with return_sequences=True.
The fix I found was to build a predictor model, train it, and save the weights. Then I built a new model that ended with the LSTM layer and loaded the trained weights into it. With return_sequences=True, I could then predict on new data and get its hidden-state representation at every timestep.
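A minimal sketch of that two-model approach, assuming the shapes from the question (the layer name 'lstm' is just a label used to copy the weights):

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_units, n_sequence, n_class = 8, 179, 37

# Trainable model: only the final LSTM output feeds the classifier.
x = Input(shape=(1, n_sequence))
h = LSTM(n_units, return_sequences=False, name='lstm')(x)
out = Dense(n_class, activation='softmax')(h)
train_model = Model(inputs=x, outputs=out)
train_model.compile(loss='categorical_crossentropy', optimizer='adam')
# train_model.fit(X_train, Y_train, epochs=100, batch_size=128)

# Inspection model: same LSTM weights, but returns the hidden state at every timestep.
x2 = Input(shape=(1, n_sequence))
h2 = LSTM(n_units, return_sequences=True, name='lstm')(x2)
state_model = Model(inputs=x2, outputs=h2)
state_model.get_layer('lstm').set_weights(train_model.get_layer('lstm').get_weights())
# hidden_states = state_model.predict(X_test)  # shape: (samples, timesteps, n_units)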