I am building an LSTM-based deep Q-network (DQN) with Python, Keras, and TensorFlow and have run into the following problem. After I create the network with a given batch_input_shape and try to fit it to data of that shape, I receive the following warning:
WARNING:tensorflow:Model was constructed with shape (64, 1, 10) for
input Tensor("lstm_34_input:0", shape=(64, 1, 10), dtype=float32), but
it was called on an input with incompatible shape (32, 1, 10).
I have created the following toy example to demonstrate the code that causes the problem.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, Flatten
from tensorflow.keras.optimizers import Adam
# Hyperparameters
action_space = range(0, 10)
input_length = 10
batch_size = 64
timesteps = 1
learning_rate = 0.0001
# Create random input variables
state = np.random.randint(0, 100, size=(batch_size, timesteps, input_length))
target = np.random.randint(0, 100, size=(batch_size, len(action_space)))
# Build the model
model = Sequential()
model.add(LSTM(10, batch_input_shape=(batch_size, timesteps, input_length), return_sequences=True, activation="tanh",
recurrent_dropout=0, stateful=False))
model.add(LSTM(10, activation="tanh", return_sequences=True, recurrent_dropout=0, stateful=False))
model.add(Flatten())
model.add(Dense(len(action_space), activation="relu"))
model.compile(loss="mean_squared_error", optimizer=Adam(lr=learning_rate))
# Fit the model
model.fit(state, target, epochs=1, verbose=1)
Running this produces the warning seen above.
My understanding is that the input layer should expect to receive a batch of shape (64, 1, 10), and we pass it this shape. However, the input layer appears to receive a shape of (32, 1, 10). We can verify that state.shape is (64, 1, 10) as expected, so at some stage this input is being reshaped, or perhaps the warning refers to an input to a hidden or output layer?
Any help would be greatly appreciated.
Update
I am using TensorFlow GPU version 2.3.0 and Keras version 2.3.1.
For anyone trying to find a way to implement batch_input_shape in order to use a stateful LSTM, the answer comes from Simon's comment above:
add batch_size=batch_size to model.fit
You can also add it to model.predict() for a DQN.
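Applied to the toy example above, a minimal sketch of that fix looks as follows (q_values is just an illustrative name, not from the original code):
# Passing batch_size explicitly stops Keras from splitting the data into its
# default batches of 32, which is what produced the (32, 1, 10) warning.
model.fit(state, target, epochs=1, verbose=1, batch_size=batch_size)
# The same applies when generating Q-values for a batch of states in the DQN
q_values = model.predict(state, batch_size=batch_size)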
Related
I'm trying to fit an LSTM model with a Masking layer in front to my data, and I get this error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 4 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast)' with input shapes: [128,4].
This is my code:
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np
if __name__ == '__main__':
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))
    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)
    # train model
    model.fit(
        X,
        Y,
        batch_size=128)
I see from this related post that I can't use one-hot encoded labels, but my labels are not one-hot encoded.
Also, when I remove the Masking layer, training works.
From my understanding, one sample here consists of 4 timesteps with 99 features, so the shape of X is (128, 4, 99).
I therefore only have to provide one label per sample, making the shape of Y (128,).
But it seems like the dimensions of X and/or Y are not correct, as TensorFlow wants to change their dimensions?
I have tried providing a label per timestep of each sample (Y = np.random.randint(0, 2, size=(samples, timesteps))), with the same result.
Why does adding the masking layer introduce this error? And how can I keep the masking layer without getting the error?
System Information:
Python version: 3.9.5
TensorFlow version: 2.5.0
OS: Windows
I don't think the problem is the Masking layer. Since you set the parameter return_sequences to True in the LSTM layer, you get a sequence with the same number of timesteps as your input and an output space of 100 per timestep, hence the shape (128, 4, 100), where 128 is the batch size. Afterwards, you apply a BatchNormalization layer and finally a Dense layer, resulting in the shape (128, 4, 1). The problem is that your labels have an effective 2D shape of (128, 1) while your model produces a 3D output due to the return_sequences parameter. So simply setting this parameter to False should solve your problem. See also this post.
Here is a working example:
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np
if __name__ == '__main__':
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))
    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=False))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)
    # train model
    model.fit(
        X,
        Y,
        batch_size=128)
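For completeness, and as a hedged alternative in case one label per timestep is actually what is wanted (as attempted in the question), return_sequences=True can be kept as long as the targets carry an explicit last axis of size 1 so they match the (128, 4, 1) output. A sketch under that assumption, reusing the imports from the example above:
samples, timesteps, features = 128, 4, 99
X = np.random.rand(samples, timesteps, features)
Y = np.random.randint(0, 2, size=(samples, timesteps, 1))   # one label per timestep
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(None, features)))
model.add(LSTM(100, return_sequences=True))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=BinaryCrossentropy(from_logits=False), optimizer=Nadam(learning_rate=0.0001))
model.fit(X, Y, batch_size=128)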
This question already has an answer here:
ValueError: logits and labels must have the same shape ((None, 4) vs (None, 1))
I have a multi-label classification problem that I am trying to solve with a neural network using TensorFlow 2.
The problem: I am trying to predict a cause and its corresponding severity. There can be n causes, and each cause can have m possible severities.
Let's say for simplicity
number of causes = 2
number of possible severities per cause = 2
So we essentially have 4 possible outputs
We also have 4 possible input features
I wrote the code below:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras import Model
from tensorflow.keras.callbacks import ModelCheckpoint
def get_model_multilabel(n_inputs, n_outputs):
    opt = tf.keras.optimizers.SGD(lr=0.01, momentum=0.9)
    model = tf.keras.models.Sequential([
        #input layer
        Dense(10, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'),
        ## two hidden layers
        Dense(10, kernel_initializer='he_uniform', activation='relu'),
        Dropout(0.2),
        Dense(5, kernel_initializer='he_uniform', activation='relu'),
        Dropout(0.2),
        ## output layer
        Dense(n_outputs, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
n_inputs = 4 # because we have 4 features
n_outputs = 4 # because we have 4 labels
mlmodel = get_model_multilabel(n_inputs, n_outputs)
## train the model
mlmodel.fit(X_train,y_train, epochs=50, batch_size=32, validation_split = 0.2, callbacks=callbacks_list)
X_train.shape is (1144, 4) and
y_train.shape is (1144,)
Note the sigmoid activation in the last layer and the binary_crossentropy loss function, as I am trying to model a multi-label classification problem. Reference: How do I implement multilabel classification neural network with keras
When I train this, it throws the error:
ValueError: logits and labels must have the same shape ((None, 4) vs (None, 1))
Not sure what I am missing here. Please suggest.
Your y_train has the wrong shape: it should be (1144, n_outputs), but instead it is (1144,), which when reshaped is (1144, 1). Your code does not know the number of samples in advance, so this becomes (None, 1). It must match the output shape of the model, (None, 4). You have loaded the labels incorrectly.
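As a hedged illustration of the required shape (the real label encoding is not shown in the question, so the mapping below is a made-up assumption where each sample is labelled with one cause index and one severity index), the targets have to be expanded into a multi-hot row of length n_outputs per sample:
import numpy as np
n_samples = 1144
# Hypothetical raw labels: a cause index (0 or 1) and a severity index (0 or 1) per sample
cause = np.random.randint(0, 2, size=n_samples)
severity = np.random.randint(0, 2, size=n_samples)
# Multi-hot targets of shape (1144, 4): columns 0-1 encode the cause,
# columns 2-3 encode the severity, so each row has exactly two ones.
y_train = np.zeros((n_samples, 4), dtype='float32')
y_train[np.arange(n_samples), cause] = 1.0
y_train[np.arange(n_samples), 2 + severity] = 1.0
print(y_train.shape)  # (1144, 4) -> matches the (None, 4) sigmoid output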
I am trying to use the transfer learning example on keras.applications using VGG19. I am trying to train on the cifar10 dataset, so 10 classes. My model is (conceptually) simple as it's just VGG 19 minus the top three layers and then some extra layers that are trainable.
import tensorflow as tf
from keras.utils import to_categorical
from keras.applications import VGG19
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
#%%
# Specify input and number of classes
input_tensor = Input(shape=(32, 32, 3))
num_classes=10
#Load the data (cifar10), 10 classes
(X_train,y_train),(X_test,y_test)=tf.keras.datasets.cifar10.load_data()
#One_Hot_encode y data
y_test=to_categorical(y_test,num_classes=num_classes,dtype='int32')
y_train=to_categorical(y_train,num_classes=num_classes,dtype='int32')
#%%
# create the base pre-trained model
base_model = VGG19(weights='imagenet', include_top=False,
input_tensor=input_tensor)
# Add a fully connected layer and then a logistic layer
x = base_model.output
# # let's add a fully-connected layer
x = Dense(1024, activation='relu',name='Fully_Connected')(x)
# and a logistic layer with num_classes outputs
predictions = Dense(num_classes, activation='softmax',name='Logistic')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional VGG19 layers
for layer in base_model.layers:
layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
# train the model on the new data for a few epochs
model.fit(X_train,y_train,epochs=10)
#%%
model.evaluate(X_test,y_test)
Now, when I try to train using X_train (dimensions (50000, 32, 32, 3)) and y_train (dimensions (50000, 10)),
I get an error:
ValueError: Error when checking target: expected Logistic to have 4
dimensions, but got array with shape (50000, 10)
So for some reason the model isn't realizing that its output shape should be a 1x10 vector with one-hot encoding for the 10 classes.
How can I make it so that the dimensions agree? I don't fully understand the dimensions of output that Keras is expecting here. When I do model.summary(), the Logistic layer reports an output shape of (None, 1, 1, 10), which when flattened should just give a vector of length 10.
VGG19 without the top layers does not end in a fully connected layer; instead it returns a 2D feature map (the output of a Conv2D/MaxPooling2D layer, I believe). You'll probably want to place a Flatten layer after the VGG base; that is the best practical choice, as it will make your final output shape (None, 10).
Otherwise, you could do
y_train = np.reshape(y_train, (50000,1,1,10))
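A minimal sketch of the first option, reusing the names from the question (base_model, num_classes, Dense, and Model are assumed to be defined and imported as above):
from keras.layers import Flatten
# Flatten the (None, 1, 1, 512) feature map produced by VGG19 on 32x32 inputs
# before the fully connected layers, so the final output becomes (None, 10).
x = base_model.output
x = Flatten(name='Flatten')(x)
x = Dense(1024, activation='relu', name='Fully_Connected')(x)
predictions = Dense(num_classes, activation='softmax', name='Logistic')(x)
model = Model(inputs=base_model.input, outputs=predictions)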
Hi,
I'm working with a 1D CNN in Keras but I'm having lots of trouble with the input_shape variable.
I have a time series of 100 timesteps and 5 features with boolean labels. I want to train a 1D CNN that works with a sliding window of length 10. This is a very simple piece of code I wrote:
from keras.models import Sequential
from keras.layers import Dense, Conv1D
import numpy as np
N_FEATURES=5
N_TIMESTEPS=10
X = np.random.rand((100, N_FEATURES))
Y = np.random.randint(0,2, size=100)
# CNN
model.Sequential()
model.add(Conv1D(filter=32, kernel_size=N_TIMESTEPS, activation='relu', input_shape=N_FEATURES))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
My problem here is that I get the following error:
File "<ipython-input-2-43966a5809bd>", line 2, in <module>
model.add(Conv1D(filter=32, kernel_size=10, activation='relu', input_shape=N_FEATURES))
TypeError: __init__() takes at least 3 arguments (3 given)
I've also tried by passing to the input_shape the following values:
input_shape=(None, N_FEATURES)
input_shape=(1, N_FEATURES)
input_shape=(N_FEATURES, None)
input_shape=(N_FEATURES, 1)
input_shape=(N_FEATURES, )
Do you know what's wrong with the code, or can you explain in general the logic behind the input_shape variable in a Keras CNN?
The crazy thing is that my problem is the same as in the following:
Keras CNN Error: expected Sequence to have 3 dimensions, but got array with shape (500, 400)
But I cannot solve it with the solution given in the post.
The Keras version is 2.0.6-tf
Thanks
This should work:
from keras.models import Sequential
from keras.layers import Dense, Conv1D
import numpy as np
N_FEATURES=5
N_TIMESTEPS=10
X = np.random.rand(100, N_FEATURES)
Y = np.random.randint(0,2, size=100)
# Create a Sequential model
model = Sequential()
# Change the input shape to input_shape=(N_TIMESTEPS, N_FEATURES)
model.add(Conv1D(filters=32, kernel_size=N_TIMESTEPS, activation='relu', input_shape=(N_TIMESTEPS, N_FEATURES)))
# If it is a binary classification then you want 1 neuron - Dense(1, activation='sigmoid')
model.add(Dense(1, activation='sigmoid'))
# For binary classification with a single sigmoid output, use binary_crossentropy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Please see the comments before each line of code. Moreover, the input shape that Conv1D expects is (time_steps, feature_size_per_time_step). The translation of that for your code is (N_TIMESTEPS, N_FEATURES).
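As a hedged sketch of what that means for the data in the question (how windows map to labels is an assumption here; each window is paired with the label of its last timestep), the 100-step series can be cut into overlapping windows so every sample has the (N_TIMESTEPS, N_FEATURES) shape Conv1D expects:
import numpy as np
N_FEATURES = 5
N_TIMESTEPS = 10
series = np.random.rand(100, N_FEATURES)        # the original (100, 5) time series
labels = np.random.randint(0, 2, size=100)      # one boolean label per timestep
# Build overlapping windows of length N_TIMESTEPS
X = np.stack([series[i:i + N_TIMESTEPS] for i in range(100 - N_TIMESTEPS + 1)])
Y = labels[N_TIMESTEPS - 1:]                    # label of each window's last timestep
print(X.shape, Y.shape)   # (91, 10, 5) (91,) -> ready to pass to model.fit(X, Y)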
I have a trained model which was created using Keras. I want to apply transfer learning to this model by freezing all but the last convolutional layer. However, when I fit the model after freezing the layers, I notice that some of the frozen layers have different weights. How can I avoid this?
I tried to freeze the entire model with model.trainable = False but this also didn't work out.
I am using Python 3.5.0, TensorFlow 1.0.1, and Keras 2.0.3.
Example script
import os
import timeit
import datetime
import numpy as np
from keras.layers.core import Activation, Reshape, Permute
from keras.layers.convolutional import Convolution2D, MaxPooling2D, UpSampling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam
from keras import models
from keras import backend as K
K.set_image_dim_ordering('th')
def conv_model(input_shape, data_shape, kern_size, filt_size, pad_size,\
               maxpool_size, n_classes, compile_model=True):
    """
    Create a small conv neural network
    input_shape: input shape of the images
    data_shape: 1d shape of the data
    kern_size: Kernel size used in all convolutional2d layers
    filt_size: Filter size of the first and last convolutional2d layer
    pad_size: size of padding
    maxpool_size: Pool size of all maxpooling2d and upsampling2d layers
    n_classes: number of output classes
    compile_model: True if the model should be compiled
    output: Keras deep learning model
    """
    #keep track of compilation time
    start_time = timeit.default_timer()
    model = models.Sequential()
    # Add a noise layer to get a denoising autoencoder. This helps avoid overfitting
    model.add(ZeroPadding2D(padding=(pad_size, pad_size), input_shape=input_shape))
    #Encoding layers
    model.add(Convolution2D(filt_size, kern_size, kern_size, border_mode='valid'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
    model.add(UpSampling2D(size=(maxpool_size, maxpool_size)))
    model.add(ZeroPadding2D(padding=(pad_size, pad_size)))
    model.add(Convolution2D(filt_size, kern_size, kern_size, border_mode='valid'))
    model.add(BatchNormalization())
    model.add(Convolution2D(n_classes, 1, 1, border_mode='valid'))
    model.add(Reshape((n_classes, data_shape), input_shape=(n_classes,)+input_shape[1:]))
    model.add(Permute((2, 1)))
    model.add(Activation('softmax'))
    if compile_model:
        model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
        print('Model compiled in {0} seconds'.format(datetime.timedelta(seconds=round(\
            timeit.default_timer() - start_time))))
    return model
if __name__ == '__main__':
    #Create some random training data
    train_data = np.random.randint(0, 10, 3*512*512*20, dtype='uint8').reshape(-1, 3, 512, 512)
    train_labels = np.random.randint(0, 1, 7*512*512*20, dtype='uint8').reshape(-1, 512*512, 7)
    #Get dims of the data
    data_dims = train_data.shape[2:]
    data_shape = np.prod(data_dims)
    #Create initial model
    initial_model = conv_model((train_data.shape[1], train_data.shape[2], train_data.shape[3]),\
                               data_shape, 3, 4, 1, 2, train_labels.shape[-1])
    #Train initial model on first part of the training data
    initial_model.fit(train_data[0:10], train_labels[0:10], verbose=2)
    #Store initial weights
    initial_weights = initial_model.get_weights()
    #Create transfer learning model
    transf_model = conv_model((train_data.shape[1], train_data.shape[2], train_data.shape[3]),\
                              data_shape, 3, 4, 1, 2, train_labels.shape[-1], False)
    #Set transfer model weights
    transf_model.set_weights(initial_weights)
    #Set all layers trainable to False (except final conv layer)
    for layer in transf_model.layers:
        layer.trainable = False
    transf_model.layers[9].trainable = True
    print(transf_model.layers[9])
    #Compile model
    transf_model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=1e-4),\
                         metrics=["accuracy"])
    #Train model on second part of the data
    transf_model.fit(train_data[10:20], train_labels[10:20], verbose=2)
    #Store transfer model weights
    transf_weights = transf_model.get_weights()
    #Check where the weights have changed
    for i in range(len(initial_weights)):
        update_w = np.sum(initial_weights[i] != transf_weights[i])
        if update_w != 0:
            print(str(update_w)+' updated weights for layer '+str(transf_model.layers[i]))
Once you compile your model, you lose your previous weights, as they are resampled. You need to first transfer them, set the layers to not be trainable, and only then compile:
#Transfer the initial weights first
transf_model.set_weights(initial_weights)
#Set all layers trainable to False (except final conv layer)
for layer in transf_model.layers:
    layer.trainable = False
transf_model.layers[9].trainable = True
transf_model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=1e-4),\
                     metrics=["accuracy"])
Otherwise, the weights would change, as they are resampled.
EDIT:
The model should be compiled after these changes, because during compilation Keras collects all trainable / non-trainable weights into a list which is not updated afterwards.
You should upgrade Keras to v2.1.3.
This issue has just been solved, and the ability to freeze BatchNormalization layers is now available in the recent release:
trainable attribute in BatchNormalization now disables the updates of the batch statistics (i.e. if trainable == False the layer will now run 100% in inference mode).
The reason for the error:
In previous versions, the moving mean and variance parameters of BatchNormalization layers could not be made untrainable, so freezing did not fully work even though you set layer.trainable = False.
Now, it works!
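As a quick sanity check after upgrading (a sketch reusing the names from the question's script, and assuming the same layer order, so the first BatchNormalization layer sits at index 2), the weights of a frozen BatchNormalization layer, including its moving mean and variance, should now come out of fit unchanged:
# transf_model.layers[2] is the first BatchNormalization layer in the question's model
bn_layer = transf_model.layers[2]
before = [w.copy() for w in bn_layer.get_weights()]
transf_model.fit(train_data[10:20], train_labels[10:20], verbose=2)
after = bn_layer.get_weights()
# With Keras >= 2.1.3 and layer.trainable = False, every comparison should print True
print([np.array_equal(b, a) for b, a in zip(before, after)])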