I'm making a simple image classification model in Keras and used MaxPooling2D to reduce the image sizes. Recently I learned about strides and want to implement them, but I run into errors. Here's a piece of code that gives an error:
early_stopping = EarlyStopping(monitor = 'val_loss',min_delta = 0.001, patience = 20, restore_best_weights = True)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(512, (2, 2),input_shape=(X[0].shape), strides = 2, data_format='channels_first', activation = 'relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Conv2D(512, (3, 3), data_format='channels_first',activation = 'relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3)))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(128, (3, 3), data_format='channels_first',activation = 'relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(4, 4)))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
opt = keras.optimizers.Adam(learning_rate=0.0005)
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
h= model.fit(trainx, trainy, validation_data = (valx, valy), batch_size=64, epochs=80, callbacks = [early_stopping], verbose = 0)
Here's the error:
ValueError: Negative dimension size caused by subtracting 4 from 2 for '{{node max_pooling2d_35/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 4, 4, 1], padding="VALID", strides=[1, 4, 4, 1]](Placeholder)' with input shapes: [?,128,2,46].
When I remove 'strides = 2', everything works just fine. Why is the strides option causing an input shape error, and how can I prevent it? I couldn't find any info about that.
The stride is how far the kernel shifts at each step. A stride of 2 essentially cuts the dimensions of the input in half along each axis. It seems you end up with a feature map of size 128 by 2 at some point due to your convolutions and strides, and of course you can't place a 4 x 4 pooling window on it when one axis is only 2 wide.
You can use padding here to pad the data (with zeros, I believe) so that the short axis is brought up to at least the pool size, which avoids the error.
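For example, adding padding='same' to the pooling layers zero-pads the feature map so the pooling window always fits, even after a strided convolution. A minimal sketch with a made-up 64x64 grayscale input, not your full model:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (2, 2), strides=2, activation='relu',
                           input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(3, 3), padding='same'),
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(4, 4), padding='same'),
])
model.summary()  # builds without the "negative dimension size" error

Alternatively, you can use smaller pool sizes (or drop the stride) deeper in the network, since every stride and pooling step shrinks the spatial dimensions further.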
I'm trying to train a siamese CNN model for deepfake detection. The model takes pairs of 3D images. Each pair consists of a 3D face image and a 3D background block image. The 3D image is formed by stacking images across multiple frames where the number of stacked frames is referred to as depth. Each 3D image has the shape (height, width, depth, channels=3) and dtype=float32. The images are normalized to have values between 0.0 and 1.0. I use a custom data generator to generate batches of data. The data generator is tested and provides the data as expected. Each generated batch consists of a tuple of ([input1, input2], output) where each input has shape (batch_size, img_height, img_width, img_depth, 3) and each output has shape (batch_size, 1) because it's a binary classification problem and the output is either 0 (for real) or 1 (for fake).
The assumption is for real videos, the noise patterns are the same for both the face and the background. However, for fake videos the face is manipulated while the background isn't. So the noise patterns will be different. The objective is to train a siamese network to extract features related to noise patterns from the face and background and calculate a distance score (between 0 and 1). For real videos the distance score should be close to 0. And for fake videos the distance should be close to 1.
My base model is a 3D CNN model with the following architecture:
def create_3D_CNN(input_shape):
    model = Sequential()

    # 1
    model.add(Conv3D(8, kernel_size=(3, 3, 3), padding='same', input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool3D((2, 2, 1), strides=(2, 2, 1), padding='same'))

    # 2
    model.add(Conv3D(16, kernel_size=(3, 3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same'))

    # 3
    model.add(Conv3D(32, kernel_size=(3, 3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same'))

    # 4
    model.add(Conv3D(64, kernel_size=(3, 3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same'))

    # 5
    model.add(Conv3D(128, kernel_size=(3, 3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same'))

    # final
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(1024))
    model.add(BatchNormalization())
    model.add(Activation('relu'))

    return model
It extracts 1024 features from each input sample. Because it's a siamese network, there are 2 inputs which are processed in parallel (face and background) by the same 3D CNN using shared weights. Each input is converted to a vector of 1024 features. Assume the two vectors are called a and b.
The two vectors are fed to an output layer which calculates a distance value score between 0 and 1 based on the Manhattan distance. The equation of the output layer is as follows:
predicted_output = 1 - exp(-sum|a_i - b_i|)
When a and b are similar, the predicted_output will be close to 0. When a and b are very different, it will be close to 1.
Here is the code for the rest of the model:
def manhatten_distance(vects):
    x, y = vects
    return 1 - K.exp(-K.sum(K.abs(x - y), axis=1, keepdims=True))

def man_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)
input_shape = train_gen[0][0][0][0].shape
print('input_shape =', input_shape)
#-------------- defining layers using functional api ----------------------
# inputs
input_face = Input(shape=input_shape)
input_back = Input(shape=input_shape)
# feature extraction (shared weights)
CNN_3D = create_3D_CNN(input_shape)
features_face = CNN_3D(input_face)
features_back = CNN_3D(input_back)
# distance between the 2 feature vectors
distance = Lambda(manhatten_distance, output_shape=man_dist_output_shape)([features_face, features_back])
#---------------- creating final model ---------------------------
model = Model(inputs=[input_face, input_back], outputs=distance)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
However, when I train the model, the accuracy is always close to 50% and the loss (binary_crossentropy) doesn't decrease below 7.8. When I try to make predictions after training I always get the same value.
What could be the reason for this?
Update:
To narrow down the issue, I think the problem has to do with the Manhattan distance layer.
I replaced it with a "concatenate" layer and added another layer for the output with 1 neuron and sigmoid activation, and the model seems to be learning (a sketch of that replacement is shown after this question).
Why would the Manhattan distance layer prevent the model from learning?
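For reference, the concatenation-based head described in the update might look roughly like this; it is only a sketch, reusing the features_face, features_back, input_face and input_back tensors from the code above:

from tensorflow.keras.layers import Concatenate, Dense
from tensorflow.keras.models import Model

# A learned head instead of the fixed Manhattan-distance Lambda layer:
merged = Concatenate()([features_face, features_back])  # shape (None, 2048)
output = Dense(1, activation='sigmoid')(merged)          # probability of "fake"

model = Model(inputs=[input_face, input_back], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])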
The images I would like to use to train the network are about 4000px by 3000px in size, and there are about 40k of them, sorted into 250 classes.
I have made a CNN shown below:
model = keras.Sequential([
    layers.Input((imgHeight, imgWidth, 1)),
    layers.Conv2D(16, 3, padding='same'),  # filters, kernel_size
    layers.Conv2D(32, 3, padding='same'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(250),
])
How do I figure out what value I need in layers.Conv2D(16, ...)?
How do I figure out what value I need in layers.Dense(250)?
I can't start the training process because I'm running out of memory.
The output of the Flatten() layer has about 96 million values, so the final Dense layer of your model has roughly 24 billion parameters; this is why you are running out of memory. There are some steps you can take to fix this issue:
Try resizing your images to a smaller shape; if 4000x3000x1 isn't necessary, 160x160x1 would be a good choice.
Try using more Conv2D layers followed by a MaxPool2D layer to decrease the size of the input, and then finally at the end, use a Flatten layer followed by a Dense layer.
For example:
model = keras.Sequential([
    layers.Input((160, 160, 1)),
    layers.Conv2D(32, 3, padding='same'),
    layers.Conv2D(32, 3, padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, 3, padding='same'),
    layers.Conv2D(64, 3, padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, 3, padding='same'),
    layers.Conv2D(128, 3, padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(256, 3, padding='same'),
    layers.Conv2D(256, 3, padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512),
    layers.Dense(250),
])
This type of architecture will work well if you are doing a classification task, and will not run out of memory.
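As a quick sanity check before training (a sketch, assuming the model variable defined just above), model.summary() prints every layer's output shape and parameter count, so an oversized Flatten output is visible immediately:

model.summary()

# The dominant cost is usually the first Dense layer after Flatten:
# params = (flatten_size + 1) * units. With the original 4000x3000 input,
# a single MaxPooling2D halves it to 2000x1500 with 32 channels, so:
flatten_size = 2000 * 1500 * 32          # 96,000,000 features
dense_params = (flatten_size + 1) * 250  # roughly 24 billion weights in Dense(250)
print(flatten_size, dense_params)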
I am currently trying to build a CRNN with Keras. When reshaping the input to my LSTM, I had some trouble finding the correct dimensions. After some debugging, I found a field in my model object called output_shape whose value was (3, 1, 224), and I tried passing it as a 2D shape, (3, 224). Everything worked fine, but did I do this correctly? What is the math behind this, and what can I do next time to discover this size without debugging?
def CRNN(blockSize, blockCount, inputShape, trainGen, testGen, epochs):
    model = Sequential()

    # Conv layers
    channels = 32
    for i in range(blockCount):
        for j in range(blockSize):
            if (i, j) == (0, 0):
                conv = Conv2D(channels, kernel_size=(5, 5),
                              input_shape=inputShape, padding='same')
            else:
                conv = Conv2D(channels, kernel_size=(5, 5), padding='same')
            model.add(conv)
            model.add(BatchNormalization())
            model.add(Activation('relu'))
            model.add(Dropout(0.15))
            if j == blockSize - 2:
                channels += 32
        model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
        model.add(Dropout(0.15))

    # Feature aggregation across time
    model.add(Reshape((3, 224)))

    # LSTM layer
    model.add(Bidirectional(LSTM(200), merge_mode='ave'))
    model.add(Dropout(0.5))

    # Linear classifier
    model.add(Dense(4, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(),
                  metrics=['accuracy'])  # F1?
    model.fit_generator(trainGen,
                        validation_data=testGen,
                        steps_per_epoch=trainGen.x.size // 20,
                        validation_steps=testGen.x.size // 20,
                        epochs=epochs, verbose=1)
    return model
# Function call
model = CRNN(4, 6, (140, 33, 1), trainGen, testGen, 1)
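Regarding how to discover this size without debugging: Keras infers shapes as layers are added, so one option is to build the convolutional part first and read model.output_shape before adding the Reshape. For the CRNN above, that shape works out to (None, 3, 1, 224): the 'same'-padded 2x2 poolings halve 140 and 33 six times (140 -> 3, 33 -> 1), and the channel count grows by 32 per block from 32 to 224, which is why (3, 224) holds the same number of elements. Below is a small, self-contained sketch of the idea using a hypothetical one-block toy model, not the CRNN itself:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Reshape, LSTM

# Build only the convolutional part, then read the shape Keras has inferred:
probe = Sequential([
    Conv2D(32, (5, 5), padding='same', input_shape=(140, 33, 1)),
    MaxPooling2D(pool_size=(2, 2), padding='same'),
])
print(probe.output_shape)                          # (None, 70, 17, 32)

# Derive the Reshape target from the inferred shape instead of hard-coding it:
_, timesteps, width, channels = probe.output_shape
probe.add(Reshape((timesteps, width * channels)))  # (70, 544) here
probe.add(LSTM(200))

Calling model.summary() after each block serves the same purpose if you prefer reading the full table of output shapes.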
I am trying to detect the single pixel location of a single object in an image. I have a keras CNN regression network with my image tensor as the input, and a 3 item vector as the output.
First item: 1 (an object was found) or 0 (no object was found)
Second item: a number between 0 and 1 indicating how far along the x axis the object is
Third item: a number between 0 and 1 indicating how far along the y axis the object is
I have trained the network on 2000 training images and 500 validation images; the val_loss is far less than 1, and the val_acc is best at around 0.94. Excellent.
But then when I predict the output, I find the values for all three output items are not between 0 and 1; they are actually between roughly -2 and 3. All three items should be between 0 and 1.
I have not used any non-linear activation function on the output layer and have used ReLUs for all non-output layers. Should I be using a softmax, even though it is non-linear? The second and third items predict the x and y position in the image, which seem to me to be linear quantities.
Here is my keras network:
inputs = Input((256, 256, 1))
base_kernels = 64
# 256
conv1 = Conv2D(base_kernels, 3, activation='relu', padding='same', kernel_initializer='he_normal')(inputs)
conv1 = BatchNormalization()(conv1)
conv1 = Conv2D(base_kernels, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
conv1 = BatchNormalization()(conv1)
conv1 = Dropout(0.2)(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
# 128
conv2 = Conv2D(base_kernels * 2, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
conv2 = BatchNormalization()(conv2)
conv2 = Conv2D(base_kernels * 2, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
conv2 = BatchNormalization()(conv2)
conv2 = Dropout(0.2)(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
# 64
conv3 = Conv2D(base_kernels * 4, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
conv3 = BatchNormalization()(conv3)
conv3 = Conv2D(base_kernels * 4, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
conv3 = BatchNormalization()(conv3)
conv3 = Dropout(0.2)(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
flat = Flatten()(pool3)
dense = Dense(256, activation='relu')(flat)
output = Dense(3)(dense)
model = Model(inputs=[inputs], outputs=[output])
optimizer = Adam(lr=1e-4)
model.compile(optimizer=optimizer, loss='mean_absolute_error', metrics=['accuracy'])
Can anyone please help? Thanks! :)
Chris
The sigmoid activation produces outputs between zero and one, so if you use it as the activation of your last (output) layer, the network's output will be between zero and one.
output = Dense(3, activation="sigmoid")(dense)
I trained my CNN with grayscale images of size 150x150 and the training went through without any errors. However, whenever I try to run the model.predict() function, I get this error:
expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150, 150, 1)
This happens even though I do the exact same preprocessing for the images I pass to the predict function as for the images I used to train the CNN, and they have a size of 150x150x1, just like the input shape of my CNN and the shape of the training images.
Here is my CNN:
model = Sequential()
model.add(Conv2D(32, 3, 3, input_shape=(150, 150, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, 3, 3, activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, 3, 3, activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'softmax'))
The input is of size 150x150x1, and the expected input shape my CNN reports is (None, 150, 150, 1).
I have been trying to solve this issue for days now, yet no luck whatsoever.
If you are trying to make a prediction on a single image, you should add one dimension and then make the prediction like this:
import numpy as np
img = np.expand_dims(img, axis=0)
prediction = model.predict(img)
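The same idea extends to several images: stack them along a new first axis so the batch dimension is explicit. A sketch, where imgs is a hypothetical list of preprocessed arrays, each of shape (150, 150, 1):

import numpy as np

batch = np.stack(imgs, axis=0)       # shape (n, 150, 150, 1)
predictions = model.predict(batch)   # one prediction per image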