Adding the padding='same' argument to Conv2D layers broke the model

I created this model
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D, Input, Dense
from tensorflow.keras.layers import Reshape, Flatten
from tensorflow.keras import Model
def create_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10, optimizer='adam',
                      show_summary=True):
    inputs = Input(input_shape)
    x = Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction, n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model
I tried the model on the MNIST dataset:
import tensorflow as tf
import numpy as np
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
inputs = x_train
outputs = tf.keras.utils.to_categorical(y_train, num_classes=10)
outputs = np.expand_dims(outputs,1)
model = create_DeepCAPCHA(input_shape=(28,28,1),n_prediction=1,n_class=10)
model.fit(inputs, outputs, epochs=10, validation_split=0.1)
but it failed to converge (it was stuck at 10% accuracy, the same as random guessing). Yet when I remove the padding='same' argument from the Conv2D layers, it works flawlessly:
def working_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10, optimizer='adam',
                       show_summary=True):
    inputs = Input(input_shape)
    x = Conv2D(filters=32, kernel_size=3, activation='relu')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction, n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model
Does anyone have an idea what the problem is?

Thank you for sharing, it was really interesting to me. So I wrote the code and tested several scenarios. Note that what I'm going to say is just my guess and I'm not sure about it.
My conclusion from those tests is that no padding (valid padding) works because it produces a (1, 1, 64) output shape at the last conv layer. If you set the padding to same, that output becomes (3, 3, 64), and because the next layer is a big Dense layer, the number of parameters feeding it grows by a factor of 9 (Flatten now gives 576 features instead of 64, so the first Dense layer has 576 x 512 ≈ 295k weights instead of 64 x 512 ≈ 33k). I expected that to cause overfitting, but instead it seems to make it much harder for the network to find good parameter values. So I tried several ways to reduce the output of the last conv layer back to (1, 1, 64), or at least shrink the Dense layer's input, as below:
using one more conv layer + maxpooling
change the last maxpooling to pool_size of 4
using stride of 2 for one of conv layers
change the filters of last conv layer to 20
and they all worked well. Even changing the Dense units from 512 to 64 helps (note that even then you may occasionally get poor results, because of bad initialization I guess).
Then I changed the output shape of the last conv layer to (2, 2, 64) and the chance of getting a good result (more than 90% accuracy) dropped (a lot of the time I got 10% accuracy).
So it seems that too many parameters can confuse the model; a sketch of one of the variants I tested is below. But if you want to know why the network does not simply overfit instead, I have no answer for you.
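For reference, here is a minimal sketch (just my experiment, not the only possible fix) of the variant that keeps padding='same' but enlarges the last pooling window, so the final conv output collapses back to (1, 1, 64); the rest of create_DeepCAPCHA stays unchanged:
inputs = Input(input_shape)
x = Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
x = MaxPooling2D(pool_size=2)(x)   # 28x28 -> 14x14
x = Conv2D(filters=48, kernel_size=3, activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=2)(x)   # 14x14 -> 7x7
x = Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=4)(x)   # 7x7 -> 1x1, so Flatten() yields 64 features again
x = Flatten()(x)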

Related

How do I correctly use Keras Embedding layer?

I have written the following multi-input Keras TensorFlow model:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout
from tensorflow.keras.models import Model

CHARPROTLEN = 25 #size of vocab
CHARCANSMILEN = 62 #size of vocab
protein_input = Input(shape=(train_protein.shape[1:]))
compound_input = Input(shape=(train_smile.shape[1:]))
#protein layers
x = Embedding(input_dim=CHARPROTLEN+1,output_dim=128, input_length=maximum_amino_acid_sequence_length) (protein_input)
x = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(x)
x = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=8)(x)
x = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=12)(x)
final_protein = GlobalMaxPooling1D()(x)
#compound layers
y = Embedding(input_dim=CHARCANSMILEN+1,output_dim=128, input_length=maximum_SMILES_length) (compound_input)
y = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(y)
y = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=6)(y)
y = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=8)(y)
final_compound = GlobalMaxPooling1D()(y)
join = tf.keras.layers.concatenate([final_protein, final_compound], axis=-1)
x = Dense(1024, activation="relu")(join)
x = Dropout(0.1)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1,kernel_initializer='normal')(x)
model = Model(inputs=[protein_input, compound_input], outputs=[predictions])
The inputs have the following shapes:
train_protein.shape
TensorShape([5411, 1500, 1])
train_smile.shape
TensorShape([5411, 100, 1])
I get the following error message:
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv1d. Consider increasing the input size. Received input shape [None, 1500, 1, 128] which would produce output shape with a zero or negative value in a dimension.
Is this due to the Embedding layer having the incorrect output_dim? How do I correct this? Thanks.
A Conv1D layer requires the input shape (batch_size, timesteps, features), which train_protein and train_smile already have. For example, train_protein consists of 5411 samples, where each sample has 1500 timesteps, and each timestep one feature. Applying an Embedding layer to them results in adding an additional dimension, which Conv1D layers cannot work with.
You have two options. Either leave out the Embedding layer altogether and feed your inputs directly to the Conv1D layers, or reshape your data to (5411, 1500) for train_protein and (5411, 100) for train_smile, using tf.reshape, tf.squeeze, or tf.keras.layers.Reshape. Afterwards you can use the Embedding layer as planned; note that output_dim determines the size of the vector that each timestep is mapped to. A sketch of the second option follows.
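For example, here is a minimal sketch of the second option for the protein branch (CHARPROTLEN and the shapes are taken from the question; the zero tensor is only a stand-in for the real train_protein):
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D

CHARPROTLEN = 25  # vocab size, as in the question

# stand-in for the real train_protein tensor of shape (5411, 1500, 1)
train_protein = tf.zeros((5411, 1500, 1), dtype=tf.int32)
train_protein = tf.squeeze(train_protein, axis=-1)   # (5411, 1500, 1) -> (5411, 1500)

protein_input = Input(shape=(1500,))
x = Embedding(input_dim=CHARPROTLEN + 1, output_dim=128)(protein_input)  # (None, 1500, 128)
x = Conv1D(filters=32, kernel_size=4, padding="valid", activation="relu")(x)
final_protein = GlobalMaxPooling1D()(x)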

How to solve input tensor error in adopting pre-trained Keras models

I am trying to adopt a pre-trained Keras model as follows, but it requires the input to be a tensor. Can anyone help me solve it?
import tensorflow
from tensorflow.keras import layers
from keras.applications.vgg19 import VGG19

inputs = layers.Input(shape = (32,32,4))
vgg_model = VGG19(weights='imagenet', include_top=False)
vgg_model.trainable = False
x = tensorflow.keras.layers.Flatten(name='flatten')(vgg_model)
x = tensorflow.keras.layers.Dense(512, activation='relu', name='fc1')(x)
x = tensorflow.keras.layers.Dense(512, activation='relu', name='fc2')(x)
x = tensorflow.keras.layers.Dense(1,name='predictions')(x)
new_model = tensorflow.keras.models.Model(inputs=inputs, outputs=x)
new_model.compile(optimizer='adam', loss='mean_squared_error',
metrics=['mae'])
error:
TypeError: Inputs to a layer should be tensors. Got: <keras.engine.functional.Functional object at 0x000001F48267B588>
If you want to use the VGG19 as your base model, you will have to use its output as the input to your custom model:
import tensorflow as tf
from keras.applications.vgg19 import VGG19
vgg_model = VGG19(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
vgg_model.trainable = False
x = vgg_model.output
x = tf.keras.layers.Dense(512, activation='relu', name='fc1')(x)
x = tf.keras.layers.Dense(512, activation='relu', name='fc2')(x)
x = tf.keras.layers.Dense(1, name='predictions')(x)
new_model = tf.keras.Model(inputs=vgg_model.input, outputs=x)
new_model.compile(optimizer='adam', loss='mean_squared_error',
metrics=['mae'])
new_model(tf.random.normal((1, 32, 32, 3)))
Note that I removed your Flatten layer: with include_top=False and a 32x32 input, the spatial output of vgg_model is already 1x1, so the Dense layers can be applied directly (you could still add a Flatten or GlobalAveragePooling2D after vgg_model.output if you want an explicit (batch_size, features) tensor).

How to convert a tensorflow model to a pytorch model?

I'm new to pytorch. Here's an architecture of a tensorflow model and I'd like to convert it into a pytorch model.
I have done most of the codes but am confused about a few places.
1) In tensorflow, the Conv2D function takes the number of filters as an input. However, in pytorch, the function takes the sizes of the input and output channels as inputs. So how do I find the equivalent numbers of input and output channels, given the number of filters?
2) In tensorflow, the Dense layer has a parameter for the number of nodes (units). However, in pytorch, the same layer takes 2 different inputs (the size of the input and the size of the output), so how do I determine them based on the number of nodes?
Here's the tensorflow code.
from keras.utils import to_categorical
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Here's my code:
import torch
import torch.nn as nn
import torch.nn.functional as F

# The network should inherit from nn.Module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define 2D convolution layers
        # 3: input channels, 32: output channels, 5: kernel size, 1: stride
        self.conv1 = nn.Conv2d(3, 32, 5, 1)  # input channels = 3 because all images are coloured
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.conv4 = nn.Conv2d(128, 256, 3, 1)
        # Dropout 'filters out' some of the activations by setting them to zero with the given probability
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # Fully connected layers: input size, output size
        self.fc1 = nn.Linear(36864, 128)
        self.fc2 = nn.Linear(128, 10)

    # forward() links all the layers together
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Thanks in advance!
1) In pytorch, Conv2d takes the number of input channels and output channels as arguments. In your first layer, the input channels will be the number of color channels in your image. After that, it is always the same as the output channels of the previous layer (output channels correspond to the filters parameter in Tensorflow).
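For instance, a minimal sketch of that mapping for the first two Conv2D layers of the Keras model above (assuming RGB images, i.e. 3 input channels):
import torch.nn as nn

# Keras: Conv2D(filters=32, kernel_size=(5, 5)) applied to an RGB image
conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)
# the next Conv2D(filters=32, ...): in_channels equals the previous layer's out_channels
conv2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5)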
2) Pytorch is slightly annoying in that when flattening your conv outputs you have to calculate the shape yourself. You can either compute it layer by layer with the formula Out = (W - F + 2P) / S + 1 (W: input size, F: kernel size, P: padding, S: stride), or write a small shape-calculating function that passes a dummy image through the conv part of the network and reads off the resulting shape. That number becomes the input size of your first fully connected layer; the output size is just the number of nodes you want in the next fully connected layer.
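Here is a rough sketch of the dummy-image approach; the channel counts are taken from the Keras model above, and the input shape (3, 32, 32) is only an assumption, so replace it with the real shape of X_train:
import torch
import torch.nn as nn

def flattened_size(conv_part, input_shape=(3, 32, 32)):
    """Pass a dummy batch through the conv part and return the flattened feature count."""
    with torch.no_grad():
        out = conv_part(torch.zeros(1, *input_shape))
    return out.view(1, -1).size(1)

# conv part mirroring the Keras model above: 32, 32, pool, 64, 64, pool
conv_part = nn.Sequential(
    nn.Conv2d(3, 32, 5), nn.ReLU(),
    nn.Conv2d(32, 32, 5), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout2d(0.25),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout2d(0.25),
)
in_features = flattened_size(conv_part)   # use this for nn.Linear(in_features, 256)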

How to remove the FC layer from a fine-tuned Keras model

So I have fine-tuned a ResNet50 model with the following architecture:
model = models.Sequential()
model.add(resnet)
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(layers.Dense(2048, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(736, activation='softmax')) # Output layer
So now I have a saved model (.h5) which I want to use as the base of another model, but without its last layer. I would normally do it like this with a base ResNet50 model:
def base_model():
    resnet = resnet50.ResNet50(weights="imagenet", include_top=False)
    x = resnet.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)
but that does not work for the model as it gives me an error. I am trying it like this right now but still, it does not work.
def base_model():
    resnet = load_model("../Models/fine_tuned_model/fine_tuned_resnet50.h5")
    x = resnet.layers.pop()
    #resnet = resnet50.ResNet50(weights="imagenet", include_top=False)
    #x = resnet.output
    #x = GlobalAveragePooling2D()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)
enhanced_resent = base_model()
This is the error that it gives me.
Layer dense_3 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.core.Dense'>. Full input: [<keras.layers.core.Dense object at 0x000001C61E68E2E8>]. All inputs to the layer should be tensors.
I don't know if I can do this or not.
I have finally figured it out after quitting for an hour. So this is how you will do it.
def base_model():
    resnet = load_model("../Models/fine_tuned_model/42-0.85.h5")
    x = resnet.layers[-2].output
    x = Dense(4096, activation='relu', name="FC1")(x)
    x = Dropout(0.6, name="FCDrop1")(x)
    x = Dense(4096, activation='relu', name="FC2")(x)
    x = Dropout(0.6, name="FCDrop2")(x)
    x = Lambda(lambda x_: K.l2_normalize(x_, axis=1))(x)
    return Model(inputs=resnet.input, outputs=x)
enhanced_resent = base_model()
And this works perfectly. I hope this helps out someone else as I have never seen this done in any tutorial before.
x = resnet.layers[-2].output
This will get the layer you want, but you need to know which index that layer is at. -2 is the 2nd-to-last layer, which is the one I wanted since I was after the feature extraction, not the final classification. The right index can be found by running
model.summary()
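If you are not sure which index to use, a quick sketch (assuming the same fine-tuned .h5 file) is to enumerate the layers and inspect their names before picking the output you want:
from keras.models import load_model

resnet = load_model("../Models/fine_tuned_model/42-0.85.h5")
for i, layer in enumerate(resnet.layers):
    print(i, layer.name)                      # negative indices count from the end, e.g. -2
feature_tensor = resnet.layers[-2].output     # the symbolic tensor to feed into the new Dense layers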

CNN accuracy on binary classification of cat/dog images no better than random

I've adapted a simple CNN from a tutorial on Analytics Vidhya.
The problem is that my accuracy on a holdout set is no better than random. I am training on ~8600 images each of cats and dogs, which should be enough data for a decent model, but accuracy on the test set is at 49%. Is there a glaring omission in my code somewhere?
import os
import numpy as np
import keras
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from datetime import datetime
from PIL import Image
from keras.utils.np_utils import to_categorical
from sklearn.utils import shuffle

def main():
    cat = os.listdir("train/cats")
    dog = os.listdir("train/dogs")
    filepath = "train/cats/"
    filepath2 = "train/dogs/"
    print("[INFO] Loading images of cats and dogs each...", datetime.now().time())
    #print("[INFO] Loading {} images of cats and dogs each...".format(num_images), datetime.now().time())
    images = []
    label = []
    for i in cat:
        image = Image.open(filepath + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(0)  # for cat images
    for i in dog:
        image = Image.open(filepath2 + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(1)  # for dog images
    images_full = np.array([np.array(x) for x in images])
    label = np.array(label)
    label = to_categorical(label)
    images_full, label = shuffle(images_full, label)
    print("[INFO] Splitting into train and test", datetime.now().time())
    (trainX, testX, trainY, testY) = train_test_split(images_full, label, test_size=0.25)
    filters = 10
    filtersize = (5, 5)
    epochs = 5
    batchsize = 32
    input_shape = (300, 300, 3)
    #input_shape = (30, 30, 3)
    print("[INFO] Designing model architecture...", datetime.now().time())
    model = Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='same',
                                                data_format="channels_last", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(units=2, input_dim=50, activation='softmax'))
    #model.add(keras.layers.Dense(units=2, input_dim=5, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    print("[INFO] Fitting model...", datetime.now().time())
    model.fit(trainX, trainY, epochs=epochs, batch_size=batchsize, validation_split=0.3)
    model.summary()
    print("[INFO] Evaluating on test set...", datetime.now().time())
    eval_res = model.evaluate(testX, testY)
    print(eval_res)

if __name__ == "__main__":
    main()
For me the problem comes from the size of your network: you have only one Conv2D layer with 10 filters. This is way too small to learn a deep representation of your images.
Try increasing this a lot by using blocks from common architectures like VGGNet!
Example of a block:
x = Conv2D(32, (3, 3) , padding='SAME')(model_input)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3) , padding='SAME')(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
You need to stack several blocks like that, increasing the number of filters as you go, in order to capture deeper features.
Another thing: you don't need to specify the input_dim of your Dense layer, Keras automatically takes care of that!
Last but not least, you need a fully connected network (several Dense layers) in order to correctly classify your images, not only a single layer.
For example:
x = Flatten()(x)
x = Dense(256)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(128)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(2)(x)
x = Activation('softmax')(x)
Try those changes and keep me posted!
Update after OP's questions
Images are complex: they contain a lot of information, like shapes, edges, colors, etc.
In order to capture the maximum amount of information, you need to pass the image through multiple convolutions, which will learn the different aspects of the image.
Imagine, for example, that the first convolution learns to recognise squares, the second circles, the third edges, etc.
As for my second point, the final fully connected part acts as a classifier: the conv network outputs a vector that "represents" a dog or a cat, and the network then has to learn that this kind of vector belongs to one class or the other.
Directly feeding that vector into a single final layer is not enough to learn this representation.
Is that clearer?
Last update, for OP's second comment
Here are the two ways of defining a Keras model; both produce the same thing!
# Functional API
model_input = Input(shape=(200, 1))
x = Dense(32)(model_input)
x = Dense(16)(x)
x = Activation('relu')(x)
model = Model(inputs=model_input, outputs=x)

# Sequential API
model = Sequential()
model.add(Dense(32, input_shape=(200, 1)))
model.add(Dense(16, activation='relu'))
Example of architecture:
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Don't forget to normalize your data before feeding it into your network.
A simple images_full = images_full / 255.0 on your data can boost your accuracy a lot.
Try it with grayscale images too, it's more computationally efficient; a sketch of both suggestions follows.
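A minimal sketch of both ideas (load_grayscale is a hypothetical helper, not part of the original code; convert('L') gives a single-channel image, so input_shape becomes (300, 300, 1)):
import numpy as np
from PIL import Image

def load_grayscale(path, size=(300, 300)):
    """Load one image as grayscale and scale pixel values to [0, 1]."""
    img = Image.open(path).convert('L').resize(size)
    return np.array(img, dtype='float32') / 255.0

# stack the loaded images and add an explicit channel axis: (N, 300, 300) -> (N, 300, 300, 1)
# images_full = np.expand_dims(np.array([load_grayscale(p) for p in paths]), -1)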
