how does input_shape in keras.applications work? - python

I have been through the Keras documentation but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it my custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally, with an increase in the input shape the number of parameters should increase, however as you can see they stay exactly the same. My questions are thus:
Why does the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape; what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However, when I run the model with this configuration it runs without any problems. Could there be something that I am missing?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.

1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel dimensions and the channel counts. A larger input size leads to a larger output size, but not to more weights.
This means that the output of the convolutional layers of the second model will be larger than for the first model, which would increase the number of weights in a following dense layer. However, if you look at the DenseNet architecture you will notice that (with pooling='avg') there is a GlobalAveragePooling2D layer after all the convolutional layers, which averages all the values of each output channel. That is why the output of DenseNet is of size 1024, whatever the input shape.
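The same effect can be reproduced with a tiny toy model (a sketch, not DenseNet itself): the Conv2D weight count depends only on the kernel size and the channel counts, and global average pooling collapses the spatial dimensions into a fixed-length vector.
from keras.layers import Input, Conv2D, GlobalAveragePooling2D
from keras.models import Model

def tiny_cnn(h, w):
    x = Input(shape=(h, w, 1))
    y = Conv2D(8, (3, 3), padding='same')(x)  # 3*3*1*8 + 8 = 80 weights, independent of h and w
    y = GlobalAveragePooling2D()(y)            # always outputs (None, 8), whatever h and w are
    return Model(x, y)

print(tiny_cnn(224, 224).count_params())  # 80
print(tiny_cnn(512, 512).count_params())  # 80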
2.
Yes, the model will still work. The three-channel restriction in the documentation only matters when you load pretrained ImageNet weights, which were trained on RGB images. With weights=None the network is built from scratch, so the first convolution is simply created with a single input channel. The only difference to the standard three-channel model is that first 7x7 kernel (7*7*1*64 instead of 7*7*3*64 weights), which is also why the parameter count above is slightly lower than for the RGB version.
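A quick way to confirm this (a sketch, assuming the same Keras 2.x API as in the question) is to inspect the kernel of the first convolution in the 1-channel model:
from keras import applications
from keras.layers import Conv2D

m = applications.densenet.DenseNet121(include_top=False, weights=None,
                                      input_shape=(224, 224, 1), pooling='avg')
first_conv = next(layer for layer in m.layers if isinstance(layer, Conv2D))
print(first_conv.name, first_conv.get_weights()[0].shape)  # expected kernel shape: (7, 7, 1, 64)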

The DenseNet is composed of two parts: the convolutional part and the global pooling part.
The number of trainable weights in the convolutional part does not depend on the input shape.
Usually a classification network would employ fully connected layers to produce the classification; in DenseNet, however (with include_top=False and pooling='avg'), global pooling is used, which adds no trainable weights.
Therefore the input shape does not affect the number of weights of the entire network.

Related

Training the same model with two different outputs with Keras

I have a simple GRU network coded with Keras in python as below:
gru1 = GRU(16, activation='tanh', return_sequences=True)(input)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
output = TimeDistributed(Dense(1, activation="sigmoid"))(dense)
I've used a sigmoid activation for the output since my purpose is classification, but I need to use the same model for regression as well, which means changing the output activation to linear. The rest of the network stays the same, so in this case I would end up with two different networks for two different purposes. The inputs are the same, but the outputs are classes for the sigmoid and values for the linear activation.
My question is, is there any way to use only one network but get two different outputs at the end? Thanks.
Yes, you can use the functional API to design a multi-output model. You can keep the shared layers and add two different outputs, one with a sigmoid and another with a linear activation.
N.B.: Don't use input as a variable name; it shadows a built-in function in Python.
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
seq_len = 100 # your sequence length
input_ = Input(shape=(seq_len,1))
gru1 = GRU(16, activation='tanh', return_sequences=True)(input_)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
output1 = TimeDistributed(Dense(1, activation="sigmoid"), name="out1")(dense)
output2 = TimeDistributed(Dense(1, activation="linear"), name="out2")(dense)
model = Model(input_, [output1, output2])
model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 100, 1)] 0
__________________________________________________________________________________________________
gru_2 (GRU) (None, 100, 16) 912 input_3[0][0]
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, 100, 16) 272 gru_2[0][0]
__________________________________________________________________________________________________
out1 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
__________________________________________________________________________________________________
out2 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
==================================================================================================
Total params: 1,218
Trainable params: 1,218
Non-trainable params: 0
Compiling with two loss functions:
losses = {
    "out1": "binary_crossentropy",
    "out2": "mse",
}
# initialize the optimizer and compile the model
model.compile(optimizer='adam', loss=losses, metrics=["accuracy", "mae"])
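For completeness, here is a minimal sketch of how such a model could then be trained, continuing from the model above; the arrays are made up purely to illustrate the target shape each head expects:
import numpy as np

n_samples, seq_len = 32, 100
X = np.random.rand(n_samples, seq_len, 1)
y_class = np.random.randint(0, 2, size=(n_samples, seq_len, 1))  # targets for the sigmoid head
y_value = np.random.rand(n_samples, seq_len, 1)                  # targets for the linear head

# the dict keys match the names given to the output layers above
model.fit(X, {"out1": y_class, "out2": y_value}, epochs=2, batch_size=8)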

What determines input dimension of dense layer at end of a CNN

I'm following CNN tutorial at analytics vidhya.
I'm having difficulty visualizing the connection between the flattened layer and the dense layer with 2 nodes and an input dimension of 50. This is a binary classification problem, so I understand the 2 nodes. However, what determines the input dimensions? We can also omit this parameter, in which case there will just be fewer weights to train for this dense layer?
import os
import numpy as np
import pandas as pd
import scipy.misc
import sklearn
import keras
from keras.models import Sequential
import cv2
from skimage import io
%matplotlib inline
#Defining the File Path
cat=os.listdir("/mnt/hdd/datasets/dogs_cats/train/cat")
dog=os.listdir("/mnt/hdd/datasets/dogs_cats/train/dog")
filepath="/mnt/hdd/datasets/dogs_cats/train/cat/"
filepath2="/mnt/hdd/datasets/dogs_cats/train/dog/"
#Loading the Images
images=[]
label = []
for i in cat:
    image = scipy.misc.imread(filepath+i)
    images.append(image)
    label.append(0) #for cat images
for i in dog:
    image = scipy.misc.imread(filepath2+i)
    images.append(image)
    label.append(1) #for dog images
#resizing all the images
for i in range(0,23000):
    images[i]=cv2.resize(images[i],(300,300))
#converting images to arrays
images=np.array(images)
label=np.array(label)
# Defining the hyperparameters
filters=10
filtersize=(5,5)
epochs =5
batchsize=128
input_shape=(300,300,3)
#Converting the target variable to the required size
from keras.utils.np_utils import to_categorical
label = to_categorical(label)
#Defining the model
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='valid', data_format="channels_last", activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(units=2, input_dim=50,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(images, label, epochs=epochs, batch_size=batchsize,validation_split=0.3)
model.summary()
However, what determines the input dimensions? We can also omit this
parameter, in which case there will just be fewer weights to train for
this dense layer?
It is determined by the output shape of the previous layer. As seen from model.summary(), the output shape of the Flatten layer is (None, 219040), so the input dimension of the Dense layer is 219040. So in this case there are far more weights to train than 50.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 296, 296, 10) 760
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 148, 148, 10) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 219040) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 438082
=================================================================
Total params: 438,842
Trainable params: 438,842
Non-trainable params: 0
_________________________________________________________________
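Those numbers can be reproduced by hand from the layer definitions:
conv_params = (5*5*3 + 1) * 10      # 760: 5x5 kernel, 3 input channels, bias, 10 filters
conv_out = 300 - 5 + 1              # 296: 'valid' convolution with a 5x5 kernel
pool_out = conv_out // 2            # 148 after 2x2 max pooling
flat = pool_out * pool_out * 10     # 219040 values after Flatten
dense_params = flat * 2 + 2         # 438082 weights + biases in the final Dense layer
print(conv_params, conv_out, pool_out, flat, dense_params)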
As can be seen from the code snippet below, the weights for the dense layer are created based on the input_shape parameter (which is the output shape of the previous layer). The input_dim passed by the user when constructing the Dense layer is ignored.
input_dim = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
https://github.com/keras-team/keras/blob/3bda5520b787f84f687bb116c460f3aedada039b/keras/layers/core.py#L891
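A minimal sketch that makes the same point: input_dim has no effect when the Dense layer is not the first layer of the model.
from keras.models import Sequential
from keras.layers import InputLayer, Flatten, Dense

m = Sequential()
m.add(InputLayer(input_shape=(4, 4, 1)))
m.add(Flatten())                     # output shape: (None, 16)
m.add(Dense(units=2, input_dim=50))  # input_dim=50 is silently ignored here
m.summary()                          # Dense params: 16*2 + 2 = 34, not 50*2 + 2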

What strategy should I use in my CNN to go from a 3D volume to a 2D plane?

What strategy should I use in my CNN to go from a 3D volume to a 2D plane as the output layer? Can I even have a 2D layer as output?
I am trying to develop a network whose input is a 320x320x3 image and whose output should be 68x2.
I know one way to do it would be to start from 320x320x3 and, after a few layers, flatten the 3D feature maps and then shorten them down to a 1D array of 136. But I am trying to understand whether I could somehow go down to the desired 2D dimensions at the final layer.
Thanks,
Shubham
Edit: I might have misread your question initially. If your intention is to have 136 output nodes that can be arranged in a 68x2 matrix (and not to have a 68x68x2 image in the output, as I thought at first), then you can use a Reshape layer after your final dense layer with 136 units:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Reshape
model = Sequential()
model.add(Conv2D(32, 3, input_shape=(320, 320, 3)))
model.add(Flatten())
model.add(Dense(136))
model.add(Reshape((68, 2)))
model.summary()
This will give you the following model, with the desired shape in the output:
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 318, 318, 32) 896
_________________________________________________________________
flatten_2 (Flatten) (None, 3235968) 0
_________________________________________________________________
dense_2 (Dense) (None, 136) 440091784
_________________________________________________________________
reshape_1 (Reshape) (None, 68, 2) 0
=================================================================
Total params: 440,092,680
Trainable params: 440,092,680
Non-trainable params: 0
Make sure to provide your training labels in the same shape when fitting the model.
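For example, with made-up arrays just to illustrate the shapes, continuing from the model above (the loss here is a placeholder, pick whatever suits your task):
import numpy as np

x_train = np.random.rand(4, 320, 320, 3)
y_train = np.random.rand(4, 68, 2)   # targets must match the (68, 2) output shape

model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, epochs=1, batch_size=2)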
(original answer, might still be relevant)
Yes, this is commonly done in semantic segmentation models, where the inputs are images and the outputs are tensors with the same height and width as the images, and with the number of channels equal to the number of classes in the output. If you want to do this in TensorFlow or Keras, you can look up existing implementations, for instance of U-Net architectures.
A core feature of these models is that they are fully convolutional: they consist only of convolutional layers. Typically, the feature maps in these models go from 'wide and shallow' (large spatial dimensions, few channels) at first, to 'small and deep' (small spatial dimensions, many channels) and back to the desired output dimension; hence the U-shape the architecture is named after.
There are a lot of ways to go from 320x320x3 to 68x2 with a fully convolutional network, but the input and output of your model would basically look like this:
import keras
from keras import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(32, 3, activation='relu', input_shape=(320,320,3)))
# Include more convolutional layers, pooling layers, upsampling layers etc
...
# At the end of the model, add your final Conv2D layer with 2 filters
# and the required activation function
model.add(Conv2D(2, 3, activation='softmax'))

Keras Sequential Dense input layer - and MNIST: why do images need to be reshaped?

I'm asking this because I feel I'm missing something fundamental.
By now most everyone knows that the MNIST images are 28X28 pixels. The keras documentation tells me this about Dense:
Input shape nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
So a newbie like me would assume that the images could be fed to the model as a 28x28 matrix. Yet every tutorial I found goes through various gymnastics to convert the images to a single 784-element vector.
Sometimes by
num_pixels = X_train.shape[1] * X_train.shape[2]
model.add(Dense(num_pixels, input_dim=num_pixels, activation='...'))
or
num_pixels = np.prod(X_train.shape[1:])
model.add(Dense(512, activation='...', input_shape=(num_pixels,)))
or
model.add(Dense(units=10, input_dim=28*28, activation='...'))
history = model.fit(X_train.reshape((-1,28*28)), ...)
or even:
model = Sequential([Dense(32, input_shape=(784,)), ...])
So my question is simply - why? Can't Dense just accept an image as-is or, if necessary, just process it "behind the scenes", as it were? And if, as I suspect, this processing has to be done, is any of these methods (or others) inherently preferable?
As requested by OP (i.e. Original Poster), I will mention the answer I gave in my comment and elaborate more.
Can't Dense just accept an image as-is or, if necessary, just process
it "behind the scenes", as it were?
Simply no! That's because currently the Dense layer is applied on the last axis. Therefore, if you feed it an image of shape (height, width) or (height, width, channels), Dense layer would be only applied on the last axis (i.e. width or channels). However, when the image is flattened, all the units in the Dense layer would be applied on the whole image and each unit is connected to all the pixels with different weights. To further clarify this consider this model:
from keras import models, layers
model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28*28,)))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 10) 7850
=================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
As you can see, there are 7850 parameters in the Dense layer: each unit is connected to all the pixels (28*28*10 + 10 bias params = 7850). Now consider this model:
model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28,28)))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 28, 10) 290
=================================================================
Total params: 290
Trainable params: 290
Non-trainable params: 0
_________________________________________________________________
In this case there are only 290 parameters in the Dense layer. Here the layer is applied to each row of the image separately, and the same weights are shared across rows (28*10 + 10 bias params = 290). It is as though features are extracted from each row of the image, whereas the previous model extracted features from the whole image at once. Therefore this weight sharing may or may not be useful for your application.
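If you do not want to reshape the data yourself, the usual alternative is to let the model do the flattening with a Flatten layer, which gives the same 7850-parameter model as the first example above:
from keras import models, layers

model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))  # the reshape happens inside the model
model.add(layers.Dense(10))
model.summary()  # Dense params: 28*28*10 + 10 = 7850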

Sharing filter weight in convolution network

I have been working with VGG16 for image recognition for quite a while, and I am very confident about it already. Today, I came across a post on Quora and I started to doubt my understanding of CNNs.
In that post, it says that the same filter in a CNN layer should share the same weights. So, assuming the kernel size is 3 and the number of filters is 1 in the following convolutional layer, the total number of parameters (weights) should be 3x1 = 3, which is represented by the red, blue, and green arrows. It's easy to understand the Conv1D example.
Then, I try to do experiment on Conv2d with the following keras code:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=1,kernel_size=(3,3),strides=(1,1),padding='same')(input_layer)
model = Model(input_layer,ccm1_conv)
model.summary()
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 100, 100, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Since I use only 1 filter and my kernel_size = 3x3, the kernel reads 9 values in the previous layer and connects them to a neuron in the next layer. Therefore, I would expect 9 parameters (weights) instead of 10.
Then I tried number of filters = 10 and kernel_size = 5x5; it gives 260 parameters (weights) instead of the 5*5*10 parameters that I would expect:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=10,kernel_size=(5,5),strides=(1,1),padding='same')(input_layer)
model = Model(input_layer,ccm1_conv)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 100, 100, 10) 260
=================================================================
Total params: 260
Trainable params: 260
Non-trainable params: 0
_________________________________________________________________
It seems the number of parameters in Conv2D is calculated by the following equation:
num_weights = num_filters * (kernel_width*kernel_height + 1)
And I have no idea where the +1 comes from.
The +1 comes from the bias term of each filter. In addition to the kernel weights, each filter has an extra parameter called the bias term (which multiplies a constant 1), like in a fully-connected layer. Keras uses a bias term for each filter by default, but you can also omit it by setting the argument use_bias of Conv2D to False:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100, 100, 1,), name='input_layer')
ccm1_conv = Conv2D(filters=1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False)(input_layer)
model = Model(input_layer, ccm1_conv)
model.summary()
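More generally, the parameter count of a Conv2D layer also scales with the number of input channels (here it is 1, so it drops out of the formula above). A small helper to check the numbers (a sketch, not Keras code):
def conv2d_params(filters, kernel_h, kernel_w, in_channels, use_bias=True):
    # params = filters * (kernel_h * kernel_w * in_channels + bias), with bias being 1 or 0
    return filters * (kernel_h * kernel_w * in_channels + (1 if use_bias else 0))

print(conv2d_params(1, 3, 3, 1))                   # 10, matches the first summary
print(conv2d_params(10, 5, 5, 1))                  # 260, matches the second summary
print(conv2d_params(1, 3, 3, 1, use_bias=False))   # 9, the count expected in the question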
