What determines input dimension of dense layer at end of a CNN - python

I'm following a CNN tutorial at Analytics Vidhya.
I'm having difficulty visualizing the connection between the flattened layer and the dense layer with 2 nodes and an input dimension of 50. This is a binary classification problem, so I understand the 2 nodes. However, what determines the input dimensions? We can also omit this parameter, in which case there will just be fewer weights to train for this dense layer?
import os
import numpy as np
import pandas as pd
import scipy
import sklearn
import keras
from keras.models import Sequential
import cv2
from skimage import io
%matplotlib inline
#Defining the File Path
cat=os.listdir("/mnt/hdd/datasets/dogs_cats/train/cat")
dog=os.listdir("/mnt/hdd/datasets/dogs_cats/train/dog")
filepath="/mnt/hdd/datasets/dogs_cats/train/cat/"
filepath2="/mnt/hdd/datasets/dogs_cats/train/dog/"
#Loading the Images
images=[]
label = []
for i in cat:
    image = scipy.misc.imread(filepath+i)
    images.append(image)
    label.append(0) #for cat images
for i in dog:
    image = scipy.misc.imread(filepath2+i)
    images.append(image)
    label.append(1) #for dog images
#resizing all the images
for i in range(0,23000):
    images[i]=cv2.resize(images[i],(300,300))
#converting images to arrays
images=np.array(images)
label=np.array(label)
# Defining the hyperparameters
filters=10
filtersize=(5,5)
epochs =5
batchsize=128
input_shape=(300,300,3)
#Converting the target variable to the required size
from keras.utils.np_utils import to_categorical
label = to_categorical(label)
#Defining the model
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='valid', data_format="channels_last", activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(units=2, input_dim=50,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(images, label, epochs=epochs, batch_size=batchsize,validation_split=0.3)
model.summary()

However, what determines the input dimensions? We can also omit this
parameter, in which case there will just be fewer weights to train for
this dense layer?
It is determined by the output shape of the previous layer. As seen from model.summary(), the output shape of the Flatten layer is (None, 219040), so the input dimension of the Dense layer is 219040 (148 × 148 × 10). So in this case there are more weights to train than with 50 inputs, not fewer.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 296, 296, 10) 760
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 148, 148, 10) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 219040) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 438082
=================================================================
Total params: 438,842
Trainable params: 438,842
Non-trainable params: 0
_________________________________________________________________
As can be seen from the code snippet below, the weights for the dense layer are created based on the input_shape argument (which is the output shape of the previous layer). The input_dim passed by the user when constructing the Dense layer is ignored.
input_dim = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
https://github.com/keras-team/keras/blob/3bda5520b787f84f687bb116c460f3aedada039b/keras/layers/core.py#L891
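The numbers in the summary above can be reproduced by hand. The following is a minimal sketch in plain Python (no Keras needed), assuming 'valid' padding, stride 1 and a 2×2 max pool as in the question's code; the helper name conv_output_size is hypothetical and only used for illustration.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

height, width, channels = 300, 300, 3   # input_shape from the question
filters, kernel = 10, 5                 # Conv2D(10, (5, 5))

# Conv2D with padding='valid' and strides=(1, 1)
conv_h = conv_output_size(height, kernel)
conv_w = conv_output_size(width, kernel)
conv_params = (kernel * kernel * channels + 1) * filters  # +1 is the bias of each filter

# MaxPooling2D(pool_size=(2, 2)) uses a stride equal to the pool size by default
pool_h = conv_output_size(conv_h, 2, stride=2)
pool_w = conv_output_size(conv_w, 2, stride=2)

# Flatten multiplies the remaining dimensions together
flat = pool_h * pool_w * filters

# Dense(units=2): one weight per (input, unit) pair plus one bias per unit
units = 2
dense_params = flat * units + units

print(conv_h, conv_w, conv_params)   # 296 296 760
print(pool_h, pool_w, flat)          # 148 148 219040
print(dense_params)                  # 438082
```

The Dense layer's weight count follows from the flattened size of 219040, whatever value is passed as input_dim.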

Related

How do I connect two keras models into one model?

Let's say I have a ResNet50 model and I wish to connect the output layer of this model to the input layer of a VGG model.
This is the ResNet model and the output tensor of ResNet50:
img_shape = (164, 164, 3)
resnet50_model = ResNet50(include_top=False, input_shape=img_shape, weights = None)
print(resnet50_model.output.shape)
I get the output:
TensorShape([Dimension(None), Dimension(6), Dimension(6), Dimension(2048)])
Now I want a new layer where I reshape this output tensor to (64,64,18)
Then I have a VGG16 model:
VGG_model = VGG16(include_top=False, weights=None)
I want the output of the ResNet50 to be reshaped into the desired tensor and fed in as the input to the VGG model. So essentially I want to chain the two models. Can someone help me do that?
Thank you!
There are multiple ways you can do this. Here is one way of using Sequential model API to do it.
import tensorflow as tf
from tensorflow.keras.applications import ResNet50, VGG16
model = tf.keras.Sequential()
img_shape = (164, 164, 3)
model.add(ResNet50(include_top=False, input_shape=img_shape, weights = None))
model.add(tf.keras.layers.Reshape(target_shape=(64,64,18)))
model.add(tf.keras.layers.Conv2D(3,kernel_size=(3,3),name='Conv2d'))
VGG_model = VGG16(include_top=False, weights=None)
model.add(VGG_model)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model summary is as follows
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 6, 6, 2048) 23587712
_________________________________________________________________
reshape (Reshape) (None, 64, 64, 18) 0
_________________________________________________________________
Conv2d (Conv2D) (None, 62, 62, 3) 489
_________________________________________________________________
vgg16 (Model) multiple 14714688
=================================================================
Total params: 38,302,889
Trainable params: 38,249,769
Non-trainable params: 53,120
_________________________________________________________________
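A quick way to see why this particular Reshape is legal, and where the 489 parameters of the extra Conv2D come from, is to check the element counts by hand. This is only a sketch in plain Python; the shapes are taken from the question and the summary above, and the Conv2D is assumed to use the default 'valid' padding and a bias.

```python
from math import prod

resnet_output = (6, 6, 2048)   # ResNet50 output for a 164x164x3 input (from the question)
target_shape = (64, 64, 18)    # shape requested for the Reshape layer

# Reshape only works when both shapes hold the same number of elements
same_size = prod(resnet_output) == prod(target_shape)

# 3x3 Conv2D with 'valid' padding that maps the 18 channels to 3 channels
kernel, in_channels, out_channels = 3, 18, 3
conv_side = target_shape[0] - kernel + 1
conv_params = (kernel * kernel * in_channels + 1) * out_channels

print(prod(resnet_output), prod(target_shape), same_size)  # 73728 73728 True
print(conv_side, conv_params)                              # 62 489
```

Any other target shape would need the same product of 73728 elements; the 3-filter convolution is there because VGG16 built without an explicit input_shape expects 3 input channels.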

Multi-feature causal CNN - Keras implementation

I'm currently using a basic LSTM to make regression predictions and I would like to implement a causal CNN as it should be computationally more efficient.
I'm struggling to figure out how to reshape my current data to fit the causal CNN cell and represent the same data/timestep relationship as well as what the dilation rate should be set at.
My current data is of this shape: (number of examples, lookback, features) and here's a basic example of the LSTM NN I'm using right now.
lookback = 20 # height -- timeseries
n_features = 5 # width -- features at each timestep
# Build an LSTM to perform regression on time series input/output data
model = Sequential()
model.add(LSTM(units=256, return_sequences=True, input_shape=(lookback, n_features)))
model.add(Activation('elu'))
model.add(LSTM(units=256, return_sequences=True))
model.add(Activation('elu'))
model.add(LSTM(units=256))
model.add(Activation('elu'))
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train,
epochs=50, batch_size=64,
validation_data=(X_val, y_val),
verbose=1, shuffle=True)
prediction = model.predict(X_test)
I then created a new CNN model, although it is not causal: per the Keras documentation, 'causal' padding is only an option for Conv1D, not Conv2D. If I understand correctly, having multiple features means I need to use Conv2D rather than Conv1D, but if I set Conv2D(padding='causal') I get the following error: Invalid padding: causal.
Anyway, I was able to fit the data with a new shape (number of examples, lookback, features, 1) and run the following model using the Conv2D layer:
lookback = 20 # height -- timeseries
n_features = 5 # width -- features at each timestep
model = Sequential()
model.add(Conv2D(128, 3, activation='elu', input_shape=(lookback, n_features, 1)))
model.add(MaxPool2D())
model.add(Conv2D(128, 3, activation='elu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train,
epochs=50, batch_size=64,
validation_data=(X_val, y_val),
verbose=1, shuffle=True)
prediction = model.predict(X_test)
However, from my understanding, this does not propagate the data as causal, rather just the entire set (lookback, features, 1) as an image.
Is there any way to either reshape my data to fit into a Conv1D(padding='causal') Layer, with multiple features or somehow run the same data and input shape as Conv2D with 'causal' padding?
I believe that you can have causal padding with dilation for any number of input features. Here is the solution I would propose.
The TimeDistributed layer is key to this.
From Keras Documentation: "This wrapper applies a layer to every temporal slice of an input. The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension."
For our purposes, we want this layer to apply "something" to each feature, so we move the features to the temporal index, which is 1.
Also relevant is the Conv1D documentation.
Specifically about channels: "The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras)"
from tensorflow.keras import Sequential, backend
from tensorflow.keras.layers import GlobalMaxPool1D, Activation, MaxPool1D, Flatten, Conv1D, Dense, Permute, Reshape, TimeDistributed, InputLayer
backend.clear_session()
lookback = 20
n_features = 5
filters = 128
model = Sequential()
model.add(InputLayer(input_shape=(lookback, n_features, 1)))
# Causal layers are first applied to the features independently
model.add(Permute(dims=(2, 1, 3))) # UPDATE: must permute before reshaping; dims has to list all three non-batch axes
model.add(Reshape(target_shape=(n_features, lookback, 1)))
# After the reshape, the 5 input features are treated as the temporal dimension
# for the TimeDistributed layer
# When Conv1D is applied to each input feature, it sees an input of shape (20, 1)
# with the default "channels_last", therefore...
# 20 time steps is the temporal dimension
# 1 is the "channel", the new location for the feature maps
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0)))
# You could add pooling here if you want.
# If you want interaction between features AND causal/dilation, then apply later
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1)))
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2)))
# Stack feature maps on top of each other so each time step can look at
# all features produced earlier
model.add(Permute(dims=(2, 1, 3))) # UPDATED to fix issue with reshape
model.add(Reshape(target_shape=(lookback, n_features * filters))) # (20 time steps, 5 features * 128 filters)
# Causal layers are applied to the 5 input features dependently
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2))
model.add(GlobalMaxPool1D())
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
Final Model Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 5, 20, 1) 0
_________________________________________________________________
time_distributed (TimeDistri (None, 5, 20, 128) 512
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 20, 128) 49280
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 20, 128) 49280
_________________________________________________________________
reshape_1 (Reshape) (None, 20, 640) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 20, 128) 245888
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 10, 128) 0
_________________________________________________________________
conv1d_4 (Conv1D) (None, 10, 128) 49280
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 5, 128) 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 5, 128) 49280
_________________________________________________________________
global_max_pooling1d (Global (None, 128) 0
_________________________________________________________________
dense (Dense) (None, 1) 129
=================================================================
Total params: 443,649
Trainable params: 443,649
Non-trainable params: 0
_________________________________________________________________
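The shapes and parameter counts in this summary can be traced by hand. The sketch below is plain Python rather than Keras; it assumes a 4-D input of shape (lookback, n_features, 1) per sample and a first permute that swaps the time and feature axes. The helpers permute and conv1d_params are hypothetical names used only for illustration.

```python
lookback, n_features, filters, kernel = 20, 5, 128, 3

def permute(shape, dims):
    """Reorder the non-batch axes; dims is 1-indexed, as in keras.layers.Permute."""
    return tuple(shape[d - 1] for d in dims)

def conv1d_params(in_channels, out_filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * out_filters + out_filters

shape = (lookback, n_features, 1)        # (20, 5, 1)
shape = permute(shape, (2, 1, 3))        # (5, 20, 1): features sit on the "temporal" axis
# TimeDistributed(Conv1D) with causal padding keeps the 20 steps and adds 128 channels
shape = (shape[0], shape[1], filters)    # (5, 20, 128)
shape = permute(shape, (2, 1, 3))        # (20, 5, 128)
merged = (shape[0], shape[1] * shape[2])  # (20, 640) after the second Reshape

per_feature_first = conv1d_params(1, filters, kernel)        # first TimeDistributed Conv1D
per_feature_next = conv1d_params(filters, filters, kernel)   # later TimeDistributed Conv1D
joint_first = conv1d_params(merged[1], filters, kernel)      # first Conv1D on all features

print(merged)             # (20, 640)
print(per_feature_first)  # 512
print(per_feature_next)   # 49280
print(joint_first)        # 245888
```

Because the TimeDistributed wrapper shares one Conv1D across the 5 features, the per-feature layers cost 512 and 49280 parameters regardless of n_features; only the first joint Conv1D grows with the number of features.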
Edit:
"why you need to reshape and use n_features as the temporal layer"
n_features needs to be on the temporal axis initially because Conv1D with dilation and causal padding only works with one feature at a time, and because of how the TimeDistributed layer is implemented.
From their documentation "Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the samples dimension, is (10, 16).
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently:"
By applying the TimeDistributed layer independently to each feature, it reduces the dimension of the problem as if there was only one feature (which would easily allow for dilation and causal padding). With 5 features, they need to each be handled separately at first.
After your edits this recommendation still applies.
There shouldn't be a difference in terms of the network whether InputLayer is included in the first layer or separate so you can definitely put it in the first CNN if that resolves the issue.
In Conv1D, causal padding is typically used together with dilated convolution. For Conv2D you can use the dilation_rate parameter of the Conv2D class, which takes a 2-tuple of integers. For more information, see the Keras documentation.
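To make "causal" and "dilation" concrete, here is a minimal pure-Python sketch of a single-channel causal dilated convolution. It is not how Keras implements Conv1D; the function names, the kernel values and the receptive-field helper are illustrative assumptions, and the receptive-field formula applies to a plain stack of stride-1 convolutions without pooling.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal 1-D convolution: output[t] only uses x[t], x[t - d], x[t - 2d], ...

    Positions before the start of the sequence count as zero (left padding).
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(kernel):
            src = t - (k - 1 - i) * dilation
            if src >= 0:
                total += w * x[src]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """Time steps seen by one output of stacked stride-1 dilated convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

x = [float(v) for v in range(1, 21)]   # 20 time steps, like lookback = 20
kernel = [0.5, 0.25, 0.25]             # arbitrary illustrative weights

y = causal_conv1d(x, kernel, dilation=2)

# Changing a "future" value must not change any earlier output
x_changed = list(x)
x_changed[15] = 1000.0
y_changed = causal_conv1d(x_changed, kernel, dilation=2)
unchanged_before_15 = y[:15] == y_changed[:15]

print(len(y), unchanged_before_15)        # 20 True
print(receptive_field(3, [1, 2, 4]))      # 15
print(receptive_field(3, [1, 2, 4, 8]))   # 31
```

Under these assumptions, three layers with kernel size 3 and dilation rates 1, 2 and 4 see 15 past steps, so covering a lookback of 20 would take a fourth layer with dilation 8 or a larger kernel.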

how does input_shape in keras.applications work?

I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it a custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
I would expect the number of parameters to increase with a larger input shape; however, as you can see, they stay exactly the same. My questions are thus:
1. Why does the number of parameters not change with a change in the input_shape?
2. I have only defined one channel in my input_shape; what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However when I run the model with this configuration it runs without any problems. Could there be something that I am missing out?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel dimensions. A larger input size leads to a larger output size, but not to more weights.
This means that the output of the convolutional layers of the second model will be larger than for the first model, which would normally increase the number of weights in a following dense layer. However, if you look at the architecture of DenseNet, you will notice that there is a GlobalAveragePooling2D layer after all the convolutional layers (enabled here by pooling='avg'), which averages all the values for each output channel. That's why the output of DenseNet will be of size 1024, whatever the input shape.
2.
Yes, the model will still work. Because weights=None, no pretrained three-channel kernels have to be loaded, so Keras builds the first convolution with a single input channel; the channel is not broadcast (duplicated) to fill three channels. The three-channel requirement in the documentation matters when the pretrained ImageNet weights are used.
The DenseNet is composed of two parts, the convolution part, and the global pooling part.
The number of the convolution part's trainable weights doesn't depend on the input shape.
Usually, a classification network employs fully connected layers on the flattened features, and their weight count depends on the input size. In DenseNet, however, global pooling is used instead, which adds no trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.
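Both points can be illustrated without Keras. The sketch below is plain Python with hypothetical helper names; the toy feature maps have 4 channels instead of 1024. The DenseNet121 details used in the second half (a 7×7 first convolution with 64 filters and no bias, and 7,037,504 parameters for the three-channel base without the top) are recalled from the Keras implementation and are assumptions, not something stated in this thread.

```python
def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, returning C values."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]

def constant_map(height, width, channels):
    """Toy feature map in which channel c holds the value c everywhere."""
    return [[[float(c) for c in range(channels)] for _ in range(width)]
            for _ in range(height)]

# Different spatial sizes, same number of channels
small = global_average_pool(constant_map(7, 7, 4))
large = global_average_pool(constant_map(16, 16, 4))
print(len(small), len(large))   # 4 4

def conv2d_weights(kernel, in_channels, filters):
    """Kernel weights of a bias-free convolution; no spatial size involved."""
    return kernel * kernel * in_channels * filters

# Assumed first DenseNet121 convolution: 7x7, 64 filters, no bias
saved = conv2d_weights(7, 3, 64) - conv2d_weights(7, 1, 64)
three_channel_total = 7037504   # assumed count for the 3-channel base without top
print(saved, three_channel_total - saved)   # 6272 7031232
```

The result 7031232 matches the densenet121 row in both summaries, which is consistent with the first convolution simply being built with one input channel.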

Sharing filter weight in convolution network

I have been working on VGG16 for image recognition for quite a while, and I am very confident about it already. Today, I came across a post on Quora and I started to doubt my understanding of CNNs.
That post says that the same filter in a CNN layer shares the same weights. So, assuming the kernel size is 3 and the number of filters is 1 in the following convolutional layer, the total number of parameters (weights) should be 3 × 1 = 3, which is represented by the red, blue, and green arrows. The Conv1D example is easy to understand.
Then I tried an experiment on Conv2D with the following Keras code:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=1,kernel_size=(3,3),strides=(1,1),padding='same')(input_layer)
model = Model(input_layer,ccm1_conv)
model.summary()
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 100, 100, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Since I use only 1 filter and my kernel_size is 3×3, the kernel reads 9 neurons in the previous layer and connects them to one neuron in the next layer. Therefore, I would expect 9 parameters (weights) instead of 10.
Then I tried number of filters = 10 and kernel_size = 5×5; it gives 260 parameters (weights) instead of the 5*5*10 = 250 parameters that I would expect:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=10,kernel_size=(5,5),strides=(1,1),padding='same')(input_layer)
model = Model(input_layer,ccm1_conv)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 100, 100, 10) 260
=================================================================
Total params: 260
Trainable params: 260
Non-trainable params: 0
_________________________________________________________________
It seems the number of parameters in Conv2D is calculated by the following equation:
num_weights = num_filters * (kernel_width*kernel_height + 1)
And I have no idea where the +1 comes from.
The +1 comes from the bias term of each filter. In addition to the kernel weights, each filter has an extra parameter called the bias term (which multiplies a constant 1), like in a fully-connected layer. Keras uses a bias term for each filter by default, but you can also omit it by setting the argument use_bias of Conv2D to False:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100, 100, 1,), name='input_layer')
ccm1_conv = Conv2D(filters=1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False)(input_layer)
model = Model(input_layer, ccm1_conv)
model.summary()
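The counts from the question can be checked with a small formula. This is a sketch in plain Python; conv2d_param_count is a hypothetical helper, and it also multiplies by the number of input channels, which the single-channel examples in the question hide because that factor is 1.

```python
def conv2d_param_count(filters, kernel_height, kernel_width,
                       in_channels=1, use_bias=True):
    """Trainable parameters of a standard 2-D convolution layer."""
    weights_per_filter = kernel_height * kernel_width * in_channels
    bias_per_filter = 1 if use_bias else 0
    return filters * (weights_per_filter + bias_per_filter)

print(conv2d_param_count(1, 3, 3))                   # 10
print(conv2d_param_count(10, 5, 5))                  # 260
print(conv2d_param_count(1, 3, 3, use_bias=False))   # 9
print(conv2d_param_count(10, 5, 5, in_channels=3))   # 760
```

The first two values match the summaries in the question, and the third is what use_bias=False should give for the single 3×3 filter.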

Expected dense_3_input to have shape (None, 40) but got array with shape (40, 1)

I am a beginner at Deep Learning and am attempting to practice the implementation of Neural Networks in Python by performing audio analysis on a dataset. I have been following the Urban Sound Challenge tutorial and have completed the code for training the model, but I keep running into errors when trying to run the model on the test set.
Here is my code for creation of the model and training:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
num_labels = y.shape[1]
filter_size = 2
model = Sequential()
model.add(Dense(256, input_shape = (40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(X, y, batch_size=32, epochs=40, validation_data=(val_X, val_Y))
Running model.summary() before fitting the model gives me:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 256) 10496
_________________________________________________________________
activation_3 (Activation) (None, 256) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 256) 0
_________________________________________________________________
dense_4 (Dense) (None, 10) 2570
_________________________________________________________________
activation_4 (Activation) (None, 10) 0
=================================================================
Total params: 13,066
Trainable params: 13,066
Non-trainable params: 0
_________________________________________________________________
After fitting the model, I attempt to run it on one file so that it can classify the sound.
file_name = ".../UrbanSoundClassifier/test/Test/5.wav"
test_X, sample_rate = librosa.load(file_name,res_type='kaiser_fast')
mfccs = np.mean(librosa.feature.mfcc(y=test_X, sr=sample_rate, n_mfcc=40).T,axis=0)
test_X = np.array(mfccs)
print(model.predict(test_X))
However, I get
ValueError: Error when checking : expected dense_3_input to have shape
(None, 40) but got array with shape (40, 1)
Would someone kindly like to point me in the right direction as to how I should be testing the model? I do not know what the input for model.predict() should be.
So:
The easiest fix is simply to reshape test_X:
test_X = test_X.reshape((1, 40))
A more robust approach is to reuse the pipeline you used to create the training and validation sets for the test set as well. Please note that the processing you applied to the data files is completely different in the test case. I'd create a test dataframe:
test_dataframe = pd.DataFrame({'filename': ["here path to test file"]})
and then reuse the existing pipeline that creates the validation set.
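The underlying issue is that model.predict expects a batch, so a single sample needs a leading batch axis. Below is a minimal NumPy sketch; the zero vector is only a stand-in for the averaged MFCC features, since the real values would come from librosa.

```python
import numpy as np

# Stand-in for the 40 averaged MFCC coefficients of one audio file
mfccs = np.zeros(40)

# Add a leading batch axis so the array reads as "1 sample with 40 features"
single_sample = mfccs.reshape(1, -1)
also_works = np.expand_dims(mfccs, axis=0)

print(mfccs.shape, single_sample.shape, also_works.shape)  # (40,) (1, 40) (1, 40)
```

Passing the (40,) array directly is presumably what made Keras treat it as 40 samples with 1 feature each, hence the (40, 1) in the error message.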
