I'm currently using a basic LSTM to make regression predictions and I would like to implement a causal CNN as it should be computationally more efficient.
I'm struggling to figure out how to reshape my current data to fit the causal CNN layer while keeping the same data/timestep relationship, and what the dilation rate should be set to.
My current data is of this shape: (number of examples, lookback, features) and here's a basic example of the LSTM NN I'm using right now.
lookback = 20 # height -- timeseries
n_features = 5 # width -- features at each timestep
# Build an LSTM to perform regression on time series input/output data
model = Sequential()
model.add(LSTM(units=256, return_sequences=True, input_shape=(lookback, n_features)))
model.add(Activation('elu'))
model.add(LSTM(units=256, return_sequences=True))
model.add(Activation('elu'))
model.add(LSTM(units=256))
model.add(Activation('elu'))
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train,
epochs=50, batch_size=64,
validation_data=(X_val, y_val),
verbose=1, shuffle=True)
prediction = model.predict(X_test)
I then created a new CNN model (although not a causal one, since per the Keras documentation 'causal' padding is only an option for Conv1D and not Conv2D). If I understand correctly, having multiple features means I need to use Conv2D rather than Conv1D, but if I set Conv2D(padding='causal'), I get the following error: Invalid padding: causal
Anyways, I was also able to fit the data with a new shape (number of examples, lookback, features, 1) and run the following model using the Conv2D Layer:
lookback = 20 # height -- timeseries
n_features = 5 # width -- features at each timestep
model = Sequential()
model.add(Conv2D(128, 3, activation='elu', input_shape=(lookback, n_features, 1)))
model.add(MaxPool2D())
model.add(Conv2D(128, 3, activation='elu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train,
epochs=50, batch_size=64,
validation_data=(X_val, y_val),
verbose=1, shuffle=True)
prediction = model.predict(X_test)
However, from my understanding, this does not propagate the data as causal, rather just the entire set (lookback, features, 1) as an image.
Is there any way to either reshape my data to fit into a Conv1D(padding='causal') Layer, with multiple features or somehow run the same data and input shape as Conv2D with 'causal' padding?
I believe that you can have causal padding with dilation for any number of input features. Here is the solution I would propose.
The TimeDistributed layer is key to this.
From Keras Documentation: "This wrapper applies a layer to every temporal slice of an input. The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension."
For our purposes, we want this layer to apply "something" to each feature, so we move the features to the temporal index, which is 1.
Also relevant is the Conv1D documentation.
Specifically about channels: "The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras)"
from tensorflow.keras import Sequential, backend
from tensorflow.keras.layers import GlobalMaxPool1D, MaxPool1D, Conv1D, Dense, Permute, Reshape, TimeDistributed, InputLayer
backend.clear_session()
lookback = 20
n_features = 5
filters = 128
model = Sequential()
model.add(InputLayer(input_shape=(lookback, n_features)))  # (20, 5), the same shape the LSTM used
# Causal layers are first applied to the features independently
model.add(Permute(dims=(2, 1)))  # UPDATE: must permute before adding the new dim and reshaping -> (5, 20)
model.add(Reshape(target_shape=(n_features, lookback, 1)))
# After the reshape, the 5 input features are treated as the temporal axis
# by the TimeDistributed layer
# When Conv1D is applied to each input feature, it sees an input of shape (20, 1)
# with the default "channels_last", therefore...
# 20 time steps is the temporal dimension
# 1 is the "channel", the new location for the feature maps
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0)))
# You could add pooling here if you want.
# If you want interaction between features AND causal/dilation, then apply later
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1)))
model.add(TimeDistributed(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2)))
# Stack the feature maps on top of each other so each time step can look at
# all features produced earlier
model.add(Permute(dims=(2, 1, 3)))  # UPDATED to fix issue with reshape: (5, 20, 128) -> (20, 5, 128)
model.add(Reshape(target_shape=(lookback, n_features * filters)))  # (20 time steps, 5 features * 128 filters)
# Causal layers are now applied to the 5 input features jointly
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**0))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**1))
model.add(MaxPool1D())
model.add(Conv1D(filters, 3, activation="elu", padding="causal", dilation_rate=2**2))
model.add(GlobalMaxPool1D())
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
Final Model Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 5, 20, 1) 0
_________________________________________________________________
time_distributed (TimeDistri (None, 5, 20, 128) 512
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 20, 128) 49280
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 20, 128) 49280
_________________________________________________________________
reshape_1 (Reshape) (None, 20, 640) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 20, 128) 245888
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 10, 128) 0
_________________________________________________________________
conv1d_4 (Conv1D) (None, 10, 128) 49280
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 5, 128) 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 5, 128) 49280
_________________________________________________________________
global_max_pooling1d (Global (None, 128) 0
_________________________________________________________________
dense (Dense) (None, 1) 129
=================================================================
Total params: 443,649
Trainable params: 443,649
Non-trainable params: 0
_________________________________________________________________
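The parameter counts in this summary can be checked by hand. A Conv1D layer has kernel_size * input_channels * filters weights plus one bias per filter, and TimeDistributed shares a single set of weights across the 5 features. Here is a small sketch of that arithmetic (conv1d_params and dense_params are helpers defined here for illustration, not Keras functions):

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a 1D convolution layer."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, units):
    """Weights plus biases of a fully connected layer."""
    return in_units * units + units


filters = 128
n_features = 5

layer_params = [
    conv1d_params(3, 1, filters),                     # time_distributed
    conv1d_params(3, filters, filters),               # time_distributed_1
    conv1d_params(3, filters, filters),               # time_distributed_2
    conv1d_params(3, n_features * filters, filters),  # conv1d_3 (sees 5 * 128 channels)
    conv1d_params(3, filters, filters),               # conv1d_4
    conv1d_params(3, filters, filters),               # conv1d_5
    dense_params(filters, 1),                         # dense
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

Most of the weights sit in conv1d_3, the first layer that mixes all 5 features, because it reads 640 input channels.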
Edit:
"why you need to reshape and use n_features as the temporal layer"
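The reason n_features needs to be on the temporal axis initially comes from how the TimeDistributed layer is implemented: it applies the wrapped layer to every slice along axis 1, so putting the features there makes the dilated, causal Conv1D run on one feature at a time. (Conv1D itself accepts any number of channels, as the later Conv1D layers in the model show; keeping the features separate in the first stage is a design choice.)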
From their documentation "Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the samples dimension, is (10, 16).
You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently:"
Because TimeDistributed applies the wrapped layer to each feature independently, every Conv1D call sees a problem with only one feature, that is, a plain univariate series to which dilation and causal padding apply directly. With 5 features, each one is therefore handled separately at first, and the features are only combined in the later Conv1D layers.
After your edits this recommendation still applies.
It shouldn't make a difference to the network whether the input shape is given through a separate InputLayer or as input_shape on the first layer, so you can definitely put it on the first layer if that resolves the issue.
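To make "causal padding with dilation" concrete, here is a minimal pure-Python sketch of what a single causal, dilated convolution filter computes. It is only an illustration under simplifying assumptions (one filter, no bias, no activation); causal_conv1d is a made-up helper, not the Keras implementation. Each time step of x holds one vector of feature values, which has a single entry when a feature is processed on its own, as inside TimeDistributed.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal, dilated 1D convolution over a sequence (single filter, no bias).

    x:      list of time steps, each a list of feature values
    kernel: list of taps, each a list of weights (one per feature);
            the last tap is applied to the current time step
    Returns one output value per time step.
    """
    n_taps = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            # Tap k looks (n_taps - 1 - k) * dilation steps into the past
            src = t - (n_taps - 1 - k) * dilation
            if src < 0:
                continue  # implicit zero padding on the left ("causal")
            acc += sum(w * v for w, v in zip(kernel[k], x[src]))
        out.append(acc)
    return out


# 6 time steps, 2 features per step
x = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
kernel = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # 3 taps, all weights 1

y = causal_conv1d(x, kernel, dilation=2)

# Changing the last time step must not change any earlier output
x_future = [row[:] for row in x]
x_future[-1] = [100.0, 100.0]
y_future = causal_conv1d(x_future, kernel, dilation=2)
print(y[:-1] == y_future[:-1])
```

The output has the same length as the input, and the output at time t only depends on inputs at times t, t - dilation, t - 2 * dilation, and so on.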
Conv1D with causal padding is typically used together with dilated convolution. For Conv2D, you can use the dilation_rate parameter of the Conv2D class: pass it an integer or a 2-tuple of integers. Note that Conv2D has no 'causal' padding option. For more information, see the Keras documentation.
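On choosing the dilation rate: a common pattern is to double it from layer to layer (1, 2, 4, ...) until the stack can see the whole lookback window. The sketch below computes that receptive field, assuming stride 1 and no pooling between the layers (with pooling in between, the effective reach is larger). receptive_field and dilations_to_cover are hypothetical helpers written for this illustration.

```python
def receptive_field(kernel_size, dilation_rates):
    """Number of time steps (including the current one) that the last layer
    of a stack of dilated causal convolutions can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)


def dilations_to_cover(lookback, kernel_size):
    """Doubling dilation rates (1, 2, 4, ...) until the stack covers lookback."""
    rates = [1]
    while receptive_field(kernel_size, rates) < lookback:
        rates.append(rates[-1] * 2)
    return rates


# Three layers with kernel size 3 and rates 1, 2, 4
print(receptive_field(3, [1, 2, 4]))

# Rates needed to cover a lookback of 20 with kernel size 3
rates = dilations_to_cover(lookback=20, kernel_size=3)
print(rates, receptive_field(3, rates))
```

With kernel size 3, rates 1, 2 and 4 reach back 15 steps, so a lookback of 20 needs one more layer with rate 8 under these assumptions.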
Related
I'm currently working on a Keras neural network for fun. I'm just learning the basics, but can't get past this dimension problem:
So my input data (X) should be a 12x6 matrix, with 12 timestamps and 6 different data values for every timestamp:
X = np.zeros([2867, 12, 6])
Y = np.zeros([2867, 3])
My output (Y) should be a one-hot encoded 3x1 vector.
Now I want to feed this data through the following LSTM model.
model = Sequential()
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(12, 6)))
model.add(Dense(3))
model.summary()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=X, y=Y, batch_size=100, epochs=1000, verbose=2, validation_split=0.2)
The Summary looks like this:
Model: "sequential"
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 12, 30) 4440
_________________________________________________________________
dense (Dense) (None, 12, 3) 93
=================================================================
Total params: 4,533
Trainable params: 4,533
Non-trainable params: 0
_________________________________________________________________
When I run this program, I get this error:
ValueError: Shapes (None, 3) and (None, 12, 3) are incompatible.
I already tried to reshape my data to a 72x1 vector, but this doesn't work either.
Maybe someone can help me shape my input data correctly :).
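You probably need to define your model as follows, since you use the categorical_crossentropy loss function: return only the last LSTM output (return_sequences=False) so that the model output is (None, 3) like Y, and put a softmax on the final layer.
model = Sequential()
model.add(LSTM(30, activation="softsign",
               return_sequences=False, input_shape=(12, 6)))
model.add(Dense(3, activation='softmax'))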
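The shape mismatch in the error can be traced without running Keras. The sketch below mimics only the shape rules of the two layers (lstm_output_shape and dense_output_shape are illustrative helpers, not Keras functions): an LSTM with return_sequences=True keeps the time axis, and Dense acts on the last axis only.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Output shape of an LSTM layer for input (batch, timesteps, features)."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output per time step
    return (batch, units)                 # only the last time step


def dense_output_shape(input_shape, units):
    """Dense acts on the last axis and keeps all leading axes."""
    return input_shape[:-1] + (units,)


x_shape = (None, 12, 6)  # None stands for the batch size
y_shape = (None, 3)      # one-hot target

original = dense_output_shape(lstm_output_shape(x_shape, 30, True), 3)
fixed = dense_output_shape(lstm_output_shape(x_shape, 30, False), 3)
print(original, fixed)
```

Only the second variant produces a shape that matches Y.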
I have several data files of human activity recognition data consisting of time-ordered rows of recorded raw samples. Each row has 8 columns of EMG sensor data and 1 corresponding column of target sensor data. I'm trying to feed the 8 channels of EMG sensor data into a CNN+LSTM deep model in order to predict the 1 channel of target data. I do this by breaking down a dataset (a in the image below) into 50-row windows of raw samples (b in the image below) and then reshaping these windows into blocks of 4 windows, to act as time steps for the LSTM part of the model (c in the image below). The following image will hopefully explain it better:
I've been following the tutorial here as to how to implement my model: https://medium.com/smileinnovation/how-to-work-with-time-distributed-data-in-a-neural-network-b8b39aa4ce00
I have reshaped the data and built the model but keep coming back to the following error that I cannot figure out how to resolve:
"ValueError: Error when checking target: expected FC_out to have 2 dimensions, but got array with shape (808, 50, 1)"
My code follows and is written in Python using Keras and Tensorflow:
from keras.models import Sequential
from keras.layers import CuDNNLSTM
from keras.layers.convolutional import Conv2D
from keras.layers.core import Dense, Dropout
from keras.layers import Flatten
from keras.layers import TimeDistributed
#Code that reads in file data and shapes it into 4-window blocks omitted. That code produces the following arrays:
#x_train - shape of (808, 4, 50, 8) which equates to (samples, time steps, window length, number of channels)
#x_valid - shape of (223, 4, 50, 8) which equates to the same as x_train
#y_train - shape of (808, 50, 1) which equates to (samples, window length, number of target channels)
# Followed machine learning mastery style for ease of reading
numSteps = x_train.shape[1]
windowLength = x_train.shape[2]
numChannels = x_train.shape[3]
numOutputs = 1
# Reshape x data for use with TimeDistributed wrapper, adding extra dimension at the end
x_train = x_train.reshape(x_train.shape[0], numSteps, windowLength, numChannels, 1)
x_valid = x_valid.reshape(x_valid.shape[0], numSteps, windowLength, numChannels, 1)
# Build model
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3,3), activation=activation, name="Conv2D_1"),
input_shape=(numSteps, windowLength, numChannels, 1)))
model.add(TimeDistributed(Conv2D(64, (3,3), activation=activation, name="Conv2D_2")))
model.add(Dropout(0.4, name="CNN_Drop_01"))
# Flatten for passing to LSTM layer
model.add(TimeDistributed(Flatten(name="Flatten_1")))
# LSTM and Dropout
model.add(CuDNNLSTM(28, return_sequences=True, name="LSTM_01"))
model.add(Dropout(0.4, name="Drop_01"))
# Second LSTM and Dropout
model.add(CuDNNLSTM(28, return_sequences=False, name="LSTM_02"))
model.add(Dropout(0.3, name="Drop_02"))
# Fully Connected layer and further Dropout
model.add(Dense(16, activation=activation, name="FC_1"))
model.add(Dropout(0.4)) # For example, for 3 outputs classes
# Final fully Connected layer specifying outputs
model.add(Dense(numOutputs, activation=activation, name="FC_out"))
# Compile model, produce summary and save model image to file
# NOTE: coeffDetermination refers to a function for calculating R2 and is not included in this code
model.compile(optimizer='Adam', loss='mse', metrics=[coeffDetermination])
# Now train the model
history_cb = model.fit(x_train, y_train, validation_data=(x_valid, y_valid), epochs=30, batch_size=64)
I'd be grateful if anyone can figure out what I've done wrong. Or am I just going about this the incorrect way, with trying to use this model configuration for time series prediction?
"ValueError: Error when checking target: expected FC_out to have 2 dimensions, but got array with shape (808, 50, 1)"
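Your input is (808, 4, 50, 8, 1) and your target is (808, 50, 1).
However, the model.summary() below shows an output shape of (None, 4, 1). (This summary appears to come from a variant that keeps the time dimension through to the end, since it has no LSTM_02 row. With return_sequences=False on LSTM_02, as in your code, FC_out outputs (None, 1), which is why the error says it expects 2 dimensions.)
Since the number of time steps is 4, y_train should be something like (808, 4, 1) for that variant.
Or, if you want to keep (808, 50, 1), you need to change the model so that its last layer outputs (None, 50, 1).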
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_18 (TimeDis (None, 4, 48, 6, 64) 640
_________________________________________________________________
time_distributed_19 (TimeDis (None, 4, 46, 4, 64) 36928
_________________________________________________________________
CNN_Drop_01 (Dropout) (None, 4, 46, 4, 64) 0
_________________________________________________________________
time_distributed_20 (TimeDis (None, 4, 11776) 0
_________________________________________________________________
LSTM_01 (LSTM) (None, 4, 28) 1322160
_________________________________________________________________
Drop_01 (Dropout) (None, 4, 28) 0
_________________________________________________________________
Drop_02 (Dropout) (None, 4, 28) 0
_________________________________________________________________
FC_1 (Dense) (None, 4, 16) 464
_________________________________________________________________
dropout_3 (Dropout) (None, 4, 16) 0
_________________________________________________________________
FC_out (Dense) (None, 4, 1) 17
=================================================================
Total params: 1,360,209
Trainable params: 1,360,209
Non-trainable params: 0
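The shapes and parameter counts in this summary can be reproduced with a few lines of arithmetic. This is only a sketch: the helpers are defined here for illustration, and lstm_params uses the formula for the standard LSTM layer that the summary shows (CuDNNLSTM, as used in the question, carries an extra bias vector and would give a slightly different count).

```python
def conv2d_valid(shape, kernel, filters):
    """(height, width, channels) after a stride-1 Conv2D with 'valid' padding."""
    h, w, _ = shape
    return (h - kernel[0] + 1, w - kernel[1] + 1, filters)


def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel[0] * kernel[1] * in_channels * filters + filters


def lstm_params(in_features, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (in_features + units + 1) * units


window = (50, 8, 1)  # one time step: (window length, channels, 1)
after_conv1 = conv2d_valid(window, (3, 3), 64)
after_conv2 = conv2d_valid(after_conv1, (3, 3), 64)
flat = after_conv2[0] * after_conv2[1] * after_conv2[2]

print(after_conv1, after_conv2, flat)
print(conv2d_params((3, 3), 1, 64), conv2d_params((3, 3), 64, 64), lstm_params(flat, 28))
```

Almost all of the model's weights come from feeding the 11776 flattened values of each time step into the first LSTM.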
For many-to-many sequence prediction where the input and output sequences have different lengths, see https://github.com/keras-team/keras/issues/6063
dataX or input : (nb_samples, nb_timesteps, nb_features) -> (1000, 50, 1)
dataY or output: (nb_samples, nb_timesteps, nb_features) -> (1000, 10, 1)
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation
hidden_neurons = 32  # placeholder; the original snippet does not give a value
model = Sequential()
# Encoder: read the 50-step input and keep only the final output
model.add(LSTM(hidden_neurons, input_shape=(50, 1), return_sequences=False))
# Repeat it once per output time step
model.add(RepeatVector(10))
model.add(LSTM(hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
I'm following CNN tutorial at analytics vidhya.
I'm having difficulty visualizing the connection between the flattened layer and the dense layer with 2 nodes and an input dimension of 50. This is a binary classification problem, so I understand the 2 nodes. However, what determines the input dimensions? We can also omit this parameter, in which case there will just be fewer weights to train for this dense layer?
import os
import numpy as np
import pandas as pd
import scipy
import sklearn
import keras
from keras.models import Sequential
import cv2
from skimage import io
%matplotlib inline
#Defining the File Path
cat=os.listdir("/mnt/hdd/datasets/dogs_cats/train/cat")
dog=os.listdir("/mnt/hdd/datasets/dogs_cats/train/dog")
filepath="/mnt/hdd/datasets/dogs_cats/train/cat/"
filepath2="/mnt/hdd/datasets/dogs_cats/train/dog/"
#Loading the Images
images=[]
label = []
for i in cat:
image = scipy.misc.imread(filepath+i)
images.append(image)
label.append(0) #for cat images
for i in dog:
image = scipy.misc.imread(filepath2+i)
images.append(image)
label.append(1) #for dog images
#resizing all the images
for i in range(0,23000):
images[i]=cv2.resize(images[i],(300,300))
#converting images to arrays
images=np.array(images)
label=np.array(label)
# Defining the hyperparameters
filters=10
filtersize=(5,5)
epochs =5
batchsize=128
input_shape=(300,300,3)
#Converting the target variable to the required size
from keras.utils.np_utils import to_categorical
label = to_categorical(label)
#Defining the model
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='valid', data_format="channels_last", activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(units=2, input_dim=50,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(images, label, epochs=epochs, batch_size=batchsize,validation_split=0.3)
model.summary()
However, what determines the input dimensions? We can also omit this
parameter, in which case there will just be fewer weights to train for
this dense layer?
It is determined by the output shape of the previous layer. As the model.summary() shows, the output shape of the Flatten layer is (None, 219040), so the input dimension of the Dense layer is 219040 (148 * 148 * 10). So in this case there are far more weights to train than input_dim=50 would suggest: 219040 * 2 + 2 = 438082.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 296, 296, 10) 760
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 148, 148, 10) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 219040) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 438082
=================================================================
Total params: 438,842
Trainable params: 438,842
Non-trainable params: 0
_________________________________________________________________
As can be seen from the code snippet below, the weights of the Dense layer are created from the input_shape argument (which is the output shape of the previous layer). The input_dim passed by the user when constructing this Dense layer is ignored here.
input_dim = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
https://github.com/keras-team/keras/blob/3bda5520b787f84f687bb116c460f3aedada039b/keras/layers/core.py#L891
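The numbers in the summary can be derived step by step from the 300x300 input. A short sketch of that arithmetic, assuming 'valid' padding as in the question's code (conv_output_size is a helper written for this example):

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1


height = 300  # the images are square, so width behaves the same way
filters = 10

conv = conv_output_size(height, kernel=5)            # Conv2D 5x5, stride 1
pooled = conv_output_size(conv, kernel=2, stride=2)  # MaxPooling2D 2x2
flat_units = pooled * pooled * filters               # Flatten
dense_weights = flat_units * 2 + 2                   # Dense(units=2): kernel + bias

print(conv, pooled, flat_units, dense_weights)
```

The value 50 never enters the calculation, which is consistent with input_dim being ignored for this layer.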
In my last post linked here, it was said that I have to modify my model for it to be better. To quote the only answerer's comment to my questions (again, thank you, Sir):
The accuracy of prediction is a metric of how good your neural network architecture is and it also depends on your train/validation data. You will have to tune your neural network in such a way that you generalize well by adjusting the hyper parameters such as number of layers, type of layers, learning rate, optimizer etc. ...
I would like to know how I would do these mentioned. Or at the least, be pointed in the right direction. I am honestly both lost in theory and practice.
The only thing I have been able to do is to adjust the epoch above 100. I have also cleaned the images to be identified as much as I can.
Currently, here is how I create my model. It is only based on Tensorflow 2.0's tutorial.
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def createModel():
# Build the tf.keras.Sequential model by stacking layers.
# Choose an optimizer and loss function used for training:
model = tf.keras.models.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
model = createModel()
model.fit(x_train, y_train, epochs=102, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
It gave out a validation accuracy of around .9800 for me. But its performance against images of handwritten characters I've extracted from documents is dismal. I would also like it to be extended such that it can also read other selected characters, but I guess that can be another question for another day.
Thanks!
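You could put several Convolution/Max Pool layers at the beginning to perform feature extraction by scanning the image. After that, use a fully connected network like you did before, followed by a softmax.
You could create a CNN model this way (note that it expects inputs of shape (28, 28, 1), so x_train needs an extra channel axis, and that categorical_crossentropy expects one-hot labels; with integer labels as in your code, keep sparse_categorical_crossentropy):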
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential
# Create the model
model = Sequential()
# Add the 1st Convolution/ max pool
model.add(Conv2D(40, kernel_size=5, padding="same",input_shape=(28, 28, 1), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# 2nd convolution / max pool
model.add(Conv2D(200, kernel_size=3, padding="same", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# 3rd convolution/ max pool
model.add(Conv2D(512, kernel_size=3, padding="valid", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# Reduce dimensions from 2d to 1d
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
# Add dropout to prevent overfitting
model.add(Dropout(0.5))
# Final fully connected layer
model.add(Dense(10, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Which returns the following model:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 28, 28, 40) 1040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 40) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 14, 14, 200) 72200
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 200) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 10, 10, 512) 922112
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 32768) 0
_________________________________________________________________
dense_1 (Dense) (None, 100) 3276900
_________________________________________________________________
dropout_1 (Dropout) (None, 100) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 1010
=================================================================
Total params: 4,273,262
Trainable params: 4,273,262
Non-trainable params: 0
_________________________________________________________________
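One detail to watch when combining this model with the question's data: the model above is compiled with categorical_crossentropy, while keras.datasets.mnist.load_data() returns integer labels, which is what sparse_categorical_crossentropy expects. The two losses compute the same value and differ only in the label format. A pure-Python sketch of that equivalence (these functions are illustrative, not the Keras implementations):

```python
import math


def one_hot(label, num_classes):
    """Integer class label -> one-hot list, e.g. 3 -> [0, 0, 0, 1, 0, ...]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target_one_hot, probs):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target_one_hot, probs) if t > 0)


def sparse_categorical_crossentropy(label, probs):
    """Same loss, but the target is the integer class index."""
    return -math.log(probs[label])


probs = [0.05] * 10
probs[3] = 0.55  # softmax-like output; the entries sum to 1
label = 3        # MNIST-style integer label

loss_a = categorical_crossentropy(one_hot(label, 10), probs)
loss_b = sparse_categorical_crossentropy(label, probs)
print(loss_a, loss_b)
```

So either convert the labels to one-hot vectors or keep the sparse loss; mixing the two formats is what causes shape errors at fit time.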
I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it my custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally with an increase in the input shape the number of parameters should increase, however as you can see they stay exactly the same. My questions are thus:
Why do the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape, what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However when I run the model with this configuration it runs without any problems. Could there be something that I am missing out?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel matrix dimensions. A larger input size leads to a larger output size, but not to an increasing number of weights.
This means that the output of the convolutional layers of the second model is larger than that of the first model, which would increase the number of weights in a following dense layer if it were flattened directly. However, if you look at the architecture of DenseNet, you will notice that there is a GlobalAveragePooling2D layer after all the convolutional layers (this is what pooling='avg' selects), which averages all the values of each output channel. That's why the output of DenseNet is of size 1024, whatever the input shape.
2.
Yes, the model will still work. Because you pass weights=None, no pretrained three-channel kernels are loaded, so the first convolution is simply built with one input channel instead of three; the single channel is not broadcast (duplicated) to fill three channels. The three-channel requirement in the documentation matters when the ImageNet weights are loaded.
DenseNet is composed of two parts: the convolutional part and the global pooling part.
The number of trainable weights in the convolutional part doesn't depend on the input shape.
Usually, a classification network uses fully connected layers on top of the flattened feature maps to infer the class, and their weight count does depend on the input shape. In DenseNet, however, global pooling is used instead, and it doesn't bring any trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.
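Both points can be illustrated with a toy example in plain Python. This is a simplified sketch, not DenseNet itself: the helpers are made up for the illustration, the convolution is assumed to use 'same' padding with stride 1, and the feature maps are filled with constant values.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases; note that the input height/width do not appear."""
    return kernel * kernel * in_channels * filters + filters


def conv2d_same_output(height, width, filters):
    """Output shape of a stride-1 convolution with 'same' padding."""
    return (height, width, filters)


def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map: nested list indexed [row][col][channel].
    """
    n_channels = len(feature_map[0][0])
    n_positions = len(feature_map) * len(feature_map[0])
    return [
        sum(pixel[c] for row in feature_map for pixel in row) / n_positions
        for c in range(n_channels)
    ]


params = conv2d_params(kernel=3, in_channels=1, filters=4)
small = conv2d_same_output(8, 8, filters=4)
large = conv2d_same_output(16, 16, filters=4)

# Two feature maps of different spatial size, both with 4 channels
map_small = [[[1.0, 2.0, 3.0, 4.0] for _ in range(small[1])] for _ in range(small[0])]
map_large = [[[1.0, 2.0, 3.0, 4.0] for _ in range(large[1])] for _ in range(large[0])]

print(params, small, large)
print(len(global_average_pool(map_small)), len(global_average_pool(map_large)))
```

The layer has the same number of weights for both input sizes, its output grows with the input, and global average pooling brings both outputs back to one value per channel, so a Dense layer placed after it always sees the same number of inputs.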