This is my test code:
from keras import layers
input1 = layers.Input((2,3))
output = layers.Dense(4)(input1)
print(output)
The output is:
<tf.Tensor 'dense_2/add:0' shape=(?, 2, 4) dtype=float32>
But what happened?
The documentation says:
Note: if the input to the layer has a rank greater than 2, then it is
flattened prior to the initial dot product with kernel.
So why is the output not flattened, and why does it keep the shape (?, 2, 4)?
Currently, contrary to what has been stated in documentation, the Dense layer is applied on the last axis of input tensor:
Contrary to the documentation, we don't actually flatten it. It's
applied on the last axis independently.
In other words, if a Dense layer with m units is applied to an input tensor of shape (n_dim1, n_dim2, ..., n_dimk), the output shape is (n_dim1, n_dim2, ..., n_dim(k-1), m): only the last axis changes.
As a side note: this makes TimeDistributed(Dense(...)) and Dense(...) equivalent to each other.
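To make the "applied on the last axis" behaviour concrete, here is a minimal pure-Python sketch. It is not Keras code: dense_last_axis and shape are hypothetical helpers written for this illustration, and the kernel values are arbitrary.

```python
def dense_last_axis(x, kernel, bias):
    """Apply y = x . kernel + bias along the last axis of a nested list."""
    if not isinstance(x[0], list):
        # x is a single vector of length input_dim: one ordinary dense step.
        return [
            sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))
        ]
    # Otherwise recurse, reusing the same kernel and bias for every slice.
    return [dense_last_axis(row, kernel, bias) for row in x]


def shape(x):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# A batch of one sample with shape (2, 3), as in Input((2, 3)).
batch = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
kernel = [[0.1 * (i + j) for j in range(4)] for i in range(3)]  # shape (3, 4)
bias = [0.0, 0.0, 0.0, 0.0]

out = dense_last_axis(batch, kernel, bias)
print(shape(batch), "->", shape(out))  # (1, 2, 3) -> (1, 2, 4)

# Applying the layer to each row separately gives the same result,
# which is the sense in which TimeDistributed(Dense) and Dense agree here.
per_row = [[dense_last_axis(row, kernel, bias) for row in sample] for sample in batch]
print(per_row == out)  # True
```

The kernel has shape (3, 4) whatever the middle axis is, so only the last dimension of the input determines the weights.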
Another side note: be aware that this has the effect of shared weights. For example, consider this toy network:
model = Sequential()
model.add(Dense(10, input_shape=(20, 5)))
model.summary()
The model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 20, 10) 60
=================================================================
Total params: 60
Trainable params: 60
Non-trainable params: 0
_________________________________________________________________
As you can see the Dense layer has only 60 parameters. How? Each unit in the Dense layer is connected to the 5 elements of each row in the input with the same weights, therefore 10 * 5 + 10 (bias params per unit) = 60.
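The arithmetic above can be checked with a small sketch. dense_params is a hypothetical helper, not a Keras function; it only encodes "one weight per (input feature, unit) pair plus one bias per unit".

```python
def dense_params(input_dim, units, use_bias=True):
    """Parameter count of a Dense layer acting on a last axis of size input_dim."""
    return input_dim * units + (units if use_bias else 0)


shared = dense_params(input_dim=5, units=10)          # Dense(10) on input_shape=(20, 5)
flattened = dense_params(input_dim=20 * 5, units=10)  # Dense(10) after flattening to (100,)
print(shared, flattened)  # 60 1010
```

The 20 rows never appear in the first count because the same 5 x 10 kernel is reused for each of them. Flattening first would give every unit its own weight for all 100 inputs.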
Update. [Figure: visual illustration of the example above.]
I am trying to build a neural network that takes in data in form of a matrix and outputs a vector but I don't know what layers to use to perform that.
My input has shape (10,4) and my desired output has shape (3,).
My current model is the following :
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),
])
This at least results in a vector instead of a matrix, but it has shape (10,) instead of (3,). I could probably find a way to reduce that to (3,), but I doubt this approach is correct.
Assuming that your (10,4) input is a matrix that represents neither a length-10 sequence (where you would need an LSTM) nor an image (where you would need a 2D CNN), you can simply flatten the input matrix and pass it through a few dense layers, as below.
from tensorflow.keras import layers, Model
inp = layers.Input((10,4)) #none,10,4
x = layers.Flatten()(inp) #none,40
x = layers.Dense(256)(x) #none,256
out = layers.Dense(3)(x) #none,3
model = Model(inp, out)
model.summary()
Layer (type) Output Shape Param #
=================================================================
input_43 (InputLayer) [(None, 10, 4)] 0
_________________________________________________________________
flatten_1 (Flatten) (None, 40) 0
_________________________________________________________________
dense_82 (Dense) (None, 256) 10496
_________________________________________________________________
dense_83 (Dense) (None, 3) 771
=================================================================
Total params: 11,267
Trainable params: 11,267
Non-trainable params: 0
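The shapes and parameter counts in this summary can be traced by hand. The sketch below uses two hypothetical helpers (flatten_shape and dense), not the Keras API, and leaves out the batch axis.

```python
from math import prod


def flatten_shape(shape):
    """Collapse all non-batch axes into one."""
    return (prod(shape),)


def dense(shape, units):
    """Return (output_shape, param_count) for a Dense layer on the last axis."""
    params = shape[-1] * units + units
    return shape[:-1] + (units,), params


# Without Flatten, the leading axis of size 10 survives every Dense layer.
no_flatten_shape, _ = dense((10, 4), 1)
print(no_flatten_shape)  # (10, 1)

# With Flatten, the whole matrix becomes one feature vector.
shape = flatten_shape((10, 4))   # (40,)
shape, p1 = dense(shape, 256)    # (256,)
shape, p2 = dense(shape, 3)      # (3,)
print(shape, p1, p2, p1 + p2)    # (3,) 10496 771 11267
```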
You can use the Flatten layer: https://keras.io/api/layers/reshaping_layers/flatten/
Or you might want to try 2d CNNs.
Considering this LSTM based RNN:
from keras.models import Sequential
from keras.layers import LSTM, Dense
# Instantiating the model
model = Sequential()
# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))
# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))
# Final Hidden layer
model.add(LSTM(10, activation="softsign"))
# Output layer
model.add(Dense(10))
Is each output unit of the final hidden layer connected to each of the 12 output units of the preceding hidden layer? (10*12 = 120 connections)
Is each of the 10 outputs of the Dense layer connected to each output of the final hidden layer? (10*10 = 100 connections)
Would there be a difference in terms of connections between the input layer and the first hidden layer if "return_sequences" were set to False (for both layers or for one)?
Thanks a lot for your help
Aymeric
Here is how I picture the RNN, please tell me if it's wrong:
Note about the picture:
X = one training example, i.e. a vector of 30 bitcoin (BTC) values (each value represents one day, 30 days total)
Output vector = 10 values that are supposed to be the 10 next values of bitcoin (10 next days)
Let's take a look at the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 30, 30) 3840
_________________________________________________________________
lstm_1 (LSTM) (None, 30, 12) 2064
_________________________________________________________________
lstm_2 (LSTM) (None, 30, 12) 1200
_________________________________________________________________
lstm_3 (LSTM) (None, 10) 920
_________________________________________________________________
dense (Dense) (None, 10) 110
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
Since the final LSTM layer does not set return_sequences=True, it uses the default return_sequences=False, which means only its last output is passed to the Dense layer.
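Yes, each of the 10 Dense outputs is connected to each of the 10 outputs of the final LSTM layer. But the layer actually has 110 parameters because each unit also has a bias: (10 + 1) * 10.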
No, there would not. The difference between return_sequences=True and return_sequences=False is that when it is set to False, only the final output is sent to the next layer. So if I have time series data with 30 events (1, 30, 30), only the output from the 30th event is passed along to the next layer. The computations are the same, so there is no difference in the weights. Be aware, though, that you will get shape mismatches if you simply set return_sequences=False on a layer that feeds another LSTM, because the next LSTM layer expects a 3D input.
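The parameter counts in the summary can be reproduced with the usual LSTM formula. This is a sketch that assumes the standard Keras LSTM with one bias vector per gate; lstm_params and dense_params are hypothetical helpers, not library functions.

```python
def lstm_params(input_dim, units):
    """4 gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


layers = [
    ("lstm", lstm_params(1, 30)),     # input_shape=(30, 1): 1 feature per timestep
    ("lstm_1", lstm_params(30, 12)),
    ("lstm_2", lstm_params(12, 12)),
    ("lstm_3", lstm_params(12, 10)),
    ("dense", dense_params(10, 10)),
]
for name, count in layers:
    print(name, count)

total = sum(count for _, count in layers)
print("total", total)  # total 8134
```

Neither the number of timesteps nor return_sequences appears in the formula, which is why changing return_sequences leaves the weights unchanged.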
I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it my custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally with an increase in the input shape the number of parameters should increase, however as you can see they stay exactly the same. My questions are thus:
Why does the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape, what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However when I run the model with this configuration it runs without any problems. Could there be something that I am missing out?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel matrix dimensions. A larger input size leads to a larger output size, but not to an increasing number of weights.
This means that the output of the convolutional layers in the second model is larger than in the first model, which would normally increase the number of weights in a following dense layer. However, if you look at the architecture of DenseNet, you will notice that there is a GlobalAveragePooling2D layer after all the convolutional layers (enabled here by pooling='avg'), which averages all the values of each output channel. That's why the output of DenseNet has size 1024, whatever the input shape.
2.
Yes, the model will still work. I'm not entirely sure about this, but my guess is that the single channel will be broadcast (duplicated) to fill all three channels. That's at least how these things are usually handled (see for example tensorflow or numpy).
The DenseNet is composed of two parts, the convolution part, and the global pooling part.
The number of the convolution part's trainable weights doesn't depend on the input shape.
Usually, a classification network employs fully connected layers to infer the class. In DenseNet, however, global pooling is used, which adds no trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.
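As a rough illustration of both answers, the sketch below uses a toy single convolution (3x3 kernel, 32 filters, 'valid' padding) followed by global average pooling. These are not DenseNet's actual layers, and all three helpers are hypothetical names introduced for this example.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Weights of a 2D convolution: independent of the input height and width."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + (filters if use_bias else 0)


def conv2d_output_hw(height, width, kernel_size, stride=1):
    """Spatial output size for 'valid' padding."""
    kh, kw = kernel_size
    return (height - kh) // stride + 1, (width - kw) // stride + 1


def global_avg_pool_shape(height, width, channels):
    """Global pooling collapses the spatial axes and keeps one value per channel."""
    return (channels,)


results = {}
for side in (224, 512):
    h, w = conv2d_output_hw(side, side, (3, 3))
    params = conv2d_params((3, 3), in_channels=1, filters=32)
    pooled = global_avg_pool_shape(h, w, 32)
    results[side] = ((h, w, 32), params, pooled)
    print(side, results[side])
```

The feature map grows from (222, 222, 32) to (510, 510, 32), but the parameter count stays at 320 and the pooled output stays at (32,). A Dense layer placed after the pooling therefore sees the same input size in both cases.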
I'm asking this because I feel I'm missing something fundamental.
By now most everyone knows that the MNIST images are 28X28 pixels. The keras documentation tells me this about Dense:
Input shape nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
So a newbie like me would assume that the images could be fed to the model as a 28*28 matrix. Yet every tutorial I found goes through various gymnastics to convert the images to a single 784-long feature vector.
Sometimes by
num_pixels = X_train.shape[1] * X_train.shape[2]
model.add(Dense(num_pixels, input_dim=num_pixels, activation='...'))
or
num_pixels = np.prod(X_train.shape[1:])
model.add(Dense(512, activation='...', input_shape=(num_pixels,)))
or
model.add(Dense(units=10, input_dim=28*28, activation='...'))
history = model.fit(X_train.reshape((-1,28*28)), ...)
or even:
model = Sequential([Dense(32, input_shape=(784,)), ...])
So my question is simply - why? Can't Dense just accept an image as-is or, if necessary, just process it "behind the scenes", as it were? And if, as I suspect, this processing has to be done, is any of these methods (or others) inherently preferable?
As requested by OP (i.e. Original Poster), I will mention the answer I gave in my comment and elaborate more.
Can't Dense just accept an image as-is or, if necessary, just process
it "behind the scenes", as it were?
Simply no! That's because currently the Dense layer is applied on the last axis. Therefore, if you feed it an image of shape (height, width) or (height, width, channels), Dense layer would be only applied on the last axis (i.e. width or channels). However, when the image is flattened, all the units in the Dense layer would be applied on the whole image and each unit is connected to all the pixels with different weights. To further clarify this consider this model:
from keras import models, layers
model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28*28,)))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 10) 7850
=================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
As you can see, there are 7850 parameters in the Dense layer: each unit is connected to all the pixels (28*28*10 + 10 bias params = 7850). Now consider this model:
model = models.Sequential()
model.add(layers.Dense(10, input_shape=(28,28)))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 28, 10) 290
=================================================================
Total params: 290
Trainable params: 290
Non-trainable params: 0
_________________________________________________________________
In this case there are only 290 parameters in the Dense layer. Here each unit in the Dense layer is connected to all the pixels as well, but the difference is that the weights are shared across the first axis (28*10 + 10 bias params = 290). It is as though the features are extracted from each row of the image compared to the previous model which extracted features across the whole image. And therefore this (i.e. weight sharing) may or may not be useful for your application.
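The two summaries can be compared side by side with a small sketch. dense_summary is a hypothetical helper, not part of Keras, and the batch axis is omitted.

```python
def dense_summary(input_shape, units):
    """Return (output_shape, param_count, pixels_feeding_each_output_value)."""
    output_shape = input_shape[:-1] + (units,)
    params = input_shape[-1] * units + units
    return output_shape, params, input_shape[-1]


flat = dense_summary((28 * 28,), 10)
rows = dense_summary((28, 28), 10)
print(flat)  # ((10,), 7850, 784)
print(rows)  # ((28, 10), 290, 28)
```

In the flattened case each of the 10 output values depends on all 784 pixels. In the (28, 28) case each output value depends only on the 28 pixels of its own row, and the same 28 x 10 kernel is reused for every row.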
How do the input dimensions get converted to the output dimensions for the LSTM layer in Keras? From reading Colah's blog post, it seems as though the number of "timesteps" (AKA the input_dim or the first value in the input_shape) should equal the number of neurons, which should equal the number of outputs from this LSTM layer (delineated by the units argument for the LSTM layer).
From reading this post, I understand the input shapes. What I am baffled by is how Keras plugs the inputs into each of the LSTM "smart neurons".
Keras LSTM reference
Example code that baffles me:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
From this, I would think that the LSTM layer has 10 neurons and each neuron is fed a vector of length 64. However, it seems it has 32 neurons and I have no idea what is being fed into each. I understand that for the LSTM to connect to the Dense layer, we can just plug all 32 outputs to each of the 2 neurons. What confuses me is the InputLayer to the LSTM.
(similar SO post but not quite what I need)
Revisited and updated in 2020: I was partially correct! The architecture is 32 neurons. The 10 represents the timestep value. Each neuron is being fed a 64 length vector (maybe representing a word vector), representing 64 features (perhaps 64 words that help identify a word) over 10 timesteps.
The 32 represents the number of neurons. It represents how many hidden states there are for this layer and also represents the output dimension (since we output a hidden state at the end of each LSTM neuron).
Lastly, the 32-dimensional output vector generated from the 32 neurons at the last timestep is then fed to a Dense layer of 2 neurons, which basically means plug the 32 length vector to both neurons, with weights on the input and activation.
More reading with somewhat helpful answers:
Understanding Keras LSTMs
What exactly am I configuring when I create a stateful LSTM layer with N units
Initializing LSTM hidden states with Keras
I don't think you are right. Actually, the number of timesteps does not affect the number of parameters in an LSTM.
from keras.layers import LSTM
from keras.models import Sequential
time_step = 13
feature = 5
hidden_feature = 10
# Model with 13 timesteps
model = Sequential()
model.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model.summary()
# Same layer, but with 100 timesteps
time_step = 100
model2 = Sequential()
model2.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model2.summary()
The result:
Using TensorFlow backend.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 10) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 10) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
@Sticky, you are wrong in your interpretation.
The input shape is (batch_size, sequence_length/timesteps, feature_size). So your input tensor is 10x64 (like 10 words with 64 features each, just like a word embedding). The 32 is the number of neurons, which makes the output vector size 32.
The output will have shape structure:
(batch, arbitrary_steps, units) if return_sequences=True.
(batch, units) if return_sequences=False.
The memory states will have a size of "units".
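The shape rule above can be written as a small function. This is only a sketch: lstm_output_shape is a hypothetical helper, and None stands for the batch axis as in Keras summaries.

```python
def lstm_output_shape(time_steps, units, return_sequences=False):
    """Output shape of an LSTM layer; the feature size of the input plays no role."""
    if return_sequences:
        return (None, time_steps, units)
    return (None, units)


# LSTM(32, input_shape=(10, 64)) from the question
last_only = lstm_output_shape(10, 32)
full_sequence = lstm_output_shape(10, 32, return_sequences=True)
print(last_only)      # (None, 32)
print(full_sequence)  # (None, 10, 32)

# 13 or 100 timesteps give the same output shape when return_sequences=False
print(lstm_output_shape(13, 10) == lstm_output_shape(100, 10))  # True
```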