Keras: Embedding layer + LSTM: Time Dimension

Keras: Embedding layer + LSTM: Time Dimension - python

This might be too stupid to ask ... but ...
When using LSTM after the initial Embedding layer in Keras (for example the Keras LSTM-IMDB tutorial code), how does the Embedding layer know that there is a time dimension? In another word, how does the Embedding layer knowthe length of each sequence in the training data set? How does the Embedding layer know I am training on sentences, not on individual words? Does it simply infer during the training process?

Embedding layer is usually either first or second layer of your model. If it's first (usually when you use Sequential API) - then you need to specify its input shape which is either (seq_len,) or (None,). In a case when it's second layer (usually when you use Functional API) then you need to specify a first layer which is an Input layer. For this layer - you also need to specify shape. In a case when a shape is (None,) then an input shape is inferred from a size of a batch of data fed to a model.

Related

TimeDistributed layer, but with different weights

I'm trying to apply a separate convolution to each layer of a 3-dimensional array, which brought me to the Keras TimeDistributed layer. But the documentation notes that:
"Because TimeDistributed applies the same instance of Conv2D to each of the
timestamps, the same set of weights are used at each timestamp."
However, I want to perform a separate convolution (with independently defined weights / filters) for each layer of the array, not using the same set of weights. Is there some built in way to do this? Any help is appreciated!

Which layer of a deep learning model (DenseNet-121) to use as output when using model as feature extractor

im having trouble deciding or being sure what layer of a densenet-121 (fined-tuned model) to use for feature extraction.
I have the following model (is based on DenseNet-121 but I added a classification layer because I have trained it to classify the image into 7 classes). These are the last layers of my model:
However, Im having trouble figuring out which layer to use (BatchNormalization or the relu). I want to have a vector of len(4096). Is there a difference of output from the two layers? Which one is the recommended one to use?

if you are doing classification you want the dense_3 layer as the model output. The batch normalization layer and the relu layer each produce an output of shape(2,2,1024). 4096 is the number of trainable parameters for the layer

LSTM Inputs Confusion

I've been trying to understand LSTM inputs for a while now and I think I understand but I keep getting confused on how to implement them.
This is what I think, please correct me if I am wrong.
When specifying an LSTM, you specify the number of cells and the input shape (I've been having issues with the input shape). The Number of cells specifies how many cells should look at the data given and does not affect the required input shape. The input shape (When Stateful) goes by batch size, Timesteps in a batch, and features in a time step. A stateful LSTM retains it's Internal States until reset. Is this right?
If so I'm having so much confusion trying to specify the input shape for my network. This is because I'm trying to upgrade a current network and I cant figure out how and where to specify the input shape without an error.
The way I'm trying to upgrade it is initially I have a CNN going to a dense layer. I'm trying to change it such that it adds an LSTM that takes the CNN's flattened 1D output as one batch and one time step with features dependent on the size of the CNN's Output. Then concatenates its output with the CNN's output (The LSTM's Input) then feeds into the dense layer. Thus it now behaves like LSTM with a skip connection. The issue that I cant seem to understand is when and how to specify the LSTM layer's Input_shape as it has NO INPUT_SHAPE parameter for the functional API? Or maybe I'm just super confused, Everyone uses different API's going over different examples and it gets super confusing what is and isn't specified and how.
Thank you, even if you just help with one of the two parts.
TLDR:
Do I understand LSTM Parameters correctly?
How and when do I specify LSTM Input_shapes if at all?

LSTM units argument means dimensions of LSTM matrices and output shape.
With Functional API you can specify input shape for the very first layer only. If your LSTM layer follows CNN - then its input shape is determined automatically as CNN output.

Change the input size in Keras

I have trained a fully convolutional neural network with Keras. I have used the Functional API and have defined the input layer as Input(shape=(128,128,3)), corresponding to the size of the images in my training set.
However, I want to use the trained model on images of variable sizes (which should be ok because the network is fully convolutional). To do this, I need to change my input layer to Input(shape=(None,None,3)). The obvious way to solve the problem would have been to train my model directly with an input shape of (None,None,3) but I use a custom loss function where I need to specify the size of my training images.
I have tried to define a new input layer and assign it to my model like this :
from keras.engine import InputLayer
input_layer = InputLayer(input_shape=(None, None, 3), name="input")
model.layers[0] = input_layer
This actually changes the size of the input layers accordingly but the following layers still expect (128,128,filters) inputs.
Is there a way to change all of the inputs values at once ?

Create a new model, exactly the same, except for new input shape; and tranfer weights:
newModel.set_weights(oldModel.get_weights())
If anything goes wrong, then it might not be fully convolutional (ex: contains a Flatten layer).

Keras Sequential model input layer

When creating a Sequential model in Keras, I understand you provide the input shape in the first layer. Does this input shape then make an implicit input layer?
For example, the model below explicitly specifies 2 Dense layers, but is this actually a model with 3 layers consisting of one input layer implied by the input shape, one hidden dense layer with 32 neurons, and then one output layer with 10 possible outputs?
model = Sequential([
Dense(32, input_shape=(784,)),
Activation('relu'),
Dense(10),
Activation('softmax'),
])

Well, it actually is an implicit input layer indeed, i.e. your model is an example of a "good old" neural net with three layers - input, hidden, and output. This is more explicitly visible in the Keras Functional API (check the example in the docs), in which your model would be written as:
inputs = Input(shape=(784,)) # input layer
x = Dense(32, activation='relu')(inputs) # hidden layer
outputs = Dense(10, activation='softmax')(x) # output layer
model = Model(inputs, outputs)
Actually, this implicit input layer is the reason why you have to include an input_shape argument only in the first (explicit) layer of the model in the Sequential API - in subsequent layers, the input shape is inferred from the output of the previous ones (see the comments in the source code of core.py).
You may also find the documentation on tf.contrib.keras.layers.Input enlightening.

It depends on your perspective :-)
Rewriting your code in line with more recent Keras tutorial examples, you would probably use:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax')
...which makes it much more explicit that you only have 2 Keras layers. And this is exactly what you do have (in Keras, at least) because the "input layer" is not really a (Keras) layer at all: it's only a place to store a tensor, so it may as well be a tensor itself.
Each Keras layer is a transformation that outputs a tensor, possibly of a different size/shape to the input. So while there are 3 identifiable tensors here (input, outputs of the two layers), there are only 2 transformations involved corresponding to the 2 Keras layers.
On the other hand, graphically, you might represent this network with 3 (graphical) layers of nodes, and two sets of lines connecting the layers of nodes. Graphically, it's a 3-layer network. But "layers" in this graphical notation are bunches of circles that sit on a page doing nothing, whereas a layers in Keras transform tensors and do actual work for you. Personally, I would get used to the Keras perspective :-)
Note finally that for fun and/or simplicity, I substituted input_dim=784 for input_shape=(784,) to avoid the syntax that Python uses to both confuse newcomers and create a 1-D tuple: (<value>,).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.