How to vectorize LSTMs? - python

In particular, I'm confused about what it means for an LSTM layer to have (say) 50 cells. Consider the following LSTM block from this awesome blog post:
Say my input xt is a (20,) vector and the hidden layer ht is a (50,) vector. Given that the cell state Ct undergoes only point-wise operations (point-wise tanh and *) before becoming the new hidden state, I gather that Ct.shape = ht.shape = (50,). Now the forget gate looks at the input concatenated with the hidden layer, which would be a (20+50,) = (70,) vector, which means the forget gate must have a weight matrix of shape (50, 70), such that dot(W, [xt, ht]).shape = (50,).
So my question at this point is that, am I looking at a LSTM block with 50 cells when Ct.shape = (50,)? Or am I misunderstanding what it means for a LSTM layer to have 50 cells?

I understand what you are getting confused with. So basically, the black line connecting the two boxes at the top which represents the cell state is actually a set of very small 50 lines grouped together. These get multiplied point wise with the output of the forget gate which has an output consisting of 50 values. These 50 values multiply with the cell state point wise.

Related

How point wise multiplication and addition takes place in LSTM?

timesteps=4, features=2 and LSTM(units=3)
Input to lstm will be (batch_size, timesteps, features) dimensions of the hidden State and cell state will be (None, unit in LSTM). When lstm take the input at t1 will get concatenated with the hidden State at features axis. After that it will pass through sigmoid and tanh. Now my main confusion is "how will it do the point Wise operation(addition or multiplication) with the cell state".
How I look at this is shown in the attached figure. If there's anything which I am taking wrong kindly make the correction. Thanks in advance to everyone.

Tensorflow CNN for different input size

I'm trying to make conv network for image regression.
As shown in below, one image [224 x 224] has one GT value {x}.
It's easy to make train [224 x 224] and valid/test with [224 x 224] images.
However, I'd like to apply CNN for different image sizes.
For example, [224 x 229] image, I want to get 5 regression values 'at once'.
Simply, I can do that by just sliding windows of [224 x 224] x 5 times, but apparently it is too slow.
I think using conv for different image size is possible. But FCL is not.
If I change image size to [455 x 256]
lhs shape= [4608,1024] rhs shape= [2048,1024]
error occurred. Is there any way to handle it?
Fully connected layers have a fixed size input. Thus, changing the input size will cause a wrong-size error.
One way to tackle this problem, and allow for different image sizes is to use a fully convolutional network.
An example with easy numbers:
Assuming for example the conv layer's output is of size 16X16, you can create a "classifier layer" of size 4x4 with stride 4, that would output for each of the 4 4x4 squares comprising the 16x16 feature map, a single value per dimension. Such filter would be of size 4x4xn_dim, in your case n_dim will be 5, and the final output would be of size 4x4x5, corresponding to 5 outputs (one for each regression value) for each 4x4 square.
You will notice you can play with the shape of the last conv filter to obtain different sizes for the final output, corresponding to different parts of the input image, but really, looking at all of it.
You can work out the numbers for your own example.
You probably would like to read about basic methods for semantic segmentaion.
Also see basic fully conv nets.

Stuck understanding ResNet's Identity block and Convolutional blocks

I'm learning Residual Networks (ResNet50) from Andrew Ng coursera lectures. I understand that one of the main reasons why ResNets work is that they can learn identity function and that's why adding more and more layers in network does not hurt the performance of the network.
Now as described in lectures, there are two type of blocks are used in ResNets: 1) Identity block and Convolutional block.
Identity Block is used when there is no change in input and output dimensions. Convolutional block is almost same as identity block but there is a convolutional layer in short-cut path to just change the dimension such that the dimension of input and output matches.
Here is identity block:
and here is convolutional block:
Now in implementation of convolutional block (2nd image), First block (i.e. conv2d --> BatchNorm --> ReLu is implemented with 1x1 convolution and stride > 1.
# First component of main path
X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', padding = 'valid', kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
X = Activation('relu')(X)
I don't understand the reason behind keeping stride > 1 with window size 1. Isn't it just data loss? We are just considering alternate pixels in this case.
What should be the possible reason for such hyperparameter selection? Any intuitive explanation will help! Thanks.
I don't understand the reason behind keeping stride > 1 with window
size 1. Isn't it just data loss?
Please refer the section on Deeper Bottleneck Architectures in the resnet paper. Also, Figure 5.
https://arxiv.org/pdf/1512.03385.pdf
1 x 1 convolutions are typically used to increase or decrease the dimensionality along the filter dimension. So, in the bottleneck architecture the first 1 x 1 layer reduces the dimensions so that the 3 x 3 layer needs to handle smaller input/output dimensions. Then the final 1 x 1 layer increases the filter dimensions again.
It's done to save on computation/training time.
From the paper,
"Because of concerns on the training time that we can afford, we modify the building block as a bottleneck design".
I believe you might have answered your own question. The convolutional block is used whenever you need to change the dimension in order for the output and input dimensions to match. That being said, how do you change the dimension of a certain volume using convolutions? Well, you change the stride.
For any given convolution operation, assuming a square input, the dimension of the output volume can be obtained through the formula (n+2p-f)/s +1, where n is the input dimension, p is your zero-padding, f the filter dimension and s is the stride. By increasing the stride you're effectively reducing the dimension of your shortcut's output volume, and thus, it can be used in such a way as to make sure that the dimensions of your shortcut and lower paths will match in order for the final sum to be performed.
Why is it >1 then? Well, if you didn't need a stride larger than one, you wouldn't be needing a dimension alteration in the first place and therefore would be using the identity block instead.

How to calculate output dimensions in CNN if you specify number of outputs

I am having trouble figuring out what the dimensions of each CNN layer is.
Let's say my input is a vector which I then projected onto a 4x4x256 matrix using a fully-connected layer as so...
zP = slim.fully_connected(
z,
4*4*256,
normalizer_fn=slim.batch_norm,
activation_fn=tf.nn.relu,
scope='g_project',
weights_initializer=initializer
)
# Layer is reshaped to a 4x4x256 mapping.
zCon = tf.reshape(zP,[-1,4,4,256])
Where z was my original vector. I then take this 4x4x256 matrix and feed it into a CNN...
gen1 = slim.convolution2d_transpose(
zCon,
num_outputs=64,
kernel_size=[5,5],
stride=[2,2],
padding="SAME",
normalizer_fn=slim.batch_norm,
activation_fn=tf.nn.relu,
scope='g_conv1',
weights_initializer=initializer
)
As you can see I used a convolutional 2d transpose and I specified the output as 64, with a stride of 2 and a filter size of 5. This means that I know one of my dimension will be 64, however I do not know what the other 2 dimensions will be and I do not know how to calculate it.
I tried using the following formula but it is not working out for me...
How can I calculate the remaining dimensions?
The formula you have written is for the Convolution operation, since you need to calculate for the transposed convolution where the shapes are inverse of convolution, the formula can be derived from the above equation by re-arranging the terms as:
W = (Out-1)*S + F - 2P
W is your actual output and Out is your actual input to the transpose convolution.

How to interpret clearly the meaning of the units parameter in Keras?

I am wondering how LSTM work in Keras. In this tutorial for example, as in many others, you can find something like this :
model.add(LSTM(4, input_shape=(1, look_back)))
What does the "4" mean. Is it the number of neuron in the layer. By neuron, I mean something that for each instance gives a single output ?
Actually, I found this brillant discussion but wasn't really convinced by the explanation mentioned in the reference given.
On the scheme, one can see the num_unitsillustrated and I think I am not wrong in saying that each of this unit is a very atomic LSTM unit (i.e. the 4 gates). However, how these units are connected ? If I am right (but not sure), x_(t-1)is of size nb_features, so each feature would be an input of a unit and num_unit must be equal to nb_features right ?
Now, let's talk about keras. I have read this post and the accepted answer and get trouble. Indeed, the answer says :
Basically, the shape is like (batch_size, timespan, input_dim), where input_dim can be different from the unit
In which case ? I am in trouble with the previous reference...
Moreover, it says,
LSTM in Keras only define exactly one LSTM block, whose cells is of unit-length.
Okay, but how do I define a full LSTM layer ? Is it the input_shape that implicitely create as many blocks as the number of time_steps (which, according to me is the first parameter of input_shape parameter in my piece of code ?
Thanks for lighting me
EDIT : would it also be possible to detail clearly how to reshape data of, say, size (n_samples, n_features) for a stateful LSTM model ? How to deal with time_steps and batch_size ?
First, units in LSTM is NOT the number of time_steps.
Each LSTM cell(present at a given time_step) takes in input x and forms a hidden state vector a, the length of this hidden unit vector is what is called the units in LSTM(Keras).
You should keep in mind that there is only one RNN cell created by the code
keras.layers.LSTM(units, activation='tanh', …… )
and RNN operations are repeated by Tx times by the class itself.
I've linked this to help you understand it better in with a very simple code.
You can (sort of) think of it exactly as you think of fully connected layers. Units are neurons.
The dimension of the output is the number of neurons, as with most of the well known layer types.
The difference is that in LSTMs, these neurons will not be completely independent of each other, they will intercommunicate due to the mathematical operations lying under the cover.
Before going further, it might be interesting to take a look at this very complete explanation about LSTMs, its inputs/outputs and the usage of stative = true/false: Understanding Keras LSTMs. Notice that your input shape should be input_shape=(look_back, 1). The input shape goes for (time_steps, features).
While this is a series of fully connected layers:
hidden layer 1: 4 units
hidden layer 2: 4 units
output layer: 1 unit
This is a series of LSTM layers:
Where input_shape = (batch_size, arbitrary_steps, 3)
Each LSTM layer will keep reusing the same units/neurons over and over until all the arbitrary timesteps in the input are processed.
The output will have shape:
(batch, arbitrary_steps, units) if return_sequences=True.
(batch, units) if return_sequences=False.
The memory states will have a size of units.
The inputs processed from the last step will have size of units.
To be really precise, there will be two groups of units, one working on the raw inputs, the other working on already processed inputs coming from the last step. Due to the internal structure, each group will have a number of parameters 4 times bigger than the number of units (this 4 is not related to the image, it's fixed).
Flow:
Takes an input with n steps and 3 features
Layer 1:
For each time step in the inputs:
Uses 4 units on the inputs to get a size 4 result
Uses 4 recurrent units on the outputs of the previous step
Outputs the last (return_sequences=False) or all (return_sequences = True) steps
output features = 4
Layer 2:
Same as layer 1
Layer 3:
For each time step in the inputs:
Uses 1 unit on the inputs to get a size 1 result
Uses 1 unit on the outputs of the previous step
Outputs the last (return_sequences=False) or all (return_sequences = True) steps
The number of units is the size (length) of the internal vector states, h and c of the LSTM. That is no matter the shape of the input, it is upscaled (by a dense transformation) by the various kernels for the i, f, and o gates. The details of how the resulting latent features are transformed into h and c are described in the linked post. In your example, the input shape of data
(batch_size, timesteps, input_dim)
will be transformed to
(batch_size, timesteps, 4)
if return_sequences is true, otherwise only the last h will be emmited making it (batch_size, 4). I would recommend using a much higher latent dimension, perhaps 128 or 256 for most problems.
I would put it this way - there are 4 LSTM "neurons" or "units", each with 1 Cell State and 1 Hidden State for each timestep they process. So for an input of 1 timestep processing , you will have 4 Cell States, and 4 Hidden States and 4 Outputs.
Actually the correct way to say this is - for one timestep sized input you 1 Cell State (a vector of size 4) and 1 Hidden State (a vector of size 4) and 1 Output (a vector of size 4).
So if you feed in a timeseries with 20 steps, you will have 20 (intermediate) Cell States, each of size 4. That is because the inputs in LSTM are processed sequentially, 1 after the other. Similarly you will have 20 Hidden States, each of size 4.
Usually, your output will be the output of the LAST step (a vector of size 4). However in case you want the outputs of each intermediate step(remember you have 20 timesteps to process), you can make return_sequences = TRUE. In which case you will have 20 , 4 sized vectors each telling you what was the output when each of those steps got processed as those 20 inputs came one after the other.
In case when you put return_states = TRUE , you get the last Hidden State of size = 4 and last Cell State of size 4.

Categories