Tensorflow Resnet example: Is the bottleneck size wrong?

The following code is from the Tensorflow Resnet example at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/resnet.py:
# Create the bottleneck groups, each of which contains `num_blocks`
# bottleneck groups.
for group_i, group in enumerate(groups):
  for block_i in range(group.num_blocks):
    name = 'group_%d/block_%d' % (group_i, block_i)

    # 1x1 convolution responsible for reducing dimension
    with tf.variable_scope(name + '/conv_in'):
      conv = tf.layers.conv2d(
          net,
          filters=group.num_filters,
          kernel_size=1,
          padding='valid',
          activation=tf.nn.relu)
      conv = tf.layers.batch_normalization(conv, training=training)

    with tf.variable_scope(name + '/conv_bottleneck'):
      conv = tf.layers.conv2d(
          conv,
          filters=group.bottleneck_size,
          kernel_size=3,
          padding='same',
          activation=tf.nn.relu)
      conv = tf.layers.batch_normalization(conv, training=training)

    # 1x1 convolution responsible for restoring dimension
    with tf.variable_scope(name + '/conv_out'):
      input_dim = net.get_shape()[-1].value
      conv = tf.layers.conv2d(
          conv,
          filters=input_dim,
          kernel_size=1,
          padding='valid',
          activation=tf.nn.relu)
      conv = tf.layers.batch_normalization(conv, training=training)

    # shortcut connections that turn the network into its counterpart
    # residual function (identity shortcut)
    net = conv + net
This piece of code runs once for each block, with a given output and bottleneck dimension. The groups of blocks are defined as:
# Configurations for each bottleneck group.
BottleneckGroup = namedtuple('BottleneckGroup',
                             ['num_blocks', 'num_filters', 'bottleneck_size'])
groups = [
    BottleneckGroup(3, 128, 32), BottleneckGroup(3, 256, 64),
    BottleneckGroup(3, 512, 128), BottleneckGroup(3, 1024, 256)
]
In common practice, as far as I know, the so-called bottleneck blocks of ResNets first reduce the input channel count with a 1x1 convolution, apply the more expensive 3x3 convolution at that reduced channel count, and then restore the original channel count with a final 1x1 convolution, as given in the original ResNet paper.
But in the Tensorflow example, the first 1x1 layer uses the block output size (group.num_filters), not the bottleneck size, so no meaningful channel reduction happens. Is the Tensorflow example really wrong here, or am I missing something?
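For comparison, here is a sketch of the ordering I would expect from the paper, reusing the example's own names (net, group, name, training); only the filter count of the first 1x1 convolution changes:
# Sketch of the canonical bottleneck ordering, not the example's actual code.
with tf.variable_scope(name + '/conv_in'):
    # 1x1 convolution that *reduces* channels to the bottleneck size
    conv = tf.layers.conv2d(net, filters=group.bottleneck_size, kernel_size=1,
                            padding='valid', activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)
with tf.variable_scope(name + '/conv_bottleneck'):
    # 3x3 convolution performed at the reduced channel count
    conv = tf.layers.conv2d(conv, filters=group.bottleneck_size, kernel_size=3,
                            padding='same', activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)
with tf.variable_scope(name + '/conv_out'):
    # 1x1 convolution that restores the input channel count
    input_dim = net.get_shape()[-1].value
    conv = tf.layers.conv2d(conv, filters=input_dim, kernel_size=1,
                            padding='valid', activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)
net = conv + net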

PyTorch CNN linear layer shape after conv2d

I was trying to learn PyTorch and came across a tutorial where a CNN is defined like below,
import torch
from torch.nn import (Module, Sequential, Conv2d, BatchNorm2d, ReLU,
                      MaxPool2d, Linear)

class Net(Module):
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
        )

        self.linear_layers = Sequential(
            Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
I understood how the cnn_layers are made. After the cnn_layers, the data should be flattened and given to linear_layers.
I don't understand how the number of input features to Linear is 4*7*7. I understand that 4 is the number of output channels from the last Conv2d layer.
How does the 7*7 come into the picture? Do stride or padding play any role in that?
Input image shape is [1, 28, 28]
The Conv2d layers have a kernel size of 3 and a stride and padding of 1, which means they don't change the spatial size of the image. The two MaxPool2d layers each reduce the spatial dimensions from (H, W) to (H/2, W/2). So, for each batch, the output of the last convolution, which has 4 output channels, has shape (batch_size, 4, H/4, W/4). In the forward pass the feature tensor is flattened by x = x.view(x.size(0), -1), which turns it into shape (batch_size, 4*(H/4)*(W/4)) = (batch_size, H*W/4). With H = W = 28, the linear layer takes inputs of shape (batch_size, 196), i.e. 4*7*7.
Actually, in the 2D convolution layers the features [values] live in a matrix [2D tensor], while, as usual, the network ends with a fully connected layer followed by the logits layer, so the features in the fully connected layer live in a vector [1D tensor]. Therefore we have to map each feature [value] in the last matrix into the fully connected layer that follows.
In PyTorch, the fully connected layer is implemented by the Linear class, and its first parameter is the number of input features. In this case:
input_image    : (28, 28, 1)
after_Conv2d_1 : (28, 28, 4) <- because of the padding; with padding = 0 it would be (26, 26, 4)
after_maxPool_1: (14, 14, 4) <- due to the stride of 2
after_Conv2d_2 : (14, 14, 4) <- because this is "same" padding
after_maxPool_2: (7, 7, 4)
In the end, the total number of features before the fully connected layer is 4*7*7.
This also shows why we usually use an odd number for the kernel size and start from images with an even number of pixels.
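If you'd rather not do this arithmetic by hand, a common sanity check is to push a dummy tensor through the convolutional part and read off the shape. A minimal sketch, assuming the Net class from the question:
import torch

# Dummy batch of one 1-channel 28x28 image, matching the tutorial's input.
dummy = torch.zeros(1, 1, 28, 28)

net = Net()
out = net.cnn_layers(dummy)
print(out.shape)              # torch.Size([1, 4, 7, 7])
print(out.view(1, -1).shape)  # torch.Size([1, 196]) -> in_features = 4*7*7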

TensorFlow Keras MaxPool2D breaks LSTM with CTC loss?

I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I figured out that if I use just a CNN layer that keeps the output size equal to the input size, plus the LSTMs and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling for about 3 days with this problem, I posted the above question here on StackOverflow. About 2 hours after posting the question, I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation which, being relatively sparse, can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
    y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
tf.nn.ctc_loss(
    labels, logits, label_length, logit_length, logits_time_major=True,
    unique=None, blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, a numpy array) of the lengths (i.e. numbers of timesteps) of the sequences entering the RNN (in my case an LSTM) as input.
I was originally making the mistake of treating the required length as the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer creates a tensor of different dimensions for the RNN's input, the CTC loss function crashes.
Now fit runs without crashing.
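Concretely, for the pooled model above the RNN sees 64 timesteps, not the image width of 128, so input_length has to say 64. A minimal sketch of building the arguments (the names batch_size, labels, y_true and y_pred here are placeholders, not my actual training code):
import numpy as np
from tensorflow.keras import backend as K

batch_size = 16
# After MaxPool2D the LSTMs see 64 timesteps per sample, so the lengths
# passed to ctc_batch_cost must be 64, not 128.
input_length = np.full((batch_size, 1), 64, dtype='int64')
# label_length holds the true length of each label sequence in y_true.
label_length = np.array([[len(lbl)] for lbl in labels], dtype='int64')
loss = K.ctc_batch_cost(y_true, y_pred, input_length, label_length)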

How to convert a tensorflow model to a pytorch model?

I'm new to pytorch. Here's the architecture of a tensorflow model, and I'd like to convert it into a pytorch model.
I have done most of the code but am confused about a few places.
1) In tensorflow, the Conv2D function takes the number of filters as an input. However, in pytorch, the function takes the numbers of input channels and output channels as inputs. So how do I find the equivalent numbers of input and output channels, given the number of filters?
2) In tensorflow, the dense layer has a parameter for the number of nodes. However, in pytorch, the same layer has 2 different inputs (the size of the input and the size of the output), so how do I determine them based on the number of nodes?
Here's the tensorflow code.
from keras.utils import to_categorical
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Here's my code:
import torch
import torch.nn as nn
import torch.nn.functional as F

# The network should inherit from nn.Module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define 2D convolution layers
        # 3: input channels, 32: output channels, 5: kernel size, 1: stride
        self.conv1 = nn.Conv2d(3, 32, 5, 1)  # 3 input channels because all images are coloured
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.conv4 = nn.Conv2d(128, 256, 3, 1)
        # Dropout 'filters out' part of the input with the given probability (sets it to zero)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # Fully connected layers: input size, output size
        self.fc1 = nn.Linear(36864, 128)
        self.fc2 = nn.Linear(128, 10)

    # forward() links all the layers together
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Thanks in advance!
1) In pytorch, we take the numbers of input channels and output channels as inputs. In your first layer, the input channels will be the number of color channels in your image. After that, the input channels are always the same as the output channels of the previous layer (output channels are specified by the filters parameter in Tensorflow).
2) Pytorch is slightly annoying in that, when flattening your conv outputs, you have to calculate the resulting shape yourself. You can either use the equation Out = (W - F + 2P)/S + 1 (output size from input width W, filter size F, padding P and stride S), or write a shape-calculating function that passes a dummy image through the conv part of the network. That number is your input size argument; the output size argument is just the number of nodes you want in the next fully connected layer.
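As a sketch of the dummy-image approach with the Net class above (the 3x64x64 input size here is an assumption chosen so the result matches fc1; substitute your real image size):
import torch
import torch.nn.functional as F

def flattened_size(net, input_shape):
    # Run one dummy image through the conv part of the network and count
    # the features that will reach the first fully connected layer.
    with torch.no_grad():
        x = torch.zeros(1, *input_shape)
        x = F.max_pool2d(F.relu(net.conv2(F.relu(net.conv1(x)))), 2)
        x = F.max_pool2d(F.relu(net.conv4(F.relu(net.conv3(x)))), 2)
        return x.numel()

print(flattened_size(Net(), (3, 64, 64)))  # 36864, matching nn.Linear(36864, 128)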

ValueError: Input 0 is incompatible with layer lstm_14: expected ndim=3, found ndim=2

I am building a cnn_rnn network for image classification. I am getting an error while running the following python code in my jupyter notebook.
# model
model1 = Sequential()
# first convolutional layer
model1.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(160, 120, 3)))
# second convolutional layer
model1.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
#Adding a pooling Layer
model1.add(MaxPooling2D(pool_size=(3, 3)))
#Adding dropouts
model1.add(Dropout(0.25))
# flatten and put a fully connected layer
model1.add(Flatten())
model1.add(Dense(32, activation='relu')) # fully connected
#Adding RNN N/W
model1.add(LSTM(32, return_sequences=True))
model1.add(TimeDistributed(Dense(5, activation='softmax')))
I also tried adding input_shape=(160, 120, 3) as a parameter to the LSTM function, but to no avail. Please help!
P.S.: I also tried using GRU instead of LSTM, but got the same error.
Update: please note the model.summary() results.
[model.summary() screenshot omitted]
Your error is due to your use of Flatten and Dense BEFORE the LSTM layer.
LSTM layers require input in the shape (batch_size x length x feature depth) (or some variant), whereas your Flatten changes the Conv2D output from (B x H x W x F) to (B x H*W*F), if that makes sense. If you want to use this architecture, I'd recommend using a Reshape layer to flatten only the dimensions you want, and a Conv1D with a kernel size of 1 (equivalent to a fully-connected layer) before the LSTM layer.
Or, if you want to use this exact code, add this before your LSTM layer and it should work:
model1.add(Reshape(target_shape=(1, 54*40, 32)))
It's 54 and 40 due to a pool_size of (3,3).
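A sketch of the Reshape plus kernel-size-1 Conv1D suggestion instead (the spatial sizes in the comments assume the question's default 'valid' padding on a 160x120x3 input):
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Dropout, Reshape, Conv1D,
                          LSTM, TimeDistributed, Dense)

model1 = Sequential()
model1.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                  input_shape=(160, 120, 3)))                  # -> (158, 118, 32)
model1.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))  # -> (156, 116, 64)
model1.add(MaxPooling2D(pool_size=(3, 3)))                     # -> (52, 38, 64)
model1.add(Dropout(0.25))
# Keep the first spatial axis as the time axis; merge the rest into features.
model1.add(Reshape((52, 38 * 64)))                             # -> (52, 2432)
# A kernel-size-1 Conv1D acts as a per-timestep fully connected layer.
model1.add(Conv1D(32, kernel_size=1, activation='relu'))       # -> (52, 32)
model1.add(LSTM(32, return_sequences=True))
model1.add(TimeDistributed(Dense(5, activation='softmax')))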

Tensorflow softmax function returning one-hot encoded array

I have this piece of code which computes the softmax function on the output predictions from my convnet.
pred = conv_net(x, weights, biases, keep_prob, batchSize)
softmax = tf.nn.softmax(pred)
My prediction array is of shape [batch_size, number_of_classes] = [128,6]
An example row from this array is...
[-2.69500896e+08 4.84445800e+07 1.99136800e+08 6.12981480e+07
2.33545440e+08 1.19338824e+08]
After running the softmax function I get a result that is a one-hot encoded array...
[ 0 0 0 0 1 0 ]
I would think this is because I am taking the exponential of very large values. I was just wondering if I am doing something wrong or if I should scale my values before applying the softmax function. My loss function is
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
and I am minimizing this with the Adam Optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
My network is able to learn just fine.
My reasoning for applying the softmax function is to obtain the probability values for each class on the test data.
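For what it's worth, softmax genuinely saturates at this scale even in a numerically stable implementation; when logits differ by tens of millions, every term but the maximum underflows to zero. A quick check with the example row above:
import numpy as np

logits = np.array([-2.695e8, 4.844e7, 1.991e8, 6.130e7, 2.335e8, 1.193e8])
shifted = logits - logits.max()          # the standard stable-softmax shift
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs)  # ~[0. 0. 0. 0. 1. 0.]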
EDIT
It seems that, to fix these very large values going into my softmax function, I should add normalization and regularization. I have added the design code for my convnet below; any help on where to place regularization and normalization would be great.
# Create model
def conv_net(x, weights, biases, dropout, batchSize):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 150, 200, 1])
    x = tf.random_crop(x, size=[batchSize, 128, 192, 1])
    # Convolution Layer 1
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)
    # Convolution Layer 2
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)
    # Convolution Layer 3
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max Pooling (down-sampling)
    conv3 = maxpool2d(conv3, k=2)
    # Convolution Layer 4
    conv4 = conv2d(conv3, weights['wc4'], biases['bc4'])
    # Max Pooling (down-sampling)
    conv4 = maxpool2d(conv4, k=2)
    # Convolution Layer 5
    conv5 = conv2d(conv4, weights['wc5'], biases['bc5'])
    # Max Pooling (down-sampling)
    conv5 = maxpool2d(conv5, k=2)
    # Fully connected layer
    # Reshape conv5 output to fit fully connected layer input
    fc1 = tf.reshape(conv5, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
You have a serious need for some regularization. Your outputs are on the order of 10^8, whereas we usually deal with much smaller numbers. If you add more regularization, your classifier won't be so certain about everything and won't give outputs that look like one-hot encodings.
The one-hot encoded array problem can happen for multiple reasons:
The weight initialization is too big (try initializing the weights at a smaller scale, e.g. stddev=1e-2).
Add regularization to your loss function for all the weights, e.g.:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_conv)
    + beta * tf.nn.l2_loss(W_conv1) + beta * tf.nn.l2_loss(W_conv2)
    + beta * tf.nn.l2_loss(W_fc1) + beta * tf.nn.l2_loss(W_fc2))
Add dropout.
Combine dropout with some L2/L1 regularization technique.
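Putting the first two suggestions together in the question's own setup, a minimal sketch (beta and the loop over the question's weights dict are illustrative assumptions, not its actual training code):
import tensorflow as tf

beta = 1e-3  # regularization strength; tune on a validation set

# Suggestion 1: initialize weight tensors at a smaller scale, e.g.
wc1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=1e-2))

# Suggestion 2: add an L2 penalty over every tensor in the `weights` dict
# to the cross-entropy cost.
l2 = beta * tf.add_n([tf.nn.l2_loss(w) for w in weights.values()])
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)) + l2
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)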
