Using tf.image.pyramids during training to create downsampled feature maps

I'm attempting to use tf.image.pyramids.downsample from tensorflow_graphics in every Down (encoding) block of an auto-encoder model; the pyramid is then sent as a skip connection to the Up (decoder) blocks.
class DownConv(Model):
    n = 0

    def __init__(self, kernel_size, filters, initializer, n_lower_levels):
        super(DownConv, self).__init__(name=f"DownConv_{DownConv.n}")
        DownConv.n += 1
        self.pad = tf.constant([[0, 0], [kernel_size // 2, kernel_size // 2], [kernel_size // 2, kernel_size // 2], [0, 0]])
        self.conv = L.Conv2D(filters, kernel_size, strides=2, kernel_initializer=initializer)
        self.pyramid = None
        self.filters = filters
        self.n_lower_levels = n_lower_levels

    def call(self, input_t):
        logger.debug(f"Received {input_t.shape} in {self.name}")
        x = tf.pad(input_t, self.pad, "SYMMETRIC")
        x = self.conv(x)
        p = tf.Variable(x)
        self.pyramid = downsample(p, self.n_lower_levels)
        pyramods = ", ".join([str(p.shape) for p in self.pyramid])
        logger.debug(f"Generated pyramids: {pyramods}")
        return tf.nn.selu(x)
However, the logging shows that this doesn't work: only the very first level of the pyramid (the first step of the downsample) retains its channel dimension; the rest have None for channels.
self.pyramid[0].shape yields the correct (None, 256, 256, 64), but self.pyramid[1] yields (None, 256, 256, None) during a training step. Note that the batch dimension (axis=0) is correctly None here; that is normal TensorFlow behavior in error logs.
Due to this issue, the training step produces an error in my Up blocks when it tries to concatenate the two feature maps:
ValueError: The channel dimension of the inputs should be defined. The input_shape received is (None, 32, 32, None), where axis -1 (0-based) is the channel dimension, which found to be `None`.
Call arguments received:
• input_t=tf.Tensor(shape=(None, 32, 32, 256), dtype=float32)

Related

Dimensionality problem with PyTorch Conv layers

I'm trying to train a neural network in PyTorch with some input signals. The layers are Conv1d. The shape of my input is [100, 10], meaning 100 signals each of length 10.
But when I execute the training, I have this error:
Given groups=1, weight of size [100, 10, 1], expected input[1, 1, 10] to have 10 channels, but got 1 channels instead
config = [10, 100, 100, 100, 100, 100, 100, 100]
batch_size = 1
epochs = 10
learning_rate = 0.001
kernel_size = 1
class NeuralNet(nn.Module):
    def __init__(self, config, kernel_size=1):
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([nn.Sequential(
            nn.Conv1d(self.config[i], self.config[i + 1], kernel_size=kernel_size),
            nn.ReLU())
            for i in range(len(self.config) - 1)])
        self.last_layer = nn.Linear(self.config[-1], 3)
        self.layers.append(nn.Flatten())
        self.layers.append(self.last_layer)

    def forward(self, x):
        for i, l in enumerate(self.layers):
            x = l(x)
        return x
def loader(train_data, batch_size):
    inps = torch.tensor(train_data[0])
    tgts = torch.tensor(train_data[1])
    inps = torch.unsqueeze(inps, 1)
    dataset = TensorDataset(inps, tgts)
    train_dataloader = DataLoader(dataset, batch_size=batch_size)
    return train_dataloader
At first, my code was without the unsqueeze(inps) line and I had the exact same error; I then added this line, thinking that I needed an input of size (num_examples, num_channels, length_of_signal), but it didn't resolve the problem at all.
Thank you in advance for your answers.
nn.Conv1d expects input of shape (batch_size, num_of_channels, seq_length). Its parameters let you directly set the number of output channels (out_channels) and change the output length using, for example, stride. For a conv1d layer to work correctly it must know the number of input channels (in_channels), which does not hold for the first convolution: input.shape == (batch_size, 1, 10), so num_of_channels = 1, while the convolution in self.layers[0] expects this value to equal 10 (because in_channels is set by self.config[0] and self.config[0] == 10). Hence, to fix this, prepend one more value to config:
config = [10, 100, 100, 100, 100, 100, 100, 100] # as in snippet above
config = [1] + config
At this point the convolutions should work fine, but there is another obstacle in self.layers -- the linear layer at the end. With a kernel_size of 1, the batch after the final convolution has shape (batch_size, 100, 10), and after flattening (batch_size, 100 * 10), while last_layer expects input of shape (batch_size, 100). So, if the length of the sequence after the final conv layer is known (which is certainly the case here: kernel_size of 1 with the default stride of 1 and default padding of 0 keeps the length the same), last_layer should be defined as:
self.last_layer = nn.Linear(final_length * self.config[-1], 3)
and in the snippet above final_length can be set to 10 (since the conditions in the previous brackets are satisfied). The original answer illustrated how conv1d transforms shapes with an animated example (batch_size equal to 1); the short check below traces the same shapes.
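Putting both fixes together, here is that check as code (a sketch: it reuses the NeuralNet class from the question and assumes last_layer has been redefined as nn.Linear(10 * self.config[-1], 3), as suggested above):

import torch

# assumes NeuralNet from the question, with the corrected last_layer
config = [1] + [10, 100, 100, 100, 100, 100, 100, 100]  # in_channels = 1 prepended
model = NeuralNet(config, kernel_size=1)
x = torch.randn(1, 1, 10)    # (batch_size, num_of_channels, seq_length)
print(model(x).shape)        # torch.Size([1, 3])

# Shape trace with kernel_size = 1 (length never changes):
# (1, 1, 10) -> conv stack -> (1, 100, 10) -> Flatten -> (1, 1000) -> Linear -> (1, 3)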

Pytorch Conv1D gives different size to ConvTranspose1d

I am trying to build a basic/shallow CNN auto-encoder for 1D time series data in pytorch/pytorch-lightning.
Currently, my encoding block is:
class encodingBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1d_1 = nn.Conv1d(1, 64, kernel_size=32)
        self.relu = nn.ReLU()
        self.batchnorm = nn.BatchNorm1d(64)
        self.maxpool = nn.MaxPool1d(kernel_size=2, stride=2, return_indices=True)
        self.fc = nn.Linear(64, 4)

    def forward(self, x):
        cnn_out1 = self.conv1d_1(x)
        norm_out1 = self.batchnorm(cnn_out1)
        relu_out1 = self.relu(norm_out1)
        maxpool_out, indices = self.maxpool(relu_out1)
        gap_out = torch.mean(maxpool_out, dim=2)
        fc_out = self.relu(self.fc(gap_out))
        return fc_out, indices
And my decoding block is:
class decodingBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.Tconv1d_1 = nn.ConvTranspose1d(64, 1, kernel_size=32, output_padding=1)
        self.relu = nn.ReLU()
        self.batchnorm = nn.BatchNorm1d(1)
        self.maxunpool = nn.MaxUnpool1d(kernel_size=2, stride=2)
        self.upsamp = nn.Upsample(size=59, mode='nearest')
        self.fc = nn.Linear(4, 64)

    def forward(self, x, indices):
        fc_out = self.fc(x)
        relu_out = self.relu(fc_out)
        relu_out = relu_out.unsqueeze(dim=2)
        upsamp_out = self.upsamp(relu_out)
        maxpool_out = self.maxunpool(upsamp_out, indices)
        cnnT_out = self.Tconv1d_1(maxpool_out)
        norm_out = self.batchnorm(cnnT_out)
        relu_out = self.relu(norm_out)
        return relu_out
However, looking at the outputs:
Input size: torch.Size([1, 1, 150])
Conv1D out size: torch.Size([1, 64, 119])
Maxpool out size: torch.Size([1, 64, 59])
Global average pooling out size: torch.Size([1, 64])
Encoder dense out size: torch.Size([1, 4])
...
Decoder input: torch.Size([1, 4])
Decoder dense out size: torch.Size([1, 64])
Unsqueeze out size: torch.Size([1, 64, 1])
Upsample out size: torch.Size([1, 64, 59])
Decoder maxunpool out size: torch.Size([1, 64, 118])
Transpose Conv out size: torch.Size([1, 1, 149])
The outputs from the MaxUnpool1d and ConvTranspose1d layers are not the expected dimension.
I have two questions that I was hoping to get some help on:
Why are the dimensions wrong?
Is there a better way to "reverse" the global average pooling than the upsampling procedure I have used?
1. Regarding input and output shapes:
PyTorch's docs give the explicit formulas relating input and output sizes.
For convolution (Conv1d):
    L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
Similarly for pooling (MaxPool1d):
    L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
For transposed convolution (ConvTranspose1d):
    L_out = (L_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
And for unpooling (MaxUnpool1d):
    L_out = (L_in - 1)*stride - 2*padding + kernel_size
Make sure your padding and output_padding values add up to the proper output shape.
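Plugging the question's numbers into these formulas (a standalone sketch of the arithmetic, not library code) shows exactly where the sizes drift by one:

def conv_out(L, k, s=1, p=0, d=1):                   # Conv1d / MaxPool1d
    return (L + 2 * p - d * (k - 1) - 1) // s + 1

def conv_transpose_out(L, k, s=1, p=0, d=1, op=0):   # ConvTranspose1d
    return (L - 1) * s - 2 * p + d * (k - 1) + op + 1

def unpool_out(L, k, s, p=0):                        # MaxUnpool1d
    return (L - 1) * s - 2 * p + k

print(conv_out(150, k=32))            # 119 -- Conv1d(1, 64, kernel_size=32)
print(conv_out(119, k=2, s=2))        # 59  -- MaxPool1d(2, 2); the odd length 119 drops a sample
print(unpool_out(59, k=2, s=2))       # 118 -- MaxUnpool1d yields 118, not 119
print(conv_transpose_out(118, k=32))  # 149 -- ConvTranspose1d yields 149, not 150

The off-by-one originates at the pooling of an odd length (119) and then propagates; it can be compensated with padding/output_padding, per the note above.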
2. Is there a better way?
Transposed convolution has its faults, as you already noticed. It also tends to produce "checkerboard artifacts".
One solution is to use pixelshuffle: that is, predict for each low-res point twice the number of channels, and then split them into two points with the desired number of features.
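In 1D this amounts to trading channels for length after a regular convolution. nn.PixelShuffle itself is 2D-only, so a minimal hand-rolled sketch (names illustrative) looks like:

import torch
import torch.nn as nn

class PixelShuffle1d(nn.Module):
    """1D analogue of pixel shuffle: fold a factor r out of the channels into the length."""
    def __init__(self, r=2):
        super().__init__()
        self.r = r

    def forward(self, x):
        b, c, l = x.shape
        x = x.view(b, c // self.r, self.r, l)                              # split channels into (C', r)
        return x.permute(0, 1, 3, 2).reshape(b, c // self.r, l * self.r)  # interleave r into the length

# predict twice the channels, then shuffle them into twice the length
up = nn.Sequential(nn.Conv1d(64, 128, kernel_size=3, padding=1), PixelShuffle1d(2))
print(up(torch.randn(1, 64, 59)).shape)   # torch.Size([1, 64, 118])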
Alternatively, you can interpolate using a fixed method from the low resolution to the higher one. Apply regular convolutions to the upsampled vectors.
If you choose this path, you might consider using ResizeRight instead of pytorch's interpolate - it has better handling of edge cases.
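And a minimal sketch of the interpolate-plus-convolution route (sizes taken from the question; ResizeRight would replace F.interpolate in the same spot):

import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleConv1d(nn.Module):
    """Fixed-method upsampling followed by a regular convolution."""
    def __init__(self, in_ch, out_ch, out_len, kernel_size=3):
        super().__init__()
        self.out_len = out_len
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)  # length-preserving

    def forward(self, x):
        x = F.interpolate(x, size=self.out_len, mode='nearest')
        return self.conv(x)

up = UpsampleConv1d(64, 1, out_len=150)
print(up(torch.randn(1, 64, 59)).shape)   # torch.Size([1, 1, 150])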

Issues with the output size of a Many-to-Many CNN-LSTM in PyTorch

I am trying to build a binary temporal image classifier by combining ResNet18 and an LSTM. However, I have never really used RNNs before and have been struggling to get the correct output shape.
I am using a batch size of 128 and a sequence size of 32. The images are 80x80 grayscale images.
The current model is:
class CNNLSTM(nn.Module):
    def __init__(self):
        super(CNNLSTM, self).__init__()
        self.resnet = models.resnet18(pretrained=False)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
        self.resnet.fc = nn.Sequential(nn.Linear(in_features=512, out_features=256, bias=True))
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3)
        self.fc1 = nn.Linear(256, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x_3d):
        # x_3d: torch.Size([128, 32, 1, 80, 80])
        hidden = None
        toret = []
        for t in range(x_3d.size(1)):
            x = self.resnet(x_3d[:, t, :, :, :])
            out, hidden = self.lstm(x.unsqueeze(0), hidden)
            x = self.fc1(out[-1, :, :])
            x = F.relu(x)
            x = self.fc2(x)
            print("x shape: ", x.shape)
            toret.append(x)
        return torch.stack(toret)
This returns a tensor of shape torch.Size([32, 128, 1]) which, as I understand it, means that the nth row represents the nth time step for each element in the batch.
How can I get output of shape 128x1x32 instead?
And is there a better way to do this?
You could permute the dimensions:
a = torch.rand(32, 128, 1)
a = a.permute(1, 2, 0) # these are the indices of the original dimensions
print(a.shape)
>> torch.Size([128, 1, 32])
But you could also set batch_first=True in the LSTM module:
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
With batch_first=True, the LSTM expects its input to have shape (batch_size, seq_len, features) and will output a tensor in the same layout.
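A quick check of the batch_first layout, using the sizes from the question:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
x = torch.randn(128, 32, 256)      # (batch_size, seq_len, features)
out, (h, c) = lstm(x)
print(out.shape)                   # torch.Size([128, 32, 256])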

Why return self.head(x.view(x.size(0), -1)) in the nn.Module for pyTorch reinforcement learning example

I understand that the pole-balancing example requires 2 outputs. See the Reinforcement Learning (DQN) Tutorial.
Here is the output for self.head
print ('x',self.head)
x = Linear(in_features=512, out_features=2, bias=True)
When I run the epochs, the outputs below are produced:
print (self.head(x.view(x.size(0), -1)))
return self.head(x.view(x.size(0), -1))
tensor([[-0.6945, -0.1930]])
tensor([[-0.0195, -0.1452]])
tensor([[-0.0906, -0.1816]])
tensor([[ 0.0631, -0.9051]])
tensor([[-0.0982, -0.5109]])
...
The size of x is:
x = torch.Size([121, 32, 2, 8])
So I am trying to understand what x.view(x.size(0), -1) is doing.
I understand from the comment in the code that it's returning:
Returns tensor([[left0exp,right0exp]...]).
But how does x, which is torch.Size([121, 32, 2, 8]), get reduced to a tensor of size 2?
Is there an alternative way of writing this that makes more sense? What if I had 4 outputs; how would I represent that? Why x.size(0)? Why -1?
So it appears to take self.head with 4 outputs down to 2 outputs. Is that correct?
At the bottom is the class I am referring to:
class DQN(nn.Module):
    def __init__(self, h, w, outputs):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(32)

        # Number of Linear input connections depends on output of conv2d layers
        # and therefore the input image size, so compute it.
        def conv2d_size_out(size, kernel_size=5, stride=2):
            return (size - (kernel_size - 1) - 1) // stride + 1
        convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        linear_input_size = convw * convh * 32
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return self.head(x.view(x.size(0), -1))
x.view(x.size(0), -1) flattens the tensor; this is because the Linear layer only accepts a vector (1d array). To break it down: x.view() reshapes the tensor to the specified shape (more info). x.size(0) returns the 1st dimension of the tensor (the batch size, which should remain constant). The -1 in x.view() is a filler; it stands for the dimensions we don't know, so PyTorch calculates it automatically. For example, if x = torch.tensor([1,2,3,4]), to reshape the tensor to 2x2 you could do x.view(2,2), x.view(2,-1), or x.view(-1,2).
The output shape is not a tensor of shape 2, but of shape (121, 2) (the 121 is the batch size, and the 2 comes from the Linear layer's output). So, to change the output size from 2 to 4, you would have to change the outputs argument in the __init__ function to 4.
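Concretely, with the shape from the question:

import torch

x = torch.randn(121, 32, 2, 8)
flat = x.view(x.size(0), -1)        # keep the batch dimension, flatten the rest
print(flat.shape)                   # torch.Size([121, 512]); 32 * 2 * 8 = 512

A head of nn.Linear(512, 2) then maps this to torch.Size([121, 2]); with outputs=4 it would be nn.Linear(512, 4) and torch.Size([121, 4]).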

Tensorflow conv2d_transpose output_shape

I want to implement a Generative Adversarial Network (GAN) with unfixed input size, i.e. a 4-D tensor (batch_size, None, None, 3).
But when I use conv2d_transpose, there is a parameter output_shape, which must be given the true size after the deconvolution operation.
For example, if the size of batch_img is (64, 32, 32, 128) and w is a weight of shape (3, 3, 64, 128), then after
    deconv = tf.nn.conv2d_transpose(batch_img, w, output_shape=[64, 64, 64, 64], strides=[1, 2, 2, 1], padding='SAME')
I get deconv with size (64, 64, 64, 64); it's fine if I pass the true size as output_shape.
But I want to use an unfixed input size (64, None, None, 128) and get deconv with shape (64, None, None, 64).
This raises the error below.
TypeError: Failed to convert object of type <type 'list'> to Tensor...
So, what can I do to avoid this parameter in deconv? Or is there another way to implement a GAN with unfixed input size?
The output_shape list cannot contain None, because None cannot be converted to a Tensor object; None is only allowed in shapes of tf.placeholder.
For a varying-size output_shape, try -1 instead of None. For example, if you want size (64, None, None, 128), try [64, -1, -1, 128]. I am not exactly sure whether this will work; it worked for me for batch_size, i.e. my first argument was not of fixed size, so I used -1.
However, there is also a high-level API for transposed convolution, tf.layers.conv2d_transpose().
I am sure it will work for you, because it accepts tensors of varying input sizes.
You do not even need to specify the output shape; you just specify the number of output channels and the kernel to be used.
For more details: https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose. I hope this helps.
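A minimal sketch of that approach (TF1-style graph code; shapes chosen to match the question):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, None, None, 128))  # unfixed spatial dims
y = tf.layers.conv2d_transpose(x, filters=64, kernel_size=3, strides=2, padding='same')
print(y.shape)   # (?, ?, ?, 64)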
I ran into this problem too. Using -1, as suggested in the other answer here, doesn't work. Instead, you have to grab the shape of the incoming tensor and construct the output_shape argument. Here's an excerpt from a test I wrote. In this case it's the first dimension that's unknown, but it should work for any combination of known and unknown parameters.
output_shape = [8, 8, 4]  # height, width, channels-out; handle batch size later
filter_shape = (5, 5, 4, 2)  # illustrative value: [k_h, k_w, channels-out, channels-in]; defined elsewhere in the original test
xin = tf.placeholder(dtype=tf.float32, shape=(None, 4, 4, 2), name='input')
filt = tf.placeholder(dtype=tf.float32, shape=filter_shape, name='filter')
## Find the batch size of the input tensor and add it to the front
## of output_shape
dimxin = tf.shape(xin)
ncase = dimxin[0:1]
oshp = tf.concat([ncase,output_shape], axis=0)
z1 = tf.nn.conv2d_transpose(xin, filt, oshp, strides=[1,2,2,1], name='xpose_conv')
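As a quick usage check (a sketch: zeros stand in for real data, and filter_shape is the illustrative value defined above), the same graph then handles different batch sizes:

import numpy as np

with tf.Session() as sess:
    for n in (1, 7):   # two different batch sizes through the same graph
        out = sess.run(z1, feed_dict={xin: np.zeros((n, 4, 4, 2), np.float32),
                                      filt: np.zeros(filter_shape, np.float32)})
        print(out.shape)   # (1, 8, 8, 4), then (7, 8, 8, 4)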
I found a solution: use tf.shape for the unspecified dimensions and get_shape() for the specified ones.
def get_deconv_lens(H, k, d):
    return tf.multiply(H, d) + k - 1

def deconv2d(x, output_shape, k_h=2, k_w=2, d_h=2, d_w=2, stddev=0.02, name='deconv2d'):
    # output_shape: the output_shape of the deconv op
    shape = tf.shape(x)
    H, W = shape[1], shape[2]
    N, _, _, C = x.get_shape().as_list()
    H1 = get_deconv_lens(H, k_h, d_h)
    W1 = get_deconv_lens(W, k_w, d_w)

    with tf.variable_scope(name):
        w = tf.get_variable('weights', [k_h, k_w, C, x.get_shape()[-1]], initializer=tf.random_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[C], initializer=tf.zeros_initializer())

    deconv = tf.nn.conv2d_transpose(x, w, output_shape=[N, H1, W1, C], strides=[1, d_h, d_w, 1], padding='VALID')
    deconv = tf.nn.bias_add(deconv, biases)

    return deconv
