I'm trying to create a GAN that generates audio data. A standard GAN consisting only of Conv2d layers has not given good results, so I'm trying to add an LSTM layer to the model. However, the output of my last convolutional layer has shape (None, 1, 256, 64). This is not a valid input shape for the LSTM layer, so I'm trying to drop the "1" dimension, as I don't think it is needed.
I've tried torch.reshape, torch.squeeze, and just indexing the tensor to remove that dim, but all result in shape errors.
How can I add an LSTM layer to this model, and input the correct shape?
class Generator(nn.Module):
    def __init__(self, channels_noise, channels_img, features_g):
        super(Generator, self).__init__()
        self.net = nn.Sequential(
            # Input: N x channels_noise x 1 x 1
            self._block(channels_noise, features_g * 16, (16, 4), 1, 0),  # 16x4
            self._block(features_g * 16, features_g * 8, 4, 2, 1),  # 32x8
            self._block(features_g * 8, features_g * 4, 4, 2, 1),  # 64x16
            self._block(features_g * 4, features_g * 2, 4, 2, 1),  # 128x32
            nn.ConvTranspose2d(
                features_g * 2, channels_img, kernel_size=4, stride=2, padding=1
            ),
            # Output: N x channels_img x 256 x 64
            nn.Tanh(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=10)

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.ConvTranspose2d(
                in_channels, out_channels, kernel_size, stride, padding, bias=False,
            ),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.net(x)
        x = x[:, 0, :, :]  # drop the channel dim: (N, 256, 64)
        lstm_out, _ = self.lstm(x)
        return lstm_out
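One possible way to wire the shapes together is sketched below. It assumes that the 256 axis is meant to be the time axis and the 64 axis the per-step feature vector, which the question does not state, and it uses a random tensor as a stand-in for the generator output. The key point is that nn.LSTM reads its input as (seq_len, batch, features) unless batch_first=True is passed.

```python
import torch
import torch.nn as nn

# Stand-in for the conv stack output: (N, channels_img=1, 256, 64).
conv_out = torch.randn(8, 1, 256, 64)

# Drop the singleton channel axis -> (N, 256, 64).
seq = conv_out.squeeze(1)

# Assumption: 256 is the sequence length and 64 the feature size.
# batch_first=True makes the LSTM read its input as (batch, seq_len, features).
lstm = nn.LSTM(input_size=64, hidden_size=10, batch_first=True)
lstm_out, (h_n, c_n) = lstm(seq)

print(seq.shape)       # (N, 256, 64)
print(lstm_out.shape)  # (N, 256, hidden_size)
```

Note that squeeze(1) only removes the axis when channels_img is 1. With more channels, the tensor would have to be reshaped or the channels folded into the feature axis instead.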
I am attempting to classify 3D blocks of data with H,D,W of 64,1024,64 respectively. These are done in batches of 2. However the input shapes do not seem to be loading in correctly and I get the error: Expected 4D (unbatched) or 5D (batched) input to conv3d, but got input of size: [2, 1]. However the input shape is [2, 1, 64, 1024, 64].
Please see the code below:
print('Images shape ', images.shape)
# forward + backward + optimize
outputs = Net(images)
Which calls the CNN3D class:
class CNN3D(nn.Module):
    def __init__(self, input_shape, conv_layers, kernel_size, out_channel_ratio, FC_layers):
        super(CNN3D, self).__init__()
        self.input = input_shape
        model = [
            self._conv_layer_set(1, 8),
            # self._conv_layer_set(8, 16),
            # self._conv_layer_set(16, 32),
            nn.Flatten(),
            # nn.Linear((6*14*6), 256),
            # nn.LeakyReLU(),
            # nn.Linear(256, 128),
            # nn.LeakyReLU(),
            # nn.Linear(1952752, 1)
            nn.LazyLinear(1)
        ]
        self.model = nn.Sequential(*model)

    def _conv_layer_set(self, in_c, out_c):
        conv_layer = nn.Sequential(
            nn.Conv3d(in_c, out_c, kernel_size=(3, 7, 3), padding=0),
            nn.LeakyReLU(),
            nn.MaxPool3d((2, 4, 2)),
        )
        return conv_layer

    def forward(self, x):
        print('Input shape ', self.input)
        for layer in self.model:
            x = layer(x)
            print(x.size())
        return self.model(x)
This gives the following output:
Images shape torch.Size([2, 1, 64, 1024, 64])
Images shape torch.Size([2, 1, 64, 1024, 64])
Input shape (1, 64, 1024, 64)
torch.Size([2, 8, 31, 254, 31])
torch.Size([2, 1952752])
torch.Size([2, 1])
return F.conv3d(
RuntimeError: Expected 4D (unbatched) or 5D (batched) input to conv3d, but got input of size: [2, 1]
The error seems inconsistent with the input shape so I am not sure what is going on.
Thanks
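A likely cause, judging from the printed shapes: the loop in forward already runs every layer, so x is [2, 1] by the time `return self.model(x)` feeds it through the whole network a second time, and the first Conv3d then rejects the 2D tensor. The sketch below reproduces the structure with a much smaller input and a hypothetical class name (TinyCNN3D), and returns x directly after the loop.

```python
import torch
import torch.nn as nn

class TinyCNN3D(nn.Module):
    """Scaled-down stand-in for CNN3D (hypothetical name, smaller input)."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(3, 7, 3), padding=0),
            nn.LeakyReLU(),
            nn.MaxPool3d((2, 4, 2)),
            nn.Flatten(),
            nn.Linear(8 * 3 * 6 * 3, 1),
        )

    def forward(self, x):
        for layer in self.model:
            x = layer(x)
        # Return the result of the loop; calling self.model(x) here would
        # push the already-processed [N, 1] tensor through Conv3d again.
        return x

images = torch.randn(2, 1, 8, 32, 8)  # small stand-in for [2, 1, 64, 1024, 64]
out = TinyCNN3D()(images)
print(out.shape)
```

Once the debugging prints are no longer needed, the loop can be replaced by a single `return self.model(x)`.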
I am trying to learn to build a U-Net architecture from scratch. I have written the code below, but I run into a problem when I check the output of the encoder part. Running this snippet raises the error shown after it:
import torch
import torch.nn as nn
batch = 1
channels = 3
width = 512 # same as height
image = torch.randn(batch, channels, width, width)
enc = Encoder(channels)
enc(image)
RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead
Below is the code:
class ConvolutionBlock(nn.Module):
    '''
    The basic convolution block: Convolution -> ReLU -> Convolution -> ReLU
    '''
    def __init__(self, in_channels, out_channels, upsample: bool = False):
        '''
        args:
            upsample: if True, use ConvTranspose2d (i.e. the block is used in the decoder) instead of MaxPool2d
            batch norm was introduced after U-Net, so the paper does not use it; it might be useful
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # padding is 0 by default; 1 keeps output width/height equal to the input
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2) if not upsample else nn.ConvTranspose2d(out_channels, out_channels // 2, kernel_size=2),  # per the paper, the transposed convolution halves the number of features
        )

    def forward(self, feature_map_x):
        '''
        feature_map_x could be the image itself or the
        '''
        return self.network(feature_map_x)


class Encoder(nn.Module):
    '''
    '''
    def __init__(self, image_channels: int = 1, repeat: int = 4):
        '''
        In U-Net, the features start at 64 and keep doubling until the bottleneck is reached
        '''
        super().__init__()
        in_channels = [image_channels, 64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels=in_channels[i], out_channels=out_channels[i]) for i in range(repeat + 1)]
        )

    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out
EDIT: Running the code below gives me expected info too:
in_ = [3,64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512
from torchsummary import summary
for i in range(5):
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i], width, width))
    print('#' * 50)
There was a logic mistake in the forward method of Encoder.
I did:
for layer in self.layers:
    out = layer(feature_map_x)
return out
This passes the original feature map to every layer, whereas each layer is supposed to receive the output of the previous one. The fix is to reassign feature_map_x inside the loop:
for layer in self.layers:
    feature_map_x = layer(feature_map_x)
return feature_map_x
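The difference between the two loops can be reproduced in isolation. The sketch below uses two small Conv2d layers with made-up channel counts (3 -> 8 -> 16) rather than the full encoder, so it runs quickly; the buggy loop fails on the second layer with the same kind of channel-mismatch RuntimeError as in the question.

```python
import torch
import torch.nn as nn

# Two chained layers with illustrative channel counts (not the U-Net sizes).
layers = nn.ModuleList([
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
])
image = torch.randn(1, 3, 32, 32)

# Buggy loop: every layer is fed the original 3-channel image.
buggy_error = None
try:
    for layer in layers:
        out = layer(image)
except RuntimeError as err:
    buggy_error = str(err)
print("buggy loop error:", buggy_error)

# Fixed loop: each layer is fed the previous layer's output.
x = image
for layer in layers:
    x = layer(x)
print("fixed loop output:", tuple(x.shape))
```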
I'm having a problem implementing a super-resolution model
class SRNet(Model):
    def __init__(self, scale=4):
        super(SRNet, self).__init__()
        self.scale = scale
        self.conv1 = Sequential([
            layers.Conv2D(filters=64, kernel_size=3,
                          strides=(1, 1), padding="same", data_format="channels_first"),
            layers.ReLU(),
        ])
        self.residualBlocks = Sequential(
            [ResidualBlock() for _ in range(16)])
        self.convUp = Sequential([
            layers.Conv2DTranspose(filters=64, kernel_size=3, strides=(
                2, 2), padding="same", data_format="channels_first"),
            layers.ReLU(),
            layers.Conv2DTranspose(filters=64, kernel_size=3, strides=(
                2, 2), padding="same", data_format="channels_first"),
            layers.ReLU(),
        ])
        self.reluAfterPixleShuffle = layers.ReLU()
        self.convOut = layers.Conv2D(
            filters=3, kernel_size=3, strides=(1, 1), padding="same", data_format="channels_first", input_shape=(4, 1440, 2560))  # (kernel, kernel, channel, output)

    def call(self, lrCur_hrPrevTran):
        lrCur, hrPrevTran = lrCur_hrPrevTran
        x = tf.concat([lrCur, hrPrevTran], axis=1)
        x = self.conv1(x)
        x = self.residualBlocks(x)
        x = self.convUp(x)
        # pixel shuffle
        Subpixel_layer = Lambda(lambda x: tf.nn.depth_to_space(
            x, self.scale, data_format="NCHW"))
        x = Subpixel_layer(inputs=x)
        x = self.reluAfterPixleShuffle(x)
        x = self.convOut(x)
        return x
Error
/usr/src/app/generator.py:164 call *
x = self.convOut(x)
ValueError: Tensor's shape (3, 3, 64, 3) is not compatible with supplied shape (3, 3, 4, 3)
From the error I understand that (3, 3, 4, 3) means (kernel size, kernel size, input channels, output channels), so only the number of input channels is wrong.
So I printed the shape of the input:
# after pixel shuffle before convOut
print(x.shape)
>>> (1, 4, 1440, 2560) (batch size, channel, height, width)
But the shape of x after the pixel shuffle (depth_to_space) is (1, 4, 1440, 2560). The channel dimension is 4, which is exactly what convOut needs.
My question is: why does the error report the input channel count as 64 instead of 4?
I have found a solution.
I use checkpoints to save the model weights during training.
While implementing and testing the model I changed some of the layers, which also changed their input sizes, but the weights restored from the previous checkpoint still had the old shapes.
After I deleted the checkpoints folder, everything worked again.
I'm trying to classify cats and dogs with a CNN in PyTorch.
After building a few layers and processing images, I found that the final feature map size doesn't match the size I calculated.
So I tried to check the feature map size step by step by printing the shape inside the CNN, but it doesn't work.
I heard TensorFlow lets you check tensor sizes at each step; how can I do that in PyTorch?
What I want is :
def __init__(self):
    super(CNN, self).__init__()
    conv1 = nn.Conv2d(1, 16, 3, 1, 1)
    conv1_1 = nn.Conv2d(16, 16, 3, 1, 1)
    pool1 = nn.MaxPool2d(2)
    conv2 = nn.Conv2d(16, 32, 3, 1, 1)
    conv2_1 = nn.Conv2d(32, 32, 3, 1, 1)
    pool2 = nn.MaxPool2d(2)
    conv3 = nn.Conv2d(32, 64, 3, 1, 1)
    conv3_1 = nn.Conv2d(64, 64, 3, 1, 1)
    conv3_2 = nn.Conv2d(64, 64, 3, 1, 1)
    pool3 = nn.MaxPool2d(2)
    self.conv_module = nn.Sequential(
        conv1,
        nn.ReLU(),
        conv1_1,
        nn.ReLU(),
        pool1,
        # check first result size
        conv2,
        nn.ReLU(),
        conv2_1,
        nn.ReLU(),
        pool2,
        # check second result size
        conv3,
        nn.ReLU(),
        conv3_1,
        nn.ReLU(),
        conv3_2,
        nn.ReLU(),
        pool3,
        # check third result size
        pool4,
        # check fourth result size
        pool5
        # check fifth result size
    )
If there's any other way to check feature size at every step, please give some advice.
Thanks in advance.
To do that you shouldn't use nn.Sequential. Just initialize your layers in __init__() and call them in the forward function. In the forward function you can print the shapes out. For example like this:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(...)
        self.maxpool1 = nn.MaxPool2d(2)  # kernel_size is required
        self.conv2 = nn.Conv2d(...)
        self.maxpool2 = nn.MaxPool2d(2)

    def forward(self, x):
        # F is torch.nn.functional
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        print(x.size())
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        print(x.size())
        return x
Hope that's what you're looking for!
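If you would rather keep nn.Sequential, another option is to drop a small pass-through module into it at the points you want to inspect. The sketch below is only an illustration: PrintShape is a hypothetical helper (not part of PyTorch), and the 64x64 single-channel input is an assumption, since the question does not give the image size.

```python
import torch
import torch.nn as nn

class PrintShape(nn.Module):
    """Hypothetical helper: prints the tensor shape and returns it unchanged."""

    def __init__(self, label):
        super().__init__()
        self.label = label

    def forward(self, x):
        print(f"{self.label}: {tuple(x.shape)}")
        return x

conv_module = nn.Sequential(
    nn.Conv2d(1, 16, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, 1, 1), nn.ReLU(),
    nn.MaxPool2d(2),
    PrintShape("after pool1"),
    nn.Conv2d(16, 32, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, 1, 1), nn.ReLU(),
    nn.MaxPool2d(2),
    PrintShape("after pool2"),
)

# Assumed input: a batch of 4 grayscale 64x64 images.
out = conv_module(torch.randn(4, 1, 64, 64))
```

The PrintShape entries have no parameters, so they can be removed later without affecting the weights of the other layers, although the indices of the layers inside the Sequential will shift.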
I am trying to feed both an image and a vector into the model. The image batch has the expected 4D shape, but the vector does not. Each image is 424x512 and each vector has shape (18,). After the DataLoader, I get batches of shape (50, 1, 424, 512) and (50, 18). The model raises an error because it expects the vector input to be 4D as well. How do I do that?
Here is my code :
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:, :, 1])
            fdm.append(false_dm)
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:, :, 1])
            tdm.append(true_dm)
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            parameters.append(param)
        except:
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)


class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(), torch.from_numpy(self.parameters[index]).double(), self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is :
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model :
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(1))
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
and the method by which I access the data :
for batch_idx, (fdm, parameters) in enumerate(self.data):
    if self.config.gpu:
        fdm = fdm.to(device)
        parameters = parameters.to(device)
    print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit :
I think my code is incorrect, as I am trying to apply convolutions over a vector of shape (18,). I tried repeating the vector to make it (18, 64) and feeding that in. It still doesn't work and gives this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
I am not sure how to concatenate an 18 length vector to the output of layer 3, if I can't do any of these things.
Looks like you are training an autoencoder and want to condition it on an additional vector input at the bottleneck. If you want to transform that vector first, you have to decide whether you need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 has shape (N, 32, 53, 64). You have several options, depending on the model performance you need:
1. Use nn.Linear layers with activations to transform the parameter vector. Then add the extra spatial dimensions and repeat the vector at every spatial location:
import torch
import torch.nn as nn

img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 18)  # one 18-element parameter vector per sample
layer3_out = model(img)  # here `model` stands for layer1..layer3 only: (N, 32, 53, 64)
N, C, H, W = layer3_out.shape
param_encoder = nn.Sequential(nn.Linear(18, 30), nn.ReLU(), nn.Linear(30, 10))
param = param_encoder(vec)  # (N, 10)
# (N, 10) -> (N, 10, 1, 1) -> (N, 10, H, W)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)
encoding = torch.cat([param, layer3_out], dim=1)  # (N, 10 + C, H, W)
2. Use transposed convolutions to upsample your parameter vector to the size of the layer3 output. This is harder to implement, as you have to work out the exact layer settings so that the output matches (N, 32, 53, 64).
3. Transform the input vector with an MLP (nn.Linear layers) to twice the number of channels in the layer3 output. Then use so-called feature-wise transformations to scale and shift the feature maps from layer3.
I would recommend starting with the first option, since it is the simplest to implement, and then trying the others.
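For reference, the feature-wise transformation option could look roughly like the sketch below (this style of conditioning is often called FiLM). It is only an illustration under stated assumptions: layer3_out is a random stand-in for the real encoder output, and the hidden width of 64 and the name film_mlp are arbitrary choices, not something taken from the question.

```python
import torch
import torch.nn as nn

N, C, H, W = 4, 32, 53, 64
layer3_out = torch.rand(N, C, H, W)  # stand-in for the real layer3 output
vec = torch.rand(N, 18)              # one 18-element parameter vector per sample

# MLP that maps the vector to 2*C values: C scales and C shifts.
# The hidden size (64) is an arbitrary choice for this sketch.
film_mlp = nn.Sequential(nn.Linear(18, 64), nn.ReLU(), nn.Linear(64, 2 * C))

gamma, beta = film_mlp(vec).chunk(2, dim=1)  # each has shape (N, C)
gamma = gamma.view(N, C, 1, 1)               # broadcast over H and W
beta = beta.view(N, C, 1, 1)

# Scale and shift every feature map with its own per-sample coefficients.
modulated = gamma * layer3_out + beta
print(modulated.shape)
```

Unlike the concatenation approach, this keeps the channel count at 32, so layer4 can consume the result without changing its in_channels.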