I am attempting to classify 3D blocks of data with H,D,W of 64,1024,64 respectively. These are done in batches of 2. However the input shapes do not seem to be loading in correctly and I get the error: Expected 4D (unbatched) or 5D (batched) input to conv3d, but got input of size: [2, 1]. However the input shape is [2, 1, 64, 1024, 64].
Please see the code below:
print('Images shape ', images.shape)
# forward + backward + optimize
outputs = Net(images)
Which calls the CNN3D class:
class CNN3D(nn.Module):
    def __init__(self, input_shape, conv_layers, kernel_size, out_channel_ratio, FC_layers):
        super(CNN3D, self).__init__()
        self.input = input_shape
        model = [
            self._conv_layer_set(1, 8),
            # self._conv_layer_set(8, 16),
            # self._conv_layer_set(16, 32),
            nn.Flatten(),
            # nn.Linear((6*14*6), 256),
            # nn.LeakyReLU(),
            # nn.Linear(256, 128),
            # nn.LeakyReLU(),
            # nn.Linear(1952752, 1)
            nn.LazyLinear(1)
        ]
        self.model = nn.Sequential(*model)

    def _conv_layer_set(self, in_c, out_c):
        conv_layer = nn.Sequential(
            nn.Conv3d(in_c, out_c, kernel_size=(3, 7, 3), padding=0),
            nn.LeakyReLU(),
            nn.MaxPool3d((2, 4, 2)),
        )
        return conv_layer

    def forward(self, x):
        print('Input shape ', self.input)
        for layer in self.model:
            x = layer(x)
            print(x.size())
        return self.model(x)
This gives the following output:
Images shape torch.Size([2, 1, 64, 1024, 64])
Images shape torch.Size([2, 1, 64, 1024, 64])
Input shape (1, 64, 1024, 64)
torch.Size([2, 8, 31, 254, 31])
torch.Size([2, 1952752])
torch.Size([2, 1])
return F.conv3d(
RuntimeError: Expected 4D (unbatched) or 5D (batched) input to conv3d, but got input of size: [2, 1]
The error seems inconsistent with the input shape, so I am not sure what is going on.
Thanks
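For reference, a minimal sketch based on the printed shapes (not part of the original post): the for loop in forward already pushes x through every layer, ending at the flattened [2, 1] tensor, and the final return self.model(x) then sends that [2, 1] tensor through the model a second time, so the first Conv3d sees a 2D input. A forward that traverses the model only once would look like:

def forward(self, x):
    # x: (batch, 1, 64, 1024, 64)
    for layer in self.model:
        x = layer(x)      # each layer is applied exactly once
        print(x.size())
    return x              # do not call self.model(x) again here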
Related
I'm trying to create a GAN that generates audio data. Using a standard GAN that consists only of Conv2d layers has not given good results, so I'm trying to add an LSTM layer to the model. However, the output of my last Conv2d layer has shape (None, 1, 256, 64). This is not a valid shape for the LSTM layer, so I'm trying to drop the "1" dim of the previous layer, as I don't think it is needed.
I've tried torch.reshape, torch.squeeze, and just indexing the tensor to remove that dim, but all result in shape errors.
How can I add an LSTM layer to this model, and input the correct shape?
class Generator(nn.Module):
    def __init__(self, channels_noise, channels_img, features_g):
        super(Generator, self).__init__()
        self.net = nn.Sequential(
            # Input: N x channels_noise x 1 x 1
            self._block(channels_noise, features_g * 16, (16, 4), 1, 0),  # img: 4x4
            self._block(features_g * 16, features_g * 8, 4, 2, 1),  # img: 8x8
            self._block(features_g * 8, features_g * 4, 4, 2, 1),  # img: 16x16
            self._block(features_g * 4, features_g * 2, 4, 2, 1),  # img: 32x32
            nn.ConvTranspose2d(
                features_g * 2, channels_img, kernel_size=4, stride=2, padding=1
            ),
            # Output: N x channels_img x 64 x 64
            nn.Tanh(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=10)

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.ConvTranspose2d(
                in_channels, out_channels, kernel_size, stride, padding, bias=False,
            ),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.net(x)
        x = x[:, 0, :, :]
        lstm_out, _ = self.lstm(x)
        return lstm_out
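For illustration, a minimal sketch (my own, assuming the 256 rows should act as time steps and the 64 columns as per-step features, and using batch_first=True so the batch stays in the first dimension):

import torch
import torch.nn as nn

x = torch.randn(8, 1, 256, 64)   # output of the last Conv2d / Tanh: (N, 1, 256, 64)
x = x.squeeze(1)                 # drop the singleton channel dim -> (N, 256, 64)

lstm = nn.LSTM(input_size=64, hidden_size=10, batch_first=True)
lstm_out, _ = lstm(x)            # (N, 256, 10)
print(lstm_out.shape)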
I am creating patches for an Xception model. When I apply the Patches class and then reshape its output, I get an incompatibility error between the patches and reshape layers. I could not find suitable documentation on how to calculate the parameters of the patch layer so that I can reshape its output into the proper dimensions for the next layer, which is why running this code produces the error mentioned in the title. How do I work out the math to go from one layer to the next and choose a proper size in patches = tf.reshape(patches, [1, 32, 32, 3]) so that it accepts the output of the preceding patches = Patches(patch_size)(xception_input) without errors? My code is below:
class Patches(layers.Layer):
    def __init__(self, patch_size):
        super(Patches, self).__init__()
        self.patch_size = patch_size

    def call(self, images):
        batch_size = tf.shape(images)[0]
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        patch_dims = patches.shape[-1]
        patches = tf.reshape(patches, [batch_size, -1, patch_dims])
        return patches
Model code
xception = keras.applications.Xception(
    include_top=False, weights="imagenet", pooling="avg"
)
for layer in xception.layers:
    layer.trainable = trainable

inputs = layers.Input(shape=(299, 299, 3), name="image_input")
patch_size = 72
print("patch_size shape is ", patch_size.shape)
print("inputs shape is ", inputs.shape)
xception_input = tf.keras.applications.xception.preprocess_input(inputs)
print("xception shape is ", xception_input.shape)
patches = Patches(patch_size)(xception_input)
print("patches shape is ", patches.shape)
patches = tf.reshape(patches, [1, 32, 32, 3])
print("patches reshape shape is ", patches.shape)
embeddings = xception(patches)
return keras.Model(inputs, embeddings, name="vision_encoder")
Output shape
patch_size shape is 72
inputs shape is (None, 299, 299, 3)
xception shape is (None, 299, 299, 3)
patches shape is (None, None, 15552)
patches reshape shape is (1, 32, 32, 3)
Update
When I change the patch code part like below
class Patches_en(layers.Layer):
    def __init__(self, patch_size):
        super(Patches_en, self).__init__()
        self.patch_size = patch_size

    def call(self, images):
        # batch_size = tf.shape(images)[0]
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        return patches
Output
inputs shape is (None, 299, 299, 3)
xception shape is (None, 299, 299, 3)
patches shape is (None, 4, 4, 15552)
and it generates this error
ValueError: Exception encountered when calling layer "tf.reshape" (type
TFOpLambda).
Dimension size must be evenly divisible by 248832 but is 3072 for '{{node tf.reshape/Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](Placeholder, tf.reshape/Reshape/shape)' with input shapes: [?,4,4,15552], [4] and with input tensors computed as partial shapes: input[1] = [1,32,32,3].
Call arguments received:
• tensor=tf.Tensor(shape=(None, 4, 4, 15552), dtype=float32)
• shape=['1', '32', '32', '3']
• name=None
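For reference, the shape arithmetic behind the error (my own check, not part of the original post): with 299x299x3 inputs and patch_size = 72, extract_patches with VALID padding yields 299 // 72 = 4 patches along each axis, each flattened to 72 * 72 * 3 = 15552 values, so each image produces 4 * 4 * 15552 = 248832 elements, which cannot be reshaped into [1, 32, 32, 3] = 3072 elements:

patch_size = 72
channels = 3
patches_per_side = 299 // patch_size              # 4 with VALID padding
patch_dims = patch_size * patch_size * channels   # 72 * 72 * 3 = 15552
total = patches_per_side ** 2 * patch_dims        # 4 * 4 * 15552 = 248832
print(total, 1 * 32 * 32 * 3)                     # 248832 vs 3072 -> tf.reshape fails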
I am trying to learn to build a U-NET architecture from scratch. I have written this code, but I run into issues when I try to check the output of the encoder part. When you run the code below, you get:
import torch
import torch.nn as nn
batch = 1
channels = 3
width = 512 # same as height
image = torch.randn(batch, channels, width, width)
enc = Encoder(channels)
enc(image)
RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead
Below is the code:
class ConvolutionBlock(nn.Module):
    '''
    The basic convolution block, which will have Convolution -> ReLU -> Convolution -> ReLU
    '''
    def __init__(self, in_channels, out_channels, upsample: bool = False):
        '''
        args:
            upsample: If True, use ConvTranspose2d (meaning the block is used in the decoder part) instead of MaxPooling
        Note: batch_norm was introduced after UNET, so the authors did not know it existed. Might be useful.
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # padding is 0 by default; 1 keeps output width/height equal to the input
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2) if not upsample else nn.ConvTranspose2d(out_channels, out_channels // 2, kernel_size=2),  # the paper says ConvTranspose2d halves the features
        )

    def forward(self, feature_map_x):
        '''
        feature_map_x could be the image itself or the output of the previous block
        '''
        return self.network(feature_map_x)

class Encoder(nn.Module):
    def __init__(self, image_channels: int = 1, repeat: int = 4):
        '''
        In UNET, the features start at 64 and keep doubling until the bottleneck is reached
        '''
        super().__init__()
        in_channels = [image_channels, 64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels=in_channels[i], out_channels=out_channels[i]) for i in range(repeat + 1)]
        )

    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out
EDIT: Running the code below gives me the expected info too:
in_ = [3, 64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512

from torchsummary import summary

for i in range(5):
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i], width, width))
    print('#' * 50)
There was a logic mistake in the forward of Encoder.
I did:
for layer in self.layers:
    out = layer(feature_map_x)
return out
but I was supposed to chain the outputs: the loop was feeding the original feature map to every layer, when each layer should receive the output of the previous one:
for layer in self.layers:
    feature_map_x = layer(feature_map_x)
return feature_map_x
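As a quick sanity check (my own addition, assuming the corrected forward above), the encoder should now run end to end on the 512x512 input from the question:

import torch

batch, channels, width = 1, 3, 512
image = torch.randn(batch, channels, width, width)

enc = Encoder(channels)   # 5 blocks: 3 -> 64 -> 128 -> 256 -> 512 -> 1024 channels
out = enc(image)
print(out.shape)          # each block halves H and W, so this should be torch.Size([1, 1024, 16, 16])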
I've just created a Neural Network with Skorch to detect aircraft in pictures, and I trained it with a train dataset of shape (40000, 64, 64, 3).
Then I tested it with a test dataset of (15000, 64, 64, 3).
module = nn.Sequential(
    nn.Conv2d(3, 64, 3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(6 * 6 * 64, 256),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 2),
    nn.Softmax(),
)
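For reference (my own check, not part of the original post), the 6 * 6 * 64 input size of the first Linear layer can be verified by running the convolutional part of the stack on a dummy 64x64 batch:

import torch

x = torch.randn(2, 3, 64, 64)
print(module[:12](x).shape)   # conv/pool stack before Flatten -> torch.Size([2, 64, 6, 6])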
early_stopping = EarlyStopping(monitor='valid_loss', lower_is_better=True)

net = NeuralNetClassifier(
    module,
    max_epochs=20,
    lr=1e-4,
    callbacks=[early_stopping],
    # Shuffle training data on each epoch
    iterator_train__shuffle=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
    optimizer=optim.Adam
)

net.fit(
    train_images_balanced.transpose((0, 3, 1, 2)).astype(np.float32),
    train_labels_balanced
)
Now I need to test it on 512*512 pictures, so I have a new dataset of (30, 512, 512, 3).
So I took a sliding-window snippet that allows me to divide each picture into 64*64 parts.
def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
Now I want to be able to predict whether each 64*64 image contains an aircraft, but I don't know how to do it, as net.predict() takes a dataset as an argument (arg : dim 4).
net.predict() takes a dataset as an argument (arg : dim 4)
net.predict accepts a number of data formats, datasets among them. However, for your case it seems it would be best if it accepted torch tensors or numpy arrays - and it does! Just pass your 64x64 chunks to net.predict, something like this:
# (n, 512, 512, 3)
X = my_data
# (n, 4096, 64, 64, 3)
X = sliding_window(X, 64, 64)
# (n * 4096, 64, 64, 3)
X = X.reshape(-1, 64, 64, 3)
y = net.predict(X)
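For illustration, a more explicit sketch (my own, assuming non-overlapping windows, i.e. stepSize = 64, and the same NCHW/float32 preprocessing that was used in net.fit):

import numpy as np

windows = []
for img in my_data:                                          # my_data: (n, 512, 512, 3)
    for wx, wy, win in sliding_window(img, stepSize=64, windowSize=(64, 64)):
        if win.shape[:2] == (64, 64):                        # skip partial windows at the border
            windows.append(win)

X = np.stack(windows)                                        # (n * 64, 64, 64, 3) for non-overlapping windows
X = X.transpose((0, 3, 1, 2)).astype(np.float32)             # channels first, as in training
y_pred = net.predict(X)                                      # one prediction per 64x64 window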
I am trying to feed both an image and a vector as input to the model. The image has the correct 4D shape, but the vector I input does not. The image size is 424x512 while the vector has shape (18,). After using a DataLoader, I get batches of shape (50, 1, 424, 512) and (50, 18). The model gives an error because it needs the vector to be 4D too. How do I do that?
Here is my code :
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:, :, 1])
            fdm.append(false_dm)
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:, :, 1])
            tdm.append(true_dm)
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            parameters.append(param)
        except:
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)

class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(),
                torch.from_numpy(self.parameters[index]).double(),
                self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is :
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model :
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(1))
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
and the method by which I access the data :
for batch_idx, (fdm, parameters) in enumerate(self.data):
    if self.config.gpu:
        fdm = fdm.to(device)
        parameters = parameters.to(device)
    print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit :
I think my code is incorrect, as I am trying to apply convolutions over a vector of shape (18,). I tried to copy the vector to make it (18x64) and then input it, but it still doesn't work and gives this output:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
I am not sure how to concatenate an 18-element vector to the output of layer 3 if I can't do any of these things.
Looks like you are training an autoencoder model and want to parameterize it with some additional vector input in the bottleneck layer. If you want to perform some transformations on that vector, you have to decide whether you need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 will have shape (N, 32, 53, 64). You have a lot of options, depending on your desired model performance:
Use a nn.Linear with activations to transform the parameter vector. Then you might add extra spatial dimensions and repeat this vector in all spatial locations:
img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 18)                   # the 18-dimensional parameter vector from the question
layer3_out = model(img)                   # stands for the output of layer1 -> layer2 -> layer3, shape (1, 32, 53, 64)
N, C, H, W = layer3_out.shape

param_encoder = nn.Sequential(nn.Linear(18, 30), nn.ReLU(), nn.Linear(30, 10))

param = param_encoder(vec)                                        # (1, 10)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)     # repeat over all spatial locations
encoding = torch.cat([param, layer3_out], dim=1)                  # (1, 42, 53, 64)
Use transposed convolutions to upsample your parameter vector to the size of the layer3 output. But that would be harder to implement, as you have to calculate the exact output shape to fit (N, 32, 53, 64).
Transform the input vector with an MLP (nn.Linear) to twice the number of channels in the layer3 output, then use so-called feature-wise transformations to scale and shift the feature maps from layer3 (see the sketch after this list).
I would recommend starting with the first option, since it is the simplest one to implement, and then trying the others.
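For illustration, a minimal sketch of the third option (my own addition, assuming the same 18-dimensional vector and the (N, 32, 53, 64) layer3 output described above):

import torch
import torch.nn as nn

layer3_out = torch.rand(1, 32, 53, 64)       # stand-in for the layer3 feature maps
vec = torch.rand(1, 18)                      # parameter vector

film = nn.Sequential(nn.Linear(18, 64), nn.ReLU(), nn.Linear(64, 2 * 32))  # 2x the channel count
gamma, beta = film(vec).chunk(2, dim=1)      # each (1, 32)
gamma = gamma.unsqueeze(-1).unsqueeze(-1)    # (1, 32, 1, 1), broadcasts over H and W
beta = beta.unsqueeze(-1).unsqueeze(-1)

modulated = gamma * layer3_out + beta        # feature-wise scale and shift of layer3
print(modulated.shape)                       # torch.Size([1, 32, 53, 64])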