I have a dataset of colored images as an ndarray of shape (100, 20, 20, 3), plus 100 corresponding labels. When passing them as input to a fully connected neural network (not a CNN), what should I do with the 3 RGB values? Averaging them would lose some information, but if I leave them untouched my main issue is the batch size, as demonstrated below in PyTorch.
for epoch in range(n_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # because of the RGB values, images is now 3 times the length of labels
        images = Variable(images.view(-1, 400))
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
This raises 'ValueError: Expected input batch_size (300) to match target batch_size (100).' Should I have reshaped the images into (1, 1200) tensors instead? Thanks in advance for any answers.
Since the size of labels is (100,), your batch data should have shape (100, H, W, C). I'm assuming your data loader returns a tensor of shape (100, 20, 20, 3). The error happens because you reshape that tensor to (300, 400).
Check whether your network architecture expects an input tensor of shape (20, 20, 3).
If your network can only accept single-channel images, you can first convert your RGB images to grayscale.
Alternatively, modify your network architecture so it accepts 3-channel images. One convenient way is to add an extra layer that reduces 3 channels to 1; then you do not need to change the other parts of the network.
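If you keep all three channels, a minimal sketch of the training loop with the reshape fixed is shown below (assuming the first linear layer of net is changed to take 1200 = 20*20*3 input features instead of 400):

# Keep the batch dimension and flatten everything else,
# so (100, 20, 20, 3) becomes (100, 1200) and matches the 100 labels.
for epoch in range(n_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(images.size(0), -1)  # shape: (batch, 1200)
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()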
Use grayscale images so the flattened input no longer inflates the batch dimension.
Related
I have a dataset with three labels.
First, I'm loading my data into a Dataset with the ImageFolder class and the CenterCrop transform.
So every picture now has three channels (r, g, b) of 224x224 values in 0..1. I thought my NN should therefore have 224×224×3 input nodes.
I get this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (21504x224 and 150528x200)
Assuming that mat1 is the image (I have no clue where the 21504 is coming from) and mat2 is the first nn.Linear(224*224*3, 200) layer, I can see why they cannot be multiplied. So I changed nn.Linear(224*224*3, 200) to nn.Linear(224, 200). Now it works, but..
transform = transforms.Compose([transforms.Resize((300, 200)),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataset = datasets.ImageFolder('/content/drive/MyDrive/Colab Notebooks/data', transform=transform)
# ... BatchSize = 32
model = nn.Sequential(
    nn.Linear(224, 200),  # <- the size of one row!
    nn.Sigmoid(),
    nn.Linear(200, 40),
    nn.Sigmoid(),
    nn.Dropout(p=0.2),
    nn.Linear(40, 40),
    nn.Sigmoid(),
    nn.Linear(40, 20),
    nn.Sigmoid(),
    nn.Linear(20, 3)
).to(device)
# ...
prediction = model(x)
For some reason my prediction now has this form:
prediction[32][3][224]
I would expect 32 items in a list for a batch, each item containing 3 values with the probability of each label. But why do the height/width show up here?
I think I have to change the format of the data from 224x224x3 (3D) to 150528 (1D), but operations like view() did not work, probably because of the lazy loading from ImageFolder.
The NN works fine with the MNIST dataset: (28x28) in and (10) out. So my guess is that the dataset has to be transformed, but I can't figure out how.
It can work if you apply transforms.Lambda and change transforms in the following way:
transform = transforms.Compose([transforms.Resize((300, 200)),
                                transforms.CenterCrop(224),
                                transforms.ToTensor(),
                                transforms.Lambda(lambda x: torch.flatten(x))])
As a result, each batch will have shape (batch_size, 150528), which means you need to specify an input size of 150528 for the first linear layer. I'm not sure it is reasonable to classify images with fully connected linear layers in such a case: it will be slow and is unlikely to converge to good predictions. You'd be better off using convolutional layers instead.
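Equivalently, here is a minimal sketch (untested, reusing nn and device from the question's code) that keeps the original transforms and instead flattens inside the model with nn.Flatten, as long as the first linear layer takes 224*224*3 = 150528 inputs:

model = nn.Sequential(
    nn.Flatten(),                   # (32, 3, 224, 224) -> (32, 150528)
    nn.Linear(224 * 224 * 3, 200),
    nn.Sigmoid(),
    nn.Linear(200, 40),
    nn.Sigmoid(),
    nn.Dropout(p=0.2),
    nn.Linear(40, 40),
    nn.Sigmoid(),
    nn.Linear(40, 20),
    nn.Sigmoid(),
    nn.Linear(20, 3)
).to(device)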
When loading a model and using it for inference, I feed in an array of images, image_tile_tensor, whose shape is (total_tile, tile_height, tile_width, 3). image_tile_tensor is a numpy array.
I input this into my model for inference using the code below:
image_tile_tensor = tf.image.convert_image_dtype(image_tile_tensor, tf.float32)
image_tile_prediction = model.predict(image_tile_tensor, verbose=1)
During inference, all the predictions come out fine except for the last few image tiles in image_tile_tensor. For example, if I have 112 image tiles in total, the last 12 tiles get predictions that are all NaN, while the first 100 tiles get the expected prediction values.
Any idea what the problem might be? I am a bit lost and don't know where to start debugging this at the moment. If it helps, tile_height and tile_width are (192, 192) and the training batch size was 16.
I am building a CNN that can take multiple images as input. The number of inputs varies: for example, it can be 3 or 4 or, ideally, any other number.
Here is what I want:
When 3 images are input, there will be 3 streams of vgg16 that share the same weights.
When 4 images are input, there will be 4 streams of vgg16 that share the same weights.
In short, the network structure can only be defined after I feed data during training/testing. How can I do this with TensorFlow?
Consider an example of a TensorFlow placeholder:
x_train_placeholder = tf.placeholder(tf.float32, [None, 256, 256, 3])
In the placeholder above, None is used for the size of the first dimension, which allows us to pass a variable number of input tensors (images) of size (256, 256, 3) to the CNN model without explicitly specifying/hard-coding that value.
Suppose we input a batch of images x_train_batch of shape [4, 256, 256, 3], where the first dimension of 4 indicates that we're passing a batch of 4 images to the model (this could just as well be 3, per your problem description). We can then pass x_train_batch to the placeholder x_train_placeholder with the following line of code:
_, loss_train = sess.run([optimizer, loss], feed_dict={x_train_placeholder: x_train_batch})
where loss is the cost function we're using to optimize the network. Notice that since we're not hard-coding the value of the first dimension of x_train_batch anywhere, this lets us pass a variable number of images to the TensorFlow placeholder during training or testing.
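As a minimal sketch of this idea (TensorFlow 1.x style, with a toy convolution standing in for the shared vgg16 stream), the same graph can be fed batches of 3 or 4 images because the first dimension of the placeholder is None:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API, as in the answer above

x_train_placeholder = tf.placeholder(tf.float32, [None, 256, 256, 3])
# One set of weights, applied to however many images are in the batch.
features = tf.layers.conv2d(x_train_placeholder, filters=8, kernel_size=3)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for n_images in (3, 4):
        batch = np.random.rand(n_images, 256, 256, 3).astype(np.float32)
        out = sess.run(features, feed_dict={x_train_placeholder: batch})
        print(out.shape)  # (3, 254, 254, 8), then (4, 254, 254, 8)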
The output layer of my CNN should use the RBF function, described as "each neuron outputs the square of the Euclidean distance between its input vector and its weight vector". I've implemented this as
dense2 = tf.square(tf.norm(dense1 - tf.transpose(dense2_W)))
where dense1 is a tensor of shape (?, 84). I've declared dense2_W, the weights, as a variable of shape (84, 10), since the network does digit classification and should have 10 outputs. Running the code with a batch of 100 I get this error: InvalidArgumentError: Incompatible shapes: [100,84] vs. [10,84]. I believe it is due to the subtraction.
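In symbols (my own notation: x is one row of dense1, w_j is the j-th column of dense2_W), each output neuron j should compute
y_j = \lVert x - w_j \rVert^2 = \sum_{i=1}^{84} (x_i - w_{ij})^2, \quad j = 1, \dots, 10.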
I train the network by iterating this code:
x_batch, y_batch = mnist.train.next_batch(100)
x_batch = tf.pad(x_batch, [[0,0],[2,2],[2,2],[0,0]]).eval() # Pad 28x28 -> 32x32
sess.run(train_step, {X: x_batch, Y: y_batch})
and then test it using the entire test set, so the batch size in the network must be dynamic.
How can I work around this? The batch size must be dynamic, as it is for dense1, but I don't understand how to make a variable with a dynamic size and transpose it (dense2_W).
You need the shapes of the two tensors to match. Assuming you want to share the weights across the batch while keeping a separate weight vector for each output class, you can expand the dimensions of both tensors so that they broadcast correctly, e.g.:
# broadcasting will copy the input to every output class neuron
input_dense = tf.expand_dims(dense1, axis=2)                # (batch, 84, 1)
# broadcasting here will copy the weights across the batch
weights = tf.expand_dims(dense2_W, axis=0)                  # (1, 84, 10)
dense2 = tf.square(tf.norm(input_dense - weights, axis=1))  # (batch, 10)
The resulting tensor dense2 has shape [batch_size, num_classes], which is [100, 10] in your case (so it holds one squared distance for every data instance over the number of output classes).
EDIT: added the axis argument to the tf.norm call so that the distance is computed over the hidden dimension (not over the whole matrices).
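To make the shapes concrete, here is a small NumPy check I would run with the question's sizes: the broadcasted subtraction is (100, 84, 1) minus (1, 84, 10), giving (100, 84, 10), and taking the norm over axis 1 collapses the 84-dimensional hidden axis:

import numpy as np

dense1 = np.random.rand(100, 84)    # batch of hidden activations
dense2_W = np.random.rand(84, 10)   # one 84-dimensional weight vector per class

diff = dense1[:, :, None] - dense2_W[None, :, :]   # (100, 84, 10)
dense2 = np.square(np.linalg.norm(diff, axis=1))   # (100, 10)
print(dense2.shape)                                # (100, 10)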
I'm trying to calculate the gradient at some layer with respect to the input image. The gradient is defined as
feature = g.get_tensor_by_name('inception/conv2d0_pre_relu:0')
gradient = tf.gradients(tf.reduce_max(feature, 3), x)
and my input image has a spatial size of (299, 299), which is the size Inception was trained at:
print(img.shape)
# output (299,299,3)
Then the gradient with respect to the input can be calculated as
img_4d=img[np.newaxis]
res = sess.run(gradient, feed_dict={x: img_4d})[0]
print(res.shape)
# output (1,299,299,3)
We see that the gradient has the same shape as the input image, which is expected.
However, it appears that one can use an image of any size and still get a gradient. For example, if I have img_resized with shape (150, 150, 3), the gradient with respect to this input also has shape (150, 150, 3):
img_resized=skimage.transform.resize(img, [150,150], preserve_range=True)
img_4d=img_resized[np.newaxis]
res = sess.run(gradient, feed_dict={x: img_4d})[0]
res.shape
# output (1,150,150,3)
So why does this work? In my naive picture, the input image dimensions must be fixed at (299, 299, 3), so the gradient at some layer with respect to the input should always have shape (299, 299, 3). Why is it able to produce a gradient of other sizes?
In other words, what happens in the code above? When we feed an image of shape (150, 150, 3), does TensorFlow resize it to (299, 299, 3), compute a gradient of shape (299, 299, 3), and then resize the gradient back to (150, 150, 3)?
This is an expected phenomenon, especially in the case of the Inception net, which can work with any input size owing to being a fully convolutional network. Unlike AlexNet or VGG, which rely on fully connected layers in the later part of the network, fully convolutional networks can work on input of any size. Hope this answers your question.
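As a minimal sketch of why this happens (TensorFlow 1.x style with a single toy convolution, not the Inception graph itself), a convolution defined over unspecified spatial dimensions accepts inputs of different sizes, and the gradient with respect to the input always takes the input's own shape:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API, matching the question's code

# Spatial dimensions are left as None, so any height and width are accepted.
x = tf.placeholder(tf.float32, [None, None, None, 3])
kernel = tf.Variable(tf.random_normal([3, 3, 3, 8]))
feature = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
grad = tf.gradients(tf.reduce_max(feature, axis=3), x)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for size in (299, 150):
        img_4d = np.random.rand(1, size, size, 3).astype(np.float32)
        print(sess.run(grad, feed_dict={x: img_4d}).shape)
        # (1, 299, 299, 3), then (1, 150, 150, 3): no resizing happens anywhere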