Implementing a CNN with TensorFlow in Python

I'm new to convolutional neural networks and to TensorFlow, and I need to implement a conv layer with the following parameters:
Conv. layer 1: filter=11, channels=64, stride=4, ReLU.
The API is the following:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I understand what stride is and that it should be [1, 4, 4, 1] in my case. But I do not understand how I should pass the filter parameter and the padding.
Could someone help with this?

First, you need to create a filter variable:
W = tf.Variable(tf.truncated_normal(shape=[11, 11, 3, 64], stddev=0.1), dtype=tf.float32)
The first two fields of the shape parameter stand for the filter size, the third for the number of input channels (I guess your images have 3 channels) and the fourth for the number of output channels.
Now the output of the convolutional layer can be computed as follows:
conv1 = tf.nn.conv2d(input, W, strides=[1, 4, 4, 1], padding='SAME')
Here padding='SAME' stands for zero padding, so no border pixels are dropped: the output spatial size is ceil(input_size / stride) (with stride 1 it would stay exactly the same as the input). input should have shape [batch, size1, size2, 3].
ReLU application is pretty straightforward:
conv1 = tf.nn.relu(conv1)
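Putting it together, a minimal end-to-end sketch (the 224x224 input size and the placeholder are just illustrative assumptions, not part of the question):
import tensorflow as tf

# Hypothetical input: a batch of 224x224 RGB images.
input = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
W = tf.Variable(tf.truncated_normal(shape=[11, 11, 3, 64], stddev=0.1), dtype=tf.float32)
conv1 = tf.nn.conv2d(input, W, strides=[1, 4, 4, 1], padding='SAME')
conv1 = tf.nn.relu(conv1)
print(conv1.shape)  # (?, 56, 56, 64), since 'SAME' with stride 4 gives ceil(224 / 4) = 56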

Related

CNN for variable sized images in pytorch

I want to make a CNN model in PyTorch that can be fed images of different sizes. I am trying to use a 2D convolution layer, which takes a 4D input (PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions).
However, I'm not sure how to set up an input layer that can map all the variable-sized images into a fixed number of feature maps to pass on to the remaining layers.
For example, the input shape for coloured images is [4, 3, 32, 32], which corresponds to batch size, number of channels (RGB), width, and height. If the images are grayscale, the shape will be [4, 1, 32, 32], which produces an error because the shape is not what the layer expects.
The error message is: "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"
The architecture of my current CNN is as follows:
def __init__(self, num_out, kernel_size, num_input_filters):
    super().__init__()
    self.num_input_filters = num_input_filters
    self.num_out = num_out
    self.kernel_size = kernel_size
    self.conv1 = nn.Conv2d(3, 6, 5)
    self.pool = nn.MaxPool2d(2, 2)

def forward(self, inp):
    inp = self.pool(F.relu(self.conv1(inp)))
    return inp
I have referenced similar questions, and fully convolutional networks (FCNs) have no limitation on the input size at all, which could be the solution. PyTorch provides ConvTranspose2d() for FCNs, but its parameters still seem to require a fixed input size.
Are there any methods that can solve this problem?
You can just convert the grayscale images to RGB by duplicating the single channel three times.
As in your example, a shape of [1, 32, 32] can be converted to [3, 32, 32] with the following code:
np.concatenate((images,)*3)
If the shape is [32, 32, 1], try
np.concatenate((images,)*3, axis=-1)
If the shape is [32, 32], you can use the code below to convert it to [32, 32, 3]:
img_shape = tuple(np.ones(len(images.shape), dtype=int))
img_shape += (3,)
images = np.tile(np.expand_dims(images, axis=-1), img_shape)
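If the data is already a PyTorch tensor rather than a NumPy array, the same idea can be sketched with tensor ops (the variable name img is illustrative):
import torch

img = torch.rand(1, 32, 32)   # grayscale image in CHW layout
rgb = img.expand(3, -1, -1)   # [1, 32, 32] -> [3, 32, 32] by repeating the single channel as a view
print(rgb.shape)              # torch.Size([3, 32, 32])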

What's the difference between conv2d(SAME) and tf.pad + conv2d(VALID)?

I'm fairly new to TensorFlow, and while learning it from some tutorials I've read the following code:
if stride == 1:
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, padding='SAME', scope=scope)
else:
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride, padding='VALID', scope=scope)
However, I have also learned that 'SAME' padding means the output has the same size as the input, while 'VALID' means it can be different, and that tf.pad also pads zeros manually. So is there any difference between these two methods? And what is the purpose of this tf.pad?
In many real-world use cases there is no difference.
For instance, in some ImageNet architectures we often pad with 1 and then do a 3x3 convolution. The behaviour of the network is the same whether you first zero-pad with 1 and then convolve, or convolve with 'SAME' padding.
However, the behaviour will be different in non-standard situations. Remember that at a convolution layer you can define the kernel size AND the stride AND the dilation rate.
A counterexample where there is a difference between conv2d(SAME) and a symmetric tf.pad + conv2d(VALID):
Input: (7,7,1)
Kernel: (4,4)
Stride: (2,2)
conv2d(SAME) here pads asymmetrically (1 pixel on the top/left, 2 pixels on the bottom/right) and yields a (4, 4, 1) output, whereas a symmetric tf.pad of 1 pixel on each side followed by conv2d(VALID) yields a (3, 3, 1) output.
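A minimal sketch that checks the two output shapes (the random values are only there to make it runnable):
import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 7, 7, 1), dtype=tf.float32)  # input (7, 7, 1)
k = tf.constant(np.random.rand(4, 4, 1, 1), dtype=tf.float32)  # kernel (4, 4)

same = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='SAME')
padded = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])            # symmetric 1-pixel zero padding
valid = tf.nn.conv2d(padded, k, strides=[1, 2, 2, 1], padding='VALID')

print(same.shape)   # (1, 4, 4, 1)
print(valid.shape)  # (1, 3, 3, 1)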

Tensorflow convolution

I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array, and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my conv2d code:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the docs:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow allows you to provide a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d even if the size is dynamic.
But there's another problem. You didn't say what your task is, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all pooling operations, and that size must be static. If this is the case, I think the best solution is to scale your inputs to a common size before passing them into the convolutional network.
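For example, a minimal sketch of that rescaling step (tf.image.resize_images is the TF1-era API; the 224x224 target size and the 5x5x3x32 filter are illustrative assumptions, not part of the question):
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
resized = tf.image.resize_images(x, [224, 224])  # every image now has a static 224x224 spatial size
w = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
conv = tf.nn.conv2d(resized, w, strides=[1, 1, 1, 1], padding='SAME')
print(conv.shape)  # (?, 224, 224, 32)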
UPD:
Since it wasn't clear, here's how you can reshape any single image into a 4-D array:
import numpy as np

a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)    # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)

TensorFlow TypeError: Value passed to parameter input has DataType uint8 not in list of allowed values: float16, float32

I have been trying to get a simple CNN to train for the past 3 days.
First, I set up an input pipeline/queue configuration that reads images from a directory tree and prepares batches.
I got the code for this at this link. So I now have train_image_batch and train_label_batch that I need to feed to my CNN.
train_image_batch, train_label_batch = tf.train.batch(
    [train_image, train_label],
    batch_size=BATCH_SIZE
    # ,num_threads=1
)
And I am unable to figure out how. I am using the code for the CNN given at this link.
# Input Layer
input_layer = tf.reshape(train_image_batch, [-1, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS])
# Convolutional Layer #1
conv1 = new_conv_layer(input_layer, NUM_CHANNELS, 5, 32, 2)
# Pooling Layer #1
pool1 = new_pooling_layer(conv1, 2, 2)
Printing input_layer shows this:
Tensor("Reshape:0", shape=(5, 120, 120, 3), dtype=uint8)
The next line, conv1 = new_conv_layer(...), crashes with a TypeError. The body of the new_conv_layer function is given below:
def new_conv_layer(input,               # The previous layer.
                   num_input_channels,  # Num. channels in prev. layer.
                   filter_size,         # Width and height of each filter.
                   num_filters,         # Number of filters.
                   stride):
    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))

    # Create new biases, one for each filter.
    biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, stride, stride, 1],
                         padding='SAME')

    # Add the biases to the results of the convolution.
    # A bias-value is added to each filter-channel.
    layer += biases

    # Rectified Linear Unit (ReLU).
    # It calculates max(x, 0) for each input pixel x.
    # This adds some non-linearity to the formula and allows us
    # to learn more complicated functions.
    layer = tf.nn.relu(layer)

    # Note that ReLU is normally executed before the pooling,
    # but since relu(max_pool(x)) == max_pool(relu(x)) we can
    # save 75% of the relu-operations by max-pooling first.

    # We return both the resulting layer and the filter-weights
    # because we will plot the weights later.
    return layer, weights
Precisely, it crashes at tf.nn.conv2d with this error:
TypeError: Value passed to parameter 'input' has DataType uint8 not in list of allowed values: float16, float32
The image from your input pipeline is of type uint8; you need to cast it to float32. You can do this right after the image JPEG decoder:
image = tf.image.decode_jpeg(...
image = tf.cast(image, tf.float32)
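Alternatively, if you do not want to touch the decoder, you could cast the batched tensor just before the reshape; a minimal sketch reusing the names from the question:
train_image_batch = tf.cast(train_image_batch, tf.float32)  # uint8 -> float32
input_layer = tf.reshape(train_image_batch, [-1, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS])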
You need to cast your images from int to float. You can simply do so for your input images:
image = image.astype('float')
It works fine for me.

Image over Image convolution in Tensorflow

Assume I have two sets of images, A and B, each of shape 11x5x5x3, where 11 is the number of examples and 5x5x3 is the image dimension.
Is there an easy way in TensorFlow to apply, for each i, a convolution of A_i with B_i (i.e. B_i plays the filter role and A_i is the input in tf.nn.conv2d)? For example: conv2d(A_1, B_1), conv2d(A_2, B_2), ..., conv2d(A_11, B_11).
There is no weight learning here; I just want to apply a convolution of one image over another.
I tried to do it as follows:
# change B to 5x5x3x11 to be compatible with tf convolution.
tf.nn.conv2d(A, B, strides=[1, 1, 1, 1], padding='SAME')
but the problem with this is that it applies the convolution of every A_i over all the B_i's. I don't want this; I want A_i convolved only with B_j where i == j. Of course I could do it one by one, but that wouldn't be efficient, and I need to do it in batch mode.
Any comments on how to solve this problem?
Thanks.
J
I am not sure this is what you need, because it is not really batch mode, but you could use a map function:
A = tf.placeholder(dtype=tf.float32, shape=[None, 5, 5, 3])
B = tf.placeholder(dtype=tf.float32, shape=[None, 5, 5, 3])

output = tf.map_fn(
    lambda inputs: tf.nn.conv2d(
        tf.expand_dims(inputs[0], 0),  # H,W,C -> 1,H,W,C
        tf.expand_dims(inputs[1], 3),  # H,W,C -> H,W,C,1
        strides=[1, 1, 1, 1],
        padding="SAME"
    ),  # Result of each conv is 1,H,W,1
    elems=[A, B],
    dtype=tf.float32
)

final_output = output[:, 0, :, :, 0]  # B,1,H,W,1 -> B,H,W
Performance will depend on how well the tiny separate convolutions are parallelized, I guess.
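A possible usage sketch with random data, just to illustrate the resulting shape (the session-based style matches the placeholders above):
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    a = np.random.rand(11, 5, 5, 3).astype(np.float32)
    b = np.random.rand(11, 5, 5, 3).astype(np.float32)
    result = sess.run(final_output, feed_dict={A: a, B: b})
    print(result.shape)  # (11, 5, 5): one 5x5 response map per (A_i, B_i) pair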
