Assume I have two sets of images, A and B, each of shape 11x5x5x3, where 11 is the number of examples and 5x5x3 is the image dimension.
Is there an easy way in TensorFlow to convolve each image A_i with the corresponding B_i (i.e. B_i plays the filter role and A_i is the input to tf.nn.conv2d)? For example: conv2d(A_1, B_1), conv2d(A_2, B_2), ..., conv2d(A_11, B_11).
There is no weight learning here; I just want to convolve one image with another.
I tried to do it as follows:
# Change B to 5x5x3x11 to be compatible with the tf filter layout.
B_filt = tf.transpose(B, [1, 2, 3, 0])
tf.nn.conv2d(A, B_filt, strides=[1, 1, 1, 1], padding='SAME')
but the problem with this is that it convolves every A_i with all of the B_i's. I don't want that; I only want A_i convolved with B_j where i == j. Of course I could do it one by one, but that wouldn't be efficient, and I need to do it in batch mode.
Any comments on how to solve this problem?
Thanks,
J
I am not sure this is exactly what you need, because it is not really batch mode, but you could use a map function:
A = tf.placeholder(dtype=tf.float32, shape=[None, 5, 5, 3])
B = tf.placeholder(dtype=tf.float32, shape=[None, 5, 5, 3])
output = tf.map_fn(
    lambda inputs: tf.nn.conv2d(
        tf.expand_dims(inputs[0], 0),  # H,W,C -> 1,H,W,C
        tf.expand_dims(inputs[1], 3),  # H,W,C -> H,W,C,1
        strides=[1, 1, 1, 1],
        padding="SAME"
    ),  # Result of each conv is 1,H,W,1
    elems=[A, B],
    dtype=tf.float32
)
final_output = output[:, 0, :, :, 0]  # B,1,H,W,1 -> B,H,W
Performance will depend on how well the tiny separate convolutions get parallelized, I guess.
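To make the usage concrete, here is a minimal sketch of feeding the snippet above, assuming the same TF 1.x graph mode and that A, B, and final_output are the tensors defined above (the random arrays just stand in for your 11x5x5x3 data):
import numpy as np
import tensorflow as tf

A_data = np.random.rand(11, 5, 5, 3).astype(np.float32)  # the inputs A_i
B_data = np.random.rand(11, 5, 5, 3).astype(np.float32)  # the per-example "filters" B_i

with tf.Session() as sess:
    result = sess.run(final_output, feed_dict={A: A_data, B: B_data})
    print(result.shape)  # (11, 5, 5): one 5x5 map per pair (A_i, B_i)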
Group convolution is available in Keras using code here for example: https://github.com/tensorflow/tensorflow/issues/34024#issuecomment-552034933
However, for my specific application, I require that during training, each of the groups in the group convolution has the same weights. For example, if I have a tensor of shape 8x8x32 and I want to have groups = 2 and filter_size=3x3, then normal group convolution will use two 3x3x16 tensors to convolve across the first half of the 8x8x32 and the second half of the 8x8x32. I want to ensure that the two 3x3x16 tensors have the same weights, even during training.
I can do this by getting rid of my group convolution framework and splitting my 8x8x32 tensor into two 8x8x16 tensors, then running each of them through a single non-grouped 3x3x16 convolution. However, by not using a group convolution framework, the code runs slower because the tasks are not run in parallel.
How can I use the speed upgrade that group convolution offers in Keras while constraining the weights in each group to be the same?
Apparently, this is how group convolutions work in TensorFlow (at the moment, at least; it does not seem to be documented yet, so I guess it could change): given a batch img of shape (n, h, w, c) and a filter k of shape (kh, kw, c1, c2), it performs the convolution in g = c / c1 groups, and the result has c2 channels. c must be divisible by c1, and c2 must be a multiple of g. As I understand it, this means that, if we call the number of output channels per group a = c2 / g, the first group uses the filters k[:, :, :, :a], the second group k[:, :, :, a:2*a], and so on. If you want to use exactly the same filters for every convolution group, you just need to make the filters for a single group, with shape (kh, kw, c1, a), and then tile them g times along the last dimension.
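As a rough illustration of that tiling idea (the shapes come from the 8x8x32, groups=2 example; the names are mine, not from the referenced code, and the implicit grouped behavior of tf.nn.conv2d may require a recent TF version and/or a GPU):
import tensorflow as tf

g = 2                                             # number of groups
kernel_one_group = tf.Variable(
    tf.random.normal([3, 3, 16, 8]))              # (kh, kw, c1, a): filters for a single group
kernel = tf.tile(kernel_one_group, [1, 1, 1, g])  # (3, 3, 16, 16): every group sees the same weights

x = tf.random.normal([1, 8, 8, 32])               # (n, h, w, c) with c = g * c1
y = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')  # grouped conv, 16 output channels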
In the code that you reference, you would just need to make the following changes. The definition of self.kernel would change to:
# Make sure self.filters is divisible by self.groups
kernel_shape = self.kernel_size + (input_dim // self.groups, self.filters // self.groups)
# Filters for a single group
self.kernel_base = self.add_weight(
    name='kernel',
    shape=kernel_shape,
    initializer=self.kernel_initializer,
    regularizer=self.kernel_regularizer,
    constraint=self.kernel_constraint,
    trainable=True,
    dtype=self.dtype)
# Tile the filters for the rest of the groups
self.kernel = tf.tile(self.kernel_base, [1, 1, 1, self.groups])
Assuming you also want the bias to work in the same way, you would do the same with it:
if self.use_bias:
    # Bias for a single group
    self.bias_base = self.add_weight(
        name='bias',
        shape=(self.filters // self.groups,),
        initializer=self.bias_initializer,
        regularizer=self.bias_regularizer,
        constraint=self.bias_constraint,
        trainable=True,
        dtype=self.dtype)
    # Tile the bias for all groups (the bias is 1-D, so a single tile multiple)
    self.bias = tf.tile(self.bias_base, [self.groups])
else:
    self.bias = None
The rest of the code works as before: tf.nn.conv2d just uses self.kernel, and self.bias is added as usual.
New answer
Ah, I thought you wanted to be assured that the standard grouped convolution uses the same weights for both groups. Yes, it does!
The difference between what you want and what TensorFlow does is that you want to "concatenate" the results of the two groups, while TensorFlow probably "sums" them.
In this case, you can try to use a TimeDistributed convolution. I'm not sure if the performance is different from simply splitting the data in two as you wanted, but you will use a single layer and that might bring you a performance advantage:
inputs = Input((8, 8, 32))
grouped = Reshape((8, 8, 2, 16))(inputs)
grouped = Permute((3, 1, 2, 4))(grouped)  # this might be a performance drawback
conv_out = TimeDistributed(Conv2D(half_filters, (3, 3), ...))(grouped)
ungrouped = Permute((2, 3, 1, 4))(conv_out)
ungrouped = Reshape((x, y, 2 * half_filters))(ungrouped)
Old answer
If you calculate the shape of the weights with the following code from your source:
kernel_shape = self.kernel_size + (input_dim // self.groups, self.filters)
The result, according to your example will be:
(3, 3, 32//2, filters)
This does support the idea that there is only one half-size kernel for the two groups: there are only half as many parameters.
You can check this with a simple model:
inputs = Input((x, y, channels))
outputs1 = Conv2D(filters, 3, padding="same")(inputs)
outputs2 = GroupConv2D(filters, 3, groups=2, padding="same")(inputs)
model1 = Model(inputs, outputs1)
model2 = Model(inputs, outputs2)
model1.summary() #count the parameters
model2.summary() #expect half the parameters
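To put numbers on it (assuming filters = 64 and the kernel_shape from the referenced code; bias terms ignored, and this is plain arithmetic, not output from the models above):
regular = 3 * 3 * 32 * 64         # 18432 weights for the plain Conv2D
grouped = 3 * 3 * (32 // 2) * 64  # 9216 weights for the grouped layer, i.e. half as many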
I believe the tf.tile solution that @jdehesa wrote is your best choice.
However, if you happen to have your data in channels_first/NCHW format, you could also try reshaping from (n, g * c, h, w) to (n * g, c, h, w) before the convolution and then reshaping back afterwards.
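A rough TF 2.x sketch of that reshape trick, with illustrative names and shapes (the single shared kernel w has shape (kh, kw, c, f); note that conv2d with data_format='NCHW' typically requires a GPU):
import tensorflow as tf

n, g, c, f, h, w_ = 4, 2, 16, 8, 8, 8
x = tf.random.normal([n, g * c, h, w_])      # (n, g*c, h, w)
w = tf.random.normal([3, 3, c, f])           # one kernel shared by every group

x_split = tf.reshape(x, [n * g, c, h, w_])   # the groups become extra batch items
y = tf.nn.conv2d(x_split, w, strides=1, padding='SAME', data_format='NCHW')
y = tf.reshape(y, [n, g * f, h, w_])         # back to (n, g*f, h, w)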
After training the CNN model, I want to visualize or print out the weights. What can I do?
I cannot even print out the variables after training.
Thank you!
To visualize the weights, you can use a tf.image_summary() op to transform a convolutional filter (or a slice of a filter) into a summary proto, write it to a log using a tf.train.SummaryWriter, and visualize the log using TensorBoard.
Let's say you have the following (simplified) program:
# A simplified sketch; note the input and filter must be rank 4 for conv2d.
filter = tf.Variable(tf.truncated_normal([8, 8, 3, 16]))
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 3])
conv = tf.nn.conv2d(images, filter, strides=[1, 1, 1, 1], padding="SAME")
# More ops...
loss = ...
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.minimize(loss)
# Transpose to [out_channels, height, width, in_channels] so that each
# output channel becomes one RGB image in the summary.
filter_summary = tf.image_summary('filter', tf.transpose(filter, [3, 0, 1, 2]))
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('/tmp/logs', sess.graph_def)
sess.run(tf.initialize_all_variables())
for i in range(10000):
    sess.run(train_op)
    if i % 10 == 0:
        # Log a summary every 10 steps.
        summary_writer.add_summary(sess.run(filter_summary), i)
After doing this, you can start TensorBoard to visualize the logs in /tmp/logs, and you will be able to see a visualization of the filter.
Note that this trick visualizes depth-3 filters as RGB images (to match the channels of the input image). If you have deeper filters, or they don't make sense to interpret as color channels, you can use the tf.split() op to split the filter on the depth dimension, and generate one image summary per depth.
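A rough sketch of that per-depth approach, assuming a deeper filter of shape [height, width, in_channels, out_channels] and the same TF 0.x-era API as above (note the old tf.split(dim, num, value) argument order):
deep_filter = tf.Variable(tf.truncated_normal([8, 8, 16, 32]))
# One slice of shape [8, 8, 1, 32] per input channel.
slices = tf.split(2, 16, deep_filter)
for d, f_slice in enumerate(slices):
    # Move out_channels to the batch position -> [32, 8, 8, 1] grayscale images.
    imgs = tf.transpose(f_slice, [3, 0, 1, 2])
    tf.image_summary('filter/depth_%d' % d, imgs)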
Like @mrry said, you can use tf.image_summary. For example, for cifar10_train.py, you can put this code somewhere under def train(). Note how you access a variable under the scope 'conv1':
# Visualize conv1 features
with tf.variable_scope('conv1') as scope_conv:
    weights = tf.get_variable('weights')

    # scale weights to [0 255] and convert to uint8 (maybe change scaling?)
    x_min = tf.reduce_min(weights)
    x_max = tf.reduce_max(weights)
    weights_0_to_1 = (weights - x_min) / (x_max - x_min)
    weights_0_to_255_uint8 = tf.image.convert_image_dtype(weights_0_to_1, dtype=tf.uint8)

    # to tf.image_summary format [batch_size, height, width, channels]
    weights_transposed = tf.transpose(weights_0_to_255_uint8, [3, 0, 1, 2])

    # this will display random 3 filters from the 64 in conv1
    tf.image_summary('conv1/filters', weights_transposed, max_images=3)
If you want to visualize all your conv1 filters in one nice grid, you would have to organize them into a grid yourself. I did that today, so now I'd like to share a gist for visualizing conv1 as a grid
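This is not the gist mentioned above, but a rough numpy sketch of one way to tile the filters into a single grid image (the function name and padding are illustrative):
import numpy as np

def make_filter_grid(weights, rows, cols, pad=1):
    """weights: array of shape (height, width, in_channels, out_channels)."""
    h, w, c, n = weights.shape
    grid = np.zeros((rows * (h + pad), cols * (w + pad), c), dtype=weights.dtype)
    for idx in range(min(n, rows * cols)):
        r, col = divmod(idx, cols)
        grid[r * (h + pad):r * (h + pad) + h,
             col * (w + pad):col * (w + pad) + w, :] = weights[:, :, :, idx]
    return grid

# e.g. a 4x8 grid for 32 conv1 filters of shape (5, 5, 1, 32):
# grid = make_filter_grid(weights, rows=4, cols=8)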
You can extract the values as numpy arrays in the following way:
with tf.variable_scope('conv1', reuse=True) as scope_conv:
    W_conv1 = tf.get_variable('weights', shape=[5, 5, 1, 32])
    weights = W_conv1.eval()
    with open("conv1.weights.npz", "wb") as outfile:
        np.save(outfile, weights)
Note that you have to adjust the scope ('conv1' in my case) and the variable name ('weights' in my case).
Then it boils down to visualizing numpy arrays. One example of how to visualize them:
#!/usr/bin/env python
"""Visualize numpy arrays."""
import numpy as np
import scipy.misc

arr = np.load('conv1.weights.npz')
# Get each 5x5 filter from the 5x5x1x32 array
for filter_ in range(arr.shape[3]):
    # Get the 5x5x1 filter:
    extracted_filter = arr[:, :, :, filter_]
    # Get rid of the last dimension (hence get 5x5):
    extracted_filter = np.squeeze(extracted_filter)
    # display the filter (might be very small - you can resize the window)
    scipy.misc.imshow(extracted_filter)
Using the TensorFlow 2 API, there are several options:
Weights can be extracted using the get_weights() function.
weights_n = model.layers[n].get_weights()[0]
Biases can be extracted using the numpy() conversion method.
bias_n = model.layers[n].bias.numpy()
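For example, a minimal TF 2 sketch (the tiny untrained model here is just for illustration; with a trained model you would load it instead):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

kernel = model.layers[0].get_weights()[0]  # shape (3, 3, 1, 32)
bias = model.layers[0].bias.numpy()        # shape (32,)
print(kernel.shape, bias.shape)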
I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array, and I am having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow allows you to provide a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d even if the size is dynamic.
But there is another problem. You didn't say your task, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all pooling operations - this size must be static. If that is the case, I think the best solution is actually to scale your inputs to a common size before passing them into the convolutional network.
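A minimal sketch of that preprocessing step, assuming TF 1.x where tf.image.resize_images accepts a [height, width] size argument (the 224x224 target is an arbitrary choice):
import numpy as np
import tensorflow as tf

img = np.zeros([50, 178, 3], dtype=np.float32)        # any HxWx3 image
img_4d = img[np.newaxis, ...]                         # (1, 50, 178, 3)
resized = tf.image.resize_images(img_4d, [224, 224])  # fixed spatial size for the FC layers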
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array:
import numpy as np

a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)    # prints (50, 178, 3)

a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)
I have been trying to get a simple CNN to train for the past 3 days.
First, I set up an input pipeline/queue configuration that reads images from a directory tree and prepares batches.
I got the code for this at this link. So, I now have train_image_batch and train_label_batch that I need to feed to my CNN.
train_image_batch, train_label_batch = tf.train.batch(
    [train_image, train_label],
    batch_size=BATCH_SIZE
    # ,num_threads=1
)
And I am unable to figure out how to do that. I am using the CNN code given at this link.
# Input Layer
input_layer = tf.reshape(train_image_batch, [-1, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS])
# Convolutional Layer #1
conv1 = new_conv_layer(input_layer, NUM_CHANNELS, 5, 32, 2)
# Pooling Layer #1
pool1 = new_pooling_layer(conv1, 2, 2)
Printing input_layer shows this:
Tensor("Reshape:0", shape=(5, 120, 120, 3), dtype=uint8)
The next line, conv1 = new_conv_layer(...), crashes with a TypeError. The body of the new_conv_layer function is given below:
def new_conv_layer(input,               # The previous layer.
                   num_input_channels,  # Num. channels in prev. layer.
                   filter_size,         # Width and height of each filter.
                   num_filters,         # Number of filters.
                   stride):
    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))

    # Create new biases, one for each filter.
    biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, stride, stride, 1],
                         padding='SAME')

    # Add the biases to the results of the convolution.
    # A bias-value is added to each filter-channel.
    layer += biases

    # Rectified Linear Unit (ReLU).
    # It calculates max(x, 0) for each input pixel x.
    # This adds some non-linearity to the formula and allows us
    # to learn more complicated functions.
    layer = tf.nn.relu(layer)

    # Note that ReLU is normally executed before the pooling,
    # but since relu(max_pool(x)) == max_pool(relu(x)) we can
    # save 75% of the relu-operations by max-pooling first.

    # We return both the resulting layer and the filter-weights
    # because we will plot the weights later.
    return layer, weights
To be precise, it crashes at tf.nn.conv2d with this error:
TypeError: Value passed to parameter 'input' has DataType uint8 not in list of allowed values: float16, float32
The image from your input pipeline is of type 'uint8'; you need to cast it to 'float32'. You can do this right after the JPEG decoder:
image = tf.image.decode_jpeg(...
image = tf.cast(image, tf.float32)
You need to cast your image from int to float. You can simply do so for your input images:
image = image.astype('float')
It works fine for me.
I'm new to convolutional neural networks and to TensorFlow, and I need to implement a conv layer with the following parameters:
Conv. layer1: filter=11, channel=64, stride=4, Relu.
The API is the following:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I understand what stride is and that it should be [1, 4, 4, 1] in my case. But I do not understand how I should pass the filter parameter and padding.
Could someone help with it?
First, you need to create a filter variable:
W = tf.Variable(tf.truncated_normal(shape=[11, 11, 3, 64], stddev=0.1), dtype=tf.float32)
The first two fields of the shape parameter stand for the filter size, the third for the number of input channels (I guess your images have 3 channels), and the fourth for the number of output channels.
Now the output of the convolutional layer can be computed as follows:
conv1 = tf.nn.conv2d(input, W, strides=[1, 4, 4, 1], padding='SAME')
Here padding='SAME' stands for zero padding, so the spatial size of the image remains the same; input should have shape [batch, size1, size2, 3].
ReLU application is pretty straightforward:
conv1 = tf.nn.relu(conv1)