I want to translate code that uses torch.nn.functional.unfold from PyTorch to TensorFlow 2.
I saw in "How to replicate PyTorch's nn.functional.unfold function in Tensorflow?" and "Pytorch 'Unfold' equivalent in Tensorflow" that I need to use the tf.image.extract_patches() function.
I have:
import numpy as np
import torch
import tensorflow as tf

image = np.random.rand(2, 3, 32, 32)
torch_image = torch.tensor(image)
torch_x = torch.nn.functional.unfold(torch_image, (3, 3), dilation=1, padding=0, stride=1)
print(torch_x.shape)
tf_image = tf.convert_to_tensor(image)
tf_image = tf.transpose(tf_image, [0, 2, 3, 1])
tf_x = tf.image.extract_patches(tf_image, sizes=[1,3,3,1], strides=[1,1,1,1], rates=[1,1,1,1], padding="VALID")
print(tf_x.shape)
This code gives me an output torch_x with a shape of (2,27,900) and an output tf_x with a shape of (2,30,30,27).
I ran a small test:
a = sorted(list(torch_x.numpy().flatten()))
b = sorted(list(tf_x.numpy().flatten()))
print(set([i-j for i,j in zip(a,b)]))
It shows that all the values of tf_x are present in torch_x. But I don't know how to reshape tf_x to be equal to torch_x. I tried:
final_tf_x = tf.transpose(tf_x, [0, 3, 1, 2])
final_tf_x = tf.reshape(final_tf_x, [final_tf_x.shape[0], final_tf_x.shape[1], -1])
print(final_tf_x.shape)
print(np.abs(torch_x.numpy()-final_tf_x.numpy())<1e-8)
This gives me a tensor of the same shape as torch_x, but the two tensors are not equal elementwise. Can someone explain how to do this last step?
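A minimal sketch of the missing reordering step, based on my understanding that tf.image.extract_patches lays out each patch as (kernel_row, kernel_col, channel) while torch.nn.functional.unfold lays it out as (channel, kernel_row, kernel_col):

n, out_h, out_w, _ = tf_x.shape
c = image.shape[1]  # channels of the channels-first input, here 3
# Split the flattened patch axis back into (kh, kw, C), reorder to (C, kh, kw), then flatten.
tf_x6 = tf.reshape(tf_x, [n, out_h, out_w, 3, 3, c])            # (N, H', W', kh, kw, C)
tf_x6 = tf.transpose(tf_x6, [0, 5, 3, 4, 1, 2])                 # (N, C, kh, kw, H', W')
final_tf_x = tf.reshape(tf_x6, [n, c * 3 * 3, out_h * out_w])   # (N, C*kh*kw, L)
print(np.allclose(torch_x.numpy(), final_tf_x.numpy()))         # should print True

The key point is that the channel and kernel axes have to be interleaved differently, so a plain transpose of the 4-D patch tensor cannot produce the unfold layout; the patch axis must be split first.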
I'm a beginner at the PyTorch library, and I got stuck in an exercise.
The code below works for an input image of size 2x2. I'm trying to do the same thing but with an input image of size 4x4.
The code:
import torch
# Assume that we have a 2x2 input image
inputs = torch.tensor([[[[1., 2.],
[3., 4.]]]])
inputs.shape
Output: torch.Size([1, 1, 2, 2])
A fully connected layer, which maps the 4 input features to 2 outputs, would be computed as follows:
fc = torch.nn.Linear(4, 2)
weights = torch.tensor([[1.1, 1.2, 1.3, 1.4],
[1.5, 1.6, 1.7, 1.8]])
bias = torch.tensor([1.9, 2.0])
fc.weight.data = weights
fc.bias.data = bias
torch.relu(fc(inputs.view(-1, 4)))
Output: tensor([[14.9000, 19.0000]], grad_fn=<ReluBackward0>)
We can obtain the same outputs if we use a convolutional layer whose kernel size equals the size of the input feature array:
conv = torch.nn.Conv2d(in_channels=1,
out_channels=2,
kernel_size=inputs.squeeze(dim=(0)).squeeze(dim=(0)).size())
print(conv.weight.size())
print(conv.bias.size())
Output: torch.Size([2, 1, 2, 2])
Output: torch.Size([2])
conv.weight.data = weights.view(2, 1, 2, 2)
conv.bias.data = bias
torch.relu(conv(inputs))
Output: tensor([[[[14.9000]],
[[19.0000]]]], grad_fn=<ReluBackward0>)
We can also replace the fully connected layer with a convolutional layer if we reshape the input image into a num_inputs x 1 x 1 image:
conv = torch.nn.Conv2d(in_channels=4,
out_channels=2,
kernel_size=(1, 1))
conv.weight.data = weights.view(2, 4, 1, 1)
conv.bias.data = bias
torch.relu(conv(inputs.view(1, 4, 1, 1)))
Output: tensor([[[[14.9000]],
[[19.0000]]]], grad_fn=<ReluBackward0>)
So, based on this code, how can I feed in an image of size 4x4 and replace the fully connected layer with convolutional layers?
You simply need to change the shape of the input and reshape the weights for the 4x4 case.
inputs = torch.randn(1, 1, 4, 4)
fc = torch.nn.Linear(16, 2)
torch.relu(fc(inputs.view(-1, 16)))
# output
tensor([[0.0000, 0.2525]], grad_fn=<ReluBackward0>)
Now, for the conv layer:
conv = torch.nn.Conv2d(in_channels=1,
out_channels=2,
kernel_size=inputs.squeeze(dim=(0)).squeeze(dim=(0)).size())
conv.weight.data = fc.weight.data.view(2, 1, 4, 4)
conv.bias.data = fc.bias.data
torch.relu(conv(inputs))
# output
tensor([[[[0.0000]],
[[0.2525]]]], grad_fn=<ReluBackward0>)
You can read Converting FC layers to CONV layers if you are not sure how the conv layer parameters are derived.
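For completeness, a minimal sketch of the 1x1-kernel variant from the question, adapted to the 4x4 input (using the fc and inputs defined above):

# Reshape the 4x4 image into a 16 x 1 x 1 "image" and use a 1x1 convolution.
conv1x1 = torch.nn.Conv2d(in_channels=16, out_channels=2, kernel_size=(1, 1))
conv1x1.weight.data = fc.weight.data.view(2, 16, 1, 1)
conv1x1.bias.data = fc.bias.data
torch.relu(conv1x1(inputs.view(1, 16, 1, 1)))  # same values as the FC output above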
I want to apply tf.nn.max_pool() on a single image, but I get a result with dimensions that are totally different from the input:
import tensorflow as tf
import numpy as np
ifmaps_1 = tf.Variable(tf.random_uniform( shape=[ 7, 7, 3], minval=0, maxval=3, dtype=tf.int32))
ifmaps=tf.dtypes.cast(ifmaps_1, dtype=tf.float64)
ofmaps_tf = tf.nn.max_pool([ifmaps], ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME")[0] # no padding
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print("ifmaps_tf = ")
    print(ifmaps.eval())
    print("ofmaps_tf = ")
    result = sess.run(ofmaps_tf)
    print(result)
I think this is related to applying pooling to a single example rather than a batch. I need to do the pooling on a single example.
Any help is appreciated.
Your input is (7, 7, 3), the kernel size is (3, 3), and the stride is (2, 2). So if you do not want any padding (as stated in your comment), you should use padding="VALID", which will return a (3, 3) tensor as output. If you use padding="SAME", it will return a (4, 4) tensor.
Usually, the formula for calculating the output size with SAME padding is:
out_size = ceil(in_size / stride)
For VALID padding it is:
out_size = ceil((in_size - filter_size + 1) / stride)
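As a quick check of these formulas for the (7, 7, 3) input, (3, 3) kernel and (2, 2) stride above:

import math

in_size, filter_size, stride = 7, 3, 2
print(math.ceil(in_size / stride))                      # SAME  -> 4
print(math.ceil((in_size - filter_size + 1) / stride))  # VALID -> 3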
When I build an FCN for segmentation, I want the images to keep the original input size, so I use fully convolutional layers. When I choose a fixed input size such as (224, 224), the transposed conv works fine. However, when I changed the code from (224, 224) to (h, w), I get the following error. I googled it before asking, but I didn't figure it out. Can anyone help me? Thanks.
Error information:
InvalidArgumentError (see above for traceback): Conv2DSlowBackpropInput: Size
of out_backprop doesn't match computed: actual = 62, computed =
63spatial_dim: 2 input: 500 filter: 16 output: 62 stride: 8 dilation: 1
[[Node: deconv_layer/conv2d_transpose_2 =
Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1],
padding="SAME", strides=[1, 1, 8, 8], use_cudnn_on_gpu=true,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]
(deconv_layer/conv2d_transpose_2-0-VecPermuteNHWCToNCHW-
LayoutOptimizer/_1961, deconv_layer/deconv3/kernel/read,
deconv_layer/Add_1)]]
[[Node: losses/_2091 = _Recv[client_terminated=false,
recv_device="/job:localhost/replica:0/task:0/device:CPU:0",
send_device="/job:localhost/replica:0/task:0/device:GPU:0",
send_device_incarnation=1, tensor_name="edge_4480_losses",
tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]
()]]
Code:
with tf.variable_scope("deconv_layer"):
    deconv_shape1 = block2.get_shape()
    W_t1 = deconv_utils.weight_variable([4, 4, deconv_shape1[3].value, 2048],
                                        name="deconv1/kernel")
    b_t1 = deconv_utils.bias_variable([deconv_shape1[3].value],
                                      name="deconv1/biases")
    deconv_t1 = deconv_utils.conv2d_transpose_strided(block4, W_t1, b_t1,
                                                      output_shape=tf.shape(block2))
    fuse1 = tf.add(deconv_t1, block2)
    print("deconv_t1: ", deconv_t1.shape)
    print("fuse_1: ", fuse1.shape)
    tf.identity(fuse1, name="fuse1")

    deconv_shape2 = block1.get_shape()
    W_t2 = deconv_utils.weight_variable([4, 4, deconv_shape2[3].value,
                                         deconv_shape1[3].value], name="deconv2/kernel")
    b_t2 = deconv_utils.bias_variable([deconv_shape2[3].value],
                                      name="deconv2/biases")
    deconv_t2 = deconv_utils.conv2d_transpose_strided(fuse1, W_t2, b_t2,
                                                      output_shape=tf.shape(block1))
    fuse2 = tf.add(deconv_t2, block1)
    print("deconv_t2: ", deconv_t2.shape)
    print("fuse2: ", fuse2.shape)
    tf.identity(fuse2, name="fuse2")

    shape = tf.shape(features)
    deconv_shape3 = tf.stack([shape[0], shape[1], shape[2], num_classes])
    W_t3 = deconv_utils.weight_variable([16, 16, num_classes,
                                         deconv_shape2[3].value], name="deconv3/kernel")
    b_t3 = deconv_utils.bias_variable([num_classes], name="deconv3/biases")
    deconv_t3 = deconv_utils.conv2d_transpose_strided(fuse2, W_t3, b_t3,
                                                      output_shape=deconv_shape3, stride=8)
    print("deconv_t3: ", deconv_t3.shape)
The version without custom functions is here:
with tf.variable_scope("deconv_layer"):
    deconv1_shape = block2.get_shape()
    shape1 = [4, 4, deconv1_shape[3].value, 2048]
    deconv1_kernel = tf.Variable(initial_value=tf.truncated_normal(shape1, stddev=0.02),
                                 trainable=True,
                                 name="deconv1/kernel")
    deconv1 = tf.nn.conv2d_transpose(value=block4,
                                     filter=deconv1_kernel,
                                     # output_shape=[BATCH_SIZE, tf.shape(block2)[1], tf.shape(block2)[2], 512],
                                     output_shape=tf.shape(block2),
                                     strides=[1, 2, 2, 1],
                                     padding='SAME',
                                     data_format='NHWC')
    print('deconv1', deconv1.shape)
    fuse1 = tf.add(deconv1, block2)  # fuse1 = pool4 + deconv2(pool5)
    tf.identity(fuse1, name="fuse1")

    deconv2_shape = block1.get_shape()
    shape2 = [4, 4, deconv2_shape[3].value, deconv1_shape[3].value]
    deconv2_kernel = tf.Variable(initial_value=tf.truncated_normal(shape2, stddev=0.02),
                                 trainable=True,
                                 name="deconv2/kernel")
    deconv2 = tf.nn.conv2d_transpose(value=fuse1,
                                     filter=deconv2_kernel,
                                     output_shape=tf.shape(block1),
                                     strides=[1, 2, 2, 1],
                                     padding='SAME',
                                     data_format='NHWC')
    print('deconv2', deconv2.shape)
    fuse2 = tf.add(deconv2, block1)
    tf.identity(fuse2, name="fuse2")

    deconv3_shape = tf.stack([tf.shape(features)[0], tf.shape(features)[1],
                              tf.shape(features)[2], num_classes])
    shape3 = [16, 16, num_classes, deconv2_shape[3].value]
    deconv_final_kernel = tf.Variable(initial_value=tf.truncated_normal(shape3, stddev=0.02),
                                      trainable=True,
                                      name="deconv3/kernel")
    seg_logits = tf.nn.conv2d_transpose(value=fuse2,
                                        filter=deconv_final_kernel,
                                        output_shape=deconv3_shape,
                                        strides=[1, 8, 8, 1],
                                        padding='SAME',
                                        data_format='NHWC')
The conv net and the deconv net in an FCN, which are built with different structures, may not be consistent with each other. In this case, the conv net uses conv with padding='VALID', while the deconv net uses conv_transpose with padding='SAME' everywhere. Thus the shapes do not match, which causes the problem above.
This is because your stride is > 1. The calculation cannot be correct in all cases. This GitHub post explains it.
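As I read the error above (input: 500, stride: 8), a quick arithmetic check shows where the 62 vs. 63 mismatch comes from:

import math

in_size, stride = 500, 8
print(math.ceil(in_size / stride))  # 63, the "computed" value with SAME padding
# The forward conv path produced 62 ("actual"), so the gradient op rejects the shapes.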
I had a similar issue while trying to replicate PyTorch's ConvTranspose2d function in TensorFlow. I was padding the input manually before passing it to conv2d_transpose() and padding the deconvolved output again. This was why the graph initialized properly but there was an error when calculating the gradients. I solved it by removing all manual padding and switching to padding="SAME" inside the function. I guess this is handled internally by the function. Correct me if I am wrong. I don't know how much this affects the actual output.
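If it helps, here is a minimal sketch of that fix with hypothetical shapes (not the original code): drop the manual tf.pad calls and rely on padding="SAME" inside conv2d_transpose:

x = tf.placeholder(tf.float32, [1, 32, 32, 16])
kernel = tf.Variable(tf.truncated_normal([4, 4, 8, 16], stddev=0.02))
# No tf.pad before or after; SAME padding is handled inside the op.
y = tf.nn.conv2d_transpose(x, kernel, output_shape=[1, 64, 64, 8],
                           strides=[1, 2, 2, 1], padding="SAME")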
The conv1d_transpose op is not yet in the stable version of TensorFlow, but an implementation is available on GitHub.
I would like to create a 1D deconvolution network. The shape of the input is [-1, 256, 16] and the output should be [-1,1024,8]. The kernel's size is 5 and the stride is 4.
I tried to build a 1D convolutional layer with this function:
(output_depth, input_depth) = (8, 16)
kernel_width = 7
f_shape = [kernel_width, output_depth, input_depth]
layer_1_filter = tf.Variable(tf.random_normal(f_shape))
layer_1 = tf_exp.conv1d_transpose(
x,
layer_1_filter,
[-1,1024,8],
stride=4, padding="VALID"
)
The shape of layer_1 is TensorShape([Dimension(None), Dimension(None), Dimension(None)]), but it should be [-1,1024,8]
What am I doing wrong? How is it possible to implement 1D deconvolution in TensorFlow?
The pull request is open as of this moment, so the API and behavior can and probably will change. Some features that one might expect from conv1d_transpose aren't supported:
output_shape requires batch size to be known statically, can't pass -1;
on the other hand, the output shape is dynamic (this explains the None dimensions).
Also, kernel_width=7 expects in_width=255, not 256. You should make kernel_width at most 4 to match in_width=256. The result is this demo code:
import numpy as np
import tensorflow as tf
# conv1d_transpose here comes from the GitHub pull request mentioned above.

x = tf.placeholder(shape=[None, 256, 16], dtype=tf.float32)
filter = tf.Variable(tf.random_normal([3, 8, 16]))  # [kernel_width, output_depth, input_depth]
out = conv1d_transpose(x, filter, output_shape=[100, 1024, 8], stride=4, padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(out, feed_dict={x: np.zeros([100, 256, 16])})
    print(result.shape)  # prints (100, 1024, 8)
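As a sanity check on the kernel-width point above, assuming conv1d_transpose enforces the standard VALID-padding size relation (a forward VALID conv over the requested output width must reproduce the input width):

def valid_conv_width(out_width, kernel_width, stride):
    # Width a forward VALID conv would produce from out_width.
    return (out_width - kernel_width) // stride + 1

print(valid_conv_width(1024, 7, 4))  # 255 -> incompatible with in_width=256
print(valid_conv_width(1024, 3, 4))  # 256 -> consistent with the demo above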
The new tf.contrib.nn.conv1d_transpose is now added to Tensorflow API r1.8.
I would like to train a network with two different shapes of input tensor. Each epoch chooses one type.
Here is a small piece of code:
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    imgs1 = tf.placeholder(tf.float32, [4, 224, 224, 3], name='input_imgs1')
    imgs2 = tf.placeholder(tf.float32, [4, 180, 180, 3], name='input_imgs2')
    epoch_num_tf = tf.placeholder(tf.int32, [], name='input_epoch_num')

    imgs = tf.cond(tf.equal(tf.mod(epoch_num_tf, 2), 0),
                   lambda: tf.Print(imgs2, [imgs2.get_shape()], message='(even number) input epoch number is '),
                   lambda: tf.Print(imgs1, [imgs1.get_shape()], message='(odd number) input epoch number is'))

    print(imgs.get_shape())

    for epoch in range(10):
        epoch_num = np.array(epoch).astype(np.int32)
        imgs1_input = np.ones([4, 224, 224, 3], dtype=np.float32)
        imgs2_input = np.ones([4, 180, 180, 3], dtype=np.float32)
        output = sess.run(imgs, feed_dict={epoch_num_tf: epoch_num,
                                           imgs1: imgs1_input,
                                           imgs2: imgs2_input})
When I execute it, the output of imgs.get_shape() is (4, ?, ?, 3)
i.e. imgs.get_shape()[1]=None, imgs.get_shape()[2]=None.
But in later code I want to use the output of imgs.get_shape() to define the kernel size (ksize) and stride size (strides) of tf.nn.max_pool(), e.g. ksize=[1, imgs.get_shape()[1]/6, imgs.get_shape()[2]/6, 1].
I think ksize and strides cannot accept tf.Tensor values.
How to solve this problem? Or how to set the shape of imgs conditionally?
When you call print(imgs.get_shape()), you are getting the static shape of the tensor imgs. Dimensions 1 and 2 of imgs vary dynamically with the value of epoch_num_tf, so the static shape in those dimensions is unknown, which TensorFlow represents as None.
If you want to use the dynamic shape of imgs in subsequent code, you should use the tf.shape() operator to get the shape as a tf.Tensor. For example, instead of imgs.get_shape()[2], you can use tf.shape(imgs)[2].
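For example (a minimal sketch contrasting the two on the imgs tensor above):

print(imgs.get_shape().as_list())  # static shape, e.g. [4, None, None, 3]
dynamic_w = tf.shape(imgs)[2]      # dynamic width as a scalar int32 tensor, 180 or 224 per epoch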
Unfortunately, the ksize and strides arguments of tf.nn.max_pool() do not accept tf.Tensor values. (I think this is a historical limitation, because these were configured as "attrs" rather than "inputs" of the corresponding kernel. Please open a GitHub issue if you'd like to request this feature!) One possible workaround would be to use another tf.cond():
imgs = ...
# Could also use `tf.equal(tf.mod(epoch_num_tf, 2), 0)` as the predicate.
pool_output = tf.cond(tf.equal(tf.shape(imgs)[2], 180),
lambda: tf.nn.max_pool(imgs, ksize=[1, 180/6, 180/6, 1], ...),
lambda: tf.nn.max_pool(imgs, ksize=[1, 224/6, 224/6, 1], ...))