How to use the conv1d_transpose in Tensorflow? - python

conv1d_transpose is not yet in the stable version of TensorFlow, but an implementation is available on GitHub.
I would like to create a 1D deconvolution network. The shape of the input is [-1, 256, 16] and the output should be [-1,1024,8]. The kernel's size is 5 and the stride is 4.
I tried to build a 1D convolutional layer with this function:
(output_depth, input_depth) = (8, 16)
kernel_width = 7
f_shape = [kernel_width, output_depth, input_depth]
layer_1_filter = tf.Variable(tf.random_normal(f_shape))
layer_1 = tf_exp.conv1d_transpose(
    x,
    layer_1_filter,
    [-1, 1024, 8],
    stride=4, padding="VALID"
)
The shape of layer_1 is TensorShape([Dimension(None), Dimension(None), Dimension(None)]), but it should be [-1,1024,8].
What am I doing wrong? How can I implement 1D deconvolution in TensorFlow?

The pull request is still open as of this moment, so the API and behavior can and probably will change. Some features that one might expect from conv1d_transpose aren't supported:
output_shape requires the batch size to be known statically, so you can't pass -1;
on the other hand, the shape of the output is dynamic (this explains the None dimensions).
Also, kernel_width=7 expects in_width=255, not 256. Make kernel_width less than 4 to match in_width=256. The result is this demo code:
import numpy as np
import tensorflow as tf
# conv1d_transpose is the function from the open pull request (its import is not shown here).
x = tf.placeholder(shape=[None, 256, 16], dtype=tf.float32)
filter = tf.Variable(tf.random_normal([3, 8, 16]))  # [kernel_width, output_depth, input_depth]
out = conv1d_transpose(x, filter, output_shape=[100, 1024, 8], stride=4, padding="VALID")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(out, feed_dict={x: np.zeros([100, 256, 16])})
    print(result.shape)  # prints (100, 1024, 8)
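The width arithmetic above can be checked without TensorFlow. This is a minimal sketch, assuming the usual VALID rule that a forward convolution over an input of width W produces ceil((W - kernel_width + 1) / stride) positions; the transposed convolution then expects its input to have exactly that width. The helper name valid_conv_width is made up for illustration.

```python
import math

def valid_conv_width(out_width, kernel_width, stride):
    """Width of a VALID forward convolution over an input of `out_width`.

    A transposed convolution that should produce `out_width` expects its
    own input to have this width.
    """
    return math.ceil((out_width - kernel_width + 1) / stride)

# Desired output width 1024 with stride 4, as in the question.
print(valid_conv_width(1024, 7, 4))  # 255: does not match in_width=256
print(valid_conv_width(1024, 3, 4))  # 256: matches in_width=256
```

With kernel_width=7 the expected input width is 255, which is why the original call could not work with a 256-wide input.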

The new tf.contrib.nn.conv1d_transpose is now added to Tensorflow API r1.8.

Related

Equivalent of pytorch unfold in tensorflow

I want to translate code from PyTorch that uses torch.nn.functional.unfold to TensorFlow 2.
I saw in How to replicate PyTorch's nn.functional.unfold function in Tensorflow? and Pytorch "Unfold" equivalent in Tensorflow that I need to use the tf.image.extract_patches() function.
I have:
import numpy as np
import tensorflow as tf
import torch
image = np.random.rand(2,3,32,32)
torch_image = torch.tensor(image)
torch_x = torch.nn.functional.unfold(torch_image, (3,3), dilation=1, padding=0, stride=1)
print(torch_x.shape)
tf_image = tf.convert_to_tensor(image)
tf_image = tf.transpose(tf_image, [0, 2, 3, 1])
tf_x = tf.image.extract_patches(tf_image, sizes=[1,3,3,1], strides=[1,1,1,1], rates=[1,1,1,1], padding="VALID")
print(tf_x.shape)
This code gives me an output torch_x with a shape of (2,27,900) and an output tf_x with a shape of (2,30,30,27).
I ran a small test:
a = sorted(list(torch_x.numpy().flatten()))
b = sorted(list(tf_x.numpy().flatten()))
print(set([i-j for i,j in zip(a,b)]))
It turns out that all the values of tf_x are in torch_x. But I don't know how to reshape tf_x to be equal to torch_x. I tried:
final_tf_x = tf.transpose(tf_x, [0, 3, 1, 2])
final_tf_x = tf.reshape(final_tf_x, [final_tf_x.shape[0], final_tf_x.shape[1], -1])
print(final_tf_x.shape)
print(np.abs(torch_x.numpy()-final_tf_x.numpy())<1e-8)
It gives me a tensor of the same shape as torch_x, but the two tensors are not equal elementwise. Can someone explain to me how to do this last step?
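One possible explanation is the ordering inside the 27-element patch axis: unfold lays each patch out as (channel, kernel row, kernel column), while extract_patches lays it out as (kernel row, kernel column, channel). The NumPy-only sketch below builds both layouts from the same sliding windows and converts one into the other. It is an illustration under that layout assumption, not code run against either library, and the helper name tf_patches_to_torch_unfold is made up here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def tf_patches_to_torch_unfold(tf_x, channels, kh, kw):
    """Reorder (B, out_h, out_w, kh*kw*C) patches into (B, C*kh*kw, out_h*out_w)."""
    b, out_h, out_w, _ = tf_x.shape
    x = tf_x.reshape(b, out_h, out_w, kh, kw, channels)
    x = x.transpose(0, 5, 3, 4, 1, 2)  # (B, C, kh, kw, out_h, out_w)
    return x.reshape(b, channels * kh * kw, out_h * out_w)

image = np.random.rand(2, 3, 32, 32)
# All 3x3 windows with stride 1: shape (B, C, out_h, out_w, kh, kw).
windows = sliding_window_view(image, (3, 3), axis=(2, 3))

# Layout assumed for torch.nn.functional.unfold: channel-major patches.
torch_like = windows.transpose(0, 1, 4, 5, 2, 3).reshape(2, 27, 900)
# Layout assumed for tf.image.extract_patches: channel-last patches.
tf_like = windows.transpose(0, 2, 3, 4, 5, 1).reshape(2, 30, 30, 27)

converted = tf_patches_to_torch_unfold(tf_like, channels=3, kh=3, kw=3)
print(np.array_equal(converted, torch_like))
```

If the layout assumption holds, the same reshape, transpose, reshape sequence written with tf.reshape and tf.transpose should bring tf_x into the layout of torch_x.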

Tensorflow: CNN training converges at a vector of zeros

I'm a beginner in deep learning and have taken a few courses on Udacity. Recently I have been trying to build a deep network that detects hand joints in input depth images, but it doesn't seem to be working well. (My dataset is the ICVL Hand Posture Dataset.)
The network structure is shown here.
① A batch of input images, 240x320;
② An 8-channel convolutional layer with a 5x5 kernel;
③ A max pooling layer, ksize = stride = 2;
④ A fully-connected layer, weight.shape = [38400, 1024];
⑤ A fully-connected layer, weight.shape = [1024, 48].
After several epochs of training, the output of the last layer converges to a (0, 0, ..., 0) vector. I chose the mean squared error as the loss function; its value stayed above 40000 and didn't seem to decrease.
The network structure is already too simple to be simplified further, but the problem remains. Could anyone offer any suggestions?
My main code is posted below:
image = tf.placeholder(tf.float32, [None, 240, 320, 1])
annotations = tf.placeholder(tf.float32, [None, 48])
W_convolution_layer1 = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
b_convolution_layer1 = tf.Variable(tf.constant(0.1, shape=[8]))
h_convolution_layer1 = tf.nn.relu(
tf.nn.conv2d(image, W_convolution_layer1, [1, 1, 1, 1], 'SAME') + b_convolution_layer1)
h_pooling_layer1 = tf.nn.max_pool(h_convolution_layer1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
W_fully_connected_layer1 = tf.Variable(tf.truncated_normal([120 * 160 * 8, 1024], stddev=0.1))
b_fully_connected_layer1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pooling_flat = tf.reshape(h_pooling_layer1, [-1, 120 * 160 * 8])
h_fully_connected_layer1 = tf.nn.relu(
tf.matmul(h_pooling_flat, W_fully_connected_layer1) + b_fully_connected_layer1)
W_fully_connected_layer2 = tf.Variable(tf.truncated_normal([1024, 48], stddev=0.1))
b_fully_connected_layer2 = tf.Variable(tf.constant(0.1, shape=[48]))
detection = tf.nn.relu(
tf.matmul(h_fully_connected_layer1, W_fully_connected_layer2) + b_fully_connected_layer2)
mean_squared_error = tf.reduce_sum(tf.losses.mean_squared_error(annotations, detection))
training = tf.train.AdamOptimizer(1e-4).minimize(mean_squared_error)
# This data loader reads images and annotations and converts them into batches of numbers.
loader = ICVLDataLoader('../data/')
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(1000):
        # batch_images: a list with shape = [BATCH_SIZE, 240, 320, 1]
        # batch_annotations: a list with shape = [BATCH_SIZE, 48]
        [batch_images, batch_annotations] = loader.get_batch(100).to_1d_list()
        # Fetch and feed the `image` placeholder defined above.
        [x_, t_, l_, p_] = session.run([image, training, mean_squared_error, detection],
                                       feed_dict={image: batch_images, annotations: batch_annotations})
And it runs like this.
The main issue is likely the ReLU activation in the output layer. You should remove it, i.e. let detection simply be the result of the matrix multiplication plus the bias. If you want to force the outputs to be positive, consider something like the exponential function instead.
While ReLU is a popular hidden activation, I see one major problem with using it as an output activation: as is well known, ReLU maps negative inputs to 0, but crucially, the gradients there are also 0. In the output layer this means your network cannot learn from its mistakes whenever the pre-activation of an output unit is negative (which is likely to happen with random initialization). This will likely heavily impair the overall learning process.
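The zero-gradient argument can be seen in a few lines of NumPy. This is a minimal sketch with made-up numbers, not the asker's network: it computes the gradient of a mean squared error with respect to the output pre-activations, once with a ReLU on the output and once without. The function name mse_grad_wrt_preactivation is hypothetical.

```python
import numpy as np

def mse_grad_wrt_preactivation(z, target, use_relu):
    """Gradient of mean((f(z) - target)**2) with respect to z."""
    out = np.maximum(z, 0.0) if use_relu else z
    grad_out = 2.0 * (out - target) / z.size
    # ReLU only passes gradient where the pre-activation is positive.
    return grad_out * (z > 0) if use_relu else grad_out

z = np.array([-1.5, -0.3, 0.8])        # pre-activations of three output units
target = np.array([40.0, 25.0, 10.0])  # positive regression targets

grad_relu = mse_grad_wrt_preactivation(z, target, use_relu=True)
grad_linear = mse_grad_wrt_preactivation(z, target, use_relu=False)
print(grad_relu)    # zero for the two units with negative pre-activation
print(grad_linear)  # nonzero for every unit
```

The two units that start out negative receive no gradient through the ReLU even though their error is large, so nothing pushes them toward the targets.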

Tensorflow: Recurrent Neural Network Batch Training

I am trying to implement an RNN in TensorFlow. For practice, I am writing my own functions instead of using RNN cells.
The problem is sequence tagging. The input size is [32, 48, 900], where 32 is the batch size, 48 is the number of time steps and 900 is the vocabulary size (one-hot encoded vectors). The output is [32, 48, 145], where the first two dimensions are the same as the input and the last dimension is the output vocabulary size (one-hot). Basically this is an NLP tagging problem.
I am getting the following error:
InvalidArgumentError (see above for traceback): logits and labels must
be same size: logits_size=[48,145] labels_size=[1536,145]
The actual labels_size is [32, 48, 145] but it merges first two dimensions without my control. FYI 32*48 = 1536
If I run my RNN with batch size 1, it works fine as expected. I could not figure out how to solve the issue. I am getting the problem in the last line of the code.
I pasted the related part of the code:
inputs = tf.placeholder(shape=[None, self.seq_length, self.vocab_size], dtype=tf.float32, name="inputs")
targets = tf.placeholder(shape=[None, self.seq_length, self.output_vocab_size], dtype=tf.float32, name="targets")
init_state = tf.placeholder(shape=[1, self.hidden_size], dtype=tf.float32, name="state")
initializer = tf.random_normal_initializer(stddev=0.1)
with tf.variable_scope("RNN") as scope:
    hs_t = init_state
    ys = []
    for t, xs_t in enumerate(tf.split(inputs[0], self.seq_length, axis=0)):
        if t > 0: scope.reuse_variables()
        Wxh = tf.get_variable("Wxh", [self.vocab_size, self.hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [self.hidden_size, self.hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [self.hidden_size, self.output_vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [self.hidden_size], initializer=initializer)
        by = tf.get_variable("by", [self.output_vocab_size], initializer=initializer)
        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)
        ys_t = tf.matmul(hs_t, Why) + by
        ys.append(ys_t)
    hprev = hs_t
output_softmax = tf.nn.softmax(ys) # Get softmax for sampling
#outputs = tf.concat(ys, axis=0)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=ys))
The problem is probably the size of ys. It should have shape [32, 48, 145], but because the loop only processes inputs[0], ys only has shape [48, 145]. With a batch size of 1, the target shape is [1, 48, 145], which matches [48, 145] once the first two dimensions are merged, so the error does not appear.
To solve the problem, you can add a loop over the batch dimension instead of using only inputs[0], such as:
for i in range(batch_size):  # batch_size: the batch size as a Python int, e.g. 32
    for t, xs_t in enumerate(tf.split(inputs[i], self.seq_length, axis=0)):
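An alternative to looping over examples is to keep the batch dimension inside every matrix multiplication and slice along the time axis instead. The NumPy sketch below illustrates only the shapes of that approach. It is not the asker's TensorFlow code, hidden_size=64 is an arbitrary choice, and the initial state is assumed to have one row per example rather than shape [1, hidden_size].

```python
import numpy as np

batch_size, seq_length, vocab_size = 32, 48, 900
hidden_size, output_vocab_size = 64, 145  # hidden_size is an arbitrary choice

rng = np.random.default_rng(0)
inputs = rng.random((batch_size, seq_length, vocab_size))

Wxh = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Why = rng.normal(scale=0.1, size=(hidden_size, output_vocab_size))
bh = np.zeros(hidden_size)
by = np.zeros(output_vocab_size)

hs_t = np.zeros((batch_size, hidden_size))  # one state row per example
ys = []
for t in range(seq_length):
    xs_t = inputs[:, t, :]                  # [batch_size, vocab_size]
    hs_t = np.tanh(xs_t @ Wxh + hs_t @ Whh + bh)
    ys.append(hs_t @ Why + by)              # [batch_size, output_vocab_size]

logits = np.stack(ys, axis=1)               # time becomes axis 1 again
print(logits.shape)  # (32, 48, 145)
```

The stacked logits have the same shape as the targets, so a per-step softmax cross-entropy can be applied without any implicit merging of dimensions.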

Tensorflow conv2d_transpose output_shape

I want to implement a generative adversarial network (GAN) with unfixed input size, i.e. a 4-D tensor of shape (Batch_size, None, None, 3).
But conv2d_transpose has an output_shape parameter, and it must be given the true size after the deconvolution operation.
For example, if batch_img has shape (64, 32, 32, 128) and w is a weight of shape (3, 3, 64, 128), then after
deconv = tf.nn.conv2d_transpose(batch_img, w, output_shape=[64, 64, 64, 64], strides=[1,2,2,1], padding='SAME')
I get deconv with shape (64, 64, 64, 64), which is fine as long as I pass the true size as output_shape.
But I want to use an unfixed input size (64, None, None, 128) and get deconv with shape (64, None, None, 64).
That raises the error below.
TypeError: Failed to convert object of type <type'list'> to Tensor...
So what can I do to avoid this parameter in the deconvolution? Or is there another way to implement a GAN with unfixed input size?
The output_shape list cannot contain None, because a None object cannot be converted to a Tensor.
None is only allowed in the shapes of tf.placeholder.
For a varying-size output_shape, try -1 instead of None. For example, if you want size (64, None, None, 128), try [64, -1, -1, 128]. I am not sure whether this will work in your case. It worked for me when the batch size (my first dimension) was not fixed, so I used -1 there.
However, there is also a high-level API for transposed convolution, tf.layers.conv2d_transpose().
I expect the high-level API tf.layers.conv2d_transpose() will work for you because it accepts input tensors of varying size.
You do not even need to specify the output shape; you just need to specify the number of output channels and the kernel size.
For more details: https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose. I hope this helps.
I ran into this problem too. Using -1, as suggested in the other answer here, doesn't work. Instead, you have to grab the shape of the incoming tensor and construct the output_shape argument from it. Here's an excerpt from a test I wrote. In this case it's the first dimension that's unknown, but it should work for any combination of known and unknown dimensions.
output_shape = [8, 8, 4] # width, height, channels-out. Handle batch size later
xin = tf.placeholder(dtype=tf.float32, shape = (None, 4, 4, 2), name='input')
filt = tf.placeholder(dtype=tf.float32, shape = filter_shape, name='filter')
## Find the batch size of the input tensor and add it to the front
## of output_shape
dimxin = tf.shape(xin)
ncase = dimxin[0:1]
oshp = tf.concat([ncase,output_shape], axis=0)
z1 = tf.nn.conv2d_transpose(xin, filt, oshp, strides=[1,2,2,1], name='xpose_conv')
I found a solution: use tf.shape for the unspecified dimensions and get_shape() for the specified ones.
def get_deconv_lens(H, k, d):
    return tf.multiply(H, d) + k - 1
def deconv2d(x, output_shape, k_h=2, k_w=2, d_h=2, d_w=2, stddev=0.02, name='deconv2d'):
    # output_shape: the output_shape of deconv op
    shape = tf.shape(x)
    H, W = shape[1], shape[2]
    N, _, _, C = x.get_shape().as_list()
    H1 = get_deconv_lens(H, k_h, d_h)
    W1 = get_deconv_lens(W, k_w, d_w)
    with tf.variable_scope(name):
        w = tf.get_variable('weights', [k_h, k_w, C, x.get_shape()[-1]], initializer=tf.random_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[C], initializer=tf.zeros_initializer())
        deconv = tf.nn.conv2d_transpose(x, w, output_shape=[N, H1, W1, C], strides=[1, d_h, d_w, 1], padding='VALID')
        deconv = tf.nn.bias_add(deconv, biases)
        return deconv
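The spatial sizes that go into output_shape follow simple arithmetic, which can be checked in plain Python before wiring it into a graph. This is a sketch under the usual SAME and VALID conventions. The helper names are made up, and the VALID rule shown is one common choice of output length, not the only one the op accepts (the get_deconv_lens formula above picks a different valid length).

```python
def deconv_output_length(input_length, kernel_size, stride, padding):
    """One commonly used output length for a transposed convolution."""
    if padding == "SAME":
        return input_length * stride
    if padding == "VALID":
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv_output_length(input_length, kernel_size, stride, padding):
    """Output length of the matching forward convolution (ceil division)."""
    if padding == "SAME":
        return -(-input_length // stride)
    if padding == "VALID":
        return -(-(input_length - kernel_size + 1) // stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# The question's example: 32 -> 64 with a 3x3 kernel, stride 2, SAME padding.
out_same = deconv_output_length(32, 3, 2, "SAME")
print(out_same, conv_output_length(out_same, 3, 2, "SAME"))     # 64 32

# The last answer's defaults: kernel 2, stride 2, VALID padding.
out_valid = deconv_output_length(4, 2, 2, "VALID")
print(out_valid, conv_output_length(out_valid, 2, 2, "VALID"))  # 8 4
```

Running the forward formula on the proposed output length and getting the input length back is a quick consistency check. Since the kernel size and stride are Python ints, the same multiply-and-add should also carry over to scalars taken from tf.shape(x).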

tensorflow: how to set the shape of tensor with different conditional statements?

I would like to train a network with two different shapes of input tensor. Each epoch uses one of the two.
Here is a small example:
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    imgs1 = tf.placeholder(tf.float32, [4, 224, 224, 3], name = 'input_imgs1')
    imgs2 = tf.placeholder(tf.float32, [4, 180, 180, 3], name = 'input_imgs2')
    epoch_num_tf = tf.placeholder(tf.int32, [], name = 'input_epoch_num')
    imgs = tf.cond(tf.equal(tf.mod(epoch_num_tf, 2), 0),
                   lambda: tf.Print(imgs2, [imgs2.get_shape()], message='(even number) input epoch number is '),
                   lambda: tf.Print(imgs1, [imgs1.get_shape()], message='(odd number) input epoch number is'))
    print(imgs.get_shape())
    for epoch in range(10):
        epoch_num = np.array(epoch).astype(np.int32)
        imgs1_input = np.ones([4, 224, 224, 3], dtype = np.float32)
        imgs2_input = np.ones([4, 180, 180, 3], dtype = np.float32)
        output = sess.run(imgs, feed_dict = {epoch_num_tf: epoch_num,
                                             imgs1: imgs1_input,
                                             imgs2: imgs2_input})
When I execute it, the output of imgs.get_shape() is (4, ?, ?, 3),
i.e. imgs.get_shape()[1]=None and imgs.get_shape()[2]=None.
But later in the code I want to use the value of imgs.get_shape() to define the kernel size (ksize) and stride (strides) of tf.nn.max_pool(), e.g. ksize=[1,imgs.get_shape()[1]/6, imgs.get_shape()[2]/6, 1].
I think ksize and strides do not support tf.Tensor values.
How can I solve this problem? Or how can I set the shape of imgs conditionally?
When you do print(a.get_shape()), you are getting the static shape of the tensor a. Assuming you mean imgs.get_shape() and not a.get_shape() in the code above, dimensions 1 and 2 of imgs vary dynamically with the value of epoch_num_tf. Therefore the static shape in those dimensions is unknown, which TensorFlow represents as None.
If you want to use the dynamic shape of imgs in subsequent code, you should use the tf.shape() operator to get the shape as a tf.Tensor. For example, instead of imgs.get_shape()[2], you can use tf.shape(imgs)[2].
Unfortunately, the ksize and strides arguments of tf.nn.max_pool() do not accept tf.Tensor values. (I think this is a historical limitation, because these were configured as "attrs" rather than "inputs" of the corresponding kernel. Please open a GitHub issue if you'd like to request this feature!) One possible workaround would be to use another tf.cond():
imgs = ...
# Could also use `tf.equal(tf.mod(epoch_num_tf, 2), 0)` as the predicate.
# Integer division (//) keeps the ksize entries as ints under Python 3.
pool_output = tf.cond(tf.equal(tf.shape(imgs)[2], 180),
                      lambda: tf.nn.max_pool(imgs, ksize=[1, 180 // 6, 180 // 6, 1], ...),
                      lambda: tf.nn.max_pool(imgs, ksize=[1, 224 // 6, 224 // 6, 1], ...))
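Because ksize has to be a list of plain Python integers, the per-branch values can be computed up front from the two known image sizes. The snippet below is a small sketch of that bookkeeping only. The helper name pool_ksize is hypothetical, and the divisor 6 comes from the question.

```python
def pool_ksize(side, divisor=6):
    """Static ksize for an NHWC max pool over a square image of width `side`."""
    # Integer division: under Python 3, side / divisor would be a float.
    return [1, side // divisor, side // divisor, 1]

# One static ksize per branch of the tf.cond, keyed by image side length.
ksize_by_side = {side: pool_ksize(side) for side in (180, 224)}
print(ksize_by_side[180])  # [1, 30, 30, 1]
print(ksize_by_side[224])  # [1, 37, 37, 1]
```

Each branch of the workaround above can then be given its own precomputed ksize and strides.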
