I want to implement a generative adversarial network (GAN) with an unfixed input size, such as a 4-D tensor (batch_size, None, None, 3).
But tf.nn.conv2d_transpose takes an output_shape parameter, which must be the true size after the deconvolution operation.
For example, if the size of batch_img is (64, 32, 32, 128) and w is a weight of shape (3, 3, 64, 128), then after
deconv = tf.nn.conv2d_transpose(batch_img, w, output_shape=[64, 64, 64, 64], strides=[1, 2, 2, 1], padding='SAME')
I get deconv with size (64, 64, 64, 64). That is fine, as long as I pass the true size as output_shape.
But I want to use an unfixed input size (64, None, None, 128) and get deconv with shape (64, None, None, 64).
However, this raises the error below:
TypeError: Failed to convert object of type <type 'list'> to Tensor...
So, what can I do to avoid this parameter in deconv? Or is there another way to implement a GAN with unfixed input size?
The output_shape list cannot contain None, because None cannot be converted to a Tensor object; None is only allowed in the shapes of tf.placeholder.
For a varying output_shape, try -1 instead of None: for example, if you want size (64, None, None, 128), try [64, -1, -1, 128]. I am not exactly sure whether this will work in all cases, but it worked for me when my batch size (my first dimension) was not fixed, so I used -1 there.
However, there is also a high-level API for transposed convolution, tf.layers.conv2d_transpose().
I am sure the high-level API tf.layers.conv2d_transpose() will work for you, because it accepts tensors of varying size.
You do not even need to specify the output shape; you only need to specify the number of output channels and the kernel to use.
For more details: https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose. I hope this helps.
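For illustration, a minimal sketch assuming TF 1.x (the placeholder shape and parameter values here are made up):

import tensorflow as tf

# The spatial output size is inferred from the input tensor,
# so None dimensions are fine.
x = tf.placeholder(tf.float32, shape=[None, None, None, 128])
y = tf.layers.conv2d_transpose(x, filters=64, kernel_size=3, strides=2, padding='same')
print(y.shape)  # (?, ?, ?, 64)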
I ran into this problem too. Using -1, as suggested in the other answer here, doesn't work. Instead, you have to grab the shape of the incoming tensor and construct the output_shape argument. Here's an excerpt from a test I wrote. In this case it's the first dimension that's unknown, but it should work for any combination of known and unknown dimensions.
output_shape = [8, 8, 4]     # width, height, channels-out; handle batch size later
filter_shape = [5, 5, 4, 2]  # example values: [k_h, k_w, channels-out, channels-in]
xin = tf.placeholder(dtype=tf.float32, shape=(None, 4, 4, 2), name='input')
filt = tf.placeholder(dtype=tf.float32, shape=filter_shape, name='filter')

## Find the batch size of the input tensor and add it to the front
## of output_shape
dimxin = tf.shape(xin)
ncase = dimxin[0:1]
oshp = tf.concat([ncase, output_shape], axis=0)

z1 = tf.nn.conv2d_transpose(xin, filt, oshp, strides=[1, 2, 2, 1], name='xpose_conv')
I found a solution: use tf.shape for the unspecified (dynamic) parts of the shape and get_shape() for the specified (static) parts.
def get_deconv_lens(H, k, d):
    # output length of conv2d_transpose with padding='VALID':
    # out = (in - 1) * stride + kernel
    return tf.multiply(H - 1, d) + k

def deconv2d(x, out_channels, k_h=2, k_w=2, d_h=2, d_w=2, stddev=0.02, name='deconv2d'):
    shape = tf.shape(x)  # dynamic shape: handles None dimensions
    N, H, W = shape[0], shape[1], shape[2]
    C = x.get_shape().as_list()[-1]  # static shape: the channel count must be known
    H1 = get_deconv_lens(H, k_h, d_h)
    W1 = get_deconv_lens(W, k_w, d_w)
    with tf.variable_scope(name):
        # filter layout for conv2d_transpose: [height, width, out_channels, in_channels]
        w = tf.get_variable('weights', [k_h, k_w, out_channels, C],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[out_channels],
                                 initializer=tf.zeros_initializer())
        deconv = tf.nn.conv2d_transpose(x, w, output_shape=tf.stack([N, H1, W1, out_channels]),
                                        strides=[1, d_h, d_w, 1], padding='VALID')
        deconv = tf.nn.bias_add(deconv, biases)
    return deconv
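A hedged usage sketch (with the defaults k=2 and stride 2, the VALID formula above exactly doubles the spatial dimensions):

# Hypothetical usage: batch size and spatial dims unknown at graph-construction time.
x = tf.placeholder(tf.float32, shape=[None, None, None, 128])
y = deconv2d(x, out_channels=64)  # output: (?, ?, ?, 64); H and W double at run time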
I'm attempting to use tf.image.pyramids.downsample from tensorflow_graphics in every Down (encoding) block of an auto-encoder model; the resulting pyramid levels are then sent as skip connections to the Up (decoder) blocks.
class DownConv(Model):
    n = 0

    def __init__(self, kernel_size, filters, initializer, n_lower_levels):
        super(DownConv, self).__init__(name=f"DownConv_{DownConv.n}")
        DownConv.n += 1
        self.pad = tf.constant([[0, 0],
                                [kernel_size // 2, kernel_size // 2],
                                [kernel_size // 2, kernel_size // 2],
                                [0, 0]])
        self.conv = L.Conv2D(filters, kernel_size, strides=2, kernel_initializer=initializer)
        self.pyramid = None
        self.filters = filters
        self.n_lower_levels = n_lower_levels

    def call(self, input_t):
        logger.debug(f"Received {input_t.shape} in {self.name}")
        x = tf.pad(input_t, self.pad, "SYMMETRIC")
        x = self.conv(x)
        p = tf.Variable(x)
        self.pyramid = downsample(p, self.n_lower_levels)
        pyramods = ", ".join([str(p.shape) for p in self.pyramid])
        logger.debug(f"Generated pyramids: {pyramods}")
        return tf.nn.selu(x)
However, thanks to logging, I found out that this doesn't work. It seems only the very first pyramid level (the first step of the downsample) retains its channel dimension; the rest have None for channels.
self.pyramid[0].shape yields the correct (None, 256, 256, 64), but self.pyramid[1] yields (None, 256, 256, None) during a training step. (The None at axis=0 is the batch size, which is normal TensorFlow behavior in error logs.)
Due to this issue, the training step produces an error in my Up blocks, when it tries to concatenate the two feature maps:
ValueError: The channel dimension of the inputs should be defined. The input_shape received is (None, 32, 32, None), where axis -1 (0-based) is the channel dimension, which found to be `None`.
Call arguments received:
• input_t=tf.Tensor(shape=(None, 32, 32, 256), dtype=float32)
I am trying to build a binary temporal image classifier by combining ResNet18 and an LSTM. However, I have never really used RNNs before and have been struggling to get the correct output shape.
I am using a batch size of 128 and a sequence size of 32. The images are 80x80 grayscale images.
The current model is:
class CNNLSTM(nn.Module):
    def __init__(self):
        super(CNNLSTM, self).__init__()
        self.resnet = models.resnet18(pretrained=False)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
        self.resnet.fc = nn.Sequential(nn.Linear(in_features=512, out_features=256, bias=True))
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3)
        self.fc1 = nn.Linear(256, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x_3d):
        # x_3d: torch.Size([128, 32, 1, 80, 80])
        hidden = None
        toret = []
        for t in range(x_3d.size(1)):
            x = self.resnet(x_3d[:, t, :, :, :])
            out, hidden = self.lstm(x.unsqueeze(0), hidden)
            x = self.fc1(out[-1, :, :])
            x = F.relu(x)
            x = self.fc2(x)
            print("x shape: ", x.shape)
            toret.append(x)
        return torch.stack(toret)
This returns a tensor of shape torch.Size([32, 128, 1]), which, as I understand it, means that the nth row represents the nth time step for each element in the batch.
How can I get output of shape 128x1x32 instead?
And is there a better way to do this?
You could permute the dimensions:
a = torch.rand(32, 128, 1)
a = a.permute(1, 2, 0) # these are the indices of the original dimensions
print(a.shape)
>> torch.Size([128, 1, 32])
But you could also set batch_first=True in the LSTM module:
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
This will expect that the input to the LSTM has the shape batch-size x seq-len x features and will output a tensor in the same way.
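Note that with batch_first=True, the per-timestep loop in the question would also have to feed the LSTM (batch, seq, features) slices; a sketch of the adjusted loop, assuming the model from the question:

# With batch_first=True, each step feeds a (batch, seq=1, features) slice.
for t in range(x_3d.size(1)):
    x = self.resnet(x_3d[:, t, :, :, :])             # (128, 256)
    out, hidden = self.lstm(x.unsqueeze(1), hidden)  # out: (128, 1, 256)
    x = self.fc2(F.relu(self.fc1(out[:, -1, :])))    # (128, 1)
    toret.append(x)
return torch.stack(toret, dim=2)                     # (128, 1, 32)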
I understand that the pole-balancing example requires 2 outputs (see the Reinforcement Learning (DQN) Tutorial).
Here is the output for self.head
print ('x',self.head)
x = Linear(in_features=512, out_features=2, bias=True)
When I run the epochs, the outputs below are printed:
print (self.head(x.view(x.size(0), -1)))
return self.head(x.view(x.size(0), -1))
tensor([[-0.6945, -0.1930]])
tensor([[-0.0195, -0.1452]])
tensor([[-0.0906, -0.1816]])
tensor([[ 0.0631, -0.9051]])
tensor([[-0.0982, -0.5109]])
...
The size of x is torch.Size([121, 32, 2, 8]).
So I am trying to understand what x.view(x.size(0), -1) is doing.
I understand from the comment in the code that it returns:
Returns tensor([[left0exp,right0exp]...]).
But how is x, which is torch.Size([121, 32, 2, 8]), reduced to a tensor of size 2?
Is there an alternative way of writing this that makes more sense? What if I had 4 outputs; how would I represent that? Why x.size(0)? Why -1?
So it appears to take self.head from 4 outputs to 2 outputs. Is that correct?
At the bottom is the class I am referring to:
class DQN(nn.Module):
    def __init__(self, h, w, outputs):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(32)

        # Number of Linear input connections depends on output of conv2d layers
        # and therefore the input image size, so compute it.
        def conv2d_size_out(size, kernel_size=5, stride=2):
            return (size - (kernel_size - 1) - 1) // stride + 1

        convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        linear_input_size = convw * convh * 32
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return self.head(x.view(x.size(0), -1))
x.view(x.size(0), -1) flattens the tensor; this is necessary because the Linear layer only accepts a vector (1-D array). To break it down: x.view() reshapes the tensor to the specified shape (more info). x.size(0) returns the first dimension of the tensor (the batch size, which should remain constant). The -1 in x.view() is a filler, standing in for a dimension we don't know, so PyTorch calculates it automatically. For example, if x = torch.tensor([1, 2, 3, 4]), to reshape the tensor to 2x2 you could do x.view(2, 2), x.view(2, -1), or x.view(-1, 2).
The output shape is not a tensor shape of 2, but (121, 2): the 121 is the batch size, and the 2 comes from the Linear layer's output. So to change the output size from 2 to 4, you would have to change the outputs argument in the __init__ function to 4.
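A quick demonstration with the shapes from the question:

import torch

x = torch.randn(121, 32, 2, 8)
flat = x.view(x.size(0), -1)  # -1 is inferred as 32 * 2 * 8 = 512
print(flat.shape)             # torch.Size([121, 512])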
I have a Keras 3D/2D model. In this model a 3D layer has a shape of [None, None, 4, 32]. I want to reshape this into [None, None, 128]. However, if I simply do the following:
reshaped_layer = Reshape((-1, 128))(my_layer)
reshaped_layer then has a shape of [None, 128] (a dimension has been collapsed), and therefore I cannot apply any 2D convolution afterwards, like:
conv_x = Conv2D(16, (1,1))(reshaped_layer)
I've tried to use tf.shape(my_layer) and tf.reshape, but I have not been able to compile the model, since tf.reshape is not a Keras layer.
Just to clarify, I'm using channels last; this is not tf.keras, this is plain Keras. Here is a debug view of the reshape function: Reshape in keras
This is what I'm doing right now, following the advice of anna-krogager:
def reshape(x):
    x_shape = K.shape(x)
    new_x_shape = K.concatenate([x_shape[:-2], [x_shape[-2] * x_shape[-1]]])
    return K.reshape(x, new_x_shape)

reshaped = Lambda(lambda x: reshape(x))(x)
reshaped.set_shape([None, None, None, 128])
conv_x = Conv2D(16, (1, 1))(reshaped)
I get the following error: ValueError: The channel dimension of the inputs should be defined. Found None
You can use K.shape to get the shape of your input (as a tensor) and wrap the reshaping in a Lambda layer as follows:
def reshape(x):
    x_shape = K.shape(x)
    new_x_shape = K.concatenate([x_shape[:-2], [x_shape[-2] * x_shape[-1]]])
    return K.reshape(x, new_x_shape)

reshaped = Lambda(lambda x: reshape(x))(x)
reshaped.set_shape([None, None, None, a * b])  # when x is of shape (None, None, a, b)
This will reshape a tensor with shape (None, None, a, b) to (None, None, a * b).
Digging into base_layer.py, I found that reshaped is:
tf.Tensor 'lambda_1/Reshape:0' shape=(?, ?, ?, 128) dtype=float32
However, its attribute "_keras_shape" is (None, None, None, None) even after the set_shape. Therefore, the solution is to set this attribute as well:
def reshape(x):
    x_shape = K.shape(x)
    new_x_shape = K.concatenate([x_shape[:-2], [x_shape[-2] * x_shape[-1]]])
    return K.reshape(x, new_x_shape)

reshaped = Lambda(lambda x: reshape(x))(x)
reshaped.set_shape([None, None, None, 128])
reshaped.__setattr__("_keras_shape", (None, None, None, 128))
conv_x = Conv2D(16, (1, 1))(reshaped)
Since you are reshaping, the best you can obtain from (4, 32), without losing dimensions, is either (128, 1) or (1, 128). Thus you can do the following:
# original has shape [None, None, None, 4, 32] (including batch)
reshaped_layer = Reshape((-1, 128))(original) # shape is [None, None, 128]
conv_layer = Conv2D(16, (1,1))(K.expand_dims(reshaped_layer, axis=-2)) # shape is [None, None, 1, 16]
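In plain Keras (which the question specifies), a backend op between layers usually needs a Lambda wrapper; a hedged variant of the same idea:

from keras.layers import Lambda

expanded = Lambda(lambda t: K.expand_dims(t, axis=-2))(reshaped_layer)  # (None, None, 1, 128)
conv_layer = Conv2D(16, (1, 1))(expanded)                               # (None, None, 1, 16)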
conv1d_transpose is not yet in the stable version of TensorFlow, but an implementation is available on GitHub.
I would like to create a 1D deconvolution network. The shape of the input is [-1, 256, 16] and the output should be [-1, 1024, 8]. The kernel's size is 5 and the stride is 4.
I tried to build the 1D deconvolution layer with this function:
(output_depth, input_depth) = (8, 16)
kernel_width = 7
f_shape = [kernel_width, output_depth, input_depth]
layer_1_filter = tf.Variable(tf.random_normal(f_shape))
layer_1 = tf_exp.conv1d_transpose(
    x,
    layer_1_filter,
    [-1, 1024, 8],
    stride=4, padding="VALID"
)
The shape of layer_1 is TensorShape([Dimension(None), Dimension(None), Dimension(None)]), but it should be [-1, 1024, 8].
What am I doing wrong? How is it possible to implement 1D deconvolution in TensorFlow?
The pull request is open as of this moment, so the API and behavior can and probably will change. Some features one might expect from conv1d_transpose aren't supported:
output_shape requires the batch size to be known statically; you can't pass -1;
on the other hand, the output shape is dynamic (this explains the None dimensions).
Also, kernel_width=7 expects in_width=255, not 256: with padding="VALID", the requested output width must satisfy ceil((out_width - kernel_width + 1) / stride) = in_width, so for out_width=1024 and stride=4 a kernel_width of 4 or less is needed to match in_width=256. The result is this demo code:
import numpy as np
import tensorflow as tf
# conv1d_transpose is imported from the implementation in the pull request

x = tf.placeholder(shape=[None, 256, 16], dtype=tf.float32)
filter = tf.Variable(tf.random_normal([3, 8, 16]))  # [kernel_width, output_depth, input_depth]
out = conv1d_transpose(x, filter, output_shape=[100, 1024, 8], stride=4, padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(out, feed_dict={x: np.zeros([100, 256, 16])})
    print(result.shape)  # prints (100, 1024, 8)
The new tf.contrib.nn.conv1d_transpose has now been added to the TensorFlow API as of r1.8.
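A minimal sketch with that API (I am assuming the contrib signature here; check the r1.8 docs):

x = tf.placeholder(tf.float32, [None, 256, 16])
f = tf.Variable(tf.random_normal([4, 8, 16]))  # [kernel_width, output_depth, input_depth]
batch = tf.shape(x)[0:1]
out = tf.contrib.nn.conv1d_transpose(
    x, f, output_shape=tf.concat([batch, [1024, 8]], axis=0),
    stride=4, padding="VALID")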