reshape a matrix from [?, 100] to [batch_size, ?, 100] - python

I'm building an RNN-based autoencoder. After the FC layer, I have to reshape my output to [batch_size, sequence_length, embedding_dimension]. However, the sequence length (number of timesteps) for my decoder is not known in advance. What I want is something that works as follows:
outputs = tf.reshape(outputs, [batch_size, None, word_dimension])
Alternatively, is there a way to get the sequence length from the input data, which has shape [batch_size, sequence_length, embedding_dimension]?

In a reshape operation you can pass -1 for the one dimension that you want calculated automatically. At most one dimension may be -1.
For example, here:
x = tf.zeros((100 * 10 * 12,))
reshaped = tf.reshape(x, [100, -1, 12])
reshaped will have shape (100, 10, 12).
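The same -1 inference can be tried without a TensorFlow session. The sketch below uses NumPy as a stand-in (an assumption made for brevity; NumPy's reshape follows the same -1 convention), with the illustrative sizes from the example above.

```python
import numpy as np

# Flat vector with 100 * 10 * 12 elements, as in the example above.
x = np.zeros(100 * 10 * 12)

# -1 asks reshape to infer the middle dimension: 12000 / (100 * 12) = 10.
reshaped = x.reshape(100, -1, 12)
inferred_shape = reshaped.shape

# Inference fails when the total size is not divisible by the known dimensions.
try:
    np.zeros(100 * 10 * 12 + 1).reshape(100, -1, 12)
    indivisible_ok = True
except ValueError:
    indivisible_ok = False

print(inferred_shape, indivisible_ok)
```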
As for the second question (getting the sequence length from input data of shape [batch_size, sequence_length, embedding_dimension]):
You can use the tf.shape operation to find the shape of a tensor at runtime. If you want sequence_length from a tensor x with shape [batch_size, sequence_length, embedding_dimension], you just need to call tf.shape(x)[1].
For the example above, calling:
tf.shape(reshaped)[1]
gives an int32 tensor with shape () and value 10.
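For the autoencoder case in the question, one way to combine the two ideas is to read the sequence length from the input and reuse it after the FC layer. This is a minimal sketch with NumPy standing in for TensorFlow. The sizes, the weight matrix fc_weights, and the variable names are hypothetical. In graph-mode TensorFlow the runtime value would come from tf.shape(inputs)[1] rather than inputs.shape[1].

```python
import numpy as np

batch_size, sequence_length, word_dimension = 4, 7, 100  # hypothetical sizes
inputs = np.random.rand(batch_size, sequence_length, word_dimension)

# Read the sequence length from the input instead of hard-coding it.
seq_len = inputs.shape[1]

# An FC layer typically sees a 2-D matrix of shape [?, word_dimension].
flat = inputs.reshape(-1, word_dimension)
fc_weights = np.random.rand(word_dimension, word_dimension)  # hypothetical layer
fc_out = flat @ fc_weights

# Restore [batch_size, sequence_length, word_dimension]; -1 would also work here.
outputs = fc_out.reshape(batch_size, seq_len, word_dimension)
print(outputs.shape)
```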

Related

Use of PyTorch permute in RCNN

I am looking at an implementation of RCNN for text classification using PyTorch. Full Code. There are two points where the dimensions of tensors are permuted using the permute function. The first is after the LSTM layer and before tanh. The second is after a linear layer and before a max pooling layer.
Could you please explain why the permutation is necessary or useful?
Relevant Code
def forward(self, x):
    # x.shape = (seq_len, batch_size)
    embedded_sent = self.embeddings(x)
    # embedded_sent.shape = (seq_len, batch_size, embed_size)
    lstm_out, (h_n, c_n) = self.lstm(embedded_sent)
    # lstm_out.shape = (seq_len, batch_size, 2 * hidden_size)
    input_features = torch.cat([lstm_out, embedded_sent], 2).permute(1, 0, 2)
    # input_features.shape = (batch_size, seq_len, embed_size + 2*hidden_size)
    linear_output = self.tanh(
        self.W(input_features)
    )
    # linear_output.shape = (batch_size, seq_len, hidden_size_linear)
    linear_output = linear_output.permute(0, 2, 1)  # Permuting for max_pool
    max_out_features = F.max_pool1d(linear_output, linear_output.shape[2]).squeeze(2)
    # max_out_features.shape = (batch_size, hidden_size_linear)
    max_out_features = self.dropout(max_out_features)
    final_out = self.fc(max_out_features)
    return self.softmax(final_out)
Similar Code in other Repositories
Similar implementations of RCNN use permute or transpose. Here are examples:
https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/models/RCNN.py
https://github.com/jungwhank/rcnn-text-classification-pytorch/blob/master/model.py
The permute function rearranges the axes of the original tensor into the order you specify. Note that permute is different from reshape: with permute, each element moves with its indices (for permute(1, 0), the element at [i, j] ends up at [j, i]), whereas reshape keeps the elements in their original row-major order and only changes how they are split into dimensions.
Example code:
import torch
var = torch.randn(2, 4)
pe_var = var.permute(1, 0)
re_var = torch.reshape(var, (4, 2))
print("Original size:\n{}\nOriginal var:\n{}\n".format(var.size(), var) +
      "Permute size:\n{}\nPermute var:\n{}\n".format(pe_var.size(), pe_var) +
      "Reshape size:\n{}\nReshape var:\n{}\n".format(re_var.size(), re_var))
Outputs:
Original size:
torch.Size([2, 4])
Original var:
tensor([[ 0.8250, -0.1984, 0.5567, -0.7123],
[-1.0503, 0.0470, -1.9473, 0.9925]])
Permute size:
torch.Size([4, 2])
Permute var:
tensor([[ 0.8250, -1.0503],
[-0.1984, 0.0470],
[ 0.5567, -1.9473],
[-0.7123, 0.9925]])
Reshape size:
torch.Size([4, 2])
Reshape var:
tensor([[ 0.8250, -0.1984],
[ 0.5567, -0.7123],
[-1.0503, 0.0470],
[-1.9473, 0.9925]])
With the role of permute in mind, we can see that the first permute reorders the concatenated tensor to fit the input format of self.W, i.e. with batch as the first dimension. The second permute does something similar: we want to max-pool linear_output along the sequence, and F.max_pool1d pools along the last dimension.
I am adding this answer to provide additional PyTorch-specific details.
It is necessary to use permute between nn.LSTM and nn.Linear because the output shape of LSTM does not correspond to the expected input shape of Linear.
nn.LSTM returns output, (h_n, c_n). With the default batch_first=False, the tensor output has shape (seq_len, batch, num_directions * hidden_size). The nn.Linear documentation describes the input as a tensor of shape (N, ∗, H), where N is the batch size and H is the number of input features.
It is necessary to use permute between nn.Linear and nn.MaxPool1d because the output of nn.Linear is (N, L, C), where N is the batch size, L is the sequence length, and C is the number of features, while nn.MaxPool1d expects an input tensor of shape (N, C, L).
I reviewed seven implementations of RCNN for text classification with PyTorch on GitHub and gitee and found that permute and transpose are the normal ways to convert the output of one layer to the input of a subsequent layer.
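To make the shape bookkeeping concrete, here is a minimal sketch that traces the two permutes using NumPy as a stand-in for PyTorch (transpose plays the role of permute). All sizes and the weight matrix W are hypothetical, and the max over the last axis imitates what F.max_pool1d does when the kernel size equals the sequence length.

```python
import numpy as np

seq_len, batch, feat, hidden = 6, 3, 8, 5  # hypothetical sizes
rng = np.random.default_rng(0)

# Concatenated LSTM output and embeddings, sequence-first: (seq_len, batch, feat).
features = rng.standard_normal((seq_len, batch, feat))

# First permute(1, 0, 2): put the batch first -> (batch, seq_len, feat).
batch_first = features.transpose(1, 0, 2)

# The linear layer acts on the last axis -> (batch, seq_len, hidden).
W = rng.standard_normal((feat, hidden))
linear_output = np.tanh(batch_first @ W)

# Second permute(0, 2, 1): move the sequence axis last -> (batch, hidden, seq_len).
channels_first = linear_output.transpose(0, 2, 1)

# Pool over the whole sequence (last axis) -> (batch, hidden).
max_out = channels_first.max(axis=2)
print(batch_first.shape, linear_output.shape, channels_first.shape, max_out.shape)
```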

TensorFlow keeping shape the same when slicing?

I am trying to take a single element out of one dimension while keeping the number of dimensions the same.
The shape of the tensor is: (BATCH_SIZE, N_STEPS, NUM_FEATURES)
I want to create a new tensor that is (BATCH_SIZE, 1, NUM_FEATURES), where 1 is the final step.
The input tensor shape is (None, 128, 16)
I tried to create a new tensor with the following:
X = X[:,-1,:]
X's shape becomes (None, 16), but I need it to be (None, 1, 16).
Update: I got this to work with the following code:
s = tf.shape(X)
X = tf.reshape(X[:,-1,:],shape=[s[0],1,s[2]])
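A slice with a range keeps the sliced axis, which avoids the reshape. The sketch below uses NumPy, whose basic slicing behaves the same way as TensorFlow's for this case. The batch size of 2 is a hypothetical stand-in for the None dimension.

```python
import numpy as np

X = np.arange(2 * 128 * 16, dtype=np.float32).reshape(2, 128, 16)

# An integer index drops the axis: (2, 16).
dropped = X[:, -1, :]

# A range slice keeps the axis with length 1: (2, 1, 16).
kept = X[:, -1:, :]

# Equivalent: re-insert the axis after indexing.
expanded = np.expand_dims(X[:, -1, :], axis=1)
print(dropped.shape, kept.shape, expanded.shape)
```

In TensorFlow the corresponding expressions would be X[:, -1:, :] or tf.expand_dims(X[:, -1, :], axis=1).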

Multi-dimension input to a neural network

I have a neural network with many layers. The input to the network has shape [batch_size, 7, 4]. When this input is passed through the network, I observed that only the third dimension changes; that is, if a layer has 20 outputs, then its output has shape [batch_size, 7, 20]. I need the end result after many layers to have shape [batch_size, 16].
I have the following questions:
1. Are the other two dimensions being used at all?
2. If not, how can I modify my network so that all three dimensions are used?
3. How do I drop one dimension meaningfully to get the 2-D output that I desire?
Following is my current implementation in Tensorflow v1.14 and Python 3:
out1 = tf.layers.dense(inputs=noisy_data, units=150, activation=tf.nn.tanh) # Outputs [batch, 7, 150]
out2 = tf.layers.dense(inputs=out1, units=75, activation=tf.nn.tanh) # Outputs [batch, 7, 75]
out3 = tf.layers.dense(inputs=out2, units=32, activation=tf.nn.tanh) # Outputs [batch, 7, 32]
out4 = tf.layers.dense(inputs=out3, units=16, activation=tf.nn.tanh) # Outputs [batch, 7, 16]
Any help is appreciated. Thanks.
Answer to Question 1: The data values along the 2nd dimension (axis=1) are not combined with each other. If you look at the output of the code snippet below (assuming batch_size=2):
>>> input1 = tf.placeholder(float, shape=[2,7,4])
>>> tf.layers.dense(inputs=input1, units=150, activation=tf.nn.tanh)
>>> graph = tf.get_default_graph()
>>> graph.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 150) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(150,) dtype=float32_ref>]
you can see that the kernel has shape (4, 150), so the dense layer transforms only the last dimension and does not mix values along the 2nd dimension. The 1st dimension is handled the same way, since it is the batch dimension. The official TensorFlow docs don't say anything about the required input shape.
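The effect of a (4, 150) kernel on a [2, 7, 4] input can be reproduced with plain NumPy. This is a sketch, not TensorFlow's implementation: the random kernel and bias are hypothetical stand-ins for the variables created by tf.layers.dense.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((2, 7, 4))   # [batch_size, 7, 4]
kernel = rng.standard_normal((4, 150))    # same shape as dense/kernel
bias = rng.standard_normal(150)           # same shape as dense/bias

# The kernel contracts only the last axis, so axes 0 and 1 pass through.
out = np.tanh(inputs @ kernel + bias)     # [2, 7, 150]

# Each of the 7 positions is transformed on its own: changing position 3
# leaves the outputs at every other position untouched.
perturbed = inputs.copy()
perturbed[:, 3, :] += 1.0
out_perturbed = np.tanh(perturbed @ kernel + bias)
unchanged = np.allclose(np.delete(out, 3, axis=1),
                        np.delete(out_perturbed, 3, axis=1))
print(out.shape, unchanged)
```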
Answer to Question 2: Reshape the input from [batch_size, 7, 4] to [batch_size, 28] with the line below before passing it to the first dense layer:
input1 = tf.reshape(input1, [-1, 7*4])
Answer to Question 3: If you reshape the inputs as above, there is no need to drop a dimension.
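A minimal NumPy sketch of the flattening approach, with a hypothetical batch size of 2 and a single hypothetical layer of 16 units. After the reshape the kernel has 28 input rows, so every one of the 7 x 4 values contributes to each output.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((2, 7, 4))      # [batch_size, 7, 4]

# Merge the last two axes: [batch_size, 28].
flat = inputs.reshape(-1, 7 * 4)

kernel = rng.standard_normal((7 * 4, 16))    # hypothetical dense kernel
out = np.tanh(flat @ kernel)                 # [batch_size, 16]
print(flat.shape, out.shape)
```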

Logits have the wrong shape

When I attempt to use the softmax cross entropy function, I get a ValueError saying
ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
The thing is that my layers are built in such a way that my logits should output only 1 value.
The shape of my logits is (5, 1), but I have no idea why there is a 5. The X for each instance is a 5x7 matrix.
X = tf.placeholder(shape=(1, 5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1, 1), name='outputs', dtype=tf.int32)
hidden1 = tf.layers.dense(X, 150)
hidden2 = tf.layers.dense(hidden1, 50)
logits = tf.layers.dense(hidden2, 1)
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Edit
Check the comment, and try this code.
X = tf.placeholder(shape=(1, 5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1,), name='outputs', dtype=tf.int32)
flattened = tf.layers.flatten(X)  # shape (1, 35)
hidden1 = tf.layers.dense(flattened, 150)  # shape (1, 150)
hidden2 = tf.layers.dense(hidden1, 50)  # shape (1, 50)
logits = tf.layers.dense(hidden2, 1)  # shape (1, 1)
with tf.name_scope("loss"):
    # expects logits of shape (1, 1) against labels of shape (1,)
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Original
Let's think through what's going on here.
You create an X placeholder with the shape (5,7) (presumably (batch_size, data_size)).
You feed it into a hidden layer, which transforms the shape from (batch_size, data_size) to (batch_size, units) (units here is 150)
Likewise for the next two layers with hidden2 and logits, resulting in logits having shape (batch_size, 1), which is (5, 1) in this case
You're computing cross entropy between the labels and logits. The requirement for shapes here is for logits to have shape (batch_size, num_classes), where each value is the weight for a particular class, and for labels to have shape (batch_size), where each value is the class number for that particular sample. So this is where things go wrong for you. Your y has shape (1,1), and TF is expecting just a tensor of shape (5).
From what I'm guessing, I think you're trying to directly feed forward X as the data of a single sample (so like a (5,7) shaped matrix). If this is the case, you should have X take the shape (1,5,7) to signify to Tensorflow that X only represents one piece of data.
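The shape contract described above can be illustrated with a small NumPy re-implementation of sparse softmax cross-entropy. This is a sketch for illustration, not TensorFlow's code. The function name, the batch size of 5, and the 3 classes are hypothetical.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """labels: (batch_size,) int class ids; logits: (batch_size, num_classes)."""
    labels = np.asarray(labels)
    logits = np.asarray(logits, dtype=np.float64)
    if labels.ndim != logits.ndim - 1:
        raise ValueError("rank of labels must equal rank of logits minus 1")
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    return -log_probs[np.arange(labels.shape[0]), labels]

logits = np.zeros((5, 3))            # 5 samples, 3 classes, all equally likely
labels = np.array([0, 2, 1, 1, 0])   # one class id per sample
losses = sparse_softmax_cross_entropy(labels, logits)
print(losses.shape)
```

Passing labels of shape (5, 1) to this sketch raises a ValueError, mirroring the rank mismatch reported in the question.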
The thing is that my layers are built in such a way that my logits should output only 1 value.
That's not true. When X is an a × b tensor and you call tf.layers.dense(X, c), you are multiplying X by a b × c matrix (and adding a bias of size c). So the output size is a × c.
In your case, since the first dimension of X is 5, it continues to be 5 even for logits. And your logits should be of size 5. So you are definitely doing something wrong. It is difficult to say what without more information.
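The matrix-size argument can be checked with NumPy. This sketch assumes random weights with the layer sizes from the question (150, 50, 1) and omits biases; it only tracks shapes, and the helper name dense is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units):
    # Multiply the last axis (size b) by a hypothetical b x units weight matrix.
    return x @ rng.standard_normal((x.shape[-1], units))

X = rng.standard_normal((5, 7))                 # a x b with a = 5, b = 7
logits = dense(dense(dense(X, 150), 50), 1)     # first dimension stays 5

# Treating the 5 x 7 matrix as one sample: flatten it to a single row first.
single = X.reshape(1, -1)                       # (1, 35)
single_logits = dense(dense(dense(single, 150), 50), 1)
print(logits.shape, single_logits.shape)
```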

Tensorflow convolution

I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow lets you declare a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    # data is any array of shape [batch, height, width, 3]
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d, even if the size is dynamic.
But there's another problem. You didn't describe your task, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all the pooling operations, and this size must be static. If this is the case, I think the best solution is to scale your inputs to a common size before passing them into the convolutional network.
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array.
import numpy as np
a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)  # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)
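For the question's case, where the image arrives as a 1-D array, the height, width, and channel count have to be known from somewhere before the array can be reshaped. A minimal sketch, assuming a hypothetical 50 x 178 RGB image stored in row-major order:

```python
import numpy as np

height, width, channels = 50, 178, 3                 # hypothetical image size
flat_image = np.zeros(height * width * channels)     # image stored as a 1-D array

# [batch, height, width, channels] with a batch of one image.
image_4d = flat_image.reshape(1, height, width, channels)

# A length that does not match height * width * channels cannot be reshaped,
# which is the kind of error the question ran into.
try:
    np.zeros(1).reshape(-1, 5, 5, 1)
    mismatch_ok = True
except ValueError:
    mismatch_ok = False
print(image_4d.shape, mismatch_ok)
```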
