This question already has an answer here:
Operations on random variables not working properly in Tensorflow
(1 answer)
Closed 4 years ago.
I want to reshape a tensor matrix into specific shape. After I did the ops, I found the matrix itself changed. I don't know why this is happened.
tf.reset_default_graph()
with tf.Session() as test:
tf.set_random_seed(1)
a_S = tf.random_normal([1, 1,1,3], mean=1, stddev=4)
a_G = tf.random_normal([1, 1,1,3], mean=1, stddev=4)
J_style_layer = compute_layer_style_cost(a_S, a_G)
print("J_style_layer = " + str(J_style_layer.eval()))
The following is the definition of called function compute_layer_style_cost
def compute_layer_style_cost(a_S, a_G):
"""
Arguments:
a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations
representing style of the image S
a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations
representing style of the image G
Returns:
J_style_layer -- tensor representing a scalar value, style cost defined
above by equation (2)
"""
### START CODE HERE ###
# Retrieve dimensions from a_G (≈1 line)
m, n_H, n_W, n_C = a_S.get_shape().as_list()
print("m=>", m, "n_H=>", n_H, "n_W=>", n_W, "n_C=>", n_C)
print("a_S.shape=>", a_S.shape)
print("a_S=>",a_S.eval())
# Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
a_S = tf.reshape(a_S, [n_C, n_H*n_W])
a_G = tf.reshape(a_G, [n_C, n_H*n_W])
print("a_S.shape=>", a_S.shape)
print("a_S=>",a_S.eval())
After I ran it, I got following result.
m=> 1 n_H=> 1 n_W=> 1 n_C=> 3
a_S.shape=> (1, 1, 1, 3)
a_S=> [[[[-1.68344498 1.89428568 4.18909216]]]]
a_S.shape=> (3, 1)
a_S=> [[-4.78795481]
[ 5.39861012]
[ 4.57472849]]
The above result shows, after reshape ops, the value of tensor matrix has changed. And I don't know why this happen exactly.
After refer Operations on random variables not working properly in Tensorflow and https://www.tensorflow.org/programmers_guide/graphs, it seems I don't run two random variable computation in the same session, I changed my code into
with tf.Session() as sess:
print(sess.run([a_S, a_S_re]))
And it works.
Related
I am looking at an implementation of RCNN for text classification using PyTorch. Full Code. There are two points where the dimensions of tensors are permuted using the permute function. The first is after the LSTM layer and before tanh. The second is after a linear layer and before a max pooling layer.
Could you please explain why the permutation is necessary or useful?
Relevant Code
def forward(self, x):
# x.shape = (seq_len, batch_size)
embedded_sent = self.embeddings(x)
# embedded_sent.shape = (seq_len, batch_size, embed_size)
lstm_out, (h_n,c_n) = self.lstm(embedded_sent)
# lstm_out.shape = (seq_len, batch_size, 2 * hidden_size)
input_features = torch.cat([lstm_out,embedded_sent], 2).permute(1,0,2)
# final_features.shape = (batch_size, seq_len, embed_size + 2*hidden_size)
linear_output = self.tanh(
self.W(input_features)
)
# linear_output.shape = (batch_size, seq_len, hidden_size_linear)
linear_output = linear_output.permute(0,2,1) # Reshaping fot max_pool
max_out_features = F.max_pool1d(linear_output, linear_output.shape[2]).squeeze(2)
# max_out_features.shape = (batch_size, hidden_size_linear)
max_out_features = self.dropout(max_out_features)
final_out = self.fc(max_out_features)
return self.softmax(final_out)
Similar Code in other Repositories
Similar implementations of RCNN use permute or transpose. Here are examples:
https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/models/RCNN.py
https://github.com/jungwhank/rcnn-text-classification-pytorch/blob/master/model.py
What permute function does is rearranges the original tensor according to the desired ordering, note permute is different from reshape function, because when apply permute, the elements in tensor follow the index you provide where in reshape it's not.
Example code:
import torch
var = torch.randn(2, 4)
pe_var = var.permute(1, 0)
re_var = torch.reshape(var, (4, 2))
print("Original size:\n{}\nOriginal var:\n{}\n".format(var.size(), var) +
"Permute size:\n{}\nPermute var:\n{}\n".format(pe_var.size(), pe_var) +
"Reshape size:\n{}\nReshape var:\n{}\n".format(re_var.size(), re_var))
Outputs:
Original size:
torch.Size([2, 4])
Original var:
tensor([[ 0.8250, -0.1984, 0.5567, -0.7123],
[-1.0503, 0.0470, -1.9473, 0.9925]])
Permute size:
torch.Size([4, 2])
Permute var:
tensor([[ 0.8250, -1.0503],
[-0.1984, 0.0470],
[ 0.5567, -1.9473],
[-0.7123, 0.9925]])
Reshape size:
torch.Size([4, 2])
Reshape var:
tensor([[ 0.8250, -0.1984],
[ 0.5567, -0.7123],
[-1.0503, 0.0470],
[-1.9473, 0.9925]])
With the role of permute in mind we could see what first permute does is reordering the concatenate tensor for it to fit the inputs format of self.W, i.e with batch as first dimension; and the second permute does similar thing because we want to max pool the linear_output along the sequence and F.max_pool1d will pool along the last dimension.
I am adding this answer to provide additional PyTorch-specific details.
It is necessary to use permute between nn.LSTM and nn.Linear because the output shape of LSTM does not correspond to the expected input shape of Linear.
nn.LSTM outputs output, (h_n, c_n). Tensor output has shape seq_len, batch, num_directions * hidden_size nn.LSTM. nn.Linear expects an input tensor with shape N,∗,H, where N is batch size and H is number of input features. nn.Linear.
It is necessary to use permute between nn.Linear and nn.MaxPool1d because the output of nn.Linear is N, L, C, where N is batch size, C is the number of features, and and L is sequence length. nn.MaxPool1d expects an input tensor of shape N, C, L. nn.MaxPool1d
I reviewed seven implementations of RCNN for text classification with PyTorch on GitHub and gitee and found that permute and transpose are the normal ways to convert the output of one layer to the input of a subsequent layer.
Some notes: I'm using tensorflow 2.3.0, python 3.8.2, and numpy 1.18.5 (not sure if that one matters though)
I'm writing a custom layer that stores a non-trainable tensor N of shape (a, b) internally, where a, b are known values (this tensor is created during init). When called on an input tensor, it flattens the input tensor, flattens its stored tensor, and concatenates the two together. Unfortunately, I can't seem to figure out how to preserve the unknown batch dimension during this concatenation. Here's minimal code:
import tensorflow as tf
from tensorflow.keras.layers import Layer, Flatten
class CustomLayer(Layer):
def __init__(self, N): # N is a tensor of shape (a, b), where a, b > 1
super(CustomLayer, self).__init__()
self.N = self.add_weight(name="N", shape=N.shape, trainable=False, initializer=lambda *args, **kwargs: N)
# correct me if I'm wrong in using this initializer approach, but for some reason, when I
# just do self.N = N, this variable would disappear when I saved and loaded the model
def build(self, input_shape):
pass # my reasoning is that all the necessary stuff is handled in init
def call(self, input_tensor):
input_flattened = Flatten()(input_tensor)
N_flattened = Flatten()(self.N)
return tf.concat((input_flattened, N_flattened), axis=-1)
The first problem I noticed was that Flatten()(self.N) would return a tensor with the same shape (a, b) as the original self.N, and as a result, the returned value would have a shape of (a, num_input_tensor_values+b). My reasoning for this was that the first dimension, a, was treated as the batch size. I modified the call function:
def call(self, input_tensor):
input_flattened = Flatten()(input_tensor)
N = tf.expand_dims(self.N, axis=0) # N would now be shape (1, a, b)
N_flattened = Flatten()(N)
return tf.concat((input_flattened, N_flattened), axis=-1)
This would return a tensor with shape (1, num_input_vals + a*b), which is great, but now the batch dimension is permanently 1, which I realized when I started training a model with this layer and it would only work for a batch size of 1. This is also really apparent in the model summary - if I were to put this layer after an input and add some other layers afterwards, the first dimension of the output tensors goes like None, 1, 1, 1, 1.... Is there a way to store this internal tensor and use it in call while preserving the variable batch size? (For example, with a batch size of 4, a copy of the same flattened N would be concatenated onto the end of each of the 4 flattened input tensors.)
You have to have as many flattened N vectors, as you have samples in your input, because you are concatenating to every sample. Think of it like pairing up rows and concatenating them. If you have only one N vector, then only one pair can be concatenated.
To solve this, you should use tf.tile() to repeat N as many times as there are samples in your batch.
Example:
def call(self, input_tensor):
input_flattened = Flatten()(input_tensor) # input_flattened shape: (None, ..)
N = tf.expand_dims(self.N, axis=0) # N shape: (1, a, b)
N_flattened = Flatten()(N) # N_flattened shape: (1, a*b)
N_tiled = tf.tile(N_flattened, [tf.shape(input_tensor)[0], 1]) # repeat along the first dim as many times, as there are samples and leave the second dim alone
return tf.concat((input_flattened, N_tiled), axis=-1)
This question already has answers here:
How does pytorch broadcasting work?
(2 answers)
Closed 3 years ago.
I'm new to Deep Learning. I'm studying from Udacity.
I came across one of the codes to build up a neural network, where 2 tensors are being added, specifically the 'bias' tensor with the output of the tensor-multiplication product.
It was kind of...
def activation(x):
return (1/(1+torch.exp(-x)))
inputs = images.view(images.shape[0], -1)
w1 = torch.randn(784, 256)
b1 = torch.randn(256)
h = activation(torch.mm(inputs,w1) + b1)
After flattening the MNIST, it came out as [64,784] (inputs).
I'm not getting how the bias tensor (b1) of dimension [256], could be added to the multiplication product of 'inputs' and 'w1' which comes out to be the dimensions of [256, 64].
In simple terms, whenever we use "broadcasting" from a Python library (Numpy or PyTorch), what we are doing is treating our arrays (weight, bias) dimensionally compatible.
In other words, if you are operating with W of shape [256,64], and your bias is only [256]. Then, broadcasting will complete that lacking dimension.
As you can see in the image above, the dimension left is being filled so that our operations can be done successfully. Hope this is helpful
64 is your batch size, meaning that the bias tensor will be added to each of the 64 examples inside of your batch. Basically it's like if you took 64 tensor of size 256 and added the bias to each of them. Pytorch will naturally broadcast the 256 tensor to a 64*256 size that can be added to the 64*256 output of your precedent layer.
This is something called PyTorch broadcasting.
It is very similar to NumPy broadcasting if you used the library.
Here is the example adding a scalar to a 2D tensor m.
m = torch.rand(3,3)
print(m)
s=1
print(m+s)
# tensor([[0.2616, 0.4726, 0.1077],
# [0.0097, 0.1070, 0.7539],
# [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
# [1.0097, 1.1070, 1.7539],
# [1.9406, 1.1967, 1.1249]])
Here is the another example adding 1D tensor and 2D tensor.
v = torch.rand(3)
print(v)
print(m+v)
# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
# [0.2442, 1.1035, 0.7805],
# [1.1752, 1.1932, 0.1514]])
I rewrote your example:
def activation(x):
return (1/(1+torch.exp(-x)))
images = torch.randn(3,28,28)
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)
W1 = torch.randn(784, 256)
print("W1:", w1.shape)
B1 = torch.randn(256)
print("B1:", b1.shape)
h = activation(torch.mm(inputs,W1) + B1)
Out
INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])
To explain:
INPUTS: of size [3, 784] # W1: of size [784, 256] will create tensor of size [3, 256]
Then the addition:
After mm: [3, 256] + B1: [256] is done because B1 will take the shape of [3, 256] based on broadcasting.
I am currently working on a neural network that takes some inputs and returns 2 outputs. I used 2 outputs in a regression problem where they both are 2 coordinates, X and Y.
My problem doesn't need X and Y values but angle it is facing which is atan2(y,x).
I am trying to to create a custom keras metric and a loss function that does a atan2 operation between the elements of the predicted tensor and true tensor so as to better train the network on my task.
The shape of the output tensor in metric is [?, 2] and I want to do a function where I can loop through the tensor and apply atan2(tensor[itr, 1], tensor[itr, 0]) on it to get an array of another tensors.
I have tried using tf.slit and tf.slice
I don't want to convert it into a numpy array and back to tensorflow due to performance reasons.
I have tried to get the shape of tensors using tensor.get_shape().as_list() and iterate through it.
self.model.compile(loss="mean_absolute_error",
optimizer=tf.keras.optimizers.Adam(lr=0.01),
metrics=[vect2d_to_angle_metric])
# This is the function i want to work on
def vect2d_to_angle_metric(y_true, y_predicted):
print("y_true = ", y_true)
print("y_predicted = ", y_predicted)
print("y_true shape = ", y_true.shape())
print("y_predicted shape = ", y_predicted.shape())
The print out of the above function being
y_true = Tensor("dense_2_target:0", shape=(?, ?), dtype=float32)
y_predicted = Tensor("dense_2/BiasAdd:0", shape=(?, 2), dtype=float32)
y_true shape = Tensor("metrics/vect2d_to_angle_metric/Shape:0", shape=(2,), dtype=int32)
y_predicted shape = Tensor("metrics/vect2d_to_angle_metric/Shape_1:0", shape=(2,), dtype=int32)
Python pseudo-code of the functionality I want to apply to the tensorflow function
def evaluate(self):
mean_array = []
for i in range(len(x_test)):
inputs = x_test[i]
prediction = self.model.getprediction(i)
predicted_angle = np.arctan2(result[i][1], result[i][0])
real_angle = np.arctan2(float(self.y_test[i][1]), float(self.y_test[i][0]))
mean_array.append(([abs(predicted_angle - real_angle)]/real_angle) * 100)
i += 1
I expect to slide the 2 sides of the tensor [i][0] and [i][1] and to a tf.atan2() function on both of them and finally make another tensor out of them so as to follow with other calculations and pass the custom loss.
I have the following batch shape:
[?,227,227]
And the following weight variable:
weight_tensor = tf.truncated_normal([227,227],**{'stddev':0.1,'mean':0.0})
weight_var = tf.Variable(weight_tensor)
But when I do tf.batch_matmul:
matrix = tf.batch_matmul(prev_net_2d,weight_var)
I fail with the following error:
ValueError: Shapes (?,) and () must have the same rank
So my question becomes: How do I do this?
How do I just have a weight_variable in 2D that gets multiplied by each individual picture (227x227) so that I have a (227x227) output?? The flat version of this operation completely exhausts the resources...plus the gradient won't change the weights correctly in the flat form...
Alternatively: how do I split the incoming tensor along the batch dimension (?,) so that I can run the tf.matmul function on each of the split tensors with my weight_variable?
You could tile weights along the first dimension
weight_tensor = tf.truncated_normal([227,227],**{'stddev':0.1,'mean':0.0})
weight_var = tf.Variable(weight_tensor)
weight_var_batch = tf.tile(tf.expand_dims(weight_var, axis=0), [batch_size, 1, 1])
matrix = tf.matmul(prev_net_2d,weight_var_batch)
Although batch_matmul doesn't exist anymore