I'm new to deep learning and am studying through Udacity.
I came across a code snippet for building a neural network in which two tensors are added: the bias tensor and the output of the matrix multiplication.
It went something like this:
import torch

def activation(x):
    # sigmoid activation
    return 1 / (1 + torch.exp(-x))

inputs = images.view(images.shape[0], -1)  # flatten each image to a vector
w1 = torch.randn(784, 256)                 # weights: 784 inputs -> 256 hidden units
b1 = torch.randn(256)                      # bias, one per hidden unit
h = activation(torch.mm(inputs, w1) + b1)
After flattening the MNIST images, inputs came out with shape [64, 784].
What I don't get is how the bias tensor b1, of shape [256], can be added to the matrix product of inputs and w1, which comes out with shape [64, 256].
In simple terms, whenever we use "broadcasting" in a Python library (NumPy or PyTorch), we are treating our arrays (here, the matmul output and the bias) as dimensionally compatible.
In other words, if you are operating on an output of shape [64, 256] and your bias is only [256], broadcasting fills in the missing dimension: the bias is treated as shape [1, 256] and repeated along the batch dimension.
That missing dimension is filled in automatically so the addition can be carried out successfully. Hope this is helpful.
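A minimal sketch (not part of the original answer) making the implicit expansion explicit with unsqueeze and expand:

import torch

out = torch.randn(64, 256)  # stands in for the result of torch.mm(inputs, w1)
b1 = torch.randn(256)

implicit = out + b1  # broadcasting handles the shape mismatch automatically
explicit = out + b1.unsqueeze(0).expand(64, 256)  # the equivalent explicit form
print(torch.equal(implicit, explicit))  # True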
64 is your batch size, meaning that the bias tensor will be added to each of the 64 examples in your batch. Basically, it's as if you took 64 tensors of size 256 and added the bias to each of them. PyTorch will naturally broadcast the size-256 tensor to size 64x256 so it can be added to the 64x256 output of your preceding layer.
This is something called PyTorch broadcasting.
It is very similar to NumPy broadcasting, if you have used that library.
Here is an example adding a scalar to a 2D tensor m.
m = torch.rand(3, 3)
print(m)
s = 1
print(m + s)
# tensor([[0.2616, 0.4726, 0.1077],
#         [0.0097, 0.1070, 0.7539],
#         [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
#         [1.0097, 1.1070, 1.7539],
#         [1.9406, 1.1967, 1.1249]])
Here is another example, adding a 1D tensor to a 2D tensor.
v = torch.rand(3)
print(v)
print(m+v)
# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
#         [0.2442, 1.1035, 0.7805],
#         [1.1752, 1.1932, 0.1514]])
I rewrote your example:
def activation(x):
    return 1 / (1 + torch.exp(-x))

images = torch.randn(3, 28, 28)
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)
W1 = torch.randn(784, 256)
print("W1:", W1.shape)
B1 = torch.randn(256)
print("B1:", B1.shape)
h = activation(torch.mm(inputs, W1) + B1)
Output:
INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])
To explain: multiplying INPUTS of size [3, 784] by W1 of size [784, 256] creates a tensor of size [3, 256].
Then the addition:
[3, 256] + B1 of size [256] works because broadcasting expands B1 to shape [3, 256]: its single dimension is matched against the trailing dimension of the matmul result, and a leading dimension of size 3 is filled in.
Related
I am looking at an implementation of RCNN for text classification using PyTorch. Full Code. There are two points where the dimensions of tensors are permuted using the permute function. The first is after the LSTM layer and before tanh. The second is after a linear layer and before a max pooling layer.
Could you please explain why the permutation is necessary or useful?
Relevant Code
def forward(self, x):
    # x.shape = (seq_len, batch_size)
    embedded_sent = self.embeddings(x)
    # embedded_sent.shape = (seq_len, batch_size, embed_size)
    lstm_out, (h_n, c_n) = self.lstm(embedded_sent)
    # lstm_out.shape = (seq_len, batch_size, 2 * hidden_size)
    input_features = torch.cat([lstm_out, embedded_sent], 2).permute(1, 0, 2)
    # input_features.shape = (batch_size, seq_len, embed_size + 2*hidden_size)
    linear_output = self.tanh(
        self.W(input_features)
    )
    # linear_output.shape = (batch_size, seq_len, hidden_size_linear)
    linear_output = linear_output.permute(0, 2, 1)  # reshaping for max_pool
    max_out_features = F.max_pool1d(linear_output, linear_output.shape[2]).squeeze(2)
    # max_out_features.shape = (batch_size, hidden_size_linear)
    max_out_features = self.dropout(max_out_features)
    final_out = self.fc(max_out_features)
    return self.softmax(final_out)
Similar Code in other Repositories
Similar implementations of RCNN use permute or transpose. Here are examples:
https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/models/RCNN.py
https://github.com/jungwhank/rcnn-text-classification-pytorch/blob/master/model.py
What permute does is rearrange the dimensions of the original tensor according to the ordering you specify. Note that permute is different from reshape: with permute, the elements move so that they follow the new dimension order, whereas reshape keeps the elements in their original memory order and only reinterprets the shape.
Example code:
import torch

var = torch.randn(2, 4)
pe_var = var.permute(1, 0)
re_var = torch.reshape(var, (4, 2))
print("Original size:\n{}\nOriginal var:\n{}\n".format(var.size(), var) +
      "Permute size:\n{}\nPermute var:\n{}\n".format(pe_var.size(), pe_var) +
      "Reshape size:\n{}\nReshape var:\n{}\n".format(re_var.size(), re_var))
Outputs:
Original size:
torch.Size([2, 4])
Original var:
tensor([[ 0.8250, -0.1984,  0.5567, -0.7123],
        [-1.0503,  0.0470, -1.9473,  0.9925]])
Permute size:
torch.Size([4, 2])
Permute var:
tensor([[ 0.8250, -1.0503],
        [-0.1984,  0.0470],
        [ 0.5567, -1.9473],
        [-0.7123,  0.9925]])
Reshape size:
torch.Size([4, 2])
Reshape var:
tensor([[ 0.8250, -0.1984],
        [ 0.5567, -0.7123],
        [-1.0503,  0.0470],
        [-1.9473,  0.9925]])
With the role of permute in mind, we can see that the first permute reorders the concatenated tensor so that it fits the input format of self.W, i.e., with batch as the first dimension. The second permute does a similar thing: we want to max-pool linear_output along the sequence, and F.max_pool1d pools along the last dimension.
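As a small illustration with made-up sizes (not from the post): F.max_pool1d pools over the last dimension, so a (batch, seq_len, channels) tensor has to be permuted to (batch, channels, seq_len) before pooling over the sequence.

import torch
import torch.nn.functional as F

t = torch.randn(8, 50, 64)  # (batch, seq_len, channels)
pooled = F.max_pool1d(t.permute(0, 2, 1), t.shape[1]).squeeze(2)  # pool over seq_len
print(pooled.shape)  # torch.Size([8, 64])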
I am adding this answer to provide additional PyTorch-specific details.
It is necessary to use permute between nn.LSTM and nn.Linear because the output shape of the LSTM does not correspond to the expected input shape of Linear.
nn.LSTM outputs output, (h_n, c_n). The output tensor has shape (seq_len, batch, num_directions * hidden_size); see the nn.LSTM docs. nn.Linear expects an input tensor of shape (N, *, H_in), where N is the batch size and H_in is the number of input features; see the nn.Linear docs.
It is necessary to use permute between nn.Linear and nn.MaxPool1d because the output of nn.Linear is (N, L, C), where N is the batch size, C is the number of features, and L is the sequence length, while nn.MaxPool1d expects an input tensor of shape (N, C, L); see the nn.MaxPool1d docs.
I reviewed seven implementations of RCNN for text classification with PyTorch on GitHub and gitee and found that permute and transpose are the normal ways to convert the output of one layer to the input of a subsequent layer.
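To make the shape flow concrete, here is a shape-only sketch of the two permutes; all sizes and layer names below are hypothetical, not taken from the linked repositories.

import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, batch, embed, hidden, lin = 20, 4, 32, 16, 10
lstm = nn.LSTM(embed, hidden, bidirectional=True)
linear = nn.Linear(2 * hidden + embed, lin)

x = torch.randn(seq_len, batch, embed)
out, _ = lstm(x)                                      # (seq_len, batch, 2*hidden)
feats = torch.cat([out, x], 2).permute(1, 0, 2)       # (batch, seq_len, 2*hidden+embed)
lin_out = torch.tanh(linear(feats)).permute(0, 2, 1)  # (batch, lin, seq_len)
pooled = F.max_pool1d(lin_out, lin_out.shape[2]).squeeze(2)  # (batch, lin)
print(pooled.shape)  # torch.Size([4, 10])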
PyTorch offers a functional 2D convolution operation, torch.nn.functional.conv2d, which takes an input and a weight parameter (https://pytorch.org/docs/stable/nn.html#conv2d).
I want to perform a self convolution, meaning that, given a tensor X, I want to calculate:
conv2d(X, X)
However, the filter is expected to have shape [out_channels, in_channels/groups, kernel_size, kernel_size].
Let's say X has shape [32, 3, 16, 16]. I have tried to permute X to match the expected form of the weight parameter, such as:
W = X.permute(1, 0, 2, 3) # 3, 32, 16, 16
conv2d(X, W)
But that does not work, since 32 != in_channels/groups == 3 / 1 == 3.
The complete operation would essentially look like this (to see the dimensions requirements):
X + conv2d(X, X)
Is it possible to do this with PyTorch?
Other packages such as SciPy offers a convolve2d operation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html) but that only supports 2D arrays, and appears to do what I'm trying to do above, but without support for batches and channels.
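No answer is included here, but for what it's worth, here is one possible sketch under the assumption that "self convolution" means convolving each channel of each sample with itself (a depthwise reading of conv2d(X, X)); it folds the batch into the channel dimension and uses groups, and every shape choice below is an assumption:

import torch
import torch.nn.functional as F

X = torch.randn(32, 3, 16, 16)
b, c, h, w = X.shape

# One group per (sample, channel) pair, so each channel is convolved with itself.
inp = X.reshape(1, b * c, h, w)     # fold batch into channels
weight = X.reshape(b * c, 1, h, w)  # one filter per (sample, channel)
out = F.conv2d(inp, weight, padding=h // 2, groups=b * c)  # (1, b*c, h+1, w+1)
out = out[..., :h, :w].reshape(b, c, h, w)  # crop back to the input shape
result = X + out  # the desired X + conv2d(X, X)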
I am currently working on a neural network that takes some inputs and returns 2 outputs. I used 2 outputs in a regression problem where they are both coordinates, X and Y.
My problem doesn't need the X and Y values themselves but the angle they define, which is atan2(y, x).
I am trying to create a custom Keras metric and a loss function that apply an atan2 operation to the elements of the predicted and true tensors, so as to better train the network on my task.
The output tensor in the metric has shape [?, 2], and I want a function that loops through the tensor and applies atan2(tensor[itr, 1], tensor[itr, 0]) to get a tensor of angles.
I have tried using tf.split and tf.slice.
I don't want to convert it into a numpy array and back to tensorflow due to performance reasons.
I have tried getting the shape of the tensors using tensor.get_shape().as_list() and iterating through it.
self.model.compile(loss="mean_absolute_error",
                   optimizer=tf.keras.optimizers.Adam(lr=0.01),
                   metrics=[vect2d_to_angle_metric])

# This is the function I want to work on
def vect2d_to_angle_metric(y_true, y_predicted):
    print("y_true = ", y_true)
    print("y_predicted = ", y_predicted)
    print("y_true shape = ", tf.shape(y_true))
    print("y_predicted shape = ", tf.shape(y_predicted))
The printout from the above function is:
y_true = Tensor("dense_2_target:0", shape=(?, ?), dtype=float32)
y_predicted = Tensor("dense_2/BiasAdd:0", shape=(?, 2), dtype=float32)
y_true shape = Tensor("metrics/vect2d_to_angle_metric/Shape:0", shape=(2,), dtype=int32)
y_predicted shape = Tensor("metrics/vect2d_to_angle_metric/Shape_1:0", shape=(2,), dtype=int32)
Python pseudo-code of the functionality I want to implement in the TensorFlow function:
def evaluate(self):
    mean_array = []
    for i in range(len(x_test)):
        inputs = x_test[i]
        prediction = self.model.getprediction(inputs)
        predicted_angle = np.arctan2(prediction[1], prediction[0])
        real_angle = np.arctan2(float(self.y_test[i][1]), float(self.y_test[i][0]))
        mean_array.append((abs(predicted_angle - real_angle) / real_angle) * 100)
I expect to slice the two sides of the tensor, [i][0] and [i][1], apply tf.atan2() to both of them, and finally make another tensor out of the results, so as to follow up with other calculations and pass the custom loss.
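A minimal sketch of the vectorized approach (the helper name vect2d_to_angle is made up, not from the post): tf.atan2 is elementwise, so the two columns can be sliced and converted to angles without a Python loop.

import tensorflow as tf

def vect2d_to_angle(t):
    # t has shape [batch, 2]; column 0 is x, column 1 is y
    return tf.atan2(t[:, 1], t[:, 0])  # shape [batch]

def vect2d_to_angle_metric(y_true, y_predicted):
    return tf.reduce_mean(tf.abs(vect2d_to_angle(y_predicted) - vect2d_to_angle(y_true)))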
I want to reshape a tensor matrix into a specific shape. After the op, I found the values of the matrix itself had changed, and I don't know why this happened.
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    a_S = tf.random_normal([1, 1, 1, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 1, 1, 3], mean=1, stddev=4)
    J_style_layer = compute_layer_style_cost(a_S, a_G)
    print("J_style_layer = " + str(J_style_layer.eval()))
The following is the definition of the called function, compute_layer_style_cost:
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations
           representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations
           representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined
                     above by equation (2)
    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_S (≈1 line)
    m, n_H, n_W, n_C = a_S.get_shape().as_list()
    print("m=>", m, "n_H=>", n_H, "n_W=>", n_W, "n_C=>", n_C)
    print("a_S.shape=>", a_S.shape)
    print("a_S=>", a_S.eval())

    # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
    a_S = tf.reshape(a_S, [n_C, n_H*n_W])
    a_G = tf.reshape(a_G, [n_C, n_H*n_W])
    print("a_S.shape=>", a_S.shape)
    print("a_S=>", a_S.eval())
After I ran it, I got the following result:
m=> 1 n_H=> 1 n_W=> 1 n_C=> 3
a_S.shape=> (1, 1, 1, 3)
a_S=> [[[[-1.68344498 1.89428568 4.18909216]]]]
a_S.shape=> (3, 1)
a_S=> [[-4.78795481]
[ 5.39861012]
[ 4.57472849]]
The result above shows that the values of the tensor changed after the reshape op, and I don't know exactly why this happened.
After referring to Operations on random variables not working properly in Tensorflow and https://www.tensorflow.org/programmers_guide/graphs, it seems the two evaluations were not part of the same graph execution: each eval() call re-runs the graph, so tf.random_normal is re-sampled every time. I changed my code to fetch both tensors in a single run:
with tf.Session() as sess:
    # fetch both tensors in one run so they share a single sample
    print(sess.run([a_S, a_S_re]))
And it works.
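To illustrate the underlying behavior, here is a minimal standalone sketch (not from the original post): every separate run()/eval() re-executes the graph and re-samples tf.random_normal, while fetching both tensors in one run() yields consistent values.

import tensorflow as tf

tf.reset_default_graph()
a = tf.random_normal([3], seed=1)
b = tf.reshape(a, [3, 1])
with tf.Session() as sess:
    print(sess.run(a))       # one sample
    print(sess.run(b))       # a different sample, reshaped
    print(sess.run([a, b]))  # a single sample, consistent between a and b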
I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array, and I want to perform a convolution on them, but I am having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, tensorflow allows to provide a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    print session.run(x, feed_dict={x: data})
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3), everything else is determined on runtime. So you can pass this x into conv2d, even if the size is dynamic.
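As a small follow-up sketch (the filter shape here is an assumption), only the filter's input-channel count has to match the static channel dimension of x; the dynamic height and width are fine:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
w = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))  # 5x5 filters, 3 in, 32 out
conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')  # shape [None, None, None, 32]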
But there's another problem. You didn't say your task, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all the pooling operations; this size must be static. If that is the case, I think the best solution is to scale your inputs to a common size before passing them into the convolutional network.
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array.
import numpy as np

a = np.zeros([50, 178, 3])
shape = a.shape
print shape  # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print a.shape  # prints (1, 50, 178, 3)