I'm writing a simple neural network in PyTorch, where features and weights are both (1, 5) tensors. What are the differences between the two methods below?
y = activation(torch.sum(features*weights) + bias)
and
yy = activation(torch.mm(features, weights.view(5,1)) + bias)
Consider it step by step:
x = torch.tensor([[10, 2], [3,5]])
y = torch.tensor([[1,3], [5,6]])
x * y
# tensor([[10, 6],
# [15, 30]])
torch.sum(x*y)
#tensor(61)
x = torch.tensor([[10, 2], [3,5]])
y = torch.tensor([[1,3], [5,6]])
torch.matmul(x, y)
# tensor([[20, 42],
# [28, 39]])
So there is a difference between matmul and the * operator: * multiplies element-wise, while matmul performs matrix multiplication. Furthermore, torch.sum sums over the entire tensor, not row- or column-wise.
features = torch.rand(1, 5)
weights = torch.Tensor([1, 2, 3, 4, 5])
print(features)
print(weights)
# Element-wise multiplication of shape (1 x 5)
# out = [f1*w1, f2*w2, f3*w3, f4*w4, f5*w5]
print(features*weights)
# weights has been reshaped to (5, 1)
# Broadcasting (1, 5) against (5, 1) gives an element-wise product of shape (5 x 5)
# out = [f1*w1, f2*w1, f3*w1, f4*w1, f5*w1]
# [f1*w2, f2*w2, f3*w2, f4*w2, f5*w2]
# [f1*w3, f2*w3, f3*w3, f4*w3, f5*w3]
# [f1*w4, f2*w4, f3*w4, f4*w4, f5*w4]
# [f1*w5, f2*w5, f3*w5, f4*w5, f5*w5]
print(features*weights.view(5, 1))
# Matrix-multiplication
# (1, 5) * (5, 1) -> (1, 1)
# out = [f1*w1 + f2*w2 + f3*w3 + f4*w4 + f5*w5]
print(torch.mm(features, weights.view(5, 1)))
Output:
tensor([[0.1467, 0.6925, 0.0987, 0.5244, 0.6491]]) # features
tensor([1., 2., 3., 4., 5.]) # weights
tensor([[0.1467, 1.3851, 0.2961, 2.0976, 3.2455]]) # features*weights
tensor([[0.1467, 0.6925, 0.0987, 0.5244, 0.6491],
[0.2934, 1.3851, 0.1974, 1.0488, 1.2982],
[0.4400, 2.0776, 0.2961, 1.5732, 1.9473],
[0.5867, 2.7701, 0.3947, 2.0976, 2.5964],
[0.7334, 3.4627, 0.4934, 2.6220, 3.2455]]) # features*weights.view(5,1)
tensor([[7.1709]]) # torch.mm(features, weights.view(5, 1))
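To make the comparison concrete, here is a NumPy analogue of the two methods (NumPy is used only so the sketch runs without PyTorch; the shapes broadcast the same way). Summing the element-wise product gives a 0-d scalar, while the (1, 5) @ (5, 1) product gives a (1, 1) array holding the same value. The random features are a stand-in for the values printed above.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((1, 5))                  # shape (1, 5)
weights = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # shape (5,)

# Method 1: element-wise product (broadcast to (1, 5)), then sum every element
y = np.sum(features * weights)                 # 0-d scalar

# Method 2: matrix product (1, 5) @ (5, 1) -> (1, 1)
yy = features @ weights.reshape(5, 1)

# Pitfall: broadcasting (1, 5) against (5, 1) builds a (5, 5) outer product
outer = features * weights.reshape(5, 1)

print(np.ndim(y), yy.shape, outer.shape)  # 0 (1, 1) (5, 5)
print(np.isclose(y, yy[0, 0]))            # True
```

So both methods compute the same weighted sum; they differ only in the shape of the result, which matters once bias and the activation are applied.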
I have a tensor operation that I would like to replicate using a combination of torch.stack() and torch.tensordot() so that I can generalize it further in a larger program. In summary, I want to replicate the tensor V_1 with those operations in another tensor called V_2.
N, t , J = 4, 2 , 3
K_f , K_r = 1, 1
R = 5
K = K_f + K_r
id = torch.arange(N).repeat(t).sort()
X = torch.randn(N*t, K , J)
Y = torch.randn(N*t, 1)
D = torch.randn(N, K_r , R)
Draw = D.repeat_interleave(t,0)
beta = torch.randn(2*K_r + K_f, 1)
beta_R = (beta[0:K_r,0] + beta[K_r:2*K_r,0] * Draw ).repeat(1,J,1)
print("shape beta_R:", beta_R.shape)
beta_F = beta[2*K_r:2*K_r + K_f,0].repeat(N*t, J, R)
print("shape beta_F:", beta_F.shape)
XX_0 =X[:,0,:].unsqueeze(2).repeat(1,1,R)
print("shape XX_0:", XX_0.shape)
XX_1 =X[:,1,:].unsqueeze(2).repeat(1,1,R)
print("shape XX_1:", XX_1.shape)
V_1 = XX_0 * beta_R + XX_1 * beta_F
print("shape V_1:",V_1.shape)
#shape beta_R: torch.Size([8, 3, 5])
#shape beta_F: torch.Size([8, 3, 5])
#shape XX_0: torch.Size([8, 3, 5])
#shape XX_1: torch.Size([8, 3, 5])
#shape V_1: torch.Size([8, 3, 5])
Now I want to do the same by stacking my tensors (using torch.stack()) and applying a generalized version of the dot product (using torch.tensordot()), but I am a bit confused by the dims argument, which is not doing what I expected.
#%% Replicating using stacking and tensordot
stack_XX = torch.stack((XX_0, XX_1), 0)
print("shape stack_XX:",stack_XX.shape)
stack_beta = torch.stack((beta_R, beta_F), 0)
print("shape stack_beta:", stack_beta.shape)
# dot product between stack_XX and stack_beta along the first dimension
V_2 = torch.tensordot(stack_XX, stack_beta, dims=([0], [0]))
print("shape V_2:",V_2.shape)
# check if the two are equal
torch.all(V_1.eq(V_2))
#shape stack_XX: torch.Size([2, 8, 3, 5])
#shape stack_beta: torch.Size([2, 8, 3, 5])
#shape V_2: torch.Size([8, 3, 5, 8, 3, 5])
#tensor(False)
So I am basically trying to get tensor(True) when running torch.all(V_1.eq(V_2)).
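Maybe einsum does what you want? It multiplies the stacked tensors element-wise and sums over the stacked dimension a only, keeping b, c and d (compare the result with torch.allclose rather than eq, since floating-point results can differ in the last bits):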
torch.einsum( 'abcd,abcd->bcd', stack_XX, stack_beta)
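Why tensordot returns shape (8, 3, 5, 8, 3, 5): it contracts only the dims listed in dims and takes an outer product over all the remaining ones. V_1 needs an element-wise product with a sum over the stacked axis only. A NumPy sketch with the same shapes, using random data as a stand-in for the tensors in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the stacked tensors in the question: shape (2, 8, 3, 5)
stack_XX = rng.standard_normal((2, 8, 3, 5))
stack_beta = rng.standard_normal((2, 8, 3, 5))

# Reference: what V_1 computes, term by term
V_1 = stack_XX[0] * stack_beta[0] + stack_XX[1] * stack_beta[1]

# tensordot contracts axis 0 but takes an outer product over the rest
td = np.tensordot(stack_XX, stack_beta, axes=([0], [0]))
print(td.shape)  # (8, 3, 5, 8, 3, 5)

# einsum: multiply element-wise, sum over 'a' only, keep b, c, d
V_2 = np.einsum('abcd,abcd->bcd', stack_XX, stack_beta)
V_3 = (stack_XX * stack_beta).sum(axis=0)  # equivalent formulation

# Compare with a tolerance: summation order may differ in the last bits
print(np.allclose(V_1, V_2), np.allclose(V_1, V_3))  # True True
```

In PyTorch the second formulation reads (stack_XX * stack_beta).sum(dim=0).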
import tensorflow as tf
with tf.Session() as sess:
with tf.variable_scope('masssdsms'):
a = tf.get_variable('a', [1000,24,128], dtype=tf.float32, initializer=tf.random_normal_initializer(stddev=0.1) )
b = tf.get_variable('b', [1000,15,128], dtype=tf.float32, initializer=tf.random_normal_initializer(stddev=0.1) )
I want to get a new tensor named c from a and b.
1000 is the batch size, and c's shape should be (1000, 20, 10, 1). Each instance ai of a and bi of b is a two-dimensional tensor.
The new instance ci is computed from ai and bi and has 20 * 10 = 200 elements, where each element is the dot product of a row of ai with a row of bi (both 128-dimensional). So there are 200 dot products in total, and ci is more like a 2-D image.
How can I implement this operation?
Edit:
When I use this code in practice, the dot product should be replaceable with some other function, such as a Gaussian distance or cosine distance (contact_func in the code below).
So I need a general method to do this.
Here is my design, but I am not sure whether it is an efficient way to do this:
with tf.Session() as sess:
with tf.variable_scope('masssdsms'):
a = tf.get_variable('a', [1000,24,128], dtype=tf.float32, initializer=tf.random_normal_initializer(stddev=0.1) )
b = tf.get_variable('b', [1000,15,128], dtype=tf.float32, initializer=tf.random_normal_initializer(stddev=0.1) )
i = 999 # for i in range(1000):
ai = tf.slice(a,[i,0,0],[1,-1,-1]) # (1,24,128)
bi = tf.slice(b,[i,0,0],[1,-1,-1]) # (1,15,128)
ci = contact_func(ai,bi) # (1,24,15)
You can achieve that with clever application of broadcasting. Try this:
a = tf.ones([1000, 20, 128])
b = tf.ones([1000, 10, 128])
a = tf.expand_dims(a, axis=1) # [1000, 1, 20, 128]
b = tf.expand_dims(b, axis=2) # [1000, 10, 1, 128]
products = a * b # [1000, 10, 20, 128]
reduced = tf.reduce_sum(products, axis=-1) # [1000, 10, 20]
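products contains the element-wise products of every pair of rows from a and b, and reduced sums them over the last axis, which gives every pairwise dot product. (To get [1000, 20, 10] instead, expand a at axis=2 and b at axis=1.)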
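The same broadcasting trick generalizes to the contact_func requirement: build the pairwise product (or difference) tensor once, then reduce it with whatever function you need. A NumPy sketch with small hypothetical sizes (batch 4, 6 rows in a, 5 rows in b, 8 features); here a is expanded at axis 2 and b at axis 1, which yields the (batch, rows_a, rows_b) layout from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6, 8))  # (batch, rows_a, features)
b = rng.standard_normal((4, 5, 8))  # (batch, rows_b, features)

a_e = a[:, :, None, :]  # (4, 6, 1, 8)
b_e = b[:, None, :, :]  # (4, 1, 5, 8)

# Dot product for every (row of a, row of b) pair: (4, 6, 5)
dots = np.sum(a_e * b_e, axis=-1)

# Cosine similarity: divide by the product of the row norms
# (cosine distance would be 1 - cosine)
norms = np.linalg.norm(a_e, axis=-1) * np.linalg.norm(b_e, axis=-1)
cosine = dots / norms

# Spot-check one pair against an explicit dot product
assert np.isclose(dots[2, 3, 4], a[2, 3] @ b[2, 4])
print(dots.shape, cosine.shape)  # (4, 6, 5) (4, 6, 5)
```

The trade-off is memory: the intermediate a_e * b_e holds batch * rows_a * rows_b * features values, which is fine for these sizes but grows quickly.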
A batched matmul of a with b transposed over its last two dimensions should give the desired result:
c = tf.matmul(a, tf.transpose(b, [0, 2, 1])) # [1000, 20, 10]
# to get (1000, 20, 10, 1) you do
tf.expand_dims(c, 3)
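As a sanity check of the matmul formulation, a NumPy sketch (hypothetical small sizes) comparing the batched matmul against b with its last two axes swapped, the broadcast-and-sum version, and the trailing singleton axis added at the end:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 20, 16))  # (batch, 20, features)
b = rng.standard_normal((3, 10, 16))  # (batch, 10, features)

# Batched matmul: (3, 20, 16) @ (3, 16, 10) -> (3, 20, 10)
c = a @ np.transpose(b, (0, 2, 1))

# Same result from broadcasting and summing over the feature axis
c_bcast = np.sum(a[:, :, None, :] * b[:, None, :, :], axis=-1)

c4 = np.expand_dims(c, 3)  # trailing channel axis, like tf.expand_dims(c, 3)
print(c4.shape)                 # (3, 20, 10, 1)
print(np.allclose(c, c_bcast))  # True
```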
EDIT:
For the contact_func operation, you may need to do the broadcasting manually using the tile operator. Here is the code for a Gaussian distance:
# use tile to repeat the rows
d = tf.reshape(tf.tile(a, [1, 1, b.shape[1]]), (-1,a.shape[1]*b.shape[1],a.shape[2]))
# [1000, 360, 128]
# repeat the columns
e = tf.tile(b, [1, a.shape[1], 1])
# [1000, 360, 128]
# exp(-d_ij), where d_ij is the squared Euclidean distance between row i of a and row j of b
c = tf.reshape(tf.exp(-tf.reduce_sum(tf.square(d - e), 2)), (-1, a.shape[1], b.shape[1]))
#[1000, 24, 15]
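Broadcasting can also produce the Gaussian kernel without tile and reshape: subtract the expanded tensors, square, sum over the feature axis, and exponentiate the negated result. A NumPy sketch with hypothetical small sizes, checked against the tile-based row layout (row i * n_b + j pairs a_i with b_j):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((2, 4, 3))  # (batch, rows_a, features)
b = rng.standard_normal((2, 5, 3))  # (batch, rows_b, features)
n_a, n_b, f = a.shape[1], b.shape[1], a.shape[2]

# Broadcasting: squared Euclidean distance for every pair, shape (2, 4, 5)
sq_dist = np.sum((a[:, :, None, :] - b[:, None, :, :]) ** 2, axis=-1)
c = np.exp(-sq_dist)

# Tile-based layout: row i*n_b + j pairs a_i with b_j
d = np.tile(a, (1, 1, n_b)).reshape(-1, n_a * n_b, f)
e = np.tile(b, (1, n_a, 1))
c_tile = np.exp(-np.sum((d - e) ** 2, axis=2)).reshape(-1, n_a, n_b)

print(c.shape, np.allclose(c, c_tile))  # (2, 4, 5) True
```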
I would like to center my set of rows with respect to several means and get one set of centered rows per mean.
My data has shape (4, 3), i.e. four 3D vectors:
data = tf.get_variable("myvar1", shape=[4, 3], dtype=tf.float64)
I have two centers (two 3D vectors):
mu = tf.get_variable("mu", initializer=tf.constant(np.arange(2*3).reshape(2, 3), dtype=tf.float64))
I would like to center data once per each mu. In numpy I would write loop:
data = np.arange(4 * 3).reshape(4, 3)
mu = np.arange(2*3).reshape(2, 3)
centered_data = np.empty((2, 4, 3))
for i_data in range(len(data)):
for i_mu in range(len(mu)):
centered = data[i_data] - mu[i_mu]
centered_data[i_mu, i_data, :] = centered
How can I do the same in TensorFlow?
A bulk (vectorized) method for NumPy would also be appreciated!
Apparently I can insert a singleton dimension to trigger broadcasting:
data = tf.get_variable("myvar1", shape=[4, 3], dtype=tf.float64)
mu = tf.get_variable("mu", initializer=tf.constant(np.arange(2*3).reshape(2, 3), dtype=tf.float64))
centered_data = data - tf.expand_dims(mu, axis=1)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
ans_value, data_value, mu_value = sess.run([centered_data, data, mu], {data: np.arange(4 * 3).reshape(4, 3)})
print("data: ", data_value)
print("mu: ", mu_value)
print("ans: ", ans_value)
The same in NumPy:
mu = np.reshape(mu, (2, 1, 3))
centered_data = data - mu
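To confirm that the broadcast version matches the loop from the question, a NumPy sketch: reshaping mu to (2, 1, 3) lets (4, 3) and (2, 1, 3) broadcast to (2, 4, 3), so that centered[i_mu, i_data] = data[i_data] - mu[i_mu].

```python
import numpy as np

data = np.arange(4 * 3).reshape(4, 3).astype(float)
mu = np.arange(2 * 3).reshape(2, 3).astype(float)

# Loop version from the question
loop = np.empty((2, 4, 3))
for i_data in range(len(data)):
    for i_mu in range(len(mu)):
        loop[i_mu, i_data, :] = data[i_data] - mu[i_mu]

# Broadcast version: (4, 3) - (2, 1, 3) -> (2, 4, 3)
bulk = data - mu[:, None, :]

print(bulk.shape, np.array_equal(loop, bulk))  # (2, 4, 3) True
```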
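You only need to use - or tf.subtract, which operates element-wise with broadcasting. mu still needs a singleton axis, so that shapes (4, 3) and (2, 1, 3) broadcast to (2, 4, 3):
centered_data = tf.subtract(data, tf.expand_dims(mu, axis=1))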
I have a tensor X of shape (None, 56, 300, 1) and another tensor y of shape (None, 15); the first dimension of both is batch_size. I want to use y as an index into X to get a tensor z of shape (None, 15, 300, 1). Is there a decent way to do this?
I wrote a simple test, because I found this difficult: in practice I don't know the batch size (the first dimension of these tensors is None).
Here is my test code:
import numpy as np
import tensorflow as tf
# In this test code, batch_size is 4.
# params' shape is (4, 3, 2, 1); in practice it is (None, 56, 300, 1).
params = [
[[['a0'], ['b0']], [['d0'], ['e0']], [['f0'], ['g0']]],
[[['a1'], ['b1']], [['d1'], ['e1']], [['f1'], ['g1']]],
[[['a2'], ['b2']], [['d2'], ['e2']], [['f2'], ['g2']]],
[[['a3'], ['b3']], [['d3'], ['e3']], [['f3'], ['g3']]],
]
# ind's shape is (4, 2) (in practice (None, 15)),
# so I want an output of shape (4, 2, 2, 1) (in practice (None, 15, 300, 1))
ind = [[1, 0], [0, 2], [2, 0], [2, 1]]
# output = [
# [[['d0'], ['e0']], [['a0'], ['b0']]],
# [[['a1'], ['b1']], [['f1'], ['g1']]],
# [[['f2'], ['g2']], [['a2'], ['b2']]],
# [[['f3'], ['g3']], [['d3'], ['e3']]]
#]
with tf.variable_scope('gather') as scope:
tf_par = tf.constant(params)
tf_ind = tf.constant(ind)
res = tf.gather_nd(tf_par, tf_ind)
with tf.Session() as sess:
init = tf.global_variables_initializer()
print(sess.run(res))
print(res)
To slice x along the second dimension with ind, that is, to slice a tensor x of shape (d0, d1, d2, ...), d0 possibly being None, with a tensor of indices ind of shape (d0, n1) to obtain a tensor y of shape (d0, n1, d2, ...), you could use tf.gather_nd along with tf.shape to get the shape at run time:
ind_shape = tf.shape(ind)
ndind = tf.stack([tf.tile(tf.range(ind_shape[0])[:, None], [1, ind_shape[1]]),
ind], axis=-1)
y = tf.gather_nd(x, ndind)
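The idea is to pair every entry of ind with its batch position, so gather_nd receives full (batch, row) coordinates. A NumPy sketch of the same construction on data shaped like the question's toy example (integers stand in for the string labels, and the batch size is read from the index array at run time); NumPy advanced indexing plays the role of gather_nd:

```python
import numpy as np

# x: (batch=4, rows=3, 2, 1); integer values stand in for 'a0', 'b0', ...
x = np.arange(4 * 3 * 2).reshape(4, 3, 2, 1)
ind = np.array([[1, 0], [0, 2], [2, 0], [2, 1]])  # (batch, n1)

batch, n1 = ind.shape
# Batch index for every entry of ind: [[0, 0], [1, 1], [2, 2], [3, 3]]
batch_idx = np.tile(np.arange(batch)[:, None], (1, n1))
ndind = np.stack([batch_idx, ind], axis=-1)  # (batch, n1, 2), like the tf.stack above

# Indexing with both coordinate arrays gathers x[b, ind[b, k]] for every b, k
y = x[ndind[..., 0], ndind[..., 1]]
print(y.shape)  # (4, 2, 2, 1)
```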
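For the results you expect, ind has to contain full (batch, row) index pairs:
ind = [[0, 1], [0, 0], [1, 0], [1, 2], [2, 2], [2, 0], [3, 2], [3, 1]]
gather_nd then returns a tensor of shape (8, 2, 1), which you can reshape to (4, 2, 2, 1).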
Update
You can use this code to get what you want from the current input:
with tf.variable_scope('gather') as scope:
tf_par = tf.constant(params)
tf_ind = tf.constant(ind)
tf_par_shape = tf.shape(tf_par)
tf_ind_shape = tf.shape(tf_ind)
tf_r = tf.div(tf.range(0, tf_ind_shape[0] * tf_ind_shape[1]), tf_ind_shape[1])
tf_r = tf.expand_dims(tf_r, 1)
tf_ind = tf.expand_dims(tf.reshape(tf_ind, shape = [-1]), 1)
tf_ind = tf.concat([tf_r, tf_ind], axis=1)
res = tf.gather_nd(tf_par, tf_ind)
res = tf.reshape(res, shape = (-1, tf_ind_shape[1], tf_par_shape[2], tf_par_shape[3]))
I have this function:
def resize_image(input_layer, counter ,width):
shape = input_layer.get_shape().as_list()
H = tf.cast((width * shape[2] / shape[1]), tf.int32)
print (H)
resized_images = tf.image.resize_images(input_layer, [width, H], tf.image.ResizeMethod.BICUBIC)
print (resized_images)
pad_diff = width - H
padd_images = tf.pad(resized_images, [[0, 0], [0, pad_diff], [0, 0], [0, 0]] , 'CONSTANT')
return padd_images, counter
When I run this:
sess = tf.InteractiveSession()
I = tf.random_uniform([15, 15, 13, 5], minval = -5, maxval = 10, dtype = tf.float32)
padd_images, counter = resize_image(I, 1, 5)
print (I)
print(padd_images)
sess.run(padd_images)
I get this:
Tensor("Cast/x:0", shape=(), dtype=int32)
Tensor("ResizeBicubic:0", shape=(15, 5, 4, 5), dtype=float32)
Tensor("random_uniform:0", shape=(15, 15, 13, 5), dtype=float32)
Tensor("Pad:0", shape=(?, ?, ?, ?), dtype=float32)
Why are there ? in the shape of padd_images? Is there a way to know its shape?
The problem is the line
H = tf.cast((width * shape[2] / shape[1]), tf.int32)
Here you're defining a tensor. Thus when you compute:
pad_diff = width - H
you're defining an operation in the graph.
Thus the value of pad_diff is not known at graph-construction ("compile") time; you only know it at run time.
Since you don't need H to be a tensor, just use a regular Python int conversion instead, replacing H with
H = int(width * shape[2] / shape[1])
In this way, the operations that use H are evaluated in Python, so their values are known at "compile time" and TensorFlow can infer the static shape of the padded tensor.
After that you'll see:
Tensor("Pad:0", shape=(15, 6, 4, 5), dtype=float32)
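Because every quantity is now a plain Python int, the static output shape can be worked out before the graph runs. A small sketch of that arithmetic; the helper name padded_shape is hypothetical and only mirrors the steps in resize_image:

```python
def padded_shape(input_shape, width):
    """Static shape produced by resize_image when H is a Python int."""
    n, h_in, w_in, channels = input_shape
    H = int(width * w_in / h_in)  # int(5 * 13 / 15) = int(4.33...) = 4
    pad_diff = width - H          # 5 - 4 = 1 extra row of padding
    # resize -> (n, width, H, channels); pad adds pad_diff rows on axis 1
    return (n, width + pad_diff, H, channels)


print(padded_shape((15, 15, 13, 5), 5))  # (15, 6, 4, 5)
```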