How to access the value projection at MultiHeadAttention layer in Pytorch - python

I'm writing my own implementation of the Graphormer architecture. Since this architecture adds an edge-based bias to the output of the key-query multiplication in the self-attention mechanism, I am adding that bias by hand and multiplying the data by the attention weights outside the attention layer:
import torch as th
from torch import nn
# Variable initialization
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)
# Tensors
x = th.randn(B, T, C)
attn_bias = th.ones((B, T, T))
# Self-attention mechanism
_, attn_wei = self_attn(query=x, key=x, value=x)
# Adding attention bias
if attn_bias is not None:
    attn_wei = attn_wei + attn_bias
x = attn_wei @ x  # TODO use value(x) instead of x
print(x)
This works, but to use the full potential of self-attention, the last matrix multiplication should be x = attn_wei @ value(x). However, I cannot find the value projection on the self_attn object, although it should have something like that inside it.
How could I do this?
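One possible approach, sketched under a few assumptions: with the default configuration (kdim and vdim equal to embed_dim), nn.MultiheadAttention stores the query, key and value projections packed in in_proj_weight and in_proj_bias, so the value projection is the last third of each. The sketch also assumes PyTorch 1.11 or later for average_attn_weights, and it passes the bias as a float attn_mask, which the layer adds to the attention scores before the softmax (the returned attn_wei is already post-softmax, so adding the bias to it afterwards is not the same thing). The names value, mask, v_heads and manual are only illustrative.

```python
import torch as th
from torch import nn

th.manual_seed(0)
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)
x = th.randn(B, T, C)
attn_bias = th.randn(B, T, T)

# The packed input projection has shape (3*C, C): rows are [query; key; value].
w_q, w_k, w_v = self_attn.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = self_attn.in_proj_bias.chunk(3, dim=0)
value = nn.functional.linear(x, w_v, b_v)  # value(x), shape (B, T, C)

# A float attn_mask of shape (B*H, T, T) is added to the scores before softmax.
# Heads of the same batch element are adjacent, hence repeat_interleave.
mask = attn_bias.repeat_interleave(H, dim=0)
out, attn_wei = self_attn(x, x, x, attn_mask=mask, average_attn_weights=False)
# attn_wei has shape (B, H, T, T): one weight matrix per head.

# Reproduce the layer output by hand: split value(x) into heads,
# apply the weights, merge the heads and apply the output projection.
v_heads = value.view(B, T, H, C // H).transpose(1, 2)  # (B, H, T, C // H)
manual = (attn_wei @ v_heads).transpose(1, 2).reshape(B, T, C)
manual = self_attn.out_proj(manual)

print(th.allclose(manual, out, atol=1e-6))
```

If the bias is passed through attn_mask like this, the layer's own output already includes the value projection, so the manual reconstruction is only needed when the weights have to be modified in some other way.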

Related

In Tensorflow, how can I create a new tensor from a cosine similarity operation?

I'm trying to create an output tensor with dimensionality 32 x 576 x 2 from an operation between matrices M and X, with the following shapes:
M.shape: (576, 2, 2048)
X.shape: (32, 2048)
The operation I'm defining is an element-wise cosine similarity, from the following equation:
$$\cos(x, M_{j,k}) = \frac{x \cdot M_{j,k}}{\lVert x \rVert_2 \, \lVert M_{j,k} \rVert_2}$$
which represents the cosine similarity between the feature vector 𝑥 and the vector M_j,k.
This is how I've implemented it in code (incorrectly), where BATCH_SIZE=32, C=576, V=2:
@tf.function
def call(self, X):
    M = self.kernel
    norm_M = tf.norm(M, ord=2, axis=2)
    norm_X = tf.norm(X, ord=2, axis=1)
    l_r = (some scalar value, separate to this question)
    # Compute cosine similarity between X and M
    # as a matrix with dimensionality:
    # BATCH_SIZE x C x V
    feature_batch_size = tf.shape(X)[0]
    c = tf.shape(M)[0]
    v = tf.shape(M)[1]
    output_matrix = tf.zeros([feature_batch_size, c, v])
    output_matrix = tf.Variable(output_matrix, trainable=False)
    for row in tf.range(feature_batch_size):
        for column in tf.range(c):
            for channel in tf.range(v):
                a = tf.tensordot(M[column][channel], X[row], 1)
                b = norm_M[column][channel] * norm_X[row]
                output_matrix[row][column][channel] = a / b
    return [output_matrix, l_r]
This fails on the line output_matrix[row][column][channel] = a / b because it's unhappy with an assignment to an individual row:column:channel of a tf.Variable.
Is there a better way to do this operation over these two matrices to create the desired output matrix so that it can be done without these three nested for loops and maintain compatibility with the tf.Function graph functionality?
If not, what can I do to assign variables to individual elements on a tf.Variable as I'm unsuccessfully attempting to do here?
Extra information:
norm_M.shape: (576, 2)
norm_X.shape: (32,)
You can replace the three nested loops entirely with vectorized operations:
num = tf.einsum('ij,klj->ikl',X,M)
denom = tf.einsum('i,jk->ijk',norm_X, norm_M)
output_matrix = num/denom
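To check that the two einsum expressions compute the same thing as the nested loops, here is a small sketch in NumPy, whose einsum uses the same subscript notation as tf.einsum. The sizes are small stand-ins for 32, 576, 2 and 2048, and the random inputs are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, V, D = 4, 5, 2, 8  # stand-ins for BATCH_SIZE=32, C=576, V=2, feature size 2048
X = rng.normal(size=(B, D))
M = rng.normal(size=(C, V, D))

norm_X = np.linalg.norm(X, axis=1)  # shape (B,)
norm_M = np.linalg.norm(M, axis=2)  # shape (C, V)

# Vectorized version: dot products over the feature axis j, then outer product of norms.
num = np.einsum('ij,klj->ikl', X, M)            # shape (B, C, V)
denom = np.einsum('i,jk->ijk', norm_X, norm_M)  # shape (B, C, V)
vectorized = num / denom

# Reference version: the three nested loops from the question.
looped = np.zeros((B, C, V))
for row in range(B):
    for column in range(C):
        for channel in range(V):
            a = np.dot(M[column, channel], X[row])
            b = norm_M[column, channel] * norm_X[row]
            looped[row, column, channel] = a / b

print(np.allclose(vectorized, looped))
```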

Implement ConvND in Tensorflow

So I need a ND convolutional layer that also supports complex numbers. So I decided to code it myself.
I tested this code on numpy alone and it worked. Tested with several channels, 2D and 1D and complex. However, I have problems when I do it on TF.
This is my code so far:
def call(self, inputs):
    with tf.name_scope("ComplexConvolution_" + str(self.layer_number)) as scope:
        inputs = self._verify_inputs(inputs)  # Check inputs are of expected shape and format
        inputs = self.apply_padding(inputs)  # Add zeros if needed
        output_np = np.zeros(  # I use np because tf does not support the assignment
            (inputs.shape[0],) +  # Per each image
            self.output_size,  # Image out size
            dtype=self.input_dtype  # To support complex numbers
        )
        img_index = 0
        for image in inputs:
            for filter_index in range(self.filters):
                for i in range(int(np.prod(self.output_size[:-1]))):  # for each element in the output
                    index = np.unravel_index(i, self.output_size[:-1])
                    start_index = tuple([a * b for a, b in zip(index, self.stride_shape)])
                    end_index = tuple([a+b for a, b in zip(start_index, self.kernel_shape)])
                    # set_trace()
                    sector_slice = tuple(
                        [slice(start_index[ind], end_index[ind]) for ind in range(len(start_index))]
                    )
                    sector = image[sector_slice]
                    new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
                    # I use Tied Bias https://datascience.stackexchange.com/a/37748/75968
                    output_np[img_index][index][filter_index] = new_value  # The complicated line
            img_index += 1
        output = apply_activation(self.activation, output_np)
    return output
input_size is a tuple of shape (dim1, dim2, ..., dimN, channels). For a 2D RGB convolution, for example, input_size will be (32, 32, 3) and inputs will have shape (None, 32, 32, 3).
The output size is calculated from an equation I found in this paper: A guide to convolution arithmetic for deep learning
out_list = []
for i in range(len(self.input_size) - 1):  # -1 because the number of input channels is irrelevant
    out_list.append(int(np.floor((self.input_size[i] + 2 * self.padding_shape[i] - self.kernel_shape[i]) / self.stride_shape[i]) + 1))
out_list.append(self.filters)
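As a quick sanity check of that output-size formula, here is a standalone sketch. The function name conv_output_size and its argument order are made up for this example; only the arithmetic comes from the snippet above.

```python
def conv_output_size(input_size, kernel_shape, stride_shape, padding_shape):
    """Spatial output size per dimension: floor((i + 2p - k) / s) + 1."""
    return tuple(
        (i + 2 * p - k) // s + 1
        for i, k, s, p in zip(input_size, kernel_shape, stride_shape, padding_shape)
    )

# 32x32 input, 3x3 kernel, stride 1, no padding
print(conv_output_size((32, 32), (3, 3), (1, 1), (0, 0)))  # (30, 30)
# 32x32 input, 3x3 kernel, stride 2, padding 1
print(conv_output_size((32, 32), (3, 3), (2, 2), (1, 1)))  # (16, 16)
```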
Basically, I use np.zeros because if I use tf.zeros I cannot assign the new_value and I get:
TypeError: 'Tensor' object does not support item assignment
However, in this current state I am getting:
NotImplementedError: Cannot convert a symbolic Tensor (placeholder_1:0) to a numpy array.
On that same assignment. I don't see an easy fix, I think I should change the strategy of the code completely.
In the end, I did it in a very inefficient way based on this comment (also commented here), but at least it works:
new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
indices = (img_index,) + index + (filter_index,)
mask = tf.Variable(tf.fill(output_np.shape, 1))
mask = mask[indices].assign(0)
mask = tf.cast(mask, dtype=self.input_dtype)
output_np = array * mask + (1 - mask) * new_value
I say inefficient because I create a whole new array for each assignment. My code is taking ages to compute for the moment so I will keep looking for improvements and post here if I get something better.
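A possible alternative to building a mask by hand, assuming TensorFlow 2.x, is tf.tensor_scatter_nd_update, which returns a copy of a tensor with the given entries replaced. It still produces a new tensor per call rather than assigning in place, so it is only a sketch of the single-element update. The shapes and the value 5.0 are arbitrary stand-ins, not taken from the layer above.

```python
import tensorflow as tf

# Stand-in for the output buffer: (images, dim1, dim2, filters)
output = tf.zeros([2, 3, 3, 4], dtype=tf.float32)

# Stand-ins for (img_index,) + index + (filter_index,) and for new_value
indices = (1, 2, 0, 3)
new_value = tf.constant(5.0)

# indices must have shape (num_updates, rank); updates has shape (num_updates,)
output = tf.tensor_scatter_nd_update(
    output,
    indices=[list(indices)],
    updates=tf.reshape(new_value, [1]),
)

print(output[1, 2, 0, 3].numpy())
```

Several elements can be written in one call by passing more rows in indices and more entries in updates, which avoids one full copy per element.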

map function in Pytorch

Is there any map function in Pytorch? (something like map in python).
I need to map a 1xDxhxw tensor variable to a 1x(9D)xhxw tensor, to augment embedding of each pixel with its 8 neighbour embeddings. Is there any functionality in Pytorch that lets me do that efficiently?
I tried using map in Python this way:
n, d, h, w = embedding.size()
padder = nn.ReflectionPad2d(padding=1)
embedding = padder(embedding)
embedding = map(lambda i, j, M: M[:, :, i-1:i+2, j-1:j+2], range(1, h), range(1, w), embedding)
But it does not work for w > 2 and h > 2.
From your question, it is not clear what you are trying to accomplish.
Note that full Python is supported in PyTorch, but your last line only creates a map object. The following should work for your purposes (I'm guessing), though:
import torch
import torch.nn as nn
n, d, h, w = 20, 3, 32, 32
embedding = torch.randn(n, d, h, w)
padder = nn.ReflectionPad2d(padding=1)
embedding = padder(embedding)
new = [embedding[:,:, (i-1):(i+2), (j-1):(j+2)] for i, j in zip(range(1,h), range(1,w))]
Note, however, that there are more elegant ways to chunk up a tensor (e.g. torch.chunk()) or to operate on patches with convolutions (e.g. torch.nn.Conv2d).
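If the goal is really the 1x(9D)xhxw tensor described in the question, one option is torch.nn.functional.unfold, which extracts every sliding 3x3 patch in a single call. This is a sketch under the assumption that each pixel's 9D output channels should hold its own embedding and those of its 8 neighbours, grouped channel by channel; the sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d, h, w = 1, 3, 5, 6
embedding = torch.randn(n, d, h, w)

# Pad by one pixel so that border pixels also have 8 neighbours.
padded = nn.ReflectionPad2d(padding=1)(embedding)  # (n, d, h+2, w+2)

# unfold returns (n, d * 9, h * w): for each channel, the 9 values of every 3x3 patch.
patches = F.unfold(padded, kernel_size=3)
augmented = patches.reshape(n, d * 9, h, w)  # (n, 9d, h, w)

# The centre of each 3x3 patch (offset 4 of 9) is the original pixel.
centre = augmented.reshape(n, d, 9, h, w)[:, :, 4]
print(augmented.shape, torch.allclose(centre, embedding))
```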

Determining tensor shapes at time of graph creation in TensorFlow

I'm trying to write a chunk of reusable code that reads the shape of one tensor and then uses the resulting object to define the shape of other tensors. I have a choice of reading the dynamic shape of the tensor with tf.shape(tensor) or the static shape of the tensor with tensor.get_shape(). The toy example looks like this (with the two different strategies):
def my_function_strategy_1(x, y):
    x_shape = tf.shape(x)
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
def my_function_strategy_2(x, y):
    x_shape = x.get_shape()
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
I want to use this chunk of code in different graphs. Sometimes the shape of the input tensors will be known and sometimes they will be unknown:
graph_A = tf.Graph()
with graph_A.as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)
graph_B = tf.Graph()
with graph_B.as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
The behavior I want is: (A) when the shapes of the input tensors are known (as in graph_A), I want TensorFlow to calculate all of the shapes in the graph at graph creation time (so it can efficiently allocate resources, etc.), and (B) when the shapes of the input tensors are unknown (as in graph_B), I want TensorFlow to wait until runtime to calculate all of the shapes in the graph.
The strategy_1 version of the function almost does this. It achieves (B), but it doesn't quite achieve (A) because TensorFlow leaves the shape of some tensors unknown. For example, in the toy example above, the shapes of a, b, and c are calculated at graph creation time, but the shape of d is left unknown (even though d uses very similar operations). You can check this by printing a.get_shape(), b.get_shape(), etc.
Conversely, the strategy_2 version of the function achieves (A) for all tensors in the graph, but doesn't achieve (B) because TensorFlow (understandably) throws an exception when it tries to use the (unknown) static shape of the input tensor to shape other tensors.
Is there a way to achieve both (A) and (B) in a single function? How/why does the strategy_1 version work for most tensors in the graph, but not all?
You can carefully pick the elements of the shape that you know to have a "best of both worlds" result:
def my_get_shape(tensor):
    if tensor.shape.ndims is None:
        # Fully dynamic
        return tf.shape(tensor)
    if tensor.shape.is_fully_defined():
        # Fully static
        return tensor.shape
    # Partially static
    dyn_shape = tf.shape(tensor)
    shape = []
    for i, d in enumerate(tensor.shape):
        shape.append(d.value if d.value is not None else dyn_shape[i])
    return shape
def my_function(x, y):
    x_shape = my_get_shape(x)  # Or just tf.shape(x)! - see edit
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
# Fully static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)
print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
# a: (2, 4) , b: (2, 4) , c: (2, 4) , d: (2, 4)
# Fully dynamic
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
# a: <unknown> , b: <unknown> , c: (?, 4) , d: (?, 4)
# Partially static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [None, 4])
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
# a: (?, 4) , b: (?, 4) , c: (?, 4) , d: (?, 4)
EDIT:
Actually, replacing my_get_shape with tf.shape in the previous snippet works exactly the same. It seems that tf.shape should be the default (being careful not to cram the graph with it) unless you explicitly want to keep dimensions undefined.
I have investigated a bit, and I couldn't work the whole thing out completely. I don't know if this is useful, but here are some things I found out. Apparently TensorFlow has, at the C++ level (it seems it used to be in Python, but not anymore), a "shape inference" mechanism. If you look, for example, in tensorflow/core/ops/array_ops.cc, you will see that every operation declaration includes a .SetShapeFn at the end, which is a function that uses InferenceContext to try to guess the output shape of the operation. This class can, among other things, check whether values in a tensor are already known, which is true for example for tf.shape when the given tensor is static or for tf.fill (and related ops like tf.ones) with known values. The result of the shape inference algorithm is what is set as the tensor shape in Python, and it can be called directly (although I don't see how that can be useful) through call_cpp_shape_fn:
from tensorflow.python.framework.common_shapes import call_cpp_shape_fn
with tf.Graph().as_default():
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), tf.fill([2], 3)).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: 3 } dim { size: 3 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), (2 * tf.fill([2], 3))).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: -1 } dim { size: -1 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
You can see that, while tf.fill([2], 3) was correctly inspected, TensorFlow didn't work out that 2 * tf.fill([2], 3) is [6, 6], presumably because statically keeping track of operations like multiplication, even if operands are known constants, was deemed too expensive.
What I haven't found out is where do ops declare that their values can be statically known, or where/how these values are retrieved exactly. It seems that, for example, for tf.shape, it is able to specifically pick known values and leave the rest as undefined.
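The snippets above use the TensorFlow 1.x placeholder API and d.value. For reference, here is a rough sketch of the same "static where known, dynamic otherwise" idea under TensorFlow 2.x, assuming a tensor of known rank. The helper name static_or_dynamic_shape and the seen list are made up for this example; seen only records what the helper returned while the function was being traced.

```python
import tensorflow as tf

seen = []  # filled at trace time, for inspection only

def static_or_dynamic_shape(tensor):
    """Return a list with a Python int per statically known dimension
    and a scalar tensor per unknown one. Assumes the rank is known."""
    static = tensor.shape.as_list()
    dynamic = tf.shape(tensor)
    return [s if s is not None else dynamic[i] for i, s in enumerate(static)]

@tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
def zeros_like_rows(x):
    shape = static_or_dynamic_shape(x)
    # First dimension is unknown while tracing, second is the int 4.
    seen.append((tf.is_tensor(shape[0]), shape[1]))
    return tf.zeros([shape[0], 4])

result = zeros_like_rows(tf.ones([3, 4]))
print(seen, result.shape)
```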

Tensorflow runtime determine the shape of Tensor

Does TensorFlow support determining the shape of a tensor at runtime?
The problem is to build a constant tensor at runtime based on the input vector length_q. The number of columns of the target tensor is the sum of length_q. The code snippet is shown below; the length of length_q is fixed to 64.
T = tf.reduce_sum(length_q, 0)[0]
N = np.shape(length_q)[0]
wm = np.zeros((N, T), dtype=np.float32)
# Something irrelevant.
count = 0
for i in xrange(N):
    ones = np.ones(length_q[i])
    wm[i][count:count+length_q[i]] = ones
    count += length_q[i]
return tf.constant(wm)
Update
I want to create a dynamic tensor according to the input length_q. length_q is an input vector (64*1). The shape of the new tensor depends on the sum of length_q, because the data in length_q changes with each batch. The current code snippet is as follows:
def some_matrix(length_q):
    T = tf.reduce_sum(length_q, 0)[0]
    N = np.shape(length_q)[0]
    wm = np.zeros((N, T), dtype=np.float32)
    count = 0
    return wm
def network_inference(length_q):
    wm = tf.constant(some_matrix(length_q))
    ...
The problem probably occurs because length_q is a placeholder and does not support the summation. Is there any way to solve this?
It sounds like the tf.fill() op is what you need. This op allows you to specify a shape as a tf.Tensor (i.e. a runtime value) along with a value:
def some_matrix(length_q):
    T = tf.reduce_sum(length_q, 0)[0]
    N = tf.shape(length_q)[0]
    wm = tf.fill([N, T], 0.0)  # N rows, T columns, as in the question
    return wm
It is not clear what you are calculating. If you need a tensor with N rows and T columns, you can generate ones like this:
T = tf.constant(20.0,tf.float32) # tf variable which is reduced sum , 20.0 is example float value
T = tf.cast(T,tf.int32) # columns will be integer only
N = 10 # if numpy integer- assuming np.shape giving 10
# N = length_q.get_shape()[0] # if it's a tensor; replace 'length_q' with your tensor name
wm = tf.ones([N,T],dtype=tf.float32) # N rows and T columns
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(wm)
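The loop in the question fills row i with ones over its own segment of length length_q[i], one segment after another. Assuming that is the intended matrix, it can be built from runtime values without any Python loop. The sketch below assumes TensorFlow 2.x, and the function name block_mask is made up for this example.

```python
import tensorflow as tf

def block_mask(length_q):
    """Row i has ones in columns [start_i, end_i), where the segments
    follow each other and segment i has length length_q[i]."""
    length_q = tf.cast(tf.reshape(length_q, [-1]), tf.int32)
    ends = tf.cumsum(length_q)                      # exclusive end of each segment
    starts = ends - length_q                        # inclusive start of each segment
    positions = tf.range(tf.reduce_sum(length_q))   # column indices 0 .. T-1
    inside = tf.logical_and(
        positions[tf.newaxis, :] >= starts[:, tf.newaxis],
        positions[tf.newaxis, :] < ends[:, tf.newaxis],
    )
    return tf.cast(inside, tf.float32)              # shape (N, T)

wm = block_mask(tf.constant([2, 1, 3]))
print(wm.numpy())
# [[1. 1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0.]
#  [0. 0. 0. 1. 1. 1.]]
```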
