Determining tensor shapes at time of graph creation in TensorFlow - python

I'm trying to write a chunk of reusable code that reads the shape of one tensor and then uses the resulting object to define the shape of other tensors. I have a choice of reading the dynamic shape of the tensor with tf.shape(tensor) or the static shape of the tensor with tensor.get_shape(). The toy example looks like this (with the two different strategies):
def my_function_strategy_1(x, y):
    x_shape = tf.shape(x)
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
def my_function_strategy_2(x, y):
    x_shape = x.get_shape()
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
I want to use this chunk of code in different graphs. Sometimes the shapes of the input tensors will be known and sometimes they will be unknown:
# my_function stands for either of the two strategies above
graph_A = tf.Graph()
with graph_A.as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)
graph_B = tf.Graph()
with graph_B.as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
The behavior I want is: (A) when the shapes of the input tensors are known (as in graph_A), I want TensorFlow to calculate all of the shapes in the graph at graph creation time (so it can efficiently allocate resources, etc.), and (B) when the shapes of the input tensors are unknown (as in graph_B), I want TensorFlow to wait until runtime to calculate all of the shapes in the graph.
The strategy_1 version of the function almost does this. It achieves (B), but it doesn't quite achieve (A) because TensorFlow leaves the shape of some tensors unknown. For example, in the toy example above, the shapes of a, b, and c are calculated at graph creation time, but the shape of d is left unknown (even though d uses very similar operations). You can check this by printing a.get_shape(), b.get_shape(), etc.
Conversely, the strategy_2 version of the function achieves (A) for all tensors in the graph, but doesn't achieve (B) because TensorFlow (understandably) throws an exception when it tries to use the (unknown) static shape of the input tensor to shape other tensors.
Is there a way to achieve both (A) and (B) in a single function? How/why does the strategy_1 version work for most tensors in the graph, but not all?

You can carefully pick the elements of the shape that you know to have a "best of both worlds" result:
def my_get_shape(tensor):
    if tensor.shape.ndims is None:
        # Fully dynamic
        return tf.shape(tensor)
    if tensor.shape.is_fully_defined():
        # Fully static
        return tensor.shape
    # Partially static
    dyn_shape = tf.shape(tensor)
    shape = []
    for i, d in enumerate(tensor.shape):
        shape.append(d.value if d.value is not None else dyn_shape[i])
    return shape
def my_function(x, y):
    x_shape = my_get_shape(x)  # Or just tf.shape(x)! - see edit
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
# Fully static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: (2, 4) , b: (2, 4) , c: (2, 4) , d: (2, 4)
# Fully dynamic
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: <unknown> , b: <unknown> , c: (?, 4) , d: (?, 4)
# Partially static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [None, 4])
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: (?, 4) , b: (?, 4) , c: (?, 4) , d: (?, 4)
Actually, replacing my_get_shape with tf.shape in the previous snippet works exactly the same. It seems that tf.shape should be the default (being careful not to cram the graph with it) unless you explicitly want to keep dimensions undefined.
I have investigated a bit, but I couldn't work the whole thing out completely. I don't know if this is useful, but here are some things I found out. Apparently TensorFlow has, at the C++ level (it seems it used to be in Python, but not anymore), a "shape inference" mechanism. If you look, for example, in tensorflow/core/ops/, you will see that every operation declaration includes a .SetShapeFn at the end, which is a function that uses InferenceContext to try to infer the output shape of the operation. This class can, among other things, check whether values in a tensor are already known, which is true, for example, for tf.shape when the given tensor is static, or for tf.fill (and related ops like tf.ones) with known values. The result of the shape inference algorithm is what is set as the tensor shape in Python, and it can be called directly (although I don't see how it can be useful) through call_cpp_shape_fn:
from tensorflow.python.framework.common_shapes import call_cpp_shape_fn
with tf.Graph().as_default():
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), tf.fill([2], 3)).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: 3 } dim { size: 3 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), (2 * tf.fill([2], 3))).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: -1 } dim { size: -1 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
You can see that, while tf.fill([2], 3) was correctly inspected, TensorFlow didn't work out that 2 * tf.fill([2], 3) is [6, 6], presumably because statically keeping track of operations like multiplication, even if operands are known constants, was deemed too expensive.
What I haven't found out is where ops declare that their values can be statically known, or where and how exactly these values are retrieved. It seems that tf.shape, for example, is able to pick out the known values specifically and leave the rest undefined.
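The behaviour described above, where known dimensions are kept and the rest are left undefined, can be pictured with a small pure-Python toy. This is only a sketch of the idea and not TensorFlow's actual implementation: the helper names shape_op_value, pack_value and reshape_output_shape are hypothetical, and None stands for a dimension that is only known at run time.

```python
# Toy model of partial shape inference. These helpers are hypothetical
# illustrations, not TensorFlow APIs. None means "unknown until run time".

def shape_op_value(static_shape):
    """Statically known part of the value a shape op would produce.

    Returns None when even the rank is unknown, otherwise a list whose
    entries are ints (known) or None (unknown).
    """
    if static_shape is None:
        return None
    return list(static_shape)


def pack_value(elements):
    """Partial value of a list such as [num_x_values, 4] packed into a tensor.

    Plain ints are known constants; anything else is treated as unknown.
    """
    return [e if isinstance(e, int) else None for e in elements]


def reshape_output_shape(partial_shape_value):
    """Output shape of a reshape whose target shape is only partly known."""
    if partial_shape_value is None:
        return None  # rank unknown
    return tuple(partial_shape_value)


fully_static = reshape_output_shape(shape_op_value((2, 4)))
partially_static = reshape_output_shape(shape_op_value((None, 4)))
fully_dynamic = reshape_output_shape(shape_op_value(None))

# A run-time scalar (modelled by an arbitrary non-int object) next to a constant.
runtime_scalar = object()
mixed = reshape_output_shape(pack_value([runtime_scalar, 4]))
```

In this toy, mixed keeps the constant 4 and leaves the first dimension unknown, which mirrors the (?, 4) shapes printed for c and d in the fully dynamic case.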


How to access the value projection at MultiHeadAttention layer in Pytorch

I'm writing my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism, I am adding that bias by hand and doing the matrix multiplication of the data with the attention weights outside the attention mechanism:
import torch as th
from torch import nn
# Variable initialization
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first = True)
# Tensors
x = th.randn(B, T, C)
attn_bias = th.ones((B, T, T))
# Self-attention mechanism
_, attn_wei = self_attn(query=x, key=x, value=x)
# Adding attention bias
if attn_bias is not None:
    attn_wei = attn_wei + attn_bias
x = attn_wei @ x # TODO use value(x) instead of x
This works, but to use the full potential of self-attention, the last matrix multiplication should be x = attn_wei @ value(x). However, I am not able to get the value projector from the self_attn object, which should have something like that inside of it.
How could I do this?
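One possible approach, sketched under the assumption that the layer is built with its defaults (kdim and vdim unset, bias enabled), is to slice the packed projection that nn.MultiheadAttention stores in in_proj_weight and in_proj_bias. In that configuration the query, key and value projections are stacked in that order. The sketch below requests per-head weights with average_attn_weights=False, which needs a reasonably recent PyTorch, and checks the manual result against the layer's own output.

```python
import torch as th
import torch.nn.functional as F
from torch import nn

th.manual_seed(0)
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)
x = th.randn(B, T, C)

# With the default configuration, the query, key and value projections are
# stacked along dim 0 of in_proj_weight / in_proj_bias, in that order.
w_q, w_k, w_v = self_attn.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = self_attn.in_proj_bias.chunk(3, dim=0)

# Value projection, then split the embedding into heads.
v = F.linear(x, w_v, b_v)                      # (B, T, C)
v = v.view(B, T, H, C // H).transpose(1, 2)    # (B, H, T, C // H)

# Per-head attention weights; by default the layer averages over heads.
attn_out, attn_wei = self_attn(
    x, x, x, need_weights=True, average_attn_weights=False
)                                              # attn_wei: (B, H, T, T)

# Weighted sum of values per head, merge the heads, apply the output projection.
out = th.matmul(attn_wei, v)                   # (B, H, T, C // H)
out = out.transpose(1, 2).reshape(B, T, C)
out = self_attn.out_proj(out)
```

Here out should match attn_out up to floating-point error. A bias of shape (B, T, T) like the one in the question could be broadcast over heads with attn_bias.unsqueeze(1) before the matmul.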

Best way to mimic PyTorch sliced assignment with Keras/Tensorflow

I am trying to mimic the operation done in PyTorch below:
vol = Variable(torch.FloatTensor(A, B*2, C, D, E).zero_()).cuda()
for i in range(C):
    if i > 0 :
        vol[:, :B, i, :,i:] = input0[:,:,:,i:]
        vol[:, B:, i, :,i:] = input1[:,:,:,:-i]
    else:
        vol[:, :B, i, :,:] = input0
        vol[:, B:, i, :,:] = input1
So far, I have tried using the following sliced assignment in TF and wrapping it in a Keras Lambda layer:
vol = tf.Variable(K.zeros((A, D, E, C, B*2)))
for i in range(C):
    if i > 0:
        vol[:, :, i:, i, :B].assign(input0[:,:,i:,:])
        vol[:, :, i:, i, B:].assign(input1[:,:,:-i,:])
    else:
        vol[:, :, :, i, :B].assign(input0)
        vol[:, :, :, i, B:].assign(input1)
return vol
I also tried vol = vol[...].assign(...).
This assigns the values to the vol variable correctly, which I can then convert to a tensor to use in the rest of my graph. However, the gradient of this operation is undefined in TF (LookupError: No gradient defined for operation 'strided_slice/_assign' (op type: StridedSliceAssign)), and the gradient doesn't get propagated to the previous layers that generate input0 and input1, while they do appear to get transferred in the PyTorch implementation. Is there a way to construct this same variable in TF such that the gradient is defined and my previous operations don't have a None gradient?
You need to construct the tensor "by hand". Assuming both input0 and input1 have shape (A, D, E, B), you can do something like this:
# Make the indexing mask with TensorFlow
in_shape = tf.shape(input0)
in_dims = 4
idx = tf.meshgrid(*[tf.range(in_shape[i]) for i in range(in_dims)], indexing='ij')[2]
idx = tf.expand_dims(idx, axis=3)
r = tf.range(C)[tf.newaxis, tf.newaxis, tf.newaxis, :, tf.newaxis]
mask = idx >= r
# If all dimensions are known at graph construction time, you can instead
# make the mask with NumPy like this to save graph computation time
idx = np.meshgrid(*[np.arange(d) for d in (A, D, E, B)], indexing='ij')[2]
idx = np.expand_dims(idx, 3)
r = np.arange(C)[np.newaxis, np.newaxis, np.newaxis, :, np.newaxis]
mask = idx >= r
# Make the tensor
input0_tile = tf.tile(tf.expand_dims(input0, 3), (1, 1, 1, C, 1))
input1_tile = tf.tile(tf.expand_dims(input1, 3), (1, 1, 1, C, 1))
zero_tile = tf.zeros_like(input0_tile)
vol0 = tf.where(mask, input0_tile, zero_tile)
vol1 = tf.where(mask, input1_tile, zero_tile)
vol = tf.concat([vol0, vol1], axis=-1)
Note that you need either the first or the second block followed by the third block, not the three blocks (see comments). The code builds a binary mask using a tf.meshgrid and a tf.range of indices, then uses tf.where to select values from the inputs or zeros.
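As a sanity check of the masking idea, here is a small NumPy-only sketch that compares the mask-based construction with plain slice assignment for the input0 half. The sizes are arbitrary assumptions, and the mask is built by broadcasting rather than with meshgrid. The input1 half in the question also shifts the data along the sliced axis (input1[:,:,:-i,:]), which this check does not cover.

```python
import numpy as np

A, D, E, B, C = 2, 3, 5, 2, 4   # small, arbitrary sizes for illustration
rng = np.random.default_rng(0)
input0 = rng.normal(size=(A, D, E, B))

# Reference: one slice assignment per i (for i == 0 this is the full slice).
ref = np.zeros((A, D, E, C, B))
for i in range(C):
    ref[:, :, i:, i, :] = input0[:, :, i:, :]

# Mask-based construction: keep position e along axis 2 when e >= i.
e_idx = np.arange(E)[np.newaxis, np.newaxis, :, np.newaxis, np.newaxis]
r = np.arange(C)[np.newaxis, np.newaxis, np.newaxis, :, np.newaxis]
mask = e_idx >= r               # shape (1, 1, E, C, 1), broadcasts below

input0_tile = np.tile(np.expand_dims(input0, 3), (1, 1, 1, C, 1))
vol0 = np.where(mask, input0_tile, np.zeros_like(input0_tile))
```

Both constructions give the same array here. The mask version contains no assignment op, which is what allows gradients to flow back to the inputs in the TensorFlow version.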
A tf.Variable is sort of a primitive/basic type. You shouldn't expect gradients to propagate out of it.
What you want is to construct a node that outputs the 5 dimensional tensor like you want.
I would run a concatenate operation on the 4th dimension to build the tensor and use the result in place of the vol.
If you don't care about the gradients propagating to input0 and input1, then I would just build the tensor outside of tensorflow and use it as an initializer.

How TensorArray and while_loop work together in tensorflow?

I am trying to produce a very simple example that combines TensorArray and while_loop:
# 1000 sequence in the length of 100
matrix = tf.placeholder(tf.int32, shape=(100, 1000), name="input_matrix")
matrix_rows = tf.shape(matrix)[0]
ta = tf.TensorArray(tf.float32, size=matrix_rows)
ta = ta.unstack(matrix)
init_state = (0, ta)
condition = lambda i, _: i < n
body = lambda i, ta: (i + 1, ta.write(i,*2))
# run the graph
with tf.Session() as sess:
    (n, ta_final) = sess.run(tf.while_loop(condition, body, init_state),feed_dict={matrix: tf.ones(tf.float32, shape=(100,1000))})
    print (ta_final.stack())
But I am getting the following error:
ValueError: Tensor("while/LoopCond:0", shape=(), dtype=bool) must be from the same graph as Tensor("Merge:0", shape=(), dtype=float32).
Does anyone have an idea what the problem is?
There are several things in your code to point out. First, you don't need to unstack the matrix into the TensorArray to use it inside the loop; you can safely reference the matrix Tensor inside the body and index it using the matrix[i] notation. Another issue is the data type mismatch between your matrix (tf.int32) and the TensorArray (tf.float32). Based on your code, you're multiplying the matrix ints by 2 and writing the result into the array, so the array should be int32 as well. Finally, when you wish to read the final result of the loop, the correct operation is TensorArray.stack(), which is what you need to run in your session.run call.
Here's a working example:
import numpy as np
import tensorflow as tf
# 1000 sequence in the length of 100
matrix = tf.placeholder(tf.int32, shape=(100, 1000), name="input_matrix")
matrix_rows = tf.shape(matrix)[0]
ta = tf.TensorArray(dtype=tf.int32, size=matrix_rows)
init_state = (0, ta)
condition = lambda i, _: i < matrix_rows
body = lambda i, ta: (i + 1, ta.write(i, matrix[i] * 2))
n, ta_final = tf.while_loop(condition, body, init_state)
# get the final result
ta_final_result = ta_final.stack()
# run the graph
with tf.Session() as sess:
    # print the output of ta_final_result
    print(sess.run(ta_final_result, feed_dict={matrix: np.ones(shape=(100,1000), dtype=np.int32)}))
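To make the interplay between the loop and the array more concrete, here is a pure-Python toy model of the same pattern. ToyArray and toy_while_loop are hypothetical stand-ins, not TensorFlow APIs. The point they illustrate is that write() returns a new array object, so the loop body has to pass that returned object on as a loop variable, just as the body above returns ta.write(...).

```python
class ToyArray:
    """Immutable stand-in for tf.TensorArray: write() returns a new array."""

    def __init__(self, size, items=None):
        self._items = tuple(items) if items is not None else (None,) * size

    def write(self, index, value):
        items = list(self._items)
        items[index] = value
        return ToyArray(len(items), items)

    def stack(self):
        return list(self._items)


def toy_while_loop(cond, body, loop_vars):
    """Run body while cond holds, threading the loop variables through."""
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars


matrix = [[1, 1, 1], [1, 1, 1]]   # small stand-in for the fed matrix
n_rows = len(matrix)

i_final, ta_final = toy_while_loop(
    lambda i, ta: i < n_rows,
    lambda i, ta: (i + 1, ta.write(i, [v * 2 for v in matrix[i]])),
    (0, ToyArray(n_rows)),
)
result = ta_final.stack()
```

If the body returned the original ta instead of the result of write(), the toy loop would end with an array that is still empty.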

Difference in matrix multiplication tensorflow vs numpy

I have a case where matrix multiplication of two matrices with certain dimensions works in numpy but doesn't work in tensorflow.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
z = np.matmul(x,y)
print("np shapes: %s x %s = %s" % (np.shape(x), np.shape(y), np.shape(z)))
This works as expected and prints:
np shapes: (10, 20, 30) x (30, 40) = (10, 20, 40)
However in tensorflow when I try to multiply placeholder and variable of the same shapes as the numpy arrays above I get an error
x = tf.placeholder(tf.float32, shape=(10,20,30))
y = tf.Variable(tf.truncated_normal([30,40], name='w'))
print("tf shapes: %s x %s" % (x.get_shape(), y.get_shape()))
Results in
tf shapes: (10, 20, 30) x (30, 40)
Shape must be rank 2 but is rank 3 for 'MatMul_12'
(op: 'MatMul') with input shapes: [10,20,30], [30,40].
Why does this operation fail?
I don't know why tf.matmul does not support this kind of multiplication (maybe one of the core developers could provide a meaningful answer).
But if you just want to be able to multiply tensors in this way, take a look at the tf.einsum function. It can operate on tensors of arbitrary rank.
As suggested by Dmytro, tf.einsum can be used to multiply these two arrays.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
These two operations produce exactly the same result:
np.einsum('ijk,kl->ijl', x, y)
And the corresponding tensorflow operation also works:
tf.einsum('ijk,kl->ijl', tf_x,tf_y)
People already told you that you can use tf.einsum() to get the result you want.
import tensorflow as tf
x = tf.random_normal([10, 20, 30])
y = tf.random_normal([30, 40])
z = tf.einsum('ijk,kl->ijl', x, y)
The reason why tf.matmul() does not work the way you expected is written in the documentation.
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
In your case you have a matrix y and a tensor x (rank 3 > 2), so the two inputs do not line up as matching batches of matrices. If you want them to match, you will need to have something like this:
import tensorflow as tf
a, b, c = 12, 50, 20
x = tf.random_normal([a, b, c])
y = tf.random_normal([a, c, b])
z = tf.matmul(x, y)
But clearly this does not compute what you want.
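For reference, here is a NumPy sketch showing that the einsum spelling and the broadcasting matmul from the question agree. It also shows a third route that is not mentioned in the thread: merge the leading axes so that only rank-2 operands are multiplied, then split them again. In TensorFlow the analogous calls would be tf.reshape and tf.matmul. Random data is used instead of np.ndarray, which leaves memory uninitialized.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 20, 30))
y = rng.normal(size=(30, 40))

# NumPy broadcasts the 2-D matrix over the leading axis of x.
z_matmul = np.matmul(x, y)

# The einsum spelling of the same contraction over the shared axis k.
z_einsum = np.einsum('ijk,kl->ijl', x, y)

# Rank-2 workaround: merge the two leading axes, multiply, split them again.
z_reshape = np.matmul(x.reshape(-1, 30), y).reshape(10, 20, 40)
```

All three results have shape (10, 20, 40) and agree up to floating-point error.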

Folding two lists in tensorflow

How do I fold two tensors using tensorflow? tensorflow.foldl takes as input
- a function of type a, b -> a (here a and b represent the types of tensors of a particular shape)
- a Tensor that can be unpacked into a list [b] of entries of type b
- an initial accumulator of type a
I need a function that takes as input
- a function of type a, b, c -> a
- a Tensor that can be unpacked into a list [b] of entries of type b
- a Tensor that can be unpacked into a list [c] of entries of type c
- an initial accumulator of type a.
Use a while loop:
import tensorflow as tf
def fold2(f, li1, li2, init):
    (_, a1) = tf.while_loop(lambda i, a: i<tf.shape(li1)[0], lambda i, a: (i+1, f(a, li1[i], li2[i])), (0,init))
    return a1
Incidentally, this also works with TensorArrays, while tf.foldl does not.
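For comparison, the intended semantics of a two-input fold can be written in plain Python with functools.reduce over zip. This is only a reference model of what fold2 computes, not a TensorFlow implementation, and the sample data is an arbitrary assumption. Unlike the while-loop version, which takes its length from li1, zip stops at the shorter input.

```python
from functools import reduce


def fold2(f, xs, ys, init):
    """Left fold over two sequences in lockstep: f(acc, x, y) -> acc."""
    return reduce(lambda acc, pair: f(acc, pair[0], pair[1]), zip(xs, ys), init)


# Sum both sequences into an accumulator that starts at 100.
xs = range(1, 11)          # 1, 2, ..., 10
ys = range(10, 101, 10)    # 10, 20, ..., 100
total = fold2(lambda a, x, y: a + x + y, xs, ys, 100.0)
```

Here total is 100 + 55 + 550 = 705.0.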
Use concat and transpose and fold over the 0 dimension, but this only works when both tensors have the same type. Example:
data_x = [[i for i in range(1,11)]]
data_y = [[10*i for i in range(1,11)]]
x = tf.placeholder(tf.float32, shape=(1,10))
y = tf.placeholder(tf.float32, shape=(1,10))
c = tf.constant(100.)
cn = tf.concat([x,y], axis=0)
t = tf.transpose(cn)
f = tf.foldl(lambda a,y: a+y[0]+y[1], t, c)
with tf.Session() as sess:
    res = sess.run(f, feed_dict={x: data_x, y: data_y})
