There might be an obvious solution, but I haven't found it yet. I want to do a simple multiplication where one tensor gives me a kind of weight vector and another is a stack of tensors (the same number as there are weights). It seems straightforward using tf.tensordot, but that doesn't work for unknown batch sizes.
import collections
import tensorflow as tf
tf.reset_default_graph()
x = tf.placeholder(shape=(None, 4, 1), dtype=tf.float32, name='x')
y_true = tf.placeholder(shape=(None, 4, 1), dtype=tf.float32, name='y_true')
# These are the models that I want to combine
linear_model0 = tf.layers.Dense(units=1, name='linear_model0')
linear_model1 = tf.layers.Dense(units=1, name='linear_model1')
agents = collections.OrderedDict()
agents[0] = linear_model0(x) # shape (?,4,1)
agents[1] = linear_model1(x) # shape (?,4,1)
stacked = tf.stack(list(agents.values()), axis=1) # shape (?,2,4,1)
# This is the model that produces the weights
x_flat = tf.layers.Flatten()(x)
weight_model = tf.layers.Dense(units=2, name='weight_model')
weights = weight_model(x_flat) # shape: (?,2)
# This is the final output
y_pred = tf.tensordot(weights, stacked, axes = 2, name='y_pred')
# PROBLEM HERE: shape: (4,1) instead of (?,4,1)
# Running the whole thing
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
# Example1 (output of shape (1,4,1) expected, got (4,1))
print('model', sess.run(y_pred,
{x: [[[1], [2], [3], [4]]]}).shape)
# Example2 (output of (2,4,1) expected, got (4,1))
print('model', sess.run(y_pred,
{x: [[[1], [2], [3], [4]], [[1], [2], [3], [4]]]}).shape)
So the multiplication works as expected for a single input, but it only handles that first example rather than a whole batch of inputs. Any help?
Similar questions that didn't resolve my issue:
obstacles in tensorflow's tensordot using batch multiplication
tf.tensordot is not suitable in this case because, based on your description, you would need to set axes equal to 1, which causes an incompatibility in matrix sizes: one tensor is [batch_size, 2] and the other is [batch_size, 8]. On the other hand, if you set axes to [[1],[1]], the result is not what you expect:
tf.tensordot(weights, stacked, axes=[[1],[1]]) # shape = (?,?,4,1)
How to fix the issue?
Use tf.einsum, which expresses a contraction between tensors of arbitrary dimension:
tf.einsum('ij,ijkl->ikl', weights, stacked)
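For completeness, here is a minimal sketch of how the einsum contraction slots into the graph from the question (reusing the x, weights, and stacked tensors defined above); the batch dimension is preserved as expected:
# weights: (?, 2), stacked: (?, 2, 4, 1) -> y_pred: (?, 4, 1)
y_pred = tf.einsum('ij,ijkl->ikl', weights, stacked)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Single example -> (1, 4, 1); batch of two -> (2, 4, 1)
print(sess.run(y_pred, {x: [[[1], [2], [3], [4]]]}).shape)
print(sess.run(y_pred, {x: [[[1], [2], [3], [4]], [[1], [2], [3], [4]]]}).shape)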
Related
In setting up the model I sometimes see the code:
# Scenario 1
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=logits, labels=Y))
or
# Scenario 2
# Evaluate model (with test logits, for dropout to be disabled)
prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
The definition of tf.reduce_mean states that it "calculates the mean of tensor elements along various dimensions of the tensor." Can someone explain what it does in simpler language? And when do we need to use it, perhaps with reference to Scenario 1 and Scenario 2 above? Thank you
As far as I understand, tf.reduce_mean is equivalent to numpy.mean. It creates an operation in the underlying TensorFlow graph which computes the mean of a tensor.
The most important keyword argument of tensorflow.reduce_mean is axis. Basically, if you have a tensor with shape (4, 3, 2) and axis=1, an empty array with shape (4, 2) will be created, and the mean values along the selected axis will be computed to fill in the empty array. (This is just a pseudo-process to help you make sense of the output, but may not be the actual process)
Here is a simple example to help you understand
import tensorflow as tf
import numpy as np
one = np.linspace(1, 30, 30).reshape(5, 3, 2)
x = tf.placeholder('float32', shape=[5, 3, 2])
op_1 = tf.reduce_mean(x)
op_2 = tf.reduce_mean(x, axis=0)
op_3 = tf.reduce_mean(x, axis=1)
op_4 = tf.reduce_mean(x, axis=2)
with tf.Session() as sess:
print(sess.run(op_1, feed_dict={x: one}))
print(sess.run(op_2, feed_dict={x: one}))
print(sess.run(op_3, feed_dict={x: one}))
print(sess.run(op_4, feed_dict={x: one}))
The first output is a number because we didn't provide an axis. The shapes of the rest of the outputs are (3, 2), (5, 2) and (5, 3), respectively.
reduce_mean can be useful whenever the value you want to summarize (a per-example loss, for instance) is a matrix or higher-dimensional tensor rather than a single number.
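As a quick sanity check (reusing the one array from above), numpy gives the same values, which is why it helps to think of tf.reduce_mean as numpy.mean attached to the graph:
# Same results as op_1 .. op_4 above (up to float precision).
print(np.mean(one))          # scalar
print(np.mean(one, axis=0))  # shape (3, 2)
print(np.mean(one, axis=1))  # shape (5, 2)
print(np.mean(one, axis=2))  # shape (5, 3)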
User #meTchaikovsky explained the general case of tf.reduce_mean. In both of your cases tf.reduce_mean simply works as an ordinary mean calculator, i.e., you're not taking the mean along any particular axis of the tensor; you simply divide the sum of the elements in the tensor by the number of elements.
Let's decode what exactly is happening in both cases. For both, assume batch_size = 2 and num_classes = 5, meaning that there are two examples per batch and five classes.
Now for the first case, tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y) returns an array of shape (2,).
>>import numpy as np
>>import tensorflow as tf
>>sess= tf.InteractiveSession()
>>batch_size = 2
>>num_classes = 5
>>logits = np.random.rand(batch_size,num_classes)
>>print(logits)
[[0.94108451 0.68186329 0.04000461 0.25996487 0.50391948]
[0.22781201 0.32305269 0.93359371 0.22599208 0.05942905]]
>>labels = np.array([[1,0,0,0,0],[0,1,0,0,0]])
>>print(labels)
[[1 0 0 0 0]
[0 1 0 0 0]]
>>logits_ = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>Y_ = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>loss_op = tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=Y_)
>>loss_per_example = sess.run(loss_op,feed_dict={Y_:labels,logits_:logits})
>>print(loss_per_example)
array([1.2028817, 1.6912657], dtype=float32)
You can see that loss_per_example has shape (2,). If we take the mean of this tensor, we get the average loss over the full batch. Hence we calculate
>>loss_per_example_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size))
>>final_loss_per_batch = tf.reduce_mean(loss_per_example_holder)
>>final_loss = sess.run(final_loss_per_batch,feed_dict={loss_per_example_holder:loss_per_example})
>>print(final_loss)
1.4470737
Coming to your second case:
>>predictions_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size,num_classes))
>>labels_holder = tf.placeholder(dtype=tf.int32,shape=(batch_size,num_classes))
>>prediction_tf = tf.equal(tf.argmax(predictions_holder, 1), tf.argmax(labels_holder, 1))
>>labels_match = sess.run(prediction_tf,feed_dict={predictions_holder:logits,labels_holder:labels})
>>print(labels_match)
[ True False]
The above output is expected because only for the first example of logits is the neuron with the highest activation (0.9410) the zeroth one, which matches its label. Now we want to calculate the accuracy, which means we have to take the average of the variable labels_match.
>>labels_match_holder = tf.placeholder(dtype=tf.float32,shape=(batch_size))
>>accuracy_calc = tf.reduce_mean(tf.cast(labels_match_holder, tf.float32))
>>accuracy = sess.run(accuracy_calc, feed_dict={labels_match_holder:labels_match})
>>print(accuracy)
0.5
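Putting it together: in a real model you would not route the intermediate values through extra placeholders as I did above for illustration; tf.reduce_mean is simply attached directly to the graph. A condensed sketch reusing the same logits_, Y_, logits and labels from above:
# Scenario 1: mean of the per-example cross-entropy losses -> scalar batch loss.
loss_op = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits_, labels=Y_))
# Scenario 2: mean of the 0/1 correctness indicators -> accuracy.
correct = tf.equal(tf.argmax(logits_, 1), tf.argmax(Y_, 1))
accuracy_op = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run([loss_op, accuracy_op], feed_dict={logits_: logits, Y_: labels}))
# -> roughly [1.4470737, 0.5], matching the step-by-step results above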
I have two tensors, one of shape [None, 20, 2], and one of shape [None, 1].
I would like to do an operation on each of the sub-tensors in lockstep to produce a value such that I would end up with a tensor of shape [None, 1].
In python land, I would zip these two, and iterate over the result.
So, just to be clear, I'd like to write a function that takes a [20, 2]-shape tensor and a [1]-shape tensor, and produces a [1]-shape tensor, then apply this function to the [None, 20, 2] and [None, 1] tensors, to produce a [None, 1] tensor.
Hope I articulated that well enough; higher dimensionality makes my head spin sometimes.
This works for me (TensorFlow version 1.4.0)
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
sess = tf.Session()
# Define placeholders with undefined first dimension.
a = tf.placeholder(dtype=tf.float32, shape=[None, 3, 4])
b = tf.placeholder(dtype=tf.float32, shape=[None, 1])
# Create some input data.
a_input = np.arange(24).reshape(2, 3, 4)
b_input = np.arange(2).reshape(2, 1)
# TensorFlow map function.
def f_tf(x):
    return tf.reduce_sum(x[0]) + tf.reduce_sum(x[1])
# Numpy map function (for validation of results).
def f_numpy(x):
    return np.sum(x[0]) + np.sum(x[1])
# Run TensorFlow function.
s = tf.map_fn(f_tf, [a, b], dtype=tf.float32)
sess.run(s, feed_dict={a: a_input, b: b_input})
array([ 66., 211.], dtype=float32)
# Run Numpy function.
for inp in zip(a_input, b_input):
    print(f_numpy(inp))
66
211
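Adapting this to the shapes in your question, here is a hedged sketch (the body of f below is just a stand-in for whatever per-example computation you actually need) that maps a [None, 20, 2] tensor and a [None, 1] tensor to a [None, 1] result:
import numpy as np
import tensorflow as tf

a = tf.placeholder(dtype=tf.float32, shape=[None, 20, 2])
b = tf.placeholder(dtype=tf.float32, shape=[None, 1])

def f(x):
    # x[0] has shape [20, 2], x[1] has shape [1]; return something of shape [1].
    return tf.reduce_sum(x[0]) * x[1]

out = tf.map_fn(f, [a, b], dtype=tf.float32)  # shape [None, 1]

with tf.Session() as sess:
    res = sess.run(out, feed_dict={a: np.ones((3, 20, 2), np.float32),
                                   b: np.ones((3, 1), np.float32)})
    print(res.shape)  # (3, 1)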
I am trying to develop a seq2seq model from a low-level perspective (creating all the tensors I need myself). I am trying to feed the model a sequence of vectors as a two-dimensional tensor; however, I can't iterate over one dimension of that tensor to extract the vectors one by one. Does anyone know what I could do to feed a batch of vectors and later get them back one by one?
This is my code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
gru_layer1 = GRU(input_array, input_dim, hidden_dim) #This is a class created by myself
for i in range(input_array.shape[-1]):
    word = input_array[:,i]
    previous_state = gru_encoder.h_t
    gru_layer1.forward_pass(previous_state,word)
And this is the error that I get
TypeError: Expected binary or unicode string, got <tf.Tensor 'input_7:0' shape=(10, ?) dtype=float64>
TensorFlow uses deferred (graph) execution.
You usually can't know in advance how long the sequence will be (words in a sentence, audio samples, etc.). The common thing to do is to cap it at some reasonably large value and then pad the shorter sequences with an empty token.
Once you do this you can select the data for a time slice with the slice operator:
data = tf.placeholder(tf.float32, shape=(batch_size, max_size, number_of_inputs))
....
for i in range(max_size):
    time_data = data[:, i, :]
    DoStuff(time_data)
Also look up tf.transpose for swapping the batch and time indices. It can help with performance in certain cases.
Alternatively consider something like tf.nn.static_rnn or tf.nn.dynamic_rnn to do the boilerplate stuff for you.
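For example, a minimal tf.nn.dynamic_rnn sketch under the padding scheme described above; the sizes and the GRUCell choice here are just illustrative assumptions, not something prescribed by your model:
import tensorflow as tf

batch_size, max_size, number_of_inputs = 32, 50, 10
data = tf.placeholder(tf.float32, shape=(batch_size, max_size, number_of_inputs))
# Actual (unpadded) length of each sequence in the batch.
lengths = tf.placeholder(tf.int32, shape=(batch_size,))

cell = tf.nn.rnn_cell.GRUCell(num_units=64)
# outputs has shape (batch_size, max_size, 64); state has shape (batch_size, 64).
outputs, state = tf.nn.dynamic_rnn(cell, data, sequence_length=lengths, dtype=tf.float32)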
Finally I found an approach that solves my problem. It worked using tf.scan() instead of a loop, which doesn't require the input tensor to have a defined size in its second dimension. Consequently, you have to prepare the input tensor beforehand so that it can be traversed the way you want by tf.scan(). In my case this is the code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
# Transpose so that tf.scan walks over the (unknown-length) time dimension.
x_t = tf.transpose(input_sentence, [1, 0], name='x_t')
# h_0 is the initial hidden state and forward_pass the step function of the GRU class above.
h_0 = tf.convert_to_tensor(h_0, dtype=tf.float64)
h_t_transposed = tf.scan(forward_pass, x_t, h_0, name='h_t_transposed')
h_t = tf.transpose(h_t_transposed, [1, 0], name='h_t')
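Since the snippet above depends on the GRU class, forward_pass, and h_0 defined elsewhere in my code, here is a small self-contained sketch of the same idea with a dummy step function, just to show how tf.scan walks over a dimension of unknown length:
import numpy as np
import tensorflow as tf

embedding_dim, hidden_dim = 10, 5
inp = tf.placeholder(tf.float64, shape=[embedding_dim, None])
W = tf.Variable(np.random.randn(embedding_dim, hidden_dim), dtype=tf.float64)

def step(h_prev, x):
    # Toy recurrence: combine the previous state with the projected input vector.
    return tf.tanh(h_prev + tf.tensordot(x, W, axes=1))

x_t = tf.transpose(inp, [1, 0])               # (time, embedding_dim)
h0 = tf.zeros([hidden_dim], dtype=tf.float64)
states = tf.scan(step, x_t, initializer=h0)   # (time, hidden_dim)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(states, {inp: np.random.rand(embedding_dim, 7)}).shape)  # (7, 5)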
I am having difficulty applying tf.scatter_nd_add() to 2D tensors. The documentation is a bit unclear and does not contain an example of a sparse update, only of full slice updates.
My case is the following:
updates - 2D tensor of shape [None, 6]
indices - 2D tensor of shape [None, 6]
ref - 2D Variable of zeros of shape [None, 6]
It is guaranteed that updates, indices and ref will always have their first dimension equal, but the size of that dimension can be varying. The update I want to perform looks like
for i, j:
    k = indices[i][j]
    ref[i][k] += updates[i][j]
Note that indices contains duplicates. tf.scatter_nd_add(ref, indices, updates) complains about a shape mismatch, and I cannot figure out how I need to restructure the tensors in order to perform the update.
I figured it out. Each 2D entry in indices must actually specify the absolute location that will get updated in ref. This means that indices must be 3D and then the non-vectorized update looks like:
for i, j:
    r, k = indices[i][j]
    ref[r][k] += updates[i][j]
In the above question it just happens that r is always equal to i.
Here is a full TensorFlow implementation with varying shapes. For clarity, in the following example, col_indices corresponds to indices from the original question:
import tensorflow as tf
import numpy as np
updates = tf.placeholder(dtype=tf.float32, shape=[None, 6])
col_indices = tf.placeholder(dtype=tf.int32, shape=[None, 6])
row_indices = tf.cumsum(tf.ones_like(col_indices), axis=0, exclusive=True)
indices = tf.concat([tf.expand_dims(row_indices, axis=-1),
tf.expand_dims(col_indices, axis=-1)], axis=-1)
tmp_var = tf.Variable(0, trainable=False, dtype=tf.float32, validate_shape=False)
ref = tf.assign(tmp_var, tf.zeros_like(updates), validate_shape=False)
# This makes sure that ref is always 0 before scatter_nd_add() runs
with tf.control_dependencies([ref]):
    result = tf.scatter_nd_add(ref, indices, updates)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Create example input data
np_input = np.arange(0, 6, 1, dtype=np.int32)
np_input = np.tile(np_input[None,:], [10, 1])
res = sess.run(result, feed_dict={updates: np_input, col_indices: np_input})
print(res)
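To see why this works, you can also print the constructed indices tensor; it contains absolute (row, column) pairs, which is exactly the form tf.scatter_nd_add() expects:
# For 2 rows of col_indices, row 0 of indices holds the pairs [0,0], [0,1], ..., [0,5]
# and row 1 holds [1,0], [1,1], ..., [1,5].
print(sess.run(indices, feed_dict={col_indices: np_input[:2]}))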
I would like to train a network with two different input tensor shapes; each epoch uses one of the two types.
Here is a small example:
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    imgs1 = tf.placeholder(tf.float32, [4, 224, 224, 3], name = 'input_imgs1')
    imgs2 = tf.placeholder(tf.float32, [4, 180, 180, 3], name = 'input_imgs2')
    epoch_num_tf = tf.placeholder(tf.int32, [], name = 'input_epoch_num')
    imgs = tf.cond(tf.equal(tf.mod(epoch_num_tf, 2), 0),
                   lambda: tf.Print(imgs2, [imgs2.get_shape()], message='(even number) input epoch number is '),
                   lambda: tf.Print(imgs1, [imgs1.get_shape()], message='(odd number) input epoch number is'))
    print(imgs.get_shape())
    for epoch in range(10):
        epoch_num = np.array(epoch).astype(np.int32)
        imgs1_input = np.ones([4, 224, 224, 3], dtype = np.float32)
        imgs2_input = np.ones([4, 180, 180, 3], dtype = np.float32)
        output = sess.run(imgs, feed_dict = {epoch_num_tf: epoch_num,
                                             imgs1: imgs1_input,
                                             imgs2: imgs2_input})
When I execute it, the output of imgs.get_shape() is (4, ?, ?, 3)
i.e. imgs.get_shape()[1]=None, imgs.get_shape()[2]=None.
But in later code I want to use the value of imgs.get_shape() to define the kernel size (ksize) and strides (strides) of tf.nn.max_pool(), e.g. ksize=[1, imgs.get_shape()[1]/6, imgs.get_shape()[2]/6, 1].
I think ksize and strides cannot accept tf.Tensor values.
How to solve this problem? Or how to set the shape of imgs conditionally?
When you call imgs.get_shape() (or print its result), you are getting the static shape of the tensor imgs. Dimensions 1 and 2 of imgs vary dynamically with the value of epoch_num_tf, so the static shape in those dimensions is unknown, which TensorFlow represents as None.
If you want to use the dynamic shape of imgs in subsequent code, you should use the tf.shape() operator to get the shape as a tf.Tensor. For example, instead of imgs.get_shape()[2], you can use tf.shape(imgs)[2].
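To make the distinction concrete, here is a tiny standalone sketch (independent of your graph) contrasting the static shape with the dynamic shape of a partially defined placeholder:
import numpy as np
import tensorflow as tf

t = tf.placeholder(tf.float32, shape=[None, 180, 180, 3])
print(t.get_shape())         # static shape (?, 180, 180, 3), known when the graph is built
dynamic_shape = tf.shape(t)  # a tf.Tensor evaluated at run time

with tf.Session() as sess:
    print(sess.run(dynamic_shape, {t: np.zeros((4, 180, 180, 3), np.float32)}))
    # prints the concrete run-time shape: 4, 180, 180, 3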
Unfortunately, the ksize and strides arguments of tf.nn.max_pool() do not accept tf.Tensor values. (I think this is a historical limitation, because these were configured as "attrs" rather than "inputs" of the corresponding kernel. Please open a GitHub issue if you'd like to request this feature!) One possible workaround would be to use another tf.cond():
imgs = ...
# Could also use `tf.equal(tf.mod(epoch_num_tf, 2), 0)` as the predicate.
pool_output = tf.cond(tf.equal(tf.shape(imgs)[2], 180),
lambda: tf.nn.max_pool(imgs, ksize=[1, 180/6, 180/6, 1], ...),
lambda: tf.nn.max_pool(imgs, ksize=[1, 224/6, 224/6, 1], ...))