I want to initialize a w_gate tensor with a custom np.array as in the code below:
w_init = np.ones(shape=(dim, self.config.nmodels)) / self.config.nmodels
w_gate = tf.Variable(
    name="W",
    initial_value=w_init,
    dtype=tf.float32)
Every fixed number of training iterations, I want w_gate to be reset to the w_init array. To do this, and based on "Re-initialize variables in Tensorflow", I tried
sess.run(tf.variables_initializer([w_gate]))
inside my training loop. This line is executed every fixed number of iterations, but w_gate doesn't seem to be re-initialized. What am I missing here?
Could you try this and check whether it works?
w_gate_assign = tf.assign(w_gate, w_init)
sess.run(w_gate_assign)
I'm using TensorFlow 2.0 and am trying to speed up my training by optimizing my code a little.
I run my model batch-wise and want to save the results from each batch, so that at the end of one epoch I have all results in one tensor.
This is what my code looks like:
...
for epoch in range(start_epoch, end_epoch):
    # this vector shall hold all results for one epoch
    predictions_epoch = tf.zeros(0,)
    for batch in tf_dataset:
        # get prediction; predictions_batch.shape[0] equals batch_size
        predictions_batch = model(batch)
        # Add the batch result to the previous results
        # (tf.concat takes a list of tensors and an axis)
        predictions_epoch = tf.concat([predictions_epoch, predictions_batch], axis=0)
        # DO SOME OTHER STUFF LIKE BACKPROP
        ...
    # predictions_epoch.shape[0] now equals number of all samples in dataset
    with writer.as_default():
        tf.summary.histogram(name='predictions', data=predictions_epoch, step=epoch)
Let's assume one prediction is just a scalar value, so predictions_batch is a tensor with shape=[batchsize,].
This way of doing the concatenation works fine.
Now my question is:
Does this tf.concat() operation slow down my whole training? I also used tf.stack() for this purpose, but there seems to be no difference in speed.
I wonder because, when I worked with Matlab, adding new values to a vector (and hence changing its size) within a for-loop was extremely slow. Initializing the vector with zeros and then assigning values in the loop was far more efficient.
Is this also true for TensorFlow? Or is there another, more 'proper' way of accumulating tensors in a for-loop that is cleaner or faster?
I did not find any alternative solution online.
Thanks for the help.
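Yes, concatenating inside the loop is not the recommended way to do it, because each tf.concat call builds a new tensor that contains everything accumulated so far. It is better to simply add each tensor to a list and concatenate them once at the end: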
for epoch in range(start_epoch, end_epoch):
    predictions_batches = []
    for batch in tf_dataset:
        predictions_batch = model(batch)
        predictions_batches.append(predictions_batch)
        # ...
    # tf.concat requires the axis argument
    predictions_epoch = tf.concat(predictions_batches, axis=0)
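The difference can be estimated with simple arithmetic. The following is a rough plain-Python sketch, not a TensorFlow benchmark: it assumes that every concatenation writes all of its inputs into a fresh buffer, and the function names are hypothetical. It only counts how many elements get copied under each strategy.

```python
def elements_copied_growing(batch_sizes):
    """Elements written when the result is re-concatenated after every batch."""
    accumulated = 0
    copied = 0
    for size in batch_sizes:
        accumulated += size
        # Assumption: each concat copies everything accumulated so far.
        copied += accumulated
    return copied


def elements_copied_once(batch_sizes):
    """Elements written when all batches are concatenated a single time."""
    return sum(batch_sizes)


batch_sizes = [32] * 100  # 100 batches of 32 scalar predictions
print(elements_copied_growing(batch_sizes))  # 161600
print(elements_copied_once(batch_sizes))     # 3200
```

Under this assumption the growing tensor costs on the order of the square of the number of batches, while the list approach stays linear. Whether that is noticeable next to the cost of the model itself depends on the workload.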
You can also use a tf.TensorArray, which may be better if you want to decorate the code with tf.function.
for epoch in range(start_epoch, end_epoch):
    # Pass arguments as required
    # If the number of batches is known or an upper bound
    # can be estimated, use that and dynamic_size=False
    predictions_batches = tf.TensorArray(
        tf.float32, INITIAL_SIZE, dynamic_size=True, element_shape=[BATCH_SIZE])
    i = tf.constant(0)
    for batch in tf_dataset:
        predictions_batch = model(batch)
        predictions_batches = predictions_batches.write(i, predictions_batch)
        i += 1
        # ...
    predictions_epoch = predictions_batches.concat()
When implementing a custom loss function in Keras, I require a tf.Variable with the shape of the batch size of my input data (y_true, y_pred).
def custom_loss(y_true, y_pred):
    counter = tf.Variable(tf.zeros(K.shape(y_true)[0], dtype=tf.float32))
    ...
However, this produces the error:
You must feed a value for placeholder tensor 'dense_17_target' with dtype float and shape [?,?]
If I fix the batch_size to a value:
def custom_loss(y_true, y_pred):
    counter = tf.Variable(tf.zeros(batch_size, dtype=tf.float32))
    ...
such that |training_set| % batch_size and |val_set| % batch_size are both equal to zero, everything works fine.
Can anyone explain why creating the variable with a batch size taken from the shape of the input (y_true and y_pred) does not work?
SOLUTION
I found a satisfying solution that works.
I initialized the variable with the maximum possible batch_size (specified at model build time) and used K.shape(y_true)[0] only for slicing the variable. That way it works perfectly. Here is the code:
def custom_loss(y_true, y_pred):
    counter = tf.Variable(tf.zeros(batch_size, dtype=tf.float32))
    ...
    true_counter = counter[:K.shape(y_true)[0]]
    ...
It does not work because K.shape returns a symbolic shape, which is itself a tensor, not a tuple of int values. To get the value of a tensor, you have to evaluate it in a session. See the documentation for this. To get a concrete value before evaluation time, use K.int_shape: https://keras.io/backend/#int_shape
However, K.int_shape is not going to work here either, because it only exposes static metadata: it normally does not reflect the current batch size and holds the placeholder value None instead.
The solution you found (controlling the batch size yourself and using it inside the loss) is indeed a good one.
I believe the problem is that the batch size must be known at definition time to build the Variable, but it only becomes known at session run time.
If you were working with it as with a tensor, it should be ok, see this example.
An alternative solution is to create a Variable and change its shape dynamically using tf.assign with validate_shape=False:
counter = tf.Variable(0.0)
...
# the second argument of tf.zeros is the dtype, not a fill value
val = tf.zeros(tf.shape(y_true)[:1], dtype=tf.float32)
counter = tf.assign(counter, val, validate_shape=False)
I have one question about random variables in TensorFlow. Let's suppose I need a random variable inside my loss function.
In the TensorFlow tutorials I find random functions used to initialize variables, such as weights, which are later modified by the training process.
In my case I need a random vector of floats (say 128 values) that follows a particular distribution (uniform or Gaussian) but can change in each loss calculation.
If I define this variable in my loss function, is that all I need to do, so that I get new values (still following the selected distribution) each time, or will the values be the same in all iterations?
A random node in TensorFlow takes a different value each time it is evaluated, as you can verify by running it several times:
import tensorflow as tf
x = tf.random_uniform(shape=())
sess = tf.Session()
sess.run(x)
# 0.79877698
sess.run(x)
# 0.76016617
It is not a Variable in the tensorflow terminology, as you can check from the code above, which runs without calling variable initialization.
If you assign the values randomly generated to a Variable then this value will remain fixed until you update this variable.
If you, instead, put in the loss function directly the "generation" (tf.random_*) of the numbers, then they'll be different at each call.
Just try this out:
import tensorflow as tf
# generator
x = tf.random_uniform((3,1), minval=0, maxval=10)
# variable
a = tf.get_variable("a", shape=(3,1), dtype=tf.float32)
# assignment
b = tf.assign(a, x)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5):
        # 5 different values
        print(sess.run(x))
    # assign the value
    sess.run(b)
    for i in range(5):
        # 5 equal values
        print(sess.run(a))
I'm trying to create an incremental classifier that will get trained on data containing n classes for some set number of epochs, then n+m classes for a set number of epochs, then n+m+k, etc, where each successive set of classes contains the previous set as a subset.
In order to do this without having to train the model, save it, manually edit the graph, re-train, repeat, I'm simply defining all the weights I will need to classify the entire set of classes, but keeping the weights corresponding to unseen classes frozen at 0 until the classifier is introduced to those classes.
My strategy for this is to define a placeholder that is fed in an array of Boolean values defining whether or not some given set of weights are trainable.
Relevant code below:
output_train = tf.placeholder(tf.int32, shape = (num_incremental_grps), name = "output_train")
.
.
.
weights = []
biases = []
for i in range(num_incremental_grps):
    W = tf.Variable(tf.zeros([batch_size, classes_per_grp]),
                    trainable=tf.cond(tf.equal(output_train[i], tf.constant(1)),
                                      lambda: tf.constant(True), lambda: tf.constant(False)))
    weights.append(W)
    b = tf.Variable(tf.zeros([classes_per_grp]),
                    trainable=tf.cond(tf.equal(output_train[i], tf.constant(1)),
                                      lambda: tf.constant(True), lambda: tf.constant(False)))
    biases.append(b)
out_weights = tf.stack(weights, axis=1).reshape((batch_size, -1))
out_biases = tf.stack(biases, axis=1).reshape((batch_size, -1))
outputs = tf.identity(tf.matmul(inputs, out_weights) + out_biases, name='values')
.
.
.
# Will change this to an array that progressively updates as classes are added.
output_trainable = np.ones(num_incremental_grps, dtype=bool)
.
.
.
with tf.Session() as sess:
    init.run()
    for epoch in range(epochs):
        for iteration in range(iterations):
            X_batch, y_batch = batch.getBatch()
            fd = {X: X_batch, y: y_batch, training: True, output_train: output_trainable}
            _, loss_val = sess.run([training_op, loss], feed_dict=fd)
This returns the error message
Using a 'tf.Tensor' as a Python `bool` is not allowed. Use `if t is not None:` instead of
`if t:` to test if a tensor is defined,and use TensorFlow ops such as tf.cond to execute
subgraphs conditioned on the value of a tensor.
I've tried tinkering around with this, like making the initial placeholder datatype tf.bool instead of tf.int32. I've also tried just feeding in a slice of the tensor into the 'trainable' argument in the weights/biases like this
W = tf.Variable(tf.zeros([batch_size, classes_per_grp]), trainable=output_variable[i])
but I get the same error message. I'm not sure how to proceed from here, aside from trying a completely different approach to updating the number of predictable classes. Any help would be much appreciated.
The error occurs because tf.cond takes a decision based on a single boolean — much like an if statement. What you want here is to make a choice per element of your tensor.
You could use tf.where to fix that problem, but then you will run into another one, which is that trainable is not a property that you can fix at runtime, it is part of the definition of a variable. If a variable will be trained at some point, perhaps not at the beginning but definitely later, then it must be trainable.
I would suggest taking a much simpler route: define output_train to be an array of tf.float32,
output_train = tf.placeholder(tf.float32, shape=(num_incremental_grps), name="output_train")
then later simply multiply your weights and variables by this vector:
W = tf.Variable(...)
W = W * output_train
...
Provide values of 1 to output_train where you want training to happen, 0 otherwise.
Be careful to also mask your loss to ignore output from unwanted channels, because even though they now always output 0, that may still affect your loss. For example,
logits = ...
# use tf.equal for an element-wise comparison; in TF 1.x `==` on a tensor
# is a plain Python object comparison
logits = tf.matrix_transpose(tf.boolean_mask(
    tf.matrix_transpose(logits),
    tf.equal(output_train, 1.0)))
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
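To see why multiplying by a 0/1 vector behaves like freezing, note that the gradient with respect to a masked weight is scaled by the same mask entry. Below is a small plain-Python sketch of that chain rule. It is not TensorFlow code: the helper names masked_logits and weight_gradient are hypothetical, and it assumes one mask entry per output column.

```python
def masked_logits(x, W, mask):
    """logits[j] = mask[j] * sum_i x[i] * W[i][j]."""
    return [
        mask[j] * sum(x[i] * W[i][j] for i in range(len(x)))
        for j in range(len(mask))
    ]


def weight_gradient(x, mask, upstream):
    """d(loss)/dW[i][j] = upstream[j] * mask[j] * x[i] (chain rule)."""
    return [
        [upstream[j] * mask[j] * x[i] for j in range(len(mask))]
        for i in range(len(x))
    ]


x = [1.0, 2.0]
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5]]
mask = [1.0, 0.0, 1.0]        # the middle output is not trained yet
upstream = [1.0, 1.0, 1.0]    # pretend d(loss)/d(logits) is all ones

print(masked_logits(x, W, mask))          # middle logit is zero
print(weight_gradient(x, mask, upstream)) # middle column of the gradient is zero
```

Since the gradient column for a masked output is zero, a plain gradient step leaves those weights where they were, which is the freezing effect described above.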
I want to implement a model like DSSM (Deep Semantic Similarity Model).
I want to train one RNN model, use it to get three hidden vectors for three different inputs, and use these hidden vectors to compute the loss function.
I tried to write this in a variable scope with reuse=None, like this:
gru_cell = tf.nn.rnn_cell.GRUCell(size)
gru_cell = tf.nn.rnn_cell.DropoutWrapper(gru_cell,output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([gru_cell] * 2, state_is_tuple=True)
embedding = tf.get_variable("embedding", [vocab_size, wordvec_size])
inputs = tf.nn.embedding_lookup(embedding, self._input_data)
inputs = tf.nn.dropout(inputs, 0.5)
with tf.variable_scope("rnn"):
    _, rnn_states_1 = tf.nn.dynamic_rnn(cell, inputs, sequence_length=self.lengths, dtype=tf.float32)
    self._states_1 = rnn_states_1[config.num_layers-1]
with tf.variable_scope("rnn", reuse=True):
    _, rnn_states_2 = tf.nn.dynamic_rnn(cell, inputs, sequence_length=self.lengths, dtype=tf.float32)
    self._states_2 = rnn_states_2[config.num_layers-1]
I use the same inputs and reuse the RNN model, but when I print 'self._states_1' and 'self._states_2', the two vectors are different.
I use with tf.variable_scope("rnn", reuse=True): to compute 'rnn_states_2' because I want to use the same RNN model as for 'rnn_states_1'.
But why do I get different hidden vectors with the same inputs and the same model?
Where did I go wrong?
Thanks for your answers.
Update:
I found that the reason may be 'tf.nn.rnn_cell.DropoutWrapper': when I remove the dropout wrapper, the hidden vectors are the same; when I add it, the vectors become different.
So, the new question is:
How can I fix which part of the vector is 'dropped out'? By setting the 'seed' parameter?
When training a DSSM, should I fix the dropout behaviour?
If you structure your code to use tf.contrib.rnn.DropoutWrapper, you can set variational_recurrent=True in your wrapper, which causes the same dropout mask to be used at all steps, i.e. the dropout mask will be constant. Is that what you want?
Setting the seed parameter in tf.nn.dropout will just make sure that you get the same sequence of dropout masks every time you run with that seed. That does not mean the dropout mask will be constant, just that you'll always see the same dropout mask at a particular iteration. The mask will be different for every iteration.
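The distinction between a seeded mask sequence and a constant mask can be sketched without TensorFlow. The following plain-Python analogy uses the standard random module. It does not reproduce TensorFlow's actual random number generation, and the function names are hypothetical.

```python
import random


def seeded_masks(seed, steps, size, keep_prob=0.5):
    """A fresh mask is drawn at every step; the seed only fixes the sequence."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < keep_prob else 0 for _ in range(size)]
        for _ in range(steps)
    ]


def constant_masks(seed, steps, size, keep_prob=0.5):
    """One mask is drawn once and reused at every step."""
    rng = random.Random(seed)
    mask = [1 if rng.random() < keep_prob else 0 for _ in range(size)]
    return [list(mask) for _ in range(steps)]


run_a = seeded_masks(seed=0, steps=4, size=8)
run_b = seeded_masks(seed=0, steps=4, size=8)
print(run_a == run_b)  # True: same seed, same sequence of masks

fixed = constant_masks(seed=0, steps=4, size=8)
print(all(m == fixed[0] for m in fixed))  # True: identical mask at every step
```

In the seeded case the masks within one run are still drawn independently from step to step. Only the constant variant guarantees that the same units are dropped each time.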