I'm using TensorFlow 2.0 and trying to speed up my training by optimizing my code a little.
I run my model batch-wise and want to save the results from each batch so that, at the end of one epoch, I have all results in one tensor.
This is what my code looks like:
...
for epoch in range(start_epoch, end_epoch):
    # this vector shall hold all results for one epoch
    predictions_epoch = tf.zeros((0,))
    for batch in tf_dataset:
        # get prediction; predictions_batch.shape[0] equals batch_size
        predictions_batch = model(batch)
        # append the batch result to the previous results
        predictions_epoch = tf.concat([predictions_epoch, predictions_batch], axis=0)
        # DO SOME OTHER STUFF LIKE BACKPROP
        ...
    # predictions_epoch.shape[0] now equals the number of samples in the dataset
    with writer.as_default():
        tf.summary.histogram(name='predictions', data=predictions_epoch, step=epoch)
Let's assume one prediction is just a scalar value, so predictions_batch is a tensor with shape=[batchsize,].
This way of doing the concatenation works fine.
Now my question is:
Does this tf.concat() operation slow down my whole training? I also tried tf.stack() for this purpose, but there seems to be no difference in speed.
I wonder because, when I worked with Matlab, adding new values to a vector (and hence changing its size) within a for-loop was extremely slow. Initializing the vector with zeros and then assigning values in the loop was far more efficient.
Is this also true for TensorFlow? Or is there a more 'proper' way of accumulating tensors in a for-loop that is cleaner or faster?
I did not find any alternative solution online.
Thanks for the help.
Yes, this is not the recommended way to do it. Every tf.concat call inside the loop copies all results accumulated so far into a new tensor. It is better to simply append each tensor to a list and concatenate them once at the end:
for epoch in range(start_epoch, end_epoch):
    predictions_batches = []
    for batch in tf_dataset:
        predictions_batch = model(batch)
        predictions_batches.append(predictions_batch)
        # ...
    # tf.concat requires the axis argument
    predictions_epoch = tf.concat(predictions_batches, axis=0)
You can also use a tf.TensorArray, which may be better if you want to decorate the code with tf.function.
for epoch in range(start_epoch, end_epoch):
    # Pass arguments as required.
    # If the number of batches is known, or an upper bound
    # can be estimated, use that and dynamic_size=False.
    predictions_batches = tf.TensorArray(
        tf.float32, INITIAL_SIZE, dynamic_size=True, element_shape=[BATCH_SIZE])
    i = tf.constant(0)
    for batch in tf_dataset:
        predictions_batch = model(batch)
        predictions_batches = predictions_batches.write(i, predictions_batch)
        i += 1
        # ...
    predictions_epoch = predictions_batches.concat()
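To check that the list-based approach gives the same tensor as concatenating inside the loop, here is a minimal self-contained sketch. The `model` function and the 10-sample dataset are hypothetical stand-ins, not the asker's code, and the sketch assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

# Hypothetical stand-ins for the model and dataset from the question.
def model(batch):
    # One scalar prediction per sample -> shape [batch_size]
    return tf.reduce_sum(batch, axis=1)

# 10 samples in batches of 4 -> batch sizes 4, 4, 2
tf_dataset = tf.data.Dataset.from_tensor_slices(tf.random.normal([10, 3])).batch(4)

# Variant 1: concatenate inside the loop (earlier results are re-copied every step).
incremental = tf.zeros((0,))
for batch in tf_dataset:
    incremental = tf.concat([incremental, model(batch)], axis=0)

# Variant 2: collect in a Python list and concatenate once per epoch.
predictions_batches = [model(batch) for batch in tf_dataset]
predictions_epoch = tf.concat(predictions_batches, axis=0)

# Both variants hold the same values in the same order.
same = bool(tf.reduce_all(tf.equal(incremental, predictions_epoch)))
print(predictions_epoch.shape, same)
```

Whether the difference is measurable depends on the number of batches and on how expensive the model itself is; for small epochs the model call will probably dominate.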
Related
I am implementing an encoder-(dual-)decoder model in TensorFlow. The decoder is RNN-type. The input to the decoder is a feature map, the output of the previous time-step and the hidden state of the decoder from the previous time-step. I only want to trigger the decoder(s) when the prediction from the previous time-step is one of a particular set of tokens.
I have tried using tf.boolean_mask on the prediction of the previous time-step to remove those examples that do not predict a trigger-token. Below is an example:
# initialize input
dec_input = tf.expand_dims([token2integer['<start>']] * target.shape[0], 1)
features = encoder(img_tensor)
hidden = decoder.reset_state(batch_size=target.shape[0])
# make first prediction
predictions, hidden, _ = decoder(dec_input, features, hidden)
# add to total loss
loss += loss_function(dec_input, predictions)
# construct input of next time-step (here with teacher forcing)
dec_input = tf.expand_dims(target[:, i], -1)
# compute mask to only trigger for certain predictions
mask_struc = compute_mask_struc(dec_input)
# apply mask to input
features = tf.boolean_mask(features, mask_struc)
hidden = tf.boolean_mask(hidden, mask_struc)
target = tf.boolean_mask(target, mask_struc)
dec_input = tf.boolean_mask(dec_input, mask_struc)
# make next prediction and so on ...
I have implemented this in a training function. My implementation works, but it is slow, and when I run the function as a graph (with @tf.function) it gets 10x slower. If I remove the boolean_mask calls and run as a graph (with @tf.function), it is faster than without @tf.function.
How can I speed up the execution (with or without @tf.function)?
My ideas:
fix whatever is making the graph execution slow: I don't know how.
find alternative approach (without boolean_mask): I need inspiration
give up and try with PyTorch which I am more familiar with: not guaranteed to be faster.
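For the second idea, one possible direction (a sketch under assumptions, not a tested fix for the model above) is to keep the batch at its full, static size and let the mask decide which examples get their hidden state updated and contribute to the loss. This avoids the data-dependent shapes that tf.boolean_mask produces, at the price of running the decoder on every example. The function `masked_step` and its arguments are hypothetical names, and the tensors below are dummy values.

```python
import tensorflow as tf

@tf.function
def masked_step(hidden, new_hidden, per_example_loss, mask):
    # mask: bool tensor of shape [batch]; True where the decoder is triggered
    mask_f = tf.cast(mask, per_example_loss.dtype)
    # keep the old hidden state for examples that are not triggered
    hidden = tf.where(mask[:, tf.newaxis], new_hidden, hidden)
    # only triggered examples contribute to the (averaged) loss
    denom = tf.maximum(tf.reduce_sum(mask_f), 1.0)
    loss = tf.reduce_sum(per_example_loss * mask_f) / denom
    return hidden, loss

# Dummy values standing in for one decoder time-step.
old_hidden = tf.zeros([3, 2])
new_hidden = tf.ones([3, 2])
per_example_loss = tf.constant([1.0, 2.0, 3.0])
mask = tf.constant([True, False, True])

hidden_out, loss = masked_step(old_hidden, new_hidden, per_example_loss, mask)
print(hidden_out.numpy(), float(loss))
```

Whether this is faster than boolean_mask depends on how many examples are usually masked out, since the masked examples still cost decoder compute.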
I am programming a custom optimizer, and the length of the bias in the first dimension is not fixed, because the last batch does not have enough data to fill a whole batch. So weights initialized with a fixed batch_size cannot be used in torch.add between the last batch and the fixed-length weights.
bias = torch.randn(batch_size,units)
batch_data = generator(path)
# for example
weights.shape # is (128,256)
# but the last batch has only 50 samples.
out = sigmoid(x*weights+bias) # where the length of the first dimension does not match.
So I wonder whether I can create a tensor where the length of some dimension is variable, like a variable-length list.
Why do you want bias to depend on batch size? In test time, would you always test your net with batches of exactly the same size? If so, what is the meaning of a smaller batch?
If you still insist on using smaller batch, you can ignore the "un-used" entries of bias:
out = sigmoid(x * weights[:x.size(0), ...] + bias[:x.size(0), ...])
This link may be helpful for you: How to initialize weights in PyTorch?
If you use PyTorch's built-in DataLoader class, it generates an iterator that handles batching automatically. The batch size should be set explicitly up front by passing the 'batch_size' keyword to your data loader.
The last batch will be smaller if the dataset size isn't divisible by the batch size, unless drop_last is explicitly set to True for the data loader.
Biases don't work like this; they don't depend on the size of the dataset or on the batch size.
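To illustrate the last point, here is a minimal sketch of a layer whose parameters have no batch dimension, so the same weights and bias work for a full batch and for a smaller last batch. The sizes and the `layer` function are made up for the example; the bias of shape (units,) is broadcast over the batch dimension.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
in_features, units, batch_size = 64, 256, 128

weights = torch.randn(in_features, units)
bias = torch.randn(units)  # one bias per unit, no batch dimension

def layer(x):
    # x has shape [n, in_features] for any n;
    # bias (shape [units]) broadcasts across the n rows
    return torch.sigmoid(x @ weights + bias)

full = layer(torch.randn(batch_size, in_features))  # a full batch
last = layer(torch.randn(50, in_features))          # a smaller last batch
print(full.shape, last.shape)
```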
I want to use TensorFlow to calculate the gradients of a function. However, if I use the tf.gradients function, it returns a single list of gradients. How can I get a list of gradients for each point of the batch?
# in a tensorflow graph I have the following code
tf_x = tf.placeholder(dtype=tf.float32, shape=(None,N_in), name='x')
tf_net #... conveniently defined neural network
tf_y = tf.placeholder(dtype=tf.float32, shape=(None,1), name='y')
tf_cost = (tf_net(tf_x) - tf_y)**2 # this should have length N_samples because I did not apply a tf.reduce_mean
tf_cost_gradients = tf.gradients(tf_cost,tf_net.trainable_weights)
If we run it in a tensorflow session,
# suppose myx = np.random.randn(N_samples,N_in) and myy conveniently chosen
feed = {tf_x: myx, tf_y: myy}
sess.run(tf_cost_gradients,feed)
I get only one list, and not a list for each sample as I would like. I can use
for i in range(len(myx)):
    # slice with i:i+1 to keep the leading batch dimension
    feed = {tf_x: myx[i:i+1], tf_y: myy[i:i+1]}
    sess.run(tf_cost_gradients, feed)
but this is extremely slow! What can I do? Thank you
Although there is an 'aggregation_method' parameter in tf.gradients, it is not easy to get the individual gradients.
aggregation_method: Specifies the method used to combine gradient terms.
Please see these threads:
https://github.com/tensorflow/tensorflow/issues/15760
https://github.com/tensorflow/tensorflow/issues/4897
In one of the threads (#4897), Ian Goodfellow makes the following suggestion to speed up individual gradient computation:
This is only pseudocode, but basic idea is:
examples = tf.split(batch)
weight_copies = [tf.identity(weights) for x in examples]
output = tf.stack([f(x, w) for x, w in zip(examples, weight_copies)])
cost = cost_function(output)
per_example_gradients = tf.gradients(cost, weight_copies)
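If TensorFlow 2.x with eager execution is an option (an assumption; the question uses the TF1 session API), per-example gradients can also be obtained as the Jacobian of the unreduced cost vector with respect to the weights. This is a different technique from the pseudocode above. The sketch below uses a hypothetical single-layer linear model instead of tf_net, and the Jacobian can be memory-hungry for large models.

```python
import tensorflow as tf

n_samples, n_in = 5, 3  # hypothetical sizes
x = tf.random.normal([n_samples, n_in])
y = tf.random.normal([n_samples, 1])
w = tf.Variable(tf.random.normal([n_in, 1]))  # stand-in for the network weights

with tf.GradientTape() as tape:
    # unreduced squared error, one entry per sample: shape [n_samples]
    cost = tf.squeeze((tf.matmul(x, w) - y) ** 2, axis=1)

# Jacobian of the cost vector w.r.t. w: shape [n_samples, n_in, 1],
# i.e. one gradient of shape [n_in, 1] per sample
per_example_grads = tape.jacobian(cost, w)
print(per_example_grads.shape)
```

Summing `per_example_grads` over its first axis gives the gradient of the summed cost with respect to w.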
I'm implementing an algorithm involving alternating optimization. That is, at each iteration, the algorithm fetches a data batch and uses it to optimize two losses sequentially. My current implementation with tf.data.Dataset and tf.data.Iterator is something like this (which is incorrect, as detailed below):
data_batch = iterator.get_next()
train_op_1 = get_train_op(data_batch)
train_op_2 = get_train_op(data_batch)
for _ in range(num_steps):
    sess.run(train_op_1)
    sess.run(train_op_2)
Note that the above is incorrect because each call of sess.run will advance the iterator to get next data batch. So train_op_1 and train_op_2 are indeed using different data batches.
I cannot do something like sess.run([train_op_1, train_op_2]) either, because the two optimization steps need to be sequential (i.e., the 2nd optimization step depends on the latest variable value by the 1st optimization step.)
I'm wondering whether there is any way to somehow "freeze" the iterator, so that it won't advance in a sess.run call?
I was doing something similar, so this is part of my code with some unnecessary parts stripped out. It does a bit more, as it has train and validation iterators, but you should get the idea of using the is_keep_previous flag. Basically, when passed as True it will force reuse of the previous value of the iterator; when False, it will get a new value.
iterator_t = ds_t.make_initializable_iterator()
iterator_v = ds_v.make_initializable_iterator()
iterator_handle = tf.placeholder(tf.string, shape=[], name="iterator_handle")
iterator = tf.data.Iterator.from_string_handle(iterator_handle,
                                               iterator_t.output_types,
                                               iterator_t.output_shapes)

def get_next_item():
    # fetch a new batch; sometimes items need casting
    next_elem = iterator.get_next(name="next_element")
    x, y = tf.cast(next_elem[0], tf.float32), next_elem[1]
    return x, y

def old_data():
    # just forward the existing batch
    # (inputs and target must already exist when tf.cond builds this branch;
    # their definition is not part of this stripped-down snippet)
    return inputs, target

is_keep_previous = tf.placeholder_with_default(tf.constant(False), shape=[], name="keep_previous_flag")
inputs, target = tf.cond(is_keep_previous, old_data, get_next_item)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    handle_t = sess.run(iterator_t.string_handle())
    handle_v = sess.run(iterator_v.string_handle())
    # Run data iterator initialisation
    sess.run(iterator_t.initializer)
    sess.run(iterator_v.initializer)
    while True:
        try:
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_t, is_keep_previous: False})
            print(inputs_, target_)
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_t, is_keep_previous: True})
            print(inputs_, target_)
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_v})
            print(inputs_, target_)
        except tf.errors.OutOfRangeError:
            # now we know we ran out of elements in the validation iterator
            break
Use control dependencies when building the graph for train_op_2 so it can see the updated values of the variables.
Or use eager execution.
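With eager execution (assuming TensorFlow 2.x), the batch is an ordinary tensor that stays fixed until the Python loop advances, so two sequential optimization steps naturally see the same data. The sketch below uses a hypothetical variable `w`, two made-up losses and a hand-written SGD update; it only illustrates the control flow, not the original get_train_op.

```python
import tensorflow as tf

# Hypothetical data, variable and losses, for illustration only.
dataset = tf.data.Dataset.from_tensor_slices(tf.random.normal([8, 2])).batch(4)
w = tf.Variable([1.0, 1.0])
learning_rate = 0.05

def loss_1(batch):
    return tf.reduce_mean(tf.square(tf.reduce_sum(batch * w, axis=1)))

def loss_2(batch):
    return tf.reduce_mean(tf.reduce_sum(tf.square(batch - w), axis=1))

def sgd_step(loss_fn, batch):
    # one plain gradient-descent update of w on the given batch
    with tf.GradientTape() as tape:
        loss = loss_fn(batch)
    grad = tape.gradient(loss, w)
    w.assign_sub(learning_rate * grad)
    return loss

steps = 0
for data_batch in dataset:
    sgd_step(loss_1, data_batch)  # first optimization step
    sgd_step(loss_2, data_batch)  # second step: same batch, already-updated w
    steps += 1
print(steps, w.numpy())
```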
I want to initialize a w_gate tensor with a custom np.array as in the code below:
w_init = np.ones(shape=(dim, self.config.nmodels)) / self.config.nmodels
w_gate = tf.Variable(
    name="W",
    initial_value=w_init,
    dtype=tf.float32)
Every certain number of training iterations, I want w_gate to be re-initialized to the w_init array. For this, and based on Re-initialize variables in Tensorflow, I tried
sess.run(tf.variables_initializer([w_gate]))
inside my training loop. This line is executed every certain number of iterations. However, w_gate doesn't seem to be re-initialized. What am I missing here?
Could you try this and check?
w_gate_assign = tf.assign(w_gate, w_init)
sess.run(w_gate_assign)
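For reference, the same idea in TensorFlow 2.x eager mode (an assumption; the question uses sessions) is a direct call to the variable's assign method. In this sketch the sizes, the reset interval `reset_every` and the fake training update are hypothetical.

```python
import numpy as np
import tensorflow as tf

dim, nmodels = 4, 3  # hypothetical sizes
w_init = (np.ones((dim, nmodels)) / nmodels).astype(np.float32)
w_gate = tf.Variable(w_init, name="W", dtype=tf.float32)

reset_every = 5  # hypothetical reset interval
for step in range(1, 11):
    # stand-in for a training update that changes w_gate
    w_gate.assign_add(0.1 * tf.ones_like(w_gate))
    if step % reset_every == 0:
        # overwrite the current value with the initial array
        w_gate.assign(w_init)

print(np.allclose(w_gate.numpy(), w_init))
```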