Using tf.timestamp() in the graph in TensorFlow 2 (Python)

I am trying to benchmark specific blocks in my TF model, so I am trying to use tf.timestamp(). It has to be part of the graph execution so that it runs every time I call the model.
I can do it with tf.compat.v1.Print() as follows:
x = self.mp1(x)
x = tf.compat.v1.Print(x, [tf.timestamp()])
x = self.c3(x)
But this prints the value, which adds overhead. Instead, I want to store the timestamp in a variable so that I can work with it after execution. Is there another way to embed tf.timestamp() in the graph in TF2?

If you are using this graph for optimization (e.g. as a deep learning model), you will probably call model.fit(). Then you can use a callback to save the weights after every epoch:
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint  # `tf` is an alias, so `from tf.keras...` fails

EPOCHS = 10
checkpoint_filepath = '/tmp/checkpoint'  # in Keras 3, a weights-only path must end in '.weights.h5'
model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_accuracy',  # TF 2.x name for metrics=['accuracy']; use 'val_acc' with metrics=['acc']
    mode='max',
    save_best_only=True)

# Model weights are saved at the end of every epoch if they are the best seen so far.
# Assumes `model` is compiled and x_train, y_train, x_val, y_val are already defined.
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=EPOCHS, callbacks=[model_checkpoint_callback])

# The model weights that are considered the best are loaded back into the model.
model.load_weights(checkpoint_filepath)

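I solved it after trying a couple of things. First, I defined a custom layer:
import tensorflow as tf
from tensorflow.keras import layers

class TimeStamp(layers.Layer):
    def __init__(self):
        super(TimeStamp, self).__init__()
        # Non-trainable float64 variable holding the most recent timestamp (seconds)
        self.ts = tf.Variable(initial_value=0., dtype=tf.float64, trainable=False)

    def call(self, inputs):
        # assign() writes into the variable, so the value can still be read after
        # graph execution; rebinding self.ts to a tensor would not allow that
        self.ts.assign(tf.timestamp())
        return tf.identity(inputs)

    def getTs(self):
        return self.ts
I then used this layer several times in the model and computed the elapsed time by subtracting the ts values of successive TimeStamp layers.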
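A minimal sketch of the same idea, written as a plain tf.Module instead of a Keras layer so it does not depend on the Keras version. The control dependencies are an addition that the answer does not use: inside tf.function, non-stateful ops such as the matmul are not ordered relative to tf.timestamp() unless there is a data or control dependency, so the timestamp is made to wait for its input, and downstream ops are made to wait for the timestamp. The matmul "block", the names start, end and forward, and the tensor sizes are hypothetical.

```python
import tensorflow as tf


class TimeStamp(tf.Module):
    """Stores tf.timestamp() in a float64 variable every time it is called."""

    def __init__(self, name=None):
        super().__init__(name=name)
        self.ts = tf.Variable(0.0, dtype=tf.float64, trainable=False)

    def __call__(self, inputs):
        # Take the timestamp only once `inputs` (the previous block's output) exists.
        with tf.control_dependencies([inputs]):
            now = tf.timestamp()
        update = self.ts.assign(now)
        # Downstream ops consume this identity, so they start after the timestamp.
        with tf.control_dependencies([update]):
            out = tf.identity(inputs)
        return out


# Hypothetical block to benchmark: one matmul + ReLU with a fixed weight.
w = tf.Variable(tf.random.normal((256, 256)))
start, end = TimeStamp(), TimeStamp()


@tf.function
def forward(x):
    x = start(x)
    x = tf.nn.relu(tf.matmul(x, w))
    x = end(x)
    return x


out = forward(tf.random.normal((64, 256)))
elapsed = (end.ts - start.ts).numpy()  # seconds
print(f"block took {elapsed * 1e3:.3f} ms")
```

Because the timestamps live in variables, they can be read after every call without printing anything inside the graph.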

Related

Transferring Exponential Moving Average (EMA) of Tensorflow custom model to another instance of the model

I have made two instances of the same custom model in TensorFlow 2.9.1 (i.e., model = Model() and ema_model = Model()). While training model in a custom loop, I want to compute the EMA of its variables and update ema_model with these values.
I checked this solution and also tried ema_model.set_weights(model.get_weights()), but neither attempt worked. Specifically, I called them right after the optimization step in the train_step function.
In other words, I want the parameters of model to follow normal training, while the parameters of ema_model are updated as a decayed average of model's parameters.
Any hints or solutions to this problem?
I am trying out the same thing. Here's the solution I have come up with:
class EMA(tf.keras.callbacks.Callback):
    def __init__(self, decay=0.996):
        super(EMA, self).__init__()
        self.decay = decay
        # Create an ExponentialMovingAverage object
        self.ema = tf.train.ExponentialMovingAverage(decay=self.decay)

    def on_train_begin(self, logs=None):
        self.ema.apply(self.model.get_layer('anchor_model').trainable_variables)

    def on_train_batch_end(self, batch, logs=None):
        # Get exponential moving average of anchor model weights.
        train_vars = self.model.get_layer('anchor_model').trainable_variables
        averages = [self.ema.average(var) for var in train_vars]

        # Assign the average weights to target model
        target_model_vars = self.model.get_layer('target_model').non_trainable_variables
        assert len(target_model_vars) == len(averages)
        for i, var in enumerate(target_model_vars):
            var.assign(averages[i])

        self.ema.apply(self.model.get_layer('anchor_model').trainable_variables)
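For the custom-loop setup described in the question, a minimal sketch (a different approach from the callback above) is to update ema_model's variables right after each optimizer step with ema = decay * ema + (1 - decay) * w. ema_model.set_weights(model.get_weights()) only copies the current weights, so here it is used once, just to start both models from the same point. make_model, DECAY, the toy data and the optimizer settings are hypothetical. Both models are called once so that their variables exist, and zip assumes both instances list their variables in the same order, which holds for two instances of the same architecture.

```python
import tensorflow as tf

DECAY = 0.99  # assumed decay rate


def make_model():
    # Hypothetical stand-in for the custom Model() in the question.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


model, ema_model = make_model(), make_model()
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model(x)
ema_model(x)  # build both models so their variables exist
ema_model.set_weights(model.get_weights())  # start the average from the same weights

optimizer = tf.keras.optimizers.Adam(1e-2)
loss_fn = tf.keras.losses.MeanSquaredError()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # EMA update, done right after the optimizer step.
    for ema_var, var in zip(ema_model.trainable_variables, model.trainable_variables):
        ema_var.assign(DECAY * ema_var + (1.0 - DECAY) * var)
    return loss


for _ in range(20):
    train_step(x, y)
```

Only trainable variables are averaged here; non-trainable state such as BatchNormalization moving statistics would need to be averaged or copied separately.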

How does one train a list of models in Torch?

I am trying to train a list of torch neural network models. I am using a list because I want to have any number of models and be able to iterate through the list. Currently I am doing this:
for i in range(len(model_list)):
    old_model = model_list[i]
    new_model = train_model(old_model, data)  # train_model takes a model, trains it, and returns it
    model_list[i] = new_model
However, I seem to have some kind of scope problem, since the models in the list do not update their parameters. I assume it has something to do with the model I am updating being some kind of clone, though I do not understand why returning the model has no effect. My training code looks like this:
def train_model(model, data):
    model_optimizer = optim.Adam(model.parameters())
    model_output = model(data)
    model_loss = criterion(model_output, target)  # let's just say we get target from data
    model_loss.backward()
    model_optimizer.step()
    return model
I don't see why code like this would not work, but the models in the list are not updating (I checked their coefficients), and the loss does not change. Is this some kind of scope problem with the models or their parameters? Or is there some other problem? Thanks.
I suspect your current train_model function computes the loss and updates the weights for only a single iteration, because it returns the model right after one optimizer step.
So I suggest modifying this function to train the model for several epochs before returning it.
Suggested code:
import torch.optim as optim

def train_model(model, train_loader, criterion, epochs):
    model_optimizer = optim.Adam(model.parameters())
    model.train()
    for epoch in range(epochs):
        # assumes train_loader yields (input, target) batches
        for data, target in train_loader:
            model_optimizer.zero_grad()  # clear gradients from the previous step
            model_output = model(data)
            model_loss = criterion(model_output, target)
            model_loss.backward()
            model_optimizer.step()
    return model
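A small self-contained check, using hypothetical synthetic data and nn.Linear models, that training the models of a list works once the loop runs for many steps. PyTorch updates each model's parameters in place, so the returned model is the same object that was passed in and reassigning it into the list is not required.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic regression data: the target is the sum of the inputs.
X = torch.randn(256, 8)
Y = X.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)
criterion = nn.MSELoss()


def train_model(model, train_loader, criterion, epochs):
    optimizer = optim.Adam(model.parameters(), lr=1e-2)
    model.train()
    for _ in range(epochs):
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
    return model  # the same object that was passed in


def full_loss(model):
    # Loss on the whole dataset, without tracking gradients.
    with torch.no_grad():
        return criterion(model(X), Y).item()


model_list = [nn.Linear(8, 1) for _ in range(3)]
before = [full_loss(m) for m in model_list]

for m in model_list:
    # Parameters are updated in place, so the list already holds the trained models.
    train_model(m, train_loader, criterion, epochs=20)

after = [full_loss(m) for m in model_list]
print(before, after)
```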

Starting training takes a very long time in TensorFlow 2

TensorFlow 2 takes about 15 minutes to build its static graph (or whatever it does before the first pass). The training time after that is normal, but it is hard to experiment when every run needs 15 minutes before any feedback.
The generator encoder and discriminator are RNNs (not unrolled) with GRU cells in a Keras model.
The generator decoder is defined and called like this:
class GeneratorDecoder(tf.keras.layers.Layer):
    def __init__(self, feature_dim):
        super(GeneratorDecoder, self).__init__()
        self.cell = tf.keras.layers.GRUCell(
            GRUI_DIM, activation='tanh', recurrent_activation='sigmoid',
            dropout=DROPOUT, recurrent_dropout=DROPOUT)
        self.batch_normalization = tf.keras.layers.BatchNormalization()
        self.dense = tf.keras.layers.Dense(
            feature_dim, activation='tanh')

    @tf.function
    def __call__(self, z, timesteps, training):
        # z has shape (batch_size, features)
        outputs = []
        output, state = z, z
        for i in range(timesteps):
            output, state = self.cell(inputs=output, states=state,
                                      training=training)
            dense_output = self.dense(
                self.batch_normalization(output))
            outputs.append(dense_output)
        return outputs
Here is my training loop (the mask_gt and missing_data variables are cast with tf.cast, so they should already be tensors):
@tf.function
def train_step():
    with tf.GradientTape(persistent=True) as tape:
        generator_output = generator(missing_data, training=True)
        imputed_data = get_imputed_data(missing_data, generator_output)
        mask_pred = discriminator(imputed_data)
        D_loss = discriminator.loss(mask_pred, mask_gt)
        G_loss = generator.loss(missing_data, mask_gt,
                                generator_output, mask_pred)

    gen_enc_grad = tape.gradient(
        G_loss, generator.encoder.trainable_variables)
    gen_dec_grad = tape.gradient(
        G_loss, generator.decoder.trainable_variables)
    disc_grad = tape.gradient(
        D_loss, discriminator.model.trainable_variables)
    del tape

    generator.optimizer.apply_gradients(
        zip(gen_enc_grad, generator.encoder.trainable_variables))
    generator.optimizer.apply_gradients(
        zip(gen_dec_grad, generator.decoder.trainable_variables))
    discriminator.optimizer.apply_gradients(
        zip(disc_grad, discriminator.model.trainable_variables))

# train_step must be defined before the loop that calls it
for it in tqdm(range(NO_ITERATIONS)):
    print(it)
    train_step()
Note that "0" is printed within a few seconds, so the slow part is definitely not earlier.
And this is the get_imputed_data function that is called:
def get_imputed_data(incomplete_series, generator_output):
    return tf.where(tf.math.is_nan(incomplete_series), generator_output, incomplete_series)
Thanks for any answers! Hope I provided just enough code to give a sense of where the problem lies. This is my first time posting here after reading for at least five years :)
I use Python 3.6 and TensorFlow 2.1.
The problem was solved by removing the tf.function decorator from the call functions of the generator and discriminator. I was using a single global Python scalar (the iteration number) in two of the tf.function-decorated functions. This caused a new graph to be created every time (see the caution in the tf.function docs).
The solution is to stop using such Python variables inside these functions or to convert them to TensorFlow variables.
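A small sketch of the retracing described in the answer, assuming the iteration number reaches the tf.function as a Python argument: every new Python int value is a new trace-cache key and produces a new graph, whereas a scalar tensor with the same shape and dtype reuses a single graph. The function step and the traces list are hypothetical; the list only grows while tracing, because Python side effects are not part of the graph.

```python
import tensorflow as tf

traces = []  # one entry per trace of `step`


@tf.function
def step(x, it):
    traces.append(1)  # Python side effect: only runs while the function is traced
    return x * tf.cast(it, tf.float32)


x = tf.constant(2.0)

for it in range(5):
    step(x, it)  # Python int: each new value is a new cache key, so a new graph
python_traces = len(traces)

traces.clear()
for it in range(5):
    step(x, tf.constant(it))  # int32 scalar tensor: traced once, then reused
tensor_traces = len(traces)

print(python_traces, tensor_traces)  # 5 1
```

With a large model, each of those extra traces repeats the expensive graph construction, which matches the long start-up time reported in the question.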

Train a single PyTorch model on multiple GPUs with some layers fixed?

I ran into some problems using PyTorch DistributedDataParallel. The situation is:
My model is A, and it has been trained on a single GPU as usual. Suppose that there are three layers in A:
import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        # layer0, layer1 and layer2 stand for the real sub-modules
        self.layer0 = layer0
        self.layer1 = layer1
        self.layer2 = layer2

    def forward(self, x):
        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        return x
Now I have some new data. I want to fine-tune A with it on multiple GPUs. I need to wrap A as a multi-GPU model B.
But there are two training stages. In the 1st stage, I want to freeze layer0 and layer1 of B. In the 2nd stage, I want to freeze only layer0. This means requires_grad of the parameters in layer1 has to change during training. However, the DistributedDataParallel docs say:
You should never try to change your model’s parameters after wrapping up your model with DistributedDataParallel.
In fact, I tried to use B.module to refer to the A wrapped in B, but the test results were abnormal compared to the single-GPU model. Maybe this is not allowed.
What should I do? Is there a proper way to wrap my model? And what should I take care of when saving and loading the model?
I only run it on a single machine with multiple GPUs, so the distributed case with multiple machines can be ignored. Many thanks.
Update 2019.12.03
As suggested by @jodag, I tried DataParallel, but it didn't work. This time I didn't change anything in B (except training it) after wrapping it. For simplicity, my code looks like this (I referred to this):
import torch.nn as nn
import torch.optim as optim

class B(nn.DataParallel):
    # Fall back to the wrapped module for attributes such as b.layer2
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)

a = A()
b = B(a, device_ids=[0, 1])
b = b.cuda()
trained_param = b.layer2.parameters()
# trained_param = [{'params': b.layer2.parameters()}, {'params': b.layer1.parameters()}]
optimizer = optim.Adam(trained_param)
b.train()
...
for x, label in data_loader:
    optimizer.zero_grad()
    x = x.to(0)  # This line can be commented out.
    label = label.to(0)  # outputs are gathered on device 0, so the target must be there too
    y = b(x)
    l = loss(y, label)
    l.backward()
    optimizer.step()
If you only want to optimize part of the parameters, why not control this through the optimizer rather than the model?
You can leave your model as-is (wrapped in DistributedDataParallel) and pass only the relevant part of its parameters to the optimizer.
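A minimal single-process sketch of this optimizer-based approach, on CPU and without any DataParallel or DistributedDataParallel wrapper: stage 1 gives the optimizer only layer2, and stage 2 adds layer1 with Optimizer.add_param_group, so requires_grad is never changed on the model. The layer sizes, data and learning rate are hypothetical; with a wrapped model, the same sub-modules would be reached through ddp_model.module. Parameters outside the optimizer still receive gradients (so the backward cost is unchanged), they are simply never stepped.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class A(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical stand-ins for the real layer0/layer1/layer2.
        self.layer0 = nn.Linear(8, 8)
        self.layer1 = nn.Linear(8, 8)
        self.layer2 = nn.Linear(8, 1)

    def forward(self, x):
        return self.layer2(self.layer1(self.layer0(x)))


model = A()  # with DDP, this would be ddp_model.module
x, y = torch.randn(64, 8), torch.randn(64, 1)
loss_fn = nn.MSELoss()


def run(optimizer, steps=10):
    for _ in range(steps):
        model.zero_grad()  # also clears grads of parameters the optimizer does not own
        loss_fn(model(x), y).backward()
        optimizer.step()


def snapshot(module):
    # Detached copies of a module's parameters, for before/after comparison.
    return [p.detach().clone() for p in module.parameters()]


l0_start, l1_start = snapshot(model.layer0), snapshot(model.layer1)

# Stage 1: only layer2 is handed to the optimizer.
optimizer = optim.Adam(model.layer2.parameters(), lr=1e-2)
run(optimizer)
l1_after_stage1 = snapshot(model.layer1)

# Stage 2: add layer1 as a new parameter group; layer0 is never added.
optimizer.add_param_group({"params": model.layer1.parameters()})
run(optimizer)
```

Saving and loading then only concern the module's state_dict, independent of which parameters the optimizer owns at a given stage.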

tf.assign doesn't update tf.Variable value in custom callback init constructor using model._function_kwargs

I need to create a custom callback to get the target values, i.e. y_true and y_pred (the predicted values). So I read the post Create keras callback to save model predictions and targets for each batch during training
and created my callback the same way as in its answer:
# TF 1.x / standalone Keras API (tf.assign and model.targets do not exist in TF 2.x)
from keras.callbacks import Callback
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf

class CollectOutputAndTarget(Callback):
    def __init__(self):
        super(CollectOutputAndTarget, self).__init__()
        self.targets = []  # collect y_true batches
        self.outputs = []  # collect y_pred batches

        # the shape of these 2 variables will change according to batch shape
        # to handle the "last batch", specify `validate_shape=False`
        self.var_y_true = tf.Variable(0., validate_shape=False)
        self.var_y_pred = tf.Variable(0., validate_shape=False)

    def on_batch_end(self, batch, logs=None):
        # evaluate the variables and save them into lists
        self.targets.append(K.eval(self.var_y_true))
        self.outputs.append(K.eval(self.var_y_pred))

# build a simple model
# have to compile first for model.targets and model.outputs to be prepared
model = Sequential([Dense(5, input_shape=(10,))])
model.compile(loss='mse', optimizer='adam')

# initialize the variables and the `tf.assign` ops
cbk = CollectOutputAndTarget()
fetches = [tf.assign(cbk.var_y_true, model.targets[0], validate_shape=False),
           tf.assign(cbk.var_y_pred, model.outputs[0], validate_shape=False)]
model._function_kwargs = {'fetches': fetches}  # use `model._function_kwargs` if using `Model` instead of `Sequential`
When I add on_epoch_end and print the value of self.targets, I get arrays of 0s.
This is my on_epoch_end code:
    def on_epoch_end(self, epoch, logs=None):
        print(self.targets)
My model is created with the functional API (Model) instead of Sequential and has pretrained weights loaded into it. After compiling the model with model.compile, I instantiate the callback, create the fetches object and add it to the train function as follows:
cbk = CollectOutputAndTarget()
fetches = [tf.assign(cbk.var_y_true, model.targets[0], validate_shape=False),
tf.assign(cbk.var_y_pred, model.outputs[0], validate_shape=False)]
model._function_kwargs = {'fetches': fetches}
Then I call model.fit_generator with my data generator. I am getting 0s in self.targets, which shouldn't happen if var_y_true and var_y_pred are being updated with model.targets and model.outputs. Also, I don't understand why we need model._function_kwargs if we are already assigning values to cbk.var_y_true and cbk.var_y_pred.
I also tried setting model.train_function = None after setting the fetches and before calling fit_generator, but I still get the same result.
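The tf.assign / model._function_kwargs trick relies on TF 1.x graph internals. In TensorFlow 2, one alternative (a different technique, not a fix for the callback above) is a small custom training loop in which y_true and y_pred are ordinary tensors available after every batch. The model, data, batch size and epoch count below are hypothetical.

```python
import tensorflow as tf

# Hypothetical toy model and data, shaped like the Sequential example above.
model = tf.keras.Sequential([tf.keras.Input(shape=(10,)), tf.keras.layers.Dense(5)])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((100, 10))
y = tf.random.normal((100, 5))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)


@tf.function
def train_step(xb, yb):
    with tf.GradientTape() as tape:
        y_pred = model(xb, training=True)
        loss = loss_fn(yb, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return y_pred


targets, outputs = [], []  # one entry per batch, like the callback's lists
for epoch in range(2):
    for xb, yb in dataset:
        y_pred = train_step(xb, yb)
        targets.append(yb.numpy())
        outputs.append(y_pred.numpy())
```

With 100 samples and a batch size of 32, each epoch yields four batches, the last one holding 4 samples, so the per-batch shapes vary just as the validate_shape=False comment in the callback anticipates.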
