Error in TF singleton variable creation when updating a dynamic model - python

Following the Progressive GANs paper (https://arxiv.org/abs/1710.10196), I implement a keras.Model that needs to grow in size (layers). I first initialize the full model, but at inference time I use only part of the model (backed by the same trainable_variables), e.g. 4x4 first, then 8x8. Consequently, the set of trainable_variables passed to train_step (which is decorated with tf.function) changes over time. This works properly for computing gradients etc., but not for optimizer.apply_gradients.
The code looks something like this:
strategy = tf.distribute.MirroredStrategy()
G = Generator()
with strategy.scope():
    opt = keras.optimizers.Adam()
    G.initialize_model()  # initialize the full model

@tf.function
def train_step(optimizer, model, var_to_train):
    with tf.GradientTape() as tape:
        loss_value = loss(model(datasets))
    grads = tape.gradient(loss_value, var_to_train)
    optimizer.apply_gradients(zip(grads, var_to_train))  # this will raise ValueError

res = 4  # resolution of 4x4
for ep in range(epochs):
    if ep % 100 == 0:
        res = res * 2
    cur_model = G.forward(output_shape=(res, res, 3))  # output for the given image resolution
    var = cur_model.trainable_variables  # this set of variables grows as the model grows
    strategy.run(train_step, args=(opt, cur_model, var))
Note, however, that this works fine when train_step is not wrapped in tf.function or run under MirroredStrategy. The last section of the tf.function guide [1] does not seem to solve the problem. I tried tf.distribute.ReplicaContext.all_reduce (and equivalent methods for obtaining local results from all replicas), but it won't work, since the trainable_variables are created inside strategy.scope(), so every update must happen in a replica context.
The only naive solution I can think of is to train, for example, the 4x4 model and save it, then use transfer learning to load it back into the 8x8 model.
What I want is a regular Keras optimizer that supports a dynamically changing set of trainable_variables passed through a tf.function context.
[1]: https://www.tensorflow.org/guide/function#creating_tfvariables

First call optimizer._create_all_weights(var), where the argument var holds the full model's variables. This makes the optimizer create all of its variables up front, and it must be done at the beginning, before any updates. During later gradient updates the optimizer won't create variables again, so you can pass it any subset of them. This works in the context of tf.function too.
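A minimal sketch of that idea, reusing the names from the question above (Generator, initialize_model and forward come from the original post; note that _create_all_weights is a private method of the TF 2.x Keras optimizer and may change between versions):

strategy = tf.distribute.MirroredStrategy()
G = Generator()
with strategy.scope():
    opt = keras.optimizers.Adam()
    G.initialize_model()  # initialize the full model
    # Create every optimizer slot variable once, for the FULL variable set,
    # before any tf.function tracing or gradient update happens.
    opt._create_all_weights(G.trainable_variables)

From then on, train_step can be called with any subset of those variables; apply_gradients no longer needs to create variables inside the tf.function.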

Related

Access output of intermediate layers in Tensorflow 2.0 in eager mode

I have a CNN that I built using Tensorflow 2.0. I need to access the outputs of the intermediate layers. I was going over other Stack Overflow questions that were similar, but all had solutions involving the Keras Sequential model.
I have tried using model.layers[index].output but I get
Layer conv2d has no inbound nodes.
I can post my code here (which is super long), but I am sure even without it someone can point me to how this can be done using just Tensorflow 2.0 in eager mode.
I stumbled onto this question while looking for an answer, and it took me some time to figure out, as I use the model subclassing API in TF 2.0 by default (as in https://www.tensorflow.org/tutorials/quickstart/advanced).
If somebody is in a similar situation, all you need to do is assign the intermediate output you want as an attribute of the class. Then keep the test_step without the @tf.function decorator and create a decorated copy of it, say val_step, for efficient internal computation of validation performance during training. As a short example, I have modified a few functions of the tutorial from the link accordingly. I'm assuming we need to access the output after flattening.
def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    self.intermediate = x  # assign it as an object attribute for accessing later
    x = self.d1(x)
    return self.d2(x)
# Remove the @tf.function decorator from test_step for prediction
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
    return
# Create a decorated val_step for the object's internal use during training
@tf.function
def val_step(images, labels):
    return test_step(images, labels)
Now when you run model.predict() after training, using the un-decorated test step, you can access the intermediate output via model.intermediate, which will be an EagerTensor whose value is obtained simply by model.intermediate.numpy(). However, if you don't remove the @tf.function decorator from test_step, it will return a Tensor whose value is not so straightforward to obtain.
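A short usage sketch under the assumptions above (x_test and y_test are hypothetical test data; test_step is the un-decorated version from the snippet):

test_step(x_test, y_test)              # runs eagerly, so call() stores an EagerTensor
features = model.intermediate.numpy()  # plain NumPy array of the flattened activations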
Thanks for answering my earlier question. I wrote this simple example to illustrate how what you're trying to do might be done in TensorFlow 2.x, using the MNIST dataset as the example problem.
The gist of the approach:
1. Build an auxiliary model (aux_model in the example below), which is a so-called "functional model" with multiple outputs. The first output is the output of the original model and is used for loss calculation and backprop, while the remaining output(s) are the intermediate-layer outputs that you want to access.
2. Use tf.GradientTape() to write a custom training loop and expose the detailed gradient values on each individual variable of the model. Then you can pick out the gradients that are of interest to you. This requires that you know the ordering of the model's variables, but that should be relatively easy for a sequential model.
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# Cast to float and add the channel axis so the input matches the model below.
x_train = x_train[..., None].astype("float32") / 255.0

# This is the original model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28, 1]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

# Make an auxiliary model that exposes the output from the intermediate layer
# of interest, which is the first Dense layer in this case.
aux_model = tf.keras.Model(inputs=model.inputs,
                           outputs=model.outputs + [model.layers[1].output])

# Define a custom training loop using `tf.GradientTape()`, to make it easier
# to access gradients on specific variables (the kernel and bias of the first
# Dense layer in this case).
cce = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.optimizers.Adam()
with tf.GradientTape() as tape:
    # Do a forward pass on the model, retrieving the intermediate layer's output.
    y_pred, intermediate_output = aux_model(x_train)
    print(intermediate_output)  # Now you can access the intermediate layer's output.
    # Compute loss, to enable backprop.
    loss = cce(tf.one_hot(y_train, 10), y_pred)

# Do backprop. `gradients` here are for all variables of the model.
# But we know we want the gradients on the kernel and bias of the first
# Dense layer, which happen to be the first two variables of the model.
gradients = tape.gradient(loss, aux_model.variables)

# This is the gradient on the first Dense layer's kernel.
intermediate_layer_kernel_gradients = gradients[0]
print(intermediate_layer_kernel_gradients)

# This is the gradient on the first Dense layer's bias.
intermediate_layer_bias_gradients = gradients[1]
print(intermediate_layer_bias_gradients)

# Update the variables of the model.
optimizer.apply_gradients(zip(gradients, aux_model.variables))
The most straightforward solution would go like this:
mid_layer = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer("layer_name").output)
You can now treat mid_layer as a model in its own right, and for instance:
mid_layer.predict(X)
Oh, also, to get the name of a hidden layer, you can use:
model.summary()
This will give you some insight into the layer inputs/outputs as well.

Converting short tensorflow 1.13 script into tensorflow 2.0

I am trying to learn the dynamics of Tensorflow 2.0 by converting my Tensorflow 1.13 script (below) into a Tensorflow 2.0 script. However, I am struggling to do this.
I think the main reason I am struggling is that the Tensorflow 2.0 examples I have seen train neural networks, and so they have a model which they compile and fit. However, in my simple example below I am not using a neural network, so I can't see how to adapt this code to Tensorflow 2.0 (for example, how do I replace the session?). Help is much appreciated and thanks in advance.
data = tf.placeholder(tf.int32)
theta = tf.Variable(np.zeros(100))
p_s = tf.nn.softmax(theta)
loss = tf.reduce_mean(-tf.log(tf.gather(p_s, data)))
train_step = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        for datum in sample_data():  # sample_data() is a list of integer datapoints
            _ = sess.run([train_step], feed_dict={data: datum})
    print(sess.run(p_s))
I have looked at this (which is most relevant), and so far I have come up with the below:
#data = tf.placeholder(tf.int32)
theta = tf.Variable(np.zeros(100))
p_s = tf.nn.softmax(theta)
loss = tf.reduce_mean(-tf.math.log(tf.gather(p_s, data)))
optimizer = tf.keras.optimizers.Adam()
for epoch in range(10):
    for datum in sample_data():
        optimizer.apply_gradients(loss)
print(p_s)
However, the above obviously does not run, because the placeholder data inside the loss function no longer exists - and I am not sure how to replace it. :S
Anyone? Note that I don't have a def forward(x), because my input datum isn't transformed - it is used directly to calculate the loss.
Instead of using the conversion tool (it exists, but I don't like it since it just prefixes (more or less) the API calls with tf.compat.v1 and keeps the old Tensorflow 1.x API), I'll help you convert your code to the new version.
Sessions have disappeared, and so have the placeholders. The reason? The code is executed line by line - that is Tensorflow eager mode.
To train a model you still, correctly, have to use an optimizer. If you want to use the minimize method, in Tensorflow 2.0 you have to define the function to minimize (the loss) as a Python callable.
# This is your "model"
theta = tf.Variable(np.zeros(100))

# Define the optimizer
optimizer = tf.keras.optimizers.Adam()

# Define the training loop with the loss inside (because we use the
# .minimize method, which requires a callable with no arguments)
trainable_variables = [theta]
for epoch in range(10):
    for datum in sample_data():
        # The loss must be a callable that returns the value to minimize.
        # Note: the softmax is computed inside the callable, so that the
        # gradient can flow back to theta.
        def loss_fn():
            p_s = tf.nn.softmax(theta)
            return tf.reduce_mean(-tf.math.log(tf.gather(p_s, datum)))
        optimizer.minimize(loss_fn, var_list=trainable_variables)
    tf.print("epoch", epoch, "finished. p_s:", tf.nn.softmax(theta))
Disclaimer: I haven't tested the code - but it should work (or at least give you an idea on how to implement what you're trying to achieve in TF 2)
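For comparison, here is how the same loop might look with an explicit tf.GradientTape instead of .minimize - a sketch under the same assumptions (sample_data() as in the question), not part of the original answer:

theta = tf.Variable(np.zeros(100))
optimizer = tf.keras.optimizers.Adam()
for epoch in range(10):
    for datum in sample_data():
        with tf.GradientTape() as tape:
            p_s = tf.nn.softmax(theta)
            loss = tf.reduce_mean(-tf.math.log(tf.gather(p_s, datum)))
        grads = tape.gradient(loss, [theta])
        optimizer.apply_gradients(zip(grads, [theta]))
    tf.print("epoch", epoch, "finished. p_s:", tf.nn.softmax(theta))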

Custom loss function which depends on another neural network in keras

I have a "how can I do that" question with Keras:
Assume that I have a first neural network, say NNa, which has 4 inputs (x, y, z, t) and is already trained.
I have a second neural network, say NNb, whose loss function depends on the first neural network.
The custom loss function of NNb, customLossNNb, calls the prediction of NNa with a fixed grid (x, y, z) and just modifies the last variable t.
Here, in pseudo-Python code, is what I would like to do to train the second NN, NNb:
grid = np.mgrid[0:10:1, 0:10:1, 0:10:1].reshape(3, -1).T
Y[:, 0] = time
Y[:, 1] = something

def customLossNNb(NNa, grid):
    def diff(y_true, y_pred):
        for ii in range(y_true.shape[0]):
            currentInput = concatenation of grid and y_true[ii, 0]
            toto[ii, :] = NNa.predict(currentInput)
        # some stuff with toto
        return  # ...
    return diff
Then
NNb.compile(loss=customLossNNb(NNa, K.variable(grid)), optimizer='Adam')
NNb.fit(input, Y)
In fact, the line that causes me trouble is currentInput = concatenation of grid and y_true[ii, 0].
I tried to pass the grid to customLossNNb as a tensor with K.variable(grid). But I can't define a new tensor inside the loss function - something like currentY, with shape (grid.shape[0], 1), filled with y[ii, 0] (i.e. the current t) - and then concatenate grid and currentY to build currentInput.
Any ideas?
Thanks
You can include your custom loss function in the graph using the functional API of Keras. The model in this case can be used as a function, something like this:
for l in NNa.layers:
    l.trainable = False

x = Input(size)
y = NNb(x)
z = NNa(y)
The predict method will not work, since the loss function has to be part of the graph, and predict returns an np.array.
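Applied to the concatenation the question struggles with, the loss could build currentInput with tensor ops instead of a Python loop. A sketch, assuming grid_t is the (N, 3) grid already converted to a tensor, and with a placeholder reduction at the end:

import tensorflow as tf

def customLossNNb(NNa, grid_t):
    def diff(y_true, y_pred):
        b = tf.shape(y_true)[0]                               # batch size
        n = tf.shape(grid_t)[0]                               # number of grid points
        # Pair every grid point with each sample's t = y_true[:, 0].
        g = tf.tile(grid_t[None, :, :], [b, 1, 1])            # (batch, N, 3)
        t = tf.tile(y_true[:, 0][:, None, None], [1, n, 1])   # (batch, N, 1)
        current_input = tf.concat([g, t], axis=-1)            # (batch, N, 4)
        toto = NNa(tf.reshape(current_input, [-1, 4]))        # call NNa as a function, not .predict
        # ... some stuff with toto ...
        return tf.reduce_mean(toto)                           # placeholder reduction
    return diff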
First, make NNa untrainable. Notice that you should do this recursively if your model has inner models.
def makeUntrainable(layer):
layer.trainable = False
if hasattr(layer, 'layers'):
for l in layer.layers:
makeUntrainable(l)
makeUntrainable(NNa)
Then you have two options:
1. Attach NNa to the end of your model (notice that both y_true and y_pred will be changed). You must then change your targets (predict them with NNa) for correct results, since your model is now expected to output NNa's output, not NNb's.
2. Create a custom loss function that uses NNa inside it, without changing your targets.
Option 1 - Attaching models
inputs = NNb.inputs
outputs = NNa(NNb.outputs)  # make sure NNb outputs 4 tensors to match NNa's inputs
fullModel = Model(inputs, outputs)

# Changing the targets:
newY_train = NNa.predict(oldY_train)
Option 2 - Creating a custom loss
Warning: please test whether NNa's weights are really frozen while training this configuration
from keras.losses import binary_crossentropy

def customLoss(true, pred):
    true = NNa(true)
    pred = NNa(pred)
    # use one of the usual losses or create your own
    return binary_crossentropy(true, pred)

NNb.compile(optimizer=anything, loss=customLoss)
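Regarding that warning, one way to test it might be to compare NNa's weights before and after a short fit (a sketch, not from the original answer):

import numpy as np
from keras import backend as K

before = [K.get_value(w) for w in NNa.weights]
NNb.fit(input, Y, epochs=1)
after = [K.get_value(w) for w in NNa.weights]
assert all(np.array_equal(b, a) for b, a in zip(before, after))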

Running a training operation inside another training operation

I want to run a small training operation inside another training operation as follows:
def get_alphas(weights, filters):
    alphas = tf.Variable(...)
    # Define some loss and training_op here
    with tf.Session() as sess:
        for _ in range(some_epochs):
            sess.run(training_op)
        return tf.convert_to_tensor(sess.run(alphas))

def get_updated_weights(default_weights):
    weights = tf.Variable(default_weights)
    # Some operation on weights to get filters
    # Now, the following will produce errors since weights is not initialized
    alphas = get_alphas(weights, filters)
    # The other option is to initialize it here, as follows
    with tf.Session() as sess:
        sess.run(tf.variables_initializer([weights]))
        calculated_filters = sess.run(filters)
        alphas = get_alphas(default_weights, calculated_filters)
    return ...  # some operation on alphas and filters
So, what I want to do is create a Variable by the name of weights. alphas and filters are dynamically dependent (through some training) on weights. Now, as weights are trained, filters will change, since they are created through some operations on weights; but alphas also need to change, and they can be found only through another training operation.
I will provide the exact functions, if intention is not clear from above.
The trick you describe won't work, because tf.Session.close releases all associated resources, such as variables, queues, and readers. So the result of get_alphas won't be a valid tensor.
The best course of action is to define several losses and training ops (affecting different parts of the graph) and run them within a single session, when you need to.
alphas = tf.Variable(...)
# Define some loss and training_op here

def get_alphas(sess, weights, filters):
    for _ in range(some_epochs):
        sess.run(training_op)
    # The rest of the training...
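A toy sketch of that pattern, with two training ops sharing one session (the losses here are made-up stand-ins, not the questioner's actual functions):

import tensorflow as tf

weights = tf.Variable(tf.random_normal([10, 10]))
alphas = tf.Variable(tf.zeros([10]))

filters = tf.square(weights)  # stand-in for "some operation on weights"
# Inner problem: fit alphas to the current filters (made-up loss).
inner_loss = tf.reduce_mean(tf.square(tf.reduce_sum(filters, axis=1) - alphas))
# Outer problem: train weights (made-up loss).
outer_loss = tf.reduce_mean(filters)

inner_op = tf.train.AdamOptimizer().minimize(inner_loss, var_list=[alphas])
outer_op = tf.train.AdamOptimizer().minimize(outer_loss, var_list=[weights])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for outer_step in range(100):
        for inner_step in range(20):  # the "small training operation"
            sess.run(inner_op)
        sess.run(outer_op)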

What is actually happening when executing a tensorflow graph using python API?

I am a newbie to tensorflow, but I think an understanding of tensorflow's core operation is a must. If we use the tf Python API in an object-oriented manner, we can first create the different graph operations as definitions.
def _create_placeholders(self):
    """ Step 1: define the placeholders for input and output """
    with tf.name_scope("data"):
        self.center_words = tf.placeholder(tf.int32, shape=[self.batch_size], name='center_words')
        print("Extracting the op", self.center_words.op)
        self.target_words = tf.placeholder(tf.int32, shape=[self.batch_size, 1], name='target_words')
        print("so", self.center_words.op)

def _create_embedding(self):
    """ Step 2: define weights. In word2vec, it's actually the weights that we care about """
    # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU
    with tf.device('/cpu:0'):
        with tf.name_scope("embed"):
            self.embed_matrix = tf.Variable(tf.random_uniform([self.vocab_size,
                                                               self.embed_size], -1.0, 1.0),
                                            name='embed_matrix')

def _create_loss(self):
    """ Step 3 + 4: define the model + the loss function """
    with tf.device('/cpu:0'):
        with tf.name_scope("loss"):
            # Step 3: define the inference
            embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embed')
            # Step 4: define loss function
            # construct variables for NCE loss
            nce_weight = tf.Variable(tf.truncated_normal([self.vocab_size, self.embed_size],
                                                         stddev=1.0 / (self.embed_size ** 0.5)),
                                     name='nce_weight')
            nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name='nce_bias')
            # define loss function to be NCE loss function
            self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                                      biases=nce_bias,
                                                      labels=self.target_words,
                                                      inputs=embed,
                                                      num_sampled=self.num_sampled,
                                                      num_classes=self.vocab_size), name='loss')
Here I have included two definitions, for creating the embedding and calculating the loss.
So if I run one of these defs, e.g. _create_loss(), it will create a node in the graph. I went through the tf source code; what I saw was that during the graph-building stage, each and every operation is loaded into some kind of buffer.
Then, during the session, we just re-run each and every one of them with real data.
with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('c/checkpointsq'))
    # if that checkpoint exists, restore from checkpoint
    if ckpt and ckpt.model_checkpoint_path:
        print("Restoring the checkpoints")
        saver.restore(sess, ckpt.model_checkpoint_path)
    total_loss = 0.0  # we use this to calculate the average loss in the last SKIP_STEP steps
    writer = tf.summary.FileWriter('./improved_graph/lr' + str(LEARNING_RATE), sess.graph)
    initial_step = model.global_step.eval()
    for index in range(1):
        centers, targets = batch_gen.__next__()
        feed_dict = {model.center_words: centers, model.target_words: targets}
        loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op],
                                          feed_dict=feed_dict)
Here is my problem: in sess.run, TensorFlow doesn't even care about the Python API; it only cares about the graph operations that were loaded by the graph-construction code above. My question is: where do all these operations get executed inside a session object? I understand it's in the core. Do we have any access to that?
I believe the code that builds the backpropagation part of the graph is here.
compute_gradients() is called by minimize(), which is then called by user code.
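For illustration, minimize() is roughly this two-step sequence (TF 1.x style, matching the question's code):

optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(loss)    # builds the gradient ops in the graph
train_op = optimizer.apply_gradients(grads_and_vars)  # builds the variable-update ops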
The scheduling and execution of ops in an already built TensorFlow graph happen inside this function.
