Related
I'm trying to build a workflow that uses tf.data.dataset batches and an iterator. For performance reasons, I am really trying to avoid using the placeholder->feed_dict loop workflow.
The process I'm trying to implement involves grad-cam (which requires the gradient of the loss with respect to the final convolutional layer of a CNN) as an intermediate step, and ideally I'd like to be able to try it out on several Keras pre-trained models, including non-sequential ones like ResNet.
Most implementations of grad-cam that I've found rely on hand-crafting the CNN of interest in tensorflow. I found one implementation, https://github.com/jacobgil/keras-grad-cam, that is made for keras models, and following that example, I get
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
The problem is that gradient_function([batch]) returns a numpy array whose value is determined by the first batch, so that heatmap doesn't change with subsequent evaluations.
I've tried replacing K.function with a Model in various ways, but nothing seems to work. I usually end up either with an error suggesting that grads evaluates to None or that one model or another is expecting a feed_dict and not receiving one.
Is this code salvageable? Is there a better way to do this besides looping through the data several times (once to get all the grad-cams and then again once I have them) or using placeholders and feed_dicts?
Edit:
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
# other operations on heatmap and batch ...
# ...
output_function = K.function(model.input, [node1, ..., nodeN])
for batch in range(n_batches):
outputs1, ... , outputsN = output_function(batch)
Gives me the desired outputs for each batch.
Yes, K.function returns numpy arrays because it evaluates the symbolic computation in your graph. What I think you should do is to keep everything symbolic up to K.function, and after getting the gradients, perform all computations of the Grad-CAM weights and final saliency map using numpy.
Then you can iterate on your dataset, evaluate gradient_function on a new batch of data, and compute the saliency map.
If you want to keep everything symbolic, then you should not use K.function to produce the gradient function, but use the symbolic gradient (the output of K.gradient, without lambda) and convolutional feature maps (conv_output) and perform the saliency map computation on top of that, and then build a function (using K.function) that takes the model input, and outputs the saliency map.
Hope the explanation is enough.
My question is about restoring the Denoised Trained Model.
I have my network defined in the following way.
Conv1->relu1->Conv2->relu2->Conv3->relu3->Deconv1
The tf.variable_scope(name) is same as above.
Now I have my loss, optimizer and accuracy defined with tf.name_scope.
When I try to restore loss function, It will ask even for labels (which I don't have).
feed_dict={x:input, y:labels}
sess.run('loss',feed_dict)
Can anyone please help me understand how to test this? Which operation should I restore ?
Should I have to call all layers, pass the input and check the loss(MSE)?
I checked many examples but it seems to be all Classification problem and defining softmax with logits at last works.
Edit:
Below is my code and now it is easily visible how tf.name_scope and tf.variable_scope is defined. I feel I may have to bring whole layer to test new Image. Is that right?
def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):
with tf.variable_scope(name):
# Shape of the filter-weights for the convolution
shape = [filter_size, filter_size, num_input_channels, num_filters]
# Create new weights (filters) with the given shape
weights = tf.Variable(tf.truncated_normal([filter_size, filter_size, num_input_channels, num_filters], stddev=0.5))
# Create new biases, one for each filter
biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))
filters = tf.Variable(tf.truncated_normal([filter_size, filter_size, num_input_channels, num_filters], stddev=0.5))
# TensorFlow operation for convolution
layer = tf.nn.conv2d(input=input, filter=filters, strides=[1,1,1,1], padding='SAME')
# Add the biases to the results of the convolution.
layer += biases
return layer, weights
def new_relu_layer(input, name):
with tf.variable_scope(name):
#TensorFlow operation for convolution
layer = tf.nn.relu(input)
return layer
def new_pool_layer(input, name):
with tf.variable_scope(name):
# TensorFlow operation for convolution
layer = tf.nn.max_pool(value=input, ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1], padding='SAME')
return layer
def new_layer(inputs, filters,kernel_size,strides,padding, name):
with tf.variable_scope(name):
layer = tf.layers.conv2d_transpose(inputs=inputs, filters=filters , kernel_size=kernel_size, strides=strides, padding=padding, data_format = 'channels_last')
return layer
layer_conv1, weights_conv1 = new_conv_layer(input=yTraininginput, num_input_channels=1, filter_size=5, num_filters=32, name ="conv1")
layer_relu1 = new_relu_layer(layer_conv1, name="relu1")
layer_conv2, weights_conv2 = new_conv_layer(input=layer_relu1, num_input_channels=32, filter_size=5, num_filters=64, name ="conv2")
layer_relu2 = new_relu_layer(layer_conv2, name="relu2")
layer_conv3, weights_conv3 = new_conv_layer(input=layer_relu2, num_input_channels=64, filter_size=5, num_filters=128, name ="conv3")
layer_relu3 = new_relu_layer(layer_conv3, name="relu3")
layer_deconv1 = new_layer(inputs=layer_relu3, filters=1, kernel_size=[5,5] ,strides=[1,1] ,padding='same',name = 'deconv1')
layer_relu4 = new_relu_layer(layer_deconv1, name="relu4")
layer_conv4, weights_conv4 = new_conv_layer(input=layer_relu4, num_input_channels=1, filter_size=5, num_filters=128, name ="conv4")
layer_relu5 = new_relu_layer(layer_conv4, name="relu5")
layer_deconv2 = new_layer(inputs=layer_relu5, filters=1, kernel_size=[5,5] ,strides=[1,1] ,padding='same',name = 'deconv2')
layer_relu6 = new_relu_layer(layer_deconv2, name="relu6")
# Use Cross entropy cost function
with tf.name_scope("loss"):
cross_entropy = tf.losses.mean_squared_error(labels = xTraininglabel,predictions = layer_relu6)
# Use Adam Optimizer
with tf.name_scope("optimizer"):
optimizer = tf.train.AdamOptimizer(learning_rate=1e-6).minimize(loss = cross_entropy)
# Accuracy
with tf.name_scope("accuracy"):
accuracy = tf.image.psnr(a=layer_relu6,b=xTraininglabel,max_val=1.0)
Try to view the graph of your code on tensorboard, get the operation name from the last layer(in your case deconv4). Something like below image.
Try loading the tensor, using below code:
operation = graph.get_tensor_by_name("<operationname:0>")
This should work, as your layers are interconnected.
Let me know if this worked!
Operation Image
I understand that bias are required in small networks, to shift the activation function. But in the case of Deep network that has multiple layers of CNN, pooling, dropout and other non -linear activations, is Bias really making a difference? The convolutional filter is learning local features and for a given conv output channel same bias is used.
This is not a dupe of this link. The above link only explains role of bias in small neural network and does not attempt to explain role of bias in deep-networks containing multiple CNN layers, drop-outs, pooling and non-linear activation functions.
I ran a simple experiment and the results indicated that removing bias from conv layer made no difference in final test accuracy.
There are two models trained and the test-accuracy is almost same (slightly better in one without bias.)
model_with_bias,
model_without_bias( bias not added in conv layer)
Are they being used only for historical reasons?
If using bias provides no gain in accuracy, shouldn't we omit them? Less parameters to learn.
I would be thankful if someone who have deeper knowledge than me, could explain the significance(if- any) of these bias in deep networks.
Here is the complete code and the experiment result bias-VS-no_bias experiment
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64
graph = tf.Graph()
with graph.as_default():
# Input data.
tf_train_dataset = tf.placeholder(
tf.float32, shape=(batch_size, image_size, image_size, num_channels))
tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
tf_valid_dataset = tf.constant(valid_dataset)
tf_test_dataset = tf.constant(test_dataset)
# Variables.
layer1_weights = tf.Variable(tf.truncated_normal(
[patch_size, patch_size, num_channels, depth], stddev=0.1))
layer1_biases = tf.Variable(tf.zeros([depth]))
layer2_weights = tf.Variable(tf.truncated_normal(
[patch_size, patch_size, depth, depth], stddev=0.1))
layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
layer3_weights = tf.Variable(tf.truncated_normal(
[image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.truncated_normal(
[num_hidden, num_labels], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
# define a Model with bias .
def model_with_bias(data):
conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
hidden = tf.nn.relu(conv + layer1_biases)
conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
hidden = tf.nn.relu(conv + layer2_biases)
shape = hidden.get_shape().as_list()
reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
return tf.matmul(hidden, layer4_weights) + layer4_biases
# define a Model without bias added in the convolutional layer.
def model_without_bias(data):
conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
hidden = tf.nn.relu(conv ) # layer1_ bias is not added
conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
hidden = tf.nn.relu(conv) # + layer2_biases)
shape = hidden.get_shape().as_list()
reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
# bias are added only in Fully connected layer(layer 3 and layer 4)
hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
return tf.matmul(hidden, layer4_weights) + layer4_biases
# Training computation.
logits_with_bias = model_with_bias(tf_train_dataset)
loss_with_bias = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_with_bias))
logits_without_bias = model_without_bias(tf_train_dataset)
loss_without_bias = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_without_bias))
# Optimizer.
optimizer_with_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_with_bias)
optimizer_without_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_without_bias)
# Predictions for the training, validation, and test data.
train_prediction_with_bias = tf.nn.softmax(logits_with_bias)
valid_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_valid_dataset))
test_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_test_dataset))
# Predictions for without
train_prediction_without_bias = tf.nn.softmax(logits_without_bias)
valid_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_valid_dataset))
test_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_test_dataset))
num_steps = 1001
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
print('Initialized')
for step in range(num_steps):
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
batch_labels = train_labels[offset:(offset + batch_size), :]
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
session.run(optimizer_with_bias, feed_dict=feed_dict)
session.run(optimizer_without_bias, feed_dict = feed_dict)
print('Test accuracy(with bias): %.1f%%' % accuracy(test_prediction_with_bias.eval(), test_labels))
print('Test accuracy(without bias): %.1f%%' % accuracy(test_prediction_without_bias.eval(), test_labels))
Output:
Initialized
Test accuracy(with bias): 90.5%
Test accuracy(without bias): 90.6%
Biases are tuned alongside weights by learning algorithms such as
gradient descent. biases differ from weights is that they are
independent of the output from previous layers. Conceptually bias is
caused by input from a neuron with a fixed activation of 1, and so is
updated by subtracting the just the product of the delta value and
learning rate.
In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal. At the first layer, the ability for this to happens depends on your input distribution. On a small network, of course you need a bias input, but on a large network, removing it makes almost no difference.
Although in a large network it has no difference, it still depends on network architecture. For instance in LSTM:
Most applications of LSTMs simply initialize the LSTMs with small
random weights which works well on many problems. But this
initialization effectively sets the forget gate to 0.5. This
introduces a vanishing gradient with a factor of 0.5 per timestep,
which can cause problems whenever the long term dependencies are
particularly severe. This problem is addressed by simply initializing the
forget gates bias to a large value such as 1 or 2. By doing so, the
forget gate will be initialized to a value that is close to 1,
enabling gradient flow.
See also:
The rule of bias in Neural network
What is bias in Neural network
An Empirical Exploration of Recurrent Network Architectures
In most networks you have a batchnorm layer after the conv layer, which has a bias. So if you have a batchnorm layer there is no gain. See:
Can not use both bias and batch normalization in convolution layers
Otherwise, from a math perspective you are learning different functions. However, it turns out that in particular if you have a very complex network for a simple problem, you might achieve almost the same thing without biases than with biases but ending up using more parameters. In my experience, using a factor of 2-4 more parameters than needed rarely hurts performance in deep learning - in particular if you regularize. So, it is hard to notice any difference. However, you might try to use few channels (I don't think depth of the network matters as much as number of channels of the convolution) and see if bias make a difference. I would guess so.
I'm a beginner in deep learning and have taken a few courses on Udacity. Recently I'm trying to build a deep network detecting hand joints in the input depth images, which doesn't seem to be working well. (My dataset is ICVL Hand Posture Dataset)
The network structure is shown here.
① A batch of input images, 240x320;
② An 8-channel convolutional layer with a 5x5 kernel;
③ A max pooling layer, ksize = stride = 2;
④ A fully-connected layer, weight.shape = [38400, 1024];
⑤ A fully-connected layer, weight.shape = [1024, 48].
After several epochs of training, the output of the last layer converges as a (0, 0, ..., 0) vector. I chose the mean square error as the loss function and its value stayed above 40000 and didn't seem to reduce.
The network structure is already too simple to be simplified again but the problem remains. Could anyone offer any suggestions?
My main code is posted below:
image = tf.placeholder(tf.float32, [None, 240, 320, 1])
annotations = tf.placeholder(tf.float32, [None, 48])
W_convolution_layer1 = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
b_convolution_layer1 = tf.Variable(tf.constant(0.1, shape=[8]))
h_convolution_layer1 = tf.nn.relu(
tf.nn.conv2d(image, W_convolution_layer1, [1, 1, 1, 1], 'SAME') + b_convolution_layer1)
h_pooling_layer1 = tf.nn.max_pool(h_convolution_layer1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
W_fully_connected_layer1 = tf.Variable(tf.truncated_normal([120 * 160 * 8, 1024], stddev=0.1))
b_fully_connected_layer1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pooling_flat = tf.reshape(h_pooling_layer1, [-1, 120 * 160 * 8])
h_fully_connected_layer1 = tf.nn.relu(
tf.matmul(h_pooling_flat, W_fully_connected_layer1) + b_fully_connected_layer1)
W_fully_connected_layer2 = tf.Variable(tf.truncated_normal([1024, 48], stddev=0.1))
b_fully_connected_layer2 = tf.Variable(tf.constant(0.1, shape=[48]))
detection = tf.nn.relu(
tf.matmul(h_fully_connected_layer1, W_fully_connected_layer2) + b_fully_connected_layer2)
mean_squared_error = tf.reduce_sum(tf.losses.mean_squared_error(annotations, detection))
training = tf.train.AdamOptimizer(1e-4).minimize(mean_squared_error)
# This data loader reads images and annotations and convert them into batches of numbers.
loader = ICVLDataLoader('../data/')
with tf.Session() as session:
session.run(tf.global_variables_initializer())
for i in range(1000):
# batch_images: a list with shape = [BATCH_SIZE, 240, 320, 1]
# batch_annotations: a list with shape = [BATCH_SIZE, 48]
[batch_images, batch_annotations] = loader.get_batch(100).to_1d_list()
[x_, t_, l_, p_] = session.run([x_image, training, mean_squared_error, detection],
feed_dict={images: batch_images, annotations: batch_annotations})
And it runs like this.
The main issue is likely the relu activation in the output layer. You should remove this, i.e. let detection simply be the results of a matrix multiplication. If you want to force the outputs to be positive, consider something like the exponential function instead.
While relu is a popular hidden activation, I see one major problem with using it as an output activation: As is well known relu maps negative inputs to 0 -- however, crucially, the gradients will also be 0. This happening in the output layer basically means your network cannot learn from its mistakes when it produces outputs < 0 (which is likely to happen with random initializations). This will likely heavily impair the overall learning process.
I defined a funciton in tensorflow as follows:
def generator(keep_prob, z, out_channel_dim, alphag1, is_train=True):
"""
Create the generator network
:param z: Input z
:param out_channel_dim: The number of channels in the output image
:param is_train: Boolean if generator is being used for training
:return: The tensor output of the generator
"""
# TODO: Implement Function
# when it is training reuse=False
# when it is not training reuse=True
alpha=alphag1
with tf.variable_scope('generator',reuse=not is_train):
layer = tf.layers.dense(z, 3*3*512,activation=None,\
kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))
layer = tf.reshape(layer, [-1, 3,3,512])
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 256, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 128, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 64, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, out_channel_dim, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00000001, dtype=tf.float32)
layer = tf.tanh(layer)
return layer
This is complicated such that to track each variable in each layer is difficult.
I later used tf.train.Saver() and saver.save to save everything after training.
Now I would like to restore this function so that I can use it to do further manipulations while keeping the trained weigts of each layer unchanged.
I found online that most function like tf.get_default_graph().get_tensor_by_name or some other functions were limited to restore only the values of the variables but not this function.
For example the input z of this function generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) is a tensor from another function.
I want to restore this function so that I can use two new tensors z1 and z2 with she same shape as z.
layer1 = generator(keep_prob, z1, out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2
and I can put this new tensor layer into another function.
Here layer1 and layer2 use the function with the saved weights.
The thing that is difficlut is that when I use the function generator I have to specifiy it with the trianed weights which was stored using Saver(). I find it difficult to specify this function with its weights. For, 1. too many layers to track off and 2. I don't know how to specify weights for tf.layers.conv2().
So are there anyone who know how to solve this issue?
This is a general question:
I save the whole model into file and need to restore part of the model into part of new model.
Here name_map is a dict:the key is new name in graph and value is name in ckpt file.
def get_restore_saver(self, name_map, restore_optimise_var=True):
var_grp = {_.op.name:_ for _ in tf.global_variables()}
varm = {}
for main_name in var_grp:
if main_name in name_map:
varm[name_map[main_name]] = var_grp[main_name]
elif restore_optimise_var: # I use adam to optimise
var_arr = main_name.split('/')
tail = var_arr[-1]
_ = '/'.join(var_arr[: -1])
if tail in ['Adam', 'Adam_1', 'ExponentialMovingAverage'] and _ in name_map:
varm[name_map[_] + "/" + tail] = var_grp[main_name]
return tf.train.Saver(varm)
Why do you need to restore the function and what does that even mean? If you need to use a model, you have to restore the corresponding graph. What your function does is defining nodes of the graph. You may use your function to build or rebuild that graph again and then load weights stored somewhere using Saver() or you may restore graph from the protobuf file.
In order to rebuild the graph, try to invoke invoke your function somewhere output_layer=generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) and than use Saver class as usual to restore weights. Your function does not compute, it defines a part or whole of the graph. All computations are performed by the graph.
In the last case you will find useful the following thread. Usually, you will need to know names of the input and output layers. That can be obtained by the code:
[n.name for n in tf.get_default_graph().as_graph_def().node]
After a long time of searching, it seems that maybe the following is a solution.
Define all the variables in advance,i.e.layer1 = generator(keep_prob, z1,
out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2.
Now you can use tf.get_collection to find the operators.
It seems that tensorflow will not give you the pre defined functions. It keeps the graph and values only but not in the form of function. One needs to set everything needed in the furture in the graph or one should keep track of every weights, even too many.