tensorflow copy variable but not trainable to pretrain next layers

tensorflow copy variable but not trainable to pretrain next layers - python

I want to implement the autoencoder (to be exact stacked convolutional autoencoder)
here I'd like to pretrain each layer first and then fine-tuning
So I created variables for weight of each layer
ex. W_1 = tf.Variable(initial_value, name,trainable=True etc) for first layer
and I pretrained W_1 of first layer
Then I want to pretrain weight of second layer (W_2)
Here I should use W_1 for calculating input of second layer.
However W_1 is trainable so if I use W_1 directly then tensorflow may train W_1 together.
So I should create W_1_out that keep value of W_1 but not trainable
To be honest I tried to modify code of this site
https://github.com/cmgreen210/TensorFlowDeepAutoencoder/blob/master/code/ae/autoencoder.py
At line 102 it creates variable by following code
self[name_w + "_fixed"] = tf.Variable(tf.identity(self[name_w]),
name=name_w + "_fixed",
trainable=False)
However it calls error cause it use uninitialized value
How should I do to copy variable but make it not trainable to pretrain next layers??

Not sure if still relevant, but I'll try anyway.
Generally, what I do in a situation like that is the following:
Populate the (default) graph according to the model you are building, e.g. for the first training step just create the first convolutional layer W1 you mention. When you train the first layer you can store the saved model once training is finished, then reload it and add the ops required for the second layer W2. Or you can just build the whole graph for W1 from scratch again directly in the code and then add the ops for W2.
If you are using the restore mechanism provided by Tensorflow, you will have the advantage that the weights for W1 are already the pre-trained ones. If you don't use the restore mechanism, you will have to set the W1 weights manually, e.g. by doing something shown in the snippet further below.
Then when you set up the training op, you can pass a list of variables as var_list to the optimizer which explicitly tells the optimizer which parameters are updated in order to minimize the loss. If this is set to None (the default), it just uses what it can find in tf.trainable_variables() which in turn is a collection of all tf.Variables that are trainable. May be check this answer, too, which basically says the same thing.
When using the var_list argument, graph collections come in handy. E.g. you could create a separate graph collection for every layer you want to train. The collection would contain the trainable variables for each layer and then you could very easily just retrieve the required collection and pass it as the var_list argument (see example below and/or the remark in the above linked documentation).
How to override the value of a variable: name is the name of the variable to be overriden, value is an array of the appropriate size and type and sess is the session:
variable = tf.get_default_graph().get_tensor_by_name(name)
sess.run(tf.assign(variable, value))
Note that the name needs an additional :0 in the end, so e.g. if the weights of your layer are called 'weights1' the name in the example should be 'weights1:0'.
To add a tensor to a custom collection: Use something along the following lines:
tf.add_to_collection('layer1_tensors', weights1)
tf.add_to_collection('layer1_tensors', some_other_trainable_variable)
Note that the first line creates the collection because it does not yet exist and the second line adds the given tensor to the existing collection.
How to use the custom collection: Now you can do something like this:
# loss = some tensorflow op computing the loss
var_list = tf.get_collection_ref('layer1_tensors')
optim = tf.train.AdamOptimizer().minimize(loss=loss, var_list=var_list)
You could also use tf.get_collection('layer_tensors') which would return you a copy of the collection.
Of course, if you don't wanna do any of this, you could just use trainable=False when creating the graph for all variables you don't want to be trainable as you hinted towards in your question. However, I don't like that option too much, because it requires you to pass in booleans into the functions that populate your graph, which is very easily overlooked and thus error-prone. Also, even if you decide to it like that, you would still have to restore the non-trainable variables manually.

Related

How can I obtain the output of an intermediate layer (feature extraction)?

I want to extract features of a optical image and save them into numpy array . I've seen similar questions , also can be seen here : https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction , but don't know how to go about it .

Keras documentation does exaclty specify how to do that. If you have defined your model model_full you can create another one, that is just a part of it - from the input layer, to the one you're interested in.
model_part = Model(
inputs=model_full.input,
outputs=model_full.get_layer("intermed_layer").output)
Then you should be able to obtain output from intermediate layer using:
intermed_output = model_part(data)
In order to do that, you just need a model_full defined, which I assume you already have.
2nd approach
You can also use built-in Keras function, which I guess you already saw in documentation as well. It may look kind of complicated at first, but it's just creating a function with bound values i.e.
from keras import backend as K
get_3rd_layer_output = K.function(
[model.layers[0].input], # param 1 will be treated as layer[0].output
[model.layers[3].output]) # and this function will return output from 3rd layer
# here X is param 1 (input) and the function returns output from layers[3]
output = get_3rd_layer_output([X])[0]
Clearly, again model has to be defined. Not sure if there are any other requirements apart from that.

Easiest way to see the output of a hidden layer in Tensorflow/Keras?

I am working on a GAN and I'm trying to diagnose how any why mode collapse occurs. I want to be able to look "under the hood" and see what the outputs of various layers in the network look like for the last minibatch. I saw you can do something like model.layers[5].output, but this produces a tensor of shape [None, 64, 64, 512], which looks like an empty tensor and not the actual output from the previous run. My only other idea is to recompile a model that's missing all the layers after the one I'm interested in and then run a minibatch through, but this seems like an extremely inefficient way to do it. I'm wondering if there's an easier way. I want to run some statistics on layer outputs during the training process to see where things might be going wrong.

I did this for a GAN I was training myself. The method I used extends to both the generator (G) and discriminator (D) of a GAN.
The idea is to make a model with the same input as D or G, but with outputs according to each layer in the model that you require.
For me, I found it useful to check the activations. In Keras, with some model model (which will be D or G for you and me)
activation_layers = []
activation_names = []
# obtain the layers in a given model, but skip the first 6
# as these generally are the input / non-convolutional layers
model_layers = [layer for layer in sub_model.layers][6:]
# print the names of what we are looking at.
print("MODEL LAYERS:", model_layers)
for layer in model_layers:
# check if the layer is an activation
if isinstance(layer, (Activation, LeakyReLU)):
# append the output of this layer for later
activation_layers.append(layer.output)
# name it with a signature of its output for clarity
activation_names.append(layer.name + str(layer.output_shape[1]))
# now create a model which outputs every activation
activation_model = Model(inputs=model.inputs, outputs=activation_layers)
# this outputs a list of layers when given an input, so for G
noise = np.random.normal(size=(1,32,32,1)) # random image shape (change for yourself)
model_activations = model.predict(noise)
Now the rest is quite model-specific. This is the basic method for checking the outputs of the layers in a given model.
Note it can be done before, during or after training. It also does not need re-compiling.
The plotting of activation maps in this case is relatively straight forward and as you mentioned, you will probably have something specific you want to do. Still, I have to link this beautiful example here.

I report here a useful 2-line code block I extrapolated from the answer of #Homer that I used to inspect a single layer of the neural network.
ablation_model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
preds = ablation_model.predict(np.random.normal(size=(20,2))) # adapt size

Keras, What's the difference between keras.backend.concatenate and Keras.layers.Concatenate [duplicate]

I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for functional API but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen a code that does this below. Why must there be the line ._keras_shape after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seems to be 2 duplicates. For example, Add() and add(), and so on.

First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the choice if you're not going deep on customizing. (And it was a bad choice in your example code -- See details at the end).
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K
class Concatenate(_Merge):
#blablabla
def _merge_function(self, inputs):
return K.concatenate(inputs, axis=self.axis)
#blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As any other keras layers, you instantiate and call it on tensors.
Pretty straighforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
#blablabla
return Concatenate(axis=axis, **kwargs)(inputs)
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the merge word.
But since Keras 2 doesn't mention or document these, I'd probably avoid using them, and if old code is found, I'd probably update it to a proper Keras 2 code.
Why the _keras_shape word?
This backend function was not supposed to be used in high level codes. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property for infering the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property and an error will appear telling _keras_shape doesn't exist.
The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper keras layer. (This may work now, but in case of keras updates this code will break while proper codes will remain ok)

Keras historically supports 2 different interfaces for their layers, the new functional one and the old one, that requires model.add() calls, hence the 2 different functions.
For the TF -- their concatenate() functions does not do everything that required for Keras to work, hence, the additional calls to make ._keras_shape variable correct and not to upset Keras that expects that variable to have some particular value.

How does this function create a deep neural network and not just rename the same variable?

I'm self learning from Geron's "Hands on Machine Learning" and I'm a little confused about how this function (in box [114] of the following page) creates a deep neural network.
https://github.com/ageron/handson-ml/blob/master/11_deep_learning.ipynb
he_init = tf.variance_scaling_initializer()
def dnn(inputs, n_hidden_layers=5, n_neurons=100, name=None,
activation=tf.nn.elu, initializer=he_init):
with tf.variable_scope(name, "dnn"):
for layer in range(n_hidden_layers):
inputs = tf.layers.dense(inputs, n_neurons, activation=activation,
kernel_initializer=initializer,
name="hidden%d" % (layer + 1))
return inputs
It just looks like it resets the same input each time with a different name. Can someone explain how this is supposed to create a deep neural network?

There is a strong misconception about model construction in TensorFlow. You are advised to read more about TensorFlow's computational graph and other low-level details of this API in the official guide.
Operations built using TensorFlow are not bound to a Python variable
(assume that we are not in Eager mode for this answer). When calling one of the layer construction functions in tf.layers (or other basic functions such as the ones in tf.nn), that will add new operations to the currently active graph and return the Tensor corresponding to the output of that layer. The operations do not disappear when removing or altering the contents of the Python variables that used to hold these tensors.
What the function dnn does is iteratively create a sequence of dense layers. At each step, the variable inputs is changed to point to the output of the most recently created layer, allowing it to be "fed" into the next one. Whether to use the same variable as the original inputs or a new one for this is a matter of opinion (I often use a new variable net myself). By default, this will result in a sequence of 5 fully connected layers. Only the graph was constructed in all this; no network training or weight initialization procedures were actually applied here.
This can also be validated visually. The following code will write the graph's signature to a TensorFlow summary file:
he_init = tf.variance_scaling_initializer()
def dnn(inputs, n_hidden_layers=5, n_neurons=100, name=None,
activation=tf.nn.elu, initializer=he_init):
with tf.variable_scope(name, "dnn"):
for layer in range(n_hidden_layers):
inputs = tf.layers.dense(inputs, n_neurons, activation=activation,
kernel_initializer=initializer,
name="hidden%d" % (layer + 1))
return inputs
x = tf.placeholder(tf.float32, [32, 128])
y = dnn(x)
writer = tf.summary.FileWriter(logdir='mydnn', graph=tf.get_default_graph())
writer.flush()
By opening the same log directory with TensorBoard, we get the following graph:

Modifying Neural Network weights using estimators in tensorflow

I need to modify the weight values during the execution, more specifically between the compute_gradients() and apply_gradients() functions. I was able to modify the gradients themselves, but i could not change the weights.
I'm using the tutorial for the Iris NN in tensorflow:
https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py , the only difference being that i changed the minimize() function for the compute_gradients() and the apply_gradients() function.
grads_and_vars = optimizer.compute_gradients(loss)
// some way to change the weights
train_op = optimizer.apply_gradients(grads_and_vars, global_step=tf.train.get_global_step())
Thanks in advance.

My best guess is that you are looking for tf.assign (from here) to assign values to your Variable tensors.
According to the docs:
Update 'ref' by assigning 'value' to it.
This operation outputs a Tensor that holds the new value of 'ref' after the value has been assigned. This makes it easier to chain operations that need to use the reset value.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.