I'm trying to create an incremental classifier that will be trained on data containing n classes for some set number of epochs, then n+m classes for a set number of epochs, then n+m+k, and so on, where each successive set of classes contains the previous set as a subset.
In order to do this without having to train the model, save it, manually edit the graph, re-train, repeat, I'm simply defining all the weights I will need to classify the entire set of classes, but keeping the weights corresponding to unseen classes frozen at 0 until the classifier is introduced to those classes.
My strategy for this is to define a placeholder that is fed in an array of Boolean values defining whether or not some given set of weights are trainable.
Relevant code below:
output_train = tf.placeholder(tf.int32, shape=(num_incremental_grps,), name="output_train")
...
weights = []
biases = []
for i in range(num_incremental_grps):
    W = tf.Variable(tf.zeros([batch_size, classes_per_grp]),
                    trainable=tf.cond(tf.equal(output_train[i], tf.constant(1)),
                                      lambda: tf.constant(True), lambda: tf.constant(False)))
    weights.append(W)
    b = tf.Variable(tf.zeros([classes_per_grp]),
                    trainable=tf.cond(tf.equal(output_train[i], tf.constant(1)),
                                      lambda: tf.constant(True), lambda: tf.constant(False)))
    biases.append(b)
out_weights = tf.reshape(tf.stack(weights, axis=1), (batch_size, -1))
out_biases = tf.reshape(tf.stack(biases, axis=1), (batch_size, -1))
outputs = tf.identity(tf.matmul(inputs, out_weights) + out_biases, name='values')
...
# Will change this to an array that progressively updates as classes are added.
output_trainable = np.ones(num_incremental_grps, dtype=bool)
...
with tf.Session() as sess:
    init.run()
    for epoch in range(epochs):
        for iteration in range(iterations):
            X_batch, y_batch = batch.getBatch()
            fd = {X: X_batch, y: y_batch, training: True, output_train: output_trainable}
            _, loss_val = sess.run([training_op, loss], feed_dict=fd)
This returns the error message:

Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of
`if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute
subgraphs conditioned on the value of a tensor.
I've tried tinkering around with this, like making the initial placeholder datatype tf.bool instead of tf.int32. I've also tried feeding a slice of the tensor into the 'trainable' argument of the weights/biases, like this:
W = tf.Variable(tf.zeros([batch_size, classes_per_grp]), trainable=output_train[i])
but I get the same error message. I'm not sure how to proceed from here, aside from trying a completely different approach to updating the number of predictable classes. Any help would be much appreciated.
The error occurs because tf.cond makes a decision based on a single scalar boolean, much like an if statement, and its result is a tensor rather than the plain Python bool that the trainable argument expects. What you want here is to make a choice per element of your tensor.
You could use tf.where to fix that problem, but then you will run into another one, which is that trainable is not a property that you can fix at runtime, it is part of the definition of a variable. If a variable will be trained at some point, perhaps not at the beginning but definitely later, then it must be trainable.
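For illustration, an elementwise choice with tf.where could look like this (a sketch built on the question's output_train placeholder):

mask = tf.where(tf.equal(output_train, 1),
                tf.ones_like(output_train, dtype=tf.float32),
                tf.zeros_like(output_train, dtype=tf.float32))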
I would suggest taking a much simpler route: define output_train to be an array of tf.float32
output_train = tf.placeholder(tf.float32, shape=(num_incremental_grps), name="output_train")
then later simply multiply your weights and biases with this vector.
W = tf.Variable(...)
W = W * output_train
...
Provide values of 1 to output_train where you want training to happen, 0 otherwise.
Be careful to also mask your loss to ignore output from unwanted channels, because even though they now always output 0, that may still affect your loss. For example,
logits = ...
logits = tf.matrix_transpose(tf.boolean_mask(
    tf.matrix_transpose(logits),
    tf.equal(output_train, 1)))
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
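Putting the pieces together, a minimal sketch could look like the following (a per-class mask rather than a per-group one, for brevity; num_classes, num_features, inputs, and labels are assumed names, not code from the question):

output_train = tf.placeholder(tf.float32, shape=(num_classes,), name="output_train")
W = tf.Variable(tf.zeros([num_features, num_classes]))
b = tf.Variable(tf.zeros([num_classes]))
# Multiplying by 0 zeroes both the output and its gradient,
# so the masked columns of W stay frozen at their initial values
logits = (tf.matmul(inputs, W) + b) * output_train
active = tf.equal(output_train, 1.0)
masked_logits = tf.matrix_transpose(tf.boolean_mask(tf.matrix_transpose(logits), active))
masked_labels = tf.matrix_transpose(tf.boolean_mask(tf.matrix_transpose(labels), active))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=masked_labels, logits=masked_logits))

You can then feed the question's output_trainable numpy array for output_train exactly as in the original training loop.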
Related
I have subclassed tf.keras.Model and I use tf.keras.layers.GRUCell in a for loop to compute sequences 'y_t' (n, timesteps, hidden_units) and final hidden states 'h_t' (n, hidden_units). For my loop to output 'y_t', I update a tf.Variable after each iteration of the loop. Calling the model with model(input) is not a problem, but when I fit the model with the for loop in the call method I get either a TypeError or a ValueError.
Please note, I cannot simply use tf.keras.layers.GRU because I am trying to implement this paper. Instead of just passing x_t to the next cell in the RNN, the paper performs some computation as a step in the for loop (they implement it in PyTorch) and passes the result of that computation to the RNN cell. They end up essentially doing this: h_t = f(special_x_t, h_t-1).
Please see the model below that causes the error:
class CustomGruRNN(tf.keras.Model):
    def __init__(self, batch_size, timesteps, hidden_units, features, **kwargs):
        # Inheritance
        super().__init__(**kwargs)
        # Args
        self.batch_size = batch_size
        self.timesteps = timesteps
        self.hidden_units = hidden_units
        # Stores y_t
        self.rnn_outputs = tf.Variable(tf.zeros(shape=(batch_size, timesteps, hidden_units)), trainable=False)
        # To be used in for loop in call
        self.gru_cell = tf.keras.layers.GRUCell(units=hidden_units)
        # Reshape to match input dimensions
        self.dense = tf.keras.layers.Dense(units=features)

    def call(self, inputs):
        """Inputs is a rank-3 tensor of shape (n, timesteps, features)."""
        # Initial state for gru cell
        h_t = tf.zeros(shape=(self.batch_size, self.hidden_units))
        for timestep in tf.range(self.timesteps):
            # Get the timestep of the inputs
            x_t = tf.gather(inputs, timestep, axis=1)  # Same as x_t = inputs[:, timestep, :]
            # Compute outputs and hidden states
            y_t, h_t = self.gru_cell(x_t, h_t)
            # Update y_t at the t^th timestep
            self.rnn_outputs = self.rnn_outputs[:, timestep, :].assign(y_t)
        # Outputs need to have same last dimension as inputs
        outputs = self.dense(self.rnn_outputs)
        return outputs
An example that would throw the error:
# Arbitrary values for dataset
num_samples = 128
batch_size = 4
timesteps = 5
features = 10
# Arbitrary dataset
x = tf.random.uniform(shape=(num_samples, timesteps, features))
y = tf.random.uniform(shape=(num_samples, timesteps, features))
train_data = tf.data.Dataset.from_tensor_slices((x, y))
train_data = train_data.shuffle(batch_size).batch(batch_size, drop_remainder=True)
# Model with arbitrary hidden units
model = CustomGruRNN(batch_size, timesteps, hidden_units=5, features=features)
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam())
When running eagerly:
model.fit(train_data, epochs=2, run_eagerly=True)
Epoch 1/2
WARNING:tensorflow:Gradients do not exist for variables
['stack_overflow_gru_rnn/gru_cell/kernel:0',
'stack_overflow_gru_rnn/gru_cell/recurrent_kernel:0',
'stack_overflow_gru_rnn/gru_cell/bias:0'] when minimizing the loss.
ValueError: substring not found
When not running eagerly:
model.fit(train_data, epochs=2, run_eagerly=False)
Epoch 1/2
TypeError: in user code:
TypeError: Can not convert a NoneType into a Tensor or Operation.
Edit:
While the TensorFlow guide answer suffices, I think my self-answered question involving custom cells for RNNs is a much better option. Please see this answer. Using a custom RNN cell removes the need for tf.transpose and tf.TensorArray, and thus lowers the complexity of the code while simultaneously improving readability.
Original Self-Answer:
The use of the DynamicRNN described near the bottom of TensorFlow's Guide to Effective TensorFlow 2 solves my problem.
To expand briefly on the DynamicRNN's conceptual use, an RNN cell is defined, in my case GRU, and then any number of custom steps can be defined within the tf.range loop. Variables should be tracked using tf.TensorArray objects outside the loop but inside the call method itself, and the sizes of such arrays can be determined by simply calling the .shape method of (input) tensors. Notably, the DynamicRNN object works in model fit, wherein the default execution mode is 'Graph' mode as opposed to the slower 'Eager Execution' mode.
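As a minimal sketch (assuming the same __init__ as in the question and TF 2.x; this mirrors the guide's DynamicRNN pattern rather than my exact code), the call method rewritten around tf.TensorArray looks roughly like this:

def call(self, inputs):
    # (batch, timesteps, features) -> (timesteps, batch, features),
    # so the loop can index one timestep at a time
    inputs = tf.transpose(inputs, [1, 0, 2])
    # Initial hidden state, wrapped in a list as GRUCell expects
    state = [tf.zeros(shape=(self.batch_size, self.hidden_units))]
    # TensorArray replaces the tf.Variable for tracking per-timestep outputs
    outputs = tf.TensorArray(tf.float32, size=self.timesteps)
    for t in tf.range(self.timesteps):
        y_t, state = self.gru_cell(inputs[t], state)
        outputs = outputs.write(t, y_t)
    # Stack to (timesteps, batch, hidden) and restore batch-major order
    y = tf.transpose(outputs.stack(), [1, 0, 2])
    return self.dense(y)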
Lastly, one might require the use of a DynamicRNN because, by default, the tf.keras.layers.GRU computation is loosely described by the following recurrent logic (assume that 'f' defines a GRU cell):
# Numpy is used here for ease of indexing, but in general you should use
# tensors and transpose them accordingly (see the previously linked guide)
inputs = np.random.randn(batch, total_timesteps, features)
# List for tracking outputs -- just for simple demonstration; again, please see the guide for more details
outputs = []
# Initialize the 'hidden state' (often referred to as h_naught and denoted h_0) of the RNN cell
state_at_t_minus_1 = tf.zeros(shape=(batch, hidden_cell_units))
# Iterate through the input until all timesteps in the sequence have been 'seen' by the GRU cell function 'f'
for timestep_t in range(total_timesteps):
    # This is of shape (batch, features)
    input_at_t = inputs[:, timestep_t, :]
    # output_at_t of shape (batch, hidden_units_of_cell) and state_at_t of shape (batch, hidden_units_of_cell)
    output_at_t, state_at_t = f(input_at_t, state_at_t_minus_1)
    outputs.append(output_at_t)
    # When the loop restarts, this variable will be used in the next GRU cell function call 'f'
    state_at_t_minus_1 = state_at_t
One might wish to add other steps in the for loop of the recurrent logic (e.g., dense layers, other layers, etc.) to modify the inputs and states passed to the GRU cell function 'f'. This is one motivation for the DynamicRNN.
I want to see what values the filter has adopted after training. If I call filter.eval() after the training loop, will it give me the values of the filter weights adopted after complete training? I don't think I can get the filter weights this way, because the filter variable calls the weight_variable function, which picks some values from a normal distribution. I think calling filter.eval() after the training loop is simply like printing the filter before training. So how can I get the values of the filter weights that the filter has adopted after training?
def weight_variable(shape):
    initial = tf.truncated_normal(shape, mean=0, stddev=0.1)
    return tf.Variable(initial)

# network
x = tf.placeholder(tf.float32, [None, FLAGS.image_height*FLAGS.image_width])
y_ = tf.placeholder(tf.float32, [None, 2])
input = tf.reshape(x, [-1, FLAGS.image_height, FLAGS.image_width, FLAGS.input_channel])
filter = weight_variable([FLAGS.filter_size, FLAGS.filter_size, FLAGS.input_channel, FLAGS.filter_channel])
conv_out = tf.nn.sigmoid(conv2d(input, filter))
pool_out = max_pool(conv_out)
pool_list = pool_out.get_shape().as_list()
input_dim = pool_list[1] * pool_list[2] * pool_list[3]
pool_2D = tf.reshape(pool_out, [-1, input_dim])
W_fc = weight_variable([input_dim, 2])
logits = tf.matmul(pool_2D, W_fc)  # (batch_size, 2)
y_conv = tf.nn.softmax(logits)
After checking correct predictions, I apply the training loop:

for i in range(max_training_steps):
    # Check training and test accuracy
    print(filter.eval())
First, make sure you use the same graph and session. The graph contains the setup of your model and the session contains the values of all the weights in the model. If you don't use the same graph/session you will not get the same weights. I think in your example you use some default graph and session; it should work, but it's better to be explicit about them.
It looks like filter should be a tensor. If you do print(filter) you should see something like <tensor object shape=(..) >. In this case print(filter.eval()) will return you the correct weights.
If not, you can use the graph to get the tensor. graph.get_tensor_by_name('filter_weights:0') will give you a tensor that you can look at. You can get the name from tensorboard (from the graph) or you can run tf.trainable_variables() to get a list of all the variables defined in the graph (and pick the one you want).
An alternative to filter.eval() is to do session.run(filter). They are equivalent.
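For instance, a minimal sketch (train_step, batch_x, and batch_y are assumed names, not from your code):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(max_training_steps):
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
    # Both lines read the trained values, since the session still holds
    # the variables updated by train_step
    print(sess.run(filter))
    print(filter.eval(session=sess))
    # Or list all trainable variables and pick the one you want
    print(tf.trainable_variables())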
I want to run a small training operation inside another training operation as follows:
def get_alphas(weights, filters):
    alphas = tf.Variable(...)
    # Define some loss and training_op here
    with tf.Session() as sess:
        for _ in range(some_epochs):
            sess.run(training_op)
        return tf.convert_to_tensor(sess.run(alphas))

def get_updated_weights(default_weights):
    weights = tf.Variable(default_weights)
    # Some operation on weights to get filters
    # Now, the following will produce errors since weights is not initialized
    alphas = get_alphas(weights, filters)
    # The other option is to initialize weights here, as follows
    with tf.Session() as sess:
        sess.run(tf.variables_initializer([weights]))
        calculated_filters = sess.run(filters)
    alphas = get_alphas(default_weights, calculated_filters)
    # return some operation on alphas and filters
So, what I want to do is create a Variable by the name of weights. alphas and filters depend dynamically (through some training) on weights. Now, as weights are trained, filters will change, since they are created through some operations on weights; but alphas also need to change, and they can be found only through another training operation.
I will provide the exact functions if the intention is not clear from the above.
The trick you describe won't work, because closing a tf.Session (here, leaving the with block) releases all its associated resources, such as variables, queues, and readers. So the result of get_alphas won't be a valid tensor.
The best course of action is to define several losses and training ops (affecting different parts of the graph) and run them within a single session, when you need to.
alphas = tf.Variable(...)
# Define some loss and training_op here

def get_alphas(sess, weights, filters):
    for _ in range(some_epochs):
        sess.run(training_op)

# The rest of the training...
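A minimal sketch of that structure (the losses, optimizers, and epoch counts are illustrative assumptions, not your exact functions):

weights = tf.Variable(default_weights)
alphas = tf.Variable(alphas_init)
# filters = some operation on weights
# main_loss and alpha_loss defined from the graph above
main_train_op = tf.train.AdamOptimizer().minimize(main_loss, var_list=[weights])
alpha_train_op = tf.train.AdamOptimizer().minimize(alpha_loss, var_list=[alphas])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        # Inner loop: retrain alphas for the current weights/filters
        for _ in range(alpha_epochs):
            sess.run(alpha_train_op)
        # Outer step: update weights while alphas stay fixed (via var_list)
        sess.run(main_train_op)
    final_alphas = sess.run(alphas)

Because each minimize call is restricted with var_list, the two training ops affect different parts of the graph while sharing one session's variable values.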
I want to implement a model like DSSM (Deep Semantic Similarity Model).
I want to train one RNN model and use this model to get three hidden vectors for three different inputs, and use these hidden vectors to compute the loss function.
I tried to code it in a variable scope with reuse=None, like:
gru_cell = tf.nn.rnn_cell.GRUCell(size)
gru_cell = tf.nn.rnn_cell.DropoutWrapper(gru_cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([gru_cell] * 2, state_is_tuple=True)
embedding = tf.get_variable("embedding", [vocab_size, wordvec_size])
inputs = tf.nn.embedding_lookup(embedding, self._input_data)
inputs = tf.nn.dropout(inputs, 0.5)
with tf.variable_scope("rnn"):
    _, rnn_states_1 = tf.nn.dynamic_rnn(cell, inputs, sequence_length=self.lengths, dtype=tf.float32)
    self._states_1 = rnn_states_1[config.num_layers - 1]
with tf.variable_scope("rnn", reuse=True):
    _, rnn_states_2 = tf.nn.dynamic_rnn(cell, inputs, sequence_length=self.lengths, dtype=tf.float32)
    self._states_2 = rnn_states_2[config.num_layers - 1]
I use the same inputs and reuse the RNN model, but when I print self._states_1 and self._states_2, these two vectors are different.
I use with tf.variable_scope("rnn", reuse=True): to compute rnn_states_2 because I want to use the same RNN model as for rnn_states_1.
But why do I get different hidden vectors with the same inputs and the same model?
Where did I go wrong?
Thanks for your answers.
Update:
I find the reason may be tf.nn.rnn_cell.DropoutWrapper: when I remove the dropout wrapper, the hidden vectors are the same; when I add it, the vectors become different.
So, the new questions are:
How do I fix the part of the vector that gets 'dropped out'? By setting the 'seed' parameter?
When training a DSSM, should I fix the dropout action?
If you structure your code to use tf.contrib.rnn.DropoutWrapper, you can set variational_recurrent=True in your wrapper, which causes the same dropout mask to be used at all steps, i.e. the dropout mask will be constant. Is that what you want?
Setting the seed parameter in tf.nn.dropout will just make sure that you get the same sequence of dropout masks every time you run with that seed. That does not mean the dropout mask will be constant, just that you'll always see the same dropout mask at a particular iteration. The mask will be different for every iteration.
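A sketch of what that looks like (note that variational_recurrent=True requires a dtype, and also an input_size if you drop the inputs; the sizes here are placeholders):

gru_cell = tf.nn.rnn_cell.GRUCell(size)
gru_cell = tf.contrib.rnn.DropoutWrapper(
    gru_cell,
    output_keep_prob=0.5,
    variational_recurrent=True,  # one dropout mask, reused at every timestep
    dtype=tf.float32)
cell = tf.nn.rnn_cell.MultiRNNCell([gru_cell] * 2, state_is_tuple=True)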
I've constructed an LSTM recurrent net using lasagne that is loosely based on the architecture in this blog post. My input is a text file that has around 1,000,000 sentences and a vocabulary of 2,000 word tokens. Normally, when I construct networks for image recognition my input layer will look something like the following:
l_in = nn.layers.InputLayer((32, 3, 128, 128))
(where the dimensions are batch size, channel, height and width) which is convenient because all the images are the same size so I can process them in batches. Since each instance in my LSTM network has a varying sentence length, I have an input layer that looks like the following:
l_in = nn.layers.InputLayer((None, None, 2000))
As described in the above referenced blog post:

Masks: Because not all sequences in each minibatch will always have the same length, all recurrent layers in lasagne accept a separate mask input which has shape (batch_size, n_time_steps), which is populated such that mask[i, j] = 1 when j <= (length of sequence i) and mask[i, j] = 0 when j > (length of sequence i). When no mask is provided, it is assumed that all sequences in the minibatch are of length n_time_steps.
My question is: Is there a way to process this type of network in mini-batches without using a mask?
Here is a simplified version of my network.
# -*- coding: utf-8 -*-
import theano
import theano.tensor as T
import lasagne as nn

softmax = nn.nonlinearities.softmax

def build_model():
    l_in = nn.layers.InputLayer((None, None, 2000))
    lstm = nn.layers.LSTMLayer(l_in, 4096, grad_clipping=5)
    rs = nn.layers.SliceLayer(lstm, 0, 0)
    dense = nn.layers.DenseLayer(rs, num_units=2000, nonlinearity=softmax)
    return l_in, dense
model = build_model()
l_in, l_out = model
all_params = nn.layers.get_all_params(l_out)
target_var = T.ivector("target_output")
output = nn.layers.get_output(l_out)
loss = T.nnet.categorical_crossentropy(output, target_var).sum()
updates = nn.updates.adagrad(loss, all_params, 0.005)
train = theano.function([l_in.input_var, target_var], loss, updates=updates)
From there I have a generator that spits out (X, y) pairs, and I compute train(X, y), updating the gradient with each iteration. What I want to do is take N training steps and then update the parameters with the average gradient.
To do this, I tried creating a compute_gradient function:
gradient = theano.grad(loss, all_params)
compute_gradient = theano.function(
    [l_in.input_var, target_var],
    outputs=gradient
)
and then looping over several training instances to create a "batch" and collect the gradient calculations to a list:
grads = []
for _ in xrange(1024):
    X, y = train_gen.next()  # generator for producing training data
    grads.append(compute_gradient(X, y))
this produces a list of lists
>>> grads
[[<CudaNdarray at 0x7f83b5ff6d70>,
<CudaNdarray at 0x7f83b5ff69f0>,
<CudaNdarray at 0x7f83b5ff6270>,
<CudaNdarray at 0x7f83b5fc05f0>],
[<CudaNdarray at 0x7f83b5ff66f0>,
<CudaNdarray at 0x7f83b5ff6730>,
<CudaNdarray at 0x7f83b5ff6b70>,
<CudaNdarray at 0x7f83b5ff64f0>] ...
From here I would need to take the mean of the gradient at each layer, and then update the model parameters. It is possible to do this in pieces, but does the gradient calculation/parameter update need to happen all in one theano function?
Thanks.
NOTE: this is a solution, but by no means do I have enough experience to verify it's the best one, and the code is just a sloppy example.
You need 2 theano functions. The first is the grad one you seem to have already, judging from the information provided in your question.
So after computing the batched gradients, you want to immediately feed them as input arguments back into another theano function dedicated to updating the shared variables. For this you need to specify the expected batch size at compile time of your neural network. So you could do something like this (for simplicity I will assume you have a global list variable where all your params are stored):
params  # list of params you wish to update
BATCH_SIZE = 1024  # size of the expected training batch
learning_rate = 0.005  # assumed step size; match whatever your optimizer used

# Placeholders for the gradients, flattened batch-major so they can be fed
# into a theano function (this assumes every gradient is a matrix)
G = [T.matrix() for i in range(BATCH_SIZE) for param in params]

# Start with the list of param gradients from the first sample
updates = [G[i] for i in range(len(params))]
# Sum the gradients of the remaining samples for each individual param
for i in range(len(params)):
    for j in range(1, BATCH_SIZE):
        updates[i] += G[j * len(params) + i]
# Make a list of tuples for theano.function's updates argument,
# stepping each param along its mean gradient
for i in range(len(params)):
    updates[i] = (params[i], params[i] - learning_rate * updates[i] / BATCH_SIZE)

update = theano.function(G, updates=updates)
Like this, theano will take the mean of the gradients and update the params as usual.
I don't know if you need to flatten the inputs as I did, but probably.
EDIT: Gathering from how you edited your question, it seems important that the batch size can vary. In that case you could add 2 theano functions to your existing one:
The first theano function takes a batch of size 2 of your param gradients and returns their sum. You could apply this theano function using python's reduce() and get the sum over the whole batch of gradients.
The second theano function takes those summed param gradients and a scalar (the batch size) as input, and is hence able to update the NN params over the mean of the summed gradients.
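A rough sketch of those two functions (again assuming matrix-shaped gradients; params, learning_rate, and grads are the names used above):

# Elementwise sum of two gradient sets; folded over the batch with reduce()
a = [T.matrix() for p in params]
b = [T.matrix() for p in params]
pair_sum = theano.function(a + b, [x + y for x, y in zip(a, b)])

# Update the params from summed gradients and a runtime batch size
summed = [T.matrix() for p in params]
n = T.scalar()  # the (variable) batch size
sgd_updates = [(p, p - learning_rate * g / n) for p, g in zip(params, summed)]
apply_mean_grads = theano.function(summed + [n], updates=sgd_updates)

# Usage: grads is a list of per-sample gradient lists, as in the question
total = reduce(lambda u, v: pair_sum(*(list(u) + list(v))), grads)
apply_mean_grads(*(list(total) + [float(len(grads))]))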