tensorflow: shared variables error with simple LSTM network - python

I am trying to build the simplest possible LSTM network. I just want it to predict the next value in the sequence np_input_data.
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
import numpy as np
num_steps = 3
num_units = 1
np_input_data = [np.array([[1.],[2.]]), np.array([[2.],[3.]]), np.array([[3.],[4.]])]
batch_size = 2
graph = tf.Graph()
with graph.as_default():
    tf_inputs = [tf.placeholder(tf.float32, [batch_size, 1]) for _ in range(num_steps)]
    lstm = rnn_cell.BasicLSTMCell(num_units)
    initial_state = state = tf.zeros([batch_size, lstm.state_size])
    loss = 0
    for i in range(num_steps-1):
        output, state = lstm(tf_inputs[i], state)
        loss += tf.reduce_mean(tf.square(output - tf_inputs[i+1]))

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    feed_dict = {tf_inputs[i]: np_input_data[i] for i in range(len(np_input_data))}
    loss = session.run(loss, feed_dict=feed_dict)
    print(loss)
The interpreter returns:
ValueError: Variable BasicLSTMCell/Linear/Matrix already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
output, state = lstm(tf_inputs[i], state)
What am I doing wrong?

The call to lstm here:
for i in range(num_steps-1):
    output, state = lstm(tf_inputs[i], state)
will try to create variables with the same name each iteration unless you tell it otherwise. You can do this using tf.variable_scope
with tf.variable_scope("myrnn") as scope:
    for i in range(num_steps-1):
        if i > 0:
            scope.reuse_variables()
        output, state = lstm(tf_inputs[i], state)
The first iteration creates the variables that represent your LSTM parameters and every subsequent iteration (after the call to reuse_variables) will just look them up in the scope by name.

I ran into a similar issue in TensorFlow v1.0.1 using tf.nn.dynamic_rnn. It turned out that the error only arose if I had to re-train or cancel in the middle of training and restart my training process. Basically the graph was not being reset.
Long story short, throw a tf.reset_default_graph() at the start of your code and it should help. At least when using tf.nn.dynamic_rnn and retraining.
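For example, a minimal sketch (build_graph() here is just a stand-in for whatever code constructs your model):
import tensorflow as tf

# drop whatever graph is left over from the interrupted run
tf.reset_default_graph()

# rebuild the model from scratch before opening a new session
loss, train_op = build_graph()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop ...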

Use tf.nn.rnn or tf.nn.dynamic_rnn which do this, and a lot of other nice things, for you.
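For the snippet in the question, a rough (untested) sketch with tf.nn.dynamic_rnn could look like this, assuming the inputs are fed as one [batch_size, num_steps, 1] tensor instead of a list of per-step placeholders:
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, 1])
lstm = rnn_cell.BasicLSTMCell(num_units)
# dynamic_rnn unrolls the cell over the time dimension and handles
# variable creation and reuse internally
outputs, state = tf.nn.dynamic_rnn(lstm, inputs, dtype=tf.float32)
# predict the next value: compare the output at step i with the input at step i+1
loss = tf.reduce_mean(tf.square(outputs[:, :-1, :] - inputs[:, 1:, :]))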

Related

Converting short tensorflow 1.13 script into tensorflow 2.0

I am trying to learn the dynamics of tensorflow2.0 by converting my tensorflow1.13 script (below) into a tensorflow2.0 script. However, I am struggling to do this.
I think the main reason why I am struggling is that the tensorflow2.0 examples I have seen train neural networks, so they have a model which they compile and fit. However, in my simple example below I am not using a neural network, so I can't see how to adapt this code to tensorflow2.0 (for example, how do I replace the session?). Help is much appreciated and thanks in advance.
data = tf.placeholder(tf.int32)
theta = tf.Variable(np.zeros(100))
p_s = tf.nn.softmax(theta)
loss = tf.reduce_mean(-tf.log(tf.gather(p_s, data)))
train_step = tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        for datum in sample_data():  # sample_data() is a list of integer datapoints
            _ = sess.run([train_step], feed_dict={data: datum})
    print(sess.run(p_s))
I have looked at this (which is most relevant) and so far I have come up with the below:
#data = tf.placeholder(tf.int32)
theta = tf.Variable(np.zeros(100))
p_s = tf.nn.softmax(theta)
loss = tf.reduce_mean(-tf.math.log(tf.gather(p_s, **data**)))
optimizer = tf.keras.optimizers.Adam()
for epoch in range(10):
    for datum in sample_data():
        optimizer.apply_gradients(loss)

print(p_s)
However, the above obviously does not run because the placeholder data inside the loss function no longer exists - I am not sure how to replace it. :S
Anyone? Note that I don't have a def forward(x) because my input datum isn't transformed - it is used directly to calculate the loss.
Instead of using the conversion tool (it exists, but I don't like it since it more or less just prefixes the API calls with tf.compat.v1 and keeps using the old TensorFlow 1.x API), I'll help you convert your code to the new version.
Sessions are gone, and so are placeholders. The reason? The code is executed line by line - that is TensorFlow's eager mode.
To train a model, you're right that you have to use an optimizer. If you want to use its minimize method, in TensorFlow 2.0 you have to define the function to minimize (the loss) as a Python callable.
# This is your "model": a single variable theta. The softmax is computed
# inside the loss below so that the gradient can flow back to theta.
theta = tf.Variable(np.zeros(100))
# Define the optimizer
optimizer = tf.keras.optimizers.Adam()
# Define the training loop with the loss inside (because we use the
# .minimize method, which requires a callable with no arguments)
trainable_variables = [theta]
for epoch in range(10):
    for datum in sample_data():
        # The loss must be callable and return the value to minimize
        def loss_fn():
            p_s = tf.nn.softmax(theta)
            return tf.reduce_mean(-tf.math.log(tf.gather(p_s, datum)))
        optimizer.minimize(loss_fn, var_list=trainable_variables)
    tf.print("epoch ", epoch, " finished. ps: ", tf.nn.softmax(theta))
Disclaimer: I haven't tested the code - but it should work (or at least give you an idea on how to implement what you're trying to achieve in TF 2)
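If you prefer not to wrap the loss in a callable, an equivalent (also untested) sketch uses tf.GradientTape, reusing theta, optimizer and sample_data from above:
for epoch in range(10):
    for datum in sample_data():
        with tf.GradientTape() as tape:
            # recompute the softmax under the tape so the gradient reaches theta
            p_s = tf.nn.softmax(theta)
            loss = tf.reduce_mean(-tf.math.log(tf.gather(p_s, datum)))
        grads = tape.gradient(loss, [theta])
        optimizer.apply_gradients(zip(grads, [theta]))
    tf.print("epoch ", epoch, " finished. ps: ", tf.nn.softmax(theta))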

tensorflow: assigning weights after finalizing graph

Solution below
If you are just interested in solving this problem, you can skip to my answer below.
Original question
I'm using tensorflow for reinforcement learning. A swarm of agents uses the model in parallel and one central entity trains it on the collected data.
I had found here (Is it thread-safe when using tf.Session in inference service?) that TensorFlow sessions are thread-safe, so I simply let the prediction and updating run in parallel.
But now I would like to change the setup. Instead of updating and training on one single model, I now need to keep two models. One is used for prediction and the second one is trained. After some training steps the weights from the second one are copied over to the first. Below is a minimal example in keras. For multiprocessing, it is recommended to finalize the graph, but then I can't copy weights:
# the usual imports
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
# set up the first model
i = Input(shape=(10,))
b = Dense(1)(i)
prediction_model = Model(inputs=i, outputs=b)
# set up the second model
i2 = Input(shape=(10,))
b2 = Dense(1)(i2)
training_model = Model(inputs=i2, outputs=b2)
# look at this code, to check if the weights are the same
# here the output is different
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
# now to use them in multiprocessing, the following is necessary
prediction_model._make_predict_function()
training_model._make_predict_function()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
default_graph = tf.get_default_graph()
# the following line is the critical part
# if this is uncommented, the two options below both fail
# default_graph.finalize()
# option 1, use keras methods to update the weights
prediction_model.set_weights(training_model.get_weights())
# option 2, use tensorflow to update the weights
update_ops = [tf.assign(to_var, from_var) for to_var, from_var in
              zip(prediction_model.trainable_weights, training_model.trainable_weights)]
sess.run(update_ops)
# now the predictions are the same
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
According to the question above, it is recommended to finalize the graph. If it is not finalized, there can be memory leaks (!?), so that seems like a strong recommendation.
But if I finalize it, I can no longer update the weights.
What confuses me about this is: it is possible to train the network, so changing the weights is allowed. Assignment looks to me like the weights are just overwritten - why is this different from applying an optimizer step?
In short, my problem was to assign values to weights of a finalized graph. If this assignment is done after finalization, tensorflow complains that the graph can no longer be changed.
I was confused why this is forbidden. After all, changing the weights by backpropagation is allowed.
But the problem is not related to changing the weights. Keras set_weights() is confusing because it looks as if the weights are simply overwritten (like in backprop). Actually, behind the scenes, assignment operations are added and executed. These new operations represent a change in the graph and that change is forbidden.
So the solution is to set up the assignment operations before finalizing the graph. You have to reorder the code:
# the usual imports
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
# set up the first model
i = Input(shape=(10,))
b = Dense(1)(i)
prediction_model = Model(inputs=i, outputs=b)
# set up the second model
i2 = Input(shape=(10,))
b2 = Dense(1)(i2)
training_model = Model(inputs=i2, outputs=b2)
# set up operations to move weights from training to prediction
update_ops = [tf.assign(to_var, from_var) for to_var, from_var in
              zip(prediction_model.trainable_weights, training_model.trainable_weights)]
# now to use them in multiprocessing, the following is necessary
prediction_model._make_predict_function()
training_model._make_predict_function()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
default_graph = tf.get_default_graph()
default_graph.finalize()
# this can be executed now
sess.run(update_ops)
# now the predictions are the same
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
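In the training setup from the question, these pre-built ops can then be run as often as needed, for example (sketch; total_steps and copy_interval are hypothetical names):
copy_interval = 100
for step in range(total_steps):
    # ... run one training step on training_model here ...
    if step % copy_interval == 0:
        # running the pre-built assign ops does not modify the finalized graph
        sess.run(update_ops)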

restoring sub-graph variables fails with 'Cannot interpret feed_dict key'

The context is that I'm trying to incrementally grow an RNN autoencoder, by first training a single-cell encoder/decoder and then extending it. I'd like to load the parameters of the preceding cells.
The code below is a minimal example where I'm investigating how to do this, and it fails with:
TypeError: Cannot interpret feed_dict key as Tensor: The name 'save_1/Const:0' refers to a Tensor which does not exist. The operation, 'save_1/Const', does not exist in the graph.
I've searched and found nothing; this thread and this thread are not the same problem.
MVCE
import tensorflow as tf
import numpy as np
with tf.Session(graph=tf.Graph()) as sess:
    cell1 = tf.nn.rnn_cell.LSTMCell(1, name='lstm_cell1')
    cell = tf.nn.rnn_cell.MultiRNNCell([cell1])
    inputs = tf.random_normal((5, 10, 1))
    rnn1 = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    vars0 = tf.trainable_variables()
    saver = tf.train.Saver(vars0, max_to_keep=1)
    sess.run(tf.initialize_all_variables())
    saver.save(sess, './save0')
    vars0_val = sess.run(vars0)

# creating a new graph/session because it is not given that it'll be in the same session.
with tf.Session(graph=tf.Graph()) as sess:
    cell1 = tf.nn.rnn_cell.LSTMCell(1, name='lstm_cell1')
    # one extra cell
    cell2 = tf.nn.rnn_cell.LSTMCell(1, name='lstm_cell2')
    cell = tf.nn.rnn_cell.MultiRNNCell([cell1, cell2])
    inputs = tf.random_normal((5, 10, 1))
    rnn1 = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    sess.run(tf.initialize_all_variables())
    # new saver with first cell variables
    saver = tf.train.Saver(vars0, max_to_keep=1)
    # fails
    saver.restore(sess, './save0')
    # Should be the same
    vars0_val1 = sess.run(vars0)
    assert np.all(vars0_val1 == vars0_val)
The mistake comes from the line,
saver = tf.train.Saver(vars0,max_to_keep=1)
in the second session. vars0 refers to actual tensor objects that existed in the previous graph (not the current one). Saver's var_list requires an actual set of tensors (not strings, which I assumed would be good enough).
To make it work the second Saver object should be initialized with the corresponding tensors in the current graph.
Something like,
vars0_names = [v.name for v in vars0]
load_vars = [sess.graph.get_tensor_by_name(n) for n in vars0_names]
saver = tf.train.Saver(load_vars,max_to_keep=1)
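Putting it together, the end of the second session would then look roughly like this (untested sketch):
vars0_names = [v.name for v in vars0]
load_vars = [sess.graph.get_tensor_by_name(n) for n in vars0_names]
saver = tf.train.Saver(load_vars, max_to_keep=1)
saver.restore(sess, './save0')
# the first cell's variables should now match the values saved in the first session
vars0_val1 = sess.run(load_vars)
assert all(np.array_equal(a, b) for a, b in zip(vars0_val1, vars0_val))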

What is actually happening when executing a tensorflow graph using python API?

I am a newbie to TensorFlow, but I think understanding TensorFlow's core operation is a must. If we use the tf Python API in an object-oriented manner, we can first create the different graph operations as definitions.
def _create_placeholders(self):
    """ Step 1: define the placeholders for input and output """
    with tf.name_scope("data"):
        self.center_words = tf.placeholder(tf.int32, shape=[self.batch_size], name='center_words')
        print("Extracting the op", self.center_words.op)
        self.target_words = tf.placeholder(tf.int32, shape=[self.batch_size, 1], name='target_words')
        print("so", self.center_words.op)

def _create_embedding(self):
    """ Step 2: define weights. In word2vec, it's actually the weights that we care about """
    # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU
    with tf.device('/cpu:0'):
        with tf.name_scope("embed"):
            self.embed_matrix = tf.Variable(tf.random_uniform([self.vocab_size,
                                                               self.embed_size], -1.0, 1.0),
                                            name='embed_matrix')

def _create_loss(self):
    """ Step 3 + 4: define the model + the loss function """
    with tf.device('/cpu:0'):
        with tf.name_scope("loss"):
            # Step 3: define the inference
            embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embed')
            # Step 4: define loss function
            # construct variables for NCE loss
            nce_weight = tf.Variable(tf.truncated_normal([self.vocab_size, self.embed_size],
                                                         stddev=1.0 / (self.embed_size ** 0.5)),
                                     name='nce_weight')
            nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name='nce_bias')
            # define loss function to be NCE loss function
            self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                                      biases=nce_bias,
                                                      labels=self.target_words,
                                                      inputs=embed,
                                                      num_sampled=self.num_sampled,
                                                      num_classes=self.vocab_size), name='loss')
Here I have shown two of these definitions, one that creates the embedding and one that calculates the loss.
So if I run one of these defs, say _create_loss(), it will create a node in the graph. I went through the tf source code, and what I saw was that during the graph-building stage each and every operation is loaded into some kind of buffer.
Then during the session we just re-run everything with real data.
with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('c/checkpointsq'))
    # if that checkpoint exists, restore from checkpoint
    if ckpt and ckpt.model_checkpoint_path:
        print("Restoring the checkpoints")
        saver.restore(sess, ckpt.model_checkpoint_path)
    total_loss = 0.0  # we use this to calculate the average loss in the last SKIP_STEP steps
    writer = tf.summary.FileWriter('./ improved_graph/lr' + str(LEARNING_RATE), sess.graph)
    initial_step = model.global_step.eval()
    for index in range(1):
        centers, targets = batch_gen.__next__()
        feed_dict = {model.center_words: centers, model.target_words: targets}
        loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op],
                                          feed_dict=feed_dict)
Here is my problem: in sess.run, TensorFlow doesn't even care about the Python API. It only cares about the graph operations that were loaded by the graph-construction code above. My question is: where do all these operations actually get executed in a session object? I understand it's in the core. Do we have any access to that?
I believe the code that builds the backpropagation part of the graph is here.
compute_gradients() is called by minimize(), which is then called by user code.
The scheduling and execution of ops in an already built TensorFlow graph happen inside this function.
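For reference, in the TF 1.x Python API the minimize() call is roughly the combination of these two steps (sketch, using the model.loss tensor built in _create_loss()):
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# 1. build the backpropagation ops (gradients of the loss w.r.t. each variable)
grads_and_vars = optimizer.compute_gradients(model.loss)
# 2. build the ops that apply those gradients to the variables
train_op = optimizer.apply_gradients(grads_and_vars)
# nothing is executed until the op is run in a session:
# sess.run(train_op, feed_dict=feed_dict)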

How do I set TensorFlow RNN state when state_is_tuple=True?

I have written an RNN language model using TensorFlow. The model is implemented as an RNN class. The graph structure is built in the constructor, while RNN.train and RNN.test methods run it.
I want to be able to reset the RNN state when I move to a new document in the training set, or when I want to run a validation set during training. I do this by managing the state inside the training loop, passing it into the graph via a feed dictionary.
In the constructor I define the RNN like so:
cell = tf.nn.rnn_cell.LSTMCell(hidden_units)
rnn_layers = tf.nn.rnn_cell.MultiRNNCell([cell] * layers)
self.reset_state = rnn_layers.zero_state(batch_size, dtype=tf.float32)
self.state = tf.placeholder(tf.float32, self.reset_state.get_shape(), "state")
self.outputs, self.next_state = tf.nn.dynamic_rnn(rnn_layers, self.embedded_input, time_major=True,
                                                  initial_state=self.state)
The training loop looks like this
for document in documents:
    state = session.run(self.reset_state)
    for x, y in document:
        _, state = session.run([self.train_step, self.next_state],
                               feed_dict={self.x: x, self.y: y, self.state: state})
x and y are batches of training data in a document. The idea is that I pass the latest state along after each batch, except when I start a new document, when I zero out the state by running self.reset_state.
This all works. Now I want to change my RNN to use the recommended state_is_tuple=True. However, I don't know how to pass the more complicated LSTM state object via a feed dictionary. Also I don't know what arguments to pass to the self.state = tf.placeholder(...) line in my constructor.
What is the correct strategy here? There still isn't much example code or documentation for dynamic_rnn available.
TensorFlow issues 2695 and 2838 appear relevant.
A blog post on WILDML addresses these issues but doesn't directly spell out the answer.
See also TensorFlow: Remember LSTM state for next batch (stateful LSTM).
One problem with a Tensorflow placeholder is that you can only feed it with a Python list or Numpy array (I think). So you can't save the state between runs in tuples of LSTMStateTuple.
I solved this by saving the state in a tensor like this
initial_state = np.zeros((num_layers, 2, batch_size, state_size))
You have two components in an LSTM layer, the cell state and the hidden state; that's where the "2" comes from. (This article is great: https://arxiv.org/pdf/1506.00019.pdf)
When building the graph you unpack and create the tuple state like this:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you get the new state the usual way
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, series_batch_input, initial_state=rnn_tuple_state)
It shouldn't be like this... perhaps they are working on a solution.
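A sketch of how that placeholder is then used at run time (untested; batches is a hypothetical iterator over input batches, sess is an open session, and the other names come from the snippets above):
# zero state at the start of a new sequence
_current_state = np.zeros((num_layers, 2, batch_size, state_size))
for x_batch in batches:
    _outputs, _next_state = sess.run(
        [outputs, state],
        feed_dict={series_batch_input: x_batch,
                   state_placeholder: _current_state})
    # the returned tuple of LSTMStateTuples has the same layout, so it can be
    # packed back into a (num_layers, 2, batch_size, state_size) array
    _current_state = np.asarray(_next_state)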
A simple way to feed in an RNN state is to feed both components of the state tuple individually.
# Constructing the graph
self.state = rnn_cell.zero_state(...)
self.output, self.next_state = tf.nn.dynamic_rnn(
    rnn_cell,
    self.input,
    initial_state=self.state)

# Running with initial state
output, state = sess.run([self.output, self.next_state], feed_dict={
    self.input: input
})

# Running with subsequent state:
output, state = sess.run([self.output, self.next_state], feed_dict={
    self.input: input,
    self.state[0]: state[0],
    self.state[1]: state[1]
})
