Let's suppose we have a neural net with three layers: Inputs > Hidden > Outputs, and consider that the weights between the Hidden and Outputs layers are W and b, where W is a matrix of shape (N, M). By default, all components of W and b are set as trainable in Keras. I know how to set the entire W or b as non-trainable, as in the link below:
How to set parameters in keras to be non-trainable?
What I want is to be able to set only a specific component of W (for example) as non-trainable. For instance, if:
W = [[W11, W12],
     [W21, W22]]
which can be rewritten as:
W = [W1, W2] with W1 = [W11, W12] and W2 = [W21, W22],
and both W1 and W2 are of type tf.Variable,
how can I set, for instance, W1 as non-trainable?
I looked at some other topics, but none of them helped me get what I want. Some example links are below:
Link 1 : https://keras.io/guides/transfer_learning/
Link 2 : https://github.com/tensorflow/tensorflow/issues/47597
Can anyone help me to solve this?
Thank you in advance
The tensor W is stored as a single tf.Variable (not four variables w11, w12, w21, w22), and tf.Variable.trainable controls entire tensors, not sub-tensors. Worse yet, inside a Keras layer, all variables have the same trainable attribute, because they are controlled by the tf.keras.layers.Layer.trainable attribute.
To do what you want, you'd need two variables W1 and W2, each wrapped in a different instance of a layer. You'd apply each layer to the input, each producing half of the output, then concatenate the two halves to get the complete output.
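A minimal sketch of the same idea inside a single custom layer (the layer name PartiallyFrozenDense and the split along the input dimension are illustrative assumptions, not the only way to do it): two weight pieces are created with add_weight, one trainable and one not, and concatenated back into the full W before the matmul.

import tensorflow as tf

class PartiallyFrozenDense(tf.keras.layers.Layer):
    """Hypothetical layer: the first rows of W (W1) are trainable, the rest (W2) are frozen."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        in_dim = input_shape[-1]
        # W1: trainable part, W2: non-trainable part (split along the input dimension)
        self.w1 = self.add_weight(shape=(in_dim // 2, self.units),
                                  initializer="random_normal", trainable=True)
        self.w2 = self.add_weight(shape=(in_dim - in_dim // 2, self.units),
                                  initializer="random_normal", trainable=False)
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        w = tf.concat([self.w1, self.w2], axis=0)  # reassemble the full W
        return tf.matmul(inputs, w) + self.b

Only w1 (and b) show up in the layer's trainable_weights, so only they receive gradient updates; w2 is still used in the forward pass but stays fixed.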
You can create your own layers in keras. This will help you to customize your weights within your layers, e.g., whether they are trainable or not.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress TensorFlow messages

import tensorflow as tf
from keras.layers import *
from keras.models import *

# Your custom layer
class Linear(Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=False
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
In Linear, the weights w are trainable and the bias b is not. Here, I am creating a training loop on dummy data to visualize the weight updates.
batch_size = 10
input_shape = (batch_size, 5, 5)

## model
model = Sequential()
model.add(Input(shape=input_shape))
model.add(Linear(units=4, name='my_linear_layer'))
model.add(Dense(1))

## dummy dataset
x = tf.random.normal(input_shape)  # dummy input
y = tf.ones((batch_size, 1))       # dummy output

## loss functions and optimizer
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)

### training loop
epochs = 3
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    tf.print(model.get_layer('my_linear_layer').get_weights())

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
This loop returns the following result,
Start of epoch 0
[array([[ 0.08920084, -0.04294993, 0.06111819, 0.08334437],
[-0.0369432 , -0.05014499, 0.0305218 , -0.07486793],
[-0.01227043, 0.09460627, -0.0560123 , 0.01324316],
[-0.00255878, 0.00214959, -0.02924518, 0.04721532],
[-0.05532415, -0.02014978, -0.06785563, -0.07330619]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
Start of epoch 1
[array([[ 0.08961578, -0.04327399, 0.06152926, 0.08325274],
[-0.03829437, -0.04908974, 0.02918325, -0.07456956],
[-0.01417133, 0.09609085, -0.05789544, 0.01366292],
[-0.00236284, 0.00199657, -0.02905108, 0.04717206],
[-0.05536905, -0.02011472, -0.06790011, -0.07329627]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
Start of epoch 2
[array([[ 0.09001605, -0.04358549, 0.06192534, 0.08316355],
[-0.03960795, -0.04806747, 0.02788337, -0.07427685],
[-0.01599812, 0.09751251, -0.05970317, 0.01406999],
[-0.00217021, 0.00184666, -0.02886046, 0.04712913],
[-0.05540781, -0.02008455, -0.06793848, -0.07328764]],
dtype=float32),
array([ 0.02154647, 0.05153348, -0.00128291, -0.06794706], dtype=float32)]
As you can see, while the weights w are updating, the bias b stays constant.
I'm trying to solve a similar problem at the moment. What you would need to do is first use the functional API of Keras. Then put all the weights that you want to be trainable into one layer and all the weights you want to be non-trainable into another layer. Have the previous layer feed into both of these layers. Then you can use the TensorFlow Concatenate layer to combine the two layers back together. So say you had a hidden layer with 5 neurons, 3 of which you wanted to be trainable and 2 of which you wanted to be non-trainable:
X = Dense(5, activation='relu')(X)  # previous layer
Y = Dense(3, activation='relu', name='trainable_layer')(X)
non_trainable_layer = Dense(2, activation='relu', name='non_trainable_layer')
non_trainable_layer.trainable = False  # freeze the layer object, not its output tensor
Z = non_trainable_layer(X)
X = Concatenate()([Y, Z])
X = Dense(5, activation='relu')(X)  # layer after the split, with mixed trainable weights feeding in
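To sanity-check the split, assuming the tensors above are wrapped into a Model (here `inputs` stands for the original Input tensor, which is not shown in the snippet):

model = Model(inputs=inputs, outputs=X)
print([w.name for w in model.trainable_weights])      # includes trainable_layer, excludes non_trainable_layer
print([w.name for w in model.non_trainable_weights])  # the frozen non_trainable_layer weights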
I am testing some TensorFlow code; I'm seeing this error:
AttributeError: module 'tensorflow' has no attribute 'variable_scope'
I am running TensorFlow version 2.1.0.
Here is the code that I am testing.
# imports
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Input data:
# For this tutorial we use the MNIST dataset. MNIST is a dataset of handwritten digits. If you are
# into machine learning, you might have heard of this dataset by now. MNIST is a kind of benchmark
# dataset for deep learning. One other reason that we use MNIST is that it is easily accessible
# through TensorFlow. If you want to know more about the MNIST dataset you can check Yann LeCun's
# website. We can easily import the dataset and see the size of the training, test and validation sets:
# Import MNIST data
# from tensorflow.examples.tutorials.mnist import input_data
#import tensorflow_datasets as tfds
# Construct a tf.data.Dataset
#mnist = tfds.load(name="mnist", split=tfds.Split.TRAIN)
mnist = tf.keras.datasets.mnist
#mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
#print("Size of:")
#print("- Training-set:\t\t{}".format(len(mnist.train.labels)))
#print("- Test-set:\t\t{}".format(len(mnist.test.labels)))
#print("- Validation-set:\t{}".format(len(mnist.validation.labels)))
# hyper-parameters
logs_path = "C:/Users/ryans/MNIST_data/logs/embedding/" # path to the folder that we want to save the logs for Tensorboard
learning_rate = 0.001 # The optimization learning rate
epochs = 10 # Total number of training epochs
batch_size = 100 # Training batch size
display_freq = 100 # Frequency of displaying the training results
# Network Parameters
# We know that MNIST images are 28 pixels in each dimension.
img_h = img_w = 28
# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_h * img_w
# Number of classes, one class for each of 10 digits.
n_classes = 10
# number of units in the first hidden layer
h1 = 200
# Graph:
# Like before, we start by constructing the graph. But, we need to define some functions that we need rapidly in our code.
# weight and bais wrappers
def weight_variable(name, shape):
    """
    Create a weight variable with appropriate initialization
    :param name: weight name
    :param shape: weight shape
    :return: initialized weight variable
    """
    initer = tf.truncated_normal_initializer(stddev=0.01)
    return tf.get_variable('W_' + name,
                           dtype=tf.float32,
                           shape=shape,
                           initializer=initer)

def bias_variable(name, shape):
    """
    Create a bias variable with appropriate initialization
    :param name: bias variable name
    :param shape: bias variable shape
    :return: initialized bias variable
    """
    initial = tf.constant(0., shape=shape, dtype=tf.float32)
    return tf.get_variable('b_' + name,
                           dtype=tf.float32,
                           initializer=initial)

def fc_layer(x, num_units, name, use_relu=True):
    """
    Create a fully-connected layer
    :param x: input from previous layer
    :param num_units: number of hidden units in the fully-connected layer
    :param name: layer name
    :param use_relu: boolean to add ReLU non-linearity (or not)
    :return: The output array
    """
    with tf.variable_scope(name):
        in_dim = x.get_shape()[1]
        W = weight_variable(name, shape=[in_dim, num_units])
        tf.summary.histogram('W', W)
        b = bias_variable(name, [num_units])
        tf.summary.histogram('b', b)
        layer = tf.matmul(x, W)
        layer += b
        if use_relu:
            layer = tf.nn.relu(layer)
        return layer
# Now that we have our helper functions we can create our graph.
# Create graph
# Placeholders for inputs (x), outputs(y)
with tf.compat.v1.variable_scope('Input'):
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, img_size_flat], name='X')
    tf.summary.image('input_image', tf.reshape(x, (-1, img_w, img_h, 1)), max_outputs=5)
    y = tf.compat.v1.placeholder(tf.float32, shape=[None, n_classes], name='Y')

fc1 = fc_layer(x, h1, 'Hidden_layer', use_relu=True)
output_logits = fc_layer(fc1, n_classes, 'Output_layer', use_relu=False)

# Define the loss function, optimizer, and accuracy
with tf.compat.v1.variable_scope('Train'):
    with tf.compat.v1.variable_scope('Loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_logits), name='loss')
        tf.summary.scalar('loss', loss)
    with tf.compat.v1.variable_scope('Optimizer'):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name='Adam-op').minimize(loss)
    with tf.compat.v1.variable_scope('Accuracy'):
        correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')
        tf.summary.scalar('accuracy', accuracy)

# Network predictions
cls_prediction = tf.argmax(output_logits, axis=1, name='predictions')
# Initializing the variables
init = tf.global_variables_initializer()
merged = tf.summary.merge_all()
# Session:
# Launch the graph (session)
sess = tf.InteractiveSession() # using InteractiveSession instead of Session to test network in separate cell
sess.run(init)
train_writer = tf.summary.FileWriter(logs_path, sess.graph)
num_tr_iter = int(mnist.train.num_examples / batch_size)
global_step = 0
for epoch in range(epochs):
    print('Training epoch: {}'.format(epoch + 1))
    for iteration in range(num_tr_iter):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        global_step += 1
        # Run optimization op (backprop)
        feed_dict_batch = {x: batch_x, y: batch_y}
        _, summary_tr = sess.run([optimizer, merged], feed_dict=feed_dict_batch)
        train_writer.add_summary(summary_tr, global_step)
        if iteration % display_freq == 0:
            # Calculate and display the batch loss and accuracy
            loss_batch, acc_batch = sess.run([loss, accuracy],
                                             feed_dict=feed_dict_batch)
            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
                  format(iteration, loss_batch, acc_batch))

    # Run validation after every epoch
    feed_dict_valid = {x: mnist.validation.images, y: mnist.validation.labels}
    loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
    print('---------------------------------------------------------')
    print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".
          format(epoch + 1, loss_valid, acc_valid))
    print('---------------------------------------------------------')
I think the code was designed for an earlier version of TensorFlow. I made a few small modifications to get the code to run on my laptop. Here's the part that I am struggling with.
# Placeholders for inputs (x), outputs (y)
with tf.compat.v1.variable_scope('Input'):
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, img_size_flat], name='X')
    tf.summary.image('input_image', tf.reshape(x, (-1, img_w, img_h, 1)), max_outputs=5)
    y = tf.compat.v1.placeholder(tf.float32, shape=[None, n_classes], name='Y')

fc1 = fc_layer(x, h1, 'Hidden_layer', use_relu=True)
output_logits = fc_layer(fc1, n_classes, 'Output_layer', use_relu=False)
The 'with' statement runs, but I am getting an error on this line:
fc1 = fc_layer(x, h1, 'Hidden_layer', use_relu=True)
I thought the change to 'tf.compat.v1' would overcome the issue of different TensorFlow versions, but apparently not.
I found the code sample here.
https://www.easy-tensorflow.com/tf-tutorials/tensorboard/tb-embedding-visualization
As placeholder was removed from TensorFlow 2.0, compat.v1 must be used. The error you are seeing comes from the remaining 1.x calls such as tf.variable_scope, tf.get_variable and tf.truncated_normal_initializer inside your helper functions; they also need the tf.compat.v1 prefix. Another incompatibility is eager execution, which can be solved by calling tf.compat.v1.disable_eager_execution() before the first with tf.compat.v1.variable_scope(...): block.
You can turn eager execution back on later by calling tf.compat.v1.enable_eager_execution().
You may also check https://www.tensorflow.org/guide/migrate
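A minimal sketch of how the question's helper functions could be adapted under those assumptions (TF 2.x with the compat.v1 API; this only migrates the pieces shown and keeps the question's names, it is not the full tutorial migration):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # graph-mode placeholders/sessions need this in TF 2.x

def weight_variable(name, shape):
    # tf.truncated_normal_initializer and tf.get_variable live under tf.compat.v1 in TF 2.x
    initer = tf.compat.v1.truncated_normal_initializer(stddev=0.01)
    return tf.compat.v1.get_variable('W_' + name,
                                     dtype=tf.float32,
                                     shape=shape,
                                     initializer=initer)

def fc_layer(x, num_units, name, use_relu=True):
    # tf.variable_scope is also only available under tf.compat.v1
    with tf.compat.v1.variable_scope(name):
        in_dim = x.get_shape()[1]
        W = weight_variable(name, shape=[in_dim, num_units])
        b = tf.compat.v1.get_variable('b_' + name, dtype=tf.float32,
                                      initializer=tf.constant(0., shape=[num_units], dtype=tf.float32))
        layer = tf.matmul(x, W) + b
        return tf.nn.relu(layer) if use_relu else layer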
I have to build a binary classifier to predict whether the input video contains an action or not.
The input to the model will be of shape: [batch, frames, height, width, channel]
Here, batch is number of videos, frames is number of images in that video (It's fixed for every video), height is number of rows in that image, width is number of columns in that image, and channel is RGB colors.
I found in Andrej Karpathy's blog that a many-to-many Recurrent Neural Network is best suited for this application: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Thus, I need to implement this in TensorFlow:
I learned how to implement LSTM using this tutorial: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py#L52
But it implements a many-to-one LSTM, predicting the output and reducing the loss using only the last tensor, outputs[-1].
I want to predict the output using several tensors (say, 4) and reduce the loss using them.
Here's my implementation:
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
# Training Parameters
batch = 5 # number of examples
frames = time_step_size = 20
height = 60
width = 80
channel = 3
lstm_size = 240
num_classes = 2
# Creating random data
input_x = np.random.normal(size=[batch, frames, height, width, channel])
input_y = np.zeros((batch, num_classes))
B = np.ones(batch)
input_y[:,1] = B
X = tf.placeholder("float", [None, frames, height, width, channel], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')
with tf.name_scope('Model'):
    XR = tf.reshape(X, [-1, height*width*channel])  # shape=(?, 14400)
    X_split3 = tf.split(XR, time_step_size, 0)  # 20 tensors of shape=(?, 14400)
    lstm = rnn.BasicLSTMCell(lstm_size, forget_bias=1.0, state_is_tuple=True)
    outputs, _states = rnn.static_rnn(lstm, X_split3, dtype=tf.float32)  # 20 tensors of shape=(?, 240)
    logits = tf.layers.dense(outputs[-1], num_classes, name='logits')  # shape=(?, 2)
    prediction = tf.nn.softmax(logits)

# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))

with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
    train_op = optimizer.minimize(loss_op)

# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    logits_output = sess.run(logits, feed_dict={X: input_x})
    print(logits_output.shape)  # shape=(5, 2)
    sess.run(train_op, feed_dict={X: input_x, Y: input_y})
    loss, acc = sess.run([loss_op, accuracy], feed_dict={X: input_x, Y: input_y})
    print("Loss: ", loss)  # loss: 1.46626135e-05
    print("Accuracy: ", acc)  # Accuracy: 1.0
Problems:
1. I need help implementing a many-to-many LSTM that predicts an output after a certain number of frames (say, 4), but I am only using the last tensor, outputs[-1], to reduce the loss. There are 20 tensors, one for each frame (time_step_size). If I transform every 5th tensor (outputs[4], outputs[9], outputs[14], outputs[-1]), I will get 4 logits. So how am I going to reduce the loss over all four of them?
2. Another problem is that I have to implement a binary classifier, but I only have videos of the action I want to identify. So input_y is a one-hot representation of the labels in which the 1st column is always 0 and the 2nd column is always 1 (the action I have to identify), and I don't have any example video in which the 1st column's value is 1. Do you think it will work?
3. Why, in the above implementation, is the accuracy 1 after only one iteration?
Thanks
For 1., a Dense layer accepts any number of leading batch dimensions, so you should be able to transform all of the selected steps into logits in one go (then likewise compute the loss for each step in a batched way and aggregate, e.g. by taking the mean), as sketched below.
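A minimal sketch under the question's setup (TF 1.x style, reusing outputs, Y and num_classes from the code above; the selected step indices and the name 'step_logits' are just illustrative assumptions):

selected = tf.stack([outputs[4], outputs[9], outputs[14], outputs[-1]], axis=1)  # shape (?, 4, lstm_size)
step_logits = tf.layers.dense(selected, num_classes, name='step_logits')         # shape (?, 4, num_classes)

# Repeat the single per-video label for every selected step, then average the per-step losses.
step_labels = tf.tile(tf.expand_dims(Y, axis=1), [1, 4, 1])                      # shape (?, 4, num_classes)
per_step_loss = tf.nn.softmax_cross_entropy_with_logits(labels=step_labels,
                                                        logits=step_logits)      # shape (?, 4)
loss_op = tf.reduce_mean(per_step_loss)                                          # mean over steps and batch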
For 2. and 3., it seems like you need to find some negative examples. There's a literature on "positive and unlabeled (PU)" learning and "one-class classification" which may help.
I am following the tutorial at TF Slim. However at
loss = slim.losses.sum_of_squares(predictions, targets)
I seem to be getting AttributeError: 'module' object has no attribute 'sum_of_squares'. I have installed TF version 0.12head (CPU version) running on Ubuntu 16.04. The complete code I am running is as follows:
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time
from datasets import dataset_utils
# Main slim library
slim = tf.contrib.slim
def regression_model(inputs, is_training=True, scope="deep_regression"):
    """Creates the regression model.

    Args:
        inputs: A node that yields a `Tensor` of size [batch_size, dimensions].
        is_training: Whether or not we're currently training the model.
        scope: An optional variable_op scope for the model.

    Returns:
        predictions: 1-D `Tensor` of shape [batch_size] of responses.
        end_points: A dict of end points representing the hidden layers.
    """
    with tf.variable_scope(scope, 'deep_regression', [inputs]):
        end_points = {}
        # Set the default weight regularizer and activation for each fully_connected layer.
        with slim.arg_scope([slim.fully_connected],
                            activation_fn=tf.nn.relu,
                            weights_regularizer=slim.l2_regularizer(0.01)):

            # Creates a fully connected layer from the inputs with 32 hidden units.
            net = slim.fully_connected(inputs, 32, scope='fc1')
            end_points['fc1'] = net

            # Adds a dropout layer to prevent over-fitting.
            net = slim.dropout(net, 0.8, is_training=is_training)

            # Adds another fully connected layer with 16 hidden units.
            net = slim.fully_connected(net, 16, scope='fc2')
            end_points['fc2'] = net

            # Creates a fully-connected layer with a single hidden unit. Note that the
            # layer is made linear by setting activation_fn=None.
            predictions = slim.fully_connected(net, 1, activation_fn=None, scope='prediction')
            end_points['out'] = predictions

        return predictions, end_points
with tf.Graph().as_default():
    # Dummy placeholders for arbitrary number of 1d inputs and outputs
    inputs = tf.placeholder(tf.float32, shape=(None, 1))
    outputs = tf.placeholder(tf.float32, shape=(None, 1))

    # Build model
    predictions, end_points = regression_model(inputs)

    # Print name and shape of each tensor.
    print "Layers"
    for k, v in end_points.iteritems():
        print 'name = {}, shape = {}'.format(v.name, v.get_shape())

    # Print name and shape of parameter nodes (values not yet initialized)
    print "\n"
    print "Parameters"
    for v in slim.get_model_variables():
        print 'name = {}, shape = {}'.format(v.name, v.get_shape())
def produce_batch(batch_size, noise=0.3):
    xs = np.random.random(size=[batch_size, 1]) * 10
    ys = np.sin(xs) + 5 + np.random.normal(size=[batch_size, 1], scale=noise)
    return [xs.astype(np.float32), ys.astype(np.float32)]
x_train, y_train = produce_batch(200)
x_test, y_test = produce_batch(200)
plt.scatter(x_train, y_train)
def convert_data_to_tensors(x, y):
    inputs = tf.constant(x)
    inputs.set_shape([None, 1])
    outputs = tf.constant(y)
    outputs.set_shape([None, 1])
    return inputs, outputs
# The following snippet trains the regression model using a sum_of_squares loss.
ckpt_dir = '/tmp/regression_model/'
with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    inputs, targets = convert_data_to_tensors(x_train, y_train)

    # Make the model.
    predictions, nodes = regression_model(inputs, is_training=True)

    # Add the loss function to the graph.
    loss = slim.losses.sum_of_squares(predictions, targets)

    # The total loss is the user's loss plus any regularization losses.
    total_loss = slim.losses.get_total_loss()

    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.005)
    train_op = slim.learning.create_train_op(total_loss, optimizer)

    # Run the training inside a session.
    final_loss = slim.learning.train(
        train_op,
        logdir=ckpt_dir,
        number_of_steps=5000,
        save_summaries_secs=5,
        log_every_n_steps=500)

    print("Finished training. Last batch loss:", final_loss)
    print("Checkpoint saved in %s" % ckpt_dir)
Apparently it has been removed in a recent build, as can be seen in the GitHub repo. I switched to loss = slim.losses.mean_squared_error(predictions, targets), which should serve the same purpose, I assume.
I am new to TensorFlow and have difficulties understanding the RNN module. I am trying to extract hidden/cell states from an LSTM.
For my code, I am using the implementation from https://github.com/aymericdamien/TensorFlow-Examples.
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))}
biases = {'out': tf.Variable(tf.random_normal([n_classes]))}
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    #with tf.variable_scope('RNN'):
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out'], states
pred, states = RNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
Now I want to extract the cell/hidden state for each time step in a prediction. The state is stored in an LSTMStateTuple of the form (c, h), which I can verify by calling print states. However, calling print states.c.eval() (which according to the documentation should give me the values of the tensor states.c) yields an error stating that my variables are not initialized, even though I am calling it right after predicting something. The code for this is here:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1

    # Keep training until reach max iterations
    for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='RNN'):
        print v.name

    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        print states.c.eval()
        # Calculate batch accuracy
        acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
        step += 1

    print "Optimization Finished!"
and the error message is
InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The states are also not visible in tf.all_variables(); only the trained matrix/bias tensors are (as described here: Tensorflow: show or save forget gate values in LSTM). I don't want to build the whole LSTM from scratch, though, since I already have the states in the states variable; I just need to fetch it.
You may simply collect the values of the states in the same way accuracy is collected.
I guess pred, states, acc = sess.run([pred, states, accuracy], feed_dict={x: batch_x, y: batch_y}) should work perfectly fine (note that the fetches have to be passed to sess.run as a single list).
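A hedged sketch of how that fetch could look inside the question's existing training loop (names as in the question, assuming states is the LSTMStateTuple returned by RNN):

# Fetch the final-step state together with the training op; sess.run on an
# LSTMStateTuple returns a tuple of NumPy arrays with the same structure.
_, state_vals, acc = sess.run([optimizer, states, accuracy],
                              feed_dict={x: batch_x, y: batch_y})
print(state_vals.c)   # memory cell at the last time step, shape (batch_size, n_hidden)
print(state_vals.h)   # hidden state at the last time step, shape (batch_size, n_hidden)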
One comment about your assumption: states only holds the values of the hidden state and the memory cell from the last time step.
The outputs contain the hidden state from each time step you want (with dynamic_rnn the stacked outputs have size [batch_size, seq_len, hidden_size]; with the static rnn.rnn used here, outputs is a list of n_steps tensors of shape [batch_size, hidden_size]). So I assume you want the outputs variable, not states. See the documentation.
I have to disagree with the answer of user3480922. For the code:
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
to extract the hidden state for each time_step in a prediction, you have to use the outputs, because outputs holds the hidden state value for every time_step. However, I am not sure whether there is any way to store the values of the cell state for each time_step as well, because the states tuple provides the cell state values, but only for the last time_step.
For example, in the following sample with 5 time_steps (time_step = 0, ..., 4), outputs[4,:,:] holds the hidden state values for time_step=4, and the h in the states tuple also only holds the hidden state values for time_step=4. The c in the states tuple holds the cell value at time_step=4, though.
outputs = [[[ 0.0589103 -0.06925126 -0.01531546 0.06108122]
[ 0.00861215 0.06067181 0.03790079 -0.04296958]
[ 0.00597713 0.03916606 0.02355802 -0.0277683 ]]
[[ 0.06252582 -0.07336216 -0.01607122 0.05024602]
[ 0.05464711 0.03219429 0.06635305 0.00753127]
[ 0.05385715 0.01259535 0.0524035 0.01696803]]
[[ 0.0853352 -0.06414541 0.02524283 0.05798233]
[ 0.10790729 -0.05008117 0.03003334 0.07391824]
[ 0.10205664 -0.04479517 0.03844892 0.0693808 ]]
[[ 0.10556188 0.0516542 0.09162509 -0.02726674]
[ 0.11425048 -0.00211394 0.06025286 0.03575509]
[ 0.11338984 0.02839304 0.08105748 0.01564003]]
**[[ 0.10072514 0.14767936 0.12387902 -0.07391471]
[ 0.10510238 0.06321315 0.08100517 -0.00940042]
[ 0.10553667 0.0984127 0.10094948 -0.02546882]]**]
states = LSTMStateTuple(c=array([[ 0.23870754, 0.24315512, 0.20842518, -0.12798975],
[ 0.23749796, 0.10797793, 0.14181322, -0.01695861],
[ 0.2413336 , 0.16692916, 0.17559692, -0.0453596 ]], dtype=float32), h=array(**[[ 0.10072514, 0.14767936, 0.12387902, -0.07391471],
[ 0.10510238, 0.06321315, 0.08100517, -0.00940042],
[ 0.10553667, 0.0984127 , 0.10094948, -0.02546882]]**, dtype=float32))
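On the open point about per-step cell states: one hedged sketch (old TF 1.x style, matching the rnn_cell code above) is to step the cell manually instead of calling rnn.rnn, collecting state.c and state.h at every step. This is only an illustration under those assumptions, not the only route (tf.nn.raw_rnn offers another one in later versions).

# Manual unroll: x is the list of n_steps tensors of shape (batch_size, n_input),
# as prepared inside RNN() above; batch_size must be known here.
state = lstm_cell.zero_state(batch_size, tf.float32)
all_h, all_c = [], []
with tf.variable_scope("manual_rnn"):
    for t, x_t in enumerate(x):
        if t > 0:
            tf.get_variable_scope().reuse_variables()  # share the cell weights across time steps
        output, state = lstm_cell(x_t, state)
        all_h.append(state.h)  # hidden state at step t
        all_c.append(state.c)  # cell state at step t

all_h = tf.stack(all_h, axis=1)  # (batch_size, n_steps, n_hidden)
all_c = tf.stack(all_c, axis=1)  # (batch_size, n_steps, n_hidden)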