Tensorflow - Train only a subset of embedding matrix - python

I have an embedding matrix e defined as follows
e = tf.get_variable(name="embedding", shape=[n_e, d],
where n_e refers to the number of entities and d is the number of latent dimensions. For this example, say d=10.
optimizer = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
The model is saved after training.
At some point later, new entities (e.g., 2) are added, giving n_e_new entities in total. Now I would like to retrain the model while retaining the embeddings of the already trained entities, i.e., training only the delta (the 2 new entities).
I load the saved e and use it to initialize the enlarged matrix:
init_e = np.zeros((n_e_new, d), dtype=np.float32)
r = list(range(n_e_new - 2))
init_e[r, :] = # load e from saved model
e = tf.get_variable(name="embedding", initializer=init_e)
gather_e = tf.nn.embedding_lookup(e, [n_e, n_e+1])
optimizer = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss, gather_e)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
I get an error at compute_gradients:
NotImplementedError: ('Trying to optimize unsupported type ', )
I understand that the second parameter gather_e to compute_gradients is not a variable, but I cannot figure out how to achieve this partial training/update.
P.S. I also had a look at this post, but could not find a solution there either.
Code sample (as per the approach suggested by @meruf):
if new_data_available:
    e = tf.get_variable(name="embedding", shape=[n_e_new, 1, d],
    e_old = tf.get_variable(name="embedding_old", initializer=<load e from saved model>, trainable=False)
    e_new = tf.concat([e_old, e], 0)
else:
    e = tf.get_variable(name="embedding", shape=[n_e, d],
Lookup is as follows:
if new_data_available:
    var_p = tf.nn.embedding_lookup(e_new, indices)
else:
    var_p = tf.nn.embedding_lookup(e, indices)
loss = # some operations on var_p and other variables that are a result of the lookup above
The issue is that when new_data_available is true, neither e nor e_new changes during training; both keep the same values after every epoch.

You should not change the code at the optimizer level. You can simply tell TensorFlow which variables are trainable and which are not.
Let's take a look at the definition of tf.get_variable().
Its trainable parameter controls whether the variable is trained. When you do not want to train a variable, set it to False.
For your case, create two sets of variables: one with trainable=True and the other with trainable=False.
Assume you have 100 pretrained embeddings and 10 new embeddings to train. Load the pretrained embeddings into A (not trainable) and put the new ones in B (trainable).
For implementation details, take a look at the tf.cond function, which makes decisions at run time. You will need it mostly for the lookup, because the new B embeddings are indexed from 0, whereas in your dataset or program you have probably numbered them starting right after the pretrained embeddings. So in TensorFlow you can decide at run time:
if index_number >= number of pretrained embeddings:
    index_number = index_number - number of pretrained embeddings
    look up in the B matrix
else:
    look up in the A matrix
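The index remapping described above can be sketched in plain Python, without TensorFlow. This is only an illustration under stated assumptions: table_a, table_b, lookup and sgd_update are hypothetical names, the tables are ordinary nested lists, and the "gradient" is supplied by hand rather than computed from a loss.

```python
def lookup(index, table_a, table_b):
    """Route a global entity index to the frozen table A or the new table B.

    Returns (table_name, local_index, row).
    """
    n_pretrained = len(table_a)
    if index >= n_pretrained:
        local = index - n_pretrained   # B is indexed from 0
        return "B", local, table_b[local]
    return "A", index, table_a[index]


def sgd_update(index, grad, lr, table_a, table_b):
    """Apply one SGD step, but only if the row lives in the trainable table B."""
    name, local, row = lookup(index, table_a, table_b)
    if name == "B":
        table_b[local] = [w - lr * g for w, g in zip(row, grad)]


# Hypothetical data: 3 pretrained (frozen) rows and 2 new (trainable) rows.
table_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
table_b = [[0.0, 0.0], [0.0, 0.0]]

sgd_update(4, [1.0, -1.0], 0.5, table_a, table_b)  # global index 4 -> B[1], updated
sgd_update(0, [1.0, -1.0], 0.5, table_a, table_b)  # global index 0 -> A[0], left alone
print(table_a)
print(table_b)
```

In the TensorFlow version the same routing is done inside the graph, and the trainable=False flag on A plays the role of the "left alone" branch.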
An IPython notebook of the example (slightly different from the example given here).
Let's look at an example of what I mean.
First, load the library:
import tensorflow as tf
declare the placeholders
y_ = tf.placeholder(tf.float32, [None, 2])
x = tf.placeholder(tf.int32, [None])
z = tf.placeholder(tf.bool, []) # is the example in the x contains new data or not
create the network
e = tf.get_variable(name="embedding", shape=[5,10],initializer=tf.contrib.layers.xavier_initializer(uniform=False))
e_old = tf.get_variable(name="embedding1", shape=[5,10],initializer=tf.contrib.layers.xavier_initializer(uniform=False),trainable=False)
out = tf.cond(z,lambda : e, lambda : e_old)
lookup = tf.nn.embedding_lookup(out,x)
W = tf.get_variable(name="weight", shape=[10,2],initializer=tf.contrib.layers.xavier_initializer(uniform=False))
l = tf.nn.relu(tf.matmul(lookup,W))
y = tf.nn.softmax(l)
calculate loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
optimize loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
load and run the graph
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()  # variables must be initialized before they are read
print the initialized value
We print the values so that we can check later whether they change.
e_out_tf,e_out_old_tf = sess.run([e,e_old])
print("New Data ", e_out_tf)
print("Old Data", e_out_old_tf)
('New Data ', array([[-0.38952214, -0.37217963, 0.11370762, -0.13024905, 0.11420489,
-0.09138191, 0.13781562, -0.1624797 , -0.27410012, -0.5404499 ],
[-0.0065698 , 0.04728106, 0.53637034, -0.13864517, -0.36171854,
0.40325132, 0.7172644 , -0.28067762, -0.0258827 , -0.5615116 ],
[-0.17240004, 0.3765518 , 0.4658525 , 0.16545495, -0.37515178,
-0.39557686, -0.50662124, -0.06570222, -0.3605038 , 0.13746035],
[ 0.19647208, -0.16588202, 0.5739292 , 0.43803877, -0.05350745,
0.71350956, 0.39937392, -0.45939735, 0.09050641, -0.18077391],
[-0.05588558, 0.7295865 , 0.42288807, 0.57227516, 0.7268311 ,
-0.1194113 , 0.28589466, 0.09422033, -0.10094754, 0.3942643 ]],
('Old Data', array([[ 0.5308224 , -0.14003026, -0.7685277 , 0.06644323, -0.02585996,
-0.1713268 , 0.04987739, 0.01220775, 0.33571896, 0.19891626],
[ 0.3288728 , -0.09298109, 0.14795913, 0.21343362, 0.14123142,
-0.19770677, 0.7366793 , 0.38711038, 0.37526497, 0.440099 ],
[-0.29200613, 0.4852043 , 0.55407804, -0.13675605, -0.2815263 ,
-0.00703347, 0.31396288, -0.7152872 , 0.0844975 , 0.4210107 ],
[ 0.5046112 , 0.3085646 , 0.19497707, -0.5193338 , -0.0429871 ,
-0.5231836 , -0.38976955, -0.2300536 , -0.00906788, -0.1689194 ],
[-0.1231837 , 0.54029703, 0.45702592, -0.07886257, -0.6420077 ,
-0.24090563, -0.02165782, -0.44103763, -0.20914222, 0.40911582]],
test case
Now we will test whether
1. the non-trainable variable changes, and
2. the trainable variable changes.
We declared an additional placeholder z to indicate whether the input contains new data or old data.
Here, index 0 refers to new, trainable data when z is True.
feed_dict={x: [0],z:True}
lookup_tf = sess.run([lookup], feed_dict=feed_dict)
Check that the value matches the one printed above.
[array([[-0.38952214, -0.37217963, 0.11370762, -0.13024905, 0.11420489,
-0.09138191, 0.13781562, -0.1624797 , -0.27410012, -0.5404499 ]],
We send z=True to indicate which embedding matrix to look up in.
So when you send a batch, make sure that it contains either only old data or only new data.
feed_dict={x: [0], y_: [[0,1]], z:True}
_, = sess.run([train_step], feed_dict=feed_dict)
lookup_tf = sess.run([lookup], feed_dict=feed_dict)
After training, let's check whether it behaves as expected.
[array([[-0.559212 , -0.362611 , 0.06011545, -0.02056453, 0.26133284,
-0.24933788, 0.18598196, -0.00602196, -0.12775017, -0.6666256 ]],
Index 0 refers to new, trainable data, and its value has changed from the previous one because of the SGD update.
Now let's try the opposite:
feed_dict={x: [0], y_: [[0,1]], z:False}
lookup_tf = sess.run([lookup], feed_dict=feed_dict)
_, = sess.run([train_step], feed_dict=feed_dict)
lookup_tf = sess.run([lookup], feed_dict=feed_dict)
[array([[ 0.5308224 , -0.14003026, -0.7685277 , 0.06644323, -0.02585996,
-0.1713268 , 0.04987739, 0.01220775, 0.33571896, 0.19891626]],
[array([[ 0.5308224 , -0.14003026, -0.7685277 , 0.06644323, -0.02585996,
-0.1713268 , 0.04987739, 0.01220775, 0.33571896, 0.19891626]],


How to get prediction scores between 0 and 1 (or -1 and 1)?

I am training a model that adds a couple of layers to the predefined VGGish network (see GitHub repo), so that it can predict the class of input log-mel spectrograms extracted from audio files (full code at the bottom).
I generate X_train, X_test, y_train, y_test sets from a previous function first and then run the main() codeblock. This predicts the classes of the X_test at line 78 and prints these:
predictions_sigm = logits.eval(feed_dict = {features_input:X_test})
[[ -9.074987 8.840093 -8.426974 ]
[ -9.376444 9.13514 -8.79967 ]
[-10.03653 -7.725624 7.2162223]
[ -9.650997 9.308293 -8.9559 ]
[ 7.789041 -7.8485446 -9.8974285]
[ 7.7869387 -7.850354 -9.899081 ]
[-10.4985485 -8.368322 7.558868 ]
[-10.306433 -8.043555 7.4093537]
[ 7.787068 -7.850254 -9.898217 ]
[ 7.789579 -7.851698 -9.90515 ]
[ 7.787512 -7.8483863 -9.90212 ]
[ -9.28933 9.058059 -8.713937 ]
[ 7.7886 -7.8486743 -9.901876 ]
[ 7.7899137 -7.8464875 -9.899316 ]
[-10.434939 -8.171508 7.459009 ]
[-10.714449 -8.394194 7.642472 ]
[-10.564347 -8.165948 7.6475844]
[ -9.63355 9.158067 -8.794765 ]
[ -9.501944 9.241178 -8.889491 ]]
My main question is how to get the array to print like this instead, with 0s and 1s for each prediction (or values between -1 and 1, which I can then convert to 0s and 1s):
[[ 0 1 0 ]
[ 0 1 0 ]
[ 0 0 1]
[ 0 1 0 ]]
I thought this could be done by using predictions_sigm = prediction.eval(...) on this line (78) instead of predictions_sigm = logits.eval(...), since line 27, tf.sigmoid(logits, name='prediction'), appears to name a 'prediction' op that applies a sigmoid, but this gives 'NameError: name 'prediction' is not defined'.
If the output is a range of values instead, either -11 to 10 or -1 to 0, are those values useful for something?
Full code:
#run using:
#python vggish_train_demo.py --num_batches 100
batch_size = 10
def main(X):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Define VGGish.
        embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)
        # Define a shallow classification model and associated training ops on top
        # of VGGish.
        with tf.variable_scope('mymodel'):
            # Add a fully connected layer with 100 units. Add an activation function
            # to the embeddings since they are pre-activation.
            num_units = 100
            fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)
            # Add a classifier layer at the end, consisting of parallel logistic
            # classifiers, one per class. This allows for multi-class tasks.
            logits = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='logits')
            tf.sigmoid(logits, name='prediction')
            # Add training ops.
            with tf.variable_scope('train'):
                global_step = tf.train.create_global_step()
                # Labels are assumed to be fed as a batch multi-hot vectors, with
                # a 1 in the position of each positive class label, and 0 elsewhere.
                labels_input = tf.placeholder(
                    tf.float32, shape=(None, _NUM_CLASSES), name='labels')
                # Cross-entropy label loss.
                xent = tf.nn.sigmoid_cross_entropy_with_logits(
                    logits=logits, labels=labels_input, name='xent')
                loss = tf.reduce_mean(xent, name='loss_op')
                tf.summary.scalar('loss', loss)
                # We use the same optimizer and hyperparameters as used to train VGGish.
                optimizer = tf.train.AdamOptimizer(
                train_op = optimizer.minimize(loss, global_step=global_step)
        # Initialize all variables in the model, and then load the pre-trained
        # VGGish checkpoint.
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
        # The training loop.
        features_input = sess.graph.get_tensor_by_name(
        for epoch in range(FLAGS.num_batches):
            epoch_loss = 0
            while i < len(X_train):
                start = i
                end = i+batch_size
                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train[start:end])
                _, c = sess.run([train_op, loss], feed_dict={features_input: batch_x, labels_input: batch_y})
                epoch_loss += c
            print('Epoch', epoch+1, 'completed out of',FLAGS.num_batches,', loss:',epoch_loss)
        correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
        print('Accuracy:',accuracy.eval({features_input:X_test, labels_input:y_test}))
        predictions = logits.eval(feed_dict = {features_input:X_test})
        print(predictions) #shows table of predictions
        #Saves csv file of table of predictions for test data
        time = datetime.now().strftime('%H.%M.%S')
        np.savetxt("test_predictions_"+time+".csv", predictionsm, delimiter=",") #put 'r"r'C:\Users\bw339\...\test_predictions' to save in a different folder
if __name__ == '__main__':
    #think the 'An exception has occurred, use %tb to see the full traceback.' is a jupyter thing, hopefully won't happen
    #when run in conda or bash
Edit for ahmet hamza emra
def main(X):
with tf.Graph().as_default(), tf.Session() as sess:
# Define VGGish.
embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)
#embeddings = vggish_slim.define_vggish_slim(features_tensor= X_train, training=FLAGS.train_vggish) #gives an error that arrays are not right type. no idea why as the shpae of X[0] matches what vggish_slim_define() asks for
#prediction = vggish_slim.define_vggish_slim(X)
# Define a shallow classification model and associated training ops on top
# of VGGish.
with tf.variable_scope('mymodel'):
# Add a fully connected layer with 100 units. Add an activation function
# to the embeddings since they are pre-activation.
num_units = 100
fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)
# Add a classifier layer at the end, consisting of parallel logistic
# classifiers, one per class. This allows for multi-class tasks.
#logits = slim.fully_connected( ### logits threw me, would be easier to name this 'end model' or something
# fc, _NUM_CLASSES, activation_fn=None, scope='logits')
#tf.sigmoid(logits, name='prediction')
linear_out= slim.fully_connected(
fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
logits = tf.sigmoid(logits, name='logits')
# Add training ops.
with tf.variable_scope('train'):
global_step = tf.train.create_global_step()
# Labels are assumed to be fed as a batch multi-hot vectors, with
# a 1 in the position of each positive class label, and 0 elsewhere.
labels_input = tf.placeholder(
tf.float32, shape=(None, _NUM_CLASSES), name='labels')
# Cross-entropy label loss.
xent = tf.nn.sigmoid_cross_entropy_with_logits(
logits=logits, labels=labels_input, name='xent') ###=labels is selecting my 'y', logits is like a precursor to predictions?
loss = tf.reduce_mean(xent, name='loss_op')
tf.summary.scalar('loss', loss)
# We use the same optimizer and hyperparameters as used to train VGGish.
optimizer = tf.train.AdamOptimizer(
train_op = optimizer.minimize(loss, global_step=global_step)
# Initialize all variables in the model, and then load the pre-trained
# VGGish checkpoint.
sess.run(tf.global_variables_initializer()) ### this starts the session appaz
vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
# The training loop.
features_input = sess.graph.get_tensor_by_name(
for epoch in range(FLAGS.num_batches):
epoch_loss = 0
while i < len(X_train):
start = i
end = i+batch_size
batch_x = np.array(X_train[start:end])
batch_y = np.array(y_train[start:end])
_, c = sess.run([train_op, loss], feed_dict={features_input: batch_x, labels_input: batch_y})
epoch_loss += c
print('Epoch', epoch+1, 'completed out of',FLAGS.num_batches,', loss:',epoch_loss)
#Get accuracy if executed on test data
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1)) #This line returns the max value of each array, which we want o be the same (think the prediction/logits is value given to each class with the highest value being the best match)
accuracy = tf.reduce_mean(tf.cast(correct, 'float')) #changes correct to type: float
print('Accuracy:',accuracy.eval({features_input:X_test, labels_input:y_test})) #TF is smart so just knows to feed it through the model without us seeming to tell it to. .eval() uses the current session which I guess is my model?
#Save predictions for test data
predictions_sigm = logits.eval(feed_dict = {features_input:X_test}) #not really _sigm, change back later
#print(predictions_sigm) #shows table of predictions
test_preds = pd.DataFrame(predictions_sigm, columns = col_names) #converts predictions to df
true_class = np.argmax(y_test, axis = 1) #This saves the true class
test_preds['True class'] = true_class #This adds true class to the df
#Saves csv file of table of predictions for test data. NB. header will not save when using np.text for some reason
time = datetime.now().strftime('%H.%M.%S')
#np.savetxt("test_predictions_"+time+".csv", test_preds.values, delimiter=",") #put 'r"r'C:\Users\bw339\...\test_predictions' to save in a different folder
##Save model
#saver = tf.train.Saver()
#saver.save(sess, 'my-test-model')
if __name__ == '__main__':
#think the 'An exception has occurred, use %tb to see the full traceback.' is a jupyter thing, hopefully won't happen
#when run in conda or bash
You are outputting the linear layer before the sigmoid. Change the code as follows:
# Add a classifier layer at the end, consisting of parallel logistic
# classifiers, one per class. This allows for multi-class tasks.
linear_out = slim.fully_connected(
    fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
logits = tf.sigmoid(linear_out, name='logits')
This ensures that the output values are between 0 and 1.
Note: your evaluation does not account for multi-class classification; argmax returns the index of the largest value, which in your case yields a single output per example.
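As a side note on the NameError in the question: tf.sigmoid(logits, name='prediction') only names the op inside the graph; it does not create a Python variable called prediction unless the result is assigned to one. The conversion from raw scores to 0s and 1s can also be done after the fact. The following is a minimal pure-Python sketch, not the VGGish code itself: the helper names to_multi_hot and to_one_hot and the 0.5 threshold are assumptions made for illustration, and the sample rows are copied from the output printed in the question.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping any real score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_multi_hot(logit_rows, threshold=0.5):
    """Independent per-class decision: 1 where sigmoid(score) >= threshold."""
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in logit_rows]


def to_one_hot(logit_rows):
    """Single-label decision: 1 only at the position of the largest score."""
    result = []
    for row in logit_rows:
        best = max(range(len(row)), key=row.__getitem__)
        result.append([1 if j == best else 0 for j in range(len(row))])
    return result


# Three rows taken from the logits printed in the question.
sample_logits = [
    [-9.074987, 8.840093, -8.426974],
    [-10.03653, -7.725624, 7.2162223],
    [7.789041, -7.8485446, -9.8974285],
]

probabilities = [[sigmoid(v) for v in row] for row in sample_logits]
print(to_multi_hot(sample_logits))
print(to_one_hot(sample_logits))
```

For these rows both conversions agree, because exactly one score per row is strongly positive. They can differ when several classes, or none, cross the threshold, which is the multi-class caveat in the note above.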

Keras LSTM layer output and the output of a numpy LSTM implementation are similar but not same with the same weights and Input

I built a two-layer LSTM model in Keras and compared the output of its first LSTM layer with my own simple Python implementation of an LSTM layer, feeding both the same weights and inputs. The results for the first sequence of a batch are similar but not identical, and from the second sequence onward they deviate more and more.
Below is my keras model:
To inspect the Keras model, I first created an intermediate model that outputs the result of the first LSTM layer. I print intermediate_output[0,0] for the first sequence, intermediate_output[0][1] for the second sequence of the same batch, and intermediate_output[0][127] for the last sequence.
inputs = Input(shape=(128,9))
f2=LSTM((n_hidden), return_sequences=False,name='lstm2')(f1)
model2 = Model(inputs=inputs, outputs=fc)
layer_name = 'lstm1'
intermediate_layer_model = Model(inputs=model2.input,
intermediate_output = intermediate_layer_model.predict(X_single_sequence[0,:,:])
print(intermediate_output[0,0]) # takes input[0][9]
print(intermediate_output[0][1]) # takes input[1][9] and hidden layer output of intermediate_output[0,0]
Re-Implemented first layer of the same model:
I defined an LSTMlayer function that performs the same computation. weightLSTM holds the saved weights, x_t is the same input sequence, and h_t carries the output over to the next sequence. intermediate_out is the function that corresponds to the Keras LSTM layer.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
def LSTMlayer(warr, uarr, barr, x_t, h_tm1, c_tm1):
    # Example inputs and expected shapes:
    #   c_tm1 = np.array([0,0]).reshape(1,2)
    #   h_tm1 = np.array([0,0]).reshape(1,2)
    #   x_t = np.array([1]).reshape(1,1)
    #   warr.shape = (nfeature,hunits*4)
    #   uarr.shape = (hunits,hunits*4)
    #   barr.shape = (hunits*4,)
    s_t = (x_t.dot(warr) + h_tm1.dot(uarr) + barr)
    hunit = uarr.shape[0]
    i = sigmoid(s_t[:,:hunit])
    f = sigmoid(s_t[:,1*hunit:2*hunit])
    _c = np.tanh(s_t[:,2*hunit:3*hunit])
    o = sigmoid(s_t[:,3*hunit:])
    c_t = i*_c + f*c_tm1
    h_t = o*np.tanh(c_t)
    return h_t, c_t
weightLSTM = model2.layers[1].get_weights()
warr,uarr, barr = weightLSTM
def intermediate_out(n,warr,uarr,barr,X_test):
    for i in range(0, n+1):
        if i==0:
            c_tm1 = np.array([0]*hunits, dtype=np.float32).reshape(1,32)
            h_tm1 = np.array([0]*hunits, dtype=np.float32).reshape(1,32)
            h_t,ct = LSTMlayer(warr,uarr, barr,X_test[0][0:1][0:9],h_tm1,c_tm1)
        else:
            h_t,ct = LSTMlayer(warr,uarr, barr,X_test[0][i:i+1][0:9],h_t,ct)
    return h_t
# 1st sequence
ht0 = intermediate_out(0,warr,uarr,barr,X_test)
# 2nd sequence
ht1 = intermediate_out(1,warr,uarr,barr,X_test)
# 128th sequence
ht127 = intermediate_out(127,warr,uarr,barr,X_test)
The outputs of the keras LSTM layer from print(intermediate_output[0,0]) are as follows:
array([-0.05616369, -0.02299516, -0.00801201, 0.03872827, 0.07286803,
-0.0081161 , 0.05235862, -0.02240333, 0.0533984 , -0.08501752,
-0.04866522, 0.00254417, -0.05269946, 0.05809477, -0.08961852,
0.03975506, 0.00334282, -0.02813114, 0.01677909, -0.04411673,
-0.06751891, -0.02771493, -0.03293832, 0.04311397, -0.09430656,
-0.00269871, -0.07775293, -0.11201388, -0.08271968, -0.07464679,
-0.03533605, -0.0112953 ], dtype=float32)
and the outputs of my implementation from print(ht0) are:
array([[-0.05591469, -0.02280132, -0.00797964, 0.03681555, 0.06771626,
-0.00855897, 0.05160453, -0.02309707, 0.05746563, -0.08988875,
-0.05093143, 0.00264367, -0.05087904, 0.06033305, -0.0944235 ,
0.04066657, 0.00344291, -0.02881387, 0.01696692, -0.04101779,
-0.06718517, -0.02798996, -0.0346873 , 0.04402719, -0.10021093,
-0.00276826, -0.08390114, -0.1111543 , -0.08879325, -0.07953986,
-0.03261982, -0.01175724]], dtype=float32)
The outputs from print(intermediate_output[0][1]):
array([-0.13193817, -0.03231169, -0.02096735, 0.07571879, 0.12657365,
0.00067896, 0.09008797, -0.05597101, 0.09581321, -0.1696091 ,
-0.08893952, -0.0352162 , -0.07936387, 0.11100324, -0.19354928,
0.09691346, -0.0057206 , -0.03619875, 0.05680932, -0.08598096,
-0.13047703, -0.06360915, -0.05707538, 0.09686109, -0.18573627,
0.00711019, -0.1934243 , -0.21811798, -0.15629804, -0.17204499,
-0.07108577, 0.01727455], dtype=float32)
array([[-1.34333193e-01, -3.36792655e-02, -2.06091907e-02,
7.15097040e-02, 1.18231244e-01, 7.98894180e-05,
9.03479978e-02, -5.85013032e-02, 1.06357656e-01,
-1.82848617e-01, -9.50253978e-02, -3.67032290e-02,
-7.70251378e-02, 1.16113290e-01, -2.08772928e-01,
9.89214852e-02, -5.82863577e-03, -3.79538871e-02,
6.01535551e-02, -7.99121782e-02, -1.31876275e-01,
-6.66067824e-02, -6.15542643e-02, 9.91254672e-02,
-2.00229391e-01, 7.51443207e-03, -2.13641390e-01,
-2.18286291e-01, -1.70858681e-01, -1.88928470e-01,
-6.49823472e-02, 1.72227081e-02]], dtype=float32)
array([-0.46212202, 0.280646 , 0.514289 , -0.21109435, 0.53513926,
0.20116206, 0.24579187, 0.10773794, -0.6350403 , -0.0052841 ,
-0.15971565, 0.00309152, 0.04909453, 0.29789132, 0.24909772,
0.12323025, 0.15282209, 0.34281147, -0.2948742 , 0.03674917,
-0.22213924, 0.17646286, -0.12948939, 0.06568322, 0.04172657,
-0.28638166, -0.29086435, -0.6872528 , -0.12620741, 0.63395363,
-0.37212485, -0.6649531 ], dtype=float32)
array([[-0.47431907, 0.29702517, 0.5428258 , -0.21381126, 0.6053808 ,
0.22849198, 0.25656056, 0.10378123, -0.6960949 , -0.09966939,
-0.20533416, -0.01677105, 0.02512029, 0.37508538, 0.35703233,
0.14703275, 0.24901289, 0.35873395, -0.32249793, 0.04093777,
-0.20691746, 0.20096642, -0.11741923, 0.06169611, 0.01019177,
-0.33316574, -0.08499744, -0.6748463 , -0.06659956, 0.71961826,
-0.4071832 , -0.6804066 ]], dtype=float32)
The outputs of print(intermediate_output[0,0]) and print(ht0), and of print(intermediate_output[0][1]) and print(ht1), are similar, but the outputs of print(intermediate_output[0][127]) and print(ht127) are not the same, even though both algorithms run on the same GPU.
I read the Keras documentation and, as far as I can tell, I am not doing anything wrong. Please let me know what I am missing here.
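One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis of this model: in Keras 2.x the LSTM layer's recurrent_activation defaulted to hard_sigmoid, a piecewise-linear approximation of the logistic function, while the NumPy code above applies the exact logistic sigmoid to the input, forget and output gates. A small gap per gate would give outputs that are close at the first step and drift apart as the state is carried forward. The sketch below only compares the two activations; the sample pre-activation values are made up.

```python
import math


def sigmoid(x):
    """Exact logistic function, as used in the NumPy re-implementation."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x):
    """Piecewise-linear approximation: 0.2 * x + 0.5, clipped to [0, 1]."""
    return max(0.0, min(1.0, 0.2 * x + 0.5))


# Made-up gate pre-activations, just to show where the two functions differ.
for s in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print("s = {:+.1f}  sigmoid = {:.4f}  hard_sigmoid = {:.4f}".format(
        s, sigmoid(s), hard_sigmoid(s)))

gap_at_one = abs(sigmoid(1.0) - hard_sigmoid(1.0))
```

If this turns out to be the cause, either building the layer with recurrent_activation='sigmoid' or using hard_sigmoid for the i, f and o gates in the NumPy code should bring the two outputs into agreement up to floating-point error.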

Tensorflow minimise with respect to only some elements of a variable

Is it possible to minimise a loss function by changing only some elements of a variable? In other words, if I have a variable X of length 2, how can I minimise my loss function by changing X[0] and keeping X[1] constant?
Hopefully this code I have attempted will describe my problem:
import tensorflow as tf
import tensorflow.contrib.opt as opt
X = tf.Variable([1.0, 2.0])
X0 = tf.Variable([3.0])
Y = tf.constant([2.0, -3.0])
scatter = tf.scatter_update(X, [0], X0)
with tf.control_dependencies([scatter]):
    loss = tf.reduce_sum(tf.squared_difference(X, Y))
opt = opt.ScipyOptimizerInterface(loss, [X0])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    opt.minimize(sess)
    print("X: {}".format(X.eval()))
    print("X0: {}".format(X0.eval()))
which outputs:
INFO:tensorflow:Optimization terminated with:
Objective function value: 26.000000
Number of iterations: 0
Number of functions evaluations: 1
X: [3. 2.]
X0: [3.]
whereas I would like to find the optimal value X0 = 2 and thus X = [2, 2].
Motivation for doing this: I would like to import a trained graph/model and then tweak various elements of some of the variables depending on some new data I have.
You can use this trick to restrict the gradient calculation to one index:
import tensorflow as tf
import tensorflow.contrib.opt as opt
X = tf.Variable([1.0, 2.0])
part_X = tf.scatter_nd([[0]], [X[0]], [2])
X_2 = part_X + tf.stop_gradient(-part_X + X)
Y = tf.constant([2.0, -3.0])
loss = tf.reduce_sum(tf.squared_difference(X_2, Y))
opt = opt.ScipyOptimizerInterface(loss, [X])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    opt.minimize(sess)
    print("X: {}".format(X.eval()))
part_X holds the value you want to change, placed in a vector of the same shape as X that is zero everywhere else. part_X + tf.stop_gradient(-part_X + X) is the same as X in the forward pass, since part_X - part_X is 0. In the backward pass, however, tf.stop_gradient blocks the gradient through the second term, so only the selected element of X receives a gradient.
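To see why the forward value is unchanged while the gradient is masked, here is a tiny pure-Python sketch that mimics the trick with forward-mode "dual numbers". It is an illustration only: the Dual class, stop_gradient and masked below are hypothetical stand-ins written for this example, not TensorFlow APIs, and each element of X is handled as a separate scalar.

```python
class Dual:
    """A value paired with its derivative with respect to the original input."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        return Dual(self.value + other.value, self.grad + other.grad)

    def __neg__(self):
        return Dual(-self.value, -self.grad)


def stop_gradient(d):
    """Keep the forward value but treat it as a constant in the backward pass."""
    return Dual(d.value, 0.0)


def masked(x, trainable):
    """Mimic part_X + stop_gradient(-part_X + X) for a single element of X."""
    # part is the element itself where training is allowed, a constant 0 elsewhere.
    part = x if trainable else Dual(0.0)
    return part + stop_gradient(-part + x)


x0 = Dual(1.0, 1.0)  # X[0], seeded with dX[0]/dX[0] = 1
x1 = Dual(2.0, 1.0)  # X[1], seeded with dX[1]/dX[1] = 1

out0 = masked(x0, trainable=True)
out1 = masked(x1, trainable=False)
print(out0.value, out0.grad)  # same value as X[0], gradient passes through
print(out1.value, out1.grad)  # same value as X[1], gradient is blocked
```

Both outputs keep the original values, so the loss is computed on X as usual, but only the first element keeps a nonzero derivative.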
I'm not sure whether it is possible with the SciPy optimizer interface, but with one of the regular tf.train.Optimizer subclasses you can do something like that by calling compute_gradients first, then masking the gradients, and then calling apply_gradients, instead of calling minimize (which, as the docs say, basically calls the previous two).
import tensorflow as tf
X = tf.Variable([3.0, 2.0])
# Select updatable parameters
X_mask = tf.constant([True, False], dtype=tf.bool)
Y = tf.constant([2.0, -3.0])
loss = tf.reduce_sum(tf.squared_difference(X, Y))
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Get gradients and mask them
((X_grad, _),) = opt.compute_gradients(loss, var_list=[X])
X_grad_masked = X_grad * tf.cast(X_mask, dtype=X_grad.dtype)
# Apply masked gradients
train_step = opt.apply_gradients([(X_grad_masked, X)])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        _, X_val = sess.run([train_step, X])
        print("Step {}: X = {}".format(i, X_val))
    print("Final X = {}".format(X.eval()))
Step 0: X = [ 2.79999995 2. ]
Step 1: X = [ 2.63999987 2. ]
Step 2: X = [ 2.51199985 2. ]
Step 3: X = [ 2.40959978 2. ]
Step 4: X = [ 2.32767987 2. ]
Step 5: X = [ 2.26214385 2. ]
Step 6: X = [ 2.20971513 2. ]
Step 7: X = [ 2.16777205 2. ]
Step 8: X = [ 2.13421774 2. ]
Step 9: X = [ 2.10737419 2. ]
Final X = [ 2.10737419 2. ]
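The numbers above can be checked by hand without TensorFlow. The following is a minimal sketch of the same masked gradient descent in plain Python, assuming the same loss (sum of squared differences), the same mask [True, False], a learning rate of 0.1 and 10 steps; the list-based variables are stand-ins for the tensors in the answer.

```python
x = [3.0, 2.0]           # plays the role of X
y = [2.0, -3.0]          # plays the role of Y
mask = [1.0, 0.0]        # 1.0 where updates are allowed, 0.0 where they are blocked
learning_rate = 0.1

for step in range(10):
    # Gradient of sum((x - y)^2) with respect to x is 2 * (x - y).
    grad = [2.0 * (xi - yi) for xi, yi in zip(x, y)]
    # Zero out the gradient of the frozen element before applying the update.
    masked_grad = [g * m for g, m in zip(grad, mask)]
    x = [xi - learning_rate * g for xi, g in zip(x, masked_grad)]
    print("Step {}: X = {}".format(step, x))

print("Final X = {}".format(x))
```

Each step shrinks the distance between x[0] and its target 2 by a factor of 0.8, so after 10 steps x[0] is 2 + 0.8**10, which agrees with the final value printed above up to float32 rounding, while x[1] stays at exactly 2.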
This should be pretty easy to do by using the var_list parameter of the minimize function.
trainable_var = X[0]
train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss, var_list=[trainable_var])
You should note that by convention all trainable variables are added to the tensorflow default collection GraphKeys.TRAINABLE_VARIABLES, so you can get a list of all trainable variables using:
all_trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
This is just a list of variables which you can manipulate as you see fit and use as the var_list parameter.
As a tangent to your question, if you ever want to take customizing the optimization process a step further, you can also compute the gradients manually using grads = tf.gradients(loss, var_list), manipulate the gradients as you see fit, and then call tf.train.GradientDescentOptimizer(...).apply_gradients(grads_and_vars_as_list_of_tuples). Under the hood, minimize is just doing these two steps for you.
Also note that you are perfectly free to create different optimizers for different collections of variables. You could create an SGD optimizer with learning rate 1e-4 for some variables, and another Adam optimizer with learning rate 1e-2 for another set of variables. Not that there's any specific use case for this, I'm just pointing out the flexibility you now have.
The answer by Oren in the second link below calls a function (defined in the first link) that takes a Boolean hot matrix of the parameters to optimize and the tensor of parameters. It uses stop_gradient and works like a charm for a neural network I developed.
Update only part of the word embedding matrix in Tensorflow

InvalidArgumentError: You must feed a value for placeholder tensor 'ground_truth' with dtype double

I am trying to understand transfer learning with TensorFlow, but I am getting the error stated in the title.
This is my code:
def add_final_training_ops(graph, class_count, final_tensor_name,
"""Adds a new softmax and fully-connected layer for training.
We need to retrain the top layer to identify our new classes, so this function
adds the right operations to the graph, along with some variables to hold the
weights, and then sets up all the gradients for the backward pass.
The set up for the softmax and fully-connected layers is based on:
graph: Container for the existing model's Graph.
class_count: Integer of how many categories of things we're trying to
final_tensor_name: Name string for the new final node that produces results.
ground_truth_tensor_name: Name string of the node we feed ground truth data
bottleneck_tensor1 = graph.get_tensor_by_name(ensure_name_has_port(
bottleneck_tensor = tf.placeholder_with_default(bottleneck_tensor1, shape=[None, 2048])
layer_weights = tf.Variable(
tf.truncated_normal([BOTTLENECK_TENSOR_SIZE, class_count], stddev=0.001),
layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
logits = tf.matmul(bottleneck_tensor, layer_weights,
name='final_matmul') + layer_biases
tf.nn.softmax(logits, name=final_tensor_name)
ground_truth_placeholder = tf.placeholder(tf.float64,
[None, class_count],
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits=logits, labels=ground_truth_placeholder)
cross_entropy_mean = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(
return train_step, cross_entropy_mean
def do_train(sess,X_input, Y_input, X_validation, Y_validation):
ground_truth_tensor_name = 'ground_truth'
mini_batch_size = 10
n_train = X_input.shape[0]
graph = create_graph()
train_step, cross_entropy = add_final_training_ops(
graph, len(classes), FLAGS.final_tensor_name,
init = tf.initialize_all_variables()
evaluation_step = add_evaluation_step(graph, FLAGS.final_tensor_name, ground_truth_tensor_name)
# Get some layers we'll need to access during training.
bottleneck_tensor1 = graph.get_tensor_by_name(ensure_name_has_port(BOTTLENECK_TENSOR_NAME))
bottleneck_tensor = tf.placeholder_with_default(bottleneck_tensor1, shape=[None, 2048])
ground_truth_tensor1 = graph.get_tensor_by_name(ensure_name_has_port(ground_truth_tensor_name))
ground_truth_tensor = tf.placeholder_with_default(ground_truth_tensor1, shape=[None, len(classes)])
epocs = 1
for epoch in range(epocs):
shuffledRange = np.random.permutation(n_train)
y_one_hot_train = encode_one_hot(len(classes), Y_input)
y_one_hot_validation = encode_one_hot(len(classes), Y_validation)
shuffledX = X_input[shuffledRange,:]
shuffledY = y_one_hot_train[shuffledRange]
for Xi, Yi in iterate_mini_batches(shuffledX, shuffledY, mini_batch_size):
print Xi.shape
print type(Xi)
print type(Yi)
print Yi.shape
print Yi.dtype
print Yi[0]
feed_dict={bottleneck_tensor: Xi,
ground_truth_tensor: Yi})
The print statements produce the following output:
(10, 2048)
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
(10, 5)
[ 0. 0. 0. 1. 0.]
I am getting the error at :
sess.run(train_step,feed_dict={bottleneck_tensor: Xi,ground_truth_tensor: Yi})
Can someone tell me why I am facing this error?
The problem is that add_final_training_ops creates a placeholder that you never feed. You might think that the ground_truth_tensor you create in do_train with placeholder_with_default is the same placeholder, but it is not: it is a new one, even though its default value comes from the former.
The easiest fix is probably to return the placeholder from add_final_training_ops and feed that one instead.

How to extract the cell state and hidden state from an RNN model in tensorflow?

I am new to TensorFlow and have difficulties understanding the RNN module. I am trying to extract hidden/cell states from an LSTM.
For my code, I am using the implementation from https://github.com/aymericdamien/TensorFlow-Examples.
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))}
biases = {'out': tf.Variable(tf.random_normal([n_classes]))}
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)
    # Define a lstm cell with tensorflow
    #with tf.variable_scope('RNN'):
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out'], states
pred, states = RNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
Now I want to extract the cell/hidden state for each time step in a prediction. The state is stored in an LSTMStateTuple of the form (c, h), which I can see by running print states. However, calling print states.c.eval() (which, according to the documentation, should give me the values in the tensor states.c) yields an error stating that my variables are not initialized, even though I call it right after predicting something. The code for this is here:
# Launch the graph
with tf.Session() as sess:
    step = 1
    # Keep training until reach max iterations
    for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='RNN'):
        print v.name
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        print states.c.eval()
        # Calculate batch accuracy
        acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
        step += 1
    print "Optimization Finished!"
and the error message is
InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The states are also not visible in tf.all_variables(), only the trained matrix/bias tensors (as described here: Tensorflow: show or save forget gate values in LSTM). I don't want to build the whole LSTM from scratch, though: the states are already in the states variable, and I just need to evaluate them.
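You can simply fetch the values of the states the same way accuracy is fetched: in a sess.run call with a feed_dict.
I guess pred_val, states_val, acc = sess.run([pred, states, accuracy], feed_dict={x: batch_x, y: batch_y}) should work fine. Note that the fetches go in a single list, and the results get new names so that the graph tensors pred and states are not overwritten.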
One comment about your assumption: states holds only the values of the hidden state and the memory cell from the last time step.
outputs contains the hidden state from each time step. With rnn.rnn as used here, outputs is a Python list of n_steps tensors, each of shape [batch_size, n_hidden]; with tf.nn.dynamic_rnn it would be a single tensor of shape [batch_size, seq_len, hidden_size]. So I assume that you want outputs, not states. See the documentation.
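To illustrate why states.c.eval() fails without a feed_dict while a sess.run call with one succeeds, here is a minimal sketch. It assumes a TensorFlow build that ships tensorflow.compat.v1, and it does not build an LSTM: StateTuple is a hypothetical stand-in for LSTMStateTuple, and c, h and mean_h are toy tensors that merely depend on a placeholder the way the real states do.

```python
import collections

import numpy as np
import tensorflow.compat.v1 as tf  # assumption: compat.v1 is available

tf.disable_eager_execution()

# Hypothetical stand-in for LSTMStateTuple(c, h).
StateTuple = collections.namedtuple("StateTuple", ["c", "h"])

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4])
    c = x * 2.0              # toy "cell state" that depends on the input
    h = tf.tanh(c)           # toy "hidden state"
    states = StateTuple(c=c, h=h)
    mean_h = tf.reduce_mean(h)  # plays the role of accuracy

batch = np.ones((3, 4), dtype=np.float32)

with tf.Session(graph=graph) as sess:
    try:
        # No feed_dict: the placeholder x has no value.
        states.c.eval()
        eval_without_feed_ok = True
    except tf.errors.InvalidArgumentError:
        eval_without_feed_ok = False

    # Fetch several tensors in one call; the namedtuple structure is kept.
    states_val, mean_val = sess.run([states, mean_h], feed_dict={x: batch})

    # eval() also works once it is given the same feed_dict.
    c_val = states.c.eval(feed_dict={x: batch})
```

Fetching everything in a single sess.run call also avoids running the forward pass once per fetched tensor.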
I have to disagree with the answer of user3480922. For the code:
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
to extract the hidden state for each time_step in a prediction, you have to use outputs, because outputs holds the hidden state value for every time_step. However, I am not sure whether there is any way to store the cell state values for each time_step as well, because the states tuple provides the cell state values only for the last time_step.
For example, in the following sample with 5 time_steps, outputs[t,:,:] for t = 0,...,4 holds the hidden state values at time_step=t, whereas the h field of the states tuple holds only the hidden state values at time_step=4, which match outputs[4,:,:]. The c field of the states tuple likewise holds the cell state values at time_step=4 only.
outputs = [[[ 0.0589103 -0.06925126 -0.01531546 0.06108122]
[ 0.00861215 0.06067181 0.03790079 -0.04296958]
[ 0.00597713 0.03916606 0.02355802 -0.0277683 ]]
[[ 0.06252582 -0.07336216 -0.01607122 0.05024602]
[ 0.05464711 0.03219429 0.06635305 0.00753127]
[ 0.05385715 0.01259535 0.0524035 0.01696803]]
[[ 0.0853352 -0.06414541 0.02524283 0.05798233]
[ 0.10790729 -0.05008117 0.03003334 0.07391824]
[ 0.10205664 -0.04479517 0.03844892 0.0693808 ]]
[[ 0.10556188 0.0516542 0.09162509 -0.02726674]
[ 0.11425048 -0.00211394 0.06025286 0.03575509]
[ 0.11338984 0.02839304 0.08105748 0.01564003]]
[[ 0.10072514 0.14767936 0.12387902 -0.07391471]
[ 0.10510238 0.06321315 0.08100517 -0.00940042]
[ 0.10553667 0.0984127 0.10094948 -0.02546882]]]
states = LSTMStateTuple(c=array([[ 0.23870754, 0.24315512, 0.20842518, -0.12798975],
[ 0.23749796, 0.10797793, 0.14181322, -0.01695861],
[ 0.2413336 , 0.16692916, 0.17559692, -0.0453596 ]], dtype=float32), h=array([[ 0.10072514, 0.14767936, 0.12387902, -0.07391471],
[ 0.10510238, 0.06321315, 0.08100517, -0.00940042],
[ 0.10553667, 0.0984127 , 0.10094948, -0.02546882]], dtype=float32))
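On the open point of keeping the cell state for every time step: one option is to step the recurrence yourself and record c after each step. The NumPy sketch below only illustrates that idea and the relationship between outputs and the final state; it is not TensorFlow's implementation. The function lstm_unroll, the weight layout and the gate order (i, j, f, o) are assumptions made for the example, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_unroll(x, W, b, n_hidden, forget_bias=1.0):
    """Run a basic LSTM over x and keep h and c for every time step.

    x: (n_steps, batch_size, n_input), time-major like the sample above.
    W: (n_input + n_hidden, 4 * n_hidden), b: (4 * n_hidden,).
    """
    n_steps, batch_size, _ = x.shape
    c = np.zeros((batch_size, n_hidden))
    h = np.zeros((batch_size, n_hidden))
    all_h, all_c = [], []
    for t in range(n_steps):
        gates = np.concatenate([x[t], h], axis=1).dot(W) + b
        i, j, f, o = np.split(gates, 4, axis=1)  # assumed gate order
        c = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(j)
        h = sigmoid(o) * np.tanh(c)
        all_h.append(h)  # what `outputs` holds for step t
        all_c.append(c)  # per-step cell state, not returned by `states`
    return np.stack(all_h), np.stack(all_c), (c, h)


rng = np.random.RandomState(0)
n_steps, batch_size, n_input, n_hidden = 5, 3, 2, 4
x = rng.randn(n_steps, batch_size, n_input)
W = rng.randn(n_input + n_hidden, 4 * n_hidden) * 0.1
b = np.zeros(4 * n_hidden)

outputs, cell_states, (c_last, h_last) = lstm_unroll(x, W, b, n_hidden)
```

Here outputs[-1] equals h_last and cell_states[-1] equals c_last, mirroring the sample above, while cell_states[t] gives the cell state at the earlier steps as well.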
