Thanks for your help. I am coding a multiclass binary classifier for facial actions (such as raised eyebrow, parted lips), and I want to make a confusion matrix. There are 6 facial actions and 593 samples. I'm getting this error: I'm getting this error: "Shape (?, 2, 6) must have rank 2". From documentation, tf.confusion_matrix takes 1-D vectors, but I think there should be a way to shape the input data from the feed_dict so that it works, based on Tensorflow Confusion Matrix in TensorBoard. The labels and predictions look like:
# Rows are samples, columns are classes, and the classes shows a facial
# action which is either 1 for detection or 0 for no detection.
[[0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 1, 1],...]
I'm using a feed-forward MLP and the variable 'pred' is the prediction, with a threshold forcing a choice of 0 or 1. I tried multiplying predictions and labels by np.arange(1,7) to have the positive values match the indices but I got stuck on the shape of the arguments.
There's more code, but I'm showing what I think is relevant.
sess = tf.Session()
x = tf.placeholder(tf.float32, [None, n_input], name = "x")
y = tf.placeholder(tf.float32, [None, n_output], name = "labels")
#2 fully connected layers
fc1 = fc_layer(x, n_input, n_hidden_1, "fc1")
relu = tf.nn.relu(fc1)
tf.summary.histogram("fc1/relu", relu)
logits = fc_layer(relu, n_hidden_1, n_output, "fc2")
# Calculate loss function
with tf.name_scope("xent"):
xent = tf.reduce_mean(
tf.nn.sigmoid_cross_entropy_with_logits(
logits=logits, labels=y, name="xent"))
with tf.name_scope("train"):
train_step = tf.train.AdamOptimizer(learning_rate).minimize(xent)
# Choose between 0 and 1
onesMat = tf.ones_like(logits)
zerosMat = tf.zeros_like(logits)
pred = tf.cast(tf.where(logits>=zero,onesMat,zerosMat),dtype=tf.float32, name = "op_to_restore")
# Problem occurs when I add this line.
confusion = tf.confusion_matrix(predictions = pred*np.arange(1,7), labels = y*np.arange(1,7), num_classes = n_output, name = "confusion")
# Save and visualize results
saver = tf.train.Saver()
init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init)
writer = tf.summary.FileWriter(LOGDIR + hparam + '/train')
writer.add_graph(sess.graph)
# Train
for i in range(2001):
if i % 5 == 0:
[train_accuracy, s] = sess.run([accuracy, summ], feed_dict={x: train_x, y: train_y})
writer.add_summary(s, i)
if i % 50 == 0:
[acc,s] = sess.run([accuracy, summ],feed_dict={x: test_x, y: test_y})
sess.run(train_step, feed_dict={x: train_x, y: train_y})
Thank you!
I had the same problem as yours. I made use of the argmax function which fixed my problem.
Try this piece of code (or similiar):
cm = tf.confusion_matrix(labels=tf.argmax(y*np.arange(1,7), 1), predictions=tf.argmax(pred*np.arange(1,7)))
#then check the result:
with tf.Session() as sess:
cm_reachable = cm.eval()
print(cm_reachable)
And check out this detailed instruction:
Tensorflow confusion matrix using one-hot code
Related
Problem
I'm trying to classify some 64x64 images as a black box exercise. The NN I have written doesn't change my weights. First time writing something like this, the same code, but on MNIST letters input works just fine, but on this code it does not train like it should:
import tensorflow as tf
import numpy as np
path = ""
# x is a holder for the 64x64 image
x = tf.placeholder(tf.float32, shape=[None, 4096])
# y_ is a 1 element vector, containing the predicted probability of the label
y_ = tf.placeholder(tf.float32, [None, 1])
# define weights and balances
W = tf.Variable(tf.zeros([4096, 1]))
b = tf.Variable(tf.zeros([1]))
# define our model
y = tf.nn.softmax(tf.matmul(x, W) + b)
# loss is cross entropy
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# each training step in gradient decent we want to minimize cross entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
train_labels = np.reshape(np.genfromtxt(path + "train_labels.csv", delimiter=',', skip_header=1), (14999, 1))
train_data = np.genfromtxt(path + "train_samples.csv", delimiter=',', skip_header=1)
# perform 150 training steps with each taking 100 train data
for i in range(0, 15000, 100):
sess.run(train_step, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]})
if i % 500 == 0:
print(sess.run(cross_entropy, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]}))
print(sess.run(b), sess.run(W))
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.close()
How do I solve this problem?
The key to the problem is that the class number of you output y_ and y is 1.You should adopt one-hot mode when you use tf.nn.softmax_cross_entropy_with_logits on classification problems in tensorflow. tf.nn.softmax_cross_entropy_with_logits will first compute tf.nn.softmax. When your class number is 1, your results are all the same. For example:
import tensorflow as tf
y = tf.constant([[1],[0],[1]],dtype=tf.float32)
y_ = tf.constant([[1],[2],[3]],dtype=tf.float32)
softmax_var = tf.nn.softmax(logits=y_)
cross_entropy = tf.multiply(y, tf.log(softmax_var))
errors = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
with tf.Session() as sess:
print(sess.run(softmax_var))
print(sess.run(cross_entropy))
print(sess.run(errors))
[[1.]
[1.]
[1.]]
[[0.]
[0.]
[0.]]
[0. 0. 0.]
This means that no matter what your output y_, your loss will be zero. So your weights and bias haven't been updated.
The solution is to modify the class number of y_ and y.
I suppose your class number is n.
First approch:You can change data to one-hot before feed data.Then use the following code.
y_ = tf.placeholder(tf.float32, [None, n])
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
Second approch:change data to one-hot after feed data.
y_ = tf.placeholder(tf.int32, [None, 1])
y_ = tf.one_hot(y_,n) # your dtype of y_ need to be tf.int32
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
All your initial weights are zeros. When you have that way, the NN doesn't learn well. You need to initialize all the initial weights with random values.
See Why should weights of Neural Networks be initialized to random numbers?
"Why Not Set Weights to Zero?
We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.
In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value.
"
See
https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/
I am new to Tensorflow and I am working for training with LSTM-RNN in Tensorflow.
I need to save the model so that I can restore and run with Test data again.
I am not sure what to save.
I need to save sess or I need to save pred
When I save sess, restore and test the Test data as
one_hot_predictions, accuracy, final_loss = sess.run(
[pred, accuracy, cost],
feed_dict={
x: X_test,
y: one_hot(y_test)
}
)
Then the error is unknown for pred.
Since I am new to Tensorflow, I am not sure what to save and what to restore to test with new data?
X_train = load_X(X_train_path)
X_test = load_X(X_test_path)
y_train = load_y(y_train_path)
y_test = load_y(y_test_path)
# proof that it actually works for the skeptical: replace labelled classes with random classes to train on
#for i in range(len(y_train)):
# y_train[i] = randint(0, 5)
# Input Data
training_data_count = len(X_train) # 4519 training series (with 50% overlap between each serie)
test_data_count = len(X_test) # 1197 test series
n_input = len(X_train[0][0]) # num input parameters per timestep
n_hidden = 34 # Hidden layer num of features
n_classes = 6
#updated for learning-rate decay
# calculated as: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
decaying_learning_rate = True
learning_rate = 0.0025 #used if decaying_learning_rate set to False
init_learning_rate = 0.005
decay_rate = 0.96 #the base of the exponential in the decay
decay_steps = 100000 #used in decay every 60000 steps with a base of 0.96
global_step = tf.Variable(0, trainable=False)
lambda_loss_amount = 0.0015
training_iters = training_data_count *300 # Loop 300 times on the dataset, ie 300 epochs
batch_size = 512
display_iter = batch_size*8 # To show test set accuracy during training
#Utility functions for training:
def LSTM_RNN(_X, _weights, _biases):
# model architecture based on "guillaume-chevalier" and "aymericdamien" under the MIT license.
_X = tf.transpose(_X, [1, 0, 2]) # permute n_steps and batch_size
_X = tf.reshape(_X, [-1, n_input])
# Rectifies Linear Unit activation function used
_X = tf.nn.relu(tf.matmul(_X, _weights['hidden']) + _biases['hidden'])
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(_X, n_steps, 0)
# Define two stacked LSTM cells (two recurrent layers deep) with tensorflow
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
lstm_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell_1, lstm_cell_2], state_is_tuple=True)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cells, _X, dtype=tf.float32)
# A single output is produced, in style of "many to one" classifier, refer to http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for details
lstm_last_output = outputs[-1]
# Linear activation
return tf.matmul(lstm_last_output, _weights['out']) + _biases['out']
def extract_batch_size(_train, _labels, _unsampled, batch_size):
# Fetch a "batch_size" amount of data and labels from "(X|y)_train" data.
# Elements of each batch are chosen randomly, without replacement, from X_train with corresponding label from Y_train
# unsampled_indices keeps track of sampled data ensuring non-replacement. Resets when remaining datapoints < batch_size
shape = list(_train.shape)
shape[0] = batch_size
batch_s = np.empty(shape)
batch_labels = np.empty((batch_size,1))
for i in range(batch_size):
# Loop index
# index = random sample from _unsampled (indices)
index = random.choice(_unsampled)
batch_s[i] = _train[index]
batch_labels[i] = _labels[index]
_unsampled.remove(index)
return batch_s, batch_labels, _unsampled
def one_hot(y_):
# One hot encoding of the network outputs
# e.g.: [[5], [0], [3]] --> [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
y_ = y_.reshape(len(y_))
n_values = int(np.max(y_)) + 1
return np.eye(n_values)[np.array(y_, dtype=np.int32)] # Returns FLOATS
# Graph input/output
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Graph weights
weights = {
'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
'out': tf.Variable(tf.random_normal([n_hidden, n_classes], mean=1.0))
}
biases = {
'hidden': tf.Variable(tf.random_normal([n_hidden])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
pred = LSTM_RNN(x, weights, biases)
# Loss, optimizer and evaluation
l2 = lambda_loss_amount * sum(
tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables()
) # L2 loss prevents this overkill neural network to overfit the data
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)) + l2 # Softmax loss
if decaying_learning_rate:
learning_rate = tf.train.exponential_decay(init_learning_rate, global_step*batch_size, decay_steps, decay_rate, staircase=True)
#decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps) #exponentially decayed learning rate
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost,global_step=global_step) # Adam Optimizer
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
#Train the network:
test_losses = []
test_accuracies = []
train_losses = []
train_accuracies = []
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
init = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
sess.run(init)
# Perform Training steps with "batch_size" amount of data at each loop.
# Elements of each batch are chosen randomly, without replacement, from X_train,
# restarting when remaining datapoints < batch_size
step = 1
time_start = time.time()
unsampled_indices = range(0,len(X_train))
while step * batch_size <= training_iters:
#print (sess.run(learning_rate)) #decaying learning rate
#print (sess.run(global_step)) # global number of iterations
if len(unsampled_indices) < batch_size:
unsampled_indices = range(0,len(X_train))
batch_xs, raw_labels, unsampled_indicies = extract_batch_size(X_train, y_train, unsampled_indices, batch_size)
batch_ys = one_hot(raw_labels)
# check that encoded output is same length as num_classes, if not, pad it
if len(batch_ys[0]) < n_classes:
temp_ys = np.zeros((batch_size, n_classes))
temp_ys[:batch_ys.shape[0],:batch_ys.shape[1]] = batch_ys
batch_ys = temp_ys
# Fit training using batch data
_, loss, acc = sess.run(
[optimizer, cost, accuracy],
feed_dict={
x: batch_xs,
y: batch_ys
}
)
train_losses.append(loss)
train_accuracies.append(acc)
# Evaluate network only at some steps for faster training:
if (step*batch_size % display_iter == 0) or (step == 1) or (step * batch_size > training_iters):
# To not spam console, show training accuracy/loss in this "if"
print("Iter #" + str(step*batch_size) + \
": Learning rate = " + "{:.6f}".format(sess.run(learning_rate)) + \
": Batch Loss = " + "{:.6f}".format(loss) + \
", Accuracy = {}".format(acc))
# Evaluation on the test set (no learning made here - just evaluation for diagnosis)
loss, acc = sess.run(
[cost, accuracy],
feed_dict={
x: X_test,
y: one_hot(y_test)
}
)
test_losses.append(loss)
test_accuracies.append(acc)
print("PERFORMANCE ON TEST SET: " + \
"Batch Loss = {}".format(loss) + \
", Accuracy = {}".format(acc))
step += 1
print("Optimization Finished!")
EDIT:
I can save the model as
print("Optimization Finished!")
save_path = saver.save(sess, "/home/test/venv/TFCodes/HumanActivityRecognition/model.ckpt")
Then I tried to restore and ok, I can restore. But I don't know how to test with test data.
My restore code is
X_test = load_X(X_test_path)
with tf.Session() as sess:
saver = tf.train.import_meta_graph('/home/nyan/venv/TFCodes/HumanActivityRecognition/model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
print("Model restored.")
all_vars = tf.trainable_variables()
for i in range(len(all_vars)):
name = all_vars[i].name
values = sess.run(name)
print('name', name)
#print('value', values)
print('shape',values.shape)
result = sess.run(prediction, feed_dict={X: X_test})
print("loss:", l, "prediction:", result, "true Y:", y_data)
# print char using dic
result_str = [idx2char[c] for c in np.squeeze(res
ult)]
print("\tPrediction str:", ''.join(result_str))
The output is
Model restored.
('name', u'Variable_1:0')
('shape', (36, 34))
('name', u'Variable_2:0')
('shape', (34, 6))
('name', u'Variable_3:0')
('shape', (34,))
('name', u'Variable_4:0')
('shape', (6,))
('name', u'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0')
('shape', (68, 136))
('name', u'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0')
('shape', (136,))
('name', u'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0')
('shape', (68, 136))
('name', u'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0')
('shape', (136,))
Traceback (most recent call last):
File "restore.py", line 74, in <module>
result = sess.run(prediction, feed_dict={X: X_test})
NameError: name 'prediction' is not defined
How to test the model restored?
What I find the easiest is the tf.saved_model.simple_save() function. It saves the computation graph you use, the weights, the input and the output in a .pb model and the weight variables.
You can later restore this model or even put it on ml-engine or use tf serving.
An example code snippit with a keras model and applied on YOLO:
inputs = {"image_bytes": model.input,
"shape": image_shape}
outputs = {"boxes": boxes,
"scores": scores,
"classes": classes}
tf.saved_model.simple_save(sess, "saved_model/", inputs, outputs)
I'm working on a simple linear regression model to predict the next step in a series. I'm giving it x/y coordinate data and I want the regressor to predict where the next point on the plot will lie.
I'm using dense layers with AdamOptmizer and have my loss function set to:
tf.reduce_mean(tf.square(layer_out - y))
I'm trying to create linear regression models from scratch (I don't want to utilize the TF estimator package here).
I've seen ways to do it by manually specifying weights and biases, but nothing goes into deep regression.
X = tf.placeholder(tf.float32, [None, self.data_class.batch_size, self.inputs])
y = tf.placeholder(tf.float32, [None, self.data_class.batch_size, self.outputs])
layer_input = tf.layers.dense(inputs=X, units=10, activation=tf.nn.relu)
layer_hidden = tf.layers.dense(inputs=layer_input, units=10, activation=tf.nn.relu)
layer_out = tf.layers.dense(inputs=layer_hidden, units=1, activation=tf.nn.relu)
cost = tf.reduce_mean(tf.square(layer_out - y))
optmizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
training_op = optmizer.minimize(cost)
init = tf.initialize_all_variables()
iterations = 10000
with tf.Session() as sess:
init.run()
for iteration in range(iterations):
X_batch, y_batch = self.data_class.get_data_batch()
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
if iteration % 100 == 0:
mse = cost.eval(feed_dict={X:X_batch, y:y_batch})
print(mse)
array = []
for i in range(len(self.data_class.dates), (len(self.data_class.dates)+self.data_class.batch_size)):
array.append(i)
x_pred = np.array(array).reshape(1, self.data_class.batch_size, 1)
y_pred = sess.run(layer_out, feed_dict={X: x_pred})
print(y_pred)
predicted = np.array(y_pred).reshape(self.data_class.batch_size)
predicted = np.insert(predicted, 0, self.data_class.prices[0], axis=0)
plt.plot(self.data_class.dates, self.data_class.prices)
array = [self.data_class.dates[0]]
for i in range(len(self.data_class.dates), (len(self.data_class.dates)+self.data_class.batch_size)):
array.append(i)
plt.plot(array, predicted)
plt.show()
When I run training I'm getting the same loss value over and over again.
It's not being reduced, like it should, why?
The issue is that I'm applying an activation to the output layer. This is causing that output to go to whatever it activates to.
By specifying in the last layer that activation=None the deep regression works as intended.
Here is the updated architecture:
layer_input = tf.layers.dense(inputs=X, units=150, activation=tf.nn.relu)
layer_hidden = tf.layers.dense(inputs=layer_input, units=100, activation=tf.nn.relu)
layer_out = tf.layers.dense(inputs=layer_hidden, units=1, activation=None)
I just browsed through Stack Overflow and other forums but couldn't find anything helpful for my problem. But it seems related to this question.
I currently have a trained model of Tensorflow (128 inputs and 11 outputs) which I saved, following the MNIST tutorial by Tensorflow.
It seemed to be successful and I have a model in this folder now with the 3 files (.meta, .ckpt.data and .index). However, I want to restore it and use it for predictions:
#encoding[0] => numpy ndarray (128, ) # anyway a list with only one entry
#unknowndata = np.array(encoding[0])[None]
unknowndata = np.expand_dims(encoding[0], axis=0)
print(unknowndata.shape) # Output (1, 128)
# Restore pre-trained tf model
with tf.Session() as sess:
#saver.restore(sess, "models/model_1.ckpt")
saver = tf.train.import_meta_graph('models/model_1.ckpt.meta')
saver.restore(sess,tf.train.latest_checkpoint('models/./'))
y = tf.get_collection('final tensor') # tf.nn.softmax(tf.matmul(y2, W3) + b3)
X = tf.get_collection('input') # tf.placeholder(tf.float32, [None, 128])
# W1 = tf.get_collection('vars')[0]
# b1 = tf.get_collection('vars')[1]
# W2 = tf.get_collection('vars')[2]
# b2 = tf.get_collection('vars')[3]
# W3 = tf.get_collection('vars')[4]
# b3 = tf.get_collection('vars')[5]
# y1 = tf.nn.relu(tf.matmul(X, W1) + b1)
# y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)
# yLog = tf.matmul(y2, W3) + b3
# y = tf.nn.softmax(yLog)
prediction = tf.argmax(y, 1)
print(sess.run(prediction, feed_dict={i: d for i,d in zip(X, unknowndata.T)}))
# also had sess.run(prediction, feed_dict={X: unknowndata.T}) and also not transposed, still errors
# Output: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # one should be 1 obviously with a specific percentage
There I only run in problems ...
ValueError: Cannot feed value of shape (1,) for Tensor 'x:0', which has shape '(?, 128)'
Altough I print the shape of the 'unknowndata' and it matches the (1, 128).
I also tried it with
sess.run(prediction, feed_dict={X: unknownData})) # with transposed etc. but nothing worked for me there I got the other error
TypeError: unhashable type: 'list'
I only want some predictions of this beautiful Tensorflow trained model.
I figured out the problem!
First I need to restore all of the values (weights and biases and matmul them seperately).
Second I need to create the same input as in the trained model, in my case:
X = tf.placeholder(tf.float32, [None, 128])
and then just call the prediction:
sess.run(prediction, feed_dict={X: unknownData})
But I do not get any percentage distribution but I expect that due to the softmax function. Does anybody know how to access those?
The prediction tensor is obtained by an argmax on y. Instead of returning only prediction, you can add y to your output feed when you execute sess.run.
output_feed = [prediction, y]
preds, probs = sess.run(output_feed, print(sess.run(prediction, feed_dict={i: d for i,d in zip(X, unknowndata.T)}))
preds will have the predictions of the model andprobs will have the probability scores.
First when you save, you have to add to the collection the placeholders you need tf.add_to_collection('i', i) and then retrieve them and pass them the feed_dict.
In your example is "i":
i = tf.get_collection('i')[0]
#sess.run(prediction, feed_dict={i: d for i,d in zip(X, unknowndata.T)})
I know that this is a very broad question, but I have asked many other questions and I have still been unable to properly implement a simple dynamic-k max pooling convolutional neural network as described in this paper. Currently, I am trying to modify the code from this tutorial. I believe I have successfully implemented the dynamic-k part. However, my main problem is because the k value is different for each input, the tensors that are produced are different shapes. I have tried countless things to try and fix this (which is why you may see some funny reshaping), but I can't figure out how. I think that you'd need to pad each tensor to get them all to be the size of the biggest one, but I can't seem to get that to work. Here is my code (I am sorry, it is generally rather sloppy).
# train.py
import datetime
import time
import numpy as np
import os
import tensorflow as tf
from env.src.sentiment_analysis.dcnn.text_dcnn import TextDCNN
from env.src.sentiment_analysis.cnn import data_helpers as data_helpers
from tensorflow.contrib import learn
# Model Hyperparameters
tf.flags.DEFINE_integer("embedding_dim", 128, "Dimensionality of character embedding (default: 128)")
tf.flags.DEFINE_string("filter_sizes", "3,4,5", "Comma-separated filter sizes (default: '3,4,5')")
tf.flags.DEFINE_integer("num_filters", 128, "Number of filters per filter size (default: 128)")
tf.flags.DEFINE_float("dropout_keep_prob", 0.5, "Dropout keep probability (default: 0.5)")
tf.flags.DEFINE_float("l2_reg_lambda", 0.0, "L2 regularizaion lambda (default: 0.0)")
# Training parameters
tf.flags.DEFINE_integer("batch_size", 256, "Batch Size (default: 64)")
tf.flags.DEFINE_integer("num_epochs", 200, "Number of training epochs (default: 200)")
tf.flags.DEFINE_integer("evaluate_every", 100, "Evaluate model on dev set after this many steps (default: 100)")
tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
# Misc Parameters
tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement")
tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices")
tf.flags.DEFINE_string("positive_file", "../rotten_tomatoes/rt-polarity.pos", "Location of the rt-polarity.pos file")
tf.flags.DEFINE_string("negative_file", "../rotten_tomatoes/rt-polarity.neg", "Location of the rt-polarity.neg file")
FLAGS = tf.flags.FLAGS
FLAGS._parse_flags()
print("\nParameters:")
for attr, value in sorted(FLAGS.__flags.items()):
print("{} = {}".format(attr.upper(), value))
print("")
# Data Preparatopn
# Load data
print("Loading data...")
x_text, y = data_helpers.load_data_and_labels(FLAGS.positive_file, FLAGS.negative_file)
# Build vocabulary
max_document_length = max([len(x.split(" ")) for x in x_text])
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
x = np.array(list(vocab_processor.fit_transform(x_text)))
x_arr = np.array(x_text)
seq_lens = []
for s in x_arr:
seq_lens.append(len(s.split(" ")))
# Randomly shuffle data
np.random.seed(10)
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices]
# Split train/test set
x_train, x_dev = x_shuffled[:-1000], x_shuffled[-1000:]
y_train, y_dev = y_shuffled[:-1000], y_shuffled[-1000:]
print("Vocabulary Size: {:d}".format(len(vocab_processor.vocabulary_)))
print("Train/Dev split: {:d}/{:d}".format(len(y_train), len(y_dev)))
# Training
with tf.Graph().as_default():
session_conf = tf.ConfigProto(
allow_soft_placement=FLAGS.allow_soft_placement,
log_device_placement=FLAGS.log_device_placement
)
sess = tf.Session(config=session_conf)
with sess.as_default():
print("HERE")
print(x_train.shape)
dcnn = TextDCNN(
sequence_lengths=seq_lens,
sequence_length=x_train.shape[1],
num_classes=y_train.shape[1],
vocab_size=len(vocab_processor.vocabulary_),
embedding_size=FLAGS.embedding_dim,
filter_sizes=list(map(int, FLAGS.filter_sizes.split(","))),
num_filters=FLAGS.num_filters,
)
# The training procedure
global_step = tf.Variable(0, name="global_step", trainable=False)
optimizer = tf.train.AdamOptimizer(1e-4)
grads_and_vars = optimizer.compute_gradients(dcnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
# Output directory for models and summaries
timestamp = str(int(time.time()))
out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp))
print("Writing to {}\n".format(out_dir))
# Summaries for loss and accuracy
loss_summary = tf.scalar_summary("loss", dcnn.loss)
acc_summary = tf.scalar_summary("accuracy", dcnn.accuracy)
# Summaries for training
train_summary_op = tf.merge_summary([loss_summary, acc_summary])
train_summary_dir = os.path.join(out_dir, "summaries", "train")
train_summary_writer = tf.train.SummaryWriter(train_summary_dir, sess.graph)
# Summaries for devs
dev_summary_op = tf.merge_summary([loss_summary, acc_summary])
dev_summary_dir = os.path.join(out_dir, "summaries", "dev")
dev_summary_writer = tf.train.SummaryWriter(dev_summary_dir, sess.graph)
# Checkpointing
checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
# TensorFlow assumes this directory already exsists so we need to create it
if not os.path.exists(checkpoint_dir):
os.makedirs(checkpoint_dir)
saver = tf.train.Saver(tf.all_variables())
# Write vocabulary
vocab_processor.save(os.path.join(out_dir, "vocab"))
# Initialize all variables
sess.run(tf.initialize_all_variables())
def train_step(x_batch, y_batch):
"""
A single training step.
Args:
x_batch: A batch of X training values.
y_batch: A batch of Y training values
Returns: void
"""
feed_dict = {
dcnn.input_x: x_batch,
dcnn.input_y: y_batch,
dcnn.dropout_keep_prob: FLAGS.dropout_keep_prob
}
# Execute train_op
_, step, summaries, loss, accuracy = sess.run(
[train_op, global_step, train_summary_op, dcnn.loss, dcnn.accuracy],
feed_dict
)
# Print and save to disk loss and accuracy of the current training batch
time_str = datetime.datetime.now().isoformat()
print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
train_summary_writer.add_summary(summaries, step)
def dev_step(x_batch, y_batch, writer=None):
"""
Evaluates a model on a dev set.
Args:
x_batch: A batch of X training values.
y_batch: A batch of Y training values.
writer: The writer to use to record the loss and accuracy
Returns: void
"""
feed_dict = {
dcnn.input_x: x_batch,
dcnn.input_y: y_batch,
dcnn.dropout_keep_prob : 1.0
}
step, summaries, loss, accuracy = sess.run(
[global_step, dev_summary_op, dcnn.loss, dcnn.accuracy],
feed_dict
)
time_str = datetime.datetime.now().isoformat()
print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
if writer:
writer.add_summary(summaries, step)
# Generate batches
batches = data_helpers.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)
# Training loop. For each batch...
for batch in batches:
x_batch, y_batch = zip(*batch)
train_step(x_batch, y_batch)
current_step = tf.train.global_step(sess, global_step)
if current_step % FLAGS.evaluate_every == 0:
print("\nEvaluation:")
dev_step(x_dev, y_dev, writer=dev_summary_writer)
print("")
if current_step % FLAGS.checkpoint_every == 0:
path = saver.save(sess, checkpoint_prefix, global_step=current_step)
print("Saved model checkpoint to {}\n".format(path))
And here is the actual DCNN class:
import tensorflow as tf
class TextDCNN(object):
"""
A CNN for NLP tasks. Architecture is as follows:
Embedding layer, conv layer, max-pooling and softmax layer
"""
def __init__(self, sequence_lengths, sequence_length, num_classes, vocab_size, embedding_size, filter_sizes, num_filters):
"""
Makes a new CNNClassifier
Args:
sequence_length: The length of each sentence
num_classes: Number of classes in the output layer (positive and negative would be 2 classes)
vocab_size: The size of the vocabulary, needed to define the size of the embedding layer
embedding_size: Dimensionality of the embeddings
filter_sizes: Number of words the convolutional filters will cover, there will be num_filters for each size
specified.
num_filters: The number of filters per filter size.
Returns: A new CNNClassifier with the given parameters.
"""
# Define the inputs and the dropout
print("SEQL")
print(sequence_length)
self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
# Runs the operations on the CPU and organizes them into an embedding scope
with tf.device("/cpu:0"), tf.name_scope("embedding"):
W = tf.Variable( # Make a 4D tensor to store batch, width, height, and channel
tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
name="W"
)
self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
with tf.name_scope("conv-maxpool-%s" % filter_size):
# Conv layer
filter_shape = [filter_size, embedding_size, 1, num_filters]
# W is the filter matrix
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
self.embedded_chars_expanded,
W,
strides=[1, 1, 1, 1],
padding="VALID",
name="conv"
)
# Apply nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
# Max-pooling layer over the outputs
print(sequence_lengths[i] - filter_size + 1)
print(h)
pooled = tf.nn.max_pool(
h,
ksize=[1, sequence_lengths[i] - filter_size + 1, 1, 1],
strides=[1, 1, 1, 1],
padding="VALID",
name="pool"
)
pooled = tf.reshape(pooled, [-1, 1, 1, num_filters])
print(pooled)
pooled_outputs.append(pooled)
# Combine all of the pooled features
num_filters_total = num_filters * len(filter_sizes)
max_shape = tf.reduce_max(pooled_outputs, 1)
print("shapes")
print([p.get_shape() for p in pooled_outputs])
# pooled_outputs = [tf.pad(p, [[0, int(max_shape.get_shape()[0]) - int(p.get_shape()[0])], [0, 0], [0, 0], [0, 0]]) for p in pooled_outputs]
# pooled_outputs = [tf.reshape(p, [-1, 1, 1, num_filters]) for p in pooled_outputs]
# pooled_outputs = [tf.reshape(out, [-1, 1, 1, self.max_length]) for out in pooled_outputs]
self.h_pool = tf.concat(3, pooled_outputs)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
print("here")
print(self.h_pool_flat)
self.h_pool_flat = tf.reshape(self.h_pool, [max(sequence_lengths), num_filters_total])
# Add dropout
with tf.name_scope("dropout"):
# casted = tf.cast(self.dropout_keep_prob, tf.int32)
self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
self.h_drop = tf.reshape(self.h_drop, [-1, num_filters_total])
# Do raw predictions (no softmax)
with tf.name_scope("output"):
W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
# xw_plus_b(...) is just Wx + b matmul alias
self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
self.predictions = tf.argmax(self.scores, 1, name="predictions")
# Calculate mean cross-entropy loss
with tf.name_scope("loss"):
# softmax_cross_entropy_with_logits(...) calculates cross-entropy loss
losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, self.input_y)
'''print("here")
print(losses.get_shape())
print(self.scores.get_shape())
print(self.input_y.get_shape())'''
self.loss = tf.reduce_mean(losses)
# Calculate accuracy
with tf.name_scope("accuracy"):
correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
I am using the Rotten Tomatoes sentiment labeled data set. The current error I am getting is this:
InvalidArgumentError (see above for traceback): input[1,0] mismatch: 5888 vs. 4864
[[Node: gradients/concat_grad/ConcatOffset = ConcatOffset[N=3, _device="/job:localhost/replica:0/task:0/cpu:0"](concat/concat_dim, gradients/concat_grad/ShapeN, gradients/concat_grad/ShapeN:1, gradients/concat_grad/ShapeN:2)]]
How can I fix this code so that all of the tensors are normalized to the same size after pooling (while keeping pooling dynamic) and so that the code runs to completion?
Sorry about all of the random commented out lines and prints and stuff, but I have tried extensively to make this work.
Although tensorflow doesn't provide k-max pooling directly, I think tf.nn.top_k might help you build that op.
There are three things to note here.
max-pooling and k-max pooling are two different operations.
max-pooling retrieves the maximum valued activation out of the pooling window while k-max pooling retrieves k maximum values from the pooling window.
Tensorflow doesn't provide API for k-max pooling as of now. The one
which you are trying now is max-pooling operation and not k-max
pooling operation.
As per my knowledge, tensorflow does not provide functionality to handle pooling resulting in different size of matrices. So, you may use bucketing to create batches of sentences of similar length and the use k-max pooling.