Difference between two versions of cross_entropy calculations in TensorFlow - python

I'm using TensorFlow to train a binary classification neural network.
To build the network, half a year ago I followed the tutorial on the TensorFlow website - Deep MNIST for Experts.
Today, comparing the two pieces of code (the one in the tutorial and the one I wrote), I can see a difference in the cross-entropy calculation, a difference that I can't explain.
In the tutorial, the cross-entropy is calculated as follows:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
While in my code the calculation is as follows:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
I'm new to TensorFlow, and I feel that I'm missing something. Maybe the difference stems from two different versions of the TensorFlow tutorials? What is the actual difference between the two lines?
Really appreciate your help. Thanks!
The relevant code from the tutorial:
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
...
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
My code:
# load data
folds = build_database_tuple.load_data(data_home_dir=data_home_dir,validation_ratio=validation_ratio,patch_size=patch_size)
# start the session; using InteractiveSession we avoid building the entire computational graph before starting the session
sess = tf.InteractiveSession()
# start building the computational graph
# the 'None' indicates the batch size - a value that we want to leave open for now
x = tf.placeholder(tf.float32, shape=[None, patch_size**2]) # input images - patch_size x patch_size pixels, flattened
y_ = tf.placeholder(tf.float32, shape=[None, 2]) # output classes (using one-hot vectors)
# the variables for the linear layer
W = tf.Variable(tf.zeros([(patch_size**2),2])) # weights - patch_size**2 input features and 2 outputs
b = tf.Variable(tf.zeros([2])) # biases - 2 classes
# initialize all the variables using the session, so they can be used in it
sess.run(tf.initialize_all_variables())
# implementation of the regression model
y = tf.nn.softmax(tf.matmul(x,W) + b)
# Done!
# FIRST LAYER:
# build the first layer
W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
b_conv1 = bias_variable([first_conv_output_channels])
x_image = tf.reshape(x, [-1,patch_size,patch_size,1]) # reshape x to a 4d tensor. dims 2 and 3 are the image width and height, dim 4 is the single color channel
# apply the layers
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# SECOND LAYER:
# 64 features for each 5x5 patch; the input channels must match the first layer's output channels
W_conv2 = weight_variable([sec_conv_kernel_size, sec_conv_kernel_size, first_conv_output_channels, sec_conv_output_channels])
b_conv2 = bias_variable([sec_conv_output_channels])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# FULLY CONNECTED LAYER:
# 1024 neurons, 8x8 - new size after 2 pooling layers
W_fc1 = weight_variable([(patch_size/4) * (patch_size/4) * sec_conv_output_channels, fc_vec_size])
b_fc1 = bias_variable([fc_vec_size])
h_pool2_flat = tf.reshape(h_pool2, [-1, (patch_size/4) * (patch_size/4) * sec_conv_output_channels])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# dropout layer - meant to reduce over-fitting
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# READOUT LAYER:
# softmax regression
W_fc2 = weight_variable([fc_vec_size, 2])
b_fc2 = bias_variable([2])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# TRAIN AND EVALUATION:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())

The difference is small but significant.
softmax_cross_entropy_with_logits takes logits (real numbers without any range limit), passes them through the softmax function and then computes the cross-entropy. Combining both into one function lets TensorFlow apply optimizations that improve numerical accuracy.
The second code applies cross-entropy directly to y_conv, which is already the output of a softmax. This is also correct, and the two should give similar (but not identical) results; softmax_cross_entropy_with_logits is superior because of its numerical stability. Just remember to feed it logits, not the output of a softmax.
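A minimal sketch of the two formulations side by side (assuming TensorFlow 1.x; the logits and labels values are made up for illustration):
import tensorflow as tf

logits = tf.constant([[2.0, -1.0], [0.5, 3.0]])  # raw, unbounded scores
labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])   # one-hot targets

# fused, numerically stable version: takes the raw logits directly
loss_fused = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# manual version: softmax first, then cross-entropy;
# tf.log(probs) can underflow to log(0) = -inf once the network gets confident
probs = tf.nn.softmax(logits)
loss_manual = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(probs), axis=1))

with tf.Session() as sess:
    print(sess.run([loss_fused, loss_manual]))  # near-identical values here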

Related

Neural Network not converging in multilabel classification task - TensorFlow

I am a beginner with TensorFlow and I really need some help with this task. I am doing some image pixel classification, and my problem is set up this way:
my input is an array X that contains 20 values. These values represent 4 pixels (there are 5 values per pixel). My output is an array of 4 values, where each value can be either 1 or 0, meaning that the specific pixel may or may not have a specific characteristic. As you can imagine, y can take a form like y=[1, 0, 0, 1], so each instance can have multiple classes.
To perform this classification task, I organized a neural network with an input layer of 20 neurons, a hidden layer of 15, followed by another hidden layer of 10, and finally an output layer of 4. The activation functions for the hidden layers are ReLU, and I applied 50% dropout to them. As the loss function to minimize, I use TensorFlow's sigmoid_cross_entropy_with_logits, because it computes independent probabilities for each class, which allows multilabel classification.
When I first attempted to train the network, I was getting all NaN results, because of (I think) an exploding gradient problem. That was apparently solved after I lowered the learning rate.
My problem now is that the network is not converging at all, and I believe this is because something is wrong with the cost and activation functions that I am using.
Note: the input has been scaled with sklearn.preprocessing.StandardScaler
Here is the code:
import tensorflow as tf
import numpy as np

n_inputs = 20
n_hidden1 = 15
n_hidden2 = 10
n_outputs = 4

dropout_rate = 0.5
learning_rate = 0.000000001

# This is a boolean value that indicates when to apply dropout
training = tf.placeholder_with_default(True, shape=(), name="training")

X = tf.placeholder(tf.float64, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None, 4), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu)
    hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=training)
    # feed the dropout output forward (hidden1_drop, not hidden1)
    hidden2 = tf.layers.dense(hidden1_drop, n_hidden2, name="hidden2", activation=tf.nn.relu)
    hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=training)
    logits = tf.layers.dense(hidden2_drop, n_outputs, name="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(y, tf.float64),
                                                       logits=tf.cast(logits, tf.float64))
    loss = tf.reduce_mean(xentropy, name='loss')

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 50
batch_size = 50
n_batches = 1000000

# training
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for i in range(n_batches):
            X_batch = np.asarray(X_batches[i]).reshape(-1, 20)
            y_batch = np.asarray(y_batches[i]).reshape(-1, 4)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch, training: True})
            if (i % 10000) == 0:
                raws = logits.eval(feed_dict={X: X_batch, training: False})
                print("epoch = " + str(epoch))
                print("iteration = " + str(i))
                print("accuracy = " + str(get_global_accuracy_rate(raws, y_batch)))
                print("X = " + str(X_batch[0]) + " y = " + str(y_batch[0]))
                print("raws = " + str(raws[0]) + " pred = " + str(get_prediction(raws[0])))
    save_path = saver.save(sess, "./my_model_final_1.ckpt")
Thank you so much in advance for any help!!
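For reference, a minimal sketch (not from the question) of how logits trained with sigmoid_cross_entropy_with_logits are typically turned into 0/1 multilabel predictions; the 0.5 threshold and the example values are illustrative assumptions:
import numpy as np
import tensorflow as tf

logits = tf.placeholder(tf.float64, shape=(None, 4))
# sigmoid gives an independent probability per class (unlike softmax)
probs = tf.sigmoid(logits)
# threshold each probability to get the 0/1 multilabel prediction
preds = tf.cast(probs > 0.5, tf.int64)

with tf.Session() as sess:
    raw = np.array([[2.0, -1.5, 0.3, -0.2]])
    print(sess.run(preds, feed_dict={logits: raw}))  # -> [[1 0 1 0]]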

CNN model not able to make prediction

I trained a CNN model successfully, however I am getting errors when I feed images to the model for it to predict the labels.
This is my model (I am restoring it with saver.restore)...
# load dataset
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# interactive session
sess = tf.InteractiveSession()
# data and labels placeholder
x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.float32, shape=[None, 10])
# 32 filters of size 5x5 and 32 biases,
# the filters are used to create 32 feature maps
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_img = tf.reshape(x, [-1, 28, 28, 1])
# first layer activated using a Relu activation function
conv1 = tf.nn.relu(conv2d(x_img, W_conv1) + b_conv1)
pool1 = max_pool_2x2(conv1)
# 64 filters of size 5x5
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# second layer
conv2 = tf.nn.relu(conv2d(pool1, W_conv2) + b_conv2)
pool2 = max_pool_2x2(conv2)
# fully connected layer with 1024 neurons
W_fully = weight_variable([7 * 7 * 64, 1024])
b_fully = bias_variable([1024])
pool2flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
fully = tf.nn.relu(tf.matmul(pool2flat, W_fully) + b_fully)
# dropout layer - randomly drops activations to reduce over-fitting
prob_drop = tf.placeholder(tf.float32)
dropout = tf.nn.dropout(fully, prob_drop)
# readout layer that will return the raw values
# of our predictions
W_readout = weight_variable([1024, 10])
b_readout = bias_variable([10])
y_conv = tf.matmul(dropout, W_readout) + b_readout
# loss function
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y))
# restore the trained CNN model
saver = tf.train.Saver()
saver.restore(sess, "/tmp/model2.ckpt")
y_conv is the predictor.
The model is trained on the MNIST dataset; now I have an image of a number and I want the model to tell me what digit it thinks it is, and with what confidence. I tried the following...
prediction = tf.argmax(y_conv, 1)
print(sess.run(prediction, feed_dict={x:two_images[0]}))
After feeding the image two_images[0] to the model, I got the following error...
ValueError: Cannot feed value of shape (784,) for Tensor 'Placeholder:0', which has shape '(?, 784)'
So I fixed it by doing the following...
prediction = tf.argmax(y_conv, 1)
print(sess.run(prediction, feed_dict={x:two_images[0].reshape((1, 784))}))
But now I am getting a whole bunch of errors that I cannot decipher...
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_2' with dtype float
[[Node: Placeholder_2 = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I am not sure what I am doing wrong.
EDIT
This is how I populate the variable two_images...
# extract the indices of the number 2
two_idxs_list = np.where(mnist.test.labels[:, 2].astype(int) == 1)
two_idxs = two_idxs_list[0][:10]
# use the indices to extract the images of 2 and their corresponding label
two_images = mnist.test.images[two_idxs]
two_labels = mnist.test.labels[two_idxs]
Okay, with the code added I was able to test it on my machine. The prediction op only needs the image, but the graph it flows through also contains the dropout placeholder prob_drop - the third placeholder created, which is why the error names 'Placeholder_2'. Even when you only do inference you have to supply a value for it; feed 1.0 so that dropout passes everything through at test time. (The label placeholder y is only needed if you also evaluate the loss, which you are not interested in here.) So your sess.run line should be:
print(sess.run(prediction, feed_dict={
    x: two_images[0].reshape((1, 784)),
    prob_drop: 1.0}))

Tensor Flow Tutorial: Why isn't my version of 'Deep MNIST for Experts' working correctly?

I tried to follow the TensorFlow tutorial 'Deep MNIST for Experts' [1]. My program (source below) works, but it converges very slowly and reaches a quite bad accuracy of ~90%. It should achieve about 99.2% accuracy. I compared my solution to the 'mnist_deep.py' available for download [2], which looks quite similar... but that version achieves the 99.2% accuracy on the same machine (so it's not a bug in TensorFlow, nor anything wrong with my installation). Surprisingly, it also needs much more time for training on the same machine, telling me the trained model must be different / more complex. I checked the downloaded source and compared it to mine, reordered stuff and checked the numbers, but I didn't find any relevant difference except for coding style. I'm new to Python, so maybe it's just some simple syntax issue...
Questions:
What is the difference in my version causing this problem?
Additionally: how do I debug such issues in TensorFlow? I saw some generated graphs of the models on the webpage... how do I generate them from the source?
My Program:
import os
import tensorflow as tf
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

from tensorflow.examples.tutorials.mnist import input_data

x = tf.placeholder(tf.float32, [None, 784])
# reshape
x_image = tf.reshape(x, [-1, 28, 28, 1])
# conv1
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# pool1
h_pool1 = max_pool_2x2(h_conv1)
# conv2
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# pool2
h_pool2 = max_pool_2x2(h_conv2)
# fc1
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# map 1024 features to 10 classes
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)
)
# ADAM Optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# accuracy
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
correct_prediction = tf.cast(correct_prediction, tf.float32)
accuracy = tf.reduce_mean(correct_prediction)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0
            })
            print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
OUTPUT:
# python3 mnist_deep.py
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.06
step 100, training accuracy 0.06
step 200, training accuracy 0
step 300, training accuracy 0.12
step 400, training accuracy 0.06
step 500, training accuracy 0.1
step 600, training accuracy 0.18
step 700, training accuracy 0.12
step 800, training accuracy 0.1
step 900, training accuracy 0.22
step 1000, training accuracy 0.2
[...]
Version from the Webpage:
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A deep MNIST classifier using convolutional layers.
See extensive documentation at
https://www.tensorflow.org/get_started/mnist/pros
"""
# Disable linter warnings to maintain consistency with tutorial.
# pylint: disable=invalid-name
# pylint: disable=g-bad-import-order
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
import tempfile
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
FLAGS = None
def deepnn(x):
    """deepnn builds the graph for a deep net for classifying digits.

    Args:
        x: an input tensor with the dimensions (N_examples, 784), where 784 is the
        number of pixels in a standard MNIST image.

    Returns:
        A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with values
        equal to the logits of classifying the digit into one of 10 classes (the
        digits 0-9). keep_prob is a scalar placeholder for the probability of
        dropout.
    """
    # Reshape to use within a convolutional neural net.
    # Last dimension is for "features" - there is only one here, since images are
    # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
    with tf.name_scope('reshape'):
        x_image = tf.reshape(x, [-1, 28, 28, 1])

    # First convolutional layer - maps one grayscale image to 32 feature maps.
    with tf.name_scope('conv1'):
        W_conv1 = weight_variable([5, 5, 1, 32])
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

    # Pooling layer - downsamples by 2X.
    with tf.name_scope('pool1'):
        h_pool1 = max_pool_2x2(h_conv1)

    # Second convolutional layer -- maps 32 feature maps to 64.
    with tf.name_scope('conv2'):
        W_conv2 = weight_variable([5, 5, 32, 64])
        b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Second pooling layer.
    with tf.name_scope('pool2'):
        h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
    # is down to 7x7x64 feature maps -- maps this to 1024 features.
    with tf.name_scope('fc1'):
        W_fc1 = weight_variable([7 * 7 * 64, 1024])
        b_fc1 = bias_variable([1024])
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout - controls the complexity of the model, prevents co-adaptation of
    # features.
    with tf.name_scope('dropout'):
        keep_prob = tf.placeholder(tf.float32)
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Map the 1024 features to 10 classes, one for each digit
    with tf.name_scope('fc2'):
        W_fc2 = weight_variable([1024, 10])
        b_fc2 = bias_variable([10])
        y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    return y_conv, keep_prob

def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def main(_):
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

    # Create the model
    x = tf.placeholder(tf.float32, [None, 784])

    # Define loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 10])

    # Build the graph for the deep net
    y_conv, keep_prob = deepnn(x)

    with tf.name_scope('loss'):
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_,
                                                                logits=y_conv)
    cross_entropy = tf.reduce_mean(cross_entropy)

    with tf.name_scope('adam_optimizer'):
        train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
        correct_prediction = tf.cast(correct_prediction, tf.float32)
    accuracy = tf.reduce_mean(correct_prediction)

    graph_location = tempfile.mkdtemp()
    print('Saving graph to: %s' % graph_location)
    train_writer = tf.summary.FileWriter(graph_location)
    train_writer.add_graph(tf.get_default_graph())

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str,
                        default='/tmp/tensorflow/mnist/input_data',
                        help='Directory for storing input data')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
OUTPUT:
# python3 mnist_deep.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpw8uaz0vs
step 0, training accuracy 0.12
step 100, training accuracy 0.64
step 200, training accuracy 0.86
step 300, training accuracy 0.94
step 400, training accuracy 0.94
[...]
Links
[1] https://www.tensorflow.org/get_started/mnist/pros
[2] https://www.github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/examples/tutorials/mnist/mnist_deep.py
Question 1
Possible reason for poor results: train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) should be outside the if i % 100 == 0: block; see the corrected loop below.
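Concretely, dedenting that single line in the question's training loop gives:
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
    # dedented: a training step now runs on every iteration, not once per 100
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})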
Question 2
To answer your second question: the graphs are generated by TensorBoard (as mentioned by @JoshVarty). To visualize your graph, you need to write it to a file. Add the following somewhere after defining the complete graph and before running the session:
file_writer = tf.summary.FileWriter(".", tf.get_default_graph())
Then start the TensorBoard server in the terminal from your current path with tensorboard --logdir=".". You can open it in your browser at the default port 6006, and the graph will appear after you run the script.

Dilemma with prediction with TensorFlow on MNIST set

Currently I am a newbie with TensorFlow; I have trained a model using the MNIST set, and now I have made some pictures of numbers and I want to test the precision. I think I have a syntax error, or a misunderstanding of how things in TensorFlow work.
This is my model:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
sess = tf.InteractiveSession()
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# stride 1 and 0 padding
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# pooling over 2x2
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# Second Layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
#Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
#Readout layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(200):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
# Here is my custom dataset
custom_data=GetDataset()
print sess.run(y_conv,feed_dict={x: custom_data})
Is this not the right syntax for making a prediction with my custom data? Am I missing something here? My data are in the same format as the MNIST set's, but I can't find the correct syntax for making a prediction:
print sess.run(y_conv,feed_dict={x: custom_data})
Thanks a lot for any help !
y_conv will provide what you need to make predictions. You're probably just not understanding the form the data takes on in that tensor.
In your code you have a loss function and an optimizer:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
Note that you pass y_conv to the softmax_cross_entropy_with_logits method. y_conv at this point holds unscaled numbers (logits); larger values indicate stronger evidence for the corresponding class.
softmax will convert these to a probability distribution over all outputs, which notably maps every output into the [0,1] range. Cross-entropy then computes the error (cross-entropy assumes values in the [0,1] range).
It's common to simply create another tensor that actually computes the prediction; if you're using softmax, then:
prediction = tf.nn.softmax(y_conv)
That would give you the predicted probability distribution over the labels. Just request that tensor in your sess.run step.
If you just care about the most probable class, you can take the argmax of the y_conv values. Note also that you might want to experiment with tf.nn.sigmoid_cross_entropy_with_logits, which is tuned to producing results that are not a probability distribution (it scores each class independently: slightly better for single-class prediction, and mandatory for multilabel prediction).
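A minimal sketch of the full prediction call for this graph (note that the model above also has a keep_prob dropout placeholder, which must be fed at inference time; custom_data is assumed to be a float array of shape (N, 784)):
prediction = tf.nn.softmax(y_conv)      # probability distribution over the 10 classes
predicted_class = tf.argmax(y_conv, 1)  # index of the most probable class

probs, classes = sess.run(
    [prediction, predicted_class],
    feed_dict={x: custom_data, keep_prob: 1.0})  # keep_prob 1.0 disables dropout
print probs[0], classes[0]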

Placeholder missing error in TensorFlow for CNN

I am using TensorFlow to run a convolutional neural network on the MNIST database, but I am getting the following error.
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'x' with dtype float
[[Node: x = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
x = tf.placeholder(tf.float32, [None, 784], name='x') # mnist data image of shape 28*28=784
I thought I was correctly updating the value of x using feed_dict, but it's saying I haven't fed the placeholder x.
Also, is there any other logical flaw in my code?
Any help would be greatly appreciated. Thanks.
import tensorflow as tf
import numpy
from tensorflow.examples.tutorials.mnist import input_data
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
#x = tf.placeholder(tf.float32, [50, 784], name='x') # mnist data image of shape 28*28=784
#y = tf.placeholder(tf.float32, [50, 10], name='y') # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="bias")

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

# Initializing the variables
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for i in range(1000):
        print i
        batch_xs, batch_ys = mnist.train.next_batch(50)
        x_image = tf.reshape(x, [-1,28,28,1])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
        h_pool1 = max_pool_2x2(h_conv1)
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
        h_pool2 = max_pool_2x2(h_conv2)
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
        y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2)
        cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_conv), reduction_indices=[1]))
        sess.run(
            [cross_entropy, y_conv],
            feed_dict={x: batch_xs, y: batch_ys})
        correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y,1))
        print correct_prediction.eval()
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Why are you trying to create placeholder variables? You should be able to use the outputs generated by mnist.train.next_batch(50) directly, provided that you move the computation of correct_prediction and accuracy inside the model itself.
batch_xs, batch_ys = mnist.train.next_batch(50)
x_image = tf.reshape(batch_xs, [-1,28,28,1])
...
cross_entropy = tf.reduce_mean(-tf.reduce_sum(batch_ys * tf.log(y_conv), reduction_indices=[1]))
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(batch_ys,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
_, _, predictions_correct, acc = sess.run([cross_entropy, y_conv, correct_prediction, accuracy])
print predictions_correct, acc
You're receiving that error because you're attempting to run eval() on correct_prediction. That tensor requires the batch inputs (x and y) in order to be evaluated. You could correct the error by changing it to:
print correct_prediction.eval(feed_dict={x: batch_xs, y: batch_ys})
But as Benoit Steiner mentioned, you could just as easily pull it into the model.
On a more general note, you're not doing any kind of optimization here, but maybe you just haven't gotten around to that yet. As it stands now, it'll just print out bad predictions for a while. :)
Firstly, your x and y are commented out; if this is present in your actual code, it is very likely the issue.
correct_prediction.eval() is equivalent to tf.Session.run(correct_prediction) (or in your case sess.run()) and thus requires the same syntax. So it would need to be correct_prediction.eval(feed_dict={x: batch_xs, y: batch_ys}) in order to run. Be warned, however, that this is generally RAM intensive and may cause your system to hang. Pulling the accuracy function into the model may be a good idea because of the RAM usage.
I did not see an optimization function to utilize your cross entropy; however, I have never tried not using one, so if it works, don't fix it. But if it ends up throwing an error, you may want to try:
optimizer = tf.train.AdamOptimizer().minimize(cross_entropy)
and replace 'cross_entropy' in
sess.run([cross_entropy, y_conv], feed_dict={x: batch_xs, y: batch_ys})
with 'optimizer'
https://pythonprogramming.net/tensorflow-neural-network-session-machine-learning-tutorial/
Check the accuracy evaluation section of that script.
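Putting the suggestions from both answers together, a minimal sketch of a corrected training section (an assumption-laden rewrite, not the asker's exact fix: placeholders uncommented, the graph built once outside the loop, and the helper functions conv2d, max_pool_2x2, weight_variable and bias_variable reused from the question):
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# placeholders uncommented; the graph is defined once, before the session starts
x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.placeholder(tf.float32, [None, 10], name='y')

x_image = tf.reshape(x, [-1, 28, 28, 1])
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_pool1 = max_pool_2x2(tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1))
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_pool2 = max_pool_2x2(tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2))
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_fc1 = tf.nn.relu(tf.matmul(tf.reshape(h_pool2, [-1, 7 * 7 * 64]), W_fc1) + b_fc1)
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_conv), reduction_indices=[1]))
optimizer = tf.train.AdamOptimizer().minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(50)
        # one call performs the weight update and reports accuracy on the batch
        _, acc = sess.run([optimizer, accuracy], feed_dict={x: batch_xs, y: batch_ys})
        if i % 100 == 0:
            print 'step %d, training accuracy %g' % (i, acc)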
