I have trained a convolutional neural network with a batch size of 10.
However, when testing I want to predict the classification for each example separately, not in batches. This gives the error:
Assign requires shapes of both tensors to match. lhs shape= [1,3] rhs shape= [10,3]
I understand that 10 refers to the batch_size and 3 to the number of classes I am classifying into.
Can we not train using batches and test individually?
Update:
Training Phase:
batch_size=10
classes=3
# vlimit is some constant; the same for the training and testing phases
X = tf.placeholder(tf.float32, [batch_size, vlimit], name='X_placeholder')
Y = tf.placeholder(tf.int32, [batch_size, classes], name='Y_placeholder')
w = tf.Variable(tf.random_normal(shape=[vlimit, classes], stddev=0.01), name='weights')
b = tf.Variable(tf.ones([batch_size, classes]), name="bias")
logits = tf.matmul(X, w) + b
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y, name='loss')
loss = tf.reduce_mean(entropy)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)
Testing Phase:
batch_size=1
classes=3
X = tf.placeholder(tf.float32, [batch_size, vlimit], name='X_placeholder')
Y = tf.placeholder(tf.int32, [batch_size, classes], name='Y_placeholder')
w = tf.Variable(tf.random_normal(shape=[vlimit, classes], stddev=0.01), name='weights')
b = tf.Variable(tf.ones([batch_size, classes]), name="bias")
logits = tf.matmul(X, w) + b
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y, name='loss')
loss = tf.reduce_mean(entropy)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)
When you define your placeholder, use:
X = tf.placeholder(tf.float32, [None, vlimit], name='X_placeholder')
Y = tf.placeholder(tf.int32, [None, classes], name='Y_placeholder')
...
instead for both your training and testing phases (actually, you shouldn't need to re-define these for the testing phase). Also define your bias as:
b = tf.Variable(tf.ones([classes]), name="bias")
Otherwise you are training a separate bias for each sample in your batch, which is not what you want.
TensorFlow should automatically unroll along the first dimension of your input and recognize that as the batch size, so for training you can feed it batches of 10, and for testing you can feed it individual samples (or batches of 100 or whatever).
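Putting the pieces together, a minimal end-to-end sketch (the vlimit value and the random data are made up just to demonstrate the shapes, and Y is float32 here so the cross-entropy op accepts the one-hot labels directly):

import numpy as np
import tensorflow as tf

vlimit, classes, learning_rate = 20, 3, 0.01

# one graph serves both phases: the batch dimension is left as None
X = tf.placeholder(tf.float32, [None, vlimit], name='X_placeholder')
Y = tf.placeholder(tf.float32, [None, classes], name='Y_placeholder')
w = tf.Variable(tf.random_normal(shape=[vlimit, classes], stddev=0.01), name='weights')
b = tf.Variable(tf.ones([classes]), name="bias")  # one bias per class, not per sample

logits = tf.matmul(X, w) + b
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # train on a batch of 10 ...
    labels = np.eye(classes)[np.random.randint(classes, size=10)]
    sess.run(optimizer, feed_dict={X: np.random.rand(10, vlimit), Y: labels})
    # ... then predict on a single sample through the same placeholders
    print(sess.run(logits, feed_dict={X: np.random.rand(1, vlimit)}))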
Absolutely. Placeholders are 'buckets' that get fed data from your inputs. The only thing they do is direct data into your model. They can act like 'infinite buckets' using the None trick - you can chuck as much (or as little) data into them as you want (depending on available resources obviously).
In training, try replacing batch_size with None for the Training placeholders:
X = tf.placeholder(tf.float32, [None, vlimit], name='X_placeholder')
Y = tf.placeholder(tf.int32, [None, classes], name='Y_placeholder')
Then define everything else you have as before.
Then do some training ops, for example:
_, Tr_loss, Tr_acc = sess.run([optimizer, loss, accuracy], feed_dict={X: btc_x, Y: btc_y})
For testing, re-use these same placeholders (X,Y) and don't bother redefining the other variables.
All TensorFlow variables are static for a single TensorFlow graph definition. If you're restoring the model, the placeholders still exist from when it was trained, as do the other variables, e.g. w, b, logits, entropy and optimizer.
Then do some testing op, for example:
Ts_loss, Ts_acc = sess.run([loss, accuracy], feed_dict={X: test_x, Y: test_y})
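If training and testing happen in separate scripts, the restore flow mentioned above might look like this hedged sketch (the checkpoint path is hypothetical):

saver = tf.train.Saver()

# after training:
# saver.save(sess, "./model.ckpt")

# at test time, with the same graph defined (or imported), restore the trained
# values instead of re-initializing, then run the test ops:
# with tf.Session() as sess:
#     saver.restore(sess, "./model.ckpt")
#     Ts_loss, Ts_acc = sess.run([loss, accuracy], feed_dict={X: test_x, Y: test_y})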
Problem
I'm trying to classify some 64x64 images as a black-box exercise. The NN I have written doesn't change my weights. This is my first time writing something like this; the same code works just fine on MNIST letters input, but on this data it does not train like it should:
import tensorflow as tf
import numpy as np
path = ""
# x is a placeholder for the 64x64 image, flattened to 4096 values
x = tf.placeholder(tf.float32, shape=[None, 4096])
# y_ is a 1-element vector containing the predicted probability of the label
y_ = tf.placeholder(tf.float32, [None, 1])
# define weights and biases
W = tf.Variable(tf.zeros([4096, 1]))
b = tf.Variable(tf.zeros([1]))
# define our model
y = tf.nn.softmax(tf.matmul(x, W) + b)
# loss is cross entropy
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# each training step in gradient descent we want to minimize cross entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
train_labels = np.reshape(np.genfromtxt(path + "train_labels.csv", delimiter=',', skip_header=1), (14999, 1))
train_data = np.genfromtxt(path + "train_samples.csv", delimiter=',', skip_header=1)
# perform 150 training steps, each taking 100 training samples
for i in range(0, 15000, 100):
    sess.run(train_step, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]})
    if i % 500 == 0:
        print(sess.run(cross_entropy, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]}))
        print(sess.run(b), sess.run(W))
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.close()
How do I solve this problem?
The key to the problem is that the class number of your outputs y_ and y is 1. You should adopt one-hot encoding when you use tf.nn.softmax_cross_entropy_with_logits on classification problems in TensorFlow. tf.nn.softmax_cross_entropy_with_logits first computes tf.nn.softmax, and when your class number is 1 the results are all the same. For example:
import tensorflow as tf
y = tf.constant([[1],[0],[1]],dtype=tf.float32)
y_ = tf.constant([[1],[2],[3]],dtype=tf.float32)
softmax_var = tf.nn.softmax(logits=y_)
cross_entropy = tf.multiply(y, tf.log(softmax_var))
errors = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
with tf.Session() as sess:
    print(sess.run(softmax_var))
    print(sess.run(cross_entropy))
    print(sess.run(errors))
[[1.]
[1.]
[1.]]
[[0.]
[0.]
[0.]]
[0. 0. 0.]
This means that no matter what your output y_ is, your loss will be zero, so your weights and bias are never updated.
The solution is to modify the class number of y_ and y.
Suppose your class number is n.
First approach: convert the data to one-hot before feeding it, then use the following code.
y_ = tf.placeholder(tf.float32, [None, n])
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
Second approach: convert the data to one-hot after feeding it.
y_ = tf.placeholder(tf.int32, [None, 1])
y_ = tf.one_hot(y_, n)  # the dtype of y_ needs to be tf.int32
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
All your initial weights are zeros. When you initialize them that way, the NN doesn't learn well. You need to initialize all the initial weights with random values.
See Why should weights of Neural Networks be initialized to random numbers?
"Why Not Set Weights to Zero?
We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.
In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value.
"
See https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/
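Applied to the code in the question, that might look like the following sketch (the stddev value is an arbitrary, untuned choice):

W = tf.Variable(tf.truncated_normal([4096, n], stddev=0.1))  # small random weights
b = tf.Variable(tf.zeros([n]))  # zeros are fine for the bias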
In the following Keras and TensorFlow implementations of the training of a neural network, how is model.train_on_batch([x], [y]) in the Keras implementation different from sess.run([train_optimizer, cross_entropy, accuracy_op], feed_dict=feed_dict) in the TensorFlow implementation? In particular, how can those two lines lead to different computations in training?
keras_version.py
input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes, activation="softmax")(input_x)
model = Model([input_x], [c])
opt = Adam(lr)
model.compile(loss=['categorical_crossentropy'], optimizer=opt)
nb_batchs = int(len(x_train)/batch_size)
for epoch in range(epochs):
    loss = 0.0
    for batch in range(nb_batchs):
        x = x_train[batch*batch_size:(batch+1)*batch_size]
        y = y_train[batch*batch_size:(batch+1)*batch_size]
        loss_batch, acc_batch = model.train_on_batch([x], [y])
        loss += loss_batch
    print(epoch, loss / nb_batchs)
tensorflow_version.py
input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes)(input_x)
input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c, name="xentropy"),
    name="xentropy_mean"
)
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)
nb_batchs = int(len(x_train)/batch_size)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        loss = 0.0
        acc = 0.0
        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]
            feed_dict = {input_x: x,
                         input_y: y}
            _, loss_batch = sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict)
            loss += loss_batch
        print(epoch, loss / nb_batchs)
Note: This question follows Same (?) model converges in Keras but not in Tensorflow, which was considered too broad, but in which I show exactly why I think those two statements are somehow different and lead to different computations.
Yes, the results can be different. The results shouldn't be surprising if you know the following things in advance:
The implementation of cross-entropy in TensorFlow and Keras is different. TensorFlow assumes the input to tf.nn.softmax_cross_entropy_with_logits_v2 to be the raw unnormalized logits, while Keras accepts inputs as probabilities.
The implementations of the optimizers in Keras and TensorFlow are different.
It might be the case that you are shuffling the data and the batches aren't passed in the same order. Although it doesn't matter if you run the model for long, the initial few epochs can be entirely different. Make sure the same batch is passed to both and then compare the results.
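A quick, self-contained demonstration of the first point (the printed numbers are approximate): feeding probabilities where logits are expected silently applies softmax a second time and changes the loss.

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# loss from raw logits, as tf.nn.softmax_cross_entropy_with_logits_v2 expects
loss_from_logits = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
# feeding already-softmaxed probabilities into the same op applies softmax twice
probs = tf.nn.softmax(logits)
loss_double_softmax = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=probs)

with tf.Session() as sess:
    print(sess.run([loss_from_logits, loss_double_softmax]))  # roughly [0.42] vs [0.80]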
I am trying to write a two layer neural network to train a class labeler. The input to the network is a 150-feature list of about 1000 examples; all features on all examples have been L2 normalized.
I only have two outputs, and they should be disjoint--I am just attempting to predict whether the example is a one or a zero.
My code is relatively simple; I am feeding the input data into the hidden layer, and then the hidden layer into the output. As I really just want to see this working in action, I am training on the entire data set with each step.
My code is below. Based on the other NN implementations I have referred to, I believe that the performance of this network should be improving over time. However, regardless of the number of epochs I set, I am getting back an accuracy of about ~20%. The accuracy does not change when the number of steps is changed, so I don't believe that my weights and biases are being updated.
Is there something obvious I am missing with my model? Thanks!
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.01
epochs = 200
n_input = 150
n_hidden = 75
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden]))
b1 = tf.Variable(tf.truncated_normal([n_output]))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden]))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output]))
# step function
def returnPred(x, w0, w1, b0, b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h  # return the output of the network
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
model = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0, epochs):
    sess.run(model, feed_dict={x: inputs, y: labels})  # train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
I changed your optimizer to AdamOptimizer (in many cases it performs better than GradientDescentOptimizer).
I also played a bit with the parameters. In particular, I took smaller std for your variable initialization, decreased learning rate (as your loss was unstable and "jumped around") and increased epochs (as I noticed that your loss continues to decrease).
I also reduced the size of the hidden layer. It is harder to train networks with large hidden layer when you don't have that much data.
Regarding your loss, it is better to apply tf.reduce_mean on it so that the loss is a single number. In addition, following the answer of ml4294, I used softmax instead of sigmoid, so the loss looks like:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y))
The code below achieves accuracy of around 99.9% on the training data:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.002
epochs = 400
n_input = 150
n_hidden = 60
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden],stddev=0.2,seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output],stddev=0.2,seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden],stddev=0.2,seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output],stddev=0.2,seed=0))
# step function
def returnPred(x, w0, w1, b0, b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h  # return the output of the network
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y)) # calculate loss between prediction and actual
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0, epochs):
    sess.run([model, loss], feed_dict={x: inputs, y: labels})  # train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
Just a suggestion in addition to the answer provided by Miriam Farber:
You use a multi-dimensional output label ([0., 1.]) for the classification. I suggest using the softmax cross entropy tf.nn.softmax_cross_entropy_with_logits() instead of the sigmoid cross entropy, since you assume the outputs to be disjoint (see softmax on Wikipedia). I achieved much faster convergence with this small modification.
This should also improve your performance once you decide to increase your output dimensionality from 2 to a higher number.
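A small sketch of the difference, in case it helps: the sigmoid variant scores each output independently, while the softmax variant makes the logits compete, which is what you want for disjoint classes.

import tensorflow as tf

logits = tf.constant([[2.0, -1.0]])
labels = tf.constant([[1.0, 0.0]])

sig = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels)
soft = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)

with tf.Session() as sess:
    print(sess.run(sig))   # shape (1, 2): one loss per output
    print(sess.run(soft))  # shape (1,): one loss per example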
I guess you have some problem here:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
It should look something like this:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y))
I didn't look at your code much, so if this doesn't work out you can check the Udacity deep learning course or forums; they have good samples of what you are trying to do.
Good luck!
I am new to TensorFlow and have difficulties understanding the RNN module. I am trying to extract hidden/cell states from an LSTM.
For my code, I am using the implementation from https://github.com/aymericdamien/TensorFlow-Examples.
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))}
biases = {'out': tf.Variable(tf.random_normal([n_classes]))}
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    #with tf.variable_scope('RNN'):
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out'], states
pred, states = RNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
Now I want to extract the cell/hidden state for each time step in a prediction. The state is stored in an LSTMStateTuple of the form (c, h), which I can verify by evaluating print states. However, calling print states.c.eval() (which, according to the documentation, should give me the values in the tensor states.c) yields an error stating that my variables are not initialized, even though I am calling it right after predicting something. The code for this is here:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='RNN'):
        print v.name
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        print states.c.eval()
        # Calculate batch accuracy
        acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
        step += 1
    print "Optimization Finished!"
and the error message is
InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The states are also not visible in tf.all_variables(), only the trained matrix/bias tensors are (as described here: Tensorflow: show or save forget gate values in LSTM). I don't want to build the whole LSTM from scratch, though, since I have the states in the states variable; I just need to fetch it.
You may simply collect the values of the states in the same way accuracy is collected.
I guess pred_vals, state_vals, acc = sess.run([pred, states, accuracy], feed_dict={x: batch_x, y: batch_y}) should work perfectly fine.
One comment about your assumption: states only has the values of the "hidden state" and the "memory cell" from the last timestep.
The "outputs" contain the "hidden state" from each time step you want (the size of outputs is [batch_size, seq_len, hidden_size]). So I assume that you want the outputs variable, not states. See the documentation.
I have to disagree with the answer of user3480922. For the code:
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
to be able to extract the hidden state for each time_step in a prediction, you have to use outputs, because outputs holds the hidden state value for each time_step. However, I am not sure whether there is any way to store the values of the cell state for each time_step as well, because the states tuple provides the cell state values, but only for the last time_step.
For example, in the following sample with 5 time_steps (time_step = 0,...,4), outputs[4,:,:] holds the hidden state values for time_step=4, and the states tuple h likewise only has the hidden state values for time_step=4. The states tuple c has the cell value at time_step=4 though.
outputs = [[[ 0.0589103 -0.06925126 -0.01531546 0.06108122]
[ 0.00861215 0.06067181 0.03790079 -0.04296958]
[ 0.00597713 0.03916606 0.02355802 -0.0277683 ]]
[[ 0.06252582 -0.07336216 -0.01607122 0.05024602]
[ 0.05464711 0.03219429 0.06635305 0.00753127]
[ 0.05385715 0.01259535 0.0524035 0.01696803]]
[[ 0.0853352 -0.06414541 0.02524283 0.05798233]
[ 0.10790729 -0.05008117 0.03003334 0.07391824]
[ 0.10205664 -0.04479517 0.03844892 0.0693808 ]]
[[ 0.10556188 0.0516542 0.09162509 -0.02726674]
[ 0.11425048 -0.00211394 0.06025286 0.03575509]
[ 0.11338984 0.02839304 0.08105748 0.01564003]]
**[[ 0.10072514 0.14767936 0.12387902 -0.07391471]
[ 0.10510238 0.06321315 0.08100517 -0.00940042]
[ 0.10553667 0.0984127 0.10094948 -0.02546882]]**]
states = LSTMStateTuple(c=array([[ 0.23870754, 0.24315512, 0.20842518, -0.12798975],
[ 0.23749796, 0.10797793, 0.14181322, -0.01695861],
[ 0.2413336 , 0.16692916, 0.17559692, -0.0453596 ]], dtype=float32), h=array(**[[ 0.10072514, 0.14767936, 0.12387902, -0.07391471],
[ 0.10510238, 0.06321315, 0.08100517, -0.00940042],
[ 0.10553667, 0.0984127 , 0.10094948, -0.02546882]]**, dtype=float32))
I'm trying to (re)train AlexNet (based on the code found here) for a particular binary classification problem. Since my GPU is not very powerful, I settled on a batch size of 8 for training. This size determines the shape of the input tensor (8,227,227,3). However, one can use a larger batch size for the testing process, since there is no backprop involved.
My question is, how could I reuse the already trained hidden layers to create a different network on the same graph specifically for testing?
Here's a snippet of what I have tried to do:
NUM_TRAINING_STEPS = 200
BATCH_SIZE = 1
LEARNING_RATE = 1e-1
IMAGE_SIZE = 227
NUM_CHANNELS = 3
NUM_CLASSES = 2
def main():
    graph = tf.Graph()
    trace = Tracer()
    train_data = readImage(filename1)
    test_data = readImage(filename2)
    train_labels = np.array([[0.0, 1.0]])

    with graph.as_default():
        batch_data = tf.placeholder(tf.float32, shape=(BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
        batch_labels = tf.placeholder(tf.float32, shape=(BATCH_SIZE, NUM_CLASSES))
        logits_training = createNetwork(batch_data)
        loss = lossLayer(logits_training, batch_labels)
        train_prediction = tf.nn.softmax(logits_training)
        print 'Prediction shape: ' + str(train_prediction.get_shape())
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
        test_placeholder = tf.placeholder(tf.float32, shape=(1, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
        logits_test = createNetwork(test_placeholder)
        test_prediction = tf.nn.softmax(logits_test)

    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        for step in range(NUM_TRAINING_STEPS):
            print 'Step #: ' + str(step+1)
            feed_dict = {batch_data: train_data, batch_labels: train_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        feed_dict = {batch_data: test_data, test_placeholder: test_data}
        logits1, logits2 = session.run([logits_training, logits_test], feed_dict=feed_dict)
        print (logits1 - logits2)
    return
I'm only training with a single image, just to evaluate whether the network is actually being trained and whether the values of logits1 and logits2 are the same. They are not; they differ by several orders of magnitude.
createNetwork is a function which loads the weights for AlexNet and builds the model, based on the code for the myalexnet.py script found on the page to which I linked.
I've tried to replicate the examples from the Udacity course on Deep Learning, in particular, assignments 3 and 4.
If anyone could figure out how I could use the same layers for training and testing, I would be very grateful.
Use shape=None for your placeholders (see the placeholder docs).
This way you can feed any shape of data. Another (worse) option is to recreate your graph for testing with the shapes that you need, and load the ckpt that was created during training.
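As a sketch of the first option applied to the question's code (createNetwork is the question's own helper, assumed here to be called exactly once; each extra call presumably creates its own set of variables, of which only the first is trained, which would also explain why logits1 and logits2 diverge):

batch_data = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
batch_labels = tf.placeholder(tf.float32, shape=(None, NUM_CLASSES))

logits = createNetwork(batch_data)  # a single network, hence a single set of variables
prediction = tf.nn.softmax(logits)

# train with batches of 8, test with a single image (or any batch size):
# session.run(optimizer, feed_dict={batch_data: train_data, batch_labels: train_labels})
# session.run(prediction, feed_dict={batch_data: test_data})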