Does anyone have an example of code that uses tf.train.AdadeltaOptimizer with good results?
I have a TF graph that was originally set up with tf.train.AdamOptimizer and is working well. When I replace it with AdadeltaOptimizer, with the default params, it gives lousy results.
I used Cuda 7.5.
Below is example code that works with the 'AdadeltaOptimizer'. It also works with 'Adam'. The main difference between them is that Adam is fairly insensitive to the "learning rate", while 'Adadelta' is sensitive to it.
I advise you to read more about optimization algorithms (like here).
In your own example, just try making the 'learning rate' smaller or bigger (this is called 'hyperparameter optimization').
From my experience, 'Adam' is a very good optimizer for RNNs, better than 'AdaDelta' (with the example code, 'Adam' achieves a better score much faster). On the other hand, for CNNs, SGD+Momentum works best.
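To make the learning-rate sensitivity concrete, here is a minimal pure-Python sketch of an Adadelta-style update on f(x) = x**2. It is a simplified scalar version written for illustration, not TensorFlow's implementation; the function name and the starting point are made up, and the values rho=0.95, epsilon=1e-8 and learning_rate=0.001 are assumed to match the TF 1.x defaults.

```python
import math

def adadelta_minimize(learning_rate, steps=500, rho=0.95, epsilon=1e-8, x0=5.0):
    """Minimize f(x) = x**2 with a scalar Adadelta-style update; returns the final x."""
    x = x0
    avg_sq_grad = 0.0    # running average of squared gradients
    avg_sq_update = 0.0  # running average of squared (unscaled) updates
    for _ in range(steps):
        grad = 2.0 * x
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
        update = math.sqrt(avg_sq_update + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
        avg_sq_update = rho * avg_sq_update + (1.0 - rho) * update ** 2
        # The learning rate scales the already-adaptive step.
        x -= learning_rate * update
    return x

x_small = adadelta_minimize(learning_rate=0.001)  # assumed TF default
x_large = adadelta_minimize(learning_rate=1.0)
print("lr=0.001 -> x = %.5f" % x_small)
print("lr=1.0   -> x = %.5f" % x_large)
```

With the small learning rate the parameter barely leaves its starting point in 500 steps, which is one plausible reason a graph tuned for Adam looks "lousy" after a drop-in switch to Adadelta with default settings.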
Code that learns MNIST classification using a Bi-LSTM:
# Mnist classification using Bi-LSTM
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
learning_rate = 0.01
training_epochs = 100
batch_size = 64
seq_length = 28
heigh_image = 28
hidden_size = 128
class_numer = 10
input = tf.placeholder(tf.float32, [None, None, heigh_image])
target = tf.placeholder(tf.float32, [None, class_numer])
seq_len = tf.placeholder(tf.int32, [None])
def fulconn_layer(input_data, output_dim, activation_func=None):
    input_dim = int(input_data.get_shape()[1])
    W = tf.Variable(tf.random_normal([input_dim, output_dim]))
    b = tf.Variable(tf.random_normal([output_dim]))
    if activation_func:
        return activation_func(tf.matmul(input_data, W) + b)
    return tf.matmul(input_data, W) + b
with tf.name_scope("BiLSTM"):
    with tf.variable_scope('forward'):
        lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, forget_bias=1.0, state_is_tuple=True)
    with tf.variable_scope('backward'):
        lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, forget_bias=1.0, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell, cell_bw=lstm_bw_cell, inputs=input, sequence_length=seq_len, dtype=tf.float32, scope="BiLSTM")
# As we have Bi-LSTM, we have two output, which are not connected. So merge them
outputs = tf.concat(2, outputs)
# As we want do classification, we only need the last output from LSTM.
last_output = outputs[:,0,:]
# Create the final classification layer
yhat = fulconn_layer(last_output, class_numer)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(yhat, target))
optimizer = tf.train.AdadeltaOptimizer(learning_rate=learning_rate).minimize(cost) # AdamOptimizer
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(target, 1), tf.argmax(yhat, 1)), tf.float32))
gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts)) as session:
    # Variables must be initialized before the first training step
    session.run(tf.initialize_all_variables())
    print("Start Learning")
    for epoch in range(training_epochs):
        for i in range(int(mnist.train.num_examples / batch_size)):
            x_batch, y_batch = mnist.train.next_batch(batch_size)
            x_batch = x_batch.reshape([batch_size, seq_length, heigh_image])
            train_seq_len = np.ones(batch_size) * seq_length
            session.run([optimizer], feed_dict={input: x_batch, target: y_batch, seq_len: train_seq_len})
        # Accuracy on the last training batch of the epoch
        train_accuracy = session.run(accuracy, feed_dict={input: x_batch, target: y_batch, seq_len: train_seq_len})
        x_test = mnist.test.images.reshape([-1, seq_length, heigh_image])
        y_test = mnist.test.labels
        test_seq_len = np.ones(x_test.shape[0]) * seq_length
        test_accuracy = session.run(accuracy, feed_dict={input: x_test, target: y_test, seq_len: test_seq_len})
        print("epoch: %d, train_accuracy: %3f, test_accuracy: %3f" % (epoch, train_accuracy, test_accuracy))
I'm trying to create a neural network model for a Kaggle competition using the MNIST dataset. Currently, my code looks like this, since I am trying to capture certain metrics. However, I can't figure out how to turn this into an output to submit.
import time
import tensorflow.compat.v1 as tf
import pandas as pd
import numpy as np
from tensorflow.python.framework import ops
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Legacy Python that doesn't verify HTTPS certificates by default
    pass
else:
    # Handle target environment that doesn't support HTTPS verification
    ssl._create_default_https_context = _create_unverified_https_context
# Load training and testing data directly from TensorFlow
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
# Initialize metrics
metrics = {}
# Initialize metric names
names = ['Number of Hidden Layers', 'Nodes per Layer', 'Time in Seconds',
'Training Set Accuracy', 'Test Set Accuracy']
# Set fixed parameters
n_epochs = 20
batch_size = 50
learning_rate = 0.01
# Function that creates batch generator used in training
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
# Start timer
start = time.process_time()
n_hidden = 300
# Reset the session
ops.reset_default_graph()
# Set X and y placeholders
X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, 10, name="outputs")
    y_proba = tf.nn.softmax(logits)
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    acc_train = accuracy.eval(feed_dict={X: X_train, y: y_train})
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
# Record the clock time it takes
duration = time.process_time() - start
metrics['Model 1'] = [2, n_hidden, duration, acc_train, acc_test]
# Convert metrics dictionary to dataframe for display
results_summary = pd.DataFrame.from_dict(metrics, orient='index')
results_summary.columns = names
results_summary.reset_index(inplace=True)
# Sort by model number
results_summary.sort_values(by=['index'], axis=0, inplace=True)
results_summary.set_index(['index'], inplace=True)
results_summary.index.name = None
# Export to csv
I need to create an output that looks something like this in a CSV file:
ImageId Label
0 1 2
1 2 0
2 3 9
3 4 0
4 5 3
Would I have to recreate the whole thing in order to actually create "y_pred" when doing something like model.predict(X_test), or can I just reshape the existing code in some way to do this? Ideally, I would like to capture predicted values and compare them to true values using a confusion matrix.
Since you're using old-style TensorFlow, predict in the same style at the end:
feed_dict = {X: X_test}
classification = sess.run(y_proba, feed_dict)
label = np.argmax(classification, axis=-1)
Run this inside the existing with tf.Session() block, after training. A new session would start with uninitialized variables, so the trained weights would be lost unless you save and restore a checkpoint.
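Once the predicted labels exist, turning them into a submission file and a confusion matrix takes only a few lines. The sketch below uses random stand-in data instead of a real session so that it runs on its own; the variable names, the 1-based ImageId column, and the array sizes are assumptions based on the table shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: in the question these would come from
# sess.run(y_proba, {X: X_test}) and from the true test labels.
rng = np.random.default_rng(0)
probabilities = rng.random((6, 10))       # fake class scores for 6 images
y_true = rng.integers(0, 10, size=6)      # fake true labels

y_pred = np.argmax(probabilities, axis=-1)

# Submission table: 1-based ImageId plus the predicted Label (assumed format).
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(y_pred) + 1),
    "Label": y_pred,
})
csv_text = submission.to_csv(index=False)

# Confusion matrix: rows are true classes, columns are predicted classes.
n_classes = 10
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)
print(confusion)
```

To write the file to disk, pass a path instead, for example submission.to_csv("submission.csv", index=False).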
I'm trying to write this tensorflow tutorial and I got the below error:
ValueError: No gradients provided for any variable: ['Variable:0',
'Variable_1:0', 'Variable:0', 'Variable_1:0', 'Variable:0',
'Variable_1:0', 'Variable_6:0', 'Variable_7:0'].
import tensorflow as tf
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
mnist = tf.keras.datasets.mnist
n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500
n_classes = 10
batch_size = 100
x = tf.compat.v1.placeholder('float', [None, 784])
y = tf.compat.v1.placeholder('float')
class NN:
    def __init__(self):
        self.hidden_1_layer = {}
        self.hidden_2_layer = {}
        self.hidden_3_layer = {}
        self.output_layer = {}
    def neural_network_model(self, data):
        self.hidden_1_layer = {'weights': tf.Variable(tf.compat.v1.random.normal([784, n_nodes_hl1])),
                               'biases': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl1]))}
        self.hidden_2_layer = {'weights': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl1, n_nodes_hl2])),
                               'biases': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl2]))}
        self.hidden_3_layer = {'weights': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl2, n_nodes_hl3])),
                               'biases': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl3]))}
        self.output_layer = {'weights': tf.Variable(tf.compat.v1.random.normal([n_nodes_hl3, n_classes])),
                             'biases': tf.Variable(tf.compat.v1.random.normal([n_classes]))}
        l1 = tf.add(tf.matmul(data, self.hidden_1_layer['weights']), self.hidden_1_layer['biases'])
        l1 = tf.nn.relu(l1)
        l2 = tf.add(tf.matmul(l1, self.hidden_2_layer['weights']), self.hidden_2_layer['biases'])
        l2 = tf.nn.relu(l2)
        l3 = tf.add(tf.matmul(l2, self.hidden_3_layer['weights']), self.hidden_3_layer['biases'])
        l3 = tf.nn.relu(l3)
        output = tf.matmul(l3, self.output_layer['weights']) + self.output_layer['biases']
        return output
    def train_neural_network(self, x):
        prediction = self.neural_network_model(x)
        cost = lambda: tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
        optimizer = tf.optimizers.Adam().minimize(cost, var_list=[self.hidden_1_layer['weights'], self.hidden_1_layer['biases'], self.hidden_2_layer['weights'], self.hidden_2_layer['biases'], self.hidden_3_layer['weights'], self.hidden_3_layer['biases'], self.output_layer['weights'], self.output_layer['biases']])
        hm_epochs = 10
        with tf.compat.v1.Session() as sess:
            sess.run(tf.compat.v1.global_variables_initializer())
            for epoch in range(hm_epochs):
                epoch_loss = 0
                for _ in range(int(mnist.train.num_examples / batch_size)):
                    epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                    _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                    epoch_loss += c
                print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
            correct = tf.math.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
            accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
            print('Accuracy:', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
model = NN()
model.train_neural_network(x)
The TensorFlow version is 2.1.0.
[edit]: changed to initialize the layers in __init__
Where is the problem?
Oh my... If you're new to TensorFlow, you should really become familiar with layers, training, testing, building models, and so on. From what I can tell, what you have above is a detailed expansion of what goes on under the hood of the following code:
import tensorflow as tf
mnist = tf.keras.datasets.mnist.load_data()
model = tf.keras.models.Sequential([
#Trains in 1 min or less on cpu. Few seconds on gpu.
#Plot some data to see what you trained.
import matplotlib.pyplot as plt
#Does this look like a 5?
#Here are first few predictions for the images.
#Notice the first image was predicted to be a 5 by the trained model.
#Here is what we trained to predict.
#Do our labels match what our model predicts?
#Accuracy (match the predicted against our labels)
matching = model.predict(mnist[1][0]).round().argmax(1) == mnist[1][1]
I got an accuracy of 93.95%. Play with the layers, change 'relu' to 'sigmoid', change the 500 neurons to 400. Try 'sgd' instead of 'adam'. Do more epochs. Increase the batch_size to 1000. What trains faster? What trains slower? Can you get an accuracy of 95%, 98%, 99.9%? Hope this helps.
I'm currently learning how to use TensorFlow and I'm having some issues implementing this softmax regression application.
There's no error when running it but, for some reason, the validation and test predictions show no improvement; only the training prediction improves.
I'm using stochastic gradient descent (SGD) with minibatches in order to converge faster, but I don't know if this could be causing the trouble somehow.
I'd be thankful if you could share some ideas. Here's the full code:
import input_data
import numpy as np
import random as ran
import tensorflow as tf
import matplotlib.pyplot as plt
mnist = input_data.read_data_sets('MNIST_Data/', one_hot=True)
#Features & Data
num_features = 784
num_labels = 10
learning_rate = 0.05
batch_size = 128
num_steps = 5001
train_dataset = mnist.train.images
train_labels = mnist.train.labels
test_dataset = mnist.test.images
test_labels = mnist.test.labels
valid_dataset = mnist.validation.images
valid_labels = mnist.validation.labels
graph = tf.Graph()
with graph.as_default():
    tf_train_data = tf.placeholder(tf.float32, shape=(batch_size, num_features))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_data = tf.constant(valid_dataset)
    tf_test_data = tf.constant(test_dataset)
    W = tf.Variable(tf.truncated_normal([num_features, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    score_vector = tf.matmul(tf_train_data, W) + b
    cost_func = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=tf_train_labels, logits=score_vector))
    score_valid = tf.matmul(tf_test_data, W) + b
    score_test = tf.matmul(tf_valid_data, W) + b
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_func)
    train_pred = tf.nn.softmax(score_vector)
    valid_pred = tf.nn.softmax(score_valid)
    test_pred = tf.nn.softmax(score_test)
def accuracy(predictions, labels):
    correct_pred = np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
    accu = (100.0 * correct_pred) / predictions.shape[0]
    return accu
with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()
    for step in range(num_steps):
        offset = np.random.randint(0, train_labels.shape[0] - batch_size - 1)
        batch_data = train_dataset[offset:(offset+batch_size), :]
        batch_labels = train_labels[offset:(offset+batch_size), :]
        feed_dict = {tf_train_data : batch_data,
                     tf_train_labels : batch_labels}
        _, l, predictions = sess.run([optimizer, cost_func, train_pred],
                                     feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step {0}: {1}".format(step, l))
            print("Minibatch accuracy: {:.1f}%".format(
                accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}%".format(
                accuracy(valid_pred.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}%".format(
        accuracy(test_pred.eval(), test_labels)))
It sounds like overfitting, which isn't surprising since this model is basically a linear model (softmax regression).
There are a few options you can try:
1. Add hidden layers and activation functions (the ELU paper works on the MNIST data set with a vanilla DNN).
2. Use either a CNN or an RNN, although a CNN is more apt for image problems.
3. Use a better optimizer. If you are new, try the Adam optimizer, and then move on to momentum with Nesterov.
Without feature engineering, it'll be hard to pull off image classification using just a linear model. Also, you do not need to run softmax on your outputs to get predictions: softmax is a smooth, order-preserving version of argmax, so taking the argmax of the raw scores gives the same class. Lastly, you should use (None, num_features) as the placeholder shape to allow a variable batch size. This lets you feed your validation and test datasets directly through feed_dict without creating additional tensors.
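The point about softmax and argmax can be checked with a small NumPy sketch. The scores below are random stand-ins for the logits a model would produce, and the helper name is made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
scores = rng.normal(size=(5, 10))   # hypothetical logits: 5 samples, 10 classes

pred_from_scores = np.argmax(scores, axis=1)
pred_from_softmax = np.argmax(softmax(scores), axis=1)

# Softmax preserves the ordering within each row, so both predictions agree.
same = np.array_equal(pred_from_scores, pred_from_softmax)
print(same)
```

Softmax is still needed inside the loss (or when calibrated probabilities are wanted); it is only redundant for picking the predicted class.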
I'm trying to create a neural network that takes 13 features as input from multiple csv files one at a time and measure accuracy after each iteration. Here is my code snippet:
import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
import os
import pandas as pd
n_inputs = 13
n_hidden1 = 30
n_hidden2 = 10
n_outputs = 2
learning_rate = 0.01
n_epochs = 40
batch_size = 1
patient_id = os.listdir('./subset_numerical')
output = pd.read_csv('output.csv')
sepsis_pat = output['output'].tolist()
X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
y = tf.placeholder(tf.int64, shape=[None], name="y")
def data_processor(n):
    id = pd.read_csv('./subset_numerical/'+patient_id[n])
    id_input = np.array([id['VALUE'].tolist()])
    for s in sepsis_pat:
        if str(s) == str(patient_id[n].split('.')[0]):
            a = 1
    if a == 1:
        a = 0
        return [id_input, np.array([1])]
    else:
        return [id_input, np.array([0])]
def test_set():
    id_combined = []
    out = []
    for p in range(300, len(patient_id)):
        id1 = pd.read_csv('./subset_numerical/' + patient_id[p])
        id_input1 = np.array(id1['VALUE'].tolist())
        id_combined.append(id_input1)
        for s in sepsis_pat:
            if str(s) == str(patient_id[p].split('.')[0]):
                a = 1
        if a == 1:
            a = 0
            out.append([1, 0])
        else:
            out.append([0, 1])
    return [np.array(id_combined), np.array(out)]
# Declaration of hidden layers and calculation of loss goes here
# Construction phase begins
with tf.name_scope("dnn"):
    hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
    hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
    logits = fully_connected(hidden2, n_outputs, scope="outputs", activation_fn=None) # We will apply softmax here later
# Calculating loss
with tf.name_scope("loss"):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
# Training with gradient descent optimizer
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
# Measuring accuracy
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    accuracy_summary = tf.summary.scalar('accuracy', accuracy)
# Variable initialization and saving model goes here
init = tf.global_variables_initializer()
saver = tf.train.Saver()
# Construction is finished. Let's get this to work.
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        a = 0
        for iteration in range(300 // batch_size):
            X_batch, y_batch = data_processor(iteration)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        X_test, y_test = test_set()
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")
But I'm stuck with this error:
logits and labels must be same size: logits_size=[1,2] labels_size=[1,1]
The error seems to occur at this line:
correct = tf.nn.in_top_k(logits, y, 1)
What am I doing wrong?
Based on your error log provided, the problem is in this line of your code:
xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
Ensure that labels and logits have the same shape and dtype.
The shape should be of the format [batch_size, num_classes] and dtype should be of type float16, float32 or float64. Check the documentation of softmax_cross_entropy_with_logits for more details.
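Since you've defined n_outputs = 2, the shape of logits is [?, 2] (? means batch size), while the shape of y is just [?]. softmax_cross_entropy_with_logits expects labels with the same shape as logits, i.e. one-hot (or probability) rows of shape [?, 2].
Solution: either one-hot encode y to shape [?, 2] with a float dtype, or keep the integer class ids of shape [?] and switch to tf.nn.sparse_softmax_cross_entropy_with_logits, which takes the same label format that tf.nn.in_top_k expects. Setting n_outputs = 1 would make the shapes match, but a softmax over a single logit is always 1, so the loss would be identically zero and nothing would be learned.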
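The label shapes involved can be illustrated without TensorFlow. The following NumPy sketch re-implements the two cross-entropy variants by hand (the helper names and the toy logits are made up for illustration) and shows what happens when there is only a single output unit.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def dense_xent(onehot_labels, logits):
    """Labels have the same shape as logits: [batch, n_classes]."""
    return -(onehot_labels * log_softmax(logits)).sum(axis=1)

def sparse_xent(class_ids, logits):
    """Labels are integer class ids of shape [batch]."""
    return -log_softmax(logits)[np.arange(len(class_ids)), class_ids]

logits = np.array([[2.0, -1.0], [0.5, 1.5], [0.0, 0.0]])  # shape [3, 2]
class_ids = np.array([0, 1, 1])                            # shape [3]
onehot = np.eye(2)[class_ids]                              # shape [3, 2]

loss_dense = dense_xent(onehot, logits)
loss_sparse = sparse_xent(class_ids, logits)

# With a single output unit the softmax is always 1, so the loss is always 0.
single_logit = np.array([[3.0], [-7.0], [0.2]])
loss_single = dense_xent(np.ones((3, 1)), single_logit)
print(loss_dense, loss_sparse, loss_single)
```

Both label formats give the same per-example loss; they differ only in the shape and dtype of the labels that have to be fed.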
I'm working on an RBF network using Tensorflow, but there's this error that comes up at line 112 that says this: ValueError: Cannot feed value of shape (40, 13) for Tensor 'Placeholder:0', which has shape '(?, 12)'
Here's my code below. I created my own activation function for my RBF network by following this tutorial. Also, if there is anything else you notice that needs to be fixed, please point it out to me, because I am very new to Tensorflow so it would be helpful to get any feedback I can get.
import tensorflow as tf
import numpy as np
import math
from sklearn import datasets
from sklearn.model_selection import train_test_split
from tensorflow.python.framework import ops
boston = datasets.load_boston()
data = boston["data"]
target = boston["target"]
N_INSTANCES = data.shape[0]
N_INPUT = data.shape[1] - 1
batch_size = 40
training_epochs = 400
learning_rate = 0.001
display_step = 20
hidden_size = 200
target_ = np.zeros((N_INSTANCES, N_CLASSES))
data_train, data_test, target_train, target_test = train_test_split(data, target_, test_size=0.1, random_state=100)
x_data = tf.placeholder(shape=[None, N_INPUT], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, N_CLASSES], dtype=tf.float32)
# creates activation function
def gaussian_function(input_layer):
    initial = math.exp(-2*math.pow(input_layer, 2))
    return initial
np_gaussian_function = np.vectorize(gaussian_function)
def d_gaussian_function(input_layer):
    initial = -4 * input_layer * math.exp(-2*math.pow(input_layer, 2))
    return initial
np_d_gaussian_function = np.vectorize(d_gaussian_function)
np_d_gaussian_function_32 = lambda input_layer: np_d_gaussian_function(input_layer).astype(np.float32)
def tf_d_gaussian_function(input_layer, name=None):
    with ops.name_scope(name, "d_gaussian_function", [input_layer]) as name:
        y = tf.py_func(np_d_gaussian_function_32, [input_layer], [tf.float32], name=name, stateful=False)
    return y[0]
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Register the custom gradient under a unique name, then map PyFunc to it
    rnd_name = 'PyFunGrad' + str(np.random.randint(0, 1E+8))
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
def gaussian_function_grad(op, grad):
    input_variable = op.inputs[0]
    n_gr = tf_d_gaussian_function(input_variable)
    return grad * n_gr
np_gaussian_function_32 = lambda input_layer: np_gaussian_function(input_layer).astype(np.float32)
def tf_gaussian_function(input_layer, name=None):
    with ops.name_scope(name, "gaussian_function", [input_layer]) as name:
        y = py_func(np_gaussian_function_32, [input_layer], [tf.float32], name=name, grad=gaussian_function_grad)
    return y[0]
# end of defining activation function
def rbf_network(input_layer, weights):
    layer1 = tf.matmul(tf_gaussian_function(input_layer), weights['h1'])
    layer2 = tf.matmul(tf_gaussian_function(layer1), weights['h2'])
    output = tf.matmul(tf_gaussian_function(layer2), weights['output'])
    return output
weights = {
    'h1': tf.Variable(tf.random_normal([N_INPUT, hidden_size], stddev=0.1)),
    'h2': tf.Variable(tf.random_normal([hidden_size, hidden_size], stddev=0.1)),
    'output': tf.Variable(tf.random_normal([hidden_size, N_CLASSES], stddev=0.1))
}
pred = rbf_network(x_data, weights)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y_target))
my_opt = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y_target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
# Training loop
for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(data_train.shape[0] / batch_size)
    for i in range(total_batch):
        randidx = np.random.randint(int(TRAIN_SIZE), size=batch_size)
        batch_xs = data_train[randidx, :]
        batch_ys = target_train[randidx, :]
        sess.run(my_opt, feed_dict={x_data: batch_xs, y_target: batch_ys})
        avg_cost += sess.run(cost, feed_dict={x_data: batch_xs, y_target: batch_ys})/total_batch
    if epoch % display_step == 0:
        print("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        train_accuracy = sess.run(accuracy, feed_dict={x_data: batch_xs, y_target: batch_ys})
        print("Training accuracy: %.3f" % train_accuracy)
test_acc = sess.run(accuracy, feed_dict={x_data: data_test, y_target: target_test})
print("Test accuracy: %.3f" % (test_acc))
As has been said, you should have N_INPUT = data.shape[1].
data.shape[0] is the number of samples in your data set, and data.shape[1] is the number of features the network should consider.
The number of features is by definition the size of the input layer, regardless of how many samples you feed to the network (via feed_dict).
Also, the Boston dataset is a regression problem, while softmax_cross_entropy is a cost function for classification problems. You can use tf.square to compute the mean squared error between what you are predicting and what you want:
cost = tf.reduce_mean(tf.square(pred - y_target))
You will see that your network is learning, even though the accuracy is not very high.
Edit :
Your code is actually learning well but you used the wrong tool to measure it.
Mainly, your errors come from the fact that you are dealing with a regression problem, not a classification problem.
In a classification problem you can evaluate the accuracy of your ongoing learning process using
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y_target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
This checks whether the predicted class is the same as the expected class for each input in x_test.
In a regression problem this makes no sense, since you are predicting a real number, i.e. an infinite number of possibilities from the classification point of view.
In a regression problem you can instead estimate the error (mean or otherwise) between predicted and expected values. We can use what I suggested above:
cost = tf.reduce_mean(tf.square(pred - y_target))
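Why the argmax-based accuracy says nothing here can be seen in a short NumPy sketch. It assumes the target has a single column (the question does not show N_CLASSES, so this is an assumption), and the data and helper names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y_target = rng.uniform(5.0, 50.0, size=(8, 1))              # made-up targets, one column
good_pred = y_target + rng.normal(scale=0.1, size=(8, 1))   # close to the targets
bad_pred = y_target + 30.0                                  # off by 30 everywhere

def argmax_accuracy(pred, target):
    # Same recipe as the classification metric: compare argmax over axis 1.
    return np.mean(np.argmax(pred, axis=1) == np.argmax(target, axis=1))

def mse(pred, target):
    return np.mean(np.square(pred - target))

acc_good = argmax_accuracy(good_pred, y_target)
acc_bad = argmax_accuracy(bad_pred, y_target)
mse_good = mse(good_pred, y_target)
mse_bad = mse(bad_pred, y_target)
print(acc_good, acc_bad, mse_good, mse_bad)
```

With one column, argmax is always 0 on both sides, so the "accuracy" is identical for the good and the bad predictions, while the mean squared error separates them clearly.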
I modified your code accordingly; here it is:
pred = rbf_network(x_data, weights)
cost = tf.reduce_mean(tf.square(pred - y_target))
my_opt = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
#correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y_target, 1))
#accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
plt.figure("Error evolution")
plt.ylabel("Error evolution")
tol = 5e-4
epoch, err = 0, 1
# Training loop
while epoch <= training_epochs and err >= tol:
    avg_cost = 0.
    total_batch = int(data_train.shape[0] / batch_size)
    for i in range(total_batch):
        randidx = np.random.randint(int(TRAIN_SIZE), size=batch_size)
        batch_xs = data_train[randidx, :]
        batch_ys = target_train[randidx, :]
        sess.run(my_opt, feed_dict={x_data: batch_xs, y_target: batch_ys})
        avg_cost += sess.run(cost, feed_dict={x_data: batch_xs, y_target: batch_ys})/total_batch
    plt.plot(epoch, avg_cost, marker='o', linestyle="none", c='k')
    err = avg_cost
    if epoch % 10 == 0:
        print("Epoch: {}/{} err = {}".format(epoch, training_epochs, avg_cost))
    epoch += 1
print("End of learning process")
print("Final epoch = {}/{} ".format(epoch, training_epochs))
print("Final error = {}".format(err))
The output is
Epoch: 0/400 err = 0.107879924503
Epoch: 10/400 err = 0.00520248359747
Epoch: 20/400 err = 0.000651647908274
End of learning process
Final epoch = 26/400
Final error = 0.000474644409471
We plot the evolution of the error in the training through the different epochs
I'm also new to Tensorflow and this is my first answer in stackoverflow. I tried your code and I got the same error.
You can see from the error message, ValueError: Cannot feed value of shape (40, 13) for Tensor 'Placeholder:0', which has shape '(?, 12)', that there is a mismatch in the shape of the first placeholder:
x_data = tf.placeholder(shape=[None, N_INPUT], dtype=tf.float32)
so I'm not sure why the N_INPUT has a -1 in this line
N_INPUT = data.shape[1] - 1
I tried removing it and the code runs. Though it looks like the network isn't learning.
While this implementation will do the job, I don't think it's the best RBF implementation. You are using a fixed set of 200 centroids (hidden units) in your RBF. As a result, the centroids are not optimally placed and the width of your Gaussian basis function is not optimally sized. Typically the centroids should be learned in an unsupervised pre-stage using K-means or another clustering algorithm.
So your first training stage would involve finding the centroids/centers of the RBFs, and the second stage would be the actual classification/regression using the RBF network.
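A minimal NumPy sketch of this two-stage idea follows. It is an illustration on made-up 1-D data, not a drop-in replacement for the TensorFlow code: the k-means routine is a plain Lloyd's iteration, the width is hand-picked, and the output layer is fitted by least squares rather than gradient descent. All function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):            # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, width):
    """Gaussian activations exp(-d^2 / (2 * width^2)) for every (sample, center) pair."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * width ** 2))

# Toy 1-D regression data (made up for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])

# Stage 1: unsupervised placement of the centers.
centers = kmeans(X, k=10)
width = 1.0  # hand-picked here; often derived from the spacing of the centers

# Stage 2: supervised fit of the linear output layer (with a bias column).
Phi = np.hstack([rbf_features(X, centers, width), np.ones((len(X), 1))])
weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
train_mse = np.mean((Phi @ weights - y) ** 2)
print("training MSE:", train_mse)
```

The same split carries over to a larger model: the centers from stage 1 can be fixed (or used as initial values) and only the output weights trained in stage 2.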