I am new to Python and TensorFlow. After getting a better (maybe) understanding of DNNs and their math, I started learning TensorFlow through exercises.
One of my exercises is to predict x^2, i.e. after sufficient training, when I feed in 5.0 the network should predict 25.0.
Parameters and settings:
Cost function = E((y-y')^2)
Two hidden Layers and they are fully connected.
learning_rate = 0.001
n_hidden_1 = 3
n_hidden_2 = 2
n_input = 1
n_output = 1
def multilayer_perceptron(x, weights, biases):
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
def generate_input():
import random
val = random.uniform(-10000, 10000)
return np.array([val]).reshape(1, -1), np.array([val*val]).reshape(1, -1)
# tf Graph input
# given one value and output one value
x = tf.placeholder("float", [None, 1])
y = tf.placeholder("float", [None, 1])
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
distance = tf.sub(pred, y)
cost = tf.reduce_mean(tf.pow(distance, 2))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
avg_cost = 0.0
for iter in range(10000):
inp, ans = generate_input()
_, c = sess.run([optimizer, cost], feed_dict={x: inp, y: ans})
print('iter: '+str(iter)+' cost='+str(c))
However, it turns out that c sometimes gets larger and sometimes smaller, but it stays large.
It seems that your training data has a very large range because of the statement val = random.uniform(-10000, 10000), so do some data preprocessing before you train. Note that standardizing a single sample by its own mean and standard deviation will not work (the standard deviation of one value is 0); scale by the known input range instead, for example:
val = random.uniform(-10000, 10000)
val = np.asarray(val).reshape(1, -1)
val /= 10000.0  # scale the input into [-1, 1]; scale the target (val*val) by 1e8 in the same way
As for the loss value, it's ok that it sometimes gets larger and sometimes smaller; just make sure the loss is decreasing in general as the training epochs increase. PS: SGD-style optimizers are commonly used for this.
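A minimal sketch of a generate_input that already returns scaled values (the constants 10000 and 1e8 come from the known input range; they are an illustration, not the only possible choice):
def generate_input():
    import random
    val = random.uniform(-10000, 10000)
    x_scaled = val / 10000.0        # input scaled into [-1, 1]
    y_scaled = (val * val) / 1e8    # target scaled into [0, 1]
    return (np.array([x_scaled]).reshape(1, -1),
            np.array([y_scaled]).reshape(1, -1))
At prediction time, multiply the network's output by 1e8 to recover x^2.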
I am a beginner with TensorFlow and I really need some help with this task. I am doing image pixel classification, and my problem is set up this way:
my input is an array X that contains 20 values. These values represent 4 pixels (there are 5 values per pixel). My output is an array of 4 values, where each value can be either 1 or 0, indicating whether the corresponding pixel has a specific characteristic. As you can imagine, my y can be of the form y=[1, 0, 0, 1], so each instance can belong to multiple classes.
To perform this classification task, I set up a neural network with an input layer of 20 neurons, a hidden layer of 15, another hidden layer of 10, and finally an output layer of 4. The activation function for the hidden layers is ReLU, and I applied 50% dropout to them. As the loss function to minimize, I use TensorFlow's sigmoid_cross_entropy_with_logits, because it computes an independent probability for each class, which allows multilabel classification.
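(Per-class 0/1 predictions are then obtained by applying a sigmoid to the logits and thresholding at 0.5; a rough sketch of such a helper, assuming this is roughly what the get_prediction used in the code below does:)
import numpy as np

def get_prediction(raw_logits, threshold=0.5):
    # raw_logits: the 4 raw output values (logits) for one instance
    probs = 1.0 / (1.0 + np.exp(-np.asarray(raw_logits, dtype=np.float64)))  # element-wise sigmoid
    return (probs >= threshold).astype(int)  # e.g. array([1, 0, 0, 1])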
When I first attempted to train the network, I was getting all NaN results, because of (I think) an exploding gradient problem. This was apparently solved after I lowered the learning rate.
My problem now is that this network is not converging at all, and I believe this is because something is wrong with the cost and activation functions that I am using.
Note: the input has been scaled with sklearn.preprocessing.StandardScaler
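(For completeness, a simplified sketch of that scaling step; the variable names X_train_raw and X_test_raw are illustrative, not from my actual pipeline:)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)  # fit the scaler on the training inputs only
X_test = scaler.transform(X_test_raw)        # reuse the same mean/std for the test inputs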
Here is the code:
import tensorflow as tf
n_inputs = 20
n_hidden1 = 15
n_hidden2 = 10
n_outputs = 4
dropout_rate = 0.5
learning_rate = 0.000000001
#This is a boolean value that indicates when to apply dropout
training = tf.placeholder_with_default(True, shape=(), name="training")
X = tf.placeholder(tf.float64, shape=(None, n_inputs), name ="X")
y = tf.placeholder(tf.int64, shape=(None, 4), name = "y")
with tf.name_scope("dnn"):
hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation = tf.nn.relu)
hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training = training)
hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.relu)
hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training = training)
logits = tf.layers.dense(hidden2_drop, n_outputs, name="outputs")
with tf.name_scope("loss"):
xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels = tf.cast(y, tf.float64), logits = tf.cast(logits, tf.float64))
loss = tf.reduce_mean(xentropy, name = 'loss')
with tf.name_scope("train"):
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
n_epochs = 50
batch_size = 50
n_batches = 1000000
#training
with tf.Session() as sess:
init.run()
for epoch in range(n_epochs):
for i in range(n_batches):
X_batch = np.asarray(X_batches[i]).reshape(-1, 20)
y_batch = np.asarray(y_batches[i]).reshape(-1, 4)
sess.run(training_op, feed_dict ={X: X_batch, y: y_batch, training:True})
if (i % 10000) == 0:
raws = logits.eval(feed_dict={X: X_batch, training:False})
print("epoca = "+str(epoch))
print("iterazione = "+str(i))
print("accuratezza = "+str(get_global_accuracy_rate(raws, y_batch)))
print("X = "+str(X_batch[0])+ " y = "+str(y_batch[0]))
print("raws = "+str(raws[0])+" pred = " + str(get_prediction(raws[0])))
save_path= saver.save(sess, "./my_model_final_1.ckpt")
Thank you so much in advance for any help!!
I'm just learning TensorFlow. The goal of this program is to tell mines from rocks, but I have a problem when I feed the placeholders. I have read many questions about this problem on this website, but I couldn't find a solution.
In the feed_dict, the shape of train_y is (165,) while Y (the placeholder) is (?, 2)... I think this is the problem, but I don't know how to solve it. I tried to reshape train_y but it doesn't work.
I have this error:
ValueError: Cannot feed value of shape (165,) for Tensor
'Placeholder_11:0', which has shape '(?, 2)'
Here is the data and my program:
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
# data preprocessing
df = pd.read_csv('sonar.all-data.csv')
X = df[df.columns[:60]].values
y = df[df.columns[60]]
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
#mix data
X,y = shuffle(X, y, random_state = 1)
#separate data for training
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size = 0.2, random_state = 415)
# Parameters
learning_rate = 0.1
num_steps = 500
display_step = 100
# Network Parameters
n_hidden_1 = 60 # 1st layer number of neurons 60
n_hidden_2 = 60 # 2nd layer number of neurons 60
num_input = X.shape[1]
num_classes = 2
# tf Graph input
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'b2': tf.Variable(tf.random_normal([n_hidden_2])),
'out': tf.Variable(tf.random_normal([num_classes]))
}
# Create model
def neural_net(x):
# Hidden fully connected layer with 60 neurons
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden fully connected layer with 60 neurons
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output fully connected layer with a neuron for each class
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
out_layer = tf.nn.relu(out_layer)
return out_layer
# Construct model
logits = neural_net(X)
prediction = tf.nn.softmax(logits)
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
# Run the initializer
sess.run(init)
for step in range(1, num_steps+1):
sess.run(train_op, feed_dict={X: train_x, Y: train_y})
if step % display_step == 0 or step == 1:
# Calculate loss and accuracy
loss, acc = sess.run([loss_op, accuracy], feed_dict={X: train_x,
Y: train_y})
print("Step " + str(step) + ", Minibatch Loss= " + \
"{:.4f}".format(loss) + ", Training Accuracy= " + \
"{:.3f}".format(acc))
Thank you for your help!
I think you should apply one-hot encoding to your labels y: replace your LabelEncoder with sklearn.preprocessing.OneHotEncoder.
This code should work for your data (note that OneHotEncoder returns a sparse matrix by default, so convert it to a dense array before feeding it to the placeholder):
from sklearn.preprocessing import OneHotEncoder

y = df[df.columns[60]].apply(lambda x: 0 if x == 'R' else 1).values.reshape(-1, 1)
encoder = OneHotEncoder()
encoder.fit(y)
y = encoder.transform(y).toarray()
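Alternatively (a sketch, not part of the original answer), pandas can build the two-column labels directly:
y = pd.get_dummies(df[df.columns[60]]).values  # shape (n_samples, 2), one column per class ('M', 'R')
Either way, after train_test_split the train_y you feed has shape (n_train, 2), which matches the Y placeholder of shape (?, 2).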
For example, suppose we have a 64*100 input which is fed into the TensorFlow graph, and it produces a 64*(n_hidden nodes) output before going into the softmax (or whatever loss function). If we then feed a 1*100 input into the same graph, the result should be the first row of the previous output, but it is not. I use the TensorFlow MNIST example to test this comparison.
'''
A Multilayer Perceptron implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function
import numpy as np
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
# Create model
def multilayer_perceptron(x, weights, biases):
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1]), name ='layer1'),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]), name = 'layer2'),
'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]), name = 'layer3')
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1]), name = 'layer1_b'),
'b2': tf.Variable(tf.random_normal([n_hidden_2]), name = 'layer2_b'),
'out': tf.Variable(tf.random_normal([n_classes]), name = 'layer3_b')
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
#optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
var = tf.all_variables()
trainer = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads = trainer.compute_gradients(cost, var)
update = trainer.apply_gradients(grads)
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0.
total_batch = int(mnist.train.num_examples/batch_size)
# Loop over all batches
for i in range(total_batch):
batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop) and cost op (to get loss value)
#_, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
#c, v,grad, Pred, bi = sess.run([cost, var,grads, pred, biases], feed_dict={x: batch_x, y: batch_y})
Pred_2 = sess.run(pred, feed_dict={x: batch_x, y: batch_y})
Pred_1 = sess.run(pred , feed_dict={x: batch_x[0:1,:], y: batch_y[0:1]})
print(Pred_2[0] == Pred_1)
# Compute average loss
avg_cost += c / total_batch
# Display logs per epoch step
if epoch % display_step == 0:
print("Epoch:", '%04d' % (epoch+1), "cost=", \
"{:.9f}".format(avg_cost))
# print(len(v))
# g1 = np.array(grad[0])
# g2 = np.array(grad[1])
# g3 = np.array(grad[2])
# g4 = np.array(grad[3])
# g5 = np.array(grad[4])
# g6 = np.array(grad[5])
# print(g1.shape)
# print(g2.shape)
# print(g3.shape)
# print(g4.shape)
# print(g5.shape)
# print(g6.shape)
# print(g6[0,:])
# print(g6[1,:])
# print(bi['out'])
#print(type(updating))
print("Optimization Finished!")
# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
print(Pred_2[0] == Pred_1) should show that the results are the same, but they are not. It is strange.
The gradient descent path will be different if your weight and bias initialization is random; since the gradients are different each time, training may take a path towards a different minimum.
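If you want runs to be repeatable so that comparisons like this are meaningful, one option (a sketch, not part of the original answer) is to fix the random seeds before building the graph:
import numpy as np
import tensorflow as tf

np.random.seed(0)        # makes NumPy-based shuffling/data generation deterministic
tf.set_random_seed(0)    # graph-level seed for TF 1.x initializers such as tf.random_normal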
Below is the code I am using as a learning exercise in TensorFlow.
from __future__ import print_function
from datetime import datetime
import time, os
import tensorflow as tf
# Import data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Parameters
learning_rate = 0.001
training_epoch = 5
batch_size = 128
display_step = 10
model_path = "./output/model.ckpt"
logs_path = './logs'
directory = os.path.dirname(model_path)
if not os.path.exists(directory):
os.makedirs(directory)
directory = os.path.dirname(logs_path)
if not os.path.exists(directory):
os.makedirs(directory)
# Network Parameters
n_input = 784 # data input
n_classes = 10 # classes
dropout = 0.5 # Dropout, probability to keep units
l2_regularization_strength = 0.0005 #l2 regularization strength
# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input], name='InputData')
y = tf.placeholder(tf.float32, [None, n_classes], name='LabelData')
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
mode = tf.placeholder(tf.int32);
# Create some wrappers for simplicity
def conv2d(x, kernel_shape, strides=1, mode=0):
# Conv2D wrapper, with batch normalization and relu activation
weights = tf.get_variable('weights', kernel_shape, initializer=tf.contrib.layers.xavier_initializer())
x = tf.nn.conv2d(x, weights, strides=[1, strides, strides, 1], padding='SAME')
pop_mean = tf.get_variable('bn_pop_mean', [x.get_shape()[-1]], initializer=tf.constant_initializer(0), trainable=False)
pop_var = tf.get_variable('bn_pop_var', [x.get_shape()[-1]], initializer=tf.constant_initializer(1), trainable=False)
scale = tf.get_variable('bn_scale', [x.get_shape()[-1]], initializer=tf.constant_initializer(1))
beta = tf.get_variable('bn_beta', [x.get_shape()[-1]], initializer=tf.constant_initializer(0))
epsilon = 1e-3
decay = 0.999
if mode == 0:
batch_mean, batch_var = tf.nn.moments(x,[0, 1, 2])
train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
with tf.control_dependencies([train_mean, train_var]):
bn = tf.nn.batch_normalization(x, batch_mean, batch_var, beta, scale, epsilon, name='bn')
else:
bn = tf.nn.batch_normalization(x, pop_mean, pop_var, beta, scale, epsilon, name='bn')
return tf.nn.relu(bn, name = 'relu')
def maxpool2d(x, k=2):
# MaxPool2D wrapper
return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME', name='maxpool')
# Create model
def conv_net(x, dropout, mode):
# Reshape input picture
x = tf.reshape(x, shape=[-1, 28, 28, 1])
with tf.variable_scope("conv1"):
# Convolution Layer
conv1 = conv2d(x, [5, 5, 1, 32], mode=mode)
# Max Pooling (down-sampling)
conv1 = maxpool2d(conv1, k=2)
with tf.variable_scope("conv2"):
# Convolution Layer
conv2 = conv2d(conv1, [5, 5, 32, 64], mode=mode)
# Max Pooling (down-sampling)
conv2 = maxpool2d(conv2, k=2)
with tf.variable_scope("fc1"):
# Fully connected layer
# Reshape conv2 output to fit fully connected layer input
weights = tf.get_variable("weights", [7*7*64, 1024], initializer=tf.contrib.layers.xavier_initializer())
biases = tf.get_variable("biases", [1024], initializer=tf.constant_initializer(0.0))
fc1 = tf.reshape(conv2, [-1, weights.get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights), biases)
fc1 = tf.nn.relu(fc1, name = 'relu')
# Apply Dropout
fc1 = tf.nn.dropout(fc1, dropout, name='dropout')
with tf.variable_scope("output"):
# Output, class prediction
weights = tf.get_variable("weights", [1024, n_classes], initializer=tf.contrib.layers.xavier_initializer())
biases = tf.get_variable("biases", [n_classes], initializer=tf.constant_initializer(0.0))
out = tf.add(tf.matmul(fc1, weights), biases)
return out
with tf.name_scope('Model'):
# Construct model
pred = conv_net(x, keep_prob, mode)
with tf.name_scope('Loss'):
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
vars = tf.trainable_variables()
l2_regularization = tf.add_n([tf.nn.l2_loss(v) for v in vars if any(x in v.name for x in ['weights', 'biases'])])
for v in vars:
if any(x in v.name for x in ['weights', 'biases']):
print(v.name + '-included!')
else:
print(v.name)
cost += l2_regularization_strength*l2_regularization
with tf.name_scope('Optimizer'):
# Define optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
# Op to calculate every variable gradient
grads = tf.gradients(cost, tf.trainable_variables())
grads = list(zip(grads, tf.trainable_variables()))
# Op to update all variables according to their gradient
apply_grads = optimizer.apply_gradients(grads_and_vars=grads)
with tf.name_scope('Accuracy'):
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
# Create a summary to monitor cost tensor
tf.scalar_summary('cost', cost)
# Create a summary to monitor l2_regularization tensor
tf.scalar_summary('l2_regularization', l2_regularization)
# Create a summary to monitor accuracy tensor
tf.scalar_summary('accuracy', accuracy)
# Create summaries to visualize weights
for var in tf.trainable_variables():
tf.histogram_summary(var.name, var)
for var in tf.all_variables():
if 'bn_pop' in var.name:
tf.histogram_summary(var.name, var)
# Summarize all gradients
for grad, var in grads:
tf.histogram_summary(var.name + '/gradient', grad)
# Merge all summaries into a single op
merged_summary_op = tf.merge_all_summaries()
# 'Saver' op to save and restore all the variables
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
step = 1
# op to write logs to Tensorboard
summary_writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())
# Keep training until reach max epoch
while step * batch_size < training_epoch * mnist.train.num_examples:
start_time = time.time()
# Get batch
batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop)
sess.run(apply_grads, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout, mode: 0})
duration = time.time() - start_time
if step % display_step == 0:
# Calculate batch loss and accuracy
loss, acc, summary = sess.run([cost, accuracy, merged_summary_op], feed_dict={x: batch_x,
y: batch_y,
keep_prob: 1.,
mode: 1})
# Write logs at every iteration
summary_writer.add_summary(summary, step)
# Calculate number sample per sec
samples_per_sec = batch_size / duration
format_str = ('%s: Iter %d, Epoch %d, (%.1f examples/sec; %.3f sec/batch), Minibatch Loss = %.5f , Training Accuracy=%.5f')
print (format_str % (datetime.now(), step*batch_size, int(step*batch_size/mnist.train.num_examples) + 1, samples_per_sec, float(duration), loss, acc))
step += 1
print("Optimization Finished!")
# Calculate accuracy for 256 mnist test images
print("Testing Accuracy:", \
sess.run(accuracy, feed_dict={x: mnist.test.images[:5000],
y: mnist.test.labels[:5000],
keep_prob: 1.,
mode: 2}))
# Save model weights to disk
save_path = saver.save(sess, model_path)
print("Model saved in file: %s" % save_path)
When I open TensorBoard and look at the histogram and distribution sections, 'bn_pop_mean' and 'bn_pop_var' in 'conv1' and 'conv2' are not updating (they stay constant at their initialized values).
Although I achieved around 97% accuracy after training, I don't know whether the batch normalization is actually in effect.
In your conv_net function, you didn't set the "reuse" parameter for tf.variable_scope(). The default setting for "reuse" is None. Every time the conv2d function is called, "bn_pop_mean" and "bn_pop_var" are re-initialized.
if mode == 0:
batch_mean, batch_var = tf.nn.moments(x,[0, 1, 2])
train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
with tf.control_dependencies([train_mean, train_var]):
bn = tf.nn.batch_normalization(x, batch_mean, batch_var, beta, scale, epsilon, name='bn')
else:
bn = tf.nn.batch_normalization(x, pop_mean, pop_var, beta, scale, epsilon, name='bn')
It seems that the if condition here always evaluates to False. I guess what you want to do is use mode, fed via feed_dict, to control your batch normalization. Since mode is a tensor whose value is only known when the graph runs, you should use tf.cond in TensorFlow instead of a Python if.
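A sketch of how that branch could be rewritten with tf.cond (assuming the same pop_mean, pop_var, scale, beta, epsilon and decay as in the question, and a boolean is_training placeholder in place of the integer mode):
is_training = tf.placeholder(tf.bool, name='is_training')

def bn_train():
    # training branch: normalize with batch statistics and update the population statistics
    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2])
    train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
    train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
    with tf.control_dependencies([train_mean, train_var]):
        return tf.nn.batch_normalization(x, batch_mean, batch_var, beta, scale, epsilon)

def bn_inference():
    # inference branch: normalize with the tracked population statistics
    return tf.nn.batch_normalization(x, pop_mean, pop_var, beta, scale, epsilon)

bn = tf.cond(is_training, bn_train, bn_inference)
You would then feed is_training=True during training and is_training=False at evaluation time, instead of the integer mode.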
I am implementing a network using TensorFlow.
The network takes a binary feature vector as input, and it should predict a float value as output.
I am expecting a (1, 1) tensor as the output of my multilayer_perceptron() function; instead, when I run pred, it returns a vector with the same length as my input data, i.e. (X, 1).
Since I am new to this framework, I expect the error to be very trivial.
What am I doing wrong?
import tensorflow as tf
print "**** Defining parameters..."
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 1
display_step = 1
print "**** Defining Network..."
# Network Parameters
n_hidden_1 = 10 # 1st layer num features
n_hidden_2 = 10 # 2nd layer num features
n_input = Xa.shape[1] # data input(feature vector length)
n_classes = 1 # total classes (IC50 value)
# tf Graph input
x = tf.placeholder("int32", [batch_size, None])
y = tf.placeholder("float", [None, n_classes])
# Create model
def multilayer_perceptron(_X, _weights, _biases):
lookup_h1 = tf.nn.embedding_lookup(_weights['h1'], _X)
layer_1 = tf.nn.relu(tf.add(tf.reduce_sum(lookup_h1, 0), _biases['b1'])) #Hidden layer with RELU activation
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2'])) #Hidden layer with RELU activation
pred = tf.matmul(layer_2, _weights['out']) + _biases['out']
return pred
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'b2': tf.Variable(tf.random_normal([n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.square(tf.sub(pred, y))) # MSE
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) #Gradient descent
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
print "**** Launching the graph..."
# Launch the graph
with tf.Session() as sess:
sess.run(init)
print "**** Training..."
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0.
total_batch = int(Xa.tocsc().shape[0]/batch_size)
# Loop over all batches
for i in range(total_batch):
# Extract sample
batch_xs = Xa.tocsc()[i,:].tocoo()
batch_ys = np.reshape(Ya.tocsc()[i,0], (batch_size,1))
#****************************************************************************
# Extract sparse indeces from input matrix (They will be used as actual input)
ids = batch_xs.nonzero()[1]
# Fit training using batch data
sess.run(optimizer, feed_dict={x: ids, y: batch_ys})
# Compute average loss
avg_cost += sess.run(cost, feed_dict={x: ids, y: batch_ys})/total_batch
# Display logs per epoch step
if epoch % display_step == 0:
print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost)
print "Optimization Finished!"
Your pred will be of shape [n_input, n_classes] because that is how you define weights['out'] and biases['out']. The only way you get a (1, 1) tensor out of pred is if n_classes = 1...
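One possible way (a sketch, not part of the original answer, assuming _X holds the nonzero feature indices of a single sample as a 1-D int tensor) to keep a batch dimension so that pred comes out as (1, 1):
lookup_h1 = tf.nn.embedding_lookup(_weights['h1'], _X)                                     # [num_ids, n_hidden_1]
layer_1 = tf.nn.relu(tf.add(tf.reduce_sum(lookup_h1, 0, keep_dims=True), _biases['b1']))   # [1, n_hidden_1]
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))            # [1, n_hidden_2]
pred = tf.matmul(layer_2, _weights['out']) + _biases['out']                                 # [1, 1]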