I am studying TensorFlow and have run into a problem. I want to minimize the loss function while approximating 2x + 2z - 3t = y (to recover the a, b, c values where a=2, b=2, c=-3), but it doesn't work. Where is my mistake?
This is my output:
a: [ 0.51013279] b: [ 0.51013279] c: [ 1.00953674] loss: 2.72952e+10
I need a: 2, b: 2, c: -3 and a loss close to 0.
import tensorflow as tf
import numpy as np
a = tf.Variable([1], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
c = tf.Variable([0], dtype=tf.float32)
x = tf.placeholder(tf.float32)
z = tf.placeholder(tf.float32)
t = tf.placeholder(tf.float32)
linear_model = a * x + b * z + c * t
y = tf.placeholder(tf.float32)
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
x_train = np.arange(0, 5000, 1)
z_train = np.arange(0, 10000, 2)
t_train = np.arange(0, 5000, 1)
y_train = list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t, x_train, z_train, t_train))
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(10000):
    sess.run(train, {x: x_train, z: z_train, t: t_train, y: y_train})

curr_a, curr_b, curr_c, curr_loss = sess.run([a, b, c, loss], {x: x_train, z: z_train, t: t_train, y: y_train})
print("a: %s b: %s c: %s loss: %s" % (curr_a, curr_b, curr_c, curr_loss))
I changed Maxim's code a bit to see the values of a, b, c, like this:
_, loss_val, curr_a, curr_b, curr_c, model_val = sess.run([optimizer, loss, a, b, c, linear_model], {x: x_train, z: z_train, t: t_train, y: y_train})
So my output is:
10 2.04454e-11 1.83333 0.666667 -0.166667
20 2.04454e-11 1.83333 0.666667 -0.166667
30 2.04454e-11 1.83333 0.666667 -0.166667
I expected a=2,b=2,c=-3
First up, there is no single solution, so the optimizer can converge to any one of many equally good minima: in your training data z_train = 2*x_train and t_train = x_train, so every (a, b, c) with a + 2b + c = 3 fits the data perfectly. Which of those the optimizer settles on depends largely on how your variables are initialized.
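For instance, the values printed in the output above satisfy this; a quick arithmetic check (not part of the original code):

# The converged values satisfy a + 2b + c = 3, which is the only combination
# the degenerate data pins down (the true coefficients give 2 + 2*2 - 3 = 3).
a_val, b_val, c_val = 1.83333, 0.666667, -0.166667
print(a_val + 2 * b_val + c_val)   # ~3.0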
Short answer concerning your bug: be careful with the learning rate. Check out my version of your code:
a = tf.Variable(2, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)
c = tf.Variable(0, dtype=tf.float32)
x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
z = tf.placeholder(shape=[None, 1], dtype=tf.float32)
t = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y = tf.placeholder(shape=[None, 1], dtype=tf.float32)
linear_model = a * x + b * z + c * t
loss = tf.reduce_mean(tf.square(linear_model - y))  # mean of the squares
optimizer = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
n = 50
x_train = np.arange(0, n, 1).reshape([-1, 1])
z_train = np.arange(0, 2*n, 2).reshape([-1, 1])
t_train = np.arange(0, n, 1).reshape([-1, 1])
y_train = np.array(list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t, x_train, z_train, t_train))).reshape([-1, 1])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(101):
        _, loss_val = sess.run([optimizer, loss], {x: x_train, z: z_train, t: t_train, y: y_train})
        if i % 10 == 0:
            a_val, b_val, c_val = sess.run([a, b, c])
            print('iteration %2i, loss=%f a=%.5f b=%.5f c=%.5f' % (i, loss_val, a_val, b_val, c_val))
If you run it, you'll notice that it converges very fast, in fewer than 10 iterations. However, if you increase the training size n from 50 to 75, the model diverges. Decreasing the learning rate to 0.00001 makes it converge again, though not as fast as before. The more data you push through the optimizer, the more important an appropriate learning rate becomes.
You tried a training size of 5000: I can't even imagine how small the learning rate would have to be to process that many points at once correctly.
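An alternative to shrinking the learning rate as the dataset grows is to rescale the data before feeding it; this is a general trick rather than something from the code above. Dividing every column by the same constant keeps the relationship y = 2x + 2z - 3t (and therefore the target coefficients) intact. A minimal sketch against the graph defined above:

# Hypothetical rescaling step: divide all columns by a common constant so the
# inputs stay in a small range; the coefficients a, b, c are unaffected.
scale = float(n)
feed = {x: x_train / scale, z: z_train / scale,
        t: t_train / scale, y: y_train / scale}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1001):
        _, loss_val = sess.run([optimizer, loss], feed)
    print('loss=%f a,b,c=%s' % (loss_val, sess.run([a, b, c])))

With the inputs in a small range, the same learning rate tends to keep working as n grows, although it may then be worth raising it again.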
Related
I have generated a balanced dataset of 4000 examples, 2000 for the negative class and 2000 for the positive one. Then I built a neural net with one single hidden layer of 3 neurons with a ReLU activation function, and an output layer with a sigmoid. The cost function is a standard cross-entropy function and I chose Adam as the optimizer. Using minibatches of 15 examples, after 1000 epochs of training the final accuracy is 96.37%, so I am assuming that the model is doing well on the test set. But when I want to display the decision boundary, this is what I get:
I cannot figure out if the problem is a code error or if the model just needs more training. This is the script I'm using:
# implement a neural network that finds a decision boundary under a
# constraint on the second hidden layer with tensorflow
import numpy as np
from sklearn.utils import shuffle
from sklearn.preprocessing import normalize
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tf_utils import random_mini_batches
import matplotlib.pyplot as plt
def generate_dataset():
    np.random.seed(2)

    # positive class samples
    d1_x = np.random.normal(5, 10, 1000)
    d1_y = np.random.normal(5, 2, 1000)
    d2_x = np.random.normal(40, 20, 1000)
    d2_y = np.random.normal(2, 1, 1000)
    # negative class samples
    d3_x = np.random.normal(60, 5, 2000)
    d3_y = np.random.normal(10, 1, 2000)

    plt.scatter(d1_x, d1_y, color='b')
    plt.scatter(d2_x, d2_y, color='b')
    plt.scatter(d3_x, d3_y, color='r')

    Y = np.zeros((4000, 1))
    d_x = np.concatenate([d1_x, d2_x, d3_x])
    d_y = np.concatenate([d1_y, d2_y, d3_y])
    d_x = d_x.reshape(d_x.shape[0], 1)
    d_y = d_y.reshape(d_y.shape[0], 1)
    X = np.concatenate([d_x, d_y], axis=1)
    Y[2000:] = 1

    return X, Y
# define a tensorflow model 5-3-1 with two hidden layers and the output
# being scalar
costs = []
print_cost = True
learning_rate = .0009
minibatch_size = 15
num_epochs = 1000
XX, YY = generate_dataset()
XX, YY = shuffle(XX, YY)
X_norm = normalize(XX)
X_train, X_test, y_train, y_test = train_test_split(X_norm, YY, test_size=0.2, random_state=42)
X_train = np.transpose(X_train)
y_train = np.transpose(y_train)
X_test = np.transpose(X_test)
y_test = np.transpose(y_test)
# define train and test sets
m = XX.shape[1] # input dimension
n = YY.shape[1] # output dimension
X = tf.placeholder(tf.float32, shape = [m, None], name = 'X')
y = tf.placeholder(tf.float32, shape = [n, None], name = 'y')
# model parameters
n1 = 3 # output dimension of the first hidden layer
#n2 = 4 # output dimension of the second hidden layer
#n3 = 2
W1 = tf.get_variable("W1", [n1, m],
                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
b1 = tf.get_variable("b1", [n1, 1], initializer=tf.zeros_initializer)
#W2 = tf.get_variable("W2", [n2, n1],
#                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
#b2 = tf.get_variable("b2", [n2, 1], initializer=tf.zeros_initializer)
#W3 = tf.get_variable("W3", [n3, n2],
#                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
#b3 = tf.get_variable("b3", [n3, 1], initializer=tf.zeros_initializer)
W4 = tf.get_variable("W4", [n, n1],
                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
b4 = tf.get_variable("b4", [n, 1], initializer=tf.zeros_initializer)
# forward propagation
z1 = tf.add(tf.matmul(W1, X), b1)
a1 = tf.nn.relu(z1)
#z2 = tf.add(tf.matmul(W2, a1), b2)
#a2 = tf.nn.relu(z2)
#z3 = tf.add(tf.matmul(W3, a2), b3)
#a3 = tf.nn.relu(z3)
z4 = tf.add(tf.matmul(W4, a1), b4)
pred = tf.nn.sigmoid(z4)
# cost function
cost = tf.reduce_mean(tf.losses.log_loss(labels=y, predictions=pred))
# log_loss expects the probability estimate given by the model --> this is
# what is used inside the formula, not the net input z

# ADAM optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# metrics
correct_prediction = tf.less_equal(tf.abs(pred - y), 0.5)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
init = tf.global_variables_initializer()
with tf.Session() as sess:
    seed = 1
    sess.run(init)

    for epoch in range(num_epochs):
        epoch_cost = 0
        seed += 1
        num_minibatches = int(X_train.shape[0] / minibatch_size)
        minibatches = random_mini_batches(X_train, y_train, minibatch_size, seed)

        for minibatch in minibatches:
            (minibatch_X, minibatch_Y) = minibatch
            _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, y: minibatch_Y})
            epoch_cost += minibatch_cost / minibatch_size

        # Print the cost every epoch
        if print_cost == True and epoch % 100 == 0:
            print("Cost after epoch %i: %f" % (epoch, epoch_cost))
        if print_cost == True and epoch % minibatch_size == 0:
            costs.append(epoch_cost)

    #plt.plot(costs)
    #plt.show()

    cp, val_accuracy = sess.run([correct_prediction, accuracy], feed_dict={X: X_test, y: y_test})

    # plot the cost
    # plt.plot(np.squeeze(costs))
    # plt.ylabel('cost')
    # plt.xlabel('iterations (per fives)')
    # plt.title("Learning rate =" + str(learning_rate))
    # plt.show()

    cmap = plt.get_cmap('Paired')

    # Define region of interest by data limits
    xmin, xmax = min(XX[:, 0]) - 1, max(XX[:, 0]) + 1
    ymin, ymax = min(XX[:, 1]) - 1, max(XX[:, 1]) + 1
    steps = 100
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)
    A = np.concatenate([[xx.ravel()], [yy.ravel()]], axis=0)
    A = normalize(A, axis=0)

    # Make predictions across region of interest
    predictions = sess.run(pred, feed_dict={X: A})

    # Plot decision boundary in region of interest
    z = predictions.reshape(xx.shape)
    plt.contourf(xx, yy, z, cmap=cmap, alpha=.5)
    plt.show()

    # Get predicted labels on training data and plot
    #train_labels = model.predict(X)
    #ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, lw=0)
I am trying to create my first linear regressor using TensorFlow (without the help of estimators), and in each iteration I only see a cost value of NaN. I think I am not doing something right but am unable to zero in on the issue. Can someone please help me troubleshoot the problem?
I am using the CA housing dataset
# Common imports
import math
import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn import metrics
california_housing_dataframe = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv", sep=",")
I am predicting the median_house_value column
data_X = california_housing_dataframe.iloc[:, :8]
data_y = california_housing_dataframe.iloc[:, 8]
print('Features (X):\n', data_X.head(), '\n')
print('Target (y):\n', data_y.head(), '\n')
Create training and validation sets
from sklearn.model_selection import train_test_split
data_X_train, data_X_validate = train_test_split(data_X, test_size=0.2, random_state=42)
data_y_train, data_y_validate = train_test_split(data_y, test_size=0.2, random_state=42)
Set up the hyperparameters and TensorFlow variables:
# Hyperspace Params
learning_rate = 0.01
training_epochs = 1 #40
batch_size = 500 #50
totalBatches = len(data_X_train)/batch_size
n, m = data_X_train.shape # 17,000 Rows + 9 Features
print('n=', n, ', m=', m)
W = tf.Variable(tf.random_uniform([m, 1], -1.0, 1.0, dtype = tf.float64), name="theta") # Random initialization
b = tf.Variable(np.random.randn(), name = "b", dtype = tf.float64)
X = tf.placeholder(tf.float64, shape=(None, m), name="X")
y = tf.placeholder(tf.float64, shape=(None, 1), name="y")
print('X.shape :\n', X.shape, '\n')
print('y.shape :\n', y.shape, '\n')
print('b.shape :\n', b.shape, '\n')
print('Thetha.shape (W):\n', W.shape, '\n')
y_pred = tf.add(tf.matmul(X, W), b, name="predictions")
error = y_pred - y
cost = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Global Variables Initializer
init = tf.global_variables_initializer()
Now, training the model returns only NaN values:
def get_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X.iloc[batch_idx, :], y[batch_idx]
        yield X_batch, y_batch
# Global Variables Initializer
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        for X_batch, y_batch in get_batch(data_X_train, data_y_train, batch_size):
            y_batch = np.array(y_batch).reshape(-1, 1)
            sess.run(optimizer, feed_dict={X: X_batch, y: y_batch})

            curr_y_pred, curr_error, curr_cost = sess.run([y_pred, error, cost], {X: X_batch, y: y_batch})
            print('Training... batch.shape: ', X_batch.shape, 'curr_error:', curr_error)
The result looks like this:
Training... batch.shape: (504, 8) curr_error: [[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
...
Your issue comes from the pd.read_csv(...) function. I swapped it for the NumPy version (I am not familiar with Pandas) and it works like a charm. Here is the whole snippet:
import math
import numpy as np
import tensorflow as tf
from sklearn import metrics
california_housing_dataframe = np.genfromtxt('https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv', delimiter=',', skip_header=1)
data_X = california_housing_dataframe[:, :8]
data_y = california_housing_dataframe[:, 8]
from sklearn.model_selection import train_test_split
data_X_train, data_X_validate = train_test_split(data_X, test_size=0.2, random_state=42)
data_y_train, data_y_validate = train_test_split(data_y, test_size=0.2, random_state=42)
# Hyperspace Params
learning_rate = 0.01
training_epochs = 1 #40
batch_size = 500 #50
totalBatches = len(data_X_train)/batch_size
n, m = data_X_train.shape # 17,000 Rows + 9 Features
print('n=', n, ', m=', m)
W = tf.Variable(tf.random_uniform([m, 1], -1.0, 1.0, dtype = tf.float64), name="theta") # Random initialization
b = tf.Variable(np.random.randn(), name = "b", dtype = tf.float64)
X = tf.placeholder(tf.float64, shape=(None, m), name="X")
y = tf.placeholder(tf.float64, shape=(None, 1), name="y")
print('X.shape :\n', X.shape, '\n')
print('y.shape :\n', y.shape, '\n')
print('b.shape :\n', b.shape, '\n')
print('Thetha.shape (W):\n', W.shape, '\n')
y_pred = tf.add(tf.matmul(X, W), b, name="predictions")
error = y_pred - y
cost = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Global Variables Initializer
init = tf.global_variables_initializer()
def get_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx, :], y[batch_idx]
        yield X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        for X_batch, y_batch in get_batch(data_X_train, data_y_train, batch_size):
            y_batch = np.array(y_batch).reshape(-1, 1)
            sess.run(optimizer, feed_dict={X: X_batch, y: y_batch})

            curr_y_pred, curr_error, curr_cost = sess.run([y_pred, error, cost], {X: X_batch, y: y_batch})
            print('Training... batch.shape: ', X_batch.shape, 'curr_error:', curr_error)
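If you want to double-check whether bad values are sneaking in through the data pipeline, a small debugging sketch (my addition, using only standard NumPy calls) is to scan the batches for non-finite values before they are fed to the graph:

# Hypothetical pre-flight check: make sure no NaN/inf values are fed to the graph.
for X_batch, y_batch in get_batch(data_X_train, data_y_train, batch_size):
    X_arr = np.asarray(X_batch, dtype=np.float64)
    y_arr = np.asarray(y_batch, dtype=np.float64).reshape(-1, 1)
    if not (np.isfinite(X_arr).all() and np.isfinite(y_arr).all()):
        print('non-finite values found in a batch')
        break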
I'm trying to train a neural network to predict the sum of two numbers, but I don't understand what's wrong with my model. The model consists of 2 input, 2 hidden and 1 output layers. Every 1000 iterations I print a test prediction, but the result keeps getting smaller and smaller.
import numpy as np
import tensorflow as tf
input_size = 2
hidden_size = 3
out_size = 1
def generate_test_data():
    inp = 0.5*np.random.rand(10, 2)
    oup = np.zeros((10, 1))
    for idx, val in enumerate(inp):
        oup[idx] = np.array([val[0] + val[1]])
    return inp, oup

def create_network():
    x = tf.placeholder(tf.float32, [None, input_size])
    w01 = tf.Variable(tf.truncated_normal([input_size, hidden_size], stddev=0.1))
    y1 = tf.sigmoid(tf.matmul(tf.sigmoid(x), w01))
    w12 = tf.Variable(tf.truncated_normal([hidden_size, out_size], stddev=0.1))
    y2 = tf.sigmoid(tf.matmul(y1, w12))
    y_ = tf.placeholder(tf.float32, [None, out_size])
    return x, y_, y2

def train(x, y_, y2):
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y2)
    )
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    # Train
    for i in range(100000):
        batch_xs, batch_ys = generate_test_data()
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

        # Test
        if i % 1000 == 0:
            out_batch = sess.run(y2, {x: batch_xs})
            inx = 0
            print(batch_xs[inx][0], " + ", batch_xs[inx][1], " = ", out_batch[inx][0])
(x, y_, y2) = create_network()
train(x, y_, y2)
Output every 1000 iterations:
0.37301352864927173 + 0.28949461772342683 = 0.49111518
0.050899466843458474 + 0.006174158992116541 = 0.0025260744
0.3974852369427063 + 0.22402098418952499 = 0.00090828544
0.15735921047969498 + 0.39645077887600294 = 0.0005903727
0.23560825884336228 + 0.29010766384718145 = 0.0004317883
0.4250063393420791 + 0.24181166029062096 = 0.00031525563
...and so on, getting smaller and smaller.
Cross-entropy loss is used for classification problems, while your task is clearly a regression. The computed cross_entropy value doesn't make sense, hence the result.
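To see why the printed sums shrink toward zero (a side note, not part of the original answer): with a single output unit, tf.nn.softmax_cross_entropy_with_logits applies a softmax over a length-1 vector, which is always exactly 1, so the gradient with respect to the logit, (softmax - label) = 1 - y_, stays positive and keeps pushing the output down. A tiny check:

# Hypothetical check: softmax over a single logit is always 1.0, so the
# "probability" never depends on the network output at all.
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

print(softmax(np.array([0.49])))   # -> [1.]
print(softmax(np.array([-3.0])))   # -> [1.]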
Change your loss to:
cross_entropy = tf.reduce_mean(
    tf.nn.l2_loss(y_ - y2)
)
... and you'll see much more sensible results.
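One small caveat (an aside, not from the original answer): tf.nn.l2_loss(t) computes sum(t ** 2) / 2, so wrapping an already-scalar value in tf.reduce_mean does not turn it into a mean squared error. If you prefer a true MSE, something along these lines should behave similarly:

# Hedged alternative: a plain mean-squared-error loss for the same regression.
mse = tf.reduce_mean(tf.square(y_ - y2))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(mse)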
Maxim, thanks a lot. Now it works.
import numpy as np
import tensorflow as tf
input_size = 2
hidden_size = 3
out_size = 1
def generate_test_data():
    inp = 0.5*np.random.rand(10, 2)
    oup = np.zeros((10, 1))
    for idx, val in enumerate(inp):
        oup[idx] = np.array([val[0] + val[1]])
    return inp, oup

def create_network():
    x = tf.placeholder(tf.float32, [None, input_size])
    w01 = tf.Variable(tf.truncated_normal([input_size, hidden_size], stddev=0.1))
    y1 = tf.matmul(x, w01)
    w12 = tf.Variable(tf.truncated_normal([hidden_size, out_size], stddev=0.1))
    y2 = tf.matmul(y1, w12)
    y_ = tf.placeholder(tf.float32, [None, out_size])
    return x, y_, y2

def train(x, y_, y2):
    cross_entropy = tf.reduce_mean(
        tf.nn.l2_loss(y_ - y2)
    )
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    # Train
    for i in range(100000):
        batch_xs, batch_ys = generate_test_data()
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

        # Test
        if i % 2000 == 0:
            out_batch = sess.run(y2, {x: batch_xs})
            inx = 0
            print(batch_xs[inx][0], " + ", batch_xs[inx][1], " = ", out_batch[inx][0], "|", batch_xs[inx][0] + batch_xs[inx][1])
(x, y_, y2) = create_network()
train(x, y_, y2)
If you consider predicting each digit to be a classification problem where you predict a value in "0123456789 ", you can use cross-entropy as your loss. For reference, see the Keras - Addition RNN Example.
But like Maxim said, it shouldn't be used for a regression problem.
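For illustration only (a sketch loosely following the idea behind that example, not code taken from it): each character of the target string can be one-hot encoded over the vocabulary "0123456789 ", and the network then predicts one class per character position with a categorical cross-entropy loss.

# Hypothetical helper: one-hot encode a right-padded answer string such as "12 ".
import numpy as np

VOCAB = "0123456789 "
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}

def encode_answer(s, maxlen=3):
    s = s.ljust(maxlen)                      # pad with spaces to a fixed length
    out = np.zeros((maxlen, len(VOCAB)))
    for pos, ch in enumerate(s):
        out[pos, CHAR_TO_IDX[ch]] = 1.0      # one class per character position
    return out

print(encode_answer("12"))                   # shape (3, 11), one 1 per row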
Sorry if the title isn't very clear... I'm trying to solve for the value of "w" in the following problem with TensorFlow:
Y = X*B(w) + e
where Y is a 22x5 matrix, X is a 22x3 matrix, and B(w) is a 3*5 matrix with the following structure:
B = [[1, 1, 1, 1, 1],
[exp(-3w), exp(-6w), exp(-12w), exp(-24w), exp(-36w)],
[3*exp(-3w), 6*exp(-6w), 12*exp(-12w), 24*exp(-24w), 36*exp(-36w)]]
Here's my code:
# Parameters
learning_rate = 0.01
display_step = 50
tolerance = 0.0000000000000001
# Training Data
Y_T = df.values
X_T = factors.values
X = tf.placeholder("float32", shape = (22, 3))
Y = tf.placeholder("float32", shape = (22, 5))
w = tf.Variable(1.0, name="w")
def slope_loading(q):
    return tf.exp(tf.multiply(tf.negative(q), w))

def curve_loading(q):
    return tf.multiply(w, tf.exp(tf.multiply(tf.negative(q), w)))

B = tf.Variable([[1.0, 1.0, 1.0, 1.0, 1.0],
                 [slope_loading(float(x)) for x in [3, 6, 12, 24, 36]],
                 [curve_loading(float(x)) for x in [3, 6, 12, 24, 36]]])
pred = tf.matmul(X,B)
cost = tf.matmul(tf.transpose(Y-pred), (Y-pred))/22
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    # Set initial values for weights
    sess.run(init)

    # Set initial values for the error tolerance
    tol = abs(sess.run(cost, feed_dict={X: X_T, Y: Y_T})[0][0])

    iteration = 0
    while tol > tolerance:
        c_old = sess.run(cost, feed_dict={X: X_T, Y: Y_T})[0][0]
        sess.run(optimizer, feed_dict={X: X_T, Y: Y_T})
        c_new = sess.run(cost, feed_dict={X: X_T, Y: Y_T})[0][0]
        tol = abs(c_new - c_old)
        iteration = iteration + 1
        if iteration % display_step == 0:
            print("Iteration= ", iteration, "Gain= ", tol)

    training_cost = sess.run(cost, feed_dict={X: X_T, Y: Y_T})
But I'm getting the error "FailedPreconditionError (see above for traceback): Attempting to use uninitialized value w..."
I'm guessing this has to do with how I'm constructing B and passing it along to the cost function, but I'm too new to TensorFlow to see what I'm doing wrong.
Any help?
You can't use a variable to define the initial value for another variable: B's initializer depends on w, and w hasn't itself been initialized yet at the moment B's initializer runs, which is exactly what the FailedPreconditionError is complaining about. A better way to construct B is like this:
ones = tf.ones(5)
vals = tf.constant([3.0, 6.0, 12.0, 24.0, 36.0])
slopes = slope_loading(vals)
curves = curve_loading(vals)
B = tf.stack([ones, slopes, curves])
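To spell out how this plugs into the rest of the graph (a sketch, under the assumption that the rest of the question's code stays as is): B is now an ordinary tensor computed from w rather than a tf.Variable, so only w needs initializing, and gradients flow back through slopes and curves into w. A scalar loss is also safer for minimize() than the question's 5x5 cost matrix.

# Hedged sketch: reuse the stacked B in the original model, with a scalar loss.
pred = tf.matmul(X, B)
cost = tf.reduce_mean(tf.square(Y - pred))   # scalar MSE over all 22x5 entries
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)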
I am trying to reproduce a deep learning regression result in TensorFlow. If I train a neural network with the MLPRegressor class from sklearn I get very nice validation results of 98%.
The MLPRegressor:
http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
I am trying to reproduce the model in TensorFlow by copying the default values of the MLPRegressor class into a TensorFlow model. However, I cannot get the same result; I only get 75% most of the time.
My TF model:
tf.reset_default_graph()
graph = tf.Graph()
n_input = 3 # n variables
n_hidden_1 = 100
n_hidden_2 = 1
n_output = 1
beta = 0.001
learning_rate = 0.001
with graph.as_default():
    tf_train_feat = tf.placeholder(tf.float32, shape=(None, n_input))
    tf_train_label = tf.placeholder(tf.float32, shape=(None))
    tf_test_feat = tf.constant(test_feat, tf.float32)

    """
    Weights and biases. The weights matrix' columns will be the output vector.
    * ndarray([rows, columns])
    * ndarray([in, out])
    tf.placeholder(None) and tf.placeholder([None, 3]) means that the row's size is not set. In the second
    placeholder the columns are fixed at 3.
    """
    W = {
        "layer_1": tf.Variable(tf.truncated_normal([n_input, n_hidden_1])),
        "layer_2": tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2])),
        "layer_3": tf.Variable(tf.truncated_normal([n_hidden_2, n_output])),
    }
    b = {
        "layer_1": tf.Variable(tf.zeros([n_hidden_1])),
        "layer_2": tf.Variable(tf.zeros([n_hidden_2])),
    }

    def computation(X):
        layer_1 = tf.nn.relu(tf.matmul(X, W["layer_1"]) + b["layer_1"])
        layer_2 = tf.nn.relu(tf.matmul(layer_1, W["layer_2"]) + b["layer_2"])
        return layer_2

    tf_prediction = computation(tf_train_feat)
    tf_test_prediction = computation(tf_test_feat)

    tf_loss = tf.reduce_mean(tf.pow(tf_train_label - tf_prediction, 2))
    tf_loss = tf.reduce_mean(tf_loss + beta * tf.nn.l2_loss(W["layer_2"]))
    tf_optimizer = tf.train.AdamOptimizer(learning_rate).minimize(tf_loss)
    #tf_optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(tf_loss)

    init = tf.global_variables_initializer()
My TF session:
def accuracy(y_pred, y):
    a = 0
    for i in range(y.shape[0]):
        a += abs(1 - y_pred[i][0] / y[i])
    return round((1 - a / y.shape[0]) * 100, 3)

def accuracy_tensor(y_pred, y):
    a = 0
    for i in range(y.shape[0]):
        a += abs(1 - y_pred[i][0] / y[i])
    return round((1 - a / y.shape[0]) * 100, 3)

# Shuffles two arrays.
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = np.empty(a.shape, dtype=a.dtype)
    shuffled_b = np.empty(b.shape, dtype=b.dtype)
    permutation = np.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

train_epoch = int(5e4)
batch = int(200)
n_batch = int(X.shape[0] // batch)
prev_acc = 0
stable_count = 0

session = tf.InteractiveSession(graph=graph)
session.run(init)
print("Initialized.\n No. of epochs: %d.\n No. of batches: %d." % (train_epoch, n_batch))

for epoch in range(train_epoch):
    offset = (epoch * n_batch) % (Y.shape[0] - n_batch)
    for i in range(n_batch):
        x = X[offset:(offset + n_batch)]
        y = Y[offset:(offset + n_batch)]
        x, y = shuffle_in_unison(x, y)
        feed_dict = {tf_train_feat: x, tf_train_label: y}
        _, l, pred, pred_label = session.run([tf_optimizer, tf_loss, tf_prediction, tf_train_label], feed_dict=feed_dict)

    if epoch % 1 == 0:
        print("Epoch: %d. Batch' loss: %f" % (epoch, l))
        test_pred = tf_test_prediction.eval(session=session)
        acc_test = accuracy(test_pred, test_label)
        acc_train = accuracy_tensor(pred, pred_label)
        print("Accuracy train set %s%%" % acc_train)
        print("Accuracy test set: %s%%" % acc_test)
Am I missing something in the Tensorflow code? Thanks!
Unless you have a very good reason not to use them, regression should have linear output units. I ran into a similar problem a while back and ended up using linear outputs and linear hidden units, which seemed to mirror the MLPRegressor in my case.
There is a great section in Goodfellow's Deep Learning Book in chapter 6, starting at page 181, that goes over the activation functions.
At the very least, try this for your output layer:
layer_2 = tf.matmul(layer_1, W["layer_2"]) + b["layer_2"]
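Put into the question's computation function, that would look roughly like this (a sketch; the hidden layer keeps its ReLU and only the output becomes linear):

def computation(X):
    # hidden layer keeps its nonlinearity
    layer_1 = tf.nn.relu(tf.matmul(X, W["layer_1"]) + b["layer_1"])
    # linear output unit for the regression target (no ReLU on the last layer)
    layer_2 = tf.matmul(layer_1, W["layer_2"]) + b["layer_2"]
    return layer_2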