I am new to TensorFlow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. This is a little unexpected because the function to regress is linear and simple, 2*x + y, so I expected better performance.
I am asking for help to check whether I did anything wrong in the code. I carefully checked the matrix dimensions, so those should be fine, but it is possible that I misunderstood something and the network or the session is not properly configured. For example, should I run the training step multiple times instead of just once (the code below enclosed by # TRAINING #)? In some examples I see the data fed in piece by piece, with training run progressively; here I run the training just once and feed in all the data.
If the code is fine, maybe this is a modeling issue, but I really don't expect to need a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshapes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
# TRAINING
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
    predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))
You're right, you identified the problem yourself.
The problem is, in fact, that you run the optimization step only once. You therefore perform a single update of the network parameters, so the cost barely decreases.
I just changed the training section of your code to make it work as expected (100 training steps):
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
    _, c = sess.run(
        [optimizer, loss],
        feed_dict={
            lr: 0.05,
            ph_inputs: inputs,
            ph_labels: labels
        })
    print("Train step {} loss value {}".format(i, c))
# TRAINING
and at the end of training I get:
Train step 99 loss value 0.04462708160281181
0.044106700712455045
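As a side note, since the question also mentions feeding the data piece by piece: a minimal mini-batch variant of the same training loop could look like the sketch below. The batch size and the number of epochs are arbitrary illustrative choices, not tuned values.
# Illustrative mini-batch variant of the training loop; batch_size and n_epochs are arbitrary
batch_size = 256
n_epochs = 100
n_batches = input_size // batch_size
for epoch in range(n_epochs):
    perm = np.random.permutation(input_size)  # reshuffle the samples every epoch
    for b in range(n_batches):
        idx = perm[b * batch_size:(b + 1) * batch_size]
        _, c = sess.run([optimizer, loss],
                        feed_dict={lr: 0.05, ph_inputs: inputs[idx], ph_labels: labels[idx]})
    print("Epoch {} last batch loss {}".format(epoch, c))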
I've been working on this neural network with the intent to predict TBA (time-based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine and gives me some predictions; however, I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see myself. Here is my current code:
# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of features in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initializers
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg",
                                                     distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]
    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})
        # Show progress
        if np.mod(i, 50) == 0:
            mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
            mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
pred = net.run(out, feed_dict={X: X_test})
print(pred)
I have tried tweaking the number of hidden layers, the number of nodes per layer, the number of epochs to run, and different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It would make it much easier if you shared a small dataset that illustrates the problem. However, I will describe some common issues with non-standard datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization - methods that are always worth trying when looking for some extra accuracy. See the dropout method (original paper) and the overviews of it; a minimal dropout sketch is given below.
Unbalanced data - Sometimes categories/events in a time series behave like anomalies, or are simply unbalanced. If you read a book, words like "the" or "it" appear far more often than "warehouse". This becomes a problem if your main task is to detect the word "warehouse" and you train your network (even LSTMs) in the traditional way. A way to overcome this is to balance the samples (create balanced datasets) or to give more weight to low-frequency categories.
Model structure - Sometimes fully connected layers are not enough. In computer vision problems, for instance, we train with convolutional layers. The convolution and pooling layers enforce structure on the model, which suits images; this also acts as a form of regularization, since those layers have fewer parameters. In time-series problems convolutions are also possible, and it turns out they work just fine. See the example in Conditional Time Series Forecasting with Convolutional Neural Networks.
The above suggestions are presented in the order I would suggest to try.
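As a concrete illustration of the dropout point, here is a minimal sketch in the same TF1 style as the code in the question; the keep probability of 0.5 is an arbitrary illustrative value, not a recommendation.
# Minimal dropout sketch; keep_prob = 0.5 during training is an arbitrary choice
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_1 = tf.nn.dropout(hidden_1, keep_prob=keep_prob)  # randomly zeroes units during training
# ... build the remaining layers on top of the dropped-out activations, then:
# net.run(opt, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.5})   # training
# net.run(mse, feed_dict={X: X_test, Y: y_test, keep_prob: 1.0})     # evaluation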
Good luck!
My input data is as follows:
AT V AP RH PE
14.96 41.76 1024.07 73.17 463.26
25.18 62.96 1020.04 59.08 444.37
5.11 39.4 1012.16 92.14 488.56
20.86 57.32 1010.24 76.64 446.48
10.82 37.5 1009.23 96.62 473.9
26.27 59.44 1012.23 58.77 443.67
15.89 43.96 1014.02 75.24 467.35
9.48 44.71 1019.12 66.43 478.42
14.64 45 1021.78 41.25 475.98
....................................
I am basically working in Python using the TensorFlow library.
As of now, I have a linear model which is working fine for 4 inputs and 1 output. This is basically a regression problem.
For example, after training my neural network with sufficient data (say a dataset of some 10000 samples), if I then pass the values 45, 30, 25, 32 as inputs, it returns the value 46 as output.
I basically have two queries:
As of now, in my code, I am using the parameters training_epochs, learning_rate, etc. I am currently giving training_epochs the value 10000. When I test my neural network by passing four input values, I get an output of about 471.25, while I expect it to be 460. But if I give training_epochs the value 20000 instead of 10000, I get an output of 120.5, which is not at all close to the actual value of 460.
Can you please explain how one should choose the values of training_epochs and learning_rate (or any other parameter values) in my code, so that I can get good accuracy?
Now, the second issue: my neural network currently works only for linear data and only for one output. If I want to have 3 inputs and 2 outputs, and also a non-linear model, what are the possible changes I can make to my code?
I am posting my code below:
import tensorflow as tf
import numpy as np
import pandas as pd
#import matplotlib.pyplot as plt
rng = np.random
# In[180]:
# Parameters
learning_rate = 0.01
training_epochs = 10000
display_step = 1000
# In[171]:
# Read data from CSV
df = pd.read_csv("H:\MiniThessis\Sample.csv")
# In[173]:
# Separating out dependent & independent variables
train_x = df[['AT','V','AP','RH']]
train_y = df[['PE']]
trainx = train_x.as_matrix().astype(np.float32)
trainy = train_y.as_matrix().astype(np.float32)
# In[174]:
n_input = 4
n_classes = 1
n_hidden_1 = 5
n_samples = 9569
# tf Graph Input
#Inserts a placeholder for a tensor that will be always fed.
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Set model weights
W_h1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
W_out = tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
b_h1 = tf.Variable(tf.random_normal([n_hidden_1]))
b_out = tf.Variable(tf.random_normal([n_classes]))
# In[175]:
# Construct a linear model
layer_1 = tf.matmul(x, W_h1) + b_h1
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, W_out) + b_out
# In[176]:
# Mean squared error
cost = tf.reduce_sum(tf.pow(out_layer-y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# In[177]:
# Initializing the variables
init = tf.global_variables_initializer()
# In[181]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={x: trainx, y: trainy})
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={x: trainx, y: trainy})
    print(training_cost)
    correct_prediction = tf.equal(tf.argmax(out_layer, 1), tf.argmax(y, 1))
    best = sess.run([out_layer],
                    feed_dict={x: np.array([[14.96, 41.76, 1024.07, 73.17]])})
    print(correct_prediction)
    print(best)
1. You can adjust the following lines:
# In general, biases are initialized to zeros or a small non-zero constant, not drawn from a Gaussian
b_h1 = tf.Variable(tf.zeros([n_hidden_1]))
b_out = tf.Variable(tf.zeros([n_classes]))
# MSE cost (tf.reduce_mean already averages over the samples)
cost = tf.reduce_mean(tf.pow(out_layer - y, 2))
Also, feed the data as mini-batches: the optimizer you are using is designed for mini-batch optimization, and feeding the whole dataset at once does not give optimal performance.
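A minimal mini-batch loop for the graph in the question could look like the following sketch; batch_size = 64 is an arbitrary choice, and the loop replaces the single full-batch sess.run call inside the epoch loop (with mini-batches you will typically need far fewer than 10000 epochs).
# Illustrative mini-batch training loop; batch_size is an arbitrary choice
batch_size = 64
n_batches = trainx.shape[0] // batch_size
for epoch in range(training_epochs):
    perm = np.random.permutation(trainx.shape[0])  # reshuffle the samples every epoch
    for b in range(n_batches):
        idx = perm[b * batch_size:(b + 1) * batch_size]
        _, c = sess.run([optimizer, cost],
                        feed_dict={x: trainx[idx], y: trainy[idx]})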
2.
For multiple outputs you mainly need to change n_classes; if the task becomes a classification problem you would also change the cost function (for example to tf.nn.softmax_cross_entropy_with_logits), while for a multi-output regression the MSE cost still works. Also, the model you defined here isn't linear, since you are using the non-linear activation function tf.nn.relu.
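For example, a 3-input, 2-output non-linear regression version of the graph could be sketched as follows; this keeps the MSE cost since the targets are continuous, and reuses n_hidden_1 from the question's code.
# Hypothetical sketch: 3 inputs, 2 continuous outputs, one non-linear hidden layer
n_input = 3
n_classes = 2
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
W_h1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
W_out = tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
b_h1 = tf.Variable(tf.zeros([n_hidden_1]))
b_out = tf.Variable(tf.zeros([n_classes]))
layer_1 = tf.nn.relu(tf.matmul(x, W_h1) + b_h1)  # non-linear hidden layer
out_layer = tf.matmul(layer_1, W_out) + b_out    # two regression outputs
cost = tf.reduce_mean(tf.pow(out_layer - y, 2))  # MSE over both outputs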
I am trying to write a two layer neural network to train a class labeler. The input to the network is a 150-feature list of about 1000 examples; all features on all examples have been L2 normalized.
I only have two outputs, and they should be disjoint--I am just attempting to predict whether the example is a one or a zero.
My code is relatively simple; I am feeding the input data into the hidden layer, and then the hidden layer into the output. As I really just want to see this working in action, I am training on the entire data set with each step.
My code is below. Based on the other NN implementations I have referred to, I believe that the performance of this network should improve over time. However, regardless of the number of epochs I set, I am getting back an accuracy of about 20%. The accuracy does not change when the number of steps is changed, so I don't believe that my weights and biases are being updated.
Is there something obvious I am missing with my model? Thanks!
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.01
epochs = 200
n_input = 150
n_hidden = 75
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden]))
b1 = tf.Variable(tf.truncated_normal([n_output]))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden]))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output]))
# step function
def returnPred(x,w0,w1,b0,b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
model = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
    sess.run(model,feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
I changed your optimizer to AdamOptimizer (in many cases it performs better than GradientDescentOptimizer).
I also played a bit with the parameters. In particular, I used a smaller standard deviation for your variable initialization, decreased the learning rate (as your loss was unstable and "jumped around"), and increased the number of epochs (as I noticed that your loss kept decreasing).
I also reduced the size of the hidden layer. It is harder to train networks with a large hidden layer when you don't have much data.
Regarding your loss, it is better to apply tf.reduce_mean to it so that the loss is a single number. In addition, following the answer of ml4294, I used softmax instead of sigmoid, so the loss looks like:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y))
The code below achieves accuracy of around 99.9% on the training data:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.002
epochs = 400
n_input = 150
n_hidden = 60
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden],stddev=0.2,seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output],stddev=0.2,seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden],stddev=0.2,seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output],stddev=0.2,seed=0))
# step function
def returnPred(x,w0,w1,b0,b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y)) # calculate loss between prediction and actual
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
    sess.run([model,loss],feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
Just a suggestion in addition to the answer provided by Miriam Farber:
You use a multi-dimensional output label ([0., 1.]) for the classification. I suggest using the softmax cross entropy tf.nn.softmax_cross_entropy_with_logits() instead of the sigmoid cross entropy, since you assume the outputs to be disjoint (see softmax on Wikipedia). I achieved much faster convergence with this small modification.
This should also improve your performance once you decide to increase your output dimensionality from 2 to a higher number.
I guess you have some problem here:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
It should look something like this:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y))
I didn't look at your code much, so if this doesn't work out you can check the Udacity deep learning course or forum; they have good examples of what you are trying to do.
Good luck!
I am trying to implement a simple linear regression in tensorflow (with the goal of eventually extending it to more advanced models). My current code looks as follows:
def linear_regression(data, labels):
    # Setup placeholders and variables
    num_datapoints = data.shape[0]
    num_features = data.shape[1]
    x = tf.placeholder(tf.float32, [None, num_features])
    y_ = tf.placeholder(tf.float32, [None])
    coeffs = tf.Variable(tf.random_normal(shape=[num_features, 1]))
    bias = tf.Variable(tf.random_normal(shape=[1]))
    # Prediction
    y = tf.matmul(x, coeffs) + bias
    # Cost function
    cost = tf.reduce_sum(tf.pow(y-y_, 2))/(2.*num_datapoints)
    # Optimizer
    NUM_STEPS = 500
    optimizer = tf.train.AdamOptimizer()
    train_step = optimizer.minimize(cost)
    # Fit the model
    init = tf.initialize_all_variables()
    cost_history = np.zeros(NUM_STEPS)
    sess = tf.Session()
    sess.run(init)
    for i in range(NUM_STEPS):
        if i % 100 == 0:
            print 'Step:', i
        for xi, yi in zip(data, labels):
            sess.run(train_step, feed_dict={x: np.expand_dims(xi, axis=0),
                                            y_: np.expand_dims(yi, axis=0)})
        cost_history[i] = sess.run(cost, feed_dict={x: data,
                                                    y_: labels})
    return sess.run(coeffs), cost_history
The code works, and finds the correct coefficients. However, it is extremely slow. On my MacBook Pro, it takes several minutes just to run a few training epochs for a data set with 1000 data points and 10 features. Since I'm running OSX I don't have GPU acceleration, which could explain some of the slowness, but I would think that it could be faster than this. I have experimented with different optimizers, but the performance is very similar.
Is there some obvious way to speed up this code? Otherwise, it feels like tensorflow is pretty much useless for these types of problems.
It is so slow because you train the network point by point, which requires NUM_STEPS * num_datapoints iterations (500,000 session runs in your case).
All you actually need to train your network is
for i in range(NUM_STEPS):
    sess.run(train_step, feed_dict={x: data, y_:labels})
This would take just a couple of seconds.
I started with a simple implementation of single-variable linear regression with gradient descent, but I don't know how to extend it to a multivariate stochastic gradient descent algorithm.
Single variable linear regression
import tensorflow as tf
import numpy as np
# create random data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.5
# Find values for W that compute y_data = W * x_data
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
y = W * x_data
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# Before starting, initialize the variables
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
for step in xrange(2001):
    sess.run(train)
    if step % 200 == 0:
        print(step, sess.run(W))
You have two parts in your question:
How to change this problem to a higher dimension space.
How to change from the batch gradient descent to a stochastic gradient descent.
To get a higher-dimensional setting, you can define your linear problem as y = <x, w>. Then you just need to change the dimension of your Variable W to match that of w and replace the multiplication W*x_data by the product tf.matmul(x_data, W), and your code should run just fine.
To change the learning method to a stochastic gradient descent, you need to abstract the input of your cost function by using tf.placeholder.
Once you have defined X and y_ to hold your input at each step, you can construct the same cost function. Then, you need to call your step by feeding the proper mini-batch of your data.
Here is an example of how you could implement such behavior; it should show that W quickly converges to w.
import tensorflow as tf
import numpy as np
# Define dimensions
d = 10 # Size of the parameter space
N = 1000 # Number of data sample
# create random data
w = .5*np.ones(d)
x_data = np.random.random((N, d)).astype(np.float32)
y_data = x_data.dot(w).reshape((-1, 1))
# Define placeholders to feed mini_batches
X = tf.placeholder(tf.float32, shape=[None, d], name='X')
y_ = tf.placeholder(tf.float32, shape=[None, 1], name='y')
# Find values for W that compute y_data = <x, W>
W = tf.Variable(tf.random_uniform([d, 1], -1.0, 1.0))
y = tf.matmul(X, W, name='y_pred')
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y_ - y))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# Before starting, initialize the variables
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
mini_batch_size = 100
n_batch = N // mini_batch_size + (N % mini_batch_size != 0)
for step in range(2001):
    i_batch = (step % n_batch)*mini_batch_size
    batch = x_data[i_batch:i_batch+mini_batch_size], y_data[i_batch:i_batch+mini_batch_size]
    sess.run(train, feed_dict={X: batch[0], y_: batch[1]})
    if step % 200 == 0:
        print(step, sess.run(W))
Two side notes:
The implementation above is called mini-batch gradient descent, as at each step the gradient is computed using a subset of the data of size mini_batch_size. This is a variant of stochastic gradient descent that is usually used to stabilize the gradient estimate at each step. Stochastic gradient descent proper is obtained by setting mini_batch_size = 1.
The dataset can be shuffled at every epoch to get an implementation closer to the theoretical setting. Some recent work also considers using only one pass through the dataset, as this prevents over-fitting; for a more mathematical and detailed explanation, see Bottou12. This can easily be changed according to your problem setup and the statistical properties you are looking for.
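For instance, the per-epoch shuffling mentioned in the second note could be sketched like this, reusing the names from the code above; the number of epochs is an arbitrary illustrative choice.
# Illustrative per-epoch shuffling for the mini-batch loop above; n_epochs is arbitrary
n_epochs = 20
for epoch in range(n_epochs):
    perm = np.random.permutation(N)  # new sample order every epoch
    x_shuf, y_shuf = x_data[perm], y_data[perm]
    for b in range(n_batch):
        i_batch = b * mini_batch_size
        sess.run(train, feed_dict={X: x_shuf[i_batch:i_batch + mini_batch_size],
                                   y_: y_shuf[i_batch:i_batch + mini_batch_size]})
    print(epoch, sess.run(W))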