Related
After a long time, I am still not able to run my NN without bugs. The accuracy of this toy NN is an astonishing 1-2% (60 neurons in the hidden layer, 100 epochs, 0.3 learning rate, tanh activation, MNIST dataset downloaded via TF), so basically it is not learning at all. After all this time watching videos and posts about backpropagation, I am still not able to fix it.
So my bug must be between the parts marked with the two ##### lines. I think my understanding of derivatives in general is good, but I just cannot connect this knowledge with backpropagation.
If the backpropagation base is correct, then the mistake must be in the axis = 0/1 choices, because I also cannot understand how to determine which axis I should be working on.
Also, I have a strong feeling that dZ2 = A2 - Y might be wrong and should be dZ2 = Y - A2, but after that correction the NN starts to guess only one number.
(And yes, I haven't written the backpropagation myself; I found it on the internet.)
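To make the axis confusion concrete, a minimal check on toy numbers (nothing from the real data):
import numpy as np

A = np.array([[0.1, 0.9],   # pretend rows are samples and columns are classes,
              [0.8, 0.2],   # like A2 below (shape (10000, 10))
              [0.3, 0.7]])
print(np.sum(A, axis=0))    # [1.2 1.8]  -> sums down the columns (over samples)
print(np.sum(A, axis=1))    # [1. 1. 1.] -> sums across each row (over classes)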
#importing data and normalizing it
#"x_test" will be my X
#"y_test" will be my Y
import numpy as np
import tensorflow as tf
(traindataX, traindataY), (testdataX, testdataY) = tf.keras.datasets.mnist.load_data()
x_test = testdataX.reshape(testdataX.shape[0], testdataX.shape[1]**2).astype('float32')
x_test = x_test / 255
y_test = testdataY
y_test = np.eye(10)[y_test]
#Activation functions:
def tanh(z):
    a = (np.exp(z)-np.exp(-z))/(np.exp(z)+np.exp(-z))
    return a

###############################################################################START
def softmax(z):
    smExp = np.exp(z - np.max(z, axis=0))
    out = smExp / np.sum(smExp, axis=0)
    return out
###############################################################################STOP
def neural_network(num_hid, epochs,
                   learning_rate, X, Y):
    #num_hid - number of neurons in the hidden layer
    #X - dataX - shape (10000, 784)
    #Y - labels - shape (10000, 10)
    #initialization
    W1 = np.random.randn(784, num_hid) * 0.01
    W2 = np.random.randn(num_hid, 10) * 0.01
    b1 = np.zeros((1, num_hid))
    b2 = np.zeros((1, 10))
    correct = 0
    for x in range(1, epochs+1):
        #feedforward
        Z1 = np.dot(X, W1) + b1
        A1 = tanh(Z1)
        Z2 = np.dot(A1, W2) + b2
        A2 = softmax(Z2)
        ###############################################################################START
        m = X.shape[1] #-> 784
        loss = - np.sum((Y * np.log(A2)), axis=0, keepdims=True)
        cost = np.sum(loss, axis=1) / m
        #backpropagation
        dZ2 = A2 - Y
        dW2 = (1/m)*np.dot(A1.T, dZ2)
        db2 = (1/m)*np.sum(dZ2, axis = 1, keepdims = True)
        dZ1 = np.multiply(np.dot(dZ2, W2.T), 1 - np.power(A1, 2))
        dW1 = (1/m)*np.dot(X.T, dZ1)
        db1 = (1/m)*np.sum(dZ1, axis = 1, keepdims = True)
        ###############################################################################STOP
        #parameters update - gradient descent
        W1 = W1 - dW1*learning_rate
        b1 = b1 - db1*learning_rate
        W2 = W2 - dW2*learning_rate
        b2 = b2 - db2*learning_rate
        for i in range(np.shape(Y)[1]):
            guess = np.argmax(A2[i, :])
            ans = np.argmax(Y[i, :])
            print(str(x) + " " + str(i) + ". " + "guess: ", guess, "| ans: ", ans)
            if guess == ans:
                correct = correct + 1
    accuracy = (correct/np.shape(Y)[0]) * 100
Lucas,
Good problem for refreshing the fundamentals. I made a few fixes to your code:
1. calculation of m (it must be the number of samples, not the number of features);
2. transposed all weights and biases so that samples run along columns; the softmax and the backprop sums reduce over axis 0 and axis 1 assuming that layout, so the row-major version was normalizing over the wrong axis;
3. changed the calculation of accuracy (and of loss, which is not used).
See the corrected code below. It gets to 90% accuracy with your original parameters:
def neural_network(num_hid, epochs, learning_rate, X, Y):
    #num_hid - number of neurons in the hidden layer
    #X - dataX - shape (10000, 784)
    #Y - labels - shape (10000, 10)
    #initialization
    # W1 = np.random.randn(784, num_hid) * 0.01
    # W2 = np.random.randn(num_hid, 10) * 0.01
    # b1 = np.zeros((1, num_hid))
    # b2 = np.zeros((1, 10))
    W1 = np.random.randn(num_hid, 784) * 0.01
    W2 = np.random.randn(10, num_hid) * 0.01
    b1 = np.zeros((num_hid, 1))
    b2 = np.zeros((10, 1))
    for x in range(1, epochs+1):
        correct = 0  # moved inside the loop
        #feedforward
        # Z1 = np.dot(X, W1) + b1
        Z1 = np.dot(W1, X.T) + b1
        A1 = tanh(Z1)
        # Z2 = np.dot(A1, W2) + b2
        Z2 = np.dot(W2, A1) + b2
        A2 = softmax(Z2)
        ###############################################################################START
        m = X.shape[0]  # number of samples in the batch (10000), not 784
        # loss = - np.sum((Y * np.log(A2)), axis=0, keepdims=True)
        loss = - np.sum((Y.T * np.log(A2)), axis=0, keepdims=True)
        cost = np.sum(loss, axis=1) / m
        #backpropagation
        # dZ2 = A2 - Y
        # dW2 = (1/m)*np.dot(A1.T, dZ2)
        # db2 = (1/m)*np.sum(dZ2, axis = 1, keepdims = True)
        # dZ1 = np.multiply(np.dot(dZ2, W2.T), 1 - np.power(A1, 2))
        # dW1 = (1/m)*np.dot(X.T, dZ1)
        dZ2 = A2 - Y.T
        dW2 = (1/m)*np.dot(dZ2, A1.T)
        db2 = (1/m)*np.sum(dZ2, axis = 1, keepdims = True)
        dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
        dW1 = (1/m)*np.dot(dZ1, X)
        db1 = (1/m)*np.sum(dZ1, axis = 1, keepdims = True)
        ###############################################################################STOP
        #parameters update - gradient descent
        W1 = W1 - dW1*learning_rate
        b1 = b1 - db1*learning_rate
        W2 = W2 - dW2*learning_rate
        b2 = b2 - db2*learning_rate
        guess = np.argmax(A2, axis=0)  # axis fixed
        ans = np.argmax(Y, axis=1)     # axis fixed
        # print(guess.shape, ans.shape)
        correct += sum(guess == ans)
        # print(str(x) + " " + str(i) + ". " + "guess: ", guess, "| ans: ", ans)
        # if guess == ans:
        #     correct = correct + 1
        accuracy = correct / X.shape[0]
        print(f"Epoch {x}. accuracy = {accuracy*100:.2f}%")

neural_network(64, 100, 0.3, x_test, y_test)
Epoch 1. accuracy = 14.93%
Epoch 2. accuracy = 34.70%
Epoch 3. accuracy = 47.41%
(...)
Epoch 98. accuracy = 89.29%
Epoch 99. accuracy = 89.33%
Epoch 100. accuracy = 89.37%
It might be because you need to normalize your inputs between 0 and 1 by dividing X by 255 (255 is the max pixel value), and you should have Y one-hot encoded as vectors of size 10. I think your backprop is right, but you should implement gradient checking to double-check.
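A minimal gradient-checking sketch, using the names from the code above (grad_check, cost_fn, and the n_checks sampling are illustrative, not part of the original): perturb one parameter entry at a time and compare the analytic gradient against a centered finite difference of the cost. For what it's worth, dZ2 = A2 - Y is the standard gradient of cross-entropy composed with softmax, so the check should pass there.
import numpy as np

def grad_check(param, analytic_grad, cost_fn, eps=1e-5, n_checks=5):
    #param: a weight array (e.g. W1); analytic_grad: its backprop gradient (e.g. dW1)
    #cost_fn: a zero-argument closure that re-runs the forward pass and returns the scalar cost
    rng = np.random.default_rng(0)
    for _ in range(n_checks):
        idx = tuple(rng.integers(0, s) for s in param.shape)
        old = param[idx]
        param[idx] = old + eps
        cost_plus = cost_fn()
        param[idx] = old - eps
        cost_minus = cost_fn()
        param[idx] = old  #restore the original value
        numeric = (cost_plus - cost_minus) / (2 * eps)
        print("analytic:", analytic_grad[idx], "numeric:", numeric)  #should agree to several digits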
You are calculating accuracy in the wrong way.
First, the correct variable should be re-initialized to 0 on each epoch. Second, if Y.shape is (10000, 10), then the accuracy loop should be for i in range(np.shape(Y)[0]), not for i in range(np.shape(Y)[1]): the first iterates 10,000 times (once per sample), while the second iterates only 10 times.
A better approach is to use NumPy to count the number of correct guesses: correct = np.sum(np.argmax(A2,axis=1) == np.argmax(Y,axis=1))
Also, your learning rate is too high. I was able to achieve 50% accuracy by setting the learning rate to 0.003 for 50 epochs and 60 hidden neurons:
def neural_network(num_hid, epochs,
                   learning_rate, X, Y):
    #num_hid - number of neurons in the hidden layer
    #X - dataX - shape (10000, 784)
    #Y - labels - shape (10000, 10)
    #initialization
    W1 = np.random.randn(784, num_hid) * 0.01
    W2 = np.random.randn(num_hid, 10) * 0.01
    b1 = np.zeros((1, num_hid))
    b2 = np.zeros((1, 10))
    correct = 0
    for x in range(1, epochs+1):
        #feedforward
        Z1 = np.dot(X, W1) + b1
        A1 = tanh(Z1)
        Z2 = np.dot(A1, W2) + b2
        A2 = softmax(Z2)
        ###############################################################################START
        m = X.shape[1] #-> 784
        loss = - np.sum((Y * np.log(A2)), axis=0, keepdims=True)
        cost = np.sum(loss, axis=1) / m
        #backpropagation
        dZ2 = A2 - Y
        dW2 = (1/m)*np.dot(A1.T, dZ2)
        db2 = (1/m)*np.sum(dZ2, axis = 1, keepdims = True)
        dZ1 = np.multiply(np.dot(dZ2, W2.T), 1 - np.power(A1, 2))
        dW1 = (1/m)*np.dot(X.T, dZ1)
        db1 = (1/m)*np.sum(dZ1, axis = 1, keepdims = True)
        ###############################################################################STOP
        #parameters update - gradient descent
        W1 = W1 - dW1*learning_rate
        b1 = b1 - db1*learning_rate
        W2 = W2 - dW2*learning_rate
        b2 = b2 - db2*learning_rate
        correct = 0
        for i in range(np.shape(Y)[0]):
            guess = np.argmax(A2[i, :])
            ans = np.argmax(Y[i, :])
            # print(str(x) + " " + str(i) + ". " + "guess: ", guess, "| ans: ", ans)
            if guess == ans:
                correct = correct + 1
        # correct = np.sum(np.argmax(A2,axis=1) == np.argmax(Y,axis=1))
        # print(correct)
        accuracy = (correct/np.shape(Y)[0]) * 100
        print(accuracy)
You need to experiment: for good accuracy, try tuning the number of hidden neurons, the number of epochs, and the learning rate.
How can I implement the following linear classifier in TensorFlow:
x1*w1 + x2*w2 + x3*w3 = y_pred,
where x1, x2, x3 are vectors and w1, w2, w3 are scalars?
I have a nice tutorial for the case where x1, x2, x3 are scalars (link), but for the case where they are vectors I have no idea how to implement it.
UPDATE
That is, I am trying to implement the following model:
x1*w1 + x2*w1 + x3*w1 + x4*w2 + x5*w2 + x6*w2 + x7*w3 + x8*w3 + x9*w3 = y_pred,
where x1..x9 and w1..w3 are scalars.
The linear multiclass classifier to be implemented:
pred = w1 * (x1 + x2 + x3) + w2 * (x4 + x5 + x6) + w3 * (x7 + x8 + x9)
in which all variables are scalars.
In this model, since pred is a scalar, you cannot use cross-entropy loss for training the classifier (pred is not a distribution). You have to treat it as a regression problem.
Example dataset
import numpy as np
x1 = np.ones((100, 3)) # for w1
x2 = np.ones((100, 3)) * 2 # for w2
x3 = np.ones((100, 3)) * 3 # for w3
# set(y) is {0, 1, 2, 3}, corresponds to the four class labels
y = np.random.randint(0, 4, 100).reshape(-1, 1)
Example tensorflow code:
import tensorflow as tf
tf.reset_default_graph()
f1 = tf.placeholder('float32', shape=[None, 3], name='f1')
f2 = tf.placeholder('float32', shape=[None, 3], name='f2')
f3 = tf.placeholder('float32', shape=[None, 3], name='f3')
target = tf.placeholder('float32', shape=[None, 1], name='target')
# the three scalars
w1 = tf.get_variable('w1', shape=[1], initializer=tf.random_normal_initializer())
w2 = tf.get_variable('w2', shape=[1], initializer=tf.random_normal_initializer())
w3 = tf.get_variable('w3', shape=[1], initializer=tf.random_normal_initializer())
pred_1 = tf.reduce_sum(tf.multiply(f1, w1), axis=1)
pred_2 = tf.reduce_sum(tf.multiply(f2, w2), axis=1)
pred_3 = tf.reduce_sum(tf.multiply(f3, w3), axis=1)
# till now the linear classifier has been constructed
# pred = w1(x1 + x2 + x3) + w2(x4 + x5 + x6) + w3(x7 + x8 + x9)
pred = tf.add_n([pred_1, pred_2, pred_3])
# treat it as a regression problem
loss = tf.reduce_mean(tf.square(pred - target))
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(50):
        loss_val, _ = sess.run([loss, updates],
                               feed_dict={f1: x1, f2: x2, f3: x3, target: y})
        print(t, loss_val)
Below is a simple example that uses cross-entropy loss to train a multiclass classifier. As you can see, this model is a neural network model:
import numpy as np
import tensorflow as tf
x1 = np.ones((100, 3)) # for w1
x2 = np.ones((100, 3)) * 2 # for w2
x3 = np.ones((100, 3)) * 3 # for w3
y = np.random.randint(0, 4, 400).reshape(100, 4)
tf.reset_default_graph()
f1 = tf.placeholder('float32', shape=[None, 3], name='f1')
f2 = tf.placeholder('float32', shape=[None, 3], name='f2')
f3 = tf.placeholder('float32', shape=[None, 3], name='f3')
target = tf.placeholder('float32', shape=[None, 4], name='target')
# the three scalars
w1 = tf.get_variable('w1', shape=[1], initializer=tf.random_normal_initializer())
w2 = tf.get_variable('w2', shape=[1], initializer=tf.random_normal_initializer())
w3 = tf.get_variable('w3', shape=[1], initializer=tf.random_normal_initializer())
w = tf.get_variable('w', shape=[3, 4], initializer=tf.random_normal_initializer())
pred_1 = tf.reduce_sum(tf.multiply(f1, w1), axis=1)
pred_2 = tf.reduce_sum(tf.multiply(f2, w2), axis=1)
pred_3 = tf.reduce_sum(tf.multiply(f3, w3), axis=1)
pred = tf.stack([pred_1, pred_2, pred_3], axis=1)
pred = tf.matmul(pred, w)
loss = tf.losses.softmax_cross_entropy(onehot_labels=target, logits=pred)
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(50):
        loss_val, _ = sess.run([loss, updates],
                               feed_dict={f1: x1, f2: x2, f3: x3, target: y})
        print(t, loss_val)
I created an array that looks like [w1, w1, w1, w2, w2, w2, ...] and multiplied it (element-wise) by x before summing all the terms up. I could not get model.fit to work, so I copied the train_step code from https://www.tensorflow.org/tutorials/quickstart/advanced. It seems to work just fine; I left my test code at the bottom for you to inspect.
This makes use of TensorFlow 2.0 and the integration with Keras models:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.optimizers import Adam
print(tf.executing_eagerly())
class ProductAdd(Model):
    def __init__(self):
        super(ProductAdd, self).__init__()
        self.vars = list(np.empty([3]))      # Creates an empty list (same as [ , , ])
        for i in range(3):
            self.vars[i] = tf.Variable(      # Creates 3 variables to act as weights
                np.random.standard_normal(), # Assigns each a random value to start
                name='var'+str(i))           # Names them var0, var1, ...

    def call(self, x):
        extended_vars = [self.vars[int(np.floor(i/3))] # "Extends" the var array to look like:
                         for i in range(9)]            # [w1, w1, w1, w2, w2, w2, w3, w3, w3]
        return np.sum(np.multiply(x, extended_vars))   # Performs element-wise multiplication on x and sums

loss_object = MeanSquaredError() # Create loss and optimizer
optimizer = Adam()

# @tf.function # (decorator left commented out) This function performs one training step
def train_step(images, labels): # I got it from https://www.tensorflow.org/tutorials/quickstart/advanced
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

model = ProductAdd()
for _ in range(100):
    train_step([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [0.0])
print(model([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]).numpy())
This question is ill-posed. You say you want x_1, x_2, x_3 to be vectors, however it's not clear what you would do with w_1, w_2, w_3. There are two possibilities.
If you want to keep them as scalars, as your question seems to imply, then the model is not really a vector model, you're just doing the same scalar operation on all the entries of the x vectors, but at once. This is equivalent to a scalar model.
Otherwise, you can define w_1, w_2, w_3 as matrices, or row vectors, if the label is scalar. In this case, there is no reason to write the equation as you wrote it, because you could stack the xs in a single vector and the ws in a single vector and write wx = y. In any case, this is a multivariate linear regression, of which you can find many examples, and tutorials on how to solve it in Tensorflow and Torch.
Update, given OP's clarification
In your comment, you now say you're interested in solving the following equation:
w1*(x1 + x2 + x3) + w2*(x4 + x5 + x6) + w3*(x7 + x8 + x9) == y
where all variables are scalars. Note that the x variables are known, so we can define (a simple arithmetic operation):
z1 = x1 + x2 + x3; z2 = x4 + x5 + x6; z3 = x7 + x8 + x9
And the equation becomes
w1*z1 + w2*z2 + w3*z3 = y.
So this is more like a linear algebra question than a tensorflow/torch question, because this equation can be solved analytically and does not require numerical fitting. However, it is still ill-defined, because it has 3 unknowns (w1, w2, w3) for one linear equation. So it will not have a unique solution, but a two-dimensional linear space of solutions (it identifies a plane in the 3-dimensional w-space). To get some solutions, you can arbitrarily decide to set, for example, w1 = w2 = 0, from which you automatically get w3 = y/z3. Then do the same for the other two, and you'll get three different and linearly independent solutions.
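A quick numerical illustration of those particular solutions (the z and y values are made up):
import numpy as np

z = np.array([6.0, 15.0, 24.0])  #hypothetical z1, z2, z3
y = 3.0
for i in range(3):
    w = np.zeros(3)        #zero out two of the three weights...
    w[i] = y / z[i]        #...and solve for the remaining one
    print(w, "->", w @ z)  #each solution reproduces y exactly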
Hope this helps. In summary, you don't need code at all.
Second update (from comment)
Why does it need to be solved using optimization? If the problem is as you presented it, it clearly does not, unless you mean you have many values for the Xs and Ys. In that case, you're doing multivariate linear regression. MLR can be solved using ordinary least squares; see for example https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9
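A minimal ordinary-least-squares sketch for that case, on made-up data (w_true and the noise level are illustrative):
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))                  #each row holds (z1, z2, z3) for one observation
w_true = np.array([1.5, -2.0, 0.5])
y = Z @ w_true + 0.01 * rng.normal(size=100)   #noisy targets
w_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)  #SVD-based least squares
print(w_hat)                                   #should be close to w_true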
I've been trying to train a model as usual with train/test data. I got my accuracy and cost, plus the validation accuracy and cost, so I presume the model is working, and the result is good enough at 85%.
Now that I'm done with the train/test data, I have a csv file with the same type and structure of data, but without one column (default, which indicates whether the client will pay or be delayed). I'm trying to predict this value with the model, but I'm stuck on how to feed in this data and get back the missing column.
Problem section:
This is my code for restoring the model and predicting on the new data (y_pred, [5100 x 41]):
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('my_test_model101.meta')
    print("Model found.")
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print("Model restored compl.")
    z = tf.placeholder(tf.float32, shape=(None, 5100))
    y_pred = y_pred.as_matrix()
    output = sess.run(z, feed_dict={x: y_pred})
    print(output)
Can anyone help me understand what I am doing wrong here?
Error message is:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_4' with dtype float and shape [?,5100]
[[Node: Placeholder_4 = Placeholder[dtype=DT_FLOAT, shape=[?,5100], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Expecting:
My input is [5100 x 41], but the last column initially holds NaN values; I want it filled with the predicted value, which is supposed to be 0 or 1.
The trained model architecture:
# Number of input nodes.
input_nodes = 41
# Multiplier maintains a fixed ratio of nodes between each layer.
multiplier = 3
# Number of nodes in each hidden layer
hidden_nodes1 = 41
hidden_nodes2 = round(hidden_nodes1 * multiplier)
hidden_nodes3 = round(hidden_nodes2 * multiplier)
# Percent of nodes to keep during dropout.
pkeep = tf.placeholder(tf.float32)
# input
x = tf.placeholder(tf.float32, [None, input_nodes])
# layer 1
W1 = tf.Variable(tf.truncated_normal([input_nodes, hidden_nodes1], stddev = 0.15))
b1 = tf.Variable(tf.zeros([hidden_nodes1]))
y1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
# layer 2
W2 = tf.Variable(tf.truncated_normal([hidden_nodes1, hidden_nodes2], stddev = 0.15))
b2 = tf.Variable(tf.zeros([hidden_nodes2]))
y2 = tf.nn.sigmoid(tf.matmul(y1, W2) + b2)
# layer 3
W3 = tf.Variable(tf.truncated_normal([hidden_nodes2, hidden_nodes3], stddev = 0.15))
b3 = tf.Variable(tf.zeros([hidden_nodes3]))
y3 = tf.nn.sigmoid(tf.matmul(y2, W3) + b3)
y3 = tf.nn.dropout(y3, pkeep)
# layer 4
W4 = tf.Variable(tf.truncated_normal([hidden_nodes3, 2], stddev = 0.15))
b4 = tf.Variable(tf.zeros([2]))
y4 = tf.nn.softmax(tf.matmul(y3, W4) + b4)
# output
y = y4
y_ = tf.placeholder(tf.float32, [None, 2])
After building the model, I understand you need to add a placeholder to store what you're looking for. So:
# Parameters
training_epochs = 5 # These proved to be enough to let the network learn
training_dropout = 0.9
display_step = 1 # 10
n_samples = y_train.shape[0]
batch_size = 2048
learning_rate = 0.001
# Cost function: Cross Entropy
cost = -tf.reduce_sum(y_ * tf.log(y))
# We will optimize our model via AdamOptimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# Correct prediction if the most likely value (default or non Default) from softmax equals the target value.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Till now everything is working well, and I saved the model. I was able to restore it (printed the variables and all was there, so the restore is fine).
The placeholder 'z' has nothing in it and nothing is assigned to it, so when you run the session nothing needs to be computed: 'z' does not depend on anything in the model. I think you want
output = sess.run(y, feed_dict={x: y_pred})
because 'y' is the output tensor.
Having said that, I think you might want to read up a little more on the flow graph used by tensorflow to understand how the calculations happen. Currently, it doesn't sound like you have fully understood the placeholder variables.
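As a sketch of what that looks like end to end (assuming x, y, and pkeep from the model definition are still in scope in this process; otherwise fetch them by name via the restored graph, e.g. with get_tensor_by_name). new_data is a stand-in for the [5100 x 41] feature matrix:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('my_test_model101.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    #the model has a dropout placeholder, so pkeep must be fed at prediction time too
    probs = sess.run(y, feed_dict={x: new_data, pkeep: 1.0})
    predicted_class = probs.argmax(axis=1)  #0 or 1 for each of the 5100 rows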
Thanks for looking into this question.
I am trying to train a 3-layer NN to predict the stock price over the next 10 days based on the stock price in the previous 15 days. While using the GradientDescentOptimizer, the weights of the variables have not changed, hence I would like to seek some assistance from you. I have tried the following:
Checked that there is a tf.placeholder and that I have fed in a tensor with the correct dimensions.
Changed the learning rate to see if the loss improves.
Changed the loss function from reduce_sum to reduce_mean of the squared differences between the actual data and the prediction.
Randomised my tf.Variables.
The code that I'm running is as follows. Some symbols are not defined here for clarity of the code. I appreciate your kind advice on this matter!
#Setting value placeholder
x = tf.placeholder(tf.float64,shape=(19,15,1), name = 'Input')
y_ = tf.placeholder(tf.float64,shape=(19,10,1), name = 'Output')
#Setting DNN key architectural values
n_layers = 3
n_nodes_l1 = 20
n_nodes_l2 = 30
n_nodes_l3 = 10
W01 = tf.Variable(tf.random_uniform([n_nodes_l1, 15],0,1,dtype=tf.float64,name="W01"))
W02 = tf.Variable(tf.random_uniform([n_nodes_l2, n_nodes_l1],0,1,dtype=tf.float64),name='W02')
W03 = tf.Variable(tf.random_uniform([n_nodes_l3, n_nodes_l2],0,1,dtype=tf.float64),name='W03')
b01 = tf.Variable(tf.random_uniform([n_nodes_l1,1],0,1,dtype=tf.float64),name='b01')
b02 = tf.Variable(tf.random_uniform([n_nodes_l2,1],0,1,dtype=tf.float64),name='b02')
b03 = tf.Variable(tf.random_uniform([n_nodes_l3,1],0,1,dtype=tf.float64),name='b03')
#Building the architecture
def neural(X):
    a01 = tf.matmul(W01, X) + b01
    X2 = tf.sigmoid(a01)
    a02 = tf.matmul(W02, X2) + b02
    X3 = tf.sigmoid(a02)
    a03 = tf.matmul(W03, X3) + b03
    y_prediction = tf.sigmoid(a03)
    return y_prediction
#Loss and Optimizer
loss = []
final_loss= []
y_pred_col = []
for n_batch in range(0, len(x_data)):
    y_pred = neural(x[n_batch])
    y_pred_col.append(y_pred)
loss = tf.reduce_mean(tf.square(y_ - y_pred_col))
optimizer = tf.train.GradientDescentOptimizer(0.0005).minimize(loss)
#Setting up Tensor Session
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
n_steps = 30
for iter in range(n_steps):
    _, l, W01_train = sess.run([optimizer, loss, W01], feed_dict={x: x_data, y_: y_data})
    print(l)
I would do things a bit differently. There is something that doesn't make sense in your code:
for n_batch in range(0, len(x_data)):
    y_pred = neural(x[n_batch])
    y_pred_col.append(y_pred)
Here, each call of neural is creating a new neural network, so you end up having len(x_data) networks. I presume that you want a single network. In that case, you should be calling neural only once:
y_pred = neural(x)
This will require you to define the tf.matmul operations from neural in a different way (as now you need to take the first dimension of X into account). The loss function will then be defined as:
loss = tf.reduce_mean(tf.square(y_ - y_pred))
Putting it all together:
#Setting value placeholder
x = tf.placeholder(tf.float64,shape=(None,15), name = 'Input')
y_ = tf.placeholder(tf.float64,shape=(None,10), name = 'Output')
#Setting DNN key architectural values
n_layers = 3
n_nodes_l1 = 20
n_nodes_l2 = 30
n_nodes_l3 = 10
W01 = tf.Variable(tf.random_uniform([15, n_nodes_l1],0,1,dtype=tf.float64,name="W01"))
W02 = tf.Variable(tf.random_uniform([n_nodes_l1, n_nodes_l2],0,1,dtype=tf.float64),name='W02')
W03 = tf.Variable(tf.random_uniform([n_nodes_l2, n_nodes_l3],0,1,dtype=tf.float64),name='W03')
b01 = tf.Variable(tf.random_uniform([n_nodes_l1],0,1,dtype=tf.float64),name='b01')
b02 = tf.Variable(tf.random_uniform([n_nodes_l2],0,1,dtype=tf.float64),name='b02')
b03 = tf.Variable(tf.random_uniform([n_nodes_l3],0,1,dtype=tf.float64),name='b03')
#Building the architecture
def neural(X):
    a01 = tf.matmul(X, W01) + b01
    X2 = tf.sigmoid(a01)
    a02 = tf.matmul(X2, W02) + b02
    X3 = tf.sigmoid(a02)
    a03 = tf.matmul(X3, W03) + b03
    y_prediction = tf.sigmoid(a03)
    return y_prediction
#Loss and Optimizer
y_pred = neural(x)
loss = tf.reduce_mean(tf.square(y_ - y_pred))
optimizer = tf.train.GradientDescentOptimizer(0.0005).minimize(loss)
#Setting up Tensor Session
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
n_steps = 30
for iter in range(n_steps):
    _, l, W01_train = sess.run([optimizer, loss, W01], feed_dict={x: x_data, y_: y_data})
    print(l)
Note that I changed the definition of the placeholders and weights for convenience. The code above will run provided that the shapes of x_data and y_data are (batch_size=19, 15) and (batch_size=19, 10), respectively. If the problem still remains after these modifications, then it is probably due to other reasons (i.e. dependent on your data or hyperparameters).
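For a quick smoke test of the code above, dummy data with the expected shapes (made up, just to exercise the graph):
import numpy as np
x_data = np.random.rand(19, 15)  #19 samples, 15 previous days each
y_data = np.random.rand(19, 10)  #19 samples, 10 target days each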
I am working on a model where I have to classify my data into two classes. Most of the code examples use tf.nn.sigmoid_cross_entropy_with_logits for calculating the cross-entropy for binary classification.
When I use the same function to train my model, I get negative values of entropy. Can I use tf.nn.softmax_cross_entropy_with_logits to overcome the negative entropy?
x = tf.placeholder(tf.float32, [None, Pixels])
W1 = tf.Variable(tf.random_normal([Pixels, Nodes1], stddev=0.01))
b1 = tf.Variable(tf.zeros([Nodes1]))
y1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.random_normal([Nodes1, Labels], stddev=0.01))
b2 = tf.Variable(tf.zeros([Labels]))
y = tf.matmul(y1, W2) + b2
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_))
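For reference, a minimal reproduction of where the negative values come from, assuming the labels/logits arguments really are swapped as written above (the numbers are made up):
import numpy as np

#tf.nn.sigmoid_cross_entropy_with_logits(labels=t, logits=z) computes
#  max(z, 0) - z*t + log(1 + exp(-|z|)),
#which is >= 0 whenever t is in [0, 1]. Feeding raw network outputs as
#labels (as in labels=y above) lets t leave that range, and the "entropy"
#can then go negative:
z, t = 3.0, 2.5  #a "label" of 2.5 is invalid
loss = max(z, 0) - z * t + np.log1p(np.exp(-abs(z)))
print(loss)      #about -4.45, a negative "cross-entropy"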