I have extracted the following code, which calculates the cost function over a number of iterations. Before this point, feature scaling, reshaping, the LSTM definition, and the training setup have already been done, and the same data and variables are used to compute the cost function.
# learning parameter
learning_rate = 0.01
# iterative parameters
EPOCHS = 1000      # number of iterations
PRINT_STEP = 100   # the interval of printing validation result

# read data and data preprocessings
read_data_pd = pd.read_csv('./price.csv')
input_pd = read_data_pd.drop(['year', 'month', 'day'], axis=1)
temp_pd = feature_scaling(input_pd[feature_to_scale], sacling_meathod)  # call the function feature scaling
input_pd[feature_to_scale] = temp_pd

x_ = tf.placeholder(tf.float32, [None, batch_size, n_features])
y_ = tf.placeholder(tf.float32, [None, 1])

# call the lstm-rnn function
lstm_output = lstm(x_, n_features, batch_size, n_lstm_layers, lstm_scope_name)

# linear regressor
# W is the weight vector
W = tf.Variable(tf.random_normal([n_features, n_pred_class]))
# b is the bias
b = tf.Variable(tf.random_normal([n_pred_class]))
# Y = WX + b
y = tf.matmul(lstm_output, W) + b

# define the cost function
cost_func = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(cost_func)

# initialize all variables
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for ii in range(EPOCHS):
        sess.run(train_op, feed_dict={x_: train_input_nparr, y_: train_target_nparr})
        if ii % PRINT_STEP == 0:
            cost = sess.run(cost_func, feed_dict={x_: train_input_nparr, y_: train_target_nparr})
            print 'iteration =', ii, 'training cost:', cost
When I run the program, the cost values are printed as expected. However, each run produces different results: for example, at iteration 100 the printed cost is sometimes 0.868856 and sometimes 0.905526, which makes me think something is wrong with the code.
One thing I noticed is the variable-initialization line, tf.initialize_all_variables(), for which a deprecation message says:
initialize_all_variables is deprecated and will be removed after 2017-03-02.
Instructions for updating: Use `tf.global_variables_initializer` instead.
I followed that instruction, but it did not change the behaviour.
Therefore, I would like to know what is wrong with the code and how to correct it.
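For reference, one way to check whether the run-to-run differences come only from the random initial values of W and b would be to fix the random seeds before the graph is built. This is a minimal diagnostic sketch, assuming TF 1.x; the shapes used here are placeholders, not the ones from the code above:

import numpy as np
import tensorflow as tf

np.random.seed(0)        # fix NumPy's RNG (covers any NumPy-based preprocessing)
tf.set_random_seed(0)    # fix the graph-level TensorFlow seed before creating variables

# Variables created after this point get the same initial values on every run.
W = tf.Variable(tf.random_normal([4, 1]))
b = tf.Variable(tf.random_normal([1]))

With the seeds fixed, any remaining run-to-run variation would have to come from something other than the random initialization.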
I want to use tf.identity to copy the loss and variables either before or after the optimization step:
Here is the before case:
1. Copy the current loss and variables (save the variables and the associated loss).
2. Run one step of optimization (which changes the loss and the variable values).
3. Repeat.
Here is the after case:
1. Run one step of optimization (which changes the loss and the variable values).
2. Copy the current loss and variables (save the variables and the associated loss).
3. Repeat.
By "copy", I mean to create nodes in the computation graph to store the current values of loss and variables with tf.identity.
Somehow, this is what actually happens:
1. Copy the loss.
2. Run one step of optimization (which changes the loss and the variable values).
3. Copy the variable (this value does not correspond to the loss saved in step 1).
4. Repeat.
How to fix this?
I could evaluate the loss again right after step 3. But that means wasting one evaluation of the loss in every cycle.
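For concreteness, that wasteful workaround would look something like the following sketch (using the same toy graph as in the test below): run the optimization step first, then fetch the loss and the variable together in a second sess.run call, so the two saved values are consistent, at the price of one extra loss evaluation per cycle.

import numpy as np
import tensorflow as tf

x = tf.get_variable('x', initializer=np.array([1], dtype=np.float64))
loss = x * x
train_op = tf.train.AdamOptimizer(1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3):
        sess.run(train_op)                     # run one optimization step
        loss_val, x_val = sess.run([loss, x])  # extra fetch: loss and variable are read
                                               # together, so they correspond to each other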
Test:
1. Copy the loss and the variables.
2. Run one optimization step and copy the loss and the variables.
If copying of the loss and the variables always happens before the optimization step, then the copies made in step 1 and step 2 will be the same. Otherwise, the copies made in step 1 and step 2 can differ.
import numpy as np
import tensorflow as tf

x = tf.get_variable('x', initializer=np.array([1], dtype=np.float64))
loss = x * x
optim = tf.train.AdamOptimizer(1)

## Control Dependencies ##
loss_ident = tf.identity(loss)  # <-- copy loss
x_ident = tf.identity(x)        # <-- copy variable
with tf.control_dependencies([loss_ident, x_ident]):
    train_op = optim.minimize(loss)

## Run ##
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(1000):
        # step 1
        a_, x1_ = sess.run([loss, x_ident])
        # step 2
        b_, x2_ = sess.run([loss_ident, x_ident, train_op])[:-1]
        print("loss:", a_, b_)
        assert np.allclose(a_, b_)
        print("variables:", x1_, x2_)
        assert np.allclose(x1_, x2_)
Result:
            step 1   step 2
loss:       [1.]     [1.]
variables:  [1.]     [1.58114875e-07]   # <-- not the same
AssertionError
Unfortunately, the copies of the variable in step 1 and step 2 are different. Therefore, copying the variable does not always happen before the optimization step.
I am not entirely clear on why the control dependencies don't work with plain Tensors, but you can get this to work with Variables and tf.assign(). Here's my solution. From my understanding, all you need is for the copy to happen before train_op. From the few quick tests I did, this seems to work.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

x = tf.get_variable('x', initializer=np.array([1], dtype=np.float64))
x_ident = tf.get_variable('x_ident', initializer=np.array([1], dtype=np.float64))
loss = x * x
loss_ident = tf.get_variable('loss', initializer=np.array([1.0]), dtype=tf.float64)
optim = tf.train.AdamOptimizer(1)

## Control Dependencies ##
loss_ident = tf.assign(loss_ident, loss, name='loss_assign')  # <-- copy loss
x_ident = tf.assign(x_ident, x, name='x_assign')              # <-- copy variable
with tf.control_dependencies([x_ident, loss_ident]):
    train_op = optim.minimize(loss)

## Run ##
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(10):
        # step 1
        a, x1 = sess.run([loss_ident, x_ident])
        # step 2
        b, x2, _ = sess.run([loss_ident, x_ident, train_op])
        # print("loss:", a_, b_)
        print('ab', a, b)
        print('x1x2', x1, x2)
        assert np.allclose(a, b)
        # print("variables:", x1_, x2_)
        assert np.allclose(x1, x2)
Hopefully, this is what you're looking for.
I am trying to implement linear regression using TensorFlow. The following is the code I am using.
import tensorflow as tf
import numpy as np
import pandas as pd
import os

rng = np.random
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# reading data from a csv file
file1 = pd.read_csv('out.csv')
x_data = file1['^GSPC']
# converting dataframe into array
x_data = x_data.values
y_data = file1['FB']
# converting dataframe into array
y_data = y_data.values

n_steps = 1000       # Total number of steps
n_iterations = []    # Nth iteration value
n_loss = []          # Loss at nth iteration
learned_weight = []  # weight at nth iteration
learned_bias = []    # bias value at nth iteration

# Try to find values for W and b that compute y_data = W * x_data + b
W = tf.Variable(rng.randn())
b = tf.Variable(rng.rand())
y = W * x_data + b

# Minimize the mean squared errors.
loss = tf.reduce_sum(tf.pow(y - y_data, 2)) / (2 * 28)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

with tf.Session() as sess:
    # Before starting, initialize the variables. We will 'run' this first.
    sess.run(tf.global_variables_initializer())
    for step in range(n_steps):
        sess.run(train)
        n_iterations.append(step)
        n_loss.append(loss.eval())
        learned_weight.append(W.eval())
        learned_bias.append(b.eval())
    print("Final Weight: " + str(learned_weight[-1]) + ", Final Bias: " + str(learned_bias[-1]) + ", Final cost:" + str(n_loss[-1]))
The problem is that every time I run the code I get different results (weights, bias, and cost/loss). From the resources I have studied, the weights, bias, and cost should be approximately the same in every run.
Secondly, the fitted line, i.e. y = weights * x_data + bias, does not fit the training data well.
Thirdly, I have to convert the dataframes x_data and y_data to arrays as follows:
x_data = x_data.values
y_data = y_data.values
If I don't do this, my code raises the following error:
Traceback (most recent call last):
  File "python", line 33, in
  File "tensorflow/python/framework/fast_tensor_util.pyx", line 120, in tensorflow.python.framework.fast_tensor_util.AppendObjectArrayToTensorProto
TypeError: Expected binary or unicode string, got tf.Tensor 'sub:0' shape=(28,) dtype=float32
Please help me understand what I am doing wrong!
P.S.: My questions may sound stupid because I am new to TensorFlow and machine learning.
The code is implemented incorrectly:
Use tf.placeholder for the data that will be passed into the model.
Use the feed_dict argument of sess.run to pass data to the placeholders when executing the graph.
Here's an updated example:
Build the Graph
import numpy as np
import tensorflow as tf
# dataset
X_data = np.random.randn(100,3)
y_data = 2*np.sum(X_data, 1)+0.01
# reshape y to be a column vector
y_data = np.reshape(y_data, [-1, 1])
# parameters
n_steps = 1000 #Total number of steps
batch_size = 20
input_length = X_data.shape[0] # => 100
display_cost = 500
# data placeholders
X = tf.placeholder(shape=[None, 3],dtype = tf.float32)
y = tf.placeholder(shape=[None, 1],dtype = tf.float32)
# build the model
W = tf.Variable(initial_value = tf.random_normal([3,1]))
b = tf.Variable(np.random.rand())
y_fitted = tf.add(tf.matmul(X, W), b)
# Minimize the mean squared errors
loss=tf.losses.mean_squared_error(labels=y, predictions=y_fitted)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
Execute in Session
# execute in Session
with tf.Session() as sess:
    # initialize all variables
    tf.global_variables_initializer().run()
    # Train the model
    for steps in range(n_steps):
        mini_batch = zip(range(0, input_length, batch_size),
                         range(batch_size, input_length + 1, batch_size))
        # train data in mini-batches
        for (start, end) in mini_batch:
            sess.run(optimizer, feed_dict={X: X_data[start:end],
                                           y: y_data[start:end]})
        # print training performance
        if (steps + 1) % display_cost == 0:
            print('Step: {}'.format(steps + 1))
            # evaluate loss function
            cost = sess.run(loss, feed_dict={X: X_data,
                                             y: y_data})
            print('Cost: {}'.format(cost))
    # report rmse for training and test data
    print('\nFinal Weight: {}'.format(W.eval()))
    print('\nFinal Bias: {}'.format(b.eval()))
Output for 2 runs
# Run 1
Step: 500
Cost: 3.1569701713918263e-11
Step: 1000
Cost: 3.1569701713918263e-11
Final Weight: [[2.0000048]
[2.0000024]
[1.9999973]]
Final Bias: 0.010000854730606079
# Run 2
Step: 500
Cost: 7.017615221566187e-12
Step: 1000
Cost: 7.017615221566187e-12
Final Weight: [[1.9999975]
[1.9999989]
[1.9999999]]
Final Bias: 0.0099998963996768
Indeed, the weight and bias come out approximately the same across multiple runs on the same dataset. Also, when doing numerical computations, NumPy ndarrays are generally the preferred data format, hence the conversion with .values.
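As a small, self-contained illustration of that conversion (the values here are made up; the column name is taken from the question):

import pandas as pd

df = pd.DataFrame({'FB': [1.0, 2.0, 3.0]})
y_data = df['FB']         # pandas Series
y_data = y_data.values    # NumPy ndarray, the format feed_dict works with
print(type(y_data))       # <class 'numpy.ndarray'>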
I am trying to understand how policy gradients work and to build a Pong agent from scratch using TensorFlow, but it does not seem to work. I am not sure whether I have misunderstood the policy gradient. Here I post my code with explanations.
First, create a placeholder to hold the input frame obtained from the environment:
input_frame = tf.placeholder(tf.float32, [380*400, None], name='input_frame')
Define the parameters of the simple network and the model:
W1 = tf.get_variable(name='w1', shape=[200, 380*400],initializer=tf.contrib.layers.xavier_initializer(seed=2))
b1 = tf.get_variable(name='b1', shape=[200, 1], initializer=tf.zeros_initializer())
W2 = tf.get_variable(name='w2', shape=[3, 200],initializer=tf.contrib.layers.xavier_initializer(seed=2))
b2 = tf.get_variable(name='b2', shape=[3, 1], initializer=tf.zeros_initializer())
def build_model():
    L_1 = tf.nn.relu(tf.add(tf.matmul(W1, input_frame), b1))
    L_2 = tf.nn.relu(tf.add(tf.matmul(W2, L_1), b2))
    Y = tf.nn.softmax(L_2, axis=0)
    return Y
Discount reward function
def discount_rewards(r):
    discounted_r = np.zeros_like(r)
    running_add = 0
    for t in reversed(range(0, discounted_r.size)):
        if r[t] != 0: running_add = 0
        # https://github.com/hunkim/ReinforcementZeroToAll/issues/1
        running_add = running_add * gamma + r[t]
        discounted_r[t] = running_add
    return discounted_r
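For example, with gamma = 0.99 (the actual value is defined elsewhere in my code, so 0.99 is only an assumption here), the function above resets the running sum at every non-zero reward:

import numpy as np

gamma = 0.99  # assumed discount factor; defined elsewhere in the real code

r = np.array([0., 0., 1., 0., -1.])  # two rallies: one ending in +1, one in -1
print(discount_rewards(r))
# -> approximately [ 0.9801  0.99  1.  -0.99  -1. ]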
Build the model
Y = build_model()
Sample operation
sample_op = tf.multinomial(logits=tf.reshape(Y, (1, 3)), num_samples=1)
Y_action = sample_op - 1 #Move up: -1, stay still: 0, move down: 1
Routine initialisation
global_step = tf.train.get_or_create_global_step()
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
Retrieve next frame from the environment with cropping
next_frame = my_pong.get_next_frame().T[pong_game.SCOREBOARD_HEIGHT + 6 : pong_game.WINDOW_HEIGHT, 0 + pong_game.PADDLE_THICKNESS:pong_game.WINDOW_WIDTH - pong_game.PADDLE_THICKNESS].reshape(380*400, 1)
Get the observation by subtracting the last frame from the current frame:
observation_ = next_frame - last_frame
action_ = sess.run(Y_action, feed_dict={input_frame: observation_})
For each frame, get the reward from the environment by feeding in the sampled action, and append the result to the episode memory:
done_, reward_ = my_pong.paddle_2_control(action_)
episode_menory.append((observation_, action_, float(reward_)))
last_frame = next_frame
Once one side reaches 21 points, done_ becomes true and the training process starts.
First, discount and normalise the rewards:
if done_:
    obs, lab, rwd = zip(*episode_menory)
    prwd = discount_rewards(rwd)
    prwd -= np.mean(prwd)
    prwd /= np.std(prwd)
    obs_reshape = np.squeeze(obs).T
Build one-hot labels from the actions sampled in the episode:
    lab_one_hot = tf.one_hot(np.squeeze(lab) + 1, 3, axis=0)
Cross-entropy loss, where the one-hot labels are the sampled actions and the logits are the output of the model (I am not sure whether this is correct or not):
    cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=lab_one_hot, logits=Y)
Define the cost function by multiplying the processed reward "prwd" with the cross-entropy loss:
    cost = cross_entropy * prwd
Define the optimizer and start training:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(cost)
    _, cost_ = sess.run([train_op, cost], feed_dict={input_frame: obs_reshape})
    episode_menory = []
    print("Episode %d finish! Cost = %f" % (episode, np.sum(cost_)))
    episode += 1
I ran this program on my Titan X machine for a weekend, but the performance does not seem to improve. The printed cost fluctuates randomly, sometimes negative and sometimes positive, and I don't know how to judge the model's performance from the reported cost value. I wonder which part I have missed or got wrong. Thank you very much.
I post the full code here: https://github.com/ivonchan0414/pg_pong
You have a rich (and complex) training and feedback setup here; I cannot pinpoint an error from a casual examination alone.
A way to isolate the issue might be to follow a typical pipeline debugging and verification process:
instrument the individual stages of your pipeline by saving the intermediate results (a sketch of this is shown after this list)
validate the intermediate results: this will narrow down where the behaviour diverges from what you expect
In addition:
start testing with an input dataset that has a well-known expected behaviour. You will likely not see that behaviour in the output at first, but at least you will be sure that the cause is not data that your model cannot readily handle.
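As a sketch of what saving the intermediate results could look like in this case (the helper name, file name, and choice of arrays to dump are illustrative assumptions, not part of the original code):

import numpy as np

def dump_episode(episode_id, observations, actions, rewards, discounted_rewards):
    # Save one episode's intermediate arrays so each pipeline stage can be inspected offline.
    np.savez('episode_%05d.npz' % episode_id,
             observations=np.asarray(observations),
             actions=np.asarray(actions),
             rewards=np.asarray(rewards),
             discounted_rewards=np.asarray(discounted_rewards))

# Later, e.g. in a notebook:
# data = np.load('episode_00000.npz')
# data['discounted_rewards']  # check the normalisation, sign, and reset points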
I am new to TensorFlow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. This is a little unexpected because the function to regress is a simple linear one, 2*x + y, so I expected better performance.
I am asking for help in checking whether I did anything wrong in the code. I carefully checked the matrix dimensions, so those should be fine, but it is possible that I misunderstood something and the network or the session is not configured properly. For example, should I run the training step multiple times instead of just once (the part of the code below enclosed by # TRAINING #)? I see that some examples feed the data in piece by piece and run the training progressively, whereas I run the training just once and feed in all the data.
If the code is fine, maybe this is a modelling issue, but I really don't expect to need a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshapes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
# TRAINING
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
    predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))
You're right, you understood the problem by yourself.
The problem is, in fact, that you're running the optimization step only once. Hence you're doing a single update of your network parameters, and therefore the cost doesn't decrease.
I just changed the training session of your code in order to make it work as expected (100 training steps):
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
    _, c = sess.run(
        [optimizer, loss],
        feed_dict={
            lr: 0.05,
            ph_inputs: inputs,
            ph_labels: labels
        })
    print("Train step {} loss value {}".format(i, c))
# TRAINING
and at the end of training I get:
Train step 99 loss value 0.04462708160281181
while the regression MSE printed at the end of the script is:
0.044106700712455045
I am trying to write a two-layer neural network to train a class labeler. The input to the network is about 1000 examples with 150 features each; all features on all examples have been L2-normalized.
I only have two outputs, and they should be disjoint: I am just attempting to predict whether an example is a one or a zero.
My code is relatively simple; I feed the input data into the hidden layer, and then the hidden layer into the output. As I really just want to see this working in action, I am training on the entire data set at each step.
My code is below. Based on the other NN implementations I have referred to, I believe that the performance of this network should improve over time. However, regardless of the number of epochs I set, I get back an accuracy of around 20%. The accuracy does not change when the number of steps is changed, so I don't believe that my weights and biases are being updated.
Is there something obvious I am missing with my model? Thanks!
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.01
epochs = 200
n_input = 150
n_hidden = 75
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden]))
b1 = tf.Variable(tf.truncated_normal([n_output]))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden]))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output]))
# step function
def returnPred(x,w0,w1,b0,b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h  # return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
model = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
    sess.run(model, feed_dict={x: inputs, y: labels})  # train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
I changed your optimizer to AdamOptimizer (in many cases it performs better than GradientDescentOptimizer).
I also played a bit with the parameters. In particular, I used a smaller standard deviation for your variable initialization, decreased the learning rate (as your loss was unstable and "jumped around"), and increased the number of epochs (as I noticed that your loss continued to decrease).
I also reduced the size of the hidden layer. It is harder to train a network with a large hidden layer when you don't have much data.
Regarding your loss, it is better to apply tf.reduce_mean to it so that the loss is a single number. In addition, following the answer of ml4294, I used softmax instead of sigmoid, so the loss looks like:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y))
The code below achieves accuracy of around 99.9% on the training data:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.002
epochs = 400
n_input = 150
n_hidden = 60
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden],stddev=0.2,seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output],stddev=0.2,seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden],stddev=0.2,seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output],stddev=0.2,seed=0))
# step function
def returnPred(x,w0,w1,b0,b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h  # return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y)) # calculate loss between prediction and actual
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
    sess.run([model, loss], feed_dict={x: inputs, y: labels})  # train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
Just a suggestion in addition to the answer provided by Miriam Farber:
You use a multi-dimensional output label ([0., 1.]) for the classification. I suggest using the softmax cross entropy, tf.nn.softmax_cross_entropy_with_logits(), instead of the sigmoid cross entropy, since you assume the outputs to be disjoint (see softmax on Wikipedia). I achieved much faster convergence with this small modification.
This should also improve your performance once you decide to increase your output dimensionality from 2 to a higher number.
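Concretely, with the variable names from the question's code, the suggested change is the single loss line (shown as a snippet, not a full program):

# replaces: loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_, labels=y)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_, labels=y))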
I guess you have some problem here:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
It should look something like this:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y))
I didn't look at your code much, so if this doesn't work out, you can check the Udacity deep learning course or its forums; they have good examples of what you are trying to do.
GL