Why does my model perform poorly after normalization? - python

I'm using a fully connected neural network, and I'm using normalized data such that the values of every single sample range from 0 to 1. I have used 100 neurons in the first layer and 10 in the second layer, and trained on almost 5 million samples. I want to classify my data into two classes, but the network's performance is too low, almost 49 percent on both training and test data. I tried to increase the performance by changing the values of the hyperparameters, but it didn't work. Can someone please tell me what I should do to get higher performance?
x = tf.placeholder(tf.float32, [None, nPixels])

# layer 1: nPixels -> nNodes1, ReLU
W1 = tf.Variable(tf.random_normal([nPixels, nNodes1], stddev=0.01))
b1 = tf.Variable(tf.zeros([nNodes1]))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# layer 2: nNodes1 -> nNodes2, ReLU
W2 = tf.Variable(tf.random_normal([nNodes1, nNodes2], stddev=0.01))
b2 = tf.Variable(tf.zeros([nNodes2]))
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)

# output layer: nNodes2 -> nLabels, softmax over the two classes
W3 = tf.Variable(tf.random_normal([nNodes2, nLabels], stddev=0.01))
b3 = tf.Variable(tf.zeros([nLabels]))
y = tf.nn.softmax(tf.matmul(y2, W3) + b3)

y_ = tf.placeholder(dtype=tf.float32, shape=[None, 2])

# hand-rolled cross-entropy on the softmax output
cross_entropy = -1 * tf.reduce_sum(y_ * tf.log(y), axis=1)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y_, axis=1), tf.argmax(y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

Your computational model knows nothing about "images"; it only sees numbers. So if you trained it on pixel values from 0-255, it has learned what "light" means, what "dark" means, and how those combine to give whatever target value you are trying to model.
What you did by normalizing is force every pixel into the range 0-1. So as far as the model is concerned, they are all as black as night, and it is no surprise that it cannot extract anything meaningful.
You need to apply the same input normalization during both training and testing.
And speaking of normalization for NN models, it is better to normalize to zero mean.
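For example, a minimal sketch of zero-mean, unit-variance standardization (the arrays here are toy stand-ins for your real data; the key point is computing the statistics on the training set only and reusing them at test time):
import numpy as np

X_train = np.random.rand(1000, 784).astype(np.float32)  # stand-in training data
X_test = np.random.rand(200, 784).astype(np.float32)    # stand-in test data

mean = X_train.mean(axis=0)        # statistics from the training set only
std = X_train.std(axis=0) + 1e-8   # epsilon guards against constant features

X_train = (X_train - mean) / std   # the SAME transform for both splits
X_test = (X_test - mean) / std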

Related

Not fully connected layer in tensorflow

I want to create a network where the nodes in the input layer are connected only to some nodes in the next layer. As a small example, suppose input node i1 should not be connected to hidden node h1.
My solution so far is to set the weight of the edge between i1 and h1 to zero, and after every optimization step I multiply the weights by a matrix (I call it the mask matrix) in which every entry is 1 except the entry for the weight of the edge between i1 and h1 (see the code below).
Is this approach right? Or does it have an effect on the GradientDescent? Is there another approach to create this kind of network in TensorFlow?
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np

tf.enable_eager_execution()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation=tf.sigmoid, input_shape=(2,)),  # input shape required
    tf.keras.layers.Dense(2, activation=tf.sigmoid)
])

# set the weights
weights = [np.array([[0, 0.25], [0.2, 0.3]]), np.array([0.35, 0.35]),
           np.array([[0.4, 0.5], [0.45, 0.55]]), np.array([0.6, 0.6])]
model.set_weights(weights)
model.get_weights()

features = tf.convert_to_tensor([[0.05, 0.10]])
labels = tf.convert_to_tensor([[0.01, 0.99]])
mask = np.array([[0, 1], [1, 1]])

# define the loss function
def loss(model, x, y):
    y_ = model(x)
    return tf.losses.mean_squared_error(labels=y, predictions=y_)

# define the gradient calculation
def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# create optimizer and global step
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
global_step = tf.train.get_or_create_global_step()

# optimization step
loss_value, grads = grad(model, features, labels)
optimizer.apply_gradients(zip(grads, model.variables), global_step)

# masking the optimized weights
weights = model.get_weights()[0]
masked_weights = tf.multiply(weights, mask)
model.set_weights([masked_weights])
If you are looking for a solution for the specific example you provided, you can simply use tf.keras Functional API and define two Dense layers where one is connected to both neurons in the previous layer and the other one is only connected to one of the neurons:
from tensorflow.keras.layers import Input, Lambda, Dense, concatenate
from tensorflow.keras.models import Model
inp = Input(shape=(2,))
inp2 = Lambda(lambda x: x[:,1:2])(inp) # get the second neuron
h1_out = Dense(1, activation='sigmoid')(inp2) # only connected to the second neuron
h2_out = Dense(1, activation='sigmoid')(inp) # connected to both neurons
h_out = concatenate([h1_out, h2_out])
out = Dense(2, activation='sigmoid')(h_out)
model = Model(inp, out)
# simply train it using `fit`
model.fit(...)
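For completeness, a minimal sketch of compiling and fitting this model on the toy data from the question (the optimizer and loss here are assumptions, not prescribed by the functional-API approach itself):
import numpy as np

model.compile(optimizer='adam', loss='mse')

features = np.array([[0.05, 0.10]])
labels = np.array([[0.01, 0.99]])
model.fit(features, labels, epochs=10, verbose=0)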
The problem with your solution, and some others suggested by other answers in this post, is that they do not prevent training of this weight. They allow gradient descent to train the nonexistent weight and then overwrite it retrospectively. This will result in a network that has a zero in the desired location, but it will negatively affect your training process: the backpropagation calculation will not see the masking step, as it is not part of the TensorFlow graph, so gradient descent will follow a path that assumes this weight does have an effect on the outcome (it does not).
A better solution would be to include the masking step as part of your TensorFlow graph, so that it can be factored into the gradient descent. Since the masking step is simply an element-wise multiplication by your sparse, binary mask matrix, you can include the mask matrix as an element-wise multiplication in the graph definition using tf.multiply.
Sadly this means saying goodbye to the user-friendly keras.layers methods and embracing a more nuts-and-bolts approach to TensorFlow. I can't see an obvious way to do it using the layers API.
See the implementation below; I have tried to provide comments explaining what happens at each stage.
import tensorflow as tf

## Graph definition for model

# set up tf.placeholders for inputs x and outputs y_
# these remain fixed during training and can have values fed to them during the session
with tf.name_scope("Placeholders"):
    x = tf.placeholder(tf.float32, shape=[None, 2], name="x")    # input layer
    y_ = tf.placeholder(tf.float32, shape=[None, 2], name="y_")  # output layer

# set up tf.Variables for the weights at each layer from l1 to l3, and set up feeding of initial values
# also set up the mask as a variable and mark it untrainable
with tf.name_scope("Variables"):
    w_l1_values = [[0, 0.25], [0.2, 0.3]]
    w_l1 = tf.Variable(w_l1_values, name="w_l1")
    w_l2_values = [[0.4, 0.5], [0.45, 0.55]]
    w_l2 = tf.Variable(w_l2_values, name="w_l2")
    mask_values = [[0., 1.], [1., 1.]]
    mask = tf.Variable(mask_values, trainable=False, name="mask")

# link each set of weights as matrix multiplications in the graph. Include an element-wise multiplication by mask.
# The sequence takes us from inputs x to output final_out, which will be compared to labels fed to placeholder y_
l1_out = tf.nn.relu(tf.matmul(x, tf.multiply(w_l1, mask)), name="l1_out")
final_out = tf.nn.relu(tf.matmul(l1_out, w_l2), name="output")

## define loss function and training operation
with tf.name_scope("Loss"):
    # some loss defined as a function of graph output final_out and labels y_
    loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=final_out, labels=y_, name="loss")

with tf.name_scope("Train"):
    # some optimisation strategy, arbitrary learning rate
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name="optimizer_adam")
    train_op = optimizer.minimize(loss, name="train_op")

# create session, initialise variables and train according to inputs and corresponding labels
# This should show that the values of the first-layer weights change, but the one set to 0 remains at 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    initial_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
    print(initial_l1_weights.eval())
    inputs = [[0.05, 0.10]]
    labels = [[0.01, 0.99]]
    ans = sess.run(train_op, feed_dict={"Placeholders/x:0": inputs, "Placeholders/y_:0": labels})
    train_steps = 1
    for i in range(train_steps):
        trained_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
        print(trained_l1_weights.eval())
Or use the answer provided by user today for a Keras-friendly option.
You have multiple options here.
First, you could use the dynamic masking approach in your example. I believe this will work as expected since the gradients w.r.t. the masked-out parameters will be zero (the output is constant when you change the unused parameters). This approach is simple and it can be used even when your mask is not constant during the training.
Second, if you know beforehand which weights will be always zero, you can compose your weight matrix using tf.get_variable to get a submatrix, and then concatenate it with a tf.constant tensor, e.g.:
weights_sub = tf.get_variable("w", [dim_in, dim_out - 1])
zeros = tf.zeros([dim_in, 1])
weights = tf.concat([weights_sub, zeros], axis=1)
This example makes one column of your weight matrix always zero.
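Continuing that snippet, a usage sketch (assuming an input placeholder x of shape [None, dim_in]): gradients then flow only into weights_sub, so the zero column never trains.
x = tf.placeholder(tf.float32, [None, dim_in])
h = tf.nn.relu(tf.matmul(x, weights))  # last output unit is always zero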
Finally, if your mask is more complex, you can use tf.get_variable on a flattened vector and then compose a tf.SparseTensor with the variable values on the used indices:
weights_used = tf.get_variable("w", [num_used_vars])
indices = ... # get your indices in a 2-D matrix of shape [num_used_vars, 2]
dense_shape = tf.constant([dim_in, dim_out]) # this is the final shape of the weight matrix
weights = tf.SparseTensor(indices, weights_used, dense_shape)
EDIT: This probably won't work in combination with Keras' set_weights method, as it expects Numpy arrays, not Tensors.
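One more caveat if you stay in raw TensorFlow: a tf.SparseTensor cannot be fed directly to tf.matmul, so you would densify it first. A sketch continuing the snippet above (tf.sparse.to_dense exists in recent TF 1.x; older versions use tf.sparse_tensor_to_dense):
# reorder guards against indices that are not in canonical row-major order
weights_dense = tf.sparse.to_dense(tf.sparse.reorder(weights))
h = tf.nn.relu(tf.matmul(x, weights_dense))  # only weights_used receives gradients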

Neural Network predicting the same probability for each row

I've developed a 5-layer neural network for classification, and it always predicts the same probability for each row, which ends up predicting the same class.
I'm using ReLU as the activation function (if I use sigmoid or tanh it outputs NaNs) and tf.nn.sigmoid_cross_entropy_with_logits.
The model will be this:
X, Y = create_placeholders(n_x, n_y)
parameters = initialize_parameters()
Z5 = forward_propagation(X, parameters)
cost = cost_function(Z5, Y)
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
And the prediction:
predicted = tf.nn.sigmoid(Z5)
correct_pred = tf.equal(tf.round(predicted), Y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
The cost value drops to 0.24 in the first 100 epochs, but the predicted values don't change.
At first I thought it was because of a class-imbalance problem, but I fixed that (upsampling and downsampling the 1s and 0s), so it should not be the issue.
Thanks in advance

ReLU function returns 0 and big numbers

Hi, I'm new to neural networks with TensorFlow. I've taken a small fraction of the Places365 dataset, and I want to make a neural network to classify between 10 places.
For that I've tried to make a small copy of a VGG network. The problem I have is that at the output of the softmax function I get a one-hot encoded array. Looking for problems in my code, I've realised that the output of the ReLU functions is either 0 or a big number (around 10000).
I don't know where I'm going wrong. Here is my code:
import tensorflow as tf

# `input`, `output` and `learning_rate` are defined elsewhere in the script

def variables(shape):
    return tf.Variable(2 * tf.random_uniform(shape, seed=1) - 1)

def layerConv(x, filter):
    return tf.nn.conv2d(x, filter, strides=[1, 1, 1, 1], padding='SAME')

def maxpool(x):
    return tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

weights0 = variables([3, 3, 1, 16])
l0 = tf.nn.relu(layerConv(input, weights0))
l0 = maxpool(l0)

weights1 = variables([3, 3, 16, 32])
l1 = tf.nn.relu(layerConv(l0, weights1))
l1 = maxpool(l1)

weights2 = variables([3, 3, 32, 64])
l2 = tf.nn.relu(layerConv(l1, weights2))
l2 = maxpool(l2)

l3 = tf.reshape(l2, [-1, 64 * 32 * 32])
syn0 = variables([64 * 32 * 32, 1024])
bias0 = variables([1024])
l4 = tf.nn.relu(tf.matmul(l3, syn0) + bias0)
l4 = tf.layers.dropout(inputs=l4, rate=0.4)

syn1 = variables([1024, 10])
bias1 = variables([10])
output_pred = tf.nn.softmax(tf.matmul(l4, syn1) + bias1)

error = tf.square(tf.subtract(output_pred, output), name='error')
loss = tf.reduce_sum(error, name='cost')

# TRAINING
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)
The input to the neural network is a normalized grayscale image of 256x256 pixels.
The learning rate is 0.1 and the batch size is 32.
Thank you in advance!
Essentially, what ReLU is:
import numpy as np

def relu(vector):
    vector[vector < 0] = 0
    return vector
and softmax:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
The output of softmax being a one-hot encoded array means there is a problem, and that could be many things.
You can try reducing the learning_rate for starters; you can use 1e-4 / 1e-3 and check. If that doesn't work, try adding some regularization. I am also skeptical about your weight initialization.
Regularization: this is a form of regression that constrains/regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. - Regularization in ML
Link to : Build a multilayer neural network with L2 regularization in tensorflow
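As a rough sketch of what L2 regularization could look like in the question's code (beta is a hypothetical hyperparameter you would tune; syn0, syn1 and error are the tensors from the snippet above):
beta = 0.01  # hypothetical regularization strength
l2_penalty = beta * (tf.nn.l2_loss(syn0) + tf.nn.l2_loss(syn1))
loss = tf.reduce_sum(error, name='cost') + l2_penalty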
The problem you have is your weight initialization. NNs are highly complicated non-convex optimization problems, so a good init is paramount to getting any good results. If you use ReLUs, you should use the initialization proposed by He et al. (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf?spm=5176.100239.blogcont55892.28.pm8zm1&file=He_Delving_Deep_into_ICCV_2015_paper.pdf).
In essence, your network's weights should be initialized with i.i.d. Gaussian-distributed values with mean 0 and a standard deviation as follows:
stddev = sqrt(2 / Nr_input_neurons)
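In TF1 code like the question's, that could look like this (a sketch; he_variables is a hypothetical replacement for the uniform variables helper above):
import numpy as np
import tensorflow as tf

def he_variables(shape):
    # fan-in = kernel_height * kernel_width * input_channels for a conv filter,
    # or the input dimension for a fully connected layer
    fan_in = int(np.prod(shape[:-1]))
    return tf.Variable(tf.random_normal(shape, stddev=np.sqrt(2.0 / fan_in)))

weights0 = he_variables([3, 3, 1, 16])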

Write a custom MSE loss function in Keras

I'm trying to create an image-denoising ConvNet in Keras, and I want to create my own loss function. I want it to take a noisy image as input and to get the noise as output. This loss function is pretty much like an MSE loss, but it will make my network learn to remove the clean image, rather than the noise, from the input noisy image.
The loss function I want to implement, with y the noisy image, x the clean image and R(y) the predicted image, is the mean squared difference between the prediction and the noise: loss = mean((R(y) - (y - x))^2).
I've tried to write it myself, but I don't know how to give the loss access to my noisy images, since they change all the time.
def residual_loss(noisy_img):
    def loss(y_true, y_pred):
        return np.mean(np.square(y_pred - (noisy_img - y_true)), axis=-1)
    return loss
Basically, what I need to do is something like this :
input_img = Input(shape=(None,None,3))
c1 = Convolution2D(64, (3, 3))(input_img)
a1 = Activation('relu')(c1)
c2 = Convolution2D(64, (3, 3))(a1)
a2 = Activation('relu')(c2)
c3 = Convolution2D(64, (3, 3))(a2)
a3 = Activation('relu')(c3)
c4 = Convolution2D(64, (3, 3))(a3)
a4 = Activation('relu')(c4)
c5 = Convolution2D(3, (3, 3))(a4)
out = Activation('relu')(c5)
model = Model(input_img, out)
model.compile(optimizer='adam', loss=residual_loss(input_img))
But if I try this, I get :
IndexError: tuple index out of range
What can I do?
Since it's quite unusual to use the "input" in the loss function (it's not meant for that), I think it's worth saying:
It's not the role of the loss function to separate the noise.
The loss function is just a measure of "how far from right you are".
It's your model that will separate things, and the results you expect from your model are y_true.
You should use a regular loss, with X_training = noisy images and Y_training = noises.
That said...
You can create a tensor for noisy_img outside the loss function and keep it stored. All operations inside a loss function must be tensor functions, so use the keras backend for that:
import keras.backend as K
noisy_img = K.variable(X_training)  # you must do this for each batch
But you must take batch sizes into account: since this variable lives outside the loss function, you will need to fit just one batch per epoch.
def loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true) - K.square(y_true - noisy_img))
Training one batch per epoch:
for batch in range(0, totalSamples, size):
    noisy_img = K.variable(X_training[batch:batch+size])
    model.fit(X_training[batch:batch+size], Y_training[batch:batch+size], batch_size=size)
For using just a mean squared error, organize your data like this:
originalImages = loadYourImages() #without noises
Y_training = createRandomNoises() #without images
X_training = addNoiseToImages(originalImages,Y_training)
Now you just use a "mse", or any other built-in loss.
model.fit(X_training,Y_training,....)
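Putting that together, a minimal sketch with numpy stand-ins for the hypothetical helpers above (note that the convolutions in the question's model would need padding='same' for the output shape to match the noise targets):
import numpy as np

originalImages = np.random.rand(100, 64, 64, 3).astype('float32')               # toy clean images
Y_training = np.random.normal(0.0, 0.1, originalImages.shape).astype('float32')  # noise targets
X_training = originalImages + Y_training                                         # noisy inputs

model.compile(optimizer='adam', loss='mse')  # plain MSE against the noise
model.fit(X_training, Y_training, epochs=10, batch_size=32)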

tensorflow neural network model based on mnist with two outputs

I want to train a neural network with 12 inputs and 2 outputs. Here I have a simple TensorFlow neural network that has two outputs. When I run the code it consistently gives one output: if the two outputs are labeled 'l1' and 'l2', the model always chooses 'l1'. Is this a problem with my input (does it not vary enough between 'l1' and 'l2'?), or is it a problem with choosing to use just two outputs? That is my question. If it's the latter, what do I do to remedy it? My model is supposed to detect skin tones in a photo ('l1' = skin tone, 'l2' = not skin tone). I'm not sure this makes sense. It is adapted from the MNIST example, but that code has ten outputs.
def nn_setup(self):
    input_num = 4 * 3
    mid_num = 3
    output_num = 2
    x = tf.placeholder(tf.float32, [None, input_num])
    W_1 = tf.Variable(tf.zeros([input_num, mid_num]))
    b_1 = tf.Variable(tf.zeros([mid_num]))
    y_mid = tf.nn.softmax(tf.matmul(x, W_1) + b_1)
    W_2 = tf.Variable(tf.zeros([mid_num, output_num]))
    b_2 = tf.Variable(tf.zeros([output_num]))
    y = tf.nn.softmax(tf.matmul(y_mid, W_2) + b_2)
    y_ = tf.placeholder(tf.float32, [None, output_num])
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    init = tf.initialize_all_variables()
    self.sess = tf.Session()
    self.sess.run(init)
    for i in range(1000):
        batch_xs, batch_ys = self.get_nn_next_train()
        self.sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    self.nn_test.images, self.nn_test.labels = self.get_nn_next_test()
    print(self.sess.run(accuracy, feed_dict={x: self.nn_test.images, y_: self.nn_test.labels}))
There are a few "odd" things with your network, such as having softmax in your middle layer.
You have two major issues I can find with your implementation.
1. Weight initialisation
W_1 = tf.Variable(tf.zeros([input_num, mid_num]))
W_2 = tf.Variable(tf.zeros([mid_num, output_num]))
This will initialise the weights to be identical. So they will have identical gradient values, and be changed at each step identically.
Effectively by doing this you have created a network with one neuron in each layer (which is then copied to create the layer matrix that you use).
Use a different initial value; it is usual to take a small random matrix like this:
W_1 = tf.Variable(tf.random_normal([input_num, mid_num], stddev=0.5))
In general you will want a smaller standard deviation the larger your layers are. You don't have to do this for biases as well, but you can if you like.
This won't fix everything with your network, but it should at least start to calculate different values from input data and train a little.
2. Use of cost function
You have used this loss function incorrectly:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(y, y_))
...because softmax_cross_entropy_with_logits is designed to work with the input to softmax, not its output. So your cost function is incorrect. Instead, you want to reference y_logits, defined like this where you currently calculate y:
y_logits = tf.matmul(y_mid, W_2) + b_2
y = tf.nn.softmax(y_logits)
Then your cross-entropy would be
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_logits, labels=y_))
After the hidden-layer initialization, you have calculated the softmax of the logits for the hidden layer: y_mid = tf.nn.softmax(tf.matmul(x, W_1) + b_1). In a classification problem, softmax should be applied to the values obtained from the output layer. Try something like y_mid = tf.nn.relu(tf.matmul(x, W_1) + b_1) to compute the activations of the hidden layer and see if your classification improves. If that does not solve your problem, check the populations of 'l1' and 'l2' in your training data. If your training data is highly skewed towards 'l1', you will always get 'l1' as the output. You may consider minority oversampling or undersampling techniques to resolve the class-imbalance problem.
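As a quick sanity check for that last point, assuming batch_ys holds your one-hot training labels as a numpy array (a sketch, not part of the original answer):
import numpy as np

counts = np.asarray(batch_ys).sum(axis=0)  # per-class totals for one-hot labels
print(counts / counts.sum())               # a heavy skew toward 'l1' would explain the constant output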
