I have a model that runs training (it goes through steps and epochs and evaluates losses), but the weights are not updating.
I am trying to train a discriminator that distinguishes whether an image is synthetic or real. It is part of a GAN model I am building.
The basic structure is as follows:
I have two inputs:
1. image (real or synthetic)
2. label (0 for real, 1 for synthetic)
The Source Estimator is where I extract features from the images. I have already trained that model and restored its weights and biases. These layers are frozen (not trainable).
def SourceEstimator(eye, name, trainable=True):
    # The source estimator and target representer share the same structure.
    # SE is not trainable, while TR is.
    net = tf.layers.conv2d(eye, 32, 3, (1,1), padding='same', activation=tf.nn.leaky_relu, trainable=trainable, name=name+'_conv2d_1')
    net = tf.layers.conv2d(net, 32, 3, (1,1), padding='same', activation=tf.nn.leaky_relu, trainable=trainable, name=name+'_conv2d_2')
    net = tf.layers.conv2d(net, 64, 3, (1,1), padding='same', activation=tf.nn.leaky_relu, trainable=trainable, name=name+'_conv2d_3')
    c3 = net
    net = tf.layers.max_pooling2d(net, 3, (2,2), padding='same', name=name+'_maxpool_4')
    net = tf.layers.conv2d(net, 80, 3, (1,1), padding='same', activation=tf.nn.leaky_relu, trainable=trainable, name=name+'_conv2d_5')
    net = tf.layers.conv2d(net, 192, 3, (1,1), padding='same', activation=tf.nn.leaky_relu, trainable=trainable, name=name+'_conv2d_6')
    c5 = net
    return (c3, c5)
The discriminator is as follows:
def DiscriminatorModel(features, reuse=False):
    with tf.variable_scope('discriminator', reuse=tf.AUTO_REUSE):
        net = tf.layers.conv2d(features, 64, 3, 2, padding='same', kernel_initializer='truncated_normal', activation=tf.nn.leaky_relu, trainable=True, name='discriminator_c1')
        net = tf.layers.conv2d(net, 128, 3, 2, padding='same', kernel_initializer='truncated_normal', activation=tf.nn.leaky_relu, trainable=True, name='discriminator_c2')
        net = tf.layers.conv2d(net, 256, 3, 2, padding='same', kernel_initializer='truncated_normal', activation=tf.nn.leaky_relu, trainable=True, name='discriminator_c3')
        net = tf.contrib.layers.flatten(net)
        net = tf.layers.dense(net, units=1, activation=tf.nn.softmax, name='descriminator_out', trainable=True)
        return net
The input goes to the SourceEstimator model, which extracts features (c3, c5).
Then c3 and c5 are concatenated along the channel axis and passed to the discriminator model.
c3, c5 = CommonModel(self.left_eye, 'el', trainable=False)
c5 = tf.image.resize_images(c5, size=(self.config.img_size,self.config.img_size))
features = tf.concat([c3, c5], axis=3)
##---------------------------------------- DISCRIMINATOR ------------------------------------------##
with tf.variable_scope('discriminator'):
    logit = DiscriminatorModel(features)
Finally, the losses and train ops:
##---------------------------------------- LOSSES ------------------------------------------##
with tf.variable_scope("discriminator_losses"):
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logit, labels=self.label))
##---------------------------------------- TRAIN ------------------------------------------##
# optimizers
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    disc_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    self.disc_op = disc_optimizer.minimize(self.loss, global_step=self.global_step_tensor, name='disc_op')
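As an aside that is not in the original code: in a GAN setup it is common to pass an explicit var_list so that this optimizer only touches the discriminator's variables. A minimal sketch, assuming the scope names and tensors defined above:

# Only variables created under the 'discriminator' scope are updated.
disc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='discriminator')

disc_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
self.disc_op = disc_optimizer.minimize(self.loss, var_list=disc_vars,
                                       global_step=self.global_step_tensor, name='disc_op')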
Training steps and epochs. I'm using a batch size of 32, and a data generator class to get the images at each step.
def train_epoch(self):
    num_iter_per_epoch = self.train_data.get_size() // self.config.get('batch_size')
    loop = tqdm(range(num_iter_per_epoch))
    for i in loop:
        dloss = self.train_step(i)
        loop.set_postfix(loss='{:05.3f}'.format(dloss))

def train_step(self, i):
    el, label = self.train_data.get_batch(i)

    ## ------------------- train discriminator -------------------##
    feed_dict = {
        self.model.left_eye: el,
        self.model.label: label
    }
    _, dloss = self.sess.run([self.model.disc_op, self.model.loss], feed_dict=feed_dict)
    return dloss
While the model goes through steps and epochs, the weights remain unchanged.
The loss fluctuates during the training steps, but the loss for every epoch is the same. For example, if I don't shuffle the dataset each epoch, the loss curve follows the same pattern every epoch.
I think this means the model computes different losses, but is not updating its parameters according to those losses.
Here are a few other things I tried that did not help:
tried small and large learning rates (0.1 and 1e-8)
tried with the SourceEstimator layers set to trainable=True
flipped the labels (0 == synthetic, 1 == real)
increased the kernel sizes and filter sizes in the discriminator
I've been stuck on this problem for a while now, and I really need some insights. Thanks in advance.
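One diagnostic that is not in the original post, but that can narrow this down: fetch the gradients of the loss with respect to the discriminator variables and check that they are actually non-zero. A minimal sketch, assuming the graph and training loop above (self.model.loss, the 'discriminator' variable scope, and the feed_dict from train_step):

import numpy as np

# Built once, next to the train op:
disc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='discriminator')
grads = tf.gradients(self.model.loss, disc_vars)
grads_and_vars = [(g, v) for g, v in zip(grads, disc_vars) if g is not None]

# Inside a training step: if every gradient is (near) zero, the weights
# cannot move no matter which learning rate is used.
grad_vals = self.sess.run([g for g, _ in grads_and_vars], feed_dict=feed_dict)
for (_, var), g in zip(grads_and_vars, grad_vals):
    print(var.name, 'max |grad| =', np.abs(g).max())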
------EDIT 1-----
def initialize_uninitialized(sess):
    global_vars = tf.global_variables()
    is_initialized = sess.run([tf.is_variable_initialized(var) for var in global_vars])
    not_initialized_vars = [v for (v, f) in zip(global_vars, is_initialized) if not f]
    # for var in not_initialized_vars:  # only for testing
    #     print(var.name)
    if len(not_initialized_vars):
        sess.run(tf.variables_initializer(not_initialized_vars))

self.sess = tf.Session()
## in between here I create the data generator and the model, and restore the pretrained model.
self.initialize_uninitialized(self.sess)

for current_epoch in range(self.model.current_epoch_tensor.eval(self.sess), self.config.num_epochs, 1):
    self.train_epoch()  # included above
    self.sess.run(self.model.increment_current_epoch_tensor)
I can see that you are running both the minimize op and the loss tensor in session.run(). You should only need to call the minimize op, i.e. only self.model.disc_op, which internally uses the loss.
Also, I cannot see your session initialization call anywhere. Make sure it is called only once.
Looking at your updated code, I can see that you are assigning the result of the tf.is_variable_initialized() call to is_not_initialized. Thus, it is initializing those variables which are already initialized.
I never managed to find out what was wrong with the code.
My colleague suggested trying the same model in a different, isolated environment, so I rewrote the code using the Keras library.
And now it's working. :/
We still don't know what exactly was wrong with the code above - I didn't change anything. I even used the same code for weight transferring and variable initialization.
If anyone ever faces similar problem, I would suggest trying the same model in a different environment.
Or if anyone knows what was wrong with the code above please share!
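For reference only, since the rewritten Keras code was never posted: a minimal Keras sketch of a discriminator head with the same layer sizes as the tf.layers version above. Everything besides the layer sizes (the sigmoid output, binary cross-entropy loss, and Adam optimizer) is my assumption, not the original author's code.

import tensorflow as tf
from tensorflow import keras

def build_discriminator(input_shape):
    # Roughly mirrors the discriminator above: three strided convs, flatten,
    # then a single unit for the real/synthetic decision.
    inputs = keras.Input(shape=input_shape)
    x = keras.layers.Conv2D(64, 3, strides=2, padding='same', activation=tf.nn.leaky_relu)(inputs)
    x = keras.layers.Conv2D(128, 3, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
    x = keras.layers.Conv2D(256, 3, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
    x = keras.layers.Flatten()(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model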
I am trying to replicate a (much smaller) version of the AlphaGo Zero system. However, in the network model I am having a problem. The loss function I am supposed to implement is the following (the original post shows it as an image; based on the definitions below it is the standard AlphaGo Zero loss):

l = (z - v)^2 - pi^T * log(p) + c * ||theta||^2

Where:
z is the label (a real value between -1 and 1) for one of the two heads of the network, and v is the value predicted by the network.
pi is the label probability distribution over all actions, and p is the probability distribution over all actions predicted by the network.
c is the L2 regularization parameter.
I pass to the network a list of channels (representing the game state) and an array (the same size as pi and p) indicating which actions are valid (1 if valid, 0 otherwise).
As you can see, the loss function uses both the targets and the network predictions. But from what I have found, a custom loss function can only receive y_true and y_pred as parameters, even though I have two "y_true"s and two "y_pred"s. I have tried using indexing to get those values, but I am pretty sure it is not working.
The network model and the custom loss function are in the code below:
def custom_loss(y_true, y_pred):
    # I am pretty sure this does not work
    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_pred[1]
    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)
    return mse_loss - cross_entropy_loss
def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape=(5,5,6), name='States_Channels_Input')
    valid_actions_dist = Input(shape=(32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    # Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    # Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])
    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])
    return model
# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...
x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]
model.fit(x_train, y_train, epochs=10)
In short, my question is: how do I correctly implement a custom loss function that uses both the network outputs and the corresponding labels?
I checked their git repository and there is a lot going on. As shown in the equation, the final loss is a combination of three different losses, and the three networks minimize this final loss. Their loss code is below:
# train ops
policy_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))

value_cost = params['value_cost_weight'] * tf.reduce_mean(
    tf.square(value_output - labels['value_tensor']))

reg_vars = [v for v in tf.trainable_variables()
            if 'bias' not in v.name and 'beta' not in v.name]
l2_cost = params['l2_strength'] * \
    tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])

combined_cost = policy_cost + value_cost + l2_cost
You can refer to this and make your changes accordingly.
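Not part of the original answer, but one way to express the same idea with the two-output Keras model above is to give each output its own loss and let compile() sum them; the L2 term is already handled by the kernel_regularizer arguments. A minimal sketch under those assumptions (using tf.keras, with the layer names from define_model above):

from tensorflow import keras

def value_loss(y_true, y_pred):
    # (z - v)^2 term for the value head
    return keras.backend.mean(keras.backend.square(y_true - y_pred), axis=-1)

def policy_loss(y_true, y_pred):
    # -pi^T log(p) term for the policy head
    return keras.losses.categorical_crossentropy(y_true, y_pred)

# model = define_model(), but compiled with one loss per named output:
model.compile(optimizer='adam',
              loss={'Output_Dist': policy_loss, 'Output_Value': value_loss},
              loss_weights={'Output_Dist': 1.0, 'Output_Value': 1.0})

# fit() then takes the inputs and labels as matching dicts:
model.fit({'States_Channels_Input': channels_input, 'Valid_Actions_Input': valid_actions_dist_input},
          {'Output_Dist': dist_probs_label, 'Output_Value': who_won_label},
          epochs=10)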
I created a simple DCGAN with 6 layers and trained it on the CelebA dataset (a portion of it containing 30K images).
I noticed that the images my network generates look dim, and as the network trains longer, the bright colors fade into dim ones!
Here are some examples:
This is what the CelebA images look like (real images used for training):
and these are the generated ones; the number shows the epoch number (they were trained for 30 epochs in total):
What is the cause for this phenomenon?
I tried all the usual GAN tricks, such as rescaling the input images to between -1 and 1, not using BatchNorm in the first layer of the discriminator or in the last layer of the generator, and
using LeakyReLU(0.2) in the discriminator and ReLU in the generator. Yet I have no idea why the images are this dim/dark!
Is this simply caused by having fewer training images?
Or is it caused by deficiencies in the networks? If so, what is the source of such deficiencies?
Here is how these networks are implemented:
def conv_batch(in_dim, out_dim, kernel_size, stride, padding, batch_norm=True):
    layers = nn.ModuleList()
    conv = nn.Conv2d(in_dim, out_dim, kernel_size, stride, padding, bias=False)
    layers.append(conv)
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_dim))
    return nn.Sequential(*layers)

class Discriminator(nn.Module):
    def __init__(self, conv_dim=32, act=nn.ReLU()):
        super().__init__()
        self.conv_dim = conv_dim
        self.act = act
        self.conv1 = conv_batch(3, conv_dim, 4, 2, 1, False)
        self.conv2 = conv_batch(conv_dim, conv_dim*2, 4, 2, 1)
        self.conv3 = conv_batch(conv_dim*2, conv_dim*4, 4, 2, 1)
        self.conv4 = conv_batch(conv_dim*4, conv_dim*8, 4, 1, 1)
        self.conv5 = conv_batch(conv_dim*8, conv_dim*10, 4, 2, 1)
        self.conv6 = conv_batch(conv_dim*10, conv_dim*10, 3, 1, 1)
        self.drp = nn.Dropout(0.5)
        self.fc = nn.Linear(conv_dim*10*3*3, 1)

    def forward(self, input):
        batch = input.size(0)
        output = self.act(self.conv1(input))
        output = self.act(self.conv2(output))
        output = self.act(self.conv3(output))
        output = self.act(self.conv4(output))
        output = self.act(self.conv5(output))
        output = self.act(self.conv6(output))
        output = output.view(batch, self.fc.in_features)
        output = self.fc(output)
        output = self.drp(output)
        return output

def deconv_convtranspose(in_dim, out_dim, kernel_size, stride, padding, batchnorm=True):
    layers = []
    deconv = nn.ConvTranspose2d(in_dim, out_dim, kernel_size=kernel_size, stride=stride, padding=padding)
    layers.append(deconv)
    if batchnorm:
        layers.append(nn.BatchNorm2d(out_dim))
    return nn.Sequential(*layers)

class Generator(nn.Module):
    def __init__(self, z_size=100, conv_dim=32):
        super().__init__()
        self.conv_dim = conv_dim
        # make the 1d input into a 3d output of shape (conv_dim*4, 4, 4)
        self.fc = nn.Linear(z_size, conv_dim*4*4*4)  # 4x4
        # conv and deconv layers work on 3d volumes, so we now only need to pass the number of fmaps and the
        # input volume size (its h,w which is 4x4!)
        self.drp = nn.Dropout(0.5)
        self.deconv1 = deconv_convtranspose(conv_dim*4, conv_dim*3, kernel_size=4, stride=2, padding=1)
        self.deconv2 = deconv_convtranspose(conv_dim*3, conv_dim*2, kernel_size=4, stride=2, padding=1)
        self.deconv3 = deconv_convtranspose(conv_dim*2, conv_dim, kernel_size=4, stride=2, padding=1)
        self.deconv4 = deconv_convtranspose(conv_dim, conv_dim, kernel_size=3, stride=2, padding=1)
        self.deconv5 = deconv_convtranspose(conv_dim, 3, kernel_size=4, stride=1, padding=1, batchnorm=False)

    def forward(self, input):
        output = self.fc(input)
        output = self.drp(output)
        output = output.view(-1, self.conv_dim*4, 4, 4)
        output = F.relu(self.deconv1(output))
        output = F.relu(self.deconv2(output))
        output = F.relu(self.deconv3(output))
        output = F.relu(self.deconv4(output))
        # we create the image using tanh!
        output = F.tanh(self.deconv5(output))
        return output

# testing nets
dd = Discriminator()
zd = np.random.rand(2, 3, 64, 64)
zd = torch.from_numpy(zd).float()
# print(dd)
print(dd(zd).shape)

gg = Generator()
z = np.random.uniform(-1, 1, size=(2, 100))
z = torch.from_numpy(z).float()
print(gg(z).shape)
I think the problem lies in the architecture itself, and I would first consider the overall quality of the generated images rather than their brightness or darkness. The generations clearly get better as you train for more epochs. I agree that the images get darker, but even in the early epochs the generated images are significantly darker than the ones in the training samples (at least compared to the ones you posted).
Coming back to your architecture, 30k samples are actually enough to obtain very convincing results, as achieved by state-of-the-art models in face generation. The generations do get better, but they are still far from being "very good".
I think the generator is definitely not strong enough and is the problematic part. (The fact that your generator loss skyrockets can also be a hint for this.) In the generator, all you do is upsampling and more upsampling. You should note that the transposed convolution is more of a heuristic and does not provide much learnability. This is related to the nature of the problem: when you are doing convolutions, you have all the information and you are trying to learn to encode, but in the decoder you are trying to recover information that was previously lost :). So, in a way, it is harder to learn because the information taken as input is limited and lacking.
In fact, deterministic bilinear interpolation methods perform similarly to or even better than transposed convolutions, and these are purely based on scaling/extending with zero learnability (https://arxiv.org/pdf/1707.05847.pdf).
To observe the limits of transposed convolutions, I suggest you replace all the ConvTranspose2d layers with UpSampling2D (https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling2D), and I claim that the results will not be much different. UpSampling2D is one of those deterministic methods I mentioned.
To improve your generator, you can try to insert convolutional layers between the upsampling layers. These layers would refine the features/images and correct some of the mistakes that occurred during the upsampling. In addition to these corrections, the next upsampling layer would take a more informative input. What I mean is to try a U-Net-like decoding, which you can find in this link (https://arxiv.org/pdf/1505.04597.pdf). Of course, that would only be a first step to explore; there are many more GAN architectures that you can try that will probably perform better.
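Not part of the answer's code, but a small PyTorch sketch of the upsample-then-convolve idea described above, as a possible drop-in alternative to deconv_convtranspose (the name upsample_conv and its defaults are hypothetical):

import torch.nn as nn

def upsample_conv(in_dim, out_dim, kernel_size=3, scale=2, batchnorm=True):
    # Deterministic upsampling followed by a regular convolution that can
    # learn to refine the enlarged feature map.
    layers = [
        nn.Upsample(scale_factor=scale, mode='nearest'),
        nn.Conv2d(in_dim, out_dim, kernel_size, stride=1, padding=kernel_size // 2, bias=False),
    ]
    if batchnorm:
        layers.append(nn.BatchNorm2d(out_dim))
    return nn.Sequential(*layers)

# e.g. in the Generator: self.deconv1 = upsample_conv(conv_dim*4, conv_dim*3)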
My question is about restoring a trained denoising model.
My network is defined in the following way:
Conv1->relu1->Conv2->relu2->Conv3->relu3->Deconv1
The tf.variable_scope(name) is the same as above.
Now I have my loss, optimizer, and accuracy defined with tf.name_scope.
When I try to restore and run the loss function, it asks for the labels as well (which I don't have at test time):
feed_dict={x:input, y:labels}
sess.run('loss',feed_dict)
Can anyone please help me understand how to test this? Which operation should I restore?
Do I have to call all the layers, pass the input, and check the loss (MSE)?
I checked many examples, but they all seem to be classification problems where defining softmax with logits at the end works.
Edit:
Below is my code, where it is easy to see how tf.name_scope and tf.variable_scope are defined. I feel I may have to rebuild the whole network to test a new image. Is that right?
def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):
    with tf.variable_scope(name):
        # Shape of the filter-weights for the convolution
        shape = [filter_size, filter_size, num_input_channels, num_filters]
        # Create new weights (filters) with the given shape
        weights = tf.Variable(tf.truncated_normal([filter_size, filter_size, num_input_channels, num_filters], stddev=0.5))
        # Create new biases, one for each filter
        biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))
        filters = tf.Variable(tf.truncated_normal([filter_size, filter_size, num_input_channels, num_filters], stddev=0.5))
        # TensorFlow operation for convolution
        layer = tf.nn.conv2d(input=input, filter=filters, strides=[1,1,1,1], padding='SAME')
        # Add the biases to the results of the convolution.
        layer += biases
        return layer, weights

def new_relu_layer(input, name):
    with tf.variable_scope(name):
        # TensorFlow operation for ReLU
        layer = tf.nn.relu(input)
        return layer

def new_pool_layer(input, name):
    with tf.variable_scope(name):
        # TensorFlow operation for max pooling
        layer = tf.nn.max_pool(value=input, ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1], padding='SAME')
        return layer

def new_layer(inputs, filters, kernel_size, strides, padding, name):
    with tf.variable_scope(name):
        layer = tf.layers.conv2d_transpose(inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides, padding=padding, data_format='channels_last')
        return layer

layer_conv1, weights_conv1 = new_conv_layer(input=yTraininginput, num_input_channels=1, filter_size=5, num_filters=32, name="conv1")
layer_relu1 = new_relu_layer(layer_conv1, name="relu1")
layer_conv2, weights_conv2 = new_conv_layer(input=layer_relu1, num_input_channels=32, filter_size=5, num_filters=64, name="conv2")
layer_relu2 = new_relu_layer(layer_conv2, name="relu2")
layer_conv3, weights_conv3 = new_conv_layer(input=layer_relu2, num_input_channels=64, filter_size=5, num_filters=128, name="conv3")
layer_relu3 = new_relu_layer(layer_conv3, name="relu3")
layer_deconv1 = new_layer(inputs=layer_relu3, filters=1, kernel_size=[5,5], strides=[1,1], padding='same', name='deconv1')
layer_relu4 = new_relu_layer(layer_deconv1, name="relu4")
layer_conv4, weights_conv4 = new_conv_layer(input=layer_relu4, num_input_channels=1, filter_size=5, num_filters=128, name="conv4")
layer_relu5 = new_relu_layer(layer_conv4, name="relu5")
layer_deconv2 = new_layer(inputs=layer_relu5, filters=1, kernel_size=[5,5], strides=[1,1], padding='same', name='deconv2')
layer_relu6 = new_relu_layer(layer_deconv2, name="relu6")

# Use MSE cost function
with tf.name_scope("loss"):
    cross_entropy = tf.losses.mean_squared_error(labels=xTraininglabel, predictions=layer_relu6)

# Use Adam Optimizer
with tf.name_scope("optimizer"):
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-6).minimize(loss=cross_entropy)

# Accuracy (PSNR)
with tf.name_scope("accuracy"):
    accuracy = tf.image.psnr(a=layer_relu6, b=xTraininglabel, max_val=1.0)
Try viewing the graph of your code on TensorBoard and get the operation name from the last layer (in your case, the final deconv/relu layer). Something like the image below.
Try loading the tensor using the code below:
operation = graph.get_tensor_by_name("<operationname:0>")
This should work, as your layers are interconnected.
Let me know if this worked!
[Operation image]
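Not from the original answer, but a minimal sketch of the whole test flow under these assumptions: the model was saved with tf.train.Saver, the input placeholder is named x, the relu6 output is the tensor to evaluate, and input_image stands for the test image array. Only the input needs to be fed, because the output tensor does not depend on the label placeholder:

import tensorflow as tf

with tf.Session() as sess:
    # Restore the graph structure and the trained weights (paths are hypothetical).
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')

    graph = tf.get_default_graph()
    # Exact tensor names depend on how the placeholders/layers were actually named.
    x = graph.get_tensor_by_name('x:0')
    output = graph.get_tensor_by_name('relu6/Relu:0')

    # Only the input image is fed; the loss (and its label placeholder)
    # is not part of this computation, so no labels are required.
    denoised = sess.run(output, feed_dict={x: input_image})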
I understand that biases are required in small networks to shift the activation function. But in the case of a deep network that has multiple CNN layers, pooling, dropout, and other non-linear activations, does bias really make a difference? A convolutional filter learns local features, and the same bias is used for a given conv output channel.
This is not a duplicate of this link. That link only explains the role of bias in a small neural network and does not attempt to explain its role in deep networks containing multiple CNN layers, dropout, pooling, and non-linear activation functions.
I ran a simple experiment and the results indicated that removing the bias from the conv layers made no difference in final test accuracy.
Two models are trained, and the test accuracy is almost the same (slightly better in the one without bias):
model_with_bias
model_without_bias (bias not added in the conv layers)
Are they being used only for historical reasons?
If using bias provides no gain in accuracy, shouldn't we omit them? Fewer parameters to learn.
I would be thankful if someone with deeper knowledge than me could explain the significance (if any) of these biases in deep networks.
Here is the complete code and the experiment result (bias vs. no-bias experiment):
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

  # Define a model with bias.
  def model_with_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Define a model without bias added in the convolutional layers.
  def model_without_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # layer1_biases is not added
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    # Biases are added only in the fully connected layers (layer 3 and layer 4).
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Training computation.
  logits_with_bias = model_with_bias(tf_train_dataset)
  loss_with_bias = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_with_bias))

  logits_without_bias = model_without_bias(tf_train_dataset)
  loss_without_bias = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_without_bias))

  # Optimizer.
  optimizer_with_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_with_bias)
  optimizer_without_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_without_bias)

  # Predictions for the training, validation, and test data.
  train_prediction_with_bias = tf.nn.softmax(logits_with_bias)
  valid_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_valid_dataset))
  test_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_test_dataset))

  # Predictions for the model without bias.
  train_prediction_without_bias = tf.nn.softmax(logits_without_bias)
  valid_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_valid_dataset))
  test_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
    session.run(optimizer_with_bias, feed_dict=feed_dict)
    session.run(optimizer_without_bias, feed_dict=feed_dict)
  print('Test accuracy(with bias): %.1f%%' % accuracy(test_prediction_with_bias.eval(), test_labels))
  print('Test accuracy(without bias): %.1f%%' % accuracy(test_prediction_without_bias.eval(), test_labels))
Output:
Initialized
Test accuracy(with bias): 90.5%
Test accuracy(without bias): 90.6%
Biases are tuned alongside weights by learning algorithms such as gradient descent. The way biases differ from weights is that they are independent of the output from previous layers. Conceptually, a bias is caused by input from a neuron with a fixed activation of 1, and so it is updated by subtracting just the product of the delta value and the learning rate.
In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal. At the first layer, the ability for this to happen depends on your input distribution. On a small network, of course, you need a bias input, but on a large network, removing it makes almost no difference.
Although in a large network it makes little difference, it still depends on the network architecture. For instance, in LSTMs:
Most applications of LSTMs simply initialize the LSTMs with small random weights, which works well on many problems. But this initialization effectively sets the forget gate to 0.5. This introduces a vanishing gradient with a factor of 0.5 per timestep, which can cause problems whenever the long-term dependencies are particularly severe. This problem is addressed by simply initializing the forget gate's bias to a large value such as 1 or 2. By doing so, the forget gate will be initialized to a value that is close to 1, enabling gradient flow.
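As a small illustration (not from the original answer): in Keras, this forget-gate trick is exposed directly through the unit_forget_bias argument, which adds 1 to the forget gate's bias at initialization.

from tensorflow import keras

# unit_forget_bias=True (the default) initializes the forget gate bias to 1,
# which is the fix described in the quote above.
lstm = keras.layers.LSTM(64, unit_forget_bias=True)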
See also:
The role of bias in Neural network
What is bias in Neural network
An Empirical Exploration of Recurrent Network Architectures
In most networks you have a batchnorm layer after the conv layer, and batchnorm has its own learnable shift term. So if you have a batchnorm layer, there is no gain from a conv bias. See:
Can not use both bias and batch normalization in convolution layers
Otherwise, from a math perspective you are learning different functions. However, it turns out that, in particular if you have a very complex network for a simple problem, you might achieve almost the same thing without biases as with biases, but end up using more parameters. In my experience, using a factor of 2-4 more parameters than needed rarely hurts performance in deep learning, in particular if you regularize. So it is hard to notice any difference. However, you might try to use fewer channels (I don't think the depth of the network matters as much as the number of channels of the convolution) and see if bias makes a difference. I would guess so.
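A minimal Keras sketch of the pattern described above (not from the original answer): when a BatchNormalization layer follows the convolution, its learnable shift plays the role of the bias, so the conv layer's own bias can be disabled. The input shape is just an example.

from tensorflow import keras

model = keras.Sequential([
    # use_bias=False: the following BatchNormalization supplies the shift.
    keras.layers.Conv2D(32, 3, padding='same', use_bias=False, input_shape=(28, 28, 1)),
    keras.layers.BatchNormalization(),
    keras.layers.ReLU(),
])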
I want to train a neural network with a genetic algorithm. I use the tflearn library to create my network. When I predict the outcome of my network once, everything is fine; however, when I create a loop in which every iteration creates a new model of the network, I get errors. In the first iteration everything still works, but in the second iteration I get an error stating:
Cannot feed value of shape (145, 5, 1) for Tensor u'InputData/X:0',
which has shape '(?, 145, 5, 1)
I even tried to clear all my variables at the end of the loop, recreate the class and run the function again, but during the second iteration I still had the error.
This is my main:
for x in range(0, 5):
    model = self.build_model()
    result_ETH = self.calculate_model_performance(model)
This is my build_model:
input_layer = input_data(shape = [None, self.input_length, self.input_types, 1])
fc1 = fully_connected(input_layer, self.neurons_layer_1, activation = 'relu', trainable = False, name = "full1")
fc2 = fully_connected(fc1, self.neurons_layer_2, activation = 'relu', trainable = False, name = "full2")
fc3 = fully_connected(fc2, self.neurons_layer_3, activation = 'relu', trainable = False, name = "full3")
network = fully_connected(fc3, 2, activation = 'softmax')
model = tflearn.DNN(network, clip_gradients=0., tensorboard_verbose=0)
return model
The part of calculate_model_performance giving the error:
predictset = self.create_predict_set(timepoint, ETH)
reshaped = predictset.reshape([-1,self.look_back+1,5,1])
prediction = model.predict(reshaped)
When I print the shape of the variable reshaped, it is the same in the first and second iterations.
I found out myself that I needed to add:
with tf.Graph().as_default():
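For completeness (the exact placement is my assumption, not spelled out above): the idea is to build each new model inside a fresh default graph, so the InputData/X:0 placeholder from the previous iteration is not reused.

for x in range(0, 5):
    # A fresh graph per iteration prevents tflearn from colliding with
    # the tensors created in the previous loop iteration.
    with tf.Graph().as_default():
        model = self.build_model()
        result_ETH = self.calculate_model_performance(model)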