Tensorflow: zero gradients - python

Essentially, I'm running into an issue where, when I try to update some of the variables within a scope (i.e. those in the discriminator scope), the gradient is always zero (I can verify this by printing them out after computing them).
This is confusing because I don't see why the loss isn't being propagated through.
My code stubs are as follows:
def build_model(self):
# Process inputs
self.inputs = tf.placeholder(tf.float32, shape = self.input_shape, name = "input")
self.is_training = tf.placeholder(tf.bool, name = "is_training")
self.targets = tf.placeholder(tf.float32, shape = self.output_shape, name = "targets")
self.target_p = tf.placeholder(tf.float32, shape = self.patient_shape, name = "targets_patients")
self.target_s = tf.placeholder(tf.float32, shape = self.sound_shape, name = "targets_sounds")
# Process outputs
self.encoded_X = self.encoder(self.inputs)
self.posteriors = self.predictor(self.encoded_X)
self.patient_predict, self.sound_predict = self.discriminator(self.encoded_X, tf.expand_dims(self.posteriors, axis = -1))
self.patient_predict_id = tf.argmax(tf.nn.softmax(self.patient_predict, axis = -1))
self.sound_predict_id = tf.argmax(tf.nn.softmax(self.sound_predict, axis = -1))
# Process losses
self.segment_loss = tf.losses.mean_squared_error(self.targets, self.posteriors)
self.patient_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = self.patient_predict, labels = self.target_p)
self.sound_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = self.sound_predict, labels = self.target_s)
self.disc_loss = self.patient_loss + self.sound_loss
self.combined_loss = self.segment_loss - self.lambda_param*(self.disc_loss)
self.extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(self.extra_update_ops):
predictor_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="predictor")
encoder_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="encoder")
discrim_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")
self.discrim_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(-1*self.combined_loss), var_list=discrim_vars)
self.predict_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(self.combined_loss), var_list=predictor_vars)
self.encode_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(self.combined_loss), var_list=encoder_vars)
As you can see self.combined_loss must depend on self.patient_loss which is found from self.discriminator.
My code for discriminator() is here:
def discriminator(self, encoded_X, posterior, reuse = False):
with tf.variable_scope("discriminator") as scope:
if reuse: scope.reuse_variables()
print('\n############## \nDiscriminator\n')
print('Discriminator encode input-shape: ', self.encode_shape)
print('Discriminator posterior input-shape: ', self.output_shape, ' (Expanded to correct size)')
inputs = tf.concat([encoded_X, posterior], axis = -2)
tf.stop_gradient(posterior)
print('Stacked input shape: ', inputs.get_shape())
h = tf.layers.conv2d(inputs, 10, (5, 2), padding = 'SAME', activation = tf.nn.relu)
h = tf.layers.max_pooling2d(h, (5, 2), (5, 2))
print('Layer 1: ', h.get_shape())
h = tf.layers.conv2d(h, 5, (5, 2), padding = 'SAME', activation = tf.nn.relu)
h = tf.squeeze(tf.layers.max_pooling2d(h, (3, 2), (3, 2)), axis = -2)
h = tf.layers.flatten(h)
print('Layer 2: ', h.get_shape())
h_p = tf.layers.dense(h, self.patient_shape[-1])
h_s = tf.layers.dense(h, self.sound_shape[-1])
print('Discriminator patient o/p shape: ', h_p.get_shape(), ' Expected shape: ', self.patient_shape)
print('Discriminator sound o/p shape: ', h_s.get_shape(), ' Expected shape: ', self.sound_shape)
return h_p, h_s
tf.stop_gradient is called because I do not want the gradient to flow through from the discriminator to the model that produces the posteriors.
Finally, I call my model here:
feed_dict= {
self.inputs: X,
self.targets: y,
self.target_p: y_p_oh,
self.target_s: y_s_oh,
self.is_training: True
}
posteriors, cost_loss, disc_loss, patient_id_pred , sound_id_pred, _ , _ = self.sess.run([
self.posteriors,
self.combined_loss,
self.disc_loss,
self.patient_predict_id,
self.sound_predict_id,
self.predict_train,
self.encode_train,
], feed_dict = feed_dict)
j = 0
print('Combined-loss: ', np.mean(cost_loss), 'Discriminator-loss: ', np.mean(disc_loss))
while np.mean(disc_loss) > entropy_cutoff:
disc_loss, _ = self.sess.run([self.disc_loss, self.discrim_train], feed_dict = feed_dict)
j+=1
print(' Inner loop iteration: ', j, ' Loss: ', np.around(np.mean(disc_loss), 5), ' Cutoff: ', np.around(entropy_cutoff, 5), end = '\r')
print("")
From my investigation, the code gets stuck on the while-loop as the train-step minimisation (to effectively maximise the combined_loss) simply gives zero gradients for all the variables within the discriminator scope. Why would this be occurring?
EDIT: I think I've localised the error to:
self.disc_loss = self.patient_loss + self.sound_loss
self.combined_loss = self.segment_loss - self.lambda_param * self.disc_loss
If I apply the minimisation on self.disc_loss, it works fine. But when I apply the minimisation on self.combined_loss, the operation breaks and the gradients zero. Why would this be the case?
EDIT: Tensorboard Graph

Related

Keras - ValueError: Could not interpret loss function identifier

I am trying to build the autoencoder structure detailed in this IEEE article. The autoencoder uses a separable loss function where it is required that I create a custom loss function for the "cluster loss" term of the separable loss function as a function of the average output of the encoder. I create my own layer called RffConnected that calculates the cluster loss and utilizes the add_loss method. Otherwise, this RffConnected layer should act as just a normal deep layer.
Here are my relevant code snippets:
import matplotlib.pyplot as plot
from mpl_toolkits.axes_grid1 import ImageGrid
import numpy as np
import math
from matplotlib.figure import Figure
import tensorflow as tf
import keras
from keras import layers
import random
import time
from os import listdir
#loads data from a text file
def loadData(basePath, samplesPerFile, sampleRate):
real = []
imag = []
fileOrder = []
for file in listdir(basePath):
if((file != "READ_ME") and ((file != "READ_ME.txt"))):
fid = open(basePath + "\\" + file, "r")
fileOrder.append(file)
t = 0
sampleEvery = samplesPerFile / sampleRate
temp1 = []
temp2 = []
times = []
for line in fid.readlines():
times.append(t)
samples = line.split("\t")
temp1.append(float(samples[0]))
temp2.append(float(samples[1]))
t = t + sampleEvery
real.append(temp1)
imag.append(temp2)
fid.close()
real = np.array(real)
imag = np.array(imag)
return real, imag, times, fileOrder
#####################################################################################################
#Breaks up and randomizes data
def breakUpData(real, imag, times, numPartitions, basePath):
if(len(real) % numPartitions != 0):
raise ValueError("Error: The length of the dataset must be divisible by the number of partitions.")
newReal = []
newImag = []
newTimes = []
fileOrder = listdir(basePath)
dataFiles = []
interval = int(len(real[0]) / numPartitions)
for i in range(0, interval):
newTimes.append(times[i])
for i in range(0, len(real)):
tempI = []
tempQ = []
for j in range(0, len(real[0])):
tempI.append(real[i, j])
tempQ.append(imag[i, j])
if((j + 1) % interval == 0):
newReal.append(tempI)
newImag.append(tempQ)
#fileName = fileOrder[i][0: fileOrder[i].find("_") + 3]
dataFiles.append(fileOrder[i])
tempI = []
tempQ = []
#randomizes the broken up dataset and the file list
for i in range(0, len(newReal)):
r = random.randint(0, len(newReal) - 1)
tempReal = newReal[i]
tempImag = newImag[i]
newReal[i] = newReal[r]
newImag[i] = newImag[r]
newReal[r] = tempReal
newImag[r] = tempImag
tempFile = dataFiles[i]
dataFiles[i] = dataFiles[r]
dataFiles[r] = tempFile
#return np.array(newReal), np.array(newImag), newTimes, dataFiles
return newReal, newImag, newTimes, dataFiles
#####################################################################################################
#custom loss layer for the RffAe-S that calculates the clustering loss term
class RffConnected(layers.Layer):
def __init__(self, output_dim, batchSize, beta, alpha):
super(RffConnected, self).__init__()
# self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)
#array = np.zeros(output_dim)
self.iters = 0.0
self.beta = beta
self.alpha = alpha
self.batchSize = batchSize
self.output_dim = output_dim
self.sum = tf.zeros(output_dim, tf.float64)
self.moving_average = tf.zeros(output_dim, tf.float64)
self.clusterloss = tf.zeros(output_dim, tf.float64)
self.sum = tf.cast(self.sum, tf.float32)
self.moving_average = tf.cast(self.moving_average, tf.float32)
self.clusterloss = tf.cast(self.clusterloss, tf.float32)
# self.sum = keras.Input(shape=(self.output_dim,))
# self.moving_average = keras.Input(shape=(self.output_dim,))
# self.clusterloss = keras.Input(shape=(self.output_dim,))
def build(self, input_shape):
self.kernel = self.add_weight(name = 'kernel', \
shape = (int(input_shape[-1]), self.output_dim), \
initializer = 'normal', trainable = True)
#self.kernel = tf.cast(self.kernel, tf.float64)
super(RffConnected, self).build(int(input_shape[-1]))
def call(self, inputs):
#keeps track of training epochs
self.iters = self.iters + 1
#inputs = tf.cast(inputs, tf.float64)
#where this custom layer acts as a normal layer- the loss then uses this
#calc = keras.backend.dot(inputs, self.kernel)
calc = tf.matmul(inputs, self.kernel)
#cumulative sum of deep encoded features
#self.sum = state_ops.assign(self.sum, tf.reshape(tf.math.add(self.sum, calc), tf.shape(self.sum)))
#self.sum = tf.ops.state_ops.assign(self.sum, tf.math.add(self.sum, calc))
#self.sum.assign_add(calc)
self.sum = tf.math.add(self.sum, calc)
#calculate the moving average and loss if we have already trained one batch
if(self.iters >= self.batchSize):
self.moving_average = tf.math.divide(self.sum, self.iters)
self.clusterloss = tf.math.exp(\
tf.math.multiply(-1 * self.beta, tf.math.reduce_sum(tf.math.square(tf.math.subtract(inputs, self.moving_average)))))
#self.add_loss(tf.math.multiply(self.clusterloss, self.alpha))
self.add_loss(self.clusterloss.numpy() * self.alpha)
return calc
#####################################################################################################
def customloss(y_true, y_pred):
loss = tf.square(y_true - y_pred)
print(loss)
return loss
#####################################################################################################
realTraining = np.array(real[0:2200])
realTesting = np.array(real[2200:-1])
imagTraining = np.array(imag[0:2200])
imagTesting = np.array(imag[2200:-1])
numInputs = len(realTraining[0])
i_sig = keras.Input(shape=(numInputs,))
q_sig = keras.Input(shape=(numInputs,))
iRff = tf.keras.layers.experimental.RandomFourierFeatures(numInputs, \
kernel_initializer='gaussian', scale=9.0)(i_sig)
rff1 = keras.Model(inputs=i_sig, outputs=iRff)
qRff = tf.keras.layers.experimental.RandomFourierFeatures(numInputs, \
kernel_initializer='gaussian', scale=9.0)(q_sig)
rff2 = keras.Model(inputs=q_sig, outputs=qRff)
combined = layers.Concatenate()([iRff, qRff])
combineRff = tf.keras.layers.experimental.RandomFourierFeatures(4 * numInputs, \
kernel_initializer='gaussian', scale=10.0)(combined)
preprocess = keras.Model(inputs=[iRff, qRff], outputs=combineRff)
#print(realTraining[0:5])
preprocessedTraining = preprocess.predict([realTraining, imagTraining])
preprocessedTesting = preprocess.predict([realTesting, imagTesting])
################## Entering Encoder ######################
encoderIn = keras.Input(shape=(4*numInputs,))
#connected1 = layers.Dense(100, activation="sigmoid")(encoderIn)
clusterLossLayer = RffConnected(100, 30, 1.00, 100.00)(encoderIn)
#clusterLossLayer = myRffConnected(256)(connected1)
encoder = keras.Model(inputs=encoderIn, outputs=clusterLossLayer)
################## Entering Decoder ######################
connected2 = layers.Dense(125, activation="sigmoid")(clusterLossLayer)
relu1 = layers.ReLU()(connected2)
dropout = layers.Dropout(0.2)(relu1)
reshape1 = layers.Reshape((25, 5, 1))(dropout)
bn1 = layers.BatchNormalization()(reshape1)
trans1 = layers.Conv2DTranspose(1, (4, 2))(bn1)
ups1 = layers.UpSampling2D(size=(2, 1))(trans1)
relu2 = layers.ReLU()(ups1)
bn2 = layers.BatchNormalization()(relu2)
trans2 = layers.Conv2DTranspose(1, (4, 2))(bn2)
ups2 = layers.UpSampling2D(size=(2, 1))(trans2)
relu3 = layers.ReLU()(ups2)
bn3 = layers.BatchNormalization()(relu3)
trans3 = layers.Conv2DTranspose(1, (5, 2))(bn3)
ups3 = layers.UpSampling2D(size=(2, 1))(trans3)
relu4 = layers.ReLU()(ups3)
bn4 = layers.BatchNormalization()(relu4)
trans4 = layers.Conv2DTranspose(1, (7, 1))(bn4)
reshape2 = layers.Reshape((4*numInputs, 1, 1))(trans4)
autoencoder = keras.Model(inputs=encoderIn, outputs=reshape2)
encoded_input = keras.Input(shape=(None, 100))
decoder_layer = autoencoder.layers[-1]
#autoencoder.summary()
autoencoder.compile(optimizer='adam', loss=[autoencoder.losses[-1], customloss], metrics=['accuracy', 'accuracy'])
autoencoder.fit(preprocessedTraining, preprocessedTraining, epochs=100, batch_size=20, shuffle=True, validation_data=(preprocessedTesting, preprocessedTesting))
It seems like it runs for two training epochs then it gives me an error. I end up getting this error when I run it:
ValueError: Could not interpret loss function identifier: Tensor("rff_connected_137/Const:0", shape=(100,), dtype=float32)
I've already spent a considerable amount of time debugging this thing, although if you spot any more errors I would appreciate a heads-up. Thank you in advance.
According to the documentation of the keras Keras Model Training-Loss, the 'loss' attribute can take the value of float tensor (except for the sparse loss functions returning integer arrays) with a specific shape.
If it is necessary to combine two loss functions, it would be better to perform mathematical calculations within your custom loss function to return an output of float tensor. This reference might be a help Keras CustomLoss definition.

Is there a way to fix param.grad = none in pytorch model?

I am working on the Point Cloud Registration Network(PCRNET) and I have a issue with the training process. For that I wrote a pytorch model that consists of 5 convolutional layers and 5 fully connected layers. My custom loss output changes with each new initialization of the network but then for each epoch I obtain the same values for each batch. Therefore no training is happening. I narrowed the error down to the fact that no gradients are being computed.
Here is my network and forward pass
class pcrnetwork(nn.Module):
def __init__(self,):
# This is the network that gets initialized with every new instance
super().__init__()
self.conv1 = nn.Conv1d(3,64,1, padding="valid")
self.conv2 = nn.Conv1d(64,64,1,padding="valid")
self.conv3 = nn.Conv1d(64,64,1,padding="valid")
self.conv4 = nn.Conv1d(64,128,1,padding="valid")
self.conv5 = nn.Conv1d(128,1024,1,padding="valid")
self.fc1 = nn.Linear(2048,1024)
self.fc2 = nn.Linear(1024,512)
self.fc3 = nn.Linear(512,512)
self.fc4 = nn.Linear(512,256)
self.fc5 = nn.Linear(256,6)
self.bn1 = nn.BatchNorm1d(64)
self.bn2 = nn.BatchNorm1d(64)
self.bn3 = nn.BatchNorm1d(64)
self.bn4 = nn.BatchNorm1d(128)
self.bn6 = nn.BatchNorm1d(1024)
self.bn7 = nn.BatchNorm1d(512)
self.bn8 = nn.BatchNorm1d(512)
self.bn9 = nn.BatchNorm1d(256)
def forward1(self,input,input1,points):
point_cloud = torch.cat((input,input1),dim=2)
net = Func.relu(self.bn1(self.conv1(point_cloud)))
net = Func.relu(self.bn2(self.conv2(net)))
net = Func.relu(self.bn3(self.conv3(net)))
net = Func.relu(self.bn4(self.conv4(net)))
net = Func.relu(self.conv5(net))
net_s = net[:,:,0:points]
net_t = net[:,:,points:points*2]
pool = nn.MaxPool1d(net_s.size(-1))(net_s)
pool2 = nn.MaxPool1d(net_t.size(-1))(net_t)
global_feature = torch.cat((pool,pool2),1)
#global_feature = torch.squeeze(global_feature,dim=2)
global_feature = torch.flatten(global_feature,start_dim=1)
# fully connected part
net = Func.relu(self.bn6(self.fc1(global_feature)))
net = Func.relu(self.bn7(self.fc2(net)))
net = Func.relu(self.bn8(self.fc3(net)))
net = Func.relu(self.bn9(self.fc4(net)))
net = Func.relu(self.fc5(net))
pose = net
output = appply_transformation(torch.transpose(input,1,2),pose)
return output
my training loop looks like this:
def train1():
losss = []
for epoch in range(1):
model.train()
total_loss = 0.0
#poses = []
for idx, data in enumerate(train_loader,0):
x = data["source"] # shape= [32,2048,3]
y = data["target"]
x = torch.transpose(x,1,2)
x = x.to(device)
y = torch.transpose(y,1,2)
y = y.to(device)
optimizer.zero_grad()
output = model.forward1(x,y,2048)
y = torch.transpose(y,1,2)
loss = og_chamfer1(y,output)
loss.backward()
optimizer.step()
print(loss.item())
And finally here is the code for my loss function. The idea here is to let the network calculate 6 parameters(3 rotational, 3 translational) that get fed into my apply transformation function. Then my actual loss(=Chamfer Distance) is being calculated on the transformed source point cloud and the target point cloud.
def dist_vec(source, targ):
#AB = torch.matmul(targ,torch.transpose(source,1,2))
AB = torch.matmul(targ,torch.transpose(source,0,1))
#print("ab hat die shape",AB.shape)
AA = torch.sum(torch.square(targ),1)
#AA = AA[:,:,None]
#print("AA hat die shape", AA.shape)
BB = torch.sum(torch.square(source),1)
#BB = BB[:,:,None]
dist_matrix = torch.transpose((BB - 2 * AB), 0,1) + AA
return dist_matrix
def og_chamfer1(sourc,targ): # source =[32,2048,3]
batch_loss1 = torch.zeros(size=(len(sourc),))
batch_loss = []
#print(len(source))
for i in range(len(sourc)):
dist = dist_vec(sourc[i],targ[i])
#print("dist hat die shape", dist.shape)
min_x_val, min_x_idx = torch.min(dist, axis=0)
#print("this is minx", min_x_val)
#min_x = torch.tensor(min_x[0])
min_y_val, min_y_idx = torch.min(dist,axis=1)
#print("this is min y", min_y_val)
mean = torch.mean(min_x_val) + torch.mean(min_y_val)
batch_loss1[i] = mean
#batch_loss.append(mean)
#print(batch_loss)
#print(len(batch_loss))
#batch_loss_total = sum(batch_loss)/len(sourc)
#print(mean.shape)
batch_loss1 = torch.mean(batch_loss1)
return batch_loss1
all of these functions should work, I just post them for reference. I think the problem for para.grad=None lays somewhere in my apply transformation function:
def rotate_cloud_by_angle_z(input, rotation_angle):
# the input here should have shape=(num.of points x 3)
# dtype for the rotation matrix needs to be set to float64
cosval = torch.cos(rotation_angle) # DONT USE TF.MATH.COS BECAUSE U GET A TENSOR NOT A NUMBER
sinval = torch.sin(rotation_angle)
#print("sinval hat shape:",sinval.shape)
#cosval = torch.from_numpy(cosval)
#sinval = torch.from_numpy(sinval)
rotation_matrix =torch.tensor([[cosval.item(),-sinval.item(),0],[sinval.item(),cosval.item(),0],[0,0,1]],dtype=torch.float32, requires_grad=False)
rotation_matrix = rotation_matrix.to(device)
product = torch.matmul(input, rotation_matrix)
return product
def appply_transformation(datas,poses):
transformed_data = datas
#print("poses hat die shape", poses.shape)
for i in range(datas.shape[0]):
#print("poses[i,5] hat shape:", poses[i,5])
#print("poses hat shape:", poses.shape)
transformed_data[i,:,:] = rotate_cloud_by_angle_z(transformed_data[i,:,:].clone(),poses[i,5])
#print(poses.shape)
#print("poses[i,5] hat shape:", poses[i,5])
transformed_data[i,:,:] = rotate_cloud_by_angle_y(transformed_data[i,:,:].clone(),poses[i,4])
transformed_data[i,:,:] = rotate_cloud_by_angle_x(transformed_data[i,:,:].clone(),poses[i,3])
transformed_data[i,:,:] = translation(transformed_data[i,:,:].clone(),torch.tensor([poses[i,0],poses[i,1],poses[i,2]],requires_grad=False).to(device))
return transformed_data
on https://discuss.pytorch.org/t/model-param-grad-is-none-how-to-debug/52634/3 I could find out that one shouldn't use .item() or rewrapping of tensors like x = torch.tensor(x) but essentially I don't know how to change my apply transformation function in such that the gradient calculation works.
If anyone has any tips on that I would be super grateful!

lstm rnn from scratch giving bad results

This is my my code but the output is really bad.
and by that i mean the values are way off than what they are supposed to be, the testing error is very high. if i de-normalize the values and compare the difference, it's massive.
I have two questions:
1) Can anyone tell me why this is happening and what i can do to make it perform better?
2) When the values goes through so many functions, how do i get the output back to the original format.
I am new to this and jumped into a complex topic immediately, so i know my code isn't the best, if you could tell me how to improve that would be great as well! anyways, so please bear with me!
The data i used was a list of multiples of two.
ps: when i used the tensorflow models like dynamic_rnn() the output i got was accurate and also i just had to denormalize the output to get the number in the original format( correct size that is), how will just denormalizing it get the output, i dont get that!!
Thanks!
# LSTM [ Many to One ]
# START
# imports
import csv
import numpy as np
import tensorflow as tf
import sys
import os
import json
from random import shuffle
from tensorflow.python import debug as tf_debug
# CALCULATE ALL POSSIBLE BATCH SIZES
def calculate_batch_sizes(n_train):
batch_sizes = []
for i in range(2, int(n_train/2)):
if n_train % i == 0 and n_train / i > 1:
batch_sizes.append(i)
return batch_sizes
def de_normalize(value, m1, m2):
return (value*(m1-m2)) + m2
class lstm_network():
name = "lstm_"
# initialization function
def __init__(self, config_params):
self.sequence_length = config_params["sequence_length"]
self.batch_size = config_params["batch_size"]
self.hidden_layers_size = config_params["hidden_layers_size"]
self.data_path = config_params["data_path"]
self.n_epochs = config_params["no_of_epochs"]
self.learning_rate = config_params["learning_rate"]
self.w_igate, self.w_fgate, self.w_ogate, self.w_cgate = tf.get_variable('w_igate', shape = [self.sequence_length, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('w_fgate', shape = [self.sequence_length, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('w_ogate', shape = [self.sequence_length, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('w_cgate', shape = [self.sequence_length, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer())
self.u_igate, self.u_fgate, self.u_ogate, self.u_cgate = tf.get_variable('u_igate', shape = [self.hidden_layers_size, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('u_fgate', shape = [self.hidden_layers_size, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('u_ogate', shape = [self.hidden_layers_size, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer()), tf.get_variable('u_cgate', shape = [self.hidden_layers_size, self.hidden_layers_size], initializer = tf.contrib.layers.xavier_initializer())
self.outputs = [0.0] * self.batch_size
self.testing_loss = float(0)
self.training_loss = float(0)
self.ft, self.ct, self._ct, self.it = [0.0]*(self.hidden_layers_size), [0.0]*(self.hidden_layers_size), [0.0]*(self.hidden_layers_size), [0.0]*(self.hidden_layers_size)
self.ot, self.ht, self.ct_prev, self.ht_prev = [0.0]*(self.hidden_layers_size), [0.0]*(self.hidden_layers_size), np.array([0.0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size), np.array([0.0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size)
self.w_output_layer = tf.get_variable('w_output_layer', shape = [self.hidden_layers_size, 1], initializer = tf.contrib.layers.xavier_initializer())
print("\n Object of class lstm_network initialized with the given configuration")
# print values function
def print_model_info(self):
print("\n\n\n\t\t MODEL INFORMATION\n\n")
print("\n Weights of the LSTM layer: ")
print("\n\n input Gate Weights: \n w: ", self.w_igate,"\n u: ", self.u_igate)
print("\n\n Forget Gate Weights: \n w: ", self.w_fgate,"\n u: ", self.u_fgate)
print("\n\n Context Gate Weights: \n w: ", self.w_cgate,"\n u: ", self.u_cgate)
print("\n\n Output Gate Weights: \n w: ", self.w_ogate,"\n u: ", self.u_ogate)
print("\n\n Average loss while training: ", self.training_loss)
print("\n\n Average loss while testing: ", self.testing_loss)
# loading function
def load_data(self):
with open(self.data_path, 'r') as data_file:
data_reader = csv.reader(data_file, delimiter = ',')
self.data = [float(row[1]) for row in data_reader]
self.data_max, self.data_min, self.n_data = float(max(self.data)), float(min(self.data)), len(self.data)
for i in range(len(self.data)):
self.data[i] = float( (self.data[i]-self.data_min)/(self.data_max-self.data_min) )
self.data_x = [ self.data[i:i+self.sequence_length] for i in range(self.n_data - self.sequence_length-1)]
self.data_y = [ self.data[i] for i in range(self.sequence_length+1, self.n_data)]
self.n_data = len(self.data_x)
temp = list(zip(self.data_x,self.data_y))
shuffle(temp)
test_size = 0.25
self.data_x, self.data_y = zip(*temp)
self.trainx, self.trainy, self.testx, self.testy = self.data_x[:-int(test_size*self.n_data)], self.data_y[:-int(test_size*self.n_data)], self.data_x[-int(test_size*self.n_data):], self.data_y[-int(test_size*self.n_data):]
self.n_train, self.n_test = len(self.trainx), len(self.testx)
batch_sizes = []
batch_sizes.extend(calculate_batch_sizes(self.n_train))
while self.batch_size not in batch_sizes:
print("\n batch size provided in the initial configuration cannot be used, please select one from the following batch sizes:\n",batch_sizes)
self.batch_size = int(input("\n enter a batch size: "))
self.n_train_batches = int( self.n_train/self.batch_size )
self.trainx, self.trainy, self.testx, self.testy = np.float32(self.trainx), np.float32(self.trainy), np.float32(self.testx), np.float32(self.testy)
self.trainx_batches, self.trainy_batches = self.trainx.reshape(self.n_train_batches, self.batch_size, self.sequence_length), self.trainy.reshape(self.n_train_batches,self.batch_size, 1)
print("\n data loaded succesfully")
# graph building and training function
def build_graph_train(self):
outputs = [0.0]*self.batch_size#tf.placeholder(tf.float32, shape = [1, self.batch_size])
x = self.trainx_batches
ht_prev = tf.reshape(np.float32([0]*(self.hidden_layers_size)), [1, self.hidden_layers_size]) #[tf.placeholder(tf.float32, shape = [1, self.hidden_layers_size], name = 'ht_prev')
ct_prev = tf.reshape(np.float32([0]*(self.hidden_layers_size)), [1, self.hidden_layers_size]) #tf.placeholder(tf.float32, shape = [1, self.hidden_layers_size], name = 'ct_prev')
self.ht_prev = np.array([0.0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size)
self.ct_prev = np.array([0.0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size)
for i1 in range(self.n_train_batches):
for i2 in range(self.batch_size):
#self.ht_prev = [self.ht_prev[i:i+9] for i in range(0, self.hidden_layers_size, 9)]
self.ft = tf.sigmoid( tf.matmul(tf.reshape(x[i1][i2], [1, self.sequence_length]), self.w_fgate) + tf.matmul(ht_prev, self.u_fgate) )
self.it = tf.sigmoid( tf.matmul(tf.reshape(x[i1][i2], [1, self.sequence_length]), self.w_igate) + tf.matmul(ht_prev, self.u_igate) )
self.ot = tf.sigmoid( tf.matmul(tf.reshape(x[i1][i2], [1, self.sequence_length]), self.w_ogate) + tf.matmul(ht_prev, self.u_ogate) )
self._ct = tf.sigmoid( tf.matmul(tf.reshape(x[i1][i2], [1, self.sequence_length]), self.w_cgate) + tf.matmul(ht_prev, self.u_cgate) )
self.ct = tf.tanh(tf.multiply(self.ft, ct_prev) + tf.multiply(self.it, self._ct))
self.ht = tf.multiply(self.ot, self.ct)
ht_prev = self.ht
ct_prev = self.ct
outputs[i2] = tf.nn.relu( tf.matmul(self.ht, self.w_output_layer) )
loss = tf.reduce_mean(tf.square(tf.subtract(outputs, self.trainy_batches[i1])))
self.ht_prev = ht_prev
self.ct_prev = ct_prev
self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(loss)
print("\n Graph built \n\n Now training begins...\n")
#training
i = 0
avg_loss = float(0)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
#sess = tf_debug.LocalCLIDebugWrapperSession(sess)
for ep in range(self.n_epochs + 1):
#ht_prev = np.float32([0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size)
#ct_prev = np.float32([0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size)
#loss.eval( feed_dict= { x: np.float32(self.trainx_batches).reshape(self.n_train_batches, self.batch_size, self.sequence_length) })
sess.run(self.train_op)#, feed_dict= { x: np.float32(self.trainx_batches).reshape(self.n_train_batches, self.batch_size, self.sequence_length) } )#, ht_prev: np.float32([0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size), ct_prev: np.float32([0.0]*(self.hidden_layers_size)).reshape(1, self.hidden_layers_size) })
if ep % 10 == 0:
i += 1
mse = loss.eval()# feed_dict= { x: np.float32(self.trainx_batches).reshape(self.n_train_batches, self.batch_size, self.sequence_length) })
avg_loss = float(avg_loss + mse)
print("\n Epoch: ", ep, "\t Loss: ", mse)
avg_loss = float(avg_loss/i)
self.training_loss = avg_loss
print("\n Training Loss: ", avg_loss)
# Predict function
def predict(self):
print("\n testing begins...")
x_test_row = tf.placeholder(tf.float32, shape = [1, self.sequence_length])
avg_error = float(0)
input_row = []
output_row = 0.0
predictions = []
#ht_prev = tf.placeholder(tf.float32, shape = [1, self.hidden_layers_size]) # ht_prev = tf.varaible(self.ht_prev)
#ct_prev = tf.placeholder(tf.float32, shape = [1, self.hidden_layers_size]) # ct_prev = tf.varaible(self.ct_prev)
# one forward pass
self.ft = tf.sigmoid( tf.matmul(x_test_row, self.w_fgate) + tf.matmul(self.ht_prev, self.u_fgate) )
self.it = tf.sigmoid( tf.matmul(x_test_row, self.w_igate) + tf.matmul(self.ht_prev, self.u_igate ) )
self.ot = tf.sigmoid( tf.matmul(x_test_row, self.w_ogate) + tf.matmul(self.ht_prev, self.u_ogate) )
self._ct = tf.sigmoid( tf.matmul(x_test_row, self.w_cgate) + tf.matmul(self.ht_prev, self.u_cgate) )
self.ct = tf.tanh(tf.multiply(self.ft, self.ct_prev) + tf.multiply(self.it, self._ct))
self.ht = tf.multiply(self.ot,self.ct)
pred_output = tf.nn.relu( tf.matmul(self.ht, self.w_output_layer) )
with tf.Session() as sess:
sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
print("\n loaded the variables")
for i1 in range(self.n_test):
del input_row[:]
output_row = float(self.testy[i1])
for i2 in range(self.sequence_length):
input_row.append(self.testx[i1][i2])
#sess.run(pred_output, feed_dict = { x_test_row: np.array(input_row).reshape(1, self.sequence_length), ht_prev:self.ht_prev, ct_prev: self.ct_prev })
predictions.append([pred_output.eval(feed_dict = { x_test_row: np.float32(input_row).reshape(1, self.sequence_length) }), output_row])
avg_error += abs(predictions[i1][0] - output_row)
avg_error = float(avg_error/i1)
self.testing_loss = avg_error
print("\n testing Error: ", avg_error)
return np.array(predictions)
# save model function
def save_model(self):
print("\n\n model's information saved in model_info.txt and weights stored in model.json\n\n")
f = open("model.json", "w+")
model_dict = { 'w_output_layer': self.w_output_layer, 'w_igate': self.w_igate, 'u_igate': self.u_igate, 'w_fgate': self.w_fgate, 'u_fgate': self.u_fgate, 'w_cgate': self.w_cgate, 'u_cgate': self.u_cgate, 'w_ogate': self.w_ogate, 'u_ogate': self.u_ogate }
f.write(str(model_dict))
f.close()
# main function()
def main():
# parameters of the network
config_params = dict()
config_params["sequence_length"] = 3
config_params["batch_size"] = 33
config_params["hidden_layers_size"] = 9
config_params["data_path"] = "data.csv"
config_params["no_of_epochs"] = 2000
config_params["learning_rate"] = 0.01
# object of class lstm_network
test_object = lstm_network(config_params)
test_object.load_data()
test_object.build_graph_train()
predictions = test_object.predict()
print("\n predictions are: \n", predictions)
test_object.save_model()
# run
main()
for this configuration:
Average testing error i got was: 0.15911798179149628
Average training error i got was: 0.10901389649110053
They look low im guessing because of normalizing the values

Gradients error using TensorArray Tensorflow

i am trying to implement multidimentional lstm in tensorflow, I am using TensorArray to remember previous states, i am using a complicated way to get two neigbours state(above and from left). tf.cond want that both posible condition to exist and to have the same number of inputs. this is why i added one more cell.zero_state to the (last index +1) of the states. then i using a function to get the correct indexes to the states. when i am trying to use an optimizer in order to minimize a cost, i getting that error:
InvalidArgumentError (see above for traceback): TensorArray
MultiDimentionalLSTMCell-l1-multi-l1/state_ta_262#gradients: Could not
read from TensorArray index 809 because it has not yet been written
to.
Can someone tell how to fix it?
Ps: without optimizer it works!
class MultiDimentionalLSTMCell(tf.nn.rnn_cell.RNNCell):
"""
Note that state_is_tuple is always True.
"""
def __init__(self, num_units, forget_bias=1.0, activation=tf.nn.tanh):
self._num_units = num_units
self._forget_bias = forget_bias
self._activation = activation
#property
def state_size(self):
return tf.nn.rnn_cell.LSTMStateTuple(self._num_units, self._num_units)
#property
def output_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
"""Long short-term memory cell (LSTM).
#param: imputs (batch,n)
#param state: the states and hidden unit of the two cells
"""
with tf.variable_scope(scope or type(self).__name__):
c1,c2,h1,h2 = state
# change bias argument to False since LN will add bias via shift
concat = tf.nn.rnn_cell._linear([inputs, h1, h2], 5 * self._num_units, False)
i, j, f1, f2, o = tf.split(1, 5, concat)
new_c = (c1 * tf.nn.sigmoid(f1 + self._forget_bias) +
c2 * tf.nn.sigmoid(f2 + self._forget_bias) + tf.nn.sigmoid(i) *
self._activation(j))
new_h = self._activation(new_c) * tf.nn.sigmoid(o)
new_state = tf.nn.rnn_cell.LSTMStateTuple(new_c, new_h)
return new_h, new_state
def multiDimentionalRNN_whileLoop(rnn_size,input_data,sh,dims=None,scopeN="layer1"):
"""Implements naive multidimentional recurent neural networks
#param rnn_size: the hidden units
#param input_data: the data to process of shape [batch,h,w,chanels]
#param sh: [heigth,width] of the windows
#param dims: dimentions to reverse the input data,eg.
dims=[False,True,True,False] => true means reverse dimention
#param scopeN : the scope
returns [batch,h/sh[0],w/sh[1],chanels*sh[0]*sh[1]] the output of the lstm
"""
with tf.variable_scope("MultiDimentionalLSTMCell-"+scopeN):
cell = MultiDimentionalLSTMCell(rnn_size)
shape = input_data.get_shape().as_list()
if shape[1]%sh[0] != 0:
offset = tf.zeros([shape[0], sh[0]-(shape[1]%sh[0]), shape[2], shape[3]])
input_data = tf.concat(1,[input_data,offset])
shape = input_data.get_shape().as_list()
if shape[2]%sh[1] != 0:
offset = tf.zeros([shape[0], shape[1], sh[1]-(shape[2]%sh[1]), shape[3]])
input_data = tf.concat(2,[input_data,offset])
shape = input_data.get_shape().as_list()
h,w = int(shape[1]/sh[0]),int(shape[2]/sh[1])
features = sh[1]*sh[0]*shape[3]
batch_size = shape[0]
x = tf.reshape(input_data, [batch_size,h,w, features])
if dims is not None:
x = tf.reverse(x, dims)
x = tf.transpose(x, [1,2,0,3])
x = tf.reshape(x, [-1, features])
x = tf.split(0, h*w, x)
sequence_length = tf.ones(shape=(batch_size,), dtype=tf.int32)*shape[0]
inputs_ta = tf.TensorArray(dtype=tf.float32, size=h*w,name='input_ta')
inputs_ta = inputs_ta.unpack(x)
states_ta = tf.TensorArray(dtype=tf.float32, size=h*w+1,name='state_ta',clear_after_read=False)
outputs_ta = tf.TensorArray(dtype=tf.float32, size=h*w,name='output_ta')
states_ta = states_ta.write(h*w, tf.nn.rnn_cell.LSTMStateTuple(tf.zeros([batch_size,rnn_size], tf.float32),
tf.zeros([batch_size,rnn_size], tf.float32)))
def getindex1(t,w):
return tf.cond(tf.less_equal(tf.constant(w),t),
lambda:t-tf.constant(w),
lambda:tf.constant(h*w))
def getindex2(t,w):
return tf.cond(tf.less(tf.constant(0),tf.mod(t,tf.constant(w))),
lambda:t-tf.constant(1),
lambda:tf.constant(h*w))
time = tf.constant(0)
def body(time, outputs_ta, states_ta):
constant_val = tf.constant(0)
stateUp = tf.cond(tf.less_equal(tf.constant(w),time),
lambda: states_ta.read(getindex1(time,w)),
lambda: states_ta.read(h*w))
stateLast = tf.cond(tf.less(constant_val,tf.mod(time,tf.constant(w))),
lambda: states_ta.read(getindex2(time,w)),
lambda: states_ta.read(h*w))
currentState = stateUp[0],stateLast[0],stateUp[1],stateLast[1]
out , state = cell(inputs_ta.read(time),currentState)
outputs_ta = outputs_ta.write(time,out)
states_ta = states_ta.write(time,state)
return time + 1, outputs_ta, states_ta
def condition(time,outputs_ta,states_ta):
return tf.less(time , tf.constant(h*w))
result , outputs_ta, states_ta = tf.while_loop(condition, body, [time,outputs_ta,states_ta])
outputs = outputs_ta.pack()
states = states_ta.pack()
y = tf.reshape(outputs, [h,w,batch_size,rnn_size])
y = tf.transpose(y, [2,0,1,3])
if dims is not None:
y = tf.reverse(y, dims)
return y
def tanAndSum(rnn_size,input_data,scope):
outs = []
for i in range(2):
for j in range(2):
dims = [False]*4
if i!=0:
dims[1] = True
if j!=0:
dims[2] = True
outputs = multiDimentionalRNN_whileLoop(rnn_size,input_data,[2,2],
dims,scope+"-multi-l{0}".format(i*2+j))
outs.append(outputs)
outs = tf.pack(outs, axis=0)
mean = tf.reduce_mean(outs, 0)
return tf.nn.tanh(mean)
graph = tf.Graph()
with graph.as_default():
input_data = tf.placeholder(tf.float32, [20,36,90,1])
#input_data = tf.ones([20,36,90,1],dtype=tf.float32)
sh = [2,2]
out1 = tanAndSum(20,input_data,'l1')
out = tanAndSum(25,out1,'l2')
cost = tf.reduce_mean(out)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
#out = multiDimentionalRNN_raw_rnn(2,input_data,sh,dims=[False,True,True,False],scopeN="layer1")
#cell = MultiDimentionalLSTMCell(10)
#out = cell.zero_state(2, tf.float32).c
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
ou,k,_ = session.run([out,cost,optimizer],{input_data:np.ones([20,36,90,1],dtype=np.float32)})
print(ou.shape)
print(k)
You should add parameter parallel_iterations=1 to your while loop call.
Such as:
result, outputs_ta, states_ta = tf.while_loop(
condition, body, [time,outputs_ta,states_ta], parallel_iterations=1)
This is required because inside body you perform read and write operations on the same tensor array (states_ta). And in case of parallel loop execution(parallel_iterations > 1) some thread may try to read info from tensorArray, that was not written to it by another one.
I've test your code snippet with parallel_iterations=1 on tensorflow 0.12.1 and it works as expected.

The node 'Merge/MergeSummary' has inputs from different frames: what does it mean?

trying to merge all my summaries, I have an error saying that the inputs of Merge/MergeSummary comes from different frames. So, first of all: what is a frame? Could you please point me somewhere in the TF documentation about such stuff? -- of course, I googled a bit but could find almost nothing. How can I fix this issue? Below the code to reproduce the error. Thanks in advance.
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
tf.set_random_seed(23)
BATCH = 2
LENGTH = 4
SIZE = 5
ATT_SIZE = 3
NUM_QUERIES = 2
def linear(inputs, output_size, use_bias=True, activation_fn=None):
"""Linear projection."""
input_shape = inputs.get_shape().as_list()
input_size = input_shape[-1]
output_shape = input_shape[:-1] + [output_size]
if len(output_shape) > 2:
output_shape_tensor = tf.unstack(tf.shape(inputs))
output_shape_tensor[-1] = output_size
output_shape_tensor = tf.stack(output_shape_tensor)
inputs = tf.reshape(inputs, [-1, input_size])
kernel = tf.get_variable("kernel", [input_size, output_size])
output = tf.matmul(inputs, kernel)
if use_bias:
output = output + tf.get_variable('bias', [output_size])
if len(output_shape) > 2:
output = tf.reshape(output, output_shape_tensor)
output.set_shape(output_shape) # pylint: disable=I0011,E1101
if activation_fn is not None:
return activation_fn(output)
return output
class Attention(object):
"""Attention mechanism implementation."""
def __init__(self, attention_states, attention_size):
"""Initializes a new instance of the Attention class."""
self._states = attention_states
self._attention_size = attention_size
self._batch = tf.shape(self._states)[0]
self._length = tf.shape(self._states)[1]
self._size = self._states.get_shape()[2].value
self._features = None
def _init_features(self):
states = tf.reshape(
self._states, [self._batch, self._length, 1, self._size])
weights = tf.get_variable(
"kernel", [1, 1, self._size, self._attention_size])
self._features = tf.nn.conv2d(states, weights, [1, 1, 1, 1], "SAME")
def get_weights(self, query, scope=None):
"""Reurns the attention weights for the given query."""
with tf.variable_scope(scope or "Attention"):
if self._features is None:
self._init_features()
else:
tf.get_variable_scope().reuse_variables()
vect = tf.get_variable("Vector", [self._attention_size])
with tf.variable_scope("Query"):
query_features = linear(query, self._attention_size, False)
query_features = tf.reshape(
query_features, [-1, 1, 1, self._attention_size])
activations = vect * tf.tanh(self._features + query_features)
activations = tf.reduce_sum(activations, [2, 3])
with tf.name_scope('summaries'):
tf.summary.histogram('histogram', activations)
return tf.nn.softmax(activations)
states = tf.placeholder(tf.float32, shape=[BATCH, None, SIZE]) # unknown length
queries = tf.placeholder(tf.float32, shape=[NUM_QUERIES, BATCH, ATT_SIZE])
attention = Attention(states, ATT_SIZE)
func = lambda x: attention.get_weights(x, "Softmax")
weights = tf.map_fn(func, queries)
for var in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
name = var.name.replace(':', '_')
tf.summary.histogram(name, var)
summary_op = tf.summary.merge_all()
states_np = np.random.rand(BATCH, LENGTH, SIZE)
queries_np = np.random.rand(NUM_QUERIES, BATCH, ATT_SIZE)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
weights_np, summary_str = sess.run([weights, summary_op], {states: states_np, queries: queries_np})
print weights_np
The error message was indeed not user friendly. It has been updated to
ValueError: Cannot use 'map/while/summaries/histogram' as input to 'Merge/MergeSummary' because 'map/while/summaries/histogram' is in a while loop. See info log for more details.
As the new message says, the problem is that you cannot produce summaries from inside of the while loop. The frame that the original message referred to is the "execution frame" of the while loop - all the state for each iteration of the while loop is kept in a frame.
In this case, the while_loop is created by tf.map_fn and the summary inside it is tf.summary.histogram('histogram', activations).
There are a couple of ways to deal with this. You can take the summary out of the get_weights, have the get_weights return activations as well, create the summary using the newly returned activations from tf.map_fn call.
Another approach, if NUM_QUERIES is constant and small, can be to statically unroll the loop instead of using tf.map_fn. Here is the code to do this:
# TOP PART OF THE CODE IS THE SAME
states = tf.placeholder(tf.float32, shape=[BATCH, None, SIZE]) # unknown length
queries = tf.placeholder(tf.float32, shape=[NUM_QUERIES, BATCH, ATT_SIZE])
attention = Attention(states, ATT_SIZE)
func = lambda x: attention.get_weights(x, "Softmax")
# NEW CODE BEGIN
split_queries = tf.split(queries, NUM_QUERIES)
weights = []
for query in split_queries:
weights.append(func(query))
# NEW CODE END
for var in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
name = var.name.replace(':', '_')
tf.summary.histogram(name, var)
summary_op = tf.summary.merge_all()
states_np = np.random.rand(BATCH, LENGTH, SIZE)
queries_np = np.random.rand(NUM_QUERIES, BATCH, ATT_SIZE)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# NEW CODE BEGIN
results = sess.run(weights + [summary_op], {states: states_np, queries: queries_np})
weights_np, summary_str = results[:-1], results[-1]
# NEW CODE END
print weights_np

Categories