Normalizing the input of the ReLU function causes a runtime error - Python

When I try to pass the maximum activation value from the previous layer to normalize the input of the ReLU in the next layer, I encounter the runtime error below. However, when I pass a fixed value instead, it works without any error.
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved
tensors after they have already been freed). Saved intermediate values of the graph are freed
when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to
backward through the graph a second time or if you need to access saved tensors after calling
backward.
As you can see in the code below, I pass the argument prev_layer_max from the previous layer and encounter the error:
class th_norm_ReLU(nn.Module):
    def __init__(self, modify):
        super(th_norm_ReLU, self).__init__()
        self.therelu = F.relu

    def forward(self, input, prev_layer_max):
        output = input * (prev_layer_max / input.max())
        norm_output = self.therelu(output)
        return norm_output
But if I use a fixed value instead of the passed prev_layer_max argument, as in the code below where I make it equal to 1, it works normally without any error:
def forward(self, input, prev_layer_max=1):
    output = input * (1 / input.max())
    norm_output = self.therelu(output)
The training loop is as below:
for epoch in range(params.epochs):
    running_loss = 0
    start_time = time.time()
    for i, (images, labels) in enumerate(train_loader):
        model.train()
        model.zero_grad()
        optimizer.zero_grad()
        labels.to(device)
        images = images.float().to(device)
        outputs = model(images, epoch)
        loss = criterion(outputs.cpu(), labels)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
Here is the forward in the model, where I record the max of each layer in a list (thresh_list):
def forward(self, input, epoch):
    x = self.conv1(input)
    x = self.relu(x, 1)
    self.thresh_list[0] = max(self.thresh_list[0], x.max())  # to get the max activation
    x = self.conv_dropout(x)
    x = self.conv2(x)
    x = self.relu(x, self.thresh_list[0])
    self.thresh_list[1] = max(self.thresh_list[1], x.max())
    x = self.pool1(x)
    x = self.conv_dropout(x)
    x = self.conv3(x)
    x = self.relu(x, self.thresh_list[1])
    self.thresh_list[2] = max(self.thresh_list[2], x.max())
The ReLU function I call is:
self.relu = th_norm_ReLU(True)
and the th_norm_ReLU module is shown above.
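A likely explanation (an assumption from reading the code, not a confirmed diagnosis): x.max() returns a tensor that is still attached to the autograd graph, so storing it in thresh_list and feeding it into the next iteration's forward pass links each new graph to the previous, already-freed one. A minimal sketch of the failure, and of a detach fix, using a stand-in linear layer:

import torch
import torch.nn as nn

lin = nn.Linear(4, 4)
thresh = torch.tensor(0.0)

for step in range(2):
    x = lin(torch.randn(1, 4))
    x = x * (thresh / x.max())           # thresh still references the previous iteration's graph
    thresh = torch.max(thresh, x.max())  # attached to this iteration's graph
    x.sum().backward()                   # step 2: "Trying to backward through the graph a second time"

# Storing a detached copy breaks the cross-iteration link:
#     thresh = torch.max(thresh, x.max().detach())
# or keep a plain Python number with x.max().item()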

Related

PyTorch: getting accuracy of train_loop doesn't work

I want to get the accuracy of the train section of my neural network, but I get this error:
correct += (prediction.argmax(1) == y).type(torch.float).item()
ValueError: only one element tensors can be converted to Python scalars
With this code:
def train_loop(dataloader, model, optimizer):
    model.train()
    size = len(dataloader.dataset)
    correct = 0
    l_loss = 0
    for batch, (X, y) in enumerate(dataloader):
        prediction = model(X)
        loss = cross_entropy(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        correct += (prediction.argmax(1) == y).type(torch.float).sum().item()
        loss, current = loss.item(), batch * len(X)
        l_loss = loss
        print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
    correct /= size
    accu = 100 * correct
    train_loss.append(l_loss)
    train_accu.append(accu)
    print(f"Accuracy: {accu:>0.1f}%")
I don't understand why it is not working, because in my test section it works perfectly fine with exactly the same line of code.
The item() function is used to convert a one-element tensor to a standard Python number, as stated here. Please make sure that the result of sum() is a one-element tensor before using item().
x = torch.tensor([1.0, 2.0])  # a tensor containing 2 elements
x.item()
Error message: ValueError: only one element tensors can be converted to Python scalars
Try to use this:
prediction = prediction.argmax(1)
correct = prediction.eq(y)
correct = correct.sum()
print(correct) # to check if it is a one value tensor
correct_sum += correct.item()
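For context, here is a minimal sketch (with made-up values) of why the sum() matters: the elementwise comparison yields one boolean per sample, so calling item() on it directly fails for any batch larger than one.

import torch

prediction = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])  # batch of 3, 2 classes
y = torch.tensor([0, 1, 1])

matches = prediction.argmax(1) == y  # tensor([True, True, False]), one bool per sample
# matches.item()                     # raises the ValueError above: 3 elements, not 1
print(matches.sum().item())          # 2, a one-element tensor converts fine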

What is the correct way to update an input variable during training?

I have an input
inp = torch.tensor([1.0])
and a neural network
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Normal

class Model_updater(nn.Module):
    def __init__(self):
        super(Model_updater, self).__init__()
        self.fc1 = nn.Linear(1, 2)
        self.fc2 = nn.Linear(2, 3)
        self.fc3 = nn.Linear(3, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net_updater = Model_updater()
opt_updater = optim.Adam(net_updater.parameters())
I'm trying to update my input using the neural network's output:
inp = torch.tensor([1.0])
epochs = 3

for i in range(epochs):
    opt_updater.zero_grad()
    inp_copy = inp.detach().clone()
    mu, sigma = net_updater(inp_copy)
    dist1 = Normal(mu, torch.abs(sigma))
    a = dist1.rsample()
    inp += a
    loss = torch.tensor(5.0) - inp
    loss.backward(retain_graph=True)
    opt_updater.step()
But I am getting the error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3, 2]], which is output 0 of TBackward, is at version 2; expected version 1
I also tried changing the loss calculation to
loss = torch.tensor(5.0) - inp_copy
But got the error
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I also tried without the retain_graph=True but I get
RuntimeError: Trying to backward through the graph a second time,
but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.
which doesn't really make sense to me, because I don't see where I'm calling backward() twice.
Most likely, this is what you want:
inp1 = inp + a # create a separate variable for updated value
inp.data = inp1.data # update the value without touching the graph
loss = torch.tensor(5.0) - inp1 # use updated value which has gradient
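Putting those three lines back into the question's loop, a minimal sketch of a full iteration (reusing the Model_updater, optimizer, and Normal distribution defined above) could look like this; each iteration now builds a fresh graph, so retain_graph=True is no longer needed:

for i in range(epochs):
    opt_updater.zero_grad()
    mu, sigma = net_updater(inp.detach().clone())
    dist1 = Normal(mu, torch.abs(sigma))
    a = dist1.rsample()

    inp1 = inp + a                   # separate variable for the updated value
    inp.data = inp1.data             # advance inp without touching the graph
    loss = torch.tensor(5.0) - inp1  # loss uses the value that has a grad_fn

    loss.backward()                  # no retain_graph=True required
    opt_updater.step()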

AttributeError: 'tuple' object has no attribute 'size'

UPDATE: after looking back on this question, most of the code was unnecessary. In summary, the hidden layer of a PyTorch RNN needs to be a torch tensor. When I posted the question, the hidden layer was a tuple.
Below is my data loader.
from torch.utils.data import TensorDataset, DataLoader

def batch_data(log_returns, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param log_returns: asset's daily log returns
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # total number of batches we can make
    n_batches = len(log_returns) // batch_size

    # keep only enough observations to make full batches
    log_returns = log_returns[:n_batches * batch_size]

    y_len = len(log_returns) - sequence_length

    x, y = [], []
    for idx in range(0, y_len):
        idx_end = sequence_length + idx
        x_batch = log_returns[idx:idx_end]
        x.append(x_batch)
        # only making predictions after the last observation in the window
        batch_y = log_returns[idx_end]
        y.append(batch_y)

    # create tensor datasets
    x_tensor = torch.from_numpy(np.asarray(x))
    y_tensor = torch.from_numpy(np.asarray(y))

    # make x_tensor 3-d instead of 2-d
    x_tensor = x_tensor.unsqueeze(-1)

    data = TensorDataset(x_tensor, y_tensor)
    data_loader = DataLoader(data, shuffle=False, batch_size=batch_size)

    # return a dataloader
    return data_loader
def init_hidden(self, batch_size):
    ''' Initializes hidden state '''
    # Create two new tensors with sizes n_layers x batch_size x n_hidden,
    # initialized to zero, for the hidden state and cell state of the LSTM
    weight = next(self.parameters()).data

    if train_on_gpu:
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
    else:
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_())

    return hidden
I don't know what is wrong. When I try to start training the model, I am getting the error message:
AttributeError: 'tuple' object has no attribute 'size'
The issue comes from the fact that hidden (in the forward definition) isn't a torch.Tensor. Therefore, r_output, hidden = self.gru(nn_input, hidden) raises a rather confusing error without specifying exactly what's wrong with the arguments, although you can see it's raised inside an nn.RNN function named check_hidden_size()...
I was confused at first, thinking that the second argument of nn.RNN, h0, was a tuple containing (hidden_state, cell_state). The same can be said of the second element returned by that call, hn. That's not the case: h0 and hn are both torch.Tensors. Interestingly enough though, you are able to unpack stacked tensors:
>>> z = torch.stack([torch.Tensor([1,2,3]), torch.Tensor([4,5,6])])
>>> a, b = z
>>> a, b
(tensor([1., 2., 3.]), tensor([4., 5., 6.]))
You are supposed to provide a tensor as the second argument of an nn.GRU __call__.
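A minimal sketch (with made-up sizes) of that contract: nn.GRU accepts a tensor h0 but chokes on an LSTM-style tuple, which is exactly the AttributeError above.

import torch
import torch.nn as nn

gru = nn.GRU(input_size=1, hidden_size=4, num_layers=2, batch_first=True)
x = torch.randn(8, 5, 1)   # batch of 8 sequences of length 5

h0 = torch.zeros(2, 8, 4)  # (num_layers, batch, hidden_size)
out, hn = gru(x, h0)       # fine: h0 is a tensor

h0_tuple = (torch.zeros(2, 8, 4), torch.zeros(2, 8, 4))  # LSTM-style (h, c)
# out, hn = gru(x, h0_tuple)  # AttributeError: 'tuple' object has no attribute 'size'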
Edit - after further inspection of your code, I found out that you are converting hidden back into a tuple... In cell [14] you have hidden = tuple([each.data for each in hidden]), which basically overwrites the modification you made in init_hidden with torch.stack.
Take a step back and look at the source code for RNNBase, the base class for RNN modules. If the hidden state is not given to forward, it will default to:
if hx is None:
    num_directions = 2 if self.bidirectional else 1
    hx = torch.zeros(self.num_layers * num_directions,
                     max_batch_size, self.hidden_size,
                     dtype=input.dtype, device=input.device)
This is essentially the same init as the one you are trying to implement. Granted, you only want to reset the hidden state on every epoch (I don't see why...). Anyhow, a basic alternative would be to set hidden to None at the start of an epoch, pass it as-is to forward_back_prop and then on to self.rnn, which will in turn default-initialize it for you, and then overwrite hidden with the hidden state returned by that RNN forward call.
To summarize, I've only kept the relevant parts of the code. Remove the init_hidden function from AssetGRU and make these modifications:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    ...
    if hidden is not None:
        hidden = hidden.detach()
    ...
    output, hidden = rnn(inp, hidden)
    ...
    return loss.item(), hidden

def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches):
    ...
    for epoch_i in range(1, n_epochs + 1):
        hidden = None
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            loss, hidden = forward_back_prop(rnn, optimizer, criterion,
                                             inputs, labels, hidden)
            ...
    ...
There should be [] brackets instead of () around 0.
def forward(self, nn_input, hidden):
    ''' Forward pass through the network.
    These inputs are x, and the hidden/cell state `hidden`. '''
    # batch_size equals the input's first dimension
    batch_size = nn_input.size(0)

How to fix MatMul Op has type float64 that does not match type float32 TypeError?

I am trying to save neural network weights into a file and then restore those weights by initializing the network with them instead of using random initialization. My code works fine with random initialization. But when I initialize the weights from a file, it shows me the error TypeError: Input 'b' of 'MatMul' Op has type float64 that does not match type float32 of argument 'a'. I don't know how to solve this issue. Here is my code:
Model Initialization
# Parameters
training_epochs = 5
batch_size = 64
display_step = 5
batch = tf.Variable(0, trainable=False)
regualarization = 0.008
# Network Parameters
n_hidden_1 = 300 # 1st layer num features
n_hidden_2 = 250 # 2nd layer num features
n_input = model.layer1_size # Vector input (sentence shape: 30*10)
n_classes = 12 # Sentence Category detection total classes (0-11 categories)
#History storing variables for plots
loss_history = []
train_acc_history = []
val_acc_history = []
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
Model parameters
# loading weights
def weight_variable(fan_in, fan_out, filename):
    stddev = np.sqrt(2.0 / fan_in)
    if filename == "":
        initial = tf.random_normal([fan_in, fan_out], stddev=stddev)
    else:
        initial = np.loadtxt(filename)
        print initial.shape
    return tf.Variable(initial)

# loading biases
def bias_variable(shape, filename):
    if filename == "":
        initial = tf.constant(0.1, shape=shape)
    else:
        initial = np.loadtxt(filename)
        print initial.shape
    return tf.Variable(initial)
# Create model
def multilayer_perceptron(_X, _weights, _biases):
    layer_1 = tf.nn.relu(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    return tf.matmul(layer_2, _weights['out']) + _biases['out']

# Store layers' weights & biases
weights = {
    'h1': w2v_utils.weight_variable(n_input, n_hidden_1, filename="weights_h1.txt"),
    'h2': w2v_utils.weight_variable(n_hidden_1, n_hidden_2, filename="weights_h2.txt"),
    'out': w2v_utils.weight_variable(n_hidden_2, n_classes, filename="weights_out.txt")
}
biases = {
    'b1': w2v_utils.bias_variable([n_hidden_1], filename="biases_b1.txt"),
    'b2': w2v_utils.bias_variable([n_hidden_2], filename="biases_b2.txt"),
    'out': w2v_utils.bias_variable([n_classes], filename="biases_out.txt")
}
# Define loss and optimizer
# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
learning_rate = tf.train.exponential_decay(
    0.02 * 0.01,         # Base learning rate. #0.002
    batch * batch_size,  # Current index into the dataset.
    X_train.shape[0],    # Decay step.
    0.96,                # Decay rate.
    staircase=True)

# Construct model
pred = tf.nn.relu(multilayer_perceptron(x, weights, biases))

# L2 regularization
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])

# Softmax loss
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))

# Total cost
cost = cost + (regualarization * 0.5 * l2_loss)

# Adam optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, global_step=batch)

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Initializing the variables
init = tf.initialize_all_variables()

print "Network Initialized!"
The tf.matmul() op does not perform automatic type conversions, so both of its inputs must have the same element type. The error message you are seeing indicates that you have a call to tf.matmul() where the first argument has type tf.float32, and the second argument has type tf.float64. You must convert one of the inputs to match the other, for example using tf.cast(x, tf.float32).
Looking at your code, I don't see anywhere that a tf.float64 tensor is explicitly created (the default dtype for floating-point values in the TensorFlow Python API—e.g. for tf.constant(37.0)—is tf.float32). I would guess that the errors are caused by the np.loadtxt(filename) calls, which might be loading an np.float64 array. You can explicitly change them to load np.float32 arrays (which are converted to tf.float32 tensors) as follows:
initial = np.loadtxt(filename).astype(np.float32)
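As a quick illustration of the rule (a hedged sketch with made-up shapes, mirroring the question's TF 1.x-style code): NumPy defaults to float64, which is exactly how a loaded array can poison a float32 graph, and tf.cast fixes it.

import numpy as np
import tensorflow as tf

a = tf.constant(np.ones((2, 3), dtype=np.float32))  # tf.float32
b = tf.constant(np.ones((3, 4)))                    # NumPy default dtype is float64

# tf.matmul(a, b)                        # fails with the dtype-mismatch error quoted above
c = tf.matmul(a, tf.cast(b, tf.float32)) # cast one operand to match the other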
Although it's an old question, I would like to add that I came across the same problem. I resolved it by using dtype=tf.float64 for parameter initialization and for creating the X and Y placeholders as well.
Here is a snippet of my code:
X = tf.placeholder(shape=[n_x, None], dtype=tf.float64)
Y = tf.placeholder(shape=[n_y, None], dtype=tf.float64)
and
parameters['W' + str(l)] = tf.get_variable('W' + str(l), [layers_dims[l], layers_dims[l-1]],
                                           dtype=tf.float64,
                                           initializer=tf.contrib.layers.xavier_initializer(seed=1))
parameters['b' + str(l)] = tf.get_variable('b' + str(l), [layers_dims[l], 1],
                                           dtype=tf.float64,
                                           initializer=tf.zeros_initializer())
Declaring all placeholders and parameters with the float64 datatype will resolve this issue.
For TensorFlow 2
You can cast one of the tensors, like this for example:
_X = tf.cast(_X, dtype='float64')
You can get rid of this error by setting all layers to have a default dtype of float64:
tf.keras.backend.set_floatx('float64')
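A small sketch of that TF2 route (assuming an eager TF 2.x environment):

import tensorflow as tf

tf.keras.backend.set_floatx('float64')         # new Keras layers now default to float64

layer = tf.keras.layers.Dense(3)
out = layer(tf.ones((2, 4), dtype='float64'))  # weights and input agree on float64
print(out.dtype)                               # float64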

TensorFlow error: TensorShape() must have the same rank

def compileActivation(self, net, layerNum):
    variable = net.x if layerNum == 0 else net.varArrayA[layerNum - 1]
    # print tf.expand_dims(net.dropOutVectors[layerNum], 1)
    # print net.varWeights[layerNum]['w'].get_shape().as_list()
    z = (tf.matmul(net.varWeights[layerNum]['w'],
                   variable * (tf.expand_dims(net.dropOutVectors[layerNum], 1) if self.dropout else 1.0))
         + tf.expand_dims(net.varWeights[layerNum]['b'], 1))
    a = self.activation(z, self.pool_size)
    net.varArrayA.append(a)
I am running an activation function which computes z and passes it into a sigmoid activation.
When I try to execute the above function, I get the following error:
ValueError: Shapes TensorShape([Dimension(-2)]) and TensorShape([Dimension(None), Dimension(None)]) must have the same rank
The Theano equivalent for computing z works just fine:
z = T.dot(net.varWeights[layerNum]['w'], variable * (net.dropOutVectors[layerNum].dimshuffle(0, 'x') if self.dropout else 1.0)) + net.varWeights[layerNum]['b'].dimshuffle(0, 'x')
Mihir,
When I encountered this problem, it was because my placeholders were the wrong size in my feed dictionary. You should also know how to run the graph in a session: tf.Session.run(fetches, feed_dict=None).
Here's my code to make the placeholders
# Note this place holder is for the input data feed-dict definition
input_placeholder = tf.placeholder(tf.float32, shape=(batch_size, FLAGS.InputLayer))
# Not sure yet what this will be used for.
desired_output_placeholder = tf.placeholder(tf.float32, shape=(batch_size, FLAGS.OutputLayer))
Here's my function to fill the feed dictionary:

def fill_feed_dict(data_sets_train, input_pl, output_pl):
    ti_feed, dto_feed = data_sets_train.next_batch(FLAGS.batch_size)
    feed_dict = {
        input_pl: ti_feed,
        output_pl: dto_feed
    }
    return feed_dict
Later I do this:
# Fill a feed dictionary with the actual set of images and labels
# for this particular training step.
feed_dict = fill_feed_dict(data_sets.train, input_placeholder, desired_output_placeholder)
Then, to run the session and fetch the outputs, I have this line:
_, l = sess.run([train_op, loss], feed_dict=feed_dict)
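For what it's worth, a minimal sketch (made-up shapes) of the rank rule that the question's z computation is juggling: tf.matmul needs rank-2 arguments, and a rank-1 bias won't broadcast across the batch dimension without tf.expand_dims, which mirrors what dimshuffle(0, 'x') does in Theano.

import tensorflow as tf

w = tf.placeholder(tf.float32, shape=(4, 3))  # weights: (out_dim, in_dim)
v = tf.placeholder(tf.float32, shape=(3, 5))  # activations: (in_dim, batch)
b = tf.placeholder(tf.float32, shape=(4,))    # bias: rank-1

# z = tf.matmul(w, v) + b                     # shape error: (4, 5) + (4,) does not broadcast
z = tf.matmul(w, v) + tf.expand_dims(b, 1)    # (4, 1) broadcasts across the batch axis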
