AttributeError: 'tuple' object has no attribute 'size' - python

UPDATE: after looking back on this question, most of the code was unnecessary. In summary, the hidden layer of a Pytorch RNN needs to be a torch tensor. When I posted the question, the hidden layer was a tuple.
Below is my data loader.
from torch.utils.data import TensorDataset, DataLoader
def batch_data(log_returns, sequence_length, batch_size):
"""
Batch the neural network data using DataLoader
:param log_returns: asset's daily log returns
:param sequence_length: The sequence length of each batch
:param batch_size: The size of each batch; the number of sequences in a batch
:return: DataLoader with batched data
"""
# total number of batches we can make
n_batches = len(log_returns)//batch_size
# Keep only enough characters to make full batches
log_returns = log_returns[:n_batches * batch_size]
y_len = len(log_returns) - sequence_length
x, y = [], []
for idx in range(0, y_len):
idx_end = sequence_length + idx
x_batch = log_returns[idx:idx_end]
x.append(x_batch)
# only making predictions after the last word in the batch
batch_y = log_returns[idx_end]
y.append(batch_y)
# create tensor datasets
x_tensor = torch.from_numpy(np.asarray(x))
y_tensor = torch.from_numpy(np.asarray(y))
# make x_tensor 3-d instead of 2-d
x_tensor = x_tensor.unsqueeze(-1)
data = TensorDataset(x_tensor, y_tensor)
data_loader = DataLoader(data, shuffle=False, batch_size=batch_size)
# return a dataloader
return data_loader
def init_hidden(self, batch_size):
''' Initializes hidden state '''
# Create two new tensors with sizes n_layers x batch_size x n_hidden,
# initialized to zero, for hidden state and cell state of LSTM
weight = next(self.parameters()).data
if (train_on_gpu):
hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
else:
hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
return hidden
I don't know what is wrong. When I try to start training the model, I am getting the error message:
AttributeError: 'tuple' object has no attribute 'size'

The issue comes from the fact that hidden (in the forward definition) isn't a Torch.Tensor. Therefore, r_output, hidden = self.gru(nn_input, hidden) raises a rather confusing error without specifying exaclty what's wrong in the arguments. Altough you can see it's raised inside a nn.RNN function named check_hidden_size()...
I was confused at first, thinking that the second argument of nn.RNN: h0 was a tuple containing (hidden_state, cell_state). Same can be said ofthe second element returned by that call: hn. That's not the case h0 and hn are both Torch.Tensors. Interestingly enough though, you are able to unpack stacked tensors:
>>> z = torch.stack([torch.Tensor([1,2,3]), torch.Tensor([4,5,6])])
>>> a, b = z
>>> a, b
(tensor([1., 2., 3.]), tensor([4., 5., 6.]))
You are supposed to provide a tensor as the second argument of a nn.GRU __call__.
Edit - After further inspection of your code I found out that you are converting hidden back again to a tuple... In cell [14] you have hidden = tuple([each.data for each in hidden]). Which basically overwrites the modification you did in init_hidden with torch.stack.
Take a step back and look at the source code for RNNBase the base class for RNN modules. If the hidden state is not given to the forward it will default to:
if hx is None:
num_directions = 2 if self.bidirectional else 1
hx = torch.zeros(self.num_layers * num_directions,
max_batch_size, self.hidden_size,
dtype=input.dtype, device=input.device)
This is essentially the exact init as the one you are trying to implement. Granted you only want to reset the hidden states on every epoch, (I don't see why...). Anyhow, a basic alternative would be to set hidden to None at the start of an epoch, passed as it is to self.forward_back_prop then to rnn, then to self.rnn which will in turn default initialize it for you. Then overwrite hidden with the hidden state returned by that RNN forward call.
To summarize, I've only kept the relevant parts of the code. Remove the init_hidden function from AssetGRU and make those modifications:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
...
if hidden is not None:
hidden = hidden.detach()
...
output, hidden = rnn(inp, hidden)
...
return loss.item(), hidden
def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches):
...
for epoch_i in range(1, n_epochs + 1):
hidden = None
for batch_i, (inputs, labels) in enumerate(train_loader, 1):
loss, hidden = forward_back_prop(rnn, optimizer, criterion,
inputs, labels, hidden)
...
...

There should be [] brackets instead of () around 0.
def forward(self, nn_input, hidden):
''' Forward pass through the network.
These inputs are x, and the hidden/cell state `hidden`. '''
# batch_size equals the input's first dimension
batch_size = nn_input.size(0)

Related

How to fix input and parameter tensors are not at the same device?

I have seen other people have this error and I try to follow the steps to resolve, but continue to receive this error. "RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0"
I run both model.to(device) and input_seq.to(device). Error says it found an input tensor on CPU, but all input data should be on GPU with input_seq.to(device). Below is fill code
text = ['hey how are you','good i am fine','have a nice day']
# Join all the sentences together and extract the unique characters from the combined sentences
chars = set(''.join(text))
# Creating a dictionary that maps integers to the characters
int2char = dict(enumerate(chars))
# Creating another dictionary that maps characters to integers
char2int = {char: ind for ind, char in int2char.items()}
# Finding the length of the longest string in our data
maxlen = len(max(text, key=len))
# Padding
# A simple loop that loops through the list of sentences and adds a ' ' whitespace until the length of
# the sentence matches the length of the longest sentence
for i in range(len(text)):
while len(text[i])<maxlen:
text[i] += ' '
# Creating lists that will hold our input and target sequences
input_seq = []
target_seq = []
for i in range(len(text)):
# Remove last character for input sequence
input_seq.append(text[i][:-1])
# Remove first character for target sequence
target_seq.append(text[i][1:])
print("Input Sequence: {}\nTarget Sequence: {}".format(input_seq[i], target_seq[i]))
for i in range(len(text)):
input_seq[i] = [char2int[character] for character in input_seq[i]]
target_seq[i] = [char2int[character] for character in target_seq[i]]
dict_size = len(char2int)
seq_len = maxlen - 1
batch_size = len(text)
def one_hot_encode(sequence, dict_size, seq_len, batch_size):
# Creating a multi-dimensional array of zeros with the desired output shape
features = np.zeros((batch_size, seq_len, dict_size), dtype=np.float32)
# Replacing the 0 at the relevant character index with a 1 to represent that character
for i in range(batch_size):
for u in range(seq_len):
features[i, u, sequence[i][u]] = 1
return features
# Input shape --> (Batch Size, Sequence Length, One-Hot Encoding Size)
input_seq = one_hot_encode(input_seq, dict_size, seq_len, batch_size)
input_seq = torch.from_numpy(input_seq)
target_seq = torch.Tensor(target_seq)
# torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
is_cuda = torch.cuda.is_available()
# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
device = torch.device("cuda")
print("GPU is available")
else:
device = torch.device("cpu")
print("GPU not available, CPU used")
class Model(nn.Module):
def __init__(self, input_size, output_size, hidden_dim, n_layers):
super(Model, self).__init__()
# Defining some parameters
self.hidden_dim = hidden_dim
self.n_layers = n_layers
#Defining the layers
# RNN Layer
self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
# Fully connected layer
self.fc = nn.Linear(hidden_dim, output_size)
def forward(self, x):
batch_size = x.size(0)
# Initializing hidden state for first input using method defined below
hidden = self.init_hidden(batch_size)
# Passing in the input and hidden state into the model and obtaining outputs
out, hidden = self.rnn(x, hidden)
# Reshaping the outputs such that it can be fit into the fully connected layer
out = out.contiguous().view(-1, self.hidden_dim)
out = self.fc(out)
return out, hidden
def init_hidden(self, batch_size):
# This method generates the first hidden state of zeros which we'll use in the forward pass
# We'll send the tensor holding the hidden state to the device we specified earlier as well
hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
return hidden
# Instantiate the model with hyperparameters
model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=12, n_layers=1)
# We'll also set the model to the device that we defined earlier (default is CPU)
model.to(device)
# Define hyperparameters
n_epochs = 100
lr=0.01
# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# Training Run
for epoch in range(1, n_epochs + 1):
optimizer.zero_grad() # Clears existing gradients from previous epoch
input_seq.to(device)
target_seq.to(device)
output, hidden = model(input_seq)
loss = criterion(output, target_seq.view(-1).long())
loss.backward() # Does backpropagation and calculates gradients
optimizer.step() # Updates the weights accordingly
if epoch%10 == 0:
print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
print("Loss: {:.4f}".format(loss.item()))
Unlike the to method available on nn.Modules such as your model. The to method on Tensors is not an in-place operation! As stated on the documentation page:
This method [nn.Module.to] modifies the module in-place.
vs for Tensor.to:
[...] the returned tensor is a copy of self with the desired [...] torch.device.
In other words, you need to reassign the tensors in order to effectively send them to the device.
input_seq = input_seq.to(device)
target_seq = target_seq.to(device)
While an nn.Module won't need this treatment:
model.to(device)
To clearly understand what happens here, take this example:
>>> x = torch.zeros(1) # on cpu
>>> y = x.cuda() # y is a copy of x
>>> y.device # placed on cuda device
'cuda:0'
>>> x.device # but x remains on the original device
'cpu'

Implementing a batch dependent loss in Keras

I have an autoencoder set up in Keras. I want to be able to weight the features of the input vector according to a predetermined 'precision' vector. This continuous valued vector has the same length as the input, and each element lies in the range [0, 1], corresponding to the confidence in the corresponding input element, where 1 is completely confident and 0 is no confidence.
I have a precision vector for every example.
I have defined a loss that takes into account this precision vector. Here, reconstructions of low-confidence features are down-weighted.
def MAEpw_wrapper(y_prec):
def MAEpw(y_true, y_pred):
return K.mean(K.square(y_prec * (y_pred - y_true)))
return MAEpw
My issue is that the precision tensor y_prec depends on the batch. I want to be able to update y_prec according to the current batch so that each precision vector is correctly associated with its observation.
I have the done the following:
global y_prec
y_prec = K.variable(P[:32])
Here P is a numpy array containing all precision vectors with the indices corresponding to the examples. I initialize y_prec to have the correct shape for a batch size of 32. I then define the following DataGenerator:
class DataGenerator(Sequence):
def __init__(self, batch_size, y, shuffle=True):
self.batch_size = batch_size
self.y = y
self.shuffle = shuffle
self.on_epoch_end()
def on_epoch_end(self):
self.indexes = np.arange(len(self.y))
if self.shuffle == True:
np.random.shuffle(self.indexes)
def __len__(self):
return int(np.floor(len(self.y) / self.batch_size))
def __getitem__(self, index):
indexes = self.indexes[index * self.batch_size: (index+1) * self.batch_size]
# Set precision vector.
global y_prec
new_y_prec = K.variable(P[indexes])
y_prec = K.update(y_prec, new_y_prec)
# Get training examples.
y = self.y[indexes]
return y, y
Here I am aiming to update y_prec in the same function that generates the batch. This seems to be updating y_prec as expected. I then define my model architecture:
dims = [40, 20, 2]
model2 = Sequential()
model2.add(Dense(dims[0], input_dim=64, activation='relu'))
model2.add(Dense(dims[1], input_dim=dims[0], activation='relu'))
model2.add(Dense(dims[2], input_dim=dims[1], activation='relu', name='bottleneck'))
model2.add(Dense(dims[1], input_dim=dims[2], activation='relu'))
model2.add(Dense(dims[0], input_dim=dims[1], activation='relu'))
model2.add(Dense(64, input_dim=dims[0], activation='linear'))
And finally, I compile and run:
model2.compile(optimizer='adam', loss=MAEpw_wrapper(y_prec))
model2.fit_generator(DataGenerator(32, digits.data), epochs=100)
Where digits.data is a numpy array of observations.
However, this ends up defining separate graphs:
StopIteration: Tensor("Variable:0", shape=(32, 64), dtype=float32_ref) must be from the same graph as Tensor("Variable_4:0", shape=(32, 64), dtype=float32_ref).
I've scoured SO for a solution to my problem but nothing I've found works. Any help on how to do this properly is appreciated.
This autoencoder can be easily implemented using the Keras functional API. This will allow to have an additional input placeholder y_prec_input, which will be fed with the "precision" vector. The full source code can be found here.
Data generator
First, let's reimplement your data generator as follows:
class DataGenerator(Sequence):
def __init__(self, batch_size, y, prec, shuffle=True):
self.batch_size = batch_size
self.y = y
self.shuffle = shuffle
self.prec = prec
self.on_epoch_end()
def on_epoch_end(self):
self.indexes = np.arange(len(self.y))
if self.shuffle:
np.random.shuffle(self.indexes)
def __len__(self):
return int(np.floor(len(self.y) / self.batch_size))
def __getitem__(self, index):
indexes = self.indexes[index * self.batch_size: (index + 1) * self.batch_size]
y = self.y[indexes]
y_prec = self.prec[indexes]
return [y, y_prec], y
Note that I got rid of the global variable. Now, instead, the precision vector P is provided as input argument (prec), and the generator yields an additional input that will be fed to the precision placeholder y_prec_input (see model definition).
Model
Finally, your model can be defined and trained as follows:
y_input = Input(shape=(input_dim,))
y_prec_input = Input(shape=(1,))
h_enc = Dense(dims[0], activation='relu')(y_input)
h_enc = Dense(dims[1], activation='relu')(h_enc)
h_enc = Dense(dims[2], activation='relu', name='bottleneck')(h_enc)
h_dec = Dense(dims[1], activation='relu')(h_enc)
h_dec = Dense(input_dim, activation='relu')(h_dec)
model2 = Model(inputs=[y_input, y_prec_input], outputs=h_dec)
model2.compile(optimizer='adam', loss=MAEpw_wrapper(y_prec_input))
# Train model
model2.fit_generator(DataGenerator(32, digits.data, P), epochs=100)
where input_dim = digits.data.shape[1]. Note that I also changed the output dimension of the decoder to input_dim, since it must match the input dimension.
Try to test your code with worker=0 when you call fit_generator, if it works normally then threading is your problem.
If threading is the cause, try this:
# In the code that executes on the main thread
graph = tf.get_default_graph()
# In code that executes in other threads(e.g. your generator)
with graph.as_default():
...
...
new_y_prec = K.variable(P[indexes])
y_prec = K.update(y_prec, new_y_prec)

Confused about multi-layered Bidirectional RNN in Tensorflow

I'm building a multilayered bidirectional RNN using Tensorflow .I'm a bit confused about the implementation though .
I have built two functions that creates multilayered bidirectional RNN the first one works fine , but I'm not sure about the predictions its making, as it is performing as a unidirectional multilayered RNN . below is my implementation :
def encoding_layer_old(rnn_inputs, rnn_size, num_layers, keep_prob,
source_sequence_length, source_vocab_size,
encoding_embedding_size):
"""
Create encoding layer
:param rnn_inputs: Inputs for the RNN
:param rnn_size: RNN Size
:param num_layers: Number of layers
:param keep_prob: Dropout keep probability
:param source_sequence_length: a list of the lengths of each sequence in the batch
:param source_vocab_size: vocabulary size of source data
:param encoding_embedding_size: embedding size of source data
:return: tuple (RNN output, RNN state)
"""
# Encoder embedding
enc_embed = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)
def create_cell_fw(rnn_size):
with tf.variable_scope("create_cell_fw"):
lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,initializer=tf.random_uniform_initializer(-0.1,0.1,seed=2), reuse=False)
drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
return drop
def create_cell_bw(rnn_size):
with tf.variable_scope("create_cell_bw"):
lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,initializer=tf.random_uniform_initializer(-0.1,0.1,seed=2), reuse=False)
drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
return drop
enc_cell_fw = tf.contrib.rnn.MultiRNNCell([create_cell_fw(rnn_size) for _ in range(num_layers)])
enc_cell_bw = tf.contrib.rnn.MultiRNNCell([create_cell_bw(rnn_size) for _ in range(num_layers)])
((encoder_fw_outputs, encoder_bw_outputs),(encoder_fw_final_state,encoder_bw_final_state)) = tf.nn.bidirectional_dynamic_rnn(enc_cell_fw,enc_cell_bw, enc_embed,
sequence_length=source_sequence_length,dtype=tf.float32)
encoder_outputs = tf.concat([encoder_fw_outputs, encoder_bw_outputs], 2)
print(encoder_outputs)
#encoder_final_state_c=[]#tf.Variable([num_layers] , dtype=tf.int32)
#encoder_final_state_h=[]#tf.Variable([num_layers] , dtype=tf.int32)
encoder_final_state = ()
for x in range((num_layers)):
encoder_final_state_c=tf.concat((encoder_fw_final_state[x].c, encoder_bw_final_state[x].c), 1)#tf.stack(tf.concat((encoder_fw_final_state[x].c, encoder_bw_final_state[x].c), 1))
encoder_final_state_h=tf.concat((encoder_fw_final_state[x].h, encoder_bw_final_state[x].h), 1)# tf.stack(tf.concat((encoder_fw_final_state[x].h, encoder_bw_final_state[x].h), 1))
encoder_final_state =encoder_final_state+ (tf.contrib.rnn.LSTMStateTuple(c=encoder_final_state_c,h=encoder_final_state_h),)
#encoder_final_state = tf.contrib.rnn.LSTMStateTuple(c=encoder_final_state_c,h=encoder_final_state_h)
print('before')
print(encoder_fw_final_state)
return encoder_outputs, encoder_final_state
I have found another implementation here as shown below :
t
def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,
source_sequence_length, source_vocab_size,
encoding_embedding_size):
"""
Create encoding layer
:param rnn_inputs: Inputs for the RNN
:param rnn_size: RNN Size
:param num_layers: Number of layers
:param keep_prob: Dropout keep probability
:param source_sequence_length: a list of the lengths of each sequence in the batch
:param source_vocab_size: vocabulary size of source data
:param encoding_embedding_size: embedding size of source data
:return: tuple (RNN output, RNN state)
"""
# Encoder embedding
enc_embed = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)
def create_cell_fw(rnn_size,x):
with tf.variable_scope("create_cell_fw_"+str(x)):
lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,initializer=tf.random_uniform_initializer(-0.1,0.1,seed=2) , reuse=tf.AUTO_REUSE )
drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
return drop
def create_cell_bw(rnn_size,x):
with tf.variable_scope("create_cell_bw_"+str(x)):
lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,initializer=tf.random_uniform_initializer(-0.1,0.1,seed=2) ,reuse=tf.AUTO_REUSE )
drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
return drop
enc_cell_fw = [create_cell_fw(rnn_size,x) for x in range(num_layers)]
enc_cell_bw = [create_cell_bw(rnn_size,x) for x in range(num_layers)]
output=enc_embed
for n in range(num_layers):
cell_fw = enc_cell_fw[n]
cell_bw = enc_cell_bw[n]
state_fw = cell_fw.zero_state(batch_size, tf.float32)
state_bw = cell_bw.zero_state(batch_size, tf.float32)
((output_fw, output_bw),(encoder_fw_final_state,encoder_bw_final_state))= tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, output,source_sequence_length,
state_fw, state_bw, dtype=tf.float32)
output = tf.concat([output_fw, output_bw], axis=2)
final_state=tf.concat([encoder_fw_final_state,encoder_bw_final_state], axis=2 )
return output , final_state
the problem with this implementation is that I get a shape error :
Trying to share variable bidirectional_rnn/fw/lstm_cell/kernel, but specified shape (168, 224) and found shape (256, 224).
it appears that other people have faced a similar when creating the RNN cells and the solution is to use the MultiRNNCell to create the layered cell . But if use MultiRNNCell I will not be able to use the second implementation since the multiRNNCell does not support indexing. thus I will not be ale to loop through the list of cells and create multiples RNNs .
I would really appreciate your help to guide me on this .
I'm using tensorflow 1.3
Both codes does seem a little overly complex. Anyway I tried a much simpler version of it and it worked. In your code, try after removing reuse=tf.AUTO_REUSE from create_cell_fw and create_cell_bw. Below is my simpler implementation.
def encoding_layer(input_data, num_layers, rnn_size, sequence_length, keep_prob):
output = input_data
for layer in range(num_layers):
with tf.variable_scope('encoder_{}'.format(layer),reuse=tf.AUTO_REUSE):
cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob = keep_prob)
cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob = keep_prob)
outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw,
cell_bw,
output,
sequence_length,
dtype=tf.float32)
output = tf.concat(outputs,2)
state = tf.concat(states,2)
return output, state

TensorFlow dynamic RNN not training

Problem statement
I am trying to train a dynamic RNN in TensorFlow v1.0.1 on Linux RedHat 7.3 (problem also manifests on Windows 7), and no matter what I try, I get the exact same training and validation error at every epoch, i.e. my weights are not updating.
I appreciate any help you can offer.
Example
I tried to reduce this to a minimum example that shows my issue, but the minimum example is still pretty large. I based the network structure largely on this gist.
Network definition
import functools
import numpy as np
import tensorflow as tf
def lazy_property(function):
attribute = '_' + function.__name__
#property
#functools.wraps(function)
def wrapper(self):
if not hasattr(self, attribute):
setattr(self, attribute, function(self))
return getattr(self, attribute)
return wrapper
class MyNetwork:
"""
Class defining an RNN for labeling a time series.
"""
def __init__(self, data, target, num_hidden=64):
self.data = data
self.target = target
self._num_hidden = num_hidden
self._num_steps = int(self.target.get_shape()[1])
self._num_classes = int(self.target.get_shape()[2])
self._weight_and_bias() # create weight and bias tensors
self.prediction
self.error
self.optimize
#lazy_property
def prediction(self):
"""Defines the recurrent neural network prediction scheme."""
# Dynamic LSTM.
network = tf.contrib.rnn.BasicLSTMCell(self._num_hidden)
output, _ = tf.nn.dynamic_rnn(network, data, dtype=tf.float32)
# Flatten and apply same weights to all time steps.
output = tf.reshape(output, [-1, self._num_hidden])
prediction = tf.nn.softmax(tf.matmul(output, self.weight) + self.bias)
prediction = tf.reshape(prediction,
[-1, self._num_steps, self._num_classes])
return prediction
#lazy_property
def cost(self):
"""Defines the cost function for the network."""
cross_entropy = -tf.reduce_sum(self.target * tf.log(self.prediction),
axis=[1, 2])
cross_entropy = tf.reduce_mean(cross_entropy)
return cross_entropy
#lazy_property
def optimize(self):
"""Defines the optimization scheme."""
learning_rate = 0.003
optimizer = tf.train.RMSPropOptimizer(learning_rate)
return optimizer.minimize(self.cost)
#lazy_property
def error(self):
"""Defines a measure of prediction error."""
mistakes = tf.not_equal(tf.argmax(self.target, 2),
tf.argmax(self.prediction, 2))
return tf.reduce_mean(tf.cast(mistakes, tf.float32))
def _weight_and_bias(self):
"""Returns appropriately sized weight and bias tensors for the output layer."""
self.weight = tf.Variable(tf.truncated_normal(
[self._num_hidden, self._num_classes],
mean=0.0,
stddev=0.01,
dtype=tf.float32))
self.bias = tf.Variable(tf.constant(0.1, shape=[self._num_classes]))
Training
Here is my training process. The all_data class just holds my data and labels, and uses a batch generator class to spit out batches for training when I call all_data.train.next() and all_data.train_labels.next(). You can reproduce with any batch generation scheme you like, and I can add the code if you think it is relevant; I felt like this was getting too long as it is.
tf.reset_default_graph()
data = tf.placeholder(tf.float32,
[None, all_data.num_steps, all_data.num_features])
target = tf.placeholder(tf.float32,
[None, all_data.num_steps, all_data.num_outputs])
model = MyNetwork(data, target, NUM_HIDDEN)
print('Training the model...')
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print('Initialized.')
for epoch in range(3):
print('Epoch {} |'.format(epoch), end='', flush=True)
for step in range(all_data.train_size // BATCH_SIZE):
# Generate the next training batch and train.
d = all_data.train.next()
t = all_data.train_labels.next()
sess.run(model.optimize,
feed_dict={data: d, target: t})
# Update the user periodically.
if step % summary_frequency == 0:
print('.', end='', flush=True)
# Show training and validation error at the end of each epoch.
print('|', flush=True)
train_error = sess.run(model.error,
feed_dict={data: d, target: t})
valid_error = sess.run(model.error,
feed_dict={
data: all_data.valid,
target: all_data.valid_labels
})
print('Training error: {}%'.format(100 * train_error))
print('Validation error: {}%'.format(100 * valid_error))
# Check testing error after everything.
test_error = sess.run(model.error,
feed_dict={
data: all_data.test,
target: all_data.test_labels
})
print('Testing error after {} epochs: {}%'.format(epoch + 1, 100 * test_error))
For a simple example, I generated random data and labels, where data has shape [num_samples, num_steps, num_features], and each sample has a single label associated with the whole thing:
data = np.random.rand(5000, 1000, 2)
labels = np.random.randint(low=0, high=2, size=[5000])
I then converted my labels to one-hot vectors and tiled them so that the resulting labels tensor was the same size as the data tensor.
Results
No matter what I do, I get results like this:
Training the model...
Initialized.
Epoch 0 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 1 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 2 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Testing error after 3 epochs: 49.000000953674316%
Where I have exactly the same error at every epoch. Even if my weights were randomly walking around this should change. For the example shown here, I used random data with random labels, so I do not expect much improvement, but I do expect some change, and I am getting the exact same results every epoch. When I do this with my actual data set, I get the same behavior.
Insight
I hesitate to include this in case it proves to be a red herring, but I believe that my optimizer is calculating cost function gradients of None. When I tried a different optimizer and attempted to clip the gradients, I went ahead and used tf.Print to output the gradients as well. The network crashed with an error that tf.Print could not handle None-type values.
Attempted fixes
I have tried the following things, and the problem persists in all cases:
Using different optimizers, e.g. AdamOptimizer with and without modifications to the gradients (clipping).
Adjusting batch sizes.
Using many more and many fewer hidden nodes.
Running for more epochs.
Initializing my weights with different values assigned to stddev.
Initializing my biases to zeros (using tf.zeros) and to different constants.
Using weights and biases that are defined within the prediction method and are not member variables of the class, and a _weight_and_bias method that is defined as a #staticmethod like in this gist.
Determining logits in the prediction function instead of softmax predictions, i.e. predictions = tf.matmul(output, self.weights) + self.bias, and then using tf.nn.softmax_cross_entropy_with_logits. This requires some reshaping because the method wants its labels and targets given with shape [batch_size, num_classes], so the cost method becomes:
(line added to get code to format...)
#lazy_property
def cost(self):
"""Defines the cost function for the network."""
targs = tf.reshape(self.target, [-1, self._num_classes])
logits = tf.reshape(self.predictions, [-1, self._num_classes])
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=targs, logits=logits)
cross_entropy = tf.reduce_mean(cross_entropy)
return cross_entropy
Changing which size dimension I leave as None when I create my placeholders as suggested in this answer, which requires a bit of rewriting in the network definition. Basically setting size = [all_data.batch_size, -1, all_data.num_features] and size = [all_data.batch_size, -1, all_data.num_classes].
Using tf.contrib.rnn.DropoutWrapper in my network definition and passing a dropout value set to 0.5 in training and 1.0 in validation and testing.
The problem went away when I used
output = tf.contrib.layers.flatten(output)
logits = tf.contrib.layers.fully_connected(output, some_size, activation_fn=None)
instead of flattening my network output, defining weights, and performing the tf.matmul(output, weight) + bias manually. I then used logits (instead of predictions in the question) in my cost function with
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=target,
logits=logits)
If you want to get the network prediction, you will still need to do prediction = tf.nn.softmax(logits).
I have no idea why this helped, but the network would not train even on random made-up data until I made these changes.

How to fix MatMul Op has type float64 that does not match type float32 TypeError?

I am trying to save Nueral Network weights into a file and then restoring those weights by initializing the network instead of random initialization. My code works fine with random initialization. But, when i initialize weights from file it is showing me an error TypeError: Input 'b' of 'MatMul' Op has type float64 that does not match type float32 of argument 'a'. I don't know how do i solve this issue.Here is my code:
Model Initialization
# Parameters
training_epochs = 5
batch_size = 64
display_step = 5
batch = tf.Variable(0, trainable=False)
regualarization = 0.008
# Network Parameters
n_hidden_1 = 300 # 1st layer num features
n_hidden_2 = 250 # 2nd layer num features
n_input = model.layer1_size # Vector input (sentence shape: 30*10)
n_classes = 12 # Sentence Category detection total classes (0-11 categories)
#History storing variables for plots
loss_history = []
train_acc_history = []
val_acc_history = []
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
Model parameters
#loading Weights
def weight_variable(fan_in, fan_out, filename):
stddev = np.sqrt(2.0/fan_in)
if (filename == ""):
initial = tf.random_normal([fan_in,fan_out], stddev=stddev)
else:
initial = np.loadtxt(filename)
print initial.shape
return tf.Variable(initial)
#loading Biases
def bias_variable(shape, filename):
if (filename == ""):
initial = tf.constant(0.1, shape=shape)
else:
initial = np.loadtxt(filename)
print initial.shape
return tf.Variable(initial)
# Create model
def multilayer_perceptron(_X, _weights, _biases):
layer_1 = tf.nn.relu(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
return tf.matmul(layer_2, weights['out']) + biases['out']
# Store layers weight & bias
weights = {
'h1': w2v_utils.weight_variable(n_input, n_hidden_1, filename="weights_h1.txt"),
'h2': w2v_utils.weight_variable(n_hidden_1, n_hidden_2, filename="weights_h2.txt"),
'out': w2v_utils.weight_variable(n_hidden_2, n_classes, filename="weights_out.txt")
}
biases = {
'b1': w2v_utils.bias_variable([n_hidden_1], filename="biases_b1.txt"),
'b2': w2v_utils.bias_variable([n_hidden_2], filename="biases_b2.txt"),
'out': w2v_utils.bias_variable([n_classes], filename="biases_out.txt")
}
# Define loss and optimizer
#learning rate
# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
learning_rate = tf.train.exponential_decay(
0.02*0.01, # Base learning rate. #0.002
batch * batch_size, # Current index into the dataset.
X_train.shape[0], # Decay step.
0.96, # Decay rate.
staircase=True)
# Construct model
pred = tf.nn.relu(multilayer_perceptron(x, weights, biases))
#L2 regularization
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
#Softmax loss
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
#Total_cost
cost = cost+ (regualarization*0.5*l2_loss)
# Adam Optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost,global_step=batch)
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Initializing the variables
init = tf.initialize_all_variables()
print "Network Initialized!"
ERROR DETAILS
The tf.matmul() op does not perform automatic type conversions, so both of its inputs must have the same element type. The error message you are seeing indicates that you have a call to tf.matmul() where the first argument has type tf.float32, and the second argument has type tf.float64. You must convert one of the inputs to match the other, for example using tf.cast(x, tf.float32).
Looking at your code, I don't see anywhere that a tf.float64 tensor is explicitly created (the default dtype for floating-point values in the TensorFlow Python API—e.g. for tf.constant(37.0)—is tf.float32). I would guess that the errors are caused by the np.loadtxt(filename) calls, which might be loading an np.float64 array. You can explicitly change them to load np.float32 arrays (which are converted to tf.float32 tensors) as follows:
initial = np.loadtxt(filename).astype(np.float32)
Although It's an old question but I would like you include that I came across the same problem. I resolved it using dtype=tf.float64 for parameter initialization and for creating X and Y placeholders as well.
Here is the snap of my code.
X = tf.placeholder(shape=[n_x, None],dtype=tf.float64)
Y = tf.placeholder(shape=[n_y, None],dtype=tf.float64)
and
parameters['W' + str(l)] = tf.get_variable('W' + str(l), [layers_dims[l],layers_dims[l-1]],dtype=tf.float64, initializer = tf.contrib.layers.xavier_initializer(seed = 1))
parameters['b' + str(l)] = tf.get_variable('b' + str(l), [layers_dims[l],1],dtype=tf.float64, initializer = tf.zeros_initializer())
Declaring all placholders and parameters with float64 datatype will resolve this issue.
For Tensorflow 2
You can cast one of the tensor, like this for example:
_X = tf.cast(_X, dtype='float64')
You can get rid of this error by setting all layers to have a default dtype of float64:
tf.keras.backend.set_floatx('float64')

Categories