if self.use_embedding:
    input_x = Input(shape=(self.input_length,), name='embedding_input')
    x = Embedding(self.input_dim, self.embedding_dim)(input_x)
else:
    input_x = Input(shape=(self.input_length, self.input_dim), name='notes_input')
    x = input_x

encoder_input_list = [input_x]
encoded = self._build_encoder(x)
self.encoder = Model(inputs=encoder_input_list, outputs=encoded)

encoded_input = Input(shape=(self.latent_rep_size,), name='encoded_input')

if self.use_embedding:
    input_decoder_x = Input(shape=(self.output_dim,), name='embedding_input_decoder_start')
    # decoder_x = Embedding(self.output_dim, self.output_dim, input_length=1)(input_decoder_x)
    decoder_x = input_decoder_x
else:
    input_decoder_x = Input(shape=(self.output_dim,), name='input_decoder_start')
    decoder_x = input_decoder_x

autoencoder_decoder_input_list = [input_decoder_x, encoded]
decoder_input_list = [input_decoder_x, encoded_input]
autoencoder_input_list = [input_x, input_decoder_x]
autoencoder_output_list = []

if self.teacher_force:
    ground_truth_input = Input(shape=(self.output_length, self.output_dim), name='ground_truth_input')
    decoder_input_list.append(ground_truth_input)
    autoencoder_decoder_input_list.append(ground_truth_input)
    autoencoder_input_list.append(ground_truth_input)
else:
    ground_truth_input = None

if self.history:
    history_input = Input(shape=(self.latent_rep_size,), name='history_input')
    decoder_input_list.append(history_input)
    autoencoder_decoder_input_list.append(history_input)
    autoencoder_input_list.append(history_input)
else:
    history_input = None

decoded = self._build_decoder(decoder_x, encoded_input, ground_truth_input, history_input)

loss_list = []
loss_weights_list = []
sample_weight_modes = []
loss_weights_list.append(1.0)
sample_weight_modes.append('temporal')
loss_list.append(self.vae_loss)
metrics_list = ['accuracy']

decoder_output = decoded
self.decoder = Model(inputs=decoder_input_list, outputs=decoder_output, name='decoder')
decoder_final_output = self.decoder(autoencoder_decoder_input_list)

if isinstance(decoder_final_output, list):
    autoencoder_output_list.extend(decoder_final_output)
else:
    autoencoder_output_list.append(decoder_final_output)

self.autoencoder = Model(inputs=autoencoder_input_list, outputs=autoencoder_output_list, name='autoencoder')
self.autoencoder.compile(optimizer=self.optimizer,
                         loss=loss_list,
                         loss_weights=loss_weights_list,
                         sample_weight_mode=sample_weight_modes,
                         metrics=metrics_list)
Here is my code for a variational sequence-to-sequence model. I want to transform the decoder output during training so that it gives me two other matrices, and use two different loss functions for the new matrices. So the total loss should be L_total = L_mat1 + L_mat2 + L_mat3 - KL loss. Is there a way to do this in Keras? I have the ground truth values for the transformed matrices. Would any good soul help me with this?
I've added an example of how to implement a matrix split operation to this Colab notebook:
https://colab.research.google.com/drive/127DY9OUWasFQzM9G2AH4RQO8ryhSTJny?usp=sharing
The short answer is that you can use a lambda to split the output of any layer.
In my simple example:
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    inputs = layers.Input(shape=(2,))
    d = layers.Dense(10)(inputs)
    # Split the 10-unit output into two 5-column tensors
    out1, out2 = layers.Lambda(lambda x: (x[:, :5], x[:, 5:]))(d)
    model = keras.Model(inputs, [out1, out2])
    model.compile(loss=['mse', 'mae'])
    return model
The Lambda layer splits the first 5 columns of the matrix into out1 and the last 5 columns into out2.
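Applied to your model, the same idea extends to the three-matrix case. Below is a minimal sketch, not your actual architecture: the tiny encoder/decoder, the sizes, and the plain column split standing in for your matrix transformations are all placeholders. The pattern is: give the model one output per matrix, pass one loss per output in compile() (Keras sums them), and register the KL term with model.add_loss so it needs no target of its own.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-ins for the real encoder/decoder; sizes are arbitrary.
inputs = layers.Input(shape=(8,))
h = layers.Dense(16, activation='relu')(inputs)
z_mean = layers.Dense(4)(h)
z_log_var = layers.Dense(4)(h)
# Reparameterization trick
z = layers.Lambda(
    lambda p: p[0] + tf.exp(0.5 * p[1]) * tf.random.normal(tf.shape(p[0])))(
    [z_mean, z_log_var])
decoded = layers.Dense(12, name='decoded')(z)

# Stand-ins for your two transformations of the decoder output.
mat1 = layers.Lambda(lambda t: t[:, :6], name='mat1')(decoded)
mat2 = layers.Lambda(lambda t: t[:, 6:], name='mat2')(decoded)

model = keras.Model(inputs, [decoded, mat1, mat2])
# KL term: added via add_loss, so compile() needs no target for it.
kl = -0.5 * tf.reduce_mean(
    1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
model.add_loss(kl)
# One loss per output; the total loss is their sum plus the KL term.
model.compile(optimizer='adam', loss=['mse', 'mse', 'mae'])

Since you have ground truth for the transformed matrices, training then takes one target per output, e.g. model.fit(x_train, [y_decoded, y_mat1, y_mat2], ...) (the y_* names are placeholders).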
I am trying to put a dataset through a neural network. It is running on a Google Cloud virtual machine with a Tesla V100 GPU. However, before I can finish training a single epoch, I get an error message: "CUDA error: device-side assert triggered". I think the problem may be in my data, but I have no idea where, and I'm not sure what the problem is exactly (I tested the code with a different dataset and it ran fine).
The odd thing is that the network actually runs for some time before triggering the error. I had it print every time it finished a batch: sometimes it finishes 60+ batches, sometimes 80+; I've even gotten it to finish as many as 140 batches (given the size of my data and my batches, there are 200 batches in each epoch). No matter how many it finishes, it eventually triggers this error and has not completed an epoch.
I tried setting CUDA_LAUNCH_BLOCKING = 1 and did not get any better error message. I made sure, of course, that the neural network has the right number of input and output parameters (this is given, because it works for the first however many batches). I also standardized the inputs: some were really large and some were close to zero, so I normalized them all to fall in the range [-1, 1]. Certainly the network should be able to handle that, but it still causes a problem.
Here is my training loop, which WORKS with a different dataset. It is always the line "loss.backward()" that eventually triggers the error message.
CUDA_LAUNCH_BLOCKING = 1

start = time.time()

for epoch in range(1, 6):
    # Decrease learning rate at epoch 3 and 5
    if epoch == 3 or epoch == 5:
        lr = lr / 3
    # Setup optimizer
    optimizer = optim.SGD(net.parameters(), lr=lr)
    # Initialize stats to zeros to track network's progress
    running_loss = 0
    running_error = 0
    num_batches = 0
    # Shuffle indices to train randomly
    shuffled_indices = torch.randperm(50000)
    for count in range(0, 50000, bs):
        # Clear gradient before each iteration
        optimizer.zero_grad()
        # Setup indices for minibatch
        if (count + bs > 50000):
            indices_list = shuffled_indices[count:].tolist() + shuffled_indices[: (count + bs) - 50000].tolist()
            indices = torch.Tensor(indices_list)
        else:
            indices = shuffled_indices[count: count + bs]
        # Create minibatch
        minibatch_data = train_data[indices]
        minibatch_label = train_label[indices]
        # Send minibatch to gpu for training
        minibatch_data = minibatch_data.to(device)
        minibatch_label = minibatch_label.to(device)
        temp = minibatch_data - mean
        # Standardize entries with mean and std
        inputs = ((minibatch_data - mean) / std).view(bs, 33)
        # Begin tracking changes
        inputs.requires_grad_()
        # Forward inputs through the network
        scores = net(inputs)
        print(scores[:2])
        print(minibatch_label)
        # Compute loss
        loss = criterion(scores, minibatch_label)
        # Backpropagate through the neural network
        loss.backward()
        # Do one step of stochastic gradient descent
        optimizer.step()
        # Update summary statistics
        with torch.no_grad():
            num_batches += 1
            error = get_error(scores, minibatch_label)
            running_error += error
            running_loss += loss.item()
        print("success: ", num_batches)
    # At the end of each epoch, compute and print summary statistics
    total_error = running_error / num_batches
    avg_loss = running_loss / num_batches
    print('Epoch: ', epoch)
    print('Time: ', time.time(), '\t Loss: ', avg_loss, '\t Error (%): ', total_error * 100)
Here is my dataset formatting and normalizing:
train_list_updated = []
train_label_list = []
for entry in train_list[1:]:
    entry[0] = string_to_int(entry[0])
    entry[1] = handedness[entry[1]]
    entry[2] = string_to_int(entry[2])
    entry[3] = handedness[entry[3]]
    entry[4] = string_to_int(entry[4])
    entry[5] = string_to_int(entry[5])
    entry[6] = string_to_int(entry[6])
    entry[17] = entry[17].replace(':', '')
    entry[-3] = pitch_types[entry[-3]]
    entry[-2] = pitch_outcomes[entry[-2]]
    train_label_list.append(entry[-2])
    del entry[-1]
    del entry[-1]
    del entry[-3]
    train_list_updated.append(entry)

final_train_list = []
for entry in train_list_updated:
    for index in range(len(entry)):
        try:
            entry[index] = float(entry[index])
        except:
            entry[index] = 0.
    final_train_list.append(entry)
# Do the same for the test data
test_list_updated = []
for entry in test_list[1:]:
    entry[0] = string_to_int(entry[0])
    entry[1] = handedness[entry[1]]
    entry[2] = string_to_int(entry[2])
    entry[3] = handedness[entry[3]]
    entry[4] = string_to_int(entry[4])
    entry[5] = string_to_int(entry[5])
    entry[6] = string_to_int(entry[6])
    entry[17] = entry[17].replace(':', '')
    entry[-3] = pitch_types[entry[-3]]
    del entry[-1]
    del entry[-1]
    del entry[-3]
    test_list_updated.append(entry)

final_test_list = []
for entry in test_list_updated:
    for index in range(len(entry)):
        try:
            entry[index] = float(entry[index])
        except:
            entry[index] = 0.
    final_test_list.append(entry)

# Create tensors of test and train data
train_data = torch.tensor(final_train_list)
train_label = torch.tensor(train_label_list)
test_data = torch.tensor(final_test_list)
And normalizing:
max_indices = torch.argmax(train_data, dim=0)
min_indices = torch.argmin(train_data, dim=0)
max_values = []
min_values = []
for i in range(33):
    max_idx = max_indices[i].item()
    min_idx = min_indices[i].item()
    max_val = train_data[max_idx][i]
    min_val = train_data[min_idx][i]
    max_values.append(max_val)
    min_values.append(min_val)
max_values = torch.Tensor(max_values)
min_values = torch.Tensor(min_values)
ranges = max_values - min_values
min_values = min_values.view(1, 33)
min_values = torch.repeat_interleave(min_values, 582205, dim=0)
ranges = ranges.view(1, 33)
ranges = torch.repeat_interleave(ranges, 582205, dim=0)
train_data = train_data - min_values
train_data = 2 * (train_data / ranges)
train_data = train_data - 1
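Side note: the same per-column scaling to [-1, 1] can be written without the index bookkeeping and repeat_interleave, using reductions and broadcasting. A sketch; like the version above, it divides by the per-column range, so it assumes no column is constant (a zero range would produce inf/NaN):

min_values = train_data.min(dim=0).values   # per-column minima, shape (33,)
max_values = train_data.max(dim=0).values   # per-column maxima, shape (33,)
ranges = max_values - min_values            # assumes no zero ranges
# Broadcasting aligns the (N, 33) data with the (33,) statistics
train_data = 2 * (train_data - min_values) / ranges - 1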
And here's my net (a lot is commented out, since I thought maybe there was an issue with the gradient zeroing or something; a five-layer neural network should definitely not cause a problem, though):
"""
DEFINING A NEURAL NETWORK
"""
# Define a fifteen layer artificial neural network
class fifteen_layer_net(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(33, 200)
        self.linear2 = nn.Linear(200, 250)
        self.linear3 = nn.Linear(250, 300)
        self.linear4 = nn.Linear(300, 350)
        self.linear5 = nn.Linear(350, 7)
        # self.linear6 = nn.Linear(400, 450)
        # self.linear7 = nn.Linear(450, 500)
        # self.linear8 = nn.Linear(500, 450)
        # self.linear9 = nn.Linear(450, 400)
        # self.linear10 = nn.Linear(400, 350)
        # self.linear11 = nn.Linear(350, 300)
        # self.linear12 = nn.Linear(300, 250)
        # self.linear13 = nn.Linear(250, 200)
        # self.linear14 = nn.Linear(200, 150)
        # self.linear15 = nn.Linear(150, 7)

    def forward(self, x):
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        x = F.relu(x)
        x = self.linear3(x)
        x = F.relu(x)
        x = self.linear4(x)
        x = F.relu(x)
        scores = self.linear5(x)
        # x = F.relu(x)
        # x = self.linear6(x)
        # x = F.relu(x)
        # x = self.linear7(x)
        # x = F.relu(x)
        # x = self.linear8(x)
        # x = F.relu(x)
        # x = self.linear9(x)
        # x = F.relu(x)
        # x = self.linear10(x)
        # x = F.relu(x)
        # x = self.linear11(x)
        # x = F.relu(x)
        # x = self.linear12(x)
        # x = F.relu(x)
        # x = self.linear13(x)
        # x = F.relu(x)
        # x = self.linear14(x)
        # x = F.relu(x)
        # scores = self.linear15(x)
        return scores
The network should output scores, compute a loss using the cross-entropy loss criterion, and then do one step of stochastic gradient descent. This works for a while and then mysteriously breaks. I have no idea why.
Any help is greatly appreciated.
Thanks in advance.
I was also facing the same issue. You can try a few things:
Make sure there are no NaN or inf values in your dataset.
Set your batch size so that (number of samples) % batch_size == 0.
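A quick sketch of both checks, using the tensor names from the question above. As an aside, CUDA_LAUNCH_BLOCKING only takes effect as an environment variable set before CUDA is initialized; a plain Python variable named CUDA_LAUNCH_BLOCKING, as in the training loop above, has no effect:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # must be set before any CUDA work

import torch

# Rule out NaN/inf in the inputs
assert not torch.isnan(train_data).any(), "NaN values in train_data"
assert not torch.isinf(train_data).any(), "inf values in train_data"

# Check that the sample count divides evenly by the batch size
assert train_data.shape[0] % bs == 0, "the last batch will be partial"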
I am getting the error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 6 input samples and 128 target samples.
when training with Keras.
I am using a generator to produce a moving window over my time series. It looks like this:
def generator_val(X, y, number_of_steps, batch_size=128, length=300,
                  overview_steps=300, shuffle=True, prediction=False):
    while 1:
        # Generate the machine index for every possible step.
        machine_idcs = np.concatenate(
            [np.repeat(i, len(np.arange(length, Xi.shape[0], overview_steps)))
             for i, Xi in enumerate(X)])
        # Generate all indices for all possible steps.
        step_idcs = np.concatenate(
            [np.arange(length, Xi.shape[0], overview_steps) for Xi in X])
        # We create a matrix of indices from which we sample for the mini
        # batches.
        examples = np.zeros((len(step_idcs), 2), dtype=np.int32)
        examples[:, 0] = machine_idcs
        examples[:, 1] = step_idcs
        for i in range(0, examples.shape[0], batch_size):
            # Get the machine and step indices of the mini-batch.
            mbatch = examples[i:i + batch_size]
            # Preinitialize the mini batch.
            sequence = np.zeros((len(mbatch), length, X[0].shape[1]), np.float32)
            mini_batch_y = np.zeros((batch_size,), dtype=np.float32)
            for j in range(mbatch.shape[0]):
                machine_idx = mbatch[j, 0]
                step_idx = mbatch[j, 1]
                sequence[j] = X[machine_idx][step_idx - length: step_idx]
                mini_batch_y[j] = y[machine_idx][step_idx]
            mini_batch_X = sequence
            yield mini_batch_X, mini_batch_y
To start the training, I am using model.fit_generator:
model.fit_generator(generator(X, y, number_of_steps=number_of_steps, batch_size=128, length=300),
                    validation_data=generator(X_val, y_val, number_of_steps=number_of_steps_val, batch_size=128, length=300),
                    validation_steps=number_of_steps_val,
                    samples_per_epoch=number_of_steps,
                    epochs=2)
It is as if the generator is not using the infinite loop, or can't reset within one batch.
Is there a possibility to reset the generator after each epoch?
Update
###sample data
test_X = np.random.rand(len(X),10037, 24).astype(np.float32)
test_Y = np.random.randint(0,2,(len(X),10037)).astype(np.float32)
val_X = np.random.rand(len(X_val), 10037,24).astype(np.float32)
val_Y = np.random.randint(0,1,(len(X_val),10037)).astype(np.float32)
X = [item for item in test_X]
Y = [item for item in test_Y]
X_val = [item for item in val_X]
Y_val = [item for item in val_Y]
Workaround
I have found a solution for this error; however, I am not really happy with it, because it throws away some of the last sequences. The solution is to use only as many sequences as can be divided evenly by the batch size.
The code for that looks like this:
window_steps = 50

number_of_samples = sum([X_[i].shape[0] for i in range(len(X_))]) - len(X_) * 300
number_of_steps = int(number_of_samples / 128 / window_steps)
number_of_samples_val = sum([X_val_[i].shape[0] for i in range(len(X_val_))]) - len(X_val_) * 300
number_of_steps_val = int(number_of_samples_val / 128 / window_steps)

def generator_val(X, y, number_of_steps, window_steps=window_steps, batch_size=128,
                  length=300, overview_steps=300, shuffle=True, prediction=False):
    while 1:
        machine_idcs = np.concatenate(
            [np.repeat(i, len(np.arange(length, Xi.shape[0], window_steps)))
             for i, Xi in enumerate(X)])
        # Generate all indices for all possible steps.
        step_idcs = np.concatenate(
            [np.arange(length, Xi.shape[0], window_steps) for Xi in X])
        # We create a matrix of indices from which we sample for the mini
        # batches, truncated so it divides evenly by the batch size.
        examples = np.zeros((number_of_steps * batch_size, 2), dtype=np.int32)
        examples[:, 0] = machine_idcs[:number_of_steps * batch_size]
        examples[:, 1] = step_idcs[:number_of_steps * batch_size]
        for i in range(0, examples.shape[0], batch_size):
            # Get the machine and step indices of the mini-batch.
            mbatch = examples[i:i + batch_size]
            # Preinitialize the mini batch.
            sequence = np.zeros((len(mbatch), length, X[0].shape[1]), np.float32)
            mini_batch_y = np.zeros((batch_size,), dtype=np.float32)
            for j in range(mbatch.shape[0]):
                machine_idx = mbatch[j, 0]
                step_idx = mbatch[j, 1]
                sequence[j] = X[machine_idx][step_idx - length: step_idx]
                mini_batch_y[j] = y[machine_idx][step_idx]
            mini_batch_X = sequence
            yield mini_batch_X, mini_batch_y
model.fit_generator(generator_val(X, Y, number_of_steps=number_of_steps, window_steps=window_steps, batch_size=128, length=300),
                    validation_data=generator_val(X_val, Y_val, number_of_steps=number_of_steps_val, window_steps=window_steps, batch_size=128, length=300),
                    validation_steps=number_of_steps_val,
                    samples_per_epoch=number_of_steps,
                    epochs=2)
And here is a sample network for it:
input1 = Input(shape=(sequence_length, num_features))
h1 = LSTM(50)(input1)
prediction = Dense(1)(h1)
model = Model(inputs=[input1], outputs=[prediction])
loss = "binary_crossentropy"
optimizer = "adam"
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
So, do you know how to use all sequences without throwing some away?
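For what it's worth: in both generators above, sequence is sized by len(mbatch) while mini_batch_y is sized by batch_size, so a final partial batch yields mismatched counts (for example 6 inputs against 128 targets, as in the error). A sketch of the inner loop that keeps every sequence by sizing both arrays from the actual mini-batch (same names as above):

for i in range(0, examples.shape[0], batch_size):
    mbatch = examples[i:i + batch_size]
    # Size both arrays by the actual mini-batch length, so the last
    # (possibly partial) batch yields matching input/target counts.
    sequence = np.zeros((len(mbatch), length, X[0].shape[1]), np.float32)
    mini_batch_y = np.zeros((len(mbatch),), dtype=np.float32)
    for j in range(mbatch.shape[0]):
        machine_idx = mbatch[j, 0]
        step_idx = mbatch[j, 1]
        sequence[j] = X[machine_idx][step_idx - length: step_idx]
        mini_batch_y[j] = y[machine_idx][step_idx]
    yield sequence, mini_batch_y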
This program reads a text file, RNNtext.txt, creates a one-hot vector representation of all the data, trains the LSTM on the data, and displays a bunch of sampled characters every now and then. However, even looking at the cost vs. iterations graph shows that it's learning very, very inefficiently. Honestly, the raw NumPy code for the LSTM I have does a MUCH better job: it's not only faster, but it produces mostly meaningful words, while this produces gibberish only. Where is my mistake? I really am out of ideas, and I can't seem to find where it is logically wrong.
import numpy as np
import random
import tensorflow as tf
import os
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
# Reading RNNtext.txt file
direc = os.path.dirname(os.path.realpath(__file__))
data = open(direc + "/RNNtext.txt", "r").read()
# Array of unique characters
chars = list(set(data))
num_hidden = 80
iterations = 1000
display_iteration = 100 # Sample when iteration % display_iteration == 0
sample_size = 250
batch_size = 120 # batch size or the number of time steps to unroll RNN
alpha = 0.01 # Learning rate
#Vocabulary and text file sizes
vocab_size = len(chars)
data_size = len(data)
# Bijection from a unique character to an index
char_to_ix = {}
# Bijection from an index to a unique character
ix_to_char = {}
for j in range(vocab_size):
    char_to_ix[chars[j]] = j
    ix_to_char[j] = chars[j]
# Transforming all characters to indices
data_ix = [char_to_ix[ch] for ch in data]
train_data = [] # This will contain one-hot vectors
for k in range(data_size):
    # Representing each index/character by a one-hot vector
    hot1 = np.zeros((vocab_size, 1))
    hot1[data_ix[k]] = 1
    train_data.append(hot1)
X = tf.placeholder(tf.float32, [None, vocab_size, 1]) #Number of examples, number of input, dimension of each input
target = tf.placeholder(tf.float32, [None, vocab_size])
cell = tf.contrib.rnn.LSTMCell(num_hidden,state_is_tuple=True)
output, _ = tf.nn.dynamic_rnn(cell, X, dtype = tf.float32)
output = tf.transpose(output, [1, 0, 2])
weight = tf.Variable(tf.random_normal([num_hidden, vocab_size]))
bias = tf.Variable(tf.constant(0.0, shape=[vocab_size]))
prediction = tf.matmul(output[-1], weight) + bias
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=target))
optimizer = tf.train.ProximalGradientDescentOptimizer(alpha)
minimize = optimizer.minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
ARR = [i for i in range(vocab_size)] # for extracting index by probabilities in np.random.choice()
ITER = []
COST = []
p = 0 # p will be iterated by batch_size steps
for i in range(iterations):
    if p + batch_size >= data_size:
        p = 0
    # sweeping through data one-hot vectors
    inp, out = train_data[p:p + batch_size], train_data[p + 1:p + batch_size + 1]
    out = np.reshape(out, [-1, vocab_size])
    c = sess.run(cost, {X: inp, target: out})  # calculating cost for plotting later
    COST.append(c)
    ITER.append(i)
    sess.run(minimize, {X: inp, target: out})
    # displaying sample_size number of characters with random seed
    # doesn't affect training
    if i % display_iteration == 0:
        seed = np.random.randint(0, vocab_size)
        CHARS = []
        for j in range(sample_size):
            x = np.zeros((vocab_size, 1))
            x[seed] = 1
            x = [x]
            pred = sess.run(prediction, {X: x})[0]
            pred = np.exp(pred) / np.sum(np.exp(pred))
            pred = pred.ravel()
            seed = np.random.choice(ARR, 1, p=pred)[0]
            ch = ix_to_char[seed]
            CHARS.append(ch)
        TXT = ''.join(CHARS)
        print("-------------------------------------------------")
        print(TXT)
        print("Iteration: ", str(i))
    p += batch_size
sess.close()
plt.plot(ITER, COST)
plt.show()
EDIT: Added numpy code for comparison
import numpy as np
import matplotlib.pyplot as plt
import os
plt.style.use('fivethirtyeight')
direc = os.path.dirname(os.path.realpath(__file__))
readFile = open(direc + "\RNNtext.txt", 'r')
data = readFile.read()
readFile.close()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print(chars)
print("Vocabulary size: " + str(vocab_size))
char_to_ix = {}
ix_to_char = {}
for j in range(len(chars)):
    char_to_ix[chars[j]] = j
    ix_to_char[j] = chars[j]
hidden_size = 80
batch_size = 120
alpha = 0.1
sample_size = 250
iterations = 1000
display_iteration = 100
Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden
Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output
bh = np.zeros((hidden_size, 1)) # hidden bias
by = np.zeros((vocab_size, 1)) # output bias
def sample(hid, seed, weights, sample_size):
    X = np.zeros((vocab_size, 1))
    X[seed] = 1
    CHARS = []
    ARR = [i for i in range(vocab_size)]
    for t in range(sample_size):
        hid = np.tanh(np.dot(Wxh, X) + np.dot(Whh, hid) + bh)
        y = np.dot(Why, hid) + by
        prob = np.exp(y) / np.sum(np.exp(y))
        prob = prob.ravel()
        ix = np.random.choice(ARR, 1, p=prob)[0]
        CHARS.append(ix_to_char[ix])
        X = np.zeros((vocab_size, 1))
        X[ix] = 1
    TXT = ''.join(CHARS)
    return TXT
LOSS = []
ITER = []
p = 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad
smooth_loss = -np.log(1.0/vocab_size)*batch_size # loss at iteration 0
hprev = np.zeros((hidden_size,1))
for i in range(iterations):  ## just time passing by
    dWxh = np.zeros_like(Wxh)
    dWhh = np.zeros_like(Whh)
    dWhy = np.zeros_like(Why)
    dbh = np.zeros_like(bh)
    dby = np.zeros_like(by)
    if p + batch_size >= len(data) or i == 0:
        hprev = np.zeros((hidden_size, 1))
        p = 0
    inputs = [char_to_ix[ch] for ch in data[p:p + batch_size]]
    targets = [char_to_ix[ch] for ch in data[p + 1:p + batch_size + 1]]
    HID = {}
    X = {}
    Y = {}
    P = {}
    HID[-1] = np.copy(hprev)
    loss = 0
    ## ====== FORWARD ====== ##
    for t in range(len(inputs)):
        X[t] = np.zeros((vocab_size, 1))
        X[t][inputs[t]] = 1
        HID[t] = np.tanh(np.dot(Wxh, X[t]) + np.dot(Whh, HID[t-1]) + bh)  # hidden state update
        Y[t] = np.dot(Why, HID[t]) + by  # output logits
        P[t] = np.exp(Y[t]) / np.sum(np.exp(Y[t]))  # softmax probabilities
        loss += -np.log(P[t][targets[t]][0])  # cross-entropy loss
    dhnext = np.zeros_like(HID[0])
    ## ====== BACKPROP ====== ##
    for t in reversed(range(len(inputs))):
        dy = np.copy(P[t])
        dy[targets[t]] -= 1
        dh = (np.dot(Why.T, dy) + dhnext) * (1 - HID[t] * HID[t])
        dx = np.dot(Why.T, dy) * (1 - HID[t]**2)
        dWhy += np.dot(dy, HID[t].T)
        dWhh += np.dot(dh, HID[t-1].T)
        dWxh += np.dot(dh, X[t].T)
        dby += dy
        dbh += dh
        dhnext = np.dot(Whh.T, dh)
    ## ===================== ##
    hprev = HID[-1]
    smooth_loss = smooth_loss * 0.999 + loss * 0.001
    for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -1, 1, out=dparam)  # clip to mitigate exploding gradients
    for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                  [dWxh, dWhh, dWhy, dbh, dby],
                                  [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -alpha * dparam / np.sqrt(mem + 1e-8)  # Adagrad update
    if i % display_iteration == 0:
        print(str(i))
        weights = [Wxh, Whh, Why, bh, by]
        seed = inputs[np.random.randint(0, len(inputs))]
        TXT = sample(HID[-1], seed, weights, sample_size)
        print("-----------------------------------------------")
        print(TXT)
        print("-----------------------------------------------")
        with open(direc + "\RNNout.txt", 'w') as writeFile:
            writeFile.write(TXT)
    ITER.append(i)
    LOSS.append(loss)
    p += batch_size
best_text = sample(HID[-1], inputs[0], weights, sample_size)
plt.plot(ITER, LOSS, linewidth = 1)
plt.show()
writeFile.close()
Well, doh... it looks like you are not re-using the state! How is an LSTM (a state machine) supposed to work properly if you are not maintaining the state?
To me this looks like a red flag:
output, _ = tf.nn.dynamic_rnn(cell, X, dtype = tf.float32)
The second output of tf.nn.dynamic_rnn is the latest state after the given sequence has been processed. It looks like you are explicitly ignoring it and not re-feeding it into each following training iteration in sess.run(...) (and hence your dynamic_rnn doesn't have the initial_state parameter).
I would highly recommend changing that part of your code before looking any further.
Also, I don't know what your data looks like, but your feeding and batching strategy needs to make sense of this whole state-passing exercise. Otherwise, once again, it will just produce gibberish.
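For illustration, a minimal sketch of threading the state through training, reusing the names from your code (TF 1.x contrib API as in the original; `batch` stands for whatever your leading batch dimension is):

# Placeholders for the LSTM state carried across training steps
c_in = tf.placeholder(tf.float32, [None, num_hidden])
h_in = tf.placeholder(tf.float32, [None, num_hidden])
init_state = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)

# dynamic_rnn now starts from the fed-in state and returns the final one
output, final_state = tf.nn.dynamic_rnn(cell, X, initial_state=init_state,
                                        dtype=tf.float32)

# In the training loop, feed the previous final state back in:
state = (np.zeros((batch, num_hidden)), np.zeros((batch, num_hidden)))
for i in range(iterations):
    _, state = sess.run([minimize, final_state],
                        {X: inp, target: out,
                         c_in: state[0], h_in: state[1]})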
With the information provided, I would suggest these two initial steps to try to improve the model:
Increase the number of iterations: recurrent neural networks work differently from other deep architectures and might need an additional order of magnitude of iterations to settle.
Play with the seeds: in my experience, getting meaningful sequences out can depend on the quality of the seeds used.
I've just finished writing a neural net with TensorFlow.
Attached is the code:
import tensorflow as tensorFlow
import csv
# read data from csv
file = open('stub.csv')
reader = csv.reader(file)
temp = list(reader)
del temp[0]
# change data from string to float (Tensorflow)
# create data & goal lists
data = []
goal = []
for i in range(len(temp)):
    data.append(map(float, temp[i]))
    goal.append([data[i][6], 0.0])
    del data[i][6]
# change lists to tuple
data = tuple(tuple(x) for x in data)
goal = tuple(goal)
# create training, validation and test data with a 60/20/20 split
a = int(len(data) * 0.6)  # training set 60%
b = int(len(data) * 0.8)  # validation & test: each one is 20%
trainData = data[0:a] # 60%
validationData = data[b: len(data)]
testData = data[a: b] # 20%
trainGoal = goal[0:a]
validationGoal = goal[b:len(data)]
testGoal = goal[a: b]
numberOfLayers = 500
nodesLayer = []
# define the numbers of nodes in hidden layers
for i in range(numberOfLayers):
    nodesLayer.append(500)
# define our goal class
classes = 2
batchSize = 2000
# x for input, y for output
sizeOfRow = len(data[0])
x = tensorFlow.placeholder(dtype= tensorFlow.float32, shape=[None, sizeOfRow])
y = tensorFlow.placeholder(dtype= tensorFlow.float32, shape=[None, classes])
hiddenLayers = []
layers = []
def neuralNetworkModel(x):
    # first step: (input * weights) + bias, a linear operation like y = ax + b
    # each connection between adjacent layers is represented by a
    # nodes(i) x nodes(i+1) weight matrix
    for i in range(0, numberOfLayers):
        if i == 0:
            hiddenLayers.append({"weights": tensorFlow.Variable(tensorFlow.random_normal([sizeOfRow, nodesLayer[i]])),
                                 "biases": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i]]))})
        elif i > 0 and i < numberOfLayers - 1:
            hiddenLayers.append({"weights": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], nodesLayer[i+1]])),
                                 "biases": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i+1]]))})
        else:
            outputLayer = {"weights": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], classes])),
                           "biases": tensorFlow.Variable(tensorFlow.random_normal([classes]))}

    # create the layers
    for i in range(numberOfLayers):
        if i == 0:
            layers.append(tensorFlow.add(tensorFlow.matmul(x, hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
            layers.append(tensorFlow.nn.relu(layers[i]))  # pass values through an activation function (here ReLU) and append to the layer list
        elif i > 0 and i < numberOfLayers - 1:
            layers.append(tensorFlow.add(tensorFlow.matmul(layers[i-1], hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
            layers.append(tensorFlow.nn.relu(layers[i]))

    output = tensorFlow.matmul(layers[numberOfLayers - 1], outputLayer["weights"]) + outputLayer["biases"]
    return output
def neuralNetworkTrain(data, x, y):
    prediction = neuralNetworkModel(x)
    # using the softmax function, normalize values to the range (0, 1)
    cost = tensorFlow.reduce_mean(tensorFlow.nn.softmax_cross_entropy_with_logits(prediction, y))

    # minimize the cost function using the Adadelta algorithm
    optimizer = tensorFlow.train.AdadeltaOptimizer().minimize(cost)
    epochs = 2  # feed forward + backpropagation = one epoch

    # build the session and train the model
    with tensorFlow.Session() as sess:
        sess.run(tensorFlow.initialize_all_variables())
        for epoch in range(epochs):
            epochLoss = 0
            i = 0
            for _ in range(int(len(data) / batchSize)):
                ex, ey = nextBatch(i)  # takes batchSize examples
                i += 1
                feedDict = {x: ex, y: ey}
                _, cos = sess.run([optimizer, cost], feed_dict=feedDict)  # run the session to optimize the cost function
                epochLoss += cos
            print("Epoch", epoch + 1, "completed out of", epochs, "loss:", epochLoss)

        correct = tensorFlow.equal(tensorFlow.argmax(prediction, 1), tensorFlow.argmax(y, 1))
        accuracy = tensorFlow.reduce_mean(tensorFlow.cast(correct, "float"))
        print("Accuracy:", accuracy.eval({x: trainData, y: trainGoal}))
# takes batchSize examples each iteration
def nextBatch(num):
    # Return the next `batchSize` examples from this data set.
    num *= batchSize
    if num < (len(data) - batchSize):
        return data[num: num + batchSize], goal[num: num + batchSize]

neuralNetworkTrain(trainData, x, y)
Each epoch (iteration) I get the value of the loss function, and all is good.
Now I want to try the model on my validation/test set.
Does someone know what exactly I should do?
Thanks
If you want to get predictions from the trained model, you can simply put something like:
tf_p = tf.nn.softmax(prediction)
...In your graph, having loaded your test data into x_test. Then evaluate predictions with:
[p] = session.run([tf_p], feed_dict={
    x: x_test,
    y: y_test
})
at the end of your neuralNetworkTrain method, and you should end up having them in p.
...Or using tf.train.Saver:
Alternatively, you could use a tf.train.Saver object to save and restore (and optionally persist) your model. To do that, you create the saver after you initialise all variables:
...
tf.initialize_all_variables().run()
saver = tf.train.Saver()
...
And then save it once you're done training, at the end of your neuralNetworkTrain method:
...
model_path = saver.save(sess, "model.ckpt")  # save() requires a save path; "model.ckpt" is just an example
You then build a new graph for evaluation, and restore the model before running it on your test data:
# Load test dataset into X_test
...
tf_x = tf.constant(X_test)
tf_p = tf.nn.softmax(neuralNetworkModel(tf_x))

with tf.Session() as session:
    tf.initialize_all_variables().run()
    saver.restore(session, model_path)
    p = tf_p.eval()
And, once again, p should contain softmax activations for your test dataset.
(I haven't actually run this code, I'm afraid, but it should give you an idea of how to implement it.)