Variant Word2Vec to MetaWord2Vec - python

I'm trying to build a recommender system with a skip-gram model. I built a variant of Word2vec adapted to my data (anime), where I use each user's watched-anime list like this:
data = []
for key, chunk in ratings.groupby('user_id'):
    data.append(chunk.sort_values('user_rating').anime_id.values)
My ratings dataframe has 30k unique users and 12k unique anime, with user ratings and some metadata (anime_tags, year, number of episodes, synopsis, etc.).
I'm trying to find a way to add that metadata to my "Anime2Vec" model.
Do I need to build separate models and then combine (concat/blend/sum ???) the embeddings?
About my baseline model: it is built, as I said previously, from users' watched sessions; then I use these functions:
def get_batch(seq, rel_window_size=.2):
    """Get a batch of data for a single user sequence.

    Targets are chosen randomly from within an anime window
    sized relative to the entire sequence, with the initial
    point chosen at random.
    """
    window = math.floor(len(seq) * rel_window_size) + 1
    x = []
    y = []
    i = random.randint(0, len(seq) - 1)
    for _ in range(batch_size):  # batch_size is a global, defined below
        x.append(seq[i])
        moves = list(range(max(i - window, 0), i)) + list(range(i + 1, min(i + window + 1, len(seq))))
        j = random.choice(moves)
        y.append(seq[j])
        i = j
    return x, y
def gen_batch(data, batch_size, rel_window_size=.2, shuffle=True):
    """Make a generator over the data, using each 'user' as a document."""
    while True:
        for doc_index, doc in enumerate(data):
            x, y = get_batch(doc, rel_window_size)
            yield np.array(x, 'int32'), np.expand_dims(np.array(y, 'int32'), 1)
        if shuffle:
            random.shuffle(data)
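To sanity-check the windowing logic, here is a quick toy run (a sketch; the ten-item sequence is made up, and batch_size is shrunk just for readability):

import math
import random

batch_size = 8                     # get_batch reads this global; small here for the demo
toy_seq = list(range(10))          # a made-up "watch history" of 10 anime ids
x, y = get_batch(toy_seq, rel_window_size=.2)
print(list(zip(x, y)))             # (center, context) pairs drawn by the random walk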
Then I take a few samples to check that the model works:
val_anime = ['Naruto',
             'Bleach',
             'Clannad',
             'Air',
             'Code Geass Hangyaku no Lelouch',
             'Overlord',
             'Mononoke Hime',
             'Hajime no Ippo',
             'Fullmetal Alchemist']
val_ids = [reverse_dictionary[s] for s in val_anime]
val_set = np.array(val_ids, 'int32')
Then I define the model and data:
# valid_examples = np.array(random.sample(range(1, valid_window + 1), valid_size))
vocab_size = len(dictionary)
batch_size = 128
embedding_size = 50  # dimension of the embedding vectors
val_size = len(val_set)
lr = 1.
model = SkipGramModel(vocab_size, val_set, embedding_dims=embedding_size,
                      batch_size=batch_size, sample_factor=1., lr=lr,
                      optimizer=tf.train.AdagradOptimizer)
# create the data generator
data_gen = gen_batch(data, batch_size, rel_window_size=.2, shuffle=True)
Parameters for training.
steps_per_cycle = len(data) // batch_size
n_iter = 300  # number of full cycles through the data
num_steps = int(n_iter * steps_per_cycle)
lstep = steps_per_cycle       # interval (in steps) for reporting average loss
vstep = steps_per_cycle * 20  # interval (in steps) for showing validation neighbors
I define a function to print the nearest items:
def show_n_similar(item, sim, k=5):
    item_name = dictionary[item]
    nearest = (-sim).argsort()[1:k + 1]
    log_str = '\nNearest to - {}:\n'.format(item_name)
    for idx in range(k):  # renamed loop variable to avoid shadowing k
        log_str += '\t{},\n'.format(dictionary[nearest[idx]])
    print(log_str)
losses = []
Finally I train the model and look at the nearest neighbors of my previously defined list:
with tf.Session() as sess:
    sess.run(model.init_op())
    print('initialized ...')
    # track the average loss during training
    for step in tqdm(range(num_steps)):
        batch_inputs, batch_labels = next(data_gen)
        feed_dict = {model.x: batch_inputs,
                     model.y: batch_labels}
        _, loss_ = sess.run([model.optimize, model.loss], feed_dict=feed_dict)
        losses.append(loss_)  # for plotting
    # after training, inspect the neighbors of the validation anime
    sim = model.similarity.eval()
    for i in range(len(val_ids)):
        show_n_similar(model.val_data[i], sim[i], k=5)
    out_embeddings = model.normalized_embeddings.eval()
I would like some advice on adding my metadata to the model; as a first try, just the anime_tags.
Each anime's tags are represented by a list of keywords, like this:

anime_id | tags
---------|------------------------------------
8        | action, historical, sci-fi, comedy
12       | comedy, romance
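Not from the original post, but one concrete way to realize the "concat/sum" idea above: keep a second embedding matrix for tags, average each anime's tag vectors, and combine the result with the anime vector before the sampled-softmax/NCE loss. A minimal TF1-style sketch; n_tags, tag_dim, and anime_tag_ids are hypothetical names, and `embeddings` / `x` stand for the model's existing anime-embedding matrix and input placeholder:

import tensorflow as tf

n_tags = 200   # hypothetical: size of the tag vocabulary
tag_dim = 25   # hypothetical: tag-embedding dimension

# one learnable row per tag keyword
tag_embeddings = tf.Variable(tf.random_uniform([n_tags, tag_dim], -1.0, 1.0))

# anime_tag_ids: a [vocab_size, max_tags] int32 matrix built from the tags table,
# padded to max_tags (padding ids slightly skew the mean; masking omitted for brevity)
anime_tag_vecs = tf.reduce_mean(
    tf.nn.embedding_lookup(tag_embeddings, anime_tag_ids), axis=1)

anime_vec = tf.nn.embedding_lookup(embeddings, x)      # [batch, embedding_size]
tag_vec = tf.nn.embedding_lookup(anime_tag_vecs, x)    # [batch, tag_dim]

# concat (or tf.add, if the dimensions are made equal), then feed `combined`
# into the loss instead of anime_vec alone
combined = tf.concat([anime_vec, tag_vec], axis=1)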


Use generator in TensorFlow/Keras to fit when the model gets 2 inputs

I want to train a model that uses an extra output layer to compute the loss (ArcFace), so the model gets two inputs: the features and the true label: [X, y].
So far I did this with all the data loaded at once, using the following code:
print("Unzipping DataSet to NumPy arrays")
x_train, y_train = dataset2arrays(train_ds, labels)
x_val, y_val = dataset2arrays(val_ds, val_labels)
model.fit(x=[x_train, y_train],
y=y_train,
batch_size=10,
validation_data=[[x_val, y_val], y_val],
n_epochs=20,
)
Now, this was done with "debugging" data, which is small (< 100 samples).
The real training data is very large (> 300 GB of files), so I can't load all the data at once.
Therefore I need to use a generator. In TensorFlow 2.8 a generator is implemented by inheriting from the Keras Sequence class. The following generator is based on the example in https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly :
from os import path

import numpy as np
from tensorflow.keras.utils import Sequence
from keras.preprocessing.sequence import pad_sequences

from pre_processing import load_data


class DataGenerator(Sequence):
    """Generates data for Keras.

    Sequence-based data generator, suitable for training and prediction.
    """

    def __init__(self, list_IDs, labels, n_classes, input_path, target_path,
                 to_fit=True, batch_size=20, shuffle=True):
        """Initialization
        :param list_IDs: list of all 'label' ids to use in the generator
        :param to_fit: True to return X and y, False to return X only
        :param batch_size: batch size at each iteration
        :param shuffle: True to shuffle label indexes after every epoch
        """
        self.input_path = input_path
        self.target_path = target_path
        self.list_IDs = list_IDs
        self.labels = labels
        self.n_classes = n_classes
        self.to_fit = to_fit
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch
        :return: number of batches per epoch
        """
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        """Generate one batch of data
        :param index: index of the batch
        :return: X and y when fitting, X only when predicting
        """
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        list_labels_temp = [self.labels[k] for k in indexes]
        # Generate data
        X = self._generate_X(list_IDs_temp)
        if self.to_fit:
            y = self._generate_y(list_labels_temp)
            # print(indexes)  # for debugging
            return [X], y
        else:
            return [X]

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def _generate_X(self, list_IDs_temp):
        """Generates data containing batch_size samples
        :param list_IDs_temp: list of label ids to load
        :return: batch of samples
        """
        # Initialization
        X = []
        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            # temp = self._load_input(self.input_path, ID)
            temp = load_data(path.join(self.input_path, ID))
            X.append(temp)
        # pad variable-length sequences to a common length
        X = pad_sequences(X, value=0, padding='post')
        return X

    def _generate_y(self, list_IDs_temp):
        """Generates the labels for one batch
        :param list_IDs_temp: list of label ids to load
        :return: batch of labels
        """
        # TODO: modify
        y = []
        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            # y.append(self._load_target(self.target_path, ID))
            y.append(ID)
        # y = pad_sequences(y, value=0, padding='post')
        return y
The most important part is:
if self.to_fit:
    y = self._generate_y(list_labels_temp)
    print(indexes)
    # Option 1:
    return [X], y
    # Option 2:
    return tuple([[X], [y]])
    # Option 3:
    return tuple(((X), (y)))
    # Option 4:
    Xy = []
    for i in range(len(y)):
        Xy.append([X[i, :, :], y[i]])
    return Xy
    # Option 5:
    Xy = []
    for i in range(len(y)):
        Xy.append(X[i, :, :])
    return tuple((Xy, y))
else:
    return [X]
These are all (or most) of the outputs I tried returning from the generator.
The new fit call is:
history = model.fit(gen,
                    callbacks=callbacks,
                    batch_size=10,
                    epochs=20,
                    # validation_data=tuple(validation_data),
                    shuffle=True,
                    verbose=1,  # display training progress in the terminal
                    )
With option 1 I get the following error:
ValueError: Layer "ForTraining" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None) dtype=int32>]
The other options don't work either (most raise the same error as above).
So what am I doing wrong? How do I make my generator correctly return the tensors needed for training (features X and their labels y, with batch size b)?
The following link may be relevant: https://github.com/pierluigiferrari/ssd_keras/issues/380
Note that I am running TensorFlow 2.8 on Python 3.9.5, on a Windows 10 laptop without a GPU (the real training on the full dataset will take place on a much stronger machine; this laptop is only used for debugging).
Solution:
The following solves the problem, and training now runs (when I comment out validation monitoring and callbacks):
def __getitem__(self, index):
    """Generate one batch of data
    :param index: index of the batch
    :return: X and y when fitting, X only when predicting
    """
    # Generate indexes of the batch
    indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
    # Find list of IDs
    list_IDs_temp = [self.list_IDs[k] for k in indexes]
    list_labels_temp = [self.labels[k] for k in indexes]
    # Generate data
    X = self._generate_X(list_IDs_temp)
    if self.to_fit:
        # Training/fit case: the model takes (X, y) as input and y as target
        y = self._generate_y(list_labels_temp)
        y = np.array(y).reshape((len(y), 1))
        return (X, y), y
    else:
        # Prediction only
        return [X]
How do I use the generator for validation data? I created another generator (identical to the train generator), passed it as validation_data, and the training procedure completed successfully (without throwing an exception). It seems this is the solution to the problem.
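For completeness, a usage sketch of this setup (the ID/label variables and the paths are placeholders, not names from the post; model and callbacks are the ones defined earlier):

train_gen = DataGenerator(train_IDs, train_labels, n_classes,
                          input_path="data/train", target_path="data/train",
                          to_fit=True, batch_size=10, shuffle=True)
val_gen = DataGenerator(val_IDs, val_labels, n_classes,
                        input_path="data/val", target_path="data/val",
                        to_fit=True, batch_size=10, shuffle=False)

history = model.fit(train_gen,
                    validation_data=val_gen,  # a second, identical generator
                    epochs=20,
                    callbacks=callbacks)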

How to split the data into training and testing data

Hi. Right now I have the data-loading code below and I'm not sure how to split it into training and testing data. Can anyone give me a suggestion on how to do it? This is my data-loading code:
def __init__(self, root, specific_folder, img_extension, preprocessing_method=None,
             crop_size=(96, 112), train=True):
    """
    Dataloader of the LFW dataset.

    root: path to the dataset to be used.
    specific_folder: specific folder inside the same dataset.
    img_extension: extension of the dataset images.
    preprocessing_method: string with the name of the preprocessing method.
    crop_size: retrieval network specific crop size.
    """
    self.preprocessing_method = preprocessing_method
    self.crop_size = crop_size
    self.imgl_list = []
    self.classes = []
    self.people = []
    self.model_align = None
    self.arr = []
    # read the file with the names and the number of images of each person in the dataset
    with open(os.path.join(root, 'people.txt')) as f:
        people = f.read().splitlines()[1:]
    # keep only the people that have at least 20 images
    for p in people:
        p = p.split('\t')
        if len(p) > 1:
            if int(p[1]) >= 20:
                for num_img in range(1, int(p[1]) + 1):
                    self.imgl_list.append(os.path.join(root, specific_folder, p[0],
                                                       p[0] + '_' + '{:04}'.format(num_img) + '.' + img_extension))
                    self.classes.append(p[0])
                    self.people.append(p[0])
    le = preprocessing.LabelEncoder()
    self.classes = le.fit_transform(self.classes)
    print(len(self.imgl_list), len(self.classes), len(self.people))


def __getitem__(self, index):
    imgl = imageio.imread(self.imgl_list[index])
    cl = self.classes[index]
    # if the image is grayscale, convert to RGB by repeating it 3 times
    if len(imgl.shape) == 2:
        imgl = np.stack([imgl] * 3, 2)
    imgl, bb = preprocess(imgl, self.preprocessing_method, crop_size=self.crop_size,
                          is_processing_dataset=True, return_only_largest_bb=True,
                          execute_default=True)
    # pair each image with its horizontal flip
    imglist = [imgl, imgl[:, ::-1, :]]
    # normalization
    for i in range(len(imglist)):
        imglist[i] = (imglist[i] - 127.5) / 128.0
        imglist[i] = imglist[i].transpose(2, 0, 1)
    imgs = [torch.from_numpy(i).float() for i in imglist]
    return imgs, cl, imgl, bb, self.imgl_list[index], self.people[index]


def __len__(self):
    return len(self.imgl_list)
I need to split this data into 80% training and 20% testing so I can evaluate my model. It has been almost a week now and I still have no idea how to do it; I would appreciate it so much if anyone could help.
In general, using PyTorch:
import torch
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

dataset = yourdatahere
batch_size = 16  # change to whatever you'd like it to be
test_split = .2
shuffle_dataset = True
random_seed = 42

# Creating data indices for training and test splits:
dataset_size = len(dataset)
indices = list(range(dataset_size))
split = int(np.floor(test_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, test_indices = indices[split:], indices[:split]

# Creating PyTorch data samplers and loaders:
train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                           sampler=train_sampler)
test_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                          sampler=test_sampler)

# Usage example:
num_epochs = 10
for epoch in range(num_epochs):
    # Train:
    for batch_index, (faces, labels) in enumerate(train_loader):
        pass  # training step goes here
Please note that you should also split your training data into training + validation data. You may use the same logic from above to do so.
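A sketch of that extension, using the same SubsetRandomSampler logic (the 60/20/20 ratios are just an example, not from the original answer):

import numpy as np
import torch
from torch.utils.data.sampler import SubsetRandomSampler

dataset_size = len(dataset)
indices = list(range(dataset_size))
np.random.seed(42)
np.random.shuffle(indices)

test_end = int(0.2 * dataset_size)   # first 20%: test
val_end = int(0.4 * dataset_size)    # next 20%: validation, rest: training
test_idx = indices[:test_end]
val_idx = indices[test_end:val_end]
train_idx = indices[val_end:]

loaders = {
    name: torch.utils.data.DataLoader(dataset, batch_size=16,
                                      sampler=SubsetRandomSampler(idx))
    for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]
}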

I want to use the GPU instead of CPU while performing computations using PyTorch

I'm trying to shift the load from the CPU to the GPU, as my trusty RTX 2070 can handle it better than the CPU, but I keep running into this problem, and I'm quite new to AI. So if you are kind enough to share some insights regarding any potential solution, it would be highly appreciated. Thank you.
I'm using PyTorch.
Here's the code that I'm using:
# to measure run-time
# for csv dataset
import os
# to shuffle data
import random
# to get the alphabet
import string

# import statements for iterating over csv file
import cv2
# for plotting
import matplotlib.pyplot as plt
import numpy as np
# pytorch stuff
import torch
import torch.nn as nn
from PIL import Image

# generate the targets
# the targets are one-hot encoding vectors
# print(torch.cuda.is_available())

# CUDA arch build flags (not used further in this script)
nvcc_args = [
    '-gencode', 'arch=compute_30,code=sm_30',
    '-gencode', 'arch=compute_35,code=sm_35',
    '-gencode', 'arch=compute_37,code=sm_37',
    '-gencode', 'arch=compute_50,code=sm_50',
    '-gencode', 'arch=compute_52,code=sm_52',
    '-gencode', 'arch=compute_60,code=sm_60',
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
    '-gencode', 'arch=compute_75,code=sm_75'
]
alphabet = list(string.ascii_lowercase)
target = {}
# Initialize a target dict that has letters as its keys and empty one-hot encoding vectors of size 37 as its values
for letter in alphabet:
    target[letter] = [0] * 37
# Do the one-hot encoding for each letter now
curr_pos = 0
for curr_letter in target.keys():
    target[curr_letter][curr_pos] = 1
    curr_pos += 1
# extra symbols
symbols = ["space", "number", "period", "comma", "colon", "apostrophe", "hyphen",
           "semicolon", "question", "exclamation", "capitalize"]
# create vectors
for curr_symbol in symbols:
    target[curr_symbol] = [0] * 37
# create one-hot encoding vectors
for curr_symbol in symbols:
    target[curr_symbol][curr_pos] = 1
    curr_pos += 1
# collect all data from the dataset folders
data = []
for tgt in os.listdir("dataset"):
    if not tgt == ".DS_Store":
        for folder in os.listdir("dataset/" + tgt + "/Uploaded"):
            if not folder == ".DS_Store":
                for filename in os.listdir("dataset/" + tgt + "/Uploaded/" + folder):
                    if not filename == ".DS_Store":
                        # store the image and label
                        picture = []
                        curr_target = target[tgt]
                        image = Image.open("dataset/" + tgt + "/Uploaded/" + folder + "/" + filename)
                        image = image.convert('RGB')
                        # f.show()
                        image = np.array(image)
                        # resize image to 28x28x3
                        image = cv2.resize(image, (28, 28))
                        # normalize to 0-1
                        image = image.astype(np.float32) / 255.0
                        image = torch.from_numpy(image)
                        picture.append(image)
                        # convert the target to a tensor
                        curr_target = torch.Tensor([curr_target])
                        picture.append(curr_target)
                        # append the current image & target
                        data.append(picture)

# create a dictionary of all the characters
characters = alphabet + symbols
index2char = {}
number = 0
for char in characters:
    index2char[number] = char
    number += 1
# count the number of each character in a dataset
def num_chars(dataset, index2char):
    chars = {}
    for _, label in dataset:
        char = index2char[int(torch.argmax(label))]
        # update
        if char in chars:
            chars[char] += 1
        # initialize
        else:
            chars[char] = 1
    return chars
# Create dataloader objects

# shuffle all the data
random.shuffle(data)

# batch sizes for train, test, and validation
batch_size_train = 30
batch_size_test = 30
batch_size_validation = 30

# split the data into training, test, and validation sets
# (adjust these indices once more data is available)
train_dataset = data[:22000]
test_dataset = data[22000:24400]
validation_dataset = data[24400:]

# create the dataloader objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size_train, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size_test, shuffle=False)
validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=batch_size_validation,
                                                shuffle=True)

# check if a dataset is missing a char
test_chars = num_chars(test_dataset, index2char)
num = 0
for char in characters:
    if char in test_chars:
        num += 1
    else:
        break
print(num)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()  # call super().__init__() before assigning attributes
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.block1 = nn.Sequential(
            # 3x28x28
            nn.Conv2d(in_channels=3,
                      out_channels=16,
                      kernel_size=5,
                      stride=1,
                      padding=2),
            # batch normalization
            # nn.BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True),
            # 16x28x28
            nn.MaxPool2d(kernel_size=2),
            # 16x14x14
            nn.LeakyReLU()
        )
        # 16x14x14
        self.block2 = nn.Sequential(
            nn.Conv2d(in_channels=16,
                      out_channels=32,
                      kernel_size=5,
                      stride=1,
                      padding=2),
            # batch normalization
            # nn.BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True),
            # 32x14x14
            nn.MaxPool2d(kernel_size=2),
            # 32x7x7
            nn.LeakyReLU()
        )
        # fully connected layers
        self.block3 = nn.Sequential(
            nn.Linear(32 * 7 * 7, 100),
            # batch normalization
            # nn.BatchNorm1d(100),
            nn.LeakyReLU(),
            nn.Linear(100, 37)
        )
        # 1x37

    def forward(self, x):
        out = self.block1(x)
        out = self.block2(out)
        # flatten the feature maps
        out = out.view(-1, 32 * 7 * 7)
        out = self.block3(out)
        return out
# convolutional neural network model
model = CNN()
model.cuda()

# print a summary of the model to check that everything is fine
print(model)
print("# parameters: ", sum([param.nelement() for param in model.parameters()]))

# setting the learning rate
learning_rate = 1e-4
# cross-entropy loss
criterion = nn.CrossEntropyLoss()
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# lists of all training and validation losses
train_losses = []
validation_losses = []

# iterate over all the epochs
num_epochs = 20
for epoch in range(num_epochs):
    # variables to keep track of the loss and number of iterations
    train_loss = 0
    num_iter_train = 0
    # train the model
    model.train()
    # Iterate over train_loader
    for i, (images, labels) in enumerate(train_loader):
        # permute so that the images are of size 3x28x28,
        # as required to feed them into the model
        images = images.permute(0, 3, 1, 2)
        # Zero the gradient buffer so gradients don't accumulate across batches
        optimizer.zero_grad()
        # Forward pass, get output
        outputs = model(images)
        # convert the labels from one-hot encoding vectors into integer values
        labels = labels.view(-1, 37)
        y_true = torch.argmax(labels, 1)
        # calculate training loss
        loss = criterion(outputs, y_true)
        # Backward pass (computes all the gradients)
        loss.backward()
        # Optimize: update the weights using the gradients
        optimizer.step()
        # update the training loss and number of iterations
        train_loss += loss.data
        num_iter_train += 1

    print('Epoch: {}'.format(epoch + 1))
    print('Training Loss: {:.4f}'.format(train_loss / num_iter_train))
    # append the training loss for this epoch
    train_losses.append(train_loss / num_iter_train)

    # evaluate the model
    model.eval()
    validation_loss = 0
    num_iter_validation = 0
    # Iterate over validation_loader
    for i, (images, labels) in enumerate(validation_loader):
        # permute so that the images are of size 3x28x28
        images = images.permute(0, 3, 1, 2)
        # Forward pass, get output
        outputs = model(images)
        # convert the labels from one-hot encoding vectors to integer values
        labels = labels.view(-1, 37)
        y_true = torch.argmax(labels, 1)
        # calculate the validation loss
        loss = criterion(outputs, y_true)
        validation_loss += loss.data
        num_iter_validation += 1

    print('Validation Loss: {:.4f}'.format(validation_loss / num_iter_validation))
    validation_losses.append(validation_loss / num_iter_validation)

    num_iter_test = 0
    correct = 0
    # Iterate over test_loader
    for images, labels in test_loader:
        # permute so that the images are of size 3x28x28
        images = images.permute(0, 3, 1, 2)
        # Forward pass
        outputs = model(images)
        # convert the labels from one-hot encoding vectors into integer values
        labels = labels.view(-1, 37)
        y_true = torch.argmax(labels, 1)
        # index of the predicted class
        y_pred = torch.argmax(outputs, 1).type('torch.FloatTensor')
        # convert to FloatTensor
        y_true = y_true.type('torch.FloatTensor')
        # count correct predictions
        correct += torch.sum(torch.eq(y_true, y_pred).type('torch.FloatTensor'))
    print('Accuracy on the test set: {:.4f}%'.format(correct / len(test_dataset) * 100))
    print()
# learning curve function
def plot_learning_curve(train_losses, validation_losses):
    # plot the training and validation losses
    plt.ylabel('Loss')
    plt.xlabel('Number of Epochs')
    plt.plot(train_losses, label="training")
    plt.plot(validation_losses, label="validation")
    plt.legend(loc=1)

# plot the learning curve
plt.title("Learning Curve (Loss vs Number of Epochs)")
plot_learning_curve(train_losses, validation_losses)
plt.show()  # render the figure when running as a script

torch.save(model.state_dict(), "model1.pth")
I'm also using a trusty RTX 2070 and this is how I do GPU acceleration (for 1 GPU):
cuda_ = "cuda:0"
device = torch.device(cuda_ if torch.cuda.is_available() else "cpu")
model = CNN()
model.to(device)
This is the most up-to-date and recommended way to do GPU acceleration, as it gives more flexibility (no need to amend the code even when a GPU isn't available). You would do the same to move your images into GPU VRAM, via images = images.to(device).
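Applied to the training loop in the question, the same pattern means moving each batch to the device too (labels included, since the loss needs all tensors on one device). A sketch of the relevant part:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)

for i, (images, labels) in enumerate(train_loader):
    images = images.permute(0, 3, 1, 2)
    images = images.to(device)   # batch now lives in GPU VRAM
    labels = labels.to(device)   # targets must be on the same device
    optimizer.zero_grad()
    outputs = model(images)
    y_true = torch.argmax(labels.view(-1, 37), 1)
    loss = criterion(outputs, y_true)
    loss.backward()
    optimizer.step()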

Cuda error: device side assert triggered - only after certain number of batches

I am trying to put a dataset through a neural network. It is running on a Google Cloud virtual machine using a Tesla V100 GPU. However, before I can finish training a single epoch, I get an error message: "Cuda error: device side assert triggered". I think the problem may be in my data, but I have no idea where and I'm not sure what the problem is exactly (but I tested the code with a different dataset and it ran fine).
The thing that is odd is that the network actually runs for some time before triggering the error. I had it print every time it finished a batch and sometimes it finishes 60+ batches, sometimes 80+, I've even gotten it to finish as many as 140 batches (given the size of my data and my batches, there are 200 batches in each epoch). No matter how many it finishes, it eventually triggers this error and has not completed an epoch.
I tried setting CUDA_LAUNCH_BLOCKING = 1 and did not get any better error message. I of course made sure the neural network has the right number of input and output parameters (this is a given, because it works for the first however many batches). I also standardized the inputs: some were really large and some were close to zero, so I normalized them all to fall in the range [-1, 1]. The network should certainly be able to handle that, but it still causes a problem.
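One caveat (an addition, not from the original post): CUDA_LAUNCH_BLOCKING only takes effect as an environment variable set before CUDA is initialized; the plain Python variable of that name in the loop below has no effect. Something like:

import os
# must be set before torch initializes CUDA (ideally before `import torch`)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"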
Here is my training loop, which WORKS with a different data set. It is always the line loss.backward() that eventually triggers the error message.
CUDA_LAUNCH_BLOCKING = 1  # no effect as a plain variable; see the note above
start = time.time()
for epoch in range(1, 6):
    # Decrease the learning rate at epochs 3 and 5
    if epoch == 3 or epoch == 5:
        lr = lr / 3
    # Set up the optimizer
    optimizer = optim.SGD(net.parameters(), lr=lr)
    # Initialize stats to zero to track the network's progress
    running_loss = 0
    running_error = 0
    num_batches = 0
    # Shuffle indices to train in a random order
    shuffled_indices = torch.randperm(50000)
    for count in range(0, 50000, bs):
        # Clear gradients before each iteration
        optimizer.zero_grad()
        # Set up indices for the minibatch (wrapping around at the end)
        if count + bs > 50000:
            indices_list = shuffled_indices[count:].tolist() + shuffled_indices[:(count + bs) - 50000].tolist()
            indices = torch.Tensor(indices_list)
        else:
            indices = shuffled_indices[count:count + bs]
        # Create the minibatch
        minibatch_data = train_data[indices]
        minibatch_label = train_label[indices]
        # Send the minibatch to the GPU for training
        minibatch_data = minibatch_data.to(device)
        minibatch_label = minibatch_label.to(device)
        temp = minibatch_data - mean
        # Standardize entries with the mean and std
        inputs = ((minibatch_data - mean) / std).view(bs, 33)
        # Begin tracking changes
        inputs.requires_grad_()
        # Forward the inputs through the network
        scores = net(inputs)
        print(scores[:2])
        print(minibatch_label)
        # Compute the loss
        loss = criterion(scores, minibatch_label)
        # Backpropagate through the network
        loss.backward()
        # Do one step of stochastic gradient descent
        optimizer.step()
        # Update summary statistics
        with torch.no_grad():
            num_batches += 1
            error = get_error(scores, minibatch_label)
            running_error += error
            running_loss += loss.item()
        print("success: ", num_batches)
    # At the end of each epoch, compute and print summary statistics
    total_error = running_error / num_batches
    avg_loss = running_loss / num_batches
    print('Epoch: ', epoch)
    print('Time: ', time.time(), '\t Loss: ', avg_loss, '\t Error (%): ', total_error * 100)
Here is my dataset formatting and normalizing:
train_list_updated = []
train_label_list = []
for entry in train_list[1:]:
    entry[0] = string_to_int(entry[0])
    entry[1] = handedness[entry[1]]
    entry[2] = string_to_int(entry[2])
    entry[3] = handedness[entry[3]]
    entry[4] = string_to_int(entry[4])
    entry[5] = string_to_int(entry[5])
    entry[6] = string_to_int(entry[6])
    entry[17] = entry[17].replace(':', '')
    entry[-3] = pitch_types[entry[-3]]
    entry[-2] = pitch_outcomes[entry[-2]]
    train_label_list.append(entry[-2])
    del entry[-1]
    del entry[-1]
    del entry[-3]
    train_list_updated.append(entry)

final_train_list = []
for entry in train_list_updated:
    for index in range(len(entry)):
        try:
            entry[index] = float(entry[index])
        except:
            entry[index] = 0.
    final_train_list.append(entry)

# Do the same for the test data
test_list_updated = []
for entry in test_list[1:]:
    entry[0] = string_to_int(entry[0])
    entry[1] = handedness[entry[1]]
    entry[2] = string_to_int(entry[2])
    entry[3] = handedness[entry[3]]
    entry[4] = string_to_int(entry[4])
    entry[5] = string_to_int(entry[5])
    entry[6] = string_to_int(entry[6])
    entry[17] = entry[17].replace(':', '')
    entry[-3] = pitch_types[entry[-3]]
    del entry[-1]
    del entry[-1]
    del entry[-3]
    test_list_updated.append(entry)

final_test_list = []
for entry in test_list_updated:
    for index in range(len(entry)):
        try:
            entry[index] = float(entry[index])
        except:
            entry[index] = 0.
    final_test_list.append(entry)

# Create tensors of train and test data
train_data = torch.tensor(final_train_list)
train_label = torch.tensor(train_label_list)
test_data = torch.tensor(final_test_list)
And normalizing:
max_indices = torch.argmax(train_data, dim=0)
min_indices = torch.argmin(train_data, dim=0)
max_values = []
min_values = []
for i in range(33):
    max_idx = max_indices[i].item()
    min_idx = min_indices[i].item()
    max_val = train_data[max_idx][i]
    min_val = train_data[min_idx][i]
    max_values.append(max_val)
    min_values.append(min_val)
max_values = torch.Tensor(max_values)
min_values = torch.Tensor(min_values)
ranges = max_values - min_values
min_values = min_values.view(1, 33)
min_values = torch.repeat_interleave(min_values, 582205, dim=0)
ranges = ranges.view(1, 33)
ranges = torch.repeat_interleave(ranges, 582205, dim=0)
# scale each column to [-1, 1]
train_data = train_data - min_values
train_data = 2 * (train_data / ranges)
train_data = train_data - 1
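As an aside (a sketch, not from the original post), the argmax/repeat_interleave dance above can be replaced by broadcasting, since PyTorch broadcasts a (33,) row against an (N, 33) tensor:

# Equivalent min-max scaling to [-1, 1] via broadcasting
max_values = train_data.max(dim=0).values   # shape (33,)
min_values = train_data.min(dim=0).values
ranges = (max_values - min_values).clamp(min=1e-8)  # guard constant columns against division by zero
train_data = 2 * (train_data - min_values) / ranges - 1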
And here's my net (a lot is commented out, since I thought maybe there was an issue with the gradient zeroing or something; a five-layer neural network should definitely not cause a problem, though):
"""
DEFINING A NEURAL NETWORK
"""
# Define a fifteen layer artificial neural network
class fifteen_layer_net(nn.Module):
def __init__(self):
super().__init__()
self.linear1 = nn.Linear(33, 200)
self.linear2 = nn.Linear(200, 250)
self.linear3 = nn.Linear(250, 300)
self.linear4 = nn.Linear(300, 350)
self.linear5 = nn.Linear(350, 7)
# self.linear6 = nn.Linear(400, 450)
# self.linear7 = nn.Linear(450, 500)
# self.linear8 = nn.Linear(500, 450)
# self.linear9 = nn.Linear(450, 400)
# self.linear10 = nn.Linear(400, 350)
# self.linear11 = nn.Linear(350, 300)
# self.linear12 = nn.Linear(300, 250)
# self.linear13 = nn.Linear(250, 200)
# self.linear14 = nn.Linear(200, 150)
# self.linear15 = nn.Linear(150, 7)
def forward(self, x):
x = self.linear1(x)
x = F.relu(x)
x = self.linear2(x)
x = F.relu(x)
x = self.linear3(x)
x = F.relu(x)
x = self.linear4(x)
x = F.relu(x)
scores = self.linear5(x)
# x = F.relu(x)
# x = self.linear6(x)
# x = F.relu(x)
# x = self.linear7(x)
# x = F.relu(x)
# x = self.linear8(x)
# x = F.relu(x)
# x = self.linear9(x)
# x = F.relu(x)
# x = self.linear10(x)
# x = F.relu(x)
# x = self.linear11(x)
# x = F.relu(x)
# x = self.linear12(x)
# x = F.relu(x)
# x = self.linear13(x)
# x = F.relu(x)
# x = self.linear14(x)
# x = F.relu(x)
# scores = self.linear15(x)
return scores
The network should output scores, compute a loss using the cross-entropy loss criterion, and then do one step of stochastic gradient descent. This works for a while and then mysteriously breaks. I have no idea why.
Any help is greatly appreciated.
Thanks in advance.
I was also facing the same issue. You can try a few things:
Make sure there are no NaN or inf values in your dataset.
Set your batch size so that number of samples % batch_size == 0.
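A minimal sketch of those checks, using train_data / train_label from the question. Checking the label range is also worth doing here, since CrossEntropyLoss triggers exactly this kind of device-side assert when a target falls outside [0, num_classes - 1]:

import torch

assert not torch.isnan(train_data).any(), "NaN values in training data"
assert not torch.isinf(train_data).any(), "inf values in training data"
# the network above has 7 output classes, so labels must be in [0, 6]
print("label range:", train_label.min().item(), "to", train_label.max().item())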

How to test my neural network developed with TensorFlow?

I've just finished writing a neural net with TensorFlow.
Attached code:
import tensorflow as tensorFlow
import csv

# read data from csv
file = open('stub.csv')
reader = csv.reader(file)
temp = list(reader)
del temp[0]

# change data from string to float (Tensorflow)
# create data & goal lists
data = []
goal = []
for i in range(len(temp)):
    data.append(map(float, temp[i]))  # in Python 3 this would need list(map(...))
    goal.append([data[i][6], 0.0])
    del data[i][6]

# change lists to tuples
data = tuple(tuple(x) for x in data)
goal = tuple(goal)

# split into training, validation, and test data: 60/20/20
a = int(len(data) * 0.6)  # training set: 60%
b = int(len(data) * 0.8)  # validation & test: 20% each
trainData = data[0:a]
validationData = data[b:len(data)]
testData = data[a:b]
trainGoal = goal[0:a]
validationGoal = goal[b:len(data)]
testGoal = goal[a:b]

numberOfLayers = 500
nodesLayer = []
# define the number of nodes in each hidden layer
for i in range(numberOfLayers):
    nodesLayer.append(500)

# define our goal classes
classes = 2
batchSize = 2000

# x for input, y for output
sizeOfRow = len(data[0])
x = tensorFlow.placeholder(dtype=tensorFlow.float32, shape=[None, sizeOfRow])
y = tensorFlow.placeholder(dtype=tensorFlow.float32, shape=[None, classes])

hiddenLayers = []
layers = []
def neuralNetworkModel(x):
    # first step: (input * weights) + bias, a linear operation like y = ax + b
    # each connection between layers is represented by nodes(i) * nodes(i+1) weights
    for i in range(0, numberOfLayers):
        if i == 0:
            hiddenLayers.append({"weights": tensorFlow.Variable(tensorFlow.random_normal([sizeOfRow, nodesLayer[i]])),
                                 "biases": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i]]))})
        elif 0 < i < numberOfLayers - 1:
            hiddenLayers.append({"weights": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], nodesLayer[i + 1]])),
                                 "biases": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i + 1]]))})
        else:
            outputLayer = {"weights": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], classes])),
                           "biases": tensorFlow.Variable(tensorFlow.random_normal([classes]))}
    # create the layers
    for i in range(numberOfLayers):
        if i == 0:
            layers.append(tensorFlow.add(tensorFlow.matmul(x, hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
            # pass values through the activation function (e.g. ReLU) and add to the layer list
            layers.append(tensorFlow.nn.relu(layers[i]))
        elif 0 < i < numberOfLayers - 1:
            layers.append(tensorFlow.add(tensorFlow.matmul(layers[i - 1], hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
            layers.append(tensorFlow.nn.relu(layers[i]))
    output = tensorFlow.matmul(layers[numberOfLayers - 1], outputLayer["weights"]) + outputLayer["biases"]
    return output
def neuralNetworkTrain(data, x, y):
    prediction = neuralNetworkModel(x)
    # using the softmax function, normalize values to the range (0, 1)
    cost = tensorFlow.reduce_mean(tensorFlow.nn.softmax_cross_entropy_with_logits(prediction, y))
    # minimize the cost function using the Adadelta algorithm
    optimizer = tensorFlow.train.AdadeltaOptimizer().minimize(cost)
    epochs = 2  # one forward pass + backpropagation over the data = one epoch
    # build a session and train the model
    with tensorFlow.Session() as sess:
        sess.run(tensorFlow.initialize_all_variables())
        for epoch in range(epochs):
            epochLoss = 0
            i = 0
            for _ in range(int(len(data) / batchSize)):
                ex, ey = nextBatch(i)  # takes batchSize examples
                i += 1
                feedDict = {x: ex, y: ey}
                # run the session to optimize the cost function
                _, cos = sess.run([optimizer, cost], feed_dict=feedDict)
                epochLoss += cos
            print("Epoch", epoch + 1, "completed out of", epochs, "loss:", epochLoss)
        correct = tensorFlow.equal(tensorFlow.argmax(prediction, 1), tensorFlow.argmax(y, 1))
        accuracy = tensorFlow.reduce_mean(tensorFlow.cast(correct, "float"))
        print("Accuracy:", accuracy.eval({x: trainData, y: trainGoal}))
# takes batchSize examples each iteration
def nextBatch(num):
    """Return the next `batchSize` examples from this data set."""
    num *= batchSize
    if num < (len(data) - batchSize):
        return data[num: num + batchSize], goal[num: num + batchSize]

neuralNetworkTrain(trainData, x, y)
Each epoch (iteration) I get the value of the loss function, and all is good.
Now I want to try the model on my validation/test set.
Does someone know what exactly I should do?
Thanks
If you want to get predictions on the trained data you can simply put something like:
tf_p = tf.nn.softmax(prediction)
...In your graph, having loaded your test data into x_test. Then evaluate predictions with:
[p] = session.run([tf_p], feed_dict={
    x: x_test,
    y: y_test
})
at the end of your neuralNetworkTrain method, and you should end up having them in p.
...Or using tf.train.Saver:
Alternatively you could use tf.train.Saver object to save and restore (and optionally persist) your model. In order to do that you create a saver after you initialise all variables:
...
tf.initialize_all_variables().run()
saver = tf.train.Saver()
...
And then save it once you're done training, at the end of your neuralNetworkTrain method:
...
model_path = saver.save(sess, "./model.ckpt")  # a save path is required; "./model.ckpt" is just an example
You then build a new graph for evaluation, and restore the model before running it on your test data:
# Load test dataset into X_test
...
tf_x = tf.constant(X_test)
tf_p = tf.nn.softmax(neuralNetworkModel(tf_x))
with tf.Session() as session:
    tf.initialize_all_variables().run()
    saver.restore(session, model_path)
    p = tf_p.eval()
And, once again, p should contain softmax activations for your test dataset.
(I haven't actually run this code I'm afraid, but it should give you an idea of how to implement it.)
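As a follow-up sketch (assuming p now holds the softmax outputs and testGoal the two-element goal vectors from the question), test accuracy can be computed with NumPy:

import numpy as np

pred_classes = np.argmax(p, axis=1)                   # predicted class per test row
true_classes = np.argmax(np.array(testGoal), axis=1)  # index of the larger entry in each goal vector
print("Test accuracy:", np.mean(pred_classes == true_classes))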
