Autoencoder to encode features/categories of data

Autoencoder to encode features/categories of data - python

My question is regarding the use of autoencoders (in PyTorch). I have a tabular dataset with a categorical feature that has 10 different categories. Names of these categories are quite different - some names consist of one word, some of two or three words. But all in all I have 10 unique category names. What I'm trying to do is to create an autoencoder which will encode names of these categories - for example, if I have a category named 'Medium size class', I want to see if it is possible to train autoencoder to encode this name as something like 'mdmsc' or something like that. The use of it would be to found out which data points are hard to encode or not typical or something like that. I tried to adapt autoencoder architectures from various tutorials online however nothing seems to work for me or I simply do not know how to use them as they are all about images. Maybe someone has any idea how this type of autoencoder might be accomplished if it is at all possible?
Edit: here's the model I have so far (I just tried to adapt some architectures I found online):
class Autoencoder(nn.Module):
def __init__(self, input_shape, encoding_dim):
super(Autoencoder, self).__init__()
self.encode = nn.Sequential(
nn.Linear(input_shape, 128),
nn.ReLU(True),
nn.Linear(128, 64),
nn.ReLU(True),
nn.Linear(64, encoding_dim),
)
self.decode = nn.Sequential(
nn.Linear(encoding_dim, 64),
nn.ReLU(True),
nn.Linear(64, 128),
nn.ReLU(True),
nn.Linear(128, input_shape)
)
def forward(self, x):
x = self.encode(x)
x = self.decode(x)
return x
model = Autoencoder(input_shape=10, encoding_dim=5)
And also I use LabelEncoder() and then OneHotEncoder()to give these features/categories I mentioned numerical form. However, after training, output is the same as was input (no changes on the category name) but when I try to use only encoder part I'm unable to apply LabelEncoder() and then OneHotEncoder() because of dimension issues. I feel like maybe I can do something differently at the beginning, then I try to give those features numerical form, however I'm not sure what should I do.

First you will need to set up a train_loader depending on your data that will iterate over your data points.
Then you need to figure out what kind of loss you are going to use and optimizer:
# mean-squared error loss
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=.001) #learning rate depend on your task
Once you have that ready, you can train your autoencoder with basic steps:
for epoch in range(epochs):
for features in train_loader:
optimizer.zero_grad()
outputs = model(batch_features)
train_loss = criterion(outputs, features)
train_loss.backward()
optimizer.step()
Once model is done training you can examine embeddings using:
embedding = model.encode(your_input)

Related

PyTorch: VGG16 model not deterministic when ResNet18 model is

I currently try to learn two models (VGG16 ad ResNet18) on two Datasets (MNIST and CIFAR10). The goal here is to later test the effect different changes (like another loss function, or a manipulated dataset) have on the accuracy of the model. To make my results comparable I tried to make the learning process deterministic. To achieve this I set a fixed see for all the random generators with the following code.
def update_seed(seed):
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True
os.environ['PYTHONHASHSEED'] = str(seed)
And for the ResNet18 model this works perfectly fine (The results are deterministic). But for the VGG16 model this does not work. And that is the point I don't understand, why is the above enough for ResNet18 to be deterministic, but not for VGG16?
So where is this extra randomness for VGG16 coming from and how can I disable it?
To get VGG16 deterministic I currently have to disable cuda and use the cpu only, but this makes the whole computing process very slow and is therefor not really an option.
The only difference between the two models is loading seen below and the learning rate when using CIFAR10.
def setup_vgg16(is_mnist_used):
vgg16_model = models.vgg16()
if is_mnist_used:
vgg16_model.features[0] = nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1)
vgg16_model.classifier[-1] = nn.Linear(4096, 10, bias=True)
return vgg16_model
def setup_resnet(is_mnist_used):
resnet_model = models.resnet18()
if is_mnist_used:
resnet_model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
resnet_model.fc = nn.Linear(512, 10, bias=True)
return resnet_model
What I have already tried (but with no success):
Adding bias=False to the VGG16 model, as it is the obvious difference between the two models
Testing the model before the learning (maybe the model is initiated with random values), but without learning the model is deterministic
Adding more stuff to the update_seed(seed) function
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.enabled = False (These two just decreases the
performance)
torch.use_deterministic_algorithms(True) -> This results in a cuda
error
Set num_worker=0 in the dataloader (this was suggested as a workaround for a similar problem in another thread)
This is the training function. Before this function the model is deterministic and after it is called for the first time, VGG16 is no longer deterministic.
def train_loop(dataloader, f_model, f_loss_fn, f_optimizer):
# setting the model into the train mode
f_model.train()
for batch, (x, y) in tqdm(enumerate(dataloader)):
# Moving the data to the same device as the model
x, y = x.to(device), y.to(device)
# Compute prediction and loss
pred = f_model(x)
loss = f_loss_fn(pred, y)
# Backpropagation
f_optimizer.zero_grad()
loss.backward()
f_optimizer.step()

I think that's because torchvision's VGG models use AdaptiveAvgPool2d, and AdaptiveAvgPool2d cannot be made non-deterministic and will throw a runtime error when used together with torch.use_deterministic_algorithms(True).

The difference of loading model parameters between load_state_dict and nn.Parameter in pytorch

When I wanna assign part of pre-trained model parameters to another module defined in a new model of PyTorch, I got two different outputs using two different methods.
The Network is defined as follows:
class Net:
def __init__(self):
super(Net, self).__init__()
self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
self.freeze_model(self.resnet)
self.classifier = nn.Sequential(
nn.Dropout(),
nn.Linear(512, 256),
nn.ReLU(),
nn.Linear(256, 3),
)
def forward(self, x):
out = self.resnet(x)
out = out.flatten(start_dim=1)
out = self.classifier(out)
return out
What I want is to assign pre-trained parameters to classifier in the net module. Two different ways were used for this task.
# First way
net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))
# Second way
params = torch.load('model_CNN_pretrained.ptl')
net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad =False)
net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad =False)
net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad =False)
net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad =False)
The parameters were assigned correctly but got two different outputs from the same input data. The first method works correctly, but the second doesn't work well. Could some guys point what the difference of these two methods?

Finally, I find out where is the problem.
During the pre-trained process, buffer parameters in BatchNorm2d Layer of ResNet18 model were changed even if we set require_grad of parameters False. Buffer parameters were calculated by the input data after model.train() was processed, and unchanged after model.eval().
There is a link about how to freeze the BN layer.
How to freeze BN layers while training the rest of network (mean and var wont freeze)

Why is my loss trending down while my accuracy is going to zero?

I am trying to practice my machine learning skills with Tensorflow/Keras but I am having trouble around fitting the model. Let me explain what I've done and where I'm at.
I am using the dataset from Kaggle's Costa Rican Household Poverty Level Prediction Challenge
Since I am just trying to get familiar with the Tensorflow workflow, I cleaned the dataset by removing a few columns that had a lot of missing data and then filled in the other columns with their mean. So there are no missing values in my dataset.
Next I loaded the new, cleaned, csv in using make_csv_dataset from TF.
batch_size = 32
train_dataset = tf.data.experimental.make_csv_dataset(
'clean_train.csv',
batch_size,
column_names=column_names,
label_name=label_name,
num_epochs=1)
I set up a function to return my compiled model like so:
f1_macro = tfa.metrics.F1Score(num_classes=4, average='macro')
def get_compiled_model():
model = tf.keras.Sequential([
tf.keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(137,)), # input shape required
tf.keras.layers.Dense(256, activation=tf.nn.relu),
tf.keras.layers.Dense(4, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=[f1_macro, 'accuracy'])
return model
model = get_compiled_model()
model.fit(train_dataset, epochs=15)
Below is the result of that
A link to my notebook is Here
I should mention that I strongly based my implementation on Tensorflow's iris data walkthrough
Thank you!

After a while, I was able to find the issues with your code they are in the order of importance. (First is of highest importance)
You are doing multi-class classification (not binary classification). Therefore your loss should be categorical_crossentropy.
You are not onehot encoding your labels. Using binary_crossentropy and having labels as a numerical ID is definitely not the way forward. Instead, you should do onehot encode your labels and solve this like a multi-class classification problem. Here's how you do that.
def pack_features_vector(features, labels):
"""Pack the features into a single array."""
features = tf.stack(list(features.values()), axis=1)
return features, tf.one_hot(tf.cast(labels-1, tf.int32), depth=4)
Normalizing your data. If you look at your training data. They are not normalized. And their values are all over the place. Therefore, you should consider normalizing your data by doing something like below. This is just for demonstration purposes. You should read about Scalers in scikit learn and choose what's best for you.
x = train_df[feature_names].values #returns a numpy array
min_max_scaler = preprocessing.StandardScaler()
x_scaled = min_max_scaler.fit_transform(x)
train_df = pd.DataFrame(x_scaled)
These issues should set your model straight.

Adding batch normalization decreases the performance

I'm using PyTorch to implement a classification network for skeleton-based action recognition. The model consists of three convolutional layers and two fully connected layers. This base model gave me an accuracy of around 70% in the NTU-RGB+D dataset. I wanted to learn more about batch normalization, so I added a batch normalization for all the layers except for the last one. To my surprise, the evaluation accuracy dropped to 60% rather than increasing But the training accuracy has increased from 80% to 90%. Can anyone say what am I doing wrong? or Adding batch normalization need not increase the accuracy?
The model with batch normalization
class BaseModelV0p2(nn.Module):
def __init__(self, num_person, num_joint, num_class, num_coords):
super().__init__()
self.name = 'BaseModelV0p2'
self.num_person = num_person
self.num_joint = num_joint
self.num_class = num_class
self.channels = num_coords
self.out_channel = [32, 64, 128]
self.loss = loss
self.metric = metric
self.bn_momentum = 0.01
self.bn_cv1 = nn.BatchNorm2d(self.out_channel[0], momentum=self.bn_momentum)
self.conv1 = nn.Sequential(nn.Conv2d(in_channels=self.channels, out_channels=self.out_channel[0],
kernel_size=3, stride=1, padding=1),
self.bn_cv1,
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2))
self.bn_cv2 = nn.BatchNorm2d(self.out_channel[1], momentum=self.bn_momentum)
self.conv2 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[0], out_channels=self.out_channel[1],
kernel_size=3, stride=1, padding=1),
self.bn_cv2,
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2))
self.bn_cv3 = nn.BatchNorm2d(self.out_channel[2], momentum=self.bn_momentum)
self.conv3 = nn.Sequential(nn.Conv2d(in_channels=self.out_channel[1], out_channels=self.out_channel[2],
kernel_size=3, stride=1, padding=1),
self.bn_cv3,
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2))
self.bn_fc1 = nn.BatchNorm1d(256 * 2, momentum=self.bn_momentum)
self.fc1 = nn.Sequential(nn.Linear(self.out_channel[2]*8*3, 256*2),
self.bn_fc1,
nn.ReLU(),
nn.Dropout2d(p=0.5)) # TO check
self.fc2 = nn.Sequential(nn.Linear(256*2, self.num_class))
def forward(self, input):
list_bn_layers = [self.bn_fc1, self.bn_cv3, self.bn_cv2, self.bn_cv1]
# set the momentum of the batch norm layers to given momentum value during trianing and 0 during evaluation
# ref: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561
# ref: https://github.com/pytorch/pytorch/issues/4741
for bn_layer in list_bn_layers:
if self.training:
bn_layer.momentum = self.bn_momentum
else:
bn_layer.momentum = 0
logits = []
for i in range(self.num_person):
out = self.conv1(input[:, :, :, :, i])
out = self.conv2(out)
out = self.conv3(out)
logits.append(out)
out = torch.max(logits[0], logits[1])
out = out.view(out.size(0), -1)
out = self.fc1(out)
out = self.fc2(out)
t = out
assert not ((t != t).any()) # find out nan in tensor
assert not (t.abs().sum() == 0) # find out 0 tensor
return out

My interpretation of the phenomenon you are observing,, is that instead of reducing the covariance shift, which is what the Batch Normalization is meant for, you are increasing it. In other words, instead of decrease the distribution differences between train and test, you are increasing it and that's what it is causing you to have a bigger difference in the accuracies between train and test. Batch Normalization does not assure better performance always, but for some problems it doesn't work well. I have several ideas that could lead to an improvement:
Increase the batch size if it is small, what would help the mean and std calculated in the Batch Norm layers to be more robust estimates of the population parameters.
Decrease the bn_momentum parameter a bit, to see if that also stabilizes the Batch Norm parameters.
I am not sure you should set bn_momentum to zero when test, I think you should just call model.train() when you want to train and model.eval() when you want to use your trained model to perform inference.
You could alternatively try Layer Normalization instead of Batch Normalization, cause it does not require accumulating any statistic and usually works well
Try regularizing a bit your model using dropout
Make sure you shuffle your training set in every epoch. Not shuffling the data set may lead to correlated batches that make the statistics in batch normalization cycle. That may impact your generalization
I hope any of these ideas work for you

The problem may be with your momentum. I see you are using 0.01.
Here is how I tried different betas to fit to points with momentum and with beta=0.01 I got bad results. Usually beta=0.1 is used.

It's almost happen because of two major reasons 1.non-stationary training'procedure and 2.train/test different distributions
If It's possible try other regularization technique's like Drop-out,I face to this problem and i found that my test and train distribution might be different so after i remove BN and use drop-out instead, got the reasonable result. read this for more
Use nn.BatchNorm2d(out_channels, track_running_stats=False) this disables the running statistics of the batches and uses the current batch’s mean and variance to do the normalization
In Training mode run some forward passes on data in with torch.no_grad() block. this stabilize the running_mean / running_std values
Use same batch_size in your dataset for both model.train() and model.eval()
Increase momentum of the BN. This means that the means and stds learned will be much more stable during the process of training
this is helpful whenever you use pre-trained model
for child in model.children():
for ii in range(len(child)):
if type(child[ii])==nn.BatchNorm2d:
child[ii].track_running_stats = False

Implementing a many-to-many regression task

Sorry if I present my problem not clearly, English is not my first language
Problem
Short description:
I want to train a model which map input x (with shape of [n_sample, timestamp, feature]) to an output y (with exact same shape). It's like mapping 2 space
Longer version:
I have 2 float ndarrays of shape [n_sample, timestamp, feature], representing MFCC feature of n_sample audio file. These 2 ndarray are 2 speakers' speech of the same corpus, which was aligned by DTW. Lets name these 2 arrays x and y. I want to train a model, which predict y[k] given x[k]. It's like mapping from space x to space y, and the output must be exact same shape as the input
What I've tried
It's time-series problem so I decide to use RNN approach. Here is my code in PyTorch (I put comment along the code. I removed the calculation of average loss for simplicity). Note that I've tried many option for learning rate, the behavior still the same
Class define
class Net(nn.Module):
def __init__(self, in_size, hidden_size, out_size, nb_lstm_layers):
super().__init__()
self.in_size = in_size
self.hidden_size = hidden_size
self.out_size = out_size
self.nb_lstm_layers = nb_lstm_layers
# self.fc1 = nn.Linear()
self.lstm = nn.LSTM(input_size=self.in_size, hidden_size=self.hidden_size, num_layers=self.nb_lstm_layers, batch_first=True, bias=True)
# self.fc = nn.Linear(self.hidden_size, self.out_size)
self.fc1 = nn.Linear(self.hidden_size, 128)
self.fc2 = nn.Linear(128, 128)
self.fc3 = nn.Linear(128, self.out_size)
def forward(self, x, h_state):
out, h_state = self.lstm(x, h_state)
output_fc = []
for frame in out:
output_fc.append(self.fc3(torch.tanh(self.fc1(frame)))) # I added fully connected layer to each frame, to make an output with same shape as input
return torch.stack(output_fc), h_state
def hidden_init(self):
if use_cuda:
h_state = torch.stack([torch.zeros(nb_lstm_layers, batch_size, 20) for _ in range(2)]).cuda()
else:
h_state = torch.stack([torch.zeros(nb_lstm_layers, batch_size, 20) for _ in range(2)])
return h_state
Training step:
net = Net(20, 20, 20, nb_lstm_layers)
optimizer = optim.Adam(net.parameters(), lr=0.0001, weight_decay=0.0001)
criterion = nn.MSELoss()
for epoch in range(nb_epoch):
count = 0
loss_sum = 0
batch_x = None
for i in (range(len(data))):
# data is my entire data, which contain A and B i specify above.
temp_x = torch.tensor(data[i][0])
temp_y = torch.tensor(data[i][1])
for ii in range(0, data[i][0].shape[0] - nb_frame_in_batch*2 + 1): # Create batches
batch_x, batch_y = get_batches(temp_x, temp_y, ii, batch_size, nb_frame_in_batch)
# this will return 2 tensor of shape (batch_size, nb_frame_in_batch, 20),
# with `batch_size` is the number of sample each time I feed to the net,
# nb_frame_in_batch is the number of frame in each sample
optimizer.zero_grad()
h_state = net.hidden_init()
prediction, h_state = net(batch_x.float(), h_state)
loss = criterion(prediction.float(), batch_y.float())
h_state = (h_state[0].detach(), h_state[1].detach())
loss.backward()
optimizer.step()
Problem is, the loss seems not to decrease but fluctuate a lot, without a clear behaviour
Please help me. Any suggestion will be greatly appreciated. If somebody can inspect my code and provide some comment, that would be so kind.
Thanks in advance!

It seems the network learning nothing from your data, hence the loss fluctuation (since weights depends on random initialization only). There are something you can try:
Try to normalize the data (this suggestion is quite broad, but I can't give you more details since I don't have your data, but normalize it to a specific range like [0, 1], or to a mean and std value is worth trying)
One very typical problem of LSTM in pytorch is its input dimension is quite different to other type of neural network. You must feed into your network a tensor with shape (seq_len, batch, input_size). You should go here, LSTM section for better details
One more thing: try to tune your hyperparameters. LSTM is harder to train compare to FC or CNN (to my experience).
Tell me if you have improvement. Debugging a neural network is always hard and full of potential coding mistake

With most ML algorithms it is tough to diagnose without seeing the data. Based on the inconsistency of your loss results this might be an issue with your data pre-processing. Have you tried normalizing the data first? Often times with large fluctuations in results, one of your input neuron values may be skewing your loss function making it unable to find a good direction.
How to normalize a NumPy array to within a certain range?
This is an example for audio normalization but I would also try adjusting the learning rate as it looks high and possibly removing a hidden layer.

May the problem was in the calculation of the loss. Try to sum the losses of each time-step in a sequence and then take the average over the batch. May it helps

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Autoencoder to encode features/categories of data - python

Related

PyTorch: VGG16 model not deterministic when ResNet18 model is

The difference of loading model parameters between load_state_dict and nn.Parameter in pytorch

Why is my loss trending down while my accuracy is going to zero?

Adding batch normalization decreases the performance

Implementing a many-to-many regression task

Categories

Resources