I am trying to implement a neural network in PyTorch to predict h1_hemoglobin. Since this is regression, I set the output layer size to 1, but I get the error below and I'm not able to understand the mistake. Setting the output layer to a large value like 100 removes the error, but that renders the model useless since I am trying to do regression.
Data:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)
##### Creating Tensors
X_train=torch.tensor(X_train)
X_test=torch.tensor(X_test)
y_train=torch.LongTensor(y_train)
y_test=torch.LongTensor(y_test)
class ANN_Model(nn.Module):
    def __init__(self, input_features=4, hidden1=20, hidden2=20, out_features=1):
        super().__init__()
        self.f_connected1 = nn.Linear(input_features, hidden1)
        self.f_connected2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
    def forward(self, x):
        x = F.relu(self.f_connected1(x))
        x = F.relu(self.f_connected2(x))
        x = self.out(x)
        return x
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
epochs = 500
final_losses = []
for i in range(epochs):
    i = i + 1
    y_pred = model.forward(X_train.float())
    loss = loss_function(y_pred, y_train)
    final_losses.append(loss.item())
    if i % 10 == 1:
        print("Epoch number: {} and the loss: {}".format(i, loss.item()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Error:
Since you are performing regression, CrossEntropyLoss() is the wrong choice: internally it applies LogSoftmax followed by NLLLoss(). CrossEntropyLoss() expects C logits per sample for C classes, but you have specified only one output. NLLLoss() indexes into the prediction logits using the ground-truth value as a class index. In your case, for example, a ground-truth value of 14 makes the loss step try to pick the 14th logit of your predictions and compute the negative log likelihood on it, which is essentially -log(probability_k), where k is the class index given by the ground truth. Since your predictions contain only a single logit, this indexing fails with an index-out-of-bounds error.
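A tiny repro of that failure mode, with hypothetical values, assuming a single logit per sample and a ground-truth value of 14 as above:
import torch
import torch.nn as nn
logits = torch.randn(1, 1)              # one logit per sample, as in the model above
target = torch.tensor([14])             # ground-truth value treated as a class index
nn.CrossEntropyLoss()(logits, target)   # raises an index-out-of-bounds error for target 14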
For regression problems, you should consider using distance-based losses such as MSELoss().
Try replacing your loss function, loss_function = nn.CrossEntropyLoss(), with loss_function = nn.MSELoss().
Your response variable h1_hemoglobin looks like a continuous response variable. If that's the case, please change the torch tensor type for y_train and y_test from LongTensor to FloatTensor or DoubleTensor.
According to the PyTorch docs, CrossEntropyLoss is useful for classification problems with C classes. Try changing your loss_function from CrossEntropyLoss to one more suitable for your continuous response variable h1_hemoglobin.
In your case, the following might do it.
y_train=torch.DoubleTensor(y_train)
y_test=torch.DoubleTensor(y_test)
...
...
loss_function = nn.MSELoss()
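Putting the pieces together, here is a minimal sketch of the corrected regression setup (this is an illustration, not your exact code: I use float32 targets to match the model's default dtype, and reshape them to (N, 1) so they line up with the model output; otherwise MSELoss broadcasts the (N,) target against the (N, 1) prediction and warns):
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)  # match the (N, 1) model output
model = ANN_Model()
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(1, 501):
    y_pred = model(X_train)
    loss = loss_function(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 10 == 1:
        print("Epoch {}: loss {:.4f}".format(epoch, loss.item()))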
PyTorch MSELoss
PyTorch CrossEntropyLoss
I am quite new to PyTorch, and I am trying to use an object detection model for transfer learning in order to learn to detect objects in my new dataset.
Here is how I load the dataset:
train_dataset = MyDataset(train_data_path, 512, 512, train_labels_path, get_train_transform())
train_loader = DataLoader(train_dataset,batch_size=8,shuffle=True,num_workers=4,collate_fn=collate_fn)
valid_dataset = MyDataset(test_data_path, 512, 512, test_labels_path, get_valid_transform())
valid_loader = DataLoader(valid_dataset,batch_size=8, shuffle=False,num_workers=4,collate_fn=collate_fn)
I define the model and optimizer as follows:
# load Faster RCNN pre-trained model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="FasterRCNN_ResNet50_FPN_Weights.COCO_V1")
# get the number of input features
in_features = model.roi_heads.box_predictor.cls_score.in_features
# define a new head for the detector with the required number of classes
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model = model.to(DEVICE)
# get the model parameters
params = [p for p in model.parameters() if p.requires_grad]
# define the optimizer
# We are using the SGD optimizer with a learning rate of 0.001 and momentum of 0.9.
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0005)
I train the model as follows:
def train(train_data_loader, model, optimizer, train_loss_hist):
    global train_itr
    global train_loss_list
    prog_bar = tqdm(train_data_loader, total=len(train_data_loader), position=0, leave=True, ascii=True)
    # Then we have the for loop iterating over the batches.
    for i, data in enumerate(prog_bar):
        optimizer.zero_grad()
        images, targets = data
        images = list(image.to(DEVICE) for image in images)
        targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]
        # Forward pass
        loss_dict = model(images, targets)
        # Then we sum the losses and append the current iteration's loss value to train_loss_list.
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        # We also send the current loss value to train_loss_hist of the Averager class.
        train_loss_list.append(loss_value)
        train_loss_hist.send(loss_value)
        # Then we backpropagate the gradients and update parameters.
        losses.backward()
        optimizer.step()
        train_itr += 1
    return train_loss_list
Considering that I adapted code I found and I am not sure where the loss is defined (I have not defined any loss in the code, so I believe it uses the default loss that was used to train the original object detector), how can I train my network on such an imbalanced dataset, and how should I update my code?
It seems that you have two questions.
How to deal with an imbalanced dataset.
Note that Faster R-CNN is an anchor-based detector, which means the number of anchors containing an object is extremely small compared to the total number of anchors, so you don't need to do anything special to deal with the imbalanced dataset. Or you can use RetinaNet, which proposed a loss function called focal loss to improve performance on imbalanced datasets (a minimal sketch of focal loss is shown below).
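If you do want to experiment with focal loss yourself, here is an illustrative sketch of the binary (sigmoid) focal loss from the RetinaNet paper; this is not the torchvision implementation (torchvision ships its own as torchvision.ops.sigmoid_focal_loss), just a sketch of the idea:
import torch
import torch.nn.functional as F
def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits and targets have the same shape; targets are 0/1 floats
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()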
Where the loss function is.
torchvision integrates the loss functions inside the model object itself: when the model is called in training mode with targets, it returns a dictionary of losses. You can step through the torchvision source in a debugger to see the implementation details.
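To see this in action, here is a minimal sketch with a dummy image and target (the weights argument is illustrative): in training mode the torchvision detection model returns a dict of losses, and in eval mode it returns predictions.
import torch
import torchvision
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]),
            "labels": torch.tensor([1])}]
model.train()
loss_dict = model(images, targets)
print(loss_dict.keys())          # loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg
model.eval()
with torch.no_grad():
    detections = model(images)   # list of dicts with boxes, labels, scores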
I'm attempting feature extraction in an unorthodox way. I extract features in eval() mode so that the batch norm and dropout layers are switched off and the running means and standard deviations learned on ImageNet are used.
I use a feature extractor to extract features from two related images and concatenate the two feature tensors before passing them through a linear dense classifier for training. I'm wondering whether I can avoid using with torch.no_grad(), since the two models are unrelated.
Here is a simplified version:
num_classes = 2
num_epochs = 10
densenet = DenseNetConv()
# set densenet to eval to switch off batch norm and dropout layers and use ImageNet running means/std devs
densenet.eval()
densenet.to(device)
classifier = nn.Linear(4416, num_classes)
classifier.to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)  # classifier must be defined before this line
for epoch in range(num_epochs):
    classifier.train()
    for i, (inputs_1, inputs_2, labels) in enumerate(dataloaders_dict['train']):
        inputs_1 = inputs_1.to(device)
        inputs_2 = inputs_2.to(device)
        labels = labels.to(device)
        features_1 = densenet(inputs_1)  # extract features 1
        features_2 = densenet(inputs_2)  # extract features 2
        combined = torch.cat([features_1, features_2], dim=1)  # combine features
        combined = combined.view(-1, 4416)  # reshape
        optimizer.zero_grad()
        # Forward pass to get output/logits
        outputs = classifier(combined)
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        _, pred = torch.max(outputs, 1)
        equality_check = (labels.data == pred)
        # Getting gradients w.r.t. parameters
        loss.backward()
        optimizer.step()
As you can see, I do not use with torch.no_grad(), even though densenet.eval() is set on my separate feature extractor. Is there an issue with the way this is implemented, or can I assume that this will not interfere with the classifier model?
If you are doing inference on a model, applying torch.no_grad() won't have any effect on the resulting output. As you've said, only nn.Module.eval will, since it modifies how the forward operation is performed (namely which statistics are used to normalize the batch elements).
It is still recommended to switch off gradient computation when backpropagation is not necessary. This avoids caching activations during the forward call, resulting in faster inference and lower memory use.
In your case, you can either wrap your inference call on densenet with torch.no_grad:
with torch.no_grad():
    features_1 = densenet(inputs_1)  # extract features 1
    features_2 = densenet(inputs_2)  # extract features 2
Or alternatively, switch off the requires_grad flag on your module's parameter tensors using nn.Module.requires_grad_:
densenet.eval()
densenet.requires_grad_(False)
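With either option you can sanity-check that no autograd graph is recorded for the extractor's output. A quick check inside the training loop from the question, assuming the inputs come from a regular DataLoader and therefore do not require gradients:
densenet.eval()
densenet.requires_grad_(False)
features_1 = densenet(inputs_1)
# nothing upstream requires gradients, so no graph is built for the extractor
assert not features_1.requires_grad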
I have a text classification task that I am trying to do using BERT. Below is the code I am using. The model training code works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf
# recommended learning rate for Adam 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26
# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we will not overfit the model
number_of_epochs = 1
# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                        num_labels=nlabels,
                                                        output_attentions=False,
                                                        output_hidden_states=False)
# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)
# we do not have one-hot vectors, so we can use sparse categorical cross entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the output using the following
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the length of pred_labels_idx is not the same as the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is of type tf.data.Dataset and you call cardinality(...) on it, the cardinality in your case is simply the number of batches, not the number of samples. So I am assuming you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
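If you want to confirm this, here is a quick sketch (assuming ds_te_encoded is a batched tf.data.Dataset) that counts batches versus individual samples:
import tensorflow as tf
num_batches = tf.data.experimental.cardinality(ds_te_encoded).numpy()               # e.g. 21341
num_samples = ds_te_encoded.unbatch().reduce(0, lambda count, _: count + 1).numpy() # e.g. 426820
print(num_batches, num_samples, num_samples // num_batches)  # last value is roughly the batch size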
I am working with Keras' sample denoising autoencoder;
https://keras.io/examples/mnist_denoising_autoencoder/
As I compile it, I use the following options:
autoencoder.compile(loss='mse', optimizer= Adadelta, metrics=['accuracy'])
Followed by training. I deliberately trained WITHOUT the noisy training data (x_train_noisy), and merely tried to reconstruct x_train.
autoencoder.fit(x_train, x_train, epochs=30, batch_size=128)
After training on 60,000 MNIST digit inputs, it gives me an accuracy of 81.25%. Does this mean that 60000 * 81.25% of the images are PERFECTLY recovered (equal to the original input pixel by pixel), i.e., that 81.25% of the output images from the autoencoder are IDENTICAL to their input counterparts, or something else?
Furthermore, I also conducted a manual check by comparing the output and the original data (60,000 28x28 matrices) pixel by pixel, counting the non-zero elements of their difference:
x_decoded = autoencoder.predict(x_train)
temp = x_train*255
x_train_uint8 = temp.astype('uint8')
temp = x_decoded*255
x_decoded_uint8 = temp.astype('uint8')
c = np.count_nonzero(x_train_uint8 - x_decoded_uint8)
cp = 1-c /60000/28/28
Yet cp is only about 71%. Could anyone tell me why there is a difference?
Accuracy doesn't make sense for a regression problem, hence the Keras sample doesn't use that metric in autoencoder.compile().
In this case, keras calculates the accuracy as per this metric.
binary_accuracy
def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
Using this numpy implementation, you should get the same value as output by Keras for validation accuracy at the end of training.
x_decoded = autoencoder.predict(x_test_noisy)
acc = np.mean(np.equal(x_test, np.round(x_decoded)))
print(acc)
Refer to this answer for more details:
What function defines accuracy in Keras when the loss is mean squared error (MSE)?
I would like to train my neural network using a custom loss value of my own. Therefore, I would like to perform feed-forward propagation for one mini-batch to store the activations in memory, and then perform backpropagation using my own loss value. This is to be done using TensorFlow.
Finally, I need to do something like:
sess.run(optimizer, feed_dict={x: training_data, loss: my_custom_loss_value})
Is that possible? I am assuming that the optimizer depends on the loss, which itself depends on the input. Therefore, I want the inputs to be fed into the graph, but I want to use my own value for the loss.
I guess that since the optimizer depends on the activations, they will be evaluated; in other words, the input is going to be fed into the network. Here is an example:
import tensorflow as tf
a = tf.Variable(tf.constant(8.0))
a = tf.Print(input_=a, data=[a], message="a:")
b = tf.Variable(tf.constant(6.0))
b = tf.Print(input_=b, data=[b], message="b:")
c = a * b
optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.1).minimize(c)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    value, _ = sess.run([c, optimizer], feed_dict={c: 1})
    print(value)
Finally, the printed value is 1.0, while the console shows: a:[8]b:[6] which means that the inputs got evaluated.
Exactly so.
When you train with Gradient Descent or any other optimization algorithm such as AdamOptimizer(), the optimizer minimizes your loss function, which could be a softmax cross entropy (tf.nn.softmax_cross_entropy_with_logits) for multi-class classification, a squared error loss (tf.losses.mean_squared_error) for regression, or your own custom loss. The loss function is evaluated using the model hypothesis.
So TensorFlow uses this cascade approach to train the model hypothesis by calling tf.Session().run() on the optimizer. See the following rough example in a multi-class classification setting:
batch_size = 128
# build the linear model
hypothesis = tf.add(tf.matmul(input_X, weight), bias)
# softmax cross entropy loss or cost function for logistic regression
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets,
                                                              logits=hypothesis))
# optimizer to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
# execute in Session
with tf.Session() as sess:
    # initialize all variables
    tf.global_variables_initializer().run()
    tf.local_variables_initializer().run()
    # Train the model
    for steps in range(1000):
        mini_batch = zip(range(0, X_train.shape[0], batch_size),
                         range(batch_size, X_train.shape[0] + 1, batch_size))
        # train using mini-batches
        for (start, end) in mini_batch:
            sess.run(optimizer, feed_dict={input_X: X_features[start:end],
                                           input_y: y_targets[start:end]})