Structural Similarity Loss implementation on pytorch gives NaN - python

I decided to write my own loss function, Structural Similarity Loss, following the paper
https://arxiv.org/pdf/1910.08711.pdf
While testing different segmentation models with different losses, I sometimes run into numerical instability: my self-written SegNet model starts to output NaN during training, so the loss also becomes NaN. With other losses (BCE, Dice loss, focal loss) training is stable. After printing the variables in detail, I found that the loss value is still reasonable on the step just before y_pred becomes NaN, so my assumption is that the gradients of the loss are computed incorrectly, but it is not clear how to fix that.
def ssl_loss (y_real, y_pred, window_size=11, eps = 0.01):
    beta = 0.1
    Lambda = 0.5
    #input size(B, C, H, W)
    #C = 1, because we compare monochrome segmentations
    y_real, y_pred = y_real.to(device).squeeze(), y_pred.to(device).squeeze()
    bce_matrix = (y_pred - y_real * y_pred + torch.log(1 + torch.exp(-y_pred)))
    y_pred = torch.sigmoid(y_pred)
    blurer = T.GaussianBlur(kernel_size=(11, 11), sigma=(1.5, 1.5))
    mu_y = blurer(y_real)
    sigma_y = blurer((y_real - mu_y) ** 2)
    mu_p = blurer(y_pred)
    sigma_p = blurer((y_pred - mu_p) ** 2)
    errors = torch.abs((y_real - mu_y + eps) / (torch.sqrt(sigma_y) + eps) - (y_pred - mu_p + eps) / (torch.sqrt(sigma_p) + eps)).squeeze()
    f_n_c = (errors > beta * errors.max()).int()
    M = f_n_c.sum(dim=(1, 2)).unsqueeze(1).unsqueeze(2)
    ssl_matrix = (errors * f_n_c * bce_matrix / M)
    loss = Lambda * bce_matrix.mean() + (1 - Lambda) * ssl_matrix.mean()
    return loss
And here is the meaningful part of my training function:
for epoch in range(epochs):
    avg_loss = 0
    model.train()
    for X_batch, Y_batch in data_tr:
        X_batch = X_batch.to(device)
        Y_batch = Y_batch.to(device)
        opt.zero_grad()
        Y_pred = model(X_batch)
        loss = loss_fn(Y_batch, Y_pred)
        loss.backward()
        opt.step()
        avg_loss += loss / len(data_tr)
    scheduler.step()
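Two operations in ssl_loss are common sources of NaN gradients even when the forward loss looks fine, so they are worth checking first. The sketch below is only an illustration under assumptions, not a confirmed diagnosis of this model: it shows that torch.sqrt has an infinite derivative at 0 (so a zero upstream gradient yields 0 * inf = NaN), and that log(1 + exp(-z)) overflows for very negative logits. The eps value and the tensors are made up for the demonstration.

```python
import torch
import torch.nn.functional as F

# 1) sqrt has an infinite derivative at 0; a zero upstream gradient
#    then produces 0 / 0 = NaN in the backward pass.
x = torch.zeros(3, requires_grad=True)
(torch.sqrt(x) * 0.0).sum().backward()
grad_plain = x.grad.clone()            # NaN in every entry

# Moving eps inside the square root keeps the derivative finite.
eps = 1e-2  # assumed value, mirroring the eps argument of ssl_loss
x_safe = torch.zeros(3, requires_grad=True)
(torch.sqrt(x_safe + eps ** 2) * 0.0).sum().backward()
grad_safe = x_safe.grad.clone()        # finite in every entry

# 2) log(1 + exp(-z)) overflows for very negative logits in float32;
#    softplus(-z) computes the same quantity in a stable way.
z = torch.tensor([-100.0, 0.0, 100.0])
naive = torch.log(1 + torch.exp(-z))   # first entry is inf
stable = F.softplus(-z)                # finite everywhere
```

If this is indeed the cause, one possible change is to write torch.sqrt(sigma_p + eps ** 2) instead of torch.sqrt(sigma_p) + eps, and to compute bce_matrix with F.binary_cross_entropy_with_logits(y_pred, y_real, reduction='none'), which evaluates the same expression in a numerically stable form.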

Related

How to write this custom loss function so it produces a loss for each sample?

I'm using this custom loss function for the concordance correlation coefficient (CCC):
def ccc(y_true, y_pred):
    ccc = ((ccc_v(y_true, y_pred) + ccc_a(y_true, y_pred)) / 2)
    return 1 - ccc
def ccc_v(y_true, y_pred):
    x = y_true[:,0]
    y = y_pred[:,0]
    x_mean = K.mean(x, axis=0)
    y_mean = K.mean(y, axis=0)
    covar = K.mean( (x - x_mean) * (y - y_mean) )
    x_var = K.var(x)
    y_var = K.var(y)
    # The CCC denominator uses the squared difference of the means
    ccc = (2.0 * covar) / (x_var + y_var + (x_mean - y_mean)**2)
    return ccc
def ccc_a(y_true, y_pred):
    x = y_true[:,1]
    y = y_pred[:,1]
    x_mean = K.mean(x, axis=0)
    y_mean = K.mean(y, axis=0)
    covar = K.mean( (x - x_mean) * (y - y_mean) )
    x_var = K.var(x)
    y_var = K.var(y)
    # The CCC denominator uses the squared difference of the means
    ccc = (2.0 * covar) / (x_var + y_var + (x_mean - y_mean)**2)
    return ccc
Currently the loss function ccc returns a scalar. It is split into two functions (ccc_v and ccc_a) because I also use them as metrics.
I've read in the Keras docs and in this question that a custom loss function should return a list of losses, one for each sample.
First question: my model trains even though the loss function returns a scalar. Is that a problem? How does training differ if the loss function returns a scalar instead of a list of scalars?
Second question: how can I rewrite my loss function to return a list of losses? I know I should avoid means and sums, but in my case I don't think that is possible, because there is not one global mean but several: one in the numerator for the covariance and two in the denominator for the variances.
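To make the second question concrete, here is a minimal NumPy sketch of the CCC. It assumes the standard definition with the squared difference of the means in the denominator, and the data is made up. Because the covariance and variances are statistics of the whole batch, the value cannot be split into independent per-sample terms. A hypothetical workaround is to repeat the batch-level loss once per sample, which leaves a mean reduction unchanged.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient of two 1-D arrays."""
    x_mean, y_mean = x.mean(), y.mean()
    covar = np.mean((x - x_mean) * (y - y_mean))
    return 2.0 * covar / (x.var() + y.var() + (x_mean - y_mean) ** 2)

rng = np.random.default_rng(0)
y_true = rng.normal(size=32)                       # made-up targets
y_pred = y_true + 0.3 * rng.normal(size=32)        # made-up predictions

batch_loss = 1.0 - ccc(y_true, y_pred)             # one scalar for the batch

# Hypothetical per-sample form: the same scalar repeated for each sample.
per_sample = np.full(y_true.shape[0], batch_loss)
perfect = ccc(y_true, y_true)                      # identical inputs give CCC = 1
```

Averaging per_sample gives back batch_loss, so under a mean reduction the two forms lead to the same loss value. The difference only matters when per-sample weights are applied.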
If you are using TensorFlow, there are built-in APIs for calculating the loss:
tf.keras.losses.mse()
tf.keras.losses.mae()
tf.keras.losses.Huber()
# Define the loss function
def loss_function(w1, b1, w2, b2, features = borrower_features, targets = default):
    predictions = model(w1, b1, w2, b2)
    # Pass targets and predictions to the cross entropy loss
    return keras.losses.binary_crossentropy(targets, predictions)
# If you are using categorical_crossentropy, return that loss instead.
#convert your image into a single np.array for input
#build your SoftMax model
# Define a sequential model
model=keras.Sequential()
# Define a hidden layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))
# Define the output layer
model.add(keras.layers.Dense(4,activation='softmax'))
# Compile the model
model.compile('SGD', loss='categorical_crossentropy',metrics=['accuracy'])
# Complete the fitting operation
train_data=train_data.reshape((50,784))
# Fit the model
model.fit(train_data, train_labels, validation_split=0.2, epochs=3)
# Reshape test data
test_data = test_data.reshape(10, 784)
# Evaluate the model
model.evaluate(test_data, test_labels)

How to create Hybrid loss consisting from dice loss and focal loss [Python]

I'm trying to implement the multiclass Hybrid loss function in Python from the article https://arxiv.org/pdf/1808.05238.pdf for my semantic segmentation problem with an imbalanced dataset. My implementation is correct enough for training to start, but the results are very poor. The model architecture is U-Net, and the learning rate of the Adam optimizer is 1e-5. The mask shape is (None, 512, 512, 3), with 3 classes (in my case forest, deforestation, other). The loss follows the formula given in the article.
The code I created:
def build_hybrid_loss(_lambda_=1, _alpha_=0.5, _beta_=0.5, smooth=1e-6):
    def hybrid_loss(y_true, y_pred):
        C = 3
        tversky = 0
        # Calculate Tversky Loss
        for index in range(C):
            inputs_fl = tf.nest.flatten(y_pred[..., index])
            targets_fl = tf.nest.flatten(y_true[..., index])
            #True Positives, False Positives & False Negatives
            TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
            FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1-targets_fl[0]))
            FN = tf.reduce_sum(tf.math.multiply(1-inputs_fl[0], targets_fl))
            tversky_i = (TP + smooth) / (TP + _alpha_ * FP + _beta_ * FN + smooth)
            tversky += tversky_i
        tversky += C
        # Calculate Focal loss
        loss_focal = 0
        for index in range(C):
            f_loss = - (y_true[..., index] * (1 - y_pred[..., index])**2 * tf.math.log(y_pred[..., index]))
            # Average over each data point/image in batch
            axis_to_reduce = range(1, 3)
            f_loss = tf.math.reduce_mean(f_loss, axis=axis_to_reduce)
            loss_focal += f_loss
        result = tversky + _lambda_ * loss_focal
        return result
    return hybrid_loss
The prediction of the model after the end of an epoch (I have a problem with swapped colors: the red in the prediction is actually green, which means forest, so the prediction is mostly forest and not deforestation):
The question is: what is wrong with my hybrid loss implementation, and what needs to be changed to make it work?
To simplify things a little, I have divided the Hybrid loss into four separate functions: Tversky loss, Dice coefficient, Dice loss, and Hybrid loss. Note that the Tversky loss now returns numLabels minus the summed Tversky indices, instead of adding C to the sum as in the original code. You can see the code below.
def TverskyLoss(targets, inputs, alpha=0.5, beta=0.5, smooth=1e-16, numLabels=3):
    tversky = 0
    for index in range(numLabels):
        inputs_fl = tf.nest.flatten(inputs[..., index])
        targets_fl = tf.nest.flatten(targets[..., index])
        #True Positives, False Positives & False Negatives
        TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
        FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1-targets_fl[0]))
        FN = tf.reduce_sum(tf.math.multiply(1-inputs_fl[0], targets_fl))
        tversky_i = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
        tversky += tversky_i
    return numLabels - tversky
def dice_coef(y_true, y_pred, smooth=1e-16):
    y_true_f = tf.nest.flatten(y_true)
    y_pred_f = tf.nest.flatten(y_pred)
    intersection = tf.math.reduce_sum(tf.math.multiply(y_true_f, y_pred_f))
    return (2. * intersection + smooth) / (tf.math.reduce_sum(y_true_f) + tf.math.reduce_sum(y_pred_f) + smooth)
def dice_coef_multilabel(y_true, y_pred, numLabels=3):
    dice=0
    for index in range(numLabels):
        dice -= dice_coef(y_true[..., index], y_pred[..., index])
    return numLabels + dice
def build_hybrid_loss(_lambda_=0.5, _alpha_=0.5, _beta_=0.5, smooth=1e-16, C=3):
    def hybrid_loss(y_true, y_pred):
        tversky = TverskyLoss(y_true, y_pred, alpha=_alpha_, beta=_beta_)
        dice = dice_coef_multilabel(y_true, y_pred)
        result = tversky + _lambda_ * dice
        return result
    return hybrid_loss
Passing loss=build_hybrid_loss() when compiling the model sets the Hybrid loss as the model's loss function.
After some brief experiments, I came to the conclusion that in my particular case a Hybrid loss with _lambda_ = 0.2, _alpha_ = 0.5, _beta_ = 0.5 is not much better than a Dice loss or a Tversky loss on its own. Neither IoU (intersection over union) nor the standard accuracy metric improves much with the Hybrid loss. But I don't believe it is a general rule that such a Hybrid loss performs worse than, or only as well as, a single loss in all cases.
link to Accuracy graph
link to IoU graph
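One possible reason the combination brings little here: with alpha = beta = 0.5 and negligible smoothing, the Tversky index is algebraically the same as the Dice coefficient, so the two terms carry the same signal. The NumPy sketch below checks this on made-up masks. The function names and data are illustrative only and are not the TensorFlow code above.

```python
import numpy as np

def tversky_index(t, p, alpha=0.5, beta=0.5):
    """Tversky index for a target mask t and soft prediction p (no smoothing)."""
    tp = np.sum(p * t)
    fp = np.sum(p * (1 - t))
    fn = np.sum((1 - p) * t)
    return tp / (tp + alpha * fp + beta * fn)

def dice_coefficient(t, p):
    """Dice coefficient for a target mask t and soft prediction p (no smoothing)."""
    return 2 * np.sum(t * p) / (np.sum(t) + np.sum(p))

rng = np.random.default_rng(0)
t = (rng.random((4, 8, 8)) > 0.7).astype(float)   # made-up binary mask
p = rng.random((4, 8, 8))                          # made-up soft predictions

# alpha = beta = 0.5: Tversky and Dice coincide.
same = bool(np.isclose(tversky_index(t, p), dice_coefficient(t, p)))

# Asymmetric weights: the Tversky index now differs from Dice.
different = not np.isclose(tversky_index(t, p, alpha=0.3, beta=0.7),
                           dice_coefficient(t, p))
```

Choosing alpha != beta (for example, penalizing false negatives more on the rare class) is what makes the Tversky term contribute something beyond Dice.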

Problem reproducing the predicted covariance of a gaussian process using gpytorch with same hyperparameters

I need to build a function that gives the posterior covariance of a Gaussian process. The idea is to train a GP using GPyTorch, then take the learned hyperparameters and pass them into my own kernel function (for several reasons I can't use GPyTorch directly).
The problem is that I can't reproduce GPyTorch's prediction. Here is the code I wrote. I have been working on it the whole day but I can't find the problem. Do you know what I am doing wrong?
from gpytorch.mlls import ExactMarginalLogLikelihood
import numpy as np
import gpytorch
import torch
train_x1 = torch.linspace(0, 0.95, 50) + 0.05 * torch.rand(50)
train_y1 = torch.sin(train_x1 * (2 * np.pi)) + 0.2 * torch.randn_like(train_x1)
n_datapoints = train_x1.shape[0]
def kernel_rbf(x1, x2, c, l):
    # my RBF function
    # compare shapes with ==, not `is`
    if x1.shape == ():
        x1 = np.atleast_2d(x1)
    if x2.shape == ():
        x2 = np.atleast_2d(x2)
    return c * np.exp(- np.matmul((x1 - x2).T, (x1 - x2)) / (2 * l ** 2))
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        lengthscale_prior = gpytorch.priors.GammaPrior(3.0, 6.0)
        outputscale_prior = gpytorch.priors.GammaPrior(2.0, 0.15)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(lengthscale_prior=lengthscale_prior),
            outputscale_prior=outputscale_prior)
    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x1, train_y1, likelihood)
# Find optimal model hyperparameters
model.train()
likelihood.train()
mll = ExactMarginalLogLikelihood(likelihood, model)
# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1) # Includes GaussianLikelihood parameters
training_iterations = 50
for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(*model.train_inputs)
    loss = -mll(output, model.train_targets)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()
# Get the learned hyperparameters
outputscale = model.covar_module.outputscale.item()
lengthscale = model.covar_module.base_kernel.lengthscale.item()
noise = likelihood.noise_covar.noise.item()
train_x1 = train_x1.numpy()
train_y1 = train_y1.numpy()
# Get covariance train points
K = np.zeros((n_datapoints, n_datapoints))
for i in range(n_datapoints):
    for j in range(n_datapoints):
        K[i, j] = kernel_rbf(train_x1[i], train_x1[j], outputscale, lengthscale)
# Add noise
K += noise ** 2 * np.eye(n_datapoints)
# Get covariance train-test points
x_test = torch.rand(1, 1)
Ks = np.zeros((n_datapoints, 1))
for i in range(n_datapoints):
    Ks[i] = kernel_rbf(train_x1[i], x_test.numpy(), outputscale, lengthscale)
# Get variance test points
Kss = kernel_rbf(x_test.numpy(), x_test.numpy(), outputscale, lengthscale)
L = np.linalg.cholesky(K)
v = np.linalg.solve(L, Ks)
var = Kss - np.matmul(v.T, v)
model.eval()
likelihood.eval()
with gpytorch.settings.fast_pred_var():
    y_preds = likelihood(model(x_test))
print(f"Predicted variance with gpytorch:{y_preds.variance.item()}")
print(f"Predicted variance with my kernel:{var}")
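I found the errors:
The noise must not be squared, because the noise value returned by GPyTorch is already a variance: it is K += noise * np.eye(n_datapoints) and not K += noise**2 * np.eye(n_datapoints)
I forgot to add the noise term to K_** (the test-point covariance Kss), i.e. Kss += noise. This is needed because likelihood(model(x_test)) returns the predictive variance including observation noise.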
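With both fixes applied, the variance computation can be sketched in plain NumPy as follows. The hyperparameter values and data are placeholders, not the values learned by GPyTorch, and the vectorized rbf helper is a stand-in for kernel_rbf. The Cholesky route is checked against the direct formula Kss - Ks^T K^{-1} Ks.

```python
import numpy as np

def rbf(a, b, c, l):
    """Scaled RBF kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return c * np.exp(-d2 / (2 * l ** 2))

# Placeholder hyperparameters standing in for the learned values.
outputscale, lengthscale, noise = 1.3, 0.25, 0.04

rng = np.random.default_rng(0)
x_train = np.sort(rng.random(50))
x_test = np.array([0.37])

# noise is already a variance, so it is added unsquared to the diagonal.
K = rbf(x_train, x_train, outputscale, lengthscale) + noise * np.eye(50)
Ks = rbf(x_train, x_test, outputscale, lengthscale)
# Predicting a noisy observation: the test covariance also gets the noise term.
Kss = rbf(x_test, x_test, outputscale, lengthscale) + noise

L = np.linalg.cholesky(K)
v = np.linalg.solve(L, Ks)
var_chol = (Kss - v.T @ v).item()
var_direct = (Kss - Ks.T @ np.linalg.solve(K, Ks)).item()
```

Dropping the + noise on Kss would instead give the variance of the latent function, which is what model(x_test) returns without the likelihood wrapper.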

Why does regularization in pytorch and scratch code does not match and what is the formula used for regularization in pytorch?

I have been trying to apply L2 regularization to a binary classification model in PyTorch, but when I compare the results of the PyTorch code with my from-scratch code, they don't match.
PyTorch code:
class LogisticRegression(nn.Module):
    def __init__(self,n_input_features):
        super(LogisticRegression,self).__init__()
        self.linear=nn.Linear(4,1)
        self.linear.weight.data.fill_(0.0)
        self.linear.bias.data.fill_(0.0)
    def forward(self,x):
        y_predicted=torch.sigmoid(self.linear(x))
        return y_predicted
model=LogisticRegression(4)
criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)
num_epochs=1000
for epoch in range(num_epochs):
    for x,y in train_data:
        y_pred=model(x)
        loss=criterion(y_pred,y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
Scratch Code:
def sigmoid(z):
    s = 1/(1+ np.exp(-z))
    return s
def yinfer(X, beta):
    return sigmoid(beta[0] + np.dot(X,beta[1:]))
def cost(X, Y, beta, lam):
    sum = 0
    sum1 = 0
    n = len(beta)
    m = len(Y)
    for i in range(m):
        sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
    for i in range(0, n):
        sum1 = sum1 + beta[i]**2
    return (-sum + (lam/2) * sum1)/(1.0*m)
def pred(X,beta):
    if ( yinfer(X, beta) > 0.5):
        ypred = 1
    else :
        ypred = 0
    return ypred
beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)
for i in range(iterations):
    Y_prediction_train=np.zeros(len(Y_train))
    Y_prediction_test=np.zeros(len(Y_test))
    for l in range(len(Y_train)):
        Y_prediction_train[l]=pred(X[l,:],beta)
    for l in range(len(Y_test)):
        Y_prediction_test[l]=pred(X_test[l,:],beta)
    train_acc = format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
    arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
    temp_beta = np.zeros(len(beta))
    ''' main code from below '''
    for j in range(n):
        temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
        temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
    for k in range(0, len(beta)):
        temp_beta[k] = temp_beta[k] + lam * beta[k] #regularization here
    temp_beta= temp_beta / (1.0*n)
    beta = beta - alpha*temp_beta
graph of the losses
graph of training accuracy
graph of testing accuracy
Can someone please tell me why this is happening?
L2 value=0.1
Great question. I dug through a lot of PyTorch documentation and found the answer, which is rather subtle. Basically, there are two ways to calculate regularization: type 1, in which the regularization term is not divided by the batch size, and type 2, in which it is. (For a summary, jump to the last section.)
PyTorch uses the first type (in which the regularization factor is not divided by the batch size).
Here's some sample code that demonstrates this:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight.data.fill_(1.0)
        self.linear.bias.data.fill_(1.0)
    def forward(self, x):
        return self.linear(x)
model = model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)
input = torch.tensor([[2], [4]], dtype=torch.float32)
target = torch.tensor([[7], [11]], dtype=torch.float32)
optimizer.zero_grad()
pred = model(input)
loss = F.mse_loss(pred, target)
print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')
print(f'\nMSEloss: {loss.item()}\n')
loss.backward()
print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')
optimizer.step()
print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')
which outputs:
input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))
MSEloss: 26.0
Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------
Here m = batch size = 2, lr = alpha = 0.1, lambda = weight_decay = 1.
Now consider the weight tensor, which has value = 1 and grad = -32.
Case 1 (type 1 regularization):
weight = weight - lr * (grad + weight_decay * weight)
weight = 1 - 0.1 * (-32 + 1 * 1)
weight = 4.1
Case 2 (type 2 regularization):
weight = weight - lr * (grad + (weight_decay / batch_size) * weight)
weight = 1 - 0.1 * (-32 + (1 / 2) * 1)
weight = 4.15
The output shows an updated weight of 4.1000, so PyTorch uses type 1 regularization.
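The two cases can be checked with a few lines of plain Python. The sketch reuses the numbers printed above (weight 1, gradient -32, bias 1, bias gradient -10). The helper name sgd_step is made up for this illustration and is not a PyTorch function.

```python
lr, weight_decay, batch_size = 0.1, 1.0, 2
weight, weight_grad = 1.0, -32.0
bias, bias_grad = 1.0, -10.0

def sgd_step(param, grad, decay):
    """One plain SGD update with an L2 term decay * param added to the gradient."""
    return param - lr * (grad + decay * param)

type1_weight = sgd_step(weight, weight_grad, weight_decay)               # about 4.1
type2_weight = sgd_step(weight, weight_grad, weight_decay / batch_size)  # about 4.15
type1_bias = sgd_step(bias, bias_grad, weight_decay)                     # about 1.9
```

Only the type 1 values agree with the printed weight 4.1000 and bias 1.9000. The bias result also shows that the bias is decayed in the same way as the weight.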
Your code follows type 2 regularization, so change the last few lines to this:
# for k in range(0, len(beta)):
# temp_beta[k] = temp_beta[k] + lam * beta[k] #regularization here
temp_beta= temp_beta / (1.0*n)
beta = beta - alpha*(temp_beta + lam * beta)
Also, PyTorch loss functions don't include a regularization term (it is implemented inside the optimizers), so remove the regularization term from your custom cost function as well if you want the loss values to match.
In summary:
PyTorch applies regularization in the update step as grad + weight_decay * weight, without dividing by the batch size.
Regularization is implemented inside the optimizers (the weight_decay parameter).
PyTorch loss functions don't include a regularization term.
The bias is also regularized when weight_decay is applied to all model parameters.
To use regularization, try:
torch.optim.<optimizer_name>(model.parameters(), lr=lr, weight_decay=lam)

Keras custom loss with one of the features used and a condition

I'm trying to write a custom loss function for the deviance in Keras.
Deviance is calculated as: 2 * (log(yTrue) - log(yPred))
The problem is that my yTrue values are rare-event counts and are therefore often equal to 0, which results in a -inf error.
The derivation of the deviance for my specific case (unscaled Poisson deviance) gives a solution to this:
If yTrue = 0, then the deviance is: 2 * D * yPred, where D is a feature of my data.
If yTrue != 0, then the deviance is: 2 * D * (yTrue * ln(yTrue) - yTrue * ln(yPred) - yTrue + yPred)
There are two problems I encounter here:
I need to choose the expression according to the value of yTrue
I also need to pass D as an argument to the loss function
I wrote a first version of the loss function before deriving the deviance, adding a small value to yTrue when it is equal to 0 to prevent the -inf problem, but it gives wrong results for the deviance, so I have to change it.
def DevianceBis(y_true, y_pred):
    y_pred = KB.maximum(y_pred, 0.0 + KB.epsilon()) #make sure ypred is positive or ln(-x) = NAN
    return (KB.sqrt(KB.square( 2 * KB.log(y_true + KB.epsilon()) - KB.log(y_pred))))
I'd like to know how to pass the D values into the loss function and how to use an if statement to choose the correct expression.
Thanks in advance
EDIT:
I tried this one, but it returns NaN:
def custom_loss(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1:]
    # condition
    mask = keras.backend.equal(y_true, 0) #i.e. y_true == 0
    mask = KB.cast(mask, KB.floatx())
    # mask is 1 when y_true = 0, 0 otherwise
    #calculate loss using d...
    loss_value = mask * (2 * d * y_pred) + (1-mask) * 2 * d * (y_true * KB.log(y_true) - y_true * KB.log(y_pred) - y_true + y_pred)
    return loss_value
def baseline_model():
    # create model
    #building model
    model = keras.Sequential()
    model.add(Dense(5, input_dim = 26, activation = "relu"))
    #model.add(Dense(10, activation = "relu"))
    model.add(Dense(1, activation = "exponential"))
    model.compile(loss=custom_loss, optimizer='RMSProp')
    return model
model = baseline_model()
model.fit(data2, np.append(y2, d, axis = 1), epochs=1, shuffle=True, verbose=1)
EDIT 2:
def custom_loss(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1:]
    # condition
    mask2 = keras.backend.not_equal(y_true, 0) #i.e. y_true != 0
    mask2 = KB.cast(mask2, KB.floatx())
    # returns 0 when y_true =0, 1 otherwise
    #calculate loss using d...
    loss_value = 2 * d * y_pred + mask2 * (2 * d * y_true * KB.log(y_true) + 2 * d * y_true * KB.log(y_pred) - 2 * d * y_true)
    return loss_value
EDIT 3: this seems to work without the logs (although it isn't the result I am looking for):
def custom_loss(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    # condition
    mask2 = keras.backend.not_equal(y_true, 0) #i.e. y_true != 0
    mask2 = KB.cast(mask2, KB.floatx())
    # returns 0 when y_true =0, 1 otherwise
    #calculate loss using d...
    loss_value = 2 * d * y_pred #+ mask2 * (2 * d * y_true * KB.log(y_true) + 2 * d * y_true * KB.log(y_pred) - 2 * d * y_true)
    return loss_value
def baseline_model():
    # create model
    #building model
    model = keras.Sequential()
    model.add(Dense(5, input_dim = 26, activation = "relu"))
    #model.add(Dense(10, activation = "relu"))
    model.add(Dense(1, activation = "exponential"))
    model.compile(loss=custom_loss, optimizer='RMSProp')
    return model
model = baseline_model()
model.fit(data2, np.append(y2, d, axis = 1), epochs=1, shuffle=True, verbose=1)
EDIT again:
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    # condition
    loss_value = KB.switch(KB.greater(y_true, 0), 2 * d * y_pred, 2 * d * (y_true * KB.log(y_true + KB.epsilon()) - y_true * KB.log(y_pred + KB.epsilon()) - y_true + y_pred))
    return loss_value
If D is a feature of the input vector, you can pad your labels with the extra D column(s) from the input and write a custom loss. You can pass the extra information alongside the labels as a NumPy array like this:
def custom_loss(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1:]
    # condition: 1.0 where y_true != 0, 0.0 where y_true == 0
    mask = K.cast(K.not_equal(y_true, 0), K.floatx())
    # calculate loss using d...
    loss_value = (1 - mask)*(2*d*y_pred) + mask*(2*d*(y_true*K.log(y_true) - y_true*K.log(y_pred) - y_true + y_pred))
    return loss_value
def baseline_model():
    # create model
    i = Input(shape=(5,))
    x = Dense(5, kernel_initializer='glorot_uniform', activation='linear')(i)
    o = Dense(1, kernel_initializer='normal', activation='linear')(x)
    model = Model(i, o)
    model.compile(loss=custom_loss, optimizer=Adam(lr=0.0005))
    return model
model.fit(X, np.append(Y_true, d, axis =1), batch_size = batch_size, epochs=90, shuffle=True, verbose=1)
EDIT:
I added the mask for the conditional statement. K.not_equal returns a boolean tensor, so it is cast to a float tensor before being used in arithmetic.
So here's the final answer. After days, I finally found out how to do it.
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    lnYTrue = KB.switch(KB.equal(y_true, 0), KB.zeros_like(y_true), KB.log(y_true))
    lnYPred = KB.switch(KB.equal(y_pred, 0), KB.zeros_like(y_pred), KB.log(y_pred))
    loss_value = 2 * d * (y_true * lnYTrue - y_true * lnYPred[:, 0] - y_true + y_pred[:, 0])
    return loss_value
Compute the logs before the actual loss, and use KB.zeros_like in place of the log wherever the value is 0. I also had to take only the first column of y_pred, because y_pred has shape (N, 1) while y_true has shape (N,), and combining them directly would broadcast to an N x N matrix.
I also had to delete the rows with d = 0 from the data (they weren't of much use anyway).
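The same idea can be checked outside Keras with a small NumPy sketch. It follows the two-case formula from the question and assumes that y_pred is strictly positive, as with an exponential output activation. The function name and the sample values are made up for illustration.

```python
import numpy as np

def poisson_deviance(y_true, y_pred, d):
    """Per-sample unscaled Poisson deviance, weighted by the feature d."""
    positive = y_true > 0
    # Replace zeros by 1 before the log so that log(0) is never evaluated.
    safe_true = np.where(positive, y_true, 1.0)
    y_log_y = np.where(positive, y_true * np.log(safe_true), 0.0)
    return 2.0 * d * (y_log_y - y_true * np.log(y_pred) - y_true + y_pred)

y_true = np.array([0.0, 2.0, 1.0])
y_pred = np.array([0.5, 2.0, 3.0])
d = np.array([1.0, 1.0, 2.0])

dev = poisson_deviance(y_true, y_pred, d)
```

For y_true = 0 the expression reduces to 2 * d * y_pred, and for y_pred = y_true the deviance is 0, so one formula covers both cases without an explicit if statement.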
