I need to calculate Aitchison distance as a loss function between input and output datasets.
While calculating this metric I need to compute the geometric mean of each row (the tensors have shape [batches x features] during loss computation).
In the simplest case we can imagine there is only one batch, so I just need to calculate one geometric mean for the input dataset and one for the output dataset.
How can this be done in TensorFlow? I didn't find any dedicated metric or reduction function for it.
You can easily calculate the geometric mean of a tensor as a loss function (or, in your case, as part of the loss function) in TensorFlow using a numerically stable formula highlighted here. The code fragment below closely resembles the PyTorch solution posted here, which follows the above-mentioned formula (and the SciPy implementation).
from tensorflow.keras import backend as K

def gmean_loss(y_true, y_pred, dim=1):
    # geometric mean via exp(mean(log(x))); the values passed to K.log must be strictly positive
    error = y_pred - y_true
    logx = K.log(error)
    return K.exp(K.mean(logx, axis=dim))
You can define dim according to your needs or integrate it into your code.
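For completeness, here is a minimal sketch (not from the answer above, and assuming y_true and y_pred are strictly positive compositional tensors of shape [batches x features]) of how the per-row geometric mean folds into an Aitchison distance loss via the centred log-ratio (clr) transform:

import tensorflow as tf

def aitchison_loss(y_true, y_pred, eps=1e-12):
    # clr(x) = log(x) - mean(log(x)) per row; the subtracted row mean of the
    # logs is exactly the log of the row-wise geometric mean.
    log_true = tf.math.log(y_true + eps)
    log_pred = tf.math.log(y_pred + eps)
    clr_true = log_true - tf.reduce_mean(log_true, axis=1, keepdims=True)
    clr_pred = log_pred - tf.reduce_mean(log_pred, axis=1, keepdims=True)
    # Euclidean distance between the clr-transformed rows, averaged over the batch
    return tf.reduce_mean(tf.norm(clr_true - clr_pred, axis=1))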
I'm trying to optimize two models in an alternating fashion using PyTorch. The first is a neural network that changes the representation of my data (i.e. a map f(x) on my input data x, parameterized by some weights W). The second is a Gaussian mixture model that operates on the f(x) points, i.e. in the neural network space (rather than clustering points in the input space). I am optimizing the GMM using expectation maximization, so the parameter updates are derived analytically rather than via gradient descent.
I have two loss functions here: the first is a function of the distances ||f(x) - f(y)||, and the second is the loss function of the Gaussian mixture model (i.e. how 'clustered' everything looks in the NN representation space). What I want to do is take a step in the NN optimization using both of the above loss functions (since the network depends on both), and then do an expectation-maximization step for the GMM. The code looks like this (I have removed a lot since there is a ton of code):
data, labels = load_dataset()
net = NeuralNetwork()
net_optim = torch.optim.Adam(net.parameters(), lr=0.05, weight_decay=1)
# initialize weights, means, and covariances for the Gaussian clusters
concentrations, means, covariances, precisions = initialization(net.forward_one(data))
for i in range(1000):
    net_optim.zero_grad()
    pairs, pair_labels = pairGenerator(data, labels)  # samples some pairs of datapoints
    outputs = net(pairs[:, 0, :], pairs[:, 1, :])  # computes pairwise distances
    net_loss = NeuralNetworkLoss(outputs, pair_labels)  # loss function based on pairwise dist.
    embedding = net.forward_one(data)  # embeds all data in the NN space
    log_prob, log_likelihoods = expectation_step(embedding, means, precisions, concentrations)
    concentrations, means, covariances, precisions = maximization_step(embedding, log_likelihoods)
    gmm_loss = GMMLoss(log_likelihoods, log_prob, precisions, concentrations)
    net_loss.backward(retain_graph=True)
    gmm_loss.backward(retain_graph=True)
    net_optim.step()
Essentially, this is what is happening:
1. Sample some pairs of points from the dataset
2. Push pairs of points through the NN and compute the network loss based on those outputs
3. Embed all datapoints using the NN and perform a clustering EM step in that embedding space
4. Compute the variational loss (ELBO) based on the clustering parameters
5. Update the neural network parameters using both the variational loss and the network loss
However, to perform (5), I am required to add the flag retain_graph=True, otherwise I get the error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
It seems like having two loss functions means that I need to retain the computational graph?
I am not sure how to work around this: with retain_graph=True, by around iteration 400 each iteration takes ~30 minutes to complete. Does anyone know how I might fix this? I apologize in advance; I am still very new to automatic differentiation.
I would recommend doing
total_loss = net_loss + gmm_loss
total_loss.backward()
Note that the gradient of net_loss w.r.t. the GMM weights is 0, so summing the losses won't have any unwanted effect.
Here is a good thread on pytorch regarding the retain_graph. https://discuss.pytorch.org/t/what-exactly-does-retain-variables-true-in-loss-backward-do/3508/24
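Applied to the loop above, the tail of each iteration would look roughly like this (a sketch under the same variable names, not tested against the full code):

net_optim.zero_grad()
total_loss = net_loss + gmm_loss
total_loss.backward()   # a single pass through the graph, no retain_graph needed
net_optim.step()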
I'm working on a binary semantic segmentation task where one class makes up only a very small fraction of any input image, so only a few pixels carry its label. When using sparse_softmax_cross_entropy, the overall error is easily decreased by ignoring this class. I'm now looking for a way to weight the classes by a coefficient that penalizes misclassifications of this specific class more heavily than those of the other class.
The doc of the loss function states:
weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
If I understand this correctly, it says that specific samples in a batch get weighted differently from others. But this is actually not what I'm looking for. Does anyone know how to implement a weighted version of this loss function where the weights scale the importance of a specific class rather than of individual samples?
To answer my own question:
The authors of the U-Net paper used a pre-computed weight-map to handle imbalanced classes.
The Institute for Astronomy at ETH Zurich provides a TensorFlow-based U-Net package which contains a weighted version of the softmax cross-entropy (not the sparse variant, but they flatten their labels and logits first):
# class_weights: one scalar weight per class, e.g. [1.0, 10.0]
class_weights = tf.constant(np.array(class_weights, dtype=np.float32))
# per-pixel weight: picks out the weight of each pixel's true class
weight_map = tf.multiply(flat_labels, class_weights)
weight_map = tf.reduce_sum(weight_map, axis=1)
# unweighted per-pixel cross-entropy
loss_map = tf.nn.softmax_cross_entropy_with_logits_v2(logits=flat_logits, labels=flat_labels)
# scale each pixel's loss by its class weight and average over all pixels
weighted_loss = tf.multiply(loss_map, weight_map)
loss = tf.reduce_mean(weighted_loss)
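For context, flat_labels and flat_logits are just the one-hot labels and network logits reshaped to two dimensions; a sketch of how they could be produced from a segmentation batch (the names and shapes here are assumptions, not taken from the package):

import tensorflow as tf

n_classes = 2
logits = tf.random.normal([4, 128, 128, n_classes])   # stand-in for the network output
labels = tf.zeros([4, 128, 128], dtype=tf.int32)      # stand-in for the ground-truth masks
flat_logits = tf.reshape(logits, [-1, n_classes])                          # [batch*height*width, n_classes]
flat_labels = tf.reshape(tf.one_hot(labels, n_classes), [-1, n_classes])   # same shape, one-hot
class_weights = [1.0, 10.0]   # e.g. up-weight the rare foreground class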
I've written an LSTM in Keras for univariate time series forecasting. I'm using an input window of size 48 and an output window of size 12, i.e. I'm predicting 12 steps at once. This is working generally well with an optimization metric such as RMSE.
For non-stationary time series I'm differencing the data before feeding the data to the LSTM. Then after predicting, I take the inverse difference of the predictions.
When differencing, RMSE is not suitable as an optimization metric as the earlier prediction steps are a lot more important than later steps. When we do the inverse difference after creating a 12-step forecast, then the earlier (differenced) prediction steps are going to affect the inverse difference of later steps.
So what I think I need is an optimization metric that gives the early prediction steps more weight, preferably exponentially.
Does such a metric exist already or should I write my own? Am I overlooking something?
I just wrote my own optimization metric; it seems to work well, certainly better than RMSE.
Still curious what's the best practice here. I'm relatively new to forecasting.
from tensorflow.keras import backend as K

def weighted_rmse(y_true, y_pred):
    # linearly decreasing weights: 12, 11, ..., 1 for a 12-step forecast
    weights = K.arange(start=y_pred.get_shape()[1], stop=0, step=-1, dtype='float32')
    # weight both tensors so the early prediction steps dominate the squared error
    y_true_w = y_true * weights
    y_pred_w = y_pred * weights
    return K.sqrt(K.mean(K.square(y_true_w - y_pred_w), axis=-1))
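For reference, the custom loss can then be passed to compile like any built-in Keras loss; the model below is only a placeholder standing in for the actual LSTM (48-step input window, 12-step output):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(48, 1)),
    keras.layers.Dense(12),
])
model.compile(optimizer='adam', loss=weighted_rmse)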
I'm interested in finding the gradient of a neural network's output with respect to its parameters (weights and biases).
More specifically, assume I have the following neural network structure: [6, 4, 3, 1]. The input sample size is 20. What I'm interested in is finding the gradient of the neural network output w.r.t. the weights (and biases), of which, if I'm not mistaken, there are 47 in this case. In the literature, this gradient is sometimes known as the weight Jacobian.
I'm using PyTorch version 0.4.0 with Python 3.6 in a Jupyter Notebook.
The code that I have produced is this:
import numpy.random as npr
import torch
from torch.autograd import Variable

def init_params(layer_sizes, scale=0.1, rs=npr.RandomState(0)):
    return [(rs.randn(insize, outsize) * scale,  # weight matrix
             rs.randn(outsize) * scale)          # bias vector
            for insize, outsize in
            zip(layer_sizes[:-1], layer_sizes[1:])]

layers = [6, 4, 3, 1]
w = init_params(layers)

first_layer_w = Variable(torch.tensor(w[0][0], requires_grad=True))
first_layer_bias = Variable(torch.tensor(w[0][1], requires_grad=True))
second_layer_w = Variable(torch.tensor(w[1][0], requires_grad=True))
second_layer_bias = Variable(torch.tensor(w[1][1], requires_grad=True))
third_layer_w = Variable(torch.tensor(w[2][0], requires_grad=True))
third_layer_bias = Variable(torch.tensor(w[2][1], requires_grad=True))

X = Variable(torch.tensor(X_batch), requires_grad=True)  # X_batch: any array of shape [20, 6]

# forward pass: three tanh layers, output has shape [20, 1]
hidden1 = torch.tanh(torch.mm(X, first_layer_w) + first_layer_bias)
hidden2 = torch.tanh(torch.mm(hidden1, second_layer_w) + second_layer_bias)
output = torch.tanh(torch.mm(hidden2, third_layer_w) + third_layer_bias)

output.backward()
As is obvious from the code, I'm using the hyperbolic tangent as the nonlinearity. The code produces an output vector of length 20. Now I'm interested in finding the gradient of this output vector w.r.t. all the weights (all 47 of them). I have read the PyTorch documentation here. I have also seen similar questions, for example here. However, I have failed to find the gradient of the output vector w.r.t. the parameters.
If I use the PyTorch function backward(), it generates this error:
RuntimeError: grad can be implicitly created only for scalar outputs
My question is: is there a way to calculate the gradient of the output vector w.r.t. the parameters, which could essentially be represented as a 20*47 matrix, since the output vector has size 20 and the parameter vector has size 47? If so, how? Is there anything wrong in my code? You can take any example of X as long as its dimension is 20*6.
You're trying to compute a Jacobian of a function, while PyTorch is expecting you to compute vector-Jacobian products. You can see an in-depth discussion of computing Jacobians with PyTorch here.
You have two options. Your first option is to use JAX or autograd and call their jacobian() function. Your second option is to stick with PyTorch and compute 20 vector-Jacobian products, by calling backward(vec) 20 times, where vec is a length-20 one-hot vector whose single nonzero index ranges from 0 to 19. If this is confusing, I recommend reading the autodiff cookbook from the JAX tutorials.
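As a rough sketch of the second option applied to the code in the question (using torch.autograd.grad instead of repeated backward() calls so the per-row gradients don't accumulate in .grad; output and the layer tensors are assumed to come from the code above):

import torch

params = [first_layer_w, first_layer_bias,
          second_layer_w, second_layer_bias,
          third_layer_w, third_layer_bias]
rows = []
for i in range(output.shape[0]):    # 20 output components
    vec = torch.zeros_like(output)
    vec[i] = 1.0                    # one-hot vector selecting the i-th output
    grads = torch.autograd.grad(output, params, grad_outputs=vec, retain_graph=True)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))   # flatten to the 47 parameters
jacobian = torch.stack(rows)        # shape [20, 47]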
The matrix of partial derivatives of a function with respect to its parameters is known as the Jacobian, and can be computed in PyTorch with:
torch.autograd.functional.jacobian(func, inputs)
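A small self-contained sketch of that call (the shapes are illustrative, not the 20*47 case from the question):

import torch

x = torch.randn(5, 3)

def forward(weight):
    # tiny stand-in network: one tanh layer, output shape [5, 2]
    return torch.tanh(x @ weight)

W = torch.randn(3, 2)
J = torch.autograd.functional.jacobian(forward, W)
print(J.shape)   # torch.Size([5, 2, 3, 2]): output dims followed by parameter dims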
I am a newbie in this field of study, and this is probably a pretty silly question. I want to build a normal ANN, but I am not sure if I can use a weighted mean squared error as the loss function.
If we are not treating each sample equally, i.e. we care about the prediction precision for some categories of samples more than for others, then we want to form a weighted loss function.
Let's say we have a categorical feature ci, where i is the index of the sample, and for simplicity assume this feature takes a binary value, either 0 or 1. We can then form the loss function as
(ci + 1)(yi_hat - yi)^2
#and take the sum for all i
Are there going to be any problems with the back-propagation? I don't see any issue with calculating the gradients or updating the weights between layers.
And if there is no issue, how can I program this loss function in Keras? The loss function seems to take only two parameters, y_true and y_pred, so how can I plug in the vector c?
There is absolutely nothing wrong with that. Functions can declare constants within themselves or even take constants from an outside scope:
import keras
import keras.backend as K

c = K.constant([c1, c2, c3, c4, ..., cn])  # your per-output weight vector

def weighted_loss(y_true, y_pred):
    loss = keras.losses.get('mse')  # the built-in mean squared error
    return c * loss(y_true, y_pred)
Exactly like yours:
def weighted_loss(y_true, y_pred):
    weighted = (c + 1) * K.square(y_true - y_pred)
    return K.sum(weighted)
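If c is aligned with individual training samples rather than with output features, another option worth noting (not part of the answer above) is to skip the custom loss entirely and let Keras apply the per-sample weights via sample_weight; the data and model below are only toy stand-ins:

import numpy as np
from tensorflow import keras

X = np.random.rand(100, 8).astype('float32')             # features
y = np.random.rand(100).astype('float32')                # targets
c = np.random.randint(0, 2, size=100).astype('float32')  # the binary categorical feature

model = keras.Sequential([keras.layers.Dense(16, activation='relu', input_shape=(8,)),
                          keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, sample_weight=c + 1, epochs=2)   # each sample's squared error is scaled by (c_i + 1)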