torch.nn.BCELoss() and torch.nn.functional.binary_cross_entropy - python

What is the basic difference between these two loss functions? I have already tried using both the loss functions.

The difference is that nn.BCELoss and F.binary_cross_entropy are two PyTorch interfaces to the same operation.
The former, torch.nn.BCELoss, is a class that inherits from nn.Module, which makes it handy to use in a two-step fashion, as you would always do in OOP (Object Oriented Programming): initialize, then use. Initialization handles parameters and attributes, as the name implies, which is quite useful for stateful operators such as parametrized layers. This is the way to go when implementing classes of your own, for example:
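import torch.nn as nn

class Trainer():
    def __init__(self, model):
        self.model = model
        self.loss = nn.BCELoss()  # the loss module is created once, at initialization

    def __call__(self, x, y):
        y_hat = self.model(x)  # y_hat must be probabilities in (0, 1), e.g. after a sigmoid
        loss = self.loss(y_hat, y)
        return loss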
On the other hand, the latter, torch.nn.functional.binary_cross_entropy, is the functional interface. It is actually the underlying operator used by nn.BCELoss, as you can see in the source of nn.BCELoss.forward. You can use this interface directly, but it can become cumbersome with stateful operators. In this particular case, the binary cross-entropy loss has no learnable parameters (its optional weight argument is a fixed rescaling tensor), so you could do:
import torch.nn.functional as F

class Trainer():
    def __init__(self, model):
        self.model = model

    def __call__(self, x, y):
        y_hat = self.model(x)
        loss = F.binary_cross_entropy(y_hat, y)  # no module to create: call the function directly
        return loss
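A quick way to see that the two interfaces compute the same thing is to configure the module once and pass the same setting to the function on each call. This is a minimal sketch with random data; `reduction="sum"` is just an arbitrary non-default setting chosen to show where the configuration lives in each style.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
probs = torch.sigmoid(torch.randn(4, 3))        # predictions must already be in (0, 1)
targets = torch.randint(0, 2, (4, 3)).float()   # binary targets, same shape as probs

# Module interface: the configuration is stored at initialization and reused on every call.
criterion = nn.BCELoss(reduction="sum")
loss_module = criterion(probs, targets)

# Functional interface: the same configuration is passed explicitly on every call.
loss_functional = F.binary_cross_entropy(probs, targets, reduction="sum")

print(torch.allclose(loss_module, loss_functional))
```

The module style pays off when the same configured loss is reused in many places; the functional style avoids creating an object when the call site is the only place it is needed.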

BCELoss is the binary cross-entropy loss.
torch.nn.functional.binary_cross_entropy computes the actual loss inside torch.nn.BCELoss().
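To make "computes the actual loss" concrete, the default `reduction="mean"` result can be compared with the textbook formula, the mean of -[y·log(p) + (1 - y)·log(1 - p)] over all elements. A minimal sketch with made-up probabilities; note that PyTorch additionally clamps the log terms at -100, which does not matter for these values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

probs = torch.tensor([0.9, 0.2, 0.6])
targets = torch.tensor([1.0, 0.0, 1.0])

# Binary cross-entropy written out by hand, averaged over elements
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()

functional = F.binary_cross_entropy(probs, targets)  # reduction="mean" by default
module = nn.BCELoss()(probs, targets)                # same computation through the module
```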

Related

How to get the params in a custom implementation of a pytorch optimizer?

Quick overview of my issue: I'm implementing a custom optimizer, and all I want to do is add two class variables and an extra instruction to the step() function. Here's the class code for reference:
from typing import Optional

import torch

import privacy  # assumed: the asker's own module providing noise_and_clip_parameters


class DPSGD(torch.optim.SGD):
    def __init__(
        self, *args, noise_multiplier: float = 0.5, l2_norm_clip: float = 1.5, **kwargs
    ) -> None:
        # *args comes first so DPSGD(model.parameters(), lr=...) hands the parameters
        # to SGD instead of binding them to noise_multiplier
        super().__init__(*args, **kwargs)
        self.noise_multiplier = noise_multiplier
        self.l2_norm_clip = l2_norm_clip

    def step(self, closure=None) -> Optional[float]:
        loss = super().step(closure)  # forward the closure; step() returns the loss (or None)
        params = []
        # works for getting the params I need but there's gotta be a better way
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    params.append(p)
        # custom function that takes model.parameters()
        privacy.noise_and_clip_parameters(
            params,
            l2_norm_clip=self.l2_norm_clip,
            noise_multiplier=self.noise_multiplier,
        )
        return loss
I don't really understand what the param_groups are. I know that when the model.parameters() are passed to an optimizer like
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
they are taken and converted into the param_groups as a class variable, but I don't know a simple way to just get the original params out as they were. Is there a way to extract those original model.parameters() that's already implemented in the base optimizer class? My goal is to not have to use that nested for loop because I'm planning on extending other torch optimizers in the same manner and for readability I'd rather avoid writing more code than I need to.
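For reference, `param_groups` is a list of dicts: each dict holds a `"params"` list of tensors plus that group's hyperparameters (defaults filled in). Passing `model.parameters()` directly produces a single group. A minimal sketch, assuming a small hypothetical model and two groups so the structure is visible, of flattening the groups back into one list with `itertools.chain`:

```python
import itertools

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

# Two parameter groups; the second overrides the learning rate.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters()},
        {"params": model[2].parameters(), "lr": 0.01},
    ],
    lr=0.001,
    momentum=0.9,
)

# Each group is a dict with the tensors plus its own hyperparameters.
for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"], group["momentum"])

# Flatten all groups into one list of tensors, in the order they were passed in.
all_params = list(itertools.chain.from_iterable(g["params"] for g in optimizer.param_groups))
```

Inside a custom `step()`, the same one-liner with an `if p.grad is not None` filter would replace the nested loop; the tensors are the model's own parameter objects, not copies.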

When are Model call() and train_step() called?

I am going through this tutorial on how to customize the training loop
https://colab.research.google.com/github/tensorflow/docs/blob/snapshot-keras/site/en/guide/keras/customizing_what_happens_in_fit.ipynb#scrollTo=46832f2077ac
The last example shows a GAN implemented with a custom training step, where only the __init__, train_step, and compile methods are defined:
from tensorflow import keras

class GAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super(GAN, self).__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super(GAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    def train_step(self, real_images):
        if isinstance(real_images, tuple):
            real_images = real_images[0]
        ...
What happens if my model also has a custom call() function? Does train_step() override call()?
Aren't call() and train_step() both called by fit(), and what is the difference between the two?
Below is another piece of code "I" wrote, where I wonder which one fit() calls, call() or train_step():
import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units):
        super().__init__()  # do not pass self a second time
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(rnn_units,
                                       return_sequences=True,
                                       return_state=True,
                                       reset_after=True
                                       )
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = inputs
        x = self.embedding(x, training=training)
        if states is None:
            states = self.gru.get_initial_state(x)
        x, states = self.gru(x, initial_state=states, training=training)
        x = self.dense(x, training=training)
        if return_state:
            return x, states
        else:
            return x

    @tf.function
    def train_step(self, inputs):
        # unpack the data
        inputs, labels = inputs
        with tf.GradientTape() as tape:
            predictions = self(inputs, training=True)  # forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(labels, predictions, regularization_losses=self.losses)
        # compute the gradients (use self, not a global `model`)
        grads = tape.gradient(loss, self.trainable_variables)
        # Update weights
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(labels, predictions)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}
These are different concepts and are used like this:
train_step is called by fit. Basically, fit loops over the dataset and provides each batch to train_step (and then handles metrics, bookkeeping, etc., of course).
call is used when you, well, call the model. To be precise, writing model(inputs) or in your case self(inputs) will use the function __call__, but the Model class has that function defined such that it will in turn use call.
Those are the technical aspects. Intuitively:
call should define the forward pass of your model, that is, how the input is transformed to the output.
train_step defines the logic of a training step, usually with gradient descent. It will often make use of call since the training step tends to include a forward pass of the model to compute gradients.
As for the GAN tutorial you linked, I would consider it incomplete. It works without defining call because the custom train_step explicitly calls the generator/discriminator fields (as these are predefined models, they can be called as usual). If you tried to call the GAN model like gan(inputs), I would assume you get an error message (I did not test this). So you would always have to call gan.generator(inputs) to generate, for example.
Finally (this part may be a bit confusing), note that you can subclass a Model to define a custom training step, but then initialize it via the functional API (like model = Model(inputs, outputs)), in which case you can make use of call in the training step without ever defining it yourself because the functional API takes care of that.
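The division of labour can be observed by logging from both methods. A minimal sketch, assuming TF 2.x with tf.keras (either Keras 2 or Keras 3); `TracingModel` and `EVENTS` are made-up names. `run_eagerly=True` keeps the Python `append` calls running on every step instead of only while the step function is traced, and the overridden `train_step` defers to the default step, which runs the forward pass through `call`.

```python
import numpy as np
import tensorflow as tf

EVENTS = []  # module-level log, kept outside the model so Keras does not try to track it


class TracingModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, training=False):
        EVENTS.append("call")        # forward pass
        return self.dense(inputs)

    def train_step(self, data):
        EVENTS.append("train_step")  # one entry per batch handed over by fit()
        return super().train_step(data)  # default step: forward pass via call(), loss, update


model = TracingModel()
model.compile(optimizer="sgd", loss="mse", run_eagerly=True)

x = np.random.rand(8, 3).astype("float32")
y = np.random.rand(8, 1).astype("float32")
model.fit(x, y, batch_size=4, epochs=1, verbose=0)  # 8 samples / batch_size 4 -> 2 batches

print(EVENTS.count("train_step"), EVENTS.count("call"))
```

fit calls train_step once per batch, and each train_step reaches call through `self(x)`. Depending on the Keras version, call may also run once more while the model is built, so the call count can exceed the number of batches.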

Will my loss function work the way I would like it to work? (Keras)

So I implemented a neural network with this code:
self.model = keras.Sequential()
self.model.add(keras.Input(shape=(self.wejscia,), name="Input"))
self.model.add(layers.Dense(64, activation="relu", name="dense_1"))
self.model.add(layers.Dense(64, activation="relu", name="dense_2"))
self.model.add(layers.Dense(8, activation="softmax", name="predictions"))
But I wanted to make it possible to perform gradient descent on only one chosen position of the output vector. The way I did it was like this:
First I created a class like that:
import tensorflow as tf
from tensorflow import keras

class CustomMSE(keras.losses.Loss):
    def __init__(self, my_output, name="custom_mse"):
        super().__init__(name=name)
        self.my_output = my_output

    def call(self, y_true, y_pred):
        # note: index 0 selects only the first sample of each batch
        mse = tf.math.reduce_mean(tf.square(y_true[0,self.my_output] - y_pred[0,self.my_output]))
        return mse
and then I just applied compile method like that:
self.model.compile(optimizer=keras.optimizers.Adam(), loss=CustomMSE(i))
I am not sure of two things.
First: will the .fit method modify the weights between the second hidden layer and the j-th output for j != i? (I hope it won't.)
Second: will applying self.model.compile(optimizer=keras.optimizers.Adam(), loss=CustomMSE(i)) many times for different values of i affect the current weights of the model, or will it just change the further behavior of the network after applying the .fit method?
With the code you have, it will not work as expected, as you are using tf functions rather than keras.backend functions to create loss functions. Here is an example of how you can create a custom loss function:
import tensorflow.keras.backend as kb

def custom_loss(y_actual, y_pred):
    custom_loss = kb.square(y_actual - y_pred)
    return custom_loss
You can use this loss function like this:
model.compile(loss=custom_loss,optimizer=optimizer)
Of course, this is not the same loss function you implemented, but it shows the methodology.
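For comparison with the asker's CustomMSE, here is a sketch of a single-column loss that keeps every row of the batch (the original `y_true[0, ...]` looks only at the first sample). It assumes TF 2.x, where plain TensorFlow ops are accepted inside a `tf.keras.losses.Loss` subclass; `ColumnMSE` is a hypothetical name. `call` returns one value per sample and the base class averages them over the batch.

```python
import tensorflow as tf


class ColumnMSE(tf.keras.losses.Loss):
    def __init__(self, column, name="column_mse"):
        super().__init__(name=name)
        self.column = column

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # [:, column] keeps every sample; one squared error per sample is returned
        return tf.square(y_true[:, self.column] - y_pred[:, self.column])


loss_fn = ColumnMSE(column=1)
y_true = tf.constant([[0.0, 1.0], [0.0, 3.0]])
y_pred = tf.zeros((2, 2))
value = loss_fn(y_true, y_pred)  # averages 1.0 and 9.0 over the batch of 2
```

One caveat relevant to the first question: with a softmax output layer, every output depends on all of the last layer's logits, so a loss on column i still produces gradients for the weights feeding the other outputs.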

Differentiating user-defined Variables when using Keras layers

I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf
x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
tape.watch(s)
y = model(X)
loss = y**2
print(tape.gradient(loss, s)) # why None ??
The print prints None... why?
Notice that I am using eager-execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on reviewing the documentation, I suspect that the issue is that differentiating with respect to the model layer "s" (or any other layer, say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.
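The question mentions that a custom layer also works. A minimal sketch of that alternative, assuming TF 2.x with tf.keras (the layer name `Scale` is hypothetical): the vector is created with `add_weight` inside the layer, so Keras tracks it as a trainable weight and the tape returns a gradient for it instead of None.

```python
import tensorflow as tf


class Scale(tf.keras.layers.Layer):
    """Hypothetical layer: multiplies its input elementwise by a trainable vector s."""

    def build(self, input_shape):
        # add_weight registers s with the layer, so it appears in trainable_weights
        self.s = self.add_weight(name="s", shape=(input_shape[-1],),
                                 initializer="ones", trainable=True)

    def call(self, inputs):
        return inputs * self.s


inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(inputs)
scale = Scale()
x = scale(x)
model = tf.keras.Model(inputs, x)

X = tf.random.normal((50, 10))  # random sample
with tf.GradientTape() as tape:
    y = model(X)
    loss = tf.reduce_mean(y ** 2)

grads = tape.gradient(loss, scale.trainable_weights)  # [gradient for s]
grad = grads[0]
```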

Pytorch custom activation functions?

I'm having issues with implementing custom activation functions in Pytorch, such as Swish. How should I go about implementing and using custom activation functions in Pytorch?
There are four possibilities depending on what you are looking for. You will need to ask yourself two questions:
Q1) Will your activation function have learnable parameters?
If yes, you have no choice but to create your activation function as an nn.Module class because you need to store those weights.
If no, you are free to simply create a normal function, or a class, depending on what is convenient for you.
Q2) Can your activation function be expressed as a combination of existing PyTorch functions?
If yes, you can simply write it as a combination of existing PyTorch functions and won't need to create a backward function which defines the gradient.
If no, you will need to write the gradient by hand.
Example 1: SiLU function
The SiLU function f(x) = x * sigmoid(x) does not have any learned weights and can be written entirely with existing PyTorch functions, thus you can simply define it as a function:
def silu(x):
    return x * torch.sigmoid(x)
and then simply use it as you would have torch.relu or any other activation function.
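A minimal sketch of using the plain function inside a model, assuming a hypothetical two-layer `SmallNet`. Autograd differentiates `mul` and `sigmoid` on its own, and the result can be checked against PyTorch's built-in `F.silu` (available since PyTorch 1.7).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def silu(x):
    return x * torch.sigmoid(x)


class SmallNet(nn.Module):
    """Hypothetical network that applies the plain silu function between two layers."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc2(silu(self.fc1(x)))  # used exactly like torch.relu would be


torch.manual_seed(0)
net = SmallNet()
out = net(torch.randn(3, 4))
out.sum().backward()  # no custom backward needed

x = torch.linspace(-5.0, 5.0, steps=11)
matches_builtin = torch.allclose(silu(x), F.silu(x))
```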
Example 2: SiLU with learned slope
In this case you have one learned parameter, the slope, thus you need to make a class of it.
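import torch
import torch.nn as nn

class LearnedSiLU(nn.Module):
    def __init__(self, slope = 1):
        super().__init__()
        # wrap the value itself in nn.Parameter so the module registers it as learnable
        self.slope = nn.Parameter(slope * torch.ones(1))

    def forward(self, x):
        return self.slope * x * torch.sigmoid(x)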
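Whether a value is actually learned depends on the module registering it, which only happens when the attribute itself is an `nn.Parameter`. Multiplying a Parameter by a number returns a plain tensor, which the module does not register, so an optimizer built from `model.parameters()` would never see it. A minimal check with two hypothetical modules:

```python
import torch
import torch.nn as nn


class Registered(nn.Module):
    def __init__(self, slope=1.0):
        super().__init__()
        self.slope = nn.Parameter(slope * torch.ones(1))  # attribute is a Parameter -> registered


class NotRegistered(nn.Module):
    def __init__(self, slope=1.0):
        super().__init__()
        self.slope = slope * nn.Parameter(torch.ones(1))  # product is a plain tensor -> not registered


registered_names = [name for name, _ in Registered().named_parameters()]
unregistered_names = [name for name, _ in NotRegistered().named_parameters()]
```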
Example 3: with backward
If you have something for which you need to create your own gradient function, you can look at this example: Pytorch: define custom function
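As an illustration of the mechanics only (Swish can be built from existing ops, so it does not actually need this), here is a sketch of a `torch.autograd.Function` with a hand-written backward, checked numerically with `torch.autograd.gradcheck`. The class name `SwishFunction` is hypothetical.

```python
import torch


class SwishFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
        return grad_output * (s + x * s * (1 - s))


def swish(x):
    return SwishFunction.apply(x)


# gradcheck compares the analytic backward with finite differences (double precision recommended)
x = torch.randn(8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(swish, (x,))
```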
You can write a customized activation function like below (e.g. weighted Tanh).
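class weightedTanh(nn.Module):
    def __init__(self, weights = 1):
        super().__init__()
        self.weights = weights

    def forward(self, input):
        # equals tanh(weights * input); exp overflows for large inputs,
        # so torch.tanh(self.weights * input) is the numerically safer equivalent
        ex = torch.exp(2*self.weights*input)
        return (ex-1)/(ex+1)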
Don’t bother about backpropagation if you use autograd compatible operations.
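To see autograd handling the backward pass of such a composite expression, its gradient can be compared with the derivative of tanh(w·x) worked out by hand, w·(1 - tanh²(w·x)). A minimal sketch; w = 0.5 is an arbitrary fixed weight.

```python
import torch

w = 0.5  # arbitrary fixed weight, as in weightedTanh(weights=0.5)
x = torch.linspace(-3.0, 3.0, steps=7, requires_grad=True)

# Same expression as weightedTanh.forward; autograd differentiates exp, sub, add and div.
ex = torch.exp(2 * w * x)
y = (ex - 1) / (ex + 1)
y.sum().backward()

# Hand-derived derivative of tanh(w * x) for comparison
with torch.no_grad():
    expected = w * (1 - torch.tanh(w * x) ** 2)

print(torch.allclose(x.grad, expected, atol=1e-6))
```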
I wrote the following SinActivation sub-class of nn.Module to implement the sin activation function.
class SinActivation(torch.nn.Module):
    def __init__(self):
        super(SinActivation, self).__init__()

    def forward(self, x):
        return torch.sin(x)
