I am trying to write some custom layers in Keras. The ultimate goal is that certain parameters (updated according to a fixed formula after each batch of data is optimized over in the training process) be passed to the loss function. I do not believe it is possible to use dynamic loss functions in Keras, but I believe I should be able to pass these parameters to the loss function using multiple inputs and a custom layer.
I want to know whether it is possible to create a layer in Keras having parameters that are not trainable (and not optimized over at all in the training process), but instead updated according to a fixed formula at the end of each batch optimization in the training process.
The simplest example I can give: instead of optimizing a generic cost function (like cross-entropy), I want to optimize something proportional to the cross-entropy (c*cross_entropy). After one batch of data is processed in the training procedure, I want to set, for example, c = 1.2*c, and have this new value of c used for the next batch of data.
(This should be more or less useless in this case, since multiplying the loss function by a positive constant shouldn't affect the minima, but it's fairly close to what I actually need to do.)
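To make the intent concrete, here is a rough sketch of the kind of thing I have in mind, written as a callback that mutates a non-trainable variable which the loss reads (the names c_var, scaled_crossentropy and UpdateC are just mine for illustration; I don't know whether this is the right mechanism):

import tensorflow as tf

# Non-trainable scalar: read by the loss, never touched by the optimizer.
c_var = tf.Variable(1.0, trainable=False, dtype=tf.float32)

def scaled_crossentropy(y_true, y_pred):
    # Loss proportional to cross-entropy, with the factor taken from c_var.
    return c_var * tf.keras.losses.categorical_crossentropy(y_true, y_pred)

class UpdateC(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        # Fixed update rule applied after every batch: c <- 1.2 * c
        c_var.assign(1.2 * c_var)

# model.compile(optimizer="adam", loss=scaled_crossentropy)
# model.fit(x, y, callbacks=[UpdateC()])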
Related
I want to train a Neural Network in PyTorch. I have my training dataset, however I care more about some examples than about others. I want to include this information in the loss function - to let the NN know that it is very important to get some examples right and to not punish errors on other examples very much.
I want to do this by weighting the loss for training examples, let's say:
loss = weight_for_example*(y_true - y_pred)^2
Is there an easy way to do this in PyTorch?
It mainly depends on your task: for instance, BCEWithLogitsLoss has a weight parameter that allows a custom weight for each element of the batch. Many other built-in losses also provide this option.
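For instance (the tensors below are just placeholders), the weight tensor is broadcast against the unreduced loss, so each batch element gets its own rescaling factor:

import torch
import torch.nn as nn

logits = torch.randn(4)                     # raw model outputs for a batch of 4
targets = torch.tensor([1., 0., 1., 0.])    # binary targets

# Per-element weights: the first two examples count twice as much.
weights = torch.tensor([2.0, 2.0, 1.0, 1.0])

criterion = nn.BCEWithLogitsLoss(weight=weights)
loss = criterion(logits, targets)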
Aside from solutions already available in the framework such as this, a simple approach could be the following:
build a custom dataset, returning your data and a scalar weight for that sample in your __getitem__
proceed with the forward pass
compute your loss, which you can now multiply by the factors you provided.
There's only one caveat (which is the same as for BCELoss): you probably iterate over batches with size > 1, so your dataloader will provide a batch of data together with a batch of weights. You need to make sure you don't reduce your loss beforehand, so that you can still multiply it by your batch weights, and then proceed with a manual reduction (e.g. loss = loss.mean()), as in the sketch at the end of this answer.
See some examples here.
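A minimal sketch of that recipe, assuming a regression-style setup with MSE (the dataset, model and weights below are all placeholders):

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class WeightedDataset(Dataset):
    def __init__(self, x, y, weights):
        self.x, self.y, self.weights = x, y, weights

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return the sample together with its scalar weight.
        return self.x[idx], self.y[idx], self.weights[idx]

x, y, w = torch.randn(100, 10), torch.randn(100, 1), torch.rand(100)
loader = DataLoader(WeightedDataset(x, y, w), batch_size=16)

model = nn.Linear(10, 1)
criterion = nn.MSELoss(reduction="none")     # keep the per-element loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for xb, yb, wb in loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)          # shape (batch, 1), not yet reduced
    loss = (wb.unsqueeze(1) * loss).mean()   # weight each example, then reduce
    loss.backward()
    optimizer.step()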
I am using Stochastic Weight Averaging (SWA) with Batch Normalization layers in Tensorflow 2.2. For Batch Norm I use tf.keras.layers.BatchNormalization. For SWA I use my own code to average the weights (I wrote my code before tfa.optimizers.SWA appeared). I have read in multiple sources that if using batch norm and SWA we must run a forward pass to make certain data (running mean and st dev of activation weights and/or momentum values?) available to the batch norm layers. What I do not understand - despite a lot of reading - is exactly what needs to be done and how. Specifically:
When must the forward/prediction pass be run? At the end of each mini-batch, at the end of each epoch, or at the end of all training?
When the forward pass is run, how are the running mean & stdev values made available to the batch norm layers?
Is this process performed magically by the tfa.optimizers.SWA class?
When must the forward/prediction pass be run? At the end of each mini-batch, end of each epoch, end of all training?
At the end of training. Think of it like this: SWA is performed by swapping your final weights with a running average. But all the batch norm layers are still calibrated to statistics from your old weights, so we need to run a forward pass to let them catch up.
When the forward pass is run, how are the running mean & stdev values made available to the batch norm layers?
During a normal forward pass (prediction) the running mean and standard deviation will not be updated. So what we actually need to do is to train the network, but not update the weights. This is what the paper refers to when it says to run the forward pass in "training mode".
The easiest way to achieve this (that I know) is to reset the batch normalization layers and train one additional epoch with learning rate set to 0.
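A rough sketch of that recipe in tf.keras, assuming model already holds the averaged (SWA) weights and x_train/y_train are your training data (the loss is a placeholder, and the flat loop over model.layers would need to recurse for nested models):

import tensorflow as tf

# Reset the running statistics of every batch norm layer.
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.moving_mean.assign(tf.zeros_like(layer.moving_mean))
        layer.moving_variance.assign(tf.ones_like(layer.moving_variance))

# One extra "training" epoch with learning rate 0: the weights stay fixed,
# but the batch norm layers recompute their moving mean / variance.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0),
              loss="categorical_crossentropy")
model.fit(x_train, y_train, epochs=1)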
Is this process performed magically by the tfa.optimizers.SWA class?
I don't know. But if you are using TensorFlow Keras, then I have made a Keras SWA callback that does it as in the paper, including the learning rate schedules.
I'm a newbie with PyTorch and adversarial networks. I've tried to look for an answer on the PyTorch documentation and from previous discussions both in the PyTorch and StackOverflow forums, but I couldn't find anything useful.
I'm trying to train a GAN with a Generator and a Discriminator, but I cannot understand if the whole process is working or not. As far as I understand, I should train the Generator first and then update the Discriminator's weights (similarly to this). My code for updating the weights of both models is:
# computing loss_g and loss_d...
optim_g.zero_grad()
loss_g.backward()
optim_g.step()
optim_d.zero_grad()
loss_d.backward()
optim_d.step()
where loss_g is the generator loss, loss_d is the discriminator loss, optim_g is the optimizer referring to the generator's parameters and optim_d is the discriminator optimizer.
If I run the code like this, I get an error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
So I specify loss_g.backward(retain_graph=True), and this is where my doubt comes in: why should I have to specify retain_graph=True if there are two networks with two different graphs? Am I getting something wrong?
Having two different networks doesn't necessarily mean that the computational graph is different. The computational graph only tracks the operations that were performed from the input to the output and it doesn't matter where the operation takes place. In other words, if you use the output of the first model in the second model (e.g. model2(model1(input))), you have the same sequential operations as if they were part of the same model. In fact, that is no different from having different parts of the model, such as multiple convolutions, that you apply one after the other.
The error you get indicates that you are trying to backpropagate from the discriminator through the generator, which would mean that the discriminator's output directly adapts the generator's parameters so that the discriminator can be successful. In an adversarial setting that is precisely what you want to avoid; they should be independent from each other. By setting retain_graph=True you incorrectly hide this bug. In nearly all cases retain_graph=True is not the solution and should be avoided.
To resolve that issue, the two models need to be made independent from each other. The crossover between the two models happens when you use the generator's output for the discriminator, since it has to decide whether that output was real or fake. Something along these lines:
fake = generator(noise)
real_prediction = discriminator(real)
# Using the output of the generator, continues the graph.
fake_prediction = discriminator(fake)
Even though fake comes from the generator, as far as the discriminator is concerned, it's merely another input, just like real. Therefore fake should be treated the same as real, where it is not attached to any computational graph. That can easily be done with torch.Tensor.detach, which decouples the tensor from the graph.
fake = generator(noise)
real_prediction = discriminator(real)
# Detach to make it independent of the generator
fake_prediction = discriminator(fake.detach())
That is also done in the code you referenced, from erikqu/EnhanceNet-PyTorch - train.py:
hr_imgs = torch.cat([discriminator(hr), discriminator(generated_hr.detach())], dim=0)
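Putting it together, one full iteration could look roughly like this, reusing the names from your snippets (the BCE losses and the real/fake label conventions are just placeholders for a standard GAN objective, and the discriminator step is done first, which is the more common ordering):

import torch
criterion = torch.nn.BCEWithLogitsLoss()

# ---- Discriminator step: generator output is detached ----
optim_d.zero_grad()
fake = generator(noise)
real_prediction = discriminator(real)
fake_prediction = discriminator(fake.detach())   # no gradient flows into the generator
loss_d = (criterion(real_prediction, torch.ones_like(real_prediction))
          + criterion(fake_prediction, torch.zeros_like(fake_prediction)))
loss_d.backward()
optim_d.step()

# ---- Generator step: fresh pass through the discriminator, not detached ----
optim_g.zero_grad()
fake_prediction = discriminator(fake)            # gradients now flow back into the generator
loss_g = criterion(fake_prediction, torch.ones_like(fake_prediction))
loss_g.backward()                                # discriminator grads are discarded by the next zero_grad
optim_g.step()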
We set model.train() during training, but during my training iterations, I also want to do a forward pass of the training dataset to see what my new loss is. When doing this, should I temporarily set model.eval()?
If your network has layers which behave differently during inference (torch.nn.BatchNormNd and torch.nn.DropoutNd are examples; in the second case all neurons are used, but scaled by the inverted probability of keeping neurons, see here or here for example), and you want to test how your network currently performs (which is usually called a validation step), then it is mandatory to use module.eval().
It is a common (and very good!) practice to always switch to eval mode when doing inference-like things no matter if this changes your actual model.
EDIT:
You should also use a with torch.no_grad(): block during inference (see the official tutorial code), as gradients are not needed during this phase and it is wasteful to compute them.
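For example, a typical validation pass inside the training loop could look roughly like this (model, val_loader and criterion stand for whatever you already have):

import torch

model.eval()                      # switch BatchNorm / Dropout to inference behaviour
val_loss, n_batches = 0.0, 0
with torch.no_grad():             # gradients are not needed, saves memory and compute
    for xb, yb in val_loader:
        val_loss += criterion(model(xb), yb).item()
        n_batches += 1
print(f"validation loss: {val_loss / n_batches:.4f}")

model.train()                     # switch back before continuing training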
I have a basic question. I want to know whether it is possible to use Keras (e.g. the functional API) to specify a neural network model, and then use the Keras optimization routines to train the network outside of the Keras training process. In other words, use Keras purely to specify a neural network, pull the weights out and put them into a loss function (outside the Keras APIs if necessary), and then use one of the built-in optimization routines purely to minimize that loss function over a single batch of data (no multiple epochs or anything beyond minimization of the loss function with respect to one batch of data at this point).
My reason for wanting to do this is that I would like to use a dynamic optimization process that changes from batch iteration to batch iteration, which seems difficult to implement entirely within the Keras APIs.
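To make the intent concrete, something along these lines is what I have in mind: the model is specified with the functional API, but a single batch is minimized manually with GradientTape and a built-in optimizer (the architecture and loss below are placeholders for my own):

import tensorflow as tf

# Model specified with the functional API, used only as a parameterized function.
inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
model = tf.keras.Model(inputs, outputs)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()     # stand-in for a custom loss

def train_single_batch(x_batch, y_batch):
    # One minimization step on one batch, entirely outside model.fit.
    with tf.GradientTape() as tape:
        pred = model(x_batch, training=True)
        loss = loss_fn(y_batch, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss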