PyTorch custom activation functions?

I'm having issues implementing custom activation functions in PyTorch, such as Swish. How should I go about implementing and using custom activation functions in PyTorch?

There are four possibilities depending on what you are looking for. You will need to ask yourself two questions:
Q1) Will your activation function have learnable parameters?
If yes, you have no choice but to create your activation function as an nn.Module class because you need to store those weights.
If no, you are free to simply create a normal function, or a class, depending on what is convenient for you.
Q2) Can your activation function be expressed as a combination of existing PyTorch functions?
If yes, you can simply write it as a combination of existing PyTorch functions and won't need to create a backward function that defines the gradient.
If no, you will need to write the gradient by hand.
Example 1: SiLU function
The SiLU function f(x) = x * sigmoid(x) does not have any learned weights and can be written entirely with existing PyTorch functions, so you can simply define it as a function:
import torch
def silu(x):
    return x * torch.sigmoid(x)
and then use it as you would torch.relu or any other activation function.
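To show silu being used exactly like torch.relu, here is a self-contained sketch. SmallNet and its layer sizes are hypothetical, and torch.nn.functional.silu (available in recent PyTorch releases) is used only as a reference to compare against:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def silu(x):
    return x * torch.sigmoid(x)


class SmallNet(nn.Module):
    """Hypothetical two-layer network that calls silu like torch.relu."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc2(silu(self.fc1(x)))


x = torch.randn(5, 4, requires_grad=True)
out = SmallNet()(x)
out.sum().backward()  # autograd derives the gradient of silu automatically

# Compare against the built-in SiLU on a few points.
z = torch.linspace(-3, 3, 7)
print(torch.allclose(silu(z), F.silu(z)))  # True
```

No backward function was written anywhere: because silu only uses differentiable PyTorch operations, x.grad is populated by autograd.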
Example 2: SiLU with learned slope
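In this case you have one learned parameter, the slope, so you need to make it a class.
import torch
import torch.nn as nn
class LearnedSiLU(nn.Module):
    def __init__(self, slope=1):
        super().__init__()
        # Wrap the initial value in nn.Parameter so the module registers it;
        # multiplying a Parameter by a number would give a plain tensor instead.
        self.slope = nn.Parameter(slope * torch.ones(1))
    def forward(self, x):
        return self.slope * x * torch.sigmoid(x)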
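A quick way to check that the slope is really learnable is to look at named_parameters() and take one optimizer step. The sketch below wraps the initial value in nn.Parameter directly; the BrokenSiLU class (a hypothetical name) shows that writing slope * nn.Parameter(...) instead produces an ordinary tensor that nn.Module does not register. The toy target with slope 2 is an arbitrary assumption:

```python
import torch
import torch.nn as nn


class LearnedSiLU(nn.Module):
    def __init__(self, slope=1.0):
        super().__init__()
        # The Parameter itself is stored, so the module registers it.
        self.slope = nn.Parameter(slope * torch.ones(1))

    def forward(self, x):
        return self.slope * x * torch.sigmoid(x)


class BrokenSiLU(nn.Module):
    def __init__(self, slope=1.0):
        super().__init__()
        # Arithmetic on a Parameter returns a plain Tensor: nothing is registered.
        self.slope = slope * nn.Parameter(torch.ones(1))


act = LearnedSiLU()
print([name for name, _ in act.named_parameters()])  # ['slope']
print(len(list(BrokenSiLU().parameters())))           # 0

# One SGD step towards a toy target whose "true" slope is 2.
opt = torch.optim.SGD(act.parameters(), lr=0.1)
x = torch.linspace(-2, 2, 10)
target = 2.0 * x * torch.sigmoid(x)
loss = ((act(x) - target) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(act.slope.item())  # now larger than 1.0
```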
Example 3: with backward
If you have something for which you need to create your own gradient function, you can look at this example: Pytorch: define custom function
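As an illustration of that case, here is a sketch of Swish/SiLU written as a torch.autograd.Function with a hand-written backward. Swish does not actually need this (autograd handles it), so treat it purely as a template; SwishFunction and swish are hypothetical names. torch.autograd.gradcheck compares the manual gradient with finite differences:

```python
import torch


class SwishFunction(torch.autograd.Function):
    """Swish / SiLU with an explicit backward pass (illustrative only)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
        return grad_output * (s + x * s * (1 - s))


def swish(x):
    return SwishFunction.apply(x)


# gradcheck expects double-precision inputs that require gradients.
x = torch.randn(6, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(swish, (x,)))  # True
```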

You can write a customized activation function like the one below (e.g. a weighted tanh).
import torch
import torch.nn as nn
class weightedTanh(nn.Module):
    def __init__(self, weights=1):
        super().__init__()
        self.weights = weights
    def forward(self, input):
        ex = torch.exp(2 * self.weights * input)
        return (ex - 1) / (ex + 1)
You don't need to worry about backpropagation as long as you use autograd-compatible operations.
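Note that (exp(2wx) - 1) / (exp(2wx) + 1) is exactly tanh(wx). Computing it through exp overflows for large inputs (exp(200) is inf in float32, and inf/inf is nan), so a sketch of the same activation using torch.tanh may be safer. WeightedTanh is a hypothetical name, and weights stays a fixed number as in the weightedTanh class above:

```python
import torch
import torch.nn as nn


class WeightedTanh(nn.Module):
    """Same function as weightedTanh, written with torch.tanh."""

    def __init__(self, weights=1.0):
        super().__init__()
        self.weights = weights  # fixed scale; wrap in nn.Parameter to learn it

    def forward(self, input):
        # (exp(2wx) - 1) / (exp(2wx) + 1) == tanh(w * x)
        return torch.tanh(self.weights * input)


x = torch.tensor([-1.0, 0.0, 0.5, 100.0])
ex = torch.exp(2 * x)
manual = (ex - 1) / (ex + 1)  # exp(200) overflows to inf -> inf / inf = nan
print(manual)                 # last entry is nan
print(WeightedTanh()(x))      # last entry is 1.0
```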

I wrote the following SinActivation subclass of nn.Module to implement the sine activation function.
import torch
class SinActivation(torch.nn.Module):
    def __init__(self):
        super(SinActivation, self).__init__()
    def forward(self, x):
        return torch.sin(x)
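Because SinActivation has no learnable parameters, the explicit __init__ is optional (the inherited nn.Module constructor runs automatically), and the module can be dropped into nn.Sequential like any built-in activation. A self-contained sketch; the layer sizes are arbitrary assumptions:

```python
import torch
import torch.nn as nn


class SinActivation(nn.Module):
    def forward(self, x):
        return torch.sin(x)


model = nn.Sequential(nn.Linear(3, 16), SinActivation(), nn.Linear(16, 1))
y = model(torch.randn(8, 3))
print(y.shape)  # torch.Size([8, 1])
```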

Related

Python Unit Testing: How is function automatically called without providing function name?

I am looking into the code in vanilla_vae here and its unit test test_vae here.
In the test_vae snippet below, I am confused as to how the self.model(x) call in the test_loss(self) function invokes the VanillaVAE class's forward method without naming it. Could anyone provide some insight on this?
def setUp(self) -> None:
    # self.model2 = VAE(3, 10)
    self.model = VanillaVAE(3, 10)
def test_loss(self):
    x = torch.randn(16, 3, 64, 64)
    result = self.model(x)
    loss = self.model.loss_function(*result, M_N = 0.005)
    print(loss)
This is because VanillaVAE inherits from BaseVAE, which inherits from nn.Module.
nn.Module defines a __call__ method, the special method that makes instances of a class callable.
This __call__ dispatches to _call_impl, which runs any registered hooks and calls the forward method.
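A heavily simplified, pure-Python sketch of that mechanism. TinyModule and Doubler are hypothetical names; the real nn.Module.__call__ also handles hooks and other bookkeeping:

```python
class TinyModule:
    """Hypothetical stand-in for torch.nn.Module (no hooks, no parameters)."""

    def __call__(self, *args, **kwargs):
        # Calling an instance, e.g. model(x), ends up here...
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        # ...and subclasses are expected to provide forward.
        raise NotImplementedError("subclasses must implement forward")


class Doubler(TinyModule):
    def forward(self, x):
        return 2 * x


model = Doubler()
print(model(21))  # runs Doubler.forward -> 42
```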
This behavior comes from torch.nn.Module, the PyTorch base class for neural networks. In the forward method, you define how your model computes its output from its input.
This means that every time you pass an input to your model, the forward method is called automatically and returns whatever it is defined to return. In this case, judging from your link, that is a List[Tensor]:
def forward(self, input: Tensor, **kwargs) -> List[Tensor]:
    mu, log_var = self.encode(input)
    z = self.reparameterize(mu, log_var)
    return [self.decode(z), input, mu, log_var]
Here you can also find a couple of examples of how PyTorch's nn package is used.

torch.nn.BCELoss() and torch.nn.functional.binary_cross_entropy

What is the basic difference between these two loss functions? I have already tried using both of them.
The difference is that nn.BCELoss and F.binary_cross_entropy are two PyTorch interfaces to the same operation.
The former, torch.nn.BCELoss, is a class that inherits from nn.Module, which makes it handy to use in a two-step fashion, as you always would in OOP (object-oriented programming): initialize, then use. Initialization handles parameter and attribute setup, as the name implies, which is quite useful for stateful operators such as parametrized layers and the like. This is the way to go when implementing classes of your own, for example:
import torch.nn as nn
class Trainer():
    def __init__(self, model):
        self.model = model
        self.loss = nn.BCELoss()
    def __call__(self, x, y):
        y_hat = self.model(x)
        loss = self.loss(y_hat, y)
        return loss
On the other hand, the latter, torch.nn.functional.binary_cross_entropy, is the functional interface. It is actually the underlying operator used by nn.BCELoss, as you can see at this line. You can use this interface directly, but it can become cumbersome with stateful operators. In this particular case, the binary cross-entropy loss does not have parameters (in the most general case), so you could do:
import torch.nn.functional as F
class Trainer():
    def __init__(self, model):
        self.model = model
    def __call__(self, x, y):
        y_hat = self.model(x)
        loss = F.binary_cross_entropy(y_hat, y)
        return loss
BCELoss is the binary cross-entropy loss.
torch.nn.functional.binary_cross_entropy computes the actual loss inside torch.nn.BCELoss().
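A small sketch confirming that both interfaces give the same value, and that it matches the textbook formula -[y log(p) + (1 - y) log(1 - p)] averaged over the batch (the default reduction='mean'). The random inputs are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
y_hat = torch.sigmoid(torch.randn(8))   # predicted probabilities in (0, 1)
y = torch.randint(0, 2, (8,)).float()   # binary targets as floats

loss_module = nn.BCELoss()              # class interface: construct, then call
loss_a = loss_module(y_hat, y)
loss_b = F.binary_cross_entropy(y_hat, y)  # functional interface

# Manual formula with mean reduction
loss_c = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()
print(loss_a.item(), loss_b.item(), loss_c.item())
```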

Will my loss function work the way I would like it to work? (Keras)

So I implemented a neural network with this code:
self.model = keras.Sequential()
self.model.add(keras.Input(shape=(self.wejscia,), name="Input"))
self.model.add(layers.Dense(64, activation="relu", name="dense_1"))
self.model.add(layers.Dense(64, activation="relu", name="dense_2"))
self.model.add(layers.Dense(8, activation="softmax", name="predictions"))
But I wanted to make it possible to perform gradient descent on only one chosen position of the output vector. This is how I did it.
First, I created a class like this:
class CustomMSE(keras.losses.Loss):
    def __init__(self, my_output, name="custom_mse"):
        super().__init__(name=name)
        self.my_output = my_output
    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true[0,self.my_output] - y_pred[0,self.my_output]))
        return mse
and then I called the compile method like this:
self.model.compile(optimizer=keras.optimizers.Adam(), loss=CustomMSE(i))
I am unsure about two things.
First: will the .fit method modify the weights between the second hidden layer and the j-th output for j != i? (I hope it won't.)
Second: will calling self.model.compile(optimizer=keras.optimizers.Adam(), loss=CustomMSE(i)) many times for different values of i affect the current weights of the model, or will it only change the network's behavior in subsequent .fit calls?
With your current code, it will not work as expected, because you are using tf. functions rather than keras.backend functions to create the loss function. Here is an example of how you can create a custom loss function:
import tensorflow.keras.backend as kb
def custom_loss(y_actual, y_pred):
    custom_loss = kb.square(y_actual - y_pred)
    return custom_loss
You can use this loss function like this:
model.compile(loss=custom_loss,optimizer=optimizer)
Of course, this is not the same loss function you implemented, but it shows the methodology.
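As a sketch of how the same methodology could target the question's goal, here is a hypothetical factory, make_single_output_mse(index), built from keras.backend functions. It slices y_true[:, index] across the whole batch, whereas the question's y_true[0, self.my_output] only looks at the first sample of each batch:

```python
import tensorflow as tf
import tensorflow.keras.backend as kb


def make_single_output_mse(index):
    """Hypothetical helper: MSE restricted to one column of the output."""
    def single_output_mse(y_true, y_pred):
        # [:, index] keeps every sample in the batch for column `index`.
        return kb.mean(kb.square(y_true[:, index] - y_pred[:, index]))
    return single_output_mse


loss_fn = make_single_output_mse(2)
y_true = tf.constant([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.1, 0.2, 0.7], [0.3, 0.3, 0.4]])
print(float(loss_fn(y_true, y_pred)))  # mean of 0.3**2 and 0.4**2, about 0.125
```

It would be used as self.model.compile(optimizer=keras.optimizers.Adam(), loss=make_single_output_mse(i)). Two caveats regarding the questions: because the last layer is a softmax, output i depends on all eight logits, so this loss still produces gradients for the weights feeding the other outputs; and calling compile again with a different loss does not re-initialize the layer weights, it only changes what later fit calls optimize.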

Differentiating user-defined Variables when using Keras layers

I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf
x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2
print(tape.gradient(loss, s)) # why None ??
The print statement prints None. Why?
Note that I am using eager execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))
    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
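A sketch of the custom-layer alternative, assuming tf.keras (Scale is a hypothetical layer name). The variable is created with add_weight inside the layer, so it belongs to the model, and the tape returns an actual gradient for it rather than None:

```python
import tensorflow as tf


class Scale(tf.keras.layers.Layer):
    """Hypothetical layer that multiplies its input by a trainable vector."""

    def build(self, input_shape):
        # add_weight registers the variable with the layer (and the model).
        self.s = self.add_weight(
            name="s", shape=(input_shape[-1],), initializer="ones", trainable=True
        )

    def call(self, inputs):
        return inputs * self.s


inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(inputs)
scale = Scale()
x = scale(x)
model = tf.keras.models.Model(inputs, x)

X = tf.random.normal((50, 10))
with tf.GradientTape() as tape:  # trainable variables are watched automatically
    y = model(X)
    loss = tf.reduce_mean(y ** 2)

grad = tape.gradient(loss, scale.s)
print(grad)  # a tensor of shape (5,), not None
```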
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on the documentation, I suspect the issue is that differentiating with respect to the model "layer" s (or any other layer, say x) might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.

Trainable, Multi-Parameter Activ. Function (RBF) NeuPy / Theano

How do I implement a custom activation function (an RBF kernel whose means and variances are adjusted by gradient descent) in NeuPy or Theano for use in NeuPy?
{Quick background: gradient descent works with every parameter in the network. I want to make a specialized feature space that contains optimized feature parameters so Neupy}
I think my problem is in the creation of the parameters, how they are sized, and how they are all connected.
Primary functions of interest.
Activation Function Class
class RBF(layers.ActivationLayer):
    def initialize(self):
        super(RBF, self).initialize()
        self.add_parameter(name='mean', shape=(1,),
                           value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=(1,),
                           value=init.Normal(), trainable=True)
    def output(self, input_value):
        return rbf(input_value, self.parameters)
RBF Function
def rbf(input_value, parameters):
    K = _outer_substract(input_value, parameters['mean'])
    return np.exp(- np.linalg.norm(K)/parameters['std_dev'])
Function to shape?
def _outer_substract(x, y):
    return (x - y.T).T
Help will be much appreciated, as this will provide great insight into how to customize NeuPy networks. The documentation could use some work in some areas, to say the least...
When a layer changes the shape of its input, it has to inform the subsequent layers about the change. For this, it must have a customized output_shape property. For example:
import numpy as np
from neupy import layers
from neupy.utils import as_tuple
import theano.tensor as T
class Flatten(layers.BaseLayer):
    """
    Slight modification of the Reshape layer from the neupy library:
    https://github.com/itdxer/neupy/blob/master/neupy/layers/reshape.py
    """
    @property
    def output_shape(self):
        # The number of output features depends on the input shape.
        # When the layer receives input with shape (10, 3, 4),
        # the output will be (10, 12). The first number, 10, is the
        # number of samples, which you typically don't need to
        # change during propagation.
        n_output_features = np.prod(self.input_shape)
        return (n_output_features,)
    def output(self, input_value):
        n_samples = input_value.shape[0]
        return T.reshape(input_value, as_tuple(n_samples, self.output_shape))
If you run it in a terminal, you will see that it works:
>>> network = layers.Input((3, 4)) > Flatten()
>>> predict = network.compile()
>>> predict(np.random.random((10, 3, 4))).shape
(10, 12)
In your example I can see a few issues:
The rbf function doesn't return a Theano expression, so it should fail during function compilation.
Functions like np.linalg.norm return a scalar if you don't specify the axis along which to compute the norm.
The following solution should work for you:
import numpy as np
from neupy import layers, init
import theano.tensor as T
def norm(value, axis=None):
    return T.sqrt(T.sum(T.square(value), axis=axis))
class RBF(layers.BaseLayer):
    def initialize(self):
        super(RBF, self).initialize()
        # It's more flexible when the shape of the parameters
        # depends on the input shape
        self.add_parameter(
            name='mean', shape=self.input_shape,
            value=init.Constant(0.), trainable=True)
        self.add_parameter(
            name='std_dev', shape=self.input_shape,
            value=init.Constant(1.), trainable=True)
    def output(self, input_value):
        K = input_value - self.mean
        return T.exp(-norm(K, axis=0) / self.std_dev)
network = layers.Input(1) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 1))))
network = layers.Input(4) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 4))))
Although itdxer answered the question sufficiently, I would like to add the exact solution to this problem.
Creation of Architecture
network = layers.Input(size) > RBF() > layers.Softmax(num_out)
Activation Function
# Elementwise Gaussian (RBF)
def rbf(value, mean, std):
    return T.exp(-.5*T.sqr(value-mean)/T.sqr(std))/(std*T.sqrt(2*np.pi))
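The rbf function above is the Gaussian density exp(-(v - m)^2 / (2 s^2)) / (s * sqrt(2*pi)), applied elementwise. A NumPy mirror (rbf_np is a hypothetical name; Theano is not needed to run it) makes it easy to check the values and to see that the output keeps the input's shape, which is why this layer does not need a custom output_shape:

```python
import numpy as np


def rbf_np(value, mean, std):
    """NumPy mirror of the Theano rbf: elementwise Gaussian density."""
    return np.exp(-0.5 * np.square(value - mean) / np.square(std)) / (std * np.sqrt(2 * np.pi))


x = np.random.random((10, 4))  # 10 samples, 4 features
means = np.zeros(4)            # one mean per feature (shape = input_shape)
stds = np.ones(4)              # one standard deviation per feature
out = rbf_np(x, means, stds)
print(out.shape)               # (10, 4): same shape as the input
print(rbf_np(0.0, 0.0, 1.0))   # peak of the standard normal, 1 / sqrt(2*pi)
```

Because the density divides by std, initializing std_dev with init.Constant(1.), as in the first answer, may be a safer starting point than init.Normal(), which can produce negative or near-zero values.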
RBF Class
class RBF(layers.BaseLayer):
    def initialize(self):
        # Begin by initializing.
        super(RBF, self).initialize()
        # Add parameters to train
        self.add_parameter(name='means', shape=self.input_shape,
                           value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=self.input_shape,
                           value=init.Normal(), trainable=True)
    # Define output function for the RBF layer.
    def output(self, input_value):
        return rbf(input_value, self.means, self.std_dev)
Training
If you are interested in training, it is as simple as:
from neupy import algorithms
# Set training algorithm
gdnet = algorithms.Momentum(
    network,
    momentum=0.1
)
# Train.
gdnet.train(x, y, max_iter=100)
This compiles with the proper input and target, and the means and variances are updated elementwise.
