How do I implement a custom activation function (an RBF kernel whose means and variances are adjusted by gradient descent) in Neupy, or in Theano for use in Neupy?
(Quick background: gradient descent works with every parameter in the network. I want to make a specialized feature space that contains optimized feature parameters, so that Neupy can train them along with the rest of the network.)
I think my problem is in the creation of the parameters, how they are sized, and how they are all connected.
Primary functions of interest.
Activation Function Class
class RBF(layers.ActivationLayer):
    def initialize(self):
        super(RBF, self).initialize()
        self.add_parameter(name='mean', shape=(1,),
                           value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=(1,),
                           value=init.Normal(), trainable=True)

    def output(self, input_value):
        return rbf(input_value, self.parameters)
RBF Function
def rbf(input_value, parameters):
    K = _outer_substract(input_value, parameters['mean'])
    return np.exp(- np.linalg.norm(K)/parameters['std_dev'])
Function to shape?
def _outer_substract(x, y):
    return (x - y.T).T
Help will be much appreciated, as this will provide great insight into how to customize Neupy networks. The documentation could use some work in some areas, to say the least...
When a layer changes the shape of the input variable, it has to inform the subsequent layers about the change. For this case it must have a customized output_shape property. For example:
import numpy as np
from neupy import layers
from neupy.utils import as_tuple
import theano.tensor as T

class Flatten(layers.BaseLayer):
    """
    Slight modification of the Reshape layer from the neupy library:
    https://github.com/itdxer/neupy/blob/master/neupy/layers/reshape.py
    """
    @property
    def output_shape(self):
        # The number of output features depends on the input shape.
        # When the layer receives input with shape (10, 3, 4),
        # then the output will be (10, 12). The first number, 10,
        # is the number of samples, which you typically don't need
        # to change during propagation.
        n_output_features = np.prod(self.input_shape)
        return (n_output_features,)

    def output(self, input_value):
        n_samples = input_value.shape[0]
        return T.reshape(input_value, as_tuple(n_samples, self.output_shape))
If you run it in the terminal, you will see that it works:
>>> network = layers.Input((3, 4)) > Flatten()
>>> predict = network.compile()
>>> predict(np.random.random((10, 3, 4))).shape
(10, 12)
In your example I can see a few issues:
The rbf function doesn't return a Theano expression. It should fail during function compilation.
Functions like np.linalg.norm will return a scalar if you don't specify the axis along which you want to calculate the norm.
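To illustrate the second point with plain NumPy:

import numpy as np

K = np.ones((10, 3))
print(np.linalg.norm(K))          # a single scalar: the norm of the whole matrix
print(np.linalg.norm(K, axis=1))  # shape (10,): one norm per row (per sample)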
The following solution should work for you
import numpy as np
from neupy import layers, init
import theano.tensor as T

def norm(value, axis=None):
    return T.sqrt(T.sum(T.square(value), axis=axis))

class RBF(layers.BaseLayer):
    def initialize(self):
        super(RBF, self).initialize()

        # It's more flexible when the shape of the parameters
        # depends on the input shape
        self.add_parameter(
            name='mean', shape=self.input_shape,
            value=init.Constant(0.), trainable=True)

        self.add_parameter(
            name='std_dev', shape=self.input_shape,
            value=init.Constant(1.), trainable=True)

    def output(self, input_value):
        K = input_value - self.mean
        return T.exp(-norm(K, axis=0) / self.std_dev)

network = layers.Input(1) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 1))))

network = layers.Input(4) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 4))))
Although itdxer answered the question sufficiently, I would like to add the exact solution to this problem.
Creation of Architecture
network = layers.Input(size) > RBF() > layers.Softmax(num_out)
Activation Function
# Elementwise Gaussian (RBF)
def rbf(value, mean, std):
    return T.exp(-.5*T.sqr(value - mean)/T.sqr(std))/(std*T.sqrt(2*np.pi))
RBF Class
class RBF(layers.BaseLayer):
    def initialize(self):
        # Begin by initializing.
        super(RBF, self).initialize()

        # Add parameters to train
        self.add_parameter(name='means', shape=self.input_shape,
                           value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=self.input_shape,
                           value=init.Normal(), trainable=True)

    # Define output function for the RBF layer.
    def output(self, input_value):
        return rbf(input_value, self.means, self.std_dev)
Training
If you are interested in training, it is as simple as:
from neupy import algorithms

# Set training algorithm
gdnet = algorithms.Momentum(
    network,
    momentum=0.1
)

# Train.
gdnet.train(x, y, max_iter=100)
This compiles with the proper input and target, and the means and variances are updated on an elementwise basis.
Related
So, I'm trying to create a custom layer in TensorFlow 2.4.1, using a function for a neuron I defined.
# NOTE: this is not the actual neuron I want to use,
# it's just a simple example.
def neuron(x, W, b):
    return W @ x + b
The W and b it gets would be of shape (1, x.shape[0]) and (1, 1), respectively. This means it behaves like a single neuron in a dense layer. So, I want to create a dense layer by stacking however many of these individual neurons I want.
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()  # handles standard arguments

        self.n_units = n_units  # Number of neurons to be in the layer

    def build(self, input_shape):
        # Create weights and biases for all neurons individually
        for i in range(self.n_units):
            # Create weights and bias for ith neuron
            ...

    def call(self, inputs):
        # Compute outputs for all neurons
        ...

        # Concatenate outputs to create layer output
        ...

        return output
How can I create a layer as a stack of individual neurons (also in a way it can train)? I have roughly outlined the idea for the layer in the above code, but the answer doesn't need to follow that as a blueprint.
Finally: yes, I'm aware that to create a dense layer you don't need to go about it in such a roundabout way (you just need one weight matrix and one bias matrix), but in my actual use case this is necessary. Thanks!
So, as the person who asked this question, I have found a way to do it by dynamically creating variables and operations.
First, let's redefine the neuron to use TensorFlow operations:
def neuron(x, W, b):
    return tf.add(tf.matmul(W, x), b)
Then, let's create the layer (this uses the blueprint laid out in the question):
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        for i in range(self.n_units):
            exec(f'self.kernel_{i} = self.add_weight("kernel_{i}", shape=[1, int(input_shape[0])])')
            exec(f'self.bias_{i} = self.add_weight("bias_{i}", shape=[1, 1])')

    def call(self, inputs):
        for i in range(self.n_units):
            exec(f'out_{i} = neuron(inputs, self.kernel_{i}, self.bias_{i})')

        return eval(f'tf.concat([{", ".join([ f"out_{i}" for i in range(self.n_units) ])}], axis=0)')
As you can see, we're using exec and eval to dynamically create variables and perform operations.
That's it! We can perform a few checks to see if TensorFlow could use this:
# Check to see if it outputs the correct thing
layer = Layer(5)  # With 5 neurons, it should return a (5, 6)
print(layer(tf.zeros([10, 6])))

# Check to see if it has the right trainable parameters
print(layer.trainable_variables)

# Check to see if TensorFlow can find the gradients
layer = Layer(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
    z = layer(x)

print(f"Parameter: {layer.trainable_variables[2]}")
print(f"Gradient: {tape.gradient(z, layer.trainable_variables[2])}")
This solution works, but it's not very elegant... I wonder if there's a better way to do it, some magical TF method that can map the neuron to create a layer; I'm too inexperienced to know for the moment. So, please answer if you have a (better) answer, and I'll be happy to accept it :)
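For comparison, here is an untested sketch of the same per-neuron idea that avoids exec/eval by keeping the weights in plain Python lists (ListLayer is just an illustrative name; it reuses the neuron function defined above):

import tensorflow as tf

class ListLayer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(ListLayer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        # add_weight registers each variable with the layer, so they all
        # show up in trainable_variables even though they live in lists
        self.kernels = [
            self.add_weight(f"kernel_{i}", shape=[1, int(input_shape[0])])
            for i in range(self.n_units)
        ]
        self.biases = [
            self.add_weight(f"bias_{i}", shape=[1, 1])
            for i in range(self.n_units)
        ]

    def call(self, inputs):
        # One output of shape (1, batch-like dim) per neuron, stacked along axis 0
        outs = [neuron(inputs, W, b) for W, b in zip(self.kernels, self.biases)]
        return tf.concat(outs, axis=0)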
I have the following code:
import torch
import torch.nn as nn
from torchviz import make_dot, make_dot_from_trace

class Net(nn.Module):
    def __init__(self, input, output):
        super(Net, self).__init__()
        self.fc = nn.Linear(input, output)

    def forward(self, x):
        x = self.fc(x)
        x = self.fc(x)
        return x

model = Net(12, 12)
print(model)

x = torch.rand(1, 12)
y = model(x)

make_dot(y, params=dict(model.named_parameters()))
Here I reuse self.fc twice in the forward pass.
The computational graph looks like this (graph image not included here).
I am confused about the computational graph, and I am curious how this model is trained with backpropagation. It seems to me the gradient will live in a loop forever. Thanks a lot.
There are no issues with your graph. You can train it the same way as any other feed-forward model.
Regarding looping: Since it is a directed acyclic graph, there are no actual loops (check out the arrow directions).
Regarding backprop: Let's consider the fc.bias parameter. Since you are reusing the same layer two times, the bias has two outgoing arrows (it is used in two places of your net). During the backpropagation stage the direction is reversed: the bias will get gradients from two places, and these gradients will add up.
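For example, a small standalone check (sizes are arbitrary):

import torch
import torch.nn as nn

fc = nn.Linear(3, 3)
x = torch.rand(1, 3)

# Apply the same layer twice, as in the question
y = fc(fc(x)).sum()
y.backward()

# fc.bias.grad now holds the sum of the gradients coming from
# both places where the layer was used in the forward pass
print(fc.bias.grad)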
Regarding the graph: An FC layer can be represented as Addmm(bias, x, T(weight)), where T is transposition and Addmm is matrix multiplication plus adding a vector. So you can see how the data (weight, bias) is passed into the functions (Addmm, T).
https://pytorch.org/docs/stable/generated/torch.addmm.html
https://pytorch.org/docs/stable/generated/torch.t.html
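For example, a single fc call can be reproduced with torch.addmm directly (purely illustrative):

import torch
import torch.nn as nn

fc = nn.Linear(12, 12)
x = torch.rand(1, 12)

manual = torch.addmm(fc.bias, x, fc.weight.t())  # Addmm(bias, x, T(weight))
assert torch.allclose(fc(x), manual)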
I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf

x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)

s = tf.Variable(tf.ones((5,)))
x = x * s

model = tf.keras.models.Model(input_shape, x)

X = tf.random.normal((50, 10))  # random sample

with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2

print(tape.gradient(loss, s))  # why None ??
The print prints None... why?
Notice that I am using eager-execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works (a sketch is included at the end of this answer).
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
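For reference, here is a minimal sketch of the custom-layer alternative mentioned above (Scale is just an illustrative name; the point is that the variable lives inside a layer, so Keras tracks it):

import tensorflow as tf

class Scale(tf.keras.layers.Layer):
    def __init__(self, units=5):
        super().__init__()
        # Trainable per-feature scale, tracked by the layer
        self.s = tf.Variable(tf.ones((units,)))

    def call(self, inputs):
        return inputs * self.s

inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(inputs)
outputs = Scale(5)(x)
model = tf.keras.models.Model(inputs, outputs)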
This might be the explanation. Based on reviewing the documentation, I suspect that the issue is that differentiation with respect to "s" (or any other intermediate tensor, say "x") might not be a meaningful calculation here. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.
I'm training a neural network that learns some weights, and based on those weights I compute transformations that produce the predicted model in combination with the weights. My network doesn't learn properly, so I'm writing a different network that does nothing but return the weights, independent of the input x (after normalization with softmax and a transpose). This way, I want to find out whether the problem lies in the network or in the transformation estimation outside the network. But this doesn't work. This is what I've got:
class DoNothingNet(torch.nn.Module):
    def __init__(self, n_vertices=6890, n_joints=14):
        super(DoNothingNet, self).__init__()
        self.weights = nn.parameter.Parameter(torch.randn(n_vertices, n_joints))

    def forward(self, x, indices):
        self.weights = F.softmax(self.weights, dim=1)
        return self.weights.transpose(0, 1)
But the line self.weights = F.softmax(self.weights, dim=1) doesn't work and produces the error TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'weights' (torch.nn.Parameter or None expected). How do I fix this? And does the code even make sense?
nn.Module tracks all fields of type nn.Parameter for training. In your code, every forward call tries to change the weights parameter by assigning a plain Tensor to it, which is why the error occurs.
The following code outputs normalised weights without changing the stored ones. Hope this will help.
import torch
from torch import nn
from torch.nn import functional as F

class DoNothingNet(torch.nn.Module):
    def __init__(self, n_vertices=6890, n_joints=14):
        super(DoNothingNet, self).__init__()
        self.weights = nn.parameter.Parameter(torch.randn(n_vertices, n_joints))

    def forward(self, x, indices):
        output = F.softmax(self.weights, dim=1)
        return output.transpose(0, 1)
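A quick check, reusing the class above with smaller illustrative shapes, that the stored parameter still receives gradients (x and indices are unused by this network):

net = DoNothingNet(n_vertices=4, n_joints=3)
out = net(x=None, indices=None)   # shape (3, 4): normalised weights, transposed
loss = (out ** 2).sum()
loss.backward()
print(net.weights.grad.shape)     # torch.Size([4, 3]): the parameter gets gradients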
Hi, I want to add an element-wise multiplication layer that duplicates the input to multiple channels, as illustrated in a figure (not included here); the input size M x N and the multiplication filter size M x N are the same.
I want to give the filter a custom initialization value, and I also want it to get gradients during training. However, I can't find an element-wise filter layer in PyTorch. Can I make one? Or is it just impossible in PyTorch?
In PyTorch you can always implement your own layers by making them subclasses of nn.Module. You can also have trainable parameters in your layer by using nn.Parameter.
A possible implementation of such a layer might look like this:
import torch
from torch import nn

class TrainableEltwiseLayer(nn.Module):
    def __init__(self, n, h, w):
        super(TrainableEltwiseLayer, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(1, n, h, w))  # define the trainable parameter

    def forward(self, x):
        # assuming x is of size b-n-h-w
        return x * self.weights  # element-wise multiplication
You still need to worry about initializing the weights. Look into nn.init for ways to initialize weights. Usually, one initializes the weights of the whole net prior to training and prior to loading any stored model (so partially trained models can override the random init). Something like:
model = mymodel(*args, **kwargs)  # instantiate a model
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight.data)  # init for conv layers
    if isinstance(m, TrainableEltwiseLayer):
        nn.init.constant_(m.weights.data, 1)  # init your weights here...
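And a small standalone check (shapes are arbitrary examples) that the element-wise weights receive gradients, reusing the layer defined above:

layer = TrainableEltwiseLayer(n=3, h=4, w=4)
nn.init.constant_(layer.weights.data, 1)  # start as an identity-like scaling

x = torch.rand(2, 3, 4, 4)        # batch of size 2
loss = layer(x).sum()
loss.backward()
print(layer.weights.grad.shape)   # torch.Size([1, 3, 4, 4])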