Hi, I want to add an element-wise multiplication layer that duplicates the input to multiple channels, as illustrated in the figure (so the input size M x N and the multiplication filter size M x N are the same).
I want to give the filter a custom initialization value, and I also want it to receive gradients while training. However, I can't find an element-wise filter layer in PyTorch. Can I make one? Or is it just impossible in PyTorch?
In PyTorch you can always implement your own layers by making them subclasses of nn.Module. You can also have trainable parameters in your layer by using nn.Parameter.
A possible implementation of such a layer might look like this:
import torch
from torch import nn

class TrainableEltwiseLayer(nn.Module):
    def __init__(self, n, h, w):
        super(TrainableEltwiseLayer, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(1, n, h, w))  # define the trainable parameter

    def forward(self, x):
        # assuming x is of size b-n-h-w
        return x * self.weights  # element-wise multiplication
You still need to worry about initializing the weights; look into nn.init for ways to do that. Usually one initializes the weights of the whole net prior to training and prior to loading any stored model (so a partially trained model can override the random init). Something like:
model = mymodel(*args, **kwargs)  # instantiate a model
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight.data)  # init for conv layers
    if isinstance(m, TrainableEltwiseLayer):
        nn.init.constant_(m.weights.data, 1)  # init your weights here...
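To check that the filter really does get gradients while training, here is a minimal sketch (not part of the original answer) that runs one forward/backward pass through the layer defined above; the sizes are arbitrary:

layer = TrainableEltwiseLayer(n=3, h=4, w=4)
nn.init.constant_(layer.weights, 1)   # custom initialization value

x = torch.randn(2, 3, 4, 4)           # batch of 2, matching n-h-w
out = layer(x)                        # same shape as x
out.sum().backward()                  # any scalar loss will do

print(layer.weights.grad.shape)       # torch.Size([1, 3, 4, 4]) -> the filter receives gradients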
Related
So, I'm trying to create a custom layer in TensorFlow 2.4.1, using a function for a neuron I defined.
# NOTE: this is not the actual neuron I want to use,
# it's just a simple example.
def neuron(x, W, b):
    return W @ x + b
The W and b it gets would be of shape (1, x.shape[0]) and (1, 1) respectively, so this works like a single neuron in a dense layer. I want to create a dense layer by stacking however many of these individual neurons I want.
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()  # handles standard arguments
        self.n_units = n_units  # number of neurons to be in the layer

    def build(self, input_shape):
        # Create weights and biases for all neurons individually
        for i in range(self.n_units):
            # Create weights and bias for ith neuron
            ...

    def call(self, inputs):
        # Compute outputs for all neurons
        ...
        # Concatenate outputs to create layer output
        ...
        return output
How can I create a layer as a stack of individual neurons (in a way that can also be trained)? I have roughly outlined the idea for the layer in the above code, but the answer doesn't need to follow that as a blueprint.
Finally: yes, I'm aware that to create a dense layer you don't need to go about it in such a roundabout way (you just need one weight and bias matrix), but in my actual use case this is necessary. Thanks!
So, as the person who asked this question, I have found a way to do it by dynamically creating variables and operations.
First, let's re-define the neuron to use TensorFlow operations:
def neuron(x, W, b):
    return tf.add(tf.matmul(W, x), b)
Then, let's create the layer (this uses the blueprint laid out in the question):
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        for i in range(self.n_units):
            exec(f'self.kernel_{i} = self.add_weight("kernel_{i}", shape=[1, int(input_shape[0])])')
            exec(f'self.bias_{i} = self.add_weight("bias_{i}", shape=[1, 1])')

    def call(self, inputs):
        for i in range(self.n_units):
            exec(f'out_{i} = neuron(inputs, self.kernel_{i}, self.bias_{i})')
        return eval(f'tf.concat([{", ".join([f"out_{i}" for i in range(self.n_units)])}], axis=0)')
As you can see, we're using exec and eval to dynamically create variables and perform operations.
That's it! We can perform a few checks to see if TensorFlow could use this:
# Check to see if it outputs the correct thing
layer = Layer(5)  # with 5 neurons, it should return a (5, 6)
print(layer(tf.zeros([10, 6])))

# Check to see if it has the right trainable parameters
print(layer.trainable_variables)

# Check to see if TensorFlow can find the gradients
layer = Layer(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
    z = layer(x)
print(f"Parameter: {layer.trainable_variables[2]}")
print(f"Gradient: {tape.gradient(z, layer.trainable_variables[2])}")
This solution works, but it's not very elegant... I wonder if there's a better way to do it, some magical TF method that can map the neuron to create a layer; I'm too inexperienced to know for the moment. So, please answer if you have a (better) answer; I'll be happy to accept it :)
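(For what it's worth, one small improvement over exec/eval, while waiting for a more elegant answer, is to hold the per-neuron variables with setattr/getattr; a rough sketch of the same layer, assuming tf and the neuron function defined above:)

class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        for i in range(self.n_units):
            # same weights as before, but stored without exec
            setattr(self, f'kernel_{i}',
                    self.add_weight(f'kernel_{i}', shape=[1, int(input_shape[0])]))
            setattr(self, f'bias_{i}',
                    self.add_weight(f'bias_{i}', shape=[1, 1]))

    def call(self, inputs):
        outs = [neuron(inputs, getattr(self, f'kernel_{i}'), getattr(self, f'bias_{i}'))
                for i in range(self.n_units)]
        return tf.concat(outs, axis=0)  # no eval needed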
I have the following code:
import torch
import torch.nn as nn
from torchviz import make_dot, make_dot_from_trace

class Net(nn.Module):
    def __init__(self, input, output):
        super(Net, self).__init__()
        self.fc = nn.Linear(input, output)

    def forward(self, x):
        x = self.fc(x)
        x = self.fc(x)
        return x

model = Net(12, 12)
print(model)

x = torch.rand(1, 12)
y = model(x)

make_dot(y, params=dict(model.named_parameters()))
Here I reuse self.fc twice in forward.
The resulting computational graph looks like this:
I am confused about the computational graph, and I am curious how to train this model with back propagation. It seems to me that the gradient will live in a loop forever. Thanks a lot.
There are no issues with your graph. You can train it the same way as any other feed-forward model.
Regarding looping: since it is a directed acyclic graph, there are no actual loops (check out the arrow directions).
Regarding backprop: let's consider the fc.bias parameter. Since you are reusing the same layer two times, the bias has two outgoing arrows (it is used in two places of your net). During the backpropagation stage the direction is reversed: the bias will get gradients from two places, and these gradients will add up.
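A quick sketch to see this (using the Net defined above): after one backward pass, fc.bias.grad already contains the accumulated gradient from both uses of self.fc:

model = Net(12, 12)
x = torch.rand(1, 12)
y = model(x)
y.sum().backward()             # any scalar will do
print(model.fc.bias.grad)      # a single gradient tensor, summed over both uses of fc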
Regarding the graph: an FC layer can be represented as Addmm(bias, x, T(weight)), where T is transposition and Addmm is matrix multiplication plus adding a vector. So you can see how data (weight, bias) is passed into functions (Addmm, T):
https://pytorch.org/docs/stable/generated/torch.addmm.html
https://pytorch.org/docs/stable/generated/torch.t.html
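For reference, a small sketch (not from the original answer) showing that a Linear layer computes exactly Addmm(bias, x, T(weight)):

import torch
from torch import nn

fc = nn.Linear(12, 12)
x = torch.rand(1, 12)
manual = torch.addmm(fc.bias, x, fc.weight.t())  # bias + x @ weight^T
print(torch.allclose(fc(x), manual))             # True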
My neural network has the following architecture:
input -> 128x (separate fully connected layers) -> output averaging
I am using a ModuleList to hold the list of fully connected layers. Here's how it looks at this point:
import torch.nn as nn
import torch.optim as optim

class MultiHead(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super(MultiHead, self).__init__()

        self.networks = nn.ModuleList()
        for _ in range(nb_heads):
            network = nn.Sequential(
                nn.Linear(dim_state, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, dim_action)
            )
            self.networks.append(network)

        self.cuda()
        self.optimizer = optim.Adam(self.parameters())
Then, when I need to calculate the output, I use a for ... in construct to perform the forward and backward pass through all the layers:
q_values = torch.cat([net(observations) for net in self.networks])
# skipped code which ultimately computes the loss I need
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
This works! But I am wondering if I couldn't do this more efficiently. I feel like by doing a for...in, I am actually going through each separate FC layer one by one, while I'd expect this operation could be done in parallel.
If you use Convnd in place of Linear, you can use the groups argument for "grouped convolutions" (a.k.a. "depthwise convolutions" in the special case where groups equals the number of input channels). This lets you handle all the parallel networks simultaneously.
If you use a convolution kernel of size 1, then the convolution does nothing other than apply a Linear layer, where each channel is considered an input dimension. So the rough structure of your network would look like this:
Modify the input tensor of shape B x dim_state as follows: add an additional dimension and replicate it nb_heads times, so that B x dim_state becomes B x (dim_state * nb_heads) x 1
replace the two Linear with
nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)
and
nn.Conv1d(in_channels=hidden_size * nb_heads, out_channels=dim_action * nb_heads, kernel_size=1, groups=nb_heads)
We now have a tensor of size B x (dim_action * nb_heads) x 1; you can now reshape it to whatever shape you want (e.g. B x nb_heads x dim_action)
While CUDA natively supports grouped convolutions, there were some issues in PyTorch with the speed of grouped convolutions (see e.g. here), but I think that has been resolved by now.
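Putting those steps together, a rough sketch of what the grouped-convolution version could look like (class and variable names are mine, not a drop-in replacement for your MultiHead):

import torch
import torch.nn as nn

class MultiHeadGrouped(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super(MultiHeadGrouped, self).__init__()
        self.nb_heads = nb_heads
        self.net = nn.Sequential(
            nn.Conv1d(dim_state * nb_heads, hidden_size * nb_heads,
                      kernel_size=1, groups=nb_heads),
            nn.Tanh(),
            nn.Conv1d(hidden_size * nb_heads, dim_action * nb_heads,
                      kernel_size=1, groups=nb_heads),
        )

    def forward(self, x):                                # x: (B, dim_state)
        b = x.shape[0]
        x = x.repeat(1, self.nb_heads).unsqueeze(-1)     # (B, dim_state * nb_heads, 1)
        out = self.net(x)                                # (B, dim_action * nb_heads, 1)
        return out.view(b, self.nb_heads, -1)            # (B, nb_heads, dim_action)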
I'm training a GAN-like model, but not exactly the same. I'm using Keras with the TensorFlow backend.
I have two Keras models G and D. I want to take the weights of a target layer in G as the input of model D, and use the result of D.predict(G.weights) as part of the loss function for G, i.e. D is not trainable, but the argument G.weights is trainable. In this way I want to further train G.weights.
I tried to use
def custom_loss(ytrue, ypred):
    ### Something to do with ytrue and ypred
    weight = self.G.get_layer('target').get_weights()
    loss += self.D.predict(weight)
    return loss
but apparently it does not work, since weight is just a numpy array and is not trainable.
Is there a way to get the weights of a model in a form that is still trainable in Keras? I'm new to Keras and know very little about TensorFlow. I would really appreciate it if someone could help!
As you mention, layer.get_weights() returns the current values of the weights as numpy arrays. What you want to feed to D is the node in the computation graph representing those weights. You can use layer.trainable_weights instead, which returns two tf.Variables that you can feed to another layer/model.
Note that there is one variable for the unit to unit connections and another one for the bias. If you want to get a flattened tensor from it you could do something like:
from keras import backend as K
from keras.layers import Flatten
...
ww, bias = self.G.get_layer('target').trainable_weights
flattened_weights = Flatten()(K.concatenate([ww, K.reshape(bias, (5, 1))], axis=1))
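The important part is then to call D on that symbolic tensor (so the graph stays connected and gradients can flow back into G's weights), instead of D.predict on a numpy array. A rough sketch of such a loss, written as a method like your own, assuming D takes a single flattened vector as input (the cross-entropy term and the exact reshaping are placeholders for your own setup):

from keras import backend as K

def custom_loss(ytrue, ypred):
    loss = K.categorical_crossentropy(ytrue, ypred)   # your usual term
    ww, bias = self.G.get_layer('target').trainable_weights
    flat = K.concatenate([K.flatten(ww), K.flatten(bias)], axis=0)
    # calling the model D like a layer keeps everything symbolic and differentiable
    loss = loss + K.mean(self.D(K.reshape(flat, (1, -1))))
    return loss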
I'm trying to get Keras to train a multiclass classification model that can be written as a network like this:
The only set of trainable parameters are the coefficients a_pk; all the rest is given. The functions f_i are combinations of usual mathematical functions. Sigma stands for summing the previous terms, and softmax is the usual function. The (x_1, x_2, ..., x_n) are elements of the train or test set, and the x_pk are rows of a specific subset of the original data, already selected.
The model in more depth:
Specifically, given an input (x_1, x_2, ..., x_n) from the train or test set, the network evaluates, for each p = 1, ..., m,
sum over k of a_pk * f(x_k, x_pk)
where f is a given mathematical function, the x_pk are rows of a particular subset of the original data, and the coefficients a_pk are the parameters I want to train.
As I'm using keras, I expect it to add a bias term to each row.
After the above evaluation, I will apply a softmax layer (each of the m lines above are numbers that will be inputs for the softmax function).
At the end I want to compile the model and run model.fit as usual.
The problem is that I couldn't translate the expression into Keras syntax.
My attempt:
Following the network sketch above, I first tried to consider each of the expressions of the form a_pk * f(x_k, x_pk) as a Lambda layer in a Sequential model, but the best I could get to work was a combination of a Dense layer with linear activation (which would play the role of a row's parameters a_pk) followed by a Lambda layer outputting a vector without the required summation, as follows:
model = Sequential()

# single row considered:
model.add(Lambda(lambda x: f_fixedRow(x), input_shape=(nFeatures,)))

# parameters set after the Lambda layer to get (a1*f(x1,y1),...,an*f(xn,yn))
# and not (f(a1*x1,y1),...,f(an*xn,yn))
model.add(Dense(nFeatures, activation='linear'))

# missing summation: sum(x)
# missing evaluation of f in all other rows

model.add(Dense(classes, activation='softmax', trainable=False))  # should get all rows

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Also, I had to define the function used in the Lambda layer with its second argument already fixed (because the Lambda layer can take only the input layer as a variable):
def f_fixedRow(x):
    # picking a particular row (as a vector) to evaluate f on (f works element-wise)
    y = tf.constant(value=x[0, :], dtype=tf.float32)
    return f(x, y)
I managed to write the f function with tensorflow (working element-wise in a row), although this is a possible source for problems in my code (and the above workaround seems unnatural).
I also thought that if I could properly write the element-wise sum of the vector in the aforementioned attempt I could repeat the same procedure in a parallelized manner with the keras Functional API and then insert the output of each parallel model in a softmax function, as I need.
Another approach that I considered was to train the parameters keeping their natural matrix structure seen in Network Description, maybe writing a matrix Lambda layer, but I could not find anything related to this idea.
Anyway, I'm not sure what is a good way to work with this model within keras, maybe I'm missing an important point because of the non standard way the parameters are written or lack of experience with tensorflow. Any suggestions are welcome.
For this answer, it's important that f be a tensor function that operates elementwise (no iterating). This is reasonably easy to have; just check the Keras backend functions.
Assumptions:
The x_pk set is constant, otherwise this solution must be reviewed.
The function f is elementwise (if not, please show f for better code)
Your model will need x_pk as a tensor input. And you should do that in a functional API model.
import keras.backend as K
from keras.layers import Input, Lambda, Activation
from keras.models import Model
#x_pk data
x_pk_numpy = select_X_pk_samples(x_train)
x_pk_tensor = K.variable(x_pk_numpy)
#number of rows in x_pk
m = len(x_pk_numpy)
#I suggest a fixed batch size for simplicity
batch = some_batch_size
First let's work on the function that will take x and x_pk calling f.
def calculate_f(inputs):  # inputs will be a list with x and x_pk
    x, x_pk = inputs

    # since f will work elementwise, let's replicate x and x_pk so they have equal shapes
    # please explain f for better optimization

    # x from (batch, n) to (batch, m, n)
    x = K.stack([x] * m, axis=1)

    # x_pk from (m, n) to (batch, m, n)
    x_pk = K.stack([x_pk] * batch, axis=0)

    # a batch size of 1 could make this even simpler
    # a variable batch size would make this more complicated
    # certain f functions could make this process unnecessary
    return f(x, x_pk)
Now, different from a Dense layer, this formula is using the a_pk weights multiplied elementwise. So we need a custom layer:
from keras.layers import Layer

class ElementwiseWeights(Layer):
    def __init__(self, **kwargs):
        super(ElementwiseWeights, self).__init__(**kwargs)

    def build(self, input_shape):
        weight_shape = (1,) + input_shape[1:]  # shape (1, m, n)
        self.kernel = self.add_weight(name='kernel',
                                      shape=weight_shape,
                                      initializer='uniform',
                                      trainable=True)
        super(ElementwiseWeights, self).build(input_shape)

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs):
        return self.kernel * inputs
Now let's build our functional API model:
#x_pk model tensor input
x_pk = Input(tensor=x_pk_tensor) #shape (m, n)
#x usual input with fixed batch size
x = Input(batch_shape=(batch,n)) #shape (batch, n)
#calculate F
out = Lambda(calculate_f)([x, x_pk]) #shape (batch, m, n)
#multiply a_pk
out = ElementwiseWeights()(out) #shape (batch, m, n)
#sum n elements, keep m rows:
out = Lambda(lambda x: K.sum(x, axis=-1))(out) #shape (batch, m)
#softmax
out = Activation('softmax')(out) #shape (batch,m)
Continue this model with whatever you want and finish it:
model = Model([x, x_pk], out)
model.compile(.....)
model.fit(x_train, y_train, ....) #perhaps you might need .fit([x_train], ytrain,...)
Edit for function f
You can have the proposed f like this:
#create the n coefficients:
coefficients = np.array([c0, c1, .... , cn])
coefficients = coefficients.reshape((1,1,n))
def f(x, x_pk):
    c = K.variable(coefficients)  # shape (1, 1, n)
    out = (x - x_pk) / c
    return K.exp(out)
This f would accept x with shape (batch, 1, n), without the stack used in the calculate_f function.
Or could accept x_pk with shape (1, m, n), allowing variable batch size.
But I'm not sure it's possible to have both of these shapes together. Testing this might be interesting.
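A quick check one could run (sizes are made up): the elementwise part at least broadcasts (batch, 1, n) against (1, m, n) into (batch, m, n); whether the rest of the model accepts those shapes would still need testing:

import numpy as np
import keras.backend as K

batch, m, n = 4, 3, 2
x = K.variable(np.random.rand(batch, 1, n))      # (batch, 1, n)
x_pk = K.variable(np.random.rand(1, m, n))       # (1, m, n)
c = K.variable(np.random.rand(1, 1, n))          # (1, 1, n)

out = K.exp((x - x_pk) / c)
print(K.int_shape(out))                          # (4, 3, 2) -> both shapes broadcast together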