So, I'm trying to create a custom layer in TensorFlow 2.4.1, using a function for a neuron I defined.
# NOTE: this is not the actual neuron I want to use,
# it's just a simple example.
def neuron(x, W, b):
    return W @ x + b
The W and b it receives would be of shape (1, x.shape[0]) and (1, 1) respectively, so this behaves like a single neuron in a dense layer. I want to create a dense layer by stacking however many of these individual neurons I need.
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()  # handles standard arguments
        self.n_units = n_units  # number of neurons in the layer

    def build(self, input_shape):
        # Create weights and biases for all neurons individually
        for i in range(self.n_units):
            # Create weights and bias for the ith neuron
            ...

    def call(self, inputs):
        # Compute outputs for all neurons
        ...
        # Concatenate outputs to create the layer output
        ...
        return output
How can I create a layer as a stack of individual neurons (in a way that it can also train)? I have roughly outlined the idea for the layer in the code above, but the answer doesn't need to follow that as a blueprint.
Finally, yes, I'm aware that you don't need to go about creating a dense layer in such a roundabout way (you just need one weight and bias matrix), but in my actual use case this is necessary. Thanks!
So, person who asked this question here: I have found a way to do it, by dynamically creating variables and operations.
First, let's re-define the neuron to use TensorFlow operations:
def neuron(x, W, b):
    return tf.add(tf.matmul(W, x), b)
Then, let's create the layer (this uses the blueprint laid out in the question):
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        for i in range(self.n_units):
            exec(f'self.kernel_{i} = self.add_weight("kernel_{i}", shape=[1, int(input_shape[0])])')
            exec(f'self.bias_{i} = self.add_weight("bias_{i}", shape=[1, 1])')

    def call(self, inputs):
        for i in range(self.n_units):
            exec(f'out_{i} = neuron(inputs, self.kernel_{i}, self.bias_{i})')
        return eval(f'tf.concat([{", ".join([f"out_{i}" for i in range(self.n_units)])}], axis=0)')
As you can see, we're using exec and eval to dynamically create variables and perform operations.
That's it! We can perform a few checks to see if TensorFlow could use this:
# Check to see if it outputs the correct thing
layer = Layer(5)  # with 5 neurons, it should return a (5, 6)
print(layer(tf.zeros([10, 6])))

# Check to see if it has the right trainable parameters
print(layer.trainable_variables)

# Check to see if TensorFlow can find the gradients
layer = Layer(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
    z = layer(x)
print(f"Parameter: {layer.trainable_variables[2]}")
print(f"Gradient: {tape.gradient(z, layer.trainable_variables[2])}")
This solution works, but it's not very elegant... I wonder if there's a better way to do it; some magical TF method that can map the neuron to create a layer that I'm too inexperienced to know about for now. So, please answer if you have a (better) answer; I'll be happy to accept it :)
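For what it's worth, here is a sketch of the same idea without exec/eval (untested beyond the shape logic, assuming TF 2.4): add_weight registers each variable with the layer, so storing the returned handles in plain Python lists is enough for training to work.

class ListLayer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super().__init__()
        self.n_units = n_units

    def build(self, input_shape):
        # Variables created via add_weight are tracked by the layer even
        # when the handles live in ordinary Python lists
        self.kernels = [self.add_weight(f"kernel_{i}", shape=[1, int(input_shape[0])])
                        for i in range(self.n_units)]
        self.biases = [self.add_weight(f"bias_{i}", shape=[1, 1])
                       for i in range(self.n_units)]

    def call(self, inputs):
        # One output per neuron, concatenated along the first axis
        outs = [neuron(inputs, W, b) for W, b in zip(self.kernels, self.biases)]
        return tf.concat(outs, axis=0)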
I made an example diagram of a scaled-down version of what I'm trying to implement: the top two input nodes are only fully connected to the top three output nodes, and the same design applies to the bottom two nodes. So far I've come up with two ways of implementing this in PyTorch, neither of which is optimal.
The first would be to create a nn.ModuleList of many smaller Linear Layers, and during the forward pass, iterate the input through them. For the diagram's example, that would look something like this:
import torch
from torch import nn

class Module(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(2, 3) for i in range(2)])

    def forward(self, input):
        output = torch.zeros(2, 3)
        for i in range(2):
            output[i, :] = self.layers[i](input.view(2, 2)[i, :])
        return output.flatten()
So this accomplishes the network in the diagram; the main issue is that it's very slow. I assume this is because PyTorch has to process the for loop sequentially and can't process the input tensor in parallel.
To "vectorize" the module such that PyTorch can run it quicker, I have this implementation:
from torch.nn.utils import prune

class Module(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 6)
        self.mask = ...  # create a mask of ones and zeros to "block" certain layer connections

    def forward(self, input):
        prune.custom_from_mask(self.layer, name='weight', mask=self.mask)
        return self.layer(input)
This also accomplishes the diagram's network by using weight pruning to ensure certain weights in the fully connected layer are always zero (e.g. the weight connecting the top input node to the bottom output node will always be zero, so it's effectively "disconnected"). This module is much faster than the previous one, as there is no for loop. The problem now is that this module takes up significantly more memory, likely because, even though most of the layer's weights will be zero, PyTorch still treats the network as if they were there. This implementation essentially keeps far more weights around than it needs.
Has anyone encountered this issue before and come up with an efficient solution?
If weight sharing is ok, then 1D convolutions should solve the problem:
class Module(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Conv1d(in_channels=2, out_channels=3, kernel_size=1)
        self._n_splits = 2

    def forward(self, input):
        B, C = input.shape
        output = self.layers(input.view(B, C // self._n_splits, -1))
        return output.view(B, -1)  # flatten (out_channels, splits) back to one dim
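A quick shape check of the sketch above (hypothetical sizes): 4 inputs are split into 2 groups of 2, and each group is mapped to 3 outputs by the shared weights, giving 6 outputs.

m = Module()
print(m(torch.randn(8, 4)).shape)  # torch.Size([8, 6])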
If weight sharing is NOT ok, then you can use grouped convolutions: self.layers = nn.Conv1d(in_channels=4, out_channels=4, kernel_size=1, stride=1, groups=2). However, I am not sure if this can implement an arbitrary number of channel splits; you can check the documentation: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html
A 1D convolution with kernel size 1 is a fully connected layer over all the channels of the input. A grouped convolution will divide the channels into groups and perform separate conv operations on them (which is what you want).
The implementation will look something like:
class Module(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Conv1d(in_channels=2, out_channels=4, kernel_size=1, groups=2)

    def forward(self, input):
        B, C = input.shape
        output = self.layers(input.unsqueeze(-1))
        return output.squeeze()
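To make the "Conv1d with kernel_size=1 acts like a Linear layer" claim concrete, here is a small numerical check (illustrative sizes, not part of the original answer):

import torch
from torch import nn

lin = nn.Linear(4, 6)
conv = nn.Conv1d(in_channels=4, out_channels=6, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(lin.weight.unsqueeze(-1))  # (6, 4) -> (6, 4, 1)
    conv.bias.copy_(lin.bias)

x = torch.randn(8, 4)
out_lin = lin(x)                               # (8, 6)
out_conv = conv(x.unsqueeze(-1)).squeeze(-1)   # (8, 6)
print(torch.allclose(out_lin, out_conv, atol=1e-6))  # True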
EDIT:
If you need an odd number of output channels you can combine two group convs.
class Module(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv1d(in_channels=2, out_channels=4, kernel_size=1, groups=2),
            nn.Conv1d(in_channels=4, out_channels=3, kernel_size=1, groups=3))

    def forward(self, input):
        B, C = input.shape
        output = self.layers(input.unsqueeze(-1))
        return output.squeeze()
That will effectively define the input channels as required in the diagram and allow for an arbitrary number of output channels (keep in mind that Conv1d requires both in_channels and out_channels to be divisible by groups). Notice that if the second convolution has groups=1, you will allow channel mixing and effectively render the first grouped conv layer useless.
From a theoretical perspective, there is no need for activation functions between those two convolutions, since we are combining them in a linear manner. However, it is possible that adding an activation function will improve performance.
My neural network has the following architecture:
input -> 128x (separate fully connected layers) -> output averaging
I am using a ModuleList to hold the list of fully connected layers. Here's how it looks at this point:
import torch
from torch import nn, optim

class MultiHead(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super(MultiHead, self).__init__()
        self.networks = nn.ModuleList()
        for _ in range(nb_heads):
            network = nn.Sequential(
                nn.Linear(dim_state, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, dim_action)
            )
            self.networks.append(network)
        self.cuda()
        self.optimizer = optim.Adam(self.parameters())
Then, when I need to calculate the output, I use a for ... in construct to perform the forward and backward pass through all the layers:
q_values = torch.cat([net(observations) for net in self.networks])
# skipped code which ultimately computes the loss I need
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
This works! But I am wondering if I couldn't do this more efficiently. I feel like by doing a for...in, I am actually going through each separate FC layer one by one, while I'd expect this operation could be done in parallel.
In the case of Convnd in place of Linear, you could use the groups argument for "grouped convolutions" (of which "depthwise convolutions" are a special case). This lets you handle all the parallel networks simultaneously.
If you use a convolution kernel of size 1, then the convolution does nothing other than apply a Linear layer, where each channel is considered an input dimension. So the rough structure of your network would look like this:
Modify the input tensor of shape B x dim_state as follows: add an additional dimension and replicate it nb_heads times, going from B x dim_state to B x (dim_state * nb_heads) x 1
replace the two Linear with
nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)
and
nn.Conv1d(in_channels=hidden_size * nb_heads, out_channels=dim_action * nb_heads, kernel_size=1, groups=nb_heads)
we now have a tensor of size B x (dim_action * nb_heads) x 1, which you can reshape to whatever you want (e.g. B x nb_heads x dim_action); a sketch putting these steps together follows below
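Putting the steps together, a minimal sketch (assuming recent PyTorch; MultiHeadConv and the exact reshapes are illustrative, not the asker's original code):

import torch
from torch import nn

class MultiHeadConv(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super().__init__()
        self.nb_heads = nb_heads
        self.net = nn.Sequential(
            nn.Conv1d(dim_state * nb_heads, hidden_size * nb_heads,
                      kernel_size=1, groups=nb_heads),
            nn.Tanh(),
            nn.Conv1d(hidden_size * nb_heads, dim_action * nb_heads,
                      kernel_size=1, groups=nb_heads),
        )

    def forward(self, x):  # x: B x dim_state
        B, _ = x.shape
        # replicate the state once per head and add a length-1 spatial dim
        x = x.repeat(1, self.nb_heads).unsqueeze(-1)  # B x (dim_state*nb_heads) x 1
        out = self.net(x)                             # B x (dim_action*nb_heads) x 1
        return out.view(B, self.nb_heads, -1)         # B x nb_heads x dim_action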
While CUDA natively supports grouped convolutions, there were some issues in pytorch with the speed of grouped convolutions (see e.g. here) but I think that was solved now.
I'm trying to create a custom Keras Layer that is composed of a variable number of other Keras layers. I want access to both the grouped layers treated as a single layer with an input and output as well as access to the individual sublayers.
I'm just wondering, what is the "proper" method for performing this form of layer abstraction in Keras?
I'm not sure if this should be done through Keras's Layer API, or if I can work with name scopes, wrapping layers in a function and then selecting all layers under some name scope (e.g. model.get_layer('custom1/*'), but I don't think that works, and name_scope doesn't appear to apply to layer names).
One of the other issues with using Keras Layers for this is that you need to assign the child layers as direct attributes of the class (I assume it makes use of the descriptor API) in order for the trainable weights to be added to the model. That's messier when we don't have a fixed number of layers, since we'd need to hack something together with setattr(), which, ew.
Btw, this isn't my actual code, but is a barebones simplification of what I'm trying to accomplish.
import tensorflow as tf
from tensorflow.keras.layers import Layer, Dense, Dropout, Input
from tensorflow.keras.models import Model

class CustomLayer(Layer):
    def __init__(self, *a, n_layers=2, **kw):
        self.n_layers = n_layers
        super().__init__(*a, **kw)

    def call(self, x):
        for i in range(self.n_layers):
            x = Dense(64, activation='relu')(x)
            x = Dropout(0.2)(x)
        return x
input_shape, n_classes = (None, 100), 10
x = input = Input(input_shape)
x = CustomLayer()(x)
x = CustomLayer()(x)
x = Dense(n_classes, activation='softmax')(x)
model = Model([input], [x])
'''
So this works fine
'''
print([l.name for l in model.layers])
print(model.layers[1].name)
print(model.layers[1].output)
# Output: ['input', 'custom_1', 'custom_2', 'dense']
# Output: 'custom_1'
# Output: the custom_1 output tensor
'''
But I'm not sure how to do something to the effect of this ...
'''
print(model.layers[1].layers[0].name)
print(model.layers[1].layers[0].output)
# Output: custom_1/dense
# Output: custom_1/dense output tensor
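For what it's worth, one hedged sketch of the tracking part (assuming TF 2.x, where Keras auto-tracks layers stored in plain Python lists assigned as attributes), which avoids the setattr() hack; the sublayers would then be reachable as model.layers[1].blocks rather than .layers:

class CustomLayer(Layer):
    def __init__(self, *a, n_layers=2, **kw):
        super().__init__(*a, **kw)
        # A list attribute of layers is tracked automatically in TF 2.x,
        # so their weights become trainable parts of the model
        self.blocks = []
        for _ in range(n_layers):
            self.blocks.append(Dense(64, activation='relu'))
            self.blocks.append(Dropout(0.2))

    def call(self, x):
        for layer in self.blocks:
            x = layer(x)
        return x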
Hi, I want to add an element-wise multiplication layer that duplicates the input to multiple channels, as illustrated in this figure (so the input size M x N and the multiplication filter size M x N are the same).
I want to give the filter a custom initialization value, and I also want it to receive gradients during training. However, I can't find an element-wise filter layer in PyTorch. Can I make one? Or is it just impossible in PyTorch?
In PyTorch you can always implement your own layers by making them subclasses of nn.Module. You can also have trainable parameters in your layer by using nn.Parameter.
A possible implementation of such a layer might look like this:
import torch
from torch import nn

class TrainableEltwiseLayer(nn.Module):
    def __init__(self, n, h, w):
        super(TrainableEltwiseLayer, self).__init__()
        # define the trainable parameter
        self.weights = nn.Parameter(torch.Tensor(1, n, h, w))

    def forward(self, x):
        # assuming x is of size b-n-h-w
        return x * self.weights  # element-wise multiplication
You still need to worry about initializing the weights; look into nn.init for ways to do that. Usually, one initializes the weights of the whole net prior to training and prior to loading any stored model (so partially trained models can override the random init). Something like:
model = mymodel(*args, **kwargs)  # instantiate a model
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight.data)  # init for conv layers (the attribute is weight)
    if isinstance(m, TrainableEltwiseLayer):
        nn.init.constant_(m.weights.data, 1)  # init your weights here...
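As a quick sanity check (a sketch with illustrative sizes): the layer keeps the input shape, and with all-ones weights it acts as the identity.

layer = TrainableEltwiseLayer(n=3, h=8, w=8)
nn.init.constant_(layer.weights.data, 1)
x = torch.randn(4, 3, 8, 8)  # b-n-h-w
y = layer(x)                 # same shape; equal to x for all-ones weights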
I am using the intermediate outputs of a larger model as the input to smaller models and I'm trying to make it one contiguous Model. In order to do so, I have to use the K.function() as a part of the model. This leads to the question:
Is there any way to use a K.function() within a Keras layer?
I created a simple custom layer using:
import numpy as np
from keras import backend as K
from keras.layers import Layer

class ActivationExtraction(Layer):
    """
    Extracts all of the outputs of the input_model network and feeds them
    as input to the next layer.
    """
    def __init__(self, input_model, **kwargs):
        self.input_model = input_model
        # Extract all outputs
        outputs = [layer.output for layer in input_model.layers]
        self.output_dim = np.array(outputs).shape
        self.names = [layer.name for layer in input_model.layers]
        # Evaluation function
        self.output_function = K.function([input_model.input] + [K.learning_phase()],
                                          outputs)
        super(ActivationExtraction, self).__init__(**kwargs)

    def build(self, input_shape):
        super(ActivationExtraction, self).build(input_shape)

    def call(self, x):
        return self.output_function([x, 0])

    def compute_output_shape(self, input_shape):
        return self.output_dim
However, when I define the model, it returns the error
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed
values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.
I don't know a workaround for evaluating the tensor at compile time (because the input shape is dynamic). I have tried using

def call(self, x):
    x = K.get_session().run(x)
    return self.output_function([x, 0])

as a long shot to try and evaluate the tensor, but I'm not sure what I would feed it (I have limited experience with TensorFlow).
As a last resort, I haven't been able to find a way to evaluate a tensor on the fly in a Keras layer either.
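For completeness, one direction worth trying (a sketch, not a verified fix for this exact setup): a Keras Model is itself callable on symbolic tensors, so wrapping the intermediate outputs in a Model may avoid K.function entirely; make_extractor below is a hypothetical helper, not part of the original code.

from keras.models import Model

def make_extractor(input_model):
    # A Model mapping the original input to every intermediate output
    outputs = [layer.output for layer in input_model.layers]
    return Model(inputs=input_model.input, outputs=outputs)

# extractor = make_extractor(input_model)
# activations = extractor(x)  # callable on tensors inside a larger model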