I was working with a transfer learning task.
Somewhat carelessly, to make things easier, I put all of the last layers in a torch.nn.Sequential wrapper like this:
self.fc = nn.Sequential(
    nn.Linear(24 * 24 * 64, 80),
    nn.ReLU(True),
    nn.Linear(80, 964),
)
Now what I want to do is replace the last 80-unit linear layer with an identity mapping. I have already trained this network and saved the weights, and I don't want to train it again (time-consuming :( ).
Is there any way I can replace that Linear layer inside the Sequential?
I know the usual top-level replacement, model.fc1 = nn.Identity(), but I don't think that applies here, because the individual fc layers are not standalone attributes; they are wrapped inside the Sequential object.
Perhaps there is another workaround? I have all the time in the world amidst the coronavirus crisis :), so any other solution would do.
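For illustration, here is a self-contained sketch of the kind of in-place swap I am hoping is possible (the index 2 below is just the position of the last Linear inside my wrapper):

import torch
import torch.nn as nn

fc = nn.Sequential(
    nn.Linear(24 * 24 * 64, 80),
    nn.ReLU(True),
    nn.Linear(80, 964),
)

# Hoped-for behaviour: index into the Sequential and overwrite one submodule in place.
fc[2] = nn.Identity()   # fc[2] is the Linear(80, 964) here; fc[0] is the first Linear
print(fc)               # the ReLU output would now pass straight through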
Related
I need to do a kind of custom backpropagation so that, at an arbitrary layer of the network, I can decide whether to actually modify the weights leaving that layer or keep them unchanged.
For example: I would like to study what happens if, during training, I force some weights connecting the input layer to the first layer not to be updated.
Is there a simple way to just "correct" the normal backpropagation between the layers?
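For concreteness, a minimal sketch of the kind of intervention I have in mind, assuming a PyTorch model (the mask and layer below are just placeholders):

import torch
import torch.nn as nn

layer = nn.Linear(4, 3)

# Placeholder mask: 1 = update this weight as usual, 0 = keep it unchanged.
mask = torch.ones_like(layer.weight)
mask[:, 0] = 0.0   # e.g. freeze all weights coming from the first input unit

# Zero the corresponding gradients before the optimizer ever sees them.
layer.weight.register_hook(lambda grad: grad * mask)

x = torch.randn(8, 4)
loss = layer(x).sum()
loss.backward()
print(layer.weight.grad[:, 0])   # all zeros, so plain SGD (no weight decay) leaves them unchanged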
Thanks
If a big model consists of several individual models trained end-to-end, can I (after training) keep only one of them and freeze/discard the other models during inference?
An example: struct2depth (see below) has three models trained in an unsupervised fashion. However, what I really need is the object motion, namely the 3D Object Motion Estimation part. So I wonder whether it is feasible to
train the original networks as a whole, but
run inference with only the Object Motion Estimator, i.e. with the other parts frozen/discarded?
I saw that in TensorFlow one can obtain the tensor output of a specified layer, but to avoid unnecessary computation I'd like to simply freeze/discard all the other parts... I don't know if that's possible.
Looking forward to some insights. Thanks in advance!
You can ignore weights by setting them to 0. For this, you can directly get a weight W and do W.assign(tf.multiply(W, 0)). I know that you care about speeding up inference, but unless you rewrite your code to use sparse representations, you will probably not speed it up, since the weights can't be removed entirely.
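In current TensorFlow 2.x / Keras terms, that might look roughly like this (the model and layer here are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

# Zero out the kernel of the first Dense layer; the layer stays in the graph,
# so this does not make inference faster, it only ignores those weights.
w = model.layers[0].kernel
w.assign(tf.zeros_like(w))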
Alternatively, you can look at existing solutions for pruning in custom layers:
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A Dense layer that tells the pruning API which weights it is allowed to prune.
class MyDenseLayer(tf.keras.layers.Dense, tfmot.sparsity.keras.PrunableLayer):
    def get_prunable_weights(self):
        # Prune the bias as well, though that usually harms model accuracy too much.
        return [self.kernel, self.bias]

# Use `prune_low_magnitude` to make the `MyDenseLayer` layer train with pruning.
model_for_pruning = tf.keras.Sequential([
    tfmot.sparsity.keras.prune_low_magnitude(MyDenseLayer(20, input_shape=input_shape)),
    tf.keras.layers.Flatten(),
])
You can e.g. use ConstantSparsity (see here) and set the parameters such that your layers are fully pruned.
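For example, something along these lines (the parameter values are illustrative; a target_sparsity close to 1.0 prunes essentially the whole layer):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Prune the layer to (almost) full sparsity from the very first training step.
schedule = tfmot.sparsity.keras.ConstantSparsity(target_sparsity=0.99, begin_step=0)

pruned_dense = tfmot.sparsity.keras.prune_low_magnitude(
    tf.keras.layers.Dense(80),
    pruning_schedule=schedule,
)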
Another alternative is to construct a second, smaller model that you only use for inference. You can then save the required weights separately (instead of saving the entire model) after training and load them in the second model.
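A rough sketch of that second idea in Keras (the builder function, layer sizes, and file name below are made up for illustration):

import tensorflow as tf

def build_motion_estimator():
    # Hypothetical stand-in for the Object Motion Estimation sub-network.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(256,)),
        tf.keras.layers.Dense(6),
    ])

motion_net = build_motion_estimator()
# ... build the full struct2depth-style training model around motion_net and train it ...

# After training, save only the sub-model's weights...
motion_net.save_weights("motion_net_weights.h5")

# ...and for inference, rebuild just the sub-model and load those weights.
inference_net = build_motion_estimator()
inference_net.load_weights("motion_net_weights.h5")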
Hi, I'm kind of new to the world of machine learning, but I do know that, with sufficient training, one can extract a layer whose learned values are optimized for a task.
What do dense, pooling, batch normalization, etc. mean in the model summary?
I've worked a little bit with Keras Sequential, and there are some layer names I'd like explained in simple terms. There are layers like dense, pooling, etc., and I don't know what they mean.
Ultimately, I want to find out which layer type contains those learned values, and maybe create an embedding by fully connecting a layer (I'm not sure how this works).
Furthermore, from where in the model should I extract this layer? I assume it is towards the last parts...
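If I understand correctly, in Keras it would be something roughly like this (the model and layer sizes are placeholders):

import tensorflow as tf

# Placeholder model: the Dense layers are the ones holding the learned values here.
inputs = tf.keras.Input(shape=(64,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
embedding = tf.keras.layers.Dense(32, activation="relu", name="embedding")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(embedding)
model = tf.keras.Model(inputs, outputs)

# A second model that stops at the layer before the classifier; its activations
# can then be used as a learned embedding of the input.
embedding_model = tf.keras.Model(inputs, model.get_layer("embedding").output)
features = embedding_model(tf.random.normal((4, 64)))   # shape (4, 32)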
Thanks in advance.
Sorry if this question is incredibly basic. I feel like there is a wealth of resources online, but most of them are half-complete or skip over the details that I want to know.
I am trying to implement LeNet with Pytorch for practice.
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
How come in this example and many examples online, they define the convolutional layers and the fc layers in __init__, but the subsampling and activation functions in forward?
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others? For example, you have convolution with torch.nn (https://pytorch.org/docs/stable/nn.html#conv1d) and convolution with torch.nn.functional (https://pytorch.org/docs/stable/nn.functional.html#conv1d). Why choose one or the other?
Let's say I want to try different image sizes, like 28x28 (MNIST). The tutorial recommends I resize MNIST. Is there a way to instead change the values of LeNet? What happens if I don't change them?
What is the purpose of num_flat_features? If you wanted to flatten the features, couldn't you just do x = x.view(-1, 16*5*5)?
How come in this example and many examples online, they define the convolutional layers and the fc layers in __init__, but the subsampling and activation functions in forward?
Any layer with trainable parameters should be defined in __init__. Subsampling, certain activations, dropout, etc. don't have any trainable parameters, so they can be defined either in __init__ or used directly via the torch.nn.functional interface during forward.
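For example, a small sketch of that split (parameterized layers registered in __init__, parameter-free ops in forward):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # These have trainable parameters, so they are registered here.
        self.conv = nn.Conv2d(1, 6, 3)
        self.fc = nn.Linear(6 * 13 * 13, 10)

    def forward(self, x):
        # ReLU and max pooling have no parameters, so the functional form is fine.
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        x = torch.flatten(x, start_dim=1)
        return self.fc(x)

out = TinyNet()(torch.randn(1, 1, 28, 28))   # -> shape (1, 10)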
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others?
The torch.nn.functional functions are the actual functions used at the heart of the majority of torch.nn layers; they call into compiled C++ code. For example, nn.Conv2d subclasses nn.Module, as should any custom layer or model which contains trainable parameters. The class handles registering parameters and encapsulates some other necessary functionality required for training and testing. During forward it actually uses nn.functional.conv2d to apply the convolution operation. As mentioned in the first question, when performing a parameterless operation like ReLU there is effectively no difference between using the nn.ReLU class and the nn.functional.relu function.
The reason they are provided is they give some freedom to do unconventional things. For example in this answer which I wrote the other day, providing a solution without nn.functional.conv2d would have been difficult.
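As a small illustration of that freedom, here is a sketch of applying a fixed, hand-built kernel via nn.functional.conv2d, which is awkward to express with the nn.Conv2d class:

import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 8, 8)

# A fixed 3x3 Laplacian-like edge-detection kernel, not a learned parameter.
kernel = torch.tensor([[0., -1., 0.],
                       [-1., 4., -1.],
                       [0., -1., 0.]]).reshape(1, 1, 3, 3)

edges = F.conv2d(image, kernel, padding=1)   # same spatial size as the input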
Let's say I want to try different image sizes, like 28x28 (MNIST). The tutorial recommends I resize MNIST. Is there a way to instead change the values of LeNet? What happens if I don't change them?
There's no obvious way to change an existing, trained model to support different image sizes. The size of the input to the linear layer is necessarily fixed, and the number of features at that point in the model is generally determined by the size of the input to the network. If the size of the input differs from the size the model was designed for, then by the time the data reaches the linear layers it will have the wrong number of elements and the program will crash. Some models can handle a range of input sizes, usually by using something like an nn.AdaptiveAvgPool2d layer before the linear layer to ensure the input shape to the linear layer is always the same. Even so, if the input image size is too small then the downsampling and/or pooling operations in the network will cause the feature maps to vanish at some point, and the program will crash.
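For instance, a sketch of the adaptive-pooling trick (not something the LeNet tutorial does):

import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 16, 5), nn.ReLU(),
    nn.AdaptiveAvgPool2d((5, 5)),    # always outputs 16 x 5 x 5, whatever the input size
)
classifier = nn.Linear(16 * 5 * 5, 10)

for size in (28, 32, 64):
    x = torch.randn(1, 1, size, size)
    out = classifier(torch.flatten(features(x), start_dim=1))
    print(size, out.shape)           # torch.Size([1, 10]) for every input size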
What is the purpose of num_flat_features? If you wanted to flatten the features, couldn't you just do x = x.view(-1, 16*5*5)?
When you define the linear layer you need to tell it how large the weight matrix is. A linear layer's weights are simply an unconstrained matrix (and bias vector). The shape of the weight matrix therefore is determined by the input shape, but you don't know the input shape before you run forward so it needs to be provided as an additional parameter (or hard coded) when you initialize the model.
To get to the actual question: yes, during forward you could simply use
x = x.view(-1, 16*5*5)
Better yet, use
x = torch.flatten(x, start_dim=1)
This tutorial was written before the .flatten function was added to the library. The authors effectively just wrote their own flatten functionality which could be used regardless of the shape of x. This was probably so you had some portable code that could be used in your model without hard coding sizes. From a programming perspective it's nice to generalize such things since it means you wouldn't have to worry about changing those magic numbers if you decide to change part of the model (though this concern didn't appear to extend to the initialization).
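For reference, the tutorial's helper is roughly the following; it just multiplies together all dimensions except the batch dimension:

def num_flat_features(self, x):
    size = x.size()[1:]   # all dimensions except the batch dimension
    num_features = 1
    for s in size:
        num_features *= s
    return num_features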
So, I've retrained the Inception-v3 network to classify specific kinds of data - for training I've provided it with 200x200 pictures. Now, when I run the graph on another 200x200 picture it works just fine. What I want to achieve is to turn it into a filter for a convolutional network - i.e. slide it as a filter through the whole picture and get the probability of each pixel being in a given class.
It seems to be fairly simple to do manually - just split the picture into small sections, classify each of them, put the results together and voila. But that would be very inefficient. Instead, I want to do something like what is described here: http://cs231n.github.io/convolutional-networks/#convert. Basically, change the last FC layer and turn it into a CONV layer by reshaping the weights. Seems simple enough, but I can't figure out how to actually do this.
My main problem is that at the end of the Inception-v3 net, right before the last FC layer, there's a pooling operation that reformats the data into (1,2048) shape, so I won't really be able to perform a convolution here.
Could anyone help me out?
My most immediate solution for this is to skip the fully connected layer at the end, as it causes the input to lose its initial spatial structure. Doing Conv -> FC -> Conv seems redundant.
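For what it's worth, the cs231n-style conversion itself is just a reshape of the FC weights into a 1x1 convolution; a minimal PyTorch sketch (the layer sizes are placeholders, and this does not address the (1, 2048) global-pooling issue from the question):

import torch
import torch.nn as nn

fc = nn.Linear(2048, 5)                      # placeholder "last FC layer"
conv = nn.Conv2d(2048, 5, kernel_size=1)     # equivalent 1x1 convolution

with torch.no_grad():
    conv.weight.copy_(fc.weight.view(5, 2048, 1, 1))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 2048, 7, 7)               # a spatial map of 2048-dim features
heatmap = conv(x)                            # (1, 5, 7, 7): per-location class scores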