I am new to neural network pruning. I know that unstructured pruning sets weights to zero, while structured pruning actually changes the network architecture. However, I am curious about the difference in their performance.
To be specific, I have created two models, a big one and a small one.
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self, channels):
        super(MyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.fc = nn.Linear(channels, 10, bias=False)

    def forward(self, x):
        out = self.conv1(x)
        out = F.avg_pool2d(out, 32)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

big = MyNet(channels=64)
small = MyNet(channels=32)
The small model copies its weights from the big one, and the big model then keeps only the weights that the small model copied (the remaining weights are set to zero).
In the first iteration the two models produce the same output, but after that their outputs differ.
So I wonder: why do their outputs differ from the second iteration onward?
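One possible cause, which the question does not give enough detail to confirm, is that zeroed weights are still ordinary trainable parameters: if they receive a non-zero gradient, the first optimizer step moves them away from zero and the big model stops matching the small one. The sketch below reuses MyNet from the question and assumes, purely for illustration, that only the extra conv filters of the big model were zeroed while the matching fc columns were left untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.fc = nn.Linear(channels, 10, bias=False)

    def forward(self, x):
        out = F.avg_pool2d(self.conv1(x), 32)
        return self.fc(out.view(out.size(0), -1))


torch.manual_seed(0)
big, small = MyNet(64), MyNet(32)
with torch.no_grad():
    # The small model takes the first 32 filters and the matching fc columns.
    small.conv1.weight.copy_(big.conv1.weight[:32])
    small.fc.weight.copy_(big.fc.weight[:, :32])
    # Assumption: only the remaining conv filters of the big model are zeroed.
    big.conv1.weight[32:] = 0

x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))

# Before any update, the zeroed filters contribute nothing, so outputs agree.
same_before = torch.allclose(big(x), small(x), atol=1e-5)

# One backward pass: the zeroed filters still receive a gradient through fc.
F.cross_entropy(big(x), y).backward()
grad_on_zeroed = big.conv1.weight.grad[32:].abs().sum().item()
print(same_before, grad_on_zeroed > 0)
```

If this is what is happening, re-applying the zero mask after every optimizer step (or masking the gradients) would keep the two models in sync.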
I have a pretrained model LeNet5 defined from scratch. I am performing pruning over filters in the convolution layers present in the model shown below.
class LeNet5(nn.Module):
    def __init__(self, n_classes):
        super(LeNet5, self).__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(in_features=800, out_features=500),
            nn.ReLU(),
            nn.Linear(in_features=500, out_features=10),  # 10 - possible classes
        )

    def forward(self, x):
        #x = x.view(x.size(0), -1)
        x = self.feature_extractor(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        probs = F.softmax(logits, dim=1)
        return logits, probs
I have successfully removed 2 filters from 20 in layer 1 (now 18 filters in conv2d layer1) and 5 filters from 50 in layer 2 (now 45 filters in conv2d layer3). So, now I need to update the model with the changes done as follows -
out_channel of layer 1 - 20 to 18
in_channel of layer 3 - 20 to 18
out_channel of layer 3 - 50 to 45
However, I'm unable to run the model because it raises a dimension error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x720 and 800x500)
How can I update the number of filters in the model's layers using PyTorch to perform pruning? Is there a library I can use for this?
Assuming you do not want the model to change structure automatically at runtime, you can update the structure of the model simply by changing the arguments passed to the layer constructors. For instance:
nn.Conv2d(in_channels=1, out_channels=18, kernel_size=5, stride=1),
nn.Conv2d(in_channels=18, out_channels=45, kernel_size=5, stride=1),
and so on. The first linear layer has to change as well: the error message shows that the flattened features now have 720 values (45 channels × 4 × 4) instead of 800 (50 × 4 × 4), so it needs in_features=720.
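A quick way to find the new in_features without doing the arithmetic by hand is to push a dummy input through the feature extractor. This is a minimal sketch that assumes 28x28 single-channel inputs; that size is consistent with the 800 = 50 × 4 × 4 of the original model but is not stated in the question.

```python
import torch
import torch.nn as nn

# Feature extractor with the pruned channel counts (18 and 45).
feature_extractor = nn.Sequential(
    nn.Conv2d(1, 18, kernel_size=5, stride=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(18, 45, kernel_size=5, stride=1), nn.ReLU(), nn.MaxPool2d(2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 1, 28, 28)  # assumption: 28x28 grayscale input
    n_features = torch.flatten(feature_extractor(dummy), 1).shape[1]

print(n_features)  # 720

# Size the classifier from the measured value instead of hard-coding 800.
classifier = nn.Sequential(
    nn.Linear(n_features, 500), nn.ReLU(), nn.Linear(500, 10),
)
```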
If you are retraining from scratch every time you change the model structure, that's all you need to do. However, if you would like to keep some of the already learned parameters when you change the model, you'll need to select the relevant values and reassign them to the model parameters. For instance, consider the parameters associated with the first convolutional layer: 1 input channel, 20 output channels, and a kernel size of 5. In PyTorch the weights and biases of this layer have sizes [20,1,5,5] and [20]. You need to modify these parameters so that they have sizes [18,1,5,5] and [18]. You'd thus need the indices of the particular kernels/filters you want to keep and of those you'd like to prune. The code syntax for doing this is roughly:
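params = net.state_dict()
# state_dict keys are flat strings; index 0 is the first Conv2d in feature_extractor
params["feature_extractor.0.weight"] = params["feature_extractor.0.weight"][:18]
params["feature_extractor.0.bias"] = params["feature_extractor.0.bias"][:18]
# and so on for the other layers
# pruned_net (hypothetical name) is a model rebuilt with the new channel counts;
# loading into the original net would fail with a size mismatch
pruned_net.load_state_dict(params)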
Here, I simply drop the last two kernels/bias values for the first convolutional layer. (Note that the actual dictionary key names may differ slightly; I didn't code this up to check because, as indicated in the comments above, you included a picture of code rather than real, copy-able, code so try to do the latter in the future.)
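To make the "and so on for the other layers" part concrete, here is a hedged end-to-end sketch. It is not the asker's exact class: make_lenet is a hypothetical helper that rebuilds the same layer stack with configurable filter counts. The kept indices (the first 18 and the first 45 filters) and the 28x28 input size are assumptions for illustration. Note that removing filters from one layer also removes the matching input channels of the next conv layer and the matching columns of the first linear layer.

```python
import torch
import torch.nn as nn


def make_lenet(c1=20, c2=50, n_classes=10):
    """Hypothetical helper: the question's LeNet5 layers with configurable filter counts."""
    features = nn.Sequential(
        nn.Conv2d(1, c1, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(c1, c2, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    )
    classifier = nn.Sequential(
        nn.Linear(c2 * 4 * 4, 500), nn.ReLU(), nn.Linear(500, n_classes),
    )
    return nn.Sequential(features, nn.Flatten(), classifier)


torch.manual_seed(0)
big = make_lenet(20, 50)      # stands in for the pretrained model
small = make_lenet(18, 45)    # pruned architecture

keep1 = torch.arange(18)      # assumption: keep the first 18 filters of conv1
keep2 = torch.arange(45)      # assumption: keep the first 45 filters of conv2

with torch.no_grad():
    b_conv1, b_conv2 = big[0][0], big[0][3]
    s_conv1, s_conv2 = small[0][0], small[0][3]

    # conv1: select output filters (dim 0 of the weight).
    s_conv1.weight.copy_(b_conv1.weight[keep1])
    s_conv1.bias.copy_(b_conv1.bias[keep1])

    # conv2: select output filters and the input channels that survived conv1.
    s_conv2.weight.copy_(b_conv2.weight[keep2][:, keep1])
    s_conv2.bias.copy_(b_conv2.bias[keep2])

    # First linear layer: after flatten, each channel owns 4*4 = 16 columns.
    b_fc1, s_fc1 = big[2][0], small[2][0]
    w = b_fc1.weight.view(500, 50, 16)[:, keep2].reshape(500, 45 * 16)
    s_fc1.weight.copy_(w)
    s_fc1.bias.copy_(b_fc1.bias)

    # The last linear layer is unchanged.
    small[2][2].load_state_dict(big[2][2].state_dict())

x = torch.randn(32, 1, 28, 28)  # assumption: 28x28 grayscale batch
logits = small(x)
print(logits.shape)
```

The pruned model will generally not reproduce the original outputs, since the removed filters did contribute to them, so some fine-tuning is usually needed afterwards.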
I just read the official tutorial on this page. The code example that creates a model is below:
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

# Create an instance of the model
model = MyModel()
My question is: why didn't we need to specify an input shape for the first layer (Conv2D) in the code above? Is there any official documentation that mentions this behaviour?
I ask because the official docs for the Conv layer say:
When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last"
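Everything is working as intended.
Convolution as an operation works regardless of the spatial size of the input. Only the number of input channels needs to match the kernel.
You can see this behavior in tf.nn.conv2d, which is the operation the Keras Conv2D layer in your snippet relies on.
The documentation you link to describes the optional input_shape keyword argument. Passing it lets Keras create the layer's weights up front, which is useful in a Sequential model, for example. It is not required: in a subclassed model like yours, the layer creates its weights the first time it is called, when the shape of the input becomes known.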
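The point that the kernel depends only on the channel count, not on the image size, can be illustrated without any framework. The sketch below is a naive NumPy "valid" convolution (cross-correlation, as deep learning libraries compute it); conv2d_valid is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Naive 'valid' cross-correlation.

    image:  (H, W, C_in)
    kernel: (kh, kw, C_in, C_out)
    """
    kh, kw, c_in, c_out = kernel.shape
    assert image.shape[2] == c_in, "input channels must match the kernel"
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]             # (kh, kw, C_in)
            out[i, j] = np.tensordot(patch, kernel, axes=3)  # sum over kh, kw, C_in
    return out


rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 3, 32))  # 32 filters of size 3x3 over 3 channels

# The same kernel works on inputs of different spatial sizes.
small_out = conv2d_valid(rng.normal(size=(8, 8, 3)), kernel)
large_out = conv2d_valid(rng.normal(size=(28, 20, 3)), kernel)
print(small_out.shape, large_out.shape)  # (6, 6, 32) (26, 18, 32)
```

The kernel can therefore only be allocated once the number of input channels is known, which is why a layer without input_shape waits for its first input.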
Is there a way to specify our own custom kernel values for a convolutional neural network in PyTorch, something like kernel_initializer in TensorFlow? E.g., I want a 3x3 kernel in nn.Conv2d initialized so that it acts as an identity kernel:
0 0 0
0 1 0
0 0 0
(this will effectively return the same output as my input in the very first iteration)
My non-exhaustive research on the subject:
I could use nn.init, but it only has some predefined kernel initialisation values.
I tried to follow the discussion on their official thread, but it doesn't suit my needs.
I might have missed something in my research; please feel free to point it out.
I think an easier solution is:
deconv = nn.ConvTranspose2d(
    in_channels=channel_dim, out_channels=channel_dim,
    kernel_size=kernel_size, stride=stride,
    bias=False, padding=1, output_padding=1
)
deconv.weight.data.copy_(
    get_upsampling_weight(channel_dim, channel_dim, kernel_size)
)
In other words, use copy_ to write your own values into the layer's weight tensor.
Thanks to ptrblck I was able to solve it.
I can define a new convolution layer as conv and, as per the example, set the identity kernel using:
import torch as ch  # the snippet uses ch as an alias for torch
import torch.nn as nn

weights = ch.Tensor([[0, 0, 0], [0, 1, 0], [0, 0, 0]]).unsqueeze(0).unsqueeze(0)
weights.requires_grad = True
conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
with ch.no_grad():
    conv.weight = nn.Parameter(weights)
I can then continue to use conv as my regular nn.Conv2d layer.
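As a quick sanity check of the identity kernel, the following minimal sketch combines the two answers: it writes the kernel with copy_ and confirms that the layer returns its input unchanged. The input size (a batch of two 5x5 single-channel images) is an arbitrary choice for the illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)

identity = torch.tensor([[0., 0., 0.],
                         [0., 1., 0.],
                         [0., 0., 0.]])

# Conv2d weights have shape (out_channels, in_channels, kH, kW) = (1, 1, 3, 3).
with torch.no_grad():
    conv.weight.copy_(identity.view(1, 1, 3, 3))

x = torch.randn(2, 1, 5, 5)
y = conv(x)
print(torch.allclose(y, x))  # True
```

The weight remains a trainable parameter, so the layer only behaves as an identity until the first optimizer step changes it.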
Is it possible to pass an array of my own filters to Conv2D as a parameter, instead of the number of filters?
filters = [[[1,0,0],[1,0,0],[1,0,0]],
[[1,0,0],[0,1,0],[0,0,1]],
[[0,1,0],[0,1,0],[0,1,0]],
[[0,0,1],[0,0,1],[0,0,1]]]
model = Sequential()
model.add(Conv2D(filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), data_format='channels_first'))
The accepted answer is right but it would certainly be more useful with a complete example, similar to the one provided in this excellent tensorflow example showing what Conv2d does.
For keras, this is,
from keras.models import Sequential
from keras.layers import Conv2D
import numpy as np

# Keras version of this example:
# https://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow
# Requires a custom kernel initialiser to set the kernel to the values from the example
# kernel = [[1,0,1],[2,1,0],[0,0,1]]
# image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# output = [[14, 6],[6,12]]

# Set image
image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]

# Pad to "channels_last" format
# which is [batch, height, width, channels] = [1,4,4,1]
image = np.expand_dims(np.expand_dims(np.array(image),2),0)

# Initialiser that sets the kernel to the required values
# (dtype=None is accepted because newer Keras versions pass a dtype argument)
def kernel_init(shape, dtype=None):
    kernel = np.zeros(shape)
    kernel[:,:,0,0] = np.array([[1,0,1],[2,1,0],[0,0,1]])
    return kernel

# Build Keras model
model = Sequential()
model.add(Conv2D(1, [3,3], kernel_initializer=kernel_init,
                 input_shape=(4,4,1), padding="valid"))
model.build()

# To apply the existing filter, we use predict with no training
out = model.predict(image)
print(out[0,:,:,0])
which outputs
[[14, 6]
[6, 12]]
as expected.
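The expected values can be checked by hand or with a few lines of NumPy, independently of Keras. This sketch slides the 3x3 kernel over the 4x4 image with no padding and sums the element-wise products at each position (cross-correlation, which is what Conv2D computes).

```python
import numpy as np

image = np.array([[4, 3, 1, 0],
                  [2, 1, 0, 1],
                  [1, 2, 4, 1],
                  [3, 1, 0, 2]])
kernel = np.array([[1, 0, 1],
                   [2, 1, 0],
                   [0, 0, 1]])

kh, kw = kernel.shape
out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1), dtype=int)
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # Element-wise product of the kernel with the patch under it, then sum.
        out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)

print(out.tolist())  # [[14, 6], [6, 12]]
```

For the top-left entry, for example, the non-zero terms are 4 + 1 + 4 + 1 + 4 = 14.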
Keep in mind that the purpose of a Conv2D layer is to learn these filter values. In a traditional image processing task using morphological filters, we design the filter kernels ourselves and then slide them over the whole image (convolution).
In a deep learning approach we are trying to do the same task. Here, however, we assume we don't know which filters should be used, although we know exactly what we are looking for (the labeled images). When we train a convolutional neural network, we show it what we want and ask it to find its own weights, i.e. the filter values.
So, in this context, we should just define how many filters we want to train (in your case, 4 filters) and how they will be initialized. Their weights will be set when training the network.
There are many ways to initialize your filter weights (e.g. setting them all to zero or one, or using a random function so that the filters start out different and can capture distinct image characteristics). The Keras Conv2D layer uses the 'glorot uniform' algorithm by default, as specified in https://keras.io/layers/convolutional/#conv2d.
If you really want to initialize your filter weights in the way you have shown, you can write your own function (take a look at https://keras.io/initializers/) and pass it via the kernel_initializer parameter:
model.add(Conv2D(number_of_filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), kernel_initializer=your_function, data_format='channels_first'))
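As a rough sketch of what your_function could look like for the four filters in the question: Keras calls a custom initializer with the kernel shape, which for Conv2D is (kernel_height, kernel_width, input_channels, filters), and in recent versions also with a dtype. Copying the same 2-D filter onto every input channel is an assumption made for this illustration, as is returning a plain NumPy array (the other answer does the same, but your Keras version may expect a backend tensor).

```python
import numpy as np

# The four 3x3 filters from the question, shape (4, 3, 3).
FILTERS = np.array([
    [[1, 0, 0], [1, 0, 0], [1, 0, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 1], [0, 0, 1]],
], dtype="float32")


def your_function(shape, dtype=None):
    """Custom initializer: returns a kernel of shape (kh, kw, in_channels, filters)."""
    kh, kw, in_channels, n_filters = shape
    if (n_filters, kh, kw) != FILTERS.shape:
        raise ValueError(f"unexpected kernel shape {shape}")
    kernel = np.transpose(FILTERS, (1, 2, 0))  # -> (kh, kw, filters)
    # Assumption: every input channel starts from the same 2-D filter.
    kernel = np.repeat(kernel[:, :, np.newaxis, :], in_channels, axis=2)
    return kernel


# Standalone check with the shape Conv2D would request for 3 input channels.
kernel = your_function((3, 3, 3, 4))
print(kernel.shape)  # (3, 3, 3, 4)
```

With this in place, number_of_filters in the call above would be 4, and the filters would still be updated during training unless the layer is created with trainable=False.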