I want to use an external program as a custom operation.
Because automatic gradient would be not available, I wrote the code to provide gradients by using numerical methods. However, because it have to compute the batch_size number of derivatives,
I wrote it to get batch_size from the shape of x.
Following is an example using numpy function as an external program
f(x) = np.sum(x**2)
(In fact, for this simple numpy function, no loop over batch_size is necessary. But, it is written for general external function.)
#tf.custom_gradient
def custom_op(x):
# without using numpy, use external function
# assume x shape = (batch_size,3)
batch_size= x.shape[0]
input_length = x.shape[1]
# assert input_length==3
yout=[] # shape should be (batch_size,1)
gout=[] # shape should be (batch_size,3)
for i in range(batch_size):
inputs = x[i,:] # shape (3,)
y = np.sum(inputs**2) # shape (3,)
yout.append(y) # shape (1,)
# compute differences
dy = []
for j in range(len(inputs)):
delta = np.zeros_like(inputs)
delta[j] = np.abs(inputs[j])*0.001
yplus = np.sum((inputs + delta)**2) # change only j-th input
grad = (yplus-y)/delta[j] #shape (1,)
dy.append(grad)
gout.append(dy)
yout = tf.convert_to_tensor(yout,dtype='float32') # (batch_size,)
yout = tf.reshape(yout,shape=(batch_size,1)) # (batch_size,1)
gout = tf.convert_to_tensor(gout,dtype='float32') # (batch_size,)
gout = tf.reshape(gout,shape=(batch_size,input_length)) # (batch_size,1)
def grad(upstream):
return upstream*gout
return yout, grad
x = tf.Variable([[1.,2.,3.],[2.,3.,4.]],dtype='float32')
with tf.GradientTape() as tape:
y = custom_op(x)
tape.gradient(y,x)
and found it works.
However, when I tried to use it in the keras model , for example,
def construct_model():
inputs = tf.keras.Input(shape=(3,)) #input array
x = tf.keras.layers.Dense(1)(inputs)
outputs = custom_op(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
optimizer = 'adam'
model.compile(loss='mean_squared_error',
optimizer=optimizer,
metrics=['mean_absolute_error', 'mean_squared_error'])
return model
model = construct_model()
it gives errors
because kerasTensor "inputs" does not have specified batch_size.
I tried to specify batch_size as "tf.keras.Input(shape=(3,),batch_size=2)".
However, it also raises errors because of the use of kerasTensor.
How should I change the custom_op to be compatible with keras?
I have developed a model with three inputs types. Image, categorical data and numerical data. For Image data I've used ResNet50 for the other two I develop my own network.
class MulticlassClassification(nn.Module):
def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
super(MulticlassClassification, self).__init__()
# IMAGE: ResNet
self.cnn = models.resnet50(pretrained = True)
for param in self.cnn.parameters():
param.requires_grad = False
n_inputs = self.cnn.fc.in_features
self.cnn.fc = nn.Sequential(
nn.Linear(n_inputs, 250),
nn.ReLU(),
nn.Dropout(p),
nn.Linear(250, output_size),
nn.LogSoftmax(dim=1)
)
# TABULAR
self.all_embeddings = nn.ModuleList(
[nn.Embedding(categories, size) for categories, size in cat_size]
)
self.embedding_dropout = nn.Dropout(p)
self.batch_norm_num = nn.BatchNorm1d(num_col)
all_layers = []
num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
input_size = num_cat_col + num_col
for i in layers:
all_layers.append(nn.Linear(input_size, i))
all_layers.append(nn.ReLU(inplace=True))
all_layers.append(nn.BatchNorm1d(i))
all_layers.append(nn.Dropout(p))
input_size = i
all_layers.append(nn.Linear(layers[-1], output_size))
self.layers = nn.Sequential(*all_layers)
#combine
self.combine_fc = nn.Linear(output_size * 2, output_size)
def forward(self, image, x_categorical, x_numerical):
embeddings = []
for i, embedding in enumerate(self.all_embeddings):
print(x_categorical[:,i])
embeddings.append(embedding(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)
x_numerical = self.batch_norm_num(x_numerical)
x = torch.cat([x, x_numerical], 1)
x = self.layers(x)
# img
x2 = self.cnn(image)
# combine
x3 = torch.cat([x, x2], 1)
x3 = F.relu(self.combine_fc(x3))
return x
Now after successful training I would like to calculate integrated gradients by using the captum library.
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())
And here I got an error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I figured out that captum injects a wrongly shaped tensor into my x_categorical input (with the print in my forward method). It seems like captum only sees the first input tensor and uses it's shape for all other inputs. How can I change this behaviour?
I've found the similar issue here (https://github.com/pytorch/captum/issues/439). It was recommended to use Interpretable Embedding for categorical data. When I used it I got this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I would be very grateful for any tips and advises how to combine all three inputs and to solve my problem.
I have trained an RNN model with pytorch. I need to use the model for prediction in an environment where I'm unable to install pytorch because of some strange dependency issue with glibc. However, I can install numpy and scipy and other libraries. So, I want to use the trained model, with the network definition, without pytorch.
I have the weights of the model as I save the model with its state dict and weights in the standard way, but I can also save it using just json/pickle files or similar.
I also have the network definition, which depends on pytorch in a number of ways. This is my RNN network definition.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
torch.manual_seed(1)
random.seed(1)
device = torch.device('cpu')
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size,num_layers, matching_in_out=False, batch_size=1):
super(RNN, self).__init__()
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
self.num_layers = num_layers
self.batch_size = batch_size
self.matching_in_out = matching_in_out #length of input vector matches the length of output vector
self.lstm = nn.LSTM(input_size, hidden_size,num_layers)
self.hidden2out = nn.Linear(hidden_size, output_size)
self.hidden = self.init_hidden()
def forward(self, feature_list):
feature_list=torch.tensor(feature_list)
if self.matching_in_out:
lstm_out, _ = self.lstm( feature_list.view(len( feature_list), 1, -1))
output_space = self.hidden2out(lstm_out.view(len( feature_list), -1))
output_scores = torch.sigmoid(output_space) #we'll need to check if we need this sigmoid
return output_scores #output_scores
else:
for i in range(len(feature_list)):
cur_ft_tensor=feature_list[i]#.view([1,1,self.input_size])
cur_ft_tensor=cur_ft_tensor.view([1,1,self.input_size])
lstm_out, self.hidden = self.lstm(cur_ft_tensor, self.hidden)
outs=self.hidden2out(lstm_out)
return outs
def init_hidden(self):
#return torch.rand(self.num_layers, self.batch_size, self.hidden_size)
return (torch.rand(self.num_layers, self.batch_size, self.hidden_size).to(device),
torch.rand(self.num_layers, self.batch_size, self.hidden_size).to(device))
I am aware of this question, but I'm willing to go as low level as possible. I can work with numpy array instead of tensors, and reshape instead of view, and I don't need a device setting.
Based on the class definition above, what I can see here is that I only need the following components from torch to get an output from the forward function:
nn.LSTM
nn.Linear
torch.sigmoid
I think I can easily implement the sigmoid function using numpy. However, can I have some implementation for the nn.LSTM and nn.Linear using something not involving pytorch? Also, how will I use the weights from the state dict into the new class?
So, the question is, how can I "translate" this RNN definition into a class that doesn't need pytorch, and how to use the state dict weights for it?
Alternatively, is there a "light" version of pytorch, that I can use just to run the model and yield a result?
EDIT
I think it might be useful to include the numpy/scipy equivalent for both nn.LSTM and nn.linear. It would help us compare the numpy output to torch output for the same code, and give us some modular code/functions to use. Specifically, a numpy equivalent for the following would be great:
rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
and also for linear:
m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
You should try to export the model using torch.onnx. The page gives you an example that you can start with.
An alternative is to use TorchScript, but that requires torch libraries.
Both of these can be run without python. You can load torchscript in a C++ application https://pytorch.org/tutorials/advanced/cpp_export.html
ONNX is much more portable and you can use in languages such as C#, Java, or Javascript
https://onnxruntime.ai/ (even on the browser)
A running example
Just modifying a little your example to go over the errors I found
Notice that via tracing any if/elif/else, for, while will be unrolled
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
torch.manual_seed(1)
random.seed(1)
device = torch.device('cpu')
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size,num_layers, matching_in_out=False, batch_size=1):
super(RNN, self).__init__()
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
self.num_layers = num_layers
self.batch_size = batch_size
self.matching_in_out = matching_in_out #length of input vector matches the length of output vector
self.lstm = nn.LSTM(input_size, hidden_size,num_layers)
self.hidden2out = nn.Linear(hidden_size, output_size)
def forward(self, x, h0, c0):
lstm_out, (hidden_a, hidden_b) = self.lstm(x, (h0, c0))
outs=self.hidden2out(lstm_out)
return outs, (hidden_a, hidden_b)
def init_hidden(self):
#return torch.rand(self.num_layers, self.batch_size, self.hidden_size)
return (torch.rand(self.num_layers, self.batch_size, self.hidden_size).to(device).detach(),
torch.rand(self.num_layers, self.batch_size, self.hidden_size).to(device).detach())
# convert the arguments passed during onnx.export call
class MWrapper(nn.Module):
def __init__(self, model):
super(MWrapper, self).__init__()
self.model = model;
def forward(self, kwargs):
return self.model(**kwargs)
Run an example
rnn = RNN(10, 10, 10, 3)
X = torch.randn(3,1,10)
h0,c0 = rnn.init_hidden()
print(rnn(X, h0, c0)[0])
Use the same input to trace the model and export an onnx file
torch.onnx.export(MWrapper(rnn), {'x':X,'h0':h0,'c0':c0}, 'rnn.onnx',
dynamic_axes={'x':{1:'N'},
'c0':{1: 'N'},
'h0':{1: 'N'}
},
input_names=['x', 'h0', 'c0'],
output_names=['y', 'hn', 'cn']
)
Notice that you can use symbolic values for the dimensions of some axes of some inputs. Unspecified dimensions will be fixed with the values from the traced inputs. By default LSTM uses dimension 1 as batch.
Next we load the ONNX model and pass the same inputs
import onnxruntime
ort_model = onnxruntime.InferenceSession('rnn.onnx')
print(ort_model.run(['y'], {'x':X.numpy(), 'c0':c0.numpy(), 'h0':h0.numpy()}))
Basically implementing it in numpy and copying weights from your pytorch model can do the trick. For your usecase you will only need to do a forward pass so we just need to implement that only
#Set Parameters for a small LSTM network
input_size = 2 # size of one 'event', or sample, in our batch of data
hidden_dim = 3 # 3 cells in the LSTM layer
output_size = 1 # desired model output
num_layers=3
torch_lstm = RNN( input_size,
hidden_dim ,
output_size,
num_layers,
matching_in_out=True
)
state = torch_lstm.state_dict() # state will capture the weights of your model
Now for LSTM in numpy these functions will be used:
got the below code from this link: https://towardsdatascience.com/the-lstm-reference-card-6163ca98ae87
### NOT MY CODE
import numpy as np
from scipy.special import expit as sigmoid
def forget_gate(x, h, Weights_hf, Bias_hf, Weights_xf, Bias_xf, prev_cell_state):
forget_hidden = np.dot(Weights_hf, h) + Bias_hf
forget_eventx = np.dot(Weights_xf, x) + Bias_xf
return np.multiply( sigmoid(forget_hidden + forget_eventx), prev_cell_state )
def input_gate(x, h, Weights_hi, Bias_hi, Weights_xi, Bias_xi, Weights_hl, Bias_hl, Weights_xl, Bias_xl):
ignore_hidden = np.dot(Weights_hi, h) + Bias_hi
ignore_eventx = np.dot(Weights_xi, x) + Bias_xi
learn_hidden = np.dot(Weights_hl, h) + Bias_hl
learn_eventx = np.dot(Weights_xl, x) + Bias_xl
return np.multiply( sigmoid(ignore_eventx + ignore_hidden), np.tanh(learn_eventx + learn_hidden) )
def cell_state(forget_gate_output, input_gate_output):
return forget_gate_output + input_gate_output
def output_gate(x, h, Weights_ho, Bias_ho, Weights_xo, Bias_xo, cell_state):
out_hidden = np.dot(Weights_ho, h) + Bias_ho
out_eventx = np.dot(Weights_xo, x) + Bias_xo
return np.multiply( sigmoid(out_eventx + out_hidden), np.tanh(cell_state) )
We would need the sigmoid function as well so
def sigmoid(x):
return 1/(1 + np.exp(-x))
Because pytorch stores weights in stacked manner so we need to break it up for that we would need the below function
def get_slices(hidden_dim):
slices=[]
breaker=(hidden_dim*4)
slices=[[i,i+3] for i in range(0, breaker, breaker//4)]
return slices
Now we have the functions ready for lstm, now we create an lstm class to copy the weights from pytorch class and get the output from it.
class numpy_lstm:
def __init__( self, layer_num=0, hidden_dim=1, matching_in_out=False):
self.matching_in_out=matching_in_out
self.layer_num=layer_num
self.hidden_dim=hidden_dim
def init_weights_from_pytorch(self, state):
slices=get_slices(self.hidden_dim)
print (slices)
#Event (x) Weights and Biases for all gates
lstm_weight_ih='lstm.weight_ih_l'+str(self.layer_num)
self.Weights_xi = state[lstm_weight_ih][slices[0][0]:slices[0][1]].numpy() # shape [h, x]
self.Weights_xf = state[lstm_weight_ih][slices[1][0]:slices[1][1]].numpy() # shape [h, x]
self.Weights_xl = state[lstm_weight_ih][slices[2][0]:slices[2][1]].numpy() # shape [h, x]
self.Weights_xo = state[lstm_weight_ih][slices[3][0]:slices[3][1]].numpy() # shape [h, x]
lstm_bias_ih='lstm.bias_ih_l'+str(self.layer_num)
self.Bias_xi = state[lstm_bias_ih][slices[0][0]:slices[0][1]].numpy() #shape is [h, 1]
self.Bias_xf = state[lstm_bias_ih][slices[1][0]:slices[1][1]].numpy() #shape is [h, 1]
self.Bias_xl = state[lstm_bias_ih][slices[2][0]:slices[2][1]].numpy() #shape is [h, 1]
self.Bias_xo = state[lstm_bias_ih][slices[3][0]:slices[3][1]].numpy() #shape is [h, 1]
lstm_weight_hh='lstm.weight_hh_l'+str(self.layer_num)
#Hidden state (h) Weights and Biases for all gates
self.Weights_hi = state[lstm_weight_hh][slices[0][0]:slices[0][1]].numpy() #shape is [h, h]
self.Weights_hf = state[lstm_weight_hh][slices[1][0]:slices[1][1]].numpy() #shape is [h, h]
self.Weights_hl = state[lstm_weight_hh][slices[2][0]:slices[2][1]].numpy() #shape is [h, h]
self.Weights_ho = state[lstm_weight_hh][slices[3][0]:slices[3][1]].numpy() #shape is [h, h]
lstm_bias_hh='lstm.bias_hh_l'+str(self.layer_num)
self.Bias_hi = state[lstm_bias_hh][slices[0][0]:slices[0][1]].numpy() #shape is [h, 1]
self.Bias_hf = state[lstm_bias_hh][slices[1][0]:slices[1][1]].numpy() #shape is [h, 1]
self.Bias_hl = state[lstm_bias_hh][slices[2][0]:slices[2][1]].numpy() #shape is [h, 1]
self.Bias_ho = state[lstm_bias_hh][slices[3][0]:slices[3][1]].numpy() #shape is [h, 1]
def forward_lstm_pass(self,input_data):
h = np.zeros(self.hidden_dim)
c = np.zeros(self.hidden_dim)
output_list=[]
for eventx in input_data:
f = forget_gate(eventx, h, self.Weights_hf, self.Bias_hf, self.Weights_xf, self.Bias_xf, c)
i = input_gate(eventx, h, self.Weights_hi, self.Bias_hi, self.Weights_xi, self.Bias_xi,
self.Weights_hl, self.Bias_hl, self.Weights_xl, self.Bias_xl)
c = cell_state(f,i)
h = output_gate(eventx, h, self.Weights_ho, self.Bias_ho, self.Weights_xo, self.Bias_xo, c)
if self.matching_in_out: # doesnt make sense but it was as it was in main code :(
output_list.append(h)
if self.matching_in_out:
return output_list
else:
return h
Similarly for fully connected layer,
class fully_connected_layer:
def __init__(self,state, dict_name='fc', ):
self.fc_Weight = state[dict_name+'.weight'][0].numpy()
self.fc_Bias = state[dict_name+'.bias'][0].numpy() #shape is [,output_size]
def forward(self,lstm_output, is_sigmoid=True):
res=np.dot(self.fc_Weight, lstm_output)+self.fc_Bias
print (res)
if is_sigmoid:
return sigmoid(res)
else:
return res
Now we would need one class to call all of them together and generalise them with respect to multiple layers
You can modify the below class if you need more Fully connected layers or want to set false condition for sigmoid etc.
class RNN_model_Numpy:
def __init__(self, state, input_size, hidden_dim, output_size, num_layers, matching_in_out=True):
self.lstm_layers=[]
for i in range(0, num_layers):
lstm_layer_obj=numpy_lstm(layer_num=i, hidden_dim=hidden_dim, matching_in_out=True)
lstm_layer_obj.init_weights_from_pytorch(state)
self.lstm_layers.append(lstm_layer_obj)
self.hidden2out=fully_connected_layer(state, dict_name='hidden2out')
def forward(self, feature_list):
for x in self.lstm_layers:
lstm_output=x.forward_lstm_pass(feature_list)
feature_list=lstm_output
return self.hidden2out.forward(feature_list, is_sigmoid=False)
Sanity check on a numpy variable:
data = np.array(
[[1,1],
[2,2],
[3,3]])
check=RNN_model_Numpy(state, input_size, hidden_dim, output_size, num_layers)
check.forward(data)
EXPLANATION:
Since we just need forward pass, we would need certain functions that are required in LSTM, for that we have the forget gate, input gate, cell gate and output gate. They are just some operations that are done on the input that you give.
For get_slices function, this is used to break down the weight matrix that we get from pytorch state dictionary (state dictionary) is the dictionary which contains the weights of all the layers that we have in our network.
For LSTM particularly have it in this order ignore, forget, learn, output. So for that we would need to break it up for different LSTM cells.
For numpy_lstm class, we have init_weights_from_pytorch function which must be called, what it will do is that it will extract the weights from state dictionary which we got earlier from pytorch model object and then populate the numpy array weights with the pytorch weights. You can first train your model and then save the state dictionary through pickle and then use it.
The fully connected layer class just implements the hidden2out neural network.
Finally our rnn_model_numpy class is there to ensure that if you have multiple layers then it is able to send the output of one layer of lstm to other layer of lstm.
Lastly there is a small sanity check on data variable.
IMPORTANT NOTE: PLEASE NOTE THAT YOU MIGHT GET DIMENSION ERROR AS PYTORCH WAY OF HANDLING INPUT IS COMPLETELY DIFFERENT SO PLEASE ENSURE THAT YOU INPUT NUMPY IS OF SIMILAR SHAPE AS DATA VARIABLE.
Important references:
https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
https://christinakouridi.blog/2019/06/19/backpropagation-lstm/
I am trying to develop a non-trainable custom PCA layer, where, starting from a float input of size equal to 256, the output is a two-dimensional vector obtained from the PCA application inside the network as last level. Prior to this layer I need to have one Dense layer. Below is my PCA implementation via tensorflow and the custom level:
class TF_PCA:
def __init__(self, X):
self.X = X
self.u = None
self.singular_values = None
self.sigma = None
def fit(self):
# Perform SVD
self.singular_values, self.u, _ = tf.linalg.svd(self.X)
# Create sigma matrix
self.sigma = tf.linalg.diag(self.singular_values)
def reduce(self, n_dimensions=None, keep_info=None):
if keep_info:
# Normalize singular values
normalized_singular_values = self.singular_values / sum(self.singular_values)
# Create the aggregated ladder of kept information per dimension
ladder = np.cumsum(normalized_singular_values)
# Get the first index which is above the given information threshold
index = next(idx for idx, value in enumerate(ladder) if value >= keep_info) + 1
n_dimensions = index
# Cut out the relevant part from sigma
sigma = tf.slice(self.sigma, [0, 0], [self.X.shape[1], n_dimensions])
# PCA
pca = tf.matmul(self.u, sigma)
return pca
class CustomPCALayer(tf.keras.layers.Layer):
def __init__(self, num_outputs):
super(CustomPCALayer, self).__init__()
self.num_outputs = num_outputs
self.total = tf.Variable(initial_value=tf.zeros((num_outputs,)), trainable=False)
def call(self, inputs):
tf_pca = TF_PCA(inputs)
tf_pca.fit()
pca = tf_pca.reduce(n_dimensions=self.num_outputs)
return tf.convert_to_tensor(pca, dtype=tf.float32)
Once I implemented this I also implemented the neural network:
in_dim = (256,)
out_dim = 2
def build_network(inputs, out_dim):
x = Dense(256)(inputs)
x = Dropout(0.2)(x)
x = CustomPCALayer(out_dim)(x) # non-trainable layer
return x
inputs = Input(shape=in_dim)
network = build_network(inputs, out_dim)
model = Model(inputs=inputs, outputs=[network], name="network")
opt = keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer,
loss="mse",
metrics=[tf.keras.metrics.CosineSimilarity()])
As input and output I have something like this:
x = tf.random.uniform((300, 256))
y = tf.random.uniform((300, 2))
If I try to train, it immediately fails with the error NotImplementedError: SVD gradient has not been implemented for input with unknown inner matrix shape.
I would like to understand what is wrong with my architecture in order to solve the error and have a non-trainable layer to perform PCA
I want to expand the number of nodes of a frozen neural network model by width in pytorch. I want to do something like what shown in the below image where grey are frozen weights and green are newly added trainable weights.
.
I have an initial model which takes 3 inputs and gives one output back, this model also has two hidden layers with nodes h1=5 and h2=3 respectively. I created the model in pytorch and frozen the weights.
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self):
super(Net,self).__init__()
self.fc1 = nn.Linear(3, 5)
self.fc2 = nn.Linear(5, 3)
self.fc3 = nn.Linear(3, 1)
def forward(self,x):
x = self.fc1(x)
x = self.fc2(x)
x = self.fc3(x)
return x
print(Net())
model = Net()
X = torch.rand(5,3)
y = model(X)
print(y)
# Freeze layers
for param in model.parameters():
param.requires_grad = False
Now I want to expand this model by adding trainable nodes to h1=5+2, h2=3+1 and output= 1+1. Only the newly added nodes should be trainable and all other weights should be frozen, and those frozen weights should have the same weight as the parent model. Can this be done in pytorch or in tensorflow?
There are 2 things need to be done
1. Expand the layers
You really should use ModuleList or ModuleDict to create layers because that would mean you can use loop. I know eval or setattr also work but they tend to break something else so I don't want to use them.
There's 2 ways I can think of. One is directly replace the weight with something bigger and the other one is create a bigger layer and replace the whole layer.
# Replace the weight with randomly generated tensor
fc1_newweight = torch.rand(7, 3)
fc1_newbias = torch.rand(7)
fc1_shape = model.fc1.weight.shape
fc1_newweight[:fc1_shape[0], :fc1_shape[1]] = model.fc1.weight.clone()
fc1_newbias[:fc1_shape[0]] = model.fc1.bias.clone()
model.fc1.weight = torch.nn.Parameter(fc1_newweight)
model.fc1.bias = torch.nn.Parameter(fc1_newbias)
# Replace the weight with the random generated weights from the new layer
fc2_shape = model.fc2.weight.shape
fc2 = nn.Linear(7, 4)
fc2_weight = fc2.state_dict()
fc2_weight['weight'][:fc2_shape[0], :fc2_shape[1]] = model.fc2.weight.clone()
fc2_weight['bias'][:fc2_shape[0]] = model.fc2.bias.clone()
fc2.load_state_dict(fc2_weight)
model.fc2.weight = torch.nn.Parameter(fc2_weight['weight'])
model.fc2.bias = torch.nn.Parameter(fc2_weight['bias'])
# Replace the whole layer
fc3_shape = model.fc3.weight.shape
fc3 = nn.Linear(4, 2)
fc3_weight = fc3.state_dict()
fc3_weight['weight'][:fc3_shape[0], :fc3_shape[1]] = model.fc3.weight.clone()
fc3_weight['bias'][:fc3_shape[0]] = model.fc3.bias.clone()
fc3.load_state_dict(fc3_weight)
model.fc3 = fc3
I'd prefer the 2. or 3. over 1. because the weights will be generated using nn.init.kaiming_uniform instead of just uniform.
2. Select what to be trainable
This is tricky because you can't just set require_grad on only some elements of the weights because you'll get RuntimeError: you can only change requires_grad flags of leaf variables.
But something like this should be a good enough substitute. Again, using ModuleList will make the code here look a lot better too.
y = model(x)
loss = criterion(y, target)
loss.backward()
model.fc1.weight.grad[:fc1_shape[0], :fc1_shape[1]] = 0
model.fc1.bias.grad[:fc1_shape[0]] = 0
model.fc2.weight.grad[:fc2_shape[0], :fc2_shape[1]] = 0
model.fc2.bias.grad[:fc2_shape[0]] = 0
model.fc3.weight.grad[:fc3_shape[0], :fc3_shape[1]] = 0
model.fc3.bias.grad[:fc3_shape[0]] = 0
optimizer.step()