Custom weight initialization in conv1d pytorch - python

I am trying to create custom weights in conv1d as follows:
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.Tensor([[[0.5, 0.5]]])
with torch.no_grad():
    conv.weight = K
But I am getting the error
"File “D:\ProgramData\Miniconda3\envs\pytorchcuda102\lib\site-packages\torch\nn\modules\module.py”, line 611, in setattr
raise TypeError(“cannot assign ‘{}’ as parameter ‘{}’ "
TypeError: cannot assign ‘torch.FloatTensor’ as parameter ‘weight’ (torch.nn.Parameter or None expected)”
What am I doing wrong?

You were close. Note that you do not need the with torch.no_grad() block here: assigning through .data bypasses autograd entirely, so no gradient is computed during the weight assignment.
All you need to do is remove it and assign to conv.weight.data instead of conv.weight, which gives you access to the underlying parameter values.
See the fixed code below:
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.Tensor([[[0.5, 0.5]]])
conv.weight.data = K
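As a quick sanity check (a minimal sketch, assuming the fixed code above has just run), the kernel [0.5, 0.5] turns the convolution into a two-point moving average, offset by the layer's randomly initialized bias:
x = torch.tensor([[[1.0, 3.0, 5.0]]])  # shape (batch, channels, length)
print(conv(x))  # roughly [[[2.0, 4.0]]], shifted by conv.bias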

As per the discussion here, update your code to use torch.nn.Parameter(), which keeps the weight recognizable as a parameter by the optimizer.
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.tensor([[[0.5, 0.5]]])  # shape (out_channels, in_channels, kernel_size), matching your conv layer
conv.weight = nn.Parameter(K)     # wrap in nn.Parameter so it stays a registered parameter
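If you would rather keep the existing Parameter object (so that references held by an optimizer stay valid), an in-place copy under torch.no_grad() is a common alternative; a minimal sketch:
with torch.no_grad():
    conv.weight.copy_(K)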
For bigger, more complex models, you can look at this toy example, which uses PyTorch's state_dict() method.
import torch
import torch.nn as nn
import torchvision
net = torchvision.models.resnet18(pretrained=True)
pretrained_dict = net.state_dict()
conv_weights = pretrained_dict['conv1.weight']  # shape (64, 3, 7, 7)
new = torch.tensor((), dtype=torch.int32)
new = new.new_ones(conv_weights.shape)  # a tensor of all ones with the same shape
pretrained_dict['conv1.weight'] = new
net.load_state_dict(pretrained_dict)  # copies the new values back into the model's parameters
param = list(net.parameters())
print(param[0])
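Note that load_state_dict() copies the values into the existing parameters (casting dtype where needed), so conv1.weight remains a registered float Parameter even though new was created as int32. A quick check, assuming the code above has run:
print(net.conv1.weight.dtype)  # still torch.float32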

Related

TensorFlow Probability MixtureSameFamily layer code example: what are the parameters output from a connected dense layer? [python]

I am looking at the example code from the MixtureSameFamily layer documentation page. Specifically, I am interested in understanding what parameters are output by the last Dense layer connected to the MixtureSameFamily layer:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
tfpl = tfp.layers
tfk = tf.keras
tfkl = tf.keras.layers
n = 2000
t = tfd.Uniform(low=-np.pi, high=np.pi).sample([n, 1])
r = 2 * (1 - tf.cos(t))
x = r * tf.sin(t) + tfd.Normal(loc=0., scale=0.1).sample([n, 1])
y = r * tf.cos(t) + tfd.Normal(loc=0., scale=0.1).sample([n, 1])
event_shape = [1]
num_components = 5
params_size = tfpl.MixtureSameFamily.params_size(
    num_components,
    component_params_size=tfpl.IndependentNormal.params_size(event_shape))
md_model = tfk.Sequential([
    tfkl.Dense(12, activation='relu'),
    tfkl.Dense(params_size, activation=None),
    tfpl.MixtureSameFamily(num_components, tfpl.IndependentNormal(event_shape)),
])
In this case we have 15 values output from the last Dense layer. If I collect them in the following way for one fixed input sample indexed at 0:
extractor = tfk.Model(inputs=md_model.inputs,
                      outputs=[layer.output for layer in md_model.layers[:-1]])
features = extractor(x)
parameters = features[1][0]
What are the values found in the parameters array? I guess they should somehow relate to the mixture coefficients and the location and scale of the 5 Normal distribution components constituting the mixture model. But how exactly, and in what order? I could not find this information anywhere. In other words, I would like to use these values in a MixtureSameFamily distribution object, but I do not know how to assign them. I could only do it for num_components=1:
probs = [1]
loc = [parameters[1]]
scale = [parameters[2]]
gm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=probs),
    components_distribution=tfd.Normal(loc=loc, scale=scale))
But I could not find the proper pattern for num_components>1. Would you have any suggestion regarding how to do this?
If we find a solution, perhaps it could be added to the example/documentation page (I could try to contribute to it).
Thanks in advance!
I guess I have found the answer.
The code to assign the parameters to a distribution object should look like this:
params_reshape = tf.reshape(parameters[..., num_components:],
                            tf.concat([tf.shape(parameters)[:-1], [num_components, -1]], axis=0))
loc_params, scale_params = tf.split(params_reshape, 2, axis=-1)
scale_params = tf.math.softplus(scale_params)
gm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(logits=parameters[..., :num_components]),
    components_distribution=tfd.Independent(tfd.Normal(loc=loc_params, scale=scale_params),
                                            reinterpreted_batch_ndims=tf.size(event_shape)))
I found the answer by digging into the source code in the GitHub repository. In particular, these two classes were useful: MixtureSameFamily and IndependentNormal.
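In concrete terms, following the reshape logic above: for num_components = 5 and event_shape = [1], the 15 outputs are the 5 mixture logits first, followed by each component's location and raw scale (interleaved per component), with softplus turning the raw scales positive. A minimal single-sample sketch (this layout is my reading of the source, so treat it as an assumption), where parameters is the length-15 vector extracted earlier:
logits = parameters[:5]
locs_and_scales = tf.reshape(parameters[5:], [5, 2])  # row k = (loc_k, raw_scale_k), assuming this interleaving
loc = locs_and_scales[..., 0]
scale = tf.math.softplus(locs_and_scales[..., 1])
gm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(logits=logits),
    components_distribution=tfd.Normal(loc=loc, scale=scale))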

Problem initializing Tensorflow variables

I am trying to compute the input signal "maximizing" the activation of a given neuron of an encoder NN (the goal is to understand what my latent features are modelling).
I wrote a little python script which loads the .h5 file with the trained encoder model and builds a tensorflow graph to compute iteratively the "best activation signal".
It seems my TensorFlow implementation is not right: despite the fact that I run tf.initialize_all_variables(), a FailedPreconditionError: Attempting to use uninitialized value X error is raised.
I am fairly new to using TensorFlow without Keras, so this may be a trivial mistake, but I could really use some help on this. Here is my code. Thanks a lot.
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
import matplotlib.pyplot as plt
input_sequence_size = 20
input_dim = 4
encoding_dim = 10
model_save = 'siple_autoencoder_encoder.h5'
model = keras.models.load_model(model_save)
lambda_param = 0.1
n_steps = 100
X = tf.Variable(tf.random_uniform([1, input_sequence_size * input_dim], -1.0, 1.0), name = 'X')
prediction = model.predict(X, steps = 1)
y = tf.gather_nd(prediction, [[0]], batch_dims=0, name=None)
gradient = tf.gradients(y, [X])[0]
step = tf.assign(X, X + lambda_param * gradient)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    # output = y.eval()
    for i in range(n_steps):
        sess.run(step)
    activation_signal_1 = X.eval()

AttributeError: 'RNN' object has no attribute 'weight_hh_l' [duplicate]

I'd like to initialize the parameters of RNN with np arrays.
In the following example, I want to pass w to the parameters of rnn. I know PyTorch provides many initialization methods like Xavier, uniform, etc., but is there a way to initialize the parameters by passing NumPy arrays?
import numpy as np
import torch.nn as nn
input_size, hidden_size, num_layers = 3, 4, 2  # example sizes; undefined in the original
rng = np.random.RandomState(313)
w = rng.randn(input_size, hidden_size).astype(np.float32)
rnn = nn.RNN(input_size, hidden_size, num_layers)
First, let's note that nn.RNN has more than one weight variable; cf. the documentation:
Variables:
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0; otherwise the shape is (hidden_size, hidden_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
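You can list these names and shapes on an instance directly; a quick sketch using the sizes from the solutions below:
import torch
from torch import nn
rnn = nn.RNN(input_size=3, hidden_size=4, num_layers=2)
for name, param in rnn.named_parameters():
    print(name, tuple(param.shape))
# weight_ih_l0 (4, 3), weight_hh_l0 (4, 4), bias_ih_l0 (4,), bias_hh_l0 (4,),
# weight_ih_l1 (4, 4), weight_hh_l1 (4, 4), bias_ih_l1 (4,), bias_hh_l1 (4,)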
Now, each of these variables (Parameter instances) is an attribute of your nn.RNN instance. You can access and edit them in two ways, as shown below:
Solution 1: Accessing all the RNN Parameter attributes by name (rnn.weight_hh_l0, rnn.weight_ih_l0, etc.):
import torch
from torch import nn
import numpy as np
input_size, hidden_size, num_layers = 3, 4, 2
use_bias = True
rng = np.random.RandomState(313)
rnn = nn.RNN(input_size, hidden_size, num_layers, bias=use_bias)
def set_nn_parameter_data(layer, parameter_name, new_data):
    param = getattr(layer, parameter_name)
    param.data = new_data
for i in range(num_layers):
    # input-hidden weights are (hidden_size, input_size) for layer 0
    # and (hidden_size, hidden_size) for the layers above it
    in_dim = input_size if i == 0 else hidden_size
    weights_hh_layer_i = rng.randn(hidden_size, hidden_size).astype(np.float32)
    weights_ih_layer_i = rng.randn(hidden_size, in_dim).astype(np.float32)
    set_nn_parameter_data(rnn, "weight_hh_l{}".format(i),
                          torch.from_numpy(weights_hh_layer_i))
    set_nn_parameter_data(rnn, "weight_ih_l{}".format(i),
                          torch.from_numpy(weights_ih_layer_i))
    if use_bias:
        bias_hh_layer_i = rng.randn(hidden_size).astype(np.float32)
        bias_ih_layer_i = rng.randn(hidden_size).astype(np.float32)
        set_nn_parameter_data(rnn, "bias_hh_l{}".format(i),
                              torch.from_numpy(bias_hh_layer_i))
        set_nn_parameter_data(rnn, "bias_ih_l{}".format(i),
                              torch.from_numpy(bias_ih_layer_i))
Solution 2: Accessing all the RNN Parameter attributes through rnn.all_weights list attribute:
import torch
from torch import nn
import numpy as np
input_size, hidden_size, num_layers = 3, 4, 2
use_bias = True
rng = np.random.RandomState(313)
rnn = nn.RNN(input_size, hidden_size, num_layers, bias=use_bias)
for i in range(num_layers):
    in_dim = input_size if i == 0 else hidden_size  # see the shape note above
    weights_hh_layer_i = rng.randn(hidden_size, hidden_size).astype(np.float32)
    weights_ih_layer_i = rng.randn(hidden_size, in_dim).astype(np.float32)
    # all_weights[i] is ordered [weight_ih, weight_hh, bias_ih, bias_hh]
    rnn.all_weights[i][0].data = torch.from_numpy(weights_ih_layer_i)
    rnn.all_weights[i][1].data = torch.from_numpy(weights_hh_layer_i)
    if use_bias:
        bias_hh_layer_i = rng.randn(hidden_size).astype(np.float32)
        bias_ih_layer_i = rng.randn(hidden_size).astype(np.float32)
        rnn.all_weights[i][2].data = torch.from_numpy(bias_ih_layer_i)
        rnn.all_weights[i][3].data = torch.from_numpy(bias_hh_layer_i)
As a detailed answer is already provided, I just want to add one more point. The parameters of an nn.Module are Tensors (previously they were autograd Variables, which are deprecated since PyTorch 0.4). So, essentially, you need to use torch.from_numpy() to convert each NumPy array to a Tensor and then use it to initialize the nn.Module parameter.
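One caveat worth knowing: torch.from_numpy() shares memory with the source array, so later in-place edits to the NumPy array will silently change the parameter too. If that matters, copy the data; a minimal sketch, reusing the names from the solutions above:
w = rng.randn(hidden_size, hidden_size).astype(np.float32)
rnn.weight_hh_l0.data = torch.from_numpy(w).clone()  # clone() detaches the parameter from the NumPy buffer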

Gradient of the layers of a loaded neural network in Chainer

I am loading a pre-trained model in Chainer:
net=chainer.links.VGG16Layers(pretrained_model='auto')
Then, I make a forward pass with some data and add a loss layer:
acts = net.predict([image]).array
loss=chainer.Variable(np.array(np.sum(np.square(acts-one_hot))))
Now the question is, how can I make a backward pass and get the gradients of the different layers?
The typical backward method does not work.
If you want to get the .grad of the input image, you have to wrap the input in a chainer.Variable.
However, VGG16Layers.extract() does not accept Variable inputs, so in this case you should call .forward() or its wrapper __call__() instead.
import chainer
from chainer import Variable
from chainer import functions as F
from cv2 import imread
from chainer.links.model.vision import vgg
net = vgg.VGG16Layers(pretrained_model='auto')
# convert a raw image (np.ndarray, dtype=uint8) to a batch of Variable (dtype=float32)
img = imread("path/to/image")
img = Variable(vgg.prepare(img))
img = img.reshape((1,) + img.shape)  # (channel, width, height) -> (batch, channel, width, height)
# just call VGG16Layers.forward, which is wrapped by __call__()
prob = net(img)['prob']
intermediate = F.square(prob)
loss = F.sum(intermediate)
# calculate grad; chainer.grad returns a list of Variable, one per requested input
img_grad = chainer.grad([loss], [img])[0]
print(img_grad.array)  # some ndarray
Point 1.
DO NOT call VGGLayers.predict(), which is not for backprop computation.
DO use VGGLayers.extract() instead.
Point 2.
DO NOT apply np.square() and np.sum() directly to chainer.Variable.
DO use F.square() and F.sum() instead for chainer.Variable.
Point 3.
Use loss.backward() to obtain .grad for learnable parameters. (pattern 1)
Use loss.backward(retain_grad=True) to obtain .grad for all variables. (pattern 2)
Use chainer.grad() to obtain .grad for a specific variable. (pattern 3)
Code:
import chainer
from chainer import functions as F, links as L
from cv2 import imread
net = L.VGG16Layers(pretrained_model='auto')
img = imread("/path/to/img")
prob = net.extract([img], layers=['prob'])['prob'] # NOT predict, which overrides chainer.config['enable_backprop'] as False
intermediate = F.square(prob)
loss = F.sum(intermediate)
# pattern 1:
loss.backward()
print(net.fc8.W.grad) # some ndarray
print(intermediate.grad) # None
###########################################
net.cleargrads()
intermediate.grad = None
prob.grad = None
###########################################
# pattern 2:
loss.backward(retain_grad=True)
print(net.fc8.W.grad) # some ndarray
print(intermediate.grad) # some ndarray
###########################################
net.cleargrads()
intermediate.grad = None
prob.grad = None
###########################################
# pattern 3:
print(chainer.grad([loss], [net.fc8.W]))  # a list containing one Variable
print(intermediate.grad)  # None

Keras VGG extract features

I have loaded a pre-trained VGG face CNN and have run it successfully. I want to extract the hyper-column average from layers 3 and 8. I was following the section about extracting hyper-columns from here. However, since the get_output function was not working, I had to make a few changes:
Imports:
import matplotlib.pyplot as plt
import theano
from scipy import misc
import scipy as sp
from PIL import Image
import PIL.ImageOps
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import numpy as np
from keras import backend as K
Main function:
#after necessary processing of input to get im
layers_extract = [3, 8]
hc = extract_hypercolumn(model, layers_extract, im)
ave = np.average(hc.transpose(1, 2, 0), axis=2)
print(ave.shape)
plt.imshow(ave)
plt.show()
Get features function:(I followed this)
def get_features(model, layer, X_batch):
    get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[layer].output,])
    features = get_features([X_batch, 0])
    return features
Hyper-column extraction:
def extract_hypercolumn(model, layer_indexes, instance):
    layers = [K.function([model.layers[0].input], [model.layers[li].output])([instance])[0] for li in layer_indexes]
    feature_maps = get_features(model, layers, instance)
    hypercolumns = []
    for convmap in feature_maps:
        for fmap in convmap[0]:
            upscaled = sp.misc.imresize(fmap, size=(224, 224), mode="F", interp='bilinear')
            hypercolumns.append(upscaled)
    return np.asarray(hypercolumns)
However, when I run the code, I'm getting the following error:
get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[layer].output,])
TypeError: list indices must be integers, not list
How can I fix this?
NOTE:
In the hyper-column extraction function, when I use feature_maps = get_features(model,1,instance) or any integer in place of 1, it works fine. But I want to extract the average from layers 3 to 8.
What confused me:
After layers = [K.function([model.layers[0].input], [model.layers[li].output])([instance])[0] for li in layer_indexes], layers is a list of extracted features.
You then pass that list into feature_maps = get_features(model, layers, instance).
In def get_features(model, layer, X_batch):, the second parameter, layer, is used as an index in model.layers[layer].output, and a list cannot be used to index a list; hence the TypeError.
What you want is:
feature_maps = get_features(model, layer_indexes, instance): pass the layer indices rather than the extracted features.
get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[l].output for l in layer]): build the list of outputs from each index, as shown in the corrected sketch below.
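Putting those two fixes together, a corrected get_features might look like this (a sketch under the same assumptions as the code above, with layers now a list of layer indices):
def get_features(model, layers, X_batch):
    # one backend function returning the outputs of all requested layers
    get_outputs = K.function([model.layers[0].input, K.learning_phase()],
                             [model.layers[l].output for l in layers])
    features = get_outputs([X_batch, 0])  # 0 selects the test phase
    return features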
Still, your feature-extraction function is poorly structured; I suggest you rewrite everything rather than mixing code from different sources. I rewrote your function for a single-channel input image (W x H x 1). Maybe it will be helpful.
def extract_hypercolumn(model, layer_indexes, instance, img_width=224, img_height=224):
    # img_width/img_height were undefined in the original, so they are parameters here
    test_image = instance
    outputs = [layer.output for layer in model.layers]  # all layer outputs
    comp_graph = [K.function([model.input] + [K.learning_phase()], [output]) for output in outputs]  # evaluation functions
    # evaluate every layer once in test phase, then pick out the requested ones
    layer_outputs_list = [func([test_image, 0]) for func in comp_graph]
    feature_maps = []
    for layerIdx in layer_indexes:
        feature_maps.append(layer_outputs_list[layerIdx][0][0])
    hypercolumns = []
    for idx, convmap in enumerate(feature_maps):
        vv = np.asarray(convmap)
        print('shape of feature map at layer ', layer_indexes[idx], ' is: ', vv.shape)
        for i in range(vv.shape[-1]):
            fmap = vv[:, :, i]
            upscaled = sp.misc.imresize(fmap, size=(img_width, img_height),
                                        mode="F", interp='bilinear')
            hypercolumns.append(upscaled)
    return np.asarray(hypercolumns)
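A hypothetical usage, matching the (W x H x 1) single-channel assumption above, where im stands for a preprocessed batch of shape (1, W, H, 1):
hc = extract_hypercolumn(model, [3, 8], im, img_width=224, img_height=224)
ave = np.average(hc.transpose(1, 2, 0), axis=2)
print(ave.shape)  # (224, 224)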
