I am loading a pre-trained model in Chainer:
net = chainer.links.VGG16Layers(pretrained_model='auto')
Then, I make a forward pass with some data and add a loss layer:
acts = net.predict([image]).array
loss = chainer.Variable(np.array(np.sum(np.square(acts - one_hot))))
Now the question is, how can I make a backward pass and get the gradients of the different layers?
The typical backward method does not work.
If you want the .grad of the input image, you have to wrap the input in a chainer.Variable.
However, VGG16Layers.extract() does not accept a Variable as input, so in this case you should call .forward() or its wrapper __call__() instead.
import chainer
from chainer import Variable
from chainer import functions as F
from cv2 import imread
from chainer.links.model.vision import vgg
net = vgg.VGG16Layers(pretrained_model='auto')
# convert raw image (np.ndarray, dtype=uint8) to a batch of Variable(dtype=float32)
img = imread("path/to/image")
img = Variable(vgg.prepare(img))
img = img.reshape((1,) + img.shape)  # (channel, height, width) -> (batch, channel, height, width)
# just call VGG16Layers.forward, which is wrapped by __call__()
prob = net(img)['prob']
intermediate = F.square(prob)
loss = F.sum(intermediate)
# calculate grad
img_grad = chainer.grad([loss], [img])[0]  # chainer.grad returns a list of Variables
print(img_grad.array)  # some ndarray
Point 1.
DO NOT call VGGLayers.predict(), which is not for backprop computation.
DO use VGGLayers.extract() instead.
Point 2.
DO NOT apply np.square() and np.sum() directly to chainer.Variable.
DO use F.square() and F.sum() instead for chainer.Variable.
Point 3.
Use loss.backward() to obtain .grad for learnable parameters. (pattern 1)
Use loss.backward(retain_grad=True) to obtain .grad for all variables. (pattern 2)
Use chainer.grad() to obtain .grad for a specific variable. (pattern 3)
Code:
import chainer
from chainer import functions as F, links as L
from cv2 import imread
net = L.VGG16Layers(pretrained_model='auto')
img = imread("/path/to/img")
prob = net.extract([img], layers=['prob'])['prob']  # NOT predict(), which runs under no_backprop_mode (chainer.config.enable_backprop = False)
intermediate = F.square(prob)
loss = F.sum(intermediate)
# pattern 1:
loss.backward()
print(net.fc8.W.grad) # some ndarray
print(intermediate.grad) # None
###########################################
net.cleargrads()
intermediate.grad = None
prob.grad = None
###########################################
# pattern 2:
loss.backward(retain_grad=True)
print(net.fc8.W.grad) # some ndarray
print(intermediate.grad) # some ndarray
###########################################
net.cleargrads()
intermediate.grad = None
prob.grad = None
###########################################
# pattern 3:
print(chainer.grad([loss], [net.fc8.W])[0].array)  # some ndarray
print(intermediate.grad) # None
Related
I am trying to create custom weights in conv1d as follows:
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.Tensor([[[0.5, 0.5]]])
with torch.no_grad():
    conv.weight = K
But I am getting the error
"File “D:\ProgramData\Miniconda3\envs\pytorchcuda102\lib\site-packages\torch\nn\modules\module.py”, line 611, in setattr
raise TypeError(“cannot assign ‘{}’ as parameter ‘{}’ "
TypeError: cannot assign ‘torch.FloatTensor’ as parameter ‘weight’ (torch.nn.Parameter or None expected)”
What am I doing wrong?
You were close. Note that you do not need the with torch.no_grad() block, since no gradient is computed during the weight assignment.
All you need to do is remove it and assign to conv.weight.data instead of conv.weight, so that you access the underlying tensor of the parameter.
See the fixed code below:
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.Tensor([[[0.5, 0.5]]])
conv.weight.data = K
As per the discussion here, update your code to use torch.nn.Parameter(), which makes the weight recognizable as a parameter by the optimizer.
import torch
from torch import nn
conv = nn.Conv1d(1,1,kernel_size=2)
K = torch.tensor([[[0.5, 0.5]]])  # one-dimensional kernel, matching your conv layer
conv.weight = nn.Parameter(K)  # wrap the tensor in nn.Parameter
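As a quick sanity check (a minimal sketch continuing the snippet above; the ones input is just illustrative), you can confirm that the kernel is now registered and applied:
print(dict(conv.named_parameters())['weight'])  # the custom kernel, now visible to optimizers
x = torch.ones(1, 1, 3)
print(conv(x))  # each output element is 0.5 + 0.5 plus the layer's bias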
For bigger and more complex models, you can look at this toy example, which uses PyTorch's state_dict() method.
import torch
import torch.nn as nn
import torchvision
net = torchvision.models.resnet18(pretrained=True)
pretrained_dict = net.state_dict()
conv_weights = pretrained_dict['conv1.weight']  # shape: (64, 3, 7, 7)
new = torch.tensor((), dtype=torch.float32)  # float32 to match the parameter dtype
new = new.new_ones(conv_weights.shape)  # assigning all ones
pretrained_dict['conv1.weight'] = new
net.load_state_dict(pretrained_dict)
param = list(net.parameters())
print(param[0])
I read an image from a file and call the predict method of the Keras Inception v3 model, and I get two different results from the same input.
from keras.applications.inception_v3 import InceptionV3, decode_predictions
from keras.preprocessing import image
import numpy as np
def model():
    model = InceptionV3(weights='imagenet')
    def predict(x):
        x *= 2
        x -= 1
        return model.predict(np.array([x]))[0]
    return predict
img = image.load_img("2.jpg", target_size=(299, 299))
img = image.img_to_array(img)
img /= 255.
p = model()
print('Predicted:', decode_predictions(np.array([p(img)]), top=3)[0])
print('Predicted:', decode_predictions(np.array([p(img)]), top=3)[0])
The output is
Predicted: [('n01443537', 'goldfish', 0.98162466), ('n02701002', 'ambulance', 0.0010537759), ('n01440764', 'tench', 0.00027527584)]
Predicted: [('n02606052', 'rock_beauty', 0.69015616), ('n01990800', 'isopod', 0.039278224), ('n01443537', 'goldfish', 0.03365362)]
where the first result is correct.
You are modifying your input (img) inside the predict function, not just locally as you might expect. That modified input is then used in the next predict call, where it is modified again. So you are effectively applying the modifications once in your first call to predict, but twice in the second call.
You can find more details about that behavior in this question.
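For illustration, a minimal fix (assuming the same closure over model as in the question) is to work on a copy so the caller's array stays untouched:
def predict(x):
    x = x.copy()  # do not mutate the caller's img
    x *= 2
    x -= 1
    return model.predict(np.array([x]))[0]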
I have an autoencoder and I need to add a Gaussian noise layer after my output. I need a custom layer to do this, but I really do not know how to write one; it needs to operate on tensors.
What should I do if I want to implement the above equation in the call part of the following code?
class SaltAndPepper(Layer):

    def __init__(self, ratio, **kwargs):
        super(SaltAndPepper, self).__init__(**kwargs)
        self.supports_masking = True
        self.ratio = ratio

    # the definition of the call method of the custom layer
    def call(self, inputs, training=None):
        def noised():
            shp = K.shape(inputs)[1:]
            **what should I put here????**
            return out
        return K.in_train_phase(noised(), inputs, training=training)

    def get_config(self):
        config = {'ratio': self.ratio}
        base_config = super(SaltAndPepper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
I also tried to implement it using a Lambda layer, but it does not work.
If you are looking for additive or multiplicative Gaussian noise, then they have already been implemented as layers in Keras: GaussianNoise (additive) and GaussianDropout (multiplicative).
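For example, a minimal sketch of the additive case (the layer sizes here are just illustrative):
from keras.models import Sequential
from keras.layers import Dense, GaussianNoise

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(GaussianNoise(0.1))  # adds zero-mean Gaussian noise with stddev 0.1, active only during training
model.add(Dense(1))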
However, if you are specifically looking for the blurring effect of Gaussian blur filters in image processing, then you can simply use a depth-wise convolution layer (to apply the filter on each input channel independently) with fixed weights to get the desired output. Note that you need to generate the weights of the Gaussian kernel and set them as the weights of the DepthwiseConv2D layer; for that you can use the function introduced in this answer:
import numpy as np
from keras.layers import DepthwiseConv2D
kernel_size = 3 # set the filter size of Gaussian filter
kernel_weights = ... # compute the weights of the filter with the given size (and additional params)
# assuming that the shape of `kernel_weights` is `(kernel_size, kernel_size)`
# we need to modify it to make it compatible with the number of input channels
in_channels = 3 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
# define your model...
# somewhere in your model you want to apply the Gaussian blur,
# so define a DepthwiseConv2D layer and set its weights to kernel weights
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')
g_layer_out = g_layer(the_input_tensor_for_this_layer) # apply it on the input Tensor of this layer
# the rest of the model definition...
# do this BEFORE calling `compile` method of the model
g_layer.set_weights([kernel_weights])
g_layer.trainable = False # the weights should not change during training
# compile the model and start training...
After a while trying to figure out how to do this with the code #today has provided, I have decided to share my final code with anyone who may need it in the future. I created a very simple model that only applies the blurring to the input data:
import numpy as np
from keras.layers import DepthwiseConv2D
from keras.layers import Input
from keras.models import Model
def gauss2D(shape=(3, 3), sigma=0.5):
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m+1, -n:n+1]
    h = np.exp(-(x*x + y*y) / (2. * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    sumh = h.sum()
    if sumh != 0:
        h /= sumh
    return h

def gaussFilter():
    kernel_size = 3
    kernel_weights = gauss2D(shape=(kernel_size, kernel_size))
    in_channels = 1  # the number of input channels
    kernel_weights = np.expand_dims(kernel_weights, axis=-1)
    kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1)  # apply the same filter on all the input channels
    kernel_weights = np.expand_dims(kernel_weights, axis=-1)  # for shape compatibility reasons
    inp = Input(shape=(3, 3, 1))
    g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')(inp)
    model_network = Model(inputs=inp, outputs=g_layer)  # use the inputs/outputs keywords (input/output are removed in newer Keras)
    model_network.layers[1].set_weights([kernel_weights])
    model_network.trainable = False  # can be applied to a given layer only as well
    return model_network
a = np.array([[[1, 2, 3], [4, 5, 6], [4, 5, 6]]])
filt = gaussFilter()
print(a.reshape((1,3,3,1)))
print(filt.predict(a.reshape(1,3,3,1)))
For testing purposes the data are only of shape (1, 3, 3, 1). The function gaussFilter() creates a very simple model with only an input and one convolution layer that provides Gaussian blurring with the weights defined in gauss2D(). You can add parameters to the function to make it more dynamic, e.g. shape, kernel size, channels. According to my findings, the weights can be applied only after the layer has been added to the model.
As for the error AttributeError: 'float' object has no attribute 'dtype': just change K.sqrt to math.sqrt and it will work.
I am trying to build a deep network using Theano, but the accuracy is zero and I cannot figure out my mistake. The network has 3 hidden layers and one output layer. I am trying to do a classification task with 5 classes; therefore, the output layer has 5 nodes.
Any suggestions?
#!/usr/bin/env python
from __future__ import print_function
import theano
import theano.tensor as T
import lasagne
import numpy as np
import sklearn.datasets
import os
import csv
import pandas as pd
# Lasagne is pre-release, so its interface is changing.
# Whenever there's a backwards-incompatible change, a warning is raised.
# Let's ignore these for the course of the tutorial
import warnings
warnings.filterwarnings('ignore', module='lasagne')
from lasagne.objectives import categorical_crossentropy, aggregate
#load the data and prepare it
df = pd.read_excel('risk_sample_data_9.20.16_anon.xls',skiprows=0)
rawdata = df.values
# remove empty rows (odd rows)
mask = np.ones(len(rawdata), dtype=bool)
mask[::2] = False
data = rawdata[mask]
idx = np.array([1,5,6,7])
m = np.zeros_like(data)
m[:,idx] = 1
X = np.ma.masked_array(data,m)
X = np.ma.filled(X, fill_value=0)
X = X.astype(theano.config.floatX)
y = data[:,7] # extract financial rating labels
# convert char lables into int , A=1 , B=2, C=3, D=4, F=5
y[y == 'A'] = 1
y[y == 'B'] = 2
y[y == 'C'] = 3
y[y == 'D'] = 4
y[y == 'F'] = 5
y = pd.to_numeric(y)
y = y.astype('int32')
#y = y.astype(theano.config.floatX)
N_CLASSES = 5
# First, construct an input layer.
# The shape parameter defines the expected input shape,
# which is just the shape of our data matrix data.
l_in = lasagne.layers.InputLayer(shape=X.shape)
# We'll create a network with two dense layers:
# A tanh hidden layer and a softmax output layer.
l_hidden1 = lasagne.layers.DenseLayer(
    # The first argument is the input layer
    l_in,
    # This defines the layer's output dimensionality
    num_units=250,
    # Various nonlinearities are available
    nonlinearity=lasagne.nonlinearities.rectify)
l_hidden2 = lasagne.layers.DenseLayer(
    # The first argument is the input layer
    l_hidden1,
    # This defines the layer's output dimensionality
    num_units=100,
    # Various nonlinearities are available
    nonlinearity=lasagne.nonlinearities.rectify)
l_hidden3 = lasagne.layers.DenseLayer(
    # The first argument is the input layer
    l_hidden2,
    # This defines the layer's output dimensionality
    num_units=50,
    # Various nonlinearities are available
    nonlinearity=lasagne.nonlinearities.rectify)
l_hidden4 = lasagne.layers.DenseLayer(
    # The first argument is the input layer
    l_hidden3,
    # This defines the layer's output dimensionality
    num_units=10,
    # Various nonlinearities are available
    nonlinearity=lasagne.nonlinearities.sigmoid)
# For our output layer, we'll use a dense layer with a softmax nonlinearity.
l_output = lasagne.layers.DenseLayer(
    l_hidden4, num_units=N_CLASSES, nonlinearity=lasagne.nonlinearities.softmax)
net_output = lasagne.layers.get_output(l_output)
# As a loss function, we'll use Lasagne's categorical_crossentropy function.
# This allows for the network output to be class probabilities,
# but the target output to be class labels.
true_output = T.ivector('true_output')
# get_loss computes a Theano expression for the objective,
# given a target variable
# By default, it will use the network's InputLayer input_var,
# which is what we want.
#loss = objective.get_loss(target=true_output)
loss = lasagne.objectives.categorical_crossentropy(net_output, true_output)
loss = aggregate(loss, mode='mean')
# Retrieving all parameters of the network is done using get_all_params,
# which recursively collects the parameters of all layers
# connected to the provided layer.
all_params = lasagne.layers.get_all_params(l_output)
# Now, we'll generate updates using Lasagne's SGD function
updates = lasagne.updates.sgd(loss, all_params, learning_rate=1)
# Finally, we can compile Theano functions for training and
# computing the output.
# Note that because loss depends on the input variable of our input layer,
# we need to retrieve it and tell Theano to use it.
train = theano.function([l_in.input_var, true_output], loss, updates=updates)
get_output = theano.function([l_in.input_var], net_output)
def eq(x, y):
    if x == y:
        return 1
    return 0
print("Training ...")
# Train for 10 epochs
for n in xrange(10):
    train(X, y)
    y_predicted = np.argmax(get_output(X), axis=1)
    correct = reduce(lambda a, b: a + b, map(eq, y_predicted, y))
    print("Iteration {} correct prediction {}".format(n, correct))
# Compute the predicted label of the training data.
# The argmax converts the class probability output to class label
y_predicted = np.argmax(get_output(X), axis=1)
print(y_predicted)
The learning rate seems way too high. Try a lower learning rate first; it might be that your model diverges on the task. It is hard to tell without being able to try it on your data.
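For example (a sketch; 0.01 and 0.001 are common starting points, not tuned values):
# a much smaller step size for plain SGD
updates = lasagne.updates.sgd(loss, all_params, learning_rate=0.01)
# or an adaptive rule, which is usually less sensitive to the exact value
updates = lasagne.updates.adam(loss, all_params, learning_rate=0.001)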
I have loaded a pre-trained VGG face CNN and have run it successfully. I want to extract the hyper-column average from layers 3 and 8. I was following the section about extracting hyper-columns from here. However, since the get_output function was not working, I had to make a few changes:
Imports:
import matplotlib.pyplot as plt
import theano
from scipy import misc
import scipy as sp
from PIL import Image
import PIL.ImageOps
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import numpy as np
from keras import backend as K
Main function:
#after necessary processing of input to get im
layers_extract = [3, 8]
hc = extract_hypercolumn(model, layers_extract, im)
ave = np.average(hc.transpose(1, 2, 0), axis=2)
print(ave.shape)
plt.imshow(ave)
plt.show()
Get features function: (I followed this)
def get_features(model, layer, X_batch):
    get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[layer].output,])
    features = get_features([X_batch, 0])
    return features
Hyper-column extraction:
def extract_hypercolumn(model, layer_indexes, instance):
    layers = [K.function([model.layers[0].input], [model.layers[li].output])([instance])[0] for li in layer_indexes]
    feature_maps = get_features(model, layers, instance)
    hypercolumns = []
    for convmap in feature_maps:
        for fmap in convmap[0]:
            upscaled = sp.misc.imresize(fmap, size=(224, 224), mode="F", interp='bilinear')
            hypercolumns.append(upscaled)
    return np.asarray(hypercolumns)
However, when I run the code, I'm getting the following error:
get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[layer].output,])
TypeError: list indices must be integers, not list
How can I fix this?
NOTE:
In the hyper-column extraction function, when I use feature_maps = get_features(model,1,instance) or any integer in place of 1, it works fine. But I want to extract the average from layers 3 to 8.
This confused me a lot:
After layers = [K.function([model.layers[0].input],[model.layers[li].output])([instance])[0] for li in layer_indexes], layers is a list of extracted features.
You then send that list into feature_maps = get_features(model, layers, instance).
In def get_features(model, layer, X_batch):, the second parameter, layer, is used to index into model.layers[layer].output.
What you want is:
feature_maps = get_features(model, layer_indexes, instance): pass the layer indices rather than the extracted features.
get_features = K.function([model.layers[0].input, K.learning_phase()], [model.layers[l].output for l in layer]): a list cannot be used to index another list.
Still, your feature-extraction function is poorly written. I suggest you rewrite everything rather than mixing code snippets.
I rewrote your function for a single-channel input image (W x H x 1). Maybe it will be helpful.
def extract_hypercolumn(model, layer_indexes, instance, img_width=224, img_height=224):
    test_image = instance
    outputs = [layer.output for layer in model.layers]  # all layer outputs
    comp_graph = [K.function([model.input] + [K.learning_phase()], [output]) for output in outputs]  # evaluation functions
    # evaluate every layer once in test phase (learning_phase = 0)
    layer_outputs_list = [op([test_image, 0]) for op in comp_graph]
    feature_maps = []
    for layerIdx in layer_indexes:
        feature_maps.append(layer_outputs_list[layerIdx][0][0])
    hypercolumns = []
    for idx, convmap in enumerate(feature_maps):
        vv = np.asarray(convmap)
        print('shape of feature map at layer ', layer_indexes[idx], ' is: ', vv.shape)
        for i in range(vv.shape[-1]):
            fmap = vv[:, :, i]  # one channel of the feature map
            upscaled = sp.misc.imresize(fmap, size=(img_width, img_height),
                                        mode="F", interp='bilinear')
            hypercolumns.append(upscaled)
    return np.asarray(hypercolumns)
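A minimal usage sketch (model and im are assumed to be the loaded network and the preprocessed input batch from the question; 224 x 224 matches the default target size above):
layers_extract = [3, 8]
hc = extract_hypercolumn(model, layers_extract, im)
ave = np.average(hc.transpose(1, 2, 0), axis=2)  # average over the stacked feature maps
print(ave.shape)  # (224, 224) with the default target size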