How can I prune the weights of a CNN (convolutional neural network) model that are below a threshold value (say, prune all weights that are <= 1)?
How can we achieve that for a weight file saved in .pth format in PyTorch?
Since version 1.4.0, PyTorch provides model pruning out of the box; see the official tutorial.
As there is currently no threshold-based pruning method in PyTorch, you have to implement it yourself, though it's quite easy once you get the overall idea.
Threshold Pruning method
Below is the code performing the pruning:
import torch
from torch.nn.utils import prune


class ThresholdPruning(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_mask(self, tensor, default_mask):
        return torch.abs(tensor) > self.threshold
Explanation:
PRUNING_TYPE can be one of global, structured, unstructured. global acts across the whole module (e.g. remove the 20% of weights with the smallest values), structured acts on whole channels/modules. We need unstructured, as we would like to modify each individual connection in a specific parameter tensor (say weight or bias)
__init__ - pass in whatever you want or need to make it work; nothing special here
compute_mask - computes the mask used to prune the specific tensor. In our case, all parameters whose magnitude is below the threshold should be zeroed. I did it with the absolute value as it makes more sense. default_mask is not needed here, but is kept as a named parameter because that's what the API currently requires.
Moreover, inheriting from prune.BasePruningMethod gives you methods to apply the mask to each parameter, make the pruning permanent, etc. See the base class docs for more info.
Example module
Nothing too fancy, you can put anything you want here:
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Linear(50, 30)
        self.second = torch.nn.Linear(30, 10)

    def forward(self, inputs):
        return self.second(torch.relu(self.first(inputs)))


module = MyModule()
You can also load your module via module = torch.load('checkpoint.pth') if you need to; it doesn't matter for what follows.
Prune module's parameters
We need to define which parameters of our module (and whether it's the weight or the bias) should be pruned, like this:
parameters_to_prune = ((module.first, "weight"), (module.second, "weight"))
Now we can apply our unstructured pruning globally to all of the defined parameters (threshold is passed as a kwarg to __init__ of ThresholdPruning):
prune.global_unstructured(
    parameters_to_prune, pruning_method=ThresholdPruning, threshold=0.1
)
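As an aside, if you only want to prune a single tensor instead of pruning globally, the same custom method can be applied directly through its apply classmethod (an alternative to the call above, not something to run in addition; just a sketch):

# Alternative to global_unstructured: prune a single parameter with the custom method
ThresholdPruning.apply(module.first, "weight", threshold=0.1)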
Results
weight attribute
To see the effect, check the weights of the first submodule simply with:
print(module.first.weight)
It is the weight with our pruning technique applied, but notice it's not a torch.nn.Parameter anymore! It is now simply an attribute of our module, so it is no longer a registered parameter that the optimizer sees directly.
weight_mask
We can check the created mask via module.first.weight_mask to verify everything was done correctly (it will be binary in this case).
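A quick optional sanity check (just a sketch) is to compute how many entries the mask actually zeroed out:

sparsity = float(torch.sum(module.first.weight == 0)) / module.first.weight.nelement()
print(f"Sparsity of first.weight: {sparsity:.2%}")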
weight_orig
Applying pruning creates a new torch.nn.Parameter holding the original weights, named name + "_orig" (in this case weight_orig); let's see:
print(module.first.weight_orig)
This parameter is the one that will actually be used (and trained) under the hood. After applying pruning via the methods described above, forward_pre_hooks are added which recompute the pruned weight from weight_orig and weight_mask before each forward call.
Thanks to this approach, you can define and apply your pruning at any point of training or inference without "destroying" the original weights.
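If you are curious, the hook itself can be inspected on the module; this is an implementation detail, shown here only for illustration:

# The pruning method instance is registered as a forward pre-hook on the module
for hook in module.first._forward_pre_hooks.values():
    print(hook)  # our ThresholdPruning (or a prune.PruningContainer wrapping it)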
Applying pruning permanently
If you wish to make the pruning permanent, simply issue:
prune.remove(module.first, "weight")
And now our module.first.weight is once again a parameter, with its entries appropriately pruned; module.first.weight_mask is removed, and so is module.first.weight_orig. This is probably what you are after.
You can iterate over children to make it permanent:
for child in module.children():
    prune.remove(child, "weight")
You could define parameters_to_prune using the same logic:
parameters_to_prune = [(child, "weight") for child in module.children()]
Or, if you want only convolutional layers to be pruned (or anything else, really):
parameters_to_prune = [
    (child, "weight")
    for child in module.children()
    if isinstance(child, torch.nn.Conv2d)
]
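Note that children() only yields direct submodules; for a CNN with nested blocks you would probably want to walk the whole tree with modules() instead (a sketch):

parameters_to_prune = [
    (m, "weight")
    for m in module.modules()
    if isinstance(m, torch.nn.Conv2d)
]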
Advantages
uses the "PyTorch way" of pruning, so it's easier to communicate your intent to other programmers
defines pruning on a per-tensor basis (single responsibility) instead of going through everything at once
confines pruning to predefined, well-understood ways
pruning is not permanent, hence you can recover from it if needed. The module can be saved with the pruning masks and original weights, so it leaves you some room to revert a possible mistake (e.g. the threshold was too high and now all your weights are zero, rendering the results meaningless)
keeps the original weights around during forward calls until you decide to finally switch to the pruned version (a simple call to remove)
Disadvantages
IMO the pruning API could be clearer
you can do it in a shorter way (as shown in the answer provided by Shai below)
might be confusing for those who don't know that such a thing is "defined" by PyTorch (still, there are tutorials and docs, so I don't think it's a major problem)
You can work directly on the values saved in the state_dict:
import torch

thr = 1.0  # pruning threshold (the question prunes weights <= 1)

sd = torch.load('saved_weights.pth')  # load the state dict
for k in sd.keys():
    if 'weight' not in k:
        continue  # skip biases and other saved parameters
    w = sd[k]
    sd[k] = w * (w > thr)  # set to zero weights smaller than thr
torch.save(sd, 'pruned_weights.pth')
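To actually use the pruned weights afterwards, you would load them back into a model of the same architecture; here MyCNN is just a placeholder for your own model class:

model = MyCNN()  # hypothetical model class matching the saved weights
model.load_state_dict(torch.load('pruned_weights.pth'))
model.eval()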
Related
I'm trying to add a scalar parameter to my model (the code is too complex to attach), but it is effectively like this:
class WholeModel:
    def __init__(...):
        self.new_parameter = Parameter(torch.scalar_tensor(0.1, requires_grad=True))
        self.model = self.make_model()

    def make_model(self):
        d = distribution()  # returns a Distribution, which is a Module
        d = transform_distribution(d, self.new_parameter)
        d.register_parameter(name='new', param=self.new_parameter)
        return d
However, I run into this error RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
If I change self.new_parameter = Parameter(torch.scalar_tensor(0.1)) to self.new_parameter = torch.scalar_tensor(0.1) and remove the register_parameter, then it compiles and runs (but then obviously it's not learning the parameter).
I've also tried using a tensor rather than a scalar_tensor but this also doesn't work. The error occurs with/without requires_grad.
Any ideas? It really is just a simple addition to a blackbox model.
Thanks
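For reference, the standard way to attach a learnable scalar to an nn.Module looks like the minimal sketch below (hypothetical names; whether this resolves the backward error above depends on the rest of the model):

import torch
from torch import nn

class Scaled(nn.Module):
    # Minimal sketch: a learnable scalar attached to a module
    def __init__(self):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it automatically;
        # no explicit register_parameter call is needed.
        self.new_parameter = nn.Parameter(torch.tensor(0.1))

    def forward(self, x):
        # Use the parameter inside forward, so the graph involving it is rebuilt
        # on every call rather than being created once and reused.
        return x * self.new_parameter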
I'm trying to implement an INN (invertible neural network) with the structure as described in this paper.
I was wondering if it is possible to create a block (as proposed in the paper) as a custom Keras layer with two different call functions.
The basic structure would look as follows:
import tensorflow as tf
import tensorflow.keras.layers as layers


class INNBlock(tf.keras.Model):
    # inheriting from Model instead of keras.layers.Layer, because I want to manage
    # the underlying layers as well
    def __init__(self, size):
        super(INNBlock, self).__init__(name='innblock')
        # define layers
        self.denseL1 = layers.Dense(size, activation='relu')

    def call(self, inputs):
        # define the relationship between the layers for a forward call
        out = self.denseL1(inputs)
        return out

    def inverse_call(self, inputs):
        # define the inverse relationship between the layers
        out = -self.denseL1(inputs)  # use the same weights as the forward call
        return out


class INN(tf.keras.Model):
    def __init__(self, size, input_dim, min_clip, max_clip):
        super(INN, self).__init__()
        self.block_1 = INNBlock(size)
        self.block_2 = INNBlock(size)

    def call(self, inputs):
        y = self.block_1(inputs)
        x = self.block_2.inverse_call(y)
        x = self.block_1.inverse_call(x)
        return (y, x)
Solutions I have already thought of (but don't particularly like):
Creating new layers for the inverse call and giving them the same weights as the layers in the forward call.
Adding another dimension to the inputs and having a variable in there that determines whether the inverse call or the forward call is to be executed (but I don't know if this would even be allowed by Keras).
I hope someone knows, if there is a way to implement this.
Thank you in advance :)
There is nothing wrong with your code. You can try it and it will run normally.
The call method is the standard method for when you simply do model_instance(input_tensor) or layer_instance(input_tensor).
But there is nothing wrong if you define another method and use that method inside the model's call method. What will happen is just:
If you use the_block(input_tensor), it will use the_block.call(input_tensor).
If you use the_block.inverse_call(input_tensor) somewhere outside a layer/model, it will fail to build a Keras model (nothing can be outside a layer)
If you use the_block.inverse_call(input_tensor) inside a layer/model (that's what you're doing), it is exactly the same as just writing the operations directly. You just wrapped it inside another function.
For Keras/TensorFlow, there is nothing special about inverse_call. You can use it anywhere you could use any other Keras/TensorFlow function.
Will the gradients be updated twice?
Not exactly twice, but the operation will certainly be counted in. When the system calculates the gradient of the loss with respect to the weights, if the loss was built with inverse_call along the way, then it will participate in the gradient calculation.
But the update will be once per batch, as usual.
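For instance, a quick eager-mode check (a sketch, assuming TF 2.x and the INNBlock defined above) shows that both methods reuse the same Dense weights:

block = INNBlock(size=4)
x = tf.random.normal((2, 4))
y_fwd = block(x)               # goes through call()
y_inv = block.inverse_call(x)  # same denseL1 weights, different computation
print(len(block.trainable_variables))  # prints 2: only one kernel and one bias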
I have a custom Keras layer that uses a fixed weight matrix. I wonder how I should handle this fixed weight matrix using the Keras API with TensorFlow. In particular, why would I use K.constant when self.add_weight(trainable=False) offers more flexibility (for instance, I can use Layer.set_weights with the latter)?
Concretely, in the build method I can either do:
class CustomLayer(Layer):
    ...
    def build(self, input_shape):
        self.fixed_tensor = K.constant(self.my_fixed_tensor)
        self.built = True
or
class CustomLayer(Layer):
    ...
    def build(self, input_shape):
        self.fixed_tensor = self.add_weight(
            shape=self.my_fixed_tensor.shape,
            initializer=lambda shape, dtype: self.my_fixed_tensor,
            trainable=False
        )
        self.built = True
Both solutions work; I wonder if they are handled differently in the backend.
K.constant is simply the Keras analogue of tf.constant; it just creates a constant-valued tensor. It is a lower-level construct and, as you say, it is useful only for values that will never change. Most of the time it is not necessary to call it explicitly, as doing something like 2 * my_tensor will automatically convert the 2 into a constant tensor with the right type for you. However, in some cases you may prefer to call it explicitly, for example if you have an array of constant values and want a single tensor representing them (instead of repeatedly converting them into new constant tensors).
add_weight is a method of layers, and it creates a TensorFlow variable representing some mutable value in the layer. Weights are a higher-level concept, related to layered models. As you point out, weights, trainable or not, can be changed dynamically.
In theory, you could have absolutely no constants in a model and replace them all with weights. However, that is generally not very practical: variables, at least in 1.x, need to be initialized, their use cannot be optimized as well as constants can, and their overhead gives you no benefit if you never change them.
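To make the practical difference concrete, here is a minimal hypothetical layer (TF 2.x Keras assumed): with add_weight(trainable=False) the matrix shows up in get_weights() and can be replaced via set_weights(), which a K.constant never allows:

import numpy as np
import tensorflow as tf

class FixedMatMul(tf.keras.layers.Layer):
    # Hypothetical layer multiplying its input by a fixed matrix
    def __init__(self, fixed, **kwargs):
        super().__init__(**kwargs)
        self.fixed = np.asarray(fixed, dtype="float32")

    def build(self, input_shape):
        # Non-trainable weight: tracked by the layer and replaceable later
        self.fixed_tensor = self.add_weight(
            name="fixed",
            shape=self.fixed.shape,
            initializer=tf.constant_initializer(self.fixed),
            trainable=False,
        )
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.fixed_tensor)

layer = FixedMatMul(np.eye(3))
_ = layer(tf.ones((2, 3)))                           # builds the layer
print(layer.get_weights())                           # the fixed matrix is visible here
layer.set_weights([2 * np.eye(3, dtype="float32")])  # and can be swapped at any time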
TL;DR: How do I create a variable that holds the confusion matrix used for computing custom metrics, accumulating the values across all of the evaluation steps?
I have implemented custom metrics to use in the tf.estimator.train_and_evaluate pipeline, with a confusion matrix as the crux for them all. My goal is to make this confusion matrix persist over multiple evaluation steps in order to track the learning progress.
Using get_variable in the variable scope did not work, since it does not save the variable to the checkpoint (or so it seems).
This does not work:
@property
def confusion_matrix(self):
    with tf.variable_scope(
        f"{self.name}-{self.metric_type}", reuse=tf.AUTO_REUSE
    ):
        confusion_matrix = tf.get_variable(
            name="confusion-matrix",
            initializer=tf.zeros(
                [self.n_classes, self.n_classes],
                dtype=tf.float32,
                name=f"{self.name}/{self.metric_type}-confusion-matrix",
            ),
            aggregation=tf.VariableAggregation.SUM,
        )
    return confusion_matrix
Just saving the matrix as a class attribute works, but it obviously does not persist over multiple steps:
self.confusion_matrix = tf.zeros(
    [self.n_classes, self.n_classes],
    dtype=tf.float32,
    name=f"{self.name}/{self.metric_type}-confusion-matrix",
)
You can look at the full example here.
I expect this confusion matrix to persist from start to finish during evaluation, but I do not need to have it in the final SavedModel. Could you please tell me how I can achieve this? Do I need to just save the matrix to an external file, or is there a better way?
You can define a custom metric:
def confusion_matrix(labels, predictions):
    matrix = ...  # confusion matrix calculation
    mean, update_op = tf.metrics.mean_tensor(matrix)
    # do something with mean if needed
    return {'confusion_matrix': (mean, update_op)}
and then add it to your estimator:
estimator = tf.estimator.add_metrics(estimator, confusion_matrix)
If you need a sum instead of a mean, you can take inspiration from the tf.metrics.mean_tensor implementation.
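A hedged sketch of how the elided matrix calculation might look, assuming integer class labels, a predictions dict with a 'class_ids' entry (as produced by canned estimators), and a known n_classes; shapes may need squeezing depending on your estimator:

def confusion_matrix(labels, predictions):
    # Per-batch confusion matrix; tf.metrics.mean_tensor accumulates it across steps
    matrix = tf.math.confusion_matrix(
        labels, predictions['class_ids'], num_classes=n_classes, dtype=tf.float32
    )
    mean, update_op = tf.metrics.mean_tensor(matrix)
    return {'confusion_matrix': (mean, update_op)}

estimator = tf.estimator.add_metrics(estimator, confusion_matrix)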
I am writing a function using TensorFlow ops. I know that when I run the function, it will add many ops to the graph. But I am confused about how to get access to these ops.
for example:
import tensorflow as tf


def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0, name='v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    return z, b
I can use feed_dict to pass a value to b only if I make b a return value; otherwise, there is no way to access b. If we want to access many ops in the function scope, we would have to return many values, which is very ugly.
I want to know what happens under the hood when I run functions using TensorFlow, and how to get access to the ops created in the function scope.
Thank you!
Obviously, it's true that to access an op (or tensor) we need some reference to it. IMHO, one standard workaround is to build your graph in a class, make certain tensors attributes of that class, and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
Additionally, there are also specialized functions that return ops by name: e.g. get_operation_by_name.
As an aside to this question, you might also want to try out eager execution, which is imperative.
Three things happen when you call an op function:
it creates a compute node and adds it to the default graph
it sets your inputs as the node's input tensors
it returns the node's output tensor
For example, with a = tf.add(b, c, name='add'):
a node with op Add and name 'add' is added to the default graph
b and c are set as the node's input tensors
the node's output tensor, named 'add:0', is assigned to a
So you can access nodes via sess.graph; there are many functions for accessing nodes, e.g. get_operation_by_name.
Also, you can inspect the graph via sess.graph_def, which is the graph serialized with protobuf; you can find the protobuf definitions in the TensorFlow source code under tensorflow/core/framework (there are several .proto files there).
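For example, under TF 1.x graph mode, the placeholder created inside assign_weights() above can be fetched by name instead of being returned (a sketch; the tensor name follows from the 'zheng' name scope and assumes it is the first unnamed placeholder in that scope):

z, _ = assign_weights()
graph = tf.get_default_graph()
b = graph.get_tensor_by_name('zheng/Placeholder:0')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(z, feed_dict={b: 3.0}))  # assigns 3.0 to v and prints it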