What happens when I write a function using TensorFlow ops - python

I write a function using TensorFlow ops. I know that when I run the function, it will add many ops to the graph. But I am confused about how to get access to these ops.
for example:
def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0, 'v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    return z, b
I can use feed_dict to pass a value to b only if I set b as a return value. Otherwise, there is no way to access b. If we want to access many ops in the function scope, we have to return many values, which is very ugly.
I want to know what happens under the hood when I run functions built from TensorFlow ops, and how to get access to the ops in the function scope.
Thank you!

It's true that to access an op (or tensor) we need some reference to it. IMHO, one standard approach is to build your graph in a class, make the relevant tensors attributes of the class, and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
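For instance, a minimal sketch of the dict approach, reusing the question's function (the Session usage at the end is just for illustration):
import tensorflow as tf

def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0.0, name='v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    # return every tensor/op you may want to touch later, keyed by name
    return {'v': v, 'b': b, 'assign_op': z}

tensors = assign_weights()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tensors['assign_op'], feed_dict={tensors['b']: 1.0})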
Additionally, there are specialized functions that return ops by name, e.g. get_operation_by_name.
As an aside to this question, you might also want to try out eager execution, which is imperative.

Three things happen when you use an op function:
create and add a compute node to the default graph
set your input as the node's input tensor
set the node's output tensor as the return value
For example, a = tf.add(b, c, name='add'):
adds a node with op Add to the default graph, with name 'add'
sets b and c as the node's input tensors
sets the node's output tensor, with name 'add:0', as a
So you can access nodes via sess.graph; there are many functions to access nodes, for example get_operation_by_name.
Also, you can operate on the graph via sess.graph_def, which is the graph serialized with protobuf; you can find the protobuf definitions in the TensorFlow source code under tensorflow/core/framework (several .proto files there).
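As a minimal sketch of that kind of lookup (assuming the TF 1.x Session API):
import tensorflow as tf

a = tf.add(tf.constant(1.0), tf.constant(2.0), name='add')

with tf.Session() as sess:
    # fetch the node and its first output tensor back by name
    op = sess.graph.get_operation_by_name('add')
    t = sess.graph.get_tensor_by_name('add:0')
    print(op.type)      # 'Add'
    print(sess.run(t))  # 3.0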

Related

Adding an additional scalar parameter to a PyTorch model gives a RuntimeError

I'm trying to add a scalar parameter to my model (the code is too complex to attach), but it is effectively like this:
class WholeModel:
    def __init__(...):
        self.new_parameter = Parameter(torch.scalar_tensor(0.1, requires_grad=True))
        self.model = self.make_model()

    def make_model(self):
        d = distribution()  # returns a Distribution which is a Module
        d = transform_distribution(d, self.new_parameter)
        d.register_parameter(name='new', param=self.new_parameter)
        return d
However, I run into this error RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
If I change self.new_parameter = Parameter(torch.scalar_tensor(0.1)) to self.new_parameter = torch.scalar_tensor(0.1) and remove the register_parameter, then it compiles and runs (but then obviously it's not learning the parameter).
I've also tried using a tensor rather than a scalar_tensor but this also doesn't work. The error occurs with/without requires_grad.
Any ideas? It really is just a simple addition to a blackbox model.
Thanks

How to prune weights less than a threshold in PyTorch?

How can I prune the weights of a CNN (convolutional neural network) model which are less than a threshold value (let's say prune all weights which are <= 1)?
How can we achieve that for a weight file saved in .pth format in PyTorch?
Since version 1.4.0, PyTorch provides model pruning out of the box; see the official tutorial.
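For reference, a minimal sketch of the built-in magnitude-based pruning (the layer and the amount here are just illustrative):
import torch
from torch.nn.utils import prune

layer = torch.nn.Linear(50, 30)
# zero out the 30% of weight entries with the smallest L1 magnitude
prune.l1_unstructured(layer, name="weight", amount=0.3)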
As there is currently no threshold-based pruning method in PyTorch, you have to implement it yourself, though it's quite easy once you get the overall idea.
Threshold Pruning method
Below is the code performing the pruning:
import torch
from torch.nn.utils import prune


class ThresholdPruning(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_mask(self, tensor, default_mask):
        return torch.abs(tensor) > self.threshold
Explanation:
PRUNING_TYPE can be one of global, structured, unstructured. global acts across the whole module (e.g. remove 20% of the weights with the smallest value), structured acts on whole channels/modules. We need unstructured as we would like to modify each connection in a specific parameter tensor (say weight or bias)
__init__ - pass here whatever you want or need to make it work, normal stuff
compute_mask - the mask to be used to prune the specific tensor. In our case all parameters below the threshold should be zeroed out. I did it with the absolute value as it makes more sense. default_mask is not needed here, but is kept as a named parameter because that's what the API currently requires.
Moreover, inheriting from prune.BasePruningMethod defines methods to apply the mask to each parameter, make pruning permanent etc. See base class docs for more info.
Example module
Nothing too fancy, you can put anything you want here:
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Linear(50, 30)
        self.second = torch.nn.Linear(30, 10)

    def forward(self, inputs):
        return self.second(torch.relu(self.first(inputs)))


module = MyModule()
You can also load your module via module = torch.load('checkpoint.pth') if you need to; it doesn't matter.
Prune module's parameters
We have to define which parameters of our module (and whether it's the weight or the bias) should be pruned, like this:
parameters_to_prune = ((module.first, "weight"), (module.second, "weight"))
Now we can globally apply our unstructured pruning to all the defined parameters (threshold is passed as a kwarg to __init__ of ThresholdPruning):
prune.global_unstructured(
    parameters_to_prune, pruning_method=ThresholdPruning, threshold=0.1
)
Results
weight attribute
To see the effect, simply check the weights of the first submodule:
print(module.first.weight)
It is the weight with our pruning technique applied, but please notice it's not a torch.nn.Parameter anymore! It is now simply an attribute of our module, hence it won't show up in module.parameters() or be updated directly by the optimizer.
weight_mask
We can check created mask via module.first.weight_mask to see everything is done correctly (it will be binary in this case).
weight_orig
Applying pruning creates a new torch.nn.Parameter holding the original weights, named name + "_orig", in this case weight_orig; let's see:
print(module.first.weight_orig)
This parameter is the one currently used during training and evaluation! After applying pruning via the methods described above, forward_pre_hooks are added which "switch" the original weight to weight_orig.
Thanks to this approach, you can define and apply your pruning at any point of training or inference without "destroying" the original weights.
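A quick sanity check of that mechanism (names refer to the module defined above):
# `weight` is no longer a parameter; `weight_orig` is, and `weight_mask` is
# registered as a buffer. A forward pre-hook recombines them into `weight`
# on every forward call.
print([name for name, _ in module.first.named_parameters()])  # contains 'weight_orig', but no 'weight'
print([name for name, _ in module.first.named_buffers()])     # contains 'weight_mask'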
Applying pruning permanently
If you wish to apply pruning permanently simply issue:
prune.remove(module.first, "weight")
And now our module.first.weight is once again a parameter with its entries appropriately pruned; module.first.weight_mask is removed and so is module.first.weight_orig. This is probably what you are after.
You can iterate over children to make it permanent:
for child in module.children():
    prune.remove(child, "weight")
You could define parameters_to_prune using the same logic:
parameters_to_prune = [(child, "weight") for child in module.children()]
Or if you want only convolution layers to be pruned (or anything else really):
parameters_to_prune = [
    (child, "weight")
    for child in module.children()
    if isinstance(child, torch.nn.Conv2d)
]
Advantages
uses "PyTorch way of pruning" so it's easier to communicate your intent to other programmers
define pruning on a per-tensor basis, single responsibility instead of going through everything
confine to predefined ways
pruning is not permanent hence you can recover from it if needed. Module can be saved with pruning masks and original weights so it leaves you some space to revert eventual mistake (e.g. threshold was too high and now all your weights are zero rendering results meaningless)
works with original weights during forward calls unless you want to finally change to pruned version (simple call to remove)
Disadvantages
IMO the pruning API could be clearer
you can do it in a shorter way (as provided by Shai)
might be confusing for those who do not know such a thing is "defined" by PyTorch (still, there are tutorials and docs, so I don't think it's a major problem)
You can work directly on the values saved in the state_dict:
sd = torch.load('saved_weights.pth')  # load the state dict
for k in sd.keys():
    if 'weight' not in k:
        continue  # skip biases and other saved parameters
    w = sd[k]
    sd[k] = w * (w > thr)  # zero out weights smaller than thr
torch.save(sd, 'pruned_weights.pth')

tensorflow Optimizer.py "optimizer._get_variable_for" file function

I am trying to implement an Optimizer in TensorFlow and have been looking at the optimizer code from an old version of TensorFlow. I want to understand what the function _get_variable_for does. It is the first function in the optimizer file.
Any help would be appreciated.
Thanking You.
I see that this function checks two conditions.
ResourceVariable and VarHandleOp
Here is what the comments in the code say about ResourceVariable:
"For example, if there is more than one assignment to a ResourceVariable in
a single session.run call there is a well-defined value for each operation
which uses the variable's value if the assignments and the read are connected
by edges in the graph. Consider the following example, in which two writes
can cause tf.Variable and tf.ResourceVariable to behave differently:"
a = tf.Variable(1.0, use_resource=True)
a.initializer.run()

assign = a.assign(2.0)
with tf.control_dependencies([assign]):
    b = a.read_value()
with tf.control_dependencies([b]):
    other_assign = a.assign(3.0)
with tf.control_dependencies([other_assign]):
    # Will print 2.0 because the value was read before other_assign ran. If
    # `a` was a tf.Variable instead, 2.0 or 3.0 could be printed.
    tf.Print(b, [b]).eval()
VarHandleOp seems to have deeper semantics, as per this:
"A common approach to managing where variables are placed, is to create a method to determine where each Op is to be placed and use that method in place of a specific device name when calling with tf.device(): Consider a scenario where a model is being trained on 2 GPUs and the variables are to be placed on the CPU. There would be a loop for creating and placing the "towers" on each of the 2 GPUs. A custom device placement method would be created that watches for Ops of type Variable, VariableV2, and VarHandleOp and indicates that they are to be placed on the CPU. All other Ops would be placed on the target GPU."
It explains this scenario further with sample code.
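As a rough sketch of what such a placement method could look like (the helper name assign_to_device and the two-GPU loop are illustrative, not taken from the optimizer code):
import tensorflow as tf  # TF 1.x style API

VARIABLE_OP_TYPES = ('Variable', 'VariableV2', 'VarHandleOp')

def assign_to_device(worker_device, ps_device='/cpu:0'):
    # Device function: variable ops go to ps_device, everything else to worker_device.
    def _assign(op):
        return ps_device if op.type in VARIABLE_OP_TYPES else worker_device
    return _assign

for gpu_id in range(2):
    with tf.device(assign_to_device('/gpu:%d' % gpu_id)):
        with tf.variable_scope('model', reuse=(gpu_id > 0)):
            w = tf.get_variable('w', shape=[10, 10])    # placed on /cpu:0
            tower_out = tf.matmul(tf.ones([1, 10]), w)  # placed on /gpu:<gpu_id>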

Two staged training for different dependent graphs in tensorflow

I am trying to perform two tasks (A and B) which have inputs inp_A and inp_B and corresponding outputs out_A, out_B.
Task A is to be achieved first by a graph g_A. After task A is finished, I wish to use the weights of g_A in a new graph g_B, which is a bigger graph (a superset of g_A).
I am unsure how to do this in TensorFlow.
I am using this kind of split for training and validation purposes, where I create dedicated input and output pipelines but share the inception part of the graph; I'm using the same graph (as in, one tf.Graph()), but different (unconnected) subgraphs within it.
Within one tf.Graph() the general concept is variable sharing, which you can achieve by using tf.variable_scope() to group your variables by concept and then creating and refetching them with tf.get_variable() (instead of using tf.Variable() directly). The first time it's called it will create the variables; the second time it will reuse them, provided the name of the variable stays the same.
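A minimal sketch of that sharing pattern (TF 1.x API assumed; the placeholder inputs are only for illustration):
import tensorflow as tf

def shared_layer(inputs):
    # the first call creates 'shared/w', later calls reuse it
    with tf.variable_scope('shared', reuse=tf.AUTO_REUSE):
        w = tf.get_variable('w', shape=[10, 5], initializer=tf.zeros_initializer())
        return tf.matmul(inputs, w)

train_inputs = tf.placeholder(tf.float32, shape=[None, 10])
valid_inputs = tf.placeholder(tf.float32, shape=[None, 10])
train_out = shared_layer(train_inputs)  # creates shared/w
valid_out = shared_layer(valid_inputs)  # reuses shared/w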
However I found it much easier to use tf.make_template() instead, which will wrap a function that creates a subgraph entirely and on every call creates a new instance of the graph while sharing all of its variables.
The documentation example for that is
def my_op(x, scalar_name):
    var1 = tf.get_variable(scalar_name,
                           shape=[],
                           initializer=tf.constant_initializer(1))
    return x * var1

create_instance = tf.make_template('scale_by_y', my_op, scalar_name='y')

z = create_instance(input1)
w = create_instance(input2)
Here, each call to create_instance will create a new node called scale_by_y in the graph that performs the operation defined by my_op() while sharing its internal variables. (In the example, the parameter scalar_name is statically bound to the value 'y', resulting in the variable scale_by_y/y being created (and reused) in the graph. I find that to be more confusing than helpful.)
It does not care about parent scopes, so
with tf.variable_scope('training'):
    z1 = create_instance(input1)

with tf.variable_scope('validation'):
    z2 = create_instance(input2)
works. It also might or might not work across different tf.Graph() instances, though I doubt it.

Is it possible to modify an existing TensorFlow computation graph?

A TensorFlow graph is usually built gradually from inputs to outputs, and then executed. Looking at the Python code, the input lists of operations are immutable, which suggests that the inputs should not be modified. Does that mean there is no way to update/modify an existing graph?
The TensorFlow tf.Graph class is an append-only data structure, which means that you can add nodes to the graph after executing part of the graph, but you cannot remove or modify existing nodes. Since TensorFlow executes only the necessary subgraph when you call Session.run(), there is no execution-time cost to having redundant nodes in the graph (although they will continue to consume memory).
To remove all nodes in the graph, you can create a session with a new graph:
with tf.Graph().as_default():  # Create a new graph, and make it the default.
    with tf.Session() as sess:  # `sess` will use the new, currently empty, graph.
        pass  # Build graph and execute nodes in here.
Yes, tf.Graph objects are built in an append-only fashion, as @mrry puts it.
But there's a workaround:
Conceptually you can modify an existing graph by cloning it and performing the needed modifications along the way. As of r1.1, TensorFlow provides a module named tf.contrib.graph_editor which implements the above idea as a set of convenient functions.
In addition to what @zaxily and @mrry say, I want to provide an example of how to actually modify the graph. In short:
one cannot modify existing operations; all ops are final and immutable
one may copy an op, modify its inputs or attributes, and add the new op back to the graph
all downstream ops that depend on the new/copied op have to be recreated. Yes, a significant portion of the graph may get copied, which is not a problem
The code:
import tensorflow as tf
import tensorflow.contrib.graph_editor as ge
from copy import deepcopy

a = tf.constant(1)
b = tf.constant(2)
c = a + b

def modify(t):
    # illustrate operation copy & modification
    new_t = deepcopy(t.op.node_def)
    new_t.name = new_t.name + "_but_awesome"
    new_t = tf.Operation(new_t, tf.get_default_graph())
    # we got an op, let's return its output tensor
    return new_t.outputs[0]

def update_existing(target, updated):
    # illustrate how to use the new op
    related_ops = ge.get_backward_walk_ops(target, stop_at_ts=updated.keys(), inclusive=True)
    new_ops, mapping = ge.copy_with_input_replacements(related_ops, updated)
    new_op = mapping._transformed_ops[target.op]
    return new_op.outputs[0]

new_a = modify(a)
new_b = modify(b)
injection = new_a + 39  # illustrate how to add another op to the graph
new_c = update_existing(c, {a: injection, b: new_b})

with tf.Session():
    print(c.eval())      # -> 3
    print(new_c.eval())  # -> 42
For TensorFlow v>=2.6, using Graph directly has been deprecated:
A tf.Graph can be constructed and used directly without a tf.function, as was required in TensorFlow 1, but this is deprecated and it is recommended to use a tf.function instead. If a graph is directly used, other deprecated TensorFlow 1 classes are also required to execute the graph, such as a tf.compat.v1.Session.
That being said, I think your question is still relevant; the kind of problem you are facing might be solved using TensorFlow eager execution. While running tf in eager mode, you can run, modify, and test your computation before building a graph from it...
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive python interpreter.
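For instance, a minimal sketch in TF 2.x, where eager execution is on by default (the optional tf.function shows how to still get a traced graph):
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.matmul(a, a))  # runs immediately, no Session or graph construction

@tf.function  # optional: trace the computation into a graph for performance
def matmul_twice(x):
    return tf.matmul(x, x)

print(matmul_twice(a))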
However, be careful: eager mode trades performance/speed for debugging ease/flexibility, so for production you might consider turning it off.
Lastly, there is another feature of TensorFlow that might be relevant for this problem, which is tensor slicing, tf.slice.
