I'm trying to add a scalar parameter to my model (the full code is too complex to attach), but it is effectively like this:
class WholeModel:
    def __init__(...):
        self.new_parameter = Parameter(torch.scalar_tensor(0.1, requires_grad=True))
        self.model = self.make_model()

    def make_model(self):
        d = distribution()  # returns a Distribution, which is a Module
        d = transform_distribution(d, self.new_parameter)
        d.register_parameter(name='new', param=self.new_parameter)
        return d
However, I run into this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
If I change self.new_parameter = Parameter(torch.scalar_tensor(0.1)) to self.new_parameter = torch.scalar_tensor(0.1) and remove the register_parameter call, then it compiles and runs (but then, obviously, it's not learning the parameter).
I've also tried using a tensor rather than a scalar_tensor, but that doesn't work either. The error occurs with and without requires_grad.
Any ideas? It really is just a simple addition to a black-box model.
Thanks
Related
I'm working on a project where the model requires access to a tensor that I declare in the constructor (__init__) of the class (I'm subclassing torch.nn.Module), and I then need to use this tensor in the forward() method via a simple matmul(). The model is sent to the GPU via a cuda() call:
model = Model()
model.cuda()
However, when I forward-propagate a simple input X through the model:
model(X) # or model.forward(X)
I get:
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'
indicating that the second argument of matmul (the instance tensor I declared) is on the CPU, while it was expected to be on the GPU (like the rest of the model and the data).
In the matmul, the tensor is transposed via matrix.t().
I even tried overriding the cuda() method through:
def cuda(self):
    super().cuda()
    self.matrix.cuda()
The data is already on the GPU, meaning the following line of code has already been executed:
X = X.cuda()
Also, the error explicitly refers to argument #2 of matmul, which in this case is the tensor (called matrix), not X.
Let's assume the following:
X is moved correctly to the GPU
The tensor declared in the Model class is a simple attribute.
i.e., something like the following:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
If so, your first attempt wouldn't work, because the nn.Module.cuda() method only moves Parameters and Buffers to the GPU.
You would need to make Model.matrix a Parameter instead of a regular attribute, by wrapping it in the nn.Parameter class.
Something like:
self.matrix = nn.Parameter(torch.randn(784, 10))
Now, instead of having the tensor moved to the GPU automatically as above, your second attempt manually calls the .cuda() method on Model.matrix within the override.
This doesn't work either because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method.
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
The original tensor is unaffected.
In summary, either:
wrap your matrix attribute as a Parameter, or
assign the GPU copy back to matrix in your override via:
self.matrix = self.matrix.cuda()
I would suggest the first.
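For reference, a minimal sketch of the first fix (assuming the 784x10 shape from above):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a Parameter, so nn.Module.cuda() moves it too
        # (and it shows up in model.parameters() for the optimizer).
        self.matrix = nn.Parameter(torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
model.cuda()  # model.matrix now lives on the GPU along with the rest of the model

If the tensor should not be trained, registering it as a buffer instead (self.register_buffer('matrix', torch.randn(784, 10))) gets the same device handling without making it learnable.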
I would like to highlight this from @Vaisakh's answer:
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
In other words, as @Umang_Gupta says in his comment:
# if m is a Module, you do:
m.cuda()
# if t is a Tensor, you do:
t = t.cuda()
For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation, but I get the following error:
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn't figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y)), but I still got the same error). I eventually wrapped everything in torch.no_grad as follows:
with torch.no_grad():
    predFeats = self.forward(x)
    targetFeats = self.forward(target)
    loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32),
                                        targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?
TL;DR
with torch.no_grad():
    predFeats = self(x)
    targetFeats = self(target)
    loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
                                    targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph, i.e. getting a copy of the tensor data without the gradients and the gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detaching.
As you rightly did, let's disable the gradients instead:
with torch.no_grad():
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# loss is now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a number (numpy.float64) while it expects a NumPy array (np.ndarray).
If you're using PyTorch 0.4 or later, there's built-in support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
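To illustrate the difference, a quick check (not from the original post):

import numpy as np
import torch

val = np.float64(0.87)
# torch.from_numpy(val)  # TypeError: expected np.ndarray (got numpy.float64)
torch.tensor(val)        # tensor(0.8700, dtype=torch.float64) -- scalars are fine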
P.S. You might also want to look at setting rowvar=False in corrcoef, since rows in PyTorch tensors usually represent observations.
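For instance, with a hypothetical feature matrix of shape (n_observations, n_variables):

import numpy as np

feats = np.random.randn(100, 2)       # 100 observations, 2 variables
r = np.corrcoef(feats, rowvar=False)  # columns are treated as the variables
print(r.shape)  # (2, 2)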
I wrote a function using TensorFlow ops. I know that when I run the function, it adds many ops to the graph. But I am confused about how to get access to these ops.
for example:
def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0, name='v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    return z, b
I can use feed_dict to pass a value to b, but only if I make b a return value. Otherwise, there is no way to access b. If we want to access many ops in the function's scope, we have to return many values, which is very ugly.
I want to know what happens under the hood when I run functions like this in TensorFlow, and how to get access to the ops in the function's scope.
Thank you!
Obviously, it's true that to access an op (or tensor) we need some reference to it. IMHO, one standard workaround is to build your graph in a class, make the relevant tensors attributes of the class, and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
Additionally, there are also specialized functions that return ops by name: e.g. get_operation_by_name.
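For example, a minimal sketch of the dict-returning variant of the assign_weights() from the question (TF 1.x graph mode):

import tensorflow as tf

def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0, name='v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    # One return value, but every op/tensor of interest stays reachable.
    return {'v': v, 'b': b, 'z': z}

ops = assign_weights()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(ops['z'], feed_dict={ops['b']: 3.0})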
As an aside to this question, you might also want to try out eager execution, which is imperative.
Three things happen when you call an op function:
it creates a compute node and adds it to the default graph,
it sets your inputs as the node's input tensors,
it sets the node's output tensor as the return value.
For example, a = tf.add(b, c, name='add'):
adds a node with op Add to the default graph, with the name 'add',
sets b and c as the node's input tensors,
sets the node's output tensor, with the name 'add:0', to a.
So you can access nodes via sess.graph; there are many functions for this, e.g. get_operation_by_name.
You can also inspect the graph via sess.graph_def, which is the graph serialized with protobuf; you can find the protobuf definitions in the TensorFlow source code, under tensorflow/core/framework (there are some .proto files there).
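A minimal sketch of looking nodes up by name (TF 1.x, using the tf.add example above):

import tensorflow as tf

b = tf.constant(1.0)
c = tf.constant(2.0)
a = tf.add(b, c, name='add')

with tf.Session() as sess:
    op = sess.graph.get_operation_by_name('add')  # the Add node itself
    t = sess.graph.get_tensor_by_name('add:0')    # its output tensor
    print(sess.run(t))  # 3.0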
I'm trying to create a custom RNN cell in TensorFlow that accepts a tuple as input, but I'm running into the problem that the parent class BasicLSTMCell requires inputs to be two-dimensional:
# Inputs must be 2-dimensional.
self.input_spec = base_layer.InputSpec(ndim=2)
How can I get around this restriction? I can't add the logic to handle the tuple in the call() method, because execution never reaches that method: a dimensionality check raises an error first.
I actually ran into this problem as well. There is a bug in the TensorFlow codebase. You can work around it by changing the get_step_input_shape function in the recurrent.py file: just add [0] to the end of this line: nest.map_structure(get_input_spec, input_shape))
As the title says, I'm using TensorFlow version 1.2, built from source for my machine. I don't believe that affects my question, though.
What is the difference between these two chunks of code?
With the top one, values are never assigned during training, but with the bottom one they are. I am copying all my epoch data over to the GPU and then grabbing the data for each batch as I need it, so this code runs at the beginning of every batch, inside the same session.
The code is in Python, and all of this is defined inside my model class.
All of the self.data objects are 3-D float32 tensors.
## the index i.e the current step in the epoch
index = tf.to_int32(self.step, name="step_to_int")
## code that doesn't work
tf.assign(self.input_data, self.all_input_data[index])
tf.assign(self.targets, self.all_target_data[index])
## code that works
self.input_data = self.all_input_data[index]
self.targets = self.all_target_data[index]
Remember that pretty much everything is an operation in TensorFlow. I believe the issue in your code is that you never run the assignment operation (you just evaluate the input_data tensor as it was initialized).
You need to assign the return value of the assignment method to a variable:
self.input_data = tf.assign(self.input_data, self.all_input_data[index])
This variable holds both the new value and the reassignment operation, so whenever you evaluate it, it will update its value.
Quoting the doc string:
Returns:
    A Tensor that will hold the new value of 'ref' after
    the assignment has completed.
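A minimal sketch (TF 1.x, with hypothetical shapes) showing why the returned tensor must actually be evaluated:

import tensorflow as tf

all_input_data = tf.constant([[1.0, 2.0], [3.0, 4.0]])
input_data = tf.Variable([0.0, 0.0])
index = tf.constant(1)

assign_op = tf.assign(input_data, all_input_data[index])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(input_data))  # [0. 0.] -- the assign op never ran
    print(sess.run(assign_op))   # [3. 4.] -- evaluating it performs the update
    print(sess.run(input_data))  # [3. 4.]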