I am trying to perform the most basic function minimisation possible in TensorFlow 2.0, exactly as in the question Tensorflow 2.0: minimize a simple function, but I cannot get the solution described there to work. Here is my attempt, mostly copy-pasted, with some bits that seemed to be missing added in.
import tensorflow as tf

x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)

with tf.GradientTape() as t:
    y = tf.math.square(x)
# This is the tape that computes the gradients!
trainable_variables = [x]

#### Option 2
# To use minimize you have to define your loss computation as a function
def compute_loss():
    y = tf.math.square(x)
    return y

opt = tf.optimizers.Adam(learning_rate=0.001)
train = opt.minimize(compute_loss, var_list=trainable_variables)

print("x:", x)
print("y:", y)
Output:
x: <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.999>
y: tf.Tensor(4.0, shape=(), dtype=float32)
So it says the minimum is at x=1.999, but obviously that is wrong. So what happened? I suppose it only performed one loop of the minimiser or something? If so, then "minimize" seems like a terrible name for the function. How is this supposed to work?
On a side note, I also need to know the values of intermediate variables that are calculated in the loss function (the example only has y, but imagine that it took several steps to compute y and I want all those numbers). I don't think I am using the gradient tape correctly either; it is not obvious to me that it has anything to do with the computations in the loss function (I just copied this stuff from the other question).
You need to call minimize multiple times, because minimize only performs a single step of your optimisation.
The following should work:
import tensorflow as tf

x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)
trainable_variables = [x]

# To use minimize you have to define your loss computation as a function
class Model():
    def __init__(self):
        self.y = 0

    def compute_loss(self):
        self.y = tf.math.square(x)
        return self.y

opt = tf.optimizers.Adam(learning_rate=0.01)
model = Model()

for i in range(1000):
    train = opt.minimize(model.compute_loss, var_list=trainable_variables)

print("x:", x)
print("y:", model.y)
I want to approximate the function g(x) = exp(x) with a linear combination of functions, h(x) = sum_i(a_i * f_i(x)) using a neural network with TF2.
Now, the network input is just x and the outputs are the functions f_i.
The custom loss function is simple: the mean squared difference |g(x) - h(x)|^2.
My problem is that I don't understand how to define/use the a_i.
First, all the a_i are just scalars; in addition, they don't depend on x.
Defining the a_i as inputs doesn't make sense, since I want them to be optimized.
Defining them as outputs makes them depend on x, which I don't want.
How can I add these variables as scalars to the network and have them optimized by the optimization process?
During training, TensorFlow tracks all tf.Variable objects defined in the model. If you define your constant as a tf.Variable, it will be adjusted via backpropagation.
Let's say we have a dataset with X, and y, where y = X * 2:
import tensorflow as tf
x = tf.random.uniform((10_000, 1))
y = x * 2
We will create a model which has a constant inside, and this constant will need to capture the relationship between X and y. We will of course initialize this value as something other than 2. Since the relationship between X and y is a factor of 2, training should make the constant converge towards 2. So let's define a model that is nothing but a constant.
class CustomModel(tf.keras.models.Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.constant = tf.Variable(initial_value=0.1, dtype=tf.float32, trainable=True)

    def call(self, x, **kwargs):
        x = self.constant * x
        return x

model = CustomModel()
Now just compile and train the model:
model.compile(loss='mae')
history = model.fit(x, y, epochs=25, verbose=0)
Now look at the weights. The constant, which was initialized with value 0.1, is now 2. It was optimized, as it learned the relationship between X and y, which is a factor of 2.
model.weights
[<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>]
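Coming back to the original question about the a_i coefficients, the same idea applies: hold all the scalars in one trainable tf.Variable inside a subclassed model and let the network produce the f_i(x). A minimal sketch (the layer sizes, n_terms, and training setup are illustrative assumptions, not from the original post):

import tensorflow as tf

class LinearCombination(tf.keras.Model):
    def __init__(self, n_terms=5):
        super().__init__()
        # Network that produces the basis functions f_i(x) from the input x
        self.hidden = tf.keras.layers.Dense(32, activation='tanh')
        self.f = tf.keras.layers.Dense(n_terms)
        # The coefficients a_i: trainable scalars that do not depend on x
        self.a = tf.Variable(tf.ones((n_terms,)), trainable=True)

    def call(self, x):
        f_i = self.f(self.hidden(x))  # shape (batch, n_terms)
        return tf.reduce_sum(self.a * f_i, axis=-1, keepdims=True)  # h(x)

model = LinearCombination()
model.compile(optimizer='adam', loss='mse')

x = tf.random.uniform((1000, 1), -1.0, 1.0)
model.fit(x, tf.exp(x), epochs=5, verbose=0)  # fit h(x) to g(x) = exp(x)

Because self.a is a tf.Variable tracked by the model, it is updated by backpropagation just like the Dense layer weights.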
I am trying to manually calculate a gradient using the output of my network, which I will then use in a loss function. I have managed to get an example working in Keras, but converting it to PyTorch has proven more difficult.
I have a model like:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 50)
        self.fc2 = nn.Linear(50, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x
and some data:
x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1)
x = Variable(x)
I can then try to find a gradient like:
output = net(x)
grad = torch.autograd.grad(outputs=output, inputs=x, retain_graph=True)[0]
I want to be able to find the gradient of each point, then do something like:
err_sqr = (grad - x)**2
loss = torch.mean(err_sqr)**2
However, at the moment if I try to do this I get the error:
grad can be implicitly created only for scalar outputs
I have tried changing the shape of my network output to fix this, but if I change it too much it says it's not part of the graph. I can get rid of that error by allowing that, but then it says my gradient is None. I've managed to get this working in Keras, so I'm confident that it's possible here too; I just need a hand!
My question is:
Is there a way to "fix" what I have to allow me to calculate the gradient?
PyTorch expects an upstream gradient in the grad call. For usual (scalar) loss functions, the upstream gradient is implicitly assumed to be 1.
You can do a similar thing by passing ones as the upstream gradient:
grad = torch.autograd.grad(outputs=output, inputs=x, grad_outputs=torch.ones_like(output), retain_graph=True)[0]
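Putting it together with the question's setup, the full pattern might look roughly like this (a sketch; it assumes x is created with requires_grad enabled and that create_graph=True is passed so the computed gradient can itself be backpropagated through the loss):

import torch

x = torch.linspace(-1, 1, 101).unsqueeze(1).requires_grad_(True)
output = net(x)  # net is the model defined in the question

# One upstream gradient per output element
grad = torch.autograd.grad(outputs=output, inputs=x,
                           grad_outputs=torch.ones_like(output),
                           create_graph=True)[0]

loss = torch.mean((grad - x) ** 2)
loss.backward()  # gradients now reach the network parameters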
I am currently using TensorFlow version 1.14.
In the code below, I am trying to create a dummy model that takes in two inputs and produces two outputs, with all weights set to ones and biases to zeros (a single-layer perceptron). I am defining a custom loss function that computes the Jacobian of the output layer with respect to the input layer.
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tqdm import tqdm

# Prior function
def f_i(x):
    x1 = np.arctanh(x)
    return np.exp(-x1**2)

# x (a 1-D sample array) and ndim are assumed to be defined elsewhere
B = np.random.choice(x, (10000, 2), p=f_i(x) / np.sum(f_i(x)))

def my_loss(y_pred, y_true):
    jacobian_tf = jacobian_tensorflow3(sim.output, sim.input)
    loss = tf.abs(tf.linalg.det(jacobian_tf))
    return K.mean(loss)

def jacobian_tensorflow3(x, y, verbose=False):
    jacobian_matrix = []
    it = tqdm(range(ndim)) if verbose else range(ndim)
    for o in it:
        grad_func = tf.gradients(x[:, o], y)
        jacobian_matrix.append(grad_func[0])
    jacobian_matrix = tf.stack(jacobian_matrix)
    jacobian_matrix1 = tf.transpose(jacobian_matrix, perm=[1, 0, 2])
    return jacobian_matrix1

sim = Sequential()
sim.add(Dense(2, kernel_initializer='ones', bias_initializer='zeros', activation='linear', input_dim=2))
sim.compile(optimizer='adam', loss=my_loss)
sim.fit(B, np.random.random(B.shape), batch_size=100, epochs=2)
This model gives the result of the Jacobian matrix and has no issues with compilation, but when I run sim.fit I get the following error:
ValueError: Variable <tf.Variable 'dense_14/bias:0' shape=(2,) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I have been stuck at this step for a long time and am not able to proceed. Any help/suggestions would be appreciated.
I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf

x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x * s
model = tf.keras.models.Model(input_shape, x)

X = tf.random.normal((50, 10))  # random sample

with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2

print(tape.gradient(loss, s))  # why None ??
The print prints None... why?
Notice that I am using eager execution (TF version 2.0.0).
I managed to fix my problem by subclassing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
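For reference, such a custom layer might look roughly like this (a sketch; the layer name and the use of add_weight are illustrative):

class ScaleLayer(tf.keras.layers.Layer):
    def __init__(self, units=5):
        super().__init__()
        # Variables created via add_weight are tracked by the layer/model
        self.s = self.add_weight(name='s', shape=(units,),
                                 initializer='ones', trainable=True)

    def call(self, inputs):
        return inputs * self.s

Using a layer like this in the functional model, instead of the bare x * s multiplication, means s shows up in model.trainable_variables and receives gradients.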
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on reviewing the documentation, I suspect the issue is that differentiation with respect to the model layer "s" (or any other layer, say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.
I am trying to make some changes to the built-in dropout function in TensorFlow. What is the best procedure to do so?
I'd like to make some changes in the forward and backpropagation steps. In the TensorFlow implementation I can only find the forward pass, not the backward pass.
I'd like to modify both the forward and backward pass.
You can use tf.custom_gradient to define your own forward and backprop step in a single method. Here is a simple example:
import tensorflow as tf

tf.InteractiveSession()

@tf.custom_gradient
def custom_multiply(a, x):
    # Define your own forward step
    y = a * x
    # Define your own backward step
    def grads(dy):
        return dy * x, dy * a + 100
    # Return the forward result and the backward function
    return y, grads

a, x = tf.constant(2), tf.constant(3)
y = custom_multiply(a, x)
dy_dx = tf.gradients(y, x)[0]
# It will print `dy/dx = 102` instead of 2 if the gradient is not customized
print('dy/dx =', dy_dx.eval())
If you want to customize your own layer, simply replace the core function used in tf.layers.Dropout.call with your own.
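As an illustration of that idea, a dropout-like op with a customized backward step could be sketched like this (the masking and the backward rule are placeholders to adapt, not the actual tf.layers.Dropout internals):

import tensorflow as tf

def make_custom_dropout(rate=0.5):
    @tf.custom_gradient
    def my_dropout(x):
        # Forward step: standard inverted dropout
        keep_mask = tf.cast(tf.random.uniform(tf.shape(x)) >= rate, x.dtype)
        y = x * keep_mask / (1.0 - rate)

        def grads(dy):
            # Backward step: here the gradient is routed through the same mask;
            # change this line to implement a different backward behaviour
            return dy * keep_mask / (1.0 - rate)

        return y, grads
    return my_dropout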