In the reference paper for TensorFlow Distributions (now Probability), it is mentioned that TensorFlow Variables can be used to construct Bijector and TransformedDistribution objects, i.e.:
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tf.enable_eager_execution()
shift = tf.Variable(1., dtype=tf.float32)
myBij = tfp.bijectors.Affine(shift=shift)
# Normal distribution centered in zero, then shifted to 1 using the bijection
myDistr = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=myBij,
    name="test")
# 2 samples of a normal centered at 1:
y = myDistr.sample(2)
# 2 samples of a normal centered at 0, obtained using inverse transform of myBij:
x = myBij.inverse(y)
I would now like to modify the shift variable (say, I might compute gradients of some likelihood function as a function of the shift and update its value) so I do
shift.assign(2.)
gx = myBij.forward(x)
I would expect gx = y + 1, but I see gx = y. Indeed, myBij.shift still evaluates to 1.
If I try to modify the bijector directly, i.e.:
myBij.shift.assign(2.)
I get
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'assign'
Computing gradients also does not work as expected:
with tf.GradientTape() as tape:
    gx = myBij.forward(x)
grad = tape.gradient(gx, shift)
This yields None, along with the following exception when the script ends:
Exception ignored in: <bound method GradientTape.__del__ of <tensorflow.python.eager.backprop.GradientTape object at 0x7f529c4702e8>>
Traceback (most recent call last):
File "~/.local/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 765, in __del__
AttributeError: 'NoneType' object has no attribute 'context'
What am I missing here?
Edit: I got it working with a graph/session, so it seems there is an issue with eager execution...
Note: I have tensorflow version 1.12.0 and tensorflow_probability version 0.5.0
If you are using eager mode, you will need to recompute everything from the variable forward. It is best to capture this logic in a function:
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tf.enable_eager_execution()
shift = tf.Variable(1., dtype=tf.float32)
def f():
    myBij = tfp.bijectors.Affine(shift=shift)
    # Normal distribution centered in zero, then shifted to 1 using the bijection
    myDistr = tfd.TransformedDistribution(
        distribution=tfd.Normal(loc=0., scale=1.),
        bijector=myBij,
        name="test")
    # 2 samples of a normal centered at 1:
    y = myDistr.sample(2)
    # 2 samples of a normal centered at 0, obtained using inverse
    # transform of myBij:
    x = myBij.inverse(y)
    return x, y
x, y = f()
shift.assign(2.)
gx, _ = f()
Regarding gradients, you will need to wrap the calls to f() in a GradientTape, so that the operations that read the variable are recorded.
I'm trying to run code on Ubuntu that uses TensorFlow, and it gives me this error:
AttributeError: module 'tensorflow' has no attribute 'py_func'
How can I fix it?
The relevant part of the code:
for i in range(cfg.num_layers):
    neighbour_idx = tf.py_func(DP.knn_search, [batch_xyz, batch_xyz, cfg.k_n], tf.int32)
    sub_points = batch_xyz[:, :tf.shape(batch_xyz)[1] // cfg.sub_sampling_ratio[i], :]
    pool_i = neighbour_idx[:, :tf.shape(batch_xyz)[1] // cfg.sub_sampling_ratio[i], :]
    up_i = tf.py_func(DP.knn_search, [sub_points, batch_xyz, 1], tf.int32)
    input_points.append(batch_xyz)
    input_neighbors.append(neighbour_idx)
    input_pools.append(pool_i)
    input_up_samples.append(up_i)
    batch_xyz = sub_points
input_list = input_points + input_neighbors + input_pools + input_up_samples
input_list += [batch_features, batch_labels, batch_pc_idx, batch_cloud_idx]
It can't find tf.py_func.
PS: I tried using tf.compat.v1.py_func instead and it didn't work.
tf.py_func is designed for TensorFlow 1.
In TensorFlow 2, tf.numpy_function is a near-exact replacement: just drop the stateful argument (all tf.numpy_function calls are considered stateful). It is compatible with eager execution and tf.function.
tf.py_function is a close but not an exact replacement, passing TensorFlow tensors to the wrapped function instead of NumPy arrays, which provides gradients and can take advantage of accelerators.
I want to run a kernel ridge regression in Python using sklearn.kernel_ridge.KernelRidge with a custom kernel (the Wendland kernel) that is not implemented in scikit-learn, so I have to provide a callable (I want to avoid the 'precomputed' option in order to keep it in line with my other models). The problem is that the callable has to return a single float, so it is called once for each pair of data points, which makes training really slow.
Looking at a similar model, sklearn.svm.SVR, one has to provide a callable kernel function that returns the whole kernel matrix at once, which makes it much faster.
So my question is: is it possible to make KernelRidge accept a callable that provides the Gram matrix in one step, in order to speed up the process? Are there other alternatives?
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import check_pairwise_arrays, euclidean_distances
def Wendland_kernel(eps=None):
    # Kernel I want to use and am allowed to use with SVM.SVR
    def Wendland_gram_intern(X, Y=None, eps=eps):
        X, Y = check_pairwise_arrays(X, Y)
        if eps is None:
            eps = 1.0 / X.shape[1]
        K = euclidean_distances(X, Y, squared=False)
        K = 1 - eps*K
        return np.maximum(K, 0)**2
    return Wendland_gram_intern
def Wendland_single(eps=None):
    # Kernel I have to use
    def Wendland_single_intern(x1, y1, eps=eps):
        K = np.linalg.norm(x1-y1)
        K = 1 - eps*K
        return np.maximum(K, 0)**2
    return Wendland_single_intern
X = np.random.random((10,2))
y = np.random.normal(size=(10,))
clf = KernelRidge(kernel=Wendland_single(eps=2.5))
clf.fit(X, y)
print(clf.predict([[0.5,0.5]]))
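To make the cost difference concrete, here is a minimal NumPy-only sketch comparing the two ways of evaluating the kernel. The names wendland_pair and wendland_gram are illustrative stand-ins for the two functions above (they are not scikit-learn APIs), and eps is assumed to be given explicitly. The pairwise version makes one Python call per pair of rows, while the vectorized version builds the whole Gram matrix with broadcasting:

```python
import numpy as np

def wendland_pair(x1, y1, eps):
    # Kernel value for a single pair of points (one Python call per pair).
    return max(1.0 - eps * np.linalg.norm(x1 - y1), 0.0) ** 2

def wendland_gram(X, Y, eps):
    # Whole Gram matrix at once: pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - Y[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.maximum(1.0 - eps * dist, 0.0) ** 2

rng = np.random.default_rng(0)
X = rng.random((10, 2))
eps = 2.5

# n * n separate calls versus a single vectorized evaluation.
gram_loop = np.array([[wendland_pair(a, b, eps) for b in X] for a in X])
gram_vec = wendland_gram(X, X, eps)
print(np.allclose(gram_loop, gram_vec))
```

Both routes give the same matrix; only the number of Python-level calls differs. A matrix built this way is what the 'precomputed' option of KernelRidge expects, which is the route the question would prefer to avoid.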
I'm trying to define a gradient method for my custom TF operation. Most of the solutions I have found online seem to be based on a gist by harpone. I'm reluctant to use that approach because it uses py_func, which won't run on GPU. I found another solution here that uses tf.identity(), which looks more elegant and which I think will run on GPU. However, I have some problems accessing the inputs of the ops in my custom gradient function. Here's my code:
@tf.RegisterGradient('MyCustomGradient')
def _custom_gradient(op, gradients):
    x = op.inputs[0]
    return(x)
def my_op(w):
    return tf.pow(w,3)
var_foo = tf.Variable(5, dtype=tf.float32)
bar = my_op(var_foo)
g = tf.get_default_graph()
with g.gradient_override_map({'Identity': 'MyCustomGradient'}):
    bar = tf.identity(bar)
g = tf.gradients(bar, var_foo)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(g))
I was expecting _custom_gradient() to return the input to the op (5 in this example) but instead it seems to return op output x gradient. My custom my_op will have non-differentiable operations like tf.sign and I'd like to define my custom gradient based on the inputs. What am I doing wrong?
There is no problem with your code:
Let's first do the forward pass:
var_foo = 5 -> bar = 125 -> tf.identity(bar) = 125
Now let's backpropagate:
The gradient of tf.identity(bar) with respect to its argument bar is (by your definition) equal to bar, that is, 125. The gradient of bar with respect to var_foo is 3 times the square of var_foo, which is 75. Multiply the two, and you get 9375, which is indeed the output of your code.
op.inputs[0] contains the forward-pass value of the op's input. In this case, the input to the identity op is 125.
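The arithmetic above can be replayed in plain Python. This is only a sketch of the chain rule as described, with ordinary floats standing in for the tensors (the variable names are illustrative, no TensorFlow involved):

```python
# Forward pass: var_foo -> my_op (cube) -> identity
var_foo = 5.0
bar = var_foo ** 3                  # output of my_op: 125.0
identity_input = bar                # what op.inputs[0] holds for the Identity op

# Backward pass
grad_identity = identity_input      # overridden Identity gradient returns op.inputs[0]
grad_pow = 3.0 * var_foo ** 2       # d(w^3)/dw at w = 5: 75.0
total = grad_identity * grad_pow    # chain rule: 125 * 75

print(total)  # 9375.0
```

Note that the overridden gradient ignores the incoming gradient entirely; it is attached to the Identity op, so the only input it can see is the output of my_op, not var_foo.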
I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient).
This question asks how to write an op in python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python
The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html
In my case, I am prototyping so I don't care about whether it runs on GPU, and I don't care about it being usable from anything other than the TF python API.
Yes, as mentioned in @Yaroslav's answer, it is possible, and the key is the links he references: here and here. I want to elaborate on this answer by giving a concrete example.
Modulo operation: Let's implement the element-wise modulo operation in TensorFlow (it already exists, but its gradient is not defined; for the sake of the example we will implement it from scratch).
NumPy function: The first step is to define the operation we want for NumPy arrays. The element-wise modulo operation is already implemented in NumPy, so it is easy:
import numpy as np
def np_mod(x,y):
    return (x % y).astype(np.float32)
The .astype(np.float32) is needed because TensorFlow uses float32 by default and will complain if you give it float64 (the NumPy default).
Gradient function: Next we need to define the gradient function for our operation, for each input of the operation, as a TensorFlow function. The function needs to take a very specific form: it takes the TensorFlow representation of the operation, op, and the gradient of the output, grad, and says how to propagate the gradients. In our case, the gradients of the mod operation are easy: the derivative is 1 with respect to the first argument and $-\lfloor x/y \rfloor$ with respect to the second (almost everywhere; the derivative does not exist at the isolated points where x/y is an integer, but let's ignore that, see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have
def modgrad(op, grad):
    x = op.inputs[0]  # the first argument (normally you need those to calculate the gradient, like the gradient of x^2 is 2x)
    y = op.inputs[1]  # the second argument
    return grad * 1, grad * tf.neg(tf.floordiv(x, y))  # the propagated gradient with respect to the first and second argument respectively
The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return tensorflow functions of the input.
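As a sanity check on the derivative with respect to the second argument, here is a small pure-Python sketch comparing -floor(x/y) with a central finite difference. The helper names are made up for this illustration, the inputs are assumed positive, and the sample points are chosen away from the spots where x/y is an integer:

```python
import math

def mod_grad_wrt_y(x, y):
    # Analytic derivative of x % y with respect to y (valid almost everywhere).
    return -math.floor(x / y)

def numeric_grad_wrt_y(x, y, h=1e-6):
    # Central finite difference of x % y in y.
    return ((x % (y + h)) - (x % (y - h))) / (2 * h)

xs = [0.3, 0.7, 1.2, 1.7]
ys = [0.2, 0.5, 1.0, 2.9]

analytic = [mod_grad_wrt_y(x, y) for x, y in zip(xs, ys)]
numeric = [numeric_grad_wrt_y(x, y) for x, y in zip(xs, ys)]
print(analytic)  # [-1, -1, -1, 0]
print(numeric)
```

The two lists agree up to floating-point error, which is the behaviour modgrad encodes with tf.floordiv.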
Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc].
Copying the code from harpone we can modify the tf.py_func function to make it define the gradient at the same time:
import tensorflow as tf
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
The stateful option tells TensorFlow whether the function always gives the same output for the same input (stateful=False), in which case TensorFlow can simplify the graph; this is our case and will probably be the case in most situations.
Combining it all together: Now that we have all the pieces, we can combine them all together:
from tensorflow.python.framework import ops
def tf_mod(x,y, name=None):
    with ops.op_scope([x,y], name, "mod") as name:
        z = py_func(np_mod,
                    [x,y],
                    [tf.float32],
                    name=name,
                    grad=modgrad)  # <-- here's the call to the gradient
        return z[0]
tf.py_func acts on lists of tensors (and returns a list of tensors), that is why we have [x,y] (and return z[0]).
And now we are done. And we can test it.
Test:
with tf.Session() as sess:
    x = tf.constant([0.3,0.7,1.2,1.7])
    y = tf.constant([0.2,0.5,1.0,2.9])
    z = tf_mod(x,y)
    gr = tf.gradients(z, [x,y])
    tf.initialize_all_variables().run()
    print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())
[ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]
Success!
Here's an example of adding gradient to a specific py_func
https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
Here's the issue discussion
A reproducible example to anchor the discussion:
from sklearn.linear_model import RidgeCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale
boston = scale(load_boston().data)
target = load_boston().target
import numpy as np
alphas = np.linspace(1.0,200.0, 5)
fit0 = RidgeCV(alphas=alphas, store_cv_values = True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_
fit0.cv_values_[:,0]
The question: what formula is used to compute fit0.cv_values_?
Edit:
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
where fit1 is a ridge regression with alpha = 1.0, fitted to the data-set from which observation 0 was removed.
Let's see:
1) create new dataset with first row of original dataset removed:
from sklearn.linear_model import Ridge
boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) check the MSE of that model on the first data-point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
It is array([ 37.64650853]), which is not the same as the first entry of fit0.cv_values_[:,0], i.e.:
fit0.cv_values_[:,0][0]
which is 37.495629960571137
What gives?
Quoting from the Sklearn documentation:
Cross-validation values for each alpha (if store_cv_values=True and
cv=None). After fit() has been called, this attribute will contain the
mean squared errors (by default) or the values of the
{loss,score}_func function (if provided in the constructor).
Since you have not provided a scoring function in the constructor, nor anything for the cv argument, this attribute should store the mean squared error for each sample using leave-one-out cross-validation. The general formula for the mean squared error is
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2$$
where $\hat{Y}_i$ (the Y with the cap) is the prediction of your regressor and $Y_i$ is the true value.
In your case, you are doing leave-one-out cross-validation. Therefore, in every fold you have only 1 test point and thus n = 1. So fit0.cv_values_[:,0] simply gives you the squared error for every point in your training data set when it was the test fold and alpha was 1.0.
Hope that helps.
Let's look - it's open source after all
The first call to fit makes a call upwards to its parent, _BaseRidgeCV (line 997, in that implementation). We haven't provided a cross-validation generator, so we make another call upwards to _RidgeGCV.fit. There's plenty of math in the documentation of this function, but we're so close to the source that I'll let you go and read about it.
Here's the actual source
v, Q, QT_y = _pre_compute(X, y)
n_y = 1 if len(y.shape) == 1 else y.shape[1]
cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
C = []
scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
error = scorer is None
for i, alpha in enumerate(self.alphas):
    weighted_alpha = (sample_weight * alpha
                      if sample_weight is not None
                      else alpha)
    if error:
        out, c = _errors(weighted_alpha, y, v, Q, QT_y)
    else:
        out, c = _values(weighted_alpha, y, v, Q, QT_y)
    cv_values[:, i] = out.ravel()
    C.append(c)
Note the un-exciting pre_compute function
def _pre_compute(self, X, y):
    # even if X is very sparse, K is usually very dense
    K = safe_sparse_dot(X, X.T, dense_output=True)
    v, Q = linalg.eigh(K)
    QT_y = np.dot(Q.T, y)
    return v, Q, QT_y
Abhinav has explained what's going on at a mathematical level: it's simply accumulating the weighted mean squared error. The details of their implementation, and where it differs from yours, can be evaluated step by step from the code.
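To see why an eigendecomposition of K = X Xᵀ is enough to get every leave-one-out error without refitting, here is a minimal NumPy sketch. It is not scikit-learn's code: it assumes a ridge model with no intercept and no sample weights, uses random data, and all variable names are illustrative. It relies on the standard identity that, with G = K + alpha·I and c = G⁻¹y, the leave-one-out residual for sample i is c_i / (G⁻¹)_ii, and checks that against explicitly refitting with each sample held out:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
alpha = 1.0

# Closed form from the eigendecomposition of K = X X^T.
K = X @ X.T
v, Q = np.linalg.eigh(K)
QT_y = Q.T @ y
w = 1.0 / (v + alpha)            # eigenvalues of G^{-1}, with G = K + alpha * I
c = Q @ (w * QT_y)               # dual coefficients c = G^{-1} y
G_inv_diag = (Q ** 2) @ w        # diagonal of G^{-1}
loo_closed = (c / G_inv_diag) ** 2

# Explicit leave-one-out: refit ridge (no intercept) with sample i removed.
n_samples, n_features = X.shape
loo_explicit = np.empty(n_samples)
for i in range(n_samples):
    keep = np.arange(n_samples) != i
    Xi, yi = X[keep], y[keep]
    coef = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(n_features), Xi.T @ yi)
    loo_explicit[i] = (X[i] @ coef - y[i]) ** 2

print(np.allclose(loo_closed, loo_explicit))
```

In this simplified setting the two agree. Any difference seen against a manual refit with Ridge would then have to come from details this sketch leaves out, which is what stepping through the source should reveal.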