I'm new to TensorFlow and am trying to understand how to use it outside of a machine learning context. I would like to optimize a Python function with the ADAM implementation of TensorFlow.
Let's assume I have the following function:
def fun_test(x):
    """
    :param x: list of parameters, e.g. [1, 2, 3]
    :return: real value
    """
    res = do_something(x)
    return res
When using SciPy, I would call scipy.optimize.minimize(fun_test, x0, method="Nelder-Mead"). How could I do this with TensorFlow?
Best,
Michael
You need to rewrite the function do_something to take tensors as inputs and return a scalar tensor (i.e. to build a computation graph). The following code is a sketch of how to perform the optimization. (By the way, in your code fun_test and do_something have no real difference, so I picked the latter.)
x = tf.get_variable("x", dtype=..., initializer=...)
target = do_something(x)
opt = tf.train.AdamOptimizer(...).minimize(target)  # defines one optimization step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize x
    NUM_STEPS = 1000
    for _ in range(NUM_STEPS):
        sess.run(opt)           # run optimization for NUM_STEPS steps
    print(sess.run(x))          # show the values of x
    print(sess.run(target))     # show the target value
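For reference, the same optimization can be sketched in TensorFlow 2 without sessions (a minimal sketch, assuming do_something takes a tf.Variable and returns a scalar tensor; the initial value and learning rate here are illustrative):

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])                   # initial guess for the parameters
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

for _ in range(1000):
    opt.minimize(lambda: do_something(x), var_list=[x])  # one Adam step per call

print(x.numpy())                # optimized parameters
print(do_something(x).numpy())  # final target value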
With the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I would still like to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a Keras Sequential model). For the optimizer to work, it requires as input a function fun(x0) with x0 being an array of shape (n,). Therefore, the first step is to "flatten" the weight matrices to obtain a vector of the required shape. To this end, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss function does not decrease. I would be really grateful if someone could help me work this out.
Here is the piece of code I am using:
func = function_factory(model, loss_function, x_u_train, u_train)
# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)
# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')
def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)
def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss_f [in]: a function with signature loss_value = loss_f(x_u_train, u_train, model).
        x_u_train [in]: the input part of the training data.
        u_train [in]: the output part of the training data.

    Returns:
        A function with the signature loss_value = f(model_parameters).
    """
    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare the required information first
    count = 0
    idx = []   # stitch indices
    part = []  # partition indices
    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count + n, dtype=tf.int32), shape))
        part.extend([i] * n)
        count += n
    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """Updates the model's parameters from a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create the function that will be returned by this factory
    def f(params_1d):
        """A function created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """
        # update the parameters of the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)
        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)
        return loss_value

    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
Here, model is a tf.keras.Sequential object.
Thank you in advance for any help!
Changing from TF1 to TF2 I was faced with the same question, and after a little experimenting I found the solution below, which shows how to establish the interface between a function decorated with tf.function and a SciPy optimizer. The important changes compared to the question are:

As mentioned by Ives, SciPy's L-BFGS needs to receive the function value and the gradient, so you need to provide a function that delivers both and then set jac=True.

SciPy's L-BFGS is a Fortran routine that expects np.float64 arrays at the interface, while TensorFlow's tf.function typically uses tf.float32, so one has to cast the inputs and outputs.

Below is an example of how this can be done for a toy problem.
import tensorflow as tf
import numpy as np
import scipy.optimize as sopt

def model(x):
    return tf.reduce_sum(tf.square(x - tf.constant(2, dtype=tf.float32)))

@tf.function
def val_and_grad(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad

def func(x):
    return [vv.numpy().astype(np.float64) for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]

resdd = sopt.minimize(fun=func, x0=np.ones(5),
                      jac=True, method='L-BFGS-B')

print("info:\n", resdd)
This displays:
info:
      fun: 7.105427357601002e-14
 hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
       -2.38418579e-07])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 3
      nit: 2
   status: 0
  success: True
        x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
Benchmark
To compare speed, I use the L-BFGS optimizer on a style-transfer problem (see here for the network). Note that for this problem the network parameters are fixed and the input signal is adapted. As the optimized parameters (the input signal) are 1D, the function factory is not needed.
I compare four implementations:
TF1.12: TF1 with ScipyOptimizerInterface
TF2.0 (E): the approach above without tf.function decorators
TF2.0 (G): the approach above with tf.function decorators
TF2.0/TFP: using the L-BFGS minimizer from tensorflow_probability
For this comparison the optimization is stopped after 300 iterations (generally, convergence on this problem requires about 3000 iterations).
Results
Method       runtime (300 it)   final loss
TF1.12       240 s              0.045 (baseline)
TF2.0 (E)    299 s              0.045
TF2.0 (G)    233 s              0.045
TF2.0/TFP    226 s              0.053
The TF2.0 eager mode (TF2.0 (E)) works correctly but is about 20% slower than the TF1.12 baseline. TF2.0 (G) with tf.function works fine and is marginally faster than TF1.12, which is good to know.
The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0 (G) using SciPy's L-BFGS, but does not achieve the same error reduction. In fact, the decrease of the loss over time is not monotonic, which seems a bad sign. Comparing the two implementations of L-BFGS (SciPy and tensorflow_probability = TFP), it is clear that the Fortran code in SciPy is significantly more sophisticated. So either the simplification of the algorithm in TFP is harmful here, or the fact that TFP performs all calculations in float32 may be the problem.
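For reference, a minimal sketch of the TFP variant on the same toy problem (assuming tensorflow_probability is installed; val_and_grad is the decorated function from the example above):

import tensorflow_probability as tfp

# tfp.optimizer.lbfgs_minimize expects a callable returning (loss, gradient),
# which is exactly what val_and_grad provides
results = tfp.optimizer.lbfgs_minimize(
    val_and_grad,
    initial_position=tf.ones(5),
    max_iterations=100)

print(results.position)   # optimized parameters
print(results.converged)  # whether the tolerance was reached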
Here is a simple solution using a library (autograd_minimize) that I wrote, building on the answer of Roebel:
import numpy as np
import tensorflow as tf
from autograd_minimize import minimize

def rosen_tf(x):
    return tf.reduce_sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

res = minimize(rosen_tf, np.array([0., 0.]))
print(res.x)
>>> array([0.99999912, 0.99999824])
It also works with keras models as shown with this naive example of linear regression:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from autograd_minimize.tf_wrapper import tf_function_factory
from autograd_minimize import minimize
import tensorflow as tf

#### Prepares the data
X = np.random.random((200, 2))
y = X[:, :1]*2 + X[:, 1:]*0.4 - 1

#### Creates the model
model = keras.Sequential([keras.Input(shape=2),
                          layers.Dense(1)])

# Transforms the model into a function of its parameters
func, params = tf_function_factory(model, tf.keras.losses.MSE, X, y)

# Minimization
res = minimize(func, params, method='L-BFGS-B')
print(res.x)
>>> [array([[2.0000016 ],
       [0.40000062]]), array([-1.00000164])]
I guess SciPy does not know how to calculate gradients of TensorFlow objects. Try to use the original function factory (i.e., the one that also returns the gradients along with the loss), and set jac=True in scipy.optimize.minimize.
I tested the Python code from the original Gist and replaced tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It worked with the BFGS method:
results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')
jac=True means SciPy knows that func also returns gradients.
For L-BFGS-B, however, it's tricky. After some effort, I finally made it work. I had to comment out the @tf.function lines and let func return grads.numpy() instead of the raw TF tensor. I guess that's because the underlying implementation of L-BFGS-B is a Fortran function, so there might be an issue converting data from tf.Tensor -> numpy array -> Fortran array. Forcing func to return the ndarray version of the gradients resolves the problem. But then it's not possible to use @tf.function.
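A minimal sketch of that workaround, assuming func is the factory-made function returning the loss and gradients as tensors (func_float64 is a hypothetical helper name):

import numpy as np

def func_float64(params_1d):
    # cast SciPy's float64 parameters to float32 for TF, then hand plain
    # float64 ndarrays back to the Fortran L-BFGS-B routine
    loss, grads = func(tf.cast(params_1d, tf.float32))
    return loss.numpy().astype(np.float64), grads.numpy().astype(np.float64)

results = scipy.optimize.minimize(fun=func_float64, x0=init_params.numpy(),
                                  jac=True, method='L-BFGS-B')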
(Similar Question to: Is there a tf.keras.optimizers implementation for L-BFGS?)
While this is nowhere near as legit as tf.contrib, it's an implementation of L-BFGS (and any other scipy.optimize.minimize solver) for your consideration, in case it fits your use case:
https://pypi.org/project/kormos/
https://github.com/mbhynes/kormos
The package provides models that extend keras.Model and keras.Sequential, and they can be compiled with .compile(..., optimizer="L-BFGS-B") to use L-BFGS in TF2, or with any of the other standard optimizers (because flipping between stochastic and deterministic training should be easy!); see the sketch after the list below:
kormos.models.BatchOptimizedModel
kormos.models.BatchOptimizedSequentialModel
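A usage sketch based only on the description above (the constructor arguments and fit call are illustrative assumptions; consult the kormos docs for the exact API):

import numpy as np
import kormos
from tensorflow.keras import layers

# a Sequential-style model whose .fit() can run a deterministic batch optimizer
model = kormos.models.BatchOptimizedSequentialModel([
    layers.Dense(8, activation="relu", input_shape=(2,)),
    layers.Dense(1),
])
model.compile(loss="mse", optimizer="L-BFGS-B")  # or any standard Keras optimizer

X = np.random.random((100, 2))
y = X.sum(axis=1, keepdims=True)
model.fit(X, y)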
How can I rewrite this sess.run call so that it runs on TensorFlow 2.0?
with tf.compat.v1.Session(graph=graph) as sess:
    start = time.time()
    results = sess.run(output_operation.outputs[0],
                       {input_operation.outputs[0]: t})
I read the documentation over here and learned that you have to change a function like this:
normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
sess = tf.compat.v1.Session()
result = sess.run(normalized)
return result
to this:
def myFunctionToReplaceSessionRun(resized, input_mean, input_std):
    return tf.divide(tf.subtract(resized, [input_mean]), [input_std])

normalized = myFunctionToReplaceSessionRun(resized, input_mean, input_std)
but I'm unable to figure out how to change the first one.
Here's a bit of context: I was trying out this codelab, and it was the sess.run call that was giving me trouble.
With TensorFlow 1.x, we used to create tf.placeholder tensors through which data could enter the graph, together with a feed_dict= and a tf.Session() object.
In TensorFlow 2.0, we can feed data to the graph directly, as eager execution is enabled by default. With the @tf.function annotation, we can include a function directly in our graph. The official docs say:
At the centre of this merger is tf.function, which allows you to transform a subset of Python syntax into portable, high-performance TensorFlow graphs.
Here's a simple example from the docs:
@tf.function
def simple_nn_layer(x, y):
    return tf.nn.relu(tf.matmul(x, y))

x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

simple_nn_layer(x, y)
Now, looking at your problem, you can convert your function like this:
@tf.function
def get_output_operation(input_op):
    # the body of the computation goes here
    # and `results` is returned from here
    ...

results = get_output_operation(some_input_op)
In simple and less precise words: the placeholder tensors become function arguments, and the tensor you passed to sess.run(tensor) becomes the function's return value. All this happens in a @tf.function-annotated function.
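For instance, a hedged sketch of that pattern applied to the first snippet in the question (model_fn is a hypothetical stand-in for the computation between the input and output operations of the loaded graph; with a frozen TF1 graph you would in practice obtain such a callable via the tf.compat.v1 wrappers):

import time
import tensorflow as tf

@tf.function
def run_inference(t):
    # the computation that previously sat between the input placeholder
    # and output_operation goes here
    return model_fn(t)

start = time.time()
results = run_inference(t)  # replaces sess.run(output_operation.outputs[0], {...: t})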
I want to use TensorFlow to calculate the gradients of a function. However, if I use tf.gradients, it returns a single list of gradients. How can I get a separate list for each sample in the batch?
# in a tensorflow graph I have the following code
tf_x = tf.placeholder(dtype=tf.float32, shape=(None,N_in), name='x')
tf_net #... conveniently defined neural network
tf_y = tf.placeholder(dtype=tf.float32, shape=(None,1), name='y')
tf_cost = (tf_net(tf_x) - tf_y)**2 # this should have length N_samples because I did not apply a tf.reduce_mean
tf_cost_gradients = tf.gradients(tf_cost,tf_net.trainable_weights)
If we run it in a TensorFlow session:
# suppose myx = np.random.randn(N_samples, N_in) and myy conveniently chosen
feed = {tf_x: myx, tf_y: myy}
sess.run(tf_cost_gradients, feed)
I get only one list, not a list for each sample as I would like. I can use
for i in range(len(myx)):
    feed = {tf_x: myx[i:i+1], tf_y: myy[i:i+1]}
    sess.run(tf_cost_gradients, feed)
but this is extremely slow! What can I do? Thank you
Although there is an aggregation_method parameter in tf.gradients, it is not easy to get the individual gradients.
aggregation_method: Specifies the method used to combine gradient terms.
Please see these threads:
https://github.com/tensorflow/tensorflow/issues/15760
https://github.com/tensorflow/tensorflow/issues/4897
In one of those threads (#4897), Ian Goodfellow makes the following suggestion for speeding up individual gradient computation:
This is only pseudocode, but the basic idea is:
examples = tf.split(batch)
weight_copies = [tf.identity(weights) for x in examples]
output = tf.stack([f(x, w) for x, w in zip(examples, weight_copies)])
cost = cost_function(output)
per_example_gradients = tf.gradients(cost, weight_copies)
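A minimal runnable sketch of that idea in TF1-style code, using a toy linear model (f, batch, and weights here are stand-ins for your own network and data):

import tensorflow as tf

N_samples, N_in = 4, 3
batch = tf.placeholder(tf.float32, shape=(N_samples, N_in))
weights = tf.Variable(tf.ones((N_in, 1)))

def f(x, w):
    return tf.matmul(x, w)  # toy single-example model

examples = tf.split(batch, N_samples)                     # one (1, N_in) slice per sample
weight_copies = [tf.identity(weights) for _ in examples]  # one weight copy per sample
output = tf.stack([f(x, w) for x, w in zip(examples, weight_copies)])
cost = tf.reduce_sum(tf.square(output))
# each copy is consumed by exactly one sample, so this yields per-example gradients
per_example_gradients = tf.gradients(cost, weight_copies)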
I have one question about random variables in TensorFlow. Let's suppose I need a random variable inside my loss function.
In the TensorFlow tutorials I find random functions used to initialize variables, such as weights, which are later modified by the training process.
In my case I need a random vector of floats (say 128 values), following a particular distribution (uniform or Gaussian), that changes at each loss calculation.
If I define this variable inside my loss function, is that all I need to do to get new values (still following the selected distribution) at each iteration, or will the values be the same across all iterations?
A random node in TensorFlow always takes a different value each time it is evaluated, as you can verify by running it several times:
import tensorflow as tf
x = tf.random_uniform(shape=())
sess = tf.Session()
sess.run(x)
# 0.79877698
sess.run(x)
# 0.76016617
It is not a Variable in TensorFlow terminology, as you can check from the code above, which runs without calling a variable initializer.
If you assign the randomly generated values to a Variable, then that value will remain fixed until you update the variable.
If, instead, you put the "generation" of the numbers (tf.random_*) directly in the loss function, they'll be different at each call.
Just try this out:
import tensorflow as tf

# generator
x = tf.random_uniform((3, 1), minval=0, maxval=10)
# variable
a = tf.get_variable("a", shape=(3, 1), dtype=tf.float32)
# assignment
b = tf.assign(a, x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5):
        # 5 different values
        print(sess.run(x))
    # assign the value
    sess.run(b)
    for i in range(5):
        # 5 equal values
        print(sess.run(a))
I am trying to compute the local variance map of an image, taking data from all possible fixed-size windows (e.g. 5x5), inside a training loop. To vectorize this operation I am thinking of expanding the original image with an operation similar to this using scatter_update/scatter_nd_update inside the training loop. What this operation essentially does is map each element in the original tensor to potentially many locations in the new tensor, where the locations are computed inside the training loop.
However, scatter_update does not allow gradient propagation, and my attempt at creating a simple custom gradient for scatter_update did not work.
@tf.RegisterGradient("CustomGrad")
def _clip_grad(unused_op, grad):
    return tf.constant(5., dtype=tf.float32, shape=(1,))  # tf.clip_by_value(grad, -0.1, 0.1)

x = tf.Variable([3.0], dtype=tf.float32)
y = tf.get_variable('y', shape=(1,), dtype=tf.float32)
g = tf.get_default_graph()
with g.gradient_override_map({"ScatterNdUpdate1": "CustomGrad"}):
    output = tf.scatter_nd_update(y, [[0]], x, name="ScatterNdUpdate1")
grad_custom = tf.gradients(output, y)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(grad_custom)
Running the code above shows that grad_custom contains None. Does anyone have an idea of how to properly implement a local variance map that can be used in a training loop? Solving the gradient problem would also help me with another problem I am having.
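For what it's worth, a fully differentiable local variance map can be sketched without scatter ops, using the identity Var(x) = E[x^2] - E[x]^2 with average pooling over each window (a minimal sketch, assuming an image tensor of shape (batch, height, width, channels) and a 5x5 window; local_variance_map is a hypothetical helper name):

import tensorflow as tf

def local_variance_map(image, window=5):
    # E[x] and E[x^2] over each window via average pooling (both differentiable)
    ksize = [1, window, window, 1]
    strides = [1, 1, 1, 1]
    mean = tf.nn.avg_pool(image, ksize=ksize, strides=strides, padding='SAME')
    mean_sq = tf.nn.avg_pool(tf.square(image), ksize=ksize, strides=strides, padding='SAME')
    return mean_sq - tf.square(mean)  # Var(x) = E[x^2] - E[x]^2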