Define recursive function in pytorch - python

Hi, I need to define the following recursive function in PyTorch, which is updated at each new time step:
y_{t+1}(x) = y_t(x)^(1/(1 + eta_t*lambda_t)) * exp(-eta_t*f_t(x) / (1 + eta_t*lambda_t))
I wanted to know the most efficient way to define this function, given that I will need to evaluate it many times.
I thought about keeping lists where I save the values eta_t, lambda_t, f_t(x) at each timestep t, but that seems time-consuming, given that I would need to iterate over the lists to obtain y_{t+1}(x).
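Concretely, that list-based evaluation would look something like this sketch in log space (the function and argument names are mine):

def eval_log_y(x, log_y0, etas, lambdas, fs):
    # Replay the recursion log y_{t+1}(x) = (log y_t(x) - eta_t*f_t(x)) / (1 + eta_t*lambda_t)
    val = log_y0(x)
    for eta, lam, f in zip(etas, lambdas, fs):
        val = (val - eta * f(x)) / (1 + eta * lam)
    return val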
Does anyone with more experience in PyTorch know the best approach for this?
Right now, given that the function is a probability function, I am working in log probability, and I have defined a class just for that function, with the following method called at each timestep to update log_y:
def update(self, eta_t, lambda_t, f_t):
    self.log_y = lambda x: 1/(1 + eta_t*lambda_t) * (self.log_y(x) - eta_t*f_t(x))
But I get the following error once I call update and I try to evaluate log_y on a value of x:
RecursionError: maximum recursion depth exceeded
It seems that the updated log_y is calling itself recursively.
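A likely cause, and a sketch of a fix: the lambda looks up self.log_y when it is called, not when it is defined, and by the time it runs, self.log_y already points at the lambda itself, so it recurses forever. Binding the current function to a local variable first avoids this:

def update(self, eta_t, lambda_t, f_t):
    old_log_y = self.log_y  # capture the current function now, not at call time
    self.log_y = lambda x: (old_log_y(x) - eta_t * f_t(x)) / (1 + eta_t * lambda_t)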


scipy minimize COBYLA callback not working

I am using scipy.optimize's minimize function, and I would like to terminate the search as soon as the function value is below some threshold. I have tried using a callback that returns true when the above-mentioned condition is met, but in my code the search just continues.
I also have another "fundamental" issue with the callback structure the documentation requires: my function is pretty expensive to evaluate, and with the callback I evaluate it twice for the same set of parameters (once in the callback and a second time in the actual iteration); if I could be spared the extra computational cost, that would also be nice.
Below is my code:
class MinimizeStopper(object):
    def __init__(self, maximal_non_overlap=0.05):
        self.max_non_overlap = maximal_non_overlap

    def __call__(self, xk):
        res = fit_min(xk)
        return (res <= self.max_non_overlap)

my_cb = MinimizeStopper(0.1)
print(scipy.optimize.minimize(fit_min, ansatz_params[1], callback=my_cb.__call__, method='COBYLA'))
I guess scipy.optimize.minimize's documentation is not 100% clear with regard to callbacks. AFAIK, only the 'trust-constr' method gets terminated by the callback once the callback returns True. For all the remaining methods, the callback can only be used for logging purposes.
Regarding the second part of your question, I cite the docs:
For ‘trust-constr’ it is a callable with the signature:
callback(xk, OptimizeResult state) -> bool
where xk is the current parameter vector, and state is an OptimizeResult object, with the same fields as the ones from the return.
Thus, assuming you're open to 'trust-constr', you don't need to evaluate your objective function again, since you can directly access the objective value of the current iteration via state.fun:
from scipy.optimize import minimize

def cb(xk, state, threshold_value=0.1):
    return state.fun <= threshold_value

res = minimize(your_fun, x0=x0, callback=cb, method="trust-constr")
Since your objective function is expensive to evaluate, it's highly recommended to pass the gradient and Hessian as well, if your objective is twice continuously differentiable and both are known. Otherwise, both get approximated by finite differences, which will be quite slow.
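For instance, assuming you have analytic gradient and Hessian functions (your_grad and your_hess are hypothetical names here), the call could look like:

res = minimize(your_fun, x0=x0, jac=your_grad, hess=your_hess,
               callback=cb, method="trust-constr")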

How to define a python function 'on the fly' for use with pymanopt/autodifferentiation

I had no idea how to phrase the title of this question, so apologies for any confusion there. I am using the pymanopt package for optimization and would like to be able to create some sort of a function/method that allows for a generalized input (a variable number of input arrays). To use pymanopt, one has to provide a cost function defined in terms of the arrays that are to be optimized to minimize the cost.
For example, a cost function could be:
@pymanopt.function.Autograd
def f(A, B):
    return ((X - A @ B.T)**2).sum()
To do the optimization, the variable X is defined prior to f, then f is supplied as the cost function to the pymanopt solver. Optimization is done with respect to the arguments of f and these arrays are returned by pymanopt with values that minimize the cost function.
Ideally, I would like to be able to do this definition more dynamically. So instead of defining a function in terms of hard coded arrays, to be able to supply a list of variables to be optimized. So if my cost function was instead:
@pymanopt.function.Autograd
def f(L):
    return ((X - np.linalg.multi_dot(L))**2).sum()
Where the arrays A,B,...,C would be stored in a list, L. However, as far as I can tell, the variables to be optimized have to be directly defined as individual arrays in the cost function supplied to the solver.
The only thing I can think of doing is to define the cost function by creating a string that contains the 'hard-coded' function and executing it via exec(), with something like this:
args = ','.join(['A{}'.format(i) for i in range(len(L))])
exec('@pymanopt.function.Autograd\ndef f({}):\n\treturn ((X - np.linalg.multi_dot([{}]))**2).sum()'.format(args, args))
but I understand that using this method should be avoided if possible. Any advice for navigating this sort of problem is greatly appreciated - thanks! Please let me know if anything is unclear/doesn't make sense.
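One exec-free pattern that might be worth trying is a factory that closes over X and uses a variadic signature; this is a sketch only, and it assumes (untested) that the pymanopt decorator accepts a cost function with a *args signature:

def make_cost(X):
    @pymanopt.function.Autograd
    def cost(*arrays):  # assumption: the decorator tolerates a variadic signature
        return ((X - np.linalg.multi_dot(list(arrays)))**2).sum()
    return cost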

Python recursive algorithm segmentation fault

I'm pretty bad with recursion as it is, but this algorithm naturally seems like it's best done recursively. Basically, I have a list of all the function calls made in a C program, throughout multiple files. This list is unordered. My recursive algorithm attempts to make a tree of all the functions called, starting from the main method.
This works perfectly fine for smaller programs, but when I tried it out with larger ones I got this error. I read that the issue might be due to exceeding the C stack limit, since I already tried raising the recursion limit in Python.
Would appreciate some help here, thanks.
functions is a set containing the function calls and their info, of type Function. The data in node is of type Function.
@dataclass
class Function:
    name: str
    file: str
    id: int
    calls: set
    ....
Here's the algorithm.
def order_functions(node, functions, defines):
    calls = set()
    # Checking if the called function is user-defined
    for call in node.data.calls:
        if call in defines:
            calls.add(call)
    node.data.calls = calls
    if len(calls) == 0:
        return node
    for call in node.data.calls:
        child = Node(next((f for f in functions if f.name == call), None))
        node.add_child(child)
        Parser.order_functions(child, functions, defines)
    return node
If you exceed the predefined limit on the call stack size, the best idea is probably to rewrite your program iteratively. If you have no idea how deep your recursion will go, then don't use recursion.
More information here, and maybe if you need to implement an iterative version you can get inspiration from this post.
The main information here is that Python doesn't perform any tail-recursion elimination, so recursive functions will never work reliably on inputs with an unknown/unbounded hierarchical structure.
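For illustration, here is a minimal iterative sketch of order_functions using an explicit stack, assuming the same Node/Function types as in the question:

def order_functions_iterative(root, functions, defines):
    stack = [root]
    while stack:
        node = stack.pop()
        # Keep only the user-defined calls
        node.data.calls = {c for c in node.data.calls if c in defines}
        for call in node.data.calls:
            child = Node(next((f for f in functions if f.name == call), None))
            node.add_child(child)
            stack.append(child)
    return root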

How should I use @pm.stochastic in PyMC?

Fairly simple question: How should I use @pm.stochastic? I have read some blog posts that claim @pm.stochastic expects a negative log value:
@pm.stochastic(observed=True)
def loglike(value=data):
    # some calculations that generate a numeric result
    return -np.log(result)
I tried this recently but got really bad results. Since I also noticed that some people used np.log instead of -np.log, I gave it a try and it worked much better. What is @pm.stochastic really expecting? I'm guessing there was some confusion about the required sign due to a very popular example using something like np.log(1/(1+t_1-t_0)), which was written as -np.log(1+t_1-t_0).
Another question: what is this decorator doing with the value argument? As I understand it, we start with some proposed value for the priors that enter the likelihood, and the idea of @pm.stochastic is basically to produce some number to compare against the number generated by the previous iteration of the sampling process. The likelihood should receive the value argument and some values for the priors, but I'm not sure this is all value is doing, because it's the only required argument and yet I can write:
@pm.stochastic(observed=True)
def loglike(value=[1]):
    data = [3,5,1]  # some data
    # some calculations that generate a numeric result
    return np.log(result)
And as far as I can tell, that produces the same result as before. Maybe it works this way because I added observed=True to the decorator. If I had tried this with a stochastic variable where observed=False (the default), value would be changed at each iteration in an attempt to obtain a better likelihood.
@pm.stochastic is a decorator, so it is expecting a function. The simplest way to use it is to give it a function that includes value as one of its arguments and returns a log-likelihood.
You should use the @pm.stochastic decorator to define a custom prior for a parameter in your model. You should use the @pm.observed decorator to define a custom likelihood for data. Both of these decorators will create a pm.Stochastic object, which takes its name from the function it decorates and has all the familiar methods and attributes (here is a nice article on Python decorators).
Examples:
A parameter a that has a triangular distribution a priori:
@pm.stochastic
def a(value=.5):
    if 0 <= value < 1:
        return np.log(1. - value)
    else:
        return -np.inf
Here value=.5 is used as the initial value of the parameter, and changing it to value=1 raises an exception, because it is outside of the support of the distribution.
A likelihood b that has a normal distribution centered at a, with a fixed precision:
@pm.observed
def b(value=[.2, .3], mu=a):
    return pm.normal_like(value, mu, 100.)
Here value=[.2,.3] is used to represent the observed data.
I've put this together in a notebook that shows it all in action here.
Yes, confusion is easy, since @pm.stochastic expects a log-likelihood, which is essentially the opposite of an error. So you take the negative log of your custom error function and return that as your log-likelihood.
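As a sketch of that sign convention (compute_error is a hypothetical function returning a positive error where smaller is better):

@pm.stochastic(observed=True)
def loglike(value=data):
    err = compute_error(value)  # hypothetical: positive, smaller is better
    return -np.log(err)         # log-likelihood grows as the error shrinks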

Can I pass the objective and derivative functions to scipy.optimize.minimize as one function?

I'm trying to use scipy.optimize.minimize to minimize a complicated function. I noticed in hindsight that the minimize function takes the objective and derivative functions as separate arguments. Unfortunately, I've already defined a function which returns the objective function value and first-derivative values together -- because the two are computed simultaneously in a for loop. I don't think there is a good way to separate my function into two without the program essentially running the same for loop twice.
Is there a way to pass this combined function to minimize?
(FYI, I'm writing an artificial neural network backpropagation algorithm, so the for loop is used to loop over training data. The objective and derivatives are accumulated concurrently.)
Yes, you can pass them in a single function:
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sin(x) + x**2, np.cos(x) + 2*x

sol = minimize(f, [0], jac=True, method='L-BFGS-B')
Something that might work: you can memoize the function, meaning that if it gets called with the same inputs a second time, it will simply return the same outputs corresponding to those inputs without doing any actual work the second time. What is happening behind the scenes is that the results are getting cached. In the context of a nonlinear program, there could be thousands of calls, which implies a large cache. Often with memoizers you can specify a cache limit, and the population will be managed FIFO. In other words, you still benefit fully in your particular case, because the inputs will be the same only when you need to return the function value and derivative around the same point in time. So a small cache should suffice.
You don't say whether you are using Python 2 or Python 3. In Python 3.2+, you can use functools.lru_cache as a decorator to provide this memoization. Then you write your code like this:
import functools

@functools.lru_cache(maxsize=32)
def original_fn(x):
    # ... the expensive loop computing both value and derivative ...
    return fnvalue, fnderiv

def new_fn_value(x):
    fnvalue, fnderiv = original_fn(tuple(x))  # tuple(): numpy arrays aren't hashable
    return fnvalue

def new_fn_deriv(x):
    fnvalue, fnderiv = original_fn(tuple(x))
    return fnderiv
Then you pass each of the new functions to minimize. You still pay a penalty for the second call, but it will do no work if x is unchanged. You will need to research what "unchanged" means in the context of floating-point numbers, particularly since the changes in x will become tiny as the minimization begins to converge.
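A usage sketch, assuming x0 is your starting point:

res = minimize(new_fn_value, x0, jac=new_fn_deriv, method='L-BFGS-B')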
There are lots of recipes for memoization in Python 2.x if you look around a bit.
Did I make any sense at all?
