I am implementing this code (found here: https://emukit.readthedocs.io/en/latest/notebooks/Emukit-tutorial-custom-model.html)
import numpy as np
from emukit.experimental_design import ExperimentalDesignLoop
from emukit.core import ParameterSpace, ContinuousParameter
from emukit.core.loop import UserFunctionWrapper
from sklearn.gaussian_process import GaussianProcessRegressor
x_min = -30.0
x_max = 30.0
X = np.random.uniform(x_min, x_max, (10, 1))
Y = np.sin(X) + np.random.randn(10, 1) * 0.05
sklearn_gp = GaussianProcessRegressor();
sklearn_gp.fit(X, Y);
from emukit.core.interfaces import IModel
class SklearnGPModel(IModel):
def __init__(self, sklearn_model):
self.model = sklearn_model
def predict(self, X):
mean, std = self.model.predict(X, return_std=True)
return mean[:, None], np.square(std)[:, None]
def set_data(self, X: np.ndarray, Y: np.ndarray) -> None:
self.model.fit(X, Y)
def optimize(self, verbose: bool = False) -> None:
# There is no separate optimization routine for sklearn models
pass
#property
def X(self) -> np.ndarray:
return self.model.X_train_
#property
def Y(self) -> np.ndarray:
return self.model.y_train_
emukit_model = SklearnGPModel(sklearn_gp)
p = ContinuousParameter('c', x_min, x_max)
space = ParameterSpace([p])
loop = ExperimentalDesignLoop(space, emukit_model)
loop.run_loop(np.sin, 50)
I am trying to implement this code but with the exteral data set. To do this, I need to understand if I can extract the 50 x-values propagated through the np.sin function when the loop.run_loop(np.sin, 50) is executed. Then, having obtained these 50 inputs (x-values), I need to propagate them in an external software, which saves the result as .csv file.
The information that I would have, that needs to be "put through" the loop.run_loop() is as follows:
So, I need to make the loop.run_loop() code work by loading an external results data but do now know how to implement that.
If i understand your question correctly, passing data does not make sense in this context. The default acquisition function will select the next input (or experiment) based on the your model. Your model is updated at each iteration from the outcome of your experiment and the next experiment is dependent on previous observations - it's not random.
Passing your samples independently of this loop would be significantly less informative.
In short, you need to define a function similar to np.sin that can be queried.
Hope this makes sense!
I'm attempting to write a custom Theano Op which numerically integrates a function between two values. The Op is a custom likelihood for PyMC3 which involves the numerical evaluation of some integrals. I can't simply use the #as_op decorator as I need to use HMC to do the MCMC step. Any help would be much appreciated, as this question seems to have come up several times but has never been solved (e.g. https://stackoverflow.com/questions/36853015/using-theano-with-numerical-integration, Theano: implementing an integral function).
Clearly one solution would be to write a numerical integrator within Theano, but this seems like a waste of effort when very good integrators are already available, for example through scipy.integrate.
To keep this as a minimal example, let's just try and integrate a function between 0 and 1 inside an Op. The following integrates a Theano function outside of an Op, and produces correct results as far as my testing has gone.
import theano
import theano.tensor as tt
from scipy.integrate import quad
x = tt.dscalar('x')
y = x**4 # integrand
f = theano.function([x], y)
print f(0)
print f(1)
ans = integrate.quad(f, 0, 1)[0]
print ans
However, attempting to do integration within an Op appears much harder. My current best effort is:
import numpy as np
import theano
import theano.tensor as tt
from scipy import integrate
class IntOp(theano.Op):
__props__ = ()
def make_node(self, x):
x = tt.as_tensor_variable(x)
return theano.Apply(self, [x], [x.type()])
def perform(self, node, inputs, output_storage):
x = inputs[0]
z = output_storage[0]
f_to_int = theano.function([x], x)
z[0] = tt.as_tensor_variable(integrate.quad(f_to_int, 0, 1)[0])
def infer_shape(self, node, i0_shapes):
return i0_shapes
def grad(self, inputs, output_grads):
ans = integrate.quad(output_grads[0], 0, 1)[0]
return [ans]
intOp = IntOp()
x = tt.dmatrix('x')
y = intOp(x)
f = theano.function([x], y)
inp = np.asarray([[2, 4], [6, 8]], dtype=theano.config.floatX)
out = f(inp)
print inp
print out
Which gives the following error:
Traceback (most recent call last):
File "stackoverflow.py", line 35, in <module>
out = f(inp)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 912, in rval
r = p(n, [x[0] for x in i], o)
File "stackoverflow.py", line 17, in perform
f_to_int = theano.function([x], x)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 390, in pfunc
for p in params]
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 489, in _pfunc_param_to_in
raise TypeError('Unknown parameter type: %s' % type(param))
TypeError: Unknown parameter type: <type 'numpy.ndarray'>
Apply node that caused the error: IntOp(x)
Toposort index: 0
Inputs types: [TensorType(float64, matrix)]
Inputs shapes: [(2, 2)]
Inputs strides: [(16, 8)]
Inputs values: [array([[ 2., 4.],
[ 6., 8.]])]
Outputs clients: [['output']]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "stackoverflow.py", line 30, in <module>
y = intOp(x)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "stackoverflow.py", line 11, in make_node
return theano.Apply(self, [x], [x.type()])
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I'm surprised by this, especially the TypeError, as I thought I had converted the output_storage variable into a tensor but it appears to believe here that it is still an ndarray.
I found your question because I'm trying to build a random variable in PyMC3 that represents a general point process (Hawkes, Cox, Poisson, etc) and the likelihood function has an integral. I really want to be able to use Hamiltonian Monte Carlo or NUTS samplers, so I needed that integral with respect to time to be differentiable.
Starting off of your attempt, I made an integrateOut theano Op that seems to work correctly with the behavior I need. I've tested it out on a few different inputs (not on my stats model just yet, but it appears promising!). I'm a total theano n00b, so pardon any stupidity. I would greatly appreciate feedback if anyone has any. Not sure it's exactly what you're looking for, but here's my solution (example at the bottom and in the doc strings). *EDIT: simplified some remnants of screwing around with ways to do this.
import theano
import theano.tensor as T
from scipy.integrate import quad
class integrateOut(theano.Op):
"""
Integrate out a variable from an expression, computing
the definite integral w.r.t. the variable specified
!!! Only implemented in this for scalars !!!
Parameters
----------
f : scalar
input 'function' to integrate
t : scalar
the variable to integrate out
t0: float
lower integration limit
tf: float
upper integration limit
Returns
-------
scalar
a new scalar with the 't' integrated out
Notes
-----
usage of this looks like:
x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')
z = (x**2 + y**2)*t
# integrate z w.r.t. t as a function of (x,y)
intZ = integrateOut(z,t,0.0,5.0)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
"""
def __init__(self,f,t,t0,tf,*args,**kwargs):
super(integrateOut,self).__init__()
self.f = f
self.t = t
self.t0 = t0
self.tf = tf
def make_node(self,*inputs):
self.fvars=list(inputs)
# This will fail when taking the gradient... don't be concerned
try:
self.gradF = T.grad(self.f,self.fvars)
except:
self.gradF = None
return theano.Apply(self,self.fvars,[T.dscalar().type()])
def perform(self,node, inputs, output_storage):
# Everything else is an argument to the quad function
args = tuple(inputs)
# create a function to evaluate the integral
f = theano.function([self.t]+self.fvars,self.f)
# actually compute the integral
output_storage[0][0] = quad(f,self.t0,self.tf,args=args)[0]
def grad(self,inputs,grads):
return [integrateOut(g,self.t,self.t0,self.tf)(*inputs)*grads[0] \
for g in self.gradF]
x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')
z = (x**2+y**2)*t
intZ = integrateOut(z,t,0,1)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
print funcIntZ(2,2)
print funcGradIntZ(2,2)
SymPy is proving harder than anticipated, but in the meantime in case anyone's finding this useful, I'll also point out how to modify this Op to allow for changing the final timepoint without creating a new Op. This can be useful if you have a point process, or if you have uncertainty in your time measurements.
class integrateOut2(theano.Op):
def __init__(self, f, int_var, *args,**kwargs):
super(integrateOut2,self).__init__()
self.f = f
self.int_var = int_var
def make_node(self, *inputs):
tmax = inputs[0]
self.fvars=list(inputs[1:])
return theano.Apply(self, [tmax]+self.fvars, [T.dscalar().type()])
def perform(self, node, inputs, output_storage):
# Everything else is an argument to the quad function
tmax = inputs[0]
args = tuple(inputs[1:])
# create a function to evaluate the integral
f = theano.function([self.int_var]+self.fvars, self.f)
# actually compute the integral
output_storage[0][0] = quad(f, 0., tmax, args=args)[0]
def grad(self, inputs, grads):
tmax = inputs[0]
param_grads = T.grad(self.f, self.fvars)
## Recall fundamental theorem of calculus
## d/dt \int^{t}_{0}f(x)dx = f(t)
## So sub in t_max to the graph
FTC_grad = theano.clone(self.f, {self.int_var: tmax})
grad_list = [FTC_grad*grads[0]] + \
[integrateOut2(grad_fn, self.int_var)(*inputs)*grads[0] \
for grad_fn in param_grads]
return grad_list
I always use the following code where I generate B = 10000 samples of n = 30 observations from a normal distribution with µ = 1 and σ 2 = 2.25. For each sample, the parameters µ and σ are estimated and stored in a matrix. I hope this can help you.
loglik <- function(p,z){
sum(dnorm(z,mean=p[1],sd=p[2],log=TRUE))
}
set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
One can also obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the estimate. First calculate the MLE of λ Then, we estimate the bias and the standard error of λˆ by a nonparametric bootstrap.
B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
lambda.B[b] <- 1/mean(w.time[b.sample])
}
bias <- mean(lambda.B-m$estimate)
sd(lambda.B)
In the second part we calculate a 95% confidence interval for the mean time between failures.
n <- length(w.time)
m <- mean(w.time)
se <- sd(w.time)/sqrt(n)
interval.1 <- m + se * qnorm(c(0.025,0.975))
interval.1
But we can also use the the assumption that the data are from an exponential distribution. In that case we have varX¯ = 1/(nλ^2) = θ^{2}/n which can be estimated by X¯^{2}/n.
sd.m <- sqrt(m^2/n)
interval.2 <- m + sd.m * qnorm(c(0.025,0.975))
interval.2
We can also estimate the standard error of ˆθ by means of a boostrap procedure. We use the nonparametric bootstrap, that is, we sample from the original sample with replacement.
B <- 9999
m.star <- rep(NA,B)
for (b in 1:B){
m.star[b] <- mean(sample(w.time,replace=TRUE))
}
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3
An interval not based on the assumption of normality of ˆθ is obtained by the percentile method:
interval.4 <- quantile(m.star, probs=c(0.025,0.975))
interval.4
I am using the PyGMO package for Python, for multi-objective optimisation. I am unable to fix the dimension of the fitness function in the constructor, and the documentation is not very descriptive either. I am wondering if anyone here has had experience with PyGMO in the past: this could be fairly simple.
I try to construct a minimum example below:
from PyGMO.problem import base
from PyGMO import algorithm, population
import numpy as np
import matplotlib.pyplot as plt
class my_problem(base):
def __init__(self, fdim=2):
NUM_PARAMS = 4
super(my_problem, self).__init__(NUM_PARAMS)
self.set_bounds(0.01, 100)
def _objfun_impl(self, K):
E1 = K[0] + K[2]
E2 = K[1] + K[3]
return (E1, E2, )
if __name__ == '__main__':
prob = my_problem() # Create the problem
print (prob)
algo = algorithm.sms_emoa(gen=100)
pop = population(prob, 50)
pop = algo.evolve(pop)
F = np.array([ind.cur_f for ind in pop]).T
plt.scatter(F[0], F[1])
plt.xlabel("$E_1$")
plt.ylabel("$E_2$")
plt.show()
fdim=2 above is a failed attempt to set the fitness dimension. The code fails with the following error:
ValueError: ..\..\src\problem\base.cpp,584: fitness dimension was changed inside objfun_impl().
I'd be grateful if someone can help figure this out. Thanks!
Are you looking at the correct documentation?
There is no fdim (which anyway does nothing in your example since it is only a local variable and is not used). But there is n_obj:
n_obj: number of objectives. Defaults to 1
So, I think you want something like (corrected thanks to #Distopia):
#(...)
def __init__(self, fdim=2):
NUM_PARAMS = 4
super(my_problem, self).__init__(NUM_PARAMS, 0, fdim)
self.set_bounds(0.01, 100)
#(...)
I modified their example and this seemed to work for me.
#(...)
def __init__(self, fdim=2):
NUM_PARAMS = 4
# We call the base constructor as 'dim' dimensional problem, with 0 integer parts and 2 objectives.
super(my_problem, self).__init__(NUM_PARAMS,0,fdim)
self.set_bounds(0.01, 100)
#(...)
I am trying to solve a multi-objective optimization problem using openMDAO's pyopt-sparse driver with NSGA2 algorithm. Following is the code:
from __future__ import print_function
from openmdao.api import IndepVarComp, Component, Problem, Group, pyOptSparseDriver
class Circ2(Component):
def __init__(self):
super(Circ2, self).__init__()
self.add_param('x', val=10.0)
self.add_param('y', val=10.0)
self.add_output('f1', val=40.0)
self.add_output('f2', val=40.0)
def solve_nonlinear(self, params, unknowns, resids):
x = params['x']
y = params['y']
unknowns['f1'] = (x - 0.0)**2 + (y - 0.0)**2
unknowns['f2'] = (x - 1.0)**2 + (y - 1.0)**2
def linearize(self, params, unknowns, resids):
J = {}
x = params['x']
y = params['y']
J['f1', 'x'] = 2*x
J['f1', 'y'] = 2*y
J['f2', 'x'] = 2*x-2
J['f2', 'y'] = 2*y-2
return J
if __name__ == "__main__":
# Defining Problem & Root
top = Problem()
root = top.root = Group()
# Adding in-present variable values and model
startVal = 50.0
root.add('p1', IndepVarComp('x', startVal))
root.add('p2', IndepVarComp('y', startVal))
root.add('p', Circ2())
# Making Connections
root.connect('p1.x', 'p.x')
root.connect('p2.y', 'p.y')
# Configuring Driver
top.driver = pyOptSparseDriver()
top.driver.options['optimizer'] = 'NSGA2'
# Setting bounds for the optimizer
top.driver.add_desvar('p1.x', lower=-600, upper=600)
top.driver.add_desvar('p2.y', lower=-600, upper=600)
# Setting Objective Function(s)
top.driver.add_objective('p.f1')
top.driver.add_objective('p.f2')
# # Setting up constraints
# top.driver.add_constraint('con.c', lower=1.0)
top.setup()
top.run()
and I am getting the following error -
Traceback (most recent call last):
File "/home/prasad/DivyaManglam/Python Scripts/Pareto Testing/Basic_Sphere.py", line 89, in <module>
top.run()
File "/usr/local/lib/python2.7/dist-packages/openmdao/core/problem.py", line 1038, in run
self.driver.run(self)
File "/usr/local/lib/python2.7/dist-packages/openmdao/drivers/pyoptsparse_driver.py", line 280, in run
sol = opt(opt_prob, sens=self._gradfunc, storeHistory=self.hist_file)
File "/usr/local/lib/python2.7/dist-packages/pyoptsparse/pyNSGA2/pyNSGA2.py", line 193, in __call__
self.optProb.comm.bcast(-1, root=0)
File "MPI/Comm.pyx", line 1276, in mpi4py.MPI.Comm.bcast (src/mpi4py.MPI.c:108819)
File "MPI/msgpickle.pxi", line 620, in mpi4py.MPI.PyMPI_bcast (src/mpi4py.MPI.c:47164)
File "MPI/msgpickle.pxi", line 143, in mpi4py.MPI.Pickle.load (src/mpi4py.MPI.c:41248)
TypeError: only length-1 arrays can be converted to Python scalars
Kindly tell if you can make anything of the above error and how to fix it.
Also, in what form would the output of the multi-objective problem's solution be returned. I intend to generate a pareto-optimal front for this.
Thank you all.
It turns out this is not an OpenMDAO bug, but rather a bug in the Pyopt sparse wrapper for NSGA2. The fix wasn't terrible, and I've submitted a pull request to the pyoptsparse repo for it. In the meantime it would be easy enough to patch your local copy of pyoptsparse.
You asked how the results will be reported. The current NSGA2 wrapper doesn't do anything with the results. It just lets NSGA2 write them all out to a series of text files such as nsga2_best_pop.out that look like this:
# This file contains the data of final feasible population (if found)
# of objectives = 1, # of constraints = 0, # of real_var = 2, # of bits of bin_var = 0, constr_violation, rank, crowding_distance
-1.000000e+00 3.190310e+01 -2.413640e+02 0.000000e+00 1 1.000000e+14
-1.000000e+00 -5.309160e+02 2.449727e+02 0.000000e+00 1 0.000000e+00
-1.000000e+00 -3.995119e+02 -1.829071e+02 0.000000e+00 1 0.000000e+00
As a side note, in your example, you implemented a linearize method for the component in your example. That method is used when you need to compute gradients, but NSGA2 is a gradient free optimizer and won't use it at all. So unless you're also planning to test out some gradient based methods you can leave that method out of your components.
In the following code which is an example of my main code, I have tried to use pathos.multiprocessing to increase the speed of iteration of a loop. The output of each iteration which has implemented with multiprocessing is a 2-D array. I used pathos.multiprocessing instead of multiprocessing since I wanted to use it in my class method. I have used apipe method of the pathos.multiprocessing to collect the output in a list but it returns an empty list. I have no idea why it fails
import numpy as np
import random
import pathos.multiprocessing as mp
class Testsystematics(object):
def __init__(self, x, y, NTH = None, THMIN = None, THMAX = None, NRESAMPLE = None):
self.x = x
self.y = y
self.nbins = NTH
self.bmin = THMIN
self.bmax = THMAX
self.nresample= NRESAMPLE
self.bins = np.linspace(self.bmin, self.bmax, self.nbins+1, True).astype(np.float)
self.sample = np.array([[random.choice(range(len(self.y))) for _ in xrange(len(self.y))] for i in range(self.nresample)])
self.result_list=[]
def log_result(self, result):
self.result_list.append(result)
def bootstrapping(self, k):
xi_p = np.zeros(self.nbins, float)
xi_m = np.zeros(self.nbins, float)
nind = np.zeros(self.nbins, float)
for i in range(len(self.x)):
for j in range(len(self.x)):
if (i!=j):
sep= np.sqrt(self.x[i]**2+self.x[j]**2)
index= np.searchsorted(self.bins, sep , side='right')-1
sind = np.sin(sep)
if ((sep< self.bins[-1]) and (sep>=self.bins[0])):
xi_p[index] += sind*(np.mean(y)-np.median(y))
xi_m[index] += sind*np.std(y)
nind[index] += 1.0
for i in range(self.nbins):
xi_p[i]=xi_p[i]/nind[i]
xi_m[i]=xi_m[i]/nind[i]
return np.vstack((xi_p,xi_m))
def twopcf(self):
if (self.sys_type==1):
pool = mp.ProcessingPool(16)
for n in range(self.nresample):
pool.apipe(self.bootstrapping, args=(n,), callback=self.log_result)
shape,scale=0.5, 0.6
x=np.random.gamma(shape, scale, 10000)
mu1, sigma1 = 0, 0.5 # mean and standard deviation
mu2, sigma2 = 0.1, 0.7 # mean and standard deviation
y = np.random.normal(mu1, sigma1, 1000)+np.random.normal(mu2, sigma2, 1000)
sysTest=Testsystematics(x, y, NTH = 10, THMIN = 0, THMAX = 5, NRESAMPLE = 100)
any suggestion?
I'm the pathos author. I tried your code, and it runs, but produces no error and produces no result in result_list. I believe that is because you are using apipe incorrectly. The correct use of apipe is as follows:
>>> import pathos
>>> def squared(x):
... return x**2
...
>>> pool = pathos.multiprocessing.ProcessingPool()
>>> res = pool.apipe(squared, 5)
>>> res.get()
25
self.bootstrapping takes self and k, so you have to provide a k in the pipe call when you calling it as an instance method. There is no callback -- if you want a callback, you'd need to add one to your function.
Note that the return value is retrieved by (1) getting a return object, and (2) by calling get on the return object.
From you use of apipe within a for loop, that points me to suggest you use pool.amap (or pool.imap) instead -- then you can do the for loop in parallel.