Background
I've written a Python implementation of this answer to my recent question over on Math SE. In short, the problem is as follows:
I have an experimental setup consisting of three receivers, with known rectangular coordinates [xi, yi, zi], and a transmitter with unknown coordinates [x,y,z] emitting a signal at unknown time t with velocity c and arriving at receiver i at known time ti.
This information is insufficient to uniquely determine the rectangular coordinates of the transmitter. We can, however, estimate the angle to the transmitter (i.e. the transmitter's phi and theta in spherical coordinates). This may be done by solving the system of equations given in the linked post.
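Concretely, writing n = [x, y, z] for the unit vector pointing toward the (distant) transmitter and r_i for the receiver positions, the system (as implemented in fun() in the code below) is

n · (r_0 - r_1) = c*(t_1 - t_0)
n · (r_1 - r_2) = c*(t_2 - t_1)
|n| = 1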
My goal is to solve these equations for real, experimental data.
Problem
I have been able to write an effective Python implementation of the approach described in the linked post. For both simulated and experimental data, this gives satisfactory estimates of the angle to the transmitter.
I now wish to use this in practice. Unfortunately, however, my code runs too slowly to be useful in my desired application.
Ideally, we'd like to be able to solve on the order of one million data points per hour in a near-real-time application. Currently, this takes several hours. While I recognize that, particularly with Python, dramatic performance improvements are not to be expected, any improvement would be helpful.
In short, I'd like to reduce the execution time of this algorithm. Because the host machine runs a minimal Linux install that I don't control, I'd like to do so by improving my code, without additional modules or external libraries if possible.
Code
import math
import numpy as np
import ROOT
from scipy.optimize import root
from dataclasses import dataclass
from ROOT import TFile, TNtuple

c = 299792  # speed of light (km/s)

@dataclass
class Vertexer:
    roc: list

    def fun(self, var, dat):
        f0 = var.dot(self.roc[0] - self.roc[1]) - c * (dat[1] - dat[0])
        f1 = var.dot(self.roc[1] - self.roc[2]) - c * (dat[2] - dat[1])
        n = np.linalg.norm(var) - 1
        return [f0, f1, n]

    def find(self, dat):
        result = root(
            self.fun,
            (0, 0, 0),
            method="lm",
            args=dat,
            options={
                "col_deriv": 1,
                "xtol": 1.49012e-08,
                "ftol": 1.49012e-08,
                "gtol": 0.0,
                "maxiter": 0,
                "eps": 0.0,
                "factor": 100,
                "diag": None,
            },
        )
        if result.success:
            return result.x

def main():
    myVertexer = Vertexer(
        [
            np.array([3.0085470085470085, 3.7642857142857116, -0.06]),
            np.array([2.0034188034188034, 2.0142857142857133, -0.19]),
            np.array([1.0324786324786326, 0.27142857142857135, -0.19]),
        ]
    )
    data = ROOT.RDataFrame("D", "2018_filtered.root")
    colns = data.AsNumpy(columns=["ai", "aj", "ak"])
    f = TFile("theta_phi_2.root", "RECREATE")
    ntuple = TNtuple("N", "N", "i:j:k:ai:aj:ak")
    for (ai, aj, ak) in zip(colns["ai"], colns["aj"], colns["ak"]):
        v = myVertexer.find([ai, aj, ak])
        if v is not None:
            ntuple.Fill(v[0], v[1], v[2], ai, aj, ak)
    ntuple.Write()
    f.Write()

main()
Step-Through
The Vertexer class contains a fairly standard and straightforward SciPy-based system-of-equations solution "algorithm". The function fun() contains the system of equations described in the linked post, and find() solves the system given the times of arrival (dat) and the receiver coordinates (roc, provided upon instantiation). I'm using a Levenberg-Marquardt solver with col_deriv set to True in order to improve solution speed. I see similar solution accuracy across all solvers. All other settings are left at their defaults.
The main() function reads time-of-arrival data from a .root file into a dict of NumPy arrays, which I loop over, feed to the algorithm, and record the result in another .root file.
If you like, you may replace the main() function with the following crude "source simulation" code I've written, which simply produces a random point, computes the arrival time of a signal from that point to three randomly-placed "receivers", and feeds those times to the algorithm:
import random

# random source position
x0 = random.randrange(0, 1000)
y0 = random.randrange(0, 1000)
z0 = random.randrange(0, 1000)
# random receiver positions
x1 = random.randrange(0, 50); x2 = random.randrange(0, 50); x3 = random.randrange(0, 50)
y1 = random.randrange(0, 50); y2 = random.randrange(0, 50); y3 = random.randrange(0, 50)
z1 = random.randrange(0, 50); z2 = random.randrange(0, 50); z3 = random.randrange(0, 50)
# true times of flight from the source to each receiver
t1 = math.sqrt((x0-x1)**2 + (y0-y1)**2 + (z0-z1)**2)/c
t2 = math.sqrt((x0-x2)**2 + (y0-y2)**2 + (z0-z2)**2)/c
t3 = math.sqrt((x0-x3)**2 + (y0-y3)**2 + (z0-z3)**2)/c

myVertexer = Vertexer([np.array([x1, y1, z1]), np.array([x2, y2, z2]), np.array([x3, y3, z3])])
result = myVertexer.find([t1, t2, t3])
You can solve your equations analytically rather than numerically (probably someone will say that this is not the place for this).
Let's analyze your function
def fun(self, var, dat):
    f0 = var.dot(self.roc[0] - self.roc[1]) - c * (dat[1] - dat[0])
    f1 = var.dot(self.roc[1] - self.roc[2]) - c * (dat[2] - dat[1])
    n = np.linalg.norm(var) - 1
    return [f0, f1, n]
The solutions of f0 = 0 lie in a plane normal to self.roc[0] - self.roc[1].
The solutions of f1 = 0 lie in a plane normal to self.roc[1] - self.roc[2].
The solutions of n = 0 lie on the unit sphere.
The intersection of the two planes is a line in the direction given by the cross product of the two planes' normals.
The intersection of a line and a sphere is either (a) one tangent point; (b) two points, one where the line enters the sphere and one where it leaves; or (c) empty, if the line misses the sphere.
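A sketch of that geometry in code (my own illustration of the construction above; the name analytic_find is hypothetical, and the units follow the question):

import numpy as np

c = 299792  # speed of light, same units as the question

def analytic_find(roc, dat):
    # plane normals and offsets from the two time-difference equations
    n1 = roc[0] - roc[1]; b1 = c * (dat[1] - dat[0])
    n2 = roc[1] - roc[2]; b2 = c * (dat[2] - dat[1])
    # the two planes intersect in a line with direction d = n1 x n2
    d = np.cross(n1, n2)
    # a particular point p0 on that line: n1·p0 = b1, n2·p0 = b2, d·p0 = 0
    p0 = np.linalg.solve(np.array([n1, n2, d]), np.array([b1, b2, 0.0]))
    # intersect the line p0 + s*d with the unit sphere |x| = 1
    a = d.dot(d)
    b = 2.0 * p0.dot(d)   # zero by construction, kept for generality
    e = p0.dot(p0) - 1.0
    disc = b*b - 4*a*e
    if disc < 0:
        return None       # case (c): the line misses the sphere
    s1 = (-b + np.sqrt(disc)) / (2*a)
    s2 = (-b - np.sqrt(disc)) / (2*a)
    return p0 + s1*d, p0 + s2*d   # cases (a)/(b): tangent point or entry/exit pair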
Related
I'm currently learning how to use FiPy and eventually want to use it to solve some biological problems. I have been trying to implement the following PDE system which describes hyphal growth of fungi:
[Image: the PDE system]
I can implement the system without problems as long as I ignore that the change of si over time depends on the absolute spatial change of p, abs(dp/dx), in the fourth equation, dsi/dt. Here is my code without abs():
from fipy import *
##### produce mesh
nx = 100
ny = nx
dx = 1/100
dy = dx
L = nx*dx
mesh = Grid2D(nx=nx, ny=ny, dx=dx, dy=dy)
x, y = mesh.cellCenters
##### parameters
b=1e+7 # branching rate (no. branches * cm^-1 * hyphae * day^-1 * (mol glucose)^-1)
f = 10. # fusion rate (no. fusions in cm * day^-1)
v = 1e+5 # cm * mol^-1 * day^-1
d = 0.5 # day^-1
r = 0. # degradation of hyphae
c1 = 9e+2 # cm * mol^-1 * day^-1
c2 = 1e-7 # mol * cm^-1
c3 = 1e+3 # cm * mol^-1 * day^-1, c1/c3 = 90% efficiency
c4 = 1e-8 # cm^-1
De = 1e-3 # diffusion of external substrate (cm² * s^-1)
Di = 1e-2 # diffusion of internal substrate (cm³ * day^-1)
Da = 0.
##### define state variables:
m = CellVariable(name="m",mesh=mesh,hasOld=True,value=0.)
mi = CellVariable(name="mi",mesh=mesh,hasOld=True,value=0.)
p = CellVariable(name="p",mesh=mesh,hasOld=True,value=0.)
si = CellVariable(name="si",mesh=mesh,hasOld=True,value=0.)
se = CellVariable(name="se",mesh=mesh,hasOld=True,value=0.)
##### differential equations
eqm = (TransientTerm(var=m)== si*v*p - ImplicitSourceTerm(var=m,coeff=d))
eqmi = (TransientTerm(var=mi)==m*d - ImplicitSourceTerm(var=mi,coeff=r))
eqp = (TransientTerm(var=p)== - ConvectionTerm(var=p,coeff=[[v]]*si) + b*si*m - ImplicitSourceTerm(var=p,coeff=f*m))
eqsi = (TransientTerm(var=si)== DiffusionTerm(var=si,coeff=Di*m)- DiffusionTerm(var=p,coeff=Da*m*si) + ImplicitSourceTerm(var=si,coeff=c1*m*se) - ImplicitSourceTerm(var=si,coeff=c2*v*p) - ConvectionTerm(var=p,coeff=[[c4*Da]]*(m*si)))
eqse = (TransientTerm(var=se) == DiffusionTerm(var=se,coeff=De) - ImplicitSourceTerm(var=se,coeff=c3*m*si))
# initial values
m0 = 100.
mi0 = 0.
p0 = 500.
si0 = 1e-5
se0 = 3e-5
rad = 3. # radius of initial plug (kept distinct from the degradation rate r above)
m.setValue(m0, where=((x > (L/2 - rad*dx)) & (x < (L/2 + rad*dx)) & (y > (L/2 - rad*dy)) & (y < (L/2 + rad*dy))))
mi.setValue(mi0, where=((x > (L/2 - rad*dx)) & (x < (L/2 + rad*dx)) & (y > (L/2 - rad*dy)) & (y < (L/2 + rad*dy))))
p.setValue(p0, where=((x > (L/2 - rad*dx)) & (x < (L/2 + rad*dx)) & (y > (L/2 - rad*dy)) & (y < (L/2 + rad*dy))))
si.setValue(si0, where=((x > (L/2 - rad*dx)) & (x < (L/2 + rad*dx)) & (y > (L/2 - rad*dy)) & (y < (L/2 + rad*dy))))
se.setValue(se0)
# boundary conditions
#----------------
# simulation
eq = eqm & eqmi & eqp & eqsi & eqse
vi = Viewer(vars=m)
from builtins import range
for t in range(100):
    m.updateOld()
    mi.updateOld()
    p.updateOld()
    si.updateOld()
    se.updateOld()
    eq.solve(dt=0.1)
    print(t)
    vi.plot()
Now, I tried to simply write
eqsi = (TransientTerm(var=si)== DiffusionTerm(var=si,coeff=Di*m)- DiffusionTerm(var=p,coeff=Da*m*si) + ImplicitSourceTerm(var=si,coeff=c1*m*se) - ImplicitSourceTerm(var=si,coeff=c2*v*p) - abs(ConvectionTerm(var=p,coeff=[[c4*Da]]*(m*si))))
which (expectedly) gives out an error: "bad operand type for abs(): 'PowerLawConvectionTerm'"
I tried to work my way around it by adding another CellVariable dpdx:
dpdx= CellVariable(name="dpdx",mesh=mesh,hasOld=True,value=0.)
eqdpdx= (dpdx == ConvectionTerm(var=p,coeff=[[1]]))
eqsi = (TransientTerm(var=si)== DiffusionTerm(var=si,coeff=Di*m)- DiffusionTerm(var=p,coeff=Da*m*si) + ImplicitSourceTerm(var=si,coeff=c1*m*se) - ImplicitSourceTerm(var=si,coeff=c2*v*p) - abs(dpdx)*c4*Da*m*si)
which then gives the error "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", which is probably caused by the fact that eqdpdx is not a derivative function?
My question is: Is it possible at all to perform mathematical operations with a ConvectionTerm? Can I somehow express the absolute change of p?
And also, how can I express non derivative function that change over space like eqdpdx (or any dynamic parameters)?
I looked through the FiPy manual and could not find a solution - I am sorry if my question is trivial or has been answered elsewhere.
It's important to understand what FiPy Terms are. They are human-readable expressions of parts of PDEs that can be discretized to linear algebra. If you apply some potentially non-linear function to that linear algebra, then it's not linear algebra anymore. This use is not supported and I'm not sure how it even could be.
Fortunately, you don't need it. dp/dx is not a ConvectionTerm, it's a gradient. I would write this term as
- ImplicitSourceTerm(coeff=c4*Da*m*p.grad.mag, var=si)
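Plugged into the si equation from the question, the whole thing would read (my assembly of this term with the original equation, untested):

eqsi = (TransientTerm(var=si) == DiffusionTerm(var=si, coeff=Di*m)
        - DiffusionTerm(var=p, coeff=Da*m*si)
        + ImplicitSourceTerm(var=si, coeff=c1*m*se)
        - ImplicitSourceTerm(var=si, coeff=c2*v*p)
        - ImplicitSourceTerm(var=si, coeff=c4*Da*m*p.grad.mag))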
As a side note, 1D equations are the Devil's work. They will lead you astray every single time. You can use nabla notation like we do in the manual or you can use Einstein notation, but always keep clear in your head whether your expression is scalar or vector (or tensor or ...).
I am working on a homework problem. I'm trying to simulate a PID control in Python with Scipy's integrate.solve_ivp() function.
My method is to run the PID code within the right-hand-side of the function, using global variables and appending them to a global matrix at the end of each timestep, like so:
solution = integrate.solve_ivp(rhs, tspan, init, t_eval=teval)
Here is my code:
def rhs(dt, init):
    global old_time, omega0dot, rhs_t, omega0dotmat
    timestep = dt - old_time
    old_time = dt

    # UNPACK INITIAL
    x = init[0]
    y = init[1]
    z = init[2]
    xdot = init[3]
    ydot = init[4]
    zdot = init[5]
    alpha = init[6]
    beta = init[7]
    gamma = init[8]
    alphadot = init[9]
    betadot = init[10]
    gammadot = init[11]

    # SOLVE EQUATIONS
    (xddot, yddot, zddot, alphaddot, betaddot, gammaddot) = dynamics(
        k_d, k_m, x, y, z, xdot, ydot, zdot,
        alpha, beta, gamma, alphadot, betadot, gammadot, omega0dot)

    # CONTROL SYSTEMS
    z_des = 10
    err_z = z_des - z
    zPID = (1 * err_z) + hover
    omega0dot = zPID

    rhs_t.append(dt)
    omega0dotmat.append(omega0dot)
    return [xdot, ydot, zdot, xddot, yddot, zddot,
            alphadot, betadot, gammadot, alphaddot, betaddot, gammaddot]
The global variables are initialized outside this function. You might notice that I am specifically trying to simulate a quadcopter, where the linear and angular motion of the quadrotor are dependent on omega0dot, which represents rotor velocity and which I am trying to control with PID.
My difficulty is with the timestep of integrate.solve_ivp(). Both the integral and derivative part of the PID control rely on the timestep, but the solve_ivp() function has a variable time step and seems to even go backwards in time sometimes, and sometimes makes no timestep (i.e. dt <= 0).
I was wondering if there was a better way to go about this PID control, or if maybe I'm interpreting the dt term in solve_ivp() wrong.
Let's look at a simpler system, the ubiquitous spring with damping
y'' + c*y' + k*y = u(t)
where u(t) could represent the force exerted by an electro-magnet (which immediately suggests ways to make the system more complicated by introducing a more realistic relation of voltage and magnetic field).
Now in a PID controller we have the error to a reference output e = yr - y and
u(t) = kD*e'(t) + kP*e(t) + kI*integral(e(t))
To treat this with an ODE solver, we immediately see that the state needs to be extended with a new component E(t), where E'(t) = e(t). The next difficulty is to implement the derivative of an expression that is not necessarily differentiable. This can be achieved by not differentiating the expression at all, using a non-standard first-order reduction (where the standard would be to use [y, y', E] as the state).
Essentially, collect all derived expressions in the formula in their integrated form as
v(t)=y'(t)+c*y-kD*e(t).
Then going back to the derivative one gets the first order system
v'(t) = y''(t) + c*y'(t) - kD*e'(t)
= kP*e(t) + kI*E(t) - k*y(t)
y'(t) = v(t) - c*y(t) + kD*e(t)
E'(t) = e(t)
This now allows us to implement the controlled system as an ODE system, without tricks involving global memory or similar:
def odePID(t, u, params):
    c, k, kD, kP, kI = params
    y, v, E = u
    e = yr(t) - y                      # tracking error
    return [v - c*y + kD*e,            # y'
            kP*e + kI*E - k*y,         # v'
            e]                         # E'
You should be able to use similar transformations of the first order system in your more complex model.
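For completeness, here is a minimal runnable sketch of the damped-spring example above; the reference yr and the numeric gains are illustrative choices of mine, not tuned values:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def yr(t):
    return 1.0  # step reference (illustrative)

def odePID(t, u, params):
    c, k, kD, kP, kI = params
    y, v, E = u
    e = yr(t) - y
    return [v - c*y + kD*e, kP*e + kI*E - k*y, e]

params = (0.5, 4.0, 0.2, 8.0, 1.0)  # c, k, kD, kP, kI (illustrative)
sol = solve_ivp(odePID, (0.0, 20.0), [0.0, 0.0, 0.0], args=(params,), max_step=0.05)
plt.plot(sol.t, sol.y[0])
plt.axhline(1.0, linestyle='--')
plt.show()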
I'm desperately trying to solve (and display the graph of) a system of nine nonlinear differential equations which models the path of a boomerang. The system is the following:
[Image: the system of nine ODEs]
All the symbols on the left-hand side are variables; the others are either constants or known functions of v_G and w_z.
I have tried with scipy.odeint with no conclusive results (I had this issue but the workaround did not work.)
I'm beginning to think that the problem is linked to the fact that these equations are nonlinear, or that the functions in the denominators might cause a singularity that the scipy solver is simply unable to handle. However, I am not familiar with that sort of mathematics.
What possibilities python-wise do I have to solve this set of equations?
EDIT : Sorry if I was not clear enough. Since it models the path of a boomerang, my goal is not to solve analytically this system (ie I don't care about the mathematical expression of each function), but rather to get the values of each function for a specific time range (say, from t1 = 0s to t2 = 15s with an interval of 0.01s between each value) in order to display the graph of each function and the graph of the center of mass of the boomerang (X,Y,Z are its coordinates).
Here is the code I tried :
import scipy.integrate as spi
import numpy as np
#Constants
I3 = 10**-3
lamb = 1
L = 5*10**-1
mu = I3
m = 0.1
Cz = 0.5
rho = 1.2
S = 0.03*0.4
Kz = 1/2*rho*S*Cz
g = 9.81
#Initial conditions
omega0 = 20*np.pi
V0 = 25
Psi0 = 0
theta0 = np.pi/2
phi0 = 0
psi0 = -np.pi/9
X0 = 0
Y0 = 0
Z0 = 1.8
INPUT = (omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0) #initial conditions
def diff_eqs(t, INP):
    '''The main set of equations'''
    Y = np.zeros((9))
    Y[0] = (1/I3) * (Kz*L*(INP[1]**2 + (L*INP[0])**2))
    Y[1] = -(lamb/m)*INP[1]
    Y[2] = -(1/(m*INP[1])) * (Kz*L*(INP[1]**2 + (L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    Y[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    Y[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    Y[5] = -np.cos(INP[3])*Y[4]
    Y[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    Y[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    Y[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return Y  # for odeint
t_start = 0.0
t_end = 20
t_step = 0.01
t_range = np.arange(t_start, t_end, t_step)
RES = spi.odeint(diff_eqs, INPUT, t_range)
However, I keep getting the same problem as in the linked question, in particular the error message:

Excess work done on this call (perhaps wrong Dfun type)

I am not quite sure what it means, but it looks like the solver has trouble solving the system. In any case, when I try to display the 3D path from the XYZ coordinates, I just get 3 or 4 points where there should be something like 2000.
So my questions are:
- Am I doing something wrong in my code?
- If not, is there another, maybe more sophisticated, tool to solve this system?
- If not, is it even possible to get what I want from this system of ODEs?
Thanks in advance
There are several issues:
- if I copy the code, it does not run
- the workaround you mention does not work with odeint; the given solution uses ode
- the scipy reference for odeint says: "For new code, use scipy.integrate.solve_ivp to solve a differential equation."
- the call RES = spi.odeint(diff_eqs, INPUT, t_range) is inconsistent with the function signature def diff_eqs(t, INP): by default, odeint calls the function as func(y, t), so either swap the parameters in diff_eqs or pass tfirst=True
There are some issues with the mathematical formulas too:
- have a look at the 3rd formula in your picture. It has no tendency term; it starts with a zero. What does that mean?
- it's hard to check whether you have translated the formulas correctly into code, since the code does not follow the formulas strictly.
Below I tried a solution with scipy's solve_ivp. In case A I'm able to run a pendulum, but in case B no meaningful solution for the boomerang can be found. So check the maths; I suspect an error in the mathematical expressions.
For the graphics use pandas to plot all variables together (see code below).
import scipy.integrate as spi
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the graphics below
def diff_eqs_boomerang(t, Y):
    INP = Y
    dY = np.zeros((9))
    dY[0] = (1/I3) * (Kz*L*(INP[1]**2 + (L*INP[0])**2))
    dY[1] = -(lamb/m)*INP[1]
    dY[2] = -(1/(m*INP[1])) * (Kz*L*(INP[1]**2 + (L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    dY[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    dY[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    dY[5] = -np.cos(INP[3])*INP[4]
    dY[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    dY[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    dY[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return dY

def diff_eqs_pendulum(t, Y):
    dY = np.zeros((3))
    dY[0] = Y[1]
    dY[1] = -Y[0]
    dY[2] = Y[0]*Y[1]
    return dY
t_start, t_end = 0.0, 12.0

case = 'A'
if case == 'A':  # pendulum
    Y = np.array([0.1, 1.0, 0.0])
    Yres = spi.solve_ivp(diff_eqs_pendulum, [t_start, t_end], Y, method='RK45', max_step=0.01)
if case == 'B':  # boomerang
    Y = np.array([omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0])
    print('Y initial:'); print(Y); print()
    Yres = spi.solve_ivp(diff_eqs_boomerang, [t_start, t_end], Y, method='RK45', max_step=0.01)
#---- graphics ---------------------
yy = pd.DataFrame(Yres.y).T
tt = np.linspace(t_start, t_end, yy.shape[0])

with plt.style.context('fivethirtyeight'):
    plt.figure(1, figsize=(20, 5))
    plt.plot(tt, yy, lw=8, alpha=0.5)
    plt.grid(axis='y')
    for j in range(3):
        plt.fill_between(tt, yy[j], 0, alpha=0.2, label='y['+str(j)+']')
    plt.legend(prop={'size': 20})
I want to use a Hawkes process to model some data. I could not find whether PyMC supports Hawkes processes. More specifically, I want an observed variable that follows a Hawkes process, and to learn a posterior on its parameters.
If it is not there, could I define it in PyMC in some way, e.g. @deterministic etc.?
It's been quite a long time since your question, but I worked this out in PyMC today, so I thought I'd share the gist of my implementation for other people who might come across the same problem. We're going to infer the parameters λ and α of a Hawkes process. I'm not going to cover the temporal scale parameter β; I'll leave that as an exercise for the reader.

First, let's generate some data:
import numpy as np

def hawkes_intensity(mu, alpha, points, t):
    p = np.array(points)
    p = p[p <= t]
    p = np.exp(p - t)  # exponentially decaying kicks from past points
    return mu + alpha * np.sum(p)

def simulate_hawkes(mu, alpha, window):
    t = 0
    points = []
    lambdas = []
    while t < window:
        m = hawkes_intensity(mu, alpha, points, t)
        s = np.random.exponential(scale=1/m)
        ratio = hawkes_intensity(mu, alpha, points, t + s)
        t = t + s
        if t < window:
            points.append(t)
            lambdas.append(ratio)
        else:
            break
    points = np.sort(np.array(points, dtype=np.float32))
    lambdas = np.array(lambdas, dtype=np.float32)
    return points, lambdas
# parameters
window = 1000
mu = 8
alpha = 0.25
points, lambdas = simulate_hawkes(mu, alpha, window)
num_points = len(points)
We just generated some temporal points using functions that I adapted from this notebook: https://nbviewer.jupyter.org/github/MatthewDaws/PointProcesses/blob/master/Temporal%20points%20processes.ipynb
Now, the trick is to create a matrix of size (num_points, num_points) that contains the temporal distance of the ith point from all the other points, i.e. the (i, j) entry of the matrix is the time separating the ith point from the jth. This matrix will be used to compute the sum of the exponentials of the Hawkes process, i.e. the self-exciting part. The way to create this matrix, as well as the sum of the exponentials, is a bit tricky. I'd recommend checking every line yourself so you can see what it does.
tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)
tile = np.tril(np.exp(-tile), k=-1)
Σ = np.sum(tile, axis=1)[:-1] # this is our self-exciting sum term
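If it helps, here is a tiny sanity check of those three lines (my own) with three hand-picked points:

pts = np.array([1.0, 2.5, 3.0])
n = len(pts)
demo = np.tile(pts, n).reshape(n, n)            # row i holds all points
demo = np.clip(pts[:, None] - demo, 0, np.inf)  # t_i - t_j, future gaps clipped to 0
demo = np.tril(np.exp(-demo), k=-1)             # exp(-(t_i - t_j)) for j < i only
print(np.sum(demo, axis=1)[:-1])                # [0., exp(-1.5)]: excitation felt at each point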
We have the points and we have a matrix containing the sum of the excitation terms.
The duration between two consecutive events of a Hawkes process follows an exponential distribution with parameter λ = λ0 + ∑ excitation. This is what we are going to model, but first we have to compute the duration between consecutive points of our generated data.
interval = points[1:] - points[:-1]
We're now ready for inference:
import pymc3 as pm                 # written against PyMC3
import matplotlib.pyplot as plt

with pm.Model() as model:
    λ = pm.Exponential("λ", 1)
    α = pm.Uniform("α", 0, 1)
    lam = pm.Deterministic("lam", λ + α * Σ)
    interarrival = pm.Exponential(
        "interarrival", lam, observed=interval)
    trace = pm.sample(2000, tune=4000)

pm.plot_posterior(trace, var_names=["λ", "α"])
plt.show()
print(np.mean(trace["λ"]))
print(np.mean(trace["α"]))
7.829
0.284
Note: the tile matrix can become quite large if you have many data points.
I have a function which is actually a call to another program (some Fortran code). When I call this function (run_moog) I can pass 4 variables, and it returns 6 values. These values should all be close to 0 (in order to minimize). However, I combined them like this: np.sum(results**2). Now I have a scalar function. I would like to minimize this function, i.e. get np.sum(results**2) as close to zero as possible.
Note: When this function (run_moog) takes the 4 input parameters, it creates an input file for the Fortran code that depends on these parameters.
I have tried several ways to optimize this from the scipy docs. But none works as expected. The minimization should be able to have bounds on the 4 variables. Here is an attempt:
from scipy.optimize import minimize  # tried others as well from the docs

x0 = 4435, 3.54, 0.13, 2.4
bounds = [(4000, 6000), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
a = minimize(fun_mmog, x0, bounds=bounds, method='L-BFGS-B')  # I've tried several different methods here
print(a)
This then gives me
status: 0
success: True
nfev: 5
fun: 2.3194639999999964
x: array([ 4.43500000e+03, 3.54000000e+00, 1.00000000e-01, 2.40000000e+00])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
jac: array([ 0., 0., -54090399.99999981, 0.])
nit: 0
The third parameter changes slightly, while the others are exactly the same. Also, there have been 5 function calls (nfev) but no iterations (nit). The full output from scipy is shown above.
A couple of possibilities:

1. Try COBYLA. It's derivative-free, and supports inequality constraints.
2. You can't use different epsilons via the normal interface, so try scaling your first variable by 1e4 (divide it going in, multiply coming back out; see the sketch after the jacobian discussion below).
3. Skip the normal automatic jacobian constructor and make your own:
Say you're trying to use SLSQP, and you don't provide a jacobian function. It makes one for you. The code for it is in approx_jacobian in slsqp.py. Here's a condensed version:
from numpy import asfarray, atleast_1d, zeros

def approx_jacobian(x, func, epsilon, *args):
    x0 = asfarray(x)
    f0 = atleast_1d(func(*((x0,)+args)))
    jac = zeros([len(x0), len(f0)])
    dx = zeros(len(x0))
    for i in range(len(x0)):
        dx[i] = epsilon
        jac[i] = (func(*((x0+dx,)+args)) - f0)/epsilon
        dx[i] = 0.0
    return jac.transpose()
You could try replacing that loop with:
for (i, e) in zip(range(len(x0)), epsilon):
    dx[i] = e
    jac[i] = (func(*((x0+dx,)+args)) - f0)/e
    dx[i] = 0.0
You can't provide this as the jacobian to minimize, but fixing it up for that is straightforward:
def construct_jacobian(func, epsilon):
    def jac(x, *args):
        x0 = asfarray(x)
        f0 = atleast_1d(func(*((x0,)+args)))
        jac = zeros([len(x0), len(f0)])
        dx = zeros(len(x0))
        for (i, e) in zip(range(len(x0)), epsilon):
            dx[i] = e
            jac[i] = (func(*((x0+dx,)+args)) - f0)/e
            dx[i] = 0.0
        return jac.transpose()
    return jac
You can then call minimize like:
minimize(fun_mmog, x0,
         jac=construct_jacobian(fun_mmog, [1e0, 1e-4, 1e-4, 1e-4]),
         bounds=bounds, method='SLSQP')
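Returning to the scaling suggestion in point 2, a minimal sketch of the divide-going-in/multiply-coming-out wrapper (the names are mine; fun_mmog, x0 and bounds are from the question):

import numpy as np
from scipy.optimize import minimize

scale = np.array([1e4, 1.0, 1.0, 1.0])  # bring the first variable to order 1

def scaled_fun(xs):
    return fun_mmog(xs * scale)          # multiply coming back out

x0_scaled = np.array([4435.0, 3.54, 0.13, 2.4]) / scale  # divide going in
bounds_scaled = [(0.4, 0.6), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
res = minimize(scaled_fun, x0_scaled, bounds=bounds_scaled, method='L-BFGS-B')
x_best = res.x * scale                   # map the solution back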
It sounds like your target function doesn't have well-behaved derivatives. The line jac: array([ 0., 0., -54090399.99999981, 0.]) in the output means that only changing the third variable value has a significant effect. And because the derivative w.r.t. this variable is virtually infinite, there is probably something wrong in the function. That is also why the third variable ends up at its maximum.
I would suggest that you take a look at the derivatives, at least at a few points in your parameter space. Compute them using finite differences and the default step size of SciPy's fmin_l_bfgs_b, 1e-8. Here is an example of how you could compute the derivatives.
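A minimal finite-difference sketch (my own, using that same 1e-8 step):

import numpy as np

def fd_gradient(f, x, eps=1e-8):
    # forward differences, one coordinate at a time
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (f(xp) - f0) / eps
    return g

# e.g. fd_gradient(fun_mmog, [4435, 3.54, 0.13, 2.4])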
Try also plotting your target function. For instance, keep two of the parameters constant and let the two others vary. If the function has multiple local optima, you shouldn't use gradient-based methods like BFGS.
How difficult is it to get an analytical expression for the gradient? If you have that you can then approximate the product of Hessian with a vector using finite difference. Then you can use other optimization routines available.
Among the various optimization routines available in SciPy, the one called TNC (truncated Newton conjugate-gradient) is quite robust to the numerical values associated with the problem.
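A sketch of what that could look like, assuming you can write an analytic gradient grad_mmog (a hypothetical function; fun_mmog, x0 and bounds are from the question):

from scipy.optimize import minimize

def hessp(x, v, eps=1e-6):
    # finite-difference approximation of the Hessian-vector product H(x) @ v
    return (grad_mmog(x + eps*v) - grad_mmog(x)) / eps

res_ncg = minimize(fun_mmog, x0, jac=grad_mmog, hessp=hessp, method='Newton-CG')
res_tnc = minimize(fun_mmog, x0, jac=grad_mmog, bounds=bounds, method='TNC')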
The Nelder-Mead simplex method (suggested by Cristián Antuña in the comments above) is well known to be a good choice for optimizing (possibly ill-behaved) functions with no knowledge of derivatives (see Numerical Recipes in C, Chapter 10).
There are two somewhat specific aspects to your question. The first is the constraints on the inputs, and the second is a scaling problem. The following suggests solutions to these points, but you might need to manually iterate between them a few times until things work.
Input Constraints
Assuming your input constraints form a convex region (as your examples above indicate, but I'd like to generalize it a bit), then you can write a function
def is_in_bounds(p):
    # return True iff p is inside the feasible region
    ...
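For the box bounds from the question, a minimal concrete version could be:

def is_in_bounds(p):
    # box constraints from the question; None means unbounded
    bounds = [(4000, 6000), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
    return all((lo is None or lo <= v) and (hi is None or v <= hi)
               for v, (lo, hi) in zip(p, bounds))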
Using this function, assume that the algorithm wants to move from point from_ to point to, where from_ is known to be in the region. Then the following function will efficiently find the farthest point on the line segment between the two points to which it can still proceed:
from numpy.linalg import norm

def progress_within_bounds(from_, to, eps):
    """
    from_ -- source point (known to be inside the region)
    to    -- target point
    eps   -- Euclidean precision along the line
    """
    if norm(to - from_) < eps:
        return from_
    mid = (from_ + to) / 2
    if is_in_bounds(mid):
        return progress_within_bounds(mid, to, eps)
    return progress_within_bounds(from_, mid, eps)
(Note that this function can be optimized for some regions, but it's hardly worth the bother, as it doesn't even call your original objective function, which is the expensive one.)
One of the nice aspects of Nelder-Mead is that the function does a series of steps which are very intuitive. Some of these steps can obviously throw you out of the region, but it's easy to modify this. Here is an implementation of Nelder-Mead with the modifications marked between pairs of comment lines of the form ##################################################################:
import copy

'''
    Pure Python/Numpy implementation of the Nelder-Mead algorithm.
    Reference: https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
'''

def nelder_mead(f, x_start,
                step=0.1, no_improve_thr=10e-6, no_improv_break=10, max_iter=0,
                alpha=1., gamma=2., rho=-0.5, sigma=0.5):
    '''
    @param f (function): function to optimize, must return a scalar score
        and operate over a numpy array of the same dimensions as x_start
    @param x_start (numpy array): initial position
    @param step (float): look-around radius in initial step
    @param no_improve_thr, no_improv_break (float, int): break after
        no_improv_break iterations with an improvement lower than no_improve_thr
    @param max_iter (int): always break after this number of iterations.
        Set it to 0 to loop indefinitely.
    @param alpha, gamma, rho, sigma (floats): parameters of the algorithm
        (see Wikipedia page for reference)
    '''

    # init
    dim = len(x_start)
    prev_best = f(x_start)
    no_improv = 0
    res = [[x_start, prev_best]]

    for i in range(dim):
        x = copy.copy(x_start)
        x[i] = x[i] + step
        score = f(x)
        res.append([x, score])

    # simplex iter
    iters = 0
    while 1:
        # order
        res.sort(key=lambda x: x[1])
        best = res[0][1]

        # break after max_iter
        if max_iter and iters >= max_iter:
            return res[0]
        iters += 1

        # break after no_improv_break iterations with no improvement
        print('...best so far:', best)

        if best < prev_best - no_improve_thr:
            no_improv = 0
            prev_best = best
        else:
            no_improv += 1

        if no_improv >= no_improv_break:
            return res[0]

        # centroid
        x0 = [0.] * dim
        for tup in res[:-1]:
            for i, c in enumerate(tup[0]):
                x0[i] += c / (len(res)-1)

        # reflection
        xr = x0 + alpha*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        # keep the reflected point inside the region
        # (prog_eps: line-search precision, chosen by the caller)
        xr = progress_within_bounds(x0, x0 + alpha*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ##################################################################
        rscore = f(xr)
        if res[0][1] <= rscore < res[-2][1]:
            del res[-1]
            res.append([xr, rscore])
            continue

        # expansion
        if rscore < res[0][1]:
            xe = x0 + gamma*(x0 - res[-1][0])
            ##################################################################
            ##################################################################
            xe = progress_within_bounds(x0, x0 + gamma*(x0 - res[-1][0]), prog_eps)
            ##################################################################
            ##################################################################
            escore = f(xe)
            if escore < rscore:
                del res[-1]
                res.append([xe, escore])
                continue
            else:
                del res[-1]
                res.append([xr, rscore])
                continue

        # contraction
        xc = x0 + rho*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        xc = progress_within_bounds(x0, x0 + rho*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ##################################################################
        cscore = f(xc)
        if cscore < res[-1][1]:
            del res[-1]
            res.append([xc, cscore])
            continue

        # reduction
        x1 = res[0][0]
        nres = []
        for tup in res:
            redx = x1 + sigma*(tup[0] - x1)
            score = f(redx)
            nres.append([redx, score])
        res = nres
Note: this implementation is GPL-licensed, which is either fine for you or not. It's extremely easy to modify NM from any pseudocode, though, and you might want to throw in simulated annealing in any case.
Scaling
This is a trickier problem, but jasaarim has made an interesting point regarding that. Once the modified NM algorithm has found a point, you might want to run matplotlib.contour while fixing a few dimensions, in order to see how the function behaves. At this point, you might want to rescale one or more of the dimensions, and rerun the modified NM.
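A sketch of such a slice plot (my own; the two fixed values are just the starting point from the question):

import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(4000, 6000, 50)  # range of the first parameter
w = np.linspace(3.00, 4.50, 50)  # range of the second parameter
U, W = np.meshgrid(u, w)
Z = np.array([[fun_mmog([ui, wi, 0.13, 2.4]) for ui in u] for wi in w])
plt.contour(U, W, Z, levels=30)
plt.xlabel('parameter 1')
plt.ylabel('parameter 2')
plt.show()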