Multiprocessing nested numerical integrals in Python

I'm working with nested numerical integrals in Python, where the limits of each layer depend on the next layer out. The overall structure of my code looks like this:
import numpy as np
import scipy.integrate as si

def func(x1, x2, x3, x4):
    return x1**2 - x2**3 + x3*x2 - x4*x3**3

def int1():
    """Integrates `int2` over x1."""
    a1, b1 = -1, 3

    def int2(x1):
        """Integrates `func` over x2 at given x1."""
        #partial_func1 = lambda x2: func(x1, x2)
        b2 = 1 - np.abs(x1)
        a2 = -np.abs(x1**3)

        def int3(x2):
            a3 = x2
            b3 = -a3

            def int4(x3):
                partial_func = lambda x4: func(x1, x2, x3, x4)
                a4 = 1 + np.abs(x3)
                b4 = -a4
                return si.quad(partial_func, a4, b4)[0]

            return si.quad(int4, a3, b3)[0]

        return si.quad(int3, a2, b2)[0]

    return si.quad(int2, a1, b1)[0]

result = int1()  # -22576720.048151683
In the full version of my code, the integrand and the limits are complicated, and a run takes several hours, which is inconvenient. Each integral looks easily parallelizable, though: I should be able to use multiprocessing to distribute the integration over multiple CPUs and speed up the run time.
Referring to some other posts on Stack Overflow, I tried the following:
def testfunc(intfunc, fmin, fmax):
    return si.quad(intfunc, fmin, fmax, epsabs=10**-40)[0]

result = pool.map(partial(partial(testfunc, intfunc=int4), fmin=a3), [b3])
But I got an error that the local object can't be pickled.
Another resource I came across was at http://catherineh.github.io/programming/2016/10/04/parallel-integration-for-mere-mortals
But I need a function where I can pass the limits through as inputs as well (hence my use of partials).
Does anyone know how to resolve these issues? A version of pool.map that could handle multiple inputs would be ideal, but if there's something wrong with my use of partials, I'd be glad to find that out too.
Thanks in advance and let me know if there's anything here that can be cleared up!
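For reference, here is a minimal sketch (with a made-up toy integrand, not my real one) of the two constraints involved: the function handed to the pool must live at module top level so it can be pickled, and pool.starmap handles multiple inputs without stacked partials.

import numpy as np
import scipy.integrate as si
from multiprocessing import Pool

# Module-level function: locally defined functions can't be pickled by pool.map.
def integrate_one(fmin, fmax):
    # toy integrand, for illustration only
    return si.quad(lambda x: np.exp(-x**2), fmin, fmax)[0]

if __name__ == '__main__':
    tasks = [(-1, 1), (-2, 2), (-3, 3)]  # (fmin, fmax) pairs
    with Pool() as pool:
        # starmap unpacks each tuple into integrate_one(fmin, fmax)
        results = pool.starmap(integrate_one, tasks)
    print(results)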

This answer probably isn't satisfactory, but hopefully it'll give some insight into which field the question falls into.
To reiterate, the original problem is to compute the quadruple integral
integrate(
    integrate(
        integrate(
            integrate(
                f(x1, x2, x3, x4),
                [1 + abs(x3), -1 - abs(x3)]
            ),
            [x2, -x2]
        ),
        [-abs(x1**3), 1 - abs(x1)]
    ),
    [-1, 3])
Mathematically, one could formulate this as
integrate(f(x1, x2, x3, x4), Omega)
where Omega is a four-dimensional domain defined by the integral limits above. Had the domain been in one, two, or three dimensions, then the answer to your question would be clear:
Discretize your complex domain into lines, triangles, or tetrahedra (those are the simplices in dimensions 1, 2, 3, respectively) (using one of many mesh tools), and then
use numerical quadrature on each of the lines/triangles/tetrahedra (e.g., from here).
Unfortunately, I'm not aware of any tool that discretizes a four-dimensional domain into 4-simplices, nor of quadrature rules for 4-simplices (except perhaps the vertex and midpoint rules). Both, however, would be possible to create in general; in particular, a bunch of quadrature rules should be easy to come up with.
For the sake of completeness, let me mention that there is at least one class of domains for which integration rules exist in any dimension: the hypercube.
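That said, the domain here is given by nested limits rather than a general mesh, so scipy can evaluate the whole thing in one call with scipy.integrate.nquad (under the hood this is still nested adaptive quadrature, not a genuine four-dimensional rule). A minimal sketch using the integrand from the question:

import numpy as np
from scipy.integrate import nquad

def func(x4, x3, x2, x1):
    return x1**2 - x2**3 + x3*x2 - x4*x3**3

# Limits are listed innermost (x4) to outermost (x1); each callable
# receives the remaining outer variables, in order.
result, error = nquad(func, [
    lambda x3, x2, x1: (1 + np.abs(x3), -1 - np.abs(x3)),  # x4 limits
    lambda x2, x1: (x2, -x2),                              # x3 limits
    lambda x1: (-np.abs(x1**3), 1 - np.abs(x1)),           # x2 limits
    (-1, 3),                                               # x1 limits
])
print(result)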

Update:
After much testing and restructuring, it seems that the best way to take care of this is not to nest the functions or the definitions, but rather to make use of the args parameter in the scipy.integrate.quad function to pass external variables through to inner integrations.
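For concreteness, a minimal sketch of that restructuring for the example above: flat, module-level functions, with quad passing the integration variable first and the args tuple after it. Because nothing is defined locally any more, these functions are also picklable for multiprocessing.

import numpy as np
import scipy.integrate as si

def func(x4, x3, x2, x1):
    return x1**2 - x2**3 + x3*x2 - x4*x3**3

def int4(x3, x2, x1):
    a4 = 1 + np.abs(x3)
    return si.quad(func, a4, -a4, args=(x3, x2, x1))[0]

def int3(x2, x1):
    return si.quad(int4, x2, -x2, args=(x2, x1))[0]

def int2(x1):
    return si.quad(int3, -np.abs(x1**3), 1 - np.abs(x1), args=(x1,))[0]

result = si.quad(int2, -1, 3)[0]  # same value as the nested version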
Many thanks to those who commented!

Related

Finding the minimum distance from a point to a curve

I need to find the minimum distance from a point (X, Y) to a curve defined by four coefficients C0, C1, C2, C3, i.e. y = C0 + C1*x + C2*x^2 + C3*x^3.
I have used a numerical approach: np.linspace and np.polyval to generate discrete (X, Y) points on the curve, then Shapely's Point, MultiPoint and nearest_points to find the nearest point, and finally np.linalg.norm to find the distance.
This is a numerical approach that discretizes the curve.
My question is how can I find the distance by analytical methods and code it?
Problem definition
For the sake of simplicity, let's use P for the point, with coordinates Px and Py, and let's call the function f(x).
Another way to look at your problem is that you're trying to find an x that minimizes the distance between P and the point (x, f(x)).
The problem can then be formulated as a minimization problem:
Find x that minimizes (x - Px)² + (f(x) - Py)²
(Note that we can drop the square root that should be there, because the square root is monotonic and doesn't change the optima. Some details here.)
Analytical solution
The fully analytical way to solve this would be a pen-and-paper approach: expand the expression, compute its derivative, and find where the derivative vanishes to locate the extrema. (This is a lengthy process to do analytically; @Yves Daoust addresses it in his answer. Either do that, or use a numerical solver for this part, e.g. a version of Newton's method.) Then check whether each extremum is a maximum or a minimum by evaluating the function there and sampling a few points around it to see how the function evolves. From this you can find the global minimum, and that gives you the x you're looking for. But developing this is content probably better suited for Mathematics.
Optimization approach
So instead I'm going to suggest a solution that uses numerical minimization without a sampling approach. You can use the minimize function from scipy to solve the minimization problem.
from scipy.optimize import minimize

# Define the curve
C0 = -1
C1 = 5
C2 = -5
C3 = 6
f = lambda x: C0 + C1 * x + C2 * x**2 + C3 * x**3

# Define the function to minimize: squared distance to P
p_x = 12
p_y = -7
min_f = lambda x: (x - p_x)**2 + (f(x) - p_y)**2

# Minimize
min_res = minimize(min_f, 0)  # the starting point doesn't really matter here

# Show result
print("Closest point is x=", min_res.x[0], " y=", f(min_res.x[0]))
Here I used your function with dummy values but you could use any function you want with this approach.
You need to differentiate (x - X)² + (C0 + C1 x + C2 x² + C3 x³ - Y)² and find the roots of the derivative. But that derivative is a quintic polynomial (fifth degree) with general coefficients, so the Abel-Ruffini theorem fully applies, meaning that there is no solution in radicals.
There is a known solution anyway, by reducing the equation (via a lengthy substitution process) to the form x^5 - x + t = 0 known as the Bring–Jerrard normal form, and getting the solutions (called ultraradicals) by means of the elliptic functions of Hermite or evaluation of the roots by Taylor.
Personal note:
This approach is virtually foolish, as there exist ready-made numerical polynomial root-finders, and the ultraradical function is not easy to evaluate.
Anyway, looking at the plot of x^5 - x, one can see that it is intersected once or three times by any horizontal line, and finding an interval with a change of sign is easy. With that, you can obtain an accurate root by dichotomy (and far from the extrema, Newton will easily converge).
After having found this root, you can deflate the polynomial to a quartic, for which explicit formulas by radicals are known.
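For anyone who wants the numerical root-finder route in code, here is a minimal sketch using numpy's Polynomial class with the coefficients and point from the earlier answer; numpy's companion-matrix root-finder stands in for the bisection/Newton and deflation steps described above.

import numpy as np
from numpy.polynomial import Polynomial as P

# Coefficients and point reused from the earlier example
C0, C1, C2, C3 = -1, 5, -5, 6
px, py = 12, -7

# Squared distance d(x) = (x - px)^2 + (f(x) - py)^2 as a polynomial
f = P([C0, C1, C2, C3])
d = P([-px, 1])**2 + (f - py)**2

# The real roots of the quintic d'(x) are the candidate extrema
crit = d.deriv().roots()
real_crit = crit[np.isclose(crit.imag, 0)].real

# Evaluate d at the candidates and keep the global minimum
x_best = real_crit[np.argmin(d(real_crit))]
print("Closest point: x =", x_best, " y =", f(x_best))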

Fit a given function

I have a basic question. I want to use scikit-learn to fit a polynomial model to my data. I could do that with PolynomialFeatures, but I want to fit a polynomial with a specific form.
For example, if I have 2 features I want to create a model such that:
F = a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * x1^2 + a5 * x2^3
Can you please guide me on how I can do that? I could not find any example that I can use for my purpose.
I've used the following method to try and fit curves that map to specific functions; you might be able to rework some of this to meet your needs.
First, define your model function as a function that takes a value of x and some set of parameters, and returns the associated y value.
You'll need to be sure that your function really is a function in the mathematical sense (i.e. that it returns a single value of y for any input value of x).
This is your "model" function - for example:
# A kind of elliptic curve with two parameters n, m
def mn_elliptical(x, m, n):
    return 1 - ((1 - (x)**(n))**(m))
For 2-dimensional models (i.e. where inputs are x and y, and there's a third output of z) there are ways of formulating your model and input data discussed here: https://scipython.com/blog/non-linear-least-squares-fitting-of-a-two-dimensional-data/ - also, for an example of this in practice, see end of this answer.
Then, using the scipy.optimize.curve_fit method, you need to feed it a pair of arrays, one for the x's and one for the y's of the known observations you have collected, against which the fitting will take place.
xdata = [ ... all x values for all observations ... ]
ydata = [ ... all observed y values in the same order as above ... ]
from scipy.optimize import curve_fit
fitp, fite = curve_fit(mn_elliptical, xdata, ydata)
This will yield fitp, the optimal parameters found by the method, and fite, the estimated covariance of those parameters, which describes how much uncertainty remains after the fit. If your fite values are too big, then it's likely your model function isn't a good one.
You can help guide the process by helping set the expected bounds of the parameters you want to return, and this can speed things up significantly - or, if you've got a skewy function, it can help focus in on the right values that would otherwise get missed. These details are covered in more depth in the linked scipy docs.
Having validated and accepted the amount of error, you can then retrieve the parameters from fitp and use these to feed additional values of x through your (now fitted) model and get predicted results.
new_y = mn_elliptical(x, *fitp)
Which will yield a single result - use more advanced numpy/pandas methods to generate multiple results from arrays of x values that you supply.
Just to demonstrate that 2-dimensional use-case, let's imagine a crudely plotted circle, with points A,B,C,D,E at the following xy coordinates (4,1), (6.5,3.5), (4,6), (1.5,3.5), (2,2)
We know that a circle follows the formula (x-cx)^2+(y-cy)^2=r^2, so can write that in a fittable function form:
def circ(xy, cx, cy, r):
    x, y = xy
    return (((x - cx)**2) + ((y - cy)**2)) - (r**2)
Notice that I've flattened the return value to always be zero (at least for values of xy that conform) due to the nature of the formula.
We use the observed data points laid out here:
xdata = np.array([4,6.5,4,1.5,2])
ydata = np.array([1,3.5,6,3.5,2])
zdata = np.array([0,0,0,0,0])
And transform that data based on the method used in the linked article on 2-dimensions.
xdata = np.vstack((xdata.ravel(), ydata.ravel()))
ydata = zdata.ravel()
And then feed this into the 2-d circ function.
curve_fit(circ,xdata,ydata)
This yields:
(array([ 4. ,  3.5, -2.5]),
 array([[ 0., -0., -0.],
        [-0.,  0., -0.],
        [-0., -0.,  0.]]))
The first part describes the (cx, cy, r) parameters from my circ function: the first two are the x, y coordinates of the centre, and the third is the radius of the circle. Based on my pencil-and-paper drawing, this is pretty much on the money.
The second part describes the errors encountered, which for this hand-drawn example are not horrendous.
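To tie this back to the exact form asked about in the question, the same machinery applies directly. A minimal sketch, where the training data and the "true" coefficients are made up purely for illustration:

import numpy as np
from scipy.optimize import curve_fit

# The specific form from the question: five coefficients, two features
def model(X, a1, a2, a3, a4, a5):
    x1, x2 = X
    return a1*x1 + a2*x2 + a3*x1*x2 + a4*x1**2 + a5*x2**3

# Made-up data generated from known coefficients, to check the recovery
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 50)
x2 = rng.uniform(-1, 1, 50)
y = 2*x1 - x2 + 0.5*x1*x2 + 3*x1**2 - 1.5*x2**3

params, cov = curve_fit(model, np.vstack((x1, x2)), y)
print(params)  # should be close to [2, -1, 0.5, 3, -1.5]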

Optimization of two vectors in python

I need to optimize two vectors x and y. The objective function is a function f(x, y) of both vectors, and x and y are also related by a - x/y = 0. Is there a well-known method to solve this in Python?
Well, your question is general; it'd be great if you could provide more details. But here is a code snippet (grabbed from here) that you can edit. scipy has an optimize module with a couple of methods for minimizing functions.
import numpy as np
from scipy.optimize import minimize

# minimize passes a single vector of variables, so unpack x and y from it
def f(v):
    x, y = v
    return 10 - x/y

# initial values
x0 = 1.3
y0 = 0.5

res = minimize(f, [x0, y0], method='...')  # fill in an algorithm, e.g. 'Nelder-Mead'
print(res.x)
If you provide more info like what algorithm you want to use, I can provide more precise code.
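If the relation a - x/y = 0 must hold, one common route is to pass it to minimize as an equality constraint, using a method that supports constraints such as SLSQP. A minimal sketch, where the objective and the value of a are made up for illustration:

from scipy.optimize import minimize

a = 2.0  # hypothetical value of the constant in a - x/y = 0

def f(v):
    x, y = v
    return (x - 1)**2 + (y - 2)**2  # made-up objective

# Equality constraint: a - x/y == 0
cons = [{'type': 'eq', 'fun': lambda v: a - v[0]/v[1]}]

res = minimize(f, x0=[1.0, 1.0], method='SLSQP', constraints=cons)
print(res.x)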

Constrained global optimization tuning [mystic]

Background
Auto-tuner for a car application: the application may change depending on the model of the car, so naturally the objective function is going to change as well. The problem is to tune the parameters to the optimal ones for the specific car model. Input: car model; output: optimal parameters for the application for that car model. I want to solve this with optimization.
I'm trying to minimize a complex nonlinear function, subject to two nonlinear constraints: one inequality and one equality constraint. The problem is not bounded per se, but I've put bounds on the parameters anyway to help speed up the optimization, since I know more or less where the correct parameters lie. The parameters are [x0, x1, x2, x3].
I've used the scipy.optimize.minimize() function with the SLSQP method and found good results when the problem is bounded correctly. However, scipy.optimize.minimize() is a local optimizer that works by solving a sequence of QP subproblems, and I don't think my problem is of that kind. I've therefore started using a global optimization method with mystic (mystic.differential_evolution). Since I'm not an expert in global optimization, I naturally have some questions.
The problem
If I choose the bounds too wide, the optimizer (mystic.differential_evolution) will stop iterating after a while and print:
STOP("ChangeOverGeneration with {'tolerance': 0.005, 'generations': 1500}")
When I run the solution that the optimizer found, I see that the result is not as good as when I shrink the bounds. Obviously the global optimizer has not found the global optimum, yet it stopped iterating. I know that there are multiple parameter sets that yield the same global minimum.
Since the objective function may change with the car model, I want the bounds to remain relatively broad in case the global optimum moves, which would change the correct parameters.
Question
How do I tune the settings of the optimizer to get it to keep searching and find the global optimum?
Is the npop = 10*dim rule a good approach to the problem?
Can I broaden the horizon of the optimizer's search so that it finds the optimal parameters it missed?
Code
def optimize_mystic_wd(T, param_wp, w_max, safe_f):
    # mystic
    import mystic
    from mystic.monitors import VerboseLoggingMonitor
    from mystic.penalty import quadratic_inequality
    from mystic.penalty import quadratic_equality
    from mystic.solvers import diffev2
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
    # tools
    from mystic.tools import getch
    import pylab
    import numpy as np

    def objective(x):
        from model_func import model
        [val, _] = model(param_wp, x)
        return -val

    def penalty1(x):  # <= 0.0
        t = np.linspace(0, T, 100)
        wd = x[0] * np.sin(x[1] * t + x[3]) + x[2]
        index = np.argmax(wd)
        t_max = t[index]
        return (x[0] * np.sin(x[1] * t_max + x[3]) + x[2]) - 2*np.pi

    def penalty2(x):  # == 0.0
        return x[0] * (np.cos(x[3]) - np.cos(x[1] * T + x[3])) / x[1] + x[2] * T - 2 * np.pi

    #quadratic_inequality(penalty1, k=1e12)
    #quadratic_equality(penalty2, k=1e12)
    def penalty(x):
        return 0.0

    b1 = (0, 2*np.pi)
    b2 = (0, 2*np.pi/(2*T))
    b3 = (0, 2*np.pi)
    b4 = (0, 2*np.pi/T)
    bounds = [b1, b2, b3, b4]

    stepmon = VerboseLoggingMonitor(1, 1)
    result = diffev2(objective, x0=bounds, bounds=bounds, penalty=penalty,
                     npop=40, gtol=1500, disp=True, full_output=True,
                     itermon=stepmon, handler=True, retall=True, maxiter=4000)
I'm the mystic author.
With regard to your questions:
Differential evolution can be tricky. It randomly mutates your candidate solution vector, and accepts changes that improve the cost. The default stop condition is that it quits when ngen steps have passed with no improvement. This means that if the solver stops early, it's probably not even in a local minimum. There are several ways, however, to help ensure the solver has a better chance of finding the global minimum:
Increase ngen, the number of steps to go without improvement.
Increase npop, the number of candidate solutions each iteration.
Increase the maximum number of iterations and function evaluations possible.
Pick a different termination condition that doesn't use ngen.
Personally, I usually use a very large ngen as the first approach. The consequence is that the solver will tend to run a very long time until it randomly finds the global minimum. This is expected for differential evolution.
Yes.
I'm not sure what you mean by the last question. With mystic, you certainly can broaden your parameter range, either at optimizer start or at any point along the way. If you use the class interface (DifferentialEvolutionSolver, not the "one-liner" diffev), then you have the option to do the following (see the sketch after this list):
Save the solver's state at any point in the process
Restart the solver with different solver parameters, including range.
Step the optimizer through the optimization, potentially changing the range (or constraints, or penalties) at any step.
Restrict (or remove restrictions on) the range of the solver by adding (or removing) constraints or penalties.
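A minimal sketch of that class interface on a toy problem (mystic's built-in Rosenbrock model, with made-up bounds); setting a large generations value in the termination condition makes the solver keep searching much longer:

from mystic.solvers import DifferentialEvolutionSolver2
from mystic.termination import ChangeOverGeneration
from mystic.models import rosen

ndim, npop = 4, 40
solver = DifferentialEvolutionSolver2(ndim, npop)
solver.SetRandomInitialPoints(min=[-5]*ndim, max=[5]*ndim)
solver.SetStrictRanges(min=[-5]*ndim, max=[5]*ndim)

# Keep iterating until 3000 generations pass with less than 1e-8 improvement
solver.Solve(rosen, termination=ChangeOverGeneration(tolerance=1e-8, generations=3000))
print(solver.bestSolution)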
Lastly, you might want to look at mystic's ensemble solvers, which enable you to sample N optimizers from a distribution, each with different initial conditions. In this case, you'd pick fast local solvers... with the intent of quickly searching the local space, but sampling over the distribution helping guarantee you have searched globally. It's like a traditional grid search, but having optimizers start at each point of the "grid" (and using a distribution, and not necessarily a grid).
I might also suggest having a look at this example, which demonstrates how to use mystic.search.Searcher, whose purpose is to (for example) efficiently keep spawning solvers looking for local minima until you have found all the local minima, and hence the global minimum.

Performing a double integral over a matrix of limits

I have recently been learning how to perform double integrals in python. This is what I am using:
myint = dblquad(lambda xa, xb: np.exp(-(xa - xb)**2), -np.inf, x1, lambda x: -np.inf, lambda x: x2)
and for testing purposes I have chosen x1 and x2 to be say 5 and 10. This seems to work.
But in reality, my x1 = [1,2,3,4,5] and x2 = [5,6,7,8,9] and I want the double integral to be performed over every combination of x1 and x2 i.e. a matrix. I could do this with 2 for loops I guess, but I thought there might be a better way.
So my question is just: how do I perform a double integration over a matrix of limits?
Thank you.
edit:
I got the following warning:
UserWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
Does this mean that it doesn't converge? I don't really understand the message.
When I plot:
y = exp(-(x-5)^2)
for example, it just looks like a Gaussian curve, so there is no problem integrating over that, right? Is the problem because of the double integral?
Thank you.
edit:
Ah, I see. Thanks Raman Shah, I understand the problem now.
Using itertools you can create an iterator of limits to walk over. This is essentially a double loop, but far more extensible: with itertools.product you can have an arbitrary number of inputs, and you don't store all the limits at once:
import numpy as np
from scipy.integrate import dblquad
import itertools

f = lambda xa, xb: np.exp(-(xa - xb)**2)

def intg(limits):
    x1, x2 = limits
    return dblquad(f, -np.inf, x1,
                   lambda x: -np.inf,
                   lambda x: x2)

X1 = np.arange(1, 6)
X2 = np.arange(5, 10)

for limit in itertools.product(X1, X2):
    print(limit, intg(limit))
If you need more speed, you can look into the multiprocessing module for parallel computation since each process is independent.
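For instance, a minimal sketch continuing the snippet above (intg is defined at module level, so it pickles cleanly, and each (x1, x2) pair is integrated by a separate worker):

from multiprocessing import Pool

if __name__ == '__main__':
    limits = list(itertools.product(X1, X2))
    with Pool() as pool:
        results = pool.map(intg, limits)
    for limit, res in zip(limits, results):
        print(limit, res)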
Why not use Python's zip function to feed exactly the values from each tuple that you want treated as inputs to your double integral, and then use map/apply to operate on those discrete pairs?
