Sometimes I get a wrong result when I integrate over infinite boundaries in Python. Here is a simple example to illustrate my confusion:
import numpy as np
from scipy.integrate import quad

def integrand1(x):
    return np.exp(-(x - 1.0)**2)

def integrand2(x):
    return np.exp(-(x - 100.0)**2)

solution1 = quad(integrand1, -np.inf, np.inf)
print(solution1)

solution2 = quad(integrand2, -np.inf, np.inf)
print(solution2)
The output is:
(1.7724538509055159, 3.668332157626072e-11)
(0.0, 0.0)
I don't understand why the second integral is wrong while the first one is correct. It would be great if you could share some tricks for handling infinite limits in Python.
There is nothing inherently wrong with your code. The results you get are due to the quad algorithm being an approximate method whose accuracy, from what I gathered doing some tests, greatly depends on where the midpoint of the integration interval is located relative to the x-axis interval where the integrand is significantly different from 0.
The midpoint of the integration interval in case of a (-inf,+inf) interval is always 0 (see comment to relevant Fortran code here, starting at line 238), and (sadly) cannot be configured. Your integrand2 function is centered on x=100, which is too far from the quad algorithm's midpoint for it to be accurate enough.
It would be nice to be able to specify the midpoint in case of integration between -inf and +inf, but the good news is that you can implement it yourself with a small function wrapper. First you need a wrapper for your integrand function, in order to shift it arbitrarily along the x axis:
def shift_integrand(integrand, offset):
    def dec(x):
        return integrand(x - offset)
    return dec
This generates a new function based on any integrand you like, just shifting it along the x axis according to the offset parameter. So if you do something like this (using your integrand1 and integrand2 functions):
new_integrand1 = shift_integrand(integrand1, -1.0)
print(new_integrand1(0.0))
new_integrand2 = shift_integrand(integrand2, -100.0)
print(new_integrand2(0.0))
you get:
1.0
1.0
Now you need another wrapper for the quad function, in order to be able to pass a new midpoint:
def my_quad(func, a, b, midpoint=0.0, **kwargs):
    if midpoint != 0.0:
        func = shift_integrand(func, -midpoint)
    return quad(func, a, b, **kwargs)
Finally, knowing where your integrands are centered, you can call it thus:
solution2 = my_quad(integrand2, -np.inf, np.inf, midpoint=100.0)
print(solution2)
Which yields:
(1.772453850905516, 1.4202639944499085e-08)
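Another common trick, if you know roughly where the integrand's mass is concentrated, is to split the doubly infinite interval at that point yourself, so that each semi-infinite piece has its peak at the finite endpoint. A minimal sketch, reusing the integrand2 from above:

from scipy.integrate import quad
import numpy as np

def integrand2(x):
    return np.exp(-(x - 100.0)**2)

# Split (-inf, inf) at the peak and integrate the two halves separately.
left = quad(integrand2, -np.inf, 100.0)
right = quad(integrand2, 100.0, np.inf)
print(left[0] + right[0])  # close to sqrt(pi), about 1.7724538509055159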
I want to compute the following double integral:

$$\int_{0}^{2\pi} \int_{0}^{\arccos(\cos x)} e^{-(2 + \cos x - \cos y)} \, dy \, dx$$

I want to use the dblquad method from the scipy.integrate package, which allows you to do double integrals where the limits of the inner integral are functions of the outer integration variable:
import scipy.integrate as spi
import numpy as np
x_limit = 0
y_limit = lambda x: np.arccos(np.cos(x))
integrand = lambda x, y: np.exp(-(2+np.cos(x)-np.cos(y)))
low_limit_y = 0 # inner integral
up_limit_y = y_limit
low_limit_x = x_limit # outer integral
up_limit_x = 2*np.pi-x_limit
integral = spi.dblquad(integrand, low_limit_x, up_limit_x, low_limit_y, up_limit_y)
print(integral)
Output:
(0.6934912861906996, 2.1067956428653226e-12)
The code runs, but does not give me the right answer. Using Wolfram Alpha I get the right answer: 3.58857
The only thing I've noticed is that the values from the two methods agree when the signs on the cosines are switched from + to - and vice versa (checked with Wolfram Alpha).
However, I have no plausible reason why this should be the case. Does anyone have any clue what is going on here? I can split the calculation into an inner integral evaluated in a loop over all values of x, summing the results afterwards, which gives the right answer, but that is really quite slow.
Take another look at the docstring of dblquad; it says
Return the double (definite) integral of ``func(y, x)`` from ``x = a..b``
and ``y = gfun(x)..hfun(x)``.
Note the order of arguments of func(y, x): y first, then x.
If you change your definition of integrand to
integrand = lambda y, x: np.exp(-(2+np.cos(x)-np.cos(y)))
you get the expected answer. That is also (in effect) what you did when you changed the signs of the cos terms in the integrand.
(You're not the first one to get tripped up by the expected order of the arguments to func.)
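Put together, a corrected version of the snippet from the question (only the argument order of the integrand changes):

import scipy.integrate as spi
import numpy as np

# dblquad calls the integrand as func(y, x): inner variable first.
integrand = lambda y, x: np.exp(-(2 + np.cos(x) - np.cos(y)))

low_limit_y = 0                              # inner integral
up_limit_y = lambda x: np.arccos(np.cos(x))
low_limit_x = 0                              # outer integral
up_limit_x = 2 * np.pi

integral = spi.dblquad(integrand, low_limit_x, up_limit_x, low_limit_y, up_limit_y)
print(integral)  # should now agree with the Wolfram Alpha value, about 3.58857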
To clarify what I mean, my issue is with a simulated annealing problem where I want to find the theta that gives me the max area of a shape:
def Area(theta):
    # returns the area for a single theta
    ...

def SimAnneal(space, func, T):
    # space is some linspace
    # func is some function that takes in a theta and outputs an area
    # T = separate temperature parameter that is not relevant for this problem
    # returns the maximum area from the given thetas
    ...
In this scenario, simulated annealing starts by choosing a random starting theta. My goal is to use the setup above as shown below. Note that the input to Area() is a single theta, but my hope was that there is some way to make ? a "potential" list of thetas that the next function, SimAnneal(), can choose from.
x = np.linspace(0, 100, 1000)
func = Area(?)
T = 4
SimAnneal(x, func, T)
What should I put into ? in order for SimAnneal to output correctly?
In other words, is there a ? that can satisfy the condition of being a single float parameter but carry all the possible float parameters in some linspace?
You can use np.vectorize to apply a func taking a single value as follows:
import numpy as np

def Area(theta):
    pass

def SimAnneal(space, func, T):
    applied_space = np.vectorize(func)(space)

x = np.linspace(0, 100, 1000)
T = 4
SimAnneal(x, Area, T)
Note that np.vectorize won't actually give you the performance improvements you would see with true vectorization. It is instead a convenient interface that exactly fits your need: applying a func that takes a single value to a bunch of values (your space).
Alternatively, you can move the np.vectorize call outside of SimAnneal like this:
def SimAnneal(space, func, T):
    applied_space = func(space)

x = np.linspace(0, 100, 1000)
func = np.vectorize(Area)
T = 4
SimAnneal(x, func, T)
This is closer to your original example.
First, there is no data type that is both a float and a collection. Additionally, you want to pass the Area function itself into SimAnneal rather than the return value of a call to it, as you currently have it:
SimAnneal(x, Area, T)
From a design standpoint, it makes more sense to leave the Area function as is, taking a single float as a parameter. That said, it is relatively simple to run a list of values through a single function and store each output together with the theta that created it, using a dictionary comprehension. In the example below, thetas is the list of thetas you want to choose from:
areas = {i: Area(i) for i in thetas}
From there you can then search through the new dictionary to find the theta that produced the greatest area:
max_theta = list(areas.keys())[0]  # start with the first theta
for theta, area in areas.items():
    if area > areas[max_theta]:
        max_theta = theta
# max_theta now holds the theta that produced the greatest area
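Equivalently, the built-in max with a key function does the same search in one line:

# pick the theta whose stored area is largest
max_theta = max(areas, key=areas.get)
max_area = areas[max_theta]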
I'm writing this program where I have to do a bunch of optimizations, some with only 1 variable and some with 2. At first I was using the basinhopping algorithm from the scipy.optimize library, but I figured that the normal minimize algorithm should do the job. The basinhopping optimization was working, more or less, but it was extremely time-consuming. Now I'm using the normal minimize optimization and I've already figured out how to do it for 1 variable. The code for this is given below. I'm using the COBYLA method here, since it seems to be the only one working. (Nelder-Mead and Powell also work, but they sometimes return a negative x, which I can't have, and since both of those methods are unconstrained, I can't use them.) Hence my first question: what is the difference between all these methods, and why do some of them converge for my function while others don't?
x0 = [50]
func = lambda x: calculate_score(Sector(center, x, rot), im, msk)
ret = op.minimize(func, x0, method='COBYLA')
print(ret)
The code that I use for the optimization with 2 variables is nearly identical to the one for 1 variable, but somehow it gives me the wrong results. Does this have to do with the method I'm using, or what could be the problem here?
x0 = [50, 50]
func = lambda x: calculate_score(Triangle(center, x[0], x[1], rot), im, msk)
ret = op.minimize(func, x0, method='COBYLA')
print(ret.x[0], ret.x[1])
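(As a side note, and only as a sketch assuming the same calculate_score, Triangle, center, rot, im and msk objects as above: scipy.optimize.minimize also accepts a bounds argument for some methods, e.g. L-BFGS-B, which is one way to keep parameters non-negative without an explicitly constrained method.)

import scipy.optimize as op

x0 = [50, 50]
func = lambda x: calculate_score(Triangle(center, x[0], x[1], rot), im, msk)

# Restrict both parameters to [0, +inf); L-BFGS-B estimates gradients numerically.
# Note: a piecewise-constant, integer-valued score may still defeat gradient-based methods.
ret = op.minimize(func, x0, method='L-BFGS-B', bounds=[(0, None), (0, None)])
print(ret.x[0], ret.x[1])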
For the sake of completeness, below is my code for the calculate_score function. I was thinking of calculating the gradient of this function so that the BFGS or L-BFGS-B methods would work, but I'm not quite sure how to do that.
def calculate_score(sect, im, msk):
    # find the circle in the image
    center, radius = find_circle(im)
    # variable to count the score
    score = 0
    # Loop over all pixels of the detected circle
    # This is more time efficient than looping over all pixels
    for i in range(0 - radius, radius):
        for j in range(0 - radius, radius):
            pixel = Point(center.x + i, center.y + j)
            # Check if pixel is in the given sector
            if sect.in_sector(pixel):
                # Check if pixel is white
                if msk[pixel.y, pixel.x]:
                    score -= 1  # Decrement score
                else:
                    score += 1  # Increment score
    print(score)
    return score  # Return score as result
In short, what I would like to know is:
Was it a good idea to switch from basinhopping to minimize? (I just thought basinhopping was extremely slow)
Is the method COBYLA I'm using the best one for this specific case?
Why is my result for 1 variable correct, while my result for 2 variables isn't?
I'm trying to obtain the function expected_W or H that is the result of an integration:

$$H(p, \theta) = \int \int w(p, \theta, \epsilon, \beta)\, f(\beta \mid \theta)\, q(\epsilon)\, d\epsilon\, d\beta$$

where:
theta is a vector with two elements: theta_0 and theta_1
f(beta | theta) is a normal density for beta with mean theta_0 and variance theta_1
q(epsilon) is a normal density for epsilon with mean zero and variance sigma_epsilon (set to 1 by default).
w(p, theta, eps, beta) is a function I take as input, so I cannot predict exactly how it looks. It will likely be non-linear, but not particularly nasty.
This is the way I implement the problem. I'm sure the wrapper functions I make are a mess, so I'd be happy to receive any help on that too.
from __future__ import division
from scipy import integrate
from scipy.stats import norm
import math
import numpy as np

def exp_w(w_B, sigma_eps=1, **kwargs):
    '''
    Integrates the w_B function.
    Input:
    + w_B       : the function to be integrated.
    + sigma_eps : variance of the epsilon term. Set to 1 by default.
    '''
    # The integrand gives everything under the integral:
    # w(B(p, \theta, \epsilon, \beta)) f(\beta | \theta) q(\epsilon)
    def integrand(eps, beta, p, theta_0, theta_1, sigma_eps=sigma_eps):
        q_e = norm.pdf(eps, loc=0, scale=math.sqrt(sigma_eps))
        f_beta = norm.pdf(beta, loc=theta_0, scale=math.sqrt(theta_1))
        return w_B(p=p,
                   theta_0=theta_0, theta_1=theta_1,
                   eps=eps, beta=beta) * q_e * f_beta

    # Limits of integration. Using limited support for now.
    eps_inf = lambda beta: -10   # otherwise: -np.inf
    eps_sup = lambda beta: 10    # otherwise: np.inf
    beta_inf = -10
    beta_sup = 10

    def integrated_f(p, theta_0, theta_1):
        return integrate.dblquad(integrand, beta_inf, beta_sup,
                                 eps_inf, eps_sup,
                                 args=(p, theta_0, theta_1))

    # this integrated_f is the H referenced at the top of the question
    return integrated_f
I tested this function with a simple w function for which I know the analytic solution (this won't usually be the case).
def test_exp_w():
    def w_B(p, theta_0, theta_1, eps, beta):
        return 3 * (p * eps + p * (theta_0 + theta_1) - beta)

    # Function that I get
    integrated = exp_w(w_B, sigma_eps=1)

    # Function that I should get
    def exp_result(p, theta_0, theta_1):
        return 3 * p * (theta_0 + theta_1) - 3 * theta_0

    args = np.random.rand(3)
    d_args = {'p': args[0], 'theta_0': args[1], 'theta_1': args[2]}

    if not np.allclose(integrated(**d_args)[0], exp_result(**d_args)):
        raise Exception("Integration procedure isn't working!")
Hence, my implementation seems to be working, but it's very slow for my purpose. I need to repeat this process tens or hundreds of thousands of times (this is a step in a value function iteration; I can give more info if people think it's relevant).
With scipy version 0.14.0 and numpy version 1.8.1, this integral takes 15 seconds to compute.
Does anybody have any suggestion on how to go about this?
To start with, it would probably help to get bounded domains of integration, but I haven't figured out how to do that, or whether the Gaussian quadrature in SciPy takes care of it in a good way (does it use Gauss-Hermite?).
Thanks for your time.
---- Edit: adding profiling times -----
%lprun shows that most of the time is spent in _distn_infrastructure.py:1529 (pdf) and _continuous_distns.py:97 (_norm_pdf), each with a whopping 83244 calls.
The time taken to integrate your function sounds very long if the function is not a nasty one.
The first thing I suggest you do is profile where the time is spent. Is it spent in dblquad or elsewhere? How many calls are made to w_B during the integration? If the time is spent in dblquad and the number of calls is very high, could you use looser tolerances in the integration?
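For example, dblquad accepts epsabs and epsrel keyword arguments, so a sketch of relaxing the tolerances in the question's integrated_f would be:

def integrated_f(p, theta_0, theta_1):
    # looser tolerances mean fewer integrand evaluations (and less accuracy)
    return integrate.dblquad(integrand, beta_inf, beta_sup,
                             eps_inf, eps_sup,
                             args=(p, theta_0, theta_1),
                             epsabs=1e-4, epsrel=1e-4)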
It seems that the multiplication by the Gaussians actually lets you tighten the integration limits a great deal, since most of the energy of a Gaussian lies within a very small area. You might want to try to calculate reasonably tight bounds. You have already limited the area to -10..10; is there any significant performance change between -100..100, -10..10, and -1..1?
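A sketch of deriving such bounds from the normal densities themselves (assuming theta_0, theta_1 and sigma_eps are available at that point; in exp_w they would be computed inside integrated_f instead):

import math

k = 6.0  # keep k standard deviations on each side, which captures essentially all of the mass
beta_inf = theta_0 - k * math.sqrt(theta_1)
beta_sup = theta_0 + k * math.sqrt(theta_1)
eps_inf = lambda beta: -k * math.sqrt(sigma_eps)
eps_sup = lambda beta: k * math.sqrt(sigma_eps)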
If you know your functions are relatively smooth, then there is a Mickey-Mouse version of the integration:
determine reasonable upper and lower limits in both axes (by the gaussians)
calculate a reasonable grid density (e.g. 100 points in each direction)
calculate the w_B for each of these points (and this will be much faster, if it is possible to require a vectorized version of w_B)
sum it all together
This is very low-tech but also very fast. Whether or not it gives you results which are good enough for the outer iteration is an interesting question. It just might.
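A minimal sketch of that approach, assuming w_B has the same keyword signature as in the question and accepts array inputs:

import numpy as np
from scipy.stats import norm

def grid_expectation(w_B, p, theta_0, theta_1, sigma_eps=1.0, k=6.0, n=100):
    # 1. bounds from the gaussians: mean +/- k standard deviations
    betas = np.linspace(theta_0 - k * np.sqrt(theta_1),
                        theta_0 + k * np.sqrt(theta_1), n)
    epss = np.linspace(-k * np.sqrt(sigma_eps), k * np.sqrt(sigma_eps), n)
    d_beta = betas[1] - betas[0]
    d_eps = epss[1] - epss[0]
    # 2. evaluate the densities and the integrand on the full grid at once
    B, E = np.meshgrid(betas, epss, indexing='ij')
    weights = (norm.pdf(B, loc=theta_0, scale=np.sqrt(theta_1))
               * norm.pdf(E, loc=0.0, scale=np.sqrt(sigma_eps)))
    values = w_B(p=p, theta_0=theta_0, theta_1=theta_1, eps=E, beta=B)
    # 3. sum it all together (simple rectangle rule)
    return np.sum(values * weights) * d_beta * d_eps

With n = 100 this evaluates 10,000 grid points in a single vectorized call, instead of tens of thousands of scalar calls to norm.pdf inside dblquad.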
I wanted to compute the volume of the intersection of a sphere and an infinite cylinder at some distance b, and I figured I would do it using a quick and dirty Python script. My requirements are a <1 s computation with >3 significant digits.
My thinking was as follows:
We place the sphere, with radius R, such that its center is at the origin, and we place the cylinder, with radius R', such that its axis runs along z through (b, 0, 0). We integrate over the sphere, using a step function that returns 1 if we are inside the cylinder and 0 if not, thus integrating 1 over the set of points inside both the sphere and the cylinder, i.e. the intersection.
I tried this using scipy.integrate.tplquad. It did not work out. I think it's because of the discontinuity of the step function, as I get warnings such as the following. Of course, I might just be doing this wrong. Assuming I have not made some stupid mistake, I could attempt to formulate the ranges of the intersection, thus removing the need for the step function, but I figured I might try to get some feedback first. Can anyone spot any mistake, or point towards some simple solution?
Warning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used.
Code:
from scipy.integrate import tplquad
from math import sqrt

def integrand(z, y, x):
    if Rprim >= (x - b)**2 + y**2:
        return 1.
    else:
        return 0.

def integral():
    return tplquad(integrand, -R, R,
                   lambda x: -sqrt(R**2 - x**2),            # lower y
                   lambda x: sqrt(R**2 - x**2),             # upper y
                   lambda x, y: -sqrt(R**2 - x**2 - y**2),  # lower z
                   lambda x, y: sqrt(R**2 - x**2 - y**2),   # upper z
                   epsabs=1.e-01, epsrel=1.e-01)

R = 1
Rprim = 1
b = 0.5
print(integral())
Assuming you are able to translate and scale your data in such a way that the center of the sphere is at [0, 0, 0] and its radius is 1, a simple stochastic approximation may give you a reasonable answer fast enough. Something along these lines could be a good starting point:
import numpy as np

def in_sphere(p, r=1.):
    # p has shape (3, n): one column per sample point
    return np.sqrt((p**2).sum(0)) <= r

def in_cylinder(p, c, r=1.):
    # c holds two points on the cylinder axis, one per column
    m = np.mean(c, 1)[:, None]
    pm = p - m
    d = np.diff(c - m)
    d = d / np.sqrt((d**2).sum())                # unit vector along the axis
    pp = np.dot(np.dot(d, d.T), pm)              # projection of the points onto the axis
    return np.sqrt(((pp - pm)**2).sum(0)) <= r   # distance from the axis

def in_sac(p, c, r_c):
    return np.logical_and(in_sphere(p), in_cylinder(p, c, r_c))

if __name__ == '__main__':
    n, c = 1000000, [[0, 1], [0, 1], [0, 1]]
    p = 2 * np.random.rand(3, n) - 1             # uniform samples in the box [-1, 1]^3
    print((in_sac(p, c, 1).sum() / n) * 2**3)    # fraction inside times box volume
Performing a triple adaptive numerical integration on a discontinuous function that is constant over two domains is a terribly poor idea, especially if you wish to see either speed or accuracy.
I would suggest a far better idea is to reduce the problem analytically.
Align the cylinder with an axis, by transformation. This translates the sphere to some point that is not at the origin.
Now, find the limits of intersection of the sphere with the cylinder along that axis.
Integrate over that axis variable. The area of intersection at any fixed value along the axis is simply the area of intersection of two circles, which in turn is simply computable using trigonometry and a little effort.
In the end, you will have an exact result, with almost no computation time needed.
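A sketch of that reduction for the geometry in the question (sphere of radius R centered at the origin, cylinder of radius Rprim with its axis parallel to z through (b, 0, 0)); the only numerical step left is a one-dimensional quadrature over z:

import numpy as np
from scipy.integrate import quad

def circle_intersection_area(r1, r2, d):
    # area of the intersection of two circles with radii r1, r2 and center distance d
    if d >= r1 + r2:         # disjoint
        return 0.0
    if d <= abs(r1 - r2):    # one circle entirely inside the other
        return np.pi * min(r1, r2)**2
    # lens formula
    a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    a3 = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - a3

def intersection_volume(R, Rprim, b):
    # At height z the sphere's cross-section is a circle of radius sqrt(R^2 - z^2)
    # centered at the origin; the cylinder's cross-section is a circle of radius
    # Rprim at distance b. Integrate the overlap area over z.
    area = lambda z: circle_intersection_area(np.sqrt(R**2 - z**2), Rprim, b)
    return quad(area, -R, R)[0]

print(intersection_volume(1.0, 1.0, 0.5))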
I solved it using a simple MC integration, as suggested by eat, but my implementation was too slow. My requirements had increased. I therefore reformulated the problem mathematically, as suggested by woodchips.
Basically I formulated the limits of x as a function of z and y, and the limits of y as a function of z. Then, in essence, I integrated f(x,y,z)=1 over the intersection using those limits. I did this because of the speed increase, which allows me to plot volume vs. b, and because it lets me integrate more complex functions with relatively minor modifications.
I include my code in case anyone is interested.
from scipy.integrate import quad
from math import sqrt
from math import pi

def x_max(y, r):
    return sqrt(r**2 - y**2)

def x_min(y, r):
    return max(-sqrt(r**2 - y**2), -sqrt(R**2 - y**2) + b)

def y_max(r):
    if (R < b and b - R < r) or (R > b and b - R > r):
        return sqrt(R**2 - (R**2 - r**2 + b**2)**2 / (4. * b**2))
    elif r + R < b:
        return 0.
    else:  # r + b < R
        return r

def z_max():
    if R > b:
        return R
    else:
        return sqrt(2. * b * R - b**2)

def delta_x(y, r):
    return x_max(y, r) - x_min(y, r)

def int_xy(z):
    r = sqrt(R**2 - z**2)
    return quad(delta_x, 0., y_max(r), args=(r,))

def int_xyz():
    return quad(lambda z: int_xy(z)[0], 0., z_max())

R = 1.
Rprim = 1.
b = 0.5
print(4 * int_xyz()[0])
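A quick Monte Carlo cross-check of the same geometry (a sketch; with this sample size only a few significant digits are reliable):

import numpy as np

R, Rprim, b = 1.0, 1.0, 0.5
n = 1000000

# Sample uniformly in the bounding box [-R, R]^3 of the sphere.
pts = np.random.uniform(-R, R, size=(n, 3))
in_sphere = (pts**2).sum(axis=1) <= R**2
in_cylinder = (pts[:, 0] - b)**2 + pts[:, 1]**2 <= Rprim**2

volume = (in_sphere & in_cylinder).mean() * (2 * R)**3
print(volume)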
First off: You can calculate the volume of the intersection by hand. If you don't want to (or can't) do that, here's an alternative:
I'd generate a tetrahedral mesh for the domain and then add up the cell volumes. An example with pygalmesh and meshplex (both authored by myself):
import pygalmesh
import meshplex
import numpy
ball = pygalmesh.Ball([0, 0, 0], 1.0)
cyl = pygalmesh.Cylinder(-1, 1, 0.7, 0.1)
u = pygalmesh.Intersection([ball, cyl])
mesh = pygalmesh.generate_mesh(u, cell_size=0.05, edge_size=0.1)
points = mesh.points
cells = mesh.cells["tetra"]
# kick out unused vertices
uvertices, uidx = numpy.unique(cells, return_inverse=True)
cells = uidx.reshape(cells.shape)
points = points[uvertices]
mp = meshplex.MeshTetra(points, cells)
print(sum(mp.cell_volumes))
This gives you a tetrahedral mesh of the intersection and prints 2.6567890958740463 as the volume. Decrease the cell or edge sizes for higher precision.