How to reduce integration time for integration over 2D connected domains - python

I need to compute many 2D integrations over domains that are simply connected (and convex most of the time). I'm using the Python function scipy.integrate.nquad to do this integration. However, this takes significantly longer than integrating over a rectangular domain. Is there any faster implementation possible?
Here is an example; I integrate a constant function first over a circular domain (using a constraint inside the function) and then on a rectangular domain (default domain of nquad function).
from scipy import integrate
import time

def circular(x, y, a):
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

def rectangular(x, y, a):
    return 1

a = 4

start = time.time()
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
now = time.time()
print(now - start)

start = time.time()
result = integrate.nquad(rectangular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
now = time.time()
print(now - start)
The rectangular domain takes only 0.00029 seconds, while the circular domain requires 2.07061 seconds to complete.
Also the circular integration gives the following warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
**opt)

One way to make the calculation faster is to use numba, a just-in-time compiler for Python.
The @jit decorator
Numba provides a @jit decorator that compiles a Python function into optimized machine code. Jitting the integrand function takes little effort and achieves some time saving, since the compiled code runs faster than the interpreted original. One doesn't even have to worry about types; Numba handles all of this under the hood.
from scipy import integrate
from numba import jit

@jit
def circular_jit(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular_jit, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
This indeed runs faster, and when timing it on my machine I get:
Original circular function: 1.599048376083374
Jitted circular function: 0.8280022144317627
That is a ~50% reduction of computation time.
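As a side note, forcing nopython mode usually gives the best results with Numba: if the function cannot be compiled without the Python interpreter, compilation fails loudly instead of silently falling back to the slower object mode. A minimal sketch with the same integrand (timing omitted, numbers vary by machine):

from numba import njit

@njit  # shorthand for @jit(nopython=True)
def circular_njit(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0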
Scipy's LowLevelCallable
Function calls in Python are quite time consuming due to the nature of the language. The overhead can sometimes make Python code slow in comparison to compiled languages like C.
In order to mitigate this, Scipy provides a LowLevelCallable class which can be used to provide access to a low-level compiled callback function. Through this mechanism, Python's function call overhead is bypassed and further time saving can be made.
Note that in the case of nquad, the signature of the cfunc passed to LowLevelCallable must be one of:
double func(int n, double *xx)
double func(int n, double *xx, void *user_data)
where n is the total number of arguments (the integration variables followed by any extra parameters) and xx holds their values; user_data is for callbacks that need context to operate.
We can therefore slightly change the circular function signature in Python to make it compatible.
from scipy import integrate, LowLevelCallable
from numba import cfunc
from numba.types import intc, CPointer, float64

@cfunc(float64(intc, CPointer(float64)))
def circular_cfunc(n, args):
    x, y, a = (args[0], args[1], args[2])  # cannot do `(args[i] for i in range(n))` as `yield` is not supported
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

circular_LLC = LowLevelCallable(circular_cfunc.ctypes)

a = 4
result = integrate.nquad(circular_LLC, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
With this method I get
LowLevelCallable circular function: 0.07962369918823242
This is a 95% reduction compared to the original and 90% when compared to the jitted version of the function.
A bespoke decorator
To make the code tidier and keep the integrand function's signature flexible, a bespoke decorator function can be created. It will jit the integrand function and wrap it in a LowLevelCallable object that can then be used with nquad.
from scipy import integrate, LowLevelCallable
from numba import cfunc, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1], xx[2])

    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
Arbitrary number of arguments
If the number of arguments is unknown, then we can use the convenient carray function provided by Numba to convert the CPointer(float64) to a Numpy array.
import numpy as np
from scipy import integrate, LowLevelCallable
from numba import cfunc, carray, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        ar = carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2:])

    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a[-1]**2 / 4:
        return 1
    else:
        return 0

ar = np.array([1, 2, 3, 4])
a = ar[-1]
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=ar)
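It is also worth noting that much of the cost (and the subdivision warning) comes from the discontinuity of the 0/1 indicator itself. When the boundary is known analytically, describing it with variable integration limits avoids the discontinuity altogether. A minimal sketch for the circular case using dblquad (not a drop-in replacement for arbitrary domains):

import numpy as np
from scipy import integrate

a = 4
area, err = integrate.dblquad(
    lambda y, x: 1.0,                                # constant integrand, called as f(y, x)
    -a / 2, a / 2,                                   # outer limits for x
    lambda x: -np.sqrt(max(a**2 / 4 - x**2, 0.0)),   # lower y-limit as a function of x
    lambda x: np.sqrt(max(a**2 / 4 - x**2, 0.0)),    # upper y-limit as a function of x
)
# area should be close to pi * (a/2)**2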

Related

Numerical integration for matrix values in Python

I am trying to integrate over some matrix entries in Python. I want to avoid loops, because my task includes 1 million simulations. I am looking for an approach that will efficiently solve my problem.
I get the following error: only size-1 arrays can be converted to Python scalars
from scipy import integrate
import numpy.random as npr

n = 1000
m = 30
x = npr.standard_normal([n, m])

def integrand(k):
    return k * x ** 2

integrate.quad(integrand, 0, 100)
This is a simplified example of my case. I have multiple nested functions, so I cannot simply pull x out in front of the integral.
Well, you might want to use parallel execution for this. It should be quite easy as long as you just want to execute integrate.quad 30,000,000 times. Just split your workload into little packages and give them to a thread pool. Of course the speedup is limited by the number of cores in your PC. I'm not a Python programmer, but this should be possible. You can also increase the epsabs and epsrel parameters of the quad function (see the sketch after the code below); depending on the implementation this should speed up the program as well. Of course you'll get a less precise result, but this might be OK depending on your problem.
import threading
from scipy import integrate
import numpy.random as npr

n = 2
m = 3
x = npr.standard_normal([n, m])

def f(a):
    for j in range(m):
        integrand = lambda k: k * x[a, j]**2
        i = integrate.quad(integrand, 0, 100)
        print(i)  # write it to a result array instead

for i in range(n):
    threading.Thread(target=f, args=(i,)).start()
    # better: split it up even more and give it to a thread pool to avoid
    # the overhead of thread creation
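As for the tolerance suggestion above, quad accepts epsabs and epsrel keyword arguments; loosening them makes the integrator stop earlier at the cost of accuracy. A minimal sketch (the integrand here is just a stand-in):

from scipy import integrate

integrand = lambda k: k * 0.5**2  # hypothetical single entry of x**2
# the defaults are about 1.49e-8; larger tolerances mean less work per integral
value, abserr = integrate.quad(integrand, 0, 100, epsabs=1e-3, epsrel=1e-3)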
This is maybe not the ideal solution but it should help a bit. You can use numpy.vectorize. Even the doc says: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop." But still, a %timeit on the simple example you provided shows a 2.3x speedup.
The implementation is:
from scipy import integrate
from numpy import vectorize
import numpy.random as npr

n = 1000
m = 30
x = npr.standard_normal([n, m])

def g(x):
    integrand = lambda k: k * x**2
    return integrate.quad(integrand, 0, 100)

vg = vectorize(g)
res = vg(x)
quadpy (a project of mine) does vectorized quadrature:
import numpy
import numpy.random as npr
import quadpy
x = npr.standard_normal([1000, 30])
def integrand(k):
    return numpy.multiply.outer(x ** 2, k)
scheme = quadpy.line_segment.gauss_legendre(10)
val = scheme.integrate(integrand, [0, 100])
This is much faster than all other answers.

Numba jit with scipy

So I wanted to speed up a program I wrote with the help of numba jit. However, jit seems to be incompatible with many scipy functions because they use try ... except ... structures that jit cannot handle (am I right about this?).
A relatively simple solution I came up with is to copy the scipy source code I need and delete the try/except parts (I already know that it will not run into errors, so the try part will always work anyway).
However I do not like this solution and I am not sure if it will work.
My code structure looks like the following
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=(0, 0, 0), maxfev=500)
        for idx in some_list:
            integrated = integrate.quad(lambda x: fitfunction(fit_param), lower, upper)
    except:
        fit_param = (0, 0, 0)
        ...
Now this results in the following error:
LoweringError: Failed at object (object mode backend)
I assume this is due to jit not being able to handle try except (it also does not work if I only put jit on the curve_fit and integrate.quad parts and work around my own try except structure)
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def integral(lower, upper):
    return integrate.quad(lambda x: fitfunction(fit_param), lower, upper)

@jit
def fitting(x, y, pzero, max_fev):
    return curve_fit(fitfunction, x, y, p0=pzero, maxfev=max_fev)

def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = fitting(x, y, (0, 0, 0), 500)
        for idx in some_list:
            integrated = integral(lower, upper)
    except:
        fit_param = (0, 0, 0)
        ...
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
And would it even speed up the code?
Numba simply is not a general-purpose library to speed code up. There is a class of problems that can be solved in a much faster way with numba (especially if you have loops over arrays, number crunching) but everything else is either (1) not supported or (2) only slightly faster or even a lot slower.
[...] would it even speed up the code?
SciPy is already a high-performance library so in most cases I would expect numba to perform worse (or rarely: slightly better). You might do some profiling to find out if the bottleneck is really in the code that you jitted, then you could get some improvements. But I suspect the bottleneck will be in the compiled code of SciPy and that compiled code is probably already heavily optimized (so it's really unlikely that you find an implementation that could "only" compete with that code).
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
As you correctly assumed try and except is simply not supported by numba at this time.
2.6.1. Language
2.6.1.1. Constructs
Numba strives to support as much of the Python language as possible, but some language features are not available inside Numba-compiled functions. The following Python language features are not currently supported:
[...]
Exception handling (try .. except, try .. finally)
So the answer here is No.
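In practice, the part of such a workflow that Numba can accelerate is the numeric model function itself, while the scipy calls (and the try/except around them) stay in plain Python. A minimal, self-contained sketch with made-up data (not the original poster's model):

import numpy as np
from numba import njit
from scipy.optimize import curve_fit

@njit
def model(x, a, b):
    # pure number crunching: the kind of code numba handles well
    return a * np.sin(b * x)

x = np.linspace(0, 2 * np.pi, 100)
y = 1.5 * np.sin(0.9 * x)

try:
    # p0 is given explicitly so curve_fit does not need to inspect the jitted function's signature
    params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))
except RuntimeError:
    params = (0.0, 0.0)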
Nowadays try and except work with numba. However numba and scipy are still not compatible. Yes, Scipy calls compiled C and Fortran, but it does so in a way that numba can't deal with.
Fortunately there are alternatives to scipy that work well with numba! Below I use NumbaQuadpack and NumbaMinpack to do some curve fitting and integration similar to your example code. Disclaimer: I put together these packages. Below, I also give an equivalent implementation in scipy.
The scipy implementation is ~18 times slower than the implementation based on the alternatives (NumbaQuadpack and NumbaMinpack).
Using Scipy alternatives (0.23 ms)
from NumbaQuadpack import quadpack_sig, dqags
from NumbaMinpack import minpack_sig, lmdif
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

@nb.cfunc(minpack_sig)
def fitfunction_optimize(u_, fvec, args_):
    u = nb.carray(u_, (2,))
    args = nb.carray(args_, (200,))
    A, B = u
    x = args[:100]
    y = args[100:]
    for i in range(100):
        fvec[i] = fitfunction(x[i], A, B) - y[i]

optimize_ptr = fitfunction_optimize.address

@nb.cfunc(quadpack_sig)
def fitfunction_integrate(x, data):
    A = data[0]
    B = data[1]
    return fitfunction(x, A, B)

integrate_ptr = fitfunction_integrate.address

@nb.njit
def fast_function():
    try:
        neqs = 100
        u_init = np.array([2.0, .8], np.float64)
        args = np.append(x, y)
        fitparam, fvec, success, info = lmdif(optimize_ptr, u_init, neqs, args)
        if not success:
            raise Exception

        lower = 0.0
        uppers = np.linspace(np.pi, np.pi*2.0, 200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr, success = dqags(integrate_ptr, lower, uppers[i], data=fitparam)
            if not success:
                raise Exception
    except:
        print('doing something else')

fast_function()
iters = 1000
t_nb = timeit.Timer(fast_function).timeit(number=iters)/iters
print(t_nb)
Using Scipy (4.4 ms)
import scipy.integrate as integrate
from scipy.optimize import curve_fit
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

def function():
    try:
        p0 = (2.0, .8)
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=p0, maxfev=500)

        lower = 0.0
        uppers = np.linspace(np.pi, np.pi*2.0, 200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr = integrate.quad(fitfunction, lower, uppers[i], args=tuple(fit_param))
    except:
        print('do something else')

function()
iters = 1000
t_sp = timeit.Timer(function).timeit(number=iters)/iters
print(t_sp)

Numeric Integration Python versus Matlab

My python code takes about 6.2 seconds to run. The Matlab code runs in under 0.05 seconds. Why is this and what can I do to speed up the Python code? Is Cython the solution?
Matlab:
function X=Test
nIter=1000000;
Step=.001;
X0=1;
X=zeros(1,nIter+1); X(1)=X0;
tic
for i=1:nIter
    X(i+1)=X(i)+Step*(X(i)^2*cos(i*Step+X(i)));
end
toc
figure(1)
plot(0:nIter,X)
Python:
import numpy as np
import time

nIter = 1000000
Step = .001

x = np.zeros(1+nIter)
x[0] = 1

start = time.time()
for i in range(1, 1+nIter):
    x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
end = time.time()
print(end - start)
How to speed up your Python code
Your largest time sink is np.cos which performs several checks on the format of the input.
These are relevant and usually negligible for high-dimensional inputs, but for your one-dimensional input, this becomes the bottleneck.
The solution to this is to use math.cos, which only accepts one-dimensional numbers as input and thus is faster (though less flexible).
Another time sink is indexing x multiple times.
You can speed this up by having one state variable which you update and only writing to x once per iteration.
With all of this, you can speed up things by a factor of roughly ten:
import numpy as np
from math import cos

nIter = 1000000
Step = .001

x = np.zeros(1+nIter)
state = x[0] = 1

for i in range(nIter):
    state += Step*state**2*cos(Step*i+state)
    x[i+1] = state
Now, your main problem is that your truly innermost loop happens completely in Python, i.e., you have a lot of wrapping operations that eat up time.
You can avoid this by using uFuncs (e.g., created with SymPy’s ufuncify) and using NumPy’s accumulate:
import numpy as np
from sympy.utilities.autowrap import ufuncify
from sympy.abc import t, y
from sympy import cos

nIter = 1000000
Step = 0.001

f = ufuncify([y, t], y + Step*y**2*cos(t+y))

times = np.arange(0, nIter*Step, Step)
times[0] = 1  # initial condition; accumulate uses the first element as the starting value
x = f.accumulate(times)
This runs practically within an instant.
… and why that’s not what you should worry about
If your exact code (and only that) is what you care about, then you shouldn’t worry about runtime anyway, because it’s very short either way.
If, on the other hand, you use this to gauge efficiency for problems with a considerable runtime, your example will fail because it considers only one initial condition and involves very simple dynamics.
Moreover, you are using the Euler method, which is either not very efficient or robust, depending on your step size.
The latter (Step) is absurdly low in your case, yielding much more data than you probably need:
With a step size of 1, you can see what's going on just fine.
If you want a robust integration in such cases, it’s almost always best to use a modern adaptive integrator, that can adjust its step size itself, e.g., here is a solution to your problem using a native Python integrator:
from math import cos
import numpy as np
from scipy.integrate import solve_ivp

T = 1000
dt = 0.001

x = solve_ivp(
        lambda t, state: state**2*cos(t+state),
        t_span=(0, T),
        t_eval=np.arange(0, T, dt),
        y0=[1],
        rtol=1e-5,
    ).y
This automatically adjusts the step size to something higher, depending on the error tolerance rtol.
It still returns the same amount of output data, but that’s via interpolation of the solution.
It runs in 0.3 s for me.
How to speed up things in a scalable manner
If you still need to speed up something like this, chances are that your derivative (f) is considerably more complex than in your example and thus it is the bottleneck.
Depending on your problem, you may be able to vectorise its calculation (using NumPy or similar).
If you can't vectorise, I wrote a module that specifically focuses on this by hard-coding your derivative under the hood.
Here is your example with a sampling step of 1.
import numpy as np
from jitcode import jitcode, y, t
from symengine import cos

T = 1000
dt = 1

ODE = jitcode([y(0)**2*cos(t+y(0))])
ODE.set_initial_value([1])
ODE.set_integrator("dop853")
x = np.hstack([ODE.integrate(t) for t in np.arange(0, T, dt)])
This runs again within an instant. While this may not be a relevant speed boost here, this is scalable to huge systems.
The difference is jit compilation, which Matlab uses by default. Let's try your example with Numba (a Python jit compiler).
Code
import numba as nb
import numpy as np
import time

nIter = 1000000
Step = .001

@nb.njit()
def integrate(nIter, Step):
    x = np.zeros(1+nIter)
    x[0] = 1
    for i in range(1, 1+nIter):
        x[i] = x[i-1] + Step*x[i-1]**2*np.cos(Step*(i-1)+x[i-1])
    return x

# Avoid measuring the compilation time;
# this would also be recommendable for Matlab to have a fair comparison
res = integrate(nIter, Step)

start = time.time()
for i in range(100):
    res = integrate(nIter, Step)
end = time.time()
print((end - start)/100)
This results in 0.022s runtime per call.

Optimizing root finding algorithm from scipy

I use the root function from scipy.optimize with the method "excitingmixing" in my code because other methods, like standard Newton, don't converge to the roots I am looking for.
However I would like to optimize my code using numba, which doesn't support the scipy package. I tried to look up the "exciting mixing" algorithm in the documentation to program it myself:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.root.html
I didn't find anything useful except the not really helpful statement that the method "uses a tuned diagonal Jacobian approximation".
I would be glad if someone could tell me something about the algorithm or has an idea on how to optimize the scipy function in an other way.
As requested here is a minimal code example:
import numpy as np
from scipy import optimize
from numba import jit

@jit(nopython=True)
def func(x):
    [a, b, c, d] = x
    da = a*(1-b)
    db = b*(1-c)
    dc = c
    dd = 1
    return [da, db, dc, dd]

@jit(nopython=True)
def getRoot(x0):
    solution = optimize.root(func, x0, method="excitingmixing")
    return solution.x

root = getRoot([0.1, 0.1, 0.2, 0.4])
print(root)
You can look in the source code of scipy to see the implementation of the excitingmixing option:
https://github.com/scipy/scipy/blob/c948e96ebb3454f6a82e9d14021cc601d7ce7a85/scipy/optimize/nonlin.py#L1272
You're likely not going to want to reimplement the entire root finding algorithm in numba. A better strategy you can test is to use numba to optimize the function that you pass to the scipy method. You're still going to pay some overhead of scipy calling a function, but you might see a performance increase if the bottleneck is evaluating the function and that can be done faster with a numba-jitted version. I've found it best to just experiment with numba and test with timeit.
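A minimal sketch of that strategy, reusing the residuals from the question (only the jit on getRoot is dropped, since the scipy call itself cannot be compiled; whether this pays off depends on how expensive the real function is):

import numpy as np
from numba import njit
from scipy import optimize

@njit
def func(x):
    # same residuals as in the question, written with explicit indexing for numba
    out = np.empty(4)
    out[0] = x[0] * (1.0 - x[1])
    out[1] = x[1] * (1.0 - x[2])
    out[2] = x[2]
    out[3] = 1.0
    return out

solution = optimize.root(func, np.array([0.1, 0.1, 0.2, 0.4]), method="excitingmixing")
print(solution.x)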
I wrote a little wrapper around MINPACK, called NumbaMinpack, which can be called within numba-compiled functions: https://github.com/Nicholaswogan/NumbaMinpack.
You should try the lmdif method, if Newton's method is failing you.
from NumbaMinpack import lmdif, hybrd, minpack_sig
from numba import njit, cfunc
import numpy as np

@cfunc(minpack_sig)
def myfunc(x, fvec, args):
    fvec[0] = x[0]**2 - args[0]
    fvec[1] = x[1]**2 - args[1]

funcptr = myfunc.address  # pointer to myfunc

x_init = np.array([10.0, 10.0])  # initial conditions
neqs = 2  # number of equations
args = np.array([30.0, 8.0])  # data you want to pass to myfunc

@njit
def test():
    # solve with lmdif
    sol = lmdif(funcptr, x_init, neqs, args)
    # OR solve with hybrd
    sol = hybrd(funcptr, x_init, args)
    return sol

test()  # it works!

How can you implement a C callable from Numba for efficient integration with nquad?

I need to do a numerical integration in 6D in python. Because the scipy.integrate.nquad function is slow I am currently trying to speed things up by defining the integrand as a scipy.LowLevelCallable with Numba.
I was able to do this in 1D with the scipy.integrate.quad by replicating the example given here:
import numpy as np
from numba import cfunc
from scipy import integrate
def integrand(t):
    return np.exp(-t) / t**2
nb_integrand = cfunc("float64(float64)")(integrand)
# regular integration
%timeit integrate.quad(integrand, 1, np.inf)
10000 loops, best of 3: 128 µs per loop
# integration with compiled function
%timeit integrate.quad(nb_integrand.ctypes, 1, np.inf)
100000 loops, best of 3: 7.08 µs per loop
When I want to do this now with nquad, the nquad documentation says:
If the user desires improved integration performance, then f may be a
scipy.LowLevelCallable with one of the signatures:
double func(int n, double *xx)
double func(int n, double *xx, void *user_data)
where n is the number of extra parameters and args is an array of
doubles of the additional parameters, the xx array contains the
coordinates. The user_data is the data contained in the
scipy.LowLevelCallable.
But the following code gives me an error:
import numpy as np
from numba import cfunc
import ctypes

def func(n_arg, x):
    xe = x[0]
    xh = x[1]
    return np.sin(2*np.pi*xe)*np.sin(2*np.pi*xh)

nb_func = cfunc("float64(int64,CPointer(float64))")(func)

integrate.nquad(nb_func.ctypes, [[0,1],[0,1]], full_output=True)
error: quad: first argument is a ctypes function pointer with incorrect signature
Is it possible to compile a function with numba that can be used with nquad directly in the code and without defining the function in an external file?
Thank you very much in advance!
Wrapping the function in a scipy.LowLevelCallable makes nquad happy (here with import scipy as sp and import scipy.integrate as si):
si.nquad(sp.LowLevelCallable(nb_func.ctypes), [[0,1],[0,1]], full_output=True)
# (-2.3958561404687756e-19, 7.002641250699693e-15, {'neval': 1323})
The signature of the function you pass to nquad should be double func(int n, double *xx). You can create a decorator for your function func like so:
import numpy as np
import scipy.integrate as si
import numba
from numba import cfunc
from numba.types import intc, CPointer, float64
from scipy import LowLevelCallable

def jit_integrand_function(integrand_function):
    jitted_function = numba.jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1])

    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def func(xe, xh):
    return np.sin(2*np.pi*xe)*np.sin(2*np.pi*xh)

print(si.nquad(func, [[0,1],[0,1]], full_output=True))
>>> (-2.3958561404687756e-19, 7.002641250699693e-15, {'neval': 1323})
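Note that this wrapper hard-codes the number of coordinates, so for the 6D case in the question the return line would simply forward xx[0] through xx[5]. If the count is not known in advance, numba's carray (as used in the arbitrary-argument decorator shown earlier on this page) is a way to view the pointer as an array of length n.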
