How to decide between scipy.integrate.simps and numpy.trapz? - python

I have a set of points which, when I plot them, give the graph below. I would like to find the area under the graph, but I am not sure whether scipy.integrate.simps or numpy.trapz is more suitable.
Could someone advise me on the mathematical background of the two functions, and thus which one is more accurate?

The trapezoidal rule is the simplest of numerical integration methods. In effect, it estimates the area under a curve by approximating the curve with straight line segments, which only requires two points for each segment. Simpson's rule uses quadratic curves to approximate the function segments instead, each of which requires three points, sampled from your function, to approximate a given segment.
So what is the error associated with using these numerical methods as approximations to an analytical integral?
The error associated with the trapezoidal rule is, to leading order, proportional to h^2 [f'(a) - f'(b)]. Here h is the spacing between sampled points in your function; f'(a) and f'(b) are the first derivatives of your function at the beginning and end of the sampling domain.
The error of Simpson's rule, on the other hand, is proportional to h^4 [f'''(a) - f'''(b)], where f''' is the third derivative of your function.
h is typically small, so h^4 is typically much smaller than h^2!
TLDR: Simpson's rule typically gives far superior results for numerical integration, compared to the trapezoidal rule, with basically no additional computational cost.
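As a rough illustration (my own toy example, not from the question): sampling an integrand whose exact integral is known shows the difference in accuracy at essentially the same cost.

import numpy as np
from scipy.integrate import simpson  # named simps in older SciPy versions

# Toy integrand with a known answer: the integral of sin(x) over [0, pi] is exactly 2.
x = np.linspace(0, np.pi, 21)
y = np.sin(x)

trap = np.trapz(y, x)
simp = simpson(y, x=x)
print("trapezoid error:", abs(trap - 2.0))
print("Simpson error:  ", abs(simp - 2.0))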

Related

How to integrate discrete function

I need to integrate a certain function that I have specified as discrete values at discrete arguments (I want to compute the area under the graph I get).
That is, from an earlier part of the code I have the lists:
args = [a1, a2, a3, a4]
values = [v1, v2, v3, v4]
where value v1 corresponds to a1, and so on. If it matters, args is set in advance with a specific discretization width, and values is computed by an existing function.
I am attaching a figure.
And putting the function that produced the values array into integrate.quad() gives me a warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used.
How can I integrate this? I'm mulling over the scipy documentation, but I can't seem to put it together. After all, args is already a finite, discrete set of points.
I am guessing that before passing the function to quad you did some kind of interpolation on it. In general this is a misguided approach.
Integration and interpolation are very closely related. An integral requires you to compute the area under the curve, and thus you must know the value of the function at any given point. Hence, starting from a set of data, it is natural to want to interpolate it first. Yet the quad routine does not know that you started with a limited set of data; it just assumes that the function you gave it is "perfect" and it will do its best to compute the area under it. However, the interpolated function is just a guess at what the values are between the given points, and thus integrating an interpolated function is a waste of time.
As MB-F said, in the discrete case you should simply sum up the points while multiplying them by the step size between them. You can do this the naïve way by pretending that the function is just rectangles. Or you can do what MB-F suggested, which pretends that all the data points are connected with straight lines. Going one step further is pretending that the line connecting the data points is smooth (often true for physical systems) and using Simpson integration, as implemented in scipy; see the sketch below.
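A minimal sketch of that suggestion, using placeholder arrays in place of the question's args and values (trapezoid and simpson live in scipy.integrate; they are called trapz and simps in older SciPy versions):

import numpy as np
from scipy.integrate import trapezoid, simpson

args = np.array([0.0, 0.1, 0.2, 0.3, 0.4])    # hypothetical discretized arguments
values = np.array([1.0, 1.3, 1.1, 0.9, 0.6])  # hypothetical function values at args

area_straight_lines = trapezoid(values, x=args)  # piecewise-linear assumption
area_smooth = simpson(values, x=args)            # piecewise-quadratic assumption
print(area_straight_lines, area_smooth)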
Since you only have a discrete approximation of the function, integration reduces to summation.
As a simple approximation of an integral, try this:
import numpy as np

midpoints = (values[:-1] + values[1:]) / 2  # average of adjacent samples
steps = np.diff(args)                       # spacing between adjacent args
area = np.sum(midpoints * steps)
(Assuming args and values are numpy arrays and the function is value = f(arg).)
That approach sums the areas of all trapezoids between adjacent data points (Wikipedia).
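For what it's worth, that midpoint-times-step sum is exactly the trapezoidal rule, so np.trapz gives the same number in one call (a small check with made-up arrays):

import numpy as np

args = np.array([0.0, 0.5, 1.0, 1.5])    # hypothetical sample positions
values = np.array([2.0, 3.0, 2.5, 1.0])  # hypothetical sample values

midpoints = (values[:-1] + values[1:]) / 2
area_manual = np.sum(midpoints * np.diff(args))
assert np.isclose(area_manual, np.trapz(values, x=args))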

Better method to perform numerical integration on acceleration

I have a set of acceleration data points read from the sensor.
I also have the time at which the reading was taken.
How do I numerically integrate to find the instantaneous velocity?
I have tried the following, which does give me a result, but I am wondering whether there is a better, more accurate method.
v_1=v_0+a*dt
Where dt is calculated from the difference between the times at which the data was measured.
And by iterating the above I could find the instantaneous velocity.
If you only have a number of discrete data points, it is reasonable to assume that the acceleration changes linearly between the data points.
When integrating such a piecewise-linear function, both the trapezoidal rule and the midpoint rule are exact. (For smooth functions, the midpoint rule is typically even a bit more accurate than the trapezoidal rule, by the way.)
You can get more fancy by assuming that the acceleration is continuously differentiable, in which case you would construct a quadratic polynomial on each interval and integrate that, resulting in Simpson's rule; a sketch of the simpler trapezoidal version is below.
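A minimal sketch of that, assuming arrays t (sample times) and a (acceleration readings) and an initial velocity v0; all names and numbers here are placeholders, not from the question:

import numpy as np
from scipy.integrate import cumulative_trapezoid  # named cumtrapz in older SciPy

t = np.array([0.0, 0.1, 0.2, 0.3, 0.4])  # hypothetical sample times
a = np.array([0.0, 0.5, 0.9, 1.2, 1.0])  # hypothetical acceleration readings
v0 = 0.0                                 # assumed initial velocity

# Trapezoidal rule: exact if the acceleration really is piecewise linear in time.
v_trap = v0 + cumulative_trapezoid(a, t, initial=0)

# The rectangle-rule iteration from the question, vectorized, for comparison.
v_rect = v0 + np.concatenate(([0.0], np.cumsum(a[:-1] * np.diff(t))))
print(v_trap)
print(v_rect)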

Solving Second Order ODE in coordinate space

I need to numerically compute the eigenvalues and eigenfunctions of the radial Schrödinger equation for 3D and 2D Coulomb potentials. The differential equation contains both the first and second derivative of R(r).
The problem is that scipy.integrate.ode or scipy.integrate.odeint require "initial values" for the function and its first derivative - every example I've seen online uses a differential equation in t, so that specifying the initial conditions of the system is trivial and arbitrary.
Since I am considering an ODE in space, not time, I should note that "initial conditions" would more accurately be called "boundary conditions". In my case, however, the function R(r) must only be finite at the origin; there is no specific value it must take at r=0. Furthermore, the derivative of the radial wavefunction, R', is physically meaningless, so constraining (let alone specifying) its value at the origin makes no sense. The only definite boundary condition in the system is that the function must decay exponentially to zero for very large r.
What I am considering is, instead of making the linspace count up from zero to some large number, making it count down from a large value to zero, so that I can set my "initial condition at r -> infinity" such that R and R' are zero.
Anyone else have this issue, and what workaround did you find for it?
Thank you!
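For what it's worth, integrating inward from large r works directly with scipy.integrate.solve_ivp, since it accepts a decreasing integration interval. Below is a rough sketch for an l = 0 hydrogen-like radial equation in atomic units; the equation, the trial energy, and the tiny starting values are my assumptions, not taken from the question.

import numpy as np
from scipy.integrate import solve_ivp

E = -0.5  # trial energy (the exact 3D ground-state value in atomic units, assumed)
l = 0     # angular momentum quantum number (assumed)

def radial_rhs(r, y):
    R, dR = y
    # Assumed 3D Coulomb radial equation in atomic units:
    # R'' = -(2/r) R' - (2*(E + 1/r) - l*(l+1)/r**2) R
    d2R = -(2.0 / r) * dR - (2.0 * (E + 1.0 / r) - l * (l + 1) / r**2) * R
    return [dR, d2R]

r_max, r_min = 50.0, 1e-6
# Starting exactly at R = R' = 0 would give the trivial zero solution, so seed
# with tiny values; the equation is linear, so the overall scale can be fixed
# afterwards by normalization.
y_far = [1e-12, -1e-12]
sol = solve_ivp(radial_rhs, (r_max, r_min), y_far, dense_output=True, rtol=1e-8)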

What derivatives are used for splines in SciPy.interpolate.interp1d?

I'm trying to find out how the spline interpolation in scipy.interpolate.interp1d decides what the derivatives of the fitting/smoothing function should be. From the documentation, I understand that interp1d fits a spline if an integer, or "quadratic" or "cubic", is passed to the kind keyword.
But if I'm not providing any derivative information, how does it decide what the derivatives are? I tried to follow the function calls in the source code, but this only got me to the cryptic spleval function, which just seems to call FITPACK. And I wasn't really sure how to find information about FITPACK from their website...
Spline derivatives at the knot points are not explicitly prescribed, they are determined by continuity/smoothness conditions. I'll take the cubic case as an example. You give n x-values and n y-values. A cubic spline has 4*(n-1) coefficients, 4 on each of (n-1) intervals between the given x-values. These coefficients are determined from the following conditions:
The spline must be continuous at each interior knot: this is (n-2) equations, as there are (n-2) interior knots. We want both pieces to the left and right of a knot to have the same value at the knot.
The first derivative of the spline must be continuous at each interior knot: this is (n-2) equations.
The second derivative of the spline must be continuous at each interior knot: this is (n-2) equations.
The spline must match each of the given y-values: this is n equations.
The total so far is 4*n-6 equations for 4*n-4 unknowns. Two additional equations are needed; the most popular choice is to require the 3rd derivative to be continuous at the leftmost and rightmost interior knots (this is called the "not a knot" condition). Now we have a linear system of size 4*n-4, which can be solved for the coefficients.
The above should not be confused with Hermite interpolation, which is where one prescribes the values of derivatives as well as of the function itself. This is a less common task, and to my knowledge, SciPy does not have a built-in tool for it.
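As a quick numerical illustration of those continuity conditions (a sketch of the conditions themselves, not of interp1d's internals): scipy.interpolate.CubicSpline uses the not-a-knot boundary condition by default, and its first and second derivatives match from both sides of every interior knot.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical knots
y = np.sin(x)                            # hypothetical data values

cs = CubicSpline(x, y, bc_type='not-a-knot')  # the default boundary condition

# Approaching each interior knot from the left and from the right uses two
# different cubic pieces, yet the first and second derivatives agree.
eps = 1e-9
for knot in x[1:-1]:
    assert np.isclose(cs(knot - eps, 1), cs(knot + eps, 1), atol=1e-5)
    assert np.isclose(cs(knot - eps, 2), cs(knot + eps, 2), atol=1e-4)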

Finding the length of a cubic B-spline

Using scipy's interpolate.splprep function I get a parametric spline on a parameter u, but u is not the arc length of the spline; it is based on a piecewise-linear connection of the input coordinates. I've tried integrate.splint, but that just gives the individual integrals over u. Obviously, I can numerically integrate a bunch of Cartesian differential distances, but I was wondering if there is a closed-form method for getting the length of a spline or spline segment (using scipy or numpy) that I am overlooking.
Edit: I am looking for a closed-form solution or a very fast way to converge to a machine-precision answer. I have all but given up on the numerical root-finding methods and am now primarily after a closed-form answer. If anyone has any experience integrating elliptic functions or can point me to a good resource (other than Wolfram), that would be great.
I'm going to try Maxima to get the indefinite integral of what I believe is the function for one segment of the spline. I cross-posted this on MathOverflow.
Because both x and y are cubic parametric functions, there isn't a closed-form solution in terms of simple functions. Numerical integration is the way to go, either by integrating the arc-length expression or by simply adding line-segment lengths; which one depends on the accuracy you are after and how much effort you want to exert.
An accurate and fast "Adding length of line segments" method:
Using recursive subdivision (a form of de Casteljau's algorithm) to generate points can give you a highly accurate representation with a minimal number of points.
Only subdivide a segment further if it fails to meet a criterion. Usually the criterion is based on the length joining the control points (the hull or cage).
For a cubic, that usually means comparing the closeness of P0P1 + P1P2 + P2P3 to P0P3, where P0, P1, P2, and P3 are the control points that define your Bézier segment.
You can find some Delphi code here:
link text
It should be relatively easy to convert to Python.
It will generate the points. The code already calculates the length of the segments in order to test the criteria. You can simply accumulate those length values along the way.
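A rough Python sketch of that idea (my own translation of the flatness test described above, not the linked Delphi code): split a cubic Bézier segment in half with de Casteljau's algorithm until the control polygon P0P1 + P1P2 + P2P3 is close enough to the chord P0P3, then sum the pieces. Note that this works on Bézier control points, which is a different representation from the one splprep returns.

import numpy as np

def _split(p0, p1, p2, p3):
    # de Casteljau subdivision of a cubic Bezier at t = 0.5
    p01, p12, p23 = (p0 + p1) / 2, (p1 + p2) / 2, (p2 + p3) / 2
    p012, p123 = (p01 + p12) / 2, (p12 + p23) / 2
    mid = (p012 + p123) / 2
    return (p0, p01, p012, mid), (mid, p123, p23, p3)

def bezier_length(p0, p1, p2, p3, tol=1e-9):
    chord = np.linalg.norm(p3 - p0)
    polygon = (np.linalg.norm(p1 - p0) + np.linalg.norm(p2 - p1)
               + np.linalg.norm(p3 - p2))
    if polygon - chord < tol:        # flat enough: average is a good estimate
        return (polygon + chord) / 2
    left, right = _split(p0, p1, p2, p3)
    return bezier_length(*left, tol=tol) + bezier_length(*right, tol=tol)

# Example: length of one hypothetical 2D segment
print(bezier_length(np.array([0.0, 0.0]), np.array([1.0, 2.0]),
                    np.array([2.0, 2.0]), np.array([3.0, 0.0])))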
You can integrate the function sqrt(x'(u)**2 + y'(u)**2) over u, where you calculate the derivatives x' and y' of your coordinates with scipy.interpolate.splev. The integration can be done with one of the routines from scipy.integrate (quad, which uses adaptive Gauss-Kronrod quadrature from QUADPACK, is precise; romberg is generally faster). This should be more precise, and probably faster, than adding up lots of small distances (which is equivalent to integrating with the rectangle rule).
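A minimal sketch of that approach; the sample points are placeholders:

import numpy as np
from scipy import interpolate, integrate

# Hypothetical 2D points to fit with a parametric spline.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
tck, u = interpolate.splprep([x, y], s=0)

def speed(t):
    dx, dy = interpolate.splev(t, tck, der=1)
    return np.hypot(dx, dy)

# Arc length over the full parameter range u in [0, 1].
length, abserr = integrate.quad(speed, 0, 1)
print(length)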
