Finding the length of a cubic B-spline - python

Using scipy's interpolate.splprep function, I get a parametric spline on parameter u, but the domain of u is not the line integral of the spline; it is a piecewise linear connection of the input coordinates. I've tried integrate.splint, but that just gives the individual integrals over u. Obviously, I can numerically integrate a bunch of Cartesian differential distances, but I was wondering if there was a closed-form method for getting the length of a spline or spline segment (using scipy or numpy) that I was overlooking.
Edit: I am looking for a closed-form solution or a very fast way to converge to a machine-precision answer. I have all but given up on the numerical root-finding methods and am now primarily after a closed-form answer. If anyone has any experience integrating elliptic functions or can point me to a good resource (other than Wolfram), that would be great.
I'm going to try Maxima to try to get the indefinite integral of what I believe is the function for one segment of the spline: I cross-posted this on MathOverflow

Because both x & y are cubic parametric functions, there isn't a closed solution in terms of simple functions. Numerical integration is the way to go. Either integrating the arc length expression or simply adding line segment lengths - depends on the accuracy you are after and how much effort you want to exert.
An accurate and fast "Adding length of line segments" method:
Using recursive subdivision (a form of de Casteljau's algorithm) to generate points can give you a highly accurate representation with a minimal number of points.
Only subdivide a subdivision further if it fails to meet a criterion. Usually the criterion is based on the length of the polyline joining the control points (the hull or cage).
For a cubic, the usual test compares P0P1+P1P2+P2P3 with P0P3, where P0, P1, P2 & P3 are the control points that define your Bézier segment: the closer the two lengths, the flatter the segment.
You can find some Delphi code here:
link text
It should be relatively easy to convert to Python.
It will generate the points. The code already calculates the length of the segments in order to test the criteria. You can simply accumulate those length values along the way.
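The subdivision idea above can be sketched in Python roughly as follows. This is not a port of the Delphi code mentioned in the answer; the function name, the tolerance and the depth limit are assumptions made for illustration, and it handles a single cubic Bézier segment given by its four control points.

```python
import math

def bezier_length(p0, p1, p2, p3, tol=1e-7, depth=0, max_depth=30):
    """Approximate the length of one cubic Bezier segment by recursive subdivision.

    tol and max_depth are illustrative choices, not values from the answer.
    """
    chord = math.dist(p0, p3)
    hull = math.dist(p0, p1) + math.dist(p1, p2) + math.dist(p2, p3)
    # The true length lies between chord and hull, so stop once they agree.
    if hull - chord <= tol or depth >= max_depth:
        return chord

    def mid(a, b):
        return tuple((ai + bi) / 2 for ai, bi in zip(a, b))

    # de Casteljau split at t = 0.5 gives the control points of both halves.
    p01, p12, p23 = mid(p0, p1), mid(p1, p2), mid(p2, p3)
    p012, p123 = mid(p01, p12), mid(p12, p23)
    p0123 = mid(p012, p123)
    return (bezier_length(p0, p01, p012, p0123, tol, depth + 1, max_depth)
            + bezier_length(p0123, p123, p23, p3, tol, depth + 1, max_depth))
```

For example, bezier_length((0, 0), (0, 1), (1, 1), (1, 0)) should come out close to 2, which is the exact length of that particular curve. A tighter tol gives more segments and a smaller error.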

You can integrate the function sqrt(x'(u)**2+y'(u)**2) over u, where you calculate the derivatives x' and y' of your coordinates with scipy.interpolate.splev (pass der=1). The integration can be done with one of the routines from scipy.integrate (quad is precise [adaptive Gauss-Kronrod quadrature from QUADPACK]; romberg, where your SciPy version still provides it, is generally faster). This should be more precise, and probably faster, than adding up lots of small distances (which is equivalent to integrating with the rectangle rule).
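A minimal sketch of this approach, assuming a 2-D spline produced by splprep. The helper name spline_length and the choice to integrate knot interval by knot interval (so that quad only ever sees a smooth integrand) are my own assumptions, not part of the answer.

```python
import numpy as np
from scipy import integrate, interpolate

def spline_length(tck, u0=0.0, u1=1.0):
    """Arc length of a parametric spline (tck from splprep) between u0 and u1."""
    def speed(u):
        # First derivatives of both coordinates with respect to u.
        dx, dy = interpolate.splev(u, tck, der=1)
        return float(np.hypot(dx, dy))

    # Split the range at the interior knots; the integrand is smooth in between.
    knots = np.unique(tck[0])
    inner = knots[(knots > u0) & (knots < u1)]
    edges = np.concatenate(([u0], inner, [u1]))
    return sum(integrate.quad(speed, a, b)[0]
               for a, b in zip(edges[:-1], edges[1:]))
```

Typical usage would be tck, u = interpolate.splprep([x, y], s=0) followed by spline_length(tck). Passing other values for u0 and u1 gives the length of a part of the curve.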

Related

How to integrate discrete function

I need to integrate a certain function that I have specified as discrete values for discrete arguments (I want to count the area under the graph I get).
I.e., from the earlier part of the code I have the literal:
args=[a1, a2, a3, a4]
values=[v1, v2, v3, v4]
where value v1 corresponds to a1, etc. If it's important, I have args set in advance with a specific discretization width, and I count values with a ready-made function.
I am attaching a figure.
And putting this function, which gave me a 'values' array, into integrate.quad() gives me an error:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used.
How can I integrate this? I'm mulling over the scipy documentation, but I can't seem to put it together. Because, after all, args themselves are already discretized by a finite number.
I am guessing that before passing the integral to quad you did some kind of interpolation on it. In general this is a misguided approach.
Integration and interpolation are very closely related. An integral requires you to compute the area under the curve and thus you must know the value of the function at any given point. Hence, starting from a set of data it is natural to want to interpolate it first. Yet the quad routine does not know that you started with a limited set of data; it just assumes that the function you gave it is "perfect" and it will do its best to compute the area under it! However, the interpolated function is just a guess at what the values are between the given points, and thus integrating an interpolated function is a waste of time.
As MB-F said, in the discrete case you should simply sum up the points while multiplying them by the step size between them. You can do this the naïve way by pretending that the function is just rectangles. Or you can do what MB-F suggested, which is to pretend that all the data points are connected with straight lines. Going one step further is to pretend that the curve connecting the data points is smooth (often true for physical systems) and use Simpson integration as implemented in SciPy (scipy.integrate.simpson, called simps in older releases).
Since you only have a discrete approximation of the function, integration reduces to summation.
As a simple approximation of an integral, try this:
midpoints = (values[:-1] + values[1:]) / 2
steps = np.diff(args)
area = np.sum(midpoints * steps)
(Assuming args and values are numpy arrays and the function is value = f(arg).)
That approach sums the areas of all trapezoids between adjacent data points (Wikipedia).
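As a runnable illustration of the snippet above, here is a small sketch on made-up data (samples of sin on [0, pi], whose exact integral is 2). The sample data is an assumption for demonstration only. It also shows the equivalent library calls scipy.integrate.trapezoid and scipy.integrate.simpson.

```python
import numpy as np
from scipy import integrate

# Hypothetical sample data: 101 evenly spaced samples of sin(x) on [0, pi].
args = np.linspace(0.0, np.pi, 101)
values = np.sin(args)

# Manual trapezoid sum, exactly as in the answer.
midpoints = (values[:-1] + values[1:]) / 2
steps = np.diff(args)
area_manual = np.sum(midpoints * steps)

# The same rule from the library, plus Simpson's rule for comparison.
area_trapezoid = integrate.trapezoid(values, x=args)
area_simpson = integrate.simpson(values, x=args)

print(area_manual, area_trapezoid, area_simpson)
```

Both trapezoid results should agree with each other to rounding error, and all three should be close to 2.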

Computing intersection of a function with a specific interval using scipy

I'm stuck trying to get functions that are existent in scipy (or sympy) for the following task:
Suppose we are given the following function:
f(A,B,C) = k1-A*sin(B*k2-C)
for each of the axes A,B,C of the space we have a specific interval, and likewise for the function value: [a_lb, a_ub], [b_lb, b_ub], [c_lb, c_ub], [d_lb, d_ub].
Which functions of scipy can be used to compute if the space encompassed by the boundaries is intersected by the given function? I thought of like e.g. computing the Hessian matrix.
Thank you for hints
Best regards
If I understand correctly, what you are looking for is an answer to whether f(A,B,C) bounded in the domain [a_l,a_u]x[b_l,b_u]x[c_l,c_u] has a value within [d_l,d_u]. You can try using scipy.optimize.minimize for this.
If you run scipy.optimize.minimize on f with the bounds [a_l,a_u]x[b_l,b_u]x[c_l,c_u], you should get the minimal value of f in the domain. Similarly, minimizing -f will give you the maximal value of f in the domain. Because f is continuous on the box, it takes every value between fmin and fmax, so f intersects the given boundary if and only if the interval [fmin, fmax] intersects the interval [d_l,d_u].
Note that scipy.optimize.minimize is a non-linear optimization and therefore requires an initial guess. The middle point of the domain box is a natural choice, but since the non-linear optimization may encounter a local minimum (or not converge), you may want to try several other initial guesses as well. scipy.optimize.minimize has many (optional) parameters so I recommend you read its documentation and play with them to fine-tune your usage to your needs.
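A rough sketch of this procedure with several starting points. The helper names value_range and intersects, the number of random restarts, and the constants k1, k2 and the box used in the example are all assumptions for illustration. Note that the result is only as reliable as the local optimizer: a missed global extremum leads to a wrong answer.

```python
import numpy as np
from scipy import optimize

def value_range(f, bounds, n_starts=20, seed=0):
    """Estimate (fmin, fmax) of f on a box by bounded local minimization."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    # Start from the box centre plus a few random points inside the box.
    starts = [(lo + hi) / 2] + [rng.uniform(lo, hi) for _ in range(n_starts)]
    fmin = min(optimize.minimize(f, x0, bounds=bounds).fun for x0 in starts)
    fmax = -min(optimize.minimize(lambda p: -f(p), x0, bounds=bounds).fun
                for x0 in starts)
    return fmin, fmax

def intersects(f, bounds, d_lb, d_ub, **kwargs):
    """True if the estimated range of f on the box overlaps [d_lb, d_ub]."""
    fmin, fmax = value_range(f, bounds, **kwargs)
    return fmin <= d_ub and fmax >= d_lb

# Hypothetical constants and box, chosen only for the example.
k1, k2 = 0.0, 1.0
f = lambda p: k1 - p[0] * np.sin(p[1] * k2 - p[2])
box = [(0.0, 1.0), (0.0, np.pi), (0.0, np.pi)]
print(value_range(f, box), intersects(f, box, 0.5, 2.0))
```

With these example values f ranges over [-1, 1] on the box, so the estimates should land near those bounds.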

How to decide between scipy.integrate.simps or numpy.trapz?

I have a set of points, of which when I plot I get the graph below. I would like to find the area under the graph, however I am not sure whether scipy.integrate.simps or numpy.trapz is more suitable.
Could someone advise me on the mathematical background behind the two functions, and thus on which function is more accurate?
The trapezoidal rule is the simplest of numerical integration methods. In effect, it estimates the area under a curve by approximating the curve with straight line segments, which only requires two points for each segment. Simpson's rule uses quadratic curves to approximate the function segments instead, each of which requires three points, sampled from your function, to approximate a given segment.
So what is the error associated with using these numerical methods as approximations to an analytical integral?
The error associated with the trapezoidal rule, to leading order, is proportional to h^2[f'(a) - f'(b)]. h is the spacing between sampled points in your function; f'(a) and f'(b) are the first derivative of your function at the beginning and end of the sampling domain.
The error of Simpson's rule, on the other hand, is proportional to h^4[f'''(a) - f'''(b)]. f''' is the third-order derivative of your function.
h is typically small, so h^4 is typically much smaller than h^2!
TLDR: Simpson's rule typically gives far superior results for numerical integration, compared to the trapezoidal rule, with basically no additional computational cost.
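The h^2 versus h^4 behaviour can be checked numerically. The sketch below uses exp(x) on [0, 1] as an arbitrary smooth test function (my choice, not from the question) and the current SciPy names trapezoid and simpson. Halving h should cut the trapezoid error by about 4 and the Simpson error by about 16.

```python
import numpy as np
from scipy import integrate

def integration_errors(n):
    """Absolute errors of both rules for exp(x) on [0, 1] with n intervals (n even)."""
    x = np.linspace(0.0, 1.0, n + 1)
    y = np.exp(x)
    exact = np.e - 1.0
    err_trapezoid = abs(integrate.trapezoid(y, x=x) - exact)
    err_simpson = abs(integrate.simpson(y, x=x) - exact)
    return err_trapezoid, err_simpson

for n in (10, 20, 40):
    print(n, integration_errors(n))
```

This only demonstrates the behaviour for smooth, evenly sampled data. For noisy measurements the higher-order rule does not necessarily help.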

Most efficient method of returning coefficients for a fit in Python for use in another language?

So, I have the following data I've plotted in Python.
The data is input for a forcing term in a system of differential equations I am working with. Thus, I need to fit a continuous function to this data so I will not have to deal with stability issues that could come with discontinuities of a step-wise function. Unfortunately, it's a pretty large data set.
I am trying to end up with a fitted function that is possible and not too tedious to translate into Stan, the language that I am coding the differential equations in, so was preferring something in piece-wise polynomial form with a maximum of just a few pieces that I can manually code.
I started off with polyfit from numpy, which was not very good. Using UnivariateSpline from scipy gave me a decent fit, but it did not give me something that looked tractable for translation into Stan. Hence, I was looking for suggestions into other fits I could try that would return functions that are more easily translatable into other languages? Looking at the shape of my data, is there a periodic spline fit that could be useful?
The UnivariateSpline object has get_knots and get_coeffs methods. They give you the knots and coefficients of the fit in the B-spline basis.
An alternative, equivalent way is to use splrep for fitting (and splev for evaluations).
To convert to a piecewise polynomial representation, use PPoly.from_spline (check the docs for the latter for the exact format).
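A sketch of the splrep plus PPoly.from_spline route, assuming you want plain breakpoints and per-piece coefficients that can be typed into another language. The helper names spline_to_pieces and eval_pieces are hypothetical. Each column of coeffs holds the coefficients of one piece in descending powers of (x - breaks[i]), which is the PPoly convention.

```python
import numpy as np
from scipy import interpolate

def spline_to_pieces(x, y, s=0.0):
    """Fit a cubic spline and return (tck, breaks, coeffs) in piecewise form."""
    tck = interpolate.splrep(x, y, s=s)
    pp = interpolate.PPoly.from_spline(tck)
    # Repeated end knots produce zero-length pieces; drop them for export.
    keep = np.diff(pp.x) > 0
    breaks = np.append(pp.x[:-1][keep], pp.x[-1])
    coeffs = pp.c[:, keep]
    return tck, breaks, coeffs

def eval_pieces(x0, breaks, coeffs):
    """Evaluate the exported pieces by hand, as the target language would."""
    i = int(np.clip(np.searchsorted(breaks, x0, side="right") - 1,
                    0, len(breaks) - 2))
    dx = x0 - breaks[i]
    degree = coeffs.shape[0] - 1
    return sum(c * dx ** (degree - j) for j, c in enumerate(coeffs[:, i]))
```

A larger smoothing value s generally yields fewer knots and therefore fewer pieces to translate by hand.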
If what you want is a Fourier space representation, you can use leastsq or least_squares. It'd be essential to provide sensible starting values for NLSQ fit parameters. At least I'd start from e.g. max-to-max distance estimate for the period and max-to-min estimate for the amplitude.
As always with non-linear fitting, YMMV, however.
From the direction field, it seems that a fit involving the sum of or composition of multiple sinusoidal functions might be it.
Ex: sin(cos(2x)), sin(x)+2cos(x), etc.
I would use Wolfram Alpha, Mathematica, or Matlab to create direction fields.

Python: Plotting a power law function with exponential cutoff

I have a graph between 2 functions f and g.
I know it follows a power law function with exponential cutoff.
f(x) = x**(-alpha)*e**(-lambda*x)
How do I find the value of exponent alpha?
If you have sufficiently close x points (for example one every 0.1), you can try the following:
ln(f(x)) = -alpha ln(x) - lambda x
ln(f(x))' = - alpha / x - lambda
So depending on where you have your points:
If you have a lot of points near 0, you can try:
h(x) = x ln(f(x))' = -alpha - lambda x
So the limit of the function h when x goes to 0 is -alpha
If you have large values of x, the function x -> ln(f(x))' tends toward -lambda when x goes to infinity, so you can guess lambda and use pwdyson's expression.
If you don't have close x points, the numerical derivative will be very noisy, so I would try to guess lambda as the limit of -ln(f(x))/x for large x's...
If you don't have large values, but a large number of x's, you can try a minimization of
sum_x_i (ln(y_i) + alpha ln(x_i) + lambda x_i) ^2
on both alpha and lambda (I guess it would be more precise than the initial expression)...
It is a simple least-squares regression (numpy.linalg.lstsq will do the job).
So you have plenty of methods; the one to choose really depends on your inputs.
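The least-squares minimization described above might look like this with numpy.linalg.lstsq. The function name is hypothetical, and the sketch assumes all x and y values are positive so that the logarithms exist.

```python
import numpy as np

def fit_power_law_cutoff(x, y):
    """Fit ln(y) = -alpha*ln(x) - lam*x by linear least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns multiply alpha and lam respectively.
    design = np.column_stack((-np.log(x), -x))
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    alpha, lam = coef
    return alpha, lam
```

Because the fit is done on ln(y), small values of y carry as much weight as large ones, which may or may not be what you want for noisy data.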
The usual and general way of doing what you want is to perform a non-linear regression (even though, as noted in another response, it is possible to linearize the problem). Python can do this quite easily with the help of the SciPy package, which is used by many scientists.
The routine you are looking for is its least-square optimization routine (scipy.optimize.leastsq). Once you wrap your head around the way this general optimization procedure works (see the example), you will probably find many other opportunities to use it. Basically, you calculate the list of differences between your measurements and their ideal value f(x), and you ask SciPy to find the parameters that make these differences as small as possible, so that your data fits the model as well as possible. This then gives you the parameter you are looking for.
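A small sketch of that non-linear fit with scipy.optimize.leastsq. The function names and the starting guess are assumptions. As with any non-linear fit, a poor starting guess can lead to a different or failed result.

```python
import numpy as np
from scipy import optimize

def model(params, x):
    """Power law with exponential cutoff: x**(-alpha) * exp(-lam * x)."""
    alpha, lam = params
    return x ** (-alpha) * np.exp(-lam * x)

def residuals(params, x, y):
    # Differences between the measurements and the model values.
    return y - model(params, x)

def fit_nonlinear(x, y, guess=(1.0, 0.1)):
    """Return (alpha, lam) minimizing the sum of squared residuals."""
    params, ier = optimize.leastsq(residuals, guess, args=(x, y))
    return params[0], params[1]
```

Here x and y are expected to be numpy arrays of positive x values and the corresponding measurements.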
It sounds like you might be trying to fit a power-law to a distribution with an exponential cutoff at the low end due to incompleteness - but I may be reading too far into your problem.
If that is the problem you're dealing with, this website (and accompanying publication) addresses the issue: http://tuvalu.santafe.edu/~aaronc/powerlaws/. I wrote the python implementation of the power-law fitter on that page; it is linked from there.
If you know that the points follow this law exactly, then invert the equation and put in an x and its corresponding f(x) value:
import math
# lam holds the known value of lambda (lambda itself is a reserved word in Python)
alpha = -(lam*x + math.log(f(x)))/math.log(x)
But if the points do not exactly fit the equation, you will need to do some sort of regression to determine alpha.
EDIT: Ok, so they don't fit exactly. This is getting beyond a Python question, but there may be something in numpy that can handle it. Here is a numpy linear regression recipe. Your equation is not linear as written, so either take logarithms first (ln(f(x)) = -alpha ln(x) - lambda x is linear in alpha and lambda) or look into non-linear regression.
