Reverse output of polyfit numpy - python

I have used numpy's polyfit and obtained a very good fit (using a 7th order polynomial) for two arrays, x and y. My relationship is thus;
y(x) = p[0]* x^7 + p[1]*x^6 + p[2]*x^5 + p[3]*x^4 + p[4]*x^3 + p[5]*x^2 + p[6]*x^1 + p[7]
where p is the polynomial array output by polyfit.
Is there a way to reverse this method easily, so I have a solution in the form of,
x(y) = p[0]*y^n + p[1]*y^n-1 + .... + p[n]*y^0

No there is no easy way in general. Closed form-solutions for arbitrary polynomials are not available for polynomials of the seventh order.
Doing the fit in the reverse direction is possible, but only on monotonically varying regions of the original polynomial. If the original polynomial has minima or maxima on the domain you are interested in, then even though y is a function of x, x cannot be a function of y because there is no 1-to-1 relation between them.
If you are (i) OK with redoing the fitting procedure, and (ii) OK with working piecewise on single monotonic regions of your fit at a time, then you could do something like this:
-
import numpy as np
# generate a random coefficient vector a
degree = 1
a = 2 * np.random.random(degree+1) - 1
# an assumed true polynomial y(x)
def y_of_x(x, coeff_vector):
"""
Evaluate a polynomial with coeff_vector and degree len(coeff_vector)-1 using Horner's method.
Coefficients are ordered by increasing degree, from the constant term at coeff_vector[0],
to the linear term at coeff_vector[1], to the n-th degree term at coeff_vector[n]
"""
coeff_rev = coeff_vector[::-1]
b = 0
for a in coeff_rev:
b = b * x + a
return b
# generate some data
my_x = np.arange(-1, 1, 0.01)
my_y = y_of_x(my_x, a)
# verify that polyfit in the "traditional" direction gives the correct result
# [::-1] b/c polyfit returns coeffs in backwards order rel. to y_of_x()
p_test = np.polyfit(my_x, my_y, deg=degree)[::-1]
print p_test, a
# fit the data using polyfit but with y as the independent var, x as the dependent var
p = np.polyfit(my_y, my_x, deg=degree)[::-1]
# define x as a function of y
def x_of_y(yy, a):
return y_of_x(yy, a)
# compare results
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(my_x, my_y, '-b', x_of_y(my_y, p), my_y, '-r')
Note: this code does not check for monotonicity but simply assumes it.
By playing around with the value of degree, you should see that see the code only works well for all random values of a when degree=1. It occasionally does OK for other degrees, but not when there are lots of minima / maxima. It never does perfectly for degree > 1 because approximating parabolas with square-root functions doesn't always work, etc.

Related

Using a forloop to solve coupled differential equations in python

I am trying to solve a set of differential equations, but I have been having difficulty making this work. My differential equations contain an "i" subscript that represents numbers from 1 to n. I tried implementing a forloop as follows, but I have been getting this index error (the error message is below). I have tried changing the initial conditions (y0) and other values, but nothing seems to work. In this code, I am using solve_ivp. The code is as follows:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.integrate import solve_ivp
def testmodel(t, y):
X = y[0]
Y = y[1]
J = y[2]
Q = y[3]
a = 3
S = 0.4
K = 0.8
L = 2.3
n = 100
for i in range(1,n+1):
dXdt[i] = K**a+(Q[i]**a) - S*X[i]
dYdt[i] = (K*X[i])-(L*Y[i])
dJdt[i] = S*Y[i]-(K*Q[i])
dQdt[i] = K*X[i]/L+J[i]
return dXdt, dYdt, dJdt, dQdt
t_span= np.array([0, 120])
times = np.linspace(t_span[0], t_span[1], 1000)
y0 = 0,0,0,0
soln = solve_ivp(testmodel, t_span, y0, t_eval=times,
vectorized=True)
t = soln.t
X = soln.y[0]
Y = soln.y[1]
J = soln.y[2]
Q = soln.y[3]
plt.plot(t, X,linewidth=2, color='red')
plt.show()
The error I get is
IndexError Traceback (most recent call last)
<ipython-input-107-3a0cfa6e42ed> in testmodel(t, y)
15 n = 100
16 for i in range(1,n+1):
--> 17 dXdt[i] = K**a+(Q[i]**a) - S*X[i]
IndexError: index 1 is out of bounds for axis 0 with size 1
I have scattered the web for a solution to this, but I have been unable to apply any solution to this problem. I am not sure what I am doing wrong and what to actually change.
I have tried to remove the "vectorized=True" argument, but then I get an error that states I cannot index scalar variables. This is confusing because I do not think these values should be scalar. How do I resolve this problem, my ultimate goal is to plot these differential equations. Thank you in advance.
It is nice that you provide the standard solver with a vectorized ODE function for multi-point evalutions. But the default method is the explicit RK45, and explicit methods do not use Jacobi matrices. So there is no need for multi-point evaluations for difference quotients for the partial derivatives.
In essence, the coordinate arrays always have size 1, as the evaluation is at a single point, so for instance Q is an array of length 1, the only valid index is 0. Remember, in all "true" programming languages, array indices start at 0. It is only some CAS script languages that use the "more mathematical" 1 as index start. (Setting n=100 and ignoring the length of the arrays provided by the solver is wrong as well.)
You can avoid all that and shorten your routine by taking into account that the standard arithmetic operations are applied element-wise for numpy arrays, so
def testmodel(t, y):
X,Y,J,Q = y
a = 3; S = 0.4; K = 0.8; L = 2.3
dXdt = K**a + Q**a - S*X
dYdt = K*X - L*Y
dJdt = S*Y - K*Q
dQdt = K*X/L + J
return dXdt, dYdt, dJdt, dQdt
Modifying your code for multiple compartments with the same dynamic
You need to pass the solver a flat vector of the state. The first design decision is how the compartments and their components are arranged in the flat vector. One variant that is most compatible with the existing code is to cluster the same components together. Then in the ODE function the first operation is to separate out these clusters.
X,Y,J,Q = y.reshape([4,-1])
This splits the input vector into 4 pieces of equal length. At the end you need to reverse this split so that the derivatives are again in a flat vector.
return np.concatenate([dXdt, dYdt, dJdt, dQdt])
Everything else remains the same. Apart from the initial vector, which needs to have 4 segments of length N containing the data for the compartments. Here that could just be
y0 = np.zeros(4*N)
If the initial data is from any other source, and given in records per compartment, you might have to transpose the resulting array before flattening it.
Note that this construction is not vectorized, so leave that option unset in its default False.
For uniform interaction patterns like in a circle I recommend the use of numpy.roll to continue to avoid the use of explicit loops. For an interaction pattern that looks like a network one can use connectivity matrices and masks like in Using python built-in functions for coupled ODEs

Manually recover the original function from numpy rfft

I have performed a numpy.fft.rfft on a function to obtain the Fourier coefficients. Since the docs do not seem to contain the exact formula used, I have been assuming a formula found in a textbook of mine:
S(x) = a_0/2 + SUM(real(a_n) * cos(nx) + imag(a_n) * sin(nx))
where imag(a_n) is the imaginary part of the n_th element of the Fourier coefficients.
To translate this into python-speak, I have implemented the following:
def fourier(freqs, X):
# input the fourier frequencies from np.fft.rfft, and arbitrary X
const_term = np.repeat(np.real(freqs[0])/2, X.shape[0]).reshape(-1,1)
# this is the "n" part of the inside of the trig terms
trig_terms = np.tile(np.arange(1,len(freqs)), (X.shape[0],1))
sin_terms = np.imag(freqs[1:])*np.sin(np.einsum('i,ij->ij', X, trig_terms))
cos_terms = np.real(freqs[1:])*np.cos(np.einsum('i,ij->ij', X, trig_terms))
return np.concatenate((const_term, sin_terms, cos_terms), axis=1)
This should give me an [X.shape[0], 2*freqs.shape[0] - 1] array, containing at entry i,j the i_th element of X evaluated at the j_th term of the Fourier decomposition (where the j_th term is a sin term for odd j).
By summing this array over the axis of Fourier terms, I should obtain the function evaluated at the i_th term in X:
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-1,1,50)
y = X*(X-0.8)*(X+1)
reconstructed_y = np.sum(
fourier(
np.fft.rfft(y),
X
),
axis = 1
)
plt.plot(X,y)
plt.plot(X, reconstructed_y, c='r')
plt.show()
In any case, the red line should be basically on top of the blue line. Something has gone wrong either in my assumptions about what numpy.fft.rfft returns, or in my specific implementation, but I am having a hard time tracking down the bug. Can anyone shed some light on what I've done wrong here?

How to implement the following formula for derivatives in python?

I'm trying to implement the following formula in python for X and Y points
I have tried following approach
def f(c):
"""This function computes the curvature of the leaf."""
tt = c
n = (tt[0]*tt[3] - tt[1]*tt[2])
d = (tt[0]**2 + tt[1]**2)
k = n/d
R = 1/k # Radius of Curvature
return R
There is something incorrect as it is not giving me correct result. I think I'm making some mistake while computing derivatives in first two lines. How can I fix that?
Here are some of the points which are in a data frame:
pts = pd.DataFrame({'x': x, 'y': y})
x y
0.089631 97.710199
0.089831 97.904541
0.090030 98.099313
0.090229 98.294513
0.090428 98.490142
0.090627 98.686200
0.090827 98.882687
0.091026 99.079602
0.091225 99.276947
0.091424 99.474720
0.091623 99.672922
0.091822 99.871553
0.092022 100.070613
0.092221 100.270102
0.092420 100.470020
0.092619 100.670366
0.092818 100.871142
0.093017 101.072346
0.093217 101.273979
0.093416 101.476041
0.093615 101.678532
0.093814 101.881451
0.094013 102.084800
0.094213 102.288577
pts_x = np.gradient(x_c, t) # first derivatives
pts_y = np.gradient(y_c, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
After getting the derivatives I am putting the derivatives x_prim, x_prim_prim, y_prim, y_prim_prim in another dataframe using the following code:
d = pd.DataFrame({'x_prim': pts_x, 'y_prim': pts_y, 'x_prim_prim': pts_xx, 'y_prim_prim':pts_yy})
after having everything in the data frame I am calling function for each row of the data frame to get curvature at that point using following code:
# Getting the curvature at each point
for i in range(len(d)):
temp = d.iloc[i]
c_temp = f(temp)
curv.append(c_temp)
You do not specify exactly what the structure of the parameter pts is. But it seems that it is a two-dimensional array where each row has two values x and y and the rows are the points in your curve. That itself is problematic, since the documentation is not quite clear on what exactly is returned in such a case.
But you clearly are not getting the derivatives of x or y. If you supply only one array to np.gradient then numpy assumes that the points are evenly spaced with a distance of one. But that is probably not the case. The meaning of x' in your formula is the derivative of x with respect to t, the parameter variable for the curve (which is separate from the parameters to the computer functions). But you never supply the values of t to numpy. The values of t must be the second parameter passed to the gradient function.
So to get your derivatives, split the x, y, and t values into separate one-dimensional arrays--lets call them x and y and t. Then get your first and second derivatives with
pts_x = np.gradient(x, t) # first derivatives
pts_y = np.gradient(y, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
Then continue from there. You no longer need the t values to calculate the curvatures, which is the point of the formula you are using. Note that gradient is not really designed to calculate the second derivatives, and it absolutely should not be used to calculate third or higher-order derivatives. More complex formulas are needed for those. Numpy's gradient uses "second order accurate central differences" which are pretty good for the first derivative, poor for the second derivative, and worthless for higher-order derivatives.
I think your problem is that x and y are arrays of double values.
The array x is the independent variable; I'd expect it to be sorted into ascending order. If I evaluate y[i], I expect to get the value of the curve at x[i].
When you call that numpy function you get an array of derivative values that are the same shape as the (x, y) arrays. If there are n pairs from (x, y), then
y'[i] gives the value of the first derivative of y w.r.t. x at x[i];
y''[i] gives the value of the second derivative of y w.r.t. x at x[i].
The curvature k will also be an array with n points:
k[i] = abs(x'[i]*y''[i] -y'[i]*x''[i])/(x'[i]**2 + y'[i]**2)**1.5
Think of x and y as both being functions of a parameter t. x' = dx/dt, etc. This means curvature k is also a function of that parameter t.
I like to have a well understood closed form solution available when I program a solution.
y(x) = sin(x) for 0 <= x <= pi
y'(x) = cos(x)
y''(x) = -sin(x)
k = sin(x)/(1+(cos(x))**2)**1.5
Now you have a nice formula for curvature as a function of x.
If you want to parameterize it, use
x(t) = pi*t for 0 <= t <= 1
x'(t) = pi
x''(t) = 0
See if you can plot those and make your Python solution match it.

numpy fit coefficients to linear combination of polynomials

I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.minimize?
import numpy as np
from scipy.optimize import minimize as mini
x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin (30 * x1 - x2 / 10)
def fitness(x, degree=5):
poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
return np.sum((y - (poly1 + poly2)) ** 2 )
# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)
# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print fitness(x.x)
Your intuition is right. You can solve this as a system of equations of the form Ax = b.
However:
The system is overdefined and you want to get the least-squares solution, so you need to use np.linalg.lstsq instead of np.linalg.solve.
You can't use polyval because you need to separate the coefficients and powers of the independent variable.
This is how to construct the system of equations and solve it:
A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx)) # test the result with original fitness function
Of course you can generalize over the degree:
A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
With the example data, the least squares solution runs much faster than the minimize solution (800µs vs 35ms on my laptop). However, A can become quite large, so if memory is an issue minimize might still be an option.
Update:
Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:
def construct_A(valfunc, degree):
columns1 = []
columns2 = []
for p in range(degree):
c = np.zeros(degree)
c[p] = 1
columns1.append(valfunc(x1, c))
columns2.append(valfunc(x2, c))
return np.stack(columns1 + columns2).T
A = construct_A(np.polynomial.polynomial.polyval, 5)
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx)) # test the result with original fitness function

Comparing model and data using scipy.optimize

I'm trying to compare a set of discrete data values with a model to estimate the "x" value where there is a good match between the discrete data points and the model. In other words, I'm trying to estimate the x value (or the range of x) where the differences between data (discrete points) and the model are minimum. I've a model that provides Ya(x), Yb(x), Yc(x) (continuous lines). I also have the data points A, B and C (filled circles). I would like to estimate the x value where the data points A, B and C (or most of the points) match well with the corresponding continuous line. I also plot the (model-data)^2 as a function of x. It appears from the second plot that a good match can be obtained for the x range 5.e3 to 1.e4. I was wondering if I may use any scipy.optimize subroutine to estimate it quantitatively.
Thanks for your time and any help would be greatly appreciated.
I think this might be some pseudo-code to get you started. See if this actually matches what you want.
import scipy.optimize
import numpy as np
def yA(x):
# whatever calculations here you do for curve A
return(1.0) # return whatever yA is at x
def yB(x):
# whatever calculations here you do for curve B
return(1.0) # return whatever yB is at x
def yC(x):
# whatever calculations here you do for curve C
return(1.0) # return whatever yC is at x
def func(x,data):
A,B,C = data # unpack tuple
devA = np.abs((yA(x)-A)/yA(x)) # normalize the deviations
devB = np.abs((yB(x)-B)/yB(x)) # to account for the order
devC = np.abs((yC(x)-C)/yC(x)) # of magnitude variations
return(devA+devB+devC) # you want to minimize the sum of the deviations
A = 1.0E-10 # these are your data points (rough guess from plot)
B = 1.0E-11
C = 1.0E-8
x0 = 1000.0 # an initial guess
result = scipy.optimize.minimize(func,x0,args=(A,B,C))
print(result.x)

Categories