numpy roots negative exponential function - python

I have read numpy.roots, which works out common algebraic function's y axis intersections.
which
y = ax^n + bx^{n - 1} + cx^{n - 2} ...
and exponentials are always natural number.
so by passing in
[1, 2, 3]
I am basically working out
y = x^2 + 2x + 3
but I need to bring in negative exponentials, such as
y = x^2 + 2x + 3 - x^{-1}
I wonder if I can construct a dict, for the instance above
{2: 1, 1: 2, 0: 3, -1: 1}
which keys are exponentials and values are coefficients, and work out the roots.
any suggestions will be appreciated.

If you got a "polynomial" with negative exponents, it is not really a
polynomial but a fractional function, so to find the roots you can simply find the roots of the numerator polynomial.
In a function
you can factor out the last term
which is equal to
So we can say
The function g(x) is a polynomial of n+m degree and finding the roots of g(x) you'll get the roots of f(x) because the denominator cannot be zero (x=0 is outside the domain of f(x), when it's zero you get a singularity, in this case a division by zero which is impossible).
EDIT
We can graphically verify. For example, let's take a function with these coefficients
import numpy as np
coeffs = [1, 6, -6, -64, -27, 90]
roots = np.roots(coeffs)
roots
array([-5., 3., -3., -2., 1.])
As you can see we got 5 real roots.
Let's now define a polynomial with the given coefficients and a fractional function
import matplotlib.pyplot as plt
def print_func(func, func_name):
fig, ax = plt.subplots(figsize=(12, .5))
ax.set_xticks([])
ax.set_yticks([])
ax.axis('off')
ax.text(0, .5,
f"{func_name}\n"
fr"${func}$",
fontsize=15,
va='center'
)
plt.show()
def polynom(x, coeffs):
res = 0
terms = []
for t, c in enumerate(coeffs[::-1]):
res += c * x**t
terms.append(f"{c:+} x^{{{t}}}")
func = "".join(terms[::-1])
print_func(func, "polynomial")
return res
def fract(x, coeffs, min_exp):
res = 0
terms = []
for t, c in enumerate(coeffs[::-1]):
e = t + min_exp
res += c * x**e
terms.append(f"{c:+} x^{{{e}}}")
func = "".join(terms[::-1])
print_func(func, "fractional")
return res
x = np.linspace(-6, 4, 100)
So this is the polynomial
y1 = polynom(x, coeffs)
and this is the fractional function
y2 = fract(x, coeffs, -2)
Let's plot them both and the found roots
plt.plot(x, y1, label='polynomial')
plt.plot(x, y2, label='fractional')
plt.axhline(0, color='k', ls='--', lw=1)
plt.plot(roots, np.repeat(0, roots.size), 'o', label='roots')
plt.ylim(-100, 100)
plt.legend()
plt.show()
You see they've got the same roots.
Please note that, if you had a fractional like
y2 = fract(x, coeffs, -3)
i.e where the denominator has an odd exponent, you could think there is a root in x=0 too, simply looking at the plot
but that's not a root, that's a singularity, a vertical asymptote to ±∞. With an even exponent denominator the singularity will go to +∞.

Related

Applying Riemann–Liouville derivative to Lorenz 3D equation

I am using a module in python differint to solve the system of Lorenz 3D equation. After running my 3D system over differint ==> Riemann-Liouville operator for alpha value 1 the original equation and Riemann-Liouville results are not same . The code is mentioned below
from scipy.integrate import odeint
import numpy as np
import matplotlib.pyplot as plt
import differint.differint as df
t = np.arange(1 , 50, 0.01)
def Lorenz(state,t):
# unpack the state vector
x = state[0]
y = state[1]
z = state[2]
a=10;b=8/3;c=28
xd = a*(y -x)
yd = - y +c*x - x*z
zd = -b*z + x*y
return [xd,yd,zd]
state0 = [1,1,1]
state = odeint(Lorenz, state0, t)
#Simple lorentz eqaution plot
plt.subplot(2, 2, 1)
plt.plot(state[:,0],state[:,1])
plt.subplot(2, 2, 2)
plt.plot(state[:,0],state[:,2])
plt.subplot(2, 2, 3)
plt.plot(state[:,1],state[:,2])
plt.show()
DF = df.RL(1, state, 0, len(t), len(t))
# Riemann-Liouville plots
state=DF
plt.subplot(2, 2, 1)
plt.plot(state[:,0],state[:,1])
plt.subplot(2,2, 2)
plt.plot(state[:,0],state[:,2])
plt.subplot(2, 2, 3)
plt.plot(state[:,1],state[:,2])
plt.show()
Am i making mistake anywhere or this is the true result?
as you can see in eqaution (2) when we put α = 1 we will get the results same as of non fractional system (1). Interested in calculating equation (3) for different values of alpha
I think this perhaps the idea of mine is incorrect, because what i am doing is first calculating the system of differential equation using
state = odeint(Lorenz, state0, t)
followed by the differint module
DF = df.RL(1, state, 0, len(t), len(t))
Graphs for lorenz eqaution
Graphs for RL fractional for alpha = 1
in these graphs as u can see that the trajectories are absolutely same but the scaling become different
The call to DF.RL requires a function of the differential equation, however, you used the solution to a differential equation - state (output of odeint call). From the package site https://pypi.org/project/differint/
def f(x):
return x**0.5
DF = df.RL(0.5, f)
print(DF)
Can you try DF = df.RL(1, Lorenz, 0, len(t), len(t)) ?

How to resolve function approximation task in Python?

Consider the complex mathematical function on the line [1, 15]:
f(x) = sin(x / 5) * exp(x / 10) + 5 * exp(-x / 2)
polynomial of degree n (w_0 + w_1 x + w_2 x^2 + ... + w_n x^n) is uniquely defined by any n + 1 different points through which it passes.
This means that its coefficients w_0, ... w_n can be determined from the following system of linear equations:
Where x_1, ..., x_n, x_ {n + 1} are the points through which the polynomial passes, and by f (x_1), ..., f (x_n), f (x_ {n + 1}) - values that it must take at these points.
I'm trying to form a system of linear equations (that is, specify the coefficient matrix A and the free vector b) for the polynomial of the third degree, which must coincide with the function f at points 1, 4, 10, and 15. Solve this system using the scipy.linalg.solve function.
A = numpy.array([[1., 1., 1., 1.], [1., 4., 8., 64.], [1., 10., 100., 1000.], [1., 15., 225., 3375.]])
V = numpy.array([3.25, 1.74, 2.50, 0.63])
numpy.linalg.solve(A, V)
I got the wrong answer, which is
So the question is: is the matrix correct?
No, your matrix is not correct.
The biggest mistake is your second sub-matrix for A. The third entry should be 4**2 which is 16 but you have 8. Less important, you have only two decimal places for your constants array V but you really should have more precision than that. Systems of linear equations are sometimes very sensitive to the provided values, so make them as precise as possible. Also, the rounding in your final three entries is bad: you rounded down but you should have rounded up. If you really want two decimal places (which I do not recommend) the values should be
V = numpy.array([3.25, 1.75, 2.51, 0.64])
But better would be
V = numpy.array([3.252216865271419, 1.7468459495903677,
2.5054164070002463, 0.6352214195786656])
With those changes to A and V I get the result
array([ 4.36264154, -1.29552587, 0.19333685, -0.00823565])
I get these two sympy plots, the first showing your original function and the second using the approximated cubic polynomial.
They look close to me! When I calculate the function values at 1, 4, 10, and 15, the largest absolute error is for 15, namely -4.57042132584462e-6. That is somewhat larger than I would have expected but probably is good enough.
Is it from data science course? :)
Here is an almost generic solution I did:
%matplotlib inline
import numpy as np;
import math;
import matplotlib.pyplot as plt;
def f(x):
return np.sin(x / 5) * np.exp(x / 10) + 5 * np.exp(-x / 2)
# approximate at the given points (feel free to experiment: change/add/remove)
points = np.array([1, 4, 10, 15])
n = points.size
# fill A-matrix, each row is 1 or xi^0, xi^1, xi^2, xi^3 .. xi^n
A = np.zeros((n, n))
for index in range(0, n):
A[index] = np.power(np.full(n, points[index]), np.arange(0, n, 1))
# fill b-matrix, i.e. function value at the given points
b = f(points)
# solve to get approximation polynomial coefficents
solve = np.linalg.solve(A,b)
# define the polynome approximation of the function
def polinom(x):
# Yi = solve * Xi where Xi = x^i
tiles = np.tile(x, (n, 1))
tiles[0] = np.ones(x.size)
for index in range(1, n):
tiles[index] = tiles[index]**index
return solve.dot(tiles)
# plot the graphs of original function and its approximation
x = np.linspace(1, 15, 100)
plt.plot(x, f(x))
plt.plot(x, polinom(x))
# print out the coefficients of polynome approximating our function
print(solve)

Printing a Variable using matplotlib's ax.text prints a list of values instead of a number?

I'm currently trying to make a program which will plot a function using matplotlib, graph it, shade the area under the curve between two variables, and use Simpson's 3/8th's rule to calculate the shaded area. However, when trying to print the variable I've assigned to the final value of the integral, it prints a list.
To begin, here's the base of my code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
This definition defines the function I will be working with here, a simple polynomial.
def func(x):
return (x - 3) * (x - 5) * (x - 7) + 85
Here is the function which calculates the area under the curve
def simpson(function, a, b, n):
"""Approximates the definite integral of f from a to b by the
composite Simpson's rule, using n subintervals (with n even)"""
if n % 2:
raise ValueError("n must be even (received n=%d)" % n)
h = (b - a) / n #The first section of Simpson's 3/8ths rule
s = function(a) + function(b) #The addition of functions over an interval
for i in range(1, n, 2):
s += 4 * function(a + i * h)
for i in range(2, n-1, 2):
s += 2 * function(a + i * h)
return s * h / 3
Now the simpson's rule definition is over, and I define a few variables for simplicity.
a, b = 2, 9 # integral limits
x = np.linspace(0, 10) #Generates 100 points evenly spaced between 0 and 10
y = func(x) #Just defines y to be f(x) so its ez later on
fig, ax = plt.subplots()
plt.plot(x, y, 'r', linewidth=2)
plt.ylim(ymin=0)
final_integral = simpson(lambda x: y, a, b, 100000)
At this point something must have broken down, but I'll include the rest of the code in case you can spot the issue further on.
# Make the shaded region
ix = np.linspace(a, b)
iy = func(ix)
verts = [(a, 0)] + list(zip(ix, iy)) + [(b, 0)]
poly = Polygon(verts, facecolor='0.9', edgecolor='0.5')
ax.add_patch(poly)
plt.text(0.5 * (a + b), 30, r"$\int_a^b f(x)\mathrm{d}x$",
horizontalalignment='center', fontsize=20)
ax.text(0.25, 135, r"Using Simpson's 3/8ths rule, the area under the curve is: ", fontsize=20)
Here is where the integral value should be printed:
ax.text(0.25, 114, final_integral , fontsize=20)
Here is the rest of the code necessary to plot the graph:
plt.figtext(0.9, 0.05, '$x$')
plt.figtext(0.1, 0.9, '$y$')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.xaxis.set_ticks_position('bottom')
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([])
plt.show()
When running this program, you get this graph, and a series of numbers has been printed where the area under the curve should be
Any help here is appreciated. I'm totally stuck. Also, sorry if this is a tad long, it's my first question on the forum.
Have you tried feeding your simpson() function the func() directly, as opposed to using the lambda setup?
I think this could work:
final_integral = simpson(func, a, b, 100000)
You might also try:
final_integral = simpson(lambda x: func(x), a, b, 100000)
What is happening is that y is an array with values func(x), and when you use the expression lambda x: y you are actually creating a constant function of the form f(x) = y = const. Your final_integral is then a list of integrals, where each integrand was the constant function with a particular value from the y array.
Note that you might want to format this number when you print it on the graph, in case it has a lot of trailing decimal points. How you do this depends on whether you are using Python 2 or 3.
You assigned x as linspace which is an array so y is also an array of values of a function of x. You can replace this line of code:
#old:
final_integral = simpson(lambda x:y, a, b, 100000)
#new:
final_integral = simpson(lambda t:func(t), a, b, 100000)
Changing the variable from x to t will give you the value for the area under that the curve. Hope this helps.

Second order gradient in numpy

I am trying to calculate the 2nd-order gradient numerically of an array in numpy.
a = np.sin(np.arange(0, 10, .01))
da = np.gradient(a)
dda = np.gradient(da)
This is what I come up. Is the the way it should be done?
I am asking this, because in numpy there isn't an option saying np.gradient(a, order=2). I am concerned about whether this usage is wrong, and that is why numpy does not have this implemented.
PS1: I do realize that there is np.diff(a, 2). But this is only single-sided estimation, so I was curious why np.gradient does not have a similar keyword.
PS2: The np.sin() is a toy data - the real data does not have an analytic form.
Thank you!
I'll second #jrennie's first sentence - it can all depend. The numpy.gradient function requires that the data be evenly spaced (although allows for different distances in each direction if multi-dimensional). If your data does not adhere to this, than numpy.gradient isn't going to be much use. Experimental data may have (OK, will have) noise on it, in addition to not necessarily being all evenly spaced. In this case it might be better to use one of the scipy.interpolate spline functions (or objects). These can take unevenly spaced data, allow for smoothing, and can return derivatives up to k-1 where k is the order of the spline fit requested. The default value for k is 3, so a second derivative is just fine.
Example:
spl = scipy.interpolate.splrep(x,y,k=3) # no smoothing, 3rd order spline
ddy = scipy.interpolate.splev(x,spl,der=2) # use those knots to get second derivative
The object oriented splines like scipy.interpolate.UnivariateSpline have methods for the derivatives. Note that the derivative methods are implemented in Scipy 0.13 and are not present in 0.12.
Note that, as pointed out by #JosephCottham in comments in 2018, this answer (good for Numpy 1.08 at least), is no longer applicable since (at least) Numpy 1.14. Check your version number and the available options for the call.
There's no universal right answer for numerical gradient calculation. Before you can calculate the gradient about sample data, you have to make some assumption about the underlying function that generated that data. You can technically use np.diff for gradient calculation. Using np.gradient is a reasonable approach. I don't see anything fundamentally wrong with what you are doing---it's one particular approximation of the 2nd derivative of a 1-D function.
The double gradient approach fails for discontinuities in the first derivative.
As the gradient function takes one data point to the left and to the right into account, this continues/spreads when applying it multiple times.
On the other hand side, the second derivative can be calculated by the formula
d^2 f(x[i]) / dx^2 = (f(x[i-1]) - 2*f(x[i]) + f(x[i+1])) / h^2
compare here. This has the advantage to just take the two neighboring pixels into account.
In the picture the double np.gradient approach (left) and the above mentioned formula (right), as implemented by np.diff are compared. As f(x) has only one kink at zero, the second derivative (green) should only there have a peak.
As the double gradient solution takes 2 neighboring points in each direction into account, this leads to finite second derivative values at +/- 1.
In some cases, however, you may want to prefer the double gradient solution, as this is more robust to noise.
I am not sure why there is np.gradient and np.diff, but a reason might be, that the second argument of np.gradient defines the pixel distance (for each dimension) and for images it can be applied for both dimensions simultaneously gy, gx = np.gradient(a).
Code
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(-5,6,1)
f = np.abs(xs)
f_x = np.gradient(f)
f_xx_bad = np.gradient(f_x)
f_xx_good = np.diff(f, 2)
test = f[:-2] - 2* f[1:-1] + f[2:]
# lets plot all this
fig, axs = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
ax = axs[0]
ax.set_title('bad: double gradient')
ax.plot(xs, f, marker='o', label='f(x)')
ax.plot(xs, f_x, marker='o', label='d f(x) / dx')
ax.plot(xs, f_xx_bad, marker='o', label='d^2 f(x) / dx^2')
ax.legend()
ax = axs[1]
ax.set_title('good: diff with n=2')
ax.plot(xs, f, marker='o', label='f(x)')
ax.plot(xs, f_x, marker='o', label='d f(x) / dx')
ax.plot(xs[1:-1], f_xx_good, marker='o', label='d^2 f(x) / dx^2')
ax.plot(xs[1:-1], test, marker='o', label='test', markersize=1)
ax.legend()
As I keep stepping over this problem in one form or the other again and again, I decided to write a function gradient_n, which adds an differentiation oder functionality to np.gradient. Not all functionalities of np.gradient are supported, like differentiation of mutiple axis.
Like np.gradient, gradient_n returns the differentiated result in the same shape as the input. Also a pixel distance argument (d) is supported.
import numpy as np
def gradient_n(arr, n, d=1, axis=0):
"""Differentiate np.ndarray n times.
Similar to np.diff, but additional support of pixel distance d
and padding of the result to the same shape as arr.
If n is even: np.diff is applied and the result is zero-padded
If n is odd:
np.diff is applied n-1 times and zero-padded.
Then gradient is applied. This ensures the right output shape.
"""
n2 = int((n // 2) * 2)
diff = arr
if n2 > 0:
a0 = max(0, axis)
a1 = max(0, arr.ndim-axis-1)
diff = np.diff(arr, n2, axis=axis) / d**n2
diff = np.pad(diff, tuple([(0,0)]*a0 + [(1,1)] +[(0,0)]*a1),
'constant', constant_values=0)
if n > n2:
assert n-n2 == 1, 'n={:f}, n2={:f}'.format(n, n2)
diff = np.gradient(diff, d, axis=axis)
return diff
def test_gradient_n():
import matplotlib.pyplot as plt
x = np.linspace(-4, 4, 17)
y = np.linspace(-2, 2, 9)
X, Y = np.meshgrid(x, y)
arr = np.abs(X)
arr_x = np.gradient(arr, .5, axis=1)
arr_x2 = gradient_n(arr, 1, .5, axis=1)
arr_xx = np.diff(arr, 2, axis=1) / .5**2
arr_xx = np.pad(arr_xx, ((0, 0), (1, 1)), 'constant', constant_values=0)
arr_xx2 = gradient_n(arr, 2, .5, axis=1)
assert np.sum(arr_x - arr_x2) == 0
assert np.sum(arr_xx - arr_xx2) == 0
fig, axs = plt.subplots(2, 2, figsize=(29, 21))
axs = np.array(axs).flatten()
ax = axs[0]
ax.set_title('x-cut')
ax.plot(x, arr[0, :], marker='o', label='arr')
ax.plot(x, arr_x[0, :], marker='o', label='arr_x')
ax.plot(x, arr_x2[0, :], marker='x', label='arr_x2', ls='--')
ax.plot(x, arr_xx[0, :], marker='o', label='arr_xx')
ax.plot(x, arr_xx2[0, :], marker='x', label='arr_xx2', ls='--')
ax.legend()
ax = axs[1]
ax.set_title('arr')
im = ax.imshow(arr, cmap='bwr')
cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
ax = axs[2]
ax.set_title('arr_x')
im = ax.imshow(arr_x, cmap='bwr')
cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
ax = axs[3]
ax.set_title('arr_xx')
im = ax.imshow(arr_xx, cmap='bwr')
cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
test_gradient_n()
This is an excerpt from the original documentation (at the time of writing found at http://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html). It states that unless the sampling distance is 1 you need to include a list containing the distances as an argument.
numpy.gradient(f, *varargs, **kwargs)
Return the gradient of an N-dimensional array.
The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array.
Parameters:
f : array_like
An N-dimensional array containing samples of a scalar function.
varargs : list of scalar, optional
N scalars specifying the sample distances for each dimension, i.e. dx, dy, dz, ... Default distance: 1.
edge_order : {1, 2}, optional
Gradient is calculated using Nth order accurate differences at the boundaries. Default: 1.
New in version 1.9.1.
Returns:
gradient : ndarray
N arrays of the same shape as f giving the derivative of f with respect to each dimension.
My solution is to create a function similar to np.gradient that calculates the 2nd derivatives numerically from the array data.
import numpy as np
def gradient2_even(y, h=None, edge_order=1):
"""
Return the 2nd-order gradient i.e.
2nd derivatives of y with n samples and k components.
The 2nd-order gradient is computed using second-order-accurate central differences
in the interior points and either first or second order accurate one-sided
(forward or backwards) differences at the boundaries.
The returned gradient hence has the same shape as the input array.
Parameters
----------
y : 1d or 2d array_like
The array containing the samples. If 2d with shape (n,k),
n is the number of samples at least 2 while k is the number of
y series/components. 1d input is equivalent to 2d input with shape (n,1).
h : constant or 1d, optional
spacing between the y samples. Default unitary spacing for
all y components. Spacing can be specified using:
1. Single scalar spacing value for all y components.
2. 1d array_like of length k specifying the spacing for each y component
edge_order : {1, 2}, optional
Order 1 means 3-point forward/backward finite differences
are used to calculate the 2nd derivatves at the edge points while
order 2 uses 4-point forward/backward finite differences.
Returns
----------
d2y : 1d or 2d array
Array containing the 2nd derivatives. The output shape is the same as y.
"""
if edge_order!=1 and edge_order!=2:
raise ValueError('edge_order must be 1 or 2.')
else:
pass
y = np.asfarray(y)
origshape = y.shape
if y.ndim!=1 and y.ndim!=2:
raise ValueError('y can only be 1d or 2d.')
elif y.ndim==1:
y = np.atleast_2d(y).T
elif y.ndim==2:
if y.shape[0]<2:
raise ValueError('The number of y samples must be atleast 2.')
else:
pass
else:
pass
n,k = y.shape
if h is None:
h = 1.0
else:
h = np.asfarray(h)
if h.ndim!=0 and h.ndim!=1:
raise ValueError('h can only be 0d or 1d.')
elif h.ndim==0:
pass
elif h.ndim==1 and h.size!=n:
raise ValueError('If h is 1d, it must have the same number as the components of y.')
else:
pass
d2y = np.zeros_like(y)
if n==2:
pass
elif n==3:
d2y[:] = ( 1/h**2 * (y[0] - 2*y[1] + y[2]) )
else:
d2y = np.zeros_like(y)
d2y[1:-1]=1/h**2 * ( y[:-2] - 2*y[1:-1] + y[2:] )
if edge_order==1:
d2y[0]=1/h**2 * ( y[0] - 2*y[1] + y[2] )
d2y[-1]=1/h**2 * ( y[-1] - 2*y[-2] + y[-3] )
else:
d2y[0]=1/h**2 * ( 2*y[0] - 5*y[1] + 4*y[2] - y[3] )
d2y[-1]=1/h**2 * ( 2*y[-1] - 5*y[-2] + 4*y[-3] - y[-4] )
return d2y.reshape(origshape)
Using your example,
# After importing the function from the script file or running it
from numpy import *
from matplotlib.pyplot import *
x, h = linspace(0, 10, 17) # use a fairly coarse grid to see the discrepancies better
y = sin(x)
ypp = -sin(x) # analytical 2nd derivatives
# Compute numerically the 2nd derivatives using 2nd-order finite differences at the edge points
d2y = gradient2_even(y, h, 2)
# Compute numerically the 2nd derivatives using nested gradient function
d2y2 = gradient(gradient(y, h, edge_order=2), h, edge_order=2)
# Compute numerically the 2nd derivatives using 1st-order finite differences at the edge points
d2y3 = gradient2_even(y, h, 1)
fig,ax=subplots(1,1)
ax.plot(x, ypp, x, d2y, 'o', x, d2y2, 'o', x, d2y3, 'o'), ax.grid()
ax.legend(['Analytical', 'edge_order=2', 'nested gradient', 'edge_order=1'])
fig.tight_layout()

Python - calculating trendlines with errors

So I've got some data stored as two lists, and plotted them using
plot(datasetx, datasety)
Then I set a trendline
trend = polyfit(datasetx, datasety)
trendx = []
trendy = []
for a in range(datasetx[0], (datasetx[-1]+1)):
trendx.append(a)
trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])
plot(trendx, trendy)
But I have a third list of data, which is the error in the original datasety. I'm fine with plotting the errorbars, but what I don't know is using this, how to find the error in the coefficients of the polynomial trendline.
So say my trendline came out to be 5x^2 + 3x + 4 = y, there needs to be some sort of error on the 5, 3 and 4 values.
Is there a tool using NumPy that will calculate this for me?
I think you can use the function curve_fit of scipy.optimize (documentation). A basic example of the usage:
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b, c):
return a*x**2 + b*x + c
x = np.linspace(0,4,50)
y = func(x, 5, 3, 4)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
Following the documentation, pcov gives:
The estimated covariance of popt. The diagonals provide the variance
of the parameter estimate.
So in this way you can calculate an error estimate on the coefficients. To have the standard deviation you can take the square root of the variance.
Now you have an error on the coefficients, but it is only based on the deviation between the ydata and the fit. In case you also want to account for an error on the ydata itself, the curve_fit function provides the sigma argument:
sigma : None or N-length sequence
If not None, it represents the standard-deviation of ydata. This
vector, if given, will be used as weights in the least-squares
problem.
A complete example:
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b, c):
return a*x**2 + b*x + c
x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# generate noisy ydata
yn = y + 0.2 * y * np.random.normal(size=len(x))
# generate error on ydata
y_sigma = 0.2 * y * np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)
# plot
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))
ax.grid()
plt.show()
Then something else, about using numpy arrays. One of the main advantages of using numpy is that you can avoid for loops because operations on arrays apply elementwise. So the for-loop in your example can also be done as following:
trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]
Where I use arange instead of range as it returns a numpy array instead of a list.
In this case you can also use the numpy function polyval:
trendy = polyval(trend, trendx)
I have not been able to find any way of getting the errors in the coefficients that is built in to numpy or python. I have a simple tool that I wrote based on Section 8.5 and 8.6 of John Taylor's An Introduction to Error Analysis. Maybe this will be sufficient for your task (note the default return is the variance, not the standard deviation). You can get large errors (as in the provided example) because of significant covariance.
def leastSquares(xMat, yMat):
'''
Purpose
-------
Perform least squares using the procedure outlined in 8.5 and 8.6 of Taylor, solving
matrix equation X a = Y
Examples
--------
>>> from scipy import matrix
>>> xMat = matrix([[ 1, 5, 25],
[ 1, 7, 49],
[ 1, 9, 81],
[ 1, 11, 121]])
>>> # matrix has rows of format [constant, x, x^2]
>>> yMat = matrix([[142],
[168],
[211],
[251]])
>>> a, varCoef, yRes = leastSquares(xMat, yMat)
>>> # a is a column matrix, holding the three coefficients a, b, c, corresponding to
>>> # the equation a + b*x + c*x^2
Returns
-------
a: matrix
best fit coefficients
varCoef: matrix
variance of derived coefficents
yRes: matrix
y-residuals of fit
'''
xMatSize = xMat.shape
numMeas = xMatSize[0]
numVars = xMatSize[1]
xxMat = xMat.T * xMat
xyMat = xMat.T * yMat
xxMatI = xxMat.I
aMat = xxMatI * xyMat
yAvgMat = xMat * aMat
yRes = yMat - yAvgMat
var = (yRes.T * yRes) / (numMeas - numVars)
varCoef = xxMatI.diagonal() * var[0, 0]
return aMat, varCoef, yRes

Categories