retrieve the standard deviation of the y-intercept - python

I am using polyfit to fit my data to a line. The equation of the line is of the form y = mx + b. I am trying to retrieve the error on the slope and the error on the y-intercept. Here is my code:
fit, res, _, _, _ = np.polyfit(X,Y,1, full = True)
This method returns the residuals. But I don't want the residuals. So here's another method I used:
slope, intercept, r_value, p_value, std_err = stats.linregress(X,Y)
I am aware that std_err returns the error on the slope. I still need to get the standard deviation of the y-intercept. How can I do this?

If you can use a least squares fit, you can calculate the slope, y-intercept, correlation coefficient, standard deviation of the slope, and standard deviation of the y-intercept with the following function:
import numpy as np

def lsqfity(X, Y):
    """
    Calculate a "MODEL-1" least squares fit.

    The line is fit by MINIMIZING the residuals in Y only.

    The equation of the line is: Y = my * X + by.

    Equations are from Bevington & Robinson (1992),
    Data Reduction and Error Analysis for the Physical Sciences, 2nd Ed.,
    pp. 104, 108-109, 199.

    Data are input and output as follows:

    my, by, ry, smy, sby = lsqfity(X, Y)
    X   = x data (vector)
    Y   = y data (vector)
    my  = slope
    by  = y-intercept
    ry  = correlation coefficient
    smy = standard deviation of the slope
    sby = standard deviation of the y-intercept
    """

    X, Y = map(np.asanyarray, (X, Y))

    # Determine the size of the vector.
    n = len(X)

    # Calculate the sums.
    Sx = np.sum(X)
    Sy = np.sum(Y)
    Sx2 = np.sum(X ** 2)
    Sxy = np.sum(X * Y)
    Sy2 = np.sum(Y ** 2)

    # Calculate re-used expressions.
    num = n * Sxy - Sx * Sy
    den = n * Sx2 - Sx ** 2

    # Calculate my, by, ry, s2, smy and sby.
    my = num / den
    by = (Sx2 * Sy - Sx * Sxy) / den
    ry = num / (np.sqrt(den) * np.sqrt(n * Sy2 - Sy ** 2))

    diff = Y - by - my * X

    s2 = np.sum(diff * diff) / (n - 2)
    smy = np.sqrt(n * s2 / den)
    sby = np.sqrt(Sx2 * s2 / den)

    return my, by, ry, smy, sby

print lsqfity([0,2,4,6,8],[0,3,6,9,12])
Output:
(1, 0, 1.0, 0.0, 2.4494897427831779)
The function was written by Filipe P. A. Fernandes, and was originally posted here.
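As an aside (not part of the original answer): np.polyfit can also return the covariance matrix of the fitted coefficients when called with cov=True, and the square roots of its diagonal give the standard deviations of the slope and the y-intercept. A minimal sketch with made-up example data:

import numpy as np

# Example data; replace with your own X and Y arrays.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 5.2])

# For a degree-1 fit, coeffs is [slope, intercept]; cov is the 2x2
# covariance matrix of those coefficient estimates.
coeffs, cov = np.polyfit(X, Y, 1, cov=True)
slope, intercept = coeffs
slope_std, intercept_std = np.sqrt(np.diag(cov))
print(slope, intercept, slope_std, intercept_std)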

Related

Integrating and fitting coupled ODE's for SIR modelling

In this case, there are 3 ODEs that describe an SIR model. The issue is that I want to find the beta and gamma values that best fit the data points given by the x_axis and y_axis values. The method I'm currently using is to integrate the ODEs with odeint from the scipy library and then use curve_fit, also from scipy. In this case, how would you calculate the values of beta and gamma that fit the data points?
P.S. the current error is this: ValueError: operands could not be broadcast together with shapes (3,) (14,)
#initial values
S_I_R = (0.762/763, 1/763, 0)
x_axis = [m for m in range(1,15)]
y_axis = [3,8,28,75,221,291,255,235,190,125,70,28,12,5]

# ODE's that describe the system
def equation(SIR_Values, t, beta, gamma):
    Array = np.zeros((3))
    SIR = SIR_Values
    Array[0] = -beta * SIR[0] * SIR[1]
    Array[1] = beta * SIR[0] * SIR[1] - gamma * SIR[1]
    Array[2] = gamma * SIR[1]
    return Array

# Results = spi.odeint(equation,S_I_R,time)

#fitting the values
beta_values, gamma_values = curve_fit(equation, x_axis, y_axis)
import numpy as np
import scipy.integrate as spi
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Starting values
S0 = 762/763
I0 = 1/763
R0 = 0

x_axis = np.array([m for m in range(0,15)])
y_axis = np.array([1,3,8,28,75,221,291,255,235,190,125,70,28,12,5])
y_axis = np.divide(y_axis, 763)

def sir_model(y, x, beta, gamma):
    S = -beta * y[0] * y[1]
    R = gamma * y[1]
    I = beta * y[0] * y[1] - gamma * y[1]
    return S, I, R

def fit_odeint(x, beta, gamma):
    return spi.odeint(sir_model, (S0, I0, R0), x, args=(beta, gamma))[:,1]

popt, pcov = curve_fit(fit_odeint, x_axis, y_axis)
beta, gamma = popt

fitted = fit_odeint(x_axis, *popt)

plt.plot(x_axis, y_axis, 'o', label="infected per day")
plt.plot(x_axis, fitted, label="fitted graph")
plt.xlabel("Time (in days)")
plt.ylabel("Fraction of infected")
plt.title("Fitted beta and gamma values")
plt.legend()
plt.show()
As in this example from the scipy documentation, the function passed to curve_fit must return an array with the same size as x_axis and y_axis; here fit_odeint returns only the infected column of the odeint solution.
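As a small addition (not in the original answer), the 1-sigma uncertainties of the fitted beta and gamma can be read from the diagonal of the pcov matrix returned by the curve_fit call above:

# Assumes popt and pcov from the curve_fit call above.
perr = np.sqrt(np.diag(pcov))   # 1-sigma uncertainties of [beta, gamma]
print("beta  = {:.4f} +/- {:.4f}".format(popt[0], perr[0]))
print("gamma = {:.4f} +/- {:.4f}".format(popt[1], perr[1]))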

Gaussian fit not taking into account negative part of the peak

As you can see in the picture below, I am doing a Gaussian fit on a spectrum that dips into the negative part of the y-axis:
This is how I am doing the fit:
def Gauss(velo_peak, a, mu0, sigma):
    res = a * np.exp(-(velo_peak - mu0)**2 / (2 * sigma**2))
    return res

mu0 = sum(velo_peak * spec_peak) / sum(spec_peak)
sigma = np.sqrt(sum(spec_peak * (velo_peak - mu0)**2) / sum(spec_peak))
peak = max(spec_peak)
p0 = [peak, mu0, sigma]

popt, pcov = curve_fit(Gauss, velo_peak, spec_peak, p0, maxfev=100000)
My main goal is to find the value of the peak of the spectrum, but the fit is clearly an over-estimate of the peak value. Is there some condition I can apply to the Gaussian fit function?
Since you can define whatever function you want, try adding an offset to your Gauss function:
def Gauss(velo_peak, a, mu0, sigma, offs):
    res = a * np.exp(-(velo_peak - mu0)**2 / (2 * sigma**2)) + offs
    return res
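A minimal sketch of how the fit call might change with the extra offset parameter, assuming the same velo_peak and spec_peak arrays and the initial guesses from the question (the 0.0 starting value for the offset is an assumption):

# Initial guesses: amplitude, centre, width, offset (the offset guess is an assumption).
p0 = [peak, mu0, sigma, 0.0]
popt, pcov = curve_fit(Gauss, velo_peak, spec_peak, p0=p0, maxfev=100000)
a_fit, mu_fit, sigma_fit, offs_fit = popt

# The fitted curve peaks at mu_fit with height a_fit + offs_fit;
# a_fit alone is the peak height above the baseline offset.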

Couple Differential Equations using Python

I am trying to solve a system of geodesic orbital equations using Python. They are coupled ordinary differential equations. I've tried different approaches, but they all yielded the wrong shape (the plot of r against phi should be some periodic function). Any idea how to do this?
Here are my constants
G = 4.30091252525 * (pow(10, -3)) #Gravitational constant in (parsec*km^2)/(Ms*sec^2)
c = 0.0020053761 #speed of light , AU/sec
M = 170000 #mass of the central body, in solar masses
m = 10 #mass of the orbiting body, in solar masses
rs = 2 * G * M / pow(c, 2) #Schwarzschild radius
Lz = 0.000024 #Angular momentum
h = Lz / m #Just the constant in equation
E= 1.715488e-007 #energy
And initial conditions are:
Y(0) = rs
Phi(0) = math.pi
Orbital equations
The way I tried to do it:
def rhs(t, u):
    Y, phi = u
    dY = np.sqrt((E**2 / (m**2 * c**2) - (1 - rs / Y) * (c**2 + h**2 / Y**2)))
    dphi = L / Y**2
    return [dY, dphi]

Y0 = np.array([rs, math.pi])
sol = solve_ivp(rhs, [1, 1000], Y0, method='Radau', dense_output=True)
It seems like you are looking at the spatial coordinates in an invariant plane of the geodesic equations of an object moving in Schwarzschild gravity.
One can use many different methods that preserve as much of the underlying geometric structure of the model as possible, like symplectic geometric integrators or perturbation theory. As Lutz Lehmann pointed out in the comments, the default method of solve_ivp is the Dormand-Prince (4)5 stepper, which works in extrapolation mode: it advances with the order-5 step, while the step-size selection is driven by the error estimate of the order-4 step.
Warning: your initial condition for Y equals the Schwarzschild radius, so these equations may fail or require special treatment (especially the time component of the equations, which you have not included here!). It may be that you have to switch to different coordinates that remove the singularity at the event horizon. Moreover, the solutions may not be periodic curves but quasi-periodic, so they may not close up nicely.
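If you want to experiment with the stepper, here is a minimal sketch (assuming the rhs function and initial state Y0 from the question) of how to request a higher-order method and tighter tolerances from solve_ivp:

# Assumption: rhs and Y0 are defined as in the question above.
sol = solve_ivp(rhs, [1, 1000], Y0, method='DOP853',
                rtol=1e-9, atol=1e-12, dense_output=True)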
For a quick and dirty, but possibly fairly accurate, treatment, I would differentiate the first equation
(dr / dtau)^2 = (E2_mc2 - c2) + (2*GM)/r - (h^2)/(r^2) + (r_schw*h^2)/(r^3)
with respect to the proper time tau, cancel the common factor dr / dtau from both sides, and end up with an equation whose left-hand side is the second derivative of the radius r. Then turn this second-order equation into a pair of first-order equations for r and its rate of change v, i.e.
dphi / dtau = h / (r^2)
dr / dtau = v
dv / dtau = - GM / (r^2) + h^2 / (r^3) - 3*r_schw*(h^2) / (2*r^4)
and calculate, from the original equation relating r and its first derivative dr / dtau, an initial value for the rate of change v = dr / dtau; i.e. I would solve the following equation for v0 with r = r0:
(v0)^2 = (E2_mc2 - c2) + (2*GM)/r0 - (h^2)/(r0^2) + (r_schw*h^2)/(r0^3)
Maybe some Python code like this will work:
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
#from ode_helpers import state_plotter

# u = [phi, Y, V, t] or, if time is excluded,
# u = [phi, Y, V]
def f(tau, u, param):
    E2_mc2, c2, GM, h, r_schw = param
    Y = u[1]
    f_phi = h / (Y**2)
    f_Y = u[2]  # this is the dr / dtau auxiliary equation
    f_V = - GM / (Y**2) + h**2 / (Y**3) - 3*r_schw*(h**2) / (2*Y**4)
    #f_time = (E2_mc2 * Y) / (Y - r_schw)  # this is the equation of the time coordinate
    return [f_phi, f_Y, f_V]  # or [f_phi, f_Y, f_V, f_time]

# from the initial value for r = Y0 and the given energy E,
# calculate the initial rate of change dr / dtau = V0
def ivp(Y0, param, sign):
    E2_mc2, c2, GM, h, r_schw = param
    V0 = math.sqrt((E2_mc2 - c2) + (2*GM)/Y0 - (h**2)/(Y0**2) + (r_schw*h**2)/(Y0**3))
    return sign*V0

G = 4.30091252525 * (pow(10, -3))  # gravitational constant in (parsec*km^2)/(Ms*sec^2)
c = 0.0020053761  # speed of light, AU/sec
M = 170000  # mass of the central body, in solar masses
m = 10  # mass of the orbiting body, in solar masses
Lz = 0.000024  # angular momentum
h = Lz / m  # just the constant in the equation
E = 1.715488e-007  # energy

c2 = c**2
E2_mc2 = (E**2) / (c2*m**2)
GM = G*M
r_schw = 2*GM / c2

param = [E2_mc2, c2, GM, h, r_schw]
Y0 = r_schw
sign = 1  # or -1
V0 = ivp(Y0, param, sign)

tau_span = np.linspace(1, 1000, num=1000)
u0 = [math.pi, Y0, V0]

sol = solve_ivp(lambda tau, u: f(tau, u, param), [1, 1000], u0, t_eval=tau_span)
Double-check the equations; mistakes and inaccuracies are possible.
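As an added check (not part of the original answer), assuming sol from the code above, the orbit in the invariant plane can be plotted by converting (Y, phi) to Cartesian coordinates:

# sol.y rows follow u = [phi, Y, V]; sol.t is the proper-time grid.
phi, r = sol.y[0], sol.y[1]

# Plot the orbit in the invariant plane (polar to Cartesian).
plt.plot(r * np.cos(phi), r * np.sin(phi))
plt.gca().set_aspect("equal")
plt.xlabel("x")
plt.ylabel("y")
plt.show()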

Stop Integrating when Output Reaches 0 in scipy.integrate.odeint

I've written a code which looks at projectile motion of an object with drag. I'm using odeint from scipy to do the forward Euler method. The integration runs until a time limit is reached. I would like to stop integrating either when this limit is reached or when the output for ry = 0 (i.e. the projectile has landed).
def f(y, t):
    # takes vector y(t) and time t and returns function y_dot = F(t,y) as a vector
    # y(t) = [vx(t), vy(t), rx(t), ry(t)]

    # magnitude of velocity calculated
    v = np.sqrt(y[0] ** 2 + y[1] ** 2)

    # define new drag constant k
    k = cd * rho * v * A / (2 * m)

    return [-k * y[0], -k * y[1] - g, y[0], y[1]]

def solve_f(v0, ang, h0, tstep, tmax=1000):
    # uses the forward Euler time integration method to solve the ODE using f function
    # create vector for time steps based on args
    t = np.linspace(0, tmax, tmax / tstep)

    # convert angle from degrees to radians
    ang = ang * np.pi / 180

    return odeint(f, [v0 * np.cos(ang), v0 * np.sin(ang), 0, h0], t)

solution = solve_f(v0, ang, h0, tstep)
I've tried several loops and similar approaches to stop integrating when ry = 0, and found the question below, but have not been able to implement something similar with odeint. solution[:,3] is the output column for ry. Is there a simple way to do this with odeint?
Does scipy.integrate.ode.set_solout work?
Check out scipy.integrate.ode here. It is more flexible than odeint and helps with what you want to do.
A simple example using a vertical shot, integrated until it touches ground:
from scipy.integrate import ode, odeint
import scipy.constants as SPC

def f(t, y):
    return [y[1], -SPC.g]

v0 = 10
y0 = 0

r = ode(f)
r.set_initial_value([y0, v0], 0)

dt = 0.1
while r.successful() and r.y[0] >= 0:
    print('time: {:.3f}, y: {:.3f}, vy: {:.3f}'.format(r.t + dt, *r.integrate(r.t + dt)))
Each time you call r.integrate, r will store the current time and y value. You can append them to a list if you want to keep them.
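As a side note (not part of the original answer), newer SciPy versions provide solve_ivp, whose terminal events can stop the integration exactly when the height reaches zero; a minimal sketch for the same vertical-shot example:

import numpy as np
from scipy.integrate import solve_ivp
import scipy.constants as SPC

def f(t, y):
    # y = [height, vertical velocity]
    return [y[1], -SPC.g]

def hit_ground(t, y):
    return y[0]              # a zero-crossing of the height triggers the event

hit_ground.terminal = True   # stop the integration at the event
hit_ground.direction = -1    # only when the height is decreasing

sol = solve_ivp(f, [0, 10], [0, 10], events=hit_ground, max_step=0.1)
print("landing time:", sol.t_events[0])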
Let's solve this as the boundary value problem that it is. We have the conditions rx(0)=0, ry(0)=h0, vx(0)=v0*cos(ang), vy(0)=v0*sin(ang) and ry(T)=0. To get a fixed integration interval, set t = T*s, which means that dx/ds = T*dx/dt = T*vx, etc.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_bvp

# cd, rho, A, m and g are assumed to be defined as in the question.

def fall_ode(t, u, p):
    vx, vy, rx, ry = u
    T = p[0]
    # magnitude of velocity calculated
    v = np.hypot(vx, vy)
    # define new drag constant k
    k = cd * rho * v * A / (2 * m)
    return np.array([-k * vx, -k * vy - g, vx, vy]) * T

def solve_fall(v0, ang, h0):
    # convert angle from degrees to radians
    ang = ang * np.pi / 180
    vx0, vy0 = v0 * np.cos(ang), v0 * np.sin(ang)

    def fall_bc(y0, y1, p):
        return [y0[0] - vx0, y0[1] - vy0, y0[2], y0[3] - h0, y1[3]]

    t_init = [0, 1]
    u_init = [[0, 0], [0, 0], [0, 0], [h0, 0]]
    p_init = [1]

    res = solve_bvp(fall_ode, fall_bc, t_init, u_init, p_init)
    print(res.message)
    if res.success:
        print("time to ground: ", res.p[0])
    # res.sol is a dense output; evaluating it interpolates the computed values
    return res.sol

sol = solve_fall(300, 30, 20)

s = np.linspace(0, 1, 501)
u = sol(s)
vx, vy, rx, ry = u

plt.plot(rx, ry)

Multi-Variable Gradient Descent using Numpy - Error in no. of coefficients

For the past few days, I have been trying to code this application of gradient descent for my final-year project in Mechanical Engineering: https://drive.google.com/open?id=1tIGqZ2Lb0sN4GEpgYEZLFvtmhigXnot0 (the HTML file is attached above; download it to see the results). There are only 3 values in theta, whereas x has 3 independent variables, so theta should have 4 values.
The code is as follows; the resulting theta is [-0.03312393 0.94409351 0.99853041].
import numpy as np
import random
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=",", header=None)

x = df.loc[:, '0':'2']
y = df[3]
print(x)

m, n = np.shape(x)
numIterations = 200
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
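If the missing fourth value in theta is meant to be the intercept, one common fix (a hedged sketch using the question's variable names, not a posted answer) is to prepend a column of ones to x before calling gradientDescent, so the first entry of theta becomes the intercept:

import numpy as np

# Hypothetical continuation: x and y loaded as in the question.
X = np.asarray(x, dtype=float)
X = np.column_stack([np.ones(len(X)), X])   # bias column -> 4 columns total

theta = np.ones(X.shape[1])                 # 4 parameters: intercept + 3 slopes
theta = gradientDescent(X, np.asarray(y, dtype=float), theta, alpha, len(X), numIterations)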
