I've written code that models the projectile motion of an object with drag. I'm using odeint from scipy to do the forward time integration. The integration runs until a time limit is reached. I would like to stop integrating either when this limit is reached or when the output for ry = 0 (i.e. the projectile has landed).
def f(y, t):
    # takes vector y(t) and time t and returns function y_dot = F(t, y) as a vector
    # y(t) = [vx(t), vy(t), rx(t), ry(t)]
    # magnitude of velocity calculated
    v = np.sqrt(y[0] ** 2 + y[1] ** 2)
    # define new drag constant k
    k = cd * rho * v * A / (2 * m)
    return [-k * y[0], -k * y[1] - g, y[0], y[1]]
def solve_f(v0, ang, h0, tstep, tmax=1000):
    # uses forward time integration (odeint) to solve the ODE defined by f
    # create vector of time steps based on args (np.linspace needs an integer count)
    t = np.linspace(0, tmax, int(tmax / tstep))
    # convert angle from degrees to radians
    ang = ang * np.pi / 180
    return odeint(f, [v0 * np.cos(ang), v0 * np.sin(ang), 0, h0], t)
solution = solve_f(v0, ang, h0, tstep)
I've tried several loops and similar approaches to stop integrating when ry = 0, and I found the question below, but I have not been able to implement something similar with odeint. solution[:,3] is the output column for ry. Is there a simple way to do this with odeint?
Does scipy.integrate.ode.set_solout work?
Check out scipy.integrate.ode here. It is more flexible than odeint and helps with what you want to do.
A simple example using a vertical shot, integrated until it touches ground:
from scipy.integrate import ode, odeint
import scipy.constants as SPC

def f(t, y):
    return [y[1], -SPC.g]

v0 = 10
y0 = 0

r = ode(f)
r.set_initial_value([y0, v0], 0)

dt = 0.1
while r.successful() and r.y[0] >= 0:
    print('time: {:.3f}, y: {:.3f}, vy: {:.3f}'.format(r.t + dt, *r.integrate(r.t + dt)))
Each time you call r.integrate, r will store the current time and y value. You can append them to a list if you want to store them.
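Alternatively, on newer SciPy versions scipy.integrate.solve_ivp can stop the integration automatically via a terminal event. A minimal sketch for the original drag problem (assuming the constants cd, rho, A, m, g and the inputs v0, ang, h0, tstep, tmax from the question are already defined):

import numpy as np
from scipy.integrate import solve_ivp

def f(t, y):
    # y = [vx, vy, rx, ry]
    v = np.hypot(y[0], y[1])
    k = cd * rho * v * A / (2 * m)
    return [-k * y[0], -k * y[1] - g, y[0], y[1]]

def hit_ground(t, y):
    return y[3]                  # zero exactly when ry = 0

hit_ground.terminal = True       # stop integrating at the event
hit_ground.direction = -1        # only trigger while ry is decreasing

ang_rad = ang * np.pi / 180
sol = solve_ivp(f, (0, tmax),
                [v0 * np.cos(ang_rad), v0 * np.sin(ang_rad), 0, h0],
                events=hit_ground, max_step=tstep)
# sol.t_events[0] holds the landing time (if reached); sol.y holds the trajectory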
Let's solve this as the boundary value problem that it is. We have the conditions rx(0)=0, ry(0)=h0, vx(0)=v0*cos(ang), vy(0)=v0*sin(ang) and ry(T)=0, where the flight time T is unknown. To get a fixed integration interval, substitute t=T*s with s in [0, 1], which means that dx/ds=T*dx/dt=T*vx etc.
from scipy.integrate import solve_bvp

def fall_ode(t, u, p):
    vx, vy, rx, ry = u
    T = p[0]
    # magnitude of velocity calculated
    v = np.hypot(vx, vy)
    # define new drag constant k
    k = cd * rho * v * A / (2 * m)
    return np.array([-k * vx, -k * vy - g, vx, vy]) * T
def solve_fall(v0, ang, h0):
    # convert angle from degrees to radians
    ang = ang * np.pi / 180
    vx0, vy0 = v0 * np.cos(ang), v0 * np.sin(ang)

    def fall_bc(y0, y1, p):
        return [y0[0] - vx0, y0[1] - vy0, y0[2], y0[3] - h0, y1[3]]

    t_init = [0, 1]
    u_init = [[0, 0], [0, 0], [0, 0], [h0, 0]]
    p_init = [1]

    res = solve_bvp(fall_ode, fall_bc, t_init, u_init, p_init)
    print(res.message)
    if res.success:
        print("time to ground: ", res.p[0])
    # res.sol is a dense output; evaluating it interpolates the computed values
    return res.sol
sol = solve_fall(300, 30, 20)
s = np.linspace(0,1,501)
u = sol(s)
vx, vy, rx, ry = u
plt.plot(rx, ry)
For example, the code below simulates a geometric Brownian motion (GBM) process, which satisfies the stochastic differential equation dS_t = mu * S_t * dt + sigma * S_t * dB_t.
The code is a condensed version of the code in this Wikipedia article.
import numpy as np

np.random.seed(1)

def gbm(mu=1, sigma=0.6, x0=100, n=50, dt=0.1):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()
series = gbm()
How to fit the GBM process in Python? That is, how to estimate mu and sigma and solve the stochastic differential equation given the timeseries series?
Parameter estimation for SDEs is a research level area, and thus rather non-trivial. Whole books exist on the topic. Feel free to look into those for more details.
But here's a trivial approach for this case. Firstly, note that the log of GBM is an affinely transformed Wiener process (i.e. a linear Ito drift-diffusion process). So
d ln(S_t) = (mu - sigma^2 / 2) dt + sigma dB_t
Thus we can estimate the log-process parameters and translate them to fit the original process. Check out [1], [2], [3], [4], for example.
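Concretely, over a step dt the increments of the log-price are i.i.d. normal with mean (mu - sigma^2/2)*dt and variance sigma^2*dt, so the quickest sanity check (a minimal sketch, separate from the fuller script below) is just the sample mean and standard deviation of the log returns:

import numpy as np

def quick_gbm_fit(series, dt):
    # log returns of the observed path
    log_returns = np.diff(np.log(series))
    # Var[log return] = sigma^2 * dt,  E[log return] = (mu - sigma^2/2) * dt
    sigma_hat = log_returns.std(ddof=1) / np.sqrt(dt)
    mu_hat = log_returns.mean() / dt + 0.5 * sigma_hat**2
    return mu_hat, sigma_hat

# e.g. mu_hat, sigma_hat = quick_gbm_fit(series, dt=0.1)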
Here's a script that does this in two simple ways for the drift (just wanted to see the difference), and just one for the diffusion (sorry). The drift of the log-process is estimated by (X_T - X_0) / T and via the incremental MLE (see code). The diffusion parameter is estimated (in a biased way) with its definition as the infinitesimal variance.
import numpy as np
np.random.seed(9713)
# Parameters
mu = 1.5
sigma = 0.9
x0 = 1.0
n = 1000
dt = 0.05
# Times
T = dt*n
ts = np.linspace(dt, T, n)
# Geometric Brownian motion generator
def gbm(mu, sigma, x0, n, dt):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()

# Estimate mu just from the series end-points
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def simple_estimate_mu(series):
    return (series[-1] - x0) / T

# Use all the increments combined (maximum likelihood estimator)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def incremental_estimate_mu(series):
    total = (1.0 / dt) * (ts**2).sum()
    return (1.0 / total) * (1.0 / dt) * (ts * series).sum()

# This just estimates the sigma by its definition as the infinitesimal variance (simple Monte Carlo)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
# One can do better than this of course (MLE?)
def estimate_sigma(series):
    return np.sqrt((np.diff(series)**2).sum() / (n * dt))

# Estimator helper
all_estimates0 = lambda s: (simple_estimate_mu(s), incremental_estimate_mu(s), estimate_sigma(s))

# Since log-GBM is a linear Ito drift-diffusion process (scaled Wiener process with drift), we
# take the log of the realizations, compute mu and sigma, and then translate the mu and sigma
# to that of the GBM (instead of the log-GBM). (For sigma, nothing is required in this simple case).
def gbm_drift(log_mu, log_sigma):
    return log_mu + 0.5 * log_sigma**2

# Translates all the estimates from the log-series
def all_estimates(es):
    lmu1, lmu2, sigma = all_estimates0(es)
    return gbm_drift(lmu1, sigma), gbm_drift(lmu2, sigma), sigma
print('Real Mu:', mu)
print('Real Sigma:', sigma)
### Using one series ###
series = gbm(mu, sigma, x0, n, dt)
log_series = np.log(series)
print('Using 1 series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % all_estimates(log_series) )
### Using K series ###
K = 10000
s = [ np.log(gbm(mu, sigma, x0, n, dt)) for i in range(K) ]
e = np.array( [ all_estimates(si) for si in s ] )
avgs = np.mean(e, axis=0)
print('Using %d series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % (K, avgs[0], avgs[1], avgs[2]) )
The output:
Real Mu: 1.5
Real Sigma: 0.9
Using 1 series: mu1 = 1.56, mu2 = 1.54, sigma = 0.96
Using 10000 series: mu1 = 1.51, mu2 = 1.53, sigma = 0.93
I have implemented the following explicit Euler method in Python:
def explicit_euler(df, x0, h, N):
    """Solves an ODE IVP using the Explicit Euler method.

    Keyword arguments:
    df - The derivative of the system you wish to solve.
    x0 - The initial value of the system you wish to solve.
    h  - The step size.
    N  - The number of steps.
    """
    x = np.zeros(N)
    x[0] = x0
    for i in range(0, N-1):
        x[i+1] = x[i] + h * df(x[i])
    return x
Following the article on Wikipedia I can plot the function and verify that I get the same plot (figure omitted), so I believe the method as written is working correctly.
Next I tried to use it to solve the last system given on that page, and instead of the plot shown there I obtain a very different result (figure omitted).
I am not sure why my plot doesn't match the one shown on the webpage. The explicit Euler method seems to work fine when I use it to solve systems where the slope doesn't change, but for an oscillating function it never seems to mimic it at all, and it doesn't even show the expected error growth indicated on the linked webpage. I am not sure what is wrong with the method I have implemented.
Here is the code used for plotting and the derivative:
def g(t):
    return -0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5) * np.cos(5 * t)
h = 0.001
x0 = 0
tn = 4
N = int(tn / h)
x = ee.explicit_euler(f, x0, h, N)
t = np.arange(0, tn, h)
fig = plt.figure()
plt.plot(t, x, label="Explicit Euler")
plt.plot(t, (np.exp(0.5 * t) * np.sin(5 * t)), label="Analytical solution")
#plt.plot(t, np.exp(0.5 * t), label="Analytical solution")
plt.xlabel('Timesteps t')
plt.ylabel('x(t)=e^(0.5*t) * sin(5*t)')
plt.legend()
plt.grid()
plt.show()
Edit:
As requested here is the current equation I am applying the method to:
y'-y=-0.5*e^(t/2)*sin(5t)+5e^(t/2)*cos(5t)
Where y(0)=0.
I would like to make clear, however, that this behaviour doesn't occur just for this equation but for all equations where the slope changes sign or shows oscillating behaviour.
Edit 2:
Ok thanks. Yes the code below does indeed work. But I have one further question. In the simple example I had for the exponential function, I had defined a method:
def f(x):
    return x
for the system f'(x)=x. This gave the output of my first graph which looks correct. I then defined another function:
def k(x):
    return cos(x)
for the system f'(x)=cos(x), this does not give expected output. But when I change the function definition to
def k(t, x):
    return cos(t)
I get the expected output. If I change my function
def f(t, x):
    return t
I get an incorrect output. Am I always actually evaluating the function at a time step and is it just by chance for the system x'=x that at each time step the value is just the value of x?
I had understood that the Euler method used the value of the previously calculated value in order to get the next value. But if I run code for my function k(x)=cos(x), I get output pictured below, which must be incorrect. This now uses the updated code you provided.
def k(t, x):
    return np.cos(x)

h = 0.1          # Step size
x0 = (0, 0)      # Initial point of iteration
tn = 10          # Time step to iterate to
N = int(tn / h)  # Number of steps

x = ee.explicit_euler(k, x0, h, N)
t = np.arange(0, tn, h)
The problem is that you have set up the function g incorrectly. You want to solve the equation:
y' - y = -0.5*e^(t/2)*sin(5t) + 5*e^(t/2)*cos(5t)
from which we observe that:
y' = y - 0.5*e^(t/2)*sin(5t) + 5*e^(t/2)*cos(5t)
Then we define the function f(t, y) = y -0.5*e^(t/2)*sin(5t)+5e^(t/2)*cos(5t) as:
def f(t, y):
    return y - 0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5) * np.cos(5 * t)
The initial point of iteration is f0=(t(0), y(0)):
f0 = (0, 0)
Then, from Euler's method:
def explicit_euler(df, x0, h, N):
    """Solves an ODE IVP using the Explicit Euler method.

    Keyword arguments:
    df - The derivative of the system you wish to solve.
    x0 - The initial value of the system you wish to solve.
    h  - The step size.
    N  - The number of steps.
    """
    x = np.zeros(N)
    t, x[0] = x0
    for i in range(0, N-1):
        x[i+1] = x[i] + h * df(t, x[i])
        t += h
    return x
Complete Code:
def explicit_euler(df, x0, h, N):
    """Solves an ODE IVP using the Explicit Euler method.

    Keyword arguments:
    df - The derivative of the system you wish to solve.
    x0 - The initial value of the system you wish to solve.
    h  - The step size.
    N  - The number of steps.
    """
    x = np.zeros(N)
    t, x[0] = x0
    for i in range(0, N-1):
        x[i+1] = x[i] + h * df(t, x[i])
        t += h
    return x
def df(t, y):
    return -0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5) * np.cos(5 * t) + y

h = 0.001
f0 = (0, 0)
tn = 4
N = int(tn / h)

x = explicit_euler(df, f0, h, N)
t = np.arange(0, tn, h)
fig = plt.figure()
plt.plot(t, x, label="Explicit Euler")
plt.plot(t, (np.exp(0.5 * t) * np.sin(5 * t)), label="Analytical solution")
#plt.plot(t, np.exp(0.5 * t), label="Analytical solution")
plt.xlabel('Timesteps t')
plt.ylabel('x(t)=e^(0.5*t) * sin(5*t)')
plt.legend()
plt.grid()
plt.show()
Screenshot: (plot omitted)
Isolate y', and whatever is on the right-hand side is what you should place in the df function.
To keep the same convention, we rename the variables so that y is the dependent variable and t the independent variable.
Equation 2: in this case the equation f'(x) = cos(x) is rewritten as:
y' = cos(t)
Then:
def df(t, y):
    return np.cos(t)
In conclusion, if we have an equation of the following form:
y' = f(t, y)
Then:
def df(t, y):
    return f(t, y)
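As a quick check of this convention (my sketch, reusing the explicit_euler from the complete code above): the exact solution of y' = cos(t) with y(0) = 0 is sin(t), and the Euler result should track it:

import numpy as np
import matplotlib.pyplot as plt

def df(t, y):
    return np.cos(t)

h = 0.1
f0 = (0, 0)          # (t0, y0)
tn = 10
N = int(tn / h)

x = explicit_euler(df, f0, h, N)   # Euler approximation
t = np.arange(0, tn, h)

plt.plot(t, x, label="Explicit Euler")
plt.plot(t, np.sin(t), label="Analytical solution sin(t)")
plt.legend()
plt.show()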
I am using polyfit to fit my data to a line. The equation of the line is of the form y = mx + b. I am trying to retrieve the error on the slope and the error on the y-intercept. Here is my code:
fit, res, _, _, _ = np.polyfit(X,Y,1, full = True)
This method returns the residuals. But I don't want the residuals. So here's another method I used:
slope, intercept, r_value, p_value, std_err = stats.linregress(X,Y)
I am aware that std_err returns the error on the slope. I still need to get the standard deviation of the y-intercept. How can I do this?
If you can use a least squares fit, you can calculate the slope, y-intercept, correlation coefficient, standard deviation of the slope, and standard deviation of the y-intercept with the following function:
import numpy as np

def lsqfity(X, Y):
    """
    Calculate a "MODEL-1" least squares fit.

    The line is fit by MINIMIZING the residuals in Y only.

    The equation of the line is: Y = my * X + by.

    Equations are from Bevington & Robinson (1992),
    "Data Reduction and Error Analysis for the Physical Sciences, 2nd Ed.",
    pp: 104, 108-109, 199.

    Data are input and output as follows:

    my, by, ry, smy, sby = lsqfity(X, Y)
    X   = x data (vector)
    Y   = y data (vector)
    my  = slope
    by  = y-intercept
    ry  = correlation coefficient
    smy = standard deviation of the slope
    sby = standard deviation of the y-intercept
    """

    X, Y = map(np.asanyarray, (X, Y))

    # Determine the size of the vector.
    n = len(X)

    # Calculate the sums.
    Sx = np.sum(X)
    Sy = np.sum(Y)
    Sx2 = np.sum(X ** 2)
    Sxy = np.sum(X * Y)
    Sy2 = np.sum(Y ** 2)

    # Calculate re-used expressions.
    num = n * Sxy - Sx * Sy
    den = n * Sx2 - Sx ** 2

    # Calculate my, by, ry, s2, smy and sby.
    my = num / den
    by = (Sx2 * Sy - Sx * Sxy) / den
    ry = num / (np.sqrt(den) * np.sqrt(n * Sy2 - Sy ** 2))

    diff = Y - by - my * X

    s2 = np.sum(diff * diff) / (n - 2)
    smy = np.sqrt(n * s2 / den)
    sby = np.sqrt(Sx2 * s2 / den)

    return my, by, ry, smy, sby
print(lsqfity([0, 2, 4, 6, 8], [0, 3, 6, 9, 12]))
Output:
(1.5, 0.0, 1.0, 0.0, 0.0)
The function was written by Filipe P. A. Fernandes, and was originally posted here.
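If you would rather stay with numpy's built-ins, np.polyfit can also return the covariance matrix of the fitted coefficients; the standard errors of the slope and intercept are the square roots of its diagonal. A minimal sketch (fit_with_errors is just an illustrative name):

import numpy as np

def fit_with_errors(X, Y):
    # coefficients are ordered [slope, intercept] for a degree-1 fit
    coeffs, cov = np.polyfit(X, Y, 1, cov=True)
    slope, intercept = coeffs
    slope_err, intercept_err = np.sqrt(np.diag(cov))
    return slope, intercept, slope_err, intercept_err

# e.g. slope, intercept, slope_err, intercept_err = fit_with_errors(X, Y)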
I am trying to make a Gaussian fit over many data points. E.g. I have a 256 x 262144 array of data, where each set of 256 points needs to be fitted to a Gaussian distribution, and I need 262144 of them.
Sometimes the peak of the gaussian distribution is outside the data-range, so to get an accurate mean result curve-fitting is the best approach. Even if the peak is inside the range, curve-fitting gives a better sigma because other data is not in the range.
I have this working for one data point, using code from http://www.scipy.org/Cookbook/FittingData .
I have tried to just repeat this algorithm, but it looks like it is going to take something in the order of 43 minutes to solve this. Is there an already-written fast way of doing this in parallel or more efficiently?
from scipy import optimize
from numpy import *
import numpy

# Fitting code taken from: http://www.scipy.org/Cookbook/FittingData

class Parameter:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

    def __call__(self):
        return self.value

def fit(function, parameters, y, x=None):
    def f(params):
        i = 0
        for p in parameters:
            p.set(params[i])
            i += 1
        return y - function(x)

    if x is None: x = arange(y.shape[0])
    p = [param() for param in parameters]
    optimize.leastsq(f, p)
def nd_fit(function, parameters, y, x=None, axis=0):
    """
    Tries to fit an n-dimensional array to the data as though each point is a new dataset valid across the appropriate axis.
    """
    y = y.swapaxes(0, axis)
    shape = y.shape
    axis_of_interest_len = shape[0]
    prod = numpy.array(shape[1:]).prod()
    y = y.reshape(axis_of_interest_len, prod)

    params = numpy.zeros([len(parameters), prod])

    for i in range(prod):
        print("at %d of %d" % (i, prod))
        fit(function, parameters, y[:, i], x)
        for p in range(len(parameters)):
            params[p, i] = parameters[p]()

    # shape is a tuple, so build a new shape with the parameter count first
    shape = (len(parameters),) + shape[1:]
    params = params.reshape(shape)
    return params
Note that the data isn't necessarily 256 x 262144, and I've done some fudging around in nd_fit to make this work.
The code I use to get this to work is
from curve_fitting import *
import numpy

frames = numpy.load("data.npy")
y = frames[:, 0, 0, 20, 40]
x = range(0, 512, 2)

mu = Parameter(x[argmax(y)])
height = Parameter(max(y))
sigma = Parameter(50)

def f(x): return height() * exp(-((x - mu()) / sigma()) ** 2)

ls_data = nd_fit(f, [mu, sigma, height], frames, x, 0)
Note: The solution posted below by @JoeKington is great and solves really fast. However, it doesn't appear to work unless the significant area of the Gaussian is inside the data range. I will have to test whether the mean is still accurate, though, as that is the main thing I use this for.
The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method which will be slower than a linear least squares solution.
Basically, you have:
y = height * exp(-(x - mu)^2 / (2 * sigma^2))
To make this a linear equation, take the (natural) log of both sides:
ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)
This then simplifies to the polynomial:
ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)
We can recast this in a bit simpler form:
ln(y) = A * x^2 + B * x + C
where:
A = -1 / (2 * sigma^2)
B = mu / sigma^2
C = -mu^2 / (2 * sigma^2) + ln(height)
However, there's one catch. This will become unstable in the presence of noise in the "tails" of the distribution.
Therefore, we need to use only the data near the "peaks" of the distribution. It's easy enough to only include data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for a given gaussian curve that we're fitting.
Once we've done this, though, it's rather fast. Solving for 262144 different Gaussian curves takes only ~1 minute (be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools
def main():
    x, data = generate_data(256, 6)
    model = [invert(x, y) for y in data.T]
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

    plot(x, data, linestyle='none', marker='o')
    plot(x, prediction, linestyle='-')
    plt.show()

def invert(x, y):
    # Use only data within the "peak" (20% of the max value...)
    key_points = y > (0.2 * y.max())
    x = x[key_points]
    y = y[key_points]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def generate_data(numpoints, numcurves):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 100 * np.random.random(numcurves)
    mu = 200 * np.random.random(numcurves) + 200
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    noise = 5 * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        ax.plot(x, y, color=color, **kwargs)
main()
The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because multiprocessing.Pool.imap can't supply additional arguments to its function...) It would look something like this:
def parallel_main():
    import multiprocessing
    p = multiprocessing.Pool()
    x, data = generate_data(256, 262144)
    args = itertools.izip(itertools.repeat(x), data.T)
    model = p.imap(parallel_func, args, chunksize=500)
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

def parallel_func(args):
    return invert(*args)
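One practical caveat (my note, not part of the original answer): with multiprocessing the pool should only be created when the module is run as a script, so wrap the call in a main guard, e.g.:

if __name__ == '__main__':
    # prevents the worker processes from re-executing the module-level code on import
    parallel_main()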
Edit: In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, as mentioned in the link/paper that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools
def main():
    def run(x, data, func, threshold=0):
        model = [func(x, y, threshold=threshold) for y in data.T]
        sigma, mu, height = [np.array(item) for item in zip(*model)]
        prediction = gaussian(x, sigma, mu, height)

        plt.figure()
        plot(x, data, linestyle='none', marker='o', markersize=4)
        plot(x, prediction, linestyle='-', lw=2)

    x, data = generate_data(256, 6, noise=100)
    threshold = 50

    run(x, data, weighted_invert, threshold=threshold)
    plt.title('Weighted by Y-Value')

    run(x, data, invert, threshold=threshold)
    plt.title('Un-weighted Linear Inverse')

    plt.show()
def invert(x, y, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma, mu, height = poly_to_gauss(A, B, C)
    return sigma, mu, height

def poly_to_gauss(A, B, C):
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def weighted_invert(x, y, weights=None, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]
    if weights is None:
        weights = y
    else:
        weights = weights[mask]

    d = np.log(y)
    G = np.ones((x.size, 3), dtype=np.float)
    G[:, 0] = x**2
    G[:, 1] = x

    model, _, _, _ = np.linalg.lstsq((G.T * weights**2).T, d * weights**2)
    return poly_to_gauss(*model)
def generate_data(numpoints, numcurves, noise=None):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 7000 * np.random.random(numcurves)
    mu = 1100 * np.random.random(numcurves)
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    if noise is None:
        noise = 0.1 * height.max()
    noise = noise * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        #kwargs['color'] = kwargs.get('color', color)
        ax.plot(x, y, color=color, **kwargs)
main()
If that's still giving you trouble, then try iteratively re-weighting the least-squares problem (the final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.
def iterative_weighted_invert(x, y, threshold=0, numiter=5):
    # default threshold of 0 so the y > threshold mask in weighted_invert is well defined
    last_y = y
    for _ in range(numiter):
        model = weighted_invert(x, y, weights=last_y, threshold=threshold)
        last_y = gaussian(x, *model)
    return model
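For example (a usage sketch of mine, reusing generate_data and gaussian from the script above), the iteratively re-weighted fit slots in the same way as the other inverters:

x, data = generate_data(256, 6, noise=100)
model = [iterative_weighted_invert(x, y, threshold=50, numiter=5) for y in data.T]
sigma, mu, height = [np.array(item) for item in zip(*model)]
prediction = gaussian(x, sigma, mu, height)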