I will try to explain exactly what's going on and what my issue is.
This is a bit mathy and SO doesn't support latex, so sadly I had to resort to images. I hope that's okay.
I don't know why it's inverted, sorry about that.
At any rate, this is a linear system Ax = b where we know A and b, so we can find x, which is our approximation at the next time step. We continue doing this until time t_final.
This is the code
import numpy as np

tau = 2 * np.pi
tau2 = tau * tau
i = complex(0, 1)

def solution_f(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))

def solution_g(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))

for l in range(2, 12):
    N = 2 ** l      # number of grid points
    dx = 1.0 / N    # space between grid points
    dx2 = dx * dx
    dt = dx         # time step
    t_final = 1
    approximate_f = np.zeros((N, 1), dtype=np.complex)
    approximate_g = np.zeros((N, 1), dtype=np.complex)
    # Insert initial conditions
    for k in range(N):
        approximate_f[k, 0] = np.cos(tau * k * dx)
        approximate_g[k, 0] = -i * np.sin(tau * k * dx)
    # Create coefficient matrix
    A = np.zeros((2 * N, 2 * N), dtype=np.complex)
    # First row is special
    A[0, 0] = 1 - 3 * i * dt
    A[0, N] = ((2 * dt / dx2) + dt) * i
    A[0, N + 1] = (-dt / dx2) * i
    A[0, -1] = (-dt / dx2) * i
    # Last row is special
    A[N - 1, N - 1] = 1 - (3 * dt) * i
    A[N - 1, N] = (-dt / dx2) * i
    A[N - 1, -2] = (-dt / dx2) * i
    A[N - 1, -1] = ((2 * dt / dx2) + dt) * i
    # Middle rows
    for k in range(1, N - 1):
        A[k, k] = 1 - (3 * dt) * i
        A[k, k + N - 1] = (-dt / dx2) * i
        A[k, k + N] = ((2 * dt / dx2) + dt) * i
        A[k, k + N + 1] = (-dt / dx2) * i
    # Bottom half
    A[N:, :N] = A[:N, N:]
    A[N:, N:] = A[:N, :N]
    Ainv = np.linalg.inv(A)
    # Advance through time
    time = 0
    while time < t_final:
        b = np.concatenate((approximate_f, approximate_g), axis=0)
        x = np.dot(Ainv, b)  # Solve Ax = b
        approximate_f = x[:N]
        approximate_g = x[N:]
        time += dt
    approximate_solution = np.concatenate((approximate_f, approximate_g), axis=0)
    # Calculate the actual solution
    actual_f = np.zeros((N, 1), dtype=np.complex)
    actual_g = np.zeros((N, 1), dtype=np.complex)
    for k in range(N):
        actual_f[k, 0] = solution_f(t_final, k * dx)
        actual_g[k, 0] = solution_g(t_final, k * dx)
    actual_solution = np.concatenate((actual_f, actual_g), axis=0)
    print(np.sqrt(dx) * np.linalg.norm(actual_solution - approximate_solution))
It doesn't work. At least not in the beginning: it shouldn't start this slow. It should be unconditionally stable and converge to the right answer.
What's going wrong here?
The L2-norm can be a useful metric to test convergence, but it isn't ideal when debugging, as it doesn't explain what the problem is. Although your scheme should be unconditionally stable, backward Euler won't necessarily converge to the right answer. Just as forward Euler is notoriously unstable (anti-dissipative), backward Euler is notoriously dissipative. Plotting your solutions confirms this: the numerical solutions converge to zero.

For a higher-order approximation, Crank-Nicolson is a reasonable candidate. The code below contains the more general theta method so that you can tune the implicitness of the solution: theta=0.5 gives CN, theta=1 gives BE, and theta=0 gives FE.
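In matrix form (a sketch, writing H for the dt-scaled spatial operator assembled in the code below), one theta-method step solves (I + theta*H) y_new = (I - (1 - theta)*H) y_old, i.e. y_new = inv(I + theta*H) @ (I - (1 - theta)*H) @ y_old, which is exactly the mat @ b update in the code.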
A couple other things that I tweaked:
I selected a more appropriate time step of dt = (dx**2)/2 instead of dt = dx. The latter doesn't converge to the right solution using CN.
It's a minor note, but since t_final isn't guaranteed to be a multiple of dt, you weren't comparing solutions at the same time step.
With regards to your comment about it being slow: as you increase the spatial resolution, your time resolution needs to increase too. Even in your case with dt=dx, you have to perform a (1024 x 1024)*1024 matrix multiplication 1024 times. I didn't find this to take particularly long on my machine. I removed some unneeded concatenation to speed it up a bit, but changing the time step to dt = (dx**2)/2 will really bog things down, unfortunately. You could try compiling with Numba if you are concerned with speed.
All that said, I didn't find tremendous success with the consistency of CN. I had to set N=2^6 to get anything at t_final=1. Increasing t_final makes this worse, decreasing t_final makes it better. Depending on your needs, you could look into implementing TR-BDF2 or other linear multistep methods to improve this (a sketch of a single TR-BDF2 step for a generic linear system is included after the code below).
The code with a plot is below:
import numpy as np
import matplotlib.pyplot as plt

tau = 2 * np.pi
tau2 = tau * tau
i = complex(0, 1)

def solution_f(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))

def solution_g(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))

l = 6
N = 2 ** l
dx = 1.0 / N
dx2 = dx * dx
dt = dx2 / 2
t_final = 1.

x_arr = np.arange(0, 1, dx)
approximate_f = np.cos(tau * x_arr)
approximate_g = -i * np.sin(tau * x_arr)

H = np.zeros([2 * N, 2 * N], dtype=np.complex)
for k in range(N):
    H[k, k] = -3 * i * dt
    H[k, k + N] = (2 / dx2 + 1) * i * dt
    if k == 0:
        H[k, N + 1] = -i / dx2 * dt
        H[k, -1] = -i / dx2 * dt
    elif k == N - 1:
        H[N - 1, N] = -i / dx2 * dt
        H[N - 1, -2] = -i / dx2 * dt
    else:
        H[k, k + N - 1] = -i / dx2 * dt
        H[k, k + N + 1] = -i / dx2 * dt
### Bottom half
H[N:, :N] = H[:N, N:]
H[N:, N:] = H[:N, :N]

### Theta method. 0.5 -> Crank Nicolson
theta = 0.5
A = np.eye(2 * N) + H * theta
B = np.eye(2 * N) - H * (1 - theta)

### Precompute for faster computations
mat = np.linalg.inv(A) @ B

t = 0
b = np.concatenate((approximate_f, approximate_g))
while t < t_final:
    t += dt
    b = mat @ b
approximate_f = b[:N]
approximate_g = b[N:]
approximate_solution = np.concatenate((approximate_f, approximate_g))

# Calculate the actual solution
actual_f = solution_f(t, np.arange(0, 1, dx))
actual_g = solution_g(t, np.arange(0, 1, dx))
actual_solution = np.concatenate((actual_f, actual_g))

plt.figure(figsize=(7, 5))
plt.plot(x_arr, actual_f.real, c="C0", label=r"$Re(f_\mathrm{true})$")
plt.plot(x_arr, actual_f.imag, c="C1", label=r"$Im(f_\mathrm{true})$")
plt.plot(x_arr, approximate_f.real, c="C0", ls="--", label=r"$Re(f_\mathrm{num})$")
plt.plot(x_arr, approximate_f.imag, c="C1", ls="--", label=r"$Im(f_\mathrm{num})$")
plt.legend(loc=3, fontsize=12)
plt.xlabel("x")
plt.savefig("num_approx.png", dpi=150)
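As mentioned above, TR-BDF2 is one candidate for improving consistency. Here is a minimal sketch of a single TR-BDF2 step for a generic linear system dy/dt = L y (generic naming of mine, not tied to the code above):

import numpy as np

# Sketch of one TR-BDF2 step for dy/dt = L @ y, with the usual gamma = 2 - sqrt(2).
# Stage 1 is a trapezoidal step to t + gamma*h, stage 2 a BDF2 step to t + h.
def tr_bdf2_step(L, y, h, gamma=2 - np.sqrt(2)):
    I = np.eye(L.shape[0])
    y_mid = np.linalg.solve(I - 0.5 * gamma * h * L,
                            (I + 0.5 * gamma * h * L) @ y)
    c1 = 1.0 / (gamma * (2 - gamma))
    c2 = (1 - gamma) ** 2 / (gamma * (2 - gamma))
    c3 = (1 - gamma) / (2 - gamma)
    return np.linalg.solve(I - c3 * h * L, c1 * y_mid - c2 * y)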
I am not going to go through all of your math, but I'm going to offer a suggestion.
The use of a direct calculation for fxx and gxx seems like a good candidate for being numerically unstable. Intuitively, a first-order method should be expected to make second-order mistakes in the individual terms. After passing through that formula, those second-order mistakes wind up as constant-order mistakes in the second derivative. Plus, when your step size gets small, you are going to find that a quadratic formula makes even small roundoff mistakes turn into surprisingly large errors.
Instead, I would suggest that you start by turning this into a first-order system of four functions, f, fx, g, and gx, and then proceed with backward Euler on that system. Intuitively, with this approach, a first-order method creates second-order mistakes, which pass through a formula that turns them into first-order mistakes. Now you are converging as you should from the start, and you are also not as sensitive to propagation of roundoff errors.
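A minimal sketch of one backward Euler step on such an augmented system, assuming a discrete linear operator L acting on the stacked vector of f, fx, g, and gx (the names here are placeholders, not taken from the question):

import numpy as np

# One backward Euler step for y' = L @ y, where y stacks (f, f_x, g, g_x) on the grid.
# Solving the linear system each step is generally preferable to forming inv(I - dt*L).
def backward_euler_step(L, y_old, dt):
    A = np.eye(L.shape[0]) - dt * L
    return np.linalg.solve(A, y_old)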
Related
This is more of a computational physics problem. I've asked it on Physics Stack Exchange but got no answers there. It is, I suppose, a mix of the disciplines here and there (and maybe even Mathematics Stack Exchange), so finding the right place to post is a task in and of itself, apparently...
I'm attempting to use Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised wrt its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions/figures that are similar. I've tested another person's code from Github and it exhibits the same behaviour, which makes me feel a bit better. But I still think the center peak should just decrease in height and increase in width. The likelihood of me getting a physics-based explanation is relatively low here I'd assume, but a computational-based explanation on errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to a GIF of the time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Define function for norm.
def normf(dxc, uc, ic):
    return sum(dxc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
    return sum(dxc * xc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
    return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))

# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
    return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))

# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num=N + 1, endpoint=True)
decp = str(dt)[::-1].find('.')

# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))

# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num=M + 1, endpoint=True)

# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * var))
u = np.zeros([len(t), len(x)], dtype='complex_')
u[0, :] = u0

# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))

# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))

# Tridiagonal matrices Al and AR. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype='complex_')
for i in range(0, M):
    Al[i + 1, i] = alpha
    Al[i, i] = 1 - (2 * alpha)
    Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha

Ar = np.zeros([len(x), len(x)], dtype='complex_')
for i in range(0, M):
    Ar[i + 1, i] = beta
    Ar[i, i] = 1 - (2 * beta)
    Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2 * beta, 1 - beta

# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype='complex_')
for n in range(0, NT - 1):
    if n == 0:
        cp[n] = c[n] / b[n]
    else:
        cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype='complex_')
dp = np.zeros(NT, dtype='complex_')

# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
    # BC's.
    u[i, 0], u[i, M] = 0, 0
    # Find RHS.
    d = np.dot(Ar, u[i, :])
    for n in range(0, NT):
        if n == 0:
            dp[n] = d[n] / b[n]
        else:
            dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * cp[n - 1]))
    nc = NT - 1
    while nc > -1:
        if nc == NT - 1:
            u[i + 1, nc] = dp[nc]
            nc -= 1
        else:
            u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
            nc -= 1
    norm[i] = normf(dx, u, i)
    xexp[i] = xexpf(dx, x, u, i)
    xexps[i] = xexpsf(dx, x, u, i)
    sda[i] = sdaf(xexp, xexps, i)

# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)
I'm having trouble with the slow computation of my Python code. Based on the pycallgraph below, the bottleneck seems to be the module named miepython.miepython.mie_S1_S2 (highlighted by pink), which takes 0.47 seconds per call.
The source code for this module is as follows:
import numpy as np
from numba import njit, int32, float64, complex128

__all__ = ('ez_mie',
           'ez_intensities',
           'generate_mie_costheta',
           'i_par',
           'i_per',
           'i_unpolarized',
           'mie',
           'mie_S1_S2',
           'mie_cdf',
           'mie_mu_with_uniform_cdf',
           )


@njit((complex128, float64, float64[:]), cache=True)
def _mie_S1_S2(m, x, mu):
    """
    Calculate the scattering amplitude functions for spheres.

    The amplitude functions have been normalized so that when integrated
    over all 4*pi solid angles, the integral will be qext*pi*x**2.

    The units are weird, sr**(-0.5)

    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere
        mu: array of angles, cos(theta), to calculate scattering amplitudes

    Returns:
        S1, S2: the scattering amplitudes at each angle mu [sr**(-0.5)]
    """
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)

    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)

    nstop = len(a)
    for k in range(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1) * pi_nm2
            S1[k] += (2 * n + 1) * (pi_nm1 * a[n - 1]
                                    + tau_nm1 * b[n - 1]) / (n + 1) / n
            S2[k] += (2 * n + 1) * (tau_nm1 * a[n - 1]
                                    + pi_nm1 * b[n - 1]) / (n + 1) / n
            temp = pi_nm1
            pi_nm1 = ((2 * n + 1) * mu[k] * pi_nm1 - (n + 1) * pi_nm2) / n
            pi_nm2 = temp

    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))

    S1 /= norm
    S2 /= norm

    return [S1, S2]
Apparently, the source code is jitted by Numba, so it should be fast, yet it is slower than I would expect. The number of iterations of the nested loops in this function is around 25,000 (len(mu)=50, len(a)-1=500).
Any ideas on how to speed up this computation? Is something hindering the fast computation of Numba? Or, do you think the computation is already fast enough?
[More details]
In the above, another function _mie_An_Bn is being used. This function is also jitted, and the source code is as follows:
@njit((complex128, float64, complex128[:], complex128[:]), cache=True)
def _mie_An_Bn(m, x, a, b):
    """
    Compute arrays of Mie coefficients A and B for a sphere.

    This estimates the size of the arrays based on Wiscombe's formula. The length
    of the arrays is chosen so that the error when the series are summed is
    around 1e-6.

    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere

    Returns:
        An, Bn: arrays of Mie coefficients
    """
    psi_nm1 = np.sin(x)              # nm1 = n-1 = 0
    psi_n = psi_nm1 / x - np.cos(x)  # n = 1
    xi_nm1 = complex(psi_nm1, np.cos(x))
    xi_n = complex(psi_n, np.cos(x) / x + np.sin(x))

    nstop = len(a)
    if m.real > 0.0:
        D = _D_calc(m, x, nstop + 1)

        for n in range(1, nstop):
            temp = D[n] / m + n / x
            a[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            temp = D[n] * m + n / x
            b[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
    else:
        for n in range(1, nstop):
            a[n - 1] = (n * psi_n / x - psi_nm1) / (n * xi_n / x - xi_nm1)
            b[n - 1] = psi_n / xi_n
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
The example inputs are like the following:
m = 1.336-2.462e-09j
x = 8526.95
mu = np.array([-1., -0.7500396, 0.46037385, 0.5988121, 0.67384093, 0.72468684, 0.76421644, 0.79175856, 0.81723714, 0.83962897, 0.85924182, 0.87641596, 0.89383665, 0.90708978, 0.91931481, 0.93067567, 0.94073113, 0.94961222, 0.95689496, 0.96467123, 0.97138347, 0.97791831, 0.98339434, 0.98870543, 0.99414948, 0.9975728, 0.9989995, 0.9989995, 0.9989995, 0.9989995, 0.9989995, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899952, 0.99899952,
0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 1. ])
I am focussing on _mie_S1_S2 since it appears to be the most expensive function on the provided example dataset.
First of all, you can pass the parameter fastmath=True to the JIT to accelerate the computation if no values like +Inf, -Inf, -0 or NaN are computed.
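For illustration only (a toy function of mine, not part of miepython), enabling fastmath on a jitted reduction looks like this:

import numpy as np
from numba import njit, float64

# Toy example: fastmath=True relaxes strict IEEE semantics, which lets Numba
# reorder and vectorize this reduction more aggressively.
@njit(float64(float64[:]), cache=True, fastmath=True)
def sum_of_squares(v):
    s = 0.0
    for k in range(v.size):
        s += v[k] * v[k]
    return s

print(sum_of_squares(np.arange(1e6)))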
Then you can pre-compute some expensive expressions containing divisions or implicit integer-to-float conversions. Note that (2 * n + 1) / n = 2 + 1/n and (n + 1) / n = 1 + 1/n. This can be useful to reduce the number of precomputed arrays but did not change the performance on my machine (this may change depending on the target architecture). Note also that such a precomputation has a slight impact on the result accuracy (most of the time negligible and sometimes better than the reference implementation).
On my machine, this strategy makes the code 4.5 times faster with fastmath=True and 2.8 times faster without.
The k-based loop can be parallelized using parallel=True and prange of Numba. However, this may not always be faster on all machines (especially ones with a lot of cores) since the loop is pretty fast.
Here is the final code:
import numba as nb  # needed for nb.prange below

@njit((complex128, float64, float64[:]), cache=True, parallel=True)
def _mie_S1_S2_opt(m, x, mu):
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)

    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)

    factor1 = np.empty(nstop, dtype=np.float64)
    factor2 = np.empty(nstop, dtype=np.float64)
    factor3 = np.empty(nstop, dtype=np.float64)

    for n in range(1, nstop):
        factor1[n - 1] = (2 * n + 1) / (n + 1) / n
        factor2[n - 1] = (2 * n + 1) / n
        factor3[n - 1] = (n + 1) / n

    nstop = len(a)
    for k in nb.prange(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            i = n - 1
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1.0) * pi_nm2
            S1[k] += factor1[i] * (pi_nm1 * a[i] + tau_nm1 * b[i])
            S2[k] += factor1[i] * (tau_nm1 * a[i] + pi_nm1 * b[i])
            temp = pi_nm1
            pi_nm1 = factor2[i] * mu[k] * pi_nm1 - factor3[i] * pi_nm2
            pi_nm2 = temp

    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))

    S1 /= norm
    S2 /= norm

    return [S1, S2]
%timeit -n 1000 _mie_S1_S2_opt(m, x, mu)
On my machine with 6 cores, the final optimized implementation is 12 times faster with fastmath=True and 8.8 times faster without. Note that using similar strategies in the other functions may also help to speed them up.
I am trying to calculate g(x_(i+2)) from the values g(x_(i+1)) and g(x_i), where i is an integer, assuming I(x) and s(x) are Gaussian functions. If we know x_i = 100, then the summation runs from 0 to 100. What I don't know is how to handle g(x_i) with the subscript in Python: knowing the first and second values, we can find the third, and after n cycles we can find the nth value. (A small sketch of one way to index this is included after the code below.)
Equation:
code:
import numpy as np
from matplotlib import pyplot as p
from math import pi

def f_s(x, mu_s, sig_s):
    ss = -np.power(x - mu_s, 2) / (2 * np.power(sig_s, 2))
    return np.exp(ss) / (np.power(2 * pi, 2) * sig_s)

def f_i(x, mu_i, sig_i):
    ii = -np.power(x - mu_i, 2) / (2 * np.power(sig_i, 2))
    return np.exp(ii) / (np.power(2 * pi, 2) * sig_i)

# problems occur in this part
def g(x, m, mu_s, sig_s, mu_i, sig_i):
    for i in range(1, m):  # specify the number x, x_1, x_2, x_3 ... x_m
        h = (x[i + 1] - x[i]) / e
        for n in range(0, x[i]):  # calculate summation
            sum_f = (f_i(x[i], mu_i, sig_i) - f_s(x[i] - n, mu_s, sig_s) * g_x[n]) * np.conj(f_s(n + x[i], mu_s, sig_s))
        g_x[1] = 1  # initial value
        g_x[2] = 5
        g_x[i + 2] = h * sum_f + 2 * g_x[i + 1] - g_x[i]
    return g_x[i + 2]

x = np.linspace(-10, 10, 10000)
e = 1
d = 0.01
m = 1000
mu_s = 2
sig_s = 1
mu_i = 1
sig_i = 1

p.plot(x, g(x, m, mu_s, sig_s, mu_i, sig_i))
p.legend()
p.show()
result:
I(x) and s(x)
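As referenced in the question above, here is a minimal sketch of how the subscripted values could be stored with array indexing; the seed values and the summation callable are placeholders of mine, not a fix of the full scheme:

import numpy as np

# Sketch only: keep g(x_i) in an array indexed by i, seed the first two
# entries, then fill each new entry from the two previous ones plus a
# summation term supplied as a callable (placeholder for the real sum).
def g_recurrence(x, m, summation_term, g0=1.0, g1=5.0, e=1.0):
    g_x = np.zeros(m + 2, dtype=complex)
    g_x[0], g_x[1] = g0, g1                # assumed initial values
    for i in range(m):
        h = (x[i + 1] - x[i]) / e          # step factor, as in the question's code
        g_x[i + 2] = h * summation_term(i) + 2 * g_x[i + 1] - g_x[i]
    return g_x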
import math
import numpy as np

S0 = 100.; K = 100.; T = 1.0; r = 0.05; sigma = 0.2
M = 100; dt = T / M; I = 500000

S = np.zeros((M + 1, I))
S[0] = S0
for t in range(1, M + 1):
    z = np.random.standard_normal(I)
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)

C0 = math.exp(-r * T) * np.sum(np.maximum(S[-1] - K, 0)) / I
print("European Option Value is ", C0)
It gives a value of around 10.45 as you increase the number of simulations, but using the B-S formula the value should be around 10.09. Does anybody know why the code isn't giving a number closer to the formula value?
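For reference, the closed-form Black-Scholes call price for the same parameters can be computed directly and compared against the simulation; a sketch using scipy.stats.norm:

import math
from scipy.stats import norm

# Closed-form Black-Scholes call price with the same parameters as above.
S0, K, T, r, sigma = 100., 100., 1.0, 0.05, 0.2
d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
C_bs = S0 * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)
print("Black-Scholes value is", C_bs)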
So I've been trying to fit to an exponentially modified gaussian function (if interested, https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution)
import numpy as np
import scipy.optimize as sio
import scipy.special as sps
from math import sqrt, pi  # needed for the bare sqrt and pi used below

def exp_gaussian(x, h, u, c, t):
    z = 1.0 / sqrt(2.0) * (c / t - (x - u) / c)  # not important
    k1 = k2 = h * c / t * sqrt(pi / 2)           # not important
    n1 = 1 / 2 * (c / t)**2 - (x - u) / t        # not important
    n2 = -1 / 2 * ((x - u) / c)**2               # not important
    y = np.zeros(len(x))
    y += (k1 * np.exp(n1) * sps.erfc(z)) * (z < 0)
    y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0)
    return y
In order to prevent overflow problems, one of two equivalent functions must be used depending on whether z is positive or negative (see "Alternative forms for computation" on the previously linked Wikipedia page).
The problem I am having is this: The line y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0) is only supposed to add to y when z is positive. But if z is, say, -30, sps.erfcx(-30) is inf, and inf * False is NaN. Therefore, instead of leaving y untouched, the resulting y is clustered with NaN. Example:
x = np.linspace(400, 1000, 1001)
y = exp_gaussian(x, 100, 400, 10, 5)
y
array([ 84.27384586, 86.04516723, 87.57518493, ..., nan,
nan, nan])
I tried replacing the line in question with the following:
y += numpy.nan_to_num((k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0))
But doing this ran into serious runtime issues. Is there a way to only evaluate (k2 * np.exp(n2) * sps.erfcx(z)) on the condition that (z >= 0)? Is there some other way to solve this without sacrificing runtime?
Thanks!
EDIT: After Rishi's advice, the following code seems to work much better:
def exp_gaussian(x, h, u, c, t):
    z = 1.0 / sqrt(2.0) * (c / t - (x - u) / c)
    k1 = k2 = h * c / t * sqrt(pi / 2)
    n1 = 1 / 2 * (c / t)**2 - (x - u) / t
    n2 = -1 / 2 * ((x - u) / c)**2
    return np.where(z >= 0, k2 * np.exp(n2) * sps.erfcx(z), k1 * np.exp(n1) * sps.erfc(z))
How about using numpy.where with something like: np.where(z >= 0, sps.erfcx(z), sps.erfc(z)). I'm no numpy expert, so don't know if it's efficient. Looks elegant at least!
One thing you could do is create a mask and reuse it, so it wouldn't need to be evaluated twice. Another idea is to use nan_to_num only once at the end:
mask = (z < 0)
y += (k1 * np.exp(n1) * sps.erfc(z)) * (mask)
y += (k2 * np.exp(n2) * sps.erfcx(z)) * (~mask)
y = np.nan_to_num(y)
Try and see if this helps...