Fitting piecewise function: running into NaN problems due to boolean array indexing

Fitting piecewise function: running into NaN problems due to boolean array indexing - python

So I've been trying to fit to an exponentially modified gaussian function (if interested, https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution)
import numpy as np
import scipy.optimize as sio
import scipy.special as sps
def exp_gaussian(x, h, u, c, t):
z = 1.0/sqrt(2.0) * (c/t - (x-u)/c) #not important
k1 = k2 = h * c / t * sqrt(pi / 2) #not important
n1 = 1/2 * (c / t)**2 - (x-u)/t #not important
n2 = -1 / 2 * ((x - u) / c)**2 #not important
y = np.zeros(len(x))
y += (k1 * np.exp(n1) * sps.erfc(z)) * (z < 0)
y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0)
return y
In order to prevent overflow problems, one of two equivilent functions must be used depending on whether z is positive or negative (see Alternative forms for computation from previous wikipedia page).
The problem I am having is this: The line y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0)is only supposed to add to y when z is positive. But if z is, say, -30, sps.erfcx(-30) is inf, and inf * False is NaN. Therefore, instead of leaving y untouched, the resulting y is clustered with NaN. Example:
x = np.linspace(400, 1000, 1001)
y = exp_gaussian(x, 100, 400, 10, 5)
y
array([ 84.27384586, 86.04516723, 87.57518493, ..., nan,
nan, nan])
I tried the replacing the line in question with the following:
y += numpy.nan_to_num((k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0))
But doing this ran into serious runtime issues. Is there a way to only evaluate (k2 * np.exp(n2) * sps.erfcx(z)) on the condition that (z >= 0) ? Is there some other way to solve this without sacrificing runtime?
Thanks!
EDIT: After Rishi's advice, the following code seems to work much better:
def exp_gaussian(x, h, u, c, t):
z = 1.0/sqrt(2.0) * (c/t - (x-u)/c)
k1 = k2 = h * c / t * sqrt(pi / 2)
n1 = 1/2 * (c / t)**2 - (x-u)/t
n2 = -1 / 2 * ((x - u) / c)**2
return = np.where(z >= 0, k2 * np.exp(n2) * sps.erfcx(z), k1 * np.exp(n1) * sps.erfc(z))

How about using numpy.where with something like: np.where(z >= 0, sps.erfcx(z), sps.erfc(z)). I'm no numpy expert, so don't know if it's efficient. Looks elegant at least!

One thing you could do is create a mask and reuse it so it wouldn't need to be evaluated twice. Another idea is to use the nan_to_num only once at the end
mask = (z<0)
y += (k1 * np.exp(n1) * sps.erfc(z)) * (mask)
y += (k2 * np.exp(n2) * sps.erfcx(z)) * (~mask)
y = numpy.nan_yo_num(y)
Try and see if this helps...

Related

converting a Matlab array to python

I have the following Matlab code:
A=[length(x) sum(x) sum(y) sum(x.*y);
sum(x) sum(x.^2) sum(x.*y) sum(y.*x.^2);
sum(y) sum(x.*y) sum(y.^2) sum(x.*y.^2);
sum(x.*y) sum((x.^2).*y) sum(x.*y.^2) sum((x.^2).*(y.^2))];
v=[sum(z); sum(z.*x); sum(z.*y); sum(x.*y.*z)];
solution=inv(A)*v;
a=solution(1);
b=solution(2);
c=solution(3);
d=solution(4);
which I want to convert into Python. I have no knowledge of Matlab, and with some research came up with the following Python code:
A = [
[len(x), x.sum(), y.sum(), (x * y).sum()],
[x.sum(), (x**2).sum(), (x * y).sum(), ((y * x) ** 2).sum()],
[y.sum(), (x * y).sum(), (y**2).sum(), ((x * y) ** 2).sum()],
[(x * y).sum(), ((x**2) * y).sum(), ((x * y)**2).sum(), ((x**2) * (y**2)).sum()]
]
v = [z.sum(), (z * x).sum(), (z * y).sum(), (x * y * z).sum()]
solution = np.linalg.inv(A) * v
I excpected a list with 4 values, which should be the coefficients which I want to use for a function. However, the result is another matrix. What am I doing wrong?

In the last step, you should use matrix multiplication instead of corresponding position multiplication.
Matrix multiplication uses operator#:
solution = np.linalg.inv(A) # v
Or function numpy.matmul:
solution = np.matmul(np.linalg.inv(A), v)

Sympy integration running into error, but Mathematica works fine

I'm trying to integrate an equation using sympy, but the evaluation keeps erroring out with:
TypeError: cannot add <class 'sympy.matrices.immutable.ImmutableDenseMatrix'> and <class 'sympy.core.numbers.Zero'>
However, when I integrate the same equation in Mathematica, I get the result I'm looking for. I'm hoping someone can explain what is happening under the hood with sympy and how its integration differs from mathematica's.
Because there are a lot of equations involved, I am only going to post the Python representation of the equations which are used to make up the variables called in the integration. To know for certain that the integration is an issue, I have independently checked each equation involved and made sure the results match between Python and Mathematica. I can confirm that only the integration fails.
Integration variable setup in Python:
import numpy as np
import sympy as sym
from sympy import I, Matrix, integrate
from sympy.integrals import Integral
from sympy.functions import exp
from sympy.physics.vector import cross, dot
c_speed = 299792458
lambda_u = 0.01
k_u = (2 * np.pi)/lambda_u
K_0 = 0.2
gamma_0 = 2.0
beta_0 = sym.sqrt(1 - 1 / gamma_0**2)
t = sym.symbols('t')
t_start = (-2 * lambda_u) / c_speed
t_end = (3 * lambda_u) / c_speed
beta_x_of_t = sym.Piecewise( (0.0, sym.Or(t < t_start, t > t_end)),
((-1 * K_0)/gamma_0 * sym.sin(k_u * c_speed * t), sym.And(t_start <= t, t <= t_end)) )
beta_z_of_t = sym.Piecewise( (beta_0, sym.Or(t < t_start, t > t_end)),
(beta_0 * (1 - K_0**2/ (4 * beta_0 * gamma_0**2)) + K_0**2 / (4 * beta_0 * gamma_0**2) * sym.sin(2 * k_u * c_speed * t), sym.And(t_start <= t, t <= t_end)) )
beta_xp_of_t = sym.diff(beta_x_of_t, t)
beta_zp_of_t = sym.diff(beta_z_of_t, t)
Python Integration:
n = Matrix([(0, 0, 1)])
beta = Matrix([(beta_x_of_t, 0, beta_z_of_t)])
betap = Matrix([(beta_xp_of_t, 0, beta_zp_of_t)])
def rad(n, beta, betap):
return integrate(n.cross((n-beta).cross(betap)), (t, t_start, t_end))
rad(n, beta, betap)
# Output is the error above
Mathematica integration:
rad[n_, beta_, betap_] :=
NIntegrate[
Cross[n, Cross[n - beta, betap]]
, {t, tstart, tend}, AccuracyGoal -> 3]
rad[{0, 0, 1}, {betax[t], 0, betaz[t]}, {betaxp[t], 0, betazp[t]}]
# Output is {0.00150421, 0., 0.}
I did see similar question such as this one and this question, but I'm not entirely sure if they are relevant here (the second link seems to be a closer match, but not quite a fit).

As you're working with imprecise floats (and even use np.pi instead of sym.pi), while sympy tries to find exact symbolic solutions, sympy's expressions get rather wild with constants like 6.67e-11 mixed with much larger values.
Here is an attempt to use more symbolic expressions, and only integrate the x-coordinate of the function.
import sympy as sym
from sympy import I, Matrix, integrate, S
from sympy.integrals import Integral
from sympy.functions import exp
from sympy.physics.vector import cross, dot
c_speed = 299792458
lambda_u = S(1) / 100
k_u = (2 * sym.pi) / lambda_u
K_0 = S(2) / 100
gamma_0 = 2
beta_0 = sym.sqrt(1 - S(1) / gamma_0 ** 2)
t = sym.symbols('t')
t_start = (-2 * lambda_u) / c_speed
t_end = (3 * lambda_u) / c_speed
beta_x_of_t = sym.Piecewise((0, sym.Or(t < t_start, t > t_end)),
((-1 * K_0) / gamma_0 * sym.sin(k_u * c_speed * t), sym.And(t_start <= t, t <= t_end)))
beta_z_of_t = sym.Piecewise((beta_0, sym.Or(t < t_start, t > t_end)),
(beta_0 * (1 - K_0 ** 2 / (4 * beta_0 * gamma_0 ** 2)) + K_0 ** 2 / (
4 * beta_0 * gamma_0 ** 2) * sym.sin(2 * k_u * c_speed * t),
sym.And(t_start <= t, t <= t_end)))
beta_xp_of_t = sym.diff(beta_x_of_t, t)
beta_zp_of_t = sym.diff(beta_z_of_t, t)
n = Matrix([(0, 0, 1)])
beta = Matrix([(beta_x_of_t, 0, beta_z_of_t)])
betap = Matrix([(beta_xp_of_t, 0, beta_zp_of_t)])
func = n.cross((n - beta).cross(betap))[0]
print(integrate(func, (t, t_start, t_end)))
This doesn't give an error, and outputs just zero.
Lambdify can be used to convert the function to numpy, and plot it.
func_np = sym.lambdify(t, func)
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
t_start_np = float(t_start.evalf())
t_end_np = float(t_end.evalf())
ts = np.linspace(t_start_np, t_end_np, 100000)
plt.plot(ts, func_np(ts))
print("approximate integral (np.trapz):", np.trapz(ts, func_np(ts)))
print("approximate integral (scipy's quad):", quad(func_np, t_start_np, t_end_np))
This outputs:
approximate integral (np.trapz): 0.04209721470548062
approximate integral (scipy's quad): (-2.3852447794681098e-18, 5.516333374450447e-10)
Note the huge values on the y-axis, and the small values on the x-axis. These values, as well as matematica's, could well be rounding errors.

How to make fsolve function work inside a for loop?

I am writing a function to calculate three nonlinear equations. I have the function to calculate them, but the thing is I give initial guess to them - which works fine. But now I have a constraint for x between (1/4 and 1/3). For that, I am making 2 for lops. One is for ptinting 13 different values, the other one for the constraint.
However the code does not give me the result at all:
import numpy as np
import math
from scipy.optimize import fsolve
def equations(vars):
x, y, z = vars
eq1 = ((x / (1 - x)) - (((2.5*np.cos(z)) / (8 * np.pi * np.sin(z) ** 2)) * (1 + (design_ratio * np.tan(z)))))
eq2 = ((y / (1 + y)) - (2.5 / (8 * np.pi * np.cos(z))) * (1 -design_ratio *( (1 / np.tan(z)))))
eq3 = np.tan(z) - ((1-x) /( 1.40 * (1+y)))
return [eq1, eq2, eq3]
n=13
for i in range(0, n):
for j in range(25555, 33333):
x = 0.00001 *x
x, y, z = fsolve(equations, (0.328, 0.048, 28))
print(x, y, z)

Frankly, using a loop is a terrible approach to handle the box constraint 1/4 <= x <= 1/3. Since fsolve doesn't support (box) constraints, you can rewrite the problem
Solve F(x,y,z) = 0 with 1/4 <= x <= 1/3
as an equivalent minimization problem
min np.sum(F(x,y,z)**2) s.t. 1/4 <= x <= 1/3
and solve it by means of scipy.optimize.minimize like this:
import numpy as np
from scipy.optimize import minimize
def F(vars):
x, y, z = vars
eq1 = ((x / (1 - x)) - (((2.5*np.cos(z)) / (8 * np.pi * np.sin(z) ** 2)) * (1 + (design_ratio * np.tan(z)))))
eq2 = ((y / (1 + y)) - (2.5 / (8 * np.pi * np.cos(z))) * (1 -design_ratio *( (1 / np.tan(z)))))
eq3 = np.tan(z) - ((1-x) /( 1.40 * (1+y)))
return np.array([eq1, eq2, eq3])
bounds = [(1./4, 1./3), (None, None), (None, None)]
res = minimize(lambda vars: np.sum(F(vars)**2), x0=(0.328, 0.048, 28), bounds=bounds)

Solving PDE with implicit euler in python - incorrect output

I will try and explain exactly what's going on and my issue.
This is a bit mathy and SO doesn't support latex, so sadly I had to resort to images. I hope that's okay.
I don't know why it's inverted, sorry about that.
At any rate, this is a linear system Ax = b where we know A and b, so we can find x, which is our approximation at the next time step. We continue doing this until time t_final.
This is the code
import numpy as np
tau = 2 * np.pi
tau2 = tau * tau
i = complex(0,1)
def solution_f(t, x):
return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
def solution_g(t, x):
return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
for l in range(2, 12):
N = 2 ** l #number of grid points
dx = 1.0 / N #space between grid points
dx2 = dx * dx
dt = dx #time step
t_final = 1
approximate_f = np.zeros((N, 1), dtype = np.complex)
approximate_g = np.zeros((N, 1), dtype = np.complex)
#Insert initial conditions
for k in range(N):
approximate_f[k, 0] = np.cos(tau * k * dx)
approximate_g[k, 0] = -i * np.sin(tau * k * dx)
#Create coefficient matrix
A = np.zeros((2 * N, 2 * N), dtype = np.complex)
#First row is special
A[0, 0] = 1 -3*i*dt
A[0, N] = ((2 * dt / dx2) + dt) * i
A[0, N + 1] = (-dt / dx2) * i
A[0, -1] = (-dt / dx2) * i
#Last row is special
A[N - 1, N - 1] = 1 - (3 * dt) * i
A[N - 1, N] = (-dt / dx2) * i
A[N - 1, -2] = (-dt / dx2) * i
A[N - 1, -1] = ((2 * dt / dx2) + dt) * i
#middle
for k in range(1, N - 1):
A[k, k] = 1 - (3 * dt) * i
A[k, k + N - 1] = (-dt / dx2) * i
A[k, k + N] = ((2 * dt / dx2) + dt) * i
A[k, k + N + 1] = (-dt / dx2) * i
#Bottom half
A[N :, :N] = A[:N, N:]
A[N:, N:] = A[:N, :N]
Ainv = np.linalg.inv(A)
#Advance through time
time = 0
while time < t_final:
b = np.concatenate((approximate_f, approximate_g), axis = 0)
x = np.dot(Ainv, b) #Solve Ax = b
approximate_f = x[:N]
approximate_g = x[N:]
time += dt
approximate_solution = np.concatenate((approximate_f, approximate_g), axis=0)
#Calculate the actual solution
actual_f = np.zeros((N, 1), dtype = np.complex)
actual_g = np.zeros((N, 1), dtype = np.complex)
for k in range(N):
actual_f[k, 0] = solution_f(t_final, k * dx)
actual_g[k, 0] = solution_g(t_final, k * dx)
actual_solution = np.concatenate((actual_f, actual_g), axis = 0)
print(np.sqrt(dx) * np.linalg.norm(actual_solution - approximate_solution))
It doesn't work. At least not in the beginning, it shouldn't start this slow. I should be unconditionally stable and converge to the right answer.
What's going wrong here?

The L2-norm can be a useful metric to test convergence, but isn't ideal when debugging as it doesn't explain what the problem is. Although your solution should be unconditionally stable, backward Euler won't necessarily converge to the right answer. Just like forward Euler is notoriously unstable (anti-dissipative), backward Euler is notoriously dissipative. Plotting your solutions confirms this. The numerical solutions converge to zero. For a next-order approximation, Crank-Nicolson is a reasonable candidate. The code below contains the more general theta-method so that you can tune the implicit-ness of the solution. theta=0.5 gives CN, theta=1 gives BE, and theta=0 gives FE.
A couple other things that I tweaked:
I selected a more appropriate time step of dt = (dx**2)/2 instead of dt = dx. That latter doesn't converge to the right solution using CN.
It's a minor note, but since t_final isn't guaranteed to be a multiple of dt, you weren't comparing solutions at the same time step.
With regards to your comment about it being slow: As you increase the spatial resolution, your time resolution needs to increase too. Even in your case with dt=dx, you have to perform a (1024 x 1024)*1024 matrix multiplication 1024 times. I didn't find this to take particularly long on my machine. I removed some unneeded concatenation to speed it up a bit, but changing the time step to dt = (dx**2)/2 will really bog things down, unfortunately. You could trying compiling with Numba if you are concerned with speed.
All that said, I didn't find tremendous success with the consistency of CN. I had to set N=2^6 to get anything at t_final=1. Increasing t_final makes this worse, decreasing t_final makes it better. Depending on your needs, you could looking into implementing TR-BDF2 or other linear multistep methods to improve this.
The code with a plot is below:
import numpy as np
import matplotlib.pyplot as plt
tau = 2 * np.pi
tau2 = tau * tau
i = complex(0,1)
def solution_f(t, x):
return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
def solution_g(t, x):
return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) *
np.exp((tau2 + 4) * i * t))
l=6
N = 2 ** l
dx = 1.0 / N
dx2 = dx * dx
dt = dx2/2
t_final = 1.
x_arr = np.arange(0,1,dx)
approximate_f = np.cos(tau*x_arr)
approximate_g = -i*np.sin(tau*x_arr)
H = np.zeros([2*N,2*N], dtype=np.complex)
for k in range(N):
H[k,k] = -3*i*dt
H[k,k+N] = (2/dx2+1)*i*dt
if k==0:
H[k,N+1] = -i/dx2*dt
H[k,-1] = -i/dx2*dt
elif k==N-1:
H[N-1,N] = -i/dx2*dt
H[N-1,-2] = -i/dx2*dt
else:
H[k,k+N-1] = -i/dx2*dt
H[k,k+N+1] = -i/dx2*dt
### Bottom half
H[N :, :N] = H[:N, N:]
H[N:, N:] = H[:N, :N]
### Theta method. 0.5 -> Crank Nicolson
theta=0.5
A = np.eye(2*N)+H*theta
B = np.eye(2*N)-H*(1-theta)
### Precompute for faster computations
mat = np.linalg.inv(A)#B
t = 0
b = np.concatenate((approximate_f, approximate_g))
while t < t_final:
t += dt
b = mat#b
approximate_f = b[:N]
approximate_g = b[N:]
approximate_solution = np.concatenate((approximate_f, approximate_g))
#Calculate the actual solution
actual_f = solution_f(t,np.arange(0,1,dx))
actual_g = solution_g(t,np.arange(0,1,dx))
actual_solution = np.concatenate((actual_f, actual_g))
plt.figure(figsize=(7,5))
plt.plot(x_arr,actual_f.real,c="C0",label=r"$Re(f_\mathrm{true})$")
plt.plot(x_arr,actual_f.imag,c="C1",label=r"$Im(f_\mathrm{true})$")
plt.plot(x_arr,approximate_f.real,c="C0",ls="--",label=r"$Re(f_\mathrm{num})$")
plt.plot(x_arr,approximate_f.imag,c="C1",ls="--",label=r"$Im(f_\mathrm{num})$")
plt.legend(loc=3,fontsize=12)
plt.xlabel("x")
plt.savefig("num_approx.png",dpi=150)

I am not going to go through all of your math, but I'm going to offer a suggestion.
The use of a direct calculation for fxx and gxx seems like a good candidate for being numerically unstable. Intuitively a first order method should be expected to make second order mistakes in the terms. Second order mistakes in the individual terms, after passing through that formula, wind up as constant order mistakes in the second derivative. Plus when your step size gets small, you are going to find that a quadratic formula makes even small roundoff mistakes turn into surprisingly large errors.
Instead I would suggest that you start by turning this into a first-order system of 4 functions, f, fx, g, and gx. And then proceed with backward's Euler on that system. Intuitively, with this approach, a first order method creates second order mistakes, which pass through a formula that creates first order mistakes of them. And now you are converging as you should from the start, and are also not as sensitive to propagation of roundoff errors.

Multiplying matrices that are stored in an array in Python

This may seem like a silly question, but I am a bloody newby in Python (and in programming). I am running a physics simulation that involves many (~10 000) 2x2 matrices that I store in an array. I call these matrices M and the array T in the code below. Then I simply want to compute the product of all of these matrices. This is what I came up with, but it looks ugly and it would be so much work for 10000+ 2x2 matrices. Is there a simpler way or an inbuilt function that I could use?
import numpy as np
#build matrix M (dont read it, just an example, doesnt matter here)
def M(k1 , k2 , x):
a = (k1 + k2) * np.exp(1j * (k2-k1) * x)
b = (k1 - k2) * np.exp(-1j * (k1 + k2) * x)
c = (k1 - k2) * np.exp(1j * (k2 + k1) * x)
d = (k1 + k2) * np.exp(-1j * (k2 - k1) * x)
M = np.array([[a , b] , [c , d]])
M *= 1. / (2. * k1)
return M
#array of test matrices T
T = np.array([M(1,2,3), M(3,3,3), M(54,3,9), M(33,11,42) ,M(12,9,5)])
#compute the matrix product of T[0] * T[1] *... * T[4]
#I originally had this line of code, which is wrong, as pointed out in the comments
#np.dot(T[0],np.dot(T[1], np.dot(T[2], np.dot(T[2],np.dot(T[3],T[4])))))
#it should be:
np.dot(T[0], np.dot(T[1], np.dot(T[2],np.dot(T[3],T[4]))))

Not very NumPythonic, but you could do:
reduce(lambda x,y: np.dot(x,y), T, np.eye(2))
Or more concisely, as suggested
reduce(np.dot, T, np.eye(2))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Fitting piecewise function: running into NaN problems due to boolean array indexing - python

How about using numpy.where with something like: np.where(z >= 0, sps.erfcx(z), sps.erfc(z)). I'm no numpy expert, so don't know if it's efficient. Looks elegant at least!

Related

converting a Matlab array to python

Sympy integration running into error, but Mathematica works fine

How to make fsolve function work inside a for loop?

Solving PDE with implicit euler in python - incorrect output

Multiplying matrices that are stored in an array in Python

Categories

Resources