I am given a list of dates in UTC, all hours cast to 00:00.
I'd like to determine if a (lunar) eclipse occurred in a given day (ie past 24 hours)
Considering the python snippet
from sykfield.api import load
eph = load('de421.bsp')
def eclipticangle(t):
moon, earth = eph['moon'], eph['earth']
e = earth.at(t)
x, y, _ = e.observe(moon).apparent().ecliptic_latlon()
return x.degrees
I am assuming one is able to determine if an eclipse occurred within 24hrs of a time t by
Checking that the first angle is close enough to 180 (easy)
Checking if the second degree is close enough to 0 (not so essy?)
Now as far as the answer in the comment suggests it is not so trivial to solve the second problem simply by testing if the angle is close to 0.
Therefore, my question is
Can someone provide a function to determine if a lunar eclipse occurred on a given day t?
Edit. This question was edited to reflect the feedback from Brandon Rhodes left in the comments below.
I just went through section 11.2.3 of the Explanatory Supplement to the Astronomical Almanac and tried turning it into Skyfield Python code. Here is what I came up with:
import numpy as np
from skyfield.api import load
from skyfield.constants import ERAD
from skyfield.functions import angle_between, length_of
from skyfield.searchlib import find_maxima
eph = load('de421.bsp')
earth = eph['earth']
moon = eph['moon']
sun = eph['sun']
def f(t):
e = earth.at(t).position.au
s = sun.at(t).position.au
m = moon.at(t).position.au
return angle_between(s - e, m - e)
f.step_days = 5.0
ts = load.timescale()
start_time = ts.utc(2019, 1, 1)
end_time = ts.utc(2020, 1, 1)
t, y = find_maxima(start_time, end_time, f)
e = earth.at(t).position.m
m = moon.at(t).position.m
s = sun.at(t).position.m
solar_radius_m = 696340e3
moon_radius_m = 1.7371e6
pi_m = np.arcsin(ERAD / length_of(m - e))
pi_s = np.arcsin(ERAD / length_of(s - e))
s_s = np.arcsin(solar_radius_m / length_of(s - e))
pi_1 = 0.998340 * pi_m
sigma = angle_between(s - e, e - m)
s_m = np.arcsin(moon_radius_m / length_of(e - m))
penumbral = sigma < 1.02 * (pi_1 + pi_s + s_s) + s_m
partial = sigma < 1.02 * (pi_1 + pi_s - s_s) + s_m
total = sigma < 1.02 * (pi_1 + pi_s - s_s) - s_m
mask = penumbral | partial | total
t = t[mask]
penumbral = penumbral[mask]
partial = partial[mask]
total = total[mask]
print(t.utc_strftime())
print(0 + penumbral + partial + total)
It produces a vector of times at which lunar eclipses occurred, and then a rating of how total the eclipse is:
['2019-01-21 05:12:51 UTC', '2019-07-16 21:31:27 UTC']
[3 2]
Its eclipse times are within 3 seconds of the times given in the huge table of lunar ephemerides at NASA:
https://eclipse.gsfc.nasa.gov/5MCLE/5MKLEcatalog.txt
Related
I am trying to integrate an equation by python. However, I don't understand why the integration doesn't run. The equation I want to integrate is:
Having:
, to find having and , with .
I am doing this procedure:
from sympy import *
x=symbols('x')
y=symbols('y')
Gamma = 0.167
sigma_8 = 0.9
M_8 = 6e14
gamma = (0.3*Gamma+0.2)*(2.92+1/3*log(x/M_8))
sigma = 0.9*(x/M_8)**(-gamma/3)
diff(sigma,x)
print(diff(sigma,x))
out:
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)
Then
import math
dnudM = math.sqrt(0.707)*1.686*(1+y)*diff(sigma,x)
print(dnudM)
Out:
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(1.41764430376593*y + 1.41764430376593)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)
Then
n = (1+(1/x**0.6))*exp(-x**2/2)*dnudM
print(n)
And out
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(x**(-0.6) + 1)*(1.41764430376593*y + 1.41764430376593)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)*exp(-x**2/2)
Finally, I arrive to this point that the integration doesn't produce any output.
n_H = integrate(n, x)
print(n_H)
It doesn't show either errors nor output!
The code you have ran to completion for me, but took nearly 30 hours.
I tweaked a couple of variable names and added in some timer code to track it.
Code:
import math
from sympy import *
import time
x=symbols('x')
y=symbols('y')
Gamma = 0.167
sigma_8 = 0.9
M_8 = 6e14
gamma_x = (0.3*Gamma+0.2)*(2.92+1/3*log(x/M_8))
sigma = 0.9*(x/M_8)**(-gamma_x/3)
diff_sigma_x = diff(sigma,x)
print(f"{diff_sigma_x=}\n")
dnudM = math.sqrt(0.707)*1.686*(1+y)*diff_sigma_x
print(f"{dnudM=}\n")
n = (1+(1/x**0.6))*exp(-x**2/2)*dnudM
print(f"{n=}\n")
start = time.time()
n_H = integrate(n, x)
print(f"{n_H=}\n")
end = time.time()
print(f"Time to integrate = {end-start} seconds")
Results
n_H=-2.10235303142289*(y + 1)*(1.0*Integral(-4.20001660788705e-11*exp(-x**2/2)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.897831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(-4.20001660788705e-11*exp(-x**2/2)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.297831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(1.41662964847297e-12*exp(-x**2/2)*log(x)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.897831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(1.41662964847297e-12*exp(-x**2/2)*log(x)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.297831723570746*x**(0.0277888888888889*log(x))), x))
Time to integrate = 107187.83417797089 seconds
I want to get my code to loop so that every time it performs the calculation, it adds basically does a cumulative sum for my variable delta_omega. i.e. for every calculation, it takes the previous values in the delta_omega array, adds them together and uses that value to perform the calculation again and so on. I'm really not sure how to go about this as I want to plot these results too.
import numpy as np
import matplotlib.pyplot as plt
delta_omega = np.linspace(-900*10**6, -100*10**6, m) #Hz - range of frequencies
i = 0
while i<len(delta_omega):
delta = delta_omega[i] - (k*v_cap) + (mu_eff*B)/hbar
p_ee = (s0*L/2) / (1 + s0 + (2*delta/L)**2) #population of the excited state
R = L * p_ee # scattering rate
F = hbar*k*(R) #scattering force on atoms
a = F/m_Rb #acceleration assumed constant
vf_slower = (v_cap**2 - (2*a*z0))**0.5 #velocity at the end of the slower
t_d = 1/a * (v_cap - vf_slower) #time taken during slower
# -------- After slower --------
da = 0.1 #(m) distance from end of slower to the middle of the MOT
vf_MOT = (vf_slower**2 - (2*a*da))**0.5 #(m/s) - velocity of the particles at MOT center
t_a = da/vf_MOT #(s) time taken after slower
r0 = 0.01 #MOT capture radius
vr_max = r0/(t_b+t_d+t_a) #maximum transveral velocity
vz_max = (v_cap**2 + 2*a_max*z0)**0.5 #m/s - maximum axial velocity
# -------- Flux of atoms captured --------
P = 10**(4.312-(4040/T)) #vapour pressure for liquid phase (use 4.857 for solid phase)
A = 5*10**-4 #area of the oven aperture
n = P/(k_b*T) #atomic number density
f_oven = ((n*A)/4) * (2/(np.pi)**0.5) * ((2*k_b*T)/m_Rb)**0.5
f = f_oven * (1 - np.exp(-vr_max**2/vp**2))*(1 - np.exp(-vz_max**2/vp**2))
i+=1
plt.plot(delta_omega, f)
A simple cumulative sum would be defining the variable outside the loop and adding to it
i = 0
x = 0
while i < 10:
x = x + 5 #do your work on the cumulative value here
i += 1
print("cumulative sum: {}".format(x))
so define a variable that will contain the cumulative sum, and every loop, add to it
I want the while loop at the end of my code to perform the calculation for each component of delta_omega, and print the result before moving on to the next number in the delta_omega array. When I run the code python outputs 'The rate of capture at a detuning of -5000000 is 3.0E+13' an infinite number of times before I manually stop it. I am unsure of why this is happening and how to fix it? I tried using break at the end of the loop, but this just performed the calculation once for the first component of the delta_omega array.
import numpy as np
from scipy.integrate import odeint
#import matplotlib.pyplot as plt
# Constants
m_Rb = 1.443*10**-25 #mass of rubidium 87
k_b = 1.38*10**-23
h = 6.63*10**-34
hbar = 1.05*10**-34
L = 38.116*10**6 #natural linewidth
epsilon_0 = 8.85418782*10**-12 #permittivity of free space
# Changable paramaters
lmbda = 780*10**-9 #wavelength of laser light
k = (2*np.pi)/lmbda #wavevector of laser light
B = 5*10**-4 #magnetic field strength
# D2 effective magetic moment
gj_gnd = 1 + (0.5*(0.5+1) + 0.5*(0.5+1) - 0*(0+1))/(2*0.5*(0.5+1))
mj_gnd = 0.5
gj_ex = 1 + (1.5*(1.5+1) + 0.5*(0.5+1) - 1*(1+1))/(2*1.5*(1.5+1))
mj_ex = 1.5
Bohr = 9.274*10**-24 #Bohr magneton value
mu_eff = Bohr*(gj_ex*mj_ex - gj_gnd*mj_gnd)
# -------- Before slower --------
T = 700 #temperature of oven
vp = ((2*k_b*T)/m_Rb)**0.5 #mean velocity of particles coming out of the oven
x_os = 0.1
a = (hbar*L*k)/(2*m_Rb) #max decelleration of atoms
vf_oven = (vp**2 + (2*a*x_os))**0.5
t_b = (2*x_os)/(vp + vf_oven) #time taken from oven to start of slower
# -------- During slower --------
length_slow = 0.5
vz_max = (vf_oven**2 + 2*a*length_slow)**0.5
Z = 0.7
#Z = np.linspace(0, length_slow, 100)
P = 10**(4.312-(4040/T)) #vapour pressure for liquid phase (use 4.857 for solid phase)
A = 5*10**-4 #area of the oven aperture
n = P/(k_b*T) #atomic number density
I = 1*10**5 #intensity
n0 = 1 #refraction constant for medium
E_0 = ((2*I)/(3*10**8*n0*epsilon_0))**0.5
Rabi = (E_0*3.5844*10**-29)/hbar
II_sat = (2*Rabi**2)/L**2
delta_omega = np.array([-5*10**6, -10*10**6, -30*10**6]) #range of frequencies
i = 0
while i<len(delta_omega):
B_p = (h/mu_eff) * (delta_omega[i] + (1/lmbda)*(vf_oven**2 - (2*a*length_slow))**0.5)
B_n = (h/mu_eff) * (delta_omega[i] - (1/lmbda)*(vf_oven**2 - (2*a*length_slow))**0.5)
delta_n = delta_omega[i] + (k*vf_oven) - (mu_eff*B_n)/hbar
delta_p = delta_omega[i] - (k*vf_oven) + (mu_eff*B_p)/hbar
F = (hbar*k*L)/2 * ((II_sat/(1+II_sat+(2*delta_n/L)**2))
- (II_sat/(1+II_sat+(2*delta_p/L)**2)))
accn = abs(F/m_Rb)
vf_slower = (vf_oven**2 - (2*accn*length_slow))**0.5
t_d = 1/accn * (vf_oven - vf_slower) #time taken during slower
# -------- After slower --------
da = 0.1 #distance from end of slower to the middle of the MOT
vf_MOT = (vf_slower**2 - (2*accn*da))**0.5
t_a = da/vf_MOT #time taken after slower
r0 = 0.01 #MOT capture radius
vr_max = r0/(t_b+t_d+t_a)
# -------- Flux of atoms captured --------
f_oven = ((n*A)/4) * (2/(np.pi)**0.5) * ((2*k_b*T)/m_Rb)**0.5
f = f_oven * (1 - np.exp(-vr_max**2/vp**2))*(1 - np.exp(-vz_max**2/vp**2))
print('The rate of capture at a detuning of', delta_omega[i], 'is', format(f, '.1E'))
It seems you forgot to increment i within the loop...
I am trying to implement 2 temperature models, the following equations:
C_e(∂T_e)/∂t=∇[k_e∇T_e ]-G(T_e-T_ph )+ A(r,t)
C_ph(∂T_ph)/∂t=∇[k_ph∇T_ph] + G(T_e-T_ph)
Code
from fipy.tools import numerix
import scipy
import fipy
import numpy as np
from fipy import CylindricalGrid1D
from fipy import Variable, CellVariable, TransientTerm, DiffusionTerm, Viewer, LinearLUSolver, LinearPCGSolver, \
LinearGMRESSolver, ImplicitDiffusionTerm, Grid1D
FIPY_SOLVERS = scipy
## Mesh
nr = 50
dr = 1e-7
# r = nr * dr
mesh = CylindricalGrid1D(nr=nr, dr=dr, origin=0)
x = mesh.cellCenters[0]
# Variables
T_e = CellVariable(name="electronTemp", mesh=mesh,hasOld=True)
T_e.setValue(300)
T_ph = CellVariable(name="phononTemp", mesh=mesh, hasOld=True)
T_ph.setValue(300)
G = CellVariable(name="EPC", mesh=mesh)
t = Variable()
# Material parameters
C_e = CellVariable(name="C_e", mesh=mesh)
k_e = CellVariable(name="k_e", mesh=mesh)
C_ph = CellVariable(name="C_ph", mesh=mesh)
k_ph = CellVariable(name="k_ph", mesh=mesh)
C_e = 4.15303 - (4.06897 * numerix.exp(T_e / -85120.8644))
C_ph = 4.10446 - 3.886 * numerix.exp(-T_ph / 373.8)
k_e = 0.1549 * T_e**-0.052
k_ph =1.24 + 16.29 * numerix.exp(-T_ph / 151.57)
G = numerix.exp(21.87 + 10.062 * numerix.log(numerix.log(T_e )- 5.4))
# Boundary conditions
T_e.constrain(300, where=x > 4.5e-6)
T_ph.constrain(300, where=x > 4.5e-6)
# Source 𝐴(𝑟,𝑡) = 𝑎𝐷(𝑟)𝜏−1 𝑒−𝑡/𝜏 , 𝐷(𝑟) = 𝑆𝑒 exp (−𝑟2/𝜎2)/√2𝜋𝜎2
sig = 1.0e-6
tau = 1e-15
S_e = 35
d_r = (S_e * 1.6e-9 * numerix.exp(-x**2 /sig**2)) / (numerix.sqrt(2. * 3.14 * sig**2))
A_t = numerix.exp(-t/tau)
a = (numerix.sqrt(2. * 3.14)) / (3.14 * sig)
A_r = a * d_r * tau**-1 * A_t
eq0 = (TransientTerm(var=T_e, coeff=C_e) == DiffusionTerm(var=T_e, coeff=k_e) - G*(T_e - T_ph) + A_r
eq1 =(TransientTerm(var=T_ph, coeff=C_ph) == DiffusionTerm(var=T_ph, coeff=k_ph) + G*(T_e - T_ph)
eq = eq0 & eq1
dt = 1e-18
steps = 7000
elapsed = 0.
vi = Viewer((T_e, T_ph), datamin=0., datamax=2e4)
for step in range(steps):
T_e.updateOld()
T_ph.updateOld()
vi.plot()
res = 1e100
dt *= 1.1
while res > 1:
res = eq.sweep(dt=dt)
print(t, res)
t.setValue(t + dt)
Problem
The code is working fine with very small dt = 1e-18, but I need to run it until e 1e-10.
With this time step is going to take very long time, and setting dt *= 1.1 the resduals at some point start to increase then
gives following runtime error:
factor is exactly singular
Even with very small increment dt*= 1.005 the same issue pop up.
Using dt= 1.001 runs the code for quit long time then the residual get stuck at certain value.
Questions
I there any error in the fipy formalism of the equations?
What causes the error?
Is the error because of time step increase? If yes, how can I increase my time step?
I've made a few more changes to the code that can get you to an elapsed time of 1e-10. The main changes are
Using ImplicitSourceTerm for the terms with G. This stabalizes the solution.
Applied underRelaxation=0.5 in the sweep step. This slows down the updates in the sweep loop so the feedback loop is damped down.
Removed FIPY_SOLVERS=scipy. This isn't doing anything. FIPY_SOLVERS is an environment variable that you set outside of the Python environment.
The way the boundary conditions were applied seemed strange so I applied them in a more canonical way.
The sweep loop is fixed to 10 sweeps to get to a steady state quickly. Note that as the solution gets close to a stable steady state, the residual won't get better necessarily. Probably want to go back to residual checks if you need an accurate transient.
from fipy.tools import numerix
import scipy
import fipy
import numpy as np
from fipy import CylindricalGrid1D
from fipy import Variable, CellVariable, TransientTerm, DiffusionTerm, Viewer, LinearLUSolver, LinearPCGSolver, \
LinearGMRESSolver, ImplicitDiffusionTerm, Grid1D, ImplicitSourceTerm
## Mesh
nr = 50
dr = 1e-7
# r = nr * dr
mesh = CylindricalGrid1D(nr=nr, dr=dr, origin=0)
x = mesh.cellCenters[0]
# Variables
T_e = CellVariable(name="electronTemp", mesh=mesh,hasOld=True)
T_e.setValue(300)
T_ph = CellVariable(name="phononTemp", mesh=mesh, hasOld=True)
T_ph.setValue(300)
G = CellVariable(name="EPC", mesh=mesh)
t = Variable()
# Material parameters
C_e = CellVariable(name="C_e", mesh=mesh)
k_e = CellVariable(name="k_e", mesh=mesh)
C_ph = CellVariable(name="C_ph", mesh=mesh)
k_ph = CellVariable(name="k_ph", mesh=mesh)
C_e = 4.15303 - (4.06897 * numerix.exp(T_e / -85120.8644))
C_ph = 4.10446 - 3.886 * numerix.exp(-T_ph / 373.8)
k_e = 0.1549 * T_e**-0.052
k_ph =1.24 + 16.29 * numerix.exp(-T_ph / 151.57)
G = numerix.exp(21.87 + 10.062 * numerix.log(numerix.log(T_e )- 5.4))
# Boundary conditions
T_e.constrain(300, where=mesh.facesRight)
T_ph.constrain(300, where=mesh.facesRight)
# Source 𝐴(𝑟,𝑡) = 𝑎𝐷(𝑟)𝜏−1 𝑒−𝑡/𝜏 , 𝐷(𝑟) = 𝑆𝑒 exp (−𝑟2/𝜎2)/√2𝜋𝜎2
sig = 1.0e-6
tau = 1e-15
S_e = 35
d_r = (S_e * 1.6e-9 * numerix.exp(-x**2 /sig**2)) / (numerix.sqrt(2. * 3.14 * sig**2))
A_t = numerix.exp(-t/tau)
a = (numerix.sqrt(2. * 3.14)) / (3.14 * sig)
A_r = a * d_r * tau**-1 * A_t
eq0 = (
TransientTerm(var=T_e, coeff=C_e) == \
DiffusionTerm(var=T_e, coeff=k_e) - \
ImplicitSourceTerm(coeff=G, var=T_e) + \
ImplicitSourceTerm(var=T_ph, coeff=G) + \
A_r)
eq1 = (TransientTerm(var=T_ph, coeff=C_ph) == DiffusionTerm(var=T_ph, coeff=k_ph) + ImplicitSourceTerm(var=T_e, coeff=G) - ImplicitSourceTerm(coeff=G, var=T_ph))
eq = eq0 & eq1
dt = 1e-18
steps = 7000
elapsed = 0.
vi = Viewer((T_e, T_ph), datamin=0., datamax=2e4)
for step in range(steps):
T_e.updateOld()
T_ph.updateOld()
vi.plot()
res = 1e100
dt *= 1.1
count = 0
while count < 10:
res = eq.sweep(dt=dt, underRelaxation=0.5)
print(t, res)
count += 1
print('elapsed:', t.value)
t.setValue(t + dt)
Regarding your questions.
I there any error in the fipy formalism of the equations?
Actually, no. Nothing wrong with the formalism, but better to use ImplicitSourceTerm.
What causes the error?
There are two source of instability in this system. The source terms inside the equation when written explicitly are unstable above a certain time step. Using an ImplcitSourceTerm removes this instablity. There is also some sort of instability in the coupling of the equations. I think that using under relaxation helps with that.
Is the error because of time step increase? If yes, how can I increase my time step?
Explained above.
In addition to #wd15's answer:
Your equations are extremely non-linear. You will likely benefit from Newton iterations to get decent convergence.
As #TimRoberts said, geometrically increasing the time step without bound is probably not a good idea.
I've recently posted a package called steppyngstounes that takes care of adapting timesteps. Although a standalone package, it's intended to work with FiPy. For example, you could change your solve loop to this:
from steppyngstounes import FixedStepper, PIDStepper
T_e.updateOld()
T_ph.updateOld()
for checkpoint in FixedStepper(start=0, stop=1e-10, size=1e-12):
for step in PIDStepper(start=checkpoint.begin,
stop=checkpoint.end,
size=dt):
res = 1e100
for sweep in range(10):
res = eq.sweep(dt=dt, underRelaxation=0.5)
print(t, sweep, res)
if step.succeeded(error=res / 1000):
T_e.updateOld()
T_ph.updateOld()
t.value = step.end
else:
T_e.value = T_e.old
T_ph.value = T_ph.old
print('elapsed:', t.value)
# the last step might have been smaller than possible,
# if it was near the end of the checkpoint range
dt = step.want
_ = checkpoint.succeeded()
vi.plot()
This code will update the viewer every 1e-12 time units, and adaptively make it's way between those checkpoints. There are other steppers in the package that would facilitate taking geometrically or exponentially increasing checkpoints, if that kept things more interesting.
You could probably get better overall performance by sweeping fewer times and letting the adapter take much smaller time steps in the beginning. I found that no time step was small enough to get the initial residual lower than 777.9. After the first couple of steps, the error metric could probably be much more aggressive, giving more accurate results.
This question is a follow-up to a recent question posted regarding MATLAB being twice as fast as Numpy.
I currently have a Gauss-Seidel solver implemented in both MATLAB and Numpy which acts on a 2D axisymmetric domain (cylindrical coordinates). The code was originally written in MATLAB and then transferred to Python. The Matlab code runs in ~20 s whereas the Numpy codes takes ~30 s. I would like to use Numpy, however, since this code is part of a larger program, the almost twice as long simulation time is a significant drawback.
The algorithm simply solves the discretized Laplace equation on a rectangular mesh (in cylindrical coordinates). It finishes when the maximum difference between updates on the mesh is less than the indicated tolerance.
The code in Numpy is:
import numpy as np
import time
T = np.transpose
# geometry
length = 0.008
width = 0.002
# mesh
nz = 256
nr = 64
# step sizes
dz = length/nz
dr = width/nr
# node position matrices
r = np.tile(np.linspace(0,width,nr+1), (nz+1, 1)).T
ri = r/dr
# equation coefficients
cr = dz**2 / (2*(dr**2 + dz**2))
cz = dr**2 / (2*(dr**2 + dz**2))
# initial/boundary conditions
v = np.zeros((nr+1,nz+1))
v[:,0] = 1100
v[:,-1] = 0
v[31:,29:40] = 1000
v[19:,54:65] = -200
# convergence parameters
tol = 1e-4
# Gauss-Seidel solver
tic = time.time()
max_v_diff = 1;
while (max_v_diff > tol):
v_old = v.copy()
# left boundary updates
v[0,1:nz] = cr*2*v[1,1:nz] + cz*(v[0,0:nz-1] + v[0,2:nz+2])
# internal updates
v[1:nr,1:nz] = cr*((1 - 1/(2*ri[1:nr,1:nz]))*v[0:nr-1,1:nz] + (1 + 1/(2*ri[1:nr,1:nz]))*v[2:nr+1,1:nz]) + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1])
# right boundary updates
v[nr,1:nz] = cr*2*v[nr-1,1:nz] + cz*(v[nr,0:nz-1] + v[nr,2:nz+1])
# reapply grid potentials
v[31:,29:40] = 1000
v[19:,54:65] = -200
# check for convergence
v_diff = v - v_old
max_v_diff = np.absolute(v_diff).max()
toc = time.time() - tic
print(toc)
This is actually not the full algorithm which I use. The full algorithm uses successive overrelaxation and a checkerboard iteration scheme to improve speed and remove solver directionality, but for purposes of simplicity I provided this easier to understand version. The speed drawbacks in Numpy are more pronounced for the full version (17s vs. 9s simulation times respectively in Numpy and MATLAB).
I tried the solution from the previous question, changing v to a column-major order array, but there was no performance increase.
Any suggestions?
Edit: The Matlab code for reference is:
% geometry
length = 0.008;
width = 0.002;
% mesh
nz = 256;
nr = 64;
% step sizes
dz = length/nz;
dr = width/nr;
% node position matrices
r = repmat(linspace(0,width,nr+1)', 1, nz+1);
ri = r./dr;
% equation coefficients
cr = dz^2/(2*(dr^2+dz^2));
cz = dr^2/(2*(dr^2+dz^2));
% initial/boundary conditions
v = zeros(nr+1,nz+1);
v(1:nr+1,1) = 1100;
v(1:nr+1,nz+1) = 0;
v(32:nr+1,30:40) = 1000;
v(20:nr+1,55:65) = -200;
% convergence parameters
tol = 1e-4;
max_v_diff = 1;
% Gauss-Seidel Solver
tic
while (max_v_diff > tol)
v_old = v;
% left boundary updates
v(1,2:nz) = cr.*2.*v(2,2:nz) + cz.*( v(1,1:nz-1) + v(1,3:nz+1) );
% internal updates
v(2:nr,2:nz) = cr.*( (1 - 1./(2.*ri(2:nr,2:nz))).*v(1:nr-1,2:nz) + (1 + 1./(2.*ri(2:nr,2:nz))).*v(3:nr+1,2:nz) ) + cz.*( v(2:nr,1:nz-1) + v(2:nr,3:nz+1) );
% right boundary updates
v(nr+1,2:nz) = cr.*2.*v(nr,2:nz) + cz.*( v(nr+1,1:nz-1) + v(nr+1,3:nz+1) );
% reapply grid potentials
v(32:nr+1,30:40) = 1000;
v(20:nr+1,55:65) = -200;
% check for convergence
max_v_diff = max(max(abs(v - v_old)));
end
toc
I've been able to reduce the running time in my laptop from 66 to 21 seconds by following this process:
Find the bottleneck. I profiled the code using line_profiler from the IPython console to find the lines that took most time. It turned out that over 80% of the time was spent in the line that does "internal updates".
Choose a way to optimise it. There are several tools to speed code up in numpy (Cython, numexpr, weave...). In particular, scipy.weave.blitz is well suited to compile numpy expressions, like the offending line, into fast code. In theory, that line could be wrapped inside "..." and executed as weave.blitz("...") but the array that's being updated is used in the computation, so as stated by point #4 in the docs a temporary array must be used to keep the same result:
expr = "temp = cr*((1 - 1/(2*ri[1:nr,1:nz]))*v[0:nr-1,1:nz] + (1 + 1/(2*ri[1:nr,1:nz]))*v[2:nr+1,1:nz]) + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1]); v[1:nr,1:nz] = temp"
temp = np.empty((nr-1, nz-1))
...
while ...
# internal updates
weave.blitz(expr)
After checking that the results are correct, runtime checks are disabled by using weave.blitz(expr, check_size=0). The code now runs in 34 seconds.
Building up on Jaime's work, precompute the constant factors A and B in the expression. The code runs in 21 seconds (with minimal changes but it now needs a compiler).
This is the core of the code:
from scipy import weave
# [...] Set up code till "# Gauss-Seidel solver"
tic = time.time()
max_v_diff = 1;
A = cr * (1 - 1/(2*ri[1:nr,1:nz]))
B = cr * (1 + 1/(2*ri[1:nr,1:nz]))
expr = "temp = A*v[0:nr-1,1:nz] + B*v[2:nr+1,1:nz] + cz*(v[1:nr,0:nz-1] + v[1:nr,2:nz+1]); v[1:nr,1:nz] = temp"
temp = np.empty((nr-1, nz-1))
while (max_v_diff > tol):
v_old = v.copy()
# left boundary updates
v[0,1:nz] = cr*2*v[1,1:nz] + cz*(v[0,0:nz-1] + v[0,2:nz+2])
# internal updates
weave.blitz(expr, check_size=0)
# right boundary updates
v[nr,1:nz] = cr*2*v[nr-1,1:nz] + cz*(v[nr,0:nz-1] + v[nr,2:nz+1])
# reapply grid potentials
v[31:,29:40] = 1000
v[19:,54:65] = -200
# check for convergence
v_diff = v - v_old
max_v_diff = np.absolute(v_diff).max()
toc = time.time() - tic
On my laptop your code runs in about 45 seconds. By trying to reduce creation of intermediate arrays to the bare minimum, including reuse of pre-allocated work arrays, I have managed to reduce that time to 27 seconds. That should put you back at the level of MATLAB, but your code would be less readable. Anyway, find below code to replace everything below your # Gauss-Seidel solver comment:
# work arrays
v_old = np.empty_like(v)
w1 = np.empty_like(v[0, 1:nz])
w2 = np.empty_like(v[1:nr,1:nz])
w3 = np.empty_like(v[nr, 1:nz])
# constants
A = cr * (1 - 1/(2*ri[1:nr,1:nz]))
B = cr * (1 + 1/(2*ri[1:nr,1:nz]))
# Gauss-Seidel solver
tic = time.time()
max_v_diff = 1;
while (max_v_diff > tol):
v_old[:] = v
# left boundary updates
np.add(v_old[0, 0:nz-1], v_old[0, 2:nz+2], out=v[0, 1:nz])
v[0, 1:nz] *= cz
np.multiply(2*cr, v_old[1, 1:nz], out=w1)
v[0, 1:nz] += w1
# internal updates
np.add(v_old[1:nr, 0:nz-1], v_old[1:nr, 2:nz+1], out=v[1:nr, 1:nz])
v[1:nr,1:nz] *= cz
np.multiply(A, v_old[0:nr-1, 1:nz], out=w2)
v[1:nr,1:nz] += w2
np.multiply(B, v_old[2:nr+1, 1:nz], out=w2)
v[1:nr,1:nz] += w2
# right boundary updates
np.add(v_old[nr, 0:nz-1], v_old[nr, 2:nz+1], out=v[nr, 1:nz])
v[nr, 1:nz] *= cz
np.multiply(2*cr, v_old[nr-1, 1:nz], out=w3)
v[nr,1:nz] += w3
# reapply grid potentials
v[31:,29:40] = 1000
v[19:,54:65] = -200
# check for convergence
v_old -= v
max_v_diff = np.absolute(v_old).max()
toc = time.time() - tic