I am relatively new to parallel computing and the Numba package. I am looking for optimization methods for my stupendously parallel N-body simulation. I've applied everything I know so far with Numpy arrays, JIT compliers, and multiprocessing. However, I'm still not getting the speed I desire (I've seen videos where their codes are MUCH faster still)
What I have currently is a rather simple python integrator using Runge-Kutta Integration and two equations of motion. I work with numerical integrators a lot in my field so I would definitely like to pick up a few more tricks from you guys.
I have posted my code below, but essentially, I have one main function called "Motion" which takes 2 initial conditions and integrate their motion for a set amount of time. I have JITTED this function and all the functions it called upon iteratively: "RK4", "ODE", "Electric Field". Lastly, I call the pool function from Multiprocessing to parallelize the "Motion" function and insert different initial conditions for each simulation it runs.
Again, I've implemented every types of optimization I'm aware of, however I'm still not very happy with its speed. I've posted my code below. If anyone can spot a piece of algorithm that could be further optimized, that would be extremely helpful and educational (for me at least)! Thank you for your time.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from IPython.display import clear_output
from scipy import interpolate
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = np.int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = np.int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = np.int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of Proton eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
L_sample = np.linspace(1,10,100)
phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
L0i = phidot_to_L(omega/m)
return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
#njit(nogil= True)
def Electric_Field(t,r):
E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
#njit(nogil= True)
def ODE(t,r):
Er, Ephi = Electric_Field(t,r) #Pull out the electric so we only call it once.
Ldot = Ephi * r[0]**3 / (Re*Beq)
Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
return np.array([Ldot,Phidot])
#njit(nogil= True)
def RK4(t,r): #Standard Runge-Kutta Integration Algorthim
k1 = Step_Size*ODE(t,r)
k2 = Step_Size*ODE(t+Step_Size/2, r+k1/2)
k3 = Step_Size*ODE(t+Step_Size/2, r+k2)
k4 = Step_Size*ODE(t+Step_Size, r+k3)
return r + k1/6 + k2/3 + k3/3 + k4/6
#njit(nogil= True)
def Motion(L0,Phi0): #Insert Inital Conditions and it will loop through the RK4 integrator and output all its positions.
L_Array = np.zeros_like(time_array)
Phi_Array = np.zeros_like(time_array)
L_Array[0] = L0
Phi_Array[0] = Phi0
for i in range(1,N_points):
L_Array[i], Phi_Array[i] = RK4(time_array[i-1], np.array([ L_Array[i-1],Phi_Array[i-1] ]) )
return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of percsion in our data
# Location = Motion(5,0)
# x = Location[0]*np.cos(Location[1])
# y = Location[0]*np.sin(Location[1])
# plt.plot(x,y,"o", markersize = 0.5)
# ts = time()
# Motion(5,0)
# print('Solo Time:', time() - ts)
"Getting my Arrays ready so I can index it"
Split = int(np.sqrt(N_Particles))
L0i = np.linspace(4.4,5.5,Split)
Phi0i = np.linspace(0,360,Split) / 180 * np.pi
L0_Grid = np.repeat(L0i,Split)
# ^Here I want to run a meshgrid of L0i and Phi0, so I repeat L0i using this function and mod (%) the index on the Phi Function
#Create Appending Array
results = []
def get_results(result): #Call back to this array from Multiprocessing to append the results it gives per run.
results.append(result)
clear_output()
print("Getting Results %0.2f" %(len(results)/N_Particles * 100), end='\r')
if __name__ == '__main__':
#Call In Multiprocessing
pool = mp.Pool(mp.cpu_count()) #Counting number of threads to start
ts = time() #Timing this process. Begins here
for ii in range(N_Particles): #Not too sure what this does, but it works - I assume it parallelizes this loop
pool.apply_async(Motion, args = (L0_Grid[ii],Phi0i[int(ii%Split)]), callback=get_results)
pool.close() #I'm not too sure what this does but everyone uses it, and it won't work without it
pool.join()
print('Time in MP parallel:', time() - ts) #Output Time
I think the main reason why your code is slow is because your Runge-Kutta method has fixed time steps. Fancy ODE solvers will select the biggest time step that allows a tolerable amount of error. One example is the LSODA ODE solver from ODEPACK.
Below I've re-written your code using NumbaLSODA. On my computer, it speeds up your code by about 200x.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from scipy import interpolate
from NumbaLSODA import lsoda_sig, lsoda
from numba import cfunc
import numba as nb
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = np.int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = np.int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = np.int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of Proton eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
L_sample = np.linspace(1,10,100)
phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
L0i = phidot_to_L(omega/m)
return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
#njit
def Electric_Field(t,r):
E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
#cfunc(lsoda_sig)
def ODE(t, r_, dr, p):
r = nb.carray(r_, (2,))
Er, Ephi = Electric_Field(t,r)
Ldot = Ephi * r[0]**3 / (Re*Beq)
Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
dr[0] = Ldot
dr[1] = Phidot
funcptr = ODE.address
#njit
def Motion(L0,Phi0):
u0 = np.array([L0,Phi0],np.float64)
data = np.array([5.0])
usol, success = lsoda(funcptr, u0, time_array, data)
L_Array = usol[:,0]
Phi_Array = usol[:,1]
return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of percsion in our data
Location = Motion(5,0)
x = Location[0]*np.cos(Location[1])
y = Location[0]*np.sin(Location[1])
plt.plot(x,y,"o", markersize = 0.5)
ts = time()
Motion(5,0)
print('Solo Time:', time() - ts)
Related
I'm attempting to implement a Gaussian smoothing/flattening function in my Python 3.10 script to flatten a set of XY-points. For each data point, I'm creating a Y buffer and a Gaussian kernel, which I use to flatten each one of the Y-points based on it's neighbours.
Here are some sources on the Gaussian-smoothing method:
Source 1
Source 2
I'm using the NumPy module for my data arrays, and MatPlotLib to plot the data.
I wrote a minimal reproducible example, with some randomly-generated data, and each one of the arguments needed for the Gaussian function listed at the top of the main function:
import numpy as np
import matplotlib.pyplot as plt
import time
def main():
dataSize = 1000
yDataRange = [-4, 4]
reachPercentage = 0.1
sigma = 10
phi = 0
amplitude = 1
testXData = np.arange(stop = dataSize)
testYData = np.random.uniform(low = yDataRange[0], high = yDataRange[1], size = dataSize)
print("Flattening...")
startTime = time.time()
flattenedYData = GaussianFlattenData(testXData, testYData, reachPercentage, sigma, phi, amplitude)
totalTime = round(time.time() - startTime, 2)
print("Flattened! (" + str(totalTime) + " sec)")
plt.title(str(totalTime) + " sec")
plt.plot(testXData, testYData, label = "Original Data")
plt.plot(testXData, flattenedYData, label = "Flattened Data")
plt.legend()
plt.show()
plt.close()
def GaussianFlattenData(xData, yData, reachPercentage, sigma, phi, amplitude):
flattenedYData = np.empty(shape = len(xData), dtype = float)
# For each data point, create a Y buffer and a Gaussian kernel, and flatten it based on it's neighbours
for i in range(len(xData)):
gaussianCenter = xData[i]
baseReachEdges = GetGaussianValueX((GetGaussianValueY(0, 0, sigma, phi, amplitude) * reachPercentage), 0, sigma, phi, amplitude)
reachEdgeIndices = [FindInArray(xData, GetClosestNum((gaussianCenter + baseReachEdges[0]), xData)),
FindInArray(xData, GetClosestNum((gaussianCenter + baseReachEdges[1]), xData))]
currDataScanNum = reachEdgeIndices[0] - reachEdgeIndices[1]
# Creating Y buffer and Gaussian kernel...
currYPoints = np.empty(shape = currDataScanNum, dtype = float)
kernel = np.empty(shape = currDataScanNum, dtype = float)
for j in range(currDataScanNum):
currYPoints[j] = yData[j + reachEdgeIndices[1]]
kernel[j] = GetGaussianValueY(j, (i - reachEdgeIndices[1]), sigma, phi, amplitude)
# Dividing kernel by its sum...
kernelSum = np.sum(kernel)
for j in range(len(kernel)):
kernel[j] = (kernel[j] / kernelSum)
# Acquiring the current flattened Y point...
newCurrYPoints = np.empty(shape = len(currYPoints), dtype = float)
for j in range(len(currYPoints)):
newCurrYPoints[j] = currYPoints[j] * kernel[j]
flattenedYData[i] = np.sum(newCurrYPoints)
return flattenedYData
def GetGaussianValueX(y, mu, sigma, phi, amplitude):
x = ((sigma * np.sqrt(-2 * np.log(y / (amplitude * np.cos(phi))))) + mu)
return [x, (mu - (x - mu))]
def GetGaussianValueY(x, mu, sigma, phi, amplitude):
y = ((amplitude * np.cos(phi)) * np.exp(-np.power(((x - mu) / sigma), 2) / 2))
return y
def GetClosestNum(base, nums):
closestIdx = 0
closestDiff = np.abs(base - nums[0])
idx = 1
while (idx < len(nums)):
currDiff = np.abs(base - nums[idx])
if (currDiff < closestDiff):
closestDiff = currDiff
closestIdx = idx
idx += 1
return nums[closestIdx]
def FindInArray(arr, value):
for i in range(len(arr)):
if (arr[i] == value):
return i
return -1
if (__name__ == "__main__"):
main()
In the example above, I generate 1,000 random data points, between the ranges of -4 and 4. The reachPercentage variable is the percentage of the Gaussian amplitude above which the Gaussian values will be inserted into the kernel. The sigma, phi and amplitude variables are all inputs to the Gaussian function which will actually generate the Gaussians for each Y-data point to be smoothened.
I wrote some additional utility functions which I needed as well.
The script above works to smoothen the generated data, and I get the following plot:
Blue being the original data, and Orange being the flattened data.
However, it takes a surprisingly long amount of time to smoothen even smaller amounts of data. In the example above I generated 1,000 data points, and it takes ~8 seconds to flatten that. With datasets exceeding 10,000 in number, it can easily take over 10 minutes.
Since this is a very popular and known way of smoothening data, I was wondering why this script ran so slow. I originally had this implemented with standard Pythons Lists with calling append, however it was extremely slow. I hoped that using the NumPy arrays instead without calling the append function would make it faster, but that is not really the case.
Is there a way to speed up this process? Is there a Gaussian-smoothing function that already exists out there, that takes in the same arguments, and that could do the job faster?
Thanks for reading my post, any guidance is appreciated.
You have a number of loops - those tend to slow you down.
Here are two examples. Refactoring GetClosestNum to this:
def GetClosestNum(base, nums):
nums = np.array(nums)
diffs = np.abs(nums - base)
return nums[np.argmin(diffs)]
and refactoring FindInArray to this:
def FindInArray(arr, value):
res = np.where(np.array(arr) - value == 0)[0]
if res.size > 0:
return res[0]
else:
return -1
lets me process 5000 datapoints in 1.5s instead of the 54s it took with your original code.
Numpy lets you do a lot of powerful stuff without looping - Jake Vanderplas has a few really good (oldie but goodie) videos on using numpy constructs in place of loops to massively increase speed - https://www.youtube.com/watch?v=EEUXKG97YRw.
After asking people on the Python forums, as well as doing some more searching online, I managed to find much faster alternatives to most of the functions I had in my loop.
In order to get a better image of which parts of the smoothing function took up the most time, I subdivided the code into 4 parts, and timed each one to see how much each part contributed to the total runtime. To my surprise, the part that took up over 90% of the time, was the first part of the loop:
gaussianCenter = xData[i]
baseReachEdges = GetGaussianValueX((GetGaussianValueY(0, 0, sigma, phi, amplitude) * reachPercentage), 0, sigma, phi, amplitude)
reachEdgeIndices = [FindInArray(xData, GetClosestNum((gaussianCenter + baseReachEdges[0]), xData)),
FindInArray(xData, GetClosestNum((gaussianCenter + baseReachEdges[1]), xData))]
currDataScanNum = reachEdgeIndices[0] - reachEdgeIndices[1]
Luckly, the people on the Python forums and here were able to assist me, and I was able find a much faster alternative GetClosestNum function (thanks Vin), as well as removing the FindInArray function:
There are also replacements in the latter parts of the loop, where instead of having 3 for loops, they were all replaced my NumPy iterative expressions.
The whole script now looks like this:
import numpy as np
import matplotlib.pyplot as plt
import time
def main():
dataSize = 3073
yDataRange = [-4, 4]
reachPercentage = 0.001
sigma = 100
phi = 0
amplitude = 1
testXData = np.arange(stop = dataSize)
testYData = np.random.uniform(low = yDataRange[0], high = yDataRange[1], size = dataSize)
print("Flattening...")
startTime = time.time()
flattenedYData = GaussianFlattenData(testXData, testYData, reachPercentage, sigma, phi, amplitude)
totalTime = round(time.time() - startTime, 2)
print("Flattened! (" + str(totalTime) + " sec)")
plt.title(str(totalTime) + " sec")
plt.plot(testXData, testYData, label = "Original Data")
plt.plot(testXData, flattenedYData, label = "Flattened Data")
plt.legend()
plt.show()
plt.close()
def GaussianFlattenData(xData, yData, reachPercentage, sigma, phi, amplitude):
flattenedYData = np.empty(shape = len(xData), dtype = float)
# For each data point, create a Y buffer and a Gaussian kernel, and flatten it based on it's neighbours
for i in range(len(xData)):
gaussianCenter = xData[i]
baseReachEdges = GetGaussianValueX((GetGaussianValueY(0, 0, sigma, phi, amplitude) * reachPercentage), 0, sigma, phi, amplitude)
reachEdgeIndices = [np.where(xData == GetClosestNum((gaussianCenter + baseReachEdges[0]), xData))[0][0],
np.where(xData == GetClosestNum((gaussianCenter + baseReachEdges[1]), xData))[0][0]]
currDataScanNum = reachEdgeIndices[0] - reachEdgeIndices[1]
# Creating Y buffer and Gaussian kernel...
currYPoints = yData[reachEdgeIndices[1] : reachEdgeIndices[1] + currDataScanNum]
kernel = GetGaussianValueY(np.arange(currDataScanNum), (i - reachEdgeIndices[1]), sigma, phi, amplitude)
# Acquiring the current flattened Y point...
flattenedYData[i] = np.sum(currYPoints * (kernel / np.sum(kernel)))
return flattenedYData
def GetGaussianValueX(y, mu, sigma, phi, amplitude):
x = ((sigma * np.sqrt(-2 * np.log(y / (amplitude * np.cos(phi))))) + mu)
return [x, (mu - (x - mu))]
def GetGaussianValueY(x, mu, sigma, phi, amplitude):
y = ((amplitude * np.cos(phi)) * np.exp(-np.power(((x - mu) / sigma), 2) / 2))
return y
def GetClosestNum(base, nums):
nums = np.asarray(nums)
return nums[(np.abs(nums - base)).argmin()]
if (__name__ == "__main__"):
main()
Instead of taking ~8 seconds to process the 1,000 data points, it now takes merely ~0.15 seconds!
It also takes ~1.75 seconds to process the 10,000 points.
Thanks for the feedback everyone, cheers!
I am trying to integrate an equation by python. However, I don't understand why the integration doesn't run. The equation I want to integrate is:
Having:
, to find having and , with .
I am doing this procedure:
from sympy import *
x=symbols('x')
y=symbols('y')
Gamma = 0.167
sigma_8 = 0.9
M_8 = 6e14
gamma = (0.3*Gamma+0.2)*(2.92+1/3*log(x/M_8))
sigma = 0.9*(x/M_8)**(-gamma/3)
diff(sigma,x)
print(diff(sigma,x))
out:
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)
Then
import math
dnudM = math.sqrt(0.707)*1.686*(1+y)*diff(sigma,x)
print(dnudM)
Out:
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(1.41764430376593*y + 1.41764430376593)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)
Then
n = (1+(1/x**0.6))*exp(-x**2/2)*dnudM
print(n)
And out
0.9*(1.66666666666667e-15*x)**(-0.0277888888888889*log(1.66666666666667e-15*x) - 0.243430666666667)*(x**(-0.6) + 1)*(1.41764430376593*y + 1.41764430376593)*(600000000000000.0*(-4.63148148148148e-17*log(1.66666666666667e-15*x) - 4.05717777777778e-16)/x - 0.0277888888888889*log(1.66666666666667e-15*x)/x)*exp(-x**2/2)
Finally, I arrive to this point that the integration doesn't produce any output.
n_H = integrate(n, x)
print(n_H)
It doesn't show either errors nor output!
The code you have ran to completion for me, but took nearly 30 hours.
I tweaked a couple of variable names and added in some timer code to track it.
Code:
import math
from sympy import *
import time
x=symbols('x')
y=symbols('y')
Gamma = 0.167
sigma_8 = 0.9
M_8 = 6e14
gamma_x = (0.3*Gamma+0.2)*(2.92+1/3*log(x/M_8))
sigma = 0.9*(x/M_8)**(-gamma_x/3)
diff_sigma_x = diff(sigma,x)
print(f"{diff_sigma_x=}\n")
dnudM = math.sqrt(0.707)*1.686*(1+y)*diff_sigma_x
print(f"{dnudM=}\n")
n = (1+(1/x**0.6))*exp(-x**2/2)*dnudM
print(f"{n=}\n")
start = time.time()
n_H = integrate(n, x)
print(f"{n_H=}\n")
end = time.time()
print(f"Time to integrate = {end-start} seconds")
Results
n_H=-2.10235303142289*(y + 1)*(1.0*Integral(-4.20001660788705e-11*exp(-x**2/2)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.897831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(-4.20001660788705e-11*exp(-x**2/2)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.297831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(1.41662964847297e-12*exp(-x**2/2)*log(x)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.897831723570746*x**(0.0277888888888889*log(x))), x) + 1.0*Integral(1.41662964847297e-12*exp(-x**2/2)*log(x)/(1.66666666666667e-15**(0.0277888888888889*log(x))*x**0.297831723570746*x**(0.0277888888888889*log(x))), x))
Time to integrate = 107187.83417797089 seconds
I am trying to implement 2 temperature models, the following equations:
C_e(∂T_e)/∂t=∇[k_e∇T_e ]-G(T_e-T_ph )+ A(r,t)
C_ph(∂T_ph)/∂t=∇[k_ph∇T_ph] + G(T_e-T_ph)
Code
from fipy.tools import numerix
import scipy
import fipy
import numpy as np
from fipy import CylindricalGrid1D
from fipy import Variable, CellVariable, TransientTerm, DiffusionTerm, Viewer, LinearLUSolver, LinearPCGSolver, \
LinearGMRESSolver, ImplicitDiffusionTerm, Grid1D
FIPY_SOLVERS = scipy
## Mesh
nr = 50
dr = 1e-7
# r = nr * dr
mesh = CylindricalGrid1D(nr=nr, dr=dr, origin=0)
x = mesh.cellCenters[0]
# Variables
T_e = CellVariable(name="electronTemp", mesh=mesh,hasOld=True)
T_e.setValue(300)
T_ph = CellVariable(name="phononTemp", mesh=mesh, hasOld=True)
T_ph.setValue(300)
G = CellVariable(name="EPC", mesh=mesh)
t = Variable()
# Material parameters
C_e = CellVariable(name="C_e", mesh=mesh)
k_e = CellVariable(name="k_e", mesh=mesh)
C_ph = CellVariable(name="C_ph", mesh=mesh)
k_ph = CellVariable(name="k_ph", mesh=mesh)
C_e = 4.15303 - (4.06897 * numerix.exp(T_e / -85120.8644))
C_ph = 4.10446 - 3.886 * numerix.exp(-T_ph / 373.8)
k_e = 0.1549 * T_e**-0.052
k_ph =1.24 + 16.29 * numerix.exp(-T_ph / 151.57)
G = numerix.exp(21.87 + 10.062 * numerix.log(numerix.log(T_e )- 5.4))
# Boundary conditions
T_e.constrain(300, where=x > 4.5e-6)
T_ph.constrain(300, where=x > 4.5e-6)
# Source 𝐴(𝑟,𝑡) = 𝑎𝐷(𝑟)𝜏−1 𝑒−𝑡/𝜏 , 𝐷(𝑟) = 𝑆𝑒 exp (−𝑟2/𝜎2)/√2𝜋𝜎2
sig = 1.0e-6
tau = 1e-15
S_e = 35
d_r = (S_e * 1.6e-9 * numerix.exp(-x**2 /sig**2)) / (numerix.sqrt(2. * 3.14 * sig**2))
A_t = numerix.exp(-t/tau)
a = (numerix.sqrt(2. * 3.14)) / (3.14 * sig)
A_r = a * d_r * tau**-1 * A_t
eq0 = (TransientTerm(var=T_e, coeff=C_e) == DiffusionTerm(var=T_e, coeff=k_e) - G*(T_e - T_ph) + A_r
eq1 =(TransientTerm(var=T_ph, coeff=C_ph) == DiffusionTerm(var=T_ph, coeff=k_ph) + G*(T_e - T_ph)
eq = eq0 & eq1
dt = 1e-18
steps = 7000
elapsed = 0.
vi = Viewer((T_e, T_ph), datamin=0., datamax=2e4)
for step in range(steps):
T_e.updateOld()
T_ph.updateOld()
vi.plot()
res = 1e100
dt *= 1.1
while res > 1:
res = eq.sweep(dt=dt)
print(t, res)
t.setValue(t + dt)
Problem
The code is working fine with very small dt = 1e-18, but I need to run it until e 1e-10.
With this time step is going to take very long time, and setting dt *= 1.1 the resduals at some point start to increase then
gives following runtime error:
factor is exactly singular
Even with very small increment dt*= 1.005 the same issue pop up.
Using dt= 1.001 runs the code for quit long time then the residual get stuck at certain value.
Questions
I there any error in the fipy formalism of the equations?
What causes the error?
Is the error because of time step increase? If yes, how can I increase my time step?
I've made a few more changes to the code that can get you to an elapsed time of 1e-10. The main changes are
Using ImplicitSourceTerm for the terms with G. This stabalizes the solution.
Applied underRelaxation=0.5 in the sweep step. This slows down the updates in the sweep loop so the feedback loop is damped down.
Removed FIPY_SOLVERS=scipy. This isn't doing anything. FIPY_SOLVERS is an environment variable that you set outside of the Python environment.
The way the boundary conditions were applied seemed strange so I applied them in a more canonical way.
The sweep loop is fixed to 10 sweeps to get to a steady state quickly. Note that as the solution gets close to a stable steady state, the residual won't get better necessarily. Probably want to go back to residual checks if you need an accurate transient.
from fipy.tools import numerix
import scipy
import fipy
import numpy as np
from fipy import CylindricalGrid1D
from fipy import Variable, CellVariable, TransientTerm, DiffusionTerm, Viewer, LinearLUSolver, LinearPCGSolver, \
LinearGMRESSolver, ImplicitDiffusionTerm, Grid1D, ImplicitSourceTerm
## Mesh
nr = 50
dr = 1e-7
# r = nr * dr
mesh = CylindricalGrid1D(nr=nr, dr=dr, origin=0)
x = mesh.cellCenters[0]
# Variables
T_e = CellVariable(name="electronTemp", mesh=mesh,hasOld=True)
T_e.setValue(300)
T_ph = CellVariable(name="phononTemp", mesh=mesh, hasOld=True)
T_ph.setValue(300)
G = CellVariable(name="EPC", mesh=mesh)
t = Variable()
# Material parameters
C_e = CellVariable(name="C_e", mesh=mesh)
k_e = CellVariable(name="k_e", mesh=mesh)
C_ph = CellVariable(name="C_ph", mesh=mesh)
k_ph = CellVariable(name="k_ph", mesh=mesh)
C_e = 4.15303 - (4.06897 * numerix.exp(T_e / -85120.8644))
C_ph = 4.10446 - 3.886 * numerix.exp(-T_ph / 373.8)
k_e = 0.1549 * T_e**-0.052
k_ph =1.24 + 16.29 * numerix.exp(-T_ph / 151.57)
G = numerix.exp(21.87 + 10.062 * numerix.log(numerix.log(T_e )- 5.4))
# Boundary conditions
T_e.constrain(300, where=mesh.facesRight)
T_ph.constrain(300, where=mesh.facesRight)
# Source 𝐴(𝑟,𝑡) = 𝑎𝐷(𝑟)𝜏−1 𝑒−𝑡/𝜏 , 𝐷(𝑟) = 𝑆𝑒 exp (−𝑟2/𝜎2)/√2𝜋𝜎2
sig = 1.0e-6
tau = 1e-15
S_e = 35
d_r = (S_e * 1.6e-9 * numerix.exp(-x**2 /sig**2)) / (numerix.sqrt(2. * 3.14 * sig**2))
A_t = numerix.exp(-t/tau)
a = (numerix.sqrt(2. * 3.14)) / (3.14 * sig)
A_r = a * d_r * tau**-1 * A_t
eq0 = (
TransientTerm(var=T_e, coeff=C_e) == \
DiffusionTerm(var=T_e, coeff=k_e) - \
ImplicitSourceTerm(coeff=G, var=T_e) + \
ImplicitSourceTerm(var=T_ph, coeff=G) + \
A_r)
eq1 = (TransientTerm(var=T_ph, coeff=C_ph) == DiffusionTerm(var=T_ph, coeff=k_ph) + ImplicitSourceTerm(var=T_e, coeff=G) - ImplicitSourceTerm(coeff=G, var=T_ph))
eq = eq0 & eq1
dt = 1e-18
steps = 7000
elapsed = 0.
vi = Viewer((T_e, T_ph), datamin=0., datamax=2e4)
for step in range(steps):
T_e.updateOld()
T_ph.updateOld()
vi.plot()
res = 1e100
dt *= 1.1
count = 0
while count < 10:
res = eq.sweep(dt=dt, underRelaxation=0.5)
print(t, res)
count += 1
print('elapsed:', t.value)
t.setValue(t + dt)
Regarding your questions.
I there any error in the fipy formalism of the equations?
Actually, no. Nothing wrong with the formalism, but better to use ImplicitSourceTerm.
What causes the error?
There are two source of instability in this system. The source terms inside the equation when written explicitly are unstable above a certain time step. Using an ImplcitSourceTerm removes this instablity. There is also some sort of instability in the coupling of the equations. I think that using under relaxation helps with that.
Is the error because of time step increase? If yes, how can I increase my time step?
Explained above.
In addition to #wd15's answer:
Your equations are extremely non-linear. You will likely benefit from Newton iterations to get decent convergence.
As #TimRoberts said, geometrically increasing the time step without bound is probably not a good idea.
I've recently posted a package called steppyngstounes that takes care of adapting timesteps. Although a standalone package, it's intended to work with FiPy. For example, you could change your solve loop to this:
from steppyngstounes import FixedStepper, PIDStepper
T_e.updateOld()
T_ph.updateOld()
for checkpoint in FixedStepper(start=0, stop=1e-10, size=1e-12):
for step in PIDStepper(start=checkpoint.begin,
stop=checkpoint.end,
size=dt):
res = 1e100
for sweep in range(10):
res = eq.sweep(dt=dt, underRelaxation=0.5)
print(t, sweep, res)
if step.succeeded(error=res / 1000):
T_e.updateOld()
T_ph.updateOld()
t.value = step.end
else:
T_e.value = T_e.old
T_ph.value = T_ph.old
print('elapsed:', t.value)
# the last step might have been smaller than possible,
# if it was near the end of the checkpoint range
dt = step.want
_ = checkpoint.succeeded()
vi.plot()
This code will update the viewer every 1e-12 time units, and adaptively make it's way between those checkpoints. There are other steppers in the package that would facilitate taking geometrically or exponentially increasing checkpoints, if that kept things more interesting.
You could probably get better overall performance by sweeping fewer times and letting the adapter take much smaller time steps in the beginning. I found that no time step was small enough to get the initial residual lower than 777.9. After the first couple of steps, the error metric could probably be much more aggressive, giving more accurate results.
currently I am trying to think about how to speedup odeint for a simulation I am running. Actually its only a primitive second order ode with a friction term and a discontinuous force. The model I am using to describe the dynamics is defined in a seperate function. Trying to solve the ode results in an error or extremely high computation times (talking about days).
Here is my code mostly hardcoded:
import numpy as np
import pandas as pd
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def create_ref(tspan):
if tspan>2 and tspan<8:
output = np.sin(tspan)
elif tspan>=20:
output = np.sin(tspan*10)
else:
output = 0.5
return output
def model(state,t):
def Fs(x):
pT = 0
pB = 150
p0 = 200
pA = 50
kein = 0.25
kaus = 0.25
if x<0:
pA = pT
Fsres = -kaus*x*pA-kein*x*(p0-pB)
else:
pB = pT
Fsres = -kaus*x*pB-kein*x*(p0-pA)
return Fsres
x,dx = state
refnow = np.interp(t,xref.index.values,xref.values.squeeze())
refprev = np.interp(t-dt,xref.index.values,xref.values.squeeze())
drefnow = (refnow-refprev)/dt
x_meas = x
dx_meas = dx
errnow = refnow-x_meas
errprev = refprev-(x_meas-dx_meas*dt)
intrefnow = dt*(errnow-errprev)
kp = 10
kd = 0.1
ki = 100
sigma = kp*(refnow-x_meas)+kd*(drefnow-dx_meas)+ki*intrefnow
tr0 = 5
FricRed = (1.5-0.5*np.tanh(0.1*(t-tr0)))
kpu = 300
fr = 0.1
m = 0.01
d = 0.01
k = 10
u = float(kpu*np.sqrt(np.abs(sigma))*np.sign(sigma))
ddx = 1/m*(Fs(x)+FricRed*fr*np.sign(dx)-d*dx-k*x + u )
return [dx,ddx]
dt = 1e-3
tspan = np.arange(start=0, stop=50, step=dt)
steplim = tspan[-1]*0.1
reffunc = np.vectorize(create_ref)
xrefvals = reffunc(tspan)
xref = pd.DataFrame(data=xrefvals,index=tspan,columns=['xref'])
x0 = [-0.5,0]
simresult = odeint(model, x0, tspan)
plt.figure(num=1)
plt.plot(tspan,simresult[:,0],label='ispos')
plt.plot(tspan,xref['xref'].values,label='despos')
plt.legend()
plt.show()
I change the code according to Pranav Hosangadi s comments. Thanks for the hints. I didn't know that and learned something new and didn't expect dictionaries to have such a high impact on computation time. But now its much faster.
I am trying to use scipy.integrate.solve_ivp to calculate the solutions to newton's gravitation equation for my n body simulation, however I am confused how the function is passed into solve_ivp. I have the following code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
G = 6.67408e-11
m_sun = 1988500e24
m_jupiter = 1898.13e24
m_earth = 5.97219e24
au = 149597870.700e3
v_factor = 1731460
year = 31557600.e0
init_s = np.array([-6.534087946884256E-03*au, 6.100454846284101E-03*au, 1.019968145073305E-04*au, -6.938967653087248E-06*v_factor, -5.599052606952444E-06*v_factor, 2.173251724105919E-07*v_factor])
init_j = np.array([2.932487231769548E+00*au, -4.163444383137574E+00*au, -4.833604407653648E-02*au, 6.076788230491844E-03*v_factor, 4.702729516645153E-03*v_factor, -1.554436340872727E-04*v_factor])
variables_s = init_s
variables_j = init_j
N = 2
tStart = 0e0
t_End = 25*year
Nt = 2000
dt = t_End/Nt
temp_end = dt
t=tStart
domain = (t, temp_end)
planetsinit = np.vstack((init_s, init_j))
planetspos = planetsinit[:,0:3]
mass = np.vstack((1988500e24, 1898.13e24))
def weird_division(n, d):
return n / d if d else 0
variables_save = np.zeros((N,6,Nt))
variables_save[:,:,0] = planetsinit
pos_s = planetspos[0]
pos_j = planetspos[1]
while t < t_End:
t_index = int(weird_division(t, dt))
for index in range(len(planetspos)):
for otherindex in range(len(planetspos)):
if index != otherindex:
x1_p1, x2_p1, x3_p1 = planetsinit[index, 0:3]
x1_p2, x2_p2, x3_p2 = planetsinit[otherindex, 0:3]
m = mass[otherindex]
def f_grav(t, y):
x1_p1, x2_p1, x3_p1, v1_p1, v2_p1, v3_p1 = y
x1_diff = x1_p1 - x1_p2
x2_diff = x2_p1 - x2_p2
x3_diff = x3_p1 - x3_p2
dydt = [v1_p1,
v2_p1,
v3_p1,
-(x1_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2),
-(x2_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2),
-(x3_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2)]
return dydt
solution = solve_ivp(fun=f_grav, t_span=domain, y0=planetsinit[index])
planetsinit[index] = solution['y'][0:6, -1]
variables_save[index,:,t_index] = solution['y'][0:6, -1]
planetspos[index] = planetsinit[index][0:3]
t += dt
temp_end += dt
domain = (t,temp_end)
pos_s = variables_save[0,0:3,:]
pos_j = variables_save[1,0:3,:]
plt.plot(variables_save[0,0:3,:][0], variables_save[0,0:3,:][1])
plt.plot(variables_save[1,0:3,:][0], variables_save[1,0:3,:][1])
The code above works very nicely and produces a stable orbit. However when I calculate the acceleration outside the function and feed that through into the f_grav function, something goes wrong and produces an orbit which is no longer stable. However I am perplexed as I don't know why the data is different as to be it seems like that I have passed through the exactly same inputs. Which leads me to think that maybe its the way the the function f_grav is interpolated by the solve_ivp integrator? To calculate the acceleration outside all I do is change the following code in the loop to:
x1_p1, x2_p1, x3_p1 = planetsinit[index, 0:3]
x1_p2, x2_p2, x3_p2 = planetsinit[otherindex, 0:3]
m = mass[otherindex]
x1_diff = x1_p1 - x1_p2
x2_diff = x2_p1 - x2_p2
x3_diff = x3_p1 - x3_p2
ax = -(x1_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2)
ay = -(x2_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2)
az = -(x3_diff)*G*m/((x1_diff)**2+(x2_diff)**2+(x3_diff)**2)**(3/2)
def f_grav(t, y):
x1_p1, x2_p1, x3_p1, v1_p1, v2_p1, v3_p1 = y
dydt = [v1_p1,
v2_p1,
v3_p1,
ax,
ay,
az]
return dydt
solution = solve_ivp(fun=f_grav, t_span=domain, y0=planetsinit[index])
planetsinit[index] = solution['y'][0:6, -1]
variables_save[index,:,t_index] = solution['y'][0:6, -1]
planetspos[index] = planetsinit[index][0:3]
As I said I don't know why different orbits are produces which are shown below and any hints as to why or how to solve it would me much appreciated. To clarify why I can't use the working code as it is, as when more bodies are involved I aim to sum the accelerations contribution of all the other planets which isn't possible this way where the acceleration is calculated in the function itself.
Sorry for the large coding chunks but I did feel it was appropriate as then it could be run and the problem itself is clearer.
Both have the same time period, dt, however the orbit on the left is stable and the one on the right is not