As the title suggests, I would love some suggestions on how to make the following code faster. I have tried several approaches, including Numba, a combination of the np.fromiter() and map() functions, and NumPy's np.vectorize() applied to the hash function, in an attempt to speed things up.
Take a look at the code below; the results from running it on my machine follow the code.
Some notes for the future:
the CODES variable holds the sizes of the arrays that need to be hashed (random strings are generated here for Stack Overflow purposes)
the first call to the Numba version is slow because of compilation, but I am willing to pay this cost if it speeds up the code in the long run, since it will be run hundreds of thousands of times
import numpy as np
import numba as nb
import time
# GENERATE RANDOM "<U64" CHARACTER ARRAYS
A, Z = np.array(["A","Z"]).view("int32")
CODES = (25058, 64272, 61425)
LENGTH = 64
tmp1 = np.random.randint(low = A, high = Z, size = CODES[0] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
tmp2 = np.random.randint(low = A, high = Z, size = CODES[1] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
tmp3 = np.random.randint(low = A, high = Z, size = CODES[2] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
# NUMBA ITERATION AND HASHING ONE BY ONE
@nb.jit(nopython=True, fastmath=True, parallel=True, cache=True)
def form_hashed_array(to_hash: np.ndarray) -> np.ndarray:
    hashed = np.empty_like(to_hash, dtype=np.int64)
    for i in nb.prange(len(to_hash)):
        hashed[i] = hash(to_hash[i])
    return hashed
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp4 = form_hashed_array(tmp1)
print(time.monotonic() - t) # this first one will be larger due to compilation time
t = time.monotonic()
tmp5 = form_hashed_array(tmp2)
print(time.monotonic() - t)
t = time.monotonic()
tmp6 = form_hashed_array(tmp3)
print(time.monotonic() - t)
print("NUMBA ITERATION TOOK: " + str(time.monotonic() - t2))
# MAP + FROMITER COMBINATION
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp7 = np.fromiter((map(hash, tmp1)), dtype=np.int64)
print(time.monotonic() - t)
t=time.monotonic()
tmp8 = np.fromiter((map(hash, tmp2)), dtype=np.int64)
print(time.monotonic()-t)
t = time.monotonic()
tmp9 = np.fromiter((map(hash, tmp3)), dtype=np.int64)
print(time.monotonic()-t)
print("MAP + FROMITER COMBINATION TOOK : " + str(time.monotonic()-t2))
# NUMPY FUNCTION VECTORIZATION
vfunc = np.vectorize(hash)
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp10 = vfunc(tmp1)
print(time.monotonic() - t)
t = time.monotonic()
tmp11 = vfunc(tmp2)
print(time.monotonic() - t)
t = time.monotonic()
tmp12 = vfunc(tmp3)
print(time.monotonic() - t)
print("NUMPY FUNCTION VECTORIZATION TOOK: " + str(time.monotonic()-t2))
# SANITY CHECKS
print("--------")
print(not (tmp4 - tmp7).any() and not (tmp7 - tmp10).any(), end=" ")
print(not (tmp5 - tmp8).any() and not (tmp8 - tmp11).any(), end=" ")
print(not (tmp6 - tmp9).any() and not (tmp9 - tmp12).any())
print("--------")
breakpoint()
This code will be run a significant number of times, so it is important that it is as fast as possible.
--------
6.9208437809720635
0.08914285799255595
0.09502507897559553
NUMBA ITERATION TOOK: 7.1051117710303515
--------
0.009926816972438246
0.02683716599131003
0.026946138008497655
MAP + FROMITER COMBINATION TOOK : 0.06381386297289282
--------
0.011753249040339142
0.02864329604199156
0.029279633017722517
NUMPY FUNCTION VECTORIZATION TOOK: 0.06976548000238836
--------
True True True
--------
Thanks for the help in advance!
I am relatively new to parallel computing and the Numba package, and I am looking for ways to optimize my embarrassingly parallel N-body simulation. I've applied everything I know so far with NumPy arrays, JIT compilers, and multiprocessing. However, I'm still not getting the speed I desire (I've seen videos where similar codes run much faster).
What I have currently is a rather simple Python integrator using Runge-Kutta integration and two equations of motion. I work with numerical integrators a lot in my field, so I would definitely like to pick up a few more tricks from you.
I have posted my code below, but essentially I have one main function called "Motion" which takes two initial conditions and integrates their motion for a set amount of time. I have JIT-compiled this function and all the functions it calls iteratively: "RK4", "ODE", and "Electric_Field". Lastly, I call Pool from multiprocessing to parallelize the "Motion" function and feed it different initial conditions for each simulation it runs.
Again, I've implemented every type of optimization I'm aware of, but I'm still not happy with the speed. If anyone can spot a piece of the algorithm that could be further optimized, that would be extremely helpful and educational (for me at least)! Thank you for your time.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from IPython.display import clear_output
from scipy import interpolate
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = np.int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = np.int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = np.int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of Electron eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
L_sample = np.linspace(1,10,100)
phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
L0i = phidot_to_L(omega/m)
return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
@njit(nogil=True)
def Electric_Field(t,r):
E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
@njit(nogil=True)
def ODE(t,r):
    Er, Ephi = Electric_Field(t,r) #Pull out the electric field so we only call it once.
Ldot = Ephi * r[0]**3 / (Re*Beq)
Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
return np.array([Ldot,Phidot])
@njit(nogil=True)
def RK4(t,r): #Standard Runge-Kutta Integration Algorithm
k1 = Step_Size*ODE(t,r)
k2 = Step_Size*ODE(t+Step_Size/2, r+k1/2)
    k3 = Step_Size*ODE(t+Step_Size/2, r+k2/2)
k4 = Step_Size*ODE(t+Step_Size, r+k3)
return r + k1/6 + k2/3 + k3/3 + k4/6
@njit(nogil=True)
def Motion(L0,Phi0): #Insert initial conditions and it will loop through the RK4 integrator and output all the positions.
L_Array = np.zeros_like(time_array)
Phi_Array = np.zeros_like(time_array)
L_Array[0] = L0
Phi_Array[0] = Phi0
for i in range(1,N_points):
L_Array[i], Phi_Array[i] = RK4(time_array[i-1], np.array([ L_Array[i-1],Phi_Array[i-1] ]) )
return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of precision in our data
# Location = Motion(5,0)
# x = Location[0]*np.cos(Location[1])
# y = Location[0]*np.sin(Location[1])
# plt.plot(x,y,"o", markersize = 0.5)
# ts = time()
# Motion(5,0)
# print('Solo Time:', time() - ts)
"Getting my Arrays ready so I can index it"
Split = int(np.sqrt(N_Particles))
L0i = np.linspace(4.4,5.5,Split)
Phi0i = np.linspace(0,360,Split) / 180 * np.pi
L0_Grid = np.repeat(L0i,Split)
# ^Here I want the equivalent of a meshgrid of L0i and Phi0i, so I repeat L0i and take the index mod Split for the Phi array
#Create Appending Array
results = []
def get_results(result): #Call back to this array from Multiprocessing to append the results it gives per run.
results.append(result)
clear_output()
print("Getting Results %0.2f" %(len(results)/N_Particles * 100), end='\r')
if __name__ == '__main__':
#Call In Multiprocessing
pool = mp.Pool(mp.cpu_count()) #Counting number of threads to start
ts = time() #Timing this process. Begins here
for ii in range(N_Particles): #Not too sure what this does, but it works - I assume it parallelizes this loop
pool.apply_async(Motion, args = (L0_Grid[ii],Phi0i[int(ii%Split)]), callback=get_results)
pool.close() #I'm not too sure what this does but everyone uses it, and it won't work without it
pool.join()
print('Time in MP parallel:', time() - ts) #Output Time
I think the main reason your code is slow is that your Runge-Kutta method uses fixed time steps. Adaptive ODE solvers select the largest time step that keeps the error within a tolerable bound. One example is the LSODA ODE solver from ODEPACK.
Below I've rewritten your code using NumbaLSODA. On my computer, it speeds up your code by about 200x.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from scipy import interpolate
from NumbaLSODA import lsoda_sig, lsoda
from numba import cfunc
import numba as nb
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = np.int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = np.int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = np.int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of Electron eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
L_sample = np.linspace(1,10,100)
phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
L0i = phidot_to_L(omega/m)
return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
@njit
def Electric_Field(t,r):
E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
@cfunc(lsoda_sig)
def ODE(t, r_, dr, p):
r = nb.carray(r_, (2,))
Er, Ephi = Electric_Field(t,r)
Ldot = Ephi * r[0]**3 / (Re*Beq)
Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
dr[0] = Ldot
dr[1] = Phidot
funcptr = ODE.address
@njit
def Motion(L0,Phi0):
u0 = np.array([L0,Phi0],np.float64)
data = np.array([5.0])
usol, success = lsoda(funcptr, u0, time_array, data)
L_Array = usol[:,0]
Phi_Array = usol[:,1]
return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of precision in our data
Location = Motion(5,0)
x = Location[0]*np.cos(Location[1])
y = Location[0]*np.sin(Location[1])
plt.plot(x,y,"o", markersize = 0.5)
ts = time()
Motion(5,0)
print('Solo Time:', time() - ts)
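The multiprocessing driver from the question should work unchanged with this faster Motion; a rough sketch, reusing L0_Grid, Phi0i, Split and get_results exactly as defined in the question:
if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count())
    ts = time()
    for ii in range(N_Particles):
        pool.apply_async(Motion, args=(L0_Grid[ii], Phi0i[int(ii % Split)]), callback=get_results)
    pool.close()
    pool.join()
    print('Time in MP parallel:', time() - ts)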
I would like to lower the time SciPy's odeint takes to solve a differential equation.
To practice, I used the example covered in Python in scientific computations as a template. Because odeint takes a function f as an argument, I wrote this function as a statically typed Cython version and hoped the running time of odeint would decrease significantly.
The function f is contained in file called ode.pyx as follows:
import numpy as np
cimport numpy as np
from libc.math cimport sin, cos
def f(y, t, params):
cdef double theta = y[0], omega = y[1]
cdef double Q = params[0], d = params[1], Omega = params[2]
cdef double derivs[2]
derivs[0] = omega
derivs[1] = -omega/Q + np.sin(theta) + d*np.cos(Omega*t)
return derivs
def fCMath(y, double t, params):
cdef double theta = y[0], omega = y[1]
cdef double Q = params[0], d = params[1], Omega = params[2]
cdef double derivs[2]
derivs[0] = omega
derivs[1] = -omega/Q + sin(theta) + d*cos(Omega*t)
return derivs
I then create a file setup.py to compile the function:
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize('ode.pyx'))
The script solving the differential equation (also containing the Python version of f) is called solveODE.py and looks as follows:
import ode
import numpy as np
from scipy.integrate import odeint
import time
def f(y, t, params):
theta, omega = y
Q, d, Omega = params
derivs = [omega,
-omega/Q + np.sin(theta) + d*np.cos(Omega*t)]
return derivs
params = np.array([2.0, 1.5, 0.65])
y0 = np.array([0.0, 0.0])
t = np.arange(0., 200., 0.05)
start_time = time.time()
odeint(f, y0, t, args=(params,))
print("The Python Code took: %.6s seconds" % (time.time() - start_time))
start_time = time.time()
odeint(ode.f, y0, t, args=(params,))
print("The Cython Code took: %.6s seconds ---" % (time.time() - start_time))
start_time = time.time()
odeint(ode.fCMath, y0, t, args=(params,))
print("The Cython Code incorpoarting two of DavidW_s suggestions took: %.6s seconds ---" % (time.time() - start_time))
I then run:
python setup.py build_ext --inplace
python solveODE.py
in the terminal.
The time for the python version is approximately 0.055 seconds,
whilst the Cython version takes roughly 0.04 seconds.
Does somebody have a recommendation to improve on my attempt of solving the
differential equation, preferably without tinkering with the odeint routine itself, with Cython?
Edit
I incorporated DavidW's suggestions in the two files ode.pyx and solveODE.py. With these changes, the code now takes only roughly 0.015 seconds to run.
The easiest change to make (which will probably gain you a lot) is to use the C math library's sin and cos for operations on single numbers instead of NumPy's. The call into NumPy and the time it spends working out that the argument isn't an array are fairly costly.
from libc.math cimport sin, cos
# later
-omega/Q + sin(theta) + d*cos(Omega*t)
I'd be tempted to assign a type to the input t (none of the other inputs are easily typed without changing the interface):
def f(y, double t, params):
I think I'd also just return a list like you do in your Python version. I don't think you gain a lot by using a C array.
tldr; use numba.jit for 3x speedup...
I don't have much experience with Cython, but my machine seems to get similar computation times for your strictly Python version, so we should be able to compare roughly apples to apples. I used numba to compile the function f (which I rewrote slightly to make it play nicer with the compiler).
import numba

def f(y, t, params):
    return np.array([y[1], -y[1]/params[0] + np.sin(y[0]) + params[1]*np.cos(params[2]*t)])

numba_f = numba.jit(f)
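For reference, this is roughly how the compiled function drops into the timing script from the question (same y0, t, and params as in solveODE.py):
start_time = time.time()
odeint(numba_f, y0, t, args=(params,))
print("The Numba Code took: %.6s seconds" % (time.time() - start_time))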
dropping in numba_f in place of your ode.f gives me this output...
The Python Code took: 0.0468 seconds
The Numba Code took: 0.0155 seconds
I then wondered if I could duplicate odeint and also compile with numba to speed things up even further... (I could not)
Here is my Runge-Kutta numerical differential equation integrator:
#function f is provided inline (not as an arg)
def runge_kutta(y0, steps, dt, args=()): #improvement on euler's method. *note: time steps given in number of steps and dt
Y = np.empty([steps,y0.shape[0]])
Y[0] = y0
t = 0
n = 0
for n in range(steps-1):
        #calculate coefficients
        k1 = f(Y[n], t, args) #(Euler's method coefficient) beginning of interval
        k2 = f(Y[n] + (dt * k1 / 2), t + (dt/2), args) #interval midpoint A
k3 = f(Y[n] + (dt * k2 / 2), t + (dt/2), args) #interval midpoint B
k4 = f(Y[n] + dt * k3, t + dt, args) #interval end point
Y[n + 1] = Y[n] + (dt/6) * (k1 + 2*k2 + 2*k3 + k4) #calculate Y(n+1)
t += dt #calculate t(n+1)
return Y
naive looping functions are typically the fastest once compiled, although this could probably be restructured for a little more speed. I should note that this gives a different answer than odeint, deviating by as much as 0.001 after around 2000 steps, and completely diverging after 3000. For the numba version of the function, I simply replaced f with numba_f and added the @numba.jit decorator to compile the integrator. In this case, as expected, the pure Python version is very slow, but the numba version is not any faster than numba with odeint (again, ymmv).
using custom integrator
The Python Code took: 0.2340 seconds
The Numba Code took: 0.0156 seconds
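For completeness, here is a sketch of the wiring described above (my own code, not the original): the body mirrors runge_kutta, with f swapped for numba_f and the whole integrator decorated.
@numba.jit
def numba_rk(y0, steps, dt, args):   # sketch: runge_kutta from above with numba_f inlined
    Y = np.empty((steps, y0.shape[0]))
    Y[0] = y0
    t = 0.0
    for n in range(steps - 1):
        k1 = numba_f(Y[n], t, args)
        k2 = numba_f(Y[n] + (dt * k1 / 2), t + (dt / 2), args)
        k3 = numba_f(Y[n] + (dt * k2 / 2), t + (dt / 2), args)
        k4 = numba_f(Y[n] + dt * k3, t + dt, args)
        Y[n + 1] = Y[n] + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return Y

Y = numba_rk(y0, len(t), 0.05, params)   # len(t) steps of dt = 0.05, matching solveODE.py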
Here's an example of compiling ahead of time. I don't have the necessary toolchain on this computer to compile, and I don't have admin to install it, so this gives me an error that I don't have the required compiler, but it should work otherwise.
import numpy as np
from numba.pycc import CC
cc = CC('diffeq')
@cc.export('func', 'f8[:](f8[:], f8, f8[:])')
def func(y, t, params):
return np.array([y[1], -y[1]/params[0] + np.sin(y[0]) + params[1]*np.cos(params[2]*t)])
cc.compile()
If others answer this question using other modules, I might as well chime in:
I am the author of JiTCODE, which accepts an ODE written in SymPy symbols and then converts this ODE to C code for a Python module, compiles this C code, loads the result and uses this as a derivative for SciPy’s ODE. Your example translated to JiTCODE looks like this:
from jitcode import jitcode, provide_basic_symbols
import numpy as np
from sympy import sin, cos
import time
Q = 2.0
d = 1.5
Ω = 0.65
t, y = provide_basic_symbols()
f = [
y(1),
-y(1)/Q + sin(y(0)) + d*cos(Ω*t)
]
initial_state = np.array([0.0,0.0])
ODE = jitcode(f)
ODE.set_integrator("lsoda")
ODE.set_initial_value(initial_state,0.0)
start_time = time.time()
data = np.vstack([ODE.integrate(T) for T in np.arange(0.05, 200., 0.05)])
end_time = time.time()
print("JiTCODE took: %.6s seconds" % (end_time - start_time))
This takes 0.11 seconds, which is horribly slow compared to the solutions based on odeint, but this is not due to the actual integration but the way the results are handled: While odeint directly creates an array efficiently internally, this is done via Python here. Depending on what you do, this may be a crucial disadvantage, but this quickly becomes irrelevant for a coarser sampling or larger differential equations.
So, let’s remove the data collection and just look at the integration, by replacing the last lines with the following:
ODE = jitcode(f)
ODE.set_integrator("lsoda", max_step=0.05, nsteps=1e10)
ODE.set_initial_value(initial_state,0.0)
start_time = time.time()
ODE.integrate(200.0)
end_time = time.time()
print("JiTCODE took: %.6s seconds" % (end_time - start_time))
Note that I set max_step=0.05 to force the integrator to make at least as many steps as in your example and ensure that the only difference is that the results of the integration are not stored to some array. This runs in 0.010 seconds.
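As a rough illustration of the coarser-sampling point above (my own sketch, reusing f and initial_state from the example), collecting one sample per time unit instead of every 0.05 keeps the Python-side data handling cheap:
ODE = jitcode(f)
ODE.set_integrator("lsoda")
ODE.set_initial_value(initial_state, 0.0)
data = np.vstack([ODE.integrate(T) for T in np.arange(1.0, 200.0, 1.0)])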
NumbaLSODA takes 0.00088 seconds (17x faster than Cython).
from NumbaLSODA import lsoda_sig, lsoda
import numba as nb
import numpy as np
import time
@nb.cfunc(lsoda_sig)
def f(t, y_, dy, p_):
p = nb.carray(p_, (3,))
y = nb.carray(y_, (2,))
theta, omega = y
Q, d, Omega = p
dy[0] = omega
dy[1] = -omega/Q + np.sin(theta) + d*np.cos(Omega*t)
funcptr = f.address # address to ODE function
y0 = np.array([0.0, 0.0])
data = np.array([2.0, 1.5, 0.65])
t = np.arange(0., 200., 0.05)
start_time = time.time()
usol, success = lsoda(funcptr, y0, t, data = data)
print("NumbaLSODA took: %.8s seconds ---" % (time.time() - start_time))
result
NumbaLSODA took: 0.000880 seconds ---
I would like to reduce the computation time of the code posted below. In essence, it calculates the array Tf as the result of the following nested loop:
Af = lambda x: Approximationf(f, x)
for idxp, prior in enumerate(grid_prior):
for idxy, y in enumerate(grid_y):
posterior = lambda yPrime: updated_posterior(prior, y, yPrime)
integrateL = integrate(lambda z: Af(np.array([y*np.exp(mu[0])*z,
posterior(y*np.exp(mu[0]) * z)])))
integrateH = integrate(lambda z: Af(np.array([y*np.exp(mu[1])*z,
posterior(y * np.exp(mu[1])*z)])))
Tf[idxy, idxp] = (h[idxy, idxp] +
beta * ((prior * integrateL) +
(1-prior)*integrateH))
The objects posterior, integrate and Af are functions that are repeatedly called while iterating over the loop. The function posterior calculates a scalar called posterior. The function Af approximates the function f at sample points x and passes the result on to the function integrate, which calculates the conditional expectation of the function f.
The code posted below is a simplification of a more difficult problem. Instead of running the nested loop once, I have to run it multiple times to solve a fixed point problem. This problem is initialized with an arbitrary function f and a function Tf is created. This array is then used in the next iteration over the nested loop to calculate another array Tf. The process continues until convergence.
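As a rough sketch of that fixed-point iteration (apply_T is a hypothetical helper that wraps the nested loop below and returns Tf for a given f):
tol = 1e-8
while True:
    Tf = apply_T(f)                   # hypothetical wrapper around the nested loop below
    if np.max(np.abs(Tf - f)) < tol:  # stop once f is (approximately) a fixed point
        break
    f = Tf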
I decided not to report results of the cProfile module. When the nested loop is run only once rather than iterated to convergence, a lot of internal Python calls appear relatively expensive; when iterating until convergence, those internal calls lose their relative importance and drop to lower positions in the cProfile output.
I tried to mimic various suggestions I found online for lowering the computation time of loops in slightly modified problems. Unfortunately, I couldn't make them work and could not really figure out a common approach to tackle these problems. Does somebody have an idea how to lower the computation time of this loop? I am grateful for any help!
import numpy as np
from scipy import interpolate
from scipy.stats import lognorm
from scipy.integrate import fixed_quad
# == The following lines define the parameters for the problem == #
gamma, beta, sigma, mu = 2, 0.95, 0.0255, np.array([0.0113, -0.0016])
grid_y, grid_prior = np.linspace(7, 10, 15), np.linspace(0, 1, 5)
int_min, int_max = np.exp(- 7 * sigma), np.exp(+ 7 * sigma)
phi = lognorm(sigma)
f = np.array([[ 1.29824564, 1.29161017, 1.28379398, 1.2676886, 1.15320819],
[ 1.26290108, 1.26147364, 1.24755837, 1.23819851, 1.11912802],
[ 1.22847276, 1.23013194, 1.22128198, 1.20996971, 1.0864706 ],
[ 1.19528104, 1.19645792, 1.19056084, 1.17980572, 1.05532966],
[ 1.16344832, 1.16279841, 1.15997191, 1.15169942, 1.02564429],
[ 1.13301675, 1.13109952, 1.12883038, 1.1236645, 0.99730795],
[ 1.10398195, 1.10125013, 1.0988554, 1.09612933, 0.97019688],
[ 1.07630046, 1.07356297, 1.07126087, 1.06878758, 0.94417658],
[ 1.04989686, 1.04728542, 1.04514962, 1.04289665, 0.91910765],
[ 1.02467087, 1.0221532, 1.02011384, 1.01797238, 0.89485162],
[ 1.00050447, 0.99795025, 0.99576917, 0.99330549, 0.87127677],
[ 0.97726849, 0.97443288, 0.97190614, 0.96861352, 0.84826362],
[ 0.95482612, 0.94783816, 0.94340077, 0.93753641, 0.82569922],
[ 0.93302433, 0.91985497, 0.9059118, 0.88895196, 0.80348449],
[ 0.91165997, 0.88253486, 0.86126688, 0.84769975, 0.78147382]])
# == Calculate function h, Used in the loop below == #
E0 = np.exp((1-gamma)*mu + (1-gamma)**2*sigma**2/2)
h = np.outer(beta*grid_y**(1-gamma), grid_prior*E0[0] + (1-grid_prior)*E0[1])
def integrate(g):
"""
This function is repeatedly called in the loop below
"""
integrand = lambda z: g(z) * phi.pdf(z)
result = fixed_quad(integrand, int_min, int_max, n=15)[0]
return result
def Approximationf(f, x):
"""
This function approximates the function f and is repeatedly called in
the loop
"""
# == simplify notation == #
fApprox = np.empty((x.shape[1]))
lower, middle = (x[0] < grid_y[0]), (x[0] >= grid_y[0]) & (x[0] <= grid_y[-1])
upper = (x[0] > grid_y[-1])
# = Calculate Polynomial == #
y_tile = np.tile(grid_y, len(grid_prior))
prior_repeat = np.repeat(grid_prior, len(grid_y))
s = interpolate.SmoothBivariateSpline(y_tile, prior_repeat,
f.T.flatten(), kx=5, ky=5)
# == interpolation == #
fApprox[middle] = s(x[0, middle], x[1, middle])[:, 0]
# == Extrapolation == #
if any(lower):
s0 = s(lower[lower]*grid_y[0], x[1, lower])[:, 0]
s1 = s(lower[lower]*grid_y[1], x[1, lower])[:, 0]
slope_lower = (s0 - s1)/(grid_y[0] - grid_y[1])
fApprox[lower] = s0 + slope_lower*(x[0, lower] - grid_y[0])
if any(upper):
sM1 = s(upper[upper]*grid_y[-1], x[1, upper])[:, 0]
sM2 = s(upper[upper]*grid_y[-2], x[1, upper])[:, 0]
slope_upper = (sM1 - sM2)/(grid_y[-1] - grid_y[-2])
fApprox[upper] = sM1 + slope_upper*(x[0, upper] - grid_y[-1])
return fApprox
def updated_posterior(prior, y, yPrime):
"""
This function calculates the posterior weights put on each distribution.
    It is the third function repeatedly called in the loop below.
"""
z_0 = yPrime/(y * np.exp(mu[0]))
z_1 = yPrime/(y * np.exp(mu[1]))
l0, l1 = phi.pdf(z_0), phi.pdf(z_1)
posterior = l0*prior / (l0*prior + l1*(1-prior))
return posterior
Tf = np.empty_like(f)
Af = lambda x: Approximationf(f, x)
# == Apply the T operator to f == #
for idxp, prior in enumerate(grid_prior):
for idxy, y in enumerate(grid_y):
posterior = lambda yPrime: updated_posterior(prior, y, yPrime)
integrateL = integrate(lambda z: Af(np.array([y*np.exp(mu[0])*z,
posterior(y*np.exp(mu[0]) * z)])))
integrateH = integrate(lambda z: Af(np.array([y*np.exp(mu[1])*z,
posterior(y * np.exp(mu[1])*z)])))
Tf[idxy, idxp] = (h[idxy, idxp] +
beta * ((prior * integrateL) +
(1-prior)*integrateH))
Some experience with multiprocessing
Following reptilicus's comment, I decided to investigate how to use the multiprocessing module. My idea was to begin by parallelizing the computation of the integrateL array. To do so, I fixed the outer loop to prior = 0.5 and wanted to iterate over the inner loop, grid_y. However, I still have to take into account that integrateL is a lambda function in z. I tried to follow the advice of the Stack Overflow question "How to let Pool.map take a lambda function" and wrote the following code:
prior = 0.5
Af = lambda x: Approximationf(f, x)
class Iteration(object):
def __init__(self,state):
self.y = state
def __call__(self,z):
Af(np.array([self.y*np.exp(mu[0])*z,
updated_posterior(prior,
self.y,self.y*np.exp(mu[0])*z)]))
with Pool(processes=4) as pool:
out = pool.map(Iteration(y), np.nditer(grid_y))
Unfortunately, python returns upon running the program:
IndexError: tuple index out of range
At first sight, this looks like a trivial error, but I cannot remedy it. Does somebody have an idea how to tackle the problem? Again, I'm grateful for any advice I receive!
I would target that nested loop, something like this. This is pseudo-code, but it should get you started.
def do_calc(idxp, idxy, y, prior):
    posterior = lambda yPrime: updated_posterior(prior, y, yPrime)
    integrateL = integrate(lambda z: Af(np.array([y*np.exp(mu[0])*z,
                                                  posterior(y*np.exp(mu[0])*z)])))
    integrateH = integrate(lambda z: Af(np.array([y*np.exp(mu[1])*z,
                                                  posterior(y*np.exp(mu[1])*z)])))
    return (idxp, idxy, prior, integrateL, integrateH)

pool = multiprocessing.Pool(8)  # or however many cores you have
results = []

# This is the part that I would try to parallelize
for idxp, prior in enumerate(grid_prior):
    for idxy, y in enumerate(grid_y):
        results.append(pool.apply_async(do_calc, args=(idxp, idxy, y, prior)))

pool.close()
pool.join()

results = [r.get() for r in results]
for r in results:
    Tf[r[0], r[1]] = (h[r[0], r[1]] +
                      beta * ((r[2] * r[3]) +
                              (1 - r[2]) * r[4]))
I am trying to fit some data that are distributed in time following a rising Gaussian curve and then an exponential decay.
I have found this example on the web, which is very similar to my case, but I have just started fitting with Python, and the example seems quite confusing to me.
Nonetheless, I have tried to adapt the example to my script and data, and the following is my progress:
#!/usr/bin/env python
import pyfits, os, re, glob, sys
from scipy.optimize import leastsq
from numpy import *
from pylab import *
from scipy import *
from scipy import optimize
import numpy as N
import numpy.fft as F  # used as F.fft / F.ifft in expBroaden below
import pylab as P
data=pyfits.open('http://heasarc.gsfc.nasa.gov/docs/swift/results/transients/weak/GX304-1.orbit.lc.fits')
time = data[1].data.field(0)/86400. + data[1].header['MJDREFF'] + data[1].header['MJDREFI']
rate = data[1].data.field(1)
error = data[1].data.field(2)
data.close()
cond = ((time > 56200) & (time < 56220))
time=time[cond]
rate=rate[cond]
error=error[cond]
def expGauss(x, pos, wid, tConst, expMod = 0.5, amp = 1):
expMod *= 1.0
gNorm = amp * N.exp(-0.5*((x-pos)/(wid))**2)
g = expBroaden(gNorm, tConst, expMod)
return g, gNorm
def expBroaden(y, t, expMod):
fy = F.fft(y)
a = N.exp(-1*expMod*time/t)
fa = F.fft(a)
fy1 = fy*fa
yb = (F.ifft(fy1).real)/N.sum(a)
return yb
if __name__ == '__main__':
# Fit the first set
#p[0] -- amplitude, p[1] -- position, p[2] -- width
fitfuncG = lambda p, x: p[0]*N.exp(-0.5*(x-p[1])**2/p[2]**2) # Target function
errfuncG = lambda p, x, y: fitfuncG(p, x) - y # Distance to the target function
p0 = [0.20, 56210, 2.0] # Initial guess for the parameters
p1, success = optimize.leastsq(errfuncG, p0[:], args=(time, rate))
p1G = fitfuncG(p1, time)
# P.plot(rate, 'ro', alpha = 0.4, label = "Gaussian")
# P.plot(p1G, label = 'G-Fit')
def expGauss(x, pos, wid, tConst, expMod = 0.5, amp = 1):
#p[0] -- amplitude, p[1] -- position, p[2] -- width, p[3]--tConst, p[4] -- expMod
fitfuncExpG = lambda p, x: expGauss(x, p[1], p[2], p[3], p[4], p[0])[0]
errfuncExpG = lambda p, x, y: fitfuncExpG(p, x) - y # Distance to the target function
p0a = [0.20, 56210, 2.0] # Initial guess for the parameters
p1a, success = optimize.leastsq(errfuncExpG, p0a[:], args=(time, rate))
p1aG = fitfuncExpG(p1a, time)
print type(rate), type(time), len(rate), len(time)
P.plot(rate, 'go', alpha = 0.4, label = "ExpGaussian")
P.plot(p1aG, label = 'ExpG-Fit')
P.legend()
P.show()
I am sure to have confused the whole thing, so sorry in advance for that, but at this point I don't know how to go further...
The code takes the data from the web, so it is directly executable.
At the moment the code runs without any error, but it doesn't produce any plot.
Again, my goal is to fit the data with those two functions, how can I improve my code to do that?
Any suggestion is really appreciated.
Similarly to your other question, here too I would use a trigonometric function to fit this peak:
The following code works if pasted after your code:
import numpy as np
from scipy.optimize import curve_fit
x = time
den = x.max() - x.min()
x -= x.min()
y_points = rate
def func(x, a1, a2, a3):
return a1*sin(1*pi*x/den)+\
a2*sin(2*pi*x/den)+\
a3*sin(3*pi*x/den)
popt, pcov = curve_fit(func, x, y_points)
y = func(x, *popt)
plot(time,rate)
plot(x,y, color='r', linewidth=2.)
show()