I have implemented two functions, FFT and InverseFFT, in recursive form.
These are the functions:
import math
from math import e, pi

import numpy as np

def rfft(a):
    n = a.size
    if n == 1:
        return a
    i = 1j
    w_n = e ** (-2 * i * pi / float(n))
    w = 1
    a_0 = np.zeros(int(math.ceil(n / 2.0)), dtype=np.complex_)
    a_1 = np.zeros(n / 2, dtype=np.complex_)
    for index in range(0, n):
        if index % 2 == 0:
            a_0[index / 2] = a[index]
        else:
            a_1[index / 2] = a[index]
    y_0 = rfft(a_0)
    y_1 = rfft(a_1)
    y = np.zeros(n, dtype=np.complex_)
    for k in range(0, n / 2):
        y[k] = y_0[k] + w * y_1[k]
        y[k + n / 2] = y_0[k] - w * y_1[k]
        w = w * w_n
    return y
def rifft(y):
    n = y.size
    if n == 1:
        return y
    i = 1j
    w_n = e ** (2 * i * pi / float(n))
    w = 1
    y_0 = np.zeros(int(math.ceil(n / 2.0)), dtype=np.complex_)
    y_1 = np.zeros(n / 2, dtype=np.complex_)
    for index in range(0, n):
        if index % 2 == 0:
            y_0[index / 2] = y[index]
        else:
            y_1[index / 2] = y[index]
    a_0 = rifft(y_0)
    a_1 = rifft(y_1)
    a = np.zeros(n, dtype=np.complex_)
    for k in range(0, n / 2):
        a[k] = (a_0[k] + w * a_1[k]) / n
        a[k + n / 2] = (a_0[k] - w * a_1[k]) / n
        w = w * w_n
    return a
Based on the definition of the IFFT, converting the FFT function into an IFFT function can be done by changing -2*i*pi to 2*i*pi and dividing the result by N. The rfft() function works fine, but the rifft() function, after these modifications, does not work.
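For reference, this is the standard DFT/IDFT pair I am assuming, where the only differences are the sign of the exponent and the 1/N factor:

y[k] = sum over j of a[j] * e**(-2*i*pi*j*k/N)          (forward FFT)
a[j] = (1/N) * sum over k of y[k] * e**(2*i*pi*j*k/N)    (inverse FFT)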
I compare the output of my functions with the scipy.fftpack.fft and scipy.fftpack.ifft functions.
I feed the following NumPy array:
a = np.array([1, 0, -1, 3, 0, 0, 0, 0])
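For completeness, the comparison below is produced with a few direct calls (a sketch; it assumes scipy is installed):

from scipy.fftpack import fft, ifft

print(rfft(a))
print(fft(a))
print(rifft(a))
print(ifft(a))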
The following box shows the results of the rfft() function and the scipy.fftpack.fft function.
//rfft(a)
[ 3.00000000+0.j -1.12132034-1.12132034j 2.00000000+3.j 3.12132034-3.12132034j -3.00000000+0.j 3.12132034+3.12132034j 2.00000000-3.j -1.12132034+1.12132034j]
//scipy.fftpack.fft(a)
[ 3.00000000+0.j -1.12132034-1.12132034j 2.00000000+3.j 3.12132034-3.12132034j -3.00000000+0.j 3.12132034+3.12132034j 2.00000000-3.j -1.12132034+1.12132034j]
And this box shows the results of the rifft() function and the scipy.fftpack.ifft function.
//rifft(a)
[ 0.04687500+0.j -0.01752063+0.01752063j 0.03125000-0.046875j 0.04877063+0.04877063j -0.04687500+0.j 0.04877063-0.04877063j 0.03125000+0.046875j -0.01752063-0.01752063j]
//scipy.fftpack.ifft(a)
[ 0.37500000+0.j -0.14016504+0.14016504j 0.25000000-0.375j 0.39016504+0.39016504j -0.37500000+0.j 0.39016504-0.39016504j 0.25000000+0.375j -0.14016504-0.14016504j]
The division by the size N is a global scaling factor and should be performed on the result of the whole recursion, not at each stage of the recursion as you have done (you divide by a smaller n the deeper you go, which scales the result down too much overall). You can address this by removing the / n factor from the final loop of your original implementation and wrapping it in another function that performs the scaling:
def unscaledrifft(y):
    ...
    for k in range(0, n / 2):
        a[k] = (a_0[k] + w * a_1[k])
        a[k + n / 2] = (a_0[k] - w * a_1[k])
        w = w * w_n
    return a

def rifft(y):
    return unscaledrifft(y) / y.size
Alternatively, since you are performing a radix-2 FFT, the global factor N is a power of 2, N = 2**n, where n is the number of levels in the recursion. You can therefore divide by 2 at each stage of the recursion to achieve the same result:
def rifft(y):
    ...
    for k in range(0, n / 2):
        a[k] = (a_0[k] + w * a_1[k]) / 2
        a[k + n / 2] = (a_0[k] - w * a_1[k]) / 2
        w = w * w_n
    return a
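Either way, a quick sanity check is to compare the fixed rifft against scipy.fftpack.ifft and to verify that the round trip recovers the input (a sketch, assuming numpy and scipy are imported as above):

from scipy.fftpack import ifft

a = np.array([1, 0, -1, 3, 0, 0, 0, 0])
print(np.allclose(rifft(a), ifft(a)))   # should print True once the scaling is fixed
print(np.allclose(rifft(rfft(a)), a))   # the round trip should recover the input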
Related
I'm writing code that solves a heat equation using an implicit method. The problem is that the values between the first and last layer of the matrix are NaNs. What could be the problem?
From my point of view, the main issue might be with line 105, which represents the conversion of the original function to the one that includes the boundary function.
Boundary functions code:
def func(x, t):
    return x*(1 - x)*np.exp(-2*t)

# boundary functions for x = 0 and x = 1
def q0(t):
    return t*np.exp(-t/0.1)*np.cos(t)  # boundary condition at x = 0

def q1(t):
    return t*np.exp(-t/0.5)*np.cos(t)  # boundary condition at x = 1

def derivative(f, x0, step):
    return (f(x0+step) - f(x0))/step

# boundary function for t = 0
def u_x0(x):
    return (-x + 1)*x
Function that solves the tridiagonal matrix equation:
def solution(a, b):
    n = len(a)
    x = [0 for k in range(0, n)]
    # forward sweep
    v = [0 for k in range(0, n)]
    u = [0 for k in range(0, n)]
    # first row (t = 0)
    v[0] = a[0][1] / (-a[0][0])
    u[0] = (-b[0]) / (-a[0][0])
    for i in range(1, n - 1):
        v[i] = a[i][i+1] / (-a[i][i] - a[i][i-1]*v[i-1])
        u[i] = (a[i][i-1]*u[i-1] - b[i]) / (-a[i][i] - a[i][i-1]*v[i-1])
    # last row (t = 1)
    v[n-1] = 0
    u[n-1] = (a[n-1][n-2]*u[n-2] - b[n-1]) / (-a[n-1][n-1] - a[n-1][n-2]*v[n-2])
    # backward substitution
    x[n-1] = u[n-1]
    for i in range(n-1, 0, -1):
        x[i-1] = v[i-1] * x[i] + u[i-1]
    return x
Coefficient matrix values:
A = -t/h**2
B = 1 + 2*t/h**2
C = -t/h**2
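For reference, these coefficients correspond to the standard implicit (backward Euler) discretization of the heat equation, assuming t here is the time step and h the spatial step. Each interior row of the system reads:

A*u_new[j-1] + B*u_new[j] + C*u_new[j+1] = u_old[j]

where u_old and u_new (names used here only for illustration) are the solutions at the current and next time levels; this is the system that solution(a, b) is expected to solve at each step.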
Code that actually solves the matrix:
i = 1
X = []
while i < 99:
    X = solution(cool_array, f)
    k = 0
    while k < len(x_i):
        # line 105
        X[k] += 0.01*(func(x_i[k], x_i[i]) - (1 - x_i[i])*derivative(q0, x_i[i], 0.01) - (x_i[i])*derivative(q1, x_i[i], 0.01))
        k += 1
    a = 1
    while a < 98:
        w_h_t[i][a] = X[a]
        a += 1
    f = X
    f[0] = w_h_t[i][0]
    f[99] = w_h_t[i][99]
    i += 1
print(w_h_t)
As far as I understand, the algorithm in solution(a, b) is written properly, so I guess the problem is with the boundary functions or with line 105. The output I expect is at least an array of numbers, not NaNs.
The function get_cubic needs 4 points, and I need to find b and c by calculation (a and d are given).
Here is my code; I need help specifically with get_bezier_coef.
def get_bezier_coef(points):
    # since the formulas work given that we have n+1 points
    # then n must be this:
    n = len(points) - 1

    # build coefficients matrix
    C = 4 * np.identity(n)
    np.fill_diagonal(C[1:], 1)
    np.fill_diagonal(C[:, 1:], 1)
    C[0, 0] = 2
    C[n - 1, n - 1] = 7
    C[n - 1, n - 2] = 2

    # build points vector
    P = [2. * (2. * points[i] + points[i + 1]) for i in range(n)]
    P[0] = points[0] + 2 * points[1]
    P[n - 1] = 8 * points[n - 1] + points[n]

    # solve system, find A & B
    A = np.linalg.solve(C, P)
    B = [0] * n
    for i in range(n - 1):
        B[i] = 2. * points[i + 1] - A[i + 1]
    B[n - 1] = (A[n - 1] + points[n]) / 2.

    return A, B
# returns the general Bezier cubic formula given 4 control points
def get_cubic(a, b, c, d):
    return lambda t: np.power(1 - t, 3) * a + 3 * np.power(1 - t, 2) * t * b + 3 * (1 - t) * np.power(t, 2) * c + np.power(t, 3) * d

# return one cubic curve for each consecutive pair of points
def get_bezier_cubic(points):
    A, B = get_bezier_coef(points)
    return [
        get_cubic(points[i], A[i], B[i], points[i + 1])
        for i in range(len(points) - 1)
    ]
The function get_bezier_coef gets a list of points [(X0,Y0),(X1,Y1),...] and returns the coefficients of the Bezier curves (it finds the 2 control points between each start and end point). Is there any way to calculate the coefficients without matrices? Or any other way that will reduce the running time?
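For context, here is a minimal usage sketch of how these functions fit together (the sample points are made up, and numpy is assumed to be imported as np):

points = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])
curves = get_bezier_cubic(points)   # one callable cubic per consecutive pair of points
print(curves[0](0.5))               # evaluate the first segment at t = 0.5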
I'm having trouble with the slow computation of my Python code. Based on the pycallgraph below, the bottleneck seems to be the module named miepython.miepython.mie_S1_S2 (highlighted in pink), which takes 0.47 seconds per call.
The source code for this module is as follows:
import numpy as np
from numba import njit, int32, float64, complex128
__all__ = ('ez_mie',
           'ez_intensities',
           'generate_mie_costheta',
           'i_par',
           'i_per',
           'i_unpolarized',
           'mie',
           'mie_S1_S2',
           'mie_cdf',
           'mie_mu_with_uniform_cdf',
           )
@njit((complex128, float64, float64[:]), cache=True)
def _mie_S1_S2(m, x, mu):
    """
    Calculate the scattering amplitude functions for spheres.

    The amplitude functions have been normalized so that when integrated
    over all 4*pi solid angles, the integral will be qext*pi*x**2.

    The units are weird, sr**(-0.5)

    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere
        mu: array of angles, cos(theta), to calculate scattering amplitudes

    Returns:
        S1, S2: the scattering amplitudes at each angle mu [sr**(-0.5)]
    """
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)

    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)

    nstop = len(a)
    for k in range(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1) * pi_nm2
            S1[k] += (2 * n + 1) * (pi_nm1 * a[n - 1]
                                    + tau_nm1 * b[n - 1]) / (n + 1) / n
            S2[k] += (2 * n + 1) * (tau_nm1 * a[n - 1]
                                    + pi_nm1 * b[n - 1]) / (n + 1) / n
            temp = pi_nm1
            pi_nm1 = ((2 * n + 1) * mu[k] * pi_nm1 - (n + 1) * pi_nm2) / n
            pi_nm2 = temp

    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))

    S1 /= norm
    S2 /= norm

    return [S1, S2]
Apparently the source code is jitted by Numba, so it should be faster than it actually is. The total number of iterations of the nested for loops in this function is around 25,000 (len(mu) = 50, len(a) - 1 = 500).
Any ideas on how to speed up this computation? Is something hindering the fast computation of Numba? Or, do you think the computation is already fast enough?
[More details]
In the above, another function _mie_An_Bn is being used. This function is also jitted, and the source code is as follows:
@njit((complex128, float64, complex128[:], complex128[:]), cache=True)
def _mie_An_Bn(m, x, a, b):
    """
    Compute arrays of Mie coefficients A and B for a sphere.

    This estimates the size of the arrays based on Wiscombe's formula. The length
    of the arrays is chosen so that the error when the series are summed is
    around 1e-6.

    Args:
        m: the complex index of refraction of the sphere
        x: the size parameter of the sphere

    Returns:
        An, Bn: arrays of Mie coefficients
    """
    psi_nm1 = np.sin(x)              # nm1 = n-1 = 0
    psi_n = psi_nm1 / x - np.cos(x)  # n = 1
    xi_nm1 = complex(psi_nm1, np.cos(x))
    xi_n = complex(psi_n, np.cos(x) / x + np.sin(x))

    nstop = len(a)
    if m.real > 0.0:
        D = _D_calc(m, x, nstop + 1)

        for n in range(1, nstop):
            temp = D[n] / m + n / x
            a[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            temp = D[n] * m + n / x
            b[n - 1] = (temp * psi_n - psi_nm1) / (temp * xi_n - xi_nm1)
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
    else:
        for n in range(1, nstop):
            a[n - 1] = (n * psi_n / x - psi_nm1) / (n * xi_n / x - xi_nm1)
            b[n - 1] = psi_n / xi_n
            xi = (2 * n + 1) * xi_n / x - xi_nm1
            xi_nm1 = xi_n
            xi_n = xi
            psi_nm1 = psi_n
            psi_n = xi_n.real
The example inputs are like the following:

m = 1.336-2.462e-09j
x = 8526.95
mu = np.array([-1., -0.7500396, 0.46037385, 0.5988121, 0.67384093, 0.72468684,
               0.76421644, 0.79175856, 0.81723714, 0.83962897, 0.85924182, 0.87641596,
               0.89383665, 0.90708978, 0.91931481, 0.93067567, 0.94073113, 0.94961222,
               0.95689496, 0.96467123, 0.97138347, 0.97791831, 0.98339434, 0.98870543,
               0.99414948, 0.9975728, 0.9989995, 0.9989995, 0.9989995, 0.9989995,
               0.9989995, 0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899951,
               0.99899951, 0.99899951, 0.99899951, 0.99899951, 0.99899952, 0.99899952,
               0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952, 0.99899952,
               0.99899952, 1.])
I am focusing on _mie_S1_S2 since it appears to be the most expensive function on the provided example dataset.
First of all, you can pass the parameter fastmath=True to the JIT decorator to accelerate the computation, provided no values like +Inf, -Inf, -0 or NaN are computed.
Then you can pre-compute the expensive expressions containing divisions or implicit integer-to-float conversions. Note that (2 * n + 1) / n = 2 + 1/n and (n + 1) / n = 1 + 1/n, which can be used to reduce the number of precomputed arrays, although it did not change the performance on my machine (this may vary with the target architecture). Note also that such a precomputation has a slight impact on the result accuracy (most of the time negligible, and sometimes better than the reference implementation).
On my machine, this strategy makes the code 4.5 times faster with fastmath=True and 2.8 times faster without.
The k-based loop can be parallelized using Numba's parallel=True and prange. However, this may not always be faster on all machines (especially ones with a lot of cores) since the loop is already pretty fast.
Here is the final code:
import numba as nb  # needed for nb.prange below

@njit((complex128, float64, float64[:]), cache=True, parallel=True)
def _mie_S1_S2_opt(m, x, mu):
    nstop = int(x + 4.05 * x**0.33333 + 2.0) + 1
    a = np.zeros(nstop - 1, dtype=np.complex128)
    b = np.zeros(nstop - 1, dtype=np.complex128)
    _mie_An_Bn(m, x, a, b)

    nangles = len(mu)
    S1 = np.zeros(nangles, dtype=np.complex128)
    S2 = np.zeros(nangles, dtype=np.complex128)

    # pre-computed factors, hoisted out of the angle loop
    factor1 = np.empty(nstop, dtype=np.float64)
    factor2 = np.empty(nstop, dtype=np.float64)
    factor3 = np.empty(nstop, dtype=np.float64)

    for n in range(1, nstop):
        factor1[n - 1] = (2 * n + 1) / (n + 1) / n
        factor2[n - 1] = (2 * n + 1) / n
        factor3[n - 1] = (n + 1) / n

    nstop = len(a)
    for k in nb.prange(nangles):
        pi_nm2 = 0
        pi_nm1 = 1
        for n in range(1, nstop):
            i = n - 1
            tau_nm1 = n * mu[k] * pi_nm1 - (n + 1.0) * pi_nm2
            S1[k] += factor1[i] * (pi_nm1 * a[i] + tau_nm1 * b[i])
            S2[k] += factor1[i] * (tau_nm1 * a[i] + pi_nm1 * b[i])
            temp = pi_nm1
            pi_nm1 = factor2[i] * mu[k] * pi_nm1 - factor3[i] * pi_nm2
            pi_nm2 = temp

    # calculate norm = sqrt(pi * Qext * x**2)
    n = np.arange(1, nstop + 1)
    norm = np.sqrt(2 * np.pi * np.sum((2 * n + 1) * (a.real + b.real)))

    S1 /= norm
    S2 /= norm

    return [S1, S2]
%timeit -n 1000 _mie_S1_S2_opt(m, x, mu)
On my machine with 6 cores, the final optimized implementation is 12 times faster with fastmath=True and 8.8 times faster without. Note that using similar strategies in other functions may also help to speed them up.
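As a sanity check (a sketch, assuming the original _mie_S1_S2 is still defined), it is worth confirming that the optimized version produces the same results before trusting the speedup:

S1_ref, S2_ref = _mie_S1_S2(m, x, mu)
S1_opt, S2_opt = _mie_S1_S2_opt(m, x, mu)
print(np.allclose(S1_ref, S1_opt), np.allclose(S2_ref, S2_opt))  # expect True True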
I have used the equation of motion (Newton's law) for a simple spring-and-mass scenario, incorporating it into the given 2nd-order ODE y" + (k/m)x = 0; y(0) = 3; y'(0) = 0.
I have then been able to run code that calculates the exact solution and compares it with the Runge-Kutta solution.
It works fine... however, I have recently been asked not to separate my values of 'x' and 'v', but to use a single vector 'x' that has two components (i.e. 'x' and 'v' can be handled by x(1) and x(2)).
MY CODE:
import numpy as np

# Given is y" + (k/m)x = 0; y(0) = 3; y'(0) = 0

# Parameters
h = 0.01     # step size
t = 100.0    # time (sec)
k = 1
m = 1
x0 = 3
v0 = 0

# Exact analytical solution
te = np.arange(0, t, h)
N = len(te)
w = (k / m) ** 0.5
x_exact = x0 * np.cos(w * te)
v_exact = -x0 * w * np.sin(w * te)

# Runge-Kutta method
x = np.empty(N)
v = np.empty(N)
x[0] = x0
v[0] = v0

def f1(t, x, v):
    x = v
    return x

def f2(t, x, v):
    v = -(k / m) * x
    return v

for i in range(N - 1):  # MAIN LOOP
    K1x = f1(te[i], x[i], v[i])
    K1v = f2(te[i], x[i], v[i])
    K2x = f1(te[i] + h / 2, x[i] + h * K1x / 2, v[i] + h * K1v / 2)
    K2v = f2(te[i] + h / 2, x[i] + h * K1x / 2, v[i] + h * K1v / 2)
    K3x = f1(te[i] + h / 2, x[i] + h * K2x / 2, v[i] + h * K2v / 2)
    K3v = f2(te[i] + h / 2, x[i] + h * K2x / 2, v[i] + h * K2v / 2)
    K4x = f1(te[i] + h, x[i] + h * K3x, v[i] + h * K3v)
    K4v = f2(te[i] + h, x[i] + h * K3x, v[i] + h * K3v)
    x[i + 1] = x[i] + h / 6 * (K1x + 2 * K2x + 2 * K3x + K4x)
    v[i + 1] = v[i] + h / 6 * (K1v + 2 * K2v + 2 * K3v + K4v)
Can anyone help me understand how I can create this single vector having 2 dimensions, and how to fix my code up please?
You can use the np.array() function; here is an example of what you're trying to do:
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
I'm unsure of your exact expectations beyond having two lists inside a single array, but I hope this link will help answer your issue:
https://www.tutorialspoint.com/python_data_structure/python_2darray.htm?
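To go a bit further, here is a minimal sketch (under the assumption that this is what is meant by a single two-component vector) of how the RK4 loop above could be rewritten with one state vector u = [x, v] and a single derivative function; it reuses h, te, N, k, m, x0 and v0 from the question:

def f(t, u):
    # u[0] is the position x, u[1] is the velocity v
    return np.array([u[1], -(k / m) * u[0]])

u = np.empty((N, 2))
u[0] = [x0, v0]
for i in range(N - 1):
    K1 = f(te[i], u[i])
    K2 = f(te[i] + h / 2, u[i] + h * K1 / 2)
    K3 = f(te[i] + h / 2, u[i] + h * K2 / 2)
    K4 = f(te[i] + h, u[i] + h * K3)
    u[i + 1] = u[i] + h / 6 * (K1 + 2 * K2 + 2 * K3 + K4)
x_rk, v_rk = u[:, 0], u[:, 1]   # same roles as x and v in the original code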
I am trying to calculate g(x_(i+2)) from the values g(x_(i+1)) and g(x_i), where i is an integer, assuming I(x) and s(x) are Gaussian functions. If x_i = 100, the summation runs from 0 to 100. I don't know how to handle g(x_i) with the subscript in Python: knowing the first and second values, we can find the third, and after n cycles we can find the nth value.
Equation:
code:
import numpy as np
from matplotlib import pyplot as p
from math import pi

def f_s(x, mu_s, sig_s):
    ss = -np.power(x - mu_s, 2) / (2 * np.power(sig_s, 2))
    return np.exp(ss) / (np.power(2 * pi, 2) * sig_s)

def f_i(x, mu_i, sig_i):
    ii = -np.power(x - mu_i, 2) / (2 * np.power(sig_i, 2))
    return np.exp(ii) / (np.power(2 * pi, 2) * sig_i)

# problems occur in this part
def g(x, m, mu_s, sig_s, mu_i, sig_i):
    for i in range(1, m):  # specify the number x, x_1, x_2, x_3, ..., x_m
        h = (x[i + 1] - x[i]) / e
        for n in range(0, x[i]):  # calculate summation
            sum_f = (f_i(x[i], mu_i, sig_i) - f_s(x[i] - n, mu_s, sig_s) * g_x[n]) * np.conj(f_s(n + x[i], mu_s, sig_s))
        g_x[1] = 1  # initial value
        g_x[2] = 5
        g_x[i + 2] = h * sum_f + 2 * g_x[i + 1] - g_x[i]
    return g_x[i + 2]

x = np.linspace(-10, 10, 10000)
e = 1
d = 0.01
m = 1000
mu_s = 2
sig_s = 1
mu_i = 1
sig_i = 1

p.plot(x, g(x, m, mu_s, sig_s, mu_i, sig_i))
p.legend()
p.show()
Result: (plot of I(x) and s(x))
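Regarding the subscript handling itself, the usual pattern is to preallocate an array and fill it index by index. A minimal sketch follows (the length and initial values here are only placeholders, and the real update would add the h * sum_f term as in the code above):

import numpy as np

n_steps = 1000
g_x = np.zeros(n_steps)
g_x[0] = 1.0   # assumed initial values
g_x[1] = 5.0
for i in range(n_steps - 2):
    # placeholder two-step recurrence; the real one is h * sum_f + 2 * g_x[i + 1] - g_x[i]
    g_x[i + 2] = 2 * g_x[i + 1] - g_x[i]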