I am currently trying to understand the fft function from numpy. For that I tested the following assumption:
I have two functions, f(x) = x^2 and g(x) = f'(x) = 2*x. According to the Fourier transform rules and Wolfram Alpha, it should be that G(w) = 2*pi*i*F(w) (prefactors can vary, but there should only be a constant factor). Implementing that in Python, I write
import numpy as np
def x2(x):
    return x * x

def nx(x):
    return 2 * x
a = np.linspace(-3, 3, 16)
a1 = x2(a)
a2 = nx(a)
b1 = np.fft.fft(a1)
b2 = np.fft.fft(a2)
c = b1/b2
Now I am expecting a nearly constant value for c, but I get
array([ 1.02081592e+16+0.j , 1.32769987e-16-1.0054679j ,
4.90653893e-17-0.48284271j, -1.28214041e-16-0.29932115j,
-1.21430643e-16-0.2j , 5.63664751e-16-0.13363573j,
-5.92271642e-17-0.08284271j, -4.21346622e-16-0.03978247j,
-5.55111512e-16-0.j , -5.04781597e-16+0.03978247j,
-6.29288619e-17+0.08284271j, 8.39500693e-16+0.13363573j,
-1.21430643e-16+0.2j , -0.00000000e+00+0.29932115j,
-0.00000000e+00+0.48284271j, 1.32769987e-16+1.0054679j ])
Where is my mistake, and what can I do to use the fft as intended?
The properties you give apply to the Continuous Fourier transform (CFT). What is computed by the FFT is the Discrete Fourier transform (DFT), which is related to the CFT but is not exactly equivalent.
It's true that the DFT is proportional to the CFT under certain conditions: namely with sufficient sampling of a function that is zero outside the sample limits (see e.g. Appendix E of this book).
Neither condition holds for the functions you propose above, so the DFT is not proportional to the CFT and your numerical results reflect that.
Here's some code that confirms via the FFT the relationship you're interested in, using an appropriately sampled band-limited function:
import numpy as np
def f(x):
    return np.exp(-x ** 2)

def fprime(x):
    return -2 * x * f(x)
a = np.linspace(-10, 10, 100)
a1 = f(a)
a2 = fprime(a)
b1 = np.fft.fft(a1)
b2 = np.fft.fft(a2)
omega = 2 * np.pi * np.fft.fftfreq(len(a), a[1] - a[0])
np.allclose(b1 * 1j * omega, b2)
# True
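For contrast, here is the same check run on the original x**2 example (a quick sketch reusing the question's 16-point grid); it fails, as expected for a function that is neither band-limited nor zero at the sample boundaries:
a = np.linspace(-3, 3, 16)
b1 = np.fft.fft(a * a)
b2 = np.fft.fft(2 * a)
omega = 2 * np.pi * np.fft.fftfreq(len(a), a[1] - a[0])
np.allclose(b1 * 1j * omega, b2)
# False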
I'm trying to do a particle-in-a-box simulation with no potential field. It took me some time to find out that simple explicit and implicit methods break unitary time evolution, so I resorted to Crank-Nicolson, which is supposed to be unitary. But when I try it I find that it still is not. I'm not sure what I'm missing. The formulation I used is this:
(I - alpha*T) psi^(n+1) = (I + alpha*T) psi^n,    with alpha = (i*dt) / (2*dx^2),

where T is the tridiagonal Toeplitz matrix for the second derivative wrt x. The system simplifies to A psi^(n+1) = B psi^n. The A and B matrices are:

A = I - alpha*T  (diagonal 1 + 2*alpha, off-diagonals -alpha)
B = I + alpha*T  (diagonal 1 - 2*alpha, off-diagonals +alpha)
I just solve this linear system for psi^(n+1) using the sparse module. The math makes sense, and I found the same numerical scheme in some papers, so that led me to believe my code is where the problem is.
Here's my code so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz
from scipy.sparse.linalg import spsolve
from scipy import sparse
# Spatial discretisation
N = 100
x = np.linspace(0, 1, N)
dx = x[1] - x[0]
# Time discretisation
K = 10000
t = np.linspace(0, 10, K)
dt = t[1] - t[0]
alpha = (1j * dt) / (2 * (dx ** 2))
A = sparse.csc_matrix(toeplitz([1 + 2 * alpha, -alpha, *np.zeros(N-4)]), dtype=np.cfloat) # 2 less for both boundaries
B = sparse.csc_matrix(toeplitz([1 - 2 * alpha, alpha, *np.zeros(N-4)]), dtype=np.cfloat)
# Initial and boundary conditions (localized gaussian)
psi = np.exp((1j * 50 * x) - (200 * (x - .5) ** 2))
b = B.dot(psi[1:-1])
psi[0], psi[-1] = 0, 0
for index, step in enumerate(t):
    # Within the domain
    psi[1:-1] = spsolve(A, b)
    # Enforce boundaries
    # psi[0], psi[N - 1] = 0, 0
    b = B.dot(psi[1:-1])
# Square integration to show if it's unitary
print(np.trapz(np.abs(psi) ** 2, dx=dx))
You are relying on the Toeplitz constructor to produce a symmetric matrix, so that the entries below the diagonal are the same as those above it. However, the documentation for scipy.linalg.toeplitz(c, r=None) does not say "transpose", but
"If r is not given, r == conjugate(c) is assumed."
so the resulting matrix is self-adjoint (Hermitian). Since alpha is purely imaginary, this means the entries above the diagonal have their sign switched.
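A tiny demonstration of the pitfall (a sketch with a made-up, purely imaginary alpha):
import numpy as np
from scipy.linalg import toeplitz

alpha = 0.5j
M = toeplitz([1 + 2 * alpha, -alpha, 0])
# The subdiagonal entry is -0.5j as intended, but the superdiagonal entry
# is conjugate(-0.5j) = +0.5j: the matrix is Hermitian, not symmetric.
print(M[1, 0], M[0, 1])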
It also makes little sense to first construct a dense matrix and then extract a sparse representation from it. Construct the matrices as sparse tridiagonal matrices from the start, using scipy.sparse.diags:
A = sparse.diags([(N-3) * [-alpha], (N-2) * [1 + 2*alpha], (N-3) * [-alpha]], [-1, 0, 1], format="csc")
B = sparse.diags([(N-3) * [ alpha], (N-2) * [1 - 2*alpha], (N-3) * [ alpha]], [-1, 0, 1], format="csc")
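With these corrected matrices, A = I - alpha*T and B = I + alpha*T, where alpha*T is skew-Hermitian (alpha is purely imaginary and T is real symmetric), so the update matrix inv(A) @ B is a Cayley transform and hence unitary. A quick sanity check (a sketch, assuming N and the matrices above are in scope; the dense inverse is fine at this small size):
import numpy as np

M = np.linalg.inv(A.toarray()) @ B.toarray()
print(np.allclose(M.conj().T @ M, np.eye(N - 2)))
# True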
I'm trying to approximate an empirical cumulative distribution function (the ECDF I want to approximate) with a smooth function (with fewer than 5 parameters), such as the generalized logistic function.
However, using scipy.optimize.curve_fit, the fitting operation gives really bad approximations or doesn't work at all (depending on the initial values). The variable series represents my data, stored as a pandas.Series.
import numpy as np
from scipy.optimize import curve_fit

def fit_ecdf(x):
    x = np.sort(x)
    def result(v):
        return np.searchsorted(x, v, side='right') / x.size
    return result

ecdf = fit_ecdf(series)

def genlogistic(x, B, M, Q, v):
    return 1 / (1 + Q * np.exp(-B * (x - M))) ** (1 / v)
params = curve_fit(genlogistic, xdata = series, ydata = ecdf(series), p0 = (0.1, 10.0, 0.1, 0.1))[0]
Should I use another type of function for the fit?
Are there any code mistakes?
UPDATE - 1
As asked, I link to a csv containing the data.
UPDATE - 2
After a lot of searching and trial and error I found this function:
f(x; a, b, c) = 1 - 1 / (1 + (x / b) ** a) ** c
with a = 4.61320000, b = 2.94570952, c = 0.5886922
which fits a lot better than the other one. The only problem is the little step that the ECDF shows near x = 1. How can I modify f to improve the quality of the fit there? I was thinking of adding some sort of function that is "relevant" only around those points. Here are the graphical results of the fit, where the solid blue line is the ECDF and the dotted line represents the (x, f(x)) points.
I found out how to deal with that little step near x = 1. As suggested in the question, adding some sort of function that is significant only in that interval was the game changer.
The "step" ends at about (1.7, 0.04), so I needed a function that flattens out for x > 1.7 and has y = 0.04 as an asymptote. The natural choice (just to stay on point) was a function like f(x) = 1/exp(x).
Thanks to JamesPhillips, I also picked the proper data for the regression (no duplicate values = no overweighted points).
Python Code
import numpy as np
from scipy.optimize import curve_fit

def fit_ecdf(x):
    x = np.sort(x)
    def result(v):
        return np.searchsorted(x, v, side='right') / x.size
    return result

ecdf = fit_ecdf(series)
unique_series = series.unique().tolist()

def cdf_interpolation(x, a, b, c, d):
    f_1 = 0.95 + (0 - 0.95) / (1 + (x / b) ** a) ** c + 0.05
    f_2 = (0 - 0.05) / np.exp(d * x)
    return f_1 + f_2

params = curve_fit(cdf_interpolation,
                   xdata=unique_series,
                   ydata=ecdf(unique_series),
                   p0=(6.0, 3.0, 0.4, 1.0))[0]
Parameters
a = 6.03256462
b = 2.89418871
c = 0.42997956
d = 1.06864006
Graphical results
I got an OK fit for a 5-parameter logistic equation (see image and code below) using unique values; I'm not sure if the low-end curve is sufficient for your needs, please check.
import numpy as np

def Sigmoidal_FiveParameterLogistic_model(x_in):  # from zunzun.com
    # coefficients
    a = 9.9220221252324947E-01
    b = -3.1572339989462903E+00
    c = 2.2303376075685142E+00
    d = 2.6271495036080207E-02
    f = 3.4399008905318986E+00
    return d + (a - d) / np.power(1.0 + np.power(x_in / c, b), f)
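As a rough usage sketch (assuming series and the ecdf helper from the question are in scope), you can eyeball the fit like this:
import matplotlib.pyplot as plt

xs = np.sort(series.unique())
plt.plot(xs, ecdf(xs), label='ECDF (unique values)')
plt.plot(xs, Sigmoidal_FiveParameterLogistic_model(xs), '--', label='5PL fit')
plt.legend()
plt.show()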
I have a system of ODEs that depend on a matrix of data. Each ODE should reference a different column of data in its evaluation.
import numpy as np
from scipy.integrate import ode

n_eqns = 20
coeffs = np.random.normal(0, 1, (n_eqns, 20))

def dfdt(_, f, idx):
    return (f ** 2) * coeffs[idx, :].sum() - 2 * f * coeffs.sum()

f0 = np.random.uniform(-1, 1, n_eqns)
t0 = 0
tf = 1
dt = 0.001

r = ode(dfdt)
r.set_initial_value(f0, t0).set_f_params(range(n_eqns))

while r.successful() and r.t < tf:
    print(r.t + dt, r.integrate(r.t + dt))
How can I specify that each ODE should use the idx value associated with its index in the system of ODEs? The first equation should be passed idx=0, the second idx=1, and so on.
The function dfdt takes and returns the state and derivative, respectively, as arrays (or other iterables). Thus, all you have to do is loop over all indices and apply your operations accordingly. For example:
def dfdt(t, f):
    output = np.empty_like(f)
    for i, entry in enumerate(f):
        output[i] = entry ** 2 * coeffs[i, :].sum() - 2 * entry * coeffs.sum()
    return output
You can also write this using NumPy’s component-wise operations (which is quicker):
def dfdt(t, f):
    return f ** 2 * coeffs.sum(axis=1) - 2 * f * coeffs.sum()
Finally note that using f for your state may be somewhat confusing since this is how ode denotes the derivative (which you call dfdt).
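For completeness, a minimal sketch of how the vectorized version plugs into the integrator from the question; since idx is gone, set_f_params is no longer needed:
from scipy.integrate import ode

r = ode(dfdt)
r.set_initial_value(f0, t0)
while r.successful() and r.t < tf:
    print(r.t + dt, r.integrate(r.t + dt))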
I am trying to fit the following function (a detrended SNR model) to my data:

y(x) = (C1*sin(4*pi*h*x/lambda1) + C2*cos(4*pi*h*x/lambda1)) * exp(-4*k^2*lambda_big*x^2)

C1, C2 and h are the parameters I need to obtain from leastsq. C1 and C2 are simple, but the problem is that my h(t) is in reality

h(t) = sum_j h_j * B_j(t),

a sum of basis B-splines, each one weighted differently; the number of coefficients equals the number of knots of the B-spline. What I want to obtain are the coefficients h_j inside that function (in my case there are 35 different h_j's). As I want to obtain C1, C2 and h_1..h_35, I do the following:
funcLine = lambda tpl, eix_x: (tpl[0]*np.sin((4*math.pi*np.sum(bsplines_evaluades * np.transpose([tpl[2],tpl[3],tpl[4],tpl[5],tpl[6],tpl[7],tpl[8],tpl[9],tpl[10],tpl[11],tpl[12],tpl[13],tpl[14],tpl[15],tpl[16],tpl[17],tpl[18],tpl[19],tpl[20],tpl[21],tpl[22],tpl[23],tpl[24],tpl[25],tpl[26],tpl[27],tpl[28],tpl[29],tpl[30],tpl[31],tpl[32],tpl[33],tpl[34],tpl[35],tpl[36],tpl[37]]) , axis=0))*eix_x/lambda1) + tpl[1]*np.cos((4*math.pi*np.sum(bsplines_evaluades * np.transpose([tpl[2],tpl[3],tpl[4],tpl[5],tpl[6],tpl[7],tpl[8],tpl[9],tpl[10],tpl[11],tpl[12],tpl[13],tpl[14],tpl[15],tpl[16],tpl[17],tpl[18],tpl[19],tpl[20],tpl[21],tpl[22],tpl[23],tpl[24],tpl[25],tpl[26],tpl[27],tpl[28],tpl[29],tpl[30],tpl[31],tpl[32],tpl[33],tpl[34],tpl[35],tpl[36],tpl[37]]) , axis=0))*eix_x/lambda1))*np.exp(-4*np.power(k, 2)*lambda_big*np.power(eix_x, 2))
func = funcLine
ErrorFunc = lambda tpl, eix_x, ydata: np.power(func(tpl, eix_x) - ydata,2)
tplFinal1, success = leastsq(ErrorFunc, [2, -2, 8.2*np.ones(35)], args=(eix_x, ydata))
tpl[0] = C1, tpl[1] = C2 and tpl[2:] = my coefficients. bsplines_evaluades is a matrix [35, 86000] where each row is the temporal function of one basis B-spline, so I weight each row with its individual coefficient; 86000 is the length of eix_x. ydata(eix_x) is the function I want to approximate. lambda1 = 0.1903; lambda_big = 2; k = 2*pi/lambda1. The output is just the initial parameters again, which makes no sense.
Can anyone help me? I have tried with curvefit too but it does not work.
Data is in: http://www.filedropper.com/data_5
EDIT
The code right now is:
lambda1 = 0.1903
k = 2 * math.pi / lambda1
lambda_big = 2
def funcLine(tpl, eix_x):
    C1, C2, h = tpl[0], tpl[1], tpl[2:]
    hsum = np.sum(bsplines_evaluades * h, axis=1)  # weight each basis spline
    theta = 4 * np.pi * np.array(hsum) * np.array(eix_x) / lambda1
    return (C1 * np.sin(theta) + C2 * np.cos(theta)) * np.exp(-4 * lambda_big * (k * eix_x) ** 2)

if len(eix_x) != 0:
    ErrorFunc = lambda tpl, eix_x, ydata: funcLine(tpl, eix_x) - ydata
    param_values = 7.5 * np.ones(37)
    param_values[0] = 2
    param_values[1] = -2
    tplFinal2, success = leastsq(ErrorFunc, param_values, args=(eix_x, ydata))
The problem is that the output parameters don't change with respect to the initial ones. Data (x_axis, ydata, bsplines_evaluades):
gist.github.com/hect1995/dcd36a4237fe57791d996bd70e7a9fc7 gist.github.com/hect1995/39ae4768ebb32c27f1ddea97e24d96af gist.github.com/hect1995/bddd02de567f8fcbedc752371b47ff71
It's always helpful (to yourself as well as to us) to provide a readable example that is complete enough to be runnable. Your example, with lambdas and a very long line, is definitely not readable, making it very easy for you to miss simple mistakes. One of the points of using Python is to make code easier to read.
It is fine to have spline coefficients as fit variables, but you want the np.ndarray of variables to be exactly 1-dimensional. So your parameter array should be
param_values = 8.2 * np.ones(37)
param_values[0] = 2
param_values[1] = -2.
result_params, success = leastsq(errorFunc, param_values, ....)
It should be fine to use curve_fit() too. Beyond that, it's hard to give much help, as you give neither a complete runnable program (leaving lots of terms undefined), nor the output or error messages from running your code.
Several things could be wrong here: I'm not sure you're indexing your tpl array correctly (if it has 37 entries, the indexes should be 0:36). And your errorFunc should probably return the residual rather than the square residual.
Finally, I think your h-sum may be incorrect: you want to sum over the $N$ axis but not the $x$ axis, right?
You might tidy up your code as follows and see if it helps (without some data it's difficult to test for myself):
def funcLine(tpl, eix_x):
    C1, C2, h = tpl[0], tpl[1], tpl[2:]
    hsum = np.sum(bsplines_evaluades * h, axis=1)
    theta = 4 * np.pi * hsum * eix_x / lambda1
    return (C1 * np.sin(theta) + C2 * np.cos(theta)) * np.exp(-4 * lambda_big * (k * eix_x) ** 2)

errorFunc = lambda tpl, eix_x, ydata: funcLine(tpl, eix_x) - ydata
tplFinal2, success = leastsq(errorFunc, param_values, args=(eix_x, ydata))
Short summary: How do I quickly calculate the finite convolution of two arrays?
Problem description
I am trying to obtain the finite (causal) convolution of two functions f(x), g(x), defined by

(f * g)(x) = integral from 0 to x of f(x') g(x - x') dx'

To achieve this, I have taken discrete samples of the functions and turned them into arrays of length steps:
xarray = [xmax * i / steps for i in range(steps)]  # xmax is the upper integration limit
farray = [f(x) for x in xarray]
garray = [g(x) for x in xarray]
I then tried to calculate the convolution using the scipy.signal.convolve function. This function gives the same results as the algorithm conv suggested here. However, the results differ considerably from analytical solutions. Modifying the algorithm conv to use the trapezoidal rule gives the desired results.
To illustrate this, I let
f(x) = exp(-x)
g(x) = 2 * exp(-2 * x)
the results are:
Here Riemann represents a simple Riemann sum, trapezoidal is a modified version of the Riemann algorithm to use the trapezoidal rule, scipy.signal.convolve is the scipy function and analytical is the analytical convolution.
Now let g(x) = x^2 * exp(-x) and the results become:
Here 'ratio' is the ratio of the values obtained from scipy to the analytical values. The above demonstrates that the problem cannot be solved by renormalising the integral.
The question
Is it possible to use the speed of scipy but retain the better results of a trapezoidal rule or do I have to write a C extension to achieve the desired results?
An example
Just copy and paste the code below to see the problem I am encountering. The two results can be brought into closer agreement by increasing the steps variable. I believe that the problem is due to artefacts from right-hand Riemann sums: the integral is overestimated while it is increasing and approaches the analytical solution again as it is decreasing.
EDIT: I have now included the original algorithm as a comparison; it gives the same results as the scipy.signal.convolve function.
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import math
def convolveoriginal(x, y):
    '''
    The original algorithm from http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P, Q, N = len(x), len(y), len(x) + len(y) - 1
    z = []
    for k in range(N):
        t, lower, upper = 0, max(0, k - (Q - 1)), min(P - 1, k)
        for i in range(lower, upper + 1):
            t = t + x[i] * y[k - i]
        z.append(t)
    return np.array(z)  #Modified to include conversion to numpy array

def convolve(y1, y2, dx=None):
    '''
    Compute the finite convolution of two signals of equal length.
    #param y1: First signal.
    #param y2: Second signal.
    #param dx: [optional] Integration step width.
    #note: Based on the algorithm at http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P = len(y1)  #Determine the length of the signal
    z = []  #Create a list of convolution values
    for k in range(P):
        t = 0
        lower = max(0, k - (P - 1))
        upper = min(P - 1, k)
        for i in range(lower, upper):
            t += (y1[i] * y2[k - i] + y1[i + 1] * y2[k - (i + 1)]) / 2
        z.append(t)
    z = np.array(z)  #Convert to a numpy array
    if dx is not None:  #Is a step width specified?
        z *= dx
    return z
steps = 50 #Number of integration steps
maxtime = 5 #Maximum time
dt = float(maxtime) / steps #Obtain the width of a time step
time = [dt * i for i in range(steps)] #Create an array of times
exp1 = [math.exp(-t) for t in time] #Create an array of function values
exp2 = [2 * math.exp(-2 * t) for t in time]
#Calculate the analytical expression
analytical = [2 * math.exp(-2 * t) * (-1 + math.exp(t)) for t in time]
#Calculate the trapezoidal convolution
trapezoidal = convolve(exp1, exp2, dt)
#Calculate the scipy convolution
sci = signal.convolve(exp1, exp2, mode = 'full')
#Slice the first half to obtain the causal convolution and multiply by dt
#to account for the step width
sci = sci[0:steps] * dt
#Calculate the convolution using the original Riemann sum algorithm
riemann = convolveoriginal(exp1, exp2)
riemann = riemann[0:steps] * dt
#Plot
plt.plot(time, analytical, label = 'analytical')
plt.plot(time, trapezoidal, 'o', label = 'trapezoidal')
plt.plot(time, riemann, 'o', label = 'Riemann')
plt.plot(time, sci, '.', label = 'scipy.signal.convolve')
plt.legend()
plt.show()
Thank you for your time!
Or, for those who prefer numpy to C: it will be slower than the C implementation, but it's just a few lines.
>>> t = np.linspace(0, maxtime-dt, 50)
>>> fx = np.exp(-np.array(t))
>>> gx = 2*np.exp(-2*np.array(t))
>>> analytical = 2 * np.exp(-2 * t) * (-1 + np.exp(t))
This looks like the trapezoidal rule in this case (but I didn't check the math):
>>> s2a = signal.convolve(fx[1:], gx, 'full')*dt
>>> s2b = signal.convolve(fx, gx[1:], 'full')*dt
>>> s = (s2a+s2b)/2
>>> s[:10]
array([ 0.17235682, 0.29706872, 0.38433313, 0.44235042, 0.47770012,
0.49564748, 0.50039326, 0.49527721, 0.48294359, 0.46547582])
>>> analytical[:10]
array([ 0. , 0.17221333, 0.29682141, 0.38401317, 0.44198216,
0.47730244, 0.49523485, 0.49997668, 0.49486489, 0.48254154])
largest absolute error:
>>> np.max(np.abs(s[:len(analytical)-1] - analytical[1:]))
0.00041657780840698155
>>> np.argmax(np.abs(s[:len(analytical)-1] - analytical[1:]))
6
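In fact, up to a one-sample shift this appears to match the question's trapezoidal convolve exactly. A quick numerical check (assuming exp1, exp2, dt, steps and convolve from the question's script are in scope):
>>> trap = convolve(exp1, exp2, dt)
>>> np.allclose(s[:steps - 1], trap[1:])  # True if the two schemes coincide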
Short answer: Write it in C!
Long answer
Using the cookbook about numpy arrays I rewrote the trapezoidal convolution method in C. In order to use the C code one requires three files (https://gist.github.com/1626919)
The C code (performancemodule.c).
The setup file to build the code and make it callable from python (performancemodulesetup.py).
The python file that makes use of the C extension (performancetest.py)
After downloading, the code should run once you do the following:
Adjust the include path in performancemodule.c.
Run the following
python performancemodulesetup.py build
python performancetest.py
You may have to copy the library file performancemodule.so or performancemodule.dll into the same directory as performancetest.py.
Results and performance
The results agree neatly with one another as shown below:
The performance of the C method is even better than scipy's convolve method. Running 10k convolutions with array length 50 requires:

convolve (pure Python)        81 s 349969 µs
scipy.signal.convolve          1 s 962599 µs
convolve in C                  0 s  87024 µs
Thus, the C implementation is about 1000 times faster than the Python implementation and a bit more than 20 times as fast as the scipy implementation (admittedly, the scipy implementation is more versatile).
EDIT: This does not solve the original question exactly but is sufficient for my purposes.