FFT to find autocorrelation function - python

I am trying to find the correlation function of the following stochastic process (the equation below is reconstructed from the Euler step in my code):

dv/dt = -beta + sqrt(2*D) * xi(t), with a reflecting boundary at v = 0,

where beta and D are constants and xi(t) is a Gaussian white noise term.
After simulating this process with the Euler method, I want to find the autocorrelation function of this process. First of all, I found an analytical solution for the correlation function and also used the definition of the correlation function to compute it numerically, and the two results were pretty close (please see the photo; the corresponding code is at the end of this post).
(Figure 1)
Now I want to use the Wiener-Khinchin theorem (using the FFT) to find the correlation function: take the FFT of the realization, multiply it by its conjugate, and then take the IFFT to get the correlation function. But I am getting results that are way off the expected correlation function, so I am pretty sure there is something I misunderstood in the code that produces these wrong results.
Here is my code for the solution of the stochastic process (which I am sure is right, although my code might be sloppy) and my attempt to find the autocorrelation with the FFT:
import math
import random
import numpy as np
import matplotlib.pyplot as plt

N = 1000000
dt = 0.01
gamma = 1
D = 1
v_data = []
v_factor = math.sqrt(2*D*dt)
v = 1
for t in range(N):
    F = random.gauss(0, 1)
    v = v - gamma*dt + v_factor*F
    if v < 0:  # reflecting boundary condition at v = 0
        v = -v
    v_data.append(v)
def S(x, dt):  # power spectrum
    N = len(x)
    fft = np.fft.fft(x)
    s = fft*np.conjugate(fft)
    # n = N*np.ones(N) - np.arange(0, N)  # divide res(m) by (N-m)
    return s.real/N

c = np.fft.ifft(S(v_data, 0.01))  # correlation function
t = np.linspace(0, 1000, len(c))
plt.plot(t, c.real, label='fft method')
plt.xlim(0, 20)
plt.legend()
plt.show()
And this is what I get for the correlation function using this method:
And this is my code for the correlation function using the definition:
from scipy import special

def c_theo(t, b, d):  # obtained by integrating the solution of the SDE
    I1 = ((-t*d) + ((d**2)/(b**2)) - ((1/4)*(b**2)*(t**2)))*special.erfc(b*t/(2*np.sqrt(d*t)))
    I2 = (((d/b)*(np.sqrt(d*t/np.pi))) + ((1/2)*(b*t)*(np.sqrt(d*t/np.pi))))*np.exp(-((b**2)*t)/(4*d))
    return I1 + I2
# this is the correlation function that was plotted in Figure 1 using the
# definition of the autocorrelation
Ntau = 500
sum2 = np.zeros(Ntau)
c = np.zeros(Ntau)

v_mean = 0
for i in range(0, N):
    v_mean = v_mean + v_data[i]
v_mean = v_mean/N

for itau in range(0, Ntau):
    for i in range(0, N - 10*itau):
        sum2[itau] = sum2[itau] + v_data[i]*v_data[itau*10 + i]
    sum2[itau] = sum2[itau]/(N - itau*10)
    c[itau] = sum2[itau] - v_mean**2

t = np.arange(Ntau)*dt*10
plt.plot(t, c, label='numerically')
plt.plot(t, c_theo(t, 1, 1), label='analytically')
plt.legend()
plt.show()
So would someone please point out where the mistake in my code is, and how I could simulate this better to get the right correlation function?

There are two issues with the code that I can see.
As francis said in a comment, you need to subtract the mean from your signal to get the autocorrelation to reach zero.
You plot your autocorrelation function against wrong x-axis values.
v_data is defined with:
N = 1000000   # 1e6
dt = 0.01     # 1e-2
meaning that t goes from 0 to 1e4. However:
t = np.linspace(0,1000,len(c))
meaning that you plot with t from 0 to 1e3. You should probably define t with
t = np.arange(N) * dt
Looking at the plot, I'd say that stretching the blue line by a factor 10 would make it line up with the red line quite well.
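Putting the two fixes together, here is a minimal sketch of a mean-subtracted, correctly scaled version (the helper name autocorr_fft and the zero-padding are my additions, not taken from your code):
import numpy as np

def autocorr_fft(x, dt):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                               # fix 1: subtract the mean
    N = len(x)
    f = np.fft.fft(x, n=2*N)                       # zero-pad so the correlation is linear, not circular
    acf = np.fft.ifft(f*np.conjugate(f)).real[:N]
    acf = acf/(N - np.arange(N))                   # divide lag m by (N - m)
    return np.arange(N)*dt, acf                    # fix 2: time axis in units of dt

lags, c = autocorr_fft(v_data, dt)                 # v_data and dt from the simulation above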

Related

How do I numerically integrate a function that's a product of a Lorentzian and a cosine in Python?

I am new to Stack Overflow and also quite new to Python, so I hope I am asking my question in an appropriate manner.
I am running Python code similar to this minimal example, with an example function that is a product of a Lorentzian with a cosine, which I want to integrate numerically:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

# minimal example:
omega_loc = 15
gamma = 5

def Lorentzian(w):
    # print(w)
    return (w**3)/((w/omega_loc) + 1)**2*(gamma/2)/((w - omega_loc)**2 + (gamma/2)**2)

def intRe(t):
    return quad(lambda w: w**(-2)*Lorentzian(w)*(1 - np.cos(w*t)), 0, np.inf, limit=10000)[0]

plt.figure(1)
plot_range = np.linspace(0, 100, 1000)
plt.plot(plot_range, [intRe(t) for t in plot_range])
Independently of the upper integration limit, I never get the code to finish and give me a result.
When I enable the #print(w) line, it seems like the code just keeps probing the integral at different values of w in what looks like an infinite loop, and the console also reports a round-off error.
Is there a way of doing numerical integration in Python that is better suited to this kind of function than quad, or did I make a more fundamental error?
Observations
Close to zero, (1 - cos(w*t))/w**2 tends to 0/0. We can replace it with its Taylor expansion t**2 * (1/2 - (w*t)**2/24).
Going towards infinity, the Lorentzian decays only slowly while the cosine term makes the integrand oscillate indefinitely; the integral can be approximated by multiplying that term by a slowly decreasing damping term.
You are using a linearly spaced scale with many points. It is easier to visualize with t on a log scale.
The plot looks like this before damping the cosine term
I introduced a damping parameter to tune the attenuation of the oscillations:
def cosinus_term(w, t, damping=1e4*omega_loc):
    return np.where(abs(w*t) < 1e-6, t**2*(0.5 - (w*t)**2/24.0),
                    (1 - np.exp(-abs(w/damping))*np.cos(w*t))/w**2)

def intRe(t, damping=1e4*omega_loc):
    # note: damping is passed through to cosinus_term here
    return quad(lambda w: cosinus_term(w, t, damping)*Lorentzian(w), 0, np.inf, limit=10000)[0]
Plotting with the following code
plt.figure(1)
plot_range = np.logspace(-3,3,100)
plt.plot(plot_range, [intRe(t, 1e2*omega_loc) for t in plot_range])
plt.plot(plot_range, [intRe(t, 1e3*omega_loc) for t in plot_range])
plt.xscale('log')
It runs in less than 3 minutes here, and the two results are close to each other, especially for large t, suggesting that the damping does not affect the result too much.

Python Integration: to calculate area under the curve

I want to find the integral of output power Po in the following code:
import numpy as np
import matplotlib.pyplot as plt

Vo = 54.6

# defining functions for duty cycle, output current and output power
def duty_cycle(output_voltage, array_voltage):
    duty_cycle = np.divide(output_voltage, array_voltage)
    return duty_cycle

def output_current(array_current, duty_cycle):
    output_current = np.divide(array_current, duty_cycle)
    return output_current

def output_power(output_voltage, output_current):
    output_power = np.multiply(output_voltage, output_current)
    return output_power

# calculating duty cycle, output current and output power
D = duty_cycle(Vo, array_params['arr_v_mp'])
Io = output_current(array_params['arr_i_mp'], D)
Po = output_power(Vo, Io)

# plot output power
plt.ylabel('Output Power [W]')
Po.plot(style='r-')
The code above is just part of a script; array_params is a pandas time-series DataFrame. When the pandas Series Po is plotted, it looks like this:
This is my first time calculating an integral using Python. After reading through the internet, I think Python's scipy module could be of help, but I don't really know how, or which method, to use. I would appreciate any help with the problem explained above.
To compute an integral of the form int y(x) dx from x0 to x1, with an array x_array of values from x0 to x1 and a corresponding y_array of the same length, one can use numpy's trapezoidal integration:
integral = np.trapz(y_array, x_array)
which will also work for non-constant spacing x_array[i+1] - x_array[i].
If an indefinite integral (i.e. an antiderivative F(t) = integral of f(t) dt) is needed, use scipy.integrate.cumtrapz instead of numpy.trapz (which computes definite integrals):
integrated = scipy.integrate.cumtrapz(power, dx=timestep)
or
integrated = scipy.integrate.cumtrapz(power, x=timevalues)
To have integrated the same length as power, specify the initial value of the integral via the optional parameter initial (e.g. initial=0) of scipy.integrate.cumtrapz.
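For example, a minimal sketch with made-up data (the time grid and the power trace below are purely illustrative):
import numpy as np
from scipy.integrate import cumtrapz

timevalues = np.linspace(0, 10, 101)               # seconds
power = 50 + 10*np.sin(timevalues)                 # watts, made-up data

total_energy = np.trapz(power, timevalues)         # definite integral: a single number
energy = cumtrapz(power, x=timevalues, initial=0)  # running integral, same length as power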

fit gaussian function to broad data

I have two arrays. One is E, an integer array with values from 0 to 21. The second is E_Dist, which gives the number of elements for each index of the E array. The values of E_Dist range from 0 to 450000.
I want to plot E_Dist vs E, i.e. E on the horizontal and E_Dist on the vertical axis, and fit a Gaussian function to this plot.
Unfortunately, however much I try, I cannot fit a Gaussian function to it. In fact I have a couple of problems:
1- How can I guess the centre of the Gaussian function correctly? I could simply use the mean value of E, which is 10, but in reality my plot peaks at about 5. I tried a formula like
E_cen = sum(E*E_Dist)/sum(E)
but then the result is much worse, about 10e4.
2- I used two different Gaussian function forms, one with an amplitude and one without it. I could never fit a Gaussian function when I included the amplitude; without the amplitude I get a Gaussian curve, but it never fits my data points.
I think one issue is that I couldn't calculate sigma or the other Gaussian parameters correctly. Another could be the large range of the E_Dist values, which covers 0 to 450000.
I attached my plot. Can anyone help me solve the problem?
Thanks in advance.
########### Gaussian fit
E_cen = np.mean(E)
print 'E_cen =', E_cen
#>>> 10.54

mean_N = np.mean(E_Dist)  # mean number of particles
print 'mean_N =', mean_N
#>>> 58937

sigma = np.sqrt(sum((E_Dist - mean_N)**2)/len(E_Dist) - 1)
print 'sigma =', sigma
#>>> 119888.7

##########
# def gaus(x, a, x0, sigma):
#     return a*np.exp(-(x - x0)**2/(2*sigma**2))  # Gaussian function with amplitude
# popt, pcov = curve_fit(gaus, E, E_Dist, p0=[2, E_cen, sigma])

def gaus(x, x0, sigma):
    return np.exp(-(x - x0)**2/(2*sigma**2))      # Gaussian function without amplitude

popt, pcov = curve_fit(gaus, E, E_Dist, p0=[E_cen, sigma])  # no amplitude

#################
print 'min(popt) =', min(popt)
#>>> 10.54
print 'max(popt) =', max(popt)
#>>> 119800
print 'mean(popt) =', np.mean(popt)
#>>> 59940

Fourier smoothing of data set

I am following this link to do a smoothing of my data set.
The technique is based on the principle of removing the higher order terms of the Fourier Transform of the signal, and so obtaining a smoothed function.
This is part of my code:
N = len(y)
y = y.astype(float) # fix issue, see below
yfft = fft(y, N)
yfft[31:] = 0.0 # set higher harmonics to zero
y_smooth = fft(yfft, N)
ax.errorbar(phase, y, yerr = err, fmt='b.', capsize=0, elinewidth=1.0)
ax.plot(phase, y_smooth/30, color='black') #arbitrary normalization, see below
However, some things do not work properly.
Indeed, you can check the resulting plot:
The blue points are my data, while the black line should be the smoothed curve.
First of all I had to convert my array of data y by following this discussion.
Second, I just normalized arbitrarily to compare the curve with data, since I don't know why the original curve had values much higher than the data points.
Most importantly, the curve looks like a mirror image of the data points, and I don't know why this happens.
It would be great to have some advice, especially on the third point, and more generally on how to optimize the smoothing with this technique for my particular data set shape.
Your problem is probably due to the shifting that the standard FFT does. You can read about it here.
Your data is real, so you can take advantage of symmetries in the FT and use the special function np.fft.rfft
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(40)
y = np.log(x + 1) * np.exp(-x/8.) * x**2 + np.random.random(40) * 15

rft = np.fft.rfft(y)
rft[5:] = 0               # note: rft.shape = (21,)
y_smooth = np.fft.irfft(rft)

plt.plot(x, y, label='Original')
plt.plot(x, y_smooth, label='Smoothed')
plt.legend(loc=0)
plt.show()
If you plot the absolute value of rft, you will see that there is almost no information in frequencies beyond 5, so that is why I chose that threshold (and a bit of playing around, too).
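For instance, a quick way to check this (reusing y from the snippet above):
plt.figure()
plt.plot(np.abs(np.fft.rfft(y)), 'o-')  # magnitude of each harmonic
plt.xlabel('harmonic index')
plt.ylabel('|rfft(y)|')
plt.show()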
Here are the results:
From what I can gather you want to build a low-pass filter by doing the following:
1) Move to the frequency domain (Fourier transform).
2) Remove undesired frequencies.
3) Move back to the time domain (inverse Fourier transform).
Looking at your code, instead of doing 3) you're just doing another Fourier transform. Instead, try an actual inverse Fourier transform to move back to the time domain:
y_smooth = ifft(yfft, N)
Have a look at scipy.signal to see a bunch of already available filters.
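For instance, a sketch with one of scipy.signal's ready-made filters (the order 4 and the cutoff at 0.1 of the Nyquist frequency are arbitrary illustrative choices):
from scipy import signal

b, a = signal.butter(4, 0.1)           # 4th-order Butterworth low-pass filter
y_filtered = signal.filtfilt(b, a, y)  # filtfilt applies it with zero phase shift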
(Edit: I'd be curious to see the results, do share!)
I would be very cautious in using this technique. By zeroing out frequency components of the FFT you are effectively constructing a brick wall filter in the frequency domain. This will result in convolution with a sinc in the time domain and likely distort the information you want to process. Look up "Gibbs phenomenon" for more information.
You're probably better off designing a low-pass filter or using a simple N-point moving average (which is itself an LPF) to accomplish the smoothing.
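As an illustration, a minimal sketch of an N-point moving average applied to your data y (the window length N = 5 is an arbitrary choice):
import numpy as np

N = 5
y_smooth = np.convolve(y, np.ones(N)/N, mode='same')  # each point becomes the mean of its N neighbours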

Fitting gaussian to a curve in Python II

I have two arrays:
import numpy
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])
They make a U-shaped curve.
Now I want to fit a Gaussian to that curve.
from scipy.optimize import curve_fit
import pylab

n = len(x)
mean = sum(y)/n
sigma = sum(y - mean)**2/n

def gaus(x, a, x0, sigma, c):
    return a*numpy.exp(-(x - x0)**2/(2*sigma**2)) + c

popt, pcov = curve_fit(gaus, x, y, p0=[-1, mean, sigma, -5])
pylab.plot(x, y, 'r-')
pylab.plot(x, gaus(x, *popt), 'k-')
pylab.show()
I just end up with the noisy original U-shaped curve and a straight horizontal line running through the curve.
I am not sure what the -1 and the -5 represent in the above code, but I am sure that I need to adjust them, or something else, to get the Gaussian curve. I have been playing around with possible values, but to no avail.
Any ideas?
First of all, your variable sigma is actually the variance, i.e. sigma squared (http://en.wikipedia.org/wiki/Variance#Definition).
This confuses the curve_fit by giving it a suboptimal starting estimate.
Then, your fitting ansatz, gaus, includes an amplitude a and an offset; is this what you actually need? And the starting values are a = -1 (a negated bell shape) and offset c = -5. Where do they come from?
Here's what I'd do:
Fix your fitting model. Do you want just a Gaussian? Does it need to be normalized? If it does, then the amplitude a is fixed by sigma, etc.
Have a look at the actual data: what is the tail (the offset), and what is the sign (the amplitude sign)?
If you actually want just a Gaussian without any bells and whistles, you might not need curve_fit at all: a Gaussian is fully defined by its first two moments, the mean and sigma. Calculate them as you do, plot the result over the data, and see if you're not already all set.
p0 in your call to curve_fit gives the initial guesses for the parameters of your function besides x. In the above code you are telling curve_fit to use -1 as the initial guess for a, -5 as the initial guess for c, mean as the initial guess for x0, and sigma as the guess for sigma. The curve_fit function will then adjust these parameters to try to get a better fit. The problem is that your initial guesses for the function parameters are really bad given the magnitudes of your (x, y) values.
Think a little bit about the order of magnitude of the different parameters of the Gaussian. a should be around the size of your y values (10**-16), as at the peak of the Gaussian the exponential part will never be larger than 1. x0 gives the position within your x values at which the exponential part of your Gaussian will be 1, so x0 should be around 7500, probably somewhere in the centre of your data. sigma indicates the width, or spread, of your Gaussian, so perhaps something in the 100s, just a guess. Finally, c is just an offset to shift the whole Gaussian up and down.
What I would recommend is this: before fitting the curve, pick some values for a, x0, sigma, and c that seem reasonable, plot the data together with the Gaussian, and play with a, x0, sigma, and c until you get something that looks at least somewhat the way you want the Gaussian to fit; then use those as the starting p0 values for curve_fit. The values I gave should get you started, but may not do exactly what you want. For instance, a probably needs to be negative if you want to flip the Gaussian to get a "U" shape.
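For example, a sketch of that workflow (all the concrete numbers below are my illustrative guesses for data of this magnitude, not values computed from the actual arrays):
a0 = -4e-16        # around the size of the y values, negative for a "U" shape
x0 = 7550          # somewhere near the centre of the x data
sigma0 = 100       # rough width guess
c0 = 4e-16         # offset to shift the Gaussian up to the tail level
pylab.plot(x, y, 'r-')
pylab.plot(x, gaus(x, a0, x0, sigma0, c0), 'k--')  # compare the guess to the data
popt, pcov = curve_fit(gaus, x, y, p0=[a0, x0, sigma0, c0])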
Also, printing out the values that curve_fit settles on for a, x0, sigma, and c might help you see what it is doing and whether the function is on the right track to minimizing the residual of the fit.
I have had similar problems doing curve fitting with gnuplot: if the initial values are too far from what you want to fit, it goes in completely the wrong direction with the parameters while minimizing the residuals, and you can often do better by eye. Think of these functions as a way to fine-tune your by-eye estimates of the parameters.
Hope that helps.
I don't think you are estimating your initial guesses for mean and sigma correctly.
Take a look at the SciPy Cookbook here
I think it should look like this.
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])

n = len(x)
mean = sum(x*y)/sum(y)
sigma = numpy.sqrt(abs(sum((x - mean)**2*y)/sum(y)))

def gaus(x, a, x0, sigma, c):
    return a*numpy.exp(-(x - x0)**2/(2*sigma**2)) + c

popt, pcov = curve_fit(gaus, x, y, p0=[-max(y), mean, sigma, min(x) + (max(x) - min(x))/2])
pylab.plot(x, gaus(x, *popt))
If anyone has a link to a simple explanation of why these are the correct moments, I would appreciate it. I am going on faith that the SciPy Cookbook got it right.
Here is the solution, thanks to everyone.
import math

x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])

n = len(x)
mean = sum(x)/n
sigma = math.sqrt(sum((x - mean)**2)/n)

def gaus(x, a, x0, sigma, c):
    return a*numpy.exp(-(x - x0)**2/(2*sigma**2)) + c

popt, pcov = curve_fit(gaus, x, y, p0=[-max(y), mean, sigma, min(x) + (max(x) - min(x))/2])
pylab.plot(x, gaus(x, *popt))
Maybe it is because I use MATLAB and fminsearch, or because my fits have to work on far fewer data points (~5-10), but I get much better results with the following starting values (simple as they are):
a = max(y)-min(y);
imax= find(y==max(y),1);
mean = x(imax);
avg = sum(x.*y)./sum(y);
sigma = sqrt(abs(sum((x-avg).^2.*y) ./ sum(y)));
c = min(y);
The sigma works fine.
