Related
I have heard that the Cauchy integration formula can be used to interpolate complex-valued functions along a closed boundary of a disk to points inside the disk. For my current project, this sounds rather valuable, so I attempted to give this a shot. Unfortunately, my experiments were not very successful so far, and I am not certain what is going wrong. Some degree of interpolation is certainly going on, but the results do not seem to be correct along the boundaries. Here is what my code returns:
Here is my initial code example:
import scipy.stats
import numpy as np
import scipy.integrate
import scipy.interpolate
import matplotlib.pyplot as plt
plt.close('all')
# This is the interpolation function, which takes as input a position on the
# boundary in radians (x), a complex evaluation point (eval_point), and the
# function which returns the boundary condition
def f(x,eval_point,itp):
# What is the complex coordinate of this point on the boundary?
zi = np.cos(x) + 1j*np.sin(x)
# Get the boundary condition value
fz = itp(x)
return fz/(zi-eval_point)
# Complex quadrature for integration, adapted from
# https://stackoverflow.com/questions/57325919/using-scipy-quad-with-i%ce%b5-trick-bad-results
def cquad(func, a, b, **kwargs):
real_integral = scipy.integrate.quad(lambda x: np.real(func(x, **kwargs)), a, b, limit=200)
imag_integral = scipy.integrate.quad(lambda x: np.imag(func(x, **kwargs)), a, b, limit=200)
return (real_integral[0] + 1j*imag_integral[0], real_integral[1:], imag_integral[1:])
# Define the interpolation function for the boundary values
itp = scipy.interpolate.interp1d(
x = [0,np.pi/2,np.pi,1.5*np.pi,2*np.pi],
y = [0+0j,0+1j,1+1j,1+0j,0+0j])
# Get some evaluation points
X,Y = np.meshgrid(np.linspace(-1,1,51),
np.linspace(-1,1,51))
XY = X+1j*Y
x = np.ndarray.flatten(XY)
# Throw away all points outside the unit disk; avoid evaluting at radius 1 to
# dodge singularities
x = x[np.where(np.abs(x) <= 0.99)]
# Calculate the result for each evaluation point
res = []
for val in x:
res.append(cquad(
func = f,
a = 0,
b = 2*np.pi,
eval_point = val,
itp = itp)[0]/(2*np.pi*1j))
# Convert the results into an array
res = np.asarray(res)
# Plot the real part of the results
plt.tricontour(
np.real(x),
np.imag(x),
np.real(res),
cmap = 'jet')
plt.colorbar(label='real part')
# Plot the imaginary part of the results
plt.tricontour(
np.real(x),
np.imag(x),
np.imag(res),
cmap = 'Greys')
plt.colorbar(label='imaginary part')
Does anybody have an idea what is going wrong?
You can get an easy approximation of that function by employing the FFT. The inverse FFT can be interpreted as polynomial evaluation at the corresponding points on the unit circle, so that the polynomial in total is an approximation of the Cauchy-formula
c = np.fft.fft(itp(np.linspace(0,2*np.pi,401)[:-1]))
c=c[::-1]/len(c)
np.polyval(c,[1,1j,-1,-1j])
returns
[5.55111512e-17+5.55111512e-17j, 5.55111512e-17+1.00000000e+00j,
1.00000000e+00+1.00000000e+00j, 1.00000000e+00+5.55111512e-17j]
these are the values that were expected.
X,Y = np.meshgrid(np.linspace(-1,1,151),
np.linspace(-1,1,151))
Z = (X+1j*Y).flatten()
Z = Z[np.where(np.abs(Z) <= 0.99)]
W = np.polyval(c,Z)
# Plot the real part of the results
plt.tricontour( Z.real, Z.imag, W.real, cmap = 'jet')
plt.colorbar(label='real part')
# Plot the imaginary part of the results
plt.tricontour( Z.real, Z.imag, W.imag, cmap = 'Greys')
plt.colorbar(label='imaginary part')
plt.tight_layout(); plt.show()
This then gives the picture
The dominant terms of the polynomial are
(1+1j)*(0.500000 - 0.045040*z^3 - 0.008279*z^7
- 0.005012*z^391 - 0.016220*z^395 - 0.405293*z^399)
As far as I could see, the leading degree 3 after the constant term is constant under refinement of the sampling sequence.
I am trying to show that economies follow a relatively sinusoidal growth pattern. I am building a python simulation to show that even when we let some degree of randomness take hold, we can still produce something relatively sinusoidal.
I am happy with the data I'm producing, but now I'd like to find some way to get a sine graph that pretty closely matches the data. I know you can do polynomial fit, but can you do sine fit?
Here is a parameter-free fitting function fit_sin() that does not require manual guess of frequency:
import numpy, scipy.optimize
def fit_sin(tt, yy):
'''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''
tt = numpy.array(tt)
yy = numpy.array(yy)
ff = numpy.fft.fftfreq(len(tt), (tt[1]-tt[0])) # assume uniform spacing
Fyy = abs(numpy.fft.fft(yy))
guess_freq = abs(ff[numpy.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
guess_amp = numpy.std(yy) * 2.**0.5
guess_offset = numpy.mean(yy)
guess = numpy.array([guess_amp, 2.*numpy.pi*guess_freq, 0., guess_offset])
def sinfunc(t, A, w, p, c): return A * numpy.sin(w*t + p) + c
popt, pcov = scipy.optimize.curve_fit(sinfunc, tt, yy, p0=guess)
A, w, p, c = popt
f = w/(2.*numpy.pi)
fitfunc = lambda t: A * numpy.sin(w*t + p) + c
return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f, "period": 1./f, "fitfunc": fitfunc, "maxcov": numpy.max(pcov), "rawres": (guess,popt,pcov)}
The initial frequency guess is given by the peak frequency in the frequency domain using FFT. The fitting result is almost perfect assuming there is only one dominant frequency (other than the zero frequency peak).
import pylab as plt
N, amp, omega, phase, offset, noise = 500, 1., 2., .5, 4., 3
#N, amp, omega, phase, offset, noise = 50, 1., .4, .5, 4., .2
#N, amp, omega, phase, offset, noise = 200, 1., 20, .5, 4., 1
tt = numpy.linspace(0, 10, N)
tt2 = numpy.linspace(0, 10, 10*N)
yy = amp*numpy.sin(omega*tt + phase) + offset
yynoise = yy + noise*(numpy.random.random(len(tt))-0.5)
res = fit_sin(tt, yynoise)
print( "Amplitude=%(amp)s, Angular freq.=%(omega)s, phase=%(phase)s, offset=%(offset)s, Max. Cov.=%(maxcov)s" % res )
plt.plot(tt, yy, "-k", label="y", linewidth=2)
plt.plot(tt, yynoise, "ok", label="y with noise")
plt.plot(tt2, res["fitfunc"](tt2), "r-", label="y fit curve", linewidth=2)
plt.legend(loc="best")
plt.show()
The result is good even with high noise:
Amplitude=1.00660540618, Angular freq.=2.03370472482, phase=0.360276844224, offset=3.95747467506, Max. Cov.=0.0122923578658
You can use the least-square optimization function in scipy to fit any arbitrary function to another. In case of fitting a sin function, the 3 parameters to fit are the offset ('a'), amplitude ('b') and the phase ('c').
As long as you provide a reasonable first guess of the parameters, the optimization should converge well.Fortunately for a sine function, first estimates of 2 of these are easy: the offset can be estimated by taking the mean of the data and the amplitude via the RMS (3*standard deviation/sqrt(2)).
Note: as a later edit, frequency fitting has also been added. This does not work very well (can lead to extremely poor fits). Thus, use at your discretion, my advise would be to not use frequency fitting unless frequency error is smaller than a few percent.
This leads to the following code:
import numpy as np
from scipy.optimize import leastsq
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.15247 # Optional!! Advised not to use
data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
guess_phase = 0
guess_freq = 1
guess_amp = 1
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean
# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]
# recreate the fitted curve using the optimized parameters
data_fit = est_amp*np.sin(est_freq*t+est_phase) + est_mean
# recreate the fitted curve using the optimized parameters
fine_t = np.arange(0,max(t),0.1)
data_fit=est_amp*np.sin(est_freq*fine_t+est_phase)+est_mean
plt.plot(t, data, '.')
plt.plot(t, data_first_guess, label='first guess')
plt.plot(fine_t, data_fit, label='after fitting')
plt.legend()
plt.show()
Edit: I assumed that you know the number of periods in the sine-wave. If you don't, it's somewhat trickier to fit. You can try and guess the number of periods by manual plotting and try and optimize it as your 6th parameter.
More userfriendly to us is the function curvefit. Here an example:
import numpy as np
from scipy.optimize import curve_fit
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3.0*np.sin(t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_freq = 1
guess_amplitude = 3*np.std(data)/(2**0.5)
guess_phase = 0
guess_offset = np.mean(data)
p0=[guess_freq, guess_amplitude,
guess_phase, guess_offset]
# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
return np.sin(x * freq + phase) * amplitude + offset
# now do the fit
fit = curve_fit(my_sin, t, data, p0=p0)
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)
# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
The current methods to fit a sin curve to a given data set require a first guess of the parameters, followed by an interative process. This is a non-linear regression problem.
A different method consists in transforming the non-linear regression to a linear regression thanks to a convenient integral equation. Then, there is no need for initial guess and no need for iterative process : the fitting is directly obtained.
In case of the function y = a + r*sin(w*x+phi) or y=a+b*sin(w*x)+c*cos(w*x), see pages 35-36 of the paper "RĂ©gression sinusoidale" published on Scribd
In case of the function y = a + p*x + r*sin(w*x+phi) : pages 49-51 of the chapter "Mixed linear and sinusoidal regressions".
In case of more complicated functions, the general process is explained in the chapter "Generalized sinusoidal regression" pages 54-61, followed by a numerical example y = r*sin(w*x+phi)+(b/x)+c*ln(x), pages 62-63
All the above answers are based on curve fitting, and most use an iterative method - they all work very nicely, but I wanted to add a different approach using an FFT. Here, we transform the data, set all but the peak frequency to zero and then do the inverse transform. Note, that you probably want to remove the data mean (and detrend) before doing the FFT and then you can add those back in after.
import numpy as np
import pylab as plt
# fake data
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.05
data = 3.0*np.sin(f*t+0.001) + np.random.randn(N) # create artificial data with noise
# FFT...
mfft=np.fft.fft(data)
imax=np.argmax(np.absolute(mfft))
mask=np.zeros_like(mfft)
mask[[imax]]=1
mfft*=mask
fdata=np.fft.ifft(mfft)
plt.plot(t, data, '.')
plt.plot(t, fdata,'.', label='FFT')
plt.legend()
plt.show()
I can't seem to wrap my head around why the theoretical and simulated results are so different for the probablity distribution of a normal random variable squared. (e.g. the power of a Gaussian noise voltage signal)
I suspect I'm doing something wrong and wanted to ask, if anyone could help with this.
Here is the code explaining what I'm trying to do:
import numpy as np
from scipy.integrate import quad, simps
from matplotlib import pyplot as plt
def PDF(x, sigma=1, mu=0): # Gaussian normal distribution PDF
return 1/(np.sqrt(2*np.pi*sigma))*np.exp(-1/(2*sigma**2)*(x-mu)**2)
def PDFu(u, u_rms=1, u_mean=0):
return PDF(u, sigma=u_rms, mu=u_mean)
def PDFP(P):
return 2*PDFu(np.sqrt(P)) # substitute the input variable with the 'scaled' one
def probDensity(x, nbins): # calculate the probability density based on the input samples
distr, bins = np.histogram(x, nbins) # similar to plt.hist(density=True)
binWidth = bins[1]-bins[0]
binCenters = bins[:-1]+binWidth/2
return distr/len(x)/binWidth, binCenters
npoints = 100000
rms = 1
u = np.random.normal(0, rms, npoints) # samples with Gaussian normal distribution
P = u**2 # square of the samples with Gaussian normal distribution - should follow chi-squared distribution?
nbins = 500
u_distr, u_bins = probDensity(u, nbins) # calculate PDF based on the samples
print('U_distr integral = ', simps(u_distr,u_bins)) # integrate the calculated PDF, should be 1
plt.plot(u_bins, u_distr)
us = np.linspace(-10, 10, 500)
PDFu_u = PDFu(us) # calculate the theoretical PDF
print('PDFu_u integral = ', quad(PDFu, -np.Inf, np.Inf)) # integral of the theoretical PDF, should be 1
plt.plot(us, PDFu_u)
nbins = 1000
P_distr, P_bins = probDensity(P, nbins) # calculate PDF based on the samples
print('P_distr integral = ', simps(P_distr, P_bins)) # integrate the calculated PDF, should be 1
plt.plot(P_bins, P_distr)
Ps = np.linspace(0, 8, npoints)
PDFP_P = PDFP(Ps) # calculate the theoretical PDF
plt.plot(Ps, PDFP_P)
print('PDFP_P integral = ', quad(PDFP, 0, np.Inf)) # integral of the theoretical PDF, should be 1
plt.show()
The theroetical and the simulated probablity distribution of the normal random variable (u) seem to match nicely, I use this as a sanity check. But the difference is substantial in case of the squared variable and I can't understand why and how to get them to match. Btw, I tried various plausible scaling factors for the theoretical distribution (e.g. 0.5, 2, sqrt(2)), but it did not work and I don't see why I would even need it. Shouldn't it work with just substituting 'P' with 'u' according to the formula u=sqrt(P*R) [R=1] and using the normal distribution of 'u' to calculate the PDF value for certain 'P's?
I trust the simulated distribution a little more and I am wondering how the theoretical one should be properly calculated. Why doesn't the substituition method work?
Thank you for the help in advance!
Your theoretical density for the square of a Gaussian is wrong. Here is the calc. If X is Gaussian then for the CDF $F$ of the squared variable $Y=X^2$ we have
$$
F(x) = P(Y<x) = P(X^2 <x) = P(-\sqrt{x} < X < \sqrt{x}) = \Phi(\sqrt{x}) - \Phi(-\sqrt{x})
$$
where $\Phi$ is the Gaussian CDF
so for the PDF $f(x)$ of $Y$ we differentiate that and we get
$$
f(x) = F'(x) = (1/(2\sqrt{x})) \Psi'(\sqrt{x}) + (1/(2\sqrt{x})) \Psi'(-\sqrt{x}) = (1/(2\sqrt{x})) (\psi(\sqrt{x}) + \psi(-\sqrt{x})
$$
where $\psi$ is the Gaussian PDF
so at the very least you are missing the term $(1/(2\sqrt{x}))$
Here is an image of the formulas if it helps
For reference, here is the code with the corrected PDF, based on piterbarg's answer. Thanks again!
import numpy as np
from scipy.integrate import quad, simps
from matplotlib import pyplot as plt
def PDF(x, sigma=1, mu=0): # Gaussian normal distribution PDF
return 1/(np.sqrt(2*np.pi*sigma))*np.exp(-1/(2*sigma**2)*(x-mu)**2)
def PDFu(u, u_rms=1, u_mean=0):
return PDF(u, sigma=u_rms, mu=u_mean)
def PDFP(P):
return 1/(2*np.sqrt(P))*2*PDFu(np.sqrt(P)) # substitute the input variable with the 'scaled' one
def probDensity(x, nbins): # calculate the probability density based on the input samples
distr, bins = np.histogram(x, nbins) # similar to plt.hist(density=True)
binWidth = bins[1]-bins[0]
binCenters = bins[:-1]+binWidth/2
return distr/len(x)/binWidth, binCenters
npoints = 100000
rms = 1
u = np.random.normal(0, rms, npoints) # samples with Gaussian normal distribution
P = u**2 # square of the samples with Gaussian normal distribution - should follow chi-squared distribution?
nbins = 500
u_distr, u_bins = probDensity(u, nbins) # calculate PDF based on the samples
print('U_distr integral = ', simps(u_distr,u_bins)) # integrate the calculated PDF, should be 1
plt.plot(u_bins, u_distr)
us = np.linspace(-10, 10, 500)
PDFu_u = PDFu(us) # calculate the theoretical PDF
print('PDFu_u integral = ', quad(PDFu, -np.Inf, np.Inf)) # integral of the theoretical PDF, should be 1
plt.plot(us, PDFu_u)
nbins = 1000
P_distr, P_bins = probDensity(P, nbins) # calculate PDF based on the samples
print('P_distr integral = ', simps(P_distr, P_bins)) # integrate the calculated PDF, should be 1
plt.plot(P_bins, P_distr)
Ps = np.linspace(0, 8, npoints)
PDFP_P = PDFP(Ps) # calculate the theoretical PDF
plt.plot(Ps, PDFP_P)
print('PDFP_P integral = ', quad(PDFP, 0, np.Inf)) # integral of the theoretical PDF, should be 1
plt.show()
I have a time series of 3-hourly temperature data that I have analyzed and found the power spectrum for using Fourier analysis.
data = np.genfromtxt('H:/RData/3hr_obs.txt',
skip_header=3)
step = data[:,0]
t = data[:,1]
y = data[:,2]
freq = 0.125
yps = np.abs(np.fft.fft(y))**2
yfreqs = np.fft.fftfreq(y.size, freq)
y_idx = np.argsort(yfreqs)
fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(111)
ax.semilogy(yfreqs[y_idx],yps[y_idx])
ax.set_ylim(1e-3,1e8)
Original Data:
Frequency Spectrum:
Power Spectrum:
Now that I know that the signal is strongest at frequencies of 1 and 2, I want to create a filter (non-boxcar) that can smooth out the data to keep those dominant frequencies.
Is there a specific numpy or scipy function that can do this? Will this be something that will have to be created outside the main packages?
An example with some synthetic data:
# fourier filter example (1D)
%matplotlib inline
import matplotlib.pyplot as p
import numpy as np
# make up a noisy signal
dt=0.01
t= np.arange(0,5,dt)
f1,f2= 5, 20 #Hz
n=t.size
s0= 0.2*np.sin(2*np.pi*f1*t)+ 0.15 * np.sin(2*np.pi*f2*t)
sr= np.random.rand(np.size(t))
s=s0+sr
#fft
s-= s.mean() # remove DC (spectrum easier to look at)
fr=np.fft.fftfreq(n,dt) # a nice helper function to get the frequencies
fou=np.fft.fft(s)
#make up a narrow bandpass with a Gaussian
df=0.1
gpl= np.exp(- ((fr-f1)/(2*df))**2)+ np.exp(- ((fr-f2)/(2*df))**2) # pos. frequencies
gmn= np.exp(- ((fr+f1)/(2*df))**2)+ np.exp(- ((fr+f2)/(2*df))**2) # neg. frequencies
g=gpl+gmn
filt=fou*g #filtered spectrum = spectrum * bandpass
#ifft
s2=np.fft.ifft(filt)
p.figure(figsize=(12,8))
p.subplot(511)
p.plot(t,s0)
p.title('data w/o noise')
p.subplot(512)
p.plot(t,s)
p.title('data w/ noise')
p.subplot(513)
p.plot(np.fft.fftshift(fr) ,np.fft.fftshift(np.abs(fou) ) )
p.title('spectrum of noisy data')
p.subplot(514)
p.plot(fr,g*50, 'r')
p.plot(fr,np.abs(filt))
p.title('filter (red) + filtered spectrum')
p.subplot(515)
p.plot(t,np.real(s2))
p.title('filtered time data')
I am trying to show that economies follow a relatively sinusoidal growth pattern. I am building a python simulation to show that even when we let some degree of randomness take hold, we can still produce something relatively sinusoidal.
I am happy with the data I'm producing, but now I'd like to find some way to get a sine graph that pretty closely matches the data. I know you can do polynomial fit, but can you do sine fit?
Here is a parameter-free fitting function fit_sin() that does not require manual guess of frequency:
import numpy, scipy.optimize
def fit_sin(tt, yy):
'''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''
tt = numpy.array(tt)
yy = numpy.array(yy)
ff = numpy.fft.fftfreq(len(tt), (tt[1]-tt[0])) # assume uniform spacing
Fyy = abs(numpy.fft.fft(yy))
guess_freq = abs(ff[numpy.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
guess_amp = numpy.std(yy) * 2.**0.5
guess_offset = numpy.mean(yy)
guess = numpy.array([guess_amp, 2.*numpy.pi*guess_freq, 0., guess_offset])
def sinfunc(t, A, w, p, c): return A * numpy.sin(w*t + p) + c
popt, pcov = scipy.optimize.curve_fit(sinfunc, tt, yy, p0=guess)
A, w, p, c = popt
f = w/(2.*numpy.pi)
fitfunc = lambda t: A * numpy.sin(w*t + p) + c
return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f, "period": 1./f, "fitfunc": fitfunc, "maxcov": numpy.max(pcov), "rawres": (guess,popt,pcov)}
The initial frequency guess is given by the peak frequency in the frequency domain using FFT. The fitting result is almost perfect assuming there is only one dominant frequency (other than the zero frequency peak).
import pylab as plt
N, amp, omega, phase, offset, noise = 500, 1., 2., .5, 4., 3
#N, amp, omega, phase, offset, noise = 50, 1., .4, .5, 4., .2
#N, amp, omega, phase, offset, noise = 200, 1., 20, .5, 4., 1
tt = numpy.linspace(0, 10, N)
tt2 = numpy.linspace(0, 10, 10*N)
yy = amp*numpy.sin(omega*tt + phase) + offset
yynoise = yy + noise*(numpy.random.random(len(tt))-0.5)
res = fit_sin(tt, yynoise)
print( "Amplitude=%(amp)s, Angular freq.=%(omega)s, phase=%(phase)s, offset=%(offset)s, Max. Cov.=%(maxcov)s" % res )
plt.plot(tt, yy, "-k", label="y", linewidth=2)
plt.plot(tt, yynoise, "ok", label="y with noise")
plt.plot(tt2, res["fitfunc"](tt2), "r-", label="y fit curve", linewidth=2)
plt.legend(loc="best")
plt.show()
The result is good even with high noise:
Amplitude=1.00660540618, Angular freq.=2.03370472482, phase=0.360276844224, offset=3.95747467506, Max. Cov.=0.0122923578658
You can use the least-square optimization function in scipy to fit any arbitrary function to another. In case of fitting a sin function, the 3 parameters to fit are the offset ('a'), amplitude ('b') and the phase ('c').
As long as you provide a reasonable first guess of the parameters, the optimization should converge well.Fortunately for a sine function, first estimates of 2 of these are easy: the offset can be estimated by taking the mean of the data and the amplitude via the RMS (3*standard deviation/sqrt(2)).
Note: as a later edit, frequency fitting has also been added. This does not work very well (can lead to extremely poor fits). Thus, use at your discretion, my advise would be to not use frequency fitting unless frequency error is smaller than a few percent.
This leads to the following code:
import numpy as np
from scipy.optimize import leastsq
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.15247 # Optional!! Advised not to use
data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
guess_phase = 0
guess_freq = 1
guess_amp = 1
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean
# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]
# recreate the fitted curve using the optimized parameters
data_fit = est_amp*np.sin(est_freq*t+est_phase) + est_mean
# recreate the fitted curve using the optimized parameters
fine_t = np.arange(0,max(t),0.1)
data_fit=est_amp*np.sin(est_freq*fine_t+est_phase)+est_mean
plt.plot(t, data, '.')
plt.plot(t, data_first_guess, label='first guess')
plt.plot(fine_t, data_fit, label='after fitting')
plt.legend()
plt.show()
Edit: I assumed that you know the number of periods in the sine-wave. If you don't, it's somewhat trickier to fit. You can try and guess the number of periods by manual plotting and try and optimize it as your 6th parameter.
More userfriendly to us is the function curvefit. Here an example:
import numpy as np
from scipy.optimize import curve_fit
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3.0*np.sin(t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_freq = 1
guess_amplitude = 3*np.std(data)/(2**0.5)
guess_phase = 0
guess_offset = np.mean(data)
p0=[guess_freq, guess_amplitude,
guess_phase, guess_offset]
# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
return np.sin(x * freq + phase) * amplitude + offset
# now do the fit
fit = curve_fit(my_sin, t, data, p0=p0)
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)
# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
The current methods to fit a sin curve to a given data set require a first guess of the parameters, followed by an interative process. This is a non-linear regression problem.
A different method consists in transforming the non-linear regression to a linear regression thanks to a convenient integral equation. Then, there is no need for initial guess and no need for iterative process : the fitting is directly obtained.
In case of the function y = a + r*sin(w*x+phi) or y=a+b*sin(w*x)+c*cos(w*x), see pages 35-36 of the paper "RĂ©gression sinusoidale" published on Scribd
In case of the function y = a + p*x + r*sin(w*x+phi) : pages 49-51 of the chapter "Mixed linear and sinusoidal regressions".
In case of more complicated functions, the general process is explained in the chapter "Generalized sinusoidal regression" pages 54-61, followed by a numerical example y = r*sin(w*x+phi)+(b/x)+c*ln(x), pages 62-63
All the above answers are based on curve fitting, and most use an iterative method - they all work very nicely, but I wanted to add a different approach using an FFT. Here, we transform the data, set all but the peak frequency to zero and then do the inverse transform. Note, that you probably want to remove the data mean (and detrend) before doing the FFT and then you can add those back in after.
import numpy as np
import pylab as plt
# fake data
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.05
data = 3.0*np.sin(f*t+0.001) + np.random.randn(N) # create artificial data with noise
# FFT...
mfft=np.fft.fft(data)
imax=np.argmax(np.absolute(mfft))
mask=np.zeros_like(mfft)
mask[[imax]]=1
mfft*=mask
fdata=np.fft.ifft(mfft)
plt.plot(t, data, '.')
plt.plot(t, fdata,'.', label='FFT')
plt.legend()
plt.show()