I'd like to reconstruct a signal for which I have some data; I used LombScargle to obtain its frequency components.
My data is the following:
r = np.array([119.75024144, 119.77177673, 119.79671626, 119.81566188,
119.81291201, 119.71610143, 119.24156708, 117.66932347,
114.22145178, 109.27266933, 104.57675147, 101.63381325,
100.42623807, 100.09436745, 100.02798438, 100.02696846,
100.05422613, 100.12216521, 100.27569606, 100.60962812,
101.32023289, 102.71102637, 105.01826819, 108.17052642,
111.67848758, 114.78442424, 116.95337537, 118.19437002,
118.84307457, 119.19571404, 119.40326818, 119.53101551,
119.61170874, 119.66610072, 119.68315253, 119.53757829,
118.83748609, 116.90425868, 113.32095843, 108.72465638,
104.58292906, 101.93316248, 100.68856962, 100.22523098,
100.08558767, 100.07194691, 100.11193397, 100.19142891,
100.33208922, 100.5849306 , 101.04224415, 101.87565882,
103.33985519, 105.63631456, 108.64972952, 111.86837667,
114.67115037, 116.69548163, 117.96207449, 118.69589499,
119.11781077, 119.36770681, 119.51566311, 119.59301667])
z = np.array ([-422.05230434, -408.98182253, -395.78387843, -382.43143962,
-368.92341485, -355.26851343, -341.47780372, -327.56493425,
-313.54536462, -299.43740189, -285.26768576, -271.07676026,
-256.92098157, -242.86416227, -228.95449427, -215.207069 ,
-201.61590575, -188.17719265, -174.89201262, -161.75452196,
-148.74812279, -135.85126854, -123.04093538, -110.29151714,
-97.57502515, -84.86119278, -72.1145478 , -59.2947726 ,
-46.36450604, -33.29821629, -20.08471733, -6.72030326,
6.80047849, 20.48309726, 34.32320864, 48.30267819,
62.393214 , 76.56022602, 90.76260159, 104.94787451,
119.04731699, 132.98616969, 146.71491239, 160.23436159,
173.58582543, 186.81849059, 199.96724955, 213.05229133,
226.08870416, 239.09310452, 252.08377421, 265.0769367 ,
278.08234368, 291.10215472, 304.13509998, 317.18351924,
330.25976991, 343.38777732, 356.59626164, 369.90725571,
383.33109354, 396.87227086, 410.5309987 , 424.2899438])
plt.plot(z,r, label='data');plt.legend()
Afterwards, I use LombScargle on this data:
f, a = LombScargle(z, r).autopower()
plt.plot(f, a, label='frequency components');plt.legend()
Similar to a Fourier series, I would like to reconstruct the signal as a sum of sines or cosines; I am mainly interested in finding the a_i and w_i values. I do this as follows, but my reconstruction does not look like the signal the data came from.
s = 0
for i in range(f.shape[0]):
    s += a[i]*np.sin(f[i]*z)
plt.plot(z, s, label='reconstructed signal');plt.legend()
Either there is a mistake in the way I am using Lomb-Scargle or in the signal reconstruction part but I haven't figured out what.
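For reference, here is how I would check a single-frequency reconstruction (assuming the astropy LombScargle, since that is the autopower API I am using). Note that autopower returns ordinary frequencies, so a hand-written sine term would need a factor of 2*pi, and the periodogram gives only power, not the phase or offset of each component. This is just a sketch of the dominant component, not the full sum I am after:
import numpy as np
import matplotlib.pyplot as plt
from astropy.timeseries import LombScargle

ls = LombScargle(z, r)
f, a = ls.autopower()

# Pick the strongest peak and let LombScargle evaluate its own best-fit
# sinusoid (amplitude, phase and mean offset included) at that frequency.
f_best = f[np.argmax(a)]
z_fit = np.linspace(z.min(), z.max(), 500)
r_fit = ls.model(z_fit, f_best)

plt.plot(z, r, label='data')
plt.plot(z_fit, r_fit, label='single-frequency model')
plt.legend()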
I have been struggling, for no apparent reason, to fit a sine function to a small dataset that resembles a sinusoid. I've looked at many other questions and tried different libraries and can't seem to find any glaring mistake in my code. Also, in many answers people are fitting a function onto data where y = f(x); but I'm retrieving both of my lists independently from stellar spectra.
These are the lists for reference:
time = np.array([2454294.5084288 , 2454298.37039515, 2454298.6022165 ,
2454299.34790096, 2454299.60750029, 2454300.35176022,
2454300.61361622, 2454301.36130122, 2454301.57111912,
2454301.57540159, 2454301.57978822, 2454301.5842906 ,
2454301.58873511, 2454302.38635047, 2454302.59553152,
2454303.41548415, 2454303.56765036, 2454303.61479213,
2454304.38528718, 2454305.54043812, 2454306.36761011,
2454306.58025083, 2454306.60772791, 2454307.36686591,
2454307.49460991, 2454307.58258509, 2454308.3698358 ,
2454308.59468672, 2454309.40004997, 2454309.51208756,
2454310.43078368, 2454310.6091061 , 2454311.40121502,
2454311.5702085 , 2454312.39758274, 2454312.54580053,
2454313.52984047, 2454313.61734047, 2454314.37609003,
2454315.56721061, 2454316.39218499, 2454316.5672538 ,
2454317.49410168, 2454317.6280825 , 2454318.32944441,
2454318.56913047])
velocities = np.array([-2.08468951, -2.26117398, -2.44703149, -2.10149768, -2.09835213,
-2.20540079, -2.4221183 , -2.1394637 , -2.0841663 , -2.2458154 ,
-2.06177386, -2.47993416, -2.13462117, -2.26602791, -2.47359571,
-2.19834895, -2.17976339, -2.37745005, -2.48849617, -2.15875901,
-2.27674409, -2.39054554, -2.34029665, -2.09267843, -2.20338104,
-2.49483926, -2.08860222, -2.26816951, -2.08516229, -2.34925637,
-2.09381667, -2.21849357, -2.43438148, -2.28439031, -2.43506056,
-2.16953358, -2.24405359, -2.10093237, -2.33155007, -2.37739938,
-2.42468714, -2.19635302, -2.368558 , -2.45959665, -2.13392004,
-2.25268181])
These are radial velocities of a star observed at different times. When plotted they look like this:
Plotted Data
This is then the code I'm using to fit a test sine on the data:
x = time
y = velocities
def sin_fit(x, A, w):
    return A * np.sin(w * x)
popt, pcov = curve_fit(sin_fit,x,y) #try to calculate exoplanet parameters with these data
xfit = np.arange(min(x),max(x),0.1)
fit = sin_fit(xfit,*popt)
mod = plt.figure()
plt.xlabel("Time (G. Days)")
plt.ylabel("Radial Velocity")
plt.scatter(x,[i for i in y],color="b",label="Data")
plt.plot(x,[i for i in y],color="b",alpha=0.2)
plt.plot(xfit,fit,color="r",label="Model Fit")
plt.legend()
mod.savefig("Data with sin fit.png")
plt.show()
I thought this was right, and it seems right by looking at other answers, but then this is what I get:
Data with model sine
What am I doing wrong?
Thank you in advance
I guess it's because the sin_fit function is not able to fit the data at all. By default the sine function oscillates around y=0, while your data oscillates somewhere around y=-2.3.
I tried your code and extended sin_fit with an offset, yielding way better results (although still not looking perfect):
def sin_fit(x, A, w, offset):
    return A * np.sin(w * x) + offset
With this, the function at least has a chance to fit.
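Beyond the offset, curve_fit is also very sensitive to the starting guess for w with sinusoids, especially since your times are raw Julian dates around 2.45e6, which makes w * x huge. Below is a sketch of how I would set it up, subtracting a reference time and passing explicit starting guesses via p0; the 4-day trial period (and the added phase term) are made-up illustrations, not fitted values:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def sin_fit(t, A, w, phi, offset):
    return A * np.sin(w * t + phi) + offset

# Work relative to the first observation so w * t stays numerically small.
t = time - time.min()

# Rough starting guesses: amplitude from the data spread, offset from the
# mean, and a trial period of ~4 days. Normally you would scan trial
# periods or use a periodogram instead of guessing.
A0 = 0.5 * (velocities.max() - velocities.min())
off0 = velocities.mean()
w0 = 2 * np.pi / 4.0
popt, pcov = curve_fit(sin_fit, t, velocities, p0=[A0, w0, 0.0, off0])

tfit = np.linspace(t.min(), t.max(), 500)
plt.scatter(t, velocities, color='b', label='Data')
plt.plot(tfit, sin_fit(tfit, *popt), color='r', label='Model fit')
plt.xlabel('Time since first observation (days)')
plt.ylabel('Radial velocity')
plt.legend()
plt.show()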
I have data points of time and voltage that create the curve shown below.
The time data is
array([ 0.10810811, 0.75675676, 1.62162162, 2.59459459,
3.56756757, 4.21621622, 4.97297297, 4.97297297,
4.97297297, 4.97297297, 4.97297297, 4.97297297,
4.97297297, 4.97297297, 5.08108108, 5.18918919,
5.2972973 , 5.51351351, 5.72972973, 5.94594595,
6.27027027, 6.59459459, 7.13513514, 7.67567568,
8.32432432, 9.18918919, 10.05405405, 10.91891892,
11.78378378, 12.64864865, 13.51351351, 14.37837838,
15.35135135, 16.32432432, 17.08108108, 18.16216216,
19.02702703, 20. , 20. , 20. ,
20. , 20. , 20. , 20. ,
20.10810811, 20.21621622, 20.43243243, 20.64864865,
20.97297297, 21.40540541, 22.05405405, 22.91891892,
23.78378378, 24.86486486, 25.83783784, 26.7027027 ,
27.56756757, 28.54054054, 29.51351351, 30.48648649,
31.56756757, 32.64864865, 33.62162162, 34.59459459,
35.67567568, 36.64864865, 37.62162162, 38.59459459,
39.67567568, 40.75675676, 41.83783784, 42.81081081,
43.89189189, 44.97297297, 46.05405405, 47.02702703,
48.10810811, 49.18918919, 50.27027027, 51.35135135,
52.43243243, 53.51351351, 54.48648649, 55.56756757,
56.75675676, 57.72972973, 58.81081081, 59.89189189])
and the volts data is
array([ 4.11041056, 4.11041056, 4.11041056, 4.11041056, 4.11041056,
4.11041056, 4.11041056, 4.10454545, 4.09794721, 4.09208211,
4.08621701, 4.07961877, 4.07228739, 4.06568915, 4.05909091,
4.05175953, 4.04516129, 4.03782991, 4.03123167, 4.02463343,
4.01803519, 4.01217009, 4.00557185, 3.99970674, 3.99384164,
3.98797654, 3.98284457, 3.97771261, 3.97331378, 3.96891496,
3.96451613, 3.96085044, 3.95645161, 3.95205279, 3.9483871 ,
3.94398827, 3.94032258, 3.93665689, 3.94325513, 3.94985337,
3.95645161, 3.96378299, 3.97038123, 3.97624633, 3.98284457,
3.98944282, 3.99604106, 4.0026393 , 4.00923754, 4.01510264,
4.02096774, 4.02609971, 4.02903226, 4.03196481, 4.03416422,
4.0356305 , 4.03709677, 4.03856305, 4.03929619, 4.04002933,
4.04076246, 4.04222874, 4.04296188, 4.04296188, 4.04369501,
4.04442815, 4.04516129, 4.04516129, 4.04589443, 4.04589443,
4.04662757, 4.04662757, 4.0473607 , 4.0473607 , 4.04809384,
4.04809384, 4.04809384, 4.04882698, 4.04882698, 4.04882698,
4.04956012, 4.04956012, 4.04956012, 4.04956012, 4.05029326,
4.05029326, 4.05029326, 4.05029326])
I would like to determine the location of the points labeled A, B, C, D, and E. Point A is the first location where the slope goes from zero to undefined. Point B is the location where the line is no longer vertical. Point C is the minimum of the curve. Point D is where the curve is no longer vertical. Point E is where the slope is close to zero again. The Python code below determines the locations for points A and C.
tdiff = np.diff(time)
vdiff = np.diff(volts)
# point A
idxA = np.where(vdiff < 0)[0][0]
timeA = time[idxA]
voltA = volts[idxA]
# point C
idxC = np.argmin(volts)  # volts is a plain NumPy array, so use argmin rather than pandas-style idxmin
timeC = time[idxC]
voltC = volts[idxC]
How can I determine the other locations on the curve represented by points B, D, and E?
You are looking for the points that mark any location where the slope changes to or from zero or infinity. We do not actually need to compute slopes anywhere: at such a point either y[n] - y[n-1] == 0 and y[n+1] - y[n] != 0, or vice versa, or the same is true for x.
We can take the diff of x. If one of two successive elements of the diff is zero, then the diff of the diff at that point equals either the diff or its negative. So we just want to find and label all points where diff(x) == diff(diff(x)) and diff(x) != 0, properly adjusted for the differences in array size, of course. We also want all the points where the same is true for y.
In numpy terms, this can be written as follows:
def masks(vec):
    d = np.diff(vec)
    dd = np.diff(d)
    # Mask of locations where graph goes to vertical or horizontal, depending on vec
    to_mask = ((d[:-1] != 0) & (d[:-1] == -dd))
    # Mask of locations where graph comes from vertical or horizontal, depending on vec
    from_mask = ((d[1:] != 0) & (d[1:] == dd))
    return to_mask, from_mask
to_vert_mask, from_vert_mask = masks(time)
to_horiz_mask, from_horiz_mask = masks(volts)
Keep in mind that the masks are computed on second-order differences, so they are two elements shorter than the inputs. Elements in the masks correspond to elements in the input arrays with a one-element border on the leading and trailing edge (hence the index [1:-1] below). You can convert a mask to indices using np.nonzero; for example (the +1 undoes that border offset):
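idx_from_vert = np.nonzero(from_vert_mask)[0] + 1  # indices into the original time/volts arrays
Or you can get the x- and y-values directly by using the masks as indices: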
def apply_mask(mask, x, y):
    return x[1:-1][mask], y[1:-1][mask]
to_vert_t, to_vert_v = apply_mask(to_vert_mask, time, volts)
from_vert_t, from_vert_v = apply_mask(from_vert_mask, time, volts)
to_horiz_t, to_horiz_v = apply_mask(to_horiz_mask, time, volts)
from_horiz_t, from_horiz_v = apply_mask(from_horiz_mask, time, volts)
plt.plot(time, volts, 'b-')
plt.plot(to_vert_t, to_vert_v, 'r^', label='Plot goes vertical')
plt.plot(from_vert_t, from_vert_v, 'kv', label='Plot stops being vertical')
plt.plot(to_horiz_t, to_horiz_v, 'r>', label='Plot goes horizontal')
plt.plot(from_horiz_t, from_horiz_v, 'k<', label='Plot stops being horizontal')
plt.legend()
plt.show()
Here is the resulting plot:
Notice that because the classification is done separately, "Point A" is correctly identified as being both a spot where verticalness starts and horizontalness ends. The problem is that "Point E" does not appear to be resolvable as such according to these criteria. Zooming in shows that all of the proliferated points correctly identify horizontal line segments:
You could choose a "correct" version of "Point E" by discarding from_horiz completely and keeping only the last value from to_horiz:
to_horiz_t, to_horiz_v = apply_mask(to_horiz_mask, time, volts)
to_horiz_t, to_horiz_v = to_horiz_t[-1], to_horiz_v[-1]
plt.plot(time, volts, 'b-')
plt.plot(*apply_mask(to_vert_mask, time, volts), 'r^', label='Plot goes vertical')
plt.plot(*apply_mask(from_vert_mask, time, volts), 'kv', label='Plot stops being vertical')
plt.plot(to_horiz_t, to_horiz_v, 'r>', label='Plot goes horizontal')
plt.legend()
plt.show()
I am using this as a showcase for the star expansion of the results of apply_mask. The resulting plot is:
This is pretty much exactly the plot you were looking for. Discarding from_horiz also makes "Point A" be identified only as a drop to vertical, which is nice.
As multiple values in to_horiz show, this method is very sensitive to noise within the data. Your data is quite smooth, but this approach is unlikely to ever work with raw unfiltered measurements.
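If you do need to cope with noisier measurements, one option (just a sketch; the tolerance value is a placeholder you would have to tune) is to replace the exact zero and equality tests with tolerant comparisons:
import numpy as np

def masks_tol(vec, atol=1e-9):
    d = np.diff(vec)
    dd = np.diff(d)
    # Same idea as masks(), but "zero" and "equal" are taken within a tolerance.
    to_mask = ~np.isclose(d[:-1], 0, atol=atol) & np.isclose(d[:-1], -dd, atol=atol)
    from_mask = ~np.isclose(d[1:], 0, atol=atol) & np.isclose(d[1:], dd, atol=atol)
    return to_mask, from_mask

# e.g. to_vert_mask, from_vert_mask = masks_tol(time, atol=1e-6)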
I am trying to compute S3x3 moving averages, using asymmetric weights, as described in this MATLAB example, and I am unsure if my interpretation of the following is correct when translating from MATLAB:
Have I set up my matrices in the same way?
Does scipy.signal.convolve2d do the same as MATLAB's conv2 in this instance?
Why is my fit so bad?!
In MATLAB, the filter is given and applied as:
% S3x3 seasonal filter
% Symmetric weights
sW3 = [1/9;2/9;1/3;2/9;1/9];
% Asymmetric weights for end of series
aW3 = [.259 .407;.37 .407;.259 .185;.111 0];
% dat contains data - simplified adaptation from link above
ns = length(dat) ; first = 1:4 ; last = ns - 3:ns;
trend = conv(dat, sW3, 'same');
trend(1:2) = conv2(dat(first), 1, rot90(aW3,2), 'valid');
trend(ns-1:ns) = conv2(dat(last), 1, aW3, 'valid');
I have interpreted this in Python using my own data. In doing so I have assumed that ; in a MATLAB matrix means a new row and that a space means a new column:
import numpy as np
from scipy.signal import convolve2d
dat = np.array([0.02360784, 0.0227628 , 0.0386366 , 0.03338596, 0.03141621, 0.03430469])
dat = dat.reshape(dat.shape[0], 1) # in columns
sW3 = np.array([[1/9.],[2/9.],[1/3.],[2/9.],[1/9.]])
aW3 = np.array( [[ 0.259, 0.407],
[ 0.37 , 0.407],
[ 0.259, 0.185],
[ 0.111, 0. ]])
trend = convolve2d(dat, sW3, 'same')
trend[:2] = convolve2d(dat[:2], np.rot90(aW3,2), 'same')
trend[-2:] = convolve2d(dat[-2:], np.rot90(aW3,2), 'same')
Plotting the data, the fit is pretty bad...
import matplotlib.pyplot as plt
plt.plot(dat, 'grey', label='raw data', linewidth=4.)
plt.plot(trend, 'b--', label = 'S3x3 trend')
plt.legend()
plt.show()
Solved.
It turned out that the issue was really to do with subtle differences between MATLAB's conv2 and SciPy's convolve2d. From the MATLAB docs:
C = conv2(h1,h2,A) first convolves each column of A with the vector h1 and then convolves each row of the result with the vector h2
This means that what is actually happening at the edges in my code is incorrect. It should actually be the sum of the convolution of each endpoint with the relevant column of aW3,
e.g.
nwtrend = np.zeros(dat.shape)
nwtrend = convolve2d(dat, sW3, 'same')
for i in xrange(np.rot90(aW3, 2).shape[1]):
    nwtrend[i] = np.convolve(dat[i,0], np.rot90(aW3, 2)[:,i], 'same').sum()
for i in xrange(aW3.shape[1]):
    nwtrend[-i-1] = np.convolve(dat[-i-1,0], aW3[:,i], 'same').sum()
To get the output above
plt.plot(nwtrend, 'r', label='S3x3 new',linewidth=2.)
plt.plot(trend, 'b--', label='S3x3 old')
plt.legend(loc='lower center')
plt.show()
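For reference, a rough SciPy equivalent of MATLAB's two-vector form conv2(h1, h2, A), which convolves the columns of A with h1 and then the rows of the result with h2, might look like the sketch below. The helper name conv2_h1_h2 is made up, and I have only sanity-checked the 'valid' behaviour against this particular shape of inputs:
import numpy as np
from scipy.signal import convolve2d

def conv2_h1_h2(h1, h2, A, mode='full'):
    # Convolve each column of A with h1, then each row of the result with h2,
    # mimicking MATLAB's conv2(h1, h2, A, shape).
    h1 = np.asarray(h1, dtype=float).reshape(-1, 1)  # column kernel
    h2 = np.asarray(h2, dtype=float).reshape(1, -1)  # row kernel
    tmp = convolve2d(np.asarray(A, dtype=float), h1, mode=mode)
    return convolve2d(tmp, h2, mode=mode)

# e.g. the end-of-series step from the MATLAB snippet,
# trend(1:2) = conv2(dat(first), 1, rot90(aW3,2), 'valid'):
# conv2_h1_h2(dat[:4, 0], [1.0], np.rot90(aW3, 2), mode='valid')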
I'm new to signal processing (and numpy, scipy, and MATLAB for that matter). I'm trying to estimate vowel formants with LPC in Python by adapting this MATLAB code:
http://www.mathworks.com/help/signal/ug/formant-estimation-with-lpc-coefficients.html
Here is my code so far:
#!/usr/bin/env python

import sys
import numpy
import wave
import math
from scipy.signal import lfilter, hamming
from scikits.talkbox import lpc

"""
Estimate formants using LPC.
"""

def get_formants(file_path):
    # Read from file.
    spf = wave.open(file_path, 'r')  # http://www.linguistics.ucla.edu/people/hayes/103/Charts/VChart/ae.wav

    # Get file as numpy array.
    x = spf.readframes(-1)
    x = numpy.fromstring(x, 'Int16')

    # Get Hamming window.
    N = len(x)
    w = numpy.hamming(N)

    # Apply window and high pass filter.
    x1 = x * w
    x1 = lfilter([1., -0.63], 1, x1)

    # Get LPC.
    A, e, k = lpc(x1, 8)

    # Get roots.
    rts = numpy.roots(A)
    rts = [r for r in rts if numpy.imag(r) >= 0]

    # Get angles.
    angz = numpy.arctan2(numpy.imag(rts), numpy.real(rts))

    # Get frequencies.
    Fs = spf.getframerate()
    frqs = sorted(angz * (Fs / (2 * math.pi)))

    return frqs

print get_formants(sys.argv[1])
Using this file as input, my script returns this list:
[682.18960189917243, 1886.3054773107765, 3518.8326108511073, 6524.8112723782951]
I didn't even get to the last steps where they filter the frequencies by bandwidth because the frequencies in the list aren't right. According to Praat, I should get something like this (this is the formant listing for the middle of the vowel):
Time_s F1_Hz F2_Hz F3_Hz F4_Hz
0.164969 731.914588 1737.980346 2115.510104 3191.775838
What am I doing wrong?
Thanks very much
UPDATE:
I changed this
x1 = lfilter([1., -0.63], 1, x1)
to
x1 = lfilter([1], [1., 0.63], x1)
as per Warren Weckesser's suggestion and am now getting
[631.44354635609318, 1815.8629524985781, 3421.8288991389031, 6667.5030877036006]
I feel like I'm missing something since F3 is very off.
UPDATE 2:
I realized that the order being passed to scikits.talkbox.lpc was off due to a difference in sampling frequency. Changed it to:
Fs = spf.getframerate()
ncoeff = 2 + Fs / 1000
A, e, k = lpc(x1, ncoeff)
Now I'm getting:
[257.86573127888488, 774.59006835496086, 1769.4624576002402, 2386.7093679399809, 3282.387975973973, 4413.0428174593926, 6060.8150432549655, 6503.3090645887842, 7266.5069407315023]
Much closer to Praat's estimation!
The problem had to do with the order being passed to the lpc function. 2 + fs/1000, where fs is the sampling frequency, is the rule of thumb according to:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
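For anyone curious about the bandwidth filtering step I skipped (the tail end of the MathWorks example), it would look roughly like the snippet below inside get_formants, replacing the frqs = sorted(...) line. The 90 Hz / 400 Hz thresholds are copied from the MATLAB page as best I can tell, and I have not tuned them:
# Get frequencies and bandwidths, keeping them paired while sorting.
frqs = angz * (Fs / (2 * math.pi))
order = numpy.argsort(frqs)
frqs = frqs[order]
bw = -0.5 * (Fs / (2 * math.pi)) * numpy.log(numpy.abs(numpy.array(rts)[order]))

# Keep only plausible formants: frequency above 90 Hz, bandwidth below 400 Hz.
formants = [f for f, b in zip(frqs, bw) if f > 90 and b < 400]
return formants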
I have not been able to get the results you expect, but I do notice two things which might cause some differences:
Your code uses [1, -0.63] where the MATLAB code from the link you provided has [1 0.63].
Your processing is being applied to the entire x vector at once instead of smaller segments of it (see where the MATLAB code does this: x = mtlb(I0:Iend); and the sketch below).
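A minimal version of that segmenting step in Python, placed before the windowing inside get_formants, could look like this (the 0.1 s to 0.25 s window is only an illustrative guess at the region the MATLAB example analyzes):
Fs = spf.getframerate()
I0, Iend = int(0.1 * Fs), int(0.25 * Fs)  # analyze roughly 150 ms of the vowel
x = x[I0:Iend]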
Hope that helps.
There are at least two problems:
According to the link, the "pre-emphasis filter is a highpass all-pole (AR(1)) filter". The signs of the coefficients given there are correct: [1, 0.63]. If you use [1, -0.63], you get a lowpass filter.
You have the first two arguments to scipy.signal.lfilter reversed.
So, try changing this:
x1 = lfilter([1., -0.63], 1, x1)
to this:
x1 = lfilter([1.], [1., 0.63], x1)
I haven't tried running your code yet, so I don't know if those are the only problems.
I am working on moving some code from IDL into Python. One IDL call is to INT_TABULATED, which performs integration on a fixed range.
The INT_TABULATED function integrates a tabulated set of data { xi , fi } on the closed interval [MIN(x) , MAX(x)], using a five-point Newton-Cotes integration formula.
Result = INT_TABULATED( X, F [, /DOUBLE] [, /SORT] )
Where result is the area under the curve.
IDL DOCS
My question is: does NumPy/SciPy offer a similar form of integration? I see that scipy.integrate.newton_cotes exists, but it appears to return "weights and error coefficient for Newton-Cotes integration" rather than the area.
SciPy does not provide such a high-order integrator for tabulated data by default. The closest you have available without coding it yourself is scipy.integrate.simps, which uses a 3-point Newton-Cotes method (Simpson's rule).
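For example, if Simpson's rule is accurate enough for your purposes:
from scipy.integrate import simps

area = simps(f, x)  # f are the tabulated values, x the sample points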
If you simply want to get comparable integration precision, you could split your x and f arrays into 5-point chunks and integrate them one at a time, using the weights returned by scipy.integrate.newton_cotes, doing something along the lines of:
import numpy as np
import scipy.integrate

def idl_tabulate(x, f, p=5):
    def newton_cotes(x, f):
        if x.shape[0] < 2:
            return 0
        rn = (x.shape[0] - 1) * (x - x[0]) / (x[-1] - x[0])
        weights = scipy.integrate.newton_cotes(rn)[0]
        return (x[-1] - x[0]) / (x.shape[0] - 1) * np.dot(weights, f)
    ret = 0
    for idx in xrange(0, x.shape[0], p - 1):
        ret += newton_cotes(x[idx:idx + p], f[idx:idx + p])
    return ret
This does 5-point Newton-Cotes on all intervals, except perhaps the last, where it will do a Newton-Cotes of the number of points remaining. Unfortunately, this will not give you the same results as INT_TABULATED, because the internal methods are different:
SciPy calculates the weights for unequally spaced points using what seems to be a least-squares fit. I don't fully understand what is going on, but the code is pure Python; you can find it in your SciPy installation in the file scipy/integrate/quadrature.py.
INT_TABULATED always performs 5-point Newton-Cotes on equispaced data. If the data are not equispaced, it builds an equispaced grid, using a cubic spline to interpolate the values at those points. You can check the code here.
For the example in the INT_TABULATED docstring, which is supposed to return 1.6271 using the original code and has an exact solution of 1.6405, the above function returns:
>>> x = np.array([0.0, 0.12, 0.22, 0.32, 0.36, 0.40, 0.44, 0.54, 0.64,
... 0.70, 0.80])
>>> f = np.array([0.200000, 1.30973, 1.30524, 1.74339, 2.07490, 2.45600,
... 2.84299, 3.50730, 3.18194, 2.36302, 0.231964])
>>> idl_tabulate(x, f)
1.641998154242472
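If you need to match INT_TABULATED more closely, one option, sketched here under the assumption that the description above is accurate, is to resample onto an equispaced grid with a cubic spline first and then integrate that grid with the same composite 5-point rule. I have not checked this against IDL's actual rules for choosing the number of segments:
from scipy.interpolate import interp1d

def idl_tabulate_spline(x, f, n=None):
    # Interpolate onto an equispaced grid (cubic spline), then integrate that
    # grid with the idl_tabulate() function defined above.
    if n is None:
        # Pick a grid with a multiple of 4 intervals so every panel has 5 points.
        n = 4 * ((x.shape[0] + 2) // 4) + 1
    xe = np.linspace(x[0], x[-1], n)
    fe = interp1d(x, f, kind='cubic')(xe)
    return idl_tabulate(xe, fe)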