I'm comparing two irregularly spaced time series with the tslearn implementation of DTW. Both time series are very irregularly sampled and their sampling isn't correlated, so I would like to use a Sakoe-Chiba radius to constrain the range of compared observations to, say, one hour. If I had regularly sampled time series at 1-minute intervals, I would use a Sakoe-Chiba radius of 60, but I don't have such data. There should be a more natural solution than data manipulation (interpolation to a 1-minute interval), for example a variable Sakoe-Chiba radius (each observation gets its own radius, precomputed to be equivalent to a 1-hour constraint). Is there a reason this would be computationally inefficient compared to a constant Sakoe-Chiba radius?
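For reference, a minimal sketch of the constant-radius baseline described above, assuming the series were first resampled to a regular 1-minute grid (exactly the data manipulation I would like to avoid):

import numpy as np
from tslearn.metrics import dtw

# Hypothetical series already resampled to a regular 1-minute grid.
s1 = np.random.randn(600)
s2 = np.random.randn(600)

# Constrain the warping window to +/- 60 samples, i.e. one hour at 1-minute spacing.
d = dtw(s1, s2, global_constraint="sakoe_chiba", sakoe_chiba_radius=60)
print(d)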
I have 2 sampled sine waves obtained as a measurement from a DSO. The sampling rate of the DSO is 160 GSa/s and my signal is 60 GHz. I need to find the phase difference between the two sine waves. Both are the same frequency. However, the sampling rate is not enough to accurately determine the phase. Is there any way to interpolate the measured signal to get a better sine wave and then calculate the phase difference?
You may fit sine functions, but for the phase difference (delta_phi = 2*pi*f*delta_t) it is sufficient to detect and compare the zero-crossings (after removing any constant offset), which may be found from a segment of your series by an interpolation like
import numpy as np
from scipy import interpolate

w = 6.38                        # some radian frequency
t = np.linspace(0, 0.5)         # time interval containing ONE zero-crossing
delta_phi = 0.1                 # some phase difference
x = np.sin(w*t - delta_phi)     # x(t)
f = interpolate.interp1d(x, t)  # interpolate t(x), default is linear
delta_t = f(0)                  # zero-crossing time referred to t=0
delta_phi_detected = w*delta_t
You need to relate two adjacent zero-crossings of your signals.
Alternatively, you may obtain an average estimate by multiplying both signals and numerically integrating the product over a time T; for zero-mean (or made zero-mean), unit-amplitude signals this converges to (T/2)*cos(delta_phi).
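A minimal sketch of that product-and-integrate approach; the test signals and the use of np.trapz are assumptions for illustration:

import numpy as np

w = 6.38                  # radian frequency, assumed identical for both signals
delta_phi = 0.1           # true phase difference used to generate test data
t = np.linspace(0, 50, 10000)
x1 = np.sin(w*t)
x2 = np.sin(w*t - delta_phi)

# Integrate the product over T; for zero-mean, unit-amplitude signals the
# integral approaches (T/2)*cos(delta_phi).
T = t[-1] - t[0]
I = np.trapz(x1*x2, t)
delta_phi_est = np.arccos(np.clip(2*I/T, -1, 1))
print(delta_phi_est)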
From, say, N = 1000 voltage samples taken at a 1 ms sampling interval, I need to find precisely with python/numpy the amplitude and angle of the fundamental, which is between 45 and 55 Hz, as well as any sidebands that may exist.
Do I need a phase-locked loop to do that, or can it be done without one?
Your measurement frequency is fundamentally too low and should be more than double your expected event frequency!
measured: 0.025s
event range: 0.0182-0.0222s
more here: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
A phase-locked loop would be a reasonable approach to estimate the frequency etc. of the fundamental. Supposing you have collected the N samples up front, another way to do the analysis is (a short sketch follows this list):
Apply a window, like np.hanning(N), by multiplying it pointwise with the samples.
Compute the spectrum with np.fft.rfft. For a sampling interval of Ts seconds, the nth element of the resulting array is the DFT coefficient for frequency n / (N * Ts) in units of Hz (or, for the values N=1000, Ts=0.001, simply n Hz).
Find the bin with the peak magnitude between 45 and 55 Hz. The location of the peak gives the frequency of the fundamental. You can fit a polynomial (np.polyfit) across a few neighboring bins and find its peak to get a more precise estimate. The magnitude and complex phase of the peak give the amplitude and phase (angle) of the fundamental.
Plot the magnitude of the spectrum to look for sidebands.
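A minimal sketch of these steps; the synthetic on-bin test signal and the Hann-window amplitude correction are illustrative assumptions:

import numpy as np

# Assumed example values from the question: N = 1000 samples, Ts = 1 ms.
N, Ts = 1000, 0.001
t = np.arange(N) * Ts
x = 3.0 * np.cos(2*np.pi*50.0*t + 0.4)        # hypothetical test signal

win = np.hanning(N)
X = np.fft.rfft(x * win)
freqs = np.fft.rfftfreq(N, Ts)                # bin n sits at n / (N*Ts) Hz

band = (freqs >= 45) & (freqs <= 55)
k = np.flatnonzero(band)[np.argmax(np.abs(X[band]))]

amplitude = 2*np.abs(X[k]) / win.sum()        # undo the window's coherent gain
phase = np.angle(X[k])                        # approximate phase of a cosine at t = 0
print(freqs[k], amplitude, phase)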
I have three numpy arrays where one column is the time stamp (Unix time to the millisecond as an integer) and the other column is a reading from a sensor (integer). Each of the three arrays spans roughly the same interval in time, but they are sampled at different frequencies (one at 500 Hz, the others at 125 Hz). The final array should be (n, 4) with columns [time, array1, array2, array3].
500.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505325032, 196],
[1463505325034, 197],
[1463505325036, 197],
[1463505325038, 195]])
125.0 Hz Example (only the head, these are multiple minutes long)
array([[1463505287912, -5796],
[1463505287920, -5858],
[1463505287928, -5920],
[1463505287936, -5968]])
Currently, my initial plan has been as follows (sketched below), but performance isn't amazing:
Find the earliest start time (because of different frequencies and system issues, they do not all start at exactly the same millisecond)
Create a new array with a time column that starts at the earliest time and runs as long as the longest of the three arrays. Fill the time column to the desired common frequency using np.linspace/np.arange
Loop over the three arrays, using np.interp or similar to convert to common frequency, and then stack the output onto the common numpy array created above
I have tens of thousands of these intervals and they can be multiple days long, so hoping for something that is reasonably quick and memory efficient. Thank you!
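A minimal sketch of that plan; the array names and the 500 Hz target grid are illustrative assumptions:

import numpy as np

def merge_to_common_grid(arrays, freq_hz=500.0):
    # arrays: list of (n_i, 2) arrays with columns [timestamp_ms, value], as in
    # the examples above; the result has columns [time_ms, value_0, value_1, ...].
    start = min(a[0, 0] for a in arrays)            # earliest start time
    stop = max(a[-1, 0] for a in arrays)            # latest end time
    step_ms = 1000.0 / freq_hz
    t = np.arange(start, stop + step_ms, step_ms)   # common time column (ms)
    cols = [np.interp(t, a[:, 0], a[:, 1]) for a in arrays]
    return np.column_stack([t] + cols)

# merged = merge_to_common_grid([arr500, arr125a, arr125b])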
You'll have to interpolate the 125 Hz signal to get 500 Hz. How to do that depends on what quality of interpolation you need. For linear interpolation, scipy.interpolate.interp1d in linear-interpolation mode is a bit slow, O(log n) per evaluated point for n data points, because it does a bisection search for every evaluation. The calculation time explodes if you ask it to do a smooth interpolation on a large dataset, because that involves solving a system of 3n equations with 3n unknowns.
If your sampling rates have an integer ratio (1:4 in your example), you can do linear interpolation more efficiently like this:
import numpy as np

# interpolate a125 (the 125 Hz sensor-value column) to a500
n = len(a125)
a500 = np.zeros((n-1)*4 + 1)
a500[0::4] = a125                               # original samples
a500[1::4] = 0.75*a125[:-1] + 0.25*a125[1:]     # 1/4 of the way to the next sample
a500[2::4] = 0.5*a125[:-1] + 0.5*a125[1:]       # halfway
a500[3::4] = 0.25*a125[:-1] + 0.75*a125[1:]     # 3/4 of the way
If you need smooth interpolation, use scipy.signal.resample. Being a Fourier method, it requires careful handling of the end points of your time series; you need to pad the series with data that makes a gradual transition from the end point back to the start point:
from scipy.signal import resample

m = n//8                                     # padding length
padding = np.linspace(a125[-1], a125[0], m)  # ramp from the last value back to the first
a125_pad = np.concatenate([a125, padding])
a500b = resample(a125_pad, (n+m)*4)[:4*n]    # resample, then drop the padded part
Depending on the nature of your data, it might be better to have a continuous derivative at the end points.
Note that the FFT that is used for the resampling likes to have an array size that is a product of small prime numbers (2, 3, 5, 7). Choose the padding size (m) wisely.
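One way to choose m is to round the padded length up to the next "fast" FFT size; a small sketch using scipy.fftpack.next_fast_len (available since SciPy 0.18):

from scipy.fftpack import next_fast_len

# Round n + n//8 up to a length that factors into small primes, so that the
# FFT inside resample stays fast.
m = next_fast_len(n + n//8) - n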
The dataset of 921 rows x 10166 columns is used to predict bacteria plate count based on water temperature. Each row is an observation (the first 10080 columns being the time series of water temperature and the remaining 2 columns being y labels: 1 means high bacteria count, 0 means low bacteria count).
There is fluctuation in the temperature for each activation; for the rest of the time, the water temperature remains constant at 25°C. Since there are too many features in the time series, I am thinking about extracting some relevant features from the time series data, such as the first 3 lowest frequency values or amplitudes of the time series, using fft or ifft etc. from scipy.fftpack, and then fitting them into a logistic regression model (see the sketch at the end of this question). However, due to limited background knowledge in waves/signals, I am confused about a few things:
1) Does applying fft to the time series produce an array of the frequency components of the time series data? If not, which function should I use instead?
2) I've forward-filled my time series data (i.e. data points are spaced at fixed time intervals) and the number of data points in each time series is the same. If 1) is correct, will the number of frequencies returned for different time series be the same?
Below is a basic visualisation of my original data.
Any help is appreciated. Thank you.
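For reference, a minimal sketch of the feature-extraction idea described above, using np.fft.rfft; the function and variable names are illustrative assumptions:

import numpy as np

def low_freq_features(temp_series, k=3):
    # Magnitudes of the k lowest non-zero frequency components (DC term skipped).
    spectrum = np.fft.rfft(temp_series - np.mean(temp_series))
    return np.abs(spectrum[1:k+1])

# For rows of equal length, rfft always returns the same number of frequency bins,
# so every row yields the same number of features.
# X_features = np.vstack([low_freq_features(row) for row in temperature_rows])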
I am analyzing a time-series dataset that I am pretty sure can be broken down using fft. I want to develop a model to estimate the data using a sum of sin/cos terms, but I am having trouble with the syntax to find the frequencies in Python.
Here is a graph of the data
And here's a link to the original data: https://drive.google.com/open?id=1mqZtQ-txdd_AFbKGBlbSL6903CK-_kXl
Most of the examples I have seen have multiple samples per second/time period; however, the data in this set represent by-minute observations of some metric. Because of this, I've had trouble translating the answers online to this problem.
Here's my naive first approach:
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack

X = fftpack.fft(data)                # data: the by-minute observations from the linked file
freqs = fftpack.fftfreq(len(data))   # frequencies in cycles per sample (no dt passed)
plt.plot(freqs, np.abs(X))
plt.show()
Instead of peaking at the major frequencies, my plot only has one peak at 0.
The FFT you posted has been shifted so that 0 is at the center. Data to the left of the center represents negative frequencies and to the right represents positive frequencies. If you zoom in and look more closely, I think you will see that there are two peaks close to the center that you are interpreting as a single peak at 0. Just looking at the positive side, the location of this peak will tell you which frequency is contributing significant signal power.
Like you said, your x-axis is probably incorrect. scipy.fftpack.fftfreq needs to know the time between samples (in seconds, I think) of your time-domain signal to correctly determine the bandwidth and create the x-axis array in Hz. This should do it:
dt = 60  # 60 seconds between samples (by-minute data)
freqs = fftpack.fftfreq(len(data), dt)  # x-axis in Hz
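A possible continuation of that fix, plotting only the positive-frequency half mentioned above; it assumes X and data from the question's snippet:

import numpy as np
import matplotlib.pyplot as plt

pos = freqs > 0                       # keep only the positive-frequency half
plt.plot(freqs[pos], np.abs(X)[pos])
plt.xlabel("frequency (Hz)")
plt.ylabel("|X(f)|")
plt.show()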