Using Fourier analysis for time series prediction - Python

For data that is known to have seasonal or daily patterns, I'd like to use Fourier analysis to make predictions. After running an FFT on the time series data, I obtain coefficients. How can I use these coefficients for prediction?
I believe the FFT assumes that all the data it receives constitutes one period. If I simply regenerate the data using the IFFT, am I also regenerating the continuation of my function, and can I use those values as future values?
Simply put: if I run an FFT for t = 0, 1, 2, ..., 10, then use the IFFT on the coefficients, can I use the regenerated time series for t = 11, 12, ..., 20?
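To make the question concrete, here is a minimal sketch (a toy signal of my own construction) of what rebuilding from FFT coefficients does beyond the observed range: the reconstructed series is N-periodic, so the "future" values are just the observed ones replayed.

import numpy as np

N = 11
t_obs = np.arange(N)
x = np.sin(2 * np.pi * t_obs / 5.5) + 0.1 * t_obs   # toy data
X = np.fft.fft(x)

# Evaluate the inverse-DFT Fourier series at t = 0..21
t_ext = np.arange(2 * N)
k = np.arange(N)
series = (np.exp(2j * np.pi * np.outer(t_ext, k) / N) @ X).real / N

assert np.allclose(series[:N], x)   # first period reproduces the data
assert np.allclose(series[N:], x)   # second period just repeats it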

I'm aware that this question may no longer be relevant to you, but for others looking for answers, I wrote a very simple example of Fourier extrapolation in Python: https://gist.github.com/tartakynov/83f3cd8f44208a1856ce
Before you run the script, make sure that you have all the dependencies installed (numpy, matplotlib). Feel free to experiment with it.
P.S. Locally Stationary Wavelets (LSW) may work better than Fourier extrapolation; LSW is commonly used in predicting time series. The main disadvantage of Fourier extrapolation is that it just repeats your series with period N, where N is the length of your time series.

It sounds like you want a combination of extrapolation and denoising.
You say you want to repeat the observed data over multiple periods. Well, then just repeat the observed data. No need for Fourier analysis.
But you also want to find "patterns". I assume that means finding the dominant frequency components in the observed data. Then yes, take the Fourier transform, preserve the largest coefficients, and eliminate the rest.
k = 10                                  # how many coefficients to keep (your choice)
X = np.fft.fft(x)
Y = np.zeros(len(X), dtype=complex)
important = np.argsort(np.abs(X))[-k:]  # indices of the k largest-magnitude coefficients
Y[important] = X[important]
As for periodic repetition: let z = [x, x], i.e., two periods of the signal x. Then, with the unnormalized DFT, Z[2k] = 2·X[k] for all k in {0, 1, ..., N−1} (the factor of 2 comes from summing over two identical periods), and Z[m] = 0 for odd m.
Z = np.zeros(2 * len(X), dtype=complex)
Z[::2] = 2 * X   # the factor of 2 compensates for ifft's 1/(2N) normalization
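Putting the two pieces together, a minimal end-to-end sketch (the function name, the choice of k, and the use of NumPy in place of the old top-level scipy functions are mine):

import numpy as np

def denoise_and_repeat(x, k=5, periods=2):
    # Keep the k largest-magnitude FFT coefficients, zero the rest,
    # then spread the spectrum so the inverse transform spans
    # `periods` repetitions of the denoised signal.
    X = np.fft.fft(x)
    Y = np.zeros_like(X)
    keep = np.argsort(np.abs(X))[-k:]
    Y[keep] = X[keep]
    Z = np.zeros(periods * len(X), dtype=complex)
    Z[::periods] = periods * Y   # scale compensates for ifft's 1/(periods*N) factor
    return np.fft.ifft(Z).real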

When you run an FFT on time series data, you transform it into the frequency domain. The coefficients multiply the terms in the series (sines and cosines or complex exponentials), each with a different frequency.
Extrapolation is always a dangerous thing, but you're welcome to try it. You're using past information to predict the future when you do this: "Predict tomorrow's weather by looking at today." Just be aware of the risks.
I'd recommend reading "The Black Swan" by Nassim Nicholas Taleb.

You can use the code that @tartakynov posted, and, to avoid repeating exactly the same time series in the forecast (overfitting), you can add a new parameter to the function, called n_param, and fix a lower bound h for the amplitudes of the frequencies to keep:
def fourierExtrapolation(x, n_predict, n_param):
Usually you will find that some frequencies in a signal have significantly higher amplitudes than others, so if you select those frequencies you will be able to isolate the periodic nature of the signal.
You can add these two lines, which are determined by the chosen number n_param:
h = np.sort(np.absolute(x_freqdom))[-n_param]   # threshold on amplitude, not on the raw complex values
x_freqdom = [x_freqdom[i] if np.absolute(x_freqdom[i]) >= h else 0
             for i in range(len(x_freqdom))]
Just by adding this, you will be able to produce a nice, smooth forecast; a full sketch follows below.
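For reference, here is a sketch of how the threshold might slot into the gist's function; the detrending and reconstruction steps paraphrase the gist rather than quote it, so treat this as an approximation:

import numpy as np

def fourierExtrapolation(x, n_predict, n_param):
    n = x.size
    t = np.arange(n)
    p = np.polyfit(t, x, 1)              # remove the linear trend first
    x_freqdom = np.fft.fft(x - p[0] * t)
    f = np.fft.fftfreq(n)
    # keep only the n_param frequencies with the largest amplitudes
    h = np.sort(np.absolute(x_freqdom))[-n_param]
    x_freqdom = [x_freqdom[i] if np.absolute(x_freqdom[i]) >= h else 0
                 for i in range(n)]
    # rebuild (and extend) the signal from the surviving components
    t_ext = np.arange(n + n_predict)
    restored = np.zeros(t_ext.size)
    for i in range(n):
        ampli = np.absolute(x_freqdom[i]) / n
        phase = np.angle(x_freqdom[i])
        restored += ampli * np.cos(2 * np.pi * f[i] * t_ext + phase)
    return restored + p[0] * t_ext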
Another useful article about FFT forecasting:
forecast FFT in R

Related

How to apply differential privacy on list of data?

OpenMined released a differential privacy project called PyDP two years ago.
In the examples provided, they show how to compute differentially private statistical features of the data, such as the mean, max, and median.
Is there a way to apply differential privacy to a list of data and get a list of data back, without computing any statistical feature yet?
e.g.
input_list = [1.03, 2.23, 3.058, 4.97]
output_differential_privacy_list = dp_function(input_list)
output_differential_privacy_list
>> [1.01, 2.03, 3.8, 4.04]
How is the noise added to the data (they use the Laplace mechanism)?
Is the noise added taking the whole data set into account, or is it added to each single value at a time?
I couldn't find the GitHub code for pydp.algorithms.laplacian.
These are the statistical features they show how to compute:
from pydp.algorithms.laplacian import (
    BoundedSum,
    BoundedMean,
    BoundedStandardDeviation,
    Count,
    Max,
    Min,
    Median,
)
Are there also functions to compute differentially private percentiles?
Any other resources would also be welcome.
Here are my two cents on the question.
The idea of differential privacy is to publish aggregated information about sensitive values only if noise is added to the aggregated information. This makes it infeasible to match sensitive values to their owners, and keeps the output from depending too heavily on any particular record in the dataset.
The noise is added by injecting Laplace noise into each piece of data at a time, which in turn adds noise to the overall dataset. The essential idea of DP is the following:
A(D, f) = f(D) + noise
A = some randomised algorithm, which ensures that the result is slightly different each time.
f = the query computed on the dataset; its sensitivity Δf determines to what degree an individual piece of data can affect the output.
D = the dataset you want to 'mask'; in your case, the list of numbers.
noise = Laplace noise, i.e. drawn from Lap(λ) with scale λ = Δf/ε (which equals 1/ε when Δf = 1, as for a count query).
The epsilon value here indicates, roughly, the privacy loss incurred by adding or removing an entry from the dataset, i.e. by making adjustments to it. The smaller the epsilon, the smaller the privacy loss from such adjustments, which means better privacy protection.
And as you can see now, the noise depends only on the sensitivity and the epsilon value, and has nothing to do with the underlying dataset; a minimal per-element sketch is below.
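As an illustration, a per-element Laplace mechanism might look like the following sketch; dp_function is the hypothetical name from the question, not a real PyDP API, and the sensitivity is left as an explicit parameter:

import numpy as np

def dp_function(values, epsilon=1.0, sensitivity=1.0):
    # Add independent Lap(sensitivity / epsilon) noise to every value.
    scale = sensitivity / epsilon
    return [v + np.random.laplace(0.0, scale) for v in values]

input_list = [1.03, 2.23, 3.058, 4.97]
print(dp_function(input_list, epsilon=2.0))   # e.g. [1.4..., 2.1..., 2.9..., 5.3...]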
... they showed how to compute the PyDP on the data by computing some statistical features such as the mean, Max, Median.
Let's say, for example, we have a bunch of numbers like you have here. We could first find the max number in the list, which would be 4.97, and then draw the noise η from Lap(4.97/ε). I believe the idea is to anchor the noise scale to some statistical feature of the data.
Hope this is somewhat useful :)

Waveform frequency resolution for FFT – any way to increase it?

This article, https://www.bitweenie.com/listings/fft-zero-padding/, gives a simple relation between the time length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this the waveform frequency resolution.
In other words: if two input frequencies are closer together than 1/(time length of the input data), they will show up as only one peak in the FFT plot.
My question is: is there a way to increase this waveform frequency resolution? I find it difficult to work with rather short data series because of this limitation.
As an example, if I use a combination of sines with periods 9.5, 10, and 11 over 240 data points, I cannot distinguish between the different frequencies.
To have good frequency resolution you need a long time series.
This is a fundamental issue, called the uncertainty principle; it cannot be overcome within Fourier analysis (the Fourier transform, the DFT, the short-time Fourier transform, and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make the peaks sharper or more separated, as the sketch below illustrates.
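A quick numerical illustration, using the three-tone example from the question (my own toy code):

import numpy as np

N = 240
n = np.arange(N)
x = sum(np.sin(2 * np.pi * n / p) for p in (9.5, 10.0, 11.0))

spec_plain = np.abs(np.fft.rfft(x))            # N-point spectrum
spec_padded = np.abs(np.fft.rfft(x, n=8 * N))  # zero-padded to 8N points

# The padded spectrum is sampled 8x more densely in frequency, but the
# peak widths, set by the 240-sample record length, are unchanged, so
# the three tones still blur together.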
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
Also, you can use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but it generates "cross terms", i.e., frequency-component artifacts. However, when you have only a few frequency components this might be acceptable; it depends on the use case.

Improving frequency time normalization/hilbert transfer runtimes

So this is a bit of a nitty-gritty question...
I have a time-series signal with a non-uniform response spectrum that I need to whiten. I do the whitening using a frequency-time normalization method: I incrementally filter the signal between two frequency endpoints, using a constant narrow frequency band (~1/4 of the lowest-frequency end member). I then find the envelope that characterizes each of these narrow bands and normalize that frequency component, and finally rebuild the signal from the normalized components... all in Python (sorry, it has to be a Python solution)...
(Figures omitted: the raw data, its spectrum, and the spectrum of the whitened data.)
The problem is that I have to do this for maybe ~500,000 signals like this, and it takes a while (about a minute each), with almost the entirety of the time spent doing the (multiple) Hilbert transforms.
I already have it running on a small cluster; I don't want to parallelize the loop the Hilbert transform is in.
I'm looking for alternative envelope routines/functions (non-Hilbert), or alternative ways to calculate the entire narrowband response without doing a loop.
The other option is to make the frequency bands adapt to the center frequency they are filtering around, so that they get progressively larger as the routine marches on; that would just decrease the number of times I have to go through the loop.
Any and all suggestions welcome!!!
example code/dataset:
https://github.com/ashtonflinders/FTN_Example
Here is a faster method to calculate the envelope via local maxima:
import numpy as np

def calc_envelope(x, ind):
    # Envelope from the local maxima of |x|, linearly interpolated
    # back onto the sample indices `ind`.
    x_abs = np.abs(x)
    loc = np.where(np.diff(np.sign(np.diff(x_abs))) < 0)[0] + 1  # local-max indices
    peak = x_abs[loc]
    envelope = np.interp(ind, loc, peak)
    return envelope
(Example output figure omitted.)
It's about 6x faster than hilbert. To speed it up even more, you could write a Cython function that finds the next local maximum and normalizes up to that point iteratively. A quick usage sketch follows.
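Usage on a synthetic signal (my own toy example), compared against the Hilbert envelope:

import numpy as np
from scipy.signal import hilbert

n = np.arange(5000)
sig = np.sin(2 * np.pi * n / 400.0) * np.cos(2 * np.pi * n / 23.0)

env_fast = calc_envelope(sig, np.arange(sig.size))   # local-max envelope
env_hilb = np.abs(hilbert(sig))                      # Hilbert envelope, for reference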

Python: How to interpolate errors using scipy interpolate.interp1d

I have a number of data sets, each containing x, y, and y_error values, and I'm simply trying to calculate the average value of y at each x across these data sets. However, the data sets are not quite the same length. I thought the best way to bring them to equal length would be to use scipy's interpolate.interp1d on each data set. However, I still need to be able to calculate the error on each of these averaged values, and I'm quite lost on how to accomplish that after doing an interpolation.
I'm pretty new to Python and coding in general, so I appreciate your help!
As long as you can assume that your errors represent one-sigma intervals of normal distributions, you can always generate synthetic datasets, resample and interpolate those, and compute the one-sigma errors of the results; a sketch is below.
Or just interpolate values+err and values−err, if all you need is a quick and dirty rough estimate.
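A minimal sketch of the resampling approach, assuming y_error holds one-sigma widths of independent normal errors (the function and variable names are my own):

import numpy as np
from scipy.interpolate import interp1d

def interpolate_with_errors(x, y, yerr, x_new, n_draws=1000, rng=None):
    # Monte Carlo error propagation through interp1d: perturb y by its
    # errors, interpolate each draw, then take the mean and spread.
    rng = np.random.default_rng() if rng is None else rng
    draws = np.empty((n_draws, x_new.size))
    for i in range(n_draws):
        y_synth = y + rng.normal(0.0, yerr)
        draws[i] = interp1d(x, y_synth)(x_new)
    return draws.mean(axis=0), draws.std(axis=0)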

Power spectral density of a signal with gaps?

Does anyone know if it is possible to find the power spectral density of a signal with gaps in it? For example (in MATLAB syntax, because that is what I'm familiar with):
ta=1:1000;
tb=1200:3000;
t=[ta tb]; % this is the timebase
signal=randn(size(t)); % this is a signal
figure(101)
plot(t,signal,'.')
I'd like to be able to determine frequencies on a longer time base than just the individual sections of data. Obviously I could just take the PSD of the individual sections, but that limits the lowest accessible frequency. I could interpolate the data, but this would colour the PSD.
Any thoughts would be much appreciated.
The Lomb-Scargle periodogram algorithm is usually used to analyze unevenly spaced data (sampled at arbitrary time points) or data where a proportion of the samples is missing.
Here are a couple of MATLAB implementations:
lombscargle.m (FEX)
Lomb (Lomb-Scargle) Periodogram (FEX)
lomb.m - ECG tools by Gari Clifford
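If Python is an option, SciPy also ships an implementation; a minimal sketch of the example above (0-based indices in place of MATLAB's 1-based):

import numpy as np
from scipy.signal import lombscargle

t = np.concatenate([np.arange(0, 1000), np.arange(1200, 3000)]).astype(float)
signal = np.random.randn(t.size)

freqs = np.linspace(0.001, np.pi, 2000)   # angular frequencies to evaluate
pgram = lombscargle(t, signal - signal.mean(), freqs)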
I found this Non-Uniform FFT, but I'm not sure it's exactly what I need, as it may really be intended for data that is sampled on an uneven time base throughout, rather than evenly spaced data with significant gaps. I'll give it a go!
Zero-filling the signal "gaps" (multiplying those samples by zero) gives exactly the same FT, and thus the same PSD, as leaving the corresponding segments of the Fourier basis vectors out of the computation entirely. A numerical check is sketched below.
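A small numerical check of that claim (my own adaptation of the example above to 0-based indexing):

import numpy as np

rng = np.random.default_rng(0)
N = 3000
keep = np.r_[0:1000, 1200:3000]            # sample indices that have data
x = np.zeros(N)
x[keep] = rng.standard_normal(keep.size)

X_full = np.fft.fft(x)                     # complete basis, zeros in the gap
k = 7                                      # any frequency bin
X_partial = np.sum(x[keep] * np.exp(-2j * np.pi * k * keep / N))
assert np.allclose(X_full[k], X_partial)   # identical, as claimed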
