I'm struggling to understand a problem with numerical integration of a signal. Basically I have a signal which I would like to integrate or perform and antiderivative as a function of time (integration of pick-up coil for getting magnetic field). I've tried two different methods which in principle should be consistent but they are not. The code I'm using is the following. Beware that the signals y in the code has been previously high pass filtered using butterworth filtering (similar to what done here http://wiki.scipy.org/Cookbook/ButterworthBandpass). The signal and time basis can be downloaded here (https://www.dropbox.com/s/fi5z38sae6j5410/trial.npz?dl=0)
import scipy as sp
from scipy import integrate
from scipy import fftpack
data = np.load('trial.npz')
y = data['arr_1'] # this is the signal
t = data['arr_0']
# integration using pfft
bI = sp.fftpack.diff(y-y.mean(),order=-1)
bI2= sp.integrate.cumtrapz(y-y.mean(),x=t)
Now the two signals (besides the eventual different linear trend which can be taken out) are different, or better dynamically they are quite similar with the same time of oscillations but there is a factor approximately of 30 between the two signals, in the sense that bI2 is 30 times (approximately) lower than bI. BTW I've subtracted the mean in both the two signals to be sure that they are zero mean signals and performing integration in IDL (both with equivalent cumsumtrapz and in the fourier domain) gives values compatible with bI2. Any clue is really welcomed
It's difficult to know what scipy.fftpack.diff() is doing under the bonnet.
To try and solve your problem, I have dug up an old frequency domain integration function that I wrote a while ago. It's worth pointing out that in practice, one generally wants a bit more control of some of the parameters than scipy.fftpack.diff() gives you. For example, the f_lo and f_hi parameters of my intf() function allow you to band-limit the input to exclude very low or very high frequencies which may be noisy. Noisy low frequencies in particular can 'blow-up' during integration and overwhelm the signal. You may also want to use a window at the start and end of the time series to stop spectral leakage.
I have calculated bI2 and also a result, bI3, integrated once with intf() using the following code (I assumed an average sampling rate for simplicity):
import intf
from scipy import integrate
data = np.load(path)
y = data['arr_1']
t = data['arr_0']
bI2= sp.integrate.cumtrapz(y-y.mean(),x=t)
bI3 = intf.intf(y-y.mean(), fs=500458, f_lo=1, winlen=1e-2, times=1)
I plotted bI2 and bI3:
The two time series are of the same order of magnitude, and broadly the same shape, notwithstanding the piecewise linear trend apparent in bI2. I know this doesn't explain what's going on in the scipy function, but at least this shows it's not a problem with the frequency domain method.
The code for intf is pasted in full below.
def intf(a, fs, f_lo=0.0, f_hi=1.0e12, times=1, winlen=1, unwin=False):
"""
Numerically integrate a time series in the frequency domain.
This function integrates a time series in the frequency domain using
'Omega Arithmetic', over a defined frequency band.
Parameters
----------
a : array_like
Input time series.
fs : int
Sampling rate (Hz) of the input time series.
f_lo : float, optional
Lower frequency bound over which integration takes place.
Defaults to 0 Hz.
f_hi : float, optional
Upper frequency bound over which integration takes place.
Defaults to the Nyquist frequency ( = fs / 2).
times : int, optional
Number of times to integrate input time series a. Can be either
0, 1 or 2. If 0 is used, function effectively applies a 'brick wall'
frequency domain filter to a.
Defaults to 1.
winlen : int, optional
Number of seconds at the beginning and end of a file to apply half a
Hanning window to. Limited to half the record length.
Defaults to 1 second.
unwin : Boolean, optional
Whether or not to remove the window applied to the input time series
from the output time series.
Returns
-------
out : complex ndarray
The zero-, single- or double-integrated acceleration time series.
Versions
----------
1.1 First development version.
Uses rfft to avoid complex return values.
Checks for even length time series; if not, end-pad with single zero.
1.2 Zero-means time series to avoid spurious errors when applying Hanning
window.
"""
a = a - a.mean() # Convert time series to zero-mean
if np.mod(a.size,2) != 0: # Check for even length time series
odd = True
a = np.append(a, 0) # If not, append zero to array
else:
odd = False
f_hi = min(fs/2, f_hi) # Upper frequency limited to Nyquist
winlen = min(a.size/2, winlen) # Limit window to half record length
ni = a.size # No. of points in data (int)
nf = float(ni) # No. of points in data (float)
fs = float(fs) # Sampling rate (Hz)
df = fs/nf # Frequency increment in FFT
stf_i = int(f_lo/df) # Index of lower frequency bound
enf_i = int(f_hi/df) # Index of upper frequency bound
window = np.ones(ni) # Create window function
es = int(winlen*fs) # No. of samples to window from ends
edge_win = np.hanning(es) # Hanning window edge
window[:es/2] = edge_win[:es/2]
window[-es/2:] = edge_win[-es/2:]
a_w = a*window
FFTspec_a = np.fft.rfft(a_w) # Calculate complex FFT of input
FFTfreq = np.fft.fftfreq(ni, d=1/fs)[:ni/2+1]
w = (2*np.pi*FFTfreq) # Omega
iw = (0+1j)*w # i*Omega
mask = np.zeros(ni/2+1) # Half-length mask for +ve freqs
mask[stf_i:enf_i] = 1.0 # Mask = 1 for desired +ve freqs
if times == 2: # Double integration
FFTspec = -FFTspec_a*w / (w+EPS)**3
elif times == 1: # Single integration
FFTspec = FFTspec_a*iw / (iw+EPS)**2
elif times == 0: # No integration
FFTspec = FFTspec_a
else:
print 'Error'
FFTspec *= mask # Select frequencies to use
out_w = np.fft.irfft(FFTspec) # Return to time domain
if unwin == True:
out = out_w*window/(window+EPS)**2 # Remove window from time series
else:
out = out_w
if odd == True: # Check for even length time series
return out[:-1] # If not, remove last entry
else:
return out
Related
I have a time series dataset with sample rate 1/10 and sample size 8390. I just want to apply low and high pass filters for the first 1000 and last 1000 samples so I can compute a fourier transform without low end artefacts affecting the result.
I tried using scipy's butterworth filter function to generate an array of coefficients 'lowpass' then convolving that array with my dataset y values 'yf_ISLL_11_21_irfft'.
from scipy import signal
lowpass = scipy.signal.butter(2, 0.1, btype='low', analog=False, output='ba', fs=0.1)
yf_ISLL_11_21_irfft = np.convolve(lowpass, yf_ISLL_11_21_irfft)
plt.plot(time_data_ISLL_11_21, yf_ISLL_11_21_irfft)
But the error message: 'Digital filter critical frequencies must be 0 < Wn < 1' is returned, despite my Wn == 0.1.
I get a slightly different error message when I run your first two lines:
ValueError: Digital filter critical frequencies must be 0 < Wn < fs/2 (fs=0.1 -> fs/2=0.05)
But I think the underlying reason is the same. When you provide a value for fs, the Wn parameters needs to be between 0 and the Nyquist frequency. It sounds like (and maybe I am misinterpreting), you want to use a value of 0.1*fs as your Wn value.
I have a time serie that represents the X and the Z coordinate in a virtual environment.
X = np.array(df["X"])
Z = np.array(df["Z"])
Both the X and the Z coordinates contain noise from a different source. To filter out the noise I'd like to use a Fourier Transform.
After some research I used the code from https://medium.com/swlh/5-tips-for-working-with-time-series-in-python-d889109e676d to denoise my data.
def fft_denoiser(x, n_components, to_real=True):
n = len(x)
# compute the fft
fft = np.fft.fft(x, n)
# compute power spectrum density
# squared magnitud of each fft coefficient
PSD = fft * np.conj(fft) / n
# keep high frequencies
_mask = PSD > n_components
fft = _mask * fft
# inverse fourier transform
clean_data = np.fft.ifft(fft)
if to_real:
clean_data = clean_data.real
return clean_data
After setting the n_components I like to use the cleaned data. Which goes pretty well, as I plotted for the X-coordinate:
Only at the beginning and the end, the cleaned data suddenly move towards each others values... Can somebody help or expain to me what causes this, and how I can overcome this?
The reason you're having this issue is because the FFT implicitly assumes that the provided input signal is periodic. If you repeat your raw data, you see that at each period there is a large discontinuity (as the signal goes from ~20 back down to ~5). Once some higher frequency components are removed you see the slightly less sharp discontinuity at the edges (a few samples at the start and a few samples at the end).
To avoid this situation, you can do the filtering in the time-domain using a linear FIR filter, which can process the data sequence without the periodicity assumption.
For the purpose of this answer I constructed a synthetic test signal (which you can use to recreate the same conditions), but you can obviously use your own data instead:
# Generate synthetic signal for testing purposes
fs = 1 # Hz
f0 = 0.002/fs
f1 = 0.01/fs
dt = 1/fs
t = np.arange(200, 901)*dt
m = (25-5)/(t[-1]-t[0])
phi = 4.2
x = 5 + m*(t-t[0]) + 2*np.sin(2*np.pi*f0*t) + 1*np.sin(2*np.pi*f1*t+phi) + 0.2*np.random.randn(len(t))
Now to design the filter we can take the inverse transform of the _mask (instead of applying the mask):
import numpy as np
# Design denoising filter
def freq_sampling_filter(x, threshold):
n = len(x)
# compute the fft
fft = np.fft.fft(x, n)
# compute power spectrum density
# squared magnitud of each fft coefficient
PSD = fft * np.conj(fft) / n
# keep frequencies with large contributions
_mask = PSD > threshold
_coff = np.fft.fftshift(np.real(np.fft.ifft(_mask)))
return _coff
coff = freq_sampling_filter(x, threshold)
The threshold is a tunable parameter which would be chosen to keep enough of the frequency components you'd like to keep and get rid of the unwanted frequency components. That is of course highly subjective.
Then we can simply apply the filter with scipy.signal.filtfilt:
from scipy.signal import filtfilt
# apply the denoising filter
cleaned = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
For the purpose of illustration, using a threshold of 10 with the above generated synthetic signal yields the following raw data (variable x) and cleaned data (variable cleaned):
The choice of padtype to 'constant' ensures that the filtered values start and end at the start and end values of the unfiltered data.
Alternative
As was posted in comments, filtfilt may be expensive for longer data sets.
As an alternative, the filtering can be performed using FFT-based convolution by using scipy.fftconvolve. Note that in this case, there is no equivalent of the padtype argument of filtfilt, so we need to manually pad the signal to avoid edge effects at the start and end.
n = len(x)
# Manually pad signal to avoid edge effects
x_padded = np.concatenate((x[0]*np.ones(n-1), x, x[-1]*np.ones((n-1)//2)))
# Filter using FFT-based convolution
cleaned = fftconvolve(x_padded, coff, mode='same')
# Extract result (remove data from padding)
cleaned = cleaned[2*(n-1)//2:-n//2+1]
For reference, here are some benchmark comparisons (timings in seconds, so smaller is better) for the above signal of length 700:
filtfilt : 0.3831593
fftconvolve : 0.00028040000000029153
Note that relative performance would vary, but FFT-based convolution is expected to perform comparatively better as the signal gets longer.
I would like to apply fft to my time series data to extract the lowest 5 dominant frequency components for predicting the y value (bacteria count) at the end of each time series. My code is as below:
df = pd.read_csv('/content/drive/My Drive/df.csv', sep=',')
X = df.iloc[0:2,0:10000]
dft_X = np.fft.fft(X) # What should I fill in for argument n?
print(dft_X)
print(len(dft_X))
plt.plot(dft_X)
plt.grid(True)
plt.show()
for i in dft_X:
m = i[np.argpartition(i,5)[:5]]
n = i[np.argpartition(i,range(5))[:5]]
print(m,'\n',n)
In the scipy doc on numpy.fft.fft, it states that
numpy.fft.fft(a, n=None, axis=-1, norm=None)
...
n : int, optional
Length of the transformed axis of the output. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used.
But I am still not clear about the effect of argument n value on the output and how to decide what value to use.
I notice that when n = 10, the output is as follows:
# n= 10
# [-1.5 -1.11022302e-16j -0.46352549-1.42658477e+00j
# -1.21352549-8.81677878e-01j -1.21352549+8.81677878e-01j
# -0.46352549+1.42658477e+00j]
# [-1.5 -1.11022302e-16j -1.21352549-8.81677878e-01j
# -1.21352549+8.81677878e-01j -0.46352549-1.42658477e+00j
# -0.46352549+1.42658477e+00j]
and when n = 10000, the output is as follows:
# n= 10000
# [-4752.15448944 +4113.44846878j -5199.36419709 -1826.78753048j
# -4659.45705354-13014.97971229j -4752.15448944 -4113.44846878j
# -5199.36419709 +1826.78753048j]
# [-5199.36419709 -1826.78753048j -5199.36419709 +1826.78753048j
# -4752.15448944 -4113.44846878j -4752.15448944 +4113.44846878j
# -4659.45705354-13014.97971229j]
What determines the right n value to use? Besides, why are output values complex numbers? Any help is appreciated.
Here is the time series plot for reference:
But I am still not clear about the effect of argument n value on the output and how to decide what value to use.
For "n = size of input", the result is the plain discrete fourier transform: it represents the signal of duration (T = n dt) exactly in frequency space. The lowest frequency component is a sine/cosine of wave period 2T.
For "n > size of input", you perform the transform of a signal that is the original one with zeros appended. The lowest frequency that can be represented corresponds thus to a longer wave period 2T. The signal is thus cut abruptly to zero. Depending on the input signal this may introduce unwanted higher frequency components.
For "n < size of input", you truncate the signal. If you have a "stationary signal", it could make sense to analyze shorter samples (possibly with windowing).
What determines the right n value to use?
It depends on the application and on the sampling (very long stationary series, short measurement, ...). Unless you have some reason to use the option, you can omit n.
Besides, why are output values complex numbers?
A Fourier transform of a real signal is complex in general. It is real only for even signals.
You can play with FFTs with simple signals such as a pure sine or cosine to make sense of this.
I'm trying to calculate the total harmonic distortion values of ac voltage supplied. I am sampling voltage data using Arduino at over 8 KHz rate and storing those data into a text file. Then I'm trying to calculate thd using the following code snippet written in python:
import numpy as np
import scipy.fftpack
from scipy.fftpack import fft
from numpy import genfromtxt
sampled_data = genfromtxt('/../file.txt',delimiter=',')
abs_yf=np.abs(fft(sampled_data))
#As far as I know, THD=sqrt(sum of square magnitude of
#harmonics+noise)/Fundamental value (Is it correct?)So I'm
#just summing up square of all frequency data obtained from FFT,
#sqrt() them and dividing them with fundamental frequecy value.
def thd(abs_data):
sq_sum=0.0
for r in range(len(abs_data)):
sq_sum=sq_sum+(abs_data[r])**2
sq_harmonics=sq_sum-(max(abs_data))**2.0
thd=100*sq_harmonics**0.5/max(abs_data)
return thd
print "Total Harmonic Distortion(in percent):"
print thd(abs_yf)
Problem is, The obtained Thd values vary within 5% to 25% in my case. (In reality it's not more than 5% actually). What am I doing wrong? Is there any other way to find out thd?
Though this is long quiet, for anyone encountering this post like me: There are a couple of problems with the OP method.
1) The magnitudes returned by FFT include a magnitude of the 0 frequency bin, so the assumption that max(abs_data) is the magnitude corresponding to the fundamental frequency is not correct if there is any DC bias in the signal. This is a problem in the line
thd = 100*sq_harmonics**0.5 / max(abs_data)
The amplitude associated with the 0 frequency can just be ignored as a quick solution.
2) The second half of the abs_data should be thrown out, it is a "mirrored" reflection of the first. This is due to the nature of the Fourier transform.
Both these issues can be addressed by changing the input to the function, i.e by replacing
print thd(abs_yf)
with
print( thd(abs_yf[1:int(len(abs_yf)/2) ]) )
where we have changed the input to not include the first or the last N/2 elements.
The result is still not ideal because the window needs to be exactly an integer number of cycles as the previous answers noted above. Testing with a pure sine with offset and adjusting the window demonstrates that the method works fairly well but fails terribly if significant window errors.
t0=0
tf = 0.02 # integer number of cycles
dt = 1e-4
offset = 0.5
N = int((tf-t0)/dt)
time = np.linspace(0.0,tf,N ) #;
commandSigFreq = 100
Amplitude = 2
waveOfSin = Amplitude*np.sin(2.0*pi*commandSigFreq*time) + offset
abs_yf = np.abs(fft(waveOfSin))
#print("freq is" + str(scipy.fftpack.fftfreq(sampled_data, dt ) ))
#As far as I know, THD=sqrt(sum of square magnitude of
#harmonics+noise)/Fundamental value (Is it correct?)So I'm
#just summing up square of all frequency data obtained from FFT,
#sqrt() them and dividing them with fundamental frequency value.
def thd(abs_data):
sq_sum=0.0
for r in range( len(abs_data)):
sq_sum = sq_sum + (abs_data[r])**2
sq_harmonics = sq_sum -(max(abs_data))**2.0
thd = 100*sq_harmonics**0.5 / max(abs_data)
return thd
print("Total Harmonic Distortion(in percent):")
print(thd(abs_yf[1:int(len(abs_yf)/2) ]))
It is quite likely that you add additional distortion by the measurement process itself.
If you compare an Arduino ADC with a high class measurement device, the values of the Arduino will very likely much worse. At least you need a very stable and jitter-free clock.
Furthermore, the output of the data (I guess via UART) might interfere with the timing of the ADC measurement.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
I am new in python as well as in signal processing. I am trying to calculate mean value among some frequency range of a signal.
What I am trying to do is as follows:
import numpy as np
data = <my 1d signal>
lF = <lower frequency>
uF = <upper frequency>
ps = np.abs(np.fft.fft(data)) ** 2 #array of power spectrum
time_step = 1.0 / 2000.0
freqs = np.fft.fftfreq(data.size, time_step) # array of frequencies
idx = np.argsort(freqs) # sorting frequencies
sum = 0
c =0
for i in idx:
if (freqs[i] >= lF) and (freqs[i] <= uF) :
sum += ps[i]
c +=1
avgValue = sum/c
print 'mean value is=',avgValue
I think calculation is fine, but it takes a lot of time like for data of more than 15GB and processing time grows exponentially. Is there any fastest way available such that I would be able to get mean value of power spectrum within some frequency range in fastest manner. Thanks in advance.
EDIT 1
I followed this code for calculation of power spectrum.
EDIT 2
This doesn't answer to my question as it calculates mean over the whole array/list but I want mean over part of the array.
EDIT 3
Solution by jez of using mask reduces time. Actually I have more than 10 channels of 1D signal and I want to treat them in a same manner i.e. average frequencies in a range of each channel separately. I think python loops are slow. Is there any alternate for that?
Like this:
for i in xrange(0,15):
data = signals[:, i]
ps = np.abs(np.fft.fft(data)) ** 2
freqs = np.fft.fftfreq(data.size, time_step)
mask = np.logical_and(freqs >= lF, freqs <= uF )
avgValue = ps[mask].mean()
print 'mean value is=',avgValue
The following performs a mean over a selected region:
mask = numpy.logical_and( freqs >= lF, freqs <= uF )
avgValue = ps[ mask ].mean()
For proper scaling of power values that have been computed as abs(fft coefficients)**2, you will need to multiply by (2.0 / len(data))**2 (Parseval's theorem)
Note that it gets slightly fiddly if your frequency range includes the Nyquist frequency—for precise results, handling of that single frequency component would then need to depend on whether data.size is even or odd). So for simplicity, ensure that uF is strictly less than max(freqs). [For similar reasons you should ensure lF > 0.]
The reasons for this are tedious to explain and even more tedious to correct for, but basically: the DC component is represented once in the DFT, whereas most other frequency components are represented twice (positive frequency and negative frequency) at half-amplitude each time. The even-more-annoying exception is the Nyquist frequency which is represented once at full amplitude if the signal length is even, but twice at half amplitude if the signal length is odd. All of this would not affect you if you were averaging amplitude: in a linear system, being represented twice compensates for being at half amplitude. But you're averaging power, i.e. squaring the values before averaging, so this compensation doesn't work out.
I've pasted my code for grokking all of this. This code also shows how you can work with multiple signals stacked in one numpy array, which addresses your follow-up question about avoiding loops in the multi-channel case. Remember to supply the correct axis argument both to numpy.fft.fft() and to my fft2ap().
If you really have a signal of 15 GB size, you'll not be able to calculate the FFT in an acceptable time. You can avoid using the FFT, if it is acceptable for you to approximate your frequency range by a band pass filter. The justification is the Poisson summation formula, which states that sum of squares is not changed by a FFT (or: the power is preserved). Staying in the time domain will let the processing time rise proportionally to the signal length.
The following code designs a Butterworth band path filter, plots the filter response and filters a sample signal:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
dd = np.random.randn(10**4) # generate sample data
T = 1./2e3 # sampling interval
n, f_s = len(dd), 1./T # number of points and sampling frequency
# design band path filter:
f_l, f_u = 50, 500 # Band from 50 Hz to 500 Hz
wp = np.array([f_l, f_u])*2/f_s # normalized pass band frequnecies
ws = np.array([0.8*f_l, 1.2*f_u])*2/f_s # normalized stop band frequencies
b, a = signal.iirdesign(wp, ws, gpass=60, gstop=80, ftype="butter",
analog=False)
# plot filter response:
w, h = signal.freqz(b, a, whole=False)
ff_w = w*f_s/(2*np.pi)
fg, ax = plt.subplots()
ax.set_title('Butterworth filter amplitude response')
ax.plot(ff_w, np.abs(h))
ax.set_ylabel('relative Amplitude')
ax.grid(True)
ax.set_xlabel('Frequency in Hertz')
fg.canvas.draw()
# do the filtering:
zi = signal.lfilter_zi(b, a)*dd[0]
dd1, _ = signal.lfilter(b, a, dd, zi=zi)
# calculate the avarage:
avg = np.mean(dd1**2)
print("RMS values is %g" % avg)
plt.show()
Read the documentation to Scipy's Filter design to learn how to modify the parameters of the filter.
If you want to stay with the FFT, read the docs on signal.welch and plt.psd. The Welch algorithm is a method to efficiently calculate the power spectral density of a signal (with some trade-offs).
It is much easier to work with FFT if your arrays are power of 2. When you do fft the frequencies ranges from -pi/timestep to pi/timestep (assuming that frequency is defined as w = 2*pi/t, change the values accordingly if you use f =1/t representation). Your spectrum is arranged as 0 to minfreqq--maxfreq to zero. you can now use fftshift function to swap the frequencies and your spectrum looks like minfreq -- DC -- maxfreq. now you can easily determine your desired frequency range because it is already sorted.
The frequency step dw=2*pi/(time span) or max-frequency/(N/2) where N is array size.
N/2 point is DC or 0 frequency. Nth position is max frequency now you can easily determine your range
Lower_freq_indx=N/2+N/2*Lower_freq/max_freq
Higher_freq_index=N/2+N/2*Higher_freq/Max_freq
avg=sum(ps[lower_freq_indx:Higher_freq_index]/(Higher_freq_index-Lower_freq_index)
I hope this will help
regards