power spectral density-scipy.signal - python

While trying to compute the power spectral density with an acquisition rate of 300000 Hz using ... signal.periodogram(x, fs,nfft=4096) , I get a graph up to 150000 Hz and not up to 300000 Hz. Why does it only go up to half the value? What is the meaning of the sampling rate here?
In the example given in the scipy documentation, the sampling rate is 10000 Hz but the plot only goes up to 5000 Hz.
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.periodogram.html

The spectrum of a real-valued signal is always symmetric with respect to the Nyquist frequency (half of the sampling rate). As a result, there is often no need to store or plot the redundant symmetric portion of the spectrum.
If you still want to see the whole spectrum, you can set the return_onesided argument to False as follows:
f, Pxx_den = signal.periodogram(x, fs, return_onesided=False)
The resulting plot of the same example provided in scipy.periodogram documentation would then cover a 10000Hz frequency range as would be expected:

If you check the length of f in the example:
>>> len(f)
50001
Note that this is 50001, not 50000. scipy.signal.periodogram calls scipy.signal.welch with the parameter nperseg=x.shape[-1] by default, which is the correct input for scipy.signal.welch. However, if you dig into the source and look at lines 328-329 (as of now), you'll see why the size of the output is 50001: for an even FFT length, the one-sided spectrum keeps both the DC bin and the Nyquist bin.
if nfft % 2 == 0:  # even
    outshape[-1] = nfft // 2 + 1
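The output lengths discussed above can be checked directly. The following is a minimal sketch, assuming a random placeholder signal with the same sampling rate (10 kHz) and length (100000 samples) as the SciPy documentation example; the variable names are illustrative only.

```python
import numpy as np
from scipy import signal

fs = 10e3                       # sampling rate in Hz (assumed, as in the docs example)
N = 100_000                     # number of samples (assumed)
rng = np.random.default_rng(0)
x = rng.standard_normal(N)      # placeholder signal, not real data

# Default one-sided result: frequencies 0 .. fs/2, N//2 + 1 points
f_one, p_one = signal.periodogram(x, fs)

# Two-sided result: N points, negative frequencies included
f_two, p_two = signal.periodogram(x, fs, return_onesided=False)

# nfft shorter than the signal, as in the question (nfft=4096)
f_nfft, p_nfft = signal.periodogram(x, fs, nfft=4096)

print(len(f_one), f_one[-1])    # 50001 points, last frequency is fs/2
print(len(f_two))               # 100000 points
print(len(f_nfft))              # 4096 // 2 + 1 = 2049 points
```

In every case the one-sided frequency axis stops at fs/2, which is why a 300000 Hz acquisition rate yields a plot up to 150000 Hz.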

Related

Which parameters of welch determine the length of the output? (python)

I am using welch in python.
Which parameter in welch defines the length of the output array?
Based on my trials, the output length is related to nperseg/2, but I cannot understand the reason or the mathematics behind it. I am also not sure about the effect of the other parameters on the output length.
Also, there is not enough explanation in its documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.welch.html)
I will be more than happy if anyone can help me. I could not find any clear info on the web!
Parsing the sentence from the documents
Welch’s method [1] computes an estimate of the power spectral density by dividing the data into overlapping segments, computing a modified periodogram for each segment and averaging the periodograms.
import numpy as np

def periodogram(x, Ts=1.0):
    '''
    This function will compute one score for each period
    in
    '''
    return abs(np.fft.rfft(x))**2
Then they say that the data is divided into overlapping segments and the periodograms are averaged, so it is something like this
def welsh(data, nperseg, noverlap, window):
    # window: taper applied to each segment (omitted in this sketch)
    stride = nperseg - noverlap
    n_segments = (len(data) - nperseg) // stride + 1
    return sum(periodogram(data[i*stride:i*stride+nperseg])
               for i in range(n_segments)) / n_segments
i.e. you take segments of length nperseg to compute the periodograms. Since the periodogram of a real signal only goes up to the Nyquist frequency, you end up with nperseg//2 + 1 values.
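To see which parameters of scipy.signal.welch actually change the output length, one can simply try them. This is a small sketch on a random placeholder signal; the helper n_freqs is hypothetical and exists only for this illustration.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)   # placeholder signal
fs = 1.0

def n_freqs(**kwargs):
    """Return the number of frequency bins welch produces for the given options."""
    f, pxx = signal.welch(x, fs, **kwargs)
    return len(f)

len_even = n_freqs(nperseg=256)                              # 256 // 2 + 1
len_odd = n_freqs(nperseg=255)                               # 255 // 2 + 1
len_overlap = n_freqs(nperseg=256, noverlap=200)             # overlap has no effect
len_padded = n_freqs(nperseg=256, nfft=1024)                 # 1024 // 2 + 1
len_twosided = n_freqs(nperseg=256, return_onesided=False)   # full nfft

print(len_even, len_odd, len_overlap, len_padded, len_twosided)
```

The length is set by the FFT size (nfft, which defaults to nperseg) and by return_onesided; noverlap and the window only change how many segments are averaged.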

Fastest way to get average value of frequencies within range [closed]

Closed 7 years ago.
I am new in python as well as in signal processing. I am trying to calculate mean value among some frequency range of a signal.
What I am trying to do is as follows:
import numpy as np
data = <my 1d signal>
lF = <lower frequency>
uF = <upper frequency>
ps = np.abs(np.fft.fft(data)) ** 2 #array of power spectrum
time_step = 1.0 / 2000.0
freqs = np.fft.fftfreq(data.size, time_step) # array of frequencies
idx = np.argsort(freqs) # sorting frequencies
sum = 0
c =0
for i in idx:
    if (freqs[i] >= lF) and (freqs[i] <= uF) :
        sum += ps[i]
        c +=1
avgValue = sum/c
print('mean value is=', avgValue)
I think the calculation is fine, but it takes a lot of time for data of more than 15 GB, and the processing time grows exponentially. Is there a faster way to get the mean value of the power spectrum within some frequency range? Thanks in advance.
EDIT 1
I followed this code for calculation of power spectrum.
EDIT 2
This doesn't answer my question, as it calculates the mean over the whole array/list, but I want the mean over part of the array.
EDIT 3
The solution by jez of using a mask reduces the time. Actually, I have more than 10 channels of 1D signals and I want to treat them all in the same manner, i.e. average the frequencies in a range for each channel separately. I think Python loops are slow. Is there any alternative?
Like this:
for i in range(0,15):
    data = signals[:, i]
    ps = np.abs(np.fft.fft(data)) ** 2
    freqs = np.fft.fftfreq(data.size, time_step)
    mask = np.logical_and(freqs >= lF, freqs <= uF )
    avgValue = ps[mask].mean()
    print('mean value is=', avgValue)
The following performs a mean over a selected region:
mask = numpy.logical_and( freqs >= lF, freqs <= uF )
avgValue = ps[ mask ].mean()
For proper scaling of power values that have been computed as abs(fft coefficients)**2, you will need to multiply by (2.0 / len(data))**2 (Parseval's theorem)
Note that it gets slightly fiddly if your frequency range includes the Nyquist frequency: for precise results, handling of that single frequency component would then need to depend on whether data.size is even or odd. So for simplicity, ensure that uF is strictly less than max(freqs). [For similar reasons you should ensure lF > 0.]
The reasons for this are tedious to explain and even more tedious to correct for, but basically: the DC component is represented once in the DFT, whereas most other frequency components are represented twice (positive frequency and negative frequency) at half-amplitude each time. The even-more-annoying exception is the Nyquist frequency which is represented once at full amplitude if the signal length is even, but twice at half amplitude if the signal length is odd. All of this would not affect you if you were averaging amplitude: in a linear system, being represented twice compensates for being at half amplitude. But you're averaging power, i.e. squaring the values before averaging, so this compensation doesn't work out.
I've pasted my code for grokking all of this. This code also shows how you can work with multiple signals stacked in one numpy array, which addresses your follow-up question about avoiding loops in the multi-channel case. Remember to supply the correct axis argument both to numpy.fft.fft() and to my fft2ap().
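The multi-channel case can be handled without a Python loop by transforming along one axis and applying the same frequency mask to every channel. This is a minimal sketch, not the answerer's own code: the function name band_mean_power and the test signal are assumptions made for illustration, and it uses rfft, which for 0 < lF < uF selects the same bins as the positive-frequency part of the mask above.

```python
import numpy as np

def band_mean_power(signals, fs, f_lo, f_hi, axis=0):
    """Mean of |FFT|**2 over the bins in [f_lo, f_hi], computed per channel."""
    n = signals.shape[axis]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)              # non-negative frequencies
    ps = np.abs(np.fft.rfft(signals, axis=axis)) ** 2   # unscaled power spectrum
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    # keep only the masked bins along the frequency axis, then average them
    return np.compress(mask, ps, axis=axis).mean(axis=axis)

fs = 2000.0
t = np.arange(2000) / fs
amplitudes = np.array([1.0, 2.0, 3.0])
# three channels stacked as columns: a 100 Hz sine with different amplitudes
signals = np.sin(2 * np.pi * 100.0 * t)[:, None] * amplitudes
avg = band_mean_power(signals, fs, 89.5, 110.5, axis=0)
print(avg)   # one value per channel, proportional to amplitude**2
```

The values are unscaled, so the scaling remarks above still apply if physical units matter.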
If you really have a signal of 15 GB size, you'll not be able to calculate the FFT in an acceptable time. You can avoid using the FFT if it is acceptable for you to approximate your frequency range by a band-pass filter. The justification is Parseval's theorem, which states that the sum of squares is not changed by an FFT, up to a constant factor (or: the power is preserved). Staying in the time domain lets the processing time rise proportionally to the signal length.
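The power-preservation property this argument relies on (Parseval's theorem) is easy to verify numerically. A minimal sketch with a random test signal, using NumPy's unnormalized FFT convention, under which the frequency-domain sum has to be divided by the signal length:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)      # arbitrary test signal

X = np.fft.fft(x)
energy_time = np.sum(x ** 2)                     # sum of squares in the time domain
energy_freq = np.sum(np.abs(X) ** 2) / len(x)    # same quantity from the spectrum

print(energy_time, energy_freq)    # the two values agree up to rounding
```

Because the totals agree, the mean square of a band-pass filtered signal approximates the power contained in the pass band.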
The following code designs a Butterworth band-pass filter, plots the filter response and filters a sample signal:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
dd = np.random.randn(10**4) # generate sample data
T = 1./2e3 # sampling interval
n, f_s = len(dd), 1./T # number of points and sampling frequency
# design band-pass filter:
f_l, f_u = 50, 500 # Band from 50 Hz to 500 Hz
wp = np.array([f_l, f_u])*2/f_s # normalized pass band frequencies
ws = np.array([0.8*f_l, 1.2*f_u])*2/f_s # normalized stop band frequencies
b, a = signal.iirdesign(wp, ws, gpass=60, gstop=80, ftype="butter",
                        analog=False)
# plot filter response:
w, h = signal.freqz(b, a, whole=False)
ff_w = w*f_s/(2*np.pi)
fg, ax = plt.subplots()
ax.set_title('Butterworth filter amplitude response')
ax.plot(ff_w, np.abs(h))
ax.set_ylabel('relative Amplitude')
ax.grid(True)
ax.set_xlabel('Frequency in Hertz')
fg.canvas.draw()
# do the filtering:
zi = signal.lfilter_zi(b, a)*dd[0]
dd1, _ = signal.lfilter(b, a, dd, zi=zi)
# calculate the average power (mean square, not RMS):
avg = np.mean(dd1**2)
print("Mean square value is %g" % avg)
plt.show()
Read the documentation to Scipy's Filter design to learn how to modify the parameters of the filter.
If you want to stay with the FFT, read the docs on signal.welch and plt.psd. The Welch algorithm is a method to efficiently calculate the power spectral density of a signal (with some trade-offs).
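For the FFT route, a band average can be taken directly from a Welch estimate, which only ever transforms short segments. This is a minimal sketch assuming white noise as a stand-in for the real signal, the 2 kHz sampling rate from the question, and an arbitrarily chosen segment length of 1024.

```python
import numpy as np
from scipy import signal

fs = 2000.0                           # sampling rate from the question
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)      # white noise, variance 1 (placeholder data)

# averaged, windowed periodograms of 1024-sample segments
f, pxx = signal.welch(x, fs, nperseg=1024)

mask = (f >= 50.0) & (f <= 500.0)     # band from 50 Hz to 500 Hz
band_mean = pxx[mask].mean()                  # mean PSD in the band, units**2/Hz
band_power = pxx[mask].sum() * (f[1] - f[0])  # PSD integrated over the band

print(band_mean, band_power)
```

For unit-variance white noise the one-sided density is about 2/fs, so band_mean should come out near 0.001 here.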
It is much easier to work with the FFT if your array length is a power of 2. The FFT frequencies range from -pi/timestep to pi/timestep (assuming the frequency is defined as w = 2*pi/t; change the values accordingly if you use the f = 1/t representation). The raw spectrum is ordered from 0 up to the maximum frequency, followed by the negative frequencies from the minimum frequency back up towards zero. You can use the fftshift function to reorder it so that the spectrum runs minfreq -- DC -- maxfreq. Your desired frequency range is then easy to locate because the frequencies are already sorted.
The frequency step is dw = 2*pi/(time span), or max_freq/(N/2), where N is the array size.
After fftshift, index N/2 is DC (zero frequency) and the last index is the maximum frequency, so the range can be found as
lower_freq_index = N//2 + int(N/2 * lower_freq / max_freq)
higher_freq_index = N//2 + int(N/2 * higher_freq / max_freq)
avg = sum(ps[lower_freq_index:higher_freq_index]) / (higher_freq_index - lower_freq_index)
I hope this will help
regards
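The index arithmetic in this answer can also be delegated to NumPy. A minimal sketch, assuming a 100 Hz test sine sampled at 2 kHz: after fftshift the frequency axis is sorted, so np.searchsorted finds the slice boundaries.

```python
import numpy as np

fs = 2000.0
n = 2000
t = np.arange(n) / fs
data = np.sin(2 * np.pi * 100.0 * t)     # placeholder signal

# reorder spectrum and frequency axis so that frequencies ascend
ps = np.fft.fftshift(np.abs(np.fft.fft(data)) ** 2)
freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / fs))

lower_freq, upper_freq = 89.5, 110.5
lo = np.searchsorted(freqs, lower_freq, side="left")    # first bin >= lower_freq
hi = np.searchsorted(freqs, upper_freq, side="right")   # one past last bin <= upper_freq
avg = ps[lo:hi].mean()

print(freqs[n // 2], hi - lo, avg)   # DC sits at index n // 2
```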

Python - FFT leads to wrong physical meanings

I am new to Python.
I intend to do Fourier Transform to an array of discrete points, (time, acceleration), and plot the result out.
I copy and paste the sample FFT code, and modify accordingly.
Please see codes:
import numpy as np
import matplotlib.pyplot as plt
# Load the .txt file in
myData = np.loadtxt('twenty_z_up.txt')
# Extract the time column
time = np.copy(myData[:,0])
# Extract the acceleration column
zAcc = np.copy(myData[:,3])
t = np.arange(10080)
sp = np.fft.fft(zAcc)
freq = np.fft.fftfreq(t.shape[-1])
plt.plot(freq, sp.real)
myData is a rectangular matrix with 10080 rows and 10 columns.
Thus, zAcc is column 3 extracted from the matrix.
In the plot drawn by Spyder, most of the harmonics concentrated around 0.
They are all extremely small.
But my data are actually the accelerations of the phone carried by a walking person (including the gravity). So I expect the most significant harmonic happens around 2Hz.
Why is the graph nonsense?
Thanks in advance!
==============UPDATES: My Graphs======================
The first time domain one:
x-axis is in millisecond.
y-axis is in m/s^2, due to earth gravity, it has a DC offset of ~10.
You do get two spikes at (approximately) 2Hz. Your sampling period is around 2.8 ms (as best as I can infer from your first plot), giving +/-2Hz the normalized frequency of +/-0.056, which is about where your spikes are. fft.fftfreq by default returns the normalized frequency in cycles per sample (it assumes a sample spacing of d=1). You can set the d argument to be the sampling period, and you'll get a vector containing the actual frequency.
Your huge spike in the middle is obviously the DC offset (which you can trivially remove by subtracting the mean).
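Both suggestions (subtract the mean, pass the sampling period as d) can be tried on a synthetic stand-in for the accelerometer data. This is a minimal sketch; the 50 Hz sampling rate, the noise level and the variable names are assumptions, not values taken from the question's file.

```python
import numpy as np

fs = 50.0            # assumed sampling rate in Hz; replace with the real one
n = 10080            # number of samples, as in the question
t = np.arange(n) / fs
rng = np.random.default_rng(0)
# gravity offset (~9.8 m/s^2) plus a 2 Hz walking component plus noise
z_acc = 9.8 + np.sin(2 * np.pi * 2.0 * t) + 0.2 * rng.standard_normal(n)

freqs = np.fft.rfftfreq(n, d=1.0 / fs)           # frequency axis in Hz

raw_peak_bin = np.argmax(np.abs(np.fft.rfft(z_acc)))   # dominated by the DC offset

z_detrended = z_acc - z_acc.mean()               # remove the DC offset
spectrum = np.abs(np.fft.rfft(z_detrended))      # magnitude, not the real part
peak_freq = freqs[np.argmax(spectrum)]

print(raw_peak_bin, peak_freq)
```

Note that the sketch plots nothing and looks at the magnitude of the transform; the real part alone, as plotted in the question, can be small or negative even at a strong peak.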
As others said, we need to see the data; please post it somewhere. Just to check, first fix the timestep size in fftfreq, then plot this synthetic signal, and then plot your signal to see how they compare:
import numpy as np
import matplotlib.pyplot as plt
timestep = 1./50.  # Assume sampling at 50Hz. Change this accordingly.
N = 10080  # the number of samples
t = np.arange(N)*timestep  # needed only to generate xAcc_synthetic
f0 = 2.  # peak at a frequency of 2Hz
# generate synthetic signal at 2Hz and add some noise to it
xAcc_synthetic = np.sin((2*np.pi)*f0*t) + np.random.rand(N)*0.2
sp_synthetic = np.fft.fft(xAcc_synthetic)
freq = np.fft.fftfreq(t.size, d=timestep)
print(np.isclose(max(abs(freq)), (1/timestep)/2.))  # simple check of the highest freq.
plt.plot(freq, abs(sp_synthetic))
plt.xlabel('Hz')
Now, at the x axis equal to 2 you actually have a physical frequency of 2Hz, and you may spot the more pronounced peak you are looking for. Moreover, you may want to have a look also at yAcc and zAcc.

How do I match lomb-scargle and FFT plots of same dataset?

I am doing some work comparing the interpolated FFT of the concentrations of some gases over a period, which is unevenly sampled, with the Lomb-Scargle periodogram of the same data. I am using scipy's fft function to calculate the Fourier transform and then squaring the modulus of this to give what I believe to be the power spectral density, in units of parts per billion (ppb) squared.
I can get the lomb-scargle plot to match almost the exact pattern as the FFT but never the same scale of magnitude, the FFT power spectral density always is higher, even though I thought the lomb-scargle power was power spectral density. Now the lomb code I am using:http://www.astropython.org/snippet/2010/9/Fast-Lomb-Scargle-algorithm, normalises the dataset taking away the average and dividing by 2 times the variance from the data, therefore I have normalised the FFT data in the same manner, but still the magnitudes do not match.
Therefore I did some more research and found that the normalised Lomb-Scargle power could be unitless, and therefore I cannot make the plots match. This leads me to 2 questions:
What units (if any) is the power spectral density of a normalised Lomb-Scargle periodogram in?
How would I proceed to match my fft plot with my lomb-scargle plot, in terms of magnitude and pattern?
Thank you.
The squared modulus of the Fourier transform of a series is defined as the energy spectral density (ESD). You need to divide the ESD by the length of the series to convert to an estimate of power spectral density (PSD).
Units
The units of a PSD are [units]**2/[frequency] where [units] represents the units of your original series.
Normalization
To check for proper normalization, one can numerically integrate the PSD of a white noise (with known variance). If the integrated spectrum equals the variance of the series, the normalization is correct. A factor of 2 (too low) is not incorrect, though, and may indicate the PSD is normalized to be double-sided; in that case, just multiply by 2 and you have a properly normalized, single-sided PSD.
Using numpy, the randn function generates pseudo-random numbers that are Gaussian distributed. For example
10 * np.random.randn(1, 100)
produces a 1-by-100 array with mean=0 and variance=100. If the sampling frequency is, say, 1 Hz, the single-sided PSD will theoretically be flat at 200 units**2/Hz over [0, 0.5] Hz; the integrated spectrum would thus be 200 * 0.5 = 100, equaling the variance of the series.
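The white-noise check described above can be written out in a few lines. This is a minimal sketch using a plain FFT periodogram; it uses a much longer series than the 1-by-100 example (an assumption, chosen so the estimate is less noisy) and removes the sample mean first.

```python
import numpy as np

fs = 1.0                          # sampling frequency in Hz
n = 100_000                       # series length (assumed, even)
rng = np.random.default_rng(0)
x = 10.0 * rng.standard_normal(n) # white noise with variance close to 100
x = x - x.mean()                  # remove the sample mean

X = np.fft.rfft(x)
psd = np.abs(X) ** 2 / (fs * n)   # density on the non-negative frequencies
psd[1:-1] *= 2.0                  # single-sided: double all bins except DC and Nyquist

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
df = freqs[1] - freqs[0]
integrated = psd.sum() * df       # should equal the variance of x
level = psd[1:-1].mean()          # should be close to 2 * variance / fs

print(integrated, x.var(), level)
```

If the doubling step is left out, the integral comes out a factor of 2 too low, which is the double-sided case mentioned above.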
Update
I modified the example included in the python code you linked to demonstrate the normalization for a normally distributed series of length 20, with variance 1, and sampling frequency 10:
import numpy
import lomb
numpy.random.seed(999)
nd = 20
fs = 10
x = numpy.arange(nd)
y = numpy.random.randn(nd)
fx, fy, nout, jmax, prob = lomb.fasper(x, y, 1., fs)
fNy = fx[-1]
fy = fy/fs
Si = numpy.mean(fy)*fNy
print fNy, Si, Si*2
This gives, for me:
5.26315789474 0.482185882163 0.964371764327
which shows you a few things:
The "Nyquist" frequency asked for is actually the sampling frequency.
The result needs to be divided by the sampling frequency.
The output is normalized for a double-sided PSD, so multiplying by 2 makes the integrated spectrum nearly 1.
In the time since this question was asked and answered, the AstroPy project has gained a Lomb-Scargle method, and this question is addressed in the documentation: http://docs.astropy.org/en/stable/stats/lombscargle.html#psd-normalization-unnormalized
In brief, you can compute a Fourier periodogram and compare it to the astropy Lomb-Scargle periodogram as follows
import numpy as np
from astropy.timeseries import LombScargle  # astropy.stats.LombScargle in older releases
def fourier_periodogram(t, y):
    N = len(t)
    frequency = np.fft.fftfreq(N, t[1] - t[0])
    y_fft = np.fft.fft(y)
    positive = (frequency > 0)
    return frequency[positive], (1. / N) * abs(y_fft[positive]) ** 2
t = np.arange(100)
y = np.random.randn(100)
frequency, PSD_fourier = fourier_periodogram(t, y)
PSD_LS = LombScargle(t, y).power(frequency, normalization='psd')
np.allclose(PSD_fourier, PSD_LS)
# True
Since AstroPy is a common tool used in astronomy, I thought this might be more useful than an answer based on the code snippet mentioned above.

Clipping FFT Matrix

Audio processing is pretty new for me. I am currently using Python and NumPy to process wave files. After calculating the FFT matrix I am getting noisy power values for non-existent frequencies. I am interested in visualizing the data, and accuracy is not a high priority. Is there a safe way to calculate a clipping value to remove these values, or should I use all FFT matrices for each sample set to come up with an average number?
regards
Edit:
from numpy import *
import wave
import pymedia.audio.sound as sound
import time, struct
from pylab import ion, plot, draw, show
fp = wave.open("500-200f.wav", "rb")
sample_rate = fp.getframerate()
total_num_samps = fp.getnframes()
fft_length = 2048
num_fft = (total_num_samps // fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
    tempb = fp.readframes(fft_length)
    data = struct.unpack("%dH"%(fft_length), tempb)
    temp[i,:] = array(data, short)
pts = fft_length//2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[0])
show()
Here is the plot in non-logarithmic scale, for synthetic wave file containing 500hz(fading out) + 200hz sine wave created using Goldwave.
Simulated waveforms shouldn't show FFTs like your figure, so something is very wrong, and probably not with the FFT, but with the input waveform. The main problem in your plot is not the ripples, but the harmonics around 1000 Hz, and the subharmonic at 500 Hz. A simulated waveform shouldn't show any of this (for example, see my plot below).
First, you probably want to just try plotting out the raw waveform, and this will likely point to an obvious problem. Also, it seems odd to have a wave unpack to unsigned shorts, i.e. "H", and especially after this to not have a large zero-frequency component.
I was able to get a pretty close duplicate to your FFT by applying clipping to the waveform, as was suggested by both the subharmonic and higher harmonics (and Trevor). You could be introducing clipping either in the simulation or the unpacking. Either way, I bypassed this by creating the waveforms in numpy to start with.
Here's what the proper FFT should look like (i.e. basically perfect, except for the broadening of the peaks due to the windowing)
Here's one from a waveform that's been clipped (and is very similar to your FFT, from the subharmonic to the precise pattern of the three higher harmonics around 1000 Hz)
Here's the code I used to generate these
from numpy import *
from pylab import ion, plot, draw, show, xlabel, ylabel, figure
sample_rate = 20000.
times = arange(0, 10., 1./sample_rate)
wfm0 = sin(2*pi*200.*times)
wfm1 = sin(2*pi*500.*times) *(10.-times)/10.
wfm = wfm0+wfm1
# int test
#wfm *= 2**8
#wfm = wfm.astype(int16)
#wfm = wfm.astype(float)
# abs test
#wfm = abs(wfm)
# clip test
#wfm = clip(wfm, -1.2, 1.2)
fft_length = 5*2048
total_num_samps = len(times)
num_fft = (total_num_samps // fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
    temp[i,:] = wfm[i*fft_length:(i+1)*fft_length]
pts = fft_length//2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[2], linewidth=3)
xlabel("freq (Hz)")
ylabel('abs(FFT)')
show()
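The effect of clipping can also be quantified without a plot. This is a minimal sketch, assuming two constant-amplitude tones (the fade-out of the 500 Hz tone is left out for simplicity) and one second of data, so that each tone falls exactly on one FFT bin; off_tone_fraction is a hypothetical helper.

```python
import numpy as np

sample_rate = 20000.0
n = 20000                                   # one second, so bins are 1 Hz apart
times = np.arange(n) / sample_rate
wfm = np.sin(2 * np.pi * 200.0 * times) + np.sin(2 * np.pi * 500.0 * times)
clipped = np.clip(wfm, -1.2, 1.2)           # same clip level as the "clip test" above

def off_tone_fraction(x, tone_bins=(200, 500)):
    """Fraction of spectral power that is NOT in the two tone bins."""
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()
    tones = power[list(tone_bins)].sum()
    return (total - tones) / total

clean_fraction = off_tone_fraction(wfm)         # essentially zero
clipped_fraction = off_tone_fraction(clipped)   # clearly non-zero: distortion products

print(clean_fraction, clipped_fraction)
```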
Because the data fed to an FFT are windowed and sampled, the result is affected by aliasing and is itself sampled in the frequency domain. Filtering in the time domain corresponds to multiplication in the frequency domain, so you may want to apply a filter simply by multiplying each frequency bin by the value of the filter's frequency response at that frequency, for example by 1 in the passband and by 0 everywhere else. The unexpected values are probably caused by aliasing, where higher frequencies are folded down onto the ones you are seeing. The original signal needs to be band-limited to half your sampling rate or you will get aliasing. Of more concern is aliasing that distorts the band of interest, because within this band you want to know that each component really comes from the expected frequency.
The other thing to keep in mind is that when you grab a piece of data from a wave file, you are mathematically multiplying it by a rectangular window. This causes a sin(x)/x function to be convolved with the frequency response. To minimize this, you can multiply the extracted segment by something like a Hanning window.
It's worth mentioning for a 1D FFT that the first element (index [0]) contains the DC (zero-frequency) term, the elements [1:N/2] contain the positive frequencies and the elements [N/2+1:N-1] contain the negative frequencies. Since you didn't provide a code sample or additional information about the output of your FFT, I can't rule out the possibility that the "noisy power values at non-existent frequencies" are just the negative frequencies of your spectrum.
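The leakage caused by cutting out a segment, and the improvement from a taper, can be compared numerically. A minimal sketch, assuming a tone placed halfway between two FFT bins (the worst case) and measuring the spectrum far from the tone; relative_leakage is a hypothetical helper.

```python
import numpy as np

n = 1024
fs = 1024.0                              # chosen so that bins are 1 Hz apart
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100.5 * t)        # tone halfway between bins 100 and 101

def relative_leakage(sig, far_bin=300):
    """Magnitude at a bin far from the tone, relative to the spectral peak."""
    mag = np.abs(np.fft.rfft(sig))
    return mag[far_bin] / mag.max()

rect = relative_leakage(x)                    # plain segment (rectangular window)
hann = relative_leakage(x * np.hanning(n))    # segment tapered with a Hanning window

print(rect, hann)    # the tapered segment leaks far less
```

The price of the taper is a somewhat wider main peak, which is the broadening mentioned in the other answer.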
EDIT: Here is an example of a radix-2 FFT implemented in pure Python with a simple test routine that finds the FFT of a rectangular pulse, [1.,1.,1.,1.,0.,0.,0.,0.]. You can run the example on codepad and see that the FFT of that sequence is
[0j, Negative frequencies
(1+0.414213562373j), ^
0j, |
(1+2.41421356237j), |
(4+0j), <= DC term
(1-2.41421356237j), |
0j, v
(1-0.414213562373j)] Positive frequencies
Note that the code prints out the Fourier coefficients in order of ascending frequency, i.e. from the highest negative frequency up to DC, and then up to the highest positive frequency.
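The same result can be reproduced with NumPy, which returns the coefficients in the standard order (DC first) and provides fftshift to reorder them by ascending frequency. A short sketch for the rectangular pulse above:

```python
import numpy as np

x = np.array([1., 1., 1., 1., 0., 0., 0., 0.])   # rectangular pulse

X = np.fft.fft(x)                                # standard order: DC term at index 0
X_sorted = np.fft.fftshift(X)                    # ascending frequency: DC in the middle
freq_sorted = np.fft.fftshift(np.fft.fftfreq(len(x)))   # in cycles per sample

for f, c in zip(freq_sorted, X_sorted):
    print(f, np.round(c, 6))
```

For a real input the negative-frequency coefficients are the complex conjugates of the positive ones, so they carry no extra information.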
I don't know enough from your question to actually answer anything specific.
But here are a couple of things to try from my own experience writing FFTs:
Make sure you are following the Nyquist rule
If you are viewing the linear output of the FFT... you will have trouble seeing your own signal and think everything is broken. Make sure you are looking at the dB of your FFT magnitude. (i.e. "plot(20*log10(abs(fft(x))))" )
Create a unit test for your FFT() function by feeding it generated data like a pure tone. Then feed the same generated data to Matlab's FFT(). Do an absolute value diff between the two output data series and make sure the max absolute value difference is something like 10^-6 (i.e. the only difference is caused by small floating point errors)
Make sure you are windowing your data
If all of these checks pass, then your fft is fine. And your input data is probably the issue.
Check the input data to see if there is clipping http://www.users.globalnet.co.uk/~bunce/clip.gif
Time domain clipping shows up as mirror images of the signal in the frequency domain at specific regular intervals with less amplitude.
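The pure-tone check and the dB view from the list above can be combined in a few lines. This is a minimal sketch using NumPy's FFT rather than a hand-written one; the sampling rate, tone frequency and the helper spectrum_db are assumptions for illustration.

```python
import numpy as np

def spectrum_db(x):
    """Magnitude spectrum in dB of a Hanning-windowed real signal."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return 20.0 * np.log10(mag + 1e-12)   # small offset avoids log10(0)

fs = 8000.0           # assumed sampling rate in Hz
n = 4096
tone_hz = 1000.0      # well below the Nyquist frequency fs / 2
t = np.arange(n) / fs

db = spectrum_db(np.sin(2 * np.pi * tone_hz * t))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
peak_hz = freqs[np.argmax(db)]

print(peak_hz)        # should sit at the tone frequency, within one bin
```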
