How to detect time-periods present in a signal using FFT? - python

I'm trying to determine the periodicities present in a given waveform.
This is my signal, which is a sinusoidal waveform:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*np.sin(2*np.pi*t_week/T)+10
x_weekend = 2*np.sin(2*np.pi*t_weekend/T)+10
x_daily_weekly_sinu = np.concatenate((x_weekday, x_weekend))
#Creating the Signal
x_daily_weekly_long_sinu = np.concatenate((x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu,x_daily_weekly_sinu))
#Visualization
plt.plot(x_daily_weekly_long_sinu)
plt.show()
In order to determine the two periods present, which are 96 & 672, I'm creating the FFT of the waveform as follows:
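# 'hann' is the current SciPy name for this window; the old 'hanning' alias is not accepted by recent versions
f, Pxx = signal.periodogram(x_daily_weekly_long_sinu, fs = 96, window='hann', scaling='spectrum')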
#Visualization
plt.figure(figsize = (10, 8))
plt.plot(f, Pxx)
plt.xlim(0, 10)
plt.yscale('log')
plt.xlabel('Frequency (cycles/day)')
plt.ylabel('Spectrum Amplitude')
The following is the plot of frequencies that I get.
Can anyone tell me why it shows so many frequencies instead of just two distinct peaks for the periods 96 & 672?
I then try to extract the top frequencies from the FFT:
for amp_arg in np.argsort(np.abs(Pxx))[::-1][1:6]:
    day = 1 / f[amp_arg]
    print(day)
But my output gives the following values as the top frequencies instead of 96 & 672:
1.0144927536231885
0.9859154929577465
1.1666666666666667
0.875
1.4
Why is this happening? Can anyone please help me to determine the correct periods?
It would be great if I just get a final list of values representing the exact set of periods only.
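A note on reading the numbers above, with a minimal sketch. Because fs = 96, the frequency axis is in cycles per day, so 1 / f is a period in days, and fs / f converts it to samples. The slice [1:6] also skips the strongest bin. The sketch below rebuilds the same signal and uses a rectangular window, which is an assumption made here because the record holds a whole number of weeks. It reads the daily period from the strongest bin and the weekly period from the spacing between that bin and its nearest strong sideband. The names one_week, carrier and sideband_spacing are illustrative only.

```python
import numpy as np
from scipy import signal

fs = 96  # samples per day, as in the question

# Rebuild the question's signal: 5 weekdays + 2 weekend days, repeated 10 times
t_week = np.linspace(1, 480, 480)
t_weekend = np.linspace(1, 192, 192)
one_week = np.concatenate((
    10 * np.sin(2 * np.pi * t_week / fs) + 10,
    2 * np.sin(2 * np.pi * t_weekend / fs) + 10,
))
x = np.tile(one_week, 10)

# Rectangular window (an assumption): the record spans exactly 10 weeks,
# so every spectral line falls exactly on an FFT bin
f, pxx = signal.periodogram(x, fs=fs, window="boxcar", scaling="spectrum")

# Three strongest bins, strongest first.
# The default detrend removes the mean, so the DC bin is close to zero.
top = np.argsort(pxx)[::-1][:3]
carrier = top[0]

daily_period = fs / f[carrier]                   # period in samples
sideband_spacing = abs(f[top[1]] - f[carrier])   # in cycles/day
weekly_period = fs / sideband_spacing            # period in samples

print(daily_period, weekly_period)
```

In this signal the weekly pattern is an amplitude change of the daily sinusoid, so it shows up mainly as sidebands around 1 cycle/day rather than as one large peak at 1/7 cycle/day. That is also why a spectrum of this waveform contains more than two lines.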

Related

Time series dBFS plot output modification - current output plot not as expected (matplotlib)

I'm trying to plot the Amplitude (dBFS) vs. Time (s) plot of an audio (.wav) file using matplotlib. I managed to do that with the following code:
def convert_to_decibel(sample):
    ref = 32768  # Using a signed 16-bit PCM format wav file, so the maximum magnitude is 2^15 = 32768.
    if sample != 0:
        return 20 * np.log10(abs(sample) / ref)
    else:
        return 20 * np.log10(0.000001)
from scipy.io.wavfile import read as readWav
from scipy.fftpack import fft
import matplotlib.pyplot as gplot1
import matplotlib.pyplot as gplot2
import numpy as np
import struct
import gc
wavfile1 = '/home/user01/audio/speech.wav'
wavsamplerate1, wavdata1 = readWav(wavfile1)
wavdlen1 = wavdata1.size
wavdtype1 = wavdata1.dtype
gplot1.rcParams['figure.figsize'] = [15, 5]
pltaxis1 = gplot1.gca()
gplot1.axhline(y=0, c="black")
gplot1.xticks(np.arange(0, 10, 0.5))
gplot1.yticks(np.arange(-200, 200, 5))
gplot1.grid(linestyle = '--')
wavdata3 = np.array([convert_to_decibel(i) for i in wavdata1], dtype=np.int16)
yvals3 = wavdata3
t3 = wavdata3.size / wavsamplerate1
xvals3 = np.linspace(0, t3, wavdata3.size)
pltaxis1.set_xlim([0, t3 + 2])
pltaxis1.set_title('Amplitude (dBFS) vs Time(s)')
pltaxis1.plot(xvals3, yvals3, '-')
which gives the following output:
I had also plotted the Power Spectral Density (PSD, in dBm) using the code below:
from scipy.signal import welch as psd # Computes PSD using Welch's method.
fpsd, wPSD = psd(wavdata1, wavsamplerate1, nperseg=1024)
gplot2.rcParams['figure.figsize'] = [15, 5]
pltpsdm = gplot2.gca()
gplot2.axhline(y=0, c="black")
pltpsdm.plot(fpsd, 20*np.log10(wPSD))
gplot2.xticks(np.arange(0, 4000, 400))
gplot2.yticks(np.arange(-150, 160, 10))
pltpsdm.set_xlim([0, 4000])
pltpsdm.set_ylim([-150, 150])
gplot2.grid(linestyle = '--')
which gives the output as:
The second output above, produced with Welch's method, is more presentable. The dBFS plot, though informative, is not very presentable IMO. Is this because of:
the difference in the domains (time in case of 1st output vs frequency in the 2nd output)?
the way plot function is implemented in pyplot?
Also, is there a way I can plot my dBFS output as a peak-to-peak style of plot just like in my PSD (dBm) plot rather than a dense stem plot?
Would be much helpful and would appreciate any pointers, answers or suggestions from experts here as I'm just a beginner with matplotlib and plots in python in general.
TL;DR
This has nothing to do with pyplot.
The frequency domain is different from the time domain, but that's not why you didn't get what you want.
The calculation of dBFS in your code is wrong.
You should frame your data, calculate the RMS or the peak in every frame, and then convert that value to dBFS instead of applying this transformation to every sample point.
When we talk about the amplitude, we are talking about a periodic signal. And when we read in a series of data from a sound file, we read in a series of sample points of a signal (which may or may not be periodic). The value of every sample point represents, say, a voltage or sound pressure value sampled at a specific time.
We assume that, within a very short time interval, maybe 10 ms for example, the signal is stationary. Every such interval is called a frame.
Usually a specific function is applied to each frame to reduce the sudden change at the edges of the frame; these functions are called window functions. If you do nothing to a frame, you have effectively applied a rectangular window to it.
An example: when the sampling frequency of your sound is 44100 Hz, a 10 ms frame contains 44100*0.01=441 sample points. That's what the nperseg argument means in your psd function, but it has nothing to do with dBFS.
Given the knowledge above, now we can talk about the amplitude.
There are two methods to get the value of the amplitude in every frame:
The most straightforward one is to get the maximum (peak) value in every frame.
Another one is to calculate the RMS (Root Mean Square) of every frame.
After that, the peak values or RMS values can be converted to dBFS values.
Let's start coding:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
# Determine full scale (maximum possible amplitude) from the bit depth
bit_depth = 16
full_scale = 2 ** (bit_depth - 1)  # signed 16-bit PCM samples span -32768..32767
# dBFS function
to_dbFS = lambda x: 20 * np.log10(x / full_scale)
# Read in the wave file
fname = "01.wav"
fs,data = wavfile.read(fname)
# Determine frame length(number of sample points in a frame) and total frame numbers by window length(how long is a frame in seconds)
window_length = 0.01
signal_length = data.shape[0]
frame_length = int(window_length * fs)
nframes = signal_length // frame_length
# Get frames by broadcast. No overlaps are used.
idx = frame_length * np.arange(nframes)[:,None] + np.arange(frame_length)
frames = data[idx].astype("int64") # Convert to int64 to avoid integer overflow
# Get RMS and peaks
rms = ((frames**2).sum(axis=1)/frame_length)**.5
peaks = np.abs(frames).max(axis=1)
# Convert them to dbfs
dbfs_rms = to_dbFS(rms)
dbfs_peak = to_dbFS(peaks)
# Let's start to plot
# Get time arrays of every sample point and every frame
frame_time = np.arange(nframes) * window_length
data_time = np.linspace(0,signal_length/fs,signal_length)
# Plot
f,ax = plt.subplots()
ax.plot(data_time,data,color="k",alpha=.3)
# Plot the dbfs values on a twin x Axes since the y limits are not comparable between data values and dbfs
tax = ax.twinx()
tax.plot(frame_time,dbfs_rms,label="RMS")
tax.plot(frame_time,dbfs_peak,label="Peak")
tax.legend()
f.tight_layout()
# Save several detail views
f.savefig("whole.png",dpi=300)
ax.set_xlim(1,2)
f.savefig("1-2sec.png",dpi=300)
ax.set_xlim(1.295,1.325)
f.savefig("1.2-1.3sec.png",dpi=300)
The whole time span looks like this (the unit of the right axis is dBFS):
And the voiced part looks like:
You can see that the dBFS values become greater as the amplitudes become greater at the vowel start point:

Python: Spectrum's BURG Algorithm and Plotting

I am trying to visualize a frequency spectrum using the Burg algorithm. The data that I am trying to visualize is the time between heartbeats in milliseconds (e.g. [700, 650, 689, ..., 702]). The time is measured from one R peak to the R peak of the next heartbeat.
Now I would like to visualize the frequency band with Python's spectrum library (I'm a total noob). The minimum frequency that I am trying to display is 0.0033 Hz, so the intervals in my dataset add up to 5 minutes.
My approach was to first take the reciprocal of each value, then multiply by 1000, and then multiply by 60. This should give me the BPM for each heartbeat.
This is what it looks like: [67.11409396 64.72491909 ... 64.58557589]
Afterwards I use spectrum's burg algorithm to create the PSD. The "data" list contains my BpM for each heartbeat.
import pylab
from pylab import plot, linspace, log10, pi
from spectrum import arburg, arma2psd
AR, rho, ref = arburg(data.tolist(), 7)
PSD = arma2psd(AR, rho=rho, NFFT=1024)
PSD = PSD[len(PSD):len(PSD)//2:-1]
plot(linspace(0, 0.5, len(PSD)), 10*log10(abs(PSD)*2./(1.*pi)))
pylab.legend(['PSD estimate of x using Burg AR(7)'])
The graph that I get looks like this:
5 Minutes Spectrogram
This specific data already exists as a 3D-Spectrogram (Graph above is the equivalent to the last 5 Minutes of 3D-Spectrogram):
Long Time 3D-Spectrogram
My graph does not seem to match the 3D-Spectrogram. My frequencies are way off. What causes this and how can I fix it?
Also, I would like the y-axis in my graph to be in absolute values rather than in [dB]. I tried with:
plot(linspace(0, 0.5, len(PSD)), abs(PSD))
but that did not really seem to work. It just drew a hyperbola.
Thank you for your help!
The spectrum package comes with a pburg class that can generate a frequencies array, as shown below. If you want a direct comparison between a spectrogram and AR PSDs, I would take the time definition used to compute the spectrogram and use it to compute the AR PSD per window as well.
Also, your spectrogram example image looks focused on very low frequencies, so you may want to increase nfft to increase frequency resolution.
import matplotlib.pyplot as plt
from scipy.signal import spectrogram
import numpy as np
from spectrum import pburg
# Parameter settings
n_seconds = 10
fs = 1000 # sampling rate, in hz
freq = 10
nfft = 4096
nperseg = fs
order = 8
# Simulate 10 hz sine wave with white noise
x = np.sin(np.arange(0, n_seconds, 1/fs) * freq * 2 * np.pi)
x += np.random.rand(len(x)) / 10
# Compute spectrogram
freqs, times, powers = spectrogram(x, fs=fs, nfft=nfft)
# Get spectrogram time definition
times = (times * fs).astype(int)
window_times = np.array((times-times[0], times+times[0])).T
# Compute Burg's spectrum per window
powers_burg = np.array([pburg(x[t[0]:t[1]], order=order,
NFFT=nfft, sampling=fs).psd for t in window_times]).T
freqs_burg = np.array(pburg(x, order=order, NFFT=nfft, sampling=fs).frequencies())
# Plot
inds = np.where(freqs < 20)
inds_burg = np.where(freqs_burg < 20)
fig, axes = plt.subplots(ncols=2, figsize=(10, 5))
axes[0].pcolormesh(times/fs, freqs[inds], powers[inds], shading='gouraud')
axes[1].pcolormesh(times/fs, freqs_burg[inds_burg], powers_burg[inds_burg], shading='gouraud')
axes[0].set_title('Spectrogram')
axes[1].set_title('Burg\'s Spectrogram')
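As a rough illustration of the nfft remark above: the spacing between neighbouring frequency points on an nfft-point grid is fs / nfft. The helper grid_spacing is a hypothetical name used only for this sketch, and the larger nfft value is an arbitrary choice for comparison.

```python
import numpy as np

fs = 1000  # sampling rate in Hz, matching the example above

def grid_spacing(fs, nfft):
    """Spacing in Hz between neighbouring points of an nfft-point frequency grid."""
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return freqs[1] - freqs[0]

coarse = grid_spacing(fs, 4096)    # the nfft used in the example above
fine = grid_spacing(fs, 65536)     # a larger, arbitrarily chosen nfft
print(coarse, fine)
```

A larger nfft only evaluates the spectrum on a denser grid. How well two nearby frequencies can actually be separated still depends on the window length and, for Burg's method, on the model order.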

time-series segmentation in python

I am trying to segment time-series data as shown in the figure. I have lots of data from sensors, and any of these series can contain a different number of isolated peak regions. In this figure, there are 3 of them. I would like a function that takes the time series as input and returns the segmented sections, all of equal length.
My initial thought was to use a sliding window that calculates the relative change in amplitude. Since a window containing peaks will have relatively higher changes, I could define a threshold for the relative change that would help me pick out the windows with isolated peaks. However, choosing the threshold is a problem, because the relative change is very sensitive to noise in the data.
Any suggestions?
To do this, you need to separate the signal from the noise.
Take the mean value of your signal and add a multiplier that places borders above and below the noise (the green dashed lines).
Find the values below the bottom noise border -> array with 2 groups of data.
Find the values above the top noise border -> array with 2 groups of data.
Take the min index of the first bottom group and the max index of the first top group to find the first peak range.
Take the min index of the second top group and the max index of the second bottom group to find the second peak range.
Some description is in the code. With this method you can find other peaks.
One thing you need to input by hand is the x value between the peaks, which tells the program where to split the data into parts.
See the graphic for a summary.
import numpy as np
from matplotlib import pyplot as plt
# create noise data
def function(x, noise):
    y = np.sin(7*x+2) + noise
    return y
def function2(x, noise):
    y = np.sin(6*x+2) + noise
    return y
noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
x_line0 = np.linspace(1.95,2.85,100)
y_line0 = function(x_line0, noise)
x_line = np.linspace(0, 1.95, 100)
x_line2 = np.linspace(2.85, 3.95, 100)
x_pik = np.linspace(3.95, 5, 100)
y_pik = function2(x_pik, noise)
x_line3 = np.linspace(5, 6, 100)
# concatenate noise data
x = np.linspace(0, 6, 500)
y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)
# plot data
noise_band = 1.1
top_noise = y.mean()+noise_band*np.amax(noise)
bottom_noise = y.mean()-noise_band*np.amax(noise)
fig, ax = plt.subplots()
ax.axhline(y=y.mean(), color='red', linestyle='--')
ax.axhline(y=top_noise, linestyle='--', color='green')
ax.axhline(y=bottom_noise, linestyle='--', color='green')
ax.plot(x, y)
# split data into 2 signals
def split(arr, cond):
    return [arr[cond], arr[~cond]]
# find bottom noise data indexes
botom_data_indexes = np.argwhere(y < bottom_noise)
# split by visual x value
splitted_bottom_data = split(botom_data_indexes, botom_data_indexes < np.argmax(x > 3))
# find top noise data indexes
top_data_indexes = np.argwhere(y > top_noise)
# split by visual x value
splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))
# get first signal range
first_signal_start = np.amin(splitted_bottom_data[0])
first_signal_end = np.amax(splitted_top_data[0])
# get x index of first signal
x_first_signal = np.take(x, [first_signal_start, first_signal_end])
ax.axvline(x=x_first_signal[0], color='orange')
ax.axvline(x=x_first_signal[1], color='orange')
# get second signal range
second_signal_start = np.amin(splitted_top_data[1])
second_signal_end = np.amax(splitted_bottom_data[1])
# get x index of second signal
x_second_signal = np.take(x, [second_signal_start, second_signal_end])
ax.axvline(x=x_second_signal[0], color='orange')
ax.axvline(x=x_second_signal[1], color='orange')
plt.show()
Output:
red line = mean value of all data
green line - top and bottom noise borders
orange line - selected peak data
1. It depends on how you want to define a "region", but it looks like you only have a feeling for it rather than a strict definition. If you have a very clear definition of the kind of piece you want to cut out, you can try a method such as a "matched filter".
2. You might want to detect the peaks of the absolute magnitude. If that does not work, try the peaks of the absolute magnitude of the first-order difference, or even the second-order difference.
3. It is hard to work on noisy data like this. My suggestion is to filter first, use the filtered signal to locate the sections, and then cut those sections out of the unfiltered data. Filtering gives you smooth peaks, so the position of each peak can be detected from the change of sign of the derivative. For filtering, try a simple low-pass filter first. If that doesn't work, I also suggest the Hilbert–Huang transform.
Note: it looks like you are using MATLAB. The methods mentioned are all available in MATLAB.
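Since the question is about Python, here is a minimal sketch of the filtering suggestion above using SciPy, assuming a synthetic signal with two Gaussian bumps on a noisy baseline. The filter order, cutoff, height threshold, minimum distance and half-width are all illustrative values that would need tuning for real sensor data.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

# Synthetic stand-in data: two isolated bumps plus noise (not the asker's data)
rng = np.random.default_rng(0)
n = 600
x = np.arange(n)
clean = np.exp(-0.5 * ((x - 150) / 10) ** 2) + np.exp(-0.5 * ((x - 420) / 10) ** 2)
y = clean + rng.normal(scale=0.1, size=n)

# Zero-phase low-pass filter; the cutoff is given as a fraction of the Nyquist frequency
b, a = butter(3, 0.05)
smooth = filtfilt(b, a, y)

# Local maxima of the smoothed signal, i.e. where its derivative changes sign
# from positive to negative, restricted to peaks that stand out from the baseline
peaks, _ = find_peaks(smooth, height=0.5, distance=50)

# Cut equal-length sections from the unfiltered data around each detected peak
half_width = 40
segments = [y[max(p - half_width, 0):p + half_width] for p in peaks]
print(peaks, [len(s) for s in segments])
```

Filtering with filtfilt runs the filter forwards and backwards, so the smoothed peaks are not shifted in time relative to the raw data, which matters when the indices are used to cut the original series.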

Frequency Voltage Graph from EEG Data - FFT in Python

I'm slightly unsure how to handle this as it's a topic which is new to me so any guidance with my code would be greatly appreciated. I have a set of eeg recordings (18949 EEG records with a sampling rate of 500Hz, where the records are in nV). I'm trying to create a Frequency against Voltage graph from the data but I'm having no luck so far.
My code is as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
data = data['O1']
Fs = 500.0
Ts = 1.0/Fs
t = np.arange(len(data)) / Fs
n = len(data) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[range(int(n/2))]
Y = np.fft.fft(data)/n
Y = Y[range(int(n/2))]
fig, ax = plt.subplots(2, 1)
ax[0].plot(t,data)
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Voltage')
ax[1].plot(frq,abs(Y),'r')
ax[1].set_xlabel('Freq (Hz)')
plt.draw()
plt.show()
fig.savefig("graph.png")
And my resulting graph looks like:
Could anyone provide some guidance as to where I may be going wrong with this?
Your signal has a fairly large DC offset in the time domain (at least relative to the other signal variations). In the frequency domain this is plotted as a strong line at 0 Hz (which is hidden by the plot's axis), so the amplitudes of the other frequency components are, relatively speaking, close to 0.
For better visualization you should plot the frequency spectrum in Decibels (dB) using the formula 20*log10(abs(Y)), so you could actually see those other frequency components.
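A minimal sketch of both points, assuming a synthetic signal (a large constant offset plus a 10 Hz sine) in place of the EEG recording. The offset, amplitude and record length are made-up values. Subtracting the mean before the FFT removes the 0 Hz line, and the small constant added inside the logarithm only guards against log10(0).

```python
import numpy as np

fs = 500.0                     # sampling rate in Hz, as in the question
n = 5000                       # number of samples (illustrative)
t = np.arange(n) / fs
data = 4000.0 + 5.0 * np.sin(2 * np.pi * 10.0 * t)   # large DC offset + 10 Hz component

freqs = np.fft.rfftfreq(n, d=1 / fs)

# Without removing the mean, the 0 Hz bin dominates the spectrum
raw_peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(data)))]

# Remove the DC offset, then express the magnitude in dB
centered = data - data.mean()
spectrum = np.fft.rfft(centered) / n
magnitude_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
peak_hz = freqs[np.argmax(magnitude_db)]

print(raw_peak_hz, peak_hz)
```

Plotting freqs against magnitude_db instead of abs(Y) should make the smaller components visible on the same axes.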

Unequal width binned histogram in python

I have an array with probability values stored in it. Some values are 0. I need to plot a histogram such that there is an equal number of elements in each bin. I tried using matplotlib's hist function, but that only lets me decide the number of bins. How do I go about plotting this? (Normal plot and hist work, but they are not what is needed.)
I have 10000 entries. Only 200 have values greater than 0, and they lie between 0.0005 and 0.2. The distribution isn't even: only one element has the value 0.2, whereas approximately 2000 have the value 0.0005. So plotting it was an issue, as the bins had to be of unequal width with an equal number of elements.
The task does not make much sense to me, but the following code does what I understood the task to be.
I also think the last lines of the code are what you really wanted to do: using different bin widths to improve the visualization (but not targeting an equal number of samples in each bin). I used astroML's hist with bins='blocks' (astropy supports this too).
Code
# Python 3 -> beware the // operator!
import numpy as np
import matplotlib.pyplot as plt
from astroML import plotting as amlp
N_VALUES = 1000
N_BINS = 100
# Create fake data
prob_array = np.random.randn(N_VALUES)
prob_array /= np.max(np.abs(prob_array),axis=0) # scale a bit
# Sort array
prob_array = np.sort(prob_array)
# Calculate bin-borders,
bin_borders = [np.amin(prob_array)] + [prob_array[(N_VALUES // N_BINS) * i] for i in range(1, N_BINS)] + [np.amax(prob_array)]
print('SAMPLES: ', prob_array)
print('BIN-BORDERS: ', bin_borders)
# Plot hist
counts, x, y = plt.hist(prob_array, bins=bin_borders)
plt.xlim(bin_borders[0], bin_borders[-1] + 1e-2)
print('COUNTS: ', counts)
plt.show()
# And this is, what i think, what you really want
fig, (ax1, ax2) = plt.subplots(2)
left_blob = np.random.randn(N_VALUES // 10) + 3  # integer division: randn needs an int size in Python 3
right_blob = np.random.randn(N_VALUES) + 110
both = np.hstack((left_blob, right_blob)) # data is hard to visualize with equal bin-widths
ax1.hist(both)
amlp.hist(both, bins='blocks', ax=ax2)
plt.show()
Output
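As an alternative way to get the equal-count bin borders, here is a minimal sketch that takes them from quantiles with np.quantile, assuming synthetic continuous data in place of the probability array. With many repeated values (for example a large number of zeros) several quantiles coincide and the borders are no longer strictly increasing, so this only gives equal counts when ties are rare.

```python
import numpy as np

# Synthetic stand-in data (not the asker's array): continuous, so ties are unlikely
rng = np.random.default_rng(0)
values = rng.exponential(scale=0.02, size=1000)

n_bins = 10
# Bin borders at evenly spaced quantiles give bins of unequal width but equal count
edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
counts, _ = np.histogram(values, bins=edges)
print(counts)
```

The resulting edges array can be passed to plt.hist(values, bins=edges) in the same way as bin_borders in the code above.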
