Both Librosa and SciPy provide an FFT function, but they give me different spectrograms for the same input signal.
Scipy
I am trying to get the spectrogram with the following code
import numpy as np # fast vectors and matrices
import matplotlib.pyplot as plt # plotting
from scipy.fft import fft # FFT function (scipy.fft is a module in current SciPy)
X = np.sin(np.linspace(0,1e10,5*44100))
fs = 44100 # assumed sample frequency in Hz
window_size = 2048 # 2048-sample fourier windows
stride = 512 # 512 samples between windows
wps = fs/float(512) # ~86 windows/second
Xs = np.empty([int(2*wps),2048])
for i in range(Xs.shape[0]):
    Xs[i] = np.abs(fft(X[i*stride:i*stride+window_size]))
fig = plt.figure(figsize=(20,7))
plt.imshow(Xs.T[0:150],aspect='auto')
plt.gca().invert_yaxis()
fig.axes[0].set_xlabel('windows (~86Hz)')
fig.axes[0].set_ylabel('frequency')
plt.show()
Then I get the following spectrogram
Librosa
Now I try to get the same spectrogram with Librosa
from librosa import stft
X_libs = stft(X, n_fft=window_size, hop_length=stride)
X_libs = np.abs(X_libs)[:,:int(2*wps)]
fig = plt.figure(figsize=(20,7))
plt.imshow(X_libs[0:150],aspect='auto')
plt.gca().invert_yaxis()
fig.axes[0].set_xlabel('windows (~86Hz)')
fig.axes[0].set_ylabel('frequency')
plt.show()
Question
The two spectrograms are obviously different; specifically, the Librosa version has an attack at the very beginning.
What causes the difference? I don't see many parameters that I can tune in the documentation for Scipy and Librosa.
The reason for this is the argument center for librosa's stft. By default it's True (along with pad_mode = 'reflect').
From the docs:
librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.complex64'>, pad_mode='reflect')
center:boolean
If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
If False, then D[:, t] begins at y[t * hop_length]
pad_mode:string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
Calling the STFT like this
X_libs = stft(X, n_fft=window_size, hop_length=stride,
center=False)
does lead to a straight line:
Note that librosa's stft also uses the Hann window function by default. If you want to avoid this and make it more like your Scipy stft implementation, call the stft with a window consisting only of ones:
X_libs = stft(X, n_fft=window_size, hop_length=stride,
window=np.ones(window_size),
center=False)
You'll notice that the line is thinner.
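To see what center and the window argument change, here is a minimal NumPy-only sketch of the framing step. The function simple_stft is a hypothetical stand-in written for this illustration, not librosa's implementation; it only assumes the behaviour quoted from the docs (reflect padding of n_fft // 2 samples when center=True).

```python
import numpy as np

def simple_stft(y, n_fft, hop, center=True, window=None):
    """Toy STFT for illustration only; not librosa's implementation."""
    if window is None:
        window = np.ones(n_fft)  # rectangular window, like a bare FFT loop
    if center:
        # Reflect-pad n_fft // 2 samples on both sides so that
        # frame t is centred on y[t * hop] instead of starting there.
        y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] for t in range(n_frames)])
    # Result has shape (1 + n_fft // 2, n_frames), like librosa's layout
    return np.fft.rfft(frames * window, axis=1).T

y = np.sin(np.linspace(0, 100, 8192))
S_left = simple_stft(y, 2048, 512, center=False)     # frames start at t * hop
S_centered = simple_stft(y, 2048, 512, center=True)  # frames centred at t * hop
print(S_left.shape, S_centered.shape)
```

With center=False and a window of ones, each column is exactly the FFT of the slice y[t*hop : t*hop+n_fft], which is what the manual SciPy loop computes. With center=True the first frames contain mirrored samples, which is where the extra energy at the start of the plot comes from.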
Related
Data clip I'm using
I'm trying to bandpass the attached EEG signal, then apply a Hilbert transform and take the absolute value of the result to get the instantaneous power (e.g., here). The bandpassed signal looks fine (first plot), and the Hilbert transform of the raw signal looks fine (second plot), but the Hilbert transform of the bandpassed signal does not show up (last plot). The resulting array is: [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj].
Reproducible error with:
import numpy as np
from neurodsp.filt import filter_signal
from scipy import signal
import matplotlib.pyplot as plt
Fs = 1024
LBP, HBP = 1, 100
Chan1 = np.loadtxt('SampleData')
Chan1_BP = filter_signal(Chan1, Fs, 'bandpass', (LBP,HBP))
analytical_signal = signal.hilbert(Chan1)
amplitude_envelope = np.abs(analytical_signal)
#Show bandpassed signal works:
fig0 = plt.figure(figsize=(10, 8))
plt.plot(Chan1)
plt.plot(Chan1_BP)
fig1 = plt.figure(figsize=(10, 8))
plt.plot(Chan1)
plt.plot(amplitude_envelope)
# Now with bandpassed signal
analytical_signal = signal.hilbert(Chan1_BP)
amplitude_envelope = np.abs(analytical_signal)
fig2 = plt.figure(figsize=(10, 8))
plt.plot(Chan1_BP)
plt.plot(amplitude_envelope)
Take a closer look at the values in Chan1_BP. You'll see that the values at the beginning and end of the array are nan. The nans were generated by neurodsp.filt.filter_signal. The default filter used by filter_signal is a FIR filter, and the default behavior is to pad the output with nans for values that cannot be computed with the full length of the FIR filter.
You can change that behavior by passing remove_edges=False, e.g.
Chan1_BP = filter_signal(Chan1, Fs, 'bandpass', (LBP,HBP), remove_edges=False)
With that change, the plots should look like you expected.
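The reason a few nans at the edges wipe out the whole result is that scipy.signal.hilbert works through an FFT, and every FFT bin depends on every input sample. Below is a small NumPy-only sketch of that effect. analytic_signal is a hand-written textbook construction used here as a stand-in for signal.hilbert, and strip_nan_edges is a hypothetical helper, shown as an alternative if you prefer to keep remove_edges at its default.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (textbook construction, for illustration)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1             # keep Nyquist
        gain[1 : n // 2] = 2         # double positive frequencies
    else:
        gain[1 : (n + 1) // 2] = 2
    return np.fft.ifft(spectrum * gain)  # negative frequencies are zeroed

def strip_nan_edges(x):
    """Drop leading and trailing nans (hypothetical helper)."""
    valid = np.flatnonzero(~np.isnan(x))
    return x[valid[0] : valid[-1] + 1]

t = np.arange(1024) / 1024
filtered = np.sin(2 * np.pi * 10 * t)
filtered[:50] = np.nan               # mimic nan-padded filter edges
filtered[-50:] = np.nan

envelope_bad = np.abs(analytic_signal(filtered))                  # all nan
envelope_ok = np.abs(analytic_signal(strip_nan_edges(filtered)))  # finite
```

Note that stripping the edges shortens the array, so the time axis has to be shifted accordingly when plotting against the original signal.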
TensorFlow does not reconstruct the original signal when I apply the STFT followed by the inverse STFT. The problems arise when the frames of the STFT overlap: it seems that every frame contributes with a weight of 1, regardless of the number of overlapping frames N = frame_length / frame_step. As a result, the central part of the signal is N times larger than the original. Here is minimal code to reproduce the error:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
size = 2048
frame_length = 512
frame_step = 128
waveform = np.sin(np.arange(size) * 1 / 100)
stft = tf.signal.stft(waveform, frame_length, frame_step, window_fn=None)
inverse_stft = tf.signal.inverse_stft(stft, frame_length, frame_step, window_fn=None)
plt.plot(waveform)
plt.plot(inverse_stft)
plt.show()
plt.clf()
Notice that I'm using no window. If I put the Hann window, the central part works well but the borders are smoothly going to zero, a related but surprisingly different error. The implementation of scipy works well under all circumstances.
Am I missing something?
It seems that TensorFlow's implementation does not handle the boundaries of the signal/STFT well, and it does not apply the appropriate scaling when frame_step != frame_length.
This could be posted as an issue on their GitHub.
For longer signals (plus taking into account scaling) it looks "good enough":
size = 20480
frame_length = 512
frame_step = 256
waveform = np.sin(np.arange(size) * 1 / 100)
stft = tf.signal.stft(waveform, frame_length, frame_step, window_fn=None, pad_end=True)
inverse_stft = tf.signal.inverse_stft(stft, frame_length, frame_step, window_fn=None) / (frame_length / frame_step)
plt.plot(waveform)
plt.plot(inverse_stft)
plt.show()
plt.clf()
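Where the factor frame_length / frame_step comes from can be shown without TensorFlow. The sketch below is a plain NumPy overlap-add with a rectangular window. It is only a model of the summation step, under the assumption that the inverse STFT adds overlapping frames with weight 1 as described in the question; overlap_add is a helper written for this illustration.

```python
import numpy as np

def overlap_add(frames, step):
    """Sum frames into one signal, each frame shifted by `step` samples."""
    n_frames, length = frames.shape
    out = np.zeros((n_frames - 1) * step + length)
    for t in range(n_frames):
        out[t * step : t * step + length] += frames[t]
    return out

size, frame_length, frame_step = 2048, 512, 128
waveform = np.sin(np.arange(size) / 100)

# Cut the signal into overlapping frames (what STFT + inverse FFT gives back
# per frame when no window is applied).
n_frames = 1 + (size - frame_length) // frame_step
frames = np.stack([waveform[t * frame_step : t * frame_step + frame_length]
                   for t in range(n_frames)])

summed = overlap_add(frames, frame_step)                   # too large in the middle
coverage = overlap_add(np.ones_like(frames), frame_step)   # frames covering each sample
reconstructed = summed / coverage                          # per-sample normalisation
```

In the interior, coverage equals frame_length / frame_step (4 here), which is the constant used in the division above; near the borders fewer frames overlap, so a single constant cannot be right there, and dividing by the actual coverage handles both cases.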
Using PyWavelets and Matplotlib's specgram on a signal gives more detailed plots with pywt.dwt than with pywt.cwt. How can I get a pywt.cwt spectrogram in a similar way?
With dwt:
import numpy as np
import pywt
import pywt.data
import matplotlib.pyplot as plot
from scipy import signal
from scipy.io import wavfile
bA, bD = pywt.dwt(datamean, 'db2')
powerSpectrum, freqenciesFound, time, imageAxis = plot.specgram(bA, NFFT = 387, Fs=100)
plot.xlabel('Time')
plot.ylabel('Frequency')
plot.show()
with this spectrogram plot:
https://imgur.com/a/bYb8bBS
With cwt:
widths = np.arange(1,5)
coef, freqs = pywt.cwt(datamean, widths,'morl')
powerSpectrum, freqenciesFound, time, imageAxis = plot.specgram(coef, NFFT = 129, Fs=100)
plot.xlabel('Time')
plot.ylabel('Frequency')
plot.show()
with this spectrogram plot:
https://imgur.com/a/GIINzJp
and for better results:
sig = datamean
widths = np.arange(1, 31)
cwtmatr = signal.cwt(sig, signal.ricker, widths)
# pyplot was imported as "plot" above, so use that name here
plot.imshow(cwtmatr, extent=[-1, 1, 1, 5], cmap='PRGn', aspect='auto',
            vmax=abs(cwtmatr).max(), vmin=-abs(cwtmatr).max())
plot.show()
with this spectrogram plot:
https://imgur.com/a/TnXqgGR
How can I get a spectrogram plot for cwt (plots 2 and 3) with a similar look and style to the first one?
The first spectrogram plot seems to show much more detail than the third.
This would be better as a comment, but since I lack the Karma to do that:
You don't want to make a spectrogram with wavelets, but a scalogram instead. What it looks like you're doing above is projecting your data in a scale subspace (that correlates to frequency), then taking those scales and finding the frequency content of them which is not what you probably want.
The detail and approximation coefficients are what you would want to use directly. Unfortunately, PyWavelets doesn't have a simple plotting function to do this for you, AFAIK. Matlab does, and their help page may be illuminating if I fail.
import numpy as np
import pywt
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

def scalogram(data):
    wave = 'db4'
    # coeff[0] is the approximation, coeff[1:] are the details (coarse to fine)
    coeff = pywt.wavedec(data, wave)
    levels = len(coeff)
    lengths = [len(co) for co in coeff]
    col = int(np.max(lengths))
    im = np.ones([levels, col])
    for level in range(levels):
        y = coeff[level]
        if lengths[level] < col:
            # Stretch shorter rows to the full width (nearest neighbour)
            x = col / (lengths[level] + 1) * np.arange(1, len(y) + 1)
            xi = np.linspace(0, col, col)
            yi = griddata(points=x, values=y, xi=xi, method='nearest')
        else:
            yi = y
        im[level, :] = yi
    im[im == 0] = np.nan  # avoid log10(0) when plotting
    tiles = sum(lengths) - lengths[0]
    return im, tiles

Wxx, tiles = scalogram(data)
IM = plt.imshow(np.log10(abs(Wxx)), aspect='auto')
plt.show()
There are better ways of doing this, but it works. It produces a matrix similar to a spectrogram in Wxx, and tiles is simply a count of the time-frequency tiles, for comparison with the number used in an STFT.
I've attached a picture of what these tilings look like
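To make the tiling idea concrete without depending on PyWavelets, here is a small sketch that uses a hand-rolled Haar decomposition. haar_wavedec and dyadic_scalogram are hypothetical helpers written for this illustration (they are not pywt functions, and Haar is used instead of 'db4' because it keeps the lengths exactly dyadic). Each level has half as many coefficients as the next finer one, so each coefficient is drawn as a tile twice as wide.

```python
import numpy as np

def haar_wavedec(x):
    """Full Haar decomposition of a signal whose length is a power of two.
    Returns [approximation, coarsest detail, ..., finest detail]."""
    approx = np.asarray(x, dtype=float)
    details = []
    while len(approx) > 1:
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    return [approx] + details[::-1]

def dyadic_scalogram(coeffs):
    """One image row per detail level; coefficients repeated to full width."""
    width = 2 * len(coeffs[-1])  # original signal length
    rows = [np.repeat(c, width // len(c)) for c in coeffs[1:]]
    return np.abs(np.vstack(rows))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
coeffs = haar_wavedec(x)
image = dyadic_scalogram(coeffs)              # shape (levels, len(x))
tiles = sum(len(c) for c in coeffs[1:])       # number of time-frequency tiles
```

The resulting image can be passed to imshow in the same way as Wxx above; coarse levels appear as a few wide tiles and fine levels as many narrow ones.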
I am trying to implement FFT by using the conv1d function provided in Pytorch.
Generating artificial signal
import numpy as np
import torch
from torch.autograd import Variable
from torch.nn.functional import conv1d
from scipy import fftpack
from scipy.fft import fft # scipy.fft is a module in current SciPy
import matplotlib.pyplot as plt
%matplotlib inline
# Creating filters
d = 4096 # size of windows
def create_filters(d):
    x = np.arange(0, d, 1)
    wsin = np.empty((d,1,d), dtype=np.float32)
    wcos = np.empty((d,1,d), dtype=np.float32)
    window_mask = 1.0-1.0*np.cos(x)
    for ind in range(d):
        wsin[ind,0,:] = np.sin(2*np.pi*((ind+1)/d)*x)
        wcos[ind,0,:] = np.cos(2*np.pi*((ind+1)/d)*x)
    return wsin,wcos
wsin, wcos = create_filters(d)
wsin_var = Variable(torch.from_numpy(wsin), requires_grad=False)
wcos_var = Variable(torch.from_numpy(wcos),requires_grad=False)
# Creating signal
t = np.linspace(0,1,4096)
x = np.sin(2*np.pi*100*t)+np.sin(2*np.pi*200*t)+np.random.normal(scale=5,size=(4096))
plt.plot(x)
FFT with Pytorch
signal_input = torch.from_numpy(x.reshape(1,-1),)[:,None,:4096]
signal_input = signal_input.float()
zx = conv1d(signal_input, wsin_var, stride=1).pow(2)+conv1d(signal_input, wcos_var, stride=1).pow(2)
FFT with Scipy
fig = plt.figure(figsize=(20,5))
plt.plot(np.abs(fft(x).reshape(-1))[:500])
My Question
As you can see, the two outputs are quite similar in terms of their peak characteristics. That means my implementation is not totally wrong.
However, there are also some subtleties, such as the scale of the spectrum, and the signal to noise ratio. I am unable to figure out what's missing here to get the exact same result.
You calculated the power rather than the amplitude.
You simply need to add the line zx = zx.pow(0.5) to take the square root to get the amplitude.
As of version 1.8, PyTorch has a native implementation in torch.fft:
torch.fft.fft(x)
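The relationship between the sin/cos filter bank and the FFT magnitude can be checked with a small NumPy sketch (matrix products instead of conv1d, and a smaller d to keep it light; both are choices made for this illustration). It also shows one further detail that appears to follow from the filters in the question: they use (ind+1)/d, so output index ind corresponds to FFT bin ind+1.

```python
import numpy as np

d = 256
n = np.arange(d)
k = n[:, None]                               # one row per frequency bin

wcos = np.cos(2 * np.pi * k * n / d)         # real part of the DFT basis
wsin = np.sin(2 * np.pi * k * n / d)         # imaginary part (up to sign)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, d)
x = np.sin(2 * np.pi * 20 * t) + rng.normal(scale=0.5, size=d)

power = (wcos @ x) ** 2 + (wsin @ x) ** 2    # what the squared conv1d outputs give
amplitude = np.sqrt(power)                   # matches np.abs(np.fft.fft(x))

# Same filters but starting at frequency 1/d, as in the question's (ind+1)/d
wcos_shift = np.cos(2 * np.pi * (k + 1) * n / d)
wsin_shift = np.sin(2 * np.pi * (k + 1) * n / d)
amplitude_shift = np.sqrt((wcos_shift @ x) ** 2 + (wsin_shift @ x) ** 2)
```

Here amplitude agrees with the FFT magnitude bin for bin, while amplitude_shift is the same spectrum moved by one bin, so comparing plots index by index would show peaks one position apart.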
I am trying to convert the STFT of a WAV file into a chromagram.
Here's my code :-
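import numpy as np

def stft(x,fs,framesize,hopsize):
    frame = int(framesize*fs)  # frame length in samples
    hop = int(hopsize*fs)      # hop length in samples
    w = np.hamming(frame)      # scipy.hamming/array/fft are gone from current SciPy
    X = np.array([np.fft.fft(w*x[i:i+frame])
                  for i in range(0,len(x)-frame,hop)])
    return X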
Here's the code for chromagram :-
def chromagram(x,fs,framesize,hopsize):
    X = stft(x,fs,framesize,hopsize)
    chroma = np.fmod(np.round(np.log2(X / 440) * 12), 12)
    return chroma
When I calculate fft I get an array with complex values so I have to cast the result to float before calculating chroma. Am I doing anything wrong here?
Also, how do I plot the result?
I don't think that is the right way to do it. X holds the complex-valued STFT; you can get its magnitude with np.abs(X). Did you want to apply this formula? It converts frequencies to musical notes, but X contains no frequencies, only one value per FFT bin. You can get the corresponding bin frequencies with np.fft.fftfreq(frame, 1.0/fs), where frame is the frame length in samples (int(framesize*fs) in your code).
If you don't want to use the Bregman Audio-Visual Information Toolbox for Chroma Features and want to implement them on your own, you could port the Matlab Chroma Toolbox. I think they use filterbanks instead of the FFT. Further down on this page you will find references where Chroma Features are explained in detail.
Anyway, if you have Chroma Features, you can plot them like any 2-dimensional array with imshow.
from matplotlib import pyplot as plt
import numpy as np
X = np.random.random((30, 30))
plt.imshow(X)
plt.show()
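For completeness, here is a rough sketch of how the note formula can be applied to the bin frequencies rather than to the STFT values. It is a deliberately simple FFT-based approach, not what the Chroma Toolbox does; simple_chromagram and its parameters (frame and hop in samples, f_ref) are names chosen for this illustration. With f_ref = 440, pitch class 0 is A.

```python
import numpy as np

def simple_chromagram(x, fs, frame, hop, f_ref=440.0):
    """Sum STFT magnitudes into 12 pitch classes (naive sketch)."""
    w = np.hamming(frame)
    starts = range(0, len(x) - frame + 1, hop)
    # Magnitude spectrogram, shape (n_frames, frame // 2 + 1)
    mag = np.abs(np.array([np.fft.rfft(w * x[i:i + frame]) for i in starts]))
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    valid = freqs > 0                          # log2 is undefined at DC
    # Semitone distance from f_ref, folded into one octave (0..11)
    pitch_class = np.round(12 * np.log2(freqs[valid] / f_ref)).astype(int) % 12
    chroma = np.zeros((12, mag.shape[0]))
    for pc in range(12):
        chroma[pc] = mag[:, valid][:, pitch_class == pc].sum(axis=1)
    return chroma

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)             # one second of A4
chroma = simple_chromagram(tone, fs, frame=2048, hop=512)
```

The result is a 12-row array with one column per frame, so it can be shown with plt.imshow(chroma, aspect='auto', origin='lower') exactly like the random example above.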