In my algorithm I created a spectrogram and manipulated the data:
import numpy as np
import scipy.io as spio
import scipy.signal as signal

data = spio.loadmat(mat_file, squeeze_me=True)['records'][:, i]
data = data - np.mean(data)
data = data / np.max(np.abs(data))
freq, time, Sxx = signal.spectrogram(data, fs=250000, window=signal.get_window("hamming", 480), nperseg=None, noverlap=360, nfft=480)
# ...
# manipulation on Sxx
# ...
Is there any way to convert freq, time, and Sxx back into a time-domain signal?
No, this is not possible. To calculate a spectrogram, you divide your input time-domain signal into (half-overlapping) chunks of data, each of which is multiplied by an appropriate window function. You then take the FFT of each chunk, which gives you a complex vector indicating the amplitude and phase for every frequency bin. Each column of the spectrogram is finally formed by taking the absolute square of one FFT (and normally you throw away the negative frequencies, since the PSD of a real input signal is symmetric). By taking the absolute square, you lose all phase information, which makes it impossible to accurately reconstruct the original time-domain signal.
Since your ear is largely insensitive to phase (your brain senses something similar to a spectrogram), it might however be possible to reconstruct a signal that sounds approximately the same. This can basically be done by performing all the described steps in reverse, while choosing a random phase for each FFT.
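As a rough sketch of that idea (my own illustration, not the asker's exact pipeline; it uses scipy.signal.stft/istft rather than the spectrogram function, with framing parameters mirroring the question):

```python
import numpy as np
from scipy import signal as sig  # renamed to avoid shadowing by a `signal` variable

rng = np.random.default_rng(0)
fs = 250000
data = rng.standard_normal(fs // 10)  # stand-in for the real recording

# Magnitude-only STFT, same framing as the question's spectrogram call
f, t, Zxx = sig.stft(data, fs=fs, window="hamming", nperseg=480, noverlap=360)
mag = np.abs(Zxx)

# ... manipulation on mag ...

# Attach random phases and invert; the result is real but only approximates
# how the original sounded, since the true phase information is lost.
phases = np.exp(2j * np.pi * rng.random(mag.shape))
_, reconstructed = sig.istft(mag * phases, fs=fs, window="hamming", nperseg=480, noverlap=360)
```

Iterating this (re-estimating phases from repeated STFT/ISTFT passes) is essentially the Griffin-Lim algorithm.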
Note that there is one issue with your code: you create a variable named signal, which 'shadows' the scipy.signal module which you import with the same name.
I need to compute the Fourier transform of a Lorentzian function and plot it on a natural-log scale.
I know the Fourier transform of a Lorentzian is exp(-pi|k|), which seems right.
On a log scale it should therefore be linear, with no oscillation at all.
However, there is oscillation, and I'm totally lost.
Here is my code:
import numpy as np
from scipy import fft
import matplotlib.pyplot as plt

a = 1
N = 500
x = np.linspace(-5, 5, N)
lorentz = (a / np.pi) * (1 / (a**2 + x**2))
fourier = fft.fft(lorentz)

fig, ax1 = plt.subplots(nrows=1, ncols=1)
ax1.loglog(abs(fourier[0:N // 2]))
ax1.set_yscale('log', base=np.e)  # basey=np.e on matplotlib < 3.3
ax1.grid(True)
plt.show()
How could I solve the problem?
Following a comment, I tried
x = np.linspace(-20, 20, N)
It seems to postpone the oscillation, but it is still there.
After adding a Hamming window: the Hamming window also postpones it.
I then tried extending the range to
x = np.linspace(-60, 60, N)
This seems correct (relative to a, the wider range, and the point spacing), but I'm curious about what happened.
Pascal's remark helps to explain. At first glance, I felt the oscillating distortion you show was related to the analysis-window boundaries. When your signal is not zero on either side of the window, the FFT analysis sees a step, which results in "butterfly" artifacts that have nothing to do with your input signal. A Hamming window (a raised cosine) can solve that, but if the window has to do too much smoothing at the edges, you end up analysing a signal you don't have!
Nice to see that the tip to enlarge the analysis window worked in this case: by taking a larger range around zero, you get the expected result for a Lorentzian function. My science expertise is too limited to actually understand why this particular spectrum is the correct result.
Attempt to explain why the 2x Nyquist requirement is relevant: you are using signal-analysis tools in the real domain (not complex input). For FFT analysis of real samples, the analysis window should accommodate two periods of the lowest frequency you are interested in. You are investigating an impulse response, so its "period" shape is the only one you are interested in. By taking a larger interval around 0 into account, you have put your impulse response in the middle, where the Hamming window is close to 1. When your analysis window is wide enough and a Hamming window is applied, the FFT sees proper input (zero on either side!) and yields proper, smooth output, as if you were analysing a very low-frequency periodic signal.
My experience with FFT tools is in the field of speech research. In a speech sample there is a pitch: the lowest frequency, or "f0", of the speaker. Males typically have a pitch around 100 Hz. With a sampling frequency of 20 kHz, a single pitch period requires 200 samples; two pitch periods require 400 samples. I prefer setting the FFT order instead of the analysis-window length. An order-9 FFT is 512 samples in your window and yields 256 frequencies; order 10 yields 512 frequencies and requires 1024 samples, and so on. The Hamming window I use in my spectrum tool is always the full window, never a truncated one.
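To illustrate the point numerically (my own sketch, not the asker's exact grid): put the Lorentzian on a symmetric grid wide enough that the tails are essentially zero, rotate it with ifftshift so the peak sits at sample 0, and the DFT comes out real, positive, and smoothly decaying, matching exp(-a|omega|) once scaled by the sample spacing:

```python
import numpy as np

a = 1.0
N = 512
dx = 0.25
x = (np.arange(N) - N // 2) * dx       # symmetric grid, zero at index N//2
lorentz = (a / np.pi) / (a**2 + x**2)  # tails ~1e-5 at the edges

# Rotate so the peak is at sample 0: the input is then an even sequence,
# and the DFT of an even real sequence is purely real.
F = np.fft.fft(np.fft.ifftshift(lorentz))

# dx * F[m] approximates the continuous transform exp(-a*|omega_m|)
omega = 2 * np.pi * np.fft.fftfreq(N, d=dx)
approx = dx * F.real
exact = np.exp(-a * np.abs(omega))
```

The residual few-percent mismatch comes from the tails truncated at |x| = 64; widening the grid further shrinks it, which is exactly what the asker observed when enlarging the range.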
I'm using the librosa library to get and filter spectrograms from audio data.
I mostly understand the math behind generating a spectrogram:
Get signal
window signal
for each window compute Fourier transform
Create matrix whose columns are the transforms
Plot heat map of this matrix
So that's really easy with librosa:
spec = np.abs(librosa.stft(signal, n_fft=len(window), window=window))
Yay! I've got my matrix of FFTs. Now I see this function librosa.amplitude_to_db and I think this is where my ignorance of signal processing starts to show. Here is a snippet I found on Medium:
spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)
Why does the author use this amplitude_to_db function? Why not just plot the output of the STFT directly?
The range of perceivable sound pressure is very wide, from around 20 μPa (micropascals) to 20 Pa, a ratio of one million. Furthermore, human perception of sound level is not linear, but better approximated by a logarithm.
By converting to decibels (dB) the scale becomes logarithmic. This limits the numerical range, to something like 0-120 dB instead. The intensity of colors when this is plotted corresponds more closely to what we hear than if one used a linear scale.
Note that the reference (0 dB) point in decibels can be chosen freely. The default for librosa.amplitude_to_db is ref=np.max, meaning that the maximum value of the input is mapped to 0 dB; all other values are then negative. The function also applies a threshold on the dynamic range, by default 80 dB, so anything below -80 dB is clipped to -80 dB.
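For intuition, the conversion is essentially a clamped 20*log10 around a reference. A numpy-only sketch of what librosa.amplitude_to_db(spec, ref=np.max) does (simplified; the function name here is my own, and real librosa handles the amplitude floor slightly differently):

```python
import numpy as np

def amplitude_to_db_sketch(spec, top_db=80.0, amin=1e-5):
    """Convert an amplitude spectrogram to dB, with 0 dB at the max value."""
    ref = np.max(spec)
    db = 20.0 * np.log10(np.maximum(spec, amin) / ref)  # max -> 0 dB
    return np.maximum(db, -top_db)                      # clip very quiet bins

mag = np.array([[1.0, 0.1, 0.001],
                [0.5, 1e-9, 0.01]])
db = amplitude_to_db_sketch(mag)  # values in [-80, 0]
```

Plotting db instead of mag spreads the quiet content over visible colors instead of compressing it into near-zero values.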
I am trying to generate synthetic data for a time-domain signal. Let's say my signal is a square wave and I have on top of it some random noise. I'll model the noise as Gaussian. If I generate the data as a vector of length N and then add to it random noise sampled from a normal distribution of mean 0 and width 1, I have a rough simulation of the situation I care about. However, this adds noise with characteristic timescale set by the sampling rate. I do not want this, as in reality the noise has a much longer timescale associated with it. What is an efficient way to generate noise with a specific bandwidth?
I've tried generating the noise at each sampled point and then using the FFT to cut out frequencies above a certain value. However, this severely attenuates the signal.
My idea was basically:
noise = normrnd(0, 1, N, 1);  % was normrnd(0,1), which returns a scalar
f = fft(noise);
f(1000:end) = 0;              % note: this breaks conjugate symmetry,
noise = real(ifft(f));        % so take the real part after the ifft
This kind of works but severely attenuates the signal.
It's pretty common just to generate white noise and filter it. Often an IIR is used since it's cheap and the noise phases are random anyway. It does attenuate the signal, but it costs nothing to amplify it.
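A sketch of the filter approach with scipy (the Butterworth order, cutoff, and sampling rate here are arbitrary assumptions, not tuned values), rescaling afterwards since the filtering removes power:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
N = 100_000
fs = 1000.0      # assumed sampling rate, Hz
cutoff = 50.0    # desired noise bandwidth, Hz

# White Gaussian noise, then a cheap IIR low-pass to band-limit it
white = rng.standard_normal(N)
b, a = signal.butter(4, cutoff, btype="low", fs=fs)
band_noise = signal.lfilter(b, a, white)

# Filtering attenuates the power; it costs nothing to rescale to unit variance
band_noise /= np.std(band_noise)
```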
You can also generate the noise directly with an IFFT. In the example you give, every coefficient in the output of fft(noise) is a Gaussian-distributed random variable, so instead of computing those coefficients with an FFT and zeroing out the ones you don't want, you can just set the ones you do want and IFFT to get the resulting signal. Remember that the coefficients are complex, but the real and imaginary parts are independently Gaussian-distributed.
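A sketch of the direct IFFT route with numpy: fill only the rfft bins below the cutoff with complex Gaussian values (independent real and imaginary parts, as noted) and let irfft enforce the conjugate symmetry of a real signal. The bin count of 1000 mirrors the question's f(1000:end) = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000          # number of time samples
keep = 1000          # number of low-frequency bins to populate

# Complex Gaussian coefficients for the bins we want; the rest stay zero
coeffs = np.zeros(N // 2 + 1, dtype=complex)
coeffs[:keep] = rng.standard_normal(keep) + 1j * rng.standard_normal(keep)
coeffs[0] = coeffs[0].real   # the DC bin of a real signal must be real

# irfft assumes Hermitian symmetry, so the output is real by construction
noise = np.fft.irfft(coeffs, n=N)
noise /= np.std(noise)       # normalize to unit variance
```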
This is a follow-up question to one I asked earlier, based on the chat after the answer given by @hotpaw2. I have a signal which is a single cosine wave with a phase offset. My task is to extract (with very high accuracy) the amplitude and phase of this single frequency component.
On paper, the following relations hold, assuming a properly normalized Fourier transform T:
Unsurprisingly, there is a bit more to the DFT than simply taking the transform and picking off the relevant components. In particular, the discussion suggested that I was not entirely clear on what the phase offset is measured with respect to, and that there are significant edge effects that can destroy the accuracy of the result if the data is not properly windowed.
I have been googling around but most of the discussion is fairly technical and light on example, so I was hoping that someone can shed some light on things. In particular, I came across one example suggesting that instead of doing a simple transform, I should be shifting it first:
import numpy as np
import pylab as pl

f = 30.0
w = 2.0 * np.pi * f
phase = np.pi / 2
num_t = int(10 * f)
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
signal = np.cos(w * t + phase)  # + np.random.normal(0, 0.25, len(t))

amp = np.fft.fftshift(np.fft.rfft(np.fft.ifftshift(signal)))
freqs = np.fft.fftshift(np.fft.rfftfreq(t.shape[-1], dt))
index = np.where(freqs == 30)

print(index[0][0])
print(np.angle(amp)[index[0][0]])
print(np.abs(amp)[index[0][0]] * (2.0 / len(t)))

pl.subplot(211)
pl.semilogy(freqs, np.abs(amp))
pl.subplot(212)
pl.plot(freqs, np.angle(amp))
pl.show()
So, a first set of questions: can someone explain the point of using fftshift, and what exactly it does to the data? Why does using an inverse shift, transforming, and then shifting require a frequency axis that is shifted only once, with no inverse operation? Is this approach the correct one (ignoring for the moment the issue of a window)?
A second set of questions: if I window my data, presumably this will affect the amplitude, and likely the phase(?), of the result. Is there an analytical way to correct for the amplitude change for a given window shape? I can find a few tables listing correction factors, but I haven't seen a good explanation yet.
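On the amplitude question: for a tone centered on a bin, a window scales the measured bin magnitude by its "coherent gain", sum(w)/N, so dividing by sum(w) instead of N/2 undoes it exactly. A sketch of that correction (my own illustration, with coherent sampling so the tone sits exactly on a bin):

```python
import numpy as np
from scipy.signal import get_window

N = 300
f = 30                      # integer number of cycles in the record
phase = np.pi / 2
t = np.arange(N) / N        # one-second record
x = np.cos(2 * np.pi * f * t + phase)

w = get_window("hann", N)   # periodic Hann (fftbins=True by default)
X = np.fft.rfft(x * w)

# Correct the amplitude by the window's coherent gain sum(w)/N:
# amp = 2*|X[f]| / sum(w)   instead of   2*|X[f]| / N
amp = 2 * np.abs(X[f]) / np.sum(w)
ph = np.angle(X[f])
```

For a tone that falls between bins the single-factor correction is no longer exact (scalloping loss), which is where the published correction-factor tables come in.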
In the linked question, it was stated that phase should be measured near the center of the hump of the window function. But because the window function is a time-domain function and I want the phase for a specific frequency, I don't quite understand what that means.
Any light that could be shed on the matter (perhaps in the form of references, since clearly I need to do some more reading) would be most appreciated.
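To illustrate the phase-reference point with a toy example of my own (coherent sampling, so no window is needed): ifftshift before the FFT rotates the record so its center becomes sample 0, which moves the phase reference from the first sample to the middle of the window. An odd cycle count makes the difference visible:

```python
import numpy as np

N = 300
f = 31                      # odd number of cycles, so the two references differ
phase = np.pi / 4
t = np.arange(N) / N
x = np.cos(2 * np.pi * f * t + phase)

# Phase referenced to the first sample (t = 0)
ph_start = np.angle(np.fft.rfft(x)[f])

# ifftshift rotates the record so its center becomes sample 0,
# so the phase is now referenced to the middle of the window
ph_center = np.angle(np.fft.rfft(np.fft.ifftshift(x))[f])
```

By the DFT shift theorem the rotation multiplies bin k by exp(i*pi*k), so for odd k the measured phase moves by pi; for even k the two references coincide.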
I used the Short Time Fourier Transform on a random wav file and performed some changes on the magnitude spectrum. To "hear" what I did, I'm trying to reverse the process by using the inverse STFT.
In Python the spectrum looks like this.
Magnitude Spectrum in Python
However, if I'm trying to convert the whole thing back into a wav and look at its spectrum it looks like this:
Magnitude Spectrum after ISTFT
I'm just using the magnitude for the Inverse FFT. What am I doing wrong? Do I need the phase signal as well? And can I use the same phase signal even if I manipulated the magnitude?
# ....read wav-file, perform STFT on it and manipulate the magnitude spectrum
# Then (as follows) I'm trying to write it back
from scipy.io.wavfile import read, write

fs = 41000  # sampling rate in Hz (scipy's write expects an int)
filteredwrite = istft(magnitude)
write('../data/mxx.wav', fs, filteredwrite.astype(x.dtype))
The Fourier functions work properly if I'm doing ifft(fft(F)) - nothing wrong here.
Yes, the phase is required to reconstruct the original signal, because the relative phases of the component frequencies determine how those waves superpose. Keeping the original phase while manipulating only the magnitude is a common and reasonable approximation.
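A sketch with scipy.signal.stft/istft (the question's istft helper isn't shown, so scipy's is substituted here, and the sine is a stand-in for the wav data): keep the phase alongside the magnitude and recombine before inverting. With an unmodified magnitude the round trip is near-exact:

```python
import numpy as np
from scipy import signal

fs = 8000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t)          # stand-in for the wav data

f, tt, Zxx = signal.stft(x, fs=fs, nperseg=256)
magnitude = np.abs(Zxx)
phase = np.angle(Zxx)

# ... manipulate `magnitude` here ...

# Recombine the magnitude with the (original) phase before inverting
_, x_rec = signal.istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=256)
```

Passing the bare magnitude to istft (implicitly zero phase) is what destroys the spectrum in the question; reusing the original phase after moderate magnitude edits usually sounds acceptable.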