I'm using the librosa library to get and filter spectrograms from audio data.
I mostly understand the math behind generating a spectrogram:
1. Get the signal
2. Window the signal
3. For each window, compute the Fourier transform
4. Create a matrix whose columns are the transforms
5. Plot a heat map of this matrix
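Those steps can be sketched in plain numpy (the window length, hop size, and test signal below are arbitrary choices for illustration):

```python
import numpy as np

def stft_manual(signal, window, hop):
    """Slide the window along the signal, FFT each frame,
    and stack the transforms as columns of a matrix."""
    n = len(window)
    frames = [signal[i:i + n] * window
              for i in range(0, len(signal) - n + 1, hop)]
    # rfft keeps only the non-negative frequencies of a real signal
    return np.stack([np.fft.rfft(f) for f in frames], axis=1)

# e.g. a 1024-sample signal, 256-sample Hann window, 50% overlap
sig = np.sin(2 * np.pi * 0.05 * np.arange(1024))
spec = stft_manual(sig, np.hanning(256), hop=128)  # shape (129, 7)
```

Plotting np.abs(spec) as a heat map gives the spectrogram.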
So that's really easy with librosa:
spec = np.abs(librosa.stft(signal, n_fft=len(window), window=window))
Yay! I've got my matrix of FFTs. Now I see this function librosa.amplitude_to_db and I think this is where my ignorance of signal processing starts to show. Here is a snippet I found on Medium:
spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)
Why does the author use this amplitude_to_db function? Why not just plot the output of the STFT directly?
The range of perceivable sound pressure is very wide, from around 20 μPa (micropascals) to 20 Pa, a ratio of one million. Furthermore, human perception of sound levels is not linear, but is better approximated by a logarithm.
Converting to decibels (dB) makes the scale logarithmic, which limits the numerical range to something like 0-120 dB. The intensity of colors in the resulting plot then corresponds more closely to what we hear than it would on a linear scale.
Note that the reference (0 dB) point in decibels can be chosen freely. Passing ref=np.max to librosa.amplitude_to_db, as the snippet does, means that the maximum value of the input is mapped to 0 dB, so all other values become negative. The function also applies a threshold on the dynamic range, by default top_db=80, so anything lower than -80 dB is clipped to -80 dB.
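To make this concrete, here is a numpy-only sketch of what amplitude_to_db with ref=np.max does (the amin floor and top_db default below match librosa's documented values; the test signal is made up):

```python
import numpy as np

def amplitude_to_db(S, amin=1e-5, top_db=80.0):
    """Sketch of librosa.amplitude_to_db with ref=np.max:
    convert magnitudes to dB relative to the maximum,
    then clip to a top_db dynamic range."""
    ref = S.max()
    db = 20.0 * np.log10(np.maximum(amin, S) / np.maximum(amin, ref))
    return np.maximum(db, db.max() - top_db)

mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * 5 * np.arange(256) / 256)))
db = amplitude_to_db(mag)
# the loudest bin sits at 0 dB; everything else is negative, clipped at -80 dB
```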
Related
I need to compute the Fourier transform of a Lorentzian function and plot it on an ln scale.
I know the Fourier transform of a Lorentzian is exp(-pi|k|), which seems right.
So on an ln scale the result should be linear, with no oscillation at all.
However, there is oscillation, and I'm completely lost.
Here is my code:
import numpy as np
from scipy import fft
import matplotlib.pyplot as plt

a = 1
N = 500
x = np.linspace(-5, 5, N)
lorentz = (a / np.pi) * (1 / (a**2 + x**2))
fourier = fft.fft(lorentz)
fig, ax1 = plt.subplots(nrows=1, ncols=1)
ax1.loglog(abs(fourier[0:N // 2]), base=np.e)  # positive-frequency half only
ax1.grid(True)
plt.show()
How could I solve the problem?
Following a comment's suggestion, I tried
x = np.linspace(-20, 20, N)
This seems to postpone the oscillation, but it is still there.
After adding a Hamming window:
The Hamming window postpones it as well.
I then tried extending the range to
x = np.linspace(-60, 60, N)
This seems correct (it depends on a, the width of the range, and the point spacing), but I'm curious about what actually happened.
Pascal's remark helps to explain. At first glance, I felt the oscillating distortion you show was related to the analysis window boundaries. When your signal is not zero on either side of the window, the FFT analysis will see a step, which results in "butterfly" artifacts that have nothing to do with your input signal. A Hamming window (a raised cosine) can solve that, but if the window has to do too much smoothing at the edges, you are analysing a signal you don't have!
Nice to see that the tip to enlarge the analysis window worked in this case: by taking a larger range around zero, you get the expected result for a Lorentzian function. My science expertise is too limited to actually understand why this particular spectrum is the correct result.
An attempt to explain why the 2x Nyquist requirement is relevant: you are using signal-analysis tools in the real domain (not complex input). For FFT analysis of real samples, the analysis window should accommodate 2 periods of the lowest frequency you are interested in. You are investigating an impulse response, so its "period" shape will be the only one you are interested in. By taking a larger interval around 0 into account, you have put your impulse response in the middle, where the Hamming window is close to 1. When your analysis window is wide enough and a Hamming window is applied, the FFT sees proper input (zero on either side!) and yields proper, smooth output, as if you were analysing a very low-frequency periodic signal.
My experience with FFT tools is in the field of speech research. A speech sample has a pitch, the lowest frequency or "f0" of the speaker; for male speakers a typical pitch is around 100 Hz. With a sampling frequency of 20 kHz, a single pitch period then requires 200 samples, and two pitch periods require 400 samples. I prefer setting the FFT order instead of the analysis window length: an order-9 FFT is 512 samples in your window and yields 256 frequencies; order 10 yields 512 frequencies and requires 1024 samples, and so on. The Hamming window I use in my spectrum tool is always the full window, never a truncated one.
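Putting that advice into practice on the Lorentzian from the question, a sketch (the ±60 range follows the thread; the larger N and the full-width Hamming window are my choices): with the impulse centered in a wide enough window, the magnitude spectrum decays smoothly, close to the expected exp(-2*pi*a*|f|), with no oscillation.

```python
import numpy as np
from scipy import fft

a = 1.0
N = 4096
L = 60.0                                    # half-width of the x range, as in the thread
x = np.linspace(-L, L, N, endpoint=False)
lorentz = (a / np.pi) / (a**2 + x**2)

# the signal is already ~0 at both edges here, so the Hamming window
# barely distorts it -- exactly the situation described above
spectrum = np.abs(fft.fft(lorentz * np.hamming(N)))[:N // 2]
# on a semilog-y plot the low-frequency part falls off as a straight line,
# i.e. roughly exp(-2*pi*a*f) with f = k / (2*L)
```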
My question is whether I can optimally determine the distance between the source and the center of rotation and the distance between the center of rotation and the detector array for a given image and projection geometry.
By optimal I mean that the number of zero entries of the measurement vector is minimized.
In the following code snippet I used the Astra toolbox with which we simulate our 2D tomography.
from skimage.io import imread
from astra import creators, optomo
import numpy as np
# load some 400x400 pixel image
image = imread("/path/to/image.png", as_gray=True)
# create geometries and projector
# proj_geom = astra_create_proj_geom('fanflat', det_width, det_count, angles, source_origin, origin_det)
proj_geom = creators.create_proj_geom('fanflat', 1.0, 400, np.linspace(0, np.pi, 180), 1500.0, 500.0)
vol_geom = creators.create_vol_geom(400, 400)
proj_id = creators.create_projector('line_fanflat', proj_geom, vol_geom)
# create forward projection; fp is our measurement vector
W = optomo.OpTomo(proj_id)
fp = W * image
In my example, if I use np.linspace(0, np.pi, 180) the number of zero entries of fp is 1108; if I use np.linspace(0, np.pi/180, 180) instead, the number increases to 5133, which makes me believe that the values 1500.0 and 500.0 are not well chosen.
Generally speaking, those numbers are chosen due to experimental constraints rather than algorithmic ones. In many settings these values are fixed, but let's set that aside, since you seem to have the flexibility.
In an experimental scan, what do you want?
If you are looking for high resolution, you want the "magnification" DSD/DSO to be as high as possible, thus putting the detector far away and the object close to the source. This comes with problems, though. A far detector requires longer scanning times for the same SNR (due to scatter and other phenomena that make your X-rays not travel straight). And not only that: the bigger the magnification, the more likely you are to have huge parts of the object completely outside your detector range, as detectors are not that big (in mm).
So the common scanning strategy to set these up is: 1) put the detector as far away as your scanning-time budget allows; 2) put the object as close to the source as you can, while always making sure its entire width fits on the detector.
Often compromises can be made, particularly if you know the smallest feature you want to see (allow 3 or 4 pixels to resolve it properly).
However, algorithmically speaking, it's irrelevant. I can't speak for ASTRA, but likely not even the computational time will be affected, since detector pixels that read zero because they are out of the field of view are simply not computed at all.
Now, for your simple toy example, if you completely ignore all the physics, there is a way:
1.- use simple trigonometry to compute the ratio of distances you need to make sure the entire object falls on the detector.
2.- create a fully white image and change the distances iteratively until the first couple of pixels at the outer edge of the detector become zero.
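For step 1, a hedged sketch of the trigonometry (the function name and parameter bundling are mine, not ASTRA's): the object's circumscribed circle, seen from the source, must subtend no more angle than the detector's half fan.

```python
import numpy as np

def object_fits(source_origin, origin_det, det_width, det_count,
                img_size_px, px_size=1.0):
    """Rough fan-beam check: does the circumscribed circle of the
    image project entirely onto the flat detector? Illustrative only."""
    r = 0.5 * np.hypot(img_size_px, img_size_px) * px_size   # half image diagonal
    half_fan = np.arctan(0.5 * det_width * det_count / (source_origin + origin_det))
    half_obj = np.arcsin(min(r / source_origin, 1.0))
    return half_fan >= half_obj

# with the values from the snippet the 400-unit detector is too small
object_fits(1500.0, 500.0, 1.0, 400, 400)
```

Shrinking source_origin (or enlarging the detector) until this flips from False to True is one way to drive the iterative search in step 2.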
I have signals (of a person climbing stairs) of the following nature. This one is 38K+ samples over a period of 6 minutes of stair ascent. The parts with some low-frequency noise are the times when the person takes a turnabout to get to the next flight of stairs (and hence do not count as stair ascent).
Figure 1
This is why I need to get rid of it for my deep learning model, which only accepts stair-ascent data. Essentially, I only need the high-frequency regions where the person is climbing stairs. I could eliminate it manually, but that would take a lot of time since there are 58 such signals.
My approach was to modulate this signal with a square wave which is 0 in the low-frequency regions and 1 in the high-frequency regions, and then multiply the two signals together. But the problem is how to create such a square wave, one that detects the high- and low-frequency regions on its own.
I tried enveloping the signal (using MATLAB's envelope function in RMS mode) and I got the following result:
Figure 2
As you can see, the RMS envelope follows the signal quite well. But I am stuck on how to create a modulating square wave from it (essentially, what I am asking for is a variable pulse-width modulated waveform).
PS: I have considered using a high-pass filter, but this won't work because there are some low-frequency components within the high-frequency stair-climbing regions which I cannot afford to remove. I have also thought of using some form of rising/falling edge detection (on the RMS envelope) but have found no practical way of implementing it. Please advise.
Thank you for your help in advance,
Shreya
Thanks to David for his thresholding suggestion, which I applied to my dataset with the results below. However, I am again stuck trying to get rid of the redundant peaks between zeros (see image below). What do I do next?
Figure 3
I think I have been able to solve my problem of isolating the "interesting" part of the waveform from the original waveform, using the following procedure (for the reader's future reference):
Apply MATLAB's envelope RMS function to a non-uniform waveform such as Figure 1 to obtain the orange envelope shown in Figure 2. I then filtered this RMS envelope using MATLAB's idfilt function, which got rid of the unwanted spikes (between zeros) occurring between the "interesting" parts of the waveform. Then, using thresholding, I converted this waveform to 1 at the "interesting" parts and 0 at the "uninteresting" parts, giving me a pulse-width modulated square wave that follows ONLY the "interesting" parts of the original waveform in Figure 1. Finally, I multiplied the square wave with the original signal, which filtered out the "uninteresting" parts, as demonstrated in Figure 4.
Figure 4
Thank You all for your help! This thread is now resolved!
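For readers working in Python rather than MATLAB, here is a rough equivalent of that procedure (the window length, threshold, and toy signal are tuning knobs I made up; a moving RMS stands in for MATLAB's envelope RMS, and simple thresholding for the idfilt + threshold stage):

```python
import numpy as np

def rms_envelope(x, win):
    """Moving RMS, similar in spirit to MATLAB's envelope(x, win, 'rms')."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x ** 2, kernel, mode='same'))

def gate_interesting(x, win, threshold):
    """Envelope -> threshold -> 0/1 square wave -> multiply with the signal."""
    gate = (rms_envelope(x, win) > threshold).astype(float)
    return gate * x, gate

# toy signal: 10 Hz bursts (the "climbing") separated by silence ("turnabouts")
fs = 100
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 10 * t) * ((t % 4) < 2)
gated, gate = gate_interesting(x, win=50, threshold=0.3)
```

The gate array is the variable pulse-width square wave asked about above; multiplying it with the signal zeroes out the "uninteresting" regions.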
I am trying to generate synthetic data for a time-domain signal. Let's say my signal is a square wave with some random noise on top, which I'll model as Gaussian. If I generate the data as a vector of length N and then add random noise sampled from a normal distribution of mean 0 and standard deviation 1, I have a rough simulation of the situation I care about. However, this adds noise with a characteristic timescale set by the sampling rate. I do not want this, as in reality the noise has a much longer timescale associated with it. What is an efficient way to generate noise with a specific bandwidth?
I've tried generating the noise at each sampled point and then using the FFT to cut out frequencies above a certain value. However, this severely attenuates the signal.
My idea was basically:
noise = normrnd(0, 1, [N, 1]);   % one noise sample per data point
f = fft(noise);
f(1000:end-998) = 0;             % zero both halves symmetrically so ifft stays real
noise = real(ifft(f));
This kind of works but severely attenuates the signal.
It's pretty common just to generate white noise and filter it. Often an IIR is used since it's cheap and the noise phases are random anyway. It does attenuate the signal, but it costs nothing to amplify it.
You can also generate noise directly with an IFFT. In the example you give, every coefficient in the output of fft(noise) is a Gaussian-distributed random variable, so instead of getting those coefficients with an FFT and zeroing out the ones you don't want, you can just set the ones you want and IFFT to get the resulting signal. Remember that the coefficients are complex, but the real and imaginary parts are independently Gaussian-distributed.
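A sketch of that direct-IFFT approach in Python (N, K, and the seed are arbitrary): populate only the low-frequency bins with complex Gaussian coefficients and let irfft enforce the conjugate symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096                    # number of time-domain samples
K = 100                     # keep only the lowest K frequency bins

# complex Gaussian coefficients: real and imaginary parts independent
coeffs = np.zeros(N // 2 + 1, dtype=complex)
coeffs[1:K] = rng.normal(size=K - 1) + 1j * rng.normal(size=K - 1)

# irfft assumes conjugate symmetry, so the result is real noise
# whose spectrum is exactly zero above bin K
noise = np.fft.irfft(coeffs, n=N)
```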
I'm using scipy.fft.rfft() to calculate the power spectral density of a signal. The sampling rate is 1000 Hz and the signal contains 2000 points, so the frequency bin width is (1000/2)/(2000/2) = 0.5 Hz. But I need to analyze the signal in the [0, 0.1] Hz band.
I saw several answers recommending the chirp-Z transform, but I couldn't find a Python implementation of it.
So how can I do this small-bin analysis in Python? Or can I just filter the signal down to [0, 0.1] Hz using, say, a Butterworth filter?
Thanks a lot!
Even if you use another transform, that will not create more data.
If you have 2 s of samples at 1 kHz, then your frequency resolution is 0.5 Hz. You can interpolate this with the chirp-Z transform (or just use sinc(), which is the shape of your spectrum between the samples of your frequency comb), but the data you have at your current points is the data that determines what lies in between (between 0 Hz and 0.5 Hz).
If you want a real resolution of 0.1 Hz, you need 10 s of data.
You can't get smaller frequency bins to separate out close spectral peaks unless you use more (a longer amount of) data.
You can't just use a narrower filter because the transient response of such a filter will be longer than your data.
You can get smaller frequency bins that are just a smooth interpolation between nearby frequency bins, for instance to plot the spectrum on wider paper or at a higher dpi graphic resolution, by zero-padding the data and using a longer FFT. But that won't create more detail.
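To see the interpolation point concretely (the test tone and padding factor here are made up; recent SciPy versions also ship scipy.signal.ZoomFFT and czt for the chirp-Z route): zero-padding the 2 s record to 8x its length gives 0.0625 Hz bin spacing, but those extra bins only smoothly interpolate the underlying 0.5 Hz-resolution spectrum; a real 0.1 Hz resolution still needs 10 s of data.

```python
import numpy as np

fs = 1000.0
N = 2000                               # 2 s of data -> native resolution 0.5 Hz
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 50.0 * t)       # a tone well inside the band

# zero-pad to 8x the length: bins are now 0.0625 Hz apart, but the
# extra bins are sinc interpolation, not new spectral detail
X = np.fft.rfft(x, n=8 * N)
freqs = np.fft.rfftfreq(8 * N, d=1 / fs)
```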