I am a chemist and measured a series of spectra of one compound under increasing temperature (-200 °C to 0 °C). The shapes of the spectra are very similar at different temperatures; the only difference is the intensity: at higher temperature the intensity is lower.
My problem is that at high temperature, e.g. 0 °C, the real signal's intensity is quite close to the background noise's amplitude, which makes the spectra at high temperature very noisy. I tried some simple smoothing methods, but the results were not good.
The noise is much less affected by the temperature change than the real signal (i.e. we can assume the background noise doesn't change much). Thus, I wonder whether there is any method that can remove the noise (background) using the series of spectra I have, since they share a "common" background noise.
Any information (e.g. name of method, tools in python or R, reference) will be helpful. Thanks for your help!
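Since the lineshape is shared across the series and only its scale changes with temperature, the stacked spectra form a nearly rank-1 matrix, and a truncated SVD (what chemometrics often calls PCA denoising) can separate the common signal from the noise. A minimal sketch on synthetic data; all shapes, sizes, and noise levels here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the measurement series: 10 spectra over 500 points,
# one shared lineshape whose amplitude drops with temperature, plus noise.
x = np.linspace(0, 1, 500)
lineshape = np.exp(-((x - 0.5) / 0.05) ** 2)
scales = np.linspace(5.0, 0.5, 10)            # intensity falls as T rises
spectra = np.outer(scales, lineshape) + rng.normal(0, 0.3, (10, 500))

# Because all spectra share one shape, the stack is (nearly) rank 1:
# keep only the first singular component and discard the rest as noise.
U, s, Vt = np.linalg.svd(spectra, full_matrices=False)
denoised = s[0] * np.outer(U[:, 0], Vt[0])
```

The low-temperature (high-intensity) spectra effectively lend their signal-to-noise ratio to the noisy high-temperature ones, which is exactly what exploiting the "common" structure buys you.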
I'm using the librosa library to get and filter spectrograms from audio data.
I mostly understand the math behind generating a spectrogram:
Get signal
window signal
for each window compute Fourier transform
Create matrix whose columns are the transforms
Plot heat map of this matrix
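The steps above can be sketched in plain NumPy (a naive STFT, assuming a Hann window and a hop of half the window length):

```python
import numpy as np

def stft_matrix(signal, win_len=256, hop=128):
    # Window each frame, FFT it, and stack the transforms as columns.
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frames.append(np.fft.rfft(signal[start:start + win_len] * window))
    return np.abs(np.array(frames).T)  # rows: frequency bins, columns: time

fs = 1000.0
sig = np.sin(2 * np.pi * 50 * np.arange(4096) / fs)  # 50 Hz test tone
spec = stft_matrix(sig)   # shape (129, 31): 129 bins, 31 frames
```

The tone shows up as a bright horizontal line near bin 50 * 256 / 1000 ≈ 13, which is what the heat map visualizes.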
So that's really easy with librosa:
spec = np.abs(librosa.stft(signal, n_fft=len(window), window=window))
Yay! I've got my matrix of FFTs. Now I see this function librosa.amplitude_to_db and I think this is where my ignorance of signal processing starts to show. Here is a snippet I found on Medium:
spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)
Why does the author use this amplitude_to_db function? Why not just plot the output of the STFT directly?
The range of perceivable sound pressure is very wide, from around 20 μPa (micropascals) to 20 Pa, a ratio of one million. Furthermore, human perception of sound level is not linear but is better approximated by a logarithm.
By converting to decibels (dB) the scale becomes logarithmic. This limits the numerical range to something like 0-120 dB. The intensity of the colors in the plot then corresponds more closely to what we hear than with a linear scale.
Note that the reference (0 dB) point in decibels can be chosen freely. librosa.amplitude_to_db defaults to ref=1.0, but the snippet passes ref=np.max, meaning the maximum value of the input is mapped to 0 dB; all other values are then negative. The function also thresholds the dynamic range, by default top_db=80, so anything lower than -80 dB relative to the peak is clipped to -80 dB.
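A rough sketch of what the conversion does (not librosa's exact implementation, which also applies a small amplitude floor, but the same idea):

```python
import numpy as np

def amp_to_db(s, top_db=80.0):
    # 20*log10 relative to the max (like ref=np.max), then clip the bottom
    # of the range at top_db below the peak.
    s = np.asarray(s, dtype=float)
    db = 20.0 * np.log10(np.maximum(s, 1e-10) / s.max())
    return np.maximum(db, db.max() - top_db)

vals = np.array([1.0, 0.1, 0.001, 1e-9])
print(amp_to_db(vals))  # 0, -20, -60, -80 dB (the last value is clipped)
```

Amplitude ratios of 10x become equal 20 dB steps, which is why the plotted colors track perception better than the raw STFT magnitudes.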
I am trying to generate synthetic data for a time-domain signal. Let's say my signal is a square wave and I have on top of it some random noise, which I'll model as Gaussian. If I generate the data as a vector of length N and then add to it random noise sampled from a normal distribution with mean 0 and standard deviation 1, I have a rough simulation of the situation I care about. However, this adds noise with a characteristic timescale set by the sampling rate. I do not want this, as in reality the noise has a much longer timescale associated with it. What is an efficient way to generate noise with a specific bandwidth?
I've tried generating the noise at each sampled point and then using the FFT to cut out frequencies above a certain value. However, this severely attenuates the signal.
My idea was basically:
N = 10000;                    % number of samples
noise = normrnd(0, 1, 1, N);  % normrnd(0,1) alone returns a single scalar
f = fft(noise);
f(1000:N-998) = 0;            % zero matching positive and negative bins so the ifft stays real
noise = real(ifft(f));        % real() guards against round-off residue
This kind of works but severely attenuates the signal.
It's pretty common just to generate white noise and filter it. Often an IIR is used since it's cheap and the noise phases are random anyway. It does attenuate the signal, but it costs nothing to amplify it.
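A minimal sketch of the filter-white-noise approach in Python, using a Butterworth IIR via SciPy; the sample rate, cutoff, and filter order are made-up illustration values:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 1000.0                       # assumed sample rate, Hz
white = rng.normal(size=100_000)  # white Gaussian noise

# Cheap 4th-order Butterworth low-pass at 10 Hz, applied as an IIR filter.
b, a = signal.butter(4, 10.0, fs=fs)
colored = signal.lfilter(b, a, white)

# Filtering attenuates the noise power; rescale to restore unit variance.
colored /= colored.std()
```

The rescaling step is the "costs nothing to amplify it" part: the filter sets the bandwidth, and one division sets the amplitude.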
You can also generate noise directly with an IFFT. In the example you give, every coefficient in the output of fft(noise) is a Gaussian-distributed random variable, so instead of getting those coefficients with an FFT and zeroing out the ones you don't want, you can just set the ones you want and IFFT to get the resulting signal. Remember that the coefficients are complex, but the real and imaginary parts are independently Gaussian-distributed.
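A sketch of that IFFT approach in Python; the length and band edges are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8192        # output length
keep = 100      # populate only the first 100 (low-frequency) bins

# Build the half-spectrum directly: independent Gaussians for the real and
# imaginary parts of the bins we want, zeros elsewhere. irfft enforces the
# conjugate symmetry, so the result is real band-limited noise.
spec = np.zeros(n // 2 + 1, dtype=complex)
spec[1:keep] = rng.normal(size=keep - 1) + 1j * rng.normal(size=keep - 1)
noise = np.fft.irfft(spec, n)
noise /= noise.std()   # normalize to unit variance
```

Using the half-spectrum form (rfft/irfft) sidesteps the symmetry bookkeeping that made the FFT-then-zero version return a complex signal.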
I'm using scipy.fft.rfft() to calculate the power spectral density of a signal. The sampling rate is 1000 Hz and the signal contains 2000 points, so the frequency bin width is (1000/2)/(2000/2) = 0.5 Hz. But I need to analyze the signal in [0-0.1] Hz.
I saw several answers recommending the chirp-Z transform, but I didn't find a Python implementation of it.
So how can I complete this small-bin analysis in Python? Or can I just filter this signal to [0-0.1] Hz using something like a Butterworth filter?
Thanks a lot!
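On the "no Python toolbox" point: recent SciPy versions (1.8 and later) do ship a chirp-Z implementation as scipy.signal.ZoomFFT (and scipy.signal.czt). A sketch, assuming that API; the test tone is made up:

```python
import numpy as np
from scipy.signal import ZoomFFT   # requires SciPy >= 1.8

fs = 1000.0
n = 2000
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 0.05 * t)   # a component inside [0, 0.1] Hz

# Evaluate the DTFT on 101 points spanning 0-0.1 Hz via the chirp-Z
# transform. Note this gives a finer *grid*, not finer true resolution:
# 2 s of data still limits you to about 0.5 Hz resolution.
m = 101
transform = ZoomFFT(n, [0.0, 0.1], m, fs=fs)
zoomed = np.abs(transform(x))      # magnitude on the zoomed grid
```

As the answers below explain, this interpolates the spectrum in the band of interest; it cannot separate features closer than the resolution your record length supports.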
Even if you use another transform, that will not create more data.
If you have a 1 kHz sampling rate and 2 s of samples, then your frequency resolution is 0.5 Hz. You can interpolate this with the chirp-Z transform (or just use sinc(); that's the shape of your data between the samples of your comb), but the data you have at your current points is the data that determines what you have in the lobes (between 0 Hz and 0.5 Hz).
If you want a real resolution of 0.1 Hz, you need 10 s of data.
You can't get smaller frequency bins to separate out close spectral peaks unless you use more (a longer amount of) data.
You can't just use a narrower filter because the transient response of such a filter will be longer than your data.
You can get smaller frequency bins that are just a smooth interpolation between nearby frequency bins, for instance to plot the spectrum on wider paper or at a higher dpi graphic resolution, by zero-padding the data and using a longer FFT. But that won't create more detail.
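A quick illustration of zero-padding as interpolation; the 2.3 Hz tone is just an example chosen to sit between two native bins:

```python
import numpy as np

fs = 1000.0
n = 2000                                   # 2 s of data: native bins 0.5 Hz apart
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 2.3 * t)            # tone between bins 2.0 and 2.5 Hz

# Zero-pad to 20x the length: the grid spacing becomes 0.025 Hz, which
# interpolates the spectrum smoothly but adds no new detail.
spec = np.abs(np.fft.rfft(x, n=20 * n))
freqs = np.fft.rfftfreq(20 * n, d=1 / fs)
peak = freqs[np.argmax(spec)]              # lands near 2.3 Hz
```

The interpolated peak location is useful for plotting or reading off a single tone's frequency, but two tones closer than 0.5 Hz would still merge into one lobe.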
We are trying to find peaks and troughs in a 1-D array.
We are using numpy.r_() and it finds every peak and trough in the array, but we want only the peaks and troughs that correspond to the relaxation and contraction points of diaphragmatic motion.
Is there any function that rejects the wrong min and max points?
See a bad example below:
You have high-frequency, small-amplitude oscillations that are undesirable for peak-finding purposes. Filter them out prior to searching for peaks. A simple filter to use is the 1-D Gaussian filter from scipy.ndimage. On the scale of your chart, it seems that
smooth_signal = ndimage.gaussian_filter1d(signal, 5)
should be about right (the smoothing width sigma should be large enough to suppress the unwanted oscillation but small enough not to distort the actual peaks). Then apply your peak-finding algorithm to smooth_signal.
The signal processing module has more sophisticated filters, but those take some time to learn to use.
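A sketch combining the Gaussian smoothing above with scipy.signal.find_peaks on made-up data; the ripple frequency, noise level, and prominence threshold are all assumptions you would tune to your own trace:

```python
import numpy as np
from scipy import ndimage, signal

rng = np.random.default_rng(3)
t = np.linspace(0, 4 * np.pi, 800)
# Made-up trace: slow oscillation plus fast small-amplitude ripple and noise.
trace = np.sin(t) + 0.15 * np.sin(40 * t) + 0.05 * rng.normal(size=t.size)

raw_peaks, _ = signal.find_peaks(trace)               # dozens of false peaks
smooth = ndimage.gaussian_filter1d(trace, 5)          # suppress the ripple
peaks, _ = signal.find_peaks(smooth, prominence=0.5)  # only the real maxima
```

find_peaks also accepts height, distance, and width criteria, which give another way to reject spurious extrema even without smoothing first.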
I have a sensor (shown in blue) that is continually collecting data every minute and outputs a voltage. I have a reference sensor (shown in red) that outputs in the units I am interested in. I want to determine a scaling factor so that I can scale the blue sensor's data to match the red sensor's data.
Normally, I would do a simple linear regression between the values of the two sensors at any given time, which would give me a scaling factor based on the slope of the regression. I have noticed, however, that the red sensor is slower at sensing a change in the environment and can be anywhere from 6 to 15 minutes behind; this makes a regression difficult because at any given time the two sensors may be measuring different things.
I was wondering if there is any sort of curve fitting that can be performed such that I can extract a scaling factor so that I can scale the blue sensor's data to match the red sensor's.
I typically work in Python, so any Python packages (e.g. Numpy/Scipy) that would help with this would be especially helpful.
Thanks for the help. What I ended up doing was finding all the local maxima and minima on the reference curve, then used those peak locations to search for the same maxima or minima on the sample curve. I basically used the reference curve's maxima/minima points as the center of a "window" and I would search for the highest/lowest point on the sample curve within a few minutes of the center point.
Once I had found all the matched maxima/minima on the sample curve, I then could perform a linear regression between these points to determine a scaling factor.
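A sketch of that procedure on synthetic data; the lag, window width, and true scale factor are made up, and the extrema matching is simplified to max/min within the window:

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up sensors, one sample per minute: blue in volts, red scaled by 3
# and lagging by 10 minutes.
t = np.arange(600.0)
blue = np.sin(2 * np.pi * t / 120)
red = 3.0 * np.sin(2 * np.pi * (t - 10) / 120)

win = 15                                   # search window, minutes
pairs = []
for p in find_peaks(red)[0]:               # reference (red) maxima
    lo, hi = max(p - win, 0), min(p + win + 1, len(blue))
    pairs.append((blue[lo:hi].max(), red[p]))
for p in find_peaks(-red)[0]:              # reference (red) minima
    lo, hi = max(p - win, 0), min(p + win + 1, len(blue))
    pairs.append((blue[lo:hi].min(), red[p]))

blue_pts, red_pts = map(np.array, zip(*pairs))
scale = np.polyfit(blue_pts, red_pts, 1)[0]   # slope = the scaling factor
```

Matching extrema rather than simultaneous samples sidesteps the variable lag, since peaks and troughs correspond to the same physical events on both curves.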