I'm very new to signal processing. I have two sound signals right now. Each was recorded at a sample rate of 10 kHz for 2 seconds. I have imported this data into Python; both sound_1 and sound_2 are numpy arrays, each of length 20000.
sound_1 contains a water flow sound (which I'm interested in) plus environmental noise (which I'm not interested in), while sound_2 contains only environmental noise.
I'm looking for an algorithm (or package) that can help me determine the frequency range of this water flow sound. I think that if I can find the frequency range, I can use an inverse Fourier transform to filter out the environmental noise.
However, my ultimate purpose is to extract the water flow sound from sound_1 and eliminate the environmental noise. It would be great if there are other approaches.
I'm currently looking at this post: Python frequency detection
But I don't understand how they can find the frequency from only one sound signal. I think we need to compare at least two signals (one containing the sound I'm interested in, the other not), so we can find the difference.
Since sound_1 contains both water flow and environmental noise, there's no straightforward way of extracting the water flow. The Fourier transform will give you all the frequencies in the signal, irrespective of their source.
The way to approach this is to get the frequencies of the environmental noise from sound_2 and then remove them from sound_1. After that is done, you can extract the frequencies of interest from the now-denoised sound_1.
One of the popular approaches to such noise reduction is spectral gating. Essentially, you first determine what the noise sounds like and then subtract its smoothed spectrum from your signal. Smoothing is crucial, as sound is a wave, a continuous entity: if you simply chop discrete frequencies out of the wave, you will get very poor results (the audio will sound unnatural and robotic). The amount of smoothing you apply determines how much noise is reduced (mind that it is never truly removed - you will always get some residue).
Now, on to a concrete solution.
As you're new to the subject, I'd recommend first seeing how noise reduction works in software that does the work for you. Audacity is an excellent choice. I linked the manual for noise reduction, but there are plenty of tutorials out there.
Once you know what result you want, you can either implement spectral gating yourself or use an existing package. Audacity has an excellent implementation in C++, but it may prove difficult for a newbie to port. I'd recommend starting with the noisereduce package, which is based on the Audacity implementation. If you use it, you will be done in a few lines.
Here's a snippet:
import noisereduce as nr
from scipy.io import wavfile

# load the recording that contains the water flow plus environmental noise
rate, data = wavfile.read("sound_1.wav")
# load the recording that contains only the environmental noise
noise_rate, noisy_part = wavfile.read("sound_2.wav")
# perform noise reduction (this is the older noisereduce 1.x call signature)
reduced_noise = nr.reduce_noise(audio_clip=data.astype(float),
                                noise_clip=noisy_part.astype(float),
                                verbose=True)
Now simply run an FFT on reduced_noise to discover the frequencies of the water flow.
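For example, a minimal sketch with numpy, continuing from the snippet above (the 10 kHz sample rate is taken from the question):

import numpy as np

rate = 10000                                   # 10 kHz sample rate from the question
spectrum = np.abs(np.fft.rfft(reduced_noise))
freqs = np.fft.rfftfreq(len(reduced_noise), d=1.0 / rate)

# frequency bin carrying the most energy
peak_freq = freqs[np.argmax(spectrum)]
print("Dominant frequency: %.1f Hz" % peak_freq)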
I am trying to replicate this article, but its corresponding GitHub repo is written quite badly. In the article, an NN is trained on manually corrupted audio signals. Unfortunately, the researchers did not include the audio files, nor clean code showing how they corrupted their audio files. In the paper they write:
..for the noisy test set, the 100 utterances were corrupted with four unseen noise types (engine, white, street, and baby cry), at six SNR levels (-6 dB, 0 dB, 6 dB, 12 dB, 18 dB, and 24 dB); for the enhanced set, the utterances in the noisy set were enhanced by the enhancement model above.
Now to the question - is there a Python library (R/MATLAB libraries are fine as well) that takes as input the signal, the type of desired noise, and the SNR, and returns a corrupted signal? If not, where do I get engine or crying-baby noise types?
Thanks!
So, if someone runs into the same problem, here is what I did. First, I looked for databases that include real-life noises. Most of them cost money and offer a limited variety of environments (see the AURORA-2 corpus, the CHiME background noise data, or the NOISEX-92 database). Finally, I found the DEMAND dataset, which includes multi-channel noises from 16 different environments (office, car, road, etc.) and is freely available.
Now, before mixing noise and signal, one has to verify that they share the same sampling rate (actually, it is not such a severe problem as I understand from this discussion, but it is better to be on the safe side). If you are using Python, you can use librosa.resample to standardize the two. After that, you can add the two signals. When adding the noise, you may want to control the relative magnitude of the two inputs (signal and noise). You can use the signal-to-noise ratio formula below to find $a$, the multiplier by which you have to scale your noise in order to get the desired signal-to-noise ratio (SNR):

$$\mathrm{SNR} = 20\log_{10}\!\left(\frac{\mathrm{RMS}_{\mathrm{signal}}}{a\cdot \mathrm{RMS}_{\mathrm{noise}}}\right)\;\;\Longrightarrow\;\; a = \frac{\mathrm{RMS}_{\mathrm{signal}}}{\mathrm{RMS}_{\mathrm{noise}}\cdot 10^{\mathrm{SNR}/20}}$$

where the desired SNR is given, and the two RMS values are calculated from your data.
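For what it's worth, here is a minimal sketch of that mixing step under my own assumptions (the helper name add_noise is made up; it assumes librosa and numpy are available and that both inputs are 1-D float arrays):

import numpy as np
import librosa

def add_noise(signal, noise, sr_signal, sr_noise, snr_db):
    # bring the noise to the signal's sampling rate
    noise = librosa.resample(noise, orig_sr=sr_noise, target_sr=sr_signal)
    # loop/trim the noise so it covers the whole signal
    if len(noise) < len(signal):
        noise = np.tile(noise, int(np.ceil(len(signal) / len(noise))))
    noise = noise[:len(signal)]
    # scale the noise so that the mixture has the desired SNR
    rms_signal = np.sqrt(np.mean(signal ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    a = rms_signal / (rms_noise * 10 ** (snr_db / 20))
    return signal + a * noise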
I am new to these topics. I have read a lot of articles on this issue. There are a lot of different techniques, but I am confused because I don't know where to start.
According to my research, the first important thing is that I must preprocess the raw sensor data. There are some techniques for this; FFT is one of them. (But how can I learn about all the techniques? I have not seen them collected in one place.)
Then I can start the statistical calculations for processing.
I have not drawn up a roadmap yet. Can you help with this, or suggest books or anything else?
Welcome to SO ... to leverage this site, hover your mouse over the fft tag on your question ... then click View tag ... then hit Learn more ... after reading the info page on fft, hit Votes to see the highest-voted posts here on SO ... those questions/answers will get you into the ballpark.
I highly suggest you master the details explained here Discrete Fourier Transform - Simple Step by Step
An Interactive Guide To The Fourier Transform
https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
Intuitive Understanding of the Fourier Transform and FFTs
https://www.youtube.com/watch?v=FjmwwDHT98c
An Intuitive Discrete Fourier Transform Tutorial
http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform/
How to get frequency from fft result?
I could go on mentioning nuggets from my notes; however, I will leave you with this excerpt from an excellent book:
http://www.dspguide.com/ch10/6.htm
The Discrete Time Fourier Transform (DTFT) is the member of the Fourier transform family that operates on aperiodic, discrete signals. The best way to understand the DTFT is how it relates to the DFT. To start, imagine that you acquire an N sample signal, and want to find its frequency spectrum. By using the DFT, the signal can be decomposed into sine and cosine waves, with frequencies equally spaced between zero and one-half of the sampling rate. As discussed in the last chapter, padding the time domain signal with zeros makes the period of the time domain longer, as well as making the spacing between samples in the frequency domain narrower. As N approaches infinity, the time domain becomes aperiodic, and the frequency domain becomes a continuous signal. This is the DTFT, the Fourier transform that relates an aperiodic, discrete signal, with a periodic, continuous frequency spectrum.
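To make that zero-padding remark concrete, here is a small numpy sketch (the sample rate and the test-tone frequency are made-up values, just for illustration):

import numpy as np

rate = 1000                                    # assumed sample rate in Hz
t = np.arange(256) / rate
x = np.sin(2 * np.pi * 123.0 * t)              # a 123 Hz test tone

# frequency bin spacing without and with zero padding
freqs_plain = np.fft.rfftfreq(len(x), d=1.0 / rate)
freqs_padded = np.fft.rfftfreq(4 * len(x), d=1.0 / rate)
print(freqs_plain[1], "Hz per bin without padding")   # 1000/256 ~ 3.9 Hz
print(freqs_padded[1], "Hz per bin with 4x padding")  # 1000/1024 ~ 0.98 Hz

# the peak of the zero-padded spectrum lands closer to the true 123 Hz
spectrum = np.abs(np.fft.rfft(x, n=4 * len(x)))
print("Estimated tone frequency:", freqs_padded[np.argmax(spectrum)], "Hz")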
The first step will be data cleaning and feature extraction. You need to prepare the data in a format that is applicable to machine learning algorithms. I recommend my paper "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers". It is about preparing data from IoT signals for further application of ML algorithms.
I'm working on analysis of audio files in Python, specifically music audio, and I've applied the DFT (FFT) to get data in the frequency domain, but no amount of searching or fiddling around with it has revealed a good way to identify "peaks"/local maxima in the frequencies. My data is pretty noisy, an example of the graph after applying the Fourier Transform is below. Help would be really appreciated. I'm also looking at retrieving MFCC coefficients from this data, but I'm also not sure how to go about doing that, so knowledge on that subject would also be useful.
First, you need to smooth your (FFT) data by running it through a low-pass filter. After that, you can find zero crossings in the gradient of the signal. You can filter the signal with [-1, 1] to estimate the gradient, and pick the elements whose predecessor is positive and successor is negative.
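A minimal sketch of that recipe with numpy (the smoothing-kernel length is an arbitrary choice, and the random array merely stands in for your real magnitude spectrum):

import numpy as np

# toy noisy magnitude spectrum standing in for np.abs(np.fft.rfft(frame))
rng = np.random.default_rng(0)
spectrum = np.abs(np.fft.rfft(rng.standard_normal(2048)))

# 1. smooth with a simple moving-average low-pass filter
kernel = np.ones(9) / 9.0
smoothed = np.convolve(spectrum, kernel, mode="same")

# 2. gradient, i.e. the signal filtered with [-1, 1]
gradient = np.diff(smoothed)

# 3. peaks: points whose left gradient is positive and right gradient is negative
peaks = np.where((gradient[:-1] > 0) & (gradient[1:] < 0))[0] + 1
print("peak bins:", peaks)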
I'm working on an EEG signal processing method for recognition of the P300 ERP.
At the moment, I'm training my classifier with a single vector of data that I get by averaging across preprocessed data from a chosen subset of the original 64 channels. I'm using the EEG values directly, not frequency features from an FFT. The method actually achieves quite solid performance, around 75% accurate classification.
I would like to improve it by using ICA to clean the EEG data a bit. I have read through a lot of tutorials and papers and I am still somewhat confused.
I'm implementing my method in Python, so I chose to use sklearn's FastICA.
from sklearn.decomposition import FastICA

# decompose the preprocessed 64-channel EEG into 64 independent components
self.ica = FastICA(n_components=64, max_iter=300)
icaSignal = self.ica.fit_transform(self.signal)
From a 25256 samples x 64 channels matrix I get a matrix of estimated sources that is also 25256x64. The problem is that I'm not quite sure how to use this output.
Averaging those components and training the classifier the same way as with the raw signal reduces performance to less than 30%, so that is probably not the way.
Another approach I read about is rejecting some of the components at this point - the ones that represent eye blinks, muscle activity, etc. - based on their frequency content and some other heuristics. I'm also not quite confident about how to do that exactly.
After I reject some of the components, what is the next step? Should I average the ones that are left and feed the classifier with them, or should I try to reconstruct the EEG signal without them - and if so, how do I do that in Python? I wasn't able to find any information about that reconstruction step. It is probably much easier to do in MATLAB, so nobody bothered to write about it :(
Any suggestions? :)
Thank you very much!
I haven't used Python for ICA, but in terms of the steps, it shouldn't matter whether it's MATLAB or Python.
You are completely right that it's hard to reject ICA components. There is no widely accepted objective measurement. There are certain patterns for eye blinks (high voltage in frontal channels) and for muscle artifacts (wide spectral coverage, because it's EMG, at peripheral channels). If you don't know where to get started, I recommend reading the help of a MATLAB plugin called EEGLAB. This UCSD group has some nice materials to help you start.
https://eeglab.org/
To answer your question on the ICA reconstruction: after rejecting some ICA components, you should reconstruct the original EEG without them.
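With sklearn's FastICA from your snippet, a minimal sketch of that reconstruction step could look like this (the random array stands in for your 25256 x 64 EEG matrix, and the rejected component indices are made up - you still have to pick them yourself, e.g. using the eye-blink/EMG heuristics mentioned above):

import numpy as np
from sklearn.decomposition import FastICA

signal = np.random.randn(25256, 64)           # stands in for your 25256 x 64 EEG matrix

ica = FastICA(n_components=64, max_iter=300)
sources = ica.fit_transform(signal)           # 25256 x 64 matrix of components

# zero out the components you decided to reject (these indices are made up)
rejected = [0, 7, 13]
sources_clean = sources.copy()
sources_clean[:, rejected] = 0.0

# project the remaining components back to channel space: 25256 x 64 cleaned EEG
signal_clean = ica.inverse_transform(sources_clean)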
Is anybody able to supply links, advice, or other forms of help on the following?
Objective: use Python to classify 10-second audio samples, so that I can afterwards speak into a microphone and have Python pick out and play snippets (faded together) of the closest matches from a database.
My objective is not to get a perfect closest match, and I don't care what the source of the audio samples is. So the result is probably of no use other than speaking in noise (fun).
I would like the Python app to be able to find a specific match of, for example, an FFT within the 10-second samples in the database. I guess the real-time sampling of the microphone will use a 100 millisecond buffer.
Any ideas? FFT? Which database? Something else?
In order to do this, you need three things:
Segmentation (decide how to make your audio samples)
Feature Extraction (decide what audio feature (e.g. FFT) you care about)
Distance Metric (decide what the "closest" sample is)
Segmentation: you currently describe using 10-second samples. I think you might get better results with shorter segments (closer to 100-1000 ms) in order to follow the changes in the voice better.
Feature Extraction: you mention using the FFT. The zero crossing rate is surprisingly OK considering how simple it is. If you want to get fancier, MFCCs or the spectral centroid are probably the way to go.
Distance Metric: most people use the Euclidean distance, but there are also fancier ones like the Manhattan distance, cosine distance, and earth mover's distance.
For a database, if you have a small enough set of samples, you might try just loading everything into a k-d tree so that you can do fast distance calculations, and simply hold it in memory.
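As a rough sketch of that retrieval idea (mean MFCCs as the feature vector and scipy's cKDTree for the lookup; the random arrays merely stand in for real recordings and a real microphone buffer):

import numpy as np
import librosa
from scipy.spatial import cKDTree

def feature_vector(y, sr):
    # mean MFCCs over the segment give a compact fixed-length feature
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# hypothetical database: random noise standing in for five 1-second recordings
sr = 22050
db_segments = [np.random.randn(sr) for _ in range(5)]
features = np.vstack([feature_vector(y, sr) for y in db_segments])
tree = cKDTree(features)

# query with a ~100 ms microphone buffer (again random noise as a stand-in)
query = feature_vector(np.random.randn(sr // 10), sr)
dist, idx = tree.query(query)
print("closest database segment:", idx)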
Good luck! It sounds like a fun project.
Try searching for algorithms on "music fingerprinting".
You could try some typical short-term feature extraction (e.g. energy, zero crossing rate, MFCCs, spectral features, chroma, etc.) and then model your segment through a vector of feature statistics. Then you could use a simple distance-based classifier (e.g. kNN) to retrieve the "closest" training samples from a manually labelled set, given an unknown "query".
Check out my library for several Python audio analysis functionalities: pyAudioAnalysis
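To illustrate the general pipeline described above (a sketch with librosa and scikit-learn rather than pyAudioAnalysis itself; the file names and labels are made up):

import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def segment_features(y, sr):
    # short-term features -> per-segment statistics (means and standard deviations)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    zcr = librosa.feature.zero_crossing_rate(y)
    feats = np.vstack([mfcc, zcr])
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

# hypothetical manually labelled training set: (file name, label) pairs
train_files = [("speech_01.wav", "speech"), ("noise_01.wav", "noise")]
X = np.array([segment_features(*librosa.load(f)) for f, _ in train_files])
y_labels = [label for _, label in train_files]

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y_labels)
# query_y, query_sr = librosa.load("unknown_query.wav")
# print(knn.predict([segment_features(query_y, query_sr)]))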