I'm working on analyzing audio files in Python, specifically music audio. I've applied the DFT (via FFT) to get data in the frequency domain, but no amount of searching or fiddling has revealed a good way to identify "peaks"/local maxima in the frequencies. My data is pretty noisy; an example of the graph after applying the Fourier transform is below. Help would be really appreciated. I'm also looking at retrieving MFCC coefficients from this data, but I'm not sure how to go about that either, so knowledge on that subject would also be useful.
First, you need to smooth your FFT data by applying a low-pass filter. After that, you can find zero crossings of the signal's gradient: filter the signal with [-1, 1] to get the gradient, then pick the elements whose predecessor is positive and whose successor is negative.
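A minimal numpy sketch of that recipe (the moving-average kernel length and the synthetic two-bump "spectrum" below are invented for illustration; any low-pass filter would do for the smoothing step):

```python
import numpy as np

def find_spectral_peaks(mag, kernel=11):
    """Smooth a magnitude spectrum, then pick local maxima via
    positive-to-negative sign changes in the gradient."""
    # low-pass step: simple moving-average smoothing
    window = np.ones(kernel) / kernel
    smooth = np.convolve(mag, window, mode="same")
    # gradient via the [-1, 1] difference filter
    grad = np.diff(smooth)
    # a peak is where the gradient crosses from positive to non-positive
    peaks = np.where((grad[:-1] > 0) & (grad[1:] <= 0))[0] + 1
    return peaks

# usage: two clear bumps buried in noise
x = np.linspace(0, 1, 500)
mag = np.exp(-((x - 0.3) ** 2) / 0.001) + np.exp(-((x - 0.7) ** 2) / 0.001)
mag += 0.05 * np.random.default_rng(0).normal(size=x.size)
print(find_spectral_peaks(mag))
```

Note that the zero-crossing test alone also fires on smoothed noise in flat regions, so on real spectra you would additionally threshold the detected peaks by height.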
This article https://www.bitweenie.com/listings/fft-zero-padding/ gives a simple relation between time-length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this Waveform frequency resolution.
In other words: if two input frequencies are closer together than 1/(time length of the input data), they will show as only one peak in the FFT plot.
My question is: is there a way to increase this Waveform frequency resolution? I am finding it difficult to work with rather short data-series due to this limitation.
As an example, if I use a combination of sine series with periods 9.5, 10, and 11 over 240 datapoints I cannot distinguish between the different frequencies.
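To make that example concrete, here is a small numpy experiment along those lines (the half-maximum peak-counting heuristic is my own, just for illustration): with 240 points the three components blur together, while a ten-times-longer series resolves all of them.

```python
import numpy as np

def spectrum(N):
    # three sinusoids with periods 9.5, 10 and 11 samples over N points
    t = np.arange(N)
    x = sum(np.sin(2 * np.pi * t / p) for p in (9.5, 10.0, 11.0))
    return np.abs(np.fft.rfft(x))

def resolved_peaks(mag):
    # local maxima that rise above half of the strongest peak
    idx = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    return idx[mag[idx] > 0.5 * mag.max()]

# the waveform frequency resolution is 1/N cycles per sample; the 9.5- and
# 10-sample components sit only 1/9.5 - 1/10, about 1.3 bins, apart at N = 240
print(len(resolved_peaks(spectrum(240))))    # merged: fewer than 3 peaks
print(len(resolved_peaks(spectrum(2400))))   # 10x longer series: 3 peaks
```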
To have good frequency resolution you need a long time series.
This is a fundamental limitation, called the uncertainty principle. It cannot be overcome within Fourier analysis (Fourier transform, DFT, short-time Fourier transform, and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make peaks sharper or more separated.
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
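For instance, under the single-component assumption, one standard trick is to fit a parabola through the log-magnitude of the peak FFT bin and its two neighbours to get a sub-bin frequency estimate (the window choice and test frequency below are arbitrary):

```python
import numpy as np

N = 240
true_freq = 0.1052  # cycles per sample, deliberately between FFT bins
x = np.sin(2 * np.pi * true_freq * np.arange(N))

# window to suppress leakage, then locate the strongest bin
mag = np.abs(np.fft.rfft(x * np.hanning(N)))
k = int(np.argmax(mag))

# parabolic (quadratic) interpolation on the log magnitudes
a, b, c = np.log(mag[k - 1 : k + 2])
delta = 0.5 * (a - c) / (a - 2 * b + c)   # fractional-bin offset
est = (k + delta) / N
print(est)  # close to 0.1052, despite the 1/240 bin spacing
```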
You can also use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but it generates "cross terms", i.e. frequency-component artifacts. However, when you only have a few frequency components this might be acceptable; it depends on the use case.
I need to optimise a method for finding the number of data peaks in a 1D array. The data is a time-series of the amplitude of a wav file.
I have the code implemented already:
from scipy.io.wavfile import read
from scipy.signal import find_peaks
_, amplitudes = read('audio1.wav')
indexes, _ = find_peaks(amplitudes, height=80)
print(f'Number of peaks: {len(indexes)}')
When plotted, the data looks like this:
General scale
The 'peaks' that I am interested in are clear to the human eye - there are 23 in this particular dataset.
However, because the array is so large, the data is extremely variant within the peaks that are clear at a general scale (hence the many hundreds of peaks labelled with blue crosses):
Zoomed in view of one peak
Peak-finding questions have been asked many times before (I've been through a lot of them!) - but I can't find any help or explanation of optimising the parameters for finding only the peaks I want. I know a little about Python, but am blind when it comes to mathematical analysis!
Analysing by width seems useless because, as per the second image, the peaks clear at a large scale are actually interspersed with 'silent' ranges. Distance is not helpful because I do not know how close the peaks will be in other wav files. Prominence has been suggested as the best method but I could not get the results I needed; threshold likewise. I have also tried decimating the signal, smoothing the signal with a Savitzky-Golay filter, and different combinations of parameters and values, all with inaccurate results.
Height alone has been useful because I can see from the charts that peaks always reach above 80.
This is a common task in audio processing, and there are several approaches; which one works best depends heavily on your data.
However, there are implementations out there which are used for finding peaks in novelty functions (e.g., the output from a beat tracker). Try these ones:
https://madmom.readthedocs.io/en/latest/modules/features/onsets.html#madmom.features.onsets.peak_picking
https://librosa.github.io/librosa/generated/librosa.util.peak_pick.html#librosa.util.peak_pick
Basically they implement the same method, but there may be differences in the details.
Furthermore, you could check whether you really need to work at this high sampling frequency. Try downsampling the signal or using a moving-average filter.
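As an illustration of that last suggestion (all signal parameters below are invented, not tuned to the asker's file): rectify the signal, smooth it into an envelope with a moving average, downsample, then run find_peaks on the much cleaner result.

```python
import numpy as np
from scipy.signal import decimate, find_peaks

# synthetic stand-in recording: three tone bursts at 0.5 s, 1.5 s, 2.5 s
rate = 8000
t = np.arange(3 * rate) / rate
bursts = sum(np.exp(-((t - c) ** 2) / 0.001) for c in (0.5, 1.5, 2.5))
amplitudes = bursts * np.sin(2 * np.pi * 800 * t)

# rectify and smooth into an envelope (50 ms moving average)
envelope = np.convolve(np.abs(amplitudes), np.ones(400) / 400, mode="same")
# 10x downsample before peak picking
envelope = decimate(envelope, 10)
peaks, _ = find_peaks(envelope, height=0.1, distance=rate // 10 // 2)
print(len(peaks))  # → 3
```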
Look at 0d persistent homology to find a good strategy, where the parameter you can optimize for is peak persistence. A nice blog post here explains the basics.
But in short the idea is to imagine your graph being filled by water, and then slowly draining the water. Every time a piece of the graph comes above water a new island is born. When two islands are next to each other they merge, which causes the younger island (with the lower peak) to die.
Then each data point has a birth time and a death time. The most significant peaks are those with the longest persistence, which is death - birth.
If the water level drops at a continuous rate, persistence is defined in terms of peak height. Another possibility is to drop the water instantaneously from point to point as time goes from step t to step t+1, in which case persistence is defined in terms of peak width in signal samples.
For you it seems that using the original definition in terms of peak height > 70 finds all the peaks you are interested in, albeit possibly too many, clustered together. You can limit this by choosing the first peak in each cluster, or the highest peak in each cluster, or by doing both approaches and only choosing peaks that have both great height persistence and great width persistence.
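For reference, the water-level procedure described above can be implemented in a few dozen lines with a union-find over samples processed from highest to lowest. This is a generic sketch, not tuned to the asker's data:

```python
import numpy as np

def persistent_peaks(x):
    """0-d persistence: process samples from highest to lowest, tracking
    islands with union-find; a peak's persistence is its height minus the
    water level at which its island merges into a taller one."""
    order = np.argsort(-np.asarray(x, dtype=float))
    parent = {}     # sample index -> union-find parent
    birth = {}      # island root -> index of its highest point
    peaks = []      # (persistence, peak index)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = i
        for j in (i - 1, i + 1):            # merge with already-seen neighbours
            if j in parent:
                rj, ri = find(j), find(i)
                if rj == ri:
                    continue
                # the island with the lower peak dies at level x[i]
                lo, hi = sorted((ri, rj), key=lambda r: x[birth[r]])
                peaks.append((x[birth[lo]] - x[i], int(birth[lo])))
                parent[lo] = hi
                parent[i] = hi
    # drop zero-persistence entries (non-maxima) and keep the global
    # maximum, which never dies
    peaks = [p for p in peaks if p[0] > 0]
    for r in {find(i) for i in parent}:
        peaks.append((float("inf"), int(birth[r])))
    return sorted(peaks, reverse=True)

# usage: three bumps of different prominence
sig = [0, 1, 0, 3, 1, 2, 0, 5, 1, 0]
print(persistent_peaks(sig))  # → [(inf, 7), (3, 3), (1, 5), (1, 1)]
```

The most significant peaks are the first entries; thresholding on persistence (rather than raw height) is what suppresses the clustered spurious maxima.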
I'm very new to signal processing. I currently have two sound signals, each recorded at a sample rate of 10 kHz for 2 seconds. I have imported this data into Python; both sound_1 and sound_2 are numpy arrays, each of length 20000 of course.
Sound_1 contains a water-flow sound (which I'm interested in) plus environmental noise (which I'm not), while sound_2 contains only environmental noise.
I'm looking for an algorithm(or package) which can help me determine the frequency range of this water flow sound. I think if I can find out the frequency range, I can use an inverse Fourier transform to filter the environment noise.
However, my ultimate purpose is to extract the water flow sound from sound_1 data and eliminate environmental noise. It would be great if there are other approaches.
I'm currently looking at this post: Python frequency detection
But I don't understand how they can find the frequency from only one sound signal. I think we need to compare at least two signals (one containing the sound I'm interested in, one without it) so we can find the difference.
Since sound_1 contains both water flow and environmental noise, there's no straightforward way of extracting the water flow. The Fourier transform will get you all frequencies in the signal, irrespective of the source.
The way to approach it is to get the frequencies of the environmental noise from sound_2 and then remove them from sound_1. After that is done, you can extract the frequencies from the already-denoised sound_1.
One of the popular approaches to such noise reduction is spectral gating. Essentially, you first determine what the noise sounds like and then remove its smoothed spectrum from your signal. Smoothing is crucial, as sound is a wave, a continuous entity. If you simply chop discrete frequencies out of the wave, you will get very poor results (the audio will sound unnatural and robotic). The amount of smoothing you apply determines how much noise is reduced (mind it's never truly removed; you will always get some residue).
To the concrete solution.
As you're new to the subject, I'd recommend first seeing how noise reduction works in software that will do the work for you. Audacity is an excellent choice. I linked the manual for noise reduction, but there are plenty of tutorials out there.
After you know what you want to get, you can either implement spectral gating yourself or use an existing package. Audacity has an excellent implementation in C++, but it may prove difficult for a newbie to port. I'd recommend starting with the noisereduce package, which is based on the Audacity implementation. If you use it, you will be done in a few lines.
Here's a snippet:
import noisereduce as nr
from scipy.io import wavfile

# load the recording of interest
rate, data = wavfile.read("sound_1.wav")
# load the noise-only recording (wavfile.read returns a (rate, data) tuple)
_, noisy_part = wavfile.read("sound_2.wav")
# perform noise reduction (noisereduce 1.x API; newer releases use
# nr.reduce_noise(y=data, sr=rate, y_noise=noisy_part) instead)
reduced_noise = nr.reduce_noise(audio_clip=data, noise_clip=noisy_part, verbose=True)
Now simply run FFT on the reduced_noise to discover the frequencies of water flow.
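That last step looks like this (using a synthetic 440 Hz tone as a stand-in for reduced_noise, since I don't have the asker's files):

```python
import numpy as np

rate = 10_000
t = np.arange(2 * rate) / rate
reduced_noise = np.sin(2 * np.pi * 440 * t)   # stand-in for the denoised data

# magnitude spectrum and matching frequency axis in Hz
mag = np.abs(np.fft.rfft(reduced_noise))
freqs = np.fft.rfftfreq(len(reduced_noise), d=1 / rate)
dominant = freqs[np.argmax(mag)]
print(dominant)  # → 440.0
```

In the real case you would look at the whole band where the magnitude is elevated, not just the single strongest bin, since water flow is broadband rather than a pure tone.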
Here's how I am using noisereduce. In this part I am determining the frequency statistics.
I am new to these topics. I have read a lot of articles about this subject, and there are many different techniques, but I am confused because I don't know where to start.
According to my research, the first important thing is that I must preprocess the raw sensor data. There are several techniques for this; the FFT is one of them. (But how can I find and learn all the techniques? I have not seen them all in one place.)
Then I can start the statistical calculations for processing.
I have not drawn up a roadmap. Can you help with this, or suggest books or anything else?
Welcome to SO ... to leverage this site hover your mouse over top of tag fft on your question ... then click View tag ... then hit learn more ... then after reading the info page on fft hit Votes to see the highest voted posts here on SO ... those questions/answers will get you into the ball park
I highly suggest you master the details explained here Discrete Fourier Transform - Simple Step by Step
An Interactive Guide To The Fourier Transform
https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
Intuitive Understanding of the Fourier Transform and FFTs
https://www.youtube.com/watch?v=FjmwwDHT98c
An Intuitive Discrete Fourier Transform Tutorial
http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform/
How to get frequency from fft result?
I could go on mentioning nuggets from my notes however I will leave you with this excerpt from an excellent book
http://www.dspguide.com/ch10/6.htm
The Discrete Time Fourier Transform (DTFT) is the member of the Fourier transform family that operates on aperiodic, discrete signals. The best way to understand the DTFT is how it relates to the DFT. To start, imagine that you acquire an N sample signal, and want to find its frequency spectrum. By using the DFT, the signal can be decomposed into sine and cosine waves, with frequencies equally spaced between zero and one-half of the sampling rate. As discussed in the last chapter, padding the time domain signal with zeros makes the period of the time domain longer, as well as making the spacing between samples in the frequency domain narrower. As N approaches infinity, the time domain becomes aperiodic, and the frequency domain becomes a continuous signal. This is the DTFT, the Fourier transform that relates an aperiodic, discrete signal, with a periodic, continuous frequency spectrum.
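The zero-padding point from that excerpt is easy to verify numerically (the test tone below is arbitrary):

```python
import numpy as np

N = 64
x = np.sin(2 * np.pi * 5.3 * np.arange(N) / N)   # arbitrary test tone

# spacing between frequency-domain samples, in cycles per sample
spacing_plain = np.fft.rfftfreq(N)[1]            # 1/64
spacing_padded = np.fft.rfftfreq(8 * N)[1]       # 1/512

# zero-padding to 8N samples the same underlying DTFT 8x more densely,
# without adding any new information
padded = np.abs(np.fft.rfft(x, n=8 * N))
print(spacing_plain, spacing_padded, len(padded))
```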
The first step will be data cleaning and feature extraction. You need to prepare the data in a format that is applicable to machine-learning algorithms. I recommend my paper "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers". It is about preparing data from IoT signals for further application of ML algorithms.
Anybody able to supply links, advice, or other forms of help to the following?
Objective - use Python to classify 10-second audio samples so that afterwards I can speak into a microphone and have Python pick out and play snippets (faded together) of the closest matches from a db.
My objective is not to have the closest match and I don't care what the source of the audio samples is. So the result is probably of no use other than speaking in noise (fun).
I would like the Python app to be able to find a specific match of the FFT, for example, within the 10-second samples in the db. I guess the real-time sampling of the microphone will use a 100 millisecond buffer.
Any ideas? FFT? What db? Other?
In order to do this, you need three things:
Segmentation (decide how to make your audio samples)
Feature Extraction (decide what audio feature (e.g. FFT) you care about)
Distance Metric (decide what the "closest" sample is)
Segmentation: you currently describe using 10-second samples. I think you might have better results with shorter segments (closer to 100-1000ms) in order to get something that fits the changes in the voice better.
Feature Extraction: you mention using FFT. The zero crossing rate is surprisingly ok considering how simple it is. If you want to get more fancy, using MFCCs or spectral centroid is probably the way to go.
Distance Metric: most people use the euclidean distance, but there are also fancier ones like the manhattan distance, cosine distance, and earth-movers distance.
For a database, if you have a small enough set of samples, you might try just loading everything up into a kdtree so that you can do fast distance calculations, and just hold it in memory.
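The pipeline above can be sketched with made-up feature vectors: each database segment is reduced to a feature vector (here, a crude banded FFT-magnitude profile of my own invention), loaded into a k-d tree, and queried with the microphone segment's features under euclidean distance.

```python
import numpy as np
from scipy.spatial import cKDTree

def features(segment, n_bands=16):
    """Crude spectral feature: FFT magnitudes pooled into n_bands bins."""
    mag = np.abs(np.fft.rfft(segment))
    bands = np.array_split(mag, n_bands)
    return np.array([b.mean() for b in bands])

# stand-in "audio" database of 50 segments
rng = np.random.default_rng(1)
db_segments = [rng.normal(size=1000) for _ in range(50)]
tree = cKDTree([features(s) for s in db_segments])

# query with a slightly perturbed copy of segment 17
query = db_segments[17] + 0.01 * rng.normal(size=1000)
dist, idx = tree.query(features(query))
print(idx)  # → 17
```

Swapping the feature function for MFCCs or spectral centroid, as suggested above, changes nothing structurally: the tree just indexes whatever vectors you hand it.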
Good luck! It sounds like a fun project.
Try searching for algorithms on "music fingerprinting".
You could try some typical short-term feature extraction (e.g. energy, zero crossing rate, MFCCs, spectral features, chroma, etc.) and then model your segment through a vector of feature statistics. Then you could use a simple distance-based classifier (e.g. kNN) in order to retrieve the "closest" training samples from a manually labelled set, given an unknown "query".
Check out my lib on several Python Audio Analysis functionalities: pyAudioAnalysis