Anybody able to supply links, advice, or other forms of help to the following?
Objective - use Python to classify 10-second audio samples so that I can afterwards speak into a microphone and have Python pick out and play snippets (faded together) of the closest matches from a database.
My objective is not to find an exact match, and I don't care what the source of the audio samples is. So the result is probably of no use other than speaking in noise (fun).
I would like the Python app to be able to find a specific match (of the FFT, for example) within the 10-second samples in the database. I guess the real-time sampling of the microphone will use a buffer of around 100 milliseconds.
Any ideas? FFT? Which database? Something else?
In order to do this, you need three things:
Segmentation (decide how to make your audio samples)
Feature Extraction (decide what audio feature (e.g. FFT) you care about)
Distance Metric (decide what the "closest" sample is)
Segmentation: you currently describe using 10-second samples. I think you might have better results with shorter segments (closer to 100-1000ms) in order to get something that fits the changes in the voice better.
Feature Extraction: you mention using FFT. The zero crossing rate is surprisingly ok considering how simple it is. If you want to get more fancy, using MFCCs or spectral centroid is probably the way to go.
Distance Metric: most people use the Euclidean distance, but there are also fancier ones like the Manhattan distance, cosine distance, and earth mover's distance.
For a database, if you have a small enough set of samples, you might try loading everything into a k-d tree so that you can do fast nearest-neighbour lookups, and just hold it in memory.
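To make that retrieval idea concrete, here is a minimal sketch under assumed feature choices (zero-crossing rate and spectral centroid), with random data standing in for real audio segments, using SciPy's k-d tree:

```python
import numpy as np
from scipy.spatial import cKDTree

def features(segment, rate=44100):
    """Two simple features per segment: zero-crossing rate and spectral centroid."""
    zcr = np.mean(np.abs(np.diff(np.sign(segment)))) / 2.0
    mag = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    return np.array([zcr, centroid])

# Hypothetical database: 100 random 500 ms "segments" at 44.1 kHz
rng = np.random.default_rng(0)
db_segments = [rng.standard_normal(22050) for _ in range(100)]
db_features = np.array([features(s) for s in db_segments])

tree = cKDTree(db_features)              # build once, hold in memory

mic_buffer = rng.standard_normal(22050)  # stands in for a microphone buffer
dist, idx = tree.query(features(mic_buffer))  # index of the closest db segment
```

In practice you would standardize each feature dimension before building the tree: the centroid is in hertz and would otherwise dominate the Euclidean distance over the dimensionless zero-crossing rate.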
Good luck! It sounds like a fun project.
Try searching for algorithms on "music fingerprinting".
You could try some typical short-term feature extraction (e.g. energy, zero crossing rate, MFCCs, spectral features, chroma, etc.) and then model your segment through a vector of feature statistics. Then you could use a simple distance-based classifier (e.g. kNN) in order to retrieve the "closest" training samples from a manually labelled set, given an unknown "query".
Check out my lib on several Python Audio Analysis functionalities: pyAudioAnalysis
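A rough sketch of that pipeline (feature statistics per segment, then a kNN vote), with hand-rolled stand-ins for the feature extraction rather than pyAudioAnalysis itself, and toy data in place of a labelled set:

```python
import numpy as np

def segment_stats(signal, frame=400):
    """Short-term features per frame (energy, zero-crossing rate), summarized
    by their mean and standard deviation over the whole segment."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energy = (frames ** 2).mean(axis=1)
    zcr = np.abs(np.diff(np.sign(frames), axis=1)).mean(axis=1) / 2.0
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

def knn_classify(query_vec, train_vecs, train_labels, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_vecs - query_vec, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]

# Toy labelled set: class 0 = slow sine tones (low ZCR), class 1 = white noise (high ZCR)
rng = np.random.default_rng(4)
t = np.arange(4000)
train_vecs, train_labels = [], []
for _ in range(10):
    train_vecs.append(segment_stats(np.sin(2 * np.pi * rng.uniform(0.005, 0.02) * t)))
    train_labels.append(0)
    train_vecs.append(segment_stats(rng.standard_normal(4000)))
    train_labels.append(1)
train_vecs, train_labels = np.array(train_vecs), np.array(train_labels)

query_label = knn_classify(segment_stats(rng.standard_normal(4000)), train_vecs, train_labels)
```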
Let's say I have a piece of music, and I want to find patterns that repeat, so that I can cut out certain areas without it being audible.
What would be the best approach in python?
I thought about generating a waveform and then slicing it into images to find two similar ones but I don't know where to start and if it's a good idea.
You can split the signal into buffers and compare them with an FFT. If the result of the FFT differs from the previous one by more than some specified value, you can mark the start of a new part there. But this really depends on what kind of music you are running the algorithm on: for house music, for example, it could be problematic to distinguish parts with an FFT alone, so you could instead acquire the tempo of the track from the waveform of the percussion and measure the RMS value; if the RMS value changes, you have the next part. The most fun (and valid) solution for this problem would be to use a neural network, where the waveform is your input and the output is a list of timestamps of the parts, and there you go.
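A sketch of the FFT-buffer idea above; the buffer size and threshold are arbitrary choices for illustration, and the magnitude spectra are normalized so a pure loudness change alone doesn't trigger a boundary:

```python
import numpy as np

def spectral_boundaries(signal, buffer_size=2048, threshold=0.5):
    """Flag buffer indices where the magnitude spectrum jumps versus the previous buffer."""
    n = len(signal) // buffer_size
    buffers = signal[:n * buffer_size].reshape(n, buffer_size)
    spectra = np.abs(np.fft.rfft(buffers, axis=1))
    spectra /= spectra.sum(axis=1, keepdims=True) + 1e-12  # normalize each spectrum
    diffs = np.abs(np.diff(spectra, axis=0)).sum(axis=1)
    return np.flatnonzero(diffs > threshold) + 1  # buffer index where a new part starts

# Toy signal: one second of 440 Hz followed by one second of 880 Hz, at 8 kHz
rate = 8000
t = np.arange(rate) / rate
signal = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)])
print(spectral_boundaries(signal))
```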
To complete Mateusz's answer, here is a post about using the Fourier transform to generate new features.
Other tools exist to split an audio file into patterns or parts, e.g. pyAudioAnalysis. An explanation is given here.
I'm very new to signal processing. I have two sound recordings, each collected at a sample rate of 10 kHz for 2 seconds. I have imported this data into Python; both sound_1 and sound_2 are NumPy arrays, each of length 20000 of course.
sound_1 contains a water-flow sound (which I'm interested in) plus environmental noise (which I'm not), while sound_2 contains only environmental noise.
I'm looking for an algorithm (or package) which can help me determine the frequency range of this water-flow sound. I think if I can find the frequency range, I can use an inverse Fourier transform to filter out the environmental noise.
However, my ultimate purpose is to extract the water flow sound from sound_1 data and eliminate environmental noise. It would be great if there are other approaches.
I'm currently looking at this post: Python frequency detection
But I don't understand how they can find the frequency from only one sound signal. I think we need to compare at least two signals (one containing the sound I'm interested in, the other not), so we can find the difference.
Since sound_1 contains both water flow and environmental noise, there's no straightforward way of extracting the water flow. The Fourier transform will get you all frequencies in the signal, irrespective of the source.
The way to approach it is to get the frequencies of the environmental noise from sound_2 and then remove them from sound_1. After that is done, you can extract the frequencies from the already-denoised sound_1.
One of the popular approaches to such noise reduction is spectral gating. Essentially, you first determine what the noise sounds like and then remove its smoothed spectrum from your signal. Smoothing is crucial, as sound is a wave, a continuous entity. If you simply chop discrete frequencies out of the wave, you will get very poor results (the audio will sound unnatural and robotic). The amount of smoothing you apply determines how much noise is reduced (mind that it's never truly removed - you will always get some residue).
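For illustration, here is a very stripped-down spectral gate (not the Audacity algorithm; the frame size, sensitivity, and smoothing constants are arbitrary). It builds a per-bin noise threshold from the noise-only recording, gates the signal's short-time spectrum against it, smooths the gate over time, and resynthesizes by overlap-add:

```python
import numpy as np

def spectral_gate(signal, noise, frame=1024, hop=512, sensitivity=1.5, smooth=0.8):
    """Attenuate frequency bins that don't rise above the noise profile."""
    window = np.hanning(frame)
    starts = np.arange(0, len(signal) - frame + 1, hop)

    def stft(x):
        s = np.arange(0, len(x) - frame + 1, hop)
        return np.fft.rfft(np.array([x[i:i + frame] * window for i in s]), axis=1)

    # noise profile: per-bin mean + k * std of the noise magnitude spectrum
    noise_mag = np.abs(stft(noise))
    threshold = noise_mag.mean(axis=0) + sensitivity * noise_mag.std(axis=0)

    spec = stft(signal)
    gate = (np.abs(spec) > threshold).astype(float)
    for i in range(1, len(gate)):  # smooth the gate over time to avoid choppy artifacts
        gate[i] = smooth * gate[i - 1] + (1 - smooth) * gate[i]
    spec *= gate

    out = np.zeros(len(signal))    # overlap-add resynthesis
    for k, i in enumerate(starts):
        out[i:i + frame] += np.fft.irfft(spec[k], n=frame)
    return out

# Hypothetical demo: a 440 Hz tone buried in white noise, 8 kHz sample rate
rate = 8000
rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 440 * np.arange(2 * rate) / rate)
noise_only = 0.1 * rng.standard_normal(2 * rate)
noisy = tone + 0.1 * rng.standard_normal(2 * rate)
cleaned = spectral_gate(noisy, noise_only)
```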
To the concrete solution.
As you're new to the subject, I'd recommend first seeing how noise reduction works in software that does the work for you. Audacity is an excellent choice. I linked the manual for noise reduction, but there are plenty of tutorials out there.
After you know what you want to get, you can either implement spectral gating yourself or use an existing package. Audacity has an excellent implementation in C++, but it may prove difficult for a newbie to port. I'd recommend going first with the noisereduce package, which is based on the Audacity implementation. If you use it, you will be done in a few lines.
Here's a snippet:
import noisereduce as nr
from scipy.io import wavfile

# load the recording that contains the signal plus noise
rate, data = wavfile.read("sound_1.wav")
# load the recording that contains only noise (wavfile.read returns a (rate, data) tuple)
noise_rate, noisy_part = wavfile.read("sound_2.wav")
# perform noise reduction (noisereduce < 2.0 API; newer versions use
# nr.reduce_noise(y=data, sr=rate, y_noise=noisy_part))
reduced_noise = nr.reduce_noise(audio_clip=data, noise_clip=noisy_part, verbose=True)
Now simply run FFT on the reduced_noise to discover the frequencies of water flow.
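That FFT step could look like the sketch below; the peak-picking is deliberately naive (it just takes the largest-magnitude bins, with no windowing or peak grouping), and the toy signal stands in for the denoised water-flow data:

```python
import numpy as np

def dominant_frequencies(signal, rate, top=5):
    """Return the `top` frequencies with the largest magnitude in the spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    order = np.argsort(mag)[::-1]
    return freqs[order[:top]]

# Toy check: a 50 Hz + 120 Hz mixture sampled at 10 kHz for 2 s (20000 samples)
rate = 10000
t = np.arange(2 * rate) / rate
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
print(dominant_frequencies(sig, rate, top=2))
```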
Here's how I am using noisereduce. In this part I am determining the frequency statistics.
I would like to test whether a set of documents has some special similarity, by looking at a graph built from each one's vector representation, shown together with a text dataset of other documents. I guess that they will appear close together in a visualization.
Is the solution to use doc2vec to calculate a vector for each document and plot it? Can it be done in an unsupervised way? Which Python library should I use to get those beautiful 2D and 3D representations of word2vec?
I'm not sure exactly what you're asking, but if you want a way to check whether vectors are of the same type, you could use k-means.
k-means makes a number K of clusters out of a list of vectors, so if you choose a good K (not too low, so it still finds structure, but not too high, so it isn't overly discriminating) it could work.
k-means roughly works this way:
init_centers(K)           # randomly pick K vectors to serve as cluster centers
while not converged():    # tricky; the easiest check is whether the centers moved since the last iteration
    associate_vectors()   # associate every vector with its closest center
    recalculate_centers() # move each center to the mean of all the vectors in its cluster
This article (which includes an animation that is probably clearer than I am) explains it really well, even though the code there is in Java:
https://picoledelimao.github.io/blog/2016/03/12/multithreaded-k-means-in-java/
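The pseudocode above, sketched as runnable NumPy (plain Lloyd's algorithm with an arbitrary random initialization from the data points):

```python
import numpy as np

def kmeans(vectors, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # randomly pick k of the vectors to serve as initial centers
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # associate each vector with its closest center
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        new_centers = np.array([
            vectors[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers

# Hypothetical demo: two well-separated 2-D blobs
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.2, size=(50, 2))
b = rng.normal(5.0, 0.2, size=(50, 2))
labels, centers = kmeans(np.vstack([a, b]), k=2)
```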
I'm working on an EEG signal processing method for recognition of the P300 ERP.
At the moment, I'm training my classifier with a single vector of data that I get by averaging across preprocessed data from a chosen subset of the original 64 channels. I'm using the values from the EEG directly, not frequency features from an FFT. The method actually achieves quite solid performance of around 75% accurate classification.
I would like to improve it by using ICA to clean up the EEG data a bit. I have read through a lot of tutorials and papers and I am still somewhat confused.
I'm implementing my method in Python, so I chose to use sklearn's FastICA.
from sklearn.decomposition import FastICA
self.ica = FastICA(n_components=64,max_iter=300)
icaSignal = self.ica.fit_transform(self.signal)
From a 25256 samples x 64 channels matrix I get a matrix of original sources, which is also 25256x64. The problem is that I'm not quite sure how to use the output.
Averaging those components and training the classifier the same way as with the raw signal reduces performance to less than 30%, so that is probably not the way.
Another approach I read about is rejecting some of the components at this point - the ones that represent eye blinks, muscle activity, etc. - based on their frequency content and other heuristics. I'm also not quite confident about how to do that exactly.
After I reject some of the components, what is the next step? Should I average the ones that are left and feed the classifier with them, or should I try to reconstruct the EEG signal without them - and if so, how do I do that in Python? I wasn't able to find any information about that reconstruction step. It is probably much easier to do in MATLAB, so nobody bothered to write about it :(
Any suggestions? :)
Thank you very much!
I haven't used Python for ICA, but in terms of the steps, it shouldn't matter whether it's MATLAB or Python.
You are completely right that it's hard to reject ICA components. There is no widely accepted objective measurement. There are certain patterns for eye blinks (high voltage in frontal channels) and muscle artifacts (wide spectral coverage, because it's EMG, at peripheral channels). If you don't know where to get started, I recommend reading the documentation of a MATLAB toolbox called EEGLAB. This UCSD group has some nice materials to help you start.
https://eeglab.org/
To answer your question on the ICA reconstruction: after rejecting some ICA components, you should reconstruct the original EEG without them.
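One way to do that reconstruction with sklearn's FastICA is to zero out the rejected columns of the source matrix and call `inverse_transform`. The data and the rejected component indices below are placeholders - in practice the indices come from inspecting the components' topographies and spectra:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# stand-in for the real 25256 samples x 64 channels EEG matrix
X = rng.standard_normal((2000, 8))

ica = FastICA(n_components=8, max_iter=500, random_state=0)
sources = ica.fit_transform(X)             # shape: (samples, components)

bad = [0, 3]                               # hypothetical artifact components (eye blinks etc.)
sources[:, bad] = 0.0                      # reject them
X_clean = ica.inverse_transform(sources)   # back to (samples, channels), artifacts removed
```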
I have some points that I need to classify. Given the collection of these points, I need to say which other (known) distribution they match best. For example, given the points in the top left distribution, my algorithm would have to say whether they are a better match to the 2nd, 3rd, or 4th distribution. (Here the bottom-left would be correct due to the similar orientations)
I have some background in Machine Learning, but I am no expert. I was thinking of using Gaussian Mixture Models, or perhaps Hidden Markov Models (as I have previously classified signatures with these- similar problem).
I would appreciate any help as to which approach to use for this problem. As background information, I am working with OpenCV and Python, so I would most likely not have to implement the chosen algorithm from scratch, I just want a pointer to know which algorithms would be applicable to this problem.
Disclaimer: I originally wanted to post this on the Mathematics section of StackExchange, but I lacked the necessary reputation to post images. I felt that my point could not be made clear without showing some images, so I posted it here instead. I believe that it is still relevant to Computer Vision and Machine Learning, as it will eventually be used for object identification.
EDIT:
I read and considered some of the answers given below, and would now like to add some new information. My main reason for not wanting to model these distributions as a single Gaussian is that eventually I will also have to be able to discriminate between distributions. That is, there might be two different and separate distributions representing two different objects, and then my algorithm should be aware that only one of the two distributions represents the object that we are interested in.
I think this depends on where exactly the data comes from and what sort of assumptions you would like to make as to its distribution. The points above can easily be drawn even from a single Gaussian distribution, in which case the estimation of parameters for each one and then the selection of the closest match are pretty simple.
Alternatively you could go for the discriminative option, i.e. calculate whatever statistics you think may be helpful in determining the class a set of points belongs to and perform classification using SVM or something similar. This can be viewed as embedding these samples (sets of 2d points) in a higher-dimensional space to get a single vector.
Also, if the data is actually as simple as in this example, you could just do principal component analysis and match by the first eigenvector.
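A sketch of that idea, matching point sets by the angle between their first principal axes; the elongated clouds here are synthetic stand-ins for the real distributions:

```python
import numpy as np

def principal_axis(points):
    """Unit first eigenvector of the covariance of an (n, 2) point set."""
    centered = points - points.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centered.T))
    return vecs[:, np.argmax(vals)]  # eigenvector of the largest eigenvalue

def orientation_match(query, candidates):
    """Index of the candidate whose principal axis is most parallel to the query's."""
    q = principal_axis(query)
    scores = [abs(np.dot(q, principal_axis(c))) for c in candidates]  # |cos(angle)|
    return int(np.argmax(scores))

# Hypothetical example: elongated point clouds rotated to known angles (radians)
def cloud(angle, rng):
    pts = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])
    c, s = np.cos(angle), np.sin(angle)
    return pts @ np.array([[c, -s], [s, c]]).T

rng = np.random.default_rng(2)
query = cloud(0.5, rng)
candidates = [cloud(2.0, rng), cloud(0.5, rng), cloud(-1.0, rng)]
print(orientation_match(query, candidates))  # the second candidate shares the orientation
```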
You could just fit the distributions to the data, determine the chi-squared deviation for each one, and compare with an F-test. See for instance these notes on model fitting.
You might also want to consider non-parametric techniques (e.g. multivariate kernel density estimation on each of your new data sets) in order to compare the statistics or distances of the estimated distributions. In Python, scipy.stats.gaussian_kde is an implementation in SciPy.
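A sketch of that non-parametric comparison, scoring each candidate set by the mean log-density its KDE assigns to the query points (the data and the scoring rule are illustrative assumptions, not a standard recipe):

```python
import numpy as np
from scipy.stats import gaussian_kde

def best_match(query, candidates):
    """Pick the candidate set whose KDE gives the query points the highest mean log-density."""
    scores = []
    for cand in candidates:
        kde = gaussian_kde(cand.T)  # gaussian_kde expects shape (dims, n_points)
        scores.append(np.log(kde(query.T) + 1e-300).mean())
    return int(np.argmax(scores))

# Synthetic 2-D point sets: the query matches the second candidate's distribution
rng = np.random.default_rng(3)
query = rng.normal(loc=[0, 0], scale=[2.0, 0.3], size=(100, 2))
candidates = [
    rng.normal(loc=[5, 5], scale=0.5, size=(100, 2)),         # wrong location
    rng.normal(loc=[0, 0], scale=[2.0, 0.3], size=(100, 2)),  # same distribution
    rng.normal(loc=[0, 0], scale=[0.3, 2.0], size=(100, 2)),  # rotated 90 degrees
]
print(best_match(query, candidates))
```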