Let's say I have a piece of music, and I want to find patterns that reproduces themselves so that I can cut out certain areas without it being audible.
For example :
What would be the best approach in python?
I thought about generating a waveform and then slicing it into images to find two similar ones but I don't know where to start and if it's a good idea.
you can split signal into buffers and compare it with fft. If the result of the fft differs from previous one by certain specified value, you can divide the part... but this really depends on what kind of music you are doing this alorithm for - for example, for house music it could be problematic to distinguish parts with fft, so you could for example acquire tempo of the track via waveform of the percussions and measure rms value.. if the rms value is changed, you have next part. The most fun and valid solution for this problem woudl be to actually use neural network, where waveform is your input and output list of the timestamps of certain parts and there you go
To complete Mateusz answer, here is a post about Fourier transform to générate new features.
Other tools exists to split an audio file into patterns or parts using pyAudioAnalysis. An explanation is given here
Related
In school we have to listen to intervals and chords and determine their name. I'm really into neuronal network. Thats why I want to create a neuronal network with Python which listen to the audio and give me the name as an output. I've learned once that for music I need a LSTM. Should I need for this purpose also a LSTM and how/where should I start? Can anybody teach me how to achieve my goal?
first of all you need to exactly define the task you like to solve: Do you like to classify a whole piece of music/track or do you like to classify segments of the piece/track? This will influence which architecture you need to use to solve your task. I will briefly present an approach for each of those tasks.
Classifying a track: Recordings of music are time series, for each of your recordings you need to have a label. Your first intuition of using LSTMs (or RNNs in general) is a good one. Just use your recording transformed into a vector as input-sequence for your LSTM-network and let it put out probabilities for each class. As already indicated by a comment, working in frequency-space can be beneficial. However just using the Fourier-Transformation of the whole track will most likely lose important information since the temporal frequency information is lost. Rather use Short-time Fourier-Transormation (STFT) or Mel-frequency cepstrum coefficients (MFCC, here is a python-library to calculate them: libROSA). Very oversimplified, those methods will transform your time series into some kind of 'image', an two-dimensional frequncy-spectrum, and for image classification task Convolutional Neural Networks (CNNs) are the way to go.
Classifying segments: If you like to classify segments of your track you need to have a labels for each time-frame in your song. Lets say your song is 3 minutes long and you have a sampling frequency of 60 Hz, your vector representation of the song will have 3*60*60 = 10800 time-frames, thus for each of the entries you need to provide a class label (chord or whatever). Again you can use LSTMs, use your vector as input-sequence and let your network produce an output sequence of the same length of your song and compare it to the class labels. You also could use the previously mentioned STFT- or MFC-coefficients as inputs and take advantage of the frequency information, now you will have a spectrum for each time-frame as input.
I hope these broad ideas will bring you one step closer to solve your task. For implementation details I like to point you to the keras documentation and to countless tutorials on the internet.
Disclaimer:
My knowledge of music theory is rather limited, so please take my answer with a grain of salt and feel free to correct me or ask for clarification. Have fun
Does aubio have a way to detect sections of a piece of audio that lack tonal elements -- rhythm only? I tested a piece of music that has 16 seconds of rhythm at the start, but all the aubiopitch and aubionotes algorithms seemed to detect tonality during the rhythmic section. Could it be tuned somehow to distinguish tonal from non-tonal onsets? Or is there a related library that can do this?
Been busy the past couple of days - but started looking into this today...
It'll take a while to perfect I guess but I thought I'd give you a few thoughts and some code I've started working on to attack this!
Firstly, pseudo code's a good way to design an initial method.
1/ use import matplotlib.pyplot as plt to spectrum analyse the audio, and plot various fft and audio signals.
2/ import numpy as np for basic array-like structure handling.
(I know this is more than pseudo code, but hey :-)
3/ plt.specgram creates spectral maps of your audio. Apart from the image it creates (which can be used to start to manually deconstruct your audio file), it returns 4 structures.
eg
ffts,freqs,times,img = plt.specgram(signal,Fs=44100)
ffts is a 2 dimentional array where the columns are the ffts (Fast Fourier Transforms) of the time sections (rows).
The plain vanilla specgram analyses time sections of 256 samples long, stepping 128 samples forwards each time.
This gives a very low resolution frequency array at a pretty fast rate.
As musical notes merge into a single sound when played at more or less 10 hz, I decided to use the specgram options to divide the audio into 4096 sample lengths (circa 10 hz) stepping forwards every 2048 samples (ie 20 times a second).
This gives a decent frequency resolution, and the time sections being 20th sec apart are faster than people can perceive individual notes.
This means calling the specgram as follows:
plt.specgram(signal,Fs=44100,NFFT=4096,noverlap=2048,mode='magnitude')
(Note the mode - this seems to give me amplitudes of between 0 - 0.1: I have a problem with fft not giving me amplitudes of the same scale as the audio signal (you may have seen the question I posted). But here we are...
4/ Next I decided to get rid of noise in the ffts returned. This means we can concentrate on freqs of a decent amplitude, and zero out the noise which is always present in ffts (in my experience).
Here is (are) my function(s):
def gate(signal,minAmplitude):
return np.array([int((((a-minAmplitude)+abs(a-minAmplitude))/2) > 0) * a for a in signal])
Looks a bit crazy - and I'm sure a proper mathematician could come up with something more efficient - but this is the best I could invent. It zeros any freqencies of amplitude less than minAmplitude.
This is the relevant code to call it from the ffts returned by plt.specgram as follows, my function is more involved as it is part of a class, and has other functions it references - but this should be enough:
def fft_noise_gate(minAmplitude=0.001,check=True):
'''
zero the amplitudes of frequencies
with amplitudes below minAmplitude
across self.ffts
check - plot middle fft just because!
'''
nffts = ffts.shape[1]
gated_ffts = []
for f in range(nffts):
fft = ffts[...,f]
# Anyone got a more efficient noise gate formula? Best I could think up!
fft_gated = gate(fft,minAmplitude)
gated_ffts.append(fft_gated)
ffts = np.array(gated_ffts)
if check:
# plot middle fft just to see!
plt.plot(ffts[int(nffts/2)])
plt.show(block=False)
return ffts
This should give you a start I'm still working on it and will get back to you when I've got further - but if you have any ideas, please share them.
Any way my strategy from here is to:
1/ find the peaks ( ie start of any sounds) then
2/ Look for ranges of frequencies which rise and fall in unison (ie make up a sound).
And
3/ Differentiate them into individual instruments (sound sources more specifically), and plot the times and amplitudes thereof to create your analysis (score).
Hope you're having fun with it - I know I am.
As I said any thoughts...
Regards
Tony
Use a spectrum analyser to detect sections with high amplitude. If you program - you could take each section and make an average of the freqencies (and amplitudes) present to give you an idea of the instrument(s) involved in creating that amplitude peak.
Hope that helps - if you're using python I could give you some pointers how to program this!?
Regards
Tony
I'm very new to signal processing. I have two sound signal data right now. Each of the data is collected at a sample rate of 10 KHz, 2 seconds. I have imported this data into python. Both sound_1 and sound_2 is a numpy array right now. The length of each sound data is 20000 of course.
Sound_1 contains a water flow sound(which I'm interested) and environmental noise(I'm not interested), while sound_2 only contains environment noise(I'm not interested).
I'm looking for an algorithm(or package) which can help me determine the frequency range of this water flow sound. I think if I can find out the frequency range, I can use an inverse Fourier transform to filter the environment noise.
However, my ultimate purpose is to extract the water flow sound from sound_1 data and eliminate environmental noise. It would be great if there are other approaches.
I'm currently looking at this post: Python frequency detection
But I don't understand how they can find out the frequency by only one sound signal. I think we need to compare 2 signal data at least(one contains the sound I am interested, the other doesn't), so we can find out the difference.
Since sound_1 contains both water flow and environmental noise, there's no straightforward way of extracting the water flow. The Fourier transform will get you all frequencies in the signal, irrespective of the source.
The way to approach is get frequencies of environmental noise from sound_2 and then remove them from sound_1. After that is done, you can extract the frequencies from already denoised sound_1.
One of popular approaches to such noise reduction is with spectral gating. Essentially, you first determine how the noise sounds like and then remove smoothed spectrum from your signal. Smoothing is crucial, as sound is a wave, a continuous entity. If you simply chop out discrete frequencies from the wave, you will get very poor results (audio will sound unnatural and robotic). The amount of smoothing you apply will determine how much noise is reduced (mind it's never truly removed - you will always get some residue).
To the concrete solution.
As you're new to the subject, I'd recommend first how noise reduction works in a software that will do the work for you. Audacity is an excellent choice. I linked the manual for noise reduction, but there are plenty of tutorials out there.
After you know what you want to get, you can either implement spectral gating yourself or use existing package. Audacity has an excellent implementation in C++, but it may prove difficult to a newbie to port. I'd recommend going first with noisereduce package. It's based on Audacity implementation. If you use it, you will be done in a few lines.
Here's a snippet:
import noisereduce as nr
# load data
rate, data = wavfile.read("sound_1.wav")
# select section of data that is noise
noisy_part = wavfile.read("sound_2.wav")
# perform noise reduction
reduced_noise = nr.reduce_noise(audio_clip=data, noise_clip=noisy_part, verbose=True)
Now simply run FFT on the reduced_noise to discover the frequencies of water flow.
Here's how I am using noisereduce. In this part I am determining the frequency statistics.
My idea is that instead of using DFT/FFT to calculate guitar chords or notes, I can attempt a different approach. How I plan to do it is so (in Python)
Record myself playing a large range of notes and chords and store them as a large dataset.
After this, I would create another recording of me doing a single note or chord which needs to be recognised.
In a similar manner of which I compare two datasets using spearman's rank correlation coefficient, I could compare the recording to each file in the dataset and see which note or chord is most similar.
For my situation, I aim for this calculation to occur as I play, so no preprocessing would be involved. To do this I would need to calibrate the background noise volume, so I could distinct each note/chord from each other.
Diagram to explain the concept:
To help explain my idea, I have created a simple image which has a dataset of two notes.
Imgur link of the diagram
My question is how viable this would be, and if it is feasible, how would I conduct it in Python?
Thanks,
Aj.
Anybody able to supply links, advice, or other forms of help to the following?
Objective - use python to classify 10-second audio samples so that I afterwards can speak into a microphone and have python pick out and play snippets (faded together) of closest matches from db.
My objective is not to have the closest match and I don't care what the source of the audio samples is. So the result is probably of no use other than speaking in noise (fun).
I would like the python app to be able to find a specific match of FFT for example within the 10 second samples in the db. I guess the real-time sampling of the microphone will have a 100 millisecond buffersample.
Any ideas? FFT? What db? Other?
In order to do this, you need three things:
Segmentation (decide how to make your audio samples)
Feature Extraction (decide what audio feature (e.g. FFT) you care about)
Distance Metric (decide what the "closest" sample is)
Segmentation: you currently describe using 10-second samples. I think you might have better results with shorter segments (closer to 100-1000ms) in order to get something that fits the changes in the voice better.
Feature Extraction: you mention using FFT. The zero crossing rate is surprisingly ok considering how simple it is. If you want to get more fancy, using MFCCs or spectral centroid is probably the way to go.
Distance Metric: most people use the euclidean distance, but there are also fancier ones like the manhattan distance, cosine distance, and earth-movers distance.
For a database, if you have a small enough set of samples, you might try just loading everything up into a kdtree so that you can do fast distance calculations, and just hold it in memory.
Good luck! It sounds like a fun project.
Try searching for algorithms on "music fingerprinting".
You could try some typical short-term feature extraction (e.g. energy, zero crossing rate, MFCCs, spectral features, chroma, etc) and then model your segment through a vector of feature statistics. Then you could use a simple distance-based classifier (e.g. kNN) in order to retrieve the "closest" training samples from a manually laballed set, given an unknown "query".
Check out my lib on several Python Audio Analysis functionalities: pyAudioAnalysis