Splitting AudioSegments

Splitting AudioSegments - python

Here I am practicing analyzing audio(wav format) in order to remove low volumes in given range and export to new audio. It was formatted to int16 array and max value gave +(some number), min gave -(some number). Now as a result the output audio is too small and i think the problem is in wrong range. So how to choose the right range? I gave it between min/2 and max/2.
from pydub import AudioSegment
import io
import scipy.io.wavfile
import IPython
import numpy as np
w = AudioSegment.from_file("input.wav", format="wav")
a = w.get_array_of_samples()
fp_arr = np.array(a).T.astype(np.int16)
avg = (max(fp_arr)/2).astype(np.int16)
avg2= (min(fp_arr)/2).astype(np.int16)
b=[]
for d in a:
if d not in range(avg2,avg) :#d<avg2 and d>avg:
b.append(d)
myarray = np.asarray(b)
wav_io = io.BytesIO()
scipy.io.wavfile.write(wav_io, 16000, myarray)
wav_io.seek(0)
sound = AudioSegment.from_wav(wav_io)
file_handle = sound.export("output.wav", format="wav")

If you reject some samples without replacing them by something, it's normal for the resulting wave to be shorter. If what you plane to do is a kind of noise gate, you should probably replace the eliminated samples by silence instead.
However, a real noise gate, as any dynamic processor, works a little bit differently. First if follows the enveloppe of the signal meaning that it doesn't take into account each oscillation around the axis (if you do that, you'll cut some samples inside each oscillation, meaning several dozen of times per second, which is probably not what you want to do). Instead, a noise gate analyses the variation of amplitude at a highest temporal level. After that step, the resulting enveloppe contains no negative value anymore. When this enveloppe goes below the defined threshold (let's say 0.125 for power, or an equivalent integer value in 16 or 24 bits), it takes a few milliseconds to make a little fade out (it means that it multiply the amplitude by a factor going progressively from 1 to 0). At the contrary, when the signal passes above the threshold again, it reopens the gate with a little fade in.
If you bypass these little fades in/out, the resulting wave will contains unpleasant numeric clicks. If you bypass the enveloppe follower used to smooth the amplitude, you will close the gate a lot too often.

Related

Concatenating Waves in Python using Numpy

I'm try to produce a tone that transitions linearly between a list of arbitrary frequencies with respect to time. I'm using scipy.signal to create waves that transition between pairs of frequencies, then concatenating them. When the frequencies are round numbers and relatively far from each other this works. When the numbers are close together or aren't as nice I get pops between each transition.
What is causing these pops, why only in the first case and not the other two, and what can I do to fix it? (Or if there's a better/easier way to do what I'm trying to do, what is it?)
Any thoughts would be very much appreciated.
from scipy.signal import sweep_poly
import numpy
import sounddevice
sample_rate = 44100.0
time = 1.0
amplitude = 10000
sounddevice.default.samplerate = sample_rate
freq_list = [100, 200, 100, 200]
#freq_list = [100.05,200.21,100.02,200.65,100.16]
#freq_list = [100,101,102,103]
#get samples for each segment
samples = numpy.arange(sample_rate * time) / sample_rate
#make all the different segments
wave_list = []
for i in range(len(freq_list)-1):
wave_list.append(amplitude * sweep_poly(samples, [float(freq_list[i+1])-float(freq_list[i]),float(freq_list[i])]))
#join them together
wave = numpy.concatenate(wave_list)
#convert it to wav format (16 bits)
wav_wave = numpy.array(wave, dtype=numpy.int16)
sounddevice.play(wav_wave, blocking=True)

My comment was correct. The problem was that when I made a new wave it always started at its peak, which didn't necessarily line up with the old wave, resulting in discontinuities:
I fixed this by setting the phase offset parameter of sweep_poly to (180/math.pi)*math.acos(prev_point/amplitude) where prev_point was the last point in the previous sine wave.
Unfortunately since sine isn't one-to-one sometimes I got waves where the values matched, but the slopes didn't:
My fix for this was to check if the signs of the slopes matched, and if they didn't, slowly increase the offset until they did, then continue slowly increasing the offset until the discontinuity was small (<10). I'm sure this isn't the nicest or most mathematically satisfying way to solve this, but it works well enough for me. Now I have beautiful (pretty close to) continuously differentiable waves.

FFT sound analysis yields the correct note but in another octave [duplicate]

I have implemented Demetri's Pitch Detector project for the iPhone and hitting up against two problems. 1) any sort of background noise sends the frequency reading bananas and 2) lower frequency sounds aren't being pitched correctly. I tried to tune my guitar and while the higher strings worked - the tuner could not correctly discern the low E.
The Pitch Detection code is located in RIOInterface.mm and goes something like this ...
// get the data
AudioUnitRender(...);
// convert int16 to float
Convert(...);
// divide the signal into even-odd configuration
vDSP_ctoz((COMPLEX*)outputBuffer, 2, &A, 1, nOver2);
// apply the fft
vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_FORWARD);
// convert split real form to split vector
vDSP_ztoc(&A, 1, (COMPLEX *)outputBuffer, 2, nOver2);
Demetri then goes on to determine the 'dominant' frequency as follows:
float dominantFrequency = 0;
int bin = -1;
for (int i=0; i<n; i+=2) {
float curFreq = MagnitudeSquared(outputBuffer[i], outputBuffer[i+1]);
if (curFreq > dominantFrequency) {
dominantFrequency = curFreq;
bin = (i+1)/2;
}
}
memset(outputBuffer, 0, n*sizeof(SInt16));
// Update the UI with our newly acquired frequency value.
[THIS->listener frequencyChangedWithValue:bin*(THIS->sampleRate/bufferCapacity)];
To start with, I believe I need to apply a LOW PASS FILTER ... but I'm not an FFT expert and not sure exactly where or how to do that against the data returned from the vDSP functions. I'm also not sure how to improve the accuracy of the code in the lower frequencies. There seem to be other algorithms to determine the dominant frequency - but again, looking for a kick in the right direction when using the data returned by Apple's Accelerate framework.
UPDATE:
The accelerate framework actually has some windowing functions. I setup a basic window like this
windowSize = maxFrames;
transferBuffer = (float*)malloc(sizeof(float)*windowSize);
window = (float*)malloc(sizeof(float)*windowSize);
memset(window, 0, sizeof(float)*windowSize);
vDSP_hann_window(window, windowSize, vDSP_HANN_NORM);
which I then apply by inserting
vDSP_vmul(outputBuffer, 1, window, 1, transferBuffer, 1, windowSize);
before the vDSP_ctoz function. I then change the rest of the code to use 'transferBuffer' instead of outputBuffer ... but so far, haven't noticed any dramatic changes in the final pitch guess.

Pitch is not the same as peak magnitude frequency bin (which is what the FFT in the Accelerate framework might give you directly). So any peak frequency detector will not be reliable for pitch estimation. A low-pass filter will not help when the note has a missing or very weak fundamental (common in some voice, piano and guitar sounds) and/or lots of powerful overtones in its spectrum.
Look at a wide-band spectrum or spectrograph of your musical sounds and you will see the problem.
Other methods are usually needed for a more reliable estimate of musical pitch. Some of these include autocorrelation methods (AMDF, ASDF), Cepstrum/Cepstral analysis, harmonic product spectrum, phase vocoder, and/or composite algorithms such as RAPT (Robust Algorithm for Pitch Tracking) and YAAPT. An FFT is useful as only a sub-part of some of the above methods.

At the very least you need to apply a window function to your time domain data, prior to calculating the FFT. Without this step the power spectrum will contain artefacts (see: spectral leakage) which will interfere with your attempts at extracting pitch information.
A simple Hann (aka Hanning) window should suffice.

What is your sample frequency and blocksize? Low E is around 80 Hz, so you need to make sure your capture block is long enough to capture many cycles at this frequency. This is because the Fourier Transform divides the frequency spectrum into bins, each several Hz wide. If you sample at 44.1 kHz and have a 1024 point time domain sample, for instance, each bin will be 44100/1024 = 43.07 Hz wide. Thus a low E would be in the second bin. For a bunch of reasons (to do with spectral leakage and the nature of finite time blocks), practically speaking you should consider the first 3 or 4 bins of data in an FFT result with extreme suspicion.
If you drop the sample rate to 8 kHz, the same blocksize gives you bins that are 7.8125 Hz wide. Now low E will be in the 10th or 11th bin, which is much better. You could also use a longer blocksize.
And as Paul R points out, you MUST use a window to reduce spectral leakage.

The frequency response function of the iPhone drops off below 100 - 200 Hz (see http://blog.faberacoustical.com/2009/ios/iphone/iphone-microphone-frequency-response-comparison/ for an example).
If you are trying to detect the fundamental mode of a low guitar string, the microphone might be acting as a filter and suppressing the frequency you are interested in. There are a couple of options if you interested in using the fft data you can get - you can window the data in the frequency domain around the note you are trying to detect so that all you can see is the first mode even if it is of lower magnitude than higher modes(i.e. have a toggle to tune the first string and put it in this mode).
Or you can low pass filter the sound data - you can do this either in the time domain or even easier since you already have frequency domain data, in the frequency domain. A very simple time domain low pass filter is to do a time-moving average filter. A very simple frequency domain low pass filter is to multiply your fft magnitudes by a vector with 1's in the low frequency range and a linear (or even a step) ramp down in the higher frequencies.

Find plateau in Numpy array

I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively broad A simple example of what this data could look like:
test=np.random.uniform(0.9,1,100)
test[10:20]=0
plt.plot(test)
Note that there can be multiple plateaus (which should all be detected) which can have different values.
I've tried using scipy.signal.argrelextrema, but it doesn't seem to be doing what I want it to:
peaks=argrelextrema(test,np.less,order=25)
plt.vlines(peaks,ymin=0, ymax=1)
I don't need the exact interval of the plateau- a rough range estimate would be enough, as long as that estimate is bigger or equal than the actual plateau range. It should be relatively efficient however.

There is a method scipy.signal.find_peaks that you can try, here is an exmple
import numpy
from scipy.signal import find_peaks
test = numpy.random.uniform(0.9, 1.0, 100)
test[10 : 20] = 0
peaks, peak_plateaus = find_peaks(- test, plateau_size = 1)
although find_peaks only finds peaks, it can be used to find valleys if the array is negated, then you do the following
for i in range(len(peak_plateaus['plateau_sizes'])):
if peak_plateaus['plateau_sizes'][i] > 1:
print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))
it will print
a plateau of size 10 is found
its left index is 10 and right index is 19

This is really just a "dumb" machine learning task. You'll want to code a custom function to screen for them. You have two key characteristics to a plateau:
They're consecutive occurrences of the same value (or very nearly so).
The first and last points deviate strongly from a forward and backward moving average, respectively. (Try quantifying this based on the standard deviation if you expect additive noise, for geometric noise you'll have to take the magnitude of your signal into account too.)
A simple loop should then be sufficient to calculate a forward moving average, stdev of points in that forward moving average, reverse moving average, and stdev of points in that reverse moving average.
Read until you find a point well outside the regular noise (compare to variance). Start buffering those indices into a list.
Keep reading and buffering indices into that list while they have the same value (or nearly the same, if your plateaus can be a little rough; you'll want to use some tolerance plus the standard deviation of your plateaus, or just some tolerance if you expect them all to behave similarly).
If the variance of the points in your buffer gets too high, it's not a plateau, too rough; throw it out and start scanning again from your current position.
If the last value was very different from the previous (on the order of the change that triggered your code to start buffering indices) and in the opposite direction of the original impulse, cap your buffer here; you've got a plateau there.
Now do whatever you want with the points at those indices. Delete them, replace them with a linear interpolation between the two boundary points, whatever.
I could generate some noise and give you some sample code, but this is really something you're going to have to adapt to your application. (For example, there's a shortcoming in this method that a plateau which captures a point on the middle of the "cliff edge" may leave that point when it removes the rest of the plateau. If that's something you're worried about, you'll have to do a little more exploring after you ID the plateau.) You should be able to do this in a single pass over the data, but it might be wise to get some statistics on the whole set first to intelligently tweak your thresholds.
If you have an exact definition of what constitutes a plateau, you can make this a lot less hand-wavey and ML-looking, but so long as you're trying to identify fuzzy pattern, you're gonna have to take a statistics-based approach.

I had a similar problem, and found a simple heuristic solution shared below. I find plateaus as ranges of constant gradient of the signal. You could change the code to also check that the gradient is (close to) 0.
I apply a moving average (uniform_filter_1d) to filter out noise. Also, I calculate the first and second derivative of the signal numerically, so I'm not sure it matches the requirement of efficiency. But it worked perfectly for my signal and might be a good starting point for others.
def find_plateaus(F, min_length=200, tolerance = 0.75, smoothing=25):
'''
Finds plateaus of signal using second derivative of F.
Parameters
----------
F : Signal.
min_length: Minimum length of plateau.
tolerance: Number between 0 and 1 indicating how tolerant
the requirement of constant slope of the plateau is.
smoothing: Size of uniform filter 1D applied to F and its derivatives.
Returns
-------
plateaus: array of plateau left and right edges pairs
dF: (smoothed) derivative of F
d2F: (smoothed) Second Derivative of F
'''
import numpy as np
from scipy.ndimage.filters import uniform_filter1d
# calculate smooth gradients
smoothF = uniform_filter1d(F, size = smoothing)
dF = uniform_filter1d(np.gradient(smoothF),size = smoothing)
d2F = uniform_filter1d(np.gradient(dF),size = smoothing)
def zero_runs(x):
'''
Helper function for finding sequences of 0s in a signal
https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
'''
iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
absdiff = np.abs(np.diff(iszero))
ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
return ranges
# Find ranges where second derivative is zero
# Values under eps are assumed to be zero.
eps = np.quantile(abs(d2F),tolerance)
smalld2F = (abs(d2F) <= eps)
# Find repititions in the mask "smalld2F" (i.e. ranges where d2F is constantly zero)
p = zero_runs(np.diff(smalld2F))
# np.diff(p) gives the length of each range found.
# only accept plateaus of min_length
plateaus = p[(np.diff(p) > min_length).flatten()]
return (plateaus, dF, d2F)

Sign on results of fft

I am attempting to calculate the MTF from a test target. I calculate the spread function easily enough, but the FFT results do not quite make sense to me. To summarize,the values seem to alternate giving me a reflection of what I would expect. To test, I used a simple square wave and numpy:
from numpy import fft
data = []
for x in range (0, 20):
data.append(0)
data[9] = 10
data[10] = 10
data[11] = 10
dataFFT = fft.fft(data)
The results look correct, with the exception of the sign... I am seeing the following for the first 4 values as an example:
30.00000000 +0.00000000e+00j
-29.02113033 +7.10542736e-15j
26.18033989 -1.24344979e-14j
-21.75570505 +1.24344979e-14j
So my question is why positive->negative->positive->negative in the real plane? This is not what I would expect... It I plot it, it almost appears that the correct function is mirrored around the x axis.
Note: I was expecting the following as an example:
This is what I am getting:

Your pulse is symmetric and positioned in the center of your FFT window (around N/2). Symmetric real data corresponds to only the cosine or "real" components of an FFT result. Note that the cosine function alternates between being -1 and 1 at the center of the FFT window, depending on the frequency bin index (representing cosine periods per FFT width). So the correlation of these FFT basis functions with a positive going pulse will also alternate as long as the pulse is narrower than half the cosine period.
If you want the largest FFT coefficients to be mostly positive, try centering your narrow rectangular pulse around time 0 (or circularly, time N), where the cosine function is always 1 for any frequency.

It works if you shift the data around 0 instead of half your array, with:
dataFFT = fft.fft(np.fftshift(data))

This isn't all that unexpected. If you want to check against conventional plots, make sure you convert that info to magnitude and phase before coming to any conclusions.
I did a quick check using your code and numpy.abs for mag, numpy,angle for phase. It sure looks like a sinc() function to me, which is what would be expected if the time-domain is a square pulse. If you do this, you'll find a pretty wide sinc, as would be expeceted for a short duration pulse on so few samples.

you forget to specify if your data is Real or Complex
not everyone code in python/numpy (including me) and if you do not know this then you probably handle data to/from FFT the wrong way.
FFT input can be both real or complex domain
FFT output is complex domain
so check the docs for your FFT implementation and specify it and also repair your data handling accordingly. Complex domain usually have first value Re and Second Im but that depends on FFT implementation/configuration.
signal
here is an example of impulse response from FFT
first is input Real domain signal (Im=0) single finite nonzero width pulse and second is the Re part of FFT output. The third is the Im part of FFT output. If you zoom it a bit then you will see amplitude range of y axis of each signal (on left).
Do not forget that different FFT implementations can have different normalization constants which will change the amplitude of signal. If you want magnitude and phase convert it like this:
mag=sqrt(Re*Re+Im*Im); // power
ang=atanxy(Re,Im); // phase angle
atanxy(dx,dy) is 4 quadrant arctan also called atan2 but be careful to get the operand order the same as your atanxy/atan2 implementation needs. Also can use mine C++ atanxy implementation
[Notes]
if your input signal is Real domain then FFT output is symmetric. Both Re and Im signals will be like:
{ a0,a1,a2,a3,...,a(n-1),a(n-1)...,a3,a2,a1,a0 }
exactly like on the image above. On the left are low frequencies and in the middle is the top frequency. If your input signal is Complex domain then the output can be anything.

Clipping FFT Matrix

Audio processing is pretty new for me. And currently using Python Numpy for processing wave files. After calculating FFT matrix I am getting noisy power values for non-existent frequencies. I am interested in visualizing the data and accuracy is not a high priority. Is there a safe way to calculate the clipping value to remove these values, or should I use all FFT matrices for each sample set to come up with an average number ?
regards
Edit:
from numpy import *
import wave
import pymedia.audio.sound as sound
import time, struct
from pylab import ion, plot, draw, show
fp = wave.open("500-200f.wav", "rb")
sample_rate = fp.getframerate()
total_num_samps = fp.getnframes()
fft_length = 2048.
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
tempb = fp.readframes(fft_length);
data = struct.unpack("%dH"%(fft_length), tempb)
temp[i,:] = array(data, short)
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[0])
show()
Here is the plot in non-logarithmic scale, for synthetic wave file containing 500hz(fading out) + 200hz sine wave created using Goldwave.

Simulated waveforms shouldn't show FFTs like your figure, so something is very wrong, and probably not with the FFT, but with the input waveform. The main problem in your plot is not the ripples, but the harmonics around 1000 Hz, and the subharmonic at 500 Hz. A simulated waveform shouldn't show any of this (for example, see my plot below).
First, you probably want to just try plotting out the raw waveform, and this will likely point to an obvious problem. Also, it seems odd to have a wave unpack to unsigned shorts, i.e. "H", and especially after this to not have a large zero-frequency component.
I was able to get a pretty close duplicate to your FFT by applying clipping to the waveform, as was suggested by both the subharmonic and higher harmonics (and Trevor). You could be introducing clipping either in the simulation or the unpacking. Either way, I bypassed this by creating the waveforms in numpy to start with.
Here's what the proper FFT should look like (i.e. basically perfect, except for the broadening of the peaks due to the windowing)
Here's one from a waveform that's been clipped (and is very similar to your FFT, from the subharmonic to the precise pattern of the three higher harmonics around 1000 Hz)
Here's the code I used to generate these
from numpy import *
from pylab import ion, plot, draw, show, xlabel, ylabel, figure
sample_rate = 20000.
times = arange(0, 10., 1./sample_rate)
wfm0 = sin(2*pi*200.*times)
wfm1 = sin(2*pi*500.*times) *(10.-times)/10.
wfm = wfm0+wfm1
# int test
#wfm *= 2**8
#wfm = wfm.astype(int16)
#wfm = wfm.astype(float)
# abs test
#wfm = abs(wfm)
# clip test
#wfm = clip(wfm, -1.2, 1.2)
fft_length = 5*2048.
total_num_samps = len(times)
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
temp[i,:] = wfm[i*fft_length:(i+1)*fft_length]
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[2], linewidth=3)
xlabel("freq (Hz)")
ylabel('abs(FFT)')
show()

FFT's because they are windowed and sampled cause aliasing and sampling in the frequency domain as well. Filtering in the time domain is just multiplication in the frequency domain so you may want to just apply a filter which is just multiplying each frequency by a value for the function for the filter you are using. For example multiply by 1 in the passband and by zero every were else. The unexpected values are probably caused by aliasing where higher frequencies are being folded down to the ones you are seeing. The original signal needs to be band limited to half your sampling rate or you will get aliasing. Of more concern is aliasing that is distorting the area of interest because for this band of frequencies you want to know that the frequency is from the expected one.
The other thing to keep in mind is that when you grab a piece of data from a wave file you are mathmatically multiplying it by a square wave. This causes a sinx/x to be convolved with the frequency response to minimize this you can multiply the original windowed signal with something like a Hanning window.

It's worth mentioning for a 1D FFT that the first element (index [0]) contains the DC (zero-frequency) term, the elements [1:N/2] contain the positive frequencies and the elements [N/2+1:N-1] contain the negative frequencies. Since you didn't provide a code sample or additional information about the output of your FFT, I can't rule out the possibility that the "noisy power values at non-existent frequencies" aren't just the negative frequencies of your spectrum.
EDIT: Here is an example of a radix-2 FFT implemented in pure Python with a simple test routine that finds the FFT of a rectangular pulse, [1.,1.,1.,1.,0.,0.,0.,0.]. You can run the example on codepad and see that the FFT of that sequence is
[0j, Negative frequencies
(1+0.414213562373j), ^
0j, |
(1+2.41421356237j), |
(4+0j), <= DC term
(1-2.41421356237j), |
0j, v
(1-0.414213562373j)] Positive frequencies
Note that the code prints out the Fourier coefficients in order of ascending frequency, i.e. from the highest negative frequency up to DC, and then up to the highest positive frequency.

I don't know enough from your question to actually answer anything specific.
But here are a couple of things to try from my own experience writing FFTs:
Make sure you are following Nyquist rule
If you are viewing the linear output of the FFT... you will have trouble seeing your own signal and think everything is broken. Make sure you are looking at the dB of your FFT magnitude. (i.e. "plot(10*log10(abs(fft(x))))" )
Create a unitTest for your FFT() function by feeding generated data like a pure tone. Then feed the same generated data to Matlab's FFT(). Do a absolute value diff between the two output data series and make sure the max absolute value difference is something like 10^-6 (i.e. the only difference is caused by small floating point errors)
Make sure you are windowing your data
If all of those three things work, then your fft is fine. And your input data is probably the issue.
Check the input data to see if there is clipping http://www.users.globalnet.co.uk/~bunce/clip.gif
Time doamin clipping shows up as mirror images of the signal in the frequency domain at specific regular intervals with less amplitude.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.