I am currently creating a network in keras to perform harmonic/percussive source separation on an audio spectrogram using a median filtering technique (http://dafx10.iem.at/papers/DerryFitzGerald_DAFx10_P15.pdf).
Given an input magnitude spectrogram S, and denoting the ith time frame as Si, and the hth frequency slice as Sh, a percussion-enhanced spectrogram frame Pi can be generated by performing median filtering on Si : Pi = M{Si, lperc} where M denotes the median filtering and lperc is the filter length. The individual percussion-enhanced frames Pi are then combined to yield a percussion-enhanced spectrogram P. Similarly, a harmonic-enhanced spectrogram frequency slice Hh can be obtained by median filtering frequency slice Sh : Hi = M{Sh, lharm}.
Once you have P and H, you can see whether each frequency bin Sh,i belongs either to the harmonic or the percussive source : if Hh,i > Ph,i, Sh,i goes to the harmonic spectrogram and takes the value 0 in the percussive spectrogram, and vice versa.
In my network, given the input spectrogram and for a specific time frame Si, I need to compute the medians horizontally for each frequency h. This can be easily done with a lambda layer and tensorflow :
layer_H = Lambda(lambda x:tf.contrib.distributions.percentile(x[0], 50, axis=0))(layer)
Here, the length of the harmonic median filter lharm is the horizontal length of the input spectrogram. The output is a vector whose size is equal to the number of frequencies (in my case, 88).
The next step is where I am stuck right now : I need to compute the medians vertically for the current time frame Si, given the length of the percussive median filter lperc, and knowing that I want the resulting vector to be the same size as the input, so I have to be careful on each end of the the input (the size of the filter will be between lharm and lharm/2 depending on where we're at). This looks like some sort of convolution, for lack of a better word.
Once I have the two resulting vectors Hi and Pi, I want to compare them and assign each value of the original frame Si to either a percussive layer (Lp) or a harmonic layer (Lh). So, I have three different inputs, Hi, Pi and Si, and I want to end up with Lp and Lh by comparing Hi and Pi, and continue building my network from there. If Hi,j > Pi,j, then Lpi,j = 0 and Lhi,j = Si,j.
To sum up, I am stuck on two different problems :
How to compute the horizontal medians ?
How to implement in the network the operations that will allow me to go from Hi, Pi and Si to Lp and Lh ?
Thank you very much in advance !
Related
I have been reading this paper where they have used Error Level Analysis (ELA) and Block Artifact Grid (BAG).
I have found a way to do the ELA using skimage from here
I want a final output of the BAG part showed in the diagram below [Look at the BAG output specifically] -
But I haven't found a single implementation of BAG in python:
Divide the image into 8 × 8 blocks. Take the DCT of the blocks (using an 8 × 8 DCT matrix and matrix multiply).
Make a histogram of the color-quantized DCT values for each of the 64 locations of the blocks (where the number of blocks and the number
of values in each histogram is equal to the number that can fit into
the image).
Take the Fast-Fourier Transform (FFT) of the histogram of each of the 64 frequencies to get the periodicity and then power spectrum to
get peaks.
Calculate the number of local minimums of the extrema. This is the estimated Q value.
Get a Q estimate for at least 32 Q values, and use it to calculate the block artifact (error in the Q value) for each image block. Output
an error map of the image.
What I have tried
I can try and figure out individual pieces like taking DCT, calculating FFT but unable to put everything together, especially Q estimate for at least 32 Q values, which is not making sense to me. Thanks in advance.
I am using welch in python.
Which parameter in welch does define the length of the output array?
Based on my trials, the output length is related to nperseg/2; but I cannot understand its reason and mathematics. And, I am not sure about the effect of other parameters on the output length.
Also, there is not enough explanation in its documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.welch.html)
I will be more than happy if anyone can help me. I could not find any clear info on the web!
Parsing the sentence from the documents
Welch’s method [1] computes an estimate of the power spectral density by dividing the data into overlapping segments, computing a modified periodogram for each segment and averaging the periodograms.
def periodogram(x, Ts=1.0):
'''
This function will compute one score for each period
in
'''
return abs(np.fft.rfft(x))**2
Then they say that the data is divided in overlapped segments and then added, so it is something like this
def welsh(data, nperseg, noverlap, window):
stride = nperseg - noverlap
return sum(periodogram(data[i*stride:i*stride+nperseg])
for i in range((len(data) - nperseg) // stride))
i.e. you take segments of length nperseg to compute the periodograms, since you can get the periodogram only up to the Nyquist frequency, you end up having nperseg/2 (for nperseg).
I am using this algorithm to detect the pitch of
this audio file. As you can hear, it is an E2 note played on a guitar with a bit of noise in the background.
I generated this spectrogram using STFT:
And I am using the algorithm linked above like this:
y, sr = librosa.load(filename, sr=40000)
pitches, magnitudes = librosa.core.piptrack(y=y, sr=sr, fmin=75, fmax=1600)
np.set_printoptions(threshold=np.nan)
print pitches[np.nonzero(pitches)]
As a result, I am getting pretty much every possible frequency between my fmin and fmax. What do I have to do with the output of the piptrack method to discover the fundamental frequency of a time frame?
UPDATE
I am still not sure what those 2D array represents, though. Let's say I want to find out how strong is 82Hz in frame 5. I could do that using the STFT function which simply returns a 2D matrix (which was used to plot the spectrogram).
However, piptrack does something additional which could be useful and I don't really understand what. pitches[f, t] contains instantaneous frequency at bin f, time t. Does that mean that, if I want to find the maximum frequency at time frame t, I have to:
Go to the magnitudes[][t] array, find the bin with the maximum
magnitude.
Assign the bin to a variable f.
Find pitches[b][t] to find the frequency that belongs to that bin?
Pitch detection is a tricky topic and is often counter-intuitive. I'm not wild about the way the source code is documented for this particular function -- it almost seems like the developer is confusing a 'harmonic' with a 'pitch'.
When a single note (a 'pitch') is made on a guitar or piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different mathematically related frequencies, called harmonics. Typical pitch tracking techniques include searching the results of a FFT for magnitudes in certain bins that correspond to the expected frequencies of harmonics. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd Harmonic, 785 Hz would be the 3rd Harmonic, 1046 Hz would be the 4th Harmonic, etc. The later harmonics are integer multiples of the fundamental frequency, 261.6 Hz ( ex: 2 x 261.6 = 523, 3 x 261.6 = 785, 4 x 261.6 = 1046 ). However, the frequencies where harmonics are located are logarithmically spaced, but the FFT uses a linear spacing. Often the vertical spacing for FFTs are not resolved enough at the lower frequencies.
For that reason when I wrote a pitch detecting application (PitchScope Player), I chose to create a logarithmically spaced DFT, rather than a FFT, so I could focus on the precise frequencies of interest for music ( see the attached diagram of my custom DFT from 3 seconds of a guitar solo ). If you are serious about pursuing pitch detection, you should consider doing more reading into the topic, looking at other sample code (mine is linked below), and consider writing your own functions to measure frequency.
https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection
https://github.com/CreativeDetectors/PitchScope_Player
Turns out the way to pick the pitch at a certain frame t is simple:
def detect_pitch(y, sr, t):
index = magnitudes[:, t].argmax()
pitch = pitches[index, t]
return pitch
First getting the bin of the strongest frequency by looking at the magnitudes array, and then finding the pitch at pitches[index, t].
To find the pitch of the whole audio segment:
def detect_pitch(y, sr):
pitches, magnitudes = librosa.core.piptrack(y=y, sr=sr, fmin=75, fmax=1600)
# get indexes of the maximum value in each time slice
max_indexes = np.argmax(magnitudes, axis=0)
# get the pitches of the max indexes per time slice
pitches = pitches[max_indexes, range(magnitudes.shape[1])]
return pitches
I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively broad A simple example of what this data could look like:
test=np.random.uniform(0.9,1,100)
test[10:20]=0
plt.plot(test)
Note that there can be multiple plateaus (which should all be detected) which can have different values.
I've tried using scipy.signal.argrelextrema, but it doesn't seem to be doing what I want it to:
peaks=argrelextrema(test,np.less,order=25)
plt.vlines(peaks,ymin=0, ymax=1)
I don't need the exact interval of the plateau- a rough range estimate would be enough, as long as that estimate is bigger or equal than the actual plateau range. It should be relatively efficient however.
There is a method scipy.signal.find_peaks that you can try, here is an exmple
import numpy
from scipy.signal import find_peaks
test = numpy.random.uniform(0.9, 1.0, 100)
test[10 : 20] = 0
peaks, peak_plateaus = find_peaks(- test, plateau_size = 1)
although find_peaks only finds peaks, it can be used to find valleys if the array is negated, then you do the following
for i in range(len(peak_plateaus['plateau_sizes'])):
if peak_plateaus['plateau_sizes'][i] > 1:
print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))
it will print
a plateau of size 10 is found
its left index is 10 and right index is 19
This is really just a "dumb" machine learning task. You'll want to code a custom function to screen for them. You have two key characteristics to a plateau:
They're consecutive occurrences of the same value (or very nearly so).
The first and last points deviate strongly from a forward and backward moving average, respectively. (Try quantifying this based on the standard deviation if you expect additive noise, for geometric noise you'll have to take the magnitude of your signal into account too.)
A simple loop should then be sufficient to calculate a forward moving average, stdev of points in that forward moving average, reverse moving average, and stdev of points in that reverse moving average.
Read until you find a point well outside the regular noise (compare to variance). Start buffering those indices into a list.
Keep reading and buffering indices into that list while they have the same value (or nearly the same, if your plateaus can be a little rough; you'll want to use some tolerance plus the standard deviation of your plateaus, or just some tolerance if you expect them all to behave similarly).
If the variance of the points in your buffer gets too high, it's not a plateau, too rough; throw it out and start scanning again from your current position.
If the last value was very different from the previous (on the order of the change that triggered your code to start buffering indices) and in the opposite direction of the original impulse, cap your buffer here; you've got a plateau there.
Now do whatever you want with the points at those indices. Delete them, replace them with a linear interpolation between the two boundary points, whatever.
I could generate some noise and give you some sample code, but this is really something you're going to have to adapt to your application. (For example, there's a shortcoming in this method that a plateau which captures a point on the middle of the "cliff edge" may leave that point when it removes the rest of the plateau. If that's something you're worried about, you'll have to do a little more exploring after you ID the plateau.) You should be able to do this in a single pass over the data, but it might be wise to get some statistics on the whole set first to intelligently tweak your thresholds.
If you have an exact definition of what constitutes a plateau, you can make this a lot less hand-wavey and ML-looking, but so long as you're trying to identify fuzzy pattern, you're gonna have to take a statistics-based approach.
I had a similar problem, and found a simple heuristic solution shared below. I find plateaus as ranges of constant gradient of the signal. You could change the code to also check that the gradient is (close to) 0.
I apply a moving average (uniform_filter_1d) to filter out noise. Also, I calculate the first and second derivative of the signal numerically, so I'm not sure it matches the requirement of efficiency. But it worked perfectly for my signal and might be a good starting point for others.
def find_plateaus(F, min_length=200, tolerance = 0.75, smoothing=25):
'''
Finds plateaus of signal using second derivative of F.
Parameters
----------
F : Signal.
min_length: Minimum length of plateau.
tolerance: Number between 0 and 1 indicating how tolerant
the requirement of constant slope of the plateau is.
smoothing: Size of uniform filter 1D applied to F and its derivatives.
Returns
-------
plateaus: array of plateau left and right edges pairs
dF: (smoothed) derivative of F
d2F: (smoothed) Second Derivative of F
'''
import numpy as np
from scipy.ndimage.filters import uniform_filter1d
# calculate smooth gradients
smoothF = uniform_filter1d(F, size = smoothing)
dF = uniform_filter1d(np.gradient(smoothF),size = smoothing)
d2F = uniform_filter1d(np.gradient(dF),size = smoothing)
def zero_runs(x):
'''
Helper function for finding sequences of 0s in a signal
https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
'''
iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
absdiff = np.abs(np.diff(iszero))
ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
return ranges
# Find ranges where second derivative is zero
# Values under eps are assumed to be zero.
eps = np.quantile(abs(d2F),tolerance)
smalld2F = (abs(d2F) <= eps)
# Find repititions in the mask "smalld2F" (i.e. ranges where d2F is constantly zero)
p = zero_runs(np.diff(smalld2F))
# np.diff(p) gives the length of each range found.
# only accept plateaus of min_length
plateaus = p[(np.diff(p) > min_length).flatten()]
return (plateaus, dF, d2F)
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
I am new in python as well as in signal processing. I am trying to calculate mean value among some frequency range of a signal.
What I am trying to do is as follows:
import numpy as np
data = <my 1d signal>
lF = <lower frequency>
uF = <upper frequency>
ps = np.abs(np.fft.fft(data)) ** 2 #array of power spectrum
time_step = 1.0 / 2000.0
freqs = np.fft.fftfreq(data.size, time_step) # array of frequencies
idx = np.argsort(freqs) # sorting frequencies
sum = 0
c =0
for i in idx:
if (freqs[i] >= lF) and (freqs[i] <= uF) :
sum += ps[i]
c +=1
avgValue = sum/c
print 'mean value is=',avgValue
I think calculation is fine, but it takes a lot of time like for data of more than 15GB and processing time grows exponentially. Is there any fastest way available such that I would be able to get mean value of power spectrum within some frequency range in fastest manner. Thanks in advance.
EDIT 1
I followed this code for calculation of power spectrum.
EDIT 2
This doesn't answer to my question as it calculates mean over the whole array/list but I want mean over part of the array.
EDIT 3
Solution by jez of using mask reduces time. Actually I have more than 10 channels of 1D signal and I want to treat them in a same manner i.e. average frequencies in a range of each channel separately. I think python loops are slow. Is there any alternate for that?
Like this:
for i in xrange(0,15):
data = signals[:, i]
ps = np.abs(np.fft.fft(data)) ** 2
freqs = np.fft.fftfreq(data.size, time_step)
mask = np.logical_and(freqs >= lF, freqs <= uF )
avgValue = ps[mask].mean()
print 'mean value is=',avgValue
The following performs a mean over a selected region:
mask = numpy.logical_and( freqs >= lF, freqs <= uF )
avgValue = ps[ mask ].mean()
For proper scaling of power values that have been computed as abs(fft coefficients)**2, you will need to multiply by (2.0 / len(data))**2 (Parseval's theorem)
Note that it gets slightly fiddly if your frequency range includes the Nyquist frequency—for precise results, handling of that single frequency component would then need to depend on whether data.size is even or odd). So for simplicity, ensure that uF is strictly less than max(freqs). [For similar reasons you should ensure lF > 0.]
The reasons for this are tedious to explain and even more tedious to correct for, but basically: the DC component is represented once in the DFT, whereas most other frequency components are represented twice (positive frequency and negative frequency) at half-amplitude each time. The even-more-annoying exception is the Nyquist frequency which is represented once at full amplitude if the signal length is even, but twice at half amplitude if the signal length is odd. All of this would not affect you if you were averaging amplitude: in a linear system, being represented twice compensates for being at half amplitude. But you're averaging power, i.e. squaring the values before averaging, so this compensation doesn't work out.
I've pasted my code for grokking all of this. This code also shows how you can work with multiple signals stacked in one numpy array, which addresses your follow-up question about avoiding loops in the multi-channel case. Remember to supply the correct axis argument both to numpy.fft.fft() and to my fft2ap().
If you really have a signal of 15 GB size, you'll not be able to calculate the FFT in an acceptable time. You can avoid using the FFT, if it is acceptable for you to approximate your frequency range by a band pass filter. The justification is the Poisson summation formula, which states that sum of squares is not changed by a FFT (or: the power is preserved). Staying in the time domain will let the processing time rise proportionally to the signal length.
The following code designs a Butterworth band path filter, plots the filter response and filters a sample signal:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
dd = np.random.randn(10**4) # generate sample data
T = 1./2e3 # sampling interval
n, f_s = len(dd), 1./T # number of points and sampling frequency
# design band path filter:
f_l, f_u = 50, 500 # Band from 50 Hz to 500 Hz
wp = np.array([f_l, f_u])*2/f_s # normalized pass band frequnecies
ws = np.array([0.8*f_l, 1.2*f_u])*2/f_s # normalized stop band frequencies
b, a = signal.iirdesign(wp, ws, gpass=60, gstop=80, ftype="butter",
analog=False)
# plot filter response:
w, h = signal.freqz(b, a, whole=False)
ff_w = w*f_s/(2*np.pi)
fg, ax = plt.subplots()
ax.set_title('Butterworth filter amplitude response')
ax.plot(ff_w, np.abs(h))
ax.set_ylabel('relative Amplitude')
ax.grid(True)
ax.set_xlabel('Frequency in Hertz')
fg.canvas.draw()
# do the filtering:
zi = signal.lfilter_zi(b, a)*dd[0]
dd1, _ = signal.lfilter(b, a, dd, zi=zi)
# calculate the avarage:
avg = np.mean(dd1**2)
print("RMS values is %g" % avg)
plt.show()
Read the documentation to Scipy's Filter design to learn how to modify the parameters of the filter.
If you want to stay with the FFT, read the docs on signal.welch and plt.psd. The Welch algorithm is a method to efficiently calculate the power spectral density of a signal (with some trade-offs).
It is much easier to work with FFT if your arrays are power of 2. When you do fft the frequencies ranges from -pi/timestep to pi/timestep (assuming that frequency is defined as w = 2*pi/t, change the values accordingly if you use f =1/t representation). Your spectrum is arranged as 0 to minfreqq--maxfreq to zero. you can now use fftshift function to swap the frequencies and your spectrum looks like minfreq -- DC -- maxfreq. now you can easily determine your desired frequency range because it is already sorted.
The frequency step dw=2*pi/(time span) or max-frequency/(N/2) where N is array size.
N/2 point is DC or 0 frequency. Nth position is max frequency now you can easily determine your range
Lower_freq_indx=N/2+N/2*Lower_freq/max_freq
Higher_freq_index=N/2+N/2*Higher_freq/Max_freq
avg=sum(ps[lower_freq_indx:Higher_freq_index]/(Higher_freq_index-Lower_freq_index)
I hope this will help
regards