Determine length of a sound in an audio file - python

I have a .wav file that contains two types of sound: long and short. I need to encode them as bits and write them to a binary file.
I got the code from this SO answer: https://stackoverflow.com/a/53309191/2588339 and using it I get this plot for my input wav file:
As you can see, the first plot has narrower and wider bursts, corresponding to the shorter and longer sounds in my file.
My question is how can I encode each one of the sounds as a bit? Like having each long sound in the file represent a 1 and a short sound represent a 0.
EDIT: The 2 types of sound differ by how long they play and by frequency also. The longer sound is also lower frequency and the shorter sound is also higher frequency. You can find a sample of the file here: https://vocaroo.com/i/s0A1weOF3I3f

Measuring the loudness of each frequency by taking the FFT of the signal is the more "scientific" way to do it, but the image of the raw signal suggests you can get away with something much simpler.
If you take a sliding window (at least as wide as 1 period of the primary frequency of the sound (~300Hz)) and find the maximum value within that window, it should be fairly easy to apply a threshold to determine if the tone is playing at a given time interval or not. Here's a quick article I found on rolling window functions.
import numpy as np

def rolling_window(a, window):
    # View of `a` with an extra trailing axis holding each length-`window` slice (no copy).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

window_size = int(sample_rate / primary_freq)  # minimum window size (one period); could be larger
rolling_max = np.max(rolling_window(wav_data, window_size), -1)
threshold_max = rolling_max > threshold  # maybe about 1000ish based on your graph
Then simply determine the length of the runs of True in threshold_max. Again, I'll pull on the community from this answer showing a concise way to get the run length of an array (or other iterable).
import itertools

def runs_of_ones(bits):
    for bit, group in itertools.groupby(bits):
        if bit:
            yield sum(group)  # each True counts as 1, so the sum is the run length

run_lengths = list(runs_of_ones(threshold_max))
The values in run_lengths should now be the length of each "on" pulse of sound in # of samples. It should now be relatively straightforward for you to test each value if it's long or short and write to a file.
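One possible way to finish the job is sketched below. It is only a sketch under stated assumptions: the helper names `runs_to_bits` and `pack_bits` are made up for illustration, the cutoff between "short" and "long" is assumed to lie halfway between the shortest and longest run, and the example run lengths are invented rather than taken from the sample file.

```python
def runs_to_bits(run_lengths):
    """Map each run length to 1 (long) or 0 (short).

    Assumption: the cutoff sits halfway between the shortest and the
    longest run, which only makes sense if both kinds of sound occur.
    """
    runs = list(run_lengths)
    cutoff = (min(runs) + max(runs)) / 2.0
    return [1 if r > cutoff else 0 for r in runs]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, first bit = most significant.

    The last byte is padded with zero bits if len(bits) is not a multiple of 8.
    """
    padded = list(bits) + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


# Invented run lengths (in samples), standing in for the real run_lengths.
example_runs = [4410, 1100, 4390, 4420, 1090, 1110, 4400, 1105]
example_bits = runs_to_bits(example_runs)
example_payload = pack_bits(example_bits)

# Writing the result to a binary file would then look like:
# with open("decoded.bin", "wb") as f:
#     f.write(example_payload)
```

If the recording contains short noise bursts, it may be worth dropping runs below some minimum length before computing the cutoff.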

Related

Fastest way to find the best level of compression of an image

I wrote a function (in Python) that saves an image with a certain quality level (0 to 100, higher is better). Depending on the quality level, the final file is bigger or smaller (in byte).
I need to write a function that selects the best possible quality level for an image, keeping it under a maximum file size (in byte). The only way I can do it is by attempts.
The simplest approach is: save the file with a certain quality and if it is bigger than expected then reduce the quality and so forth.
Unfortunately this approach is very time consuming. Even if I reduce the quality by five points at each iteration, the risk is that I have to save the same file 21 times before I find the right quality.
Another solution is: try with the half of the previous quality and focus on the lower range or on the higher range of quality according to the result.
Let me clarify:
1. Assuming that quality can be between 0 and 100, try with quality = 50.
2. If the file size is higher than expected, focus on the lower quality range (e.g. 0-49), otherwise on the higher one (e.g. 51-100).
3. Set the quality value to the middle of the considered range and save the file (e.g. 25 for the lower range, 75 for the higher range); return to step 2.
4. Exit when the range is smaller than 5.
This second solution always requires 6 iterations.
Here is a Python implementation:
limit_file_size = 512  # maximum file size, change this for different results
q_max = 100
q_min = 0
quality = q_min + (q_max - q_min) // 2
while True:
    file_size = save_file(quality)
    if (q_max - q_min) <= 5:
        break
    if file_size > limit_file_size:
        q_max = quality
    else:
        q_min = quality
    quality = q_min + (q_max - q_min) // 2
Please note that function save_file is not provided for brevity, a fake implementation of it can be the following:
import math

def save_file(quality):
    return int(math.sqrt(quality)) * 100
How can I reduce the number of iterations the above loop needs to converge to a valid solution?
You can use a binary search as you mentioned, and save each result in a dictionary so that the next run can check whether the quality for that file size has already been calculated, like this:
# { file_size : quality }
{ 512 : 100 , 256 : 100, 1024 : 27 }
Note that each image has different dimensions, color depth, format, etc., so you may get a range of results. I suggest experimenting and creating sub-keys for the properties that have the most impact, for example:
{ format : { file_size : quality } }
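The idea can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `find_quality` and `cached_quality` are hypothetical names, `image_key` stands for whatever property you decide to key on (for example the format), and the search assumes that file size never decreases as quality increases. `save_file` is the fake stand-in from the question.

```python
import math


def save_file(quality):
    """Fake stand-in from the question: returns a pretend file size."""
    return int(math.sqrt(quality)) * 100


def find_quality(limit, save, q_min=0, q_max=100):
    """Binary search for the highest quality whose size is <= limit.

    Assumes size is non-decreasing in quality. Returns None if even
    q_min produces a file that is too large.
    """
    best = None
    lo, hi = q_min, q_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if save(mid) <= limit:
            best = mid       # fits: remember it and try higher qualities
            lo = mid + 1
        else:
            hi = mid - 1     # too big: try lower qualities
    return best


def cached_quality(image_key, limit, save, cache):
    """Look up (image_key, limit) in the cache; search only on a miss."""
    key = (image_key, limit)
    if key not in cache:
        cache[key] = find_quality(limit, save)
    return cache[key]
```

A cache hit costs no saves at all. Whether a quality found for one image is good enough for another image with the same key is something you would have to verify on your own data; at worst it can serve as the starting point of a narrower search.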
You could try a simple machine learning (ML) approach. Train a model that, given:
image size for quality 50
limit_file_size
as input, produces the best quality you seek, or at least narrows down your search so that you need 2-3 iterations instead of 6.
You would have to gather training (and validation) data and train a model that is simple and fast (it should be faster than save_file).
I think this is the best approach for determining the quality as fast as possible, but it requires a lot of work (especially if you have no experience in ML).

Windowing an audio signal in Python for a gammatone filterbank implementation

I am new to programming, particularly to Python. I am trying to implement an auditory model using 4th order gammatone filters. I need to break down a signal into 39 channels. When I use a smaller signal (about 884726 bits), the code runs well, but I think the buffers fill up, so I have to restart the shell to run the code a second time. I tried using flush() but it didn't help.
So, I decided to window the signal using a Hanning window but couldn't succeed in it either. To be very clear, I need to break an audio signal into 39 channels, rectify it (half wave) and then pass it into a second bank of 4th order filters, this time into 10 channels. I am trying to downsample the signal before sending into the second bank of filters. This is the piece of code that implements the filter bank by using the coefficients generated by another function. The dimensions of b are 39x4096.
import sys
import numpy as np
from scipy import signal
from scipy.io import wavfile

def filterbank_application(input, b, verbose = False):
    """
    A function to run the input through a bandpass filter bank with parameters defined by the b and a coefficients.
    Parameters:
    * input (type: array-like matrix of floats) - input signal. (Required)
    * b (type: array-like matrix of floats) - the b coefficients of each filter in shape b[numOfFilters][numOfCoeffs]. (Required)
    Returns:
    * y (type: numpy array of floats) - an array with inner dimensions equal to that of the input and outer dimension equal to
      the length of fc (i.e. the number of bandpass filters in the bank) containing the outputs to each filter. The output
      signal of the nth filter can be accessed using y[n].
    """
    input = np.array(input)
    bshape = np.shape(b)
    nFilters = bshape[0]
    lengthFilter = bshape[1]
    shape = (nFilters,) + (np.shape(input))
    shape = np.array(shape[0:])
    shape[-1] = shape[-1] + lengthFilter -1
    y = np.zeros((shape))
    for i in range(nFilters):
        if(verbose):
            sys.stdout.write("\r" + str(int(np.round(100.0*i/nFilters))) + "% complete.")
            sys.stdout.flush()
        x = np.array(input)
        y[i] = signal.fftconvolve(x,b[i])
    if(verbose): sys.stdout.write("\n")
    return y
samplefreq,input = wavfile.read('sine_sweep.wav')
input = input.transpose()
input = (input[0] + input[1])/2
b_coeff1 = gammatone_filterbank(samplefreq, 39)
Output = filterbank_application(input, b_coeff1)
Rect_Output = half_rectification(Output)
I want to window audio into chunks of 20 seconds length. I would appreciate if you could let me know an efficient way of windowing my signal as the whole audio will be 6 times bigger than the signal I am using. Thanks in advance.
You may have a memory consumption problem if you run a 32-bit Python. Your code consumes approximately 320 octets (bytes) per sample (40 buffers, 8 octets per sample). The maximum memory available is 2 GB, which means that the absolute maximum size for the signal is around 6 million samples. If your file is around 100 seconds long, you may start having problems.
There are two ways out of that problem (if that really is the problem, but I cannot see any evident reason why your code would otherwise crash). Either get a 64-bit Python or rewrite your code to use memory in a more practical way.
If I have understood your problem correctly, you want to:
1. run the signal through 39 FIR filters (4096 points each)
2. half-rectify the resulting signals
3. downsample the resulting half-rectified signals
4. filter each of the downsampled rectified signals by 10 FIR filters (or IIR?)
This will give you 39 x 10 signals which give you the attack and frequency response of the incoming auditory signal.
How I would do this is:
1. take the original signal and keep it in memory (if it does not fit, that can be fixed by a trick called memmap, but if your signal is not very long it will fit)
2. take the first gammatone filter and run the convolution (scipy.signal.fftconvolve)
3. run the half-wave rectification (sig = np.clip(sig, 0, None, out=sig))
4. downsample the signal (e.g. scipy.signal.decimate)
5. run the 10 filters (e.g. scipy.signal.fftconvolve)
6. repeat steps 2-5 for all other gammatones
This way you do not need to keep the 39 copies of the filtered signal in memory, if you only need the end results.
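The per-channel loop described above could look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: the function name `two_stage_filterbank`, the argument names `b1` and `b2` (coefficient arrays for the first and second filter banks), and the decimation factor `q=4` are all hypothetical.

```python
import numpy as np
from scipy import signal


def two_stage_filterbank(x, b1, b2, q=4):
    """Filter, rectify, downsample and re-filter one channel at a time.

    x  : 1-D input signal
    b1 : array of shape (n1, taps1), FIR coefficients of the first bank
    b2 : array of shape (n2, taps2), FIR coefficients of the second bank
    q  : decimation factor (an assumed value; pick one that suits your bands)

    Returns an array of shape (n1, n2, L), where L is the length of the
    downsampled signal after the second convolution.
    """
    x = np.asarray(x, dtype=float)
    results = []
    for taps in b1:
        sig = signal.fftconvolve(x, taps)       # first-bank filter
        np.clip(sig, 0, None, out=sig)          # half-wave rectification, in place
        sig = signal.decimate(sig, q)           # anti-alias filter + downsample
        # Second bank: only the short, downsampled signals are kept.
        results.append([signal.fftconvolve(sig, h) for h in b2])
    return np.array(results)
```

Only one full-rate filtered copy of the signal exists at any time; what accumulates is the downsampled output, which is smaller by roughly the factor `q`.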
Without seeing the complete application and knowing more about the environment it is difficult to say whether you really have a memory problem.
Just a stupid signal-processing question: Why half-wave rectification? Why not full-wave rectification: sig = np.abs(sig)? The low-pass filtering is easier with full-wave rectified signal, and the audio signals should anyway be rather symmetric.
There are a few things which you might want to change in your code:
you convert input into an array as the first thing in your function - there is no need to do it again within the loop (just use input instead of x when running the fftconvolve)
creating an empty y could be done by y = np.empty((b.shape[0], input.shape[0] + b.shape[1] - 1)); this will be more readable and gets rid of a number of unnecessary variables
input.transpose() takes some time and memory and is not required. You may instead do: input = np.average(input, axis=1). This will average every row in the array (i.e. average the channels).
There is nothing wrong with your sys.stdout.write, etc. There the flush is used because otherwise the text is written into a buffer and only shown on the screen when the buffer is full.

Fast string to array copying python

I'm looking to cut up image data into regularly sized screen blocks. Currently the method I've been using is this:
def getScreenBlocksFastNew(bmpstr):
    pixelData = array.array('c')
    step = imgWidth * 4
    pixelCoord = (blockY * blockSizeY * imgWidth +
                  blockSizeX * blockX) * 4
    for y in range(blockSizeY):
        pixelData.extend( bmpstr[pixelCoord : pixelCoord + blockSizeX * 4] )
        pixelCoord += step
    return pixelData
bmpstr is a string of the raw pixel data, stored as one byte per RGBA value. (I also have the option of using a tuple of ints. They seem to take about the same amount of time for each). This creates an array of a block of pixels, depicted by setting blockX, blockY and blockSizeX, blockSizeY. Currently blockSizeX = blockSizeY = 22, which is the optimal size screen block for what I am doing.
My problem is that this process takes .0045 seconds per 5 executions, and extrapolating that out to the 2000+ screen blocks to fill the picture resolution requires about 1.7 seconds per picture, which is far too slow.
I am looking to make this process faster, but I'm not sure what the proper algorithm will be. I am looking to have my pixelData array pre-created so I don't have to reinstantiate it every time. However this leaves me with a question: what is the fastest way to copy the pixel RGBA values from bmpstr to an array, without using extend or append? Do I need to set each value individually? That can't be the most efficient way.
For example, how can I copy values bmpstr[0:100] into pixelData[0:100] without using extend or setting each value individually?
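One direction worth trying, sketched here under assumptions, is to let NumPy view the raw bytes as a height x width x 4 array and slice blocks out of it, which avoids the per-row extend calls. The function name `get_block` and its parameters (including `img_height`, which the original code never needs) are hypothetical, and the sketch assumes the data is available as a bytes-like object with 4 bytes per pixel and no row padding.

```python
import numpy as np


def get_block(bmp_bytes, img_width, img_height,
              block_x, block_y, block_w, block_h):
    """Return one screen block as a (block_h, block_w, 4) uint8 view.

    No pixel data is copied: np.frombuffer wraps the existing buffer and
    slicing only adjusts offsets and strides.
    """
    pixels = np.frombuffer(bmp_bytes, dtype=np.uint8)
    pixels = pixels.reshape(img_height, img_width, 4)
    y0 = block_y * block_h
    x0 = block_x * block_w
    return pixels[y0:y0 + block_h, x0:x0 + block_w]


# Tiny made-up 4x4 RGBA "image" whose bytes are simply 0..63.
demo_bytes = bytes(range(64))
demo_block = get_block(demo_bytes, 4, 4, block_x=1, block_y=1,
                       block_w=2, block_h=2)

# Copying into a pre-created buffer uses slice assignment, not extend:
demo_buffer = np.empty((2, 2, 4), dtype=np.uint8)
demo_buffer[:] = demo_block
```

If a flat byte string is needed afterwards, `demo_block.tobytes()` produces a contiguous copy of just that block.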

Why do I have such a distortion with pygame sndarray objects?

I'm using sndarray from pygame to play with basic sound synthesis. The problem is that whatever I do, I get awful distortion on the generated sound.
In the code I'll provide at the end of the question, you'll see a bunch of code coming from here and there. The main part comes from an MIT source I found on the web, which uses Numeric for the math and array handling; since I can't install it for now, I decided to use NumPy instead.
First, I thought the problem was coming from the Int format of my arrays, but if I cast the values to numpy.int16, I don't have sound anymore.
Plus, I can't find anything on google about that kind of behavior from pygame / sndarray.
Any idea ?
Thanks !
Code :
import numpy
import pygame

global_sample_rate = 44100

def sine_array_onecycle(hz, peak):
    length = global_sample_rate / float(hz)
    omega = numpy.pi * 2 / length
    xvalues = numpy.arange(int(length)) * omega
    return (peak * numpy.sin(xvalues))

def zipstereo(f):
    return numpy.array(list(zip(f, f)))  # list() so this also works on Python 3

def make_sound(arr, n_samples = global_sample_rate):
    return pygame.sndarray.make_sound( zipstereo( numpy.resize(numpy.array(arr), (n_samples,)) ) )

def sine(hz, peak):
    snd = make_sound(sine_array_onecycle(hz, peak), global_sample_rate)
    return snd
=> 'hope I didn't make any lame mistake, I'm pretty new in the world of python
Presuming you have some initialization code like
pygame.mixer.pre_init(44100, -16, 2) # 44.1kHz, 16-bit signed, stereo
sndarray expects you to be passing it 16-bit integer arrays, not float arrays.
Your "peak" value needs to make sense given the 16-bit integer representation. So, if your float array has values in the range -1.0 to +1.0, then you need to multiply by 2**15 - 1 (32767, the largest int16 value) to get it scaled appropriately; multiplying by 2**15 itself would overflow for a sample of exactly +1.0.
To be clear, you may want a conversion like:
numpy.int16(float_array*(2**15 - 1))
My best guess of the situation is that you had a float array with a low peak value like 1.0, so when converting it to int16 most everything was getting converted to 0 or +/-1, which you wouldn't be able to hear. When passing the float array, you were probably just getting random bits (when interpreted as 16 bit integers) so then it sounded like harsh noise (I stumbled through this phase on my way to getting this working).
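Put together, building a stereo int16 array might look like the sketch below. It is only an illustration: the function name `sine_int16_stereo`, its default arguments, and the choice to express `peak` as a fraction of full scale are assumptions, and the pygame call is left as a comment because it needs an initialized mixer.

```python
import numpy as np

SAMPLE_RATE = 44100  # must match the rate passed to pygame.mixer.pre_init


def sine_int16_stereo(hz, peak=0.5, seconds=1.0, sample_rate=SAMPLE_RATE):
    """Return an (n_samples, 2) int16 array holding a sine tone.

    peak is a fraction of full scale (0.0 to 1.0), not a raw integer level.
    """
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    mono = peak * np.sin(2 * np.pi * hz * t)           # floats in [-peak, peak]
    scaled = np.int16(np.clip(mono, -1.0, 1.0) * 32767)
    return np.column_stack((scaled, scaled))           # same data on both channels


tone = sine_int16_stereo(440)

# With pygame initialized as in the answer, the array could then be played:
# sound = pygame.sndarray.make_sound(tone)
# sound.play()
```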

Performing a moving linear fit to 1D data in Python

I have a 1D array of data and wish to extract the spatial variation. The standard way to do this which I wish to pythonize is to perform a moving linear regression to the data and save the gradient...
def nssl_kdp(phidp, distance, fitlen):
    kdp=zeros(phidp.shape, dtype=float)
    myshape=kdp.shape
    for swn in range(myshape[0]):
        print("Sweep ", swn+1)
        for rayn in range(myshape[1]):
            print("ray ", rayn+1)
            small=[polyfit(distance[a:a+2*fitlen], phidp[swn, rayn, a:a+2*fitlen],1)[0] for a in range(myshape[2]-2*fitlen)]
            kdp[swn, rayn, :]=array((list(itertools.chain(*[fitlen*[small[0]], small, fitlen*[small[-1]]]))))
    return kdp
This works well but is SLOW... I need to do this 17*360 times...
I imagine the overhead is in the iterator in the [ for in arange] line... Is there an implementation of a moving fit in numpy/scipy?
the calculation for linear regression is based on the sum of various values. so you could write a more efficient routine that modifies the sum as the window moves (adding one point and subtracting an earlier one).
this will be much more efficient than repeating the process every time the window shifts, but is open to rounding errors. so you would need to restart occasionally.
you can probably do better than this for equally spaced points by pre-calculating all the x dependencies, but i don't understand your example in detail so am unsure whether it's relevant.
so i guess i'll just assume that it is.
the slope is (NΣXY - (ΣX)(ΣY)) / (NΣX^2 - (ΣX)^2) - http://easycalculation.com/statistics/learn-regression.php
for evenly spaced data the denominator is fixed (since you can shift the x axis to the start of the window without changing the gradient). the (ΣX) in the numerator is also fixed (for the same reason). so you only need to be concerned with ΣXY and ΣY. the latter is trivial - just add the new value and subtract the old one. the former decreases by the sum of the Y values that stay in the window (each of their X weightings decreases by 1) and increases by (N-1) times the new Y (assuming x_0 is 0 and x_N is N-1) each step.
i suspect that's not clear. what i am saying is that the formula for the slope does not need to be completely recalculated each step. particularly because, at each step, you can rename the X values as 0,1,...N-1 without changing the slope. so almost everything in the formula is the same. all that changes are two terms, which depend on Y as Y_0 "drops out" of the window and Y_N "moves in".
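A rough sketch of that running-sum update is given below, assuming evenly spaced points. The function name `moving_slope` and its parameters are hypothetical, and the plain Python loop is there to mirror the explanation rather than to be the fastest possible version.

```python
import numpy as np


def moving_slope(y, n, dx=1.0):
    """Least-squares slope of every length-n window of y.

    Assumes evenly spaced samples with spacing dx. Inside each window the
    x values are relabelled 0, 1, ..., n-1, so only the sums involving y
    change as the window moves.
    """
    y = np.asarray(y, dtype=float)
    if n < 2 or n > y.size:
        raise ValueError("n must satisfy 2 <= n <= len(y)")
    x = np.arange(n)
    sx = x.sum()                               # fixed for every window
    denom = n * (x ** 2).sum() - sx ** 2       # fixed for every window
    sy = y[:n].sum()
    sxy = (x * y[:n]).sum()
    slopes = np.empty(y.size - n + 1)
    slopes[0] = (n * sxy - sx * sy) / denom
    for a in range(1, slopes.size):
        y_out = y[a - 1]                       # point leaving the window (x was 0)
        y_in = y[a + n - 1]                    # point entering the window (x is n-1)
        # Points that stay lose 1 from their x weight; the new point gets n-1.
        sxy += (n - 1) * y_in - (sy - y_out)
        sy += y_in - y_out
        slopes[a] = (n * sxy - sx * sy) / denom
    return slopes / dx
```

As noted above, the running sums accumulate rounding error, so for very long arrays it may be worth recomputing `sy` and `sxy` from scratch every so often.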
I've used these moving window functions from the somewhat old scikits.timeseries module with some success. They are implemented in C, but I haven't managed to use them in a situation where the moving window varies in size (not sure if you need that functionality).
http://pytseries.sourceforge.net/lib.moving_funcs.html
Head here for downloads (if using Python 2.7+, you'll probably need to compile the extension itself -- I did this for 2.7 and it works fine):
http://sourceforge.net/projects/pytseries/files/scikits.timeseries/0.91.3/
I/we might be able to help you more if you clean up your example code a bit. I'd consider defining some of the arguments/objects in lines 7 and 8 (where you're defining 'small') as variables, so that you don't end row 8 with so many hard-to-follow parentheses.
Ok.. I have what seems to be a solution.. not an answer per se, but a way of doing a moving, multi-point differential... I have tested this and the result looks very very similar to a moving regression... I used a 1D sobel filter (ramp from -1 to 1 convolved with the data):
import numpy as np

def KDP(phidp, dx, fitlen):
    kdp=np.zeros(phidp.shape, dtype=float)
    myshape=kdp.shape
    for swn in range(myshape[0]):
        #print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            #print "ray ", rayn+1
            kdp[swn, rayn, :]=sobel(phidp[swn, rayn,:], window_len=fitlen)/dx
    return kdp

def sobel(x,window_len=11):
    """Sobel differential filter for calculating KDP
    output:
    differential signal (unscaled for gate spacing)
    example:
    """
    # pad both ends with mirrored data so the output covers the full input
    s=np.r_[x[window_len-1:0:-1],x,x[-1:-window_len:-1]]
    #print(len(s))
    w=2.0*np.arange(window_len)/(window_len-1.0) -1.0
    #print w
    w=w/(abs(w).sum())
    y=np.convolve(w,s,mode='valid')
    # integer division keeps the slice bounds ints on Python 3
    return -1.0*y[window_len//2:len(x)+window_len//2]/(window_len/3.0)
this runs QUICK!
