I recently built a system that records, plots, and plays back an audio WAV file entirely in Python. Now I'm trying to add some filtering and audio mixing between recording and the point where I plot the file and output it to the speakers. However, I have no idea where to start. Right now I want to read in the initial WAV file, apply a low-pass filter, and then re-pack the filtered data into a new WAV file. Here is the code I used to plot the initial data once I recorded it.
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys
spf = wave.open('wavfile.wav','r')
#Extract Raw Audio from Wav File
signal = spf.readframes(-1)
signal = np.frombuffer(signal, dtype=np.int16)
plt.figure(1)
plt.title('Signal Wave...')
plt.plot(signal)
And here is some code I used to generate a test audio file of a single tone:
import numpy as np
import wave
import struct
freq = 440.0
data_size = 40000
fname = "High_A.wav"
frate = 11025.0
amp = 64000.0
sine_list_x = []
for x in range(data_size):
    sine_list_x.append(np.sin(2*np.pi*freq*(x/frate)))
wav_file = wave.open(fname, "w")
nchannels = 1
sampwidth = 2
framerate = int(frate)
nframes = data_size
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, framerate, nframes,
                    comptype, compname))
for s in sine_list_x:
    wav_file.writeframes(struct.pack('h', int(s*amp/2)))
wav_file.close()
I'm not really sure how to apply said audio filter and repack it, though. Any help and/or advice you could offer would be greatly appreciated.
First step : What kind of audio filter do you need ?
Choose the filtered band
Low-pass filter: removes the highest frequencies from your audio signal
High-pass filter: removes the lowest frequencies from your audio signal
Band-pass filter: removes both the highest and the lowest frequencies from your audio signal
For the following steps, I assume you need a low-pass filter.
Choose your cutoff frequency
The cutoff frequency is the frequency at which your signal is attenuated by 3 dB.
Your example signal is 440 Hz, so let's choose a cutoff frequency of 400 Hz. Your 440 Hz signal is then attenuated (by more than 3 dB) by the 400 Hz low-pass filter.
Choose your filter type
According to this other stackoverflow answer
Filter design is beyond the scope of Stack Overflow - that's a DSP
problem, not a programming problem. Filter design is covered by any
DSP textbook - go to your library. I like Proakis and Manolakis'
Digital Signal Processing. (Ifeachor and Jervis' Digital Signal
Processing isn't bad either.)
To work through a simple example, I suggest using a moving average filter (a simple low-pass filter).
See Moving average
Mathematically, a moving average is a type of convolution and so it can be viewed as an example of a low-pass filter used in signal processing
This Moving average Low-pass Filter is a basic filter, and it is quite easy to use and to understand.
The parameter of the moving average is the window length.
The relationship between the moving average window length and the cutoff frequency needs a little bit of mathematics and is explained here
The code will be
import math
sampleRate = 11025.0
cutOffFrequency = 400.0
freqRatio = cutOffFrequency / sampleRate
N = int(math.sqrt(0.196201 + freqRatio**2) / freqRatio)
So, in the example, the window length will be 12
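As a quick sanity check of that window length, the magnitude response of an N-point moving average can be evaluated directly. This is only a minimal sketch: it assumes the standard closed form |H(f)| = |sin(pi*f*N/fs) / (N*sin(pi*f/fs))| for a moving average, and the helper name moving_average_gain_db is made up for illustration.

```python
import math

def moving_average_gain_db(freq_hz, sample_rate, window_len):
    """Gain in dB of a window_len-point moving average at freq_hz."""
    x = math.pi * freq_hz / sample_rate
    gain = abs(math.sin(window_len * x) / (window_len * math.sin(x)))
    return 20.0 * math.log10(gain)

sample_rate = 11025.0
N = 12  # window length from the formula above

gain_cutoff = moving_average_gain_db(400.0, sample_rate, N)
gain_tone = moving_average_gain_db(440.0, sample_rate, N)
print("gain at 400 Hz: %.2f dB" % gain_cutoff)
print("gain at 440 Hz: %.2f dB" % gain_tone)
```

With these numbers the gain at 400 Hz should come out close to -3 dB and the 440 Hz test tone slightly lower. The moving average is a gentle filter, so the tone is only modestly attenuated, not removed.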
Second step : coding the filter
Hand-made moving average
see specific discussion on how to create a moving average in python
Solution from Alleo is
def running_mean(x, windowSize):
    cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
    return (cumsum[windowSize:] - cumsum[:-windowSize]) / windowSize
filtered = running_mean(signal, N)
Using lfilter
Alternatively, as suggested by dpwilson, we can also use lfilter
win = numpy.ones(N)
win *= 1.0/N
filtered = scipy.signal.lfilter(win, [1], signal).astype(signal.dtype)
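Both variants compute the same thing: a convolution of the signal with a box of height 1/N. The following sketch checks that numerically with NumPy only. It uses np.convolve as a stand-in for lfilter (an assumption made to keep the example free of SciPy), and the random test signal is purely illustrative.

```python
import numpy as np

def running_mean(x, window_size):
    # cumulative-sum form of the moving average (same idea as the snippet above)
    cumsum = np.cumsum(np.insert(x, 0, 0.0))
    return (cumsum[window_size:] - cumsum[:-window_size]) / window_size

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
N = 12

via_cumsum = running_mean(x, N)
# 'valid' keeps only the positions where the window fully overlaps the signal
via_convolve = np.convolve(x, np.ones(N) / N, mode="valid")

print(len(x), len(via_cumsum), len(via_convolve))
print(np.allclose(via_cumsum, via_convolve))
```

Note the length difference: the cumulative-sum version returns len(x) - N + 1 samples, while lfilter returns as many samples as it was given, the first N - 1 of which are the filter's start-up transient.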
Third step : Let's Put It All Together
import matplotlib.pyplot as plt
import numpy as np
import wave
import sys
import math
import contextlib
fname = 'test.wav'
outname = 'filtered.wav'
cutOffFrequency = 400.0
# from http://stackoverflow.com/questions/13728392/moving-average-or-running-mean
def running_mean(x, windowSize):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[windowSize:] - cumsum[:-windowSize]) / windowSize
# from http://stackoverflow.com/questions/2226853/interpreting-wav-data/2227174#2227174
def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):
    if sample_width == 1:
        dtype = np.uint8  # unsigned char
    elif sample_width == 2:
        dtype = np.int16  # signed 2-byte short
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    channels = np.frombuffer(raw_bytes, dtype=dtype)
    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels = channels.reshape(n_frames, n_channels).T
    else:
        # channels are not interleaved. All samples from channel M-1 occur before all samples from channel M
        channels = channels.reshape(n_channels, n_frames)
    return channels
with contextlib.closing(wave.open(fname, 'rb')) as spf:
    sampleRate = spf.getframerate()
    ampWidth = spf.getsampwidth()
    nChannels = spf.getnchannels()
    nFrames = spf.getnframes()
    # Extract raw audio from the (possibly multi-channel) WAV file.
    # readframes() counts frames, and one frame holds one sample per channel.
    signal = spf.readframes(nFrames)
channels = interpret_wav(signal, nFrames, nChannels, ampWidth, True)
# get window size
# from http://dsp.stackexchange.com/questions/9966/what-is-the-cut-off-frequency-of-a-moving-average-filter
freqRatio = (cutOffFrequency/sampleRate)
N = int(math.sqrt(0.196201 + freqRatio**2)/freqRatio)
# Use moving average (only on first channel)
filtered = running_mean(channels[0], N).astype(channels.dtype)
wav_file = wave.open(outname, "w")
wav_file.setparams((1, ampWidth, sampleRate, nFrames, spf.getcomptype(), spf.getcompname()))
wav_file.writeframes(filtered.tobytes('C'))
wav_file.close()
sox library can be used for static noise removal.
I found this gist which has some useful commands as examples
I'm trying to make a wavetable synthesizer in Python for the first time (based off an example I found here https://blamsoft.com/tutorials/expanse-creating-wavetables/) but the resultant sound I'm getting doesn't sound tonal at all. My output is just a low grainy buzz. I'm pretty new to making wavetables in Python and I was wondering if anybody might be able to tell me what I'm missing in order to write an A440 sine wavetable to the file "wavetable.wav" and have it actually produce a pure sine tone? Here's what I have at the moment:
import wave
import struct
import numpy as np
frame_count = 256
frame_size = 2048
sps = 44100
freq_hz = 440
file = "wavetable.wav" #write waveform to file
wav_file = wave.open(file, 'w')
wav_file.setparams((1, 2, sps, frame_count, 'NONE', 'not compressed'))
values = bytes(0)
for i in range(frame_count):
    for ii in range(frame_size):
        sample = np.sin((float(ii)/frame_size) * (i+128)/256 * 2 * np.pi * freq_hz/sps) * 65535
        if sample < 0:
            sample = 0
        sample -= 32768
        sample = int(sample)
        values += struct.pack('h', sample)
wav_file.writeframes(values)
wav_file.close()
print("Generated " + file)
The sine function I have inside the for loop is probably the part I understand the least because I just went by the example verbatim. I'm used to making sine functions like (y = Asin(2πfx)) but I'm not sure what the purpose is of multiplying by ((i+128)/256) and 65535 (16-bit amplitude resolution?). I'm also not sure what the purpose is of subtracting 32768 from each sample. Is anyone able to clarify what I'm missing and maybe point me in the right direction? Am I going about this the wrong way? Any help is appreciated!
If you just wanted to generate sound data ahead of time and then dump it all into a file, and you’re also comfortable using NumPy, I’d suggest using it with a library like SoundFile. Then there’s no need to delimit the data into frames.
Starting with a naïve approach (using numpy.sin, not trying to optimize things yet), one ends with something like this:
from math import floor, tau
import numpy as np
import soundfile as sf
file_path = 'sine.flac'
sample_rate = 48_000 # hertz
duration = 1.0 # seconds
frequency = 432.0 # hertz
amplitude = 0.8 # (not in decibels!)
start_phase = 0.0 # at what phase to start
sample_count = floor(sample_rate * duration)
# angular frequency in radians per sample
omega = frequency * tau / sample_rate
# all phases for which we want to sample our sine
phases = np.linspace(start_phase, start_phase + omega * sample_count,
sample_count, endpoint=False)
# our sine wave samples, generated all at once
audio = amplitude * np.sin(phases)
# now write to file
fmt, sub = 'FLAC', 'PCM_24'
assert sf.check_format(fmt, sub) # to make sure we ask the correct thing beforehand
sf.write(file_path, audio, sample_rate, format=fmt, subtype=sub)
This will be a mono sound, you can write stereo using 2d arrays (see NumPy and SoundFile’s docs).
But note that to make a wavetable specifically, you need to be sure it contains just a single period (or an integer number of periods) of the wave exactly, so the playback of the wavetable will be without clicks and have a correct frequency.
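To illustrate the single-period requirement, here is a minimal sketch that builds a one-cycle sine table and checks that it loops seamlessly. The table size of 2048 is an assumption borrowed from the question's frame_size, and the variable names are illustrative only.

```python
import numpy as np

table_size = 2048  # samples in one cycle (assumed size)

# endpoint=False stops one sample short of 2*pi, so the table holds exactly one period
phase = np.linspace(0.0, 2.0 * np.pi, table_size, endpoint=False)
wavetable = np.sin(phase)

# Looping the table twice should be indistinguishable from two continuous cycles
looped = np.tile(wavetable, 2)
continuous = np.sin(2.0 * np.pi * np.arange(2 * table_size) / table_size)
print(np.allclose(looped, continuous))
```

Had the table been built with endpoint=True, its last sample would repeat the first one, which produces a small discontinuity every time the table wraps.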
You can play chunked sound in real time in Python too, using something like PyAudio. (I’ve not yet used that, so at least for a time this answer would lack code related to that.)
Finally, frankly, all above is unrelated to the generation of sound data from a wavetable: you just pick a wavetable from somewhere, that doesn’t do much for actual synthesis. Here is a simple starting algorithm for that. Assume you want to play back a chunk of sample_count samples and have a wavetable stored in wavetable, a single period which loops perfectly and is normalized. And assume your current wave phase is start_phase, frequency is frequency, sample rate is sample_rate, amplitude is amplitude. Then:
# indices for the wavetable values; this is just for `np.interp` to work
wavetable_period = float(len(wavetable))
wavetable_indices = np.linspace(0, wavetable_period,
len(wavetable), endpoint=False)
# frequency of the wavetable played at native resolution
wavetable_freq = sample_rate / wavetable_period
# start index into the wavetable
start_index = start_phase * wavetable_period / tau
# code above you run just once at initialization of this wavetable ↑
# code below is run for each audio chunk ↓
# samples of wavetable per output sample
shift = frequency / wavetable_freq
# fractional indices into the wavetable
indices = np.linspace(start_index, start_index + shift * sample_count,
sample_count, endpoint=False)
# linearly interpolated wavetable sampled at our frequency
audio = np.interp(indices, wavetable_indices, wavetable,
period=wavetable_period)
audio *= amplitude
# at last, update `start_index` for the next chunk
start_index += shift * sample_count
Then you output the audio. Though there are better ways to play back a wavetable, linear interpolation is at least a fine start. Frequency slides are also possible with this approach: just compute indices in another way, no longer spaced uniformly.
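The same per-chunk steps can be wrapped in a small function that returns the audio together with the index to carry into the next chunk. This is a sketch under the assumptions stated above (a normalized, perfectly looping single-period table). The name render_chunk and the wrapping of the carried index with a modulo are additions for illustration, not part of the algorithm as described.

```python
import numpy as np

def render_chunk(wavetable, start_index, frequency, sample_rate, sample_count):
    """Return (audio, next_start_index) for one chunk of a looping wavetable."""
    period = float(len(wavetable))
    table_x = np.arange(len(wavetable), dtype=float)
    # wavetable samples advanced per output sample
    shift = frequency * period / sample_rate
    indices = start_index + shift * np.arange(sample_count)
    audio = np.interp(indices, table_x, wavetable, period=period)
    # wrap the carried index so it does not grow without bound
    next_index = (start_index + shift * sample_count) % period
    return audio, next_index

table = np.sin(2.0 * np.pi * np.arange(2048) / 2048)
first, idx = render_chunk(table, 0.0, 440.0, 48000, 480)
second, idx = render_chunk(table, idx, 440.0, 48000, 480)

rendered = np.concatenate([first, second])
reference = np.sin(2.0 * np.pi * 440.0 * np.arange(960) / 48000)
max_error = np.max(np.abs(rendered - reference))
print(max_error)
```

Rendering two consecutive chunks and comparing them with a directly computed sine is a cheap way to confirm that the phase is carried across the chunk boundary without a click.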
Below I have code that will take input from a microphone, and if the average of the audio block passes a certain threshold it will produce a spectrogram of the audio block (which is 30 ms long). Here is what a generated spectrogram looks like in the middle of normal conversation:
From what I have seen, this doesn't look anything like what I'd expect a spectrogram to look like given the audio and its environment. I was expecting something more like the following (transposed to preserve space):
The microphone I'm recording with is the default on my Macbook, any suggestions on what's going wrong?
record.py:
import pyaudio
import struct
import math
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
THRESHOLD = 40 # dB
RATE = 44100
INPUT_BLOCK_TIME = 0.03 # 30 ms
INPUT_FRAMES_PER_BLOCK = int(RATE * INPUT_BLOCK_TIME)
def get_rms(block):
    return np.sqrt(np.mean(np.square(block)))

class AudioHandler(object):
    def __init__(self):
        self.pa = pyaudio.PyAudio()
        self.stream = self.open_mic_stream()
        self.threshold = THRESHOLD
        self.plot_counter = 0

    def stop(self):
        self.stream.close()

    def find_input_device(self):
        device_index = None
        for i in range( self.pa.get_device_count() ):
            devinfo = self.pa.get_device_info_by_index(i)
            print('Device {}: {}'.format(i, devinfo['name']))
            for keyword in ['mic','input']:
                if keyword in devinfo['name'].lower():
                    print('Found an input: device {} - {}'.format(i, devinfo['name']))
                    device_index = i
                    return device_index
        if device_index is None:
            print('No preferred input found; using default input device.')
        return device_index

    def open_mic_stream( self ):
        device_index = self.find_input_device()
        stream = self.pa.open( format = pyaudio.paInt16,
                               channels = 1,
                               rate = RATE,
                               input = True,
                               input_device_index = device_index,
                               frames_per_buffer = INPUT_FRAMES_PER_BLOCK)
        return stream

    def processBlock(self, snd_block):
        f, t, Sxx = signal.spectrogram(snd_block, RATE)
        plt.pcolormesh(t, f, Sxx)
        plt.ylabel('Frequency [Hz]')
        plt.xlabel('Time [sec]')
        plt.savefig('data/spec{}.png'.format(self.plot_counter), bbox_inches='tight')
        self.plot_counter += 1

    def listen(self):
        try:
            raw_block = self.stream.read(INPUT_FRAMES_PER_BLOCK, exception_on_overflow = False)
            count = len(raw_block) // 2
            format = '%dh' % (count)
            snd_block = np.array(struct.unpack(format, raw_block))
        except Exception as e:
            print('Error recording: {}'.format(e))
            return
        amplitude = get_rms(snd_block)
        if amplitude > self.threshold:
            self.processBlock(snd_block)
        else:
            pass

if __name__ == '__main__':
    audio = AudioHandler()
    for i in range(0,100):
        audio.listen()
Edits based on comments:
If we constrain the rate to 16000 Hz and use a logarithmic scale for the colormap, this is an output for tapping near the microphone:
Which still looks slightly odd to me, but also seems like a step in the right direction.
Using Sox and comparing with a spectrogram generated from my program:
First, observe that your code plots up to 100 spectrograms (if processBlock is called multiple times) on top of each other and you only see the last one. You may want to fix that. Furthermore, I assume you know why you want to work with 30ms audio recordings. Personally, I can't think of a practical application where 30ms recorded by a laptop microphone could give interesting insights. It hinges on what you are recording and how you trigger the recording, but this issue is tangential to the actual question.
Otherwise the code works perfectly. With just a few small changes in the processBlock function, applying some background knowledge, you can get informative and aesthetic spectrograms.
So let's talk about actual spectrograms. I'll take the SoX output as reference. The colorbar annotation says that it is dBFS [1], which is a logarithmic measure (dB is short for decibel). So, let's first convert the spectrogram to dB:
f, t, Sxx = signal.spectrogram(snd_block, RATE)
dBS = 10 * np.log10(Sxx) # convert to dB
plt.pcolormesh(t, f, dBS)
This improved the color scale. Now we see noise in the higher frequency bands that was hidden before. Next, let's tackle time resolution. The spectrogram divides the signal into segments (default length is 256) and computes the spectrum for each. This means we have excellent frequency resolution but very poor time resolution because only a few such segments fit into the signal window (which is about 1300 samples long). There is always a trade-off between time and frequency resolution. This is related to the uncertainty principle. So let's trade some frequency resolution for time resolution by splitting the signal into shorter segments:
f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64)
Great! Now we got a relatively balanced resolution on both axes - but wait! Why is the result so pixelated?! Actually, this is all the information there is in the short 30ms time window. There are only so many ways 1300 samples can be distributed in two dimensions. However, we can cheat a bit and use higher FFT resolution and overlapping segments. This makes the result smoother although it does not provide additional information:
f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64, nfft=256, noverlap=60)
Behold pretty spectral interference patterns. (These patterns depend on the window function used, but let's not get caught in details, here. See the window argument of the spectrogram function to play with these.) The result looks nice, but actually does not contain any more information than the previous image.
To make the result more SoX-like, observe that the SoX spectrogram is rather smeared on the time axis. You get this effect by using the original low time resolution (long segments) but letting them overlap for smoothness:
f, t, Sxx = signal.spectrogram(snd_block, RATE, noverlap=250)
I personally prefer the 3rd solution, but you will need to find your own preferred time/frequency trade-off.
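To put numbers on that trade-off, the size of the spectrogram grid can be worked out by hand for each of the four parameter sets above. This is a back-of-the-envelope sketch, not SciPy's code: it assumes no padding at the block edges and the documented default noverlap = nperseg // 8, and the helper name stft_grid is made up for illustration.

```python
def stft_grid(n_samples, fs, nperseg=256, noverlap=None, nfft=None):
    """Return (time_slices, freq_bins, time_step_s, bin_spacing_hz)."""
    if noverlap is None:
        noverlap = nperseg // 8  # documented default of scipy.signal.spectrogram
    if nfft is None:
        nfft = nperseg
    hop = nperseg - noverlap
    time_slices = (n_samples - noverlap) // hop
    freq_bins = nfft // 2 + 1  # one-sided spectrum
    return time_slices, freq_bins, hop / fs, fs / nfft

n = 1323  # samples in a 30 ms block at 44100 Hz
for label, kwargs in [
    ("defaults", {}),
    ("nperseg=64", {"nperseg": 64}),
    ("nperseg=64, nfft=256, noverlap=60", {"nperseg": 64, "nfft": 256, "noverlap": 60}),
    ("noverlap=250", {"noverlap": 250}),
]:
    print(label, stft_grid(n, 44100, **kwargs))
```

The overlapping and zero-padded variants produce many more grid cells, but the underlying resolution is still set by nperseg: the bin spacing fs / nfft shrinks with zero-padding while the information content stays the same.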
Finally, let's use a colormap that is more like SoX's:
plt.pcolormesh(t, f, dBS, cmap='inferno')
A short comment on the following line:
THRESHOLD = 40 # dB
The threshold is compared against the RMS of the input signal, which is not measured in dB but raw amplitude units.
[1] Apparently FS is short for full scale. dBFS means that the dB measure is relative to the maximum range. 0 dB is the loudest signal possible in the current representation, so actual values must be <= 0 dB.
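If the threshold is meant to be in dB, the RMS value has to be converted first. Here is a minimal sketch of one way to do that for 16-bit samples. It assumes 32768 as the full-scale reference and uses a hypothetical helper name rms_dbfs, so the result is in dBFS (zero or negative), not the positive 40 used in the question.

```python
import numpy as np

def rms_dbfs(block, full_scale=32768.0):
    """RMS level of an int16 block in dB relative to full scale."""
    samples = np.asarray(block, dtype=np.float64)  # avoid integer overflow when squaring
    rms = np.sqrt(np.mean(np.square(samples)))
    if rms == 0.0:
        return -np.inf  # digital silence
    return 20.0 * np.log10(rms / full_scale)

t = np.arange(1323) / 44100.0
loud = (32767 * np.sin(2 * np.pi * 1000.0 * t)).astype(np.int16)
quiet = (loud // 100).astype(np.int16)
print(rms_dbfs(loud), rms_dbfs(quiet))
```

A full-scale sine lands at roughly -3 dBFS, and the same tone at one hundredth of the amplitude about 40 dB below that, so a threshold in this scheme would be a negative number such as -40.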
UPDATE to make my answer clearer and hopefully complement the excellent explanation by @kazemakase, I found three things that I hope will help:
Use LogNorm:
plt.pcolormesh(t, f, Sxx, cmap='RdBu', norm=LogNorm(vmin=Sxx.min(), vmax=Sxx.max()))
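Use numpy's fromstring method (np.frombuffer in current NumPy releases, where the binary mode of fromstring is deprecated)
It turns out the RMS calculation won't work with this method as is, because the data has a fixed-width integer type and overflows wrap around to negative values, e.g. 507*507 = -5095 in int16.
Use colorbar(), as everything becomes easier when you can see the scale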
plt.colorbar()
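The overflow mentioned in the second point is easy to reproduce. The sketch below squares a few int16 samples directly and then again after converting to float, which is one possible fix (an assumption on my part; any wider type would do).

```python
import numpy as np

block = np.array([507, -507, 300, -300], dtype=np.int16)

wrapped = np.square(block)                  # stays int16, so large squares wrap around
safe = np.square(block.astype(np.float64))  # wide enough to hold the true squares

print(wrapped)  # note the negative "squares"
print(np.sqrt(np.mean(safe)))
```

Converting the block to float before calling get_rms avoids the wrap-around while keeping the int16 array for writing audio files.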
Original Answer:
I got a decent result playing a 10kHz frequency into your code with only a couple of alterations:
import the LogNorm
from matplotlib.colors import LogNorm
Use the LogNorm in the mesh
plt.pcolormesh(t, f, Sxx, cmap='RdBu', norm=LogNorm(vmin=Sxx.min(), vmax=Sxx.max()))
This gave me:
You may also need to call plt.close() after the savefig, and I think the stream read needs some work as later images were dropping the first quarter of the sound.
I'd also recommend plt.colorbar() so you can see the scale it ends up using
UPDATE: seeing as someone took the time to downvote
Here's my code for a working version of the spectrogram.
It captures five seconds of audio and writes them out to a spec file and an audio file so you can compare. There's still a lot to improve and it's hardly optimized: I'm sure it's dropping chunks because of the time it takes to write the audio and spec files. A better approach would be to use the non-blocking callback, and I might do this later.
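The major difference from the original code was the change to get the data in the right format for numpy (np.frombuffer is the current spelling; the binary mode of np.fromstring that I originally used is deprecated):
np.frombuffer(raw_block, dtype=np.int16)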
instead of
struct.unpack(format, raw_block)
This became obvious as a major problem as soon as I tried to write the audio to a file using:
scipy.io.wavfile.write('data/audio{}.wav'.format(self.plot_counter),RATE,snd_block)
Here's a nice bit of music; the drums are obvious:
The code:
import pyaudio
import struct
import math
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import time
from scipy.io.wavfile import write
THRESHOLD = 0 # dB
RATE = 44100
INPUT_BLOCK_TIME = 1 # seconds per block
INPUT_FRAMES_PER_BLOCK = int(RATE * INPUT_BLOCK_TIME)
INPUT_FRAMES_PER_BLOCK_BUFFER = int(RATE * INPUT_BLOCK_TIME)
def get_rms(block):
    return np.sqrt(np.mean(np.square(block)))

class AudioHandler(object):
    def __init__(self):
        self.pa = pyaudio.PyAudio()
        self.stream = self.open_mic_stream()
        self.threshold = THRESHOLD
        self.plot_counter = 0

    def stop(self):
        self.stream.close()

    def find_input_device(self):
        device_index = None
        for i in range( self.pa.get_device_count() ):
            devinfo = self.pa.get_device_info_by_index(i)
            print('Device {}: {}'.format(i, devinfo['name']))
            for keyword in ['mic','input']:
                if keyword in devinfo['name'].lower():
                    print('Found an input: device {} - {}'.format(i, devinfo['name']))
                    device_index = i
                    return device_index
        if device_index is None:
            print('No preferred input found; using default input device.')
        return device_index

    def open_mic_stream( self ):
        device_index = self.find_input_device()
        stream = self.pa.open( format = self.pa.get_format_from_width(2,False),
                               channels = 1,
                               rate = RATE,
                               input = True,
                               input_device_index = device_index)
        stream.start_stream()
        return stream

    def processBlock(self, snd_block):
        f, t, Sxx = signal.spectrogram(snd_block, RATE)
        zmin = Sxx.min()
        zmax = Sxx.max()
        plt.pcolormesh(t, f, Sxx, cmap='RdBu', norm=LogNorm(vmin=zmin, vmax=zmax))
        plt.ylabel('Frequency [Hz]')
        plt.xlabel('Time [sec]')
        plt.axis([t.min(), t.max(), f.min(), f.max()])
        plt.colorbar()
        plt.savefig('data/spec{}.png'.format(self.plot_counter), bbox_inches='tight')
        plt.close()
        write('data/audio{}.wav'.format(self.plot_counter),RATE,snd_block)
        self.plot_counter += 1

    def listen(self):
        try:
            print("start", self.stream.is_active(), self.stream.is_stopped())
            #raw_block = self.stream.read(INPUT_FRAMES_PER_BLOCK, exception_on_overflow = False)
            total = 0
            t_snd_block = []
            while total < INPUT_FRAMES_PER_BLOCK:
                while self.stream.get_read_available() <= 0:
                    print('waiting')
                    time.sleep(0.01)
                while self.stream.get_read_available() > 0 and total < INPUT_FRAMES_PER_BLOCK:
                    raw_block = self.stream.read(self.stream.get_read_available(), exception_on_overflow = False)
                    count = len(raw_block) // 2
                    total = total + count
                    print("done", total, count)
                    format = '%dh' % (count)
                    t_snd_block.append(np.frombuffer(raw_block, dtype=np.int16))
            snd_block = np.hstack(t_snd_block)
        except Exception as e:
            print('Error recording: {}'.format(e))
            return
        self.processBlock(snd_block)

if __name__ == '__main__':
    audio = AudioHandler()
    for i in range(0,5):
        audio.listen()
I think the problem is that you are trying to do the spectrogram of a 30ms audio block, which is so short that you can consider the signal as stationary. The spectrogram is in fact the STFT, and you can find this also in the Scipy documentation:
scipy.signal.spectrogram(x, fs=1.0, window=('tukey', 0.25), nperseg=None, noverlap=None, nfft=None, detrend='constant', return_onesided=True, scaling='density', axis=-1, mode='psd')
Compute a spectrogram with consecutive Fourier transforms.
Spectrograms can be used as a way of visualizing the change of a nonstationary signal’s frequency content over time.
In the first figure you have four slices which are the result of four consecutive fft on your signal block, with some windowing and overlapping. The second figure has a unique slice, but it depends on the spectrogram parameters you have used.
The point is what do you want to do with that signal. What is the purpose of the algorithm?
I am not sure that working directly in Python is the best way to do sound processing, and more precisely FFTs... [in my opinion, using Cython seems almost obligatory for sound processing with Python]
Have you evaluated the possibility of binding an external FFT implementation (FFTW, for example) and using Python only to dispatch jobs to it and to update the resulting picture?
You may find some information about optimizing FFT in Python here, and you may also take a look at the SciPy FFT implementation.
I am trying to write a Python script that can demodulate an FSK modulated audio file and return the data encoded in the audio. The data being transmitted is GPS NMEA strings which are embedded as the audio channel in video files. Basically, text is encoded with FSK modulation, and I am trying to retrieve the text using Python. The device I am using to encode the data can also decode it, so I have been able to generate the correct output, but I need to be able to do it using software.
I have done some background reading to introduce myself to signal processing and FSK, and I have looked at example scripts (e.g. this one and minimodem).
I managed to write a Python script that runs successfully, although the output is incorrect. The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786. I think I am missing a symbol synchronizer, but I'm not sure how this works.
My question now is: how can I add symbol synchronization to the script and/or symbol timing? Are there better examples or explanations of how to do FSK demodulation in Python? I would appreciate any feedback or direction. Thank you.
Here's my script so far:
from scipy.io.wavfile import read
import numpy as np
import wave
import matplotlib.pyplot as plt
import scipy.signal as signal
from scipy.signal import blackman, butter
from scipy.fftpack import fft, rfft, rfftfreq, irfft
import scipy.signal.signaltools as sigtool
import binascii
# Read in data; 'wav' allows getting parameters, 'wav1' is actual signal data
wavfile = 'Sample4_160224_mono.wav'
wavfile1 = open(wavfile, 'r')
wav = wave.open(wavfile, 'r')
wav_1 = read(wavfile1)
params = wav.getparams()
N = params[3] #Sample size
wav1 = read(wavfile1)
wav2 = wav1[1][0:N]
duration = float(params[3] / params[2])
n_samples = len(wav2)
Fs = params[2]
nyq = 0.5 * Fs #Nyquist rate
Fbit = (params[2]*params[0]*16)/100
print "Fbit", Fbit
# Windowing function
w = blackman(n_samples)
print "W is", w
# FFT
wfft = rfft(wav2 * w)
wfft_norm = wfft/N
wfft_norm = abs(wfft_norm[range(N/2)])
# Working with frequencies...
freqs = rfftfreq(len(wfft_norm))
index = np.argmax(np.abs(wfft)) #Returns the index of the maximum absolute value of the windowed FFT
freq = freqs[index] #Returns the frequency from the above index
freq_range = [freq - 0.01, freq + 0.01]
freq_in_Hz = abs(freq * params[2]) #Converts the Hz
freq_range_Hz = [abs(freq_range[0] * params[2]), abs(freq_range[1] * params[2])]
# Differentiator
diff = np.diff(wav2)
# Envelope detector
env = np.abs(sigtool.hilbert(diff))
print "ENV", len(env)
# Low-pass filter
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
filt = signal.lfilter(h, 1, env)
# Signal's mean
mean = np.mean(filt)
#Do some crazy stuff to get binary **maybe wrong**
rx_data = []
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
for bit in sampled_signal:
    if bit > mean:
        rx_data.append(int(1))
    else:
        rx_data.append(int(0))
# Save raw binary output
rx_data1 = ''.join(map(str, (rx_data)))
outfile1 = open('FSK_wav6_output_binary.txt', 'w')
outfile1.write(rx_data1)
outfile1.close()
It seems that you use multiple channels and the sound you need is embedded in one of them.
So far I have found a few problems in your script:
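The Nyquist rate is not half the sampling rate of your sound. It is the minimum rate at which the original sound wave can be sampled without aliasing, i.e. at least twice the highest frequency in that wave. Half the sampling rate is the Nyquist frequency. Hence,
nyq = 0.5 * Fs
is mislabelled as a Nyquist rate in your comment (the value itself is what firwin's nyq argument, the Nyquist frequency, expects).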
If you take advantage of the noiseless sound to demodulate, then the Differentiator can be omitted.
For the low-pass filter:
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
the cutoff frequency is your data sample rate, please read this.
filt is the final signal which can extract the specific data you desire.
How to choose points in sampled_signal to recreate the original signal actually depends on the ratio between the original signal rate and the sampling rate. Just like the first link you provided, assuming the data were written in 11025 Hz and the sampling or recording rate is 44100 Hz, then the code you gave:
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
should be:
sampled_signal = filt[Fs/Fbit*2:params[3]:Fs/Fbit*4]
where Fs/Fbit*2 is the beginning, params[3] is the ending, Fs/Fbit*4 is the step length.
The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786.
That is normal because of the different sample rates. You can add some special characters acting as a start sign and an end sign to your text and search for them; then you might find the data with the correct length you need.
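The idea of taking one sample per symbol instead of one per audio sample can be sketched as follows. This is an illustration under strong assumptions, not a full symbol synchronizer: it assumes the first symbol starts exactly at sample 0 and that the baud rate is known, and the function name slice_bits and the 1200 baud figure are hypothetical.

```python
import numpy as np

def slice_bits(envelope, sample_rate, baud_rate, threshold=None):
    """Pick one sample per symbol (mid-symbol) and threshold it to 0/1."""
    samples_per_bit = int(round(sample_rate / baud_rate))
    if threshold is None:
        threshold = envelope.mean()
    centers = envelope[samples_per_bit // 2::samples_per_bit]
    return (centers > threshold).astype(int)

# Synthetic check: an ideal 0/1 envelope at 1200 baud, sampled at 48 kHz
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
envelope = np.repeat(bits, 40).astype(float)
decoded = slice_bits(envelope, 48000, 1200)
print(decoded)
```

On a real recording the start of the first symbol is unknown, which is where the start-sign search suggested above (or a proper timing-recovery loop) comes in.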
I'm trying to extract data from a WAV file for audio analysis of each frequency and its amplitude with respect to time. My aim is to run this data through a machine learning algorithm for a college project. After a bit of googling I found out that this can be done with Python's matplotlib library. I saw some sample code that ran a short-time Fourier transform and plotted a spectrogram of these WAV files, but I wasn't able to understand how to use this library to extract the data (every frequency's amplitude at a given time in the audio file) and store it in a 3D array or a .mat file.
Here's the code I saw on some website:
#!/usr/bin/env python
""" This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Frank Zalkow, 2012-2013 """
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing (must be an integer to be usable as an array shape)
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize))) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))
    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win
    return np.fft.rfft(frames)
""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)
    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    # integer bin edges, so they can be used as slice indices
    scale = np.unique(np.round(scale)).astype(int)
    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)
    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]
    return newspec, freqs
""" plot spectrogram"""
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)
    s = stft(samples, binsize)
    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel
    timebins, freqbins = np.shape(ims)
    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()
    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])
    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()
plotstft("abc.wav")
Please guide me to understand how to extract the data, if not by matplotlib, recommend me some other library which will help me achieve this.
First of all, this looks like my code, which is stated to be under a CC license. I don't take it too seriously, but you should not ignore those aspects (you omitted the statement of authorship in this case); others could be more miffed about such a thing.
To your question: In this code the stft isn't computed by matplotlib, but just by numpy. You can get it like this:
samplerate, samples = wav.read(audiopath)
s = stft(samples, 1024)
I am not sure why you want a 3D array? It is a 2D-array, but it is complex valued. If you want to save it in a .mat file:
from scipy.io import savemat
savemat("file.mat", {'arr': s})
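If a real-valued 3D array really is needed (for example because a downstream tool cannot handle complex numbers), one option is to stack the real and imaginary parts along a new last axis. This is only a minimal sketch: the array below is random stand-in data rather than a real STFT, and `complex_to_3d` is a hypothetical helper name, not part of the code above.

```python
import numpy as np

def complex_to_3d(s):
    """Stack real and imaginary parts of a 2D complex array into shape (rows, cols, 2)."""
    s = np.asarray(s)
    return np.stack([s.real, s.imag], axis=-1)

# Stand-in for the STFT result: 4 time frames x 5 frequency bins (made-up data).
rng = np.random.default_rng(0)
s = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

s3d = complex_to_3d(s)
print(s3d.shape)

# The complex array can be recovered from the two planes without loss.
s_back = s3d[..., 0] + 1j * s3d[..., 1]
```

The same `s3d` array could then be passed to savemat in place of `s`.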
As you can see, once the wav audio file is read into the variable samples, it is passed to a function called stft:
samplerate, samples = wav.read(audiopath)
s = stft(samples, binsize)
At this point you already have access to the audio samples, as integers, in the variable samples. Be aware that the bit depth determines the number of bytes per sample, and know the byte order (endianness, little-endian or big-endian) of your data. Inside the function stft, that array is further processed into an array of floats, in the variable frames, before it is passed to np.fft.rfft.
Depending on your needs, those are the points where you can access the data without doing any processing of your own.
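To make the point about bit depth and byte order concrete, here is a rough sketch that reads the samples with the standard-library wave module instead of scipy. It assumes 16-bit PCM, which WAV files store little-endian; `read_wav_as_float` is a hypothetical helper name, not something from the code above.

```python
import wave
import numpy as np

def read_wav_as_float(path):
    """Return (sample_rate, samples) with samples scaled to [-1.0, 1.0).

    Assumption: the file holds 16-bit PCM (2 bytes per sample).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        nchannels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # '<i2' means little-endian signed 16-bit integers
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if nchannels > 1:
        # channels are interleaved: reshape to one column per channel
        samples = samples.reshape(-1, nchannels)
    return rate, samples
```

Something like `rate, samples = read_wav_as_float("abc.wav")` would then give floats that can be windowed and passed to np.fft.rfft directly.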
What I am trying to achieve is the following: I need the frequency values of a sound file (.wav) for analysis. I know a lot of programs will give a visual graph (spectrogram) of the values, but I need the raw data. I know this can be done with an FFT and should be fairly easy to script in Python, but I am not sure exactly how to do it.
So let's say the signal in a file is 0.4 s long. I would then like multiple measurements, output as an array that gives, for each timepoint the program measures, the value (frequency) it found (and possibly the power (dB) too). The complicated thing is that I want to analyse bird songs, and they often have harmonics, or the signal is spread over a range of frequencies (e.g. 1000-2000 Hz). I would like the program to output this information as well, since it is important for the analysis I would like to do with the data :)
Now there is a piece of code that looked very much like what I wanted, but I think it does not give me all the values I want... (thanks to Justin Peel for posting this to a different question :)) So I gather that I need numpy and pyaudio, but unfortunately I am not familiar with Python, so I am hoping that a Python expert can help me with this?
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import struct
import wave
import numpy as np
chunk = 2048
# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(swidth),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)
# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
# (assumes mono 16-bit samples; a final partial chunk is played but not analysed)
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply it by the Blackman window
    indata = np.array(struct.unpack("%dh" % (len(data)//swidth), data))*window
    # Take the fft and square each value
    fftData = abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0,y1,y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which+x1)*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    else:
        thefreq = which*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()
I'm not sure if this is what you want, but if you just want the FFT:
import scikits.audiolab, scipy
x, fs, nbits = scikits.audiolab.wavread(filename)
X = scipy.fft(x)
If you want the magnitude response:
import pylab
Xdb = 20*scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, len(Xdb))
pylab.plot(f, Xdb)
pylab.show()
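In recent SciPy releases `scipy.fft` is a subpackage rather than a callable function, and scikits.audiolab may be hard to install, so here is a rough equivalent of the magnitude response using numpy only. It is a sketch under assumptions: the signal is a synthetic 440 Hz tone rather than a file, and `magnitude_spectrum_db` is a hypothetical helper name.

```python
import numpy as np

def magnitude_spectrum_db(x, fs):
    """Return (freqs_hz, magnitude_db) for a real-valued 1D signal."""
    x = np.asarray(x, dtype=np.float64)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # a small floor avoids log10(0) for empty bins
    mag_db = 20.0 * np.log10(np.maximum(np.abs(X), 1e-12))
    return freqs, mag_db

# Synthetic test tone (assumption: 440 Hz, sampled at 8 kHz for 1 second)
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)

freqs, mag_db = magnitude_spectrum_db(x, fs)
peak_hz = freqs[np.argmax(mag_db)]
print(peak_hz)
```

Because the input is real, rfft returns only the non-negative frequencies, so the frequency axis runs from 0 to fs/2 instead of 0 to fs.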
I think that what you need to do is a Short-time Fourier Transform (STFT). Basically, you do multiple partially overlapping FFTs and stack the results, so that you get one spectrum for each point in time. Then you would find the peak (or peaks) for each point in time. I haven't done this myself, but I've looked into it some in the past and this is definitely the way to go forward.
There's some Python code to do a STFT here and here.
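Since the links above may not be available, here is a rough numpy-only sketch of the idea: overlapping windowed FFTs, then for each frame a list of peak frequencies with their level in dB, so that harmonics show up as additional peaks. The function name `stft_peaks` and all parameter values (frame length, hop, number of peaks, the -40 dB threshold) are illustrative assumptions, not tuned for bird song.

```python
import numpy as np

def stft_peaks(x, fs, frame_len=1024, hop=256, n_peaks=3, floor_db=-40.0):
    """For each frame, return its centre time and up to n_peaks spectral peaks.

    A bin counts as a peak if it is a local maximum and lies within
    floor_db of the strongest bin in that frame.
    """
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    results = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        db = 20.0 * np.log10(np.maximum(mag, 1e-12))
        # local maxima, excluding the first and last bin
        is_peak = (db[1:-1] > db[:-2]) & (db[1:-1] >= db[2:])
        idx = np.flatnonzero(is_peak) + 1
        # drop peaks that are too weak relative to the frame maximum
        idx = idx[db[idx] >= db.max() + floor_db]
        # strongest peaks first
        idx = idx[np.argsort(db[idx])[::-1][:n_peaks]]
        centre_time = (start + frame_len / 2.0) / fs
        results.append((centre_time, [(freqs[i], db[i]) for i in idx]))
    return results

# Synthetic 0.4 s signal: 1000 Hz fundamental plus a weaker 2000 Hz harmonic.
fs = 8000
t = np.arange(int(0.4 * fs)) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

results = stft_peaks(x, fs)
for time_s, peaks in results[:2]:
    print(time_s, [(round(f, 1), round(p, 1)) for f, p in peaks])
```

The dB values are relative to the raw FFT magnitude, not calibrated sound pressure levels. A shorter frame gives finer time resolution but coarser frequency resolution, so both values would need adjusting for real recordings.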