How to identify a sound in python? - python

Disclaimer: forgive me if I missed something, as I am new to working with sound.
I have a wav file which is a pure beep. I want to create a Python program that listens through my microphone and, every time it hears that same beep, acts on it (e.g. prints "heard").
I recorded that beep with silence at the start and end to make it more realistic. Next, I did this:
import noisereduce as nr
import librosa
import matplotlib.pyplot as plt
import numpy as np
# OG Loading
ogAudio, sampling_rate = librosa.load('.\\Beep.wav')
recordAudio, sampling_rate = librosa.load('.\\OneBeepRecord.wav')
noisy_part = recordAudio[0:25000]
reducedRecord = nr.reduce_noise(recordAudio, sampling_rate, False, noisy_part)
trimmedRecord, index = librosa.effects.trim(reducedRecord, top_db=35, frame_length=512, hop_length=64)
I picked top_db=35 because the result's shape is the closest to the shape of ogAudio.
I figured that to 'compare' the sounds I could use correlation coefficients, but the two ndarrays have different shapes:
print(ogAudio.shape, trimmedRecord.shape)
Returns:
(2981,) (2944,)
To visualize (blue is ogAudio and orange is trimmedRecord):
[Image: Plotted Arrays]
Is it possible that the recording is shorter than the sound itself?
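One possible way around the shape mismatch, offered as a sketch rather than as the accepted approach: a sliding normalized cross-correlation does not need the two arrays to have the same length, because the shorter array is slid over the longer one. The helper name find_template, the synthetic 1 kHz beep, the noise level, and the 0.8 threshold below are all assumptions made for illustration; in practice the template would be ogAudio and the longer array the (untrimmed) microphone recording.

```python
import numpy as np
from scipy.signal import correlate


def find_template(template, recording):
    """Slide the shorter array over the longer one.

    Returns (offset, score), where score is the normalized
    cross-correlation (between -1 and 1) at the best offset.
    """
    a = np.asarray(template, dtype=float)
    b = np.asarray(recording, dtype=float)
    if len(a) > len(b):
        a, b = b, a  # make sure `a` is the shorter array
    m = len(a)

    # Dot product of `a` with every length-m window of `b`
    raw = correlate(b, a, mode="valid")

    # Energy of every length-m window of `b`, via a running sum
    csum = np.concatenate(([0.0], np.cumsum(b ** 2)))
    win_energy = np.maximum(csum[m:] - csum[:-m], 0.0)

    # Skip (near-)silent windows to avoid dividing by ~0
    score = np.zeros_like(raw)
    ok = win_energy > 1e-6 * win_energy.max()
    score[ok] = raw[ok] / np.sqrt(np.sum(a ** 2) * win_energy[ok])

    offset = int(np.argmax(score))
    return offset, float(score[offset])


# Synthetic stand-ins for Beep.wav and the microphone recording
sr = 22050
t = np.arange(int(0.1 * sr)) / sr
beep = np.sin(2 * np.pi * 1000 * t)

rng = np.random.default_rng(0)
recording = 0.01 * rng.standard_normal(sr)  # one second of faint noise
recording[5000:5000 + len(beep)] += beep    # beep hidden at sample 5000

offset, score = find_template(beep, recording)
if score > 0.8:  # arbitrary threshold, tune on real data
    print("heard at sample", offset)
```

Because the score is normalized by the energy of each window, it should be fairly insensitive to how loud the recording is, which tends to matter more for microphone input than matching the exact array length.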

Related

How to struct pack an array of frequencies and get a wave in a wav file? Python [duplicate]

I want to create "heart rate monitor" effect from a 2D array in numpy and want the tone to reflect the values in the array.
You can use the write function from scipy.io.wavfile to create a wav file which you can then play however you wish. Note that for the widely supported 16-bit PCM format the array must hold integers, so if you have floats, you might want to scale them appropriately:
import numpy as np
from scipy.io.wavfile import write
rate = 44100
data = np.random.uniform(-1, 1, rate) # 1 second worth of random samples between -1 and 1
scaled = np.int16(data / np.max(np.abs(data)) * 32767)
write('test.wav', rate, scaled)
If you want Python to actually play audio, then this page provides an overview of some of the packages/modules.
For the people coming here in 2016 scikits.audiolab doesn't really seem to work anymore. I was able to get a solution using sounddevice.
import numpy as np
import sounddevice as sd
fs = 44100
data = np.random.uniform(-1, 1, fs)
sd.play(data, fs)
In Jupyter, the best option is:
from IPython.display import Audio
import numpy as np
wave_audio = np.sin(np.linspace(0, 3000, 20000))
Audio(wave_audio, rate=20000)
In addition, you could try scikits.audiolab. It features file IO and the ability to 'play' arrays. Arrays don't have to be integers. To mimic dbaupp's example:
import numpy as np
import scikits.audiolab
data = np.random.uniform(-1,1,44100)
# write array to file:
scikits.audiolab.wavwrite(data, 'test.wav', fs=44100, enc='pcm16')
# play the array:
scikits.audiolab.play(data, fs=44100)
I had some problems using scikits.audiolab, so I looked for some other options for this task. I came up with sounddevice, which seems a lot more up-to-date. I have not checked if it works with Python 3.
A simple way to perform what you want is this:
import numpy as np
import sounddevice as sd
sd.default.samplerate = 44100
time = 2.0
frequency = 440
# Generate time of samples between 0 and two seconds
samples = np.arange(44100 * time) / 44100.0
# Recall that a sinusoidal wave of frequency f has formula w(t) = A*sin(2*pi*f*t)
wave = 10000 * np.sin(2 * np.pi * frequency * samples)
# Convert it to wav format (16 bits)
wav_wave = np.array(wave, dtype=np.int16)
sd.play(wav_wave, blocking=True)
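To make the tone follow the values of an array, as the question asks, the sine formula above can be extended so that the frequency changes over time. The following is only a sketch under assumptions: the function name values_to_tone, the 220-880 Hz range, the 0.1 s per value, and the row-by-row flattening of the 2D array are all illustrative choices, not something the original answers specify.

```python
import numpy as np


def values_to_tone(values, rate=44100, seconds_per_value=0.1,
                   low_hz=220.0, high_hz=880.0, amplitude=0.5):
    """Map each array value to a pitch and return 16-bit samples."""
    # Flatten the 2D array row by row (an assumption about the desired order)
    values = np.asarray(values, dtype=float).ravel()

    # Rescale the values linearly into [low_hz, high_hz]
    span = values.max() - values.min()
    norm = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    freqs = low_hz + norm * (high_hz - low_hz)

    # One frequency per output sample
    per_sample = np.repeat(freqs, int(round(rate * seconds_per_value)))

    # Integrate the frequency to get the phase, so pitch changes do not click
    phase = 2 * np.pi * np.cumsum(per_sample) / rate
    return (amplitude * 32767 * np.sin(phase)).astype(np.int16)


tone = values_to_tone([[0, 1, 4], [9, 4, 1]])
```

The resulting array can then be passed to sd.play(tone, 44100) or to scipy.io.wavfile.write as in the other answers.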
PyGame has the module pygame.sndarray which can play numpy data as audio. The other answers are probably better, as PyGame can be difficult to get up and running. Then again, scipy and numpy come with their own difficulties, so maybe it isn't a large step to add PyGame into the mix.
http://www.pygame.org/docs/ref/sndarray.html
Another modern and convenient solution is to use pysoundfile, which can read and write a wide range of audio file formats:
import numpy as np
import soundfile as sf
data = np.random.uniform(-1, 1, 44100)
sf.write('new_file.wav', data, 44100)
Not sure of the particulars of how you would produce the audio from the array, but I have found mpg321 to be a great command-line audio player, and could potentially work for you.
I use it as my player of choice for Anki, which is written in python and has libraries that could be a great starting place for interfacing your code/arrays with audio.
Check out:
anki.sound.py
customPlayer.py

Fidelity of sound created from frequency domain data

I know it is possible to create a .wav file from frequency-domain data (magnitude + phase), but I would like to know how close that would be to the real (original) sound. Does it depend on the frequency step, for example (or on something else)?
Second question:
I need to write code that takes frequency-domain data (magnitude + phase) and builds a wav file.
To do so, I started with the following code, which creates a fake signal and takes its FFT (at this point I have the kind of input (magnitude + phase) that I would expect for my target code). But it doesn't seem to work properly. Could you please help?
import numpy as np
from scipy import pi
import matplotlib.pyplot as plt
#%matplotlib inline
from scipy.fftpack import fft
min=0
max=400
def calculateFFT (timeStep,micDataX,micDataY):
    n=micDataX.size
    FFT=np.fft.fft(micDataY)
    fft_amlitude=2*abs(FFT)/n
    fft_phase=np.angle(FFT)
    fft_freq= np.fft.fftfreq(n, d=timeStep) #not used created manually (7 lines) check pi_fFreqDomainCreateConstantBW it is kept here to compare sizes
    upper_bound=int((n)/2)
    return fft_freq[1:upper_bound],fft_amlitude[1:upper_bound],fft_phase[1:upper_bound]
def calculateI_FFT (n,amplitude_spect,phase_spect):
    data=list()
    for mag,phase in zip(amplitude_spect,phase_spect):
        data.append((mag*n/2)*(np.cos(phase)+1j* np.sin(phase)))
    full_data=list(data)
    i_data=np.fft.irfft(data)
    return i_data
#sampling rate and time vector
start_time=0 #sec
end_time= 2
sampling_rate=1000 #Hz
N=(end_time-start_time)*sampling_rate
#Freq domain peaks
peak1_hz=60 # freq of peak
peak1_mag= 25
peak2_hz=270 # freq of peak
peak2_mag= 2
#Vibration data generation
time =np.linspace(start_time,end_time,N)
vib_data=peak1_mag*np.sin(2*pi*peak1_hz*time)+peak2_mag*np.sin(2*pi*peak2_hz*time)
#Data plotting
plt.plot(time[min:max],vib_data[min:max])
# fft
time_step=1/sampling_rate
fft_freq,fft_data,fft_phase=calculateFFT(time_step,time,vib_data)
#ifft
i_data=calculateI_FFT(N,fft_data,fft_phase)
#plotting
plt.plot(time[min:max],i_data[min:max])
plt.xlabel("Time (s)")
plt.ylabel("Vibration (g)")
plt.title("Time domain")
plt.show()
A screenshot of the output signal is attached (blue is the original signal, orange is the reconstructed one).
Thank you!
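On the fidelity question, a small experiment may help. If every bin of the one-sided spectrum (including bin 0 and the Nyquist bin) is kept, magnitude plus phase carries the same information as the samples, so the inverse transform should match the original up to floating-point rounding. The sketch below uses the same two peaks as the question; the variable names are mine, and the reading that the slice [1:upper_bound] in the question drops bin 0 and so shifts every component down by one bin is my interpretation of that code, not something confirmed in the thread.

```python
import numpy as np

rate = 1000                       # Hz, as in the question
n = 2000                          # two seconds of samples
t = np.arange(n) / rate
signal = 25 * np.sin(2 * np.pi * 60 * t) + 2 * np.sin(2 * np.pi * 270 * t)

# One-sided spectrum: n // 2 + 1 bins, from 0 Hz up to Nyquist
spectrum = np.fft.rfft(signal)
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)

# Rebuild the complex spectrum from magnitude and phase, then invert it
rebuilt = np.fft.irfft(magnitude * np.exp(1j * phase), n=n)
max_error = np.max(np.abs(rebuilt - signal))

# Dropping bin 0 moves every component down by one bin (0.5 Hz here)
shifted = np.fft.irfft((magnitude * np.exp(1j * phase))[1:], n=n)
shifted_error = np.max(np.abs(shifted - signal))

print(max_error, shifted_error)
```

In this setup the frequency step is rate / n, so it is set by how long the signal is; a coarser step only costs fidelity if bins were actually discarded or resampled along the way.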

Play square wave SciPy and PyAudio

I'm trying to play square waves generated using SciPy with PyAudio but I get the error
TypeError: len() of unsized object
which is kind of strange because the square wave object should have a size, right?
RATE = 48000
p = pyaudio.PyAudio()
stream = p.open(format = pyaudio.paInt16,
channels = 2,
rate = RATE,
output = True)
# ... inside a loop
wav = signal.square(2*math.pi*FREQ*t)
wav = wav.astype(np.int16)
stream.write(wav) # crash here
The crash happens on the first iteration of the loop, so I suppose the loop is not a problem.
I get the same error. However, you are omitting some information, so I will assume these are your imports:
import pyaudio
import math
import numpy as np
from scipy import signal
And that
FREQ = 440
It looks like the variable you are iterating over is t and that it is a scalar. You might have good reasons to do this, but I don't think that is how scipy.signal is meant to be used. If you use a vector t instead:
t = np.linspace(0, 2)
Then signal.square(...) and stream.write(wav.astype(np.int16)) work without problems.
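Building on the vector approach, here is a hedged sketch of preparing a whole buffer before handing it to the stream. Two details are my own additions rather than part of the answer above: signal.square returns values of -1 and 1, so the wave is scaled before the int16 cast to make it audible, and the mono wave is duplicated into two interleaved columns because the stream in the question was opened with channels = 2. DURATION and the 0.3 volume factor are arbitrary. The PyAudio call is left as a comment so the snippet runs without an audio device.

```python
import numpy as np
from scipy import signal

RATE = 48000
FREQ = 440
DURATION = 0.5  # seconds (arbitrary)

# A vector of sample times instead of a scalar t
t = np.arange(int(RATE * DURATION)) / RATE

# Square wave in {-1, 1}, scaled to 30% of the int16 range
mono = signal.square(2 * np.pi * FREQ * t)
pcm = (0.3 * 32767 * mono).astype(np.int16)

# Interleave left/right samples for a 2-channel stream
stereo = np.column_stack((pcm, pcm))
frames = stereo.tobytes()

# With the stream from the question (paInt16, channels=2, rate=RATE):
# stream.write(frames)
```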

Matplotlib Magnitude_spectrum Units in Python for Comparing Guitar Strings

I'm using matplotlib's magnitude_spectrum to compare the tonal characteristics of guitar strings. magnitude_spectrum shows the y axis as having units of "Magnitude (energy)". I use two different 'processes' to compare the FFT. Process 2 (for lack of a better description) is much easier to interpret; code and graphs are below.
My questions are:
1. In terms of units, what does "Magnitude (energy)" mean and how does it relate to dB?
2. Using #Process 2 (see code & graphs below), what type of units am I looking at, dB?
3. If #Process 2 is not dB, then what is the best way to scale it to dB?
My code below (simplified) shows an example of what I'm talking about/looking at.
import numpy as np
from scipy.io.wavfile import read
from pylab import plot, psd, magnitude_spectrum
import matplotlib.pyplot as plt
# Hello Signal!!!
# Raw string, so the backslashes in the Windows path are not read as escapes
(fs, x) = read(r'C:\Desktop\Spectral Work\EB_AB_1_2.wav')
# Remove silence from the beginning of the signal with a threshold of 1000
def indices(a, func):
    # Equivalent of find() in MATLAB: indices of the values for which func is true
    return [i for (i, val) in enumerate(a) if func(val)]
# Make the signal smaller so it uses fewer resources
x_tiny = x[0:100000]
# Threshold is 1000; [0] picks the first index whose value is greater than 1000
thresh = indices(x_tiny, lambda y: y > 1000)[0]
# Back the signal up 20 samples, so as not to ignore the initial pluck sound
thresh_start = thresh-20
# Starts at thresh_start, ends at the end of the signal
analysis_signal = x[thresh_start-1:]
# Split the signal so it is 1 second long
one_sec = 1*fs
onesec = x[thresh_start-1:one_sec+thresh_start-1]
# process 1
(spectrum, freqs, _) = magnitude_spectrum(onesec, Fs=fs)
# process 2
spectrum1 = spectrum/len(spectrum)
I don't know how to bulk-process multiple .wav files, so I run this code separately on a whole bunch of different .wav files and put the results into Excel to compare. But for the sake of not looking at ugly graphs, I graphed it in Python. Here's what #process1 and #process2 look like when graphed:
Process 1
Process 2
Magnitude is just the absolute value of the frequency spectrum. As you have labelled in Process 1, "energy" is a good way to think about it.
Both Process 1 and Process 2 are in the same units. The only difference is that the values in Process 2 have been divided by the total length of the array (a scalar, hence no change of units). Normally this happens as part of the FFT, but sometimes it does not (e.g. numpy.fft doesn't include the divide by length).
The easiest way to scale it to dB is:
(spectrum, freqs, _) = magnitude_spectrum(onesec, Fs=fs, scale='dB')
If you wanted to do this yourself then you would need to do something like:
spectrum2 = 20*np.log10(spectrum)
Note that I'm not sure whether you should be applying the /len(spectrum) or not. I would suggest using scale='dB'.
To convert to dB, take the log of any non-zero spectrum magnitudes, and scale (scale to match a calibrated mic and sound source if available, or otherwise use an arbitrary scale to make the levels look familiar), before plotting.
For zero magnitude values, perhaps just replace or clamp the log with whatever you want to be on the bottom of your log plot (certainly not negative-infinity).
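The clamping idea can be sketched in a few lines. The function name to_db, the -120 dB floor, and the reference level of 1.0 are assumptions chosen for illustration; without a calibrated microphone the reference is arbitrary, so the result is only a relative dB scale.

```python
import numpy as np


def to_db(magnitude, floor_db=-120.0, ref=1.0):
    """Convert linear magnitudes to dB relative to `ref`.

    Values below the floor (including exact zeros) are clamped to
    `floor_db` instead of becoming negative infinity.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    floor = ref * 10 ** (floor_db / 20)
    return 20 * np.log10(np.maximum(magnitude, floor) / ref)


levels = to_db([1.0, 0.1, 0.0])
```

Since dividing the spectrum by a constant such as len(spectrum) only shifts every dB value by the same amount, the shape of the plotted curve should not depend on whether that division is applied.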

