How to get sound level (dB) of audio in Python?

I am wondering how to get the sound pressure level in dB. The input should be the signal from the PC microphone (a real-time audio signal), and the output should be the real-time sound pressure level of that signal. This is my simple code:
import pyaudio
import numpy as np
import wave
from threading import Thread
from pysine import sine
import math
import time

def print_sound():
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    pa = pyaudio.PyAudio()
    stream = pa.open(format=FORMAT,
                     channels=CHANNELS,
                     rate=RATE,
                     input=True,
                     frames_per_buffer=CHUNK)
    buffer = []
    while True:
        string_audio_data = stream.read(CHUNK)
        audio_data = np.frombuffer(string_audio_data, np.int16)
        volume_norm = np.linalg.norm(audio_data) * 10
        dfft = 10. * np.log10(abs(np.fft.rfft(audio_data)))
        print(int(volume_norm))

print_sound()
and this is the warning I get:

Warning (from warnings module):
  File "C:\Users\Admin\1.py", line 27
    dfft = 10.*np.log10(abs(np.fft.rfft(audio_data)))
RuntimeWarning: divide by zero encountered in log10

As @Thierry pointed out above, it will not be possible to get a dB SPL answer without calibrating your speaker/microphone system with actual measurements. It is possible (as you seem to have done with your code snippet) to report relative levels; see the sketch after the list below.
These are the minimum requirements for reporting absolute dB SPL:
- dynamic range of your ADC (max/min voltage range): what is the loudest sound that can be captured by the ADC of the computer?
- the sensitivity of your microphone (mV/Pa): for a sound pressure of X pascals, how many mV (or sample counts) are produced?
- speaker calibration: given an input signal with RMS level Y, how many pascals (and thus what dB SPL) will the output sound be?
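Until you have those measurements, you can still report a well-defined relative level. Here is a minimal sketch (mine, not from the question) that converts an int16 block to dBFS, with a small floor to avoid the log10-of-zero warning from the question; rms_dbfs and CAL_OFFSET are hypothetical names:
import numpy as np

def rms_dbfs(block, full_scale=32768.0):
    # RMS of the int16 block, expressed relative to digital full scale
    rms = np.sqrt(np.mean(np.square(block.astype(np.float64))))
    return 20.0 * np.log10(max(rms, 1e-10) / full_scale)

# With the three calibration values above you could then report, e.g.:
# db_spl = rms_dbfs(block) + CAL_OFFSET
# where CAL_OFFSET is measured once against a known reference source.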

Related

slow down spoken audio (not from mp3/wav) using python

I need to slow down short bursts of spoken audio, captured from a mic, and then play them back in real time in a Python script. I can capture and play back audio fine, without changing the speed, using an input and an output stream with PyAudio, but I can't work out how to slow it down.
I've seen this post, which uses pydub to do something similar for audio from a file, but I can't work out how to modify it for my purposes.
Just to stress the key point from the question title, "(not from mp3/wav or any other file type)": I want to do this in real time with short bursts, ideally <= ~0.1 s, so I just want to work with data read in from a PyAudio stream.
Does anyone who has experience with pydub know if it might do what I need?
NB I realise that the output would lag further and further behind and that there might be buffering issues, however I'm only doing this for short bursts of up to 30 seconds and only want to slow the speech down by ~10%.
So it turns out it was very, very simple.
Once I looked into the pydub and pyaudio code bases, I realised that by simply specifying a lower value for the 'rate' parameter on the output audio stream (speaker) than on the input audio stream (mic), the stream.write() function would handle it for me.
I had been expecting that a physical manipulation of the raw data would be required to stretch it into a larger buffer.
Here's a simple example:
import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024 * 4
# simply modify the value for the 'rate' parameter to change the playback speed
# <1 === slow down; >1 === speed up
FRAMERATE_OFFSET = 0.8

audio = pyaudio.PyAudio()

# output stream
stream_out = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                        output=True)

# open input stream to start recording mic audio
stream_in = audio.open(format=FORMAT,
                       channels=CHANNELS,
                       rate=FRAME_RATE,
                       input=True)

for i in range(1):
    # modify the chunk multiplier below to capture longer time durations
    data = stream_in.read(CHUNK * 25)
    stream_out.write(data)

stream_out.stop_stream()
stream_out.close()
audio.terminate()
To make this operational I'll need to set up a shared-memory data buffer and a subprocess to handle the output, so that I don't miss anything significant from the input signal.
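As a rough sketch of that decoupling (using a thread and a queue.Queue as the shared buffer rather than a true subprocess; this is untested scaffolding, and all names are placeholders):
import queue
import threading

import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024 * 4
FRAMERATE_OFFSET = 0.8

audio = pyaudio.PyAudio()
buf = queue.Queue()  # shared buffer between capture and playback

def playback():
    stream_out = audio.open(format=FORMAT, channels=CHANNELS,
                            rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                            output=True)
    while True:
        data = buf.get()
        if data is None:  # sentinel: capture is finished
            break
        stream_out.write(data)
    stream_out.close()

t = threading.Thread(target=playback)
t.start()

stream_in = audio.open(format=FORMAT, channels=CHANNELS,
                       rate=FRAME_RATE, input=True)
for _ in range(25):  # capture ~2.3 s; the reader never waits on playback
    buf.put(stream_in.read(CHUNK))
buf.put(None)

t.join()
stream_in.close()
audio.terminate()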
Here is what I did.
import wave

channels = 1
swidth = 2
multiplier = 0.2

spf = wave.open('flute-a4.wav', 'rb')
fr = spf.getframerate()  # frame rate
signal = spf.readframes(-1)

wf = wave.open('ship.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(swidth)
wf.setframerate(fr * multiplier)
wf.writeframes(signal)
wf.close()
I used the flute sample from this repo.
As mentioned in the comments, by simply increasing or decreasing the sampling frequency / frame rate, you can speed up or slow down audio. If you are planning to do it from a microphone in real time, one idea would be to record in chunks of a few seconds, play the slowed-down audio, and then move on to recording again.
Here's an example using sounddevice, which is basically a slight mod of my answer here.
We record audio for 4 seconds in a loop, 3 times, and play each chunk back immediately with a frame-rate offset (> 1 to speed up, < 1 to slow down). A time delay of 1 second is added so audio playback can complete before we start a new chunk.
import sounddevice as sd
import numpy as np
import time

fs = 44100
duration = 4  # seconds
#fs_offset = 1.3  # speedup
fs_offset = 0.8  # slowdown

for count in range(1, 4):
    myrecording = sd.rec(duration * fs, samplerate=fs, channels=2, dtype='float64')
    print("Recording Audio chunk {} for {} seconds".format(count, duration))
    sd.wait()
    print("Recording complete, Playing chunk {} with offset {} ".format(count, fs_offset))
    sd.play(myrecording, fs * fs_offset)
    sd.wait()
    print("Playing chunk {} Complete".format(count))
    time.sleep(1)
Output:
$python sdaudio.py
Recording Audio chunk 1 for 4 seconds
Recording complete, Playing chunk 1 with offset 0.8
Playing chunk 1 Complete
Recording Audio chunk 2 for 4 seconds
Recording complete, Playing chunk 2 with offset 0.8
Playing chunk 2 Complete
Recording Audio chunk 3 for 4 seconds
Recording complete, Playing chunk 3 with offset 0.8
Playing chunk 3 Complete
Here's an example using PyAudio for recording from the microphone and pydub for playback. You could also use pyaudio's blocking write capability to modify the outgoing audio; I used pydub since you referred to a pydub-based solution. This is a mod of code from here.
import pyaudio
import wave
from pydub import AudioSegment
from pydub.playback import play
import time

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
#FRAMERATE_OFFSET = 1.4  # speedup
FRAMERATE_OFFSET = 0.7  # slowdown
WAVE_OUTPUT_FILENAME = "file.wav"

def get_audio():
    audio = pyaudio.PyAudio()
    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()
    # save to file, with the frame rate scaled by the offset
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE * FRAMERATE_OFFSET)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

for count in range(1, 4):
    print("recording segment {} ....".format(count))
    get_audio()
    print("Playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    audio_chunk = AudioSegment.from_wav(WAVE_OUTPUT_FILENAME)
    play(audio_chunk)
    print("Finished playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    time.sleep(1)
Output:
$python slowAudio.py
recording segment 1 ....
Playing segment 1 .... at offset 0.7
Finished playing segment 1 .... at offset 0.7
recording segment 2 ....
Playing segment 2 .... at offset 0.7
Finished playing segment 2 .... at offset 0.7
recording segment 3 ....
Playing segment 3 .... at offset 0.7
This question has been answered here.
from pydub import AudioSegment
sound = AudioSegment.from_file(…)

def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second.
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # Convert the sound with the altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rates (like 44.1k).
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)

slow_sound = speed_change(sound, 0.75)
fast_sound = speed_change(sound, 2.0)

Second .wav file plays enhanced noise of first .wav file instead of reduced noise

I am trying to achieve active noise reduction in Python. My project is composed of two sets of code:
sound recording code
sound filtering code
What I'm aiming for is that when you run the program, it starts recording through the microphone. After you've finished recording, there will be a saved file called "file1.wav". When you play that file, it is the one you recorded originally. After that, you put "file1.wav" through a filter by calling fltrd(). This creates a second wav file in the same folder, and that second wav file is supposedly the one with less/reduced noise. My problem is that the second wav file is enhancing noise instead of reducing it. Can anyone please troubleshoot my code? :(
Here is my code below:
import pyaudio
import wave
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file1.wav"

audio = pyaudio.PyAudio()

# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("finished recording")

# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

x = scipy.io.wavfile.read('file1.wav')
n = x[1]
y = np.zeros(n.shape)
y = n.cumsum(axis=0)

def fltrd():
    n, x = scipy.io.wavfile.read('file1.wav')
    a2 = x.cumsum(axis=0)
    a3 = np.asarray(a2, dtype=np.int16)
    scipy.io.wavfile.write('file2.wav', n, a3)
Actual noise filtering is difficult and computationally intensive. However, a simple noise filter using high- and low-pass filters can easily be created with the pydub library. See here for more details (installation, requirements, etc.).
Also see here for more details on low- and high-pass filters using pydub.
The basic idea is to take an audio file and pass it through both a low-pass and a high-pass filter, so that audio above and below certain thresholds is strongly attenuated (in effect demonstrating filtering).
This will not affect any noise falling inside the pass-band, though, for which you will need to look at other noise-cancellation techniques.
from pydub import AudioSegment
from pydub.playback import play

# low_pass_filter / high_pass_filter come registered on AudioSegment
# via pydub.effects, so no extra imports are needed
song = AudioSegment.from_wav('file1.wav')

# freq in Hz, adjust as per your needs
new = song.low_pass_filter(5000).high_pass_filter(200)
play(new)
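For comparison, here's a sketch of the same band-pass idea using scipy.signal, which the question already imports; the 200 Hz and 5 kHz cutoffs mirror the pydub example above, and the 4th-order Butterworth is an arbitrary choice:
import numpy as np
import scipy.io.wavfile
import scipy.signal as sp

rate, data = scipy.io.wavfile.read('file1.wav')
# 4th-order Butterworth band-pass between 200 Hz and 5 kHz
sos = sp.butter(4, [200, 5000], btype='bandpass', fs=rate, output='sos')
filtered = sp.sosfilt(sos, data.astype(np.float64), axis=0)  # axis 0 = time
scipy.io.wavfile.write('file2.wav', rate, filtered.astype(np.int16))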

cracking sound sine tone in pyaudio

I am using Python and PyAudio to stream a pure sine tone using a callback method, in order to later modulate the sound via user input. Everything is fine except that when I run the code, I get 1-2 seconds of a crackling, buzzing sound associated with the warning message
ALSA lib pcm.c:7339:(snd_pcm_recover) underrun occurred
After that, the sine tone is streamed correctly. Any hints about how to remove the initial popping sound?
Here is the code that streams the sound for one second:
import pyaudio
import time
import numpy as np

CHANNELS = 1
RATE = 44100
freq = 600
CHUNK = 1024
lastchunk = 0

def sine(current_time):
    global freq, lastchunk
    length = CHUNK
    factor = float(freq) * 2 * np.pi / RATE
    this_chunk = np.arange(length) + lastchunk
    lastchunk = this_chunk[-1]
    return np.sin(this_chunk * factor)

def get_chunk():
    data = sine(time.time())
    return data * 0.1

def callback(in_data, frame_count, time_info, status):
    chunk = get_chunk() * 0.25
    data = chunk.astype(np.float32).tobytes()
    return (data, pyaudio.paContinue)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
                channels=CHANNELS,
                rate=RATE,
                output=True,
                stream_callback=callback)
stream.start_stream()
time.sleep(1)
stream.stop_stream()
stream.close()
Cheers
PortAudio (the library behind PyAudio) allows you to specify a block size, which is typically called CHUNK in the PyAudio examples. If you don't specify one, the default is 0, which in PortAudio terms means that the block size will be chosen automatically and will even change from callback to callback!
To check that, try printing frame_count (which is another name for the block size) within the callback. I suspect that PortAudio chooses a too small block size in the beginning and when that causes underruns, it increases the block size. Am I right?
To avoid this, you should specify a fixed block size from the beginning, using:
stream = p.open(..., frames_per_buffer=CHUNK, ...)
... where frames_per_buffer is yet another name for the block size.
This also makes more sense since up to now you use length = CHUNK in your code without knowing the actual block size!
If this still leads to underruns, you can try further increasing the block size to 2048.
Finally, let me take the liberty to make a shameless plug for my own PortAudio wrapper, the sounddevice module. It basically does the same as PyAudio, but it's easier to install, IMHO has a nicer API and it supports NumPy directly, without you having to do the manual conversions.
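For illustration, here is a sketch of the same fixed-block-size sine with sounddevice (my code, not from the question; the running sample index plays the role of the question's lastchunk and keeps the phase continuous across blocks):
import numpy as np
import sounddevice as sd

RATE = 44100
freq = 600
start = 0  # running sample index, keeps the phase continuous across blocks

def callback(outdata, frames, time, status):
    global start
    t = (start + np.arange(frames)) / RATE
    outdata[:, 0] = 0.1 * np.sin(2 * np.pi * freq * t)
    start += frames

with sd.OutputStream(samplerate=RATE, blocksize=1024, channels=1,
                     callback=callback):
    sd.sleep(1000)  # play for one second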
The accepted answer still didn't give perfect audio quality. Judging from what I heard (I didn't measure), there are sometimes dropouts and/or phase jumps in the sine. Based on the code in the PyAudio examples and what can be found here, I came to this solution:
"""PyAudio Example: Play a wave file (callback version)."""
import pyaudio
import time
import math
from itertools import count
import numpy as np

RATE = 96000

# More efficient calculation, but period = int(framerate / frequency) causes
# high granularity for higher frequencies (15 kHz becoming 16 kHz, for instance)
# def sine_wave(frequency=1000, framerate=RATE, amplitude=0.5):
#     period = int(framerate / frequency)
#     amplitude = max(min(amplitude, 1), 0)
#     lookup_table = [float(amplitude) * math.sin(2.0 * math.pi * float(frequency) *
#                     (float(i % period) / float(framerate))) for i in range(period)]
#     return (lookup_table[i % period] for i in count(0))

def sine_wave(frequency=440.0, framerate=RATE, amplitude=0.5):
    amplitude = max(min(amplitude, 1), 0)
    return (float(amplitude) * math.sin(2.0 * math.pi * float(frequency) * (float(i) / float(framerate)))
            for i in count(0))

sine = [sine_wave(150), sine_wave(1500), sine_wave(15000)]

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

# define callback (2)
def callback(in_data, frame_count, time_info, status):
    wave = sine[0]
    data = [next(wave) for _ in range(frame_count)]
    ret_array = np.array(data).astype(np.float32).tobytes()
    return (ret_array, pyaudio.paContinue)

# open stream using callback (3)
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=RATE,
                frames_per_buffer=1024,
                output=True,
                stream_callback=callback)

# start the stream (4)
stream.start_stream()

# Insert your own solution to end the sound
time.sleep(3)

# stop stream (6)
stream.stop_stream()
stream.close()

# close PyAudio (7)
p.terminate()
This should be able to play the sine until your hardware dies or the next power outage... But I only tested for half an hour ;-)

Transforming small WAV files into a single frequency value (PYTHON)

I need a program that transforms tones recorded by the microphone into keyboard presses. Example: if somebody sings into the microphone at a frequency between 400 Hz and 600 Hz and the average tone is 550 Hz, then I store the average frequency in the variable 'tom', and the key "G" of my keyboard is pressed.
Even though I'm a newbie at programming, I searched and figured out a way to do this using PyAudio in Python: by recording small WAV files, I could then read them and get a number as the average frequency, and with this number and some ifs and elifs, press keys (it's not that hard to find code to press keys), all inside an enormous while loop that repeats the process while the program runs. That way I would have a pipeline of talking, reading the small files the talk produces, and then transforming them into key presses, according to the tone.
The main problem is that I have no idea how to transform the WAV files I've been recording into a single average frequency value. Can somebody help me with this? Or with the big picture? Because I know this method is not a really good one. Thanks! I was using this code to record, which I found on the PyAudio website:
import pyaudio
import wave
import numpy as np

CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done chunk")
stream.stop_stream()
stream.close()
p.terminate()
To press the keys, I use this other code:
import win32com.client

shell = win32com.client.Dispatch("WScript.Shell")
if tom >= 400 and tom <= 500:
    shell.SendKeys("G")
PS: I'm using Windows.
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
Here is an example that reads a single second from a stereo WAV file and extracts the frequencies:
import wave
import numpy as np

with wave.open('input.wav', 'r') as wr:
    sz = wr.getframerate()  # read and process 1 second
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # separate into left and right channels
    lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
lf and rf are numpy arrays containing the intensity of each frequency. Using numpy.argmax you can get the index (the frequency bin) with the highest strength.
But try it out and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example, you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zeroing out that part of the data.
Example for 60 Hz:
lf[55:65], rf[55:65] = 0, 0
The original answer included an example plot made with matplotlib from a one-second sound clip: the top graph showed the samples from the WAV file, while the bottom one showed the same data converted to frequencies. It was a recording of a person speaking, so there were many peaks; the highest was around 200 Hz.
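To close the loop on the numpy.argmax suggestion: because the snippet reads exactly one second of audio, the rfft bins are spaced 1 Hz apart, so the bin index is already the frequency in Hz. A sketch continuing from lf above:
peak_bin = int(np.argmax(lf))  # with a 1-second window, bin index == Hz
print('strongest left-channel frequency: {} Hz'.format(peak_bin))

# For other window lengths, convert the index explicitly (sketch):
# freqs = np.fft.rfftfreq(sz, d=1.0 / framerate)
# peak_hz = freqs[np.argmax(lf)]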

naive filtering using fft in python

I'm trying to write a naive low-pass filter in Python.
Values of the Fourier transform above a specific frequency should be set to 0, right?
As far as I know, that should work.
But after the inverse Fourier transform, what I get is just noise.
Program1 records RECORD_SECONDS from the microphone and writes the FFT data to the file fft.bin.
Program2 reads from this file, does the IFFT and plays the result on the speakers.
In addition, I figured out that any change to the FFT data, even a very small one, causes Program2 to fail.
Where is my mistake?
Program1:
import pickle
import pyaudio
import wave
import numpy as np

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1  # 1-mono, 2-stereo
RATE = 44100
RECORD_SECONDS = 2

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
f = open("fft.bin", "wb")
Tsamp = 1. / RATE
# arguments for the fft
fft_x_arg = np.fft.rfftfreq(CHUNK / 2, Tsamp)
# max freq
Fmax = 4000

print("* recording")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    # read one chunk from mic
    SigString = stream.read(CHUNK)
    # convert string to int
    SigInt = np.fromstring(SigString, 'int')
    # calculate fft
    fft_Sig = np.fft.rfft(SigInt)
    """
    # apply low pass filter, maximum freq = Fmax
    j = 0
    for value in fft_x_arg:
        if value > Fmax:
            fft_Sig[j] = 0
        j = j + 1
    """
    # write one chunk of data to file
    pickle.dump(fft_Sig, f)
print("* done recording")

f.close()
stream.stop_stream()
stream.close()
p.terminate()
Program2:
import pyaudio
import pickle
import numpy as np

CHUNK = 1024

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=44100/2,  # anyway, why does 44100 Hz play twice as fast as normal?
                output=True)
f = open("fft.bin", "rb")

# load first value from file
fft_Sig = pickle.load(f)
# calculate ifft and cast to int
SigInt = np.int16(np.fft.irfft(fft_Sig))
# convert once more - to string
SigString = np.ndarray.tostring(SigInt)

while SigString != '':
    # play sound
    stream.write(SigString)
    fft_Sig = pickle.load(f)
    SigInt = np.int16(np.fft.irfft(fft_Sig))
    SigString = np.ndarray.tostring(SigInt)

f.close()
stream.stop_stream()
stream.close()
p.terminate()
FFTs operate on complex numbers. You might be able to feed them real numbers (which will get converted to complex by setting the imaginary part to 0) but their outputs will always be complex.
This is probably throwing off your sample counting by 2 among other things. It should also be trashing your output because you're not converting back to real data.
Also, keep in mind where the 1/N scale factor lives: some FFT libraries leave it for you to apply to the IFFT output (NumPy applies it inside the inverse transform). And you need to keep in mind that the frequency range of an FFT is half negative, i.e. it spans approximately -1/(2T) <= f < 1/(2T). BTW, 1/(2T) is known as the Nyquist frequency, and for real input data the negative half of the FFT output mirrors the positive half as a complex conjugate (i.e. for y(f) = F{x(t)}, where F{} is the forward Fourier transform, y(-f) == conj(y(f))).
I think you need to read up a bit more on DSP algorithms using FFTs. What you're trying to do is called a brick wall filter.
Also, something that will help you a lot is matplotlib, which will help you see what the data looks like at intermediate steps. You need to look at this intermediate data to find out where things are going wrong.
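To make that concrete, here is a minimal sketch of a brick-wall low pass on a single chunk, with the pieces lined up correctly (int16 dtype to match paInt16, rfftfreq over the full chunk length, and irfft to get real samples back); the 4000 Hz cutoff is taken from Program1:
import numpy as np

RATE = 44100
CHUNK = 1024
Fmax = 4000  # cutoff from the question

def brick_wall_lowpass(chunk_bytes):
    # paInt16 data must be read as int16, not the platform's default 'int'
    sig = np.frombuffer(chunk_bytes, dtype=np.int16)
    spectrum = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / RATE)
    spectrum[freqs > Fmax] = 0  # zero everything above the cutoff
    filtered = np.fft.irfft(spectrum, n=len(sig))  # real output, same length
    return np.int16(filtered).tobytes()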
