Detect and record a sound with Python

I'm using this program to record a sound in python:
Detect & Record Audio in Python
I want to change the program to start recording when sound is detected by the sound card input. I should probably compare the input sound level in each chunk, but how do I do that?

You could try something like this:
based on this question/answer
# this is the threshold that determines whether or not sound is detected
THRESHOLD = 0

# open your audio stream

# wait until the sound data breaks some level threshold
while True:
    data = stream.read(chunk)
    # check level against threshold, you'll have to write getLevel()
    if getLevel(data) > THRESHOLD:
        break

# record for however long you want

# close the stream
You'll probably want to play with your chunk size and threshold values until you get the desired behavior.
Edit:
You can use the built-in audioop module to find the root-mean-square (RMS) of a sample, which is generally how you would get the level.
import audioop
import pyaudio

chunk = 1024

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=44100,
                input=True,
                frames_per_buffer=chunk)

data = stream.read(chunk)
rms = audioop.rms(data, 2)  # width=2 for format=paInt16

Detecting when there isn't silence is usually done by taking the root mean square (RMS) of a chunk of the sound and comparing it with a threshold value that you set (the value will depend on how sensitive your mic is, among other things, so you'll have to adjust it). Depending on how quickly you want the mic to detect sound, you might also want to lower the chunk size, or compute the RMS for overlapping chunks of data.
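For illustration, here is a minimal sketch of that overlapping-chunks idea using the built-in audioop module (the window size and the 50% overlap are assumptions to tune alongside the threshold):

import audioop

def windowed_rms(data, window_bytes=512, sample_width=2):
    # compute RMS over 50%-overlapping windows of a chunk of 16-bit
    # samples, so a short burst of sound is noticed sooner
    step = window_bytes // 2
    return [audioop.rms(data[i:i + window_bytes], sample_width)
            for i in range(0, len(data) - window_bytes + 1, step)]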

How to do it is indicated in the link you gave:
print "* recording"
for i in range(0, 44100 / chunk * RECORD_SECONDS):
data = stream.read(chunk)
# check for silence here by comparing the level with 0 (or some threshold) for
# the contents of data.
# then write data or not to a file
You have to set the threshold variable and, each time data is read in the loop, compare it with the average value (the amplitude) or some other related parameter.
You could have two nested loops: the first to trigger the recording, and the other to continuously save sound data chunks after that, as in the sketch below.
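A minimal sketch of what those two loops could look like, reusing stream, chunk and THRESHOLD from the snippets above with audioop.rms as the level measure (the stop condition is an assumption; in practice you would wait for several quiet chunks before stopping):

import audioop

frames = []
# outer loop: wait until some chunk breaks the threshold
while True:
    data = stream.read(chunk)
    if audioop.rms(data, 2) > THRESHOLD:
        frames.append(data)
        # inner loop: keep saving chunks until the level drops again
        while True:
            data = stream.read(chunk)
            frames.append(data)
            if audioop.rms(data, 2) <= THRESHOLD:
                break
        break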

Related

Read left channel of wav data into numpy array

I'm using pyaudio to take input from a microphone or read a wav file, and analyze the stream while playing it. I want to only analyze the right channel if the input is stereo. I've been able to extract the data and convert to integers using loops:
levels = []
length = len(data)
if channels == 1:
    for i in range(length // 2):
        volume = abs(struct.unpack('<h', data[i:i+2])[0])
        levels.append(volume)
elif channels == 2:
    for i in range(length // 4):
        j = 4 * i + 2
        volume = abs(struct.unpack('<h', data[j:j+2])[0])
        levels.append(volume)
I think this is working correctly; I know it runs without error on a laptop and a Raspberry Pi 3, but it appears to consume too much time on a Raspberry Pi Zero when simultaneously streaming the output to a speaker. I figure that eliminating the loop and using numpy may help. I assume I need np.ndarray with (CHUNK,) as the first parameter, where CHUNK is my chunk size for analyzing the audio (I'm using 1024), and the format would be '<h', as in the struct code above. But I'm at a loss as to how to code it correctly for each of the two cases (mono, and right channel only for stereo). How do I create the numpy arrays for each of the two cases?
You are reading 16-bit integers from a binary file here. It seems that you first read the data into the data variable with something like data = f.read(), which is not shown. Then you do:
for i in range(length // 2):
    volume = abs(struct.unpack('<h', data[i:i+2])[0])
    levels.append(volume)
By the way, that code is wrong: it should be abs(struct.unpack('<h', data[2*i:2*i+2])[0]), otherwise you are overlapping bytes from different values.
To do the same with numpy, you should just do this (instead of both the f.read() and the whole loop):
data = np.fromfile(f, dtype='<i2')
This is over 100 times faster than the manual loop above in my test on 5 MB of data.
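If you want to verify the speedup on your own hardware, here is a rough benchmark sketch (the exact factor will vary; the synthetic buffer just stands in for your file data):

import struct
import timeit
import numpy as np

raw = np.random.randint(-32768, 32768, 500000, dtype=np.int16).tobytes()

def with_loop():
    return [abs(struct.unpack('<h', raw[2*i:2*i+2])[0])
            for i in range(len(raw) // 2)]

def with_numpy():
    # widen to int32 so abs(-32768) does not overflow
    return np.abs(np.frombuffer(raw, dtype='<i2').astype(np.int32))

print(timeit.timeit(with_loop, number=3))
print(timeit.timeit(with_numpy, number=3))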
In the second case, you have interleaved left-right-left-right values. Again you can read them all (assuming you have enough memory) and then access only one half:
data = np.fromfile(f, dtype='<i2')
left = data[::2]
right = data[1::2]
This processes everything, even though you need just one half, but it is still much much faster.
EDIT: If the data is not coming from a file, np.fromfile can be replaced with np.frombuffer. Then you have this:
channel_data = np.frombuffer(data, dtype='<i2')
if channels == 2:
    channel_data = channel_data[1::2]
levels = np.abs(channel_data)

Numpy RFFT/IRFFT volume

I'm doing an rfft and irfft from a wave file:
samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
output = np.fft.irfft(fftData).astype(data.dtype)
So it reads from a file and then does an rfft. However, it produces a lot of noise when I play the audio with a PyAudio stream. I tried to search for an answer to this question and used this solution:
rfft or irfft increasing wav file volume in python
That is why I have the .astype(data.dtype) when doing the irfft. It reduced the noise a bit, but it still sounds all wrong.
This is the playback code, where p is the PyAudio instance:
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)
stream.write(output)
stream.stop_stream()
stream.close()
p.terminate()
So what am I doing wrong here?
Thanks!
Edit: I also tried to use .astype(dtype=np.float32) when doing the irfft, since pyaudio uses that format when streaming the audio. However, it was still noisy.
The best working solution so far seems to be normalization by the median value and using .astype(np.float32), since the pyaudio output is float32:
samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
fftData = np.divide(fftData, np.median(fftData))
output = np.fft.irfft(fftData).astype(dtype=np.float32)
If anyone has a better solution I'd like to hear it. I tried mean normalization but it still resulted in clipping audio, and normalization with np.max made the whole audio too quiet. This normalization problem with FFT output keeps giving me trouble, and I haven't found any 100% working solution on SO.
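For what it's worth, here is a sketch of an alternative normalization for 16-bit input: scale by the int16 full-scale value before the FFT so the float32 output lands in [-1.0, 1.0], the range paFloat32 expects (untested against your file; location, sample and length are the variables from your snippet, and dividing a complex spectrum by its median is fragile because the median of complex bins is not a level measure):

import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read(location)      # assumes 16-bit PCM input
mono = data.T[0].astype(np.float32) / 32768.0  # map int16 into [-1.0, 1.0]
fftData = np.fft.rfft(mono[sample:], length)
output = np.fft.irfft(fftData).astype(np.float32)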

Converting microphone data to frequency spectrum

I'm trying to create a spectrogram program (in python), which will analyze and display the frequency spectrum from a microphone input in real time. I am using a template program for recording audio from here: http://people.csail.mit.edu/hubert/pyaudio/#examples (recording example)
This template program works fine, but I am unsure of the format of the data that is being returned from the data = stream.read(CHUNK) line. I have done some research on the .wav format, which is used in this program, but I cannot find the meaning of the actual data bytes themselves, just definitions for the metadata in the .wav file.
I understand this program uses 16 bit samples, and the 'chunks' are stored in python strings. I was hoping somebody could help me understand exactly what the data in each sample represents. Even just a link to a source for this information would be helpful. I tried googling, but I don't think I know the terminology well enough to search accurately.
stream.read gives you binary data. To get the decimal audio samples, you can use numpy.fromstring to turn it into a numpy array, or you can use Python's built-in struct.unpack.
Example:
import pyaudio
import numpy
import struct
CHUNK = 128
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=CHUNK)
data = stream.read(CHUNK)
print numpy.fromstring(data, numpy.int16) # use external numpy module
print struct.unpack('h'*CHUNK, data) # use built-in struct module
stream.stop_stream()
stream.close()
p.terminate()
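Since the end goal is a spectrogram, here is a minimal sketch of the next step: taking the magnitude spectrum of one chunk of those samples (numpy.frombuffer is the modern replacement for fromstring; data is the string returned by stream.read, and the rate matches the snippet above):

import numpy

samples = numpy.frombuffer(data, numpy.int16)
spectrum = numpy.abs(numpy.fft.rfft(samples))
freqs = numpy.fft.rfftfreq(len(samples), d=1.0 / 44100)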

How to adjust the volume of an audio file by serially getting the voltage signals from a potentiometer using an Arduino board and Python scripts

I want to adjust the volume of an mp3 file while it is playing by adjusting a potentiometer. I am reading the potentiometer signal serially via an Arduino board with Python scripts. With the help of the pydub library I can read the file, but I cannot adjust its volume while it is playing. This is the code I have after a long search.
I have included only the pydub portion. For your information, I'm using VLC media player for changing the volume.
>>> from pydub import AudioSegment
>>> song = AudioSegment.from_wav(r"C:\Users\RAJU\Desktop\En_Iniya_Ponnilave.wav")
While the file is playing, I cannot adjust the volume. Please, can someone explain how to do it?
First you need to decode your audio signal to raw audio and split the signal into frames; then at every frame you can manipulate the audio and change the volume, the pitch, the speed, and so on.
To change the volume you just need to multiply your raw audio vector by a factor (this can be your potentiometer data signal).
The factor works differently depending on whether your vector is in short int or floating point format, as shown in the sketch just below.
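A sketch of what that format difference means in practice: applying a gain factor safely for the two common sample formats (the factor would come from your potentiometer reading; the clipping bounds are the standard ranges for each format):

import numpy as np

def apply_gain(samples, factor):
    if samples.dtype == np.int16:
        # widen first, then clip back to the int16 range to avoid wrap-around
        scaled = samples.astype(np.int32) * factor
        return np.clip(scaled, -32768, 32767).astype(np.int16)
    # float samples are nominally in [-1.0, 1.0]
    return np.clip(samples * factor, -1.0, 1.0).astype(samples.dtype)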
One way to get raw audio data from wav files in Python is the built-in wave module:
import wave
import numpy

spf = wave.open('wavfile.wav', 'r')

# extract raw audio from the wav file
signal = spf.readframes(-1)
decoded = numpy.fromstring(signal, numpy.float32)
Now you can multiply the decoded vector by a factor. For example, if you want to increase the level by 10 dB you need to calculate 10^(dBValue/20), which in Python is 10 ** (10 / 20.0) = 3.1623 (note the float division, otherwise Python 2 truncates this to 10 ** 0):
newsignal = decoded * 3.1623
Now you need to encode the vector again to play the new frames; you can use pack from the built-in struct module and pyaudio to do it:
import pyaudio
from struct import pack

pyaud = pyaudio.PyAudio()
stream = pyaud.open(format=pyaudio.paFloat32,
                    channels=1,
                    rate=44100,
                    output=True,
                    input=True)

EncodeAgain = pack("%df" % len(newsignal), *list(newsignal))
And finally play your frames. Note that you will do this for every frame inside a loop; the process is fast enough that the latency can be imperceptible:
stream.write(EncodeAgain)
PS: This example is for floating point format!
Ederwander, as you said I have tried coding it, but when packing the data I'm getting all zeros, so it is not streaming. I understand the problem may be in converting between the format data types. This is the code I have written; please look at it and tell me your suggestions.
import sys
import serial
import time
import os
from pydub import AudioSegment
import wave
from struct import pack
import numpy
import pyaudio

CHUNK = 1024
wf = wave.open(r'C:\Users\RAJU\Desktop\En_Iniya_Ponnilave.wav', 'rb')

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# read data
data_read = wf.readframes(CHUNK)
decoded = numpy.fromstring(data_read, 'int32', sep='')
data = decoded * 3.123

while 1:
    EncodeAgain = struct.pack(h, data)
    stream.write(EncodeAgain)
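For comparison, here is a sketch of how that loop might look once it runs: it assumes a 16-bit mono wav (so the dtype is int16, not int32), reads frames inside the loop, and clips before packing (untested against your file; the fixed gain stands in for the potentiometer value):

import struct
import wave
import numpy
import pyaudio

CHUNK = 1024
FACTOR = 3.1623  # +10 dB; replace with the value read from the Arduino

wf = wave.open(r'C:\Users\RAJU\Desktop\En_Iniya_Ponnilave.wav', 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data_read = wf.readframes(CHUNK)
while data_read:
    decoded = numpy.frombuffer(data_read, dtype=numpy.int16)
    # scale, then clip back into the int16 range before re-encoding
    scaled = numpy.clip(decoded * FACTOR, -32768, 32767).astype(numpy.int16)
    stream.write(struct.pack('%dh' % len(scaled), *scaled))
    data_read = wf.readframes(CHUNK)

stream.close()
p.terminate()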

Speech or no speech detection in Python

I am writing a program that recognizes speech. What it does is it records audio from the microphone and converts it to text using Sphinx. My problem is I want to start recording audio only when something is spoken by the user.
I experimented by reading the audio levels from the microphone and recording only when the level is above a particular value, but it isn't very effective: the program starts recording whenever it detects anything loud. This is the code I used:
import audioop
import pyaudio as pa
import wave

class speech():
    def __init__(self):
        # soundtrack properties
        self.format = pa.paInt16
        self.rate = 16000
        self.channel = 1
        self.chunk = 1024
        self.threshold = 150
        self.file = 'audio.wav'

        # initialise microphone stream
        self.audio = pa.PyAudio()
        self.stream = self.audio.open(format=self.format,
                                      channels=self.channel,
                                      rate=self.rate,
                                      input=True,
                                      frames_per_buffer=self.chunk)

    def record(self):
        while True:
            data = self.stream.read(self.chunk)
            rms = audioop.rms(data, 2)  # get input volume
            if rms > self.threshold:    # if input volume greater than threshold
                break

        # array to store frames
        frames = []

        # record up to silence only
        while rms > self.threshold:
            data = self.stream.read(self.chunk)
            rms = audioop.rms(data, 2)
            frames.append(data)

        print 'finished recording.... writing file....'
        write_frames = wave.open(self.file, 'wb')
        write_frames.setnchannels(self.channel)
        write_frames.setsampwidth(self.audio.get_sample_size(self.format))
        write_frames.setframerate(self.rate)
        write_frames.writeframes(''.join(frames))
        write_frames.close()
Is there a way I can differentiate between human voice and other noise in Python? I hope somebody can find me a solution.
I think your issue is that at the moment you are trying to record without recognition of the speech, so the check is not discriminating: recognisable speech is anything that gives meaningful results after recognition, so it's a catch-22. You could simplify matters by looking for an opening keyword. You can also filter on the voice frequency range, as the human ear and the telephone companies both do, and you can look at the mark-space ratio; I believe there were some publications on that a while back, but be careful, it varies from language to language. A quick Google search can be very informative. You may also find this article interesting. A sketch of the frequency-range idea follows below.
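This is a minimal sketch of that filtering step: band-pass the decoded samples to a rough voice band before the RMS check (the 300-3400 Hz telephone band and the filter order are assumptions; requires scipy):

import numpy as np
from scipy.signal import butter, lfilter

def voice_band(samples, rate=16000, low=300.0, high=3400.0):
    # 4th-order Butterworth band-pass over the rough telephone voice band
    nyq = rate / 2.0
    b, a = butter(4, [low / nyq, high / nyq], btype='band')
    return lfilter(b, a, samples)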
I think what you are looking for is VAD (voice activity detection). VAD can be used for preprocessing speech for ASR. Here is an open-source project implementing VAD: link. It may help you.
This is an example script using a VAD library.
https://github.com/wiseman/py-webrtcvad/blob/master/example.py
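The core of that library is a per-frame classifier; a minimal sketch of the call (webrtcvad accepts 10, 20 or 30 ms frames of 16-bit mono PCM at 8, 16, 32 or 48 kHz; the silent frame here is just a stand-in for real microphone bytes):

import webrtcvad

vad = webrtcvad.Vad(2)   # aggressiveness 0 (permissive) to 3 (strict)
frame = b'\x00' * 960    # one 30 ms frame at 16 kHz, 16-bit mono
print(vad.is_speech(frame, 16000))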
