I've been experimenting with the Python SpeechRecognition library (https://pypi.python.org/pypi/SpeechRecognition/) to read downloaded versions of the BBC shipping forecast. The clipping of those files from live radio to the iPlayer is obviously automated and not very accurate, so there is usually some audio before the forecast itself starts - a trailer, or the end of the news. I don't need to be that accurate, but I'd like speech recognition to recognise the phrase "and now the shipping forecast" (or just 'shipping' would do, actually) and cut the file from there.
My code so far (adapted from an example) transcribes an audio file of the forecast and uses a formula (based on 200 words per minute) to predict where the word "shipping" comes, but it's not proving to be very accurate.
Is there a way of getting the actual frame or second onset that pocketsphinx itself detected for that word? I can't find anything in the documentation. Does anyone have any ideas?
import speech_recognition as sr
from os import path

AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "test_short2.wav")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said ")
    returnedSpeech = str(r.recognize_sphinx(audio))
    wordsList = returnedSpeech.split()
    print(returnedSpeech)
    # 200 words per minute ~ 0.3 seconds per word
    print("predicted location of start ", float(wordsList.index("shipping")) * 0.3)
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
You need to use the pocketsphinx API directly for things like this. It is also highly recommended to read the pocketsphinx documentation on keyword spotting.
You can spot for a keyphrase as demonstrated in this example:
import os
from pocketsphinx import Decoder

# modeldir should point to the pocketsphinx model directory,
# datadir to the directory containing your wav file
modeldir = "path/to/pocketsphinx/model"
datadir = "path/to/data"

config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'shipping forecast')
config.set_float('-kws_threshold', 1e-30)

stream = open(os.path.join(datadir, "test_short2.wav"), "rb")

decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame)
               for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
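The start_frame and end_frame values are what you need for the onset: pocketsphinx counts frames at its analysis frame rate, which defaults to 100 frames per second (the -frate parameter), so dividing by that rate gives the offset in seconds. A minimal sketch, assuming the default frame rate:

FRAMES_PER_SECOND = 100  # pocketsphinx default '-frate'; change this if you override it

# inside the `if decoder.hyp() is not None:` block above
for seg in decoder.seg():
    onset_seconds = seg.start_frame / float(FRAMES_PER_SECOND)
    print("'{}' starts roughly {:.2f} s into the file".format(seg.word, onset_seconds))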
Related
I am building a voice assistant that can tell stories. While the bot is telling a story, I want to be able to interrupt it and ask it to stop, go backward, or end the story. I have tried a few approaches, but they are not working: I am not able to listen while it is speaking, because it only moves on to the listening part after the speaking part has ended.
Thanks in advance
Here is my code:
# bot_voice(), stories, deter, model, bag_of_words(), words, labels and data
# are defined elsewhere in the full program (not shown)
while True:
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Talk")
        audio_text = r.listen(source)
        print("Time over, thanks")
    try:
        inp = r.recognize_google(audio_text, language="en-IN")
        print("Text: " + inp)
    except:
        inp = "sorry"
        print("Sorry, I did not get that")
    if inp.lower() == "quit":
        bot_voice("Ok bye see you later")
        break
    if inp.lower() == "sorry":
        bot_voice("Sorry, I did not get that")
    if (deter == 0):
        y = -1
        deter = 1
        for x in stories:
            m = x
            y = m.find(inp)
            if (y > -1):
                filename = 'Stories/data/' + x + '.txt'
                with open(filename) as f_in:
                    for line in f_in:
                        bot_voice(line)
                break
    else:
        results = model.predict([bag_of_words(inp, words)])
        results_index = numpy.argmax(results)
        tag = labels[results_index]
        for tg in data["intents"]:
            if tg['tag'] == tag:
                responses = tg['responses']
        reply = random.choice(responses)
        if (reply == '7417'):
            filename = "Stories/list.txt"
            bot_voice("I know quite a few stories, they are")
            with open(filename) as f_in:
                for x, line in enumerate(f_in):
                    bot_voice(line)
            bot_voice("which one you want")
            deter = 0
        else:
            print("bot:", reply)
            bot_voice(reply)
This isn't possible with the speech recognition you are using: it only takes input, it doesn't provide output. Your output system, which I assume is something like pyttsx, will simply read whatever it is told to read. You would need a stop mechanism, and for that you would need a machine-learning-based program that is capable of conversations, can stop when told to, and can take keywords as commands.
I recommend PyTorch as a starting point for machine learning in Python. Here is an article on conversational AI.
Article on Conversational AI with PyTorch
I'm writing a simple Python program to control my LED lights with voice commands and am running into a problem: after a few hours of continuous running, it freezes without any kind of error. I checked Task Manager every time it froze and it didn't show anything significant; everything was well within normal parameters. I believe it may have something to do with the SpeechRecognition module I'm using, as it always freezes in the recordAudio() function below.
(Excuse the iffy code, I'm new to programming.)
from BLEDevices import *
import speech_recognition as sr
import winsound
import wolframalpha
import pyttsx3

app_id = 'V3R5HG-H7U9VH6YR4'
client = wolframalpha.Client(app_id)
engine = pyttsx3.init()

def recordAudio(phrase):
    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print(phrase)
        audio = r.listen(source)
    # recognize speech using Google Speech Recognition
    try:
        data = r.recognize_google(audio)
        print("Speech Recognition thinks you said " + data)
    except sr.UnknownValueError:
        return "failed"
    except sr.RequestError as e:
        print("Could not request results from Speech Recognition service; {0}".format(e))
        return "failed"
    return data

def wolfram(text):
    try:
        res = client.query(text)
        result = next(res.results).text
    except:
        result = 'Unable to Answer Query'
    print(result)
    engine.say(result)
    engine.runAndWait()

def processAudio(text):
    if ('color' in text or 'light' in text or 'led' in text or 'brightness' in text or 'work mode' in text or 'relax mode' in text):
        led_control(text)
    else:
        wolfram(text)

while True:
    text = recordAudio('Listening for "Controller"...').lower()
    if 'controller' in text:
        winsound.Beep(500, 100)
        text = recordAudio('Listening...').lower()
        processAudio(text)
    else:
        print('\b', end='')
I am currently making a rough outline for a voice controller so I can stop designing parts, but even while making the rough outline I ran into some issues with my rusty grasp of if statements and the syntax for incrementing a variable. The specific problem is that the program exits after the first time I ask the script how it's doing, rather than continuing to run. It works as it should if I remove the line that increments the variable, along with the chunk containing the different response that replaces the old "how are you" answer after it has been asked once.
from time import ctime
import time
import os
import pyttsx3
import random
import speech_recognition as sr

repetitionsocial1 = 0
numberList = ["Thanks for asking. But I am a computer", "222", "I eat poop", "444", "555"]

def recordAudio():
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Speech recognition using Google Speech Recognition
    data = ""
    try:
        # Uses the default API key
        # To use another API key: `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Offline Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Offline Recognition service; {0}".format(e))
    return data

def jarvis(data):
    global repetitionsocial1
    if "how are you" in data and repetitionsocial1 == 0:
        repetitionsocial1 = repetitionsocial1 + 1
        engine = pyttsx3.init()
        engine.say("What answer do you expect. I am a Computer?")
        engine.runAndWait()
    if "how are you" in data and repetitionsocial1 >= 0:
        engine = pyttsx3.init()
        engine.say("I am still fine, but again, I am a computer. You have asked me this " + str(repetitionsocial1) + " times")
        engine.runAndWait()
    if "what time is it" in data:
        engine = pyttsx3.init()
        engine.say(ctime())
        engine.runAndWait()
    # if "where is" in data:
    #     data = data.split(" ")
    #     location = data[2]
    #     speak("Hold on Frank, I will show you where " + location + " is.")
    #     os.system("chromium-browser https://www.google.nl/maps/place/" + location + "/&")

# initialization
time.sleep(2)
engine = pyttsx3.init()
engine.say("Hi Frank, what can I do for you?")
engine.runAndWait()

while 1:
    data = recordAudio()
    jarvis(data)
Look at these two snippets.
if "how are you" in data and repetitionsocial1==0 :
repetitionsocial1=repetitionsocial1+1
...
if "how are you" in data and repetitionsocial1>=0:
...
When the first condition is true, it adds 1 to repetitionsocial1, so that variable now contains 1. Therefore, the second condition will also be true, because 1 >= 0.
You should use elif when you have mutually exclusive conditions and don't want the second test to see the variable changes made by the first branch.
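For instance, a minimal sketch of how the two checks from the question could be rewritten so they are mutually exclusive (same names as in the question):

if "how are you" in data and repetitionsocial1 == 0:
    # first time the question is asked
    repetitionsocial1 = repetitionsocial1 + 1
    engine = pyttsx3.init()
    engine.say("What answer do you expect. I am a Computer?")
    engine.runAndWait()
elif "how are you" in data:
    # only reached on later repetitions, because the first branch was not taken
    engine = pyttsx3.init()
    engine.say("I am still fine, but again, I am a computer. "
               "You have asked me this " + str(repetitionsocial1) + " times")
    engine.runAndWait()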
I am just starting out with Python and the Google Speech API / SpeechRecognition. I was wondering whether it is possible to have the speech API always listening for a keyword and, when it hears the keyword, process commands. Given that there is a limit on how much free audio the Google Speech API will process, is this feasible? So far I have code that looks like this, but once the API does not hear any speech for a certain number of seconds (I think 4), it throws an error. For the final project, I'd like to get this running on a Raspberry Pi 3.
import speech_recognition as sr
import speak
from time import ctime
import time
import sys

r = sr.Recognizer()
lang = 'en'
data = ''
nameCalled = 0

# Enable Microphone and use for audio input
# Speech recognition using Google Speech Recognition
def spk(text, lang):
    speak.tts(text, lang)

def audioRecord():
    try:
        with sr.Microphone() as source:
            #r.energy_threshold = 500
            # Increase for less sensitivity, decrease for more
            print('Listening...')
            audio = r.listen(source)
            #r.adjust_for_ambient_noise(source)
            data = r.recognize_google(audio)
            print('You said ' + data)
            return data
    except sr.UnknownValueError:
        print('Google could not understand audio!')
    except sr.RequestError as e:
        print('Could not request results for GSR')

def brain(data):
    global nameCalled
    #^^Keep track to see if amber was called^^
    global lang
    #If amber was said, then the next command heard can be executed
    if nameCalled == 0:
        if 'Amber' in data:
            nameCalled = 1
            spk('Yes?', lang)
        elif 'nothing' in data:
            spk('Okay', lang)
            sys.exit()
        else:
            return 'null'
    #Once we hear amber, the next command spoken can be executed;
    #if something goes wrong, just set the nameCalled variable to 0
    #and restart the process
    elif nameCalled == 1:
        if 'what time is it' in data:
            spk(ctime(), lang)
        if 'nothing' in data:
            spk('Okay', lang)
            sys.exit()
        nameCalled = 0
    else:
        nameCalled = 0

# initialization
spk('hello nick, what can I do for you today', lang)
while 1:
    data = audioRecord()
    brain(data)
Kitt.ai provides 'Snowboy', a hotword detection engine that serves exactly this purpose. You can trigger the speech recognition after the hotword is detected; it's pretty accurate and fits this use case well.
Best of all, it runs offline.
You can set your code to run after it is triggered by the hotword.
Check it out:
https://snowboy.kitt.ai
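A minimal sketch of how the detector is typically wired up, assuming you have trained a personal hotword model (the "amber.pmdl" filename here is hypothetical) on the Snowboy site and have the snowboydecoder module from their repository; the callback is where your existing audioRecord()/brain() logic would take over:

import snowboydecoder

def on_hotword():
    # hotword heard: hand off to the existing recognition loop
    data = audioRecord()
    brain(data)

# "amber.pmdl" is a personal model downloaded from snowboy.kitt.ai
detector = snowboydecoder.HotwordDetector("amber.pmdl", sensitivity=0.5)

# blocks and listens continuously, calling on_hotword() whenever the hotword is detected
detector.start(detected_callback=on_hotword, sleep_time=0.03)
detector.terminate()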
I have searched and tried to implement solutions suggested here:
Errno 13 Permission denied: 'file.mp3' Python
Error while re-opening sound file in python
But there don't seem to be any good solutions to this. Here is my code; can anyone tell me what I am doing wrong?
#!/usr/bin/env python3
# Requires PyAudio and PySpeech.
import time, os
import speech_recognition as sr
from gtts import gTTS
import pygame as pg
import mutagen.mp3

#Find out what input sound device is default (use if you have issues with microphone)
#import pyaudio
#sdev = pyaudio.pa.get_default_input_device()

def play_music(sound_file, volume=0.8):
    '''
    stream music with mixer.music module in a blocking manner
    this will stream the sound from disk while playing
    '''
    # set up the mixer, this will set it up according to your sound file
    mp3 = mutagen.mp3.MP3(sound_file)
    pg.mixer.init(frequency=mp3.info.sample_rate)
    pg.mixer.music.set_volume(volume)
    try:
        pg.mixer.music.load(sound_file)
        print("HoBo Sound file {} loaded!".format(sound_file))
    except pg.error:
        print("HoBo Sound file {} not found! ({})".format(sound_file, pg.get_error()))
        return
    pg.mixer.music.play()
    while pg.mixer.music.get_busy() == True:
        continue
    pg.mixer.quit()
    sound_file.close()

def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    # pick a mp3 file in folder or give full path
    sound_file = "audio.mp3"
    # optional volume 0 to 1.0
    volume = 0.6
    play_music(sound_file, volume)

def audioIn():
    # Record Audio from Microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        data = r.recognize_google(audio)
        print("You said: ", data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return data

def hobo(data):
    if "how are you" in data:
        speak("I am fine")
    if "what time is it" in data:
        speak(time.ctime())
    if "where is" in data:
        data = data.split(" ")
        location = data[2]
        speak("Hold on Sir, I will show you where " + location + " is.")
        os.system("chromium-browser https://www.google.nl/maps/place/" + location + "/&")

# Starts the program
#time.sleep(2)
speak("Testing")
while (data != "stop"):
    data = audioIn()
    hobo(data)
else:
    quit
So I found the fix in one of the original threads I already went over. The fix was to implement a delete() function like so:
def delete():
    time.sleep(2)
    pg.mixer.init()
    pg.mixer.music.load("somefilehere.mp3")
    os.remove("audio.mp3")
and changing the play_music() function so it calls delete() at the end (and I removed the sound_file.close() statement, of course, since sound_file is just a string).
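A minimal sketch of what the end of play_music() might look like after that change; the idea seems to be that loading a different file in delete() makes pygame let go of audio.mp3 so it can be removed and regenerated:

def play_music(sound_file, volume=0.8):
    # ... same setup and playback code as in the question ...
    pg.mixer.music.play()
    while pg.mixer.music.get_busy():
        continue
    pg.mixer.quit()
    delete()  # instead of sound_file.close(); frees audio.mp3 so gTTS can overwrite it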
Follow the method below:
import time
from gtts import gTTS
import pygame

def Text_to_speech():
    Message = "hey there"
    speech = gTTS(text=Message)
    speech.save('textToSpeech.mp3')
    pygame.mixer.init()
    pygame.mixer.music.load("textToSpeech.mp3")
    pygame.mixer.music.play()
    time.sleep(3)
    pygame.mixer.music.unload()
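The pygame.mixer.music.unload() call (available from pygame 2.0 onwards) releases the file handle on the mp3, which is what avoids the permission error the next time the file is overwritten or deleted.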