I am building a voice assistant that can tell stories. While the bot is telling a story, I want to be able to interrupt it and ask it to stop, go back, or end the story. I have tried a few approaches, but none of them work: I cannot listen while the bot is speaking, because the program only moves on to the listening part after the speaking part has finished.
Thanks in advance.
Here is my code:
while True:
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Talk")
        audio_text = r.listen(source)
        print("Time over, thanks")
    try:
        inp=r.recognize_google(audio_text, language = "en-IN")
        print("Text: "+inp)
    except:
        inp="sorry"
        print("Sorry, I did not get that")
    if inp.lower() == "quit":
        bot_voice("Ok bye see you later")
        break
    if inp.lower() == "sorry":
        bot_voice("Sorry, I did not get that")
    if (deter==0):
        y=-1
        deter=1
        for x in stories:
            m=x
            y=m.find(inp)
            if(y>-1):
                filename = 'Stories/data/'+x+'.txt'
                with open(filename) as f_in:
                    for line in f_in:
                        bot_voice(line)
                break
    else:
        results = model.predict([bag_of_words(inp, words)])
        results_index = numpy.argmax(results)
        tag = labels[results_index]
        for tg in data["intents"]:
            if tg['tag'] == tag:
                responses = tg['responses']
        reply=random.choice(responses)
        if(reply=='7417'):
            filename = "Stories/list.txt"
            bot_voice("I know quite a few stories, they are")
            with open(filename) as f_in:
                for x,line in enumerate(f_in):
                    bot_voice(line)
            bot_voice("which one you want")
            deter=0
        else:
            print("bot:",reply)
            bot_voice(reply)
This isn't possible with the speech recognition library you are using on its own. It only takes audio input; it doesn't produce any output. Your output system, which I assume is something like pyttsx, simply reads out whatever it is given. You would need a stop system: a program, probably machine learning-based, that can hold a conversation, treat keywords as commands, and stop speaking when told to stop.
I recommend PyTorch as a starting point for machine learning in Python. Here is an article on conversational AI:
Article on Conversational AI with PyTorch
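As a rough illustration of the "stop system" idea, one option is to speak the story one line at a time and check a stop flag between lines, so that a listener running in another thread can end the story early. This is only a minimal sketch: `tell_story`, `fake_speak`, and the use of `threading.Event` are assumptions for illustration, not part of the question's code, and `fake_speak` stands in for a real blocking call such as `bot_voice`. With this approach the story can only stop at a line boundary, not in the middle of a sentence.

```python
import threading


def tell_story(lines, speak, stop_event):
    """Speak lines in order; stop early once stop_event is set.

    Returns the number of lines that were actually spoken.
    """
    spoken = 0
    for line in lines:
        # Check the flag before each line, so a stop request
        # takes effect at the next line boundary.
        if stop_event.is_set():
            break
        speak(line)
        spoken += 1
    return spoken


if __name__ == "__main__":
    stop_event = threading.Event()
    story = ["Once upon a time...", "there was a fox.", "The end."]

    def fake_speak(line):
        # Hypothetical stand-in for a blocking text-to-speech call.
        print("bot:", line)

    # The story runs in a worker thread, leaving the main thread free to listen.
    worker = threading.Thread(target=tell_story, args=(story, fake_speak, stop_event))
    worker.start()
    # A real listener loop would call stop_event.set() here when it hears "stop".
    worker.join()
```

Whether the microphone can pick up a command while the speakers are playing depends on the hardware, so this only addresses the control flow.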
I am currently making a rough outline for a voice controller so I can stop designing parts, but even at this stage I have run into issues, because my skills with if statements and the syntax for incrementing variables are rusty. The specific problem is that the program exits after the first time I ask the script how it is doing, rather than continuing to run. It keeps running as it should if I remove the line that increments the variable, along with the block containing the different reply that replaces the original "how are you" response once the question has been asked.
from time import ctime
import time
import os
import pyttsx3
import random
import speech_recognition as sr
repetitionsocial1=0
numberList = ["Thanks for asking. But I am a computer","222","I eat poop","444","555"]
def recordAudio():
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # Speech recognition using Google Speech Recognition
    data = ""
    try:
        # Uses the default API key
        # To use another API key: `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Offline Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Offline Recognition service; {0}".format(e))
    return data
def jarvis(data):
    global repetitionsocial1
    if "how are you" in data and repetitionsocial1==0 :
        repetitionsocial1=repetitionsocial1+1
        engine = pyttsx3.init()
        engine.say("What answer do you expect. I am a Computer?")
        engine.runAndWait()
    if "how are you" in data and repetitionsocial1>=0:
        engine = pyttsx3.init()
        engine.say("I am still fine, but again, I am a computer. You have asked me this"+str(repetitionsocial1)+"times")
        engine.runAndWait()
    if "what time is it" in data:
        engine = pyttsx3.init()
        engine.say(ctime())
        engine.runAndWait()
    # if "where is" in data:
    #     data = data.split(" ")
    #     location = data[2]
    #     speak("Hold on Frank, I will show you where " + location + " is.")
    #     os.system("chromium-browser https://www.google.nl/maps/place/" + location + "/&")
# initialization
time.sleep(2)
engine = pyttsx3.init()
engine.say("Hi Frank, what can I do for you?")
engine.runAndWait()
while 1:
    data = recordAudio()
    jarvis(data)
Look at these two snippets.
if "how are you" in data and repetitionsocial1==0 :
    repetitionsocial1=repetitionsocial1+1
    ...
if "how are you" in data and repetitionsocial1>=0:
    ...
When the first condition is true, it adds 1 to repetitionsocial1, so that variable now contains 1. Therefore the second condition will also be true, because 1>=0, and both replies are spoken.
You should use elif when the conditions are meant to be mutually exclusive and you don't want the second test to see variable changes made inside the first branch.
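To illustrate the difference, here is a minimal sketch of the same branching with the text-to-speech calls replaced by returned strings, so it runs without a sound device. The function name `respond` and the decision to keep counting in the second branch are assumptions for illustration; the question's code only increments the counter in the first branch.

```python
def respond(data, count):
    """Return (reply, new_count) for one utterance."""
    reply = None
    if "how are you" in data and count == 0:
        count += 1
        reply = "What answer do you expect? I am a computer."
    elif "how are you" in data:
        # Reached only when the first branch did not run, so the
        # increment above cannot trigger this reply in the same call.
        count += 1
        reply = ("I am still fine, but again, I am a computer. "
                 "You have asked me this " + str(count) + " times.")
    return reply, count


reply, count = respond("hey, how are you", 0)
print(reply)  # first-time reply only
reply, count = respond("how are you", count)
print(reply)  # repeat reply; count is now 2
```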
I am looking for a pause, wait, break, or anything else that will insert a short gap (about 2 seconds; configurable would be ideal) while the desired text is being spoken.
People online have said that adding three full stops followed by a space creates a break, but that doesn't seem to work for me. The code below is my test, which sadly produces no pauses. Any ideas or suggestions?
Edit: It would be ideal if there is some command from gTTS that would allow me to do this, or maybe some trick like using the three full stops if that actually worked.
from gtts import gTTS
import os
tts = gTTS(text=" Testing ... if there is a pause ... ... ... ... ... longer pause? ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... insane pause " , lang='en', slow=False)
tts.save("temp.mp3")
os.system("temp.mp3")
Ok, you need Speech Synthesis Markup Language (SSML) to achieve this.
Be aware that you need to set up Google Cloud Platform credentials.
First, in bash:
pip install --upgrade google-cloud-texttospeech
Then here is the code:
import html
from google.cloud import texttospeech
def ssml_to_audio(ssml_text, outfile):
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()
    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)
    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )
    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)
def text_to_ssml(inputfile):
    raw_lines = inputfile
    # Replace special characters with HTML Ampersand Character Codes
    # These Codes prevent the API from confusing text with
    # SSML commands
    # For example, '<' --> '&lt;' and '&' --> '&amp;'
    escaped_lines = html.escape(raw_lines)
    # Convert plaintext to SSML
    # Wait two seconds at each line break
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )
    # Return the concatenated string of ssml script
    return ssml
# Hand-written SSML goes to the API as-is. Passing it through text_to_ssml()
# would escape the tags, so they would no longer act as SSML commands.
ssml = """<speak>Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound</speak>"""
ssml_to_audio(ssml, "test.mp3")
# Plain text can go through text_to_ssml(), which adds a 2 s break at each newline.
plain_text = "First sentence.\nSecond sentence."
ssml_to_audio(text_to_ssml(plain_text), "test_plain.mp3")
More documentation:
Speaking addresses with SSML
But if you don't have Google Cloud Platform credentials, a cheaper and easier way is to use the time.sleep(1) method.
If any background waits are required, you can use the time module to wait, as below.
import time
# SLEEP FOR 5 SECONDS AND START THE PROCESS
time.sleep(5)
Or you can check up to three times, waiting after each failed attempt:
import time
# someprocess() is a placeholder for your own function
for tries in range(3):
    if someprocess() is False:
        time.sleep(3)
You can save multiple mp3 files, then play them one after another, calling time.sleep() between them for your desired amount of pause:
from gtts import gTTS
import os
from time import sleep
tts1 = gTTS(text="Testing" , lang='en', slow=False)
tts2 = gTTS(text="if there is a pause" , lang='en', slow=False)
tts3 = gTTS(text="insane pause " , lang='en', slow=False)
tts1.save("temp1.mp3")
tts2.save("temp2.mp3")
tts3.save("temp3.mp3")
os.system("temp1.mp3")
sleep(2)
os.system("temp2.mp3")
sleep(3)
os.system("temp3.mp3")
Sadly, the answer is no: the gTTS package has no dedicated function for pauses. An issue asking for a pause function was opened back in 2018. However, gTTS is smart enough to add natural pauses through its tokenizer.
What is a tokenizer?
Function that takes text and returns it split into a list of tokens (strings). In the gTTS context, its goal is
to cut the text into smaller segments that do not exceed the maximum character size allowed (100) for each TTS API
request, while making the speech sound natural and continuous. It does so by splitting text where speech would
naturally pause (for example on ".") while handling where it should not (for example on “10.5” or “U.S.A.”).
Such rules are called tokenizer cases, which it takes a list of.
Here is an example:
text = "regular text speed no pause regular text speed comma pause, regular text speed period pause. regular text speed exclamation pause! regular text speed ellipses pause... regular text speed new line pause \n regular text speed "
So in this case, adding a sleep() seems like the only answer. But tricking the tokenizer is worth mentioning.
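To make the tokenizer idea concrete, here is a toy splitter in the same spirit. It is not gTTS's actual tokenizer: the function name `split_for_tts`, the regular expression, and the fallback for overlong pieces are all assumptions for illustration. It splits after ".", "!", "?" or "," only when whitespace follows, so "10.5" stays intact, but unlike the real tokenizer it has no special handling for abbreviations such as "U.S.A.".

```python
import re

MAX_CHARS = 100  # per-request limit quoted in the description above


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text at natural pause points into segments of at most max_chars."""
    # Split on whitespace that directly follows sentence punctuation or a comma.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?,])\s+", text) if p.strip()]
    segments = []
    for piece in pieces:
        # Overlong pieces are cut at the last space that fits,
        # or mid-word if there is no space at all.
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            segments.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        if piece:
            segments.append(piece)
    return segments


print(split_for_tts("It costs 10.5 dollars. Really? Yes, it does!"))
# ['It costs 10.5 dollars.', 'Really?', 'Yes,', 'it does!']
```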
You can add an arbitrary pause with Pydub by saving temporary mp3 files and concatenating them, using a silent audio clip for the pause.
You can use any marker symbol of your choice where you want to add a pause (here $):
from pydub import AudioSegment
from gtts import gTTS
langue = "en"  # language code passed to gTTS
contents = "Hello with $$ 2 seconds pause"
parts = contents.split("$") # I have chosen this symbol for the pause.
pause2s = AudioSegment.from_mp3("silent.mp3")
# silent.mp3 contains 2 s of silence
combined = AudioSegment.empty()
cnt = 0
for p in parts:
    # The pause will happen for the empty element of the list
    if not p:
        combined += pause2s
    else:
        tts = gTTS(text=p , lang=langue, slow=False)
        tmpFileName="tmp"+str(cnt)+".mp3"
        tts.save(tmpFileName)
        combined+=AudioSegment.from_mp3(tmpFileName)
    cnt+=1
combined.export("out.mp3", format="mp3")
Late to the party here, but you might consider trying out the audio_program_generator package. You provide a text file comprised of individual phrases, each of which has a configurable pause at the end. In return, it gives you an mp3 file that 'stitches together' all the phrases and their pauses into one continuous audio file. You can optionally mix in a background sound-file, as well. And it implements several of the other bells and whistles that Google TTS provides, like accents, slow-play-speech, etc.
Disclaimer: I am the author of the package.
I had the same problem, and didn't want to use lots of temporary files on disk. This code parses an SSML file, and creates silence whenever a <break> tag is found:
import io
from gtts import gTTS
import lxml.etree as etree
import pydub
ssml_filename = 'Section12.35-edited.ssml'
wav_filename = 'Section12.35-edited.mp3'
events = ('end',)
DEFAULT_BREAK_TIME = 250
all_audio = pydub.AudioSegment.silent(100)
for event, element in etree.iterparse(
    ssml_filename,
    events=events,
    remove_comments=True,
    remove_pis=True,
    attribute_defaults=True,
):
    tag = etree.QName(element).localname
    if tag in ['p', 's'] and element.text:
        tts = gTTS(element.text, lang='en', tld='com.au')
        with io.BytesIO() as temp_bytes:
            tts.write_to_fp(temp_bytes)
            temp_bytes.seek(0)
            audio = pydub.AudioSegment.from_mp3(temp_bytes)
        all_audio = all_audio.append(audio)
    elif tag == 'break':
        # write silence to the file.
        time = element.attrib.get('time', None) # Shouldn't be possible to have no time value.
        if time:
            if time.endswith('ms'):
                time_value = int(time.removesuffix('ms'))
            elif time.endswith('s'):
                time_value = int(time.removesuffix('s')) * 1000
            else:
                time_value = DEFAULT_BREAK_TIME
        else:
            time_value = DEFAULT_BREAK_TIME
        silence = pydub.AudioSegment.silent(time_value)
        all_audio = all_audio.append(silence)
with open(wav_filename, 'wb') as output_file:
    all_audio.export(output_file, format='mp3')
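The break-time handling in the loop above can also be pulled out into a small helper, which makes it easy to check on its own. This is a sketch under assumptions, not the answer's exact logic: the name `parse_break_ms` is hypothetical, it accepts fractional values such as "1.5s" (which `int()` would reject), and it avoids `str.removesuffix`, which needs Python 3.9 or later.

```python
DEFAULT_BREAK_MS = 250


def parse_break_ms(value, default_ms=DEFAULT_BREAK_MS):
    """Convert an SSML break time such as "250ms" or "1.5s" to milliseconds."""
    if not value:
        return default_ms
    value = value.strip()
    try:
        # Test "ms" first, because a value ending in "ms" also ends in "s".
        if value.endswith("ms"):
            return round(float(value[:-2]))
        if value.endswith("s"):
            return round(float(value[:-1]) * 1000)
    except ValueError:
        # Unparseable number: fall through to the default.
        pass
    return default_ms


print(parse_break_ms("2s"))     # 2000
print(parse_break_ms("250ms"))  # 250
print(parse_break_ms(None))     # 250
```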
I know 4Rom1 used this method above, but to put it more simply, I found this worked really well for me. Get a 1-second silent mp3 (I found one by googling "1 sec silent mp3"), then use pydub to add it to your audio segment as many times as you need. For example, to add 3 seconds of silence:
from pydub import AudioSegment
seconds = 3
output = AudioSegment.from_file("yourfile.mp3")
output += AudioSegment.from_file("1sec_silence.mp3") * seconds
output.export("newaudio.mp3", format="mp3")
I want to show the currently playing song in Spotify on a 16x2 LCD.
I was thinking of connecting the LCD to my Arduino and then writing a Python script that sends Spotify's currently playing song to the Arduino.
To get to the point, I'm looking for a way to get Spotify's currently playing song in Python. (I'm using Windows 8.) I found some approaches, such as dbus, but they were either for Linux or for Mac.
Thanks in advance! (And sorry for bad English grammar.)
I encountered the same issue, so I wrote a library to solve it. The library can be found on GitHub: https://github.com/XanderMJ/spotilib. Keep in mind that it is still a work in progress.
Just copy the file and place it in your Python/Lib directory.
import spotilib
spotilib.artist() #returns the artist of the current playing song
spotilib.song() #returns the song title of the current playing song
spotilib.artist() returns only the first artist. I have started working on another library, spotimeta.py, to solve this issue. However, it is not working 100% yet.
import spotimeta
spotimeta.artists() #returns a list of all the collaborating artists of the track
If an error occurs, spotimeta.artists() will return only the first artist (found with spotilib.artist())
Hope this will help you (if still needed)!
The easiest way would probably be to scrobble the currently playing tracks from the Spotify client to a Last.fm account and then use Python to fetch them from there.
Last.fm allows you to get scrobbled tracks via its API with user.getRecentTracks, which provides the nowplaying="true" attribute if a song is playing. It also provides some other useful things you may want for an external display, like a link to the album art and the Last.fm page for the song.
Here's a quick example that takes a username and API key as command-line arguments and uses the requests library to fetch what that user is currently playing.
import requests
import json
import sys
def get_playing():
    base_url = 'http://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user='
    user = sys.argv[1]
    key = sys.argv[2]
    r = requests.get(base_url+user+'&api_key='+key+'&format=json')
    data = json.loads(r.text)
    latest_track = data['recenttracks']['track'][0]
    try:
        # In Last.fm's JSON the nowplaying attribute sits under the '@attr' key
        if latest_track['@attr']['nowplaying'] == 'true':
            artist = latest_track['artist']['#text']
            song = latest_track['name']
            album = latest_track['album']['#text']
            print("\nNow Playing: {0} - {1}, from the album {2}.\n".format(artist, song, album))
    except KeyError:
        # No '@attr' key on the latest track means nothing is playing right now
        print('\nNothing playing...\n')
def main():
    if len(sys.argv)<3:
        print("\nError: Please provide a username and api key in the format of: 'scrobble.py username api_key'\n")
    else:
        get_playing()
if __name__ == '__main__':
    main()
In a quick test, however, it does seem to take a minute or so to notice that the track is no longer playing after pausing or exiting the Spotify client.
Is there more than one pytify?
This worked for me until today:
https://code.google.com/p/pytify/
Spotify has been updated, and now the song is no longer shown in the window title.
They will bring the "feature" back:
https://community.spotify.com/t5/ideas/v2/ideapage/blog-id/ideaexchange/article-id/73238/page/2#comments
while True:
    print("Welcome to the project, " + user_name['display_name'])
    print("0 - Exit the console")
    print("1 - Search for a Song")
    user_input = int(input("Enter Your Choice: "))
    if user_input == 1:
        search_song = input("Enter the song name: ")
        results = spotifyObject.search(search_song, 1, 0, "track")
        songs_dict = results['tracks']
        song_items = songs_dict['items']
        song = song_items[0]['external_urls']['spotify']
        webbrowser.open(song)
        print('Song has opened in your browser.')
    elif user_input == 0:
        print("Good Bye, Have a great day!")
        break
    else:
        print("Please enter valid user-input.")
I created a small module to speak the text that is sent to it.
It works fine if I don't use engine.setProperty to set the voice, but if I set the voice it will only play the first command.
import pyttsx
def speak( text ):
    if text != "":
        engine = pyttsx.init()
        # Raw string so the backslashes in the registry path are kept literally
        engine.setProperty('voice', r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\VW Kate") #if I don't do this line then it says both the commands
        engine.say( text )
        engine.runAndWait()
    else:
        print("you didnt enter anything")
if __name__ == "__main__":
    speak("Hello")
    speak("This one won't play unless I use the default voice")
I think you should try the following code snippet:
import pyttsx
engine = pyttsx.init()
engine.say('Sally sells seashells by the seashore.')
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()
which is originally from this page