Adding a pause in Google-text-to-speech

I am looking for a small pause, wait, break or anything that will allow for a short break (about 2 seconds, ideally configurable) when speaking out the desired text.
People online have said that adding three full stops followed by a space creates a break, but I don't seem to be getting that. The code below is my test, which sadly has no pauses. Any ideas or suggestions?
Edit: Ideally there would be some gTTS command that does this, or a trick like the three full stops, if that actually worked.
from gtts import gTTS
import os
tts = gTTS(text=" Testing ... if there is a pause ... ... ... ... ... longer pause? ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... insane pause " , lang='en', slow=False)
tts.save("temp.mp3")
os.system("temp.mp3")

OK, you need Speech Synthesis Markup Language (SSML) to achieve this.
Be aware that you need to set up Google Cloud Platform credentials.
First, in the shell:
pip install --upgrade google-cloud-texttospeech
Then here is the code:
import html
from google.cloud import texttospeech
def ssml_to_audio(ssml_text, outfile):
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()
    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)
    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )
    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
    print("Audio content written to file " + outfile)
def text_to_ssml(plain_text):
    # Replace special characters with HTML ampersand character codes.
    # These codes prevent the API from confusing text with SSML commands,
    # for example '<' --> '&lt;' and '&' --> '&amp;'.
    # This also escapes any SSML tags already in the text, so pass plain text.
    escaped_lines = html.escape(plain_text)
    # Convert plaintext to SSML:
    # wait two seconds after each line break
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )
    # Return the concatenated string of SSML script
    return ssml
text = """Here are SSML samples.
I can pause.
I can play a sound"""
ssml = text_to_ssml(text)
ssml_to_audio(ssml, "test.mp3")
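Since the question asks for a configurable pause of about 2 seconds, here is a minimal sketch of a variant of text_to_ssml that takes a list of plain-text lines and a pause length in milliseconds. The function name lines_to_ssml and its parameters are my own, not part of the Google API; the resulting string is meant to be passed to ssml_to_audio from the answer above.

```python
import html


def lines_to_ssml(lines, pause_ms=2000):
    """Join plain-text lines into one SSML document with a pause between them."""
    pause = '<break time="{}ms"/>'.format(int(pause_ms))
    # Escape each line so characters like < and & are read as text, not markup;
    # blank lines are skipped so they do not produce double pauses.
    body = pause.join(html.escape(line.strip()) for line in lines if line.strip())
    return "<speak>{}</speak>".format(body)
```

For example, lines_to_ssml(["Testing", "if there is a pause"], pause_ms=2500) puts a 2.5-second break between the two phrases.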
More documentation:
Speaking addresses with SSML
But if you don't have Google Cloud Platform credentials, the cheaper and easier way is to use the time.sleep() method between separately played clips.

If any background waits are required, you can use the time module to wait as below:
import time
# SLEEP FOR 5 SECONDS AND START THE PROCESS
time.sleep(5)
Or you can check up to three times, waiting between checks:
import time
for tries in range(3):
    # someprocess() stands for whatever you are waiting on
    if someprocess() is False:
        time.sleep(3)
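The loop above keeps calling someprocess() even after it succeeds. A minimal sketch of a retry helper that stops at the first success and skips the pointless sleep after the final attempt; retry and its parameters are hypothetical names, and check is any zero-argument callable you supply.

```python
import time


def retry(check, attempts=3, delay=3.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as check() returns a truthy value, False otherwise.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            time.sleep(delay)
    return False
```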

You can save multiple MP3 files, then play them one after another with time.sleep() in between to get the pause length you want:
from gtts import gTTS
import os
from time import sleep
tts1 = gTTS(text="Testing", lang='en', slow=False)
tts2 = gTTS(text="if there is a pause", lang='en', slow=False)
tts3 = gTTS(text="insane pause", lang='en', slow=False)
tts1.save("temp1.mp3")
tts2.save("temp2.mp3")
tts3.save("temp3.mp3")
# os.system("file.mp3") opens the file with its default player on Windows and
# may return before playback ends, so each sleep counts from when a file is opened.
os.system("temp1.mp3")
sleep(2)
os.system("temp2.mp3")
sleep(3)
os.system("temp3.mp3")
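The hard-coded sequence above can be driven by a list of (text, pause_seconds) pairs instead. This is a minimal sketch: play is a hypothetical callback you supply (for example one that saves a gTTS clip and plays it with a player that blocks until playback ends), and sleep is injectable so the loop can be tested without waiting.

```python
import time


def play_with_pauses(segments, play, sleep=time.sleep):
    """Play each (text, pause_seconds) pair in order, pausing after each one.

    `play` is a caller-supplied (hypothetical) function that speaks one piece
    of text; a pause of 0 means no wait after that piece.
    """
    for text, pause in segments:
        play(text)
        if pause > 0:
            sleep(pause)
```

Usage: play_with_pauses([("Testing", 2), ("if there is a pause", 3), ("insane pause", 0)], play=my_player), where my_player is your own playback function.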

Sadly, the answer is no: the gTTS package has no function for adding a pause. An issue requesting a pause function was already opened in 2018, but gTTS is smart enough to add natural pauses through its tokenizer.
What is a tokenizer?
A function that takes text and returns it split into a list of tokens (strings). In the gTTS context, its goal is to cut the text into smaller segments that do not exceed the maximum number of characters allowed (100) for each TTS API request, while making the speech sound natural and continuous. It does so by splitting the text where speech would naturally pause (for example on ".") while handling cases where it should not (for example "10.5" or "U.S.A.").
These rules are called tokenizer cases, and the tokenizer takes a list of them.
Here is an example:
text = "regular text speed no pause regular text speed comma pause, regular text speed period pause. regular text speed exclamation pause! regular text speed ellipses pause... regular text speed new line pause \n regular text speed "
So in this case, adding a sleep() seems like the only answer. But tricking the tokenizer with punctuation (commas, periods, ellipses, new lines) is worth mentioning.
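To make the tokenizer idea concrete, here is a simplified sketch of what such a function does. It is not gTTS's actual implementation: it only splits after punctuation followed by whitespace (so "10.5" stays whole) and at newlines, then cuts anything over 100 characters at the last space that fits. Unlike the real tokenizer cases, it does not special-case abbreviations such as "U.S.A.". The name naive_tokenize is hypothetical.

```python
import re

MAX_CHARS = 100  # per-request limit quoted in the answer above


def naive_tokenize(text, max_chars=MAX_CHARS):
    """Split text where speech would naturally pause, keeping pieces short.

    Simplified illustration only: splits after . , ! ? ; : when they are
    followed by whitespace, and at newlines.
    """
    tokens = []
    for piece in re.split(r"(?<=[.,!?;:])\s+|\n+", text):
        piece = piece.strip()
        # Pieces longer than max_chars are cut at the last space that fits.
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available: hard cut
            tokens.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        if piece:
            tokens.append(piece)
    return tokens
```

gTTS's constructor accepts a tokenizer_func argument, so a function with this shape (text in, list of strings out) could in principle be plugged in there; check the gTTS documentation for your installed version before relying on that.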

You can add an arbitrary pause with pydub by saving the speech to temporary MP3 files, concatenating them, and inserting a silent clip wherever you want a pause.
Use any marker symbol you like to mark where a pause should go (here $):
from pydub import AudioSegment
from gtts import gTTS
contents = "Hello with $$ 2 seconds pause"
parts = contents.split("$")  # I have chosen this symbol for the pause.
# Note: "$$" leaves an empty element (one pause); a single "$" only splits the text.
# silent.mp3 contains 2 s of blank audio
# (AudioSegment.silent(duration=2000) can generate this without a file)
pause2s = AudioSegment.from_mp3("silent.mp3")
combined = AudioSegment.empty()
cnt = 0
for p in parts:
    # The pause will happen for the empty element of the list
    if not p:
        combined += pause2s
    else:
        tts = gTTS(text=p, lang="en", slow=False)
        tmpFileName = "tmp" + str(cnt) + ".mp3"
        tts.save(tmpFileName)
        combined += AudioSegment.from_mp3(tmpFileName)
        cnt += 1
combined.export("out.mp3", format="mp3")
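A variation on the marker idea, sketched under my own assumptions: instead of relying on empty elements from split("$"), each marker character adds one unit of pause, so "$" pauses once and "$$" twice as long. The function plan_segments is hypothetical and only builds the plan; rendering it is left to the pydub/gTTS loop above.

```python
import re


def plan_segments(contents, marker="$", pause_ms=2000):
    """Turn marked-up text into ('text', str) and ('pause', ms) steps.

    Each marker adds pause_ms of silence, so "$$" pauses twice as long as "$".
    """
    steps = []
    # The capturing group keeps each run of markers in the split result.
    pattern = "((?:" + re.escape(marker) + ")+)"
    for piece in re.split(pattern, contents):
        if not piece.strip():
            continue
        if piece.startswith(marker):
            steps.append(("pause", len(piece) // len(marker) * pause_ms))
        else:
            steps.append(("text", piece.strip()))
    return steps
```

Each ("pause", ms) step could then become AudioSegment.silent(duration=ms) and each ("text", ...) step a gTTS clip, appended in order.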

Late to the party here, but you might consider trying the audio_program_generator package. You provide a text file made up of individual phrases, each of which has a configurable pause at the end. In return, it gives you an MP3 file that 'stitches together' all the phrases and their pauses into one continuous audio file. You can optionally mix in a background sound file as well. It also implements several of the other bells and whistles that Google TTS provides, like accents, slow speech, etc.
Disclaimer: I am the author of the package.

I had the same problem, and didn't want to use lots of temporary files on disk. This code parses an SSML file, and creates silence whenever a <break> tag is found:
import io
from gtts import gTTS
import lxml.etree as etree
import pydub
ssml_filename = 'Section12.35-edited.ssml'
mp3_filename = 'Section12.35-edited.mp3'
events = ('end',)
DEFAULT_BREAK_TIME = 250  # milliseconds
all_audio = pydub.AudioSegment.silent(100)
for event, element in etree.iterparse(
    ssml_filename,
    events=events,
    remove_comments=True,
    remove_pis=True,
    attribute_defaults=True,
):
    tag = etree.QName(element).localname
    if tag in ['p', 's'] and element.text and element.text.strip():
        # Synthesize in memory instead of writing a temporary file
        tts = gTTS(element.text, lang='en', tld='com.au')
        with io.BytesIO() as temp_bytes:
            tts.write_to_fp(temp_bytes)
            temp_bytes.seek(0)
            audio = pydub.AudioSegment.from_mp3(temp_bytes)
        # crossfade=0: append() crossfades 100 ms by default, which would
        # shorten pauses and raise ValueError for segments shorter than that
        all_audio = all_audio.append(audio, crossfade=0)
    elif tag == 'break':
        # write silence to the file.
        time_attr = element.attrib.get('time', None)  # Shouldn't be possible to have no time value.
        if time_attr:
            # float() also accepts fractional values such as "1.5s"
            if time_attr.endswith('ms'):
                time_value = int(round(float(time_attr[:-2])))
            elif time_attr.endswith('s'):
                time_value = int(round(float(time_attr[:-1]) * 1000))
            else:
                time_value = DEFAULT_BREAK_TIME
        else:
            time_value = DEFAULT_BREAK_TIME
        silence = pydub.AudioSegment.silent(time_value)
        all_audio = all_audio.append(silence, crossfade=0)
with open(mp3_filename, 'wb') as output_file:
    all_audio.export(output_file, format='mp3')
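The loop above only reads element.text, so in a sentence like <s>Hello <break time="1s"/> world</s> the word "world" (stored in the break element's tail) is never spoken. Here is a minimal sketch that flattens SSML into an ordered plan of text and pause steps, including tails. It uses the standard library's xml.etree.ElementTree instead of lxml to stay dependency-free; the names break_ms and ssml_plan are hypothetical, and the 250 ms fallback mirrors DEFAULT_BREAK_TIME above.

```python
import xml.etree.ElementTree as ET

DEFAULT_BREAK_MS = 250  # same fallback value as the answer above


def break_ms(value, default=DEFAULT_BREAK_MS):
    """Convert an SSML time value such as '300ms' or '1.5s' to milliseconds."""
    if not value:
        return default
    value = value.strip()
    try:
        if value.endswith("ms"):
            return int(round(float(value[:-2])))
        if value.endswith("s"):
            return int(round(float(value[:-1]) * 1000))
    except ValueError:
        pass  # malformed number: fall back to the default
    return default


def ssml_plan(ssml):
    """Flatten SSML into ('text', str) and ('pause', ms) steps in document order."""
    plan = []

    def add_text(s):
        if s and s.strip():
            plan.append(("text", " ".join(s.split())))

    def walk(elem):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip a namespace, if any
        if tag == "break":
            plan.append(("pause", break_ms(elem.get("time"))))
            return
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)  # text that follows a child, e.g. after <break/>

    walk(ET.fromstring(ssml))
    return plan
```

Each ("text", ...) step would go to gTTS and each ("pause", ms) step to pydub.AudioSegment.silent(ms), as in the answer.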

I know 4Rom1 used this method above, but to put it more simply, here is what worked really well for me. Get a 1-second silent MP3 (I found one by searching for "1 sec silent mp3"), then use pydub to add audio segments together as many times as you need. For example, to add 3 seconds of silence:
from pydub import AudioSegment
seconds = 3
output = AudioSegment.from_file("yourfile.mp3")
output += AudioSegment.from_file("1sec_silence.mp3") * seconds
output.export("newaudio.mp3", format="mp3")
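If you would rather not download a silent file, pydub's AudioSegment.silent(duration=1000) creates one second of silence directly. Alternatively, here is a standard-library sketch that generates a silent WAV file; the function name silent_wav_bytes and the 24000 Hz default frame rate are arbitrary choices of mine. It writes 16-bit signed samples, for which a value of 0 is silence.

```python
import io
import wave


def silent_wav_bytes(seconds, frame_rate=24000):
    """Return a mono, 16-bit PCM WAV file containing `seconds` of silence."""
    n_frames = int(round(seconds * frame_rate))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit signed samples: 0 is silence
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

Write the bytes to disk once (for example as 1sec_silence.wav) and load that file with AudioSegment.from_file in the snippet above instead of the downloaded MP3.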

Related

Language Translator Using Google API in Python

I have used this code from GeeksforGeeks (https://www.geeksforgeeks.org/language-translator-using-google-api-in-python/). When I run it, it runs without any error and prints out:
Speak 'hello' to initiate the Translation !
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
but when I say "hello" it does not recognize it and does not start listening for the translation.
I have imported all the modules and tried updating every one of them. I'm using a MacBook M1 Pro.
And here's the code:
import speech_recognition as spr
from googletrans import Translator
from gtts import gTTS
import os
# Creating Recogniser() class object
recog1 = spr.Recognizer()
# Creating microphone instance
mc = spr.Microphone()
# Capture Voice
with mc as source:
    print("Speak 'hello' to initiate the Translation !")
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    recog1.adjust_for_ambient_noise(source, duration=0.2)
    audio = recog1.listen(source)
    MyText = recog1.recognize_google(audio)
    MyText = MyText.lower()
# Here initialising the recorder with
# hello, whatever after that hello it
# will recognise it.
if 'hello' in MyText:
    # Translator method for translation
    translator = Translator()
    # short form of english in which
    # you will speak
    from_lang = 'en'
    # In which we want to convert, short
    # form of hindi
    to_lang = 'hi'
    with mc as source:
        print("Speak a sentence...")
        recog1.adjust_for_ambient_noise(source, duration=0.2)
        # Storing the speech into audio variable
        audio = recog1.listen(source)
        # Using recognize.google() method to
        # convert audio into text
        get_sentence = recog1.recognize_google(audio)
        # Using try and except block to improve
        # its efficiency.
        try:
            # Printing Speech which need to
            # be translated.
            print("Phrase to be Translated :" + get_sentence)
            # Using translate() method which requires
            # three arguments, 1st the sentence which
            # needs to be translated 2nd source language
            # and 3rd to which we need to translate in
            text_to_translate = translator.translate(get_sentence,
                                                     src=from_lang,
                                                     dest=to_lang)
            # Storing the translated text in text
            # variable
            text = text_to_translate.text
            # Using Google-Text-to-Speech ie, gTTS() method
            # to speak the translated text into the
            # destination language which is stored in to_lang.
            # Also, we have given 3rd argument as False because
            # by default it speaks very slowly
            speak = gTTS(text=text, lang=to_lang, slow=False)
            # Using save() method to save the translated
            # speech in captured_voice.mp3
            speak.save("captured_voice.mp3")
            # Using OS module to run the translated voice.
            os.system("start captured_voice.mp3")
        # Here we are using except block for UnknownValue
        # and Request Error and printing the same to
        # provide better service to the user.
        except spr.UnknownValueError:
            print("Unable to Understand the Input")
        except spr.RequestError as e:
            print("Unable to provide Required Output: {}".format(e))

from gtts import gTTS
from io import BytesIO
from pygame import mixer
import time
def speak():
    mp3_fp = BytesIO()
    tts = gTTS('KGF is a Great movie to watch', lang='en')
    tts.write_to_fp(mp3_fp)
    tts.save("Audio.mp3")
    return mp3_fp
mixer.init()
sound = speak()
sound.seek(0)
mixer.music.load(sound, "mp3")
mixer.music.play()
# play() returns immediately; keep the script alive until playback finishes
while mixer.music.get_busy():
    time.sleep(0.1)

How can I send many videos from a list in telebot (Python)?

So I was writing a Python script using telebot and got an error:
A request to the Telegram API was unsuccessful. Error code: 400.
Description: Bad Request: file must be non-empty
I have tried different methods from many forums, but nothing helps.
import telebot
import random
import time
token = "..."  # token here
bot = telebot.TeleBot(token)
shit = ["C:\\Users\\glebc\\Documents\\source(bot)\\3wZ3.gif.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\65216814_456719028224290_7745639790787166208_n.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-41-49.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-42-04.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-42-39.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\giphy.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_0080.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_0835.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_1362.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_4698.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_4962.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_6359.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_7497.MOV", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_7909.MOV", "C:\\Users\\glebc\\Documents\\source(bot)\\IMG_9540.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\mp4.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\video.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\комочек тьмы.mp4", "C:\\Users\\glebc\\Documents\\source(bot)\\кот.mp4"]
video = open(shit[random.randint(0, len(shit)-1)], 'rb')
@bot.message_handler(commands=['start'])
def start_message(message):
    bot.send_message(message.chat.id, 'hello message 1')
@bot.message_handler(commands=['haha'])
def haha_message(message):
    while True:
        bot.send_video(message.chat.id, video)
        time.sleep(3600)  # 1 hour
@bot.message_handler(commands=['hehe'])
def shit_message(message):
    bot.send_video(message.chat.id, video)
bot.infinity_polling()
Also, I don't understand the error, because I never close the file; I only open it.

The problem may be that you open the file only once, and never close it and open it again.
When Python reads a file, it moves a special pointer that shows where to read next time. Once it has read to the end of the file, the pointer stays at the end, so the next read starts from the end of the file, finds nothing to read, and the API may report that the file is empty.
After reading you may have to use video.seek(0) to move the pointer back to the beginning of the file.
Or you can close the file and open it again. This is even more useful because, at the moment, you select a random file only once and later always use the same path. You should call random inside the loop.
@bot.message_handler(commands=['haha'])
def haha_message(message):
    while True:
        video = open(random.choice(shit), 'rb')
        bot.send_video(message.chat.id, video)
        video.close()
        time.sleep(3600)  # 1 hour
and the same in the other functions:
@bot.message_handler(commands=['hehe'])
def shit_message(message):
    video = open(random.choice(shit), 'rb')
    bot.send_video(message.chat.id, video)
    video.close()
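The file-pointer behaviour described above can be seen without Telegram at all. In this sketch io.BytesIO stands in for open(path, 'rb'), since both keep a read position the same way; read_twice is a hypothetical helper name.

```python
import io


def read_twice(f):
    """Read a file object twice, then rewind with seek(0) and read again."""
    first = f.read()           # reads everything; the pointer is now at the end
    without_rewind = f.read()  # nothing left to read: returns b""
    f.seek(0)                  # move the pointer back to the beginning
    after_rewind = f.read()
    return first, without_rewind, after_rewind


demo = io.BytesIO(b"video-bytes")  # stands in for open(path, "rb")
first, empty, again = read_twice(demo)
```

The second read returns empty bytes, which matches the "file must be non-empty" error when the same open file object is sent a second time.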
BTW:
Telegram bot libraries may have methods to execute tasks periodically.
For example, the python-telegram-bot module has telegram.ext.JobQueue for this.
Full working code:
For testing I set logging.DEBUG to see all error messages;
normally telebot catches all errors and hides them.
I also used with open() as video so that the file is closed automatically.
import os
import time
import random
import logging
import telebot
# display errors
telebot.logger.setLevel(logging.DEBUG)
TOKEN = os.getenv('TELEGRAM_TOKEN')
bot = telebot.TeleBot(TOKEN)
all_videos = [
"C:\\Users\\glebc\\Documents\\source(bot)\\3wZ3.gif.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\65216814_456719028224290_7745639790787166208_n.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-41-49.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-42-04.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\doc_2022-03-10_16-42-39.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\giphy.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_0080.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_0835.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_1362.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_4698.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_4962.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_6359.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_7497.MOV",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_7909.MOV",
"C:\\Users\\glebc\\Documents\\source(bot)\\IMG_9540.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\mp4.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\video.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\комочек тьмы.mp4",
"C:\\Users\\glebc\\Documents\\source(bot)\\кот.mp4"
]
@bot.message_handler(commands=['start'])
def start_message(message):
    bot.send_message(message.chat.id, 'hello message 1')
@bot.message_handler(commands=['haha'])
def haha_message(message):
    while True:
        with open(random.choice(all_videos), 'rb') as video:
            bot.send_video(message.chat.id, video)
        time.sleep(3600)  # 1 hour
@bot.message_handler(commands=['hehe'])
def shit_message(message):
    with open(random.choice(all_videos), 'rb') as video:
        bot.send_video(message.chat.id, video)
bot.infinity_polling()

Why google speech_v1p1beta1 output only shows the last word?

I am using the code below to transcribe an audio file. When the process completes, I only get the last word.
I have tried both FLAC and WAV files and made sure the files are in my bucket.
I also verified that the Google service account is working fine, but I can't figure out why I am only getting the last word.
#!/usr/bin/env python
"""Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.
Example usage:
    python diarization.py
"""
import argparse
import io
def transcribe_file_with_diarization():
    """Transcribe the given audio file synchronously with diarization."""
    # [START speech_transcribe_diarization_beta]
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()
    audio = speech.types.RecognitionAudio(uri="gs://MYBUCKET/MYAudiofile")
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)
    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)
    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]
    words_info = result.alternatives[0].words
    # Printing out the output:
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word,
                                                  word_info.speaker_tag))
    # [END speech_transcribe_diarization_beta]
if __name__ == '__main__':
    transcribe_file_with_diarization()
Here are the results after running the code:
python diarazation.py
Waiting for operation to complete...
word: 'bye', speaker_tag: 0
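As the comment in the sample says, the last result's words list is meant to carry every word with its speaker tag. Once you do get the full list, a minimal sketch for turning it into speaker turns: it works on plain (word, speaker_tag) pairs, which you could build with [(w.word, w.speaker_tag) for w in words_info]. The name group_by_speaker is hypothetical, and this does not by itself explain why only one word comes back.

```python
from itertools import groupby


def group_by_speaker(words):
    """Merge consecutive (word, speaker_tag) pairs into (speaker_tag, text) turns."""
    return [
        (tag, " ".join(word for word, _ in run))
        for tag, run in groupby(words, key=lambda pair: pair[1])
    ]
```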

pyttsx produces no sound

I am just creating a chatbot in Python. It's working well but I want to add pyttsx to this chatbot so that it could speak its output.
My code is
import aiml
import sys
import pyttsx
engine = pyttsx.init()
# Create a Kernel object.
kern = aiml.Kernel()
brainLoaded = False
forceReload = False
while not brainLoaded:
    if forceReload or (len(sys.argv) >= 2 and sys.argv[1] == "reload"):
        kern.bootstrap(learnFiles="std-startup.xml", commands="load aiml b")
        brainLoaded = True
        kern.saveBrain("standard.brn")
    else:
        try:
            kern.bootstrap(brainFile = "standard.brn")
            brainLoaded = True
        except:
            forceReload = True
print "\nINTERACTIVE MODE (ctrl-c to exit)"
while(True):
    hea = kern.respond(raw_input("> "))
    print hea
    engine.say (hea)
engine.runAndWait()
When I am running this code I am not hearing any voice but I can see chat on terminal. I want it to speak the response, too. What am I doing wrong?

engine.runAndWait() is outside the while(True): loop, so the speech is unlikely to be played until the loop is interrupted.
If you move it into the loop and the sound is choppy, test the code below:
import pyttsx
engine = pyttsx.init()
engine.say("Oh, hello!")
engine.runAndWait()
My experience with pyttsx is that it needs to be fed short amounts of text, otherwise the text is interrupted. I'm not sure exactly why that is, but truncating the sentences yourself and saying several phrases should suit your purpose:
engine.say("It's nice to meet you.")
engine.say("I hope you are doing well.")
engine.say("Would you like to join us ")
engine.say ("tomorrow at eight for dinner?")
But you'd need to parse the text and truncate it in a way that would keep the message intact.
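One way to do that splitting, sketched under my own assumptions: split the response into sentences, then wrap any sentence longer than max_len characters at word boundaries with textwrap, so no word is cut in half. The name speakable_chunks and the 60-character default are hypothetical; tune max_len to whatever your engine handles reliably.

```python
import re
import textwrap


def speakable_chunks(text, max_len=60):
    """Split text into sentences, then wrap long sentences at word boundaries."""
    chunks = []
    # Split after sentence-ending punctuation that is followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            chunks.extend(textwrap.wrap(sentence, width=max_len))
    return chunks
```

In the chatbot loop this would become: for chunk in speakable_chunks(hea): engine.say(chunk), followed by engine.runAndWait() inside the loop.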

Downloading Streams Simultaneously with Python 3.5

EDIT: I think I've figured out a solution using subprocess.Popen with separate .py files for each stream being monitored. It's not pretty, but it works.
I'm working on a script to monitor a streaming site for several different accounts and to record when they are online. I am using the livestreamer package for downloading a stream when it comes online, but the problem is that the program will only record one stream at a time. I have the program loop through a list and if a stream is online, start recording with subprocess.call(["livestreamer"... The problem is that once the program starts recording, it stops going through the loop and doesn't check or record any of the other livestreams. I've tried using Process and Thread, but none of these seem to work. Any ideas?
Code below. The asterisks are not literally part of the code.
import os,urllib.request,time,subprocess,datetime,random
status = {
    "********":False,
    "********":False,
    "********":False
}
def gen_name(tag):
    return stuff  # << Bunch of unimportant code stuff here.
def dl(tag):
    subprocess.call(["livestreamer","********.com/"+tag,"best","-o",".\\tmp\\"+gen_name(tag)])
def loopCheck():
    while True:
        for tag in status:
            data = urllib.request.urlopen("http://*******.com/" + tag + "/").read().decode()
            if data.find(".m3u8") != -1:
                print(tag + " is online!")
                if status[tag] == False:
                    status[tag] = True
                    dl(tag)
            else:
                print(tag+ " is offline.")
                status[tag] = False
        time.sleep(15)
loopCheck()
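The reason the loop stalls is that subprocess.call waits for the command to finish, whereas subprocess.Popen starts it and returns immediately, which is the direction the EDIT above points to. A minimal single-script sketch of that idea: running, start_recording, reap_finished and DEMO_CMD are hypothetical names, and DEMO_CMD is a short-lived stand-in for the livestreamer command so the logic can be tried locally.

```python
import subprocess
import sys

# Hypothetical stand-in for the livestreamer command, used for local testing.
DEMO_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]

running = {}  # tag -> Popen handle of a recording in progress


def start_recording(tag, cmd):
    """Launch cmd in the background unless a recording for tag is still running."""
    proc = running.get(tag)
    if proc is not None and proc.poll() is None:
        return False  # still recording; don't start a second copy
    running[tag] = subprocess.Popen(cmd)  # returns immediately, unlike call()
    return True


def reap_finished():
    """Forget processes that have exited and return their tags."""
    done = [tag for tag, proc in running.items() if proc.poll() is not None]
    for tag in done:
        del running[tag]
    return done
```

In loopCheck, dl(tag) would become start_recording(tag, [...]) with the same livestreamer arguments, and calling reap_finished() on each pass lets a stream be recorded again after its previous recording ends.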

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
