I have used this code from GeeksforGeeks (https://www.geeksforgeeks.org/language-translator-using-google-api-in-python/). It runs without any error and prints:
Speak 'hello' to initiate the Translation !
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
but when I say "hello" it does not recognize it and does not start listening for the translation. I have imported all the modules and tried updating every one of them. I am using a MacBook M1 Pro.
Here is the code:
import speech_recognition as spr
from googletrans import Translator
from gtts import gTTS
import os

# Creating Recognizer() class object
recog1 = spr.Recognizer()

# Creating microphone instance
mc = spr.Microphone()

# Capture voice
with mc as source:
    print("Speak 'hello' to initiate the Translation !")
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    recog1.adjust_for_ambient_noise(source, duration=0.2)
    audio = recog1.listen(source)
    MyText = recog1.recognize_google(audio)
    MyText = MyText.lower()

# The recorder is initialised with 'hello'; whatever is said
# after 'hello' will be recognised.
if 'hello' in MyText:
    # Translator object for translation
    translator = Translator()
    # Short code of the language you will speak in
    from_lang = 'en'
    # Short code of the language to translate into (Hindi)
    to_lang = 'hi'
    with mc as source:
        print("Speak a sentence...")
        recog1.adjust_for_ambient_noise(source, duration=0.2)
        # Storing the speech in the audio variable
        audio = recog1.listen(source)
        # Using recognize_google() to convert audio into text
        get_sentence = recog1.recognize_google(audio)
        # try/except to handle recognition and request errors
        try:
            # Printing the speech that needs to be translated
            print("Phrase to be Translated: " + get_sentence)
            # translate() takes the sentence to translate, the
            # source language and the destination language
            text_to_translate = translator.translate(get_sentence,
                                                     src=from_lang,
                                                     dest=to_lang)
            # Storing the translated text in the text variable
            text = text_to_translate.text
            # Using Google Text-to-Speech (gTTS) to speak the
            # translated text in the destination language;
            # slow=False because by default it speaks very slowly
            speak = gTTS(text=text, lang=to_lang, slow=False)
            # Saving the translated speech to captured_voice.mp3
            speak.save("captured_voice.mp3")
            # Playing the translated voice; "afplay" is the macOS
            # player ("start" only works on Windows)
            os.system("afplay captured_voice.mp3")
        except spr.UnknownValueError:
            print("Unable to Understand the Input")
        except spr.RequestError as e:
            print("Unable to provide Required Output: {}".format(e))
from gtts import gTTS
from io import BytesIO
from pygame import mixer
import time

def speak():
    mp3_fp = BytesIO()
    tts = gTTS('KGF is a Great movie to watch', lang='en')
    tts.write_to_fp(mp3_fp)
    tts.save("Audio.mp3")
    return mp3_fp

mixer.init()
sound = speak()
sound.seek(0)
mixer.music.load(sound, "mp3")
mixer.music.play()
As I said already, sorry for the title. I have never worked with the Azure API and have no idea what is wrong with the code, as I just copied it from the documentation and put in my information.
Here is the code:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
When I run this I get no error and, in fact, a .wav file appears in my desired folder, but the file has 0 bytes and looks corrupted.
Now here is why I have no idea what's wrong: if I remove these lines
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
So it becomes this
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
it now works all of a sudden, but with what I assume to be the basic/common voice.
So here is my question: how do I choose the voice I want? (By the way, is it this one: "en-US-JennyNeural" with style="customerservice", or something along those lines?)
Thank you in advance!
ChristopherNeural is not a valid voice name. The actual name of the voice is en-US-ChristopherNeural.
speech_config.speech_synthesis_voice_name = "en-US-ChristopherNeural"
This is well-documented on the Language support page of the Speech services documentation.
For other, more fine-grained control over voice characteristics, you'll require the use of SSML as outlined in text-to-speech-basics.py.
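As a rough sketch of that SSML route: the helper below builds an SSML string that selects a voice and a speaking style (the voice name and the "customerservice" style are the ones mentioned in the question; whether a given style exists depends on the voice). With the Speech SDK you would then pass this string to `speak_ssml_async()` instead of `speak_text_async()`, so the voice comes from the markup rather than from `speech_config`.

```python
def build_ssml(text, voice="en-US-JennyNeural", style="customerservice"):
    """Wrap plain text in SSML that selects a neural voice and a speaking style."""
    # mstts:express-as is the extension element that applies a style
    # to neural voices that support it
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<mstts:express-as style="{style}">{text}</mstts:express-as>'
        '</voice>'
        '</speak>'
    )

ssml = build_ssml("A simple test to write to a file.")
# With a configured SpeechSynthesizer you would then call, e.g.:
#   synthesizer.speak_ssml_async(ssml).get()
print(ssml)
```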
I am trying to transcribe an audio file with google cloud. Here is my code:
from google.cloud.speech_v1 import enums
from google.cloud import speech_v1p1beta1
import os
import io
def sample_long_running_recognize(local_file_path):
    client = speech_v1p1beta1.SpeechClient()
    # local_file_path = 'resources/commercial_mono.wav'

    # If enabled, each word in the first alternative of each result will be
    # tagged with a speaker tag to identify the speaker.
    enable_speaker_diarization = True

    # Optional. Specifies the estimated number of speakers in the conversation.
    diarization_speaker_count = 2

    # The language of the supplied audio
    language_code = "en-US"
    config = {
        "enable_speaker_diarization": enable_speaker_diarization,
        "diarization_speaker_count": diarization_speaker_count,
        "language_code": language_code,
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}
    # audio = {"uri": storage_uri}

    operation = client.long_running_recognize(config, audio)

    print(u"Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # First alternative has words tagged with speakers
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
        # Print the speaker_tag of each word
        for word in alternative.words:
            print(u"Word: {}".format(word.word))
            print(u"Speaker tag: {}".format(word.speaker_tag))

sample_long_running_recognize('/Users/asi/Downloads/trimmed_3.flac')
I keep getting this error:
google.api_core.exceptions.InvalidArgument: 400 audio_channel_count `1` in RecognitionConfig must either be unspecified or match the value in the FLAC header `2`.
I cannot figure out what I am doing wrong. I have pretty much copied and pasted this from the Google Cloud Speech API docs. Any advice?
This attribute (audio_channel_count) is the number of channels in the input audio data, and you only need to set it for multi-channel recognition. As the error message says, your FLAC header declares two channels, so you need to set 'audio_channel_count': 2 in your config to exactly match your audio file.
Please take a look at the source code for more information about the attributes of the RecognitionConfig object.
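To make the fix concrete, here is a sketch of the config dict from the question with the one added entry. The encoding is shown as the string "FLAC" so the snippet stands alone; in the real call it would be `enums.RecognitionConfig.AudioEncoding.FLAC` as in the question's code.

```python
def build_config(language_code="en-US", channels=2):
    # Same config as in the question, plus audio_channel_count,
    # which must match the channel count in the FLAC header (2 here)
    return {
        "enable_speaker_diarization": True,
        "diarization_speaker_count": 2,
        "language_code": language_code,
        "audio_channel_count": channels,
        # In the real request: enums.RecognitionConfig.AudioEncoding.FLAC
        "encoding": "FLAC",
    }

config = build_config()
# operation = client.long_running_recognize(config, audio)  # as in the question
```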
I am looking for a small pause, wait or break that will create a short gap (about 2 seconds, ideally configurable) when speaking out the desired text.
People online have said that adding three full stops followed by a space creates a break, but I don't seem to be getting that. The code below is my test, which sadly has no pauses. Any ideas or suggestions?
Edit: It would be ideal if there were some gTTS command that allowed me to do this, or maybe some trick like the three full stops, if that actually worked.
from gtts import gTTS
import os
tts = gTTS(text=" Testing ... if there is a pause ... ... ... ... ... longer pause? ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... insane pause " , lang='en', slow=False)
tts.save("temp.mp3")
os.system("temp.mp3")
OK, you need Speech Synthesis Markup Language (SSML) to achieve this.
Be aware that you need to set up Google Cloud Platform credentials first. Then install the client library in bash:
pip install --upgrade google-cloud-texttospeech
Then here is the code:
import html
from google.cloud import texttospeech

def ssml_to_audio(ssml_text, outfile):
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
    print("Audio content written to file " + outfile)

def text_to_ssml(inputfile):
    raw_lines = inputfile

    # Replace special characters with HTML Ampersand Character Codes.
    # These codes prevent the API from confusing text with SSML commands.
    # For example, '<' --> '&lt;' and '&' --> '&amp;'
    escaped_lines = html.escape(raw_lines)

    # Convert plaintext to SSML;
    # wait two seconds between each line
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the concatenated string of SSML script
    return ssml

text = """Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound"""

ssml = text_to_ssml(text)
ssml_to_audio(ssml, "test.mp3")
More documentation:
Speaking addresses with SSML
But if you don't have Google Cloud Platform credentials, the cheaper and easier way is to use the time.sleep() method.
If any background waits are required, you can use the time module as below.
import time
# Sleep for 5 seconds, then start the process
time.sleep(5)
Or you can check three times with a wait in between, etc.:
import time
for tries in range(3):
    if someprocess() is False:
        time.sleep(3)
You can save multiple mp3 files, then use time.sleep() to call each with your desired amount of pause:
from gtts import gTTS
import os
from time import sleep
tts1 = gTTS(text="Testing", lang='en', slow=False)
tts2 = gTTS(text="if there is a pause" , lang='en', slow=False)
tts3 = gTTS(text="insane pause " , lang='en', slow=False)
tts1.save("temp1.mp3")
tts2.save("temp2.mp3")
tts3.save("temp3.mp3")
os.system("temp1.mp3")
sleep(2)
os.system("temp2.mp3")
sleep(3)
os.system("temp3.mp3")
Sadly the answer is no: the gTTS package has no function for adding a pause (an issue requesting one was opened back in 2018), but it is smart enough to add natural pauses via its tokenizer.
What is a tokenizer?
A function that takes text and returns it split into a list of tokens (strings). In the gTTS context, its goal is to cut the text into smaller segments that do not exceed the maximum character size allowed (100) for each TTS API request, while making the speech sound natural and continuous. It does so by splitting text where speech would naturally pause (for example on ".") while handling cases where it should not (for example on "10.5" or "U.S.A."). Such rules are called tokenizer cases, and the tokenizer takes a list of them.
Here is an example:
text = "regular text speed no pause regular text speed comma pause, regular text speed period pause. regular text speed exclamation pause! regular text speed ellipses pause... regular text speed new line pause \n regular text speed "
So in this case, adding a sleep() seems like the only answer. But tricking the tokenizer is worth mentioning.
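To make the tokenizer behaviour concrete, here is a rough standard-library imitation of the kind of splitting it performs. This is only a sketch: the real implementation lives in the gtts.tokenizer module and protects many more cases (decimals, abbreviations, and so on).

```python
import re

def naive_tokenize(text, max_len=100):
    # Split where speech would naturally pause: sentence-ending
    # punctuation, commas, and newlines (a rough imitation only;
    # the real gTTS tokenizer also protects cases like "10.5")
    tokens = [t.strip() for t in re.split(r"[.!?,\n]+", text) if t.strip()]
    # Each token must fit in one TTS API request
    return [t[:max_len] for t in tokens]

print(naive_tokenize("no pause here, comma pause. period pause! done"))
# each punctuation mark above starts a new token
```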
You can add an arbitrary pause with pydub by saving and concatenating temporary mp3 files, using a silent audio file for the pause. Use any break-point symbol of your choice where you want to add a pause (here $):
from pydub import AudioSegment
from gtts import gTTS

contents = "Hello with $$ 2 seconds pause"
parts = contents.split("$")  # I have chosen this symbol for the pause.

# silent.mp3 contains 2s of blank audio
pause2s = AudioSegment.from_mp3("silent.mp3")

combined = AudioSegment.empty()
cnt = 0
for p in parts:
    # The pause happens for each empty element of the list
    if not p:
        combined += pause2s
    else:
        tts = gTTS(text=p, lang='en', slow=False)
        tmpFileName = "tmp" + str(cnt) + ".mp3"
        tts.save(tmpFileName)
        combined += AudioSegment.from_mp3(tmpFileName)
        cnt += 1
combined.export("out.mp3", format="mp3")
Late to the party here, but you might consider trying out the audio_program_generator package. You provide a text file composed of individual phrases, each of which has a configurable pause at the end. In return, it gives you an mp3 file that stitches together all the phrases and their pauses into one continuous audio file. You can optionally mix in a background sound file as well. And it implements several of the other bells and whistles that Google TTS provides, like accents, slow-play speech, etc.
Disclaimer: I am the author of the package.
I had the same problem, and didn't want to use lots of temporary files on disk. This code parses an SSML file, and creates silence whenever a <break> tag is found:
import io
from gtts import gTTS
import lxml.etree as etree
import pydub

ssml_filename = 'Section12.35-edited.ssml'
wav_filename = 'Section12.35-edited.mp3'
events = ('end',)
DEFAULT_BREAK_TIME = 250

all_audio = pydub.AudioSegment.silent(100)

for event, element in etree.iterparse(
    ssml_filename,
    events=events,
    remove_comments=True,
    remove_pis=True,
    attribute_defaults=True,
):
    tag = etree.QName(element).localname
    if tag in ['p', 's'] and element.text:
        tts = gTTS(element.text, lang='en', tld='com.au')
        with io.BytesIO() as temp_bytes:
            tts.write_to_fp(temp_bytes)
            temp_bytes.seek(0)
            audio = pydub.AudioSegment.from_mp3(temp_bytes)
        all_audio = all_audio.append(audio)
    elif tag == 'break':
        # Write silence to the file.
        time = element.attrib.get('time', None)  # Shouldn't be possible to have no time value.
        if time:
            if time.endswith('ms'):
                time_value = int(time.removesuffix('ms'))
            elif time.endswith('s'):
                time_value = int(time.removesuffix('s')) * 1000
            else:
                time_value = DEFAULT_BREAK_TIME
        else:
            time_value = DEFAULT_BREAK_TIME
        silence = pydub.AudioSegment.silent(time_value)
        all_audio = all_audio.append(silence)

with open(wav_filename, 'wb') as output_file:
    all_audio.export(output_file, format='mp3')
I know 4Rom1 used this method above, but to put it more simply, I found this worked really well for me. Get a 1-second silent mp3 (I found one by googling "1 sec silent mp3"), then use pydub to append the silent segment however many times you need. For example, to add 3 seconds of silence:
from pydub import AudioSegment
seconds = 3
output = AudioSegment.from_file("yourfile.mp3")
output += AudioSegment.from_file("1sec_silence.mp3") * seconds
output.export("newaudio.mp3", format="mp3")
I have developed a web scraper with Beautiful Soup that scrapes news from a website and then sends it to a Telegram bot. Every time the program runs it picks up all the news currently on the news web page; I want it to pick up only the new entries and send only those.
How can I do this? Should I use a sorting algorithm of some sort?
Here is the code:
# Lib requests
import requests
import bs4

fonte = requests.get('https://www.noticiasaominuto.com/')
soup = bs4.BeautifulSoup(fonte.text, 'lxml')
body = soup.body

for paragrafo in body.find_all('p', class_='article-thumb-text'):
    print(paragrafo.text)
    conteudo = paragrafo.text
    id = requests.get('https://api.telegram.org/bot<TOKEN>/getUpdates')
    chat_id = id.json()['result'][0]['message']['from']['id']
    print(chat_id)
    msg = requests.post('https://api.telegram.org/bot<TOKEN>/sendMessage', data={'chat_id': chat_id, 'text': conteudo})
You need to keep track of articles you have seen before, either with a full database solution or by simply saving the information in a file. The file is read before starting, the website is scraped and compared against the existing list, any articles not in the list are added to it, and at the end the updated list is saved back to the file.
Rather than storing the whole text in the file, a hash of the text can be saved instead, i.e. the text is converted into a unique number; here a hex digest is used to make it easier to save to a text file. As each hash will be unique, the hashes can be stored in a Python set to speed up the checking:
import hashlib
import requests
import bs4
import os

# Read in hashes of past articles
db = 'past.txt'

if os.path.exists(db):
    with open(db) as f_past:
        past_articles = set(f_past.read().splitlines())
else:
    past_articles = set()

fonte = requests.get('https://www.noticiasaominuto.com/')
soup = bs4.BeautifulSoup(fonte.text, 'lxml')

for paragrafo in soup.body.find_all('p', class_='article-thumb-text'):
    m = hashlib.md5(paragrafo.text.encode('utf-8'))
    if m.hexdigest() not in past_articles:
        print('New {} - {}'.format(m.hexdigest(), paragrafo.text))
        past_articles.add(m.hexdigest())
        # ...Update telegram here...

# Write updated hashes back to the file
with open(db, 'w') as f_past:
    f_past.write('\n'.join(past_articles))
The first time this is run, all articles will be displayed. The next time, no articles will be displayed until the website is updated.