IBM Watson Text to Speech API Python


I'm trying to adjust the pitch of an IBM Watson Text to Speech voice, but I can't find any documentation on this.
If you visit this link, you can see that there is an option to adjust the pitch and speed.
My code is simply this:
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('api_key')
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)
text_to_speech.set_service_url('service_url')
sample = "insert what you want to say here"
with open('test.wav', 'wb') as audio_file:
    audio_file.write(
        text_to_speech.synthesize(
            sample,
            voice='en-GB_JamesV3Voice',
            accept='audio/wav'
        ).get_result().content)
I have no idea which parameters to adjust to make the voice lower. Thank you so much!

What you are looking for is the prosody element. Neural voices (V3) support only the pitch and rate attributes.
Using your example:
sample = 'Here is a <prosody pitch="150Hz"> modified pitch </prosody> example.'
sample = 'Here is a <prosody rate="x-slow"> modified rate </prosody> example.'

And here is a link to the docs about the prosody element:
https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-elements#prosody_element
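To connect the prosody element back to the synthesize() call in the question, here is a minimal sketch. It assumes, as the snippets above do, that the text argument accepts SSML markup, and that keyword pitch values such as x-low are among those listed in the linked docs. The helper names prosody_ssml and synthesize_to_wav are hypothetical, not part of the ibm_watson SDK; the SDK is imported inside the function so the SSML builder can be used on its own.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, pitch=None, rate=None):
    """Wrap plain text in <speak> and, if requested, a <prosody> element.

    pitch/rate are passed through verbatim (e.g. pitch="x-low" or "150Hz",
    rate="x-slow"); which values a voice honours is decided by the service.
    """
    attrs = ""
    if pitch is not None:
        attrs += " pitch=" + quoteattr(pitch)
    if rate is not None:
        attrs += " rate=" + quoteattr(rate)
    body = escape(text)  # escape &, < and > so the SSML stays well-formed
    if attrs:
        body = "<prosody{}>{}</prosody>".format(attrs, body)
    return "<speak>{}</speak>".format(body)


def synthesize_to_wav(ssml, path, api_key, service_url,
                      voice="en-GB_JamesV3Voice"):
    # Imported here so prosody_ssml() works without the SDK installed.
    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    tts = TextToSpeechV1(authenticator=IAMAuthenticator(api_key))
    tts.set_service_url(service_url)
    audio = tts.synthesize(ssml, voice=voice, accept="audio/wav").get_result()
    with open(path, "wb") as audio_file:
        audio_file.write(audio.content)


if __name__ == "__main__":
    ssml = prosody_ssml("insert what you want to say here", pitch="x-low")
    print(ssml)
    # synthesize_to_wav(ssml, "test.wav", "api_key", "service_url")
```

Escaping the text matters once it is embedded in markup: a stray `<` or `&` in plain text would otherwise make the SSML invalid.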

Related

Azure API Not Working (sorry for the title, I have no idea what's wrong)

As I already said, sorry for the title. I have never worked with the Azure API and have no idea what is wrong with the code, as I just copied it from the documentation and filled in my information.
Here is the code:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
When I run this I get no error, and a .wav file does appear in my desired folder, but it has 0 bytes and looks corrupted.
Here is why I am confused: if I remove these two lines
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
so that it becomes this:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
It suddenly works, but with what I assume is the basic/default voice.
So here is my question: how do I choose the voice I want? (By the way, is it something like "en-US-JennyNeural" style="customerservice", or something along those lines?)
Thank you in advance!

ChristopherNeural is not a valid voice name. The actual name of the voice is en-US-ChristopherNeural.
speech_config.speech_synthesis_voice_name = "en-US-ChristopherNeural"
This is well-documented on the Language support page of the Speech services documentation.
For more fine-grained control over voice characteristics, such as speaking styles, you'll need to use SSML, as outlined in text-to-speech-basics.py.
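Here is a minimal sketch of the SSML route for the style question, assuming the mstts:express-as element and the namespace URI shown in Microsoft's SSML docs, and assuming the chosen voice supports the requested style (unsupported styles are simply not applied). It also waits on the future with .get(), which the original speak_text_async() call never did, and raises if synthesis was canceled so failures such as an invalid voice name do not pass silently. The helper names build_ssml and speak_to_file are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_TEMPLATE = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
    '<voice name={voice}>{body}</voice>'
    '</speak>'
)


def build_ssml(text, voice="en-US-JennyNeural", style=None):
    """Build an SSML document selecting a full voice name and optional style."""
    body = escape(text)
    if style is not None:
        # Speaking styles only take effect on voices that support them.
        body = "<mstts:express-as style={}>{}</mstts:express-as>".format(
            quoteattr(style), body)
    return SSML_TEMPLATE.format(voice=quoteattr(voice), body=body)


def speak_to_file(ssml, filename, key, region="westeurope"):
    # Imported here so build_ssml() works without the Speech SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=filename)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    # .get() blocks until synthesis finishes instead of returning a future.
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        details = result.cancellation_details
        raise RuntimeError("Synthesis canceled: {} {}".format(
            details.reason, details.error_details))
    return result
```

For example, `speak_to_file(build_ssml("A simple test.", style="customerservice"), "Azure.wav", "YOUR_KEY")` would request the Jenny voice in its customer-service style.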

How to read MP3 data from Google Cloud using Python

I am trying to read MP3/WAV data from Google Cloud Storage and apply speaker diarization to it. The issue is that I cannot read the result returned by the Google API in the variable response.
Below is my Python code:
speech_file = r'gs://pp003231/a4a.wav'
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)
audio = speech.types.RecognitionAudio(uri=speech_file)
response = client.long_running_recognize(config, audio)
print response
result = response.results[-1]
print result
Output displayed on console is
Traceback (most recent call last):
File "a1.py", line 131, in
print response.results
AttributeError: 'Operation' object has no attribute 'results'
Can you please share your expert advice about what I am doing wrong?
Thanks for your help.

It's too late for the author of this thread, but I'm posting the solution for anyone in the future, as I had a similar issue.
Change
result = response.results[-1]
to
result = response.result().results[-1]
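and it will work. long_running_recognize() returns an Operation object rather than the response itself; its result() method waits for the operation to finish and returns the response that holds results.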
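For readers on the current google-cloud-speech client (2.x), the same fix looks roughly like the sketch below. It assumes the 2.x API, where request objects live directly on the speech module, calls take keyword arguments, and diarization is configured through SpeakerDiarizationConfig. The helper names diarize_gcs and words_by_speaker are hypothetical; the SDK import sits inside the function so the grouping logic can be used without it.

```python
from collections import defaultdict


def words_by_speaker(words):
    """Group recognized words by speaker_tag, keeping their spoken order."""
    grouped = defaultdict(list)
    for info in words:
        grouped[info.speaker_tag].append(info.word)
    return {tag: " ".join(ws) for tag, ws in grouped.items()}


def diarize_gcs(uri, speakers=2, timeout=600):
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="en-US",
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=speakers,
            max_speaker_count=speakers,
        ),
    )
    audio = speech.RecognitionAudio(uri=uri)
    operation = client.long_running_recognize(config=config, audio=audio)
    # result() blocks until the Operation is done, then returns the response.
    response = operation.result(timeout=timeout)
    # With diarization, the last result lists every word with its speaker tag.
    return words_by_speaker(response.results[-1].alternatives[0].words)
```

Calling `diarize_gcs("gs://pp003231/a4a.wav")` would return a dict mapping each speaker tag to that speaker's words.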

Do you have access to the WAV file in your bucket? Also, is this the entire code? It seems that sample_rate_hertz and the imports are missing. Here is the code copied from the Google docs samples, edited to keep just the diarization function.
#!/usr/bin/env python
"""Google Cloud Speech API sample that demonstrates speaker diarization.
Example usage:
    python diarization.py
"""
def transcribe_file_with_diarization():
    """Transcribe the given audio file synchronously with diarization."""
    # [START speech_transcribe_diarization_beta]
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://<YOUR_BUCKET>/<YOUR_WAV_FILE>")
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)
    print('Waiting for operation to complete...')
    # google-cloud-speech 2.x expects keyword arguments here
    response = client.recognize(config=config, audio=audio)
    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]
    words_info = result.alternatives[0].words
    # Printing out the output:
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word,
                                                   word_info.speaker_tag))
    # [END speech_transcribe_diarization_beta]
if __name__ == '__main__':
    transcribe_file_with_diarization()
To run the code just name it diarization.py and use the command:
python diarization.py
Also, you have to install the latest google-cloud-speech library:
pip install --upgrade google-cloud-speech
You also need your service account credentials in a JSON file (the client library finds it through the GOOGLE_APPLICATION_CREDENTIALS environment variable); you can find more info here.

Python IBM Watson NLU Sentiment Analysis - TypeError: cannot convert dic

Using the following code, I get this error message:
TypeError: cannot convert dictionary update sequence element #0 to a sequence
The code I used is the following:
import watson_developer_cloud as WDC
import watson_developer_cloud.natural_language_understanding.features.v1 as nluFeatures
#from wdc_config import nlu_config
nlu = WDC.NaturalLanguageUnderstandingV1('2017-02-27',username='myusernamehere',password='mypasswordhere')
data = ([1, "I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhauser Gate. All those moments will be lost in time, like tears in rain. Time to die."])
def nlu_analyze():
    response = nlu.analyze(text=data,features=[nluFeatures.Keywords(),nluFeatures.Entities(),nluFeatures.Categories(),nluFeatures.Emotion(),nluFeatures.Sentiment()])
    return response
response = nlu_analyze()
print(response["keywords"])
print(response["entities"])
print(response["categories"])
print(response["emotion"])
print(response["sentiment"])
Why do I get this error?
Solved
Thanks to chughts this problem is solved.
import watson_developer_cloud as WDC
from watson_developer_cloud.natural_language_understanding_v1 import Features, KeywordsOptions, EntitiesOptions, CategoriesOptions, EmotionOptions, SentimentOptions
nlu = WDC.NaturalLanguageUnderstandingV1('2017-02-27',username='yourusername',password='yourpassword')
data = ("I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhauser Gate. All those moments will be lost in time, like tears in rain. Time to die.")
def nlu_analyze():
    response = nlu.analyze(text=data, features=Features(keywords=KeywordsOptions(), entities=EntitiesOptions(), categories=CategoriesOptions(), emotion=EmotionOptions(), sentiment=SentimentOptions()))
    return response
response = nlu_analyze()
print(response["keywords"])
print(response["entities"])
print(response["categories"])
print(response["emotion"])
print(response["sentiment"])

I think the problem lies in your import of the features:
import watson_developer_cloud.natural_language_understanding.features.v1 as nluFeatures
which, according to the API documentation (https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/?python#post-analyze), should be
from watson_developer_cloud.natural_language_understanding_v1 \
import Features, KeywordsOptions, EntitiesOptions, CategoriesOptions, EmotionOptions, SentimentOptions
and
response = nlu.analyze(text=data,features=[nluFeatures.Keywords(),nluFeatures.Entities(),nluFeatures.Categories(),nluFeatures.Emotion(),nluFeatures.Sentiment()])
would become
response = nlu.analyze(text=data,features=Features(keywords=KeywordsOptions(),entities=EntitiesOptions(),categories=CategoriesOptions(),emotion=EmotionOptions(),sentiment=SentimentOptions()))
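The watson_developer_cloud package used above was later renamed ibm_watson, which authenticates with an IAM API key and wraps responses in a DetailedResponse, so response["keywords"] needs a get_result() call first. A minimal sketch, assuming that newer SDK; the version date string and the helper names analyze and summarize_nlu are illustrative assumptions, and only two of the features are requested to keep it short.

```python
def summarize_nlu(result, top_n=3):
    """Return (document sentiment label, top keywords by relevance)."""
    sentiment = result.get("sentiment", {}).get("document", {}).get("label")
    keywords = sorted(result.get("keywords", []),
                      key=lambda k: k.get("relevance", 0), reverse=True)
    return sentiment, [k["text"] for k in keywords[:top_n]]


def analyze(text, api_key, service_url):
    # Imported here so summarize_nlu() works without the SDK installed.
    from ibm_watson import NaturalLanguageUnderstandingV1
    from ibm_watson.natural_language_understanding_v1 import (
        Features, KeywordsOptions, SentimentOptions)
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    nlu = NaturalLanguageUnderstandingV1(
        version="2022-04-07",  # assumed API version date
        authenticator=IAMAuthenticator(api_key))
    nlu.set_service_url(service_url)
    response = nlu.analyze(
        text=text,
        features=Features(keywords=KeywordsOptions(),
                          sentiment=SentimentOptions()))
    # get_result() unwraps the DetailedResponse into the plain JSON dict.
    return summarize_nlu(response.get_result())
```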

How to use unofficial Google Trends API (pyGTrends.py)

I'm starting to learn Python to write a program for crawling web data. While googling, I found the Google Trends API pyGTrends.py, but I can't get it to work.
I found the same problem on Google, but no solution that I could understand.
Please help me.
I used the API exactly as described on the API owner's website: Programmatic Google Trends Api
from pyGTrends import pyGTrends
connector = pyGTrends('googleID','password')
connector.download_report(('banana', 'bread', 'bakery'),date='2008-4',geo='AT',scale=1)
print connector.csv()
The error message is below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\Lib\site-packages\pyGTrends.py", line 115, in csv
KeyError: 'main'

You need to import it like this:
from pytrends.pyGTrends import pyGTrends

Here's an example of how to use it. Let me know if you need further assistance:
from pytrends.pyGTrends import pyGTrends
import time
from random import randint
from pprint import pprint
import sys
google_username = "GMAIL_USERNAME"
google_password = "PASSWORD"
path = "."
terms = [
"Image Processing",
"Signal Processing",
"Computer Vision",
"Machine Learning",
"Information Retrieval",
"Data Mining"
]
# connect to Google Trends API
connector = pyGTrends(google_username, google_password)
for label in terms:
    print(label)
    sys.stdout.flush()
    #kw_string = '"{0}"'.format(keyword, base_keyword)
    connector.request_report(label, geo="US", date="01/2014 96m")
    # wait a random amount of time between requests to avoid bot detection
    time.sleep(randint(5, 10))
    # download file
    connector.save_csv(path, label)
for term in terms:
    data = connector.get_suggestions(term)
    pprint(data)
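Later releases of pytrends expose a TrendReq class instead of pyGTrends and no longer take a Google login. A minimal sketch of the same loop, assuming a current pytrends install; the helper names chunks and fetch_interest are hypothetical. Google Trends compares at most five terms per request, so the terms are sent in groups of five, with the same random pause between requests.

```python
import time
from random import randint


def chunks(terms, size=5):
    """Split terms into consecutive groups of at most `size` items."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]


def fetch_interest(terms, geo="US", timeframe="today 5-y"):
    # Imported here so chunks() works without pytrends installed.
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=360)
    frames = []
    for group in chunks(terms):
        pytrends.build_payload(group, timeframe=timeframe, geo=geo)
        # One pandas DataFrame per group, indexed by date.
        frames.append(pytrends.interest_over_time())
        # Pause between requests to reduce the chance of being throttled.
        time.sleep(randint(5, 10))
    return frames
```

Note that each group is normalized to 0-100 independently, so values from different groups are not directly comparable.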

Python script for "Google search by image"

I have checked Google's Search APIs, and it seems they have not released any API for searching "Images". So I was wondering whether there is a Python script or library through which I can automate the "search by image" feature.

This was annoying enough to figure out that I thought I'd post an answer on the first Python-related Stack Overflow result for "script google image search". The most annoying part is setting up your application and custom search engine (CSE) in Google's web UI, but once you have your API key and CSE ID, define them as environment variables and do something like:
#!/usr/bin/env python
# save top 10 google image search results to current directory
# https://developers.google.com/custom-search/json-api/v1/using_rest
import os
import re
import shutil
import sys
import requests
url = 'https://www.googleapis.com/customsearch/v1'
apiKey = os.environ['GOOGLE_IMAGE_APIKEY']
cx = os.environ['GOOGLE_CSE_ID']
q = sys.argv[1]
# let requests URL-encode the query instead of formatting it into the URL
params = {'key': apiKey, 'cx': cx, 'searchType': 'image', 'q': q}
i = 1
for result in requests.get(url, params=params).json()['items']:
    link = result['link']
    image = requests.get(link, stream=True)
    if image.status_code == 200:
        # use the text after the last '.' in the link as the file extension
        m = re.search(r'[^\.]+$', link)
        filename = './{}-{}.{}'.format(q, i, m.group())
        with open(filename, 'wb') as f:
            image.raw.decode_content = True
            shutil.copyfileobj(image.raw, f)
        i += 1

There is no API available, but you can parse the page and imitate a browser. I don't know how much data you need to parse, though, because Google may limit or block access.
You can imitate a browser simply by using urllib and setting the correct headers. If parsing complex web pages from Python seems difficult, you can use a headless browser like PhantomJS directly; inside a browser it is trivial to get the right elements using JavaScript and the DOM.
Note: before trying any of this, check Google's Terms of Service.
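The urllib approach amounts to attaching browser-like headers to each request. A minimal sketch using only the standard library; the header values are illustrative assumptions rather than anything Google requires, and browser_request and fetch_html are hypothetical helper names. The Terms of Service caveat applies to anything fetched this way.

```python
import urllib.request

# Illustrative browser-like headers (assumed values; copy from a real browser).
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}


def browser_request(url, headers=BROWSER_HEADERS):
    """Build a Request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch_html(url):
    """Fetch a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(browser_request(url), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```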

You can try this:
https://developers.google.com/image-search/v1/jsondevguide#json_snippets_python
It's deprecated, but seems to work.
