IBM Watson SpeechToTextV1 error - Python

I have been trying out the IBM Watson Speech to Text API. It works with short audio files, but not with audio files that are around 5 minutes long. It gives me the error below:
"watson {'code_description': 'Bad Request', 'code': 400, 'error': 'No speech detected for 30s.'}"
I am using Watson's trial account. Is there a limitation for trial accounts, or is there a bug in the code below?
Python code:
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='XXX',
    password='XXX',
    x_watson_learning_opt_out=False
)

with open('trial.flac', 'rb') as audio_file:
    print(speech_to_text.recognize(
        audio_file,
        content_type='audio/flac',
        model='en-US_NarrowbandModel',
        timestamps=False,
        word_confidence=False,
        continuous=True
    ))
Appreciate any help!

Please see the implementation notes from the Speech to Text API Explorer for the recognize API you are attempting to use:
Implementation Notes
Sends audio and returns transcription results for
a sessionless recognition request. Returns only the final results; to
enable interim results, use session-based requests or the WebSocket
API. The service imposes a data size limit of 100 MB. It automatically
detects the endianness of the incoming audio and, for audio that
includes multiple channels, downmixes the audio to one-channel mono
during transcoding.
Streaming mode
For requests to transcribe live
audio as it becomes available or to transcribe multiple audio files
with multipart requests, you must set the Transfer-Encoding header to
chunked to use streaming mode. In streaming mode, the server closes
the connection (status code 408) if the service receives no data chunk
for 30 seconds and the service has no audio to transcribe for 30
seconds. The server also closes the connection (status code 400) if no
speech is detected for inactivity_timeout seconds of audio (not
processing time); use the inactivity_timeout parameter to change the
default of 30 seconds.
There are two factors here. First, there is a data size limit of 100 MB, so make sure you do not send files larger than that to the Speech to Text service. Second, the server will close the connection and return a 400 error if no speech is detected for the number of seconds defined by inactivity_timeout. The default value is 30 seconds, so this matches the error you are seeing above.
I would suggest you make sure there is valid speech in the first 30 seconds of your file and/or increase the inactivity_timeout parameter to see if the problem still exists. To make things easier, you can test the failing file and other sound files by using the API Explorer in a browser:
Speech to Text API Explorer
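For example, here is a minimal sketch of passing a longer timeout from Python (this assumes your version of the watson_developer_cloud SDK accepts inactivity_timeout as a keyword argument to recognize; check your SDK's signature):

with open('trial.flac', 'rb') as audio_file:
    result = speech_to_text.recognize(
        audio_file,
        content_type='audio/flac',
        model='en-US_NarrowbandModel',
        inactivity_timeout=60  # seconds of silence to tolerate; -1 disables the check
    )
    print(result)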

The API documentation includes Python code that keeps the server from closing the connection when the default 30 s timeout expires, and it handles other errors too.
It works like a try/except, with the extra step of defining the function as a method of a callback class.
def on_error(self, error):
    print('Error received: {}'.format(error))
Here is the link:
https://cloud.ibm.com/apidocs/speech-to-text?code=python
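For context, here is a sketch of how that method fits into the SDK's WebSocket callback class (assuming the current ibm_watson package, which exposes RecognizeCallback; the on_inactivity_timeout hook is from the same docs page):

from ibm_watson.websocket import RecognizeCallback

class MyRecognizeCallback(RecognizeCallback):
    def __init__(self):
        RecognizeCallback.__init__(self)

    def on_error(self, error):
        # Called when the service reports an error, such as the
        # "No speech detected for 30s" message above.
        print('Error received: {}'.format(error))

    def on_inactivity_timeout(self, error):
        # Called specifically when the inactivity timeout fires.
        print('Inactivity timeout: {}'.format(error))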

Related

Get the size of a single message in Google Cloud PubSub

I have a setup where I am publishing messages to Google Cloud PubSub service.
I wish to get the size of each individual message that I am publishing to PubSub. So for this, I identified the following approaches (Note: I am using the Python clients for publishing and subscribing, following a line-by-line implementation as presented in their documentation):
View the message count from the Google Cloud Console using the 'Monitoring' feature
Create a pull subscription client and view the size using message.size in the callback function for the messages that are being pulled from the requested topic.
Estimate the size of the messages before publishing by converting them to JSON as per the PubSub message schema and using sys.getsizeof()
For a sample message like the following, which I published using a Python publisher client:
{
    "data": 'Test_message',
    "attributes": {
        'dummyField1': 'dummyFieldValue1',
        'dummyField2': 'dummyFieldValue2'
    }
}
I get the size as 101 as the message.size output from the following callback function in the subscription client:
def callback(message):
    print(f"Received {message.data}.")
    if message.attributes:
        print("Attributes:")
        for key in message.attributes:
            value = message.attributes.get(key)
            print(f"{key}: {value}")
    print(message.size)
    message.ack()
Whereas the size displayed in Cloud Console Monitoring is around 79 B.
So these are my questions:
Why are the sizes different for the same message?
Is the output of message.size in bytes?
How do I view the size of a message before publishing using the python client?
How do I view the size of a single message in the Cloud Console, rather than an aggregated measure of size over a given timeframe, which is all I could find in the Monitoring section?
In order to further contribute to the community, I am summarising our discussion as an answer.
Regarding message.size, it is an attribute of a message in the subscriber client. In addition, according to the documentation, its definition is:
Returns the size of the underlying message, in bytes
Thus you would not be able to use it before publishing.
On the other hand, message_size is a metric in Google Cloud Metrics, and it is used by Cloud Monitoring, here.
Finally, the last topic discussed was that your aim is to monitor your quota expenditure so you can stay in the free tier. For this reason, the best option would be to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. Here are some links where you can find out more: Quota utilisation, Alert event based, Alert Policies.
Regarding your third question about viewing the message size before publishing, the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that the minimum of 1000 bytes is billable regardless of message size so if your messages may be smaller than 1000 bytes it’s important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a certain size but it is returned by the publish call as a future so you can see examples. This should allow you to get a pretty accurate estimate of message cost within the publisher client. Note that you can also use the monitoring client library to read Cloud Monitoring metrics from within the Python client.
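As an illustration of that arithmetic, here is a rough estimator (a sketch: the message_id allowance is an assumed placeholder, since its size is not guaranteed):

def estimate_billable_size(data: bytes, attributes: dict) -> int:
    # Message data + attribute keys and values + 20 bytes for the
    # timestamp + an allowance for the server-assigned message_id.
    MESSAGE_ID_ALLOWANCE = 16  # assumption: actual size is not guaranteed
    size = len(data)
    size += sum(len(k.encode('utf-8')) + len(v.encode('utf-8'))
                for k, v in attributes.items())
    size += 20 + MESSAGE_ID_ALLOWANCE
    return max(size, 1000)  # billing minimum is 1000 bytes per message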
Regarding your fourth question, there's no way to extract single data points from a distribution metric (unless you have published only one message during the query time period, in which case the mean would tell you the size of that one message).

How Do I Download 30,000 Images Using Google Drive API?

I need to download 30,000 images using the Google Drive API (I have all of their file_ids saved locally) so that I can upload them to AWS S3, but after only 20-30 image requests to the API, I get a 403 error, which means I'm exceeding the API quota (1,000 requests per 100 sec per user - not sure how I'm exceeding it, but that's beside the point). My code sleeps for 2 seconds between each request, and I still get this error. I need to download and upload these files in a reasonable amount of time. Any suggestions?
I am not sure which library you are using to make the request, but as per my understanding urlopen will raise an HTTPError for responses it can't handle, such as 403 (request forbidden).
Reference - List of Errors
403: ('Forbidden',
'Request forbidden -- authorization will not help').
Instead you can use urlretrieve().
Sharing a small code sample:
import urllib.request
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read() # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
Downloading images with the Drive API counts as one request per image, so you can easily surpass the quota limit.
Luckily there is a workaround - you can use batch requests, which let you combine up to 100 calls into a single HTTP request.
The documentation provides samples for implementation in Python.
By the way, you can check your quota usage in your GCP console.
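A minimal sketch of the batching pattern with the google-api-python-client library (this batches files().get metadata calls as an illustration; adapt it to your download flow per the documentation samples, and note that file_ids stands in for your locally saved IDs):

from googleapiclient.discovery import build

service = build('drive', 'v3')  # assumes credentials are already configured

def on_response(request_id, response, exception):
    # Called once per request in the batch.
    if exception is not None:
        print(f'{request_id} failed: {exception}')
    else:
        print(f"{request_id}: {response.get('name')}")

batch = service.new_batch_http_request(callback=on_response)
for file_id in file_ids[:100]:  # at most 100 calls per batch
    batch.add(service.files().get(fileId=file_id))
batch.execute()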

openshift 3 django - request too large

I migrated a Django app from OpenShift 2 to OpenShift 3 Online. It has an upload feature that allows users to upload audio files, which are usually larger than 50 MB. In OpenShift 3, if I try to upload a file, it only works for files up to around 12 MB. Anything larger leads to an error message in Firefox saying "connection canceled". Chromium gives more details:
Request Entity Too Large
The requested resource
/myApp/upload
does not allow request data with POST requests, or the amount of data provided in the request exceeds the capacity limit.
I'm using mod_wsgi-express. From searching for this error message on Google, I can see that I'm probably hitting a limit in the web server configuration. Which limit could that be, and how would I be able to change it?
As per help messages from running mod_wsgi-express start-server --help:
--limit-request-body NUMBER
The maximum number of bytes which are allowed in a
request body. Defaults to 10485760 (10MB).
Change your app.sh to add the option and set it to a larger value.
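For instance, the invocation in app.sh might end up looking like this (a sketch, not your exact script; 104857600 bytes raises the limit to 100 MB):

mod_wsgi-express start-server wsgi.py --limit-request-body 104857600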

Kinesis python client drops any message with "\x" escape?

I'm using boto3 (version 1.4.4) to talk to Amazon's Kinesis API:
import boto3
kinesis = boto3.client('kinesis')
# write a record with data '\x08' to the test stream
response = kinesis.put_record(StreamName='test', Data=b'\x08', PartitionKey='foobar')
print(response['ResponseMetadata']['HTTPStatusCode']) # 200
# now read from the test stream
shard_it = kinesis.get_shard_iterator(
    StreamName="test",
    ShardId='shardId-000000000000',
    ShardIteratorType="LATEST"
)["ShardIterator"]
response = kinesis.get_records(ShardIterator=shard_it, Limit=10)
print(response['ResponseMetadata']['HTTPStatusCode']) # 200
print(response['Records']) # []
When I test it with data that doesn't contain a \x escape, I'm able to get back the record as expected. Amazon boto3's doc says that "The data blob can be any type of data; for example, a segment from a log file, geographic/location data, website clickstream data, and so on.", so why is the message with \x-escaped characters dropped? Am I expected to call '\x08'.encode('string_escape') before sending the data to Kinesis?
If you are interested, I have characters like \x08 in the message data because I'm trying to write a serialized protocol buffer message to a Kinesis stream.
Okay, so I finally figured it out. The reason it wasn't working was that my botocore was on version 1.4.62. I only realized it because another script that ran fine on my colleague's machine was throwing exceptions on mine. We had the same boto3 version but different botocore versions. After pip install botocore==1.5.26, both the other script and my Kinesis put_record started working.
tl;dr: botocore 1.4.62 is horribly broken in many ways, so upgrade NOW. I can't believe how much of my life has been wasted by outdated, broken libraries. I wonder if Amazon's devs can unpublish broken versions of the client?
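As a quick sanity check that your environment is actually picking up the fixed versions (both packages expose __version__):

import boto3
import botocore

print(boto3.__version__)
print(botocore.__version__)  # should be 1.5.26 or newer per the fix above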

Why Google Speech Recognition API only return first 2-3 seconds converted text of audio

I created a project in the Google Cloud Console, enabled the Google Speech API in this project, and created credentials.
I also used the transcribe.py sample recommended by Google:
https://cloud.google.com/speech/docs/samples
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/speech
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py
I can use it with an API key generated by the Google Cloud Console to successfully transcribe an audio file (30 seconds) into text, but not fully - only the first 2-3 seconds. My account is a free trial, so I wonder whether it is because of my account type (free trial).
The response from Google looks like:
{"results": [{"alternatives": [{"confidence": 0.89569235, "transcript": "I've had a picnic in the forest and I'm going home so come on with me"}]}]}
The audio file is a wav file with this format (printed by ffprobe):
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
The audio file has been uploaded to Google Drive; the link is here:
https://drive.google.com/file/d/0B3koIsnLksOLQXhvQ1ljS0dDXzg/view?usp=sharing
Does anybody know what's wrong with the above process/steps, or is this a bug in the Google Speech Recognition API?
Using the Google APIs Explorer with the Cloud Speech API service, it was possible to isolate the following relevant speech recognition results by analyzing separate samples of your audio file:
Cut 1: 00'00"000 - 00'08"015, Result 9: "I've had a picnic in the forest and I'm going home so come on come with me"
Cut 2: 00'08"732 - 00'11"184, Result 2: "listen what's that"
Cut 3: 00'13"500 - till end, Result 2: "what is it look"
These results were obtained using the following Configuration:
"config": {
    "encoding": "LINEAR16",
    "sampleRate": 16000,
    "maxAlternatives": "30"
}
As a matter of fact, there are known issues with the Speech API, which is currently in Beta, that may prevent transcription from working correctly (regardless of whether the account is paid or free trial). As described in the following best practices, there are two issues to consider in your case:
Background music plays throughout the speech recording, which may create enough background noise to reduce transcription accuracy. (Note that the Speech API was designed to transcribe the speech of users dictating into an application's microphone.)
As advised further, it is recommended to split the audio when it is captured from more than one person. In your case, the frog's sound may be detected as a different human voice and so also impact transcription accuracy.
Considering these two known issues, it would be important to remove any noise and process only the uniform speech of your recording's protagonist. Alternatively, you can split the recording and try to transcribe each part containing the voice of a single character individually.
I had a similar issue, but using one of the enhanced models I was able to get the complete transcription:
config = {
    # ... other recognition settings ...
    "use_enhanced": True,
    "model": "phone_call"
}
You can read more at: https://cloud.google.com/speech-to-text/docs/phone-model
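For reference, a minimal sketch of an enhanced-model request with the google-cloud-speech client library (the filename and language code are placeholders; the field names follow the current library and differ from the older transcribe.py sample):

from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # placeholder; match your audio's language
    use_enhanced=True,
    model="phone_call",
)

with open("audio.wav", "rb") as f:  # placeholder filename
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)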
