Resource has been exhausted Google Cloud Speech - python

I am trying to transcribe a 45-minute audio file with Google Cloud Speech, but I keep getting
Resource has been exhausted (e.g. check quota)
I have the free credit that the API offers for the first year. I tried it in the API explorer and in Python, where I will be using it, but the result is the same. This is the request I send:
{
"audio": {
"uri": "gs://speech_summarization/mq.3gp"
},
"config": {
"encoding": "AMR",
"sampleRate": 8000
}
}
and response:
429
{
"error": {
"code": 429,
"message": "Resource has been exhausted (e.g. check quota).",
"status": "RESOURCE_EXHAUSTED"
}
}
I saw similar problems solved by cutting the file into shorter segments, but even with 10-minute segments it didn't work for me. Any ideas?

Files under 1 minute work because, in addition to an absolute quota on the audio you send to the API, there are specific limits on individual audio requests and streams:
https://cloud.google.com/speech/limits
I haven't had much luck finding a free version (even on trial) of transcription products like those offered by Nuance.
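For audio longer than about a minute, the synchronous endpoint generally won't accept the request and you have to use asynchronous (long-running) recognition against the GCS URI instead. That may or may not be what is tripping the quota here, but for reference, here is a minimal sketch with the google-cloud-speech Python client, reusing the encoding and sample rate from the question; the language_code and timeout are assumptions, and the field names follow the current client library rather than the older REST request shown above:

from google.cloud import speech

client = speech.SpeechClient()

# Same file and parameters as the request in the question.
audio = speech.RecognitionAudio(uri="gs://speech_summarization/mq.3gp")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.AMR,
    sample_rate_hertz=8000,
    language_code="en-US",  # assumption: not specified in the question
)

# long_running_recognize returns an operation; result() blocks until the
# transcript is ready, which can take a while for a 45-minute file.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)

for result in response.results:
    print(result.alternatives[0].transcript)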

Related

Python - Azure function service bus trigger batch processing

I am using an Azure Functions Service Bus trigger in Python to receive messages in batches from a Service Bus queue. Even though this process is not well documented for Python, I managed to enable batch processing by following the GitHub PR below.
https://github.com/Azure/azure-functions-python-library/pull/73
Here is the sample code I am using -
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "msg",
      "type": "serviceBusTrigger",
      "direction": "in",
      "cardinality": "many",
      "queueName": "<some queue name>",
      "dataType": "binary",
      "connection": "SERVICE_BUS_CONNECTION"
    }
  ]
}
__init__.py
import logging
import azure.functions as func
from typing import List

def main(msg: List[func.ServiceBusMessage]):
    message_length = len(msg)
    if message_length > 1:
        logging.warning('Handling multiple requests')
    for m in msg:
        # some call to an external web API
        pass
host.json
"version": "2.0",
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[3.3.0, 4.0.0)"
},
"extensions": {
"serviceBus": {
"prefetchCount": 100,
"messageHandlerOptions": {
"autoComplete": true,
"maxConcurrentCalls": 32,
"maxAutoRenewDuration": "00:05:00"
},
"batchOptions": {
"maxMessageCount": 100,
"operationTimeout": "00:01:00",
"autoComplete": true
}
}
}
}
After using this code, I can see that the Service Bus trigger is picking up messages in batches of 100 (or sometimes fewer) based on maxMessageCount, but I have also observed that most of the messages end up in the dead-letter queue with the MaxDeliveryCountExceeded reason code. I have tried different values of MaxDeliveryCount from 10 to 20, but the result was the same. So my questions are: do we need to adjust/optimize MaxDeliveryCount when batch processing Service Bus messages? How are the two related? And what configuration change can be made to avoid this dead-lettering?
From what we discussed in the comments, this is what you encounter:
Your function app is fetching 100 messages from ServiceBus (prefetchCount) and locking them for a maximum of maxAutoRenewDuration
Your function code is processing messages one at a time at a slow rate because of the API you call.
By the time you finish a batch of messages (maxMessageCount), the lock has already expired, which is why you get exceptions and the messages are redelivered. This eventually causes the MaxDeliveryCountExceeded errors.
What can you do to improve this?
Reduce maxMessageCount and prefetchCount
Increase maxAutoRenewDuration
Increase the performance of your API (how to do that would be a different question)
Your current code would be much better off using a "normal" single-message trigger instead of the batch trigger (a sketch follows below)
PS: Beware that your function app may scale horizontally if you are running in a consumption plan, further increasing the load on your struggling API.
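To illustrate that last suggestion, here is a minimal sketch of the single-message variant; the queue name and connection setting are the same placeholders as above, and call_external_api stands in for whatever the loop body in the question does. In function.json the only change is dropping "cardinality": "many" (or setting it to "one", which is the default), so only the Python handler is shown:

import logging
import azure.functions as func

def main(msg: func.ServiceBusMessage):
    # One invocation per message: the lock renewal applies to this message
    # only, so a slow downstream API call no longer lets a whole batch of
    # locks expire while it waits.
    body = msg.get_body().decode('utf-8')
    logging.info('Processing message %s', msg.message_id)
    # call_external_api(body)  # hypothetical placeholder for the web API call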

EventGrid-triggered Python Azure Function "ClientOtherError" and "AuthorizationError", how to troubleshoot?

For some reason, today my Python Azure Function is not firing.
Setup:
Trigger: Blob upload to storage account
Method: EventGrid
Auth: Uses System-assigned Managed Identity to auth to Storage Account
Advanced Filters:
Subject ends with .csv, .json
data.api contains "FlushWithClose"
Issue:
Upload a .csv file
No EventGrid triggered
New "ClientOtherError" and "AuthorizationError"s shown in logs
Question:
These are NEW errors and this is NEW behavior of an otherwise working Function. No changes have been recently made.
What do these errors mean?
How do I troubleshoot them?
The way I troubleshot the Function was to:
Remove ALL ADVANCED FILTERS from the EventGrid trigger
Attempt upload
Upload successful
Look at EventGrid message
The culprit (though unclear why ClientOtherError and AuthorizationError are generated here!) seems to be:
Files pushed to Azure Storage via Azure Data Factory use the FlushWithClose api.
These are the only ones I want to grab
Our automations all use ADF, and if you don't have the FlushWithClose filter in place, your Functions will run twice (ADF causes two events on the storage account, but only the FlushWithClose one is the actual blob write).
{
"id": "redact",
"data": {
"api": "FlushWithClose",
"requestId": "redact",
"eTag": "redact",
"contentType": "application/octet-stream",
"contentLength": 87731520,
"contentOffset": 0,
"blobType": "BlockBlob",
"blobUrl": "https://mything.blob.core.windows.net/mything/20201209/yep.csv",
"url": "https://mything.dfs.core.windows.net/mything/20201209/yep.csv",
"sequencer": "0000000000000000000000000000701b0000000000008177",
"identity": "redact",
"storageDiagnostics": {
"batchId": "redact"
}
},
"topic": "/subscriptions/redact/resourceGroups/redact/providers/Microsoft.Storage/storageAccounts/redact",
"subject": "/blobServices/default/containers/mything/blobs/20201209/yep.csv",
"event_type": "Microsoft.Storage.BlobCreated"
}
Files pushed to Azure Storage via Azure Storage Explorer (and via Azure Portal) use the PutBlob api.
{
"id": "redact",
"data": {
"api": "PutBlob",
"clientRequestId": "redact",
"requestId": "redact",
"eTag": "redact",
"contentType": "application/vnd.ms-excel",
"contentLength": 1889042,
"blobType": "BlockBlob",
"blobUrl": "https://mything.blob.core.windows.net/thing/yep.csv",
"url": "https://mything.blob.core.windows.net/thing/yep.csv",
"sequencer": "0000000000000000000000000000761d0000000000000b6e",
"storageDiagnostics": {
"batchId": "redact"
}
},
"topic": "/subscriptions/redact/resourceGroups/redact/providers/Microsoft.Storage/storageAccounts/redact",
"subject": "/blobServices/default/containers/thing/blobs/yep.csv",
"event_type": "Microsoft.Storage.BlobCreated"
}
I was testing locally with Azure Storage Explorer instead of using our ADF automations,
so the advanced filter on data.api never matched and the Event Grid trigger never fired.
Ok... but what about the errors?
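For what it's worth, the same data.api check can also be done inside the function body instead of (or in addition to) the advanced filter. A rough sketch, assuming a standard EventGrid-triggered Python function (the handler name and logging are illustrative only):

import logging
import azure.functions as func

def main(event: func.EventGridEvent):
    data = event.get_json()  # the "data" payload of the Event Grid event
    # Only handle the final FlushWithClose write that ADF produces; PutBlob
    # uploads from Storage Explorer or the portal are ignored here.
    if data.get('api') != 'FlushWithClose':
        logging.info('Skipping event with api=%s', data.get('api'))
        return
    logging.info('Processing blob %s', data.get('url'))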

How to read chrome console using python without selenium?

I would like to read Chrome's JS console using Python 3 without any WebDriver such as Selenium (because of bot detection and the like).
I've tried Chrome DevTools Protocol Python libraries such as chromewhip, pychrome and PyChromeDevTools, but I'm unable to read any data from the console.
I want to read Runtime.consoleAPICalled or Log.entryAdded, but I don't know how to implement these callbacks, as the documentation for these libraries doesn't cover them and there are no examples to be found either.
Does anyone know how to properly access these events or some other library which provides it?
#kundapanda could you at least post a snippet of the code that worked for you.
I want to read Runtime.consoleAPICalled or Log.entryAdded, but I don't
know how to implement these callbacks
The following assumes (per your question phrasing) that you're able to send and receive debug protocol messages on the stream that's open to the web debugger endpoint.
After you send the Runtime.enable and Log.enable protocol messages, the Runtime.consoleAPICalled and Log.entryAdded "events" you are looking for arrive as messages on the same debugging channel.
You may need to match the console event messages with the execution context (seen in the Runtime.enable response) by examining the executionContextId field in the received event messages. The log events are not associated with any single execution context. All of these "event" messages will have Id=0, which helps to recognize they're "event" messages and not response messages.
Here are a couple of sample messages received from Chrome (formatted as JSON with arbitrary field order):
Console API event message:
{
  "method": "Runtime.consoleAPICalled",
  "params": {
    "type": "warning",
    "args": [
      {
        "type": "string",
        "value": "Google Maps JavaScript API warning: NoApiKeys https://developers.google.com/maps/documentation/javascript/error-messages#no-api-keys"
      }
    ],
    "executionContextId": 1,
    "timestamp": 1618949706735.553,
    "stackTrace": {
      "callFrames": [
        {
          "functionName": "TA.j",
          "scriptId": "206",
          "url": "https://maps.googleapis.com/maps-api-v3/api/js/44/10/util.js",
          "lineNumber": 228,
          "columnNumber": 26
        }
      ]
    }
  },
  "id": 0
}
Log event message:
{
  "method": "Log.entryAdded",
  "params": {
    "entry": {
      "source": "javascript",
      "level": "warning",
      "text": "The deviceorientation events are blocked by permissions policy. See https://github.com/w3c/webappsec-permissions-policy/blob/master/features.md#sensor-features",
      "timestamp": 1.6189509536801208e+12
    }
  },
  "id": 0
}
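To make that concrete, here is a minimal sketch that talks to the debugger endpoint directly with the third-party websockets package (none of the libraries mentioned above). It assumes Chrome was started with --remote-debugging-port=9222 and that the page's webSocketDebuggerUrl has already been read from http://localhost:9222/json:

import asyncio
import json
import websockets

async def read_console(ws_url):
    async with websockets.connect(ws_url) as ws:
        # Enable the domains so Chrome starts emitting the events.
        await ws.send(json.dumps({"id": 1, "method": "Runtime.enable"}))
        await ws.send(json.dumps({"id": 2, "method": "Log.enable"}))
        while True:
            msg = json.loads(await ws.recv())
            # Event messages carry a "method"; responses to our two requests
            # carry the matching "id" instead.
            if msg.get("method") == "Runtime.consoleAPICalled":
                args = msg["params"]["args"]
                print("console:", [a.get("value") for a in args])
            elif msg.get("method") == "Log.entryAdded":
                print("log:", msg["params"]["entry"]["text"])

# asyncio.run(read_console("ws://localhost:9222/devtools/page/<target id>"))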

Facebook Graph API - Service temporarily unavailable OAuthException

I am attempting to scrape the statuses of a public Facebook place via the version 2.11 posts endpoint. The generated URL I am calling is as follows:
https://graph.facebook.com/v2.11/41585566807/posts?access_token=XXX&fields=message%2Clink%2Ccreated_time%2Ctype%2Cname%2Cid%2Ccomments.limit%280%29.summary%28true%29%2Cshares%2Creactions.limit%280%29.summary%28true%29&limit=100&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5QTBNVFU0TlRVMk5qZA3dOem90TmprM05qWTFOREEwTmprMU1qUTVOREEwTXc4TVlYQnBYM04wYjNKNVgybGtEeDAwTVRVNE5UVTJOamd3TjE4eE1ERTFNRFF6TmpjME5UZAzNOamd3T0E4RWRHbHRaUVpPcUpBRUFRPT0ZD
I am paging through chunks of 100 statuses to return them all. This method works great for the first 2600 statuses on the page, but once I try to get the next 100, I get the following error message.
{
"error": {
"message": "(#2) Service temporarily unavailable",
"type": "OAuthException",
"is_transient": true,
"code": 2,
"fbtrace_id": "EezWeUvcCXd"
}
}
Even though is_transient is true, the error persists through hours of repeated calls to the API (every 5 seconds). Any idea why this error occurs and how to avoid it?
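For reference, the paging loop being described looks roughly like this; a sketch using the requests package, where the access token is a placeholder and the page ID and field list are taken from the URL above:

import time
import requests

ACCESS_TOKEN = 'XXX'  # placeholder
PAGE_ID = '41585566807'
url = f'https://graph.facebook.com/v2.11/{PAGE_ID}/posts'
params = {
    'access_token': ACCESS_TOKEN,
    'fields': ('message,link,created_time,type,name,id,'
               'comments.limit(0).summary(true),shares,'
               'reactions.limit(0).summary(true)'),
    'limit': 100,
}

posts = []
while True:
    resp = requests.get(url, params=params).json()
    if 'error' in resp:
        print(resp['error'])  # this is where the (#2) error shows up
        break
    posts.extend(resp.get('data', []))
    after = resp.get('paging', {}).get('cursors', {}).get('after')
    if not after:
        break
    params['after'] = after
    time.sleep(5)  # the 5-second spacing mentioned in the question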

Google Fusion Tables POST requests return 400 Bad Request using Python

I am trying to create a Python script that allows me both to read a Google Fusion Table and to import rows into it. I have been working with the sample code found at https://developers.google.com/fusiontables/docs/samples/python to try to develop an understanding of how to work with Fusion Tables.
My problem is that the samples there that involve making POST requests all result in a "400 Bad Request" being returned. GET requests work just fine, though. As far as I can tell the requests follow the API style guide. Why would this be failing?
EDIT:
Specifically the request returns:
400 Bad Request
{
"error": {
"errors": [
{
"domain": "global",
"reason": "parseError",
"message": "Parse Error"
}
],
"code": 400,
"message": "Parse Error"
}
}
