Google Cloud Speech-to-Text API - waiting indefinitely - Python

I'm trying to use the Google Cloud Speech-to-Text API.
I converted an MP3 audio file to .raw format, as I understood from the API documentation, and uploaded it to a Cloud Storage bucket.
Here is my code:
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result()
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

transcribe_gcs("gs://cloudh3-200314.appspot.com/cs.raw")
What am I doing wrong?

I faced a similar issue; it comes down to what formats the API accepts. Even though you may have converted to RAW, there could still be something wrong with the format, and the API gives no output if it can't read the file.
I recently processed a 56-minute audio file and it took 17 minutes, so that should give you an idea of how long it can take.
Process your file using sox; I found the conversion parameters that work with the command:
sox basefile.mp3 -r 16000 -c 1 newfile.flac
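Once you have newfile.flac, a minimal sketch of a matching call (the URI is a placeholder; the key points are that the encoding and sample rate must match the actual file, and a timeout prevents waiting forever):
from google.cloud import speech
from google.cloud.speech import enums, types

client = speech.SpeechClient()
# placeholder URI; upload newfile.flac to your own bucket first
audio = types.RecognitionAudio(uri='gs://your-bucket/newfile.flac')
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,  # matches the sox output
    sample_rate_hertz=16000,                              # matches -r 16000
    language_code='en-US')
operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=1800)  # raise instead of hanging indefinitely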

Related

How to convert video on S3 using Python without downloading it?

I have a bunch of videos in my S3 bucket and I want to convert their format using Python, but currently I'm stuck on one issue. My Python script for fetching all objects in the bucket is below.
import boto3

s3 = boto3.client('s3',
                  region_name=S3_REGION,
                  aws_access_key_id=S3_ACCESS_KEY_ID,
                  aws_secret_access_key=S3_ACCESS_SECRET_KEY)
result = s3.list_objects(Bucket=bucket_name, Prefix='videos/')
for o in result.get('Contents'):
    data = s3.get_object(Bucket=bucket_name, Key=o.get('Key'))
For the format conversion I have used the MoviePy library, which converts the video to MP4:
import moviepy.editor as moviepy
clip = moviepy.VideoFileClip("video-529.webm")
clip.write_videofile("converted-recording.mp4")
But the problem with this library is that it needs an actual file on disk; you cannot pass an S3 object as a file. I don't know how to overcome this. How can I resolve it?
You are correct. Libraries require the video file to be on the 'local disk', so you should use download_file() instead of get_object().
Alternatively, you could use Amazon Elastic Transcoder to transcode the file 'as a service' rather than doing it in your own code. (Charges apply, based on video length.)
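A minimal sketch of the download_file() approach (bucket_name and the 'converted/' output prefix are placeholders):
import os
import boto3
import moviepy.editor as moviepy

s3 = boto3.client('s3')
result = s3.list_objects(Bucket=bucket_name, Prefix='videos/')
for o in result.get('Contents', []):
    key = o['Key']
    local_in = os.path.basename(key)                # e.g. video-529.webm
    local_out = local_in.rsplit('.', 1)[0] + '.mp4'
    s3.download_file(bucket_name, key, local_in)    # fetch to local disk first
    clip = moviepy.VideoFileClip(local_in)
    clip.write_videofile(local_out)
    s3.upload_file(local_out, bucket_name, 'converted/' + local_out)  # put the result back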

Efficient upload of large amount of images to Azure Storage in Python

I need to find the optimal way to upload a large number of images (up to a few thousand) of size ~6MB per image on average. Our service is written in Python.
We have the following flow:
There is a service with a single BlobServiceClient instance; we authenticate using CertificateCredentials.
The service runs in a container on Linux and is written in Python.
The service receives a message containing 6 to 9 images as NumPy ndarrays, plus a JSON metadata object for each.
Every time we get a message, we send all the image files plus the JSON files to storage using a ThreadPoolExecutor with max_threads = 20.
We are NOT using the async version of the library.
Trimmed and simplified, the code looks like this (the snippet below will not run, it is just an illustration; azurestorageclient is our wrapper around the Azure Python SDK, holding the single BlobServiceClient instance that we use to create containers and upload blobs):
def _upload_file(self,
                 blob_name: str,
                 data: bytes,
                 blob_type: BlobType,
                 length=None):
    blob_client = self._upload_container.get_blob_client(blob_name)
    return blob_client.upload_blob(data, length=len(data), blob_type=BlobType.BlockBlob)

def _upload(self, executor: ThreadPoolExecutor, storage_client: AzureStorageClient,
            image: ndarray, metadata: str) -> (Future, Future):
    DEFAULT_LOGGER.info(f"Uploading image blob: {img_blob_name} ...")
    img_upload_future = executor.submit(
        self.upload_file,
        blob_name=img_blob_name, byte_array=image.tobytes(),
        content_type="image/jpeg",
        overwrite=True,
    )
    DEFAULT_LOGGER.info(f"Uploading JSON blob: {metadata_blob_name} ...")
    metadata_upload_future = executor.submit(
        self.upload_file,
        blob_name=metadata_blob_name, byte_array=metadata_json_bytes,
        content_type="application/json",
        overwrite=True,
    )
    return img_upload_future, metadata_upload_future

def send(storage_client: AzureStorageClient,
         image_data: Dict[metadata, ndarray]):
    with ThreadPoolExecutor(max_workers=_THREAD_SEND_MAX_WORKERS) as executor:
        upload_futures = {
            image_metadata: _upload(
                executor=executor,
                storage_client=storage_client,
                image=image,
                metadata=metadata
            )
            for metadata, image in image_data.items()
        }
We observe very poor performance from this service when uploading files over a slow network with big signal-strength fluctuations.
We are now trying to find and measure different options to improve performance:
Store files on HDD first and then upload them in bigger batches from time to time.
We think uploading a single big file should perform better (e.g. 100 files in a zip/tar archive).
We think reducing the number of parallel jobs when the connection is bad should also help.
We are considering using AzCopy instead of Python.
Does anyone have other suggestions or nice Python code samples for working in such scenarios? Or should we perhaps change the service used to upload the data? For example, use SSH to connect to a VM and upload the files that way (I doubt it would be faster, but I have received such suggestions).
Mike
Given your situation, I suggest you zip some files into one big file and upload that big file in chunks. To upload a file in chunks, you can use the methods BlobClient.stage_block and BlobClient.commit_block_list.
For example:
import uuid
from azure.storage.blob import BlobBlock

block_list = []
chunk_size = 1024  # bytes per staged block
with open('csvfile.csv', 'rb') as f:
    while True:
        read_data = f.read(chunk_size)
        if not read_data:
            break  # done
        blk_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=blk_id, data=read_data)
        block_list.append(BlobBlock(block_id=blk_id))
blob_client.commit_block_list(block_list)
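Here blob_client is assumed to already exist; a sketch of creating it (connection string, container, and blob names are placeholders):
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(conn_str)  # placeholder credentials
blob_client = service.get_blob_client(container='mycontainer', blob='bigfile.zip')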

'Audio data must be audio data' error with Google speech recognition in Python

I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't expose data types, classes, or direct memory access that would let you convert between one data type and another by creating a new object and repacking the data.
I don't understand how it's possible to convert from one data type to another in Python.
The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object to be usable with Google speech recognition?
@Mich, I hope you have found a solution by now. If not, please try the below.
First, convert the .mp3 file to .wav format using other methods as a pre-processing step.
import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create an audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)

# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
type(audiodata)

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum energy threshold (loudness) for the audio as below.
recognizer.energy_threshold = 300  # minimum audio energy to consider for recording
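For the mp3-to-wav pre-processing step mentioned above, one option (a sketch assuming pydub with ffmpeg installed) is:
from pydub import AudioSegment

# requires ffmpeg to be available on the system path
sound = AudioSegment.from_mp3('sample_data/metal.mp3')
sound.export('sample_data/metal.wav', format='wav')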
Librosa returns a NumPy array; you need to convert it back to wav bytes. Something like this:
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
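To feed those bytes to the recognizer without writing a wav file, one sketch (assuming speech_recognition's AudioData class; the 16 kHz rate is a choice, not a requirement) is:
import numpy as np
import librosa
import speech_recognition as sr

audio, rate = librosa.load('sample_data/metal.mp3', sr=16000, mono=True)
# scale floats in [-1, 1] to 16-bit PCM and take the raw bytes
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()

r = sr.Recognizer()
audio_data = sr.AudioData(raw_audio, rate, 2)  # sample width of 2 bytes for int16
print(r.recognize_google(audio_data))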
You are probably better off loading the mp3 with an ffmpeg wrapper, without librosa; librosa does strange things with the audio (normalizes it, etc.). It is better to work with the raw data.
Try this with the speech recognizer (note the Recognizer must be created before use, and the source should be a wav file):
import speech_recognition as spr

r = spr.Recognizer()
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
r.recognize_google(audio)

Flask Audio File to Wave Object Python

I want to convert an audio file received from a Flask API (of type class 'werkzeug.datastructures.FileStorage') to a Wave (https://pypi.org/project/Wave/) object. Usually you do this by supplying a path on your computer:
import wave
wav = wave.open("test.wav", "r")
But this doesn't work here, as I do not want to save the audio file to my computer. This is how I get the audio file in my Flask script:
audio = request.files["audio"]
Please let me know what I can do! Thanks.
You can try the following modification of your code:
audio = request.files['audio_file']
request.files is a dictionary. The dictionary key that will allow you to retrieve the audio file is 'audio_file' instead of 'audio'.
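If the goal is to avoid writing to disk at all, the standard-library wave module also accepts a file-like object, so a sketch like this may work (the form field name is whatever your client actually sends):
import io
import wave
from flask import request

audio = request.files['audio_file']              # FileStorage
wav = wave.open(io.BytesIO(audio.read()), 'rb')  # in-memory, nothing saved to disk
print(wav.getnchannels(), wav.getframerate())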
You can use the save() function:
audio = request.files["audio"]
path = './videos/sample.wav'
audio.save(path)
Check the following for further details:
https://werkzeug.palletsprojects.com/en/2.0.x/datastructures/#werkzeug.datastructures.FileStorage.save

Silence/Pauses in audio file leads to Google Speech to Text transcription ending early

I am using the Google Speech-to-Text API to convert FLAC audio files with the synchronous Recognize request in Python 3. However, when the audio file contains short pauses or silences, the transcription ends early and the response does not capture the text after the pause/silence.
audiofile = self.convert_mp3_to_flac(audiofile)
with io.open(audiofile, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=24000,
    language_code='en-US',
    enable_automatic_punctuation=True)
response = self.client.recognize(config, audio)
The response object does not contain the transcription for speech after silences in the audio file.
I expect to see the entire transcription, since I am making the request with the entire audio file.
Is the source audio produced with a noise-cancelling mic? One workaround is to add some white noise to the audio.
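A sketch of that workaround (assuming the soundfile and numpy packages; the file name is a placeholder and the noise amplitude is a guess you would need to tune):
import numpy as np
import soundfile as sf

data, rate = sf.read('input.flac')               # placeholder file name
noise = np.random.normal(0, 0.001, data.shape)   # low-amplitude white noise
sf.write('with_noise.flac', data + noise, rate)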
