Flask Audio File to Wave Object Python - python

I want to convert an audio file received from a Flask API (of type werkzeug.datastructures.FileStorage) to a Wave (https://pypi.org/project/Wave/) object. Usually, you do this by supplying a path on your computer:
import wave
wav = wave.open("test.wav", "r")
But this doesn't work as I do not want to save the audio file to my computer. This is how I get the audio file in my flask script:
audio = request.files["audio"]
Please let me know what I can do! Thanks.

You can try the following modification of your code:
audio = request.files['audio_file']
request.files is a dictionary-like object keyed by the name of the form field. If the client uploads the file under the field name 'audio_file', that is the key to use instead of 'audio'.

You can use the save() function:
audio = request.files["audio"]
path = './videos/sample.wav'
audio.save(path)
For further details, see:
https://werkzeug.palletsprojects.com/en/2.0.x/datastructures/#werkzeug.datastructures.FileStorage.save
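Since the question asks to avoid writing to disk, here is a minimal sketch of the in-memory route. It relies on the fact that the standard library's wave.open() accepts a binary file-like object as well as a path. The helper names read_wav_info and make_silent_wav are made up for illustration, and the BytesIO buffer only stands in for the upload; in a Flask view the file-like object would presumably be request.files["audio"].stream, which I have not tested here.

```python
import io
import wave


def read_wav_info(file_like):
    """Open a WAV from any binary file-like object and return basic info."""
    with wave.open(file_like, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "frame_rate": wav.getframerate(),
            "n_frames": wav.getnframes(),
        }


def make_silent_wav(n_frames=800, frame_rate=8000):
    """Build a small mono 16-bit WAV in memory (a stand-in for an upload)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    buffer.seek(0)  # rewind so the reader starts at the RIFF header
    return buffer


# Assumption: in a Flask view this would be
# read_wav_info(request.files["audio"].stream)
info = read_wav_info(make_silent_wav())
print(info)
```

If the upload is not actually a WAV file, wave.open() raises wave.Error, so a real view would probably want to catch that and return a 400 response.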

Related

How can I convert a .wav to .mp3 in-memory?

I have a numpy array from a some.npy file that contains data of an audio file that is encoded in the .wav format.
The some.npy was created with sig = librosa.load(some_wav_file, sr=22050) and np.save('some.npy', sig).
I want to convert this numpy array as if its content was encoded with .mp3 instead.
Unfortunately, I am restricted to the use of in-memory file objects for two reasons.
I have many .npy files. They are cached in advance and it would be highly inefficient to have that much "real" I/O when actually running the application.
Conflicting access rights of people who are executing the application on a server.
First, I was looking for a way to convert the data in the numpy array directly, but there seems to be no library function. So is there a simple way to achieve this with in-memory file objects?
NOTE: I found this question How to convert MP3 to WAV in Python and its solution could be theoretically adapted but this is not in-memory.
You can read and write memory using BytesIO, like this:
import io
# Create "in-memory" buffer
memoryBuff = io.BytesIO()
And you can read and write MP3 using pydub module:
from pydub import AudioSegment
# Read a file in
sound = AudioSegment.from_wav('stereo_file.wav')
# Write to memory buffer as MP3
sound.export(memoryBuff, format='mp3')
Your MP3 data is now available at memoryBuff.getvalue()
You can convert between AudioSegments and Numpy arrays using this answer.
I finally found a working solution. This is what I wanted.
import io
import numpy as np
from pydub import AudioSegment
wav = np.load('some.npy')
with io.BytesIO() as inmemoryfile:
    compression_format = 'mp3'
    n_channels = 2 if wav.shape[0] == 2 else 1  # stereo and mono files
    # my_sample_rate is the rate the array was loaded with (presumably 22050, as in the question)
    AudioSegment(wav.tobytes(), frame_rate=my_sample_rate, sample_width=wav.dtype.itemsize,
                 channels=n_channels).export(inmemoryfile, format=compression_format)
    wav = np.array(AudioSegment.from_file_using_temporary_files(inmemoryfile)
                   .get_array_of_samples())
There exists a wrapper package (audiosegment) with which one could convert the last line to:
wav = audiosegment.AudioSegment.to_numpy_array(AudioSegment.from_file_using_temporary_files(inmemoryfile))

'Audio data must be audio data' error with google speech recognition in python

I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert between one data type and another by creating a new object and repacking the data.
I don't understand how it's possible to convert from one data type to another in Python.
The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object so that it can be used in Google speech recognition?
@Mich, I hope you have found a solution by now. If not, please try the below.
First, convert the .mp3 format to .wav format using other methods as a pre-process step.
import speech_recognition as sr
# Create an instance of the Recognizer class
recognizer = sr.Recognizer()
# Create audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)
# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
type(audiodata)
# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum loudness threshold for the audio with the command below.
recognizer.energy_threshold = 300  # minimum energy threshold set to 300
Librosa returns a NumPy array; you need to convert it back to WAV data. Something like this:
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa, because librosa does strange things with the audio (normalization and, by default, resampling to 22050 Hz and downmixing to mono). It's better to work with the raw data.
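To make the "convert it back to wav" step concrete, here is a rough sketch that uses only the standard library. It applies the same peak scaling as the one-liner above and then wraps the 16-bit samples in a WAV container held in memory. The function name floats_to_wav_bytes is made up for illustration, and a plain list of floats stands in for the librosa array (with NumPy you could pass audio.tolist()).

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate):
    """Encode mono float samples as a 16-bit PCM WAV held in memory."""
    # Scale by the peak, as in the one-liner above; "or 1.0" avoids dividing by zero
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    pcm = [int(s / peak * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        # "<h" = little-endian signed 16-bit, which is what PCM WAV expects
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buffer.getvalue()


wav_bytes = floats_to_wav_bytes([0.0, 0.5, -0.5, 0.25], sample_rate=22050)
print(len(wav_bytes), wav_bytes[:4])
```

As far as I know, speech_recognition's AudioFile also accepts a file-like object, so io.BytesIO(wav_bytes) could probably be passed to it in place of a path, but treat that as an assumption to verify.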
Try this with the speech recognizer:
import speech_recognition as spr
r = spr.Recognizer()
# AudioFile reads WAV, AIFF, or FLAC, so convert the mp3 first
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
r.recognize_google(audio)

Google Cloud Speech-to-Text API - Waiting infinitely

I'm trying to use the Google Cloud Speech-to-Text API.
I converted an mp3 audio file to .raw, as I understood from the API documentation, and uploaded it to a storage bucket.
Here is my code:
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result()
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
transcribe_gcs("gs://cloudh3-200314.appspot.com/cs.raw")
What am I doing wrong?
I faced a similar issue; it has to do with the audio format the API accepts. Even though you converted the file to RAW, the format may still be wrong, and the API gives no output if it can't read the file. Note that the config in the question declares FLAC encoding while the uploaded file is .raw.
I recently processed a 56-minute audio file that took 17 minutes, which should give you an idea of how long to expect.
Process your file with sox. These are the conversion parameters that worked for me:
sox basefile.mp3 -r 16000 -c 1 newfile.flac

Record RTSP stream to file (.wav)

I'm trying to save X seconds from an audio stream to a file. I have an RTSP server, and I wrote a simple Python script to record several seconds from this server to a file (.wav).
import time
import vlc
def main():
    ########################### MAIN INIT ###########################
    instance = vlc.Instance("-vvv", "--no-video", "--clock-jitter=0", "--sout-audio", "--sout",
                            "#transcode{acodec=s16l,channels=2}:std{access=file,mux=wav,dst=test.wav}")
    # Create a MediaPlayer with the default instance
    player = instance.media_player_new()
    # Load the media file
    media = instance.media_new("rtsp://XXX.XX.XXX.XX:YYYY/")
    # Add the media to the player
    player.set_media(media)
    # Play for 10 seconds then exit
    player.play()
    time.sleep(10)
if __name__ == '__main__':
    main()
But when I run the script, it creates the file "test.wav" as a plain text file instead of the WAV file I expect.
The log shows the following:
[00000000022aec08] core input error: ES_OUT_RESET_PCR called
[00007f6704040518] core decoder error: cannot continue streaming due to errors
I would really appreciate it if someone could help me.
Thanks so much.
WAV files are structured with different fields representing different information, as you probably know; see an example at this link (https://github.com/kushalpandya/WavStagno).
It sounds like your output is not formatted correctly. There are tools available to inspect a WAV file, which would be a good place to start, or if you are able to share a link to the file here, people can take a look.
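As a quick first check before reaching for a dedicated tool, you could look at the first 12 bytes of the file yourself. A valid WAV file starts with the ASCII tag "RIFF", a 4-byte little-endian size, and the tag "WAVE". The sketch below is a minimal example of that check; the function name describe_wav_header and the sample byte strings are made up for illustration.

```python
import struct


def describe_wav_header(data):
    """Parse the first 12 bytes of a RIFF/WAVE file; return None if the magic is wrong."""
    if len(data) < 12:
        return None
    # 4-byte tag, little-endian unsigned 32-bit size, 4-byte format tag
    riff, size, wave_id = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or wave_id != b"WAVE":
        return None
    return {"riff_size": size}


# Hypothetical inputs: a well-formed header and a file that is really text
good = b"RIFF" + struct.pack("<I", 36) + b"WAVE" + b"fmt "
bad = b"[00000000022aec08] core input error"
print(describe_wav_header(good))
print(describe_wav_header(bad))
```

For the file from the question, you would read the bytes with open("test.wav", "rb").read(12) and pass them to the function; a None result confirms that the file is not a WAV container at all.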
If what you are trying to do is listen to the stream and save it at the same time, then you likely want to use the duplicate functionality. There is a good example here (albeit video-based): https://stackoverflow.com/a/16758988/334402

pydub accessing the sampling rate(Hz) and the audio signal from an mp3 file

I just found the interesting Python package pydub, which converts any audio file to mp3, wav, etc.
As far as I can tell from its documentation, the process is as follows:
read the mp3 audio file using from_mp3()
create a wav file using export().
I'm curious whether there is a way to access the sampling rate and the audio signal (as a 1-dimensional array, supposing it is mono) directly from the mp3 file without converting it to a wav file. I am working with thousands of audio files, and it might be expensive to convert all of them to wav files.
If you aren't interested in the actual audio content of the file, you may be able to use pydub.utils.mediainfo():
>>> from pydub.utils import mediainfo
>>> info = mediainfo("/path/to/file.mp3")
>>> print(info['sample_rate'])
44100
>>> print(info['channels'])
1
This uses libav's avprobe utility (or ffmpeg's ffprobe) and returns all kinds of info. I suggest giving it a try :)
It should be much faster than opening each mp3 using AudioSegment.from_mp3(…)
frame_rate is the sample rate, so you can get it like this:
from pydub import AudioSegment
filename = "hoge.wav"
myaudio = AudioSegment.from_file(filename)
print(myaudio.frame_rate)
