I am trying to convert an audio file to text. I have seen many posts on it, but nothing is working for me. Could you please suggest a Pythonic way to convert speech to text? Thank you.
That's a pretty big question, but my best suggestion is to go with something like https://pypi.python.org/pypi/pygsr
from pygsr import Pygsr

speech = Pygsr()
speech.record(3)  # duration in seconds (3)
phrase, complete_response = speech.speech_to_text('en_US')  # select the language
print(phrase)
Edit 1: If it's an audio file, you could open a Google Voice account, call yourself, and play the audio file to leave it as a voicemail. Google converts all your messages to text, so it's essentially a free transcription service.
I'm trying to create a video creation bot using Python, pyttsx3, and moviepy.
I'm having some trouble with forced alignment (matching the audio with the text of the audio shown on the video). I would like to do the forced alignment during text-to-speech generation. I thought a possible solution could be to split the input string and generate several audio files. But I'd rather find a way to know what the length of the generated audio is up to a certain word/position.
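For what it's worth, here's a rough sketch of the splitting idea I mentioned, assuming pyttsx3's save_to_file writes a WAV file (it does with most drivers) and using the standard wave module to measure each clip's duration; synth_segments is just an illustrative helper name:

import wave
import pyttsx3

def synth_segments(segments, prefix="clip"):
    # Generate one clip per text segment and measure its duration,
    # so each piece of on-screen text can be timed to its audio.
    engine = pyttsx3.init()
    durations = []
    for i, text in enumerate(segments):
        path = f"{prefix}_{i}.wav"
        engine.save_to_file(text, path)
        engine.runAndWait()
        with wave.open(path, "rb") as w:
            durations.append(w.getnframes() / float(w.getframerate()))
    return durations

print(synth_segments(["Hello there.", "This is the second line."]))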
Thank you ;)
I'm using the Google AIY Voice Kit (v2) with a Raspberry Pi Zero to build a voice-controlled robot. It's working great! But I have an elementary question. While the robot is processing user speech (and deciding how to respond), I want to play a short sound file to indicate the robot is "thinking." The sound file currently plays too loud. How do I set the playback volume of a sound file in Python?
Here's a snippet of code:
aiy.voice.audio.play_wav_async("think.wav")
This plays successfully, but I can't figure out how to set the volume the way I can in the text-to-speech function aiy.voice.tts.say(sentence, lang='en-GB', volume=10, pitch=120, speed=90, device='default').
Many thanks for any suggestions!
This is a very hacky way to look at the problem, but after reading the AIY documentation and seeing that it straight-up reads the bytes of the file pointer, with no option to set the volume (or anything else), I think a hack might be the best route.
Let's take your input file, modify it, then save it back as a temporary file to be read by AIY.
We can do something like the following:
from pydub import AudioSegment

# Put this constant somewhere higher up so you can edit it the way you
# would the volume level in
# aiy.voice.tts.say(sentence, lang='en-GB', volume=10, pitch=120, speed=90, device='default')
# (or just replace the later reference with the int and modify that line directly instead)
HOW_MUCH_QUIETER = 10

song = AudioSegment.from_wav("think.wav")
# reduce the volume by HOW_MUCH_QUIETER dB (pydub's - operator works in dB)
song_10_db_quieter = song - HOW_MUCH_QUIETER
# save the output (export the quieter segment, not the original)
song_10_db_quieter.export("quieter.wav", format="wav")

aiy.voice.audio.play_wav_async("quieter.wav")
To save a lot of work, I'm trying to generate a sound file of a script for a play that includes several different voices. These voices should be computer-generated. I have software (NaturalReader 13) for generating them. Since I don't want the entire play read in one voice, I can't just upload the whole text into NaturalReader and export it.
I could export several voice files and then mix them into a coherent whole, but this takes a long time and a lot of patience. I've already tried this, using Audacity to mix. Instead, I want to automate it by interfacing with the program from Python, but I have no idea how to do this.
There are free voice generators online that would work for this task, but they are much lower quality. If interfacing with NaturalReader is too complex, getting the data from the web might be easier.
So basically, the script I have is of the form:
Character: These are the lines that need to be read...
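Parsing the script into (character, line) pairs seems like the easy part; here's a rough sketch (parse_script is just an illustrative helper, assuming one "Name: line" cue per line of the file):

import re

CUE = re.compile(r"^(?P<character>[^:]+):\s*(?P<line>.+)$")

def parse_script(path):
    # Collect (character, line) pairs, one cue per input line.
    cues = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            m = CUE.match(raw.strip())
            if m:
                cues.append((m.group("character"), m.group("line")))
    return cues

The part I don't know how to do is feeding each pair to NaturalReader with the right voice and collecting the clips in order.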
Any ideas on how to approach this?
I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like .WAV and .MP3 without going insane. How might I be able to achieve this in Python without using dependencies that don't come with a standard install?
I have read a great deal of files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D converters. Where I get lost is somewhere around the relationship between raw byte input and the kbps of the encoded output, and all that stuff.
Really, all I want is for somebody to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).
This question really has two parts:
How do I generate audio signals?
How do I play audio signals through the speakers?
I wrote a simple wrapper around the Python standard library's wave module, called pydub, which you can look at (on GitHub) as a point of reference for how to manipulate raw audio data.
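For part 1, here's a minimal sketch that uses only the standard library (wave, struct, math) to write one second of a 440 Hz sine tone as 16-bit mono PCM; for your "increasingly shrill" tone you'd ramp the frequency up while accumulating phase instead of keeping it fixed:

import math
import struct
import wave

RATE = 44100   # samples per second
FREQ = 440.0   # tone frequency in Hz

# Pack one second of a sine wave as signed 16-bit little-endian samples.
frames = b"".join(
    struct.pack("<h", int(32767 * math.sin(2 * math.pi * FREQ * n / RATE)))
    for n in range(RATE)
)

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)    # mono
    w.setsampwidth(2)    # 2 bytes = 16-bit samples
    w.setframerate(RATE)
    w.writeframes(frames)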
I generally just export the audio data to a file and then play it using VLC player. IMHO there's no reason to write a bunch of code to playback audio unless you're making a synthesizer or a game or some other realtime app.
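(That said, if you do want simple in-process playback on Windows without extra dependencies, the standard library's winsound module can play the exported file:)

import winsound

# Windows-only stdlib module; blocks until the wav finishes playing.
winsound.PlaySound("tone.wav", winsound.SND_FILENAME)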
Anyway, I hope that helps you get started :)
I have two .wav files that I need to compare and decide if they contain the same words (same order too).
I have been searching for the best method for a while now. I can't figure out how to have pyspeech use a file as input. I've tried getting the CMU Sphinx project working, but I can't seem to get GStreamer to work with Python 2.7, let alone their project. I've messed around with Dragonfly as well, with no luck.
I am using Win7 64-bit with Python 2.7. Does anyone have any ideas?
Any help is greatly appreciated.
You could try PySpeech. For some more info, see pyspeech (python) - Transcribe mp3 files?. I have never used it, but I believe it leverages the built-in speech recognition engine of Windows. This will let you convert the wav files to text, and then you can do a text compare.
To use the Windows speech engine with a wav file as input, there are two requirements.
Use an in-process recognizer (SpeechRecognitionEngine). Shared recognizers cannot use wav files as input.
On the recognizer object, call SetInputToWaveFile to specify your input wav file.
You may have to resample the wav files, because the speech recognition engines only support certain sample rates. The following format works well on Windows:
8 bits per sample
single channel (mono)
22,050 samples per second
PCM encoding
See https://stackoverflow.com/a/6203533/90236 for some more info.
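SpeechRecognitionEngine and SetInputToWaveFile live in .NET's System.Speech.Recognition, so one option from Python is to drive them via pythonnet. A rough sketch, assuming .NET Framework's System.Speech assembly is available; the transcribe helper and the dictation-only grammar are my choices for illustration, not a fixed recipe:

import clr
clr.AddReference("System.Speech")
from System.Speech.Recognition import DictationGrammar, SpeechRecognitionEngine

def transcribe(wav_path):
    rec = SpeechRecognitionEngine()       # in-proc recognizer, not a shared one
    rec.LoadGrammar(DictationGrammar())   # free-form dictation
    rec.SetInputToWaveFile(wav_path)      # read from the wav file, not the mic
    pieces = []
    while True:
        result = rec.Recognize()          # returns None at the end of the stream
        if result is None:
            break
        pieces.append(result.Text)
    return " ".join(pieces)

# Transcribe both files, then compare the resulting text.
print(transcribe("first.wav") == transcribe("second.wav"))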
For some more background on the Windows speech engines, you might take a look at SAPI and Windows 7 Problem and What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?