How do I train the Python SpeechRecognition 2.1.1 library?

I am just getting started in speech recognition and was wondering what the general process was for training the SpeechRecognition library from Python:
https://pypi.python.org/pypi/SpeechRecognition/
I know basic machine learning techniques and basic text analytics, but I am not sure how to apply them to training on sound data. (My end result would resemble the typical speech-to-text typing on phones, where if you correct the recognizer's output often enough, it "remembers" your preferences.)
Thanks!

That speech recognition library is a wrapper around Google's speech recognition engine, so there is no particular provision for training at the user end: your sound data simply goes to Google (in digest form). If you get a dedicated API key (as that documentation page suggests), it is possible Google will build a user-specific profile of your voice and gain statistical quality over time based on it, but that is not something that would be stored or trainable at your end.
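For reference, here is a minimal sketch of how the library is typically used (a microphone backend such as PyAudio is assumed); note that nothing in this flow trains or stores anything locally, the audio is just forwarded to Google's web API:

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)            # capture one utterance

    try:
        # The audio is sent to Google's recognizer; the text comes back.
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Google could not understand the audio")
    except sr.RequestError as e:
        print("Could not reach the Google API: {}".format(e))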
If you have any further questions, or if part of your question remains unaddressed, please let me know.

Related

python voice signature identification?

I'm working on a system that locks several parts of my computer and unlocks them ONLY when my voice says specific words (in Python). I've already made the system that locks parts of my computer until you give it a password, but I want to change that to voice.
I did find some voice-processing material on the web, but it's really complicated and comes without explanation in Python.
I know Python might not be the right language to do this, but I want to try!
Thanks for any help!
You can start from pre-built services such as Azure Speaker Recognition. The process is quite straightforward: you provide audio training samples for each speaker (yourself, for example), and this creates an enrollment profile based on the unique characteristics of that voice. If you are looking to build a solution from scratch, the Fast Fourier Transform would be a great start.
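If you do go the from-scratch route, a first step might look like the rough sketch below: read a recording and inspect its frequency content with an FFT. The file name "my_voice.wav" is just a placeholder, and a real speaker-identification system would use far richer features than this:

    import numpy as np
    from scipy.io import wavfile

    sample_rate, samples = wavfile.read("my_voice.wav")
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)

    spectrum = np.abs(np.fft.rfft(samples))    # magnitude of the real FFT
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # The strongest frequency components are one crude feature you could
    # compare between an "enrollment" recording and a new attempt.
    dominant = freqs[np.argsort(spectrum)[-5:]]
    print("Strongest frequency components (Hz):", sorted(dominant))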
Python is a good place to start!
Search for "Python speech recognition" on Google and you will find a lot of examples of how to do this.
Enjoy!

Speech to Text API, offline (preferred) or online

I'm making a Windows desktop application that needs to transcribe videos, and I'm looking for a good free API to help me achieve that.
I have looked around a lot, but most of the APIs I've found have poor accuracy.
This doesn't work with .NET Core, but if you're using the legacy .NET Framework (which is still supported) you can use System.Speech to both recognize and synthesize speech offline.
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine?view=netframework-4.8
Update 3/1/21: System.Speech has now been ported to .NET Core. The NuGet package is available at: https://www.nuget.org/packages/System.Speech
Google's Speech-to-Text API has state-of-the-art accuracy, a simple interface, and client libraries in many languages. You get 60 minutes free per month.
Link: https://cloud.google.com/speech-to-text/
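For example, transcribing a short local file with the Python client library (pip install google-cloud-speech) looks roughly like the sketch below; the file name, sample rate, and credentials (set via the GOOGLE_APPLICATION_CREDENTIALS environment variable) are assumptions you would replace with your own:

    from google.cloud import speech

    client = speech.SpeechClient()

    # Read the audio file and wrap it for the API.
    with open("audio.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)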
If you want an online API that is totally free, you most likely will not find one.
If you are willing to go offline, you will probably have to come up with a custom solution using the weights of some openly available deep learning model. Read some papers on state-of-the-art transcription models and see if any of the weights are available on GitHub. Keep in mind that performing such a task offline is very computationally expensive, and might require a GPU to give you results in a reasonable amount of time.

Automatic voice recognition when a word is said

I am trying to create a (very basic) simulation of Alexa or Google Home. I am using the SpeechRecognition module with Google as the recognizer. I have managed to get it working, but I don't know how to make the whole script run when I say a word (I want it to be listening all the time, as Alexa does).
Ex:
'Hey, Robot'
AI = Hi, how may I help you? (runs whole script)
I had thought about looping through a piece of code every 5 seconds and then connecting to the Google API, but this isn't possible as the API is limited to 50 requests per day.
Any help is appreciated,
Thanks in advance
You can use "Silence" threshold to identify need of sending request to google, with that approach you will avoid sending to match requests. For code sample see Python record audio on detected sound.
Alternatively, you can use open-source speech recognition packages and end up with a fully independent application; see The Ultimate Guide To Speech Recognition With Python article for that approach.
However, if you still prefer using the remote API, you can combine the approaches above: use SpeechRecognition to catch the Hey, Robot phrase and, after that, switch the application to the Google API for speech recognition for a short period of time. Of course, the threshold check should still be used to avoid querying the Google API when the user doesn't continue speaking after saying Hey, Robot.
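A rough sketch of that combined approach might look like this (it assumes the pocketsphinx package is installed so that recognize_sphinx works offline, and the wake phrase "hey robot" is just the example from the question):

    import speech_recognition as sr

    recognizer = sr.Recognizer()

    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source)          # blocks until speech, stops on silence
            try:
                heard = recognizer.recognize_sphinx(audio).lower()  # offline, no quota used
            except sr.UnknownValueError:
                continue                               # nothing intelligible, keep listening
            if "hey robot" in heard:
                print("Hi, how may I help you?")
                command_audio = recognizer.listen(source)   # the next utterance is the command
                try:
                    command = recognizer.recognize_google(command_audio)  # one online request
                    print("You said:", command)        # run the rest of your script here
                except (sr.UnknownValueError, sr.RequestError):
                    print("Sorry, I didn't catch that.")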
Good luck!
Go with CMU Sphinx. It does exactly what you want. See here:
https://cmusphinx.github.io/wiki/
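For keyword spotting specifically, the pocketsphinx package ships a LiveSpeech helper; the sketch below is based on its documented example, with the keyphrase and detection threshold being values you would tune yourself:

    from pocketsphinx import LiveSpeech

    # lm=False switches from full language-model decoding to keyword spotting.
    speech = LiveSpeech(lm=False, keyphrase="hey robot", kws_threshold=1e-20)
    for phrase in speech:      # yields a result each time the keyphrase is heard
        print("Wake word detected:", phrase)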

Speech recognition for python, raspberry pi

This is a really big ask, but I have been trying for about four months now to get this to work. I am creating a personal assistant using a Raspberry Pi 3 Model B and Python (I know they are not the best of choices). Most of it works apart from the main feature: speech to text (STT). I would like it to convert all spoken words to text, and when a sentence is finished I would like it to stop so the text can be processed as a string. Do you have any suggestions on what I could use to do this, or any links to help me?
Thanks in advance.
I completed a similar project to yours recently.
If an internet connection is not a problem for you, I would suggest using Wit.ai. It has a nice Python API, or you could use it through its HTTP API.
Your assistant would have to record speech, send the data to the remote API, and receive a response containing the text.
Take into account that the STT process is quite complex, so trying to solve it with a local solution might be too much for the Raspberry Pi to cope with. What's more, you would probably have to prepare vocabularies, etc. With a remote STT service, you don't have to do that.
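One way to do that record-then-send loop is via the SpeechRecognition package's Wit.ai backend; a rough sketch (WIT_AI_KEY is a placeholder for your own server access token from wit.ai):

    import speech_recognition as sr

    WIT_AI_KEY = "YOUR_WIT_AI_SERVER_ACCESS_TOKEN"   # placeholder
    recognizer = sr.Recognizer()

    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        print("Say something...")
        audio = recognizer.listen(source)            # stops automatically after a pause

    try:
        text = recognizer.recognize_wit(audio, key=WIT_AI_KEY)
        print("You said:", text)                     # process the sentence as a string here
    except sr.UnknownValueError:
        print("Wit.ai could not understand the audio")
    except sr.RequestError as e:
        print("Request to Wit.ai failed:", e)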
If you cannot, or do not want to, use a remote service, you can always try CMU Sphinx. But for that you will need somebody else's help, as I have no experience using it whatsoever.

Python handwriting recognition software?

Is there a Python handwriting recognition library? What do handwriting recognition packages take as input: .jpg images? .pdf files?
Zinnia is a C/C++ library with SWIG-generated wrappers for Perl/Python/Ruby. It has a BSD license and converts user pen strokes, provided as coordinates, into best-match characters. It also has a training module.
It looks like it performs single character recognition, so you might need to build something on top of it to improve the results.
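To give an idea of the input format, here is a rough sketch based on Zinnia's bundled Python example: the input is a list of pen-stroke coordinates rather than an image, and the model path is an assumption that depends on where the model file was installed:

    import zinnia

    recognizer = zinnia.Recognizer()
    recognizer.open("/usr/local/lib/zinnia/model/tomoe/handwriting-ja.model")

    character = zinnia.Character()
    character.clear()
    character.set_width(300)
    character.set_height(300)
    # add(stroke_index, x, y): the points of each pen stroke, in drawing order
    character.add(0, 51, 29)
    character.add(0, 117, 41)
    character.add(1, 99, 67)
    character.add(1, 219, 77)

    result = recognizer.classify(character, 10)   # 10-best candidate characters
    for i in range(result.size()):
        print(result.value(i), result.score(i))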
PenCommander from PhatWare is a commercial, non-Python, Windows-only SDK. If you can live with all of those limitations, PhatWare products are the best handwriting recognition products that I've found so far, although I haven't been looking that hard since Microsoft's digital ink for the Tablet PC came out. I'm still saving for the Tablet PC, though :-(
