I'm working on a system (in Python) that locks several parts of my computer and unlocks them ONLY when my voice says specific
words. I've already built the version that keeps those parts locked until you enter a password, but I want to switch it to voice.
I did find some voice-processing code for Python on the web, but it's really complicated and comes without explanation.
I know Python might not be the right language for this, but I want to try!
Thanks for any help!
You can start from pre-built services, such as Azure Speaker Recognition. The process is quite straightforward: you provide audio training samples for each speaker (yourself, for example). This creates an enrollment profile based on the unique characteristics of your voice. If you are looking to build a solution from scratch, then the Fast Fourier Transform would be a good place to start.
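To make the from-scratch suggestion a bit more concrete, here is a minimal sketch of what a Fourier transform gives you: the strength of each frequency in a short frame of audio. It uses a naive discrete Fourier transform in pure Python (an FFT computes the same spectrum, just faster; in practice something like `numpy.fft.rfft` would be the usual choice). The function names and the synthetic 440 Hz tone are assumptions for illustration, and a spectrum on its own is only a first feature-extraction step, not a speaker check.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the first N/2 DFT bins (naive O(N^2) transform)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the DC bin (k = 0)."""
    mags = dft_magnitudes(samples)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)


# Synthetic tone standing in for a recorded voice frame (hypothetical data).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(400)]
print(dominant_frequency(tone, SAMPLE_RATE))
```

A real voice lock would compare richer features computed from many such frames against an enrolled profile, which is what the pre-built services do for you.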
Python is a good place to start!
Search Google for "Python speech recognition" and you will find a lot of examples of how to do that.
Enjoy!
Related
I am kind of new to Python and trying to learn the language the best I can. Right now I am having a problem trying to automate something. I don't know how I should
What I am trying to automate: I was trying to make the program able to read how much Food, Wood, etc. is in the storage right now (and maybe also know whether the storage is full). If it is full, or has reached the value needed to upgrade the storage (for Food, for example), then it should do so.
At first I thought that detecting whether the storage is full should be done with pyautogui, but I am unsure... I am sorry that this is such a long message, and I don't know if you guys understand what I am trying to explain... If not, just ask.
Python cannot read the values of on-screen characters on its own, so for this type of automation an image recognition library is recommended (I recommend OpenCV). For clicking on the screen you can use pynput.
OpenCV link: https://pypi.org/project/opencv-python/
pynput link: https://pypi.org/project/pynput/
You can also watch the following project, since it can help you get started:
https://www.youtube.com/watch?v=vXqKniVe6P8
I downloaded Rainmeter in order to get some nifty computer stats showing while I work, and I found an audio visualizer that lets you see the frequency (I believe) of the noise coming in around the office. This made me think: is it possible to translate a realtime audio feed to text with some sort of programming language? I figured this could be handy for any interviews I do that I want to reflect back on, and it just seems like a cool concept to me. I do not want to download any third-party software; I would rather take the time to learn some new coding concepts in Python, PowerShell, C, or anything else. I am not limited to any language, and it is up to the answerer's discretion!
Thanks for the help.
I am trying to create a (very basic) simulation of Alexa or Google Home. I am using the SpeechRecognition module with Google as the recognizer. I have managed to get it working, but I don't know how to run the whole script when I say a word. I want it to be always listening, as Alexa does.
Ex:
'Hey, Robot'
AI = Hi, how may I help you? (runs whole script)
I had thought about looping through a piece of code every 5 seconds and then connecting to Google API but this isn't possible as the API is limited to 50 requests per day.
Any help is appreciated,
Thanks in advance
You can use a "silence" threshold to decide when a request needs to be sent to Google; that way you avoid sending too many requests. For a code sample, see "Python record audio on detected sound".
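As a rough illustration of the threshold idea, the sketch below measures the loudness (RMS) of a chunk of raw audio and only treats it as speech when it is above a cutoff. It assumes 16-bit signed PCM in the machine's native byte order; the function names and the cutoff of 500 are made up for the example and would need tuning for a real microphone.

```python
import array
import math


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit signed PCM audio.

    The chunk length must be a multiple of 2 bytes.
    """
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """True when the chunk is louder than the (assumed) silence threshold."""
    return rms(chunk) > threshold


# Hand-made chunks standing in for microphone input (hypothetical data).
quiet = array.array("h", [10, -12, 8, -9] * 256).tobytes()
loud = array.array("h", [4000, -3800, 4100, -3900] * 256).tobytes()

for name, chunk in (("quiet", quiet), ("loud", loud)):
    action = "send to recognizer" if is_speech(chunk) else "skip"
    print(name, action)
```

Only chunks that pass `is_speech` would be forwarded to the remote recognizer, which keeps the request count down while nobody is talking.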
Alternatively, you can use open-source speech recognition packages and end up with an independent application; see the article "The Ultimate Guide To Speech Recognition With Python" for that approach.
However, if you still prefer to use the remote API, you can combine the approaches above: use SpeechRecognition to detect the Hey, Robot phrase and then switch the application to the Google API for a short period of time. The threshold check should still be used, to avoid querying the Google API when the user doesn't continue speaking after saying Hey, Robot.
Good luck!
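The combined approach can be sketched as a small gate object: a local recognizer feeds it transcripts, and it only allows remote requests for a short window after the wake phrase, and only for chunks that passed the silence check. Everything here (`WakeWordGate`, the 5 second window, the injected clock) is a hypothetical design for illustration, not part of SpeechRecognition or any Google API.

```python
import time


class WakeWordGate:
    """Allow remote recognition only briefly after the wake phrase is heard."""

    def __init__(self, wake_phrase="hey robot", window_seconds=5.0,
                 clock=time.monotonic):
        self.wake_phrase = wake_phrase
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the gate can be tested without waiting
        self._open_until = float("-inf")

    def heard_locally(self, transcript: str) -> None:
        """Feed a transcript from the local recognizer; opens the window on a match."""
        cleaned = "".join(
            ch for ch in transcript.lower() if ch.isalnum() or ch.isspace()
        )
        if self.wake_phrase in " ".join(cleaned.split()):
            self._open_until = self.clock() + self.window_seconds

    def should_use_remote(self, chunk_is_speech: bool) -> bool:
        """True only for non-silent audio inside the post-wake-phrase window."""
        return chunk_is_speech and self.clock() < self._open_until


# Demo with a fake clock so the timing is deterministic.
fake_now = [0.0]
demo_gate = WakeWordGate(clock=lambda: fake_now[0])
demo_gate.heard_locally("Hey, Robot!")
print(demo_gate.should_use_remote(True))   # inside the window
fake_now[0] = 6.0
print(demo_gate.should_use_remote(True))   # window has expired
```

In a real loop, `heard_locally` would be called with the output of an offline recognizer, and the audio would be sent to the remote API only when `should_use_remote` returns true.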
Go with CMU Sphinx. It does exactly what you want. See here:
https://cmusphinx.github.io/wiki/
This is a really big ask, but I have been trying for about 4 months now to get this to work. I am creating a personal assistant using a Raspberry Pi 3 Model B and Python (I know they are not the best of choices). Most of it works apart from the main feature, the speech to text (STT). I would like it to convert all spoken words to text, and when you finish a sentence I would like it to enter and finish so the text can be processed as a string. Do you have any suggestions on what I could use to do this, or any links to help me?
Thanks in advance.
I recently completed a project similar to yours.
If an internet connection is not a problem for you, I would suggest using Wit.ai. It has a nice Python API, or you could use it through its HTTP API.
Your assistant would have to record speech, send the data to the remote API, and receive a response with the text as the answer.
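The "send data to remote API" step could look roughly like the sketch below, which only builds the HTTP request and does not send it. The endpoint URL, the bearer-token header, and the `audio/wav` content type reflect my understanding of the Wit.ai HTTP API and should be treated as assumptions to check against the current Wit.ai documentation; `build_stt_request` and the placeholder token are hypothetical names.

```python
import urllib.request

# Assumed endpoint; confirm it (and any version parameter) in the Wit.ai docs.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_stt_request(wav_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying recorded WAV audio."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=wav_bytes,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


# Placeholder audio and token; a real assistant would pass recorded WAV bytes.
request = build_stt_request(b"RIFF", "YOUR_TOKEN_HERE")
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`, after which the response body has to be parsed according to whatever format the service currently returns.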
Take into account that the STT process is quite complex, so trying to solve it with local solutions might be a bit too much for a Raspberry Pi to cope with. What's more, you would (probably) have to prepare vocabularies, etc. With a remote STT service, you don't have to do that.
If you cannot or do not want to use a remote service, you can always try CMU Sphinx. For that, though, you will need somebody else to help you, as I have no experience with it whatsoever.
I am just getting started in speech recognition and was wondering what the general process was for training the SpeechRecognition library from Python:
https://pypi.python.org/pypi/SpeechRecognition/
I know basic machine learning techniques and basic text analytics, but I am not sure how to apply them to training on sound data. (My end result would resemble the typical speech typing on phones, where if you correct the speech analyzer's result often enough, it will "remember" the user preference.)
Thanks!
That speech recognition library uses Google's speech recognition engine, so there is no particular provision for training at the user end. Your sound data goes to Google (in digest form). If you get a dedicated API key (as that documentation page suggests), it is possible that Google will build a user-specific profile of your voice and gain statistical quality over time based on it, but that is not something that would be stored or written at your end.
If you have any further questions, or if I left parts of your question unaddressed, please let me know.