Sphinx speech to text
Sphinx is a speech recognition system that is being developed at Carnegie Mellon University.
OpenGov primarily uses sphinx3, which runs under GNU/Linux.
CMU has released:
- sphinx2 (slightly faster, but less accurate)
- sphinx3 (slower but more accurate)
- sphinx4 (written in Java)
- pocketsphinx
If you are new to speech recognition, there are three main components involved in creating a useful system.
You need:
- An appropriate language model (just text, nothing to do with sound)
- An appropriate dictionary for the language model
- An appropriate acoustic model
You should usually create your own language model based on text transcriptions relating to the people whose voices you want to recognize.
You should have a dictionary that contains pronunciations of all words that you want to recognize, making sure to add pronunciations that are unique to your subjects.
You should choose an acoustic model, and adapt it based on *accurate* transcriptions and audio files spoken by your subjects.
Once those three things are in place, you should be able to use sphinx with reasonable accuracy.
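One way to check that the dictionary covers every word in your transcriptions is a small script like the one below. It is only a sketch: it assumes a CMU-style dictionary (one entry per line, the word followed by its phonemes, alternate pronunciations written as WORD(2), comment lines starting with ";;;"), and the function names load_dictionary_words and missing_words are made up for this example.

```python
import re


def load_dictionary_words(lines):
    """Collect the words defined in a CMU-style pronunciation dictionary.

    Assumed format: "WORD PH1 PH2 ...", with alternates as "WORD(2) ...".
    """
    words = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(";;;"):
            continue
        entry = line.split()[0]
        # Strip the "(2)", "(3)", ... suffix used for alternate pronunciations.
        base = re.sub(r"\(\d+\)$", "", entry)
        words.add(base.upper())
    return words


def missing_words(transcript_lines, dictionary_words):
    """Return the sorted list of transcript words that have no dictionary entry."""
    missing = set()
    for line in transcript_lines:
        # Treat runs of letters and apostrophes as words; case is ignored.
        for token in re.findall(r"[A-Za-z']+", line):
            if token.upper() not in dictionary_words:
                missing.add(token.upper())
    return sorted(missing)


if __name__ == "__main__":
    dictionary = [
        "HELLO HH AH L OW",
        "HELLO(2) HH EH L OW",
        "WORLD W ER L D",
    ]
    known = load_dictionary_words(dictionary)
    print(missing_words(["hello world from OpenGov"], known))
```

Any word this reports needs a pronunciation added to the dictionary before it can be recognized.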
To measure accuracy, you will need a word alignment tool.
(I'll link to specific tools later)
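Until specific tools are linked, here is a rough illustration of what a word alignment tool measures. It computes the word error rate (substitutions, deletions and insertions divided by the number of reference words) using a standard edit-distance alignment. This is a sketch rather than any particular Sphinx tool: the function name word_error_rate is made up for this example, and it assumes both transcripts are already normalized (same case, no punctuation) and split on whitespace.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate of a hypothesis against a reference transcript.

    Uses word-level Levenshtein distance: the minimum number of
    substitutions, deletions and insertions needed to turn the
    reference into the hypothesis, divided by the reference length.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word was dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

A lower value is better. Note that the rate can exceed 1.0 when the recognizer inserts many extra words.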
Tutorials:
When adapting your acoustic model, I recommend following the pocketsphinx adaptation method and then using the adapted model with sphinx3.

