An AI needs to be able to listen and to learn. Listening involves first taking in the words being spoken and then interpreting and understanding them. As we listen, we might choose to transcribe what we hear, that is, capture the text of the spoken words. This is what we call speech-to-text.
The MLHub package deepspeech demonstrates a pre-built model for speech-to-text processing and provides a command line tool for converting speech into text. The package is based on Mozilla’s DeepSpeech Playbook, which in turn is based on Baidu’s Deep Speech research paper.
To install, configure, and demonstrate the package:
ml install deepspeech
ml configure deepspeech
ml readme deepspeech
ml commands deepspeech
ml demo deepspeech
To transcribe a wav file of your own speech, for example myspeech.wav:
ml transcribe deepspeech myspeech.wav
The pre-built model used by this package supports only English and was trained on 16kHz audio files. The commands convert wav files recorded at other sample rates to 16kHz, but transcriptions of such converted files may be less accurate.
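Since the model expects 16kHz input, it can help to check a recording's sample rate, and optionally convert it yourself, before transcribing. The sketch below uses only Python's standard `wave` and `array` modules. It assumes 16-bit mono PCM wav files (the format DeepSpeech's pre-trained English models are generally used with), and the helper names `read_wav`, `resample_linear`, `write_wav`, and `to_16k` are hypothetical, not part of the deepspeech package. The resampling is naive linear interpolation, not the conversion the package itself performs.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # sample rate the pre-built model was trained on


def read_wav(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM wav file."""
    with wave.open(path, "rb") as w:
        # Assumption: only 16-bit mono PCM is handled in this sketch.
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian
    return rate, samples


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampling (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return array.array("h", samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = array.array("h")
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, last)]
        out.append(int(round(samples[j] + (nxt - samples[j]) * frac)))
    return out


def write_wav(path, rate, samples):
    """Write 16-bit mono PCM samples to a wav file."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())


def to_16k(src_path, dst_path):
    """Write a 16kHz copy of src_path to dst_path; return the original rate."""
    rate, samples = read_wav(src_path)
    write_wav(dst_path, TARGET_RATE, resample_linear(samples, rate))
    return rate
```

For example, `to_16k("myspeech.wav", "myspeech-16k.wav")` reports the original rate and writes a 16kHz copy that can then be passed to `ml transcribe deepspeech`. Linear interpolation keeps the sketch short, but when downsampling it does not filter out frequencies above the new Nyquist limit, so a dedicated resampler such as SoX will generally give cleaner audio.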
The source code for this MLHub package is available from GitHub: https://github.com/JingjingShii/deepspeech.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0