14.5 azspeech transcribe
20210609 The transcribe command will, by default, listen for up to 15 seconds of speech from the microphone and then convert it to text, written to the console. The command can also be used to transcribe speech from an audio file (wav). The source language may be required, though several languages are automatically identified.
$ ml transcribe azspeech
    -i <file.wav> --input=<file.wav>
    -l <lang>     --lang=<lang>
A simple example, listening for the audio on the microphone:
$ ml transcribe azspeech
The machine learning hub is useful for demonstrating capability of models as well as providing command line tools.
The command can take an audio wav file, specified using the
--input option, and transcribe it to the console. For large
audio files this can take some time. Currently only wav files are
supported through the command line (though the cloud service also
supports mp3, ogg, and flac). In the following example a sample wav
file is downloaded and then transcribed:
$ wget https://github.com/realpython/python-speech-recognition/raw/master/audio_files/harvard.wav
$ ml transcribe azspeech --input=harvard.wav
The stale smell of old beer lingers it takes heat to bring out the odor. A cold dip restore's health and Zest, a salt pickle taste fine with Ham tacos, Al Pastore are my favorite a zestful food is the hot cross bun.
To convert between file formats see the GNU/Linux Desktop Survival Guide.
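As a minimal sketch, assuming ffmpeg is installed (it is a separate tool, not part of mlhub), a file in another format could be converted to wav before transcribing. The helper names wav_name and to_wav are hypothetical, and the 16 kHz mono settings are an assumption about a format that speech services commonly accept, not a documented requirement of this command.

```shell
# Hypothetical helper: derive the .wav file name for an audio file
# by replacing its extension.
wav_name() {
  printf '%s.wav\n' "${1%.*}"
}

# Hypothetical helper: convert an audio file to wav using ffmpeg.
# Assumes ffmpeg is installed. The 16 kHz mono settings are an
# assumption, not a documented requirement of ml transcribe.
to_wav() {
  ffmpeg -i "$1" -ar 16000 -ac 1 "$(wav_name "$1")"
}
```

For example, calling to_wav on a file named talk.mp3 would write talk.wav, which can then be passed to the --input option.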
To save the output to a text file simply use shell redirection:
$ ml transcribe azspeech --input=harvard.wav > harvard.txt
$ cat harvard.txt
The stale smell of old beer lingers it takes heat to bring out the odor. A cold dip restore's health and Zest, a salt pickle taste fine with Ham tacos, Al Pastore are my favorite a zestful food is the hot cross bun.
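The same redirect can be applied to several files in a loop. The following is a sketch only: the function name transcribe_all is hypothetical, and it assumes each argument is a file name ending in .wav.

```shell
# Hypothetical helper: transcribe each wav file given as an argument,
# saving the text alongside it with a .txt extension in place of .wav.
transcribe_all() {
  for wav in "$@"; do
    ml transcribe azspeech --input="$wav" > "${wav%.wav}.txt"
  done
}
```

Calling transcribe_all with harvard.wav as its argument would write harvard.txt, just as the single redirect does.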
The input language will affect the AI’s capability and, whilst it can automatically identify some languages, it cannot identify them all (at least not yet). We can assist by specifying the source language. In this example it is Indonesian. The first attempt, with no language specified, results in a mix of English and some Indonesian.
$ ml transcribe azspeech --input=indonews.wav
Any luck a barbaric abair poker delapan waktu Indonesia parrot, cyano millionaire.
Knowing the language results in greater accuracy:
$ ml transcribe azspeech --lang=id-ID --input=indonews.wav
Inilah Kabar baru kabeer 8:00 waktu Indonesia Barat saya Naomi liandra.
The language code is the BCP-47 locale and supported codes are listed at https://docs.microsoft.com/en-gb/azure/cognitive-services/speech-service/language-support
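When the source language is uncertain, one option is to run the transcription once per candidate code and compare the results by eye. This is a sketch only: the function name try_langs is hypothetical, and the codes passed to it should be taken from the supported list.

```shell
# Hypothetical helper: transcribe one file with each candidate
# language code in turn, labelling each result with its code.
# Usage: try_langs <file.wav> <lang> [<lang> ...]
try_langs() {
  file=$1
  shift
  for lang in "$@"; do
    printf '%s: ' "$lang"
    ml transcribe azspeech --lang="$lang" --input="$file"
  done
}
```

For instance, calling try_langs with indonews.wav followed by id-ID and en-US would print one labelled transcription per code, making it easier to see which reads most sensibly.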
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.