17.5 azspeech transcribe

The transcribe command will, by default, listen for up to 15 seconds of speech from the microphone and then convert it to text, written to the console. The command can also be used to transcribe speech from an audio (wav) file. The source language may need to be specified, though several languages can be automatically identified.

$ ml transcribe azspeech
     -i <file.wav>   --input=<file.wav>
     -l <lang>       --lang=<lang>

A simple example, listening for audio from the microphone:

$ ml transcribe azspeech
The machine learning hub is useful for demonstrating capability of 
models as well as providing command line tools.

The command can take an audio wav file, specified using the -i or --input option, and transcribe it to the console. For large audio files this can take some time. Currently only wav files are supported through the command line (though the cloud service also supports mp3, ogg, and flac). To convert from other audio formats see the GNU/Linux Desktop Survival Guide.
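
As one option, a tool such as ffmpeg (not part of MLHub, and with illustrative filenames here) can convert an mp3 recording into a 16kHz mono wav file suited to the speech service:

$ ffmpeg -i recording.mp3 -ar 16000 -ac 1 recording.wav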

$ wget https://github.com/realpython/python-speech-recognition/raw/master/audio_files/harvard.wav

$ ml transcribe azspeech --input=harvard.wav
The stale smell of old beer lingers it takes heat to bring out the odor.
A cold dip restore's health and Zest, a salt pickle taste fine with
Ham tacos, Al Pastore are my favorite a zestful food is the hot cross bun.

To save the output to a text file, simply use the shell redirect operator >.

$ ml transcribe azspeech --input=harvard.wav > harvard.txt

$ cat harvard.txt
The stale smell of old beer lingers it takes heat to bring out the odor.
A cold dip restore's health and Zest, a salt pickle taste fine with
Ham tacos, Al Pastore are my favorite a zestful food is the hot cross bun.
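
Combining the redirect with a shell loop, a whole collection of recordings can be transcribed in one pass. A sketch, with purely illustrative filenames:

$ for f in *.wav; do ml transcribe azspeech --input="$f" > "${f%.wav}.txt"; done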

The input language will affect the AI’s capability and whilst it can automatically identify some languages, it cannot identify them all (at least not yet). We can assist by identifying the source language, which in this example is Indonesian. The first attempt results in a mix of English and some Indonesian.

$ ml transcribe azspeech --input=indonews.wav
Any luck a barbaric abair poker delapan waktu Indonesia parrot, cyano
millionaire.

Knowing the language results in greater accuracy:

$ ml transcribe azspeech --lang=id-ID --input=indonews.wav
Inilah Kabar baru kabeer 8:00 waktu Indonesia Barat saya Naomi
liandra.

The language code is a BCP-47 locale and the supported codes are listed at https://docs.microsoft.com/en-gb/azure/cognitive-services/speech-service/language-support
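
For example, French audio would be specified with --lang=fr-FR and Mandarin with --lang=zh-CN. An illustrative command (the audio file named here is hypothetical):

$ ml transcribe azspeech --lang=fr-FR --input=interview.wav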


