22.7 azspeech transcribe languages

20220314

The language of the input audio can affect how well the transcribe command performs. Some languages can be identified automatically, but not all of them (at least not yet).

If the source language is known, we can assist the AI by specifying it. In the following example the audio is known to be Indonesian. The first attempt, which does not specify the language, results in a mix of English and some Indonesian.

ml transcribe azspeech indonews.wav
Any luck a barbaric abair poker delapan waktu Indonesia parrot, cyano
millionaire.

Knowing the language, we can use the --lang (-l) option with the standard code for Indonesian (id-ID), as listed in the table of supported BCP-47 locale codes:

ml transcribe azspeech --lang=id-ID indonews.wav

The result is considerably more accurate:

Inilah Kabar baru kabeer 8:00 waktu Indonesia Barat saya Naomi
liandra.
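The --lang usage above can be wrapped in a small shell function that passes a locale when one is known and otherwise leaves language identification to the service, as in the first attempt. This is a minimal sketch only: the function name azspeech_transcribe and the DRY_RUN variable are hypothetical, and the pattern check covers only simple language-REGION codes such as id-ID, not the full BCP-47 grammar.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MLHub): transcribe a WAV file, passing a
# BCP-47 locale through the documented --lang option when one is known.
azspeech_transcribe() {
  local wav="$1" lang="${2:-}"
  local cmd=(ml transcribe azspeech)

  if [[ -n "$lang" ]]; then
    # Rough shape check for simple language-REGION codes (e.g. id-ID, zh-TW).
    # Full BCP-47 tags can be more complex; this only catches obvious typos.
    if [[ ! "$lang" =~ ^[a-z]{2,3}-[A-Z]{2}$ ]]; then
      echo "unexpected locale code: $lang" >&2
      return 1
    fi
    cmd+=("--lang=$lang")
  fi
  # With no locale, --lang is omitted and identification is left to the service.
  cmd+=("$wav")

  # DRY_RUN=1 (an assumption of this sketch) prints the command instead of running it.
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 azspeech_transcribe indonews.wav id-ID prints ml transcribe azspeech --lang=id-ID indonews.wav, while dropping id-ID prints the command without --lang.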

Other supported languages include: zh-TW (Taiwanese Mandarin); en-SG (Singaporean English); fr-CA (Canadian French); de-DE (German); hi-IN (Hindi); ko-KR (Korean); es-CL (Chilean Spanish); and th-TH (Thai).
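When a recording could be in one of several of these languages, one option is to transcribe it with each candidate locale and compare the results by eye. The sketch below loops over the codes listed above; the function name try_locales, the DRY_RUN variable, and saving each transcript as a locale-named .txt file are all assumptions of this example rather than MLHub features.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: transcribe one recording with each candidate locale so
# the transcripts can be compared. Locale codes are taken from the list above.
try_locales() {
  local wav="$1"
  local locales=(zh-TW en-SG fr-CA de-DE hi-IN ko-KR es-CL th-TH)
  local lang
  for lang in "${locales[@]}"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      # Only show the command that would be run.
      echo "ml transcribe azspeech --lang=$lang $wav"
    else
      # Save each transcript as <locale>.txt (e.g. fr-CA.txt) for comparison.
      ml transcribe azspeech --lang="$lang" "$wav" > "$lang.txt"
    fi
  done
}
```

Each call to the service costs time (and possibly quota), so trimming the list to plausible candidates first is sensible.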



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0