22 Azure Speech


Package: azspeech.

AI can speak. Speech is how human intelligence communicates, and although AI is not yet ready to converse fully through speech, we can already communicate with AI, and AI with us, in a limited way through speech. The chapter on Natural Language in the Data Science Desktop Survival Guide explores the speech capabilities of AI. In this chapter we explore an MLHub package that provides natural language capabilities.

Microsoft’s proprietary Azure Speech service can perform a variety of natural language tasks. AI can indeed speak and listen, though it may not yet actually understand what it is hearing and saying. We can use the Azure service for free to get an idea of what is possible.

The azspeech package uses the Azure Speech service to transcribe speech to text and to create (synthesise) speech from text. To do this the package uses pre-built speech models provided through Azure’s Cognitive Services. It supports many languages and voices, so that within a pipeline the speech of a male English speaker can be presented in a female French voice.

Most of the commands provided by the package accept an audio file or else record audio from the computer’s microphone, and synthesised audio is played through the computer’s speakers.

To install, configure, and demonstrate the package:

ml install   azspeech
ml configure azspeech
ml readme    azspeech
ml commands  azspeech
ml demo      azspeech

In addition to the demo command the package also supports the synthesize and transcribe commands:

ml synthesize azspeech myspeech.txt
ml transcribe azspeech myspeech.wav
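
As a minimal sketch of how the transcribe command might be used over a collection of recordings, the shell function below transcribes every .wav file in a folder and saves each transcript next to its audio file. It assumes that ml transcribe azspeech prints the transcript to standard output, which the text above does not confirm. The function name transcribe_all is hypothetical and is not part of MLHub or the azspeech package.

```shell
#!/bin/sh
# Transcribe every .wav file in a directory, saving each transcript
# alongside the audio as a .txt file with the same base name.
#
# Assumption (not confirmed by this chapter): the command
#   ml transcribe azspeech FILE.wav
# prints the transcript to standard output.
#
# transcribe_all is a hypothetical helper, not part of MLHub.

transcribe_all() {
    dir="${1:-.}"    # Default to the current directory.
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when no .wav files are present.
        [ -e "$wav" ] || continue
        txt="${wav%.wav}.txt"
        if ml transcribe azspeech "$wav" > "$txt"; then
            echo "Wrote $txt"
        else
            echo "Failed to transcribe $wav" >&2
        fi
    done
}
```

With the function defined in the current shell, running transcribe_all recordings would process each .wav file found in a folder named recordings. Keeping each transcript beside its audio file makes it simple to check the text against the recording.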

Unlike MLHub models in general, the Azure-based models rely on closed source services, which come with no guarantee of ongoing availability and without the freedom to modify and share. Being cloud based, the service also sends your text (for synthesis) and audio (for transcription) to the Azure cloud for analysis.

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0