25.3 deepspeech demo

This demonstration highlights the capabilities of the pre-built DeepSpeech speech-to-text model. A small set of example audio files is played and then transcribed. The results show the accuracy the model can achieve on carefully chosen audio, while the error in the final example shows that further training is likely to be required for audio in general.

$ ml demo deepspeech

==========
DeepSpeech
==========

Welcome to a demo of Mozilla's DeepSpeech pre-built model for speech
to text. This model is trained using machine learning techniques based
on Baidu's Deep Speech research paper (https://arxiv.org/abs/1412.5567), 
and implemented by Mozilla. In this demo the audio will be played and
then transcribed to text.

Press Enter to continue: 

=======================
Experience proves this.
=======================

The audio has been played and if you listen carefully you should hear:

  Experience proves this.

Press Enter to continue: 

TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
Loaded model in 0.0139s.
Loading scorer from files .mlhub/deepspeech/cache/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000202s.

Running inference to transcribe the audio...

  experience proves this

Inference took 1.714s for 1.975s audio file.

Press Enter to continue: 

===============================
Why should one halt on the way?
===============================

The audio has been played and if you listen carefully you should hear:

  Why should one halt on the way?

Press Enter to continue: 

TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
Loaded model in 0.0139s.
Loading scorer from files .mlhub/deepspeech/cache/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000202s.

Running inference to transcribe the audio...

  why should one halt on the way

Inference took 1.681s for 2.735s audio file.

Press Enter to continue: 

================================
Your power is sufficient I said.
================================

The audio has been played and if you listen carefully you should hear:

  Your power is sufficient I said.

Press Enter to continue: 

TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
Loaded model in 0.0188s.
Loading scorer from files .mlhub/deepspeech/cache/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.0003s.

Running inference to transcribe the audio...

  your paris sufficient i said

Inference took 1.486s for 2.590s audio file.
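
The accuracy and speed seen in the session above can be put into numbers. The Python sketch below is not part of the deepspeech package or of MLHub. The function names (normalise, word_error_rate, real_time_factor) are hypothetical and chosen only for illustration. It assumes that accuracy is measured as the word error rate (word-level edit distance divided by the number of reference words) after lowercasing and removing punctuation, and that speed is measured as inference time divided by audio duration. The texts and timings are copied from the output above.

```python
def normalise(text):
    """Lowercase and drop punctuation so a reference sentence matches the
    style of the transcribed output, then split it into words."""
    kept = "".join(
        ch for ch in text.lower() if ch.isalnum() or ch.isspace() or ch == "'"
    )
    return kept.split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(inference_seconds, audio_seconds):
    """Values below 1.0 mean transcription was faster than playback."""
    return inference_seconds / audio_seconds


# (expected text, transcribed text, inference seconds, audio seconds),
# all taken from the demo session shown above.
results = [
    ("Experience proves this.", "experience proves this", 1.714, 1.975),
    ("Why should one halt on the way?", "why should one halt on the way",
     1.681, 2.735),
    ("Your power is sufficient I said.", "your paris sufficient i said",
     1.486, 2.590),
]

for expected, transcribed, inference, audio in results:
    wer = word_error_rate(expected, transcribed)
    rtf = real_time_factor(inference, audio)
    print(f"WER {wer:.2f}  RTF {rtf:.2f}  {transcribed}")
```

Under these assumptions the first two examples have a word error rate of 0.00 and the third has 0.33, since "power is" became "paris" (one substitution and one missing word out of six). All three were transcribed faster than real time on the machine used for this session.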

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0