22.9 azspeech transcribe video

20230213

Consider the use case of transcribing a YouTube video for a hearing-impaired colleague. The video can be downloaded from YouTube using youtube-dl, which is installed with pip:

pip install --upgrade youtube-dl

We can then use youtube-dl to download just the audio from a YouTube video, saved as a wav file:

youtube-dl --extract-audio --audio-format wav --output "myvid.%(ext)s" https://www.youtube.com/watch?v=AZKcl4-tcuo

Alternatively, we can download the video and then extract the audio ourselves using ffmpeg, as explained in the GNU/Linux Survival Guide. In our example we explicitly ask for mp4 rather than the default webm format. Either would do, and ffmpeg will extract the audio as required.

youtube-dl --recode-video mp4 --output myvid https://www.youtube.com/watch?v=AZKcl4-tcuo

To extract the audio from the video using ffmpeg:

ffmpeg -i myvid.mp4 myvid.wav
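With no further options, ffmpeg generally keeps the sample rate and channel count of the source audio. Speech recognition services commonly work with 16 kHz, mono, 16-bit PCM wav input, so it may help to request that format explicitly. The following is a minimal sketch only: whether azspeech needs this conversion is an assumption, and the function name extract_speech_wav and the FFMPEG override variable are hypothetical helpers introduced for illustration.

```shell
# Hypothetical helper: extract 16 kHz mono 16-bit PCM audio from a video.
# FFMPEG is an illustrative override variable, not part of ffmpeg itself;
# setting FFMPEG=echo previews the arguments without running ffmpeg.
FFMPEG="${FFMPEG:-ffmpeg}"

extract_speech_wav() {
    input="$1"
    # Default output name: the input name with its extension replaced by .wav
    output="${2:-${input%.*}.wav}"
    # -vn drops the video stream, -ac 1 downmixes to mono,
    # -ar 16000 resamples to 16 kHz, pcm_s16le is 16-bit PCM.
    "$FFMPEG" -i "$input" -vn -ac 1 -ar 16000 -acodec pcm_s16le "$output"
}
```

Calling extract_speech_wav myvid.mp4 would then write myvid.wav alongside the video.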

It is then straightforward to transcribe the wav file, saving the output to a text file:

ml transcribe azspeech myvid.wav > myvid.txt

If the video is in Indonesian, for example, then a better result will be obtained by specifying the language:

ml transcribe azspeech --lang=id-ID myvid.wav > myvid.txt

Note that these steps cannot currently be combined into a pipeline: the current implementation of the transcribe command does not wait for the audio to complete, so the pipeline will not succeed.
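One way to work around this is to chain the steps through an intermediate wav file rather than a pipe, so that transcription only starts once the download has finished. The sketch below wraps the commands from this section in a shell function. The function name transcribe_video and the YTDL and ML override variables are hypothetical, introduced only for illustration, and the approach assumes that transcribing a completed file works as shown earlier.

```shell
# Sketch only: chain the steps through files instead of a pipe.
# YTDL and ML are illustrative override variables, not part of either tool.
YTDL="${YTDL:-youtube-dl}"
ML="${ML:-ml}"

transcribe_video() {
    url="$1"        # video URL
    base="$2"       # output base name, e.g. myvid
    lang="${3:-}"   # optional language, e.g. id-ID

    # Step 1: download the audio as wav; stop here if the download fails.
    "$YTDL" --extract-audio --audio-format wav \
        --output "${base}.%(ext)s" "$url" || return 1

    # Step 2: transcribe the completed wav file into a text file.
    if [ -n "$lang" ]; then
        "$ML" transcribe azspeech --lang="$lang" "${base}.wav" > "${base}.txt"
    else
        "$ML" transcribe azspeech "${base}.wav" > "${base}.txt"
    fi
}
```

For example, transcribe_video "https://www.youtube.com/watch?v=AZKcl4-tcuo" myvid id-ID would leave the transcript in myvid.txt.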

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0