A Visual Inspection Tool for Evaluation of ASR Model Using PyKaldi and PyCHAIN

2022 9th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE) (2022)

Abstract
We developed a tool for creating and evaluating the transcript and alignment of an utterance. The tool displays the speech waveform and MFCC features on an HTML5 canvas, and shows the transcript and phoneme alignment produced with PyKaldi and PyCHAIN. Maintainers of medical dictation systems can use it to examine the speech waveform, MFCC features, transcription results, and phoneme alignment of an utterance during evaluation. PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit, while PyCHAIN is a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the chain models in Kaldi. As a user guide, this paper demonstrates a use case in which the tool's features are used to analyze model performance by inspecting the transcript and alignment of an utterance.
Keywords
speech recognition, medical dictation, PyKaldi, PyCHAIN, Kaldi