
GPU-Based WFST Decoding with Extra Large Language Model

INTERSPEECH(2019)

Abstract
Weighted finite-state transducer (WFST) decoding in speech recognition can be accelerated by using graphics processing units (GPUs). To obtain high recognition accuracy, a WFST-based speech recognition system requires a very large language model (LM), represented as a WFST with more than 10 GB of data. Since a GPU typically has only several GB of memory, such a large LM cannot be stored in GPU memory. In this paper, we propose a new method for WFST decoding on a GPU. The method utilizes the on-the-fly rescoring algorithm, which performs the Viterbi search on a WFST with a small LM and rescores hypotheses using a large LM during decoding. We solve the problem of insufficient GPU memory by storing most of the large LM in host memory and copying the data from the host to the GPU on demand at runtime. Our evaluation of the proposed method on the LibriSpeech test sets using an NVIDIA Tesla V100 GPU shows that it achieves decoding ten times faster than an equivalent CPU implementation, without degrading recognition accuracy.
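The on-the-fly rescoring idea described in the abstract can be illustrated with a minimal sketch. All LM scores, n-grams, and the host/device split below are hypothetical stand-ins, not data or code from the paper: a plain dict plays the role of the large host-resident LM, and a small cache mimics the limited GPU memory that is filled by on-demand copies.

```python
# Hedged sketch of on-the-fly rescoring with a host-resident large LM.
# All scores and n-grams are invented for illustration.

# Large LM stored on the "host" (here: a dict of log-probabilities).
LARGE_LM_HOST = {
    ("the", "cat"): -0.2,
    ("the", "cap"): -1.5,
}

# Small LM used during the Viterbi search on the decoding WFST.
SMALL_LM = {
    ("the", "cat"): -0.8,
    ("the", "cap"): -0.9,
}

BACKOFF = -10.0      # fallback score for unseen n-grams
device_cache = {}    # mimics the limited GPU-side LM cache

def large_lm_score(ngram):
    """Fetch a large-LM score, copying host data to the device cache on a miss."""
    if ngram not in device_cache:           # cache miss -> host-to-device copy
        device_cache[ngram] = LARGE_LM_HOST.get(ngram, BACKOFF)
    return device_cache[ngram]

def rescore(words, combined_score):
    """On-the-fly rescoring: replace the small-LM score with the large-LM score."""
    ngram = tuple(words[-2:])
    small = SMALL_LM.get(ngram, BACKOFF)
    return combined_score - small + large_lm_score(ngram)

# Two competing hypotheses with acoustic+small-LM scores; the large LM
# prefers "the cat" and overturns the small-LM ranking.
hyps = [(["the", "cat"], -5.0), (["the", "cap"], -4.9)]
best = max(hyps, key=lambda h: rescore(*h))
```

In the actual GPU implementation the "copy on miss" step would be a batched host-to-device transfer of LM arcs rather than a per-n-gram dict lookup, but the scoring identity (subtract the small-LM contribution, add the large-LM one) is the core of on-the-fly rescoring.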
Keywords
speech recognition, weighted finite-state transducer, graphics processing unit