Sequential Routing Framework: Fully Capsule Network-Based Speech Recognition

Kyungmin Lee, Hyunwhan Joe, Hyeontaek Lim, Kwangyoun Kim, Sungsoo Kim, Chang Woo Han, Hong-Gee Kim

Computer Speech and Language (2021)

Abstract

Capsule networks (CapsNets) have recently attracted attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced by a window size. Each slice is classified to a label at the corresponding time step through iterative routing mechanisms. Afterwards, losses are computed by connectionist temporal classification (CTC). Because learnable weights are shared across the slices, the number of parameters required during routing is controlled by the window size, regardless of the length of the sequences. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize the decoding speed degradation caused by the routing iterations, since it can operate in a non-iterative manner without dropping accuracy. The method achieves a word error rate of 16.9% on the Wall Street Journal corpus, 1.1% lower than bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a phone error rate of 17.5%, 0.7% lower than convolutional neural network-based CTC networks (Zhang et al., 2016). © 2021 Elsevier Ltd. All rights reserved.
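The slicing and weight-sharing idea can be illustrated with a minimal sketch. It is not the paper's implementation: a plain linear scorer stands in for the capsule routing step, and the names `slice_windows`, `make_shared_weights`, `score_slice`, and `count_params`, along with the zero-padding at the sequence edges, are assumptions made for illustration only. The point it shows is that one weight set is reused for every slice, so the parameter count depends on the window size and not on the sequence length.

```python
import random


def slice_windows(frames, window):
    """Return one window of frames per time step, zero-padded at the edges.

    Zero-padding is an assumption of this sketch, not taken from the paper.
    """
    half = window // 2
    dim = len(frames[0])
    pad = [[0.0] * dim for _ in range(half)]
    padded = pad + list(frames) + pad
    return [padded[t:t + window] for t in range(len(frames))]


def make_shared_weights(window, dim, num_labels, seed=0):
    """Create one weight tensor of shape [num_labels][window][dim].

    The same tensor is reused for every slice of every sequence.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(window)]
        for _ in range(num_labels)
    ]


def score_slice(slice_, weights):
    """Linear stand-in for routing: one score per label for a single slice."""
    return [
        sum(w * x
            for w_row, x_row in zip(label_w, slice_)
            for w, x in zip(w_row, x_row))
        for label_w in weights
    ]


def count_params(weights):
    """Total number of learnable scalars in the shared weight tensor."""
    return sum(len(row) for label_w in weights for row in label_w)


# Usage: the same weights score a short and a long sequence.
weights = make_shared_weights(window=3, dim=2, num_labels=4)
short_seq = [[0.5, -0.5] for _ in range(5)]
long_seq = [[0.5, -0.5] for _ in range(50)]
short_scores = [score_slice(s, weights) for s in slice_windows(short_seq, 3)]
long_scores = [score_slice(s, weights) for s in slice_windows(long_seq, 3)]
```

Each sequence yields one score vector per time step, which is the shape of output a CTC loss would then consume; the loss itself is omitted from this sketch.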

Keywords

Capsule network, Automatic speech recognition, Sequence-to-sequence, Connectionist temporal classification
