Distant, Multichannel Speech Recognition Using Microphone Array Coding and Cloud-Based Beamforming with a Self-Attention Channel Combinator.

Dushyant Sharma, Daniel T. Jones, Stanislav Yu. Kruchinin, Rong Gong, Patrick A. Naylor

Asilomar Conference on Signals, Systems and Computers (2023)

Abstract
Distant Automatic Speech Recognition (ASR) holds the promise of a more natural human-machine interface, and using multiple microphones to acquire speech in such environments often leads to better ASR accuracy. The benefits come from encoding spatial information, which can be used to enhance the speech and estimate the direction of arrival (DOA) of sound. Current ASR systems are based on end-to-end models that require considerable computational resources and are typically deployed in the cloud, which requires the use of a codec to reduce the transmission bandwidth. We present a multichannel speech coding scheme specifically adapted for microphone array signals; unlike typical speech codecs, this scheme preserves the phase relationships of the signals so that the spatial information can be exploited in the cloud. We explore the use of a frequency domain relative transfer function estimator as part of the codec. We also explore the use of a modified discrete cosine transform based Self-Attention Channel Combinator (SACC) front-end for ASR and show that the time domain signal after SACC processing leads to significant improvements in C50. Furthermore, we show that preprocessing the array signals with a de-reverberation method leads to a lower word error rate (WER) and more accurate DOA estimation.
Keywords
Speech Recognition, Beamforming, Microphone Array, Multi-channel Speech, Spatial Information, Time Domain, Time-domain Signal, Discrete Cosine Transform, Array Signal, Signal Preprocessing, Direction Of Arrival, Speech Coding, Direction Of Arrival Estimation, Word Error Rate, Automatic Speech Recognition System, Neural Network, Covariance Matrix, Decoding, Utterances, Spatial Filter, Short-time Fourier Transform, Multi-channel Signals, Bitrate, Reference Channel, Front End, Uniform Linear Array
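
The abstract mentions a frequency domain relative transfer function (RTF) estimator but does not describe it. As a rough illustration only, the sketch below uses a generic textbook estimator (cross-spectrum with a reference channel divided by the reference auto-spectrum, averaged over short-time Fourier transform frames). This is an assumption for illustration, not the authors' method; the function names `stft` and `estimate_rtf`, the Hann window, and the frame sizes are all hypothetical choices.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Hann-windowed STFT of a 1-D signal; returns an array of shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def estimate_rtf(signals, ref=0, frame_len=256, hop=128, eps=1e-12):
    """Estimate per-bin RTFs of every channel relative to channel `ref`.

    signals: array of shape (channels, samples).
    Returns a complex array of shape (channels, bins).
    Illustrative estimator (an assumption, not taken from the paper):
    RTF_m(f) = mean_t[X_m(t, f) * conj(X_ref(t, f))] / mean_t[|X_ref(t, f)|^2]
    """
    spectra = np.stack([stft(ch, frame_len, hop) for ch in signals])  # (C, T, F)
    ref_spec = spectra[ref]                                           # (T, F)
    # Cross-spectrum of each channel with the reference, averaged over frames.
    cross_psd = np.mean(spectra * np.conj(ref_spec)[None], axis=1)    # (C, F)
    # Auto-spectrum of the reference channel, averaged over frames.
    auto_psd = np.mean(np.abs(ref_spec) ** 2, axis=0)                 # (F,)
    return cross_psd / (auto_psd + eps)
```

For a channel that is a scaled, delayed copy of the reference, the magnitude of the estimate approximates the gain and its phase slope across frequency encodes the delay. This inter-channel phase is the kind of spatial information that the abstract says the coding scheme is designed to preserve.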