LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming

Artem Dementyev, Dimitri Kavensky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, Alex Olwal

Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software & Technology, UIST 2023 Adjunct (2023)

Abstract
Speech-to-text capabilities on mobile devices have proven helpful for language translation, note-taking, hearing and speech accessibility, and meeting transcripts. However, their usefulness is constrained by being unable to distinguish between multiple speakers, track which direction speech is coming from, and provide acceptable performance in noisy environments. This work introduces efficient real-time audio localization and adaptive beamforming algorithms on custom sound perception hardware running on a low-power microcontroller and four integrated microphones. A prototype is implemented in a phone case form factor and is plug-and-play with modern smartphones. We characterize the performance in technical evaluations of localization, beamforming, and diarization. We demonstrate how the phone case extends existing smartphones with speaker diarization in a speech-to-text app, sound direction visualization, and sound enhancement through beamforming. In the future, we hope our approach will inspire the widespread adoption of advanced microphone arrays that natively unlock the potential of spatial sound processing and perception in mobile and wearable devices.
Keywords
Speech-to-text, ASR, STT, audio, speech, microphone array, beamforming, accessibility