LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming.

Artem Dementyev, Dimitri Kanevsky, Samuel Yang, Mathieu Parvaix, Chiong Lai, Alex Olwal

UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (2023)

Abstract
Speech-to-text capabilities on mobile devices have proven helpful for language translation, note-taking, hearing and speech accessibility, and meeting transcripts. However, their usefulness is constrained by being unable to distinguish between multiple speakers, track which direction speech is coming from, and provide acceptable performance in noisy environments. This work introduces efficient real-time audio localization and adaptive beamforming algorithms on custom sound perception hardware running on a low-power microcontroller and four integrated microphones. A prototype is implemented in a phone case form factor and is plug-and-play with modern smartphones. We characterize the performance in technical evaluations of localization, beamforming, and diarization. We demonstrate how the phone case extends existing smartphones with speaker diarization in a speech-to-text app, sound direction visualization, and sound enhancement through beamforming. In the future, we hope our approach will inspire the widespread adoption of advanced microphone arrays that natively unlock the potential of spatial sound processing and perception in mobile and wearable devices.