
Audio-Visual Speech Super-Resolution.

British Machine Vision Conference (2021)

Abstract
In this paper, we present an audio-visual model to perform speech super-resolution at large scale factors (8× and 16×). Previous works attempted to solve this problem using only the audio modality as input and were thus limited to low scale factors of 2× and 4×. In contrast, we propose to incorporate both visual and auditory signals to super-resolve speech with sampling rates as low as 1 kHz. In such challenging situations, the visual features assist in learning the content and improve the quality of the generated speech. Further, we demonstrate the applicability of our approach to arbitrary speech signals where the visual stream is not accessible: our “pseudo-visual network” precisely synthesizes the visual stream solely from the low-resolution speech input. Extensive experiments and the demo video illustrate our method’s remarkable results and benefits over state-of-the-art audio-only speech super-resolution approaches.

Figure 1: We present an audio-visual model for super-resolving very low-resolution speech inputs (for example, 1 kHz) at large scale factors. In contrast to existing audio-only speech super-resolution approaches, our method benefits from the visual stream, either the real visual stream (if available) or the visual stream generated by our pseudo-visual network.
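To make the scale factors concrete: an 8× super-resolution of a 1 kHz input produces an 8 kHz signal, and 16× produces 16 kHz. The sketch below is not the paper's model; it is a minimal naive-interpolation baseline (the kind learned super-resolution methods are compared against), using only NumPy, and the function name `upsample_linear` is my own illustration.

```python
import numpy as np

def upsample_linear(x, scale):
    """Naive baseline: linearly interpolate a low-resolution signal
    by an integer scale factor (e.g. 8x: 1 kHz -> 8 kHz).
    Hypothetical helper for illustration -- NOT the paper's
    audio-visual model, just a simple reference point."""
    n = len(x)
    old_t = np.arange(n)                    # sample indices at the low rate
    new_t = np.arange(n * scale) / scale    # dense indices at the high rate
    return np.interp(new_t, old_t, x)

# 1 second of a 50 Hz tone sampled at 1 kHz, super-resolved at 8x:
lowres = np.sin(2 * np.pi * 50 * np.arange(1000) / 1000.0)
hires = upsample_linear(lowres, 8)
print(len(hires))  # 8000 samples -> effective 8 kHz
```

Interpolation like this only stretches the existing band; it cannot recover the high-frequency speech content above the original Nyquist limit, which is exactly the gap a learned (here, audio-visual) model aims to fill.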