Talking Face Generation for Impression Conversion Considering Speech Semantics

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
This study investigates a talking face generation method that converts a speaker’s video to give a target impression, such as “favorable” or “considerate”. Such an impression conversion method needs to consider the semantics of the input speech, because they affect the impression of a speaker’s video along with the facial expression. Conventional emotional talking face generation methods use speech information to synchronize the lips of the output video with the speech, but they cannot account for speech semantics because their speech representations contain only phonetic information. To solve this problem, we propose a facial expression conversion model that uses a semantic vector obtained from BERT embeddings of the speech recognition results of the input speech. We first constructed an audio-visual dataset with impression labels assigned to each utterance. Evaluation results on this dataset showed that the proposed method improves the estimation accuracy of the facial expressions of the target video.
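The abstract describes deriving an utterance-level semantic vector from BERT embeddings of the speech recognition output. The paper does not specify the BERT variant, the pooling strategy, or the ASR system, so the sketch below is only an illustration of one plausible way to obtain such a vector; the model name, mean pooling, and example transcript are assumptions.

```python
# Sketch: utterance-level semantic vector from an ASR transcript via BERT.
# Assumptions: bert-base-uncased and mean pooling are illustrative choices,
# not the configuration reported in the paper.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

asr_transcript = "thank you so much for coming today"  # hypothetical ASR output

with torch.no_grad():
    inputs = tokenizer(asr_transcript, return_tensors="pt")
    hidden = bert(**inputs).last_hidden_state         # (1, seq_len, 768) token embeddings
    semantic_vector = hidden.mean(dim=1).squeeze(0)    # (768,) utterance-level vector

# This vector would condition the facial expression conversion model
# alongside the phonetic speech features and the target impression label.
print(semantic_vector.shape)  # torch.Size([768])
```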
Keywords
Impression Conversion, Talking Face Generation, Keypoint