Detecting Deep-Fake Videos from Aural and Oral Dynamics

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2021)

Cited by 42 | Views 17
Abstract
A face-swap deep fake replaces a person's face - from eyebrows to chin - with another face. A lip-sync deep fake replaces a person's mouth region to be consistent with an impersonated or synthesized audio track. An overlooked aspect in the creation of these deep-fake videos is the human ear. Statically, the shape of the human ear has been shown to provide a biometric signal. Dynamically, movement of the mandible (lower jaw) causes changes in the shape of the ear and ear canal. While the facial identity in a face-swap deep fake may accurately depict the co-opted identity, the ears belong to the original identity. While the mouth in a lip-sync deep fake may be well synchronized with the audio, the dynamics of the ear motion will be de-coupled from the mouth and jaw motion. We describe a forensic technique that exploits these static and dynamic aural properties.
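The core intuition described in the abstract, that jaw motion deforms the ear region in authentic video while a lip-sync deep fake decouples the two, can be illustrated with a simple consistency check. The sketch below is an illustration only and not the paper's method: it assumes per-frame scalar signals for ear-region deformation and mouth opening have already been extracted by some landmark tracker, and it uses a plain Pearson correlation with an arbitrary threshold.

```python
import numpy as np

def pearson_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two equal-length 1-D signals."""
    x = (x - x.mean()) / (x.std() + 1e-8)
    y = (y - y.mean()) / (y.std() + 1e-8)
    return float(np.mean(x * y))

def aural_oral_consistency(ear_motion: np.ndarray,
                           mouth_opening: np.ndarray,
                           threshold: float = 0.3) -> bool:
    """
    Hypothetical consistency check (not the authors' detector):
    in authentic video, jaw movement deforms the ear region, so the
    two per-frame signals should be positively correlated; a weak
    correlation is flagged as suspicious. Both inputs are per-frame
    scalar measurements of the same length.
    """
    rho = pearson_corr(ear_motion, mouth_opening)
    return rho >= threshold  # True = consistent (likely authentic)

# Toy usage with synthetic signals standing in for tracked landmarks.
t = np.linspace(0, 10, 300)
mouth = np.abs(np.sin(2 * t))                          # mouth-opening proxy
real_ear = 0.2 * mouth + 0.01 * np.random.randn(300)   # coupled ear motion
fake_ear = 0.01 * np.random.randn(300)                 # decoupled ear motion
print(aural_oral_consistency(real_ear, mouth))  # expected: True
print(aural_oral_consistency(fake_ear, mouth))  # expected: False
```

In practice such a check would require robust ear-region tracking and a learned decision rule rather than the fixed threshold used here, which is purely illustrative.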
Keywords
face-swap deep fake,lip-sync deep fake,ear motion,deep-fake videos detection,aural dynamics,oral dynamics,forensic technique