Confidence-based dynamic cross-modal memory network for image aesthetic assessment

Pattern Recognition (2024)

Abstract
Image aesthetic assessment (IAA) aims to design algorithms that make human-like aesthetic decisions. Because aesthetic judgment is highly subjective and complex, visual information alone is insufficient to fully predict the aesthetic quality of an image. A growing number of researchers therefore draw on complementary information from user comments. However, user comments are not always available, for various technical and practical reasons, so it is necessary to reconstruct the missing textual information for aesthetic prediction from visual information only. This paper addresses the problem by proposing a Confidence-based Dynamic Cross-modal Memory Network (CDCM-Net). CDCM-Net consists of two key components: a Visual and Textual Memory (VTM) network and a Confidence-based Dynamical Multi-modal Fusion (CDMF) module. VTM is based on the key-value memory network and consists of a visual key memory and a textual value memory. The visual key memory learns the visual information, while the textual value memory learns to remember the textual features and align them with the corresponding visual features. During inference, textual information can thus be reconstructed using only visual features. Furthermore, the CDMF module performs trustworthy fusion: it evaluates modality-level informativeness and then dynamically integrates the reliable information. Extensive experiments demonstrate the superiority of the proposed method.
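The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the two ideas under stated assumptions, not the paper's implementation. It assumes dot-product addressing with a softmax over the visual keys, a weighted read of the textual values, and fusion by normalized scalar confidences. The function names (`read_textual_memory`, `confidence_fusion`), the toy memory contents, and the confidence values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_textual_memory(visual_feature, key_memory, value_memory):
    """Reconstruct a textual feature from a visual feature.

    Assumption: each visual key slot is paired with one textual value slot,
    and addressing uses softmax-normalized dot-product similarity.
    """
    weights = softmax([dot(visual_feature, key) for key in key_memory])
    dim = len(value_memory[0])
    return [
        sum(w * value[i] for w, value in zip(weights, value_memory))
        for i in range(dim)
    ]


def confidence_fusion(visual_feature, textual_feature, visual_conf, textual_conf):
    """Fuse two modalities, weighting each by its normalized confidence.

    Assumption: confidences are positive scalars and both features share
    the same dimensionality.
    """
    total = visual_conf + textual_conf
    w_visual = visual_conf / total
    w_textual = textual_conf / total
    return [
        w_visual * v + w_textual * t
        for v, t in zip(visual_feature, textual_feature)
    ]


# Toy example: two memory slots with hand-picked (hypothetical) contents.
key_memory = [[1.0, 0.0], [0.0, 1.0]]
value_memory = [[10.0, 0.0], [0.0, 10.0]]
visual = [5.0, 0.0]

reconstructed_text = read_textual_memory(visual, key_memory, value_memory)
fused = confidence_fusion(visual, reconstructed_text, 0.75, 0.25)
```

In this sketch the confidences are passed in by hand. In the method described above they would come from the learned evaluation of modality-level informativeness.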
Keywords
Image aesthetic assessment (IAA), Memory-based network, Dynamical multi-modal fusion