(Retracted) Mimicking human vision systems: deep-learning-based feature fusion for semantic image retrieval

Zhongzhe Chen, Luming Zhang

Journal of Electronic Imaging (2023)

Abstract
Cross-form feature combination is an important multifeature fusion technique whose purpose is to implicitly discover the relationship between samples from different modalities, i.e., to retrieve another image encoding similar semantics from one example image. Over the past decade, cross-modal image retrieval has become a research hotspot investigated by many researchers, and it is now a significant tool for further improving image retrieval performance. A long short-term memory (LSTM)-based feature fusion model is proposed. First, motivated by the competitiveness of nonmixed deep architectures for image retrieval, the mechanism of the LSTM is introduced in detail; ground-truth-based methods are used to improve cross-modality alignment. We observe that the LSTM can closely mimic human visual understanding of image semantics. To improve the accuracy of oblique-form image retrieval, systems based on binary representation are proposed to improve cross-modal similarity and the effectiveness of message recovery. Second, we use a quality model to measure commonly used low- and high-level visual features and discard the disqualified features accordingly, which yields an optimal set of highly descriptive features for image retrieval. Furthermore, we use the LSTM and the refined visual features to build a biologically inspired model for image retrieval, wherein the multimodal features can be optimally incorporated at the temporal level. Extensive experimental validation on multiple well-known image sets demonstrates the superiority of our method.
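The abstract describes the fusion pipeline only at a high level, so the PyTorch sketch below is a minimal, assumption-laden illustration of one reading of it: per-modality feature vectors are treated as a short sequence fed to an LSTM, and the final hidden state is mapped to a compact binary code for retrieval. The class name LSTMFeatureFusion, all dimensions, and the hashing head are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn


class LSTMFeatureFusion(nn.Module):
    """Hypothetical sketch: fuse per-modality feature vectors by treating
    them as a short sequence, letting an LSTM integrate them step by step,
    and mapping the final hidden state to a compact retrieval code."""

    def __init__(self, feature_dim: int = 512, hidden_dim: int = 256, code_bits: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # Linear "hashing" head: the abstract mentions binary representations,
        # so the fused descriptor is squashed to (-1, 1) and later thresholded to bits.
        self.hash_head = nn.Linear(hidden_dim, code_bits)

    def forward(self, modality_features: torch.Tensor) -> torch.Tensor:
        # modality_features: (batch, num_modalities, feature_dim);
        # each LSTM time step consumes one modality's feature vector.
        _, (h_n, _) = self.lstm(modality_features)
        fused = h_n[-1]                         # final hidden state, (batch, hidden_dim)
        return torch.tanh(self.hash_head(fused))


if __name__ == "__main__":
    model = LSTMFeatureFusion()
    feats = torch.randn(8, 3, 512)              # 8 images, 3 modality features each, 512-d
    codes = model(feats)                        # (8, 64), values in (-1, 1)
    binary_codes = (codes > 0).to(torch.uint8)  # threshold to {0, 1} bits for retrieval
    print(binary_codes.shape)                   # torch.Size([8, 64])
```

Feeding modalities as LSTM time steps is one literal reading of "incorporating multimodal features at the temporal level"; concatenating features and fusing them with a gated layer would be an equally plausible interpretation of the abstract.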
Keywords
feature fusion, human vision systems, deep learning