Multi-Modal Domain Distribution Dilation For Text-Based Person Retrieval

2023 China Automation Congress (CAC) (2023)

The goal of text-based person retrieval is to identify the target person in a large gallery of person images, given a text query. Many previous methods face a limited domain distribution (LD2) dilemma. To solve this problem, we propose a novel Multi-modal Domain Distribution Dilation (MD3) framework for text-based person retrieval. MD3 consists of two streams: an original distribution stream (ODS) and a dilated distribution stream (DDS). A Visual Distribution Dilating (VDD) module is proposed to perturb key attributes of the raw input image, such as brightness, contrast, and saturation. A Textual Distribution Dilating (TDD) module is also adopted to vary the textual domain distribution. To adapt to various domain distributions in a reasonable and effective way, we adopt a mutual learning mechanism that facilitates communication and learning between the two streams, which carry different distribution information. We conducted extensive experiments on the widely used CUHK-PEDES, RSTPReid, and ICFG-PEDES datasets to verify the effectiveness of MD3. Compared with existing methods, MD3 achieves state-of-the-art performance.
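As a rough illustration of the two ideas named in the abstract, the sketch below perturbs the brightness, contrast, and saturation of an image and computes a symmetric KL term between the outputs of two streams. It is only a sketch under stated assumptions: the abstract does not give the VDD formulation or the form of the mutual learning loss, so the function names (`perturb_color`, `mutual_kl`), the jitter ranges, and the choice of symmetric KL divergence are all hypothetical and not the authors' method.

```python
import numpy as np

def perturb_color(image, brightness=0.2, contrast=0.2, saturation=0.2, rng=None):
    """Randomly jitter brightness, contrast and saturation of an RGB image in [0, 1].

    Assumed stand-in for a visual distribution perturbation; ranges are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=np.float64)
    # Sample one multiplicative factor per attribute from [1 - a, 1 + a].
    b, c, s = (rng.uniform(1 - a, 1 + a) for a in (brightness, contrast, saturation))
    # Brightness: scale all intensities.
    img = np.clip(img * b, 0.0, 1.0)
    # Contrast: scale deviations from the global mean intensity.
    mean = img.mean()
    img = np.clip((img - mean) * c + mean, 0.0, 1.0)
    # Saturation: scale deviations from the per-pixel luma (grayscale) value.
    gray = (img @ np.array([0.299, 0.587, 0.114]))[..., None]
    return np.clip((img - gray) * s + gray, 0.0, 1.0)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mutual_kl(logits_a, logits_b, eps=1e-12):
    """Symmetric KL divergence between two streams' predictions, averaged over the batch.

    Assumption: mutual learning is modelled here as KL(p||q) + KL(q||p).
    """
    p, q = softmax(np.asarray(logits_a, float)), softmax(np.asarray(logits_b, float))
    log_ratio = np.log(p + eps) - np.log(q + eps)
    kl_pq = np.sum(p * log_ratio, axis=-1)
    kl_qp = np.sum(q * -log_ratio, axis=-1)
    return float(np.mean(kl_pq + kl_qp))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 4, 3))            # toy H x W x 3 image in [0, 1]
    dilated = perturb_color(image, rng=rng)  # input for a hypothetical dilated stream
    logits_ods = rng.normal(size=(5, 10))    # toy outputs of the original stream
    logits_dds = rng.normal(size=(5, 10))    # toy outputs of the dilated stream
    print(dilated.shape, mutual_kl(logits_ods, logits_dds))
```

In this sketch the original image would feed one stream and the perturbed image the other, with the symmetric KL term added to each stream's retrieval loss so that the two streams regularize one another.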
Keywords
text-based person retrieval, person reidentification, cross-modal retrieval, color domain distribution, mutual learning