Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer
CoRR (2024)
Abstract
Emotion recognition aims to discern the emotional state of subjects within an
image, relying on subject-centric and contextual visual cues. Current
approaches typically follow a two-stage pipeline: first localize subjects with
off-the-shelf detectors, then perform emotion classification through the late
fusion of subject and context features. However, this complicated paradigm
suffers from disjoint training stages and limited interaction between
fine-grained subject-context elements. To address these challenges, we present a
single-stage emotion recognition approach, employing a Decoupled
Subject-Context Transformer (DSCT), for simultaneous subject localization and
emotion classification. Rather than compartmentalizing training stages, we
jointly leverage box and emotion signals as supervision to enrich
subject-centric feature learning. Furthermore, we introduce DSCT to facilitate
interactions between fine-grained subject-context cues in a decouple-then-fuse
manner. The decoupled query tokens (subject queries and context
queries) gradually intertwine across layers within DSCT, during which spatial
and semantic relations are exploited and aggregated. We evaluate our
single-stage framework on two widely used context-aware emotion recognition
datasets, CAER-S and EMOTIC. Our approach surpasses two-stage alternatives with
fewer parameters, achieving a 3.39
average precision gain on CAER-S and EMOTIC datasets, respectively.
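
The joint supervision described above, in which box and emotion signals are used together rather than in separate training stages, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's formulation: the function names, the L1 box term, the single-label cross-entropy term, and the two loss weights are all hypothetical.

```python
import math


def l1_box_loss(pred_box, gt_box):
    """Mean absolute error between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / len(gt_box)


def emotion_cross_entropy(logits, target_index):
    """Softmax cross-entropy for one subject, single-label case (assumed)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def joint_loss(pred_box, gt_box, emotion_logits, emotion_label,
               box_weight=1.0, emotion_weight=1.0):
    """Weighted sum of a localization term and a classification term.

    Both weights are hypothetical hyperparameters; a single scalar loss
    lets one backward pass update features with both signals at once.
    """
    box_term = l1_box_loss(pred_box, gt_box)
    emotion_term = emotion_cross_entropy(emotion_logits, emotion_label)
    return box_weight * box_term + emotion_weight * emotion_term


if __name__ == "__main__":
    # Toy values: one predicted subject box and seven emotion logits.
    loss = joint_loss([0.4, 0.5, 0.2, 0.2], [0.5, 0.5, 0.2, 0.2],
                      [0.0] * 7, 3)
    print(f"joint loss: {loss:.4f}")
```

A multi-label dataset would need a different classification term (for example, a per-class binary loss); the single-label form above is only the simplest case.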