MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
arXiv (2024)
Abstract
Decoding natural visual scenes from brain activity has flourished, with
extensive research on single-subject tasks but comparatively little on
cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is a
challenging problem due to profound individual differences between subjects and
the scarcity of data annotation. In this work, we propose MindTuner for
cross-subject visual decoding, which achieves high-quality, semantically rich
reconstructions using only 1 hour of fMRI training data, benefiting from the
phenomenon of visual fingerprints in the human visual system and a novel
fMRI-to-text alignment paradigm. First, we pre-train a multi-subject model on
7 subjects and fine-tune it on new subjects with scarce data, using LoRAs
and Skip-LoRAs to learn the visual fingerprint. Then, we use the image
modality as an intermediate pivot to achieve fMRI-to-text alignment, which
yields strong fMRI-to-text retrieval performance and corrects fMRI-to-image
reconstructions with fine-tuned semantics. Both qualitative and quantitative
analyses show that MindTuner outperforms state-of-the-art cross-subject visual
decoding models on the Natural Scenes Dataset (NSD), with either 1 hour or
40 hours of training data.
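The abstract names LoRA adapters (together with a Skip-LoRA variant whose structure it does not describe) as the way a pre-trained multi-subject model is adapted to a new subject. The following is a minimal NumPy sketch of standard LoRA on a single linear layer, assuming the pre-trained weights stay frozen and only the two low-rank factors are trained for each new subject. The class name, rank, and layer sizes are hypothetical, and Skip-LoRA is not modeled.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update B @ A (standard LoRA).

    Hypothetical illustration: `weight` stands in for weights pre-trained on
    several subjects; A and B are the only per-subject trainable parameters.
    """

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                # frozen, shared across subjects
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self) -> int:
        # Only the low-rank factors would be updated during fine-tuning
        return self.A.size + self.B.size


# Usage with illustrative sizes (not taken from the paper)
rng = np.random.default_rng(42)
shared_W = rng.normal(size=(256, 1024))
adapter = LoRALinear(shared_W, rank=8)
x = rng.normal(size=(4, 1024))
y = adapter(x)
```

Because B starts at zero, the adapted layer initially reproduces the shared model exactly, and with rank 8 it trains 10,240 parameters instead of the 262,144 in the full weight matrix. Keeping the per-subject parameter count this small is one reason low-rank adapters suit a setting with only about 1 hour of data from a new subject.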
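The abstract says only that the image modality serves as a pivot for fMRI-to-text alignment. One possible reading, sketched below under stated assumptions, is that fMRI signals are mapped into a joint image-text embedding space (for example, CLIP-style), where paired image and caption embeddings already lie close together. Under that assumption, fMRI embeddings trained against image embeddings can retrieve captions by cosine similarity without direct fMRI-text supervision. The toy data, noise levels, and function names are hypothetical, and this is not MindTuner's implementation.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def retrieve_text(fmri_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """For each fMRI embedding, return the index of the most similar text embedding."""
    sims = l2_normalize(fmri_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)


# Hypothetical toy data: captions and images share a joint space (image ~ text),
# and the fMRI encoder was fit only to image embeddings (fMRI ~ image).
rng = np.random.default_rng(0)
n, d = 8, 64
text_emb = rng.normal(size=(n, d))
image_emb = text_emb + 0.1 * rng.normal(size=(n, d))
fmri_emb = image_emb + 0.1 * rng.normal(size=(n, d))

pred = retrieve_text(fmri_emb, text_emb)
top1 = float((pred == np.arange(n)).mean())  # fraction of fMRI samples matched to their own caption
```

The design point the sketch illustrates is transitivity: if fMRI is aligned to images and images are aligned to text, fMRI-to-text retrieval works as a by-product, so no paired fMRI-caption objective is needed.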