Spurious reconstruction from brain activity
arXiv (2024)
Abstract
Advances in brain decoding, particularly visual image reconstruction, have
sparked discussions about the societal implications and ethical considerations
of neurotechnology. As these methods aim to recover visual experiences from
brain activity and achieve prediction beyond training samples (zero-shot
prediction), it is crucial to assess their capabilities and limitations to
inform public expectations and regulations. Our case study of recent
text-guided reconstruction methods, which leverage a large-scale dataset, the Natural Scenes Dataset (NSD),
and text-to-image diffusion models, reveals limitations in their
generalizability. We found decreased performance when applying these methods to
a different dataset designed to prevent category overlaps between training and
test sets. UMAP visualization of the text features with NSD images showed
limited diversity of semantic and visual clusters, with overlap between
training and test sets. Formal analysis and simulations demonstrated that
clustered training samples can lead to "output dimension collapse," restricting
predictable output feature dimensions. Diversifying the training set improved
generalizability. However, text features alone are insufficient for mapping to
the visual space. We argue that recent photo-like reconstructions may primarily
be a blend of classification into trained categories and generation of
inauthentic images through text-to-image diffusion (hallucination). Diverse
datasets and compositional representations spanning the image space are
essential for genuine zero-shot prediction. Interdisciplinary discussions
grounded in an understanding of the technology's current capabilities,
limitations, and ethical considerations are crucial for its responsible
development.
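The abstract does not spell out the formal analysis behind "output dimension collapse", so the following is only a minimal sketch of the general mechanism for one simple case, assuming a linear ridge-regression decoder from brain activity to output features. For such a decoder, W = (XᵀX + αI)⁻¹XᵀY, so any prediction xW is a linear combination of the training target rows of Y. If every training target is one of K category prototypes, predictions for any new input are confined to at most K dimensions, however novel the input is. The simulated encoding matrix `A`, the noise level, the sizes, and the helper names (`brain_response`, `fit_ridge`, `effective_rank`, `mean_pattern_corr`, `run`) are hypothetical choices for illustration, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OUT, N_IN, N_TRAIN, N_TEST = 50, 100, 500, 200

# Hypothetical linear "encoding" from output features to simulated brain responses.
A = rng.standard_normal((N_OUT, N_IN)) / np.sqrt(N_OUT)


def brain_response(Y, noise=0.5):
    """Simulated brain activity: linear encoding of features plus Gaussian noise."""
    return Y @ A + noise * rng.standard_normal((Y.shape[0], N_IN))


def fit_ridge(X, Y, alpha=1.0):
    """Ridge decoder W = (X^T X + alpha I)^-1 X^T Y (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


def effective_rank(P, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(P, compute_uv=False)
    return int((s > rel_tol * s[0]).sum())


def mean_pattern_corr(P, Y):
    """Mean Pearson correlation between each predicted and true feature vector."""
    Pc = P - P.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    r = (Pc * Yc).sum(axis=1) / (
        np.linalg.norm(Pc, axis=1) * np.linalg.norm(Yc, axis=1)
    )
    return float(r.mean())


def run(Y_train):
    """Train on Y_train, then decode novel test stimuli spanning all output dims."""
    W = fit_ridge(brain_response(Y_train), Y_train)
    Y_test = rng.standard_normal((N_TEST, N_OUT))
    P = brain_response(Y_test) @ W
    return effective_rank(P), mean_pattern_corr(P, Y_test)


# Clustered training set: every target is one of K category prototypes.
K = 5
centers = rng.standard_normal((K, N_OUT))
Y_clustered = centers[rng.integers(0, K, N_TRAIN)]

# Diverse training set: targets spread over the whole output space.
Y_diverse = rng.standard_normal((N_TRAIN, N_OUT))

rank_c, corr_c = run(Y_clustered)
rank_d, corr_d = run(Y_diverse)
print(f"clustered: rank={rank_c}, corr={corr_c:.2f}")
print(f"diverse:   rank={rank_d}, corr={corr_d:.2f}")
```

Under these assumptions, the clustered run should report an effective prediction rank of at most K, while the diverse run uses all output dimensions and tracks the novel test targets more closely; the exact numbers depend on the seed and the chosen sizes. A nonlinear decoder would not obey the span argument exactly, so this only illustrates the linear case.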