Transformers and Cortical Waves: Encoders for Pulling In Context Across Time
CoRR (2024)
Abstract
The capabilities of transformer networks such as ChatGPT and other Large
Language Models (LLMs) have captured the world's attention. The crucial
computational mechanism underlying their performance relies on transforming a
complete input sequence (for example, all the words in a sentence) into a long
"encoding vector" that allows transformers to learn long-range temporal
dependencies in naturalistic sequences. Specifically, "self-attention" applied
to this encoding vector enhances temporal context in transformers by computing
associations between pairs of words in the input sequence. We suggest that
waves of neural activity, traveling across single cortical regions or across
multiple regions at the whole-brain scale, could implement a similar encoding
principle. By encapsulating recent input history into a single spatial pattern
at each moment in time, cortical waves may enable temporal context to be
extracted from sequences of sensory inputs, the same computational principle
used in transformers.
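The pairwise-association step the abstract refers to can be illustrated with a minimal scaled dot-product self-attention sketch. This is a generic textbook formulation, not the paper's own implementation; the weight matrices and dimensions below are hypothetical placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence.

    X: (seq_len, d_model) input embeddings, one row per word.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (hypothetical weights).
    Returns a context-mixed encoding of the same sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Pairwise word-word association scores across the whole sequence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over the sequence axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each position pulls in context from every other position.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Each output row mixes information from the entire input, which is the sense in which the encoding "pulls in context across time".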