Distinguished In Uniform: Self-Attention vs. Virtual Nodes
CoRR (2024)
Abstract
Graph Transformers (GTs) such as SAN and GPS are graph processing models that
combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were
shown to be universal function approximators, with two reservations: (1) the
initial node features must be augmented with certain positional encodings, and
(2) the approximation is non-uniform, in that graphs of different sizes may
require different approximating networks.
We first clarify that this form of universality is not unique to GTs: using
the same positional encodings, pure MPGNNs, and even 2-layer MLPs, are also
non-uniform universal approximators. We then consider uniform expressivity: The
target function is to be approximated by a single network for graphs of all
sizes. In this setting, we compare GTs to the more efficient MPGNN + Virtual Node
architecture. The essential difference between the two model definitions lies in
their global computation method: Self-Attention vs. Virtual Node. We prove
that neither model is a uniform universal approximator, before proving our
main result: Neither model's uniform expressivity subsumes the other's. We
demonstrate the theory with experiments on synthetic data. We further augment
our study with real-world datasets, observing mixed results that indicate no
clear ranking in practice either.
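To make the contrast concrete, below is a minimal sketch of the two global computation mechanisms the abstract compares. It is illustrative only: the function names, weight shapes, and the use of plain NumPy are assumptions for this sketch, not the paper's implementation, and real GT or MPGNN+VN models embed these steps inside full message-passing layers.

```python
import numpy as np

def self_attention_global(X, Wq, Wk, Wv):
    """GT-style global self-attention: every node attends to every
    node, independently of the graph's edges. (Hypothetical helper.)"""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (n, n) pairwise logits
    scores -= scores.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V                             # (n, d) updated node features

def virtual_node_global(X, W_read, W_write):
    """MPGNN+VN-style global exchange: all nodes are pooled into one
    virtual-node vector, which is then broadcast back to every node."""
    v = np.tanh(X.sum(axis=0) @ W_read)            # readout: nodes -> virtual node
    return X + np.tanh(v @ W_write)                # broadcast: virtual node -> nodes

# Illustrative usage: 5 nodes with 8-dimensional features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv, W_read, W_write = (rng.normal(size=(8, 8)) for _ in range(5))
print(self_attention_global(X, Wq, Wk, Wv).shape)     # (5, 8)
print(virtual_node_global(X, W_read, W_write).shape)  # (5, 8)
```

The shapes expose the efficiency gap the abstract alludes to: self-attention forms an n × n interaction matrix (quadratic in the number of nodes), whereas the virtual node routes all global communication through a single d-dimensional vector (linear in the number of nodes).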