Transcriptomics-guided Slide Representation Learning in Computational Pathology
CVPR 2024
Abstract
Self-supervised learning (SSL) has been successful in building patch
embeddings of small histology images (e.g., 224x224 pixels), but scaling these
models to learn slide embeddings from the entirety of giga-pixel whole-slide
images (WSIs) remains challenging. Here, we leverage complementary information
from gene expression profiles to guide slide representation learning using
multimodal pre-training. Expression profiles constitute highly detailed
molecular descriptions of a tissue that we hypothesize offer a strong
task-agnostic training signal for learning slide embeddings. Our slide and
expression (S+E) pre-training strategy, called Tangle, employs
modality-specific encoders, the outputs of which are aligned via contrastive
learning. Tangle was pre-trained on samples from three different organs: liver
(n=6,597 S+E pairs), breast (n=1,020), and lung (n=1,012) from two different
species (Homo sapiens and Rattus norvegicus). Across three independent test
datasets consisting of 1,265 breast WSIs, 1,946 lung WSIs, and 4,584 liver
WSIs, Tangle shows significantly better few-shot performance compared to
supervised and SSL baselines. When assessed using prototype-based
classification and slide retrieval, Tangle also shows a substantial performance
improvement over all baselines. Code is available at
https://github.com/mahmoodlab/TANGLE.
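
The abstract states only that the outputs of the slide and expression encoders are aligned via contrastive learning; it does not specify the objective. As a minimal illustration, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired embeddings, which may differ from what Tangle actually uses. The function names, the temperature value, and the toy 2-D embeddings are hypothetical and chosen only for this example; embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity_matrix(slide_embs, expr_embs):
    """Entry [i][j] is the cosine similarity of slide i and expression profile j."""
    slides = [l2_normalize(v) for v in slide_embs]
    exprs = [l2_normalize(v) for v in expr_embs]
    return [[sum(a * b for a, b in zip(s, e)) for e in exprs] for s in slides]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(slide_embs, expr_embs, temperature=0.1):
    """Assumed CLIP-style loss: average of slide-to-expression and
    expression-to-slide cross-entropy over temperature-scaled similarities."""
    sim = cosine_similarity_matrix(slide_embs, expr_embs)
    logits = [[x / temperature for x in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the reverse direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch of two slide+expression (S+E) pairs with 2-D embeddings.
slide_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_expr = [[1.0, 0.0], [0.0, 1.0]]     # each slide agrees with its own profile
mismatched_expr = [[0.0, 1.0], [1.0, 0.0]]  # profiles swapped between slides

loss_matched = symmetric_contrastive_loss(slide_batch, matched_expr)
loss_mismatched = symmetric_contrastive_loss(slide_batch, mismatched_expr)
print(loss_matched < loss_mismatched)
```

Under this assumed objective the loss is small when each slide embedding is closest to its own expression embedding and large when pairs are swapped, which is the sense in which expression profiles can serve as a training signal for the slide encoder.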