Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods
arXiv (2024)
Abstract
The Connectivity Map (CMap) is a large, publicly available database of
cellular transcriptomic responses to chemical and genetic perturbations, built
using a standardized acquisition protocol known as the L1000 technique.
Databases such as CMap offer an exciting opportunity to enrich drug discovery
efforts by providing a 'known' phenotypic landscape to explore and by enabling
the development of state-of-the-art techniques for richer information
extraction and better-informed decisions. Whilst multiple methods for measuring
phenotypic similarity and interrogating profiles have been developed, the field
severely lacks standardized benchmarks that use appropriate data splitting for
the training and unbiased evaluation of machine learning methods. To address this,
we have developed 'Leak Proof CMap' and demonstrated its application to a set of
common transcriptomic and generic phenotypic similarity methods, along with an
exemplar triplet loss-based method. Benchmarking in three critical performance
areas (compactness, distinctness, and uniqueness) is conducted using carefully
crafted data splits which ensure that similar cell lines, and treatments with
shared or closely matching responses or mechanisms of action, never appear in
more than one of the training, validation, and test sets. This enables models
to be tested on unseen samples, akin to exploring treatments with novel modes
of action in novel patient-derived cell lines. With a carefully crafted
benchmark and data splitting regime in place, the tooling now exists to create
performant phenotypic similarity methods for use in personalized medicine
(novel cell lines) and to better augment high-throughput phenotypic screening
technologies with the L1000 transcriptomic technology.
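The leak-free splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each cell line and each treatment has already been assigned to a similarity group (for example, treatments grouped by shared mechanism of action), and the names `leak_free_split`, `assign_groups`, `cell_group`, and `treatment_group` are hypothetical. Whole groups are assigned to train, validation, or test, and any sample whose cell-line group and treatment group land in different splits is discarded, so no group is shared across splits.

```python
import random
from collections import defaultdict


def assign_groups(labels, fractions, rng, names=("train", "val", "test")):
    """Shuffle group labels and cut them into consecutive blocks by fraction.

    Assumes len(fractions) == len(names) == 3 (hypothetical train/val/test).
    """
    labels = sorted(labels)  # sort first so the shuffle is reproducible
    rng.shuffle(labels)
    n = len(labels)
    bounds, cum = [0], 0.0
    for frac in fractions[:-1]:
        cum += frac
        bounds.append(round(cum * n))
    bounds.append(n)  # the last split takes any rounding remainder
    return {g: name
            for i, name in enumerate(names)
            for g in labels[bounds[i]:bounds[i + 1]]}


def leak_free_split(samples, cell_group, treatment_group,
                    fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (sample_id, cell_line, treatment) tuples without group leakage.

    cell_group / treatment_group map each cell line / treatment to a
    precomputed similarity-group label (assumed to exist upstream).
    """
    rng = random.Random(seed)
    cell_split = assign_groups(set(cell_group.values()), fractions, rng)
    trt_split = assign_groups(set(treatment_group.values()), fractions, rng)

    splits, dropped = defaultdict(list), []
    for sample_id, cell, trt in samples:
        a = cell_split[cell_group[cell]]
        b = trt_split[treatment_group[trt]]
        if a == b:
            splits[a].append(sample_id)
        else:
            # Keeping this sample would put its cell-line group and its
            # treatment group in different splits, so it is discarded.
            dropped.append(sample_id)
    return dict(splits), dropped
```

Discarding cross-split samples costs data, but it is the simplest way to guarantee that both the cell line and the treatment of every test sample are unseen during training. The split fractions and seed here are arbitrary placeholders.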
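The abstract mentions an exemplar triplet loss-based method but does not give its formulation. For orientation, the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), is sketched below in plain Python using Euclidean distance. How anchors, positives, and negatives would be drawn from L1000 profiles (for example, positives sharing a treatment and negatives from a different treatment) is an assumption here, not a detail taken from the paper, and the function names are hypothetical.

```python
import math


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors.

    Zero when the positive is closer to the anchor than the negative by at
    least `margin`; otherwise grows linearly with the violation.
    """
    d_pos = math.dist(anchor, positive)  # Euclidean distance
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets, margin=1.0):
    """Mean loss over a list of (anchor, positive, negative) triplets."""
    if not triplets:
        raise ValueError("need at least one triplet")
    losses = [triplet_margin_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


# A satisfied triplet (positive at distance 1, negative at distance 5)
# contributes 0; the reversed triplet contributes 5 - 1 + 1 = 5.
print(batch_triplet_loss([
    ([0.0, 0.0], [1.0, 0.0], [3.0, 4.0]),
    ([0.0, 0.0], [3.0, 4.0], [1.0, 0.0]),
]))  # → 2.5
```

In a trained model the vectors would be learned embeddings of expression profiles rather than raw coordinates; the margin sets how far apart dissimilar treatments must be pushed.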