Similarity metric learning on perturbational datasets improves functional identification of perturbations

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Analysis of high-throughput perturbational datasets, including the Next Generation Connectivity Map (L1000) and the Cell Painting projects, uses similarity metrics to identify perturbations or disease states that induce similar changes in the biological feature space. Similarities among perturbations are then used to identify drug mechanisms of action, to nominate therapeutics for a particular disease, and to construct biological networks among perturbations and genes. Standard similarity metrics include correlations, cosine distance, and gene set enrichment methods, but these methods operate directly on the measured features and do not refine the comparison by transforming the measurement space. We introduce Perturbational Metric Learning (PeML), a weakly supervised similarity metric learning method that learns a data-driven similarity function. The learned function maximizes discrimination of replicate signatures by transforming the biological measurements into an intrinsic, dataset-specific basis. The learned similarity functions show substantial improvement in recovering known biological relationships, such as mechanism of action identification. In addition to capturing a more meaningful notion of similarity, data in the transformed basis can be used for other analysis tasks, such as classification and clustering. Similarity metric learning is a powerful tool for the analysis of large biological datasets.

### Competing Interest Statement

I.S. is a paid consultant for Fauna Bio. BHK is a paid consultant and shareholder of Code Ocean Inc. and is part of the Scientific Advisory Board of the Break Through Cancer (BTC) Foundation.
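To make the abstract's central idea concrete, the following minimal sketch compares cosine similarity on raw features with cosine similarity after a linear change of basis. It is an illustration only, not the PeML implementation: the signatures are made-up toy values, the helper names (`cosine`, `transform`, `transformed_cosine`) are hypothetical, and the matrix `W` is hand-picked here, whereas a metric learning method would fit it from replicate labels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def transform(W, x):
    """Apply a linear map W (given as a list of rows) to a signature x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def transformed_cosine(W, x, y):
    """Cosine similarity measured after mapping both signatures through W."""
    return cosine(transform(W, x), transform(W, y))


# Toy signatures (assumed values): features 0 and 1 carry the perturbation
# effect, while feature 2 is high-variance noise unrelated to the perturbation.
rep_a = [1.0, 0.5, 4.0]    # replicate 1 of perturbation A
rep_b = [1.0, 0.5, -4.0]   # replicate 2 of perturbation A
other = [-1.0, 0.5, 4.0]   # a different perturbation sharing rep_a's noise

# Hand-picked stand-in for a learned transformation: it shrinks the noisy
# feature. A real method would learn W from the data instead.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.1],
]

raw_rep = cosine(rep_a, rep_b)
raw_other = cosine(rep_a, other)
new_rep = transformed_cosine(W, rep_a, rep_b)
new_other = transformed_cosine(W, rep_a, other)

print("raw:         replicate", round(raw_rep, 3), "non-replicate", round(raw_other, 3))
print("transformed: replicate", round(new_rep, 3), "non-replicate", round(new_other, 3))
```

In this toy case the raw cosine ranks the non-replicate pair above the replicate pair because the noisy feature dominates, and the ordering reverses once that feature is down-weighted. This is the sense in which a transformed basis can improve discrimination of replicate signatures.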
* bAUC: balanced area under the receiver operating characteristic curve (auROC)
* CDRP: Center-Driven Research Project, the name of a Cell Painting morphological dataset [35]
* L1000: Luminex L1000, the name of an assay used for the Connectivity Map, or equivalently the dataset generated by that assay
* MoA: mechanism of action
* PCA: principal component analysis
* PeML: Perturbational Metric Learning, the method introduced by this paper
* WSL: weakly supervised learning
perturbational datasets, metric learning, functional identification