Direct prediction of Homologous Recombination Deficiency from routine histology in ten different tumor types with attention-based Multiple Instance Learning: a development and validation study
medRxiv (Cold Spring Harbor Laboratory) (2023)
Abstract
Background Homologous Recombination Deficiency (HRD) is a pan-cancer predictive biomarker that identifies patients who benefit from therapy with PARP inhibitors (PARPi). However, testing for HRD is highly complex. Here, we investigated whether Deep Learning can predict HRD status solely based on routine Hematoxylin & Eosin (H&E) histology images in ten cancer types.
Methods We developed a fully automated deep learning pipeline with attention-weighted multiple instance learning (attMIL) to predict HRD status from histology images. A combined genomic scar HRD score, which integrated loss of heterozygosity (LOH), telomeric allelic imbalance (TAI) and large-scale state transitions (LST), was calculated from whole genome sequencing data for n=4,565 patients from two independent cohorts. The primary statistical endpoint was the Area Under the Receiver Operating Characteristic curve (AUROC) for the prediction of genomic scar HRD with a clinically used cutoff value.
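As a minimal sketch of the endpoint described in the Methods, the combined genomic scar score can be read as the sum of the three components (LOH, TAI, LST), binarized at a cutoff and evaluated with the AUROC. The cutoff of 42 is an assumption (a value commonly cited for combined genomic scar scores; the abstract only states "a clinically used cutoff value"), and the function names and example numbers are hypothetical, not taken from the authors' code.

```python
from typing import Sequence

# Assumption: 42 is a commonly cited cutoff for the combined score.
# The abstract itself only says "a clinically used cutoff value".
HRD_CUTOFF = 42


def hrd_score(loh: int, tai: int, lst: int) -> int:
    """Combined genomic scar score: sum of the LOH, TAI and LST components."""
    return loh + tai + lst


def is_hrd_high(score: int, cutoff: int = HRD_CUTOFF) -> bool:
    """Binarize the combined score into HRD-high (True) or HRD-low (False)."""
    return score >= cutoff


def auroc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (LOH, TAI, LST) per patient and model outputs.
components = [(10, 12, 15), (20, 18, 25), (5, 6, 8), (16, 14, 12)]
labels = [is_hrd_high(hrd_score(*c)) for c in components]
predictions = [0.2, 0.9, 0.4, 0.3]
print(auroc(labels, predictions))
```

The pairwise formulation is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable for cohorts of several thousand patients.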
Results We found that HRD status is predictable in tumors of the endometrium, pancreas and lung, reaching cross-validated AUROCs of 0.79, 0.58 and 0.66, respectively. Predictions generalized well to an external cohort, with AUROCs of 0.93, 0.81 and 0.73, respectively. Additionally, an HRD classifier trained on breast cancer yielded an AUROC of 0.78 in internal validation and was able to predict HRD in endometrial, prostate and pancreatic cancer with AUROCs of 0.87, 0.84 and 0.67, indicating that an HRD-like phenotype is shared across tumor entities.
Conclusion In this study, we show that HRD is directly predictable from H&E slides using attMIL within and across ten different tumor types.
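The attention-weighted pooling at the core of attMIL can be illustrated with a short sketch. It follows the generic formulation from the attention-based multiple instance learning literature: each tile embedding receives a score w·tanh(V·h), the scores are normalized with a softmax, and the slide-level embedding is the weighted sum of the tile embeddings. The abstract does not specify the authors' architecture, so the parameter names `V` and `w`, the dimensions and the toy values are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    tiles: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool tile embeddings into one slide embedding.

    Returns (attention weights per tile, slide-level embedding).
    V and w stand in for learned parameters (hypothetical values here).
    """
    logits = []
    for h in tiles:
        # Hidden representation tanh(V h), one entry per row of V.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Scalar attention score w . tanh(V h) for this tile.
        logits.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over all tiles of the slide.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Slide embedding: attention-weighted sum of the tile embeddings.
    dim = len(tiles[0])
    slide = [sum(a * h[j] for a, h in zip(attention, tiles)) for j in range(dim)]
    return attention, slide


# Hypothetical toy slide with three 2-dimensional tile embeddings.
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
attention, slide = attention_pool(tiles, V, w)
print(attention, slide)
```

In a full pipeline the slide embedding would be passed to a classification head that outputs the HRD prediction, and the attention weights could be mapped back onto the slide to highlight the tiles that drive it.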
### Competing Interest Statement
JNK reports consulting services for Owkin, France, Panakeia, UK and DoMore Diagnostics, Norway and has received honoraria for lectures by MSD, Eisai and Fresenius. JSRF reports a leadership (board of directors) role at Grupo Oncoclinicas, stock or other ownership interests at Repare Therapeutics and Paige.AI, and a consulting or Advisory Role at Genentech/Roche, Invicro, Ventana Medical Systems, Volition RX, Paige.AI, Goldman Sachs, Bain Capital, Novartis, Repare Therapeutics, Lilly, Saga Diagnostics, Swarm and Personalis. No other potential conflicts of interest are reported by any of the authors.
### Funding Statement
JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111) and the Max-Eder-Programme of the German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C), and the German Academic Exchange Service (SECAI, 57616814). This research was supported by the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. JSRF is funded in part by the Breast Cancer Research Foundation, a Susan G Komen Leadership Grant, the NIH/NCI P50 CA247749 01 grant and by the NIH/NCI Cancer Center Core Grant P30-CA008748.
### Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
The WSI, molecular and clinical data for the TCGA and CPTAC cohorts are publicly accessible at and (accessed 08 March 2022). The script for calculating the HRD score is available under (accessed 06 June 2022). All other source code can be downloaded under . Our calculated HRD score is publicly available in Supplementary Table 2. Moreover, our custom TCGA-BRCA HRD-H and HRD-L groups can be accessed for the PanCancer Atlas cohort at (Supplementary 3).
AI: artificial intelligence
ASCAT: Allele-Specific Copy number Analysis of Tumors
attMIL: attention-weighted multiple instance learning
AUROC: Area Under the Receiver Operating Characteristic curve
BRCA: breast invasive carcinoma
BRCA1/2: Breast Cancer genes 1 and 2
CI: confidence interval
CIOMS: Council for International Organizations of Medical Sciences
CPTAC: Clinical Proteomic Tumor Analysis Consortium
CRC: colorectal cancer
DL: Deep Learning
DSB: DNA double-strand breaks
ER-: estrogen receptor negative
ER+: estrogen receptor positive
FDA: U.S. Food and Drug Administration
GBM: glioblastoma
GDC: Genomic Data Commons
GIS: genomic instability score
H&E: Hematoxylin & Eosin
HR: homologous recombination
HRD-H: HRD high
HRD-L: HRD low
HRD: Homologous Recombination Deficiency
HRR: homologous recombination repair
LIHC: liver hepatocellular carcinoma
LOH: loss of heterozygosity
LSCC: squamous cell carcinoma of the lung
LST: large-scale state transitions
LUAD: adenocarcinoma of the lung
LUSC: squamous cell carcinoma of the lung
OV: ovarian cancer
PAAD: pancreatic adenocarcinoma
PDA: pancreatic adenocarcinoma
PARP: Poly(ADP-Ribose)-polymerase
PARPi: Poly(ADP-Ribose)-polymerase inhibitor
PRAD: prostate adenocarcinoma
PRC: precision recall curve
ROC: receiver operating characteristic curve
SBS3: single base substitution 3
SNP: single nucleotide polymorphism
SSDBs: single-strand DNA breaks
SSL: self-supervised learning
TAI: telomeric allelic imbalance
TCGA: The Cancer Genome Atlas
TRIPOD: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis
UCEC: endometrial carcinoma
WSI: whole slide images
Keywords
multiple instance learning, homologous recombination deficiency, different tumor types, routine histology, homologous recombination, attention-based