Single‐molecule real‐time sequencing of the M protein: Toward personalized medicine in monoclonal gammopathies

American Journal of Hematology (2022)

Abstract
Each patient with a monoclonal gammopathy has a unique monoclonal (M) protein, whose sequence can be used as a tumoral fingerprint to track the presence of the B cell or plasma cell (PC) clone itself. Moreover, the M protein can directly cause potentially life-threatening organ damage, which is dictated by the patient-specific amino acid sequence of the clonal light and/or heavy chain, as in patients affected by immunoglobulin light chain (AL) amyloidosis.¹ However, patients' specific M protein sequences remain mostly undefined and the molecular mechanisms underlying M protein-related clinical manifestations are largely obscure. We combined the unbiased amplification of expressed immunoglobulin genes, performed through inverse PCR from circularized, double-stranded cDNA using primers annealing to the constant regions of immunoglobulin genes, with single-molecule, real-time, long-read DNA sequencing and with bioinformatic and immunogenetic analyses²⁻⁴ (Online Methods, Figures S1, S2, Table S1). The resulting methodology, termed Single-Molecule Real-Time Sequencing of the M protein (SMaRT M-Seq), identifies the full-length sequence of the variable region of expressed immunoglobulin genes and ranks the obtained sequences by relative abundance, thus enabling the identification of the full-length variable sequence of light and/or heavy chains from a large number of patients analyzed in parallel. SMaRT M-Seq has undergone appropriate technical validation (Table S2).
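The abundance ranking described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (its details are in the Online Methods); the function name `rank_by_abundance` and the toy reads are hypothetical, and real long reads would first need error correction and trimming to the variable region before identical sequences could be counted.

```python
from collections import Counter


def rank_by_abundance(sequences):
    """Return (sequence, count, fraction) tuples, most abundant first."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(seq, n, n / total) for seq, n in counts.most_common()]


# Toy input: hypothetical variable-region reads, not real patient data.
reads = ["CAGTCT"] * 8 + ["GACATC"] + ["TCCTAT"]
ranking = rank_by_abundance(reads)
dominant_seq, dominant_count, dominant_fraction = ranking[0]
print(f"dominant sequence {dominant_seq}: {dominant_fraction:.1%} of reads")
```

In this toy case the dominant sequence accounts for 8 of 10 reads; in a sample without a clone, no single sequence would be expected to stand out in this way.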
Sequencing of contrived bone marrow (BM) samples generated through serial dilutions of κ- or λ-expressing PC lines into control BM, as well as sequencing of replicate, bona fide BM samples from AL patients and comparison with gold-standard techniques of immunoglobulin gene cloning and sequencing, showed: (i) 100% sequence accuracy at the individual base-pair level; (ii) high repeatability (coefficient of variation <0.8% for sequencing of pentaplicate BM samples) in defining the molecular clonal size (i.e., the fraction of total immunoglobulin sequences coinciding with the clonal sequence); (iii) high sensitivity in identifying clonal immunoglobulin sequences (10⁻² to 10⁻³ when employing low-coverage sequencing on multiple, pooled samples) (Appendix S1, Figures S3–S5). To further extend the technical validation of the methodology and assess its throughput, we employed SMaRT M-Seq to identify clonal immunoglobulin sequences from BM mononuclear cells of a cohort of 89 consecutive patients with a diagnosis or suspicion of systemic AL amyloidosis, analyzed in parallel in one sequencing round (Figure S6). In 6 of these patients, comparison with standard cloning and sequencing approaches confirmed 100% identity with the sequence obtained by SMaRT M-Seq (Figure S7). In addition, 3 of these patients were analyzed in duplicate with SMaRT M-Seq, and the sequence-based molecular clonal sizes of the two technical replicates were highly comparable (Figure 1). These results further confirm the accuracy and repeatability of the method when the assay is used to analyze a larger number of samples in parallel. Of the 89 sequenced patients, a final diagnosis of systemic AL amyloidosis could be established in 84, including 5 cases in which conventional M protein studies did not detect an M protein (Figure S8, Table S3).
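To make the repeatability metric concrete, the coefficient of variation (CV) of replicate molecular clonal sizes can be computed as in the sketch below. The five replicate values are invented for illustration, and the use of the sample standard deviation is an assumption: the text does not state which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return 100.0 * stdev(values) / mean(values)


# Hypothetical molecular clonal sizes (%) from five replicate libraries.
replicates = [88.1, 88.6, 87.9, 88.4, 88.3]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.2f}%")
```

A CV well below 1%, as in this made-up series, means the replicate estimates of clonal size differ by only a small fraction of their mean.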
Of note, SMaRT M-Seq identified a dominant immunoglobulin light chain (LC) sequence in all 84 patients (but not in patients analyzed in parallel in whom a monoclonal gammopathy was eventually excluded, Figure S9). The median molecular clonal size was 88.3% (IQR: 70.7%–93%) (Figure 1) and showed a significant correlation with the percentage of BM-PC infiltrate and with serum free LC levels (p < 0.0001 in each case) (Figure S10). Patients' clonal sequences proved to be unique (Figure S11). Germline gene usage was in agreement with the expectations for a population of patients with AL amyloidosis (Figure S12) and correlated with selected clinical features (Figure S13). As an additional way to verify the accuracy of the methodology in identifying the clonal, expressed LC, we compared the sequencing results obtained with SMaRT M-Seq on BM samples with proteomics data from matched, amyloid-containing fat tissues for 4 patients. In all cases, the clonal LC variable sequence identified by SMaRT M-Seq was the potentially amyloidogenic protein with the highest sequence coverage and ranked first by a wide margin in terms of unique peptides identified when compared with other, published immunoglobulin LCs (Figure S14). Collectively, these data show that SMaRT M-Seq, performed on a large number of BM samples from patients with monoclonal gammopathies analyzed in parallel, can accurately and reproducibly identify a clonal immunoglobulin LC sequence in all instances, even in cases with low BM B cell/PC clonal burden and with M protein undetectable by conventional diagnostic techniques. We then investigated whether the full-length variable sequence information attainable at diagnosis using SMaRT M-Seq, combined with inverse PCR coupled to short-read sequencing, might enable the detection of low-level, residual clonotypic sequences, as in the context of minimal residual disease (MRD) assessment.
Using contrived BM samples mimicking progressively smaller plasma cell clones, a clonotypic sequence was identified down to the 10⁻⁷ dilution, with a progressively decreasing molecular clonal size, indicating linearity of the amplification and sequencing approach (Figure 1B,C, Appendix S1). Overall, these data suggest that knowledge of a patient's specific, expressed full-length clonal immunoglobulin variable sequence may be exploited to facilitate MRD assessment. We have established SMaRT M-Seq as a novel, validated assay to reliably identify the full-length variable sequence of M proteins. The assay relies on the unbiased amplification of expressed immunoglobulin gene(s) through inverse PCR, coupled with single-molecule real-time DNA sequencing. Within a complex biological sample such as BM or peripheral blood, SMaRT M-Seq ranks the obtained reads by their relative abundance, which is a measure of the relative abundance of mRNA molecules within the sample. As such, the method can be employed to infer the full-length variable sequence of an expressed clonal immunoglobulin as the dominant sequence identified within the sample under examination, thus enabling the use of clonal, expressed immunoglobulin sequence information for basic research studies or potential diagnostic applications. The accuracy of SMaRT M-Seq in identifying the full-length variable sequence of the clonal immunoglobulin gene at the single-nucleotide level has been demonstrated through different approaches. First, through comparison with sequencing results obtained by conventional sequencing methods, both in contrived samples spiked with PC lines secreting a known κ- or λ-LC (thus serving as reference material) and in bona fide BM samples. Second, through the investigation of germline gene usage in our analyzed cohort of consecutive AL patients, which closely reflects the expected distribution of immunoglobulin LC genes based on previous studies.
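One common way to check the kind of linearity mentioned for the dilution series is to fit the observed clonal size against the expected dilution on a log-log scale, where a slope close to 1 indicates proportional scaling. The sketch below assumes this approach, which is not stated to be the authors' analysis; the dilution steps and observed values are invented.

```python
import math


def loglog_slope(expected, observed):
    """Least-squares slope of log10(observed) against log10(expected)."""
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("need two or more paired, positive values")
    xs = [math.log10(x) for x in expected]
    ys = [math.log10(y) for y in observed]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical dilution series: fraction of clonal cells spiked into control BM.
dilutions = [10.0 ** -k for k in range(1, 8)]
# Invented observed clonal sizes that scale proportionally with the dilution.
observed = [0.5 * d for d in dilutions]
slope = loglog_slope(dilutions, observed)
print(f"log-log slope = {slope:.3f}")
```

With real data the slope would deviate from 1 wherever amplification or sampling noise breaks proportionality, typically at the lowest dilutions.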
Third, through the identification, within amyloid-laden fat tissues, of high amounts of tryptic digestion peptides aligning to the clonal LC sequence obtained through SMaRT M-Seq. The analysis of multiple BM samples from AL patients as technical replicates demonstrated the repeatability of SMaRT M-Seq in identifying both the full-length variable sequence of clonal immunoglobulin genes at the single-nucleotide level and the molecular clonal size. The latter is a measure of the relative abundance of clonal sequences in a given sample. Unlike genomic DNA-based sequencing methods, where reads are considered directly proportional to the number of tumoral cells within the sample, in mRNA/cDNA-based methods, including SMaRT M-Seq, reads reflect both the number of tumoral cells within the sample and the average number of immunoglobulin transcripts per cell, which may differ among patients. This may reduce linearity between the true clonal burden within the biological sample and the molecular clonal size derived from the frequency of clonal sequences, and could be regarded as a limitation of mRNA/cDNA-based methods. On the other hand, the excess of light- and/or heavy-chain mRNA molecules compared to a single rearranged genomic DNA molecule in each tumoral cell may favor the sensitivity of mRNA/cDNA-based sequencing methods for clonal detection.⁵ Besides the relative abundance of clonal cells within the sample under examination, the sensitivity of SMaRT M-Seq is also determined by the number of reads analyzed per sample. This is in turn dictated by the sequencing output of the employed sequencing platform and by the number of pooled samples analyzed in a given sequencing round, and is therefore scalable.
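The dependence of sensitivity on read depth can be sketched with a simple sampling model. The code below assumes that each read is an independent draw in which a clonal sequence appears with probability equal to the clonal fraction, and that a single clonal read counts as detection; both are simplifying assumptions for illustration, not the authors' stated criteria, and the total read output is a made-up number.

```python
import math


def detection_probability(clonal_fraction, n_reads):
    """P(at least one clonal read) when reads are independent draws."""
    if not 0.0 < clonal_fraction < 1.0:
        raise ValueError("clonal_fraction must be strictly between 0 and 1")
    # 1 - (1 - f)^n, written with log1p/expm1 for accuracy at small f.
    return -math.expm1(n_reads * math.log1p(-clonal_fraction))


def reads_needed(clonal_fraction, confidence=0.95):
    """Smallest read count giving at least `confidence` detection probability."""
    if not 0.0 < clonal_fraction < 1.0:
        raise ValueError("clonal_fraction must be strictly between 0 and 1")
    return math.ceil(math.log1p(-confidence) / math.log1p(-clonal_fraction))


# Hypothetical run: a fixed read output shared equally among pooled samples.
total_reads = 500_000
for n_samples in (10, 100, 1000):
    per_sample = total_reads // n_samples
    p = detection_probability(1e-3, per_sample)
    print(f"{n_samples} pooled samples, {per_sample} reads each: P = {p:.3f}")
```

Under this model, pooling more samples into one run lowers the reads available per sample and therefore the chance of seeing a rare clone, while a higher-output platform raises it.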
Even when analyzing multiple samples on a sequencing platform with low sequencing output, the achieved sensitivity of SMaRT M-Seq significantly exceeds the requirements for the identification of clonal B cells/plasma cells in patients with AL amyloidosis at diagnosis. Besides AL amyloidosis, clinical manifestations causally linked to the presence of specific M proteins can also develop in the context of multiple myeloma, Waldenström macroglobulinemia, and monoclonal gammopathies of clinical significance.¹ The molecular mechanisms underlying these conditions are poorly understood, partly because of the limited number of clinically annotated, sequenced M proteins. Therefore, the possibility of reliably determining the entire variable region of expressed, disease-related immunoglobulin gene sequences from a large number of affected patients has the potential to elucidate molecular mechanisms of pathogenicity and to enable sequence-based predictive models.⁶ The identification of pathogenic LCs may improve our ability to make an early diagnosis, which is of vital importance. Moreover, at the individual patient level, the identification of expressed clonotypic immunoglobulin sequences could enable personalized medicine approaches for the sensitive detection of patients' specific M proteins at diagnosis and after anti-clonal therapy. This work was supported by grants from the Amyloidosis Foundation (MN), the Italian Ministry of Health (Ricerca Finalizzata, grant #GR-2018-12368387, Ricerca Corrente) (EMil, MN), the CARIPLO Foundation (grant #2018–0257) (EMil, MN), Cancer Research UK [C355/A26819], FC AECC and AIRC under the Accelerator Award Program (GM, GP, MN), the Italian Ministry of Research and Education (PRIN 20207XLJB2) (SR, GP) and the Fondazione ARISLA (project TDP-43-STRUCT) (SR). Pasquale Cascino, Giovanni Palladini, and Mario Nuvolone are inventors on a patent application related to this work. EMih owns shares in the company aiNET GmbH.
The data generated in this study are available within the article and its supplementary data files. LC sequences have been deposited to GenBank (MZ595009-MZ595094) and proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (PXD028093 and PXD03587). Appendix S1. Supporting Information.
Keywords
protein