Objective:
N6-methyladenosine ($m^6A$) is the most common ribonucleic acid (RNA) modification and plays a role in promoting gene mutation and protein structure changes in Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Nanopore single-molecule direct RNA sequencing (DRS) supplies data for RNA modification detection and, unlike second-generation sequencing, can preserve the potential $m^6A$ signature. However, because DRS data are scarce, methods for finding $m^6A$ RNA modifications in DRS data are lacking. Our purpose is to identify $m^6A$ modifications in DRS data precisely.
Methods:
We present a methodology for identifying $m^6A$ modifications that incorporates mapping and feature extraction from DRS data. To detect $m^6A$ modifications, we introduce an ensemble method called mixed-weight neural bagging (MWNB), trained on synthetic 5-base RNA DRS data containing both $m^6A$-modified and unmodified sequences.
Results:
Our MWNB model achieved the highest classification accuracy, 97.85%, with an AUC of 0.9968. We also applied the MWNB model to the COVID-19 dataset; the results show a strong association with findings from biomedical experiments.
Conclusion:
Our strategy enables the prediction of $m^6A$ modifications from DRS data and identifies $m^6A$ modifications in SARS-CoV-2.
Significance:
The Coronavirus Disease 2019 (COVID-19) outbreak, caused by SARS-CoV-2, has had a significant impact. The RNA modification $m^6A$ is connected with viral infections. $m^6A$ modifications associated with several essential proteins affect the structure and function of those proteins. Therefore, finding the location and number of $m^6A$ RNA modifications is crucial for subsequent analysis of the protein expression profile.