
The Drug-Like Molecule Pre-Training Strategy for Drug Discovery

Jonghyun Lee, In-Soo Myeong, Yun Kim

IEEE Access (2023)

Abstract
Recent advances in artificial intelligence (AI) have led to transformer-based models that have shown success in identifying potential drug molecules for therapeutic purposes. However, for a molecule to be considered a viable drug candidate, it must exhibit certain desirable properties such as low toxicity, high druggability, and synthesizability. To address this, we propose an approach that incorporates prior knowledge about these properties into the model training process. In this study, we filtered drug-like molecules from the PubChem database, which contains 100 million molecules, based on the quantitative estimate of drug-likeness (QED) score and the Pfizer rule. We then used this filtered dataset of drug-like molecules to train both a molecular representation model (ChemBERTa) and a molecular generation model (MolGPT). To assess the molecular representation model, we fine-tuned it on the MoleculeNet benchmark datasets. To assess the molecular generation model, we evaluated a generated sample of 10,000 molecules. Despite the limited diversity of the pre-training dataset, the molecular representation models retained at least 90% of their original performance on the benchmark datasets, with an additional 6% improvement in predicting clinical toxicology. In molecular generation, the model pre-trained on drug-like molecules produced a high proportion of molecules with desirable properties in its unconditionally generated outputs, and the generated structures showed notable diversity compared with the conditional generation approach. Moreover, the drug-like molecule pre-training strategy is not tied to a specific model or training method, making it a flexible approach that can easily be adapted to the research interests and criteria of a given study.
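The filtering step and the generation metrics described above can be illustrated with a minimal sketch. It assumes that "Pfizer rule" refers to the widely used Pfizer 3/75 rule (molecules with cLogP > 3 and TPSA < 75 are flagged as toxicity risks) and uses a hypothetical QED cut-off of 0.5, since the abstract does not state the threshold the authors used. Molecular properties are taken as precomputed inputs (in practice they could come from a toolkit such as RDKit, e.g. `QED.qed`, `Crippen.MolLogP`, and `rdMolDescriptors.CalcTPSA`), and fingerprints are represented as sets of on-bits. The `MolProps` type, the helper names, and the internal-diversity definition (one common choice: 1 minus the mean pairwise Tanimoto similarity) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical cut-off: the abstract does not give the QED threshold used by the authors.
QED_THRESHOLD = 0.5


@dataclass(frozen=True)
class MolProps:
    """Precomputed properties of one molecule (hypothetical container)."""
    mol_id: str
    qed: float   # quantitative estimate of drug-likeness, in [0, 1]
    logp: float  # calculated octanol/water partition coefficient (cLogP)
    tpsa: float  # topological polar surface area, in square angstroms


def passes_pfizer_3_75(m: MolProps) -> bool:
    """Assumed Pfizer 3/75 rule: reject molecules with cLogP > 3 AND TPSA < 75."""
    return not (m.logp > 3.0 and m.tpsa < 75.0)


def is_drug_like(m: MolProps, qed_threshold: float = QED_THRESHOLD) -> bool:
    """A molecule is kept only if it meets the QED cut-off and the Pfizer rule."""
    return m.qed >= qed_threshold and passes_pfizer_3_75(m)


def filter_drug_like(mols):
    """Build a drug-like pre-training subset from a larger molecule collection."""
    return [m for m in mols if is_drug_like(m)]


def drug_like_rate(mols) -> float:
    """Fraction of molecules (e.g. generated samples) that pass the filter."""
    if not mols:
        return 0.0
    return sum(is_drug_like(m) for m in mols) / len(mols)


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def internal_diversity(fps) -> float:
    """1 minus the mean Tanimoto similarity over all distinct fingerprint pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative values only, not measured properties of real compounds.
    sample = [
        MolProps("mol_a", qed=0.72, logp=2.1, tpsa=80.0),  # passes both checks
        MolProps("mol_b", qed=0.65, logp=3.8, tpsa=40.0),  # fails the 3/75 rule
        MolProps("mol_c", qed=0.30, logp=1.5, tpsa=90.0),  # fails the QED cut-off
    ]
    print([m.mol_id for m in filter_drug_like(sample)])  # ['mol_a']
    print(round(drug_like_rate(sample), 3))  # 0.333
```

Because the filter is a plain predicate over molecular properties, swapping in a different QED threshold or an additional rule only changes `is_drug_like`, which mirrors the abstract's point that the strategy is not tied to a particular model or criterion.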
Keywords
AI-based drug discovery, pre-training, quantitative estimate of drug-likeness, Pfizer rule