Standardizing chemical compounds with language models

Machine Learning: Science and Technology (2023)

Abstract
With the growing amount of chemical data stored digitally, it has become crucial to represent chemical compounds accurately and consistently. Harmonized representations facilitate the extraction of insightful information from datasets and are advantageous for machine learning applications. To achieve consistent representations throughout datasets, one relies on molecule standardization, which is typically accomplished using rule-based algorithms that modify descriptions of functional groups. Here, we present the first deep-learning model for molecular standardization. We enable custom standardization schemes based solely on data, which, as an additional benefit, support standardization options that are difficult to encode into rules. Our model achieves over 98%
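To make the rule-based approach mentioned in the abstract concrete, here is a minimal toy sketch, not the authors' method and not any toolkit's API. It assumes compounds are written as SMILES strings and uses two hypothetical rules (the names `NITRO_RULES` and `standardize` are invented for illustration) that rewrite the pentavalent description of a nitro group into the charge-separated one. Plain string substitution is used only to keep the example self-contained; real standardizers generally match substructures on the molecular graph instead.

```python
# Toy illustration only: hypothetical rules, not taken from the paper.
# Each rule maps one textual description of a functional group
# to the preferred (standard) description.
NITRO_RULES = [
    # pentavalent nitro written after the attached atom, e.g. CN(=O)=O
    ("N(=O)=O", "[N+](=O)[O-]"),
    # pentavalent nitro written before the attached atom, e.g. O=N(=O)c1ccccc1
    ("O=N(=O)", "O=[N+]([O-])"),
]


def standardize(smiles: str, rules=NITRO_RULES) -> str:
    """Apply each (pattern, replacement) rule in order to a SMILES string."""
    for pattern, replacement in rules:
        smiles = smiles.replace(pattern, replacement)
    return smiles


if __name__ == "__main__":
    for raw in ["CN(=O)=O", "O=N(=O)c1ccccc1", "CCO"]:
        print(raw, "->", standardize(raw))
```

Even this small case needs one rule per way of writing the same group, which hints at why some standardization options are hard to capture as rules and why a scheme learned from example pairs can be attractive.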
Key words
chemoinformatics, molecule standardization, molecular transformer, compound representation, natural language processing, chemical datasets