TSIS with A Comparative Study on Linear Molecular Representation

CoRR(2024)

Abstract
Encoding is the carrier of information. AI models possess basic capabilities in syntax, semantics, and reasoning, but these capabilities are sensitive to the specific form of their inputs. In this study, we add an encoding algorithm, TSIS (Simplified TSID), to the t-SMILES family as a fragment-based linear molecular representation. Previous work demonstrated that TSID significantly outperforms classical SMILES, DeepSMILES, and SELFIES. A further comparative analysis in this study reveals that the tree structure used by TSID is more easily learned than anticipated, regardless of whether Transformer or LSTM models are used. Furthermore, TSIS demonstrates performance comparable to TSID and significantly outperforms SMILES, SELFIES, and SAFE. SELFIES and SAFE, by contrast, present significant challenges in semantic and syntactic analysis, respectively, due to their inherent complexity.