TSIS with A Comparative Study on Linear Molecular Representation
CoRR(2024)
Abstract
Encoding is the carrier of information. AI models possess basic capabilities
in syntax, semantics, and reasoning, but these capabilities are sensitive to
specific inputs. In this study, we introduce an encoding algorithm, TSIS
(Simplified TSID), to the t-SMILES family as a fragment-based linear molecular
representation. TSID has been demonstrated to significantly outperform
classical SMILES, DeepSMILES, and SELFIES in previous work. A further
comparative analysis in this study reveals that the tree structure used by TSID
is more easily learned than anticipated, regardless of whether Transformer or
LSTM models are used. Furthermore, TSIS achieves performance comparable to
TSID and significantly outperforms SMILES, SELFIES, and SAFE. By contrast,
SELFIES and SAFE, owing to their inherent complexity, present significant
challenges in semantic and syntactic analysis, respectively.