GTA: Graph Truncated Attention for Retrosynthesis

Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence (2021)

Abstract
Retrosynthesis is the task of predicting reactant molecules from a given product molecule and is important in organic chemistry because the identification of a synthetic path is as demanding as the discovery of new chemical compounds. Recently, the retrosynthesis task has been solved automatically without human expertise using powerful deep learning models. Recent deep models are primarily based on seq2seq models or graph neural networks, depending on the form of molecular representation, i.e., sequence or graph. Current state-of-the-art models represent a molecule as a graph, but they require joint training with auxiliary prediction tasks, such as the most probable reaction template or reaction center prediction. Furthermore, they require additional labels from experienced chemists, thereby incurring additional cost. Herein, we propose a novel template-free model, i.e., Graph Truncated Attention (GTA), which leverages both sequence and graph representations by inserting graphical information into a seq2seq model. The proposed GTA model masks the self-attention layer using the adjacency matrix of the product molecule in the encoder and applies a new loss, based on atom mapping acquired from an automated algorithm, to the cross-attention layer in the decoder. Our model achieves new state-of-the-art records, i.e., exact-match top-1 and top-10 accuracies of 51.1% and 81.6% on the USPTO-50k benchmark dataset, respectively, and 46.0% and 70.0% on the USPTO-full dataset, respectively, both without any reaction class information. The GTA model surpasses prior graph-based template-free models by 2% and 7% in top-1 and top-10 accuracy on the USPTO-50k dataset, respectively, and by over 6% in both top-1 and top-10 accuracy on the USPTO-full dataset.
Keywords
graph truncated attention
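
The abstract outlines two concrete mechanisms. As a rough illustration of the first (encoder-side) idea, the sketch below masks scaled dot-product self-attention scores with the product molecule's adjacency matrix, so that each atom token attends only to its graph neighbours. This is a minimal PyTorch sketch, not the authors' code: the function name, shapes, single-head setup, and the assumption of self-loops in the adjacency are all illustrative, and the paper's actual masking scheme (e.g., per-head or multi-hop variants) may differ.

```python
# Minimal sketch of adjacency-masked ("graph truncated") self-attention.
# Illustrative only; not the GTA reference implementation.
import torch
import torch.nn.functional as F

def graph_truncated_self_attention(q, k, v, adj):
    """q, k, v: (n_atoms, d) projections of atom-token embeddings.
    adj: (n_atoms, n_atoms) 0/1 adjacency of the product molecule,
         assumed here to include self-loops so each atom can attend
         to itself.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (n_atoms, n_atoms)
    # Truncate attention to graph neighbours: disallowed pairs get -inf
    # before the softmax, so their attention weight becomes exactly 0.
    scores = scores.masked_fill(adj == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v

# Toy usage: a 4-atom chain A-B-C-D, with self-loops on the diagonal.
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
x = torch.randn(4, 8)
out = graph_truncated_self_attention(x, x, x, adj)
print(out.shape)  # torch.Size([4, 8])
```

Masking before the softmax (rather than zeroing weights afterwards) keeps each row a proper probability distribution over the allowed neighbours.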
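For the second (decoder-side) idea, the abstract states only that a new loss built from automated atom mapping is applied to the cross-attention layer. One plausible form, sketched below purely as a hypothetical illustration, penalizes the negative log attention mass that a mapped reactant token places on its corresponding product token; the loss actually used in the paper may differ.

```python
# Hypothetical sketch of an atom-mapping supervision loss on decoder
# cross-attention; the paper's exact formulation is not given in the abstract.
import torch

def atom_mapping_attention_loss(cross_attn, mapping, eps=1e-9):
    """cross_attn: (tgt_len, src_len) decoder cross-attention weights
                   (rows sum to 1 after softmax).
    mapping: list of (tgt_idx, src_idx) pairs from an automated
             atom-mapping tool, linking reactant atoms to product atoms.
    Returns the mean negative log attention mass on the mapped pairs.
    """
    losses = [-torch.log(cross_attn[t, s] + eps) for t, s in mapping]
    return torch.stack(losses).mean()

# Toy usage: 3 target tokens attending over 4 source tokens.
attn = torch.softmax(torch.randn(3, 4, requires_grad=True), dim=-1)
loss = atom_mapping_attention_loss(attn, [(0, 2), (2, 1)])
loss.backward()  # gradients flow back through the attention weights
```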