Unbiasing Retrosynthesis Language Models with Disconnection Prompts

ACS Central Science (2023)

Abstract
Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical.
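The abstract describes steering a chemical language model with a "disconnection prompt" that marks the bond-breaking site in the input molecule. The paper's actual tokenization is not given here; below is a minimal, purely illustrative Python sketch in which tagged atom positions in a SMILES string are marked with a hypothetical `!` token before being passed to a (not shown) retrosynthesis model. The function name, marker token, and atom indices are all assumptions for illustration.

```python
# Hypothetical sketch of building a disconnection prompt: mark the atoms
# of the intended disconnection site inside a SMILES string. The "!" token
# and character-index tagging are illustrative stand-ins, not the authors'
# actual scheme.

def tag_disconnection(smiles: str, atom_positions: list[int]) -> str:
    """Insert a '!' marker after each character index in atom_positions."""
    out = []
    for i, ch in enumerate(smiles):
        out.append(ch)
        if i in atom_positions:
            out.append("!")
    return "".join(out)

# Stage 1 of the two-stage schema would propose disconnection sites
# automatically; Stage 2 would condition the reactant prediction on the
# tagged input. Here we tag the ester linkage of aspirin by hand.
prompt = tag_disconnection("CC(=O)Oc1ccccc1C(=O)O", [4, 6])
print(prompt)  # CC(=O!)O!c1ccccc1C(=O)O
```

In a real pipeline the tagged string would replace the plain SMILES as the model's input, letting the user (or an upstream site-prediction model) control which bond the proposed precursors disconnect.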
Keywords
retrosynthesis language models