Dirichlet Flow Matching with Applications to DNA Sequence Design
CoRR (2024)
Abstract
Discrete diffusion or flow models could enable faster and more controllable
sequence generation than autoregressive models. We show that naïve linear
flow matching on the simplex is insufficient toward this goal since it suffers
from discontinuities in the training target and further pathologies. To
overcome this, we develop Dirichlet flow matching on the simplex based on
mixtures of Dirichlet distributions as probability paths. In this framework, we
derive a connection between the mixtures' scores and the flow's vector field
that allows for classifier and classifier-free guidance. Further, we provide
distilled Dirichlet flow matching, which enables one-step sequence generation
with minimal performance hits, resulting in O(L) speedups compared to
autoregressive models. On complex DNA sequence generation tasks, we demonstrate
superior performance compared to all baselines in distributional metrics and in
achieving desired design targets for generated sequences. Finally, we show that
our classifier-free guidance approach improves unconditional generation and is
effective for generating DNA that satisfies design targets. Code is available
at https://github.com/HannesStark/dirichlet-flow-matching.
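
To make the idea of a Dirichlet probability path on the simplex concrete, here is a minimal sketch of sampling a noisy point for each sequence position. The abstract does not give the parameterization, so the sketch assumes a conditional path of the form Dir(1 + t * e_i), where e_i is the one-hot vector of the target token and t >= 0 is the time: t = 0 gives the uniform distribution on the simplex, and large t concentrates on the target vertex. The function name `sample_dirichlet_path`, the default vocabulary size of 4 (A, C, G, T), and the parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sample_dirichlet_path(token_ids, t, vocab_size=4, rng=None):
    """Sample one simplex point per position from Dir(1 + t * e_i).

    Assumption (not stated in the abstract): the conditional path for a
    target token i at time t is a Dirichlet with concentration 1 + t * e_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)
    length = token_ids.shape[0]

    # Concentration is 1 everywhere, plus t on the target token's coordinate.
    alpha = np.ones((length, vocab_size))
    alpha[np.arange(length), token_ids] += t

    # A Dirichlet sample is a vector of independent Gamma(alpha_k, 1)
    # draws divided by their sum.
    gamma_draws = rng.gamma(shape=alpha)
    return gamma_draws / gamma_draws.sum(axis=-1, keepdims=True)


# Hypothetical usage: the DNA sequence "ACGT" encoded as token ids 0..3.
sequence = [0, 1, 2, 3]
rng = np.random.default_rng(0)
x_early = sample_dirichlet_path(sequence, t=0.0, rng=rng)    # uniform on the simplex
x_late = sample_dirichlet_path(sequence, t=1000.0, rng=rng)  # near the one-hot targets
```

Under this assumption every row of the result lies on the probability simplex, and as t grows each row moves toward the vertex of its target token, which is the sense in which the path connects noise to data.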