Symbolic Autoencoding for Self-Supervised Sequence Learning
CoRR (2024)
Abstract
Traditional language models, adept at next-token prediction in text
sequences, often struggle with transduction tasks between distinct symbolic
systems, particularly when parallel data is scarce. To address this issue, we
introduce symbolic autoencoding (ΣAE), a self-supervised
framework that harnesses abundant non-parallel data alongside
limited parallel data. ΣAE connects two generative models via a discrete
bottleneck layer and is optimized end-to-end by minimizing a reconstruction loss
(together with a supervised loss on the parallel data), such that the
sequence generated at the discrete bottleneck can be read out as the transduced
input sequence. We also develop gradient-based methods that enable efficient
self-supervised sequence learning despite the discreteness of the bottleneck.
Our results demonstrate that ΣAE significantly improves performance on
transduction tasks, even with minimal parallel data, offering a promising
solution for weakly supervised learning scenarios.
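
To make the training setup concrete, here is a minimal sketch of the idea in PyTorch. The GRU backbones, the fixed bottleneck length, the vocabulary sizes, and the use of a straight-through Gumbel-softmax estimator are all illustrative assumptions: the paper develops its own gradient-based methods for the discrete bottleneck, and its exact architectures differ.

```python
# A minimal sketch of the SigmaAE setup: two seq2seq models joined by a
# discrete bottleneck, trained with a reconstruction loss on all data plus a
# supervised loss on the bottleneck whenever a parallel target is available.
# GRU backbones, fixed bottleneck length, and the straight-through
# Gumbel-softmax estimator are illustrative stand-ins, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2Seq(nn.Module):
    """One generative model: encode a sequence, decode a fixed-length output."""
    def __init__(self, vocab_in, vocab_out, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_in, hidden)
        self.enc = nn.GRU(hidden, hidden, batch_first=True)
        self.dec = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_out)

    def forward(self, x, out_len):
        # x is either (B, T) token ids or (B, T, vocab_in) relaxed one-hots
        # coming out of the discrete bottleneck.
        h = self.emb(x) if x.dtype == torch.long else x @ self.emb.weight
        _, state = self.enc(h)                             # (1, B, H)
        # Non-autoregressive decode: repeat the encoder state at every step.
        steps = state.transpose(0, 1).repeat(1, out_len, 1)
        y, _ = self.dec(steps, state)
        return self.head(y)                                # (B, out_len, vocab_out)

VOCAB_A, VOCAB_B, Z_LEN = 50, 40, 12   # hypothetical alphabet sizes / bottleneck length
a_to_b = Seq2Seq(VOCAB_A, VOCAB_B)     # reads system A, writes system B symbols
b_to_a = Seq2Seq(VOCAB_B, VOCAB_A)     # reads the bottleneck, reconstructs A

def sigma_ae_loss(x_a, y_b=None, tau=1.0):
    """Reconstruction loss on every batch; adds a supervised loss on the
    bottleneck when a parallel target sequence y_b from system B is given."""
    z_logits = a_to_b(x_a, Z_LEN)
    # Straight-through Gumbel-softmax: discrete one-hots in the forward pass,
    # soft gradients in the backward pass, keeping the bottleneck trainable.
    z = F.gumbel_softmax(z_logits, tau=tau, hard=True)
    recon = b_to_a(z, x_a.size(1))
    loss = F.cross_entropy(recon.flatten(0, 1), x_a.flatten())
    if y_b is not None:
        loss = loss + F.cross_entropy(z_logits.flatten(0, 1), y_b.flatten())
    return loss

# Toy usage: an unpaired batch from system A, then a parallel (A, B) batch.
x_a = torch.randint(0, VOCAB_A, (8, 10))
y_b = torch.randint(0, VOCAB_B, (8, Z_LEN))
loss = sigma_ae_loss(x_a) + sigma_ae_loss(x_a, y_b)
loss.backward()
```

The point of the sketch is the division of labor between the two terms: the supervised loss forces the bottleneck sequence to be readable as the transduced output in system B, while the reconstruction loss lets abundant non-parallel data shape that same bottleneck.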