Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
arXiv (2022)
Abstract
Are multimodal inputs necessary for grammar induction? Recent work has shown
that multimodal training inputs can improve grammar induction. However, these
improvements are based on comparisons to weak text-only baselines that were
trained on relatively little textual data. To determine whether multimodal
inputs are needed in regimes with large amounts of textual training data, we
design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is
a C-PFCG that incorporates em-beddings from text-only large language models
(LLMs). We use a fixed grammar family to directly compare LC-PCFG to various
multimodal grammar induction methods. We compare performance on four benchmark
datasets. LC-PCFG provides an up to 17% improvement
compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is
also more computationally efficient, providing an up to 85% reduction in
parameter count and an 8.8x reduction in training time compared to multimodal
approaches. These results suggest that multimodal inputs may not be necessary
for grammar induction, and emphasize the importance of strong vision-free
baselines for evaluating the benefit of multimodal approaches.