Wav2Gloss: Generating Interlinear Glossed Text from Speech
arXiv (2024)
Abstract
Thousands of the world's languages are in danger of extinction, a tremendous
threat to cultural identities and human language diversity. Interlinear Glossed
Text (IGT) is a form of linguistic annotation that can support documentation
and resource creation for these languages' communities. IGT typically consists
of (1) transcriptions, (2) morphological segmentation, (3) glosses, and (4)
free translations to a majority language. We propose Wav2Gloss: a task to
extract these four annotation components automatically from speech, and
introduce the first dataset to this end, Fieldwork: a corpus of speech with all
these annotations covering 37 languages with standard formatting and
train/dev/test splits. We compare end-to-end and cascaded Wav2Gloss methods,
with analysis suggesting that pre-trained decoders assist with translation and
glossing, that multi-task and multilingual approaches underperform, and
that end-to-end systems perform better than cascaded systems, despite the
text-only systems' advantages. We provide benchmarks to lay the groundwork for
future research on IGT generation from speech.
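
To make the four IGT components concrete, the following is a minimal sketch of how one annotated utterance might be represented and how its morphemes could be paired with their glosses. The class name `IGTRecord`, its field names, and the hyphen/space conventions are assumptions for illustration, not the Fieldwork corpus format, and the Turkish example is a textbook-style illustration rather than an item from the dataset.

```python
from dataclasses import dataclass


@dataclass
class IGTRecord:
    """One utterance with the four IGT tiers named in the abstract (hypothetical layout)."""
    transcription: str   # (1) surface transcription
    segmentation: str    # (2) morphemes, hyphen-separated within each word
    gloss: str           # (3) one gloss label per morpheme, same hyphenation
    translation: str     # (4) free translation into a majority language

    def aligned_morphemes(self):
        """Pair each morpheme with its gloss label, word by word."""
        seg_words = self.segmentation.split()
        gloss_words = self.gloss.split()
        if len(seg_words) != len(gloss_words):
            raise ValueError("segmentation and gloss have different word counts")
        pairs = []
        for seg_word, gloss_word in zip(seg_words, gloss_words):
            morphemes = seg_word.split("-")
            labels = gloss_word.split("-")
            if len(morphemes) != len(labels):
                raise ValueError(f"misaligned word: {seg_word!r} vs {gloss_word!r}")
            pairs.extend(zip(morphemes, labels))
        return pairs


# Illustrative example only (not taken from the Fieldwork corpus).
example = IGTRecord(
    transcription="evlerimde",
    segmentation="ev-ler-im-de",
    gloss="house-PL-1SG.POSS-LOC",
    translation="in my houses",
)

for morpheme, label in example.aligned_morphemes():
    print(f"{morpheme}\t{label}")
```

In these terms, a Wav2Gloss system would take audio as input and predict all four fields, either in one end-to-end model or as a cascade in which later tiers are predicted from an earlier transcription.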