Edisum: Summarizing and Explaining Wikipedia Edits at Scale
arXiv (2024)
Abstract
An edit summary is a succinct comment written by a Wikipedia editor
explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit
summaries are crucial for maintaining the encyclopedia: they are the first
thing seen by content moderators and help them decide whether to accept or
reject an edit. Additionally, edit summaries constitute a valuable data source
for researchers. Unfortunately, as we show, for many edits, summaries are
either missing or incomplete. To overcome this problem and help editors write
useful edit summaries, we propose a recommendation model: a language model
trained to generate good edit summaries from a representation of the edit
diff. This is a challenging task for multiple
reasons, including mixed-quality training data, the need to understand not only
what was changed in the article but also why it was changed, and efficiency
requirements imposed by the scale of Wikipedia. We address these challenges by
curating a mix of human and synthetically generated training data and
fine-tuning a generative language model sufficiently small to be used on
Wikipedia at scale. Our model performs on par with human editors. Commercial
large language models are able to solve this task better than human editors,
but would be too expensive to run on Wikipedia at scale. More broadly, this
paper showcases how language modeling technology can be used to support humans
in maintaining one of the largest and most visible projects on the Web.