Transformer-Based Models for the Automatic Indexing of Scientific Documents in French

Natural Language Processing and Information Systems (NLDB 2022)

Abstract
Automatic indexing is a challenging task in which computers must emulate the behaviour of professional indexers by assigning to a document keywords or keyphrases that concisely represent its content. Most existing algorithms follow a select-and-rank strategy, but it has been shown that selecting keywords only from the text is not ideal, because human annotators tend to assign keywords that do not appear in the source. This problem is especially evident in scholarly literature. In this work, we leverage a transformer-based language model to approach automatic indexing from a generative point of view. In this way, we overcome the problem of keywords that are absent from the original document, since neural language models can rely on knowledge acquired during training. We apply our method to a French collection of annotated scientific articles.
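To make the select-and-rank limitation concrete, the following Python sketch splits a document's reference keywords into those that occur verbatim in the text ("present") and those that do not ("absent"). Only present keywords are reachable by a purely extractive method, so their share of the reference set bounds that method's recall. The normalization (lowercasing, accent stripping, contiguous token match), the function names, and the `" ; "` separator used to build a generation target are illustrative assumptions, not the paper's actual preprocessing.

```python
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, strip accents, and split into word tokens (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # \w is Unicode-aware for str patterns; apostrophes split "d'indexation".
    return re.findall(r"\w+", no_accents)


def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    """True if phrase_tokens occurs as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def split_present_absent(document: str,
                         keywords: list[str]) -> tuple[list[str], list[str]]:
    """Partition reference keywords by whether they appear in the document."""
    doc_tokens = normalize(document)
    present: list[str] = []
    absent: list[str] = []
    for kw in keywords:
        target = present if contains_phrase(doc_tokens, normalize(kw)) else absent
        target.append(kw)
    return present, absent


def extractive_recall_ceiling(document: str, keywords: list[str]) -> float:
    """Upper bound on recall for any method that only selects text spans."""
    present, _ = split_present_absent(document, keywords)
    return len(present) / len(keywords) if keywords else 0.0


def to_generation_target(keywords: list[str], sep: str = " ; ") -> str:
    """Hypothetical seq2seq target: all keywords, present and absent, joined."""
    return sep.join(keywords)


if __name__ == "__main__":
    doc = ("Cet article présente une méthode d'indexation automatique "
           "fondée sur les transformeurs.")
    gold = ["indexation automatique", "transformeurs", "apprentissage profond"]
    present, absent = split_present_absent(doc, gold)
    print(present)   # ['indexation automatique', 'transformeurs']
    print(absent)    # ['apprentissage profond']
    print(round(extractive_recall_ceiling(doc, gold), 3))  # 0.667
    print(to_generation_target(gold))
```

In this toy example no extractive system can recover "apprentissage profond", whereas a generative model trained on targets that include absent keywords can, in principle, produce it.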
Keywords
Automatic indexing, Keyword generation, Transformers