FarsAcademic: A Standard Persian Test Collection for Information Retrieval in Scientific Texts

Davoud Haseli, Hashem Atapour, Fatemeh Fahimniya, Nader Naghshineh, Molook Sadat Beheshti Hoseini, Mohammad Sadegh Zahedi

International Journal of Information Science and Management (2023)

Abstract
A significant amount of scientific text is produced in Persian and is available on the Web through scientific information databases. This paper presents FarsAcademic, a test collection of Persian scientific texts built for implementing and evaluating information retrieval models in academic search. It comprises 102,238 documents and 61 topics. In constructing FarsAcademic, we tried to resolve problems specific to information retrieval (IR) and natural language processing (NLP) in Persian scientific texts. Domain experts were employed to create queries within their own research areas, and both user relevance and topical relevance were applied to improve the precision of the relevance judgments. Further, to improve retrieval performance on Persian scientific texts, automated query expansion was applied using a relevance feedback technique, the Local Context Analysis algorithm. The results showed that query expansion techniques outperformed the other information retrieval models in the Persian scientific text retrieval task. Finally, FarsAcademic is the only such collection provided free of charge to Iranian information retrieval scholars, who can use it to implement and evaluate different information retrieval models and algorithms on Persian scientific text and academic search.
Keywords
information search, FarsAcademic, information retrieval, Persian language, scientific texts, test collection, retrieving information tool
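
The abstract names Local Context Analysis as the query expansion technique but does not give its configuration. The sketch below is a simplified illustration of the general idea only, not the authors' implementation: candidate terms from the top-ranked passages are scored by how strongly they co-occur with every query term, and the best ones are appended to the query. The function name `lca_expand`, the parameters `k` and `delta`, the idf smoothing, and the toy English tokens are all assumptions made for illustration; the per-query-term idf exponent of the published formulation is omitted for brevity.

```python
import math
from collections import Counter


def lca_expand(query_terms, top_passages, doc_freq, num_docs, k=3, delta=0.1):
    """Append the k candidate terms that co-occur best with all query terms.

    Simplified, LCA-style scoring (an assumption, not the paper's exact setup).
    """
    query = set(query_terms)
    counts = [Counter(p) for p in top_passages]  # term frequencies per passage
    candidates = {t for c in counts for t in c} - query
    n = max(len(top_passages), 2)  # guard: log10(1) would be a zero divisor

    def idf(term):
        # Hypothetical smoothing: unseen terms are treated as occurring once.
        return math.log10(num_docs / max(doc_freq.get(term, 1), 1))

    scores = {}
    for cand in candidates:
        score = 1.0
        for q in query:
            # Co-occurrence of the candidate and the query term across passages.
            co = sum(c[cand] * c[q] for c in counts)
            co_degree = math.log10(co + 1) * idf(cand) / math.log10(n)
            # delta keeps one missing co-occurrence from zeroing the product.
            score *= delta + co_degree
        scores[cand] = score

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Toy data: tokenised top-ranked passages and collection document frequencies.
passages = [
    ["persian", "retrieval", "stemming", "corpus"],
    ["persian", "retrieval", "stemming"],
    ["persian", "football"],
]
doc_freq = {"stemming": 10, "corpus": 100, "football": 100}

expanded = lca_expand(["persian", "retrieval"], passages, doc_freq, 1000, k=2)
print(expanded)
```

Because the score is a product over query terms, a candidate such as "football" that co-occurs with only one of the two query terms ranks below candidates that co-occur with both.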