Discourse features of blogs in subcorpus of russian ru-rstreebank

Computational Linguistics and Intellectual Technologies(2020)

引用 1|浏览0
暂无评分
摘要
The paper presents a corpus study of the discourse features in the corpus of blogs. It is based on the data of Ru-RSTreebank annotated within the framework of the Rhetorical Structure theory [Mann, Thompson 1988]. The Ru-RSTreebank represents genres of news and popular science, scientific papers, and blogs texts. Blog subcorpus contains such topics as travelling, cosmetics, sports and health, psychology, IT and tech and some others. Blogs texts constitute a specific genre as they combine properties of written and spoken discourse. The purpose of the paper is to investigate discourse features of blogs in comparison with other genres. We analyze the variation in rhetoric relations distribution among genres, and single out the differences in discourse connectives usage. Furthermore, we check the distribution of other discourse features reported in different studies for spoken discourse and for social media in the Ru-RSTreebank blogs subcorpus. The general frequency analysis and the experiments on RandomForest classifier application to genre recognition have shown that the most important rhetoric relations specific to blogs are Evaluation and Contrast, that there is a tendency to use shorter discourse units and not to express the discourse relations overtly via subordinative conjunctions.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要