Challenges in Compiling Expert Corpora for Academic Writing Support

Semantic Scholar (2021)

Abstract
Since most academic articles relevant to many disciplines are available in English, it is important to understand the linguistic challenges of academic publishing in English as an L2, in contrast with the specifics of academic writing in the mother tongue. The present paper explores a series of challenges faced in the attempt to build expert corpora for academic writing research and teaching. In particular, the study reports on the construction of the DACRE corpus, an expert bilingual comparable corpus consisting of discipline-specific peer-reviewed scientific articles. The corpus should facilitate the extraction of the salient linguistic and rhetorical features specific to each selected discipline (Linguistics, IT, Political Sciences, Economics) and language variety (Romanian, English L1 and L2). At the initial stage of corpus compilation, when assessing the linguistic resources to be included in the corpus, a multitude of challenges emerge. For example, the linguistic level of these resources is not consistent. Other difficulties we encountered were data availability (open-access or subscription-based sources), the lack of recent resources for certain corpus batches, “multi-authorship” when determining L1 texts, and, most importantly, legal aspects (i.e. copyright). By describing, comparing and analysing these data collection obstacles, we propose a model for expert corpus building in English vs. low-resource languages such as Romanian.