Coping With Unruly Language: Non-Standard Usage In A Corpus

GRAMMAR AND CORPORA 2016 (2018)

Abstract
A language as used in real situations may differ substantially from its standard form. Before the entire range of NLP methods and tools can be applied to non-canonical variants of a language, appropriate categories for the analysis of deviant forms and constructions are needed, together with texts annotated by these categories. A discussion of non-standard language is followed by two case studies. The first study proposes a taxonomy of morphosyntactic categories as an attempt to analyze non-standard forms in non-native learners' Czech. The second study focuses on the role of a rule-based grammar and lexicon as tools for the detection and diagnostics of non-standard words and constructions in the process of building and using a parsebank.
Key words
Non-standard language, Czech, learner corpus, parsebank, tree-bank, constraint-based grammar, valency, HPSG