Integrating Probabilistic and Knowledge-based Approaches to Corpus Parsing

Computational Linguistics for Speech and Handwriting Recognition: A one-day workshop organized by L. J. Evett and T. G. Rose as part of the AISB 1994 Workshop Series (2017)

Abstract
We have developed a prototype system for syntactic parsing of corpus text based on a wide-coverage unification-based grammar of English and domain-independent statistical techniques for selecting the most plausible parses from the typically large number licensed by the grammar. Although the results from initial experiments are promising, the system is ‘brittle’, relying particularly on the correctness and completeness of lexical entries. We are currently concentrating on parsing large amounts of tagged text with a relatively simple, but robust, grammar of tag sequences and punctuation. This grammar produces coarse phrasal analyses of sentences from which possible complementation patterns can be extracted, allowing omissions in the set of lexical entries to be remedied.

1 The Probabilistic LR Parsing System

Briscoe & Carroll (1993) describe an approach to probabilistic parse selection using a large unification-based grammar of English. The grammar contains approximately 800 phrase structure rules written in the Alvey Natural Language Tools (ANLT) formalism (Briscoe et al. 1987), a syntactic variant of the Definite Clause Grammar formalism (Pereira & Warren 1980). The ANLT grammar has wide coverage and has been shown, for instance, to be capable of assigning a correct analysis to 96.8% of a corpus of 10,000 noun phrases extracted randomly from manually analysed corpora (Taylor, Grover & Briscoe 1989). The grammar is linked to a lexicon containing about 64,000 entries for 40,000 lexemes, including

1 This research is supported in part by ESPRIT BRA 7315 ‘The Acquisition of Lexical Knowledge’ (ACQUILEX II).
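The idea of selecting the most plausible parse from the many analyses licensed by the grammar can be pictured with a minimal sketch. It is an illustration only: the text above does not specify the scoring model, so the sketch assumes that each candidate parse comes with a list of probabilities for its derivation steps and that candidates are ranked by the geometric mean of those probabilities, so that longer derivations are not penalised merely for having more steps. The names `score` and `rank_parses`, the candidate labels, and the numbers are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A candidate parse: (label, probabilities of its derivation steps).
# The probabilities here are invented; in a real system they would be
# estimated from parsed training data.
Parse = Tuple[str, Sequence[float]]


def score(step_probs: Sequence[float]) -> float:
    """Geometric mean of the step probabilities, computed in log space.

    Assumption: normalising by derivation length is one plausible way to
    compare parses of different sizes; it is not taken from the text above.
    """
    if not step_probs or any(p <= 0.0 for p in step_probs):
        return 0.0
    log_sum = sum(math.log(p) for p in step_probs)
    return math.exp(log_sum / len(step_probs))


def rank_parses(parses: List[Parse]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs, most plausible parse first."""
    scored = [(label, score(probs)) for label, probs in parses]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Two hypothetical analyses of one ambiguous sentence.
candidates: List[Parse] = [
    ("PP attaches to verb", [0.5, 0.4, 0.9]),
    ("PP attaches to noun", [0.5, 0.2, 0.9, 0.8]),
]

ranking = rank_parses(candidates)
best_label, best_score = ranking[0]
print(best_label, round(best_score, 3))
```

Working in log space avoids numerical underflow when a derivation has many steps, which matters once the grammar licenses long derivations for long sentences.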