
Evaluation Metrics

Theory and Applications of Natural Language Processing (2016)

Abstract
This chapter discusses how to evaluate anaphora or coreference resolution systems. The problem is non-trivial in that it involves a multitude of sub-problems, such as: (1) What is the evaluation unit (entities or links)? If entities, is entity alignment needed? If links, how are single-mention entities handled? (2) How should a response mention set that differs from the key mention set be handled? We review the prevailing metrics proposed in the last two decades, including MUC, B-cubed, CEAF and BLANC. We give illustrative examples showing how they are computed and the scenarios under which they are intended to be used. We present their strengths and weaknesses, and clarify some misunderstandings of the metrics found in the recent literature.
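To make the mention-level view concrete, here is a minimal sketch of B-cubed precision and recall as commonly defined in the literature, under the simplifying assumption (noted in the abstract as a sub-problem) that the key and response partition the same mention set. The cluster data in the usage example is hypothetical, not taken from the chapter.

```python
def b_cubed(key, response):
    """Compute B-cubed precision, recall and F1.

    key, response: lists of sets, each set an entity (cluster of mentions).
    Assumes both partitions cover the same mention set; the harder case
    where response mentions differ from key mentions is not handled here.
    """
    def entity_of(partition):
        # Map each mention to the entity (set) that contains it.
        return {m: ent for ent in partition for m in ent}

    key_of = entity_of(key)
    resp_of = entity_of(response)
    mentions = list(key_of)

    # Per-mention precision: overlap of the two entities containing the
    # mention, normalized by the size of the response entity.
    precision = sum(
        len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions
    ) / len(mentions)
    # Per-mention recall: same overlap, normalized by the key entity size.
    recall = sum(
        len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: key has entities {1,2,3} and {4,5};
# the system response splits/merges them differently.
p, r, f = b_cubed([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])
```

In this example both precision and recall come out to 11/15: mentions 1 and 2 sit in a pure response entity (precision 1 each) while mention 3 drags down both scores, illustrating how B-cubed penalizes each misplaced mention individually rather than counting links as MUC does.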
Key words
Evaluation metrics, NLP evaluation, Shared tasks