
Evaluation Metrics

Theory and Applications of Natural Language Processing (2016)

Abstract
This chapter discusses how to evaluate anaphora or coreference resolution systems. The problem is non-trivial in that it involves a multitude of sub-problems, such as: (1) What is the evaluation unit (entities or links)? If entities, is entity alignment needed? If links, how are single-mention entities handled? (2) How should a response mention set that differs from the key mention set be handled? We review the prevailing metrics proposed in the last two decades, including MUC, B-cubed, CEAF and BLANC. We give illustrative examples showing how they are computed and the scenarios under which they are intended to be used. We present their strengths and weaknesses, and clarify some misunderstandings of the metrics found in the recent literature.
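To make the mention-level view concrete, here is a minimal sketch of B-cubed precision and recall as commonly defined in the literature, under the simplifying assumption (noted in the abstract as a sub-problem) that the key and response partition the same mention set. The cluster data in the usage example is hypothetical, not taken from the chapter.

```python
def b_cubed(key, response):
    """Compute B-cubed precision, recall and F1.

    key, response: lists of sets, each set an entity (cluster of mentions).
    Assumes both partitions cover the same mention set; the harder case
    where response mentions differ from key mentions is not handled here.
    """
    def entity_of(partition):
        # Map each mention to the entity (set) that contains it.
        return {m: ent for ent in partition for m in ent}

    key_of = entity_of(key)
    resp_of = entity_of(response)
    mentions = list(key_of)

    # Per-mention precision: overlap of the two entities containing the
    # mention, normalized by the size of the response entity.
    precision = sum(
        len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions
    ) / len(mentions)
    # Per-mention recall: same overlap, normalized by the key entity size.
    recall = sum(
        len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: key has entities {1,2,3} and {4,5};
# the system response splits/merges them differently.
p, r, f = b_cubed([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])
```

In this example both precision and recall come out to 11/15: mentions 1 and 2 sit in a pure response entity (precision 1 each) while mention 3 drags down both scores, illustrating how B-cubed penalizes each misplaced mention individually rather than counting links as MUC does.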
Key words
Evaluation metrics, NLP evaluation, Shared tasks