Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction.
CoRR (2023)
Abstract
ChatGPT has demonstrated impressive performance in various downstream tasks.
However, in the Chinese Spelling Correction (CSC) task, we observe a
discrepancy: while ChatGPT performs well under human evaluation, it scores
poorly according to traditional metrics. We believe this inconsistency arises
because the traditional metrics are not well-suited for evaluating generative
models. Their overly strict length and phonics constraints may lead to
underestimating ChatGPT's correction capabilities. To better evaluate
generative models in the CSC task, this paper proposes a new evaluation metric:
Eval-GCSC. By incorporating word-level and semantic similarity judgments, it
relaxes the stringent length and phonics constraints. Experimental results show
that Eval-GCSC closely aligns with human evaluations. Under this metric,
ChatGPT's performance is comparable to traditional token-level classification
models (TCM), demonstrating its potential as a CSC tool. The source code and
scripts can be accessed at https://github.com/ktlKTL/Eval-GCSC.
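
To make the length constraint concrete, the following is a minimal sketch of a strict, position-wise check. It is not the Eval-GCSC metric and not the authors' code. It assumes the common convention that traditional CSC scoring compares source and output character by character and therefore requires equal lengths. The function names and the example sentences are hypothetical, chosen only for illustration.

```python
def strict_char_edits(source, prediction):
    """Return the set of (position, new_char) edits, or None if lengths differ.

    Assumption: a traditional metric aligns characters by position, so an
    output whose length differs from the source cannot be aligned at all.
    """
    if len(source) != len(prediction):
        return None
    return {(i, p) for i, (s, p) in enumerate(zip(source, prediction)) if s != p}


def strict_sentence_correct(source, prediction, reference):
    """Count a prediction as correct only if its edits equal the gold edits."""
    predicted = strict_char_edits(source, prediction)
    gold = strict_char_edits(source, reference)
    return predicted is not None and predicted == gold


# Hypothetical example: "高心" is a misspelling of "高兴".
source = "我今天很高心"
reference = "我今天很高兴"
same_length_output = "我今天很高兴"    # fixes the typo, keeps the length
generative_output = "我今天非常高兴"   # fixes the typo, rephrases "很" as "非常"

tcm_ok = strict_sentence_correct(source, same_length_output, reference)
gen_ok = strict_sentence_correct(source, generative_output, reference)
```

Under this strict check the generative output is scored as wrong even though the spelling error was fixed, which is the kind of underestimation the abstract attributes to traditional metrics.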