TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models
arXiv (2024)
Abstract
The recent unprecedented advancements in Large Language Models (LLMs) have
propelled the medical community forward by establishing advanced medical-domain models.
However, due to the limited collection of medical datasets, there are only a
few comprehensive benchmarks available to gauge progress in this area. In this
paper, we introduce TCMD, a new medical question-answering (QA) dataset that
contains a large body of manual instructions for solving Traditional Chinese
Medicine (TCM) examination tasks. Specifically, TCMD collects a large number of questions across
diverse domains with their annotated medical subjects and thus supports us in
comprehensively assessing the capability of LLMs in the TCM domain. Extensive
evaluation of various general LLMs and medical-domain-specific LLMs is
conducted. Moreover, we analyze the robustness of current LLMs in solving
TCM QA tasks by introducing randomness. The inconsistency of the experimental
results reveals the shortcomings of current LLMs in solving QA tasks. We
expect that our dataset can further facilitate the development of LLMs in
the TCM area.
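
The abstract does not say how randomness is introduced. One common way to probe robustness on multiple-choice QA is to shuffle the answer options and check whether a model still selects the same option text. The sketch below illustrates that idea only; it is an assumption, not the authors' protocol. The names `shuffle_options`, `consistency`, `oracle`, and `always_first`, along with the placeholder question and options, are hypothetical stand-ins for a real LLM call and real TCMD items.

```python
import random
from collections import Counter


def shuffle_options(options, answer_idx, rng):
    """Return a shuffled copy of the options and the new index of the gold answer."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(answer_idx)


def consistency(question, options, answer_idx, model, n_trials=5, seed=0):
    """Query `model` on several shuffled copies of one question.

    `model(question, options)` is a hypothetical callable returning the index
    of the chosen option. Returns (accuracy, agreement), where agreement is
    the share of trials that picked the most frequently chosen option text.
    Assumes option texts are distinct.
    """
    rng = random.Random(seed)
    picked_texts = []
    n_correct = 0
    for _ in range(n_trials):
        shuffled, gold = shuffle_options(options, answer_idx, rng)
        pred = model(question, shuffled)
        picked_texts.append(shuffled[pred])
        n_correct += int(pred == gold)
    top_count = Counter(picked_texts).most_common(1)[0][1]
    return n_correct / n_trials, top_count / n_trials


# Placeholder item; real questions would come from the dataset.
QUESTION = "placeholder question"
OPTIONS = ["option A", "option B", "option C", "option D"]
GOLD = 2  # index of the correct option in OPTIONS


def oracle(question, options):
    """Toy model that always finds the gold option text, wherever it lands."""
    return options.index(OPTIONS[GOLD])


def always_first(question, options):
    """Toy position-biased model that always picks the first option."""
    return 0


if __name__ == "__main__":
    print("oracle:", consistency(QUESTION, OPTIONS, GOLD, oracle))
    print("always_first:", consistency(QUESTION, OPTIONS, GOLD, always_first))
```

A model that tracks option content, like `oracle`, keeps full agreement under shuffling. A position-biased one, like `always_first`, generally does not, which is the kind of inconsistency such a check is meant to surface.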