Trends in accuracy and appropriateness of alopecia areata information obtained from a popular online large language model, ChatGPT.

Dermatology (Basel, Switzerland) (2023)

Abstract
Patients with alopecia areata (AA) may consult a wide range of sources for information about the condition, including the recently developed ChatGPT. Assessing the quality of health information these sources provide is crucial, as patients are using them in increasing numbers. This study aimed to evaluate the appropriateness and accuracy of ChatGPT-generated responses to common patient questions about AA. Responses generated by ChatGPT 3.5 and ChatGPT 4.0 to 25 questions addressing common patient concerns were assessed for appropriateness and accuracy by multiple attending dermatologists at an academic center. Appropriateness was measured for two hypothetical use contexts: 1) patient-facing general information websites, and 2) electronic health record (EHR) message drafts. The mean accuracy score across all responses was 4.41 out of 5; responses from ChatGPT 3.5 had a mean accuracy of 4.29, whereas those from ChatGPT 4.0 had a mean of 4.53. Appropriateness ratings ranged from 100% of responses in the general question category rated appropriate to 79% of responses to management questions rated appropriate for an EHR message draft. Raters largely preferred responses generated by ChatGPT 4.0 over those from ChatGPT 3.5. Reviewer agreement was moderate across all questions, with 53.7% agreement and a Fleiss' κ coefficient of 0.522 (p < 0.001). ChatGPT produced mostly appropriate information for common patient concerns. Although not all responses were accurate, the improvement seen with the newer model suggests potential future utility for patients and dermatologists.
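As an illustrative aside (not taken from the paper), the Fleiss' κ statistic reported above quantifies agreement among a fixed number of raters assigning categorical ratings to the same items. A minimal Python sketch of such a calculation, using statsmodels and entirely hypothetical ratings in place of the study's data, might look like the following.

    # Illustrative sketch only: hypothetical ratings, not the study's data.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Hypothetical setup mimicking the study design: 25 questions (rows)
    # rated by 4 dermatologists (columns) on a 1-5 scale.
    rng = np.random.default_rng(0)
    ratings = rng.integers(3, 6, size=(25, 4))  # placeholder scores of 3-5

    # aggregate_raters converts subject-by-rater data into
    # subject-by-category counts, the input form fleiss_kappa expects.
    counts, _ = aggregate_raters(ratings)
    kappa = fleiss_kappa(counts, method="fleiss")
    print(f"Fleiss' kappa: {kappa:.3f}")

With real rater data, a κ around 0.5, as reported in the abstract, would conventionally be read as moderate agreement.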
Keywords
Alopecia areata, AA, artificial intelligence, ChatGPT