Performance of AI Chatbots on Controversial Topics in Oral Medicine, Pathology, and Radiology

Hossein Mohammad-Rahimi, Zaid H Khoury, Mina Iranparvar Alamdari, Rata Rokhshad, Parisa Motie, Azin Parsa, Tiffany Tavares, James J Sciubba, Jeffery B Price, Ahmed S Sultan

Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology (2024)

Abstract
Objectives: In this study, we assessed the responses of six artificial intelligence (AI) chatbots (Bing, GPT-3.5, GPT-4, Google Bard, Claude, Sage) to controversial and difficult questions in oral pathology, oral medicine, and oral radiology.

Study Design: The chatbots' answers were evaluated by board-certified specialists using a modified version of the global quality score on a 5-point Likert scale. The quality and validity of chatbot citations were also evaluated.

Results: In oral pathology and oral medicine, Claude had the highest mean score (4.341 ± 0.582) and Bing the lowest (3.447 ± 0.566). In oral radiology, GPT-4 had the highest mean score (3.621 ± 1.009) and Bing the lowest (2.379 ± 0.978). Across all disciplines, GPT-4 achieved the highest mean score (4.066 ± 0.825). Of the 349 citations generated by the chatbots, 82 (23.50%) were fabricated.

Conclusions: GPT-4 provided the highest-quality information on controversial topics across the dental disciplines examined. Although most chatbots performed well, given the relatively high number of fabricated citations, we suggest that developers of AI medical chatbots incorporate scientific citation authenticators to validate the outputted citations.
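As a minimal sketch of the "scientific citation authenticator" the conclusions call for, each DOI a chatbot emits could be checked against the public Crossref REST API: a DOI that Crossref cannot resolve is a strong signal of a fabricated citation. This is an illustrative assumption, not the verification procedure used in the study, and the example DOIs below are hypothetical.

```python
import requests

CROSSREF_WORKS = "https://api.crossref.org/works/"

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if Crossref has a record for the given DOI.

    A 404 from Crossref means the DOI is unknown, which strongly
    (though not conclusively) suggests a fabricated citation.
    """
    try:
        resp = requests.get(CROSSREF_WORKS + doi, timeout=timeout)
    except requests.RequestException:
        return False  # network error: treat as unverified rather than fake
    return resp.status_code == 200

# Flag suspect citations from a chatbot transcript (hypothetical DOIs).
citations = ["10.1016/j.oooo.2024.01.001", "10.9999/fake.doi.12345"]
for doi in citations:
    status = "verified" if doi_resolves(doi) else "flag for manual review"
    print(f"{doi}: {status}")
```

DOI resolution alone would miss real DOIs attached to the wrong paper, so a fuller authenticator would also compare the metadata Crossref returns (title, authors, year) against the citation text the chatbot produced.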
Keywords
Artificial intelligence, Chatbots, Machine Learning, Oral and Maxillofacial Pathology, Oral Medicine, Oral and Maxillofacial Radiology