Enhancing Health Information Retrieval with Large Language Models: A Study on MedQuAD Dataset.

Prajwol Lamichhane, Indika Kahanda

International Conference on Machine Learning and Applications (2023)

Abstract
Given the enormous volume of textual data generated in the healthcare sector, effective and accurate retrieval systems are essential. The explosive growth of scientific publications and medical information presents a major challenge. In this study, a pipeline was developed to enhance information retrieval in the healthcare domain. The pipeline has two components: information retrieval and evaluation. The information retrieval component is composed of the retriever, which uses BM25, and the reader, which is powered by a pre-trained Large Language Model. The evaluation component, which uses standardized dataset formats such as SQuAD, provides a framework for evaluating system performance and comparing different parameters. Evaluated on the Cancer category of the MedQuAD dataset, the retriever component achieved a recall of 0.881 and a Mean Reciprocal Rank of 0.804, demonstrating that it retrieves relevant information and ranks it accurately. The reader component achieved a Semantic Answer Similarity score of 0.677, which indicates room for improvement. This work has implications for healthcare providers and the text-mining community working in health information retrieval.
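
The retriever stage and its two reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the tokenization, the BM25 parameters, or the retrieval cutoff, so the whitespace tokenizer, the values k1=1.5 and b=0.75, the cutoff top_k=2, and the three-document toy corpus below are all assumptions chosen for illustration. The sketch shows one common form of BM25 scoring, then recall (the fraction of queries whose gold document appears in the top-k results) and Mean Reciprocal Rank (the average of 1/rank of the gold document, counting 0 when it is not retrieved).

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the study's preprocessing is not stated).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer; k1 and b are assumed defaults, not the paper's values."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Smoothed IDF variant that stays positive for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query, top_k):
        order = sorted(range(len(self.tfs)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:top_k]


def recall_at_k(rankings, gold):
    # Fraction of queries whose gold document id appears in the retrieved list.
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked)
    return hits / len(gold)


def mean_reciprocal_rank(rankings, gold):
    # Average of 1/rank of the gold document; contributes 0 when it was not retrieved.
    total = 0.0
    for ranked, g in zip(rankings, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)


# Hypothetical toy corpus and queries, not taken from MedQuAD.
docs = [
    "breast cancer symptoms include a lump in the breast",
    "lung cancer often results from smoking",
    "treatment options for skin cancer include surgery",
]
queries = [
    "what are the symptoms of breast cancer",
    "what causes lung cancer",
    "how is skin cancer treated",
]
gold = [0, 1, 2]  # index of the relevant document for each query

retriever = BM25(docs)
rankings = [retriever.rank(q, top_k=2) for q in queries]
print("recall:", recall_at_k(rankings, gold))
print("MRR:", mean_reciprocal_rank(rankings, gold))
```

On a real collection, the same two functions would be applied to the retriever's ranked lists for every question in the evaluation set. The toy corpus is far too small to say anything about the scores reported above.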
Keywords
health information retrieval, BM25, pre-trained large language models, MedQuAD