Creating a Labeled Dataset for Medical Misinformation in Health Forums

Alexander Kinsora, Kate Barron, Qiaozhu Mei, V. G. Vinod Vydiswaran

2017 IEEE International Conference on Healthcare Informatics (ICHI), 2017

Abstract
The dissemination of medical misinformation online poses a challenge to human health. Machine learning techniques offer an opportunity to reduce the cognitive load of deciding whether a given user comment is likely to contain misinformation, but the scarcity of labeled medical-misinformation data makes supervised approaches difficult. To address this, we present a new labeled dataset of misinformative and non-misinformative comments drawn from questions and comments posted on a health discussion forum. Building it required extracting candidate misinformative entries from the corpus using information retrieval techniques, developing a codex and labeling strategy for the dataset, and creating features for use in machine learning tasks. Using Recursive Feature Elimination to identify the nine features most descriptive of the misinformative versus non-misinformative distinction, we achieved a classification accuracy of 90.1% on a dataset in which 85.8% of comments are non-misinformative (so always predicting the majority class would already yield 85.8%). We believe this dataset and analysis will aid the machine learning community in developing systems that classify online misinformation in user-generated content such as medical forum posts.
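The abstract combines two technical points: Recursive Feature Elimination (RFE) narrows the feature set to nine, and the 90.1% accuracy should be read against an 85.8% majority-class baseline. The following is a minimal, hypothetical sketch of both ideas under stated assumptions; it is not the authors' code. The abstract does not say which estimator drove the elimination, what the candidate features were, or how evaluation was split, so the logistic-regression estimator, the synthetic 15-feature data, the train/test split, and all function names (`train_logreg`, `rfe`, `accuracy`, `majority_baseline`) are illustrative assumptions.

```python
import math
import random


def _sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=100, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def rfe(X, y, n_features_to_select):
    """Recursive feature elimination: refit the model, drop the feature with
    the smallest |coefficient|, and repeat until the target count remains."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_features_to_select:
        Xs = [[row[j] for j in remaining] for row in X]
        w, _ = train_logreg(Xs, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining


def accuracy(X, y, w, b):
    # Predict class 1 when the linear score is positive (probability > 0.5).
    hits = 0
    for xi, yi in zip(X, y):
        score = b + sum(wj * xj for wj, xj in zip(w, xi))
        hits += int((score > 0) == (yi == 1))
    return hits / len(y)


def majority_baseline(y):
    # Accuracy obtained by always predicting the more frequent class.
    pos = sum(y)
    return max(pos, len(y) - pos) / len(y)


# Hypothetical synthetic data: 15 standardized features, of which only the
# first 9 carry signal; the threshold yields roughly 14% positive labels.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(15)] for _ in range(200)]
y = [1 if sum(row[:9]) + rng.gauss(0.0, 1.0) > 3.4 else 0 for row in X]

# Select features on the training split only, then score held-out rows.
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
selected = rfe(X_train, y_train, n_features_to_select=9)
w, b = train_logreg([[r[j] for j in selected] for r in X_train], y_train)
acc = accuracy([[r[j] for j in selected] for r in X_test], y_test, w, b)
print(f"selected={selected} accuracy={acc:.3f} "
      f"majority baseline={majority_baseline(y_test):.3f}")
```

Reporting the majority baseline next to accuracy makes the gain explicit: on a class split like the one described (85.8% non-misinformative), `majority_baseline` returns 0.858, so the meaningful margin is the distance above that figure. Where scikit-learn is available, `sklearn.feature_selection.RFE(estimator, n_features_to_select=9)` wraps the same elimination loop around any estimator that exposes `coef_` or `feature_importances_`.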
Keywords
non-misinformative comments, machine learning community, online misinformation classification system, medical forum posts, labeled dataset, human health, machine learning techniques, health discussion forum, codex, labeling strategy, information retrieval techniques, misinformative comments, online medical misinformation dissemination, user comment, candidate misinformative entries extraction, recursive feature elimination