A platform for connecting social media data to domain-specific topics using large language models: an application to student mental health

JAMIA Open (2024)

Abstract
Objectives: To design a novel artificial intelligence-based software platform that lets users analyze text data by identifying coherent topics, as well as the parts of the data related to a specific research theme of interest (TOI).

Materials and Methods: Our platform uses state-of-the-art unsupervised natural language processing methods, built on top of a large language model, to analyze social media text data. At the center of the platform is BERTopic, which clusters social media posts and represents each resulting cluster as a collection of words describing a distinct topic. A key feature of the platform is its ability to identify whole sentences corresponding to topic words, which greatly improves its ability to perform downstream similarity operations with respect to a user-defined TOI.

Results: We performed two case studies on mental health among university students to demonstrate the platform's utility, focusing on depression-related signals in social media (Reddit) data and their connection to various emergent themes within the data.

Discussion and Conclusion: Our platform gives researchers a readily available and inexpensive tool for parsing large quantities of unstructured, noisy data into coherent themes and for identifying the portions of the data related to the research TOI. Although development focused on mental health themes, we believe the platform generalizes to other research domains as well.

Lay Summary
We present a novel artificial intelligence platform that allows researchers to study large, unstructured, and incoherent bodies of text using state-of-the-art natural language processing (NLP) tools. The platform uses unsupervised NLP methods to structure the text and modern large language models to understand whole sentences rather than individual words.
With this platform, researchers can investigate a chosen theme of interest (TOI), identifying portions of the text related to that theme as well as other topics and themes correlated with it. Mental health in the student population is a common research interest, and we demonstrate the platform's functionality through two case studies in which we identify depression-related themes in student social media data from Reddit. We also report secondary topics of discussion correlated with the TOI, which offer insight into the context behind the detected depression-related themes.
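The Materials and Methods description pairs BERTopic topic words with whole sentences and then compares those sentences against a user-defined TOI. The following is a minimal Python sketch of that sentence-level step under stated assumptions, not the authors' implementation. It assumes posts have already been clustered and topic words extracted (for example by BERTopic), and it substitutes a toy bag-of-words vector for the large-language-model sentence embeddings the platform actually relies on. All function names (`split_sentences`, `embed`, `sentences_for_topic`, `rank_by_toi`) and the example posts are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def embed(text):
    """HYPOTHETICAL stand-in for an LLM sentence embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentences_for_topic(posts, topic_words):
    """Return every sentence in `posts` that contains at least one topic word."""
    words = {w.lower() for w in topic_words}
    return [
        sentence
        for post in posts
        for sentence in split_sentences(post)
        if words & set(tokenize(sentence))
    ]


def rank_by_toi(sentences, toi):
    """Score each sentence against the TOI description, most similar first."""
    toi_vec = embed(toi)
    scored = [(cosine(embed(s), toi_vec), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    posts = [
        "I can't sleep before exams. Finals week is brutal.",
        "The dining hall food is great. I feel hopeless and sad about exams.",
    ]
    hits = sentences_for_topic(posts, ["exams", "finals"])  # toy topic words
    for score, sentence in rank_by_toi(hits, "feeling sad and hopeless"):
        print(f"{score:.3f}  {sentence}")
```

Working at the sentence level means the TOI is compared with complete statements rather than isolated topic words, which is the motivation the abstract gives for this feature. Replacing `embed` with a real sentence-embedding model (and adapting `cosine` to dense vectors) would leave the retrieval and ranking logic unchanged.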
Keywords
natural language processing, artificial intelligence, mental health, topic modeling, social media