Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content.

RCIS(2021)

Abstract
Recent research on multi-dimensional design has combined business data with User-Generated Content (UGC), integrating new analytical aspects such as users' behavior, sentiments, opinions, and topics of interest to improve decision analysis. In this paper, we address the complexity of designing a topic dimension schema, which stems from the dynamicity and heterogeneity of its hierarchies. Researchers have partially addressed this issue by offering technical solutions for topic detection, without focusing on the Extraction, Transformation and Loading (ETL) process that allows the detected topics to be integrated into a multi-dimensional schema. Our contribution is a model of the ETL steps that generate valid topic dimension hierarchies from informal UGC texts. We propose a generic ETL4SocialTopic process model that defines a set of operations executed in a specific order. The implementation of these steps provides a set of customized jobs that simplify the ETL designer's work by automating a large part of the process. Experimental results show that ETL4SocialTopic is consistent in designing valid topic dimension schemas in several contexts.
Keywords
User-generated content, Topic hierarchy design, Data warehouse, ETL process, Twitter, Social media