BARD: Bangla Article Classification Using a New Comprehensive Dataset

2018 International Conference on Bangla Speech and Language Processing (ICBSLP), 2018

Abstract
Automated Bangla article classification has been studied in the literature, and several supervised learning models have been proposed that rely on a large textual corpus. Although comprehensive textual datasets are available for many other languages, only a few small datasets have been curated for Bangla. As a result, few works address the Bangla document classification problem, and for lack of sufficient training data these approaches have not been able to learn sophisticated supervised models. In this work, we curated a large dataset of Bangla articles from different news portals, containing around 376,226 articles. This large, diverse dataset let us train several supervised learning models using textual features such as word embeddings and TF-IDF. Our learning models show promising performance on the curated dataset compared with state-of-the-art work in Bangla article classification. Furthermore, we deployed the proposed Bangla content classifier as a web application at bard2018.pythonanywhere.com, and a video demo of the application is available at bit.ly/BARD_VIDEO_DEMO. Additionally, we open-sourced the BARD dataset (bit.ly/BARD_DATASET) and the source code of this work (bit.ly/BARD_SC).
Keywords
Document Classification,Machine Learning,Bangla Article Dataset
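
The abstract names TF-IDF as one of the textual features used for supervised classification. As a rough illustration of that general idea only, the sketch below builds TF-IDF vectors in plain Python and classifies a text by its nearest class centroid. It is not the authors' pipeline: the four-document corpus, the two labels, the whitespace tokenizer, the smoothed IDF formula, and the nearest-centroid classifier are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not taken from the BARD dataset).
# Glosses: cricket/football/team/game vs. bank/money/market/economy.
TRAIN = [
    ("ক্রিকেট দল খেলা", "sports"),
    ("ফুটবল দল খেলা", "sports"),
    ("ব্যাংক টাকা বাজার", "economy"),
    ("অর্থনীতি বাজার টাকা", "economy"),
]


def tokenize(text):
    # Assumption: whitespace splitting is enough for this toy corpus.
    return text.split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # L2-normalised TF-IDF vector; terms unseen during fitting are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def fit_centroids(train, idf):
    # Average the TF-IDF vectors of each class.
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in train:
        sizes[label] += 1
        for t, v in tfidf(text, idf).items():
            sums[label][t] += v
    return {
        label: {t: v / sizes[label] for t, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, idf, centroids):
    # Pick the class whose centroid has the largest dot product with the text.
    vec = tfidf(text, idf)

    def score(label):
        return sum(v * centroids[label].get(t, 0.0) for t, v in vec.items())

    return max(sorted(centroids), key=score)


idf = fit_idf([text for text, _ in TRAIN])
centroids = fit_centroids(TRAIN, idf)
print(predict("ফুটবল খেলা", idf, centroids))  # sports
```

A real system trained on a corpus of this size would more plausibly use an established library and a stronger classifier; the nearest-centroid rule is used here only because it keeps the example dependency-free.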