Content-based Multiclass Classification on Indonesian SMS Messages

Hasna Roihan Nafiisah, Fariska Zakhralativa Ruskanda

2022 International Symposium on Electronics and Smart Devices (ISESD) (2022)

Abstract
Short Message Service (SMS) is a text-based communication service that does not require an internet connection and is available on most cellular phones worldwide, including in Indonesia. SMS is used for multiple purposes, ranging from ads and notifications to daily conversations. The convenience of SMS also comes with risks, as fraudulent messages are commonly sent to phones. Some types of SMS texts are sent in large volumes, which makes it difficult for phone users to find certain types of messages. In this research, SMS texts are classified into 4 content types: ads, information, fraud, and regular. Both shallow learning and deep learning methods are used to classify the text messages, including logistic regression, decision tree, CharCNN (Zhang et al., 2015), McM (Shakeel et al., 2019), and the pretrained model IndoBERT (Wilie et al., 2020). In the experiments, IndoBERT-base-p2 outperformed the other models with a macro-F1 score of 94.05%. In addition to prediction quality, the storage size and inference time of the best model were also analyzed on mobile devices. Deployment on Android phones shows that the IndoBERT model requires 241.34 MB of storage, with an average inference time of 0.2279 seconds on a Samsung Galaxy A52s 5G and 0.789 seconds on a Vivo Y65.
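For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-F1 score can be computed over the four content types named in the abstract. It is an illustration only, not the authors' evaluation code: the function name `macro_f1`, the label strings, and the toy predictions are assumptions made for this example.

```python
from collections import Counter

# The four content types named in the abstract (label strings are assumed).
LABELS = ["ads", "information", "fraud", "regular"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data, invented purely for illustration.
y_true = ["ads", "ads", "fraud", "regular", "information"]
y_pred = ["ads", "fraud", "fraud", "regular", "information"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, macro-F1 does not let a frequent class (for example, high-volume ads) mask poor performance on a rarer one such as fraud.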
Keywords
SMS, Indonesian language, classification