Developing a Singlish Neural Language Model using ELECTRA

Galangkangin Gotera, Radityo Eko Prasojo, Yugo Kartono Isal

2022 International Conference on Advanced Computer Science and Information Systems (ICACSIS)

Abstract
We develop and benchmark a Singlish pretrained neural language model. To this end, we build a novel 3 GB Singlish free-text dataset collected from various Singaporean websites. We then leverage ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to train a transformer-based Singlish language model; ELECTRA is chosen for its resource efficiency, which helps ensure reproducibility. We further build two Singlish text classification datasets, for sentiment analysis and language identification, and use them to fine-tune our ELECTRA model and benchmark the results against other available pretrained models in English and Singlish. Our experiments show that our Singlish ELECTRA model is competitive with the best open-source models we found, despite being pretrained in significantly less time. We publicly release the benchmarking dataset.
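To illustrate the fine-tuning step the abstract describes, the following is a minimal sketch using the Hugging Face Transformers API. The checkpoint name, example sentences, and labels are placeholders: the paper does not specify a released model ID, so a generic ELECTRA discriminator checkpoint stands in for the Singlish one.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; a Singlish ELECTRA model would be substituted here.
MODEL_NAME = "google/electra-small-discriminator"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head, e.g. for the sentiment analysis task described in the paper.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Illustrative Singlish examples with made-up labels (1 = positive, 0 = negative).
texts = ["This laksa damn shiok lah!", "Wah the queue so long, sian."]
labels = torch.tensor([1, 0])

# Tokenize, run a forward pass, and compute the classification loss.
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**enc, labels=labels)
out.loss.backward()  # one gradient step; wrap in an optimizer loop for real fine-tuning
print(out.logits.shape)  # (batch_size, num_labels)
```

The same pattern applies to the language identification task by changing `num_labels` and the label set; only the classification head differs between the two benchmarks.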
Keywords
Singlish, ELECTRA, language model pretraining, benchmarking dataset