In-idris: modification of idris stemming algorithm for indonesian text

Febiarty Wulan Suci,Nur Hayatin,Yuda Munarko

IIUM ENGINEERING JOURNAL(2022)

引用 0|浏览3
暂无评分
摘要
Stemming has an important role in text processing. Stemming of each language is different and strongly affected by the type of text language. Besides that, each language has different rules in the use of words with an affix. A large number of the words used in the Indonesian language are formed by combining root words with affixes and other combining forms. One of the problems in Indonesian stemming is having different types of affixes, and also having some prefixes that changes according to the first letters of the root words. Implementing Idris stemmer for Indonesian text is of interest because Indonesia and Malaysia have the same language root. However, the results do not always produce the actual word, because the Idris algorithm first removes the prefix according to Rule 2. This elimination directly affected the Idris stemmer result when implemented to Indonesian text. In this study, we focus on a modified Idris stemmer (from Malay) to IN-Indris with Indonesia context. In order to test the proposed modification to the original algorithm, Indonesian online novels excerpts are used to measure the performance of IN-Idris. A test was conducted to compare the proposed algorithm with other stemmers. From the experiment result, IN-Idris had an accuracy of approximately 82.81%. There was an increased accuracy up to 5.25% when compared to Idris accuracy. Moreover, the proposed stemmer is also running faster than Idris with a gap of speed of around 0.25 seconds.
更多
查看译文
关键词
Idris stemming, IN-Idris, NLP, text pre-processing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要