Building NLP Tools to Process Sinhala Text Data Written using English Letters

W.S.A. Kurera, R.K. Rajapaksha, H.A.P. Rupasinghe, K.L.D.U.B. Liyanage, Sanvitha Kasthuriarachchi, Samantha Rajapakshe

2022 22nd International Conference on Advances in ICT for Emerging Regions (ICTer) (2022)

Abstract
Sinhala and English both hold a prominent place in Sri Lanka, yet most people are not fluent in both languages when using digital platforms such as social media. Although most Sri Lankans write in Singlish (Sinhala written using English letters), there is no proper application for translating it into the country's two main languages, Sinhala and English. This research aims to draw researchers' attention to the sub-varieties of the Sinhala language, especially Singlish, by bridging the gap between the NLP tools available for processing text in the main languages and those available for their varieties. The work has four components: translating Singlish text into English, translating Singlish into Sinhala, sentiment analysis of Singlish text, and mapping emojis to Singlish tokens.

Translation tools exist for many languages, such as Russian, Italian, and Japanese, into English, but there are no proper tools for translating Singlish, the sub-language most widely used in Sri Lanka. The proposed solution is therefore a Singlish-to-English translation model that also removes slang words, so that NLP technologies built for English can be used to process Singlish text. The model is intended to translate Singlish words used on social media into English; any slang words found during translation are identified and removed. Translating Singlish text into English would let users apply existing English-language tools for plagiarism detection, summarization, and paraphrasing.
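The abstract does not describe how the translation model works internally. As a minimal sketch of the translate-and-remove-slang step, assuming a plain word-by-word dictionary lookup (not the authors' model), the example below drops tokens found in a slang list, translates known tokens, and passes unknown tokens through unchanged. The names `SINGLISH_TO_ENGLISH`, `SLANG`, and `translate_singlish`, and the tiny lexicons, are hypothetical and for illustration only.

```python
import re

# Hypothetical toy lexicons for illustration only; the paper's actual
# dictionaries, models, and slang list are not given in the abstract.
SINGLISH_TO_ENGLISH = {
    "mama": "I",
    "gedara": "home",
    "yanawa": "go",
    "oya": "you",
}
SLANG = {"machan", "ado"}


def translate_singlish(text: str) -> str:
    """Word-by-word lookup: remove slang tokens, translate known tokens,
    and keep unknown tokens unchanged."""
    tokens = re.findall(r"[a-z]+", text.lower())
    out = []
    for tok in tokens:
        if tok in SLANG:
            continue  # token identified as slang, so it is removed
        out.append(SINGLISH_TO_ENGLISH.get(tok, tok))
    return " ".join(out)


print(translate_singlish("Machan mama gedara yanawa"))  # I home go
```

The output "I home go" shows the main weakness of pure lookup: Sinhala is verb-final, so a usable translator must also reorder words (and handle spelling variation in Singlish), which is what a learned translation model would be expected to do.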
Keywords
Machine learning, Natural Language Processing, Sentiment analysis, Feature Engineering, API facilities