Towards the Creation of the Filipino Wordnet: A Two-Way Approach.

Briane Paul Samson, Charibeth Cheng, Unisse C. Chua, Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Robi Jeanne Bangonon, Christine Deticio, Sharmaine Gaw, Danielle Kirsten Sison, Criscela Ysabelle Racelis, James Kevin Lin, Mark Edward M. Gonzales, Phoebe Clare Ong

2023 International Conference on Asian Language Processing (IALP), 2023

Abstract
As databases of lexical information on words and their lexical relationships, WordNets are important for various downstream natural language processing applications. However, the construction of WordNets can be challenging, especially for low-resource languages such as Filipino. The existing Filipino WordNet has not been maintained, and lacks contextual information for identifying the evolution of word senses. In this study, we built a corpus of 5,370,667 unique tokens and used it to construct a Filipino WordNet via a two-way approach that combines natural language processing and network science. For the natural language processing approach, we utilized only two linguistic sources: our corpus and a RoBERTa-based language model that generates sentence embeddings. For the network science approach, we created a temporal-multiplex network that represents the co-occurrence of words, their semantic relationships, and their usage in different sources across time. We show that our proposed method can induce existing senses (30% of our validation data, as evaluated by matching with the senses from Princeton WordNet) and generate 9,549 semantic sets.
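The abstract does not spell out how the temporal-multiplex network is built, so the following is only a minimal sketch of one way such a structure could be represented: one co-occurrence layer per (source, time period) pair, with edges weighted by how often two words appear within a sliding window. The function name `build_cooccurrence_layers`, the `window` parameter, and the toy sentences are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_layers(documents, window=2):
    """Build one weighted co-occurrence layer per (source, period).

    documents: iterable of (source, period, tokens) triples.
    Returns a dict mapping (source, period) to a dict that maps an
    undirected edge (word_a, word_b), stored in sorted order, to the
    number of times the two words co-occur within `window` tokens.
    This is an illustrative assumption, not the authors' construction.
    """
    layers = defaultdict(lambda: defaultdict(int))
    for source, period, tokens in documents:
        layer = layers[(source, period)]
        for i, word in enumerate(tokens):
            # Look only forward so each pair is counted once per position.
            for neighbor in tokens[i + 1 : i + 1 + window]:
                if neighbor != word:
                    edge = tuple(sorted((word, neighbor)))
                    layer[edge] += 1
    return layers


# Toy data: three tiny "documents" from two sources across two years.
docs = [
    ("news", 2020, ["ang", "bata", "ay", "masaya"]),
    ("news", 2021, ["ang", "bata", "ay", "malungkot"]),
    ("social", 2021, ["masaya", "ang", "bata"]),
]
layers = build_cooccurrence_layers(docs, window=1)

for key in sorted(layers):
    print(key, dict(layers[key]))
```

Keeping each source and period in its own layer makes it possible to compare how a word's neighbors differ across sources or shift over time, which is the kind of signal the abstract associates with tracking word senses.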
Keywords
wordnet construction, word sense induction, word sense disambiguation, word co-occurrence networks