Dataset construction to detect human behavior with the help of emotions, sentiments and mood for Roman Urdu

Asia Samreen, Syed Asif Ali

Data in Brief (2024)

Abstract
Roman Urdu and English are often mixed as a hybrid language for communication on social media. Because writers pay little attention to spelling when they use the English alphabet to write Urdu in text messages, interpreting emotions in this code-mixed text is challenging. This dataset contains over 14,000 emotion lexicon entries, each annotated with nine different emotions and their polarities. The NRC emotion lexicons [8] available in Urdu have been transliterated into Roman Urdu. To verify the accuracy of the translations, we consulted three online Urdu dictionaries. The Roman Urdu transliterations were produced with a Python script that converts words from Urdu script to Roman Urdu. Sentiment and mood labels, derived from the emotion lexicon, are also provided. The textual data has been annotated using unigram features and string-distance estimation between tokens and lexicon entries. Approximately 10,000 sentences from the baseline sample have been annotated automatically. (c) 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
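The annotation step described in the abstract (unigram features plus string-distance matching against the lexicon) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the abstract does not name the distance measure, so Levenshtein edit distance is assumed; tokens are split on whitespace; the four-entry `LEXICON` and its tags are hypothetical; and labeling sentiment by the majority of positive versus negative tags is an assumed rule.

```python
from collections import Counter

# Hypothetical toy lexicon: Roman Urdu word -> emotion and polarity tags.
# Illustrative only; these entries are NOT taken from the dataset.
LEXICON = {
    "khushi": ("joy", "positive"),
    "gussa": ("anger", "negative"),
    "dar": ("fear", "negative"),
    "pyar": ("joy", "trust", "positive"),
}

POLARITY_TAGS = ("positive", "negative")


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def match_token(token: str, lexicon=LEXICON, max_dist: int = 1):
    """Return the closest lexicon word within max_dist edits, else None.

    Ties keep the first word found in the lexicon's insertion order.
    """
    best, best_d = None, max_dist + 1
    for word in lexicon:
        d = levenshtein(token, word)
        if d < best_d:
            best, best_d = word, d
    return best


def annotate(sentence: str, lexicon=LEXICON, max_dist: int = 1) -> dict:
    """Label a sentence with emotion counts and a majority-vote sentiment."""
    counts = Counter()
    for token in sentence.lower().split():  # unigram features
        word = match_token(token, lexicon, max_dist)
        if word is not None:
            counts.update(lexicon[word])
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        sentiment = "positive"
    elif neg > pos:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    emotions = {tag: n for tag, n in counts.items() if tag not in POLARITY_TAGS}
    return {"emotions": emotions, "sentiment": sentiment}


if __name__ == "__main__":
    print(annotate("mujhe tum se pyaar hai"))
    print(annotate("mujhe bohat ghussa aya"))
```

A threshold of one edit tolerates common Roman Urdu spelling variants such as `pyaar` for `pyar` or `ghussa` for `gussa`, but it also lets short unrelated words collide (`par` is one edit from `dar`), so a length-normalized threshold may be preferable in practice.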
Keywords
Natural language processing, Bilingual text, Text transliteration, Text annotation, Emotion Lexicons