Building a "Corpus of 7 Types Emotion Co-occurrences Words" of Chinese Emotional Words with Big Data Corpus.

International Conference on Human-Computer Interaction (HCI International) (2022)

Past studies have typically built emotion corpora from human ratings, which costs considerable time and money yet yields too few words; the Categorical Approach has also seldom been used for corpus building, which may introduce bias. Study 1 of the present research therefore adopted the Spreading Activation Model as its framework and used a big-data text corpus and word co-occurrences to build a corpus that covers more emotion categories and far more words. First, Study 1 selected words that clearly describe the meaning of, or effectively evoke the feeling of, each of seven emotion categories: Happiness, Surprise, Sadness, Anger, Disgust, Fear, and Love. It then calculated the average co-occurrence between the selected words and the words of the text corpora for each of the seven emotion categories (measure: Baroni-Urbani; unit: chunk), yielding category-wise co-occurrence averages for 33,669 words. These averages represent the conceptual consonance between a word and each emotion. Study 2 examined the practical use of the corpus built in Study 1, using the human-rated C-LIWC (Chinese Linguistic Inquiry and Word Count) dictionary as a comparison. Posts from the Happy Board, Sad Board, and Hate Board of the PTT Bulletin Board System were analyzed for emotion recognition. The correct rates of the two resources differed significantly, and the "Corpus of 7 Types Emotion Co-occurrences Words" built in Study 1 achieved the higher correct rate. The present research thus provides a text corpus as material for emotion research, and the results support the potential of building emotional-word corpora with big-data measures.
Emotional words, Co-occurrence, Chinese, Big data, Corpus
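The co-occurrence scoring described in the abstract can be illustrated with a minimal Python sketch. It uses the standard Baroni-Urbani similarity, S = (sqrt(a*d) + a) / (sqrt(a*d) + a + b + c), where a counts chunks containing both words, b and c count chunks containing only one of them, and d counts chunks containing neither. Everything else here is an assumption made for illustration and is not taken from the paper: the function names, the representation of a chunk as a set of tokens, the toy data (English stand-ins for segmented Chinese tokens), and the choice of a plain arithmetic mean over a category's seed words.

```python
from math import sqrt


def baroni_urbani(a, b, c, d):
    """Baroni-Urbani similarity from a 2x2 presence/absence table.

    a: chunks containing both words, b and c: chunks containing only
    one of the two words, d: chunks containing neither.
    """
    root = sqrt(a * d)
    denom = root + a + b + c
    return (root + a) / denom if denom else 0.0


def cooccurrence(word, seed, chunks):
    """Score one word against one seed word over a list of token sets."""
    a = b = c = d = 0
    for chunk in chunks:
        has_word, has_seed = word in chunk, seed in chunk
        if has_word and has_seed:
            a += 1
        elif has_word:
            b += 1
        elif has_seed:
            c += 1
        else:
            d += 1
    return baroni_urbani(a, b, c, d)


def category_scores(word, seeds_by_emotion, chunks):
    """Average the word's score over each category's seed words (assumed mean)."""
    return {
        emotion: sum(cooccurrence(word, s, chunks) for s in seeds) / len(seeds)
        for emotion, seeds in seeds_by_emotion.items()
    }


# Hypothetical toy data: each chunk is the set of tokens it contains.
chunks = [{"laugh", "happy"}, {"laugh", "glad"}, {"cry", "sad"}, {"rain"}]
seeds_by_emotion = {"Happiness": ["happy", "glad"], "Sadness": ["sad"]}

scores = category_scores("laugh", seeds_by_emotion, chunks)
print(scores)
```

In this toy example "laugh" shares chunks only with the Happiness seeds, so its Happiness average is higher than its Sadness average; a word's profile across all seven categories would be built the same way.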