TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)

Abstract
Although conceptualization has been widely studied in semantics and knowledge representation, it remains challenging to find the most accurate concept terms to tag fast-growing social media content. One reason is that most traditional knowledge bases contain general terms, such as trees and cars, which are not interesting to users and lack the defining power needed for social media content. Another is that the intricate use of tense, negation, and grammar in social media content may change the logic or emphasis of the content, and thus its main idea. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from open-domain social media content. The concepts we provide are trending terms on social media and have the right granularity to define user interests, e.g., "highly educated actors" instead of just "actors". In addition, TAG offers a concept graph that interconnects these fine-grained concepts and entities to provide contextual information. We evaluate a wide range of neural text matching models as well as pre-trained language models on the concept matching task on TAG, and point out that they are insufficient for tagging social media content in a way that characterizes its main idea. We further propose a novel graph-graph matching framework that demonstrates superior abstraction and generalization performance by better utilizing both the structural information in the concept graph and the logical interactions between semantic units in the natural language sentence via syntactic dependency parsing.
Key words
concept, social media, content