Exploring Twitter by Combining Structured and Unstructured Information

PROCESAMIENTO DEL LENGUAJE NATURAL (2015)

Abstract
In this paper we show how useful knowledge can be extracted from Twitter's structured information to improve the results of an NLP task. Tweets are short and of low quality, which makes it difficult to apply classical NLP techniques to this kind of text. However, Twitter offers more to work with than the 140 characters of its messages. The Twitter ecosystem contains many objects (tweets, hashtags, users, words, ...) and relationships between them (co-occurrence, mentions, retweets, ...) that allow us to experiment with alternative processing techniques. In this paper we have worked on a tweet classification task. Using only the knowledge extracted from the Follow relationship, we achieve results similar to those of a classifier based on bags of words. When we combine the knowledge from both sources, we improve the results by more than 13 percentage points with respect to the original models. This shows that structured information is not only a good source of knowledge but is also complementary to the content of the messages. We have also applied the same philosophy to the task of collecting the corpus for our classification task. In this case we have used a dynamic retrieval technique based on relationships between Twitter entities, which allows us to build a collection of more representative tweets.
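The abstract does not say how the two knowledge sources are combined or which classifier is used. The following is a minimal sketch under stated assumptions, not the authors' method: each tweet gets bag-of-words features plus one binary feature per account its author follows (a hypothetical encoding of the Follow relationship), and a multinomial Naive Bayes classifier (a swapped-in choice) is trained over the joint feature space. Feature names are prefixed (`w:` for words, `f:` for followed accounts) so the two sources never collide. The function names and the toy data are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def bow_features(text):
    # Bag-of-words: lowercase tokens, prefixed "w:" to keep this namespace separate
    return Counter("w:" + tok for tok in re.findall(r"[a-z0-9#@']+", text.lower()))


def follow_features(followed_accounts):
    # Hypothetical structured features: one binary feature per followed account
    return Counter("f:" + acc for acc in {a.lower() for a in followed_accounts})


def combined_features(text, followed_accounts):
    # Combine both sources by taking the union of the two feature spaces
    feats = bow_features(text)
    feats.update(follow_features(followed_accounts))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(samples, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values()) for c, fc in self.feat_counts.items()}
        return self

    def predict(self, feats):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n)  # log prior
            denom = self.totals[c] + self.alpha * v
            for f, k in feats.items():
                if f in self.vocab:  # features never seen in training are ignored
                    score += k * math.log((self.feat_counts[c][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data (invented): (tweet text, accounts the author follows, label)
train = [
    ("great match tonight", ["espn", "fifa"], "sports"),
    ("what a goal", ["espn"], "sports"),
    ("vote in the election", ["reuters", "un"], "politics"),
    ("the debate tonight", ["reuters"], "politics"),
]
X = [combined_features(text, follows) for text, follows, _ in train]
y = [label for _, _, label in train]
clf = NaiveBayes().fit(X, y)

# Same short text, different authors: the Follow features decide the class
print(clf.predict(combined_features("tonight!", ["espn"])))     # sports
print(clf.predict(combined_features("tonight!", ["reuters"])))  # politics
```

With identical text, the prediction changes with the accounts the author follows, which illustrates why structured information can complement message content when tweets carry little text. The toy data shows nothing about the paper's reported gain of more than 13 percentage points.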
Keywords
Tweets retrieval, tweets categorization, structured and unstructured information