NewB: 200,000+ Sentences for Political Bias Detection

arxiv(2020)

引用 0|浏览22
暂无评分
摘要
We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump. While previous datasets have labeled sentences as either liberal or conservative, NewB covers the political views of eleven popular media sources, capturing more nuanced political viewpoints than a traditional binary classification system does. We train two state-of-the-art deep learning models to predict the news source of a given sentence from eleven newspapers and find that a recurrent neural network achieved top-1, top-3, and top-5 accuracies of 33.3%, 61.4%, and 77.6%, respectively, significantly outperforming a baseline logistic regression model's accuracies of 18.3%, 42.6%, and 60.8%. Using the news source label of sentences, we analyze the top n-grams with our model to gain meaningful insight into the portrayal of Trump by media sources.We hope that the public release of our dataset will encourage further research in using natural language processing to analyze more complex political biases. Our dataset is posted at https://github.com/JerryWei03/NewB .
更多
查看译文
关键词
sentences,detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要