A Hybrid Method for Dissimilarity Analysis between Short Text Documents

Ramitha Abeyratne, Cassim Farook

2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer) (2018)

Abstract
Similarity analysis is an extremely popular aspect of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively little research addresses similarity between short, unstructured documents. This work proposes a hybrid approach that uses WordNet path vector cosine angle analysis and Dice coefficient overlap analysis to determine the similarity levels of short texts. A regression model is used to dynamically weight and combine the two individual scores into a single score. This hybrid approach was found to achieve significantly higher accuracy than the Term Frequency-Inverse Document Frequency (TF-IDF) and Dice coefficient techniques.
Keywords
Similarity Analysis, WordNet Path Vector Cosine Angle, Dice Coefficient, Linear Regression
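
The hybrid scoring outlined in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: plain term-count vectors stand in for the WordNet path vectors, tokenisation is a simple whitespace split, and the regression is an ordinary least-squares fit with no intercept. The function names (tokens, dice_coefficient, cosine_similarity, fit_weights, hybrid_score) are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenisation (a simplifying assumption)."""
    return text.lower().split()


def dice_coefficient(a, b):
    """Dice overlap over token sets: 2 * |A & B| / (|A| + |B|)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def cosine_similarity(a, b):
    """Cosine of the angle between term-count vectors.

    Assumption: term counts replace the WordNet path vectors
    named in the abstract, to keep the sketch self-contained.
    """
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_weights(features, targets):
    """Least-squares fit of y ~ w1*x1 + w2*x2 (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    features: list of (cosine_score, dice_score) pairs.
    targets: list of reference similarity labels.
    """
    s11 = sum(x1 * x1 for x1, _ in features)
    s12 = sum(x1 * x2 for x1, x2 in features)
    s22 = sum(x2 * x2 for _, x2 in features)
    t1 = sum(x1 * y for (x1, _), y in zip(features, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(features, targets))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; cannot fit weights")
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def hybrid_score(a, b, weights):
    """Weighted combination of the two individual scores."""
    w_cos, w_dice = weights
    return w_cos * cosine_similarity(a, b) + w_dice * dice_coefficient(a, b)
```

In use, fit_weights would be run once on (cosine, Dice) score pairs with known similarity labels, and the resulting weights passed to hybrid_score for new text pairs. A real reproduction would need the WordNet-based vectors and the regression setup described in the paper itself.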