A Hybrid Method for Dissimilarity Analysis between Short Text Documents
2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer) (2018)
Abstract
Similarity analysis is a widely studied area of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively little research addresses similarity between short, unstructured documents. This work proposes a hybrid approach that uses WordNet path vector cosine angle analysis and Dice co-efficient overlap analysis to determine the similarity of short texts. A regression model dynamically weights the two individual scores and combines them into a single score. This hybrid approach was found to have significantly higher accuracy than the Term Frequency-Inverse Document Frequency (TF-IDF) and Dice co-efficient techniques.
Keywords
Similarity Analysis, WordNet Path Vector Cosine Angle, Dice Co-efficient, Linear Regression
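The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain term-count vectors in place of the WordNet path vectors (whose construction the abstract does not describe), a naive whitespace tokenizer, and hypothetical fixed weights where the paper fits a regression model. All function names and parameters are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def dice_coefficient(a, b):
    """Dice overlap between token sets: 2 * |A & B| / (|A| + |B|)."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a or not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors.

    Assumption: term counts stand in for the WordNet path vectors
    named in the abstract.
    """
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(a, b, w_cosine=0.5, w_dice=0.5, bias=0.0):
    """Linear combination of the two scores.

    Assumption: w_cosine, w_dice and bias are hypothetical placeholders;
    in the paper the weighting comes from a fitted regression model.
    """
    return (w_cosine * cosine_similarity(a, b)
            + w_dice * dice_coefficient(a, b)
            + bias)
```

With the default weights, `hybrid_score("the cat sat", "the cat ran")` averages the two component scores. In practice the weights would be estimated from labelled sentence pairs.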