Sentence Embedding Model Based on Feature Selection

Tian Hongpeng, Jiang Jia

2020 International Conference on Computer Engineering and Application (ICCEA), 2020

Abstract
Neural-network methods for computing word vectors have motivated models that represent whole sentences. The smooth inverse frequency (SIF) sentence vector model computes word weights only from word frequency information on a general dataset; it does not account for the fact that different words contribute differently to a specific task, and it does not correct the weights accordingly. Based on the distribution of feature words across the categories of a dataset, this paper proposes a task contribution factor (TCF) derived from an improved information gain feature selection method. Using this factor, it proposes a sentence vector representation model based on task contribution (IIG-SIF). In tests on the standard text classification dataset 20 Newsgroups and the text similarity dataset SICK, the IIG-SIF model improves on the original SIF model in both text classification and text similarity calculation.
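The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the improved information gain formula or the exact definition of TCF, so the sketch uses plain information gain as a stand-in and assumes a hypothetical factor of the form 1 + IG(w). The function names, the smoothing parameter `a`, and the toy data are all assumptions made for illustration, and the common-component removal step of the original SIF model is omitted.

```python
import math
from collections import Counter

import numpy as np


def sif_weight(word_prob, a=1e-3):
    """Standard SIF weight a / (a + p(w)) for a word with unigram probability p(w)."""
    return a / (a + word_prob)


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, word):
    """Plain information gain of `word` with respect to the class labels.

    Stand-in only: the paper uses an *improved* information gain whose
    formula is not given in the abstract.
    """
    classes = sorted(set(labels))
    all_counts = Counter(labels)
    with_w = Counter(l for d, l in zip(docs, labels) if word in d)
    without_w = Counter(l for d, l in zip(docs, labels) if word not in d)
    n = len(docs)
    n_with = sum(with_w.values())
    n_without = n - n_with
    h_c = entropy([all_counts[c] for c in classes])
    h_with = entropy([with_w[c] for c in classes])
    h_without = entropy([without_w[c] for c in classes])
    return h_c - (n_with / n) * h_with - (n_without / n) * h_without


def sentence_vector(tokens, vectors, word_prob, tcf, a=1e-3):
    """Average of word vectors weighted by SIF weight times a per-word factor.

    `tcf` maps each word to its (hypothetical) task contribution factor;
    words missing from `tcf` get a neutral factor of 1.0.
    """
    dim = len(next(iter(vectors.values())))
    acc = np.zeros(dim)
    known = [t for t in tokens if t in vectors]
    for t in known:
        acc += sif_weight(word_prob.get(t, 0.0), a) * tcf.get(t, 1.0) * vectors[t]
    return acc / max(len(known), 1)


# Toy labelled corpus (made up for illustration): each document is a set of tokens.
docs = [{"ball", "game"}, {"ball", "team"}, {"cpu", "game"}, {"cpu", "disk"}]
labels = ["sport", "sport", "tech", "tech"]

# Hypothetical TCF: 1 + IG(w), so class-neutral words keep their plain SIF weight.
tcf = {w: 1.0 + information_gain(docs, labels, w) for w in ("ball", "game")}

vectors = {"ball": np.array([1.0, 0.0]), "game": np.array([0.0, 1.0])}
word_prob = {"ball": 1e-3, "game": 1e-3}
vec = sentence_vector(["ball", "game"], vectors, word_prob, tcf)
```

In this toy corpus "ball" appears only in one category while "game" appears equally in both, so under the assumed factor "ball" receives twice the weight of "game" even though the two words have the same frequency, which is the kind of task-specific correction the abstract describes.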
Key words
task contribution factor, sentence embedding model, information gain, text similarity calculation