On Tree-structured Multi-stage Principal Component Analysis (TMPCA) for Text Classification

arXiv: Computation and Language (2018)

Abstract
This paper proposes a novel sequence-to-vector (seq2vec) embedding method for text classification, called tree-structured multi-stage principal component analysis (TMPCA). Unlike conventional word-to-vector embedding methods, TMPCA performs dimension reduction at the sequence level and requires no labeled training data. It also preserves the sequential structure of the input sequences. We show that TMPCA is computationally efficient, and we show mathematically that it preserves strong mutual information between its input and output, which makes it well suited to sequence-based text classification. Experimental results further demonstrate that a dense (fully connected) network trained on TMPCA-preprocessed data outperforms the state-of-the-art fastText and other neural-network-based solutions.
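The abstract does not spell out how the tree is built. The following is a minimal sketch, assuming a binary tree in which each stage concatenates adjacent pairs of vectors and a single PCA, fit on all pairs at that stage, reduces each pair back to `k` dimensions, so the sequence length halves at every stage until one vector remains. The helper names (`fit_pca`, `merge_pairs`, `fit_tmpca`, `transform_tmpca`), the per-stage shared PCA, and the power-of-two length requirement are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def fit_pca(X, k):
    """Top-k principal directions of the rows of X, returned as (mean, components)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    return mean, vecs[:, order]  # components: shape (2d, k)


def merge_pairs(x):
    """Concatenate each adjacent pair of vectors: (n, L, d) -> (n, L/2, 2d)."""
    n, length, d = x.shape
    if length % 2:
        raise ValueError("sequence length must be even at every stage")
    # Row-major reshape groups positions (0, 1), (2, 3), ... so word order is kept.
    return x.reshape(n, length // 2, 2 * d)


def fit_tmpca(seqs, k):
    """Fit one PCA per tree stage (assumption); seqs has shape (n, L, d), L a power of 2."""
    stages, x = [], seqs
    while x.shape[1] > 1:
        pairs = merge_pairs(x)
        # Unsupervised: only the input vectors are used, no labels.
        mean, comps = fit_pca(pairs.reshape(-1, pairs.shape[-1]), k)
        stages.append((mean, comps))
        x = (pairs - mean) @ comps  # halve the length, reduce 2d -> k
    return stages


def transform_tmpca(seqs, stages):
    """Apply the fitted stages, mapping each sequence to a single k-dim vector."""
    x = seqs
    for mean, comps in stages:
        x = (merge_pairs(x) - mean) @ comps
    return x.reshape(x.shape[0], -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs = rng.normal(size=(100, 8, 5))  # 100 toy sequences of 8 five-dim word vectors
    stages = fit_tmpca(seqs, k=5)
    vectors = transform_tmpca(seqs, stages)
    print(len(stages), vectors.shape)  # 3 (100, 5)
```

Because PCA needs no labels, the fit uses only the input sequences, and merging only adjacent vectors keeps word order inside the tree. In this sketch, the resulting fixed-length vectors would then be fed to a dense classifier, as the abstract describes.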