Estimating the number of components and detecting outliers using Angle Distribution of Loading Subspaces (ADLS) in PCA analysis.

Analytica chimica acta(2018)

引用 28|浏览41
暂无评分
摘要
Principal Component Analysis (PCA) is widely used in analytical chemistry, to reduce the dimensionality of a multivariate data set in a few Principal Components (PCs) that summarize the predominant patterns in the data. An accurate estimate of the number of PCs is indispensable to provide meaningful interpretations and extract useful information. We show how existing estimates for the number of PCs may fall short for datasets with considerable coherence, noise or outlier presence. We present here how Angle Distribution of the Loading Subspaces (ADLS) can be used to estimate the number of PCs based on the variability of loading subspace across bootstrap resamples. Based on comprehensive comparisons with other well-known methods applied on simulated dataset, we show that ADLS (1) may quantify the stability of a PCA model with several numbers of PCs simultaneously; (2) better estimate the appropriate number of PCs when compared with the cross-validation and scree plot methods, specifically for coherent data, and (3) facilitate integrated outlier detection, which we introduce in this manuscript. We, in addition, demonstrate how the analysis of different types of real-life spectroscopic datasets may benefit from these advantages of ADLS.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要