Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms
arXiv (2024)
Abstract
Randomized algorithms can be used to speed up the analysis of large datasets.
In this paper, we develop a unified methodology for statistical inference via
randomized sketching or projections in two of the most fundamental problems in
multivariate statistical analysis: least squares and PCA. The methodology
applies to fixed datasets – i.e., is data-conditional – and the only
randomness is due to the randomized algorithm. We propose statistical inference
methods for a broad range of sketching distributions, such as the subsampled
randomized Hadamard transform (SRHT), Sparse Sign Embeddings (SSE) and
CountSketch, sketching matrices with i.i.d. entries, and uniform subsampling.
To our knowledge, no comparable methods are available for SSE and for SRHT in
PCA. Our novel theoretical approach rests on showing the asymptotic normality
of certain quadratic forms. As a contribution of broader interest, we show
central limit theorems for quadratic forms of the SRHT, relying on a novel
proof via a dyadic expansion that leverages the recursive structure of the
Hadamard transform. Numerical experiments using both synthetic and empirical
datasets support the efficacy of our methods, and in particular suggest that
sketching methods can have better computation-estimation tradeoffs than
recently proposed optimal subsampling methods.