Non-Bayesian Parametric Missing-Mass Estimation

IEEE TRANSACTIONS ON SIGNAL PROCESSING (2022)

Abstract
We consider the classical problem of missing-mass estimation, that is, estimating the total probability of the elements that do not appear in a sample. Missing-mass estimation has applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem since it tends to overestimate the probability of the observed elements. Similarly, the constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared-error (MSE) of unbiased estimators of the entire probability mass function (pmf) vector, does not provide a relevant bound for missing-mass estimation. In this paper, we introduce a non-Bayesian parametric model of the missing-mass estimation problem. We introduce the concept of missing-mass unbiasedness by using the Lehmann unbiasedness definition. Based on missing-mass unbiasedness, we derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB). The proposed mmCCRB can be used for system design and for the performance evaluation of existing estimators. Moreover, based on the mmCCRB, we propose a new method to improve estimators by an iterative missing-mass Fisher-scoring method. Finally, we demonstrate via numerical simulations that the biased mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, asymptotic profile maximum likelihood (aPML), Good-Turing, and Laplace estimators. We also show that the mmMSE and the missing-mass bias of the Laplace estimator are reduced by using the new missing-mass Fisher-scoring method.
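To make the quantity being estimated concrete, the following is a minimal sketch of the missing mass and of three of the baseline estimators named in the abstract (CML, Good-Turing, Laplace). It assumes the standard textbook forms of these estimators over a finite alphabet of known size, which may differ in detail from the variants analyzed in the paper. All function names, the toy pmf, and the toy sample are hypothetical and chosen only for illustration.

```python
from collections import Counter

def true_missing_mass(sample, pmf):
    """Total probability of the symbols that never appear in the sample."""
    seen = set(sample)
    return sum(p for symbol, p in pmf.items() if symbol not in seen)

def cml_missing_mass(sample):
    """Empirical frequencies put all mass on observed symbols,
    so the implied missing-mass estimate is always zero."""
    return 0.0

def good_turing_missing_mass(sample):
    """Textbook Good-Turing estimate: fraction of the sample made up
    of symbols that were observed exactly once."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

def laplace_missing_mass(sample, alphabet_size):
    """Add-one (Laplace) smoothing: each symbol gets (count + 1) / (n + M),
    so the unseen symbols together receive (number unseen) / (n + M)."""
    n = len(sample)
    unseen = alphabet_size - len(set(sample))
    return unseen / (n + alphabet_size)

# Hypothetical toy setup: 5-symbol alphabet, 4 observations.
pmf = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
sample = ["a", "a", "b", "c"]

print("true missing mass:", true_missing_mass(sample, pmf))
print("CML estimate:     ", cml_missing_mass(sample))
print("Good-Turing:      ", good_turing_missing_mass(sample))
print("Laplace:          ", laplace_missing_mass(sample, len(pmf)))
```

In this toy case the symbols "d" and "e" are unseen, so the CML estimate of zero illustrates the abstract's point that the CML estimator assigns too much probability to the observed elements.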
Keywords
Non-Bayesian estimation, Good-Turing estimator, probability of missing mass, constrained Cramér-Rao bound, Lehmann unbiasedness