Hardness of Maximum Likelihood Learning of DPPs.

Annual Conference on Computational Learning Theory (2022)

Abstract
Determinantal Point Processes (DPPs) are a widely used probabilistic model for negatively correlated sets. DPPs are used in Machine Learning applications to select a diverse, yet representative subset of data. In these applications, the parameters of the DPP need to be fit to match the data; typically, we seek a set of parameters that maximize the likelihood of the data. The algorithms used for this task either optimize over a limited family of DPPs, or else use local improvement heuristics that do not provide theoretical guarantees of optimality. It is natural to ask whether there exist efficient algorithms for finding a maximum likelihood DPP model for a given data set. In seminal work on DPPs in Machine Learning, Kulesza conjectured in his PhD Thesis (2012) that the problem is NP-complete. In this work we prove Kulesza's conjecture; moreover, we prove that even computing a $\left(1-\frac{1}{\mathrm{poly}\log N}\right)$-approximation to the maximum log-likelihood of a DPP on a set of $N$ items is NP-complete. At the same time, we also obtain the first polynomial-time algorithm achieving a nontrivial worst-case approximation to the optimal likelihood: we present a polynomial-time $1/\log m$-approximation algorithm (for data sets of size $m$), which moreover obtains a $\left(1-\frac{1}{\log N}\right)$-approximation if all $N$ elements appear in an $O(1/N)$-fraction of the subsets. In terms of techniques, the hardness result reduces to solving a gap instance of a "vector coloring" problem on a hypergraph obtained from an adaptation of the constructions of Bogdanov, Obata and Trevisan (FOCS 2002), using the strong expanders of Alon and Capalbo (FOCS 2007).
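To make the objective concrete, below is a minimal NumPy sketch of the log-likelihood being maximized, assuming the standard L-ensemble parameterization $P(Y) = \det(L_Y)/\det(L+I)$; the function name dpp_log_likelihood and the toy data are illustrative, not taken from the paper. The learning problem is to find the PSD kernel $L$ maximizing this quantity over the observed subsets, and the hardness result above says that even approximating the optimum is intractable.

```python
import numpy as np

def dpp_log_likelihood(L, samples):
    """Log-likelihood of observed subsets under an L-ensemble DPP (sketch).

    Under the standard L-ensemble parameterization, a PSD kernel L over
    N items assigns P(Y) = det(L_Y) / det(L + I) to each subset Y, so the
    log-likelihood of samples Y_1, ..., Y_m is
        sum_i log det(L_{Y_i}) - m * log det(L + I).
    """
    N = L.shape[0]
    # Normalizer log det(L + I), shared by every sample.
    _, log_norm = np.linalg.slogdet(L + np.eye(N))
    ll = -len(samples) * log_norm
    for Y in samples:
        idx = np.fromiter(Y, dtype=int)
        # Principal submatrix of L indexed by the observed subset.
        _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
        ll += logdet
    return ll

# Toy usage (hypothetical data): a random PSD kernel on N = 5 items
# and m = 3 observed subsets.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
L = B @ B.T
print(dpp_log_likelihood(L, [[0, 2], [1, 3, 4], [2]]))
```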
Keywords
maximum likelihood learning