Powered Dirichlet Process - Controlling the "Rich-Get-Richer" Assumption in Bayesian Clustering

Machine Learning and Knowledge Discovery in Databases: Research Track, ECML PKDD 2023, Part I (2023)

Abstract
The Dirichlet process is one of the most widely used priors in Bayesian clustering. It allows for a nonparametric estimation of the number of clusters when partitioning datasets. The "rich-get-richer" property is a key feature of this process: it expresses that the a priori probability of a cluster being selected depends linearly on its population. In this paper, we show that this hypothesis is not necessarily optimal. As an answer to this problem, we derive the Powered Dirichlet Process as a generalization of the Dirichlet-Multinomial distribution. We then derive some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows for direct control of the importance of the "rich-get-richer" prior. We evaluate our proposal on several simulated and real-world datasets, and confirm that our formulation yields significantly better results in both cases.
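The idea described in the abstract can be illustrated with a small sketch of a Chinese Restaurant Process whose seating probabilities are raised to a power r: with r = 1 one recovers the usual rich-get-richer prior, in which an existing cluster is chosen with probability proportional to its size n_k, while other values of r dampen or amplify that dependence. This is an illustrative reconstruction from the abstract only, not the paper's actual formulation; the function name, parameters, and defaults below are assumptions.

```python
import random

def powered_crp_partition(n_items, alpha=1.0, r=1.0, seed=0):
    """Sample a partition from a powered Chinese Restaurant Process sketch.

    Each item joins an existing cluster k with probability proportional
    to n_k ** r (cluster size raised to the power r), or starts a new
    cluster with probability proportional to alpha. r = 1 recovers the
    standard rich-get-richer CRP prior; this sketch is an illustration
    of the idea, not the paper's exact definition.
    """
    rng = random.Random(seed)
    counts = []       # current cluster sizes n_k
    assignments = []  # cluster label of each item
    for _ in range(n_items):
        # Unnormalized weights: powered sizes, plus alpha for a new cluster.
        weights = [c ** r for c in counts] + [alpha]
        u = rng.random() * sum(weights)
        cum = 0.0
        for k, w in enumerate(weights):
            cum += w
            if u <= cum:
                break
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join existing cluster k
        assignments.append(k)
    return assignments
```

With r close to 0, existing clusters are weighted almost uniformly regardless of size, so the rich-get-richer effect is suppressed; with large r, the biggest cluster dominates.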
Key words
Dirichlet processes, Rich-get-richer, Discrete mathematics, Clustering, Bayesian prior