Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R
Australian & New Zealand Journal of Statistics (2023)
Abstract
Semi-supervised learning is widely used to estimate classifiers
from training data in which the labels of some feature vectors are
missing. We present gmmsslm, an R package for estimating the Bayes'
classifier from such partially classified data in the case where the feature
vector has a multivariate Gaussian (normal) distribution in each of the
predefined classes. Our package implements a recently proposed Gaussian mixture
modelling framework that incorporates a missingness mechanism for the missing
labels in which the probability of a missing label is represented via a
logistic model with covariates that depend on the entropy of the feature
vector. Under this framework, it has been shown that the Bayes'
classifier formed from the Gaussian mixture model fitted to the partially
classified training data can even have a lower error rate than if it were
estimated from a completely classified sample. This result was established in
the particular case of two Gaussian classes with a common covariance matrix.
Here, we focus on the effective implementation of an algorithm for multiple
Gaussian classes with arbitrary covariance matrices. A strategy for
initialising the algorithm is discussed and illustrated. The new package is
demonstrated on some real data.
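
To make the missingness mechanism concrete, the following is a minimal Python sketch, not the gmmsslm implementation. It assumes univariate Gaussian classes for brevity and a single covariate, the log of the Shannon entropy of the posterior class probabilities. The function names and the coefficients xi0 and xi1 are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(y, pis, means, sds):
    """Posterior class-membership probabilities of y under the mixture."""
    weighted = [p * normal_pdf(y, m, s) for p, m, s in zip(pis, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]


def entropy(taus):
    """Shannon entropy of a vector of posterior probabilities."""
    return -sum(t * math.log(t) for t in taus if t > 0.0)


def missing_label_prob(y, pis, means, sds, xi0, xi1):
    """Logistic model for the probability that the label of y is missing.

    Assumption: the linear predictor is xi0 + xi1 * log(entropy).
    The entropy is floored at a tiny value to avoid log(0).
    """
    e = max(entropy(posterior_probs(y, pis, means, sds)), 1e-300)
    eta = xi0 + xi1 * math.log(e)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical two-class example with equal mixing proportions.
pis, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
q_mid = missing_label_prob(0.0, pis, means, sds, xi0=0.0, xi1=1.0)
q_far = missing_label_prob(4.0, pis, means, sds, xi0=0.0, xi1=1.0)
print(q_mid, q_far)
```

With a positive xi1, a feature vector near the class boundary (high entropy) is more likely to have a missing label than one far from it, which is the kind of dependence the framework is designed to exploit.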
Key words
Gaussian, semi-supervised, missing-data