Hu, Zhenzheng (Helen);
(2021)
Dirichlet process probit misclassification mixture model for misclassified binary data.
Doctoral thesis (Ph.D), UCL (University College London).
Text: Hu_thesis_corr2.pdf (Accepted Version, 31MB)
Abstract
Mislabelling or misclassification in binary data refers to incorrectly labelled responses, which can arise from problems in the labelling process or from imperfect evidence for labelling. The latent misclassification process can take a variety of forms, depending on how it relates to the true labels and to the covariates associated with each response. Modelling under misclassification is challenging because of inherent identifiability issues, and ignoring misclassification can lead to inaccurate inferences. Statistical methods addressing misclassification have appeared in the literature in a variety of contexts, sometimes under different terminology, and often focusing on a particular application. In this thesis, we first cast existing statistical methods in a unified framework and then propose a new, flexible Bayesian mixture model for misclassified binary data: the Dirichlet process probit misclassification mixture model. The main idea is to place a Dirichlet process mixture model over the covariate space and the misclassification probabilities. This naturally partitions observations into clusters, and different clusters can have different misclassification probabilities. The clustering uses both the covariates and the observed responses, and the covariates are approximated using a Dirichlet mixture of multivariate Gaussians. The cluster-specific misclassification probabilities account for misclassification in the observed responses. An efficient Gibbs-like algorithm is available, based on the truncated approximation of the Dirichlet process and its stick-breaking construction.

This thesis is motivated by the pervasiveness of label noise in a wide variety of applications, coupled with the lack of a unified statistical exposition and comparison of the available methods. The structure of the thesis is as follows.
Chapter 1 introduces the problem of label misclassification and reviews existing methods for modelling misclassification in binary data. Chapter 2 discusses the basics of Bayesian nonparametrics, the Dirichlet process, Dirichlet process mixture models, and posterior inference procedures for Dirichlet process mixture models, which are essential components of the Dirichlet process probit misclassification mixture model proposed later. Chapter 3 describes our proposed model for mislabelled binary data. Chapter 4 presents experimental studies of the proposed model on a real dataset. Chapter 5 concludes the discussion and includes final remarks, such as possible model extensions.
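The abstract describes the generative structure of the model only in words. The following is a minimal forward-simulation sketch of that kind of structure, not the thesis's own implementation: truncated stick-breaking weights assign observations to clusters, covariates are Gaussian within each cluster, true labels follow a probit model, and observed labels are flipped with cluster-specific misclassification probabilities. The truncation level `K`, concentration `alpha`, the Beta(1, 9) prior on flip probabilities, the identity covariance, the single global probit coefficient vector `beta`, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    v_k ~ Beta(1, alpha) for k = 1..K-1 and v_K = 1, so the K weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick absorbs all remaining mass
    # Length of stick left before break k: prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def simulate_misclassified_data(n=500, K=10, alpha=1.0, d=2, seed=0):
    """Forward-simulate covariates, true labels and noisy observed labels.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    w = stick_breaking_weights(alpha, K, rng)
    z = rng.choice(K, size=n, p=w)            # cluster membership

    # Covariates: Gaussian per cluster (hypothetical means, identity covariance)
    mu = rng.normal(0.0, 3.0, size=(K, d))
    x = mu[z] + rng.standard_normal((n, d))

    # True labels from a probit model: t = 1{x'beta + eps > 0}, eps ~ N(0, 1)
    beta = rng.normal(0.0, 1.0, size=d)       # hypothetical global coefficients
    t = (x @ beta + rng.standard_normal(n) > 0).astype(int)

    # Cluster-specific misclassification probabilities (hypothetical Beta(1, 9) prior)
    p_flip_0 = rng.beta(1.0, 9.0, size=K)     # P(y = 1 | t = 0, cluster k)
    p_flip_1 = rng.beta(1.0, 9.0, size=K)     # P(y = 0 | t = 1, cluster k)
    flip_prob = np.where(t == 1, p_flip_1[z], p_flip_0[z])
    y = np.where(rng.random(n) < flip_prob, 1 - t, t)
    return x, t, y, z, w


x, t, y, z, w = simulate_misclassified_data()
print("observed misclassification rate:", np.mean(y != t))
```

This only generates data of the assumed form; fitting the model would additionally require a Gibbs-like sampler over the cluster labels, stick-breaking weights, Gaussian parameters, probit coefficients and flip probabilities, which is not sketched here.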
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Dirichlet process probit misclassification mixture model for misclassified binary data |
Event: | UCL (University College London) |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2021. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10140643 |