UCL Discovery Stage

High-dimensional non-Gaussian data analysis based on sample relationship

Ling, Yurong; (2022) High-dimensional non-Gaussian data analysis based on sample relationship. Doctoral thesis (Ph.D), University College London. Green open access

Full text: Thesis.pdf (9MB)

Abstract

High-dimensional data are omnipresent. Although many statistical methods for analysing high-dimensional data adopt the normality assumption, the Gaussian distribution can be a poor approximation of real data in many applications. In this thesis, we investigate how to properly analyse such high-dimensional non-Gaussian data. Quantifying sample relationships, such as measuring inter-sample proximity and determining the neighbours of each sample, is an important step in numerous statistical approaches. Building on this, the thesis develops three methods, each based on sample relationships, for analysing different types of high-dimensional non-Gaussian data: dimension reduction for single-cell RNA-sequencing data with missingness, using a proposed proximity measure; dimension reduction for data of small counts, using a newly developed proximity measure; and modelling of skewed survival data, using a proposed procedure for identifying the neighbours of samples. In chapter 3, I develop an unbiased estimator of the Gram matrix, which characterises the proximity between samples. The proposed estimator improves a broad spectrum of dimension reduction methods when applied to single-cell RNA-sequencing data with missingness. In addition, the consequences of directly applying existing dimension reduction methods to data with missingness are clarified both empirically and theoretically. In chapter 4, I develop a dissimilarity measure for count data with an excess of zeros, based on the Kullback-Leibler divergence and empirical Bayes estimators. The proposed measure is shown to have better discriminative power than other popular measures, and it boosts the performance of standard dimension reduction methods on count data containing many zeros. In chapter 5, I show that graphs derived from the features themselves can benefit the analysis of high-dimensional survival data when used in graph convolutional networks. Furthermore, a sequential forward floating selection algorithm is proposed to simultaneously perform survival analysis and unveil the local neighbourhoods of samples with the aid of graph convolutional networks.
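The abstract does not spell out the chapter 3 Gram matrix estimator. As a minimal sketch of why naive inner products on incomplete data are biased, and how that bias can be undone, assume entries are missing completely at random (MCAR) with a per-feature observation probability p_k and that missing values are stored as 0. Zero-filling then shrinks each cross-sample product by a factor p_k² and each squared entry by p_k, so dividing by those factors (inverse-probability weighting) gives an unbiased estimate of the full-data Gram matrix. This is an illustration of the general idea, not necessarily the thesis's estimator; in single-cell RNA-sequencing, dropout often depends on expression level, so MCAR is a strong assumption. The function names `observation_rates` and `ipw_gram` are hypothetical.

```python
def observation_rates(mask):
    """Per-feature fraction of observed entries (mask[i][k] == 1 means observed).

    Plugging these estimates into ipw_gram makes it only approximately unbiased.
    """
    n = len(mask)
    return [sum(col) / n for col in zip(*mask)]


def ipw_gram(X0, p):
    """Inverse-probability-weighted Gram matrix for zero-filled data (hypothetical helper).

    Assumes entry (i, k) is observed independently with probability p[k] (MCAR)
    and that missing entries are stored as 0. The expected zero-filled product
    x_ik * x_jk carries a factor p[k]**2 when i != j and p[k] when i == j,
    so dividing by those factors removes the bias.
    """
    n, m = len(X0), len(X0[0])
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if i == j:
                g = sum(X0[i][k] ** 2 / p[k] for k in range(m))
            else:
                g = sum(X0[i][k] * X0[j][k] / p[k] ** 2 for k in range(m))
            G[i][j] = G[j][i] = g
    return G
```

With every p_k equal to 1 the function reduces to the ordinary Gram matrix, which is a quick sanity check; with p_k < 1 the diagonal and off-diagonal entries are rescaled differently, which is exactly what a plain zero-filled inner product fails to do.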
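As a hedged illustration of how a Kullback-Leibler divergence can be combined with empirical Bayes estimates for zero-heavy counts, the sketch below assumes a gamma-Poisson model for each feature. A gamma prior is fitted by the method of moments, each count is replaced by its posterior mean rate (so a zero count still receives a strictly positive rate when the feature is overdispersed), and two samples are compared by the symmetrised KL divergence between Poisson distributions, summed over features. For Poisson rates λ₁ and λ₂ that symmetrised divergence is (λ₁ − λ₂) log(λ₁/λ₂). The model, the moment fit, and the function names are assumptions for illustration, not the thesis's exact measure.

```python
import math


def eb_poisson_rates(counts):
    """Gamma-Poisson empirical Bayes rates, fitted per feature by the method of moments (hypothetical)."""
    n, p = len(counts), len(counts[0])
    rates = [[0.0] * p for _ in range(n)]
    for k in range(p):
        col = [row[k] for row in counts]
        m = sum(col) / n
        v = sum((x - m) ** 2 for x in col) / (n - 1)
        if v > m > 0:
            # Marginal variance = m + m / b for a Gamma(shape a, rate b) prior.
            b = m / (v - m)
            a = m * b
            for i, x in enumerate(col):
                rates[i][k] = (a + x) / (b + 1)  # posterior mean of the Poisson rate
        else:
            # No overdispersion: the prior collapses onto the feature mean.
            for i in range(n):
                rates[i][k] = m
    return rates


def sym_kl_poisson(l1, l2):
    """KL(Pois(l1) || Pois(l2)) + KL(Pois(l2) || Pois(l1))."""
    if l1 == l2:
        return 0.0  # also covers the all-zero feature, where both rates are 0
    return (l1 - l2) * math.log(l1 / l2)


def kl_dissimilarity(counts):
    """Pairwise dissimilarity matrix: summed symmetrised KL between shrunken rates."""
    lam = eb_poisson_rates(counts)
    n = len(lam)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(sym_kl_poisson(a, b) for a, b in zip(lam[i], lam[j]))
            D[i][j] = D[j][i] = d
    return D
```

Shrinkage is what keeps the log ratio finite: a raw zero count would give a zero rate, whereas the posterior mean borrows strength from the other samples of the same feature.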
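One common way to derive a graph from the features themselves is a k-nearest-neighbour graph in feature space, which a graph convolutional network then uses to mix information between neighbouring samples. The sketch below assumes Euclidean distances, a symmetrised k-NN graph, and the widely used GCN propagation rule D^{-1/2}(A + I)D^{-1/2}H, shown without learnable weights or a nonlinearity. The abstract does not say which graph construction the thesis uses, so the choice of k-NN and the helper names `knn_graph` and `gcn_propagate` are assumptions.

```python
import math


def knn_graph(X, k):
    """Symmetric 0/1 adjacency linking each sample to its k nearest neighbours (Euclidean)."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            A[i][j] = A[j][i] = 1.0  # keep the edge if either endpoint selects it
    return A


def gcn_propagate(A, H):
    """One propagation step D^{-1/2} (A + I) D^{-1/2} H, without weights or activation."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # degrees including the self-loop, so never zero
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [
        [sum(norm[i][j] * H[j][c] for j in range(n)) for c in range(len(H[0]))]
        for i in range(n)
    ]
```

For example, with points (0, 0), (0, 1), (10, 10), (10, 11) and k = 1, the graph pairs the first two and the last two samples, and one propagation step replaces each sample by the average of itself and its neighbour.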
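Sequential forward floating selection (SFFS) is a general search heuristic. It greedily adds the element that most improves a criterion, then conditionally removes elements for as long as doing so beats the best subset of that size found so far. The abstract does not describe how the thesis couples SFFS with graph convolutional networks and survival models, so the sketch below is only the generic algorithm over an arbitrary higher-is-better criterion `score(subset)`. Treating the candidates as potential neighbours of a sample, scored by some survival-model criterion, would be one hypothetical way to map it onto the setting above.

```python
def sffs(candidates, score, k):
    """Generic sequential forward floating selection (maximises score over size-k subsets).

    `candidates` should hold distinct items comparable with ==; `score` maps a
    list of candidates to a number, higher is better.
    """
    selected = []
    best = {}  # subset size -> (best score seen, subset achieving it)
    while len(selected) < k:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Forward (inclusion) step: add the candidate that raises the score most.
        add = max(remaining, key=lambda c: score(selected + [c]))
        selected = selected + [add]
        s = score(selected)
        size = len(selected)
        if size not in best or s > best[size][0]:
            best[size] = (s, list(selected))
        # Floating (conditional exclusion) steps: drop the least useful element
        # only while the reduced subset beats the best subset of that size so far.
        while len(selected) > 2:
            drop = max(selected, key=lambda c: score([x for x in selected if x != c]))
            reduced = [x for x in selected if x != drop]
            r = score(reduced)
            if r > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (r, list(reduced))
            else:
                break
    return best[k][1] if k in best else selected
```

With a purely additive criterion, SFFS simply returns the k highest-scoring candidates; the floating removal steps only pay off when candidates interact, for example when two neighbours carry redundant information.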

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: High-dimensional non-Gaussian data analysis based on sample relationship
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10149318