Ward, Jonathan James;
(2005)
Kernel-based classification of protein structure and function from amino acids sequences.
Doctoral thesis , UCL (University College London).
Abstract
The thesis describes the application of kernel methods and, in particular, the support vector machine (SVM) to the classification of protein structure and function. The thesis is divided into two related halves, with Chapters 2 and 3 containing descriptions of methods for predicting different aspects of protein structure. Chapter 4 investigates the functions of disorder in the proteome of a model eukaryote and Chapter 5 describes algorithms and data sources for inferring protein function. The data sources include structure predictions and other properties that can be derived directly from amino acid sequences. Chapter 2 describes a new method for the prediction of secondary structure using an SVM learning algorithm. This is presented as a guide to the application of SVMs to problems in bioinformatics, and includes a discussion of the positive and negative aspects of the technique. The final prediction method is shown to have comparable performance to several of the most accurate modern methods. The third chapter discusses the development of a method to recognize native disorder from amino acid sequences. This predictor (DISOPRED2) is shown to be the most accurate contemporary method on targets from the fifth CASP experiment. The false positive rate of DISOPRED2 is determined using ordered structures from the Protein Data Bank, and the classifier is then used to estimate the frequency of disorder in complete genomes. The final part of this chapter presents the design and implementation of a publicly available web service for disorder prediction. The fourth chapter describes the use of DISOPRED2 to investigate the functional annotations that are associated with long predictions of disorder in the proteome of the model organism Saccharomyces cerevisiae. This chapter also provides several biochemical and evolutionary explanations for the disparity in the predicted frequencies of disorder between eukaryote and prokaryote proteomes. The chapter also demonstrates that the boundaries between structural domains have a propensity toward being predicted as disordered by DISOPRED2. The final research chapter discusses the development of machine learning methods for determining the function of unannotated proteins. Individual classifiers are trained using phylogenetic profiles, structure predictions and simple features derived from the amino acid sequence to predict the function of yeast proteins in the absence of significant sequence similarity.
Type: | Thesis (Doctoral) |
---|---|
Title: | Kernel-based classification of protein structure and function from amino acids sequences |
Identifier: | PQ ETD:602823 |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science > CoMPLEX: Mat&Phys in Life Sci and Exp Bio |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/1446881 |