Bray, James Edward;
(2001)
Predicting the structure and function of genomic sequences using the CATH structural database.
Doctoral thesis (Ph.D), UCL (University College London).
Text
Predicting_the_structure_and_f.pdf Download (23MB) |
Abstract
The field of bioinformatics faces the challenge of reliably annotating genomic sequences with structural and functional information. Structure classification databases are now sufficiently populated to provide a framework for meeting this challenge. This thesis focuses on the superfamily level of structural classification that groups together distantly related proteins that have evolved from a common ancestor. In order to cope with the functional diversity that occurs at the structural superfamily level, sequences have been classified into functionally related protein families that can serve as the basis for genome annotation. Knowledge of the key structural and functional features of structural superfamilies provides valuable insights for accurately transferring biological information. This thesis describes the development of two new structure-based resources that enhance the ability of the CATH structural database to annotate genomic sequences. Firstly, the CATH Dictionary of Homologous Superfamilies (DHS) presents functionally annotated structural alignments for distantly related domains. Key residues can be identified and used diagnostically for validating the results of sequence search algorithms. Secondly, the CATH Protein Family Database (CATH-PFDB) integrates sequence and structure by assigning genomic sequences to structural superfamilies. The sequences within each superfamily are further clustered into families sharing close functional similarity. Extensive benchmarking of this sequence library using pairwise and profile search algorithms showed that both approaches can be used to reliably identify distantly related genomic sequences. A protocol for analysing the quality of three-dimensional protein models derived from distantly related proteins has also been developed. Residue environment scores from the SSAP structure comparison algorithm have been used to identify well-modelled structural fragments through histogram and coverage plots.
This facilitates the assessment of structure prediction and modelling algorithms that are vital for accurately transferring structural data to genomic sequences. This work was generously supported by the Biotechnology and Biological Sciences Research Council.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Predicting the structure and function of genomic sequences using the CATH structural database |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
Keywords: | Biological sciences; CATH structural database |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10102945 |