UCL Discovery Stage

Clustering of protein domains for functional and evolutionary studies.

Goldstein, P; Zucko, J; Vujaklija, D; Krisko, A; Hranueli, D; Long, PF; Etchebest, C; ... Cullum, J (2009) Clustering of protein domains for functional and evolutionary studies. BMC Bioinformatics, 10 (Februa), Article 335. 10.1186/1471-2105-10-335. Green open access

Abstract

Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
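
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says their procedure is stochastic and "related to" DAEM, so the code below substitutes plain deterministic annealing EM on a mixture of per-column residue frequencies. The function name daem_cluster, the temperature schedule, the pseudocount, and the example motif strings are all assumptions made for illustration, and the returned score is simply the posterior probability of the chosen subtype, which is only loosely analogous to the specificity score in the paper.

```python
import math
import random


def daem_cluster(motifs, k=2, betas=(0.2, 0.4, 0.6, 0.8, 1.0),
                 iters=30, pseudo=0.5, seed=0):
    """Cluster equal-length motif strings with a categorical mixture (DAEM).

    Returns (labels, scores): the most probable subtype of each motif and
    the posterior probability of that subtype at beta = 1.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, length = len(motifs), len(motifs[0])
    alphabet = sorted(set("".join(motifs)))

    # Start from random soft assignments (one row of k weights per motif).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # beta rises to 1, i.e. the temperature is lowered
        for _ in range(iters):
            # M-step: smoothed mixing weights and per-column frequencies.
            mix = [(sum(resp[i][c] for i in range(n)) + pseudo)
                   / (n + k * pseudo) for c in range(k)]
            freq = []
            for c in range(k):
                cols = []
                for j in range(length):
                    counts = {a: pseudo for a in alphabet}
                    for i in range(n):
                        counts[motifs[i][j]] += resp[i][c]
                    total = sum(counts.values())
                    cols.append({a: v / total for a, v in counts.items()})
                freq.append(cols)

            # Tempered E-step: posterior proportional to (prior * likelihood)^beta.
            for i in range(n):
                logp = [beta * (math.log(mix[c])
                                + sum(math.log(freq[c][j][motifs[i][j]])
                                      for j in range(length)))
                        for c in range(k)]
                top = max(logp)
                unnorm = [math.exp(v - top) for v in logp]
                total = sum(unnorm)
                resp[i] = [u / total for u in unnorm]

    labels = [row.index(max(row)) for row in resp]
    scores = [max(row) for row in resp]
    return labels, scores


# Made-up motif strings (one character per selected alignment column).
example_motifs = ["GDLK", "GDLK", "GDMK", "SNVR", "SNVR", "SNIR"]
example_labels, example_scores = daem_cluster(example_motifs)
for motif, label, score in zip(example_motifs, example_labels, example_scores):
    print(motif, label, round(score, 3))
```

Starting at a low beta flattens the posteriors so that early assignments stay soft, which is the usual motivation for annealing over plain EM. Subtype numbers are arbitrary, so only the grouping of sequences is meaningful, not the label values themselves.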

Type: Article
Title: Clustering of protein domains for functional and evolutionary studies.
Open access status: An open access version is available from UCL Discovery
DOI: 10.1186/1471-2105-10-335
Publisher version: http://dx.doi.org/10.1186/1471-2105-10-335
Language: English
Additional information: © 2009 Goldstein et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
URI: https://discovery-pp.ucl.ac.uk/id/eprint/1363854
Downloads since deposit: 3,816