UCL Discovery Stage

A phylogenetic approach for weighting genetic sequences

De Maio, N; Alekseyenko, AV; Coleman-Smith, WJ; Pardi, F; Suchard, MA; Tamuri, AU; Truszkowski, J (2021) A phylogenetic approach for weighting genetic sequences. BMC Bioinformatics, 22, Article 285. DOI: 10.1186/s12859-021-04183-8. Green open access

A phylogenetic approach for weighting genetic sequences.pdf - Published Version

Abstract

BACKGROUND: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented.

RESULTS: We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes.

CONCLUSIONS: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.
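
As an illustration of the example application named in the abstract (estimating character frequencies at an alignment column), the following minimal Python sketch shows how per-sequence weights could enter such an estimate. It is not the authors' algorithm: the function name `weighted_frequencies` is hypothetical, and the weights are made-up numbers supplied by hand, not phylogenetic novelty scores, whose computation is described only in the paper itself.

```python
from collections import defaultdict


def weighted_frequencies(column, weights):
    """Estimate character frequencies at one alignment column.

    column  -- one character per sequence (hypothetical input format)
    weights -- one non-negative weight per sequence; how the weights are
               obtained is outside the scope of this sketch
    """
    if len(column) != len(weights):
        raise ValueError("column and weights must have the same length")

    # Sum the weight carried by each character.
    totals = defaultdict(float)
    for character, weight in zip(column, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        totals[character] += weight

    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    # Normalise so that the estimated frequencies sum to 1.
    return {character: w / total_weight for character, w in totals.items()}


# Three near-identical sequences carry 'A'; one distinct sequence carries 'G'.
column = ["A", "A", "A", "G"]

# Equal weights: the over-represented group dominates the estimate.
uniform = weighted_frequencies(column, [1.0, 1.0, 1.0, 1.0])

# Illustrative weights only: the redundant sequences are down-weighted.
reweighted = weighted_frequencies(column, [0.5, 0.5, 0.5, 1.5])
```

With equal weights the estimate simply follows the raw counts, whereas down-weighting the three redundant sequences shifts the estimate toward the under-represented character. This is the general effect that sequence weighting schemes of the kind discussed in the abstract aim for.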

Type: Article
Title: A phylogenetic approach for weighting genetic sequences
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1186/s12859-021-04183-8
Publisher version: https://doi.org/10.1186/s12859-021-04183-8
Language: English
Additional information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Keywords: Alignment, Conservation scores, Phylogenetics, Protein profile, Sequence weights, Algorithms, Computational Biology, Phylogeny, Sequence Alignment
UCL classification: UCL
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10200985
Downloads since deposit: 54