Briand, S;
Dessimoz, C;
El-Mabrouk, N;
Lafond, M;
Lobinska, G;
(2020)
A generalized Robinson-Foulds distance for labeled trees.
BMC Genomics
, 21
(Suppl 10)
, Article 779. 10.1186/s12864-020-07011-0.
Preview |
Text
s12864-020-07011-0.pdf - Published Version Download (2MB) | Preview |
Abstract
Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf.
Type: | Article |
---|---|
Title: | A generalized Robinson-Foulds distance for labeled trees |
Location: | England |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1186/s12864-020-07011-0 |
Publisher version: | https://doi.org/10.1186/s12864-020-07011-0 |
Language: | English |
Additional information: | This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | Edit distance, Labeled trees, Robinson-Foulds, Tree metric |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10115921 |
Archive Staff Only
View Item |