UCL Discovery Stage
UCL home » Library Services » Electronic resources » UCL Discovery Stage

A generalized Robinson-Foulds distance for labeled trees

Briand, S; Dessimoz, C; El-Mabrouk, N; Lafond, M; Lobinska, G; (2020) A generalized Robinson-Foulds distance for labeled trees. BMC Genomics , 21 (Suppl 10) , Article 779. 10.1186/s12864-020-07011-0. Green open access

[thumbnail of s12864-020-07011-0.pdf]
Preview
Text
s12864-020-07011-0.pdf - Published Version

Download (2MB) | Preview

Abstract

Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf.

Type: Article
Title: A generalized Robinson-Foulds distance for labeled trees
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1186/s12864-020-07011-0
Publisher version: https://doi.org/10.1186/s12864-020-07011-0
Language: English
Additional information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Keywords: Edit distance, Labeled trees, Robinson-Foulds, Tree metric
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10115921
Downloads since deposit
2,736Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item