Merizo: a rapid and accurate protein domain segmentation method using invariant point attention

Lau, Andy M; Kandathil, Shaun M; Jones, David T (2023) Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nature Communications, 14, Article 8445. 10.1038/s41467-023-43934-4. Green open access

Abstract

The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential to enrich structural biology research and beyond. Currently, effective use of the database is hindered by the lack of tools that allow efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour, and doing so will aid our understanding of protein structure and function while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As a proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.

Type: Article
Title: Merizo: a rapid and accurate protein domain segmentation method using invariant point attention
Open access status: An open access version is available from UCL Discovery
DOI: 10.1038/s41467-023-43934-4
Publisher version: https://doi.org/10.1038/s41467-023-43934-4
Language: English
Additional information: © The Author(s), 2023. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Machine learning, Molecular modelling, Protein structure predictions
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10179102