Lau, Andy M; Kandathil, Shaun M; Jones, David T; (2023) Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nature Communications, 14, Article 8445. 10.1038/s41467-023-43934-4.
Text: lau_merizo_supp.pdf - Accepted Version (14MB)
Text: Lau_VoR_41467_2023_Article_43934.pdf (11MB)
Abstract
The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential in enriching structural biological research and beyond. Currently, making full use of the database is held back by the lack of tools that allow the efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour and doing so will aid our understanding of protein structure and function, while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.
Field | Value |
---|---|
Type: | Article |
Title: | Merizo: a rapid and accurate protein domain segmentation method using invariant point attention |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1038/s41467-023-43934-4 |
Publisher version: | https://doi.org/10.1038/s41467-023-43934-4 |
Language: | English |
Additional information: | © The Author(s), 2023. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Machine learning, Molecular modelling, Protein structure predictions |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10179102 |