Self-supervised learning in medicine and healthcare

R Krishnan, P Rajpurkar, EJ Topol - Nature Biomedical Engineering, 2022 - nature.com
The development of medical applications of machine learning has required manual
annotation of data, often by medical experts. Yet, the availability of large-scale unannotated …

Controllable protein design with language models

N Ferruz, B Höcker - Nature Machine Intelligence, 2022 - nature.com
The twenty-first century is presenting humankind with unprecedented environmental and
medical challenges. The ability to design novel proteins tailored for specific purposes would …

Accurate proteome-wide missense variant effect prediction with AlphaMissense

J Cheng, G Novati, J Pan, C Bycroft, A Žemgulytė… - Science, 2023 - science.org
The vast majority of missense variants observed in the human genome are of unknown
clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on …

Large language models struggle to learn long-tail knowledge

N Kandpal, H Deng, A Roberts… - International …, 2023 - proceedings.mlr.press
The Internet contains a wealth of knowledge—from the birthdays of historical figures to
tutorials on how to code—all of which may be learned by language models. However, while …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

Generalized biomolecular modeling and design with RoseTTAFold All-Atom

R Krishna, J Wang, W Ahern, P Sturmfels, P Venkatesh… - Science, 2024 - science.org
Deep-learning methods have revolutionized protein structure prediction and design but are
presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which …

Predicting multiple conformations via sequence clustering and AlphaFold2

HK Wayment-Steele, A Ojoawo, R Otten, JM Apitz… - Nature, 2024 - nature.com
AlphaFold2 has revolutionized structural biology by accurately predicting
single structures of proteins. However, a protein's biological function often depends on …

Learning inverse folding from millions of predicted structures

C Hsu, R Verkuil, J Liu, Z Lin, B Hie… - International …, 2022 - proceedings.mlr.press
We consider the problem of predicting a protein sequence from its backbone atom
coordinates. Machine learning approaches to this problem to date have been limited by the …

Language models of protein sequences at the scale of evolution enable accurate structure prediction

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu… - bioRxiv, 2022 - biorxiv.org
Large language models have recently been shown to develop emergent capabilities with
scale, going beyond simple pattern matching to perform higher level reasoning and …

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

V Thumuluri, JJ Almagro Armenteros… - Nucleic acids …, 2022 - academic.oup.com
The prediction of protein subcellular localization is of great relevance for proteomics
research. Here, we propose an update to the popular tool DeepLoc with multi-localization …