A guide to machine learning for biologists

JG Greener, SM Kandathil, L Moffat… - Nature reviews Molecular …, 2022 - nature.com
The expanding scale and inherent complexity of biological data have encouraged a growing
use of machine learning in biology to build informative and predictive models of the …

AlphaFold and implications for intrinsically disordered proteins

KM Ruff, RV Pappu - Journal of molecular biology, 2021 - Elsevier
Accurate predictions of the three-dimensional structures of proteins from their amino acid
sequences have come of age. AlphaFold, a deep learning-based approach to protein …

Accurate proteome-wide missense variant effect prediction with AlphaMissense

J Cheng, G Novati, J Pan, C Bycroft, A Žemgulytė… - Science, 2023 - science.org
The vast majority of missense variants observed in the human genome are of unknown
clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

Language models of protein sequences at the scale of evolution enable accurate structure prediction

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu… - BioRxiv, 2022 - biorxiv.org
Large language models have recently been shown to develop emergent capabilities with
scale, going beyond simple pattern matching to perform higher level reasoning and …

Genome-wide prediction of disease variant effects with a deep protein language model

N Brandes, G Goldman, CH Wang, CJ Ye, V Ntranos - Nature Genetics, 2023 - nature.com
Predicting the effects of coding variants is a major challenge. While recent deep-learning
models have improved variant effect prediction accuracy, they cannot analyze all coding …

Disease variant prediction with deep generative models of evolutionary data

J Frazer, P Notin, M Dias, A Gomez, JK Min, K Brock… - Nature, 2021 - nature.com
Quantifying the pathogenicity of protein variants in human disease-related genes would
have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of …

Efficient evolution of human antibodies from general protein language models

BL Hie, VR Shanker, D Xu, TUJ Bruun… - Nature …, 2024 - nature.com
Natural evolution must explore a vast landscape of possible sequences for desirable yet
rare mutations, suggesting that learning from natural evolutionary strategies could guide …

ProGen2: exploring the boundaries of protein language models

E Nijkamp, JA Ruffolo, EN Weinstein, N Naik, A Madani - Cell systems, 2023 - cell.com
Attention-based models trained on protein sequences have demonstrated incredible
success at classification and generation tasks relevant for artificial-intelligence-driven …

Language models enable zero-shot prediction of the effects of mutations on protein function

J Meier, R Rao, R Verkuil, J Liu… - Advances in neural …, 2021 - proceedings.neurips.cc
Modeling the effect of sequence variation on function is a fundamental problem for
understanding and designing proteins. Since evolution encodes information about function …