The language of proteins: NLP, machine learning & protein sequences

D Ofer, N Brandes, M Linial - Computational and Structural Biotechnology …, 2021 - Elsevier
Natural language processing (NLP) is a field of computer science concerned with automated
text and language analysis. In recent years, following a series of breakthroughs in deep and …

Protein sequence design with deep generative models

Z Wu, KE Johnston, FH Arnold, KK Yang - Current opinion in chemical …, 2021 - Elsevier
Protein engineering seeks to identify protein sequences with optimized properties. When
guided by machine learning, protein sequence generation methods can draw on prior …

Large language models generate functional protein sequences across diverse families

A Madani, B Krause, ER Greene, S Subramanian… - Nature …, 2023 - nature.com
Deep-learning language models have shown promise in various biotechnological
applications, including protein design and engineering. Here we describe ProGen, a …

SignalP 6.0 predicts all five types of signal peptides using protein language models

F Teufel, JJ Almagro Armenteros, AR Johansen… - Nature …, 2022 - nature.com
Signal peptides (SPs) are short amino acid sequences that control protein secretion and
translocation in all living organisms. SPs can be predicted from sequence data, but existing …

Advances in machine learning for directed evolution

BJ Wittmann, KE Johnston, Z Wu, FH Arnold - Current opinion in structural …, 2021 - Elsevier
Machine learning (ML) can expedite directed evolution by allowing researchers to move
expensive experimental screens in silico. Gathering sequence-function data for training ML …

Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies

R Akbar, H Bashour, P Rawat, PA Robert, E Smorodina… - MAbs, 2022 - Taylor & Francis
Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs)
are tremendous, the design and discovery of new candidates remain a time and cost …

A roadmap for metagenomic enzyme discovery

SL Robinson, J Piel, S Sunagawa - Natural Product Reports, 2021 - pubs.rsc.org
Covering: up to 2021. Metagenomics has yielded massive amounts of sequencing data
offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While …

Machine learning to navigate fitness landscapes for protein engineering

CR Freschlin, SA Fahlberg, PA Romero - Current opinion in biotechnology, 2022 - Elsevier
Machine learning (ML) is revolutionizing our ability to understand and predict the complex
relationships between protein sequence, structure, and function. Predictive sequence …

Protein design via deep learning

W Ding, K Nakai, H Gong - Briefings in bioinformatics, 2022 - academic.oup.com
Proteins with desired functions and properties are important in fields like nanotechnology
and biomedicine. De novo protein design enables the production of previously unseen …

TULIP: A transformer-based unsupervised language model for interacting peptides and T cell receptors that generalizes to unseen epitopes

B Meynard-Piganeau, C Feinauer… - Proceedings of the …, 2024 - National Acad Sciences
The accurate prediction of binding between T cell receptors (TCR) and their cognate
epitopes is key to understanding the adaptive immune response and developing …