[HTML][HTML] The language of proteins: NLP, machine learning & protein sequences

D Ofer, N Brandes, M Linial - Computational and Structural Biotechnology …, 2021 - Elsevier
Natural language processing (NLP) is a field of computer science concerned with automated
text and language analysis. In recent years, following a series of breakthroughs in deep and …

DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts

Y Geffen, Y Ofran, R Unger - Bioinformatics, 2022 - academic.oup.com
Recently, deep learning models, initially developed in the field of natural language
processing (NLP), were applied successfully to analyze protein sequences. A major …

Linguistically inspired roadmap for building biologically reliable protein language models

MH Vu, R Akbar, PA Robert, B Swiatczak… - Nature Machine …, 2023 - nature.com
Deep neural-network-based language models (LMs) are increasingly applied to large-scale
protein sequence data to predict protein function. However, being largely black-box models …

Learning functional properties of proteins with language models

S Unsal, H Atas, M Albayrak, K Turhan… - Nature Machine …, 2022 - nature.com
Data-centric approaches have been used to develop predictive methods for elucidating
uncharacterized properties of proteins; however, studies indicate that these methods should …

Protein language models and structure prediction: Connection and progression

B Hu, J Xia, J Zheng, C Tan, Y Huang, Y Xu… - arXiv preprint arXiv …, 2022 - arxiv.org
The prediction of protein structures from sequences is an important task for function
prediction, drug design, and related biological processes understanding. Recent advances …

Prottrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

Bilingual language model for protein sequence and structure

M Heinzinger, K Weissenow, JG Sanchez, A Henkel… - bioRxiv, 2023 - biorxiv.org
Advanced Artificial Intelligence (AI) enabled large language models (LLMs) to revolutionize
Natural Language Processing (NLP). Their adaptation to protein sequences spawned the …

Learning the protein language: Evolution, structure, and function

T Bepler, B Berger - Cell systems, 2021 - cell.com
Language models have recently emerged as a powerful machine-learning approach for
distilling information from massive protein sequence databases. From readily available …

MSA transformer

RM Rao, J Liu, R Verkuil, J Meier… - International …, 2021 - proceedings.mlr.press
Unsupervised protein language models trained across millions of diverse sequences learn
structure and function of proteins. Protein language models studied to date have been …

Codon language embeddings provide strong signals for use in protein engineering

C Outeiral, CM Deane - Nature Machine Intelligence, 2024 - nature.com
Protein representations from deep language models have yielded state-of-the-art
performance across many tasks in computational protein engineering. In recent years …