Language models for biological research: a primer

E Simon, K Swanson, J Zou - Nature Methods, 2024 - nature.com
Abstract: Language models are playing an increasingly important role in many areas of
artificial intelligence (AI) and computational biology. In this primer, we discuss the ways in …

A large-scale assessment of sequence database search tools for homology-based protein function prediction

C Zhang, L Freddolino - Briefings in Bioinformatics, 2024 - academic.oup.com
Sequence database searches followed by homology-based function transfer form one of the
oldest and most popular approaches for predicting protein functions, such as Gene Ontology …

Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone

L Pantolini, G Studer, J Pereira, J Durairaj… - …, 2024 - academic.oup.com
Motivation: Language models are routinely used for text classification and generative tasks.
Recently, the same architectures were applied to protein sequences, unlocking powerful …

The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction

C Zhang, Q Wang, Y Li, A Teng, G Hu, Q Wuyun… - …, 2024 - pmc.ncbi.nlm.nih.gov
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological
sciences, playing a pivotal role in predicting molecular structures and functions. With broad …

Protein embedding based alignment

BG Iovino, Y Ye - BMC bioinformatics, 2024 - Springer
Purpose: Despite the many progresses with alignment algorithms, aligning divergent protein
sequences with less than 20–35% pairwise identity (so-called "twilight zone") remains a …

Detection of circular permutations by Protein Language Models

Y Hu, B Huang, CZ Zang, JJ Xu - Computational and Structural …, 2024 - Elsevier
Protein circular permutations are crucial for understanding protein evolution and
functionality. Traditional detection methods face challenges: sequence-based approaches …

Progress on the development of prediction tools for detecting disease causing mutations in proteins

MM Gromiha, M Pandey, A Kulandaisamy… - Computers in Biology …, 2025 - Elsevier
Proteins are involved in a variety of functions in living organisms. The mutation of amino acid
residues in a protein alters its structure, stability, binding, and function, with some mutations …

learnMSA2: deep protein multiple alignments with large language and hidden Markov models

F Becker, M Stanke - Bioinformatics, 2024 - academic.oup.com
Motivation: For the alignment of large numbers of protein sequences, tools are predominant
that decide to align two residues using only simple prior knowledge, e.g. amino acid …

Understanding and Therapeutic Application of Immune Response in Major Histocompatibility Complex (MHC) Diversity Using Multimodal Artificial Intelligence

Y Matsuzaka, R Yashiro - BioMedInformatics, 2024 - mdpi.com
Human Leukocyte Antigen (HLA) is like a device that monitors the internal environment of
the body. T lymphocytes immediately recognize the HLA molecules that are expressed on …

Exploiting protein language model sequence representations for repeat detection

K Qiu, S Dunin-Horkawicz, AN Lupas - bioRxiv, 2024 - biorxiv.org
Duplication is an essential evolutionary mechanism that operates at the scale of
chromosomes, large chunks of DNA sequences, genes, protein domains, and shorter motifs …