Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments

V Chen, M Yang, W Cui, JS Kim, A Talwalkar, J Ma - Nature methods, 2024 - nature.com
Recent advances in machine learning have enabled the development of next-generation
predictive models for complex computational biology problems, thereby spurring the use of …

How to build the virtual cell with artificial intelligence: Priorities and opportunities

C Bunne, Y Roohani, Y Rosen, A Gupta, X Zhang… - Cell, 2024 - cell.com
Cells are essential to understanding health and disease, yet traditional models fall short of
modeling and simulating their function and behavior. Advances in AI and omics offer …

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

Bilingual language model for protein sequence and structure

M Heinzinger, K Weissenow… - NAR Genomics and …, 2024 - academic.oup.com
Adapting language models to protein sequences spawned the development of powerful
protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein …

Fine-tuning protein language models boosts predictions across diverse tasks

R Schmirler, M Heinzinger, B Rost - Nature Communications, 2024 - nature.com
Prediction methods inputting embeddings from protein language models have reached or
even surpassed state-of-the-art performance on many protein prediction tasks. In natural …

DNABERT-2: Efficient foundation model and benchmark for multi-species genome

Z Zhou, Y Ji, W Li, P Dutta, R Davuluri, H Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Decoding the linguistic intricacies of the genome is a crucial problem in biology, and
pre-trained foundational models such as DNABERT and Nucleotide Transformer have made …

Large language models (LLMs): survey, technical frameworks, and future challenges

P Kumar - Artificial Intelligence Review, 2024 - Springer
Artificial intelligence (AI) has significantly impacted various fields. Large language models
(LLMs) like GPT-4, BARD, PaLM, Megatron-Turing NLG, Jurassic-1 Jumbo etc., have …

GENA-LM: a family of open-source foundational DNA language models for long sequences

V Fishman, Y Kuratov, A Shmelev… - Nucleic Acids …, 2025 - academic.oup.com
Recent advancements in genomics, propelled by artificial intelligence, have unlocked
unprecedented capabilities in interpreting genomic sequences, mitigating the need for …

DNA language model GROVER learns sequence context in the human genome

M Sanabria, J Hirsch, PM Joubert… - Nature Machine …, 2024 - nature.com
Deep-learning models that learn a sense of language on DNA have achieved a high level of
performance on genome biological tasks. Genome sequences follow rules similar to natural …