Machine learning methods for small data challenges in molecular science

B Dou, Z Zhu, E Merkurjev, L Ke, L Chen… - Chemical …, 2023 - ACS Publications
Small data are often used in scientific and engineering research due to the presence of
various constraints, such as time, cost, ethics, privacy, security, and technical limitations in …

Linguistically inspired roadmap for building biologically reliable protein language models

MH Vu, R Akbar, PA Robert, B Swiatczak… - Nature Machine …, 2023 - nature.com
Deep neural-network-based language models (LMs) are increasingly applied to large-scale
protein sequence data to predict protein function. However, being largely black-box models …

Protein representation learning by geometric structure pretraining

Z Zhang, M Xu, A Jamasb… - arXiv preprint arXiv …, 2022 - arxiv.org
Learning effective protein representations is critical in a variety of tasks in biology such as
predicting protein function or structure. Existing approaches usually pretrain protein …

Convolutions are competitive with transformers for protein sequence pretraining

KK Yang, N Fusi, AX Lu - Cell Systems, 2024 - cell.com
Pretrained protein sequence language models have been shown to improve the
performance of many prediction tasks and are now routinely integrated into bioinformatics …

Masked inverse folding with sequence transfer for protein representation learning

KK Yang, N Zanichelli, H Yeh - Protein Engineering, Design and …, 2023 - academic.oup.com
Self-supervised pretraining on protein sequences has led to state-of-the-art performance on
protein function and fitness prediction. However, sequence-only methods ignore the rich …

Bidirectional learning for offline infinite-width model-based optimization

C Chen, Y Zhang, J Fu, XS Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
In offline model-based optimization, we strive to maximize a black-box objective function by
only leveraging a static dataset of designs and their scores. This problem setting arises in …

Importance-aware co-teaching for offline model-based optimization

Y Yuan, CS Chen, Z Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline model-based optimization aims to find a design that maximizes a property of interest
using only an offline dataset, with applications in robot, protein, and molecule design …

Pre-training protein encoder via siamese sequence-structure diffusion trajectory prediction

Z Zhang, M Xu, AC Lozano… - Advances in …, 2024 - proceedings.neurips.cc
Self-supervised pre-training methods on proteins have recently gained attention, with most
approaches focusing on either protein sequences or structures, neglecting the exploration of …

Injecting multimodal information into rigid protein docking via bi-level optimization

R Wang, Y Sun, Y Luo, S Li, C Yang… - Advances in …, 2023 - proceedings.neurips.cc
The structure of protein-protein complexes is critical for understanding binding dynamics,
biological mechanisms, and intervention strategies. Rigid protein docking, a fundamental …

Parallel-mentoring for offline model-based optimization

CS Chen, C Beckham, Z Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study offline model-based optimization to maximize a black-box objective function with a
static dataset of designs and scores. These designs encompass a variety of domains …