Learning the Language of Protein Structure

B Gaujac, J Donà, L Copoiu, T Atkinson… - arXiv preprint arXiv …, 2024 - arxiv.org
Representation learning and\emph {de novo} generation of proteins are pivotal
computational biology tasks. Whilst natural language processing (NLP) techniques have
proven highly effective for protein sequence modelling, structure modelling presents a
complex challenge, primarily due to its continuous and three-dimensional nature. Motivated
by this discrepancy, we introduce an approach using a vector-quantized autoencoder that
effectively tokenizes protein structures into discrete representations. This method transforms …

Learning the Language of Protein Structures

J Gómez Sánchez - 2023 - mediatum.ub.tum.de
The AI revolution, which culminated recently in the release of GPT-4, not only impacted how
we process natural language but the very same principles could readily be transferred to
other sequential data such as protein sequences. Adapting these techniques to
computational biology changed the way we process proteins today fundamentally. By simply
feeding single amino acid sequences to large language models trained to parse and
understand protein sequences, researchers were able to build state-of...»
以上显示的是最相近的搜索结果。 查看全部搜索结果