computational biology tasks. Whilst natural language processing (NLP) techniques have
proven highly effective for protein sequence modelling, structure modelling presents a
complex challenge, primarily due to its continuous and three-dimensional nature. Motivated
by this discrepancy, we introduce an approach using a vector-quantized autoencoder that
effectively tokenizes protein structures into discrete representations. This method transforms …